The wfMESHI-TIGRESS Branch

WeFold – the wfMESHI-TIGRESS branch

Tomer Sidi and Chen Keasar, Ben-Gurion University of the Negev, Israel,
Melis Onel, Utkarsh Shah, Chris Kieslich, and Christodoulos A. Floudas, Texas A&M University, USA, and
S.N. Crivelli, Lawrence Berkeley National Laboratory, USA

chris.kieslich@gmail.com

WeFold [1] is an open collaboration initiative for protein structure prediction within CASP. It brings together labs and individuals through the science gateway http://wefold.nersc.gov/ and provides computing and storage resources through the National Energy Research Scientific Computing (NERSC) center. WeFold enables interaction among groups that work on different components of the protein structure prediction pipeline, making it possible to pool expertise on a scale not attempted before. Combining these components creates hybrid protein structure prediction pipelines, each of which submits its own models. The collaboration aims to create synergy among the participants and ultimately to produce better results than the individual methods achieve on their own. In its third round, the collaboration resulted in 12 different pipelines. Here we describe the wfMESHI-TIGRESS branch, which combines decoys from CASP servers, scoring by MESHI_Score, and refinement by Princeton_TIGRESS [2].

Methods

In the first stage of this WeFold branch, the MESHI group downloaded server decoys from the CASP web site and scored them. To this end we used essentially the same protocol as MESHI_SERVER, applying it to complete decoy sets (the T0XXX.3D.srv.tar.gz tarballs) and uploading the list of scored decoys to the WeFold site. The MESHI_SERVER protocol is described in its own abstract. In a nutshell, it first standardizes the decoys by SCWRL4 [6] rotamer optimization followed by energy minimization. Then, the protocol extracts 106 structural features from each decoy and feeds them to MESHI_Score [7], an ensemble of a thousand independent predictors. Each of these predictors is trained to predict decoy quality using a unique subset of the features. The final score is the weighted median of the thousand individual scores.
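The final aggregation step can be illustrated with a minimal Python sketch of a weighted median. It is not taken from the MESHI package: the function name, the per-predictor weights, and the "lower weighted median" convention (the smallest score at which the cumulative weight reaches half of the total) are assumptions made for illustration only.

```python
from typing import Sequence


def weighted_median(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Return the lower weighted median of `scores`.

    Hypothetical helper: the weighting scheme actually used by MESHI_Score
    is not specified in this abstract.
    """
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and of equal length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    # Walk through the scores in ascending order, accumulating weight,
    # and stop at the first score that covers half of the total weight.
    cumulative = 0.0
    ordered = sorted(zip(scores, weights))
    for score, weight in ordered:
        cumulative += weight
        if cumulative >= total / 2:
            return score
    return ordered[-1][0]  # only reachable through floating-point round-off


# Three toy predictors instead of a thousand.
equal = weighted_median([0.2, 0.5, 0.9], [1, 1, 1])
skewed = weighted_median([0.2, 0.5, 0.9], [1, 1, 5])
```

With equal weights this reduces to the ordinary median. A median, unlike a mean, keeps a handful of outlying predictors from dominating the final score.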

For the second stage of wfMESHI-TIGRESS, the FLOUDAS group applied protein refinement via Princeton_TIGRESS to the top 5 unique decoys identified by the MESHI_SERVER protocol. The decoys were submitted to the Princeton_TIGRESS webserver, following the same procedure used by FLOUDAS_REFINESERVER. Princeton_TIGRESS uses separate sampling and selection stages: sampling involves CYANA [3] torsion angle dynamics and Rosetta FastRelax [4], and selection is based on an SVM predictor. The SVM model includes a decomposition of physics-based and hybrid energy functions, as well as a geometry-free representation of the protein structure obtained by binning Cα-Cα distances to capture fine-grained movements. Following selection, CHARMM [5] molecular dynamics simulations are used for further refinement. A more detailed description of the refinement protocol can be found in the abstract of FLOUDAS_REFINESERVER. The protocol was followed consistently with no manual intervention.
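The idea of binning Cα-Cα distances can be sketched in a few lines of Python. This is an illustrative assumption, not the Princeton_TIGRESS implementation: the function name, the bin edges, and the toy coordinates are hypothetical, and the features actually used by the SVM are described in [2].

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def bin_ca_distances(ca_coords: Sequence[Coord], bin_edges: Sequence[float]) -> List[int]:
    """Count C-alpha pairs whose distance falls in each [edge_i, edge_i+1) bin.

    `bin_edges` must be ascending; pairs outside the covered range are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for i in range(len(ca_coords)):
        for j in range(i + 1, len(ca_coords)):
            distance = math.dist(ca_coords[i], ca_coords[j])
            for b in range(len(counts)):
                if bin_edges[b] <= distance < bin_edges[b + 1]:
                    counts[b] += 1
                    break
    return counts


# Three toy C-alpha positions (in angstroms) with pairwise distances 3, 4 and 5.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
histogram = bin_ca_distances(toy_coords, [0.0, 3.5, 4.5, 6.0])
```

Because the counts depend only on inter-residue distances, they do not change when the structure is rotated or translated, so no superposition is needed before comparing two models.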

Availability
• The Princeton_TIGRESS refinement webserver is available at http://atlas.engr.tamu.edu/refinement/.
• The MESHI software package (version 9.29, which was used in CASP12) is available at:
https://www.dropbox.com/sh/mb31bjdvvydhuzh/AADVcclTZKtFiSl6I9hBx8Dxa?dl=0

1. Khoury, G. A., Liwo, A., Khatib, F., Zhou, H., Chopra, G., Bacardit, J., Bortot, L., Delbem, A. C. B., Deng, X., Faccioli, R., He, Y., Krupa, P., Li, J., Mozolewska, M., Baker, D., Cheng, J., Floudas, C. A., Keasar, C., Levitt, M., Popović, Z., Scheraga, H. A., Skolnick, J., Crivelli, S. N. & Foldit Players (2014). WeFold: A Coopetition for Protein Structure Prediction. Proteins: Structure, Function, and Bioinformatics 82, 1850-1868.
2. Khoury, G. A., Tamamis, P., Pinnaduwage, N., Smadbeck, J., Kieslich, C. A. & Floudas, C. A. (2014). Princeton_TIGRESS: Protein geometry refinement using simulations and support vector machines. Proteins: Structure, Function, and Bioinformatics 82, 794-814.
3. Güntert, P. (2004). Automated NMR structure calculation with CYANA. Methods in Molecular Biology 278, 353-378.
4. Leaver-Fay, A., Tyka, M., Lewis, S. M., Lange, O. F., Thompson, J., Jacak, R., Kaufman, K., Renfrew, P. D., Smith, C. A., Sheffler, W., Davis, I. W., Cooper, S., Treuille, A., Mandell, D. J., Richter, F., Ban, Y. E., Fleishman, S. J., Corn, J. E., Kim, D. E., Lyskov, S., Berrondo, M., Mentzer, S., Popovic, Z., Havranek, J. J., Karanicolas, J., Das, R., Meiler, J., Kortemme, T., Gray, J. J., Kuhlman, B., Baker, D. & Bradley, P. (2011). ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods in Enzymology 487, 545-574.
5. MacKerell, A. D., Jr., Brooks, B., Brooks, C. L., III, Nilsson, L., Roux, B., Won, Y. & Karplus, M. (1998). CHARMM: The Energy Function and Its Parameterization with an Overview of the Program. In The Encyclopedia of Computational Chemistry (Schleyer, P. v. R. et al., eds.), Vol. 1, pp. 271-277. John Wiley & Sons: Chichester.
6. Krivov, G. G. et al. (2009). Improved prediction of protein side-chain conformations with SCWRL4. Proteins: Structure, Function, and Bioinformatics 77, 778-795.
7. Mirzaei, S. et al. (2016). Purely Structural Protein Scoring Functions Using Support Vector Machine and Ensemble Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics, in press.