The wfMESHI-Seok Branch

Posted in WeFold3

WeFold – the wfMESHI-Seok branch
T. Sidi and C. Keasar, Ben-Gurion University of the Negev, Israel, and
Lim Heo, Gyu Rie Lee, Minkyung Baek, and Chaok Seok, Department of Chemistry, Seoul National University, Seoul 151-747, Republic of Korea

chaok@snu.ac.kr

WeFold is an open collaboration initiative for protein structure prediction within CASP. It brings together labs and individuals through the science gateway http://wefold.nersc.gov/ and provides computing and storage resources through the National Energy Research Scientific Computing Center (NERSC). WeFold enables interaction among groups that work on different components of the protein structure prediction pipeline, making it possible to combine expertise at an unprecedented scale. Combining these components creates hybrid protein structure prediction pipelines, each of which submits its own models. The collaboration aims to promote synergy among the participants and ultimately to produce better results than those achieved by the individual methods. In its third round, the collaboration resulted in 12 different pipelines. Here we describe the wfMESHI-Seok branch, which combines CASP server decoys, scoring by MESHI_Score, and refinement and selection by GalaxyRefine.

Methods
In the first stage of this WeFold branch, the MESHI group downloaded the server decoys from the CASP website and scored them. To this end, we used essentially the same protocol as MESHI_SERVER, applied it to the complete decoy sets (the T0XXX.3D.srv.tar.gz tarballs), and uploaded the list of scored decoys to the WeFold site. The MESHI_SERVER protocol is described in its own abstract. In a nutshell, it first standardizes the decoys by SCWRL4 rotamer optimization followed by energy minimization. The protocol then extracts 106 structural features from each decoy and feeds them to MESHI_Score, an ensemble of a thousand independent predictors, each trained to predict decoy quality from its own unique subset of the features. The final score is the weighted median of the thousand individual scores.
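The trained MESHI_Score predictors and their weighting scheme are not given here, so the Python sketch below only illustrates the aggregation idea under stated assumptions: each hypothetical toy predictor is a random linear function of its own distinct random feature subset, and its weight is a made-up placeholder rather than a trained value. The weighted median itself is standard: sort the individual scores and return the first one at which the cumulative weight reaches half of the total.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical predictor: (feature indices, linear coefficients, weight).
Predictor = Tuple[Tuple[int, ...], List[float], float]


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Lower weighted median: the smallest value at which the cumulative
    weight reaches half of the total weight (weights assumed positive)."""
    if not values or len(values) != len(weights):
        raise ValueError("need equal-length, non-empty values and weights")
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]  # guards against floating-point rounding


def make_toy_ensemble(n_predictors: int, n_features: int,
                      subset_size: int, seed: int = 0) -> List[Predictor]:
    """Build toy predictors, each on a distinct random feature subset.
    Coefficients and weights are random placeholders, not trained values."""
    if n_predictors > math.comb(n_features, subset_size):
        raise ValueError("not enough distinct feature subsets")
    rng = random.Random(seed)
    seen = set()
    ensemble: List[Predictor] = []
    while len(ensemble) < n_predictors:
        subset = tuple(sorted(rng.sample(range(n_features), subset_size)))
        if subset in seen:
            continue  # keep every predictor's feature subset unique
        seen.add(subset)
        coefs = [rng.uniform(-1.0, 1.0) for _ in subset]
        weight = rng.uniform(0.5, 1.0)  # hypothetical per-predictor weight
        ensemble.append((subset, coefs, weight))
    return ensemble


def ensemble_score(features: Sequence[float], ensemble: List[Predictor]) -> float:
    """Score one decoy: each predictor sees only its own features, and the
    individual scores are combined by their weighted median."""
    scores = [sum(c * features[i] for i, c in zip(subset, coefs))
              for subset, coefs, _ in ensemble]
    weights = [w for _, _, w in ensemble]
    return weighted_median(scores, weights)


# Toy usage: 1000 predictors over 106 features, 10 features each (assumed size).
toy_ensemble = make_toy_ensemble(1000, 106, 10)
print(ensemble_score([0.5] * 106, toy_ensemble))
```

A median, unlike a mean, is robust to a minority of predictors producing extreme scores for a given decoy, which is one plausible reason to aggregate this way; the subset size of 10 is purely illustrative.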
The number of server models to be refined further was reduced to 48 by taking the models with the highest MESHI scores that are structurally distinct, i.e., whose mutual TM-score is lower than 0.95. These 48 representative models were refined using GalaxyRefine [1]. This refinement method repeatedly perturbs the structure by side-chain repacking and then relaxes it with short molecular dynamics simulations; backbone conformational change is thus driven by side-chain repacking. The energy function used by GalaxyRefine is a hybrid of molecular mechanics-based and knowledge-based components. In CASP12, the dDFIRE potential used in the CASP11 version of the energy function was replaced by a newly developed knowledge-based potential that considers the solvation states of interacting atoms as well as their distances. The refined models were ranked by this new knowledge-based potential, and the top five models were submitted as final predictions.
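The selection and final-ranking steps can be sketched as a greedy filter followed by a sort. This is a minimal illustration, not the pipeline's actual code: `tm_score` and `energy` are hypothetical callables standing in for an external TM-score computation and for GalaxyRefine's knowledge-based potential, TM-score is treated as symmetric (reasonable for same-length models of one target), a higher MESHI score is taken to mean a better decoy, and a lower energy is assumed to be better.

```python
from typing import Callable, List, Sequence, TypeVar

M = TypeVar("M")


def select_diverse(models: Sequence[M], scores: Sequence[float],
                   tm_score: Callable[[M, M], float],
                   max_models: int = 48,
                   max_similarity: float = 0.95) -> List[M]:
    """Greedily keep the best-scoring models whose TM-score to every model
    already kept is below max_similarity (higher score = better decoy)."""
    order = sorted(range(len(models)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if len(kept) == max_models:
            break
        if all(tm_score(models[i], models[j]) < max_similarity for j in kept):
            kept.append(i)
    return [models[i] for i in kept]


def top_by_energy(models: Sequence[M], energy: Callable[[M], float],
                  n: int = 5) -> List[M]:
    """Return the n models with the lowest energy (assumed: lower = better)."""
    return sorted(models, key=energy)[:n]


# Toy usage: hypothetical model "B" is nearly identical to "A" (TM-score 0.97).
toy_tm = {frozenset(("A", "B")): 0.97}


def tm(a: str, b: str) -> float:
    return toy_tm.get(frozenset((a, b)), 0.5)


print(select_diverse(["A", "B", "C", "D"], [0.9, 0.8, 0.7, 0.6], tm))
print(top_by_energy(["x", "y", "z"], {"x": -10.0, "y": -30.0, "z": -20.0}.get, n=2))
```

Because candidates are visited in score order, a near-duplicate of an already kept model is always the one discarded, so each retained model is the best-scoring member of its structural neighborhood.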

1. Heo, L., Park, H. & Seok, C. (2013) GalaxyRefine: Protein structure refinement driven by side-chain repacking. Nucleic Acids Res. 41(W1), W384-W388.