The wfAll-Cheng Branch

Posted in WeFold3

Tertiary Structure Prediction by wfAll-Cheng
Jie Hou1, Badri Adhikari1, Renzhi Cao2, Silvia Crivelli3, and Jianlin Cheng1*
1 - Department of Computer Science, University of Missouri, Columbia, MO 65211, USA, 2 - Department of Computer Science, Pacific Lutheran University, WA 98447, USA, 3 - Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

WeFold [1] is a community-wide coopetition (cooperative competition) for protein structure prediction within CASP. As part of the WeFold collaboration, we evaluated all of the models generated by the other WeFold branches, selected the five top-ranked models, and refined them before submission.

Our wfAll-Cheng server first collected all of the WeFold models and CASP12 server models for each target. Models that were highly similar to others from the same group, as well as models that did not cover the full target sequence, were then filtered out of the pool. All remaining full-length models were evaluated by a full pairwise model comparison method, APOLLO [2], and a single-model quality assessment method, Qprob [3]. APOLLO superposes every pair of models in the pool and assigns each model a global quality score equal to its average pairwise GDT-TS score. Qprob predicts a GDT-TS score for each model directly, and all models are ranked by GDT-TS score from high to low.
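As a minimal sketch of the APOLLO-style consensus score described above (function and variable names are our own, and the pairwise GDT-TS values are illustrative stand-ins for real structural superposition scores):

```python
# Sketch of an APOLLO-style global quality score: each model's score is the
# mean GDT-TS of its pairwise comparisons with every other model in the pool.
# The matrix below is a made-up example, not output of APOLLO itself.

def apollo_scores(pairwise):
    """pairwise[i][j] = GDT-TS between models i and j (symmetric, in [0, 1])."""
    n = len(pairwise)
    return [
        sum(pairwise[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]

# Toy pool of three models with hypothetical pairwise GDT-TS values.
pairwise = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.7],
    [0.5, 0.7, 1.0],
]
scores = apollo_scores(pairwise)  # the model agreeing most with the pool scores highest
```

Models are then ranked by this score from high to low, in the same way that Qprob's predicted GDT-TS scores are ranked.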
The wfAll-Cheng server then averaged the two rankings for each model, from the APOLLO score and the Qprob score, to generate a consensus ranking of all models. Two exceptions applied. When the highest APOLLO GDT-TS score was below 0.2 (e.g., a likely free-modeling target), only the Qprob ranking was used to select the final top models, since Qprob has shown good performance in assessing model quality for hard targets [3]. Conversely, when the highest APOLLO GDT-TS score was above 0.8 (e.g., an easy target), only the APOLLO score was used to select the final models.
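The selection logic with its two score thresholds can be sketched as follows (a simplified illustration with hypothetical names; the real server operates on full ranked model lists):

```python
def rank(scores):
    """Map each model to its rank (1 = best), given model -> score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {m: i + 1 for i, m in enumerate(ordered)}

def consensus_top(apollo, qprob, k=5):
    """Select the k final models from APOLLO and Qprob scores (model -> GDT-TS)."""
    best = max(apollo.values())
    if best < 0.2:                        # likely free-modeling target: Qprob only
        combined = rank(qprob)
    elif best > 0.8:                      # easy target: APOLLO only
        combined = rank(apollo)
    else:                                 # otherwise average the two rankings
        ra, rq = rank(apollo), rank(qprob)
        combined = {m: (ra[m] + rq[m]) / 2 for m in apollo}
    return sorted(combined, key=combined.get)[:k]

# Toy example: three models with differing APOLLO and Qprob scores.
top = consensus_top({"m1": 0.5, "m2": 0.6, "m3": 0.4},
                    {"m1": 0.7, "m2": 0.5, "m3": 0.6}, k=3)
```

Since the best APOLLO score in the toy example falls between the two thresholds, the final order comes from the averaged ranks.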
The selected top five models were refined by ModRefiner [4] to improve their global and local structures. Finally, local quality scores predicted by ModFOLDclustQ [5] were added to the models before their submission to CASP.

We evaluated wfAll-Cheng along with the CASP12 server predictors on 11 human targets whose experimental structures have been released to date. The sum of Z-scores of the first (i.e., TS1) models predicted by each predictor over the 11 targets is reported in Table 1. The Z-score of a model is its GDT-TS score minus the mean GDT-TS score of all models in the target's pool, divided by the standard deviation of those scores. Negative Z-scores were set to 0 when summing the Z-scores for a predictor.
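The Z-score computation can be made concrete with a small example (the sample standard deviation is assumed here, and the GDT-TS values are made up):

```python
from statistics import mean, stdev

def z_score(model_gdt, pool_gdts):
    """Z-score of one model against a target's pool of GDT-TS scores.
    The sample standard deviation is assumed here."""
    return (model_gdt - mean(pool_gdts)) / stdev(pool_gdts)

def summed_z(per_target):
    """Sum of per-target Z-scores for one predictor, with negative
    Z-scores floored at 0. per_target: list of (model GDT-TS, pool list)."""
    return sum(max(0.0, z_score(m, pool)) for m, pool in per_target)

# Toy pool for one target: mean 0.4, sample standard deviation 0.2.
pool = [0.2, 0.4, 0.6]
total = summed_z([(0.6, pool), (0.2, pool)])  # 1.0 + 0 (the negative Z is floored)
```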

Table 1. The top 10 predictors ranked by the sum of Z-scores. wfAll-Cheng and MULTICOM (for details, see our CASP12 abstract entitled "Tertiary Structure Prediction by the MULTICOM Human Group") were human predictors, while all others were server predictors. The 11 targets are T0859, T0862, T0863, T0864, T0868, T0869, T0870, T0872, T0900, T0904 and T0944.

Rank  Predictor            Sum of Z-scores
1     wfAll-Cheng (Human)  17.05
3     MULTICOM (Human)     16.29
4     GOAL                 11.04
5     Zhang-Server          8.95
6     QUARK                 7.88
9     RaptorX               6.15
10    RaptorX-Contact       5.87

1 Khoury, G. A. et al. WeFold: a coopetition for protein structure prediction. Proteins: Structure, Function, and Bioinformatics 82, 1850-1868 (2014).
2 Wang, Z. APOLLO: a quality assessment service for single and multiple protein models. Bioinformatics 27, doi:10.1093/bioinformatics/btr268 (2011).
3 Cao, R. & Cheng, J. Protein single-model quality assessment by feature-based probability density functions. Scientific reports 6 (2016).
4 Xu, D. & Zhang, Y. Improving the physical realism and structural accuracy of protein models by a two-step atomic-level energy minimization. Biophysical journal 101, 2525-2534 (2011).
5 McGuffin, L. & Roche, D. Rapid model quality assessment for protein structure predictions using the comparison of multiple models without structural alignments. Bioinformatics 26 (2010).