Predictein

Citation

Ansell BRE, Pope BJ, Georgeson P, Emery-Corbin SJ, Jex AR. Annotation of the Giardia proteome through structure-based homology and machine learning. GigaScience. 2018;8: 846. doi:10.1093/gigascience/giy150

Definitions of computed statistics for protein models

Value	Definition
GeneID	Gene Identifier
Code_Chain	Closest structural homologue in PDB (reference) database
Molecule	Description of PDB reference molecule
Chain_match	Indicates whether a molecular description and amino acid length could be extracted for the Code_Chain. Where they could not, the value is 'Uncertain'; this is often the case for ribosomal proteins with multiple chain IDs.
TM	Template modeling score - describes overall goodness of fit between query and reference structures
RMSD	Root-mean-square deviation of alpha carbon atom positions between query and reference
AA_%ID	Percentage amino acid identity between query and reference
Cov	Three-dimensional coverage of query structure relative to reference
C_score	Convergence of simulated structures
C_score_sd	Standard deviation of expected C_score
TM_model	Expected TM score
RMSD_model	Expected RMSD
Length_ratio	Ratio of query peptide AA length to reference peptide AA length
SS_sd	Standard deviation of secondary structure proportions (lower indicates greater complexity)
Match_status	exactMatch: query and reference peptides are annotated with at least 1 identical PFAM code; noMatch: no matching PFAM codes
Exact_match_prediction	Likelihood of exactMatch between query and reference, output from random forest classifier
Confidence_category	Confidence in model (query) structure based on agreement between actual and predicted PFAM match status. HiConf: Match_status = exactMatch and Exact_match_prediction > 0.5; HiConf-like: Match_status = noMatch and Exact_match_prediction > 0.5; LowerConf: Match_status = noMatch and Exact_match_prediction ≤ 0.5; LowerConf-like: Match_status = exactMatch and Exact_match_prediction ≤ 0.5.
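
The Confidence_category rules can be sketched in Python as a small lookup on the Match_status and Exact_match_prediction columns. This is only an illustration of the definitions in the table, not the code used to generate the data; the function name confidence_category is hypothetical, and the sketch assumes Match_status takes only the two values listed above.

```python
def confidence_category(match_status: str, exact_match_prediction: float) -> str:
    """Return the Confidence_category implied by the table definitions.

    Hypothetical helper: assumes match_status is 'exactMatch' or 'noMatch'
    and exact_match_prediction is the random forest likelihood of exactMatch.
    """
    if match_status not in ("exactMatch", "noMatch"):
        raise ValueError(f"unexpected Match_status: {match_status!r}")

    # The classifier predicts an exact match when the likelihood exceeds 0.5.
    predicted_match = exact_match_prediction > 0.5

    if match_status == "exactMatch":
        # Observed PFAM match: full confidence only if the classifier agrees.
        return "HiConf" if predicted_match else "LowerConf-like"
    # No observed PFAM match: the classifier may still predict one.
    return "HiConf-like" if predicted_match else "LowerConf"
```

Note that a prediction of exactly 0.5 falls on the lower-confidence side, since the definitions use a strict > 0.5 threshold for the HiConf categories.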