Citation

Ansell BRE, Pope BJ, Georgeson P, Emery-Corbin SJ, Jex AR. Annotation of the Giardia proteome through structure-based homology and machine learning. GigaScience. 2018;8: 846. doi:10.1093/gigascience/giy150

Definitions of computed statistics for protein models

Value Definition
GeneID Gene Identifier
Code_Chain Closest structural homologue in the PDB (reference) database
Molecule Description of PDB reference molecule
Chain_match Indicates whether a molecular description and amino acid length could be extracted for the Code_Chain. Extraction often fails for ribosomal proteins with multiple chain IDs; these entries are labelled 'Uncertain'.
TM Template modeling score - describes overall goodness of fit between query and reference structures
RMSD Root-mean-square deviation between alpha carbon atom positions in query and reference
AA_%ID Percentage amino acid identity between query and reference
Cov Three-dimensional coverage of query structure relative to reference
C_score Convergence of simulated structures
C_score_sd Standard deviation of expected C_score
TM_model Expected TM score
RMSD_model Expected RMSD
Length_ratio Ratio of query peptide AA length to reference peptide AA length
SS_sd Standard deviation of secondary structure proportions (lower indicates greater complexity)
Match_status exactMatch: query and reference peptides are annotated with at least one identical Pfam code; noMatch: no matching Pfam codes
Exact_match_prediction Likelihood of an exactMatch between query and reference, as output by a random forest classifier
Confidence_category Confidence in the model (query) structure, based on agreement between actual and predicted Pfam match status:
HiConf: Match_status = exactMatch and Exact_match_prediction > 0.5
HiConf-like: Match_status = noMatch and Exact_match_prediction > 0.5
LowerConf: Match_status = noMatch and Exact_match_prediction ≤ 0.5
LowerConf-like: Match_status = exactMatch and Exact_match_prediction ≤ 0.5
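
The four Confidence_category rules can be expressed as a small lookup, which may help when re-deriving or checking the column from Match_status and Exact_match_prediction. The following is a minimal Python sketch, not the code used to produce the data set: the function name confidence_category and the error raised for unrecognised Match_status values are assumptions made for illustration; only the 0.5 threshold and the category labels come from the definitions above.

```python
def confidence_category(match_status: str, exact_match_prediction: float) -> str:
    """Return the Confidence_category for one protein model.

    Hypothetical helper: the name and the ValueError are illustrative
    assumptions; the rules follow the definitions in this file.
    """
    if match_status not in ("exactMatch", "noMatch"):
        raise ValueError(f"unexpected Match_status: {match_status!r}")

    # The classifier predicts a match when its output is strictly above 0.5.
    predicted_match = exact_match_prediction > 0.5

    if match_status == "exactMatch":
        # Observed match: agreement gives HiConf, disagreement LowerConf-like.
        return "HiConf" if predicted_match else "LowerConf-like"
    # Observed noMatch: a predicted match gives HiConf-like, otherwise LowerConf.
    return "HiConf-like" if predicted_match else "LowerConf"


if __name__ == "__main__":
    for status, score in [("exactMatch", 0.9), ("noMatch", 0.9),
                          ("noMatch", 0.2), ("exactMatch", 0.2)]:
        print(status, score, confidence_category(status, score))
```

A value of exactly 0.5 falls on the "≤ 0.5" side, so it yields LowerConf or LowerConf-like depending on Match_status.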