Citation
Ansell BRE, Pope BJ, Georgeson P, Emery-Corbin SJ, Jex AR. Annotation of the Giardia proteome through structure-based homology and machine learning. GigaScience. 2018;8: 846. doi:10.1093/gigascience/giy150
Definitions of computed statistics for protein models
Column | Definition |
---|---|
GeneID | Gene Identifier |
Code_Chain | Closest structural homologue in PDB (reference) database |
Molecule | Description of PDB reference molecule |
Chain_match | Indicates whether a molecular description and amino acid length could be extracted for the Code_Chain. Extraction often fails for ribosomal proteins with multiple chain IDs; these entries are labelled 'Uncertain'. |
TM | Template modeling score - describes overall goodness of fit between query and reference structures |
RMSD | Root-mean-square deviation of alpha-carbon atom positions between query and reference |
AA_%ID | Percentage amino acid identity between query and reference |
Cov | Three-dimensional coverage of query structure relative to reference |
C_score | Convergence of simulated structures |
C_score_sd | Standard deviation of expected C_score |
TM_model | Expected TM score |
RMSD_model | Expected RMSD |
Length_ratio | Ratio of query peptide AA length to reference peptide AA length |
SS_sd | Standard deviation of secondary structure proportions (lower indicates greater complexity) |
Match_status | exactMatch: query and reference peptides are annotated with at least 1 identical PFAM code; noMatch: no matching PFAM codes |
Exact_match_prediction | Likelihood of exactMatch between query and reference, output from random forest classifier |
Confidence_category | Confidence in model (query) structure based on agreement between actual and predicted PFAM match status. |
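
To make a few of the definitions above concrete, here is a minimal Python sketch of how Length_ratio, Match_status and RMSD could be computed from their definitions. It is an illustration only, not the pipeline used to produce the data: the function names, argument names and the example PFAM codes are hypothetical, and the RMSD helper assumes the alpha-carbon coordinates are already paired and superposed.

```python
import math


def length_ratio(query_len, ref_len):
    """Ratio of query peptide AA length to reference peptide AA length."""
    return query_len / ref_len


def match_status(query_pfam, ref_pfam):
    """'exactMatch' if the two annotations share at least one PFAM code."""
    shared = set(query_pfam) & set(ref_pfam)
    return "exactMatch" if shared else "noMatch"


def rmsd(query_ca, ref_ca):
    """Root-mean-square deviation between paired alpha-carbon coordinates.

    Assumption: both lists hold (x, y, z) tuples in the same residue order
    and the structures have already been superposed.
    """
    if not query_ca or len(query_ca) != len(ref_ca):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (q - r) ** 2
        for q_atom, r_atom in zip(query_ca, ref_ca)
        for q, r in zip(q_atom, r_atom)
    )
    return math.sqrt(squared / len(query_ca))


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the functions.
    print(length_ratio(150, 300))
    print(match_status(["PF00069"], ["PF00069", "PF07714"]))
    print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
```

A Length_ratio well below or above 1 flags a query that is much shorter or longer than its closest structural homologue, which is worth keeping in mind when reading the TM and Cov columns.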