Protein level predictions 1 - Pathogenic or not predictors
This site collects tools that predict whether a mutation changing an amino acid in a protein is likely to increase disease susceptibility or is likely to be benign.
- nsSNP Analyzer
- PolyPhen (1 and 2)
Align GVGD is a web-based program that combines biophysical characteristics of amino acids with protein multiple sequence alignments to predict where missense substitutions fall on a spectrum from enriched deleterious to enriched neutral. As input, the program needs a protein multiple sequence alignment and a list of substitutions. The algorithm depends heavily on the quality of the alignment.
References: Tavtigian et al. Comprehensive statistical study of 452 BRCA1 missense substitutions with classification of eight recurrent substitutions as neutral. J. Med. Genet., 2005 Jul 13. doi:10.1136/jmg.2005.033878
Mathe et al. Computational approaches for predicting the biological effect of p53 missense mutations: a comparison of three sequence analysis based methods. Nucleic Acids Res., 2006, 34, 5, 1317-1325. doi:10.1093/nar/gkj518
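The "GV" and "GD" in Align GVGD stand for Grantham Variation and Grantham Deviation. The minimal sketch below shows how such quantities can be computed for one alignment column, assuming each residue is described by a (composition, polarity, volume) tuple and using Grantham-style weights as commonly quoted for Grantham (1974); check the weights against the original papers before relying on them. GV measures the spread of properties observed at the position, and GD measures how far the substituted residue falls outside that range. The property tuples are placeholders, not real amino acid values, and this is not the Align GVGD implementation or its classification step.

```python
import math

# Grantham-style weights for composition (c), polarity (p) and volume (v).
# ASSUMPTION: values as commonly quoted for Grantham (1974); verify before use.
ALPHA, BETA, GAMMA = 1.833, 0.1018, 0.000399


def weighted_norm(dc: float, dp: float, dv: float) -> float:
    """Weighted Euclidean length of a (c, p, v) difference vector."""
    return math.sqrt(ALPHA * dc ** 2 + BETA * dp ** 2 + GAMMA * dv ** 2)


def grantham_variation(column):
    """GV: weighted size of the property range observed in one alignment column.

    `column` is a list of (c, p, v) tuples, one per aligned residue.
    """
    cs, ps, vs = zip(*column)
    return weighted_norm(max(cs) - min(cs), max(ps) - min(ps), max(vs) - min(vs))


def grantham_deviation(column, mutant):
    """GD: weighted distance of the mutant's (c, p, v) tuple from the observed range.

    A property value inside the observed [min, max] range contributes zero.
    """
    deviations = []
    for observed, x in zip(zip(*column), mutant):
        lo, hi = min(observed), max(observed)
        deviations.append(max(lo - x, 0.0, x - hi))
    return weighted_norm(*deviations)


# Placeholder (c, p, v) tuples: NOT real Grantham amino acid values.
column = [(0.0, 8.0, 30.0), (0.5, 9.0, 60.0)]
print(round(grantham_variation(column), 3))
print(grantham_deviation(column, (0.2, 8.5, 45.0)))  # inside the range: 0.0
print(round(grantham_deviation(column, (1.5, 8.5, 120.0)), 3))
```

A deviation of zero means the substituted residue looks like residues already tolerated at that position; large GV at a position means the alignment already tolerates diverse residues there, which is one reason the method is so sensitive to alignment quality.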
Bongo (“Bonds ON Graph”) is a structure-based approach for predicting the structural effects of nsSNPs. It treats protein structures as residue-residue interaction networks and applies graph-theoretical measures to identify residues that are critical for maintaining structural stability, by assessing how single point mutations affect the interaction network. Bongo can identify mutations that cause both local and global structural effects. As input, Bongo needs a protein structure. The authors report that structural changes resulting from nsSNPs are closely related to their pathological consequences.
Reference: Cheng et al. Prediction by graph theoretic measures of structural effects in proteins arising from non-synonymous single nucleotide polymorphisms. PLoS Comput. Biol., 2008, 4, 7, e1000135. doi:10.1371/journal.pcbi.1000135
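Bongo's own graph-theoretic measures are described by Cheng et al.; the sketch below swaps in a much simpler measure purely to illustrate the network idea. It builds a residue-residue contact network from coordinates (the 8 angstrom cutoff is an assumption) and reports residues whose removal splits the network into more connected components, a crude stand-in for "residues critical for structural stability". The coordinates are made up, and the function names are hypothetical.

```python
import math
from itertools import combinations


def contact_graph(coords, cutoff=8.0):
    """Residue contact network: nodes are residue indices, and edges join residues
    whose coordinates lie within `cutoff` angstroms (hypothetical default)."""
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def n_components(graph, removed=None):
    """Count connected components, ignoring the residue `removed` if given."""
    seen, count = set(), 0
    for start in graph:
        if start == removed or start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for neighbour in graph[node]:
                if neighbour != removed and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return count


def fragmenting_residues(graph):
    """Residues whose removal splits the network into more components."""
    baseline = n_components(graph)
    return [r for r in graph if n_components(graph, removed=r) > baseline]


# Toy chain of four residues spaced 5 angstroms apart along x (made-up coordinates).
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
g = contact_graph(coords)
print(fragmenting_residues(g))  # → [1, 2]
```

In this toy chain only the two interior residues hold the network together; real structures are far more densely connected, which is why Bongo relies on more refined graph measures than simple fragmentation.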
CanPredict is a computational tool for predicting cancer-associated mutations. As input, it requires either a protein accession number or a protein sequence in FASTA format, together with the substitutions to be tested.
Reference: Kaminker et al. Distinguishing cancer-associated missense mutations from common polymorphisms. Cancer Research, 2007, 67, 2, 465-473. doi:10.1093/nar/gkm405
LS-SNP/PDB is a new WWW resource for genome-wide annotation of human non-synonymous SNPs that serves high-quality protein graphics rendered with the UCSF Chimera molecular visualization software. It builds on LS-SNP. LS-SNP/PDB annotates all human SNPs that produce an amino acid change in a protein whose structure is in the PDB, using the following features: local structural environment, putative binding interactions and evolutionary conservation. SNPs can be searched by a specific rs ID, by several other IDs specifying the gene or protein of interest, or by genomic region.
Reference: Ryan et al. LS-SNP/PDB: annotated non-synonymous SNPs mapped to Protein Data Bank structures. Bioinformatics, 2009, 25, 11, 1431-1432. doi:10.1093/bioinformatics/btp242
MAPP combines an alignment with amino acid physicochemical characteristics to calculate the physicochemical centroid of each position and the deviation of each of the 20 amino acids from that centroid. As input, it needs an alignment of the protein sequences and a tree describing the distances between the sequences in the alignment. As output, the user gets a many-column table giving the physicochemical characteristics of each position; the MAPP impact score, a continuous variable, for all 20 amino acids at each position; and a list of which amino acids are predicted to be deleterious and which neutral.
Reference: Stone and Sidow. Physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity. Genome Res., 2005, 15, 7, 978-986. doi:10.1101/gr.3804205
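The centroid idea behind MAPP can be illustrated with a simplified sketch. Under stated assumptions (sequences weighted equally rather than by the supplied tree, made-up two-property vectors, and a plain distance in standard-deviation units instead of MAPP's actual impact score), it computes the per-property mean and variance of one column and scores a candidate residue by how far it lies from that centroid. The function names and the `eps` constant are hypothetical.

```python
import math
from statistics import fmean, pvariance


def column_centroid(column):
    """Per-property mean (the centroid) and variance over one alignment column.

    `column` is a list of property vectors, one per aligned sequence. Sequences
    are weighted equally here; MAPP itself weights them using the supplied tree.
    """
    per_property = list(zip(*column))
    means = [fmean(values) for values in per_property]
    variances = [pvariance(values) for values in per_property]
    return means, variances


def deviation_score(candidate, means, variances, eps=1e-6):
    """Distance of a candidate residue from the centroid in standard-deviation units.

    `eps` (a hypothetical constant) avoids division by zero when a property is
    invariant in the column. This is NOT the MAPP impact score itself.
    """
    return math.sqrt(sum((x - m) ** 2 / (v + eps)
                         for x, m, v in zip(candidate, means, variances)))


# Hypothetical two-property vectors for one column (values are made up).
column = [(1.0, 10.0), (2.0, 12.0), (3.0, 11.0)]
means, variances = column_centroid(column)
print(deviation_score((2.0, 11.0), means, variances))  # at the centroid: 0.0
print(round(deviation_score((5.0, 11.0), means, variances), 2))
```

Dividing by the variance means the same absolute change counts for more at a position where that property is tightly constrained, which is the intuition behind scoring physicochemical constraint violation.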
nsSNPAnalyzer is a system for capturing the relationship between disease-associated SNPs and disease-causing genes. As input, it needs a protein sequence in FASTA format and SNP data. In addition, the user can provide their own PDB file and chain. The method is based on random forests.
Reference: Bao et al. nsSNPAnalyzer: identifying disease-associated nonsynonymous single nucleotide polymorphisms. Nucleic Acids Res., 2005, 33, Web Server issue, W480-2. doi:10.1093/nar/gki372
PANTHER estimates the likelihood that a particular nonsynonymous coding SNP has a functional impact on the protein by calculating subPSEC (substitution position-specific evolutionary conservation score). As input, it needs a protein sequence and information about the substitution. Several outputs are given, the most useful being the probability that a variant is deleterious.
Reference: Thomas et al. PANTHER: a library of protein families and subfamilies indexed by function. Genome Res., 2003, 13, 9, 2129-2141. doi:10.1101/gr.772403
Parepro (Prediction of amino acid replacement probability) is a support vector machine (SVM)-based method for identifying which non-synonymous single base changes have a deleterious effect on protein function. As input, it requires the protein sequence and other protein sequences homologous to it.
Reference: Tian et al. Predicting the phenotypic effects of non-synonymous single nucleotide polymorphisms based on support vector machines. BMC Bioinformatics, 2007, 8, 450. doi:10.1186/1471-2105-8-450
Predictor of human Deleterious Single Nucleotide Polymorphisms (PhD-SNP) is an SVM-based classifier. As input, it requires the protein sequence and the position of the SNP. Three slightly different algorithms are available: 'sequence-based', 'hybrid method' and 'sequence and profile-based'. The tool cannot handle batch input of SNPs.
Reference: Capriotti et al. Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information. Bioinformatics, 2006, 22, 22, 2729-2734. doi:10.1093/bioinformatics/btl423
PMut combines sequence alignment/PSSM information with structural factors to characterize missense substitutions. To do this, it uses a feed-forward neural network trained on a large database of disease-associated and neutral mutations. As input, PMut needs the protein sequence or its SwissProt/TrEMBL code. As output, the user gets a binary prediction of "neutral" vs "pathological", represented by the pathogenicity index, together with a confidence index. The user can also retrieve all the intermediate information (alignments and BLAST and PHD outputs) that PMut used to generate the prediction. If the protein structure is available, the PMut server can display the mutation site on the structure, using a color code to trace the pathogenicity associated with the mutation. This 3D visualisation is provided as a RasMol script, so the user needs either RasMol or the Chime plug-in to view it. In addition, PMut allows the detection of mutational hotspots.
Reference: Ferrer-Costa et al. PMUT: a web-based tool for the annotation of pathological mutations on proteins. Bioinformatics, 2005, 21, 14, 3176-3178. doi:10.1093/bioinformatics/bti486
PolyPhen models SNP effects using both structure and sequence information. It computes PSIC (position-specific independent counts) profile scores for the wild-type and mutant amino acids and uses the difference between them, together with structural information, to assign each substitution to one of three categories: benign, possibly damaging or probably damaging. As input, the program needs either a protein identifier or a protein sequence in FASTA format, together with the position of the SNP. Note: a new version, PolyPhen-2, is now available and can be accessed through the web page of the original PolyPhen.
Download: PolyPhen-2 is available from the PolyPhen website.
Reference: Ramensky et al. Human non-synonymous SNPs: server and survey. Nucleic Acids Res., 2002, 30, 17, 3894-3900. http://nar.oxfordjournals.org/cgi/reprint/30/17/3894
SIFT (Sorting Intolerant from Tolerant) is based on homology comparisons and does not require structural information. It uses sequence alignments to create a score matrix, based on Dirichlet mixtures, for each position in the alignment. The score for each possible amino acid substitution is converted to a normalized probability that the substitution would be evolutionarily tolerated (the SIFT score).
References: Ng and Henikoff. Predicting deleterious amino acid substitutions. Genome Res., 2001, 11, 5, 863-874. doi:10.1101/gr.176601
Ng and Henikoff. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res., 2003, 31, 13, 3812-3814. doi:10.1093/nar/gkg509
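A minimal sketch of SIFT-style scaled probabilities for one alignment column, under stated assumptions: simple pseudocount smoothing replaces SIFT's Dirichlet mixtures and sequence weighting, each residue's probability is divided by that of the most probable residue at the position, and scores below 0.05 are flagged as not tolerated, the cutoff SIFT commonly uses. The pseudocount value, the function names and the column are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sift_like_scores(column: str, pseudocount: float = 0.1) -> dict:
    """Scaled probabilities for one column: each residue's smoothed probability
    divided by the probability of the most probable residue (assumed scheme;
    real SIFT uses Dirichlet mixtures and sequence weighting)."""
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    probs = {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
    top = max(probs.values())
    return {aa: p / top for aa, p in probs.items()}


def is_tolerated(score: float, cutoff: float = 0.05) -> bool:
    """Scores below the cutoff are predicted not tolerated."""
    return score >= cutoff


scores = sift_like_scores("DDDDDDDDEE")  # hypothetical column: aspartate, some glutamate
for aa in "DEK":
    print(aa, round(scores[aa], 3), is_tolerated(scores[aa]))
```

Here D to E stays above the cutoff because glutamate is already observed at the position, while D to K falls far below it; the size of the pseudocount directly controls how harshly unobserved residues are scored.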
SNAP is a method for evaluating the effects of single amino acid substitutions on protein function. As input, it needs a protein sequence and a list of substitutions.
Reference: Bromberg and Rost. SNAP: predict effect of non-synonymous polymorphisms on function. Nucleic Acids Res., 2007, 35, 11, 3823-3835. doi:10.1093/nar/gkm238
SNPs3D is a support vector machine (SVM)-based tool. It was trained on a set of disease-causing mutations and a control set of mutations not associated with disease. In jackknife testing, the method identifies 74% of disease mutations with a false positive rate of 15%. According to the authors, these results strongly support the hypothesis that loss of protein stability is a major factor contributing to monogenic disease. The goal of the tool is to provide a general, fully automatic stability perturbation model for analysis of the impact of non-synonymous single nucleotide polymorphisms (SNPs) found in the human population. The results of the analysis can be browsed on the SNPs3D website or downloaded to a local machine, and can be searched by SNP ID or by protein or genomic sequence ID.
Reference: Yue et al. SNPs3D: candidate gene and SNP selection for association studies. BMC Bioinformatics, 2006, 7, 166. doi:10.1186/1471-2105-7-166
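A sensitivity of 74% and a false positive rate of 15% do not by themselves say what fraction of positive predictions will be correct; that also depends on the fraction of tested variants that are truly disease-causing (the prevalence, written pi here). Precision is s·pi / (s·pi + f·(1 - pi)), with s the sensitivity and f the false positive rate. The small worked calculation below uses the reported rates; the prevalence values are hypothetical inputs, not figures from the SNPs3D study.

```python
def precision(sensitivity: float, false_positive_rate: float, prevalence: float) -> float:
    """Fraction of variants predicted as disease-causing that truly are,
    given the share of disease mutations (prevalence) among those tested."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


for prevalence in (0.5, 0.1, 0.01):  # hypothetical mixes of disease and neutral variants
    print(prevalence, round(precision(0.74, 0.15, prevalence), 3))
```

With these inputs, precision drops from about 0.83 at 50% prevalence to about 0.35 at 10% and below 0.05 at 1%, which is worth keeping in mind when screening variants that are mostly neutral.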
topoSNP produces an interactive visualization of disease and non-disease associated non-synonymous single nucleotide polymorphisms (nsSNPs) and displays geometric and relative entropy calculations.
Reference: Stitziel et al. topoSNP: a topographic database of non-synonymous single nucleotide polymorphisms with and without known disease association. Nucleic Acids Res., 2004, 32, Database issue, D520-D522. doi:10.1093/nar/gkh104
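Relative entropy, one of the quantities topoSNP displays, measures how far the amino acid distribution at a position departs from a background distribution; conserved positions score high. A minimal sketch of relative entropy (Kullback-Leibler divergence) for one alignment column, assuming a uniform background, an optional pseudocount and natural-log units; topoSNP's exact formulation may differ, and the function name is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def relative_entropy(column: str, background=None, pseudocount: float = 0.0) -> float:
    """Sum over amino acids of q * ln(q / b), where q is the (optionally smoothed)
    column frequency and b the background frequency (uniform by default)."""
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    entropy = 0.0
    for aa in AMINO_ACIDS:
        q = (counts[aa] + pseudocount) / total
        if q > 0:  # residues absent from the column contribute nothing
            entropy += q * math.log(q / background[aa])
    return entropy


print(relative_entropy("WWWWWWWW"))              # fully conserved column: ln(20)
print(relative_entropy("ACDEFGHIKLMNPQRSTVWY"))  # matches the uniform background: 0.0
```

With a uniform background, a fully conserved column reaches the maximum of ln(20), and a column that looks exactly like the background scores zero.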
FitSNP is a database of highly differentially expressed genes, which are more likely to carry variants associated with disease.
- Gene Prospector
Gene Prospector is a bioinformatics tool designed to sort, rank and display information about genes in relation to human diseases, risk factors and other phenotypes. Links are provided to evidence from the published literature and to other online data sources. Search terms can include diseases, risk factors and phenotypes.
Diseasome is an integrated database of genes, genetic variation and diseases.
- Janita Thusberg and Mauno Vihinen. Pathogenic or Not? And If So, Then How? Studying the Effects of Missense Mutations Using Bioinformatics Methods. Hum. Mutat., 2009, 30, 5, 703-714. doi:10.1002/humu.20938