What is dbNSFP?
dbNSFP is a database developed for functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) in the human genome. The current version is based on GENCODE release 46 / Ensembl version 112 and includes a total of 83,572,434 potential nsSNVs and splice-site SNVs (ssSNVs).
The database compiles prediction scores from 34 algorithms, including:
SIFT, SIFT4G, PROVEAN, Polyphen2-HDIV, Polyphen2-HVAR, MutationTaster 2021, MutationAssessor, FATHMM-XF coding, CADD, VEST4, DANN, MetaSVM, MetaLR, MetaRNN, Eigen, Eigen-PC, M-CAP, REVEL, MutPred, MVP, gMVP, MPC, PrimateAI, DEOGEN2, ALoFT, BayesDel, ClinPred, LIST-S2, VARITY, ESM1b, AlphaMissense, PHACTboost, MutFormer, and MutScore.
It also provides 9 conservation scores:
PhyloP (3 versions), phastCons (3 versions), GERP++, GERP_91_mammals, and bStatistic.
It further reports observed allele frequencies from:
The 1000 Genomes Project Phase 3, gnomAD v2.1.1 (including the non-neuro, non-cancer, and control sample subsets), gnomAD v4.1, TOPMed freeze 8, and ALFA.
Moreover, dbNSFP provides related gene information, including:
Various gene IDs from different databases.
Functional descriptions, expression data, and disease associations of genes from various sources, including The Human Protein Atlas, UniProt, OMIM, ConsensusPathDB, KEGG Pathway, The Human Phenotype Ontology, the GWAS Catalog, and ClinGen Dosage Sensitivity.
For a full list of the data sources, please refer to the current version of the README file.
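To illustrate how per-variant scores like those listed above might be consumed, here is a minimal Python sketch. It assumes a tab-delimited table with a header line, "." as the placeholder for a missing value, and ";" separating multiple values within one field. These conventions, the column names (SIFT_score, REVEL_score), and the sample rows are illustrative assumptions, not taken from the text above; check the README of the version you download for the actual layout.

```python
import csv
import io

# Hypothetical excerpt: the column names and all values below are made up
# for illustration and are not real dbNSFP records.
SAMPLE = (
    "#chr\tpos\tref\talt\tSIFT_score\tREVEL_score\n"
    "1\t12345\tA\tG\t0.02;0.10;.\t0.71\n"
    "1\t12346\tC\tT\t.\t.\n"
)


def parse_scores(field):
    """Split a ';'-separated score field and drop '.' (missing) entries."""
    return [float(value) for value in field.split(";") if value not in (".", "")]


def read_variants(handle, score_columns):
    """Yield one dict per variant with the requested score columns parsed."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        record = {
            "chr": row["#chr"],
            "pos": int(row["pos"]),
            "ref": row["ref"],
            "alt": row["alt"],
        }
        for column in score_columns:
            record[column] = parse_scores(row[column])
        yield record


variants = list(read_variants(io.StringIO(SAMPLE), ["SIFT_score", "REVEL_score"]))
for variant in variants:
    print(variant)
```

Keeping every value in a list, rather than collapsing to a single number, leaves the choice of how to summarize multiple values per variant (minimum, maximum, or a specific transcript) to the caller.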
We welcome developers of functional prediction methods to provide their predictions and scores to the database. Please contact us at collaboration@dbnsfp.org.