What is dbNSFP?
dbNSFP is a database developed for functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) in the human genome. The current version is based on GENCODE release 47 (Ensembl version 113) and includes a total of 83,681,419 potential nsSNVs and splicing-site SNVs (ssSNVs) across all known protein-coding genes in the human genome.
The database compiles prediction scores from 34 algorithms:
SIFT, SIFT4G, PROVEAN, Polyphen2-HDIV, Polyphen2-HVAR, MutationTaster 2021, MutationAssessor, FATHMM-XF coding, CADD, VEST4, DANN, MetaSVM, MetaLR, MetaRNN, Eigen, Eigen-PC, M-CAP, REVEL, MutPred, MVP, gMVP, MPC, PrimateAI, DEOGEN2, ALoFT, BayesDel, ClinPred, LIST-S2, VARITY, ESM1b, AlphaMissense, PHACTboost, MutFormer, and MutScore.
It also provides 9 conservation scores:
PhyloP (3 versions), phastCons (3 versions), GERP++, GERP_91_mammals, and bStatistic.
It additionally reports observed allele frequencies from:
The 1000 Genomes Project, gnomAD v2.1.1 (including non-neuro, non-cancer and control sample subsets) and v4.1, TOPMed, All of Us, RGC Million Exome and ALFA (aggregated from dbGaP and dbSNP).
Moreover, dbNSFP provides related gene information, including:
Various gene IDs from different databases.
Function descriptions, expression data, and associated diseases of genes from various sources, including The Human Protein Atlas, UniProt, OMIM, ConsensusPathDB, KEGG pathway, The Human Phenotype Ontology, GWAS Catalog, and ClinGen Dosage Sensitivity.
For a full list of the data included in dbNSFP, please check the README files on the Releases page.
We welcome developers of functional prediction methods to provide their predictions and scores to the database. Please contact us at collaboration@dbnsfp.org.