LSDB - Controlled vocabulary terms
| Contributed by: | Juha Muilu |
| Originally posted: | 8th January 2010: 12:01 pm |
| Last updated: | 19th July 2011: 11:36 am |
| Short URL: | http://gen2phen.org/node/11356 |
Note: This is a working document. Final information will be collected into the VarioML wiki pages.
Misc
Evidence codes
Mutation events
Inheritance pattern (of phenotype; note: see also the inheritance ontology: http://www.human-phenotype-ontology.org/index.php/hpo_docu.html)
- 'familial'
- 'familial, consanguineous parents'
- 'familial, autosomal dominant'
- 'familial, autosomal recessive'
- 'familial, X-linked'
- 'sporadic'
- 'sporadic, consanguineous parents'
- 'sporadic, consanguineous parents (1st degree)'
- 'sporadic, consanguineous parents (2nd degree)'
- 'sporadic, consanguineous parents (3rd degree)'
- 'sporadic, non-consanguineous parents'
- 'sporadic, consanguinity parents?'
- 'sporadic? (parents not tested)'
Genetic source
- 'de novo'
- 'de novo, maternal chromosome'
- 'de novo, paternal chromosome'
- 'de novo, from either parent'
- 'inherited'
- 'inherited, maternal chromosome'
- 'inherited, paternal chromosome'
- 'inherited, from either parent'
- 'somatic'
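A minimal sketch of how a list like the genetic source terms above could be enforced as a controlled vocabulary. The names `GENETIC_SOURCE` and `validate_term` are hypothetical and not part of VarioML or any LSDB software; the terms are copied from the list above.

```python
# Hypothetical sketch: terms copied from the "Genetic source" list above.
GENETIC_SOURCE = frozenset({
    "de novo",
    "de novo, maternal chromosome",
    "de novo, paternal chromosome",
    "de novo, from either parent",
    "inherited",
    "inherited, maternal chromosome",
    "inherited, paternal chromosome",
    "inherited, from either parent",
    "somatic",
})


def validate_term(value: str, vocabulary: frozenset) -> str:
    """Return the whitespace-normalised term, or raise ValueError if unknown."""
    term = " ".join(value.split())  # collapse stray whitespace
    if term not in vocabulary:
        raise ValueError(f"{value!r} is not a controlled term")
    return term
```

The same function could be reused for the inheritance pattern list by passing a different set of terms.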
Variant
Consequence
See, for example, the Sequence Ontology: list of variant effect terms
From DMuDB:
Complex frameshift: Frameshift involving insertions and deletions
Exon deletion: Deletion encompassing a whole exon or exons, frameshift status unknown
Exon duplication: Duplication of one or more exons, frameshift status unknown
Frameshift: Deletion or insertion causing reading frame shift
In-frame deletion: Deletion of a whole codon or codons. Can include deletion of one or more exons
In-frame duplication: A duplication that does not change the reading frame. Can include one or more exons
In-frame insertion: An insertion of a whole codon or codons. Can include one or more exons
Intronic variant: A variant in an intron which has not been shown to affect splicing
Missense: Substitution resulting in a change to a different amino acid
Nonsense: Substitution resulting in a change to a stop codon
Out of frame deletion: Deletion of part of a codon or number of codons resulting in a frameshift. Can include one or more exons
Out of frame duplication: A duplication that changes the reading frame. Can include one or more exons
Out of frame insertion: An insertion of part of a codon or number of codons resulting in a frameshift. Can include one or more exons
Silent: A nucleotide change that does not change the amino acid
Splice site variant: A mutation that affects splicing
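The in-frame versus out-of-frame distinction in the DMuDB definitions above comes down to whether the number of coding nucleotides gained or lost is a multiple of three. A minimal sketch, assuming the change lies entirely within coding sequence and ignoring any effect on splicing; the function name `frame_effect` is hypothetical:

```python
def frame_effect(length_nt: int) -> str:
    """Classify a deletion, insertion or duplication by its length in nucleotides.

    Assumption: the whole change is within coding sequence.
    """
    if length_nt <= 0:
        raise ValueError("length must be a positive number of nucleotides")
    # Whole codons (multiples of 3) keep the reading frame; anything else shifts it.
    return "in-frame" if length_nt % 3 == 0 else "out of frame"
```

For example, a 3-nucleotide deletion would be classed as in-frame and a 4-nucleotide deletion as out of frame.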
Number of independent observations of a DNA variant (Frequency in XML)
Example values from the Data sharing between LSDBs paper
found once (should it be "found at least once"? The same question applies to the other terms)
2–10 times
11–99 times
over 100 times
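A minimal sketch of mapping a raw observation count onto the example categories above. The function name `frequency_category` is hypothetical. Note that the listed ranges leave exactly 100 observations uncovered; placing it in "over 100 times" is an assumption made here, not something the source specifies.

```python
def frequency_category(observations: int) -> str:
    """Map a count of independent observations to one of the example terms."""
    if observations < 1:
        raise ValueError("a reported variant has been observed at least once")
    if observations == 1:
        return "found once"
    if observations <= 10:
        return "2–10 times"
    if observations <= 99:
        return "11–99 times"
    # Assumption: exactly 100 is grouped with "over 100 times".
    return "over 100 times"
```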
Origin (note: this field will be replaced by genetic source and inheritance pattern. JM Dec 2010)
Source LOVD
- 'in vitro (cloned)'
- 'familial'
- 'familial, consanguineous parents'
- 'familial, autosomal dominant'
- 'familial, autosomal recessive'
- 'familial, X-linked'
- 'sporadic'
- 'sporadic, consanguineous parents'
- 'sporadic, consanguineous parents (1st degree)'
- 'sporadic, consanguineous parents (2nd degree)'
- 'sporadic, consanguineous parents (3rd degree)'
- 'sporadic, non-consanguineous parents'
- 'sporadic, consanguinity parents?'
- 'sporadic? (parents not tested)'
- 'uniparental disomy'
- 'de novo'
- 'de novo, somatic mosaicism'
- 'de novo, germline mosaicism'
- 'de novo, germline and somatic mosaicism'
- 'de novo, in patient'
- 'de novo, in patient (maternal allele)'
- 'de novo, in patient (paternal allele)'
- 'de novo, in mother'
- 'de novo, in mother (grandmaternal allele)'
- 'de novo, in mother (grandpaternal allele)'
- 'de novo, in father'
- 'de novo, in father (grandmaternal allele)'
- 'de novo, in father (grandpaternal allele)'
- 'uniparental disomy, maternal allele'
- 'uniparental disomy, paternal allele'
Tissue distribution (Note: this will be normalized with other fields. JM May 2011)
- constitutional
- mosaic
- mosaic in germline
Parental origin
Source LOVD
- Parent #1
- Parent #2
- Paternal (inferred)
- Paternal (confirmed)
- Maternal (inferred)
- Maternal (confirmed)
- de novo
- de novo, on paternal allele
- de novo, maternal allele
Pathogenicity
See the paper.
From LOVD:
No known pathogenicity
Probably no pathogenicity
Unknown
Probably pathogenic
Pathogenic
From DMuDB:
Non-Pathogenic
Probably Not Pathogenic
Not Known
Probably Pathogenic
Pathogenic
Unclassified
From CMGS/VKGL paper as by Alamut:
Class 1 – Certainly not pathogenic
Class 2 – Unlikely to be pathogenic but cannot be formally proven
Class 3 – Likely to be pathogenic but cannot be formally proven
Class 4 – Certainly pathogenic
Class 5 – Unknown (not in spec)
Other comments:
Not Known implies that a submitter has given no data on pathogenicity. Unclassified implies that the submitter has specifically indicated that they are unable to classify the pathogenicity of the variant.
Patient
Gender
Source: ISO 5218
0 = not known
1 = male
2 = female
9 = not applicable
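A minimal sketch of the ISO 5218 code-to-label mapping listed above. The names `ISO_5218_GENDER` and `gender_label` are hypothetical.

```python
# ISO 5218 codes paired with the labels listed above.
ISO_5218_GENDER = {
    0: "not known",
    1: "male",
    2: "female",
    9: "not applicable",
}


def gender_label(code: int) -> str:
    """Return the label for an ISO 5218 code, or raise ValueError if invalid."""
    try:
        return ISO_5218_GENDER[code]
    except KeyError:
        raise ValueError(f"{code!r} is not an ISO 5218 code") from None
```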
Geographical region
- Country (ISO 3166 codes)
- dbSNP population classes
- See also the GeoNames database / web services
Ethnicity
Detection Technique
From DMuDB:
- ARMS
- CF20: CF Common Mutation Test
- CF29: Analysis of 29 mutations using the Elucigene CF29 kit
- CSCE: Conformation sensitive capillary electrophoresis
- DGGE: Denaturing gradient gel electrophoresis
- dHPLC: Denaturing high performance liquid chromatography
- Heteroduplex analysis
- Loss of heterozygosity analysis
- Meta-PCR
- MLPA: Multiplex ligation-dependent probe amplification
- MS-PCR: Mutagenically separated PCR
- Multiplex PCR
- Not Known: The information has not been recorded or provided
- Not Specified: Test information cannot be determined
- PCR-PAGE
- PTT: Protein Truncation Test
- RNA: RNA work performed
- Sequencing
- SNPlex: The SNPlex™ Genotyping System from ABI
- SSCP
- SSCP/Heteroduplex

Comments
#1
LOVD 2.0 handles the pathogenicity as a dual value, one for "submitter's opinion" and one for "curator's opinion". Each one has 5 options:
"-" => No known pathogenicity
"-?" => Probably no pathogenicity
"?" => Unknown
"+?" => Probably pathogenic
"+" => Pathogenic
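A minimal sketch of the dual value described in this comment: one symbol for the submitter's opinion and one for the curator's, each drawn from the same five options. The class and attribute names are hypothetical and do not reflect how LOVD 2.0 stores the value internally.

```python
from dataclasses import dataclass

# The five symbols and their meanings, as listed in the comment above.
LOVD_PATHOGENICITY = {
    "-": "No known pathogenicity",
    "-?": "Probably no pathogenicity",
    "?": "Unknown",
    "+?": "Probably pathogenic",
    "+": "Pathogenic",
}


@dataclass(frozen=True)
class DualPathogenicity:
    submitter: str  # submitter's opinion, one of the five symbols
    curator: str    # curator's opinion, one of the five symbols

    def __post_init__(self):
        # Reject any symbol outside the five listed options.
        for symbol in (self.submitter, self.curator):
            if symbol not in LOVD_PATHOGENICITY:
                raise ValueError(f"{symbol!r} is not a valid pathogenicity symbol")

    def labels(self) -> tuple:
        """Return (submitter label, curator label)."""
        return (LOVD_PATHOGENICITY[self.submitter],
                LOVD_PATHOGENICITY[self.curator])
```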
#2
Thanks Ivo. I updated the wiki accordingly. As a new feature, all terms like these have an optional evidence code, as suggested by Mauno. The code is an ontology term (i.e. it has optional attributes giving the source ontology and the accession number of the term). I am not sure what the evidence terms could be in this case; perhaps some kind of assessment methods?
#3
Hi Juha, evidence for pathogenicity could be a long list of different things. Mostly, wet lab research can confirm pathogenicity by proving loss-of-function or gain-of-function of the mutated protein. But there are also computer-generated predictions, or the "evidence" is a combination of different knowledge in the head of the curator.
#4
Thanks Ivo. The evidence is actually a list. One element can have zero to many evidence codes.
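A minimal sketch of the structure discussed in this thread: a pathogenicity term carrying zero to many evidence codes, each of which is an ontology term with an optional source and accession. All class and field names here are hypothetical and are not taken from the VarioML schema; the evidence term in the usage note is an illustrative placeholder only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvidenceCode:
    term: str
    source: Optional[str] = None     # source ontology (optional)
    accession: Optional[str] = None  # accession number of the term (optional)


@dataclass
class PathogenicityTerm:
    term: str
    # Zero to many evidence codes per element.
    evidence_codes: List[EvidenceCode] = field(default_factory=list)
```

For instance, `PathogenicityTerm("Probably pathogenic")` would be valid with no evidence at all, and a code such as `EvidenceCode("functional assay")` could be appended later.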
#5
I added some examples of the controlled terms in DMuDB. In general these are not supposed to be exhaustive or complete lists, but e.g. new techniques would be added when required.
#6
Hi All
I sent the following in an email some time ago, but none of my concerns seem to have been considered in the latest LSDB vocabulary. Please at least give them some serious consideration/discussion as I fear we may be making some fundamental errors in the current vocabulary that contradict basic genetics knowledge...
PATHOGENICITY
I think this has to be considered on TWO levels:
1) Evidence about the variant IN THE PATIENT
a) being a de novo mutation in a sporadic case argues for pathogenicity (likelihood depends upon how many genes are examined and the completeness of the gene scanning)
b) finding cosegregation of mutation and disease in the patient's family argues for pathogenicity (formally only argues for involvement of the genome region, and not the specific mutation)
2) Evidence about the variant IN GENERAL
a) sometimes there will be accepted 'fact' regarding pathogenicity (e.g. delta F508 in CF)
b) previous reports of being a normal variant (e.g., in dbSNP, LSDBs) would argue against pathogenicity
c) theoretical predictions (e.g., nature of amino-acid or splice site change) can suggest pathogenicity or neutrality
d) functional studies, gene knockouts, animal model data, and so on, can suggest pathogenicity
This split (between considerations of the variant in general, and considerations of the specific occurrence of the variant in the patient) needs to be kept in mind across all aspects of the LSDB record. I'd like to see distinct sections in LSDBs for these two different aspects of a variant. Specifically:
- the 5 categories of pathogenicity are a good starting point, but they could be used to refer to the variant in general or the occurrence of the variant in the patient (some mutations could be pathogenic in some genetic backgrounds and not in others)
- the genetic mechanism (autosomal dominant, autosomal recessive, X-linked, maternally imprinted, paternally imprinted) refers to the variant in general, whereas the zygosity (homozygote, heterozygote, compound heterozygote) refers to the occurrence of the variant in the patient
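A minimal sketch of the two-level split proposed above, with one record for the variant in general and a separate record for its occurrence in a patient. The class and field names are hypothetical and do not correspond to any existing LSDB data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantInGeneral:
    name: str
    pathogenicity: Optional[str] = None      # assessment of the variant in general
    genetic_mechanism: Optional[str] = None  # e.g. "autosomal recessive"


@dataclass
class VariantInPatient:
    variant: VariantInGeneral
    pathogenicity: Optional[str] = None  # may differ from the general assessment
    zygosity: Optional[str] = None       # e.g. "homozygote", "heterozygote"
```

Keeping pathogenicity on both records allows the per-patient assessment to differ from the general one, which covers the case of a mutation that is pathogenic in some genetic backgrounds and not in others.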
INHERITANCE
I feel your list of inheritance terms actually covers a mixed bunch of different things...
- familial & sporadic (with consanguinity sub-categories): refer to the disease (not the variant)
- paternally inherited, maternally inherited, de novo mutation (from father), de novo mutation (from mother), consanguineous origin, mosaicism (germline and somatic), uniparental disomy, etc: refer to mode of inheritance (and this list can easily be made complete by some googling ...not forgetting mtDNA)
I definitely feel that all inheritance options should be made available via a controlled vocabulary list, and there may even need to be an option to select >1 item from the list.
OTHER
Regarding the categories "Country of origin" and "Ethnicity", we first need to be very clear about what we are trying to achieve. I assume you are trying to ensure that the database eventually allows one to ask about the genetic history of the catalogued inherited variants. If so, then you would need fields to capture one or both of these two bits of information for many/all of the patient's ancestors. It does not seem logical to try to capture this complexity by having one or two fields that refer to the 'origin/ethnicity of the variant'. To eventually query the genetic history of an inherited variant, one would integrate the recorded data on a) the patient's ancestors, and b) the mode of inheritance.
In general, underlying all my comments is a sense that there needs to be crystal clear demarcation between different aspects of the data (patient data, family data, disease data, pathogenicity data, variant data, inheritance data, zygosity data, genetic mechanism data, method data), and with the solid data model you have I assume this should be possible.