LSDB minimal requirements (D3.4), compared to LOVD and the LSDB XML format
See the current XML format: http://www.gen2phen.org/wiki/lsdb-xml-data-format and workshop notes: http://askja.gene.le.ac.uk/drupal5/content/lsdb-minimal-requirements
This list is a mapping between the LSDB minimal data requirements as listed in
deliverable 3.4, LOVD 2.0/3.0 and the XML format currently under development (see link
above). As the XML format will be developed further, this list will be updated.
Last update: 2010-01-21.
| Name | D3.4 says | Availability in LOVD 2.0 | Availability in LOVD 3.0 (planned) | Cafe Rouge | Element or attribute currently in XML format |
| Variant/Exon | Recommended | Standard, may be removed | Standard, may be removed | Recommended | variant/exon |
| Variant/DNA_genomic | Obligatory | Not available, but genomic position can automatically be generated | Always available (generated) | (Obligatory)* | variant/name |
| Variant/DNA_coding | Recommended | Always available | Always available | (Obligatory)* | variant/aliases/variant/name |
| Variant/RNA | Obligatory | Always available | Always available | Recommended (might not be available) | variant/seq_change/variant/name |
| Variant/Protein | Obligatory | Always available | Always available | Recommended (might not be available) | variant/seq_change/variant/name |
| Variant/DBID | Obligatory | Always available | Always available | Should be generated by LSDB | variant/id |
| Variant/Reference | Obligatory | Not standard LOVD; can be derived from Patient/Reference | Undecided | Obligatory | variant/ref_seq |
| Variant/DNA_published | Recommended | Sometimes available | Sometimes available | Recommended | variant/aliases/variant/name |
| Variant/Detection/Template | Obligatory | Not standard LOVD; can be derived from Patient/Detection/Template | Always available; as Screening/Template | Obligatory | variant/variant_detection/template |
| Variant/Detection/Technique | Obligatory | Not standard LOVD; can be derived from Patient/Detection/Technique | Always available; as Screening/Technique | Obligatory | variant/variant_detection/technique |
| Variant/DNA_remark | Recommended | Compatible with Variant/Remarks which is sometimes available | Sometimes available; as Variant/Remarks | Recommended | variant/comment |
| Variant/Frequency | Recommended | Standard, may be removed | Standard, may be removed | Recommended | variant/frequency |
| Variant/Origin | Recommended | Not standard LOVD, can only partially be derived from other (optional) columns | Undecided | Recommended | variant/origin |
| Variant/Restriction_site | Optional | Standard, may be removed | Standard, may be removed | Optional | variant/restriction_site |
| Variant/Allele | Recommended | Always available | Undecided | Recommended | variant/parental_origin |
| Variant/Pathogenicity | Recommended | Always available | Undecided | Recommended | variant/pathogenicity |
| Patient/Patient_ID | Obligatory | Non-public information | Non-public information | Obligatory | patient/original_id |
| Patient/Phenotype/Disease | Obligatory | Always available | Always available | Obligatory | patient/phenotype |
| Patient/Remarks | Recommended | Standard, may be removed | Standard, may be removed | Recommended | patient/comment |
| Patient/Origin/Geographic | Recommended | Sometimes available | Sometimes available | Recommended | patient/population (type="region") |
| Patient/Origin/Ethnic | Recommended | Sometimes available | Sometimes available | Recommended | patient/population (type="ethnic") |
| Patient/Gender | Recommended | Sometimes available | Sometimes available | Recommended | patient/gender |
| ID_submitterid | Obligatory | Sometimes available (field can be empty, which means the curator is the submitter) | Always available | Obligatory | source/id |
| Variant/HGNC gene Symbol | Obligatory | Obligatory | variant/gene/accession and source="HGNC" | ||
| Variant/Sharing policy (public/private) | Obligatory | variant/sharing_policy (since release 1.4) | |||
| Variant/Use permission (default Creative Commons 0) | Obligatory | variant/use_permission (since release 1.4) | |||
| Legend | |||||
| Always available: Needs modification of LOVD to allow removal | |||||
| Standard, may be removed: Is enabled by default but users are allowed to remove these columns | |||||
| Sometimes available: Is available in LOVD but not enabled by default; users can activate these columns | |||||
| Not standard LOVD: Some LOVD's (especially Leiden-based) have these columns | |||||
| Undecided: May be same or similar as in LOVD 2.0, but we haven't decided on the exact implementation yet. | |||||
| (Obligatory)*: One of these fields needs to be present. | |||||
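As a rough illustration of how the mapping in the table could be used programmatically, the sketch below encodes a few rows as a Python dictionary. The names `D34_TO_XML`, `xml_path_for` and `obligatory_fields` are hypothetical and not part of LOVD or the XML format. Only a subset of rows is shown, with the requirement taken from the Cafe Rouge column and the path from the last column.

```python
# Hypothetical lookup table: D3.4 field name -> (Cafe Rouge requirement, XML path).
# Values are copied from the table above; only a subset of rows is included.
D34_TO_XML = {
    "Variant/Exon": ("Recommended", "variant/exon"),
    "Variant/DBID": ("Should be generated by LSDB", "variant/id"),
    "Variant/Reference": ("Obligatory", "variant/ref_seq"),
    "Variant/Detection/Template": ("Obligatory", "variant/variant_detection/template"),
    "Variant/Detection/Technique": ("Obligatory", "variant/variant_detection/technique"),
    "Patient/Patient_ID": ("Obligatory", "patient/original_id"),
    "Patient/Phenotype/Disease": ("Obligatory", "patient/phenotype"),
    "Patient/Gender": ("Recommended", "patient/gender"),
}


def xml_path_for(field_name):
    """Return the XML path for a D3.4 field, or None if it is not in the subset."""
    entry = D34_TO_XML.get(field_name)
    return entry[1] if entry else None


def obligatory_fields():
    """List the fields in the subset that the Cafe Rouge column marks as Obligatory."""
    return sorted(name for name, (req, _) in D34_TO_XML.items() if req == "Obligatory")
```

Keeping the requirement level next to the path means a submission tool could derive its checks from the same table it uses for export, but that is a design assumption rather than something the table prescribes.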

Comments
Hi,
would it be possible to add a column for Cafe Rouge? It would allow us to reach a consensus on what is obligatory and recommended for this use case and progress on a functioning implementation.
I would see following:
Variant/Exon Recommended
Variant/DNA_genomic Recommended *can be generated by LSDB*
Variant/DNA_coding Recommended
Variant/RNA Recommended *might not be available*
Variant/Protein Recommended *might not be available*
Variant/DBID *Should be generated by LSDB*
Variant/Reference Obligatory
Variant/DNA_published Recommended
Variant/Detection/Template Obligatory
Variant/Detection/Technique Obligatory
Variant/DNA_remark Recommended
Variant/Frequency Recommended
Variant/Origin Recommended
Variant/Restriction_site Optional
Variant/Allele Recommended
Variant/Pathogenicity Recommended
Patient/Patient_ID Obligatory
Patient/Phenotype/Disease Obligatory
Patient/Remarks Recommended
Patient/Origin/Geographic Recommended
Patient/Origin/Ethnic Recommended
Patient/Gender Recommended
ID_submitterid Obligatory
Depending on the submission software at least one of:
Variant/DNA_genomic
Variant/DNA_coding
Variant/RNA
or Variant/Protein
should be included in the submission, what do you think?
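The "at least one of" rule proposed above could be checked along these lines. This is only a sketch: the field names come from the list above, but representing a submission as a plain dictionary and the function name `has_variant_description` are assumptions.

```python
# Field names taken from the list above; the record layout (a plain dict of
# strings) is an assumption made for this sketch.
NAME_FIELDS = (
    "Variant/DNA_genomic",
    "Variant/DNA_coding",
    "Variant/RNA",
    "Variant/Protein",
)


def has_variant_description(record):
    """True if at least one of the four description fields is present and non-empty."""
    return any((record.get(field) or "").strip() for field in NAME_FIELDS)
```

For example, `has_variant_description({"Variant/DNA_coding": "c.1234C>G"})` is true, while a record carrying only `Variant/Exon` is rejected.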
Hi David, all of these attributes are included, if I have not forgotten something by mistake. We are happy to add more if needed. Do you have more attributes in your database? Please do not hesitate to include those.
David, I have added a column in this table using the information you provided, but it seems the end of the table is now missing because it's getting too wide, so the readability of the table is not that great anymore. I will see what happens if I make the font smaller.
I personally would frown if a laboratory only tries to detect mutations on RNA or protein level and not on DNA level.
Also in LOVD, DNA is absolutely mandatory. If really needed, the HGVS schema can show the variant name is just predicted, like: c.(1234C>G)?
So I would say: include DNA (whichever one), RNA and Protein in the submission.
Hi Juha: we do not want to add more than there is already in LOVD, the idea behind adding a column was to clearly show what the Cafe Rouge platform is expecting (obligatory and recommended)
Hi Ivo: thanks for adding the column. I agree on the obligatory part for at least one of the DNA (either c. or g.) (is there a way to describe this in the table?)
For the protein: is it always possible to predict the protein change? What happens if the breakpoints are not clearly identified in rearrangements indels...?
Hello David! The table is used also for the XML format which goes beyond the Cafe Rouge. For the purposes it is good to add as much as possible, or at least to know what kind of data there are. It would help us to evaluate the format and possibly add new elements.
We would like to use e.g. the variation part in different contexts. For example, the format now works for national mutation databases as well (or at least for the findis), because I added a place for the gene name (database cross reference).
Other new features are: evidence_code, asked for by Mauno, and the source element, which gives information on data sources. I have committed the new version to the svn http://www.gen2phen.org/post/lsdb-xml-schema
I have added missing bits into the table.
Hi Juha,
agreed, we should add as much as possible to the format (but not as mandatory): the Cafe should act as an intermediate between as many agents as possible. I am just trying to reach a consensus on a minimal list of mandatory elements that diagnostic labs would submit to the LSDB 'world' Johan is working on creating.
Excellent! Good to get the info from the diagnostic lab side as well! We will keep most of the attributes optional.
Hi David: I've tried describing your suggestion in the table, using (Obligatory)* - I hope it's clear this way.
When neither RNA nor Protein has been analyzed, and the exact breakpoints are unknown (like almost all entries in the DMD whole-exon changes database), Johan describes them like: "p.(fsX)" or even "p.(?)". It may not seem much, but it's better than no value, because the values I just mentioned may only indicate a prediction, but at least they show the Protein has not been analyzed.
Hi Juha: Thanks for your additions to the table. I have one question though: "Patient/Patient_ID" now is the "variant/local_id" XML element. Shouldn't that be in the patient element?
Hi Ivo,
you are right for p.: it's good to confirm that the protein has not been analyzed using "p.(?)"
If we all agree here on the 'Cafe Rouge' column content as it is right now, we should freeze it by end next week and get the other partners to agree to it as well through the science mailing list.
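The parenthesised notation discussed above (c.(1234C>G), p.(fsX), p.(?)) can be recognised mechanically. The sketch below assumes that a description counts as "predicted" when everything after the one-letter prefix and the dot is wrapped in parentheses. That is a simplification based only on the examples in this thread, not a full HGVS parser, and `is_predicted` is a hypothetical helper.

```python
import re

# Assumption: a description is "predicted" when the whole part after the
# prefix (c., g., r. or p.) is enclosed in parentheses, as in the examples
# quoted in this thread.
PREDICTED = re.compile(r"^[cgrp]\.\(.+\)$")


def is_predicted(description):
    """Return True for descriptions such as 'p.(?)' or 'c.(1234C>G)'."""
    return bool(PREDICTED.match(description.strip()))
```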
Thanks and fixed!
Hi All,
What's the intention with respect to the obligatory fields Variant/Detection/Template and Variant/Detection/Technique? In some instances, splicing defects may be apparent after analysis of mRNA that has been reverse transcribed, PCR amplified and subjected to agarose gel electrophoresis. However, the underlying variant leading to altered splicing is not found until genomic DNA is analysed by PCR amplification and DNA sequencing. In such a case, is the template mRNA, genomic DNA, or both? Also, there is no single technique that has been used. Does the schema allow for this?
Hello Raymond. Thanks for the question. The format should be able to handle that, if I understood you correctly, because the detection information can be added into all sequence levels. One thing we could add more is reference to detection protocol details, if that is needed.
Any chance we could color code this mapping to show required and optional and no-option field, and also to make conflicts apparent?
Also, the CaFE RouGE model has been updated recently, but not sure if that is represented in the sheet (Owen?)
* Keep just one 'Variant' class but add/adapt the following new attributes:
This will discriminate between these two forms of variant
It will also need a self-recursive relationship to capture allelic variant to genotypic variant relationships
- Add 'DiploidCount' which can have any numeric value (not just integers!)
For allelic variants, value 1 would equate to heterozygosity, value 2 to homozygosity, value 3 to trisomy, etc, and value 1.5 to a situation in a cancer sample where half the cells have lost one allele [so we would not put values like heterozygosity in the 'Origin' field].
For genotypic variants this same field would provide a way to capture the count of copy number variant.
- Rename 'Origin' to 'Genetic Origin' for values such as 'Unknown', 'de novo (certain)', 'de novo (inferred)', 'from mother (certain)', 'from mother (inferred)', 'from father (certain)', 'from father (inferred)', 'from either parent' ...to precisely capture the genetic origin [and nothing more!]
* Have a distinct 'Pathogenicity' (rather than having this as an attribute of Variant), joined to Variant by a many-many relationship. It should have at least these attributes:
- 'Inferential Scope' to specify what set of individuals it refers to.
Values should include terms such as 'Individual', 'Family', 'Population', 'Population XYZ', 'Ethnic group', 'Ethnic group XYZ'
This should also be a Required field, as it is key for interpreting and integrating pathogenicity statements
* Have an 'Observation_Target' superclass, with subclasses 'Individual' and 'Panel', with a many-many association between these two. Obviously, there is also a need for a many-many relationship between Observation_Target and Variant
THE ONLY PROBLEM I can see with the above is how one then makes it clear which Observation_Target a particular Pathogenicity entry refers to, in situations where this connection needs to be recorded. The simplest solution would be to have an association link between Observation_Target and Pathogenicity. Alternative solutions can be imagined, but they are all far more complicated and so I won't go into them here.
XML STRUCTURE
-------------
I hope this takes us forward a little :-)
Tony
>> "Genotypic"
>> This will discriminate between these two forms of variant
>> It will also need a self-recursive relationship to capture allelic
>> variant to genotypic variant relationships
>
> What are these different two types? Do you have some examples?
>
...At a position where the mutant/normal alternatives are 'T/C', the 'T' alternative would be an 'allelic' variant. Furthermore, we have to conceptually discriminate between the 'T' as an allele in general (which will have certain features, such as frequency, and pathogenicity on average in the population), and a specific instance of the 'T' in an individual (which will have other features, such as its zygosity and its pathogenicity in that individual).
>> - Add 'DiploidCount' which can have any numeric value (not just
>> integers!)
>> For allelic variants, value 1 would equate to heterozygosity, value 2
>> to homozygosity, value 3 to trisomy, etc, and value 1.5 to a situation
>> in a cancer sample where half the cells have lost one allele [so we
>> would not put values like heterozygosity in the 'Origin' field].
>> For genotypic variants this same field would provide a way to capture
>> the count of copy number variant.
>>
>> - Rename 'Origin' to 'Genetic Origin' for values such as 'Unknown',
>> 'de novo (certain)', 'de novo (inferred)', 'from mother (certain)',
>> 'from mother (inferred)', 'from father (certain)', 'from father
>> (inferred)', 'from either parent' ...to precisely capture the genetic
>> origin [and nothing more!]
>
> What do you put in the origin field for a homozygous mutation? And how
> would you store a mutation that was inherited from the father, but is
> de novo on the chromosome that comes from the mother? I'm not in favor
> of grouping homozygous mutations; they have two different sources so my
> gut tells me they should be stored as two separate mutations.
>
...This question illustrates the need for a clear distinction between allelic variants in general, allelic variants in individuals, genotypic variants in general, and genotypic variants in individuals. For your specific example, you're talking about a use case that requires patient specific data recording (rather than population level), and so there would be one variant entry, as follows;
'AllelicOrGenotypic' = "Allelic"
'DiploidCount' = 2.0
'Genetic Origin' = "de novo (certain)" and "from father (certain)"
'AllelicOrGenotypic' = "Allelic"
'DiploidCount' = 1.0
'Genetic Origin' = "de novo (certain)"
'AllelicOrGenotypic' = "Allelic"
'DiploidCount' = 1.0
'Genetic Origin' = "from father (certain)"
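The attribute values listed above can be pictured as small records. The sketch below is hypothetical: the attribute names follow the thread, but the `VariantEntry` class and the `zygosity_label` helper are made up for illustration and are not part of the CaFE RouGE model. The labels for 1, 2 and 3 are the ones given earlier in the thread for allelic variants; any other value, such as 1.5, is reported as-is.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VariantEntry:
    # Attribute names follow the thread; the Python class itself is hypothetical.
    allelic_or_genotypic: str        # "Allelic" or "Genotypic"
    diploid_count: float             # any numeric value, not just integers
    genetic_origin: Tuple[str, ...]  # one or more origin statements


def zygosity_label(entry):
    """Translate DiploidCount into the labels given in the thread."""
    if entry.allelic_or_genotypic != "Allelic":
        # For genotypic variants the field captures the copy number count.
        return "copy number %g" % entry.diploid_count
    labels = {1.0: "heterozygous", 2.0: "homozygous", 3.0: "trisomy"}
    return labels.get(entry.diploid_count, "mosaic or other (%g)" % entry.diploid_count)


# The single combined entry from the example above.
combined = VariantEntry("Allelic", 2.0, ("de novo (certain)", "from father (certain)"))
```

Storing `genetic_origin` as a tuple is one way to let a single homozygous entry carry two origin statements. Splitting it into two entries with `diploid_count` 1.0 each works with the same class.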
>> * Have a distinct 'Pathogenicity' (rather than having this as an
>> attribute of Variant), joined to Variant by a many-many relationship.
>> It should have at least these attributes:
>>
>> - 'Inferential Scope' to specify what set of individuals it refers to.
>> Values should include terms such as 'Individual', 'Family',
>> 'Population', 'Population XYZ', 'Ethnic group', 'Ethnic group XYZ'
>> This should also be a Required field, as it is key for interpreting
>> and integrating pathogenicity statements
>
> This really confuses me... If I understand correctly, this will lose all
> information on the variants separately? So variants are linked to a
> Pathogenicity statement that refers to an individual or even more
> abstract, a population? But, a pathogenicity class linked to an
> individual sounds like phenotype. I understand that there are issues
> with the current setup (two pathogenic variants can be non-pathogenic
> when combined, dominant/recessive, imprinting, etc), but I do believe we
> need to keep the pathogenicity info per variant as well.
> If this is how I understand it is, LOVD will not be able to generate
> these values.
>
...I am glad you realise the current system has major problems. But saying that "we need to keep the pathogenicity info per variant" reflects a lack of consideration of the above issues regarding what a 'variant' actually means. This is all completely resolved by pulling 'pathogenicity' into its own class, with an 'Inferential Scope' attribute. E.g.,
'Inferential Scope' might refer to "all populations", or "Africans", or "Ashkenazi Jews" etc, depending on what the evidence was. There may be several pieces of pathogenicity evidence, potentially based upon different ontologies, and so we'd need a separate pathogenicity class to capture all of these.
'Inferential Scope' might refer to "the patient", or "the patient's family", depending on what the evidence was. Again, there may be several pieces of pathogenicity evidence, potentially based upon different ontologies, and so we'd need a separate pathogenicity class to capture all of these.
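A minimal sketch of a separate pathogenicity class with a required 'Inferential Scope' might look as follows. The class and attribute names in Python, and the example statements "pathogenic" and "non-pathogenic", are assumptions for illustration only. The point is that one variant can carry several statements, each with its own scope.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pathogenicity:
    statement: str          # hypothetical free-text or ontology term
    inferential_scope: str  # required, e.g. "Individual", "Family", "Population XYZ"

    def __post_init__(self):
        # The thread asks for this to be a Required field.
        if not self.inferential_scope.strip():
            raise ValueError("'Inferential Scope' is a required field")


@dataclass
class Variant:
    name: str
    # Several statements per variant; the same statement object may also be
    # shared by several variants, which gives the many-many relationship.
    pathogenicity: List[Pathogenicity] = field(default_factory=list)


variant = Variant("c.1234C>G")
variant.pathogenicity.append(Pathogenicity("pathogenic", "Population"))
variant.pathogenicity.append(Pathogenicity("non-pathogenic", "Individual"))
```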
>> * Have an 'Observation_Target' superclass, with subclasses
>> 'Individual' and 'Panel', with a many-many association between these
>> two. Obviously, there is also a need for a many-many relationship
>> between Observation_Target and Variant
>
...These many-many relationships are surely needed? An individual may be part of several panels (families, populations, groups affected by disease, or whatever 'panel' clustering might be needed for a certain use case), and a panel will have many individuals. Equally, each target (e.g., patient) may have many variants, and each allelic or genotypic variant in general might be found in many individuals.
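The individual/panel association described above can be sketched as follows. The class names mirror the proposal, but the attribute names (`variants`, `panels`, `members`) and the `add` method are hypothetical.

```python
class ObservationTarget:
    def __init__(self, label):
        self.label = label
        # Many-many link to variants; plain variant names are used here.
        self.variants = set()


class Individual(ObservationTarget):
    def __init__(self, label):
        super().__init__(label)
        self.panels = set()


class Panel(ObservationTarget):
    def __init__(self, label):
        super().__init__(label)
        self.members = set()

    def add(self, individual):
        # Record the association on both sides so it can be navigated either way.
        self.members.add(individual)
        individual.panels.add(self)


patient = Individual("patient 1")
family = Panel("family A")
population = Panel("population XYZ")
family.add(patient)
population.add(patient)
```

Here one individual belongs to two panels, and each panel can hold many individuals, which is the many-many case argued for above.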
> How many times will a variant, with all information associated to it,
> actually be referenced throughout several individuals? These
> many-to-many relationships in an XML file make the file quite
> complicated. I've never seen it before either; data always get repeated.
> Simple example:
>
> <entry id="001">
> <created_by>
> <id>1</id>
> <name>Ivo Fokkema</name>
> </created_by>
> <edited_by>
> <id>1</id>
> <name>Ivo Fokkema</name>
> </edited_by>
> </entry>
> <entry id="002">
> <created_by>
> <id>1</id>
> <name>Ivo Fokkema</name>
> </created_by>
> <edited_by>
> <id>1</id>
> <name>Ivo Fokkema</name>
> </edited_by>
> </entry>
>
> Yes, the user information gets repeated all the time. But introducing
> many-to-many relationships and therefore splitting data in the XML file
> will, in my opinion, make the XML document too complicated. If we're
> connecting data that way, we might as well just send JSON objects back
> and forth and drop the whole XML stuff. Well, probably nobody will agree
> with me there ;)
>
<author>
<id>1</id>
<name>Ivo Fokkema</name>
</author>
<entry id="001">
<created_by>
<id>1</id>
</created_by>
<edited_by>
<id>1</id>
</edited_by>
</entry>
<entry id="002">
<created_by>
<id>1</id>
</created_by>
<edited_by>
<id>1</id>
</edited_by>
</entry>
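To show that the id-reference layout above is not hard to consume, here is a small sketch that resolves the references back to names using Python's standard library. The `<data>` wrapper element is an assumption, added because the fragment has no single root, and the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

# The <data> wrapper is an assumption: the fragment above has no single root element.
DOC = """
<data>
  <author><id>1</id><name>Ivo Fokkema</name></author>
  <entry id="001"><created_by><id>1</id></created_by><edited_by><id>1</id></edited_by></entry>
  <entry id="002"><created_by><id>1</id></created_by><edited_by><id>1</id></edited_by></entry>
</data>
"""

root = ET.fromstring(DOC)

# Index authors by id once, then follow the references from each entry.
authors = {a.findtext("id"): a.findtext("name") for a in root.findall("author")}
created_by = {
    e.get("id"): authors[e.findtext("created_by/id")]
    for e in root.findall("entry")
}
```

The cost of the normalised form is this extra lookup step on the reading side. The benefit is that the author details appear only once in the file.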
Cheers
Tony