LSDB minimal requirements (D3.4), compared to LOVD and the LSDB XML format

See the current XML format: http://www.gen2phen.org/wiki/lsdb-xml-data-format

This list is a mapping between the LSDB minimal data requirements as listed in deliverable 3.4, LOVD 2.0/3.0, and the XML format currently under development (see link above). As the XML format is developed further, this list will be updated.
Last update: 2010-01-21.

 

Name | D3.4 says | Availability in LOVD 2.0 | Availability in LOVD 3.0 (planned) | Cafe Rouge | Element/attribute currently in XML format
Variant/Exon | Recommended | Standard, may be removed | Standard, may be removed | Recommended | variant/exon
Variant/DNA_genomic | Obligatory | Not available, but genomic position can automatically be generated | Always available (generated) | (Obligatory)* | variant/name or variant/aliases/variant/name
Variant/DNA_coding | Recommended | Always available | Always available | (Obligatory)* | variant/name or variant/aliases/variant/name
Variant/RNA | Obligatory | Always available | Always available | Recommended (might not be available) | variant/seq_change/variant/name
Variant/Protein | Obligatory | Always available | Always available | Recommended (might not be available) | variant/seq_change/variant/name
Variant/DBID | Obligatory | Always available | Always available | Should be generated by LSDB | variant[id]
Variant/Reference | Obligatory | Not standard LOVD; can be derived from Patient/Reference | Undecided | Obligatory | variant/publications/publication
Variant/DNA_published | Recommended | Sometimes available | Sometimes available | Recommended | variant/aliases/variant/name
Variant/Detection/Template | Obligatory | Not standard LOVD; can be derived from Patient/Detection/Template | Always available; as Screening/Template | Obligatory | variant/variant_detection/detection[template]
Variant/Detection/Technique | Obligatory | Not standard LOVD; can be derived from Patient/Detection/Technique | Always available; as Screening/Technique | Obligatory | variant/variant_detection/detection[technique]
Variant/DNA_remark | Recommended | Compatible with Variant/Remarks which is sometimes available | Sometimes available; as Variant/Remarks | Recommended | variant/comment
Variant/Frequency | Recommended | Standard, may be removed | Standard, may be removed | Recommended | variant/frequency
Variant/Origin | Recommended | Not standard LOVD, can only partially be derived from other (optional) columns | Undecided | Recommended | variant/parental_origin
Variant/Restriction_site | Optional | Standard, may be removed | Standard, may be removed | Optional | variant/restriction_site
Variant/Allele | Recommended | Always available | Undecided | Recommended | variant/parental_origin
Variant/Pathogenicity | Recommended | Always available | Undecided | Recommended | variant/pathogenicity
Patient/Patient_ID | Obligatory | Non-public information | Non-public information | Obligatory | patient/local_id
Patient/Phenotype/Disease | Obligatory | Always available | Always available | Obligatory | variant/patient/phenotypes/phenotype
Patient/Remarks | Recommended | Standard, may be removed | Standard, may be removed | Recommended | variant/patient/remarks
Patient/Origin/Geographic | Recommended | Sometimes available | Sometimes available | Recommended | variant/patient/geographical_region
Patient/Origin/Ethnic | Recommended | Sometimes available | Sometimes available | Recommended | variant/patient/ethnicity
Patient/Gender | Recommended | Sometimes available | Sometimes available | Recommended | variant/patient/gender
ID_submitterid_ | Obligatory | Sometimes available (field can be empty, which means the curator is the submitter) | Always available | Obligatory | source/submitter_id
      
  Legend 
  Always available: Needs modification of LOVD to allow removal
  Standard, may be removed: Is enabled by default but users are allowed to remove these columns 
  Sometimes available: Is available in LOVD but not enabled by default; users can activate these columns 
  Not standard LOVD: Some LOVD installations (especially Leiden-based ones) have these columns
  Undecided: May be same or similar as in LOVD 2.0, but we haven't decided on the exact implementation yet.
  (Obligatory)*: At least one of these fields needs to be present.
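
As a rough sketch of how the Cafe Rouge column could be checked in software, the Python snippet below treats the entries marked Obligatory as required and the two (Obligatory)* entries as a group from which at least one must be present. The field names are taken from the table. The dict-based record layout and the function name missing_fields are illustrative assumptions and not part of LOVD, Cafe Rouge or the XML format.

```python
# Hypothetical sketch: field names come from the table above; the record
# layout (a plain dict) and the function name are assumptions.

# Fields marked "Obligatory" in the Cafe Rouge column.
OBLIGATORY = [
    "Variant/Reference",
    "Variant/Detection/Template",
    "Variant/Detection/Technique",
    "Patient/Patient_ID",
    "Patient/Phenotype/Disease",
    "ID_submitterid_",
]

# Fields marked "(Obligatory)*": read here as "at least one must be present".
ONE_OF = ["Variant/DNA_genomic", "Variant/DNA_coding"]


def missing_fields(record):
    """Return a list describing what is missing from a submission record."""
    # A field counts as missing when it is absent or empty.
    problems = [name for name in OBLIGATORY if not record.get(name)]
    if not any(record.get(name) for name in ONE_OF):
        problems.append("one of: " + ", ".join(ONE_OF))
    return problems
```

An empty result means the record satisfies the minimal list. Recommended and Optional fields are deliberately not checked.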

 


Comments

Hi,
would it be possible to add a column for Cafe Rouge? It would allow us to reach a consensus on what is obligatory and recommended for this use case and progress on a functioning implementation.
I would see the following:
Variant/Exon Recommended
Variant/DNA_genomic Recommended *can be generated by LSDB*
Variant/DNA_coding Recommended
Variant/RNA Recommended *might not be available*
Variant/Protein Recommended *might not be available*
Variant/DBID *Should be generated by LSDB*
Variant/Reference Obligatory
Variant/DNA_published Recommended
Variant/Detection/Template Obligatory
Variant/Detection/Technique Obligatory
Variant/DNA_remark Recommended
Variant/Frequency Recommended
Variant/Origin Recommended
Variant/Restriction_site Optional
Variant/Allele Recommended
Variant/Pathogenicity Recommended
Patient/Patient_ID Obligatory
Patient/Phenotype/Disease Obligatory
Patient/Remarks Recommended
Patient/Origin/Geographic Recommended
Patient/Origin/Ethnic Recommended
Patient/Gender Recommended
ID_submitterid_ Obligatory

Depending on the submission software, at least one of:
Variant/DNA_genomic
Variant/DNA_coding
Variant/RNA
or Variant/Protein
should be included in the submission. What do you think?

Hi David, all of these attributes are included, unless I have forgotten something by mistake. We are happy to add more if needed. Do you have more attributes in your database? Please do not hesitate to tell us.

David, I have added a column to this table using the information you provided, but the end of the table now seems to be missing because it's getting too wide, so the table is no longer very readable. I will see what happens if I make the font smaller.

I personally would frown if a laboratory only tries to detect mutations at the RNA or protein level and not at the DNA level.
Also in LOVD, DNA is absolutely mandatory. If really needed, the HGVS schema can show that the variant name is just predicted, like: c.(1234C>G)?
So I would say: include DNA (whichever one), RNA and Protein in the submission.

Hi Juha: we do not want to add more than there already is in LOVD; the idea behind adding a column was to clearly show what the Cafe Rouge platform is expecting (obligatory and recommended).

Hi Ivo: thanks for adding the column. I agree on the obligatory part for at least one of the DNA fields (either c. or g.). Is there a way to describe this in the table?
For the protein: is it always possible to predict the protein change? What happens if the breakpoints are not clearly identified in rearrangements, indels...?

Hello David! The table is also used for the XML format, which goes beyond the Cafe Rouge. For that purpose it is good to add as much as possible, or at least to know what kinds of data there are. It would help us to evaluate the format and possibly add new elements.

We would like to use e.g. the variation part in different contexts. For example, the format now works for national mutation databases as well (or at least for the findis), because I added a place for the gene name (database cross reference).

Other new features are: evidence_code, requested by Mauno, and the source element, which gives information on data sources. I have committed the new version to the svn: http://www.gen2phen.org/post/lsdb-xml-schema

I have added missing bits into the table.

Hi Juha,
agreed, we should add as much as possible to the format (but not as mandatory): the Cafe should act as an intermediary between as many agents as possible. I am just trying to reach a consensus on a minimal list of mandatory elements that diagnostic labs would submit to the LSDB 'world' Johan is working on creating.

Excellent! Good to get the info from the diagnostic lab side as well! We will keep most of the attributes optional.

Hi David: I've tried describing your suggestion in the table, using (Obligatory)* - I hope it's clear this way.
When neither RNA nor Protein has been analyzed, and the exact breakpoints are unknown (like almost all entries in the DMD whole-exon changes database), Johan describes them like: "p.(fsX)" or even "p.(?)". It may not seem like much, but it's better than no value, because the values I just mentioned may indicate a prediction but at least show that the Protein has not been analyzed.
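
The convention described above (parentheses mark a predicted protein change, and "p.(?)" stands in when nothing can be predicted) could be sketched in Python as follows. The function name protein_description and its two parameters are hypothetical. This is only an illustration of the convention and not how LOVD implements it.

```python
# Hypothetical helper: choose a value for Variant/Protein.
# "observed" is a description backed by protein analysis; "predicted" is a
# description inferred from the DNA change only. Both names are assumptions.

def protein_description(observed=None, predicted=None):
    """Return a protein-level description, never an empty value."""
    if observed:
        # The protein was actually analyzed: use the description unchanged.
        return observed
    if predicted:
        # Strip the "p." prefix so the parentheses can be added uniformly.
        body = predicted[2:] if predicted.startswith("p.") else predicted
        if body.startswith("(") and body.endswith(")"):
            return "p." + body  # already marked as a prediction
        return "p.(" + body + ")"
    # Nothing analyzed and nothing predictable: say so explicitly.
    return "p.(?)"
```

For example, protein_description(predicted="p.fsX") gives "p.(fsX)", and calling it without arguments gives "p.(?)".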

Hi Juha: Thanks for your additions to the table. I have one question though: "Patient/Patient_ID" now is the "variant/local_id" XML element. Shouldn't that be in the patient element?

Hi Ivo,
you are right for p.: it's good to confirm that the protein has not been analyzed using "p.(?)"
If we all agree here on the 'Cafe Rouge' column content as it is right now, we should freeze it by the end of next week and get the other partners to agree to it as well through the science mailing list.

Thanks and fixed!

Hi All,

What's the intention with respect to the obligatory fields Variant/Detection/Template and Variant/Detection/Technique? In some instances, splicing defects may be apparent after analysis of mRNA that has been reverse transcribed, PCR amplified and subjected to agarose gel electrophoresis. However, the underlying variant leading to altered splicing is not found until genomic DNA is analysed by PCR amplification and DNA sequencing. In such a case, is the template mRNA, is it genomic DNA, or is it both? Also, there is no single technique that has been used. Does the schema allow for this?

Hello Raymond. Thanks for the question. The format should be able to handle that, if I understood you correctly, because the detection information can be added at all sequence levels. One more thing we could add is a reference to detection protocol details, if that is needed.
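
To illustrate what "detection information at all sequence levels" might look like for the splicing case above, here is a small Python sketch that builds such a record with the standard library. It is a guess: the element and attribute names follow the paths listed in the table (variant/seq_change/variant/name and variant/variant_detection/detection[template], [technique]), but the exact nesting, the attribute values and the placeholder variant names are assumptions and have not been checked against the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper: attach one <detection> element to a <variant>,
# creating the <variant_detection> container on first use.
def add_detection(variant, template, technique):
    container = variant.find("variant_detection")
    if container is None:
        container = ET.SubElement(variant, "variant_detection")
    return ET.SubElement(container, "detection",
                         template=template, technique=technique)

# DNA-level variant; "c.?" is a placeholder name, not real data.
dna = ET.Element("variant")
ET.SubElement(dna, "name").text = "c.?"
add_detection(dna, "DNA", "PCR")
add_detection(dna, "DNA", "sequencing")

# RNA-level consequence nested under seq_change; "r.?" is a placeholder.
seq_change = ET.SubElement(dna, "seq_change")
rna = ET.SubElement(seq_change, "variant")
ET.SubElement(rna, "name").text = "r.?"
add_detection(rna, "RNA", "RT-PCR")
add_detection(rna, "RNA", "agarose gel electrophoresis")

# Serialize the record to a string for inspection.
xml_text = ET.tostring(dna, encoding="unicode")
print(xml_text)
```

In this sketch, each sequence level carries its own template, and more than one technique is recorded by repeating the detection element.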
