Exchange format for LSDBs

Hi all,

Ivo and I have been discussing the data format and have arrived at the following proposal:

- Use XML elements rather than attributes, for extensibility etc. (an element can later gain child elements, while an attribute can only ever hold a flat string).
- Add separate elements for variation aliases and sequence changes. The latter is for related, often consequential, sequence and structural changes that are carried with the main variation entry. Aliases are like db_xrefs but can also carry information on the reference sequence and other details.

<variant>
  <name>c.34G>C</name>
  <naming_scheme>HGVS</naming_scheme>
  <ref_seq>XY000000</ref_seq>

  <!-- ... then detection templates and other usual stuff, in the same way -->

  <aliases>

    <variant>
      <name>c.342G>C</name>
      <naming_scheme>HGVS</naming_scheme>
      <ref_seq>LRG000001</ref_seq>
    </variant>

    <variant>
      <name>MUTXYZ</name>
      <naming_scheme>FINDIS</naming_scheme>
      <ref_seq>NM000001</ref_seq>
    </variant>

  </aliases>

  <seq_change>

    <variant>
      <name>g.232323G>T</name>  <!-- here the change is given on genomic DNA -->
      <naming_scheme>HGVS</naming_scheme>
      <ref_seq>AC0001</ref_seq>
    </variant>

    <variant>
      <name>A447RfsX11</name>
      <naming_scheme>FINDIS</naming_scheme>
    </variant>

  </seq_change>

</variant>

============================

Comments:

The main variant element is the reference entry people work with. The nested (sub-)variant elements can carry the same details, but these should be optional.
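
To make "same details, but optional" a bit more concrete, here is a rough XML Schema sketch. It only covers the elements in the example (name, naming_scheme, ref_seq, aliases, seq_change), not the detection templates and other usual stuff. The type names VariantType and VariantListType are made up for illustration. Because nested variants reuse the same type, they allow exactly the same details. Only name and naming_scheme are required, and xs:all lets the children appear in any order.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Top-level entry: the main variant people work with -->
  <xs:element name="variant" type="VariantType"/>

  <!-- One recursive type (hypothetical name) for the main entry and every nested variant -->
  <xs:complexType name="VariantType">
    <!-- xs:all: each child at most once, in any order -->
    <xs:all>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="naming_scheme" type="xs:string"/>
      <xs:element name="ref_seq" type="xs:string" minOccurs="0"/>
      <xs:element name="aliases" type="VariantListType" minOccurs="0"/>
      <xs:element name="seq_change" type="VariantListType" minOccurs="0"/>
    </xs:all>
    <!-- Optional implementation-specific global ID, kept as an attribute -->
    <xs:attribute name="id" type="xs:anyURI" use="optional"/>
  </xs:complexType>

  <!-- Container (hypothetical name) used by both aliases and seq_change -->
  <xs:complexType name="VariantListType">
    <xs:sequence>
      <xs:element name="variant" type="VariantType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```

With this, a nested variant that gives only name and naming_scheme (like A447RfsX11) is valid, while one without a name is not.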

It is not always clear whether related variation information belongs in the aliases or the seq_change section; variations on different splice-variant templates (cDNA templates) are one example. But perhaps this does not matter.
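
As an illustration, a change described on an alternative cDNA template could be filed either way. The name c.451G>C and the accession NM000002 below are made-up placeholders, not real data:

```xml
<!-- Option A: another name for the same variation -->
<aliases>
  <variant>
    <name>c.451G>C</name>          <!-- hypothetical name on the alternative transcript -->
    <naming_scheme>HGVS</naming_scheme>
    <ref_seq>NM000002</ref_seq>    <!-- hypothetical accession -->
  </variant>
</aliases>

<!-- Option B: a related change carried with the main entry -->
<seq_change>
  <variant>
    <name>c.451G>C</name>
    <naming_scheme>HGVS</naming_scheme>
    <ref_seq>NM000002</ref_seq>
  </variant>
</seq_change>
```

The nested variant element is identical in both cases; only its container differs, so a reader can process both the same way.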

Perhaps we should also add a tag stating whether a seq_change has been experimentally verified.
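
One possible shape for that tag, following the elements-not-attributes rule above. The element name verified and its true/false values are only a suggestion, nothing agreed yet:

```xml
<seq_change>
  <variant>
    <name>A447RfsX11</name>
    <naming_scheme>FINDIS</naming_scheme>
    <!-- hypothetical flag: true = shown experimentally, false = predicted only -->
    <verified>false</verified>
  </variant>
</seq_change>
```

If we later write a schema, xs:boolean would accept true/false (and 1/0) for this element.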

Implementation-specific things such as global IDs should go into attributes, if they are needed (?). For example:

<variant id="lsid://findis.org/variant/00001" />
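
In context, the attribute sits on the same variant element that carries the element-based data. A reader that ignores attributes still gets all the variation details:

```xml
<variant id="lsid://findis.org/variant/00001">
  <name>c.34G>C</name>
  <naming_scheme>HGVS</naming_scheme>
  <ref_seq>XY000000</ref_seq>
  <!-- aliases and seq_change follow as in the main example -->
</variant>
```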


Juha
