Development of LOVD RESTful / Atom webservice

(updated 2009-10-22; includes gene listing and some updates to the output format)
(updated 2009-12-27; described new additions and genomic locations)
(updated 2010-02-15; described new additions and world-wide LOVD querying service)
(updated 2010-04-20; changed all URLs to use rest.php instead of rest)

Please note that since I still want to add a few features to this API, its output format is not yet stable.

Since version 2.0-22, released October 5th, LOVD includes a simple webservice that supports basic queries on, or listing of, variant data (not patient data), which makes an overall LOVD querying service possible. One can search on a gene symbol, get the list of available genes in the database, or, on a per-gene basis, list all variants or search for a certain variant or DNA location.

The output it creates is an Atom 1.0 feed with basic variant information in plain text:
    (snippet)

Genes:
    <content type="text">
      id:CRYAA
      entrez_id:1409
      symbol:CRYAA
      name:Crystallin, alpha-A
      chromosome_location:21q22.3
      position_start:chr21:44589141
      position_end:chr21:44592913
      refseq_genomic:NC_000021.8
      refseq_mrna:NM_000394.2
      refseq_build:hg19
    </content>

Variants:
    <content type="text">
      symbol:CRYAA
      id:0000001
      position_mRNA:NM_000394.2:c.27
      position_genomic:chr21:44589236
      Variant/DNA:c.27G>T
      Variant/DBID:CRYAA_00001
    </content>

The id field contains the internal ID of the variant entry. The position is read from the Variant/DNA field and interpreted by a Mutalyzer module, if possible. If the variant cannot be interpreted by Mutalyzer, LOVD tries to isolate the position from the Variant/DNA field by itself. The genomic location will only be available if the gene has been configured properly (i.e. has proper reference sequence information associated) and Mutalyzer could interpret the variant correctly. The Variant/DBID is the field that is actually used to link back to LOVD, since it is shared by other variant entries with the same change at the DNA level. The link to LOVD is included in the Atom <link> element of the entry.
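Since the content is plain text with one key:value pair per line, a client has to split it itself. The following is a minimal Python sketch, not part of LOVD; the function name parse_content is made up for illustration. It assumes that only the first colon on a line separates key and value, because values such as chr21:44589236 contain colons themselves.

```python
def parse_content(text):
    """Parse the plain-text body of a <content> element into a dict.

    Assumption: the first colon on each line separates key and value;
    any further colons belong to the value.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines around the payload
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon at all: not a key:value line
        fields[key] = value
    return fields


sample = """
      symbol:CRYAA
      id:0000001
      position_mRNA:NM_000394.2:c.27
      position_genomic:chr21:44589236
      Variant/DNA:c.27G>T
      Variant/DBID:CRYAA_00001
"""
variant = parse_content(sample)
print(variant["position_genomic"])
```

Because the output format is not yet stable, a client should probably tolerate unknown or missing keys rather than rely on a fixed set.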

 

Full format

The full output of a query returning one variant is:

<?xml version="1.0" encoding="ISO-8859-1"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>
    Results for your query of the CRYAA gene database
  </title>
  <link rel="alternate" type="text/html" href="http://chromium.liacs.nl/LOVDv.2.0-dev/"/>
  <link rel="self" type="application/atom+xml" href="http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA"/>
  <updated>2007-06-21T17:23:00+02:00</updated>
  <id>tag:chromium.liacs.nl,2006-11-21:Chr:LOVDv.2.0-dev/REST_api</id>
  <generator uri="http://www.LOVD.nl/" version="2.0-22">
    Leiden Open Variation Database
  </generator>
  <rights>Copyright (c), the curators of this database</rights>
  <entry xmlns="http://www.w3.org/2005/Atom">
    <title>CRYAA:c.27G>T</title>
    <link rel="alternate" type="text/html" href="http://chromium.liacs.nl/LOVDv.2.0-dev/variants.php?select_db=CRYAA&amp;action=search_unique&amp;search_Variant%2FDBID=CRYAA_00001"/>
    <link rel="self" type="application/atom+xml" href="http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA/0000001"/>
    <id>tag:chromium.liacs.nl,1970-01-01:CRYAA/0000001</id>
    <author>
      <name>Unknown</name>
    </author>
    <published>1970-01-01T00:00:00+01:00</published>
    <updated>1970-01-01T00:00:00+01:00</updated>
    <content type="text">
      symbol:CRYAA
      id:0000001
      position_mRNA:NM_000394.2:c.27
      position_genomic:chr21:44589236
      Variant/DNA:c.27G>T
      Variant/DBID:CRYAA_00001
    </content>
  </entry>
</feed>

Note that the actual variant content is currently in plain text format. Once the XML export format(s) are agreed on, I will implement them as well.
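As an illustration of how a client might read this feed, here is a small Python sketch using only the standard library. It is not part of LOVD; the function name parse_feed and the returned dictionary keys are assumptions made for this example, and the embedded sample is a shortened copy of the output above.

```python
import xml.etree.ElementTree as ET

# All elements in the feed live in the Atom 1.0 namespace.
ATOM = "{http://www.w3.org/2005/Atom}"


def parse_feed(xml_bytes):
    """Return one dict per Atom <entry> found in the given feed."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for entry in root.iter(ATOM + "entry"):
        # Map each link's rel attribute ("alternate", "self") to its href.
        links = {link.get("rel"): link.get("href")
                 for link in entry.findall(ATOM + "link")}
        entries.append({
            "title": entry.findtext(ATOM + "title", default="").strip(),
            "lovd_url": links.get("alternate"),  # link back to LOVD
            "self_url": links.get("self"),       # link to this API entry
            "content": entry.findtext(ATOM + "content", default="").strip(),
        })
    return entries


# Shortened copy of the example output, passed as bytes so that the
# encoding declaration in the XML prolog is honoured by the parser.
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Results for your query of the CRYAA gene database</title>
  <entry xmlns="http://www.w3.org/2005/Atom">
    <title>CRYAA:c.27G>T</title>
    <link rel="alternate" type="text/html" href="http://chromium.liacs.nl/LOVDv.2.0-dev/variants.php?select_db=CRYAA&amp;action=search_unique&amp;search_Variant%2FDBID=CRYAA_00001"/>
    <link rel="self" type="application/atom+xml" href="http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA/0000001"/>
    <content type="text">
      symbol:CRYAA
      id:0000001
    </content>
  </entry>
</feed>
"""

entries = parse_feed(SAMPLE)
print(entries[0]["title"])
print(entries[0]["lovd_url"])
```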

 

Possibilities

Please note that I used "rest.php" in all the URLs here. In most cases "rest" also works, without the PHP extension, but on some servers you may still need the .php suffix, so for clarity I use it throughout.
The webservice currently supports (please note that these links point to a development installation, not an actually maintained database):
Listing of all genes in the database:
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/genes

Searching on the gene symbol (full match only):
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/genes?search_symbol=CRYAA

Showing only one specific gene entry:
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/genes/CRYAA

Searching on the genomic position:
Chromosome only:
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/genes?search_position=chr21
Chromosomal location:
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/genes?search_position=chr21:44589236
Chromosomal range, exact match (only match genes having exactly this range):
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/genes?search_position=chr21:44589141_44592913&position_match=exact
Chromosomal range, exclusive match (only match genes completely within this range):
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/genes?search_position=chr21:44589141_44592913&position_match=exclusive
Chromosomal range, partial match (match any gene overlapping the given region):
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/genes?search_position=chr21:44589141_44592913&position_match=partial

Listing of all variant entries in a certain gene:
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA

Searching on the DNA position:
Coding DNA or genomic position, exact match only:
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA?search_position=c.27
This does not allow for partial matches, so mutation c.27_28del is not matched. c.34 will match c.34+? and c.34_35 will match c.34+?_35-?. However, c.34 does not match c.34+5. Searching on genomic locations can be achieved using g. as a prefix.
Genomic position only, exclusive match (only match variants completely within this range):
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA?search_position=g.44589000_44590000&position_match=exclusive
Genomic position only, partial match (match any variant overlapping the given region):
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA?search_position=g.44589000_44590000&position_match=partial

Searching on the DNA field:
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA?search_Variant/DNA=c.27G>T
This does not allow for partial matches, but c.(27G>T) or c.27G>T? will also match.

Searching on the DBID field:
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA?search_Variant/DBID=CRYAA_00001

Showing only one specific variant entry (internal ID only):
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA/0000001
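
The URLs above all follow the same pattern: a resource (genes or variants), optional path segments (gene symbol, internal ID), and optional search parameters. The Python sketch below shows one way a client could assemble them. The helper name build_url is made up for this example and is not part of LOVD. Field names and values containing "/" or ">" are percent-encoded here; that the webservice accepts the encoded form is an assumption, based on the encoded search_Variant%2FDBID parameter that appears in the feed's own link back to LOVD.

```python
from urllib.parse import quote, urlencode

# Development installation used in the examples above; replace with the
# api/rest.php location of the LOVD installation you want to query.
BASE = "http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php"


def build_url(resource, *path, params=None):
    """Build a query URL for the 'genes' or 'variants' resource.

    path   -- optional path segments, e.g. gene symbol and internal ID
    params -- optional dict of search parameters
    """
    segments = [resource] + [quote(part, safe="") for part in path]
    url = BASE + "/" + "/".join(segments)
    if params:
        # Keep ":" readable (chr21:44589236); everything else that is
        # not URL-safe, such as "/" and ">", is percent-encoded.
        url += "?" + urlencode(params, safe=":")
    return url


print(build_url("genes"))
print(build_url("genes", params={"search_position": "chr21:44589141_44592913",
                                 "position_match": "exact"}))
print(build_url("variants", "CRYAA", params={"search_Variant/DNA": "c.27G>T"}))
print(build_url("variants", "CRYAA", "0000001"))
```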

 

Genomic positions

Starting at version 2.0-23, released December 7th, LOVD can generate genomic locations of variants, provided a reference sequence has been properly configured in the database. We use a new Mutalyzer tool for this, which uses information from the UCSC genome browser. A current problem with this information is that we can only map one version of an NM reference sequence to the genome; the UCSC data model does not allow more than one version of each reference sequence to be stored. We will change the data model of our local database so that it can store multiple versions of each NM transcript reference sequence, which should partially fix this problem.

 

World-wide LOVD querying service

Since the LOVD 2.0-24 API allows searching for genes based on genomic position, it has become easier to use the LOVD APIs to quickly locate LOVD databases storing variants in a certain genomic region, without first having to find out which genes are located there. This way, the number of queries needed per LOVD installation has been reduced to usually one or two: 1) find gene databases at the given location, 2) if found, query that gene for any variants at the given location. In early February 2010, we created a service that can query all LOVD installations that have chosen to be published on the public list of LOVD installations on LOVD.nl (52 LSDBs with 1049 genes in total, of which 32 LSDBs with 822 genes run a usable LOVD version; results as of 15/Feb/2010). We will test it using next-generation sequencing output to see how many of the variants found in the sequenced individual have been described somewhere in an LOVD on our list. This service will later be put online for the public to use.
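
The two-step lookup described above could be sketched in Python roughly as follows. This is an illustration only, not the code of the actual service: the function names, the injectable fetch parameter, and the choice of the partial match mode for both steps are assumptions. It also assumes that a search without hits returns a feed with no entries; error handling for unreachable or outdated installations is left out.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

ATOM = "{http://www.w3.org/2005/Atom}"


def http_get(url):
    """Default fetcher: return the raw response body of a GET request."""
    with urlopen(url) as response:
        return response.read()


def entry_fields(xml_bytes):
    """Yield one dict of key:value content fields per Atom entry."""
    for entry in ET.fromstring(xml_bytes).iter(ATOM + "entry"):
        fields = {}
        text = entry.findtext(ATOM + "content", default="")
        for line in text.splitlines():
            key, sep, value = line.strip().partition(":")
            if sep:
                fields[key] = value
        yield fields


def variants_in_region(api_base, chromosome, start, end, fetch=http_get):
    """Query one LOVD installation for variants overlapping a region.

    api_base -- location of api/rest.php of the installation
    fetch    -- callable taking a URL and returning bytes (hypothetical
                hook so the lookup can be tried without network access)
    """
    # Step 1: find gene databases overlapping the region.
    genes_url = ("%s/genes?search_position=%s:%d_%d&position_match=partial"
                 % (api_base, chromosome, start, end))
    hits = []
    for gene in entry_fields(fetch(genes_url)):
        # Step 2: query each gene found for variants in the region.
        variants_url = ("%s/variants/%s?search_position=g.%d_%d"
                        "&position_match=partial"
                        % (api_base, gene["symbol"], start, end))
        hits.extend(entry_fields(fetch(variants_url)))
    return hits


# Example call (performs real HTTP requests, so it is left commented out):
# variants_in_region(
#     "http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php",
#     "chr21", 44589000, 44590000)
```

A world-wide service would repeat this per installation on the public list; an installation without genes in the region then costs only the first query.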