Development of LOVD RESTful / Atom webservice

(updated 2009-10-22; includes gene listing and some updates to the output format)
(updated 2009-12-27; described new additions and genomic locations)
(updated 2010-02-15; described new additions and world-wide LOVD querying service)
(updated 2010-04-20; changed all URLs to use rest.php instead of rest)

Please note that since I still want to add a few features to this API, its output format is not yet stable.

Since version 2.0-22, released October 5th, LOVD includes a simple webservice that supports basic queries and listing of variant data (not patient data), allowing the creation of an overall LOVD querying service. One can search on a gene symbol, get the list of available genes in the database or, on a per-gene basis, list all variants or search for a certain variant or DNA location.

The output it creates is an Atom 1.0 feed with basic variant information in plain text:
    (snippet)

Genes:
    <content type="text">
      id:CRYAA
      entrez_id:1409
      symbol:CRYAA
      name:Crystallin, alpha-A
      chromosome_location:21q22.3
      position_start:chr21:44589141
      position_end:chr21:44592913
      refseq_genomic:NC_000021.8
      refseq_mrna:NM_000394.2
      refseq_build:hg19
    </content>

Variants:
    <content type="text">
      symbol:CRYAA
      id:0000001
      position_mRNA:NM_000394.2:c.27
      position_genomic:chr21:44589236
      Variant/DNA:c.27G>T
      Variant/DBID:CRYAA_00001
    </content>

The id field contains the internal ID of the variant entry. The position is read from the Variant/DNA field and interpreted by a Mutalyzer module, if possible. If the variant cannot be interpreted by Mutalyzer, LOVD tries to isolate the position from the Variant/DNA field by itself. The genomic location is only available if the gene has been configured properly (i.e. has proper reference sequence information associated) and Mutalyzer could interpret the variant correctly. The Variant/DBID is the field that is actually used to link back to LOVD, since it is shared by other variant entries with the same change at the DNA level. The link to LOVD is included in Atom's <link> element of the entry.
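
As a minimal sketch (not part of LOVD itself), the plain-text content shown above can be turned into a dictionary on the client side. The function name parse_entry_content is hypothetical. The one thing to watch is that values such as NM_000394.2:c.27 and chr21:44589236 contain colons themselves, so each line is split on the first colon only.

```python
def parse_entry_content(text):
    """Parse the plain-text body of a <content> element into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only: values such as
        # "NM_000394.2:c.27" or "chr21:44589236" contain colons.
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


sample = """
      symbol:CRYAA
      id:0000001
      position_mRNA:NM_000394.2:c.27
      position_genomic:chr21:44589236
      Variant/DNA:c.27G>T
      Variant/DBID:CRYAA_00001
"""
variant = parse_entry_content(sample)
print(variant["position_genomic"])  # chr21:44589236
```

Since the output format is not yet stable, a client written this way should tolerate missing or additional fields.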

 

Full format

The full output of a query returning one variant is:

<?xml version="1.0" encoding="ISO-8859-1"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>
    Results for your query of the CRYAA gene database
  </title>
  <link rel="alternate" type="text/html" href="http://chromium.liacs.nl/LOVDv.2.0-dev/"/>
  <link rel="self" type="application/atom+xml" href="http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA"/>
  <updated>2007-06-21T17:23:00+02:00</updated>
  <id>tag:chromium.liacs.nl,2006-11-21:Chr:LOVDv.2.0-dev/REST_api</id>
  <generator uri="http://www.LOVD.nl/" version="2.0-22">
    Leiden Open Variation Database
  </generator>
  <rights>Copyright (c), the curators of this database</rights>
  <entry xmlns="http://www.w3.org/2005/Atom">
    <title>CRYAA:c.27G>T</title>
    <link rel="alternate" type="text/html" href="http://chromium.liacs.nl/LOVDv.2.0-dev/variants.php?select_db=CRYAA&amp;action=search_unique&amp;search_Variant%2FDBID=CRYAA_00001"/>
    <link rel="self" type="application/atom+xml" href="http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA/0000001"/>
    <id>tag:chromium.liacs.nl,1970-01-01:CRYAA/0000001</id>
    <author>
      <name>Unknown</name>
    </author>
    <published>1970-01-01T00:00:00+01:00</published>
    <updated>1970-01-01T00:00:00+01:00</updated>
    <content type="text">
      symbol:CRYAA
      id:0000001
      position_mRNA:NM_000394.2:c.27
      position_genomic:chr21:44589236
      Variant/DNA:c.27G>T
      Variant/DBID:CRYAA_00001
    </content>
  </entry>
</feed>

Note that the actual variant content is currently in plain text format. Once the XML export format(s) have been agreed on, I will implement them as well.
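
A feed like the one above can be read with any Atom-aware XML parser. The following is a minimal sketch using Python's standard library; the function name read_feed and the keys of the returned dictionaries are assumptions made for illustration, and the sample is a shortened copy of the feed shown above.

```python
import xml.etree.ElementTree as ET

# All elements live in the Atom namespace declared on <feed>.
ATOM = "{http://www.w3.org/2005/Atom}"


def read_feed(xml_bytes):
    """Return one dict per <entry>: title, both links and raw content."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for entry in root.iter(ATOM + "entry"):
        # Map each link's rel attribute ("alternate", "self") to its href.
        links = {link.get("rel"): link.get("href")
                 for link in entry.findall(ATOM + "link")}
        entries.append({
            "title": entry.findtext(ATOM + "title", default="").strip(),
            "lovd_url": links.get("alternate"),  # HTML page in LOVD
            "api_url": links.get("self"),        # this entry in the API
            "content": entry.findtext(ATOM + "content", default="").strip(),
        })
    return entries


sample = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Results for your query of the CRYAA gene database</title>
  <entry xmlns="http://www.w3.org/2005/Atom">
    <title>CRYAA:c.27G>T</title>
    <link rel="alternate" type="text/html" href="http://chromium.liacs.nl/LOVDv.2.0-dev/variants.php?select_db=CRYAA&amp;action=search_unique&amp;search_Variant%2FDBID=CRYAA_00001"/>
    <link rel="self" type="application/atom+xml" href="http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA/0000001"/>
    <content type="text">
      symbol:CRYAA
      id:0000001
    </content>
  </entry>
</feed>"""

entries = read_feed(sample)
print(entries[0]["title"], entries[0]["api_url"])
```

The feed is passed as bytes so that the parser honours the ISO-8859-1 declaration in the XML prolog.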

 

Possibilities

Please note that I used "rest.php" in all the URLs here, although "rest", without the PHP extension, also works in most cases. However, on some servers you may still need the .php suffix, so for clarity I use it here, too.
The webservice currently supports (please note that these links point to a development installation, not an actually maintained database):
Listing of all genes in the database:
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/genes

Searching on the gene symbol (full match only):
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/genes?search_symbol=CRYAA

Showing only one specific gene entry:
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/genes/CRYAA

Searching on the genomic position:
Chromosome only:
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/genes?search_position=chr21
Chromosomal location:
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/genes?search_position=chr21:44589236
Chromosomal range, exact match (only match genes having exactly this range):
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/genes?search_position=chr21:44589141_44592913&position_match=exact
Chromosomal range, exclusive match (only match genes completely within this range):
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/genes?search_position=chr21:44589141_44592913&position_match=exclusive
Chromosomal range, partial match (match any gene overlapping the given region):
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/genes?search_position=chr21:44589141_44592913&position_match=partial
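
The three position_match modes can be illustrated with a small sketch. This is only a reading of the descriptions above, not LOVD's actual implementation; the function name range_matches is hypothetical, and treating both range boundaries as inclusive is an assumption.

```python
def range_matches(gene_start, gene_end, query_start, query_end, mode):
    """Illustrate the three position_match modes for a gene and a query range.

    Assumption: all boundaries are inclusive.
    """
    if mode == "exact":
        # The gene has exactly the queried range.
        return gene_start == query_start and gene_end == query_end
    if mode == "exclusive":
        # The gene lies completely within the queried range.
        return query_start <= gene_start and gene_end <= query_end
    if mode == "partial":
        # The gene overlaps the queried range anywhere.
        return gene_start <= query_end and query_start <= gene_end
    raise ValueError("unknown position_match mode: " + mode)


# CRYAA spans chr21:44589141_44592913 in the example above.
cryaa = (44589141, 44592913)
print(range_matches(*cryaa, 44589141, 44592913, "exact"))      # True
print(range_matches(*cryaa, 44589000, 44593000, "exclusive"))  # True
print(range_matches(*cryaa, 44592000, 44600000, "exclusive"))  # False
print(range_matches(*cryaa, 44592000, 44600000, "partial"))    # True
```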

Listing of all variant entries in a certain gene:
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA

Searching on the DNA position:
Coding DNA or genomic position, exact match only:
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA?search_position=c.27
This does not allow for partial matches, so mutation c.27_28del is not matched. c.34 will match c.34+? and c.34_35 will match c.34+?_35-?. However, c.34 does not match c.34+5. Searching on genomic locations can be achieved using g. as a prefix.
Genomic position only, exclusive match (only match variants completely within this range):
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA?search_position=g.44589000_44590000&position_match=exclusive
Genomic position only, partial match (match any variant overlapping the given region):
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA?search_position=g.44589000_44590000&position_match=partial

Searching on the DNA field:
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA?search_Variant/DNA=c.27G>T
This does not allow for partial matches, but c.(27G>T) or c.27G>T? will also match.

Searching on the DBID field:
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA?search_Variant/DBID=CRYAA_00001

Showing only one specific variant entry (internal ID only):
http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA/0000001
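
For clients that build these URLs programmatically, a small helper along the following lines may be convenient. It is a sketch only: the function name api_url and its parameters are hypothetical, and it percent-encodes field names and values (for example, > becomes %3E), which assumes the server decodes them, as the search_Variant%2FDBID parameter in LOVD's own alternate link suggests.

```python
from urllib.parse import quote, urlencode

# Development installation used in the examples above.
BASE = "http://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php"


def api_url(resource, gene=None, entry_id=None, search=None):
    """Build a URL for the "genes" or "variants" resource.

    search is an optional dict of query arguments, e.g.
    {"search_position": "c.27"}.
    """
    parts = [BASE, resource]
    if gene:
        parts.append(quote(gene, safe=""))
    if entry_id:
        parts.append(quote(entry_id, safe=""))
    url = "/".join(parts)
    if search:
        # Keep ":" readable (chr21:...), percent-encode everything else.
        url += "?" + urlencode(search, safe=":", quote_via=quote)
    return url


print(api_url("genes", search={"search_position": "chr21:44589141_44592913",
                               "position_match": "partial"}))
print(api_url("variants", "CRYAA", search={"search_Variant/DNA": "c.27G>T"}))
print(api_url("variants", "CRYAA", "0000001"))
```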

 

Genomic positions

Starting at version 2.0-23, released December 7th, LOVD allows the generation of genomic locations of variants, provided a reference sequence has been properly configured in the database. We are using a new Mutalyzer tool for this, which uses information from the UCSC genome browser. A current problem with this information is that we can only map one version of an NM reference sequence to the genome; the UCSC data model does not allow more than one version of each reference sequence to be stored. We will change the data model of our local database so that it can store more versions of each NM transcript reference sequence, which should partially fix this problem.

 

World-wide LOVD querying service

Since the LOVD 2.0-24 API allows searching for genes based on genomic position, it has become easier to use the LOVD APIs to quickly locate LOVD databases storing variants in a certain genomic region, without the need to first find out which genes are located there. This way, the number of queries needed per LOVD installation has been reduced to usually one or two: 1) find gene databases on the given location, 2) if found, query that gene for any variants on the given location. In early February 2010, we created a service that can query all LOVD installations that have chosen to be published on the public list of LOVD installations on LOVD.nl (52 LSDBs with 1049 genes in total, of which 32 LSDBs with 822 genes have a useful LOVD version; results as of 15 February 2010). We will test it using next-generation sequencing output to see how many of the variants found in the sequenced individual have been described somewhere in an LOVD on our list. This service will later be put online for the public to use.
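
The two-step procedure described above can be sketched as follows. This is not the code of the actual querying service; the function names, the use of partial matching in both steps, and the example.org base URL are assumptions made for illustration. The fetch function is passed in as a parameter so the logic can be tried offline with a stub.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

ATOM = "{http://www.w3.org/2005/Atom}"


def fetch_entries(url):
    """Download one Atom feed and return each entry's content as a dict."""
    with urlopen(url, timeout=30) as response:
        root = ET.fromstring(response.read())
    entries = []
    for entry in root.iter(ATOM + "entry"):
        fields = {}
        content = entry.findtext(ATOM + "content", default="")
        for line in content.splitlines():
            key, sep, value = line.strip().partition(":")
            if sep:
                fields[key] = value
        entries.append(fields)
    return entries


def query_region(api_bases, chromosome, start, end, fetch=fetch_entries):
    """Return (api_base, variant) pairs for variants overlapping the region."""
    hits = []
    for base in api_bases:
        # Step 1: find gene databases on the given location.
        genes = fetch(f"{base}/genes?search_position={chromosome}:{start}_{end}"
                      "&position_match=partial")
        for gene in genes:
            # Step 2: query that gene for variants on the given location.
            variants = fetch(f"{base}/variants/{gene['symbol']}"
                             f"?search_position=g.{start}_{end}"
                             "&position_match=partial")
            hits.extend((base, variant) for variant in variants)
    return hits


# Offline demonstration: a stub stands in for real LOVD installations.
def stub_fetch(url):
    if "/genes?" in url:
        return [{"symbol": "CRYAA"}]
    return [{"id": "0000001", "Variant/DNA": "c.27G>T"}]


hits = query_region(["http://example.org/api/rest.php"], "chr21",
                    44589000, 44590000, fetch=stub_fetch)
print(hits)
```

A real client would additionally need to handle installations that are unreachable or that run an LOVD version without position searching.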

Comments

#1 Good stuff, Ivo!! Here's a

Good stuff, Ivo!! Here's a few nuts & bolts comments:

There seems to be a problem with SSL certificates when I use the https URLs you provide in the browser, or via curl on the commandline:

[mummi@host-153-80]curl httpk https://chromium.liacs.nl/LOVDv.2.0-dev/api/rest.php/variants/CRYAA
curl: (60) SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
More details here: http://curl.haxx.se/docs/sslcerts.html
...

For the single variant resource, you currently return a feed containing a single entry. It would be more appropriate to return just the <entry> element. The linkback URL you provide in each <entry> in your feed refers not to a single-item feed, but to the <entry> itself as a standalone resource. It may be helpful here to use a singular noun in the URL linking to the variant report, like this: /variant/my_stable_variant_IDxxxx (instead of the plural /variants/xxx)

I noticed that in your feed XML you are in each <entry> linking back to an HTML page which is actually not the variant report, but rather a search result returning just the single variant:
https://chromium.liacs.nl/LOVDv.2.0-dev/variants.php?select_db=CRYAA&action=search_unique&search_Variant%2FDBID=CRYAA_00006

This should surely be the single-variant URL /variant/xxxx instead? Generally, I would try to settle on a set of stable, simple URL pointers for each representation of a variant (XHTML, Atom XML) and use these consistently in web pages or feeds.

Also, you might want to consider consolidating on just a single, master or canonical URL for each variant which gets used in *all* your responses, and then do one of the following on the server side:

A) serve HTML if the client asks for content-type 'text/html' or 'xml/xhtml', and Atom XML if the client asks for 'application/atom+xml'.

B) redirect the client to the appropriate location to find i) the HTML page, or ii) Atom XML representation.

The main point here is to avoid circulating multiple URLs (direct, or complex multi-parameter searches) pointing to the same resource (the variant report).

#2 Hi Mummi, Thanks for your

Hi Mummi,

Thanks for your remarks. Ah, I forgot about that test SSL certificate, which isn't signed since we don't want to spend money on it. I have configured that LOVD not to require SSL anymore, and I have adapted the links in the article.

About the single variant resource, you mean starting directly with the <entry> tag, don't you? I will fix that in the next release, and also add that link - but I will also keep the alternate link to LOVD, otherwise there's no way of linking the feed entry to an actual entry in LOVD. LOVD actually does not allow me to link directly to an entry without also specifying a patient ID, so I have to link to the search results in LOVD.
That same problem prevents me from using /variants/GENE/0000001?content-type=text/html to directly point to an entry.

#3 Hi Ivo, I've been using

Hi Ivo,

I've been using LOVD's feeds web services for a while and I've found something that (IMHO) should be corrected.

I don't remember in which LOVD installation this happened exactly but it is relevant in any LOVD LSDB that has many variants replicated (the "found x times" note). The feed I was reading contained the same variant 617 times. That is, there were 617 entries in a feed with exactly the same data. While this may not be an issue when dealing with the web application itself, this overhead will surely hinder the performance when dealing with programmatic methods like gene-wide searches (among others).

Maybe a field in the entry mentioning the number of times that a specific variant was found?

Keep up the good work!

#4 Hi Pedro, Thanks for your

Hi Pedro,

Thanks for your suggestion. 617 entries is definitely a lot of repetition. However, none of the variants are really totally equal, as the internal IDs of these variants are different. So they are actually pointing to different entries, even though the data shown may be equal. With the data shown now, it may seem logical to ignore entries similar to the one already displayed, but if we start sharing more data, the differences between similar entries may suddenly become apparent. So in short, I don't mind building something into the API to ask to group similar entries together, but I prefer not to make it the default. How about that?

#5 Hi Ivo, Yeah, I also supposed

Hi Ivo,

Yeah, I also supposed that the internal identifiers (among other features) were distinct... I can't estimate how frequently I'll read LOVD feeds and the impact this kind of overhead will have in future systems, however, a solution like the one you propose sounds good for faster reads! Maybe a "unique" parameter in the service? something like "...lovd/api/rest/variants/BRCA2/unique"?

Cheers!

#6 Hi Pedro, Sorry to be so slow

Hi Pedro, Sorry to be so slow in replying to you. I agree with you I need to add a new argument to show the unique variants only - your suggestion seems perfect. I'll be on holiday from next Thursday, so when I come back I will implement it!

#7 Would it make sense to use

Would it make sense to use RDFa to annotate content (given in HTML) and utilize the lsdb-xml schema for definitions/structure? Also, GRDDL could perhaps be utilized in transforming the HTML representation into a standard form.

#8 Hi Juha, do you mean the

Hi Juha, do you mean the normal LOVD HTML output? It's not structured properly to be converted to XML like that; it shows tables with patient information repeated for each variant, not grouped or sorted by patient, or otherwise it shows just one patient with all the variants found. So there is currently no HTML output in LOVD that contains all info needed to be converted to XML.

G2P Knowledge Centre is part of GEN2PHEN and funded by the Health Thematic Area of the Cooperation Programme of the European Commission within the VII Framework Programme for Research and Technological Development.

© GEN2PHEN 2011