Developing Prototype LOVD web services at NGRL

At the National Genetics Reference Laboratory (NGRL), Manchester, we are investigating the development of prototype web services to allow machine-to-machine access to LOVD data. Here I hope to provide an overview and to record some of the issues involved as we develop the simplest of these services.

Requirements

The first and simplest service we aim to investigate will retrieve all public variants (and any associated public patient data) for a particular gene. This implies a second requirement: getting a list of the genes supported by a particular LOVD database. For our Browser use case we are also interested in retrieving information about the reference sequence in use.

As this could be the basis of future, more fully featured services, ease of use and ease of installation are also important. Ease of use suggests a REST interface, either instead of or alongside a SOAP/WSDL one. Ease of installation suggests that the service should be developed in PHP: LOVD itself is written in PHP, so PHP will already be present on the target system.

Implementation Notes

LOVD installation was relatively straightforward. The only problems encountered are well documented on the LOVD website. One of these was to do with strict settings in MySQL.

We investigated PHP frameworks, hoping to find something useful for rapid development of REST and SOAP interfaces and for abstraction between the database, PHP classes, XML, etc. We initially looked at WSO2, but quickly found that it was not the lightweight solution we needed. It essentially needs to be rebuilt from source for different platforms, PHP versions, etc. This was not easy and did not meet our ease-of-installation requirement. In the end we found the Zend Framework to be useful, and have made use of its REST Server in particular.

We found it relatively easy to integrate with the existing PHP code and MySQL tables (centring around the many-to-many relationship between patients and variants) and have started to produce services that reproduce access to the publicly visible LOVD data.

The Web Service

We have so far concentrated on a REST interface that retrieves information from an LOVD instance in a simple XML format. Because the aim of the exercise is prototyping and exploration, we have not supplied an XML schema or stuck to the XML interchange format that is being developed as part of Gen2Phen. A more mature LOVD web service should, however, aim to do both.

The service is still at an early stage, but you can see the progress (on our own LOVD instance) using the URLs below:

http://ngrl.man.ac.uk/lovd2/ws/rest.php?method=getAllGenes - returns all genes available at that particular LOVD instance
http://ngrl.man.ac.uk/lovd2/ws/rest.php?method=getAllVariants&hgnc_symbol=UBE3A - returns all variants for a particular gene (i.e. as reported in every patient)
http://ngrl.man.ac.uk/lovd2/ws/rest.php?method=getUniqueVariants&hgnc_symbol=UBE3A - as above, but returns unique variants (i.e. one variant element, with 0-n patient sub-elements)
http://ngrl.man.ac.uk/lovd2/ws/rest.php?method=getVariantById&hgnc_symbol=UBE3A&id=UBE3A_00001 - returns a single variant based upon the publicly visible database ID
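
All of the calls above follow one pattern: a method parameter plus optional arguments in the query string. As a rough client-side sketch (in Python, and not part of the PHP prototype itself), the URLs could be assembled as follows. The build_url helper is a hypothetical name; the base URL, method names and parameter names are taken from the examples above.

```python
from urllib.parse import urlencode

# Base URL of the NGRL prototype, as listed above.
BASE_URL = "http://ngrl.man.ac.uk/lovd2/ws/rest.php"


def build_url(method, **params):
    """Build a REST URL for one service method (hypothetical helper)."""
    query = {"method": method}
    query.update(params)  # e.g. hgnc_symbol="UBE3A"
    return BASE_URL + "?" + urlencode(query)


all_genes_url = build_url("getAllGenes")
variant_url = build_url("getVariantById", hgnc_symbol="UBE3A", id="UBE3A_00001")
```

The resulting URL can then be fetched with any HTTP client, for example urllib.request.urlopen from the Python standard library.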


The XML formats returned are more or less what was easiest to produce, and they attempt to reproduce the publicly visible information from LOVD. As you will see, we have not dropped any empty optional elements from the results, and we are not returning LOVD URLs yet. Below is an example instance of a variant:

	<variant>
	  <id>UBE3A_00001</id>
	  <exon> 08</exon>
	  <dna_change>c.3_16del14</dna_change>
	  <rna_change/>
	  <protein_change>Frame shift (predicted)</protein_change>
	  <restriction_site/>
	  <frequency>-</frequency>
	  <patient>
	    <pathogenicity>
	      <reported>Probably pathogenic</reported>
	      <concluded>Probably pathogenic</concluded>
	      <short>+?/+?</short>
	    </pathogenicity>
	    <id>003199(MC)</id>
	    <disease>Angelman syndrome</disease>
	    <reference/>
	    <template>DNA</template>
	    <technique>SEQ</technique>
	    <remarks>Parents not tested - out of frame deletion so pathogenicity assumed.</remarks>
	    <times_reported>1</times_reported>
	    <variant_created>2008-12-08 16:01:02</variant_created>
	    <variant_edited>2009-05-01 16:32:13</variant_edited>
	    <patient_created>2008-12-08 16:01:02</patient_created>
	    <patient_edited>2009-02-04 12:40:22</patient_edited>
	  </patient>
	</variant>
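
To give an idea of how a client might consume this format, here is a minimal sketch in Python using only the standard library. It parses an abridged copy of the variant above and collects a few fields, treating empty optional elements as missing. The summarise_variant function and the shape of the returned dictionary are illustrative assumptions, not part of the service.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the example variant shown above.
SAMPLE = """
<variant>
  <id>UBE3A_00001</id>
  <dna_change>c.3_16del14</dna_change>
  <rna_change/>
  <patient>
    <pathogenicity>
      <reported>Probably pathogenic</reported>
    </pathogenicity>
    <id>003199(MC)</id>
    <disease>Angelman syndrome</disease>
  </patient>
</variant>
"""


def summarise_variant(xml_text):
    """Return a small dict describing one <variant> element (illustrative)."""
    variant = ET.fromstring(xml_text)
    return {
        # "id" matches only the direct child, not the patient's <id>.
        "id": variant.findtext("id"),
        "dna_change": variant.findtext("dna_change"),
        # Empty optional elements such as <rna_change/> become None.
        "rna_change": variant.findtext("rna_change") or None,
        "patients": [
            {
                "id": patient.findtext("id"),
                "disease": patient.findtext("disease"),
                "reported": patient.findtext("pathogenicity/reported"),
            }
            for patient in variant.findall("patient")
        ],
    }


summary = summarise_variant(SAMPLE)
```

Because a unique variant may carry zero or more patient sub-elements, the patients are collected into a list rather than a single value.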

The service is quite lightweight: installing it only requires copying the PHP files to your LOVD directory. The SimpleXML module must be available in PHP, but it is commonly enabled anyway.

Issues

  • Many LSDBs do not specify reference sequences, making raw HGVS nomenclature difficult to interpret.
  • No simple mapping/binding framework from PHP classes to XML was found. The prototype could therefore be hard to maintain as the target XML schema becomes more complex.

Potential Further Work

  • Demonstrate visualisation using NGRL browser (this does not require genomic coordinates)
  • Rationalise services with other efforts
  • Align with XML interchange format and provide feedback on changes to this format
  • Get variant by region
  • Get variant by feature (5', exon/intron, 3')
  • Get variants updated/added since XXX
  • Get suggested reference sequence given HGNC gene symbol, list of HGVS variants
  • Machine accessible registry service for discovering LOVD web services
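
Until a "variants updated/added since" service exists, a client could approximate it by filtering on the variant_edited timestamps that the current XML already carries. The sketch below, in Python, assumes that a list of variants arrives wrapped in a root element called variants; that wrapper name is a guess, and the second variant in the sample is invented purely for illustration.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Timestamp layout used in the example variant, e.g. "2009-05-01 16:32:13".
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

# Assumed response shape: the <variants> wrapper is a guess, and
# UBE3A_00002 is made-up data for the sake of the example.
SAMPLE = """
<variants>
  <variant>
    <id>UBE3A_00001</id>
    <patient><variant_edited>2009-05-01 16:32:13</variant_edited></patient>
  </variant>
  <variant>
    <id>UBE3A_00002</id>
    <patient><variant_edited>2008-12-08 16:01:02</variant_edited></patient>
  </variant>
</variants>
"""


def edited_since(xml_text, since):
    """Return the ids of variants with any variant_edited stamp >= since."""
    root = ET.fromstring(xml_text)
    recent = []
    for variant in root.iter("variant"):
        stamps = [
            datetime.strptime(element.text.strip(), STAMP_FORMAT)
            for element in variant.iter("variant_edited")
            if element.text and element.text.strip()
        ]
        if any(stamp >= since for stamp in stamps):
            recent.append(variant.findtext("id"))
    return recent


recent_ids = edited_since(SAMPLE, datetime(2009, 1, 1))
```

Filtering on the client means the whole gene is still downloaded each time, which is one reason a server-side version of this query remains on the list.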

Requiring authentication/authorisation:

  • Allow viewing of non-public data (e.g. for admin, curators)
  • Service to allow submission of data (e.g. from other software)