D4.4 Foundational LSDBs for all disease-related genes
| Contributed by: | Acacia Reiche |
| Originally posted: | 12th August 2010: 12:12 pm |
| Last updated: | 1st July 2011: 11:52 am |
| Short URL: | http://gen2phen.org/node/25704 |
| Attachment | Size |
|---|---|
| D4.4 Foundational LSDBs for all disease-related genesv1.3_Final.pdf | 326.15 KB |
HEALTH-F4-2007-200754
www.gen2phen.org
D4.4 Foundational LSDBs for all disease-related genes
WP4 – Genetics G2P Databases
V1.3 Final
Lead beneficiary: LUMC Date: 03/08/2010 Nature: Report Dissemination level: PU
D4.4 Foundational LSDBs for all disease-related genes
WP4 - Genetics G2P databases Author(s): P. Taschner (LUMC), I. Fokkema (LUMC), J. den Dunnen (LUMC) Security: PU Version: v1.3
Final
HEALTH-200754
TABLE OF CONTENTS
DOCUMENT INFORMATION
DOCUMENT HISTORY
DEFINITIONS
1. INTRODUCTION
2. DESCRIPTION OF WORK
3. FUTURE WORK
3.1. POPULATING THE MENDELIAN GENES DATABASE
3.2. DATA CURATION
3.3. FUTURE DEVELOPMENTS
REFERENCES
© Copyright 2010 GEN2PHEN Consortium
Document Information
| Grant Agreement Number | HEALTH-F4-2007-200754 |
| Acronym | GEN2PHEN |
| Full title | Genotype-To-Phenotype Databases: A Holistic Solution |
| Project URL | http://www.gen2phen.org |
| EU Project officer | Iiro Eerola (Iiro.EEROLA@ec.europa.eu) |
| Deliverable | Number 4.4, Title: Foundational LSDBs for all disease-related genes |
| Work package | Number 4, Title: Genetics G2P Databases |
| Delivery date | Contractual: Month 30, Actual: 03/08/2010 |
| Status | Final |
| Nature | Report |
| Dissemination Level | Public |
| Authors (Partner) | P. Taschner (LUMC), I. Fokkema (LUMC), J. den Dunnen (LUMC) |
| Responsible Author | Johan T. den Dunnen, Partner: LUMC, Email: J.T.den_Dunnen@lumc.nl, Phone: +31-71-5269501 |
Document History
| Name | Date | Version | Description |
|---|---|---|---|
| P. Taschner (LUMC), I. Fokkema (LUMC), J. den Dunnen (LUMC) | 18/06/10 | 1.0 | Draft |
| P. Taschner (LUMC) | 05/07/10 | 1.1 | Final draft |
| P. Taschner (LUMC) | 15/07/10 | 1.2 | Final draft |
| P. Taschner (LUMC) | 03/08/10 | 1.3 | Final |
Definitions
Partners of the GEN2PHEN Consortium are referred to herein according to the following codes:
ULEIC – University of Leicester (UK) – Coordinator
EMBL – European Molecular Biology Laboratory (Germany) – Beneficiary
FIMIM – Fundació IMIM (Spain) – Beneficiary
LUMC – Leiden University Medical Center (Netherlands) – Beneficiary
INSERM – Institut National de la Santé et de la Recherche Médicale (France) – Beneficiary
KI – Karolinska Institutet (Sweden) – Beneficiary
FORTH – Foundation for Research and Technology Hellas (Greece) – Beneficiary
CEA – Commissariat à l’Energie Atomique (France) – Beneficiary
EMC – Erasmus Universitair Medisch Centrum Rotterdam (Netherlands) – Beneficiary
UH.FGC – Helsingin Yliopisto (Finland) – Beneficiary
UAVR – Universidade de Aveiro (Portugal) – Beneficiary
UWC – University of the Western Cape (South Africa) – Beneficiary
CSIR – Council of Scientific and Industrial Research (India) – Beneficiary
SIB – Swiss Institute of Bioinformatics (Switzerland) – Beneficiary
UNIMAN – The University of Manchester (UK) – Beneficiary
BIOBASE – BioBase GmbH. (Germany) – Beneficiary
deCODE – Islensk Erfoagreining EH (Iceland) – Beneficiary
PHENO – Phenosystems S.A. (Belgium) – Beneficiary
BCP – Biocomputing Platforms Ltd. Oy (Finland) – Beneficiary
UPAT – University of Patras (Greece) – Beneficiary
Grant Agreement: The agreement signed between the beneficiaries and the European Commission for the undertaking of the GEN2PHEN project (HEALTH-200754).
Project: The sum of all activities carried out in the framework of the Grant Agreement by the Consortium.
Work plan: Schedule of tasks, deliverables, efforts, dates and responsibilities corresponding to the work to be carried out for the GEN2PHEN project, as specified in Annex I to the Grant Agreement.
Consortium: The GEN2PHEN Consortium, composed of the above-mentioned legal entities.
Consortium agreement: Agreement concluded amongst GEN2PHEN participants for the implementation of the Grant Agreement. Such an agreement shall not affect the parties’ obligations to the Community and/or to one another arising from the Grant Agreement.
1. INTRODUCTION
Inter-individual genome variation plays a major role in differential normal development and disease processes. However, the details of how these relationships work are far from clear, even in the case of most Mendelian disorders where single genetic alterations are fully penetrant (essentially causative, rather than risk modifying). Background genetic effects (modifier genes), epistasis, somatic variation, and environmental factors all complicate the situation. Extensive research is therefore being conducted worldwide to characterise genetic variation in normal and disease contexts.
To store this information, several hundred locus-specific databases (LSDBs) that target specific diseases or genes existed before the start of the Gen2Phen project (1). These had been constructed using a plethora of different technologies and designs. Most of the resulting databases were rather primitive in implementation and small in scale (2). The Universal Mutation Database (UMD, http://www.umd.be/, (3)) and the Leiden Open-source Variation Database (LOVD, http://www.lovd.nl, (4)) ‘LSDB-in-a-box’ applications have solved this problem by supporting existing LSDBs that wished to transfer to our platforms. From the start of the project, WP4 has created many new LSDBs for interested researchers. On June 11, 2010, 1482 LSDBs, the majority of which are LOVD-based, were listed on the Waystation website (http://www.centralmutations.org). Many of the clinically important genes, however, are covered by several databases.
WP4 Activity 4.2 LSDB Creation, which is led by LUMC, culminates in this deliverable: the creation of foundational LSDBs for the remaining disease-related genes. The steps taken to reach this milestone and to populate the databases are described below.
2. DESCRIPTION OF WORK
Previously, we successfully demonstrated the feasibility of creating many LSDBs in a single LOVD2 installation (http://www.lovd.nl/MR) in the Mental Retardation database pilot project. For this, more than 500 LSDBs for genes on the X chromosome were created to present the results of a large-scale resequencing study in patients with X-linked mental retardation (5). We decided to use the same approach and created a new LOVD2 installation for the LOVD Mendelian Genes database (http://www.lovd.nl/mendelian_genes).
The next step was to compile a list of genes involved in disorders with Mendelian inheritance, together with the information required by the Human Variome Project guidelines (6), so that a separate LSDB could be set up automatically for each gene. The June 2010 morbid map from the Online Mendelian Inheritance in Man database (OMIM, http://www.ncbi.nlm.nih.gov/Omim/, (7)) contains 5508 entries with the loci of all 4348 mapped diseases, 189 traits and 967 susceptibilities. Not all the genes involved in these phenotypes have been identified yet. OMIM lists 2436 genes with one or more allelic variants associated with a phenotype. All human gene symbols with the necessary information for the gene homepage (e.g. reference sequence accession numbers, cross-references, etc.) were downloaded from the HUGO Gene Nomenclature Committee website (HGNC, http://www.genenames.org/, (8)). This was combined with MIM disease numbers from OMIM using BioMart (http://www.biomart.org/, (9)). Mendelian disease genes for which no LSDB existed or for which only static web pages were available are included in the Mendelian Genes database (see https://grenada.lumc.nl/LOVD2/mendelian_genes/status.php for an overview of the 2118 disease-related genes and reported variants). This may help to entice the maintainers of static web pages to switch to using LSDB software.
This achievement was the final step to reach milestone M13: Completion of web-accessible foundational LSDBs for all Mendelian disease-related genes. It is also a major step towards a standardised and integrated ‘federated’ LSDB database with common search tools, query interfaces, and data output formats, since most existing databases are using LOVD2 software. Clinicians and researchers can access the database to submit new variants in different genes associated with specific diseases or phenotypes, or to search for sequence variants in specific genes, although the Mendelian Genes database contains only several hundred variants at the moment (Fig. 1). The Mendelian Genes database and all other LOVD installations hosted on our servers are accessible with the centralised search capabilities of the LOVD API (see http://www.gen2phen.org/post/development-lovd-restful-atom-webservice for a description and examples).
The Human Genome Variation Society (HGVS) sequence variant nomenclature plays an important role in the correct search and exchange of data in LSDBs and genomic databases (see http://www.hgvs.org/mutnomen, (10)). The Mendelian Genes database also uses LOVD’s Mutalyzer module to ensure the use of unambiguous HGVS sequence variant descriptions.
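The gene-selection step described in this section (take gene records with their homepage information, keep those linked to at least one MIM disease number, and drop those already covered by an LSDB) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the `GeneRecord` fields, the function name, and all gene symbols, accession numbers and MIM numbers are hypothetical placeholders, and real input would come from HGNC and OMIM downloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical record holding the data needed for a gene homepage."""
    symbol: str          # approved gene symbol
    refseq: str          # reference sequence accession number
    mim_numbers: tuple   # linked MIM disease numbers (may be empty)


def select_foundational_genes(genes, existing_lsdb_symbols):
    """Return sorted symbols of disease-related genes that lack an LSDB."""
    existing = {symbol.upper() for symbol in existing_lsdb_symbols}
    selected = [
        gene.symbol
        for gene in genes
        # Keep genes with at least one MIM number and no existing LSDB.
        if gene.mim_numbers and gene.symbol.upper() not in existing
    ]
    return sorted(selected)


# Placeholder data for illustration only (not real genes or identifiers).
genes = [
    GeneRecord("GENE_A", "NM_000001.1", (100001,)),
    GeneRecord("GENE_B", "NM_000002.1", ()),                # no disease link
    GeneRecord("GENE_C", "NM_000003.1", (100003, 100004)),  # LSDB exists
]
existing = {"gene_c"}

print(select_foundational_genes(genes, existing))
```

With these placeholder records only `GENE_A` is selected: `GENE_B` has no linked MIM number and `GENE_C` already has an LSDB.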
Fig. 1. Mendelian Gene LSDBs ordered by the number of variants in the Manager’s View. All genes have reference sequences, but some of them have not yet been selected by the curator for display on the LOVD homepage (RefSeq? column).
3. FUTURE WORK
We have created foundational LSDBs for all disease-related genes, but these are mostly empty repositories for sequence variation data for each gene involved in Mendelian disorders. Although empty LSDBs are not very useful, their mere existence invalidates excuses used by DNA diagnostic labs and researchers not to submit data into the public domain. Journal editors and reviewers can now refute such excuses by referring to the Mendelian Genes database. They also have no excuse to exempt authors from submitting variant data before a manuscript is accepted. Filling databases without proper curation is also debatable, because it might result in low-quality data. On the other hand, these databases might also attract potential curators by their content, i.e. phenotypic information suggesting an association with particular variants, and evolve into high-quality databases. WP4 will continue to support curators with the migration of their static databases to this or other database systems to accomplish a standardised and integrated ‘federated’ LSDB.
3.1. Populating the Mendelian Genes database
Without advertisement, the Mendelian Genes database is already receiving submissions. To provide initial content, WP7 activities to populate the foundational LSDBs were foreseen in the original Gen2Phen Description of Work. For a start, variants can be retrieved from comprehensive databases, which contain information about all genes. OMIM and the Human Gene Mutation Database (HGMD, www.hgmd.cf.ac.uk, (11)) contain many variants, which have been extracted from the literature. The public part of HGMD contains no detailed information about the frequency of the variant and the phenotype. Other potential sources are the Single Nucleotide Polymorphism database (dbSNP, (12)), the pharmacogenetics database (PharmGKB, http://www.pharmgkb.org/, (13)) and SwissVar (http://www.expasy.org/swissvar/, (14)). Descriptions from these databases have to be reformatted to HGVS format before import.
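As a rough illustration of what reformatting to HGVS format can involve, the sketch below converts one legacy style of substitution description (for example `A76T`) into a coding-DNA HGVS description (`c.76A>T`). It rests on strong assumptions: the input field is known to hold nucleotide-level substitutions numbered on the coding sequence, which must be established per source because the same string could also be read as a one-letter amino acid change. The function name and patterns are hypothetical, cover simple substitutions only, and do not replace the reference-sequence checks performed by a tool such as Mutalyzer.

```python
import re

# Legacy pattern assumed for illustration: <ref base><position><new base>.
LEGACY_SUBSTITUTION = re.compile(r"^([ACGT])(\d+)([ACGT])$")
# Simple coding-DNA substitution already in HGVS style, e.g. c.76A>T.
HGVS_SUBSTITUTION = re.compile(r"^c\.\d+[ACGT]>[ACGT]$")


def to_hgvs_substitution(description):
    """Return an HGVS-style c. substitution, or None if not convertible."""
    text = description.strip()
    if HGVS_SUBSTITUTION.match(text):
        return text  # already in the target format
    match = LEGACY_SUBSTITUTION.match(text.upper())
    if match is None:
        return None  # deletions, insertions, etc. need manual handling
    ref, position, alt = match.groups()
    if ref == alt:
        return None  # not a real change, flag for curation
    return f"c.{position}{ref}>{alt}"


for legacy in ["A76T", "c.76A>T", "76delA"]:
    print(legacy, "->", to_hgvs_substitution(legacy))
```

Anything the function cannot convert is returned as `None` rather than guessed, so that a curator can review it before import.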
As demonstrated by our Mental Retardation database pilot project, researchers can submit new variants identified by whole genome or exome resequencing. Since the quality of next generation sequencing data varies, variant submissions should be accompanied by evidence codes or other information indicating the likelihood that each variant is not a sequencing error. The next challenge will be to get data from diagnostic laboratories, either directly or via the Diagnostic Mutation Database (DMudb, http://www.ngrl.org.uk/Manchester/projects/informatics/dmudb) or the Café for Routine Genetic data Exchange (Café Rouge, http://www.caferouge.org/).
3.2. Data curation
Although volunteers regularly contact us to become database curators, most of the new LSDBs have a curator vacancy. As part of WP4 Activity 4.6, curators (‘guardians’ of the foundational databases) will be enlisted from a list of experts consulted in WP2, from the HVP and HGVS communities, and from other sources such as WikiProteins members. Another source of curators will be provided by contacting publishers to suggest LSDB creation each time a new disease gene has been identified. Interest in curatorship may also be raised by participation in the training activities organized in WP8 or by the dissemination activities organized in WP9. In this ‘federated’ LSDB system, the LSDB curators will retain control and ownership of their own specialised database and its contents, which will exist as part of a seamless holistic system. This effort will be led and realized by ULEIC.
3.3. Future developments
The Mendelian Genes database described here will have to be extended on a regular basis with new disease genes, which are being discovered rapidly through whole genome or exome resequencing. It provides a sufficiently organised and mature database infrastructure to gather, store, integrate and query variant and phenotype data from the gene-to-disease perspective mostly used by researchers. For clinicians, user interfaces to support a disease-to-gene view of clinically relevant mutations need to be developed.
For the most efficient retrieval of data across genes in different databases, two levels of standardization are necessary. The database syntax is standardized using the common Gen2Phen data model, but semantic standardization is also required for full interoperability. This can be achieved by promoting the use of controlled vocabularies or ontologies to describe the contents of database fields. An overview of existing ontologies can be found via the EBI Ontology Lookup Service (http://www.ebi.ac.uk/ontology-lookup/) or the BioPortal of the National Center for Biomedical Ontology (http://bioportal.bioontology.org/). Curators are strongly recommended to select from these controlled vocabularies or ontologies a set of terms appropriate for the phenotypes associated with the variants in their LSDBs. If specific terms are missing, the curator should contact the ontology developers and discuss the different options with the user community, consisting of researchers and clinicians.
To facilitate the use of controlled vocabularies or ontologies during data submission, selection lists with appropriate terms could be implemented in LSDB software. This can be seen as a logical step towards a ‘federated’ holistic Gen2Phen database.
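One way such a selection list might work during submission is sketched below: the curator supplies a set of (identifier, label) pairs chosen from an ontology, and a submitted phenotype entry is accepted only if it matches one of the labels. This is a hypothetical sketch, not a feature of any existing LSDB software; the function names, the `EX:` identifiers and the labels are placeholders.

```python
def build_selection_list(terms):
    """Index curator-selected (term_id, label) pairs by case-folded label."""
    return {label.casefold(): (term_id, label) for term_id, label in terms}


def resolve_phenotype(entry, selection_list):
    """Return the (term_id, label) for a submitted entry, or None if unknown."""
    return selection_list.get(entry.strip().casefold())


# Placeholder terms; a real list would use identifiers from a chosen ontology.
curator_terms = [
    ("EX:0001", "hearing loss"),
    ("EX:0002", "muscle weakness"),
]
selection_list = build_selection_list(curator_terms)

print(resolve_phenotype("  Hearing Loss ", selection_list))
print(resolve_phenotype("deafness", selection_list))
```

The second lookup returns `None` because "deafness" is not in the curator's list. Storing the term identifier rather than free text is what would allow the same phenotype to be queried consistently across databases.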
References
1. Horaitis O, Talbot CC Jr, Phommarinh M, Phillips KM, Cotton RG. 2007. A database of locus-specific databases. Nat Genet 39:425.
2. Patrinos GP, Brookes AJ. 2005. DNA, diseases and databases: disastrously deficient. Trends Genet 21:333-338.
3. Beroud C, et al. 2005. UMD (universal mutation database). Hum Mutat 26:184-191.
4. Fokkema IF, den Dunnen JT, Taschner PE. 2005. LOVD: easy creation of a locus-specific sequence variation database using an "LSDB-in-a-Box" approach. Hum Mutat 26:63-68.
5. Tarpey PS, et al. 2009. A systematic, large-scale resequencing screen of X-chromosome coding exons in mental retardation. Nat Genet 41:535-543.
6. Cotton RG, Auerbach AD, Beckmann JS, Blumenfeld OO, Brookes AJ, Brown AF, Carrera P, Cox DW, Gottlieb B, Greenblatt MS, Hilbert P, Lehvaslaiho H, Liang P, Marsh S, Nebert DW, Povey S, Rossetti S, Scriver CR, Summar M, Tolan DR, Verma IC, Vihinen M, den Dunnen JT. 2008. Recommendations for locus-specific databases and their curation. Hum Mutat 29:2-5.
7. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. 2005. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 33:D514–D517.
8. Bruford EA, Lush MJ, Wright MW, Sneddon TP, Povey S, Birney E. 2008. The HGNC Database in 2008: a resource for the human genome. Nucleic Acids Res 36 (Database issue):D445-448.
9. Smedley D, Haider S, Ballester B, Holland R, London D, Thorisson G, Kasprzyk A. 2009. BioMart--biological queries made easy. BMC Genomics 10:22.
10. den Dunnen JT, Antonarakis SE. 2000. Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat 15:7-12.
11. Krawczak M, Cooper DN. 1997. The human gene mutation database. Trends Genet 13:121-122.
12. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. 2001. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29:308–311.
13. Klein TE, Altman RB. 2004. PharmGKB: the pharmacogenetics and pharmacogenomics knowledge base. Pharmacogen J 4:1.
14. Mottaz A, David FP, Veuthey AL, Yip YL. 2010. Easy retrieval of single amino-acid polymorphisms and phenotype information using SwissVar. Bioinformatics 26:851-852.
© Copyright 2010 GEN2PHEN Consortium
HEALTH-F4-2007-200754
www.gen2phen.org
D4.4 Foundational LSDBs for all diseaserelated genes
WP4 – Genetics G2P Databases
V1.3 Final
Lead beneficiary: LUMC Date: 03/08/2010 Nature: Report Dissemination level: PU
D4.4 Foundational LSDBs for all disease-related genes
WP4 - Genetics G2P databases Author(s): P. Taschner (LUMC), I. Fokkema (LUMC), J. den Dunnen (LUMC) Security: PU Version: v1.3
Final
HEALTH-200754
2/9
TABLE OF CONTENTS DOCUMENT INFORMATION .................................................................................................. 3 DOCUMENT HISTORY ............................................................................................................. 3 DEFINITIONS .............................................................................................................................. 3 1. 2. 3. INTRODUCTION................................................................................................................. 5 DESCRIPTION OF WORK ................................................................................................ 5 FUTURE WORK .................................................................................................................. 7 3.1. 3.2. 3.3. POPULATING THE MENDELIAN GENES DATABASE............................................................ 7 DATA CURATION .............................................................................................................. 7 FUTURE DEVELOPMENTS .................................................................................................. 8
REFERENCES.............................................................................................................................. 9
© Copyright 2010 GEN2PHEN Consortium
D4.4 Foundational LSDBs for all disease-related genes
WP4 - Genetics G2P databases Author(s): P. Taschner (LUMC), I. Fokkema (LUMC), J. den Dunnen (LUMC) Security: PU Version: v1.3
Final
HEALTH-200754
3/9
Document Information
Grant Agreement HEALTH-F4-2007-200754 Number Full title Project URL Acronym GEN2PHEN
Genotype-To-Phenotype Databases: A Holistic Solution http://www.gen2phen.org
EU Project officer Iiro Eerola (Iiro.EEROLA@ec.europa.eu ) Deliverable Work package Delivery date Status Nature Dissemination Level Authors (Partner) Responsible Author Report Public Number 4.4 Number 4 Contractual Final Prototype Confidential Other Title Title Month 30 Foundational LSDBs for all disease-related genes Genetics G2P Databases Actual final 03/08/2010
P. Taschner (LUMC), I. Fokkema (LUMC), J. den Dunnen (LUMC) Johan T. den Dunnen Partner LUMC Email J.T.den_Dunnen@lumc.nl Phone +31-71-5269501
Document History
Name P. Taschner (LUMC), I. Fokkema (LUMC), J. den Dunnen (LUMC) P. Taschner (LUMC) P. Taschner (LUMC) P. Taschner (LUMC) Date Version Description
18/06/10 05/07/10 15/07/10 03/08/10
1.0 1.1 1.2 1.3
Draft Final draft Final draft Final
© Copyright 2010 GEN2PHEN Consortium
D4.4 Foundational LSDBs for all disease-related genes
WP4 - Genetics G2P databases Author(s): P. Taschner (LUMC), I. Fokkema (LUMC), J. den Dunnen (LUMC) Security: PU Version: v1.3
Final
HEALTH-200754
4/9
Definitions
Partners of the GEN2PHEN Consortium are referred to herein according to the following codes: ULEIC – University of Leicester (UK) – Coordinator EMBL – European Molecular Biology Laboratory (Germany) – Beneficiary FIMIM – Fundació IMIM (Spain) – Beneficiary LUMC – Leiden University Medical Center (Netherlands) – Beneficiary INSERM – Institut National de la Santé et de la Recherche Médicale (France) – Beneficiary KI – Karolinska Institutet (Sweden) – Beneficiary FORTH – Foundation for Research and Tecnology Hellas (Greece) – Beneficiary CEA – Comissariat à l’Energie Atomique (France) – Beneficiary EMC – Erasmus Universitair Medisch Centrum Rotterdam (Netherlands) – Beneficiary UH.FGC – Helsingin Yliopisto (Finland) – Beneficiary UAVR – Universidade de Aveiro (Portugal) – Beneficiary UWC – University of the Western Cape (South Africa) – Beneficiary CSIR – Council of Scientific and Industrial Research (India) – Beneficiary SIB – Swiss Institute of Bioinformatics (Switzerland) – Beneficiary UNIMAN – The University of Manchester (UK) – Beneficiary BIOBASE – BioBase GmbH. (Germany) – Beneficiary deCODE – Islensk Erfoagreining EH (Iceland) – Beneficiary PHENO – Phenosystems S.A. (Belgium) – Beneficiary BCP – Biocomputing Platforms Ltd. Oy (Finland) – Beneficiary UPAT – University of Patras (Greece) – Beneficiary Grant Agreement: The agreement signed between the beneficiaries and the European Commission for the undertaking of the GEN2PHEN project (HEALTH-200754). Project: The sum of all activities carried out in the framework of the Grant Agreement by the Consortium. Work plan: Schedule of tasks, deliverables, efforts, dates and responsibilities corresponding to the work to be carried out for the GEN2PHEN project, as specified in Annex I to the Grant Agreement. Consortium: The GEN2PHEN Consortium, conformed by the above-mentioned legal entities. Consortium agreement: agreement concluded amongst GEN2PHEN participants for the implementation of the Grant Agreement. 
Such an agreement shall not affect the parties’ obligations to the Community and/or to one another arising from the Grant Agreement.
© Copyright 2010 GEN2PHEN Consortium
D4.4 Foundational LSDBs for all disease-related genes
WP4 - Genetics G2P databases Author(s): P. Taschner (LUMC), I. Fokkema (LUMC), J. den Dunnen (LUMC) Security: PU Version: v1.3
Final
HEALTH-200754
5/9
1. INTRODUCTION
Inter-individual genome variation plays a major role in differential normal development and disease processes. However, the details of how these relationships work are far from clear, even in the case of most Mendelian disorders where single genetic alterations are fully penetrant (essentially causative, rather than risk modifying). Background genetic effects (modifier genes), epistasis, somatic variation, and environmental factors all complicate the situation. Extensive research is therefore being conducted worldwide to characterise genetic variation in normal and disease contexts. To store this information, several hundred locusspecific databases (LSDBs) that target specific diseases or genes existed before the start of the Gen2Phen project (1). These had been constructed using a plethora of different technologies and designs. Most of the resulting databases were rather primitive in implementation and small in scale (2). The Universal Mutation Database (UMD http://www.umd.be/, (3)) and the Leiden Open-source Variation Database (LOVDhttp://www.lovd.nl, (4)) ‘LSDB-in-a-box’ applications have solved this problem by supporting existing LSDBs that wished to transfer to our platforms. From the start of the project, WP4 has created many new LSDBs for interested researchers. On June 11, 2010, 1482 LSDBs, of which the majority is LOVD-based, were listed on the Waystation website (http://www.centralmutations.org). Many of the clinically important genes, however, are covered by several databases. WP4 Activity 4.2 LSDB Creation, which is led by LUMC, culminates in this deliverable: the creation of foundational LSDBs for the remaining diseaserelated genes. The steps involved to reach this milestone and to populate the databases are described below.
2. DESCRIPTION OF WORK
Previously, we have successfully demonstrated the feasibility of creating many LSDBs in a single LOVD2 installation (http://www.lovd.nl/MR) in the Mental Retardation database pilot project. For this >500 LSDBs for genes on the X chromosome were created to present the results of a large-scale resequencing study in patients with X-linked mental retardation (5). We decided to use the same approach and created a new LOVD2 installation for the LOVD Mendelian Genes database (http://www.lovd.nl/mendelian_genes). The next step was the creation of a list of genes involved in disorders with Mendelian inheritance with the necessary information according to the Human Variome Project guidelines (6) to automatically setup separate LSDBs for each gene. The June 2010 morbid map from the Online Mendelian Inheritance in Man database (OMIM http://www.ncbi.nlm.nih.gov/Omim/, (7)) contains 5508 entries with the loci of all 4348 mapped diseases, 189 traits and 967 susceptibilities. Not all the genes involved in these phenotypes have been identified yet. OMIM lists 2436 genes with one or more allelic variants associated with a phenotype. All human gene symbols with the necessary information for the gene homepage (e.g. reference sequence accession numbers, cross-references, etc.) were downloaded from the Human Gene Nomenclature Committee website (HGNC http://www.genenames.org/, (8)). This was combined with MIM disease numbers from OMIM using Biomart (http://www.biomart.org/, (9)). Mendelian disease genes for which no
© Copyright 2010 GEN2PHEN Consortium
D4.4 Foundational LSDBs for all disease-related genes
WP4 - Genetics G2P databases Author(s): P. Taschner (LUMC), I. Fokkema (LUMC), J. den Dunnen (LUMC) Security: PU Version: v1.3
Final
HEALTH-200754
6/9
LSDB existed or for which only static web pages were available are included in the Mendelian Genes database (See https://grenada.lumc.nl/LOVD2/mendelian_genes/status.php for an overview of the 2118 disease-related genes and reported variants). This may help to entice the maintainers of static web pages to switch to using LSDB software. This achievement was the final step to reach milestone M13: Completion of web-accessible foundational LSDBs for all Mendelian disease-related genes. It is also a major step towards a standardised and integrated ‘federated’ LSDB database with common search tools, query interfaces, and data output formats, since most existing databases are using LOVD2 software. Clinicians and researchers can access the database to submit new variants in different genes associated with specific diseases or phenotypes or to search for sequence variants in specific genes, although the Mendelian Genes database only contains several hundreds of variants at the moment (Fig. 1). The Mendelian Genes database and all other LOVD installations hosted on our servers are accessible with the centralised search capabilities of the LOVD API (See http://www.gen2phen.org/post/development-lovd-restful-atom-webservice for a description and examples). The Human Genome Variation Society (HGVS) sequence variant nomenclature plays an important role in the correct search and exchange of data in LSDBs and genomic databases (See http://www.hgvs.org/mutnomen , (10)). The Mendelian Genes database also uses LOVD’s Mutalyzer module to ensure the use of the unambiguous HGVS sequence variant descriptions.
Fig. 1. Mendelian Gene LSDBs ordered by the number of variants in the Manager’s View. All genes have reference sequences, but some of them have not yet been selected by the curator for display on the LOVD homepage (RefSeq? column).
© Copyright 2010 GEN2PHEN Consortium
D4.4 Foundational LSDBs for all disease-related genes
WP4 - Genetics G2P databases Author(s): P. Taschner (LUMC), I. Fokkema (LUMC), J. den Dunnen (LUMC) Security: PU Version: v1.3
Final
HEALTH-200754
7/9
3. FUTURE WORK
We have created foundational LSDBs for all disease-related genes, but these are mostly empty repositories for sequence variation data for each gene involved in Mendelian disorders. Although empty LSDBs are not very useful, their mere existence invalidates excuses used by DNA diagnostic labs and researchers not to submit data into the public domain. Journal editors and reviewers can now refute such excuses by referral to the Mendelian genes database. They also have no excuse to exempt authors from submission of variant data before accepting a manuscript. Filling databases without proper curation is also debatable, because it might result in low quality data. On the other hand, these databases might also attract potential curators by their content, i.e. phenotypic information suggesting an association with particular variants, and evolve into a high quality database. WP4 will continue to support curators with the migration of their static databases to this or other database systems to accomplish a standardised and integrated ‘federated’ LSDB. 3.1. Populating the Mendelian Genes database Without advertisement, the Mendelian genes database is already receiving submissions. To provide initial content, WP7 activities to populate the foundational LSDBs were foreseen in the original Gen2Phen Description of Work. For a start, variants can be retrieved from comprehensive databases, which contain information about all genes. OMIM and the Human Gene Mutation Database (HGMD - www.hgmd.cf.ac.uk, (11)) contain many variants, which have been extracted from the literature. The public part of HGMD contains no detailed information about the frequency of the variant and the phenotype. Other potential sources are the Single Nucleotide Polymorphism database (dbSNP, (12)), the pharmacogenetics database (PharmGKB - http://www.pharmgkb.org/, (13)) and SwissVar (http://www.expasy.org/swissvar/, (14)). Descriptions from these databases have to be reformatted to HGVS format before import. 
As demonstrated by our Mental Retardation database pilot project, researchers can submit new variants identified by whole genome or exome resequencing. Since the quality of next generation sequencing data varies, variant submissions should be accompanied by evidence codes or other information indicating the likelihood that each variant is not a sequencing error. The next challenge will be to get data from diagnostic laboratories, either directly or via the Diagnostic Mutation Database (DMudb - http://www.ngrl.org.uk/Manchester/projects/informatics/dmudb) or the Café for Routine Genetic data Exchange (Café Rouge - http://www.caferouge.org/).
3.2. Data curation
Although volunteers regularly contact us to become database curators, most of the new LSDBs have a curator vacancy. As part of WP4 Activity 4.6, curators (‘guardians’ of the foundational databases) will be enlisted from a list of experts consulted in WP2, from the HVP and HGVS communities, and from other sources such as WikiProteins members, to perform this task. Another source of curators will be provided by contacting publishers to
suggest LSDB creation each time a new disease gene has been identified. Interest in curatorship may also be raised by participation in the training activities organized in WP8 or by the dissemination activities organized in WP9. In this ‘federated’ LSDB system, the LSDB curators will retain control and ownership of their own specialised database and its contents, which will exist as part of a seamless holistic system. This effort will be led and realized by ULEIC.
3.3. Future developments
The Mendelian Genes database described here will have to be extended on a regular basis with new disease genes, which are being discovered rapidly through whole genome or exome resequencing. It provides a sufficiently organised and mature database infrastructure to gather, store, integrate and query variant and phenotype data from the gene-to-disease perspective mostly used by researchers. For clinicians, user interfaces supporting a disease-to-gene view of clinically relevant mutations still need to be developed.
For the most efficient retrieval of data across genes in different databases, two levels of standardization are necessary: the database syntax is already standardized using the common Gen2Phen data model, but semantic standardization is also required for full interoperability. This can be achieved by promoting the use of controlled vocabularies or ontologies to describe the contents of database fields. An overview of existing ontologies can be found via the EBI Ontology Lookup Service (http://www.ebi.ac.uk/ontology-lookup/) or the bioportal of the National Center for Biomedical Ontology (http://bioportal.bioontology.org/). Curators are strongly encouraged to select from these controlled vocabularies or ontologies a set of terms appropriate for the phenotypes associated with the variants in their LSDBs. If specific terms are missing, the curator should contact the ontology developers and discuss the different options with the user community, consisting of researchers and clinicians.
To facilitate the use of controlled vocabularies or ontologies during data submission, selection lists with appropriate terms could be implemented in LSDB software. This can be seen as a logical step towards a ‘federated’ holistic Gen2Phen database.
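A selection list of this kind might be sketched as follows. This is a hypothetical illustration and not part of any existing LSDB software: the `Term` class, the `EX:` identifiers and the phenotype labels are placeholders standing in for terms a curator would pick from a real ontology, and the function names are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    term_id: str  # placeholder identifier, not a real ontology accession
    label: str    # human-readable label shown in the selection list


# Placeholder vocabulary chosen by a hypothetical curator.
CURATOR_TERMS = (
    Term("EX:0001", "muscle weakness"),
    Term("EX:0002", "hearing loss"),
    Term("EX:0003", "intellectual disability"),
)


def selection_list(terms):
    """Return (id, label) pairs sorted by label, ready for a drop-down menu."""
    return sorted(((t.term_id, t.label) for t in terms), key=lambda pair: pair[1])


def validate_phenotype(submitted_id, terms):
    """Return the matching Term, or raise ValueError for free text or unknown ids."""
    by_id = {t.term_id: t for t in terms}
    try:
        return by_id[submitted_id]
    except KeyError:
        raise ValueError(
            f"{submitted_id!r} is not in the curator's term list"
        ) from None
```

Offering only the curated identifiers at submission time, and rejecting free text, keeps the phenotype field comparable across databases. A rejected value could also be logged as a candidate term to discuss with the ontology developers.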
References
1. Horaitis O, Talbot CC Jr, Phommarinh M, Phillips KM, Cotton RG. 2007. A database of locus-specific databases. Nat Genet 39:425.
2. Patrinos GP, Brookes AJ. 2005. DNA, diseases and databases: disastrously deficient. Trends Genet 21:333-338.
3. Beroud C, et al. 2005. UMD (universal mutation database). Hum Mutat 26:184-191.
4. Fokkema IF, den Dunnen JT, Taschner PE. 2005. LOVD: easy creation of a locus-specific sequence variation database using an "LSDB-in-a-Box" approach. Hum Mutat 26:63-68.
5. Tarpey PS, et al. 2009. A systematic, large-scale resequencing screen of X-chromosome coding exons in mental retardation. Nat Genet 41:535-543.
6. Cotton RG, Auerbach AD, Beckmann JS, Blumenfeld OO, Brookes AJ, Brown AF, Carrera P, Cox DW, Gottlieb B, Greenblatt MS, Hilbert P, Lehvaslaiho H, Liang P, Marsh S, Nebert DW, Povey S, Rossetti S, Scriver CR, Summar M, Tolan DR, Verma IC, Vihinen M, den Dunnen JT. 2008. Recommendations for locus-specific databases and their curation. Hum Mutat 29:2-5.
7. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. 2005. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 33:D514-D517.
8. Bruford EA, Lush MJ, Wright MW, Sneddon TP, Povey S, Birney E. 2008. The HGNC Database in 2008: a resource for the human genome. Nucleic Acids Res 36 (Database issue):D445-448.
9. Smedley D, Haider S, Ballester B, Holland R, London D, Thorisson G, Kasprzyk A. 2009. BioMart--biological queries made easy. BMC Genomics 10:22.
10. den Dunnen JT, Antonarakis SE. 2000. Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat 15:7-12.
11. Krawczak M, Cooper DN. 1997. The human gene mutation database. Trends Genet 13:121-122.
12. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. 2001. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29:308-311.
13. Klein TE, Altman RB. 2004. PharmGKB: the pharmacogenetics and pharmacogenomics knowledge base. Pharmacogen J 4:1.
14. Mottaz A, David FP, Veuthey AL, Yip YL. 2010. Easy retrieval of single amino-acid polymorphisms and phenotype information using SwissVar. Bioinformatics 26:851-852.
This document is © 2010 by acaciareiche - all rights reserved.
