D2.1 - Workshop to Review the G2P Database Field and Current Data Models
Posted Thu, 07/05/2009 - 16:43 by
Acacia Reiche
Brief description of this document
Attachment: D2 1_v1.3final.doc (170 KB)
HEALTH-F4-2007-200754 http://www.gen2phen.org
D2.1. Workshop to Review the G2P Database Field and Current Data Models
WP2 – Domain analysis and community relations
V1.3 Final
Lead beneficiary: EMC
Date: 22/01/2009
Nature: Report
Dissemination level: PU (Public)
© Copyright 2009 GEN2PHEN Consortium
D2.1 - Workshop to Review the G2P Database Field and Current Data Models
WP2 – Domain analysis and community relations
HEALTH-200754
Author(s): Christina Mitropoulou, George P. Patrinos
Security: PU
Version: v1.3 Final
TABLE OF CONTENTS
DOCUMENT INFORMATION
DOCUMENT HISTORY
1. INTRODUCTION
2. APPROACH OVERVIEW
2.1. GEN2PHEN DOMAIN MODELLING MEETING
2.2. HUMAN VARIOME PROJECT PLANNING MEETING
3. OUTCOMES AND FUTURE PLANS
Document Information
Grant Agreement Number: HEALTH-F4-2007-200754
Acronym: GEN2PHEN
Full title: Genotype-To-Phenotype Databases: A Holistic Solution
Project URL: http://www.GEN2PHEN.org
EU Project officer: Frederick Marcus (Frederick.Marcus@ec.europa.eu)
Deliverable: Number D2.1; Title: Workshop to Review the G2P Database Field and Current Data Models
Work package: Number 2; Title: WP2 – Domain analysis and community relations
Delivery date: Contractual December 2008; Actual December 2008
Status: Version 1.3, final
Nature: Report
Dissemination Level: Public
Authors (Partner): 9. EMC
Responsible Author: George P. Patrinos; Partner: EMC; Email: g.patrinos@erasmusmc.nl; Phone: +30-6958.008355
Document History
Name: D2.1 Workshop report: Review of the G2P database field and current data models

Date        Version  Description
15.11.2008  1        CM Draft
27.11.2008  1.1      GP Edit
10.12.2008  1.2      AJB review
22.01.2009  1.3      DD review, GP Final
1. INTRODUCTION
Work Package 2 (WP2) has two main objectives: (1) using a range of approaches, we shall consult and seek expert opinion, on an ongoing basis, from key stakeholders, G2P database developers, data creators, data users, and the entire G2P database community, and (2) by technically analyzing the field, we shall define the genetics and genomics database domain in order to understand the key features of related database projects. In both of these activities, the ultimate goal is to define the real needs, desires, and challenges facing the G2P community, to document them, and thereby to fine-tune and focus the GEN2PHEN project. The work will not only be key in helping us to gain the trust of the end-users of GEN2PHEN developments, but it will also ensure that we build things with an optimal system architecture, adopt the most appropriate and ethically sound approaches for data gathering/submission, use sound data models, and bring enhanced visibility of the project’s aims and goals to interested parties. Our analysis will focus on data models and will include a comparison of the requirements of the various types of G2P databases, database development and curation criteria, and the existing knowledge regarding G2P-related ontologies. This work shall cover human and model organism G2P database projects, as well as data integration systems.
2. APPROACH OVERVIEW
To strengthen and benefit from good community relations, GEN2PHEN will undertake extensive consultations with various G2P field stakeholders, ranging from G2P database technologists through to biobank teams, data creators, and G2P data end-users. These consultations will enable us to compare, contrast and synergise our project plans with the activities of others. To this end, we have committed to consult with most leading G2P database teams, the broad locus-specific database (LSDB) community, and a large range of data-generation projects (such as the GAIN and WTCCC initiatives). Additionally, our plan is to seek input from leading coordination and unification efforts, such as the Human Variome Project and the Public Population Project in Genomics (P3G) consortium, and we shall consider the activities of other EU projects (past and present), such as EuroGenTest, GenomEUtwin and INFOBIOMED. Finally, we will interact with various human genetics societies and genetic journals, particularly within Europe. Many of these connections have now been made, involving a range of approaches such as meetings, focused workshops, e-mail exchanges, teleconferences, surveys, etc. The many existing relations GEN2PHEN members have with others in the field provided us with a good starting point for these various consulting efforts.
In particular, to achieve our stated goals we have so far (co-)organized two workshops to assess the needs of the G2P community. The first workshop, held at the EBI in Hinxton (UK) during April 2008, concentrated on data/object models and related technologies, and involved partners from the GEN2PHEN consortium. The second meeting, held at a venue on the Costa Brava (Spain) during May 2008, concentrated on a wide range of issues around G2P data gathering, databasing, and utilization. This meeting was co-organized with the Human Variome Project and the Human Genome Variation Society (http://www.hgvs.org), and involved many participants from the broad G2P community. Detailed reports on these two workshops are provided as appendices I and II, respectively. This document outlines the main findings and conclusions of these two meetings, from the perspective of the GEN2PHEN consortium objectives.
2.1. GEN2PHEN DOMAIN MODELLING MEETING

This meeting was organized among the key GEN2PHEN partners to address the following issues relating to technical standards across the G2P databasing domain:
• To develop and prioritise GEN2PHEN use cases,
• To understand resources already provided by partners both technically and in terms of existing use cases, local data models, use of ontologies, and requirements for integration,
• To identify commonalities and differences between the existing GEN2PHEN resources,
• To understand the process of object modelling in the biomedical domain,
• To share previous experience of object modelling in the biomedical domain,
• To evaluate relevant public domain models.
A detailed report on this meeting is provided in appendix I. The meeting demonstrated that several public data models currently exist in the Genotype-to-Phenotype space, and these were evaluated for relevance, domain coverage compared to existing resources, ease of use, and complexity. One major initiative is the Phenotype and Genotype Experiment Object Model (PaGE-OM). This started life as the Polymorphism Markup Language (PML), which became registered as an OMG standard
model (accepted in 2005). It was developed primarily to allow the exchange of SNP data between resources and analysis applications. PaGE-OM is a further developed version of PML, with its scope extended to include phenotype and study/experiment data. The model is necessarily more complex than PML, and it is currently being considered by the OMG formalisation process. GEN2PHEN use cases were not explicitly considered in the development of PaGE-OM, as the model was submitted to OMG concurrently with the start of the GEN2PHEN project, but various database implementations (including the first version of HGVbase-G2P, which is now an activity in GEN2PHEN) have validated the utility of PaGE-OM.

The FuGE model was produced in response to attempts to merge two existing models: the Microarray Object Model (MAGE-OM) and Pedro, a proteomics model. Pedro and MAGE-OM covered different technologies, proteomics and microarrays respectively. Both models represent information on the technology and the biology, but in a technology-specific way. FuGE provides a technology-independent base object model from which application-specific models can be derived.

By considering PaGE-OM, FuGE, and MAGE-OM, as well as by reviewing some pertinent database designs and by assembling a list of relevant use cases, the workshop concluded that there is a need for a great deal more standardisation on matters of syntax, semantics, and scope of G2P databasing projects. Without this, the current diverse and dispersed activities in G2P databasing will be extremely difficult (or impossible) to harmonise and integrate. Particular attention was given to the immediate need for improved standardisation of data/object modelling, both in the core design of databases and in terms of exchange formats for channelling data flows.
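The idea of a technology-independent base model with application-specific models derived from it can be illustrated with a minimal Python sketch. This is not FuGE, MAGE-OM, or PaGE-OM: all class and field names below (Protocol, Investigation, MicroarrayInvestigation, GenotypeInvestigation, summarise) are hypothetical and chosen only to show the pattern.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Protocol:
    """Technology-independent description of how data were produced."""
    name: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class Investigation:
    """Hypothetical generic base record shared by every derived model."""
    identifier: str
    protocols: List[Protocol] = field(default_factory=list)

    def describe(self) -> str:
        steps = ", ".join(p.name for p in self.protocols) or "no protocols"
        return f"{type(self).__name__} {self.identifier}: {steps}"


@dataclass
class MicroarrayInvestigation(Investigation):
    """Hypothetical microarray-specific extension of the base record."""
    array_design: str = ""


@dataclass
class GenotypeInvestigation(Investigation):
    """Hypothetical genotyping-specific extension of the base record."""
    marker_ids: List[str] = field(default_factory=list)


def summarise(investigations: List[Investigation]) -> List[str]:
    """Works for any derived model because it relies only on the base fields."""
    return [inv.describe() for inv in investigations]
```

Code written against the shared base (here, summarise) does not need to change when a new technology-specific model is added, which is the kind of harmonisation benefit the workshop was looking for.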
The international PaGE-OM project represents a good first step towards a standard G2P data model, but further work is warranted to optimise the model, to update it for new data challenges, and to harmonise it with other models such as FuGE and MAGE. This is especially needed in the context of developing data exchange formats that would be compatible across these different modelling domains.

Most troubling is the high level of diversity apparent in the realm of Locus Specific Databases (LSDBs) and related projects, not only in terms of content heterogeneity (reflecting differences in scope and objective) but especially in terms of the lack of obvious compatibility between their data models. Indeed, no LSDB or mutation database considered at the meeting was found to have a formalised and specifically designed data model underlying the database schema. This poses a serious hurdle to the development of LSDBs and, most importantly, to their interoperability. Currently there are over 700 such LSDBs (http://www.hgvs.org/dblist/glsdb.html), mostly web accessible. Complete collection and expert curation of gene sequence variants, and their coupling to phenotypic consequences (if any), will be essential for proper future healthcare and research.

Several partners also addressed the representation of ‘Phenotype’ in their respective resources, and so this question was evaluated further by the workshop. There are public domain efforts
modelling phenotype, not least that which lies at the heart of PaGE-OM, and activities in the model organism community (CASIMIR, PATO, ZFIN, etc). A likely GEN2PHEN use case may involve data integration with model organism databases, e.g. identification of mouse models based on comparison of mouse and human phenotypes, so compatibility needs should be investigated as part of GEN2PHEN. Explicit actions will therefore be taken to work with the mouse community on phenotype data modelling and ontology developments.

2.2. HUMAN VARIOME PROJECT PLANNING MEETING

This meeting was designed to examine a broad range of issues surrounding G2P databasing, with most emphasis upon data issues related to Mendelian type mutations and gene/disease specific information. It was organised in conjunction with the Human Variome Project (HVP). The HVP is an international effort to systematically identify genes, their mutations, and their variants associated with human disease. It aims to help orchestrate and optimise activities around linking clinical, medical, and research laboratories for developing knowledge housed within databases. The composite knowledge should be accessible to the research and medical communities to improve research strategies and clinical medical practice. An example of the need for the HVP as applied to neurological disorders has recently been published. Our joint meeting with HVP involved >100 participants, all of whom were invited to attend based upon their active involvement in relevant projects. We also involved members of the Human Genome Variation Society (HGVS).
The meeting agenda covered many areas, including:
• classifying genetic variation from unlinked clinical medicine or research laboratories
• capturing data from diagnostic and service laboratories
• assessment of mutation pathogenicity
• optimising data transfer
• streamlining data integration access
• questions of funding and governance
• the role of emerging countries
• ethical, legal and social issues
• questions of attribution and publication
• example pilot projects
A detailed report for these sessions is provided in appendix II. The main take home message of this meeting was that a great deal of work needs to be done in all the above areas, and gratifyingly, GEN2PHEN was explicitly recognised as a major and well
focussed project that is helping significantly to move the field forward in a globally coordinated manner. However, one striking ‘gap’ that no-one yet seems to be dealing with effectively is the overarching challenge of getting G2P research data to be used widely and suitably in clinical practice. The meeting agreed with GEN2PHEN that robust standards and generic database solutions need to be developed to assist genome variation capture, and that at least the core elements of this information should be free and publicly available.

With particular reference to Mendelian type mutations, the role of diagnostic labs in the data generation side of the equation should involve connections that enable their findings to be fed into the public domain system. To database and disseminate Mendelian type mutations, there clearly needs to be improved standardisation of the platforms and of the syntax and semantics of the managed information, exactly in line with GEN2PHEN plans. From the available off-the-shelf solutions, the Leiden Open Variation Database (LOVD - http://www.lovd.nl) and the Universal Mutation Database (UMD - http://www.umd.be/) were recognised as the leading systems. These must be further developed to enable querying across a range of distributed LSDBs to retrieve and analyze data. Relying upon a common database, language, and interoperability will enforce quality standards across clinical and research laboratories and will contribute towards data uniformity. In addition, the MUTbase software offers useful functionalities, such as an effective visualization, that could also be incorporated into LOVD and UMD. Another key area of immediate focus should be devising ways to validate data and data quality prior to its broader use in a federated LSDB network. Therefore, clinical and pathology data standards must be developed by experts in each genetic disorder for interpreting the effects of genetic variation.
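To make the point about validating data before it enters a federated LSDB network more concrete, the following Python sketch shows one possible shape for such a check. It is an assumption-laden illustration only: the field names in REQUIRED_FIELDS, the functions validate_record and partition, and the very loose prefix test on the variant description are hypothetical and are not taken from LOVD, UMD, or any agreed standard.

```python
from typing import Dict, List, Tuple

# Hypothetical minimal field set; not taken from LOVD, UMD, or any standard.
REQUIRED_FIELDS = ("gene", "reference_sequence", "variant", "source_database")


def validate_record(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not record.get(name, "").strip():
            problems.append(f"missing field: {name}")
    variant = record.get("variant", "").strip()
    # Deliberately loose check: only the description-level prefix is tested.
    if variant and not variant.startswith(("c.", "g.", "p.")):
        problems.append("variant does not start with c., g. or p.")
    return problems


def partition(
    records: List[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[Tuple[Dict[str, str], List[str]]]]:
    """Split records into those accepted for sharing and those rejected."""
    accepted, rejected = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected
```

In practice the rules would come from the community standards the meeting called for; the sketch only shows that rejected records can be returned together with the reasons, so that curators can correct them before resubmission.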
3. OUTCOMES AND FUTURE PLANS
Central conclusions that emerged from both meetings were as follows:
• The LOVD and UMD database suites stand as attractive solutions to fulfil the LSDB development and curation needs, both utilizing open-source software, i.e. PHP and SQL.
• Standards (syntax and semantics) must be further developed, and this should be done by a coordinated bottom-up approach, enabled by extensive workshops and consortia interactions.
Future activities in WP2 will concentrate upon a detailed technical analysis of the G2P databasing field, emphasising LSDBs, Diagnostic databases, and Genomics databases that contain either individual or summary level datasets. A comparative analysis is currently under way to consider these features, and also take note of how they are used in conjunction with
specific data curation criteria. Documented data model summaries will provide important supporting materials for the data model development work.
© 2009 GEN2PHEN Project. The GEN2PHEN Knowledge Centre has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.