D3.5 High-Level Domain Model Version 2, with Sample/Phenotype Focus
Posted Wed, 19/08/2009 - 11:54 by
Acacia Reiche
Attachment
Size
D3.5 High-Level_Domain_Model_Version_2_with_Sample_Phenotype_Focus.pdf
1.05 MB
HEALTH-F4-2007-200754
www.gen2phen.org
D3.5 High-Level Domain Model Version 2, with Sample/Phenotype Focus
WP3 – Standard data models and terminologies
Version: V5.0 Final
Lead beneficiary: EMBL
Date: 10/08/2009
Nature: Report
Dissemination level: PU (Public)
© Copyright 2009 GEN2PHEN Consortium
D3.5 High-Level Domain Model Version 2, with Sample/Phenotype Focus
WP3 – Standard data models and terminologies Authors: Tomasz Adamusiak, Juha Muilu, Morris Swertz, Helen Parkinson Security: PU Version: v5.0
Final
HEALTH-200754
TABLE OF CONTENTS
DOCUMENT INFORMATION
DOCUMENT HISTORY
1. INTRODUCTION
2. DESCRIPTION OF WORK
3. EXISTING MODEL EVALUATION
3.1. GENOMEUTWIN
3.2. PAGE-OM
3.3. XGAP
4. GEN2PHEN PHENOTYPE MODEL
4.1. PHENOTYPE MODEL CLASS DESCRIPTIONS
4.2. OBJECT INSTANCE
5. PHENOTYPE MODEL IMPLEMENTATION AND TESTING
6. FUTURE PLANS
6.1. A HIGH-LEVEL DOMAIN MODEL VERSION 3 (D3.6)
6.2. DERIVATION AND SPECIFICATION OF EXCHANGE FORMAT (D3.7)
7. ABBREVIATIONS
REFERENCES
APPENDIX I - Report on the First GEN2PHEN Phenotype Workshop
APPENDIX II - GEN2PHEN Phenotype Model Reference Implementation
Document Information
Grant Agreement Number: HEALTH-F4-2007-200754
Acronym: GEN2PHEN
Full title: Genotype-To-Phenotype Databases: A Holistic Solution
Project URL: http://www.gen2phen.org
EU Project officer: Frederick Marcus (Frederick.Marcus@ec.europa.eu)
Deliverable: Number D3.5, Title: High-Level Domain Model Version 2, with Sample/Phenotype Focus
Work package: Number 3, Title: WP3 – Standard data models and terminologies
Delivery date: Contractual June 2009, Actual August 2009
Status: Version 5.0, final
Nature: Report
Dissemination Level: Public
Authors (Partner): Tomasz Adamusiak (EMBL), Juha Muilu (UH.FGC), Morris Swertz (EMBL), Helen Parkinson (EMBL)
Responsible Author: Helen Parkinson, Partner: EMBL-EBI, Email: parkinson@ebi.ac.uk, Phone: +44 (0)1223 494 672
Document History
Name | Date | Version | Description
Tomasz Adamusiak | 16/6/2009 | 1 | First Draft Created
Helen Parkinson | 7/7/2009 | 2 | Internal Review
Tomasz Adamusiak | 12/7/2009 | 3 | Corrections
Helen Parkinson | 14/7/2009 | 4 | Comments
Helen Parkinson | 10/8/2009 | 5 | Review
Definitions
Partners of the GEN2PHEN Consortium are referred to herein according to the following codes:
• ULEIC – University of Leicester (UK) – Coordinator
• EMBL – European Molecular Biology Laboratory (Germany) – Beneficiary
• FIMIM – Fundació IMIM (Spain) – Beneficiary
• LUMC – Leiden University Medical Center (Netherlands) – Beneficiary
• INSERM – Institut National de la Santé et de la Recherche Médicale (France) – Beneficiary
• KI – Karolinska Institutet (Sweden) – Beneficiary
• FORTH – Foundation for Research and Technology Hellas (Greece) – Beneficiary
• CEA – Commissariat à l’Energie Atomique (France) – Beneficiary
• EMC – Erasmus Universitair Medisch Centrum Rotterdam (Netherlands) – Beneficiary
• UH.FGC – Helsingin Yliopisto (Finland) – Beneficiary
• UAVR – Universidade de Aveiro (Portugal) – Beneficiary
• UWC – University of the Western Cape (South Africa) – Beneficiary
• CSIR – Council of Scientific and Industrial Research (India) – Beneficiary
• SIB – Swiss Institute of Bioinformatics (Switzerland) – Beneficiary
• UNIMAN – The University of Manchester (UK) – Beneficiary
• BIOBASE – BioBase GmbH. (Germany) – Beneficiary
• deCODE – Islensk Erfoagreining EH (Iceland) – Beneficiary
• PHENO – Phenosystems S.A. (Belgium) – Beneficiary
• BCP – Biocomputing Platforms Ltd. Oy (Finland) – Beneficiary
Grant Agreement: The agreement signed between the beneficiaries and the European Commission for the undertaking of the GEN2PHEN project (HEALTH-200754).
Project: The sum of all activities carried out in the framework of the Grant Agreement by the Consortium.
Work plan: Schedule of tasks, deliverables, efforts, dates and responsibilities corresponding to the work to be carried out for the GEN2PHEN project, as specified in Annex I to the Grant Agreement.
Consortium: The GEN2PHEN Consortium, formed by the above-mentioned legal entities.
Consortium agreement: Agreement concluded amongst GEN2PHEN participants for the implementation of the Grant Agreement. Such an agreement shall not affect the parties’ obligations to the Community and/or to one another arising from the Grant Agreement.
1. INTRODUCTION
Work package 3 ‘Standard data models and terminologies’ provides domain standards to develop GEN2PHEN specific architecture, facilitate data exchange and integrate data across existing and emerging resources. This work package is focused on providing standards to act as the foundation for much of the database development activities of other work packages. The work package objectives include the rapid development of a standard data model(s) capable of representing the minimum agreed content standard (as determined by WP2) and a derived data exchange format. Data models developed in coordination with WP3 will have several uses in GEN2PHEN: data from pre-existing databases will be mapped to generate data in a derived data exchange format, thus offering a flexible solution for integrating and exchanging existing and new data. In this respect, data model development is a necessary prerequisite, initially separated from implementation details.
2. DESCRIPTION OF WORK
The focus of the GEN2PHEN High-Level Domain Model Version 2, with Sample/Phenotype Focus development process is:
• To evaluate relevant public phenotype models
• To develop a core GEN2PHEN phenotype model
• To support primary GEN2PHEN use cases, especially in LSDB and HTP domains
The two GEN2PHEN modelling workshops, Hinxton (April 9-11, 2008) and Helsinki (January 19-22, 2009), laid the groundwork for specific sub-domain development. Work continued during the first GEN2PHEN Phenotype Workshop (Geneva, May 7-8, 2009), hosted by SIB. Use cases were gathered, models were developed, and minimum content standards to be used in exchanging data between partners were discussed in the context of specific phenotype extensions. See Appendix 1 for detailed workshop proceedings. External invited participants from the epidemiology, medical genetics, ontology development and model organism communities provided expertise and use cases beyond those of Consortium Partners.
3. Existing model evaluation
Several public data models (see footnote 1) currently exist in the phenotype space, and those closely aligned to GEN2PHEN were evaluated for relevance, domain coverage compared to existing resources, ease of use and complexity during the First Phenotype Workshop.

Footnote 1: Some of the data models have been documented at www.schemalet.org, which is an experimental wiki site for documenting use case specific data models.
3.1. GenomEUtwin
GenomEUtwin is a 5th Framework Programme project aimed at unifying studies of European volunteer twins to identify genes underlying common diseases. The GenomEUtwin object model has already been tested on large population cohorts by UH.FGC. See paragraph 4.1 in Appendix 1 for a diagram and more details of the model.
3.2. PAGE-OM
PAGE-OM is a complete OMG standard reference model that represents genotype data both in summary form and at the level of the individual. It also represents LSDB type data and phenotype, and supports some legacy technology use cases. PAGE-OM is very detailed and is useful as a reference model, meaning that GEN2PHEN specific models can be aligned to it and it can be used as a meta-mapping model for mapping external data representations. It is, however, rather complex, and one aim of WP3 modelling activities is to develop ‘modules’ whereby domain specific models can be developed, used alone, implemented and made interoperable. See paragraph 4.2 in Appendix 1 for a diagram and more details of the model.
3.3. XGAP
The XGAP model (http://www.xgap.org) addresses the data management, querying and integration challenges of system-wide genetics experiments. It provides a simple tabular text file format to exchange data between collaborators, a customizable data infrastructure to store, query and integrate data, and a foundation for analysis tools. See paragraph 4.3 in Appendix 1 for a diagram and more details of the model.
4. GEN2PHEN Phenotype Model
Figure 1. GEN2PHEN Phenotype Model
A GEN2PHEN phenotype model was developed during the Phenotype Workshop in Geneva, based on Partners’ input and invited domain experts’ opinions. It was later iterated through a series of face-to-face meetings and teleconferences among Partners. Figure 1 presents the 1.0 version of the model, constructed in Enterprise Architect. It is also available from the schemalet.org website, as well as in Enterprise Architect and XML formats from the GEN2PHEN SVN (https://svn.gene.le.ac.uk/gen2phen/trunk/object_models/).

4.1. Phenotype Model class descriptions
• Individual – Individual. Subject of a study.
• Inferred_value – Inferred conclusion, derived from zero or many Observed_value instances.
• Observable_feature – A measurable feature of an Individual, e.g. blood pressure.
• Observation_target – Super class of all observation targets like Individual or Panel.
• Ontology_term – Term defined in a specific namespace (ontology source). All names and terms should be defined using ontology terms whenever possible.
• Observed_value – Specific value measured in an experiment, e.g. 120 (systolic BP, mmHg).
• Panel – Collection of Individual instances.
• Protocol – Describes how a measurement is to be performed, or a specific Standard Operating Procedure.
• Protocol_application – Describes how a Protocol was instantiated in a particular case, i.e. how the measurement was done, e.g. on 16/6/2009 by Tomasz Adamusiak.
• Variable_definition – Extends the Observable_feature class to enable a precise definition of the feature in the applications that use it (for example, it has a unit).
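The class descriptions above can be sketched as plain Python dataclasses. This is only an illustrative reading of the list, not the reference implementation: the attribute names (name, unit, value, derived_from and so on) are assumptions, since the descriptions name classes but not attributes, and modelling Inferred_value as a subclass of Observed_value is likewise an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OntologyTerm:
    """Term defined in a specific namespace (ontology source)."""
    term: str
    namespace: str  # hypothetical attribute naming the ontology source


@dataclass
class ObservationTarget:
    """Super class of all observation targets."""
    name: str


@dataclass
class Individual(ObservationTarget):
    """Subject of a study."""


@dataclass
class Panel(ObservationTarget):
    """Collection of Individual instances."""
    individuals: List[Individual] = field(default_factory=list)


@dataclass
class ObservableFeature:
    """A measurable feature of an Individual, e.g. blood pressure."""
    name: str


@dataclass
class VariableDefinition(ObservableFeature):
    """Observable feature made precise for an application, e.g. with a unit."""
    unit: Optional[OntologyTerm] = None


@dataclass
class Protocol:
    """Describes how a measurement is to be performed."""
    name: str


@dataclass
class ProtocolApplication:
    """One concrete use of a Protocol (who applied it, and when)."""
    protocol: Protocol
    performed_on: str  # hypothetical free-text date, e.g. "16/6/2009"
    performer: str


@dataclass
class ObservedValue:
    """Specific value measured in an experiment."""
    target: ObservationTarget
    feature: ObservableFeature
    value: str
    protocol_application: Optional[ProtocolApplication] = None


@dataclass
class InferredValue(ObservedValue):
    """Conclusion derived from zero or many observed values (assumed subclass)."""
    derived_from: List[ObservedValue] = field(default_factory=list)
```

Keeping Individual and Panel under a shared ObservationTarget base means that a value can be attached to either a single subject or a group without changing the ObservedValue class.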
Mappings to PaGE-OM and XGAP are available on the schemalet wiki at: http://www.schemalet.org/mediawiki/index.php/COMMON:Phenotype

4.2. Object instance
Figure 2. GEN2PHEN Phenotype Model object instance
An example instance of the model is shown in Figure 2. A blood pressure measuring protocol was applied to observation target Juha on 25/5/2009. Two values were measured at 10am: 150 and 90, which were systolic and diastolic blood pressure in mmHg respectively.
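The blood pressure instance described above can be written down as linked records. The sketch below uses plain dictionaries so that it stands alone; the key names ("class", "unit", "time" and so on) and the protocol name are hypothetical, and only the target name, date, time, values and unit come from the text.

```python
from datetime import date, time

# Records mirror the classes of the phenotype model; key names are assumptions.
target = {"class": "Individual", "name": "Juha"}
protocol = {"class": "Protocol", "name": "blood pressure measurement"}
application = {
    "class": "Protocol_application",
    "protocol": protocol,
    "date": date(2009, 5, 25),
}
mmhg = {"class": "Ontology_term", "term": "mmHg"}
systolic = {"class": "Variable_definition", "name": "systolic blood pressure", "unit": mmhg}
diastolic = {"class": "Variable_definition", "name": "diastolic blood pressure", "unit": mmhg}

observed_values = [
    {"class": "Observed_value", "target": target, "feature": systolic,
     "value": 150, "time": time(10, 0), "protocol_application": application},
    {"class": "Observed_value", "target": target, "feature": diastolic,
     "value": 90, "time": time(10, 0), "protocol_application": application},
]


def describe(observed):
    """Render one observed value as 'target: feature = value unit'."""
    feature = observed["feature"]
    return (f"{observed['target']['name']}: {feature['name']} = "
            f"{observed['value']} {feature['unit']['term']}")


for observed in observed_values:
    print(describe(observed))
```

Both observed values point at the same Protocol_application record, which is how the instance expresses that they were taken in a single measurement session.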
Figure 3. Inferred value example
The instance depicted in Figure 3 extends the previous one to show how a previously measured blood pressure can be used to infer disease status. A separate inference protocol was applied on 31/5/2009, and a high blood pressure was observed at 2pm.
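A minimal sketch of such an inference step is given below. The thresholds (140/90 mmHg), the feature names and the class attributes are assumptions made for illustration only; the deliverable does not specify the inference protocol, and the thresholds are not clinical guidance.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class ObservedValue:
    target: str
    feature: str
    value: float
    unit: str
    time: datetime


@dataclass(frozen=True)
class InferredValue:
    target: str
    feature: str
    value: str
    time: datetime
    derived_from: Tuple[ObservedValue, ...]  # the observations the inference used


# Illustrative thresholds only (assumption, not taken from the deliverable).
SYSTOLIC_LIMIT = 140
DIASTOLIC_LIMIT = 90


def infer_blood_pressure_status(observations, when):
    """Apply a hypothetical inference protocol to earlier observed values."""
    by_feature = {o.feature: o for o in observations}
    systolic = by_feature["systolic blood pressure"]
    diastolic = by_feature["diastolic blood pressure"]
    high = systolic.value >= SYSTOLIC_LIMIT or diastolic.value >= DIASTOLIC_LIMIT
    return InferredValue(
        target=systolic.target,
        feature="blood pressure status",
        value="high blood pressure" if high else "normal blood pressure",
        time=when,
        derived_from=tuple(observations),
    )


measured_at = datetime(2009, 5, 25, 10, 0)
observations = [
    ObservedValue("Juha", "systolic blood pressure", 150, "mmHg", measured_at),
    ObservedValue("Juha", "diastolic blood pressure", 90, "mmHg", measured_at),
]
status = infer_blood_pressure_status(observations, datetime(2009, 5, 31, 14, 0))
print(status.value)
```

The inferred value keeps references to the observed values it was derived from, so the provenance of the conclusion stays traceable.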
5. Phenotype Model implementation and testing
Figure 4. GEN2PHEN Phenotype Model implementation in Molgenis notation
In order to test and develop the GEN2PHEN phenotype model we have collaborated with the developers of MOLGENIS [1, 2]. MOLGENIS is an open source software platform to efficiently design, implement, and autogenerate databases, APIs, and web applications from object models. Its power is in the use of models and generators, so the best solutions are easily reused between applications. In one simple step, MOLGENIS generates a database (MySQL or PostgreSQL), a web-based GUI, programmatic interfaces including a Java API and SOAP web services usable in tools like Taverna (http://taverna.sourceforge.net) and by statistical scripts written in the R language (http://www.r-project.org), as well as full documentation of the object model. Several Java plug-in mechanisms are also available to customize the generated software.

By developing smaller models and ensuring interoperability using MOLGENIS, some or all of the models can be consumed by various partners, the majority of whom have use cases which encompass only some of the models. MOLGENIS has been successfully used within the GEN2PHEN Consortium for:
1. MAGE-TAB OM: http://magetab-om.sourceforge.net
2. LSDB object model developed in the course of the Second Modelling Workshop: http://magetab-om.sourceforge.net/lsdb/1.0/object_model.html
3. An example LSDB - Findis, the Finnish National Mutation Database (NMDB): http://www.schemalet.org/mediawiki/index.php/FINDIS:Database

Figure 4 depicts the GEN2PHEN Phenotype Model as implemented on the MOLGENIS platform. Full documentation is available in Appendix 2, and a working implementation of the model, comprising a back end database, GUI, etc., is available from: http://wwwdev.ebi.ac.uk/microarray-srv/pheno/
6. FUTURE PLANS
6.1. A High-Level Domain Model Version 3 (D3.6)
This will be an improved and tested set of standard UML data models for all required domains, ready to be implemented by all Partners. Feedback from Partners will then be used to provide the ultimate design underpinnings for all GEN2PHEN databases in Iterative Specialized Domain Modelling Complete (D3.9). These sub-domain models, including the GEN2PHEN Phenotype Model, will all be extensively tested, and a reference implementation will be provided on the MOLGENIS platform.

6.2. Derivation and Specification of Exchange Format (D3.7)
The priorities for data formats in GEN2PHEN are the data exchange between locus specific databases and central repositories, and HTP data. The modelling work to date has separated these domains to support immediate needs for data exchange. The models developed will eventually support the phenotype extension reported here as well. Validation of the LSDB data model commenced in 2009 by working with the existing LSDBs inside and outside the GEN2PHEN consortium, most of whom have existing data formats. Those formats will support the data content of the GEN2PHEN Phenotype Model. Validation of the MAGE-TAB OM is underway and progress is promising. We envisage that the phenotypic descriptors, e.g. membership of a cohort through a shared phenotype or trait, will require an extension of MAGE-TAB, and that the requirement to provide details of markers in the context of HTP data will also require an extension.
7. Abbreviations
HGVS – Human Genome Variation Society
LSDB – Locus Specific Database
XGAP – Xtensible Genotype And Phenotype data platform
PaGE-OM – Phenotype and Genotype Experiment object model
REFERENCES
1. Swertz, M.A., et al., Molecular Genetics Information System (MOLGENIS): alternatives in developing local experimental genomics databases. Bioinformatics, 2004. 20(13): p. 2075-83.
2. Swertz, M.A. and R.C. Jansen, Beyond standardization: dynamic software infrastructures for systems biology. Nat Rev Genet, 2007. 8(3): p. 235-43.
3. Wildeman, M., et al., Improving sequence variant descriptions in mutation databases and literature using the Mutalyzer sequence variation nomenclature checker. Hum Mutat, 2008. 29(1): p. 6-13.
D3.5 High-Level Domain Model Version 2, with Sample/Phenotype Focus. Appendix 1 - Report on the First GEN2PHEN Phenotype Workshop
WP3 – Standard data models and terminologies Authors: Tomasz Adamusiak, Juha Muilu, Morris Swertz, Gudmundur Thorisson, Helen Parkinson Security: PU Version: v1.0
HEALTH-200754
Appendix 1 Report on the First GEN2PHEN Phenotype Workshop
Host: Swiss Institute of Bioinformatics (SIB)
Venue: Centre Medicale Universitaire (CMU), 1 Rue Michel-Servet, CH1211 Geneva
Dates: 7-8 May 2009
1. Overview
The First GEN2PHEN Phenotype Workshop (Geneva, 7-8 May 2009) was hosted by SIB as a follow-up to the Second Modelling Workshop hosted by UH.FGC (Helsinki, 19-22.1.2009). See http://askja.gene.le.ac.uk/drupal5/Modelling_Workshop_2_Report for details on the previous workshop. Use cases and models evaluated previously served as a basis for developing minimum content standards for exchanging phenotypic information among partners, as well as for building and evaluating a preliminary phenotype model in partial fulfilment of WP3 deliverable D3.5. Use cases identified in the Genotype to Phenotype domain in a previous deliverable, D3.1, were subsequently refined by contact with the wider community and used to drive the development of a domain independent phenotype model. Various pre-existing domain models exist, and the workshop began the process of evaluating these for GEN2PHEN needs. This report describes the workshop content.
2. Participants
Consortium members

Name | Organisation
Andrew Devereau | UNIMAN
Mike Cornell | UNIMAN
Veronique Humbertclaude | INSERM
Christophe Beroud | INSERM
Anna Pigeon | INSERM
David Atlan | PHENO
Gudmundur Thorisson | ULEIC
Sergio Matos | UAVR
Anne-Lise Veuthey | SIB
Lydie Bougueleret | SIB
Annais Mottaz | SIB
Lina Yip | SIB
Juha Muilu | UH.FGC
Helen Parkinson | EMBL
James Malone | EMBL
Tomasz Adamusiak | EMBL
Invited domain experts

Domain experts represented, among others, the following consortia: CASIMIR (www.casimir.org.uk), ENGAGE (www.euengage.org), GenomEUTwin (www.genomeutwin.org), BBMRI (www.bbmri.eu) and P3G (www.p3g.org).

Name | Organisation
Alan Rector | UNIMAN
Peter Robinson | Charite Universitaetsmedizin
John Hancock | MRC
Paul Burton | ULEIC
Isabel Fortier | ENEP
Morris Swertz | University Medical Center Groningen
Mauno Vihinen | EMBL
Maria Krestyaninowa | EMBL
Mike Gostev | EMBL
Ilkka Lappalainen | EMBL
Sraboni Ghost | Genionics
Abriel Hugues | Universitaet Bern
3. Agenda and slides
Agenda and speakers' slides are available from http://askja.gene.le.ac.uk/drupal5/content/firstphenotype-workshop-agenda
4. Models evaluated
4.1. TWIN:Phenotype
Observation is a phenotypic observation made by a specific method, which is documented under an observation framework. Classification is an inferred or classified conclusion drawn from one or more measurements (here, blood pressure). Ontology is the namespace (e.g. EUTwin) used for the vocabulary (i.e. high blood pressure, low blood pressure), and Classification method provides information on the classification specification. Time_accuracy is needed because it is not always possible to know the time exactly (e.g. in some cases the exact time cannot be given, and the date and month must be coded using an agreed convention). More information on the model is available on the Schemalet website: http://www.schemalet.org/mediawiki/index.php/TWIN:Phenotype
4.2. PAGEOM:Phenotype
Observable features (e.g. nose size) can be measured using different observation methods (e.g. a ruler), leading to single or multiple observed values (nose size) over one or more observation targets (individuals). Features can be categorised under different feature categories (e.g. clinical test, heart function, etc.). More information on the model is available on the Schemalet website: http://www.schemalet.org/mediawiki/index.php/PAGEOM:Phenotype
4.3. XGAP:Trait
XGAP-OM is the conceptual model behind the XGAP platform. It can be used to consistently model a wide variety of organisms, experimental designs, and biomolecular profiling technologies:
• Describe core experimental data using only four core data types: Trait, Subject, Data and DataElement.
• Add experimental design annotations using core FuGE data types: Investigation, Protocols and ProtocolApplications, OntologyTerms, etc.
• Consistently annotate Traits and Subjects using standardized extensions of Trait (e.g. Probe, Marker) and Subject (e.g. Individual, Strain).
• Consistently extend XGAP for new types of annotations by adding more types of Trait and Subject (e.g. add 'MassPeak' as a new Trait to annotate 'retentiontime' and 'mz').

More information on the model is available from http://www.xgap.org/objectmodel.html
5. Models developed
5.1. COMMON:Phenotype
Note: the attributes were not added during the workshop, and the model will be amended with them after a cooperative iteration effort.
• Individual - Individual. Subject of study.
• Inferred_value - Inferred conclusion, derived from zero or many observed values.
• Observable_feature - Something we can measure in relation to an individual, for example blood pressure.
• Observation_target - Super class of all observation targets like Individual or Panel.
• Observed_value - Measured value.
• Panel - Collection of individuals.
• Protocol - Description of how a measurement is planned to be done.
• Protocol_Application - Description of how an actual measurement was done (optionally different from the protocol).
• Ontology_term - Term defined in a specific namespace (ontology source). All names and terms will be defined using ontology terms.
More information on the model is available on the Schemalet website: http://www.schemalet.org/mediawiki/index.php/COMMON:Phenotype

The model is also available for download in the following formats:
• Enterprise Architect: http://bio-models.svn.sourceforge.net/viewvc/biomodels/trunk/object_models/enterprise_architect/phenotype.eap?view=log
• XML: http://bio-models.svn.sourceforge.net/viewvc/biomodels/trunk/object_models/enterprise_architect/phenotype.xml?view=log
5.2. MOLGENIS:Pheno implementation
This is a preliminary evaluation of the model, which will be further developed among Partners. More detailed documentation is available from http://bio-models.svn.sourceforge.net/viewvc/bio-models/molgenis4phenotype/WebContent/doc/objectmodel.html
6. Minimal information on phenotype
It was agreed that reporting of phenotypes is inconsistent. For example, a report may state that an ultrasound of the liver was significant in one of the subjects but give no information for the other observation targets, so it is unclear whether they have also been tested. There are also a number of ethical ramifications, which will be followed up in the Ethics Session during the upcoming Fourth GEN2PHEN General Assembly Meeting. It was also suggested that minimal information should be content specific, e.g. obligatory smoking status in reporting of hypertension. It was agreed that published phenotypic information should at least contain the following information about observation targets:
• Age
• Gender
• Age of onset
• Ontology (controlled vocabulary) term for signs and symptoms

Optional information would include:
• Therapy information (ontology coverage is coming up short in this domain)
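The agreed minimum can be expressed as a simple completeness check. The following is a sketch only: the field names (age, gender, age_of_onset, signs_and_symptoms, therapy) are hypothetical keys chosen for illustration, the ontology term string is a placeholder, and the workshop did not define a concrete record format.

```python
# Required and optional items follow the workshop list; key names are assumptions.
REQUIRED_FIELDS = ("age", "gender", "age_of_onset", "signs_and_symptoms")
OPTIONAL_FIELDS = ("therapy",)


def missing_minimal_information(record):
    """Return the required fields that are absent or empty in a phenotype record."""
    return [name for name in REQUIRED_FIELDS if record.get(name) in (None, "", [])]


complete = {
    "age": 54,
    "gender": "female",
    "age_of_onset": 49,
    "signs_and_symptoms": ["EXAMPLE:0000001"],  # placeholder ontology term
    "therapy": None,  # optional, may be left empty
}
incomplete = {"age": 54, "gender": "female"}

print(missing_minimal_information(complete))    # nothing missing
print(missing_minimal_information(incomplete))  # lists the two absent fields
```

A check like this reports which fields are absent rather than rejecting the record outright, which matches the concern raised at the workshop that it is often unclear whether an unreported item was ever tested.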
7. Pathogenicity
Agreeing on the meaning of pathogenicity was a challenging task, as different communities use the term in slightly different ways. It was proposed to distinguish between pathogenicity modifiers (positive/negative) and factors that are directly pathogenic. Pathogenicity could mean a variant causing disease or risk, but in a medical setting it rather means a mutation causing a disease. A definition for diagnostic labs would also have to be different. A definition stating that pathogenicity leads to disease was found too broad, and the final version defined pathogenicity as an ability to cause disease.

Issues raised during the discussion:
• Laboratory testing aims to link the existence of a variant to the occurrence of a disease (bias in over-reporting of pathogenicity).
• It is not recorded often enough, although it is hugely important and extremely useful.
• How to record values? It was proposed to use a continuous scale (e.g. p-values) to represent pathogenicity values. It was agreed that from a practical point of view it is more feasible to deal with four levels. But this should also be extended to record the values 'not known' and 'unclassified'.
• Pathogenicity values should be backed up by an evidence reference, e.g. a journal paper.
• In some cases a context is required, e.g. it is pathogenic only in association with...
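One way to picture the recording scheme discussed above is a small record type with four graded levels plus the two extra values. This is a hedged sketch: the workshop report does not name the four levels, so LEVEL_1 to LEVEL_4 are placeholders, and the class, field names and the rule that graded values must carry an evidence reference are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Pathogenicity(Enum):
    # The four levels are not named in the report; these are placeholders.
    LEVEL_1 = 1
    LEVEL_2 = 2
    LEVEL_3 = 3
    LEVEL_4 = 4
    NOT_KNOWN = "not known"
    UNCLASSIFIED = "unclassified"


@dataclass(frozen=True)
class PathogenicityRecord:
    variant: str                    # hypothetical variant identifier
    value: Pathogenicity
    evidence: Optional[str] = None  # e.g. a journal paper reference
    context: Optional[str] = None   # e.g. conditions under which it applies

    def __post_init__(self):
        # Assumed rule: a graded value must be backed by an evidence reference.
        graded = self.value not in (Pathogenicity.NOT_KNOWN, Pathogenicity.UNCLASSIFIED)
        if graded and not self.evidence:
            raise ValueError("a graded pathogenicity value needs an evidence reference")


unclassified = PathogenicityRecord("example-variant", Pathogenicity.UNCLASSIFIED)
graded = PathogenicityRecord(
    "example-variant",
    Pathogenicity.LEVEL_3,
    evidence="example journal reference",
)
print(unclassified.value.value, graded.value.name)
```

The optional context field leaves room for statements that hold only under certain conditions, as raised in the last discussion point.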
Appendix 2. GEN2PHEN Phenotype Model reference implementation
WP3 – Standard data models and terminologies
HEALTH-200754
Security: PU Version: 1
Authors: Morris Swertz
Appendix 2 GEN2PHEN Phenotype Model Reference Implementation
The GEN2PHEN Phenotype Model is a minimal data model to represent a data set of phenotypic observations resulting from one or more investigations. The objective is to harmonize the exchange of phenotype descriptions between repositories and to host phenotype information ranging from annotations in locus-specific databases to rich clinical reports from cohort studies. The initial version of this model was compiled at the GEN2PHEN phenotype workshop (Geneva, 8th-9th May 2009), building on previous modeling efforts from the XGAP, PaGE, FuGE, LOVD, and MAGE-TAB projects. Where appropriate, mappings to these models are provided.
This document was created by: Morris Swertz, Juha Muilu, Gudmundur Thorisson, Tomasz Adamusiak, Isabel Fortier, Paul Burton, John Hancock, Ilkka Lappalainen, Anthony Brookes, other members of the GEN2PHEN collaboration and Helen Parkinson.
This work is sponsored by EU-GEN2PHEN, EU-CASIMIR, P3G, NWO-Rubicon, NBIC BioAssist/Biobanking.
Changelog/decisions 11-06-2009 (following G2P AM4):
1. Added a self-reference on Protocol to create aggregated protocols. Use case: a study is a set of questionnaires, each questionnaire being a protocol.
2. Added VariableDefinition as a subclass of ObservableFeature and moved the attribute 'unit' from ObservedValue to VariableDefinition. VariableDefinition can refer to one (?) ObservableFeature concept. Use case: a questionnaire (protocol) is defined to measure 'length' in cm; 'length' is the observable feature, 'length in cm' the VariableDefinition. Motivation: if the unit were defined on ObservedValue, one could not define the unit for a protocol. If the unit were defined in two places (protocol and value level), the two definitions could conflict with each other.
3. Added a timestamp to both ProtocolApplication and ObservedValue. Use case: blood pressure was measured at five ten-minute intervals (8:00, 8:10, 8:20, and so on). The motivation is that protocols often include repeated measurements. A positive example is the use case of a blood pressure time series. A negative example is 'blood pressure standing' and 'blood pressure lying down', which are different ObservableFeatures.
4. Adapted the description of ProtocolApplication to say it is an 'instance' of the protocol usage.
5. Did not change observableFeature.name into observableFeature.description; this is not advisable as it would be inconsistent.
6. For clarity, did not replace the subclass InferredValue with a directional self-reference on ObservedValue.
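Decisions 2 and 3 can be illustrated with a minimal Python sketch. The dataclass layout, the snake_case field names, the `time_series` helper and the sample readings are assumptions made for illustration and are not part of the reference implementation. The unit is shown as a plain string, whereas the model references an OntologyTerm.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class VariableDefinition:
    """An observable feature with its unit attached (decision 2)."""
    name: str   # e.g. 'systolic blood pressure'
    unit: str   # simplified: the model references an OntologyTerm here


@dataclass(frozen=True)
class ObservedValue:
    """One observation; the timestamp separates repeated measurements (decision 3)."""
    target: str                  # name of the observation target
    feature: VariableDefinition
    time: datetime
    value: str                   # values are strings in the model


def time_series(values, target, feature):
    """Return the observations of one feature on one target, ordered by time."""
    selected = [v for v in values if v.target == target and v.feature == feature]
    return sorted(selected, key=lambda v: v.time)


# Illustrative readings only; they are not taken from any study.
systolic = VariableDefinition(name="systolic blood pressure", unit="mmHg")
observations = [
    ObservedValue("individual 1", systolic, datetime(2009, 6, 11, 8, 20), "150"),
    ObservedValue("individual 1", systolic, datetime(2009, 6, 11, 8, 0), "160"),
    ObservedValue("individual 1", systolic, datetime(2009, 6, 11, 8, 10), "155"),
]
series = time_series(observations, "individual 1", systolic)
readings = [f"{v.time:%H:%M} {v.value} {v.feature.unit}" for v in series]
```

Because the unit lives on the VariableDefinition, every value in the series shares one unit and no value can contradict it.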
Appendix 2. GEN2PHEN Phenotype Model reference implementation
WP3 – Standard data models and terminologies
HEALTH-200754
Security: PU Version: 1
Authors: Morris Swertz
2/14
Changelog/decisions 12-06-2009 (following meeting Juha Muilu, Morris Swertz, Tomasz Adamusiak):
1. Protocol.name is not unique within an investigation as it can be reused in multiple studies; a relationship is definable via ProtocolApplication.
2. ObservationTargets are not unique to one investigation as they can be observed in multiple studies; a relationship is definable via the ObservedValue.
3. Self-recursion on ObservedValue for multi-value and derived values was dropped for simplicity. Until shown otherwise, multi-value features can be grouped by protocol.
4. ObservedValue name is not made unique within an investigation, as that would defeat its purpose of integrating between studies.
5. There is no explicit relationship between ObservedValue.value and Code.term; such constraint checking is outside the scope of this model.
6. Added a 'value' to ParameterValue, which was missing.
7. Changed Code so that it does not extend the OntologyTerm class but instead refers to an instance.
8. InferredValue seems not normalized in the sense that one has to repeat the ObservationTarget, which is implied via the ObservedValues it refers to. However, this is not changed, because an inference may be provided without providing the ObservedValues, or a Panel-level inference may be derived from a set of individual-level ObservedValues.
Table of contents
pheno.system package: Identifiable, Nameable, OntologySource, OntologyTerm
pheno.observation package: Investigation, ObservableFeature, ObservedValue, InferredValue, ObservationTarget
pheno.target package: Individual, Panel
pheno.variable package: VariableDefinition, CodeList, Code
pheno.protocol package: Protocol, ProtocolApplication, ProtocolParameter, ParameterValue
1. pheno.system package
This package describes basic classes that are used as building blocks for the pheno.core model.
1.1. Identifiable (interface)
(For implementation purposes) The Identifiable interface provides its sub-classes with a unique numeric identifier within the scope of one database. This class maps to FuGE::Identifiable (together with the Nameable interface).
Attributes:
id: int (required) Automatically generated id-field
1.2. Nameable (interface)
(For modeling purposes) The Nameable interface provides its sub-classes with a meaningful name that need not be unique. This class maps to FuGE::Identifiable (together with the Identifiable interface).
Attributes:
name: string (required) A human-readable and potentially ambiguous common identifier
1.3. OntologySource
implements Identifiable, Nameable
The OntologySource class defines a reference to an existing ontology or controlled vocabulary from which well-defined and stable (ontology) terms can be obtained. For instance: MO, GO, EFO, UMLS, etc. Use of existing ontologies/vocabularies is recommended to harmonize phenotypic descriptions. This class maps to FuGE::OntologySource, MAGETAB::TermSourceREF.
Attributes:
ontologyURI: hyperlink (required) A URI that references the location of the ontology.
1.4. OntologyTerm
implements Identifiable
The OntologyTerm class defines references to a single entry from an ontology or a controlled vocabulary. Other classes can reference this OntologyTerm to harmonize naming of concepts. Each term should have a local, unique label. Good practice is to label it 'sourceid:term', e.g. 'MO:cell'. If no suitable ontology term exists, one can define new terms locally, in which case there is no formal accession for the term. In those cases the local name should be repeated in both term and termAccession. Maps to FuGE::OntologyIndividual; in MAGE-TAB there is no separate entity to model terms.
Attributes:
term: string (required) The ontology term itself, also known as the 'local name' in some ontologies.
termLabel: string (required) The label that is used to refer to this term inside this data set. For instance 'MO:cell'
termAccession: string (optional) The accession number assigned to the ontology term in the source ontology. If empty it is assumed to be a locally defined term.
Associations:
termSource: OntologySource (0..1) The source ontology or controlled vocabulary list that ontology terms have been obtained from.
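The two classes above could be sketched in Python as follows. This is an illustration under assumptions, not the reference implementation: the snake_case field names, the `make_term` helper, the placeholder URI and the placeholder accession are all hypothetical. The sketch follows the termAccession attribute description (an empty accession marks a local term) and does not model the alternative convention of repeating the local name in termAccession.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OntologySource:
    id: int
    name: str           # e.g. 'MO'
    ontology_uri: str   # location of the ontology


@dataclass
class OntologyTerm:
    id: int
    term: str                                    # the 'local name'
    term_label: str                              # e.g. 'MO:cell'
    term_accession: Optional[str] = None         # optional
    term_source: Optional[OntologySource] = None # 0..1

    def is_local(self) -> bool:
        """An empty accession is taken to mean a locally defined term."""
        return not self.term_accession


def make_term(id, term, source=None, accession=None):
    """Hypothetical helper: label the term 'sourceid:term' when a source is given."""
    label = f"{source.name}:{term}" if source else term
    return OntologyTerm(id, term, label, accession, source)


# Placeholder URI and accession, for illustration only.
mo = OntologySource(1, "MO", "http://example.org/mo")
cell = make_term(1, "cell", source=mo, accession="placeholder-accession")
local = make_term(2, "my local trait")
```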
2. pheno.observation package
This package describes the minimal model for phenotypes.
2.1. Investigation
implements Identifiable, Nameable
The Investigation class defines self-contained units of study, each having a unique name and a group of actions (protocol applications) and/or results (in ObservedValues). For instance: Framingham study. Maps to XGAP/FuGE Investigation and MAGE-TAB experiment.
Discussion: should we adopt MAGE-TAB::IDF type of minimal information about an investigation?
2.2. ObservableFeature
implements Identifiable, Nameable
The ObservableFeature class defines anything that can be observed (there may be many alternative protocols to measure them). For instance: systolic blood pressure, diastolic blood pressure, treatment for hypertension. These names are unique within a data set. Preferably each ObservableFeature should be named according to a well-defined ontology. This class maps to XGAP Trait, FuGE DimensionElement and PaGE ObservableFeature. Multi-value features can be grouped by protocol. For instance: blood pressure consists of observations for the features systolic and diastolic blood pressure.
Associations:
ontologyReference: OntologyTerm (0..1) Reference to the formal ontology definition for this feature
2.3. ObservedValue
implements Identifiable
The ObservedValue class defines the actual observation. For instance: 160 mmHg, 90 mmHg, "no treatment". This class has no FuGE equivalent because in FuGE the Data-ProtocolApplication association is reversed, i.e. the ProtocolApplication has input/output Data (which could be ObservedValues). Maps to XGAP DataElement, which uses the FuGE approach, so observed values are grouped into 'Data'. Maps to PaGE observed value.
Attributes:
time: datetime (required) Time when the protocol was applied.
value: string (required) The value observed
Associations:
investigation: Investigation (1..1) Reference to the Investigation this ObservedValue belongs to.
observationTarget: ObservationTarget (1..1) Reference to the subject that has been observed
observableFeature: ObservableFeature (1..1) Reference to the feature that was observed
protocolApplication: ProtocolApplication (0..1) Reference to the protocol application that produced this observation
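As a rough illustration of how these attributes and associations fit together, the Python sketch below groups multi-value features by the protocol application that produced them. It rests on several assumptions: associations are reduced to plain names (strings) instead of object references, and the `group_by_protocol_application` helper and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ObservedValue:
    id: int
    investigation: str          # 1..1, simplified to a name
    observation_target: str     # 1..1, simplified to a name
    observable_feature: str     # 1..1, simplified to a name
    time: datetime              # required
    value: str                  # required
    protocol_application: Optional[str] = None   # 0..1


def group_by_protocol_application(values):
    """Collect feature/value pairs produced by the same protocol application."""
    groups = defaultdict(dict)
    for v in values:
        if v.protocol_application is not None:
            groups[v.protocol_application][v.observable_feature] = v.value
    return dict(groups)


# Illustrative data only.
when = datetime(2009, 5, 8, 8, 0)
values = [
    ObservedValue(1, "study x", "individual 1", "systolic blood pressure",
                  when, "160", "bp measurement 1"),
    ObservedValue(2, "study x", "individual 1", "diastolic blood pressure",
                  when, "90", "bp measurement 1"),
    # No protocol application recorded for this observation (0..1).
    ObservedValue(3, "study x", "individual 1", "treatment for hypertension",
                  when, "no treatment"),
]
grouped = group_by_protocol_application(values)
```

Here the systolic and diastolic readings end up together under one protocol application, which is how a multi-value feature such as blood pressure is reassembled.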
2.4. InferredValue
extends ObservedValue
The InferredValue class defines ObservedValues that are inferred as a result of human or computational post-processing of previous ObservedValues. The protocol used for this inference can be defined via the protocolApplication association that is inherited from ObservedValue. For instance: hypertensive = yes when mean arterial pressure = 135 AND no hypertension-affecting medicine is taken. This class has no direct mapping to other models: XGAP would use input/output Data; PaGE would use a self-reference on ObservedValue.
Implementation discussion: how to make the derivedFrom relationship understandable in the UI. This would need a multi-column lookup including target, feature, value, and unit. Now one just gets a value.
Associations:
derivedFrom: ObservedValue (1..n) References to one or more observed values that were used to infer this observation
2.5. ObservationTarget
implements Identifiable, Nameable
The ObservationTarget class defines the subjects of observation. For instance: individual 1 from study x. This class maps to XGAP subject and to PaGE Abstract_Observation_Target. The name of an ObservationTarget is unique within its Investigation.
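The InferredValue class of section 2.4 and its derivedFrom association can be sketched in Python using the hypertension example. This is a simplified illustration: targets and features are plain strings, the `infer_hypertension` function is hypothetical, and the comparison "at or above 135" is an assumption about how the rule "mean arterial pressure = 135" is meant to be read.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ObservedValue:
    target: str    # simplified reference to the ObservationTarget
    feature: str   # simplified reference to the ObservableFeature
    value: str


@dataclass
class InferredValue(ObservedValue):
    # derivedFrom: the observed values this inference was based on
    derived_from: List[ObservedValue] = field(default_factory=list)


def infer_hypertension(mean_pressure, medication, threshold=135.0):
    """Hypothetical rule; the '>=' comparison and threshold are assumptions."""
    if mean_pressure.target != medication.target:
        raise ValueError("both inputs must describe the same target")
    hypertensive = (float(mean_pressure.value) >= threshold
                    and medication.value == "no treatment")
    return InferredValue(
        target=mean_pressure.target,
        feature="hypertensive",
        value="yes" if hypertensive else "no",
        derived_from=[mean_pressure, medication],
    )


# Illustrative data only.
mean_pressure = ObservedValue("individual 1", "mean arterial pressure", "135")
medication = ObservedValue("individual 1", "treatment for hypertension", "no treatment")
result = infer_hypertension(mean_pressure, medication)
```

The inferred value repeats the target of its inputs, which mirrors the note in the changelog that InferredValue carries its own ObservationTarget.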
3. pheno.target package
3.1. Individual
extends ObservationTarget
The Individual class defines human cases that are used as observation targets. This class maps to XGAP and PaGE individual.
Discussion: what minimal properties should be hard-coded? E.g. sex is assumed to be an ObservableFeature, while in PaGE/XGAP it is a direct property of individual.
Attributes:
sex: enum (required)
Associations:
species: OntologyTerm (1..1)
mother: Individual (0..1) Refers to the mother of the individual.
father: Individual (0..1) Refers to the father of the individual.
3.2. Panel
extends ObservationTarget
The Panel class defines groups of individuals that can act as a single ObservationTarget. Thus a whole group can have ObservedValues such as 'middle aged man' or 'recombinant mouse inbred Line dba x b6'. This class maps to XGAP/PaGE panel classes.
Associations:
individuals: Individual (1..n) The list of individuals in this panel
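The target package could be sketched in Python as below. The enum values for sex are an assumption (the model only states 'enum'), species is reduced to a string where the model references an OntologyTerm, and the family example is invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Sex(Enum):
    # Assumed values; the model does not list them.
    FEMALE = "female"
    MALE = "male"
    UNKNOWN = "unknown"


@dataclass
class ObservationTarget:
    id: int
    name: str


@dataclass
class Individual(ObservationTarget):
    sex: Sex                                 # required
    species: str                             # 1..1, simplified to a string
    mother: Optional["Individual"] = None    # 0..1
    father: Optional["Individual"] = None    # 0..1


@dataclass
class Panel(ObservationTarget):
    individuals: List[Individual] = field(default_factory=list)

    def __post_init__(self):
        # The association is 1..n, so an empty panel is rejected.
        if not self.individuals:
            raise ValueError("a Panel needs at least one Individual")


# Illustrative family trio.
mother = Individual(1, "individual 1", Sex.FEMALE, "Homo sapiens")
father = Individual(2, "individual 2", Sex.MALE, "Homo sapiens")
child = Individual(3, "individual 3", Sex.FEMALE, "Homo sapiens",
                   mother=mother, father=father)
trio = Panel(4, "family 1", [mother, father, child])
```

Since Panel is itself an ObservationTarget, an ObservedValue can point at `trio` in the same way it would point at a single individual.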
4. pheno.variable package
The variable package provides classes to define variables as used within a protocol/questionnaire. Variables are specific types of observable features in that they have a unit attached.
4.1. VariableDefinition
extends ObservableFeature
The VariableDefinition class extends the ObservableFeature class to enable precise definition of the unit of an ObservableFeature.
Associations:
unit: OntologyTerm (1..1) Reference to the well-defined measurement unit used to observe this feature (if the feature is that concrete). E.g. mmHg
codeList: CodeList (0..1)
4.2. CodeList
implements Identifiable, Nameable
The CodeList class names lists of discrete values that are available as options for a particular VariableDefinition.
4.3. Code
implements Identifiable
The Code class names the code values for a particular CodeList. It refers to an OntologyTerm and adds the option to define pretty labels. For instance 'f=female', 'm=male'
Attributes:
value: string (required) The value that represents the code in the data
label: string (required) The pretty label that represents the human-understandable meaning of the code. For instance the label on a CRF.
Associations:
codeList: CodeList (1..1) The code list this code is defined to be part of
ontologyTerm: OntologyTerm (0..1)
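A minimal Python sketch of CodeList and Code, using the 'f=female', 'm=male' example, is given below. The `decode` helper is hypothetical and the optional ontologyTerm association is omitted for brevity. Returning `None` for an unknown value, instead of raising, reflects the changelog decision that constraint checking between observed values and codes is outside the scope of the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeList:
    id: int
    name: str


@dataclass(frozen=True)
class Code:
    id: int
    code_list: CodeList   # 1..1
    value: str            # the value as it appears in the data
    label: str            # the human-readable label


def decode(codes, code_list, raw_value):
    """Hypothetical helper: look up the label of a raw value in one code list."""
    for code in codes:
        if code.code_list == code_list and code.value == raw_value:
            return code.label
    return None   # unknown values are not treated as errors here


sex_codes = CodeList(1, "sex")
codes = [
    Code(1, sex_codes, "f", "female"),
    Code(2, sex_codes, "m", "male"),
]
label = decode(codes, sex_codes, "f")
```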
5. pheno.protocol package
The protocol package provides classes to describe protocols that are planned, or have been used, for observation. This can include questionnaires, wet-lab protocols and dry-lab protocols. These classes are very similar to those in FuGE/XGAP and MAGE-TAB.
5.1. Protocol
implements Identifiable, Nameable
The Protocol class defines parameterizable descriptions of methods; each protocol has a unique name within a data set. Each Protocol can define the ObservableFeatures it can observe as well as the optional Parameters. For instance: SOP for blood pressure measurement used by UK Biobank. This class maps to FuGE/XGAP/MAGE-TAB Protocol, but in contrast to FuGE it is not required to extend Protocol before use. Note that FuGE's mechanism of parameters (for protocol) and parameter values (for application) is not shown. Has no equivalent in PaGE.
Associations:
observableFeatures: ObservableFeature (0..n) The features that can be observed using this protocol.
protocolComponents: Protocol (0..n) The set of protocols that together make up this protocol. For instance: a set of questionnaires.
5.2. ProtocolApplication
implements Identifiable, Nameable
The ProtocolApplication class defines the actual action of observation by instantiating a protocol and optional ParameterValues. For example: the action of blood pressure measurement on 1000 individuals, using a particular protocol, resulting in 1000 associated observed values. This class maps to FuGE/XGAP ProtocolApplication, but in FuGE ProtocolApplications can take Material or Data (or both) as input and produce Material or Data (or both) as output. Similar to PaGE.ObservationMethod.
Attributes:
time: datetime (required) Time when the protocol was applied.
Associations:
protocol: Protocol (1..1) Reference to the protocol that is being used.
investigation: Investigation (1..1) Reference to the Investigation this ProtocolApplication belongs to.
5.3. ProtocolParameter
implements Identifiable, Nameable
ProtocolParameter represents a variable of a Protocol that is instantiated as a ParameterValue (see ParameterValue). For instance 'growth temperature' in a protocol where yeast are grown at permissive and non-permissive temperatures. It implements Unit to define the parameter type and allowed values. ProtocolParameter maps to FuGE::Parameter.
Associations:
protocol: Protocol (0..1)
5.4. ParameterValue
implements Identifiable
A ParameterValue is instantiated when a ProtocolApplication applies a Protocol with Parameters. ParameterValue implements Measurement to provide values and Units for ParameterValues. The FuGE equivalent to ParameterValue is FuGE::ParameterValue.
Attributes:
value: string (required) The chosen value of the parameter within this protocol application
Associations:
protocolApplication: ProtocolApplication (1..1) Reference to the protocol application for which this parameter value was chosen
protocolParameter: ProtocolParameter (1..1) Reference to the protocol parameter that is being bound by this value
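The four protocol classes can be tied together in a short Python sketch. It is an illustration under assumptions: the Investigation reference is reduced to a name, the `leaf_protocols` and `bound_parameters` helpers are hypothetical, and the temperature value and questionnaire names are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Protocol:
    id: int
    name: str
    protocol_components: List["Protocol"] = field(default_factory=list)  # 0..n

    def leaf_protocols(self):
        """Flatten an aggregated protocol into its component protocols."""
        if not self.protocol_components:
            return [self]
        leaves = []
        for component in self.protocol_components:
            leaves.extend(component.leaf_protocols())
        return leaves


@dataclass
class ProtocolParameter:
    id: int
    name: str
    protocol: Protocol


@dataclass
class ProtocolApplication:
    id: int
    name: str
    time: datetime
    protocol: Protocol        # 1..1
    investigation: str        # 1..1, simplified to a name


@dataclass
class ParameterValue:
    id: int
    value: str
    protocol_application: ProtocolApplication   # 1..1
    protocol_parameter: ProtocolParameter       # 1..1


def bound_parameters(application, parameter_values):
    """Map parameter names to the values chosen for one protocol application."""
    return {pv.protocol_parameter.name: pv.value
            for pv in parameter_values
            if pv.protocol_application is application}


# Illustrative data only.
growth = Protocol(1, "yeast growth")
temperature = ProtocolParameter(1, "growth temperature", growth)
run = ProtocolApplication(1, "growth run 1", datetime(2009, 6, 12, 9, 0),
                          growth, "investigation x")
chosen = bound_parameters(run, [ParameterValue(1, "30", run, temperature)])

# A study as an aggregated protocol made of questionnaires.
study = Protocol(2, "study", [Protocol(3, "questionnaire A"),
                              Protocol(4, "questionnaire B")])
component_names = [p.name for p in study.leaf_protocols()]
```

The self-reference on Protocol keeps aggregation inside one class, so a study, a questionnaire and a single measurement procedure are all handled the same way.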
6. Supplementary figure: complete data model
This document is © 2009 by acaciareiche - all rights reserved.
© 2009 GEN2PHEN Project. The GEN2PHEN Knowledge Centre has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.