About GEN2PHEN
The GEN2PHEN project aims to unify human and model organism genetic variation databases towards increasingly holistic views into Genotype-To-Phenotype (G2P) data, and to link this system into other biomedical knowledge sources via genome browser functionality.
FUNDING

GEN2PHEN is funded by the Health Thematic Area of the Cooperation Programme of the European Commission within the VII Framework Programme for Research and Technological Development.
General Information
HEALTH theme - contract no. 200754.
Duration: 60 months.
Start date: 1-Jan-2008.
Funding: 11.889.367 €
Participating institutions:
- University of Leicester, UK
- European Molecular Biology Laboratory, Germany
- Fundació IMIM, Spain
- Leiden University Medical Center, Netherlands
- Institut National de la Santé et de la Recherche Médicale, France
- Karolinska Institutet, Sweden
- Foundation for Research and Technology – Hellas, Greece
- Commissariat à l’Energie Atomique, France
- Erasmus University Medical Center, Netherlands
- Institute for Molecular Medicine Finland, University of Helsinki, Finland
- University of Aveiro – IEETA, Portugal
- University of Western Cape, South Africa
- Council of Scientific and Industrial Research, India
- Swiss Institute of Bioinformatics, Switzerland
- University of Manchester, UK
- BioBase GmbH, Germany
- deCODE genetics ehf, Iceland
- PhenoSystems SA, Belgium
- Biocomputing Platforms Ltd Oy, Finland
- University of Patras, Greece
Partners
The GEN2PHEN Consortium constitutes a talented pool of European research groups and companies that are interested in the G2P databasing challenges. A few non-EU participants have been included to bring extra capabilities to the initiative.
- University of Leicester (ULEIC), UK. Anthony J Brookes
- European Molecular Biology Laboratory - The European Bioinformatics Institute (EBI - EMBL), Germany. Paul Flicek; Helen Parkinson
- Fundació IMIM (FIMIM), Spain. Carlos Díaz
- Leiden University Medical Center (LUMC), Netherlands. Johan den Dunnen
- Institut National de la Santé et de la Recherche Médicale (INSERM), France. Anne Cambon-Thomsen; Christophe Béroud
- Karolinska Institutet (KI), Sweden. Jan-Eric Litton
- Foundation for Research and Technology (FORTH), Greece. Giorgos Potamias
- Commissariat à l’Energie Atomique (CEA), France. Mark Lathrop
- Erasmus University Medical Center (EMC), Netherlands. (Until June 2009)
- Institute for Molecular Medicine Finland, University of Helsinki (UHFCG), Finland. Juha Muilu
- University of Aveiro – IEETA (UAVR), Portugal. José Luis Oliveira
- University of Western Cape (UWC), South Africa. (Until June 2009)
- Council of Scientific and Industrial Research (CSIR), India. Samir K Brahmachari
- Institute of Genomics and Integrative Biology (IGIB). Debasis Dash
- Swiss Institute of Bioinformatics (SIB), Switzerland. Yum Lina Yip
- University of Manchester (UNIMAN), UK. Andrew Devereau
- BioBase GmbH (BIOBASE), Germany. Edgar Wingender
- deCODE genetics ehf (deCODE), Iceland. Hakon Gudbjartsson
- PhenoSystems SA (PHENO), Belgium. David Atlan
- Biocomputing Platforms Ltd Oy (BCP), Finland. Timo Kanninen
- University of Patras, Greece. George Patrinos (From July 2009)
Associate Members
Project Summary and Objectives
The GEN2PHEN project aims to unify human and model organism genetic variation databases towards increasingly holistic views into Genotype-To-Phenotype (G2P) data, and to link this system into other biomedical knowledge sources via genome browser functionality. By the project’s end, it will establish the technological building blocks needed for the evolution of today’s diverse G2P databases into a future seamless G2P biomedical knowledge environment. This will consist of a European-centred but globally-networked hierarchy of bioinformatics GRID-linked databases, tools and standards, all tied into the Ensembl genome browser. The project has the following specific objectives:
- To analyse the G2P field and thus determine emerging needs and practices
- To develop key standards for the G2P database field
- To create generic database components, services and integration infrastructures for the G2P database domain
- To create search modalities and data presentation solutions for G2P knowledge
- To facilitate the process of populating G2P databases
- To build a major G2P internet portal
- To deploy GEN2PHEN solutions to the community
- To address system durability and long-term financing
- To undertake a whole-system utility and validation pilot study
The GEN2PHEN Consortium members have been selected from a talented pool of European research groups and companies that are interested in the G2P database challenge. Additionally, a few non-EU participants have been included to bring extra capabilities to the initiative. The final constellation is characterised by broad and proven competence, a network of established working relationships, and high-level roles/connections within other significant projects in this domain.
Background and Concept
By providing a complete Homo sapiens ‘parts list’ (the gene sequences) and a powerful ‘toolkit’ (technologies), the Human Genome Project has revolutionised mankind’s ability to explore how genes cause disease and other phenotypes. Studies in this domain are proceeding at a rapid and ever-increasing pace, generating unprecedented amounts of raw and processed data. It is now imperative that the scientific community finds ways to effectively manage and exploit this flood of information for knowledge creation and practical benefit to society. This fundamental goal lies at the heart of the “Genotype-To-Phenotype Databases: A Holistic Solution (GEN2PHEN)” project.
Previous genetics studies have shown that inter-individual genome variation plays a major role in differential normal development and disease processes. However, the details of how these relationships work are far from clear, even in the case of most Mendelian disorders where single genetic alterations are fully penetrant (essentially causative, rather than risk modifying). Background genetic effects (modifier genes), epistasis, somatic variation, and environmental factors all complicate the situation. This is particularly the case in complex, multi-factorial disorders (e.g., cancer, heart disease, diabetes, dementia) that will affect most of us at some stage in our lifetime. Strategies do, however, now exist to study the genetics of these disorders, and such investigations are a major focus of research throughout Europe and beyond. A common thread in these studies is the need to create ever-larger datasets and integrate these more effectively.
Success in deciphering the mechanisms and pathways underpinning genotype-to-phenotype (G2P) relationships will bring about radical new opportunities for predicting, preventing, diagnosing, and treating all forms of illness. It will launch an era of truly effective personalised medicine. Extensive research is therefore being conducted worldwide to characterise genetic variation in normal and disease contexts. Sadly though, the resulting flood of primary information is not yet being managed or utilised as effectively as it should be - due simply to the lack of a sufficiently organised and mature database infrastructure by which the discoveries can be gathered, stored, integrated and queried as a composite whole in the electronic (internet) domain. Furthermore, whilst new positive findings are being handled sub-optimally, ‘negative’ observations are in most cases not even reported in any way, shape, or form – despite the fact that they constitute an essential part of any complete and accurate G2P depiction. This needs to change, and an international ‘Human Variome Project’ (HVP) has emerged to help argue this case.
It is against this backdrop that the GEN2PHEN project aims to become the key European contribution to the challenges listed above, harmonised with similar projects elsewhere, and dovetailed into many related European programmes of work. It will provide an important and timely solution to a current research need that was highlighted by the European Strategy Forum on Research Infrastructures (ESFRI) - Priority area: ‘Upgrade of European Bio-Informatics Infrastructure (Shared platform for data resources in the Life Sciences)’. It will provide European G2P research and biotech industries with the proper support they need in terms of database technologies and data integration systems. Only then can our societies maximally benefit from the current exponentially increasing rate of genetic data generation in disease research and clinical settings.
Future Vision - Current Reality
Looking to the future, one can imagine a world wherein ‘omics’ biomedical sciences are commonplace, even to the point of having one’s genome sequenced in routine medical checkups. In this envisaged world, phenomenally large amounts of G2P data will be produced daily, much of which will flow effortlessly into the internet to be fully absorbed into a sophisticated and powerful ‘biomedical knowledge environment’. Some of this information will be secured for restricted access, whilst much of the raw data and the derived knowledge should be free for everyone to search and exploit.
The system will enable extensive scientific reporting and discussions, it will provide a core reference platform for medical practice, and it will open exciting new operational vistas for journals, industry, and funders. It will provide for and underpin activities in biomedical research, biotechnology, drug development, and personalised healthcare. And it will probably even impact our basic cultural practices (e.g., insurance, the law, employment policies) as society comes to grips with the immense power and relevance of genetics to the human state. But this envisaged future is nothing like the world we presently live in.
No system yet exists that even begins to approximate to a ‘biomedical knowledge environment’ properly able to support G2P data gathering and analysis. There are instead a limited number of unconnected G2P databases that are mostly at rather early stages in their development, with no agreed structured way of effectively modelling phenotype data or G2P relationships, and no convenient mode for passing data from discovery laboratories into the database world. A few recent initiatives are building large databases to host individual-specific genotypes and phenotypes to support some high-throughput disease association studies, but these do not have a global remit, have not engaged with the extensive existing knowledge from Mendelian disorders, and are not focused on all the research and clinical communities around G2P. Most progress has arguably been made with locus-specific databases (LSDBs) that target specific diseases or genes, but the vast majority of the several hundred LSDBs that do exist are rudimentary in design and implementation, and operationally isolated from one another. This all contrasts with the situation for databases concerned with purely genetic data (without phenotype association), of which there are many, including several large data warehouses and genome browsers that act as central repositories and search centres for all the human and model organism genome sequences, variants, and feature annotations yet produced.
There are a number of reasons why the G2P database field is so poorly developed. Problems include the complexity/diversity of the pertinent data elements, the contemporary nature of the challenge, and certain practical/cultural issues. However, perhaps the most critical obstacle is the overwhelming scale of the problem. Whereas the genome is a bounded domain of only ~3,000,000,000 nucleotides and ~25,000 genes (in man), there is essentially no limit to the number of G2P relationships that can be examined, each by multiple different procedures. The former is thus relatively straightforward and can be managed and hosted in one or a few large data depositories (as has been accomplished). In contrast, the latter is too large in scale and scope to handle in this way.
There is virtually no limit to how many G2P data will eventually be created, or to their diversity or purpose. The database solutions for G2P information must therefore be based upon new ways of thinking and organising the field’s development - emphasising standards, integration, federation, and broad community participation from the very outset.
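The contrast between the bounded genome and the open-ended G2P space can be illustrated with simple arithmetic. The sketch below uses the approximate genome figures quoted above; the phenotype and procedure counts are purely hypothetical placeholders chosen for illustration, not measured values, and the counting is done only at the coarse gene level.

```python
# Approximate figures quoted in the text for the human genome.
NUCLEOTIDES = 3_000_000_000
GENES = 25_000


def g2p_relationships(genes: int, phenotypes: int, procedures: int) -> int:
    """Count gene-phenotype pairs, each examined by several procedures.

    This is a deliberately crude lower bound: it ignores individual
    variants, gene-gene interactions and environmental factors.
    """
    return genes * phenotypes * procedures


# Hypothetical inputs: neither phenotype nor procedure counts have a
# natural upper limit, so the product keeps growing as the field grows.
SMALL = g2p_relationships(GENES, phenotypes=1_000, procedures=3)
LARGE = g2p_relationships(GENES, phenotypes=10_000, procedures=10)

print(f"bounded genome:   {NUCLEOTIDES:,} nucleotides, {GENES:,} genes")
print(f"G2P (small case): {SMALL:,} relationships")
print(f"G2P (large case): {LARGE:,} relationships")
```

Whereas the first line of output is fixed, the other two depend entirely on how many phenotypes and procedures the community chooses to study, which is why a single central repository does not scale to G2P data.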
The GEN2PHEN Strategy
The GEN2PHEN project has the overall ambition of unifying human and model organism genetic variation databases, and doing this in such a way that the resulting holistic view of G2P data can be blended with all other biomedical database domains via one or more central genome browsers. The project will put in place the main building blocks needed to move substantially from today’s G2P database situation towards the ultimate future of a complete biomedical knowledge environment. The project will then utilise these building blocks to construct a first-generation version of a G2P knowledge environment by the project’s end. This will consist of a European-centred but globally networked hierarchy of bioinformatics GRID-linked databases, tools and standards, all tied into the Ensembl genome browser. To ensure the project builds something that truly works and tangibly benefits the community, rather than merely devising potentially useful technologies, we have focussed the project’s objectives on the three essential components of a functioning G2P database system. These can be viewed as three legs of a ‘stool’, each of which must be robust for the stool to properly function (see Figure 1).
1. TO ANALYSE THE G2P FIELD AND INVESTIGATE CURRENT NEEDS AND PRACTICES
We recognise that other work is going on in the field, and that different users have related but different needs. Our Consortium comprises representatives of each sector currently building G2P databases, and we have many deep connections into the broader G2P community. We shall utilise these skills and relationships to ensure that our activities match the latest needs and progress of others, and to gain community trust and acceptance of the GEN2PHEN system. This will be achieved by broad opinion gathering and open discussion with the community from the project’s outset. This will lead to state-of-the-art documents that describe the general progress of the field and the specific data models and technologies that are particularly favoured and effective. GEN2PHEN itself will almost certainly have a big influence on these things, but we will adapt our work as necessary to maximally interoperate with external developments.
2. TO DEVELOP KEY STANDARDS FOR THE G2P FIELD
From an intimate knowledge of what others are doing, we will develop data models, nomenclature, and technology standards that will be building blocks for us and the community. We will not develop ontologies ourselves, but connect to and be led by the various expert groups doing this for the G2P domain. Each finalised standard will be formally documented, and wherever possible registered with independent bodies to make them official global standards.
3. TO CREATE GENERIC DATABASE COMPONENTS, SERVICES AND INTEGRATION INFRASTRUCTURES FOR THE G2P DOMAIN
Based upon GEN2PHEN-derived and other emerging standards, we will build generic database components and a deeply networked infrastructure (one leg of the stool). This will include solutions for genetic (gene or disease-specific) and genomic (whole genome) databases, with appropriately styled interfaces for the target communities: namely, for biomedical researchers, clinical practitioners, and the general public. The genetic database work will concentrate upon providing one or more ‘LSDB-in-a-box’ applications, so enabling anyone to easily set up an LSDB for their gene/disease of interest. We will also establish an LSDB hosting service for those that prefer this way of proceeding. The genomics database work will concentrate upon providing components for flexible and future-proof database implementations that support summary-level G2P datasets. We will not target support for individual-level G2P datasets as databases for these are already being constructed to support large-scale genetic association studies and medical re-sequencing projects. Instead, we are already partnered with such groups and we will ensure compatibility between their and our developments. At least one major genomics G2P database will be brought into operation by our Consortium. The components of this will be passed on to others so that many such databases can be put in place by the end of the project. The genomics databases will be designed to function towards the top of hierarchies wherein resources towards the bottom carry increasingly detailed datasets. As such, GEN2PHEN databases will help compile and channel information from the wide community into the Ensembl browser. A range of integration technologies and data exchange procedures/conventions will underpin, surround, and infiltrate the databases we wish to build, thus bringing interoperability within the project and with the broader G2P database field.
4. TO CREATE DATA SEARCH AND PRESENTATION SOLUTIONS FOR G2P KNOWLEDGE
A standardised and integrated database layer will make it possible to provide sophisticated and powerful search functionality across an ever greater fraction of all G2P knowledge (another leg of the stool). The databases will be able to reuse common search tools, query interfaces, and data output formats, giving the system the benefits of both branding and familiarity. Search functions and tools will be designed with various different users in mind (especially researchers, clinicians, and the general public), with special emphasis on the needs of the medical/diagnostic community. The most unifying aspect of the project, however, will entail providing support for pan-resource searching via the Ensembl platform. This will be achieved by a range of standardised data output/exchange protocols and new browser capabilities, anchored on GRID technologies and further development of the Mart system. User interfaces across the system will be tailored to meet the needs of the relevant communities. This implies providing, at various overlapping levels in the GEN2PHEN system, a gene-to-disease perspective with entity concepts that will be mostly used by researchers, as well as a disease-to-gene view that is built around medical terminologies that will be more relevant to clinicians. A further view that uses lay terms and simpler interrogation systems will be provided for the general public, and this will further connect to other websites that provide medico-genetic data to the public. Atop all of this we will establish chat and discussion fora, by which anyone can debate relevant subject matters, even down to the level of commenting on individual database records. These community inputs will then be made visible alongside core search results when the database network is searched.
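The idea of offering a gene-to-disease perspective and a disease-to-gene view over the same underlying data can be sketched as two indexes built from one set of records. This is a minimal illustration only: the `G2PRecord` fields, the `build_views` helper and the source labels are hypothetical and are not part of any GEN2PHEN data model; the three gene-disease pairs are well-known textbook associations used as sample input.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class G2PRecord:
    """One summary-level genotype-to-phenotype association (hypothetical schema)."""
    gene: str
    phenotype: str
    source: str  # label of the database the record came from


def build_views(
    records: Iterable[G2PRecord],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build a gene-to-disease index and a disease-to-gene index from the same records."""
    gene_to_disease: Dict[str, Set[str]] = defaultdict(set)
    disease_to_gene: Dict[str, Set[str]] = defaultdict(set)
    for rec in records:
        gene_to_disease[rec.gene].add(rec.phenotype)
        disease_to_gene[rec.phenotype].add(rec.gene)
    return dict(gene_to_disease), dict(disease_to_gene)


# Sample input; the source labels are invented for the example.
RECORDS = [
    G2PRecord("CFTR", "cystic fibrosis", "example-lsdb-1"),
    G2PRecord("HBB", "sickle cell anaemia", "example-lsdb-2"),
    G2PRecord("HBB", "beta-thalassaemia", "example-lsdb-2"),
]

GENE_VIEW, DISEASE_VIEW = build_views(RECORDS)

# Researcher-style query: start from a gene.
print(sorted(GENE_VIEW["HBB"]))
# Clinician-style query: start from a disease term.
print(sorted(DISEASE_VIEW["cystic fibrosis"]))
```

Both views are derived from one record set, so a lay-term view for the general public could be added in the same way by mapping phenotype names to simpler labels before indexing.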
5. TO FACILITATE THE POPULATING OF RESEARCH AND DIAGNOSTIC G2P DATABASES
By both tool development and community interactions, we will proactively seek to populate the G2P domain with valuable data (the third leg of the stool), much of which will not otherwise be brought forward (e.g., negative data) or suitably packaged (e.g., the content of journals or raw datasets). We will additionally seek to devise pipelines and protocols that will enable highly-informative diagnostic laboratory genetic data to also flow into public G2P databases. Success on these undertakings will be apparent by the growth in data content of the GEN2PHEN databases.
6. TO BUILD A MAJOR G2P INTERNET PORTAL
To provide a global focus for G2P database activities and developments, we will construct a ‘GEN2PHEN Knowledge Centre’. This will be an internet domain that not only summarises our project activities and provides downloads of all our available code/software but also provides access to many other sources of relevant information, hosts calendars and diaries of meetings/activities, enables chat amongst field participants, and offers personalised and holistic search capabilities across the complete G2P internet domain, tailored to the needs of the different communities. This will be seamlessly joined to many of the functions we will set up with Ensembl. Citations and website hits will be used to track the value and usage of this novel ‘main G2P data portal’. A particularly important feature of this Knowledge Centre will be a system that enables users to directly comment upon, and thereby update, contest, or launch a public discussion about, any database record, group of records, or reported observation in the total G2P domain. This totally original G2P feature will help bring the GEN2PHEN system ‘to life’ and inspire the healthy debate that is the hallmark of productive science.
7. TO DEPLOY GEN2PHEN SOLUTIONS TO THE COMMUNITY
As the technology development work proceeds, we will take steps to interest the community in those developments and enable the community to adopt and use them. Many strategies will be used to achieve this, not least outreach via the GEN2PHEN Knowledge Centre. Much of this deployment work will be based upon the ‘database federation’ concept - the cultural equivalent of the integration technologies that we will be developing. For LSDBs in particular, a community already exists that has started to grow in this direction. Especially in the second half of the project, we expect to devote substantial resources to advertising and explaining our solutions, and to training researchers in their use.
8. TO ADDRESS SYSTEM DURABILITY AND LONG-TERM FINANCING
Questions of durability must be considered. The standards and software we devise will survive for as long as they remain useful. But G2P databases (ours and others) can only survive given a funding stream to resource their maintenance and ongoing development. In the future, many G2P databases may have to be supported by new business models beyond those of academic funding. Academic and industry members of the Consortium will, together, explore this question, and explore ways by which the innate value of G2P data can be ethically and effectively leveraged to keep such databases growing self-sustainably. The solution may entail devising ways to provide incentives for stakeholders to value and contribute to the G2P database future. A ‘Bio-Resource Impact Factor’ may be relevant here (i.e., an index that quantifies the impact of a given bio-resource), and an ethics panel will work within GEN2PHEN and with other EU projects to actively explore this possibility and report their findings.
9. TO UNDERTAKE A SYSTEM UTILITY AND VALIDATION PILOT STUDY
To objectively track progress and deficiencies in the GEN2PHEN project we will continually cycle versions of a ‘System Utility and Validation’ pilot project. This will focus upon specific genes/diseases of interest in clinical medicine, starting from the perspectives/needs of the diagnostic laboratory. A team will attempt to use GEN2PHEN systems to explore and interpret "thematic areas" of current biomedical importance, for example genetic aspects of cancer. The objective will be to use the GEN2PHEN system to glean a complete picture of what is known or predictable about the thematic area of interest. This will span questions of immediate interest to medical clinicians/diagnosticians, and also move into basic research questions, animal model evidence, and perhaps even other non-DNA domains of biology. Besides providing assessment of the usefulness of the system in a ‘real-life-like scenario’, it will also judge the relevance/utility of GEN2PHEN training activities, and the GEN2PHEN Knowledge Centre. This assessment pilot will be run every 12-20 months, delivering reports that will be carefully considered and used to refine and redirect GEN2PHEN activities as necessary.
Work Packages
WP1 - Scientific Coordination
This work package covers all the higher-level oversight and scientific control measures needed to make the project a success. It thus impinges on all of WP2-10. Work package activities primarily involve gathering and reacting to new scientific ideas, optimising the use made of the project committees, and supervising work package leaders as they execute their role. It also involves ensuring that all project results are produced using appropriate quality policies. For this latter undertaking, robust quality assessment procedures will be devised by the work package and suitably applied to all technology development work packages, especially WP4, WP5, and WP6. Furthermore, this work package will provide ethical oversight of the whole project, and as part of this it will undertake specific ethical assessment exercises as and when the need arises. WP1 will thus provide the central ‘leadership’ role upon which all other work packages will depend. To perform these roles, WP1 will be strongly intertwined with WP10, so that scientific leadership and management of the project are mutually reinforced as drivers of the initiative.
A particular priority for WP1 will be that of continually assessing the effectiveness and utility of the items emerging from GEN2PHEN. To this end, WP1 will organise a rolling 'Pilot Project' that will entail trying to use the developing G2P database network for the purpose for which it was created - i.e., to explore G2P relationships in depth and across a range of species and situations. This Pilot will be run towards the start of the project, and then again every year or so, with resulting documented findings being used by the GEN2PHEN Consortium to recursively adapt and improve our project activities. This activity will thus fundamentally assess and potentially redirect efforts in all other work packages.
WP2 - Domain Analysis and Community Relations
This work package is designed to reach out to database users and to database development experts, to formally assess what other key groups are doing that is of relevance to GEN2PHEN, and to investigate the current and future needs of the different actors in the G2P field. Deliverables will include formal documentation of our findings, and these will be focussed in two directions. Firstly, we will establish general trends and needs for G2P databasing, and use this analysis to refine the activities of our project. We do not anticipate this will imply any major changes to our project plan, but we do hope to identify ways to adapt our work so that it brings maximum synergy with the efforts of others. Secondly, we will formalise the data models and the nomenclature systems being utilised by others. This will provide the detailed domain analysis upon which WP3 will build. WP2 will thus constitute a formal ‘requirements analysis’ upon which the rest of our project work will be based. It will also help establish and sustain a spirit of understanding and trust between our project and other G2P domain initiatives, which will encourage others to adopt and use our systems. WP2 will thus be key in initiating the chain reaction needed for a long-lasting impact in the field.
WP3 - Standard Data Models and Terminologies
This work package will take the domain analysis documents and priority use cases produced by WP2, and build on these to create reference data models upon which all our subsequent database development and implementation work will be based. Inter-compatible data models will be devised that will support each type of G2P database and data exchange. These data models will also be aligned with genomics data standards being developed in the clinical domain, for example by HL7. The model(s) will be formalized via submission to standards organisation(s). Subsequently, as new scientific issues arise, the models will be enhanced accordingly via a rolling review process. A similar standardisation path will be followed with mutation nomenclatures. However, for the complex field of G2P ontologies, this work package will not undertake direct ontology development work. It will instead devise a plan for precisely incorporating suitable ontologies into G2P databases. WP3 will thus provide mission-critical ‘blue-prints’ upon which all our database development work will be based. By standardising the data models and terminologies from an early stage, and registering these as official global standards, this work package will ensure that the databases and other deliverables from all subsequent work packages are maximally interoperable.
WP4 - Genetics G2P Databases
These two related work packages (WP4 and WP5) will create modular and generic G2P database components, and then use these ‘building blocks’ to construct major demonstration databases for use by the field. All of this work will be based upon the standards developed in WP3. The main distinction between these two work packages is that WP4 will concentrate upon genetics databases (i.e., where the focus is on one or a few genes or diseases, with clinical utility being most important) whereas WP5 will concentrate upon summary level genomics databases (i.e., aggregated datasets, where there is no focus on any particular gene or disease, and with research utility being most important). The two work packages will implement rather different user interfaces, with WP4 oriented mostly towards the needs of clinical users and WP5 oriented more towards biomedical researchers. WP4 and WP5 are therefore complementary, and will benefit from close co-operation as their respective tools and software solutions are developed. Furthermore, in both cases, the database components and/or complete databases produced will be used as tangible assets around which federated teams of (existing and new) database operators can grow, and this will be actively encouraged by activities in these work packages. WP4 and WP5 involve substantial technology development, and, as such, will be subject to quality assurance measures dictated by WP1. Both work packages will generate operational databases, and these will be used for the data gathering efforts performed in WP7. These databases will also be valuable in their own right, and in that capacity they represent example depositories that will be integrated and made universally searchable by the activities of WP6.
WP5 - Genomics G2P Databases
These two related work packages (WP4 and WP5) will create modular and generic G2P database components, and then use these building blocks to construct major demonstration databases for use by the field. All of this work will be based upon the standards developed in WP3. The main distinction between these two work packages is that WP4 will concentrate upon genetics databases (i.e., where the focus is on one or a few genes or diseases, with clinical utility being most important) whereas WP5 will concentrate upon summary level genomics databases (i.e., aggregated datasets, where there is no focus on any particular gene or disease, and with research utility being most important). The two work packages will implement rather different user interfaces, with WP4 oriented mostly towards the needs of clinical users and WP5 oriented more towards biomedical researchers. WP4 and WP5 are therefore complementary, and will benefit from close co-operation as their respective tools and software solutions are developed. Furthermore, in both cases, the database components and/or complete databases produced will be used as tangible assets around which federated teams of (existing and new) database operators can grow, and this will be actively encouraged by activities in these work packages. WP4 and WP5 involve substantial technology development, and, as such, will be subject to quality assurance measures dictated by WP1. Both work packages will generate operational databases, and these will be used for the data gathering efforts performed in WP7. These databases will also be valuable in their own right, and in that capacity they represent example depositories that will be integrated and made universally searchable by the activities of WP6.
WP6 - Integration and Data Access Technologies
This work package comprises a series of activities that tackle the core challenges of inter-resource data integration. This involves enabling processes of data exchange between databases, data integration and synchronisation within central databases or warehouses, and holistic searching across databases. It also addresses how best to manage complex G2P queries, and how to represent the results of G2P database searches. This work package will make extensive use of general integration strategies already employed by Ensembl and others, and it will particularly exploit the products of WP4 and WP5. Aspects of the work package will explore using the concepts and capabilities of the GRID to bring ever-greater sophistication to the emerging network of G2P databases. Furthermore, as a core technology work package, WP6 joins WP4 and WP5 in that its software will be subject to quality assurance measures dictated by WP1.
WP7 - Data Flows
This work package is primarily concerned with the gathering of data to populate the databases constructed by the GEN2PHEN project. The undertaking will span various types of G2P data gathered from many different sources, including receiving direct submissions from the community. To accomplish its goals, the work package will utilise tools and systems built by WP4, WP5, and WP6 as appropriate. These tools will provide the mechanics of the process, but there must also be an engine to drive the data flow, and this will come from the community involvement activities of WP2 and the community federation activities of WP4 and WP5. As GEN2PHEN databases consequently become increasingly populated by WP7 efforts, they will increase markedly in their utility, exemplifying the main purpose of the GEN2PHEN project.
WP8 - GEN2PHEN Knowledge Centre
This work package is designed to give the GEN2PHEN project maximal visibility, whilst also providing substantial added value. The intention is to build a strategic internet portal that is named after our project, providing a virtual ‘Centre of Excellence’ for G2P science. This ‘GEN2PHEN Knowledge Centre’ will encompass many functions. It will act as the project website and present a diary of our events, thereby assisting with dissemination activities of WP9. It will also carry useful expert knowledge for the G2P community (some emerging from various GEN2PHEN work packages, and some originating outside our project), provide a G2P meeting calendar, host G2P chat pages, and make available for download all the public-domain outputs from GEN2PHEN, thus enabling the community involvement work of WP2 and WP9, and the tool deployment work of WP4, WP5, and WP6. It will also be possible to launch the centralised search functions created by WP6 directly from this site and, by a related mechanism, to deposit comments or updates pertaining to any aspect of the field, even individual database records. We thus intend that the GEN2PHEN Knowledge Centre, via the efforts of this work package, will become a highly popular website that will significantly assist the G2P community and thereby place the GEN2PHEN project centre-stage in that scientific domain.
The GEN2PHEN Knowledge Centre will also constitute the central component of a broader knowledge management strategy. This will encompass both ‘internal’ training for partners (so that expertise and knowledge are effectively shared within the Consortium for maximum efficiency in the development of the work) and ‘external’ training for prospective users of the GEN2PHEN results (to support deployment and adoption of the solutions devised by the project). Both types of training are therefore included in WP8.
WP9 - Dissemination, Use and Future Sustainability
This work package is designed to reinforce and complete the tasks undertaken under WP2 and WP8, thereby helping the project create good relations with, and have a durable impact on, the G2P community. To this end, activities in WP9 encompass both ‘dissemination’ and ‘exploitation’ tasks. The former will focus on publicising the project and its results, according to a well-designed communication plan and using purpose-built tools. This will involve synergistic interaction with the GEN2PHEN Knowledge Centre developed under WP8, for mutual benefit.
The ‘exploitation’ side of WP9 will involve studying incentive, reward, and business issues, as well as other socio-economic aspects that currently hamper progress in the G2P field and limit recurrent funding. This activity is predicated on the view that technical and scientific developments can provide only part of the solution to current problems; there is also a need to discuss and develop sustainability models that accommodate isolated academic and industrial perspectives into a bigger, inclusive framework. The ideal scenario would harmonise a) the need to quickly and openly translate the scientific and healthcare benefits of G2P activities to the targeted communities and citizens in general, with b) the need to continuously be able to gather sufficient resources to undertake the huge task of creating, developing and maintaining G2P data systems. As such, WP9 will be an ‘end-point’ of the other work packages, and a key component in helping GEN2PHEN achieve its ultimate goal of having a widespread and durable effect on the G2P database domain.
WP10 - Management
While strong scientific leadership can be sufficient to run smaller, individual grants, additional expertise is required in the supervision and monitoring of large-scale, complex projects. Specifically, complementary professional project management is required in such undertakings so that the resources and participants are orchestrated along the extended schedule towards the appropriate fulfilment of objectives. In these cases, traditional project management standards have to be flexibly adapted and selected to match the specifics of EC-funded projects, not least because: i) trade-offs between scope, quality, time and cost cannot always be readily resolved; ii) there are contractual obligations that limit what can be done and how; iii) there are specific financial and administrative procedures that have the potential to create an excessive overhead; iv) there is a need to create a working team out of independent, geographically scattered institutions. This implies the need for generating adequate work and communication dynamics that support the role of the scientific co-ordination in commanding the project and underpinning the whole work plan, without prejudice to the co-ordinator retaining legal responsibility for specific issues as indicated in the Grant Agreement. It also requires strong financial and legal management that effectively deals with the added flexibility that European projects represent, and the evolving circumstances that a 5-year endeavour will have to face. All of these activities, deeply interrelated with WP1, will be developed under WP10.