Researcher Identification whitepaper
Posted Wed, 04/03/2009 - 10:44 by
Gudmundur A Thorisson
Brief description of this document
The original researcher ID whitepaper circulated via E-mail in Feb'09.
Attachment
Size
Researcher_Identification_whitepaper_v4mummi.100209.doc
202.5 KB
Document version: 1 Date: 4/3/09
White Paper IDENTIFYING USERS AND CONTRIBUTORS ON THE BIOMEDICAL INTERNET
Authors: Gudmundur A Thorisson (gt50@le.ac.uk) Anthony J Brookes (ajb97@le.ac.uk)
Introduction
A number of ostensibly separate initiatives, with diverse objectives, have begun considering the risks, benefits, and practicalities of unambiguously identifying researchers as they use and contribute to biomedical data sources on the Internet. The GEN2PHEN project (www.gen2phen.org) is one such initiative, given its general aim of helping to unify human and model organism genetic variation databases towards increasingly holistic views into Genotype-To-Phenotype (G2P) data. More specifically, the GEN2PHEN project considers researcher identification to be an absolutely central part of how biomedical databasing, and scientific reporting in general, needs to be developed. At the heart of this lies the concept of a user-centric system for researcher identification – in simple terms, one or more ‘ID systems’ by which individuals can be unambiguously identified along with various types of information that are associated with them (see Figure 1 for an overview). Key examples of research-related activities and services that would benefit from a robust way of identifying researchers as they interact with the Internet include:
• Practical options for the global management of access privileges to sensitive datasets
• Disambiguation of author names in the scientific literature and establishing/validating relationships between authors and publications
• A solid foundation for permitting and tracking online scientific contributions, such as database submissions, scientific blogging, and community curation efforts
• Security in Semantic Web applications
• Biobanking applications, including services enabling individuals to track how data from studies they have participated in are used
• Knowledge discovery applications using some or all of the above components.
The current document was initially put together by members of the GEN2PHEN project, but the ideas and issues raised will hopefully stimulate debate, and we would welcome contributions/corrections that will be included and fully acknowledged. The document is thus an evolving ‘work in progress’, and hence it is versioned and dated. Shortly, a dedicated website will be launched where issues and opinions can be discussed in a completely open manner (check the GEN2PHEN website for further information), and a workshop on the subject is currently being organized for May 2009.
A user-centric system for individual identification on the Internet
One strategy for researcher identification is a top-down approach, whereby each researcher is unilaterally assigned an identifier, which would subsequently be used wherever information relating to the researcher needs to be tracked or linked. Arguably, however, a ‘single-identifier-everywhere’ solution pushed on to researchers would be difficult to set up and operate, and would meet with considerable pushback due to concerns about liberty and free will. More attractive would be a user-centric ‘pull’ system, in which each individual seeks out the ID(s) they wish to utilize, and establishes their own links to other information as and when needed.
This ‘pull’ situation is highly analogous to recent developments in the online social community. At popular networking websites such as Facebook and Flickr, various Web 2.0 services (such as personal blogging platforms) are increasingly being linked together seamlessly to enhance the user experience. A key component in many of these developments is a relatively new technology called OpenID [1], a decentralized, open authentication protocol backed by Google, Yahoo, Microsoft and numerous other Internet heavyweights. OpenID provides a way for individuals to identify themselves uniquely across the Internet with a single set of credentials held with a provider of their choice, thus avoiding the pain of managing multiple usernames and passwords across a plethora of different websites. OpenID is rapidly gaining ground in the wider online community, and as recently suggested [2] it would be possible to use the same system for researcher identification. This proposal has much to recommend it, though other options need to be considered, and there may even be a case for devising a completely new system specifically for biomedical researchers.
Whichever system(s) come to be used, however, it is important to realize that individual sub-domains of biomedical research (e.g., journal publishing, funding organisations) will very often wish to employ their own set of individual IDs. This in no way conflicts with the principle of researchers having a universal OpenID, as this would be matched ‘behind the scenes’ to the alternative IDs used by publishers, funders, etc. More generally, whichever ID and authentication system is used, there are many reasons to make it user-centric, so that: a) the individual is able to manage his own online identity, and b) the individual has principal control over where his identifier and online profile(s) are deployed and who has access to which sections of them. At present OpenID fits these requirements very well, and so for the remainder of this whitepaper we will provisionally assume that OpenID is the preferred authentication system. However, bear in mind that the usage scenarios described in the sections to follow merely depend on some common mechanism for identification, and not on the use of the OpenID protocol per se.
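To make the ‘behind the scenes’ matching more concrete, the following is a minimal sketch of a self-managed identity record, assuming a simple in-memory structure. The class name, field names, domain labels, and example identifiers are all hypothetical and are not part of OpenID or of any existing service; the sketch only illustrates the two user-centric requirements above (the individual links his own IDs, and decides who may see them).

```python
from dataclasses import dataclass, field


@dataclass
class ResearcherIdentity:
    """One researcher's self-managed identity record (hypothetical structure)."""
    openid: str                                      # identifier chosen by the researcher
    linked_ids: dict = field(default_factory=dict)   # domain -> domain-specific ID
    shared_with: dict = field(default_factory=dict)  # domain -> set of permitted consumers

    def link(self, domain: str, local_id: str) -> None:
        """The researcher attaches a domain-specific ID to his OpenID."""
        self.linked_ids[domain] = local_id
        self.shared_with.setdefault(domain, set())

    def grant(self, domain: str, consumer: str) -> None:
        """The researcher allows one consumer to see the ID for one domain."""
        self.shared_with.setdefault(domain, set()).add(consumer)

    def resolve(self, domain: str, consumer: str):
        """Return the linked ID only if this consumer has been granted access."""
        if consumer in self.shared_with.get(domain, set()):
            return self.linked_ids.get(domain)
        return None


me = ResearcherIdentity("https://example.org/openid/jsmith")
me.link("publisher", "AUTH-0042")
me.grant("publisher", "journal-submission-site")
print(me.resolve("publisher", "journal-submission-site"))  # AUTH-0042
print(me.resolve("publisher", "unrelated-site"))           # None
```

The point of the design is that the mapping and the access list both live with the individual, rather than with any one publisher or funder.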
Main Application Areas for OpenID in Biomedicine:
Sensitive Datasets, Data Privacy, and Access Control
Investigations into clinical materials, especially high-throughput experiments and genetic epidemiology studies using thousands of individuals, generate data from which study participants can be identified. In order to protect these individuals from potential misuse of the data generated about them (e.g. discrimination by health insurance providers or potential employers), the dissemination of these data must be carefully controlled. But this will become increasingly costly and difficult to manage on a case-by-case basis, given increases in: the number of such studies; the number of groups/consortia generating such datasets; the number of databases wishing to integrate and disseminate the information; and the number of researchers wishing to access these data. For example, currently, to gain access to genotype data from genome-wide association studies (GWAS) from the Wellcome Trust Case-Control Consortium (WTCCC) [3], one must complete a special form, wait up to 2 months for approval from the relevant Data Access Committee, and sign a Data Access Agreement. The researcher is then allowed to download encrypted files from the EGA website [4] to his computer, and must decrypt these files with a provided key. NCBI’s dbGaP database [5] has similar procedures in place.
While there are good reasons for these measures, they already impede the rate of research progress, and will increasingly do so as opportunities for broad dataset integration and meta-analysis become ever more curtailed due to limitations on access. Simply extending the current system will not change the core fact that access permissions must be applied for per dataset/project, making it very onerous for researchers who need to access many datasets from multiple sources. Also, as the researcher must download the data to his local computer, the system does not scale up to future applications where data integration will take place on-the-fly across many diverse data sources on the Internet.
The whole process would obviously be greatly streamlined if one or more services (probably operated by major regional data centres such as WTCC/SI and NCBI) were to store information on access privileges for each researcher based on an OpenID that he would provide upon registration. The registry (or registries) could then be used by various primary and secondary data providers (whether or not part of the WTCC/SI and NCBI) to check whether or not a person should be allowed access to a given type of sensitive dataset (Figure 1b). The same registry could also be used to ‘blacklist’ individuals found guilty of inappropriate use of data (though the complex issue of sanctions needs much further consideration, whatever mechanism for access approval is in operation).
Author profiles and name disambiguation
Ambiguity in author names for scholarly publications has long been a problem in science. Multiple authors can have the same name and authors sometimes change their name (e.g. women marrying and taking their husband’s family name). This can result in inaccurate literature searches, the wrong person being asked to peer-review a paper, and a host of other problems. This is particularly pronounced for non-English authors from countries such as India or China where a large number of individuals share the same family name, a situation made worse when different names end up being spelt in different ways when converted into English [6]. Unique author identifiers have been suggested to resolve this problem [7], and two commercial services, ResearcherID [8] by Thomson-Reuters and Scopus Author Identifier [9] by Elsevier, are attempts at doing just that. The non-profit CrossRef organization [10] is also working on a system for contributor identifiers [11], provisionally named CrossReg (G. Bilder, personal communication).
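The effect of unique identifiers on name ambiguity can be illustrated with a small sketch. The records, names, and the `contrib:` identifier format below are invented for illustration and do not correspond to ResearcherID, Scopus, or CrossRef data; the sketch simply groups the same publication records first by printed name and then by contributor identifier.

```python
from collections import defaultdict

# Hypothetical records: (author name as printed, contributor ID, publication title)
RECORDS = [
    ("J. Smith", "contrib:0001", "Paper A"),
    ("J. Smith", "contrib:0002", "Paper B"),
    ("J. Smith", "contrib:0002", "Paper C"),
    ("J. Smyth", "contrib:0001", "Paper D"),  # spelling variant of the first author
]


def group_titles(records, key_index):
    """Group publication titles by the chosen field (0 = name, 1 = contributor ID)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    return dict(groups)


by_name = group_titles(RECORDS, 0)
by_id = group_titles(RECORDS, 1)

print(by_name)  # two different people are merged under 'J. Smith'
print(by_id)    # each person's papers are kept together, whatever the spelling
```

Grouping by name both merges two distinct authors and splits one author across two spellings; grouping by identifier avoids both errors, provided the identifiers themselves were assigned correctly.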
Whether run by a single organization or multiple organizations/companies, an open contributor identifier service or multiple linked services (hereafter referred to as simply CrossReg, for convenience) would be valuable on many levels in scientific publishing, just as DOIs have been for the publications themselves. So how may authors benefit from such a system? One important answer concerns centralized author profile management: given that a researcher has registered with CrossReg to claim his profile (and in the process supplied proof that he is who he claims to be), he could then associate his OpenID with his/her contributor ID. This would enable a host of new possibilities, such as logging on to a publisher’s website via OpenID (e.g. in order to submit a manuscript) and allowing the publisher to retrieve the author’s current affiliation and other profile information from the CrossReg service (Figure 1a).
Another important feature is that CrossReg would provide a measure of validity for a given author-publication relationship. This would be useful in all sorts of settings where incorrect publication attribution is a problem; consider the hypothetical author J. Smith who claims he has authored three papers, when in reality another J. Smith authored two of them. Potential users might include employers who wish to check publications listed on a job applicant’s CV, and authors who could display on their personal website a list of publications alongside a ‘verified by CrossReg’ icon (Figure 1d).
Incentives/Rewards for Scientific Contributions
The traditional way to gauge a researcher’s scientific prowess is to look at his publication record in peer-reviewed journals, and use crude, imperfect metrics like the ISI Impact Factor (IF) as a measure of the quality of these journals. But there are many other ways, besides authoring traditional papers, in which researchers contribute to science. Submissions to biological databases, curation of data in those databases, and Web 2.0 activities like scientific blogging and online commenting on and rating of scientific papers (pioneered by the journal PLoS ONE [12]) are examples of activities for which researchers get little or no credit at present. If these contributions can be tracked and linked to the identity of each researcher via his OpenID (Figure 1c), what would then gradually emerge is a web of publication credit-like (aka ‘microattribution’) information which can be mined and aggregated to produce far more useful metrics of individual scientific contribution than is possible today (see e.g. Scholar Factor as proposed in ref 2). These ideas are being further developed in the guise of a BioMedical Resource Impact Factor (BRIF) [13], which is heavily centered on the needs and activities of the Biobanking community.
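As a rough illustration of how microattribution records might be mined and aggregated, the sketch below tallies contributions per OpenID and computes a weighted total. The event log, contribution categories, and weights are assumptions made up for this example; this is not the Scholar Factor or BRIF calculation, neither of which is specified here.

```python
from collections import Counter

# Hypothetical contribution log: (researcher OpenID, contribution type)
EVENTS = [
    ("https://example.org/id/alice", "database_submission"),
    ("https://example.org/id/alice", "curation"),
    ("https://example.org/id/alice", "paper_comment"),
    ("https://example.org/id/bob", "blog_post"),
    ("https://example.org/id/bob", "curation"),
]

# Illustrative weights only; any real metric would need community agreement.
WEIGHTS = {"database_submission": 3, "curation": 2, "blog_post": 1, "paper_comment": 1}


def contribution_profile(events, openid):
    """Count each type of contribution attributed to one OpenID."""
    return Counter(kind for who, kind in events if who == openid)


def weighted_score(profile, weights):
    """Combine the counts into a single figure using the supplied weights."""
    return sum(weights.get(kind, 0) * count for kind, count in profile.items())


alice = contribution_profile(EVENTS, "https://example.org/id/alice")
bob = contribution_profile(EVENTS, "https://example.org/id/bob")
print(weighted_score(alice, WEIGHTS))  # 6
print(weighted_score(bob, WEIGHTS))    # 3
```

Keeping the per-type profile separate from the weighting step means different communities could apply their own weights to the same underlying record of contributions.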
Summary
There are many stakeholders that do (or arguably should) have an interest in Internet-based researcher identification, and the above text will hopefully help these individuals become more aware of allied projects in the field, and of the latest relevant technologies. Any system which enables detailed tracking of individuals’ activities, whether online or in the real world, brings with it the potential for invasion of privacy by governmental agencies and other parties. These ‘Big Brother’ concerns are valid and need to be addressed. But researchers cannot expect to have their cake (anonymity) and eat it too (accurate publication record, microattribution etc.). As pointed out in a recent report [14], there is “a careful balance to be struck between giving credit where credit is due and knowing everything about everyone”. Nevertheless, a system such as outlined above, where the individual is in the driving seat and controls his identity and how/where it is used, would go a long way towards addressing these privacy concerns and will be an important aspect of how science is conducted in the future.
Figure 1: A hypothetical example showing a researcher interacting with a variety of online services, all tied together via a universal authentication system.
References
1. OpenID. at <http://openid.net/>
2. Bourne, P.E. & Fink, J.L. I am not a scientist, I am a number. PLoS Comput Biol 4, e1000247 (2008).
3. Wellcome Trust Case Control Consortium. at <http://www.wtccc.org.uk>
4. European Genotype Archive. at <http://www.ebi.ac.uk/ega/>
5. The database of Genotypes and Phenotypes. at <http://www.ncbi.nlm.nih.gov/gap>
6. Qiu, J. Scientific publishing: Identity crisis. Nature 451, 766 (2008). Published online 13 February 2008. doi:10.1038/451766a
7. Falagas, M.E. Unique Author Identification Number in Scientific Databases: A Suggestion. PLoS Med 3, e249 (2006).
8. ResearcherID. at <http://www.researcherid.com>
9. The Scopus Author Identifier. at <http://info.scopus.com/etc/authoridentifier/>
10. crossref.org :: dois for research content. at <http://www.crossref.org>
11. CrossTech: CrossRef Author ID meeting. at <http://www.crossref.org/CrossTech/2007/02/crossref_author_id_meeting.html>
12. PLoS ONE : Publishing science, accelerating research. at <http://www.plosone.org/static/commentGuidelines.action>
13. Cambon-Thomsen, A. Assessing the impact of biobanks. Nature Genetics 34, 25–6 (2003).
14. Wolinsky, H. What's in a name? EMBO Rep 9, 1171–4 (2008).
Acknowledgements
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.
This document is © 2009 by gt50 - all rights reserved.