Tying it all together

In previous sections of this primer we introduced several aspects of the identification problem and outlined scenarios where a universal authentication system forms an integral part of the solution (see Figure 1 below). Now it is time to investigate how this may work in practice, which technologies (in addition to OpenID) might play a role, and how a researcher might leverage these tools to aggregate information about himself in a meaningful way.

Case study: data access control 

Imagine a researcher working for company X who has applied (and been approved) for access to a collection of type I diabetes GWAS datasets held in several online archives. The company does not want competitors to know that it is working on this particular disease (NB: the same would be true for many academic researchers). In general, we can conclude that a researcher's data access permits constitute private information (akin to one's address book) that should not be shared without the user's approval.

Relating this to the access mechanism proposed in a previous section, we can identify a key set of requirements: (a) the user needs to log onto one website (the data provider), and (b) this service needs to be able to securely request information from another site (the researcher / data permit service) on the user's behalf (see Figure 1b). The good news is that these requirements can be fulfilled with existing technologies.

Authentication vs authorization

We have already introduced OpenID as a candidate for solving the authentication part of this equation (proving who you are). By analogy, OpenID works like a set of master keys that open the doors to your house, your car and so on. You would never give these keys to just any person on the street and trust them not to do anything inappropriate (like going to your house and stealing your television!). For the same reason, you should never give out your OpenID credentials to anyone.

The other part of the equation is authorization: in the scenario above we need some way for the authenticated user to explicitly permit the data provider to act on his behalf, that is, to connect to the researcher registry and retrieve one particular piece of the user's private information, without ever handing over his OpenID credentials. It is possible to combine OpenID (or some other authentication protocol) with its companion open protocol, OAuth, in order to control permissions for web services in a fine-grained manner. OAuth is often likened to a special valet key for luxury sports cars, with which the parking attendant can only drive the car a short distance around the parking lot. Another analogy is giving somebody the keys to your house, with the restriction that this somebody can only enter on a Saturday afternoon and only to watch the football game on your television (perhaps better explained here).
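The valet-key idea can be illustrated with a minimal Python sketch. This is not the OAuth protocol itself, only the underlying notion of a grant that is limited in scope and in time. The class `PermitService`, its methods, the scope string format and the example identifiers are all hypothetical names introduced for illustration.

```python
import secrets
import time
from dataclasses import dataclass, field


@dataclass
class PermitService:
    """Hypothetical researcher / data permit registry that issues scoped tokens."""

    # Private data held by the service: user identity -> datasets they may access.
    permits: dict
    # Tokens issued so far: token -> (user, scope, expiry timestamp).
    _grants: dict = field(default_factory=dict)

    def authorize(self, user, scope, ttl_seconds=300, now=None):
        """Issue a token once the user has logged in and approved the request."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(16)
        self._grants[token] = (user, scope, now + ttl_seconds)
        return token

    def has_permit(self, token, dataset, now=None):
        """Answer one narrow question on the user's behalf, or refuse."""
        now = time.time() if now is None else now
        grant = self._grants.get(token)
        if grant is None:
            raise PermissionError("unknown token")
        user, scope, expires = grant
        if now > expires:
            raise PermissionError("token expired")
        if scope != f"check-permit:{dataset}":
            raise PermissionError("token does not cover this request")
        return dataset in self.permits.get(user, set())


# The data provider only ever holds the token, never the user's credentials.
service = PermitService(permits={"alice.example.org": {"t1d-gwas"}})
token = service.authorize("alice.example.org", "check-permit:t1d-gwas")
print(service.has_permit(token, "t1d-gwas"))
```

The token here plays the role of the valet key: it answers a single question about a single dataset for a few minutes, and reveals nothing else about which permits the user holds.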

If you are interested in seeing how this works hands-on, see this site for a simple demo which retrieves the contact list from your Google account. A real-life example (albeit using proprietary technology, rather than OpenID+OAuth) is provided by the Facebook social networking website: if you are a Facebook user, chances are you are already using authentication/authorization technologies to share your private data with certain Facebook applications, or to connect your Facebook account with external services such as the Flickr photo-sharing site.

Case study: microcredit tracking

When it comes to tracking database submissions, curation and other scientific contributions, one can imagine a researcher initially associating his OpenID with a microcredit tracker service. Whenever the researcher submits data to a biological repository (logged in via his OpenID), the repository submission service contacts the tracker service (securely, via OAuth) and transmits an indicator of the contribution (an authenticated token-based mechanism has been suggested [1]). The same could be done for data curation efforts, wiki editing and many other kinds of contributions; the tracker mechanism could be made completely generic.
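As a rough illustration of how a tracker might check that a contribution notice really comes from a trusted repository, the sketch below signs each record with an HMAC. This is one possible reading of "authenticated token", not the mechanism proposed in the cited paper or anything specified by OAuth. The shared key, the record fields and the names `sign_contribution` and `CreditTracker` are assumptions made for the example.

```python
import hashlib
import hmac
import json

# Hypothetical secret shared between one repository and the tracker.
SHARED_KEY = b"demo-key-not-for-real-use"


def sign_contribution(record, key=SHARED_KEY):
    """Repository side: serialize the record and attach an HMAC-SHA256 tag."""
    payload = json.dumps(record, sort_keys=True)
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


class CreditTracker:
    """Tracker side: accept only records whose tag verifies."""

    def __init__(self, key=SHARED_KEY):
        self._key = key
        self.credits = {}  # OpenID -> list of contribution records

    def receive(self, message):
        payload = message["payload"].encode("utf-8")
        expected = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
        # compare_digest avoids leaking information through timing differences.
        if not hmac.compare_digest(expected, message["tag"]):
            raise ValueError("signature mismatch: contribution rejected")
        record = json.loads(payload)
        self.credits.setdefault(record["openid"], []).append(record)
        return record


tracker = CreditTracker()
notice = sign_contribution(
    {"openid": "alice.example.org", "kind": "submission", "item": "dataset-123"}
)
tracker.receive(notice)
print(len(tracker.credits["alice.example.org"]))
```

Because the record carries only a kind and an item identifier, the same tracker could accept curation or wiki-editing credits without modification.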

Over time, the tracker (or possibly multiple trackers) would therefore aggregate submission credit information for the researcher, and the researcher may choose to make this aggregated information public (though some may not want to).

Aggregating information to populate a professional profile

Suppose that a future researcher has, through his online activities, accumulated various kinds of information at many different locations, all connected via his online identity. How can he aggregate this information and put it to use? One realistic objective is to create a professional profile (e.g. for job applications) which, among other things, would list scholarly publications and other contributions. It is important to many professionals to maintain such a profile online, often via dedicated websites such as LinkedIn, or on Facebook and similar social networking websites.

One can easily imagine an extension to LinkedIn which lets a user configure his profile to include a list of published papers retrieved from, and verified by, a central CrossReg service (see previous section), as well as a summary of verified database submissions fetched from a microcredit tracker service.
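A profile extension of this kind might, in outline, look like the Python sketch below. The two fetch functions are stand-ins that return canned data; a real version would call the publication registry and the tracker over the network with the user's authorization. All names (`Item`, `build_profile`, `fetch_publications`, `fetch_submissions`) and the sample entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    kind: str     # e.g. "publication" or "submission"
    title: str
    source: str   # which service vouched for the item
    public: bool  # whether the researcher has chosen to expose it


def fetch_publications(openid):
    """Stand-in for a query to a publication registry."""
    return [Item("publication", "Example paper A", "registry", True)]


def fetch_submissions(openid):
    """Stand-in for a query to a microcredit tracker."""
    return [
        Item("submission", "dataset-123", "tracker", True),
        Item("submission", "dataset-456", "tracker", False),
    ]


def build_profile(openid, sources, include_private=False):
    """Collect items for one identity from several sources, grouped by kind."""
    profile = {}
    for fetch in sources:
        for item in fetch(openid):
            if item.public or include_private:
                profile.setdefault(item.kind, []).append(item.title)
    return profile


profile = build_profile("alice.example.org", [fetch_publications, fetch_submissions])
print(profile)
```

Items the researcher has not marked as public are left out by default, which keeps control over what is shown with the individual.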

Conclusions

Any system which enables detailed tracking of individuals’ activities, whether online or in the real world, brings with it the potential for invasion of privacy by governmental agencies and other parties. These ‘Big Brother’ concerns are valid and need to be addressed. But researchers cannot expect to have their cake (anonymity) and eat it too (accurate publication record, microattribution etc.). As pointed out in a recent report [2], there is “a careful balance to be struck between giving credit where credit is due and knowing everything about everyone”.

Nevertheless, a system such as the one outlined above, where the individual is in the driving seat and controls his online identity and how and where it is used, would go a long way towards addressing these privacy concerns and will be an important aspect of how science is conducted in the future.

  1. Bourne et al. I am not a scientist, I am a number. PLoS Comput Biol (2008) vol. 4 (12). doi:10.1371/journal.pcbi.1000247
  2. Wolinsky. What's in a name? EMBO Rep (2008) vol. 9 (12). doi:10.1038/embor.2008.217
