News hit the stands about a new research collaboration to find biological markers for Alzheimer’s disease (read the stories in the New York Times and the Wall Street Journal). (HT @atreolar on Twitter). One thing that set this collaboration apart was that the researchers would
“share all the data, making every single finding public immediately, available to anyone with a computer anywhere in the world.”
The advantages of sharing data were made clear with respect to this project in the article:
“Different people using different methods on different subjects in different places were getting different results, which is not surprising. What was needed was to get everyone together and to get a common data set.”
And this is a very strong argument for data sharing. But as interesting as the story itself is, I find some of the issues it identified about scientists sharing data at such a wide scale even more interesting. Specifically, this paragraph brought some things back to mind:
“At first, the collaboration struck many scientists as worrisome — they would be giving up ownership of data, and anyone could use it, publish papers, maybe even misinterpret it and publish information that was wrong. “
This (with a different grammatical construction) is the argument floating around. We (scientists) may all see the advantage of data sharing, but are we willing to ‘give it up’?
If you ask scientists, many of us would probably say that we do science for a specific purpose: to try to help find a cure for a disease, to solve some environmental problem, or to contribute to human culture through the creation of knowledge. Data sharing makes us put our money where our mouths are.
But is it that easy?
I would argue it isn’t. Even when we may be willing to put our data out there, to have others use it and interpret it, there is a reality we still need to face: our hiring and promotion committees. And these look at our scientific output as ‘papers published’.
There has been a lot of chatter about how papers should be valued: should impact factor matter? Should we be looking at article-level metrics? But either way, we are still looking at papers. Should we stop valuing papers and start valuing datasets?
I brought this issue up at the Data Matters MoRST meeting I attended. The current PBRF system is incompatible with data sharing. It still measures ‘output’ as individual papers. And whether I like it or not, my University’s funding (and my ability to survive in the system) depends on me satisfying these criteria. So to promote data sharing, this too needs to change.
I wonder what would happen next time I apply for promotion if instead of listing my publications on my CV I were to list my ‘datasets’: This is the data I have generated (and made public), and this is how it has been used by me and by others. Wouldn’t that be a real measure of the impact of my work? Does it really matter ‘who’ used the data to advance knowledge? Or in other words, has the time come for ‘Data Level Metrics’?
Perhaps if we gave data the same standing as papers when it comes to evaluating performance, people might quickly learn that putting data out there increases (and makes measurable) the impact of our work. And we might be quicker to put it out.
On other news:
The Open Science Summit‘s opening session is now online thanks to ForaTV. It was a great opening session to be at, and I am glad I managed to make it there. Unfortunately I wasn’t able to stay for the rest of the meeting.
At the same time that this was happening, the government of New Zealand released its Open Access and Licensing Framework (NZGOAL). You can read about it on the Open Knowledge Foundation website, which has links to all of the documents. This is indeed good news for data sharing in New Zealand. And when I returned from my trip I found an email from Creative Commons Aotearoa New Zealand informing me that I had been selected as a member of the CCANZ Advisory Panel.
I want to thank CCANZ for allowing me to be part of this panel. It is indeed an honour, and I look forward to the good things that promise to come out of it.
It’s been a while since I wrote one of these posts, and a lot has been happening which is really great. But, as usual, exam marking took priority, or should I say, took over my life.
And the SPARC goes to….
“SPARC has become a catalyst for change. Its pragmatic focus is to stimulate the emergence of new scholarly communication models that expand the dissemination of scholarly research and reduce financial pressures on libraries.”
According to the press release of June 22nd, the authors of the Panton Principles (Peter Murray-Rust, Cameron Neylon, Rufus Pollock and John Wilbanks) were given this award because:
“The authors advocate making data freely available on the Internet for anyone to download, copy, analyze, reprocess, pass to software or use for any purpose without financial, legal or technical barriers. Through the Principles, the group aimed to develop clear language that explicitly defines how a scientist’s rights to his own data could be structured so others can freely reuse or build on it.”
There is a great article on the award and the history of the Panton Principles here. It is definitely worth a read.
(HT Jonathan Gray)
Open Science Summit
The “First Ever Open Science Summit” will be taking place in Berkeley, California, on July 29–31. It promises to be a great event, not only because I am sure it will bring a lot of energy from the people attending a “First”, but also because the session schedule just made me drool. From a retrospective of the human genome, to Open Access publishing, to Citizen Science, this looks like it will be a couple of days to remember for those able to attend.
(HT @JasonHoyt on Twitter)
Licensing Open Data
The Panton Principles address some of the issues surrounding how data should be shared. Last week Glyn Moody on Twitter pointed to this site: the Open Data Commons, a project run by the Open Knowledge Foundation. The site provides three types of licences for data. I found the FAQ section quite informative, especially the linked section that discusses why these licences were put into place as opposed to the Creative Commons licences. (HT @glynmoody on Twitter)
Related to this, there is a really interesting article on the Open Knowledge Foundation site that discusses the differences in non-commercial (NC) and share-alike (SA) licences, which addresses why the licences offered by the Open Data Commons are the way they are.
“This interoperability is absolutely key to realizing the main practical benefits of “openness” which is the ease of use and reuse — which, in turn, mean more and better stuff getting created and used.
[…]The aim is to ensure that any license which complies with the definition will be interoperable with any other such license meaning that data or content under the one license can be combined with data or content under the other license.
[…]Non-commercial provisions are not permitted because they fundamentally break the commons, not only through being incompatible with other licenses but because they overtly discriminate against particular types of users.”
The Panton Principles had already suggested that Creative Commons licences other than CC0 should be discouraged. The Open Data Commons provides licences for data and databases that should facilitate the way data can be shared.
And that is a good thing.
Congratulations to our fellow Sciblings who made it to the finals for the Research Blogging Awards 2010.
- Misc.ience (by Aimee Whitcroft) was nominated for best blog in Chemistry, Physics or Astronomy.
- The Atavism (by David Winter) was nominated for best lay-level blog.
Congratulations to you both (and if I have failed in identifying a sciblogger, please let me know!)
Random samples of my reading list brought to you through the magic of the internet, bloggers and Open Access.
This has been a busy week, but I managed to get some reading in anyway.
I loved this article in PLoS Computational Biology (Getting Started in Gene Expression Microarray Analysis) by Slonim and Yanai. It is a great “do’s and don’ts” of the technique. I love articles that spell out techniques I don’t have personal experience with, because they give me the information I need to make a critical assessment of the literature that makes use of them. I will be coming back to this article a lot!
John Wilbanks has a great post: “Open Source Science? Or distributed science?” He starts his post by saying:
I was asked in an interview recently about “open source science” and it got me thinking about the ways that, in the “open” communities of practice, we frequently over-simplify the realities of how software like GNU/Linux actually came to be. Open Source refers to a software worldview. It’s about software development, not a universal truth that can be easily exported. And it’s well worth unpacking the worldview to understand it, and then to look at the realities of open source software as they map – or more frequently do not map – to science.
And that was enough to hook me. Very interesting read.
Misha (from Mind Hacks) has a great post on brain stories and neuronovels, or how neuroscience is seeping into literature. The post is a comment on a story by Marco Roth (n+1), which is a must-read for those who love both literature and the brain.
My favourite tweet this week is by @gnat, pointing to the historical thesaurus of the Oxford English Dictionary. Lust, indeed!