‘what would Software Carpentry look like if it was delivered as a university course?’
A number of conversations and workshops kept indicating that the thirst and need for this were there, that there wasn’t a clear solution in place, and that the solution was not going to be easy to produce. We knew what we wanted the house to look like, but we needed to find an architect. And, of course, money to pay them.
Enter Nat Torkington
Nat organises an unconference called KiwiFoo. He invites a bunch of people to a retreat north of Auckland and lets the awesome happen. In 2015 I was invited and, by pure luck, Kaitlin Thaney was invited too, as she was in Australia around that time for a Software Carpentry instructor training around ResBaz Melbourne. Also invited were Nick Jones, director of NeSI, which had recently become the New Zealand institutional partner of Software Carpentry, and John Hosking, Dean of the Faculty of Science, University of Auckland.
The words that Kaitlin Thaney said at one of our meetings came back as if from a loudspeaker: You need to engage with the University Leadership. You need to think strategically.
And KiwiFoo gave us that opportunity.
Kaitlin, Nick and I brought John Hosking into the conversation, and his response was positive. We tried to exploit the convergence as much as we could over that weekend – there are not that many chances to get to sit with this group of people in a relaxed environment and without interruptions or the need to run to another meeting. We had each other’s full attention. And exploit we did.
Back in Auckland, Nick suggested that I talk about the project to the Centre for eResearch Advisory Board. The Centre for eResearch at the University of Auckland helps researchers with exactly these kinds of issues. Next thing I knew, Cameron McLean and I were trying to turn everything we had learned through the workshops into something more concrete. I spoke to those details, and when the Board asked ‘how can we help you?’ I did not know what to say.
Luckily, Nick Jones, as usual, came to the rescue. We had a chat, and he decided to work with me on higher-level thinking. I was still missing the big picture that we could offer the leadership. Watching Nick’s thinking process was a humbling joy. I think I learned more from that session than I did in all the Leadership programmes I was part of. I also realised how far I was from where we needed to get. What is the long term vision? What are the gaps? Why do we need to fill them? How are you going to manage change?
At this meeting we saw we needed to engage with CLeaR, the organisation that provides Professional Development for staff, a group that has a lot to offer in instructional design. We had already agreed that this training project should not be focused solely on students but, rather, should have a broader scope. We produced an initial outline of what we were proposing, and invited Adam Blake from CLeaR to join the conversation and contribute to this document.
I was invited again to the eResearch Advisory Board, and this time I was better prepared. The timing was also perfect. The application window for the Vice Chancellor’s Strategic Development fund was open and I now knew what I needed: support to put an application through. We built a team of key project advisors, each of whom could contribute something quite specific: Adam Blake, to advise on course structure and to support the research on the course; Mark Gahegan, Director of the Centre for eResearch; Poul Nielsen, from the Auckland Bioengineering Institute; Nick Jones, from NeSI; and myself as the Project Lead, with the intention of hiring Cameron McLean as project manager. We worked on the application and, backed by the eResearch Advisory Board, it went in.
Our proposal was to develop a training suite, based on Software and Data Carpentry, that could be delivered to students and staff in different formats; to support a ResBaz in Auckland in February 2016; and to run a pilot course in the second semester of 2016 for students about to enter the research lab. We knew our bottleneck was time – people’s time to do the work. We asked for $150,000 in salaries.
In September we got the email: your application has been approved….
The Vice Chancellor’s fund was giving us initially a limited amount of money with the rest of the money contingent on the approval of a needs analysis by the eResearch Advisory Board.
We accepted the offer and hired Cameron McLean as Project Manager (by now he was a trained Software Carpentry Instructor and had submitted his PhD thesis and was waiting for his viva). First order of business, a needs analysis.
Time to go to the library.
When Billy Meinke and I sat down to plan our sprint session for MozFest, he suggested that the activities of science could be grouped into 3 objects (text, code, data) and 3 actions (create, share, reuse). I was skeptical – surely science is way more complex than that. Running the session at MozFest and later in New Zealand, however, convinced me that Billy was right. We never encountered any object or action that we could not fit within that classification.
The actions are self-explanatory – create, share, reuse. You either are the author of a manuscript or you are not, you have contributed data or not, you have contributed to software or not. You create, share or reuse, or you don’t. However, what emerged at MozFest was how these 3 actions (which we seem to engage with separately) are actually very dependent on each other – how we can share depends on how we created. Let’s look at an example:
I am capturing neural data using proprietary software that creates proprietary formats. That decision affects how those data can be reused (by a future self or others): only those with access to that software can open those files. Sharing and reuse, hence, become limited. If instead I think of sharing and reuse from the outset, I may choose to use a piece of software (proprietary or not) that at least lets me export the raw data in an open format. Once I do this, the opportunities to share for reuse come down to licencing. (Note: using proprietary software may bring about other issues, but that is a slightly different discussion.) So, during the act of creation we build constraints (or eliminate them) around sharing and reuse. So why not think about this upfront? Similarly, once we decide to share, the licences we use will determine what kinds of reuse our work can have. These 3 actions are very interconnected – and it would be useful to think about how each affects the others from step 0 (planning). Those decisions will affect not just how we create, but also the infrastructure we choose to use so that the act of sharing and reuse is made as easy as possible when the time comes. Licences may state what we can do ‘legally’ but the infrastructure we use defines what we can do ‘easily’.
The objects became a lot more interesting as the MozFest and the New Zealand workshops progressed. At first, the idea of text, data and code seemed quite self-explanatory. Each of them we can identify with something we actually recognise, like a manuscript, the software we use for data acquisition and analysis, the data we measure or that is produced by some automated system. The fun part started when we tried to describe how these objects ‘behaved’, and, given those behaviours, how we were able to describe them (e.g., metadata).
Examples of text usually came across as manuscripts. When we think of manuscripts, we think of things that tell the story of data and code. They have a narrative that provides context to the work, there is usually a version of record which is difficult to modify, we usually publish it, it is peer reviewed, authors are well-defined, etc. The drafts prior to the version of record, the peer review, the corrections, etc., are usually not available (although that is changing in places, e.g., PeerJ, F1000, BioRxiv, to name a few). We usually interact with the versions of record; our ability to comment on those artefacts is limited, although it is becoming available (e.g., PubMed), and mechanisms to suggest modifications to (or to modify) these artefacts are almost non-existent. In other words – text (manuscripts) is stable. Another artefact that ‘behaves’ like text is equipment. Look at the following comparison:
|Manuscript|Equipment|
|Volume, page numbers|Internal asset tag|
|Errata and corrections|Maintenance and repair records|
In other words, equipment seems to ‘behave’ like manuscripts. If you want to do (or say) something the equipment doesn’t do (or the paper doesn’t say) you need to buy (or write) a new one. So, when describing equipment you end up using similar descriptors to those for papers. In itself, a piece of equipment is a ‘stable’ object that can be described with ‘stable’ descriptors, not too dissimilar to a paper. Defective equipment breaks, defective papers get retracted. What this means is that the category ‘text’, when thought of as a set of behaviours and descriptors, can help us build better descriptors for other artefacts with similar behaviours. This behaviour also determines how we create, share and reuse. How we publish a manuscript (behind paywalls, or with an Open Access licence) determines who can reuse and how. Using ‘open hardware’ equipment is similar to applying an open Creative Commons licence to a manuscript – using proprietary equipment is equivalent to publishing behind a paywall.
At the other end of the spectrum is code. Code likes to live in places like GitHub. There is version control and the ability for multiple external and internal contributions, the list of contributors is agile and expands, and the code can stay alive and dynamic for long periods of time. There may be stable versions that are released at different time points, but in between, code changes. Code is at the core of reproducibility – it is the recipe that lists the ingredients and the sequence of what we did with and to the data. Not sharing the code is the equivalent of giving someone a cupcake and expecting them to be able to go and bake an identical version. So code is dynamic and its value is in the details. A lot of the value in code is that it is amenable to adaptations and modifications. One artefact of research that behaves like code is the experimental protocol, e.g., a protocol that describes a specific method for in situ hybridization, or how to make a buffer.
|Code|Protocol|
|Original author, plus future contributors|Original author, plus future contributors|
|What a line of code does is described through annotation|What a line describing a step does ‘should’ be described by annotation|
|Bits and pieces can be copied to be part of another piece of software|Bits and pieces can be copied to be part of a different protocol|
|A single version for a single study|A single version for a single study|
|Otherwise constantly changing and being updated|Otherwise constantly changing and being updated|
So protocols seem to behave like code. Unfortunately, we tend to treat them as text (we share them in the materials and methods of our manuscripts). It would be much more useful to have protocols in places like GitHub – allowing line testing and annotation, allowing ‘test driven development’ of protocols, allowing branching and merging, etc. If we were to think of protocols as ‘code’ we could then share them in a way that makes them more amenable to reuse. And if we do so, then we might think that an appropriate way to licence a protocol for sharing and reuse would be to apply the licences that promote sharing and reuse for code, not licences for text.
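To make the idea of a ‘test driven’ protocol a little more tangible, here is a minimal sketch in Python of one step of a buffer recipe written as code, with a test sitting next to it. Everything in it is an assumption made for illustration: the names `BufferStep` and `grams_needed`, the version string, and the choice of a 0.15 M NaCl step are hypothetical and do not come from any real protocol or from the workshops.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferStep:
    """One annotated step of a (hypothetical) buffer protocol."""
    reagent: str
    molar_mass_g_per_mol: float
    target_molarity_mol_per_l: float
    note: str  # the 'annotation' that explains what the step is for


def grams_needed(step: BufferStep, final_volume_l: float) -> float:
    """Mass of reagent to weigh out: molar mass x molarity x volume."""
    if final_volume_l <= 0:
        raise ValueError("final volume must be positive")
    return step.molar_mass_g_per_mol * step.target_molarity_mol_per_l * final_volume_l


# Illustrative values only: NaCl (58.44 g/mol) at 0.15 mol/L.
PROTOCOL_VERSION = "0.1.0"  # hypothetical release tag for 'the version used in a study'
EXAMPLE_STEP = BufferStep(
    reagent="NaCl",
    molar_mass_g_per_mol=58.44,
    target_molarity_mol_per_l=0.15,
    note="dissolve fully before bringing the solution to its final volume",
)


def test_grams_needed() -> None:
    """If someone edits the step, this check fails loudly instead of silently."""
    assert abs(grams_needed(EXAMPLE_STEP, 1.0) - 8.766) < 1e-6


if __name__ == "__main__":
    test_grams_needed()
    print(f"protocol {PROTOCOL_VERSION}: weigh {grams_needed(EXAMPLE_STEP, 0.5):.3f} g of {EXAMPLE_STEP.reagent}")
```

Kept under version control, a file like this could be branched for a new experiment, merged back when a change proves useful, and tagged at the exact version used in a given study.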
Data sits somewhere in the middle of the two. Like text, it has stable versions – e.g., the data that accompanies a specific manuscript. Once data is captured, it cannot be changed (except of course to correct an error, or to legitimately remove ‘bad’ data points or outliers). In essence, data changes by growing, reorganising, subsetting, etc., not by changing specific pre-existing values. It has some of the dynamic behaviours of code and some of the stable behaviours of text. It has stable versions and dynamic versions: it changes as the project progresses, as outliers are eliminated, as new data is added. How data is created determines how it can be shared and reused: are the formats open or proprietary, is it licenced openly or not, etc. But for the most part there is a point in time where data moves from behaving like ‘code’ to behaving like text. Good open formats and licences can bring data back to a dynamic state (something harder to do with text-objects). This behaviour is important when we write the descriptors of data: there are the authors, data is linked to protocols and code, and eventually text, it can be used for different analyses, etc. Chemicals, in a way, behave like data:
|Data|Chemicals|
|File name|Catalogue number|
|Version|Lot #, shipping date, aliquots|
|Storage place|Storage place|
|Linked to code and text|Linked to protocols and manuscripts|
How we share and describe data and chemicals is again similar. Is the chemical/data available to other researchers so they can repeat my experiments? Or is it something I produced in my lab and only share with a limited number of people? Again, how you ‘licence’ data and chemicals determines the extent to which these artefacts can be shared and reused. And, again, thinking about this intention at the planning stage makes a difference.
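One way to picture the shared descriptors is as a single record type that fits both a dataset and a chemical. The sketch below is only an illustration in Python: `ArtefactDescriptor`, `missing_descriptors` and all the example values are hypothetical names and data invented here, and the field list simply mirrors the rows of the comparison above plus a sharing decision.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ArtefactDescriptor:
    """Descriptors shared by 'data-like' artefacts (illustrative, not a standard)."""
    kind: str                      # "data" or "chemical"
    identifier: str                # file name, or catalogue number
    version: str                   # dataset version, or lot number / shipping date / aliquot
    storage_place: str             # where the file or the tube actually lives
    linked_to: List[str] = field(default_factory=list)  # code, text, protocols, manuscripts
    shared_openly: Optional[bool] = None                 # None means "not decided yet"


def missing_descriptors(artefact: ArtefactDescriptor) -> List[str]:
    """Names of descriptors that are still empty or undecided."""
    missing = []
    for f in fields(artefact):
        value = getattr(artefact, f.name)
        if value is None or value == "" or value == []:
            missing.append(f.name)
    return missing


# Made-up examples: the same record type describes a dataset and a chemical.
dataset = ArtefactDescriptor(
    kind="data",
    identifier="recordings_session01.csv",
    version="v1",
    storage_place="lab file server",
)
antibody = ArtefactDescriptor(
    kind="chemical",
    identifier="CAT-0001",
    version="lot 42, aliquot 3",
    storage_place="freezer B, shelf 2",
    linked_to=["staining protocol"],
    shared_openly=False,
)

print(missing_descriptors(dataset))   # descriptors still to settle at the planning stage
print(missing_descriptors(antibody))
```

Running a check like this when planning, rather than at publication time, is one small way of deciding about sharing and reuse from step 0.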
All three objects can be published and cited, and data and code are slowly claiming the hierarchical position they deserve in the research cycle. The need for unique identifiers for resources is also recognised here and here, for example.
During the workshops it was fun to get people to ‘classify’ their research artefacts based on these behaviours. At MozFest, for example, Daniel Mietchen suggested his manuscripts behave more like code. I would argue that they should then be licenced (and described) like code.
What I learned from these workshops (and Billy’s 3×3 table) is that if we can classify all of our artefacts within these categories, then the process of describing our research artefacts and building them with the intention of openly sharing for reuse becomes much easier. And teaching the skills to understand how your choices constrain downstream effects becomes more achievable.
As long as, of course, you think about this from the beginning.
Footnote: This is my interpretation of Billy Meinke’s thinking model – he may loudly laugh about my interpretation. He may even roll his eyes, hit his head against the wall – I don’t know. But the clarity he brought to my approach to the problem is something I am extremely grateful for. Hat tip.
We put a man on the moon about half a century ago yet we still haven’t solved the problem of access to the scientific literature.
I was invited to speak at the New Zealand Association of Scientists meeting this year. The theme was “Science and Society” and I was asked to speak about Open Access from that perspective.
The timing was really good. Lincoln University published their Open Access Policy last year, Waikato University released their Open Access mandate a couple of weeks ago, and the University of Auckland is examining their position around Open Access. New Zealand is catching up.
I opened my talk by referring to the New Zealand Education Act, which outlines the role of universities:
“…a university is characterised by a wide diversity of teaching and research, especially at a higher level, that maintains, advances, disseminates, and assists the application of, knowledge, develops intellectual independence, and promotes community learning”
[New Zealand Education Act (1989) Section 162.4.b.iii] (emphasis mine)
I argued that those values could be best met by making the research outputs available under Open Access as defined by the Budapest Open Access Initiative, that is, not limited to “access” but, equally importantly, allowing re-use.
After summarising the elements of the Creative Commons licences that can support Open Access publishing, I invited the audience to have an open conversation with their communities of practice to examine what value each places on how we share the results of our work.
My position is that the more broadly we disseminate our findings, the more likely we are to achieve the goals set out by the NZ Education Act: to maintain, advance and assist in the application of knowledge, develop intellectual independence and promote community learning. I am also of the position that this is what should be rewarded in academic circles. I think that, as a community, we should move away from looking for value in the branding of the research article (i.e., where it is published) and focus instead on measuring the actual quality and impact of the research within and outside the academic community.
How do we measure quality and impact?
At times I feel we have become lazy. We often stick to using impact factor as a proxy for quality instead of interrogating the research outputs to understand their contribution and impact. Impact factor may be an easy metric – but it is not one that measures in any way the quality or impact of an individual article, let alone of the researchers who authored it. It is just an easy way out, a number we can quickly look at so we can tick the right box. As a metric it is easy, quick and objective. As a metric of value of an individual piece of work it is also useless and, because of that, it inevitably lacks fairness in research assessment.
What does this have to do with OA?
By the end of the conference I couldn’t shake the thought that the barriers to Open Access may not be financial and the costs of publication fees may be the least of our problems. (This issue of cost just keeps coming up.) I can’t help but wonder if the cost of Open Access might just be a red herring that lets us avoid the real (and bigger) issue: quality assessment. Open Access may help our articles have a wider reach but, except for a few titles, Open Access journals are not recognisable brands. If we are forced to stop looking at the “journal brand” we will be forced to assess the individual articles for their intrinsic value and impact. And, although it may lead to better, more valid, assessment, it is also a big and difficult job.
A lot of what was said today at the conference revolved around the value of New Zealand science (and scientists) to society and the importance of science communication. We spoke about the importance of evidence-based policy, the need to be the critic and conscience of society and the challenges of working with the public to build trust in scientific evidence despite its uncertainties. We expect politicians and society to do the hard job of making decisions based on evidence. I couldn’t help but ask whether we, as a community of scientists, can live up to those standards.
Can we ditch the bad and easy for the good and hard?
We put a man on the moon. Solving the issues around open access and research assessment must certainly be easier. Are we ready to put our money where our mouth is?
What do brain machine interfaces and Open Science have in common?
They are two examples of concepts that I never thought I would get to see materialised in my lifetime. I was wrong.
I had heard of the idea of Open Access as Public Library of Science was about to launch (or was in its early infancy). It was about that time that I moved to New Zealand and was not able to go to conferences as frequently as I did in the USA, and couldn’t afford having an internet connection at home. Email communication (especially when limited to work hours) does not promote the same kind of chitter-chatter you might have as you wait in the queue for your coffee – and so my work moved along, somewhat oblivious to what was going to become a big focus for me later on: Open Science.
About 6 years after moving to New Zealand things changed. Over a coffee with Nat Torkington, I became aware of some examples of people working in science embracing a more open attitude. This conversation had a big impact on me. Someone whom I had never met before described to me a whole different way of doing science. This resonated (strongly) because what he described were the ideals I had at the start of my journey; ideals that were slowly eroded by the demands of the system around me. By 2009 I had found a strong group of people internationally that were working to make this happen, and who inspired me to try to do something locally. And the rest is history.
What resonated with me about “Open Science” is the notion that knowledge is not ours to keep – that it belongs in the public domain where it can be a driver for change. I went to a fee-free University and we fought hard to keep it that way. Knowledge was a right and sharing knowledge was our duty. I moved along my career in parallel with shrinking funding pots and a trend towards academic commodification. The publish or perish mentality, the fears of being back-stabbed if one shares too early or too often, the idea of the research article placed in the “well-branded” journal, and the “paper” as a measure of one’s worth as a scientist all conspire to distract us from exploring open collaborative spaces. The world I walked into around 2009 was seeking to do away with all this nonsense. I have tried to listen and learn as much as I can; sometimes I even dared to put in my 2 cents or ask questions.
How to make it happen?
The biggest hurdle I have found is that I don’t do my work in isolation. As much as I might want to embrace Open Science, when the work is collaborative I am not the one that makes the final call. In a country as small as New Zealand it is difficult to find the critical mass at the intersection of my research interests (and knowledge) and the desire to do work in the open space. If you want to collaborate with the best, you may not be able to be picky about the shared ethos. This is particularly true for those struggling to build a career and get a permanent position; for them, the advice of those at the hiring table will always sound louder.
The reward system seems at times to be stuck in a place where incentives are (at all levels) stacked against Open Science; “rewards” are distributed at the “researcher” level. Open Research is about a solution to a problem, not about someone’s career advancement (although that should come as a side-effect). It is not surprising then how little value is placed on whether one’s science can be replicated or re-used. Once the paper is out and the bean drops in the jar, our work is done. I doubt that staffing committees or those evaluating us will even care about pulling those research outputs and reading them to assess their value – if they did we would not need to have things like Impact Factors, h-index and the rest. And here is the irony – we struggle to brand our papers to satisfy a rewards system that will never look beyond its title. At the same time those who care about the content and want to reuse it are limited by whichever restrictions we chose to put at the time of publishing.
So what do we do?
I think we need to be sensitive to the struggle of those that might want to embrace open science, but are trying to negotiate the assessment requirements of their careers. Perhaps getting more people who embrace these principles onto University staffing and research committees might at least provide the opportunity to ask the right questions about “value”, and at the right time. If we can get more open-minded stances at the hiring level, this will go far in changing people’s attitudes at the bench.
I, for one, find myself in a relatively good position. My continuation was approved a few weeks ago, so I won’t need to face the staffing committee except for promotion. A change in title might be nice – but it is not a deal-breaker, like tenure. I have tried to open my workflow in the past, and learned enough from the experience, and will keep trying until I get it right. I am slowly seeing the shift in my colleagues’ attitudes – less rolling of eyes, a bit more curiosity. For now, let’s call that progress.
I came to meet in person many of those who inspired me through the online discussions since 2009, and they have always provided useful advice, but more importantly support. Turning my workflow to “Open” has been as hard as I anticipated. I have failed more than I have succeeded but always learned something from the experience. And one question that keeps me going is:
What did the public give you the money for?
or the day after the sting
I got the embargoed copy of the Science Magazine article on peer review in Open Access earlier this week, which gave me a chance to read it with tranquility. I have to say I really liked it. It was a cool sting, and it exposed many of the flaws in the peer review system. And it did that quite well. There was a high rate of acceptance of a piece of work that did not deserve to see the light. I also immediately reacted to the fact that the sting had only used Open Access journals – cognizant of how that could be misconstrued as a failure of Open Access and detract from the real issue, which is peer review.
I had enough time to write a blog post, and was lucky enough to be able to link to Michael Eisen’s take on the issue before I posted, so I did not need to get into the nitty gritty of why the take from the sting had to be taken for nothing more than what it was – an anecdotal set of events. Because what it was not, is a scientific study.
One of the things that I found valuable from the sting (or at least my take-home message) was that there is enough information out there to help researchers navigate the Open Access publishing landscape they are so scared of, and that it provided some information on how to choose good journals. The excuse that there are too many predatory journals, used to justify not publishing in Open Access, is now made weaker. It also provided all of us with an opportunity to reflect on the failures of peer review and the value of the traditional publication system.
Or so I thought.
Then the embargo was lifted, and I have been picking up brain bits spilled over twitter, blogs and other social media as the tsunami of heads exploding started. And as the morning alarm clocks went off as the sun rose in different time zones, new waves of brain bits came along.
By now, I could look at the entire ‘special issue’ and what else was in it. Here is where I see the problem.
There were lots of articles talking about science communication. Not one of them could I find (please someone correct me if I am wrong!) that took on the sting to refocus the discussion in the right direction (that is, peer review), nor to reflect on how Science and the AAAS behind it measure up to those issues they so readily seemed to criticise.
I never liked the AAAS – or rather I began disliking it after I got my first invitation to join in the late 1980’s. It seemed that all I needed to do to become a member was send them cash. There was no reason to do that – since obviously, without requiring anyone to endorse me as a “proper scientist” I could not see what that membership said about me other than having the ability to write a check. I was already doing that with the New York Times, and if I couldn’t put that down in my CV, then neither could I put down my membership with AAAS. Nothing gained, nothing lost, move on.
What I didn’t know back at that time was that that first letter would be the first in a long (long!) series of identical invitations that would periodically arrive in my mailbox, where they would be quickly disposed of in the rubbish bin in the corner of the room. I am sure one would be able to find plenty of those in the world’s landfills.
“The vitality of the scientific meeting has given rise to a troubling cottage industry: meetings held more for profit than enlightenment.” (Stone, R., & Jasny, B.)
Wut? Let’s apply the same logic to the AAAS membership – Would we consider that predatory behaviour too?
Let’s move on to peer review.
Moving back to the sting. Yes, they sent a lot of articles out. The article in Science seems to me to be delivered from a very high horse, and one with no legs to stand on. Their N is large (perhaps not large enough, but that is beside the point), but to each journal they sent just one (n=1; “en equals one”) hoax paper (singular, not plural). I may ask – had they sent, say, 10 hoax papers to each journal, would each journal have accepted all 10, only 5 or perhaps only 1? Because that makes a difference at the individual journal level. If we are going to accept that such an n=1 is enough to draw any informed conclusion about whether a journal is predatory or not, then, well, arsenic life. ‘Nuff said.
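For anyone who wants to put a number on the n=1 problem, here is a rough sketch in Python. It is my own illustration, not anything from the Science article: it uses the standard Wilson score interval for a binomial proportion, and the function name `wilson_interval` and the "1 of 1" and "10 of 10" scenarios are assumptions chosen to mirror the question above.

```python
import math


def wilson_interval(accepted: int, submitted: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a journal's true acceptance rate.

    accepted / submitted is the observed rate; z = 1.96 corresponds to roughly 95%.
    """
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    if not 0 <= accepted <= submitted:
        raise ValueError("accepted must be between 0 and submitted")
    p_hat = accepted / submitted
    z2 = z * z
    denom = 1 + z2 / submitted
    centre = (p_hat + z2 / (2 * submitted)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / submitted + z2 / (4 * submitted ** 2)
    ) / denom
    # Clamp to [0, 1] to guard against floating point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical scenarios: every hoax paper sent to one journal gets accepted.
for submitted in (1, 10):
    low, high = wilson_interval(accepted=submitted, submitted=submitted)
    print(
        f"{submitted} of {submitted} accepted: "
        f"plausible acceptance rate between {low:.2f} and {high:.2f}"
    )
```

Under these assumptions, a single accepted hoax is compatible with a journal that would accept only about one bad paper in five, while ten out of ten pushes the lower bound up to roughly 0.72. That is the difference it makes at the individual journal level.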
Let’s take a second look at the arsenic paper. n=1. The arsenic paper was so bad that poor Michael Eisen’s head exploded because readers of his blog actually believed he had sent it in as a hoax – I myself even got caught doing a double-take when I started reading his blog post (but I kept on reading!). That’ll teach him for being such a convincing writer.
So, if n=1 is enough, does that mean Science magazine is ready to add their name to the list of journals that don’t meet the mark? I could not, in their issue, find any reflection on that (please someone correct me if I am wrong!).
… and to open access
But the bigger issue in my view was what appears to be a position of Science on Open Access. Now Science is not Nature. Science is the flagship journal of AAAS. AAAS says it is an organisation “advancing science, serving society”. Here are some of their mission bullet points:
Enhance communication among scientists, engineers, and the public;
Promote and defend the integrity of science and its use;
Foster education in science and technology for everyone;
Increase public engagement with science and technology; and
How is any of this better served by having their flagship magazine behind a paywall?
Can they support, through scientific data, that having their flagship journal behind a paywall helps achieve any of those goals? Now those are data I would love to see. Because their “special issue”’s biased criticism (please someone correct me if I am wrong!) of Open Access seems to suggest so. Now, if they can’t provide a scientific argument as to why we should give them so much money to be members or access their publication, then how are they any different from the “cottage industry” they seem so ready to criticize? Is preying on libraries or readers less bad than on authors? If I purchase a “pay per view” article and don’t like it, or it does not contain the data promised by the abstract, do I get my money back? Or do these paywalled journals just take the money and run? Because, as much as I dislike the predatory open access journals, at least they are putting the papers out there so that we can all crowdsource how much crap they are.
Do I find an issue with their bringing to the attention of their readership the troubled state of the publishing industry? No.
Do I find an issue with some of the articles in the special issue focusing on some of the naughty players in the Open Access landscape? No.
What I do have a problem with, is the apparent lack of reflection on Science’s and AAAS’ own practices (please someone correct me if I am wrong!).
There was an opportunity to step up, and that opportunity was missed. Science might have a shiny coat of wool decorated with double digit impact factors, but I am not buying it.
I am sticking with the New York Times.
[Updated Oct 5 1:19 to add missing link]