Build Things Together

Bill Mills

Scientific Software Developers

August 19, 2015 | 6 Minute Read

Programming continues to move towards center stage in all fields of research, and this growing primacy is creating a generation of grad students and post docs like myself: classically trained scientists thrust into roles that are mostly or entirely coding. My colleagues and I are and will continue to demand recognition for our contributions, and by virtue of experience are creating a new role in the lab: that of the Scientific Software Developer. This role will come into focus over the next few years, which means that now is the time to think about who and what we want these new scientists to be.

This is a crucial conversation for the open science movement. Without care, scientific software developers may narrow their focus to their most obvious responsibility: writing quality scientific software. But the lesson we need to learn from Mozilla’s Working Open Guide, and from collaborations like NCEAS’ Codefest, CERN’s Webfest, and the Global Sprint is that the ways and contexts in which we write that software will dramatically impact how open that code is, how sharable and collaborative it is, and by extension how much impact this new subfield has. Scientific software developers have the opportunity to break their fields of research out of their silos - but only if we are willing to embody a number of meta-skills exceeding code alone.

Code Curation

One of the key meta-responsibilities of the scientific software developer is to provide their lab with an awareness of what code is already out there, and informed recommendations on how to take advantage of it. Without this awareness, we will continue to reinvent wheels in isolation, which is the antipattern I would most like to see smashed beyond repair. We can do that smashing, by having a loud conversation about what we’re hacking on at the moment consisting of a few parts:

  • Software Citation. In order to break our code out of institutional silos, we need a way to discover it in context with the research it supports. That way, by staying abreast of the current literature in our fields (as one does in any case), we can create a virtuous circle of citation, discoverability, and reuse.
  • Conference talks & tracks. In addition to citing our software after the fact, we need to promote it as part of active scientific discourse, so that our fields are as broadly aware of our work as possible. Traditional software developers have a rich ecology of software conferences to discover what’s new; scientific software developers need the same.

[Software citation] can create a virtuous circle of citation, discoverability, and reuse.

End-To-End Openness

Something Abby Cabunoc Mayes does an excellent job of championing over at Mozilla is the concept of ‘working open’, which she describes in detail in the similarly-named Guide: it won’t be enough for scientific software developers to take their code and slap it up on GitHub after the fact - we need to include our colleagues in the open source development process from the very beginning, and keep them on board from start to finish.

By keeping our colleagues involved in an open conversation around a development process that incorporates their input, we create opportunities to fold in their ideas for how to make our projects as useful as possible, which multiplies the impact those software projects can have. But what’s more, including people in a well-managed project helps them gain a sense of ownership of the code, and communicating throughout the process helps win hearts and minds as people watch, become familiar with, and participate. In contrast, parachuting a completed project onto people who have never heard of it before is a tough sell; relevance and usage aren’t clear, and the up-front costs of integrating a new piece of software are too daunting without a long tail of enthusiasm to carry people over the hump. The Working Open Guide treats these ideas in depth, but here’s a few things to start:

  • Onramps & Roadmaps. By providing a clear roadmap of a project (starting perhaps with the 10 minute plans I wrote about a few months ago), we give our colleagues the context and clarity to offer feedback early; by making clear contributing guidelines, communicating future goals and curating simple issues for new contributors to tackle, we pave the way for people to get involved in our development projects - and it’s those people who will champion the project in future in their home institutions, extending the impact and value of our work.
  • Consistent Communication. By leveraging things like twitter and a professional blog, as well as encouraging conversation in our issue trackers, scientific software developers can ensure that a conversation about their work is both visible and accessible throughout the development process, feeding all of the goals so far discussed: these communication channels support the discoverability of our work, generate feedback that let us adapt early in the process, and help include as wide an audience as possible in development.

Capacity Building

I love node.js. I could pretend to make a bunch of technical or strategic arguments for its use, but honestly: I just think it’s fun, and left to my own devices, I’ll use it for a lot of my server-side needs, which is exactly what I did during my last postdoc. Trouble was, I was the only JavaScript developer in that lab, and when I left, all that work might as well have been advanced technology from a lost and inscrutable alien empire - interesting, possibly powerful, but unusable and unmaintainable in the hands it fell into. In order for our work as scientific software developers to reach its potential, we need to help support a program of capacity building in our labs and institutions. That way, our colleagues will be better able to use, adapt and maintain our code without our direct involvement - and so we can focus on building code with them, rather than building code for them.

… focus on building code with [our colleagues], rather than building code for them.

In order to mitigate the around-the-block help request lineups many of us entertain, there are a few programs scientific software developers should involve themselves in:

  • Software Carpentry needs little introduction at this point - every cohort at every lab and institution should have this workshop made available to them as they begin their terms. Arranging a workshop is not especially difficult; a scientific software developer should shoot to throw one at least once a year.
  • Mozilla Study Groups are the program I founded at Mozilla this year: regular, casual get-togethers for researchers to share skills and work together on problems and challenges. By helping organize an ongoing conversation and skill-building space like a Study Group, we create opportunities for our colleagues to discover and participate in what we’re working on, and build the skills necessary to participate effectively.

Code curation, procedural openness, and capacity building are things I intend to hammer on in my career as a scientific software developer; I believe that these meta-skills are what will allow us to create a renaissance of scientific programming, rather than simply shoulder the overflow programming labor from our colleagues. What do you think? Are there other opportunities for scientific software developers to support openness and transparency in science? Let me know in the comments.