Someone Else's Code
setting collaborators up for success
The thing that originally lead me to the open science movement is one of the things I remain most passionate about: code reuse & collaboration. It drives me B A N A N A S when we are forced to start from scratch instead of building on one another’s work; code reuse is a big piece of what ‘standing on the shoulders of giants’ means today.
Discoverability is the first obvious barrier to finding opportunities to work together; but I’m betting that the stronger communication and community ties that undergird the Study Group program we’re spooling up over at Mozilla will take the punch out of that one. We experience the next big hurdle every time we look at someone else’s code for the first time, and ask ourselves, brow furrowed and slightly annoyed, what is this hot mess, anyway?
Making sense of someone else’s code is harder than writing your own. Write your own, and you can choose tools and techniques you are comfortable with, in patterns that make intuitive sense to you, one step at a time. Use someone else’s, and there’s no telling what will come up. For most, myself included for many years, that’s game over - figuring out what code science hath wrought can feel like too big a barrier to be worthwhile.
How can we make this easier? How can we set open science up to welcome newcomers? Recently I’ve been exploring the possibility of building better on-ramps to coding projects, in the form of the Ten Minute Plan.
Ten Minute Plans
What I’m starting to do on my own projects and would like to see more of out there, is the creation of what I call ‘ten minute plans’: a description that gives a newcomer a short, high-level tour of the programmatic logic of the project, that can be read and digested in under ten minutes.
A good ten minute plan should:
- Give people an introduction to the qualitative logic of the program, so they can imagine its operation at high level.
- Give people an overview of how all the technical pieces fit together (who calls who, and when?), possibly as a flow chart.
- Point people at a handy cheat sheet for anything they’re likely to need to refer to frequently during development (variable definitions, function specs).
A ten minute plan does not:
- Include every detail about the operation and logic of the software.
- Explore in full technical depth the details of even the parts of the code it does touch on.
- Substitute for more complete or conventional forms of documentation.
Sure But Why?
The point of the ten minute plan is to help new users and contributors build a first mental model of the code - there will still be tons of details left to learn after the fact, but at least they’ve got the general lay of the land. For my Software Carpentry friends, this is your novice learner’s basic ‘concept map’. While we certainly can’t communicate everything about a piece of code in ten minutes (and shouldn’t try), the one crucial thing I think we can do, is dispel that overwhelming sense of ‘WTF?’ people experience when they first look at our code.
Another thing we get out of helping to construct this mental model, is that we have a better chance of helping new collaborators make contact with things they already know. Your black box was mysterious and kind of unnerving at first, but when we break it down even at the view from 30,000 feet that a ten minute plan affords, newcomers have a better chance of recognizing some bits and pieces they might be a little more confident starting with.
Also, a decent ten minute plan can help us communicate with our collaborators. By giving even a little bit of early guidance in the formation of their mental models, we set ourselves up to be on the same page as them when they start asking questions and making pull requests. I first articulated the notion of a ten minute plan in the primer on code review I wrote for MSL (ok, they were five minute plans there, but whatever), in the context of helping contributors articulate exactly what they were trying to contribute to; if we all have the same set of high-level boxes to begin with, we’ll have an easier time discussing what ideas fit where (or don’t fit at all).
The first thing resembling a ten minute plan I ever wrote looked something like this:
The details are a bit arcane for anyone not into nuclear sort codes, but the idea is universal: a flowchart that describes how data was going to enter this piece of software, and how its different major components were going to fit together. I came up with this when I was tasked with getting about a dozen post docs to successfully contribute to the same project, and this helped keep everyone sitting at the same table. Check out the figure in context here.
I wrote another one, for those who do not dig flow charts as much as me, for the Diversibee project, on deck for hacking at MSL’s Global Sprint in a couple weeks. The first section gives a few bullets on the general flow of the game (it’s an educational ecology game about going bankrupt from overfarming); the second describes how some of the main functions fit together to make the magic happen; and at the bottom I include a handy reference of where I’ve stashed all the game state data. In the same vein, I also included a wiki page giving a brief overview of the ecological processes imitated in the game, so people can get a quick idea of the project from a bee enthusiast’s perspective.
Ten minute plans are part of a bigger narrative: encouraging reuse, collaboration and participation in research software. It’s always going to take a ton of effort and focus to find your feet in any science project, code or otherwise; but what I’d like to see us do, is find and build ways to better reward and amplify that effort, and ensure that it leads to a greater return on investment for the people that choose to participate. By helping scaffold and guide participation in our coding projects, we not only make that effort more efficient, we encourage participation and welcome newcomers by making their road to success as smooth and rewarding as possible.