I am often asked how to get better at something. It is a great honor for someone else to recognize a set of skills that I work hard to hone, but I honestly want to answer “the hell if I know.” Last week I talked with someone who was interested in strengthening their programming skills and perhaps converting some of their homegrown code into a package for others to use. For the first time I answered honestly, “You’re going to suck, but it will get better.”
Our science and the process of doing that science will not be perfect, but it should get better as we move through our careers. I’m pretty proud of the computational tools that I’ve developed over the years. I know that a hardcore software engineer would cringe at some of our code: it spans more than 170,000 lines, our public test coverage is pathetic, our willingness to comment the code comes in waves, etc. Want to know something really scary? The predecessor of mothur, DOTUR, was originally written in Perl. I used that program to help me learn Perl. With that code, clustering a dataset of 250 sequences into operational taxonomic units took 45 minutes. And I thought it was pretty awesome. I wrote another tool, s-libshuff, in Perl. My collaborator, Bret Larget, showed me his code for the same program in C++. His kicked my ass. So I used that exercise to learn C++ and applied that skill to rewriting DOTUR. The same dataset now clustered in under a minute. This iteration continued, and although mothur has various warts, it is considerably better than that original Perl script. Even with these warts (most of which are invisible to most users), mothur works really well and has been cited 3,390 times since December 2009. What if we had waited until we had the perfect package? It never would have been published. What if I had never acknowledged how much that Perl code sucked? It never would have gotten any better.
Fast forward to 2013, when a postdoc working in my lab, Tao Ding, and I were working on a paper looking at the co-variation of microbial community types across the human body. This concept of community types or enterotypes has been controversial (here, here, and here). Although people could quibble with the concept and interpretation of those community types, I didn’t want them to quibble with our execution of the methods. I wanted to make them as reproducible as I could. So we generated an IPython notebook describing all of the commands that we ran to generate our results. This notebook definitely documents everything we did. It is also a monstrosity. Through this process I learned a few things that would serve me in my next project, including things I liked and didn’t like about IPython notebooks. We wrote our next paper entirely as an R markdown document. I enjoyed this process a lot more and felt it did a better job of showing readers directly where the code came from for different parts of the paper. But it was still a monstrosity and took forever to render the code to a document that we could submit. Next, I wondered whether I could move a bunch of the heavy lifting in the computational steps into separate scripts run outside of the R markdown document. I basically made my own version of the program make. Of course, my version sucked. Shortly after this paper was completed I heard Karl Broman make a comment to the effect that make was central to everything he does for making his work reproducible. He is even more forceful on his website: “I would argue that the most important tool for reproducible research is not Sweave or knitr but GNU make”. I then spent the week between Christmas and New Year’s re-doing an old paper in a reproducible manner using make, to force myself to learn it. Since then I’ve written a few more papers using make to control all the heavy lifting, which then feeds into an R markdown document that is the paper.
I feel really good about my process at this point, but it’s taken me nearly two years and I’m sure it could be better. The key is that through each iteration it has gotten better and I appreciate that my process probably still sucks. More worrisome is that I have yet to teach my peeps how to use this workflow.
As I think about these two anecdotes and others from my professional and personal life, I know that many times I’ve sucked until I’ve made it work well. I feel best about the things where I’ve polished skills to a point where I can get my head around where I still have weaknesses and can at least think about the types of things I need to work on. For example, I know that my use of make is still pretty hacky and perhaps not as efficient as it could be. The make file for the last paper I published had 954 lines! I suspect that with a better understanding of make, I could have cut that in half. I’m not going to hit the brakes on my ongoing projects while I figure out how to perfect my process. After all, I may be too hard on myself; the results I get aren’t wrong. It’s just the process that isn’t ideal and doesn’t lend itself super well to being extended by others. How do I get better now? My plan is to post a snippet of one make file as a blog post and get feedback from others on what they see as strengths and weaknesses of my approach. I’ve had this idea for a few weeks now, but have been far too proud to bare my soul and let others realize that not only does it suck, but it really sucks.
We live in a world where we expect either that we can become experts in under 24 hours or that we can never be experts because greatness is a gift. Both of these ideas are insulting to people who have busted their asses to get better. The concept of needing 10,000 hours to become an expert is controversial, but even the studies that call this into question still say that practice is important. I know that whenever I’ve wanted to pick up a skill, it’s taken a lot of mistakes and repetition to get better at it. Just remember, you’re going to suck, but it will get better.