Wednesday, January 12, 2011

Sadly, the beautiful formatting in LyX was lost because it cannot export HTML. Oh well, nobody's perfect.

Postmortem of a flawed research manuscript part deux

Thomas E. Keller

Abstract I describe some of the reasons why I got interested in trying to do a simulation of RNA critters, and when I got tired of working on it. I wrote this manuscript in a few hours; the story told winds over a couple of years. The original manuscript was sent to and tentatively accepted by Evolution, but I think dumping it here is a better place for now. I didn't tell Jim and Claus about this plan, but I think they'll be OK with it. I would have attached the code in all of its byzantine horror (at least most of it is Python), but I'm too lazy to figure out how to attach things to blog posts. Note that I would have preferred to just publish this automatically to PLoS ONE instead, but I'm lazy and their submission guidelines are too detailed for me to bother. Wouldn't it be nice if the default output of LyX was ready to be published immediately, warts and all, on PLoS ONE?

1 I do not enjoy wet-lab work

2 The coding begins

Now, this was one of the first research projects where I used Python a lot. In retrospect, I was a little ambitious and reinvented the wheel. Many previous researchers have used simulations of RNA in an attempt to figure out how different aspects of RNA secondary structure contribute to evolution. One of Lauren Ancel Meyers's former students, Matt Cowperthwaite, had published several studies that I found very interesting.

Matt provided the source code. Unfortunately for me, it was in C and I was unwilling to tinker more with it. So I wrote a new program that did more or less the same thing. One thing I discovered is that Matt's program in fact relied on a C library called ViennaRNA to estimate how a sequence of RNA folds. How, then, do I tell Python to interact with a C library? Well, the option I ended up with was Cython, which provides a relatively easy interface to C functions.
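For readers who have never called C from Python, here is a minimal sketch of the general idea. It is not my Cython wrapper and it does not touch ViennaRNA: it uses the standard-library `ctypes` module instead, purely so that it runs without a compile step, and it calls the C standard library's `strlen` as a stand-in for a real folding routine. The helper name `c_sequence_length` is made up for illustration, and the loading step assumes a POSIX system (Linux or macOS).

```python
import ctypes
import ctypes.util

# Load the C standard library. On some POSIX systems find_library returns
# None; CDLL(None) then exposes the symbols already loaded in this process,
# which include libc. (Assumption: this sketch targets Linux/macOS.)
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments and results correctly:
#     size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_sequence_length(sequence: str) -> int:
    """Hypothetical helper: hand an RNA sequence to a C function.

    strlen stands in for a real library call; a folding library would take
    the same kind of C string and return a structure and an energy instead.
    """
    return libc.strlen(sequence.encode("ascii"))


if __name__ == "__main__":
    print(c_sequence_length("GGGAAAUCCC"))
```

Cython does the equivalent work at compile time: you declare the C function's signature once, and the generated extension module handles the conversion between Python objects and C types.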

I tried many small experiments, with input over time from Claus and Jim. Eventually, we settled on a story we felt was interesting, and might be publishable. Then the question became, where should we try to submit it? I found this hard to evaluate. The manuscript at first was absolutely dreadful, because Jim and Claus made me write it myself; they only provided comments on clarity and sometimes suggestions for new experiments.

3 Submitting to a journal

We first submitted the manuscript to Genetics. In time, the reviews came back. The general consensus was that we had clearly described our intent and findings; however, the reviewers did not feel the results and implications were sufficiently interesting for publication in Genetics.

I was pretty bummed at this point. I don't remember how long I sat on the manuscript, months at least. Eventually, Jim and Claus prodded me into submitting it to Evolution. The reviews came back; the general consensus again was that we had clearly described our intent and results; the reviewers, however, had issues with how we interpreted the data. One of the reviewers signed his name. I suppose he was concerned because he had serious issues with how we interpreted a part of the results that touched on his research interests. I don't know what the general consensus is on when to sign a peer review. I will say that the section that he picked on was something that wasn't integral to the main points, and felt tacked on. I felt much better deleting the section entirely, because I wasn't sure what the results meant and it made me uncomfortable to make strong claims about their implications.

The final manuscript attempts to address the concerns of the reviewers. In some cases, I didn't feel like doing what they suggested, but I at least acknowledged them. I will also mention that the code is capable of doing many things that are not mentioned in the manuscript, such as different types of recombination. I in fact generated many results on recombination, and showed them to some folks at the Evolution conference in the summer. I never sat down and wrote a manuscript because I dreaded having to frame an introduction and discussion that would be sufficient for publication in a journal that considered merit, or the future implications. I had developed a lot of new software that others might use in the future, but several limitations of simulations made me feel uneasy. Many theoretical studies, I feel, do not adequately acknowledge that a real worry is that some of the assumptions made in the model may differ from real life. Does this difference change our interpretation, and if so, to what extent?

I quite enjoy doing simulations, because you can rapidly generate new data that you know is based on certain assumptions. Then you can think about how different assumptions in the model are appealing for a certain biological question. One strength I enjoyed is that the simulations are based on something found in real life, RNA secondary structure. And indeed, there are organisms whose entire genome is made of RNA. However, some of our assumptions made a little less sense. We assumed that for some reason, RNA is under selection for a specific RNA secondary structure. The honest reason for why I chose this form of selection is that Matt wrote his program that way, and so that is what I did as well. Fortunately, I found out that some types of noncoding RNA have highly conserved structures, and that made me feel a little bit less uneasy about that specific assumption. For viruses whose genome is made of RNA, there are probably other factors that have a much higher selection pressure.
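To make "selection for a specific RNA secondary structure" a bit more concrete, here is a minimal sketch of one way such a fitness could be scored. Everything in it is an assumption for illustration: structures are given as dot-bracket strings (the folding step is left out entirely), the distance is a simple count of base pairs present in one structure but not the other, and the `1 / (1 + distance)` fitness is a made-up choice, not the function used in the manuscript or in Matt's program.

```python
def base_pairs(structure: str) -> set:
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack = []
    pairs = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: unclosed '('")
    return pairs


def base_pair_distance(a: str, b: str) -> int:
    """Count base pairs that appear in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return len(base_pairs(a) ^ base_pairs(b))


def fitness(structure: str, target: str) -> float:
    """Hypothetical fitness: 1.0 at the target, decaying with distance."""
    return 1.0 / (1.0 + base_pair_distance(structure, target))


if __name__ == "__main__":
    target = "(((...)))"
    for candidate in ["(((...)))", "((.....))", "........."]:
        print(candidate, fitness(candidate, target))
```

In a full simulation, each sequence in the population would first be folded, and the predicted structure would then be scored against the target like this; the shape of the fitness function is exactly the kind of assumption worth questioning.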

One final question: in your opinion, which type of citation (numbers or author and year) conveys information more efficiently as you read the main text? I do not know the answer, but I think it's an interesting question worth thinking and writing about.

2 comments:

  1. My immediate response to your last question was "Author and year, totally!" But then I realized the real answer was that it depends on if it's a subject in my area of expertise. If it is, then I want to know whether someone is citing Maynard Smith 1979 or more recent, new research that I might not know about. If it isn't, then I don't care and a list of references is just annoying to wade through.

  2. I definitely prefer author-year. Shannon's right that it can sometimes be annoying, but at least after being trained on so many papers I am pretty good at ignoring the citations without really registering them when I don't care.
