This is the same behaviour I've seen time and time again in biology labs.
People there re-do the same experiment over and over until it gives them the result they want, and then they publish that. It's the only field where I've heard people say "Oh, yeah, my experiment failed, I have to do it again". What does it even mean that an experiment failed? It did exactly what it was supposed to: it gave you data. It didn't fit your expectations? Good, now you have a tool to refine your expectations. But instead, we see PhD students and post-docs working 70-hour weeks on experiments with seemingly random results until the randomness goes their way.
A lot of them have no clue about the statistical treatment of data, or about building a proper model to test their assumptions against reality. Since they deal with insanely complicated systems, with hidden variables all over the place, a proper statistical analysis would be the minimum required to extract any information from the data. But no matter: once you have a good-looking figure, you're done. In cellular/molecular biology, nobody cares what a p-value actually is, so as long as Excel tells you it's <0.05, you're golden.
The scientific process has been forgotten in biology. Right now it's basically what alchemy was to chemistry.
I'm very happy to see efforts like this one. Sure, they might show that a lot of "key" papers are very wrong, but that's not the crux of it. If there is a reason for biologists to make sure that their results are real, they might try to put a little more effort into checking their work. And when they figure out how much of it is bullshit, they might even try to slow down a little on the publications and go back to the basics for a little while.
I'm sorry about this rant, but I've been driven away from a career in virology by those same issues, despite my love for the discipline, so I'm a bit bitter.
You should know that there are a million tiny ways for an experiment to "fail", requiring one to repeat it. Reagents could be bad, a machine could have broken mid-cycle, a positive (or negative) control could have been wrong... Basically meaning, "it didn't work". In this case, any "data" you would have gotten would be incredibly suspect and you'd need to repeat the experiment.
In the vast majority of labs, this is nothing nefarious, it's the way science is done. If something fails, you try to figure out why it failed, try to fix the issue, and then repeat the experiment. Only once everything works correctly can you get valid data to then try and interpret. And if you get an interesting result, you still need to repeat the experiment 2-3 times to be sure.
Repeating failed experiments isn't an issue - and has nothing to do with alchemy. It's just basic troubleshooting.
The issue is there is "failed" and then there is "failed". Yes, many times you have to repeat an experiment because of bad reagents, broken machines, little tweaks are needed to some obscure parameter, or someone left the lab door open...
However, if your experiment is well controlled, then the controls will reveal this sort of "failure". When I was still running gels, many times when first running a new setup we'd run only controls. If your experiment fails because your controls failed, then that's just science.
But I've also seen the other kind of "failure". The kind where the controls came out perfectly, but the effect or the association or the expression profile that the researcher was hoping for didn't show up. When these sorts of failures are ignored or discarded, then we do science a huge disservice.
I am encouraged, though, that there recently seems to be a movement toward, if not outright publishing such negative results, then at least archiving them and sharing them with others in the field. After all, without Michelson and Morley's "failure" we might not have special relativity.
>But I've also seen the other kind of "failure". The kind where the controls came out perfectly, but the effect or the association or the expression profile that the researcher was hoping for didn't show up. When these sorts of failures are ignored or discarded, then we do science a huge disservice.
Why does this happen? Clearly this is what the article insinuates. Is publish-or-perish really that strong? Every honest experiment with honest results benefits society. Not every prediction-and-result combination earns a prize in your lifetime, but that should in no way influence someone's value as a scientist. The science may be used later for something we had not intended (could I offer you the hope of posthumous recognition?). Finding a way something does not work may save someone else some time. That benefits the scientific community.
Not everyone gets to walk on another planet, some people have to build the ship.
For better or worse, most scientific journals still operate on a business model dependent on physical subscriptions. Since this sets something of a limit on how much can be published, and since scientists tend to prefer paying for positive results vs negative, there has been a strong cultural bias toward favoring positive results.
The good news is that this is gradually changing. As scientists begin to understand that online distribution models don't have the same sorts of limitations, and that search can be a powerful tool, there has been a move toward at least collecting negative results. Of course, they still don't benefit the scientists in the "publish-or-perish" world, but even that may be changing...maybe...
>But I've also seen the other kind of "failure". The kind where the controls came out perfectly, but the effect or the association or the expression profile that the researcher was hoping for didn't show up. When these sorts of failures are ignored or discarded, then we do science a huge disservice.
I absolutely agree that sometimes, you need to redo an experiment for good reasons.
In most cases I've seen, people do not know why they redo the experiment, though. They know it hasn't produced the data they expected, so they redo it. Maybe it was because a reagent was bad, or a co-worker left the incubator open overnight, or maybe it was because the model is stupid. Who knows?
That's my point, actually. Biologists are playing with systems they do not understand, changing parameters somewhat randomly without any control over them, and they then try to interpret whatever comes out, but ONLY if it fits what they wanted. If it doesn't, then "Oh, the PCR machine is at it again!", and they throw the results away.
It seems like you had a really bad experience in lab. I'm sorry for that. But it's a mistake to paint the entire field in a negative light because of this. Not all labs are bad, and some produce really outstanding work.
Sometimes the issue is the PCR machine. Sometimes it's the water (my favorite troubleshooting experience from grad school). And figuring out where the issues are (is it the protocol? the reagents? or is this real signal?) can be difficult.
Playing with systems we don't understand is kinda the point.
I've been lucky enough to work in outstanding labs, with people published in Nature and other journals of that quality. I've worked in 2 countries, and for 4 different labs. I've also talked with people from all over the world, who have worked everywhere from Harvard to the Pasteur Institute to Cambridge University. The stories are all the same. I hoped I would find some place where people were trying to do things the right way, but what I found is that currently you don't need to in order to be published in top journals, so why bother?
It's really refreshing to hear you talk about trying to troubleshoot why an experiment didn't work the way you expected; I mostly hear of people retrying blindly until it "succeeds". What did you do with what you learned about the water causing the failure? Did you publish it, so that someone (or you!) could try to figure out why water was a problem, or at least so that no one else would hit the same issue? That's the other point: when people do bother to find out why things fail, I've never seen any of them follow up on it and figure out not only what made it fail, but why it made it fail. "Yeah, the annealing temp was not the right one." Ok, but why?
Of course playing with systems we don't understand is the point, but we have to be very careful with them. We should be varying one parameter at a time. This is mostly impossible in biology, but right now we're not even trying to do anything about it.
People don't investigate things like that, because, frankly, no one cares. Nor should they. If your computer is acting funny, and you find out a spider has made a nest inside it, and it works fine once you clear out the spider nest, would you then decide to determine exactly why that spider nest, in that place, causes the exact problems you observed? No, of course not. Because that's not an interesting question. Computers are complicated machines, and they can break in lots of different ways, most of which are not very interesting in their details.
Similarly, suppose you study cultured cells (which are notoriously finicky), and you want to compare what the cells do in the presence of drug X vs control. But at first, you find that all of your control cells die. And eventually you find out that if you use brand X of bottled water vs tap water, the control cells thrive. Are you seriously proposing that you should then drop all work on drug X, and get to work determining exactly what is up with the tap water in your town? I mean, maybe that would be a fruitful research avenue, if you're worried that the tap water isn't safe for human consumption, or you think that there's something interesting about exactly how the tap water is killing your control cells. But most of the time, investigating the tap water would be an expensive distraction from the question you actually want to answer. And most scientists, I think, would (reasonably) decide to get on with investigating the effects of drug X on their cells, and not worry too much about precisely why the tap water killed the control cells. And I don't think there's anything wrong with that. Life is short, and you have to choose your questions carefully.
So you just accept for no reason that tap water is bad somehow, and discard the result you've just gotten?
I do understand that you have a limited amount of time, and can't just go after everything, but when something happens in science, it needs to be documented. Yeah, maybe someone else should investigate, but someone should. Maybe that particular phenomenon that led to the water influencing your result will teach you something about cell metabolism. Who knows? If it has that much of an effect on cell growth that you need to deal with it, it's already more active than a lot of the compounds we try out, anyway...
To go back to the computer analogy: it feels like my program is buggy, and to debug it I'm changing variable names (which as far as I know shouldn't matter), and then the code magically works again. Sure, some days I'll go "Ok, compiler magic, got it", but most days I'd be pretty intrigued and I'd look into it, because yeah, I might just have found a GCC bug.
I agree, no one cares, but I did. I don't know what I don't know yet, and I don't want to presume anything. The tap water thing might actually lead us to solid models which would explain why tap water breaks the experiment. That's why I really think we should start a movement of publishing everything, and trying to deal with simpler models/systems we do understand before going up to models with so many unknowns that the results are basically a dice roll.
This makes me think of Feynman's comments on Cargo Cult Science:
"In 1937 a man named Young did a very interesting [experiment]. He had a long corridor with doors all along one side where the rats came in, and doors along the other side where the food was. He wanted to see if he could train the rats to go in at the third door down from wherever he started them off. No. The rats went immediately to the door where the food had been the time before.
The question was, how did the rats know, because the corridor was so beautifully built and so uniform, that this was the same door as before? Obviously there was something about the door that was different from the other doors. So he painted the doors very carefully, arranging the textures on the faces of the doors exactly the same. Still the rats could tell. Then he thought maybe the rats were smelling the food, so he used chemicals to change the smell after each run. Still the rats could tell. Then he realized the rats might be able to tell by seeing the lights and the arrangement in the laboratory like any commonsense person. So he covered the corridor, and still the rats could tell.
He finally found that they could tell by the way the floor sounded when they ran over it. And he could only fix that by putting his corridor in sand. So he covered one after another of all possible clues and finally was able to fool the rats so that they had to learn to go in the third door. If he relaxed any of his conditions, the rats could tell.
Now, from a scientific standpoint, that is an A-number-one experiment. That is the experiment that makes rat-running experiments sensible, because it uncovers the clues that the rat is really using -- not what you think it's using. And that is the experiment that tells exactly what conditions you have to use in order to be careful and control everything in an experiment with rat-running.
I looked up the subsequent history of this research. The next experiment, and the one after that, never referred to Mr. Young. They never used any of his criteria of putting the corridor on sand, or being very careful. They just went right on running the rats in the same old way, and paid no attention to the great discoveries of Mr. Young, and his papers are not referred to, because he didn't discover anything about the rats. In fact, he discovered all the things you have to do to discover something about rats. But not paying attention to experiments like that is a characteristic example of cargo cult science."
"So you just accept for no reason that tap water is bad somehow, and discard the result you've just gotten?"
What is the "result" that you referring to here? That the tap water in town X kills cultured cells of type Y? Yeah, I guess you could try writing that up and publishing it, but that's a good way to waste a lot of time publishing results that are interesting only to a very small audience. Honestly, if it were me, I'd send an e-mail to people in the same town that might be working with cells of the same or similar type, and then move on.
"but when something happens in science, it needs to be documented"
No, it really doesn't. Stuff doesn't document itself. That takes time, which sometimes is better spent doing other things. Like answering more interesting questions.
"Maybe that particular phenomena that lead to the water influencing your result will give you knowledge about cell metabolism. Who knows?"
That's true, but my point is that it doesn't make you a bad scientist if you shrug your shoulders about why the tap water kills your cells, and get on with your original experiment. For every experiment you do, there's a million others you're not doing, and so it makes sense to focus on the one experiment you're most interested in, not chase after a bunch of side-projects that will (probably) not lead to any kind of an interesting result.
And all of this is very different than a case where you compare control to treatment, find no difference, and therefore just start fiddling with other experimental parameters until you do get a difference between control and treatment. That is, I think you'll agree, a bad way to do science.
"To go back to the computer analogy, it feels like my program is bugged, and to debug it I'm changing variables names (which as far as I know shouldn't matter), and then the code magically works again. Sure, some days, I'll go "Ok, compiler magic, got it", but most days I'd be pretty intrigued, and I'd look into it, because yeah, I might just have found a GCC bug."
I think a better analogy would be if compiler x acts in ways you don't understand, so you switch to gcc, which (most of the time) works as you expect. Are you really required to figure out exactly why compiler x acts as it does? Or would you just get on with using a compiler that works the way you think it should?
"I agree, no one cares, but I did. I don't know what I don't know yet, and I don't want to presume anything. The tap water thing might actually lead us to solid models which would explain why tap water breaks the experiment."
They might, but there are a lot of experiments that have a very very small chance of an interesting outcome, and a near-one chance of a pedestrian outcome. You can do those experiments, and you might get lucky and get the interesting result, but probably you will just get the pedestrian result. And there's nothing wrong (and a lot right) with instead focusing on experiments where (for instance), either outcome would be interesting to people in the field.
"That's why I really think we should start a movement of publishing everything, and trying to deal with simpler models/systems we do understand before going up to models with so many unknowns that the results are basically a dice roll."
I think you're conflating a number of things here. I agree that a reductionist approach to science has borne a lot of fruit, historically. I agree that studying systems with a lot of unknowns has risks. And it may be that "publish everything" would work better than what we have now. But even if scientists all decide to publish everything they do, they'll still have to make strategic choices about what experiment to do on a given day, and in many cases that will mean not doing a deep dive into questions like "why does the tap water kill my cells, even in control?"
I think we'll have to agree to disagree on most of those points then.
I do not think there are trivial/uninteresting questions. You have to prioritise, but you can't just sweep stuff under the rug and call it a day. I'm not even using the "it might be huge!" argument, just that science is about curiosity. Most math won't end up as useful as cryptography, but it doesn't matter.
I do think that it is part of your job, as a scientist, to document what you do, and what you observe. If a software engineer on my team didn't document his code/methodology correctly, he'd be reprimanded, for good reason. Yeah, it takes time, but it's part of the job. This way, we avoid having 4 people independently rediscovering how to set up the build tools.
Couldn't you just note in your lab notebook something like:
* tap water---control failed
* bottled water (generic store brand)---control failed
* distilled water---control succeeded
and then when writing up the experiment, mention the use of distilled water? You might not be interested in why only distilled water worked, but someone else reading the paper might think to investigate.
The problem with everything you've said is that statistical significance tests are almost always statistical power tests -- do you have enough statistical power given the magnitude of the effect you've seen? The underlying assumption of something like a p-value test is that you are applying the test to all the data sampled from the unknown distribution.
If it is standard laboratory procedure to discard results that are aberrant and to repeat tests, and then to apply the p-value test ONLY to the results that conform to some prior expectation, then the assumptions underlying the p-value test are not being followed -- you're not giving it ALL the data that you collected, only the data that fits with your expectations. Even if this is benign the vast majority of the time -- if 99.9% of the times you get an aberrant result are the result of not performing the experiment correctly -- using the p-value test in a way that does not conform to its assumptions increases the likelihood of invalid studies being published.
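To make that selection effect concrete, here's a quick, minimal simulation sketch (the run counts, sample sizes, and the keep-only-the-runs-that-"looked-right" rule are all invented for illustration):

    # Sketch of the selection effect described above (all numbers invented):
    # the treatment truly does nothing, and we compare an honest analysis that
    # pools every run against one that first discards runs that "didn't work"
    # (here: runs where the treatment mean didn't beat the control mean).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    N_STUDIES, RUNS_PER_STUDY, N_PER_RUN = 2000, 4, 10

    honest_hits = selective_hits = 0
    for _ in range(N_STUDIES):
        runs = [(rng.normal(0, 1, N_PER_RUN), rng.normal(0, 1, N_PER_RUN))
                for _ in range(RUNS_PER_STUDY)]

        # Honest analysis: pool all runs, test once.
        treat = np.concatenate([t for t, c in runs])
        ctrl = np.concatenate([c for t, c in runs])
        honest_hits += stats.ttest_ind(treat, ctrl).pvalue < 0.05

        # Selective analysis: keep only runs that matched the expectation.
        kept = [(t, c) for t, c in runs if t.mean() > c.mean()]
        if kept:
            treat = np.concatenate([t for t, c in kept])
            ctrl = np.concatenate([c for t, c in kept])
            selective_hits += stats.ttest_ind(treat, ctrl).pvalue < 0.05

    print("honest false-positive rate:   ", honest_hits / N_STUDIES)     # ~0.05
    print("selective false-positive rate:", selective_hits / N_STUDIES)  # well above 0.05

With a true effect of zero, the honest analysis flags about 5% of studies as significant, exactly as designed; the selective one flags noticeably more, even though nothing nefarious happened at any single step.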
"That's why I really think we should start a movement of publishing everything, and trying to deal with simpler models/systems we do understand before going up to models with so many unknowns that the results are basically a dice roll."
I would love to see this implemented, and encouraged in every lab around the world! It's not like we don't have computer programs that could collate all the data?
I don't think I will ever see this happen, because the truth is not what companies want. They want a drug/theory they can sell. It's a money game to these pharmaceutical companies in the end. Companies, and I believe many researchers, want positive results, and will hide or cherry-pick the experiments/studies that prove their hypotheses. I know there must be honest, agenda-free researchers out there, but I have a feeling they are not working for organizations with the money to fund serious, large-scale projects.
Take for instance Eli Lilly, which has a history of keeping tight control over its researchers. The history of Prozac is a good example of just how money produces positive results:
"Eli Lilly, the company behind Prozac, originally saw an entirely different future for its new drug. It was first tested as a treatment for high blood pressure, which worked in some animals but not in humans. Plan B was as an anti-obesity agent, but this didn't hold up either. When tested on psychotic patients and those hospitalised with depression, LY110141 - by now named Fluoxetine - had no obvious benefit, with a number of patients getting worse. Finally, Eli Lilly tested it on mild depressives. Five recruits tried it; all five cheered up. By 1999, it was providing Eli Lilly with more than 25 per cent of its $10bn revenue."
(1) I love how Prozac was tested on mild depressives. Don't current prescribing guidelines only recommend the administration of Prozac for the most seriously ill--the clinically depressed? Actually, no, it's still recommended for a myriad of disorders. Wasn't Prozac shown to be only slightly better than placebo? If you dig deeper, there are some sources that don't see any benefit over placebo.
(2) Wouldn't patients/society benefit from seeing all of the studies Eli Lilly presented to the FDA, not just the favorable ones? How many lives would have been saved if this drug had been given an honest evaluation--if every study had been published and put through proper statistical analysis in the 90s? Think about the millions of patients who suffered terrible side effects from this once terribly expensive drug. Think about the birth defects that could have been prevented.
So yes, I would love to see everything published, but I don't think the business cycle/politics will ever allow it. They want results! They want money! They want endowments! It's a sick game, until you are the one needing help--help that only honest, good science can produce.
Making sure that everyone working in the field knows not to use tap water seems worth doing, though, even if the reason why isn't understood yet. It sounds like replication is a problem because this sort of knowledge isn't shared widely enough?
tl;dr: Sonnenschein and Soto were studying effects of estrogens on in vitro proliferation of breast cancer cells. Their assays stopped working. Eventually they figured out that Corning had added p-nonylphenol, which is estrogenic, to the plastic, to reduce brittleness.
"We need to get a new carbon filter for that MilliQ machine"
"Wait there's a carbon filter in the milliQ machine?"
"Yeah..."
"...oh, that explains all those results I got"
^ Actual conversation I had in a breakroom.
Stuff breaks all the time. The temperature of the lab changes. Connections get loose. 95% of the time, when the experiment doesn't work, it's because you screwed up on something that's either difficult or impossible to control, or something you didn't even realize wasn't what you expected it to be.
But then you risk thinking you understand the system, but you actually don't. How can you be sure the equipment failure didn't make you see something that wasn't there?
That's why you repeat the experiment to confirm your results. A result due to a sporadic equipment failure likely won't be reproducible. If you can't replicate your own results, how could you expect anyone else to be able to?
Spot on with the alchemy remark, I've made similar comparisons before. Coming into bioinformatics/computational biology with a strong discrete math background I found a lot of professors excited to work with me until I started telling them how their ideas and models and experiments didn't imply what they wanted them to. Just like the startup world is awash with "it's like Uber for X" the biology world is full of "let's apply X mathematical technique to $MY_NICHE" and somehow this is supposed to always generate novel positive results worthy of publication. Then you tell them that you applied such-and-such mathematical/statistical model to their pet system and that the results contradict their last 10 years of published papers . . . and they ask you to do it again.
I remember one professor studied metabolic reaction networks modeled with differential equations. The networks themselves were oversimplifications and relied on ~5N parameters (N being the number of compounds in the network). The problem was that while all the examples in publication converged on a nice steady state (yay homeostasis is a consequence of the model!) it was trivial to randomize the parameters within their bounds of experimental measurement and create chaotic systems. Did this mean the model wasn't so great? No, it just meant those couldn't be the real-life configuration of those parameters . . . sigh. And now I'm a data engineer and no one asks me to get data from an API that doesn't actually provide it and I'm much happier.
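For what it's worth, you don't even need a big network to see what the parent is describing. A textbook two-compound reaction system (the Brusselator, not the professor's actual model, with parameter bounds I made up) flips between a stable steady state and sustained oscillation depending on where in the "measured" parameter box you happen to land:

    # Toy illustration (not the professor's model): the Brusselator,
    #   dx/dt = a - (b+1)x + x^2*y,  dy/dt = b*x - x^2*y,
    # has a steady state at (a, b/a) that is stable only when b < 1 + a^2, so
    # drawing (a, b) "within their bounds" can silently cross from homeostasis
    # into permanent oscillation. The bounds below are invented.
    import numpy as np
    from scipy.integrate import solve_ivp

    def brusselator(t, z, a, b):
        x, y = z
        return [a - (b + 1) * x + x * x * y, b * x - x * x * y]

    rng = np.random.default_rng(42)
    trials, n_steady = 200, 0
    for _ in range(trials):
        a = rng.uniform(0.5, 2.0)   # hypothetical "experimental bounds"
        b = rng.uniform(0.5, 5.0)
        sol = solve_ivp(brusselator, (0, 500), [a, b / a + 0.1], args=(a, b),
                        rtol=1e-8, atol=1e-10)
        tail = sol.y[:, sol.t > 400]                   # late-time behaviour
        n_steady += np.ptp(tail, axis=1).max() < 1e-3  # still moving => not settled

    print(f"{n_steady} of {trials} random parameter draws settled to a steady state")

Whether that indicts the model or just means nature constrains those parameters in ways we haven't measured yet is, of course, exactly what the replies below argue about.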
"it was trivial to randomize the parameters within their bounds of experimental measurement and create chaotic systems. Did this mean the model wasn't so great? No, it just meant those couldn't be the real-life configuration of those parameters . . . sigh."
Why is this sigh-worthy? Maybe the model is still ok. Maybe some of the parameters are constrained in nature in ways that you/we just don't know about yet. Of course, it could be that the model is bad, but you act as if your results, as stated, demonstrate this. Which they do not, as far as I can see.
I hoped that it wasn't as bad in computational biology, or ecology, or any other biology field where systems and models are actually defined. It saddens me to read that your experience was as bad as mine...
I'm a CompBio PhD student, and my experience is that folks in that field are much more careful with statistics than in, say, molecular biology labs, but it varies from lab to lab. My PI is exceedingly meticulous about stats -- for instance, we don't report p-values, but rather entire distributions -- but that's because our work is all in silico, so it's easy to run tons of replicate simulations. Wet lab work that's finicky should definitely be held to high statistical standards, but I don't think it's fair to presume everyone in the field guilty until proven innocent.
But wet lab scientists should be even more careful! They have way less control over the system they're trying to study than you do, so stats are the only safety net we have if we want to even attempt to do anything with the data we produce.
I also agree on the innocent until proven guilty part, but by now I've seen and talked to hundreds of people with the best intentions, who do not realise how important careful examination of the data is, so I'm growing a bit disillusioned.
I think it is entirely fair to presume biologists incompetent and sloppy until they have proven otherwise.
(My impression from admittedly limited contact with biology students and from browsing through the occasional paper is that most of them barely approach mediocrity from below.)
Many fields (macroeconomics, IO, etc.) write models that end up with some type of calibration which is implemented in thousands of lines of code. Those lines are written by RAs with almost no programming experience, so what happens is this:
    while (result != expectation):
        ask assistant to look for bugs and repeat simulation
The end result is that you don't stop when there are no more bugs, you stop when you've got the coefficient signs you want, and then get published.
Only if your paper has high impact, has its code and numbers open, AND was written in a very accessible language do people discover the flaws:
"the paper ... was, and is, surely the most influential economic analysis of recent years [...] First, they omitted some data; second, they used unusual and highly ques ionable statistical procedures; and finally, yes, they made an Excel coding error"
> we see PhD students and post-docs working 70-hour weeks on experiments with seemingly random results until the randomness goes their way.
There are known and tested protocols that can fail. Not every step can be accurately recorded. It's very common that an experiment will not work well the first time it's performed (even when supervised by someone experienced). Over time, researchers improve their skills and achieve better results following the same exact protocol. Does that mean that the science behind the experiment is bad?
> Does that mean that the science behind the experiment is bad?
No, the science might be solid. But if attempts by peers to reproduce the results fail more often than they succeed, the paper describing the science is (by definition?) inadequate. The level of detail required in the paper varies from field to field, and experiment to experiment, but if the techniques aren't described well enough for others to follow them, then the paper needs more detail.
Even commercial products (where they have a financial incentive to provide clear and comprehensive instructions) often take a lot of training before they work properly. How can we expect a small research lab to be better?
You are right to point out that giving clear instructions for a complex task is difficult. And elsewhere in this thread, 'nycticorax' makes some great points. I fear my answer is along the lines of Rutherford's often ridiculed quote about statistics and experimental design: "If you need to use statistics, then you should design a better experiment."
If the experiment that you describe in your paper is too difficult for others to reproduce, perhaps it shouldn't yet be published as a paper? Would the public interest be better served by funding researchers who do simpler but reproducible work, rather than complex work where the results essentially need to be taken on faith? Carried to the extreme, this is a terrible rule, but I feel there is a kernel of truth to it.
I guess the right strategy depends on how much faith you have in the correctness of published results, evaluated solely on plausibility and the reputation of the researcher, and thus how much value there is in a conclusion based on irreproducible results. I think there is currently a justified crisis of belief in science, and that many fields would do well to get back onto solid ground.
Isn't this actually an attractive ethical hazard† (in a very broad incentive sense)? And if so, couldn't we counteract it by encoding an ethical obligation to immediately publish the data from any experiment? Just in any old place, not as a full paper.
You could of course re-run your experiment if you thought there was some methodological error. But since you would have disclosed somewhere that your first, second, and third datasets all showed the same thing with more or less the same methodology, you would need to show correspondingly greater confidence in your fourth or fifth dataset (the one you would otherwise have published alone), because it has to explain all of the earlier datasets as statistical flukes. You would no longer have the incentive to run the fourth experiment alone and publish it without reference to the earlier trials.
To give an example, suppose anyone is rewarded by publicity if they show that flipping U.S. coins favors heads by more than 2%. This is temptingly easy to do if you don't publish experiments that don't statistically prove that: you just keep re-running the experiment until you get the result you want at the p-value you want, so if the reward for the result is more than the number of experiments you need to do to get it times the cost of each experiment (which can be truly tiny), the experiment presents an attractive ethical hazard.
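A quick back-of-the-envelope version of that coin example (flip counts and thresholds invented): with a perfectly fair coin and 1000 flips per "experiment", a group that simply keeps re-running until one run shows heads favored by more than 2% at p < 0.05 typically only needs a dozen or two attempts:

    # Sketch of the coin example above (numbers invented): the coin is fair,
    # but we keep repeating a 1000-flip experiment until one run "shows"
    # heads favored by more than 2% at p < 0.05, and count the attempts.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    N_FLIPS = 1000
    attempts_needed = []
    for _ in range(500):                         # 500 hypothetical research groups
        attempts = 0
        while True:
            attempts += 1
            heads = rng.binomial(N_FLIPS, 0.5)   # truly fair coin
            result = stats.binomtest(heads, N_FLIPS, 0.5, alternative="greater")
            if heads / N_FLIPS > 0.52 and result.pvalue < 0.05:
                break                            # "publishable" run found
        attempts_needed.append(attempts)

    print("median experiments until a 'positive' result:", int(np.median(attempts_needed)))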
But if you are ethically forced to publish (or even summarize) your earlier complete experiment and dataset really in any old place, then if there is no actual difference it becomes vanishingly unlikely that you can suddenly prove your theory and explain all of the earlier experiments as statistical outliers. You would stop exploring after your second or third experiment. (Which you would still quickly summarize.) This does however present an added burden to researchers, especially if they quickly test something before really refining the methodology to do so. So these quick disclosures could still be considered quite dirty and not very meaningful. However their disclosure would give a good indication regarding how strong a result really is. (i.e. by glancing at how much dirty data precedes the actual experiment being published.)
-
† I wanted to invoke the words "moral hazard" because that term does indicate that people are tempted toward the bad behavior. But economically I think the term is too specific (meaning risk-taking where someone else has to clean up in case of failure) - https://en.wikipedia.org/wiki/Moral_hazard
Yeah, I've seen this xkcd. It's on point as usual, but it doesn't show the frightening thing: it's the scientists who are going "Whoa" and believing in the results they've produced...
On the other hand, reproducing badly designed, not well-thought-out, and probably inadequately executed experiments won't be of much help to anything. In biology it's not often clear what is known, what is not, and what the open questions are, and the scientists themselves bear a lot of the blame for that (for religiously sticking to closed publishing models and a general lack of initiative to share their data).
Looking at this through the lens of drug discovery is the wrong way to do this. The problem is with our drug discovery strategy, generally, not with the reproducibility of our research.
STK33, for example, is definitely implicated in cancer through a wide variety of mechanisms. It is often mutated in tumors, and multiple studies have picked it up as having a role in driving migration, metastasis, etc.
This doesn't mean we can make good drugs to it.
Making drugs is hard - they need to be available in the tissue in the right concentrations, often difficult to achieve with a weird-shaped, sticky molecule. They need to have specificity for the tumor, they need to have specificity for the gene target(s) of interest. They need to be effective at modulating the target.
More importantly, though, the drug is modulating a target (gene) that is involved in a biological system that involves complex systems of feedback control, produces adaptive responses, and otherwise behaves in unexpected ways in response to modulation.
In my experience this is usually underappreciated by most drug discovery strategies, which merely seek to "inhibit the target" as if its involvement in the tumor process means we can simply treat it as an "on-off" switch for cancer. This assumption is asinine, and of course will (and does) lead to frequent failure. STK33 is not an on-off switch, and attempting to treat it that way will likely result in a drug that does nothing.
This is absolutely correct. Pharma companies are running into a wall and are flailing to figure out what to do. It's quite clear that from first principles bathing the entire body in trillions of little molecules hoping that they only and completely shut down a single kind of protein, at the right time, in the right place, of the right cell, and do nothing more, is insane. There is some logic behind the ability for a small molecule to help against invading diseases (the antibiotics of the 20th century), but the same strategy will philosophically not work for entire classes of cancers or other innate biological problems. Though we all somehow just assumed that it would.
The pharmacy of the future will be entirely curative to non-invading diseases, will repair the DNA that's been mutated, will express or inhibit the proteins that need to be expressed or inhibited. And small molecules will be the payload of these fancy protein-based nano-machines. But this hunt to bring down the cost of those small-molecule targets at the cost of the reputation of the science itself might be foolhardy when the cost is near-infinite in the first place because they're looking from the wrong perspective.
tl;dr: Pharmaceutical companies' hammer that worked so well against the nails of bacterial infection is in no way suited to the plumbing of cancer. And now they're 'investigating the plumbers' to figure out why their fancy new 50-billion-dollar "water-hammers" don't work so well to unclog pipes.
I agree it's challenging to create small molecule treatments for oncology. That said, there have been some massive successes recently. Look at Imbruvica, which is a massive jump forward in treating MCL and CLL.
Even if you drop small molecules and focus on antibodies, it's not like it's all that easier.
Certainly there are success stories when so much effort is put forward. But look at all the things even Imbruvica does in addition to helping treat cancer [1].
If you want to fix MCL you figure out how to engineer a genetic payload that targets B-cells IFF they express particular genes, and reengineer those cells' genomes to either no longer reproduce abnormally, or shut them down. You do NOT covalently turn off an entire class of kinases in the ENTIRE body...
And that's the success story. Antibodies are just the tip of the protein-iceberg. They're the 'same things as a small molecule' but in protein form - baby steps into a whole new world. Sure, they can bind to stuff tightly, but if that (alone) is what you're aiming for, then they're not being used to try to fix the problem or engineer a way to the solution. There's lots more that we could do if we started actually engineering the proteins and their interactions, and delivering them in directed ways. We have access to those primitive engineering tools, but instead of focusing on those nascent tools, we're polishing up the old hammer.
Well, the wall they are running into is kind of self-inflicted. The cure(s) have been in front of them since the 1920s, yet we are in this state.
First off, medical science is itself based on wrong theories; see what Pasteur himself had to say: "Terrain is everything, the Germ is nothing". Yet the medical industry is hell-bent on developing cures based on germs.
As for cancer, HIV and other incurable diseases, many researchers have come forward; some of them include Rife, Nessens, William Koch, oxygen therapy, the Lakhovsky MWO, the Priore device, etc.
Unless we revise our theories and procedures and look into the works of the above researchers, we will never find the cure.
The reason they do this is easy to understand: in science you try the simplest processes first, before you do something really complicated. If the drug companies are using this process, it's because there are still successes (low-hanging fruit) that can be achieved right here, right now. I agree that the best way to create effective drugs will involve more complicated processes that will take a long time to develop and, above all, to approve through regulatory agencies.
> This past January, the cancer reproducibility project published its protocol for replicating the experiments, and the waiting began for Young to see whether his work will hold up in their hands. He says that if the project does match his results, it will be unsurprising—the paper's findings have already been reproduced. If it doesn't, a lack of expertise in the replicating lab may be responsible. Either way, the project seems a waste of time, Young says. “I am a huge fan of reproducibility. But this mechanism is not the way to test it.”
One swallow does not make a spring. With a belief like 'one replication is enough', I'm not sure Young actually appreciates how large sampling error is under usual significance levels or how high heterogeneity between labs is.
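To put a rough number on the sampling-error point (the effect size and sample size here are invented): even when an effect is perfectly real, a single replication attempt at typical wet-lab sample sizes misses it most of the time, so one success or one failure by itself settles very little.

    # Rough illustration of sampling error at typical sample sizes (numbers
    # invented): a genuinely real effect of 0.5 SD, 20 samples per arm, and a
    # single replication attempt judged at p < 0.05.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    trials, hits = 5000, 0
    for _ in range(trials):
        treat = rng.normal(0.5, 1, 20)   # the effect really is there
        ctrl = rng.normal(0.0, 1, 20)
        hits += stats.ttest_ind(treat, ctrl).pvalue < 0.05

    print("chance a single replication 'confirms' a real effect:", hits / trials)  # ~1 in 3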
As far as I can tell from the article, Young's stance is that the contract lab doing the reproduction doesn't seem competent enough to correctly reproduce the experiment.
I don't know enough to know if that's true or not. But it seems at least possible that in the rush to reproduce much, the project is cutting costs by using less-skilled contract labs.
> Young's stance is that the contract lab doing the reproduction doesn't seem competent enough to correctly reproduce the experiment.
Yeah, but the contract lab doesn't need to be filled with geniuses (like Young perhaps) just to reproduce a now-mundane result from a few years ago, right? We're not talking about coming up with cutting-edge new experiments, just reproducing an old result using lab techniques that have probably become far less "cutting edge" in the intervening years...
I am highly suspicious of a claim that some experiment from a few years ago can still only be replicated by a tiny number of the top experts.
> I am highly suspicious of a claim that some experiment from a few years ago can still only be replicated by a tiny number of the top experts.
I don't think you can find any claim on the order of "only a few top experts can do this..." in the article; the objections seem to be against specific contract shops. As I understand things, biology experiments are still a matter of craft, of doing things by hand. Like any such thing (say, woodworking or computer programming), there are going to be levels of skill involved, with the lowest tier of qualified people often being rather bad (for example, the lowest-tier contract programming shop). An analogy that springs to mind is whether one would want one's video game ported by the cheapest porting operation in the business.
Of course, all this is me talking about what I can glean from the article, compared to what others imagine the article says - until and unless someone with specific knowledge of the complaints and their validity or lack thereof appears.
But as the OP mentions, lab techniques vary and not everything can be documented. It's reasonable for a scientist to ask for a certain level of skill in a lab attempting reproduction. I don't know if these scientists are demanding too much and have something to hide, or if there are problems with the lab involved. However, the article actually mentions numerous scientists expressing misgivings.
A scientist should be able to describe all assumptions and manipulable variables that go into an experiment.
>It's reasonable for a scientist to ask for a certain level
Maybe, but a scientist should be able to specifically describe what that 'level' is.
Instead, this article seems to present many individuals appealing to an abstract 'level' of their own choosing, one they seem unable to describe sufficiently to anyone else (which is fishy, because all of this serves, intentionally or not, to prevent anyone from attempting to replicate their experiment).
In my view of science, there should never be a case where 'misgivings' about a replication attempt should ever be expressed. Either the experiment will be accurately replicated or it won't, in which case we can probably determine if the replication was not accurate or if the original experiment was flawed (either in design, execution, and/or analysis).
Science gives us the tools to remove things from the influence and design of opinion and to examine them from viewpoints that are as free from subjectivity as possible. Once you start moving into the realm of opinion (expressed misgivings) you are leaving the realm of science and moving into philosophy, religion, or worse.
I disagree here, because the inevitable result of a highly visible series of failed reproductions is a big media hit. The scientific community may be able to poke holes in the reproduction attempt and sort through the damage, but the media and the court of public opinion certainly can't. Not until long after the reputations of possibly faultless scientists have been ruined irreparably.
So it's important for the reproduction attempts to be as high quality and rigorous as we would hope the original studies were. And it behooves scientists to make sure that these attempts are legitimate, unbiased and equitable, and to investigate any experimental flaws and biases of the experimenters before the results.
Misgivings about a reproduction attempt don't indicate denial of the validity of the scientific process, but recognition of the volatility of scientific news media. The unfortunate reality is that both sides of this effort, both original researchers and reproduction attempts, are subject to a great number of biases and restrictions. Subjective opinion does deeply affect the lives of scientists, and it's not possible for even the best scientists to live in a bubble of scientific purity and assume things will work out.
>I disagree here, because the inevitable result of a highly visible series of failed reproductions is a big media hit.
Interesting proclamation.
If a result can't be replicated after many 'highly visible' attempts then the result should be called into serious question.
>The scientific community may be able to poke holes in the reproduction attempt and sort through the damage, but the media and the court of public opinion certainly can't. Not until long after the reputations of possibly faultless scientists have been ruined irreparably.
This sounds like unwarranted fatalism to me. If the result was not reproduced because the experiment was not actually reproduced...I don't see what the issue is here.
>So it's important for the reproduction attempts to be as high quality and rigorous as we would hope the original studies were. And it behooves scientists to make sure that these attempts are legitimate, unbiased and equitable, and to investigate any experimental flaws and biases of the experimenters before the results.
This is why it is critical for any researcher that desires credibility (and more importantly: explanatory power) to detail their work accurately enough that someone else can exactly replicate their experiments in order to provide independent verification of their claims.
The best way to ensure that replication attempts are 'legitimate, unbiased and equitable' is to ensure your work is good enough that someone can actually (as opposed to merely attempting to) reproduce it.
>Misgivings about a reproduction attempt don't indicate denial of the validity of the scientific process, but recognition of the volatility of scientific news media.
Maybe.
>The unfortunate reality is that both sides of this effort, both original researchers and reproduction attempts, are subject to a great number of biases and restrictions. Subjective opinion does deeply affect the lives of scientists, and it's not possible for even the best scientists to live in a bubble of scientific purity and assume things will work out.
This sounds like something scientists need to work towards resolving.
"As far as I can tell from the article, Young's stance is that the contract lab doing the reproduction doesn't seem competent enough to correctly reproduce the experiment."
I'm suddenly having flashbacks, something about cold fusion and how groups with the right competence were indeed able to reproduce the results, for some value of "reproduce".
Agreed. I would actually be surprised if only a single lab had tried to reproduce a key result.
There may well be other labs which tried, and failed, to reproduce the original author's findings. In any event, it is unlikely we would know, as failure is unfortunately rarely publicized in academia (especially so when such failure is commonly dismissed out of hand for reasons such as a "lack of expertise").
I personally think this replication work is incredibly important and should absolutely be a key priority for funding institutions.
>I'm not sure Young actually appreciates how large sampling error is under usual significance levels or how high heterogeneity between labs is.
Rick Young is an MIT professor, has been in biology research for 3 decades, is in the National Academy of Sciences, and has published hundreds of high-impact papers across a huge number of disciplines. I know this doesn't mean he is God, but to say that he doesn't understand "sampling errors" or "heterogeneity between labs" is an absurd insult.
One might think it's absurd, but nevertheless, that is what he is quoted as saying, and there are a lot of prestigious names on record as criticizing things like evidence-based medicine or replication. (You may remember a particularly asinine editorial not that long ago by a prestigious Harvard researcher arguing against any attempt to replicate studies.) I would no more be surprised to learn that Young genuinely thinks what he is quoted as arguing, than I would be to see, say, yet another misinterpretation of p-values in a peer-reviewed published research article or yet another article where the power is closer to 0% than 100% & the results near-meaningless or another study with scores of described hypothesis tests and no consideration of false-positive issues or any of the dreary litany of problems we all know about.
Being a published biology researcher does not make one an instant meta-analyst, qualified to say that there is no need for any additional replications (to say that sort of thing, you should be drawing on something like a trial sequential analysis indicating futility of additional studies, not saying 'well it seems to me like one (successful) replication of my result is enough and this second replication is a waste').
You are misinterpreting him; he's not saying that there are no more replications needed, he's saying that this method is not necessary.
And true, if another lab already got it going, then some random contract researcher getting it going or not is in no way definitive; if they do get it going, fine, and if they don't replicate, then it's not evidence of anything.
So what's the point? Isn't it better to use that same time and money to build a new experiment that has a higher upside rather than one that has a very limited upside?
> You are misinterpreting him; he's not saying that there are no more replications needed, he's saying that this method is not necessary.
I am not misinterpreting him at all. Look at the disjunctive argument he offers; both forks imply a lack of appreciation for sampling error and heterogeneity. (If the experiment is replicated, then far from being useless, the specifics will add substantial precision in the effect size and/or existence and also help estimating heterogeneity across labs attempting the procedure; and if not, then it 'may' be due to incompetence, but may also not be, in which case one has learned even more...)
> if they do get it going, fine, and if they don't replicate, then it's not evidence of anything.
I disagree, it is evidence. If the lab has an 80% chance of replicating correctly yadda yadda bayes theorem yadda yadda. Unless the contract lab is so incompetent that all things it does are purely random, their success or failure is evidence.
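Filling in the "yadda yadda Bayes" with made-up numbers (a 50% prior that the effect is real, an 80% chance a competent contract lab replicates a real effect, and a 5% chance it "replicates" a non-effect by accident):

    # Hypothetical numbers, just to show the outcome is evidence either way.
    p_real = 0.5            # prior probability the original effect is real
    p_rep_if_real = 0.8     # lab replicates a real effect 80% of the time
    p_rep_if_not = 0.05     # spurious "replication" of a non-effect

    def posterior(replicated):
        like_real = p_rep_if_real if replicated else 1 - p_rep_if_real
        like_not = p_rep_if_not if replicated else 1 - p_rep_if_not
        return like_real * p_real / (like_real * p_real + like_not * (1 - p_real))

    print("P(real | replication succeeded) = %.2f" % posterior(True))   # ~0.94
    print("P(real | replication failed)    = %.2f" % posterior(False))  # ~0.17

A failed replication by an imperfect lab doesn't prove the effect is fake, but it should still move your belief quite a bit.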
> Isn't it better to use that same time and money to build a new experiment that has a higher upside rather than one that has a very limited upside?
No. I don't know why you would think all experiments are of equal value and that replication is worthless. The more important an experiment is, the more important it is that it be replicated. The less important an experiment, the less important that anyone spend the resources replicating it. It can very easily be much better to replicate an experiment to gain additional confidence and/or precision. (See Bayesian search & optimization, decision theory, value of information, etc.) Imagine a medical clinical trial for a cancer treatment, would you really argue 'well, the first experiment turned in p<0.05, let's roll this sucker out to all the patients in the world! We could run more trials to make sure it works in more than one place and is worth the money and side-effects, but after all, isn't it better to use that same time and money to study new drugs, which have higher upsides rather than one with a very limited upside?'
As a complete outsider reading the attitudes behind the original scientists, it seems to me that they resent the oversight and hate to do extra work. In defending their practices they fall back on "expert work" and essentially are arguing that what they are doing is too complex for anyone else to do and they should be left alone to continue to do it.
And from their point of view, it seems all very reasonable. But from the rest of humanity who is being asked to materially support them, and waits for their conclusions to make the world a better place, it seems ... frankly... lazy and selfish. 30 emails, wow! 2 weeks of a graduate student's time -- these are the people who are the least paid right? Below minimum wage even? The demands on their time seem so low, yet the complaints are so high, that one can't help but wonder if the concern really is that their results are too 'magical' and irreproducible and they just fear other people learning about it.
I've seen this behavior in professional settings, and ultimately it comes down to a lack of confidence in oneself, the tools and technology and the quality of work being done. Careers are at stake, but is the alternative to just give people a free pass?
Sorry, but how is this not just anti-intellectualism?
> 30 emails, wow!
It can take up to half a day to reply to an email containing even one fairly technical question. Obviously the time invested depends on the content of those 30 emails, but it's not at all difficult to imagine this eating a week worth of a PI's time.
> 2 weeks of a graduate student's time -- these are the people who are the least paid right? Below minimum wage even?
The typical attitude is that this is a justification for not making them spend their time managing these sorts of tasks. You don't get to say "we get to pay you crap because your job is exciting and rewarding" and then load that same person down with a bunch of grunt work (on top of all the normal grunt work). That's how you bleed students and kill the lab's productivity.
Also, grad students are not well paid but they are typically no cheaper than post docs; PIs pay tuition.
> The demands on their time seem so low, yet the complaints are so high
So let's imagine a world where people calling for massive improvements to reproducibility get their way. Let's say it is a month worth of time for the lab. For each paper. Multiple papers a year. That's a pretty massive time investment. If you're a top lab, that could become a full-time position. And believe me, you're not going to be able to fill that position with a grad student. That person will have to be well-paid, because their job is going to suck.
So it's reasonable that scientists are peeved when they invest all this time and don't perceive their collaborators as acting in good faith, or feel like their collaborators are trying to cut corners to pinch pennies.
If an engineer builds the world's greatest new engine but says "unfortunately it'll only run in my lab, no one else is competent enough to run it or build a copy", then what good is it to society?
If the researchers in a lab are such geniuses that they are doing experiments almost nobody else can duplicate and it is therefore impossible to determine the veracity of their claims, how is that helping society and why should society fund them?
Isn't the onus on the researchers to focus on experiments that are also reproducible by non-supergeniuses?
Science is not engineering. In engineering, we understand the problem domain enough to make everything reproducible, the governing principles are well enough understood that reasoning from first principles can usually come up with answers.
In contrast, in science, we are trying to discover those first principles. So if something doesn't replicate in another lab, we have a responsibility to check it out, but that doesn't mean that the scientist has to discover all the first principles before making a publication! The non-replication could be due to new, solid first principles, and the assumption that the experiment should be "reproducible" exactly from the English words on a sheet of paper is a faulty assumption.
If you want to be able to pick up the scientific literature and read it like a science text book, you're doing it wrong. That is not the purpose of publishing papers.
> If an engineer builds the world's greatest new engine
There is more to science than advanced product development. Conflating the two is wrong-headed.
> but says "unfortunately it'll only run in my lab, no one else is competent enough to run it or build a copy" then what good is it to society?
It's fantastically good to society. A company interested in monetizing the research could provide that researcher with a multi-year sabbatical to come to their company and turn his Research into a Product.
Incidentally, that happens. It's also not unheard of for PhD students to carry an adviser's idea forward toward application in the context of a permanent position at a relevant company.
> almost nobody else can duplicate
There's a difference between not being able to duplicate, and duplication being expensive.
> and it is therefore impossible to determine the veracity of their claims
Again, reproducibility should focus on the veracity of the claims, not the economics of reproducing them. Nothing is wrong with calling for better reproducibility. The problem is in expecting to get it for free, and assuming that it's always appropriate at every stage of research.
Investment in reproducibility should be in proportion to the degree of trust the scientific community puts in the claim, and it is absolutely reasonable for that investment to grow over time. But, don't kid yourself, it's an investment. And society would have to be stupid to invest enormous amounts of money into ensuring every single scientific paper ever published is held to an extremely strong standard for reproducibility.
> Isn't the onus on the researchers to focus on experiments that are also reproducible by non-supergeniuses?
If by "super genius" you mean "someone else who does research in the same or a closely related field", then Hell. No. The onus on the scientists is to focus on experiments that push science forward in service of mankind.
Sometimes this means helping mega-corps figure out how to reliably reproduce your research without expert help and thereby increase profit by decreasing required investment. Sometimes this means focusing on discovery.
"A paper that Young, a biologist at the Massachusetts Institute of Technology in Cambridge, had published inCell in 2012 on how a protein called c-Myc spurs tumor growth was among 50 high-impact papers chosen for scrutiny by the Reproducibility Project: Cancer Biology."
That hardly sounds like "every single scientific paper ever published".
You are correct; I was being hyperbolic. But still, not sure Nature/Science/Cell is a high enough standard for "anyone should be able to replicate with little effort" -- lots of "non-late-game" results that aren't necessarily ready for industry applications get published in those venues (which, I guess I've been arguing, is a good thing.)
Well, the economics for the LHC are difficult, precisely for this reason.
Yes, it is possible the money spent on LHC could have been spent better on other "smaller scale" projects with a better cost/benefit ratio, and better aligned incentives (possibly even outside of the field of high energy physics)
I would have to study the economics of the LHC better to have a strong opinion on this though.
Alternatively, any regulatory approval should be limited to versions of the resulting drug at that particular lab, since only they know how to produce it ... although output may be limited. :-p
When I was an academic, such questions did not take half a day to answer. That's because my papers were detailed and answered the questions - I would simply say "go read section 4.3".
Further, I also put source code on my webpage (this was in the pre-github era). "To replicate and check methodology, `darcs clone ...`".
If replying to technical emails takes too much time, it is because your publications are incomplete. Full stop.
If you can even come close to explaining how to replicate your results in 12 or 20 pages, you're not really doing an experiment that is non-trivial to replicate. Constraining science to protocols that can be explained in 12-20 pages would be stupid and wrong; almost any protocol used in a modern lab will take longer to explain than that.
With all due respect, the assertion that a typical protocol should be able to fit in the length of a typical CS publication is pure hubris. (And when you need a 200 page TR to explain things, my arguments about unjustified cost start resonating because you're demanding a dissertation or two per published paper.)
> such questions did not take half a day to answer.
Just because the science you did was trivial to replicate does not mean all science is trivial to replicate.
Yeah, when replication involves nothing more or less than compiling code on commodity hardware, I can imagine replication protocols are probably pretty easy to explain... it's a small wonder anyone had to email you about this at all.
> "To replicate and check methodology, `darcs clone ...`".
However, the assertion that replicating any experiment is basically no more difficult than downloading some source code and compiling/running it is... arrogant in the extreme.
It's nice that some Computer Scientists can "replicate" their results by saying "oh yeah compile and run this code on the same dataset".
But asserting that all science is like this underestimates the difficulty of replication even in CS. What happens when your dataset is terabytes in size and requires a specially configured cluster of GPUs or FPGAs to process within a year? What if you have a half-million-dollar robot that has to be built to spec? Give me the 12-page IKEA manual for building a half-million-dollar robot that has only ever been built once before. Or even the 200-page manual. Not all Computer Science can be done on a laptop, and the assertion that it can is, again, arrogant.
Even more to the point, most laboratory experiments are not run entirely by autonomous robots that you can load code into and walk away; in fact, many require steps that require skills in the lab that can only come with lots of practice. Hell, limiting ourselves to protocols that are easy to use and fit in 12-20 pages would ban the contents of upper-level undergraduate lab assignments.
There is nothing wrong with an experiment requiring skill to replicate. Banning protocols that are difficult for non-experts to follow for intrinsic reasons would basically ban all modern experimental science.
My papers were mostly a lot longer than 12-20 pages (one was 10x this), or at least referred to such a paper where details were explained. I have no objection to complex protocols, I simply assert that the protocol needs to be explained at most once.
The real answer to this problem is to invest more money in research. If there are more people looking at a particular problem, there is less and less chance that a big mistake will stay hidden. Many have the wishful thinking that if we were just more careful with procedures or software or technology or something else, then the results would be more robust. Unfortunately this doesn't work when you have fixed resources to perform extremely complex research. The power of science lies not in individual results (papers), but in how these results confirm or refute each other over time.
It seems like a sensible check is for this collaboration to include original studies, like the one mentioned in the article lede, that have already been replicated elsewhere. (Ideally they would keep the relevant members of the collaboration blind to this fact.) Then when you say "we failed to replicate X% of the studies" you also say "of the subgroup that had already been replicated, we failed to replicate Y%". If Y isn't much smaller than X, you know the replication collaboration is probably botching this.
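As a rough illustration of that check (the study IDs and numbers below are made up, not from the article), the arithmetic is just two failure rates compared side by side:

```python
# Hypothetical sketch of the proposed sanity check: compare the overall
# replication-failure rate X with the failure rate Y on studies that had
# already been independently replicated before this project started.
studies = [
    # (study_id, already_replicated_elsewhere, replication_succeeded)
    ("S1", True,  True),
    ("S2", True,  True),
    ("S3", True,  False),
    ("S4", False, False),
    ("S5", False, True),
    ("S6", False, False),
]

def failure_rate(rows):
    """Fraction of studies in `rows` whose replication attempt failed."""
    return sum(1 for _, _, ok in rows if not ok) / len(rows)

x = failure_rate(studies)                          # overall failure rate: X
y = failure_rate([s for s in studies if s[1]])     # pre-replicated subgroup: Y

print(f"Overall failure rate X = {x:.0%}")
print(f"Failure rate on already-replicated studies Y = {y:.0%}")
# If Y is not much smaller than X, suspect the replication effort itself
# rather than the original papers.
```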
I can't state how strongly I disagree with the conclusion that papers should be providing excruciating detail about protocol just because "pharmaceutical companies can't reproduce key cancer papers [without the help of the original scientists]". Science has rarely been done like this.
It would be like Google complaining that they can't copy pseudocode verbatim out of a paper and have a highly performant algorithm. Or Microsoft complaining that a static analysis defined in a paper wasn't accompanied by a production-ready implementation.
Producing protocols that literally anyone could replicate without expending effort is not the business of Science.
Replication should focus on the veracity of the underlying truth claim, not the economics of reproducing the results.
That may be true, but one of the key ideals in science is that findings are reproducible. If I can't reproduce a result, it helps if the write-up is fairly specific, in case the findings depended on details that were thought to be unimportant but later turn out to be critical. The most interesting science comes from unexpected results.
I feel it's like a closed-source project releasing "the source code" that no one can get to compile. You don't have to have a clean codebase, and it can be a bit tricky to build, but it has to compile before anyone can have any confidence in your claims. If you don't provide that, it's much less useful.
The separate issues of for profit labs and what support they should expect in this context are tough issues, but we shouldn't undermine the science to thwart them.
Thanks for giving an excellent example. NuPRL is exactly like this, but no one in their right mind would say the group that put out NuPRL hasn't done good science for 20+ years.
To clarify, the NuPRL project is like this in what way? They discuss CS topics and algorithms but no source code exists? Or it exists but they don't provide it? Or (unlikely) they provide source code no one can compile?
I don't want to mix up the side issue of "should code-based research always provide source code" with the metaphors being discussed. I definitely think useful CS research can happen without source code needing to exist, which I hope is the NuPRL project's case.
I do feel code based research should provide source code that was created as part of the research when it is material to their claims, including 'we ran a simulation' and similar findings. If you already have that level of detail, you should include it. If you make claims based on data, you need to provide that data. If you make claims based on source code, you need to provide that source code.
Difficult to obtain and get running (I'm told by everyone I've asked). So definitely not the first thing; mostly those last two things.
And yeah, I think that high-quality source code is definitely a pretty impressive feat when researchers pull it off. And I also agree source code should be provided. But beyond providing a VM, I think it's really dangerous to fixate on building software that's easy to set up and use. There's a pretty significant time cost there in some cases, and it's worth asking whether that's what we want to be spending our Science dollars on.
Your analogies are completely off the mark. It is not the same at all. The cornerstone of science is that other people can reproduce your results. Period. There is no use publishing otherwise. Withholding key information, which is so ubiquitous now, is a great disservice to the scientific community.
The question is not whether reproducibility is good, it's how much labs should invest upfront in producing descriptions of protocols. My argument is that they should probably invest more than they do now, but not so much that pharmaceutical companies are able to reproduce a given experiment without talking to the lab.
Science is a collaborative process. There's nothing wrong with collaboration being part of the reproducibility process, as long as the person doing the reproduction maintains their objectivity.
People often misunderstand the idea of reproducibility in science. The idea is not that scientists need to be given a complete, easy-to-follow set of procedures that will let anyone reproduce an observation. In some areas this is close to impossible. For example, in high energy physics there is only one piece of equipment in the world that can reproduce (with luck) some key experiments. The idea is that, with enough effort and funding, somebody else could possibly reach the same conclusions. If you don't agree with some published results, there is a straightforward way to challenge them: design your own experiment and publish the results. If the results conflict, the next step is to determine why and under which conditions there is a conflict. Science evolves through the debate of ideas and observations, not because somebody is sharing a cookie-cutter recipe.
It also casts doubt on the independence of the verification. It will only catch outright cases of falsification; innocent situations where the procedure is erroneous or the conclusions are wrongly drawn will slip through.
If the reproducers don't quite know what they are doing if left to their own devices, they will invariably be influenced by the original researchers into being blind to exactly the same mistakes and making the same wrong inferences.
Another software analogy: it's like QA people needing detailed hand-holding from the original developer in testing some program, instead of independently looking for ways to break it. That is then no better than the original functional testing by the developer.
There are at least two distinct concepts here. 1) Exactly (as much as possible, given apparent environmental constraints) reproducing the procedure and results claimed and 2) attacking the procedure or results as ineffective or misleading (to what is actually being claimed).
I'm not sure either is quite analogous to QA testing in the way you claimed though. Yes, QA testing should be reproducible and yes, QA testing should search for novel tests, but the scientific process is still a bit different and there are multiple distinct and important phases and concepts at play.
Yes, it's the truth of the biological knowledge that is most important. And we get at that best not by replicating a particular experimental design in two labs, but by examining it in the context of a diversity of different experiments with different assumptions.
The Reproducibility Project is run by people in the business of replicating experimental design for profit. They argue that a proportion of public research money should be given to businesses like theirs for this process. This will take a lot of resources that would otherwise be spent examining biological knowledge from different angles rather than the same angle multiple times. In the end, we will know less.
If I claim, "An implementation of this algorithm (described in pseudocode) achieves O(N) performance on that task" and no one can reproduce it based on the pseudocode, I've published a lie.
This is a phenomenon I've seen many times in my work as a consultant translating academic discoveries to commercial code. This work involves digging through both published papers and actual code, only to find that some grad student has implemented a clever heuristic, left out of the paper as "inessential", that the claimed performance actually relies on.
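A purely made-up sketch of what that gap can look like (the dataset, function names, and the particular tricks are all hypothetical, not from any real paper): the pseudocode describes a plain scan, while the code that produced the reported numbers quietly relies on sorting, binary search, and a cache that never made it into the paper.

```python
import bisect
from functools import lru_cache

data = sorted(range(0, 1_000_000, 7))  # hypothetical dataset

def nearest_as_published(x):
    """What the paper's pseudocode describes: a straightforward linear scan."""
    return min(data, key=lambda v: abs(v - x))

@lru_cache(maxsize=None)
def nearest_as_implemented(x):
    """What the grad student's code actually does: binary search plus an
    undisclosed cache of repeated queries -- the 'inessential' details
    the reported performance depends on."""
    i = bisect.bisect_left(data, x)
    candidates = data[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda v: abs(v - x))
```

Both functions return the same answers, but only the second comes anywhere near the performance the original lab would have measured, and nothing in the pseudocode tells the reader why.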
The article says, "It's unrealistic to think contract labs or university core facilities can get the same results as a highly specialized team of academic researchers, they say."
I can understand this sentiment: science is hard, and at the cutting edge it's really hard. I've certainly gotten results that I think it would be hard for others to achieve, and I've succeeded with experiments where others have failed simply because I have good hands.
I once took over an experiment that had been worked on by a heavy smoker with shaky hands. He had developed a complicated apparatus for stretching ultra-thin plastic foils over a frame. I just used my fingers. If I had published the way I did it and someone like him had tried to reproduce it, they would have failed.
However... the kind of reproducibility being talked about here is not (primarily) technique, it is results. If a result can't be reproduced because the cell line was so very special or whatever, then it likely isn't robust enough to be very interesting in practical biology. It certainly isn't of much interest in drug discovery. So while the science may be OK, results that are incredibly hard to reproduce are of very low value for biotech, and that knowledge is worth something in and of itself.
You make a lot of assertions, but the closest you come to explaining the reason for any of those assertions is, "That's the way it's always been done."
Can you elaborate on why this new approach isn't even worth trying?
* Because this is in the context of a for profit company arguing that they should get a slice of the Science funding pie so that pharmaceutical companies have an easier time at reproducing results. It's unclear that this is a better use of funding than investing in new discoveries (unless you are profoundly suspicious of scientists and think they are all fabricating results), but it's fairly obvious why this aligns with the profit motives of pharmaceutical companies. I hope it's obvious why that's troubling to me and others.
* Because there is value in reproducing results without doing everything exactly identically (see other posts).
* Because there are other mechanisms for obtaining high-quality descriptions of protocols and it doesn't necessarily always make sense to invest lots of time in this early in the scientific process.
* Because public morality panics rarely result in good policy making.
My criticism was only of the justification given. And honestly, the attitude in scientific publishing has never been "science should be written so that people in industry can reproduce the results at minimal effort/expense and without acquiring scientists who know the area and techniques well."
(Also, your post would have worked just as well without the first sentence.)
Yeah, sorry about that. But biological systems are very complex, so the more important thing is to know whether the claimed effect actually holds up. A negative result obtained under a different protocol would be very hard to analyze, and pinpointing the source of the discrepancy would be harder still.
Also, empirical science is very much an art of precision. That's the difference between 20th-century physics and ancient philosophy.
Agreed. I think results should be reproducible. I guess my point is that the amount of time spent writing up and communicating protocol should be governed by an analysis of the scientific benefits of doing so, rather than commercial benefits.
I'm not saying that protocols can't be improved at appropriate cost. Rather, I'm saying that the cost of creating protocols that a random person at a pharma company can use correctly on the first try, without asking for help, is going to result in an enormous amount of wasted time reaching a level of reproducibility that isn't necessary for doing good science (where good != profitable).
And at the frontiers of science where we're using a novel protocol for the first time, I think it's OK to publish without extensive documentation. If the protocol ends up being useful and important enough that other people want to use it, then that's the time to start investing time in high-quality documentation. Which is one of the scientific benefits of collaborative reproduction.
(FWIW I think michaelhoffman did a better job of capturing the essence of my complaint in a few sentences than I've done over multiple posts.)
> The specifics of the animals used for example may be the cause of severe errors and cause wildly different results.
That only underscores nmrm's point!
If the experiment is sensitive to the specifics of some animals, it might perhaps be wrong because of that. If the reproducers use animals with the same specifics, they could be led down the same wrong path.
But the only way to find out that the experiment is sensitive to specifics of some animals is to run it with different animals in the same laboratory.
If the original paper finds a result and I fail to find the same result with different animals, I have no way of knowing if the difference is because of my animals or some other small difference in experimental setup -- unless the experimental setup is explained in enough detail in the original paper.
To follow up, the papers can sometimes be deliberately vague with respect to actual volumes/masses/materials, as theoretically the concentration, & activity and other state-functions sufficiently describe the system. The exact DNA sequence I use might be important, but if I just say I used gene, 'hGENE1', that should be sufficient for the published science. Sure, my hGene1 might have a mutation, or yours might, but we'd never learn that if we just kept passing around the same plasmid, calling it hGene1 (I'm looking at you, cell-lines...).
That you used brand-X tubes should generally not be important to the science (conceptually, though sometimes they are physically), and so should be left out of the paper's protocol precisely so that another lab does not use that tube during a replication study. It's not bad science that using brand-X tube yields different results than brand-Y tube, though it can be frustrating to learn that is the crux of the difference you've been seeing. It is precisely that kind of variable that must be ferreted out, and can often only be ferreted out when labs attack the same problem from a different angle.
In terms of listing certain brands, you're completely correct that this should not cause there to be different results. But I disagree that these details should be left out of the paper. Any competent scientist attempting to replicate a result can make assumptions about which equipment (like tubing) should not reasonably be expected to change the results. It should be up to the replicating scientist to make the equipment substitutions that he deems reasonable.
An example might be listing that a certain brand of pipettor tips or a certain centrifuge was used during the experiment. It is highly unlikely that a lab is going to go out of their way to acquire the exact brand of pipettor tips or centrifuge in order to replicate the experiment. But should they have difficulty reproducing the results, having that additional specific detail in the original write-up allows the replicating lab to begin troubleshooting on their own. They can begin looking at small-scale details which may have affected their results.
> Jeff Settleman, who left academia for industry 5 years ago and is now at Calico Life Sciences in South San Francisco, California, agrees. “You can't give me and Julia Child the same recipe and expect an equally good meal,” he says. Settleman has two papers being replicated.
Uh, Julia Child WROTE FREAKING COOKBOOKS. The entire point of Julia Child was that she tried to develop recipes in such a way that another cook could produce an equally good meal. Now, yes, if I went into a boiler room at Goldman Sachs and picked 10 guys at random, I doubt that most would be able to duplicate the recipe. If I picked 10 professional sous chefs at random and none of them were able to make a dish as good as Julia Child's from her recipe, I would start to have my doubts about the recipe.
By the same token, I don't expect rank amateurs to be able to duplicate state of the art cancer research. But if labs run by pharma companies and academic institutions are having the failure rate at reproducing research that the article claims, I think it's more than reasonable to start questioning the paper that documented that research, if not the research itself.
The analogy doesn't hold up. The complexity of cancer research is vastly greater than that of cooking. But anyway, chefs claim all the time that there are special, non-reproducible elements about their environment and creations, e.g., the oven, the ingredients, etc.
> none of them were able to make a dish as good
Defining "good" is the problem for both cancer research and cooking. You can have two experts saying this is "good" or this is "bad" and not really be able to prove who is right. Fine, maybe for cancer research you can ultimately prove who is right but its not realistic in most cases given the resource constraints of even top-flight labs.
Well, it's not my analogy. But I think it's a good one, it just undermines the point of the person who made it.
If I go into Momofuku, I'm looking for a good meal. A good meal is one that tastes good, in this case. If I'm looking at a Julia Child cookbook, I'm looking for a good recipe. One of the criteria for a good recipe may be that it tastes good, or that it produces a healthy meal -- there are several different criteria you can use here. But one criterion for a good recipe is reproducibility -- in order for a recipe to be good, it must contain enough information and be accurate enough for me to create the dish that the recipe is for. A recipe for a tasty meal that does not list the right ingredients or give enough detail in the steps to prepare it is a bad recipe.
By the same token, an experiment that cannot be repeated is a bad experiment. It may not be false. But its explanatory value is limited -- if a reaction can only take place in water that's treated a certain way or has/lacks certain minerals, then a paper that doesn't tell me that is leaving out important information. Regardless of whether or not you define the point of cancer research in purely scientific terms -- that is, to learn more about cancer -- or in more pragmatic terms -- that is, to allow us to create better cancer treatments -- omitted information about the circumstances surrounding the test that has a significant effect on the test result gives us less information and is less likely to lead to better cancer treatments.
> It's unrealistic to think contract labs or university core facilities can get the same results as a highly specialized team of academic researchers, they say. Often a graduate student has spent years perfecting a technique using novel protocols, Young says.
Then they need to spend the time documenting those protocols.
My dad worked in biological research, and his attitude has always been: if you don't write it down, you might as well not have done the work at all. ESPECIALLY in research.
So hold on a moment. These researchers are doing experiments so badly that they can't find the actual procedures they used to get their results? And now they are tracking down old postdocs and lab technicians just to pick their brains as to what they actually did?!?
How the heck did this stuff get through peer review? Surely I'm missing something critical?
Academic peer review has nothing to do with quality control and everything to do with maintaining the status quo. Don't take my word for it, read Retraction Watch: http://retractionwatch.com/
Interesting... Over the past year I've been toying with the idea of making an organization/company/website which would automatically direct tax dollars to research. The idea being, you can maximize the amount you write off in taxes and donate to the research you desire (i.e. more NASA, fewer children killed around the globe).
The idea stemmed from the fact that people want research to be both public AND reproducible. The funds would go directly to research groups, and as an incentive, reproducibility would have bounties based on what people were willing to donate. Because virtually every research group producing public research is supported by a non-profit, no one loses additional money, but more funds go towards public-interest research GROUPS, not organizations with bureaucracy.
Somewhat off topic, but this seems another reason for me to start the project.
I tend to agree that the biology papers often lack proper documentation of procedures and methodologies. This is a wonderful effort to reproduce some of the key experiments. That being said, I think it's also very important to look at the quality and qualifications of the labs doing the reproductions.
I don't have any direct link to cancer research, so I can't speak with authority on the subject, but I have been involved in the past with a company working in the Preimplantation Genetic Diagnosis field.
The basics of their procedure: create one or more human embryos via IVF, incubate the embryos for up to 6 days, then either freeze them or transplant them into the prospective mother. On day 3 or 5 of incubation the embryo is biopsied, and the genetic material is tested to make sure there are no aneuploidy defects. We were also able to test for some other types of genetic abnormalities. This is for people who are having problems becoming pregnant.
In any case, some time in the mid-2000s there were 3 papers published in Europe claiming that performing a biopsy on Day 3 is extremely detrimental to the embryo, and their conclusion was that PGD with Day 3 biopsy should not be performed. The experiments were conducted by people who were unskilled in micromanipulation.
They did follow proper protocols, and I am sure they did their best to replicate proper procedures. But micromanipulation is as much skill as it is knowledge. For instance, I can write a detailed procedure on how to shoot a compound bow, and you can follow that procedure exactly. But, without practice, you are not going to hit the bullseye on the first try.
Because we were in the business of providing services to doctors, not publishing papers, we constantly tracked our embryo mortality rates, birth rates, and accuracy of testing. The better our results were, the more business we would get. And we couldn't fake the results, because the clinics ordering the test would be the ones recording all of those statistics for us.
Anyway, long story short, none of our data agreed with the papers claiming that Day 3 biopsy was detrimental to the embryo. In fact, quite the contrary: many of our statistics suggested that Day 3 biopsy and Day 4 or Day 5 transfer would result in better implantation rates. But the papers were published, and referenced, and then it became "common knowledge" that Day 3 biopsy is bad, and the medical industry moved on to Day 5 biopsy and embryo cryopreservation, and so did the company I worked with.
To the company I worked for it's all the same; money is money. Day 3 or Day 5 biopsy, they make money either way. But the patients are now more limited. From the stats we have seen, it doesn't look like Day 4 or 5 biopsy is worse for the embryo, but being frozen isn't a walk in the park. With Day 5 biopsy you have to freeze the embryo in order to allow time for the test results to come back.
Anyway, that's my 2 cents. Reproducibility is important, but I think it's just as important to change the incentives of those who publish papers. If your goal is to be published, then of course your research will suffer. It's the publish-or-perish mentality in academia that is the problem, I think.
Why didn't your company publish their data? If Day 3 biopsy is better and you have the data to show it, get it out. This is not some meaningless result - this is a matter of life and death.
Because things in medical science function a lot differently from the rest of science. Take a look at this video to see what I mean:
https://www.youtube.com/watch?v=VArT6Kj_x_8
I agree with the overall sentiment, the scientific community working on cancer cure(s) is failing us, the patients and their families.
And they are failing us because of some fundamental gaps in how the research, and the subsequent review/dissemination/presentation of findings, is done. I suspect there are multiple failures in the process. The standards of scientific proof and repeatability used by mathematicians, physicists and chemists are not followed.
The net result is the following disappointing statistic:
"...
In 1971, President Nixon and Congress declared war on cancer. Since then, the federal government has spent well over $105 billion on the effort (Kolata 2009b).
...
Gina Kolata pointed out in The New York Times that the cancer death rate, adjusted for the size and age of the population, has decreased by only 5 percent since 1950 (Kolata 2009a)." [1]
And this was just the US federal government investment. Not counting the private donations, and private company research.
Today the federal investment is about $5 billion annually [3].
I do not mean to sound totally discouraged, as clearly the screenings have helped many to detect cancers before they metastasized. And I would say the results show that that part of the research is working well.
However, for the cancers that can rarely be detected before they spread (e.g. pancreatic cancer and others) -- the investment our country and other societies have put in simply has not paid off.
What worries me is that our research quality gates are not able to improve the QoS of the underlying research.
And with my 'management hat' on -- I am reaching for this quote attributed to Einstein:
"Insanity: doing the same thing over and over again and expecting different results."
The OP paper is not the first one pointing at the lack of reproducible results, and it's not just cancer research:
"...
But it may also be due to current state of science. Scientists themselves are becoming increasingly concerned about the unreliability – that is, the lack of reproducibility — of many experimental or observational results.
..." [2]
There needs to be a bit of a revolution in the science of the cancer research and the way money is allocated to it.
Clearly the current model does not work and likely encourages pseudoscience to prosper.
the group clarified that they did not want to replicate the 30 or so experiments in the Cell paper, but just four described in a single key figure. And those experiments would be performed not by another academic lab working in the same area, but by an unnamed contract research organization.
sounds like someone wants to quietly weaponize this.