About this Author

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis, and other diseases. To contact Derek, email him directly; Twitter: Dereklowe


In the Pipeline


August 8, 2014

Mouse Models of Inflammation: Wrong Or Not?


Posted by Derek

I wrote here about a study that suggested that mice are a poor model for human inflammation. That paper created quite a splash - many research groups had experienced problems in this area, and this work seemed to offer a compelling reason for why that should happen.

Well, let the arguing commence, because now there's another paper out (also in PNAS) that analyzes the same data set and comes to the opposite conclusion. The authors of this new paper are specifically looking at the genes whose expression changed the most in both mice and humans, and they report a very high correlation. (The previous paper looked at the mouse homologs of human genes, among other things).

I'm not enough of a genomics person to say immediately who's correct here. Could they both be right: most gene and pathway changes are different in human and mouse inflammation, but the ones that change the most are mostly the same? But there's something a bit weird in this new paper: the authors report P values that are so vanishingly small that I have trouble believing them. How about ten to the minus thirty-fifth? Have you ever in your life heard of such a thing? In whole-animal biology, yet? That alone makes me wonder what's going on. Seeing a P-value the size of Planck's constant just seems wrong, somehow.
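For a rough sense of scale, a p-value of 10^-35 doesn't require 10^35 of anything. Here's a back-of-the-envelope sketch (my illustration, using the Fisher z-transform normal approximation -- not necessarily the test either paper ran) showing that a modest correlation across a thousand genes is already enough:

```python
import math

def corr_pvalue(r, n):
    """Two-sided p-value for a Pearson correlation r over n points,
    via the Fisher z-transform normal approximation. Illustrative
    only -- not the analysis from either PNAS paper."""
    z = math.atanh(r) * math.sqrt(n - 3)  # ~standard normal under the null
    return math.erfc(z / math.sqrt(2))    # P(|Z| > z)

# A correlation of 0.4 across 1,000 genes:
print(corr_pvalue(0.4, 1000))  # ~1e-40
```

Genome-scale comparisons run over tens of thousands of genes, so vanishingly small nominal p-values can fall out of fairly routine correlations; whether they mean anything is another question.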

Comments (28) + TrackBacks (0) | Category: Animal Testing


1. Chad Orzel on August 8, 2014 9:04 AM writes...

I've seen a paper in laser physics that reported a violation of a classical prediction by 100 standard deviations. That's so big that Mathematica couldn't convert it to a p-value...


2. RegularReader on August 8, 2014 9:09 AM writes...

"My apple is better than your orange."

Derek, you (on 02/17/14) and others have already commented on the over-use, misuse, and abuse of p-values. Perhaps this reflects an educational deficit--how many life science graduate programs require their students to take applied statistics courses?

Hopefully, advances in primary cell culture, patient-derived xenografts, and organs/humans-on-a-chip will address many issues with current whole animal models.

Quoting a former professor:
"Where applicable, the best analog of your compound is the enantiomer."


3. Anonymous on August 8, 2014 10:57 AM writes...

#2 RR: you attribute this statement to your prof: "Where applicable, the best analog of your compound is the enantiomer." I would suggest the "where applicable" criterion must be pretty wide open, as most enantiomers are likely to have activity, and not necessarily in a positive or easily detectable fashion in early preclinical testing (thalidomide being the classic example). I for one would be very careful about the context before applying that principle.


4. a. nonymaus on August 8, 2014 11:13 AM writes...

Of course your correlations improve when you suppress all the data points that are near an axis, which is what the authors did. Basically, they have established that things that are correlated are correlated. On the other hand, if there are a lot of other things that are not correlated, those things will lead to a bad model.


5. Oblarg on August 8, 2014 11:14 AM writes...


It most certainly is an educational deficit. It's the same reason you can't trust any published results in nutrition - the people writing them have no fundamental understanding of statistics or experimental design.

Math education in general is in tatters, and our science is worse for it.


6. kjk on August 8, 2014 11:20 AM writes...

Most papers just do p


7. Oblarg on August 8, 2014 11:25 AM writes...


That's not strictly true - one can easily generate pathological data sets which have almost no data near an axis yet have zero correlation.

It's worth noting that correlation coefficients are, to some degree, wonky and misleading whenever you're working with any system that isn't described by a simple linear relation.


8. Biff on August 8, 2014 11:45 AM writes...

I've done a lot of gene expression analyses over the years. From the beginning, I've been impressed with how wildly different the results can be when comparing rat data with mouse data (collected in the same set of experiments by the same set of hands). Obviously a rat is not a mouse and vice versa, but the difficulty of extrapolating between closely related rodents suggests the need for real caution when extrapolating between rodents and humans. Well, at least some humans.


9. Barry on August 8, 2014 12:12 PM writes...

Upjohn spent a few $hundred million (sounded like a lot of money at the time) on a small-molecule blocker of degranulation. Worked impressively in mouse models of inflammation. Showed zero efficacy in human disease--except for one family group in Finland.


10. Argon on August 8, 2014 12:30 PM writes...

@8 Biff
Hmmm. Sequencing papers suggest the house mouse / Norway rat split was about 12-24 mya and that the human and mouse lineages split about 75 mya. So if the mice/rat comparison experiments are a bit dodgy, multiply that by 3 or so for comparisons with humans...?


11. Harrison on August 8, 2014 12:32 PM writes...


That's more of a criticism of where the hypothesis for the model comes from rather than the animal itself. I would argue that some of the failed Alzheimer's drugs may work in early on-set Swedish Alzheimer's patients. At least this hypothesis is being tested by the DIAN trials.


12. luysii on August 8, 2014 12:39 PM writes...

Hopefully the inflammation work in mice is more applicable to man than the similar work on stroke in animals, which was an unmitigated disaster -- no treatment showing efficacy in animals was of any use when tried in humans.

Even as early as 24+ years ago, of 25 different compounds of proven efficacy for treating focal and global brain ischemia over the past 10 years based on published articles in refereed journals, NONE has worked in clinical trials in man [ Stroke vol. 21 pp. 1 - 3 '90 ]


13. Cellbio on August 8, 2014 12:58 PM writes...

Inflammation models are really pretty close in my opinion. That is, close to inflammation in humans. The problem is that most auto-immune models are essentially inflammation models (acute stimulation of immune cell migration and activation) and therefore many actives in these models do not address pathophysiology of more complex auto-immune disease. They work for mechanistic components which can then be tested for therapeutic benefit in humans.


14. RKN on August 8, 2014 1:09 PM writes...

Haven't read either paper, but it has been my experience working with commercial pathway/network software that the p-values associated with this or that discovered pathway/network/disease map are often extraordinarily small, as Derek mentioned. If the paper mentioned used commercial pathway software the authors may have just reported the p-values the software returned and never questioned them. You can get some fantastically small p-values when you evaluate significance on a hypergeometric distribution, which at least one software package I used, did.
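The hypergeometric calculation RKN describes is easy to sketch in plain Python (the numbers below are invented, purely to show how extreme the tail probabilities get):

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k): probability of at least k pathway genes when N genes
    are drawn from a universe of M genes, n of which are in the pathway."""
    return sum(comb(n, i) * comb(M - n, N - i)
               for i in range(k, min(n, N) + 1)) / comb(M, N)

# Toy enrichment test: a 20,000-gene universe, a 200-gene pathway,
# and a 500-gene hit list of which 80 land in the pathway
# (expected overlap under the null is only 5).
p = hypergeom_sf(80, 20000, 200, 500)
print(p)  # vanishingly small
```

With an expected overlap of 5 and an observed overlap of 80, the tail probability is astronomically small -- which is exactly how pathway software ends up printing p-values that nobody should take at face value.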


15. Anonymous on August 8, 2014 2:21 PM writes...

Look in the supplemental. The most extreme p values are e**-323!

I smell a rat, no a mouse, well certainly not a human.
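One possible reading of that particular number (speculation on my part): 10^-323 sits right at the floor of IEEE-754 double precision, so a reported p of that size may simply mean the software's floating-point arithmetic bottomed out, rather than reflecting any real estimate:

```python
import sys

print(sys.float_info.min)  # 2.2250738585072014e-308, smallest normal double
print(5e-324)              # 5e-324, the smallest positive (subnormal) double
print(1e-324)              # 0.0 -- anything smaller underflows to zero
print(1e-323 > 0.0)        # True: barely representable
```

A p-value reported as e**-323 is a couple of ULPs above zero in double precision -- a strong hint that the number is an artifact of the arithmetic, not of the biology.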


16. matt on August 8, 2014 2:23 PM writes...

#2 RR writes:
"Hopefully, advances in primary cell culture, patient-derived xenografts, and organs/humans-on-a-chip will address many issues with current whole animal models."



17. pete on August 8, 2014 2:51 PM writes...

I briefly looked at the 1st paper & don't have immediate access to the 2nd so I'm in no position to comment on the data.

Still, I'm stunned that there could be such a radical disconnect in microarray data interpretation. After all, there's a common focus on immune system transcripts in particular. And both papers were sponsored for submission by heavy-hitters in immunology, so you'd think that the data handling by both groups would be sound & thorough.

How could this be ??

Is the analysis of gene transcript data really so treacherous ? Should we all go back to good old Northern Blots ?


18. Anonymous on August 8, 2014 2:58 PM writes...

"look in the supplemental. The most extreme p values are e**-323!"

If p values are supposed to be the probability of obtaining a test result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true, then wouldn't they have had to test at least 10^323 samples to get such a value?


19. captaingraphene on August 9, 2014 4:34 AM writes...

People seem to make the assumption that the authors have a sufficient grasp of statistics (and of mathematics in general), which in my experience is nowhere near warranted. Big names in, say, immunology unfortunately do not automatically equal a 'big time' understanding of statistical principles. Even a 'star' PI may make quantitative mistake after mistake, yet still get the paper published, solely because of his/her name and reputation.


20. newnickname on August 9, 2014 8:50 AM writes...

@2. RegularReader, Quoting a former professor:
"Where applicable, the best analog of your compound is the enantiomer.": That concept is usually attributed (Matthew Effect) to RB Woodward IN THE CONTEXT OF ORGANIC SYNTHESIS. In those days, there were frequent large scale resolutions to obtain chiral materials. What is the best way to test and optimize reaction conditions? On the precious compound with the natural configuration or on the 50% "waste" compound with the identical (other than chiral) chemical properties?


21. Neo on August 9, 2014 9:55 AM writes...

The real problem is not that p-values are small or that many leading chemists/biologists do not have a sufficient understanding of statistics.

The real problem is that p-values are often an answer to a different and/or far easier problem of little practical significance.


22. PUI Prof on August 9, 2014 8:56 PM writes...

@10 Argon
Interesting diversion, and I have not thought through the calculation. But my gut says a factor of 2^3 to 10^3 would be more realistic (probably closer to 2 than 10).


23. clinicaltrialist on August 10, 2014 1:51 AM writes...

This second paper is deeply flawed. As #4 said, they select only the genes that are correlated and run a p value. Duh!

The goal of a model is to mimic the human conditions/diseases. If you filter out most of the genes that are upregulated in inflammation in humans, as these authors did, the p values mean nothing.


24. captaingraphene on August 10, 2014 10:57 AM writes...

@clinicaltrialist brings up a good point.

How in the world did nobody in the peer review process question these things?


25. anonymous on August 10, 2014 2:56 PM writes...

Any time you see someone use the notation "p equals x", you know they lack some fundamental statistics background. "p is less than" is the proper notation. "p equals x" is so very wrong when you understand the basics of p-values.

I just glanced at each article and saw that cringe-worthy mistake all over the second article. I'm no expert in the data so their conclusions could still be correct, but they are not boosting confidence there.

(I'm not using the actual equals and less-than symbols because I think it messes up the html here.)


26. @18 on August 10, 2014 3:21 PM writes...

For some perspective, that number of experiments would exceed the number of protons and electrons in the observable universe!


27. Dana on August 10, 2014 7:01 PM writes...

Regarding the small p-values, keep in mind that a comparison like a t-test is a signal to noise measure, and you can get very low p-values when the variability of the data is very low. For instance, try in Excel doing a t-test of two populations with values (1,1,1) and (2, 2.0001, 2), and you will get a t-test p-value of the order of 7E-18. So I have commonly seen quite low p-values, and often as an artifact of low variance, which may be spurious (chance). With the many comparisons that you tend to do in genomics/omics in general, that sort of stuff can happen, but it may not be important.
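Dana's Excel example reproduces readily with the Python standard library (assuming the two-sample, equal-variance flavor of the t-test, and stopping at the t statistic, since the stdlib has no t-distribution CDF):

```python
from statistics import mean, variance
from math import sqrt

a = [1, 1, 1]
b = [2, 2.0001, 2]

# Pooled two-sample (equal-variance) t statistic.
sp2 = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) \
      / (len(a) + len(b) - 2)
t = (mean(b) - mean(a)) / sqrt(sp2 * (1 / len(a) + 1 / len(b)))
print(t)  # roughly 30,000
```

With 4 degrees of freedom, a t statistic that size corresponds to a two-sided p-value on the order of the 7E-18 Dana quotes: the near-zero variance, not any meaningful effect size, is what drives the number.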


28. dcgent on August 12, 2014 2:47 PM writes...

Science covered the new PNAS paper:



