Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship on his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases.

I don't usually do more than one post a day, but this really caught my eye. In an ongoing review of Pfizer's (now discontinued) inhaled insulin (Exubera), an increased chance of lung cancer has turned up among participants in the clinical trials. Six of the over four thousand patients in the trials on Exubera have since developed the disease, versus one of the similarly-sized control group. Six isn't many, but with that large a sample size, it's something that statistically can't be ignored, either.

The concerns would have to be, naturally, that this number could increase, since damage to lung tissue might take a while to show up. This, needless to say, completely ends Nektar's attempts to find another partner for Exubera. Their stock is getting severely treated today (down 25% as I write), but things are even worse for another small company, MannKind, that's been working on its own inhaled insulin for years now (down 58% at the moment).

There's no guarantee that another inhaled form would cause the same problems, but there's certainly no guarantee that it wouldn't, either. Whether this is an Exubera-specific problem, an insulin-specific one, or something that all attempts at inhaled proteins will have to look out for is just unknown. And unknown, in this case, is bad. It's going to be hard to make the case to find out, if this is the sort of potential problem waiting for your new product. Inhaled therapeutics of all sorts have suffered a huge setback today.

COMMENTS

1. David Young on April 9, 2008 12:44 PM writes...

Makes one wonder if the stimulation of insulin-dependent growth factors plays a role in cancer formation and progression. This might help explain why individuals who exercise and lose weight after breast cancer surgery or colon cancer surgery seem to fare better. Exercise improves glucose tolerance and lowers glucose levels overall, thereby requiring less insulin, thereby less stimulation of insulin-dependent growth factors, thereby less cancer. Or at least something like this.

David

2. willber deck on April 9, 2008 2:05 PM writes...

6 out of 4,000 Exubera patients, versus one out of 4,000 controls. You say with such a sample size, "it's something that statistically can't be ignored".

Statistically, it makes no difference what the denominator is; 6 out of 400 versus 1 out of 400 would have exactly the same statistical significance. Only the numerator matters. In fact, since the expected number is 1, you can describe the Poisson distribution very succinctly: the probabilities of having 0, 1, 2, 3, 4, and 5-or-more Exubera lung cancers are 37%, 37%, 18%, 6%, 1.5%, and 0.4%, respectively. The probability of having 6 or more such events would be 0.06%, well below the traditional 2.5% level needed to be called significant.
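
[Ed. note: those Poisson figures are easy to check numerically. A quick sketch in Python, using only the standard library, and taking the commenter's assumption of an expected count of 1 at face value:]

```python
import math

lam = 1.0  # expected count under the null -- taken, at face value, from the control arm

def poisson_pmf(k: int) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

probs = [poisson_pmf(k) for k in range(6)]   # P(0) ... P(5)
p_5_or_more = 1 - sum(probs[:5])             # ~0.37%, quoted above as "0.4%"
p_6_or_more = 1 - sum(probs)                 # ~0.06%, the tail actually observed
```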

On the other hand, this is the kind of situation where a Bayesian analysis may be more useful, if this is an unexpected finding. After all, there are a zillion things that might have popped up, and those numbers above don't account for the large number of potential comparisons. I bet if you did the study again, you might well not get such a big difference, but I doubt we will ever find out!

Great blog! Regards, WD

3. OleJim on April 9, 2008 2:17 PM writes...

I recall that nasally inhaled insulin was withdrawn by another manufacturer roughly 20 years ago because of tumors, too.

Sounds like respiratory epithelium does bad things when "over-insulinized"---and inhaled insulin always was super-dosed because absorption was far less than by IV or SC injection.

4. Brooks Moses on April 9, 2008 3:27 PM writes...

willber: The actual expected number isn't 1, though; it's unknown (but we have a data point that suggests it might be about 1). And, unless I'm misunderstanding your statistics, you haven't taken into account the possibility that the results in the control group were a bit low by random chance.

5. Robert R. Fenichel on April 9, 2008 3:39 PM writes...

I believe that the proper statistical analysis here is different from the one proposed by Willber Deck. The control group here was not so large as to determine the expected number of events in the Exubera group. Instead, the best we can say is that there were 7 events, and since the two groups were of equal size, the null hypothesis is that each event would be as likely to occur in one group as the other. The pertinent distribution is the binomial: one might expect all 7 events to occur in the same group 2/128 of the time, a 6/1 split (which is what happened) 14/128 of the time, and so on.
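
[Ed. note: Fenichel's binomial arithmetic checks out. A sketch, conditioning on the 7 total events and on his null hypothesis of a fair 50/50 split between the equal-sized arms:]

```python
from math import comb

n = 7  # total lung-cancer cases across the two equal-sized arms

# Under the null, each case lands in either arm with probability 1/2,
# so the count in the Exubera arm is Binomial(7, 0.5).
p_7_0_split = 2 * comb(n, 7) / 2**n   # all 7 in one arm: 2/128
p_6_1_split = 2 * comb(n, 6) / 2**n   # the observed 6/1 split: 14/128
```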

6. willber deck on April 9, 2008 4:05 PM writes...

OK, fair enough, I agree with both the previous posts. My main point was that the denominator doesn't matter, what matters is the 6 vs 1.

Using the frequentist approach, Robert Fenichel's proposal strikes me as reasonable, and takes account of the objections raised above. It would give you a probability of 14/128 for the observed 6/1 split (16/128 as a p-value, once the even more extreme 7/0 splits are counted), not significant according to convention.

In this case, since the insulin-lung cancer association is novel, unexpected and not very intuitive, I would be even more skeptical, unless there was another study along the same lines, and no one is likely to be planning one now.

7. Jimbo on April 9, 2008 6:04 PM writes...

I wonder what the numbers would look like for 4000 smokers vs. 4000 non-smokers?

8. satan on April 9, 2008 6:25 PM writes...

While high levels of IGFs have been associated with more aggressive cancers (like prostate cancers), I have to say that I did not see this one coming.

9. Anonymous BMS Researcher on April 9, 2008 7:05 PM writes...

Actually, I think the most appropriate statistical test for this case might be the Fisher Exact Test, and for that test the total sample size does matter, though not in a simple and obvious manner. If we have 4000 total in each group, one cancer case among controls, 6 cancer cases among treated patients, and confounding factors properly controlled for, then by the one-sided, mid-P Fisher Exact Test we get P of 0.035. If we use a two-sided test, we'd get P of 0.07. While the issue of one-sided versus two-sided hypothesis testing is somewhat controversial among statisticians, I think the majority would say that for trying to demonstrate efficacy a two-sided test is appropriate, but for adverse events a one-tailed test is the more conservative approach.

So depending on how one handles the analysis, this either is significant or is borderline significant. Presumably the folks who did the study also knew when each patient started the trials and when the cancers were discovered, and therefore could use the much more powerful time-to-event methods to assess the significance of these findings...
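
[Ed. note: those mid-P figures can be reproduced from the hypergeometric distribution underlying the Fisher Exact Test. A sketch, conditioning on the table margins (7 total cases split between two arms of 4,000):]

```python
from math import comb

N_arm, cases = 4000, 7  # 4,000 patients per arm, 7 total lung-cancer cases

def hyper_pmf(k: int) -> float:
    """P(k of the 7 cases fall in the Exubera arm), conditioning on both margins."""
    return comb(N_arm, k) * comb(N_arm, cases - k) / comb(2 * N_arm, cases)

p_one_sided = hyper_pmf(6) + hyper_pmf(7)            # ordinary one-sided P, ~0.062
mid_p_one_sided = 0.5 * hyper_pmf(6) + hyper_pmf(7)  # ~0.035, as quoted
mid_p_two_sided = 2 * mid_p_one_sided                # ~0.07
```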

10. Anonymous BMS Researcher on April 10, 2008 5:53 AM writes...

With a rare but nasty potential adverse event like this, unless somebody comes up with a really convincing mechanistic explanation, the only other way to get a handle on a one-in-one-thousand AE would be to study enough patients for enough time to obtain decent estimates of true frequencies, which means many many people for many many weeks. The question: "are we looking at a real effect here?" is not gonna be quick or easy to resolve, that's for sure.

11. Wilber Deck on April 10, 2008 7:34 AM writes...

AnonBMS: Your question: "are we looking at a real effect here?" is the important one indeed, nuances of Fisher's exact test vs binomial vs Poisson aside. After all, all these statistical models ignore our prior knowledge and put plausible and implausible phenomena on the same footing, and given the small numbers, the conclusion will be highly dependent on the details of the model, one-sided vs two, mid-p or not, etc.

Basically, you have an unexpected result with pretty limited biological plausibility, just one of many, many associations that was possible, like asthma attacks, tuberculosis reactivation, pulmonary hypertension, suicide, etc. etc. So the 6 vs 1 issue is ultimately less important than the number of other hypotheses that could have been elicited, retrospectively, when looking at the data.

In this case, lung cancer is particularly implausible. After all, how often do you see cancers as the result of a short-term exposure? Typically, lung cancer has a 20- to 30-year latency period between exposure (smoking, asbestos, silica, radon, etc.) and diagnosis. So this one, for me, goes in the implausible pile.

Given the nature of Exubera, a convenience route of administration where reasonable alternatives already exist, I think inhaled insulin has probably received a death blow, but although it will probably never be proved, I would bet that this blow is below the belt. It makes you wonder how many truly important drugs have been abandoned for the same faulty reasons.

12. stats novice on April 10, 2008 10:22 AM writes...

Thanks for your insights, Wilber!

I'm finding that Excel can't handle factorials of the size necessary to calculate the p-values for these data using Fisher's exact test. What software do you use, or know of, that's up to the task?

13. Robert R. Fenichel on April 10, 2008 12:06 PM writes...

Fisher's Exact Test is probably optimum here, as Anonymous BMS Researcher says, but the event rates were high enough to use chi-squared without bias. Chi-squared is well-adapted to spreadsheet use. If one wants to be cute, one can even have the spreadsheet look up the computed chi-squared value and provide the p-value range.

In this case, the chi-squared (1 DF) value is 3.57, so 0.1 > P > 0.05, just as ABMSR says.

I often get a quick-and-dirty estimate from a binomial approximation (here, it wasn't far off), but then I sometimes forget that one can do better if one is willing to use real tools.
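
[Ed. note: the 3.57 chi-squared value is easy to reproduce directly from the 2x2 table. A sketch of the Pearson statistic without continuity correction, which is how that figure arises:]

```python
# 2x2 table: rows are treatment arms, columns are (cancer, no cancer)
a, b = 6, 3994   # Exubera arm
c, d = 1, 3999   # control arm
n = a + b + c + d

# Pearson chi-squared for a 2x2 table, no continuity correction
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
# chi2 is ~3.57 on 1 df, so 0.05 < P < 0.10
```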

14. Anonymous BMS Researcher on April 10, 2008 8:55 PM writes...

Stats novice, here is some information about Fisher's Exact Test; Excel is not the way to go here.

http://udel.edu/~mcdonald/statfishers.html

good general explanation

www.openepi.com

An extremely useful resource: an open-source statistical package written in JavaScript so it can be run in a web browser. Under "Counts", the Fisher Exact Test and many others are included in the "Two by Two Table" section.

http://www.bmj.com/collections/statsbk/9.dtl

Online stats textbook chapter from the British Medical Journal about F.E.T.

Hope these are helpful!

15. Dr. Incognito on April 11, 2008 1:13 PM writes...

Derek,

Thank you for speaking on this issue. It's just the type of information that needs to be spread amongst those in the medical field.

In fact, I've placed it on my weekly Honorable Mention list on redscrubs.com

Sincerely,

Dr. Incognito

16. Still Scared of Dinosaurs on April 11, 2008 8:06 PM writes...

I'm wicked bummed I missed out on this discussion earlier. One important aspect that needs to be kept in mind is that Fisher's is very insensitive to assumptions whereas most time-to-event models need to have their assumptions checked. 1 vs 6 isn't enough events to satisfy the tests of assumptions.

A quick rule of thumb is that time-to-event works best when the average time to occurrence "kind of" favors one group and differences in event rates "kind of" favor the same group. Here the entire effect of time is whether the single event was early or late.

What makes more sense, and is more standard for AE data, is to determine the events per unit of follow-up time. This most often matters in cases like this because, while each group may have 4000 patients, the actively treated patients usually have more follow-up time per patient. The easy way to see this in context: if each Exubera patient was followed for six years and each placebo patient for one, then there is no difference between the groups.
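
[Ed. note: the person-time point can be made concrete with a small sketch. The six-year/one-year follow-up figures below are the commenter's hypothetical, not numbers from the actual trials:]

```python
# Hypothetical follow-up: Exubera patients followed ~6x longer than controls
exubera_cases, exubera_py = 6, 4000 * 6.0   # cases, person-years
control_cases, control_py = 1, 4000 * 1.0

rate_exubera = exubera_cases / exubera_py   # events per person-year
rate_control = control_cases / control_py

# With this (made-up) follow-up, the two incidence rates come out identical,
# even though the raw case counts are 6 vs. 1.
```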

17. Jonadab the Unsightly One on April 13, 2008 2:01 PM writes...

Jimbo: lung cancer correlates very very strongly with smoking, but one would like to assume that they controlled for the percentage of smokers (and former smokers) in each population of four thousand. I mean, it is a study involving something that goes into the lungs, so not controlling for that would be too stupid a mistake for a reasonably intelligent layperson to make. So one imagines they did control for that.

Even so, lung cancer is a fairly common thing. (Lifetime risk is over 1%.) I'd want to know what the timeframe was, but in general if six people out of four thousand in a population get lung cancer, I'm not sure it's really necessary to go looking for any special cause. That sounds in line with the normal incidence rate to me, again, depending on timeframe.

Come to think of it, if there were a normal (based on society) percentage of smokers in the study population, seven cases in eight thousand people would be an astonishingly low incidence of lung cancer over any sufficiently-long timeframe to make a meaningful study. But maybe they controlled for the percentage of smokers by excluding smokers. That would make seven people in eight thousand seem about right over a typical not-very-long study timeframe. Approximately.

(Nominally, if the lifetime risk is about 1.3%, give or take, then the risk in eight years, if the risk were spread out evenly over the lifetime (which it's not), would be about .13%, or in a five-year study about .075%. This is not exactly right, among other things because the risk of lung cancer correlates somewhat with age, and the study participants were probably all adults, and my figure for average lifespan might not be exactly right. But for a back-of-envelope calculation it's awfully close to the 7/8000 (i.e., 0.0875%) observed. Though, again, I don't know how long the study was.)

If six people in the population of four thousand had come down with something fairly unusual, that would probably mean something. But lung cancer is just not that rare. If six people out of four thousand in a clinical study for topically applied foot cream had come down with lung cancer, nobody would have thought anything about it. The fact that Exubera is inhaled is the only reason it even got noticed.

It's worth continuing to watch, to see if more of the people in the population have trouble or anything, but I would not jump to a conclusion, based on six cases in a population of four thousand, that the drug causes lung cancer.

Logically, if the incidence in your combined population is normal, then any unexpected higher-than-normal risk in one half the population is only meaningful if the corresponding lower-than-normal risk in the other half is also meaningful (which makes no sense at all; being part of the control group in a drug study would NOT be expected to reduce your risk of lung cancer). Otherwise it just means there's a risk factor you didn't control for properly.

My bet is there's a (known or unknown) risk factor they neglected to control for properly.

I'm neither a chemist nor a biologist, and it's been a few years since my mathematical prob and stat class, so there could be something I'm missing. But common sense tells me six in four thousand is not a worrisomely high incidence of lung cancer unless there's something more to the study that I don't understand.

18. Jonadab the Unsightly One on April 13, 2008 2:32 PM writes...

Oh, and by "smokers" I mean "people who live or work in a building where smoking occurs", and similarly but in the past tense for "former smokers". But as I said, one supposes they controlled for that because, duh, it's a clinical study for something that goes into the lungs.

19. Vyceroy on April 22, 2008 11:10 AM writes...

This is just another devastating blow to me .. Exubera has worked so well for me and I am confused and a bit angry about this whole situation with Pfizer .. I called and spoke with a rep for Pfizer and she had to look the announcement up .. and in my opinion downplayed the whole thing .. I have a call in to my doctor and I am waiting to hear back ..

I am still on exubera in the transition program

20. Eddie B on September 2, 2009 9:28 PM writes...

I was one of those "control" subjects who used Exubera for over 6 years before it was pulled. They required a lung X-ray at least once a year. I know that increases the potential for cancer in itself. I really am starting to get nervous now that I am reading about what I have been exposed to. Granted that I wasn't forced to inhale the insulin. Psalms 103:15 "As for man, his days are like grass, he flourishes like a flower of the field;"
