Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases.
To contact Derek email him directly: derekb.lowe@gmail.com
There's a problem in the drug industry that people have recognized for some years, but we're not that much closer to dealing with it than we were then. We keep coming up with these technologies and techniques which seem as if they might be able to help us with some of our nastiest problems - I'm talking about genomics in all its guises, and metabolic profiling, and naturally the various high-throughput screening platforms, and others. But whether these are helping or not (and opinions sure do vary), one thing that they all have in common is that they generate enormous heaps of data.
We're not the only field to wish that the speed of collating and understanding all these results would start to catch up with the speed with which they're being generated. But some days I feel as if the two curves don't even have the same exponent in their equations. High-throughput screening data are fairly manageable, as these things go, and that's a good thing. When you can rip through a million compounds screening a new target, generating multiple-point binding curves along the way, you have a good-sized brick of numbers. But you're looking for just the ones with tight binding and reasonable curves, which is a relatively simple operation, and by the time you're done there may only be a couple of dozen compounds worth looking at. (More often than you'd think, there may be none at all.)
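To make "a relatively simple operation" concrete, here is a minimal sketch of that kind of hit-picking filter. Everything in it is an assumption for illustration: the `CurveFit` record, the field names, and the cutoff values are hypothetical and would vary by assay and by company.

```python
from dataclasses import dataclass


@dataclass
class CurveFit:
    """One compound's fitted binding curve (hypothetical record layout)."""
    compound_id: str
    ic50_nm: float     # fitted potency in nanomolar; lower means tighter binding
    hill_slope: float  # steepness of the fitted curve
    r_squared: float   # goodness of fit of the curve to the data points


def pick_hits(fits, max_ic50_nm=100.0, min_r2=0.9, slope_range=(0.5, 2.0)):
    """Keep compounds with tight binding and a reasonable-looking curve.

    The thresholds are illustrative assumptions, not recommended values.
    """
    low, high = slope_range
    hits = [
        f for f in fits
        if f.ic50_nm <= max_ic50_nm
        and f.r_squared >= min_r2
        and low <= f.hill_slope <= high
    ]
    # Most potent compounds first.
    return sorted(hits, key=lambda f: f.ic50_nm)


screen = [
    CurveFit("A", 12.0, 1.0, 0.98),    # potent, clean curve: keep
    CurveFit("B", 5000.0, 1.1, 0.99),  # clean curve but weak binding: drop
    CurveFit("C", 40.0, 1.0, 0.55),    # potent but poorly fitted: drop
    CurveFit("D", 3.0, 0.9, 0.95),     # potent, clean curve: keep
    CurveFit("E", 8.0, 4.5, 0.97),     # implausibly steep curve: drop
]
print([f.compound_id for f in pick_hits(screen)])  # ['D', 'A']
```

The point is only that the question being asked of the data is known in advance, so a million rows collapse to a short ranked list with a few comparisons per row.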
But genomics/metabolomics/buzzwordomics platforms are tougher. In these cases, we don't actually know what we're looking for much of the time. I mean, we don't understand what the huge majority of the genes on a gene-chip assay really do, not in any useful detail, anyway. So the results of a given assay aren't the horserace leader board of a binding assay; they're more like a huge, complicated fingerprint or an abstract painting. We can say that yes, this compound seems to be different from that one, which is certainly different from this one over here but maybe similar to these on the left - but sometimes that's about all we can say.
Of course, the story isn't supposed to stop there, and everyone's hoping it won't. The idea is that we'll learn to interpret these things as we see more and more compounds and their ultimate effects. Correlations, trends, and useful conclusions are out there (surely?) and if we persevere we'll uncover them. The problem is, finding these things looks like requiring the generation of still more endless terabytes of data. It takes nerve to go on, but we seem to have no other choice.
1. The Pharmacoepidemiologist on July 10, 2007 8:10 PM writes...
Can anyone tell me of a blockbuster discovered by genomics? or any of the other high tech omics? Aside from the industry's economics not being able to handle such drugs, the fact is there aren't any which have been found that way. I remember in the late 1990s, BMS invested a small fortune in genomics. What blockbuster came out of the effort? Nada. Nothing. (How does one have a small fortune from genomics? Start with a large fortune.)
The technology is really cool--and not worth as much as its backers would suggest. When it finally does produce a few blockbusters, I'll happily change my tune. Until then...
2. Polymer Bound on July 10, 2007 11:19 PM writes...
Weren't most of the hot new cancer targets discovered via genomics? Seems like most of these things are useful for generating new targets to look at. I don't know how useful -omics are once a med chem group gets involved, but they might be good at giving us projects.
3. Anonymous IT worm on July 10, 2007 11:39 PM writes...
Too bad pharma computing is more territorial than the old borders between Poland and Russia.
IT people in pharma spend more time in political gamesmanship than in getting ready to deal with those terabytes.
4. Anne on July 11, 2007 1:01 AM writes...
There's a parallel situation coming up in astrophysics: there are several big new telescopes (the Large Synoptic Survey Telescope, the Atacama Large Millimeter Array, and the Square Kilometer Array, for example) being constructed that will in principle allow fantastically good observations of the sky. They're so good that the limiting factor is going to be computing power and storage space: they are each going to produce a Niagara of data far in excess of what Moore's Law predicts we're going to be able to cope with by the time they're built. Shoveling through it all for the observations one actually wants is going to require a whole new collection of skills. In fact, shoveling through the mountains of data that exist now is a potentially productive area of research: ROSAT flew in the 90s, and the vast majority of the bright sources it detected have not yet been identified. All the X-ray satellites' data becomes public after a year, and much of this archival data has not been searched for pulsations (for example)...
5. Jose on July 11, 2007 1:10 AM writes...
Anne, any idea where you might see solutions for these problems arising? IT groups, Stat folks, Google expats, chemo/bioinformatics types? Library Science database managers? Just wondering if there is a niche there...
6. Derek Lowe on July 11, 2007 6:07 AM writes...
Astronomy's an excellent example of a field with a similar data problem - as an amateur observer, I wish I'd thought of that analogy myself. Of course, as an amateur, I don't have to stick my head into the giant pile of data as much, although the availability of things like the Sloan Digital Sky Survey, the Hipparcos catalog, etc. has already affected the amateur ranks a great deal.
7. Molecular Geek on July 11, 2007 7:06 AM writes...
Well, the solution almost certainly won't come out of corporate IT. With so much riding on 21 CFR Part 11 compliance and the resulting tendency towards tightly regulated environments, they usually aren't agile enough to keep up with the research side, let alone lead the charge. Having a separate research support/informatics group helps to some degree, but as long as they have to answer to corporate IT (and I don't know many CIOs whose ego can handle the threat of independent computing in their territory), the siren calls of uniformity and strict process will always trump the demand for better tools.
MG
8. NJBiologist on July 11, 2007 7:42 AM writes...
"But genomics/metabolomics/buzzwordomics platforms are tougher. In these cases, we don't actually know what we're looking for much of the time."
That sums it up better than I ever could have. Many of the genomics projects I've seen or heard about seem to have been conceived to stay ahead of the competition--rather than with any particular hypothesis. There has been a lot of learn-as-you-go, and a lot of the data really doesn't justify the effort to sift through it. The projects that seem to have a future (to me, anyway) are the ones where there was some kind of hypothesis and a plan.
And that leads to the second problem: what do you do with a hit? The in vitro folks always seem to have a robust binding assay that they can use to follow up their HTS hits, but the genomics people are often swamped just doing the RT-PCR to make sure the chips were behaving themselves. And then there are the resources required to work out the details of the confirmed hits, which (as Derek points out) are often pretty unknown quantities....
9. Keith Robison on July 11, 2007 9:28 AM writes...
Genomics was certainly overhyped, but I think that the two commenters are holding it to an unfair standard -- what research programs resulting in blockbusters were originated 10 years ago? Pharmaceutical development just doesn't move that fast.
I agree it is difficult to go from genomics 'hit' to small molecule pharmaceutical. The first fruits of genomics seem to be more along the lines of biologicals (e.g. HGS's Lupus treatment) or pharmacogenomic info driving future development paths of existing drugs (e.g. Herceptin).
10. TW Andrews on July 11, 2007 10:16 AM writes...
When anyone says that industry doesn't do basic research, they should have pointed out to them all the billions of dollars dumped into techniques and technologies which have had questionable impact on the bottom line.
There may eventually be a payoff, but it's hard to say that companies don't invest in new stuff.
They may not make the data public, but it definitely gets generated.
11. daen on July 11, 2007 12:47 PM writes...
My experience as a pharma and biotech IT droid working for two very different companies over the last six years is that there is a lack of understanding at (typically software-development-illiterate) senior management level that developing IT systems, especially those intended for analysis of oddly-shaped scientific data, takes time, resources and money. Science doesn't stand still, especially drug discovery, and at my last company, a biotech startup, over the last four years I implemented upwards of 50 user-requested new software tools and associated tweaks and updates, with a much longer list still outstanding, mostly on my own or, sometimes, with the assistance of a very capable student or part-time consultant. I think the intangible nature of software - that it kind of just appears, and that most of the end-users (chemists and molecular biologists) are not involved in the actual programming or database development and maintenance effort - misleads management into believing it's a straightforward thing to do.
12. Anonymous BMS Researcher on July 12, 2007 7:21 AM writes...
We often joke about acquiring some storage vendor just so we'd have a captive source of disk space! We keep getting disks, but with stuff like high-content screening we still keep running out of space. Even more of a problem is backing up all those drives -- tape speeds have fallen way behind disk capacities in recent years so our sysadmins barely manage to run a full backup over a weekend nowadays, but when I was a sysadmin in the 1990s I could run a full backup of my servers overnight while I slept.
Also, something many readers may not know: high-end storage that meets our needs for performance and reliability costs a LOT more per gigabyte than do mass-market disks sold to consumers.
I wonder how much the spooks at NSA spend on storage!?
13. Anne on July 12, 2007 9:56 AM writes...
You could take the Google approach - use mountains of cheap unreliable disks. Of course that requires even more investment in software to build an adequately reliable system... part of the trick, I think, is to acknowledge that when you get data at 86 GB/hr (say), you have to treat it as somewhat disposable. If a terabyte disk containing your only copy of some data dies, it sucks - that's two days of observing lost. But it's not a monumental disaster, and it's not necessarily worth (say) quadrupling your expenditures on disks. You really need to ask yourself how much your data is worth per terabyte...
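For a rough sense of scale, here is a back-of-the-envelope sketch using the 86 GB/hr figure above. The decimal terabyte (1000 GB) and the six hours of recording per night are assumptions made only for the arithmetic; neither number comes from the comment itself.

```python
RATE_GB_PER_HOUR = 86.0  # data rate quoted in the comment
DISK_GB = 1000.0         # assumption: one decimal terabyte


def hours_to_fill(disk_gb=DISK_GB, rate_gb_per_hour=RATE_GB_PER_HOUR):
    """Hours of continuous recording needed to fill one disk."""
    return disk_gb / rate_gb_per_hour


def nights_to_fill(hours_per_night, disk_gb=DISK_GB,
                   rate_gb_per_hour=RATE_GB_PER_HOUR):
    """Observing nights needed to fill one disk at a given nightly duty cycle."""
    return hours_to_fill(disk_gb, rate_gb_per_hour) / hours_per_night


print(round(hours_to_fill(), 1))      # 11.6
print(round(nights_to_fill(6.0), 1))  # 1.9 (six hours per night is a guess)
```

Under those assumptions a single disk holds a bit under twelve hours of recording, which is roughly two nights if only part of each night is spent taking data.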
14. NJBiologist on July 13, 2007 6:41 AM writes...
Anne--that's the magic of RAID: the data are spread across a series of disks, with some fraction used for redundancy of data. That way, with four terabytes of data smeared across five drives, you only need to replace drives and repopulate the array faster than the rate at which drives fail (which rates Google has released numbers on; not surprisingly, they don't match the manufacturers' own MTBF numbers).
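A toy illustration of the redundancy idea, assuming single-parity striping in the style of RAID 5 (the comment does not say which RAID level is meant): four data blocks plus one XOR parity block, so any one lost block can be rebuilt from the other four. The function names are made up for this sketch, and real arrays work on disk stripes rather than short byte strings.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Four "drives" of data and a fifth holding parity.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# Pretend the third drive died, then rebuild it from what is left.
survivors = data[:2] + data[3:]
rebuilt = rebuild_missing(survivors, parity)
print(rebuilt == data[2])  # True
```

This works because XOR-ing a value in twice cancels it out, so combining the parity with the three surviving blocks leaves only the missing one. It also shows the limit of the scheme: lose a second drive before the rebuild finishes and the data are gone, which is why the replacement rate has to outrun the failure rate.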
15. emjeff on July 13, 2007 3:13 PM writes...
There is a very interesting article in Nature Reviews Drug Discovery 2003;2:151-154 about the promise (or lack thereof) of the current reductionistic method of drug discovery. Very entertaining...
16. Fries with that? on July 14, 2007 11:59 AM writes...
It's not the data itself that is the problem, it's the fact that in order to process and analyze that data, biologists, chemists, informaticists, statisticians and upper management all have to share information and play nice. There's the rub!
17. BioDataGuy on July 16, 2007 12:11 PM writes...
Maybe part of the problem is how the interpreting is done. Say an experimentalist sends in some RNA samples and requests microarray analysis. A few days later, a spreadsheet comes back - with neat columns of hundreds of gene symbols (or probe IDs) and associated p-values. It is hard to imagine even the most dedicated reader clicking all the way through the pile (think detailed reading, with frequent detours, of a PubMed query returning hundreds of abstracts) - yet consider all the combinations and patterns hiding in the data. Access to gene expression databases and data mining tools and, especially, interaction with the biostatistics/analysis team are crucial. There is clearly much more to biomedical data analysis than biostatistics, and interpretation is the key. It is unfortunate that too often the problem gets reduced to the proverbial lists and cut-offs.
18. Sanjiv on July 16, 2007 1:02 PM writes...
I think, as with any other research project, one of the problems associated with "omics" studies is that if the experiment is not designed to make full use of these high-throughput techniques then the interpretation of the data becomes even harder. As they say - "garbage in - garbage out". The majority of the "omics" studies carried out do not have all the necessary groups, due to cost constraints, needed to carry out proper analysis. As a result, it becomes difficult to "filter" out noise and obtain "clean" results.
19. ChrisB on July 31, 2007 1:04 PM writes...
Derek states that data generated by genomics / metabolomics / etc platforms is harder to analyze because scientists aren't sure what type of information is useful to look for. Since 2000 we have seen a shift in retail, business analytics, and (more recently) financial analysis applications from give-me-what-I-ask-for towards real-time information discovery. To see an example of what I mean, click through Home Depot's navigation options on the left side of their homepage. Information discovery platforms are rapidly becoming more scalable. Perhaps a platform such as Endeca's will be able to 'suggest' interesting facets of information to scientists working on these problems as the platform scales to handle ever more massive quantities of data.
20. Ian Thorpe on August 6, 2007 10:48 AM writes...
Cracking the genome promised much but has so far not delivered, and the problem with using IT is that computers can identify and quantify far more patterns and potential connections than anyone can realistically look at. But when you get down to it, diseases are just so damned unscientific.