About this Author
Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany as a post-doc on a Humboldt Fellowship. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek, email him directly. Twitter: Dereklowe


In the Pipeline


July 10, 2007

Travels In Numerica Deserta


Posted by Derek

There's a problem in the drug industry that people have recognized for some years, but we're not that much closer to dealing with it than we were then. We keep coming up with these technologies and techniques which seem as if they might be able to help us with some of our nastiest problems - I'm talking about genomics in all its guises, and metabolic profiling, and naturally the various high-throughput screening platforms, and others. But whether these are helping or not (and opinions sure do vary), one thing that they all have in common is that they generate enormous heaps of data.

We're not the only field to wish that the speed of collating and understanding all these results would start to catch up with the speed with which they're being generated. But some days I feel as if the two curves don't even have the same exponent in their equations. High-throughput screening data are fairly manageable, as these things go, and that's a good thing. When you can rip through a million compounds screening a new target, generating multiple-point binding curves along the way, you have a good-sized brick of numbers. But you're looking for just the ones with tight binding and reasonable curves, which is a relatively simple operation, and by the time you're done there may only be a couple of dozen compounds worth looking at. (More often than you'd think, there may be none at all.)
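
That triage step, hedged as a toy sketch rather than anyone's actual screening pipeline, might look like this. The field names and cutoffs (`ic50_nm`, a 100 nM potency cutoff, an r² floor for curve quality) are invented for illustration:

```python
def triage_hits(screen_results, ic50_cutoff_nm=100.0, min_r2=0.9):
    """Keep only tight binders with reasonable-looking curve fits."""
    return [
        c for c in screen_results
        if c["ic50_nm"] <= ic50_cutoff_nm and c["curve_r2"] >= min_r2
    ]

# A million-row version of this list is the "brick of numbers":
results = [
    {"id": "CPD-1", "ic50_nm": 12.0, "curve_r2": 0.98},    # tight, clean fit
    {"id": "CPD-2", "ic50_nm": 5000.0, "curve_r2": 0.95},  # weak binder
    {"id": "CPD-3", "ic50_nm": 40.0, "curve_r2": 0.55},    # noisy curve
]
print([c["id"] for c in triage_hits(results)])  # -> ['CPD-1']
```

The point is that the filter itself is trivial; the hard part is generating the numbers, not sifting them.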

But genomics/metabolomics/buzzwordomics platforms are tougher. In these cases, we don't actually know what we're looking for much of the time. I mean, we don't understand what the huge majority of the genes on a gene-chip assay really do, not in any useful detail, anyway. So the results of a given assay aren't the horserace leader board of a binding assay; they're more like a huge, complicated fingerprint or an abstract painting. We can say that yes, this compound seems to be different from that one, which is certainly different from this one over here but maybe similar to these on the left - but sometimes that's about all we can say.
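
That kind of fingerprint comparison can be made concrete without knowing what any individual gene does: score two expression profiles for similarity and say no more than "these look alike." A minimal sketch with invented numbers (real profiles run to tens of thousands of genes, and real pipelines use far more than a single correlation):

```python
import math

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

cmpd_a = [2.1, -0.3, 1.8, 0.0, -1.2]   # log-fold changes, made up
cmpd_b = [2.0, -0.1, 1.9, 0.1, -1.0]   # looks like compound A
cmpd_c = [-1.5, 2.2, -0.8, 1.0, 0.3]   # looks nothing like A
print(pearson(cmpd_a, cmpd_b) > 0.9)   # True: similar fingerprints
print(pearson(cmpd_a, cmpd_c) < 0.0)   # True: dissimilar
```

Notice that the output is exactly the weak statement in the paragraph above: similar or not similar, with no mechanism attached.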

Of course, the story isn't supposed to stop there, and everyone's hoping it won't. The idea is that we'll learn to interpret these things as we see more and more compounds and their ultimate effects. Correlations, trends, and useful conclusions are out there (surely?), and if we persevere we'll uncover them. The problem is, finding them looks likely to require generating still more endless terabytes of data. It takes nerve to go on, but we seem to have no other choice.

Comments (28) + TrackBacks (0) | Category: Drug Assays


1. The Pharmacoepidemiologist on July 10, 2007 8:10 PM writes...

Can anyone tell me of a blockbuster discovered by genomics? Or any of the other high-tech omics? Aside from the industry's economics not being able to handle such drugs, the fact is there aren't any which have been found that way. I remember in the late 1990s, BMS invested a small fortune in genomics. What blockbuster came out of the effort? Nada. Nothing. (How does one make a small fortune in genomics? Start with a large fortune.)

The technology is really cool--and not worth as much as its backers would suggest. When it finally does produce a few blockbusters, I'll happily change my tune. Until then...


2. Polymer Bound on July 10, 2007 11:19 PM writes...

Weren't most of the hot new cancer targets discovered via genomics? Seems like most of these things are useful for generating new targets to look at. I don't know how useful -omics are once a med chem group gets involved, but they might be good at giving us projects.


3. Anonymous IT worm on July 10, 2007 11:39 PM writes...

Too bad pharma computing is more territorial than the old borders between Poland and Russia.

IT people in pharma spend more time in political gamesmanship than in getting ready to deal with those terabytes.


4. Anne on July 11, 2007 1:01 AM writes...

There's a parallel situation coming up in astrophysics: there are several big new telescopes (the Large Synoptic Survey Telescope, the Atacama Large Millimeter Array, and the Square Kilometer Array, for example) being constructed that will in principle allow fantastically good observations of the sky. They're so good that the limiting factor is going to be computing power and storage space: they are each going to produce a Niagara of data far in excess of what Moore's Law predicts we're going to be able to cope with by the time they're built. Shoveling through it all for the observations one actually wants is going to require a whole new collection of skills. In fact, shoveling through the mountains of data that exist now is a potentially productive area of research: ROSAT flew in the 90s, and the vast majority of the bright sources it detected have not yet been identified. All the X-ray satellites' data becomes public after a year, and much of this archival data has not been searched for pulsations (for example)...


5. Jose on July 11, 2007 1:10 AM writes...

Anne, any idea where you might see solutions for these problems arising? IT groups, stats folks, Google expats, chemo/bioinformatics types? Library science database managers? Just wondering if there is a niche there...


6. Derek Lowe on July 11, 2007 6:07 AM writes...

Astronomy's an excellent example of a field with a similar data problem - as an amateur observer, I wish I'd thought of that analogy myself. Of course, as an amateur, I don't have to stick my head into the giant pile of data as much, although the availability of things like the Sloan Digital Sky Survey, the Hipparcos catalog, etc. has already affected the amateur ranks a great deal.


7. Molecular Geek on July 11, 2007 7:06 AM writes...

Well, the solution almost certainly won't come out of corporate IT. With so much riding on 21 CFR Part 11 compliance and the resulting tendency towards tightly regulated environments, they usually aren't agile enough to keep up with the research side, let alone lead the charge. Having a separate research support/informatics group helps to some degree, but as long as they have to answer to corporate IT (and I don't know many CIOs whose ego can handle the threat of independent computing in their territory), the siren calls of uniformity and strict process will always trump the demand for better tools.



8. NJBiologist on July 11, 2007 7:42 AM writes...

"But genomics/metabolomics/buzzwordomics platforms are tougher. In these cases, we don't actually know what we're looking for much of the time."

That sums it up better than I ever could have. Many of the genomics projects I've seen or heard about seem to have been conceived to stay ahead of the competition--rather than with any particular hypothesis. There has been a lot of learn-as-you-go, and a lot of the data really doesn't justify the effort to sift through it. The projects that seem to have a future (to me, anyway) are the ones where there was some kind of hypothesis and a plan.

And that leads to the second problem: what do you do with a hit? The in vitro folks always seem to have a robust binding assay that they can use to follow up their HTS hits, but the genomics people are often swamped just doing the RT-PCR to make sure the chips were behaving themselves. And then there are the resources required to work out the details of the confirmed hits, which (as Derek points out) are often pretty unknown quantities....


9. Keith Robison on July 11, 2007 9:28 AM writes...

Genomics was certainly overhyped, but I think that the two commenters are holding it to an unfair standard -- what research programs resulting in blockbusters were originated 10 years ago? Pharmaceutical development just doesn't move that fast.

I agree it is difficult to go from genomics 'hit' to small molecule pharmaceutical. The first fruits of genomics seem to be more along the lines of biologicals (e.g. HGS's Lupus treatment) or pharmacogenomic info driving future development paths of existing drugs (e.g. Herceptin).


10. TW Andrews on July 11, 2007 10:16 AM writes...

When anyone says that industry doesn't do basic research, someone should point out to them all the billions of dollars dumped into techniques and technologies which have had questionable impact on the bottom line.

There may eventually be a payoff, but it's hard to say that companies don't invest in new stuff.

They may not make the data public, but it definitely gets generated.


11. daen on July 11, 2007 12:47 PM writes...

My experience as a pharma and biotech IT droid working for two very different companies over the last six years is that there is a lack of understanding at (typically software-development-illiterate) senior management level that developing IT systems, especially those intended for analysis of oddly-shaped scientific data, takes time, resources and money. Science doesn't stand still, especially drug discovery, and at my last company, a biotech startup, over the last four years I implemented upwards of 50 user-requested new software tools and associated tweaks and updates, with an outstanding list of much longer length, mostly on my own or, sometimes, with the assistance of a very capable student or part-time consultant. I think the intangible nature of software - that it kind of just appears, and that most of the end-users (chemists and molecular biologists) are not involved in the actual programming or database development and maintenance effort - misleads management into believing it's a straightforward thing to do.


12. Anonymous BMS Researcher on July 12, 2007 7:21 AM writes...

We often joke about acquiring some storage vendor just so we'd have a captive source of disk space! We keep getting disks, but with stuff like high-content screening we still keep running out of space. Even more of a problem is backing up all those drives -- tape speeds have fallen way behind disk capacities in recent years so our sysadmins barely manage to run a full backup over a weekend nowadays, but when I was a sysadmin in the 1990s I could run a full backup of my servers overnight while I slept.

Also, something many readers may not know: high-end storage that meets our needs for performance and reliability costs a LOT more per gigabyte than do mass-market disks sold to consumers.

I wonder how much the spooks at NSA spend on storage!?


13. Anne on July 12, 2007 9:56 AM writes...

You could take the Google approach - use mountains of cheap unreliable disks. Of course that requires even more investment in software to build an adequately reliable system... part of the trick, I think, is to acknowledge that when you get data at 86 GB/hr (say), you have to treat it as somewhat disposable. If a terabyte disk containing your only copy of some data dies, it sucks - that's two days of observing lost. But it's not a monumental disaster, and it's not necessarily worth (say) quadrupling your expenditures on disks. You really need to ask yourself how much your data is worth per terabyte...


14. NJBiologist on July 13, 2007 6:41 AM writes...

Anne--that's the magic of RAID: the data are spread across a series of disks, with some fraction used for redundancy of data. That way, with four terabytes of data smeared across five drives, you only need to replace drives and repopulate the array faster than the rate at which drives fail (which rates Google has released numbers on; not surprisingly, they don't match the manufacturers' own MTBF numbers).
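
The single-parity recovery described here is just XOR arithmetic. A toy byte-level sketch (a real RAID controller works on disk blocks, not Python bytes, but the math is the same):

```python
def make_parity(drives):
    """XOR the corresponding bytes of each data drive into a parity drive."""
    parity = bytearray(len(drives[0]))
    for drive in drives:
        for i, byte in enumerate(drive):
            parity[i] ^= byte
    return bytes(parity)

def rebuild(surviving_drives, parity):
    """Recover a single lost drive by XORing parity with the survivors."""
    lost = bytearray(parity)
    for drive in surviving_drives:
        for i, byte in enumerate(drive):
            lost[i] ^= byte
    return bytes(lost)

d1, d2, d3 = b"GENE", b"CHIP", b"DATA"   # three equal-sized "drives"
p = make_parity([d1, d2, d3])
print(rebuild([d1, d3], p))  # -> b'CHIP': drive 2 recovered from the rest
```

Lose two drives at once, though, and single parity can't save you - hence the race to replace and repopulate faster than drives fail.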


15. emjeff on July 13, 2007 3:13 PM writes...

There is a very interesting article in Nature Reviews Drug Discovery 2003;2:151-154 about the promise (or lack thereof) of the current reductionistic method of drug discovery. Very entertaining...


16. Fries with that? on July 14, 2007 11:59 AM writes...

It's not the data itself that is the problem, it's the fact that in order to process and analyze that data, biologists, chemists, informaticists, statisticians and upper management all have to share information and play nice. There's the rub!


17. BioDataGuy on July 16, 2007 12:11 PM writes...

Maybe part of the problem is how the interpreting is done. Say an experimentalist sends in some RNA samples and requests microarray analysis. A few days later, a spreadsheet comes back - with neat columns of hundreds of gene symbols (or probe IDs) and associated p-values. It is hard to imagine even the most dedicated reader "clicking" all the way through the pile (think detailed reading, with frequent detours, of a PubMed query returning hundreds of abstracts) - yet consider all the combinations and patterns hiding in the data. Access to gene expression databases and data mining tools, and, especially, the interaction with the biostatistics/analysis team is crucial. There is clearly much more to biomedical data analysis than biostatistics, and interpretation is the key. It is unfortunate that too often the problem gets reduced to the proverbial lists and cut-offs.
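
One standard safeguard for that spreadsheet of p-values is a multiple-testing correction before anyone starts clicking through gene symbols. Below is a sketch of the Benjamini-Hochberg false-discovery-rate procedure with invented p-values; this is a generic statistical method, not a claim about any particular analysis team's pipeline:

```python
def benjamini_hochberg(pvals, fdr=0.05):
    """Return the indices of hypotheses rejected at the given FDR."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    largest_rank = 0
    for rank, i in enumerate(order, start=1):
        # BH rejects everything up to the largest rank whose p-value
        # falls under the stepped threshold rank/m * FDR.
        if pvals[i] <= rank / m * fdr:
            largest_rank = rank
    return sorted(order[:largest_rank])

# Six "genes" instead of tens of thousands:
pvals = [0.0001, 0.03, 0.8, 0.004, 0.2, 0.0005]
print(benjamini_hochberg(pvals))  # -> [0, 1, 3, 5]
```

A correction like this shortens the list, but it doesn't do the interpreting - that part still belongs to the humans.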


18. Sanjiv on July 16, 2007 1:02 PM writes...

I think, as with any other research project, one of the problems associated with "omics" studies is that if the experiment is not designed to make full use of these high-throughput techniques, then the interpretation of the data becomes even harder. As they say, "garbage in, garbage out." The majority of "omics" studies carried out do not have all the groups needed for proper analysis, due to cost constraints. As a result, it becomes difficult to "filter" out noise and obtain "clean" results.


19. ChrisB on July 31, 2007 1:04 PM writes...

Derek states that data generated by genomics / metabolomics / etc platforms is harder to analyze because scientists aren't sure what type of information is useful to look for. Since 2000 we have seen a shift in retail, business analytics, and (more recently) financial analysis applications from give-me-what-I-ask-for towards real-time information discovery. To see an example of what I mean, click through Home Depot's navigation options on the left side of their homepage. Information discovery platforms are rapidly growing their scalability. Perhaps a platform such as Endeca's will be able to 'suggest' interesting facets of information to scientists working on these problems as the platform scales to handle ever more massive quantities of data.


20. Ian Thorpe on August 6, 2007 10:48 AM writes...

Cracking the genome promised much but has so far not delivered, and the problem with using IT is that computers can identify and quantify far more patterns and potential connections than anyone can realistically look at. But when you get down to it, diseases are just so damned unscientific.


21. Kendall Sue on July 14, 2008 10:12 AM writes...

If only such a thing could crack the diseases we fight so hard. I don't know if something like cancer can be cracked by anything other than the sheer luck that most cures have come from. It would be nice though.

Kendall Sue


22. Todd on July 19, 2008 10:00 PM writes...

It's nice to know that science is evolving and using the newest technologies as much as possible to find the drugs to cure the worst diseases. I just hope they find the cures before the resources for the medicines (such as the rain forests) are completely destroyed!


23. Abhishek Tiwari on March 10, 2009 5:47 AM writes...

"Any strategy that gives technology an independent role as problem solver is doomed to fail," and that is what is happening. Omics is just comics: read it, feel good, but don't expect anything.


24. Mutatis Mutandis on August 15, 2009 5:46 AM writes...

The current generation of biologists is not really equipped to handle all that data: I've seen so many errors against basic statistics, not to mention high-school mathematics... Before -omics can be really useful, we must convince pharma to integrate people with a better grip on data deeply into their biological teams: mathematicians, physicists, programmers specializing in computational modelling. And the way biologists are trained needs to be seriously overhauled to give them a solid foundation in these apparently arcane subjects.

For that we may have to wait until the young people who are working in -omics now, have trained a new generation of researchers, who have not been born yet...

As for IT, the problem is indeed the choking grip of corporate IT departments who are unable and unwilling to support research groups with increasingly heavy IT requirements. However, to appreciate the absurdity of the situation you must also factor in that everybody in the industry knows this: it has become routine for researchers at conferences, and even for vice-presidents in staff meetings, to denounce their IT department as a hopeless misfit. But apart from a few fortunate exceptions, nothing is done or can be done about it.

We may have to wait until some of the future researchers who will be trained by our current -omics technologists, have become CEOs.


25. simpl on June 1, 2011 11:21 AM writes...

re #2 cancer cures
Yes, the genome has identified targets: but arguably the advance of Gleevec had more to do with showing that a kinase blocker didn't do as much metabolic damage as had been expected - and since the benefit was clear, it opened the field for others to try the same trick on other intracellular enzymes.
Re #3, 24 IT dinosaurs
I've spent decades on IT projects, and feel their importance is often overestimated. Pharma lags other industries of like size because in R&D, the core of pharma, IT value is about turning data into useful information. This is still a human domain, merely supported by high-powered systems. If there is a critical area in pharma, it is in supply chain integration, where globalisation and outsourcing are forcing us to catch up with the engineering branches.
A tip for you researchers; our company has always done well by separating R&D computing teams, and budgets, from those of the commercial area.


26. Bobehr on May 19, 2013 1:27 PM writes...

One help is sophisticated multivariate data mining. Just finished a project analyzing ten years of lightning data (every stroke) over North Dakota. The data were full of artifacts, and it was a helluva job. Most present-day people labelled "data miners" use very old technology loaded with assumptions.


27. ama on June 23, 2013 10:45 AM writes...

Within genomics there is an assumption that a data miner will find a large signal-to-noise ratio once they have a partial sequence or something else to match. However, the database is really all signal (remember when they used the term "junk DNA"?). I think the data integrity is compromised by what is really a small sample size, and then skews/errors with replication, reading and storage. There is also a lot entered into the system that is incomplete or unverified because of people racing to get "published." How do we fix that?


28. Heteromeles on January 29, 2014 1:29 PM writes...

Mutatis Mutandis hit the nail on the head. I was lucky enough to have a good multivariate statistics professor who hammered us about the importance of false positive and false negatives in big data sets. My brief foray into genomics involved debugging a chip too, which reinforced my feelings in the area.

Despite what Ama said, there is a lot of noise in things like gene chips, in the sense that most of your "data" is not stuff you're looking for, and you don't have a good algorithm for eliminating the cruft.

The -omics boys (and girls) basically have the same problem that the NSA does: they're trying to find real patterns in a system that throws up an effectively infinite array of attractive false positives. The NSA has failed to thwart any major attack, not because they can't connect the dots, but because most of the dots they look at have suspicious connections, almost all of which are spurious false positives. Jon Stewart on the Daily Show used the example of using your phone to order a pizza from the same parlor that a suspected terrorist ordered a pizza from. Are you a terrorist? No, but the NSA thinks you are, at least until they send an FBI agent to check out the pizza shop.
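
That intuition is the base-rate fallacy, and three lines of Bayes' theorem make it quantitative. The numbers below are invented purely for illustration:

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(truly positive | flagged), by Bayes' theorem."""
    p_flagged = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_flagged

# One real hit per million candidates, screened by a 99%-sensitive
# test with a mere 1% false-positive rate:
p = posterior(prior=1e-6, sensitivity=0.99, false_positive_rate=0.01)
print(p < 1e-3)  # True: almost every flagged candidate is spurious
```

The same arithmetic applies to a gene chip: when true positives are vanishingly rare in the population being screened, even a very accurate assay returns lists dominated by noise.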

To an outsider like me, the current NSA looks like a boondoggle designed to sell massive amounts of expensive equipment to sucke--excuse me, bureaucrats--who think it will solve their problems. I'm not sure how many of the -omics equipment suppliers use similar business models, but we do have to be careful.

The other alternative is to come up with the equivalent of a dead-salmon-in-the-fMRI paper for each -omics product out there, just to alert practitioners to the major problems in their systems.



