About this Author
Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. Twitter: Dereklowe


In the Pipeline


April 14, 2013

How To Deal With the Ridiculously Huge Universe of Compounds


Posted by Derek

Here's another look at the vast universe of things that no chemist has ever made. Estimates of the number of compounds with molecular weights under 500 run as high as ten to the sixtieth, which is an incomprehensibly huge number. We're not going to be able to put any sort of dent in that figure even if we convert the whole mass of the solar system into compound sample vials, so the problem remains: what's out there in that territory, and how do we best approach it?
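To put a rough number on that claim, here is a back-of-the-envelope sketch. The inputs are assumptions for illustration only: a solar system mass of about 2 x 10^30 kg (nearly all of it the Sun), one milligram per compound with the vial itself ignored, and the upper-end estimate of 10^60 compounds.

```python
# Back-of-the-envelope check; every input here is a rough assumption.
SOLAR_SYSTEM_MASS_KG = 2.0e30   # dominated by the Sun (~1.99e30 kg)
SAMPLE_MASS_KG = 1.0e-6         # assumed: 1 mg per compound, vial not counted
COMPOUND_COUNT = 1e60           # upper-end estimate quoted in the post

# How many 1 mg samples could the whole solar system supply?
samples_possible = SOLAR_SYSTEM_MASS_KG / SAMPLE_MASS_KG

# What fraction of the estimated compound universe would that cover?
fraction_covered = samples_possible / COMPOUND_COUNT

print(f"samples possible: {samples_possible:.1e}")
print(f"fraction of chemical space covered: {fraction_covered:.1e}")
```

Under these assumptions the covered fraction is dozens of orders of magnitude below one, and changing the sample size by a factor of a thousand either way does not alter the conclusion.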

Well, numbers of that magnitude are going to need some serious computational paring-down before we can take a crack at them, and that's what this latest paper tries to do. I'll refer interested readers to it (and to its supplementary information) for the details, but in brief, it takes a seed structure or two, adds atoms to them, goes through rounds of mutations and parings (according to filters that can be set for functional groups, properties, etc.) and then sends the whole set back around for more. This is going to rapidly explode in size, naturally, so at each stage the program picks a maximally diverse subset to go on with and discards the rest.
Here are some of the compounds that come out, just to give you the idea. And they're right; I never would have thought of some of these, and I hope some of them never cross my mind again. I presume that this set has been run with rather permissive structural filters, because there are things there that (1) I don't know how to make, (2) I'm not sure if anyone else knows how to make yet, and (3) I'm not sure how stable and isolable they'd be even if anyone did. My first reaction is that there sure are a lot of acetals, ketals, hemithioketals and so on in this set, but I'm sure that's an artifact of some sort. Any selection from a set of 10^60 compounds is an artifact of some sort.
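
For readers who think in code, the mutate, filter, and keep-a-diverse-subset loop described above can be sketched in a few lines. This is a toy illustration and not the paper's implementation: the "molecules" are just strings over a four-letter alphabet, the filters and the distance measure are invented stand-ins, and every name here (`mutate`, `passes_filters`, `pick_diverse`, `explore`) is hypothetical. The diverse subset is chosen with a simple greedy max-min rule, which is one common way to do it and may well differ from what the authors use.

```python
import random

ALPHABET = "CNOS"          # toy "atoms"; a stand-in for real structure edits
MAX_LEN = 8                # toy property filter (think: a size cap)
FORBIDDEN = ("OO", "NN")   # toy functional-group filter (think: peroxides)


def mutate(mol, rng):
    """Return a variant of mol: append an atom or swap one in place."""
    if rng.random() < 0.5:
        return mol + rng.choice(ALPHABET)
    i = rng.randrange(len(mol))
    return mol[:i] + rng.choice(ALPHABET) + mol[i + 1:]


def passes_filters(mol):
    """Keep only strings under the size cap with no forbidden pattern."""
    return len(mol) <= MAX_LEN and not any(bad in mol for bad in FORBIDDEN)


def distance(a, b):
    """Crude dissimilarity: mismatched positions plus length difference."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


def pick_diverse(pool, k):
    """Greedy max-min selection: repeatedly add the candidate that is
    farthest from everything already chosen."""
    pool = sorted(pool)                      # deterministic ordering
    chosen = [pool[0]]
    while len(chosen) < min(k, len(pool)):
        best = max(
            (m for m in pool if m not in chosen),
            key=lambda m: min(distance(m, c) for c in chosen),
        )
        chosen.append(best)
    return chosen


def explore(seeds, rounds=5, offspring=20, keep=10, seed=0):
    """Run several rounds of mutate -> filter -> diverse-subset selection."""
    rng = random.Random(seed)
    population = list(seeds)
    for _round in range(rounds):
        candidates = set(population)
        for mol in population:
            for _ in range(offspring):
                child = mutate(mol, rng)
                if passes_filters(child):
                    candidates.add(child)
        # The candidate pool grows fast, so keep only a diverse subset.
        population = pick_diverse(candidates, keep)
    return population


result = explore(["CC", "CO"])
print(result)
```

The point of the selection step is visible even in the toy: without `pick_diverse`, the candidate set multiplies every round, and with it the population stays at a fixed size while drifting away from the seeds.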

So my next question is, what might people use such a program for? Ideas that they wouldn't have come up with, something to stir the imagination? Synthetic challenges to try for, to realize some of these compounds? The authors point out that neither nature nor man has ever really taken advantage of chemical diversity, not compared to what's possible. And that's true, but the possible numbers of compounds are still so terrifying that I wonder what we'll accomplish with drops in the bucket. (There's another paper that bears on this that I'll comment on later this week; this theme will return shortly!)

Comments (34) + TrackBacks (0) | Category: Chemical News


1. C-drug on April 14, 2013 10:09 PM writes...

Protein models are improving steadily. Perhaps this could be used to generate a super diverse library for in silico screening.


2. Esteban on April 14, 2013 10:26 PM writes...

As a non-chemist I have no idea what I'm looking at, but I love the tic-tac-toe pattern in CVI.


3. Yazeran on April 15, 2013 1:25 AM writes...

Actually, that CVI also caught my attention. I wouldn't be surprised if, should anyone actually manage to create it, it turned out to be the German guys who also did that CN7 anion thing (the 'tic-tac-toe' end looks suspiciously related to triacetonetriperoxide)...

Plan: To go to Mars one day with a hammer.


4. chris on April 15, 2013 1:57 AM writes...

Whilst these algorithms offer an insight into the vast potential size and diversity of chemical space, I can't help but feel they are asking the wrong question. Chemical space is vast, but there are only a relatively limited number of useful biomolecular targets; even allowing for multiple binding sites and interactions, that number will be vastly smaller than all of chemical space. Would it not be better to try to identify useful "drug space" and seek means to populate that efficiently?


5. Jose on April 15, 2013 2:29 AM writes...

Natural products chemistry without all those pesky homogenized nudibranchs to extract!


6. Insilicoconsulting on April 15, 2013 2:43 AM writes...

If anything, such efforts should be encouraged. These should not be discarded just because our synthesis knowledge SEEMS inadequate.

If we had better and more reliable models/filters for stability, permeability, PK, tox, etc., then these could easily be pared down computationally.

Consider the hard time small molecule DD is going through. How will it compete with alternatives in drug types and surgery etc etc? This could be one way for small molecules to make a comeback.


7. Vanzetti on April 15, 2013 4:30 AM writes...

>>>>(1) I don't know how to make, and (2) I'm not sure if anyone else knows how to make yet

I'm not a chemist, but shouldn't the first, most important filter over the 10E60 compounds collection be a filter for stuff that can be synthesized in a sane amount of time and money?


8. Algirdas on April 15, 2013 6:08 AM writes...


"I'm not a chemist, but shouldn't the first, most important filter over the 10E60 compounds collection be a filter for stuff that can be synthesized in a sane amount time and money?"

Not necessarily. 100 years ago no one knew how to synthesize strychnine (or even what the actual bond connectivity was in it) - and yet it is a very pharmacologically active substance, very relevant to exploration of drug-like space.

Depends on what your purpose is: to explore everything sorta-drug-like that can exist in a stable form under realistic conditions? Then the fact that some molecules are too hard to synthesize with today's methodology is not an issue. We, the scientists, actually like these sorts of challenges: the universe demonstrates our ignorance once again, so we go and figure it out!


9. eugene on April 15, 2013 6:17 AM writes...

This program could never come up with a nanoputian. Humans are still many times more... 'imaginative' is the word I'm looking for I guess. I'm sure if you add oxygen knees and joints to the nanoputians, they could become more druglike too. Or maybe put a sulphate in the crown of the nanoregent. The marketing department of a big pharma company would kill for a nanoputian based drug.


10. Vanzetti on April 15, 2013 7:04 AM writes...


I thought the purpose is to create better compound libraries for screening purposes.


11. shoy on April 15, 2013 7:21 AM writes...

I'm forwarding this post to K.C. Nicolaou as we speak.


12. LeeH on April 15, 2013 8:35 AM writes...

This is an example of a computational method that is sure to be a failure with the project team, designed by an academic group that doesn't understand what it takes to connect with the synthetic chemists. It's a tool about which people will say 'Oh, it's just an idea generator', rather than one that generates GOOD ideas.


13. John Spevacek on April 15, 2013 9:26 AM writes...

Like Yazeran (#3), I think that tetra-dioxolane structure on the left is wild and crazy stuff. If this weren't computer generated, I'd have thought someone was doodling too much.


14. simpl on April 15, 2013 9:40 AM writes...

@vanzetti (7, 8, 10)
This question has been implicit in the demise of the European specialty chemicals industry, which saw its market fade away for dyestuffs, additives and many agrochemicals, but not yet for pharmaceuticals.
In fact, we currently assume that chemists can produce any successful API in tonne quantities at a few percent of the sales income. The dosage, at fractions of a gram per day, is low enough to pay for any production synthesis for chemicals with "molecular weights under 500". This cost compares favorably to licence fees (5 - 30%), for instance.
As you point out, there must be limits to these costs; they have just not yet been challenged, and the topic often underlies discussions about the pricing of new drugs. In contrast, additive developers may assume that a synthesis of >2 steps is not economically viable, for instance.


15. luysii on April 15, 2013 9:40 AM writes...

Suppose you wanted to make just one molecule of each possible polypeptide/protein, starting with the 20 amino acids: 400 dipeptides, 8000 tripeptides, etc. Assume that the whole earth is made of C, H, O, N and S. At what protein length would you run out of material?

I did a calculation on this point a few years ago, and (so far) no one has disputed it. Take a guess, then have a look at

If you're supersmart, start phosphorylating serine, threonine and tyrosine in all possible combinations adding each to the mix and figure out the newer (and significantly lesser) protein length. I haven't done it. I think this is an NP time problem.
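
One way to try the first question numerically is sketched below. This is not the commenter's own calculation, just a rough version under stated assumptions: an Earth mass of about 5.97 x 10^24 kg, an average residue mass of about 110 daltons (a common rule of thumb), and one molecule of every sequence at every length, summed cumulatively. The function name is made up for the example.

```python
EARTH_MASS_KG = 5.97e24     # approximate mass of the Earth
DALTON_KG = 1.6605e-27      # one dalton in kilograms
AVG_RESIDUE_DA = 110.0      # assumed average mass per amino acid residue
N_AMINO_ACIDS = 20


def length_where_earth_runs_out():
    """Return the first chain length n at which one molecule of every
    peptide of length 1..n would outweigh the Earth."""
    total_kg = 0.0
    n = 0
    while True:
        n += 1
        sequences = N_AMINO_ACIDS ** n                  # 20^n distinct chains
        mass_each_kg = n * AVG_RESIDUE_DA * DALTON_KG   # rough mass of one chain
        total_kg += sequences * mass_each_kg
        if total_kg > EARTH_MASS_KG:
            return n


n_max = length_where_earth_runs_out()
print(n_max)
```

Because the count grows by a factor of 20 per added residue, the answer is insensitive to the exact residue mass or planet mass assumed, and it comes out well under a hundred residues.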


16. darwinsdog on April 15, 2013 10:14 AM writes...

Nobody steal this idea, OK, but I am working on reprogramming a roomba (TM) to change direction correlating to specific atomic bonding dihedral angles. I will equip it (duck tape) with a pen (random multicolor is v 2.0) and lay some paper on the floor - voila, computational chemistry! Randomly insert stock phrases about diversity space and drug-ly-ness from conference announcements from the 1990's and the manuscript literally writes itself.


17. ScientistSailor on April 15, 2013 10:53 AM writes...

I've seen worse-looking compounds in some current vendor libraries...


18. JRnonchemist on April 15, 2013 11:40 AM writes...

@Esteban (2)

Ignore the arm coming off the side for a bit.

Take a piece of string tied in a loop. Put your hands together face up, and have corners of the loop off the sides of your hands where the thumbs are. Fold your hands up, together. I think the string will be in a shape that can help visualize the real shape of the tic-tac-toe. It is a ring folded in loops, two up, two down.

The cross hatches are the top and bottom. The pair going one way are at one end, and the pair going the other way are at the other end. The four boxy things between the Os are the sides, connecting the top and the bottom.

This part kinda looks like a wheel, a box, or a barrel.

This is merely my best guess as to how it looks most of the time, in a low energy state.

Add the arm back in, well, the whole thing looks like a ladle. Which makes me wonder if there is a way to stick something in the center of the 'cross hatched' portion.


19. anon2 on April 15, 2013 11:51 AM writes...

Whimsical molecules that sit on the shelf after being made.


20. MoMo on April 15, 2013 12:26 PM writes...

Molecular Universe? More like from another Universe for treating Extraterrestrials and their diseases!

Thanks, Duke and Universe of Pittsburgh, and the NIH for showing us all how you spend our American Tax Dollars. Real scientists would have liked to see what exists in OUR Universe.

You all should be fired.


21. exGlaxoid on April 15, 2013 12:50 PM writes...

First filter ought to be water stability, since most people are made of lots of it. If the molecule will fall apart in water to a ketone or diol, then that should be the molecule screened. So all of the ketals, etc should be skipped, as the ketones could be screened instead, and even they are dubious in most drugs, at least the easily oxidized ones.

And in ones like CIV, I would simply replace the o-tolyl with a phenyl and simplify the molecule. It makes no sense to screen every aromatic variation possible, especially if they make chiral molecules, as for screening, achiral compounds or racemic mixtures make more sense.

Many of the listed compounds would also fail reactivity and toxic component screens, which would make them undesirable to screen, since most companies would ignore them as hits. I might see allowing one "S" in a "hit" molecule, but I would not proceed with a molecule that had 3 or more S atoms. Nitrogen is similar, as too many and the molecule is too energetic and might go boom. They just have too many liabilities. If you simply constrain the list of starting elements to reasonable numbers of "N", "F", "Cl", and "S", and remove ketals, peroxides, epoxides, etc., you would get a much more useful and smaller set of compounds to explore, which might be a more useful experiment.


22. Dr. Zoidberg on April 15, 2013 1:29 PM writes...

It's worth noting that nowhere in the text of the paper are they claiming to be designing new drugs, but rather "exploring chemical space", i.e., this is the new DOS/Combichem. In that respect, I have no problem with this paper.

Almost inevitably, someone will claim this is the next savior to drug discovery and that WILL bother me.


23. David Borhani on April 15, 2013 2:38 PM writes...

@21 I agree with the gist of your comments. But as to water stability (indeed, stomach-acid stability), have a look at topiramate---most of it goes right through you, unchanged.


24. bad wolf on April 15, 2013 4:34 PM writes...

The other way to generate a large number of previously unimagined chemical structures is to use the answers on undergrad students' organic exams.


25. Esteban on April 15, 2013 4:44 PM writes...

exGlaxoid said: I might see allowing one "S" in a "hit" molecule

Wouldn't that make it a Shit molecule?


26. joeylawn on April 15, 2013 9:00 PM writes...

"CVI" looks as if it would be really unstable, if it could even be made at all.


27. alejandro on April 15, 2013 9:30 PM writes...

A comment to LeeH (#12): Well, if the synthetic chemists in industry are so good, then where are the drugs for cancer, neurodegenerative diseases, and all the drugs for orphan diseases that your industry promised? Instead, industry seems to be preoccupied with me-too drugs, aka the safe route. Rather than be a hater of academia, one should appreciate its utility, and its ability to be "bold" and "go where no one has gone before."


28. henry's cat on April 16, 2013 5:35 AM writes...

Those structures look like the product of an hour of bonding and valency theory followed by an evening of dope and absinthe.


29. Hap on April 16, 2013 10:18 AM writes...

1) The structures were (supposedly) put through Lipinski and synthetic accessibility filters, so they were supposed to have passed a sanity test. Some of the earlier (A and B class) structures don't really look that bad (although not fun to make).

2) Apparently the synthetic universe loves thioketals, orthoesters, and orthothioesters.

27: The problem with your theory is that academic research is supposed to 1) go interesting places and 2) tell you how to get to them. The second part is often lacking (for example, with Booth's company, as expounded upon almost ad infinitum by Prof. Ioannidis's research). In other cases, one doesn't know what's interesting ahead of time, so exploration without being totally constrained by interest can be useful. However, when academic research is being used as a replacement for drug company research (i.e. the current "business" model for big pharma), one would assume that it could select interesting places better than the company research it replaces. So far, that would not appear to be the case.

In this example, if drug-likeness (or even stability) is a criterion for interesting compounds (the authors talk about materials and drug research as reasons for their work), it's not clear whether this research fits criterion 1).


30. MattF on April 17, 2013 6:54 AM writes...

Just to note that, according to Wikipedia, the total mass of the observable universe is on the order of 10^53 kilograms. So, there are probably enough atoms in the observable universe to make a milligram or two of each of your 10^60 compounds... but it's a close call.


31. EdC on April 18, 2013 7:53 AM writes...

Obviously, brute force and ignorance won't work quite yet (it may in a few years: Moore's Law), but massively parallel methods, like those used by LIGO and SETI@Home may offer a way to start exploring.

My first question is: do those people in pharmaceuticals have some heuristics for biologically useful activity? It may also be useful to filter out things like Hexanitrohexaazaisowurtzitane and some of the other lovely compounds that you refuse to work with.

My second is what sort of filters did Virshup et al use to calculate 10^60 compounds? How are they counting racemic pairs?


32. Anonymous on April 19, 2013 5:38 AM writes...

@CVI: I've never seen that kind of ring structure before.

CII and CIII are about the craziest rings I've seen.

DI is interesting, I can't remember hearing of something with a triangle being synthesized.

... And is that an Actinium in CI?


33. Yazeran on April 19, 2013 10:09 AM writes...

Well, cyclopropane does exist; however, making one with a nitrogen at one apex will likely not be easy or add to the stability, and then binding that nitrogen to another nitrogen is likely not helping either... (I'm not a chemist, so I may be wrong though)



34. Anonymous on April 22, 2013 9:44 AM writes...

@Yazeran: Apparently, it's called an aziridine group, and it does exist. The Wikipedia article even gives synthesis pathways.


