In the Pipeline

July 31, 2012

Synthetic Chemistry: The Rise of the Algorithms

Posted by Derek

Here are two papers in Angewandte Chemie on "rewiring" synthetic chemistry. Bartosz Grzybowski and co-workers at Northwestern have been modeling the landscape of synthetic organic chemistry for some time now, looking at how various reactions and families of reactions are connected. Now they're trying to use that information to design (and redesign) synthetic sequences.

This is a graph theory problem - a rather large one - if you assign chemical structures to the nodes and transformations to the edges connecting them. It quickly turns into a computationally demanding problem, as these "find the shortest path" searches tend to do, but that doesn't mean that you can't run through a huge number of possibilities and find routes that you'd never spot by eyeballing. That's especially true when you add in the price and availability of the starting materials, as the second paper linked above does. If you're a total synthesis chemist, and you didn't feel at least a tiny chill running down your back, you probably need to think about the implications of all this again. People have been trying to automate synthetic planning since the days of E. J. Corey's LHASA program, but we're getting closer to the real deal here:
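
To make the framing concrete, here's a minimal sketch of the idea in Python - a toy network with entirely invented compounds and costs, searched with Dijkstra's algorithm. The real network is on another scale entirely, and the actual cost function is far more elaborate than a single number per edge:

```python
import heapq

# Toy synthesis network: nodes are compounds, directed edges are single
# transformations carrying a cost. All names and numbers are invented.
edges = {
    "A": [("B", 2.0), ("C", 3.0)],   # A can be converted to B or to C
    "B": [("D", 4.0)],
    "C": [("D", 1.0)],
}

def cheapest_route(start, target):
    """Dijkstra's algorithm: the cheapest sequence of transformations."""
    queue = [(0.0, start, [start])]
    done = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, step_cost in edges.get(node, []):
            heapq.heappush(queue, (cost + step_cost, nxt, path + [nxt]))
    return None

print(cheapest_route("A", "D"))   # -> (4.0, ['A', 'C', 'D'])
```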

We first consider the optimization of syntheses leading to one specified target molecule. In this case, possible syntheses are examined using a recursive algorithm that back-propagates on the network starting from the target. At the first backward step, the algorithm examines all reactions leading to the target and calculates the minimum cost (given by the cost function discussed above) associated with each of them. This calculation, in turn, depends on the minimum costs of the associated reactants that may be purchased or synthesized. In this way, the cost calculation continues recursively, moving backward from the target until a critical search depth is reached (for algorithm details, see the Supporting Information, Section 2.3). Provided each branch of the synthesis is independent of the others (good approximation for individual targets, not for multiple targets), this algorithm rapidly identifies the synthetic plan which minimizes the cost criterion.
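
That recursion is easy to sketch in code. Here's a minimal, hypothetical version (compound names, catalog prices, and the flat per-reaction costs are all invented, and the paper's actual cost function is much richer); note that summing over reactants bakes in the independent-branches approximation the quote mentions:

```python
# Hypothetical data: reactions_to maps a product to the reactions that
# yield it, each given as (list of reactants, reaction cost).
reactions_to = {
    "target": [(["int1", "int2"], 4.0)],
    "int1":   [(["sm1"], 2.0)],
    "int2":   [(["sm2", "sm3"], 3.0)],
}
catalog = {"sm1": 5.0, "sm2": 1.0, "sm3": 2.0, "int1": 20.0}

def min_cost(compound, depth=5, memo=None):
    """Minimum cost to obtain a compound: buy it from the catalog, or
    run the cheapest reaction that produces it (reaction cost plus the
    minimum cost of each reactant), back-propagating to a depth limit."""
    if memo is None:
        memo = {}
    key = (compound, depth)
    if key in memo:
        return memo[key]
    best = catalog.get(compound, float("inf"))   # option 1: purchase
    if depth > 0:                                # option 2: synthesize
        for reactants, rxn_cost in reactions_to.get(compound, []):
            best = min(best, rxn_cost + sum(min_cost(r, depth - 1, memo)
                                            for r in reactants))
    memo[key] = best
    return best

print(min_cost("target"))   # 17.0: making int1 (7.0) beats buying it (20.0)
```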

That said, how well does all this work so far? Grzybowski owns a chemical company (ProChimia), so this work examined 51 of its products to see if they could be made more easily and/or more cheaply. And it looks like the optimization worked, partly by identifying new routes and partly by sending more of the syntheses through shared starting materials and intermediates. The company seems to have implemented many of the suggestions.

The other paper linked in the first paragraph is a similar exercise, but this time looking for one-pot reaction sequences. They've added filters for chemical compatibility of functional groups, reagents, and solvents (miscibility, oxidizing versus reducing conditions, sensitivity to water, acid/base reactions, hydride reagents versus protic conditions, and so on). The program tries to get around these problems, when possible, by changing the order of addition, and can also evaluate its suggestions versus the cost and commercial availability of the reagents involved.
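
A bare-bones sketch of what such compatibility filtering might look like - with completely invented flags and clash rules, far cruder than the paper's actual filters (this toy only tracks leftover reagents against the next step's conditions):

```python
# Invented incompatibility rules: (leftover reagent from the first step,
# condition flag of the second step) pairs that rule out sharing a pot.
INCOMPATIBLE = {
    ("hydride_reagent", "protic"),
    ("oxidant", "reducing"),
    ("water", "water_sensitive"),
}

def sequence_ok(first, second):
    """Leftovers from the first step must not clash with the conditions
    the second step runs under."""
    return not any((leftover, cond) in INCOMPATIBLE
                   for leftover in first["leftovers"]
                   for cond in second["conditions"])

def one_pot_order(step_a, step_b):
    """Try both orders of addition before giving up, mimicking the
    program's reordering trick (hugely simplified)."""
    for first, second in ((step_a, step_b), (step_b, step_a)):
        if sequence_ok(first, second):
            return [first["name"], second["name"]]
    return None

reduction = {"name": "hydride reduction",
             "leftovers": ["hydride_reagent"], "conditions": ["anhydrous"]}
alkylation = {"name": "alkylation in MeOH",
              "leftovers": ["alkyl_halide"], "conditions": ["protic"]}

# One order clashes (leftover hydride into a protic step); the toy
# model accepts the reverse order instead.
print(one_pot_order(reduction, alkylation))
```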

Of course, the true value of any theoretical–chemical algorithm is in experimental validation. In principle, the method can be tested to identify one-pot reactions from among any of the possible 1.8 billion two-step sequences present within the NOC (Network of Organic Chemistry). While our algorithm has already identified over a million (and counting!) possible sequences, such randomly chosen reactions might be of no real-world interest, and so herein we chose to illustrate the performance of the method by “wiring” reaction sequences within classes of compounds that are of popular interest and/or practical importance.

They show a range of reaction sequences involving substituted quinolines and thiophenes, with many combinations of halogenation/amine displacement/Suzuki/Sonogashira reactions. None of these are particularly surprising, but it would have been quite tedious to work out all the possibilities by hand. Looking over the yields (given in the Supporting Information), it appears that in almost every case the one-pot sequences identified by the program are equal to or better than the stepwise yields (sometimes by substantial margins). It doesn't always work, though:

Having discussed the success cases, it is important to outline the pitfalls of the method. While our algorithm has so far generated over a million structurally diverse one-pot sequences, it is clearly impossible to validate all of them experimentally. Instead, we estimated the likelihood of false-positive predictions by closely inspecting about 500 predicted sequences and cross-checking them against the original research describing the constituent/individual reactions. In few percent of cases, the predicted sequences turned out to be unfeasible because the underlying chemical databases did not report, or reported incorrectly, the key reagents or reaction conditions present in the original reports. This result underscores the need for faithful translation of the literature data into chemical database content. A much less frequent source of errors (only few cases we encountered so far) is the algorithm's incomplete “knowledge” of the mechanistic details of the reactions to be wired. One illustrative example is included in the Supporting Information, Section 5, where a predicted sequence failed experimentally because of an unforeseen transformation of Lawesson's reagent into species reactive toward one of the intermediates. We recognize that there is an ongoing need to improve the filters/rules that our algorithm uses; the goal is that such improvements will ultimately render the algorithm on a par with the detailed synthetic knowledge of experienced organic chemists. . .

And you know, I don't see any reason at all why that can't happen, or why it won't. It might be this program, or one of its later versions, or someone else's software entirely, but I truly don't see how this technology can fail. Depending on the speed with which that happens, it could transform the way that synthetic chemistry is done. The software is only going to get better - every failed sequence adds to its ability to avoid that sort of thing next time; every successful one gets a star next to it in the lookup table. Crappy reactions from the literature that don't actually work will get weeded out. The more it gets used, the more useful it becomes. Even if these papers are presenting the rosiest picture possible, I still think that we're looking at the future here.
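
How that feedback loop would actually be implemented is anyone's guess; here's one purely speculative sketch (nothing like this is specified in the papers) of reaction costs being nudged by reported outcomes:

```python
# Purely speculative: successes make a reaction look cheaper to the
# planner, failures make it progressively less attractive, so reactions
# that don't actually work drift out of the recommended routes.
def update_cost(cost, succeeded, reward=0.9, penalty=1.5):
    """Multiplicative update of a reaction's planning cost."""
    return cost * (reward if succeeded else penalty)

cost = 10.0
for outcome in (False, False, True):   # two reported failures, one success
    cost = update_cost(cost, outcome)
print(round(cost, 2))   # 20.25 - twice burned, the planner now routes around it
```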

Put all this together with the automated random-reaction-discovery work that I've blogged about, and you can picture a very different world, one where reactions get discovered, validated, and entered into the synthetic armamentarium with less and less human input. You may not like that world very much - I'm not sure what I think about it myself - but it's looking more and more likely to be the world we find ourselves in.

Comments (28) + TrackBacks (0) | Category: Chemical News


COMMENTS

1. Puff the Mutant Dragon on July 31, 2012 8:35 AM writes...

Never send a human to do a machine's job...

2. processchemist on July 31, 2012 8:42 AM writes...

I've relied heavily on (reasoned) computational methods to optimize reactions/processes (DOE) over the last few years, and I can tell you that people with little knowledge of the DOE approach, working with black-box-style software, have published some questionable papers... in many kinds of computational approaches, control by a skilled operator is everything if you want to avoid nonsense or obvious results.
I find the second paper interesting for a couple of reasons: in process chemistry, one-pot reactions and telescoping are solutions that are often used (and investigated every time). The examples reported are a bit obvious (you can find tons of one-pot or telescoped reactions in OPRD), but here a DOE approach would require many experiments (with discrete parameters for every reactant), and the predictive capability of the algorithm seems good. It would be nice to see it crunch less simple targets.

3. HAL 9000 on July 31, 2012 10:20 AM writes...

I'm sorry, Dave, I can't let you do that reaction.

4. NoDrugsNoJobs on July 31, 2012 10:57 AM writes...

If one analogizes planning a complex synthetic scheme to a chess match where several moves are planned out in advance, then a computer clearly can do quite well. The difference here is the rather huge additional challenge of the underlying assumptions/information within each transformation. In chess, the particular piece, its movement, and the other pieces and their movements are the only variables, and they can be accounted for with 100% accuracy. Where so much more uncertainty enters in, that is where art, intuition, and personal experience begin to play a role. It seems an interesting idea, but unlike chess, it will be limited by the quality of the information going into it. This means the best reaction-planning program will not beat the best synthetic chemist, but it will certainly be a powerful tool in his/her arsenal.

5. Josh on July 31, 2012 11:46 AM writes...

@3
Just plain hysterical!
Made my day

6. NCharles on July 31, 2012 12:18 PM writes...

The word 'repertoire' comes more readily to mind for me, but I have to admit that it's the first time I have ever seen the word 'armamentarium' used.

7. ech on July 31, 2012 12:24 PM writes...

These kinds of problems have been of interest to the AI community for a long time, and there are a number of techniques to attack them. Unless you use heuristics to narrow the scope, the problems are all NP-complete, meaning that they explode computationally as the number of nodes and edges gets large. Fortunately, more and more computing power is now available to attack larger and larger versions of these problems. Quantum computers might help some in the long term.

Even if you have to dedicate a $1000 node in a server farm for a month to optimize a reaction, if it saves quite a bit over the life of a compound, that's still a win.

ObGetOffMyLawn Comment: I was reading a paper on performance of an adsorption reaction that talked about how it took a PC a few hours to simulate a 24 hour reaction run, and how this was a really long time. Oh Yeah? I was doing research in the 80s that took five workstations in parallel all night to do one simulation run. (Uphill both ways, in the snow, @ 100 degrees.)

8. Anonymous on July 31, 2012 12:25 PM writes...

Big yawn; computers can create paint-by-numbers schlock art, but they will never evolve into a Dali or a Picasso. The same is true of organic synthesis. If you need paint-by-numbers organic synthesis, turn the chore over to the machines. If you want art, leave it to the humans.

9. JC on July 31, 2012 12:31 PM writes...

I, for one, look forward to our Synthesis Robot Overlords.

10. AndrewD on July 31, 2012 1:35 PM writes...

@9, JC
I thought that was Big Pharma management.

11. Phil on July 31, 2012 1:52 PM writes...

@8.

Following your analogy, industrial syntheses don't need to be Picassos. In fact, they are usually Thomas Kinkade prints. They would be happy hanging in a dentist's office. If it gets the job done, perfect.

12. DCRogers on July 31, 2012 1:52 PM writes...

"This result underscores the need for faithful translation of the literature data into chemical database content."

Early retrosynthetic programs suffered mightily from this -- the results were only as good as the quality of the retrosynthetic transforms the program knew.

I recall a quote by someone (Al Long?), something to the effect that only E. J. Corey himself could truly write a good transform. Given his many responsibilities, I doubt he spent much actual time on this!

(As an aside, the other groundbreaking early retrosynthetic program was Todd Wipke's SECS at UCSC... not sure what the state of that effort is now.)

13. Tokamak on July 31, 2012 1:53 PM writes...

When everything is automated, even repair of the machines themselves, and nobody has to do anything, how will we, as a society, distribute wealth?

14. David Formerly Known as a Chemist on July 31, 2012 2:15 PM writes...

This will undoubtedly lead to the Chinese-version of the "In The Pipeline" blog wherein Chinese chemists complain how all their jobs are being taken by low-cost software.

15. John Wayne on July 31, 2012 3:00 PM writes...

@3 and 14: I laughed out loud twice while reading the comments for this one topic; new record :)

16. Am I Lloyd peptide on July 31, 2012 3:10 PM writes...

Corey blew it by making his program prohibitively expensive and virtually inaccessible to everyone. He ignored the now well-established fact that the most successful computational techniques are cheap or free. Hopefully the Northwestern group will be cognizant of this and will make their program open-source, available for everyone to test and refine.

17. oldstang on July 31, 2012 4:01 PM writes...

As any process chemist can tell you, the hard part of executing a synthesis is the isolation of products. I don't foresee a time when software can predict that. You can only go so far when the last step of your procedure is "load the reaction mixture onto the CombiFlash and elute with EtOAc."

18. ech on July 31, 2012 4:27 PM writes...

@13: For two differing fictional perspectives on wealth distribution in automated societies, see:
- Kurt Vonnegut's "Player Piano"
- Iain M. Banks' "Culture" novel series. Mostly standalone novels; I recommend "Consider Phlebas" as a good place to start

19. Luddite on July 31, 2012 5:28 PM writes...

One major problem with this is that the reactions that don't work are generally not published, so the algorithm cannot know about many of the conflicts that will exist. This is a problem for humans too, of course, but to a far lesser extent, I would guess.

20. MoMo on July 31, 2012 8:44 PM writes...

This is leading to a waste of time, like combinatorial chemistry did back in the 90's.

Why not subject every product of every reaction to every reaction? Same result.

Now get back to work, all of you.

21. I, Robot on August 1, 2012 4:42 AM writes...

@12: Hendrickson at Brandeis was another early player in the computer-aided organic synthesis (CAOS) game. See syngen2.chem.brandeis.edu

His program was SYNGEN (SYNthesis GENerator; 1978 or so, around the same time as Wipke's SECS). It was retrosynthetic; it linked to a catalog of starting materials to rank availability and estimate cost, and to a reaction database to estimate and rank yields for the proposed steps.

There are or were a bunch of other CAOS programs out there.

There is also a link at syngen2.chem.brandeis.edu to Webreactions, a nice little reaction database search engine. It's not SciFinder, but it's fast, simple and FREE.

22. Sundowner on August 1, 2012 5:47 AM writes...

Having played with computers for more than 20 years, I love the idea. Being a synthetic chemist, I hate the idea, even though it would make my work much easier. Or it would put me out of work, except for the design part.

The main problem I see here is the reliability of the reaction database. Because, honestly, everybody knows that a lot of the info contained in it is not exactly reliable...

23. MolecularGeek on August 1, 2012 8:04 AM writes...

@13:
Obviously, the wealth will go to the people who know how to make the machines do what we want them to and repair them when something goes wrong. In other words, the geek shall inherit the earth. *duck*

MG

24. Paul on August 1, 2012 1:01 PM writes...

Shouldn't it be possible to automate validation of the transformations in the database?

25. Design Monkey on August 2, 2012 1:53 PM writes...

@17. oldstang

Isolation in the typical research lab is already going that way.

Sheesh, the young kids nowadays: they don't have any crystallization skills, and they blink nervously when you suggest they do a fractional vacuum distillation.

26. Kaleberg on August 4, 2012 10:54 PM writes...

Over the past three or four decades an awful lot of fields have been rebuilt around a piece of software. Civil engineers got COGO back in the 60s. Circuit designers got SPICE in the 80s. Mathematicians got Mathematica or MATLAB. Mechanical engineers got one of several finite element analysis packages. It's high time chemists moved into the 20th century.

So, expect a bunch of really awful software that seems to have a good idea or two but is basically unusable. Then look for a few packages that are sort of useful, but ridiculously expensive. Then come the first actually useful commercial products that aren't insanely priced and the first almost useful open software versions with crappy databases. If chemistry is like other fields, it's going to be awful, but in ten or twenty years, this kind of software will be pervasive.

27. Anonymous on November 19, 2012 1:42 PM writes...

The only software a chemist needs is ChemDraw.

28. benzyme on January 2, 2014 7:08 PM writes...

I can't believe I am reading such negative posts. Computers are going to turn chemistry into information technology. Do you people think our minds can comprehend literally thousands of distinct rules? Computers do that every day, without a single mistake. They can also predict yields, reaction rates and equilibria, required temperatures, the best catalysts, and other useful things. Synthesis is a branching problem, with many routes to the desired product. If you want optimization - that is, to pick the best possible route to the product - you have to go through all the routes and compare them with each other. Good luck, human brain. And yes, you people who studied organic chemistry and reaction mechanisms, you wasted A LOT of time. To design a new synthetic route, you chemists used some memorized algorithms. Well, it turns out computers are better at algorithms than chemists. But sure, use drawing boards instead LOL
