Here are two papers in Angewandte Chemie on "rewiring" synthetic chemistry. Bartosz Grzybowski and co-workers at Northwestern have been modeling the landscape of synthetic organic chemistry for some time now, looking at how various reactions and families of reactions are connected. Now they're trying to use that information to design (and redesign) synthetic sequences.
This is a graph theory problem, a rather large graph theory problem, if you assign chemical structures to nodes and transformations to the edges connecting them. And it quickly turns into one that is rather computationally demanding, as are all these "find the shortest path" types, but that doesn't mean that you can't run through a lot of possibilities and find a lot of things that you couldn't by eyeballing things. That's especially true when you add in the price and availability of the starting materials, as the second paper linked above does. If you're a total synthetic chemist, and you didn't feel at least a tiny chill running down your back, you probably need to think about the implications of all this again. People have been trying to automate synthetic chemistry planning since the days of E. J. Corey's LHASA program, but we're getting closer to the real deal here:
We first consider the optimization of syntheses leading to one specified target molecule. In this case, possible syntheses are examined using a recursive algorithm that back-propagates on the network starting from the target. At the first backward step, the algorithm examines all reactions leading to the target and calculates the minimum cost (given by the cost function discussed above) associated with each of them. This calculation, in turn, depends on the minimum costs of the associated reactants that may be purchased or synthesized. In this way, the cost calculation continues recursively, moving backward from the target until a critical search depth is reached (for algorithm details, see the Supporting Information, Section 2.3). Provided each branch of the synthesis is independent of the others (good approximation for individual targets, not for multiple targets), this algorithm rapidly identifies the synthetic plan which minimizes the cost criterion.
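For the programmers in the audience, the recursion described in that passage can be sketched in a few lines of Python. This is a toy under stated assumptions, not the authors' code: the network, the prices, and the names REACTIONS, PRICES, and min_cost are all made up for illustration, and the cost of a route is assumed to be just the step cost plus the cost of its reactants (the paper's actual cost function is more detailed).

```python
import math

# Hypothetical toy network (not data from the paper).
# Each product maps to the reactions that make it: (step cost, list of reactants).
REACTIONS = {
    "target": [(5.0, ["A", "B"]), (2.0, ["C"])],
    "B": [(1.0, ["D"])],
    "C": [(1.0, ["A"])],
}

# Hypothetical catalog prices for compounds that can simply be bought.
PRICES = {"A": 3.0, "B": 10.0, "D": 2.0}


def min_cost(compound, depth):
    """Return (cost, plan) for the cheapest way to obtain `compound`.

    The search walks backward from the compound and stops once `depth`
    reaches zero. Branches are treated as independent, so the cost of a
    reaction is its own step cost plus the minimum cost of each reactant.
    An infinite cost means no route was found within the depth limit.
    """
    # Option 1: buy it, if the catalog lists it.
    best_cost = PRICES.get(compound, math.inf)
    best_plan = ("buy", compound)
    if depth == 0:
        return best_cost, best_plan

    # Option 2: make it by any known reaction, recursing on the reactants.
    for step_cost, reactants in REACTIONS.get(compound, []):
        sub = [min_cost(r, depth - 1) for r in reactants]
        total = step_cost + sum(cost for cost, _ in sub)
        if total < best_cost:
            best_cost = total
            best_plan = ("make", compound, [plan for _, plan in sub])
    return best_cost, best_plan
```

With a search depth of 3, the toy picks the route through C (total cost 6.0) over the route through A and B (11.0). With a depth of 1 it never gets far enough back to find that C can be made, and settles for buying A and B. The depth limit also keeps the recursion from looping forever if the network contains cycles.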
That said, how well does all this work so far? Grzybowski owns a chemical company (ProChimia), so this work examined 51 of its products to see if they could be made more easily and/or more cheaply. And it looks like this optimization worked, partly by identifying new routes and partly by sending more of the syntheses through shared starting materials and intermediates. The company seems to have implemented many of the suggestions.
The other paper linked in the first paragraph is a similar exercise, but this time looking for one-pot reaction sequences. They've added filters for chemical compatibility of functional groups, reagents, and solvents (miscibility, oxidizing versus reducing conditions, sensitivity to water, acid/base reactions, hydride reagents versus protic conditions, and so on). The program tries to get around these problems, when possible, by changing the order of addition, and can also evaluate its suggestions versus the cost and commercial availability of the reagents involved.
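The order-of-addition idea can be illustrated with a similarly small sketch. Everything here is an assumption for illustration: the tags, the three example additions, and the rule that a reagent "leaves" certain conditions behind in the pot are invented, and the sketch pretends that the additions can be freely reordered, which real sequences often can't. The paper's actual filters are far richer.

```python
from itertools import permutations

# Hypothetical additions (not from the paper):
# name -> (conditions it cannot tolerate, conditions it leaves in the pot).
ADDITIONS = {
    "hydride reduction": ({"protic", "water"}, set()),
    "aqueous Suzuki coupling": (set(), {"water", "protic"}),
    "amine displacement": ({"acid"}, {"base"}),
}


def feasible(order):
    """Check one order of addition against the accumulated pot conditions."""
    pot = set()
    for name in order:
        intolerant, leaves = ADDITIONS[name]
        if intolerant & pot:
            # Something already in the pot would wreck this step.
            return False
        pot |= leaves
    return True


def workable_orders(names):
    """Return every order of addition that passes the compatibility check."""
    return [order for order in permutations(names) if feasible(order)]
```

In this toy, running the hydride reduction before the aqueous Suzuki coupling passes, while the reverse order fails because the pot is already wet and protic by the time the hydride goes in.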
Of course, the true value of any theoretical–chemical algorithm is in experimental validation. In principle, the method can be tested to identify one-pot reactions from among any of the possible 1.8 billion two-step sequences present within the NOC (Network of Organic Chemistry). While our algorithm has already identified over a million (and counting!) possible sequences, such randomly chosen reactions might be of no real-world interest, and so herein we chose to illustrate the performance of the method by “wiring” reaction sequences within classes of compounds that are of popular interest and/or practical importance.
They show a range of reaction sequences involving substituted quinolines and thiophenes, with many combinations of halogenation/amine displacement/Suzuki/Sonogashira reactions. None of these are particularly surprising, but it would have been quite tedious to work out all the possibilities by hand. Looking over the yields (given in the Supporting Information), it appears that in almost every case the one-pot sequences identified by the program are equal to or better than the stepwise yields (sometimes by substantial margins). It doesn't always work, though:
Having discussed the success cases, it is important to outline the pitfalls of the method. While our algorithm has so far generated over a million structurally diverse one-pot sequences, it is clearly impossible to validate all of them experimentally. Instead, we estimated the likelihood of false-positive predictions by closely inspecting about 500 predicted sequences and cross-checking them against the original research describing the constituent/individual reactions. In few percent of cases, the predicted sequences turned out to be unfeasible because the underlying chemical databases did not report, or reported incorrectly, the key reagents or reaction conditions present in the original reports. This result underscores the need for faithful translation of the literature data into chemical database content. A much less frequent source of errors (only few cases we encountered so far) is the algorithm's incomplete “knowledge” of the mechanistic details of the reactions to be wired. One illustrative example is included in the Supporting Information, Section 5, where a predicted sequence failed experimentally because of an unforeseen transformation of Lawesson's reagent into species reactive toward one of the intermediates. We recognize that there is an ongoing need to improve the filters/rules that our algorithm uses; the goal is that such improvements will ultimately render the algorithm on a par with the detailed synthetic knowledge of experienced organic chemists. . .
And you know, I don't see any reason at all why that can't happen, or why it won't. It might be this program, or one of its later versions, or someone else's software entirely, but I truly don't see how this technology can fail. Depending on the speed with which that happens, it could transform the way that synthetic chemistry is done. The software is only going to get better - every failed sequence adds to its abilities to avoid that sort of thing next time; every successful one gets a star next to it in the lookup table. Crappy reactions from the literature that don't actually work will get weeded out. The more it gets used, the more useful it becomes. Even if these papers are presenting the rosiest picture possible, I still think that we're looking at the future here.
Put all this together with the automated random-reaction-discovery work that I've blogged about, and you can picture a very different world, where reactions get discovered, validated, and entered into the synthetic armamentarium with less and less human input. You may not like that world very much - I'm not sure what I think about it myself - but it's looking more and more likely to be the world we find ourselves in.
1. Puff the Mutant Dragon on July 31, 2012 8:35 AM writes...
Never send a human to do a machine's job...
2. processchemist on July 31, 2012 8:42 AM writes...
I have relied heavily on (reasoned) computational methods to optimize reactions/processes (DOE) in the last few years, and I can tell you that people with little knowledge of the DOE approach, aided by black-box-style software, have published questionable papers... in many kinds of computational approaches, control by a skilled operator is everything, to avoid nonsense or obvious results.
I find the second paper interesting for a couple of reasons: in process chemistry, one-pot reactions and telescoping are solutions often used (and investigated any time). The examples reported are a bit obvious (you can find tons of one-pot or telescoped reactions in OPRD), but here a DOE approach would require many experiments (with discrete parameters for every reactant), and the predictive capability of the algorithm seems good. It would be nice to see it crunch less simple targets.
3. HAL 9000 on July 31, 2012 10:20 AM writes...
I'm sorry, Dave, I can't let you do that reaction.
4. NoDrugsNoJobs on July 31, 2012 10:57 AM writes...
If one analogized planning a complex synthetic scheme to a chess match where several moves are planned out in advance, then a computer clearly can do quite well. The difference here is the rather huge additional challenge of the underlying assumption/information within each transformation. With chess, the particular piece, its movement and the other pieces and their movement are the only variables and they can be accounted for with 100% accuracy. However, where so much more uncertainty enters in, that is where the art and intuition and personal experience begin to play a role. It seems an interesting idea but unlike chess, will be limited by the quality of information going into it. This means the best reaction program will not beat the best synthetic chemist but will certainly be a powerful tool in his/her arsenal.
5. Josh on July 31, 2012 11:46 AM writes...
@3
Just plain hysterical!
Made my day
6. NCharles on July 31, 2012 12:18 PM writes...
The word 'repertoire' comes more to mind for me, but I have to admit that it's the first time I have ever seen the word 'armamentarium' used.
7. ech on July 31, 2012 12:24 PM writes...
These kinds of problems have been of interest to the AI community for a long time, and there are a number of techniques to attack them. Unless you use heuristics to narrow the scope, the algorithms are all NP-complete, meaning that they explode computationally as the number of nodes and edges gets large. Fortunately, the computing power now available is enough to attack larger and larger versions of these problems. Quantum computers might help some in the long term.
Even if you have to dedicate a $1000 node in a server farm for a month to optimize a reaction, if it saves quite a bit over the life of a compound, that's still a win.
ObGetOffMyLawn Comment: I was reading a paper on performance of an adsorption reaction that talked about how it took a PC a few hours to simulate a 24 hour reaction run, and how this was a really long time. Oh Yeah? I was doing research in the 80s that took five workstations in parallel all night to do one simulation run. (Uphill both ways, in the snow, @ 100 degrees.)
8. Anonymous on July 31, 2012 12:25 PM writes...
Big yawn; computers can create paint-by-numbers schlock art, but they will never evolve into a Dali or Picasso. The same is true with organic synthesis. If you need paint-by-numbers organic synthesis, turn the chore over to the machines. If you want art, leave it to the humans.
9. JC on July 31, 2012 12:31 PM writes...
I, for one, look forward to our Synthesis Robot Overlords.
10. AndrewD on July 31, 2012 1:35 PM writes...
@9, JC
I thought that was Big Pharma management.
11. Phil on July 31, 2012 1:52 PM writes...
@8.
Following your analogy, industrial syntheses don't need to be Picassos. In fact, they are usually Thomas Kinkade prints. They would be happy hanging in a dentist's office. If it gets the job done, perfect.
12. DCRogers on July 31, 2012 1:52 PM writes...
"This result underscores the need for faithful translation of the literature data into chemical database content."
Early retrosynthetic programs suffered mightily from this -- the results were only as good as the quality of the retrosynthetic transforms the program knew.
I recall a quote by someone (Al Long?), something to the effect that only E.J.Corey himself could truly write a good transform. Given his many responsibilities, I doubt he spent much actual time on this!
(As an aside, the other groundbreaking early retrosynthetic program was Todd Wipke's SECS at UCSC... not sure what the state of that effort is now.)
13. Tokamak on July 31, 2012 1:53 PM writes...
When everything is automated, even repair of the machines themselves, and nobody has to do anything, how will we, as a society, distribute wealth?
14. David Formerly Known as a Chemist on July 31, 2012 2:15 PM writes...
This will undoubtedly lead to the Chinese-version of the "In The Pipeline" blog wherein Chinese chemists complain how all their jobs are being taken by low-cost software.
15. John Wayne on July 31, 2012 3:00 PM writes...
@3 and 14: I laughed out loud twice while reading the comments for this one topic; new record :)
16. Am I Lloyd peptide on July 31, 2012 3:10 PM writes...
Corey blew it by making his program prohibitively expensive and virtually inaccessible to everyone. He ignored the now well-established fact that the most successful computational techniques are cheap or free. Hopefully the Northwestern group will be cognizant of this fact and will make their program open-source, available for everyone to test and refine.
17. oldstang on July 31, 2012 4:01 PM writes...
As any process chemist can tell you, the hard part of executing a synthesis is the isolation of products. I don't foresee a time when software can predict that. You can only go so far when the last step of your procedure is "load the reaction mixture onto the CombiFlash and elute with EtOAc."
18. ech on July 31, 2012 4:27 PM writes...
@13: For two differing fictional perspectives on wealth distribution in automated societies, see:
- Kurt Vonnegut's "Player Piano"
- Iain M. Banks' "Culture" novel series. Mostly standalone novels; I recommend "Consider Phlebas" as a good place to start
19. Luddite on July 31, 2012 5:28 PM writes...
One major problem with this is that the reactions that don't work are generally not published, hence the algorithm can not know about many of the conflicts that will exist. This is a problem for the humans too of course but to a far lesser extent I would guess.
20. MoMo on July 31, 2012 8:44 PM writes...
This is leading to a waste of time, like combinatorial chemistry did back in the 90's.
Why not subject every product of every reaction to every reaction? Same result.
Now get back to work, all of you.
21. I, Robot on August 1, 2012 4:42 AM writes...
@12: Hendrickson at Brandeis was another early player in the computer aided organic synthesis (CAOS) game. See syngen2.chem.brandeis.edu
His program was Syngen (Synthesis generator; 1978ish; same time as Wipke's SECS). It was retrosynthetic; it linked to a catalog of starting materials to rank availability and estimate and rank cost; it linked to a reaction database to estimate yields for proposed steps and rank them.
There are or were a bunch of other CAOS programs out there.
There is also a link at syngen2.chem.brandeis.edu to Webreactions, a nice little reaction database search engine. It's not SciFinder, but it's fast, simple and FREE.
22. Sundowner on August 1, 2012 5:47 AM writes...
Having played with computers for more than 20 years, I love the idea. Being a synthetic chemist, I hate the idea, though it would make my work much easier. Or it would put me out of work on the design side.
The main problem I see here is the reliability of the reactions database. Because honestly, everybody knows that a lot of the info contained there is not exactly reliable...
23. MolecularGeek on August 1, 2012 8:04 AM writes...
@13:
Obviously, the wealth will go to the people who know how to make the machines do what we want them to and repair them when something goes wrong. In other words, the geek shall inherit the earth. *duck*
MG
24. Paul on August 1, 2012 1:01 PM writes...
Shouldn't it be possible to automate validation of the transformations in the database?
25. Design Monkey on August 2, 2012 1:53 PM writes...
@17. oldstang
Isolation in the typical research lab already goes that way.
Sheesh, the young kids nowadays, they don't have any crystallization skills, and they blink nervously when you suggest that they do a fractional vacuum distillation.
26. Kaleberg on August 4, 2012 10:54 PM writes...
Over the past three or four decades an awful lot of fields have been rebuilt around a piece of software. Civil engineers got COGO back in the 60s. Circuit designers got SPICE in the 80s. Mathematicians got Mathematica or MATLAB. Mechanical engineers got one of several finite element analysis packages. It's high time chemists moved into the 20th century.
So, expect a bunch of really awful software that seems to have a good idea or two but is basically unusable. Then look for a few packages that are sort of useful, but ridiculously expensive. Then come the first actually useful commercial products that aren't insanely priced and the first almost useful open software versions with crappy databases. If chemistry is like other fields, it's going to be awful, but in ten or twenty years, this kind of software will be pervasive.
27. Anonymous on November 19, 2012 1:42 PM writes...
The only software a chemist needs is ChemDraw.