Here are two papers in Angewandte Chemie on "rewiring" synthetic chemistry. Bartosz Grzybowski and co-workers at Northwestern have been modeling the landscape of synthetic organic chemistry for some time now, looking at how various reactions and families of reactions are connected. Now they're trying to use that information to design (and redesign) synthetic sequences.
This is a graph theory problem, a rather large graph theory problem, if you assign chemical structures to nods and transformations to the edges connecting them. And it quickly turns into one that is rather computationally demanding, as are all these "find the shortest path" types, but that doesn't mean that you can't run through a lot of possibilities and find a lot of things that you couldn't by eyeballing things. That's especially true when you add in the price and availability of the starting materials, as the second paper linked above does. If you're a total synthetic chemist, and you didn't feel at least a tiny chill running down your back, you probably need to think about the implications of all this again. People have been trying to automate synthetic chemistry planning since the days of E. J. Corey's LHASA program, but we're getting closer to the real deal here:
We first consider the optimization of syntheses leading to one specified target molecule. In this case, possible syntheses are examined using a recursive algorithm that back-propagates on the network starting from the target. At the first backward step, the algorithm examines all reactions leading to the target and calculates the minimum cost (given by the cost function discussed above) associated with each of them. This calculation, in turn, depends on the minimum costs of the associated reactants that may be purchased or synthesized. In this way, the cost calculation continues recursively, moving backward from the target until a critical search depth is reached (for algorithm details, see the Supporting Information, Section 2.3). Provided each branch of the synthesis is independent of the others (good approximation for individual targets, not for multiple targets), this algorithm rapidly identifies the synthetic plan which minimizes the cost criterion.
That said, how well does all this work so far? Grzybowski owns a chemical company (ProChimia), so this work examined 51 of its products to see if they could be made easily and/or more cheaply. And it looks like this optimization worked, partly by identifying new routes and partly by sending more of the syntheses through shared starting materials and intermediates. The company seems to have implemented many of the suggestions.
The other paper linked in the first paragraph is a similar exercise, but this time looking for one-pot reaction sequences. They've added filters for chemical compatibility of functional groups, reagents, and solvents (miscibility, oxidizing versus reducing conditions, sensitivity to water, acid/base reactions, hydride reagents versus protic conditions, and so on). The program tries to get around these problems, when possible, by changing the order of addition, and can also evaluate its suggestions versus the cost and commercial availability of the reagents involved.
Of course, the true value of any theoretical–chemical algorithm is in experimental validation. In principle, the method can be tested to identify one-pot reactions from among any of the possible 1.8 billion two-step sequences present within the NOC (Network of Organic Chemistry). While our algorithm has already identified over a million (and counting!) possible sequences, such randomly chosen reactions might be of no real-world interest, and so herein we chose to illustrate the performance of the method by “wiring” reaction sequences within classes of compounds that are of popular interest and/or practical importance.
They show a range of reaction sequences involving substituted quinolines and thiophenes, with many combinations of halogenation/amine displacement/Suzuki/Sonogashira reactions. None of these are particularly surprising, but it would have been quite tedious to work out all the possibilities by hand. Looking over the yields (given in the Supporting Information), it appears that in almost every case the one-pot sequences identified by the program are equal to or better than the stepwise yields (sometimes by substantial margins). It doesn't always work, though:
Having discussed the success cases, it is important to outline the pitfalls of the method. While our algorithm has so far generated over a million structurally diverse one-pot sequences, it is clearly impossible to validate all of them experimentally. Instead, we estimated the likelihood of false-positive predictions by closely inspecting about 500 predicted sequences and cross-checking them against the original research describing the constituent/individual reactions. In few percent of cases, the predicted sequences turned out to be unfeasible because the underlying chemical databases did not report, or reported incorrectly, the key reagents or reaction conditions present in the original reports. This result underscores the need for faithful translation of the literature data into chemical database content. A much less frequent source of errors (only few cases we encountered so far) is the algorithm's incomplete “knowledge” of the mechanistic details of the reactions to be wired. One illustrative example is included in the Supporting Information, Section 5, where a predicted sequence failed experimentally because of an unforeseen transformation of Lawesson's reagent into species reactive toward one of the intermediates. We recognize that there is an ongoing need to improve the filters/rules that our algorithm uses; the goal is that such improvements will ultimately render the algorithm on a par with the detailed synthetic knowledge of experienced organic chemists. . .
And you know, I don't see any reason at all why that can't happen, or why it won't. It might be this program, or one of its later versions, or someone else's software entirely, but I truly don't see how this technology can fail. Depending on the speed with which that happens, it could transform the way that synthetic chemistry is done. The software is only going to get better - every failed sequence adds to its abilities to avoid that sort of thing next time; every successful one gets a star next to it in the lookup table. Crappy reactions from the literature that don't actually work will get weeded out. The more it gets used, the more useful it becomes. Even if these papers are presenting the rosiest picture possible, I still think that we're looking at the future here.
Put all this together with the automated random-reaction-discovery work that I've blogged about, and you can picture a very different world, where reactions get discovered, validated, and entered into the synthetic armamentarium with less and less human input. You may not like that world very much - I'm not sure what I think about it myself - but it's looking more and more likely the be the world we find ourselves in.