I've written several times about flow chemistry here, and a new paper in J. Med. Chem. prompts me to return to the subject. This, though, is the next stage in flow chemistry - more like flow med-chem:
Here, we report the application of a flow technology platform integrating the key elements of structure–activity relationship (SAR) generation to the discovery of novel Abl kinase inhibitors. The platform utilizes flow chemistry for rapid in-line synthesis, automated purification, and analysis coupled with bioassay. The combination of activity prediction using Random-Forest regression with chemical space sampling algorithms allows the construction of an activity model that refines itself after every iteration of synthesis and biological result.
Now, this is the point at which people start to get either excited or fearful. (I sometimes have trouble telling the difference, myself). We're talking about the entire early-stage optimization cycle here, and the vision is of someone topping up a bunch of solvent reservoirs, hitting a button, and leaving for the weekend in the expectation of finding a nanomolar compound waiting on Monday. I'll bet you could sell that to AstraZeneca for some serious cash, and to be fair, they're not the only ones who would bite, given a sufficiently impressive demo and slide deck.
But how close to this Lab of the Future does this work get? Digging into the paper, we have this:
Initially, this approach mirrors that of a traditional hit-to-lead program, namely, hit generation activities via, for example, high-throughput screening (HTS), other screening approaches, or prior art review. From this, the virtual chemical space of target molecules is constructed that defines the boundaries of an SAR heat map. An initial activity model is then built using data available from a screening campaign or the literature against the defined biological target. This model is used to decide which analogue is made during each iteration of synthesis and testing, and the model is updated after each individual compound assay to incorporate the new data. Typically the coupled design, synthesis, and assay times are 1–2 h per iteration.
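The loop the paper describes — fit a Random-Forest activity model, pick the next analogue, fold the assay result back in — can be sketched roughly as follows. Everything concrete here is a stand-in: the binary vectors, the `assay` function, and the greedy pick are my invention for illustration, since the paper's actual descriptors and scoring aren't reproduced in this post.

```python
# A minimal sketch of the self-refining activity model, using scikit-learn's
# RandomForestRegressor (the paper names Random-Forest regression; the rest
# of the details below are synthetic stand-ins).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy "virtual chemical space": binary vectors standing in for fingerprints.
space = rng.integers(0, 2, size=(200, 32)).astype(float)
hidden = rng.normal(size=32)

def assay(idx):
    # Stand-in for the in-line synthesis + bioassay step.
    return float(space[idx] @ hidden)

# Seed with a few "literature" compounds, then iterate:
# fit -> predict -> pick -> assay -> refit with the new data point.
tested = list(int(i) for i in rng.choice(200, size=5, replace=False))
results = [assay(i) for i in tested]

for _ in range(10):                    # one pass = design + synthesis + assay
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(space[tested], results)
    preds = model.predict(space)
    preds[tested] = -np.inf            # never remake a tested compound
    pick = int(np.argmax(preds))       # greedy pick on predicted activity
    tested.append(pick)
    results.append(assay(pick))        # new result refines the next model
```

The point is the tightness of the cycle, not the sophistication of the model: each assay result changes what gets made next, within the 1–2 hour iteration time the authors quote.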
Among the key things that already have to be in place, though, are reliable chemistry (fit to generate a wide range of structures) and some clue about where to start. Those are not givens, but they're certainly not impossible barriers, either. In this case, the team (three UK groups) is looking for BCR-Abl inhibitors, a perfectly reasonable test bed. A look through the literature suggested coupling hinge-binding motifs to DFG-loop binders through an acetylene linker, as in Ariad's ponatinib. This, while not a strategy that will earn you a big raise, is not one that's going to get you fired, either. Virtual screening around the structure, followed by eyeballing by real humans, narrowed down some possibilities for new structures. Further possibilities were suggested by looking at PDB structures of homologous binding sites and seeing what sorts of things bound to them.
So already, what we're looking at is less Automatic Lead Discovery than Automatic Patent Busting. But there's a place for that, too. Ten DFG pieces were synthesized, in Sonogashira-couplable form, and 27 hinge-binding motifs with alkynes on them were readied on the other end. Then they pressed the button and went home for the weekend. Well, not quite. They set things up to try two different optimization routines, once the compounds were synthesized, run through a column, and through the assay (all in flow). One will be familiar to anyone who's been in the drug industry for more than about five minutes, because it's called "Chase Potency". The other one, "Most Active Under Sampled", tries to even out the distributions of reactants by favoring the ones that haven't been used as often. (These strategies can also be mixed). In each case, the model was seeded with binding constants of literature structures, to get things going.
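The two selection rules are simple enough to sketch over the 10 × 27 grid of DFG pieces and hinge binders. The predicted-activity matrix and usage bookkeeping below are invented for illustration; only the strategy logic — greedy on potency versus favoring the least-used reactants — follows the paper's description.

```python
# Hedged sketch of "Chase Potency" vs. "Most Active Under Sampled"
# over the 10 x 27 grid of DFG-binders and alkyne-bearing hinge motifs.
import numpy as np

rng = np.random.default_rng(1)
n_dfg, n_hinge = 10, 27
predicted = rng.uniform(4, 9, size=(n_dfg, n_hinge))  # model-predicted pIC50s (synthetic)
made = np.zeros((n_dfg, n_hinge), dtype=bool)         # SAR heat-map coverage
dfg_uses = np.zeros(n_dfg, dtype=int)
hinge_uses = np.zeros(n_hinge, dtype=int)

def chase_potency():
    """Greedy: make the untested pairing with the best predicted activity."""
    scores = np.where(made, -np.inf, predicted)
    return np.unravel_index(np.argmax(scores), scores.shape)

def most_active_under_sampled():
    """Among pairings built from the least-used reactants, take the most
    active prediction -- evens out coverage of the reagent space."""
    usage = dfg_uses[:, None] + hinge_uses[None, :]   # reagent wear per pairing
    usage = np.where(made, np.iinfo(int).max, usage)
    least_used = usage == usage.min()
    scores = np.where(least_used, predicted, -np.inf)
    return np.unravel_index(np.argmax(scores), scores.shape)

for step in range(6):                                 # alternating = a mixed strategy
    picker = chase_potency if step % 2 else most_active_under_sampled
    i, j = picker()
    made[i, j] = True
    dfg_uses[i] += 1
    hinge_uses[j] += 1
```

Alternating between the two pickers is one way to mix the strategies, as the paper notes can be done; the real platform's mixing schedule isn't specified here.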
The first run, which took about 30 hours, used the "Under Sampled" algorithm to spit out 22 new compounds (there were six chemistry failures) and a corresponding SAR heat map. Another run was done with "Chase Potency" in place, generating 14 more compounds. That was followed by a combined-strategy run, which cranked out 28 more compounds (with 13 failures in synthesis). Overall, there were 90 loops through the process, producing 64 new products. The best of these were nanomolar or below.
But shouldn't they have been? The deck already has to be stacked to some degree for this technique to work at all in the present stage of development. Getting potent inhibitors from these sorts of starting points isn't impressive by itself. I think the main advantage to this is the time needed to generate the compound and the assay data. Having the synthesis, purification, and assay platform all right next to each other, with compound being pumped right from one to the other, is a much tighter loop than the usual drug discovery organization runs. The usual, if you haven't experienced it, is more like "Run the reaction. Work up the reaction. Run it through a column (or have the purification group run it through a column for you). Get your fractions. Evaporate them. Check the compound by LC/MS and NMR. Code it into the system and get it into a vial. Send it over to the assay folks for the weekly run. Wait a couple of days for the batch of data to be processed. Repeat."
The science-fictional extension of this is when we move to a wider variety of possible chemistries, and perhaps incorporate the modeling/docking into the loop as well, when it's trustworthy enough to do so. Now that would be something to see. You come back in a few days and find that the machine has unexpectedly veered off into photochemical 2+2 additions with a range of alkenes, because the Chase Potency module couldn't pass up a great cyclobutane hit that the modeling software predicted. And all while you were doing something else. And that something else, by this point, is . . . what, exactly? Food for thought.