Here's a new paper in PLoS ONE on drug development over the past 20 years. The authors use a large database of patents and open-literature publications to draw connections between the two, and between individual drug targets and the number of compounds that have been disclosed against them. Their explanation of patents and publications is a good one:
. . .We have been unable to find any formal description of the information flow between these two document types but it can be briefly described as follows. Drug discovery project teams typically apply for patents to claim and protect the chemical space around their lead series from which clinical development candidates may be chosen. This sets the minimum time between the generation of data and its disclosure to 18 months. In practice, this is usually extended, not only by the time necessary for collating the data and drafting the application but also where strategic choices may be made to file later in the development cycle to maximise the patent term. It is also common to file separate applications for each distinct chemical series the team is progressing.
While some drug discovery operations may eschew non-patent disclosure entirely, it is nevertheless common practice (and has business advantages) for project teams to submit papers to journals that include some of the same structures and data from their patents. While the criteria for inventorship are different than for authorship, there are typically team members in-common between the two types of attribution. Journal publications may or may not identify the lead compound by linking the structure to a code name, depending on how far this may have progressed as a clinical candidate.
The time lag can vary between submitting manuscripts immediately after filing, waiting until the application has published, deferring publication until a project has been discontinued, or the code name may never be publically resolvable to a structure. A recent comparison showed that 6% of compound structures exemplified in patents were also published in journal articles. While the patterns described above will be typical for pharmaceutical and biotechnology companies, the situation in the academic sector differs in a number of respects. Universities and research institutions are publishing increasing numbers of patents for bioactive compounds but their embargo times for publication and/or upload of screening results to open repositories, such as PubChem BioAssay, are generally shorter.
There are also a couple of important factors to keep in mind during the rest of the analysis. The authors point out that their database includes a substantial number of "compounds" that are not small, drug-like molecules (these are antibodies, proteins, large natural products, and so on). In total, from 1991 to 2010, they have about one million compounds from journal articles and nearly three million from patents. And on the "target" side of the database, a significant number of counterscreens are included that are not drug targets as such, so it might be better to call the whole thing a compound-to-protein mapping exercise. That said, what did they find?
Here's the chart of compounds/target, by year. The peak and decline around 2005 is quite noticeable, and is corroborated by a search through the PCT patent database, which shows a plateau in pharmaceutical patents around this time (which has continued until now, by the way).
Looking at the target side of things, with those warnings above kept in mind, shows a different picture. The journal-publication side of things really has shown an increase over the last ten years, with an apparent inflection point in the early 2000s. What happened? I'd be very surprised if the answer didn't turn out to be genomics. If you want to see the most proximal effect of the human genomics frenzy from around that time, there you have it in the way that curve bends around 2001. Year-on-year, though (see the full paper for that chart), the targets mentioned in journal publications seem to have peaked in 2008 or so, and have either plateaued or actually started to come back down since then. (Update: fixed the second chart, which had been a duplicate of the first.)
The authors go on to track a number of individual targets by their mentions in patents and journals, and you can certainly see a lot of rise-and-fall stories over the last 20 years. Those actual years should not be over-interpreted, though, because of the delays (mentioned above) in patenting, and the even longer delays, in some cases, for journal publication from inside pharma organizations.
So what's going on with the apparent decline in output? The authors have some ideas, as do (I'm sure) readers of this site. Some of those ideas probably overlap pretty well:
While consideration of all possible causative factors is outside the scope of this work it could be speculated that the dominant causal effect on global output is mergers and acquisition activity (M&A) among pharmaceutical companies. The consequences of this include target portfolio consolidations and the combining of screening collections. This also reduces the number of large units competing in the production of medicinal chemistry IP. A second related factor is less scientists engaged in generating output. Support for the former is provided by the deduction that NME output is directly related to the number of companies and for the latter, a report that US pharmaceutical companies are estimated to have lost 300,000 jobs since 2000. There are other plausible contributory factors where finding corroborative data is difficult but nonetheless deserve comment. Firstly, patent filing and maintenance costs will have risen at approximately the same rate as compound numbers. Therefore part of the decrease could simply be due to companies, quasi-synchronously, reducing their applications to control costs. While this happened for novel sequence filings over the period of 1995–2000, we are neither aware any of data source against which this hypothesis could be explicitly tested for chemical patenting nor any reports that might support it. Similarly, it is difficult to test the hypothesis of resource switching from “R” to “D” as a response to declining NCE approvals. Our data certainly infer the shrinking of “R” but there are no obvious metrics delineating a concomitant expansion of “D”. A third possible factor, a shift in the small-molecule:biologicals ratio in favour of the latter is supported by declared development portfolio changes in recent years but, here again, proving a causative coupling is difficult.
Causality is a real problem in big retrospectives like this. The authors, as you see, are appropriately cautious. (They also mention, as a good example, that a decline in compounds aimed at a particular target can be a signal of either success or failure.) But I'm glad that they've made the effort here. It looks like they're now analyzing the characteristics of the reported compounds with time and by target, and I look forward to seeing the results of that work.
Update: here's a lead author of the paper with more in a blog post.