Well, here’s another crack at open-source science. Stephen Friend, the former head of Rosetta (both before and after Merck bought it), is heading out on his own to form a venture in Seattle called Sage. The idea is to bring together genomic studies from all sorts of laboratories into a common format and database, with the expectation that interesting results will emerge that couldn’t be found from just one lab’s data.
I’ll be interested to see if this does yield something worthwhile – in fact, I’ll be interested to see if it gets off the ground at all. As I’ve discussed before, the analogy with open-source software doesn’t hold up so well with most scientific research these days, since the entry barriers (facilities, equipment, and money) are significantly higher than they are in coding. Look at genomics – the cost of sequencing has been dropping, for sure, but it’s still very expensive to get into the game. That lowered cost is measured per base sequenced, and today’s technology means that you sequence more bases, so the absolute cost hasn’t come down as much as you might think. I’m sure you can get ten-year-old equipment cheap, but it won’t let you do the kind of experiments you might want to do, at least not in the time you’ll be expected to do them in.
But even past that issue, once you get down to the many labs that can do high-level genomics (or to the even larger number that can do less extensive sequencing), the problems will be many. Sage is also going to look at gene expression levels, something that's easier to do (although we're still not in weekend-garage territory yet). Some people would say that it's a bit too easy to do: there are a lot of different techniques in this field, not all of which always yield comparable data, to put it mildly. There have been several attempts to standardize things, along with calls for more control experiments, but getting all these numbers together into a useful form will still not be trivial.
Then you've got the really hard issues: intellectual property, for one. If you do discover something by comparing all these tissues from different disease states, who gets to profit from it? Someone will want to, that's for sure, and if Sage itself isn't getting a cut, how will they keep their operation going? Once past that question (which is a whopper), and past all the operational questions, there's an even bigger one: is this approach going to tell us anything we can use at all?
At first thought, you'd figure that it has to. Gene sequences and gene expression are indeed linked to disease states, and if we're ever going to have a complete understanding of human biology, we're going to have to know how. But... we're an awful long way from that. Look at the money that's been poured into biomarker development by the drug industry. A reasonable amount of that has gone into gene expression studies, trying to find clear signs and correlations with disease, and it's been rough sledding.
So you can look at this two ways. You can say fine, that means that the correlations may well be there, but they're going to be hard to find, so we're going to have to pool as much data as possible to do it. Thus Sage, and good luck to them. Or you can say that the systems may be so complex that useful correlations won't be apparent at all, at least at our current level of understanding. I'm not sure which camp I fall into, but we'll have to keep making the effort in order to find out who's right.