I wrote last year about Foldit, a collaborative effort to work on protein structure problems that's been structured as an open-access game. Now the team is back with another report on how the project is going, and it's interesting stuff. The headlines have generally taken the "Computer Gamers Solve Incredible Protein Problem That Baffled Scientists!" line, but that's not exactly the full story.
The Foldit collaboration participated in the latest iteration of a regular protein-structure prediction challenge, CASP9. And their results varied - in the category of proteins with known structural homologs, for example, they didn't perform all that well. The players, it turned out, sort of over-worked the structures, and made a lot of unnecessary changes to the peripheral parts of the proteins. Another category took on proteins that have no identified structural homologs, a much harder problem. But that had its problems, too, which illustrate the difficulties of both the Foldit approach and protein modeling in general:
For prediction problems for which there were no identifiable homologous protein structures—the CASP9 Free Modeling category—Foldit players were given the five Rosetta Server CASP9 submissions (which were publicly available to other prediction groups) as starting points, along with the Alignment Tool. . .In this Free Modeling category, some of the shortcomings of the Foldit predictions became clear. The main problem was a lack of diversity in the conformational space explored by Foldit players because the starting models were already minimized with the same Rosetta energy function used by Foldit. This made it very difficult for Foldit players to get out of these local minima, and the only way for the players to improve their Foldit scores was to make very small changes ('tunneling' to the nearest local minimum) to the starting structures. However, this tunneling did lead to one of the most spectacular successes in the CASP9 experiment.
. . .the Rosetta Server, which carried out a large-scale search for the lowest-energy structure using computing power from Rosetta@home volunteers, produced a remarkably accurate model . . . However, the server ranked this model fourth out of the five submissions. The Foldit Void Crushers team correctly selected this near-native model and further improved it by accurately moving the terminal helix, producing the best model for this target of any group and one of the best overall predictions at CASP9 . . . Thus, in a situation where one model out of several is in a near-native conformation, Foldit players can recognize it and improve it to become the best model. Unfortunately for the other Free Modeling targets, there were no similarly outstanding Rosetta Server starting models, so Foldit players simply tunneled to the nearest incorrect local minima.
In the Refinement challenge, where participants take a minimized structure and try to improve its accuracy, the Foldit players had similar problems with starting from structures that had already been minimized by the same tools that they were using. Every change tended to make things look worse. The team improved their performance by reposting one of the structures as a new challenge, this time keeping the parts that were known with confidence to be near-native, while more or less randomizing the other parts to give a greater diversity to the starting points.
And those really are some of the key problems in this work. There are an awful lot of energy minima out there, and which ones you can get to depends crucially on where you start looking. In order to get to a completely different manifold of protein structures, even ones with much better energies, you may well have to go through a zone where you look like you're ruining everything. (And most of the time, you probably are ruining everything - there's no way to know if there's a safe haven on the other side or not.)
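For a feel of that starting-point problem, here's a toy sketch in Python. To be clear about the assumptions: the one-dimensional `energy` function below is something I made up for illustration (a tilted double well), the names `energy`, `gradient`, and `minimize` are mine, and none of this resembles the actual Rosetta energy function or anything Foldit runs. It's just a downhill-only minimizer on a landscape with two wells, one deeper than the other.

```python
def energy(x):
    """Toy 1-D 'energy landscape' with two wells (illustrative only)."""
    return x**4 - 3 * x**2 + x


def gradient(x):
    """Derivative of the toy energy function above."""
    return 4 * x**3 - 6 * x + 1


def minimize(x, step=0.01, iterations=5000):
    """Plain gradient descent: it only ever moves downhill from its start."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


# Two starting points, one on each side of the landscape.
right = minimize(2.0)   # settles into the shallower well
left = minimize(-2.0)   # settles into the deeper well

# Highest energy met on a straight path between the two wells:
# the "looks like you're ruining everything" zone.
barrier = max(energy(right + t * (left - right) / 100) for t in range(101))

# A deliberately diverse set of starts; keep whichever ends up lowest.
starts = [-2.0, -0.5, 0.5, 2.0]
best = min((minimize(s) for s in starts), key=energy)

if __name__ == "__main__":
    for label, x in (("start at 2.0", right), ("start at -2.0", left)):
        print(f"{label}: x = {x:.3f}, energy = {energy(x):.3f}")
    print(f"barrier between the wells: energy = {barrier:.3f}")
    print(f"best of diverse starts: x = {best:.3f}")
```

The minimizer that starts on the right never finds the deeper well on the left, because getting there means climbing over the barrier first, and a downhill-only search won't do that. Spreading out the starting points is the crude fix here, and it's loosely the same idea as reposting a structure with its uncertain parts scrambled.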
But this paper also reports the results that are getting the headlines, a structure for the Mason-Pfizer monkey retroviral protease. This is an interesting protein, because although it crystallizes readily (in several different forms), and although the structures of other retroviral proteases are known, no one has been able to solve this one from the available X-ray data. The Foldit players, however, came up with several proposals that fit the data well enough for the structure to finally fall out of the diffraction data. It does have some odd features in its protein loops, different enough from the other proteases for no one to have hit on it before.
And that really is an accomplishment, and the way it was solved (with different players building on the results of others, competing to get the best optimization scores) really is the way Foldit is supposed to work. Their less impressive performance on the CASP9 problems, though, shows that the same protein prediction difficulties apply to Foldit players as apply to the rest of the modeling field. This isn't a magic technique, and Foldit gamers are not going to rampage through the structural biology world solving all the extant problems any time soon. But it's nothing to sneeze at, either.