I wrote last year about Foldit, a collaborative effort to work on protein structure problems that's been structured as an open-access game. Now the team is back with another report on how the project is going, and it's interesting stuff. The headlines have generally taken the "Computer Gamers Solve Incredible Protein Problem That Baffled Scientists!" line, but that's not exactly the full story.
The Foldit collaboration participated in the latest iteration of a regular protein-structure prediction challenge, CASP9. And their results varied - in the category of proteins with known structural homologs, for example, they didn't perform all that well. The players, it turned out, sort of over-worked the structures, and made a lot of unnecessary changes to the peripheral parts of the proteins. Another category took on proteins that have no identified structural homologs, a much harder problem. But that had its problems, too, which illustrate both the difficulties of the Foldit approach and protein modeling in general:
For prediction problems for which there were no identifiable homologous protein structures—the CASP9 Free Modeling category—Foldit players were given the five Rosetta Server CASP9 submissions (which were publicly available to other prediction groups) as starting points, along with the Alignment Tool. . .In this Free Modeling category, some of the shortcomings of the Foldit predictions became clear. The main problem was a lack of diversity in the conformational space explored by Foldit players because the starting models were already minimized with the same Rosetta energy function used by Foldit. This made it very difficult for Foldit players to get out of these local minima, and the only way for the players to improve their Foldit scores was to make very small changes ('tunneling' to the nearest local minimum) to the starting structures. However, this tunneling did lead to one of the most spectacular successes in the CASP9 experiment.
. . .the Rosetta Server, which carried out a large-scale search for the lowest-energy structure using computing power from Rosetta@home volunteers, produced a remarkably accurate model . . . However, the server ranked this model fourth out of the five submissions. The Foldit Void Crushers team correctly selected this near-native model and further improved it by accurately moving the terminal helix, producing the best model for this target of any group and one of the best overall predictions at CASP9 . . . Thus, in a situation where one model out of several is in a near-native conformation, Foldit players can recognize it and improve it to become the best model. Unfortunately for the other Free Modeling targets, there were no similarly outstanding Rosetta Server starting models, so Foldit players simply tunneled to the nearest incorrect local minima.
In the Refinement challenge, where participants take a minimized structure and try to improve its accuracy, the Foldit players had similar problems with starting from structures that had already been minimized by the same tools that they were using. Every change tended to make things look worse. The team improved their performance by reposting one of the structures as a new challenge, this time keeping the parts that were known with confidence to be near-native, while more or less randomizing the other parts to give a greater diversity to the starting points.
And those really are some of the key problems in this work. There are an awful lot of energy minima out there, and which ones you can get to depend crucially on where you start looking. In order to get to a completely different manifold of protein structures, even ones with much better energies, you may well have to go through a zone where you look like you're ruining everything. (And most of the time, you probably are ruining everything - there's no way to know if there's a safe haven on the other side or not).
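As a toy illustration of that point (not taken from the Foldit paper, and nothing like the real Rosetta energy function), here is a made-up one-dimensional "energy" with two wells. The function, the step size, and the names `energy` and `greedy_minimize` are all assumptions chosen for the sketch. The "player" only accepts small moves that lower the energy, so where it ends up depends entirely on where it starts:

```python
def energy(x):
    # Hypothetical double-well "energy": a deep well near x = -1.30
    # and a shallower one near x = 1.13, separated by a barrier near x = 0.17.
    return x**4 - 3 * x**2 + x


def greedy_minimize(x, step=0.01, max_steps=10_000):
    # Accept a small move left or right only if it lowers the energy;
    # stop as soon as neither neighbor is an improvement.
    for _ in range(max_steps):
        best = min((x - step, x + step), key=energy)
        if energy(best) >= energy(x):
            break
        x = best
    return x


trapped = greedy_minimize(1.5)   # starts on the right-hand slope
lucky = greedy_minimize(-0.5)    # starts on the left-hand slope

print(f"start  1.5 -> x = {trapped:.2f}, energy = {energy(trapped):.2f}")
print(f"start -0.5 -> x = {lucky:.2f}, energy = {energy(lucky):.2f}")
```

The run that starts at 1.5 settles into the shallower well and stays there, because every route to the deeper well goes uphill over the barrier first. That is the one-dimensional version of having to look like you're ruining everything before things get better.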
But this paper also reports the results that are getting the headlines, a structure for the Mason-Pfizer monkey retroviral protease. This is an interesting protein, because although it crystallizes readily (in several different forms), and although the structures of other retroviral proteases are known, no one has been able to solve this one from the available X-ray data. The Foldit players, however, came up with several proposals that fit the data well enough for the structure to finally fall out of the diffraction data. It does have some odd features in its protein loops, different enough from the other proteases for no one to have hit on it before.
And that really is an accomplishment, and the way it was solved (with different players building on the results of others, competing to get the best optimization scores) really is the way Foldit is supposed to work. Their less impressive performance on the CASP9 problems, though, shows that the same protein prediction difficulties apply to Foldit players as apply to the rest of the modeling field. This isn't a magic technique, and Foldit gamers are not going to rampage through the structural biology world solving all the extant problems any time soon. But it's nothing to sneeze at, either.
1. gwern on September 20, 2011 9:54 AM writes...
Paper: http://www.cs.washington.edu/homes/zoran/NSMBfoldit-2011.pdf
2. RB Woodweird on September 20, 2011 10:19 AM writes...
Can someone please explain in very very simple terms how the "right" answer was obtained when the right answer was supposedly unknown?
3. Steve Sweet on September 20, 2011 10:42 AM writes...
@2:
Assuming you're referring to CASP, the competition uses proteins for which the structure has just been experimentally determined, but not yet published.
4. Tom Womack on September 20, 2011 10:59 AM writes...
If I read the Nature paper correctly, the monkey retrovirus wasn't part of CASP. But there were structure factors known for it (i.e., people had run the X-ray diffraction experiment), and there's a technique called 'molecular replacement' which takes a model and some structure factors, and tells you whether the model is consistent with the structure factors.
The structure doesn't come out from PDB for a few hours, so I haven't looked at it; a 1.63A dataset which can't be solved by molecular replacement from known proteins is a very frustrating object, and I wonder what the properties of the protease that keep it from binding metals and being phased that way are.
5. O-Chemist on September 20, 2011 11:06 AM writes...
Two questions from an organic chemist to the experts:
a) Is polymorphism an issue? I mean: maybe the lowest-energy conformations just did not crystallize.
b) Do packing effects have a measurable influence?
Cheers
6. MIMD on September 20, 2011 11:44 AM writes...
Shades of Stargate Universe!
7. leftscienceawhileago on September 20, 2011 11:56 AM writes...
RB Woodweird,
To solve a structure with molecular replacement, you need to bootstrap with a model.
Homologs can work, but MR is sensitive to the initial model (I like to think of it as trying to orient a board with a fine array of pegs with another board that has a fine array of holes with your eyes closed. You'll have to search a large rotational and translational space...but when it's right it "snaps" in).
You can further confirm your correctness by seeing how the structure refines (tiny adjustments of atoms) and cross validate your model against your observed X-Ray reflections (you leave a few observations out and see if you are doing a good job of predicting out of sample observations as you refine your model parameters).
In this case, players came up with good models as starting points to solve the structure via MR. I suppose the other phasing attempts had failed (though I wouldn't doubt if someone would have gotten something like heavy atoms to work eventually).
8. KwadGuy on September 20, 2011 1:13 PM writes...
The summary is this:
The idea that we could jump-start the solution of an xray crystallographic structure using molecular replacement and a modeled structure is fairly old at this point. But this is, to my knowledge, the first time where it has actually worked in the real world, for a real set of data for which other methods had failed.
That's quite an interesting story.
However, there is little--VERY little--in this story to suggest that this is yet ready for prime time, or to even suggest we might expect such success again any time soon. It seems that the FoldIt guys got lucky--good enough prediction with just the right system.
I expect you'll see more people trying this in the wake of this announcement. But I doubt you will see a long string of new successes.
This may perk up some ears toward tangential approaches, however. For example, it may encourage additional efforts to combine NMR with Xray, where the NMR data is used to create a low resolution model which then is used with molecular replacement to solve the crystal structure--stuff like that. And it may heat up the already kind of hot SAXS (small-angle X-ray scattering) approach.
9. Curious Wavefunction on September 20, 2011 1:24 PM writes...
O-chemist: Do packing effects have a measurable influence?
Yes they can and sometimes they can be identified. For instance check out the following from the same group:
"Alternate states of proteins revealed by detailed energy landscape mapping."
Tyka MD, Keedy DA, André I, Dimaio F, Song Y, Richardson DC, Richardson JS, Baker D.
J Mol Biol. 2011, 405(2), 607-18.
It's worth reading for anyone interested in modeling proteins.
10. Anonymous on September 20, 2011 4:59 PM writes...
Now that I have some free time (i.e., a layoff from big pharma) I think I'm going to dust off my old Playstation 2 and get cracking! I may want to invest my severance package in some updated stuff though! LOL
11. Anonymous BMS Researcher on September 20, 2011 8:30 PM writes...
Local minima are the bane of any modeler's existence. For readers unfamiliar with how X-ray diffraction works, solving structures is an example of what mathematicians call an inverse problem: given a good structural model one can predict the diffraction pattern and compare that with actual data, but going the other way is like going from guacamole to the starting ingredients.
Another area of pharma research where local minima can be lots of fun is SAR (structure-activity relationships).
12. luysii on September 20, 2011 8:52 PM writes...
OK, but it's time to give the modelers something that's pure but hasn't been crystallized (perhaps because it is impossible to do so) and see what they do with it. The analogy is the glass eye fakeout given to medical students to keep them honest. For details see
http://luysii.wordpress.com/2009/11/29/time-for-the-glass-eye-test-to-be-inserted-into-casp/
13. cliffintokyo on September 20, 2011 9:28 PM writes...
@11: Don't get me started on SAR local minima [roadblock maxima]
FOLDIT seems to be an excellently conceived project, and just at a time when increased collaboration is being touted.
I hope that Kwad @8 is wrong and there are a few more successes soon, to help build momentum.
14. Anonymous Academic on September 21, 2011 6:37 AM writes...
"The idea that we could jump-start the solution of an xray crystallographic structure using molecular replacement and a modeled structure is fairly old at this point. But this is, to my knowledge, the first time where it has actually worked in the real world, for a real set of data for which other methods had failed."
Partly correct. There have been a few other successful attempts to phase X-ray data with ab initio models in the past - the Baker lab has published a couple of papers on this (including one in Nature in 2007), and a few other labs are doing large-scale studies on exactly how often this works. In these cases the structures had already been solved, but it's a very useful way to benchmark the accuracy of blind predictions, and I expect it will become a central feature of future CASP competitions.
A more extreme method involves molecular replacement using random helical fragments, combined with density modification and model-building - basically applying computational brute-force. This was done by a group in Spain and published in 2009 in Nature Methods; they call their method "Arcimboldo" (after a famous painter). I've definitely seen a couple of examples presented where it was used to phase genuinely unknown structures. (I've also seen simple helical models - protein or RNA - used to phase simple helical structures, which is a nice trick but less technically impressive.)
15. Anonymous Academic on September 21, 2011 6:41 AM writes...
[Corante seems determined to truncate my comments. Lame!]
The real problem with these methods (aside from substantial computational needs) is that they're limited to certain classes of proteins: for Rosetta prediction, small (ab initio modeling used to phase the kind of crystal structures that end up as featured articles in Science or Nature for a very long time. It may not matter in the end, since as the number of known structures increases, the likelihood of molecular replacement working will keep getting better - but it never hurts to have extra tricks to try on tough cases.
16. Titon Jor on September 21, 2011 9:25 PM writes...
Ah, Foldit. Responsible for lowering my Erdos number to 4.
Good times.