Corante

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship on his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek email him directly: derekb.lowe@gmail.com Twitter: Dereklowe

In the Pipeline

July 15, 2007

Proteomics 101

Posted by Derek

Over at the entertaining culture-blog 2Blowhards, the comments to this post (on people who feel deficient in math ability) include a mention of proteomics, which prompted Michael Blowhard to say:

"'Proteomics' -- even the word is scary. I wonder how people in the field are going to communicate the substance and importance of what they're up to to civilians ... A challenge, I guess."

A challenge that I'm willing to take up! It's not my exact field, of course, but close enough. I'm starting a new category for posts like this, when I (and the readership here, in the comments) try to explain some technical buzzword-laden area in language that intelligent non-scientists can profit from. So. . .proteomics.

The place to start, most likely, is where the word came from. It's a direct steal from "genomics", the study of genomes, which are the total DNA sequences of a species (or individuals of a species). Back a few years ago when the human genome was being sequenced for the first time (all the individual A T C G letters being read off), it became clear that the number of genes that humans carry around was very much on the low side of what most people expected. (The human genome, as we have it today, is a composite - the number of people in the world who have their complete genome read can be counted on one hand. That's going to change drastically in the years to come as the process gets cheaper, faster, and more useful).

The reason why people expected more genes relates to what a gene is: a stretch of DNA that's read off (transcribed) and turned into a specific protein. That's DNA's job; it's a set of coded instructions to make proteins. But, as it happens, we have a lot more different proteins than we have genes. Clearly, something more happens downstream of the DNA part of the process.

A lot of things happen, actually. Those first-made proteins get altered in all sorts of ways. The same protein can be folded into different shapes, for starters (we're just now recognizing how important a process this is in some diseases). Proteins can also be clipped into smaller ones by many different routes, and at any stage they'll be decorated with molecular tinsel like sugars and lipids and phosphates. All of those can totally change a protein's function. This gives you some idea of where all that diversity is coming from - and why sequencing the human genome, huge and necessary accomplishment though it was, was nowhere near the end of the story.

Proteins spend their time interacting with other proteins. If you think of a cell in your body as a large irregularly shaped bag, full of intricate (and somewhat squishy) 3-D jigsaw pieces which are constantly sluicing around assembling or sliding past each other, you'll have a pretty reasonable idea of what it's like in there. Any given cell will contain thousands upon thousands of different proteins, many of which are doing multiple jobs depending on the time and place. Proteomics is the attempt to understand which proteins are doing what, when, with whom, and why.

It hardly needs saying, but we're just at the very beginning of that study. We have some tools to track these interactions, and they're far better than anything people had twenty or thirty years ago, but they're still rather crude compared to what we need. Huge signaling networks get uncovered and extended, and are found to touch upon others for reasons that are unclear. All sorts of feedback loops and backup systems are sketched in, and many pathways have been missed (or, alternatively, assigned too much importance) because they only operate under certain special conditions that our assays may overemphasize or skip entirely.

This project is much harder than the deciphering of the genome, and will take much longer. But that's because it's much closer to the real-time workings of a living organism, which means that comprehension, when it comes, will be still more valuable. Really substantial sums are being spent on this stuff, along with serious brainpower and computing resources. Progress will be jerky, irregular, infuriating, and of very great interest indeed.

Comments (20) + TrackBacks (0) | Category: Pharma 101


COMMENTS

1. Anonymous BMS Researcher on July 16, 2007 6:40 AM writes...


> it became clear that the number of genes that humans carry around was very much on the low side of what most people expected.

Count me among those who were utterly stunned by this discovery -- in the late 1990s I fully expected the number of genes (by any reasonable definition, defining what is meant by "number of genes" is itself a major can of worms) would be at least 75 thousand and possibly over 100 thousand.

> the number of people in the world who have their complete genome read can be counted on one hand.

I have met one of those people and observed first hand the amazing size of his ego. When will we get the first complete genome of a human who does not have a Y chromosome?

2. Anonymous BMS Researcher on July 16, 2007 6:50 AM writes...


Followup to my previous comment:

When attempting to explain genomics to lay folks, I often ask them to imagine printing out in hexadecimal format on a monstrous stack of paper the complete binary for any current computer operating system, then sending that back to 1957 by time machine for the brightest minds of that era to reverse engineer. Puzzling out the information-processing systems of life is at least as hard a problem and possibly harder.

3. Submarine on July 16, 2007 7:00 AM writes...

> But, as it happens, we have a lot more different proteins than we have genes.
> Clearly, something more happens downstream of the DNA part of the process.

Agreed, but don't forget that we also have a lot more DNA than genes, and much of that DNA is not junk. It is playing a critical role in determining when and where proteins encoded by the relatively modest number of genes are expressed.

4. Anonymous BMS Researcher on July 16, 2007 7:18 AM writes...


> much of that DNA is not junk

You are absolutely right, the non-coding part of the genome is clearly of tremendous importance.

And we are barely beginning to comprehend what sorts of things it might be doing. There are huge stretches of non-coding DNA that have been conserved for enormous amounts of evolutionary time, so clearly they must do SOMETHING important but as yet we have little clue WHAT most of these sequences do.

5. RKN on July 16, 2007 7:19 AM writes...

> Proteomics is the attempt to understand which proteins are doing what, when, with whom, and why.

Tho I agree with this, at the risk of making the matter even murkier, this is really the area of Interactomics.

One advantage of Proteomics over Genomics, at least in terms of the study of expression changes, is that proteins ("decorated" and non-) are the immediate effectors of phenotype. Changes in a message (treated vs. control) can be telling, but unimportant to phenotype unless that message is proportionally translated. I think one of the biggest challenges right now in Proteomics is the effort to measure the change in specific protein isoforms. But first we have to identify those specific isoforms, and that is anything but easy.

6. GATC on July 16, 2007 1:10 PM writes...

I put the various "polyomics" in the same category as "systems biology", or as Josh Lederberg once said "what we used to call physiology".

So Derek, now that you are firmly in place up there in Cambridge, perhaps you could ask around and get a good definition from the Harvard crowd as to what is "chemical biology" and how that relates to what we used to call "biochemistry".

7. Caleb on July 16, 2007 1:31 PM writes...

We have three graduate programs at the school I currently attend that are very similar: Biological Chemistry, Chemical Biology, and Medicinal Chemistry (plus a Chemical Biology "track" within the Chem Dept)! Funny thing is, most of the faculty are cross-listed so it doesn't really matter what program you're in. Generally speaking, the biological chem folks don't do any synthesis and are more likely to use model organisms such as Drosophila or knockout mice; the med chem and chem bio labs have a synthetic and biological component and primarily use organisms such as E. coli and yeast.

8. Interested Layperson on July 16, 2007 2:59 PM writes...

> Huge signaling networks get uncovered and extended...

You might want to edit the description to define "signaling networks" and "pathways". As a non-biologist who works in the industry, those leap out as jargon to me. I now have a sense of what they mean, but I remember when I had to learn it.

Nice work!

9. Anonymous BMS Researcher on July 16, 2007 4:44 PM writes...


Here's a stab at fairly brief -- and therefore oversimplified -- definitions of "signaling networks" and "pathways."

First off, let me introduce an engineering analogy. In an engineering system there will be two main sets of wires connecting various components. One set, known as the "control circuits," mainly conveys INFORMATION about the current and desired states of the system. These typically are small wires, carrying relatively low voltages and currents. The wires in a second set, known as the "power circuits," are much larger because they carry POWER at higher voltages and currents to the motors and actuators that do whatever physical work the system is designed to perform.

When you press the button for your desired floor in an elevator (a "lift" in the UK), you close a connection in the control circuit which sends a small amount of electrical energy to the controller telling it where you want the elevator to go. In response, the controller will actuate a contactor that sends a much larger amount of power to the motor and up the elevator goes. Sensors detect when it has reached your desired floor and send small amounts of electrical energy to the controller, which then de-energizes the contactor, and the contactor stops sending power to the motor, sends power to smaller motors that open the appropriate doors, and so forth.

Before I became a biologist I was an engineer and once I actually worked on some control devices that my company sold to Otis Elevator (even though this was over 20 years ago, elevators have long service lives so probably even now some elevators are being controlled by gadgets I helped design). But I am digressing here; the point is that the control circuit does not directly move the elevator, it does so by its effects on the state of the power circuit.

Of course, this distinction between "control" and "power" circuits is oversimplified because even the "control" circuits carry some energy and even the "power" circuits carry some information.

In somewhat the same fashion, biologists describe the molecular equivalents of control circuits as "signaling networks" by which various molecules (hormones and a zillion others) convey information from one part of the organism to another and/or integrate information from various sources. A "pathway" is a series of biochemical reactions that make or destroy substances needed by the organism, and they tend to consume much larger amounts of energy in their operations; these are analogous to the power circuits of electrical engineering. In general, signaling networks have their effects on the organism and its physical world through their effects on the operation of pathways. For example, when I wave a toy at my cat, her eyes send messages through various signaling networks to her brain, where other signaling networks trigger energy conversion pathways in various muscles and she chases after the toy that I am waving.

However, the distinction between a signaling network and a pathway is even less clear-cut than is the electrical engineering distinction between the control and power circuits.

10. MTK on July 16, 2007 4:54 PM writes...

Beyond just the pure size of the proteome and the complexity of it due to the isoforms, posttranslational modifications (sorry, jargon), and multiple protein-protein interactions, there's also the fact that many of the most interesting, i.e. relevant to various disease states, proteins are probably low abundance proteins. Tough to study when there is no protein equivalent to PCR. Sensitivity is a huge issue.

Hey, at least you can semi-describe and characterize a protein by its linear amino acid sequence. There are also only 20 naturally occurring amino acids, all with the same stereochemistry. Compare that to the glycome.

11. roadnottaken on July 16, 2007 6:40 PM writes...

exactly, MTK. I've heard it said that in a typical cell protein abundances vary over six orders of magnitude. that is an enormous challenge and i can tell you from experience that detecting your low-abundance proteins in a background of actin and tubulin etc can be very challenging.

regarding your second point, metabolomics is a much nastier beast. as you mentioned, at least one always has the ability to sequence proteins, but compared to proteins the subunits of metabolites seem almost infinite. whereas with proteomics, detection and quantitation is the major challenge, simply determining the identity of a particular metabolite can take months.

12. RKN on July 16, 2007 7:03 PM writes...

> Tough to study when there is no protein equivalent to PCR. Sensitivity is a huge issue.

The big challenge in proteomics is separation (chromatography), not amplification. If you can satisfactorily separate the proteins in a sample, and digest them properly, mass spectrometry will find the peptides. Modern mass spectrometers have attomole sensitivity, and quantitation (control vs. treated/disease) of even low abundance proteins is readily achievable.

13. TNC on July 16, 2007 7:28 PM writes...

There's no protein equivalent to PCR? You may have just written my next proposal! ;)

14. SRC on July 16, 2007 7:40 PM writes...

I have a modest proposal: the next person who defines yet another risible "ome" (metabolome qualifies here) is put in a Waring blender to have his proteome/genome/gnomeome extracted - all of it. It's the biological equivalent of including "gate" as the suffix of any remotely questionable political transaction.

/curmudgeon

15. roadnottaken on July 16, 2007 7:49 PM writes...

SRC: then what would you call the untargeted analysis of biological small-molecules? i agree that some -omics words are silly (my favorite is snake venomics) but oftentimes it's the most concise way to express a concept. if a lexical construction is useful then use it, i say. i think the defining feature of an -omic is untargeted/global measurement which is qualitatively different from the way biology used to be done (i.e. isolate then study).

16. Anonymous BMS Researcher on July 17, 2007 6:59 AM writes...


Can any reader point us to the paper I saw a few years ago and cannot find now in which the authors say of a term they have just defined something like "this name was carefully chosen for its resistance to being given an omics suffix."?

I've tried Googling variations on such a sentence, but all I get is various -omics hits...

17. MTK on July 17, 2007 10:09 AM writes...

TNC,

If you come up with a protein chemistry PCR equivalent, it's time to buy your tickets to Stockholm, brush off the tux, and get ready to pose with Randolph and Mortimer Duke.

18. RNAbiologist on July 17, 2007 3:00 PM writes...

For the sake of completeness in Derek's synopsis of 'how one gene becomes multiple proteins' I think one should mention the enormous diversity resulting from alternate mRNA splicing. His original post glossed over this aspect, and it's important for the less-familiar-with-biology crowd to know that it exists.

In summary: DNA is transcribed into RNA, which is spliced - i.e., one or more pieces are removed from the sequence and the remainder is shipped out of the nucleus. This RNA is then translated into a protein. The sequences removed from the transcript vary: sometimes sections are removed, sometimes they're not. Many genes have multiple sequences removed that are thousands of bases apart. No one understands how this works. What's more, due to the triplet nature of the codons used in translation, a frameshift can result in largely different protein sequences from a single gene. (In English: RNA is translated into protein sequence based on three-base combinations of RNA. Three bases of RNA result in a single amino acid being produced. If a transcript has, say, 8 bases removed from the middle of the RNA, everything downstream of the 'splice site' will be translated into a totally different amino acid sequence versus the unspliced transcript.)

Anyway, this is another area where the 'genomics revolution' has fallen a bit short.
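For readers who like to see this sort of thing spelled out, here is a minimal Python sketch of the frameshift described above. The RNA sequence is made up for illustration and the function name `translate` is hypothetical; the codon table is the standard genetic code. It translates one toy transcript three ways: intact, with 8 bases removed (not a multiple of three, so the reading frame shifts), and with 9 bases removed (a multiple of three, so the downstream frame is preserved).

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((a, b, c) for a in BASES for b in BASES for c in BASES),
        AMINO_ACIDS,
    )
}


def translate(rna: str) -> str:
    """Translate RNA three bases at a time, stopping at a stop codon
    or when fewer than three bases remain."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# A made-up 30-base transcript (not a real gene).
full = "AUGGCUGAAUUCGGAUCCAAGCUUGGGUAA"

# Remove 8 bases after the first two codons: the frame shifts.
minus_8 = full[:6] + full[14:]

# Remove 9 bases at the same spot: the frame is preserved.
minus_9 = full[:6] + full[15:]

print(translate(full))     # MAEFGSKLG
print(translate(minus_8))  # MAIQAWV
print(translate(minus_9))  # MASKLG
```

In the 8-base case everything after the first two amino acids changes, while in the 9-base case the tail of the protein (SKLG) survives intact.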

19. Jonadab the Unsightly One on July 21, 2007 1:48 PM writes...

SRC: It could be worse. Instead of inventing words like "genomics", "proteomics", "metabolomics", and (worst, IMO) "interactomics", they could be calling the whole lot of it "Biology 2.0"

P.S.: how about "chain reactionomics"?

20. Ian Musgrave on July 22, 2007 11:59 PM writes...

I've just come back from the International Brain Research Organisation Meeting in Melbourne, where I was exposed to the delightful concept of "pocketomics" (An J, et al., Mol Cell Proteomics. 2005 Jun;4(6):752-61), the universe of ligand binding pockets.

Some nit picks:

> Back a few years ago when the human genome was being sequenced for the first time ..., it became clear that the number of genes that humans carry around was very much on the low side of what most people expected.

Actually, this isn't quite correct. There were a number of estimates, many of which were around the final figure, see Facts and Myths Concerning the Historical Estimates of the Number of Genes in the Human Genome.

Submarine wrote:

> Agreed, but don't forget that we also have a lot more DNA than genes, and much of that DNA is not junk.

It depends on what you mean by "much". About 1.2-2% codes for proteins, and a similar figure codes for structural and regulatory RNA. Regulatory sequences may account for between 5% and 20% (at the most optimistic, if you take the over-hyped ENCODE project at face value). So, even at the most optimistic, 70% of the genome is doing nothing (at least 8% is broken retroviruses, around 2-5% are broken genes). You can delete great swaths of mouse non-coding DNA without effect, and you can delete huge chunks of the most highly conserved non-coding DNA in the mouse without effect either.

It doesn't necessarily mean these sequences are not functional in some way, but that any claim as to the importance of conserved non-coding sequences should be taken with a big grain of salt (and again, these sequences constitute a minor part of the overall DNA of vertebrates) until some actual function is found.
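The back-of-the-envelope arithmetic behind that "70%" can be sketched as follows. This is only a tally of the upper-end percentages quoted in the comment above, treated as assumptions rather than measured values; the variable names are illustrative.

```python
# Upper-end figures quoted in the comment, as whole percentages of the genome.
protein_coding = 2   # "about 1.2-2%"
functional_rna = 2   # "a similar figure" for structural and regulatory RNA
regulatory = 20      # most optimistic estimate for regulatory sequences

accounted_for = protein_coding + functional_rna + regulatory
unaccounted_for = 100 - accounted_for

print(accounted_for, unaccounted_for)  # 24 76
```

Even with every estimate pushed to its upper end, roughly three quarters of the genome has no assigned function, which is in line with the "70%" figure given above.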
