Introduction to Proteomics
Author and Curator: Larry H. Bernstein, MD, FCAP
We have had a considerable, extended discussion of proteins and peptides, protein synthesis, amino acid incorporation into protein, and the metabolism of carbohydrates and lipids. It is also clear that for over a century the practice of medicine and the classification of biological systems have depended heavily on observed phenotypic traits and on disturbances of normal function that could be measured through traditional metabolic pathways.
What did we gain from the genomic revolution?
- Traceability of protein expression to a basic coded message
- The possibility of tracing disturbed cellular function to mutation related loss-of-function
- The ability to trace generational traits over long periods of time
- The promise of regenerating the enterprise of pharmacology and pharmaceutical intervention by silencing or readjusting regulated metabolic pathways to bring about an adaptive rebalancing that favors extended life
What can we expect as we progress further as a result of the last two decades?
- A huge amount of information is now available, but much of the information needed to master the life processes is still missing.
- There is a complex web of knowledge that goes beyond the genome, the one-gene one-enzyme hypothesis, and the DNA-RNA-protein hypothesis. It can only be realized by fuller disclosure of the many metabolic control circuits involved in cellular homeostasis and adaptive control.
- Understanding this cellular balancing will require comprehensive exploration of the proteome and of the active role of proteins and peptides in the functioning of all cells and of the organism.
- Proteomics will open up the discovery of new approaches to diagnostics and pharmaceutical discovery.
What about proteins? What can proteins do? What can’t they do!
- Enzymes are proteins that make sure that chemical reactions in your body take place up to a million times faster than they would without enzymes.
- Antibodies are proteins that help your immune system to fight disease.
- When you get an injury, the bleeding stops because of blood clots, thanks to the proteins fibrinogen and thrombin.
- Transport! Some proteins carry vitamins or hormones from one place to another, or form tunnels (pores) in cell membranes that will let only specific molecules (or ions) through. Hemoglobin, a protein in your blood, carries oxygen from your lungs to your cells.
- Strength and support! Other proteins like collagen and keratin are strong and tough and make up your skin, hair, and fingernails. Collagen also supports your cells and organs so they don’t slosh around.
- Motion! The proteins myosin and actin make up much of your muscle tissue. They work together so your muscles can move you around. Some single-celled organisms have cilia or flagella made out of proteins, which they can whip around to move from place to place.
http://www.pslc.ws/macrog/kidsmac/protein.htm
Proteins (/ˈproʊˌtiːnz/ or /ˈproʊti.ɨnz/) are large biological molecules, or macromolecules,
- consisting of one or more long chains of amino acid residues.
Proteins perform a vast array of functions within living organisms, including
- catalyzing metabolic reactions,
- replicating DNA,
- responding to stimuli, and
- transporting molecules from one location to another.
Proteins differ from one another primarily in
- their sequence of amino acids,
- which is dictated by the nucleotide sequence of their genes, and
- which usually results in folding of the protein into
- a specific three-dimensional structure that determines its activity.
A linear chain of amino acid residues is called a polypeptide. A protein contains at least one long polypeptide. Short polypeptides, containing fewer than about 20-30 residues, are rarely considered to be proteins and are commonly called peptides, or sometimes oligopeptides. Adjacent amino acid residues are joined together by peptide bonds. The sequence of amino acid residues in a protein is defined by
- the sequence of a gene, which is encoded in the genetic code.
In general, the genetic code specifies 20 standard amino acids; however, in certain organisms the genetic code can include selenocysteine and—in certain archaea—pyrrolysine. Shortly after or even during synthesis,
- the residues in a protein are often chemically modified by posttranslational modification,
- which alters the physical and chemical properties, folding, stability, activity, and ultimately, the function of the proteins.
http://en.wikipedia.org/wiki/Protein
Posttranslational modification (PTM) is a step in protein biosynthesis. Proteins created by ribosomes translating mRNA into polypeptide chains may undergo PTM (such as folding, cutting and other processes) before becoming the mature protein product. After translation, the posttranslational modification of amino acids extends the range of functions of the protein by attaching it to other biochemical functional groups (such as acetate, phosphate, various lipids and carbohydrates), changing the chemical nature of an amino acid (e.g. citrullination), or making structural changes (e.g. formation of disulfide bridges).
Also, enzymes may remove amino acids from the amino end of the protein, or cut the peptide chain in the middle. For instance, the peptide hormone insulin is cut twice after disulfide bonds are formed, and a propeptide is removed from the middle of the chain; the resulting protein consists of two polypeptide chains connected by disulfide bonds. Most nascent polypeptides start with the amino acid methionine because the “start” codon on mRNA also codes for this amino acid. This amino acid is usually taken off during posttranslational modification. Other modifications, like phosphorylation, are part of common mechanisms for controlling the behavior of a protein, for instance activating or inactivating an enzyme.
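The point about the start codon and methionine can be made concrete with a short Python sketch. It builds the standard genetic code table, translates an mRNA from the first AUG to the first stop codon, and then strips the leading methionine. The function names and the example mRNA are illustrative assumptions, not taken from any source cited here, and the methionine removal is deliberately simplified: in cells it depends on the neighboring residue and does not always happen.

```python
from itertools import product

# Standard genetic code, codons ordered by first, second, third base over U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG start codon up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    residues = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        residues.append(aa)
    return "".join(residues)


def remove_initiator_methionine(protein: str) -> str:
    """Simplified model: always strip a leading Met (an assumption, see text)."""
    return protein[1:] if protein.startswith("M") else protein


if __name__ == "__main__":
    # Hypothetical mRNA fragment made up for this example.
    nascent = translate("AUGGGCAUCGUGGAGCAGUGA")
    print(nascent)                               # MGIVEQ
    print(remove_initiator_methionine(nascent))  # GIVEQ
```

The nascent chain always begins with M because AUG is both the start signal and the methionine codon; the mature chain in this toy model does not.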
Posttranslational modification of insulin. At the top, the ribosome translates an mRNA sequence into a protein, insulin, and passes the protein through the endoplasmic reticulum, where it is cut, folded and held in shape by disulfide (-S-S-) bonds. Then the protein passes through the Golgi apparatus, where it is packaged into a vesicle. In the vesicle, more parts are cut off, and it turns into mature insulin.
The genetic code diagram showing the amino acid residues as target of modification.
PTMs involving addition of cofactors for enhanced enzymatic activity
- lipoylation, attachment of a lipoate (C8) functional group
- flavin moiety (FMN or FAD) may be covalently attached
- heme C attachment via thioether bonds with cysteines
- phosphopantetheinylation, the addition of a 4′-phosphopantetheinyl moiety from coenzyme A, as in fatty acid, polyketide, non-ribosomal peptide and leucine biosynthesis
- retinylidene Schiff base formation
http://en.wikipedia.org/wiki/Posttranslational_modification
Sometimes proteins have non-peptide groups attached, which can be called prosthetic groups or cofactors. Examples of cofactors include metal ions like iron and zinc. Proteins can also work together to achieve a particular function, and they often associate to form stable protein complexes.
Coenzymes are molecules that work at the active site of an enzyme and aid in recognizing, attracting, or repulsing a substrate or product. Many are derived from vitamins. The substrate is the molecule on which an enzyme acts: the enzyme catalyzes a reaction transforming A to B by removal or addition of a hydrogen, a hydroxyl group, a methyl group, and so forth. This is how an alcohol or an aldehyde is produced. Such a reaction is critical in carbohydrate metabolism for producing two 3-carbon sugars from a 6-carbon sugar. Coenzymes shuttle chemical groups from one enzyme to another. They may bind loosely to enzymes, whereas another group of cofactors does not.
Prosthetic groups are cofactors that bind tightly to proteins or enzymes. As if holding on for dear life, they are not easily removed. They can be organic or metal ions and are often attached to proteins by a covalent bond. The same cofactors can bind multiple different types of enzymes and may bind some enzymes loosely, as a coenzyme, and others tightly, as a prosthetic group. Some cofactors may always tightly bind their enzymes. It’s important to note, though, that these prosthetic groups can also bind to proteins other than enzymes. A holoenzyme is an enzyme with any metal ions or coenzymes attached to it that is now ready to catalyze a reaction.
Around the world, millions of people don’t get enough protein. Protein malnutrition leads to the condition known as kwashiorkor. Lack of protein can cause growth failure, loss of muscle mass, decreased immunity, weakening of the heart and respiratory system, and death.
All Protein Isn’t Alike
Protein is built from building blocks called amino acids. Our bodies make amino acids in two different ways: Either from scratch, or by modifying others. A few amino acids (known as the essential amino acids) must come from food.
- Animal sources of protein tend to deliver all the amino acids we need.
- Other protein sources, such as fruits, vegetables, grains, nuts and seeds, lack one or more essential amino acids.
Vegetarians need to be aware of this. People who don’t eat meat, fish, poultry, eggs, or dairy products need to eat a variety of protein-containing foods each day in order to get all the amino acids needed to make new protein.
http://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/protein/
Molecular Biologist’s Guide to Proteomics
P. R. Graves and T. A. J. Haystead
Microbiol Mol Biol Rev. Mar 2002; 66(1): 39–63 PMC120780
http://dx.doi.org/10.1128/MMBR.66.1.39-63.2002
The emergence of proteomics, the large-scale analysis of proteins, has been inspired by the realization that
- the final product of a gene is inherently more complex and
- closer to function than the gene itself.
Shortfalls in the ability of bioinformatics to predict
- both the existence and function of genes have also illustrated
- the need for protein analysis.
Moreover, only through the study of proteins can posttranslational modifications be determined,
- which can profoundly affect protein function.
Proteomics has been enabled by
- the accumulation of both DNA and protein sequence databases,
- improvements in mass spectrometry, and
- the development of computer algorithms for database searching.
In this review, we describe why proteomics is important,
- how it is conducted, and
- how it can be applied to complement other existing technologies.
We conclude that currently, the most practical application of proteomics is
- the analysis of target proteins as opposed to entire proteomes.
This type of proteomics, referred to as functional proteomics, is always
- driven by a specific biological question.
In this way, protein identification and characterization has a meaningful outcome. We discuss some of the advantages
- of a functional proteomics approach and
provide examples of how different methodologies can be utilized to address a wide variety of biological problems.
Entry of our laboratory into proteomics 5 years ago was driven by a need to define a complex mixture of proteins (∼36 proteins) we had affinity isolated that bound specifically to the catalytic subunit of protein phosphatase 1 (PP-1, a serine/threonine protein phosphatase that regulates multiple dephosphorylation events in cells). We were faced with the task of trying to understand the significance of these proteins, and the only obvious way to begin to do this was to identify them by sequencing. Since the majority of intact eukaryotic proteins are not immediately accessible to Edman sequencing
- due to posttranslational N-terminal modifications,
- we invented mixed-peptide sequencing.
This method enables internal peptide sequence information to be derived from proteins
- electroblotted onto hydrophobic membranes.
Using the mixed-peptide sequencing strategy, we identified all 36 proteins in about a week. The mixture contained at least two known PP-1 regulatory subunits, but most were novel proteins of unknown function. Herein lies the lesson of proteomics. Identifying long lists of potentially interesting proteins often generates more questions than it seeks to answer.
Despite learning this obvious lesson, our early sequencing experiences were an epiphany that has subsequently altered our whole scientific strategy for probing protein function in cells. The sequencing of the 36 proteins has opened new avenues to further explore the functions of PP-1 in intact cells. Because of increased sensitivity, our approaches now routinely use state-of-the-art mass spectrometry (MS) techniques. However, rather than using proteomics to simply characterize large numbers of proteins in complex mixtures, we see the real application of this technology as a tool to enhance the power of existing approaches currently used by the modern molecular biologist such as classical yeast and mouse genetics, tissue culture, protein expression systems, and site-directed mutagenesis.
Importantly, the one message we would want the reader to take away from reading this review is that one should always let the biological question in mind drive the application of proteomics rather than simply engaging in an orgy of protein sequencing. From our experiences, we believe that if the appropriate controls are performed, proteomics is an extremely powerful approach for addressing important physiological questions. One should always design experiments to define a selected number of relevant proteins in the mixture of interest. Examples of such experiments that we routinely perform include defining early phosphorylation events in complex protein mixtures after hormone treatment of intact cells or comparing patterns of protein derived from a stimulated versus nonstimulated cell in an affinity pull-down experiment. Only the proteins that were specifically phosphorylated or bound in response to the stimulus are sequenced in the complex mixtures. Sequencing proteins that are regulated then has a meaningful outcome and directs all subsequent biological investigation.
The term “proteomics” was first coined in 1995 and was defined as the large-scale characterization of the entire protein complement of a cell line, tissue, or organism. Today, two definitions of proteomics are encountered. The first is the more classical definition, restricting the large-scale analysis of gene products to studies involving only proteins. The second and more inclusive definition combines protein studies with analyses that have a genetic readout such as mRNA analysis, genomics, and the yeast two-hybrid analysis. However, the goal of proteomics remains the same, i.e., to obtain a more global and integrated view of biology by studying all the proteins of a cell rather than each one individually.
Using the more inclusive definition of proteomics, many different areas of study are now grouped under the rubric of proteomics (Fig. 1). These include protein-protein interaction studies, protein modifications, protein function, and protein localization studies to name a few. The aim of proteomics is not only to identify all the proteins in a cell but also to create a complete three-dimensional (3-D) map of the cell indicating where proteins are located. These ambitious goals will certainly require the involvement of a large number of different disciplines such as molecular biology, biochemistry, and bioinformatics. It is likely that in bioinformatics alone, more powerful computers will have to be devised to organize the immense amount of information generated from these endeavors.
In the quest to characterize the proteome of a given cell or organism, it should be remembered that the proteome is dynamic. The proteome of a cell will reflect the immediate environment in which it is studied. In response to internal or external cues, proteins can be modified by posttranslational modifications, undergo translocations within the cell, or be synthesized or degraded. Thus, examination of the proteome of a cell is like taking a “snapshot” of the protein environment at any given time. Considering all the possibilities, it is likely that any given genome can potentially give rise to an infinite number of proteomes.
The first major technology to emerge for the identification of proteins was the sequencing of proteins by Edman degradation. A major breakthrough was the development of microsequencing techniques for electroblotted proteins. This technique was used for the identification of proteins from 2-D gels to create the first 2-D databases. One of the most important developments in protein identification has been the development of MS technology. In the last decade, the sensitivity of analysis and accuracy of results for protein identification by MS have increased by several orders of magnitude. It is now estimated that proteins in the femtomolar range can be identified in gels. Because MS is more sensitive, can tolerate protein mixtures, and is amenable to high-throughput operations, it has essentially replaced Edman sequencing as the protein identification tool of choice.
The growth of proteomics is a direct result of advances made in large-scale nucleotide sequencing of expressed sequence tags and genomic DNA. Without this information, proteins could not be identified even with the improvements made in MS. Protein identification (by MS or Edman sequencing) relies on the presence of some form of database for the given organism. The majority of DNA and protein sequence information has accumulated within the last 5 to 10 years. In 1995, the first complete genome of an organism was sequenced, that of Haemophilus influenzae. At the time of this writing, the sequencing of the genomes of 45 microorganisms has been completed and that of 170 more is under way (http://www.tiger.org/tdb/mdb/mdbcomplete.html). To date, five eukaryotic genomes have been completed: Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, and Drosophila melanogaster. In addition, the rice, mouse, and human genomes are near completion.
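The dependence of protein identification on a sequence database can be illustrated with a toy lookup in Python. This is only a minimal sketch: the database entries, their names, and their sequences are invented for illustration, and the scoring (counting how many short peptide sequences occur in each entry) is a stand-in for the far more elaborate search algorithms used in practice.

```python
# Hypothetical database: names and sequences are made up for this example.
TOY_DATABASE = {
    "hypothetical_protein_A": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "hypothetical_protein_B": "MSDNGELEDKPPAPPVRMSSTIFSTGGKDPLSANHSLK",
    "hypothetical_protein_C": "MGIVEQCCTSICSLYQLENYCN",
}


def identify_by_tags(tags, database):
    """Rank database entries by how many peptide sequence tags they contain.

    Entries with no matching tag are left out. Ties are broken by name so
    the result is deterministic.
    """
    scores = {}
    for name, sequence in database.items():
        hits = sum(1 for tag in tags if tag in sequence)
        if hits:
            scores[name] = hits
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Two short internal peptide sequences, as sequencing might yield.
    print(identify_by_tags(["FVKSHF", "LEERLG"], TOY_DATABASE))
    # A tag from an organism missing from the database gives no hit.
    print(identify_by_tags(["WWWWWW"], TOY_DATABASE))
```

The second call returns an empty list, which is the situation described above: without sequence information for the organism, even good peptide data cannot name the protein.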
One of the first applications of proteomics will be to identify the total number of genes in a given genome. This “functional annotation” of a genome is necessary because
- it is still difficult to predict genes accurately from genomic data. One problem is that
- the exon-intron structure of most genes cannot be accurately predicted by bioinformatics.
To achieve this goal, genomic information will have to be integrated with
- data obtained from protein studies to confirm the existence of a particular gene.
The analysis of mRNA is
- not a direct reflection of the protein content in the cell.
Many studies have shown a poor correlation
- between mRNA and protein expression levels.
The formation of mRNA is only the first step in a long sequence of events resulting in the synthesis of a protein (Fig. 2).
- mRNA is subject to posttranscriptional control in the form of alternative splicing, polyadenylation, and mRNA editing. Many different protein isoforms can be generated from a single gene at this step.
- mRNA then can be subject to regulation at the level of protein translation. Proteins, having been formed, are subject to posttranslational modification. It is estimated that up to 200 different types of posttranslational protein modification exist. Proteins can also be regulated by proteolysis and compartmentalization. It is clear that the tenet of “one gene, one protein” is an oversimplification.
Triple-quadrupole mass spectrometers are most commonly used to obtain amino acid sequences. In the first stage of analysis, the machine is operated in MS scan mode and all ions above a certain m/z ratio are transmitted to the third quadrupole for mass analysis (Fig. 6) (82, 173). In the second stage, the mass spectrometer is operated in MS/MS mode and a particular peptide ion is selectively passed into the collision chamber. Inside the collision chamber, peptide ions are fragmented by interactions with an inert gas by a process known as collision-induced dissociation or collisionally activated dissociation. The peptide ion fragments are then resolved on the basis of their m/z ratio by the third quadrupole (Fig. 6). Since two different mass spectra are obtained in this analysis, it is referred to as tandem mass spectrometry (MS/MS). MS/MS is used to obtain the amino acid sequence of peptides by generating a series of peptides that differ in mass by a single amino acid.
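The idea of reading a sequence from fragments that differ by one amino acid can be sketched in Python. The example assumes an idealized, complete ladder of singly charged b-type fragment ions (N-terminal fragments), with no noise and no missing peaks; real spectra are far messier. The residue masses are standard monoisotopic values supplied for this sketch, not taken from the review, and the function names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da

# Monoisotopic residue masses in Da (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_ion_ladder(peptide):
    """Return m/z values of the singly charged b ions b1..bn of a peptide."""
    ladder = []
    total = PROTON
    for residue in peptide:
        total += RESIDUE_MASS[residue]
        ladder.append(total)
    return ladder


def read_sequence(ladder, tolerance=0.01):
    """Infer residues from the mass differences between consecutive ions.

    Residues with the same mass (leucine and isoleucine) cannot be told
    apart and are reported together, for example "L/I".
    """
    sequence = []
    previous = PROTON
    for mz in ladder:
        delta = mz - previous
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(mass - delta) <= tolerance]
        sequence.append("/".join(matches) if matches else "?")
        previous = mz
    return sequence


if __name__ == "__main__":
    ladder = b_ion_ladder("GIVEQ")
    print([round(mz, 4) for mz in ladder])
    print(read_sequence(ladder))  # ['G', 'L/I', 'V', 'E', 'Q']
```

Each step in the ladder adds exactly one residue mass, so the differences between neighboring peaks spell out the sequence, with the usual ambiguity between leucine and isoleucine.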
The largest application of proteomics continues to be protein expression profiling. Through the use of two-dimensional gels or novel techniques such as ICAT, the expression levels of proteins or changes in their level of modification between two different samples can be compared and the proteins can be identified. This approach can facilitate the dissection of signaling mechanisms or identify disease-specific proteins.
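A comparison of expression levels between two samples can be reduced to a simple calculation, sketched here in Python. The spot names and intensity values are hypothetical, and the twofold cutoff is an arbitrary assumption chosen for illustration; a real analysis would also need replicates, normalization, and statistics.

```python
import math


def differential_proteins(control, treated, min_fold=2.0):
    """Return log2(treated/control) for proteins changed at least min_fold.

    Only proteins quantified in both samples with positive intensity are
    compared. Positive values mean higher in the treated sample.
    """
    threshold = math.log2(min_fold)
    changed = {}
    for protein in control.keys() & treated.keys():
        if control[protein] <= 0 or treated[protein] <= 0:
            continue  # a ratio is undefined without signal in both samples
        log_ratio = math.log2(treated[protein] / control[protein])
        if abs(log_ratio) >= threshold:
            changed[protein] = log_ratio
    return changed


if __name__ == "__main__":
    # Hypothetical spot intensities from two gels (arbitrary units).
    control = {"spot_1": 100.0, "spot_2": 50.0, "spot_3": 80.0}
    treated = {"spot_1": 400.0, "spot_2": 55.0, "spot_3": 20.0}
    for name, value in sorted(differential_proteins(control, treated).items()):
        print(name, value)
```

Only the spots that pass the cutoff would then be picked for identification, which keeps the sequencing effort tied to the biological comparison.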
Cancer cells are good candidates for proteomics studies because they can be compared to their non-transformed counterparts. Analysis of differentially expressed proteins in normal versus cancer cells can
(i) identify novel tumor cell biomarkers that can be used for diagnosis,
(ii) provide clues to mechanisms of cancer development, and
(iii) identify novel targets for therapeutic intervention.
Protein expression profiling has been used in the study of breast, esophageal, bladder, and prostate cancer. From these studies, tumor-specific proteins were identified and 2-D protein expression databases were generated. Many of these 2-D protein databases are now available on the World Wide Web.