Gene to Protein
Topics in Depth
The Theme of Cracking the Code: Translation in Gene to Protein
RNA contains only 4 different types of nucleotides. A protein contains 20 different amino acids. Now you don't need to be a math guru to realize the problem here. How does an alphabet of only 4 letters code for something with 20 different subunits? The answer to this question actually wasn't worked out until the 1960s, and it is called the genetic code.
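To see the size of the problem, it helps to count how many different "words" a 4-letter alphabet can spell at each word length. The snippet below is just a quick arithmetic sketch; the names `NUM_BASES`, `NUM_AMINO_ACIDS`, and `words_of_length` are made up for illustration.

```python
# Count how many distinct "words" a 4-letter alphabet can spell at each length.
NUM_BASES = 4          # the four RNA nucleotides
NUM_AMINO_ACIDS = 20   # the amino acids that need a word each


def words_of_length(n):
    """Number of distinct n-letter words over a 4-letter alphabet."""
    return NUM_BASES ** n


for n in (1, 2, 3):
    count = words_of_length(n)
    verdict = "enough" if count >= NUM_AMINO_ACIDS else "too few"
    print(f"{n}-letter words: {count} ({verdict})")
```

One-letter and two-letter words give only 4 and 16 possibilities, so three letters is the shortest word length that can cover all 20 amino acids, with 64 possible words to spare.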
The discovery of the genetic code actually has some cool history. In fact, it involves brotherhoods, Russians, and code names for all its members. The brotherhood, called the "RNA Tie Club," was founded by the Russian-born physicist George Gamow. The club was a group of prominent scientists, and each member was given a nickname based on one of the 20 amino acids. The club met a couple of times a year to think about how the genetic code might work. Who said scientists couldn't be cool?
Check out the Nobel prize website for even more tales of adventure and uncharted territory. http://nobelprize.org/educational/medicine/gene-code/history.html
Who finally figured out the genetic code? Was it one of the 8 Tie Club members who went on to win the Nobel prize? Nope. It was a group of "American underdogs" (not members of the Tie Club): Marshall Nirenberg from the National Institutes of Health, Har Gobind Khorana from the University of Wisconsin, and Robert Holley from Cornell University. Nirenberg showed that an RNA made only of U's produces a protein made only of phenylalanine, which gave the first deciphered "word" of the code. Khorana discovered a way to produce chains of nucleic acids with a defined sequence, while Holley figured out the exact structure of a tRNA. The tRNA molecule is the adaptor molecule that connects the mRNA to an amino acid. After these discoveries the code fell into place.
The genetic code describes how the nucleotide sequence of a gene is translated into an amino acid sequence. How?
RNA acts as the messenger in the middle. According to the genetic code, three RNA nucleotides (think of them as a three-letter word) code for a single amino acid. This three-letter word is called a codon. A codon is always written starting with its most 5' nucleotide.
Psst…did you forget what 5' and 3' mean already? Don't worry, we have got you covered. The 5' and 3' notations refer to the two ends of a nucleic acid chain, and they are named for the numbered carbon of the sugar that sits free at each end. The polymerase only adds a new nucleotide to the 3' hydroxyl end (-OH). Remember that base pairing only occurs in an anti-parallel orientation. This basically means that the most 5' base of one strand pairs with the most 3' base of the other strand.
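Anti-parallel pairing is easier to see with a tiny example. Here is a minimal sketch that takes an RNA strand written 5' to 3' and returns its pairing partner, also written 5' to 3'. The helper name `complementary_strand` and the test sequence are made up for illustration.

```python
# Watson-Crick pairing rules for RNA: A pairs with U, G pairs with C.
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complementary_strand(rna_5_to_3):
    """Return the strand that pairs with the input, also written 5' to 3'."""
    # Walk the input backwards, because the partner strand runs the other way.
    return "".join(PAIRS[base] for base in reversed(rna_5_to_3))


print(complementary_strand("AUGGCC"))  # GGCCAU
```

The A at the 5' end of the input pairs with the U at the 3' end of the output, which is exactly what anti-parallel means.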
The AUG codon signals the beginning of translation, and also codes for the amino acid methionine. Some codons do not code for an amino acid at all, and instead signal the stop of translation. These are called stop codons. Altogether this genetic code explains how protein expression is even possible.
Warning: Don't let the stop codons trick you. They don't actually code for the insertion of an amino acid. All they do is what their name suggests…stop translation. Finally, biology is straightforward, right?
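The start and stop rules can be sketched in a few lines of code. This is a simplified model, not how a ribosome really finds its start site: it assumes translation begins at the first AUG in the sequence and ends at the first stop codon in that frame. The function name `translate` and the example mRNA are invented for illustration, amino acids use one-letter abbreviations (M is methionine), and `*` marks a stop codon.

```python
# Build the standard genetic code from a compact 64-letter string.
# Codons are ordered UUU, UUC, UUA, UUG, UCU, ... with bases in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing gets made in this toy model
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # a stop codon adds nothing; it just ends translation
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCUGGUAAGC"))  # MAW
```

The two G's before the AUG are skipped, AUG itself contributes a methionine, and the UAA ends the chain without adding anything.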
For the most part, the genetic code is universal among organisms, but there are a few slight changes in certain systems. The mitochondria and chloroplasts are examples, as they have their own slightly modified code. In fact, mitochondria and chloroplasts are kind of like San Marino, the tiny country that is located within the bigger country of Italy.
Mitochondria and chloroplasts are thought to draw their roots from a prokaryotic ancestor. As a result, they produce their own proteins in a way that is reminiscent of their evolutionary heritage, and different from the "country" of the cell where they reside. In other words, think of the mitochondria and chloroplasts as having their own dialect. They produce many of their own proteins with this special dialect.
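To make the "dialect" idea concrete, here is a hedged sketch that reads the same RNA with two code tables. The specific reassignments are not from the text above; they are the ones listed for the vertebrate mitochondrial code in NCBI's translation tables (UGA becomes tryptophan, AUA becomes methionine, AGA and AGG become stops). The names `STANDARD`, `VERTEBRATE_MITO`, and `translate` and the test sequence are invented for illustration. Unlike a real ribosome, this helper keeps going past stop codons and marks them with `*`, so the two readings are easy to compare.

```python
# Standard genetic code, built from a compact 64-letter string (bases in U, C, A, G order).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

# Assumption: differences taken from NCBI translation table 2 (vertebrate mitochondria).
VERTEBRATE_MITO = {**STANDARD, "UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}


def translate(rna, table):
    """Read every complete codon from the first base; '*' marks a stop codon."""
    return "".join(table[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


sequence = "AUGAUAUGAAGA"  # codons: AUG AUA UGA AGA
print(translate(sequence, STANDARD))         # MI*R
print(translate(sequence, VERTEBRATE_MITO))  # MMW*
```

Only 4 of the 64 codons change meaning in this example, which is why "dialect" is a better word than "different language."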
Finally, the table you have all been waiting for…the genetic code. Okay, maybe this table isn't "I won the lottery" exciting, but it will make your biology class a whole lot easier.
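If you would rather build the table than memorize it, here is a small sketch that generates the standard genetic code as a grid. It uses one-letter amino acid abbreviations (for example F is phenylalanine and M is methionine) and `*` for stop codons. The 64-letter string is a common compact way of writing the standard code; the names `CODON_TABLE` and `table_rows` are made up for this example.

```python
# The 64 codons in order UUU, UUC, UUA, UUG, UCU, ... (bases in U, C, A, G order),
# with the matching amino acid for each one in the same position of AMINO_ACIDS.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def table_rows():
    """One text row per (first base, third base); columns run over the second base."""
    rows = []
    for first in BASES:
        for third in BASES:
            cells = [
                f"{first}{second}{third} {CODON_TABLE[first + second + third]}"
                for second in BASES
            ]
            rows.append("   ".join(cells))
    return rows


print("\n".join(table_rows()))
```

The first printed row is `UUU F   UCU S   UAU Y   UGU C`. Of the 64 codons, 61 code for the 20 amino acids and 3 (UAA, UAG, UGA) are stops.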
We've mentioned that an RNA sequence must be grouped into "words" of three bases, where these three-base words are called codons. The problem is, where do you start defining your words? This is important because if you shift your start site by one you will get a list of totally different words.
Where you start reading the RNA sequence sets the reading frame. Any RNA sequence has 3 possible reading frames.
Consider the RNA sequence GUGACUGAUUGU.
You can tell which amino acid is coded for by looking at the genetic code table. Let's start by saying that GUG is the first codon. Therefore, you will divide the codons like this:
GUG ACU GAU UGU = valine, threonine, aspartic acid, cysteine
This is the first reading frame.
Now let's move the reading frame over by one: UGA CUG AUU GU_
Now we have: stop, leucine, isoleucine, and two bases that aren't enough for a codon.
Finally, let's start our reading frame 2 bases from the beginning of the sequence. Now we get:
GAC UGA UUG U_ _ = aspartic acid, stop, leucine, and 1 lonely base.
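The three readings above can be double-checked with a short script. This is a sketch using one-letter abbreviations (V is valine, T is threonine, D is aspartic acid, C is cysteine, L is leucine, I is isoleucine, and `*` is stop); the helper name `read_frame` is invented for this example.

```python
# Standard genetic code, built from a compact 64-letter string (bases in U, C, A, G order).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def read_frame(rna, offset):
    """Split rna into codons starting at offset (0, 1, or 2).

    Returns (codons, one-letter translation, leftover bases).
    """
    codons = [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]
    leftover = rna[offset + 3 * len(codons):]
    translation = "".join(CODON_TABLE[codon] for codon in codons)
    return codons, translation, leftover


for offset in range(3):
    codons, translation, leftover = read_frame("GUGACUGAUUGU", offset)
    print(offset, " ".join(codons), translation, repr(leftover))
```

The script prints `VTDC` for the first frame, `*LI` with `GU` left over for the second, and `D*L` with `U` left over for the third.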
This exercise shows how important it is to start reading at the right place. Moving the start site over by one completely changes the amino acid sequence coded for by the RNA. More importantly, in this instance, it actually brings a stop codon into the frame. When a stop codon is reached in the translation sequence, no further amino acids are inserted into the polypeptide chain. This can be disastrous to the cell, because it results in a protein that is prematurely truncated (or shortened). A mutation that shifts the reading frame by inserting or deleting a base is called a frameshift mutation, while a single-base change that turns a codon into a premature stop codon is called a nonsense mutation.
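Two kinds of mutation can wreck a protein in the ways just described: a single-base swap that creates an early stop codon (usually called a nonsense mutation) and a one-base deletion that shifts the reading frame (usually called a frameshift mutation). Here is a toy comparison. The sequences and the helper name `translate_until_stop` are invented for illustration, reading always starts at the first base, and amino acids use one-letter abbreviations.

```python
# Standard genetic code, built from a compact 64-letter string (bases in U, C, A, G order).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate_until_stop(rna):
    """Read codons from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


normal = "AUGUAUGGCCUGUAA"                 # AUG UAU GGC CUG UAA
nonsense = normal[:5] + "A" + normal[6:]   # UAU becomes UAA, a stop codon
frameshift = normal[:3] + normal[4:]       # delete one base after AUG

print(translate_until_stop(normal))      # MYGL
print(translate_until_stop(nonsense))    # M
print(translate_until_stop(frameshift))  # MMAC
```

The single-base swap cuts the protein down to one amino acid, while the deletion changes every amino acid after the first and, in this toy sequence, loses the original stop codon entirely.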