Open the UCSC Genome Browser and explore the annotations of the human genome. For any region of a chromosome, the browser shows which genes are present there. Examine the superoxide dismutase 1 (SOD1) gene (the sequence is here).
Then go to the UCSC BLAT program here.
Paste the sequence into the query field. Clicking "I'm feeling lucky" takes you straight to the Genome viewer, while "Submit" takes you to a table listing the best match along with other locations in the human genome that match SOD1 less well. Explore the functions of the UCSC Genome viewer, and look at the other matches to the SOD1 sequence. The score is a numerical measure of how well SOD1 matches each of these sequences.
1. How do you think the score is determined?
2. What do the other numbers on the table signify?
3. Click on the "mRNA (may differ from genome)" link to see the mRNA sequence. Copy this sequence and run a BLAT search with it. Why are there far fewer matches in this search?
4. Only about 2% of the human genome is protein-coding sequence. What is the rest of the DNA?
5. Why are amino acids encoded by 3 nucleotide codons?
6. Why can't you use plasmids for large pieces of DNA?
1. The score is based on nucleotide sequence similarity: matching bases raise it, while mismatches and gaps lower it. Sequences that match strongly over only small regions of the SOD1 gene are also reported. This can be used, admittedly inefficiently, to find homologous genes. BLAST is better suited to finding gene homologies, but that is beyond the scope of this unit.
2. The first START and END are the positions in your query sequence where the match begins and ends. QSIZE is the length of your query sequence. IDENTITY is the percent identity between the aligned part of your query and the genomic sequence that BLAT matched. CHRO is the chromosome, and STRAND is the DNA strand (+ or -) of the match. The second START and END are where the match begins and ends in the genome, and SPAN is the size of the genomic region between them, including any introns.
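The scoring idea in answer 1 can be sketched in a few lines. This is a minimal sketch assuming a PSL-style formula roughly like the one in UCSC's own code: matched bases (with matches inside masked repeats counted at half weight), minus mismatches, minus the number of gap openings in the query and in the genome. Treat the exact weights as an assumption; the field names follow PSL column names, and the two example hits are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Counts summarizing one BLAT hit (names follow PSL column names)."""
    matches: int      # matching bases outside masked repeats
    rep_matches: int  # matching bases inside masked repeats
    mismatches: int   # aligned bases that differ
    q_gaps: int       # number of gap openings in the query
    t_gaps: int       # number of gap openings in the genome (target)


def blat_style_score(a: Alignment) -> int:
    """Assumed PSL-style score; may not match the web tool exactly.

    Repeat matches count at half weight (integer division), so hits that
    lie mostly in repetitive elements score lower.
    """
    return a.matches + a.rep_matches // 2 - a.mismatches - a.q_gaps - a.t_gaps


# Hypothetical hits: a near-perfect full-length match and a repeat-driven match.
full_length = Alignment(matches=950, rep_matches=0, mismatches=2, q_gaps=0, t_gaps=0)
repeat_hit = Alignment(matches=40, rep_matches=200, mismatches=30, q_gaps=2, t_gaps=3)

print(blat_style_score(full_length))  # 948
print(blat_style_score(repeat_hit))   # 105
```

A hit that covers the whole query with few errors scores near the query length, while a short or repeat-driven hit scores far lower, which is why the SOD1 locus itself tops the table.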
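The columns described in answer 2 map naturally onto a small record type. The sketch below is a hypothetical parser that assumes the ACTIONS links column has been dropped and the remaining columns appear in the order listed above, separated by whitespace; the example row is made up, not a real SOD1 result. It also assumes 1-based inclusive genome coordinates, under which SPAN should equal END - START + 1.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    """One row of a BLAT results table (ACTIONS column omitted)."""
    query: str
    score: int
    q_start: int     # first START: where the match begins in the query
    q_end: int       # first END: where the match ends in the query
    q_size: int      # QSIZE: total length of the query
    identity: float  # IDENTITY, as a percentage
    chrom: str       # CHRO: chromosome of the genome match
    strand: str      # STRAND: "+" or "-"
    t_start: int     # second START: where the match begins in the genome
    t_end: int       # second END: where the match ends in the genome
    span: int        # SPAN: size of the genomic region covered

    def query_coverage(self) -> float:
        """Percent of the query that falls inside the aligned region."""
        return 100.0 * (self.q_end - self.q_start + 1) / self.q_size

    def genomic_extent(self) -> int:
        """END - START + 1, assuming 1-based inclusive genome coordinates."""
        return self.t_end - self.t_start + 1


def parse_row(line: str) -> BlatHit:
    """Split a whitespace-separated row and convert each column to its type."""
    f = line.split()
    return BlatHit(
        query=f[0], score=int(f[1]),
        q_start=int(f[2]), q_end=int(f[3]), q_size=int(f[4]),
        identity=float(f[5].rstrip("%")),
        chrom=f[6], strand=f[7],
        t_start=int(f[8]), t_end=int(f[9]), span=int(f[10]),
    )


# Hypothetical row, not a real SOD1 result.
hit = parse_row("YourSeq 500 1 520 520 98.5% chr7 - 1000001 1002000 2000")
print(hit.chrom, hit.strand, hit.span)   # chr7 - 2000
print(hit.query_coverage())              # 100.0
print(hit.span == hit.genomic_extent())  # True
```

Comparing the query columns (START, END, QSIZE) with the genome columns (START, END, SPAN) shows at a glance whether a hit covers the whole query and how much genomic DNA it stretches over.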
Now try a BLAT search with the mRNA sequence, which lacks the SOD1 introns. SOD1 is annotated in the Genome Browser's UCSC Genes track. Click on one of the SOD1 genes to open a page with detailed information about the gene.
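The mRNA lacks introns because splicing removes them and joins the exons together. The following minimal sketch uses a made-up sequence and hypothetical exon coordinates (not real SOD1 data), and it ignores other processing steps such as capping and polyadenylation.

```python
def splice(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments of a genomic sequence, dropping the introns between them.

    Exons are 1-based inclusive (start, end) pairs, in order along `genomic`.
    """
    return "".join(genomic[start - 1:end] for start, end in exons)


# Toy sequence: uppercase = exons, lowercase = introns (hypothetical, not SOD1).
genomic = "ATGGCC" + "gtaagtttttcag" + "AAGGTG" + "gtgagcccccag" + "TGCTAA"
exons = [(1, 6), (20, 25), (38, 43)]

print(splice(genomic, exons))  # ATGGCCAAGGTGTGCTAA
```

The 43-base toy "gene" becomes an 18-base "mRNA": only the exon bases remain, so any sequence that occurred only in the introns is absent from the mRNA query.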
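3. The first search used the genomic sequence, which includes the SOD1 introns. These introns contain SINE elements (one of many kinds of repetitive element in the human genome), so many of the matches are intronic SINE sequences matching the numerous other SINE copies elsewhere in the genome. The mRNA lacks the introns, and therefore these repeats, so it matches far fewer places.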
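4. Much of the rest of the DNA consists of repetitive elements such as retrotransposons and DNA transposons, along with introns, regulatory sequences such as promoters and enhancers, and other noncoding DNA once dismissed as "junk DNA". Some of these elements may affect gene expression through mechanisms that are not yet well understood.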
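5. With four bases, a three-nucleotide codon gives 4 x 4 x 4 = 64 possible combinations, while a two-nucleotide codon gives only 4 x 4 = 16, too few to specify the 20 standard amino acids. Three is therefore the shortest codon length that can encode all 20 amino acids plus stop signals; the extra combinations make the code redundant, so most amino acids are specified by more than one codon.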
6. Plasmids carrying very large DNA inserts replicate poorly and unstably, so they tend to be lost from the bacteria over time. Larger pieces of DNA therefore need vectors with extra genes that ensure they are efficiently maintained in the bacterium, such as a BAC or a cosmid.
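The explanation in answer 3, that intronic repeats hit many copies across the genome while the intron-free mRNA does not, can be illustrated with a toy seed-counting sketch. It indexes every k-mer of a small made-up "genome" and counts how many genome positions each query's k-mers land on. This is only an illustration of the idea: real BLAT indexes the genome differently (non-overlapping tiles) and then extends seeds into full alignments, and the seed length, sequences and repeat here are all assumptions.

```python
from collections import defaultdict

K = 8  # seed length for this toy; BLAT's real settings differ


def build_index(genome: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the genome to the 0-based positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return dict(index)


def seed_hits(query: str, index: dict[str, list[int]], k: int) -> int:
    """Total genome positions hit by the query's overlapping k-mers."""
    return sum(len(index.get(query[i:i + k], [])) for i in range(len(query) - k + 1))


# Toy sequences (made up, not SOD1): two exons and a repeat used as the intron.
exon1 = "ATGGCGACGAAGGCCGTG"
exon2 = "TGCGTGCTGAAGGGCGAC"
repeat = "GTTGGTGTTTGGGTGTTGTG"  # stands in for a SINE-like repeat
spacer = "A" * 10

# Toy genome: the gene (exon-repeat-exon) plus five more copies of the repeat.
genome = spacer + exon1 + repeat + exon2 + spacer + (repeat + spacer) * 5
index = build_index(genome, K)

genomic_query = exon1 + repeat + exon2  # like the genomic sequence, intron included
mrna_query = exon1 + exon2              # like the mRNA, intron spliced out

print("genomic query seed hits:", seed_hits(genomic_query, index, K))
print("mRNA query seed hits:   ", seed_hits(mrna_query, index, K))
```

Every k-mer from the repeat hits all six copies in the toy genome, so the genomic query collects many more seed hits than the mRNA query, whose k-mers mostly land only on the gene itself.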
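The counting argument in answer 5 can be checked directly: with an alphabet of 4 bases, codons of length n give 4^n combinations, and the question is the smallest n that covers the 20 standard amino acids plus at least one stop signal. The helper name below is illustrative only.

```python
BASES = 4         # A, C, G and T (U in RNA)
AMINO_ACIDS = 20  # standard amino acids
STOP = 1          # at least one extra meaning is needed for "stop"


def min_codon_length(meanings: int, alphabet: int = BASES) -> int:
    """Smallest codon length n such that alphabet ** n >= meanings."""
    n = 1
    while alphabet ** n < meanings:
        n += 1
    return n


for n in range(1, 4):
    print(n, BASES ** n)  # codon length and number of distinct codons
print(min_codon_length(AMINO_ACIDS + STOP))  # 3
```

Running the loop prints 4, 16 and 64 combinations for lengths 1, 2 and 3, and the helper confirms that length 3 is the first one large enough.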