Translation, the Genetic Code, and Protein Structure — ASCP MB

Overview

Transcription copies a gene into messenger RNA (mRNA). Translation is the second half of gene expression: the cell reads that mRNA and builds the polypeptide it specifies. This lesson covers two linked ideas — the genetic code that maps RNA sequence to amino acids, and the structure of the protein that results. It builds on the Transcription and RNA Processing lesson (concept transcription-rna-processing); here we assume a mature mRNA already exists and follow it to a folded, functional protein. We close by tracing how the sequence variants from the Mutations and Variation lesson (concept mutations-variation) propagate through the code into protein consequence.

The genetic code

The genetic code is the set of rules that relates the nucleotide sequence of mRNA to the amino acid sequence of a protein. It is read in non-overlapping groups of three bases called codons ¹. Because each codon is drawn from four possible bases (A, U, G, C), there are 4 × 4 × 4 = 64 possible codons. These 64 codons specify only 20 standard amino acids plus a stop signal, so the code is necessarily many-to-one.

Redundancy and degeneracy

Mapping 64 codons onto 21 outcomes (20 amino acids + stop) means most amino acids are encoded by more than one codon; only methionine and tryptophan have a single codon each. This property is called degeneracy (or redundancy). Much of the redundancy sits in the third codon position, where a change often specifies the same amino acid; relaxed “wobble” base pairing between that position and the tRNA anticodon is what lets one tRNA read several synonymous codons ². Degeneracy is not ambiguity: any single codon still specifies exactly one amino acid (or stop).
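The counts in this section can be checked with a short Python sketch. It builds the standard codon table from a compact 64-letter string (one-letter amino acid codes, `*` for stop, bases ordered U, C, A, G). The names `CODON_TABLE`, `codons_per_outcome`, and `fourfold_boxes` are illustrative choices, not part of any library.

```python
from collections import Counter
from itertools import product

# Standard genetic code, one letter per codon in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# How many codons map to each outcome (20 amino acids plus stop).
codons_per_outcome = Counter(CODON_TABLE.values())

# First-two-base "boxes" in which the third base never changes the outcome.
fourfold_boxes = [
    a + b
    for a, b in product(BASES, repeat=2)
    if len({CODON_TABLE[a + b + c] for c in BASES}) == 1
]

print(len(CODON_TABLE))                                  # 64
print(len(codons_per_outcome))                           # 21
print(codons_per_outcome["M"], codons_per_outcome["W"])  # 1 1
print(codons_per_outcome["L"], codons_per_outcome["*"])  # 6 3
print(len(fourfold_boxes))                               # 8
```

In 8 of the 16 possible first-two-base combinations the third base is irrelevant, which is one way to see that the redundancy is concentrated at the third position.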

Start and stop signals

Translation does not begin at the literal 5’ end of the mRNA. It begins at a start codon, AUG, which both initiates synthesis and encodes the amino acid methionine (Met) ¹. Three codons — UAA, UAG, and UGA — specify no amino acid and instead act as stop codons, signaling the end of the polypeptide.

Reading frames

Because codons are triplets, where translation starts determines how the entire downstream sequence is parsed. Each mRNA has three possible reading frames depending on which base is treated as position 1; the start codon sets the single open reading frame that is actually translated ¹. Shifting the frame re-groups every following base into different codons:

mRNA:    5'- ... A U G  G C A  U C U  U A A ... -3'
frame 1:        AUG  GCA  UCU  UAA      (Met-Ala-Ser-STOP)
frame 2:         UGG  CAU  CUU  AA?     (Trp-His-Leu-..., different codons entirely)

A correct reading frame is therefore essential: a disturbance to it changes every codon from that point onward.
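The regrouping can be reproduced with a minimal Python sketch that splits the same 12-base example into its three forward frames. The function name `split_frames` is hypothetical, and incomplete trailing bases are simply dropped.

```python
def split_frames(mrna):
    """Return the complete triplets read in each of the three forward frames."""
    frames = {}
    for offset in range(3):
        # Step through the sequence three bases at a time from this offset,
        # keeping only complete codons.
        frames[offset + 1] = [
            mrna[i:i + 3] for i in range(offset, len(mrna) - 2, 3)
        ]
    return frames


frames = split_frames("AUGGCAUCUUAA")
for number, codons in frames.items():
    print(number, " ".join(codons))
# 1 AUG GCA UCU UAA
# 2 UGG CAU CUU
# 3 GGC AUC UUA
```

Only frame 1 begins with AUG and ends at a stop codon; the other two frames share no codons with it even though the bases are identical.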

Translation mechanics

Translation is carried out by the ribosome, a two-subunit ribonucleoprotein machine that moves along the mRNA and catalyzes peptide-bond formation ¹. The physical link between nucleic-acid sequence and amino acid is transfer RNA (tRNA). Each tRNA carries a three-base anticodon that base-pairs with a complementary mRNA codon, and is covalently charged with the matching amino acid by an aminoacyl-tRNA synthetase to form an aminoacyl-tRNA ². Because the ribosome checks codon-anticodon pairing rather than the identity of the attached amino acid, the fidelity of charging (the right amino acid on the right tRNA) is what enforces the code in practice.
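Codon-anticodon pairing is antiparallel, so the anticodon is the reverse complement of the codon when both are written 5' to 3'. The sketch below assumes strict Watson-Crick pairing only; it deliberately ignores wobble pairing and modified bases, and the name `anticodon` is an illustrative choice.

```python
# Watson-Crick RNA base pairs.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon):
    """Return the strictly complementary anticodon, written 5'->3'."""
    # Reverse the codon first because the two strands pair antiparallel.
    return "".join(COMPLEMENT[base] for base in reversed(codon))


print(anticodon("AUG"))  # CAU
print(anticodon("GCA"))  # UGC
```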

The process proceeds in three stages ¹:

INITIATION   small subunit + initiator tRNA find AUG; large subunit joins
ELONGATION   codon-by-codon: aminoacyl-tRNA enters, peptide bond forms,
             ribosome translocates one codon along the mRNA
TERMINATION  a stop codon is reached; release factors free the polypeptide

Initiation: the small ribosomal subunit, assisted by initiator Met-tRNA, locates the start codon; the large subunit then assembles to form the complete ribosome.
Elongation: for each codon, the matching aminoacyl-tRNA is delivered, a peptide bond joins its amino acid to the growing chain, and the ribosome translocates by exactly one codon. The chain grows from its amino (N) terminus toward its carboxyl (C) terminus.
Termination: when a stop codon enters the ribosome, no tRNA matches it; protein release factors trigger release of the finished polypeptide ².
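The three stages can be mirrored in a small Python sketch: find a start codon, read codon by codon, and halt at a stop. This is a simplification under stated assumptions. It treats the first AUG in the string as the start site, uses one-letter amino acid codes, and models none of the ribosome, tRNA, or release-factor machinery. The names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code, one letter per codon in U, C, A, G order ('*' = stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    # Initiation: locate the start codon (assumed here to be the first AUG).
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    # Elongation: advance exactly one codon (three bases) per step.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        # Termination: a stop codon adds no residue and ends the chain.
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    # Residues are appended in N-terminus to C-terminus order.
    return "".join(peptide)


protein = translate("GGAUGGCAUCUUAAGC")
print(protein)  # MAS  (Met-Ala-Ser)
```

The two leading bases and the two trailing bases are untranslated here, which matches the point that translation does not start at the literal 5' end of the mRNA.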

The output of translation is a linear chain of amino acids — but a chain is not yet a working protein.

Protein structure

A protein’s biological function depends on its three-dimensional shape, which is organized into four hierarchical levels ².

Primary structure is the linear sequence of amino acids, joined by peptide bonds, in the order dictated by the codons. This sequence is the direct readout of the gene and determines all higher levels.
Secondary structure is local folding of the backbone into regular patterns stabilized by hydrogen bonds — chiefly the alpha helix (a coiled rod) and the beta sheet (extended strands lying side by side) ².
Tertiary structure is the overall three-dimensional fold of a single polypeptide, packing its helices, sheets, and loops into a compact shape with a defined active or binding site.
Quaternary structure is the assembly of two or more folded polypeptide chains (subunits) into a single functional complex — hemoglobin’s four subunits are the classic example ².

PRIMARY        -Met-Ala-Ser-...-   (amino acid sequence)
SECONDARY      alpha helix  /  beta sheet  (local H-bonded patterns)
TERTIARY       one chain folded into a 3-D shape
QUATERNARY     multiple chains assembled into one complex

Why one amino acid can matter

Because the higher levels of structure all derive from the primary sequence, a single amino-acid substitution can be enough to alter or abolish function. If the changed residue sits in an active site, a folding contact, or a subunit interface, the protein may misfold, lose activity, or aggregate ². Sickle-cell hemoglobin, in which glutamate is replaced by valine at position 6 of the β-globin chain, is the canonical illustration that one residue can change a protein’s behavior ³.

Mapping variants onto codons and protein consequence

The variant types from the Mutations and Variation lesson (concept mutations-variation) take on concrete meaning once the code and protein levels are in view ³:

A silent (synonymous) variant changes a codon to another codon for the same amino acid — typically a third-position change exploiting degeneracy. Primary structure is unchanged.
A missense variant changes a codon so it specifies a different amino acid, altering primary structure; the effect ranges from negligible to severe depending on where and what the substitution is.
A nonsense variant changes a sense codon into a stop codon, truncating the protein early and often destroying function.
A frameshift variant inserts or deletes a number of bases not divisible by three, shifting the reading frame so that every codon downstream is misread — usually producing a garbled sequence and a premature stop.

reference     ...  CAU  GAA  ...   -> His - Glu
silent        ...  CAC  GAA  ...   -> His - Glu      (same protein)
missense      ...  CGU  GAA  ...   -> Arg - Glu      (one residue changed)
nonsense      ...  CAU  UAA  ...   -> His - STOP     (truncated)
frameshift    ...  CUG  AA?  ...   -> Leu - ...      (A deleted; all later codons reframed)

This is why mutation classification is fundamentally a statement about the code: a single-base substitution can be silent, missense, or nonsense depending only on which codon and position it strikes and which codon results.
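The classification rules can be written down directly as a lookup against the codon table. The following Python sketch is one possible encoding, assuming the reference codon is a sense codon and that indels are judged only by their length; `classify_substitution`, `is_frameshift`, and `single_base_outcomes` are hypothetical names used for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code, one letter per codon in U, C, A, G order ('*' = stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change as silent, missense, or nonsense."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == "*":
        # Changes to an existing stop codon are outside this sketch.
        raise ValueError("reference codon must be a sense codon")
    if alt_aa == ref_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


def is_frameshift(indel_length):
    """An insertion or deletion shifts the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


def single_base_outcomes(codon):
    """Tally the consequences of all nine single-base substitutions in a codon."""
    outcomes = Counter()
    for pos, base in product(range(3), BASES):
        if base != codon[pos]:
            alt = codon[:pos] + base + codon[pos + 1:]
            outcomes[classify_substitution(codon, alt)] += 1
    return outcomes


print(classify_substitution("CAU", "CAC"))  # silent
print(classify_substitution("CAU", "CGU"))  # missense
print(classify_substitution("GAA", "UAA"))  # nonsense
print(is_frameshift(1), is_frameshift(3))   # True False
print(dict(single_base_outcomes("UAU")))    # {'missense': 6, 'silent': 1, 'nonsense': 2}
```

The tyrosine codon UAU shows all three substitution outcomes from one starting codon: changes at the first or second position are missense, while third-position changes are either silent (UAC) or nonsense (UAA, UAG).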

Summary

The genetic code reads mRNA in triplet codons — 64 codons specifying 20 amino acids plus stop signals, with redundancy concentrated at the third position, AUG as start/Met, and UAA/UAG/UGA as stops. Ribosomes and aminoacyl-tRNAs translate the open reading frame through initiation, elongation, and termination to produce a polypeptide. That polypeptide folds through primary, secondary, tertiary, and quaternary structure into a functional protein, so a single amino-acid change can reshape function. Sequence variants map cleanly onto this machinery: silent, missense, nonsense, and frameshift consequences all follow from how a base change alters codons and the reading frame.