Lesson 6 of 34 · Basic Molecular Theory

Transcription and RNA Processing

Overview

The previous lesson followed how DNA is copied at the replication fork so that genetic information is faithfully duplicated. This lesson takes the next step in the central dogma: how the information stored in DNA is read out into RNA. That process, transcription, is the first stage of gene expression, and in eukaryotic cells the primary transcript is then extensively processed before it can direct protein synthesis. Understanding both steps explains a fact that shapes much of molecular diagnostics — that the messenger RNA a cell actually uses is not a verbatim copy of the gene as it sits in the genome.

This lesson builds on DNA structure, base pairing, and the mechanics of templated synthesis introduced with replication; those fundamentals are assumed here rather than re-taught.

Transcription by RNA polymerase

Transcription is carried out by RNA polymerase, an enzyme that synthesizes an RNA copy of a DNA segment. Like the DNA polymerases of replication, RNA polymerase builds its product in the 5’→3’ direction, adding ribonucleotides to the growing 3’ end and reading the template in the antiparallel 3’→5’ direction 1. The chemistry is the familiar templated, base-paired addition — but with two key differences from replication. The product is RNA, so uracil (U) is incorporated opposite adenine in place of thymine, and the sugar is ribose rather than deoxyribose 2.

Critically, RNA polymerase needs no primer. It can initiate a new strand from scratch on a DNA template, which is why transcription does not depend on the primer-laying machinery required to start DNA replication 1.

Template versus coding strand

Only one of the two DNA strands is read in any given gene. The strand that RNA polymerase uses as its template is called the template strand (or antisense strand). Because the new RNA is synthesized by base pairing against it, the RNA sequence matches the other strand — the coding strand (or sense strand) — except that the RNA carries U where the coding strand has T 2.

coding (sense)        5'- A T G C A T G G ... -3'   (RNA matches this, U for T)
template (antisense)  3'- T A C G T A C C ... -5'   (read by RNA polymerase)
RNA transcript        5'- A U G C A U G G ... -3'
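
The relationship among these three sequences can be checked with a short Python sketch. The function name `transcribe_template` and the convention of passing the template written 3'→5' (as it is drawn in the diagram) are assumptions made for illustration only.

```python
# Base pairing used when RNA is built against a DNA template:
# each template base specifies its complementary ribonucleotide.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe_template(template_3to5: str) -> str:
    """Return the RNA (written 5'->3') made from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())

coding = "ATGCATGG"      # coding (sense) strand, 5'->3'
template = "TACGTACC"    # template (antisense) strand, 3'->5'

rna = transcribe_template(template)
print(rna)                               # AUGCAUGG
print(rna == coding.replace("T", "U"))   # True
```

The second check restates the rule in the text: the transcript has the same sequence as the coding strand, with U wherever the coding strand has T.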

Promoters and where transcription starts

RNA polymerase does not begin at an arbitrary point. It is positioned by a promoter — a specific DNA sequence, upstream of the transcribed region, that marks where transcription should start and on which strand 1. Promoter recognition (assisted in cells by accessory proteins) defines the start site and direction, ensuring that the correct strand serves as template and that the gene is read from its proper beginning.

The major classes of RNA

Transcription produces several functionally distinct classes of RNA 1:

  • Messenger RNA (mRNA) carries the protein-coding message from gene to ribosome; its sequence is what gets translated into a polypeptide.
  • Transfer RNA (tRNA) acts as the adaptor in translation, matching each codon to its amino acid.
  • Ribosomal RNA (rRNA) forms the structural and catalytic core of the ribosome itself.
  • Regulatory and other non-coding RNAs — including small RNAs that tune gene expression — are not translated but help control which genes are expressed and when 1.

Only mRNA encodes protein; the others do their work as RNA. The processing steps below apply chiefly to eukaryotic mRNA.

Eukaryotic mRNA processing

In eukaryotes the molecule that RNA polymerase first produces is a primary transcript (pre-mRNA), and it is not yet a functional message. Three processing events convert it into mature mRNA (5’ capping, 3’ polyadenylation, and splicing), and processing begins while transcription is still underway, starting with capping of the nascent 5’ end 1.

5’ cap and 3’ polyadenylation

Shortly after synthesis begins, the 5’ end of the transcript receives a 5’ cap: a modified guanine nucleotide (7-methylguanosine) joined to the first nucleotide of the transcript through an unusual 5’-to-5’ triphosphate linkage. The cap protects the transcript from degradation and is recognized by the translation machinery 1.

At the other end, the transcript is cleaved and a string of adenine nucleotides — the poly(A) tail — is added in a step called polyadenylation. The tail contributes to mRNA stability and to export from the nucleus 2.

Splicing: removing introns, joining exons

Most eukaryotic genes are interrupted: their coding information is split into segments called exons separated by intervening sequences called introns. The primary transcript contains both. Splicing is the process that excises the introns and joins the exons together in order, producing a continuous coding sequence 1.

Splicing is performed by the spliceosome, a large complex of proteins and small nuclear RNAs that recognizes the boundaries of each intron, cuts the transcript at those sites, and ligates the flanking exons 1.

pre-mRNA:    [exon1]--intron--[exon2]----intron----[exon3]
                  \____________  ________________/
                   spliceosome removes introns, joins exons
mature mRNA: cap-[exon1][exon2][exon3]-AAAA...(poly-A)
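
The excision-and-joining step in the diagram can be mimicked on a string. This is a minimal sketch under stated assumptions: the sequence is invented, intron positions are given as known index pairs rather than recognized the way the spliceosome recognizes them, and the function name `splice` is hypothetical.

```python
from __future__ import annotations

def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove each intron, given as a half-open (start, end) index pair,
    and join the remaining exons in their original order."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # exon before this intron
        position = end                          # resume after the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)

# Toy transcript: uppercase = exon, lowercase = intron (made-up sequence).
pre_mrna = "AUGGCC" + "guaagucuag" + "UUCGAA" + "guacgcag" + "GGAUAA"
introns = [(6, 16), (22, 30)]

mature = splice(pre_mrna, introns)
print(mature)  # AUGGCCUUCGAAGGAUAA
```

Only the exon sequence survives, in the same order as in the primary transcript. The cap and poly(A) tail shown in the diagram are left out of this sketch.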

Alternative splicing

The exons of a single transcript need not always be joined the same way. In alternative splicing, different combinations of exons are retained or skipped, so one gene can give rise to several distinct mRNAs and therefore several related proteins 1. This is a major reason the number of distinct proteins a genome can produce greatly exceeds its number of genes.
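
To see how quickly the combinations grow, the sketch below enumerates isoforms under a deliberately simplified model: the first and last exons are always kept, any internal exon may be skipped, and exon order never changes. The exon labels and the function name `exon_skipping_isoforms` are hypothetical. Real cells produce only a regulated subset of such combinations and also use other modes of alternative splicing.

```python
from itertools import combinations

def exon_skipping_isoforms(exons):
    """Yield every ordered exon combination that keeps the first and last exon."""
    first, *internal, last = exons
    # Go from "keep all internal exons" down to "skip all of them".
    for kept_count in range(len(internal), -1, -1):
        for kept in combinations(internal, kept_count):
            yield [first, *kept, last]

exons = ["E1", "E2", "E3", "E4"]
isoforms = list(exon_skipping_isoforms(exons))
for isoform in isoforms:
    print("-".join(isoform))
```

With two internal exons this model gives four possible mRNAs from one gene, and each additional internal exon doubles the count.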

The mature mRNA

The end product carries a 5’ cap, a continuous string of joined exons (the coding sequence, flanked by untranslated regions), and a 3’ poly(A) tail. Only this processed molecule is exported and translated. The crucial point is that the introns present in the gene are absent from the mature mRNA 2.

Lab relevance

Because introns are spliced out, mature mRNA represents only the exonic, expressed portion of a gene, and the mRNA population of a cell represents only the genes that cell is actually transcribing. Copying mRNA back into DNA therefore yields a strand complementary to the joined exons, with no intronic sequence. This complementary DNA (cDNA) captures the expressed sequence rather than the full genomic locus 3.

That property underlies two topics covered later in this program: reverse transcriptase, the enzyme that synthesizes DNA from an RNA template, and RT-PCR, which couples reverse transcription to amplification to detect or quantify RNA. They are named here only to connect the biology of splicing to its laboratory use; each is taught as its own topic in upcoming lessons.
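
The difference between a genomic locus and DNA copied from its mRNA can be illustrated with a toy sequence. This is a sketch only: the gene is invented, the function name `first_strand_cdna` is hypothetical, and the code models base pairing alone, not the enzymology taught in later lessons.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna: str) -> str:
    """Return the DNA strand complementary to an mRNA, written 5'->3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))

# Made-up gene, coding strand 5'->3': uppercase = exon, lowercase = intron.
gene = "ATGGCC" + "gtaagtctag" + "TTCTAA"
mrna = "AUGGCCUUCUAA"  # spliced transcript of the toy gene

cdna = first_strand_cdna(mrna)
print(cdna)  # TTAGAAGGCCAT

# The complement of the first strand has the sense of the mRNA.
second_strand = "".join(DNA_COMPLEMENT[base] for base in reversed(cdna))
exons_only = "".join(base for base in gene if base.isupper())
print(second_strand == exons_only)     # True
print(len(second_strand) < len(gene))  # True: no intronic sequence
```

The DNA recovered from the mRNA matches the joined exons and is shorter than the gene, because the intron was never present in the template that was copied.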

Summary

Transcription copies one strand of DNA into RNA: RNA polymerase starts at a promoter, requires no primer, and reads the template (antisense) strand 3’→5’ while synthesizing RNA 5’→3’, so the transcript matches the coding (sense) strand with U in place of T. Cells make several RNA classes (mRNA, tRNA, rRNA, and regulatory RNAs), but only mRNA is translated. In eukaryotes the primary transcript is processed by 5’ capping, 3’ polyadenylation, and splicing, in which the spliceosome removes introns and joins exons; alternative splicing lets one gene yield multiple transcripts. Because the mature mRNA contains only exonic sequence, reverse-transcribing it captures expressed sequence, which is the foundation for the use of reverse transcriptase and RT-PCR, addressed in later lessons.

References

  1. Bruce Alberts, Rebecca Heald, Alexander Johnson, David Morgan, Martin Raff, Keith Roberts, Peter Walter. Molecular Biology of the Cell. 7th ed. W. W. Norton & Company. 2022.
  2. David L. Nelson, Michael M. Cox, Aaron A. Hoskins. Lehninger Principles of Biochemistry. 8th ed. W. H. Freeman (Macmillan Learning). 2021.
  3. Lela Buckingham. Molecular Diagnostics: Fundamentals, Methods, and Clinical Applications. 3rd ed. F.A. Davis Company. 2019.