Lesson 31 of 34 · Sequencing and Bioinformatics
Pyrosequencing and RNA Sequencing
Overview
Chain-termination (Sanger) sequencing and standard short-read next-generation sequencing are the workhorses of the molecular laboratory, but they are not the only ways to read a nucleic acid sequence. Two additional approaches appear in the ASCP Technologist in Molecular Biology outline because they answer questions the mainstream methods answer poorly. Pyrosequencing reads short stretches of DNA while measuring how much of each variant is present, making it well suited to quantitative questions. RNA sequencing turns the transcriptome — the RNA a cell is actually making — into sequence data, revealing which genes are expressed and how their transcripts are assembled. This lesson assumes you are already comfortable with the chemistry of sequencing-by-synthesis from the Sanger and next-generation sequencing lessons, with reverse transcription from the polymerases and reverse transcriptase topic, and with the massively parallel platforms covered under next-generation sequencing.
Pyrosequencing: sequencing by synthesis with a light readout
Pyrosequencing is a sequencing-by-synthesis method, but unlike Sanger sequencing it does not separate labeled fragments by size. Instead, it watches a DNA polymerase extend a primer in real time and reports each successful nucleotide incorporation as a flash of light [1]. The signal comes not from the incoming base itself but from a byproduct of the polymerization reaction: inorganic pyrophosphate (PPi).
When DNA polymerase adds a nucleotide to the growing strand, it joins the incoming deoxynucleoside triphosphate (dNTP) to the 3′ end and releases one molecule of pyrophosphate for each base incorporated [2]. Pyrosequencing converts that pyrophosphate into visible light through a coupled enzyme cascade:
- ATP sulfurylase combines the released PPi with adenosine 5′-phosphosulfate (APS) to generate adenosine triphosphate (ATP).
- Luciferase uses that ATP to convert luciferin to oxyluciferin, emitting a burst of light proportional to the amount of ATP produced.
- Apyrase, a nucleotide-degrading enzyme, continuously destroys unused dNTPs and excess ATP, resetting the reaction so the next nucleotide can be tested cleanly [1].
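The cascade above can be summarized as a set of coupled reactions. The scheme below is a sketch in LaTeX notation (it assumes the amsmath package for the labeled arrows); the full stoichiometry of the luciferase step, with oxygen consumed and AMP, PPi, and CO2 released, is taken from the standard firefly luciferase reaction and is not stated in this lesson.

```latex
\begin{align*}
(\mathrm{DNA})_n + \mathrm{dNTP}
  &\xrightarrow{\text{DNA polymerase}} (\mathrm{DNA})_{n+1} + \mathrm{PP_i} \\
\mathrm{PP_i} + \mathrm{APS}
  &\xrightarrow{\text{ATP sulfurylase}} \mathrm{ATP} + \mathrm{SO_4^{2-}} \\
\mathrm{ATP} + \text{luciferin} + \mathrm{O_2}
  &\xrightarrow{\text{luciferase}} \mathrm{AMP} + \mathrm{PP_i} + \text{oxyluciferin} + \mathrm{CO_2} + h\nu \\
\text{unused dNTP or ATP}
  &\xrightarrow{\text{apyrase}} \text{dNMP or AMP} + 2\,\mathrm{P_i}
\end{align*}
```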
The four dNTPs are not supplied all at once. They are flowed across the template one at a time in a known, repeating order. If the dispensed nucleotide is complementary to the next base on the template, polymerase incorporates it, pyrophosphate is released, and light is emitted; if it is not complementary, nothing is incorporated and no light appears. Because the laboratory knows which nucleotide was added at each step, the pattern of light-producing flows spells out the sequence.
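The flow logic described above can be sketched in a few lines of Python. This is an idealized illustration, not instrument software: the function name `simulate_flows` is hypothetical, and the model assumes perfect incorporation with no noise or loss of synchrony.

```python
def simulate_flows(synthesized: str, dispensation: str) -> list[tuple[str, int]]:
    """Return (dispensed base, number of bases incorporated) for each flow.

    `synthesized` is the strand the polymerase builds, written 5'->3'.
    `dispensation` is the order in which single dNTPs are flowed.
    """
    position = 0  # index of the next base to be added to the growing strand
    flows = []
    for base in dispensation:
        count = 0
        # Polymerase keeps extending while the next base matches the dispensed dNTP.
        while position < len(synthesized) and synthesized[position] == base:
            count += 1
            position += 1
        flows.append((base, count))
    return flows


# Each count is the idealized relative peak height for that flow.
for base, count in simulate_flows("AGGTTTC", "ATCGATCG"):
    print(base, count)
```

A count of zero corresponds to a flow with no light; a count above one corresponds to a homopolymer incorporated in a single flow.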
Reading a pyrogram
The output is a pyrogram: a trace of light intensity plotted against the nucleotide dispensation order. The key quantitative feature is that peak height is proportional to the number of identical bases incorporated in a single flow. When the template has a run of the same base (a homopolymer), polymerase adds all of them in one dispensation and releases a correspondingly larger pulse of pyrophosphate, producing a taller peak [1].
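Synthesized strand: A G G T T T C
Dispensation order: A T C G A T C G A T C G
Light signal by flow (relative peak height):
Flow 1 (A): A incorporated, peak height 1
Flow 2 (T): no incorporation, no light
Flow 3 (C): no incorporation, no light
Flow 4 (G): GG incorporated, peak height 2
Flow 5 (A): no incorporation, no light
Flow 6 (T): TTT incorporated, peak height 3
Flow 7 (C): C incorporated, peak height 1
Flows 8 to 12 (G A T C G): strand already complete, no light
(No light appears where the dispensed base is not incorporated.)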
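Reading a pyrogram is the reverse operation: each peak height is converted to a whole number of bases and the dispensed base is repeated that many times. The sketch below assumes the heights have already been normalized so that a single incorporation is about 1.0; `decode_pyrogram` and `single_base_height` are illustrative names, and the peak values are made up for the example.

```python
def decode_pyrogram(
    dispensation: str,
    peak_heights: list[float],
    single_base_height: float = 1.0,
) -> str:
    """Reconstruct the synthesized sequence from per-flow peak heights."""
    if len(dispensation) != len(peak_heights):
        raise ValueError("need exactly one peak height per dispensed base")
    sequence = []
    for base, height in zip(dispensation, peak_heights):
        # Round to the nearest whole number of incorporated bases.
        run_length = max(0, round(height / single_base_height))
        sequence.append(base * run_length)
    return "".join(sequence)


# Slightly noisy peaks for the dispensation order A T C G A T C G.
peaks = [1.04, 0.03, 0.00, 1.95, 0.05, 3.10, 0.97, 0.02]
print(decode_pyrogram("ATCGATCG", peaks))  # prints AGGTTTC
```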
The proportional readout is what makes pyrosequencing quantitative. A peak that is half the height of a neighboring single-base peak indicates that the incorporated base was present in only half the template molecules — exactly the measurement needed for allele or variant quantification. Common applications include estimating the percentage of a somatic variant in a tumor specimen, genotyping single-nucleotide variants, and measuring DNA methylation. Methylation work pairs pyrosequencing with bisulfite conversion, which turns the methylation state of individual cytosines into a sequence difference that the pyrogram can quantify base by base; that chemistry is covered under the bisulfite conversion and methylation topic.
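As a worked illustration of that quantification, the fraction of a variant at a single position can be estimated from the two allele-specific peaks. This is a minimal sketch under stated assumptions: the peaks are background-corrected, each allele contributes exactly one base to its flow, and the function names are hypothetical. The methylation helper relies on the fact that, after bisulfite conversion, a methylated cytosine still reads as C while an unmethylated one reads as T.

```python
def allele_fraction(variant_peak: float, reference_peak: float) -> float:
    """Fraction of template molecules carrying the variant base."""
    total = variant_peak + reference_peak
    if total <= 0:
        raise ValueError("no signal at this position")
    return variant_peak / total


def percent_methylation(c_peak: float, t_peak: float) -> float:
    """Percent methylation at one CpG after bisulfite conversion."""
    return 100.0 * allele_fraction(c_peak, t_peak)


# Variant peak 0.25 and reference peak 0.75, in single-base units.
print(f"{allele_fraction(0.25, 0.75):.0%}")  # prints 25%
```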
Strengths and limitations
Pyrosequencing’s strengths follow from its design. It is quantitative, reporting variant or methylation fractions rather than a simple call; it is fast and requires no electrophoretic separation; and it is well matched to short, defined targets where precise quantification matters more than read length [1].
Its limitations follow just as directly. Read length is short, typically only tens of bases, because signal-to-noise degrades as the strand lengthens and small synchronization errors accumulate across the population of template molecules. The most important limitation is homopolymer difficulty: in a long run of identical bases, all are incorporated in one flow, and the method must infer the run length from peak height alone. Distinguishing, say, seven identical bases from eight depends on a linear relationship between peak height and base count that becomes unreliable for long homopolymers, so insertions and deletions in such stretches are the characteristic error mode [1].
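The arithmetic behind the homopolymer problem can be sketched as follows. Going from n to n + 1 bases raises the peak by only 1/n of its height, so the relative difference shrinks as the run grows. The noise model here is an assumption made for illustration only: measurement noise is taken to scale with peak height (a coefficient of variation, `cv`), and a call is treated as ambiguous once one standard deviation of noise reaches the half-base rounding margin.

```python
def relative_step(run_length: int) -> float:
    """Fractional increase in peak height when a run grows from n to n + 1."""
    if run_length < 1:
        raise ValueError("run_length must be at least 1")
    return 1.0 / run_length


def is_call_ambiguous(run_length: int, cv: float) -> bool:
    """True if noise (cv * n, in single-base units) reaches the 0.5 rounding margin."""
    noise_in_base_units = cv * run_length
    return noise_in_base_units >= 0.5


# With an assumed 7% coefficient of variation, short runs are safe and long runs are not.
for n in (1, 3, 7, 10):
    print(n, f"{relative_step(n):.0%}", is_call_ambiguous(n, cv=0.07))
```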
RNA sequencing (RNA-seq)
Most sequencing chemistries read DNA, not RNA, so RNA sequencing typically begins by converting the RNA in a sample into a complementary DNA (cDNA) copy. Reverse transcriptase, the RNA-dependent DNA polymerase introduced in the polymerases and reverse transcriptase topic, synthesizes a cDNA strand using the RNA as template [2]. The resulting cDNA is then built into a sequencing library and read, in the overwhelming majority of clinical and research workflows, on a massively parallel next-generation sequencing platform: the same instruments and short-read chemistries described in the next-generation sequencing lesson [1].
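At the sequence level, first-strand synthesis produces the reverse complement of the RNA, with T in place of U. The snippet below is a string-level illustration only (the function name is hypothetical, and it ignores priming strategy, second-strand synthesis, and library preparation).

```python
# Base pairing used by reverse transcriptase: RNA base -> cDNA base.
_RNA_TO_CDNA = {"A": "T", "C": "G", "G": "C", "U": "A"}


def first_strand_cdna(rna: str) -> str:
    """Return first-strand cDNA (5'->3') for an RNA sequence written 5'->3'."""
    try:
        # The cDNA is antiparallel to the RNA, so read the template in reverse.
        return "".join(_RNA_TO_CDNA[base] for base in reversed(rna.upper()))
    except KeyError as exc:
        raise ValueError(f"not an RNA base: {exc.args[0]!r}") from exc


print(first_strand_cdna("AUGGCC"))  # prints GGCCAT
```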
What distinguishes RNA-seq from genome sequencing is not the instrument but the template and the questions it answers. Because the abundance of each transcript in the starting material is reflected in the number of sequencing reads that map back to its gene, RNA-seq is fundamentally a measurement of gene expression: the read count per transcript, once normalized for transcript length and sequencing depth, estimates how abundant each gene’s transcripts are [1]. Sampling the transcriptome this way also reveals features that genomic DNA alone cannot show:
- Gene expression levels: relative or absolute transcript abundance across thousands of genes at once.
- Splice variants: which exons are joined together, since RNA-seq reads the spliced, processed transcript rather than the intron-containing gene. This builds directly on transcription and RNA processing.
- Fusion transcripts: chimeric messages produced when a structural rearrangement joins two genes, detectable as reads that span the junction between sequences from different genes [1].
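To make the idea of counting reads per transcript concrete, the sketch below normalizes raw counts to transcripts per million (TPM), one common convention that corrects for transcript length and sequencing depth. The lesson does not prescribe a normalization method, so TPM is an assumption here, and the gene names and numbers are hypothetical.

```python
def tpm(read_counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Convert raw read counts to transcripts per million (TPM)."""
    # Step 1: reads per kilobase corrects for longer transcripts yielding more reads.
    per_kb = {gene: read_counts[gene] / (lengths_bp[gene] / 1000) for gene in read_counts}
    total = sum(per_kb.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    # Step 2: scale so that all values sum to one million (corrects for depth).
    return {gene: rate / total * 1_000_000 for gene, rate in per_kb.items()}


counts = {"GENE_A": 100, "GENE_B": 100}
lengths = {"GENE_A": 1000, "GENE_B": 4000}
print(tpm(counts, lengths))
```

Both hypothetical genes have the same raw count, but GENE_A is a quarter the length of GENE_B, so it receives four times the TPM value.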
The clinical relevance is greatest in oncology, where fusion transcripts are recurrent, diagnostic, and sometimes directly targetable drivers of disease. Because RNA-seq detects the expressed fusion message regardless of exactly where in the intervening introns the DNA break occurred, it can capture fusions that are difficult to find by DNA-based methods; the use of fusion detection in hematologic and solid-tumor testing is developed further under those application topics.
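The principle of finding fusions from junction-spanning reads can be sketched as a simple tally. This is a toy illustration, not a fusion caller: the `SplitRead` record, the `min_support` threshold, and the read data are all hypothetical, and real pipelines add alignment, filtering, and breakpoint annotation steps that are omitted here.

```python
from collections import Counter
from typing import NamedTuple


class SplitRead(NamedTuple):
    read_id: str
    gene_5p: str  # gene matched by the 5' portion of the read
    gene_3p: str  # gene matched by the 3' portion of the read


def fusion_candidates(
    reads: list[SplitRead], min_support: int = 2
) -> dict[tuple[str, str], int]:
    """Count reads whose two portions map to different genes."""
    support = Counter(
        (read.gene_5p, read.gene_3p) for read in reads if read.gene_5p != read.gene_3p
    )
    # Keep only gene pairs seen in at least `min_support` independent reads.
    return {pair: n for pair, n in support.items() if n >= min_support}


toy_reads = [
    SplitRead("r1", "BCR", "ABL1"),
    SplitRead("r2", "BCR", "ABL1"),
    SplitRead("r3", "ABL1", "ABL1"),  # ordinary read within one gene
    SplitRead("r4", "EML4", "ALK"),   # single supporting read, below threshold
]
print(fusion_candidates(toy_reads))  # prints {('BCR', 'ABL1'): 2}
```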
A note on long-read (third-generation) sequencing
The short-read platforms above fragment nucleic acid into small pieces before sequencing. A newer class of long-read, or third-generation, sequencing instead reads single molecules of DNA or RNA continuously, producing reads thousands to tens of thousands of bases long [1]. Because each read can span a long homopolymer, a repetitive region, or a large structural rearrangement in one piece, long-read sequencing has advantages for resolving homopolymers and structural variants and for assembling regions that are ambiguous when reconstructed from many short reads [3]. The trade-off has historically been a higher per-read error rate, and the technology is mentioned here for completeness; its detailed chemistry is beyond the scope of this lesson.
Summary
Pyrosequencing is a sequencing-by-synthesis method that detects the pyrophosphate released on each nucleotide incorporation, converting it through the sulfurylase-to-ATP-to-luciferase cascade into light whose peak height is proportional to the number of identical bases added. That proportionality makes it quantitative and ideal for short targets, methylation, and variant quantification, but limits it to short reads and makes long homopolymers its characteristic weakness. RNA sequencing converts RNA to cDNA with reverse transcriptase and reads it on next-generation platforms to measure gene expression and to reveal splice variants and clinically important fusion transcripts. Long-read sequencing rounds out the picture by reading single, very long molecules, with particular strengths for homopolymers and structural variation.
References
1. Lela Buckingham. Molecular Diagnostics: Fundamentals, Methods, and Clinical Applications. 3rd ed. F.A. Davis Company. 2019.
2. Bruce Alberts, Rebecca Heald, Alexander Johnson, David Morgan, Martin Raff, Keith Roberts, Peter Walter. Molecular Biology of the Cell. 7th ed. W. W. Norton & Company. 2022.
3. Michael R. Green, Joseph Sambrook. Molecular Cloning: A Laboratory Manual. 4th ed. Cold Spring Harbor Laboratory Press. 2012.