Lesson 30 of 34 · Sequencing and Bioinformatics

Next-Generation Sequencing

Overview

Sanger sequencing reads one amplified template at a time, resolving a few hundred to roughly a thousand bases per capillary in a single run. That is ideal for confirming a single variant or sequencing one short region, but it does not scale to interrogating thousands of genes, a whole exome, or a microbial population in one experiment. Next-generation sequencing (NGS), also called massively parallel sequencing, removes that bottleneck: instead of one template per reaction, NGS sequences millions of DNA fragments simultaneously, each immobilized in its own physical location and read in parallel [1].

This lesson assumes familiarity with the chemistry of nucleic acids, complementary base pairing, and the enzymes used in molecular biology — DNA polymerase, ligase, and the labeling of nucleotides — together with the polymerase chain reaction and the principle of chain-termination (Sanger) sequencing covered in the previous lesson. With that foundation, NGS is best understood not as a single instrument but as a generic, platform-neutral workflow that turns a purified DNA sample into millions of short text records called reads.

From One Read to Millions

The defining shift from Sanger to NGS is parallelism. In Sanger sequencing the number of bases read scales with the number of capillaries and runs. In NGS the sample is broken into a vast collection of fragments, each fragment is anchored to a surface or a bead, and every anchored fragment is sequenced at the same time in a repeated chemical cycle imaged across the whole surface [1]. The output is therefore not one trace but an enormous file of individual reads, which shifts the analytical burden downstream: the challenge becomes assembling and interpreting millions of short reads rather than reading a single long one.

Although platforms differ in their chemistry and detection method, nearly all share the same five-stage pipeline:

  (1) Library         (2) Target          (3) Clonal
      preparation  ->     enrichment   ->     amplification  ->
      (fragment,          (optional:          (clusters /
       adapters,           panel, exome,        bead colonies)
       index)              genome)
                                                    |
                                                    v
  (5) Reads        <-  (4) Sequencing by synthesis
      (FASTQ-style         (cyclic labeled-base
       text records)        incorporation, imaged)

Stage 1 — Library Preparation

A sequencing library is the population of sample fragments made ready for the instrument. Preparation begins by fragmenting the genomic DNA, either mechanically (for example by sonication) or enzymatically, into pieces of a target size range [2]. Short, synthetic double-stranded oligonucleotides called adapters are then ligated to both ends of every fragment, using DNA ligase to form the covalent joins [2]. These adapters are the universal handles that let the instrument bind, amplify, and prime every fragment with common oligonucleotides, regardless of the fragment’s internal sequence.

Adapters also enable multiplexing, that is, pooling many samples in a single sequencing run. Each sample is given adapters carrying a short, unique sequence called an index or barcode. Because every read inherits the barcode of its sample, reads from a shared run can be sorted back to their source afterward, a step called demultiplexing. The labeling and end-modification chemistry that makes ligation efficient builds on the same nucleic-acid labeling principles used elsewhere in molecular biology [1]. The product of stage 1 is a pool of fragments, each flanked by adapters and tagged by sample.
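The sorting step can be illustrated with a minimal sketch. It assumes each read arrives as a pair of strings (the index read and the insert sequence) and that a sample sheet maps index sequences to sample names. The index sequences, sample names, function names, and the one-mismatch tolerance are all hypothetical choices made for illustration, not the behavior of any particular instrument's software.

```python
from collections import defaultdict

# Hypothetical sample sheet: index (barcode) sequence -> sample name.
SAMPLE_INDEXES = {
    "ACGTAC": "sample_A",
    "TGCATG": "sample_B",
}

def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def assign_sample(index_read: str, max_mismatches: int = 1) -> str:
    """Return the sample whose index matches within the tolerance."""
    hits = [
        sample
        for index, sample in SAMPLE_INDEXES.items()
        if len(index) == len(index_read)
        and hamming(index, index_read) <= max_mismatches
    ]
    # Require exactly one candidate so an ambiguous index is never guessed.
    return hits[0] if len(hits) == 1 else "undetermined"

def demultiplex(reads):
    """Group (index_read, insert_sequence) pairs by assigned sample."""
    bins = defaultdict(list)
    for index_read, insert in reads:
        bins[assign_sample(index_read)].append(insert)
    return dict(bins)

reads = [
    ("ACGTAC", "GATTACA"),   # exact match to sample_A
    ("TGCATG", "CCGGTTAA"),  # exact match to sample_B
    ("ACGTAA", "TTTTGGGG"),  # one mismatch away from sample_A
    ("GGGGGG", "ACACACAC"),  # matches no index in the sheet
]
demultiplexed = demultiplex(reads)
for sample, inserts in demultiplexed.items():
    print(sample, inserts)
```

Allowing a small number of mismatches recovers reads whose index contains a sequencing error, but it only works safely when the indexes in the pool differ from one another at more positions than the tolerance.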

Stage 2 — Target Enrichment (Optional)

Sequencing every base of the genome is often unnecessary. When the question concerns specific genes, the library can be enriched for the regions of interest before sequencing, which concentrates the reads where they are most informative [1]. Two strategies dominate:

  • Amplicon-based panels use targeted PCR primer pairs to amplify the regions of interest directly, so only those amplicons enter the library. This suits small, well-defined panels and works with limited input material.
  • Hybrid-capture panels use long biotinylated probes that hybridize to the target regions; the probe-bound fragments are then pulled down (typically on streptavidin-coated beads) while unbound DNA is washed away. This scales to larger gene sets and tolerates a wider range of fragments.

Enrichment exists on a spectrum of breadth. A targeted panel interrogates tens to hundreds of genes; whole-exome sequencing captures the protein-coding regions across the genome; whole-genome sequencing skips enrichment entirely and sequences everything. Broader scope yields more comprehensive data but spreads sequencing effort more thinly, a tradeoff that connects directly to coverage, discussed below.
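The breadth-versus-depth tradeoff can be put in rough numbers. The sketch below assumes the simplest possible model: mean depth equals the total number of sequenced bases divided by the size of the target. The target sizes and the run yield are rounded, illustrative assumptions rather than values from this lesson, and the model ignores off-target reads, duplicates, and uneven capture.

```python
def mean_depth(total_bases: float, target_size_bp: float) -> float:
    """Average depth if all sequenced bases landed evenly on the target."""
    if target_size_bp <= 0:
        raise ValueError("target size must be positive")
    return total_bases / target_size_bp

# Rounded, illustrative target sizes in base pairs (assumptions).
TARGET_SIZES_BP = {
    "targeted panel": 1_000_000,
    "whole exome": 50_000_000,
    "whole genome": 3_000_000_000,
}

# Hypothetical run yield: 90 billion sequenced bases.
RUN_YIELD_BASES = 90_000_000_000

depth_by_scope = {
    scope: mean_depth(RUN_YIELD_BASES, size)
    for scope, size in TARGET_SIZES_BP.items()
}

for scope, depth in depth_by_scope.items():
    print(f"{scope:>15}: ~{depth:,.0f}x mean depth")
```

Under these assumptions the same yield gives about 30x across the genome, 1,800x across the exome, and 90,000x across the panel, which is why narrow panels are the natural choice when very deep coverage is needed.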

Stage 3 — Clonal Amplification

A single DNA molecule produces too little signal to detect reliably, so each library fragment is copied many times in place to form a localized cluster of identical molecules (a clonal population) that emits a detectable signal during sequencing [1]. Two classic approaches achieve this:

  • Bridge amplification on a flow cell: fragments bind to a lawn of surface-bound oligonucleotides complementary to the adapters. Each fragment repeatedly bends over to prime a neighboring surface oligo, copying itself across the surface and building a dense, spatially fixed cluster of clones.
  • Emulsion PCR on beads: single fragments are captured on individual beads inside microscopic water-in-oil droplets, where PCR coats each bead with many copies of its one fragment. The beads are then arrayed for sequencing.

Either way, the principle is the same: convert one molecule into a spatially discrete colony of identical copies so its signal rises above background. Because amplification is clonal, every molecule in a given cluster carries the same sequence, and all the parallel clusters are sequenced simultaneously.

Stage 4 — Sequencing by Synthesis

Most platforms read the clusters by sequencing-by-synthesis: a DNA polymerase extends a primer along each template one base at a time, and the identity of each added base is detected at every step [1]. The reaction proceeds in repeated cycles, and the surface is interrogated after each cycle so that the base just incorporated at every cluster is recorded at once. Reading across the cycles gives the sequence of each cluster.

The detection chemistry varies. In the most common scheme, the incorporated nucleotides carry fluorescent labels, and the surface is imaged each cycle so that a cluster’s color reveals which of the four bases was added [1]. The chemistry typically uses reversible terminators: a blocking group permits only one base to add per cycle, and after imaging the label and block are removed so the next cycle can proceed. An alternative, label-free approach detects the hydrogen ion (and the resulting local pH change) released each time a nucleotide is incorporated, using a semiconductor sensor instead of a camera; the base is then inferred from which nucleotide was flowed when the signal appeared. Both approaches read sequence by building the complementary strand and watching incorporation, the same logic of templated, polymerase-driven synthesis seen in DNA replication and in PCR [3].
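Both detection schemes reduce to turning a series of per-cycle signals into a string of bases, which a toy sketch can illustrate. The first function assumes one intensity value per base per cycle and simply picks the brightest; the second assumes a fixed nucleotide flow order and a signal roughly proportional to the number of bases incorporated in that flow. The function names, the flow order, and all signal values are invented for illustration, and real base callers are considerably more elaborate.

```python
CHANNELS = "ACGT"

def call_cluster(intensities):
    """Call one base per cycle from [A, C, G, T] intensities for one cluster."""
    calls = []
    for cycle in intensities:
        # With reversible terminators only one base is added per cycle,
        # so the brightest channel is taken as the incorporated base.
        brightest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        calls.append(CHANNELS[brightest])
    return "".join(calls)

def call_flows(flow_order, signals):
    """Decode flow-based signals: one nucleotide offered per flow.

    A signal near 0 means no incorporation; a signal near n means n
    identical bases were incorporated in a row during that flow.
    """
    read = []
    for base, signal in zip(flow_order, signals):
        count = max(0, round(signal))
        read.append(base * count)
    return "".join(read)

# Three imaging cycles for one hypothetical cluster.
cluster_signal = [
    [0.90, 0.10, 0.05, 0.02],
    [0.10, 0.05, 0.80, 0.10],
    [0.00, 0.10, 0.10, 0.95],
]
fluorescent_read = call_cluster(cluster_signal)

# Eight nucleotide flows for one hypothetical sensor well.
flow_read = call_flows("TACGTACG", [1.02, 0.05, 2.10, 0.00, 0.90, 1.10, 0.03, 0.96])

print(fluorescent_read, flow_read)
```

The flow-based decoder shows why runs of identical bases are harder for that chemistry: the length of the run has to be estimated from signal magnitude rather than counted cycle by cycle.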

Stage 5 — Reads and Their Properties

The output of a run is a large set of reads, each a short string of base calls with an associated per-base quality estimate. Several properties of reads determine what an experiment can detect.

  • Read length is the number of bases called from a single fragment in one direction. NGS reads are typically much shorter than a Sanger read, which is why downstream assembly and alignment are essential.
  • Single-end vs paired-end. A fragment can be read from one end only (single-end) or from both ends (paired-end). Paired-end reads come with a known approximate distance between the two reads, which improves alignment accuracy and helps resolve repetitive regions and structural rearrangements.
  • Coverage, or read depth, is how many independent reads span a given position. It is often summarized as an average (for example, “30x” means each base is covered about thirty times on average), but the depth at any single position is what matters for calling a variant there.
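The difference between average depth and depth at a single position can be seen in a small sketch. It assumes each aligned read is represented only by the interval it covers on the reference, as a half-open, 0-based (start, end) pair. The intervals, region length, and function name are hypothetical.

```python
def depth_per_position(read_intervals, region_length):
    """Count how many reads span each position of a region.

    read_intervals holds (start, end) pairs, 0-based and half-open,
    so a read (2, 7) covers positions 2, 3, 4, 5 and 6.
    """
    depth = [0] * region_length
    for start, end in read_intervals:
        # Clip each read to the region before counting.
        for pos in range(max(start, 0), min(end, region_length)):
            depth[pos] += 1
    return depth

# Four hypothetical reads aligned to a 10-base region.
reads = [(0, 5), (2, 7), (3, 8), (6, 10)]
depth = depth_per_position(reads, 10)
mean = sum(depth) / len(depth)

print("per-position depth:", depth)
print("mean depth:", mean)
print("minimum depth:", min(depth))
```

Here the mean is 1.9x, yet several positions are covered by only one read, so a variant at one of those positions would rest on a single observation.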

Why depth matters is central to clinical NGS. Each read carries a small error rate, and a genuine variant must be distinguished from sequencing noise. When the variant of interest is present in only a small fraction of molecules, as with a low-level somatic mutation in a tumor sample diluted by normal DNA, many reads must cover the position before enough variant-bearing reads accumulate to separate true signal from error [1]. Deep coverage is therefore the price of detecting low-frequency variants confidently; broad scope (a whole genome) and great depth pull in opposite directions for a fixed amount of sequencing, which is why panel design balances breadth against the depth each target needs.
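The relationship between depth and low-frequency detection can be sketched with a simple sampling model. The sketch assumes that each read covering the position independently carries the variant with probability equal to the variant fraction, so the number of variant-bearing reads follows a binomial distribution. The minimum of five supporting reads and the 5% variant fraction are hypothetical, and the model deliberately ignores sequencing error, which a real variant caller must also weigh.

```python
from math import comb

def prob_at_least(min_reads: int, depth: int, variant_fraction: float) -> float:
    """P(at least min_reads variant-bearing reads) under Binomial(depth, variant_fraction)."""
    p_fewer = sum(
        comb(depth, i)
        * variant_fraction**i
        * (1 - variant_fraction) ** (depth - i)
        for i in range(min_reads)
    )
    return 1.0 - p_fewer

VARIANT_FRACTION = 0.05   # hypothetical: variant present in 5% of molecules
MIN_SUPPORTING_READS = 5  # hypothetical calling threshold

detection = {
    depth: prob_at_least(MIN_SUPPORTING_READS, depth, VARIANT_FRACTION)
    for depth in (30, 100, 500, 1000)
}

for depth, probability in detection.items():
    print(f"depth {depth:>5}x: P(>= {MIN_SUPPORTING_READS} variant reads) = {probability:.3f}")
```

At 30x the expected number of variant-bearing reads is only 1.5, so reaching five is unlikely; at several hundred fold it becomes nearly certain, which is the quantitative reason deep panels are used for low-level somatic variants.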

Applications

The same workflow supports very different clinical and research questions, mostly by varying the enrichment strategy and the depth:

  • Targeted gene panels in oncology, sequenced deeply, detect somatic variants in solid tumors and hematologic malignancies and support companion-diagnostic decisions (covered under the molecular oncology applications).
  • Whole-exome sequencing supports the diagnosis of inherited disease, where a causative variant may lie anywhere among the protein-coding genes (covered under the inherited-disorder applications).
  • Untargeted sequencing of microbial DNA underlies pathogen identification and metagenomics, where reads from many organisms in a specimen are sorted computationally (covered under the infectious-disease applications).

Looking Ahead

Generating reads is only half of NGS. Turning millions of short reads into a clinical answer — aligning them to a reference, assessing per-base quality, measuring depth, and calling variants — is the work of bioinformatics, the subject of a later lesson in this module. The properties introduced here (read length, paired-end information, and coverage) are precisely the inputs that those analysis pipelines consume.
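As a small preview of what those pipelines consume, the sketch below parses one FASTQ-style record: four lines holding an identifier, the base calls, a separator, and one quality character per base. It assumes the widely used Phred+33 encoding, in which the quality score Q is the character's ASCII code minus 33 and the estimated error probability is 10^(-Q/10). The record contents and function names are hypothetical.

```python
def parse_fastq_record(lines):
    """Split one four-line FASTQ record into (read_id, sequence, phred_scores)."""
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Phred+33 encoding (assumed): score = ASCII code - 33.
    phred_scores = [ord(ch) - 33 for ch in quality]
    return header[1:], sequence, phred_scores

def error_probability(phred_score: int) -> float:
    """Estimated probability that a base call with this score is wrong."""
    return 10 ** (-phred_score / 10)

# One hypothetical record.
record = ["@read_1\n", "GATTACA\n", "+\n", "IIII5+!\n"]
read_id, sequence, scores = parse_fastq_record(record)

print(read_id, sequence, "read length:", len(sequence))
for base, score in zip(sequence, scores):
    print(base, score, f"{error_probability(score):.4f}")
```

The read length is simply the length of the sequence line, and each base carries its own score, so quality can fall along a read without affecting the earlier calls.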

References

  1. Lela Buckingham. Molecular Diagnostics: Fundamentals, Methods, and Clinical Applications. 3rd ed. F.A. Davis Company. 2019.
  2. Michael R. Green, Joseph Sambrook. Molecular Cloning: A Laboratory Manual. 4th ed. Cold Spring Harbor Laboratory Press. 2012.
  3. Bruce Alberts, Rebecca Heald, Alexander Johnson, David Morgan, Martin Raff, Keith Roberts, Peter Walter. Molecular Biology of the Cell. 7th ed. W. W. Norton & Company. 2022.