Lesson 4 of 34 · Nucleic Acid Chemistry

Mutations and Sequence Variation

Why variation is the point of molecular testing

The preceding lessons built up DNA as a stable, base-paired, protein-packaged molecule. Yet the reason a molecular laboratory sequences or genotypes DNA at all is that the sequence is not identical from one copy to the next. Any difference between an observed sequence and a chosen reference sequence is a sequence variant. Some variants change how a gene works and cause disease; most do not. The work of the molecular lab is to detect variation reliably and then describe it precisely enough that another laboratory, a clinician, or a database can interpret the same finding the same way 1.

This lesson is the chemistry module’s capstone. It catalogs the kinds of variation a lab encounters, draws the germline/somatic and polymorphism/pathogenic distinctions, and introduces the idea of a standardized variant nomenclature. It builds directly on nucleic-acid structure and base pairing, and it sets up the genetics and applications courses that follow.

Substitutions

A substitution replaces one base with another at a single position — the simplest and most common kind of variant. Substitutions are classified first by the chemistry of the change 2:

  • A transition exchanges one purine for the other purine (A to G or G to A) or one pyrimidine for the other pyrimidine (C to T or T to C). The swapped bases have the same ring structure, and transitions are observed more often than transversions even though only four of the twelve possible single-base changes are transitions.
  • A transversion exchanges a purine for a pyrimidine or vice versa (for example A to C, or G to T). Each one swaps a two-ring base for a one-ring base or the reverse, a larger chemical change that arises less often.

Purines      A <----transition----> G
             |  \                /  |
             |     transversions    |
             |  /                \  |
Pyrimidines  C <----transition----> T

Every line joining a purine to a pyrimidine (A-C, A-T, G-C, G-T) is a transversion.
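
The purine/pyrimidine rule can be written as a simple lookup. What follows is a minimal Python sketch, not part of the lesson's source material; the function name classify_substitution is an illustrative assumption.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base DNA change."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical, so this is not a substitution")
    # Same chemical class on both sides means a transition; otherwise a transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```

For example, classify_substitution("A", "G") gives "transition" and classify_substitution("G", "T") gives "transversion".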

When a substitution falls inside a protein-coding region, its effect is read through the genetic code 3:

  • A silent (synonymous) variant changes the DNA but leaves the encoded amino acid unchanged, because several codons specify the same amino acid.
  • A missense variant changes one codon so that a different amino acid is inserted. The consequence ranges from negligible to severe depending on where it falls and how chemically different the new residue is.
  • A nonsense variant turns an amino-acid codon into a stop codon, truncating the protein early and often abolishing its function.

These categories explain why two substitutions of identical chemistry can carry entirely different clinical weight: the effect on the protein depends on which codon the change falls in and which position of that codon it alters, not on the base change alone 1.
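
The three coding categories can be illustrated by comparing the amino acid encoded before and after a change. The Python sketch below is a hedged illustration only: it builds the standard genetic code table from DNA codons, and the function name classify_coding_substitution is a hypothetical name chosen here. It assumes the two codons are already in the correct reading frame and differ at exactly one base.

```python
from itertools import product

# Standard genetic code, DNA codons, bases ordered T, C, A, G ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a one-base change within a codon as silent, missense, or nonsense."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if ref_codon not in CODON_TABLE or alt_codon not in CODON_TABLE:
        raise ValueError("codons must be three bases drawn from A, C, G, T")
    differences = sum(r != a for r, a in zip(ref_codon, alt_codon))
    if differences != 1:
        raise ValueError("expected exactly one changed base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        # A stop codon changed to an amino acid: outside the three categories above.
        return "other (stop codon lost)"
    return "missense"
```

Three transitions show the point made above: CAT to CAC is silent (histidine either way), CAT to CGT is missense (histidine to arginine), and CAG to TAG is nonsense (glutamine to stop).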

Insertions, deletions, and the reading frame

An insertion adds one or more bases and a deletion removes them; together they are often called indels. Their consequence in a coding region depends on how many bases are involved 3.

Translation reads the coding sequence in non-overlapping triplets. If an indel adds or removes a number of bases that is not a multiple of three, the triplet boundaries shift and every downstream codon is read differently: a frameshift. A frameshift usually produces a string of unrelated amino acids ending at a premature stop codon, so it tends to be highly disruptive.

reference   ATG  CAT  CAT  CAT  ...   reads: M  H  H  H
insert C    ATG  CCA  TCA  TCA  T..   reads: M  P  S  S  (frame shifted)
delete 3    ATG  CAT  ___  CAT  ...   reads: M  H  -  H  (in-frame: one residue lost)

By contrast, an in-frame indel adds or removes bases in multiples of three. The reading frame is preserved; the protein simply gains or loses whole amino acids while the rest of the sequence is read normally. In-frame changes are often, though not always, less damaging than frameshifts 1.
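
The multiple-of-three rule reduces to a remainder check. The following Python sketch is illustrative only; indel_frame_effect and split_codons are hypothetical helper names, and the sequences are the ones used in the diagram above.

```python
def indel_frame_effect(n_bases: int) -> str:
    """Return 'in-frame' if the indel length is a multiple of three, else 'frameshift'."""
    if n_bases <= 0:
        raise ValueError("an indel must add or remove at least one base")
    return "in-frame" if n_bases % 3 == 0 else "frameshift"


def split_codons(coding_sequence: str) -> list[str]:
    """Split a sequence into complete triplets, ignoring any trailing partial codon."""
    usable = len(coding_sequence) - len(coding_sequence) % 3
    return [coding_sequence[i:i + 3] for i in range(0, usable, 3)]


reference = "ATGCATCATCAT"
with_insertion = reference[:3] + "C" + reference[3:]   # one base inserted after ATG
with_deletion = reference[:6] + reference[9:]          # one whole CAT codon removed
```

Here split_codons(with_insertion) regroups the bases as ATG, CCA, TCA, TCA, while split_codons(with_deletion) simply drops one CAT and leaves the other codons unchanged.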

Variation beyond the coding sequence

Not all consequential variation sits within codons.

Splice-site variants fall at or near the boundaries between exons and introns. Because the cell relies on specific sequences to remove introns and join exons, a variant here can cause an exon to be skipped or an intron to be retained, altering the final protein even though the change lies outside the coding triplets themselves 3. (RNA processing and splicing are developed in the genetics module.)

Copy-number variation (CNV) refers to gains or losses of larger stretches of DNA, from kilobases up to whole genes or chromosome segments, so that a region is present in more or fewer copies than the reference. Dosage changes of this kind can contribute to disease independent of any single-base change 1.

Short-tandem-repeat and trinucleotide-repeat expansions involve a short sequence motif repeated many times in tandem. When the number of repeats grows beyond a normal range, the expansion can disrupt gene function. This mechanism underlies a group of disorders — fragile X and Huntington disease among them — that the applications course examines in detail; here it is enough to recognize repeat expansion as a distinct category of variation 1.
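
Counting tandem copies of a motif is the core of recognizing an expansion. The Python sketch below is a simplified illustration on a plain sequence string, not a description of how any assay measures repeats; longest_tandem_run and exceeds_normal_range are hypothetical names, and the normal upper limit is deliberately left as a caller-supplied value because this lesson does not give thresholds.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    # The lookahead lets a run be measured from every start position,
    # so runs that overlap an earlier partial match are not missed.
    pattern = re.compile(f"(?=((?:{re.escape(motif.upper())})+))")
    runs = [len(m.group(1)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


def exceeds_normal_range(repeat_count: int, normal_max: int) -> bool:
    """True if repeat_count is above normal_max (an assumed, caller-supplied limit)."""
    return repeat_count > normal_max
```

For instance, longest_tandem_run("TTCAGCAGCAGTT", "CAG") returns 3.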

Structural variants are large-scale rearrangements of the genome: duplications, large deletions, inversions, and translocations, in which segments of chromosomes are exchanged or relocated. Certain translocations create fusion genes that drive cancers, a topic the oncology applications course treats directly. At this stage the point is simply that variation spans scales from a single base to whole chromosome arms 1.

Germline versus somatic variation

Where a variant arises is as important as what it is.

A germline variant is inherited through an egg or sperm, or arises in one, so it is present from conception, is carried in essentially every cell of the individual (including that individual's own germ cells), and can be passed to offspring. Inherited conditions and hereditary disease risk are germline 3.

A somatic variant arises in a non-germline cell during life and is confined to that cell and its descendants. It is not heritable. Most cancer-driving variants are somatic, which is why a tumor’s genome can differ from the patient’s normal tissue. The distinction shapes how a specimen is collected, what it is compared against, and how the result is reported 1.
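
The idea of comparing a tumor against the patient's normal tissue can be sketched as a set comparison. This Python example is a deliberately simplified assumption, not a description of a real analysis pipeline: variants are represented as hypothetical (chromosome, position, reference base, alternate base) tuples, and the function name split_by_origin is illustrative.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref base, alt base)


def split_by_origin(
    tumor_variants: list[Variant], normal_variants: list[Variant]
) -> dict[str, set[Variant]]:
    """Separate tumor variants by whether the matched normal sample also has them."""
    tumor, normal = set(tumor_variants), set(normal_variants)
    return {
        # Also present in normal tissue: consistent with a germline variant.
        "shared_with_normal": tumor & normal,
        # Absent from normal tissue: a candidate somatic variant.
        "tumor_only": tumor - normal,
    }
```

A variant seen only in the tumor is a candidate, not a confirmed somatic call; a real comparison would also have to account for factors this sketch ignores, such as whether the normal sample was adequately covered at that position.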

Polymorphism versus pathogenic variant

A variant’s existence says nothing, by itself, about whether it causes disease. Two ideas separate harmless from harmful variation.

Allele frequency is how common a particular allele is in a population, expressed as the fraction of all copies of that position that carry the allele. A variant common enough to be a normal part of human diversity, conventionally one found at a frequency of about 1% or more, is often called a polymorphism; single-nucleotide polymorphisms (SNPs) are extremely numerous and account for much of the benign difference between individuals 1. Rarity alone does not prove harm, and commonness does not prove safety, but frequency is a first and powerful filter.
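
Allele frequency can be computed from genotype counts. The Python sketch below assumes a diploid, autosomal position with two alleles, so each person contributes two copies; the function and parameter names are illustrative, and the counts in the example are invented purely for arithmetic.

```python
def allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Frequency of the alternate allele at a diploid, biallelic position."""
    n_people = n_hom_ref + n_het + n_hom_alt
    if n_people == 0:
        raise ValueError("at least one genotyped individual is required")
    # Homozygous-alternate individuals carry two copies, heterozygotes carry one.
    alt_copies = 2 * n_hom_alt + n_het
    return alt_copies / (2 * n_people)
```

With 90 homozygous-reference, 9 heterozygous, and 1 homozygous-alternate individuals, the alternate allele accounts for 11 of 200 copies, a frequency of 0.055.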

A pathogenic variant is one with evidence that it causes or substantially contributes to disease. Establishing pathogenicity draws on frequency, the predicted effect on the protein, observed inheritance with disease, and functional data — it is an interpretive judgment, not a property read directly off the sequence 1. The older term mutation is still widely used, but current practice prefers the neutral word variant and states the clinical interpretation separately.

Describing variants consistently

Detecting a variant is only useful if it can be communicated unambiguously. Because the same change can be written many informal ways, the field uses a standardized variant nomenclature so that every laboratory describes the same finding identically 1.

At a conceptual level, this nomenclature names a variant relative to a defined reference sequence and specifies the coordinate system being used. A change can be described at the coding-DNA level (conventionally introduced with a c. prefix) or at the protein level (a p. prefix), so that one report can state both the nucleotide change and its predicted effect on the protein. The essential ideas are that the reference must be stated, the level of description must be explicit, and the same rules apply everywhere.

This lesson introduces the principle of standardized nomenclature, not a catalog of specific variant strings: writing an actual variant correctly requires the chosen reference sequence and the formal rules, which the techniques and applications courses apply to concrete cases. What matters here is the habit of mind — a variant claim is incomplete until it says what reference it is measured against and at what level it is described.
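
The habit of mind described above can be mirrored in a record type that refuses to exist without a reference and a level. This Python sketch is only an illustration of that principle: VariantClaim, its fields, and the placeholder values are hypothetical, and it does not validate or generate real nomenclature strings.

```python
from dataclasses import dataclass

# The two description levels named in this lesson, keyed by their prefix.
LEVELS = {"c": "coding DNA", "p": "protein"}


@dataclass(frozen=True)
class VariantClaim:
    reference: str  # identifier of the reference sequence the change is measured against
    level: str      # "c" or "p"
    change: str     # the change itself, written according to the formal rules

    def __post_init__(self) -> None:
        if not self.reference.strip():
            raise ValueError("a variant claim must name its reference sequence")
        if self.level not in LEVELS:
            raise ValueError(f"level must be one of {sorted(LEVELS)}")
        if not self.change.strip():
            raise ValueError("a variant claim must state the change")
```

Constructing VariantClaim(reference="", level="c", change="...") raises ValueError, which is the point: the claim is rejected as incomplete rather than stored without its reference.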

Where this leads

Variation is the signal every downstream molecular assay is built to detect. The genetics module returns to how these variants are inherited and expressed; the techniques course shows how substitutions, indels, repeats, and rearrangements are physically detected; and the applications courses tie specific variant classes to infectious disease, inherited conditions, and cancer. With structure, base pairing, chromatin, and now variation in hand, the chemistry foundation is complete.

References

  1. Lela Buckingham. Molecular Diagnostics: Fundamentals, Methods, and Clinical Applications. 3rd ed. F.A. Davis Company. 2019.
  2. David L. Nelson, Michael M. Cox, Aaron A. Hoskins. Lehninger Principles of Biochemistry. 8th ed. W. H. Freeman (Macmillan Learning). 2021.
  3. Bruce Alberts, Rebecca Heald, Alexander Johnson, David Morgan, Martin Raff, Keith Roberts, Peter Walter. Molecular Biology of the Cell. 7th ed. W. W. Norton & Company. 2022.