Lesson 19 of 34 · Manipulation of RNA/DNA

Bisulfite Conversion and Methylation Analysis

Overview

An earlier foundations lesson on chromatin introduced DNA methylation, the addition of a methyl group to cytosine, most often where a cytosine is immediately followed by a guanine (a CpG dinucleotide), as an epigenetic mark that can quiet a gene without altering its underlying sequence [1]. This lesson does not re-teach epigenetics; instead it addresses a practical analytical problem. Methylation carries clinical meaning, yet ordinary sequence reading is blind to it. The standard solution is a chemical pretreatment of the DNA, sodium bisulfite conversion, that rewrites the methylation pattern into a sequence difference the usual tools can detect.

Why Ordinary Sequence Reading Misses Methylation

The methyl group that defines 5-methylcytosine (5-mC) sits on carbon 5 of the cytosine ring, a position that takes no part in the hydrogen bonds the base uses to pair. A methylated cytosine therefore still pairs with guanine exactly as an unmethylated cytosine does. Replication copies it as a C, a polymerase reads it as a C, and a sequencer calls it a C.

The consequence is that the methyl mark is chemically real but analytically invisible. Two DNA molecules identical in sequence — one methylated at a given CpG, the other not — produce the same result in any method that only reads the four bases. To interrogate methylation, the mark must first be turned into something the base-reading machinery can see.

The Bisulfite Reaction

Sodium bisulfite treatment supplies exactly that conversion. Under controlled conditions of pH, temperature, and time, bisulfite adds to the cytosine ring (sulfonation), the sulfonated cytosine is deaminated to a sulfonated uracil, and a final alkaline desulfonation step releases uracil. The net result is that an unmethylated cytosine is converted to uracil. In the downstream amplification and reading steps, uracil behaves like thymine, so a converted (originally unmethylated) cytosine is ultimately read as T.

The decisive feature is selectivity. The methyl group on 5-methylcytosine protects it from this reaction. A methylated cytosine is left essentially unchanged and continues to be read as C. The direction of the conversion is therefore specific and must be stated precisely:

  • Unmethylated cytosine -> uracil -> read as T downstream
  • 5-methylcytosine -> protected, unchanged -> read as C downstream

After conversion, the readout at each CpG becomes a simple two-state question: a C that survives marks a position that was methylated, while a T marks a position that was unmethylated. The epigenetic mark has been recast as an ordinary sequence difference [2].

ORIGINAL DNA (mC = 5-methylcytosine):
   5'- A  C  G  T  mC G  A  C  T -3'
          ^        ^        ^
          |        |        non-CpG C (unmethylated)
          |        methylated CpG C (mC)
          unmethylated CpG C

AFTER BISULFITE CONVERSION (read downstream):
   5'- A  T  G  T  C  G  A  T  T -3'
          ^        ^        ^
          |        |        non-CpG C -> T
          |        methylated C stays C
          unmethylated C -> T

Note in the example that cytosines outside a CpG context are typically unmethylated and so also convert to T; only the protected methylcytosine remains a C. Reading C-versus-T at each CpG is what reports methylated-versus-unmethylated.
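
The conversion rule can be sketched in a few lines of Python. This is an illustrative model of the readout, not a laboratory protocol: the function name `bisulfite_convert` and the use of a set of 0-based methylated positions are assumptions made for the example.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Model the downstream read of one bisulfite-treated strand.

    seq: DNA string written 5'->3' using A, C, G, T.
    methylated_positions: 0-based indices of cytosines that carry a methyl
    group (a hypothetical input format chosen for this sketch).
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")   # unmethylated C -> U, read downstream as T
        else:
            out.append(base)  # 5-mC is protected; A, G and T are untouched
    return "".join(out)


# The example sequence above, with the second CpG cytosine (index 4) methylated.
original = "ACGTCGACT"
converted = bisulfite_convert(original, methylated_positions={4})
print(converted)  # ATGTCGATT
```

Only the cytosine at index 4 survives as C; the unmethylated CpG cytosine and the non-CpG cytosine both appear as T.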

Workflow at a High Level

A bisulfite-based methylation analysis proceeds in a consistent sequence of stages, whatever the final readout:

  purified DNA
       |
  denature to single strands   (bisulfite acts on single-stranded DNA)
       |
  sodium bisulfite treatment    (unmeth C -> U; meth C protected)
       |
  desalting / clean-up
       |
  amplify the converted region  (PCR — covered separately)
       |
  read the C-vs-T pattern       (MSP, bisulfite sequencing, or array)

Denaturation matters because the reagent acts on single-stranded DNA; double-stranded regions resist conversion. Equally important, the conversion is not symmetric across the two original strands — once unmethylated cytosines become uracils, the two strands are no longer complementary, so each strand is analyzed in its own right rather than treated as a mirror of the other.
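
The loss of complementarity can be checked with a small sketch. It assumes, for simplicity, that every cytosine on both strands is unmethylated; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def convert_unmethylated(seq):
    """Every C read as T: models a fully unmethylated strand after bisulfite."""
    return seq.replace("C", "T")


top = "ACGTCGACT"
bottom = reverse_complement(top)                 # AGTCGACGT

top_converted = convert_unmethylated(top)        # ATGTTGATT
bottom_converted = convert_unmethylated(bottom)  # AGTTGATGT

# Before treatment the strands pair perfectly; afterwards they do not.
still_complementary = reverse_complement(top_converted) == bottom_converted
print(still_complementary)  # False
```

Because each strand loses its own cytosines independently, the converted top strand and the converted bottom strand are effectively two different templates.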

Downstream Readouts

Several methods read the converted DNA. They are introduced here by topic only; the underlying techniques are developed in their own lessons.

Methylation-specific PCR (MSP) uses two primer pairs designed against the converted sequence, typically run as separate reactions on the same template. One pair matches the sequence expected if the CpGs were methylated (cytosines preserved); the other matches the sequence expected if they were unmethylated (cytosines now thymines). Product from only one pair indicates that methylation state, while product from both suggests a mixed or partially methylated template. This is a targeted, qualitative readout for a specific locus.
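
The primer logic can be illustrated with a deliberately simplified sketch. It models only a forward primer on one converted strand, treats an exact substring match as a stand-in for annealing, and assumes all CpGs in the locus share one state. The locus sequence and the function names are hypothetical.

```python
def expected_template(seq, cpgs_methylated):
    """Converted top-strand sequence, assuming every CpG shares one state.

    Non-CpG cytosines are assumed unmethylated and always read as T.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        is_cpg_c = base == "C" and seq[i + 1 : i + 2] == "G"
        if base == "C" and not (is_cpg_c and cpgs_methylated):
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def msp_call(converted_sample, m_primer, u_primer):
    """Crude stand-in for annealing: a primer 'works' if it matches exactly."""
    m_hit = m_primer in converted_sample
    u_hit = u_primer in converted_sample
    if m_hit and u_hit:
        return "both"
    if m_hit:
        return "methylated"
    if u_hit:
        return "unmethylated"
    return "no product"


locus = "GACGTTCGCACGA"                        # hypothetical locus, three CpGs
m_template = expected_template(locus, True)    # GACGTTCGTACGA
u_template = expected_template(locus, False)   # GATGTTTGTATGA
m_primer = m_template[:8]                      # GACGTTCG
u_primer = u_template[:8]                      # GATGTTTG

print(msp_call(m_template, m_primer, u_primer))  # methylated
print(msp_call(u_template, m_primer, u_primer))  # unmethylated
```

The two primers differ only at the positions where a CpG cytosine either survived or converted, which is what makes the reaction methylation-specific.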

Bisulfite sequencing amplifies a converted region and then sequences it, reading the C-versus-T call at every CpG across the amplicon. This gives a position-by-position map of methylation rather than a single yes/no answer, and remains a reference approach for characterizing a locus in detail [3].
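
A position-by-position map can be sketched as a per-CpG tally of C versus T calls. The example assumes the reads are converted top-strand reads, already aligned to the unconverted reference without gaps and equal to it in length; the function names and the toy reads are hypothetical.

```python
def cpg_positions(reference):
    """0-based indices of the cytosine in every CpG of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i : i + 2] == "CG"]


def methylation_by_cpg(reference, reads):
    """Fraction of reads showing C (methylated) at each reference CpG.

    Calls other than C or T at a CpG are ignored; a CpG with no usable
    calls is reported as None.
    """
    result = {}
    for pos in cpg_positions(reference):
        calls = [read[pos] for read in reads if read[pos] in ("C", "T")]
        result[pos] = calls.count("C") / len(calls) if calls else None
    return result


reference = "ACGTCGACT"
reads = ["ATGTCGATT", "ATGTCGATT", "ATGTTGATT", "ACGTCGATT"]
print(methylation_by_cpg(reference, reads))  # {1: 0.25, 4: 0.75}
```

In this toy data set the first CpG is methylated in one of four reads and the second in three of four, which is the kind of per-site detail a single MSP result cannot provide.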

Methylation arrays scale the same converted-DNA principle to many thousands of predefined CpG sites at once, using probes that distinguish the methylated (C) from the unmethylated (T) version of each site. Arrays trade the fine, contiguous detail of sequencing for broad, genome-scale coverage of selected positions.

In every case the logic is identical: bisulfite conversion first transforms methylation into a C/T sequence difference, and the readout method then measures that difference.

Caveats and Controls

The chemistry that makes bisulfite analysis possible also creates its main pitfalls, and a reliable assay is built around guarding against them.

Incomplete conversion is the most consequential error. If unmethylated cytosines are not fully deaminated, they remain as C and are misread as methylated — a false positive for methylation. Conversion efficiency is therefore monitored, often by checking cytosines at non-CpG positions, which in most contexts are unmethylated and so should all read as T after a complete reaction. Any residual C at those positions signals under-conversion.
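
A conversion-efficiency check of this kind can be sketched as follows. The sketch assumes gap-free, top-strand reads aligned to the unconverted reference and assumes all non-CpG cytosines are truly unmethylated; the function name and the toy data are hypothetical.

```python
def conversion_efficiency(reference, reads):
    """Fraction of non-CpG cytosine calls that were read as T.

    A reference C is treated as non-CpG when the next base is not G
    (a C at the very end of the reference is counted as non-CpG).
    Returns None if there are no usable calls.
    """
    ref = reference.upper()
    non_cpg = [
        i for i, base in enumerate(ref)
        if base == "C" and ref[i + 1 : i + 2] != "G"
    ]
    converted = unconverted = 0
    for read in reads:
        for pos in non_cpg:
            if read[pos] == "T":
                converted += 1
            elif read[pos] == "C":
                unconverted += 1   # residual C signals under-conversion
    total = converted + unconverted
    return converted / total if total else None


reference = "ACGTCGACTCA"   # non-CpG cytosines at indices 7 and 9
reads = ["ATGTCGATTTA", "ATGTCGATTTA", "ATGTTGATTCA", "ACGTCGATTTA"]
print(conversion_efficiency(reference, reads))  # 0.875
```

One residual C among the eight non-CpG calls gives 87.5% conversion here; what value counts as acceptable is an assay-specific decision that the sketch does not make.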

DNA degradation and fragmentation follow from the harshness of the treatment. The combination of low pH, elevated temperature, and extended incubation that drives conversion also damages and fragments the DNA, reducing the amount of intact, amplifiable template. Starting with sufficient good-quality DNA and avoiding over-treatment help preserve enough material for the downstream step.

Controls anchor interpretation. Fully methylated and fully unmethylated reference DNAs establish the expected C and T readouts at the assayed sites and confirm that conversion, amplification, and detection are all behaving as designed. Without such controls, an ambiguous result cannot be confidently assigned to true biology rather than to a failed or partial conversion [2].
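
As a minimal sketch of how control readouts might gate a run, the check below compares the measured methylation fraction of each reference DNA against its expected extreme. The function name and the 0.05 tolerance are hypothetical placeholders, not validated acceptance criteria.

```python
def controls_pass(methylated_control, unmethylated_control, tolerance=0.05):
    """Check measured methylation fractions (0.0 to 1.0) for the two reference DNAs.

    tolerance is a hypothetical acceptance window, not a validated cutoff.
    """
    methylated_ok = methylated_control >= 1.0 - tolerance   # should read ~all C
    unmethylated_ok = unmethylated_control <= tolerance     # should read ~all T
    return methylated_ok and unmethylated_ok


print(controls_pass(0.98, 0.02))  # True
print(controls_pass(0.98, 0.30))  # False
```

In the second call the unmethylated control still reads 30% C, the pattern expected from incomplete conversion, so sample results from that run would not be interpretable.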

Summary

DNA methylation is biologically meaningful but invisible to ordinary base reading, because 5-methylcytosine pairs the same as cytosine. Sodium bisulfite conversion solves this by deaminating unmethylated cytosine to uracil (read downstream as thymine) while leaving 5-methylcytosine protected (still read as cytosine). After conversion, a simple C-versus-T comparison at each CpG reports methylated versus unmethylated, and that signal can be read by methylation-specific PCR, bisulfite sequencing, or methylation arrays. Because the process depends on complete conversion and subjects DNA to harsh conditions, controls for conversion efficiency and attention to DNA integrity are essential to a trustworthy result.

References

  1. Bruce Alberts, Rebecca Heald, Alexander Johnson, David Morgan, Martin Raff, Keith Roberts, Peter Walter. Molecular Biology of the Cell. 7th ed. W. W. Norton & Company. 2022.
  2. Lela Buckingham. Molecular Diagnostics: Fundamentals, Methods, and Clinical Applications. 3rd ed. F.A. Davis Company. 2019.
  3. Michael R. Green, Joseph Sambrook. Molecular Cloning: A Laboratory Manual. 4th ed. Cold Spring Harbor Laboratory Press. 2012.