Components, Subcomponents, and Repetition

A field is the top of a small hierarchy

Earlier lessons established the outer shape of a message: a segment is a line, and within it fields are separated by the field separator |. But a field is rarely a flat string. HL7 v2 defines a strict, four-level delimiter hierarchy that lets a single field carry structured, even repeating, data ¹:

field          |   separates fields within a segment
repetition     ~   separates repeats of one field
component      ^   separates components within a field (or repeat)
subcomponent   &   separates subcomponents within a component

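These characters are not arbitrary. The MSH segment declares them: the field separator is the character right after MSH, and MSH-2 carries the remaining four encoding characters in a fixed order, ^~\& ¹: component separator, repetition separator, escape character, and subcomponent separator. The escape character \ is not a separator; it introduces escape sequences so that a delimiter can appear as literal text inside a value. A parser reads those characters first, then applies them everywhere else. Precedence follows the order of the table above, not the order inside MSH-2: ~ divides a field into repeats, then ^ divides each repeat into components, then & divides a component into subcomponents.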
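As a rough illustration of "read the delimiters first," the sketch below pulls the five characters out of an MSH segment in Python. The names Delimiters and read_delimiters, and the sample header, are made up for this example; it also assumes MSH-2 holds exactly the four classic encoding characters and ignores any extra character a newer version might declare.

```python
from typing import NamedTuple


class Delimiters(NamedTuple):
    """The five characters declared at the start of an MSH segment."""
    field: str
    component: str
    repetition: str
    escape: str
    subcomponent: str


def read_delimiters(msh_segment: str) -> Delimiters:
    """Read MSH-1 and the first four characters of MSH-2 (hypothetical helper)."""
    if not msh_segment.startswith("MSH") or len(msh_segment) < 8:
        raise ValueError("not an MSH segment with a full delimiter declaration")
    field = msh_segment[3]  # MSH-1: the character right after "MSH"
    # MSH-2 order: component, repetition, escape, subcomponent
    component, repetition, escape, subcomponent = msh_segment[4:8]
    return Delimiters(field, component, repetition, escape, subcomponent)


# Sample header invented for this sketch.
delims = read_delimiters("MSH|^~\\&|SENDER|HOSP|RECEIVER|HOSP|20240101||ADT^A01|1|P|2.5")
print(delims)
```

Every later split in a parser would then use delims.repetition, delims.component, and delims.subcomponent rather than hard-coded characters.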

Components: structure inside one value

The component delimiter ^ splits one field into ordered, positionally defined pieces. Consider a PID-3 identifier on an ADT^A01:

PID-3:  100711^^^HOSP^MR
        |     |  |    |
        |     |  |    +-- component 5: identifier type code (MR = medical record)
        |     |  +------- component 4: assigning authority (HOSP)
        |     +---------- components 2 and 3: empty
        +---------------- component 1: the ID value (100711)

Each ^ advances to the next component position. Empty components in the middle are still counted: 100711^^^HOSP means component 1 is 100711, components 2 and 3 are empty, and component 4 is HOSP. Position carries meaning, so the empties cannot be collapsed.
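A minimal Python sketch of positional reading, assuming the default ^ delimiter and no escape sequences in the value. The helper names components and component are hypothetical. The point is that a plain split keeps the empty positions, so component numbers stay aligned.

```python
def components(value: str, sep: str = "^") -> list[str]:
    """Split one field value (or one repeat) into components, keeping empties."""
    return value.split(sep)


def component(value: str, position: int, sep: str = "^") -> str:
    """Return the component at a 1-based position, or "" if it was not sent."""
    parts = value.split(sep)
    return parts[position - 1] if 1 <= position <= len(parts) else ""


pid3 = "100711^^^HOSP^MR"
print(components(pid3))    # ['100711', '', '', 'HOSP', 'MR']
print(component(pid3, 4))  # HOSP

# Collapsing the empties loses the positions: HOSP would appear to be component 2.
collapsed = [p for p in components(pid3) if p]
print(collapsed)           # ['100711', 'HOSP', 'MR']
```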

Subcomponents: one more level down

When a single component is itself composite, the subcomponent delimiter & divides it. The assigning authority is the classic case. Instead of a bare name like HOSP, it can be expressed as a namespace, a universal id, and an id type:

PID-3:  100711^^^HOSP&1.2.840.114&ISO^MR
                 \__________________/
                  one component, three subcomponents:
                  HOSP / 1.2.840.114 / ISO

Here component 4 is not the simple string HOSP; it is three subcomponents, HOSP, 1.2.840.114, and ISO, joined by &. A correct reader treats that whole &-joined run as a single component and only then looks inside it. Treating the & as a component break would shift every later position and corrupt the field ¹.
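The difference between the two readings can be shown in a few lines of Python. This is only a sketch with hard-coded default delimiters; it contrasts splitting in precedence order with a flat split that treats & as one more component break.

```python
import re

pid3 = "100711^^^HOSP&1.2.840.114&ISO^MR"

# Correct: split on ^ first, then open component 4 and split it on &.
comps = pid3.split("^")
authority = comps[3].split("&")
print(authority)  # ['HOSP', '1.2.840.114', 'ISO']
print(comps[4])   # MR

# Incorrect: a flat split on both characters shifts every later position.
flat = re.split(r"[\^&]", pid3)
print(flat[4])    # 1.2.840.114, not the identifier type code
```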

Repetition: one field, many values

Some fields legitimately hold more than one value — a patient with several identifiers, or several phone numbers. The repetition delimiter ~ separates those repeats. Each repeat is a complete, independently structured value with its own components and subcomponents:

PID-3:  100711^^^HOSP&1.2.840.114&ISO^MR~999888^^^SSA^SS
        \______________________________/ \_____________/
         repeat 1 (medical record)        repeat 2 (SSN)

PID-13: (555)555-1234^PRN^PH~(555)555-7777^WPN^CP
        \__________________/ \__________________/
         home phone           work cell

Repetition is not the same as components. Components describe the parts of one value; repetition lists several values of the same kind. A patient with two identifiers has one PID-3 field with two ~-separated repeats — not two components and not two PID segments.
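Putting the three inner levels together, one possible sketch parses a field into repeats, each repeat into components, and each component into subcomponents. The function name parse_field is hypothetical, the delimiters are assumed to be the defaults, and escape sequences are deliberately ignored to keep the example short.

```python
def parse_field(raw: str, rep: str = "~", comp: str = "^", sub: str = "&") -> list[list[list[str]]]:
    """Parse one field into repeats -> components -> subcomponents.

    Splitting follows precedence order: repeats first, then components,
    then subcomponents. Escape sequences are not handled in this sketch.
    """
    return [
        [component.split(sub) for component in repeat.split(comp)]
        for repeat in raw.split(rep)
    ]


pid3 = "100711^^^HOSP&1.2.840.114&ISO^MR~999888^^^SSA^SS"
repeats = parse_field(pid3)

print(len(repeats))      # 2 repeats in one PID-3 field
print(repeats[0][3])     # ['HOSP', '1.2.840.114', 'ISO']
print(repeats[1][0][0])  # 999888
print(repeats[1][4][0])  # SS
```

Indexing here is zero-based, so repeats[0][3] is component 4 of the first repeat.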

Empty versus null, and reading position

Two situations look similar but mean different things ¹:

PID-8:  ||           an empty field: "no value sent" / not asserted
PID-8:  |""|         the null value: "this field is explicitly null; clear any stored value"

Consecutive delimiters mark empty positions, and you must count them to keep your place. The two-character null "" is a deliberate signal often used on updates to say “clear whatever was here,” which is different from simply omitting the field.
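A receiver might model the distinction as three states rather than two. The sketch below is one hypothetical way to do so in Python. The names Presence, classify, and apply_update are invented, and the update rule (keep the stored value when nothing is sent, clear it on "") is the assumed behaviour described above, not something every interface agreement guarantees.

```python
from enum import Enum
from typing import Optional


class Presence(Enum):
    NOT_SENT = "not sent"   # || : nothing asserted
    NULL = "explicit null"  # |""| : clear whatever was here
    VALUE = "value"         # anything else


def classify(raw: str) -> Presence:
    """Classify the raw text found between two field separators."""
    if raw == "":
        return Presence.NOT_SENT
    if raw == '""':
        return Presence.NULL
    return Presence.VALUE


def apply_update(stored: Optional[str], raw: str) -> Optional[str]:
    """Hypothetical update rule: keep, clear, or overwrite the stored value."""
    state = classify(raw)
    if state is Presence.NOT_SENT:
        return stored
    if state is Presence.NULL:
        return None
    return raw


print(apply_update("F", ""))    # F (unchanged)
print(apply_update("F", '""'))  # None (cleared)
print(apply_update("F", "M"))   # M (overwritten)
```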

Why precise reading matters

Because meaning rides on position, a parser that miscounts delimiters does not merely lose a value; it misreads every position after the error. The reverse risk appears on round-trips: a system that reads a message, edits one field, and re-emits it must preserve interior empty components, the exact repetition structure, and any & subcomponents it did not understand. Collapsing an interior ^^ or flattening a repeat silently changes the data even though the text looks plausible ¹. Trailing empty components are the mildest case, because HL7 v2 allows a sender to omit trailing delimiters, but re-emitting them as received remains the safest default. Reading the hierarchy exactly (field, repeat, component, subcomponent) is what keeps data intact as it passes between systems.
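One way to guard against round-trip damage is to make the serializer the exact inverse of the parser and to check that property on real traffic. The Python sketch below assumes default delimiters and no escape handling; parse_field and serialize_field are hypothetical names.

```python
def parse_field(raw: str) -> list[list[list[str]]]:
    """Field -> repeats -> components -> subcomponents (no escape handling)."""
    return [[c.split("&") for c in r.split("^")] for r in raw.split("~")]


def serialize_field(parsed: list[list[list[str]]]) -> str:
    """Exact inverse of parse_field: rejoin with the same delimiters."""
    return "~".join(
        "^".join("&".join(subs) for subs in comps) for comps in parsed
    )


original = "100711^^^HOSP&1.2.840.114&ISO^MR~999888^^^SSA^SS"
parsed = parse_field(original)

# An unedited round trip must reproduce the text exactly.
assert serialize_field(parsed) == original

# Edit only the ID value of the second repeat; everything else is untouched.
parsed[1][0] = ["999889"]
edited = serialize_field(parsed)
print(edited)  # 100711^^^HOSP&1.2.840.114&ISO^MR~999889^^^SSA^SS
```

Because the edit touches one leaf of the nested structure, the empty components, the second repeat, and the & subcomponents all come back out exactly as they went in.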