Abstract:
Overlapping genes occur widely in microorganisms and in some plastid genomes, but unique properties are observed when such genes span the boundaries between single-copy and repeat regions. The termini of ndhH and ndhF, situated near opposite ends of the small single-copy region (SSC) in the plastid genomes of grasses (Poaceae), have migrated repeatedly into and out of the adjacent inverted-repeat regions (IR). The two genes are transcribed in the same direction, and the 5' terminus of ndhH extends into the IR in some species, while the 3' terminus of ndhF extends into the IR in others. When both genes extend into the IR, portions of the genes overlap and are encoded by the same nucleotide positions. Fine-scale mapping of the SSC-IR junctions across a sample of 92 grasses and outgroups, integrated into a phylogenetic analysis, indicates that the earliest grasses resembled the related taxa Joinvillea (Joinvilleaceae) and Ecdeiocolea (Ecdeiocoleaceae), with ca. 180 nucleotides of ndhH extending into the IR, and with ndhF confined to the SSC. This structure is maintained in early-diverging grass lineages and in most species of the BEP clade. In the PACMAD clade, ndhH lies completely or nearly completely within the SSC, and ca. 20 nucleotides of ndhF extend into the IR. The nucleotide substitution rate has increased in the PACMAD clade in the portion of ndhH that has migrated into the SSC.
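The overlap described above can be illustrated with a minimal sketch. It assumes the two IR copies are exact inverted copies of each other and that each gene's extension is counted in nucleotides from its own SSC-IR junction. Under those assumptions, the IR positions used by both genes are simply the shorter of the two extensions. The function name `ir_overlap` and its parameters are hypothetical and are not taken from the study.

```python
def ir_overlap(ndhh_ext: int, ndhf_ext: int) -> int:
    """Return the number of IR nucleotide positions shared by ndhH and ndhF.

    Assumptions (illustrative only, not the authors' method):
    - ndhh_ext: nucleotides of the 5' end of ndhH lying inside the IR,
      counted from the SSC-IR junction at one end of the SSC.
    - ndhf_ext: nucleotides of the 3' end of ndhF lying inside the IR,
      counted from the SSC-IR junction at the other end of the SSC.
    - The two IR copies are identical inverted copies, so position k from
      either junction refers to the same IR nucleotide.
    """
    if ndhh_ext < 0 or ndhf_ext < 0:
        raise ValueError("extensions must be non-negative")
    # Both genes occupy IR positions 1..extension from the junction,
    # so the shared positions are 1..min(extension_H, extension_F).
    return min(ndhh_ext, ndhf_ext)


# Approximate values from the abstract: no overlap in either arrangement.
ancestral = ir_overlap(180, 0)   # ndhH extends ca. 180 nt, ndhF confined to SSC
pacmad = ir_overlap(0, 20)       # ndhH within SSC, ndhF extends ca. 20 nt

# Hypothetical combination, for illustration only: both genes extend into the IR.
both = ir_overlap(180, 20)
```

With the approximate figures reported here, neither the inferred ancestral arrangement nor the typical PACMAD arrangement produces an overlap. An overlap arises only in species where both extensions are nonzero at once.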