🧬 Molecular Biology

The Blueprint of Life — DNA & RNA

A comprehensive, humanized guide to nucleic acids. From the chemical architecture of purines and pyrimidines to the intricate machinery of replication, transcription, and translation — every detail is explained clearly, with formulas, models, and deep biological insight.


What Are Nucleic Acids?

Nucleic acids are among the most important macromolecules in every living organism. They are linear polymers assembled from repeating units called nucleotides, and they carry the genetic instructions that define every cellular process. The two principal forms — deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) — differ in their sugar component, their stability, and their biological roles. DNA is the long‑term storage medium of genetic information, while RNA acts as a dynamic intermediary, translating stored information into functional proteins and even catalysing biochemical reactions itself. Understanding their structures at the atomic level is essential for grasping everything from heredity to modern gene‑editing technologies.

Nucleic acids (biopolymers) → DNA (deoxyribose sugar) and RNA (ribose sugar)

Nucleotides & Nitrogenous Bases

A nucleotide is the fundamental monomer of all nucleic acids. Each nucleotide is composed of three distinct chemical parts: a phosphate group (often one, two, or three phosphates linked together), a five‑carbon sugar (ribose in RNA, deoxyribose in DNA), and a nitrogen‑containing base that gives each nucleotide its identity. The sugar and base together (without phosphate) are called a nucleoside. When one or more phosphate groups are attached to the 5′ carbon of the sugar, the molecule becomes a nucleotide. For instance, adenosine is the nucleoside; adenosine monophosphate (AMP) is a nucleotide; add two more phosphates and you have adenosine triphosphate (ATP), the universal energy currency of the cell.

The Five Nitrogenous Bases

Category	Base	Symbol	Found In	Molecular Formula
Purine (double ring)	Adenine	A	DNA & RNA	C₅H₅N₅
Purine	Guanine	G	DNA & RNA	C₅H₅N₅O
Pyrimidine (single ring)	Cytosine	C	DNA & RNA	C₄H₅N₃O
Pyrimidine	Thymine	T	DNA only	C₅H₆N₂O₂
Pyrimidine	Uracil	U	RNA only	C₄H₄N₂O₂

Purines (adenine and guanine) contain a fused six‑membered and five‑membered ring system, while pyrimidines (cytosine, thymine, uracil) are built from a single six‑membered ring. Thymine is essentially 5‑methyluracil — it has an extra methyl group (–CH₃) attached to the fifth carbon of the pyrimidine ring. This small chemical modification has enormous biological consequences: it helps repair enzymes distinguish thymine from uracil in DNA, protecting the genome from mutations caused by spontaneous deamination of cytosine.

Purine & Pyrimidine — Structural Models & Differences

To truly understand nucleic acid chemistry, one must visualise the ring structures of purines and pyrimidines. Below are detailed atomic models, including the numbering conventions that are crucial for following hydrogen‑bond patterns, tautomeric shifts, and nucleoside nomenclature.

Purine Ring System (Adenine & Guanine parent)

Purine Skeleton

         N1
        /  \
      C2    C6
      |      |
      N3    C5
        \  /  \
         C4   N7
         |     |
         N9 – C8
      (sugar attachment at N9)
      Molecular formula of purine: C₅H₄N₄
      Adenine = 6‑aminopurine (NH₂ at C6)
      Guanine = 2‑amino‑6‑oxypurine (NH₂ at C2, =O at C6)

Pyrimidine Ring System

Pyrimidine Skeleton

       N1
      /  \
    C2    C6
    |      |
    N3    C5
      \  /
       C4
      (sugar attachment at N1)
      Pyrimidine formula: C₄H₄N₂
      Cytosine = 4‑amino‑2‑oxypyrimidine
      Thymine = 5‑methyl‑2,4‑dioxypyrimidine
      Uracil = 2,4‑dioxypyrimidine

The critical difference between the two families is the glycosidic bond position: purines link to the sugar via their N9 atom, while pyrimidines attach through N1. This influences the overall geometry of the nucleic acid strand. Additionally, the tautomeric forms of these bases (keto vs. enol, amino vs. imino) play a role in spontaneous mutations when a rare tautomer pairs with the wrong partner during replication.

📷 Structural comparison of purine (adenine) and pyrimidine (thymine) rings. (Source: Wikimedia)

DNA Structure — The Double Helix in Detail

The classic Watson‑Crick model describes DNA as a right‑handed double helix composed of two antiparallel polynucleotide chains. Each chain is formed by a sugar‑phosphate backbone where the 3′ carbon of one deoxyribose is linked to the 5′ carbon of the next via a phosphodiester bond. The nitrogenous bases are stacked inside the helix, nearly perpendicular to the axis, and paired through highly specific hydrogen bonds: adenine pairs with thymine via two hydrogen bonds, while guanine pairs with cytosine via three. This pairing is the molecular basis of heredity — the sequence of bases on one strand automatically determines the sequence on the complementary strand.

Adenine (A, purine) = Thymine (T, pyrimidine): 2 H‑bonds

Guanine (G, purine) ≡ Cytosine (C, pyrimidine): 3 H‑bonds
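
The pairing rules above mean one strand fully determines its partner. A minimal Python sketch of that idea follows; the names `COMPLEMENT`, `reverse_complement`, and `hydrogen_bonds` are illustrative choices, and the input is assumed to contain only the letters A, T, G, and C.

```python
# Watson-Crick partners for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'->3'.

    The input is read 5'->3'. Because the two strands are antiparallel,
    the partner is read in the opposite direction, so the order is reversed.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))

def hydrogen_bonds(strand: str) -> int:
    """Count H-bonds in the duplex: 2 per A-T pair, 3 per G-C pair."""
    return sum(2 if base in "AT" else 3 for base in strand.upper())

seq = "ATGCGT"
print(reverse_complement(seq))  # ACGCAT
print(hydrogen_bonds(seq))      # 15
```

Applying `reverse_complement` twice returns the original sequence, which is a quick sanity check on the pairing table.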

The dimensions of B‑DNA (the physiological form) are remarkably consistent: the helix has a diameter of approximately 2.0 nm, and each turn contains about 10.5 base pairs and spans roughly 3.4 to 3.6 nm, a rise of about 0.34 nm per base pair. The two strands are not equally spaced around the helix axis, so they create a major groove (wider, deeper) and a minor groove (narrower). Proteins can read the base sequence in these grooves without unwinding the helix, which allows transcription factors and other regulatory proteins to bind specific DNA sequences.
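
These dimensions make it easy to estimate the physical length of a DNA molecule. The sketch below assumes a rise of about 0.34 nm per base pair (a standard textbook figure, treated here as an assumption) together with the 10.5 base pairs per turn quoted above; the function names are illustrative.

```python
# Assumed textbook geometry for B-DNA (idealised, not measured values).
RISE_NM_PER_BP = 0.34   # axial rise per base pair
BP_PER_TURN = 10.5      # base pairs per helical turn

def helix_length_nm(n_bp: int) -> float:
    """Length of an n_bp duplex along the helix axis, in nm."""
    return n_bp * RISE_NM_PER_BP

def helical_turns(n_bp: int) -> float:
    """Number of helical turns in an n_bp duplex."""
    return n_bp / BP_PER_TURN

def pitch_nm() -> float:
    """Axial length of one full turn, in nm."""
    return RISE_NM_PER_BP * BP_PER_TURN

print(f"pitch: {pitch_nm():.2f} nm per turn")
print(f"1050 bp: {helix_length_nm(1050):.0f} nm, {helical_turns(1050):.0f} turns")
```

Under these assumptions a 1050 bp fragment is about 357 nm long and makes 100 turns.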

Types of DNA — A, B, Z and Beyond

DNA is a surprisingly flexible molecule. While the B‑form is the canonical right‑handed helix found in most cellular environments, other conformations exist depending on hydration, ionic strength, and base sequence. The three most significant forms are A‑DNA, B‑DNA, and Z‑DNA, but researchers have also characterized C, D, E forms as well as higher‑order structures like triplex DNA, G‑quadruplexes, and i‑motifs.

Feature	B‑DNA	A‑DNA	Z‑DNA
Helix sense	Right‑handed	Right‑handed	Left‑handed
Base pairs per turn	10.5	11	12
Diameter	~2.0 nm	~2.3 nm	~1.8 nm
Major groove	Wide & deep	Narrow & deep	Flat (nearly absent)
Biological occurrence	Normal cellular DNA	RNA‑DNA hybrids, dehydrated DNA	Regulatory regions, alternating CG sequences

A‑DNA is the conformation adopted by double‑stranded RNA and by DNA‑RNA hybrids during transcription. Z‑DNA is stabilised by alternating purine‑pyrimidine tracts (especially (CG)ₙ) and negative supercoiling; its left‑handed twist gives the phosphodiester backbone a distinct zigzag path. Z‑DNA‑binding proteins are involved in antiviral responses and gene regulation. Beyond these, G‑quadruplexes form in guanine‑rich regions (like telomeres) through Hoogsteen hydrogen bonding between four guanines in a planar tetrad, while i‑motifs arise from intercalated cytosine‑rich strands under slightly acidic conditions. These structures are now recognised as important regulatory elements in the genome.
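
Because Z‑DNA is favoured by alternating purine‑pyrimidine tracts, a first rough screen for candidate regions is a simple sequence scan. The following Python sketch finds the longest such tract; it is only a composition heuristic under the assumption of an A/C/G/T input, since real Z‑DNA formation also depends on supercoiling and ionic conditions. The function name is hypothetical.

```python
PURINES = set("AG")

def longest_alternating_tract(seq: str) -> str:
    """Return the longest run in which purines and pyrimidines strictly alternate.

    Any base not in PURINES is treated as a pyrimidine, so the input is
    assumed to contain only A, C, G, and T.
    """
    seq = seq.upper()
    if not seq:
        return ""
    best_start, best_len = 0, 1
    start = 0
    for i in range(1, len(seq)):
        same_class = (seq[i] in PURINES) == (seq[i - 1] in PURINES)
        if same_class:
            start = i  # alternation broken; a new run starts here
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return seq[best_start:best_start + best_len]

print(longest_alternating_tract("AATTCGCGCGCGAAA"))  # CGCGCGCG
```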

DNA Replication — The Complete Machinery

DNA replication is a semiconservative process: each daughter molecule contains one original parental strand and one newly synthesised strand. This was elegantly demonstrated by Meselson and Stahl in 1958 using isotopic labelling. The replication fork is a complex assembly of enzymes that work in concert. Helicase unwinds the double helix, single‑strand binding proteins prevent re‑annealing, and topoisomerase relieves the torsional stress ahead of the fork. Primase synthesises short RNA primers, which are then extended by DNA polymerase III (in prokaryotes) in the 5′→3′ direction. Because the two template strands run antiparallel, the leading strand is synthesised continuously while the lagging strand is made discontinuously as Okazaki fragments. These fragments are later processed by DNA polymerase I (which removes RNA primers and fills gaps) and sealed by DNA ligase.

In eukaryotes, the replication machinery is even more intricate, with multiple origins of replication, licensing factors, and a coordinated cell‑cycle control system. Telomerase, a ribonucleoprotein enzyme, solves the end‑replication problem by adding repetitive sequences to chromosome ends, preventing progressive shortening.
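
The semiconservative model makes a testable prediction about how labelled strands are distributed over generations. The sketch below is a toy simulation in the spirit of the Meselson and Stahl experiment, assuming cells start with fully heavy (¹⁵N) DNA and then replicate in light (¹⁴N) medium; the labels "H" and "L" and the function names are illustrative.

```python
from collections import Counter

def replicate(duplexes):
    """One round of semiconservative replication in light medium.

    Each duplex is a pair of strand labels: "H" (heavy) or "L" (light).
    Every parental strand is kept and paired with a newly made light strand.
    """
    daughters = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, "L"))
        daughters.append((strand_b, "L"))
    return daughters

def band_counts(generations: int) -> Counter:
    """Count heavy, hybrid, and light duplexes after the given generations."""
    duplexes = [("H", "H")]  # start from one fully heavy duplex
    for _ in range(generations):
        duplexes = replicate(duplexes)
    names = {2: "heavy", 1: "hybrid", 0: "light"}
    return Counter(names[pair.count("H")] for pair in duplexes)

for n in range(4):
    print(n, dict(band_counts(n)))
```

After one generation every duplex is hybrid; after two, half are hybrid and half are light. The two original heavy strands are never lost, so exactly two hybrid duplexes persist in every later generation.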

Chargaff's Rules

Before the double helix was proposed, Erwin Chargaff made a crucial discovery: in any sample of double‑stranded DNA, the amount of adenine always equals the amount of thymine (A = T), and guanine equals cytosine (G = C). This equality extends to the total purine and pyrimidine content: A + G = T + C. This first parity rule is a direct consequence of Watson‑Crick base pairing. Chargaff also observed a second rule: the base composition varies between species, but is constant within all tissues of the same organism — a finding that helped establish DNA as the genetic material.
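
The first parity rule can be checked directly: count the bases over both strands of a duplex and compare. This is a minimal Python sketch with illustrative names, assuming the input strand contains only A, T, G, and C.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def duplex_base_counts(strand: str) -> Counter:
    """Count bases over both strands of the duplex formed by `strand`."""
    strand = strand.upper()
    partner = "".join(COMPLEMENT[base] for base in strand)
    return Counter(strand + partner)

def obeys_chargaff(counts: Counter) -> bool:
    """True when A = T and G = C."""
    return counts["A"] == counts["T"] and counts["G"] == counts["C"]

single = Counter("AAAGC")
duplex = duplex_base_counts("AAAGC")
print(obeys_chargaff(single))  # False: a single strand need not balance
print(obeys_chargaff(duplex))  # True: pairing forces A = T and G = C
```

The equality A + G = T + C follows immediately, since it is the sum of the two pairwise equalities.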

RNA Structure & Models

Unlike DNA, RNA is typically single‑stranded, but it readily folds into intricate three‑dimensional shapes stabilised by intramolecular base pairing. This folding gives rise to structural motifs such as hairpin loops, stem‑loops, bulges, and pseudoknots. The presence of the 2′‑hydroxyl group on ribose makes RNA chemically more reactive and less stable than DNA, but it also enables RNA to act as a catalyst — ribozymes. The RNA backbone still has a 5′→3′ directionality, and the same base‑pairing rules apply (A‑U, G‑C), though G‑U wobble pairs are common and important for tRNA structure.
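
Intramolecular pairing can be illustrated with a toy hairpin check. The sketch below pairs the 5′ end of a short RNA with its 3′ end and works inward, allowing A‑U, G‑C, and G‑U wobble pairs. The minimum loop size of three unpaired bases is an assumption made for this example, and the function is a simplification, not a folding algorithm.

```python
# Allowed RNA pairs: Watson-Crick (A-U, G-C) plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def stem_length(rna: str, loop_min: int = 3) -> int:
    """Count consecutive base pairs formed between the two ends of `rna`.

    Pairing starts with the first and last base and moves inward, stopping
    at the first mismatch or when fewer than `loop_min` unpaired bases
    would remain for the loop.
    """
    rna = rna.upper()
    i, j = 0, len(rna) - 1
    pairs = 0
    while j - i - 1 >= loop_min and (rna[i], rna[j]) in PAIRS:
        pairs += 1
        i += 1
        j -= 1
    return pairs

print(stem_length("GGGCUUCGGUCU"))  # 4 (two G-U wobbles, one G-C, one C-G; loop UUCG)
```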

📷 RNA strand showing ribose sugar and the four bases including uracil. (Source: Wikimedia)

Types of RNA — A Complete Catalog

The RNA world is far more diverse than the three classical types. Here we describe each major category with its function and characteristics.

mRNA

Carries the protein‑coding sequence from DNA to ribosomes. In eukaryotes, it undergoes 5′ capping, splicing, and 3′ polyadenylation. Bacterial mRNA is often polycistronic.

tRNA

Small L‑shaped molecules (~76‑90 nt) that decode mRNA codons. Each tRNA carries a specific amino acid at its 3′ CCA tail and contains an anticodon loop complementary to the mRNA codon.

rRNA

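Forms the structural and catalytic core of ribosomes: 16S (prokaryotic) or 18S (eukaryotic) rRNA in the small subunit; 23S + 5S (prokaryotic) or 28S + 5.8S + 5S (eukaryotic) rRNA in the large subunit. The large‑subunit rRNA catalyses peptide bond formation (peptidyl transferase activity).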

snRNA

Small nuclear RNAs (U1, U2, U4, U5, U6) are key components of the spliceosome, directing pre‑mRNA splicing.

miRNA

~22 nt regulatory RNAs that silence gene expression by base‑pairing with target mRNAs, typically in the 3′ UTR, leading to translational repression or degradation.

siRNA

Small interfering RNAs, derived from double‑stranded RNA, trigger cleavage of perfectly complementary mRNAs through the RNAi pathway.

lncRNA

Long non‑coding RNAs (>200 nt) involved in chromatin remodelling, X‑inactivation (XIST), and genomic imprinting.

piRNA

Piwi‑interacting RNAs protect germline genomes by silencing transposable elements.

snoRNA

Guide chemical modifications (methylation and pseudouridylation) of rRNA, tRNA, and snRNA.

Ribozymes

Catalytic RNAs such as the hammerhead ribozyme, capable of self‑cleavage or ligation.

Transcription — From DNA to RNA

Transcription is the DNA‑directed synthesis of RNA, catalysed by RNA polymerase. In prokaryotes, a single RNA polymerase handles all transcription, while eukaryotes employ three (Pol I, II, III) for different RNA classes. The process begins with promoter recognition and initiation, followed by elongation where the polymerase moves along the template strand in the 3′→5′ direction, building an RNA chain in the 5′→3′ direction. Termination signals cause the polymerase to dissociate. In eukaryotes, the resulting pre‑mRNA is extensively processed: a 5′ cap (7‑methylguanosine) is added, introns are spliced out, and a poly‑A tail (~200 adenines) is appended at the 3′ end. Alternative splicing allows a single gene to yield multiple protein isoforms — a key source of biological complexity.
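
The directionality described above (template read 3′→5′, RNA built 5′→3′) can be made concrete in a few lines. This Python sketch covers only the base‑by‑base copying step; promoters, termination, and RNA processing are left out, and the function names are illustrative.

```python
# DNA template base -> RNA base added opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3to5: str) -> str:
    """Build the RNA (5'->3') from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())

def transcribe_from_coding(coding_5to3: str) -> str:
    """Same RNA, derived from the coding strand (5'->3') by swapping T for U."""
    return coding_5to3.upper().replace("T", "U")

print(transcribe("TACGGCATT"))              # AUGCCGUAA
print(transcribe_from_coding("ATGCCGTAA"))  # AUGCCGUAA
```

Both routes give the same transcript, because the coding strand is itself the complement of the template strand.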

Translation — Decoding mRNA into Protein

Translation is the ribosomal synthesis of proteins from mRNA templates. The ribosome moves along the mRNA in a 5′→3′ direction, reading codons (triplets of nucleotides) and recruiting the appropriate aminoacyl‑tRNAs. The process has three phases: initiation, elongation, and termination. During elongation, the ribosome cycles through three tRNA binding sites: the A site (aminoacyl entry), P site (peptidyl), and E site (exit). Peptide bond formation is catalysed by the rRNA of the large subunit — a classic example of a ribozyme. When a stop codon (UAA, UAG, or UGA) enters the A site, release factors trigger hydrolysis of the polypeptide chain and ribosome disassembly. The newly synthesised protein then folds into its functional three‑dimensional shape, often with the help of chaperones.
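
The codon‑by‑codon reading described above can be sketched in Python. The code assumes the standard genetic code and, as a deliberate simplification, treats the first AUG in the sequence as the start codon; real start‑site selection is more involved. Amino acids are given as one‑letter codes, with `*` marking a stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G
# (third position varying fastest); '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break  # stop codon: release the chain
        protein.append(aa)
    return "".join(protein)

print(translate("GGAUGCCGUUUUAAGG"))  # MPF
```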

The 20 Standard Amino Acids

Proteins are linear polymers of amino acids linked by peptide bonds. Each amino acid contains a central α‑carbon bonded to an amino group (–NH₂), a carboxyl group (–COOH), a hydrogen atom, and a distinctive side chain (R group). The chemical nature of the R group determines whether the amino acid is hydrophobic, polar, acidic, or basic, and profoundly influences protein folding and function.

Category	Amino Acid	Three‑letter	One‑letter	R‑Group Property	Essential?
Nonpolar	Glycine	Gly	G	–H (achiral)	No
	Alanine	Ala	A	–CH₃	No
	Valine	Val	V	Branched hydrocarbon	Yes
	Leucine	Leu	L	Branched hydrocarbon	Yes
	Isoleucine	Ile	I	Branched hydrocarbon	Yes
	Proline	Pro	P	Cyclic, rigid	No
	Methionine	Met	M	–CH₂CH₂SCH₃	Yes
	Phenylalanine	Phe	F	Benzyl (phenyl ring)	Yes
	Tryptophan	Trp	W	Indole ring	Yes
Polar uncharged	Serine	Ser	S	–CH₂OH	No
	Threonine	Thr	T	–CH(OH)CH₃	Yes
	Cysteine	Cys	C	–CH₂SH (forms disulfide bonds)	No
	Asparagine	Asn	N	–CH₂CONH₂	No
	Glutamine	Gln	Q	–CH₂CH₂CONH₂	No
	Tyrosine	Tyr	Y	Phenol	Conditionally
Basic (+)	Lysine	Lys	K	–(CH₂)₄NH₃⁺	Yes
	Arginine	Arg	R	Guanidinium	Conditionally
	Histidine	His	H	Imidazole	Yes
Acidic (–)	Aspartate	Asp	D	–CH₂COO⁻	No
	Glutamate	Glu	E	–CH₂CH₂COO⁻	No

The nine essential amino acids cannot be synthesised de novo by humans and must be obtained from the diet. A helpful mnemonic is "PVT TIM HALL": Phenylalanine, Valine, Threonine, Tryptophan, Isoleucine, Methionine, Histidine, Arginine, Leucine, Lysine. The mnemonic has ten letters because it includes arginine, which is only conditionally essential; the other nine are the strictly essential set.

The Genetic Code

The genetic code is the set of rules by which nucleotide triplets (codons) specify amino acids. It is nearly universal, degenerate, and non‑overlapping. Of the 64 possible codons, 61 encode amino acids and 3 serve as termination signals. AUG codes for methionine and also acts as the start codon. The code is read in a fixed reading frame, and its degeneracy — exemplified by six codons for leucine — provides a buffer against point mutations. The wobble hypothesis explains how fewer than 61 tRNA species can decode all codons: the third base of the codon can form non‑canonical pairs with the anticodon, allowing one tRNA to recognise multiple codons.

Codon Table (mRNA 5′→3′)

UUU Phe   UCU Ser   UAU Tyr   UGU Cys
UUC Phe   UCC Ser   UAC Tyr   UGC Cys
UUA Leu   UCA Ser   UAA STOP  UGA STOP
UUG Leu   UCG Ser   UAG STOP  UGG Trp
CUU Leu   CCU Pro   CAU His   CGU Arg
CUC Leu   CCC Pro   CAC His   CGC Arg
... (complete table continues)
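
The counts quoted for the genetic code (61 sense codons, 3 stop codons, six codons for leucine) can be verified by generating the standard code programmatically. The sketch below builds all 64 codons and prints them in the same four‑column layout as the excerpt above; variable and function names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "STOP",
}

def table_rows():
    """Yield one text row per (first base, third base) combination."""
    for first in BASES:
        for third in BASES:
            cells = []
            for second in BASES:
                codon = first + second + third
                cells.append(f"{codon} {THREE_LETTER[CODON_TABLE[codon]]:<4}")
            yield "  ".join(cells).rstrip()

degeneracy = Counter(CODON_TABLE.values())       # codons per amino acid
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = len(CODON_TABLE) - len(stop_codons)

for row in table_rows():
    print(row)
print("stop codons:", stop_codons, "| sense codons:", sense_codons)
print("codons for Leu:", degeneracy["L"], "| for Met:", degeneracy["M"])
```

Methionine and tryptophan are the only amino acids with a single codon, while leucine, serine, and arginine each have six.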

Complete Molecular Formulas

Here is a collection of chemical formulas for the major nucleic acid components, from bases and sugars to full nucleotides.

Bases

Adenine   C₅H₅N₅    (6‑aminopurine)
Guanine   C₅H₅N₅O   (2‑amino‑6‑oxypurine)
Cytosine  C₄H₅N₃O   (4‑amino‑2‑oxypyrimidine)
Thymine   C₅H₆N₂O₂  (5‑methyl‑2,4‑dioxypyrimidine)
Uracil    C₄H₄N₂O₂  (2,4‑dioxypyrimidine)
Purine    C₅H₄N₄
Pyrimidine C₄H₄N₂

Sugars

Ribose       C₅H₁₀O₅   (β‑D‑ribofuranose)
Deoxyribose  C₅H₁₀O₄   (β‑D‑2′‑deoxyribofuranose)

Nucleotides (examples)

AMP  C₁₀H₁₄N₅O₇P   (adenosine monophosphate)
dAMP C₁₀H₁₄N₅O₆P   (deoxyadenosine monophosphate)
ATP  C₁₀H₁₆N₅O₁₃P₃ (adenosine triphosphate)
GTP  C₁₀H₁₆N₅O₁₄P₃ (guanosine triphosphate)
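
The nucleotide formulas can be cross‑checked against their parts. Each nucleotide is base + sugar + phosphate(s), minus one water for every bond formed by condensation: one for the glycosidic bond and one per phosphate. The sketch below does this atom bookkeeping in Python; phosphoric acid (H₃PO₄) and water are standard formulas not listed above, and the function names are illustrative.

```python
from collections import Counter

ADENINE = Counter(C=5, H=5, N=5)
GUANINE = Counter(C=5, H=5, N=5, O=1)
RIBOSE = Counter(C=5, H=10, O=5)
DEOXYRIBOSE = Counter(C=5, H=10, O=4)
PHOSPHORIC_ACID = Counter(H=3, P=1, O=4)  # H3PO4 (not in the list above)
WATER = Counter(H=2, O=1)

def nucleotide_formula(base, sugar, n_phosphates):
    """Atom counts for base + sugar + n phosphates after condensation."""
    total = Counter()
    total.update(base)
    total.update(sugar)
    for _ in range(n_phosphates):
        total.update(PHOSPHORIC_ACID)
    # One water lost for the glycosidic bond, one for each phosphate linkage.
    for _ in range(1 + n_phosphates):
        total.subtract(WATER)
    return total

def format_formula(counts):
    """Write atom counts as a formula string in C, H, N, O, P order."""
    parts = []
    for element in "CHNOP":
        n = counts.get(element, 0)
        if n > 0:
            parts.append(element if n == 1 else f"{element}{n}")
    return "".join(parts)

print("AMP ", format_formula(nucleotide_formula(ADENINE, RIBOSE, 1)))       # C10H14N5O7P
print("dAMP", format_formula(nucleotide_formula(ADENINE, DEOXYRIBOSE, 1)))  # C10H14N5O6P
print("ATP ", format_formula(nucleotide_formula(ADENINE, RIBOSE, 3)))       # C10H16N5O13P3
print("GTP ", format_formula(nucleotide_formula(GUANINE, RIBOSE, 3)))       # C10H16N5O14P3
```

All four results agree with the formulas listed above.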

Summary

Key Takeaways

🧬 Nucleic acids (DNA & RNA) are nucleotide polymers. DNA is a double‑stranded antiparallel helix (B‑form in cells) storing genetic information. RNA is usually single‑stranded and fulfills diverse roles: mRNA, tRNA, rRNA, regulatory RNAs, and catalytic ribozymes.

🔬 Nucleotides consist of a phosphate group, a pentose sugar, and a nitrogenous base. Purines (A, G) have a double‑ring; pyrimidines (C, T, U) have a single ring. Base pairing: A=T (2 H‑bonds), G≡C (3 H‑bonds); in RNA, A pairs with U.

🧪 DNA can adopt multiple conformations: A‑DNA (dehydrated), B‑DNA (physiological), Z‑DNA (left‑handed). G‑quadruplexes and i‑motifs are higher‑order structures with regulatory functions.

⚙️ Replication is semiconservative. Key enzymes: helicase, primase, DNA polymerase III, ligase. The lagging strand is synthesised as Okazaki fragments.

📝 Transcription (DNA→RNA) by RNA polymerase; eukaryotic mRNA undergoes capping, splicing, and polyadenylation. Translation (mRNA→protein) occurs on ribosomes using the genetic code; AUG is the start codon; UAA, UAG, UGA are stop codons.

🧫 The 20 standard amino acids have diverse R‑groups. Nine are essential in humans. Peptide bonds link amino acids via dehydration synthesis.

Questions & Answers

Why does DNA use thymine instead of uracil?

Cytosine spontaneously deaminates to uracil. If DNA used uracil, the cell could not distinguish a legitimate uracil from a mutated one. Thymine (5‑methyluracil) allows repair enzymes to recognise uracil in DNA as an error and excise it.

What makes Z‑DNA biologically important?

Z‑DNA forms in regions of alternating purine‑pyrimidine under negative supercoiling. It is recognised by specific proteins (e.g., ADAR1) and is implicated in gene regulation, antiviral responses, and possibly genetic instability.

How does the ribosome catalyse peptide bond formation?

The large ribosomal subunit's 23S/28S rRNA acts as a ribozyme. The peptidyl transferase centre positions the aminoacyl‑tRNA in the A site and the peptidyl‑tRNA in the P site, facilitating the nucleophilic attack that forms the peptide bond — no protein enzyme is needed.

What are Okazaki fragments?

Short DNA fragments synthesised on the lagging strand during replication, roughly 1000‑2000 nt long in bacteria and 100‑200 nt in eukaryotes. Because DNA polymerase can only extend in the 5′→3′ direction, the lagging strand is made discontinuously. After the RNA primers are removed and the gaps filled, the fragments are joined by DNA ligase.