What Are Nucleic Acids?
Nucleic acids are among the most important macromolecules in every living organism. They are linear polymers assembled from repeating units called nucleotides, and they carry the genetic instructions that define every cellular process. The two principal forms, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), differ in their sugar component, their stability, and their biological roles. DNA is the long-term storage medium of genetic information, while RNA acts as a dynamic intermediary, translating stored information into functional proteins and even catalysing biochemical reactions itself. Understanding their structures at the atomic level is essential for grasping everything from heredity to modern gene-editing technologies.
Nucleotides & Nitrogenous Bases
A nucleotide is the fundamental monomer of all nucleic acids. Each nucleotide is composed of three distinct chemical parts: a phosphate group (often one, two, or three phosphates linked together), a five-carbon sugar (ribose in RNA, deoxyribose in DNA), and a nitrogen-containing base that gives each nucleotide its identity. The sugar and base together (without phosphate) are called a nucleoside. When one or more phosphate groups are attached to the 5′ carbon of the sugar, the molecule becomes a nucleotide. For instance, adenosine is the nucleoside; adenosine monophosphate (AMP) is a nucleotide; add two more phosphates and you have adenosine triphosphate (ATP), the universal energy currency of the cell.
The Five Nitrogenous Bases
| Category | Base | Symbol | Found In | Molecular Formula |
|---|---|---|---|---|
| Purine (double ring) | Adenine | A | DNA & RNA | C₅H₅N₅ |
| Purine | Guanine | G | DNA & RNA | C₅H₅N₅O |
| Pyrimidine (single ring) | Cytosine | C | DNA & RNA | C₄H₅N₃O |
| Pyrimidine | Thymine | T | DNA only | C₅H₆N₂O₂ |
| Pyrimidine | Uracil | U | RNA only | C₄H₄N₂O₂ |
Purines (adenine and guanine) contain a fused six-membered and five-membered ring system, while pyrimidines (cytosine, thymine, uracil) are built from a single six-membered ring. Thymine is essentially 5-methyluracil: it has an extra methyl group (-CH₃) attached to the fifth carbon of the pyrimidine ring. This small chemical modification has enormous biological consequences: it helps repair enzymes distinguish thymine from uracil in DNA, protecting the genome from mutations caused by spontaneous deamination of cytosine.
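The relationship between thymine and uracil can be checked directly from the molecular formulas in the table. The sketch below is illustrative only: the helper names (`parse_formula`, `molar_mass`) are hypothetical, and the atomic masses are standard values rounded to three decimals rather than figures taken from this text.

```python
import re
from collections import Counter

# Molecular formulas of the five bases, as listed in the table above.
BASE_FORMULAS = {
    "adenine": "C5H5N5",
    "guanine": "C5H5N5O",
    "cytosine": "C4H5N3O",
    "thymine": "C5H6N2O2",
    "uracil": "C4H4N2O2",
}

# Standard atomic masses in g/mol, rounded (an assumption of this sketch).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula: str) -> Counter:
    """Turn a formula such as 'C5H6N2O2' into a Counter of element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def molar_mass(formula: str) -> float:
    """Sum the atomic masses of every atom in the formula."""
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())


# Swapping the hydrogen on C5 of uracil for a methyl group is a net gain of CH2,
# so thymine minus uracil should leave one carbon and two hydrogens.
difference = parse_formula(BASE_FORMULAS["thymine"]) - parse_formula(BASE_FORMULAS["uracil"])
print(dict(difference))

for name, formula in BASE_FORMULAS.items():
    print(f"{name:9s} {formula:9s} {molar_mass(formula):.3f} g/mol")
```

Counter subtraction drops elements whose counts cancel, which is why only carbon and hydrogen remain in `difference`.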
Purine & Pyrimidine: Structural Models & Differences
To truly understand nucleic acid chemistry, one must visualise the ring structures of purines and pyrimidines. Below are detailed atomic models, including the numbering conventions that are crucial for following hydrogen-bond patterns, tautomeric shifts, and nucleoside nomenclature.
Purine Ring System (Adenine & Guanine parent)
        N1
       /  \
     C2    C6
     |      |
     N3    C5
       \  /  \
        C4    N7
        |     |
        N9 -- C8
(sugar attachment at N9)
Molecular formula of purine: C₅H₄N₄
Adenine = 6-aminopurine (NH₂ at C6)
Guanine = 2-amino-6-oxypurine (NH₂ at C2, =O at C6)
Pyrimidine Ring System
        N1
       /  \
     C2    C6
     |      |
     N3    C5
       \  /
        C4
(sugar attachment at N1)
Pyrimidine formula: C₄H₄N₂
Cytosine = 4-amino-2-oxypyrimidine
Thymine = 5-methyl-2,4-dioxypyrimidine
Uracil = 2,4-dioxypyrimidine
The critical difference between the two families is the glycosidic bond position: purines link to the sugar via their N9 atom, while pyrimidines attach through N1. This influences the overall geometry of the nucleic acid strand. Additionally, the tautomeric forms of these bases (keto vs. enol, amino vs. imino) play a role in spontaneous mutations when a rare tautomer pairs with the wrong partner during replication.
DNA Structure: The Double Helix in Detail
The classic Watson-Crick model describes DNA as a right-handed double helix composed of two antiparallel polynucleotide chains. Each chain is formed by a sugar-phosphate backbone where the 3′ carbon of one deoxyribose is linked to the 5′ carbon of the next via a phosphodiester bond. The nitrogenous bases are stacked inside the helix, nearly perpendicular to the axis, and paired through highly specific hydrogen bonds: adenine pairs with thymine via two hydrogen bonds, while guanine pairs with cytosine via three. This pairing is the molecular basis of heredity: the sequence of bases on one strand automatically determines the sequence on the complementary strand.
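Because one strand fixes the other, the partner strand can be derived mechanically. The following is a minimal sketch, assuming sequences are plain strings over A, C, G, T written 5′→3′; the function names are illustrative, not part of any library.

```python
# Watson-Crick partners: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complementary_strand(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5'->3'.

    The strands are antiparallel, so the complemented sequence is reversed.
    """
    return strand_5_to_3.upper().translate(COMPLEMENT)[::-1]


def hydrogen_bonds(strand: str) -> int:
    """Count inter-strand hydrogen bonds: 2 per A-T pair, 3 per G-C pair."""
    strand = strand.upper()
    at_pairs = strand.count("A") + strand.count("T")
    gc_pairs = strand.count("G") + strand.count("C")
    return 2 * at_pairs + 3 * gc_pairs


print(complementary_strand("ATGC"))  # GCAT
print(hydrogen_bonds("ATGC"))        # 10
```

The reversal step is what encodes antiparallelism; complementing without reversing would give the partner written 3′→5′.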
The dimensions of B-DNA (the physiological form) are remarkably consistent: the helix has a diameter of approximately 2.0 nm, adjacent base pairs are stacked about 0.34 nm apart, and each turn contains about 10.5 base pairs, giving a pitch of roughly 3.4 to 3.6 nm. The two strands are not equally spaced around the helix axis, so they create a major groove (wider, deeper) and a minor groove (narrower). These grooves are where proteins read the base sequence without unwinding the helix, allowing transcription factors and other regulatory proteins to bind specific DNA sequences.
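These dimensions make it easy to estimate how long a stretch of B-DNA is and how many times it winds. The sketch below takes an axial rise of about 0.34 nm per base pair, the standard B-DNA figure, treated here as an assumption, together with the 10.5 base pairs per turn quoted above. The function names are hypothetical.

```python
RISE_PER_BP_NM = 0.34  # assumed axial rise per base pair in B-DNA
BP_PER_TURN = 10.5     # base pairs per helical turn, as quoted in the text


def helix_length_nm(n_base_pairs: int) -> float:
    """Contour length of an ideal B-DNA duplex in nanometres."""
    return n_base_pairs * RISE_PER_BP_NM


def helical_turns(n_base_pairs: int) -> float:
    """Number of complete and partial turns in an ideal B-DNA duplex."""
    return n_base_pairs / BP_PER_TURN


n = 1000
print(f"{n} bp: {helix_length_nm(n):.1f} nm long, {helical_turns(n):.1f} turns")
print(f"pitch: {RISE_PER_BP_NM * BP_PER_TURN:.2f} nm per turn")
```

Real DNA deviates from these ideal values with sequence, supercoiling, and ionic conditions, so the numbers are order-of-magnitude estimates.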
Types of DNA: A, B, Z and Beyond
DNA is a surprisingly flexible molecule. While the B-form is the canonical right-handed helix found in most cellular environments, other conformations exist depending on hydration, ionic strength, and base sequence. The three most significant forms are A-DNA, B-DNA, and Z-DNA, but researchers have also characterized C, D, and E forms as well as higher-order structures like triplex DNA, G-quadruplexes, and i-motifs.
| Feature | B-DNA | A-DNA | Z-DNA |
|---|---|---|---|
| Helix sense | Right-handed | Right-handed | Left-handed |
| Base pairs per turn | 10.5 | 11 | 12 |
| Diameter | ~2.0 nm | ~2.3 nm | ~1.8 nm |
| Major groove | Wide & deep | Narrow & deep | Flat (nearly absent) |
| Biological occurrence | Normal cellular DNA | RNA-DNA hybrids, dehydrated DNA | Regulatory regions, alternating CG sequences |
The A-form geometry is prevalent in double-stranded RNA and in DNA-RNA hybrids during transcription. Z-DNA, named for the zigzag path of its left-handed phosphodiester backbone, is stabilised by alternating purine-pyrimidine tracts (especially (CG)ₙ) and negative supercoiling. Z-DNA-binding proteins are involved in antiviral responses and gene regulation. Beyond these, G-quadruplexes form in guanine-rich regions (like telomeres) through Hoogsteen hydrogen bonding between four guanines in a planar tetrad, while i-motifs arise from intercalated cytosine-rich strands under slightly acidic conditions. These structures are now recognised as important regulatory elements in the genome.
DNA Replication: The Complete Machinery
DNA replication is a semiconservative process: each daughter molecule contains one original parental strand and one newly synthesised strand. This was elegantly demonstrated by Meselson and Stahl in 1958 using isotopic labelling. The replication fork is a complex assembly of enzymes that work in concert. Helicase unwinds the double helix, single-strand binding proteins prevent re-annealing, and topoisomerase relieves the torsional stress ahead of the fork. Primase synthesises short RNA primers, which are then extended by DNA polymerase III (in prokaryotes) in the 5′→3′ direction. Because the two template strands run antiparallel, the leading strand is synthesised continuously while the lagging strand is made discontinuously as Okazaki fragments. These fragments are later processed by DNA polymerase I (which removes RNA primers and fills gaps) and sealed by DNA ligase.
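The semiconservative model makes a simple quantitative prediction that can be sketched in a few lines. The setup below follows the usual textbook description of the Meselson-Stahl experiment, which the text only summarises: a single duplex with two isotopically "heavy" strands is replicated in "light" medium. The labels and the function name are assumptions of this sketch.

```python
from fractions import Fraction


def duplex_fractions(generations: int) -> dict:
    """Fractions of heavy, hybrid, and light duplexes after n rounds of
    semiconservative replication, starting from one fully heavy duplex."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    if generations == 0:
        return {"heavy": Fraction(1), "hybrid": Fraction(0), "light": Fraction(0)}
    total = 2 ** generations
    # The two original heavy strands are never destroyed, so exactly two
    # duplexes contain one heavy strand each; every other duplex is all light.
    hybrid = Fraction(2, total)
    return {"heavy": Fraction(0), "hybrid": hybrid, "light": 1 - hybrid}


for n in range(4):
    print(n, duplex_fractions(n))
```

After one round every duplex is hybrid, and the hybrid fraction then halves with each further round, which is the banding pattern that distinguishes semiconservative replication from conservative or dispersive alternatives.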
In eukaryotes, the replication machinery is even more intricate, with multiple origins of replication, licensing factors, and a coordinated cell-cycle control system. Telomerase, a ribonucleoprotein enzyme, solves the end-replication problem by adding repetitive sequences to chromosome ends, preventing progressive shortening.
Chargaff's Rules
Before the double helix was proposed, Erwin Chargaff made a crucial discovery: in any sample of double-stranded DNA, the amount of adenine always equals the amount of thymine (A = T), and guanine equals cytosine (G = C). This equality extends to the total purine and pyrimidine content: A + G = T + C. This first parity rule is a direct consequence of Watson-Crick base pairing. Chargaff also observed a second rule: the base composition varies between species, but is constant within all tissues of the same organism, a finding that helped establish DNA as the genetic material.
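The first parity rule follows automatically once both strands of a duplex are counted. Here is a small sketch, assuming one strand is given as a string over A, C, G, T; the example sequence is arbitrary and the function name is illustrative.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def duplex_base_counts(strand: str) -> Counter:
    """Count bases over both strands of the duplex built from one strand."""
    strand = strand.upper()
    partner = strand.translate(COMPLEMENT)[::-1]  # antiparallel complement
    return Counter(strand + partner)


counts = duplex_base_counts("ATGCGGCGGA")  # arbitrary example sequence
print(counts["A"], counts["T"])  # 3 3
print(counts["G"], counts["C"])  # 7 7
print(counts["A"] + counts["G"] == counts["T"] + counts["C"])  # True
```

Note that the single input strand on its own has unequal A and T counts; the equality only appears when the complementary strand is included, which is why the rule applies to double-stranded DNA.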
RNA Structure & Models
Unlike DNA, RNA is typically single-stranded, but it readily folds into intricate three-dimensional shapes stabilised by intramolecular base pairing. This folding gives rise to structural motifs such as hairpin loops, stem-loops, bulges, and pseudoknots. The presence of the 2′-hydroxyl group on ribose makes RNA chemically more reactive and less stable than DNA, but it also enables RNA to act as a catalyst (ribozymes). The RNA backbone still has a 5′→3′ directionality, and the same base-pairing rules apply (A-U, G-C), though G-U wobble pairs are common and important for tRNA structure.
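The pairing rules above are enough to test whether two segments of an RNA could zip into a stem. The sketch below is deliberately simplified: it only checks base complementarity along two equal-length arms and ignores loop size, stacking energies, and everything else a real folding program considers. All names are hypothetical.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(a: str, b: str, allow_wobble: bool = True) -> bool:
    """True if bases a and b form a canonical pair or (optionally) a G-U wobble."""
    pair = (a, b)
    return pair in CANONICAL or (allow_wobble and pair in WOBBLE)


def forms_stem(arm_5p: str, arm_3p: str, allow_wobble: bool = True) -> bool:
    """True if two arms, both written 5'->3', pair antiparallel along their
    whole length (first base of one arm with last base of the other)."""
    arm_5p, arm_3p = arm_5p.upper(), arm_3p.upper()
    if len(arm_5p) != len(arm_3p):
        return False
    return all(can_pair(x, y, allow_wobble) for x, y in zip(arm_5p, reversed(arm_3p)))


print(forms_stem("GGAC", "GUCC"))                      # True (all canonical)
print(forms_stem("GGGC", "GUCU"))                      # True (two G-U wobbles)
print(forms_stem("GGGC", "GUCU", allow_wobble=False))  # False
```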
Types of RNA: A Complete Catalog
The RNA world is far more diverse than the three classical types. Here we describe each major category with its function and characteristics.
mRNA
Carries the protein-coding sequence from DNA to ribosomes. In eukaryotes, it undergoes 5′ capping, splicing, and 3′ polyadenylation. Bacterial mRNA is often polycistronic.
tRNA
Small L-shaped molecules (~76-90 nt) that decode mRNA codons. Each tRNA carries a specific amino acid at its 3′ CCA tail and contains an anticodon loop complementary to the mRNA codon.
rRNA
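Forms the structural and catalytic core of ribosomes. The small subunit contains 16S rRNA in prokaryotes and 18S rRNA in eukaryotes; the large subunit contains 23S (prokaryotic) or 28S (eukaryotic) rRNA together with 5S rRNA. rRNA catalyses peptide bond formation (peptidyl transferase).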
snRNA
Small nuclear RNAs (U1, U2, U4, U5, U6) are key components of the spliceosome, directing pre-mRNA splicing.
miRNA
~22 nt regulatory RNAs that silence gene expression by base-pairing with target mRNAs, typically in the 3′ UTR, leading to translational repression or degradation.
siRNA
Small interfering RNAs, derived from double-stranded RNA, trigger cleavage of perfectly complementary mRNAs through the RNAi pathway.
lncRNA
Long non-coding RNAs (>200 nt) involved in chromatin remodelling, X-inactivation (XIST), and genomic imprinting.
piRNA
Piwi-interacting RNAs protect germline genomes by silencing transposable elements.
snoRNA
Guide chemical modifications (methylation and pseudouridylation) of rRNA, tRNA, and snRNA.
Ribozymes
Catalytic RNAs such as the hammerhead ribozyme, capable of self-cleavage or ligation.
Transcription: From DNA to RNA
Transcription is the DNA-directed synthesis of RNA, catalysed by RNA polymerase. In prokaryotes, a single RNA polymerase handles all transcription, while eukaryotes employ three (Pol I, II, III) for different RNA classes. The process begins with promoter recognition and initiation, followed by elongation where the polymerase moves along the template strand in the 3′→5′ direction, building an RNA chain in the 5′→3′ direction. Termination signals cause the polymerase to dissociate. In eukaryotes, the resulting pre-mRNA is extensively processed: a 5′ cap (7-methylguanosine) is added, introns are spliced out, and a poly-A tail (~200 adenines) is appended at the 3′ end. Alternative splicing allows a single gene to yield multiple protein isoforms, a key source of biological complexity.
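The directionality described above can be made concrete with a short sketch. It assumes the template strand is supplied as a string written 5′→3′ (the usual way sequences are written) and ignores promoters, terminators, and processing; the function name is illustrative.

```python
# Template base -> RNA base added opposite it (A in the template gives U in RNA).
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe(template_5_to_3: str) -> str:
    """Return the RNA made from a DNA template strand, written 5'->3'.

    The polymerase reads the template 3'->5', so the string is walked
    backwards while each base is complemented.
    """
    return template_5_to_3.upper()[::-1].translate(DNA_TO_RNA)


# The coding strand 5'-ATGGCC-3' has the template strand 5'-GGCCAT-3'.
print(transcribe("GGCCAT"))  # AUGGCC
```

The transcript matches the coding (non-template) strand with U in place of T, which is why gene sequences are conventionally reported as the coding strand.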
Translation: Decoding mRNA into Protein
Translation is the ribosomal synthesis of proteins from mRNA templates. The ribosome moves along the mRNA in a 5′→3′ direction, reading codons (triplets of nucleotides) and recruiting the appropriate aminoacyl-tRNAs. The process has three phases: initiation, elongation, and termination. During elongation, the ribosome cycles through three tRNA binding sites: the A site (aminoacyl entry), P site (peptidyl), and E site (exit). Peptide bond formation is catalysed by the rRNA of the large subunit, a classic example of a ribozyme. When a stop codon (UAA, UAG, or UGA) enters the A site, release factors trigger hydrolysis of the polypeptide chain and ribosome disassembly. The newly synthesised protein then folds into its functional three-dimensional shape, often with the help of chaperones.
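The codon-by-codon reading described above can be sketched as a simple loop. This is a simplification under stated assumptions: it uses the standard genetic code, starts at the first AUG it finds (real initiation depends on sequence context and initiation factors), and stops at the first in-frame stop codon. The compact table construction and the function name are choices of this sketch.

```python
from itertools import product

# Standard genetic code, one-letter amino acids in UCAG codon order; '*' = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA (5'->3') from the first AUG to the first stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # a stop codon ends the reading frame
        peptide.append(amino_acid)
    return "".join(peptide)


# GG | AUG UUU CUC UAA | GG  ->  Met-Phe-Leu, then stop
print(translate("GGAUGUUUCUCUAAGG"))  # MFL
```

Characters other than A, C, G, U raise a `KeyError` here; a more defensive version would validate the input first.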
The 20 Standard Amino Acids
Proteins are linear polymers of amino acids linked by peptide bonds. Each amino acid contains a central α-carbon bonded to an amino group (-NH₂), a carboxyl group (-COOH), a hydrogen atom, and a distinctive side chain (R group). The chemical nature of the R group determines whether the amino acid is hydrophobic, polar, acidic, or basic, and profoundly influences protein folding and function.
| Category | Amino Acid | Abbrev | R-Group Property | Essential? |
|---|---|---|---|---|
| Nonpolar | Glycine | Gly, G | -H, achiral | No |
| Nonpolar | Alanine | Ala, A | -CH₃ | No |
| Nonpolar | Valine | Val, V | Branched hydrocarbon | Yes |
| Nonpolar | Leucine | Leu, L | Branched | Yes |
| Nonpolar | Isoleucine | Ile, I | Branched | Yes |
| Nonpolar | Proline | Pro, P | Cyclic, rigid | No |
| Nonpolar | Methionine | Met, M | -CH₂CH₂SCH₃ | Yes |
| Nonpolar | Phenylalanine | Phe, F | Benzyl ring | Yes |
| Nonpolar | Tryptophan | Trp, W | Indole ring | Yes |
| Polar uncharged | Serine | Ser, S | -CH₂OH | No |
| Polar uncharged | Threonine | Thr, T | -CH(OH)CH₃ | Yes |
| Polar uncharged | Cysteine | Cys, C | -CH₂SH (disulfide) | No |
| Polar uncharged | Asparagine | Asn, N | -CH₂CONH₂ | No |
| Polar uncharged | Glutamine | Gln, Q | -CH₂CH₂CONH₂ | No |
| Polar uncharged | Tyrosine | Tyr, Y | Phenol | Conditionally |
| Basic (+) | Lysine | Lys, K | -(CH₂)₄NH₃⁺ | Yes |
| Basic (+) | Arginine | Arg, R | Guanidinium | Conditionally |
| Basic (+) | Histidine | His, H | Imidazole | Yes |
| Acidic (-) | Aspartate | Asp, D | -CH₂COO⁻ | No |
| Acidic (-) | Glutamate | Glu, E | -CH₂CH₂COO⁻ | No |
The nine essential amino acids cannot be synthesised de novo by humans and must be obtained from the diet. A helpful mnemonic is "PVT TIM HALL", which lists the nine plus conditionally essential arginine: Phenylalanine, Valine, Threonine, Tryptophan, Isoleucine, Methionine, Histidine, Arginine (conditionally), Leucine, Lysine.
The Genetic Code
The genetic code is the set of rules by which nucleotide triplets (codons) specify amino acids. It is nearly universal, degenerate, and non-overlapping. Of the 64 possible codons, 61 encode amino acids and 3 serve as termination signals. AUG codes for methionine and also acts as the start codon. The code is read in a fixed reading frame, and its degeneracy, exemplified by six codons for leucine, provides a buffer against point mutations. The wobble hypothesis explains how fewer than 61 tRNA species can decode all codons: the third base of the codon can form non-canonical pairs with the anticodon, allowing one tRNA to recognise multiple codons.
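The counts quoted in this paragraph can be verified by enumerating the code. The sketch below builds the standard table from a compact 64-letter string in UCAG order, which is a construction chosen for this example, and tallies how many codons map to each amino acid.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG codon order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codon_table = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degeneracy: the number of codons assigned to each amino acid (or stop).
degeneracy = Counter(codon_table.values())
stop_codons = sorted(c for c, aa in codon_table.items() if aa == "*")
sense_codons = len(codon_table) - len(stop_codons)

print(len(codon_table), sense_codons, stop_codons)
print("Leu:", degeneracy["L"], "Met:", degeneracy["M"], "Trp:", degeneracy["W"])
```

Methionine and tryptophan are the only amino acids with a single codon, while leucine, serine, and arginine each have six.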
UUU Phe UCU Ser UAU Tyr UGU Cys
UUC Phe UCC Ser UAC Tyr UGC Cys
UUA Leu UCA Ser UAA STOP UGA STOP
UUG Leu UCG Ser UAG STOP UGG Trp
CUU Leu CCU Pro CAU His CGU Arg
CUC Leu CCC Pro CAC His CGC Arg
... (complete table continues)
Complete Molecular Formulas
Here is a comprehensive collection of chemical formulas for every major nucleic acid component, from bases to full nucleotides and the central dogma.
Adenine C₅H₅N₅ (6-aminopurine)
Guanine C₅H₅N₅O (2-amino-6-oxypurine)
Cytosine C₄H₅N₃O (4-amino-2-oxypyrimidine)
Thymine C₅H₆N₂O₂ (5-methyl-2,4-dioxypyrimidine)
Uracil C₄H₄N₂O₂ (2,4-dioxypyrimidine)
Purine C₅H₄N₄
Pyrimidine C₄H₄N₂
Ribose C₅H₁₀O₅ (β-D-ribofuranose)
Deoxyribose C₅H₁₀O₄ (β-D-2′-deoxyribofuranose)
AMP C₁₀H₁₄N₅O₇P (adenosine monophosphate)
dAMP C₁₀H₁₄N₅O₆P (deoxyadenosine monophosphate)
ATP C₁₀H₁₆N₅O₁₃P₃ (adenosine triphosphate)
GTP C₁₀H₁₆N₅O₁₄P₃ (guanosine triphosphate)
Summary
Key Takeaways
Nucleic acids (DNA & RNA) are nucleotide polymers. DNA is a double-stranded antiparallel helix (B-form in cells) storing genetic information. RNA is usually single-stranded and fulfills diverse roles: mRNA, tRNA, rRNA, regulatory RNAs, and catalytic ribozymes.
Nucleotides consist of a phosphate group, a pentose sugar, and a nitrogenous base. Purines (A, G) have a double ring; pyrimidines (C, T, U) have a single ring. Base pairing: A=T (2 H-bonds), G≡C (3 H-bonds); in RNA, A pairs with U.
DNA can adopt multiple conformations: A-DNA (dehydrated), B-DNA (physiological), Z-DNA (left-handed). G-quadruplexes and i-motifs are higher-order structures with regulatory functions.
Replication is semiconservative. Key enzymes: helicase, primase, DNA polymerase III, ligase. The lagging strand is synthesised as Okazaki fragments.
Transcription (DNA→RNA) is carried out by RNA polymerase; eukaryotic mRNA undergoes capping, splicing, and polyadenylation. Translation (mRNA→protein) occurs on ribosomes using the genetic code; AUG is the start codon; UAA, UAG, UGA are stop codons.
The 20 standard amino acids have diverse R-groups. Nine are essential in humans. Peptide bonds link amino acids via dehydration synthesis.