This document was originally written in 1999 while at SmithKline Beecham Pharmaceuticals R&D, Bioinformatics department.
The dalton, aka amu, is a unit of mass equal to one-twelfth the mass of a carbon-12 atom, or about the mass of a single hydrogen atom.
The oft-cited A-T and C-G pairing of nucleosides isn't as strict as the biology textbooks and popular science articles make it out to be. First, there are several ways in which a purine and a pyrimidine can hydrogen-bond. The Watson-Crick form is preferred in most organisms, because the alternate -- form is only stable at a higher pH than is found in most organisms. Also, G will bond quite happily with T. This is why "proofreading" enzymes are needed during DNA replication: to enforce the exclusive pairing rules.
If the molecule were flattened, there would be 6Å between adjacent sugars (or phosphates) in the backbone, and the nucleosides would be 3.3Å thick, with 2.7Å between them. Because they are hydrophobic, the nucleosides twist to evade water molecules, dragging the backbone with them. The energetically favorable way to do this is a double helix.
The backbones are labeled in 5' to 3' order (based on the identifiers for the carbon atoms in the sugar). 5' is upstream, 3' is downstream. A DNA diagram shows the "sense" strand on top, running 5' to 3' from left to right, and the complementary "antisense" strand at the bottom, running 3' to 5' from left to right (so it reads 5' to 3' from right to left). RNA transcription reads from the antisense strand, restoring the 5' to 3' order. Protein is then synthesized from the RNA, in N-to-C order.
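The strand conventions can be sketched in a few lines of Python. This is only an illustration: the function names and the sample sequence are invented here, and real transcription involves promoters, terminators, and processing that are ignored.

```python
# Illustrative sketch; function names and the sample sequence are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def antisense(strand):
    """Return the complementary strand, written 5' to 3' (reverse complement)."""
    return strand.translate(COMPLEMENT)[::-1]

def transcribe(sense):
    """Return the mRNA, written 5' to 3'.

    The polymerase reads the antisense strand, so the transcript has the
    same sequence as the sense strand, with U in place of T.
    """
    template = antisense(sense)          # the strand actually read
    rna_as_dna = antisense(template)     # complement of the template, 5' to 3'
    return rna_as_dna.replace("T", "U")

sense = "ATGGCC"
print(antisense(sense))   # GGCCAT
print(transcribe(sense))  # AUGGCC
```

Taking the reverse complement twice returns the original strand, which is why the transcript "restores" the sense order.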
Double-stranded DNA has sufficient flexibility in its sugar-phosphate backbone, and between the backbone and the nucleosides, to fold into a variety of helices. The exact shape depends on the local environment (the water, acidity or alkalinity) and on the sequence of bases, because the hydrophobic nucleosides adjust to minimize their exposure. A single DNA molecule will vary in its coiling along its length. The helix is about 2nm in diameter.
B-DNA is a right-handed helix with about 10 base pairs per turn. A-DNA is also right-handed, with about 11. Z-DNA is left-handed, with about 12. DNA as it occurs in humans is between the A- and B-forms.
Triple-helix DNA is possible, because a third nucleoside can stably hydrogen-bond to two others in a plane. Experiment ... used to split a normal double-stranded yeast chromosome.
Major and minor groove. Because the helix will alter in its pitch along its length, the relative size of the grooves varies.
Certain molecules are of the right shape and size to insert themselves ("intercalate") between paired bases, forcing open the helix.
The base ratio (AT pairs : GC pairs) is consistent within a species, but differs markedly between species.
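Because each A or T on one strand sits in an AT pair, and each G or C in a GC pair, the ratio can be read off a single strand. A minimal sketch, with a hypothetical function name and a made-up sequence:

```python
def base_ratio(seq):
    """Return (AT fraction, GC fraction) for one strand of a duplex."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return at / total, gc / total

print(base_ratio("ATGCGCGC"))  # (0.25, 0.75)
```

Characters other than A, C, G, and T (such as N for an unknown base) are simply left out of the total.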
To fit inside a cell nucleus, DNA must be coiled and supercoiled several times in an intricate packing strategy. The first coiling is the helix. With the second, it forms "supercoils" in structures called nucleosomes, in conjunction with 5 specialized proteins (histones). Two copies each of 4 of these form a 6nm-wide "histone octamer" around which a strand of DNA, about 150bp, wraps left-handed twice. A fifth histone adheres to the 50bp strand linking octamers. The 9 elements together form a 200bp nucleosome. The exact linking configuration of multiple nucleosomes (and thus, the length of the linking strand) is unknown, but together they form a super-supercoiled "tertiary coiling" "300-angstrom fiber," 6-7nm per turn.
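The nucleosome arithmetic is easy to check. The sketch below uses only the round figures given above (150bp wrapped, 50bp linker, 9 histones per nucleosome); the function names are invented, and real linker lengths vary.

```python
WRAPPED_BP = 150                     # DNA wound around one histone octamer
LINKER_BP = 50                       # strand linking adjacent octamers
HISTONES_PER_NUCLEOSOME = 2 * 4 + 1  # the octamer plus the fifth histone

def nucleosome_count(dna_bp):
    """Whole nucleosomes needed to package dna_bp base pairs."""
    return dna_bp // (WRAPPED_BP + LINKER_BP)

def histone_count(dna_bp):
    """Histone molecules bound to dna_bp base pairs."""
    return nucleosome_count(dna_bp) * HISTONES_PER_NUCLEOSOME

print(nucleosome_count(1_000_000))  # 5000
print(histone_count(1_000_000))     # 45000
```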
DNA and accessory proteins together form the material called "chromatin."
Telomere (repeats of 5'TTAGGG3') and centromere. By convention, the shorter arm of a chromosome is known as the "p" (for "petit") arm, and the longer as "q" (it's the letter following p). The cytoband nomenclature includes the chromosome number, p or q, and band location, as in 7q31.2.
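A cytoband label can be pulled apart mechanically. The pattern below is a simplified assumption covering labels such as 7q31.2 or Xp11; it does not handle ranges, "ter"/"cen" suffixes, or other parts of the full nomenclature, and the function name is hypothetical.

```python
import re

# Simplified pattern: chromosome (1-2 digits, X, or Y), arm (p or q), band.
CYTOBAND = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+(?:\.\d+)?)$")

def parse_cytoband(label):
    """Split a label like '7q31.2' into chromosome, arm, and band."""
    match = CYTOBAND.match(label)
    if match is None:
        raise ValueError(f"not a recognized cytoband label: {label!r}")
    chromosome, arm, band = match.groups()
    return {
        "chromosome": chromosome,
        "arm": "short" if arm == "p" else "long",
        "band": band,
    }

print(parse_cytoband("7q31.2"))
# {'chromosome': '7', 'arm': 'long', 'band': '31.2'}
```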
Humans have 22 homologous pairs, the autosomes (#1-22), and 2 non-homologous sex chromosomes (X and Y, sometimes called #23 and #24). Mice have 19 pairs, plus X and Y. A homologous pair is, technically, any two chromosomes that pair up during meiosis.
The "A" chromosomes constitute the main complement of a eukaryotic cell. "B" chromosomes are dispensable. They consist of highly compressed chromatin. Euchromatin accounts for most chromatin; it's uncoiled during interphase. Heterochromatin is, in contrast, highly condensed during interphase, seems to have few coding regions (which would explain why it can afford to be condensed), and binds to staining compounds. It is found in, for instance, the centromere. It comes in constitutive and facultative forms.
A single chromatid is typically 0.5µm wide and 10µm long.
A stained chromosome displays dark bands and light interbands. The bands are consistent across individuals in a species, and are drawn in a cytogenetic ideogram. Different stains will produce different bands. The most common is G-banding (using Giemsa stain); others are Q-, R-, and C-banding. The light bands (little uptake of Giemsa stain) are the R-bands, and the dark ones the G-bands.
During mitosis, the two replicated copies of each chromosome line up and remain attached at the centromere. At this time each copy is known as a chromatid, or sister half-chromosome.
Haploid, diploid, tetraploid. Nullisomy, monosomy, trisomy.
Analogs have the same purpose, but different evolutionary origins ("convergent evolution"). Homologs have the same origin, but have evolved to serve different purposes (arms, forelegs, wings). Orthologs ... Paralogs ... A supergene family consists of all the descendants of a primordial gene that duplicated, then diverged.
Unique (single-copy), moderately repetitive, and highly repetitive sequences. Tandemly repeated simple sequences (satellite DNA) occur near centromeres, with 4 families in humans. One type consists of 5-10bp sequences in tandem and inverted repeats for about 1kbp. Hypervariable minisatellite sequences (variable number of tandem repeats, VNTRs) consist of 10-15bp-long sequences. The most common repetitive sequence is Alu. It's a SINE, a short interspersed nuclear element. LINEs, long interspersed nuclear elements, include the 5-kbp Kpn sequence.
A pseudogene looks like a gene, but isn't operative because it's lacking necessary regulatory regions.
A linkage group is the complete set of genes on a chromosome.
A locus may be defined by a gene, restriction enzyme site, phenotype, band, rearrangement, DNA sequence, or molecular clone. Loci may be detected by linkage analysis; somatic cell techniques; molecular cloning in phage (cosmid), E. coli (plasmid), or yeast (YAC); DNA sequencing; and chromosome restriction mapping, walking (crawling), linking, and jumping.
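Linkage analysis looks at the frequency of meiotic recombination between loci. The closer two loci are, the more often they display linkage (by cosegregating during meiosis). Distance is measured in centimorgans (cM), a unit of probability: 1 cM corresponds to a 1% chance of recombination between two loci per meiosis.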
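As a rough sketch of the arithmetic: the recombination fraction is the share of offspring that are recombinant, and a map function converts it to distance. Haldane's map function, used below, is one common choice that assumes crossovers occur independently (no interference); it is brought in here for illustration and isn't discussed in these notes. The function names and counts are made up.

```python
import math

def recombination_fraction(recombinants, total):
    """Observed share of offspring that are recombinant between two loci."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return recombinants / total

def map_distance_cM(r):
    """Haldane's map function: d = -50 * ln(1 - 2r), in centimorgans."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

r = recombination_fraction(10, 1000)   # 10 recombinants in 1000 offspring
print(r)                               # 0.01
print(round(map_distance_cM(r), 2))    # 1.01
```

For closely linked loci the correction is tiny, so the distance in cM is nearly 100 times the recombination fraction. It matters more as the fraction approaches 0.5, the value seen for unlinked loci.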
RFLPs, restriction fragment length polymorphisms, imply that a base is different somewhere (if it were the same, the restriction enzyme would have cut there, and there wouldn't be any fragment length difference).
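A toy digest shows the effect. The sketch assumes a linear molecule and the enzyme EcoRI, which recognizes GAATTC and cuts after the G; the two "alleles" are invented sequences that differ by a single base inside the second site.

```python
def fragment_lengths(seq, site="GAATTC", cut_offset=1):
    """Cut a linear sequence at every occurrence of site.

    cut_offset is how far into the site the cut falls (1 for EcoRI, G^AATTC).
    Returns the fragment lengths in order along the sequence.
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:])]

# Two hypothetical alleles; the second has one base changed (A -> G),
# which destroys the second recognition site.
allele_a = "AAAA" + "GAATTC" + "TTTTTT" + "GAATTC" + "CC"
allele_b = "AAAA" + "GAATTC" + "TTTTTT" + "GAGTTC" + "CC"

print(fragment_lengths(allele_a))  # [5, 12, 7]
print(fragment_lengths(allele_b))  # [5, 19]
```

The 12 and 7 fragments of the first allele show up as a single 19 fragment in the second, which is the length polymorphism seen on a gel.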
Somatic cell genetics observes abnormal karyotypes in mitotic chromosomes. Its techniques include in situ hybridization of labeled (radioactive, fluorescent, gold antibody particles) DNA clones to metaphase chromosomes, and X-ray-induced chromosomal fragmentation (measured in centirays, cR).
A "library" of sequences is a misnomer. It's a set (unordered collection) of fragments cut up (cleaved by restriction enzymes) from some source: an organism, chromosome, or cancerous cell line. Bottom-up analysis seeks to overlap cloned sequences from a library. Such a contiguous overlapping clone set is known as a contig. Researchers have several biological tools for dealing with fragments of various lengths. E. coli plasmids can carry fragments of up to 4kbp, lambda phage 5-15kbp, cosmids (plasmids with a special phage sequence added) up to 40kbp, and YACs (yeast artificial chromosomes) up to 500kbp. Assembling multiple disjoint contigs in chromosome walking (aka crawling, because the task is so tedious) requires spanning sequences and end probe generation.
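The bottom-up idea can be sketched as a greedy merge: repeatedly join the two fragments with the longest end overlap until nothing overlaps any more. This is a toy under strong assumptions (error-free reads, all on the same strand, no repeats, no fragment buried inside another); the function names and fragments are invented, and it doesn't stand in for how real assembly software works.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def assemble(fragments, min_len=3):
    """Greedily merge the best-overlapping pair until no overlaps remain.

    Returns the list of contigs; more than one means the library left gaps.
    """
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]

print(assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))  # ['ATGGCGTACCTTA']
print(assemble(["AAAA", "CCCC"]))                   # ['AAAA', 'CCCC']
```

The second call returns two disjoint contigs, the situation in which walking with spanning sequences becomes necessary.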
STAR, sequence tagged restriction site. EST, expressed sequence tag.
A restriction map is measured in bp.
A cytogenetic map uses banding. Radiation hybrid map.