‘MOLECULAR BIOLOGY’
All of the above techniques of investigation are
themselves ‘molecular biology’ in the original sense of the term; however, the
term ‘molecular biology’ has taken on the new and different meaning of ‘genetic engineering’ or ‘genetic manipulation.’ These techniques for manipulating nucleic acids
in vitro (that is, outside living cells or organisms) do not comprise a new
discipline but are an outgrowth of earlier developments in biochemistry and
cell biology over the previous 50 years. This powerful new technology has
revolutionized virology and, to a large extent, has shifted the focus of
attention away from the virus particle onto the virus genome. Again, this book
is not the place to discuss in detail the technical aspects of these methods,
and readers are referred to one of the many relevant texts, such as those given
at the end of this chapter.
Virus infection has long been used to probe the
working of ‘normal’ (i.e., uninfected) cells, for
example, to look at macromolecular synthesis. This is true of the
applications of bacteriophages in bacterial genetics and in many instances
where the study of eukaryotic viruses has revealed fundamental information
about the cell biology and genomic organization of higher organisms. In 1970,
John Kates first observed that vaccinia virus mRNAs were polyadenylated at
their 3′ ends. In the same year, Howard Temin and David
Baltimore independently identified the enzyme reverse transcriptase (RNA-dependent
DNA polymerase) in retrovirus-infected cells. This finding shattered the
so-called ‘central dogma’ of biology that there is a one-way flow of information from DNA
through RNA into protein and revealed the plasticity of the eukaryote genome.
Subsequently, the purification of this enzyme from retrovirus particles
permitted cDNA cloning, which greatly accelerated the study of viruses with RNA
genomes—a good illustration of the catalytic
nature of scientific advances. In 1977, Richard Roberts and, independently,
Phillip Sharp recognized that adenovirus mRNAs were spliced to remove
intervening sequences, indicating the similarities between virus and cellular
genomes. Initially at least, the effect of this new technology was to shift the
emphasis of investigation from proteins to nucleic acids. As the power of the
techniques developed, it quickly became possible to determine the nucleotide
sequences of entire virus genomes, beginning with the smallest bacteriophages
in the mid-1970s and working up to the largest of all virus genomes, those of the
herpesviruses and poxviruses, many of which have now been determined. This
nucleic acid-centred technology, in addition to its ultimate achievement of
nucleotide sequencing and the artificial manipulation of virus genomes, also
offered significant advances in detection of viruses and virus infections
involving nucleic acid hybridization techniques. There are many variants of
this basic idea, but, essentially, a hybridization probe, labelled in some
fashion to facilitate detection, is allowed to react with a crude mixture of
nucleic acids. The specific interaction of the probe sequence with
complementary virus-encoded sequences, to which it binds by hydrogen-bond
formation between the complementary base pairs, reveals the presence of the
virus genetic material (Figure 1.7). This approach has been taken a stage
further by the development of various in vitro nucleic acid amplification
procedures, such as polymerase chain reaction (PCR), which is an even more
sensitive technique, capable of detecting just a single molecule of virus
nucleic acid (Figure 1.8). More recently, there has also been renewed interest
in virus proteins based on a new biology which is itself dependent on
manipulation of nucleic acids in vitro and advances in protein detection
arising from immunology. Methods for in vitro synthesis and expression of
proteins from molecularly cloned DNA have advanced rapidly, and many new
analytical techniques are now available. Studies of protein–nucleic acid interactions are proving to be
particularly valuable in understanding virus structure and gene
expression. Advances in electrophoresis have made it possible to study
simultaneously all of the proteins in a virus-infected cell, called the
proteome of the cell (by analogy to the genome). Molecular biologists have one
further trick up their sleeves. Because of the repetitive, digitized nature of
nucleotide sequences, computers are the ideal means of storing and processing
this mass of information. ‘Bioinformatics’ is a broad term coined in the 1980s to encompass any application of
computers to biology. This can imply anything from artificial intelligence and
robotics to genome analysis. More specifically, the term applies to computer
manipulation of biological sequence data, including protein structural analysis.
Bioinformatics permits the inference of function from the linear sequence and
is thus central to all areas of modern biology. Due to the flood of new
sequence information, computers are being used increasingly to make predictions
based on nucleotide sequences (Figure 1.9). These include detecting the presence
of open reading frames, the amino acid sequences of the proteins encoded by
them, control regions of genes such as promoters and splice signals, and the
secondary structure of proteins and nucleic acids. However (particularly in the
case of RNA), the secondary structure assumed by molecules is almost as
important as the primary nucleotide sequence in determining the biological
reactions that the molecule may undergo. Caution is needed in interpreting such
predicted rather than factual information, and the validity of such predictions
should not be accepted without question unless confirmed by biochemical and/or
genetic data. However, when the structure of a protein has been determined by
X-ray crystallography or NMR, the shape can be accurately modelled and explored
in three dimensions on computers (Figure 1.10).
Figure 1.7
Nucleic acid hybridization relies on the specificity of base-pairing which
allows a labelled nucleic acid probe to pick out a complementary target
sequence from a complex mixture of sequences in the test sample. The label used
to identify the probe may be a radioisotope or a nonisotopic label such as an
enzyme or chemiluminescent system. Hybridization may be performed with both the
probe and test sequences in the liquid phase (top of figure) or with the test
sequences bound to a solid phase, usually a nitrocellulose or nylon membrane
(below). Both methods may be used to quantify the amount of the test sequence
present, but solid-phase hybridization is also used to locate the position of
sequences immobilized on the membrane. Plaque and colony hybridization are used
to locate recombinant molecules directly from a mixture of bacterial colonies
or bacteriophage plaques on an agar plate. Northern and Southern blotting are
used to detect RNA and DNA, respectively, after transfer of these molecules
from gels following separation by electrophoresis (cf., western blotting,
Figure 1.2).
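The base-pairing specificity on which hybridization relies can be sketched in a few lines of code. The example below is only an illustration under simplifying assumptions: it treats hybridization as exact Watson-Crick complementarity and ignores mismatches, temperature, and salt concentration. The function names and the sequences are hypothetical, invented for this sketch.

```python
from typing import List

# Illustrative sketch only: hybridization is modelled as exact
# Watson-Crick complementarity between a probe and a target strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_probe_sites(probe: str, target: str) -> List[int]:
    """Return 0-based positions on `target` where `probe` could anneal.

    A probe anneals wherever the target strand contains the probe's
    reverse complement, so the search is for that sequence.
    """
    site = reverse_complement(probe)
    target = target.upper()
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)  # allow overlapping sites
    return positions


# Hypothetical test sample and probe (not real virus sequences).
sample = "AAGCTTGGCATCGATTACGGA"
probe = "TCGATGCC"
print(find_probe_sites(probe, sample))  # [6]
```

A real probe tolerates some mismatches depending on the stringency of the reaction conditions, which this exact-match sketch does not attempt to model.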
Figure 1.8
Polymerase chain reaction (PCR) relies on the specificity of base-pairing
between short synthetic oligonucleotide primers and complementary sequences in a
complex mixture of nucleic acids to prime DNA synthesis using a thermostable
DNA polymerase. Multiple cycles of primer annealing, extension, and thermal
denaturation are carried out in an automated process, resulting in a massive
amplification (2^n-fold increase after n cycles of amplification) of the target
sequence located between the two primers.
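The scale of this amplification is easy to underestimate, so a short calculation may help. The sketch below computes the expected copy number when the target doubles in every cycle (2 raised to the power n after n cycles). The `efficiency` parameter is an assumption added here for illustration and is not part of the description above: a value of 1.0 corresponds to perfect doubling, and lower values model cycles in which only a fraction of templates are copied.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of target copies after `cycles` rounds of PCR.

    With efficiency = 1.0 every template is copied in every cycle, giving
    the ideal 2**n-fold increase. The efficiency term is an illustrative
    assumption, not something stated in the text.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# A single template molecule, ideal doubling:
print(pcr_copies(1, 10))  # 1024.0
print(pcr_copies(1, 30))  # 1073741824.0
```

Thirty ideal cycles therefore turn one molecule into roughly a billion copies, which is why the method can detect a single molecule of virus nucleic acid.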
While the genome is the nucleic acid comprising
the entire genetic information of an organism, by extension ‘genomics’ is the
study of the composition and function of the genetic material of an organism.
Virus genomics began with the first complete sequence of a virus genome
(bacteriophage ΦX174 in 1977). Vast international databases of nucleotide and
protein sequence information have now been compiled, and these can be rapidly
accessed by computers to compare newly determined sequences with those whose
function may have been studied in great detail. At the time of publication, the
complete genome sequences of almost 1500 different viruses had been published,
with more appearing almost weekly (Table 1.1).
Figure 1.9 An example of the use of a
computer to store and process digitized information from a nucleic acid
sequence. This figure shows an analysis of all of the open reading frames
(ORFs) present in an HIV-1 provirus. The ORFs present in the three main
retrovirus genes, gag, pol, and env, can be seen. This complex analysis took
only a few seconds to perform using an ordinary personal computer. Manually,
the same task may have taken several days.
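A minimal sketch of this kind of analysis is given below. It is not the program used to produce the figure. It assumes one simple definition of an ORF (a run from an ATG start codon to the first in-frame stop codon), scans only the three forward reading frames, and uses a made-up sequence; the function name and the `min_codons` threshold are likewise hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for ORFs on the forward strand.

    Assumed definition: an ORF runs from an ATG to the first in-frame
    stop codon. `start` is the 0-based index of the ATG and `end` is the
    index just past the stop codon. `min_codons` counts the codons from
    the ATG up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # open a new ORF at the first ATG seen
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # close the ORF and look for the next ATG
    return orfs


# Hypothetical sequence: ATG AAA TTT TAG in reading frame 2.
print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 2, 14)]
```

A complete analysis would also scan the three reading frames of the complementary strand and would use a much larger minimum length to filter out short ORFs that arise by chance.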
Figure 1.10
Three-dimensional structure of the DNA binding domain of SV40 T-antigen
reconstructed from NMR data using a computer.
Thus we have, in a sense, come full circle in our
investigations of viruses, from particles via genomes back to
proteins again, and have emerged with a far more
profound understanding of these organisms; however, the current pace of
research in virology tells us that there is still far more that we need to
know.