Institute of Cytology and Genetics, USSR Siberian Academy of Sciences,
Novosibirsk, USSR.
Introduction
Any genome, except for its presently neutral DNA (i.e. without
coding sequence), comprises a perfect ensemble of functional genetic
units, the range of these units having its roots in individual exons
and being crowned by the most complicated supergene complexes. The
whole ensemble is undoubtably a product of mutually adaptive molecular
coevolution. Any ecosystem, in turn, is the result of concerted
molecular evolution of the species making up the system. At present,
when the sequencing of entire genomes is running at a phenomena
rate, to construct a theory of molecular coevolution would be of
utmost importance both for theoretical molecular biology and genetics
and for evolutionary theory itself. All specific forms of adaptive
molecular coevolution may be subdivided into intra- and intergenome
and into directed and nondirected processes [1]. It is typical of
nondirected molecular coevolution that, whatever its mode, mutations
occur and are fixed at a rate which, despite their apparent adaptive
value, remains on average constant. This fact, however, is at variance
with one of the keystone postulates of Kimura 's neutralistic theory.
Up until now, only nondirected processes of molecular coevolution
have been proposed and studied in details: intergenomically, concerted
fix ations of mutant reception and absorption genes in bacteria
and phages, respectively [2], different variants of coevolving antigens
and antibodies [3], original interactions between natural selection
and molecular drive in the coevolution of multiple promoter and
enhancer regions in rDNA loci, on the one hand, and the RNA Pol
J gene, on the other [4], specific pairs of base substitutions compensating
for each other to maintain the rRNA secondary structure [5], etc.
All these cases of molecular co-evolution, both intra- and intergenome
(in different parasite- host pairs), are in fact variations on a
theme, i.e. coevolution. The question of whether coevolution could
provide development of multigene systems ab simplecioribus ad complexiora
is of especially profound interest. Regarding the genes of the immune
system, we have suggested that HIV-Iike viruses could be involved
in coevolution of this sort [6].
Coevolutionarily Motivated Complication of Immune Multigene
Families
Let us consider a hypothetical ancestral organism with a primitive,
poorly differentiated immune system. Suppose the corresponding ancestral
immune cells (prelymphocytes) change their state from L to Tin the
course of ontogenesis, where Land Tare the immature and mature prelymphocytes,
respectively. Suppose also that viruses (V) can only strike the
L-cells, i.e. immature prelymphocytes. We then admit that the molecular-genetic
system of immunity is simple enough for the virus to make use, by
adsorption, of the very receptor of the L-cell that T -cells, in
turn, use to identify and inactivate the virus. The adsorption of
V on L leads to the formation of infected cells (denoted Z), from
which, via lysis, the daughter viral particles come. The T IV binding,
by contrast, leads to the elimination of the viruses. Accordingly,
we can derive the following system of differential equations describing
the dynamics:
Where alfa , w, {3 and k are constants of the respective process
rates. The state of equilibrium (L,0, 0, T) in Eq.1, where t= (a/k)L
and L is the root of equation F(L) = aL, is assumed to be health.
This state is locally stable (which implies that the prelymphoid
tissue is resistant to minor infections) if Q'vT,o) > (W -1) G´v
(L,0) and unstable if Q´v(T,O) < (W-1)G´v(£,0). In our model, an
increase in the stability of the "healthy" state can be obtained
by increasing the value of the term Q´v, (T, 0) and/or decrcasing
G´v(L,O). However, since t = (a/k) L, then a drop in L causes a
drop in T. To increase T, it is necessary to increase L; note that
T< (alk)L. Therefore, there are two ways of increasing the resistance
of the prelymphoid system to minor infections (in terms of a simplified
model): first, by increasing the number of clones of those prelymphocytes
that are specific to various antigen determinants; and secondly,
by changing in the course of the prelymphocytes maturation the avidity
of the antigen specific receptor. Both ways are found in the immune
systems of contemporary vertebrates. The second way actually implies
that no entirely identical receptor molecules can participate in
either the absorption of viruses upon the target immunocytes or
in the recognition and destroying of viruses (or their antigens),
since to acquire homologous but not identical receptors, progressive
divergence of the molecular genetic system of immunity is required.
However, what factor(s) could direct the evolutionary complication
of all the other multigene families (MFs)? What if the role of intragenome
"parasites", such as human Alu repea1ed sequences, retroviruses,
and mobile genes of Drosophila, in the evolution of MFs is similar
to that of HIV -like pathogens in the evolution of immune supergenes?
Intragenome Parasites and Genome: a Coevolutionary Aspect
We have studied [7] the processes of concerted variability which
actually result from cooperation of such entities as, on the one
hand, various mobile elements (a kind of "intragenome parasite",
GP) and, on the other, the genome itself ("host"). Several systems
of differential equations similar to Eq.1 have been built order
to analyse the following situations: 1) the GP is insertable in
the vacant sites only, its free state (not in the "host" but still
in the cell) not being durable; 2) the GP is insertable in the vacant
sites only, its free state being durable; 3) the GP is insertable
in both vacant and occupied sites ("molecular memory"), its free
state not being durable (mammalian Alu-like repeats taken as a prototype
); 4) the GP is insertable in both vacant and occupied sites and
is able to exist "on its own" (retroviruses taken as a prototype).
We then admitted that the genome is tolerant to the "selfish" proliferation
of GP until the share of the occupied sites exceeds the limit 1
I K. Our analysis revealed that the coevolutionary complication
of GP from the simplest, which is only able to insert in vacant
sites, through the ongoing acquirement of terminal re peats ("molecular
memory"), to perfectly integrated complexes with an extragenomiallife
style is accompanied by change in the selective coevolutionary restrictions
on genome size: upper limit-no limit -lower limit. Thus, mobile
elements may be regarded as an inner factor inducing progressive,
coevolutionarily motivated complication of genomes, including multiplication
of coding regions. Our models are based on the assumption that there
is always a superior selective force (from the "host" side) that
restricts the number of GPs and influences the pattern of GP distribution
in the host genome. However, the following question arises here:
Are there any inferior restrictions directly related to the GP structure
as such? We go on to show below that A/u-Iike repeated sequences,
even with extremely simple structures, could have such restrictions.
CpG-Rich Promoters as an Inner Constraint on Amplification of
Alu-Like Sequences
With the aid of the package of applied programs VOSTORG [8], designed
in our laboratory, 83 A/u repeats (60 human included) from seven
species of primates and 13 A/u-Iike B1 repeats from three rodent
species were subjected to phy logenetic analysis, in particular,
for mutations fixed in RNA polymerase III promoters (Fig. 1 ). Using
the method of diagnostic positions [9] enabled us to divide all
60 human A/u sequences into three different classes (Fig. 2) corresponding
to J, Sa and Sb (identification of the Sc class was certain) according
to Brit ten et al. [9]. The topologies of the phylogenetic tree
constructed on the complete sample of A/u sequences and of the tree
derived from the comparison of the consensus for all classes revealed
a good agreement with the order of appearance of these classes in
the course of evolution (Fig. 3): progenitor (7 SL RNA gene) --.J--.
Sa --. Sc(?) --.Sb. As is known, the CpG positions evolve on average
10.5 times as much as other positions of A/u repeats, which is due
to methylation of the cytosines. In particular, the A (enhancing)
and B (initiating) boxes of promoters contribute considerably to
the concentration of CpGs (Fig. 1). We tried to build a dichotomic
dendrogram from CpG positions of the promoter alone but failed.
This could be an argument in favour of the "burst"-like formation
of the A/u classes. The most intriguing feature of the A/u evolutionary
tree (Fig. 3 c) is the almost absolute lack of mutations in CpG
dinucleotides of the promoter region at the
Fig. 1. Consensus of human Alu repeats [9] with the left and
right halves of the sequences aligned. A and B boxes of the promoter
region are underlined. The CpG dinucleotides are in lower case letters.
The right promoter is likely to be inactive due to the relatively
long inserted sequence
Fig.2. Variability in the diagnostic positions of 60 human
Alu repeats. On the left, the consensus nucleotides are shown. All
the Alu repeats rearranged in accordance with a divergence from
the consensus. Letters at the bottom indicate the class to which
each Alu repeat
upper branches of the tree (when the Alu subfamilies wcre being
newly formed), in spite of their extreme mutability. NonCpG positions
show the same regularity (Fig. 3 e). It should be noted that mutations
in the CpG sites of the "quasineutral" part of the Alu repeats appear
not to be in deficit at that period (Fig. 3 d). Thus, a lion's share
of mutations in the promoter CpG sites are concentrated in the lower
branches of the divergency tree just after the class formation process
is over (not shown). This means that the promoter region of Alu
repeat progenitors were under very strong negative natural selection
pressure until the amplification process started. Moreover, the
topology of the dichotomic branching within each class appears to
be unstable. Thus, during evolution, first some changes in diagnostic
positions (CpG sites not belonging to them) had to be accumulated;
secondly, a current class of Alu sequences branched off thc main
stem of the trcc; and, finally, mutations at CpG positions predominantly
within the promoter occurred most rapidly. The superfamily of Bl
repeats of rodents, closely related to Alu, shows similar regularities.
The results obtained allow us to propose a model where promoter
sites playa role of profound importance both in intragenome amplification
of the progenitor Alu sequence and in the divergence of individual
members of the corresponding subfamilies. The model is supposed
to explain the limited sizes of a subfamily with the subsequent
acquisition ofmutational defects in CpG positions of promoters and
hence the inevitable slowdown of amplification. As a result, only
those of 7 SL RNA-Iike sequences which have retained the promoters
could
Fig.3a-f. Phylogeny of the consensus sequcnces reconstructed
for the main Alu classes with a human 7 SL RNA sequence as a repeat.
Numbers of mutations fixed in various types of positions arc shown:
a in the 23 diagnostic non-CpG positions; bin all positions without
central and terminal oligo-A parts; c in 8 CpG positions in A and
B boxes of the left (active) promoter for the host RNA polymerase
II; din 38 non-promoter CpG positions; e in 16 non-CpG positions
in the left promoter; fin 6 CpG positions from sites in the right
(inactive) domain homologous to A and B boxes
become the progenitor for the following subfamily of Alu repeats
to amplify and evolve in an active mode. Each Alu repeat is well
known to consist of two homologous halves (Fig. 1 ). Usually , only
the leftmost domain is active for amplification by reverse transcription
[10]. Figure 3f shows that, in contrast to the single CpG mutation
in the leftmost promoter, the rightmost one accepted seven such
mutations in CpG dinucleotides at the top part of the tree just
when the Alu subfamilies were in the making. This is an additional,
rather convincing, argument in favour of the importance of the promoter
CpG sites, in particular those located in A and B boxes. Thus, the
"selfish" intragenome propagation of any progenitor "pregnant" with
a recurrent Alu subfamily is destined to slow down and, eventually,
to come to a standstill because as any individual Alu promoter rapidly
accumulates more and more defects, predominantly due to the increased
mutability of CpG sites, the host reverse transcriptase becomes
less able to recognize the promoter. This is not so with HIV-Iike
retroviruses. They show unusually high variability, generated by
viral reverse transcriptase, the most error-prone of the various
RNA and DNA polymerases [11 ]. In contrast to the short Alu repeating
unit, the HIV reverse transcriptase is encoded by its own Pol gene.
It produces 327 extremely frequent mutations in all regions of the
viral genome, including in its own gene. Therefore, there is a good
chance for promoter sites and reverse transcriptase to be involved
in prolonged steady coevolution, based on the selection of pairs
of substitutions compensating for each other. It is an original
case of a strikingly rapid intragenome coevolution which should
be adaptive but is apparently not directed.
Summary and Conclusions
"Parasitic" DNA may be regarded as a rather active partner in different
coevolutionary processes. The basic stages of the processes are
likely in most cases to be as follows. parasitism? tolerance -+
-+ symbiosis. There are interior and exterior coevolutionary factors
complicating molecular-genetic systems within a supersystem "mobile
elements-genome". For example, the data presented above indicate
clearly that the relatively high concentration of CpG sites in the
Alu promoter looks prudent as regards the needs of the "parasite"
as well as those of the host genome. We consider "prudence" of this
kind to be most likely a product of large-scale molecular coevolution.
As to HIV -like retroviruses, they could be simultaneously involved
in three different regimes of molecular coevolution: 1) at a level
of the parasitic genome as such; 2) as atypical intragenome parasite
inserted in the host genome inducing complication in multigenic
system (like Alu); 3) as a typical intracellular parasite in an
"active", infectious state stimulating complication in the immune
multigene families. Evidently, it is only the steadiness of the
first coevolutionary process (with "no wheels, no sails") provides
for a possible 328 role of HIV-Iike parasites as a selective factor
provoking coevolutionary complication of host genomes.
References
1. Rodin SN, Rzhetsky AY, Matushkin YG (1987) Coevolutionary approach
to the motivation of molecular-genetic organization of immune system.
In. MlikovskyJ, Novak V (eds) Towards a new synthesis in evolutionary
biology. CSA V, Prague, pp130-132
2. Rodin SN, Ratner VA (1983) Some theoretical aspects of protein
coevolution in the ecosystem "phage-bacteria". II. The deterministic
model or microevolution. J Theor BioI100:197-210
3. Rodin SN, Rzhetsky AY( 1989) Coevolutionary approach to the problem
of molecular-genetic bases of antibody diversity (in Russian). Achicvements
Contemp BioI 107.357-374
4. Dover GA, Flavell RB (1984) Molecular coevolution. DNA divergence
and the maintenance of function. Cel138 :622-623
5. Hancock JM, Tautz D, Dovcr GA (1988) Evolution of the secondary
structures and compensatory mutations of the ribosomal RNAs of Drosophila
melanogaster. Mol BioI Evol 5:393-414
6. Rodin SN, Matushkin YG (1987) Intracellular infections as one
of the factors directing progressive divergency of immune multigene
system in evolution (in Russian). J Gen BioI 48.845-856
7. Rodin SN (1991) Idea of coevolution. Chapman and Hall, London
(in press)
8. Zharkikh AA, Rzhetsky AY, Morozov PS, Sitnikova TL, Krushkal
JS, Matushkin YG (1990) VOSTORG: package or a microcomputer programs
of phylogenetic analysis. Gene (in press)
9. Brit ten RJ, Baron WF, Stout DB, Davidson EH (1988) Sources and
evolution of human Alu repeated sequences. Proc Natl Acad Sci USA
85:4770-4774
10. Perez-Stable C, Shen C-K (1986) Competitive and cooperative
functioning of the anterior and posterior promoter elements of an
Alu family repeat. Mol Cell Bioi 6:2041-2052
11. Varmus H (1988) Retroviruses. Science 240:1427-1435
|