Institute of Cytology and Genetics, Siberian Branch of USSR Academy
of Sciences, Novosibirsk, 630090, USSR.
Introduction
The multigene families (MF) are known to have been formed in the
course of evolution mainly by sequential duplication of ancestor
genes. Almost all MFs are characterized by some specified order
of homologous gene expression in the course of ontogenesis. The
question arises: are the genes expressed in early ontogenetic stages
more "ancient" than their ontogenetically later expressed homologues?
Zuckerkandl [1] was the first to formulate and study this question
with respect to the MFs. Taking into account that divergence of
α- and β-subfamilies of globins occurred much earlier than that
of the β-like genes, he compared human β-like globins, namely the γ (fetal) and
β (adult) protein sequences, with α-globin. The latter protein
sequence was taken as a marker close to the "ancestor". Zuckerkandl
supposed that if the fetal β-like globin (γ) was closer to the α-globin
than the adult one (β), the former protein could be assumed to be
more ancient than the latter one, and thus evidence in favour of
molecular recapitulation would be found. Nevertheless, he discovered
that both γ- and β-sequences showed the same number of amino acid
dissimilarities (55) with the α-globin [1]. This result compromised
the idea of molecular recapitulation for a rather long period. It
is a priori evident that if the phenomenon of molecular recapitulation
really takes place, it must be caused by the stabilizing natural
selection: the earlier a gene is expressed in ontogeny, the wider
is the range of possible undesirable consequences of any mutation
in the gene. Selection of this kind must preserve the structure
of "functional" domains of the gene much more carefully than those
which are "subneutral". Thus, it is not unlikely that a large number
of subneutral substitutions is masking a smaller number of substitutions
located in the functional sites. Therefore, we decided to verify
this suggestion using more representative samples of globin nucleotide
sequences and methods more adequate and rigorous than Zuckerkandl's,
both for phylogenetic analysis and for differentiating the mutations in
the globin functional sites from all the others.
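
The masking argument above can be illustrated with a minimal Python sketch. It assumes pre-aligned sequences and a known set of "functional" positions; the function name, the toy sequences and the site set are all invented for illustration and are not real globin data. Two homologues can show the same total number of dissimilarities with a marker sequence while differing sharply at the functional sites.

```python
def count_differences(seq_a, seq_b, sites=None):
    """Count aligned positions where two equal-length sequences differ.

    If `sites` is given, only those 0-based positions are compared.
    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    positions = range(len(seq_a)) if sites is None else sorted(sites)
    return sum(
        1
        for i in positions
        if seq_a[i] != "-" and seq_b[i] != "-" and seq_a[i] != seq_b[i]
    )


# Toy aligned fragments, invented for illustration only (not real globins).
marker = "MKVLSHADKT"  # stands in for the distant, "ancestor-like" marker
early = "MKVLTHGEKS"   # stands in for an early-expressed homologue
late = "MRVLSYADRS"    # stands in for a late-expressed homologue
functional_sites = {1, 5, 8}  # hypothetical "functional" positions

total_early = count_differences(marker, early)
total_late = count_differences(marker, late)
func_early = count_differences(marker, early, functional_sites)
func_late = count_differences(marker, late, functional_sites)

print(total_early, total_late)  # equal totals
print(func_early, func_late)    # unequal counts at functional sites
```

In this toy case the totals coincide, so a comparison over all sites would see no difference, whereas the restriction to functional sites separates the two homologues.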
Results
All of the sequences employed were taken from the GenBank data
base. Trees were constructed by means of the maximum parsimony
method of Zharkikh [2] (program UNISUB). A number of other programs
from the VOSTORG package were also used [3]. Using the data of Perutz
[4], we have divided the amino acid sites of the globins into two
groups: "functional" and "nonfunctional" (or "subneutral"). All
amino acid sites that participate in some important functional contacts
were assigned to the former group. This group includes sites involved
in: the α- and β-contacts with haem, the Bohr effect, the α-β
bonds between the haemoglobin subunits,
the binding of 2,3-diphosphoglycerate (for ß-like chains) and the
salt bridges. The nonfunctional group includes all the other sites.
On the basis of the primary DNA sequence alignment, phylogenetic trees
for the globin genes of Homo sapiens (see Fig. 1), Capra hircus and
Xenopus laevis (not presented) were inferred. In order to determine
the position of the tree root, we used the Halichoerus grypus myoglobin
gene as a homologous but relatively distant gene. When estimating
branch lengths of the trees we sorted the reconstructed nucleotide
substitutions in a special way. Each nucleotide substitution was characterized
from two points of view: on the one hand, as affecting a functional
or nonfunctional site of the protein, and on the other hand, as synonymous
or nonsynonymous. Using the estimated branch lengths we computed the
distances between the present day sequences and the corresponding
ancestor ones reconstructed for each α- and β-gene cluster. The results
are presented in Table 1. Studying both α- and β-like human sequences
revealed the same regularity: the number of reconstructed nonsynonymous
substitutions fixed in the functional sites of the embryonic genes
(ζ and ε) is threefold less than in the adult genes (α1, α2 in the α-cluster
and β, δ in the β-cluster). The analogous values for the fetal and adult
β-like genes are almost equal (about nine substitutions) (see Table
1). In fact, the same could be said about the C. hircus genes. The
goat β-cluster consists of three groups of genes [5]: the βC, βA and
βF genes (the last one is also often designated γ); they are orthologous
to the human β-globin, ψβX, ψβY and ψβZ pseudogenes and the
δ-globin gene.
In the individual development of a goat, besides the embryonic (εI and εII genes), fetal (βF/γ) and adult (βA) stages of globin gene expression, an additional "preadult" or "juvenile" stage is found, which is characterized by the expression of the βC gene [5].
Thus, the εI (goat β-like embryonic) gene appears to be the closest of the β-like genes to the "ancestor" gene (if only the nonsynonymous substitutions in functional sites are considered). An almost negligible regularity βA > βC > γ is observed for the other three genes (see Table 2).
As for the goat εII gene, it was noticed that it exceeds all other β-like genes both in the total number of substitutions and in almost every particular group of distances (see Table 2). Taking into account that this gene
1) significantly differs from the goat εI and human ε genes,
2) has accumulated large numbers of nonsynonymous substitutions in the "functional" sites (Table 2), and
3) is orthologous to the primate gene (ψβ1) that was proved to be a pseudogene, it is reasonable to suggest that the goat εII gene is not an active one, but could be involved in some other processes, e.g. regulation of ontogenetic expression of the globins, as proposed by Goodman et al. [6] for the primate ψβ1 gene.
Finally, the most significant regularity was found for the X. laevis globin genes [7]: the tadpole genes from both α- and β-clusters are approximately twice as close to the corresponding ancestors as the adult ones, and it was the class of nonsynonymous substitutions in the functional sites that revealed this difference (see Table 3).
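
The two-way sorting of reconstructed substitutions behind the tables (functional vs. nonfunctional site, synonymous vs. nonsynonymous change) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the UNISUB or VOSTORG implementation: the function names, the standard genetic code table and the example substitutions are assumptions introduced here, and each substitution is taken to change exactly one nucleotide of a sense codon.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption of
# this sketch; the source does not say which code table was used).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ancestral_codon, derived_codon, codon_index, functional_sites):
    """Return (site_class, change_class) for one single-nucleotide substitution."""
    n_changed = sum(a != b for a, b in zip(ancestral_codon, derived_codon))
    if n_changed != 1:
        raise ValueError("expected codons differing at exactly one position")
    site_class = "functional" if codon_index in functional_sites else "nonfunctional"
    same_aa = CODON_TABLE[ancestral_codon] == CODON_TABLE[derived_codon]
    change_class = "synonymous" if same_aa else "nonsynonymous"
    return site_class, change_class


def tally_branch(substitutions, functional_sites):
    """Count the substitutions on one tree branch in each of the four classes."""
    counts = Counter()
    for ancestral, derived, codon_index in substitutions:
        counts[classify_substitution(ancestral, derived, codon_index, functional_sites)] += 1
    return counts


# Invented substitutions on a hypothetical branch: (ancestral, derived, codon number).
branch = [
    ("CTG", "CTA", 3),  # Leu -> Leu
    ("AAG", "AGG", 7),  # Lys -> Arg
    ("GTG", "GCG", 3),  # Val -> Ala
    ("CAC", "CAT", 7),  # His -> His
]
counts = tally_branch(branch, functional_sites={7})
for key, value in sorted(counts.items()):
    print(key, value)
```

Summing such per-branch tallies along the path from a reconstructed ancestor to a present-day gene gives one distance per class, which is the kind of quantity compared in the text.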
Summing up, let us note that the effect expected by Zuckerkandl
can be clearly seen when embryonic/"ancestor" and adult/"ancestor"
distances are compared. It does not hold true when comparing fetal/"ancestor"
and adult/"ancestor" distances. The latter conclusion is obviously
in agreement with Zuckerkandl's result: there were no embryonic-stage
globins in his sample of amino acid sequences. There are good reasons
to consider the fetal-stage globins (and the goat "preadult" globin)
as the product of relatively recent gene duplications. Thus, the
time span after the last duplication might have been insufficient
to accumulate the differences in the degree of evolutionary conservatism
of the fetal- and adult-stage globin genes. It should be emphasized
that when analysing phylogenetic relations in some other MFs [immunoglobulin
genes of mammals [8], insect chorion protein genes [9], and even
homeoboxes of some regulatory genes of Drosophila melanogaster responsible
for embryonic morphogenetic gradients, segmentation and differentiation
of the segments (S. N. Rodin, unpublished)] we found a tendency resembling
that described here for globin genes. For example, the order of
duplication of immunoreceptor progenitor genes in the evolutionary
past was in good agreement with the order of gene rearrangements
and their expression in the course of B- and T-lymphocyte differentiation
[8].
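
The comparison of gene/"ancestor" distances summarized above amounts to summing branch lengths along the path from the reconstructed cluster ancestor to each present-day gene. A minimal Python sketch follows; the tree shape, node names and branch lengths are invented for illustration and are not taken from Tables 1-3.

```python
def distance_to_ancestor(leaf, parent, branch_length, ancestor):
    """Sum branch lengths on the path from `leaf` up to `ancestor`."""
    total = 0.0
    node = leaf
    while node != ancestor:
        if node not in parent:
            raise ValueError(f"{ancestor!r} is not an ancestor of {leaf!r}")
        total += branch_length[node]  # length of the branch above `node`
        node = parent[node]
    return total


# Hypothetical cluster: the embryonic gene branches off first; the fetal and
# adult genes arise from a later (more recent) duplication at node_1.
parent = {
    "embryonic": "cluster_ancestor",
    "node_1": "cluster_ancestor",
    "fetal": "node_1",
    "adult": "node_1",
}
# Invented counts of nonsynonymous substitutions in functional sites per branch.
branch_length = {"embryonic": 3, "node_1": 5, "fetal": 4, "adult": 4}

distances = {
    gene: distance_to_ancestor(gene, parent, branch_length, "cluster_ancestor")
    for gene in ("embryonic", "fetal", "adult")
}
print(distances)
```

With these made-up numbers the embryonic gene lies closer to the ancestor, while the fetal and adult genes, which share most of their path, are equidistant; this mirrors the qualitative pattern described in the text rather than its actual values.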
Discussion
"Relay-Race" Regime of Molecular Evolution
Any significant increase in the rate of substitution fixation in
a particular gene from a multigene family could be explained in
two ways. The first explanation implies that the pressure of stabilizing
(negative) natural selection is lessened. The second possible cause
of the same phenomenon might be the improvement in the gene function
that is provided by positive natural selection. In the second case,
the higher the rate of adaptive evolution, the larger the substitution
load, i.e. Haldane's dilemma must be playing an important role in
evolutionary periods of just this kind. These two possible reasons
might appear to be combined in the case of globin gene family evolution
[10-13]. Although gene multiplications seem to be quite an ordinary
event in genome evolution, they far more often give rise to silent
pseudogenes than to novel functional genes. The above may imply
that multigene family evolution occurs in this "relay-race" mode,
i.e. at any moment, most probably only one gene within the same
family is allowed to evolve in an adaptive manner [11]. In fact,
the relay-race mode of molecular evolution may be considered as
a general theoretical substantiation of a cascade-like pattern of
switches in expression from one structural gene to another in the
course of ontogenesis.
Regulation of Development and Anaboly
The majority of authors (see [14]) are unanimous in assuming that
ontogenesis is regulated by a number of genes that are organized
as a "Bickford fuse" or a "relay-race with a specified time of last
participant arrival". This means that the expression of "the right
gene at the right time and in the right cell" requires a chain of
intermediate regulatory gene activations. The last participant of
this relay-race must activate the target gene. This chain of activations
must be characterized by strict adherence to the expression timetable.
Each regulatory gene might be responsible for multiple gene activations.
In turn, a group of regulatory genes is often controlled by a higher
order regulatory gene. Thus, the scheme of gene interactions in
ontogeny is undoubtedly a hierarchic one. The mode of terminal addition
of new stages (called anaboly by Severtsov [15]) appears to be the
least dangerous mode of gaining ontogenetic complexity. The latter
does not mean that "nonanabolic" evolutionary rearrangements of
individual development are forbidden, but in reality they are likely
to occur far more rarely than the anabolic ones. There are well-studied
examples where the prolonged activity of an earlier expressed gene
compensated for a malfunction in its later expressed homologues
(see [16]), i.e. the earlier expressed gene could be said to recapitulate
the ancestral mode of expression. Notably, among all the reported
cases of human globin gene malfunctions (thalassaemias) there are
no examples of compensating embryonic gene damage by expression
of fetal or adult globin genes. Thus, one can conclude that, for
example, a normal activation of fetal globins takes place only provided
that the embryonic gene was expressed normally, etc. Thus, the structural
globin genes are also organized into some analogue of the regulatory
hierarchy and the later expressed genes are more open to evolutionary
changes.
Recapitulation and Selective Strategies
The so-called "biogenetic law" of Haeckel was proved to hold true
only in some cases and not in others (see [14]). However, one can
explain (and maybe even predict) whether recapitulation will be
found in any particular case if the following speculations are valid.
There are two main "poles" of natural selection that are recognized
by ecologists [17]. The complexity of any ecological system is thought
to be determined, on the one hand, by the quantity of free energy
available and, on the other hand, by the stability of the environment.
An environment which is characterized by low probability of intensive
disastrous fluctuations is usually most densely populated. Plant
and animal communities in these conditions are known to form complex
trophic chains that utilize free energy in the most efficient way.
The intensive intra- and inter-specific competition that is observed
in these cases favours the increase of organism complexity. Selection
of this kind is called "K-selection" [17]. When the environment
is unstable (large parts of populations are randomly eliminated)
the individuals which have more offspring are most successful. This
kind of selection is known as "r-selection". A prolonged period
of r-selection may cause a drastic reduction in the morphologic
and ontogenetic complexity. It is quite reasonable to suggest that
the anabolic complication of ontogenesis must be demonstrated by
species evolving under pronounced K-type natural selection. On the
other hand, it is unlikely that traces of a recent terminal addition
of new stages will be found when typical r-strategy species are
considered. Of course, when real organisms are being dealt with,
the picture might appear to be much more complex. First of all,
ancestors of almost any present-day animal surely underwent multiple
successions of r- and K-selection. This means that what could be
observed a posteriori is a complicated tangle of tendencies. Apart
from that, there are a great number of species which could not be
definitely classified according to the r/K scheme. Thus, the hypothesis
suggested may be applied only to relatively "recent" spans of evolutionary
time when the species observed are known to evolve under one kind
of selection.
Summary and Conclusions
Multigene families (MF) represent the most promising level of genome
organization when studying the molecular basis of both developmental
and evolutionary processes. Haldane's cost of selection "allows"
almost all MFs to increase their complexity in evolution in a relay-race
manner. Each MF is in turn characterized by a strict ontogenetic
order of expression of homologous structural genes. According to
Zuckerkandl, if any earlier expressed gene resembles in structure
the ancestor gene more than its later expressed homologue, this
could be considered as a case of molecular recapitulation. We showed
here that this phenomenon does occur in various MFs when comparison
is performed only for sites that are known to be involved in selectively
important functional bonds. For all other sites, conditionally denoted
nonfunctional or subneutral, this regularity is not valid. The dichotomic
mode of switches in gene expression, the non-reciprocity of ontogenetic
compensation of human globin gene malfunctions (adult by fetal but
not the reverse), and allelic and isotypic exclusions in the expression of
immunoglobulin gene clusters are certainly associated with the molecular
recapitulation phenomenon.
References
1. Zuckerkandl E (1968) Hemoglobins, Haeckel's "biogenetic law",
and molecular aspects of development. In: Rich A, Davidson N (eds)
Structural chemistry and molecular biology. Freeman, San Francisco,
pp 256-274
2. Zharkikh AA (1977) Algorithms of phylogenetic tree building
from amino acid sequences (in Russian). In: Ratner VA (ed) Mathematical
models of evolution and selection. Institute of Cytology and Genetics,
Novosibirsk, pp 5-52
3. Zharkikh AA, Rzhetsky A, Morozov PS, Sitnikova TL, Krushkal JS
(1990) VOSTORG: package of a microcomputer program of phylogenetic
analysis. Gene (in press)
4. Perutz MF (1972) Nature of haem-haem interaction. Nature 237:495-499
5. Schon EA, Cleary ML, Haynes JR, Lingrel JB (1981) Structure and
evolution of goat γ-, βC- and βA-globin genes: three developmentally
regulated genes contain inserted elements. Cell 27:359-369
6. Goodman M, Koop BF, Czelusniak J, Weiss ML (1984) The η-globin
gene family of mammals. J Mol Biol 180:803-823
7. Knochel W, Meyerhof W, Stadler J, Weber R (1985) Comparative nucleotide
sequence analysis of two types of larval β-globin mRNA of Xenopus
laevis. Nucleic Acids Res 13:7899-7908
8. Rzhetsky A, Rodin SN (1987) Theoretical analysis of relations
between an order of evolutionary divergencies and developmental
stages (in Russian). Genetics (USSR) 23:2183-2195
9. Rzhetsky A, Rodin SN, Zharkikh AA (1990) "Biogenetic law" and
evolution of multigene families (in Russian). Institute of Cytology
and Genetics, Novosibirsk, pp 1-60
10. Ratner VA, Rodin SN, Zharkikh AA (1977) Analysis of globin
phylogeny by a more precise method (in Russian). In: Ratner VA
(ed) Mathematical models of evolution and selection. Institute
of Cytology and Genetics, Novosibirsk, pp 53-96
11. Rodin SN (1985) Multigenic families: evolutionary problems (in
Russian). Mol Biol (Mosc) 21:198-240
12. Li W-H (1985) Accelerated evolution following gene duplication
and its implication for the neutralist-selectionist controversy.
In: Ohta T, Aoki K (eds) Population genetics and molecular evolution.
Springer, Berlin Heidelberg New York, pp 333-352
13. Goodman M, Moore GW, Matsuda G (1975) Darwinian evolution in
the genealogy of haemoglobin. Nature 253:603-608
14. Raff RA, Kaufman TC (1983) Embryos, genes and evolution. Macmillan,
New York
15. Severtsov AN (1945) Evolution of fins (in Russian). USSR Academy
of Sciences, Moscow (Selected works, vol 2)
16. Henthorn PS, Mager DL, Huisman THJ, Smithies O (1986) A gene
deletion ending within a complex array of repeated sequences 3'
to the human β-globin gene cluster. Proc Natl Acad Sci USA 83:5194-5198
17. MacArthur RH, Wilson EO (1967) The theory of island biogeography.
Princeton University Press, Princeton