Multigene Families: The Problem of Molecular Recapitulation
S. N. Rodin, A. Y. Rzhetsky, and A. A. Zharkikh     Hämatol. Bluttransf. Vol 35

Institute of Cytology and Genetics, Siberian Branch of USSR Academy of Sciences, Novosibirsk, 630090, USSR.


Multigene families (MFs) are known to have formed in the course of evolution mainly by sequential duplication of ancestral genes. Almost all MFs are characterized by a specific order of homologous gene expression in the course of ontogenesis. The question arises: are the genes expressed in early ontogenetic stages more "ancient" than their ontogenetically later expressed homologues? Zuckerkandl [1] was the first to formulate and study this question with respect to MFs. Taking into account that the divergence of the α- and β-subfamilies of globins occurred much earlier than that of the β-like genes, he compared the human β-like globins, namely the γ (fetal) and β (adult) protein sequences, with α-globin. The latter protein sequence was taken as a marker close to the "ancestor". Zuckerkandl supposed that if the fetal β-like globin (γ) was closer to α-globin than the adult one (β), the former protein could be assumed to be more ancient than the latter, and thus evidence in favour of molecular recapitulation would be found. However, he discovered that both the γ- and β-sequences showed the same number of amino acid dissimilarities (55) with α-globin [1]. This result compromised the idea of molecular recapitulation for a rather long period. It is a priori evident that if the phenomenon of molecular recapitulation really takes place, it must be caused by stabilizing natural selection: the earlier a gene is expressed in ontogeny, the wider is the range of possible undesirable consequences of any mutation in the gene. Selection of this kind must preserve the structure of the "functional" domains of the gene much more carefully than those which are "subneutral". Thus, it is not unlikely that a large number of subneutral substitutions is masking a smaller number of substitutions located in the functional sites.
Therefore, we decided to verify this suggestion using more representative samples of globin nucleotide sequences and methods more adequate and rigorous than Zuckerkandl's, both for phylogenetic analysis and for distinguishing mutations in the globin functional sites from all others.
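The masking effect suggested above can be illustrated with a minimal Python sketch. It counts amino acid differences between two aligned sequences and splits the count by site class. The sequences, the set of functional positions and the function name `count_differences` are invented for illustration; they are not globin data and not the procedure of Zuckerkandl or of the present authors.

```python
def count_differences(seq_a, seq_b, functional_sites):
    """Count mismatches between two aligned sequences, split by site class.

    functional_sites holds 0-based alignment positions treated as functional.
    Columns containing a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"functional": 0, "nonfunctional": 0}
    for pos, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a == "-" or b == "-" or a == b:
            continue
        key = "functional" if pos in functional_sites else "nonfunctional"
        counts[key] += 1
    return counts


# Toy data (hypothetical): a marker sequence and two homologues.
marker = "MVLSPADKTN"
early = "MVLSAADRTN"   # differs from the marker at positions 4 and 7
late = "MALSPGDKTN"    # differs from the marker at positions 1 and 5
functional = {1, 5}    # assumed functional positions

early_counts = count_differences(marker, early, functional)
late_counts = count_differences(marker, late, functional)
```

In this toy case both homologues differ from the marker at two sites in total, yet only the later one differs at functional sites. A comparison of total counts alone would not show that difference.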


All of the sequences employed were taken from the GenBank database. Trees were constructed by means of the maximum parsimony method of Zharkikh [2] (program UNISUB). A number of other programs from the VOSTORG package were also used [3]. Using the data of Perutz [4], we divided the amino acid sites of the globins into two groups: "functional" and "nonfunctional" (or "subneutral"). All amino acid sites that participate in important functional contacts were assigned to the former group. This group includes sites involved in the α- and β-contacts with haem, the Bohr effect, the α-β bonds between the haemoglobin subunits, the binding of 2,3-diphosphoglycerate (for β-like chains) and the salt bridges. The nonfunctional group includes all the other sites. On the basis of the primary DNA sequence alignment, phylogenetic trees for the globin genes of Homo sapiens (see Fig. 1), Capra hircus and Xenopus laevis (not presented) were inferred. In order to determine the position of the tree root, we used the Halichoerus grypus myoglobin gene as a homologous but relatively distant gene. When estimating the branch lengths of the trees we sorted the reconstructed nucleotide substitutions in a special way. Each nucleotide substitution was characterized from two points of view: on the one hand, as affecting a functional or nonfunctional site of the protein, and on the other hand, as synonymous or nonsynonymous. Using the estimated branch lengths we computed the distances between the present-day sequences and the corresponding ancestral ones reconstructed for each α- and β-gene cluster.

The results are presented in Table 1. Study of both the α- and β-like human sequences revealed the same regularity: the number of reconstructed nonsynonymous substitutions fixed in the functional sites of the embryonic genes (ζ and ε) is threefold less than in the adult genes (α1, α2 in the α-cluster and β, δ in the β-cluster). The analogous values for the fetal and adult β-like genes are almost equal (about nine substitutions) (see Table 1). In fact, the same could be said about the C. hircus genes. The goat β-cluster consists of three groups of genes [5]: the βC, βA and βF genes (the last one is also often designated γ); they are orthologous to the human β-globin, ψβX, ψβY and ψβZ pseudogenes and the δ-globin gene.
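The two-way sorting of substitutions described in the methods (functional or nonfunctional site, synonymous or nonsynonymous change) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the UNISUB or VOSTORG implementation: it compares an ancestral and a derived coding sequence codon by codon using the standard genetic code, counts each differing codon once (even if more than one nucleotide changed), and takes the set of functional codon positions as given. The names `classify_substitution` and `tally_branch` and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(ancestral_codon, derived_codon, codon_index, functional_codons):
    """Return (site_class, change_class) for one changed codon."""
    same_aa = CODON_TABLE[ancestral_codon] == CODON_TABLE[derived_codon]
    change = "synonymous" if same_aa else "nonsynonymous"
    site = "functional" if codon_index in functional_codons else "nonfunctional"
    return site, change


def tally_branch(ancestral, derived, functional_codons):
    """Tally changed codons between two aligned, in-frame coding sequences."""
    if len(ancestral) != len(derived) or len(ancestral) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    tally = {}
    for start in range(0, len(ancestral), 3):
        anc, der = ancestral[start:start + 3], derived[start:start + 3]
        if anc == der:
            continue
        key = classify_substitution(anc, der, start // 3, functional_codons)
        tally[key] = tally.get(key, 0) + 1
    return tally


# Toy data (hypothetical): codon 1 changes GTG -> GTA (Val -> Val),
# codon 3 changes TCT -> GCT (Ser -> Ala); codon 3 is assumed functional.
ancestral_seq = "ATGGTGCTGTCT"
derived_seq = "ATGGTACTGGCT"
tally = tally_branch(ancestral_seq, derived_seq, functional_codons={3})
```

Summing such tallies over the branches that connect a reconstructed ancestor to a present-day sequence would give one distance per class, which corresponds in spirit to the four kinds of distance compared in the tables.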

In the individual development of a goat, besides the embryonic (εI and εII genes), fetal (βF/γ) and adult (βA) stages of globin gene expression, an additional "preadult" or "juvenile" stage is found, which is characterized by the expression of the βC gene [5]. Thus, the εI (goat β-like embryonic) gene appears to be the closest of the β-like genes to the "ancestor" gene (if only the nonsynonymous substitutions in functional sites are considered). An almost negligible regularity βA > βC > γ is observed for the other three genes (see Table 2). As for the goat εII gene, it exceeds all other β-like genes both in the total number of substitutions and in almost any particular group of distances (see Table 2). Taking into account that this gene (1) differs significantly from the goat εI and human ε genes, (2) has accumulated large numbers of nonsynonymous substitutions in the "functional" sites (Table 2), and (3) is orthologous to the primate gene (ψβ1) that was proved to be a pseudogene, it is reasonable to suggest that the goat εII gene is not an active one, but could be involved in some other processes, e.g. regulation of ontogenetic expression of the globins, as proposed by Goodman et al. [6] for the primate ψβ1 gene. Finally, the most significant regularity was found for the X. laevis globin genes [7]: the tadpole genes from both the α- and β-clusters are approximately twice as close to the corresponding ancestors as the adult ones, and it was the class of nonsynonymous substitutions in the functional sites that revealed this difference (see Table 3).

Summing up, the effect expected by Zuckerkandl can be clearly seen when the embryonic/"ancestor" and adult/"ancestor" distances are compared. It does not hold when comparing the fetal/"ancestor" and adult/"ancestor" distances. The latter conclusion is evidently in agreement with Zuckerkandl's result: there were no embryonic-stage globins in his sample of amino acid sequences. There are good reasons to consider the fetal-stage globins (and the goat "preadult" globin) as the product of relatively recent gene duplications. Thus, the time span after the last duplication might have been insufficient to accumulate differences in the degree of evolutionary conservatism of the fetal- and adult-stage globin genes. It should be emphasized that when analysing phylogenetic relations in some other MFs [immunoglobulin genes of mammals [8], insect chorion protein genes [9], and even homeoboxes of some regulatory genes of Drosophila melanogaster responsible for embryonic morphogenetic gradients, segmentation and differentiation of the segments (S. N. Rodin, unpublished)] we found a tendency resembling that described here for globin genes. For example, the order of duplication of immunoreceptor progenitor genes in the evolutionary past was in good agreement with the order of gene rearrangements and their expression in the course of B- and T-lymphocyte differentiation [8].

"Relay-Race" Regime of Molecular Evolution

Any significant increase in the rate of substitution fixation in a particular gene of a multigene family could be explained in two ways. The first explanation implies that the pressure of stabilizing (negative) natural selection is lessened. The second possible cause of the same phenomenon might be an improvement in gene function driven by positive natural selection. In the second case, the higher the rate of adaptive evolution, the larger the substitution load, i.e. Haldane's dilemma must play an important role in evolutionary periods of just this kind. These two possible causes might be combined in the case of globin gene family evolution [10-13]. Although gene multiplications seem to be quite an ordinary event in genome evolution, they far more often give rise to silent pseudogenes than to novel functional genes. This may imply that multigene family evolution occurs in a "relay-race" mode, i.e. at any moment, most probably only one gene within a family is allowed to evolve in an adaptive manner [11]. In fact, the relay-race mode of molecular evolution may be considered a general theoretical substantiation of the cascade-like pattern of switches in expression from one structural gene to another in the course of ontogenesis.

Regulation of Development and Anaboly

The majority of authors (see [14]) are unanimous in assuming that ontogenesis is regulated by a number of genes that are organized as a "Bickford fuse" or a "relay-race with a specified time of last participant arrival". This means that the expression of "the right gene at the right time and in the right cell" requires a chain of intermediate regulatory gene activations. The last participant of this relay-race must activate the target gene. This chain of activations must adhere strictly to the expression timetable. Each regulatory gene might be responsible for multiple gene activations. In turn, a group of regulatory genes is often controlled by a higher order regulatory gene. Thus, the scheme of gene interactions in ontogeny is undoubtedly a hierarchic one. The mode of terminal addition of new stages (called anaboly by Severtsov [15]) appears to be the least dangerous mode of gaining ontogenetic complexity. This does not mean that "nonanabolic" evolutionary rearrangements of individual development are forbidden, but in reality they are likely to occur far more rarely than the anabolic ones. There are well-studied examples where the prolonged activity of an earlier expressed gene compensated for a malfunction in its later expressed homologues (see [16]), i.e. the earlier expressed gene could be said to recapitulate the ancestral mode of expression. Notably, among all the reported cases of human globin gene malfunctions (thalassaemias) there are no examples of compensation for embryonic gene damage by expression of fetal or adult globin genes. One can therefore conclude that, for example, normal activation of the fetal globins takes place only provided that the embryonic gene was expressed normally, and so on. Thus, the structural globin genes are also organized into some analogue of the regulatory hierarchy, and the later expressed genes are more open to evolutionary changes.

Recapitulation and Selective Strategies

The so-called "biogenetic law" of Haeckel was proved to hold true only in some cases and not in others (see [14]). However, one can explain (and maybe even predict) whether recapitulation will be found in any particular case if the following speculations are valid. There are two main "poles" of natural selection that are recognized by ecologists [17]. The complexity of any ecological system is thought to be determined, on the one hand, by the quantity of free energy available and, on the other hand, by the stability of the environment. An environment which is characterized by a low probability of intensive disastrous fluctuations is usually most densely populated. Plant and animal communities in these conditions are known to form complex trophic chains that utilize free energy in the most efficient way. The intensive intra- and inter-specific competition that is observed in these cases favours the increase of organism complexity. Selection of this kind is called "K-selection" [17]. When the environment is unstable (large parts of populations are randomly eliminated), the individuals which have more offspring are most successful. This kind of selection is known as "r-selection". A prolonged period of r-selection may cause a drastic reduction in morphologic and ontogenetic complexity. It is quite reasonable to suggest that the anabolic complication of ontogenesis must be demonstrated by species evolving under pronounced K-type natural selection. On the other hand, it is unlikely that traces of a recent terminal addition of new stages will be found when typical r-strategy species are considered. Of course, when real organisms are being dealt with, the picture might appear to be much more complex. First of all, ancestors of almost any present-day animal surely underwent multiple successions of r- and K-selection. This means that what could be observed a posteriori is a complicated tangle of tendencies.
Apart from that, there are a great number of species which could not be definitely classified according to the r/K scheme. Thus, the hypothesis suggested may be applied only to relatively "recent" spans of evolutionary time when the species observed are known to evolve under one kind of selection.

Summary and Conclusions

Multigene families (MFs) represent the most promising level of genome organization when studying the molecular basis of both developmental and evolutionary processes. Haldane's cost of selection "allows" almost all MFs to increase their complexity in evolution in a relay-race manner. Each MF is in turn characterized by a strict ontogenetic order of expression of homologous structural genes. According to Zuckerkandl, if any earlier expressed gene resembles the ancestor gene in structure more than its later expressed homologue does, this could be considered a case of molecular recapitulation. We showed here that this phenomenon does occur in various MFs when the comparison is performed only for sites that are known to be involved in selectively important functional bonds. For all other sites, conditionally denoted nonfunctional or subneutral, this regularity is not valid. The dichotomic mode of switches in gene expression, the non-reciprocity of ontogenetic compensation of human globin gene malfunctions (adult by fetal but not the reverse), and allelic and isotypic exclusions in the expression of immunoglobulin gene clusters are certainly associated with the molecular recapitulation phenomenon.


1. Zuckerkandl E (1968) Hemoglobins, Haeckel's "biogenetic law", and molecular aspects of development. In: Rich A, Davidson N (eds) Structural chemistry and molecular biology. Freeman, San Francisco, pp 256-274
2. Zharkikh AA (1977) Algorithms of phylogenetic tree building from amino acid sequences (in Russian). In: Ratner VA (ed) Mathematical models of evolution and selection. Institute of Cytology and Genetics, Novosibirsk, pp 5-52
3. Zharkikh AA, Rzhetsky A, Morozov PS, Sitnikova TL, Krushkal JS (1990) VOSTORG: package of a microcomputer program of phylogenetic analysis. Gene (in press)
4. Perutz MF (1972) Nature of haem-haem interaction. Nature 237:495-499
5. Schon EA, Cleary ML, Haynes JR, Lingrel JB (1981) Structure and evolution of goat γ-, βC- and βA-globin genes: three developmentally regulated genes contain inserted elements. Cell 27:359-369
6. Goodman M, Koop BF, Czelusniak J, Weiss ML (1984) The η-globin gene family of mammals. J Mol Biol 180:803-823
7. Knochel W, Meyerhof W, Stadler J, Weber R (1985) Comparative nucleotide sequence analysis of two types of larval β-globin mRNA of Xenopus laevis. Nucleic Acids Res 13:7899-7908
8. Rzhetsky A, Rodin SN (1987) Theoretical analysis of relations between an order of evolutionary divergencies and developmental stages (in Russian). Genetics (USSR) 23:2183-2195
9. Rzhetsky A, Rodin SN, Zharkikh AA (1990) "Biogenetic law" and evolution of multigene families (in Russian). Institute of Cytology and Genetics, Novosibirsk, pp 1-60
10. Ratner VA, Rodin SN, Zharkikh AA (1977) Analysis of globin phylogeny by a more precise method (in Russian). In: Ratner VA (ed) Mathematical models of evolution and selection. Institute of Cytology and Genetics, Novosibirsk, pp 53-96
11. Rodin SN (1985) Multigenic families: evolutionary problems (in Russian). Mol Biol (Mosc) 21:198-240
12. Li W-H (1985) Accelerated evolution following gene duplication and its implication for the neutralist-selectionist controversy. In: Ohta T, Aoki K (eds) Population genetics and molecular evolution. Springer, Berlin Heidelberg New York, pp 333-352
13. Goodman M, Moore GW, Matsuda G (1975) Darwinian evolution in the genealogy of haemoglobin. Nature 253:603-608
14. Raff RA, Kaufman TC (1983) Embryos, genes and evolution. Macmillan, New York
15. Severtsov AN (1945) Evolution of fins (in Russian). USSR Academy of Sciences, Moscow (Selected works, vol 2)
16. Henthorn PS, Mager DL, Huisman THJ, Smithies O (1986) A gene deletion ending within a complex array of repeated sequences 3' to the human β-globin gene cluster. Proc Natl Acad Sci USA 83:5194-5198
17. MacArthur RH, Wilson EO (1967) The theory of island biogeography. Princeton University Press, Princeton