Beckman Research Institute of The City of Hope
1450 East Duarte Road Duarte, CA 91010
I pay homage to Umberto Eco, who attended this meeting, by quoting
the very first sentence of the PROLOGUE to his widely acclaimed book
"THE NAME OF THE ROSE" in English translation: "In the beginning
was the Word and the Word was with God and
the Word was God". It would be noted that in this
17-word-long sentence, "the Word" recurs thrice and "God"
twice; altogether, recurring words occupy about half of the sentence.
Indeed, the essence of any good writing appears to depend upon the
recurrence at more or less regular intervals of the same or similar
sounding words which give a lyrical quality to it. Julius Caesar's
announcement to the Roman senate of his victory at Zela (47 B.C.)
survives to this day, only because all three word-combinations uttered
in succession began with v and ended in i; "ve'ni, vi'di, vi'ci".
This then is the extreme of recurrence with no interval at all.
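The tally quoted for the opening sentence can be checked with a minimal sketch. The names `SENTENCE`, `word_counts` and `the_word_units` are illustrative only. Counting "the Word" as a two-word unit is an assumption about how the "half of the sentence" figure was reached, not something the text states.
```python
import re
from collections import Counter

# The sentence as quoted in the text above.
SENTENCE = ("In the beginning was the Word and the Word was with God "
            "and the Word was God")

def word_counts(text):
    """Return the lower-cased word list and a per-word tally."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    return words, Counter(words)

words, counts = word_counts(SENTENCE)
total = len(words)  # length of the sentence in words

# Assumption: "the Word" is treated as one recurring two-word unit.
the_word_units = sum(
    1 for first, second in zip(words, words[1:])
    if (first, second) == ("the", "word")
)
covered = 2 * the_word_units + counts["god"]

print(total, counts["word"], counts["god"])  # 17 3 2
print(covered, "of", total, "words")         # 8 of 17 words
```
Under that reading, the recurring units cover 8 of the 17 words, slightly under half.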
In immunology, the antigen-specific cooperation between helper T-cells
and antibody-producing B-cells appears again to depend upon recurrence
of the same word (the same signal). As schematically illustrated
in Figure 1, a macrophage phagocytizes antigen "A" and digests it
to a number of small peptide fragments. Of those digested fragments,
an amphipathic alpha-helical fragment is preferentially chosen and
presented to the outside world by the antigen-presenting macrophage
in conjunction with the self class II MHC antigen (extreme left
of Figure 1). A clone of T-cells which happens to possess the membrane-bound
receptor that fits this complex of the antigen "A" alpha-helical fragment
and self class II MHC antigen now becomes antigen-specific helper
T-cells (middle of Figure 1). But how can this clone of T-cells
selectively recognize and help not a single clone but a number of
clones of B-cells equipped with membrane-bound anti-antigen "A" antibodies
of IgM and IgD types? For it would be recalled that antibodies are
constructed to recognize antigens per se, not complexed with self
class II MHC antigen. Furthermore, each polypeptide antigen usually
presents not one but a number of antigenic determinants. Accordingly,
there ought to be, and are, several different clones of B-cells, their
antibodies being directed against different antigenic determinants
of antigen "A". Nevertheless, having been endowed with membrane-bound
anti-antigen "A" antibodies, all these different clones of B-cells
would manage to concentrate antigen "A" on their cell surface. These
antigen-antibody complexes shall be lysostripped off the plasma
membrane and shall be digested inside B-cells. As the digestive
enzyme involved is of the same sort as that present in the macrophage,
antigen "A" is digested to the same variety of peptide fragments
and the same amphipathic alpha-helical fragment is chosen and presented
by B-cells to the outside world in conjunction with self class II
MHC antigen. It is this complex which the receptor of helper T-cells
recognizes, thus resulting in antigen-specific help to expand all
clones of anti-antigen "A" B-cells (extreme right of Figure 1).
Figure 1. How helper T cells are able to provide antigen-specific
help to multiple clones of B cells is schematically illustrated.
This help is based upon antigen-presenting macrophages and antigen-specific
clones of B cells uttering the same word, which is perceived as such
by the membrane-bound receptor of helper T cells. At the extreme left,
the specific antigen (antigen "A") is depicted as a polypeptide
chain comprised of alternating alpha-helical and β-sheet forming
segments; of those, one amphipathic alpha-helical segment is shown
as a black barrel. A macrophage, at the left, phagocytizes antigen
"A", and in a specific intracellular locality, but not in a lysosome,
antigen "A" is digested by a particular protease to several peptidic
fragments. One amphipathic alpha-helical fragment then is preferentially
chosen and presented to the outside world, complexed with self class
II MHC antigen. A clone of T cells equipped with the receptor that
fits this complex presented by the antigen-presenting macrophage
now becomes anti-antigen "A" specific helper T cells, as shown in
the middle. On the other hand, still membrane-bound antibodies of
dormant anti-antigen "A" B cells shown at the right recognize any
of the antigenic determinants present on antigen "A" but never a
complex formed between self class II MHC and the antigen "A" amphipathic
alpha-helical fragment. Yet they can be recipients of the help from anti-antigen
"A" specific T cells, because a complex formed between antigen "A"
and specific membrane-bound antibody is taken inside B cells by
pinocytosis and subsequently antigen "A" is digested by the same
protease as is present in macrophages. Of the digested fragments, the same
amphipathic alpha-helical fragment is preferentially chosen and
presented by B cells to the outside world complexed with self class
II MHC antigen. This enables anti-antigen "A" specific helper T
cells to see the same word on the plasma membrane of anti-antigen
"A" specific B cells as they had seen on the antigen-presenting macrophage
plasma membrane; hence antigen-specific help to cause clonal expansion
of, and antibody secretion by, anti-antigen "A" specific B cells.
All in all, it would thus appear that the antigen-specific cooperation
between T-cells and B-cells is based upon one principle: that when
confronted with the same sentence (antigen "A"), both antigen-presenting
macrophage and anti-antigen "A" B-cells choose the same word (a particular
amphipathic alpha-helical segment) out of that sentence and crown
it with the same adjective (self class II MHC antigen).
COMPLEMENTARY RECOGNITION IS BUT A FORM OF THE HOMOLOGOUS RECOGNITION
In immunology, one often speaks of specific antigen-antibody interactions
as examples of recognition based upon the complementarity between
two components. Nevertheless, the fact is that all components of
the adaptive immune system are composed of strings of repeating
units ultimately derived from the common ancestral unit. This unit,
commonly referred to as the β2-microglobulin-like domain, is made
of 90 to 100 mostly hydrophobic amino acid residues, the relative
abundance of hydrophilic Ser and/or Thr also being a conspicuous
feature. These residues are folded into three to five loops of anti-parallel
β-sheet forming segments. Contacts between neighboring β-sheet forming
segments are maintained through hydrogen bonds mostly formed between
Thr-Thr, Thr-Ser or Ser-Ser, and the whole structure is compacted
by the presence of one intradomain disulfide bridge. It now appears
that the immediate ancestor of genes for the adaptive immune system
was a CAM (cell adhesion molecule) gene engaged in organogenesis of
early embryos. In the extracellular portion of N-CAM specific for
neuronal organization, four successive β2-microglobulin-like domains
were found (Hemperly et al., 1986). Through these domains, N-CAM
engages in homologous recognition, thus aggregating similar neuronal
cells; the first step in neuronal organization. It is fitting that
all components of the adaptive immune system evolved from CAM, the
original mediator of cell-cell interaction. The point to be made
here is that the β2-microglobulin-like domain originally evolved
to engage in homologous, not complementary, recognition. Accordingly,
recognition of class I and class II MHC antigens by T-cell receptors
and B-cell antibodies, as well as that of idiotypes by other T-cell
receptors and B-cell antibodies, is homologous recognition sensu
stricto; the notion of complementary recognition being more of an
illusion than reality.
SEARCH FOR THE ULTIMATE ANCESTOR
Implicit in the above stated notion that the immediate ancestor
of various components of the adaptive immune system was one of the
cell adhesion molecules (CAM) involved in the initial stage of embryonic
organogenesis is the assumption that the four β2-microglobulin-like
domains of N-CAM arose in situ. Were they borrowed from other molecules
(even from immunoglobulins themselves) by the so-called domain exchange,
the whole notion of CAM being the immediate ancestor of various
components of the adaptive immune system becomes ridiculous. Fortunately,
it looks as though these β2-microglobulin-like domains of N-CAM
indeed evolved in situ, for there is a noticeable similarity in
construction between these β2-microglobulin-like domains and other parts
of N-CAM. Each of these β2-microglobulin-like domains contains three
absolutely invariant residues: 1) Cys in 12th position, 2) Trp in
24th position, and 3) Cys in 62nd position. These three invariant
residues tend to be included in Thr-X, Thr-X dipeptidic repeats.
This is illustrated at the top of Figure 2 on the 3rd of the four successive
β2-microglobulin-like domains, for the 3rd is the only complete
domain, the other three sustaining deletions of three to six residues
(Hemperly et al., 1986). Four successive β2-microglobulin-like
domains comprise but 40% of the N-CAM polypeptide chain. The 362-residue-long
carboxyl terminal domain remaining within the cell is constructed
in a simpler mode, thus suggesting that this segment remained close
to the original design of the entire CAM polypeptide chain. As also
shown at the top of Figure 2, the 699th to 728th residues of N-CAM are
essentially made of Thr-X, Thr-X dipeptidic repeats. Thus, it is conceivable
that the entire coding sequence for the ancestral CAM was simple
repeats of something like A C T C C A A, β2-microglobulin-like domains
too evolving from parts of it. Three consecutive copies of such
a heptamer, 21 bases in total length, would have given the heptapeptidic
periodicity to the original peptide chain as shown below:
Two base substitutions affecting the above noted periodicity unit would
have produced three consecutive Thr-X dipeptides as shown below; the two
substituted bases are underlined:
At the top of Figure 2, the A C T C portion of this hypothetical heptameric
unit and its single base substituted deviants are solidly underlined.
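The two illustrations referred to in this passage are not reproduced in the present text. As a minimal sketch, the heptamer can be translated with the standard genetic code to see the heptapeptidic periodicity. The helper names (`translate`, `substitute`) are illustrative. The particular pair of substitutions used here (A to C at the 8th and 14th bases) is an assumption: it is one choice that yields three consecutive Thr-X dipeptides, not necessarily the pair shown in the original figure.
```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}

def translate(dna):
    """Translate a DNA string in frame 1, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return "-".join(THREE_LETTER[CODON_TABLE[c]] for c in codons)

def substitute(dna, changes):
    """Return a copy of dna with {zero_based_position: new_base} applied."""
    seq = list(dna)
    for position, base in changes.items():
        seq[position] = base
    return "".join(seq)

HEPTAMER = "ACTCCAA"
original = HEPTAMER * 3  # 21 bases, i.e. exactly 7 codons

# Assumed substitutions: A -> C at the 8th and 14th bases (indices 7 and 13).
mutated = substitute(original, {7: "C", 13: "C"})

print(translate(original))  # Thr-Pro-Asn-Ser-Lys-Leu-Gln
print(translate(mutated))   # Thr-Pro-Thr-Ser-Thr-Leu-Gln
```
Because 7 is not a multiple of 3, the reading frame returns to its starting phase only after 21 bases, so the peptide repeats every seven residues.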
Figure 2. The indication of propinquity of descent between
the chicken N-CAM at the top (Hemperly et al., 1986), csA CAM of
the slime mold (Dictyostelium discoideum) in the middle (Noegel
et al., 1986) and the mouse transcript of primordial T A T C, T
G T C repeats (Ohno and Epplen, 1983). At the top, internal homology
within N-CAM between β2-microglobulin-like domains and the apparently
more ancient intracellular domain is indicated. Within each β2-microglobulin-like
domain, the three most invariant residues are a pair of cysteines for
the intradomain disulfide bridge (12th and 62nd positions) and TRP
at 24th position. As exemplified in the 3rd β2-microglobulin-like domain
of the chicken N-CAM, THR-X, THR-X dipeptidic repeats invariably
occur in the vicinities of these three most invariant residues. THR-X, THR-X
dipeptidic repeats are an even more prominent feature of the intracellular
domain. The principal tetramer A C T C and its single base substituted
deviants are solidly underlined. One T A T C primordial tetramer
is identified by a shaded bar. Although not identified, both T
G T G and T G A C tetramers recur twice each in six short coding
segments of the chicken N-CAM shown at the top. Both T G T G and
T G A C are single base substituted deviants of T G T C. In the middle,
four coding segments of the slime mold csA CAM which are essentially
encoding THR-X, THR-X dipeptidic repeats are shown. A C T C and its
single base substituted deviants are again identified by solid bars.
The 30-base-long tandem repeats are noteworthy. The 441st to 450th codons
differ from the 451st to 460th codons by a single base. At the bottom,
a portion of the mouse primordial transcript, which is mixed repeats
of T A T C and its single base substituted deviant T G T C, is shown
as the ultimate ancestor of CAM coding sequences.
Is there any validity to the above noted proposal as to the ultimate
origin of CAM coding sequences? The first CAM must have
come into being when the first multicellular eukaryote evolved
from unicellular eukaryotes. Slime molds of the genus Dictyostelium
indeed occupy a unique position of being an intermediate
between unicellular and multicellular eukaryotes, for these organisms
in nutrient-rich environments live as unicellular
amoeboid creatures. When surroundings become unfavorable, however,
they begin to aggregate with each other to form the stalk and fruiting
body, much in the manner of fungal species that include various
mushrooms. This aggregation is induced by cyclic AMP and mediated
through csA CAM, and the 494-residue-long amino acid sequence of
Dictyostelium discoideum csA CAM has recently been deduced from
the cDNA base sequence (Noegel et al., 1986). Indeed, it appears as
though this primordial CAM has evolved from Thr-X, Thr-X dipeptidic
repeats as shown in the middle of Figure 2. Particularly noteworthy
is the coding segment encoding 431st to 460th residues, for it is
made of three consecutive copies of the 30-base-long unit. It would
be noted that 2nd and 3rd copies differ from each other only by
a single base substitution, while 11 base substitutions separate
1st from 2nd. The already noted tetrameric unit A C T C and its
single base substituted deviants are again very prominent in the csA
CAM coding sequence. However, it appears that this is a derived
oligomeric unit and not the original repeating unit. The A T/G C
ratio of the A C T C tetramer is 50/50. But the csA coding sequence is
quite unusual in that 62.6% of the sequence is A and T. The original
repeating unit of the ultimate ancestor of CAM coding sequences
had to contain considerably more A and T than G and C. Thus, we
come to the tetrameric repeat coding sequence of the mouse which
we previously reported as one of the few ultimate ancestors of all
coding sequences (Ohno and Epplen, 1983). This primordial coding
sequence is mixed repeats of two tetrameric units: T A T C and its
single base substituted deviant T G T C, as shown at the bottom of
Figure 2. The even representation of the two tetramers gives the primordial
coding sequence an A T/G C ratio of 62.5/37.5. It would be recalled
that this is almost exactly the ratio found in csA CAM of the slime mold.
Indeed, overall, T A T C, T G T C and their single base substituted
deviants are as prominent as A C T C and its single base substituted
deviants in csA CAM as well as N-CAM coding sequences. However,
the latter gains prominence in segments dominated by Thr-X, Thr-X
dipeptidic repeats preferentially shown at the top and middle of
Figure 2. It would also be noted that each of the pair of invariant CYS
of each β2-microglobulin-like domain of N-CAM is invariably encoded
by a part of the T G T G tetramer, which is a single base substituted
deviant of T G T C, as shown at the top of Figure 2. This applies
to invariant Cys in components of the adaptive immune system as well.
Thus, we have traced the ultimate ancestor of various CAM genes
engaged in cell-cell recognition of early embryonic organogenesis,
as well as genes for various components of the adaptive immune system,
to mixed repeats of two base tetramers, T A T C and T G T C.
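The base-composition arithmetic in this paragraph can be verified with a short sketch. The function names `at_percent` and `substitutions` are illustrative and not from the original. The mixed repeat is modelled simply as alternating copies of the two tetramers, which is an assumption about what "even representation" means.
```python
def at_percent(seq):
    """Percentage of A and T among the bases of seq (spaces ignored)."""
    seq = seq.replace(" ", "").upper()
    return 100.0 * sum(base in "AT" for base in seq) / len(seq)

def substitutions(a, b):
    """Number of positions at which two equal-length oligomers differ."""
    if len(a) != len(b):
        raise ValueError("oligomers must have the same length")
    return sum(x != y for x, y in zip(a, b))

# The derived tetramer is evenly balanced between A+T and G+C.
print(at_percent("ACTC"))                   # 50.0

# Even representation of the two primordial tetramers (assumed alternation).
primordial = ("TATC" + "TGTC") * 10
print(at_percent(primordial))               # 62.5

# Single base substituted deviants named in the text and in Figure 2.
print(substitutions("TATC", "TGTC"))        # 1
print(substitutions("TGTC", "TGTG"))        # 1
print(substitutions("TGTC", "TGAC"))        # 1
```
Of the eight bases in one T A T C plus one T G T C, five are A or T, which gives the 62.5/37.5 ratio. This is very close to the 62.6% A and T reported for the csA coding sequence.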
THE PRINCIPLE OF RECURRING UNITS IN CONSTRUCTION OF CODING SEQUENCES,
LANGUAGES AND MUSICAL COMPOSITIONS
In our galaxy and others, stars have been formed and are still
being formed by gravitational condensation of molecular clouds that
contain large quantities of molecular hydrogen, water, ammonia,
carbon monoxide, methyl alcohol, hydrocyanic acid and others. When
the earth was formed some 4.5 billion years ago, the primeval atmosphere
surrounding it must have also contained the chemically reducing
compounds noted above (Hoyle, 1979; Dyson, 1985). In the classical
experiment of Miller in 1953, electric sparks passed through a mixture
of methane, ammonia, molecular hydrogen and water yielded large
fractions of amino acids, notable being alanine with a 2% yield. Oro
in 1960, on the other hand, prepared a concentrated solution of
ammonium cyanide in water. After a period, he found spontaneous conversion
of ammonium cyanide to adenine with 0.5% yield (Miller and Orgel,
1974). Thus, it might be said that the yielding of various building
materials of life was and is inherent in the composition of molecular
clouds. What is life but a form that reproduces near exact replicas
of itself? Thus, we owe our lives to the inherent complementarity
that exists between the two purine-pyrimidine pairs of bases. Adenine
pairs with uracil or thymine, while guanine forms hydrogen bonds
with cytosine. Accordingly, when the two complementary strands of double-stranded
nucleic acids fall asunder, each can form its complementary strand.
In this way, nucleic acids are inherently designed to perpetuate
their base sequences. Inasmuch as the copying of the template, that
is to say the building of a new single-stranded RNA complementary to
the preexisting single-stranded RNA, is based upon the above noted
inherent complementarity between A and U as well as G and C, this
could have taken place in the prebiotic world, for if provided with
a template 60 to 100 bases long, ATP, GTP, UTP and
CTP would align themselves in the proper 3'-5' linkage to form
a complementary strand in the presence of Zn++ metal ion alone (Bridson
and Orgel, 1980). The major obstacle in the prebiotic world against
spontaneous generation of the first cell on this earth, thus, was
the formation of long enough templates directly from ATP, GTP,
UTP and CTP, for even in the presence of imidazole and Zn++, autopolymerization
of nucleotide triphosphates yields only base hexamers to decamers.
It follows then that unless these base oligomers were endowed with
the inherent property for self elongation, long enough templates
would not have come into being to start life on this earth. What
if a given base octamer was repeats of a base tetramer such as
the T A T C already noted? This octamer and its complementary strand formed
after the first round of copying may have reannealed unequally, the first
copy to the second copy, after falling asunder, as illustrated below:
T A T C T A T C
        A T A G A T A G
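The unequal reannealing of a repeated oligomer and its copy can be sketched in a few lines. This is a simplified model under stated assumptions, not a description of the actual chemistry: the copy is slipped by exactly one repeat unit, both single-stranded overhangs are then filled in, and strands are written position by position as in the illustration (strand polarity is ignored). The names `complement` and `slip_and_fill` are illustrative.
```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Position-by-position complement, matching the aligned depiction."""
    return "".join(PAIR[base] for base in strand)

def slip_and_fill(template, unit_len):
    """Pair a template with its copy slipped by one unit, then fill both overhangs.

    Returns the elongated (template, copy) pair. Raises ValueError when the
    slipped strands cannot pair, i.e. when the template is not a repeat of
    period unit_len.
    """
    copy = complement(template)
    n = len(template)
    # After slipping, template[unit_len:] faces copy[:n - unit_len].
    if complement(template[unit_len:]) != copy[:n - unit_len]:
        raise ValueError("slipped strands are not complementary")
    # Fill in: each strand is extended opposite the other's overhang.
    new_template = template + complement(copy[n - unit_len:])
    new_copy = complement(template[:unit_len]) + copy
    return new_template, new_copy

octamer = "TATCTATC"
dodecamer, dodecamer_copy = slip_and_fill(octamer, 4)
print(dodecamer)       # TATCTATCTATC
print(dodecamer_copy)  # ATAGATAGATAG
```
A sequence without the tetrameric periodicity, for example `"TATCGGGG"`, raises `ValueError` in this model, because a copy slipped by four bases no longer pairs with it. In the sketch, self elongation is possible only for repeats.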
The hydrogen bonded paired portion would have served as a primer
for the next round of copying (replication), and after this round,
the octameric template would have elongated itself to the dodecameric
template. Indeed, self elongation is inherent in repeats of base
oligomers (prebiotic nucleic acids were RNA rather than DNA, thus,
the two T's of T A T C should have been substituted by U's, but for the
sake of continuity, U A U C is shown as T A T C). This, then, is one
of the many reasons for believing that the first set of coding sequences
that emerged at the very beginning of life on this earth were all repeats
of base oligomers (Ohno and Epplen, 1983). Indeed, we have already
seen that mixed repeats of T A T C and its single base substituted
deviant T G T C appear to have served as the ultimate ancestor of
one superfamily of genes; first various CAM's for general cell-cell
recognition during the initial stage of organogenesis of all multicellular
eukaryotes and, through them, various components of the adaptive
immune system unique to vertebrates. It would be noted that such
tetrameric repeats resemble Julius Caesar's remark already cited
in construction. If vi'di in the middle is considered as T A T C,
then ve'ni preceding it becomes its two base substituted copy such
as T G A C, while vi'ci following it becomes its single base substituted
copy such as T G C C. Such tetrameric repeats also resemble musical
compositions of the Baroque period. As an example, the treble clef
musical score of Prelude No. 1 for well-tempered clavichord by Johann
Sebastian Bach (1685-1750) is shown in Figure 3. It would be noted
that the initial part of this treble clef score in C major (the
top 2 and 2/3rd lines of Figure 3) is essentially four note repeats,
the second half of each 8/8th time signature segment being the exact
copy of the first half. Each half of the time signature segment
is comprised of two sets of the identical four notes, the 4th note of
the 1st set overlapping with the 1st note of the 2nd set. From the last
one-third of the 3rd line of Figure 3 and downward, the theme now
changes to three note repeats. This is because the 1st note of each
previous four note unit is now relegated to the bass clef score.
Such striking resemblance between Baroque musical compositions and
primordial coding sequences that are repeats of base oligomers tempted
us to devise one invariant rule by which treble clef scores of musical
compositions and coding base sequences become interchangeable. After
considering their respective molecular weights and complementarity,
we have decided to assign a space and a line above it of the treble
clef staff to each of the four bases in the ascending order of A
G T C; C on the line of the previous scale occupying the classical
middle C position (Ohno and Ohno, 1986). This assignment of bases
to the treble clef staff afforded a needed freedom in transmutating
coding base sequences to treble clef musical scores. This freedom
is analogous to that accorded to coding sequences by the redundancy
of
Figure 3. The treble clef score of an initial portion of
J.S. Bach's Prelude No. 1 from well-tempered clavichord is shown accompanied
by a base sequence transcribed from it according to the previously
devised invariant rule (Ohno and Ohno, 1986). The initial tetrameric repeat
portion should have encoded a polypeptide chain of tetrapeptidic periodicities,
except for an unfortunate concentration of chain terminators T A A's
and T A G's at the extreme right of 2nd line. Subsequently, the treble
clef score becomes trimeric repeats monotonously encoding homoserines
occasionally interspersed by stretches of homoisoleucines and homoarginines.
Figure 4, Part III
base tetramer A G C A; last A of the first unit overlapping with
1st A of 2nd unit, and the same with regard to 3rd and 4th units.
This (A G C A) X 4 recurs as the 4th segment. The 2nd segment, on the
other hand, appears as four repeats of A T C A, A T C A being a
single base substituted deviant of the previous A G C A. However,
it would be noted that all four notes of the first segment unit
changed a step each in the second segment unit; in musical notation
from e g b d to d h c f. In our devised rule, however, only two
of the four possible single step changes can be detected as base
substitutions: from a position on the line to a space above, as well
as from a position in the space to a line below. The two other
single step changes, from a position on the line to a space below
as well as from a position in the space to a line above, are perceived
as synonymous. In compliance with this rule, a portion of the primordial
T A T C, T G T C repeats corresponding to the 91st to 144th codons in
its longest open reading frame (Ohno and Epplen, 1983) has been transmutated
to the musical score in A minor and 8/8th time signature, as shown
in Part I of Figure 4. This is to be regarded as a prelude, for as
Part II of Figure 4, the transmutation in C major and again in 8/8th
time signature of the 431st to 464th codons of the slime mold csA CAM
coding sequence is shown. As shown in the middle of Figure 2, this
portion of the csA CAM coding sequence is comprised of three copies
of the 30-base-long unit. The unit itself, however, apparently arose
as repeats of shorter oligomers. One such tetramer, A C T C, and
its single base substituted deviants are identified by solid bars.
This evolutionary trilogy ends in Part III of Figure 4, which celebrates
the birth of original β2-microglobulin-like domains in N-CAM-like
cell adhesion proteins. The initial one-third of the coding sequence
for the 3rd β2-microglobulin-like domain of the chicken N-CAM (Hemperly
et al., 1986) has been transmutated to the treble clef musical score
of Part III. Accordingly, Part III contains the first CYS for the
invariably present intradomain disulfide bridge (in the middle of
2nd line of Figure 4, Part III) as well as the equally invariant
TRP seen at the extreme right of 3rd line of Figure 4, Part III.
Both CYS and TRP noted above are parts of THR-X, THR-X dipeptidic
units. Even though the coding segment depicted in Part III is comprised
of only 108 bases, there are still base oligomers recurring within.
Tandem repeats of the pentamer G A T C A are seen at the extreme
right and extreme left of 1st line, and the base octamer C
T T C C A T C encoding 210th to 213th PRO-SER-ILE (in the middle of
3rd line of Part III) is a single base substituted deviant of C T T C C A C
C encoding 217th to 219th THR-SER-THR seen straddling 3rd and 4th
lines of Part III. These recurring base oligomers still provide a
melodious quality to aged coding sequences a billion or more years
removed from their ultimately ancestral oligomeric repeats.
SUMMARY
Common denominators in all our cognitive processes are recurring
elements. For example, the first step in deciphering ancient writings
left on excavated tablets of a long lost civilization would be to
identify the most frequently recurring set of symbols, for such
a set likely represents the main subject with which those writings
were concerned, be it a king of a particular dynasty or a taxable
unit of land. Similarly, our vision perceives a pattern as a pattern
only if it is repeated, and a melody becomes a melody only
when it is repeated. The same applies to all components of the adaptive
immune system. Altogether they form a cognitive pattern because
they were all derived from the ancestral β2-microglobulin-like unit
which probably arose in cell adhesion molecules (CAM), those plasma
membrane proteins which, through homologous recognition, contributed and
are still contributing to the initial stage in organogenesis of
all multicellular eukaryotes. This reliance of our biological system
on repetitions appears to have started at the very beginning
of life, for the first coding sequences were likely to have been repeats of base
oligomers. I have composed a musical trilogy to celebrate the birth
of original β2-microglobulin-like domains in CAM-like molecules.
Part I represents the ultimately ancestral mixed repeats of T A
T C and its single base substituted deviant T G T C. Part II depicts
a portion of the csA CAM coding sequence of the slime mold (a
link between unicellular and multicellular eukaryotes) which encodes
THR-X, THR-X dipeptidic repeats. Finally, Part III represents the 3rd
β2-microglobulin-like domain of the chicken N-CAM. On one hand,
this symbolizes the immediate ancestor of all the components of
the adaptive immune system. On the other hand, it is linked to the
past through recurring THR-X, THR-X dipeptidic repeats.
REFERENCES
1. Bridson, P.K. and Orgel, L.E. (1980) Catalysis of accurate
poly(C)-directed synthesis of 3'-5'-linked oligoguanylates by Zn2+.
J. Mol. Biol. 144:567-577.
2. Dyson, F. (1985) Origins of life. Cambridge Univ. Press, Cambridge,
London.
3. Hemperly, J.J., Murray, B.A., Edelman, G.M. and Cunningham,
B.A. (1986) Sequence of a cDNA clone encoding the polysialic acid-rich
and cytoplasmic domains of the neural cell adhesion molecule N-CAM.
Proc. Natl. Acad. Sci. USA 83:3037-3041.
4. Hoyle, F. (1979) Ten faces of the universe. Freeman Press, London.
5. Miller, S.L. and Orgel, L.E. (1974) The origin of life on the
earth. Prentice-Hall, New York.
6. Noegel, A., Gerisch, G., Stadler, J. and Westphal, M. (1986)
Complete sequence and transcript regulation of a cell adhesion protein
from aggregating Dictyostelium cells. EMBO J. 5:1473-1476.
7. Ohno, S. and Epplen, J. (1983) The primitive code and repeats
of base oligomers as the primordial protein-encoding sequence.
Proc. Natl. Acad. Sci. USA 80:3391-3395.
8. Ohno, S. and Ohno, M. (1986) The all pervasive principle of repetitious
recurrence governs not only coding sequence construction but also
human endeavor in musical composition. Immunogenetics 24:71-78.