Beckman Research Institute of the City of Hope, Duarte, CA 91010,
USA.
Introduction
Ever since the X-ray crystallographic analysis of
a class I major histocompatibility complex (MHC) antigen revealed
the presence of an alien peptide fragment sandwiched between its
two parallel α-helices [1], the immunological self has become a multitude
of such peptide fragments, usually 15-20 residues long, derived
from host proteins after intracellular processing. For the mainly
intrathymic education of self to cytotoxic T cells, these fragments
are presented in association with class I MHC antigens, while for
the education of helper T cells, they are presented with class II
MHC antigens. For those who believe that proteins represent random
assemblages of 20 amino acid residues, the above manner of presentation
of self poses no problem, for peptide fragments 15-20 residues long
represent an astronomical variety of 20^15 to 20^20. With this much variety,
homologous peptide fragments are to be found only among proteins
related by the propinquity of their descents. Thus, viral and other
pathogenic peptide fragments would be distinct from most of the
host peptide fragments. The purpose of this paper is to show that
the above is far from the truth. Many peptide fragments are syntactical
in construction, and are therefore to be found in many totally unrelated
proteins. The average amino acid composition deduced from 18383
entries in Database is as follows: (1) The top four residues, Leu,
Ala, Gly, and Ser, in this order, comprise 32% of the total, and
(2) the bottom four residues, His, Met, Cys, and Trp, in this order,
comprise only 7% of the total. All 20 homodipeptides occurred at
above their expected rates; thus, homodipeptides in the average
protein accounted for 14% of its length. While the Leu-Leu homodipeptide
was the most numerous of the 400 dipeptides, the second in rank
was Leu-Val, occurring at nearly twice the expected rate, while
its reciprocal Val-Leu was only one-third as numerous [2]. The above
can be viewed as a rudimentary indication of syntactic structures
in amino acid sequences. In order to expand on this theme, I have
chosen four totally unrelated proteins as representatives of the
warm-blooded vertebrate host. They are: (1) human ET.REC (estrogen
receptor), 595 residues long [3]; (2) chicken C-SRC (tyrosine kinase),
533 residues long [4]; (3) human S.ALB (serum albumin), 585 residues
long [5]; and (4) human PGK (phosphoglycerate kinase), 415 residues
long [6].
Lys-Leu- and Leu-Lys-Containing Oligopeptides in Four Host Proteins
We shall now start our inquiry by choosing a pair
of reciprocal dipeptides, Lys-Leu and Leu-Lys. According to the aforementioned
extensive survey of 18383 entries in Database, Lys-Leu occurred
at about the expected rate, while the incidence of its reciprocal
Leu-Lys was slightly less [2]. In the case of four host proteins
totalling 2128 residues, there were only 12 Lys-Leu and 13 Leu-Lys.
Yet, three of the 12 Lys-Leu dipeptides appeared as Val-Lys-Leu
and two of them as Ser-Lys-Leu tripeptides. These are indisputable
cases of preferential associations, for the most abundant tripeptide
ending in Lys-Leu should have been the palindromic Leu-Lys-Leu which,
on a random basis, had the expected incidence of 1.08. The fact
is that there was not a single Leu-Lys-Leu tripeptide among the
four proteins. As to its carboxyl end partners, the Lys-Leu dipeptide
showed a distinct preference for Val and next for Gly, for there
were four Lys-Leu-Val and two Lys-Leu-Gly. Accordingly, it was
no surprise that two totally unrelated proteins, C-SRC and S.ALB,
shared a pair of homologous tetrapeptides, Lys-Leu-Val-Gln and Lys-Leu-Val-Asn,
as shown in Fig. 1 a. As to the 13 Leu-Lys found in four host proteins,
this dipeptide showed a definite preference to associate with Phe
as its amino terminal partner (four Phe-Leu-Lys in C-SRC, S.ALB,
and PGK) and a preference for Ser as its carboxyl terminal partner
(three Leu-Lys-Ser in ET.REC and PGK). Accordingly, a pair of homologous
pentapeptides containing Leu-Lys was shared between ET.REC and PGK
and a pair of identical tetrapeptides, Thr-Phe-Leu-Lys, between
S.ALB and PGK. As to two pairs of homologous tetrapeptides containing
Leu-Lys or Ile-Lys, the first was shared by S.ALB and PGK and the
second by ET.REC and C-SRC, as also shown in Fig. 1 a.
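The kind of tally described above, counting which residues sit immediately before and after each occurrence of a dipeptide, can be sketched as follows. This is a minimal illustration and not the procedure actually used for the survey. The function names and the toy sequence are hypothetical, and residues are written in the standard one-letter code (K = Lys, L = Leu, V = Val, S = Ser, F = Phe). The expected incidence is computed here as the number of dipeptides multiplied by the frequency of the flanking residue. A Leu frequency of 9% is an assumption chosen because, with 12 Lys-Leu, it gives 1.08. The text does not state how that figure was obtained.

```python
from collections import Counter

# Hypothetical toy sequence, NOT taken from any of the four host proteins.
TOY_SEQUENCE = "MVKLVQATFLKSGSKLGDTFLKN"


def flank_counts(seq, dipeptide):
    """Count the residues found immediately before and immediately after
    every (possibly overlapping) occurrence of `dipeptide` in `seq`."""
    before, after = Counter(), Counter()
    start = seq.find(dipeptide)
    while start != -1:
        if start > 0:
            before[seq[start - 1]] += 1
        end = start + len(dipeptide)
        if end < len(seq):
            after[seq[end]] += 1
        start = seq.find(dipeptide, start + 1)
    return before, after


def expected_tripeptide_count(n_dipeptides, residue_frequency):
    """Expected number of tripeptides on a random basis: each dipeptide is
    flanked by a given residue with probability `residue_frequency`."""
    return n_dipeptides * residue_frequency


if __name__ == "__main__":
    before, after = flank_counts(TOY_SEQUENCE, "KL")
    print("amino-end partners of Lys-Leu:", dict(before))
    print("carboxyl-end partners of Lys-Leu:", dict(after))
    # Assumed Leu frequency of 0.09 (not stated in the text).
    print("expected Leu-Lys-Leu:", round(expected_tripeptide_count(12, 0.09), 2))
```

Applied to a real sequence, a partner that appears far more often than its expected count would suggest is a candidate preferential association of the kind noted for Val-Lys-Leu and Leu-Lys-Ser.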
Lys-Leu- and Leu-Lys-Containing Oligopeptides in Two Influenza
A Virus Hemagglutinins
As it has now become clear that totally unrelated host proteins
commonly share homologous and identical penta- and tetrapeptides
between them, comparison between vertebrate host proteins and viral
proteins becomes quite interesting. For this comparison, I have
chosen two hemagglutinins of influenza A virus: INF.HEM I and INF.HEM
II [7]. Together, these two hemagglutinins comprise only 550 residues,
and so, there were only three each of Lys-Leu and Leu-Lys. Nevertheless,
it should be noted that within these two hemagglutinins, they were
parts of two pairs of homologous tetrapeptides, as shown in Fig.
1 b. It should also be noted that two of the three Leu-Lys appeared
as Leu-Lys-Ser in INF.HEM II. Thus, the preference of Leu-Lys for
Ser as its carboxyl end partner is truly catholic. The above aroused
interest in the longstanding question of self versus nonself. Confining
ourselves only to Lys-Leu- and Leu-Lys-containing oligopeptides,
how long a fragment of influenza virus hemagglutinins was homologous
with that contained in one or the other of the four vertebrate host
proteins?
Fig. 1. a Lys-Leu- and Leu-Lys-containing oligopeptides in four
host proteins. On the left are the number of Lys-Leu dipeptides,
two pairs of Lys-Leu-containing homologous tetrapeptides, and a
pair of Lys-Val-containing identical tetrapeptides found in four
unrelated proteins of the vertebrate host. They are underlined by
open bars; thick bars are for identical tetrapeptides and thinner
bars for homologous ones. As to the identity of protein sources
of these oligopeptides, see the text. Below these three pairs of
homologous and identical tetrapeptides, eight Lys-Leu-containing
tripeptides that were found more than once are identified and the
source of each is also indicated, if not already shown. Identical residues
are shown in all capital letters, while the third letters of homologous
residues are shown in small capitals. On the right, the same with
regard to Leu-Lys dipeptides and Leu-Lys-containing oligopeptides
are shown. They are underlined by solid bars. b Lys-Leu to the left
and Leu-Lys to the right of homologous tetrapeptides found within
INF.HEM I and II. c Three Lys-Leu- and one Leu-Lys-containing oligopeptide
of host proteins that were homologous and identical with those of
INF.HEM II
Lys-Leu- and Leu-Lys-Containing Oligopeptides in Host Versus Virus
Although there were only three Lys-Leu in two hemagglutinins of influenza
A virus, compared to 11 Lys-Leu among the four host proteins, these
three Lys-Leu of the virus can also be considered as homologous
to six Lys-Val and six Lys-Ile of the host. As shown in Fig. 1
c, the decapeptide ending in Lys-Val of host PGK occupying the 397th-406th
positions was seven-tenths homologous with the decapeptide ending
in Lys-Leu of INF.HEM II occupying the 42nd-51st positions. In view
of the fact that the total number of proteins possessed by the vertebrate
host is of the order of 10^4, it would be no surprise if the decapeptide
identical to the above of INF.HEM II were found in at least one
unknown host protein. If such is the case, this viral decapeptide
is an indisputable self. On the other hand, if the homology of seven-tenths
or thereabouts is the maximal obtainable between this viral peptide
fragment and a multitude of host peptide fragments, can it be universally
recognized as a nonself? Most instructive concerning this question
is the finding reported on human cytotoxic T cell responses to the
nuclear matrix protein of influenza A virus [8]. It has been shown
that only internal viral proteins, such as the matrix and nucleoproteins
of influenza A virus, can invoke a cytotoxic T cell response in
infected human and mouse hosts. As far as the matrix protein was
concerned, however, it proved incapable of eliciting cytotoxic T
cell responses from those human individuals whose class I MHC haplotypes
contained HLA-C7 [8]. For those individuals, all peptide fragments
of the influenza matrix protein must have appeared as self. Although
cytotoxic T cells of HLA-A2 individuals infected with influenza
A virus readily responded to the matrix proteins, the test of various
peptide fragments revealed that even HLA-A2 cytotoxic T cells recognized
only one 19-residue-long peptide fragment representing positions
55-73 of the matrix protein as nonself [8]. It is probable that
positions 42-51 of INF.HEM II shown in Fig. 1 c are the type of
peptide fragments that are recognized as nonself only by helper
T cells of particular class II MHC haplotypes, thus creating classical
responders and nonresponders among individuals. Figure 1 c also
shows that two Lys-Ile-containing octapeptides of the host (one
derived from ET.REC and the other from C-SRC) enjoyed seven-eighths
and six-eighths homology with two heptapeptides of INF.HEM II, if
Ile or Lys-Ile of each was deleted. As to Leu-Lys-containing oligopeptides,
I shall be content to show only the identical pentapeptide, Val-Glu-Leu-Lys-Ser,
shared by PGK of the host and INF.HEM II. Actually, positions 81-86
of PGK are entirely homologous with positions 175-180 of INF.HEM II. In
addition, this PGK hexapeptide was also five-sixths homologous with
positions 35-40 of INF.HEM II.
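Fractions such as seven-tenths or five-sixths homology can be reproduced by scoring two peptides of equal length position by position. The sketch below is one possible way to do so and is not taken from the original analysis. The grouping of residues regarded as homologous (`SIMILAR_GROUPS`) is an assumption made for illustration. The text itself only indicates that Leu, Val, and Ile are treated as interchangeable after Lys. The function names and the target sequence in the usage lines are likewise hypothetical. Only the peptides Lys-Leu-Val-Gln (KLVQ), Lys-Leu-Val-Asn (KLVN), and Val-Glu-Leu-Lys-Ser (VELKS) come from the text.

```python
# Assumed groups of residues counted as homologous (illustrative only).
SIMILAR_GROUPS = [set("LIV"), set("KR"), set("DE"), set("NQ"), set("ST"), set("FY")]


def residues_match(a, b):
    """True if two residues are identical or fall in the same assumed group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def homology_fraction(peptide_a, peptide_b):
    """Return (matching positions, length) for two peptides of equal length."""
    if len(peptide_a) != len(peptide_b):
        raise ValueError("peptides must have the same length")
    matches = sum(residues_match(a, b) for a, b in zip(peptide_a, peptide_b))
    return matches, len(peptide_a)


def best_window(query, target):
    """Slide `query` along `target` without gaps and return
    (matches, offset) for the first best-scoring window.
    Returns (-1, -1) if `target` is shorter than `query`."""
    best = (-1, -1)
    for offset in range(len(target) - len(query) + 1):
        matches, _ = homology_fraction(query, target[offset:offset + len(query)])
        if matches > best[0]:
            best = (matches, offset)
    return best


if __name__ == "__main__":
    # Tetrapeptides named in the text; homologous at 4 of 4 positions
    # under the assumed grouping (Gln and Asn share a group).
    print(homology_fraction("KLVQ", "KLVN"))
    # Hypothetical target sequence, used only to exercise the window scan.
    print(best_window("VELKS", "GGAVDLRSTT"))
```

This scan does not allow deletions, so comparisons such as the octapeptide-versus-heptapeptide cases of Fig. 1 c, where Ile or Lys-Ile is removed first, would need that step done by hand or a gapped alignment.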
All Proteins as Divergent Essays Written in One Language
During the past several years, we have advanced the notion that
all coding sequences in this world are scriptures written in one
and the same DNA language [9]. Here, it was shown that the same
applies to amino acid sequences of proteins as well. As long as
they are written in the same language, two essays on entirely different
subjects may have surprisingly many identical and similar components.
Witness the following: "The term high ceiling has been used to denote
a group of diuretics that have a distinctive action on renal tubular
function." "The term high ceiling has been used to denote a group
of stocks that show a distinctive pattern of price fluctuations."
The first was derived from an essay on diuretic drugs, while the
second was from one on stocks and stock markets, yet 15 of the 22
words are identical. Is it a surprise, then, if totally unrelated
proteins derived from vertebrates and from a virus share a multitude
of identical and homologous oligopeptides?
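The word count quoted for the two sentences can be checked with a few lines of code. The sketch below compares the sentences word by word at the same position, which is one plausible reading of "identical" here. The function names are hypothetical, and punctuation and letter case are ignored.

```python
import string

FIRST = ("The term high ceiling has been used to denote a group of diuretics "
         "that have a distinctive action on renal tubular function.")
SECOND = ("The term high ceiling has been used to denote a group of stocks "
          "that show a distinctive pattern of price fluctuations.")


def words(sentence):
    """Split a sentence into lowercase words with punctuation stripped."""
    return [w.strip(string.punctuation).lower() for w in sentence.split()]


def positional_matches(a, b):
    """Number of positions at which the two sentences carry the same word."""
    return sum(x == y for x, y in zip(words(a), words(b)))


if __name__ == "__main__":
    print(len(words(FIRST)), "words in the first sentence")
    print(positional_matches(FIRST, SECOND), "words identical by position")
```

Under this reading the first sentence has 22 words and 15 of them recur at the same position in the second, in agreement with the figures given above.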
References
1. Bjorkman PJ, Saper MA, Samraoui B, Bennett WS, Strominger JL,
Wiley DC (1987) The foreign antigen binding site and T cell recognition
regions of class I histocompatibility antigens. Nature 329:512-518
2. Seto Y (1989) Formation of proteins on the primitive earth, Evidence
for the oligoglycine hypothesis. Viva Origino 17:153-163
3. Greene GL, Gilna P, Waterfield M, Baker A, Hort Y, Shine J (1986)
Sequence and expression of human estrogen receptor complementary
DNA. Science 231: 1150-1154
4. Takeya T, Hanafusa H (1983) Structure and sequence of the cellular
gene homologous to the RSV src gene and the mechanism for generating
the transforming virus. Cell 32:881-890
5. Minghetti PP, Ruffner DE, Kuang WJ, Dennison OE, Hawkins JW,
Beattie WG, Dugaiczyk A (1986) Molecular structure of the human
albumin gene is revealed by nucleotide sequence within q11-22 of
chromosome 4. J Biol Chem 261:6747-6757
6. Michelson AM, Markham AF, Orkin SH (1983) Isolation and DNA sequence
of a full-length cDNA clone for human X chromosome-encoded phosphoglycerate
kinase. Proc Natl Acad Sci USA 80:472-476
7. Verhoeyen M, Fang R, Jou WM, Devos R, Huylebroeck D, Saman E,
Fiers W (1980) Antigenic drift between the haemagglutinin of the
Hong Kong influenza strains A/Aichi/2/68 and A/Victoria/3/75. Nature
286:771-776
8. Gotch F, Rothbard J, Howland K, Townsend A, McMichael A (1987)
Cytotoxic T lymphocytes recognize a fragment of influenza virus
matrix protein in association with HLA-A2. Nature 326:881-882
9. Ohno S (1990) Grammatical analysis of DNA sequences provides
a rationale for the regulatory control of an entire chromosome.
Genet Res (Camb) 56:115-120