Immunological Self-Nonself Discrimination and Numerous Peptide Fragments Shared by Unrelated Proteins
S. Ohno     Hämatol. Bluttransf. Vol 35

Beckman Research Institute of the City of Hope, Duarte, CA 91010, USA.

Introduction Ever since the X-ray crystallographic analysis of a class I major histocompatibility complex (MHC) antigen revealed the presence of an alien peptide fragment sandwiched between its two parallel IXhelices [1], the immunological self became a multitude of such peptide fragments, usually 15-20 residues long, derived from host proteins after intracellular processing. For the mainly intra thymic education of self to cytotoxic T cells, these fragments are presented in association with class I MHC antigens, while for the education of helper T cells, they are presented with class II MHC antigens. For those who believe that proteins represent random assemblages of 20 amino acid residues, the above manner of presentation of self poses no problem, for 15- 20 residues long peptide fragments represent an astronomical variety of 2015-202°. With this much variety, homologous peptide fragments are to be found only among proteins related by the propinq uity of their descents. Thus, viral and other pathogenic peptide fragments would be distinct from most of the host peptide fragments. The purpose of this paper is to show that the above is far from the truth. Many peptide fragments are syntactical in construction, and are therefore to be found in many totally unrelated proteins. The average amino acid composition deduced from 18383 entries in Database is as follows: (1) The top four residues, Leu, Ala, Gly, and Ser, in this order, comprise 32% of the total, and (2) the bottom four residues, His, Met, Cys, and Trp, in this order, comprise only 7% of the total. All 20 homodipeptides occurred at above their expected rates, thus, homodipeptides in the average protein acounted for 14% of its length. While the LeuLeu homodipeptide was the most numerous of the 400 dipeptides, the second in rank was Leu- Val, occurring at nearly twice the expected rate, while its reciprocal Val-Leu was only one-third as numerous [2]. The above can be viewed as a rudimentary indication of syntatic structures in amino acid sequences. In order to expand on this theme, I have chosen four totally unrelated proteins as representatives of the warm-blooded verterate host. They are: (1) human ET .REC (estrogen receptor), 595 residues long [3]; (2) chicken C-SRC (tyrosine kinase), 533 residue long [4]; (3) human S.ALB (serum albumin), 585 residue long [5]; and (4) human PGK (phospholglycerate kinase) 415 residue long [6]. Lys-Leu- and Leu-Lys-Containing Oligopeptides in Four Host Proteins We shall now start our inquiry by choosing a pair of Leciprocal dipeptides, LysLeu and Leu-Lys. According to the aforementioned extensive survey of 18383 entries in Database, Lys-Leu occurred at about the expected rate, while the incidence of its reciprocal Leu-Lys was slightly less [2]. In the case of four host proteins totalling 2128 residues, there were only 12 Lys-Leu and 13 Leu Lys. Yet, three of the 12 Lys-Leu dipeptides appeared as Val-Lys-Leu and two of them as Ser-Lys-Leu tripeptides. These are indisputable cases of preferential associations, for the most abundant tripeptide ending in Lys-Leu should have been the palindromic Leu-Lys-Leu which, on a random basis, had the expected incidence of 1.08. The fact is that there was not a single Leu-Lys-Leu tripeptide among the four proteins. As to its carboxyl end partners, the Lys-Leu dipeptide showed a distinct preference for Val and the next Gly, for there were four Lys-Leu- Val and two Lys-Leu-Gly. Accordingly, it was no surprise that two totally unrelated proteins, C-SRC and S.ALB, shared a pair of homologous tetrapeptides. Lys-Leu- ValGIn and Lys-Leu-Val-Asn, as shown in Fig. 1 a. As to the 13 Leu-Lys found in four host proteins, this dipeptide showed a definite preference to associate with Phe as its amino terminal partner (four PheLeu-Lys in C-SCR, S.ALB, and PGK) and a preference for Ser as its carboxyl terminal partner (three Leu-Lys-Ser in ET.REC and PGK). Accordingly, a pair of homologous pentapeptides containing Leu-Lys was shared between ET.REC and PGK and a pair of identical tetrapeptides, Thr-Phe-Leu-Lys, between S.ALB and PGK. As to two pairs of homologous tetrapeptides containing Leu-Lys or IleLys, the first was shared by S.ALB and PGK and the second by ET.REC and CSRC, as also shown in Fig. 1 a.

Lys-Leu- and Leu-Lys-Containing Oligopeptides in Two Influenza A Virus Hemagglutinins

As it has now become clear that totally unrelated host proteins commonly share homologous and identical penta- and tetrapeptides between them, comparison between vertebrate host proteins and viral proteins becomes quite interesting. For this comparison, I have chosen two hemagglutinins of influenza A virus: INF.HEM I and INF.HEM II [7]. Together, these two hemagglutinins comprise only 550 residues, and so, there were only three each of Lys-Leu and Leu-Lys. Nevertheless, it should be noted that within these two hemagglutinins, they were parts of two pairs of homologous tetrapeptides, as shown in Fig. 1 b. It would also be noted that two of the three Leu-Lys appeared as Leu-Lys-Ser in INF.HEM II. Thus, the preference of Leu-Lys for Ser as its carboxyl end partner is truly catholic. The above aroused interest on the longstanding question of self versus nonself. Confining ourselves only to Lys-Leu- and Leu-Lys-containing oligopeptides, how long a fragment of influenza virus hemagglutinins was homologous with that contained in one or the other of the four vertebrate host proteins?

Fig. 1. a Lys-Leu- and Leu-Lys-containing oligopeptides in four host proteins. On the left are the number of Lys-Leu dipeptides, two pairs of Lys-Lcu-containing homologous tetrapeptides, and a pair of Lys- Val-containing identical tetrapeptides found in four unrelated proteins of the vertebrate host. They are undcrlined by open bars; thick bars are for identical tetrapeptides and thinner bars for homologous ones. As to the identity of protein sources of thcsc oligopeptidcs, see the text. Bclow these three pairs of homologous and identical tetrapeptides, eight Lys-Leu-containing tripeptides that were found more than once arc identified and each's source is also indicated, if not alrcady shown. Identical residues are shown in all capitalletters, while the third Ietters of homologous residues are shown in small capitals. On the right, the same with regard to Leu-Lys dipeptides and Leu-Lys-containing oligopcptides are shown. They are underlined by solid bars. b Lys-Leu to the li!ft and Leu-Lys to the right of homologous tetrapeptides found within INF.HEM I and II. c Three Lys-Leu- and onc Leu-Lys-containing oligopeptide of host proteins that were homologous and identical with those of INF.HEM II

Lys-Leus-and Leu-Lys-Containing Oligopeptides in Host Versus Virus

Although there were onla three Lys-Leu in two hemagglutins of influenza a virus, compared to 11 Lys-Leu among the four host proteins, these three Lys-Leu of the virus can also be considered as homologous to six Lys- Val and six Lys-Ile of the host. As shown in Fig. 1 c, the decapeptide ending in Lys-Val of host PGK occupying the 397th-406th positions was seven-tenths homologous with the decapeptide ending in Lys-Leu of INF.HEM II occupying the 42nd-51st positions. In view of the fact that the total number of proteins possessed by the vertebrate host is of the order of 104, it would be no surprise if the decapeptide identical to the above of INF.HEM II were found in at least one unknown host protein. If such is the case, this viral decapeptide is an indisputable self. On the other hand, if the homology of seventenths or thereabouts is the maximal obtainable between this viral peptide fragment and a multitude of host peptide fragments, can it be universally recognized as a nonself? Most instructive concerning this question is the finding reported on human cytotoxic T cell responses to the nuclear matrix protein of influenza A virus [8]. It has been shown that only internal viral proteins, such as the matrix and nucleoproteins of influenza A virus, can invoke a cytotoxic T cell response in infected human and mouse hosts. As far as the matrix protein was concerned, however, it proved incapable of eliciting cytotoxic T cell responses from those human individuals whose class I MHC haplotypes contained HLA-C7 [8]. For those individuals, all peptide fragments of the influenza matrix protein must have appeared as self. Although cytotoxic T cells of HLA-A2 individuals infected with influenza A virus readily responded to the matrix proteins, the test of various peptide fragments revealed that even HLA-A2 cytotoxic T cells recognized only one 19-residue-long peptide fragment representing positions 55- 73 of the matrix protein as nonself [8]. It is probable that positions 42- 51 of INF.HEM II shown in Fig. 1 care the type of peptide fragments that are re cognized as nonself only by helper T cells of particular class 11 MHC haplotypes, thus creating classical responders and nonresponders among individuals. Figure 1 c also shows that two Lys-llecontaining octapeptides of the host ( one derived from ET.REC and the other from C-SRC) enjoyed seven-eighths and sixeighths homology with two heptapeptides of INF.HEM 11, if lIe or Lys-lle of each was deleted. As to Leu-Lys-containing oligopeptides, I shall be content to show only the identical pentapeptide, Val-Glu-Leu-LysSer, shared by PGK of the host and INF.HEM 11. Actually, positions 81-86 are entirely homologous with positions 175-180 of INF.HEM 11. In addition, this PGK hexapeptide was also fivesixths homologous with positions 35~40 of INF.HEM II.

ALL Proteins as Divergent Essays Written in One Language

During the past several years, we have advanced the notion that all coding sequences in this world are scriptures written in one and the same DNA language [9]. Here, it was shown that the same applies to amino acid sequences of proteins as well. As long as they are written in the same language, two essays on entirely different subjects may have surprisingly many identical and similar components. Witness the following: "The term high ceiling has been used to denote a group of diuretics that have a distinctive action on renal tubular function." "The term high ceiling has been used to denote a group of stocks that show a distinctive pattern of price fluctuations." The first was derived from an essay on diuretic drugs, while the second was from one on stocks and stock markets, yet 15 of the 22 words are identical. Is it a surprise, then, if totally unrelated proteins derived from vertebrates and from a virus share a multitude of identical and homologous oligopeptides?


1. Bjorkman PJ, Saper MA, Samraouri B, Bennett WS, Strominger JL, Wiley DC (1987) The foreign antigen binding site and T cell recognition regions of class I histocompatibility antigens. Nature 329.512-518
2. Seto Y (1989) Formation of proteins on the primitive earth, Evidence for the oligoglycine hypothesis. Viva Origino 17: 153 -163
3. Greene GL, Gilna P, Waterfield M, Baker A, Hort Y, Shine J (1986) Sequence and expression of human estrogen receptor complementary DNA. Science 231: 1150-1154
4. Takeya T, Hanafusa H (1983) Structure and sequence of the cellular gene homologous to the RSV src gene and the mechanism for generating the transforming virus. Cell 32:881-890
5. Minghetti PP, Ruffner DE, Kuang WJ, Dennison OE, Hawkins JW, Beat tie WG, Dugaiczyk A (1986) Molecular structure of the human albumin gene is revealed by nucleotide sequence within q11-22 of chromosome 4, J BioI Chem 261:67476757
6. Michelson AM, Markham AF, Orkin SH (1983) Isolation and DNA sequence of a full-length cDNA clone for human Xchromosome-encoded phosphoglycerate kinase. Proc Natl Acad Sci USA 80:472-476
7. Verhoeyen M, Fang R, Jou WM, Devos R, Huylebroeck D, Saman E, Fiers W (1980) Antigenic drift between the haemagglutinin of the Hong Kong influenza strains A/Aichi/2/68 and A/Victoria/3/75. Nature 286:771-776
8. Gotch F, Rothbard J, Howland K, Townsend A, McMichael A ( 1987) Cytotoxic T lymphocytes recognize a fragmcnt of influenza virus matrix protein in association with HLA-A2. Nature 326:881-882
9. Ohno S (1990) Grammatical analysis of DNA sequences provides a rationale for the regulatory control of an entire chromosome. Genet Res (Camb) 56:115-120

TEST    8 ter TEST