Medical College of Georgia, School of Dentistry, Department of Oral Biology and Maxillofacial Pathology, 1120 15th Street, Augusta, GA 30912; ddickins{at}mail.mcg.edu
Abstract (I) Introduction (1) BACKGROUND (2) THE CYSTATIN SUPERFAMILY (a) Type 2 cystatins (i) Chicken cystatin and cystatin C (ii) Salivary cystatins (a) Human (b) Rodent (c) Other salivary cystatins (iii) Other type 2 cystatins (iv) Generation of diversity in mammalian type 2 cystatins (a) Polymorphisms (b) Glycosylation (c) Phosphorylation (d) N-terminal processing (b) Type 1 cystatins (the stefins) (c) Type 3 cystatins (d) Type 4 cystatins (e) Unassigned proteins (i) CRP/CRES/testatin (ii) Atypical cystatins from other phyla (iii) Other proteins (3) THE PLACE OF SALIVARY CYSTATINS IN THE CYSTATIN SUPERFAMILY (II) Papain-like CP Inhibitory Activities of Human Type 2 Cystatins (III) Structure-Function Relationships in Type 2 Cystatins (i) N-TERMINAL DOMAIN (ii) QXVXG DOMAIN (iii) PW DOMAIN (IV) OTHER DOMAINS (IV) Cystatin Genes in the Mammalian Genome (V) Cystatin Gene Expression (1) HUMAN SD-TYPE CYSTATINS (2) RAT CYSTATIN S (3) EXPRESSION OF NON-SD-TYPE CYSTATIN GENES (V) Potential Functions of Type 2 Cystatins (1) INHIBITION OF ENDOGENOUS CPS (a) Physiological regulation by type 2 cystatins (b) Cystatins and cancer (c) Cystatins and injury (d) Cystatins, endogenous CPs, and periodontal disease (2) INHIBITION OF EXOGENOUS CPS (3) IMMUNOMODULATION (4) ANTIMICROBIAL AND ANTIVIRAL ACTIVITIES (5) CONTROL OF MINERALIZATION (6) OTHER POSSIBILITIES (VI) Cystatin Phylogeny and Function (VII) Animal Models for Salivary Cystatin Function (VIII) Summary and Conclusions REFERENCES
| Abstract |
|---|
Key words. Cystatin, cysteine protease, superfamily, evolution, saliva, human, function
| (I) Introduction |
|---|
(1) BACKGROUND
Scission of peptide bonds is an essential reaction in living cells, and there is a considerable number of peptidases that catalyze this reaction with various degrees of specificity. CPs use a cysteine residue in the active site as the nucleophile in the reaction (see Dickinson, 2002, and references therein). Release of proteolytic enzymes outside of their normal compartment has the potential to cause serious degradation and pathology, an effect taken advantage of by pathogenic organisms from a wide range of phyla. Therefore, control of proteolytic enzyme activity is mandatory. Building upon a very early report (see Barrett et al., 1986, for a review of this earlier literature) of a bovine trypsin inhibitor in chicken egg white, a search for egg white inhibitors of plant proteinases (the CPs ficin and papain) identified a relatively small (12.7 kDa) protein that was a potent inhibitor (Ki = 10 nM). The term cystatin was coined for this protein, which was shown also to inhibit mammalian CPs of the papain superfamily, including cathepsins B, C, H, and L. Cystatins form 1:1 reversible complexes with CPs, in competition with the substrate, but in some cases the binding is so tight as to be physiologically irreversible.
(2) THE CYSTATIN SUPERFAMILY
Examination of mammalian tissues and serum revealed several CP inhibitors ranging in size from 11 kDa to 175 kDa. Sequence analysis of the smaller inhibitors from tissues showed that they were related to each other, and to egg white cystatin (reviewed in Barrett et al., 1986). One of these smaller proteins was identical to human γ-trace, a 13,260-Da basic protein first described as a microprotein constituent of normal cerebrospinal fluid, and of urine from patients with renal failure. The protein was named cystatin C. Thus, historically, the cystatins have been defined as proteins with a particular sequence (and hence structural) motif that bind tightly, but reversibly, to CPs, forming an enzymatically inactive complex. Proteins that share a certain level of similarity (e.g., 50%) along with other traits, such as similarity in function or expression pattern, are considered to belong to the same family, and related families are grouped into a superfamily. Underlying this grouping is the concept of descent from a common ancestor, and genes or proteins that are related through common ancestry are described as homologous. Orthologs are formed by speciation, paralogs by duplication.
Alignment of cystatin sequences identifies three regions conserved during more than one billion years of evolution: a glycine residue in the N-terminal region (G11 in human cystatin C numbering), a QXVXG motif in one hairpin loop, and a PW motif in a second (see Figs. 1, 2). These regions form a surface on the cystatin molecule that can dock with the substrate binding site of family C1 (papain-like) enzymes (see below). Four main cystatin families have been distinguished, but over the past several years (according to the criterion of sequence similarity), the cystatin superfamily has been greatly expanded.
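As an illustration of how the QXVXG and PW motifs can be located in a sequence, the following is a minimal Python sketch. It assumes that X stands for any single residue, and the function name `find_motifs` and the toy sequence are hypothetical (the toy sequence is not a real cystatin). The three pentapeptides checked are the ones quoted in this review (QLVSG from chicken cystatin, QIVGG and QIVDG from cystatins SA1 and SA2).

```python
import re

# Conserved motifs named in the text; "." stands for X (any residue).
QXVXG = re.compile(r"Q.V.G")
PW = re.compile(r"PW")

def find_motifs(seq):
    """Return 0-based start positions of QXVXG and PW matches in seq."""
    seq = seq.upper()
    return {
        "QXVXG": [m.start() for m in QXVXG.finditer(seq)],
        "PW": [m.start() for m in PW.finditer(seq)],
    }

# Pentapeptides quoted in the review all fit the QXVXG pattern.
for peptide in ("QLVSG", "QIVGG", "QIVDG"):
    assert QXVXG.fullmatch(peptide) is not None

# Hypothetical toy sequence for illustration only (not a real cystatin).
toy = "AAGAAQLVSGAAAPWAA"
print(find_motifs(toy))
```

A pattern match of this kind only flags candidate motifs; it says nothing about whether the surrounding structure places them in the hairpin loops described in the text.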
The three-dimensional structure of chicken cystatin, as determined by x-ray crystallographic analysis (Bode et al., 1988), is shown in Fig. 2. The structure in solution has also been determined by NMR (Dieckmann et al., 1993). The main feature of the structure is a five-stranded β-sheet wrapped around a five-turn α-helix. The N-terminal 10 residues are disordered and flexible. The conserved G11 is present at the N-terminal of the first short β-strand A, which is connected to the five-turn α-helix 1. A match to the conserved QXVXG motif (QLVSG; see below) is found in a hairpin loop between β-strands B and C. In the crystal, β-strand C is connected to strand D via a short second α-helix 2. In solution, this α-helix is not detected by NMR, and this region forms a loop that lacks secondary structure. This α-helix 2/loop region is anchored by the first disulfide bond. The conserved PW motif is located in a hairpin loop between β-strands D and E, which are linked by the second disulfide bond. The three conserved regions form a wedge-shaped edge complementary to the CP active site. The side-chains of N-terminal residues R8, L9, and V10 in human cystatin C interact with the S4, S3, and S2 substrate-binding subpockets, respectively, of the target enzyme. G11-A12 form a wedge-shaped edge complementary to the active site cleft of papain, with G11 in the S1 site. However, this bond would be in an inappropriate conformation and too far away from the reactive site cysteine to be cleaved. Both hairpin loops make major binding interactions with residues in the vicinity of the reactive site cysteine. The three domains interact with the target enzyme fairly independently (Hall et al., 1993, 1995), and binding of cystatins to papain can occur with little, if any, conformational changes in either protein (Bode et al., 1988). Cystatin binding to cathepsin B is more complex, involving a two-step mechanism. This CP has a loop that occludes the active site. An initial weak interaction, most likely involving the N-terminal region, is followed by a conformational change in which the inhibitor displaces the occluding loop, allowing tight binding to occur (Pavlova et al., 2000). Cystatins vary considerably in their ability to displace the loop.
(ii) Salivary cystatins
(a) Human
Early studies aimed at characterizing human salivary proteins by purification, followed by peptide sequencing, established that whole saliva contains several closely related proteins with homology to cystatin C, and having activity as cysteine peptidase inhibitors (see Bobek and Levine, 1992; Henskens et al., 1996c, for reviews of this earlier literature). Three proteins with similar sequences (and related isoforms; see below) were identified, now named cystatins S, SA, and SN. These three proteins, and a more distantly related cystatin D, have been cDNA-cloned (Al-Hashimi et al., 1988; Frieje et al., 1993a; Bobek et al., 1991). The cDNA sequences are about 740 bp long. Comparison of the encoded pre-proproteins with the purified salivary proteins demonstrates a typical secretory signal peptide of 20 residues, followed by an 11-residue extension N-terminal to the conserved G11 (Fig. 3). At the protein level, cystatins S, SA, and SN have about 88% identity (i.e., they are very similar), while cystatin D has 60% or less similarity. The structures of cystatins S, SA, and SN have been modeled based on the structure of chicken cystatin (Bell et al., 1997) (see Fig. 2). Overall, the predicted structures of cystatins SA and SN were more similar to that of chicken cystatin than that of cystatin S, consistent with the different (generally poorer) inhibitory properties of cystatin S (see below). For reasons explained below, cystatins S, SA, and SN will be referred to as the S-like cystatins, and with cystatin D the four human proteins will be referred to as the SD-type cystatins.
(c) Other salivary cystatins
Surprisingly, there are no published reports of murine salivary cystatins. Snake venom glands are modified salivary glands, and a subfamily of type 2 cystatins has been isolated from snake venom. These proteins have a six-residue insertion (compared with cystatin C) in the α-helix 2/loop (Fig. 1). A cystatin from snake venom was shown to be a good inhibitor of papain and cathepsins B, L, and S (Brillard-Bourdet et al., 1998).
(iii) Other type 2 cystatins
A novel human cystatin was independently identified by expressed sequence tag (EST) sequencing of human amniotic and fetal skin epithelial cell cDNA libraries (cystatin E; Ni et al., 1997), and as a down-regulated gene in human breast metastatic tumor cells, as compared with primary tumors (cystatin M; Sotiropoulou et al., 1997). Cystatin E/M is a secreted protein with a relatively low similarity to other human type 2 cystatins (26-34% amino acid identity), and a five-residue insertion in the α-helix 2/loop (Fig. 1). Cystatin F, also called leukocystatin and CMAP, was initially independently identified by EST screening of human dendritic cells (Halfon et al., 1998), and of human cDNA libraries (Ni et al., 1998), and as a metastasis-associated gene identified by differential display in murine carcinoma cells that showed a high rate of metastasis to the liver (Morita et al., 1999). The human and mouse cDNA sequences predict a secreted protein with a large 17-residue N-terminal extension before the conserved G residue (Fig. 3). Cystatin F is also unusual in possessing a third disulfide bond that anchors the N-terminal to the body of the protein, and a positively charged residue in the normally hydrophobic QXVXG motif.
Although numerous type 2 cystatins have been described from vertebrates, they are not confined to this subphylum. A cystatin has been isolated from the hemocytes of an invertebrate, the horseshoe crab Tachypleus tridentatus (reviewed in Iwanaga et al., 1998). Tachypleus cystatin is a fairly standard type 2 protein: It is secreted, and has the two appropriately positioned disulfide bonds (Fig. 1). The protein is a potent CP inhibitor.
(iv) Generation of diversity in mammalian type 2 cystatins
At the level of the gene, diversity is generated by the existence of multigene families (see below) encoding the different proteins introduced above, and by polymorphisms affecting the coding sequence. Only a limited number of polymorphisms in human type 2 cystatins have been identified. Far more diversity in the secreted proteins is generated by several post-translational modifications.
(a) Polymorphisms
There are two common haplotypes of CST3, designated A and B, that differ at three sites. Two base-pair differences are localized to the promoter region, and one in the signal peptide domain that causes an A→T substitution. However, the secreted protein produced by either haplotype is the same. A mutation with the substitution L68Q has been shown to cause the rare autosomal-dominant disease, hereditary cerebral hemorrhage with amyloidosis, Icelandic type (see below). A limited number of polymorphisms affecting the coding sequence of human SD-type cystatins have been found. Two alleles at the CST2 locus (encoding cystatin SA) are known (Shintani et al., 1994; Saitoh et al., 1998; Haga and Minaguchi, 1999). CST2*1 and CST2*2 differ by two point mutations: a G→A transition in exon 2, and an A→T transversion in exon 3. These produce *1→*2 substitutions of G59→D59 and E120→D120 in the corresponding SA1 and SA2 proteins. The first substitution is in the QXVXG motif (SA1 QIVGG → SA2 QIVDG). Recombinant SA1 and SA2 differ in their inhibitory activities (Saitoh et al., 1998). A T/C transition in exon 1 of the CST5 gene produces alleles encoding C25 or R25 (cystatin C numbering) with comparable frequency (0.55, 0.45, respectively) in the population (Balbin et al., 1993). Both forms have similar Ki values toward CPs. This would be expected, since the variation is in α-helix 1 on the side of the molecule opposite the inhibitory surface (Balbin et al., 1994).
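The terms transition and transversion used for the CST2 and CST5 point mutations follow a simple rule: a change within the purines (A, G) or within the pyrimidines (C, T) is a transition, and a change between the two classes is a transversion. The following Python sketch applies that rule to the three nucleotide changes named above; the function name `classify_substitution` is a hypothetical helper introduced here for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(ref, alt):
    """Classify a single-base DNA change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; no substitution")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) = transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

# Nucleotide changes named in the text.
for ref, alt in (("G", "A"), ("A", "T"), ("T", "C")):
    print(ref, alt, classify_substitution(ref, alt))
```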
(b) Glycosylation
Type 2 cystatins are generally described as non-glycosylated. However, there are numerous exceptions: For example, while human cystatin C lacks N-glycosylation sites, about 20% of rat cystatin C is N-glycosylated at a consensus NLT site in the α-helix 2/loop region (Esnard et al., 1990). Cystatin F has two functional N-glycosylation sites. In cystatin E/M, a functional N-glycosylation site is located at N108, adjacent to the conserved PW motif, and about 30-40% of the protein released by cultured mammary cells is glycosylated (Ni et al., 1997; Sotiropoulou et al., 1997). The possible effect of glycosylation on CP binding is unknown.
(c) Phosphorylation
In chicken cystatin, but not human cystatin C, a phosphorylated site (S82) is present in the α-helix 2/loop. Both human cystatin S and SA are partially phosphorylated (Isemura et al., 1991; Lamkin et al., 1991; Ramasubbu et al., 1991; Shintani et al., 1994), whereas cystatin SN is not. Various forms of cystatin S have been purified from saliva, with S(N-terminal), S2, S98, S111, and S114 being identified as phosphorylation sites. S2 and S98 conform to a Golgi kinase site [SXE/S(PO4)], and phosphorylation of S2 would make S(N-terminal) a consensus site. A consensus site is also present at S98 in cystatin SN. S111 conforms to a casein kinase 2 site [S/TXXD/E/S(PO4)]. The effects of dephosphorylation on inhibition have not been examined in detail. Two forms of cystatin S were found in human nasal and broncho-alveolar lavage fluid that likely resulted from phosphorylation of a portion of the protein in the N-terminal (Lindahl et al., 1999). Interestingly, the relative proportions of the two forms differed between the secretions, and the level of the presumptive phosphorylated form was decreased 10-fold in the broncho-alveolar lavage fluid from smokers. The physiological significance of this finding is unknown.
(d) N-terminal processing
The predicted N-terminal sequences of SD-type cystatins extend an additional 11 residues beyond the conserved glycine (Fig. 3). However, isolated proteins are frequently shorter, and N-terminal processing is a common feature of type 2 cystatins that has been observed, for example, in chicken, rat, and human cystatin C (Esnard et al., 1990; Popovic et al., 1990; Turk and Bode, 1991), human salivary cystatins (reviewed in Saitoh and Isemura, 1993; Baron et al., 1999a), and rat cystatin S (Nishiura et al., 1991). N-terminal processing differentially affects the inhibitory properties of cystatins (see below). Processing of cystatin C also affects its interaction with neutrophils (see below).
(b) Type 1 cystatins (the stefins)
The type 1 cystatins (also known as stefins) are CP inhibitors from vertebrates. Humans have 2 related proteins, often called stefins A and B. They are about 100 residues in length, and differ from type 2 cystatins in several respects: They lack disulfide bonds and the α-helix 2/loop, there is a kink in α-helix 1, and they have a nine-residue C-terminal extension. The type 1 cystatin genes have different intron positions in comparison with the type 2 cystatins, and they do not encode a secretory peptide signal sequence. Thus, the type 1 cystatins are generally considered to be cytoplasmic proteins, although it does appear that they might be secreted, or at least released, under certain circumstances (reviewed in Turk and Bode, 1991).
(c) Type 3 cystatins
This family consists of the kininogens (reviewed in Turk and Bode, 1991). There are three types: high- and low-molecular-weight kininogen, and T-kininogen (also known as major acute-phase protein). They consist of three tandemly arranged type 2 cystatin domains, followed by a kinin fragment, but differ in their C-termini. They all have the ability to inhibit CPs.
(d) Type 4 cystatins
The fetuins are a small family of abundant mammalian fetal serum and bone glycoproteins (reviewed in Brown et al., 1992; Brown and Dziegielewska, 1997). Fetuin was the name used for the protein from animals in the order Artiodactyla, while α2-Heremans Schmid glycoprotein (α2-HS glycoprotein, α2-HS) and histidine-rich glycoprotein (HRG) (also called histidine-proline-rich glycoprotein) referred to two related human proteins. The fetuins are N- and O-glycosylated and phosphorylated. The N-terminal region consists of two tandem type 2 cystatin domains, followed by a C-terminal region comprised of a histidine-rich domain between two proline-rich domains. The N- and C-terminal regions are linked by a disulfide bond. α2-HS has a structure similar to that of HRG, except that it lacks the histidine-rich tandem repeat. Unlike the kininogen cystatin domains, those of the fetuins lack detectable CP inhibitor activity. Consistent with this, they lack the conserved G, QXVXG, and PW motifs. Orthologs have been found in snake venom, where they act as anti-hemorrhagic factors. Remarkably, although they lack CP inhibitor activity like other type 4 cystatins, they are metalloproteinase inhibitors (Valente et al., 2001). This adds a new layer of complexity to the cystatin superfamily.
(e) Unassigned proteins
(i) CRP/CRES/testatin
Human and rodent genomes contain several related genes comprising a small multigene family encoding secreted glycoproteins with sequence similarity to type 2 cystatins. They have been given a variety of names, including testatin (Eriksson et al., 2002), cystatin-related peptide (CRP; Aumuller et al., 1995), cystatin-related epididymal spermatogenic protein (CRES; Cornwall et al., 1999), and cystatin T (Shoemaker et al., 2000). However, although they possess the PW conserved motif, they lack the conserved N-terminal glycine and consensus QXVXG motif generally required for inhibition of CPs (see below). Thus far, the function of these proteins is unknown.
(ii) Atypical cystatins from other phyla
Subsequent to the discovery of the mammalian cystatins, small (ca. 100-residue) CP inhibitors were discovered in rice and were called oryzacystatins (reviewed in Arai et al., 1996). Similar proteins have since been identified in a large number of plant species, and are collectively termed phytocystatins. Some are secreted. Although clearly homologous to vertebrate cystatins, they are structurally distinct: Like the type 1 cystatins, they lack the α-helix 2/loop and both disulfide bonds (Figs. 1, 2). However, unlike the type 1 cystatins, the phytocystatins generally have the typical C-terminal PW motif, and they lack the C-terminal extension (Margis et al., 1998). The three-dimensional structure of oryzacystatin I is closer to that of chicken cystatin than stefin A (Nagata et al., 2000). A variety of cystatins has been identified, primarily by cloning, in invertebrates, such as insects (e.g., Drosophila melanogaster; Delbridge and Kelly, 1990), or nematodes (e.g., Caenorhabditis elegans; see Fig. 1). Although they possess the typical cystatin conserved motifs involved in inhibition of CPs, they do not fit the classification of families described above. For example, the Drosophila cystatin, like the phytocystatins, lacks the α-helix 2/loop and associated disulfide bond, but it does have the C-terminal disulfide bond; conversely, many nematode cystatins have the α-helix 2/loop and associated disulfide bond, but lack the C-terminal bond (Figs. 1, 2; Dickinson, unpublished observations; see Maizels et al., 2001, for other examples).
(iii) Other proteins
A novel cystatin-related protein has been isolated from bovine cortical bone and cloned (Hu et al., 1995). It is a 24-kDa secreted phosphoprotein with a 107-residue N-terminal region that shows closest similarity to the cystatin domains of kininogens, followed by a short phosphorylated serine-rich domain. However, the N-terminal region also showed significant matches to members of the cathelicidin family, a class of mammalian myeloid cell antimicrobial peptides (reviewed in Zanetti et al., 1997). Cathelicidins are characterized by a conserved N-terminal proregion with closest similarity to the kininogen cystatin domains, followed by a highly divergent cationic antimicrobial C-terminal region that is released by proteolysis. A common feature of antimicrobial peptides is a high content of basic residues that facilitate binding to the cell, and a tendency to adopt an amphipathic conformation that mediates membrane disruption. At least some cathelicidins are CP inhibitors, albeit rather poor ones. Thus, the cathelicidins appear to be another branch of the cystatin superfamily. The CP cathepsin F has been cloned, and the zymogen has been shown to have a large N-terminal extension (Nagler et al., 1999; reviewed in Dickinson, 2002). The first segment of this extension was predicted to fold into a cystatin-like structure with two disulfide bonds. The predicted structure would lack the α-helix 2/loop, which would be replaced by a smaller loop, and also lacks the consensus QXVXG and PW motifs. The properties of this region remain to be characterized.
Based on sequence comparisons, the MHC class II invariant chain protein Ii p41 isoform, a CP inhibitor, was at one time thought to be a member of the cystatin superfamily (reviewed in Brown and Dziegielewska, 1997). The structure is now known to be quite different, and Ii represents an example of convergent evolution (reviewed in Riese and Chapman, 2000). The lipocalins are a large and diverse family of proteins found in fluids such as saliva and tears. They are thought to be involved in binding lipophilic molecules. Their structure is completely different from that of cystatins, but a lipocalin from human von Ebner's gland was shown to be a CP inhibitor, apparently another example of convergent evolution (van 't Hof et al., 1997). The serpins, normally considered to be serine proteinase inhibitors, can also inhibit certain CPs, and this property may reflect the similarities in the catalytic mechanisms and active sites of the enzymes.
(3) THE PLACE OF SALIVARY CYSTATINS IN THE CYSTATIN SUPERFAMILY
It is reasonable to assume that the function(s) of the SD-type cystatins are the result of selection. While cases of complete shifts in the function of proteins are known (e.g., the recruitment of proteins as lens proteins), a more common trend in multigene families is specialization by modification of an existing function (e.g., the globins). Thus, clues to the function of SD-type cystatins may be present in the phylogeny of cystatins.
A statistically significant level of similarity between two or more proteins implies common ancestry, and, in principle, similarity can be used to organize homologous genes or proteins into a phylogenetic tree that describes their evolutionary relationships. Various hierarchical terms (superfamily, family, sub-family, group) are used to describe related groups. The choice of similarity level required to group proteins into a family (or any other level) is somewhat discretionary, and also depends upon the proteins under consideration. In general, the term family is the most commonly used, usually to denote a group of proteins that show a clear similarity in sequence to each other (e.g., > 30% residue identity, a BLAST E-value of < 10⁻⁴, etc.) along most of their lengths (or for matching domains of chimeric proteins). Strictly, a family of proteins should be a monophyletic group; that is, the members share a most recent common ancestor that is not also an ancestor of one or more proteins not included in the group. Methods for constructing phylogenetic trees rely on various assumptions, and no single method is without weaknesses. Phylogenetic analysis of the cystatins is problematic, because the proteins are rather small, and the origins of the different families and subfamilies are ancient, so there has been extensive divergence. Further, different branches appear to have evolved at different rates.
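To make the residue-identity criterion concrete, the following Python sketch computes percent identity for two sequences that have already been aligned. It rests on one assumption that should be stated plainly: columns containing a gap in either sequence are excluded from the denominator, which is only one of several conventions in use. The function name `percent_identity` is hypothetical, and the example uses the two QXVXG pentapeptides quoted earlier for cystatins SA1 and SA2 rather than full-length sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap (one convention of several).
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)

# Motif pentapeptides quoted in the text for cystatins SA1 and SA2.
print(percent_identity("QIVGG", "QIVDG"))
```

Because the result depends on the alignment and on how gaps are counted, identity thresholds such as the one quoted above are best read as rough guides rather than sharp cut-offs.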
Over the years, several evolutionary studies and phylogenetic trees of the cystatin superfamily have been reported (e.g., Rawlings and Barrett, 1990; Saitoh and Isemura, 1993; Brown and Dziegielewska, 1997; Brillard-Bourdet et al., 1998; Ni et al., 1998; Cornwall et al., 1999; Margis et al., 1998), although many did not include a statistical analysis of significance for phylogenetic trees. Several schemes have been proposed for the evolution of the different families (reviewed in Brown and Dziegielewska, 1997). A plausible model for the evolution of type 2 cystatins is as follows: Plants and animals diverged from a common ancestor that possessed a cystatin, perhaps around 1.6 billion years ago (BYa). Modern phytocystatins potentially represent this ancestral form of all types of cystatins. The α-helix 2/loop and first disulfide bond were acquired prior to the divergence of the nematodes (about 1.2 BYa). Since a type 2 cystatin is present in the horseshoe crab, the second disulfide bond and the general features of the type 2 cystatins must have evolved prior to the divergence of protostomes and deuterostomes about 1 BYa. Thus, regardless of the particular dates of these events (there is considerable argument), the type 2 cystatins are ancient proteins that have diversified further in the mammalian lineage. This model would suggest that the insect and type 1 cystatins are secondarily derived.
A typical phylogenetic tree for selected vertebrate type 2 cystatins is shown in Fig. 4. Reliability of the branches was assessed by bootstrapping, which is very conservative. The human S-like cystatins group with 100% confidence in this analysis, although their order of evolution was not estimated reliably. Importantly, cystatin D forms a monophyletic clade with the S-like cystatins with quite good confidence (84%). Consistent with this, cystatins D and SA have essentially identical expression patterns. Thus, there is no reason to separate cystatin D and the S-like cystatins into separate subfamilies. It is proposed that they be collectively referred to as the SD-type cystatins. Consistent with several other analyses, this tree shows the SD-type cystatins evolving from a common cystatin C-like ancestor. Analyses with more sequences place the time of this divergence around the time of the mammalian radiation (about 100 million years ago [MYa]), but do not reliably determine if it occurred prior to the start of the radiation (leading to SD-type cystatins in all species) or after (in the limit, confining the SD-type cystatins to the primate lineage) (Margis et al., 1998; Dickinson, unpublished observations). In this tree, the rat cystatin S branch is not positioned with confidence, but it does not group with the SD-type cystatins, implying an independent origin. However, analyses with additional sequences can group it with the SD-type cystatins (Dickinson, unpublished observations). Thus, at this time it is not clear whether the rat cystatin S is a highly divergent ortholog of the human proteins, or a case of independent evolution. Similarly, the position of the snake venom cystatins cannot yet be estimated with any confidence.
| (II) Papain-like CP Inhibitory Activities of Human Type 2 Cystatins |
|---|
Inhibition constants and measured concentrations of human cystatins in various fluids are summarized in the Table. It can be seen that, in general, the SD-type cystatins are poorer inhibitors of papain-like CPs than cystatin C. Due to their high concentrations in saliva and tears, S-like cystatins have the potential to inhibit many enzymes, although inhibition in many cases will not be pseudo-irreversible. There is a considerable range in concentration of cystatin levels in saliva in the periodontally healthy population (e.g., Aguirre et al., 1992; Henskens et al., 1994; Baron et al., 1999c). Although it has not been rigorously examined, cystatin and total protein levels in any single individual seem to be fairly constant, at least over a period of a few weeks (Rudney et al., 1993; Henskens et al., 1994). Since different assay techniques (e.g., immunoassay, papain inhibition, mRNA levels) give the same large population variance, it would appear to reflect true differences in glandular production of cystatins. This could be due to variation in total protein synthesis activity of the glands, or a cystatin-specific variation. Either might be subject to genetic control, or to long-term physiological modulation in response to factors such as diet or oral disease. Total salivary protein does have a genetic component (Rudney et al., 1994). It also increases with gingivitis and periodontal disease (Henskens et al., 1993; Rudney et al., 1994). In principle, much of the increase in salivary protein levels with disease could be due to serum transudate. However, the lack of correlation with myeloperoxidase levels, together with an increase in parotid saliva protein, suggests a substantial contribution from a glandular source (Rudney et al., 1994; Henskens et al., 1996a). A considerable range in S-type cystatin mRNA levels was found among submandibular glands (SMGs) from three individuals (Dickinson et al., 2002), and levels of cystatin protein and activity have been reported to change in response to oral health status, although these studies are not entirely consistent (see below).
There is some discordance among the reported levels of cystatin mRNAs, protein concentrations, and cystatin activity (Henskens et al., 1996a,c; Dickinson et al., 2002), suggesting a contribution of post-translational and post-secretion events (such as N-terminal processing) to overall inhibitory capacity. Also, for parotid saliva, the contribution of cystatin C to total activity (against papain) may predominate and be subject to change (see below). Once the functions of SD-type cystatins are established, it will be of great interest to relate population variance in activity to oral and overall health status.
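The interplay between inhibitor concentration and Ki discussed in this section can be illustrated with a short calculation. The Python sketch below assumes a simple 1:1 reversible, competitive inhibitor at equilibrium, present in large excess over the enzyme, so that the fraction of enzyme in complex is [I] / ([I] + Ki_app), with Ki_app = Ki (1 + [S]/Km) when a competing substrate is present. The function name and all numerical values are hypothetical and are not taken from the Table; for very tight-binding inhibitors at concentrations comparable with the enzyme, this approximation would not hold.

```python
def fraction_inhibited(inhibitor_conc, ki, substrate_conc=0.0, km=None):
    """Fraction of enzyme held in the enzyme-inhibitor complex at equilibrium.

    Assumes a 1:1 reversible competitive inhibitor in large excess over the
    enzyme. All concentrations must share one unit (e.g., nM).
    """
    if inhibitor_conc < 0 or ki <= 0:
        raise ValueError("inhibitor_conc must be >= 0 and ki must be > 0")
    ki_app = ki
    if substrate_conc:
        if km is None or km <= 0:
            raise ValueError("km > 0 is required when substrate_conc is given")
        # A competing substrate raises the apparent Ki.
        ki_app = ki * (1.0 + substrate_conc / km)
    return inhibitor_conc / (inhibitor_conc + ki_app)

# Hypothetical values for illustration only: 1000 nM inhibitor, three Ki values (nM).
for ki_nm in (0.01, 10.0, 1000.0):
    print(ki_nm, round(fraction_inhibited(1000.0, ki_nm), 3))
```

Under these assumptions, a high inhibitor concentration can compensate for a relatively poor Ki, which is the sense in which abundant salivary cystatins may still inhibit many enzymes.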
| (III) Structure-Function Relationships in Type 2 Cystatins |
|---|
(i) N-TERMINAL DOMAIN
Docking models based on the three-dimensional structures of chicken cystatin and papain suggested that one function of the conserved G11 residue could be to provide flexibility of the N-terminal region, allowing it to adopt a conformation that provides maximal binding contribution (Bode et al., 1988). Consistent with this model, mutation of cystatin C G11 increases the Ki, and the size of the effect depends upon the substitution and the target enzyme (Hall et al., 1993). Further, removal of the N-terminal decapeptide from these variants has a relatively small effect on the Ki values, unlike the effect in wild-type cystatin C, where the Ki is greatly increased. Neutrophil elastase rapidly cleaves the N-terminal region of cystatin C between V10 and G11 (the conserved glycine) to generate a form lacking 10 residues (Abrahamson et al., 1991). The activity of the modified cystatin C is reduced, but not uniformly: The affinity for cathepsins B and L is more than three orders of magnitude lower, while for cathepsin H it is only five-fold lower (although this appears to underestimate the importance of this region for this enzyme [Hall et al., 1995]). Therefore, the N-terminal region of cystatin C likely binds in the substrate-binding pockets of cathepsins B and L, but makes little contribution to cystatin C binding to cathepsin H. To examine the contributions of the individual N-terminal side-chains to binding, investigators have used site-directed mutagenesis to replace residues 8-10 with glycine or other amino acids, either singly or in combination (Hall et al., 1995; Mason et al., 1998). Residue 10 was found to be responsible for the main contribution to binding affinity for cathepsins B, H, L, and S. Most V10 substitutions tested caused a decrease in affinity (i.e., an increase in the Ki). For example, V10G decreased affinity by 2-3 orders of magnitude, depending on the target enzyme.
Some substitutions increased affinity for a particular enzyme (e.g., V10W increased the affinity for cathepsin L 10-fold but decreased the affinity for cathepsin S about two-fold). R8 and L9 were also found to make smaller, enzyme-dependent contributions to binding affinity. Thus, residues 8, 9, and 10 are involved in binding specificity, and cathepsin-specific cystatins can be produced from the broadly inhibitory cystatin C by the selection of appropriate residues at these positions.
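The fold changes in Ki discussed above can be placed on a free-energy scale with the standard thermodynamic relation ΔΔG = RT ln(Ki,variant / Ki,wild-type). The following is a minimal sketch only: the function name is hypothetical, the temperature of 25 °C is an assumption (the cited studies may have used other conditions), and the fold changes are round illustrative values of the magnitudes mentioned in the text.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ/(mol*K)


def ddg_from_fold_change(fold_change, temperature_k=298.15):
    """Return the change in binding free energy (kJ/mol) for a fold change in Ki.

    fold_change is Ki(variant) / Ki(wild type); values above 1 mean weaker binding.
    The default temperature (25 degrees C) is an assumption for illustration.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return R_KJ_PER_MOL_K * temperature_k * math.log(fold_change)


# Round, illustrative fold changes of the magnitudes discussed in the text
for label, fold in [("5-fold", 5.0), ("100-fold", 1e2), ("1000-fold", 1e3)]:
    print(f"{label} increase in Ki: {ddg_from_fold_change(fold):.1f} kJ/mol")
```

Under these assumptions, a 1000-fold increase in the Ki corresponds to a loss of roughly 17 kJ/mol of binding free energy, whereas a five-fold increase corresponds to only about 4 kJ/mol.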
Human S-type cystatins can be similarly truncated at the N-terminus (Fig. 3; see above). In general, truncated forms either isolated from saliva or produced by cleavage with enzymes (e.g., gingipain R), or by recombinant techniques, show, at most, modest differences in inhibitory activity toward papain (and ficin) as compared with the full-length proteins (Bobek et al., 1994; Blankenvoorde et al., 1996; Saitoh et al., 1998; Baron et al., 1999a). Mutation of the conserved G11 (G11A-G12A) in cystatin SN also had no significant effect on papain inhibition (Tseng et al., 2000). Three forms of rat cystatin S have been isolated from rat saliva, designated RSC-1, -2, and -3, that differ in their extent of N-terminal processing, with RSC-1 being truncated to G11 (Nishiura et al., 1991). Although RSC-1 is a poorer inhibitor of papain and ficin than RSC-3, which has an additional three residues, the differences were not large (< 50-fold). In contrast to papain inhibition, a cystatin SAT protein isolated from saliva with a six-residue truncation was found to be a 1000-fold poorer inhibitor of cathepsin L than cystatin SA (Baron et al., 1999a). N-terminally truncated forms of cystatin D have also been examined (Hall et al., 1998). A form lacking all residues to G11 inclusive had essentially no activity against cathepsins H, L, or S (Ki > 1 µM), indicating that this region contributes 2-4 orders of magnitude to the Ki value, depending on the enzyme. Exchanging the N-terminal regions of human cystatins C and D resulted in inhibitors with moderately altered affinities for these three cathepsins. Collectively, these studies suggest that the N-terminal extension of the SD-type and rat cystatins makes only a modest contribution to papain binding, but is important for the specificity and strength of binding to cathepsins.
(ii) QXVXG DOMAIN
The binding affinity of human cystatin C with a deleted N-terminal region and mutation of the highly conserved W106 (see below) indicates that the QXVXG region contributes 40-60% of the total free energy of binding to actinidin, papain, and cathepsins B and H (Bjork et al., 1996). Mutagenesis of the QXVXG motif in chicken cystatin demonstrated that increases in the Ki were primarily the result of an effect on koff (Auerswald et al., 1995). However, the effects of different mutations were not consistent across all enzymes: e.g., alteration of the QXVXG loop had only a relatively modest effect on the Ki for cathepsin L or papain, but gave a > 1000-fold increase in the Ki for cathepsin B. This region appears to be particularly important for the inhibition of cathepsin S by human cystatin C, but less so for cathepsins B, H, and L (Hall et al., 1995).
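The link between the Ki and koff noted above can be illustrated with the textbook expression for simple one-step reversible binding, Ki = koff / kon. The sketch below is illustrative only: the function name and all rate constants are hypothetical values chosen for round numbers, not measurements from the studies cited, and the one-step model is itself an assumption.

```python
def inhibition_constant(k_on, k_off):
    """Return Ki (in M) for one-step reversible binding: Ki = k_off / k_on.

    k_on is the association rate constant (M^-1 s^-1) and k_off the
    dissociation rate constant (s^-1).
    """
    if k_on <= 0 or k_off < 0:
        raise ValueError("k_on must be positive and k_off non-negative")
    return k_off / k_on


# Hypothetical rate constants, chosen only for illustration
K_ON = 1.0e7          # M^-1 s^-1, assumed unchanged by the mutation
K_OFF_WT = 1.0e-4     # s^-1, hypothetical wild-type inhibitor
K_OFF_MUT = 1.0e-1    # s^-1, hypothetical loop variant that dissociates faster

ki_wt = inhibition_constant(K_ON, K_OFF_WT)
ki_mut = inhibition_constant(K_ON, K_OFF_MUT)
print(f"Ki wild type: {ki_wt:.1e} M")
print(f"Ki variant:   {ki_mut:.1e} M")
print(f"Fold increase in Ki: {ki_mut / ki_wt:.0f}")
```

In this simplified picture, a mutation that leaves kon unchanged raises the Ki by the same factor by which it raises koff.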
Alteration of the QTVGG loop of cystatin SN by deletion or substitution drastically reduces the inhibitory activity toward papain (Bobek et al., 1994; Hiltke et al., 1999; Tseng et al., 2000).
This region is also essential for activity in rat cystatin S. Replacement of the QVVAG loop with LVL resulted in a protein with minimal activity toward papain (Bedi et al., 1998). The allelic variants of the QXVXG motif in cystatin SA (plus a co-variant in residue 120; see above) differ in their inhibitory activities (Saitoh et al., 1998). While SA1 (QIVGG) is a potent inhibitor of plant CPs and a good inhibitor of cathepsin K, SA2 (QIVDG) is a poorer one, especially for papain.
(iii) PW DOMAIN
As for the N-terminal region, the contribution of W106 to binding affinity (mainly by affecting koff) depends on the target enzyme. For cathepsin B, it makes a significant, although not dominant, contribution, but is less important for cathepsin S or cathepsin L, where the QXVXG and N-terminal regions, respectively, make the main contributions (Auerswald et al., 1995; Hall et al., 1995). Substitution of W106 with G in human cystatin C reduced the affinity for papain, actinidin, and cathepsins B and H by 300- to 900-fold (Bjork et al., 1996). Mutation of the PW motif in cystatin SN (P105G-W106G) had minimal effect on papain inhibition, but decreased the affinity for cathepsin C more than 100-fold (Tseng et al., 2000). Thus, for papain inhibition by the SD-type cystatins, the QXVXG region has the main effect on binding, while for cathepsin CPs all three conserved regions contribute.
(iv) OTHER DOMAINS
Remarkably, type 2 cystatins can inhibit mammalian legumain, a CP that belongs to a family (family C13) distinctly different from the papain-like CPs (family C1). Legumains perform protein-processing functions and have been shown to be important in antigen presentation in mammals (reviewed in Dickinson, 2002). Legumains have a strict requirement for Asn at the P1 position of the substrate. The Ki values for human cystatins C, E/M, and F with pig legumain are 0.2 nM, 0.0016 nM, and 10 nM, respectively (Alvarez-Fernandez et al., 1999). Recombinant cystatin C variants demonstrated that the papain/cathepsin and legumain inhibitory activities are independent, and that N39 is required for legumain inhibition. This residue is located in a loop on the opposite side of the papain-binding surface (Fig. 2). Although cystatin D has an asparagine in this region (Fig. 1), it was found to be non-inhibitory, perhaps due to an immediately adjacent positive, instead of a negative, charge highly conserved in vertebrate cystatins (Fig. 1). All 3 S-like cystatins lack an asparagine in this region, and so would be expected to be inactive against legumain.
Cystatin SN, but not S, SA, or chicken cystatin, was found to form a variant stable complex with papain in which the enzyme remains proteolytically active (Baron et al., 1999b). This complex does not apparently involve the normal docking of the inhibitor with the active site, since it forms even when the active site is blocked with the irreversible inhibitor E-64. The region involved in this interaction is not known.
| (IV) Cystatin Genes in the Mammalian Genome |
|---|
|
|
|---|
CST1-5, CST7, CSTP1, and CSTP2 have all been localized by fluorescence in situ hybridization (FISH) to 20p11.2 (Abrahamson et al., 1989; Freije et al., 1993b; Dickinson et al., 1994; Thiesse et al., 1994; Morita et al., 2000). Physical mapping by means of pulsed-field gel electrophoresis and gene-specific hybridization probes localized all seven genes to a cluster spanning no larger than ca. 365 kb, with CST3 at one end (Dickinson et al., 1994; Thiesse et al., 1994) (Fig. 5). CSTP2 and CST1 had previously been shown, by cosmid cloning, to be tandemly linked (Dickinson et al., 1993). At present, the public domain human genome map in the region of Chromosome 20 containing the known SD-type cystatin genes is incomplete. It may be significant that clones containing certain regions of the S-like cystatin genes are highly unstable in Escherichia coli, even in strains designed to stabilize unusual sequences (Millar et al., 1992; Dickinson, unpublished observations). Efforts to "walk" between genes were also frustrated by dispersed repetitive sequences found in the regions between the cystatin genes (Dickinson et al., 1993), and PCR screens of three YAC libraries for clones spanning the gene cluster were unsuccessful (Dickinson and Thiesse, unpublished observations). CST3 and CST4 have been physically linked in the genome sequence map, and placed on the telomere side of a ca. 300-kb gap (Fig. 5). A gene localized to the centromere side of this gap is currently designated as similar to cystatin SA. However, comparison of its sequence with those of known genes reveals it to be CSTP1 (data not shown), consistent with the physical mapping described above. Thus, the order and orientation of CST2, CST5, and CSTP2-CST1 within this cluster remain to be established. However, it is interesting to note that the genes in this cluster thus far localized are all tandemly oriented in a head-to-tail manner, suggesting that the gene cluster has evolved primarily by simple unequal crossover. Cystatin F (CST7) has been placed ca. 1 Mb centromeric to this cluster, and CRES (CST8) and three related genes are located within a ca. 150-kb region telomeric to CST3. This gene organization is conserved in the rat and mouse, and the rat salivary cystatin S gene (designated as CST4) has been mapped close to the CST3 ortholog (Alonso et al., 1997; Cornwall et al., 1999; Shoemaker et al., 2000). Therefore, all of these genes have been closely linked at least during mammalian evolution, but probably for much longer. Only a high-stringency Southern blot of the rat genome probed with rat cystatin S has been published (Cox and Shaw, 1992). It would be of interest to know if the rat genome has other rat CST4-like genes.
|
α2-HS, and kininogen map to a region of less than 0.5 Mb on chromosome 3q27 (James et al., 1996), consistent with models for a common evolutionary origin for these three cystatin-domain proteins. Cystatin A also maps to chromosome 3, although at 3q21 it is rather distant from the fetuin gene cluster. Cystatin B maps to 21q22.3. The human secreted phosphoprotein 24 gene (SPP2) has been mapped to 2q37-qter (Swallow et al., 1997).
| (V) Cystatin Gene Expression |
|---|
|
|
|---|
(1) HUMAN SD-TYPE CYSTATINS
In a recent comprehensive study, the distribution of human SD-type cystatin gene expression in 23 adult tissues was examined with the use of gene-specific riboprobes in a sensitive RNase protection assay (RPA), and sites of expression were localized by immunohistochemistry (Dickinson et al., 2002). Three patterns of expression were found: CSTP1 and CSTP2 were confirmed as non-expressed pseudogenes, CST3 demonstrated ubiquitous expression, although levels varied somewhat between different tissues, and the SD-cystatin genes CST1, 2, 4, and 5 were shown to be expressed in a differential tissue-specific manner. Most tissues did not have detectable levels of SD-type cystatin mRNA. Based on their distribution and level of expression, the SD-type cystatins could be divided into two subgroups. CST2 and CST5 were expressed only in the SMG and parotid gland, at levels comparable to those of CST3. CST1 and CST4 were both expressed in these tissues, but SMG mRNA levels were much higher than those of CST3. Expression was localized to the serous acini and demilunes. CST1 and CST4 were also both expressed at modest levels in the acini of the orbital lobe of the lacrimal gland, and the epithelial linings of the gall bladder and seminal vesicle. CST4 was found in the proximal convoluted tubules of the kidney and at trace levels in the prostate. For the tracheal sample used in this study for mRNA analysis, very low levels of CST1 were detected (Dickinson et al., 2002). In another sample, significant S-like cystatin expression was observed in the serous acini and demilunes of the tracheal gland. Northern blot analysis of other samples showed quite different levels between and among samples (Dickinson, unpublished observations).
These results are generally consistent with those of several earlier studies that have examined SD-type cystatin expression, primarily using Northern blotting or immunohistochemistry in a more limited number of tissues (e.g., Sabatini et al., 1989; Barka et al., 1991; Bobek et al., 1991; Freije et al., 1991; Takahashi et al., 1992). Salivary cystatins (identified by immunoassay) comprise about 10% of tear protein, and cystatins S and SN were identified in tear fluid (Barka et al., 1991). Cystatin D has also been reported in tear fluid (Freije et al., 1993a). A likely explanation is that cystatin D is not expressed in the orbital lobe of the lacrimal gland, but in one or more of the other glands that secrete in the eye. Non-purulent bronchial secretions contain up to 30 µg/mL (2.1 µM) S-type cystatins, and up to 6 µg/mL (0.4 µM) cystatin C (Buttle et al., 1990). Since SD-type cystatins are not produced by bronchial epithelial cells (Burnett et al., 1995), tracheal glands appear to be a major source. Cystatin S protein has been specifically identified by electrophoresis and sequencing in both broncho-alveolar and nasal lavage fluids (Lindahl et al., 1999). It will be of interest to examine CST1 and CST4 expression in additional tracheal gland samples, and to determine if they can be modulated by disease. In summary, SD-type cystatin gene expression is primarily restricted to serous-type acinar and demilune cells of anterior exocrine glands, and to secretory epithelia of a limited number of other tissues in the body. However, it is clear that, as a subfamily, they are more than salivary cystatins.
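The molar concentrations quoted above for bronchial secretions follow from dividing the mass concentration by the molar mass. The sketch below shows the arithmetic; the function name is hypothetical, and the molar mass of 14 kDa is a round value assumed for illustration rather than a figure taken from the source.

```python
def ug_per_ml_to_micromolar(conc_ug_per_ml, molar_mass_kda):
    """Convert a mass concentration (ug/mL) to a molar concentration (uM).

    1 ug/mL equals 1 mg/L, and 1 kDa equals 1000 g/mol, so the conversion
    reduces to dividing ug/mL by the molar mass expressed in kDa.
    """
    if molar_mass_kda <= 0:
        raise ValueError("molar_mass_kda must be positive")
    return conc_ug_per_ml / molar_mass_kda


ASSUMED_MOLAR_MASS_KDA = 14.0  # assumption: round value for a type 2 cystatin

for name, conc in [("S-type cystatins", 30.0), ("cystatin C", 6.0)]:
    micromolar = ug_per_ml_to_micromolar(conc, ASSUMED_MOLAR_MASS_KDA)
    print(f"{name}: {conc:g} ug/mL is about {micromolar:.2f} uM")
```

With this assumed molar mass, 30 µg/mL and 6 µg/mL come out at about 2.1 µM and 0.4 µM, in line with the values quoted in the text.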
Expression of human SD-type cystatin genes and CST3 was examined during pre- and post-natal development of the SMG (Dickinson et al., 2002). CST3 was expressed at modest levels before birth, and showed only a 2.7-fold increase between 2 and 9 months of age. In contrast, all four SD-type cystatin genes were expressed at trace levels before birth. Expression rose to 18-38% of adult levels during the first week of full term, then declined to 1-6% of adult levels by 1-2 months of age. Then, between 2 and 9 months, all four genes showed a dramatic, co-ordinate rise in expression to adult levels.
Very little is known regarding the mechanisms regulating human SD-cystatin gene expression. Independent lines of mice carrying a 22-kb CST1 transgene showed significant expression in the parotid and lacrimal glands, but not the SMG (Dickinson and Thiesse, 1995). This lack of SMG expression could reflect the absence of an SMG enhancer in the transgene, the lack of a cognate transcription factor in the mouse SMG, or the presence of a repressor. The last two possibilities would be consistent with the seromucous nature of the mouse SMG acini, and the lack of expression of CST1 in human mucous acini. Using phylogenetic footprinting, Shaw and Chaparro (1999) have found conserved motifs in the promoter region of salivary protein genes, including human cystatins. Their functionality remains to be tested. A major hindrance is the lack of cell lines that routinely express their endogenous SD-type genes. An immortalized human submandibular gland cell line (HSG) has been reported to express cystatins immunoreactive with a polyclonal anti-cystatin SN antibody when grown on Matrigel (Hoffman et al., 1996). However, preliminary tests with gene-specific riboprobes failed to detect expression of any SD-type cystatins in HSG cells (Dickinson and Thiesse, unpublished observations). Cross-reactivity of the antibody is an obvious concern, but this cell line warrants further study.
(2) RAT CYSTATIN S
The rat cystatin S gene has been cloned (Cox and Shaw, 1992). It has the typical type 2 cystatin 3-exon-2-intron structure, and the same GATAAA variant of the TATA box as the human SD-type cystatins (and cystatin C). However, it is sufficiently divergent in comparison with human SD-type cystatins (< 61% nucleotide identity between exons), such that rat and human probes would not be expected to cross-hybridize significantly. Phylogenetic analyses of the relationship of rat cystatin S and human SD-type cystatins are inconsistent (see above), and although there are similarities between the human and rat genes at the level of gene expression, there are significant differences.
In rats, development of the SMG at the molecular and histological levels continues post-natally (reviewed in Denny et al., 1997; see Nishiura and Abe, 1999, for early references). Cystatin S mRNA was undetectable by Northern blotting in 20-day-old fetuses and newborn or 10-day-old Sprague-Dawley rats, although trace levels of cystatin S mRNA were detected at 1 week by means of a sensitive quantitative reverse-transcriptase/polymerase chain-reaction (RT-PCR) (Shaw et al., 1990; Nishiura and Abe, 1999). Cystatin S mRNA levels were found to rise dramatically between 21 and 28 days. Expression was confined to the acinar cells and coincided with acinar cell differentiation (Shaw et al., 1990). Following this short period, cystatin S mRNA levels declined rapidly: By 32 days, expression was near the limits of detection by Northern blotting (Shaw et al., 1990). This developmental pattern has some similarities to that seen in humans. However, in the rat, the levels do not rise again. Cystatin S protein was undetectable in adult saliva by Western blotting (Bedi, 1991). Low-level SMG cystatin S gene expression detectable by RT-PCR was shown to persist out to 52 weeks, and mRNA levels in the adult rat were about 100-fold lower than those of cystatin C, which did not show marked developmental regulation (Nishiura and Abe, 1999). Expression of cystatins C and S in the adult Sprague-Dawley rat SMG detectable by in situ hybridization is confined to the acinar cells (Barka and van der Noen, 1994). As for human S-like cystatins, expression of rat cystatin S is not limited to the salivary glands. In contrast to the adult rat SMG and parotid, rat cystatin S (or an immunologically similar protein) is expressed at detectable levels in the acinar cells of the adult lacrimal gland, and in a subset of the sebaceous glands (Takahashi et al., 1992; Cohen et al., 1996). 
The detection of rat cystatin S by immunoelectron microscopy in normal osteoclasts (Moroi et al., 1997) suggests that this may also be a normal site of expression in adult animals, thereby implying a role for CPs in the degradation of bone extracellular matrix (see below).
SMG secretion in adult rats is regulated by both parasympathetic and sympathetic nerves of the autonomic nervous system. Chronic treatment with the β-adrenergic agent IPR causes reversible SMG enlargement and induces expression of several proteins. Cystatin S expression, but not that of cystatin C, is rapidly and dramatically up-regulated, reaching salivary levels as high as 1.6 mg/mL following chronic injection with IPR (Shaw et al., 1990; Bedi, 1991; Barka and van der Noen, 1994). Induction is not restricted to the SMG: In adult female rats, modest levels of cystatin S mRNA are induced in the parotid following IPR treatment. Lacrimal expression is not significantly affected.