13(3):238-275 (2002)     Crit Rev Oral Biol Med
© 2002 International and American Associations for Dental Research

CYSTEINE PEPTIDASES OF MAMMALS: THEIR BIOLOGICAL ROLES AND POTENTIAL EFFECTS IN THE ORAL CAVITY AND OTHER TISSUES IN HEALTH AND DISEASE

D.P. Dickinson

Medical College of Georgia, School of Dentistry, Department of Oral Biology and Maxillofacial Pathology, 1120 15th Street, Augusta, GA 30912; ddickins{at}mail.mcg.edu

(I) Introduction
    (A) OVERVIEW OF PEPTIDASES
    (B) OVERVIEW OF CPS
    (C) REGULATION OF CPS AND OTHER PEPTIDASES
(II) Clan CA
    (A) EVOLUTION OF CLAN CA ENZYMES
    (B) GENERAL PROPERTIES OF PAPAIN-RELATED (FAMILY C1A) CPS
    (1) Structure and activity
    (2) The proregion
    (C) BIOCHEMICAL PROPERTIES, EXPRESSION, AND NORMAL FUNCTIONS OF MAMMALIAN CPS OF SUBFAMILY C1A
    (1) Widely expressed cathepsins
    (a) Cathepsins B, H, and L
    (i) Properties and tissue distribution
    (ii) Functions of cathepsins B, H, and L
    (iii) Cathepsins B, H, and L in the oral cavity
    (b) Dipeptidyl peptidase I (DPP I)
    (i) Properties and tissue distribution
    (ii) Functions of DPP I and its role in pre-pubertal periodontitis and Papillon-Lefèvre and Haim-Munk syndromes
    (c) Cathepsins O and X
    (i) Properties and tissue distribution
    (ii) Functions
    (d) Cathepsin F
    (i) Properties and tissue distribution
    (ii) Functions of cathepsin F
    (2) Tissue-specific cathepsins
    (a) Cathepsin S
    (i) Properties and tissue distribution
    (ii) Functions of cathepsin S and its role in antigen presentation
    (iii) Potential roles for cathepsin S in oral tissues and the oral cavity
    (b) Cathepsin K
    (i) Properties and tissue distribution
    (ii) Potential functions of cathepsin K in oral tissues
    (c) Lymphopain
    (d) Cathepsin V and other cathepsin L-like sequences
    (i) Properties and tissue distribution
    (ii) Functions
    (e) Cathepsins J, P, CLRP, M, Q, R, and the testins
    (i) Properties and tissue distribution
    (ii) Functions
(III) Calpains
(IV) Mammalian Peptidases of Clan CD
    (A) THE LEGUMAINS
    (B) THE CASPASES
    (i) Properties
    (ii) Function of caspases in inflammation and apoptosis
    (iii) Apoptosis and oral tissues
(V) Mechanisms of Exposure of Tissues to Host CPs and the Consequences
    (i) RELEASE OF MATURE ENZYMES
    (ii) RELEASE OF PROENZYMES
    (iii) CONTROL OF RELEASED ENZYMES
(VI) The Proteolytic Cascade: Interactions between Enzymes and Their Inhibitors
(VII) Interaction of Host CPs with the Immune System
(VIII) CPs and Sjögren's Syndrome
(IX) CPs and Periodontal Disease
(X) CPs in the Oral Cavity
(XI) CPs and Tooth Development and Movement
(XII) CPs and Cancer
(XIII) CPs and Arthritis
(XIV) Summary and Conclusions
REFERENCES

   Abstract
Cysteine peptidases (CPs) are phylogenetically ubiquitous enzymes that can be classified into clans of evolutionarily independent proteins based on the structural organization of the active site. In mammals, two of the major clans represented in the genome are: the CA clan, whose members share a structure and evolutionary history with papain; and the CD clan, which includes the legumains and caspases. This review focuses on the properties of these enzymes, with an emphasis on their potential roles in the oral cavity. The human genome encodes at least (but possibly no more than) 11 distinct enzymes, called cathepsins, that are members of the papain family C1A. Ten of these are present in rodents, which also carry additional genes encoding other cathepsins and cathepsin-like proteins. Human cathepsins are best known from the ubiquitously expressed lysosomal cathepsins B, H, and L, and dipeptidyl peptidase I (DPP I), which until recently were considered to mediate primarily "housekeeping" functions in the cell. However, mutations in DPP I have now been shown to underlie Papillon-Lefèvre syndrome and pre-pubertal periodontitis. Other cathepsins are involved in tissue-specific functions such as bone remodeling, but relatively little is known about the functions of several recently discovered enzymes. Collectively, CPs participate in multiple host systems that are active in health and in disease. They are involved in tissue remodeling and turnover of the extracellular matrix, immune system function, and modulation and alteration of cell function. Intracellularly, CPs function in diverse processes including normal protein turnover, antigen and proprotein processing, and apoptosis. Extracellularly, they can contribute directly to the degradation of foreign proteins and the extracellular matrix. 
However, CPs can also participate in proteolytic cascades that amplify the degradative capacity, potentially leading to pathological damage, and facilitating the penetration of tissues by cancer cells. We know relatively little regarding the role of human CPs in the oral cavity in health or disease. Most studies to date have focused on the potential use of the lysosomal enzymes as markers for periodontal disease activity. Human saliva contains high levels of cystatins, which are potent CP inhibitors. Although these proteins are presumed to serve a protective function, their in vivo targets are unknown, and it remains to be discovered whether they serve to control any human CP activity.

Key words. Cysteine protease, peptidase, evolution, cathepsin, oral tissues, human


   (I) Introduction
As evidenced by the deleterious effects of salivary insufficiency, saliva is important in maintaining oral health, and it is largely taken for granted that much of this protective activity is mediated by the variety of different salivary proteins (reviewed in Schenkels et al., 1995). The salivary cystatin proteins (reviewed in Bobek and Levine, 1992; Henskens et al., 1996) are abundant salivary (and tear) inhibitors of CPs, a phylogenetically ubiquitous and diverse class of peptidase. Thus, it is generally assumed that one function of salivary cystatins in vivo is to provide protection in the oral cavity by inhibiting CPs. However, the in vivo target(s) for salivary cystatins has yet to be identified, and to date, no disease has been reported that is associated with a defect in a salivary cystatin gene. Therefore, the biological function of salivary cystatins can be inferred only from circumstantial data. What are potential sources of exposure of oral and nasopharyngeal tissues to CPs, and the consequences that could warrant a protective mechanism?

The number of identified mammalian CPs has grown considerably in the past few years. The purpose of this review is to summarize the salient features of these enzymes, concentrating on known (or potential) functions, and to indicate relevance to oral tissues. In addition, the potential for interactions with other peptidases (and their inhibitors) to form proteolytic networks will be examined. It is important to note that while some endogenous CPs have been shown to be involved in processes such as inflammation, antigen presentation, bone remodeling, and cancer, in only a few cases has their role been examined in oral tissues. Further, there are several CPs whose function remains to be established, but whose properties warrant investigation of potential oral sources. Exogenous sources of CPs (e.g., from pathogens) and the cystatins themselves will be the subjects of a separate review.

    (A) OVERVIEW OF PEPTIDASES
Enzymatic cleavage of peptide bonds is fundamental to almost every aspect of life, and peptidases represent about 2% of all gene products. Examples can be found in digestion, blood coagulation and fibrinolysis, processing of preproproteins such as collagen, immune function, development, and apoptosis (reviewed in Twining, 1994). Not surprisingly, peptidase genes are found in the genomes of all cellular organisms (and several types of virus), and there are arrays of proteolytic enzymes distributed in cellular and tissue compartments. Peptide bond scission proceeds via a nucleophilic attack on the carbonyl carbon, followed by a general acid-base hydrolysis. The general term peptidase is preferred for enzymes that catalyze this reaction, although protease and proteinase are commonly found in the literature. Peptidases are generally grouped into five major types (cysteine, serine, threonine, aspartate, and metalloproteinase) according to the mechanism used to generate the nucleophile in the active site (reviewed in Barrett et al., 1998). In the CPs, an activated cysteine residue is used as the nucleophile and a histidine residue as the proton donor. In some enzymes, a third residue serves to orient the His residue.

The CPs comprise a complex set of enzymes (reviewed in Rawlings and Barrett, 1994; Kirschke et al., 1995; Chapman et al., 1997; Barrett et al., 1998). CPs of various physical and biochemical types are found in all kingdoms of organisms, indicating that they are among the most ancient proteins. Phylogenetic analysis is proving to be a powerful tool for elucidating the relationships between and among the large numbers of CP-like protein sequences that have been identified thus far, and phylogeny now provides the foundation for classifying the CPs (and other peptidases) (Rawlings and Barrett, 1993, 1994; see Barrett et al., 1998, for a detailed compilation of peptidases using this scheme, and detailed descriptions of their enzymatic properties). However, in the literature, descriptions of the different groups of CPs and their relationships to each other are complicated by differences in usage of the hierarchical phylogenetic terms class, superfamily, family, and subfamily. For example, the latter three terms have all been applied to papain-like cysteine peptidases (Karrer et al., 1993; Rawlings and Barrett, 1993; Berti and Storer, 1995). Used in a phylogenetic context (that is, where evolutionary relationships are considered), these terms implicitly mean a grouping based on the position of a node on a phylogenetic tree that links all members of the group, and reflects the degree of divergence of that group from other groups. Strictly, a family of proteins (or matching domains of chimeric proteins) should be a monophyletic group—that is, the members share a most recent common ancestor that is not an ancestor of one or more proteins not included in the group. The term subfamily is often used to denote clear divisions within a family, such as functional differences in proteins of otherwise similar sequences. 
The term superfamily is often used to denote groups of proteins that have relatively low (but detectable) sequence similarity to each other, but which have structural and functional properties consistent with a common evolutionary origin. In the system of Rawlings and Barrett, the term family is used to denote recognizable groups such as the papain-like enzymes. In a family, every member has a statistically significant relationship to at least one other member of the family (at least in the sequence of residues involved in catalytic activity), which implies evolution from a common ancestral peptidase. Deep divergences within the phylogenetic tree generally warrant the designation of the main branches as subfamilies. The term clan denotes groups of families for which there are indications of evolutionary relationships (e.g., active site arrangements, common three-dimensional structures) but which lack statistically significant similarities in sequence. CP clans are designated CA, CD, etc., and families by C+number (e.g., C12).
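
The naming convention described above (two-letter clan labels such as CA and CD; family labels such as C12; subfamily labels such as C1A) can be sketched in a few lines of Python. This is only an illustrative parser under stated assumptions: the names PeptidaseId, parse_family, and is_clan are hypothetical, and the regular expressions encode nothing more than the label shapes quoted in the text.

```python
import re
from typing import NamedTuple, Optional


class PeptidaseId(NamedTuple):
    catalytic_type: str        # first letter, e.g. "C" for cysteine peptidases
    family_number: int         # e.g. 1 in "C1A", 12 in "C12"
    subfamily: Optional[str]   # e.g. "A" in "C1A"; None when no subfamily is given


# Assumed label shapes: family = letter + number (+ optional subfamily letter);
# clan = catalytic-type letter + one further letter.
_FAMILY_RE = re.compile(r"^([A-Z])(\d+)([A-Z])?$")
_CLAN_RE = re.compile(r"^[A-Z][A-Z]$")


def parse_family(label: str) -> PeptidaseId:
    """Split a family or subfamily label such as 'C12' or 'C1A' into its parts."""
    match = _FAMILY_RE.match(label)
    if match is None:
        raise ValueError(f"not a family/subfamily label: {label!r}")
    return PeptidaseId(match.group(1), int(match.group(2)), match.group(3))


def is_clan(label: str) -> bool:
    """Return True for two-letter clan labels such as 'CA' or 'CD'."""
    return _CLAN_RE.match(label) is not None


if __name__ == "__main__":
    print(parse_family("C1A"))
    print(parse_family("C12"))
    print(is_clan("CA"), is_clan("C1"))
```

The sketch deliberately stores no membership information; which family belongs to which clan has to come from the classification itself.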

    (B) OVERVIEW OF CPS
From sequence and structural comparisons, it is clear that several types of CPs had independent origins, and there are at least six distinct clans consisting of 43 families, of which over half are from viruses (Rawlings and Barrett, 2000). Most and perhaps all clans represent convergent evolution, since in the majority of cases the enzymes have been shown to have a distinctly different organization of catalytic residues. Enzymes in clan CA (papain-related) have the catalytic residues in the order Cys....His....Asn/Asp, while enzymes in clan CD (legumain-related) have an active site catalytic dyad with a His-Gly-spacer-Ala-Cys motif (Chen JM et al., 1998). In the oral cavity and the nasopharynx, the substantial majority of CPs encountered—whether from the host, from viruses, bacteria, parasitic protozoa or helminths, or the diet—are from these two clans.

CPs are best known from members of the CA clan that were purified by classic biochemical techniques: the plant enzyme papain and the related mammalian lysosomal cathepsins B, H, L, and DPP I. (It should be noted that the term cathepsin is a general term for a peptidase, especially lysosomal, with an acidic pH optimum that is involved in protein degradation, and is not restricted to peptidases of any one catalytic type. For example, cathepsin D is an aspartic peptidase.) More recently, molecular cloning has been used to identify other related mammalian CP cathepsins—primarily human—and now the human genome is known to contain at least 11 related, but distinct, CP cathepsins: B, F, H, K, L, O, S, V, X, DPP I, and lymphopain. At this time, a search of the near-complete human genome database does not reveal any additional functional genes (data not shown), and thus these cathepsins may represent the entire human complement of enzymes most closely related to papain. Most of the 11 CP cathepsins evolved relatively early and are present in all mammals. The literature concerning these proteins has been greatly complicated by the use of the same letter to designate different proteins, and different letters to designate the same protein. Some of the various designations, and presently accepted or recommended ones, are summarized in the Table, together with the human chromosomal locations. Amino acid identity between and among these mammalian cathepsins is high in the vicinity of the active site, but overall levels of similarity generally range from 20 to 60% (Wiederanders et al., 1992). However, the recently evolved cathepsin V shares 78% similarity with cathepsin L (see below). Other cathepsin activities have been described in the literature, but their relationship to the 11 listed above is uncertain (Kirschke et al., 1995). 
In addition to other, evolutionarily independent, types of CPs in the human genome (e.g., clan CD enzymes), there are other CPs, such as cancer procoagulant, that remain to be characterized.
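
Percentages such as those quoted above are computed from pairwise sequence alignments. The following is a minimal sketch of one common way to compute percent identity, assuming two already-aligned sequences of equal length with "-" marking gaps. The function name percent_identity and the choice to skip gapped columns are assumptions made here, not the method of the cited studies, and the example strings are toy data rather than cathepsin sequences.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identical residues over the columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == gap or res_b == gap:
            continue  # assumption: gapped columns are left out of the denominator
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy aligned fragments, for illustration only.
    print(percent_identity("ACD-F", "ACE-F"))
```

Other denominators (alignment length, or the length of the shorter sequence) give different values, so published percentages are comparable only when the convention is the same.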

    (C) REGULATION OF CPS AND OTHER PEPTIDASES
It is axiomatic that peptidases represent potentially dangerous enzyme activities that must be subject to strict control and containment within appropriate compartments, and that a failure in these controls could lead to pathology. Several mechanisms are used to regulate peptidase activity, in addition to transcriptional and post-transcriptional controls (Twining, 1994; Chapman et al., 1997). Most enzymes are synthesized as inactive zymogens that must be activated by proteolytic cleavage. This may be autocatalytic under specific conditions, such as low pH. Release of peptidases from a cell is generally a controlled process. The CPs are readily inactivated by oxidation of the active site cysteine and require a reducing environment for full activity. Many human CPs are unstable at neutral pH and require an acidic pH for full activity. Once activated, enzyme activity can be lost by degradation. A major control governing peptidase activity is the presence of protein inhibitors that bind tightly to the enzyme, blocking substrate binding. It must be emphasized that, in many basic tissue reactions (e.g., inflammation), CPs probably do not function in isolation. Rather, they are components of complex networks constituting representatives of different types of peptidase and their respective inhibitors. Within these networks, cross-activation of zymogens and cross-inactivation of inhibitors can provide an amplification of an initial perturbation.

Numerous low-molecular-weight CP inhibitors have been developed. One of the most widely used is E-64 [L-trans-epoxysuccinyl-leucylamino(4-guanidino)butane], a potent irreversible inhibitor of many (but not all) CPs that forms a thioether with the active site cysteine (Barrett et al., 1982). The vinylsulfone N-morpholinurea-leucine-homophenylalanine-vinylsulfone-phenyl (LHVS) is a potent inhibitor of cathepsin S, and quite effective against cathepsin F (Shi et al., 2000).


   (II) Clan CA
The CA clan is the largest characterized to date: It consists of 25 families, of which many are viral. Three are represented in prokaryotes, and four in eukaryotes (including protozoa, yeast, plants, and animals) and in some cases prokaryotes (Rawlings and Barrett, 2000). This clan includes the ancient papain-related enzymes found in bacteria, protozoan and metazoan organisms (family C1), and the calpains (C2). Several viral CPs also have papain-like structures (e.g., C28, foot-and-mouth disease virus leader peptidase). Many members of clan CA can be inhibited by the cystatins, and also by the synthetic inhibitor E-64.

    (A) EVOLUTION OF CLAN CA ENZYMES
The papain-related enzyme family C1 is grouped into two subfamilies: C1A, composed of "papain-like" enzymes (in a general sense; i.e., statistically similar in sequence and related in structure to papain), and C1B, consisting of intracellular bleomycin hydrolases and related bacterial aminopeptidases. These subfamilies display significant similarity only in the vicinity of the active-site residues, and represent the earliest evolutionary divergence in the C1 enzymes (Berti and Storer, 1995). Clan CA enzymes arose in bacteria in the Archean (4000 to 2500 Ma), and papain-related CPs are phylogenetically ubiquitous (reviewed in Tort et al., 1999).

Several phylogenetic analyses of the C1A enzymes have been reported, often in conjunction with structural considerations, although not all of these studies have included a statistical analysis of branch support (e.g., Wiederanders et al., 1992; Berti and Storer, 1995; Santamaria et al., 1999; Tort et al., 1999; Wex et al., 1999; Rawlings and Barrett, 2000). An alignment of vertebrate cathepsins and related proteins and two representative plant enzymes is shown in Fig. 1, and a phylogenetic tree based on this alignment in Fig. 2. Each branch on the tree represents either a duplication of a common ancestor (leading to paralogous genes) or speciation (leading to orthologs). The various trees are generally consistent, regardless of the methodologies used, and the C1A family of "papain-like" enzymes can be divided into two major, ancient groups (subfamilies), which have been referred to as Branch A and Branch B (Tort et al., 1999). These branches arose by duplication of a common ancestor over 2700 Ma, before the eukaryote-prokaryote divergence. A few other enzymes from parasitic and free-living protozoa and metazoa, as well as the recently described human cathepsin O, do not strongly localize to either branch (although they appear to group weakly with Branch B), and the timing of their divergence from the ancestors of the two major groups is uncertain.




Figure 1. Protein sequence alignments of mature regions of known human cathepsins, other rodent cathepsins and related proteins, and plant papain and aleurain. Although only human and selected sequences are shown, the original dataset used for generating the alignments contained 43 vertebrate and two plant sequences, identified by a BLAST search of the GenBank non-redundant database (Altschul et al., 1990). The sequence abbreviations and GenBank entries for the sequences shown are: hDPP I, human dipeptidyl peptidase I, P53634; hB, human cathepsin B, NP_001899; hV, human cathepsin V, AAC23593; hL, human cathepsin L, NP_001903; rtestin, rat testin, P15242; mP, mouse cathepsin P, NP_036137; rCLRP, rat cathepsin L-related protein, I58002; rQ, rat cathepsin Q, AAF01247; mM, mouse cathepsin M, AAF68224; hK, human cathepsin K, P43235; hS, human cathepsin S, P25774; hH, human cathepsin H, P09668; Aleurain, plant, P05167; Papain, plant, P00784; hF, human cathepsin F, NP_003784; hX, human cathepsin X, NP_001327; hLym, human lymphopain, P56202; and hO, human cathepsin O, NP_00135. Propeptide cleavage sites were obtained from the GenBank entries, or from preliminary alignments. Alignments of the predicted mature proteins were generated based on the default settings of ClustalX (Thompson et al., 1997). Introduced gaps are shown as "-". The alignment is in good agreement with that of other published ones, including those based on structure. To derive consensus sequences, we processed alignment files using the public domain software BOXSHADE (written by K. Hofmann and M. Baron, www.ch.embnet.org/software/BOX_form.html). Predominant identical residues (> 50%) at a position are shown with a black background, predominant similar residues on a grey background. The majority consensus residue is shown under each alignment. An uppercase letter shows a residue conserved in all aligned sequences. 
An * under the consensus sequence denotes the active-site cysteine and histidine residues conserved in all functional peptidases.
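
The consensus rule stated in the legend (majority residue at more than 50% of sequences, uppercase when the residue is conserved in every sequence) can be sketched as follows. This is not the BOXSHADE implementation; the function name consensus, the treatment of gaps as counting against the majority, and the use of a blank for columns without a majority are assumptions, and the input strings are toy data.

```python
from collections import Counter
from typing import List


def consensus(alignment: List[str], threshold: float = 0.5, gap: str = "-") -> str:
    """Return a consensus line for equal-length aligned sequences.

    Uppercase: residue present in every sequence at that column.
    Lowercase: residue present in more than `threshold` of the sequences.
    Blank:     no residue reaches the threshold.
    """
    if not alignment:
        return ""
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    line = []
    for column in zip(*alignment):
        counts = Counter(res.upper() for res in column if res != gap)
        if not counts:
            line.append(" ")  # column made only of gaps
            continue
        residue, count = counts.most_common(1)[0]
        if count == n_seqs:
            line.append(residue)            # conserved in all sequences
        elif count / n_seqs > threshold:
            line.append(residue.lower())    # simple majority
        else:
            line.append(" ")
    return "".join(line)


if __name__ == "__main__":
    toy = ["CGSCWA", "CGSCYA", "CGACF-"]  # toy alignment, not real cathepsin data
    print(repr(consensus(toy)))
```

Shading of similar (as opposed to identical) residues would additionally need a table of residue similarity groups, which is left out of this sketch.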

 


Figure 2. Phylogenetic tree of vertebrate and select plant CPs. The full alignment of CPs described in the legend to Fig. 1 was used, except that bovine cathepsin L was excluded, due to the incomplete C-terminal sequence. The PAUP 4b4a software package (Swofford, 2000) was used to search for trees by means of the distance optimality criterion and default parameters, and starting trees obtained by neighbor-joining. The tree shown is an unrooted 50% majority rule consensus tree obtained by the bootstrap method (500 replicates) with heuristic search. Values shown adjacent to branches are percentage support for a branch. The large arrow indicates the position of the presumptive root of the tree: CPs on the group of branches to the right of this point comprise Branch A, those to the left Branch B (see text for explanation of Branches A and B). The abbreviations used for species are: bo, bovine; ch, chicken; h, human; m, mouse; pig, pig; r, rat; ra, rabbit; rh, rhesus; sh, sheep; and zf, zebrafish.

 
Branch A divides into three strongly supported major groups: the cathepsin B-like, cathepsin X-like, and DPP I groups. Their order of origin is uncertain. Cathepsin B-like enzymes have been found in plants, the primitive protozoan Giardia, trypanosomatids, nematodes, trematodes, arthropods, and vertebrates; DPP I has been found in trematodes and vertebrates, and cathepsin X-like enzymes in nematodes and vertebrates. Thus, DPP I and cathepsin X likely evolved by duplication of an ancient cathepsin B-like ancestral gene. Branch B is more complex. Analysis of sequences from a wide phylogenetic range identified four major groups (subfamilies), plus two minor groups of enzymes from Dictyostelium, certain parasitic and free-living protozoa, and a nematode (Berti and Storer, 1995). All groups appear to have evolved early in the history of Branch B, reflecting a rapid initial duplication and divergence. The major groups are the cruzipain (cruzain)-like (corresponding to the Ddis1 group of Berti and Storer, 1995), cathepsin L-like, papain-like, and cathepsin H/aleurain-like enzymes. Cruzipain-like enzymes are found in plants, Dictyostelium, free-living and parasitic protozoa, nematodes, trematodes, and vertebrates (cathepsin F and lymphopain), while cathepsin H and aleurain-like enzymes (e.g., aleurain, oryzain gamma, maize cysteine proteinase 2) have thus far been found in vertebrates and plants. Papain-like enzymes (in the strictest sense of the term, i.e., a monophyletic group, e.g., papain, chymopapain, caricain, stem bromelain, actinidain, vignain, oryzain alpha) are restricted to plants, while cathepsin L-like enzymes are restricted to nematodes, trematodes, cestodes, arthropods, and vertebrates.
Since they are represented in plants and animals, the cruzipain and the cathepsin H/aleurain groups must have evolved prior to the plant-animal divergence around 1600 Ma, while the cathepsin L-like group must have arisen before the divergence of nematodes from the lineage leading to arthropods and chordates, around 1200 Ma. The L-like cathepsins S and K have evolved more recently, and may be vertebrate-specific, pre-dating the evolution of bony fish over 450 Ma.

The mammalian cathepsin L-like CPs (cathepsins L, V, K, and S and the rodent M, J/P, Q, R, CLRP subgroup) form a distinct, well-supported (99% bootstrap) group (Fig. 2). Cathepsin F and lymphopain are also related to each other, although with weaker support (65% in this analysis). Significantly, gene pairs with a later common ancestor have tended to remain together on the chromosome (see Table). Thus, the cathepsin pair F and lymphopain co-localizes to 11q13.1-13.3, K and S to 1q21, and L and V to 9q21-22. Members of each pair also have intron/exon organizations that are similar to each other. Other cathepsins map to unique positions in the human genome and tend to have unique intron/exon arrangements. Cathepsins L and V show 78% sequence similarity, consistent with a relatively recent gene duplication of an ancestral cathepsin L gene (Brömme et al., 1999; Itoh et al., 1999). They are sufficiently similar to suggest that cathepsin V evolved after the mammalian radiation: efforts to find a mouse ortholog of cathepsin V have failed, and thus the distribution of cathepsin V in mammals may be limited (Brömme et al., 1999). In general, vertebrate cathepsins representing phylogenetically ancient types (cathepsins B, X, L, H, O, F, and DPP I) are ubiquitously expressed in mammalian tissues, although often at quite different levels in different sites (see below). It should be noted that this does not preclude specific functions in certain tissues, as has been shown for cathepsins B, L, and DPP I (see below). However, within Branch B, other cathepsins have evolved tissue-specific patterns of expression (see below), primarily associated with cells of the immune system.


TABLE Human and Rodent Cathepsins
 
    (B) GENERAL PROPERTIES OF PAPAIN-RELATED (FAMILY C1A) CPS
    (1) Structure and activity
The three-dimensional structures of several papain-related enzymes have been determined (Kirschke et al., 1995; reviewed in Turk et al., 1997, 1998; McGrath, 1999). With the exception of DPP I, which is oligomeric, all are relatively small monomers of 20 to 35 kDa. Some (e.g., cathepsins B, H, and L) undergo internal cleavage to produce two-chain forms, and many enzymes are glycosylated. The different proteins are similar in the number and position of α-helices and β-pleated sheets, and most relative insertions and deletions between the proteins occur in the loops and turns linking these elements, consistent with a common ancestry. The molecules are bi-lobed. The two domains are designated L and R, and the L domain consists of the majority of the N-terminal half of the protein, while the R domain consists of the C-terminal half of the molecule and the most N-terminal residues. The catalytic site is located in a cleft between the lobes: the catalytic Cys on the L domain, the His opposite on the R domain. Typically, the structure is stabilized by three disulfide bonds.

In contrast to many other peptidases, in no case has a CP cathepsin been shown to have a single specific substrate, although they do differ considerably in their preferred cleavage site. Polypeptide substrates bind along the cleft in an extended conformation. Binding sites for substrate residues N-terminal to the cleaved peptide bond are designated S1, S2, etc.; those C-terminal are designated S1', S2', etc. (where S1 and S1' are proximal to the cleaved bond). Similarly, P1, P2, etc., and P1', P2', etc., are used to designate the corresponding substrate residues (Barrett, 1994). Only the S2 subsite is a real pocket: the other sites are shallow indentations on the surface of the enzyme. Most enzymes favor a hydrophobic residue at the P2 site, whose side chain projects down into a hydrophobic S2 pocket, but papain-like CPs vary widely in their accommodation of an aromatic residue in this position. Some (e.g., cathepsin B, cruzipain) will accept an arginine at this position by forming a salt bridge. Interactions with the substrate's P3 and P2' residues involve the side chains, while P2, P1, and P1' involve both main-chain and side-chain contacts (reviewed in Turk et al., 1998; McGrath, 1999). Most enzymes are endopeptidases, but cathepsin B has strong carboxypeptidase activity, whereas cathepsin H has strong aminopeptidase and limited endopeptidase activity (reviewed in Kirschke et al., 1995). Cathepsin B has an 18-residue insertion proximal to the active-site cleft that forms an occluding loop. This restricts access to potential substrates in the prime sites and helps provide an anchor for the C-terminal carboxyl group. Flexibility in the loop facilitates endopeptidase activity (reviewed in Mort and Buttle, 1997; McGrath, 1999). In cathepsin H, an 8-residue segment of the proregion, the minichain, remains attached via a disulfide bond and restricts access to the non-primed sites (reviewed in Turk et al., 1997; McGrath, 1999).
The plant aminopeptidases aleurain and oryzain share sequence similarity with cathepsin H, and presumably a similar structure-function relationship (Kirschke et al., 1995; reviewed in Turk et al., 1997, 1998). The cystatins are CP inhibitors that can place N-terminal residues along the active site in the same orientation as a substrate, thereby occupying the S2 subsite, and also place hairpin loop regions in the active site (Turk et al., 1997). Just as the CP loops can interfere with the binding of polypeptide substrates, they can also interfere with the binding of cystatins. For example, cystatin C inhibits cathepsin B less well than cathepsin L. There are also proteins with homology to papain that lack peptidase activity due to substitutions in the active site (e.g., the testins, see below).
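
The subsite nomenclature described above (P residues numbered outward from the cleaved bond, primed on the C-terminal side, each binding in the matching S subsite) can be made concrete with a short sketch. The function names label_substrate and subsite_for, the argument n_unprimed, and the peptide in the example are hypothetical and chosen only for illustration; the example does not describe a real cathepsin substrate.

```python
from typing import List, Tuple


def label_substrate(peptide: str, n_unprimed: int) -> List[Tuple[str, str]]:
    """Label each residue of `peptide` relative to the cleaved bond.

    `n_unprimed` is the number of residues N-terminal to the cleaved bond,
    so the bond lies between peptide[n_unprimed - 1] (P1) and
    peptide[n_unprimed] (P1').
    """
    if not 0 < n_unprimed < len(peptide):
        raise ValueError("the cleaved bond must lie between two residues")
    labels = []
    for index, residue in enumerate(peptide):
        if index < n_unprimed:
            labels.append((f"P{n_unprimed - index}", residue))       # N-terminal side
        else:
            labels.append((f"P{index - n_unprimed + 1}'", residue))  # C-terminal side
    return labels


def subsite_for(position: str) -> str:
    """Map a substrate position (e.g. "P2", "P1'") to its enzyme subsite name."""
    if not position.startswith("P"):
        raise ValueError(f"not a substrate position: {position!r}")
    return "S" + position[1:]


if __name__ == "__main__":
    # Hypothetical pentapeptide cleaved after its third residue.
    for position, residue in label_substrate("AFRGL", 3):
        print(position, residue, "binds in", subsite_for(position))
```

In this toy case the second residue is labeled P2, the position whose side chain the text describes as projecting into the S2 pocket.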

The overall conservation of cathepsin CP structure is maintained in both the ancient A and B branches. However, sensitivity to inhibition by cystatins is not uniformly distributed. In particular, cathepsin B is only poorly inhibited, while cathepsin X is resistant. Taking into account the ancient origins of DPP I, and the more recent mammalian origin of salivary cystatins (Dickinson, manuscript in preparation), it seems more probable that the target(s) for salivary cystatins is a member of branch B.

    (2) The proregion
Typically, Clan CA enzymes are synthesized as inactive preproenzymes with a signal peptide and a multifunctional N-terminal proregion. The C1A subfamily proregion varies between 38 and 251 residues, depending on the enzyme and, to a smaller extent, on the species. Branch B-type CPs have a proregion around 60-100 residues in length. Cathepsin X is unusual in having a very short proregion (38 or 41 residues, depending on the prepeptide cleavage site), while DPP I has a long 206 (human)-residue prosegment due to an N-terminal extension. Alignments of the proregions of the 11 known human cathepsins, novel rodent cathepsins, and those of papain and aleurain are shown in Fig. 3. It can be seen that the definitive human Branch B-type CPs (cathepsins L, V, S, K, and H), as well as the rodent placental cathepsins (M, P, and Q), have related proregions with similarities to those of papain and aleurain (Fig. 3A). The rodent proteins CTLA-2 and the testins also have homology to the Branch B proregions. As noted above, the origins of the related cathepsins F and lymphopain, and the more distantly related cathepsin O, are not well-resolved with respect to Branches A and B. However, cathepsin F and lymphopain show clear similarity to other Branch B enzymes in their proregions, and they have quite good matches to three conserved motifs within the Branch B-type prodomain (see below), consistent with the closer relationship to Branch B suggested by the tree constructed with the use of the mature enzyme sequences (Fig. 2). The cathepsin O proregion has a modest overall similarity, and partial matches to the three motifs. In contrast, the proregions of Branch A-type CPs (cathepsins B, X, and DPP I) are distinct from those of Branch B (Fig. 3), and lack matches to the conserved motifs. The proregion of cathepsin X shows some similarity to that of cathepsin B (Fig. 3B), while that of DPP I appears to be unrelated to other CPs (Fig. 3C).
This fits with the results summarized above from phylogenetic analyses of larger numbers of enzymes from diverse species that identify these two distinct major branches. Despite the low sequence similarity of the proregions of cathepsins B, L, and K, analysis of their crystal structures shows that they have a common fold (Coulombe et al., 1996; LaLonde et al., 1999). Most likely this represents high divergence of ancestrally related sequences, but convergent evolution of ancestrally unrelated domains cannot be excluded at present.




 
Figure 3. Proregion alignments of known human cathepsins and selected rodent and plant proteins. Proregion alignments (excluding signal peptides and the N-terminal extension of cathepsin F) are shown in Panel A (Branch B proteins), Panel B (Branch A cathepsins B and X) and Panel C (DPP I). The same dataset described in Fig. 1 was used to generate the alignment, with the inclusion of rat CTLA-2β (CTLA-2b, GenBank P12400). Signal peptide and propeptide cleavage sites were obtained from the GenBank entries, or from preliminary alignments. Many of the proregions are relatively short, and at least some are potentially unrelated to each other. To avoid spurious alignments, we constructed a preliminary tree of the mature protein dataset using ClustalX. This was used to guide the construction of sets of profile alignments of proregions of most closely related proteins (e.g., cathepsin L proregions, cathepsins F, lymphopain and O). These profiles were then aligned to each other, beginning with the cathepsin L profile, and progressively adding more distantly related Branch B vertebrate cathepsins. After aligning cathepsins V, we added the placental cathepsins (M, P, and Q), K, S, and H, papain, and then aleurain, followed by the F, lymphopain, and O profiles, then testin and finally CTLA-2β, to produce a Branch B proregion alignment. A similar strategy was used to align the cathepsin B profile with the cathepsin X profile. Efforts to align cathepsin B and DPP I profiles with the Branch B alignment produced a large number of gaps, suggesting spurious alignments. The cathepsin X profile produced an alignment with the Branch B proregion showing modest similarity and a limited number of gaps. This could have been due to chance, or it could reflect a transitional form between a Branch A and Branch B type proregion. Predominant identical residues (> 50%) at a position are shown with a black background, predominant similar residues on a grey background.
The majority consensus residue is shown under each alignment. An uppercase letter shows a residue conserved in all aligned sequences (excluding cathepsin O, which was not included due to its extensive divergence). Consensus motifs implicated in Branch B proregion function are shown above the alignment in Panel A; residues implicated in cathepsin B proregion function are denoted by an * under the consensus sequence in Panel B.
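
The consensus convention described in this legend can be sketched in a few lines of Python. This is only an illustration, not the procedure used to draw the figure: the function name, the use of lowercase for a majority (> 50%) residue, and the "." placeholder for positions without a majority are assumptions, and the toy sequences are invented rather than real proregions.

```python
from collections import Counter

def consensus_line(aligned, threshold=0.5, gap="-"):
    """Return a consensus string for equal-length aligned sequences.

    Uppercase: residue present in every sequence at that column.
    Lowercase: residue present in more than `threshold` of the sequences
               (an assumed convention for this sketch).
    '.':       no residue exceeds the threshold; ' ' for all-gap columns.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    out = []
    for column in zip(*aligned):
        counts = Counter(c.upper() for c in column if c != gap)
        if not counts:
            out.append(" ")
            continue
        residue, n = counts.most_common(1)[0]
        if n == len(column):
            out.append(residue)            # conserved in all sequences
        elif n / len(column) > threshold:
            out.append(residue.lower())    # predominant but not universal
        else:
            out.append(".")
    return "".join(out)

# Invented toy alignment (not real cathepsin proregions).
toy = ["ACDE", "ACDF", "ACGF", "A-GH"]
print(consensus_line(toy))
```

Gaps count against a residue here, so a column with one gap can never be reported as fully conserved; a different treatment of gaps would be equally defensible.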

 
The proregion is multifunctional. That of cathepsin L is essential for proper folding of the enzyme, and the proregions of other enzymes, e.g., aleurain, cannot substitute for it (Tao et al., 1994; Vernet et al., 1995). Many mammalian cathepsins are predominantly lysosomal. Newly synthesized CPs are routed via the signal peptide to the endoplasmic reticulum and then to the lysosomes (vacuoles in plants) or for secretion (reviewed in Rawlings and Barrett, 1994; Mort and Buttle, 1997). Mammalian lysosomal CPs have one or more potential N-glycosylation sites that can be located in the proregion, the mature enzyme, or both. Phosphorylation of certain mannose residues and binding to mannose 6-phosphate receptors in the Golgi results in targeting to the lysosome. However, there is evidence for mannose 6-phosphate-independent transport (reviewed in von Figura and Hasilik, 1986). A 9-residue sequence in the proregion of cathepsin L was shown to mediate pH-dependent, mannose 6-phosphate-independent association with microsomal membranes (McIntyre et al., 1994). This motif has similarity to the yeast vacuolar-sorting sequences. A homologous motif in the protozoal cruzipain (cruzain) was subsequently shown to be necessary and sufficient for lysosomal targeting of green fluorescent protein in trypanosomes (Huete-Perez et al., 1999), suggesting that this motif represents an ancient lysosomal targeting system. This Branch B-type proregion peptide lysosome targeting motif is absent from the cathepsin B proregion (Fig. 3).

The N-terminal propeptide segment has been shown to be a potent and relatively specific inhibitor that serves to maintain the precursor in an inactive state until cleaved (Carmona et al., 1996; reviewed in Turk et al., 1997). The proregions bind to the active site in a linear, extended conformation, but in the reverse orientation to normal substrates. This binding is distinct from the interaction of CPs with cystatins, and likely provides a combination of a good fit to the site with resistance to proteolysis. The inhibitory activities of the cathepsin L and B proregions are sharply pH-dependent; for cathepsin L, the Ki is less than 0.5 nM at a pH above 4.3, but rises to 3.0 nM at pH 4.0, consistent with a low pH-dependent autoactivation of the enzyme (Fox et al., 1992; Carmona et al., 1996). Construction of cathepsin L proregion peptides containing various N- and C-terminal deletions allowed the inhibitory domain to be localized to a stretch of 30 residues located just following the peptide lysosome targeting motif (Carmona et al., 1996). Deletion of this domain caused a more than 200-fold increase in the Ki value. This domain contains the so-called ERFNIN motif: the conserved propeptide sequence E X3 R X2 (I/V) F/W X2 N X3 I X3 N previously identified in Branch B enzymes, but not other members of clan CA, from a phylogenetically diverse group of organisms (Karrer et al., 1993) (see Fig. 3). It is located within an alpha helix, and the conserved residues are in contact with the surface of the enzyme. Interestingly, this motif is also found in the mouse cytotoxic T-lymphocyte antigen-2 (CTLA-2) α and β gene products (Karrer et al., 1993; see Fig. 3A). These proteins show significant similarity to the L-type proregions (Denizot et al., 1989), and the CTLA-2β protein has been shown to be a good inhibitor of cathepsin L (Ki = 24 nM), H (IC50 = 67 nM), and papain (Ki = 25 nM), but not cathepsin B (Delaria et al., 1994).
It is likely that the CTLA-2 genes evolved by duplication of an ancestral L-type cathepsin gene and subsequent deletion of the enzyme-coding region. However, it should be noted that CTLA-2β is a relatively non-specific inhibitor and can exist as a dimer or tetramer. Thus, it may be functionally distinct from the L-type proregions. The species distribution of CTLA-2-like proteins has not been explored. Inhibitory regions of the 56-amino-acid rat cathepsin B proregion have also been examined (Chen et al., 1996). Two regions were identified that caused 150- and 625-fold increases in the Ki. Alanine scanning identified W-24p and C-42p (rat procathepsin B numbering) as the most important residues within these regions.
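
The ERFNIN motif as written above, E X3 R X2 (I/V) F/W X2 N X3 I X3 N, maps directly onto a regular expression, which gives a quick way to screen a proregion sequence for a strict match. The sketch below is illustrative only: the function name is hypothetical, the test peptide is synthetic (alanine filler around the fixed positions) rather than a real cathepsin sequence, and real proregions may deviate from the strict consensus at one or more positions.

```python
import re

# E X3 R X2 (I/V) (F/W) X2 N X3 I X3 N, with X = any residue letter.
ERFNIN = re.compile(
    r"E[A-Z]{3}R[A-Z]{2}[IV][FW][A-Z]{2}N[A-Z]{3}I[A-Z]{3}N"
)

def find_erfnin(sequence):
    """Return (start_index, matched_text) for each strict ERFNIN match.

    Indices are 0-based positions in the supplied sequence.
    """
    seq = sequence.upper()
    return [(m.start(), m.group(0)) for m in ERFNIN.finditer(seq)]

# Synthetic peptide built to satisfy the pattern (not a real proregion).
synthetic = "MK" + "EAAARAAVFAANAAAIAAAN" + "GG"
print(find_erfnin(synthetic))
```

A strict regular expression will miss partial matches such as those described for cathepsin O; scoring each position separately would be needed to capture those.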

In general, the N-terminal propeptide must be removed proteolytically for activation (but see below). pH-dependent auto-catalysis is believed to proceed in trans (reviewed in Turk et al., 2000). For the lysosomal cathepsins, the acidic conditions of the lysosome promote autocatalytic cleavage and dissociation of the propeptide (Mason and Massey, 1992; Ishidoh et al., 1998). Removal of the proregion can also occur by cleavage by other peptidases. However, the propeptide also serves to stabilize the enzyme against inactivation by denaturation under neutral or alkaline conditions (Mason et al., 1987; Yamaguchi et al., 1990). A conserved GXNXFXD motif (the GNFD motif) that is involved in both autoactivation and appropriate folding of the enzyme (Vernet et al., 1995) was identified within L-type but not other clan CA enzymes. Site-directed mutagenesis of the conserved residues and expression in yeast (which does not process the wild-type propapain) indicated that the negative charge of this region is involved in triggering processing at low pH. It will be of interest to examine the function of a highly conserved, negatively charged region that follows the GNFD motif (Fig. 3A). Processing of procathepsin L can be considerably enhanced by polyanions such as dextran sulfate and glycosaminoglycans (GAGs) (Mason and Massey, 1992; Ishidoh and Kominami, 1995). There is no information on whether polyanions interact with the proregion.
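
Motifs written in the "X = any residue" shorthand, such as GXNXFXD, can be translated mechanically into a search pattern. The following is a minimal sketch under that assumption; the helper names are hypothetical and the example sequence is synthetic, not taken from any cathepsin.

```python
import re

def motif_to_regex(motif):
    """Translate shorthand such as 'GXNXFXD' into a compiled regex.

    'X' is treated as any single residue letter; every other symbol
    must match literally.
    """
    parts = ["[A-Z]" if symbol == "X" else re.escape(symbol)
             for symbol in motif.upper()]
    return re.compile("".join(parts))

GNFD = motif_to_regex("GXNXFXD")

def motif_positions(sequence, pattern=GNFD):
    """Return 0-based start positions of non-overlapping matches."""
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Synthetic example: the motif begins at index 2.
print(motif_positions("AAGANAFADAA"))
```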

The propeptides of the clan CA enzymes generally share no structural similarity with the cystatins. It is therefore interesting that a region of the N-terminal propeptide extension of cathepsin F has been shown to display similarity to a cystatin domain (Nagler et al., 1999a). The level of sequence similarity is weak (it is not detected by a BLAST search of GenBank at http://www.ncbi.nlm.nih.gov), but molecular modeling predicted that the propeptide would have a structure similar to that of chicken egg white cystatin. Consistent with this conclusion, searches of a non-redundant peptide database with cystatin sequences revealed proteins in Japanese flounder, Drosophila, and Caenorhabditis elegans that contain an N-terminal domain with homology to cystatins and a C-terminal domain with significant homology to mammalian cathepsin F (Fig. 4). Thus, there appears to be an ancient CP lineage within the papain-related subfamily in which the propeptide contains a cystatin-like inhibitory domain. Whether this cystatin domain is a functional inhibitor remains to be established.



 
Figure 4. Cathepsin F-like and chicken cystatin C protein alignment. Consensus residues are indicated as described in Fig. 1. Residues identical in all six sequences are denoted by an uppercase letter in the consensus sequence. Presumptive N-terminal residues produced by cleavage of the leader peptide and the proregion are denoted by an * under the consensus sequence. The GenBank entries for the sequences shown are: hF, human cathepsin F, NP_003784; mF, mouse cathepsin F, AAF13147; flF, Japanese flounder Paralichthys olivaceus, AU050404 (partial peptide sequence derived from +1 frame of mRNA sequence); DrF, Drosophila melanogaster CG12163 gene product, AAF52055; CeF, Caenorhabditis elegans cathepsin F-like hypothetical protein F41E6.6, AAB65956; and chC, chicken egg white cystatin, P01038. The alignment was produced by separately aligning chicken cystatin to flounder cathepsin F, and human, mouse, Drosophila, and C. elegans cathepsin F proteins to each other by means of ClustalX. These two profiles were then aligned, and this alignment was manually adjusted by means of an alignment of all five CPs. The alignment of chicken cystatin with the cathepsin proregions agrees quite well with an alignment based on threading reported by Nagler et al. (1999a).

 
    (C) BIOCHEMICAL PROPERTIES, EXPRESSION, AND NORMAL FUNCTIONS OF MAMMALIAN CPS OF SUBFAMILY C1A
    (1) Widely expressed cathepsins
    (a) Cathepsins B, H, and L
    (i) Properties and tissue distribution
Early studies of cathepsins B, H, and L unequivocally localized these peptidases to the lysosomes of cells, to which they are all targeted in mammals by the addition of mannose-6-phosphate. The concentrations of cathepsins B and L in lysosomes of cultured cells can be as high as 1 mM (Xing et al., 1998). In contrast to most members of the papain-like subfamily (C1A), which commonly have a neutral pH optimum, these lysosomal peptidases have an acidic pH optimum (around pH 6.0, depending in part on the substrate), and their activity would be maximal in the acidic environment of the lysosome (reviewed in Barrett and Kirschke, 1981). Cathepsin L will degrade almost any protein, while cathepsins B and H are more limited in their degradative abilities. Cathepsins L and H are efficiently inhibited by chicken egg-white cystatin and human cystatin C, while cathepsin B is less efficiently inhibited.

Cathepsin B, H, and L proteins and activities (reviewed in Barrett and Kirschke, 1981; Howie et al., 1985; Kirschke et al., 1995; Xing et al., 1998) and mRNAs (Qian et al., 1989; Söderström et al., 1999) have been detected in all tissues and cells examined. Consistent with this ubiquitous pattern of expression, these genes lack the TATA box motifs normally found in highly regulated genes but frequently absent from constitutively expressed genes (Ishidoh et al., 1989a,b; Qian et al., 1991). However, there is significant variation in the levels of these enzymes and their ratios in different tissues and cells (e.g., Qian et al., 1989; Gong et al., 1993; Katunuma et al., 1993; Söderström et al., 1999). Cathepsin B is the most abundant and widely expressed cathepsin and is found at high levels in macrophages. At the mRNA level, the highest levels are found in non-skeletal tissues. Cathepsin B levels in skeletal tissues are not greatly lower than those of non-skeletal tissues, while cathepsin H skeletal tissue mRNA levels are very low. Cathepsin L levels are generally higher in tissues that turn over more rapidly, such as the liver and ovary, and in phagocytic cells such as stimulated macrophages. In the rat, high levels of cathepsin B and L mRNA are found in the kidney.

In addition to transcriptional regulation, there is evidence that cathepsin levels may be governed by post-transcriptional processing and differences in translation rates of alternative transcripts (Chauhan et al., 1993; Gong et al., 1993; Yan et al., 1998). The level of expression of cathepsin L in fibroblasts is increased by several growth factors (e.g., epidermal growth factor, fibroblast growth factor, and platelet-derived growth factor), by phorbol esters, and by oncogene-mediated transformation in vitro (reviewed in Ishidoh and Kominami, 1998). Cathepsin L has also been shown to be induced in granulosa cells of growing follicles by follicle-stimulating hormone, and in pre-ovulatory follicles in response to luteinizing hormone in a progesterone receptor-dependent manner (Robker et al., 2000). Cathepsin B levels may be associated with cell differentiation (reviewed in Yan et al., 1998). Cathepsin B was not detected immunohistochemically in normal minor salivary glands (Steinfeld et al., 2000). However, minor glands in organotypic culture expressed significant levels of cathepsin B, primarily in the ducts, and these levels were substantially increased by treatment with prolactin.

    (ii) Functions of cathepsins B, H, and L
Until recently, the ubiquitous lysosomal distribution of cathepsins B, H, and L led them to be considered primarily "housekeeping" enzymes essential to the normal protein turnover of cells. Consistent with this view, broad CP inhibitors block up to 40% of cellular protein turnover (Shaw and Dean, 1980; reviewed in Barrett and Kirschke, 1981). However, regulation by growth factors and variation in expression levels imply duties beyond those of "housekeeping" and raise the possibility of tissue-specific functions. A powerful approach to the study of function in vivo is the generation of homozygous null mutants in the form of knockout mice. Surprisingly, homozygous cathepsin-B-deficient mice have an apparently normal phenotype (Deussing et al., 1998). This suggests functional redundancy but raises the question of why a redundant gene has been so conserved throughout vertebrate evolution. Cathepsin-L-deficient mice have periodic shedding of fur and abnormal skin morphology but are otherwise viable (Nakagawa et al., 1998). Significantly, these mice also have a defect in major histocompatibility complex (MHC) class-II-mediated antigen presentation. In antigen-presenting cells (APCs), extracellular foreign proteins are internalized via endocytosis or phagocytosis and degraded to peptides in the endocytic pathway. MHC class II molecules bind derived antigenic peptides and present them on the cell surface to CD4+ T-helper cells. Intracellular trafficking of the MHC class II molecules and binding of antigen are regulated processes (reviewed in Wolf and Ploegh, 1995). In the endoplasmic reticulum, class II α and β chains form a heterodimer, and three αβ heterodimers associate with an invariant (Ii) chain trimer.
This nonamer is then transported to the Golgi apparatus and sorted to the endocytic pathway by a signal in the Ii chain cytoplasmic domain, preventing it from entering the constitutive secretory pathway. In the endocytic compartment, the MHC class II molecules can encounter the foreign peptides. However, the Ii chain binds to the peptide-binding domain, blocking this interaction until it is removed by sequential proteolytic cleavage. The Iip10 fragment is the smallest that retains the N-terminal endosome-targeting sequence and a C-terminal extension in the peptide-binding groove. Further cleavage of the Iip10 fragment causes dissociation of the nonamer and release of αβ heterodimers bound to the CLIP fragment of the Ii chain, which occupies the peptide-binding site until the heterodimer interacts with another class-II-like chaperone molecule (HLA-DM in humans). This causes release of CLIP and allows peptide binding to occur. If the Ii chain is not cleaved, the nonamers can be targeted to the lysosome by the Ii chain cytoplasmic tail and degraded. Thus, cleavage of Iip10 is an important regulatory step, and the sequence and timing of Ii cleavage events likely determine the antigenic peptides presented. With a transgenic mouse knockout, cathepsin L has been shown to be essential for the degradation of the invariant (Ii) chain and cleavage of Iip10 to produce the CLIP fragment during MHC class-II-restricted antigen presentation in cortical thymic epithelial cells, but not in bone-marrow-derived antigen-presenting cells, which instead use cathepsin S (Nakagawa et al., 1998). Interestingly, the p41 form of the invariant chain contains a 64-amino-acid fragment with a thyroglobulin type 1 domain (Lenarcic et al., 1997) that binds and inhibits cathepsin L (Ki 1.7 pM), but not cathepsin S (Guncar et al., 1999). It may therefore be involved in regulation of Ii degradation, and in production of antigenic epitopes in endosomes (Fineschi et al., 1996).

Although lysosomal peptidases, including CPs, are undoubtedly involved in peptide antigen processing, the exact role of individual enzymes remains equivocal (Villadangos and Ploegh, 2000). The use of inhibitors "specific" for individual cathepsins has provided evidence for a role for cathepsins B (a CP) and D (an aspartyl peptidase) in antigen processing both in vitro and in vivo. For example, treatment of a mouse T-cell clonal line with a cathepsin B inhibitor suppressed processing of an ovalbumin antigenic epitope, and treatment of mice immunized with ovalbumin with this inhibitor suppressed the Th2 response and IgE production (Katunuma et al., 1998). Similarly, treatment of mice experimentally infected with Leishmania major with a cathepsin B inhibitor causes a switch in the immune response from Th2 to Th1, possibly reflecting a change in antigen processing (Maekawa et al., 1998). However, the use of inhibitors in these studies is complicated by the potential lack of complete specificity, and by the fact that the various cathepsins are involved in transprocessing of each other (e.g., Ishidoh et al., 1999). Cathepsin-B-deficient mice show no evidence for a role of cathepsin B in MHC class-II-mediated antigen presentation (including ovalbumin), indicating either that cathepsin B is not involved in this process, or that there is redundancy in the proteolytic system (Deussing et al., 1998).

Cathepsin L and, to a lesser extent, cathepsin B have been implicated in normal tissue-remodeling events. Hormonal regulation of cathepsin L levels in the granulosa cells of follicles suggests that it may be involved in the degradation of the follicle wall that leads to release of the mature oocyte (Robker et al., 2000). Cathepsin B mRNA levels rise in the apoptotic lumenal epithelial cells of regressing prostate and mammary glands, consistent with a role in degradation of the basement membrane, an early event in cell death (Guenette et al., 1994). Cathepsin CPs have been implicated in various stages of embryogenesis. The supply of amino acids to the developing mouse embryo prior to development of the chorioallantoic placenta is mediated by proteolysis of proteins in the visceral yolk sac, and levels of active cathepsin L are relatively high in this tissue at this time, in comparison both with later times and with the placenta (prior to parturition), as well as with cathepsin B (Sol-Church et al., 1999b). During implantation of the embryo, the embryonic trophoblasts invade the uterine stroma in a controlled manner, degrading the extracellular matrix (ECM). The endometrial connective tissue cells respond with the decidual reaction, which involves an enlargement of the cells and remodeling of the ECM. This provides a barrier to uncontrolled trophoblast invasion, and facilitates formation of an immunologically privileged site. As the placenta forms, decidual cells adjacent to the embryo undergo apoptosis and are phagocytized by the trophoblasts. The mouse placenta expresses substantially higher levels of cathepsin L mRNA relative to tissues such as the liver and kidney, and these levels are at their highest during implantation, suggesting a possible role in this process (Hamilton et al., 1991; Sol-Church et al., 1999b). The placenta also secretes procathepsin L, which may have proteolytic activity under certain circumstances, as well as other activities (see below).
Injection of higher doses of E-64 into pregnant mice during the period of blastocyst attachment leads to a complete failure of implantation. Lower doses result in stunted embryos and a reduced decidual reaction (Afonso et al., 1997). These results suggest that CPs are essential for normal embryo development and decidualization of the uterus. Previously, it was suggested that cathepsins B and L were important in these processes (Afonso et al., 1997). However, the subsequent construction of cathepsin-B- and L-deficient mice (see above), which appear to grow and develop normally during gestation, makes this possibility seem less likely, although there could easily be redundancy in the enzyme systems. The recent discovery of placental-specific CPs (see below) might lead to clarification of these issues in the future. Placental cathepsin L mRNA levels also rise prior to parturition, possibly related to the degeneration of tissue around the placenta in preparation for birth (Hamilton et al., 1991). The role of cathepsin CPs in human implantation is unknown.

Thus far, discussion of the functions of cathepsins B, H, and L has primarily been confined to a lysosomal or endosomal location: the degradation of proteins trafficking through the endosomal system. However, it is also now clear that cathepsins B, H, and L are not purely lysosomal, and that they can be released from cells under various circumstances (see below). In the presence of thiol compounds, cathepsin B is active in the pH range of 5-6, while cathepsin L is active at pH 3-6.5, and cathepsin H has an optimum of 6.5-6.8 (Kirschke et al., 1995). In these pH ranges, cathepsins B and L and, to a lesser extent, cathepsin H can degrade a variety of components of the extracellular matrix, such as proteoglycans, laminin, and collagens II, IX, and XI (Maciewicz et al., 1990a; Buck et al., 1992; reviewed in Kirschke et al., 1995). Cathepsin L is a potent elastase at the optimal pH (5.5), where it is almost as active as pancreatic elastase, and significantly more active than neutrophil elastase (both serine peptidases) (Chapman et al., 1994). In contrast, cathepsin B is 100-fold less active than cathepsin L against this substrate (Mason et al., 1986). Cathepsins B, H, and L are unstable at neutral pH, and are irreversibly inactivated above pH 7 (Barrett and Kirschke, 1981). Cathepsin L has a half-life of only about 1 minute at pH 7.2 and 37°C (Wang B et al., 1998), while cathepsin B is about 15-fold more stable (Turk et al., 1995). The rate of auto-degradation of cathepsin B at neutral pH is reduced in the presence of alternative substrates (Buck et al., 1992). Such instability would be expected to limit the extracellular degradative activity of these enzymes severely. Further, the concentration of cystatin C in vivo is sufficiently high to provide rapid and effective inhibition of cathepsin L and cathepsin B (even though the latter is less well inhibited by this cystatin), provided it remains in molar excess (Turk et al., 1995).
However, various conditions can arise to enhance the stability of these enzymes (see below), and in contrast to the active enzymes, the proenzymes (which can also be released (see below)) are stable at neutral pH, as is a complex of mature cathepsin B and the proregion.
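
As a rough illustration of why such instability limits extracellular activity, the half-lives quoted above can be converted into the fraction of active enzyme remaining over time. The sketch below assumes simple first-order inactivation and reads "about 15-fold more stable" as a 15-fold longer half-life (roughly 15 minutes for cathepsin B); both are assumptions made for illustration, not findings of the cited studies, and the function and constant names are hypothetical.

```python
def fraction_remaining(minutes, half_life_min):
    """Fraction of active enzyme left after `minutes`, assuming
    first-order inactivation with the given half-life (in minutes)."""
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (minutes / half_life_min)

# Half-lives at pH 7.2 and 37 degrees C.
CATHEPSIN_L_HALF_LIFE = 1.0    # about 1 minute, as quoted in the text
CATHEPSIN_B_HALF_LIFE = 15.0   # assumed: 15-fold longer than cathepsin L

for t in (1, 5, 15):
    l_left = fraction_remaining(t, CATHEPSIN_L_HALF_LIFE)
    b_left = fraction_remaining(t, CATHEPSIN_B_HALF_LIFE)
    print(f"{t:>2} min: cathepsin L {l_left:.5f}, cathepsin B {b_left:.3f}")
```

Under these assumptions only about 3% of cathepsin L activity would survive 5 minutes at neutral pH, whereas roughly 80% of cathepsin B activity would remain, before any stabilizing conditions or inhibition by cystatins are taken into account.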

    (iii) Cathepsins B, H, and L in the oral cavity
As lysosomal enzymes, cathepsins B, H, and L likely function in normal protein turnover of intracellular and endocytosed proteins in oral as in other tissues. Cathepsin B has been immunolocalized to granular duct cells in the rat submandibular gland and co-localized with renin in secretory granules, suggesting a role in processing secreted proteins (Sano et al., 1993). The role of these cathepsins—either intracellular or extracellular—in normal remodeling of oral tissues has not been addressed to any great extent. Cathepsins B and L have been localized to gingival fibroblasts, and this source may have a role to play in periodontal disease (discussed in detail below). Interestingly, phenytoin and cyclosporin A suppress the expression of cathepsin L (as well as of MMP-1 and TIMP-1), but not cathepsin B, in cultured gingival fibroblasts. Both these drugs induce gingival overgrowth, suggesting that some of this overgrowth is the result of impaired extracellular matrix degradation involving cathepsin L (Yamada et al., 2000).

It is axiomatic that the immune system is central to the maintenance of oral health, and to the progression from gingivitis to periodontal disease. Therefore, the involvement of cathepsins B, H, and L in the function of the immune system described above also applies to the oral cavity. As lysosomal enzymes, they also function in phagocytosis and can be released extracellularly by immune cells, where they can be involved in remodeling (or damaging) the extracellular matrix and tissues as outlined above. However, these released cathepsins can also participate in more powerful proteolytic cascades. This area, and the studies that have examined the activities of cathepsins B, H, and L in gingival fluids with respect to periodontal disease, are discussed below. Their potential role in Sjögren's syndrome is also discussed in a separate section.

    (b) Dipeptidyl peptidase I (DPP I)
    (i) Properties and tissue distribution
Dipeptidyl peptidase I (DPP I) is the accepted nomenclature for an enzyme previously called cathepsin C, among other names (e.g., cathepsin J). DPP I is a Branch A enzyme most closely related to cathepsin B, and is likely to be phylogenetically widely distributed. It is a lysosomal CP with a pH optimum of 5-6 that primarily cleaves dipeptides from the N-terminus of polypeptides, although it also has endopeptidase activity (Kirschke et al., 1995). It does not cleave substrates with N-terminal Arg, Lys, or Pro, or Pro in the penultimate position. It differs from other lysosomal CPs in several respects: it has a long prosegment (206 residues in human) with an N-terminal extension relative to the papain-related CPs, it forms oligomers of around 200 kDa, and it requires halide ion to be maximally active. The enzyme is inhibited by stefins A and B and chicken cystatin, but only weakly by E-64, and is unstable above pH 7.5 (Nikawa et al., 1992; Dolenc et al., 1996).

In the mouse, Western blot analysis demonstrated DPP I in the majority of tissues examined (Pham et al., 1997): The highest levels were found in the spleen, lung, liver, and small and large intestines, while very low levels were found in the heart and brain. Comparable results were found for the mRNA distribution (Rao et al., 1997). DPP I is also present in various immune cells, including neutrophils, lymphocytes, and macrophages, and treatment of lymphocytes with interleukin-2 (IL-2) was shown to cause a significant increase in DPP I mRNA levels (Rao et al., 1997).

    (ii) Functions of DPP I and its role in pre-pubertal periodontitis and Papillon-Lefèvre and Haim-Munk syndromes
In mammals, multiple functions have been ascribed to DPP I. It is thought to have a role in general protein degradation and turnover. More specific functions have been suggested, such as activation of platelet factor XIII. Recently, DPP I was shown to be required for the activation of granzymes—serine peptidases important in cytotoxic lymphocyte granule-mediated apoptosis—and could be involved in activation of other serine peptidase zymogens such as neutrophil elastase (Pham and Ley, 1999).

Missense mutations in the DPP I gene, located at 11q14, have very recently been shown to be responsible for one recessive form of pre-pubertal periodontitis, a rapidly progressing, heritable form of the disease that affects the primary dentition (Hart et al., 2000). Two distinct autosomal-recessive palmoplantar keratoderma disorders, Papillon-Lefèvre syndrome and Haim-Munk syndrome, characterized by hyperkeratosis of specific epithelial areas, particularly the hands and feet, are also characterized by severe early-onset periodontitis, resulting in the loss of the primary and secondary dentition. Papillon-Lefèvre syndrome is usually first diagnosed by dentists. Both syndromes have now been shown to result from mutations in the DPP I gene (Hart et al., 1999, 2000; Toomes et al., 1999). Why loss of this widely distributed lysosomal enzyme should preferentially affect these tissues is unknown, although Chediak-Higashi syndrome, which also affects lysosomes, is also associated with immune dysfunction and severe early-onset periodontal disease (Tempel et al., 1972; Introne et al., 1999). The association of DPP I with these disorders illustrates that a ubiquitously expressed cathepsin can have tissue-specific functions, and need not be confined to a housekeeping function.

    (c) Cathepsins O and X
    (i) Properties and tissue distribution
Little is known about cathepsin O. It was originally cloned from breast tumor tissue by the polymerase chain reaction (PCR) by means of primers directed to conserved CP sequences (Velasco et al., 1994). It has a predicted prodomain of a typical length (84 residues) but with only partial matches to the three consensus sequences discussed above (see Fig. 3A). Northern analysis demonstrated mRNA in all tissues, with the highest levels in the ovary, kidney, liver, and placenta and the lowest in the thymus and skeletal muscle. The native protein has not been purified, although a recombinant protein has been obtained by expression in E. coli. No enzymatic properties (pH profile, stability, inhibition) have been reported.

Another novel human cathepsin has been independently characterized by three groups, who identified novel ESTs in the database and then screened cDNA libraries. It was variously designated cathepsin X (Nagler and Menard, 1998), cathepsin Z (Santamaria et al., 1998a), and cathepsin P (Pungercar and Ivanovski, 2000). Phylogenetic analysis (see Fig. 2) indicates that a CP designated as cathepsin Y cloned from rat spleen (Sakamoto et al., 1999) is the rat ortholog of cathepsin X. Cathepsin X is unusual in having a very short proregion (38 or 41 residues, depending on the prepeptide cleavage site) that is even smaller than that of cathepsin B. It completely lacks the N-terminal region that contains the lysosomal targeting consensus and the ERF/WNIN motif found in the cathepsin L group. The role of this short proregion in folding, inhibition, and stabilization at different pHs remains to be determined. However, it does contain a cysteine residue in a position similar to that of a cysteine in cathepsin B that has been shown to be important in inhibition by the proregion (Chen et al., 1996; see above). Two potential N-glycosylation sites are present in the mature protein that could serve to target it to the lysosome. Interestingly, the proregion also contains an RGD integrin-binding motif.

Recombinant human procathepsin X was obtained by expression in Pichia pastoris (Nagler et al., 1999b). Unlike other cathepsins, it did not activate autocatalytically at low pH, but cathepsin L was found to convert the proenzyme efficiently to the active form. Cathepsin X was found to be a very good carboxypeptidase, with a pH optimum around 5.0, and a relatively poor endopeptidase. The 3D structure of human procathepsin X has been determined (Sivaraman et al., 2000). A Cys residue in the proregion is covalently bound to the active-site Cys. In addition, a 3-residue "mini-loop" insertion between the Gln of the oxyanion hole and the active-site cysteine (predicted by primary sequence alignment algorithms) partially occludes the S2' subsite, providing an explanation for the carboxypeptidase activity. Cathepsin X is not inhibited by human cystatin C.

Northern blot and RT-PCR analysis demonstrated ubiquitous expression of cathepsin X, although the levels varied considerably between tissues (Nagler and Menard, 1998; Santamaria et al., 1998a; Deussing et al., 2000; Pungercar et al., 2000). Ubiquitous expression in the mouse and human was consistent with the characterization of the promoter as housekeeping-type (Deussing et al., 2000). Cathepsin X was also highly expressed in a variety of cancer cell lines, and may therefore be up-regulated with malignant transformation (Santamaria et al., 1998a; Pungercar et al., 2000). Cathepsin X was immunolocalized in human hepatocytes and Kupffer cells, and in the epithelial cells of distal tubules (Pungercar et al., 2000). It showed a diffuse, mostly peri-membranous distribution, in contrast to cathepsin B, which showed a punctate, granular distribution and was localized primarily to the proximal tubules of the kidney. This suggests that cathepsin X may be localized to the membrane or the adjacent extracellular space. Expression in oral tissues has not been reported.

    (ii) Functions
The physiological functions of cathepsin X are unknown. The rat enzyme was initially identified based on its ability to produce bradykinin-potentiating peptide from plasma (Sakamoto et al., 1999). In equimolar amounts, this peptide increases the activity of bradykinin seven-fold, and in two-fold excess, by 23-fold. The precursor protein for this peptide is unknown. Bradykinin has been shown to synergize with IL-1 or TNFα to stimulate IL-6 production by human gingival fibroblasts (Modéer et al., 1998). Therefore, cathepsin X activity could contribute to the pathogenesis of periodontal disease by increasing the effect of this pro-inflammatory mediator.

    (d) Cathepsin F
    (i) Properties and tissue distribution
Cathepsin F was independently cloned by three groups, either by using PCR with degenerate oligonucleotides directed to conserved CP regions or by identifying novel ESTs in the database (Wang B et al., 1998; Nagler et al., 1999a; Santamaria et al., 1999). The proregion is very large (251 residues), owing to an N-terminal extension that begins with a region similar to a cystatin domain, followed by a 50-residue flexible linker peptide (Nagler et al., 1999a). The C-terminal segment of the proregion that follows has overall similarity to the Branch B-like group, although it is most similar to lymphopain. Like lymphopain, it contains a peptide lysosome-targeting motif, followed by a partial match to the ERWNIN motif, ERFNAQ, consistent with these enzymes forming a phylogenetically distinct subgroup (Wex et al., 1999). The proprotein contains five potential N-glycosylation sites. Transient expression in COS-7 cells localized the protein to vesicles, most likely lysosomes (Wang B et al., 1998).

Cathepsin F has been expressed in Pichia pastoris (Wang B et al., 1998). The enzyme was activated autocatalytically at acidic pH, and was shown to have a level of activity toward synthetic substrates similar to that of cathepsin L, with a broad pH optimum between 5.2 and 6.8. The catalytic efficiency (kcat/Km) was comparable with that of cathepsin L, which is the most active lysosomal CP. Like cathepsins K, L, and S, cathepsin F prefers a bulky hydrophobic or aromatic residue at the P2 position. The enz