Figure 3. Proregion alignments of known human cathepsins and selected rodent and plant proteins. Proregion alignments (excluding signal peptides and the N-terminal extension of cathepsin F) are shown in Panel A (Branch B proteins), Panel B (Branch A cathepsins B and X) and Panel C (DPP I). The same dataset described in Fig. 1 was used to generate the alignment, with the inclusion of rat CTLA-2β (CTLA-2b, GenBank P12400). Signal peptide and propeptide cleavage sites were obtained from the GenBank entries, or from preliminary alignments. Many of the proregions are relatively short, and at least some are potentially unrelated to each other. To avoid spurious alignments, we constructed a preliminary tree of the mature protein dataset using ClustalX. This tree was used to guide the construction of sets of profile alignments of proregions of the most closely related proteins (e.g., cathepsin L proregions, cathepsins F, lymphopain and O). These profiles were then aligned to each other, beginning with the cathepsin L profile, and progressively adding more distantly related Branch B vertebrate cathepsins. After aligning cathepsins V, we added the placental cathepsins (M, P, and Q), K, S, and H, papain, and then aleurain, followed by the F, lymphopain, and O profiles, then testin and finally CTLA-2β, to produce a Branch B proregion alignment. A similar strategy was used to align the cathepsin B profile with the cathepsin X profile. Efforts to align cathepsin B and DPP I profiles with the Branch B alignment produced a large number of gaps, suggesting spurious alignments. The cathepsin X profile produced an alignment with the Branch B proregion showing modest similarity and a limited number of gaps. This could have been due to chance, or it could reflect a transitional form between a Branch A and Branch B type proregion. Predominant identical residues (>50%) at a position are shown with a black background, predominant similar residues with a grey background.
The majority consensus residue is shown under each alignment. An uppercase letter shows a residue conserved in all aligned sequences (excluding cathepsin O, which was not included due to its extensive divergence). Consensus motifs implicated in Branch B proregion function are shown above the alignment in Panel A; residues implicated in cathepsin B proregion function are denoted by an * under the consensus sequence in Panel B.
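The consensus line described above can be illustrated with a minimal Python sketch. It is not the procedure used to produce the figure; the function name `majority_consensus`, the treatment of gaps (`-`) as non-residues, the use of lowercase letters for majority-only positions, and the toy sequences are all assumptions made for illustration. Only the >50% threshold and the uppercase convention for fully conserved residues are taken from the caption.

```python
from collections import Counter


def majority_consensus(aligned, threshold=0.5, gap="-"):
    """Return a consensus string for a list of equal-length aligned sequences.

    Uppercase: residue present in every sequence at that column.
    Lowercase: residue present in more than `threshold` of the sequences
               (an assumed convention, not stated in the caption).
    Space:     no residue exceeds the threshold.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*aligned):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(res.upper() for res in column if res != gap)
        if not counts:
            consensus.append(" ")
            continue
        residue, n = counts.most_common(1)[0]
        if n == len(column):
            consensus.append(residue)          # conserved in all sequences
        elif n / len(column) > threshold:
            consensus.append(residue.lower())  # predominant (>50%) residue
        else:
            consensus.append(" ")
    return "".join(consensus)


# Hypothetical toy alignment, not taken from the cathepsin dataset.
toy_alignment = ["ACDE", "ACDF", "AC-E", "AGDH"]
print(repr(majority_consensus(toy_alignment)))
```

In the toy alignment, the first column is fully conserved and is reported in uppercase, the second and third columns exceed the 50% threshold and are reported in lowercase, and the last column has no predominant residue. A sequence that is too divergent to contribute, as cathepsin O is here, would simply be left out of the input list.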