Glycobiology Advance Access originally published online on July 13, 2005
Glycobiology 2006 16(1):1R-27R; doi:10.1093/glycob/cwj008
REVIEW
Siglecs: the major subfamily of I-type lectins
2 Department of Medicine, 3 Department of Cellular & Molecular Medicine, and 4 Glycobiology Research and Training Center, University of California, San Diego, La Jolla, CA 92093; and 5 Research Center for Glycoscience, National Institute of Advanced Industrial Science and Technology, Tsukuba, Ibaraki 305-8568, Japan
1 To whom correspondence should be addressed; e-mail: varkiadmin{at}ucsd.edu
Received on April 25, 2005; revised on July 2, 2005; accepted on July 2, 2005
Abstract
Animal glycan-recognizing proteins can be broadly classified into two groups: lectins (which typically contain an evolutionarily conserved carbohydrate-recognition domain [CRD]) and sulfated glycosaminoglycan (SGAG)-binding proteins (which appear to have evolved by convergent evolution). Proteins other than antibodies and T-cell receptors that mediate glycan recognition via immunoglobulin (Ig)-like domains are called "I-type lectins." The major homologous subfamily of I-type lectins with sialic acid (Sia)-binding properties and characteristic amino-terminal structural features is called the "Siglecs" (Sia-recognizing Ig-superfamily lectins). The Siglecs can be divided into two groups: an evolutionarily conserved subgroup (Siglecs-1, -2, and -4) and a CD33/Siglec-3-related subgroup (Siglecs-3 and -5 to -13 in primates), whose members appear to be rapidly evolving. This article provides an overview of historical and current information about the Siglecs.
Key words: Siglecs / sialic acids / lectins / immunoglobulin superfamily / evolution
Introduction
The major classes of animal macromolecules are nucleic acids, proteins, lipids, and complex carbohydrates (hereafter called glycans). Interactions amongst these molecules play vital roles in biological processes. Animal glycan-binding proteins can be broadly classified into animal lectins (Drickamer and Taylor, 2003
Historical background, definition and nomenclature of I-type lectins and Siglecs
The immunoglobulin superfamily (IgSF) is a diverse and evolutionarily ancient protein group whose appearance predated the emergence of antibodies (immunoglobulins) (Chothia and Jones, 1997). IgSF members are involved in homotypic and heterotypic protein-protein interactions mediating various biological functions (Williams and Barclay, 1988; Edelman and Crossin, 1991; Chothia and Jones, 1997). Until the 1990s, it was assumed that IgSF members (other than some antibodies) did not mediate glycan-protein interactions. However, indirect evidence had been presented for glycan recognition by certain IgSF members, such as neural cell adhesion molecule (Kadmon et al., 1990; Horstkorte et al., 1993), P0 (Filbin and Tennekoon, 1991), and intercellular adhesion molecule-1 (Rosenstein et al., 1991; McCourt et al., 1994). The first direct evidence emerged from independent work on sialoadhesin (Sn, expressed on macrophage subsets) and on CD22 (found on mature resting B cells). Experimental removal of cell surface sialic acids (Sias) with sialidases was often used to enhance cell-cell interactions by reducing negative charge repulsion. In contrast, cell-cell interactions mediated by both Sn (Crocker and Gordon, 1989) and CD22 (Stamenkovic et al., 1991) were abolished by sialidase treatment, suggesting that Sias were ligands for these proteins. Purified Sn was then shown to recognize certain glycolipids and glycoproteins in a Sia-dependent manner (Crocker et al., 1991). Meanwhile, CD22 had been cloned by others (Wilson et al., 1991; Engel et al., 1993) and shown to be an IgSF member. Availability of recombinant soluble forms of the extracellular domains of CD22 fused to the hinge and constant (Fc) region of human IgG (CD22-Fc) (Stamenkovic and Seed, 1990; Stamenkovic et al., 1991; Aruffo et al., 1992) then allowed the direct demonstration of Sia recognition by CD22, but only when the Sia was presented in α2-6 linkage (Powell et al., 1993; Sgroi et al., 1993). The importance of Sia structure in recognition was confirmed by treating target cells or cognate glycans with mild periodate, under conditions selectively oxidizing only the C7-C9 exocyclic side chains of Sias (Powell et al., 1993; Sgroi et al., 1993; Powell and Varki, 1994; Sjoberg et al., 1994). This represented the first conclusive proof that an IgSF member other than an antibody or a T-cell receptor could specifically recognize a glycan, suggesting the generic name "I-type lectin" (Powell and Varki, 1995).
In parallel, cloning of Sn showed that it was also an IgSF member with 17 extracellular Ig-like domains, the amino-terminal 4 of which showed homology with corresponding domains of CD22 (Crocker et al., 1994). Studies of recombinant soluble Fc-fusion proteins then showed that although the first two Ig-like domains were necessary and sufficient to mediate Sia-dependent binding for CD22 (Engel et al., 1995; Law et al., 1995; Nath et al., 1995), only the first domain was required for Sn (Crocker et al., 1994; Kelm et al., 1994a). Close homology of this amino-terminal V-set Ig-like domain to the corresponding domains of CD33, mammalian myelin-associated glycoprotein (MAG), and avian Schwann cell myelin protein (SMP) suggested that these proteins might also recognize Sias. This was proven by transfection of full-length cDNAs into heterologous cell types and/or production of IgFc-fusion proteins (Kelm et al., 1994a; Freeman et al., 1995). Site-directed mutagenesis then confirmed that the Sia-binding site of MAG is in the first amino-terminal V-set Ig-like domain (Tang et al., 1997).
Some groups thereafter referred to Sn, CD22, CD33, and MAG as the "Sialoadhesin family" or as "Sialoadhesins" (Kelm et al., 1996). Others grouped them with other IgSF members that recognized carbohydrates, under the generic name "I-type lectins" (Powell and Varki, 1995). The latter name did not allow appropriate subclassification of the Sia-binding molecules. The former name was confusing, because one member already had that name and because not all members were involved in cell adhesion. Our group therefore suggested the name "Siglec," which contains elements of "Sialic acid," "immunoglobulin," and "lectin." Following discussions with the Crocker group and extensive consultation among all researchers then involved, this family name was agreed upon by almost everyone (Crocker et al., 1998). Siglecs are thus now considered a subset of I-type lectins, just as Selectins are a subset of C-type lectins (see Angata and Brinkman-Van der Linden, 2002, for a comprehensive review of I-type lectins that are not Siglecs).
Although there was no reason to change existing names of the first four family members, it was felt useful to designate them with a numerical suffix, providing a basis for naming new members of the family subsequently discovered. Although recombinant CD22 was shown to bind Sias before Sn was cloned, the latter was given the designation Siglec-1, because it was the first member characterized as a Sia-binding lectin. Furthermore, designating CD22 as Siglec-2 and CD33 as Siglec-3 was useful as an "aide-mémoire." MAG and SMP were grouped together as Siglec-4a and -4b, respectively, because they are structurally and functionally related (SMP now appears likely to be the avian ortholog of MAG). The criteria for including other IgSF-related proteins as Siglecs were defined as (1) the ability to recognize sialylated glycans and (2) significant sequence similarity within the N-terminal V-set and adjoining C2-set domains.
It was suggested that all future publications about such proteins use the Siglec nomenclature when describing them collectively. It was also suggested that new members be named in order of discovery, following consultation with others in the field (Crocker et al., 1998). There are currently 11 human and 8 mouse molecules that fulfill the criteria. Because humans have more Siglecs than mice and the cloning of the mouse molecules initially lagged behind, the numbering system is based on the human molecules. Two additional primate molecules have fulfilled the criteria and are therefore designated Siglecs-12 and -13 (see Table II for a complete listing of primate and rodent Siglecs known to date and for other names given to some of these proteins). Although earlier papers did not capitalize "Siglec," this is now recommended, to allow designation of the species of origin with a prefix (e.g., human and mouse CD22 can be hSiglec-2 and mSiglec-2, respectively).

Nomenclature is further complicated by the fact that orthologs of some Siglecs in certain species have undergone mutations in an "essential" arginine residue required for optimal Sia binding (see Table II and Structural features common to all Siglecs). These orthologs therefore no longer fulfill all the criteria to be called Siglecs. The first of these was found in humans and initially called Siglec-L1 (Siglec-like molecule-1) (Angata et al., 2001b). This molecule has a Sia-binding ("essential arginine"-containing) ortholog in the chimpanzee, designated chimpanzee Siglec-12 (cSiglec-12). The international nomenclature group thus agreed to change the name of hSiglec-L1 to hSiglec-XII; the Roman numeral indicates that it is the Arg-mutated ortholog of cSiglec-12 (Angata et al., 2004). Likewise, the Arg-mutated ortholog of hSiglec-5 in the chimpanzee is designated cSiglec-V, and the Arg-mutated Siglec-6 ortholog in the baboon is bSiglec-VI (Angata et al., 2004). A primate molecule deleted in humans was discovered by sequencing the chimpanzee Siglec gene cluster (see Figure 4) and designated Siglec-13 (Angata et al., 2004).
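The primate naming convention described above is mechanical enough to express as a small function. The following Python sketch is purely illustrative.

- The function names `to_roman` and `siglec_name` are hypothetical and do not belong to any nomenclature tool.
- It models only the numeric scheme: an Arabic numeral when the "essential" arginine is intact, and a Roman numeral for an Arg-mutated ortholog.
- It uses the one-letter species prefixes that appear in the text: h, m, c and b.

```python
# Hypothetical helper illustrating the Siglec naming convention described in the text.
# Species prefixes as used above: h = human, m = mouse, c = chimpanzee, b = baboon.
# Rodent letter designations (e.g., mSiglec-E) are deliberately not modeled.

_ROMAN = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n: int) -> str:
    """Convert a small positive integer (1..39 is ample for Siglec numbers) to Roman numerals."""
    if not 0 < n < 40:
        raise ValueError("expected an integer between 1 and 39")
    parts = []
    for value, symbol in _ROMAN:
        while n >= value:
            parts.append(symbol)
            n -= value
    return "".join(parts)


def siglec_name(species_prefix: str, number: int, has_essential_arg: bool) -> str:
    """Arabic numeral if the essential Arg is intact; Roman numeral if it is mutated."""
    suffix = str(number) if has_essential_arg else to_roman(number)
    return f"{species_prefix}Siglec-{suffix}"


if __name__ == "__main__":
    print(siglec_name("c", 12, True))   # cSiglec-12
    print(siglec_name("h", 12, False))  # hSiglec-XII
```

The outputs reproduce names given in the text: cSiglec-12, hSiglec-XII, cSiglec-V and bSiglec-VI. A real assignment, of course, also depends on establishing orthology and on consultation within the field, which no function can capture.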
Expressed sequence tag (EST) clones and genomic data from humans and mice aided two groups in cloning most of the remaining Siglecs in these species: the group of Crocker (Cornish et al., 1998; Nicoll et al., 1999; Floyd et al., 2000a; Zhang et al., 2000; Munday et al., 2001) and our own (Angata and Varki, 2000a,b; Angata et al., 2001a,b, 2002). A genomics-driven approach by the Diamandis group also identified several Siglec candidates (Foussias et al., 2000a,b, 2001; Yousef et al., 2001, 2002). Approaches from other directions also contributed to the cloning and functional studies of many Siglecs:

- a search for a leptin receptor led to the discovery of Siglecs-5 and -6 (Patel et al., 1999);
- the study of natural killer (NK) cell signaling led to Siglec-7 (Falco et al., 1999);
- the identification of an eosinophil-specific marker led to Siglec-8 (Kikly et al., 2000);
- the analysis of docking partners of the Src homology-2 (SH2) domain-containing protein-tyrosine phosphatase-1 (SHP-1) led to hSiglec-XII (originally named S2V) (Yu et al., 2001a) and mSiglec-E (Yu et al., 2001b).

Several other laboratories also contributed to discovering novel Siglecs and/or their new splice variants (Tchilian et al., 1994; Takei et al., 1997; Li et al., 2001; Aizawa et al., 2002, 2003; Connolly et al., 2002; Kitzig et al., 2002).
We wish to provide a current and inclusive review of the literature on Siglecs. For further details, readers are referred to the original papers cited, as well as many reviews published in the past decade (Powell and Varki, 1995; Crocker et al., 1997; Schnaar et al., 1998; Munday et al., 1999; Crocker and Varki, 2001a,b; Kelm, 2001; May and Jones, 2001; Mingari et al., 2001; Nitschke et al., 2001; Angata and Brinkman-Van der Linden, 2002; Crocker, 2002, 2005; Crocker and Zhang, 2002).
Two subfamilies of Siglecs
Siglecs can be broadly divided into two subgroups (see Table III for a comparison):

- an evolutionarily conserved subgroup (Siglecs-1, -2, and -4);
- a CD33/Siglec-3-related subgroup (Siglecs-3 and -5 to -13 in primates, and Siglecs-3 and -E to -H in rodents), whose members appear to be rapidly evolving.

Some CD33/Siglec-3-related Siglecs (hereafter abbreviated as CD33rSiglecs) appear to have evolved as hybrids of preexisting genes and/or by gene conversion (e.g., primate SIGLEC6) (Angata et al., 2004). This was facilitated by the "modular" nature of some Ig domain-encoding exons, that is, exons whose length is a multiple of three nucleotides, so that they encode a whole number of codons. Partly for this reason, sequence comparisons alone do not allow conclusive designation of the human orthologs of all rodent CD33rSiglec genes. Additional features, such as gene position and exon structure, must be taken into account. Because of these difficulties, rodent Siglecs were assigned alphabetical designations. As CD33rSiglecs from other species are cloned, the situation is likely to become more complicated. Continued communication among all interested scientists will be important. (A group e-mail address is currently maintained for this purpose. Interested scientists who wish to have their names added to this discussion list should contact the authors.)
All the above discussion refers to protein names. Communication with the human gene nomenclature committee resulted in names corresponding reasonably well with protein names (http://www.gene.ucl.ac.uk/nomenclature/genefamily/siglec.html). Thus, for example, the gene encoding human Siglec-5 is designated SIGLEC5. The situation with mouse gene nomenclature is less satisfactory at the moment, because gene and protein names do not yet correlate well (http://www.informatics.jax.org/mgihome/nomen/, search for "Siglec").
Structural features common to all Siglecs
All Siglecs are single-pass type 1 integral membrane proteins whose extracellular region begins with one unique and homologous N-terminal V-set Ig domain (two in the case of Siglec-12). This is followed by a variable number of C2-set Ig domains, ranging from 16 in Sn to 1 in CD33. Sequence similarity is greatest in the V-set Ig domain and, among CD33rSiglecs, in two cytosolic tyrosine-based motifs (Figure 1). Crystal structures for mouse Siglec-1 (May et al., 1998) and human Siglec-7 (Alphey et al., 2003; Dimasi et al., 2004) indicate that the V-set Ig-like fold has several unusual features (Figure 2, upper panel). These include an intra-beta sheet disulfide and a splitting of the standard beta strand G into two shorter strands. These features, along with certain amino acid residues (Figures 1 and 2, lower panel), appear to be required for Sia recognition. In particular, an arginine residue that forms a salt bridge with the carboxylate of Sia is conserved in all functional Siglecs studied to date (see Figures 1 and 2). Also, all Siglecs (other than Siglec-XII) contain an odd number (typically 3) of Cys residues in the first and second Ig-like domains. A resulting inter-domain disulfide bond between the first and second domains has been demonstrated in MAG (Pedraza et al., 1990). It remains to be proven whether other Siglecs also have this unusual inter-domain disulfide bond. However, effective Sia recognition by the V-set domain of many recombinant Siglecs requires the second (C2-set) domain, which suggests that this may be the case.
A conserved arginine residue required for optimal Sia recognition
Mutation of the Arg residue known to form a salt bridge with the carboxylate of Sia markedly reduces the binding capacity of all Siglecs studied to date, with a change to a positively charged Lys being less effective than a change to an Ala (Van der Merwe et al., 1996; Vinson et al., 1996; Tang et al., 1997; Crocker et al., 1999; Angata and Varki, 2000a,b; Angata et al., 2001a). However, binding is not completely lost with all Arg-to-Ala mutations (e.g., MAG/Siglec-4, Siglec-6, and Siglec-11). Presumably, in these instances, other aspects of the Sia molecule and/or the underlying glycan chain make a major contribution to recognition. Indeed, this has been directly shown for MAG (Vinson et al., 2001).
Other amino acid residues involved in interactions with sialylated ligands
Several other amino acid residues have been identified as making direct contacts with sialylated ligands. In Sn/Siglec-1, these include Trp2 and Trp106 (Figure 2). The corresponding residues in Siglec-7 appear to be Tyr26 and Trp132. However, the mode of atomic contact between lectin and ligand has not yet been reported for Siglec-7. To identify the region(s) responsible for differences in binding specificity, Yamaji et al. (2002) prepared a series of V-set domain chimeras between Siglecs-7 and -9. The results, combined with molecular modeling, suggest that specific residues in the CC' loop of the sugar-binding domain play a major role in determining the binding specificities of Siglecs-7 and -9.
Biosynthesis and multimerization
All Siglecs presumably follow the typical biosynthetic pathway for type 1 transmembrane proteins: polypeptide synthesis and N-glycosylation on endoplasmic reticulum (ER)-bound ribosomes, then transport to the cell surface via the Golgi apparatus. The Siglecs studied so far bear complex N-glycans, which can themselves be sialylated. Siglecs can exist in several forms:

- as monomers, for example, Siglec-7 (Nicoll et al., 1999);
- as disulfide-linked dimers, for example, Siglec-5 (Cornish et al., 1998);
- as higher-order multimers, for example, CD22 (Powell et al., 1995; Zhang and Varki, 2004; Han et al., 2005).

Because lectin valency can markedly affect functional binding avidity, this issue requires further investigation.
Endocytosis and turnover
Like most cell surface glycoproteins, Siglecs are likely turned over by endocytosis and delivery to lysosomes and/or by cleavage from the cell surface. However, neither possibility has been extensively studied for most Siglecs. Before CD22 and CD33 were recognized as Siglecs, they had been defined as targets for treating lymphomas and leukemias, respectively, with antibody-toxin conjugates (see Medical relevance of Siglecs). In retrospect, the success of this approach is likely related to the fact that these molecules undergo rapid antibody-triggered endocytosis. With CD22, antibody-triggered endocytosis is much more rapid than the slower constitutive clearance from the cell surface (Zhang and Varki, 2004). Also, anti-Siglec-5 F(ab')2 fragments were rapidly endocytosed into early endosomes (Lock et al., 2004). Thus, in addition to inhibitory signaling, Siglec-5 and the other CD33rSiglecs may have a role in endocytosis/phagocytosis (Lock et al., 2004; Rapoport et al., 2005). Further work is needed to ascertain whether antibody-triggered endocytosis is a general feature of Siglecs.

The two cytosolic Tyr-based signaling motifs also conform to the consensus sequence for a known internalization motif, "YxxØ," where Ø stands for a bulky hydrophobic amino acid (Bonifacino and Traub, 2003). However, tyrosine phosphorylation of these motifs usually prevents their association with the adaptor protein 2 (AP2) complex that mediates internalization. It is also unclear which physiological processes are mimicked by the artificial process of antibody-mediated cross-linking. In particular, what is its relevance to cell surface recognition of Sias? Regardless, it is of interest that other CD33rSiglecs are found on acute myeloid leukemia (AML) cells (Vitale et al., 2001; Virgo et al., 2003). Thus, the principle already established in the clinic with anti-CD33 immunotoxins to treat chemotherapy-resistant AMLs might be extended to additional Siglec targets. Another membrane-proximal motif (QRRWKRTQSQQ) has been described as relevant to rapid internalization of CD22 (Chan et al., 1998). However, this motif is not well conserved between humans and mice.
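To make the "YxxØ" consensus concrete, the sketch below scans a protein sequence (one-letter code) for it with a regular expression. It is an illustration only, under these assumptions:

- Ø is taken to be Leu, Ile, Phe, Met or Val, the bulky hydrophobic residues usually listed for this motif.
- The example sequence is made up and is not a real Siglec cytoplasmic tail.
- A pattern match says nothing about whether the site is actually used for internalization, which, as noted above, phosphorylation can prevent.

```python
import re

# YxxØ: Tyr, any two residues, then a bulky hydrophobic residue.
# Assumption: Ø is restricted to L, I, F, M, or V.
# The zero-width lookahead lets overlapping motifs all be reported.
YXX_PHI = re.compile(r"(?=(Y..[LIFMV]))")


def find_yxxphi(seq: str) -> list[tuple[int, str]]:
    """Return (1-based position of the Tyr, matched tetrapeptide) for each YxxØ site."""
    seq = seq.upper()
    return [(m.start() + 1, m.group(1)) for m in YXX_PHI.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical sequence, not taken from any real Siglec.
    print(find_yxxphi("MKYDEVLLSTYAEL"))  # prints [(3, 'YDEV'), (11, 'YAEL')]
```

A lookahead is used instead of a plain pattern so that adjacent tyrosines (e.g., "YYAVL") yield both candidate sites rather than only the first.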
Does association with cell surface sialylated ligands modulate Siglec turnover?
Genetic elimination of cognate ligands for CD22 or MAG in intact mice is associated with lowered levels of the Siglec on B cells (Hennet et al., 1998; Collins et al., 2002, 2004) or glial cells (Collins et al., 2000; Vyas et al., 2002; Sun et al., 2004), respectively. This suggests that association with cell surface sialylated ligands might restrict endocytic clearance of Siglecs and thereby regulate steady-state levels. However, this hypothesis was not supported in studies of CD22. There, constitutive endocytosis rates were unaltered either by mutation of the Arg residue required for Sia recognition or by removal of cell surface Sias (Zhang and Varki, 2004). It therefore appears more likely that Siglec down-regulation in the absence of cognate ligands reflects a long-term resetting of some unknown "steady state" mechanism or feedback loop.
Cell type-specific expression
Each human Siglec is expressed in a cell type-specific fashion, suggesting involvement in discrete functions (see Figure 3 for their distribution in the hematopoietic and immune systems of humans). The selective expression of Sn/Siglec-1, CD22/Siglec-2, and MAG/Siglec-4 on tissue macrophages, mature B cells, and glial cells, respectively, appears to be conserved among all mammalian species studied so far. The CD33rSiglecs are variably distributed among cell types in the immune system, with significant overlaps (Figure 3). The striking exception is T cells, on which very little Siglec expression is seen (Razi and Varki, 1999). What is seen is mainly Siglec-7 and -9 on a subset of CD8+ cells in some humans (Nicoll et al., 1999; Zhang et al., 2000; Ikehara et al., 2004). One study reported the appearance of CD33 on chronically activated human T cells (Nakamura et al., 1994); however, this result was not reproducible in our hands (unpublished data). Notably, CD22 expression has recently been reported on a subset of mouse T cells (Sathish et al., 2004), on basophils (Han et al., 1999, 2000) and, surprisingly, on neurons (Mott et al., 2004). Also, Siglec-6 is expressed in placental trophoblast cells (Patel et al., 1999).
The cell type specificity of human and mouse CD33rSiglecs often does not follow their presumed orthologous relationships. For example, human CD33/Siglec-3 is highly expressed on mature monocytes, whereas mouse CD33/Siglec-3 is expressed only on granulocytes (Brinkman-Van der Linden et al., 2003). Most CD33rSiglecs are found, to varying extents, on multiple leukocyte types. For example, human CD33/Siglec-3, -5, -7, -9, and -10 are expressed on circulating monocytes. When monocytes were differentiated into macrophages or stimulated with lipopolysaccharide (LPS), they retained expression of these Siglecs (Lock et al., 2004). In comparison, monocyte-derived dendritic cells down-modulated Siglec-7 and -9 following maturation with LPS, and plasmacytoid dendritic cells in human blood expressed only Siglec-5 (Lock et al., 2004). In a few instances, certain CD33rSiglecs show expression predominantly restricted to one cell type. Human Siglec-7 is found at low levels on granulocytes and monocytes, but at relatively high levels on a major subset of NK cells and a minor subset of CD8+ T cells (Nicoll et al., 1999). Siglec-8 could be detected only on eosinophils (Floyd et al., 2000a).
Even for some of the well-conserved Siglecs, there are differences between humans and rodents. The expression of CD22/Siglec-2 on mouse T cells has already been mentioned. There are also differences between human and rodent Siglec-1/Sn expression patterns in the spleen:

- In humans, strongly Sn-positive macrophages form sheaths around capillaries in the perifollicular zone; such sheaths are not observed in rats.
- Also in contrast to rats, the human marginal zone does not contain Sn-positive macrophages, and marginal metallophilic macrophages are absent in humans.

Thus, Sn-positive macrophages and IgM+ IgD− memory B lymphocytes share the marginal zone as a common compartment in rats, but occupy different compartments in humans (Steiniger et al., 1997). Also, only a subset of splenic macrophages are Sn-positive in rats and chimpanzees, whereas expression is almost universal in human spleen macrophages. This suggests a recently evolved condition (Brinkman-Van der Linden et al., 2000) (see Effects of human-specific Neu5Gc loss on human Siglec biology). Another species difference is the sequestration of human CD22 in intracellular compartments at the bone marrow precursor cell stage (Dorken et al., 1986), a feature apparently not always found in mice (Erickson et al., 1996; Nitschke et al., 1997).
Phylogeny and genomic organization
Searches of DNA databanks reveal the expected distant homologies between Siglecs and other IgSF members. However, there is no evidence for molecules carrying the canonical functional amino acids of the typical Siglec V-set domain in prokaryotes, fungi, or plants. Nor are such molecules found in animals of the Protostome lineage, including organisms for which the complete genomic sequence is available, such as the fruit fly Drosophila melanogaster and the nematode Caenorhabditis elegans (Angata and Varki, 2000b). In contrast, Siglec-like V-set domains are easily found in various vertebrate taxa, including fishes (Lehmann et al., 2004) and birds (Dulac et al., 1992) (Table IV). These findings fit well with the general distribution of Sias, which are hard to detect in Protostomes but widespread in the Deuterostomes (vertebrates and some higher invertebrates) (Angata and Varki, 2002). It will be interesting to know whether Siglecs are present in the "higher" invertebrates that express Sias, such as Echinoderms (sea urchins, starfish, etc.). That is, did the Siglec family emerge at the Cambrian expansion ~530 million years ago, at the same time that Sias seem to have "flowered" in the Deuterostome lineage?
The most highly conserved Siglec is MAG/Siglec-4, with an easily identifiable ortholog in the Fugu (puffer fish) genome (Lehmann et al., 2004). This conservation may be related to the fact that membrane proteins in the nervous system form intricate interactive networks, in which amino acid changes are not easily tolerated (Fraser et al., 2002). The fact that MAG is involved in protein-protein interactions as well as protein-glycan interactions (see Siglec recognition of specific macromolecules) supports this hypothesis. Also, because MAG is expressed solely in the nervous system, it is protected from external agents that accelerate molecular evolution, such as pathogens. The other two Siglecs conserved in mammalian genomes are Sn/Siglec-1 and CD22/Siglec-2. The nearly complete Fugu genome seems to lack clear orthologs of these (further work is needed to confirm this).
In the human genome, the gene for Sn/Siglec-1 is located on chromosome 20, and the genes for CD22/Siglec-2 and MAG/Siglec-4 are located next to each other on chromosome 19 (Mucklow et al., 1995). Most human CD33rSiglec genes are clustered together in an ~500-kb region on chromosome 19q13.3–13.4, and the mouse genes are in a syntenic region of chromosome 7 (Figure 4). The mouse apparently has only five functional CD33rSiglecs, whereas humans have eight. In each species, one lineage-specific Siglec gene lies outside the cluster: Siglec-11 in humans and Siglec-H in mice. Complexities regarding ortholog status and cell type specificity have been discussed above.
As mentioned earlier, a further complication is the "modular" nature of some IgSF genes. In these genes, one exon typically encodes one Ig-like domain, which allows hybrid genes to be generated by exon shuffling (Angata et al., 2004). Additional confusion arises from gene conversion events in several lineages (our unpublished observations). It remains to be seen whether the major differences between humans and mice reflect repeated duplications in the primate lineage or a wholesale loss in rodent genomes. Overall, molecular phylogenetic and genome map analyses indicate that there were probably four prototypical CD33rSiglecs in the rodent/primate common ancestor (Table II):

1. CD33-like, with two Ig-like domains.
2. Siglec-E/9-like, with three Ig-like domains. These genes lack the intron that separates the leader peptide- and Ig1-coding regions in other Siglec genes.
3. Siglec-F/5/6-like, with four Ig-like domains (Siglec-6 has lost its fourth Ig domain).
4. Siglec-G/10-like, with five Ig-like domains.

Siglec-7 and Siglec-12 are probably derivatives of group 2. They also lack an intron separating the leader peptide- and Ig1-coding regions, and they show extensive sequence similarity with group 2 members in alignable segments. Siglec-8 also belongs to group 2.
Ligand recognition by Siglecs
Recognition of Sias and their linkages
The first two identified Siglecs (Sn/Siglec-1 and CD22/Siglec-2) had strikingly different binding properties: Sn strongly preferred α2-3-linked ligands, whereas CD22 was highly specific for α2-6 linkages. The binding affinity of CD22 was also found to be in the low micromolar range (Powell et al., 1995).
Different laboratories have used different probes and assay formats to analyze glycan recognition. Even so, there is general consensus regarding the structures recognized by Sn/Siglec-1, CD22/Siglec-2, and MAG/Siglec-4. In contrast, conflicting glycan-recognition specificities are often reported for CD33rSiglecs, which under saturating conditions bind many of the probes tested. A further complication is that different alternative splicing products may have different specificities. For example, the Siglec-7 form with only two extracellular Ig-like domains preferentially recognizes Neu5Acα2-6Galβ1-4Glc (Angata and Varki, 2000a). This is very different from the more promiscuous (but more robust) binding of the full-length form (Nicoll et al., 1999).
More recent studies have reported that some CD33rSiglecs, under limiting conditions, show certain relative preferences among sialylated ligands (Blixt et al., 2003; Rapoport et al., 2003). A more detailed study with a wider array of sialylated glycans may therefore reveal strong preferences in Siglecs currently considered "promiscuous." Indeed, a recent study (Bochner et al., 2005) used a glycan array developed by the Consortium for Functional Glycomics (Blixt et al., 2004). It indicates that 6'-sulfo-sialyl-Lewis x is a highly selective ligand for human Siglec-8. This ligand is sLex with a sulfate ester at the 6-position of the penultimate galactose [Gal] residue. Also, although mSiglec-F shows a preference for α2-3-linked Sias (Angata et al., 2001a), it binds best to 6'-sulfo-sLex (Tateno et al., 2005; data from Core H of the Consortium for Functional Glycomics, see http://www.functionalglycomics.org/static/consortium/organization/sciCorescoreh.shtml). Core H has also found that hSiglec-9 prefers 6-sulfo-sLex, with a sulfate ester at the 6-position of the underlying GlcNAc residue. Human Siglec-8 and mouse Siglec-F are not orthologs. However, they share the same expression pattern (Zhang et al., 2004) and binding specificity and thus may have developed similar functions in vivo by convergent evolution. They are thus "functionally equivalent paralogs" or, more precisely, "isofunctional paralogs" (a term suggested to the authors by Walter Fitch, UC Irvine). Likewise, the presentation of the sialyl-Tn epitope and/or more extended structures that include this motif may be important for optimal recognition by hSiglec-6. This conclusion comes from studies using ovine, bovine, and porcine submaxillary mucins and Chinese hamster ovary (CHO) cells transfected with ST6GalNAc-I and/or the mucin polypeptide MUC1 (Brinkman-Van der Linden and Varki, 2000). Figure 5 attempts to summarize the available information for human Siglecs, recognizing that different studies have given somewhat different results for the CD33rSiglecs.
Effects of Sia modifications on recognition
Unlike selectins (another class of Sia-binding lectins), which primarily require the negative charge of Sias for recognition, Siglecs seem to recognize many aspects of the Sia molecule (Figure 6). Recognition of the Sia linkage from the 2-position has been discussed above. The carboxyl group of Sias is required for recognition by most Siglecs, as shown by studies using glycans in which the Sia carboxyl at C1 had been reduced to an alcohol (Collins et al., 1997b; Brinkman-Van der Linden and Varki, 2000). Complementary studies support this conclusion. These used recombinant Siglec proteins mutated at the "essential arginine" residue, which forms a salt bridge with the carboxyl group of Sia.
The glycerol-like side chain of Sias at C7-C9 can be specifically cleaved by mild periodate treatment (Van Lenten and Ashwell, 1971), and a requirement for this side chain in Siglec binding so far seems to be a general rule (Powell et al., 1993; Collins et al., 1997a, b; Barnes et al., 1999; Angata and Varki, 2000a, b; Brinkman-Van der Linden and Varki, 2000), with exceptions such as Siglec-6 (Brinkman-Van der Linden and Varki, 2000) and Siglec-11 (Angata et al., 2002). For Sn, the residue that interacts with the side chain is Trp106 (May et al., 1998), and an equivalent aromatic amino acid residue is conserved in all Siglecs (Figure 1). The equivalent residue in mouse CD22 is also required for recognition (Van der Merwe et al., 1996).
Although there are many natural modifications of the Sia side chain (Kelm and Schauer, 1997; Schauer, 2000; Angata and Varki, 2002), very few studies have examined how even the most common of these, 9-O-acetylation, affects Siglec recognition. Published experiments also did not use synthetic probes with O-acetylated structures (the absence of cloned enzymes catalyzing O-acetylation makes such probes difficult to prepare). Rather, they relied on the presence of O-acetyl groups in naturally occurring glycoconjugates and on removal of these groups using esterases and/or alkaline treatment. The presence of a 9-O-acetyl group has a strong negative effect on recognition by human CD22 (Sjoberg et al., 1994) and mouse Sn (Kelm et al., 1994b; Shi et al., 1996), and removal of this group by the 9-O-acetylesterase of influenza C virus enhanced recognition. Thus, not only an intact glycerol-like side chain but also the absence of any modification of it was postulated to be a structural requirement for a glycan to be recognized by Siglecs. A study analyzing binding of synthetic C9-substituted 5-N-acetylneuraminic acid (Neu5Ac) to Sn, MAG, and CD22 seemed to confirm this finding (Kelm et al., 1998). However, studies of some CD33rSiglecs revealed that Siglec-5 and Siglec-6 do not show different affinities toward 9-O-acetylated or non-O-acetylated ligands, and that CD33 shows reduced affinity toward 9-O-acetylated ligands only in a limited structural context (Brinkman-Van der Linden and Varki, 2000). Moreover, recent work showed that human CD22 can recognize a synthetic Neu5Ac modified at C9 with a bulky hydrophobic moiety, with much higher affinity than unmodified Neu5Ac (Kelm et al., 2002). It appears that, in contrast to 9-O-acetylation, synthetic modification of the 9-carbon with a nitrogen conserves the hydrogen-bond donor potential needed for interactions with Siglecs such as Sn. Regardless, an "intact glycerol-like side chain" and "O-acetylation of the side chain" should be considered separate issues. It also remains possible that some as yet unstudied Siglecs in some species specifically recognize natural ligands with Sias modified on the side chain by acetyl, methyl, or sulfate groups.
Although prominent 4-O-acetylation has so far been found only in certain species, such as monotremes, horse, and guinea pig (Kelm and Schauer, 1997; Schauer, 2000; Angata and Varki, 2002), the substrate specificity of the murine hepatitis virus hemagglutinin-esterase (Regl et al., 1999) and recent studies of human samples (Pons et al., 2003) indicate that such Sias are more widespread than previously thought. The effect of 4-O-acetylation on Siglec recognition has not yet been evaluated, mostly because of the difficulty of synthesizing, or obtaining from natural sources, defined probes containing 4-O-acetylated Sias. One study evaluated the binding of 4-O-methyl-Neu5Ac to MAG and showed reduced binding compared with the unmethylated parent compound (Strenge et al., 1998). However, 4-O-methylated Sias have so far not been found in nature.
The ligands used in binding studies typically contain only Neu5Ac. However, some Siglecs show distinct preferences for the type of N-acyl group at C5. Both mouse (Kelm et al., 1994b, 1998) and human (Brinkman-Van der Linden et al., 2000) Sn strongly prefer Neu5Ac over 5-N-glycolylneuraminic acid (Neu5Gc). Although murine CD22 strongly prefers Neu5Gc over Neu5Ac (Kelm et al., 1994b, 1998; Van der Merwe et al., 1996), human CD22 (and those of the closely related great apes) accommodates both types of Sias (Brinkman-Van der Linden et al., 2000; Collins et al., 2002). Rodent MAG and avian SMP do not tolerate Neu5Gc (Collins et al., 1997b, 2000; Kelm et al., 1998), correlating with the near-absence of Neu5Gc in the mammalian central nervous system (Varki, 2002). Although Sn and MAG/SMP do not tolerate the hydroxyl group of Neu5Gc, they can bind synthetic halogenated acetyl residues at the same position. With MAG, N-fluoroacetylneuraminic acid bound about 17-fold better than Neu5Ac. In contrast, although human and murine CD22 both bind Neu5Gc, only human CD22 bound the halogenated compounds (Kelm et al., 1998).
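To put the roughly 17-fold preference of MAG for N-fluoroacetylneuraminic acid in energetic terms, an affinity ratio can be converted into a difference in binding free energy. This is only a back-of-the-envelope sketch: it assumes that the 17-fold figure can be treated as a ratio of equilibrium dissociation constants at about 298 K (the exact measure used in the cited study may differ), and the resulting value is not one reported by those authors. Here $K_D^{\mathrm{F}}$ denotes the fluoroacetyl compound and $\Delta G^\circ = RT\ln K_D$.

```latex
\Delta\Delta G^\circ = \Delta G^\circ_{\mathrm{F}} - \Delta G^\circ_{\mathrm{Neu5Ac}}
  = RT\ln\frac{K_D^{\mathrm{F}}}{K_D^{\mathrm{Neu5Ac}}}
  = -RT\ln 17
  \approx -(0.592\ \mathrm{kcal\,mol^{-1}})(2.83)
  \approx -1.7\ \mathrm{kcal\,mol^{-1}}\ (\approx -7.0\ \mathrm{kJ\,mol^{-1}})
```

With $RT = (1.987\times10^{-3}\ \mathrm{kcal\,mol^{-1}\,K^{-1}})(298\ \mathrm{K}) \approx 0.592\ \mathrm{kcal\,mol^{-1}}$, a 17-fold gain corresponds to a modest free-energy difference, on the order of a single additional favorable contact.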
As for the CD33rSiglecs, our previous work with human CD33, Siglec-5, and Siglec-6 failed to show distinct binding preferences between Neu5Ac and Neu5Gc (Brinkman-Van der Linden et al., 2000). However, our recent study (Sonnenburg et al., 2004) suggests that this "promiscuous" recognition of both Neu5Ac and Neu5Gc by human CD33rSiglecs may be an exception among great ape orthologs. Further issues regarding the Neu5Ac/Neu5Gc preference of human versus great ape Siglecs are discussed below.
Rare types and linkages of Sias have not been well-studied for Siglec recognition
Despite extensive studies of Siglec recognition specificity, we have to date sampled only a small portion of the marked structural diversity of Sias in nature (Kelm and Schauer, 1997; Schauer, 2000; Angata and Varki, 2002). For example, rarer modifications such as 8-O-sulfation and 8-O-methylation have not been studied at all. Meanwhile, recent data (Pons et al., 2003) indicate that such modifications are found in many mammalian species, including humans (albeit at much lower levels than in echinoderms, where they were first found in larger amounts). Combinations of substitutions can also occur on a single Sia molecule, but these too have not been studied for Siglec binding.
Siglec recognition of the glycan chain underlying Sias
Basic ligand structures for most Siglecs appear to be sialylated type I-III disaccharides (Galβ1-3GlcNAc, Galβ1-4GlcNAc, and Galβ1-3GalNAc, respectively, with terminal Sias attached to Gal), and modifications of the HexNAc seem to affect recognition by some Siglecs. For example, although MAG binds to Neu5Acα2-3Galβ1-3GalNAc, modification of the GalNAc 6-position with acidic moieties (e.g., another Sia or a sulfate ester) greatly enhances binding (Collins et al., 1997a, 1999). In contrast, some Siglecs that recognize α2-3-linked Sias are negatively affected by a fucose (Fuc) on GlcNAc, that is, in Siaα2-3Galβ1-4[Fucα1-3]GlcNAc (the sLex structure) (Brinkman-Van der Linden and Varki, 2000; Angata et al., 2001a), with the notable exception of human Siglec-9 (Angata and Varki, 2000b); for human Siglec-8, fucosylation even has a positive effect (Bochner et al., 2005). Regardless, although Siglecs may not interfere with selectin-mediated recognition, fucosylation could negatively regulate Siglec binding. Also of interest is human Siglec-7, which is reported to bind Neu5Acα2-3Galβ1-3[Neu5Acα2-6]HexNAc (Ito et al., 2001; Miyazaki et al., 2004), regardless of substitution with GalNAc at the 4-position of Gal (Ito et al., 2001), Fuc at the 4-position of GlcNAc (Miyazaki et al., 2004), or absence of the Neu5Acα2-3 residue (Yamaji et al., 2002). Human Siglec-7 is also reported to prefer the (Neu5Acα2-8)n oligomer (Ito et al., 2001; Yamaji et al., 2002; Nicoll et al., 2003). Taken together, the data suggest that this Siglec might recognize a terminal Neu5Ac and the N-acetyl group of the penultimate sugar (which may be Neu5Ac, GalNAc, or GlcNAc).
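As a reading aid for the condensed linear notation used above (e.g., Neu5Acα2-3Galβ1-4[Fucα1-3]GlcNAc), the following Python sketch parses such strings into a main chain and bracketed branches, then reports the terminal Sia, its linkage, which of the type I-III cores named in the text is present, and whether a fucose branch occurs. It is a hypothetical illustration, not a published tool: the names `parse_glycan` and `describe` are invented here, only one level of branching is handled, and sulfate, O-acetyl, or other substituents are ignored. It deliberately encodes nothing about which Siglec binds which structure.

```python
import re

# Tokens: brackets, anomeric linkages such as "α2-3" or "β1-4", and residue names.
TOKEN = re.compile(r"\[|\]|[αβ]\d-\d|[A-Z][A-Za-z0-9]*")

SIAS = {"Neu5Ac", "Neu5Gc"}
# Core disaccharides as named in the text (terminal Sia attached to the Gal).
CORE_TYPES = {
    ("Gal", "β1-3", "GlcNAc"): "type I",
    ("Gal", "β1-4", "GlcNAc"): "type II",
    ("Gal", "β1-3", "GalNAc"): "type III",
}


def parse_glycan(text):
    """Parse a simplified condensed glycan string (one level of [branches]).

    Returns (main, branches): main lists (residue, linkage) pairs from the
    non-reducing end to the reducing end (the last linkage is None);
    branches maps a main-chain index to the substituents attached to it.
    """
    main, branches, pending = [], {}, []
    in_branch = False
    for tok in TOKEN.findall(text):
        if tok == "[":
            in_branch = True
        elif tok == "]":
            in_branch = False
        elif tok[0] in "αβ":
            # A linkage belongs to the residue just read (branch or main chain).
            (pending if in_branch else main)[-1][1] = tok
        elif in_branch:
            pending.append([tok, None])
        else:
            main.append([tok, None])
            if pending:
                # Bracketed substituents attach to the next main-chain residue.
                branches[len(main) - 1] = [tuple(p) for p in pending]
                pending = []
    return [tuple(m) for m in main], branches


def describe(text):
    """Report terminal Sia, its linkage, the core type, and fucosylation."""
    main, branches = parse_glycan(text)
    residue, linkage = main[0]
    core = next(
        (CORE_TYPES[(main[i][0], main[i][1], main[i + 1][0])]
         for i in range(len(main) - 1)
         if (main[i][0], main[i][1], main[i + 1][0]) in CORE_TYPES),
        None,
    )
    return {
        "terminal_sia": residue if residue in SIAS else None,
        "sia_linkage": linkage if residue in SIAS else None,
        "core": core,
        "fucosylated": any(r == "Fuc" for subs in branches.values() for r, _ in subs),
    }


print(describe("Neu5Acα2-3Galβ1-4[Fucα1-3]GlcNAc"))
# prints {'terminal_sia': 'Neu5Ac', 'sia_linkage': 'α2-3', 'core': 'type II', 'fucosylated': True}
```

Attaching bracketed residues to the next main-chain residue mirrors the convention that a branch written in brackets is linked to the residue that follows it.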
MAG ligand recognition can be influenced by modification of sugars closer to the reducing end. For example, in ganglio-series glycolipids, modification of the Gal at position II with an acidic residue (either Sia or sulfate) enhances binding (Collins et al., 1997a, 1999), although this effect is not as prominent as substitution of the GalNAc at position III. Such extended ligand recognition seems to be an exception among Siglecs. Notably, the binding specificity of human MAG has not been studied so far.
Siglec recognition of specific macromolecules
Several studies have identified apparently specific ligands (or "counter receptors") for Siglecs. These can be classified into ligands that interact with Siglecs via the sialylated glycans they carry and those that interact independently of glycans, that is, via protein-protein interactions. The first type includes CD43 and P-selectin glycoprotein ligand (PSGL)-1, identified as Sn counter receptors on T cells (Van den Berg et al., 2001), and CD45, a CD22 counter receptor on T cells (Sgroi et al., 1993). The epithelial mucin Muc-1 has also been identified as an Sn counter receptor (Nath et al., 1999). However, O-sialoglycoprotease treatment of erythroleukemia cells that express glycophorins also resulted in the loss of Sn binding (Shi et al., 1996). Thus, it may be that any mucin with a high density of α2-3-linked Sias will behave as a "high affinity" ligand for Sn (CD43 and PSGL-1 are also heavily O-glycosylated). Similar considerations might explain why serum IgM and haptoglobin, which carry high densities of α2-6-linked Sias, appear to be selective ligands for CD22 (Hanasaki et al., 1995a). Appropriate valency and spacing, rather than a special underlying structure, may also be a key factor in determining binding preference, as shown for the CD22-CD45 interaction (Bakker et al., 2002).
MAG has several proposed counter receptors. In addition to the glycolipids GD1a, GT1b, and GD1alpha (Yang et al., 1996; Vinson et al., 2001; Vyas et al., 2002), certain glycoproteins, for example, fibronectin (Strenge et al., 2001), tenascin-R (Yang et al., 1999), Nogo66 receptor (Domeniconi et al., 2002; Liu et al., 2002), microtubule-associated protein 1B (Franzen et al., 2001), and neurotrophin receptor (Yamashita et al., 2002), have also been suggested as receptors. Of these, fibronectin interacts with MAG via its glycan chain (Strenge et al., 2001), tenascin-R and Nogo66 receptor interact independently of glycans (Yang et al., 1999; Domeniconi et al., 2002; Liu et al., 2002), and neurotrophin receptor interacts via a GT1b molecule, which reportedly forms a stable complex with the receptor (Yamashita et al., 2002). Additional unidentified glycoproteins are also reported to bind MAG in a Sia-dependent manner (De Bellard and Filbin, 1999; Strenge et al., 1999). A recent paper indicates that the Nogo-66 receptor homolog NgR2 is a Sia-dependent receptor for MAG (Venkatesh et al., 2005).
Regarding defined ligands for CD33rSiglecs, only one specific interaction has been reported, between Siglec-6 and leptin (Patel et al., 1999), and it is independent of leptin glycosylation (i.e., recombinant leptin produced in bacteria binds to Siglec-6). Although Siglec-6 exhibited tight binding to leptin (Kd = 91 nM), two other CD33rSiglecs showed weak binding, with Kd values in the 1-2 µM range. These data led us to suggest a role for placental Siglec-6 in human leptin physiology, perhaps as a molecular sink regulating serum leptin levels (Patel et al., 1999). However, such leptin binding to Siglec-6 appears to depend on an artificially multimeric state of leptin generated in bacteria (E.C.M. Brinkman-Van der Linden and A. Varki, unpublished data). To date, there has been no definitive report of glycan-dependent specific binding partner(s) for CD33rSiglecs. This may be relevant to the rapidly evolving functions of CD33rSiglecs, as discussed later.
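The practical gap between a Kd of 91 nM and a low-micromolar Kd can be seen from the fraction of receptor sites occupied at equilibrium. The sketch below assumes simple 1:1 binding with ligand in excess, so that free ligand is approximately total ligand; it ignores the multimeric-leptin caveat noted above. The function name `fraction_bound` is hypothetical, the 1 µM comparison Kd is chosen purely for illustration, and the ligand concentrations are arbitrary rather than physiological leptin levels.

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Fraction of receptor sites occupied at equilibrium for simple 1:1 binding,
    assuming free ligand is approximately equal to total ligand (ligand in excess)."""
    return ligand_nM / (ligand_nM + kd_nM)


SIGLEC6_KD_NM = 91.0   # Kd reported in the text for Siglec-6 binding to leptin
WEAK_KD_NM = 1000.0    # hypothetical low-micromolar Kd, chosen only for comparison

for conc_nM in (10.0, 91.0, 1000.0):
    print(
        f"{conc_nM:6.0f} nM ligand: "
        f"Kd 91 nM -> {fraction_bound(conc_nM, SIGLEC6_KD_NM):.2f}, "
        f"Kd 1 uM -> {fraction_bound(conc_nM, WEAK_KD_NM):.2f}"
    )
```

At a ligand concentration equal to 91 nM, half of the Siglec-6 sites would be occupied, whereas fewer than 10% of the sites of a 1 µM binder would be.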
Sn was also shown to be a counter receptor for the mannose receptor, another macrophage lectin (Martínez-Pomares et al., 1999). However, this interaction depended on sulfated glycans on Sn, as predicted by the binding specificity of the Cys-rich domain of the mannose receptor for 4-O-sulfo-GalNAc (Fiete et al., 1998). The macrophage Gal-binding lectin, another C-type lectin expressed on macrophages, also preferentially bound Sn on macrophages (Kumamoto et al., 2004). In these instances, Sn apparently serves as a very large carrier of glycan ligands for these lectins, rather than acting as a Sia-binding Siglec.
"Masking" and "unmasking" of Siglec-binding sites on cell surfaces
Siglecs were originally thought to be involved in cell-cell interactions (Crocker et al., 1994; Hanasaki et al., 1994, 1995b), just like selectins, another family of mammalian cell-surface lectins that recognize Sias (Varki, 1997; Lowe, 2003). For example, CD22/Siglec-2 on B cells was assumed to interact with specific counter receptors on the T-cell surface, such as CD45, thus modulating T-B interactions and the ensuing signaling events (Stamenkovic et al., 1991; Sgroi et al., 1993, 1995; Sgroi and Stamenkovic, 1994). However, most Siglecs do not show as strict a glycan recognition specificity as CD22, and vertebrate cell surfaces expressing Siglecs are covered with many Sia-containing glycans (i.e., Siglecs are submerged in many potential ligands expressed in cis). In keeping with this, our group discovered that Siglec Sia-binding sites appear to be "masked" in their native state (Razi and Varki, 1998, 1999; Brinkman-Van der Linden and Varki, 2003). Similar phenomena have been reported by others (Braesch-Andersen and Stamenkovic, 1994; Tropak and Roder, 1997). A possible exception is Sn/Siglec-1, which extends beyond the glycocalyx because of its numerous Ig-like domains (Nakamura et al., 2002





