Glycobiology Advance Access originally published online on May 9, 2007
Glycobiology 2007 17(8):868-876; doi:10.1093/glycob/cwm050
NetCGlyc 1.0: prediction of mammalian C-mannosylation sites
Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE-171 77 Stockholm, Sweden and Stockholm Bioinformatics Center, Albanova, Stockholm University, SE-106 91 Stockholm, Sweden
To whom correspondence should be addressed; Tel: +46 8 52486976; fax: +46 8 313445; e-mail: karin.julenius{at}sbc.su.se
Received on March 21, 2007; revised on May 3, 2007; accepted on May 5, 2007
Abstract
C-mannosylation is the attachment of an α-mannopyranose to a tryptophan via a C–C linkage. The sequence WXXW, in which the first Trp becomes mannosylated, has been suggested as a consensus motif for the modification, but only two-thirds of known sites follow this rule. We have gathered a data set of 69 experimentally verified C-mannosylation sites from the literature. We analyzed these for sequence context and found that, in addition to Trp, Cys is accepted at position +3. We also find a clear preference at position +1, where small and/or polar residues (Ser, Ala, Gly, and Thr) are preferred and Phe and Leu are discriminated against. The Protein Data Bank was searched for structural information, and five structures of C-mannosylated proteins were obtained. We show that modified tryptophan residues are at least partly solvent exposed. A method predicting the location of C-mannosylation sites in proteins was developed using a neural network approach. The best overall network used a 21-residue sequence input window and information on the presence/absence of the WXXW motif. NetCGlyc 1.0 correctly predicts 93% of both positive and negative C-mannosylation sites. This is a significant improvement over the WXXW consensus motif itself, which identifies only 67% of positive sites. NetCGlyc 1.0 is available at http://www.cbs.dtu.dk/services/NetCGlyc/. Using NetCGlyc 1.0, we scanned the human genome and found 2573 exported or transmembrane transcripts with at least one predicted C-mannosylation site. Key words: C-mannosylation / machine learning / neural networks / prediction
Introduction
Among posttranslational modifications, protein glycosylation is more abundant and structurally diverse than all the other types combined (Hart 1992
C-Mannosylation is the attachment of an α-mannopyranosyl residue to the indole C2 of tryptophan via a C–C link (Hofsteenge et al. 1994; de Beer et al. 1995). The first example of glycosylation of a tryptophan residue (with a hexose of unknown type) was discovered in a neuropeptide from a stick insect (Gade et al. 1992). Since then, numerous C-mannosylation sites have been found in mammalian proteins, the first of which was in RNase 2 (Hofsteenge et al. 1994; Furmanek and Hofsteenge 2000). In all mammalian cases, the glycan has been found to be a single α-mannopyranose. The transfer of mannose to the protein is catalyzed by the enzyme C-mannosyltransferase, probably in the endoplasmic reticulum (ER) (Doucey et al. 1998; Perez-Vilar et al. 2004). C-Mannosyltransferase activity toward peptides derived from human RNase has been found in Caenorhabditis elegans, amphibians, birds, and mammals, but not in Escherichia coli, insects, or yeast (Krieg et al. 1997; Doucey et al. 1998; Furmanek and Hofsteenge 2000). At present, little is known about the function of C-mannosylation, but two recent studies indicate that it is probably required for proper folding of Cys subdomains in two mucins (Perez-Vilar et al. 2004) and that it may have a pathological role in diabetic complications under hyperglycemic conditions (Ihara et al. 2005).
A study involving site-directed mutagenesis of RNase 2 showed that the sequence WXXW, in which the first Trp becomes mannosylated, is the specificity determinant for C-mannosylation (Krieg et al. 1998). In thrombospondin repeats, which contain the motif WXXWXXWXXC (in some cases with one or two of the tryptophan residues substituted by other amino acids), C-mannosylation was found on one, two, or all three tryptophans (Hofsteenge et al. 1999). The shortest peptide found so far that is still a substrate for C-mannosyltransferase is WAKW (Hartmann and Hofsteenge 2000). However, in two particular thrombospondin repeats (from complement components C6 and C7), the first tryptophan is replaced by phenylalanine or tyrosine, respectively (Hofsteenge et al. 1999), and two recently discovered C-mannosylation sites in bovine lens fiber membrane intrinsic protein show no relationship at all to the WXXW motif (Ervin et al. 2005). This indicates that although the WXXW motif seems to be a sufficient condition for C-mannosylation, it does not seem to be a necessary one.
According to estimates based on the Swiss–Prot database, more than half of all proteins are glycosylated (Apweiler et al. 1999). However, although human proteins are the most studied of all and only proteins with some experimental verification are present in Swiss–Prot, only approximately 1.7% of human Swiss–Prot entries have experimentally verified glycosylation site information. Prediction methods are needed to bridge the enormous gap between the exponentially increasing number of gene sequences in databases and the linearly increasing number of proteins investigated for posttranslational modifications. Prediction of glycosylation sites is a valuable tool when characterizing a new protein, e.g. for the interpretation of mass spectrometry results. Further, predicted glycosylation sites are one of the important features when predicting orphan protein function (Jensen et al. 2003). Since glycosylation may affect the structure of the protein and occurs primarily in surface-exposed regions, predicted glycosylation sites may be used to improve protein structure prediction as well. Prediction can also be useful in protein engineering to introduce or abolish glycosylation sites and to design competitive inhibitors of glycosyltransferases (Hansen et al. 1998).
We have analyzed experimentally verified C-mannosylation sites with respect to sequence and structure. We have trained a predictor method, NetCGlyc 1.0, which correctly predicts 93% of both positive and negative C-mannosylation sites. This is a significant improvement over the WXXW consensus motif, which identifies only 67% of the positive sites. NetCGlyc 1.0 is publicly available at http://www.cbs.dtu.dk/services/NetCGlyc/. Using NetCGlyc 1.0, we scanned the human genome for predicted C-mannosylation sites.
Results
Sequence analysis
From the literature, we gathered a dataset of 12 native proteins and 27 naturally occurring or engineered mutants/peptides that contain a total of 69 experimentally verified C-mannosylation sites and 88 nonmodified sites. The sequence neighborhood around the sites can be illustrated using sequence logos based on Shannon information content (Schneider and Stephens 1990
The Kullback–Leibler information logo (Figure 1B) is based on both positive and negative sites. Residues over-represented in positive sites are shown as normal letters, and those over-represented in negative sites are shown as upside-down letters. Note that the modified tryptophan residue in the middle is entirely cancelled out, since both positive and negative sites have a tryptophan at that position. Not surprisingly, the strongest preference is again found at position +3, where tryptophan and, to some extent, cysteine are preferred and most other residues are discriminated against. We found that phenylalanine and leucine, both large and hydrophobic, are not tolerated at position +1 of the positive sites. We also found a number of residues at different positions, some surprisingly far from the attachment site, that seem to be inconsistent with C-mannosylation: arginine/lysine at position –9, glutamine at positions –6 and +4, phenylalanine at position –5, histidine at position +5, aspartic acid at position +9, and alanine at position +10. Whether these are true reflections of the requirements for C-mannosylation or a result of insufficient sequence sampling in the dataset is hard to say at this point.
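The upright/upside-down letters of a Kullback–Leibler logo come from the signed terms of a relative entropy computed per aligned position. The function below is a minimal sketch of that calculation for one position, assuming a pseudocount of 1 and an alphabet restricted to the residues observed at that position; both choices are illustrative assumptions, not the exact settings used for Figure 1B. A column where every positive and negative site has Trp, like the central position, yields zero.

```python
import math
from collections import Counter


def kl_column(pos_column, neg_column, pseudocount=1.0):
    """Relative entropy (in bits) of one aligned position, positives vs negatives.

    pos_column / neg_column: residues seen at this position in the positive
    and negative sites. Returns (total_bits, per_residue), where
    per_residue[aa] = p * log2(p / q). Positive terms mark residues
    over-represented in positive sites (upright letters); negative terms mark
    residues over-represented in negative sites (upside-down letters).
    """
    # Assumption: only residues observed at this position form the alphabet.
    alphabet = sorted(set(pos_column) | set(neg_column))
    pos_counts, neg_counts = Counter(pos_column), Counter(neg_column)
    pos_total = len(pos_column) + pseudocount * len(alphabet)
    neg_total = len(neg_column) + pseudocount * len(alphabet)
    per_residue = {}
    for aa in alphabet:
        p = (pos_counts[aa] + pseudocount) / pos_total
        q = (neg_counts[aa] + pseudocount) / neg_total
        per_residue[aa] = p * math.log2(p / q)
    return sum(per_residue.values()), per_residue


# A column where every site has Trp (like the central position) carries no information.
print(kl_column(list("WWWW"), list("WWWW"))[0])  # → 0.0
```

Because both smoothed distributions sum to 1 over the same alphabet, the total is never negative even though individual residue terms can be.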
Structural analysis
Using FeatureMap3D (Wernersson et al. 2006), we were able to identify five nuclear magnetic resonance (NMR) or X-ray structures of C-mannosylated proteins in the worldwide Protein Data Bank (Berman et al. 2006) (Table I). Two of the structures (1SZL and 1LSL) are of thrombospondin repeats. The fold of a thrombospondin repeat contains two β strands along with a third, fairly extended, but not hydrogen-bonded stretch running parallel to the β sheet (Figure 2B) (Tan et al. 2002; Paakkonen et al. 2006). The three potentially glycosylated tryptophans are situated in the non-β stretch. The aromatic rings of the three tryptophans are parallel to each other at a C
–C
distance of 8.3–8.5 Å, which is too long to allow aromatic stacking (π–π interactions). In two particular thrombospondin repeats (from complement components C6 and C7), C-mannosylation is found in this structural context without a true WXXW motif; instead, the first tryptophan is replaced by phenylalanine or tyrosine, respectively.
Two structures, 1EER (Figure 2A) and 1F42 (not shown), show local structures around the C-mannosylation site similar to those of the thrombospondin repeats. Again, the glycosylated tryptophan is situated in a fairly extended, non-hydrogen-bonded stretch running parallel to a β strand (Syed et al. 1998
–C
distance of 8.6 and 8.7 Å, respectively.
One structure, 2BZZ (Figure 2C), shows an entirely different local structure. The two tryptophans are located in an α-helix and rotated so that the aromatic rings are face to edge at a C
–C
distance of 5.1 Å, indicating aromatic stacking between the rings (Baker et al. 2006). The protein has been co-crystallized with a ligand (not shown), but a ligand-free structure, not available in the Protein Data Bank, shows very similar orientations of the tryptophan rings (Mosimann et al. 1996). Unfortunately, no structure was found for lens fiber membrane intrinsic protein, the only protein whose C-mannosylation sites are completely unrelated to the WXXW motif.
On the basis of the available structures, we found that the accessible surface area according to DSSP (Dictionary of Secondary Structure of Proteins) is 30–147 Å² (mean, 71 Å²) for glycosylated tryptophans and 0–85 Å² (mean, 39 Å²) for nonglycosylated tryptophans. Modified tryptophans are thus, on average, more solvent exposed than unmodified ones, and every modified tryptophan is at least partly exposed.
Prediction of C-mannosylation sites
Before developing a predictor using machine learning, we investigated the prediction performance obtained by searching for the simple consensus pattern suggested, WXXW, where the first tryptophan would be glycosylated (Krieg et al. 1998). This is the approach used so far, and a more complex machine learning approach is only worthwhile if it outperforms it. In our dataset of 69 positive and 88 negative sites, the consensus pattern predictor correctly identifies 67% of the positive sites and 93% of the negative sites (see Table II). This means that the consensus rule does not apply to as many as one-third of the positive sites in our data set. Since most experimental studies have so far been directed toward sites that follow the WXXW rule, our data set is, if anything, biased toward such sites, and the fraction of true sites missed by the consensus pattern predictor could therefore be much higher. As a test, we trained neural networks based only on the presence or absence of the WXXW pattern. Not surprisingly, these networks all had predictive performance identical to that of the consensus predictor itself.
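The consensus rule amounts to a simple pattern search. The following minimal sketch (function and variable names are hypothetical) uses a regular-expression lookahead so that overlapping motifs, such as the WXXWXXW stretch of thrombospondin repeats, report every Trp that starts a WXXW.

```python
import re

# Lookahead keeps overlapping matches, so WXXWXXW reports both starting Trps.
WXXW_PATTERN = re.compile(r"(?=W..W)")


def wxxw_sites(sequence):
    """0-based positions of every Trp that starts a WXXW motif."""
    return [match.start() for match in WXXW_PATTERN.finditer(sequence.upper())]


def consensus_predict(sequence, trp_index):
    """Consensus rule: call the Trp at trp_index mannosylated if Trp sits at +3."""
    seq = sequence.upper()
    return (seq[trp_index] == "W"
            and trp_index + 3 < len(seq)
            and seq[trp_index + 3] == "W")


print(wxxw_sites("AWSAWSAW"))  # → [1, 4]
```

Applied to every tryptophan of a labeled data set, `consensus_predict` gives one fixed yes/no answer per site, which is why its performance is a single sensitivity/specificity pair rather than a tunable trade-off.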
To develop a more complex predictor, we used a neural network strategy developed for the prediction of mucin-type glycosylation sites (Julenius et al. 2005
To find the best possible combination of features, we used a greedy strategy, combining the inputs that performed well when training the single-feature networks. We also added the information on the presence/absence of the WXXW motif. For feature combinations that seemed promising, networks with a varying number of hidden neurons (different network complexity) were trained. The best combination was sparse encoding in a 21-residue window together with information on the presence/absence of the WXXW motif, using eight hidden neurons. This network correctly identifies 93% of both the positive and the negative sites (see Table II). Receiver operating characteristic (ROC) curves are widely used to describe the quality of a classification method such as a predictor or a medical diagnostic tool. The ROC curve in Figure 4 shows the trade-off between making many positive predictions, of which some are false, and making fewer predictions and thereby missing some true sites. A curve reaching far into the upper left corner is preferable, whereas completely random assignment would fall along the diagonal. For comparison, the performance of the consensus pattern search is marked with X.
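The points of a ROC curve can be obtained by sweeping a decision threshold over the network output. The sketch below is a generic illustration, not the code behind Figure 4; the function name and the "score >= threshold means positive" convention are assumptions. A binary rule such as the WXXW consensus has no threshold to sweep and therefore produces a single point.

```python
def roc_points(scores, labels):
    """ROC points (false positive rate, true positive rate) for every threshold.

    scores: predictor outputs; labels: 1 for positive sites, 0 for negative.
    A site is called positive when its score >= threshold. Both classes
    must be present in labels.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted
    for threshold in sorted(set(scores), reverse=True):
        called = [y for s, y in zip(scores, labels) if s >= threshold]
        tp = sum(1 for y in called if y)
        fp = len(called) - tp
        points.append((fp / n_neg, tp / n_pos))
    return points


print(roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
# → [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
```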
Scanning the human genome
All human transcripts with signal peptides and/or transmembrane helices were scanned with NetCGlyc 1.0 for predicted C-mannosylation sites. Since C-mannosylation occurs in the ER, only tryptophans either in extracellular proteins or on the extracellular side of membrane proteins can be mannosylated. Of the 14 554 downloaded transcripts, 2573 (18%) were predicted to contain at least one C-mannosylation site. These proteins were investigated for gene ontology (GO) annotation, and the results are shown in Table III. An enrichment factor >1 indicates that the term is over-represented for the C-mannosylated proteins. Of the 3713 predicted sites, 1366 were located at the first tryptophan in a WXXW motif, 214 were located at the second tryptophan in a WXXW motif, and 2133 were found in different sequence contexts.
Investigating proteins with more than five predicted sites, we found that proteins with thrombospondin repeats are highly over-represented (e.g. semaphorins, brain-specific angiogenesis inhibitors, and ADAMTSs), as would be expected. More surprisingly, we also found many proteins related to the low-density lipoprotein (LDL) receptor. Looking more closely at this class of proteins, we find that a substantial number of LDL-receptor class B repeats, also called YWTD repeats, have an additional tryptophan, making the repeated sequence YWTDW. According to PROSITE (http://www.expasy.org/prosite/), there are 47 such YWTDW repeats in the human proteome, and our predictor predicts most of these to be positive for C-mannosylation. There are three available crystal structures (PDB IDs 1IJQ, 1NPE, and 1N7D) of LDL-receptor class B repeats from two different proteins (human LDL receptor and mouse nidogen 1). In both proteins, six repeats are packed very closely together in a six-bladed β-propeller (Jeon et al. 2001
One of the characteristic structural features of type I cytokine receptors is a WSXWS motif in the C-terminal domain (Bazan 1990). This motif has, at least in the case of the erythropoietin receptor, been shown to be C-mannosylated (Furmanek et al. 2003). We extracted 29 human protein sequences with annotated WSXWS motifs from Swiss–Prot and predicted their C-mannosylation sites using NetCGlyc 1.0. Twenty-seven of the 29 proteins have at least one predicted site. The two exceptions were growth hormone receptor (P10912) and interleukin-3 receptor alpha chain (P26951), both with degenerate motifs (YGEFS and LSAWS, respectively). Interestingly, several receptors contain more than one predicted C-mannosylation site. Interleukin-6 receptor subunit beta (P40189), leptin receptor (P48357), and leukemia inhibitory factor receptor (P42702) each contain as many as four predicted sites and what seem to be two WSXWS motifs. Type I cytokine receptors are classified as GO:0004896 (hematopoietin/interferon class cytokine receptor activity), which explains the high enrichment factor (4.09) of this GO term among the human transcripts predicted to be C-mannosylated (Table III).
Discussion
The structural analysis indicates that aromatic stacking may play a role in the substrate recognition of C-mannosyltransferase, at least in the case of substrates that contain the WXXW motif. Modified tryptophan residues are typically at least partly solvent exposed, whereas nonmodified tryptophans may be completely buried in the interior of the protein. Previous studies have shown C-mannosylation to take place very early, probably even before the folding of the protein (Doucey et al. 1998
The results of training on predicted features (Figure 3) agree with the results of the structural analysis. That predicted surface accessibility proved to be good input information for the network can be explained by glycosylated tryptophans being more solvent accessible than unmodified tryptophans. Predicted disorder according to the DSSP loop–coil definition was much better input information than either of the other two predicted disorder measures. In four of the five available structures, the glycosylated tryptophan is located in a fairly extended, non-hydrogen-bonded stretch. These stretches are classified as loop or coil according to the DSSP definition, but are not particularly disordered according to the two other definitions, which require loops/coils to have elevated temperature factors ("hot loops") or atom coordinates to be missing from the structure. It is hardly surprising that predictions based on a disorder definition that applies to a large part of the glycosylated tryptophans are good input information for the predictor network.
We were able to develop a predictor that predicts more sites than the WXXW consensus rule (higher sensitivity) without making any additional false predictions. Obtaining higher sensitivity without loss of specificity is usually very difficult, but it can probably be explained by the substantial sequence information present at various positions of the aligned sites (Figure 1) in addition to the tryptophan at position +3, which the network is able to exploit. Nevertheless, NetCGlyc 1.0 is expected to work best on WXXW-related sites, since most of the training examples were of this type. If future experiments show that C-mannosylation is also common in other sequence contexts, NetCGlyc will be retrained to accommodate this.
By training a predictor, NetCGlyc 1.0, and making it publicly available at our web page, www.cbs.dtu.dk/services, alongside our other predictors for different types of glycosylation sites, we hope to bring attention to this newly discovered type of glycosylation. The glycan is very small, only one hexose, which is probably why the modification remained undiscovered for so long. Unlike the large glycans of N-glycosylation and proteoglycans, or the numerous glycans of a mucin, a single hexose would not change the migration rate on sodium dodecyl sulphate–polyacrylamide gel electrophoresis enough to attract attention to its presence. The two newly discovered sites in lens fiber membrane intrinsic protein (Ervin et al. 2005) indicate that although tryptophan is the rarest of the amino acid residues, its modification with α-mannopyranose does not require the presence of a WXXW motif and may actually be more common than we think.
Materials and methods
Dataset
Experimentally verified mammalian C-mannosylation sites were extracted from O-GlycBase v6.00 (www.cbs.dtu.dk/databases/OGLYCBASE/) (Gupta et al. 1999
Neural network training
For readability, this section was shortened to suit the average reader of Glycobiology. The Supplementary data provide details of sequence encoding, feature encoding, and neural networks.
A neural network cannot process letters directly, so the amino acid sequence and the different features must be translated into numbers. This is called encoding and can be done in a number of ways. Each number presented to the neural network constitutes an input neuron. The goal is to provide the network with as much information as possible while keeping the number of input neurons as low as possible.
- Sparse encoding (Qian and Sejnowski 1988; Hertz et al. 1991) is the conventional way to convert an amino acid sequence into numerical form: each residue is represented by one input per amino acid type, of which only the one matching the residue is set to 1.
- With profile encoding, the input for each amino acid consisted of the corresponding row in the BLOSUM62 matrix (Henikoff and Henikoff 1992).
- With PSI–BLAST encoding, the input for each amino acid consisted of the corresponding row in the position-specific scoring matrix computed from three cycles of PSI–BLAST (Altschul et al. 1997).
- The amino acid composition was calculated for a sequence window around each particular site.
- Surface accessibility was predicted using a neural network method called surfg (Hansen et al. 1998).
- Secondary structure was predicted using PSIPRED (Jones 1999; McGuffin et al. 2000) with position-specific scoring matrices computed from three cycles of PSI–BLAST (Altschul et al. 1997).
- Protein disorder was predicted using DisEMBL (Linding et al. 2003). DisEMBL predicts disorder according to three different definitions: (1) loops–coils as defined by DSSP (Kabsch and Sander 1983); (2) hot loops, i.e. DSSP loops with a high degree of mobility as determined from Cα temperature factors; and (3) missing coordinates in X-ray structures.
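The best-performing input reported in Results (sparse encoding of a 21-residue window plus the presence/absence of WXXW) can be sketched as follows. The extra 21st unit per position, set when a window position falls outside the sequence, is an assumption for illustration; the actual encoding details are given in the Supplementary data, and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sparse_encode_window(sequence, center, half_width=10):
    """Sparse (one-hot) encoding of the window centred on sequence[center].

    Each window position uses 21 inputs: one per standard amino acid plus an
    extra unit (an assumption) that is set when the position lies outside the
    sequence. Non-standard residues such as 'X' leave all 21 inputs at 0.
    """
    inputs = []
    for i in range(center - half_width, center + half_width + 1):
        unit = [0] * (len(AMINO_ACIDS) + 1)
        if 0 <= i < len(sequence):
            residue = sequence[i].upper()
            if residue in AMINO_ACIDS:
                unit[AMINO_ACIDS.index(residue)] = 1
        else:
            unit[-1] = 1
        inputs.extend(unit)
    return inputs


def encode_site(sequence, center):
    """Window encoding plus one input flagging a Trp at position +3 (WXXW)."""
    has_wxxw = center + 3 < len(sequence) and sequence[center + 3].upper() == "W"
    return sparse_encode_window(sequence, center) + [1 if has_wxxw else 0]


vector = encode_site("MWAKWL", 1)
print(len(vector))  # → 442 (21 positions x 21 units + 1 WXXW flag)
```

Sparse encoding keeps amino acids strictly independent of one another, at the cost of 21 inputs per window position; this is the trade-off between information and input size mentioned above.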
The neural networks were of the two-layer feed-forward type, trained by standard back-propagation. Network complexity was varied by changing the number of neurons in the input layer as well as in the hidden layer to find the optimal complexity for this particular prediction problem. This is important, since a network with too little complexity (too few neurons) will lack the ability to learn the training examples, whereas a network with too much complexity (too many neurons) will learn the examples too well and lose the ability to make predictions for examples that were not in the training set (the ability to generalize). This second problem, called over-training (or overfitting), is one of the reasons why it is so important that the examples in the test set are different from and unrelated to those in the training set. If the sets are unrelated, performance on the test set decreases when over-training occurs, so the problem can be detected and avoided. The risk of over-training increases as the data set gets smaller.
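To illustrate the architecture, the pure-Python sketch below implements a two-layer feed-forward network (one sigmoid hidden layer, one sigmoid output neuron) trained by standard back-propagation on squared error. The class name, learning rate, weight initialization, and error function are assumptions for illustration, not the settings used to train NetCGlyc 1.0.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TwoLayerNet:
    """Inputs -> sigmoid hidden layer -> single sigmoid output neuron."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last weight of each row acts as a bias (its input is fixed at 1).
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, xb)))
                  for row in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def train_step(self, x, target, learning_rate=0.5):
        """One back-propagation update on E = 0.5 * (out - target)**2; returns E."""
        hidden, out = self.forward(x)
        xb, hb = list(x) + [1.0], hidden + [1.0]
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas must use the output weights from before this update.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        self.w_out = [w - learning_rate * delta_out * h
                      for w, h in zip(self.w_out, hb)]
        self.w_hidden = [[w - learning_rate * d * xi for w, xi in zip(row, xb)]
                         for row, d in zip(self.w_hidden, delta_hidden)]
        return 0.5 * (out - target) ** 2


# Toy usage: learn to separate two input patterns.
net = TwoLayerNet(n_inputs=2, n_hidden=8, seed=1)
toy_data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]
for epoch in range(2000):
    for x, target in toy_data:
        net.train_step(x, target)
```

Varying `n_hidden` is the complexity knob discussed above; for a 21-residue window with 21 sparse units per position plus one WXXW flag (an assumed encoding), the best reported architecture would correspond to `TwoLayerNet(n_inputs=21 * 21 + 1, n_hidden=8)`.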
The predictive performance was monitored during training and testing of the networks using the Matthews correlation coefficient (Matthews 1975):
$$C = \frac{tp \cdot tn - fp \cdot fn}{\sqrt{(tn + fn)(tn + fp)(tp + fn)(tp + fp)}}$$
where tp is the number of correctly predicted positive sites (true positives), tn the number of correctly predicted negative sites (true negatives), fn the number of sites falsely predicted to be negative (false negatives), and fp the number of sites falsely predicted to be positive (false positives). The Matthews correlation coefficient will always be a value between –1 and 1, where a predictor that is always wrong will have a correlation coefficient of –1, one that is always right will have a correlation coefficient of 1, and one that makes random guesses will have a correlation coefficient of 0. It takes into account the performance on both the positive and the negative sites and is widely used for classification problems such as this one.
The fraction of positive sites correctly predicted, the positive site sensitivity Sn,pos, was computed as
$$S_{n,\mathrm{pos}} = \frac{tp}{tp + fn}$$
The fraction of all positive classifications that are correct, here called the specificity Sp (often also called precision or positive predictive value), was computed as
$$S_p = \frac{tp}{tp + fp}$$
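The three performance measures defined above can be computed directly from paired predictions and labels. This minimal sketch uses hypothetical function names; returning 0 when the Matthews denominator is zero is a common convention and an assumption here.

```python
import math


def confusion_counts(predicted, actual):
    """Return (tp, tn, fp, fn) for paired 0/1 predictions and true labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    return tp, tn, fp, fn


def matthews_cc(tp, tn, fp, fn):
    denom = math.sqrt((tn + fn) * (tn + fp) * (tp + fn) * (tp + fp))
    # Assumed convention: 0 when any row/column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def sensitivity_pos(tp, fn):
    """Sn,pos: fraction of positive sites predicted positive."""
    return tp / (tp + fn)


def specificity_sp(tp, fp):
    """Sp as defined in the text: fraction of positive calls that are correct."""
    return tp / (tp + fp)


print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # → (2, 1, 1, 1)
```

For these toy counts the correlation coefficient is (2·1 − 1·1)/√(2·2·3·3) = 1/6, and both Sn,pos and Sp equal 2/3.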
A region of 21 residues around each (positive or negative) site was extracted (10 amino acids on each side of the tryptophan). The sites were aligned according to the central tryptophan, and an unrooted neighbor-joining tree was constructed using CLUSTAL X (Thompson et al. 1997). From this tree, groups of closely related sites were identified. These groups were combined into six larger sets of roughly equal size, each made up of one or more groups and each containing both positive and negative sites. Sequence identity between sites belonging to different sets did not exceed 50%. Every network was trained six times, each time using five sets for training and the remaining set for testing. The reported cross-validation performance is the joint performance of the six resulting networks on their respective test sets.
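Once the six homology-separated sets exist, the cross-validation loop itself is short. The sketch below assumes the sets have already been built (the neighbor-joining grouping is not reproduced), and `train_and_predict` is a hypothetical stand-in for training one network and returning its prediction function.

```python
def leave_one_set_out(sets):
    """Yield (training_sites, test_sites) once per set, holding that set out."""
    for k, test_set in enumerate(sets):
        training = [site for j, other in enumerate(sets) if j != k for site in other]
        yield training, list(test_set)


def pooled_cross_validation(sets, train_and_predict):
    """Concatenate each held-out set's predictions into one joint result.

    sets: list of lists of (site, label) pairs, already homology-partitioned.
    train_and_predict: hypothetical callable that trains on a list of
    (site, label) pairs and returns a function mapping a site to 0/1.
    """
    predictions, labels = [], []
    for training, test_set in leave_one_set_out(sets):
        predict = train_and_predict(training)
        for site, label in test_set:
            predictions.append(predict(site))
            labels.append(label)
    return predictions, labels


def always_positive(training):
    """Hypothetical dummy trainer: ignores the data, predicts every site positive."""
    return lambda site: 1


toy_sets = [[("site_a", 1)], [("site_b", 0)], [("site_c", 1)]]
print(pooled_cross_validation(toy_sets, always_positive))  # → ([1, 1, 1], [1, 0, 1])
```

Pooling the held-out predictions before computing the Matthews coefficient, rather than averaging six per-set coefficients, matches the "joint performance" described above.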
Scanning the human genome
Sequences and their GO annotations for all human protein transcripts (build NCBI36) with a signal peptide and/or transmembrane helices were downloaded from http://www.ensembl.org using the EnsMart system. "Cellular component" GO terms were ignored. Since some GO terms occur more frequently than others, we compared the occurrence of each GO term among the proteins predicted to be C-mannosylated with its occurrence among all the protein transcripts. The enrichment factor was calculated as the ratio between the occurrence of the term among the C-mannosylated sequences and its occurrence in a random sample of the same size. An enrichment factor >1 indicates that the term is over-represented among the C-mannosylated proteins.
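The enrichment factor can be sketched as below, assuming the "random sample of the same size" is represented by its expected count (number of predicted transcripts times the fraction of all transcripts carrying the term) rather than by drawing an actual sample. That simplification, the function name, and the GO identifiers in the example are assumptions for illustration.

```python
from collections import Counter


def enrichment_factors(annotations, predicted_ids):
    """Enrichment factor per GO term among predicted C-mannosylated transcripts.

    annotations: transcript id -> set of GO term ids, for every scanned transcript.
    predicted_ids: ids of transcripts with at least one predicted site.
    The expected count for a random sample of the same size is taken as
    len(predicted_ids) * (fraction of all transcripts annotated with the term).
    """
    n_all, n_predicted = len(annotations), len(predicted_ids)
    all_counts = Counter(term for terms in annotations.values() for term in terms)
    predicted_counts = Counter(term for tid in predicted_ids for term in annotations[tid])
    return {term: predicted_counts[term] / (n_predicted * count / n_all)
            for term, count in all_counts.items()}


toy = {"t1": {"GO:A"}, "t2": {"GO:A", "GO:B"}, "t3": {"GO:B"}, "t4": set()}
print(enrichment_factors(toy, ["t1", "t2"]))  # → {'GO:A': 2.0, 'GO:B': 1.0}
```

Here GO:A occurs in both predicted transcripts but would be expected in only one of two random transcripts, giving a factor of 2; GO:B occurs exactly as often as expected.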
Proteins with annotated WSXWS motifs were extracted from Swiss–Prot by searching for the term "WSXWS motif." in the "features" section of all entries. Human proteins were identified using the last part of the entry name, which is "_HUMAN" for human proteins. In total, 29 human type I cytokine receptors were identified in this way.
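The Swiss–Prot selection can be approximated on a flat-file dump. This sketch assumes the standard flat-file layout (an `ID` line whose second field is the entry name, `FT` feature lines, and `//` closing each entry) and simply looks for the text "WSXWS motif" on any FT line, which tolerates differences in feature-line layout between releases. The entry names in the example are hypothetical.

```python
def human_wsxws_entries(flatfile_text):
    """Entry names ending in _HUMAN that have an FT line mentioning a WSXWS motif."""
    hits = []
    entry_name, has_motif = None, False
    for line in flatfile_text.splitlines():
        if line.startswith("ID   "):
            entry_name, has_motif = line.split()[1], False
        elif line.startswith("FT   ") and "WSXWS motif" in line:
            has_motif = True
        elif line.startswith("//"):
            if entry_name and entry_name.endswith("_HUMAN") and has_motif:
                hits.append(entry_name)
            entry_name, has_motif = None, False
    return hits


example = (
    "ID   TEST1_HUMAN   Reviewed;   100 AA.\n"
    "FT   MOTIF   50   54   WSXWS motif.\n"
    "//\n"
    "ID   TEST1_MOUSE   Reviewed;   100 AA.\n"
    "FT   MOTIF   50   54   WSXWS motif.\n"
    "//\n"
    "ID   TEST2_HUMAN   Reviewed;   80 AA.\n"
    "//\n"
)
print(human_wsxws_entries(example))  # → ['TEST1_HUMAN']
```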
Supplementary data
Supplementary data are available at Glycobiology online (http://www.glycob.oxfordjournals.org/).
Conflict of interest statement
None declared.
Acknowledgments
The author thanks Kristoffer Rapacki for technical assistance in making the web predictor operational, Anne Mølgaard for help with the analysis of protein structures, and Timo Pikkarainen for critical reading of the manuscript. This work was supported by the Knut and Alice Wallenberg foundation.
Abbreviations
DSSP, Dictionary of Secondary Structure of Proteins; ER, endoplasmic reticulum; GO, gene ontology; LDL, low-density lipoprotein; NMR, nuclear magnetic resonance; PSI–BLAST, position-specific iterated basic local alignment search tool; RNase 2, human ribonuclease 2; ROC curve, receiver operating characteristic curve.
References
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI–BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 25:3389–3402.
Apweiler R, Hermjakob H, Sharon N. On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta (1999) 1473:4–8.
Baker MD, Holloway DE, Swaminathan GJ, Acharya KR. Crystal structures of eosinophil-derived neurotoxin (EDN) in complex with the inhibitors 5'-ATP, Ap3A, Ap4A, and Ap5A. Biochemistry (2006) 45:416–426.
Bazan JF. Structural design and molecular evolution of a cytokine receptor superfamily. Proc Natl Acad Sci USA (1990) 87:6934–6938.
Berman H, Henrick K, Nakamura H, Markley JL. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res (2006) 35:D301–303.
Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res (2003) 31:365–370.
de Beer T, Vliegenthart JF, Loffler A, Hofsteenge J. The hexopyranosyl residue that is C-glycosidically linked to the side chain of tryptophan-7 in human RNase Us is alpha-mannopyranose. Biochemistry (1995) 34:11785–11789.
Doucey MA, Hess D, Cacan R, Hofsteenge J. Protein C-mannosylation is enzyme-catalysed and uses dolichyl-phosphate-mannose as a precursor. Mol Biol Cell (1998) 9:291–300.
Ervin LA, Ball LE, Crouch RK, Schey KL. Phosphorylation and glycosylation of bovine lens MP20. Invest Ophthalmol Vis Sci (2005) 46:627–635.
Furmanek A, Hess D, Rogniaux H, Hofsteenge J. The WSAWS motif is C-hexosylated in a soluble form of the erythropoietin receptor. Biochemistry (2003) 42:8452–8458.
Furmanek A, Hofsteenge J. Protein C-mannosylation: facts and questions. Acta Biochim Pol (2000) 47:781–789.
Gade G, Kellner R, Rinehart KL, Proefke ML. A tryptophan-substituted member of the AKH/RPCH family isolated from a stick insect corpus cardiacum. Biochem Biophys Res Commun (1992) 189:1303–1309.
Gupta R, Birch H, Rapacki K, Brunak S, Hansen JE. O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins. Nucleic Acids Res (1999) 27:370–372.
Hansen JE, Lund O, Tolstrup N, Gooley AA, Williams KL, Brunak S. NetOglyc: prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility. Glycoconj J (1998) 15:115–130.
Hart GW. Glycosylation. Curr Opin Cell Biol (1992) 4:1017–1023.
Hartmann S, Hofsteenge J. Properdin, the positive regulator of complement, is highly C-mannosylated. J Biol Chem (2000) 275:28569–28574.
Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA (1992) 89:10915–10919.
Hertz J, Krogh A, Palmer R. Introduction to the theory of neural computation. (1991) Redwood City, CA: Addison-Wesley.
Hofsteenge J, Blommers M, Hess D, Furmanek A, Miroshnichenko O. The four terminal components of the complement system are C-mannosylated on multiple tryptophan residues. J Biol Chem (1999) 274:32786–32794.
Hofsteenge J, Muller DR, de Beer T, Loffler A, Richter WJ, Vliegenthart JF. New type of linkage between a carbohydrate and a protein: C-glycosylation of a specific tryptophan residue in human RNase Us. Biochemistry (1994) 33:13524–13530.
Ihara Y, Manabe S, Kanda M, Kawano H, Nakayama T, Sekine I, Kondo T, Ito Y. Increased expression of protein C-mannosylation in the aortic vessels of diabetic Zucker rats. Glycobiology (2005) 15:383–392.
Jensen LJ, Gupta R, Staerfeldt HH, Brunak S. Prediction of human protein function according to Gene Ontology categories. Bioinformatics (2003) 19:635–642.
Jeon H, Meng W, Takagi J, Eck MJ, Springer TA, Blacklow SC. Implications for familial hypercholesterolemia from the structure of the LDL receptor YWTD-EGF domain pair. Nat Struct Biol (2001) 8:499–504.
Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol (1999) 292:195–202.
Julenius K, Molgaard A, Gupta R, Brunak S. Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites. Glycobiology (2005) 15:153–164.
Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers (1983) 22:2577–2637.
Kesmir C, van Noort V, de Boer RJ, Hogeweg P. Bioinformatic analysis of functional differences between the immunoproteasome and the constitutive proteasome. Immunogenetics (2003) 55:437–449.
Krieg J, Glasner W, Vicentini A, Doucey MA, Loffler A, Hess D, Hofsteenge J. C-Mannosylation of human RNase 2 is an intracellular process performed by a variety of cultured cells. J Biol Chem (1997) 272:26687–26692.
Krieg J, Hartmann S, Vicentini A, Glasner W, Hess D, Hofsteenge J. Recognition signal for C-mannosylation of Trp-7 in RNase 2 consists of sequence Trp-x-x-Trp. Mol Biol Cell (1998) 9:301–309.
Linding R, Jensen LJ, Diella F, Bork P, Gibson TJ, Russell RB. Protein disorder prediction: implications for structural proteomics. Structure (2003) 11:1453–1459.
Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta (1975) 405:442–451.
McGuffin LJ, Bryson K, Jones DT. The PSIPRED protein structure prediction server. Bioinformatics (2000) 16:404–405.
Mosimann SC, Newton DL, Youle RJ, James MN. X-ray crystallographic structure of recombinant eosinophil-derived neurotoxin at 1.83 Å resolution. J Mol Biol (1996) 260:540–552.
Ohtsubo K, Marth JD. Glycosylation in cellular mechanisms of health and disease. Cell (2006) 126:855–867.
Paakkonen K, Tossavainen H, Permi P, Rakkolainen H, Rauvala H, Raulo E, Kilpelainen I, Guntert P. Solution structures of the first and fourth TSR domains of F-spondin. Proteins (2006) 64:665–672.
Perez-Vilar J, Randell SH, Boucher RC. C-Mannosylation of MUC5AC and MUC5B Cys subdomains. Glycobiology (2004) 14:325–337.
Qian N, Sejnowski TJ. Predicting the secondary structure of globular proteins using neural network models. J Mol Biol (1988) 202:865–884.
Rudenko G, Henry L, Henderson K, Ichtchenko K, Brown MS, Goldstein JL, Deisenhofer J. Structure of the LDL receptor extracellular domain at endosomal pH. Science (2002) 298:2353–2358.
Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res (1990) 18:6097–6100.
Seitz O. Glycopeptide synthesis and the effects of glycosylation on protein structure and activity. Chembiochem (2000) 1:214–246.
Spiro RG. Protein glycosylation: nature, distribution, enzymatic formation, and disease implications of glycopeptide bonds. Glycobiology (2002) 12.
Syed RS, Reid SW, Li C, Cheetham JC, Aoki KH, Liu B, Zhan H, Osslund TD, Chirino AJ, Zhang J, et al. Efficiency of signalling through cytokine receptors depends critically on receptor orientation. Nature (1998) 395:511–516.
Takagi J, Yang Y, Liu JH, Wang JH, Springer TA. Complex between nidogen and laminin fragments reveals a paradigmatic beta-propeller interface. Nature (2003) 424:969–974.
Tan K, Duquette M, Liu JH, Dong Y, Zhang R, Joachimiak A, Lawler J, Wang JH. Crystal structure of the TSP-1 type 1 repeats: a novel layered fold and its biological implication. J Cell Biol (2002) 159:373–382.
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res (1997) 25:4876–4882.
Varki A. Biological roles of oligosaccharides: all of the theories are correct. Glycobiology (1993) 3:97–130.
Wernersson R, Rapacki K, Staerfeldt HH, Sackett PW, Molgaard A. FeatureMap3D—a tool to map protein features and sequence conservation onto homologous structures in the PDB. Nucleic Acids Res (2006) 34:W84–W88.
Yoon C, Johnston SC, Tang J, Stahl M, Tobin JF, Somers WS. Charged residues dominate a unique interlocking topography in the heterodimeric cytokine interleukin-12. EMBO J (2000) 19:3530–3541.