Glycobiology Advance Access originally published online on July 24, 2003
Glycobiology, 2003, Vol. 13, No. 10 707-712
© 2003 Oxford University Press
Fold recognition analysis of glycosyltransferase families: further members of structural superfamilies
2 Embrapa Genetic Resources and Biotechnology, Cenargen/Embrapa, Brasilia-DF, Brazil; and 3 Universidade Católica de Brasília, Pós-Graduação em Ciências Genômicas e Biotecnologia, SGAN Quadra 916, Módulo B, Av. W5 Norte, 70.790-160 Brasília-DF, Brazil
Received on April 24, 2003; revised on June 27, 2003; accepted on June 27, 2003
| Abstract |
|---|
Glycosyltransferases (GTs) are diverse enzymes organized into 65 families. X-ray crystallography and in silico studies have shown many of these to belong to two structural superfamilies: GT-A and GT-B. Through application of fold recognition and iterated sequence searches, we demonstrate that families 60, 62, and 64 may also be grouped into the GT-A fold superfamily. Analysis of conserved acidic residues suggests that catalytic sites are better conserved in superfamily GT-B than in GT-A. Although 26% and 29% of GT families may now be confidently placed in superfamilies GT-A and GT-B, respectively, the remaining 45% of families bear no discernible resemblance to either superfamily, which, given the sensitivity of modern fold recognition methods, suggests the existence of novel structural scaffolds associated with GT activity. Furthermore, bioinformatics studies indicate the apparent ease with which mechanism (inverting or retaining) may change during evolution.
Key words: evolutionary relationships / fold recognition / glycosyltransferases / MurG / SpsA
| Introduction |
|---|
Glycosyltransferases (GTs; EC 2.4..) constitute a large group of enzymes that are involved in the biosynthesis of oligosaccharides and polysaccharides and that act through the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. Particularly abundant is a group of enzymes, present in both prokaryotes and eukaryotes, that utilize an activated nucleotide sugar as a donor and play significant roles in important biological processes (Verbert and Cacan, 1999).
As of June 2003, the carbohydrate active enzymes (CAZY) classification contains 65 GT families, defined on the basis of sequence similarity (Coutinho and Henrissat, 1999; Coutinho et al., 2003). With the ongoing determination of GT structures (for review, see Unligil and Rini, 2000) and computational analyses (Wrabl and Grishin, 2001; Breton et al., 2002), a picture has emerged of two GT superfamilies, each containing various families, which do not necessarily share significant sequence similarity. The most studied family within superfamily GT-A is family 2, which contains the inverting glycosyltransferase SpsA from Bacillus subtilis (Figure 1) (Charnock and Davies, 1999; Tarbouriech et al., 2001). This enzyme acts in spore coat formation, and its homologs include cellulose synthase and numerous proteins involved in bacterial cell surface glycosylation (as reviewed by Unligil and Rini, 2000). The structure of SpsA is a single domain consisting of parallel β-strands flanked on either side by α-helices (Figure 1) (Charnock and Davies, 1999). Within GT-B, family 28 has received most attention, particularly the MurG protein. This enzyme is an N-acetylglucosaminyltransferase involved in the intracellular phase of bacterial peptidoglycan biosynthesis that catalyzes the transfer of N-acetyl-D-glucosamine (GlcNAc) from UDP-GlcNAc to the C4 hydroxyl group of a lipid-linked N-acetylmuramoyl pentapeptide (Ikeda et al., 1990). The structure of MurG contains two domains, each of the Rossmann fold α/β open sheet structure, separated by a deep cleft in which the substrates bind (Figure 1) (Ha et al., 2000; Hu et al., 2003).
Here we present a fold recognition investigation of GT families whose catalytic domain architecture is still unknown, through which families 60, 62, and 64 were assigned to the GT-A fold superfamily. The failure to assign folds to catalytic domains in the remaining families, despite the application of a battery of modern fold recognition methods, suggests that, contrary to the general supposition, other folds are likely to be associated with GT activity. Surprisingly, just a single iteration of PSI-BLAST is required to demonstrate homology between inverting and retaining GT families.
| Results and discussion |
|---|
Fold recognition analysis and evolutionary relationships by PSI-BLAST
Representative sequences (see Table I) from GT families for which the catalytic domain structure was not known were submitted for analysis at the Meta-Server (Bujnicki et al., 2001a).
Fold recognition analyses showed that families 12, 16, 21, 25, 27, 45, 54, 55, 60, 62, and 64 matched the structure of SpsA (PDB: 1QG8), a GT from family 2, which belongs to superfamily GT-A. Relations between family 2 and families 12, 16, 21, 25, 27, 45, 54, and 55 have been previously noted (Breton et al., 2002).
A striking conclusion is that even after comprehensive fold recognition, no folds could be assigned to 45% of GT families, containing 39% of known GT sequences. There are two alternative explanations. First, a large number of extremely divergent GT families, in fact possessing known GT folds, may be present in the CAZY database, with catalytic domain folds that are not identifiable even after the application of advanced fold recognition tools (Fischer and Rychlewski, 2003). However, given the success of fold recognition when applied to carbohydrate active enzymes (Rigden and Franco, 2002; Rigden, 2002), it is perhaps more reasonable to suppose the existence of one or more different folds, not currently associated with GT activity. Given the importance of GTs in general (Unligil and Rini, 2000), these would be very significant targets for structural determination.
Because fold recognition can produce significant results for structural analogs as well as distant structural homologs, we sought further evidence regarding evolutionary relationships with sensitive sequence comparisons carried out using PSI-BLAST. The results (Figure 2) provide evidence that most of the families recently identified as members of GT-A (Breton et al., 2002) (Table I) share a common evolutionary origin, although in four cases (families 16, 54, 55, and 60) no relationships could be established (Figure 2). None of the illustrated families leaves the network when only results obtained at the more conservative E-value threshold of 0.001 are considered.
Another important question relates to the evolution of the enzymatic mechanism: with what ease can inverting enzymes evolve into retaining ones, and vice versa? As shown in Figure 2, a single round of PSI-BLAST was sufficient to demonstrate a relationship between family 2 (inverting) and family 45 (retaining). Similarly, within superfamily GT-B, just two iterations of PSI-BLAST at an E-value threshold of 0.001 are necessary to demonstrate a relationship between retaining family 5 and inverting family 19 (Wrabl and Grishin, 2001).
Conservation of residues important to enzymatic activity
Current knowledge of GT mechanisms suggests the obligatory involvement of conserved catalytic acidic residues in inverting enzymes. However, as reviewed by Davies and Henrissat (2002), the mechanism of retaining GTs remains obscure. A mechanism analogous to that of retaining glycoside hydrolases would involve a conserved basic residue; alternative mechanisms might not, although stabilization of the positively charged intermediates might be effectively carried out by acidic residues. It has also often proved difficult, particularly within superfamily GT-B, to locate these catalytic residues, even with the benefit of structural information. For example, based on the structure of the inverting enzyme MurG complexed to UDP-GlcNAc, several acidic residues were mutated, but the loss of none of these abolished catalytic activity (Hu et al., 2003). Thus no aspartate or glutamate has so far been definitively identified as catalytic in the GT-B fold superfamily.
Using the fold recognition alignments, totally (100%) and strongly (90%) conserved residues in GT families were identified and their positions compared within superfamilies (Figure 3A, 3B). In this way, tendencies in conserved acidic residue positioning were sought that might help locate the catalytic acidic residues. However, three possible complications had to be borne in mind: inaccuracies in the fold recognition alignments, existence of divergent nonexpressed genome sequences, and presence of noncatalytic proteins within GT families (Unligil and Rini, 2000). It was also necessary to identify conserved acidic residues with known noncatalytic roles.
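The conservation screen described above can be illustrated with a minimal Python sketch. It assumes a family alignment is available as a list of equal-length gapped strings and treats Asp and Glu as one acidic class, labelling a column "total" at 100% and "strong" at 90% or more. The function names and the toy alignment are hypothetical and are not taken from the study.

```python
ACIDIC = frozenset("DE")  # Asp and Glu, treated as one class (an assumption)


def acidic_fractions(alignment):
    """Fraction of sequences carrying D or E at each alignment column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    n = len(alignment)
    return [
        sum(seq[col].upper() in ACIDIC for seq in alignment) / n
        for col in range(length)
    ]


def classify_columns(alignment, strong=0.9):
    """Map 0-based column index to 'total' (100%) or 'strong' (>= strong)."""
    labels = {}
    for col, fraction in enumerate(acidic_fractions(alignment)):
        if fraction == 1.0:
            labels[col] = "total"
        elif fraction >= strong:
            labels[col] = "strong"
    return labels


# Hypothetical toy alignment of ten three-residue sequences.
toy = ["DDD"] * 5 + ["DEA"] * 4 + ["DAA"]
labels = classify_columns(toy)
```

Gap characters count as non-acidic here, so a poorly aligned region lowers the score for a column rather than hiding it, which is one way the alignment inaccuracies mentioned above would show up.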
Within the GT-A superfamily there is a strong candidate for the catalytic residue: Asp191 (SpsA numbering; Tarbouriech et al., 2001).
Another conserved Asp (numbered 39 in SpsA), located at the end of strand β-2 (Figures 1 and 3A), is involved in nucleotide binding, interacting with the uracil moiety by making a hydrogen bond to N3 of the uracil base (Charnock and Davies, 1999). Figure 3A demonstrates that only families 12 and 16 contain fully conserved aspartic acids in the vicinity. The remaining inverting families, with the exception of family 54, have a 90% conserved Asp and/or Glu. In family 54, 75% of sequences contain a conserved Glu residue in this same position. The retaining families 27, 45, and 55 all have conserved acidic residues for binding UDP. In contrast, families 60, 62, and 64 do not show any conservation of acidic residues at this position (Figure 3A), although some of these similarly bind UDP nucleotide compounds. Presumably other specificity-conferring mechanisms function in these families.
GT-A also contains a motif known as the DxD motif (Wiggins and Munro, 1998; Shibayama et al., 1998; Tarbouriech et al., 2001) (Figure 3), although the arrangement of the two Asp residues varies. In SpsA, this motif adopts the sequence xD98D99. In four representative structures available for the fold superfamily, the first aspartate residue of the motif binds to hydroxyl groups on the ribose moiety, whereas the second aspartic acid binds the divalent metal ion, which could be Mn2+ (Tarbouriech et al., 2001). The Mn2+ ion is clearly positioned to counter the negative charge that develops on the β-phosphate on cleavage of the donor sugar-phosphate linkage (Cowan, 1998). As shown in Figure 3A, with the exception of families 27 and 62, all families have an acidic residue that is at least 90% conserved in the region. In family 27 the figure drops to 88%. Again, the families newly assigned to GT-A provide a surprise; no conserved acidic residue is seen in this region for family 62.
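As a rough illustration of the two motif arrangements mentioned above (D-x-D and the x-D-D form seen in SpsA), the following Python sketch lists every matching triplet in a single sequence. The function name and the example sequence are hypothetical. The analysis in this paper relied on fold recognition alignments, not on pattern matching, and a three-residue pattern like this will frequently match by chance, so such hits are only starting points for inspection.

```python
import re

# A lookahead reports overlapping triplets; both arrangements are accepted.
_DXD_LIKE = re.compile(r"(?=(D[A-Z]D|[A-Z]DD))")


def find_dxd_like(sequence):
    """Return (1-based start, triplet) for each D-x-D or x-D-D occurrence."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in _DXD_LIKE.finditer(seq)]


# Hypothetical sequence fragment, used only to show the output format.
hits = find_dxd_like("AADADGGKDDL")
```

Positions are reported 1-based to match the residue numbering style used in the text (for example D98 and D99 in SpsA).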
Our degree of understanding of mechanism in GT-B is less advanced. One particular region of sequence conservation between families of the GT-B superfamily (Ha et al., 2000) lies around residues 250-290 (MurG numbering). Three G-loops (glycine-rich loops located at turns between the carboxyl ends of β-strands and the N-termini of the following α-helices in Rossmann fold domains; Baker et al., 1992) have also been the focus of attention (Ha et al., 2000). Later structural determination of the UDP-GlcNAc:MurG complex (Hu et al., 2003) revealed that two G-loops and the 250-290 conserved region are responsible for binding to the UDP-GlcNAc substrate. All inverting families had the acidic residues conserved in this region, with the exception of families 9 and 41. In family 41, 78% of sequences contained a conserved glutamine. Among the residues forming hydrogen bonds to the substrate is Glu269 (Figure 1), for which a role in distinguishing between UDP and TDP has been proposed (Hu et al., 2003). This residue is 90% conserved in family 28 (containing MurG itself) and 100% conserved in family 30 (Figure 3B), whereas an Asp residue may functionally substitute in families 19 and 33. A catalytic role for this residue may be effectively ruled out because its replacement with Ala in MurG led to only modest loss of activity (Hu et al., 2003). Families 5, 9, and 41 do not have any conservation of acidic residues in this region and must use other mechanisms to confer substrate specificity.
From these analyses it is clear that further work is required to help locate catalytic acidic residues in GT-B and in retaining families of GT-A. The lack of any perceptible trends in positioning of conserved acidic residues in GT-B (Figure 3B), even considering only inverting enzymes, suggests that several catalytic site architectures may well be present in the superfamily.
| Materials and methods |
|---|
Members of GT families 2, 12, 16, 21, 25, 27, 28, 45, 54, 55, 56, 60, 62, and 64 were located in the CAZY database and retrieved using Entrez (www.ncbi.nlm.nih.gov/entrez). Groups of sequences were aligned with ClustalW (Higgins et al., 1994).
Iterated sequence database searches were carried out using PSI-BLAST (Altschul et al., 1997) at the NCBI (www.ncbi.nlm.nih.gov/BLAST), using either 0.01 or 0.001 as the E-value cut-off, below which a sequence is included in the next iteration. As input for the fold recognition and iterated database searches we used representative sequences of all families as listed in Table I. Appearance of a member of a given GT family among the significant results of a search started from a different GT family was taken to indicate significant sequence similarity and hence to support a common evolutionary origin for the two families.
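The family-linking rule described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the search results have already been parsed into per-family lists of (subject identifier, E-value) pairs and that a lookup from subject identifier to GT family is available. All names, identifiers, and E-values below are hypothetical placeholders, not results from this study.

```python
from collections import defaultdict


def family_links(hits, family_of, cutoff=0.001):
    """Link two families when a search from one finds a member of the other.

    hits: {query_family: [(subject_id, evalue), ...]}
    family_of: {subject_id: family}
    Only subjects with an E-value below the cutoff are counted.
    """
    links = set()
    for query_family, results in hits.items():
        for subject_id, evalue in results:
            subject_family = family_of.get(subject_id)
            if subject_family is None or subject_family == query_family:
                continue
            if evalue < cutoff:
                links.add(frozenset((query_family, subject_family)))
    return links


def connected_groups(links):
    """Group families joined directly or indirectly by links."""
    adjacency = defaultdict(set)
    for link in links:
        a, b = tuple(link)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, groups = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(sorted(group))
    return sorted(groups)


# Hypothetical parsed results for three placeholder families.
hits = {"famA": [("s1", 1e-5), ("s2", 0.005)], "famB": [("s3", 1e-4)]}
family_of = {"s1": "famB", "s2": "famC", "s3": "famA"}
strict = family_links(hits, family_of, cutoff=0.001)
relaxed = family_links(hits, family_of, cutoff=0.01)
```

Comparing the groups obtained at the two cut-offs shows which families stay connected under the stricter threshold, which is the kind of check reported for the network in Figure 2.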
| Footnotes |
|---|
1 To whom correspondence should be addressed; e-mail: ocfranco{at}cenargen.embrapa.br
| Abbreviations |
|---|
CAZY, carbohydrate active enzymes; GT, glycosyltransferases; GT-A, glycosyltransferase from fold superfamily A; GT-B, glycosyltransferase from fold superfamily B; MurG, GT-B from E. coli; SpsA, GT-A from B. subtilis
| References |
|---|
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.
Baker, P.J., Britton, K.L., Rice, D.W., Rob, A., and Stillman, T.J. (1992) Structural consequences of sequence patterns in the fingerprint region of the nucleotide binding fold: implications for nucleotide specificity. J. Mol. Biol., 228, 662-671.
Breton, C., Bettler, E., Joziasse, D.H., Geremia, R.A., and Imberty, A. (1998) Sequence-function relationship of prokaryotic and eukaryotic galactosyltransferases. J. Biochem. (Tokyo), 123, 1000-1009.
Breton, C., Heissigerova, H., Jeanneau, C., Moravcova, J., and Imberty, A. (2002) Comparative aspects of glycosyltransferases. Biochem. Soc. Symp., 69, 23-32.
Brown, N.P., Leroy, C., and Sander, C. (1998) MView: a Web compatible database search or multiple alignment viewer. Bioinformatics, 14, 380-381.
Bujnicki, J.M., Elofsson, A., Fischer, D., and Rychlewski, L. (2001a) Structure prediction meta server. Bioinformatics, 17, 75007511.
Bujnicki, J.M., Elofsson, A., Fischer, D., and Rychlewski, L. (2001b) LiveBench-1: continuous benchmarking of protein structure prediction servers. Protein Sci., 10, 352-361.
Campbell, J.A., Davies, G.J., Bulone, V., and Henrissat, B. (1998) A classification of nucleotide-diphospho-sugar glycosyltransferases based on amino acid sequence similarities. Biochem. J., 329, 719.
Charnock, S.J. and Davies, G.J. (1999) Structure of the nucleotide-diphospho-sugar transferase, SpsA from Bacillus subtilis, in native and nucleotide-complexed forms. Biochemistry, 38, 6380-6385.
Colonna-Romano, S., Porta, A., Franco, A., Kobayashi, G.S., and Maresca, B. (1998) Identification and isolation by DDRT-PCR of genes differentially expressed by Histoplasma capsulatum during macrophages infection. Microb. Pathog., 25, 55-66.
Coutinho, P.M. and Henrissat, B. (1999) Carbohydrate-active enzymes server. Available at http://afmb.cnrs-mrs.fr/CAZY. Accessed August 20, 2003.
Coutinho, P.M., Deleury, E., Davies, G.J., and Henrissat, B. (2003) An evolving hierarchical family classification for glycosyltransferases. J. Mol. Biol., 328, 307-317.
Cowan, J.A. (1998) Magnesium activation of nuclease enzymes – the importance of water. Inorg. Chim. Acta, 275, 24-27.
Davies, G.J. and Henrissat, B. (2002) Plant glyco-related genomics. Structural enzymology of carbohydrate-active enzymes: implications for the post-genomic era. Biochem. Soc. Trans., 30, 291-297.
Fischer, D. (2000) Hybrid fold recognition: combining sequence derived properties with evolutionary information. In Altman, R.B., Dunker, A.K., Hunter, L., Laudardale, K., and Klein, T.E. (Eds.), Pacific Symposium on Biocomputing. World Scientific, Singapore, pp. 119-130.
Fischer, D. (2003) 3D-SHOTGUN: a novel, cooperative, fold-recognition meta-predictor. Proteins, 51, 434-441.
Fischer, D. and Rychlewski, L. (2003) The 2002 Olympic games of protein structure prediction. Protein Eng., 16, 157-160.
Garinot-Schneider, C., Lellouch, A.C., and Geremia, R.A. (2000) Identification of essential amino-acid residues in the Sinorhizobium meliloti glucosyltransferase ExoM. J. Biol. Chem., 275, 31407-31423.
Gastinel, L.N., Cambilau, C., and Bourne, Y. (1999) Crystal structures of the bovine β4-galactosyltransferase catalytic domain and its complex with uridine diphosphogalactose. EMBO J., 18, 3546-3557.
Guex, N. and Peitsch, M.C. (1997) Swiss-model and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis, 18, 2714-2723.
Ha, S., Walker, D., Shi, Y., and Walker, S. (2000) The 1.9 Å crystal structure of Escherichia coli MurG, a membrane-associated glycosyltransferase involved in peptidoglycan biosynthesis. Protein Sci., 9, 1045-1052.
Higgins, D., Thompson, J., Gibson, T., Thompson, J.D., Higgins, D.G., and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673-4680.
Hu, Y., Chen, L., Ha, S., Gross, B., Falcone, B., Walker, D., Mokhtarzadeh, M., and Walker, S. (2003) Crystal structure of the MurG:UDP-GlcNAc complex reveals common structural principles of a superfamily of glycosyltransferases. Proc. Natl Acad. Sci. USA, 100, 845-849.
Ikeda, M., Wachi, M., Jung, H.K., Ishino, F., and Matsuhashi, M. (1990) Nucleotide sequence involving murG and murC in the mra gene cluster region of Escherichia coli. Nucleic Acids Res., 18, 4014.
Keenleyside, W.J., Clarke, A.J., and Whitfield, C. (2001) Identification of residues involved in catalytic activity of the inverting glycosyl transferase WbbE from Salmonella enterica serovar borreze. J. Bacteriol., 183, 77-85.
Kelley, L.A., MacCallum, R.M., and Sternberg, M.J. (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol., 299, 499-520.
Lundstrom, J., Rychlewski, L., Bujnicki, J., and Elofsson, A. (2001) Pcons: a neural-network-based consensus predictor that improves fold recognition. Protein Sci., 10, 2354-2362.
Parkhill, J., Wren, B.W., Thomson, N.R., Titball, R.W., Holden, M.T.G., Prentice, M.B., Sebaihia, M., James, K.D., Churcher, C., Mungall, K.L., and others. (2001) Genome sequence of Yersinia pestis, the causative agent of plague. Nature, 413, 523-527.
Pedersen, L.C., Tsuchida, K., Kitagawa, H., Sugahara, K., Darden, T.A., and Negishi, M. (2000) Heparan/chondroitin sulfate biosynthesis: structure and mechanism of human glucuronyltransferase I. J. Biol. Chem., 275, 34580-34585.
Rigden, D.J. (2002) Iterative database searches demonstrate that glycoside hydrolase families 27, 31, 36 and 66 share a common evolutionary origin with family 13. FEBS Lett., 523, 17-22.
Rigden, D.J. and Franco, O.L. (2002) Beta-helical catalytic domains in glycoside hydrolase families 49, 55 and 87: domain architecture, modelling and assignment of catalytic residues. FEBS Lett., 530, 225-232.
Rychlewski, L., Jaroszewski, L., Li, W., and Godzik, A. (2000) Comparison of sequence profiles: strategies for structural predictions using sequence information. Protein Sci., 9, 232-241.
Shibayama, K., Ohsuka, S., Tanaka, T., Arakawa, Y., and Ohta, M. (1998) Conserved structural regions involved in the catalytic mechanism of Escherichia coli K-12 WaaO(Rfal). J. Bacteriol., 180, 5313-5318.
Tarbouriech, N., Charnock, S.J., and Davies, G.J. (2001) Three-dimensional structures of the Mn and Mg dTDP complexes of the family GT-2 glycosyltransferase SpsA: a comparison with related NDP-sugar glycosyltransferases. J. Mol. Biol., 314, 655-661.
Theologis, A., Ecker, J.R., Palm, C.J., Federspiel, N.A., Kaul, S., White, O., Alonso, J., Altafi, H., Araujo, R., Bowman, C.L., and others. (2000) Sequence and analysis of chromosome 1 of the plant Arabidopsis thaliana. Nature, 408, 816-820.
Unligil, U.M. and Rini, J.M. (2000) Glycosyltransferase structure and mechanism. Curr. Opin. Struct. Biol., 10, 510-517.
Unligil, U.M., Zhou, S., Yuwaraj, S., Sarkar, M., Schachter, H., and Rini, J.M. (2000) X-ray crystal structure of rabbit N-acetylglucosaminyltransferase I: enzyme mechanism and a new protein superfamily. EMBO J., 19, 5269-5280.
Verbert, A. and Cacan, R. (1999) "Glyco-deglyco" processes during the biosynthesis of glycoproteins. J. Soc. Biol., 193, 101-110.
Wiggins, C.A. and Munro, S. (1998) Activity of the yeast MNN1 α-1,3-mannosyltransferase requires a motif conserved in many other families of glycosyltransferases. Proc. Natl Acad. Sci. USA, 95, 7945-7950.
Wrabl, J.O. and Grishin, N.V. (2001) Homology between O-linked GlcNAc transferases and proteins of the glycogen phosphorylase superfamily. J. Mol. Biol., 314, 365-374.