Glycobiology Advance Access originally published online on February 13, 2008
Glycobiology 2008 18(4):339-349; doi:10.1093/glycob/cwn013
N-Glycoproteomics – An automated workflow approach
2 MediCel, Haartmaninkatu 8, FIN-00290 Helsinki
3 Transplantation Laboratory, Haartman Institute, University of Helsinki, PO Box 63, FIN-00014
4 HUCH Laboratory Diagnostics, Helsinki University Central Hospital, PO Box 401, FIN-00029 HUCH, Helsinki, Finland
1 To whom correspondence should be addressed: Tel: +358-9-1912-5111; Fax: +358-9-1912-5155; e-mail: risto.renkonen{at}helsinki.fi
Received on November 22, 2007; revised on January 28, 2008; accepted on February 3, 2008
| Abstract |
|---|
Glycan decorations dictate protein functions and are thus of crucial importance in the life sciences. Previously, glycoprotein analysis focused mainly on liberated glycans, which yields detailed structural information but loses positional information. Analysis of intact glycopeptides required purified glycoproteins and manual interpretation of spectra. We developed an approach in which mixtures of native glycopeptides were analyzed with tandem mass spectrometry and the spectra were interpreted with automated in silico workflows. The workflows comprised combination of the original spectra, generation of a human N-glycopeptide library, matching of the glycopeptide spectra to the theoretical peptide fragments, scoring of the observations, prediction of the glycan compositions and their matching against the observed spectra, statistical validation of the results with target–decoy filtering, and finally the calculation of glycan structures. We verified this approach with 150 serotransferrin glycopeptide spectra, for which we automatically generated 10⁵ putative interpretations from >10⁹ theoretical glycopeptides. After scoring, 62 glycopeptide spectra obtained a validated interpretation with concomitant amino acid sequences, glycan compositions, and structures. When applying this method to an unknown mixture of human plasma glycoproteins, we identified 80 glycopeptides with their glycan compositions or structures. Instead of the weeks or months needed for manual interpretation of mass spectrometry files, our automated workflow can be executed in a few hours and provides information concomitantly on both the amino acid and glycan moieties of intact glycopeptides in mixtures. No advanced computational skills are needed to use these preformed and tested workflows. Users who want to add complexity to the analysis can alter all parameters and rebuild the workflows.
Key words: automated workflow / glycopeptide / mass spectrometry / N-glycoproteomics
| Introduction |
|---|
Glycosylation, i.e., protein decoration with carbohydrates or glycans, is the most diverse form of protein posttranslational modifications (PTM) and can provide a huge degree of glycan variations to the protein backbones. These various glycosylated molecules are synthesized mainly in the endoplasmic reticulum and Golgi via reactions involving sugar nucleotide synthases and their transporters, glycosyltransferases, glycosidases, and other sugar-modifying enzymes. Branched N-linked and O-linked glycan chains modify glycoproteins, while proteoglycans are modified with linear glycans. The biological role of glycans has been linked to a broad variety of phenomena such as cell growth and development, malignant tumor growth, differentiation and metastasis, within several steps of immune response, microbial-host interactions, and intercellular adhesion (Lowe 2003
Descriptive analysis of human glycoconjugates began with the discovery of lectins in the late 1940s and their use in ABO blood typing (Renkonen 1948; Morgan and Watkins 2000). Lectins, and later also glycan-specific antibodies, have provided a great amount of circumstantial evidence of various glycoproteins on cell surfaces, their tissue distributions, and their expression patterns at sites of disease such as inflammation (Kannagi 2002; Renkonen et al. 2002; Rosen 2004; Uchimura and Rosen 2006).
The identification of glycoproteins has relied on the analysis of nonglycosylated peptides. Traditionally, protein glycosylation analysis has been performed with enzymatically (Takahashi et al. 2003) or chemically (Yosizawa et al. 1966) released glycans, which have been analyzed with various mass spectrometry approaches (Barr et al. 1991; Mock et al. 1991; Duffin et al. 1993) and manual or automated interpretation of the spectra (Mizuno et al. 1999; Cooper et al. 2001). There are several software tools for analyzing released glycans. Cartoonist (Goldberg et al. 2005) is the simplest of these and thus the most applicable to high-throughput analysis; it uses solely MALDI ionization and precursor ion masses to calculate potential glycan compositions. More advanced software tools, such as STAT (Gaucher et al. 2000), FragLib (Zhang et al. 2005), and StrOligo (Ethier et al. 2002), use MS2 data, and OSCAR (Lapadula et al. 2005) uses MSn data, to match the glycan structures, but they still lack information on the glycosylation sites on the protein backbone. GlycoMod (Cooper et al. 2001) is the first software tool for the analysis of the glycan composition in a glycopeptide. Here the mass of the peptide moiety needs to be known beforehand, as the analysis is based on the total mass of the glycan. Sweet substitute (Clerens et al. 2004) takes another approach: it generates the theoretical N-glycopeptide CID spectra for one or a few known glycopeptides and compares the empirical spectrum against them. Recently, high-throughput methods for the analysis of occupied glycosylation sites on glycopeptides were published (Zhang et al. 2003; Sun et al. 2007), in which, after conjugation of the glycopeptides to beads, the glycans were released and the asparagines converted to aspartic acid on the peptides could be identified. Despite the many tools developed so far, none of them can determine the components of a glycopeptide when both the glycan and the peptide moieties are unknown.
We describe here our novel automated in silico workflow for the glycoprotein analysis. The workflow analyzes the N-glycopeptide mass spectra to determine concomitantly their amino acid sequences and glycosylations in a high-throughput manner (Figure 1). The main target is unknown glycoprotein mixtures, but as a proof of principle we show first the analysis of purified human plasma serotransferrin.
| Results and discussion |
|---|
Analysis of glycoforms of human serotransferrin
Trypsinization of human serotransferrin leaves the amino acid backbones long enough for identification against database search whereas other methods such as pronase or chemical cleavage yield too short peptides for identification from unknown mixtures. Nonglycosylated peptides represent >95% of the mixture and suppress the ionization of the few glycopeptides. The presence of several glycovariants from the same glycosylation site reduces the molar ratio of unique glycopeptides further and thus the glycopeptides need to be enriched. For that we used a size-exclusion column (Alvarez-Manilla et al. 2006
10,000 MS2 spectra, out of which only a subset meeting the following three criteria was selected for further analysis: (i) the glycan marker ions, 204 m/z (HexNAc), 292 m/z (NeuAc), 366 m/z (HexHexNAc), or 657 m/z (HexHexNAcNeuAc), are present (Carr et al. 1993
mass < 0.1 Da) were combined and the 150 spectra entered to the next analytical step.
Automated in silico workflow for the glycoprotein analysis
Amino Acid Sequence
The analysis of the glycopeptide spectra began with the determination of their peptide sequences.
For this we first generated a database of the human UniProt (Wu et al. 2006) FASTA sequences, trypsinized it in silico with a maximum of two miscleavages, and selected the sequences containing the N-glycosylation consensus sequence (Marshall 1974; Bause and Legler 1981) (Supplementary Table I). These 440,000 in silico tryptic peptides were fragmented and their b- and y-ion series were matched to the 150 measured glycopeptide spectra obtained in the MS2 fragmentation experiments (Conboy and Henion 1992) (Figure 2A). Unlike in classical peptide analysis, the total mass of the precursor ion could not be used to filter peptides, as we had unknown combinations of amino acid sequences and glycan decorations. Therefore we required that the MS2 spectrum contain a peptide + HexNAc peak, which is known to be intense in glycopeptide spectra (Ritchie et al. 2002). For each precursor there were on average 800 (range 0–6000) putative peptide interpretations fulfilling this requirement. After matching the observed fragmentation peaks against the library of these in silico peptides, the 50 best-scoring amino acid sequences for each spectrum were stored for further analysis (Supplementary Table II). The presence of the peptide + HexNAc peak in the spectra is not obligatory, however, although requiring it limits the number of putative solutions. The protein identification in this software can also be executed from either an identified peptide or only the b- and y-ions.
N-Glycan Composition
The compositions of the putative N-glycan decorations on the given glycopeptide were calculated based on the mass differences of the observed glycopeptide mass and the masses of the putative amino acid backbones. Combinations of four monosaccharides (Hex, HexNAc, DeoxyHex, and NeuAc) and one optional formylation for each (artificial modification from the sample and LC buffers) (Gottlieb et al. 1940
Statistical Validation of Peptide Matches of the Glycopeptide Interpretations
When analyzing samples that are mixtures, the confidence of the interpretations should be determined. A target–decoy approach (Beausoleil et al. 2006) was used to estimate the true positive rate of the matched peptides. To generate the decoy database, the amino acid sequences of the 440,000 in silico glycopeptides were reversed. The amino acid sequence and glycan analyses were performed for all 150 spectra using both the target and decoy glycopeptide databases. The peptide score of the best-matching glycopeptide of each spectrum was used as the filtering variable, and the score cutoff was selected so that the number of decoy hits indicated a 90% level of true positives.
De novo N-Glycan Structures
At the final step, for each glycan composition, a set of de novo glycan structures (without linkage information) that best matched the measured spectrum was generated. As the number of unique structures with a given composition can be huge, the structures were not all enumerated but were searched with our novel Branch and Bound type algorithm. The algorithm iteratively grows a population of N-glycan structures and filters out the structures that match the measured spectrum least, until the structures have grown to the target composition. The theoretical spectra were generated using fragments resulting from glycosidic cleavages. Cross-ring cleavages were not included, as they are not visible in the protonated spectra (Harvey 2001). The method is truly de novo; it does not use a database of known structures. Only the N-glycan core structure (with and without bisecting HexNAc) was given as input. However, some limiting factors, such as the maximum number of branches, were used. Finally, for each spectrum up to 30 glycopeptide structures with the highest sum of peptide and structure scores were stored (Figure 2C and Supplementary Table V).
This list is the final result that can be achieved with the automatic workflow. Automatic selection of a single, undoubtedly best-matching structure for each spectrum proved to be difficult. Typically dozens of structures from the original 10⁹ theoretical glycopeptide search space matched almost equally well. The reason is that the theoretical spectra of various isomeric structures can be very similar or even identical. In most cases the final selection (if necessary) has to be made on the basis of other biological information.
Our analysis of human serotransferrin showed that we could interpret 62 out of the 150 glycopeptide spectra at the 90% confidence level of identified peptides within a few hours. Thirty-one individual glycopeptides represented the N432 site, 6 glycopeptides the N491 site, and 19 glycopeptides the N630 site (Table I). Altogether, the structures, i.e., not only the composition but also the sequence of the glycans, could be determined for 45 out of the 62 interpreted spectra (Supplementary Table VI).
Validation of the results
For further proof we also isolated one of the glycopeptides (m/z = 1227.691, z = 3) with offline HPLC and released its N-glycans with PNGase F. The glycans were analyzed with nanoESI-CapLC-MS using a carbon-based column. From the observed [M + 2H]²⁺, [M + Na + H]²⁺, and [M + 2Na]²⁺ derivatives, the doubly protonated ion was selected for the MS2 fragmentation studies. We fragmented in silico the major glycan decorating serotransferrin, which is known from the previous literature, into the b-, c-, y-, and z-ion series and compared these to the measured data. We could assign all the theoretical fragments in our MS2 spectrum (Figure 3, Supplementary Table VII). Our automated workflow provided the same result (spectrum 4 in Table I) for the precursor at m/z = 1227.691.
Glycopeptide relative quantitation with LC-MS
In addition to the LC-MS2 stopflow analysis, we acquired an LC-MS spectrum of the original tryptic digest for relative quantitation of the different glycopeptides and glycosylation sites. The acquisition was done from the sample prior to size exclusion chromatography in order to preserve the original ratios of the glycosylation sites. As expected, not all the precursors were found, because size exclusion chromatography enriches glycopeptides relative to the original digest. The abundances of the glycopeptides not detected in the LC-MS run were assumed to be insignificant, as they had to be enriched in the first place to be acquired in the stopflow experiment. The relative quantities of the glycosylation sites were N432 67%, N491 1%, and N630 32%.
Analysis of glycoforms of human plasma
When applying this in silico analysis method to a human plasma specimen, we could interpret 80 spectra, i.e., identify 80 glycopeptides with their amino acid sequences and concomitantly their glycan compositions (Table II), followed by structure analysis (Supplementary Table VIII). The execution of the workflows takes only a few hours and thus demonstrates a dramatic increase in the pace of glycopeptide analysis. Previously it has not been possible to go through many thousands of alternative glycopeptide interpretations and to validate them automatically.
The aim of the last part of this work was to identify glycoproteins via their respective glycopeptides from a very complex mixture of glycoproteins, where manual analysis is not possible. To our knowledge, no comparable glycoprotein analysis starting from such a complex mixture has been reported.
Taken together, we describe here our novel automated in silico workflow for the rapid analysis of unpurified glycoproteins or even complex mixtures of glycoproteins. First, in our analysis of purified serotransferrin, out of over 10⁹ theoretical glycopeptide combinations our automated workflow generated 10⁵ putative interpretations for the 150 glycopeptide spectra. Sixty-two spectra (41%) yielded statistically valid data on (a) amino acid sequence, (b) glycan composition, and (c) glycan structures. We identified 62 individual glycopeptide compositions representing three different glycosylation sites. The glycosylation site N432 was the most abundant and the most variable in glycan structures. Second, we could directly apply this analysis workflow to a very complex mixture of human plasma glycoproteins and detect some 80 different glycopeptides with one in silico workflow execution. These examples emphasize the great power of in silico approaches to the interpretation of mass spectrometry data for the analysis of glycopeptide structures.
| Materials and methods |
|---|
Plasma processing
Human blood was collected in 3.5 mL Venosafe lithium heparin gel (VF-054SAHLW) tubes, cooled at room temperature for 15 min, and centrifuged for 10 min at 1200 × g at room temperature. The plasma was aliquoted and stored at –80°C for further use. For the preparation of the plasma bulk and low abundance protein fractions, a Multiple Affinity Removal System HPLC column, 4.6 mm × 100 mm (5185–5985) (Agilent Technologies, Inc., Palo Alto, CA), was used according to the manufacturer's instructions.
Digestion of proteins
Plasma bulk and low abundance protein fractions and human serotransferrin from Sigma-Aldrich Ltd. (MO) (1 mg of protein) were trypsin digested, reduced, and alkylated (22).
Size exclusion chromatography of protein digest
Glycopeptides were enriched from the protein digest using size exclusion chromatography (23). In brief, 1 nmol of human serotransferrin, plasma low abundance, or plasma bulk protein digest was injected onto a Superdex Peptide 10/300 GL column, 2.1 mm × 250 mm (GE Healthcare Bio-Sciences), and the peptides were eluted isocratically at 0.1 mL/min with 0.1% trifluoroacetic acid. Fractions of 0.1 mL were collected and vacuum-dried.
Mass spectrometry
The peptide and glycopeptide spectra were acquired with a Q-TOF Ultima Global mass spectrometer (Micromass Ltd., Manchester, UK) using variable flow nano high-pressure liquid chromatography with a CapLC system (Waters Ltd.). The analytical column was a 25 cm long PepMap C18 column with a 75 µm diameter (LC Packings, Amsterdam, the Netherlands). The ESI spectra were acquired in the positive mode during a 600 min gradient from 5% B to 30% B (buffer A was 5% acetonitrile and 0.1% formic acid; buffer B was 95% acetonitrile and 0.1% formic acid). The acquisition range was set to 800–1600 m/z in the MS survey mode and 100–4000 m/z in the MS2 mode. The spectra were collected in a data-dependent acquisition fashion including charge states +2 to +6.
We applied a series of collision energies between 10 and 70 eV in the CID experiments. The glycopeptide spectra were identified from the marker ions: 204 m/z (HexNAc), 366 m/z (HexHexNAc), 292 m/z (NeuAc), or 657 m/z (HexHexNAcNeuAc) (24). The acquired spectra were deconvoluted with the MaxEnt3 algorithm (Waters) and extracted (imported) as Micromass Ltd. pkl file format into the Medicel Integrator software platform (www.medicel.com), version 1.3.
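The marker-ion selection described above can be sketched in a few lines of Python. This is only an illustration, not the Medicel implementation: the function names, the peak-list layout (a list of `(m/z, intensity)` pairs), and the 0.5 m/z matching window around the nominal marker values are assumptions made here.

```python
# Glycan marker ions named in the text (nominal m/z values).
MARKER_IONS = {
    "HexNAc": 204.0,
    "NeuAc": 292.0,
    "HexHexNAc": 366.0,
    "HexHexNAcNeuAc": 657.0,
}


def find_marker_ions(peaks, tol=0.5):
    """Return the names of the marker ions present in a peak list.

    peaks: list of (mz, intensity) tuples.
    tol:   assumed matching window in m/z units (not given in the text).
    """
    found = []
    for name, marker_mz in MARKER_IONS.items():
        if any(abs(mz - marker_mz) <= tol for mz, _ in peaks):
            found.append(name)
    return found


def is_glycopeptide_spectrum(peaks, tol=0.5):
    """A spectrum qualifies if at least one marker ion is present."""
    return bool(find_marker_ions(peaks, tol))
```

For example, a spectrum containing peaks near 204 and 366 m/z would be kept, while a spectrum without any of the four marker ions would be discarded.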
Relative quantitation of human serotransferrin glycopeptides
Mass chromatograms for precursor ions of different glycoforms of transferrin glycopeptides were extracted from an LC-MS run of transferrin digest. Elution time for each precursor was determined and a combined spectrum was created from the mass chromatogram. The spectrum was deconvoluted using Masslynx 4.0 software and MaxEnt3 algorithm (Waters). Relative quantities of glycopeptides were calculated from deconvoluted peak intensities. Site-specific relative quantities were calculated based on the sum of intensities of the precursors assigned to be a glycopeptide containing one of three possible glycosylation sites independent of peptide sequence length and glycan composition.
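The site-specific calculation amounts to summing the deconvoluted intensities per glycosylation site and normalizing by the total. A minimal sketch follows; the function name and the input layout are assumptions, and the values in the usage note are invented placeholders, not measured data.

```python
from collections import defaultdict


def site_relative_quantities(precursors):
    """Percentage of total glycopeptide intensity per glycosylation site.

    precursors: list of (site_label, deconvoluted_intensity) tuples, one per
    assigned glycopeptide precursor, regardless of peptide length or glycan.
    """
    totals = defaultdict(float)
    for site, intensity in precursors:
        totals[site] += intensity
    grand_total = sum(totals.values())
    return {site: 100.0 * value / grand_total for site, value in totals.items()}
```

With the placeholder input `[("site_A", 30.0), ("site_A", 10.0), ("site_B", 60.0)]` the function attributes 40% of the signal to `site_A` and 60% to `site_B`.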
N-Glycan release and analysis
N-Linked glycans were released from glycopeptide with PNGase F (Sigma-Aldrich, MO) in a solution containing 20% acetonitrile, 5 mM ammonium acetate, and 50 mU/µL PNGase F. The released glycan was analyzed using Hypercarb column (Thermo Electron Corporation, MA) coupled to the CapLC HPLC system and mass spectrometer with the nanoESI interface. Solvent A was 5 mM ammonium acetate and solvent B was 5 mM ammonium acetate in 80% acetonitrile. The glycan was eluted from Hypercarb column using the gradient of solvent B from 5% to 50% over 18 min. The mass of the glycan was determined by an LC-MS experiment. The glycan was fragmented with six collision energies (10–60 eV) in six consecutive LC-MS2 experiments. The acquired fragmentation data were deconvoluted using the MaxEnt3 algorithm (Waters) and analyzed further to obtain the glycan composition.
N-Glycopeptide workflow
The in silico N-glycopeptide analysis workflow is a part of the Medicel Integrator Protein N-glycosylation suite version 1.0 (www.medicel.com). Medicel Integrator is a modular software platform for biological and biomedical research and development. It connects all data to a relational database and enables continuous processes and solid tracking of data, from sample information and laboratory procedures to the in silico processes. The key applications used in this study are the laboratory information management system, entitled Experiment, and Workflow. The latter is an application with a graphical user interface for automated data analysis. Workflow implements a client–server architecture, can automatically run consecutive tools, and has a tool server cluster for tool execution. The Medicel Integrator versions used were 1.1, 1.2, and 1.3.
The N-glycopeptide analysis workflow contained eight distinct tools (Supplementary Figure 3). Tools used in analysis were coded with Java or with Matlab 6.5 (MathWorks Inc). Matlab tools were compiled to stand-alone tools with Matlab Compiler 3.0 (MathWorks Inc).
Combine Spectra
The deconvoluted input spectra whose precursor masses differed by less than 10 mDa were combined into one spectrum and converted to the Medicel mass spectrometry XML format. The combination was done by adding the 400 highest intensity peaks from the different spectra to a single peak list. The idea was to combine peaks from the same precursor, acquired with different collision energies, into a single spectrum.
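A rough sketch of this combination step is given below. The text does not say how groups are formed or whether the 400-peak limit applies per spectrum or per combined list, so the greedy grouping against the first precursor of each group and the per-spectrum limit are assumptions, as are all names.

```python
def combine_spectra(spectra, precursor_tol=0.010, top_n=400):
    """Merge spectra whose precursor masses differ by less than precursor_tol (Da).

    spectra: list of (precursor_mass, [(mz, intensity), ...]) tuples.
    Returns a list of (precursor_mass, merged_peak_list) tuples, where the
    precursor mass is that of the first spectrum in each group.
    """
    groups = []
    for mass, peaks in sorted(spectra, key=lambda s: s[0]):
        # Keep only the most intense peaks of each contributing spectrum.
        top = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
        if groups and mass - groups[-1]["anchor"] < precursor_tol:
            groups[-1]["peaks"].extend(top)
        else:
            groups.append({"anchor": mass, "peaks": list(top)})
    # Return each merged peak list sorted by m/z.
    return [(g["anchor"], sorted(g["peaks"])) for g in groups]
```

Spectra of the same precursor acquired at different collision energies thus end up in one peak list, which is the stated purpose of the step.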
Retrieve Protein Sequences
UniProt (25) human proteins in the FASTA format were retrieved from the Medicel data warehouse, which had UniProtKB version 2.5 integrated. The number of human protein entries was 44,601. Each protein entry within Medicel data warehouse was sequence aligned for its uniqueness and given a unique Medicel identifier, which allowed it to be linked with other data domains such as genes and transcripts (not used in this study).
Calculate Proteolytic Peptides with N-glycan Sites
The theoretical tryptic peptides were calculated from the protein sequences. Cleavage was set to occur after arginine and lysine unless the next amino acid is proline. Up to two miscleavages were allowed per peptide. Sequences having the N-glycosylation consensus site (26, 27), NXS/T/C, where X is not P, were stored.
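The digestion and consensus-site rules can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch checks the consensus site only within each peptide, which is a simplification chosen here.

```python
import re

# N-glycosylation consensus site: N-X-S/T/C, where X is not proline.
SEQUON = re.compile(r"N[^P][STC]")


def tryptic_peptides(protein, max_missed=2):
    """In silico trypsin digest: cleave after K or R unless followed by P."""
    cut_sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + [s for s in cut_sites if s < len(protein)] + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # A peptide with k miscleavages spans k + 1 consecutive segments.
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptides.append(protein[bounds[i]:bounds[j]])
    return peptides


def glycopeptide_candidates(protein, max_missed=2):
    """Keep only the tryptic peptides that contain the consensus site."""
    return [p for p in tryptic_peptides(protein, max_missed) if SEQUON.search(p)]
```

For the made-up sequence `MKNGSRPLKANPTK`, the R before P is not cleaved, so the fully cleaved peptides are `MK`, `NGSRPLK`, and `ANPTK`; only peptides containing `NGS` are kept, because `NPT` has proline in the X position.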
Match Peptides to the Spectra
The glycopeptide spectra were matched to the theoretical peptide fragments and the best scoring peptide sequences were listed. To limit the number of input peptides for each spectrum it was required that the spectrum contains a peptide+HexNAc peak. All peaks with a relative intensity higher than 3% were considered as a potential peptide+HexNAc peak. All peptides with mass matching those peaks with a tolerance of 50 mDa or 100 ppm were included. The theoretical fragment spectra contained b- and y-ion series without attached glycan. The mass tolerance of fragment matches was either 50 mDa or 100 ppm and the relative intensity limit was 0.5% from the base peak.
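The matching step can be sketched as follows, assuming deconvoluted spectra in which all peaks are singly charged. The names are hypothetical, cysteine is left unmodified for brevity (the samples were alkylated, so a real search would add the alkylation mass), and "50 mDa or 100 ppm" is read here as accepting a match that satisfies either tolerance.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
HEXNAC = 203.079373  # HexNAc residue mass


def within_tol(observed, theoretical, abs_tol=0.050, ppm_tol=100.0):
    """Accept a match within 50 mDa or within 100 ppm (assumed reading)."""
    delta = abs(observed - theoretical)
    return delta <= abs_tol or delta <= theoretical * ppm_tol * 1e-6


def by_ions(peptide):
    """Singly charged b- and y-ion m/z values without attached glycan."""
    masses = [RESIDUE[a] for a in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


def has_peptide_hexnac_peak(peptide, peaks, min_rel_intensity=0.03):
    """Check for a peptide + HexNAc peak above 3% relative intensity."""
    base = max(i for _, i in peaks)
    target = sum(RESIDUE[a] for a in peptide) + WATER + HEXNAC + PROTON
    return any(i > min_rel_intensity * base and within_tol(mz, target)
               for mz, i in peaks)


def shared_peak_count(peptide, peaks, min_rel_intensity=0.005):
    """Number of theoretical b/y ions found among sufficiently intense peaks."""
    base = max(i for _, i in peaks)
    kept = [mz for mz, i in peaks if i >= min_rel_intensity * base]
    b_ions, y_ions = by_ions(peptide)
    return sum(1 for t in b_ions + y_ions
               if any(within_tol(mz, t) for mz in kept))
```

A candidate peptide would first pass `has_peptide_hexnac_peak` and then be ranked by a score derived from `shared_peak_count`.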
Predict Glycan Compositions
The mass of the unknown glycan was expected to match the mass difference between the measured glycopeptide precursor mass and the masses of the best scoring peptides. A list of possible glycan compositions matching those mass differences was generated with a mass tolerance of 200 mDa. The monosaccharides used in the glycan composition calculations were hexose (Hex), N-acetylhexosamine (HexNAc), sialic acid (NeuAc), and deoxyhexose (DeoxyHex). The accepted range of the number of monosaccharide units was 3–15 for Hex, 2–15 for HexNAc, 0–4 for NeuAc, and 0–2 for DeoxyHex. In addition, a composition was allowed to have 0–3 formylations (CO). Some compositions were filtered out with the following rules (13): (a) the number of deoxyhexoses plus 1 must be less than or equal to the sum of the number of hexoses and N-acetylhexosamines and (b) if there are no N-acetylhexosamines, except in the N-glycan core, then the number of sialic acids is zero.
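The enumeration described above is small enough to do by brute force. The sketch below uses standard monoisotopic residue masses; the function name and the dictionary layout of the result are assumptions, and rule (b) is implemented by treating the two core HexNAc units as the only ones when `HexNAc == 2`.

```python
from itertools import product

# Monoisotopic residue masses (Da); CO is the mass added by one formylation.
HEX, HEXNAC, NEUAC, DHEX, FORMYL = 162.05282, 203.07937, 291.09542, 146.05791, 27.99491


def glycan_compositions(mass_difference, tol=0.200):
    """List compositions whose mass matches glycopeptide mass minus peptide mass."""
    hits = []
    ranges = (range(3, 16), range(2, 16), range(0, 5), range(0, 3), range(0, 4))
    for n_hex, n_hexnac, n_neuac, n_dhex, n_formyl in product(*ranges):
        # Rule (a): deoxyhexoses + 1 <= hexoses + N-acetylhexosamines.
        if n_dhex + 1 > n_hex + n_hexnac:
            continue
        # Rule (b): no HexNAc beyond the core means no sialic acids.
        if n_hexnac <= 2 and n_neuac > 0:
            continue
        mass = (n_hex * HEX + n_hexnac * HEXNAC + n_neuac * NEUAC
                + n_dhex * DHEX + n_formyl * FORMYL)
        if abs(mass - mass_difference) <= tol:
            hits.append({"Hex": n_hex, "HexNAc": n_hexnac, "NeuAc": n_neuac,
                         "DeoxyHex": n_dhex, "Formyl": n_formyl})
    return hits
```

For a mass difference of about 2204.77 Da, the returned list includes Hex5HexNAc4NeuAc2, the composition of a disialylated biantennary N-glycan; other compositions falling within the 200 mDa window may be listed as well and are separated later by fragment matching.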
Match Glycopeptide Compositions to the Spectra
First a fragment library of glycan compositions was generated by in silico glycosidic cleavages. Then the glycan compositions from glycopeptides observed after MS analysis were scored against the theoretical fragments. Some impossible fragments were filtered out (i) if they did not fulfill the requirement of the N-glycan core, (ii) if NeuAc residues were assumed to attach to the N-glycan core HexNAcs or the branching hexose, or (iii) if NeuAc and DeoxyHex residues were connected to each other.
The peptide backbones were assumed to stay intact. The spectrum peaks matching the peptide fragments without glycan were filtered out before composition matching. The mass tolerance for fragment matching was 50 mDa or 80 ppm and the relative intensity limit was 0.75%. The final glycopeptide score (Supplementary Table VIII) was a sum of peptide score and glycan composition scores.
Filter by Decoy Matches
Statistical validation of peptide hits was based on a target–decoy approach (28). The main assumption is that false positive hits (random spectra) are independent and have an equal probability of matching the target or the decoy database. The decoy database was generated by reversing the target peptides, i.e., the human peptides having the N-glycosylation site. The peptide identification, glycan composition, and composition matching steps were done in exactly the same way against the target and the decoy peptide databases. After that, for each spectrum, the two results were compared to see which gave the higher score, and the spectrum was thus assigned as a target or a decoy match. The score parameter used was the peptide score of the highest ranked glycopeptide after the glycopeptide composition scoring step. By assuming independent experiments, the number of false positive hits to the target database can be estimated from the number of hits to the decoy database by inverting the binomial distribution (Huttlin et al. 2007). The score cutoff was selected so that the expected (mean) ratio of true positive hits to the target database was 90%. With the sample of purified serotransferrin it was possible to verify the expected true positive rate (Supplementary Figure 2). The estimated true positive rate and the observed serotransferrin rate agreed very well. The main assumption was also tested by checking the target–decoy distributions by rank (Supplementary Figure 3). Higher rank peptides were rather random, although there was a slight tendency toward the decoy database. Although the results are very promising, some deviations of the estimated true positive rates can be expected. The reason is that the false positive or decoy hits are not truly independent, as they originate from real molecules that can be measured several times with different charge states or modifications.
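The cutoff selection can be illustrated with a simplified sketch. It does not invert the binomial distribution as described above; instead it uses the plain point estimate that the number of false positives among the target hits equals the number of decoy hits. The function names and the input layout are assumptions.

```python
def reverse_decoys(peptides):
    """Build decoy sequences by reversing the target peptides."""
    return [p[::-1] for p in peptides]


def score_cutoff(best_hits, min_true_positive_rate=0.90):
    """Lowest score cutoff at which the estimated true positive rate holds.

    best_hits: one (score, is_decoy) tuple per spectrum, taken from whichever
    database (target or decoy) gave the higher score for that spectrum.
    Simplification: false positives among targets ~ number of decoy hits.
    """
    chosen = None
    for cutoff in sorted({score for score, _ in best_hits}, reverse=True):
        targets = sum(1 for s, decoy in best_hits if s >= cutoff and not decoy)
        decoys = sum(1 for s, decoy in best_hits if s >= cutoff and decoy)
        if targets == 0:
            continue
        estimated_rate = (targets - decoys) / targets
        if estimated_rate >= min_true_positive_rate:
            chosen = cutoff
    return chosen
```

Spectra whose best target hit scores at or above the returned cutoff would be reported; `None` means that no cutoff reaches the requested rate.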
Match de novo Glycan Structures
De novo glycan structures, which match best to the measured spectra, were searched with the Branch and Bound type algorithm. The algorithm proceeds as follows (Supplementary Figure 4): an input is a list of possible glycan compositions with matched fragments. Search is started from a given glycan core structure, new carbohydrate residues are added iteratively, and a population of glycan structures is generated. The process is continued until the structures have the target composition. Search can be guided by setting the maximum number of allowed branches and allowed connections for separate monosaccharides. During the search the structures with cost function less than a given limit are stored for next iteration. Also if the structure population size grows beyond a given limit, the structures with highest cost are removed. The cost function used is the sum of not matching theoretical glycan structure fragments and not matching measured glycan composition fragments. The theoretical fragments are generated with the unlimited number of glycosidic cuts and only theoretical and measured glycan fragments attached to unfragmented peptide are included. The cost function defined here can be used only when comparing structures with the same composition. To rank structures with different compositions and to include the unattached glycan fragments the score values were calculated at the end of iteration. The final glycopeptide score was the sum of structure and peptide scores. The limiting factors used with structure match were (a) max number of branches 4; (b) max number of connections for carbohydrate units: Hex 4, HexNAc 3, NeuAc 1, and DeoxyHex 1; (c) min number of measured glycopeptide peaks 5; (d) max population size 1000; and (e) max cost 17.
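The grow-and-prune loop can be shown on a deliberately reduced toy problem. In the sketch below a "structure" is a linear chain of residues rather than a branched N-glycan, the cost counts only theoretical fragments missing from the measured list (the second cost term in the text is omitted), and all names are hypothetical. Only the search pattern itself follows the description: extend every member of the population by one residue, drop candidates above the maximum cost, and truncate the population to a maximum size.

```python
from collections import Counter

# Monoisotopic residue masses (Da) for the toy example.
MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "NeuAc": 291.0954}


def cost(chain, observed, tol=0.05):
    """Number of cumulative fragment masses of the chain not found in observed."""
    total, misses = 0.0, 0
    for residue in chain:
        total += MASS[residue]
        if not any(abs(total - o) <= tol for o in observed):
            misses += 1
    return misses


def search(target, observed, max_cost=17, max_population=1000):
    """Grow chains residue by residue until they reach the target composition."""
    population = [()]
    for _ in range(sum(target.values())):
        grown = set()
        for chain in population:
            used = Counter(chain)
            for residue in target:
                if used[residue] < target[residue]:
                    grown.add(chain + (residue,))
        # Bound: discard costly candidates, then cap the population size.
        scored = sorted((cost(c, observed), c) for c in grown)
        population = [c for k, c in scored if k <= max_cost][:max_population]
    return [(cost(c, observed), c) for c in population]
```

With a target of one HexNAc, one Hex, and one NeuAc and measured fragment masses near 203.08, 365.13, and 656.23 Da, the chain HexNAc-Hex-NeuAc is ranked first with a cost of zero.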
Scoring method
The score of a theoretical glycopeptide G is calculated as the negative logarithm of the probability that a random set of fragments would share as many or more peaks with the measured spectrum as the ranked glycopeptide.
Let S be the measured spectrum with M mass values $\{m_1, \ldots, m_M\}$ out of $M_p$ possible ones within the given mass range and tolerance (tol). Let G be the theoretical glycopeptide spectrum with N mass values $\{g_1, \ldots, g_N\}$ and R be a random spectrum with N mass values chosen from the mass range of the measured spectrum. The shared peak count SPC(G, S) is defined as the number of peak pairs in G and S such that $|g_i - m_j| < \mathrm{tol}$, where $i \in \{1, \ldots, N\}$ and $j \in \{1, \ldots, M\}$. The probability $P_R$ that the random spectrum R has at least as many shared peaks as the glycopeptide spectrum G, $\mathrm{SPC}(R, S) \ge \mathrm{SPC}(G, S)$, is calculated using the binomial distribution

$$P_R = \sum_{k=\mathrm{SPC}(G,S)}^{N} \binom{N}{k}\, p^{k} (1-p)^{N-k}, \qquad p = \frac{M}{M_p},$$

where p is the probability that a single random mass value coincides with one of the measured peaks.
Finally the score values are given as $\mathrm{Score}(G, S) = -\log(P_R)$.
It should be noted that the definition of the score used here is not supposed to reflect the true probability of a match being a false positive. Because molecule fragments are composed of atoms, their masses are not distributed evenly, and no truly random peaks exist. Despite that, this simple approximation works well in practical cases to differentiate good hits from bad ones. The score defined here resembles the Ascore, which has been used in phosphorylation site determination (Beausoleil et al. 2006).
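A binomial tail score of this kind is straightforward to compute. The sketch below assumes that the per-peak probability of a random match is supplied by the caller (for instance as M/Mp, the fraction of possible mass values occupied by measured peaks) and uses a base-10 logarithm; both choices, like the function names, are assumptions, since the text specifies neither.

```python
import math


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))


def match_score(shared_peaks, n_theoretical, p_random):
    """Negative log10 of the chance that a random spectrum with n_theoretical
    peaks shares at least shared_peaks peaks with the measured spectrum.

    p_random must lie strictly between 0 and 1.
    """
    return -math.log10(binomial_tail(n_theoretical, shared_peaks, p_random))
```

As expected for such a score, more shared peaks give a higher value: with 10 theoretical peaks and a 10% random match probability, 8 shared peaks score far higher than 3.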
| Supplementary Data |
|---|
Supplementary data for this article is available online at http://glycob.oxfordjournals.org/.
| Conflict of interest statement |
|---|
None declared.
| Acknowledgements |
|---|
The work was supported in part by Research Grants of Academy of Finland, by Research Grants from Technology Development Centre (TEKES), Helsinki, Sigrid Juselius Foundation and a grant from the Helsinki University Central Hospital Fund.
| Abbreviations |
|---|
DeoxyHex, deoxyhexose; Hex, hexose; HexNAc, N-acetylhexosamine; NeuAc, N-acetylneuraminic acid; PNGase F, peptide:N-glycanase F; PTM, posttranslational modification; XML, extensible markup language
| References |
|---|
Alvarez-Manilla G, Atwood J, Guo Y, Warren NL, Orlando R, Pierce M. Tools for glycoproteomic analysis: Size exclusion chromatography facilitates identification of tryptic glycopeptides with N-linked glycosylation sites. J Proteome Res (2006) 5:701–708.[CrossRef][ISI][Medline]
Barr JR, Anumula KR, Vettese MB, Taylor PB, Carr SA. Structural classification of carbohydrates in glycoproteins by mass spectrometry and high-performance anion-exchange chromatography. Anal Biochem (1991) 192:181–192.[CrossRef][ISI][Medline]
Bause E, Legler G. The role of the hydroxy amino acid in the triplet sequence Asn-Xaa-Thr(Ser) for the N-glycosylation step during glycoprotein biosynthesis. Biochem J (1981) 195:639–644.[ISI][Medline]
Beausoleil SA, Villén J, Gerber SA, Rush J, Gygi SP. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. (2006) 24:1285–1292.
Carr SA, Huddleston MJ, Bean MF. Selective identification and differentiation of N- and O-linked oligosaccharides in glycoproteins by liquid chromatography-mass spectrometry. Protein Sci (1993) 2:183–196.
Clerens S, van den Ende W, Verhaert P, Geenen L, Arckens L. Sweet substitute: A software tool for in silico fragmentation of peptide-linked N-glycans. Proteomics (2004) 4:629–632.
Conboy JJ, Henion JD. The determination of glycopeptides by liquid chromatography/mass spectrometry with collision-induced dissociation. J Am Soc Mass Spectrom (1992) 3:804–814.
Cooper CA, Gasteiger E, Packer NH. GlycoMod – a software tool for determining glycosylation compositions from mass spectrometric data. Proteomics (2001) 1:340–349.
Davis MT, Stahl DC, Lee TD. Low flow high-performance liquid chromatography solvent delivery system designed for tandem capillary liquid chromatography-mass spectrometry. J Am Soc Mass Spectrom (1995) 6:571–577.
Duffin KL, Lange GW, Welply JK, Florman R, O'Brien PJ, Dell A, Reason AJ, Morris HR, Fliesler SJ. Identification and oligosaccharide structure analysis of rhodopsin glycoforms containing galactose and sialic acid. Glycobiology (1993) 3:365–380.
Ebnet K, Suzuki A, Ohno S, Vestweber D. Junctional adhesion molecules (JAMs): More molecules with dual functions? J Cell Sci (2004) 117:19–29.
Ethier M, Saba JA, Ens W, Standing KG, Perreault H. Automated structural assignment of derivatized complex N-linked oligosaccharides from tandem mass spectra. Rapid Commun Mass Spectrom (2002) 16:1743–1754.
Fu D, van Halbeek H. N-glycosylation site mapping of human serotransferrin by serial lectin affinity chromatography, fast atom bombardment-mass spectrometry, and 1H nuclear magnetic resonance spectroscopy. Anal Biochem (1992) 206:53–63.
Gaucher SP, Morrow J, Leary JA. STAT: A saccharide topology analysis tool used in combination with tandem mass spectrometry. Anal Chem (2000) 72:2331–2336.
Goldberg D, Sutton-Smith M, Paulson J, Dell A. Automatic annotation of matrix-assisted laser desorption/ionization N-glycan spectra. Proteomics (2005) 5:865–875.
Gottlieb D, Caldwell CG, Hixon RM. Action of formic acid on starch. J Am Chem Soc (1940) 62:3342–3344.
Harvey DJ. Ionization and collision-induced fragmentation of N-linked and related carbohydrates using divalent cations. J Am Soc Mass Spectrom (2001) 12:926–937.
Hirabayashi J, Kasai K-I. Separation technologies for glycomics. J Chromatogr B: Anal Technol Biomed Life Sci (2002) 771:67–87.
Huttlin EL, Hegeman AD, Harms AC, Sussman MR. Prediction of error associated with false-positive rate determination for peptide identification in large-scale proteomics experiments using a combined reverse and forward peptide sequence database strategy. J Proteome Res (2007) 6:392–398.
Jebanathirajah J, Steen H, Roepstorff P. Using optimized collision energies and high resolution, high accuracy fragment ion selection to improve glycopeptide detection by precursor ion scanning. J Am Soc Mass Spectrom (2003) 14:777–784.
Kannagi R. Regulatory roles of carbohydrate ligands for selectins in the homing of lymphocytes. Curr Opin Struct Biol (2002) 12:599–608.
Lapadula AJ, Hatcher PJ, Hanneman AJ, Ashline DJ, Zhang H, Reinhold VN. Congruent strategies for carbohydrate sequencing: 3. OSCAR: An algorithm for assigning oligosaccharide topology from MS(n) data. Anal Chem (2005) 77:6271–6279.
Ley K, Kansas GS. Selectins in T-cell recruitment to non-lymphoid tissues and sites of inflammation. Nat Rev Immunol (2004) 4:325–335.
Lowe JB. Glycan-dependent leukocyte adhesion and recruitment in inflammation. Curr Opin Cell Biol (2003) 15:531–538.
Marshall RD. The nature and metabolism of the carbohydrate-peptide linkages of glycoproteins. Biochem Soc Symp (1974) 17–26.
Mizuno Y, Sasagawa T, Dohmae N, Takio K. An automated interpretation of MALDI/TOF postsource decay spectra of oligosaccharides: 1. Automated peak assignment. Anal Chem (1999) 71:4764–4771.
Mock KK, Davey M, Cottrell JS. The analysis of underivatized oligosaccharides by matrix-assisted laser desorption mass spectrometry. Biochem Biophys Res Commun (1991) 177:644–651.
Morgan WT, Watkins WM. Unravelling the biochemical basis of blood group ABO and Lewis antigenic specificity. Glycoconj J (2000) 17:501–530.
Qiu R, Regnier FE. Use of multidimensional lectin affinity chromatography in differential glycoproteomics. Anal Chem (2005) 77:2802–2809.
Renkonen J, Tynninen O, Hayry P, Paavonen T, Renkonen R. Glycosylation might provide endothelial zip codes for organ-specific leukocyte traffic into inflammatory sites. Am J Pathol (2002) 161:543–550.
Renkonen KO. Studies on hemagglutinins present in seeds of some representatives of leguminoseae. Ann Med Exp Fenn (1948) 26:66–72.
Ritchie MA, Gill AC, Deery MJ, Lilley K. Precursor ion scanning for detection and structural characterization of heterogeneous glycopeptide mixtures. J Am Soc Mass Spectrom (2002) 13:1065–1077.
Rosen SD. Ligands for L-selectin: Homing, inflammation, and beyond. Annu Rev Immunol (2004) 22:129–156.
Rudd PM, Wormald MR, Dwek RA. Sugar-mediated ligand-receptor interactions in the immune system. Trends Biotechnol (2004) 22:524–530.
Sun B, Ranish JA, Utleg AG, White JT, Yan X, Lin B, Hood L. Shotgun glycopeptide capture approach coupled with mass spectrometry for comprehensive glycoproteomics. Mol Cell Proteomics (2007) 6:141–149.
Takahashi N, Yanagida M, Fujiyama S, Hayano T, Isobe T. Proteomic snapshot analyses of preribosomal ribonucleoprotein complexes formed at various stages of ribosome biogenesis in yeast and mammalian cells. Mass Spectrom Rev (2003) 22:287–317.
Uchimura K, Rosen S. Sulfated L-selectin ligands as a therapeutic target in chronic inflammation. Trends Immunol (2006) 12:559–565.
Vissers JPC, Blackburn RK, Moseley MA. A novel interface for variable flow nanoscale LC/MS/MS for improved proteome coverage. J Am Soc Mass Spectrom (2002) 13:760–771.
von Andrian UH, Mempel TR. Homing and cellular traffic in lymph nodes. Nat Rev Immunol (2003) 3:867–878.
Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, et al. The Universal Protein Resource (UniProt): An expanding universe of protein information. Nucleic Acids Res (2006) 34:D187–D191.
Yosizawa Z, Sato T, Schmid K. Hydrazinolysis of α1-acid glycoprotein. Biochim Biophys Acta (1966) 121:417–420.
Zhang H, Li XJ, Martin DB, Aebersold R. Identification and quantification of N-linked glycoproteins using hydrazide chemistry, stable isotope labeling and mass spectrometry. Nat Biotechnol (2003) 21:660–666.
Zhang H, Singh S, Reinhold VN. Congruent strategies for carbohydrate sequencing: 2. FragLib: An MS(n) spectral library. Anal Chem (2005) 77:6263–6270.
Zhang X, Herring CJ, Romano PR, Szczepanowska J, Brzeska H, Hinnebusch AG, Qin J. Identification of phosphorylation sites in proteins separated by polyacrylamide gel electrophoresis. Anal Chem (1998) 70:2050–2059.
Zhu X, Borchers C, Bienstock RJ, Tomer KB. Mass spectrometric characterization of the glycosylation pattern of HIV-gp120 expressed in CHO cells. Biochemistry (2000) 39:11194–11204.