Combining NMR and LC/MS using backward variable elimination: Metabolomics analysis of colorectal cancer, polyps, and healthy controls Article

Deng, L, Gu, H, Zhu, J et al. (2016). Combining NMR and LC/MS using backward variable elimination: Metabolomics analysis of colorectal cancer, polyps, and healthy controls. Analytical Chemistry, 88(16), 7975-7983. 10.1021/acs.analchem.6b00885

cited authors

  • Deng, L; Gu, H; Zhu, J; Nagana Gowda, GA; Djukovic, D; Chiorean, EG; Raftery, D

abstract

  • Both nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS) play important roles in metabolomics. The complementary features of NMR and MS make their combination very attractive; however, currently the vast majority of metabolomics studies use either NMR or MS separately, and variable selection that combines NMR and MS for biomarker identification and statistical modeling is still not well developed. In this study focused on methodology, we developed a backward variable elimination partial least-squares discriminant analysis algorithm embedded with Monte Carlo cross validation (MCCV-BVE-PLSDA), to combine NMR and targeted liquid chromatography (LC)/MS data. Using the metabolomics analysis of serum for the detection of colorectal cancer (CRC) and polyps as an example, we demonstrate that variable selection is vitally important in combining NMR and MS data. The combined approach was better than using NMR or LC/MS data alone in providing significantly improved predictive accuracy in all the pairwise comparisons among CRC, polyps, and healthy controls. Using this approach, we selected a subset of metabolites responsible for the improved separation for each pairwise comparison, and we achieved a comprehensive profile of altered metabolite levels, including those in glycolysis, the TCA cycle, amino acid metabolism, and other pathways that were related to CRC and polyps. MCCV-BVE-PLSDA is straightforward, easy to implement, and highly useful for studying the contribution of each individual variable to multivariate statistical models. On the basis of these results, we recommend using an appropriate variable selection step, such as MCCV-BVE-PLSDA, when analyzing data from multiple analytical platforms to obtain improved statistical performance and a more accurate biological interpretation, especially for biomarker discovery. Importantly, the approach described here is relatively universal and can be easily expanded for combination with other analytical technologies.
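
The abstract names the ingredients of MCCV-BVE-PLSDA but does not spell out the procedure, so the following is only a rough sketch of how backward variable elimination wrapped around Monte Carlo cross validation might look, not the authors' implementation. Everything specific here is an assumption made for illustration: the single-component PLS1 classifier with 0/1 class coding and a 0.5 decision threshold, the rule of dropping whichever variable's removal gives the highest cross-validated accuracy, the split counts, the synthetic data, and all function names (`fit_pls1`, `mccv_accuracy`, `backward_elimination`, `make_toy_data`).

```python
import random


def fit_pls1(X, y):
    """Fit a one-component PLS1 model (assumed simplification); return a 0/1 predictor."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector is proportional to X^T y, scaled to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5 or 1.0
    w = [v / norm for v in w]
    # Latent scores and the regression coefficient of y on those scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    b = sum(t[i] * yc[i] for i in range(n)) / (sum(v * v for v in t) or 1.0)

    def predict(row):
        score = sum((row[j] - x_mean[j]) * w[j] for j in range(p))
        return 1 if y_mean + b * score >= 0.5 else 0

    return predict


def mccv_accuracy(X, y, cols, n_splits=30, test_frac=0.3, seed=0):
    """Mean test accuracy over repeated random train/test splits (Monte Carlo CV)."""
    rng = random.Random(seed)  # fixed seed: every variable subset sees the same splits
    idx = list(range(len(X)))
    n_test = max(1, int(len(X) * test_frac))
    correct = total = 0
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        predict = fit_pls1([[X[i][c] for c in cols] for i in train],
                           [y[i] for i in train])
        for i in test:
            correct += int(predict([X[i][c] for c in cols]) == y[i])
            total += 1
    return correct / total


def backward_elimination(X, y, min_vars=1, **cv):
    """Drop one variable per round; return the best-scoring subset seen and its accuracy."""
    cols = list(range(len(X[0])))
    best_cols, best_acc = cols[:], mccv_accuracy(X, y, cols, **cv)
    while len(cols) > min_vars:
        # Score every candidate removal and keep the one that leaves the best model.
        trials = [(mccv_accuracy(X, y, [c for c in cols if c != d], **cv), d)
                  for d in cols]
        acc, drop = max(trials)
        cols.remove(drop)
        if acc >= best_acc:  # on ties, prefer the smaller subset
            best_cols, best_acc = cols[:], acc
    return best_cols, best_acc


def make_toy_data(seed=1, n=40, n_noise=4):
    """Synthetic data (not from the study): column 0 separates classes, the rest is noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[10 * label + rng.uniform(-1, 1)] +
         [rng.uniform(-1, 1) for _ in range(n_noise)] for label in y]
    return X, y
```

Calling `backward_elimination(*make_toy_data())` returns the indices of the retained columns together with their cross-validated accuracy. In a combined-platform setting the columns of `X` would simply be the NMR and LC/MS variables placed side by side, so that the elimination step can discard uninformative variables from either platform.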

publication date

  • August 16, 2016

Digital Object Identifier (DOI)

  • 10.1021/acs.analchem.6b00885

start page

  • 7975

end page

  • 7983

volume

  • 88

issue

  • 16