Access Type

Open Access Embargo

Date of Award


Degree Type


Degree Name



Electrical and Computer Engineering

First Advisor

Gregory W. Auner


Despite the attention that Raman spectroscopy has recently gained in the area of pathogen identification, techniques for analyzing the spectra are not well developed. In most scenarios, they rely on expert intervention to detect the peaks of a spectrum and assign them to specific molecular vibrations. Although some investigators have used machine-learning techniques to classify pathogens, these studies are usually limited to a specific application, and it is not clear how well the techniques generalize. Moreover, although a wide range of algorithms has been developed for classification problems, there is little insight into applying such methods to Raman spectra. Furthermore, analyzing Raman spectra requires pre-processing of the raw spectra, in particular background removal. Various techniques have been developed to remove the background of the raw spectra accurately with little or no expert intervention. Nevertheless, because the background of the spectra varies across media, these methods still require expert effort, which adds complexity and inefficiency to the identification task. This dissertation describes the development of state-of-the-art classification techniques to distinguish S. pyogenes from other species, including water and other confounding background pathogens. We compared these techniques in terms of their classification accuracy, sensitivity, and specificity, and provided a bias-variance insight into selecting the number of principal components in principal component analysis (PCA). Random Forest was observed to provide a better result than the other techniques, with an accuracy of 94.11%.
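
For readers unfamiliar with the three comparison metrics named above, the following is a minimal sketch of how accuracy, sensitivity, and specificity can be computed for a binary task in which label 1 stands for the target pathogen and label 0 for everything else. The function name `binary_metrics` and the example labels are illustrative assumptions; they are not taken from the dissertation and do not reproduce its data or results.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, sensitivity, and specificity for binary labels.

    Label 1 is the positive class (assumed here to be the target pathogen),
    label 0 is the negative class (any other sample).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    nan = float("nan")
    return {
        # Fraction of all samples classified correctly.
        "accuracy": (tp + tn) / len(y_true),
        # Fraction of true positives that were detected (true positive rate).
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Fraction of true negatives that were rejected (true negative rate).
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Made-up labels for illustration only: 4 positive and 6 negative samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, because a classifier can reach high accuracy while still missing most positive samples.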

Next, a novel deep learning technique was developed to remove the background of the Raman spectra and then identify the pathogen. The architecture of the network is discussed, and the method was found to yield an accuracy of 100% on our test samples, outperforming the traditional machine learning techniques discussed. In clinical applications of Raman spectroscopy, samples have a confounding background, which makes removing the spectral background and subsequently identifying the pathogen in real time a challenging task. We tested our methodology on datasets with confounding backgrounds, such as throat swabs from patients, and discussed the robustness and generalization of the developed method. The misclassification error on the test dataset was found to be around 3.7%. The realization of the trained model is also discussed in detail to provide a better understanding of, and insight into, the efficacy of the deep learning architecture. This technique also provides a platform for the general analysis of other pathogens in confounding environments.