ABOUT ISB
Catalyzing the future of health through science.
Transform health with an emphasis on scientifically-defined wellness for individuals
Transform biomedical innovation through rapid concept to clinic back to concept translation
Be the innovator of systems approaches to effective STEM education in the Pacific Northwest and beyond
PLANT PROTEOMES SPECTRUM READER
08/2022-Present: a project to process and analyze tandem mass spectrometry data from plant proteomes
Work under Principal Scientist Eric Deutsch, PhD, in the Moritz Lab at ISB to develop data analysis and visualization skills, Python programming skills, and scientific research skills, specifically in proteomics and mass spectrometry.
Develop Python programs to read mass spectra, compute metrics about each of the spectra, select certain mass spectra for further analysis, and write new data files with selected information
Learn how machine learning can be used to differentiate between types of spectra
Learn how to package software so that it is useful to other researchers
Write a paper on the functionality of the developed software and how to use it
Scanms2s Program
I created a program that scans through an input data file collected from a mass spectrometer. The program collects and analyzes the data, searching for possible identifications for each peak in the dataset's spectra.
However, mass spectrometers are not always accurate, and their measurements may contain inconsistencies. My program analyzes the data and applies a correction to all the collected data points to account for these inconsistencies, minimizing the delta PPM (the difference, in parts per million, between the m/z value measured by the machine and the theoretical m/z value of the closest identification).
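As a minimal sketch of the delta PPM calculation described above (not the program's actual code), the error can be expressed as a signed, relative difference scaled to parts per million. The function name, the sign convention (measured minus theoretical), and the example values are assumptions made for illustration.

```python
def delta_ppm(measured_mz: float, theoretical_mz: float) -> float:
    """Return the signed mass error in parts per million (ppm).

    Assumed convention: positive when the measured m/z is above the
    theoretical m/z of the closest identification.
    """
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


# Hypothetical values: a peak measured 0.0005 m/z units above a
# theoretical m/z of 100.0 is off by roughly 5 ppm.
error_ppm = delta_ppm(100.0005, 100.0)
```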
There are four steps:
1. Using the most prevalent and common possible identifications, I scan through the data and collect the delta PPM for each of these acids. From these values, I compute a crude correction that shifts every data point by the same amount.
2. Using all the identified peaks, I plot the raw delta PPM against m/z.
3. I apply the crude correction to every data point, accounting for any large inconsistencies in the machine. Then I fit a spline to the remaining delta PPM as a function of m/z and correct each m/z value by the spline's fitted value.
4. I plot a final scatterplot after all the data points have been corrected by the crude and spline calibrations. Ideally, the final graph has delta PPM values between -1 and 1, which indicates a successful correction: the newly corrected peaks are close to the theoretical values of the acid identifications.
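The two fitting steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the scanms2s code: it assumes NumPy and SciPy are available, takes the crude correction to be the median delta PPM of the identified peaks, and uses SciPy's `splrep`/`splev` cubic smoothing spline. The function names `fit_calibration` and `remaining_delta_ppm` are hypothetical.

```python
import numpy as np
from scipy.interpolate import splev, splrep


def fit_calibration(mz, delta_ppm, smoothing=None):
    """Return (crude_offset_ppm, tck) fitted to the identified peaks.

    mz, delta_ppm: equal-length sequences for the identified peaks.
    smoothing: passed to splrep as `s`; None lets SciPy pick its default.
    """
    mz = np.asarray(mz, dtype=float)
    delta_ppm = np.asarray(delta_ppm, dtype=float)

    # splrep expects x in increasing order, so sort by m/z first.
    order = np.argsort(mz)
    mz, delta_ppm = mz[order], delta_ppm[order]

    # Step 1 (assumed form): one constant shift, here the median delta PPM.
    crude_offset_ppm = float(np.median(delta_ppm))

    # Step 3: fit a cubic spline to what the crude shift leaves behind.
    residual_ppm = delta_ppm - crude_offset_ppm
    tck = splrep(mz, residual_ppm, k=3, s=smoothing)
    return crude_offset_ppm, tck


def remaining_delta_ppm(mz, delta_ppm, crude_offset_ppm, tck):
    """Delta PPM left after removing the crude shift and the spline value."""
    mz = np.asarray(mz, dtype=float)
    delta_ppm = np.asarray(delta_ppm, dtype=float)
    return delta_ppm - crude_offset_ppm - splev(mz, tck)
```

Plotting the output of `remaining_delta_ppm` against m/z would correspond to the final scatterplot in step 4, where values between -1 and 1 suggest the correction worked.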
At the end of the program, a .json file is created containing the crude calibration constant and the spline fit interpolation constants. These values can be passed to my Calibration Modification Program, which outputs a corrected data file that accounts for the machine's inconsistencies.
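For illustration only, a calibration file of this kind could be written and read back with Python's standard `json` module. The key names (`crude_offset_ppm`, `spline`, `knots`, `coefficients`, `degree`) and both function names are hypothetical; the actual layout of the scanms2s .json file is not described here.

```python
import json


def save_calibration(path, crude_offset_ppm, knots, coefficients, degree=3):
    """Write the calibration constants to a JSON file (hypothetical layout)."""
    payload = {
        "crude_offset_ppm": float(crude_offset_ppm),
        "spline": {
            # Convert to plain floats so NumPy arrays can be serialized too.
            "knots": [float(value) for value in knots],
            "coefficients": [float(value) for value in coefficients],
            "degree": int(degree),
        },
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)


def load_calibration(path):
    """Read the calibration constants back as a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```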
Calibration Modification Program
Using the .json file from my Scanms2s program, I created a second program that returns a corrected data file in which all data points are consistent with the theoretical identifications. This allows researchers to correct data files quickly, without worrying about machine inconsistencies.
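A rough sketch of how such a correction might be applied is shown here. It assumes the calibration has already been loaded into a dictionary with hypothetical keys (`crude_offset_ppm` and a `spline` entry holding `knots`, `coefficients`, and `degree` in the form SciPy's `splev` expects), and that delta PPM is defined as measured minus theoretical. It is not the Calibration Modification Program itself.

```python
import numpy as np
from scipy.interpolate import splev


def apply_calibration(mz_values, calibration):
    """Return corrected m/z values for the measured `mz_values`.

    calibration: dict with hypothetical keys "crude_offset_ppm" and
    "spline" ({"knots", "coefficients", "degree"}).
    """
    mz = np.asarray(mz_values, dtype=float)
    spline = calibration["spline"]
    tck = (
        np.asarray(spline["knots"], dtype=float),
        np.asarray(spline["coefficients"], dtype=float),
        int(spline["degree"]),
    )
    # Predicted error at each m/z: the constant shift plus the spline value.
    predicted_ppm = calibration["crude_offset_ppm"] + splev(mz, tck)
    # Undo an error of `predicted_ppm` parts per million.
    return mz / (1.0 + predicted_ppm * 1e-6)
```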
This image shows a four-panel scatterplot for a corrected data file. Unlike the earlier data file, this file's initial run-through already has delta PPM values between -1 and 1, indicating that the data file is already mostly accurate. When the correction is applied, there is little change to make.
Scanms2s Program Individual Plots
In addition to finding the calibration constants, the scanms2s program also prints individual peak plots. These let the user analyze each specific identification and see the difference between the measured and theoretical m/z, all possible identifications, and the intensity. The intensity helps the viewer judge the likelihood of the identification: a larger intensity indicates the identification is more likely to be correct.
To keep the output file from becoming too large, only the 300 most intense peaks are printed. However, the viewer may change the arguments to print all individual peaks if wanted.
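The peak limit described above could be implemented along these lines. This is a small sketch, not the program's code: the function name, the `(mz, intensity)` tuple layout, and the use of `limit=None` to mean "print everything" are assumptions.

```python
import heapq


def select_most_intense(peaks, limit=300):
    """Return the `limit` most intense peaks, most intense first.

    peaks: iterable of (mz, intensity) tuples (assumed layout).
    limit: maximum number of peaks to keep; None keeps all of them.
    """
    peaks = list(peaks)
    if limit is None:
        return sorted(peaks, key=lambda peak: peak[1], reverse=True)
    return heapq.nlargest(limit, peaks, key=lambda peak: peak[1])
```

`heapq.nlargest` avoids sorting the whole list when only a small fraction of the peaks is kept.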
Correlation Program
I also created a correlation program. The scanms2s program uses a list to match m/z values to acids. However, some m/z values remain unidentified. The correlation program lets you input an m/z value and some acids, and it plots the correlation between the two m/z values.
This lets the user determine whether the unknown m/z is correlated with another common acid, such as IH, which helps narrow down the possible identifications of the unknown m/z.
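The description above does not say which quantity is correlated. One plausible reading, which is purely an assumption here, is that the program compares the per-spectrum intensities of the unknown peak with those of the known acid's peak. Under that assumption, a Pearson correlation coefficient could be computed as in this sketch; the function name is hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(variance_x * variance_y)
```

A coefficient near 1 would suggest the unknown peak tends to appear together with the known acid's peak, while a value near 0 would suggest no linear relationship.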
ISB AMBASSADORSHIP
Summer 2021
The ISB Ambassadorship taught me about data analysis, graphing, and how code can be used to simplify and enhance data in biology and computational modeling. I had a great time working with others on projects and learning more about Google Colab. I appreciate this program for providing such a wonderful opportunity and for introducing younger generations to specialists and hands-on experience. This program really helped me improve my creative problem-solving skills through independent and group work.