
ABOUT ISB

Catalyzing the future of health through science.

  • Transform health with an emphasis on scientifically-defined wellness for individuals

  • Transform biomedical innovation through rapid concept to clinic back to concept translation

  • Be the innovator of systems approaches to effective STEM education in the Pacific Northwest and beyond


PLANT PROTEOMES SPECTRUM READER

08/2022-Present. This project aims to process and analyze tandem mass spectrometry data from plant proteomes.

  • Work under Principal Scientist Eric Deutsch, PhD, in the Moritz Lab at ISB to develop data analysis and visualization skills, Python programming skills, and scientific research skills, specifically in the domain of proteomics and mass spectrometry.

  • Develop Python programs to read mass spectra, compute metrics about each of the spectra, select certain mass spectra for further analysis, and write new data files with selected information

  • Learn how machine learning can differentiate between types of spectra

  • Learn how to package software so that it is useful to other researchers

  • Write a paper on the functionality of the developed software and how to use it


Scanms2s Program

I created a program that scans through an input data file of measurements collected by a mass spectrometer. The program collects and analyzes the data, searching for possible identifications for each of the peaks in the dataset's spectra.

However, mass spectrometers are not always accurate and may show inconsistencies. My program analyzes the data and applies a correction to all the collected data points to account for these inconsistencies, minimizing the delta PPM (the difference, expressed in parts per million, between the measured m/z value from the machine and the theoretical m/z value of the closest identification).
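As a minimal sketch of the delta PPM calculation described above, the following Python computes the error of a measured peak relative to its closest candidate identification. The function names and the candidate table are hypothetical illustrations, not the Scanms2s program's actual code, and the m/z values in the usage note are placeholders rather than real ion masses.

```python
def delta_ppm(measured_mz: float, theoretical_mz: float) -> float:
    """Return the error of a measured m/z in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def closest_identification(measured_mz: float, candidates: dict) -> tuple:
    """Pick the candidate whose theoretical m/z is nearest the measured m/z.

    `candidates` maps a label to its theoretical m/z (hypothetical structure).
    Returns the chosen label and the delta PPM against it.
    """
    label = min(candidates, key=lambda name: abs(candidates[name] - measured_mz))
    return label, delta_ppm(measured_mz, candidates[label])
```

For example, `closest_identification(100.0005, {"ion_a": 100.0, "ion_b": 200.0})` selects `"ion_a"` and reports an error of roughly 5 ppm.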


There are four steps:

1. Using the most prevalent and common possible identifications, I scan through the data and collect the delta PPM for each of these acids. From these delta PPM values, I create a crude correction that shifts every data point by the same amount.

2. Using all the identified peaks, I plot the raw delta PPM against the m/z.

3. I apply the crude correction to every data point, accounting for any large inconsistencies in the machine. Then, I fit a spline and correct the remaining error in each m/z value based on the spline fit value.

4. I plot a final scatterplot after all the data points have been corrected by the crude and spline calibrations. Ideally, the final graph has delta PPM values between -1 and 1, which indicates a successful correction: the newly corrected peaks are close to the theoretical values of the acid identifications.
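The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the program's actual method: the crude correction is taken to be the median delta PPM, and a piecewise-linear curve through grouped medians (a degree-1 spline) stands in for whatever spline the program really fits. All function names are hypothetical, the data is synthetic, and plotting is omitted.

```python
from bisect import bisect_right
from statistics import median


def crude_offset(reference_deltas):
    """Step 1: one constant shift in ppm (assumed here to be the median)."""
    return median(reference_deltas)


def fit_knots(mzs, residuals, n_bins=4):
    """Step 3: summarize residual ppm error vs. m/z as (m/z, ppm) knots.

    Peaks are sorted by m/z and split into roughly equal-sized groups; each
    group contributes one knot at its median m/z and median residual.
    """
    pairs = sorted(zip(mzs, residuals))
    size = max(1, len(pairs) // n_bins)
    knots = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        knots.append((median(m for m, _ in chunk), median(d for _, d in chunk)))
    return knots


def evaluate(knots, mz):
    """Linearly interpolate the knots at `mz`, holding the end values flat."""
    xs = [x for x, _ in knots]
    if mz <= xs[0]:
        return knots[0][1]
    if mz >= xs[-1]:
        return knots[-1][1]
    i = bisect_right(xs, mz)
    (x0, y0), (x1, y1) = knots[i - 1], knots[i]
    return y0 + (y1 - y0) * (mz - x0) / (x1 - x0)


def correct_mz(mz, crude_ppm, knots):
    """Steps 3-4: remove the crude shift plus the m/z-dependent residual."""
    total_ppm = crude_ppm + evaluate(knots, mz)
    return mz / (1.0 + total_ppm * 1e-6)


# Synthetic demonstration data (not real measurements): eight peaks whose
# ppm error drifts linearly with m/z.
theoretical = [100.0 + 50.0 * i for i in range(8)]
measured = [t * (1.0 + (4.0 + 0.004 * t) * 1e-6) for t in theoretical]
raw = [(m - t) / t * 1e6 for m, t in zip(measured, theoretical)]

crude = crude_offset(raw)
knots = fit_knots(measured, [d - crude for d in raw])
corrected = [correct_mz(m, crude, knots) for m in measured]
final = [(c - t) / t * 1e6 for c, t in zip(corrected, theoretical)]
```

On this synthetic input, `raw` holds errors of several ppm, while `final` falls inside the -1 to 1 range that step 4 treats as a successful correction.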


At the end of the program, a .json file is created containing the crude calibration constant and the spline fit interpolation constants. These values can be passed to my Calibration Modification program, which outputs a corrected data file that accounts for the machine's inconsistencies.
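A calibration file of this kind might be written and read back as in the sketch below. The key names (`crude_correction_ppm`, `spline_knots`) and the representation of the spline constants as (m/z, ppm) pairs are assumptions made for illustration; the actual layout of the program's .json file is not described here.

```python
import json


def save_calibration(path, crude_ppm, knots):
    """Write the calibration constants to a JSON file (hypothetical layout)."""
    payload = {
        "crude_correction_ppm": crude_ppm,
        "spline_knots": [{"mz": mz, "ppm": ppm} for mz, ppm in knots],
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)


def load_calibration(path):
    """Read the constants back as (crude_ppm, [(mz, ppm), ...])."""
    with open(path, encoding="utf-8") as handle:
        payload = json.load(handle)
    knots = [(knot["mz"], knot["ppm"]) for knot in payload["spline_knots"]]
    return payload["crude_correction_ppm"], knots
```

Keeping the constants in a small text file lets the scanning and correcting steps run as separate programs, which matches the two-program workflow described above.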

sample scan plots.png
calibrated sample scan plots.png

Calibration Modification Program

Using the .json file from my Scanms2s program, I created a new program that returns a corrected data file, so that all data points are consistent with the theoretical identifications. This allows researchers to quickly correct data files without worrying about machine inconsistencies.

This image shows the 4-plot scatterplot for a corrected data file. Unlike with the initial data file, the initial run-through already has delta PPM values between -1 and 1, indicating that the data file is already mostly accurate. Applying a further correction therefore changes very little.
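The correction pass itself could look something like the sketch below. It assumes a hypothetical text format in which each peak line starts with an m/z value followed by other fields such as intensity, and in which any line that does not start with a number is a header to be copied through unchanged. The real input format and the function name are not specified in this page.

```python
def correct_peak_lines(lines, crude_ppm, residual_ppm=lambda mz: 0.0):
    """Yield lines with corrected m/z values; non-peak lines pass through.

    `residual_ppm` is a callable giving the m/z-dependent part of the
    correction in ppm (for example, a fitted spline); it defaults to zero.
    """
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split()
        try:
            mz, rest = float(fields[0]), fields[1:]
        except (IndexError, ValueError):
            yield line  # blank line or header: leave untouched
            continue
        total_ppm = crude_ppm + residual_ppm(mz)
        corrected = mz / (1.0 + total_ppm * 1e-6)
        yield " ".join([f"{corrected:.6f}", *rest])
```

Because the function is a generator over lines, it can be fed an open file handle directly and its output written to a new file without loading the whole data file into memory.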


Scanms2s Program Individual Plots

In addition to finding the calibration constants, the scanms2s program also prints individual peak plots. This allows the user to analyze each specific identification and see the difference between the measured and theoretical m/z, all possible identifications, and the intensity. The intensity helps the viewer judge the likelihood of the identification: a larger intensity indicates the identification is more likely to be correct.

To keep the output file from becoming too large, only the 300 most intense peaks are printed. However, the user may change the arguments to print all individual peaks if desired.
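Selecting the most intense peaks can be sketched as follows. The function name, the `(m/z, intensity)` tuple layout, and the use of `limit=None` to mean "all peaks" are assumptions for illustration; only the default of 300 comes from the description above.

```python
import heapq


def most_intense_peaks(peaks, limit=300):
    """Return up to `limit` (m/z, intensity) peaks, most intense first.

    Passing limit=None keeps every peak, still sorted by intensity.
    """
    if limit is None:
        return sorted(peaks, key=lambda peak: peak[1], reverse=True)
    return heapq.nlargest(limit, peaks, key=lambda peak: peak[1])
```

`heapq.nlargest` avoids sorting the full peak list when only a small number of peaks is needed.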

individual peak plots sample.png
correlation sample.png

Correlation Program

I also created a correlation program. The scanms2s program uses a list to match m/z values to an acid. However, some m/z values remain unidentified. The correlation program lets the user input an m/z value and some acids, and it plots the correlation between the two m/z values.

This allows the user to determine whether the unknown m/z is correlated with another common acid, such as IH, which helps narrow down the possibilities for the acid's identification.
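One way such a correlation could be quantified is sketched below. It assumes, purely for illustration, that the quantity being compared is the intensity of each ion across the same set of spectra and that a Pearson coefficient is the measure used; the page does not say what the Correlation program actually computes, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers.

    Here xs and ys are assumed to be the intensities of the unknown ion and
    of a known ion, taken spectrum by spectrum.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

A coefficient near 1 would suggest that the unknown peak tends to appear together with the known ion, which is the kind of hint described above for narrowing down an identification.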


ISB AMBASSADORSHIP

2021 Summer

The ISB Ambassadorship introduced me to data analysis, graphing, and how code can be used to simplify and enhance data in the fields of biology and computational modeling. I had a great time working with others on projects and learning more about Google Colab. I appreciate this program for providing such a wonderful opportunity and for giving younger generations access to specialists and hands-on experience. This program really helped me improve my creative problem-solving skills through independent and group work.
