Improvement of selection criteria and prioritisation for neoantigen prediction

Phorutai Pearngam

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/81453

Title:	Improvement of selection criteria and prioritisation for neoantigen prediction
Other Titles:	Development of selection and ranking methods for computational prediction results to identify peptides capable of acting as neoantigens
Authors:	Phorutai Pearngam
Advisors:	Trairak Pisitkun; Sira Sriswasdi; Thanyada Rungrotmongkol
Other author:	Chulalongkorn University. Graduate School
Issue Date:	2021
Publisher:	Chulalongkorn University
Abstract:	A tumour-specific neoantigen-based cancer vaccine is a potentially powerful treatment option that uses unique mutated peptides from tumour cells to boost the immune response and selectively attack cancer cells. Characterising the targeted peptides that the immune system can selectively recognise is therefore essential to this approach. However, a major problem in neoantigen prediction is the occurrence of false positives, which leads to poor outcomes in clinical research and practice. This thesis aims to address some of the computational issues in neoantigen prediction, including developing more reliable statistics for assessing peptide binding to a major histocompatibility complex (MHC) protein and using machine learning to predict which peptides will generate an immune response. Specifically, the thesis introduces an approach to parameter estimation that uses a modified expectation maximisation (EM) framework with the method of moments for a two-component beta mixture model representing the distributions of true and false scores from peptide binding prediction. The parameters estimated from the model can then be used to estimate the false discovery rate (FDR) or a local peptide-level statistic such as the posterior error probability (PEP), giving a robust method for selecting MHC-binding peptides. Next, the thesis introduces a new immunogenicity prediction model that uses machine learning to classify peptides as immunogenic or non-immunogenic. A data set of peptides classed as immunogenic or non-immunogenic was assembled, and physicochemical properties and homology features of the peptides were used to construct a Random Forest classifier for immunogenicity prediction. The two innovations were assembled into an end-to-end pipeline that provides a final probability describing true MHC binding ability and the potential for immunogenicity.
The final probability of MHC binding and T cell recognition provides a statistical framework that guides users in defining appropriate thresholds and in prioritising the peptides with the highest chance of being real neoantigens.
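The estimation step summarised in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes binding scores rescaled to the interval (0, 1) with higher values indicating more likely binders, a simple median-split initialisation, and a fixed number of iterations. The function names (`fit_beta_mixture`, `pep`, `fdr_of_selection`) are hypothetical, and the thesis's own "modified" EM may differ in every one of these choices.

```python
import math
import random


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, computed through log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def moments_to_beta(mean, var):
    """Method of moments: convert a mean and variance into Beta shape parameters."""
    # Keep the variance inside the range a Beta distribution can represent.
    var = min(max(var, 1e-8), 0.999 * mean * (1 - mean))
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def weighted_moments(xs, ws):
    """Weighted mean and variance of xs under weights ws."""
    total = sum(ws)
    mean = sum(w * x for x, w in zip(xs, ws)) / total
    var = sum(w * (x - mean) ** 2 for x, w in zip(xs, ws)) / total
    return mean, var


def pep(x, pi_false, false_ab, true_ab):
    """Posterior error probability: P(score came from the false component | x)."""
    x = min(max(x, 1e-6), 1 - 1e-6)
    f = pi_false * beta_pdf(x, *false_ab)
    t = (1 - pi_false) * beta_pdf(x, *true_ab)
    return f / (f + t + 1e-300)


def fit_beta_mixture(scores, n_iter=100):
    """EM-style fit of a two-component beta mixture with a moment-matching M-step."""
    xs = sorted(min(max(s, 1e-6), 1 - 1e-6) for s in scores)
    half = len(xs) // 2
    # Assumed initialisation: lower half of the scores is "false", upper half "true".
    false_ab = moments_to_beta(*weighted_moments(xs[:half], [1.0] * half))
    true_ab = moments_to_beta(*weighted_moments(xs[half:], [1.0] * (len(xs) - half)))
    pi_false = 0.5
    for _ in range(n_iter):
        # E-step: responsibility of the false component for each score.
        resp = [pep(x, pi_false, false_ab, true_ab) for x in xs]
        # M-step: match weighted moments instead of maximising the likelihood.
        false_ab = moments_to_beta(*weighted_moments(xs, resp))
        true_ab = moments_to_beta(*weighted_moments(xs, [1 - r for r in resp]))
        pi_false = sum(resp) / len(xs)
    return pi_false, false_ab, true_ab


def fdr_of_selection(scores, threshold, pi_false, false_ab, true_ab):
    """Estimated FDR of the peptides scoring at or above threshold (mean PEP)."""
    selected = [s for s in scores if s >= threshold]
    if not selected:
        return 0.0
    return sum(pep(s, pi_false, false_ab, true_ab) for s in selected) / len(selected)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic scores for illustration only, not data from the thesis.
    demo = [random.betavariate(2, 8) for _ in range(700)]
    demo += [random.betavariate(8, 2) for _ in range(300)]
    params = fit_beta_mixture(demo)
    print("estimated false fraction:", round(params[0], 3))
    print("estimated FDR at score >= 0.6:", fdr_of_selection(demo, 0.6, *params))
```

In this sketch the PEP is a per-peptide quantity, while the FDR of a selected set is taken as the average PEP of the peptides in that set, which is one common way to relate the two statistics named in the abstract.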
Other Abstract:	Neoantigens are peptides carrying mutations that are specific to a patient's tumour tissue. Cancer vaccines developed from neoantigens are one of the effective options for cancer treatment, because using uniquely mutated peptides from cancer cells can boost the patient's immune response and destroy cancer cells precisely. Determining whether a given peptide can act as a neoantigen is therefore very important in cancer vaccine development. The main problem in neoantigen prediction is the high risk of obtaining false positive predictions (False Discovery Rate, FDR), that is, peptides whose predicted neoantigen scores are excellent but which cannot bind the MHC protein or cannot stimulate an immune response. Errors at this prediction step cause inaccuracies in laboratory and clinical results. This research therefore aims to address the problems of computational neoantigen prediction by developing a model that can calculate the FDR from prediction scores indicating the binding ability between a peptide and the MHC protein. The FDR calculation learns from the distribution of the prediction results and uses a mathematical principle (Expectation Maximisation) to estimate the statistical parameters consistent with that distribution. In addition, this thesis studies and develops a model for predicting the immunogenicity of peptides using the computational approach known as Machine Learning, here with the Random Forest method. The model learns from a data set consisting of peptides that can stimulate an immune response and peptides that cannot, and its predictions give a score indicating the likelihood that a peptide can stimulate an immune response. When the FDR-based score and the immunogenicity probability are combined, the resulting score indicates the probability that a peptide binds the MHC protein and is immunogenic.
This combined score helps to select peptides that can act as neoantigens efficiently and to reduce errors in the neoantigen prediction process.
Description:	Thesis (Ph.D.)--Chulalongkorn University, 2021
Degree Name:	Doctor of Philosophy
Degree Level:	Doctoral Degree
Degree Discipline:	Bioinformatics and Computational Biology
URI:	http://cuir.car.chula.ac.th/handle/123456789/81453
URI:	http://doi.org/10.58837/CHULA.THE.2021.16
metadata.dc.identifier.DOI:	10.58837/CHULA.THE.2021.16
Type:	Thesis
Appears in Collections:	Grad - Theses

Files in This Item:

File	Description	Size	Format
5987842720.pdf		6.63 MB	Adobe PDF
