
Binary Classification of 3D Small-Scale Medical Images Using Video Vision Transformers

LAUR Repository

dc.contributor.author Abi Younes, Simon
dc.date.accessioned 2025-03-05T08:50:28Z
dc.date.available 2025-03-05T08:50:28Z
dc.date.copyright 2025 en_US
dc.date.issued 2025-01-20
dc.identifier.uri http://hdl.handle.net/10725/16688
dc.description.abstract Applying the pure Transformer model to sequences of image patches has achieved promising results, comparable to those of Convolutional Neural Networks (CNNs), the leading models for computer vision tasks. One of the remaining gaps, however, is that Vision Transformers (ViTs) require large volumes of data, which makes smaller-scale datasets worth investigating. Despite rapid advances and a wide range of applications, the model still lags behind in the field of 3D images, and low-resolution images in general pose problems for the model's learning curve. This study therefore leverages the ability of ViTs to capture global linkages and long-range interdependencies within an image, with the aim of matching the benchmark established on the MedMNIST3D family of datasets (MedMNIST v2), which offers small-scale images at both high and low resolutions. Previous studies have demonstrated a plethora of methods for handling 3D images, increasing the interest in applying ViT models from scratch to this data modality, specifically on small-scale datasets. A binary classification experiment on the VesselMNIST3D dataset was implemented by treating each 3D image as a video in which the third dimension represents the number of frames, thereby introducing temporal information for the model to learn and enriching the relationships across spatial information at a higher dimension. The study provides a robustness experiment demonstrating the strong performance of the vanilla Video Vision Transformer model, which scores on average 0.877 Area Under the Curve (AUC) and 0.916 Accuracy (ACC) across 30 independent experiments. The study further shows that pretraining the model at a higher resolution improves its learning capacity at a lower resolution, boosting the AUC score by 3%. Finally, the study offers multiple levels of interpretation and the cautions needed for proper inferential results, in order to make the Vision Transformer model competitive in its weak areas, in a domain with a pressing need for growth. en_US
dc.language.iso en en_US
dc.title Binary Classification of 3D Small-Scale Medical Images Using Video Vision Transformers en_US
dc.type Thesis en_US
dc.term.submitted Fall en_US
dc.author.degree MS in Computer Science en_US
dc.author.school SoAS en_US
dc.author.idnumber 201504515 en_US
dc.author.commembers El Khatib, Nader
dc.author.commembers Abbas, Nadine
dc.author.department Computer Science And Mathematics en_US
dc.author.advisor Harmanani, Haidar
dc.keywords Accuracy en_US
dc.keywords Area Under the Curve en_US
dc.keywords Convolutional Neural Networks en_US
dc.keywords Vision Transformer en_US
dc.keywords Video Vision Transformers en_US
dc.identifier.doi https://doi.org/10.26756/th.2023.767 en_US
dc.author.email simon.abiyounes@lau.edu en_US
dc.identifier.tou http://libraries.lau.edu.lb/research/laur/terms-of-use/thesis.php en_US
dc.publisher.institution Lebanese American University en_US
dc.author.affiliation Lebanese American University en_US
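
As the abstract describes, each 3D volume is reinterpreted as a video whose depth slices act as frames before being passed to a Video Vision Transformer. The following is a minimal sketch of that volume-to-video reshaping, not the author's exact pipeline; the 28x28x28 sample shape, the use of PyTorch, and the VideoVisionTransformer classifier named in the comments are assumptions made for illustration.

    import torch

    # Suppose `volume` is one VesselMNIST3D sample with assumed shape
    # (depth, height, width) = (28, 28, 28), values scaled to [0, 1].
    volume = torch.rand(28, 28, 28)

    # Treat the depth axis as time: each of the 28 slices becomes a
    # single-channel video frame, giving a clip of shape
    # (num_frames, channels, height, width) = (28, 1, 28, 28).
    video = volume.unsqueeze(1)

    # A ViViT-style classifier (hypothetical `VideoVisionTransformer`,
    # not a specific library API) would consume a batch of such clips
    # and emit one logit per class for the binary classification task.
    batch = video.unsqueeze(0)   # (batch, frames, channels, height, width)
    print(batch.shape)           # torch.Size([1, 28, 1, 28, 28])

Because the transformation is only an axis reinterpretation, no voxel values are resampled or altered; the "temporal" structure the model learns is simply the progression of anatomy along the chosen axis.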

