Transfer Learning and Sentiment Analysis of Lebanese Dialect Data using a Multilingual Deep Learning Approach

LAUR Repository

dc.contributor.author Chehade, Mira
dc.date.accessioned 2025-02-07T13:50:30Z
dc.date.available 2025-02-07T13:50:30Z
dc.date.copyright 2024 en_US
dc.date.issued 2024-10-14
dc.identifier.uri http://hdl.handle.net/10725/16527
dc.description.abstract With the exponential growth in digitally created content and the surge in internet users, the challenges of handling and analyzing large volumes of data have increased, particularly in the context of textual data for sentiment classification. Although sentiment analysis has been extensively conducted on English, this has not been the case for Arabic, a language spoken in more than 22 countries, leaving significant room for exploration, especially across its diverse dialects. The Lebanese dialect, primarily spoken in Lebanon, differs significantly from Modern Standard Arabic (MSA) and has limited resources and research dedicated to sentiment analysis. This thesis aims to bridge that gap by employing transfer learning, which allows for the transfer of knowledge across domains. In this context, we utilize the pre-trained XLM-RoBERTa, a cross-lingual variant of RoBERTa trained on multilingual data across 100 languages. We adapt this pre-trained model for sentiment classification in Lebanese dialect tweets by training the model on a labeled English dataset and evaluating it on 1,000 records translated into the Lebanese dialect. This approach demonstrates how sentiment classification can be conducted in under-resourced dialects with minimal labeled data through cross-lingual learning. Furthermore, our methodology provides a novel contribution by showcasing the efficacy of transfer learning in a low-resource language context. Our dialect-specific model, fine-tuned from the pre-trained one, bypasses the need for large dialect-specific datasets and illustrates how this rapid adaptation can be achieved. The results are strong, with a high accuracy rate of 73.4% in our best epoch, outperforming traditional methods and proving the effectiveness of transfer learning techniques in sentiment analysis for less-represented languages and dialects. 
This work suggests that the solution to the scarcity of resources for Lebanese dialect sentiment analysis lies within this approach and highlights the broader potential of transfer learning in addressing natural language processing challenges for low-resource languages. The framework is extendable to other dialects, establishing a versatile approach for sentiment classification in both multilingual and cross-lingual scenarios. en_US
dc.language.iso en en_US
dc.title Transfer Learning and Sentiment Analysis of Lebanese Dialect Data using a Multilingual Deep Learning Approach en_US
dc.type Thesis en_US
dc.term.submitted Fall en_US
dc.author.degree MS in Computer Science en_US
dc.author.school SoAS en_US
dc.author.idnumber 202106024 en_US
dc.author.commembers Haber, Samer
dc.author.commembers Kaddoura, Sanaa
dc.author.department Computer Science and Mathematics en_US
dc.author.advisor Haraty, Ramzi
dc.keywords Sentiment Analysis en_US
dc.keywords Lebanese Dialect en_US
dc.keywords Arabic Natural Language Processing en_US
dc.keywords Transfer Learning en_US
dc.keywords Cross-lingual Learning en_US
dc.keywords XLM-RoBERTa en_US
dc.keywords Low-resource Languages en_US
dc.keywords Multilingual Data en_US
dc.keywords Text Classification en_US
dc.identifier.doi https://doi.org/10.26756/th.2023.747 en_US
dc.author.email mira.chehade@lau.edu en_US
dc.identifier.tou http://libraries.lau.edu.lb/research/laur/terms-of-use/thesis.php en_US
dc.publisher.institution Lebanese American University en_US
dc.author.affiliation Lebanese American University en_US

