dc.description.abstract |
As digitally created content grows exponentially and the number of internet users surges, handling and analyzing large volumes of data has become more challenging, particularly for textual data used in sentiment classification. Although sentiment analysis has been studied extensively for English, the same is not true of Arabic, a language spoken in more than 22 countries, which leaves significant room for exploration, especially across its diverse dialects. The Lebanese dialect, primarily spoken in Lebanon, differs significantly from Modern Standard Arabic (MSA), and few resources or studies are dedicated to its sentiment analysis.
This thesis aims to bridge that gap by employing transfer learning, which allows knowledge to be transferred across domains. Specifically, we use the pre-trained XLM-RoBERTa, a cross-lingual variant of RoBERTa trained on multilingual data covering 100 languages. We adapt this model for sentiment classification of Lebanese dialect tweets by fine-tuning it on a labeled English dataset and evaluating it on 1,000 records translated into the Lebanese dialect. This approach demonstrates how sentiment classification can be conducted in under-resourced dialects with minimal labeled data through cross-lingual learning.
Furthermore, our methodology makes a novel contribution by demonstrating the efficacy of transfer learning in a low-resource language context. Our dialect-specific model, fine-tuned from the pre-trained one, bypasses the need for large dialect-specific datasets and illustrates how rapid adaptation can be achieved. The model reaches an accuracy of 73.4% in its best epoch, outperforming traditional methods and demonstrating the effectiveness of transfer learning techniques in sentiment analysis for less-represented languages and dialects.
This work suggests that transfer learning can address the scarcity of resources for Lebanese dialect sentiment analysis, and it highlights the broader potential of the technique for natural language processing challenges in low-resource languages. The framework can be extended to other dialects, offering a versatile approach to sentiment classification in both multilingual and cross-lingual scenarios. |
en_US |