GERest: A German Dataset for Aspect Sentiment Quadruple Prediction

Topic:
GERest: A German Dataset for Aspect Sentiment Quadruple Prediction
Type:
Bachelor's thesis (BA)
Supervisor:
Nils Constantin Hellwig
Student:
Niclas Reuse
Status:
in progress
Created:
2024-11-21
Kick-off presentation:
2025-01-13

Background

Sentiment analysis has evolved from general text-level sentiment detection to more granular approaches such as Aspect-Based Sentiment Analysis (ABSA) and Aspect Sentiment Quad Prediction (ASQP), which extracts quadruples of aspect term, aspect category, sentiment polarity, and opinion term from a text. While numerous resources and datasets exist for ASQP in English, annotated resources for German are lacking. This limits the development and evaluation of models capable of performing ASQP tasks in German. Addressing this gap is important for advancing multilingual NLP and fine-grained sentiment analysis in underrepresented languages.

Objective of the Thesis

The primary goal of this thesis is to create the first German ASQP dataset by extending an existing ABSA dataset with opinion term annotations. The dataset will be used to train and evaluate a transformer-based model, such as BERT, on aspect-level sentiment analysis. Additionally, the model's performance on the German dataset will be compared with its performance on an English ASQP dataset to assess cross-linguistic differences and the model's effectiveness.
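To make the target structure concrete, the following is a minimal sketch of how a sentiment quadruple could be represented and how predictions could be scored. It assumes exact-match F1 over quadruples, a common choice in ASQP work; whether the thesis uses this metric is an assumption. The names `Quad` and `exact_match_f1`, the example sentences, and the category labels are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """One sentiment quadruple (field names are illustrative)."""
    aspect_term: str
    aspect_category: str
    sentiment: str
    opinion_term: str


def exact_match_f1(gold, pred):
    """Micro-averaged F1; a predicted quad counts only if all four fields match.

    gold and pred are lists with one set of Quad objects per sentence.
    """
    n_gold = sum(len(g) for g in gold)
    n_pred = sum(len(p) for p in pred)
    n_hit = sum(len(g & p) for g, p in zip(gold, pred))
    precision = n_hit / n_pred if n_pred else 0.0
    recall = n_hit / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold annotations and model predictions for two sentences.
gold = [
    {Quad("Pizza", "FOOD", "positive", "lecker")},
    {Quad("Service", "SERVICE", "negative", "langsam")},
]
pred = [
    {Quad("Pizza", "FOOD", "positive", "lecker")},
    {Quad("Service", "SERVICE", "negative", "unfreundlich")},  # wrong opinion term
]

score = exact_match_f1(gold, pred)
print(f"Exact-match F1: {score:.2f}")
```

Because a single wrong field invalidates the whole quadruple, this metric is strict; computing it the same way on the German and the English data keeps the comparison consistent.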

Specific Tasks

  • Convert the existing German ABSA dataset GERestaurant from JSON to CSV format
  • Extend the existing GERestaurant dataset by annotating opinion terms for ASQP compatibility
  • Train a transformer-based model (e.g. BERT) with the annotated dataset
  • Evaluate the model's performance and compare it with results on the English ASQP dataset
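The JSON-to-CSV conversion in the first task could be sketched as follows. The JSON schema shown here (`id`, `text`, `aspects`, `term`, `category`, `polarity`) is an assumption made for illustration and may not match the actual GERestaurant files; the function names and the sample record are hypothetical as well. The sketch writes one row per annotated aspect and leaves an empty `opinion_term` column to be filled in during annotation.

```python
import csv
import io
import json

# Hypothetical sample record; the real GERestaurant schema may differ.
SAMPLE_JSON = """[
  {"id": 1,
   "text": "Die Pizza war lecker, aber der Service war langsam.",
   "aspects": [
     {"term": "Pizza", "category": "FOOD", "polarity": "POSITIVE"},
     {"term": "Service", "category": "SERVICE", "polarity": "NEGATIVE"}
   ]}
]"""

FIELDS = ["id", "text", "aspect_term", "aspect_category", "polarity", "opinion_term"]


def json_to_csv_rows(records):
    """Flatten nested records into one row per annotated aspect."""
    rows = []
    for rec in records:
        for asp in rec.get("aspects", []):
            rows.append({
                "id": rec["id"],
                "text": rec["text"],
                "aspect_term": asp["term"],
                "aspect_category": asp["category"],
                "polarity": asp["polarity"],
                "opinion_term": "",  # placeholder, filled in during annotation
            })
    return rows


def write_csv(rows, handle):
    """Write the rows with a header line to any text file handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = json_to_csv_rows(json.loads(SAMPLE_JSON))
buffer = io.StringIO()  # stands in for open("out.csv", "w", newline="", encoding="utf-8")
write_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Using `csv.DictWriter` rather than manual string joins ensures that review texts containing commas or quotation marks are quoted correctly.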

Expected Prior Knowledge

  • Basic knowledge of Python for data preprocessing and model training
  • Basic knowledge of natural language processing concepts (particularly ABSA)
  • Basic experience with machine learning frameworks

Further Reading

  • W. Zhang, X. Li, Y. Deng, L. Bing, and W. Lam. 2023. A Survey on Aspect-Based Sentiment Analysis: Tasks, Methods, and Challenges. IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 11, pages 11019–11038. doi: 10.1109/TKDE.2022.3230975.
  • Wenxuan Zhang, Yang Deng, Xin Li, Yifei Yuan, Lidong Bing, and Wai Lam. 2021. Aspect Sentiment Quad Prediction as Paraphrase Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9209–9219, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.