    Data Science Chair

    SemEval'25 Task 2 & Task 7 Participation - First Place for Task 2!

    03/03/2025

    We participated in two SemEval tasks and placed first in SemEval'25 Task 2!! OTTERly fantastic!

    After the success of last year's participation, we again placed first in a SemEval challenge! The students participated as part of the "Machine Learning for Natural Language Processing" Praktikum at our chair and placed 1st (among approaches not using any gold data) in the entity-aware translation task!
    Here, the students explored how SQL-based retrieval combined with constrained neural translation can effectively handle culturally-adapted named entities without relying on massive language models - sometimes all you need is a pinch of SALT!

    Abstract (not yet published):
    Entity-aware machine translation faces significant challenges when translating culturally-adapted named entities that require knowledge beyond the source text. We present SALT (SQL-based Approach for LLM-Free Entity-Aware-Translation), a parameter-efficient system for the SemEval-2025 Task 2. Our approach combines SQL-based entity retrieval with constrained neural translation via logit biasing and explicit entity annotations. Despite its simplicity, it achieves state-of-the-art performance (First Place) among approaches not using gold-standard data, while requiring far less computation than LLM-based methods.
    Our ablation studies show simple SQL-based retrieval rivals complex neural models, and strategic model refinement outperforms increased model complexity. SALT offers an alternative to resource-intensive LLM-based approaches, achieving comparable results with only a fraction of the parameters.
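To give a rough feel for the retrieval idea in the abstract, here is a minimal sketch of looking up entity translations with plain SQL. It is not the SALT implementation: the table layout, the function names `build_entity_db` and `retrieve_entities`, and the toy rows are all assumptions made for illustration, and the constrained decoding step (logit biasing) is left out entirely.

```python
import sqlite3


def build_entity_db(rows):
    """Create an in-memory table of (source_name, target_lang, target_name).

    The schema is hypothetical; the actual system's tables are not described here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entities (source_name TEXT, target_lang TEXT, target_name TEXT)"
    )
    conn.executemany("INSERT INTO entities VALUES (?, ?, ?)", rows)
    return conn


def retrieve_entities(conn, sentence, target_lang):
    """Return (source_name, target_name) pairs whose source name occurs in the sentence.

    Matching is a simple case-insensitive substring test done inside SQLite.
    """
    query = (
        "SELECT source_name, target_name FROM entities "
        "WHERE target_lang = ? AND instr(lower(?), lower(source_name)) > 0 "
        "ORDER BY length(source_name) DESC"
    )
    return conn.execute(query, (target_lang, sentence)).fetchall()


# Toy data for illustration only.
conn = build_entity_db([
    ("The Hunger Games", "de", "Die Tribute von Panem"),
    ("The Hunger Games", "fr", "Hunger Games"),
])
matches = retrieve_entities(conn, "Who wrote the hunger games?", "de")
```

In a setup like the one the abstract describes, the retrieved target names would then be passed on to the translation model as annotations or as tokens to favor during decoding.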


    Secondly, another group participated in Task 7 "Multilingual and Crosslingual Fact-Checked Claim Retrieval", the focus of which is the retrieval of relevant fact-checks for social media posts across multiple languages.

    Abstract (not yet published):
    We approach this task with an enhanced bi-encoder retrieval setup designed to match social media posts with relevant fact-checks, using synthetic data from LLMs. We explored and analyzed two main approaches for generating synthetic posts: one based on existing fact-checks and one based on existing posts. Our approach achieved a Success@10 score of 89.53% for the monolingual task and 74.48% for the crosslingual task, ranking 16th out of 28 and 13th out of 29, respectively. Without data augmentation, the scores would have been 88.69% (17th) and 72.93% (15th).
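For readers unfamiliar with the metric, the following is a small sketch of how a Success@K score can be computed, assuming the usual definition: a post counts as a success if at least one of its relevant fact-checks appears among the top K retrieved results. The function name `success_at_k` and the dictionary-based data layout are assumptions for illustration, not the official scorer.

```python
def success_at_k(ranked, relevant, k=10):
    """Fraction of posts with at least one relevant fact-check in the top k.

    ranked:   dict mapping post id -> list of fact-check ids, best first
    relevant: dict mapping post id -> set of gold fact-check ids
    """
    if not ranked:
        return 0.0
    hits = 0
    for post_id, ranking in ranked.items():
        gold = relevant.get(post_id, set())
        # One relevant fact-check within the first k results is enough.
        if any(fact_check in gold for fact_check in ranking[:k]):
            hits += 1
    return hits / len(ranked)


# Toy example: post "p1" has a relevant hit at rank 2, post "p2" has none.
ranked = {"p1": ["a", "b"], "p2": ["c", "d"]}
relevant = {"p1": {"b"}, "p2": {"x"}}
score = success_at_k(ranked, relevant, k=10)
```

Multiplying the result by 100 gives a percentage comparable in form to the scores reported above.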
