    Data Science Chair

    Natural Language Processing

    In the field of Knowledge-Enriched NLP, we work on current topics of Natural Language Processing. Specifically, we are adapting and improving large language models (LLMs) such as BERT and its derivatives. Our particular focus lies in incorporating explicit knowledge, such as knowledge graphs.

    Our application areas range from analyzing historical literature (where current language models struggle due to the length of the texts) to product reviews, and even to unconventional media forms for NLP, such as comments on http://twitch.tv. These media forms present their own challenges due to their unique language style. In addition to analyzing pure text, we also investigate the adaptability of NLP methods for processing mathematical equations.

    In projects like Kallimachos or CLiGS, we collaborate with literary scholars and work on literary and NLP research questions. In MOTIV, we work with psychologists to analyze the interaction between users and smart devices.

     

    Projects

    LitBERT

    Combining Knowledge Graphs and Large Language Models for character networks.

    KILiMod

    Machine-learning-based chat moderation and content enrichment.

    MOTIV

    Cooperation about Digital Interaction Literacy: Monitor, Training, and Visibility

    Detecting Scenes in Fiction

    Building machine-learning-based models that can segment literary texts into coherent parts.

    Machine Learning and Knowledge Graphs

    Leveraging Knowledge Graphs for NLP

    Analysing Comments on Twitch.tv

    Sentiment analysis of Twitch comment streams.

    LLäMmlein

    First native German LLM, in 1B and 120M parameter sizes.

    Concluded Projects

    • Kallimachos - Building a complete text analysis pipeline, starting with OCR from paper and going up to high-level text mining.

    • CLiGS - CLiGS combines large text collections with innovative analysis methods and hermeneutic sensibility for context. 

    Publications

    • Adapting Sequential Recommender Models to Content Recommendation in Chat Data using Non-Item Page-Models. Zehe, Albin; Fischer, Elisabeth; Kaiser, Jonas; Wagner, Toni; Hotho, Andreas. In Proceedings of the Sixth Knowledge-aware and Conversational Recommender Systems Workshop. 2024.
    • OtterlyObsessedWithSemantics at SemEval-2024 Task 4: Developing a Hierarchical Multi-Label Classification Head for Large Language Models. Wunderle, Julia; Schubert, Julian; Cacciatore, Antonella; Zehe, Albin; Pfister, Jan; Hotho, Andreas. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), A. K. Ojha, A. S. Doğruöz, H. Tayyar Madabushi, G. Da San Martino, S. Rosenthal, A. Rosá (eds.), pp. 602–612. Association for Computational Linguistics, Mexico City, Mexico, 2024.
    • BibSonomy Meets ChatLLMs for Publication Management: From Chat to Publication Management: Organizing your related work using BibSonomy & LLMs. Völker, Tom; Pfister, Jan; Koopmann, Tobias; Hotho, Andreas. 2024.
    • PreAdapter: Pre-training Language Models on Knowledge Graphs. Omeliyanenko, Janna; Hotho, Andreas; Schlör, Daniel. In International Semantic Web Conference ISWC 2024, to appear. 2024.
    • Zero-Shot Clickbait Spoiling by Rephrasing Titles as Questions. Wangsadirdja, Dirk; Pfister, Jan; Kobs, Konstantin; Hotho, Andreas. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pp. 1090–1095. Association for Computational Linguistics, Toronto, Canada, 2023.
    • CapsKG: Enabling Continual Knowledge Integration in Language Models for Automatic Knowledge Graph Completion. Omeliyanenko, Janna; Zehe, Albin; Hotho, Andreas; Schlör, Daniel. In International Semantic Web Conference ISWC 2023, to appear. 2023.
    • Large Language Models and Knowledge Graphs: Opportunities and Challenges. Pan, Jeff Z.; Razniewski, Simon; Kalo, Jan-Christoph; Singhania, Sneha; Chen, Jiaoyan; Dietze, Stefan; Jabeen, Hajira; Omeliyanenko, Janna; Zhang, Wen; Lissandrini, Matteo; Biswas, Russa; de Melo, Gerard; Bonifati, Angela; Vakaj, Edlira; Dragoni, Mauro; Graux, Damien. In Transactions on Graph Data and Knowledge, 1(1), pp. 2:1–2:38. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 2023.
    • Point me to your Opinion, SenPoi. Pfister, Jan; Wankerl, Sebastian; Hotho, Andreas. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pp. 1313–1323. Association for Computational Linguistics, Seattle, United States, 2022.
    • The FairyNet Corpus - Character Networks for German Fairy Tales. Schmidt, David; Zehe, Albin; Lorenzen, Janne; Sergel, Lisa; Düker, Sebastian; Krug, Markus; Puppe, Frank. In Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pp. 49–56. Association for Computational Linguistics, Punta Cana, Dominican Republic (online), 2021.
    • Detecting Scenes in Fiction: A new Segmentation Task. Zehe, Albin; Konle, Leonard; Dümpelmann, Lea; Gius, Evelyn; Hotho, Andreas; Jannidis, Fotis; Kaufmann, Lucas; Krug, Markus; Puppe, Frank; Reiter, Nils; Schreiber, Annekea; Wiedmer, Nathalie. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. ACL, 2021.
    • Shared Task on Scene Segmentation @ KONVENS 2021. Zehe, Albin; Konle, Leonard; Guhr, Svenja; Dümpelmann, Lea; Gius, Evelyn; Hotho, Andreas; Jannidis, Fotis; Kaufmann, Lucas; Krug, Markus; Puppe, Frank; Reiter, Nils; Schreiber, Annekea. In Shared Task on Scene Segmentation @ KONVENS 2021, pp. 1–21. 2021.
    • Improving Sentiment Analysis with Biofeedback Data. Schlör, Daniel; Zehe, Albin; Kobs, Konstantin; Veseli, Blerta; Westermeier, Franziska; Brübach, Larissa; Roth, Daniel; Latoschik, Marc Erich; Hotho, Andreas. In Proceedings of LREC2020 Workshop “People in language, vision and the mind” (ONION2020), pp. 28–33. European Language Resources Association (ELRA), Marseille, France, 2020.
    • Emote-Controlled: Obtaining Implicit Viewer Feedback through Emote based Sentiment Analysis on Comments of Popular Twitch.tv Channels. Kobs, Konstantin; Zehe, Albin; Bernstetter, Armin; Chibane, Julian; Pfister, Jan; Tritscher, Julian; Hotho, Andreas. In ACM Transactions on Social Computing. 2020.
    • HarryMotions – Classifying Relationships in Harry Potter based on Emotion Analysis. Zehe, Albin; Arns, Julia; Hettinger, Lena; Hotho, Andreas. In 5th SwissText & 16th KONVENS Joint Conference. 2020.
    • Towards Predicting the Subscription Status of Twitch.tv Users. Kobs, Konstantin; Potthast, Martin; Wiegmann, Matti; Zehe, Albin; Stein, Benno; Hotho, Andreas. In Proceedings of ECML-PKDD 2020 ChAT Discovery Challenge on Chat Analytics for Twitch. 2020.
    • LM4KG: Improving Common Sense Knowledge Graphs with Language Models. Omeliyanenko, Janna; Zehe, Albin; Hettinger, Lena; Hotho, Andreas. In International Semantic Web Conference. Springer, 2020.
    • On the Right Track! Analysing and Predicting Navigation Success in Wikipedia. Koopmann, Tobias; Dallmann, Alexander; Hettinger, Lena; Niebler, Thomas; Hotho, Andreas. In Proceedings of the 30th ACM Conference on Hypertext and Social Media (HT ’19), pp. 143–152. ACM, Hof, Germany, 2019.
    • Detection of Scenes in Fiction. Gius, Evelyn; Jannidis, Fotis; Krug, Markus; Zehe, Albin; Hotho, Andreas; Puppe, Frank; Krebs, Jonathan; Reiter, Nils; Wiedmer, Nathalie; Konle, Leonard. In Proceedings of Digital Humanities 2019. 2019.
    • Classification of text-types in german novels. Schlör, D; Schöch, C; Hotho, A. In Digital Humanities 2019: Conference Abstracts. 2019.
    • Analysing Direct Speech in German Novels. Jannidis, Fotis; Konle, Leonard; Zehe, Albin; Hotho, Andreas; Krug, Markus. In DHd 2018. 2018.
    • Burrows’ Zeta: Exploring and Evaluating Variants and Parameters. Schöch, Christof; Schlör, Daniel; Zehe, Albin; Gebhard, Henning; Becker, Martin; Hotho, Andreas. In DH, pp. 274–277. 2018.
    • A White-Box Model for Detecting Author Nationality by Linguistic Differences in Spanish Novels. Zehe, Albin; Schlör, Daniel; Henny-Krahmer, Ulrike; Becker, Martin; Hotho, Andreas. In DH. ADHO, 2018.
    • ClaiRE at SemEval-2018 Task 7 - Extended Version. Hettinger, Lena; Dallmann, Alexander; Zehe, Albin; Niebler, Thomas; Hotho, Andreas. 2018.
    • ClaiRE at SemEval-2018 Task 7: Classification of Relations using Embeddings. Hettinger, Lena; Dallmann, Alexander; Zehe, Albin; Niebler, Thomas; Hotho, Andreas. In Proceedings of International Workshop on Semantic Evaluation (SemEval-2018). New Orleans, LA, USA, 2018.
    • Burrows Zeta: Varianten und Evaluation. Schöch, Christof; Calvo, José; Zehe, Albin; Hotho, Andreas. In DHd 2018. 2018.
    • Learning Semantic Relatedness from Human Feedback Using Relative Relatedness Learning. Niebler, Thomas; Becker, Martin; Pölitz, Christian; Hotho, Andreas. In ISWC’17. 2017.
    • Learning Word Embeddings from Tagging Data: A methodological comparison. Niebler, Thomas; Hahn, Luzian; Hotho, Andreas. In Proceedings of the LWDA. 2017.
    • Towards Sentiment Analysis on German Literature. Zehe, Albin; Becker, Martin; Jannidis, Fotis; Hotho, Andreas. 2017.
    • Neutralising the Authorial Signal in Delta by Penalization: Stylometric Clustering of Genre in Spanish Novels. Tello, José Calvo; Schlör, Daniel; Henny-Krahmer, Ulrike; Schöch, Christof. In DH, R. Lewis, C. Raynor, D. Forest, M. Sinatra, S. Sinclair (eds.). Alliance of Digital Humanities Organizations (ADHO), 2017.
    • Prediction of Happy Endings in German Novels. Zehe, Albin; Becker, Martin; Hettinger, Lena; Hotho, Andreas; Reger, Isabella; Jannidis, Fotis. In Proceedings of the Workshop on Interactions between Data Mining and Natural Language Processing 2016, P. Cellier, T. Charnois, A. Hotho, S. Matwin, M.-F. Moens, Y. Toussaint (eds.), pp. 9–16. 2016.
    • Classification of Literary Subgenres. Hettinger, Lena; Jannidis, Fotis; Reger, Isabella; Hotho, Andreas. In DHd 2016. 2016.
    • Extracting Semantics from Unconstrained Navigation on Wikipedia. Niebler, Thomas; Schlör, Daniel; Becker, Martin; Hotho, Andreas. In KI, 30(2), pp. 163–168. 2016.
    • Analyzing Features for the Detection of Happy Endings in German Novels. Jannidis, Fotis; Reger, Isabella; Zehe, Albin; Becker, Martin; Hettinger, Lena; Hotho, Andreas. 2016.
    • Straight Talk! Automatic Recognition of Direct Speech in Nineteenth-Century French Novels. Schöch, Christof; Schlör, Daniel; Popp, Stefanie; Brunner, Annelen; Henny, Ulrike; Tello, José Calvo. In DH, pp. 346–353. 2016.
    • Significance Testing for the Classification of Literary Subgenres. Hettinger, Lena; Jannidis, Fotis; Reger, Isabella; Hotho, Andreas. In DH 2016. 2016.
    • Evaluating Emergent Semantics in Folksonomies on Human Intuition. Niebler, Thomas; Becker, Martin; Zoller, Daniel; Doerfel, Stephan; Hotho, Andreas. 2015.
    • Genre classification on German novels. Hettinger, Lena; Becker, Martin; Reger, Isabella; Jannidis, Fotis; Hotho, Andreas. In Proceedings of the 12th International Workshop on Text-based Information Retrieval. 2015.
    • Proceedings of the 1st International Workshop on Interactions between Data Mining and Natural Language Processing co-located with The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, DMNLP@PKDD/ECML 2014, Nancy, France, September 15, 2014. Cellier, Peggy; Charnois, Thierry; Hotho, Andreas; Matwin, Stan; Moens, Marie-Francine; Toussaint, Yannick. In Vol. 1202 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
    • Computing semantic relatedness from human navigational paths on Wikipedia. Singer, Philipp; Niebler, Thomas; Strohmaier, Markus; Hotho, Andreas. In Proceedings of the 22nd international conference on World Wide Web companion (WWW ’13 Companion), ACM (ed.), pp. 171–172. International World Wide Web Conferences Steering Committee, Rio de Janeiro, Brazil, 2013.
    • Computing Semantic Relatedness from Human Navigational Paths: A Case Study on Wikipedia. Singer, Philipp; Niebler, Thomas; Strohmaier, Markus; Hotho, Andreas. In International Journal on Semantic Web and Information Systems (IJSWIS), 9(4), pp. 41–70. IGI Global, 2013.