KI 2024

Abstracts of Accepted Papers

Session 1: Probabilistic and Predictive Models

Towards Privacy-Preserving Relational Data Synthesis via Probabilistic Relational Models

Malte Luttermann, Ralf Möller and Mattis Hartwig

Probabilistic relational models provide a well-established formalism to combine first-order logic and probabilistic models, thereby allowing the representation of relationships between objects in a relational domain. At the same time, the field of artificial intelligence requires increasingly large amounts of relational training data for various machine learning tasks. Collecting real-world data, however, is often challenging due to privacy concerns, data protection regulations, high costs, and so on. To mitigate these challenges, the generation of synthetic data is a promising approach. In this paper, we solve the problem of generating synthetic relational data via probabilistic relational models. In particular, we propose a fully-fledged pipeline to go from a relational database to a probabilistic relational model, which can then be used to sample new synthetic relational data points from its underlying probability distribution. As part of our proposed pipeline, we introduce a learning algorithm to construct a probabilistic relational model from a given relational database.
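
The abstract describes a pipeline from a relational database to a generative probabilistic model and back to sampled synthetic rows. The following minimal sketch illustrates only the propositional core of that idea (fit conditional probability tables by counting, then forward-sample); the table, attributes, and hand-fixed structure are made up for illustration and do not reproduce the lifted probabilistic relational model or the learning algorithm the paper proposes.

    import numpy as np
    import pandas as pd

    # Toy "relational" table with made-up attributes (hypothetical data).
    employees = pd.DataFrame({
        "dept":        ["ai", "ai", "db", "db", "db", "ai"],
        "salary_band": ["high", "high", "mid", "mid", "high", "mid"],
    })

    # 1) "Learn" a model with a hand-fixed structure dept -> salary_band:
    #    estimate P(dept) and P(salary_band | dept) by counting.
    p_dept = employees["dept"].value_counts(normalize=True)
    p_band_given_dept = employees.groupby("dept")["salary_band"].value_counts(normalize=True)

    # 2) Forward-sample synthetic rows from the fitted distribution.
    def sample_rows(n, seed=0):
        rng = np.random.default_rng(seed)
        rows = []
        for _ in range(n):
            dept = rng.choice(p_dept.index, p=p_dept.values)
            cond = p_band_given_dept[dept]
            band = rng.choice(cond.index, p=cond.values)
            rows.append({"dept": dept, "salary_band": band})
        return pd.DataFrame(rows)

    print(sample_rows(5))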

Star-Shaped Denoising Diffusion Probabilistic Models (Extended Abstract)

Andrey Okhotin, Dmitry Molchanov, Vladimir Arkhipkin, Grigory Bartosh, Viktor Ohanesian, Aibek Alanov and Dmitry Vetrov

A Note on Linear Time Series Prediction

Christopher Bonenberger, Markus Schneider, Wolfgang Ertel and Friedhelm Schwenker

We consider the problem of univariate time series prediction from an elementary machine learning point of view. Beginning with the question of whether and how Principal Component Analysis (PCA) can be used for time series prediction, we describe a simple methodology and attempt to classify PCA-based prediction in terms of statistics, signal processing and dynamical systems theory. Moreover, we extend the unsupervised scenario to a self-supervised linear regression scenario and develop a unifying perspective. In this regard, we review several related techniques, namely autoregressive (AR) and moving-average (MA) models, Singular Spectrum Analysis (SSA), Wiener filtering, and the discrete Fourier transform (DFT). By presenting these methods in a unified way, we can show how PCA-based time series prediction can be categorized in different settings of stochastic and deterministic models. Finally, we show the distinct relation between PCA-based prediction and (finite-order) MA processes and propose a refined methodology.
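
As a rough illustration of the linear setting discussed above, the sketch below builds a lag matrix from a univariate series, fits least-squares AR coefficients, and alternatively projects the lag vectors onto their leading principal components before the regression. The toy series, lag order, and number of components are arbitrary choices for illustration, not the paper's refined methodology.

    import numpy as np

    rng = np.random.default_rng(0)
    t = np.arange(300)
    x = np.sin(0.2 * t) + 0.1 * rng.standard_normal(t.size)   # toy series

    p, k = 10, 3   # lag order and number of principal components (arbitrary)

    # Lag matrix: each row holds p past values, the target is the next value.
    X = np.stack([x[i:i + p] for i in range(len(x) - p)])
    y = x[p:]

    # PCA on the lag vectors (center, then take leading right singular vectors).
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                                # projected lag vectors

    # Linear (AR-style) prediction, once in raw lag space, once in PCA space.
    w_ar, *_ = np.linalg.lstsq(X, y, rcond=None)
    w_pca, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)

    next_ar = x[-p:] @ w_ar
    next_pca = (x[-p:] - X.mean(axis=0)) @ Vt[:k].T @ w_pca + y.mean()
    print(next_ar, next_pca)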

Graph2RETA: Graph Neural Networks for Pick-up and Delivery Route Prediction and Arrival Time Estimation

Wilson Sentanoe, Sreyashi Saha, Yasas Diyanananda, Aqsa Manzoor, Buddhika Dasanayake and Daniela Thyssens

This research proposes an effective way to address the issues faced by pick-up and delivery services. The real-world variables that affect delivery routes are frequently overlooked by traditional routing technologies, resulting in differences between intended and actual trajectories. Similarly, forecasting the Estimated Time of Arrival (ETA) involves unique challenges due to its high dimensionality. We suggest an integrated predictive modeling methodology that tackles route prediction in a dynamic environment and ETA prediction at the same time to overcome these difficulties. Our method, Graph2RETA, uses a dynamic spatial-temporal graph-based model to forecast delivery workers’ future routing behaviors while integrating route inference into ETA prediction. Graph2RETA leverages rich decision context and spatial-temporal information to improve prediction accuracy over the current state of the art while capturing dynamic interactions between workers and timesteps by incorporating the underlying graph structure and features.

Session 2: Visual and Acoustic Approaches

Leveraging Weakly Supervised and Multiple Instance Learning for Multi-label Classification of Passive Acoustic Monitoring Data

Ilira Troshani, Thiago Gouvêa and Daniel Sonntag

Data collection and annotation are time-consuming, resource-intensive processes that often require domain expertise. Existing data collections such as animal sound collections provide valuable data sources, but their utilization is often hindered by the lack of fine-grained labels. In this study, we examine the use of existing weakly supervised methods to extract fine-grained information from weakly annotated data accumulated over time and alleviate the need for the collection and annotation of fresh data. We employ TALNet, a Convolutional Recurrent Neural Network (CRNN) model, train it on 60-second sound recordings labeled for the presence of 42 different anuran species, and compare it to other models such as BirdNet, a model for the detection of bird vocalisations. We conduct the evaluation on 1-second segments, enabling precise sound event localization. Furthermore, we investigate the impact of varying the length of the training input and explore the effects of different pooling functions on the model’s performance on AnuraSet. Finally, we integrate the model into an interactive user interface that facilitates training and annotation. Our findings demonstrate the effectiveness of TALNet and BirdNet in harnessing weakly annotated sound collections for wildlife monitoring. Our method not only improves the extraction of information from coarse labels but also simplifies the process of annotating new data for experts.
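
The pooling step mentioned above aggregates per-segment (e.g., 1-second) predictions into a recording-level label. The snippet below shows three common multiple-instance pooling functions on toy per-segment probabilities; it is a generic illustration of the concept, not TALNet's implementation, and the numbers are made up.

    import numpy as np

    # Toy per-segment probabilities for one recording and one species:
    # 60 one-second segments, the call is audible in only a few of them.
    seg_probs = np.full(60, 0.05)
    seg_probs[12:15] = 0.9

    max_pool    = seg_probs.max()                        # fires if any segment fires
    mean_pool   = seg_probs.mean()                       # diluted by silent segments
    lin_softmax = (seg_probs ** 2).sum() / seg_probs.sum()   # linear-softmax pooling

    print(max_pool, mean_pool, lin_softmax)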

Active Learning in Multi-label Classification of Bioacoustic Data

Hannes Kath, Thiago S. Gouvêa and Daniel Sonntag

Passive Acoustic Monitoring (PAM) has become a key technology in wildlife monitoring, providing vast amounts of acoustic data. The recording process naturally generates multi-label datasets; however, due to the significant annotation time required, most available datasets use exclusive labels. While active learning (AL) has shown the potential to speed up the annotation process of multi-label PAM data, it lacks standardized performance metrics across experimental setups. We present a novel performance metric for AL, the ‘speedup factor’, which remains constant across experimental setups. It quantifies the fraction of samples required by an AL strategy compared to random sampling to achieve equivalent model performance. Using two multi-label PAM datasets, we investigate the effects of class sparsity, ceiling performance, number of classes, and different AL strategies on AL performance. Our results show that AL performance is superior on datasets with sparser classes, lower ceiling performance, fewer classes, and when using uncertainty sampling strategies.
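
A minimal reading of the 'speedup factor' described above: for a given target performance, compare how many labelled samples the AL strategy and random sampling each need, and take the ratio. The helper below interpolates two hypothetical learning curves to compute this; the exact definition in the paper may differ in its details.

    import numpy as np

    def speedup_factor(n_labels, perf_al, perf_rand, target):
        """Fraction of samples AL needs relative to random sampling
        to reach the same target performance (illustrative definition)."""
        n_al = np.interp(target, perf_al, n_labels)     # assumes monotone curves
        n_rand = np.interp(target, perf_rand, n_labels)
        return n_al / n_rand

    n_labels  = np.array([100, 200, 400, 800, 1600])
    perf_al   = np.array([0.42, 0.55, 0.66, 0.72, 0.75])   # hypothetical AL curve
    perf_rand = np.array([0.35, 0.45, 0.58, 0.68, 0.74])   # hypothetical random curve

    print(speedup_factor(n_labels, perf_al, perf_rand, target=0.70))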

Early Explorations of Lightweight Models for Wound Segmentation on Mobile Devices

Vanessa Borst, Timo Dittus, Konstantin Müller and Samuel Kounev

The aging population poses numerous challenges to healthcare, including the increase in chronic wounds among the elderly. The current approach to wound assessment by therapists, based on photographic documentation, is subjective, highlighting the need for computer-aided wound recognition from smartphone photos. This offers objective and convenient therapy monitoring while being accessible to patients from home at any time. However, despite research in mobile image segmentation, there is a lack of focus on mobile wound segmentation. To address this gap, we conduct initial research on three lightweight architectures to investigate their suitability for smartphone-based wound segmentation. Using public datasets and UNet as a baseline, our results are promising, with both ENet and TopFormer, as well as the larger UNeXt variant, showing comparable performance to UNet. Furthermore, we deploy the models into a smartphone app for visual assessment of live segmentation, where the results demonstrate the effectiveness of TopFormer in distinguishing wounds from wound-coloured objects. While our study highlights the potential of transformer models for mobile wound segmentation, future work should aim to further improve the mask contours.

Leveraging YOLO for Real-Time Video Analysis of Animal Welfare in Pig Slaughtering Processes

Christian Beecks, Anandraj Amalraj, Alexander Graß, Marc Jentsch, Felix Kitschke, Maximilian Norz and Patric Schäffer

Artificial intelligence has propelled digitalization into a new era of intelligent systems. Machine learning solutions are being tailored to various application scenarios, leading to automated functionalities across complex real-world processes. In this paper, we investigate the domain of animal welfare and present our latest findings on the automated detection of animal welfare violations. To this end, we introduce three different situations of increased animal welfare risk occurring in a pig slaughtering process and elucidate YOLO-based approaches to detect these situations from video data. Though the reported results are preliminary, our solution already detects most of the situations of increased animal welfare risk with high accuracy.

Session 3: Data Sets and Data Handling

Mechanisms for Data Sharing in Collaborative Causal Inference

Björn Filter, Ralf Möller and Özgür Özçep

Collaborative causal inference (CCI) is a federated learning method for pooling data from multiple, often self-interested, parties to achieve a common learning goal over causal structures, e.g., the estimation and optimization of treatment variables in a medical setting. Since obtaining data can be costly for the participants and sharing unique data poses the risk of losing competitive advantages, motivating the participation of all parties through equitable rewards and incentives is necessary. This paper devises an evaluation scheme to measure the value of each party’s data contribution to the common learning task, tailored to the statistical demands of causal inference, by comparing completed partially directed acyclic graphs (CPDAGs) inferred from observational data contributed by the participants. The Data Valuation Scheme thus obtained can then be used to introduce mechanisms that incentivize the agents to contribute data. It can be leveraged to reward agents fairly, according to the quality of their data, or to maximize all agents’ data contributions.
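
As a purely hypothetical illustration of comparing CPDAGs (not the paper's valuation scheme), one could measure how much closer the graph learned with a party's data is to a reference CPDAG than the graph learned without it, e.g., via a structural-Hamming-distance-style count over adjacency matrices. All graphs below are made up.

    import numpy as np

    def shd(a, b):
        """Structural-Hamming-style distance between two CPDAG adjacency
        matrices (1 = edge mark present); counts differing entries."""
        return int(np.sum(a != b))

    # Hypothetical 3-variable CPDAGs as adjacency matrices.
    reference     = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
    without_party = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 0]])
    with_party    = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])

    # Positive value: the party's data moved the learned graph closer to the reference.
    contribution = shd(reference, without_party) - shd(reference, with_party)
    print(contribution)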

Image Dataset Quality Assessment Through Descriptive Out-of-Distribution Detection

Sami Kharma and Jürgen Großmann

Out-of-distribution detection ensures trustworthiness in machine learning systems by detecting anomalous data points and adjusting confidence in predictions accordingly. However, another key use-case of out-of-distribution detection is the assessment of data quality with respect to a desired distribution or semantic range of data. This work proposes a simple but powerful approach that allows image data to be cleaned based on descriptive definitions of both desired and undesired data. Notably, this method does not require the training of a machine learning model. In addition, this work presents a new image dataset suited for evaluating data cleaning tasks in a way that has practical relevance, and demonstrates satisfactory experimental results.

Instance Segmentation with a Novel Tree Log Detection Dataset

Julian Haasis, Christopher Bonenberger and Markus Schneider

Reliable tree log detection is a key requirement for the automation of forestry operations. Despite substantial progress on object detection in general, tree log detection lags behind due to the lack of well-annotated datasets. To address this gap, we introduce the Tree Log Detection Dataset (TLDD). This real-world dataset is collected using a combination of a 360-degree multiline LiDAR and stereo cameras. It offers a wide range of annotated segmentation masks for over 1000 images of about 22000 tree logs. We assess the quality of the presented dataset by comparing it to existing datasets using state-of-the-art architectures such as MaskDINO and Mask2Former. Our experiments demonstrate the quality of TLDD and confirm the efficiency of attention-based, transformer-like networks.

A Human-in-the-Loop Tool for Annotating Passive Acoustic Monitoring Datasets (Extended Abstract)

Hannes Kath, Thiago S. Gouvêa and Daniel Sonntag

Passive Acoustic Monitoring (PAM) has become a key technology in wildlife monitoring, generating large amounts of acoustic data. However, the effective application of machine learning methods for sound event detection in PAM datasets is highly dependent on the availability of annotated data, which requires a labour-intensive effort to generate. This paper summarises two iterative, human-centred approaches that make efficient use of expert annotation time to accelerate understanding of the data: Combining transfer learning and active learning, we present an annotation tool that selects and annotates the most informative samples one at a time [11]. To annotate multiple samples simultaneously, we present a tool that allows annotation in the embedding space of a variational autoencoder manipulated by a classification head [10]. For both approaches, we provide no-code web applications for intuitive use by domain experts.

MIND Your Language: A Multilingual Dataset for Cross-lingual News Recommendation (Extended Abstract)

Andreea Iana, Goran Glavaš and Heiko Paulheim

We present xMIND, an open, multilingual news recommendation dataset derived from the English MIND dataset using machine translation, covering 14 linguistically and geographically diverse languages, with digital footprints of varying sizes. Using xMIND, we benchmark several content-based neural news recommenders (NNRs) in zero-shot (ZS-XLT) and few-shot (FS-XLT) cross-lingual transfer scenarios, considering both monolingual and bilingual news consumption patterns. Our findings reveal that (i) current NNRs, even when based on a multilingual language model, suffer from substantial performance losses under ZS-XLT and that (ii) inclusion of target-language data in FS-XLT training has limited benefits, particularly when combined with bilingual news consumption. We release xMIND at https://github.com/andreeaiana/xMIND.

Session 4: Explainability

Quantifying the Trade-Offs between Dimensions of Trustworthy AI - An Empirical Study on Fairness, Explainability, Privacy, and Robustness

Nils Kemmerzell and Annika Schreiner

Trustworthy AI encompasses various requirements for AI systems, including explainability, fairness, privacy, and robustness. Addressing these dimensions concurrently is challenging due to inherent tensions and trade-offs between them. Current research highlights these trade-offs, focusing on specific interactions, but comprehensive and systematic evaluations remain insufficient. This study aims to enhance the understanding of trade-offs among explainability, fairness, privacy, and robustness in AI. By conducting extensive experiments in the domain of image classification, it quantitatively assesses how methods to improve one requirement impact others. More specifically, it explores different training adaptations to enhance each requirement and measures their effects on the others on four datasets for gender classification. The experiments revealed that the Local Gradient Alignment method improved explainability and robustness but introduced trade-offs in fairness, privacy, and accuracy. Fairness-focused training adaptations enhanced fairness only for the most biased models; in all other cases, fairness, explainability, and robustness were reduced. Differential Privacy improved privacy but compromised explainability, fairness, and accuracy, with varied impacts on robustness. Data augmentation techniques enhanced robustness, explainability, and accuracy with minor trade-offs in privacy and fairness.

Data Generation for Explainable Occupational Fraud Detection

Julian Tritscher, Maximilian Wolf, Anna Krause, Andreas Hotho and Daniel Schlör

Occupational fraud, the deliberate misuse of company assets by employees, causes damages of around 5% of yearly company revenue. Recent work therefore focuses on automatically detecting occupational fraud through machine learning on the company data contained within enterprise resource planning systems. Since interpretability of these machine learning approaches is considered a relevant aspect of occupational fraud detection, first works have already integrated post-hoc explainable artificial intelligence approaches into their fraud detectors. While these explainers show promising first results, systematic advancement of explainable fraud detection methods is currently hindered by the general lack of ground truth explanations to evaluate explanation quality and choose suitable explainers. To avoid expensive expert annotations, we propose a data generation scheme based on multi-agent systems to obtain company data with labeled occupational fraud cases and ground truth explanations. Using this data generator, we design a framework that enables the optimization of post-hoc explainers for unlabeled company data. On two datasets, we experimentally show that our framework is able to successfully differentiate between explainers of high and low explanation quality, showcasing the potential of multi-agent simulations to ensure proper performance of post-hoc explainers.

A Brief Systematization of Explanation-Aware Attacks

Maximilian Noppel and Christian Wressnegger

Due to the overabundance of trained parameters, modern machine learning models are largely considered black boxes. Explanation methods aim to shed light on the inner workings of such models and can thus serve as debugging tools. However, recent research has demonstrated that carefully crafted manipulations of the input or the model can successfully fool both the model and the explanation method. In this work, we briefly present our systematization of such explanation-aware attacks. We categorize them according to three distinct attack types, three types of scopes, and three different capabilities an adversary can have. In our full paper [12], we further present a hierarchy of robustness notions and various defensive techniques tailored toward explanation-aware attacks.

LaFAM: Unsupervised Feature Attribution with Label-free Activation Maps

Aray Karjauv and Sahin Albayrak

Convolutional Neural Networks (CNNs) are known for their ability to learn hierarchical structures, naturally developing detectors for objects and semantic concepts within their deeper layers. Activation maps (AMs) reveal these saliency regions, which are crucial for many Explainable AI (XAI) methods. However, the direct exploitation of raw AMs in CNNs for feature attribution remains underexplored in the literature. This work revises Class Activation Map (CAM) methods by introducing the Label-free Activation Map (LaFAM), a streamlined approach utilizing raw AMs for feature attribution without reliance on labels. LaFAM presents an efficient alternative to conventional CAM methods, demonstrating particular effectiveness in saliency map generation for self-supervised learning while maintaining applicability in supervised learning scenarios.
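
One plausible reading of a label-free activation map, sketched below: hook the last convolutional stage of a CNN, average its raw activation maps across channels, and upsample the result to the input size. The backbone, input, and normalisation are arbitrary choices; this is an illustrative guess at the general recipe, not the authors' exact method.

    import torch
    import torch.nn.functional as F
    from torchvision.models import resnet18

    model = resnet18(weights=None).eval()     # any CNN backbone would do
    feats = {}

    def hook(_module, _inp, out):
        feats["maps"] = out                   # (B, C, h, w) raw activation maps

    model.layer4.register_forward_hook(hook)

    x = torch.randn(1, 3, 224, 224)           # dummy input image
    with torch.no_grad():
        model(x)

    am = feats["maps"].mean(dim=1, keepdim=True)     # average over channels, no labels used
    am = F.interpolate(am, size=x.shape[-2:], mode="bilinear", align_corners=False)
    saliency = (am - am.min()) / (am.max() - am.min() + 1e-8)   # normalise to [0, 1]
    print(saliency.shape)                     # torch.Size([1, 1, 224, 224])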

Explanatory Interactive Machine Learning with Counterexamples from Constrained Large Language Models

Emanuel Slany, Stephan Scheele and Ute Schmid

In Explanatory Interactive Machine Learning (XIML), counterexamples refine machine learning models by augmenting human feedback. Traditionally created through random sampling or data augmentation, counterexamples can now be generated in virtually unlimited quantity by querying Large Language Models (LLMs) with simple natural language prompts. However, validation of LLM outputs becomes crucial as they may produce inaccurate or “hallucinated” content, which has led to an increased incorporation of logical reasoning with LLMs in recent literature. We present LlmXiml, a framework that integrates logically constrained LLMs into XIML. Our results indicate that LLM-generated counterexamples improve model performance and that logical reasoning increases the counterexamples’ correctness.

Session 5: AI in Practice

SaVeWoT: Scripting and Verifying Web of Things Systems and Their Effects on the Physical World

Justus Fries, Michael Freund and Andreas Harth

We introduce SaVeWoT (Scripting and Verifying Web of Things Systems), an approach for designing, formally verifying, and deploying decentralized control systems based on the W3C WoT. SaVeWoT consists of two main parts: the SaVeWoT language and the SaVeWoT compiler. The SaVeWoT language models devices (i.e., Things), controllers that orchestrate Things, virtual composite Things (i.e., subsystems consisting of multiple Things), interactions between these components, and their effects on the physical world. The SaVeWoT compiler uses Thing Descriptions (TDs) and SaVeWoT behavior descriptions along with correctness specifications in Linear-time Temporal Logic (LTL) to automatically generate a Promela model, which is validated using the SPIN model checker. We demonstrate the feasibility of the SaVeWoT approach by verifying a conveyor belt system as an example and conducting an empirical evaluation.

Evaluating AI-based Components in Autonomous Railway Systems

Jan Roßbach, Oliver De Candido, Ahmed Hammam and Michael Leuschel

Recent breakthroughs in Artificial Intelligence (AI) are poised to transform many domains, including autonomous railway transportation systems. However, safety is essential in this high-stakes, safety-critical domain. To ensure compliance with current safety certification standards, we propose a comprehensive methodology for evaluating AI-based components in railway applications. It combines ontology-driven systematic testing and data generation, formal verification techniques, and real-time monitoring. By leveraging these methods, we provide comprehensive safety assurance and aim to pave the way for the widespread adoption of AI in railway transportation systems.

Saxony-Anhalt is the Worst: Bias towards German Federal States in Large Language Models

Anna Kruspe and Mila Stillman

Recent research demonstrates geographic biases in various Large Language Models that reflect common human biases, which are presumably present in the training data. We hypothesize that these biases also exist on smaller scales. Within Germany, there is still a strong divide between the former states of the German Democratic Republic (the “East”) and those of the Federal Republic of Germany (the “West”) in many respects, as well as other perceived geographic disparities. We evaluate the responses of ChatGPT-3.5, ChatGPT-4, and LeoLM for various ratings and estimations by state. These include objectively measurable values as well as subjective assessments of residents’ characteristics. Experiments are conducted in English and German. We show that there is a very visible bias in both the subjective and the objective ratings, and we analyze various effects. In particular, we demonstrate that Eastern states are consistently rated lower (or worse, depending on the task), whereas Southern states frequently rate higher. We also discuss the models’ behaviors when prompted with these tasks.

Context-Specific Selection of Commonsense Knowledge Using Large Language Models

Claudia Schon and Oliver Jakobs

In the field of automated reasoning, practical applications often face a significant challenge: knowledge bases are typically too large to be fully processed by theorem provers. To still be able to prove that a given goal follows from a large knowledge base, selection techniques are used to determine the parts of the knowledge base that are relevant to the goal. Traditional selection techniques used for this task are usually syntax-based and often overlook a crucial aspect: the meaning of symbol names and axioms. Especially in commonsense reasoning scenarios, the meaning embedded in the symbol names provides invaluable insights. For example, in a proof task using the symbol name cow, it intuitively makes more sense to select formulae using the symbol name calf than formulae using the symbol name weapon. To address this gap, our paper introduces a selection technique that exploits the capabilities of large language models. This technique focuses on contextually related formulae, closely aligning the selected part of the knowledge base with the context of the goal. The approach is implemented, and we present a series of experiments that show promising results.
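
To make the selection idea concrete, the toy snippet below ranks formulae by how related their symbol names are to the goal's symbols. A crude string-similarity measure stands in for the large-language-model-based relatedness the paper actually uses, and all formula identifiers and symbol names are made up.

    from difflib import SequenceMatcher

    # Crude stand-in for LLM-based relatedness between two symbol names.
    def related(a, b):
        return SequenceMatcher(None, a, b).ratio()

    goal_symbols = {"cow"}
    formulae = {                         # hypothetical knowledge-base fragments
        "uses_calf":   {"calf", "mammal"},
        "uses_weapon": {"weapon", "dangerous"},
        "uses_cattle": {"cattle", "cows", "grazes"},
    }

    def score(symbols):
        return max(related(g, s) for g in goal_symbols for s in symbols)

    ranked = sorted(formulae, key=lambda f: score(formulae[f]), reverse=True)
    print(ranked)    # the formula mentioning 'weapon' ends up ranked last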

Session 6: AI and Games

Could the Declarer have Discarded It? Refined Anticipation of Cards in Skat

Stefan Edelkamp

In this paper we refine the concept of anticipation within a card game, taking the Nullspiel in Skat as a running example. We generate the belief space of all distributions of cards according to assumptions about plausible play by the declarer. Using a selection of open-card searches in a two-stage knowledge filtering process, we improve the play of the two opponents to find forced wins. In the voting scheme that combines the open-card analyses of the possible worlds, we additionally use the search tree size and depth to prefer short proofs. In hundreds of thousands of human ouvert games replayed by our AIs, over 99% matched the predictions of the open-card solver, and only 0.21% of the games known to be lost for the declarer were not won by the AIs; both results outperform human play.

Efficiently Training Neural Networks for Imperfect Information Games by Sampling Information Sets

Timo Bertram, Johannes Fürnkranz and Martin Müller

In imperfect information games, the evaluation of a game state not only depends on the observable world but also relies on hidden parts of the environment. As accessing the obstructed information trivialises state evaluations, one approach to tackle such problems is to estimate the value of the imperfect state as a combination of all states in the information set, i.e., all possible states that are consistent with the current imperfect information. In this work, the goal is to learn a function that maps from the imperfect game information state to its expected value. However, constructing a perfect training set, i.e., an enumeration of the whole information set for numerous imperfect states, is often infeasible. To compute the expected values for an imperfect information game like Reconnaissance Blind Chess, one would need to evaluate thousands of chess positions just to obtain the training target for a single state. Still, the expected value of a state can already be approximated with appropriate accuracy from a much smaller set of evaluations. Thus, in this paper, we empirically investigate how a budget of perfect information game evaluations should be distributed among training samples to maximise the return. Our results show that sampling a small number of states (roughly 3 in our experiments) for a larger number of separate positions is preferable to sampling many states for fewer positions. Thus, we find that, in our case, the quantity of different samples seems to be more important than higher target quality.
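
The core estimation step described above can be pictured as follows: instead of enumerating the full information set, average the perfect-information evaluations of a few sampled consistent states to obtain a training target. The snippet is a generic Monte Carlo illustration with a made-up evaluator and a made-up information set, not the paper's training setup.

    import numpy as np

    rng = np.random.default_rng(0)

    def evaluate_perfect(states):
        """Stand-in for an expensive perfect-information evaluation."""
        return np.tanh(states)            # made-up value function

    def sampled_target(information_set, k):
        """Approximate expected value from k sampled consistent states."""
        sample = rng.choice(information_set, size=k, replace=False)
        return evaluate_perfect(sample).mean()

    # A toy information set: many hidden states consistent with one observation.
    info_set = rng.normal(size=1000)

    budget = 300                          # total perfect-information evaluations
    for k in (1, 3, 10, 100):             # evaluations spent per training position
        n_positions = budget // k         # positions we can afford to label
        print(k, n_positions, sampled_target(info_set, k))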

A Framework for General Trick-Taking Card Games

Stefan Edelkamp

Inspired by recent advances in Computer Skat and Bridge, this paper investigates automated play for several other trick-taking card games such as Belote, Tarot, Doppelkopf, Spades, Hearts, Euchre, and Schafkopf. We present a general framework that instantiates all of these card games with deals from their respective decks and provides a programming interface to play them interactively at a human-adequate speed and level. We have included bidding, team building, and game selection, as well as general and specialized card recommenders applicable to the different stages of trick-taking. We study the impact of expert rules for enhanced play. The AIs are evaluated in different variants and against a general card player that lacks expert rules.

Session 7: Symbolic Approaches

SocialCOP: Reusable Building Blocks for Collective Constraint Optimization

Julia Ruttmann and Alexander Schiendorfer

Distributing limited resources among a group of agents is a fundamental challenge in both algorithmic decision support systems and everyday life. The goal of achieving a socially desirable allocation of these resources, instead of mere economic efficiency, is relevant to many types of allocation problems under hard constraints. At the same time, modeling languages and high-level libraries for combinatorial optimization problems are becoming more widespread. Although fairness is a key factor in optimization processes, there is currently no way to use fairness constraints and objectives unless they are written from scratch. Thus, combining and experimenting with different fairness criteria is tedious, as no predefined set of constraints and objectives is available in modeling languages. We propose SocialCOP, a toolbox of reusable constraint modeling building blocks for concepts derived from social choice theory, fair division, and algorithmic fairness (namely envy-freeness, leximin, Rawlsianism, utilitarianism, and Pareto optimality). Our toolbox provides a convenient and reusable solution for adding fairness constraints to existing collective constraint optimization problems formulated in MiniZinc. The building blocks can be combined or added individually to an existing satisfaction problem. Experimental results show that a much richer combination of fairness objectives can be modeled, leading to the discovery of solutions that are optimal in more than one way.
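
To fix intuitions for two of the criteria named above, the snippet below checks envy-freeness and compares allocations leximin-style on a toy utility matrix. It is a plain-Python illustration of the textbook definitions, not the MiniZinc building blocks provided by SocialCOP, and all utilities are hypothetical.

    import numpy as np

    # utilities[i][j]: value agent i assigns to item j (hypothetical numbers)
    utilities = np.array([[6, 1, 3],
                          [4, 5, 2],
                          [1, 2, 7]])

    def bundle_value(agent, bundle):
        return utilities[agent, list(bundle)].sum()

    def is_envy_free(allocation):
        """allocation[i] is the set of items given to agent i."""
        return all(
            bundle_value(i, allocation[i]) >= bundle_value(i, allocation[j])
            for i in range(len(allocation))
            for j in range(len(allocation)) if i != j
        )

    def leximin_key(allocation):
        """Sort agents' own bundle values ascending; compare lexicographically."""
        return sorted(bundle_value(i, allocation[i]) for i in range(len(allocation)))

    a1 = [{0}, {1}, {2}]     # each agent receives their favourite item
    a2 = [{1}, {0}, {2}]
    print(is_envy_free(a1), is_envy_free(a2))        # True False
    print(max([a1, a2], key=leximin_key) is a1)      # a1 is leximin-preferred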

From Resolving Inconsistencies in Qualitative Constraints Networks to Identifying Robust Solutions: A Universal Encoding in ASP

Moritz Bayerkuhnlein, Tobias Schwartz and Diedrich Wolter

Qualitative Constraint Networks (QCNs) are foundational to Qualitative Spatial and Temporal Reasoning (QSTR) for modelling real-world entity relations, facilitating decision-making and planning. However, perturbations or even inconsistencies in the inputs may arise in various application contexts, for instance from unforeseen circumstances or from merging data from different sources. It is therefore crucial for qualitative reasoning systems to address these challenges, e.g., not just to identify any solution but to optimize solutions for resilience against perturbations. Likewise, the ability to resolve inconsistencies with minimal repairs is needed. Both tasks are challenging optimization problems in their own right, as determining network consistency is already NP-hard for most qualitative constraint languages. In this paper, we present a universal encoding in Answer-Set Programming (ASP) to address these challenges. Our encoding allows for efficient resolution of both the robustness and the repair problem by exploiting ASP optimization techniques. We demonstrate the effectiveness of our encoding in an experimental evaluation. Our results show that our encoding can match the state of the art on some qualitative calculi in terms of computational efficiency. Moreover, our encoding offers a flexible and powerful framework for tackling optimization problems in qualitative reasoning systems.

Out-of-Distribution Detection with Logical Reasoning (Extended Abstract)

Konstantin Kirchheim, Tim Gonschorek and Frank Ortmeier

Machine learning models often only generalize reliably to samples from their training distribution, which motivates out-of-distribution (OOD) detection in safety-critical applications. Current OOD detection methods, however, tend to be domain-agnostic and are incapable of incorporating prior knowledge about the structure of the training distribution. To address this limitation, we introduce a novel neuro-symbolic OOD detection algorithm that combines a deep learning-based perception system with a first-order logic-based knowledge representation. A reasoning system uses this knowledge base at run-time to infer whether inputs are consistent with prior knowledge about the training distribution. This not only enhances performance but also fosters a level of explainability that is particularly beneficial in safety-critical contexts.

Session 8: Reinforcement Learning

Data Augmentation in Latent Space with Variational Autoencoder and Pretrained Image Model for Visual Reinforcement Learning

Xuzhe Dang and Stefan Edelkamp

In this paper, we investigate alternative data augmentation strategies for Visual Reinforcement Learning and explore the potential benefits of fine-tuning a pretrained image encoder to enhance the learning process. We propose an innovative approach that applies data augmentation in the latent space, rather than directly manipulating pixel values. This method utilizes a Variational Autoencoder, integrated with a pretrained image model, to facilitate the data augmentation process in a more abstract and feature-rich latent space. We use the DeepMind Control Suite as a benchmark to evaluate the impact of our approach.
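
The latent-space augmentation idea reads roughly as follows in code: encode an observation with a (frozen) pretrained encoder, perturb or mix the latent vectors, and hand the augmented latents to the RL learner. The sketch below uses a randomly initialised toy encoder, Gaussian noise, and mixup-style interpolation purely for illustration; the architecture, noise scale, and mixing rule are assumptions, not the paper's method.

    import torch
    import torch.nn as nn

    class ToyEncoder(nn.Module):
        """Stand-in for the (pretrained) VAE/image encoder."""
        def __init__(self, latent_dim=32):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.mu = nn.Linear(32, latent_dim)

        def forward(self, x):
            return self.mu(self.conv(x))

    def augment_latent(z, noise_scale=0.1, mix_prob=0.5):
        """Augment in latent space: add Gaussian noise and occasionally
        interpolate with a shuffled batch (mixup-style)."""
        z_aug = z + noise_scale * torch.randn_like(z)
        if torch.rand(()) < mix_prob:
            lam = torch.rand(())
            z_aug = lam * z_aug + (1 - lam) * z_aug[torch.randperm(z.size(0))]
        return z_aug

    encoder = ToyEncoder().eval()
    obs = torch.randn(8, 3, 84, 84)            # batch of RL observations
    with torch.no_grad():
        z = encoder(obs)
    print(augment_latent(z).shape)              # torch.Size([8, 32])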

Automated Design in Hybrid Action Spaces by Reinforcement Learning and Differential Evolution

Quirin Göttl, Haris Asif, Alexander Mattick, Robert Marzilger and Axel Plinge


Uli-RL: A Real-World Deep Reinforcement Learning Pedagogical Agent for Children

Anna Riedmann, Julia Götz, Carlo D’Eramo and Birgit Lugrin

Deep Reinforcement Learning (DRL) has proven its usefulness in various fields, such as robotic control systems, recommendation algorithms, and natural language dialogue interfaces. Recently, we have been witnessing a growing interest in applying DRL in education, with early results suggesting beneficial effects. However, the majority of research on educational applications applies methods in simulation without evaluation with real learners, thus providing scarce evidence of their effectiveness on real-world problems. Arguably, real-world applications are crucial to properly assess the validity of DRL methods. To this end, we present an approach for integrating DRL into an empirically validated digital reading application for second graders in the form of an adaptive pedagogical agent. We use DRL to tailor the agent’s feedback behavior to each child’s individual learning needs. We evaluate our approach with second graders, investigating their performance and overall motivation, and compare it to a control version of the app. Through this work, we contribute an innovative approach to the use of DRL within the context of primary education, showcasing promising results in a real-world evaluation.