Recent Work on Multimodal Emotion Recognition

Date: 2 July 2024, 10:30 AM

Location: Lecture Room 7.1.1, Neherstraße 1 and Zoom

Speakers: Prof. Jianhua Tao

Dr. Jianhua Tao is a Professor at Tsinghua University. He is a recognized scholar in the fields of speech and language processing, multimodal human-computer interaction, and affective computing. He is a Fellow of the China Computer Federation (CCF), elected in 2017 “for contributions to spoken language processing and affective computing”. He has published more than 200 papers in venues including Speech Communication, INTERSPEECH, IEEE TASLP, IEEE TIP, EURASIP JASMP, APSIPA, ISCSLP, ICASSP, Speech Prosody, ICME, and ACII. His recent awards include the Award of Distinguished Young Scholars of NSFC (2014), the Award of the National Special Support Program for High-Level Personnel (2018), Best Paper Awards of NCMMSC (2001, 2015, 2017), Best Paper Awards of CHCI (2011, 2013, 2015, 2016), and Winner of IMAGINATION2001 (a competition organized at Eurospeech2001). He has delivered numerous invited and keynote talks, including at Speech Prosody (2012, 2018) and NCMMSC (2017). He has been the Leader or Chief Scientist of several key scientific projects, including the Key Project of the National Natural Science Foundation of China, the National Key R&D Program of China, and the Key Project of International Cooperation.

Prof. Tao was elected Chairperson of the ISCA Special Interest Group on Chinese Spoken Language Processing (ISCA SIG-CSLP) (2019-2020) and will be Technical Program Chair of INTERSPEECH2020. He was also an elected member of the Executive Committee of the AAAC (2007-2017) and served on the Steering Committee of IEEE Transactions on Affective Computing (2009-2017). He currently serves as Subject Editor of Speech Communication, Editorial Board Member of the Journal on Multimodal User Interfaces and of the Journal of Computer Research and Development, Director of the Human Computer Interaction Committee of CSIG, Deputy Director of the Speech Dialog and Audition Committee of CCF, Deputy Director of the Speech Information Processing Committee of CSPSC, Deputy Director of the Artificial Psychology and Artificial Emotion Committee of CAAI, and Executive Board Member of CCF.

Abstract: Multimodal Emotion Recognition (MER) is an important technology in human-computer interaction. The talk will summarize recent work on MER at Tsinghua University. Generally, emotion recognition involves four key components: datasets, labels, features, and classifiers. The talk will revolve around improvements to these four components. Regarding datasets, we organize MER2023@ACM Multimedia and MER2024@IJCAI, and propose a large-scale Chinese dataset of 120k samples. Regarding labels, we introduce a new task called explainable MER to improve annotation accuracy and reliability. Regarding features, we introduce our recent attempts in semi-supervised learning, and we establish MERBench and MERTools, a benchmark and a toolkit covering various features, datasets, and fusion strategies. Regarding classifiers, we analyze how to automatically search for the optimal architecture, how to handle emotion recognition in conversations, and how to improve noise robustness. Finally, we will point out some promising research directions in this field.

Novel Security User Interfaces based on Human Behavior and Physiology

Date: 20 June 2024, 11:30 AM

Location: Lecture Room 7.1.1, Neherstraße 1 and Zoom

Speakers: Prof. Dr. Florian Alt

Florian Alt is a full professor of Usable Security and Privacy at the Research Institute for Cyber Defense (CODE) in Munich. In his work, he is interested in designing secure and privacy-preserving systems that blend naturally with the way users interact with computing devices. Florian’s research focuses on investigating users' behavior in security contexts, on creating and enhancing security and privacy mechanisms based on users' behavior and physiology, and on understanding and mitigating threats emerging from novel ubiquitous technologies. Specific areas of application are Mixed Reality, Smart Homes, and Social Engineering. Florian holds a diploma in Media Informatics from LMU Munich and a Ph.D. in Human-Computer Interaction from the University of Stuttgart.

Abstract: Over the past years, a global and highly professional cybercrime industry has established itself, making user-centered attacks an omnipresent threat to society. Companies, organizations, and individuals struggle to keep up and protect themselves effectively from an ever-increasing number of sophisticated attacks. The design of solid and sustainable means of protection is still a major challenge due to the need to consider the perspective of designers, software engineers, administrators, and end users alike. This talk will demonstrate how a new paradigm of human-centered security user interfaces based on knowledge about human behavior and physiology can be established. This becomes possible through sensing moving ever closer to the human body in the form of personal devices, wearables, and sensors in our environments. The talk will showcase how security mechanisms carefully integrated with interactive media systems allow a detailed understanding of human states in security-related situations to be obtained. This knowledge can be leveraged to enable personalized, contextualized, and tangible means for protection and increasing cybersecurity literacy.

Digital Phenotyping with the PhoneStudy App

Date: 18 June 2024, 2:30 PM

Location: Hörsaal A (Ismaninger Str. 22; Klinikum rechts der Isar der TUM)

Speakers: Prof. Dr. Markus Bühner; M.Sc. Yannik Terhorst

Markus Bühner has been Professor of Psychological Methods and Assessment at Ludwig-Maximilian-University (LMU) of Munich since 2011. His research focuses on predicting personality, intelligence, and well-being from smartphone sensing data. The PhoneStudy App was originally built for this purpose. Another focus is building a personality model from objective behavior collected via smartphone, as an alternative to assessing personality with questionnaire data.

Yannik Terhorst is a researcher who has focused on digital mental health care since his studies in psychology. His primary research areas are 1) novel assessment and diagnostic procedures (e.g., unobtrusively collected sensor data or supervised machine learning for depression screening), 2) the heterogeneity and magnitude of the effects of Internet- and mobile-based interventions, and 3) the acceptance and implementation of technologies for mental health.

Abstract: The use of unobtrusively collected sensor data (e.g., GPS) obtained from commonplace devices like smartphones, also known as smart sensing, holds promise as a valuable addition to existing tools for assessing personality, health, and human behavior. The LMU-developed PhoneStudy app provides a powerful framework for collecting sensor data such as location data, language use, smartphone usage, and app usage features, as well as ecological momentary assessments. The talk will present the framework together with the state-of-the-art evidence from personality and mental health research. The session will close with an open discussion and a vision of interdisciplinary projects on how smart sensing can be combined with ongoing innovations in health informatics (e.g., smart-sensing-enhanced large language models for health interventions).

Presentation of the work at Fraunhofer Institute for Digital Media Technology (IDMT) Hearing, Speech and Audio Technology (HSA) in Oldenburg

Date: 24 May 2024, 10:00 AM

Location: Building 707 Room 7.1.1.

Speakers: Dr. Jens-E. Appell

Dr. Jens-E. Appell received his diploma in physics from the University of Göttingen, Germany, in 1994. From 1994 to 2001 he worked as a researcher at the Carl von Ossietzky Universität Oldenburg, Germany, on models of human auditory perception and audio signal processing, and received his doctorate in 2001 with a thesis on “Loudness Models for rehabilitative Audiology”. In 2001 he joined the OFFIS Institute for Information Technology as Head of the Design Center for embedded systems, and from 2003 he served as Director of the Embedded Hardware-/Software Systems Division. In 2008 he founded the Hearing, Speech and Audio Technology branch of the Fraunhofer Institute for Digital Media Technology in Oldenburg, which he has headed ever since. In his work he has addressed a variety of R&D topics and their applications in embedded systems, transportation, hearing aids, consumer electronics, ambient assisted living, and production. Since 2020, Dr. Appell has also been CEO of FidiTec GmbH, a start-up commercializing technologies in the field of Hearing, Speech and Audio Technology.

Energy-Efficient Deep Learning-Based Image Compression for LPWANs on Resource-Constrained AIoT Devices

Date: 22 January 2024, 2:30 PM

Location: Building 10 (MDSI, GALILEO Garching) + Zoom

Speakers: Nikolai Körber, M.Sc.

Nikolai Körber received his Bachelor of Science in Computer Science from the University of Applied Sciences Landshut in 2017 and his Master of Science in Computer Science from the same university in 2019. He is currently a PhD candidate in Electrical and Computer Engineering at TUM. He previously worked on several AI-related projects in industry (Fraunhofer IPA, iteratec GmbH, MHP Management- und IT Beratung GmbH) and academia (UAS Landshut, TUM). His research interests lie at the intersection of tinyML and computer vision. As part of his work, he is investigating how to bring the benefits of neural/generative image compression to resource-constrained devices in order to address the tight bandwidth constraints commonly encountered in sensor networks.

Abstract: Recently, there has been a rising demand for Low-Power Wide-Area Network (LPWAN)-based computer vision applications. LPWANs are specifically designed for long battery life, long transmission range, and low production costs, which, however, come at the expense of very low bandwidth. Because communication costs significantly more energy than computation, data compression plays a crucial role in energy-efficient image communication. To this end, novel compute-intensive compression methods, such as deep learning-based image compression techniques, are promising for further reducing the number of packets to be transmitted. Despite their superiority over established methods such as JPEG, there is no related research that jointly addresses deep learning-based compression performance and resource efficiency on sensor platforms. Only recently has high computational power at low battery consumption become possible, by exploiting parallel ultra-low-power processors like GAP8. The goal of this research is to develop robust, energy-efficient deep learning-based image compression techniques for LPWANs on resource-constrained AI-enabled IoT (AIoT) devices. Achieving higher compression rates while operating at low power will dramatically reduce network traffic, extend the battery life of visual IoT sensor nodes, and pave the way to a broad range of new data-intensive applications in the LPWAN/5G mMTC communication era.

Research Outline: AI-based Quality of Service (QoS) Prediction and Repeater Placement

Date: 22 January 2024, 2:00 PM

Location: Building 10 (MDSI, GALILEO Garching) + Zoom

Speakers: Patrick Purucker, M.Sc.

Patrick Purucker received his Bachelor of Engineering in Electrical Engineering and Information Technology from the University of Applied Sciences Amberg-Weiden in 2020, followed by a Master of Science (Applied Research in Engineering Sciences) from the same institution in 2021. He is currently working on the EU-funded projects A-IQ Ready and Archimedes, focusing on resilient, AI-supported wireless communications for harsh underground environments. Previously, he worked on the ADACORSA project, which was successfully completed in 2023, investigating cellular-based UAS communications.

Abstract: The talk gives an overview of the research planned for the doctoral study, which focuses on resilient communications for agents conducting autonomous search and rescue missions in challenging underground environments (e.g., tunnels). The research will be conducted within the framework of the projects A-IQ Ready and Archimedes. As part of this work, a simulation of the wireless ad-hoc network between an operation center and search agents will be implemented to identify potential risks and tackle them with AI-based algorithms. One potential approach is to develop a prediction model for the Quality of Service (QoS) in the vicinity of the agent, with the aim of adapting its trajectory to avoid communication blind spots. Furthermore, Reinforcement Learning algorithms will be investigated for optimising repeater placement within the wireless network.