The Social AI CDT and Social AI Group proudly hosted the Social AI Workshop: Social AI for Speech and Conversation, in the ARC, University of Glasgow, on 28th March 2025.
This event was open to all PhD students, faculty, and staff. Three distinguished guest speakers delivered sessions, as follows:
Jean-Francois Bonastre
Senior Researcher (Directeur de Recherche), Inria, France | Professor, Avignon University, France
Explainability in speaker recognition (and more generally in speech processing)
Explainability has become a mandatory topic in AI in general. This is largely due to the need for greater trust on the part of experts and the general public, in the face of AI’s limitations, manipulations, biases, errors, and hallucinations. New AI regulations, such as those of the EU, also play an important role, as explainability is now required for certain applications. Speech processing applications are particularly concerned, as they are often linked to critical human aspects, such as HR, healthcare or forensics. This talk will briefly present some of the main approaches in AI explainability (XAI), as well as their limitations. Using speaker recognition as an example, a new explainable-by-design approach will be presented. By representing speech in terms of the presence or absence of speech attributes taken from a small and bounded set, it enables simple explanations that can be interpreted by anyone. Some potential extensions, such as a more general scheme capable of mixing knowledge-based and automatically discovered attributes, or the application of this principle to pre-trained encoders, will be discussed.
To view Jean-Francois Bonastre’s presentation slides, please click here.
Heysem Kaya
Assistant Professor, Utrecht University, The Netherlands
Towards Fair and Interpretable Speech-based Depression Severity Modeling
Recently, with increasing momentum, many state-of-the-art deep learning models have been shown to be successful in detecting depression from multimodal cues. However, such efforts and models prove to be of limited use in clinical applications for both legal (such as the new EU AI regulation) and practical reasons. Therefore, we aim to make such critical machine learning tasks employed in high-risk applications responsible and trustworthy. By responsibility in ML, we mean transparency/interpretability, algorithmic fairness, and privacy. Since speech is relatively less prone to automatic subject identification via public tools and search engines than vision (i.e., face), and hence more privacy-preserving, we work on the speech modality for critical tasks such as depression prediction. This talk will therefore focus on our recent and ongoing efforts in speech-based depression prediction with responsible AI considerations.
To view Heysem Kaya’s presentation slides, please click here.
Khiet Truong
Associate professor, University of Twente, The Netherlands
From speech technology to spoken conversational interaction technology
Nowadays, speech technology is at our fingertips. Automatic speech recognition (ASR) and speech synthesis have evolved drastically, to the point where ASR performance has reached human parity and an artificial voice is no longer discernible from a natural human voice. However, as soon as one starts talking to machines, it becomes clear that speech technology still faces many challenges. To move towards technology that truly understands you, it also needs to be able to process non-speech or paralinguistic information. In this talk, I will highlight some of the research we are carrying out on spoken conversational interaction technology. I will discuss how current open-source ASR systems deal with non-speech elements and different speaker groups, and I will present some of our work on designing robot communication (which does not always need to involve speech).