I am part of the first cohort of Social AI CDT students, working on a project at the intersection of psychiatry and social signal processing. I did my undergraduate in Psychology at the University of Aberdeen finishing up with a thesis examining the physical symptoms of social anxiety. My academic interests are broad, but I have been particularly drawn to the fields of theoretical cognitive science, and cognitive neuroscience in both its basic and translational forms. The latter is what has motivated me to pursue research in the field of computational psychiatry, a novel approach aiming to detect and define mental disorders with the help of data-driven techniques. For my PhD, I am using methods from social signal processing to help psychiatrists identify children who display signs of Reactive Attachment Disorder, a severe cluster of psychological and behavioural issues affecting abused and neglected children.
Multimodal Deep Learning for Detection and Analysis of Reactive Attachment Disorder in Abused and Neglected Children
The goal of this project is to develop AI-driven methodologies for detection and analysis of Reactive Attachment Disorder (RAD), a psychiatric disorder affecting abused and neglected children. The main effect of RAD is “failure to seek and accept comfort”, i.e., the shut-down of a set of psychological processes, known as the Attachment System and essential for normal development, that allow children to establish and maintain beneficial relationships with their caregivers [YAR16]. While having serious implications for the child’s future (e.g., RAD is common in children with complex psychiatric disorders and criminal behaviour [MOR17]), RAD is highly amenable to treatment if recognised in infancy [YAR16]. However, the disorder is hard for clinicians to detect because its symptoms are not easily visible to the naked eye.
Encouraging progress in RAD diagnosis has been achieved by manually analysing videos of children involved in therapeutic sessions with their caregivers, but such an approach is too expensive and time consuming to be applied in a standard clinical setting. For this reason, this project proposes the use of AI-driven technologies for the analysis of human behaviour [VIN09]. These have been successfully applied to other attachment related issues [ROF19] and can help not only to automate the observation of the interactions, thus reducing the amount of time needed for possible diagnosis, but also to identify behavioural markers that might escape clinical observation. The emphasis will be on approaches that jointly model multiple behavioural modalities through the use of appropriate deep network architectures [BAL18].
The experimental activities will revolve around an existing corpus of over 300 real-world videos collected in a clinical setting and they will include three main steps:
- Identification of the behavioural cues (the RAD markers) most likely to account for RAD through manual observation of a representative sample of the corpus;
- Development of AI-driven methodologies, mostly based on signal processing and deep networks, for the detection of the RAD markers in the videos of the corpus;
- Development of AI-driven methodologies, mostly based on deep networks, for the automatic identification of children affected by RAD based on the presence and intensity of the cues detected in the second step.
The likely outcomes of the project include a scientific analysis of RAD-related behaviours as well as AI-driven methodologies capable of supporting the activity of clinicians. In this respect, the project aligns with the needs and interests of private and public bodies dealing with child and adolescent mental health (e.g., the UK National Health Service and the National Society for the Prevention of Cruelty to Children).
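As a rough illustration of how the third step could build on the second, the sketch below turns per-video cue detections into "presence and intensity" features and passes them through a logistic score. It is a minimal sketch under stated assumptions, not the project's method: the `CueDetection` structure, the cue names, the segment count and the weights are all hypothetical, and a plain logistic function stands in for the deep networks the project envisages.

```python
from dataclasses import dataclass
from math import exp


@dataclass
class CueDetection:
    """One detected behavioural cue in a video (hypothetical structure)."""
    cue: str          # cue label; any names used here are illustrative only
    intensity: float  # detector confidence or strength, assumed to lie in [0, 1]


def cue_features(detections, cue_names, n_segments):
    """Summarise detections as (detections per segment, mean intensity) per cue."""
    features = []
    for name in cue_names:
        hits = [d.intensity for d in detections if d.cue == name]
        presence = len(hits) / n_segments if n_segments else 0.0
        mean_intensity = sum(hits) / len(hits) if hits else 0.0
        features.extend([presence, mean_intensity])
    return features


def score(features, weights, bias):
    """Logistic score in (0, 1); in practice the weights would be learned."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + exp(-z))
```

In this sketch a video is reduced to a fixed-length feature vector regardless of its duration, which is one simple way to compare sessions of different lengths; a sequence model over the raw detections would be an alternative.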
[BAL18] Baltrušaitis, T., Ahuja, C. and Morency, L.P. (2018). Multimodal Machine Learning: A Survey and Taxonomy, IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), 421-433.
[HUM17] Humphreys, K. L., Nelson, C. A., Fox, N. A., & Zeanah, C. H. (2017). Signs of reactive attachment disorder and disinhibited social engagement disorder at age 12 years: Effects of institutional care history and high-quality foster care. Development and Psychopathology, 29(2), 675-684.
[MOR17] Moran, K., McDonald, J., Jackson, A., Turnbull, S., & Minnis, H. (2017). A study of Attachment Disorders in young offenders attending specialist services. Child Abuse & Neglect, 65, 77-87.
[ROF19] Roffo G, Vo DB, Tayarani M, Rooksby M, Sorrentino A, Di Folco S, Minnis H, Brewster S, Vinciarelli A. (2019). Automating the Administration and Analysis of Psychiatric Tests: The Case of Attachment in School Age Children. Proceedings of the CHI, Paper No.: 595 Pages 1–12.
[VIN09] Vinciarelli, A., Pantic, M. and Bourlard, H. (2009), Social Signal Processing: Survey of an Emerging Domain, Image and Vision Computing Journal, 27(12), 1743-1759.
[YAR16] Yarger, H. A., Hoye, J. R., & Dozier, M. (2016). Trajectories of change in attachment and biobehavioral catch-up among high risk mothers: a randomised clinical trial. Infant Mental Health Journal, 37(5), 525-536.
I am a PhD student with the Social AI CDT. My MA is in English Language and Linguistics from the University of Glasgow. My current area of research is the further development of socially intelligent robots with a hope to improve Human-Robot Interaction, through the use of theory and methods from socially informed linguistics, and through the deployment in a real-world context of MuMMER (a humanoid robot, based on the SoftBank Robotics’ Pepper robot). During my undergraduate, my research interests included looking at the ways in which speech is practically produced and understood, which different social factors have an effect on speech, which different conversational rules are applied in different social situations, what causes breakdowns in communication and how they can be avoided. My dissertation was titled “Are There New Emerging Basic Colour Terms in British English? A Statistical Analysis”, which was a study into how the semantic space of colour is divided linguistically by speakers of different social backgrounds. The prospect of developing helpful and entertaining robots that could be used to aid child language development, the elderly and the general public drew me to the Social AI CDT. I am excited to move forward in this research.
In 2022, Rhiannon Fyfe was awarded a PG Diploma in Computing Science and Psychology with Merit.
Evaluating and Enhancing Human-Robot Interaction for Multiple Diverse Users in a Real-World Context
The increasing availability of socially-intelligent robots with functionality for a range of purposes, from guidance in museums [Geh15], to companionship for the elderly [Heb16], has motivated a growing number of studies attempting to evaluate and enhance Human-Robot Interaction (HRI). But, as Honig and Oron-Gilad’s review of recent work on understanding and resolving failures in HRI observes [Hon18], most research has focussed on technical ways of improving robot reliability. They argue that progress requires a “holistic approach” in which “[t]he technical knowledge of hardware and software must be integrated with cognitive aspects of information processing, psychological knowledge of interaction dynamics, and domain-specific knowledge of the user, the robot, the target application, and the environment” (p.16). Honig and Oron-Gilad point to a particular need to improve the ecological validity of evaluating user communication in HRI, by moving away from experimental, single-person environments, with low-relevance tasks, mainly with younger adult users, to more natural settings, with users of different social profiles and communication strategies, where the outcome of successful HRI matters.
The main contribution of this PhD project is to develop an interdisciplinary approach to evaluating and enhancing communication efficacy of HRI, by combining state-of-the-art social robotics with theory and methods from socially-informed linguistics [Cou14] and conversation analysis [Cli16]. Specifically, the project aims to improve HRI with the newly-developed MultiModal Mall Entertainment Robot (MuMMER). MuMMER is a humanoid robot, based on the SoftBank Robotics’ Pepper robot, which has been designed to interact naturally and autonomously in the communicatively-challenging space of a public shopping centre/mall with unlimited possible users of differing social backgrounds and communication styles [Fos16]. MuMMER’s role is to entertain and engage visitors to the shopping mall, thereby enhancing their overall experience in the mall. This in turn requires ensuring successful HRI which is socially acceptable, helpful and entertaining for multiple, diverse users in a real-world context. As of June 2019, the technical development of the MuMMER system has been nearly completed, and the final robot system will be located for 3 months in a shopping mall in Finland during the autumn of 2019.
The PhD project will evaluate HRI with MuMMER in a new setting: a large shopping mall in an English-speaking context, in Glasgow, Scotland’s largest and most socially and ethnically diverse city. Project objectives are to:
- Design a set of sociolinguistically-informed observational studies of HRI with MuMMER in situ with users from a range of social, ethnic, and language backgrounds, using direct and indirect methods
- Identify the minimal technical modifications (dialogue, non-verbal, other) needed to optimise HRI, and thereby user experience and engagement, also considering indices such as consumer footfall to the mall
- Implement technical alterations, and re-evaluate with new users.
[Cli16] Clift, R. (2016). Conversation Analysis. Cambridge: Cambridge University Press.
[Cou14] Coupland, N., Sarangi, S., & Candlin, C. N. (2014). Sociolinguistics and social theory. Routledge.
[Fos16] Foster M.E., Alami, R., Gestranius, O., Lemon, O., Niemela, M., Odobez, J-M., Pandey, A.M. (2016) The MuMMER Project: Engaging Human-Robot Interaction in Real-World Public Spaces. In: Agah A., Cabibihan J., Howard A., Salichs M., He H. (eds) Social Robotics. ICSR 2016. Lecture Notes in Computer Science, vol 9979. Springer, Cham
[Geh15] Gehle R., Pitsch K., Dankert T., Wrede S. (2015). Trouble-based group dynamics in real-world HRI – Reactions on unexpected next moves of a museum guide robot., in 24th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2015 (Kobe), 407–412.
[Heb16] Hebesberger, D., Dondrup, C., Koertner, T., Gisinger, C., Pripfl, J. (2016). Lessons learned from the deployment of a long-term autonomous robot as companion in physical therapy for older adults with dementia: A mixed methods study. In: The Eleventh ACM/IEEE International Conference on Human Robot Interaction, 27–34
[Hon18] Honig, S., & Oron-Gilad, T. (2018). Understanding and Resolving Failures in Human-Robot Interaction: Literature Review and Model Development. Frontiers in Psychology, 9, 861.
In 2020, Salman Mohammadi was awarded an MSc in Computing Science and Psychology with Distinction.
Enhancing Social Interactions via Physiologically-Informed AI
Over the past few years, major developments in machine learning (ML) have enabled important advancements in artificial intelligence (AI). Firstly, the field of deep learning (DL), which has enabled models to learn complex input-output functions (e.g. pixels in an image mapped onto object categories), has emerged as a major player in this area. DL builds upon neural network theory and design architectures, expanding these in ways that enable more complex function approximations.
The second major advance in ML has combined advances in DL with reinforcement learning (RL) to enable new AI systems for learning state-action policies – in what is often referred to as deep reinforcement learning (DRL) – to enhance human performance in complex tasks. Despite these advancements, however, critical challenges still exist in incorporating AI into a team with human(s).
One of the most important challenges is the need to understand how humans value intermediate decisions (i.e. before they generate a behaviour) through internal models of their confidence, expected reward, risk etc. Critically, such information about human decision-making is not only expressed through overt behaviour, such as speech or action, but more subtly through physiological changes, small changes in facial expression and posture etc. Socially and emotionally intelligent people are excellent at picking up on this information to infer the current disposition of one another and to guide their decisions and social interactions.
In this project, we propose to develop a physiologically-informed AI platform, utilizing neural and systemic physiological information (e.g. arousal, stress) (Fouragnan et al., 2015; Pisauro et al., 2017; Gherman & Philiastides, 2018) together with affective cues from facial features (Vinciarelli, Pantic & Bourlard, 2009; Baltrušaitis, Robinson & Morency, 2016) to infer latent cognitive and emotional states from humans interacting in a series of social decision-making tasks (e.g. trust game, prisoner’s dilemma etc). Specifically, we will use these latent states to generate rich reinforcement signals to train AI agents (specifically DRL) and allow them to develop a “theory of mind” (Premack & Woodruff 1978; Frith & Frith 2005) in order to make predictions about upcoming human behaviour. The ultimate goal of this project is to deliver advancements towards “closing-the-loop”, whereby the AI agent feeds back its own predictions to the human players in order to optimise behaviour and social interactions.
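One way to picture how inferred latent states could enrich a reinforcement signal is sketched below. This is an assumption-laden illustration rather than the project's design: a tabular Q-learning update stands in for deep reinforcement learning, and the latent-state names (`trust`, `stress`), their weights, and the state and action labels are all hypothetical.

```python
from collections import defaultdict


def shaped_reward(task_reward, latent_state, weights):
    """Add a weighted sum of inferred latent-state values to the task reward."""
    bonus = sum(weights.get(k, 0.0) * v for k, v in latent_state.items())
    return task_reward + bonus


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: Q <- Q + alpha * (target - Q)."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Illustrative use with made-up values for one round of a trust-game-like task.
q_table = defaultdict(float)
reward = shaped_reward(1.0, {"trust": 0.5, "stress": 0.2}, {"trust": 1.0, "stress": -0.5})
q_update(q_table, "round_1", "cooperate", reward, "round_2", ["cooperate", "defect"])
```

The design choice illustrated is simply that the physiological and affective estimates enter through the reward, leaving the learning rule unchanged; whether they should instead enter as part of the agent's state is an open question this sketch does not address.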
My name is Emily O’Hara and I am a current PhD student in the Social AI CDT programme for Socially Intelligent Artificial Agents at the University of Glasgow. My doctoral research focuses on the social perception of speech, paying particular attention to how the usage of fillers affects the percepts of speaker personality. Within the frames of artificial intelligence, the project aims to improve the functionality and naturalness of artificial voices. My research interests during my undergraduate degree in English Language and Linguistics included sociolinguistics, natural language processing, and psycholinguistics. My dissertation was entitled “Masked Degrees of Facilitation: Can They be Found for Phonological Features in Visual Word Recognition?” and was a psycholinguistic study of how the phonological elements of words are stored in the brain and accessed during reading. The opportunity to integrate my knowledge of linguistic methods and theory with computer science was what attracted me to the CDT, and I look forward to undertaking research that can aid in the creation of more seamless user-AI communication.
Social Perception of Speech
Short vocalizations like “ehm” and “uhm”, known as fillers in linguistic terminology, are common in everyday conversations (up to one every 10.9 seconds according to the analysis presented in [Vin15]). For this reason, it is important to understand whether the fillers uttered by a person convey personality impressions, i.e., whether people develop a different opinion about a given individual depending on how she/he utters the fillers. This project will use an existing corpus of 2988 fillers (uttered by 120 persons interacting with one another) to achieve the following scientific and technological goals:
- To establish the vocal parameters that lead to consistent percepts of speaker personality both within and across listeners and the neural areas involved in these attributions from brief fillers.
- To develop an AI approach aimed at predicting the trait people attribute to an individual when they hear her/his fillers.
The first goal will be achieved through behavioural [Mah18] and neuroimaging experiments [Tod08] that pinpoint how and where in the brain stable personality percepts are processed. From there, acoustical analysis and data-driven approaches using cutting-edge acoustical morphing techniques will allow for generation of hypotheses feeding subsequent AI networks [McA14]. This section will allow the development of the skills necessary to design, implement, and analyse behavioural and neural experiments for establishing social percepts from speech and voice.
The second goal will be achieved through the development of an end-to-end automatic approach that can map the speech signal underlying a filler into the traits that listeners attribute to a speaker. This will allow the development of the skills necessary to design and implement deep neural networks capable of modelling sequences of physical measurements (with an emphasis on speech signals).
The project is relevant to the emerging domain called personality computing [Vin14] and the main application related to this project is the synthesis of “personality colored” speech, i.e., artificial voices that can give the impression of a personality and sound not only more realistic, but also better at performing the task they are developed for [Nas05].
[Mah18] G. Mahrholz, P. Belin and P. McAleer, “Judgements of a speaker’s personality are correlated across differing content and stimulus type”, PLOS ONE, 13(10): e0204991. 2018
[McA14] P. McAleer, A. Todorov and P. Belin, “How Do You Say ‘Hello’? Personality Impressions from Brief Novel Voices”, PLOS ONE, 9(3): e90779. 2014
[Tod08] A. Todorov, S. G. Baron and N. N. Oosterhof, “Evaluating face trustworthiness: a model based approach”, Social Cognitive and Affective Neuroscience, 3(2), pp. 119-127. 2008
[Vin15] A.Vinciarelli, E.Chatziioannou and A.Esposito, “When the Words are not Everything: The Use of Laughter, Fillers, Back-Channel, Silence and Overlapping Speech in Phone Calls”, Frontiers in Information and Communication Technology, 2:4, 2015.
[Vin14] A.Vinciarelli and G.Mohammadi, “A Survey of Personality Computing”, IEEE Transactions on Affective Computing, Vol. 5, no. 3, pp. 273-291, 2014.
[Nas05] C.Nass, S.Brave, “Wired for speech: How voice activates and advances the human-computer relationship”, MIT Press, 2005.
I am a recent Psychology graduate from the University of Strathclyde, Glasgow. To me, conducting research has always been the most interesting part of my degree. I find that people and minds are the most complex and fascinating phenomena one could study, and throughout completing my degree I have been very passionate about learning more about the mechanisms underlying our cognition, emotion, and behaviour.
Grounded in the work on my dissertation, my current research interests include the psychology of biases, heuristics, and automatic processing. In this PhD programme I will work on the project “Robust, Efficient, Dynamic Theory of Mind” with Stacy Marsella and Lawrence Barsalou.
Being part of the Social AI CDT programme, I look forward to contributing to the emerging interdisciplinary junction between psychology and computer science. Coming from a psychological background, I am excited to apply psychological research to the development of more efficient and dynamic models of social situations.
Robust, Efficient, Dynamic Theory of Mind
The ability to socially function effectively is a critical human skill and providing such skills to artificial agents is a core challenge faced by these technologies. The aim of this work is to improve the social skills of artificial agents, making them more robust, by giving them a skill that is fundamental to effective human social interaction, the ability to possess and use beliefs about the mental processes and states of others, commonly called Theory of Mind (ToM) [Whiten, 1991]. Theory of Mind skills are predictive of social cooperation and collective intelligence, as well as key to cognitive empathy, emotional intelligence, and the use of shared mental models in teamwork. Although people typically develop ToM at an early age, research has shown that even adults with a fully formed capability for ToM are limited in their capacity to employ it (Keysar, Lin, & Barr, 2003; Lin, Keysar, & Epley, 2010).
From a computational perspective, there are sound explanations as to why this may be the case. As critical as they are, forming, maintaining and using models of others in decision making can be computationally intractable. Pynadath & Marsella (2007) presented an approach, called minimal mental models, that sought to reduce these costs by exploiting criteria such as prediction accuracy and utility costs associated with prediction errors as a way to limit model complexity. There is a clear relation of that work to the work in psychology on ad hoc categories formed in order to achieve goals [Barsalou, 1983], as well as ideas on motivated inference [Kunda, 1990].
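The trade-off described above, between model complexity and the cost of prediction errors, can be illustrated with a small selection rule. The sketch below is not Pynadath & Marsella's algorithm; it is a simplified stand-in that picks the least complex candidate model whose estimated utility loss stays within a tolerance. The `CandidateModel` fields, the candidate names and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateModel:
    """A candidate model of another agent (all fields are illustrative)."""
    name: str
    complexity: int               # e.g. number of parameters or nested beliefs
    expected_utility_loss: float  # estimated cost of this model's prediction errors


def select_minimal_model(candidates, tolerance):
    """Return the least complex candidate whose expected loss is within tolerance.

    Falls back to the lowest-loss candidate when none meets the tolerance.
    """
    if not candidates:
        raise ValueError("no candidate models given")
    acceptable = [m for m in candidates if m.expected_utility_loss <= tolerance]
    if acceptable:
        return min(acceptable, key=lambda m: (m.complexity, m.expected_utility_loss))
    return min(candidates, key=lambda m: m.expected_utility_loss)


# Illustrative candidates with made-up complexity and loss estimates.
candidates = [
    CandidateModel("full_nested_beliefs", complexity=10, expected_utility_loss=0.0),
    CandidateModel("goals_only", complexity=3, expected_utility_loss=0.05),
    CandidateModel("random_policy", complexity=1, expected_utility_loss=0.6),
]
chosen = select_minimal_model(candidates, tolerance=0.1)
```

Under these assumed numbers, a tolerance of 0.1 admits the goals-only model, so the agent would not pay for the full nested model; tightening the tolerance pushes the choice back toward the more complex candidate.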
This effort seeks to develop more robust artificial agents with ToM using an approach that collects data on human ToM performance, analyses the data and then constructs a computational model based on the analyses. The resulting model will provide artificial agents with a robust, efficient capacity to reason about others.
a) Study the nature of mental model formation and adaptation in people during social interaction– specifically how do one’s own goals, as well as the other’s goals influence and make tractable the model formation and use process.
b) Develop a tractable computational model of this process that takes into account the artificial agent’s and the human’s goals, as well as models of each other, in an interaction. Tractability is of course fundamental in face-to-face social interactions, where agents must respond rapidly.
c) Evaluate the model in artificial agent – human interactions.
We see this work as fundamental to taking embodied social agents beyond their limited, inflexible approaches to interacting socially with us to a significantly more robust capacity. Key to that will be making theory of mind reasoning in artificial agents more tractable via taking into account both the agent’s goals and the human’s goals in the interaction.
[Kun90] Kunda, Z. (1990). The case for motivated reasoning. Psychological Bulletin, 108(3), 480-498.
[Bar83] Barsalou, L. W. (1983). Ad hoc categories. Memory & Cognition, 11, 211.
[Key03] Keysar, B., Lin, S., & Barr, D. (2003). Limits on theory of mind use in adults. Cognition, 89, 25–41.
[Lin10] Lin, S., Keysar, B., & Epley, N. (2010). Reflexively mindblind: Using theory of mind to interpret behavior requires effortful attention. Journal of Experimental Social Psychology, 46, 551–556.
[Pyn07] David V. Pynadath & Stacy C. Marsella (2007). Minimal Mental Models. In: AAAI, pp. 1038-1046.
[Whi91] Whiten, Andrew (ed). Natural Theories of Mind. Oxford: Basil Blackwell, 1991.