Enhancing Social Interactions via Physiologically-Informed AI
Over the past few years, major developments in machine learning (ML) have enabled important advancements in artificial intelligence (AI). Firstly, the field of deep learning (DL), which has enabled models to learn complex input-output functions (e.g. pixels in an image mapped onto object categories), has emerged as a major player in this area. DL builds upon neural network theory and design architectures, expanding these in ways that enable more complex function approximations.
The second major advance in ML has combined DL with reinforcement learning (RL) to enable new AI systems that learn state-action policies, an approach often referred to as deep reinforcement learning (DRL), to enhance human performance in complex tasks. Despite these advancements, however, critical challenges remain in incorporating AI into teams with humans.
One of the most important challenges is the need to understand how humans value intermediate decisions (i.e. before they generate a behaviour) through internal models of their confidence, expected reward, risk etc. Critically, such information about human decision-making is expressed not only through overt behaviour, such as speech or action, but also more subtly through physiological changes, small changes in facial expression and posture etc. Socially and emotionally intelligent people are excellent at picking up on this information to infer the current disposition of one another and to guide their decisions and social interactions.
In this project, we propose to develop a physiologically-informed AI platform, utilizing neural and systemic physiological information (e.g. arousal, stress) ([Fou15][Pis17][Ghe18]) together with affective cues from facial features ([Vin09][Bal16]) to infer latent cognitive and emotional states from humans interacting in a series of social decision-making tasks (e.g. trust game, prisoner’s dilemma etc.). Specifically, we will use these latent states to generate rich reinforcement signals to train AI agents (in particular DRL agents) and allow them to develop a “theory of mind” ([Pre78][Fri05]) in order to make predictions about upcoming human behaviour. The ultimate goal of this project is to deliver advancements towards “closing the loop”, whereby the AI agent feeds back its own predictions to the human players in order to optimise behaviour and social interactions.
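As a rough illustration of how inferred latent states could enrich a reinforcement signal, the sketch below adds a confidence bonus and an arousal penalty to the payoff of a social decision-making game. The `LatentState` fields, the linear form, the weights and the choice to penalise arousal are all assumptions made for this example; the project does not specify how the signal would be constructed.

```python
from dataclasses import dataclass


@dataclass
class LatentState:
    """Hypothetical per-trial estimates decoded from physiology and facial cues."""
    confidence: float  # 0.0 (unsure) to 1.0 (certain)
    arousal: float     # 0.0 (calm) to 1.0 (highly aroused)


def shaped_reward(task_reward: float, state: LatentState,
                  w_confidence: float = 0.5, w_arousal: float = 0.25) -> float:
    """Combine the game payoff with inferred latent states into one scalar.

    The linear form and the default weights are illustrative assumptions.
    """
    for value in (state.confidence, state.arousal):
        if not 0.0 <= value <= 1.0:
            raise ValueError("latent state estimates must lie in [0, 1]")
    # Assumed shaping: reward confident decisions, penalise high arousal
    # (used here as a crude proxy for stress).
    return task_reward + w_confidence * state.confidence - w_arousal * state.arousal
```

For example, `shaped_reward(1.0, LatentState(confidence=0.8, arousal=0.4))` evaluates to approximately 1.3. A DRL agent would receive this scalar in place of the raw game payoff.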
[Ghe18] S Gherman, MG Philiastides, “Human VMPFC encodes early signatures of confidence in perceptual decisions”, eLife, 7: e38293, 2018.
[Pis17] MA Pisauro, E Fouragnan, C Retzler, MG Philiastides, “Neural correlates of evidence accumulation during value-based decisions revealed via simultaneous EEG-fMRI”, Nature Communications, 8: 15808, 2017.
[Fou15] E Fouragnan, C Retzler, KJ Mullinger, MG Philiastides, “Two spatiotemporally distinct value systems shape reward-based learning in the human brain”, Nature Communications, 6: 8107, 2015.
[Vin09] A. Vinciarelli, M. Pantic, and H. Bourlard, “Social Signal Processing: Survey of an Emerging Domain”, Image and Vision Computing Journal, Vol. 27, no. 12, pp. 1743-1759, 2009.
[Bal16] T.Baltrušaitis, P.Robinson, and L.-P. Morency. “Openface: an open source facial behavior analysis toolkit.” Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2016.
[Pre78] D. Premack, G. Woodruff, “Does the chimpanzee have a theory of mind?”, Behavioral and brain sciences Vol. 1, no. 4, pp. 515-526, 1978.
[Fri05] C. Frith, U. Frith, “Theory of Mind”, Current Biology Vol. 15, no. 17, R644-646, 2005.
Situating mobile interventions for healthy hydration habits
Aims and Objectives. This project will examine which kinds of data best integrate a digital mobile health intervention into a user’s daily life, so as to lead to habit formation. Previous research has shown that just-in-time adaptive interventions (JITAIs) are more effective than statically controlled interventions (Wang & Miller, 2020). In other words, health interventions are more likely to lead to behaviour change if they are well situated, i.e., with agency adapted to specific user characteristics, and applied in situations where behaviour change should happen. However, there is limited evidence on how best to design JITAIs for health apps, so as to create artificial agents that lead to lasting behaviour change through novel habit formation. In addition, there is no systematic evidence as to which features of situations a health app should use to support a user in performing a healthy behaviour (e.g., time of day, location, mood, activity pattern, social context). We will address these issues in the under-researched domain of hydration behaviours. The aim is to establish, given the same intervention, which type of contextual data, or which heterogeneous mix of types of data, is most effective at increasing water consumption, and at establishing situated water drinking habits that persist when the initial engagement with the intervention has ceased.
Background and Novelty. Mobile health interventions are a powerful new tool in the domain of individual health behaviour change. Health apps can reach large numbers of users at relatively low cost, and can be tailored to an individual’s health goals and adapted to support users in specific, critical situations. Identifying the right contextual features to trigger an intervention is critical, because context plays a key role both in triggering unhealthy behaviours, and in developing habits that support the long-term maintenance of healthy behaviours. A particular challenge, which existing theories typically don’t yet address, lies in the dynamic nature of health behaviours and their contextual triggers, and in establishing how these behaviours and contexts can best be monitored (Nahum-Shani et al., 2018). This project will take on these challenges in the domain of hydration, because research suggests that many adults may be chronically dehydrated, with implications for cognitive functioning, mood, and physical health (e.g., risk of diabetes, overweight, kidney damage; see Muñoz et al., 2015; Perrier et al., 2020). Our previous work has shown that healthy hydration is associated with drinking water habitually across many different situations each day (Rodger et al., 2020). This underlines the particular importance of establishing dynamic markers of situations that are cognitively associated with healthy behaviours so that they can support habit formation.
Methods. (1) We will examine the internal (e.g., motivation, mood, interoception) and external (e.g., time of day, location, activity pattern, social context) markers of situations in which high water drinkers consume water, using objective intake monitors. Then, integrating these findings with theory on habit formation and motivated behaviour (Papies et al., 2020), and using an existing app platform (e.g. AWARE-Light), (2) we will test which types of data or mixes of data types are most effective in an intervention to increase water consumption in a sample of low water drinkers in the short term, and (3) whether those same data types are effective at creating hydration habits that persist in the longer term.
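To make the comparison of data types in steps (2) and (3) concrete, the following sketch shows one way a single prompting rule could be run with different subsets of contextual data enabled. It is a minimal sketch only: the `Context` fields, the thresholds and the `should_prompt` function are hypothetical, and it does not use the API of AWARE-Light or any other app platform.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical snapshot of some contextual data types named in the text."""
    hour: int                 # local time of day, 0-23
    at_usual_location: bool   # e.g. a place where the user often drinks water
    minutes_since_drink: int  # from an intake monitor or self-report
    in_social_setting: bool


def should_prompt(ctx: Context, enabled: set[str], min_gap_minutes: int = 90) -> bool:
    """Return True if a hydration prompt should fire in this context.

    `enabled` lists which data types this intervention arm may use, so that
    arms using different mixes of data share the same underlying rule.
    """
    if ctx.minutes_since_drink < min_gap_minutes:
        return False  # user drank recently; never prompt
    if "time" in enabled and not 8 <= ctx.hour < 21:
        return False  # outside assumed waking hours
    if "location" in enabled and not ctx.at_usual_location:
        return False  # assumed: prompt only where drinking is convenient
    if "social" in enabled and ctx.in_social_setting:
        return False  # assumed: avoid interrupting social situations
    return True
```

Calling `should_prompt(ctx, {"time"})` and `should_prompt(ctx, {"time", "location"})` on the same context then shows how adding a data type changes when the intervention is delivered.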
Outputs. This project will lead to presentations and papers on three quantitative subprojects at both Computer Science and Psychology conferences, as well as a possible qualitative contribution on the dynamic nature of habit formation.
Impact. Results from this work will have implications for the design of health behaviour interventions across domains. This work will further contribute to the emerging theoretical understanding of the formation and context sensitivity of the cognitive processes that support healthy habits. It will explore how sensing and adaptive user modelling can situate both user and AI system in a common contextual frame and whether this facilitates engagement and behaviour change.
- Muñoz, C. X., Johnson, E. C., McKenzie, A. L., Guelinckx, I., Graverholt, G., Casa, D. J., … Armstrong, L. E. (2015). Habitual total water intake and dimensions of mood in healthy young women. Appetite, 92, 81–86. https://doi.org/10.1016/j.appet.2015.05.002
- Nahum-Shani, I., Smith, S. N., Spring, B. J., Collins, L. M., Witkiewitz, K., Tewari, A., & Murphy, S. A. (2018). Just-in-Time Adaptive Interventions (JITAIs) in Mobile Health: Key Components and Design Principles for Ongoing Health Behavior Support. Annals of Behavioral Medicine: A Publication of the Society of Behavioral Medicine, 52(6), 446–462. https://doi.org/10.1007/s12160-016-9830-8
- Papies, E. K., Barsalou, L. W., & Rusz, D. (2020). Understanding Desire for Food and Drink: A Grounded-Cognition Approach. Current Directions in Psychological Science, 29(2), 193–198. https://doi.org/10.1177/0963721420904958
- Perrier, E. T., Armstrong, L. E., Bottin, J. H., Clark, W. F., Dolci, A., Guelinckx, I., Iroz, A., Kavouras, S. A., Lang, F., Lieberman, H. R., Melander, O., Morin, C., Seksek, I., Stookey, J. D., Tack, I., Vanhaecke, T., Vecchio, M., & Péronnet, F. (2020). Hydration for health hypothesis: A narrative review of supporting evidence. European Journal of Nutrition. https://doi.org/10.1007/s00394-020-02296-z
- Rodger, A., Wehbe, L., & Papies, E. K. (2020). “I know it’s just pouring it from the tap, but it’s not easy”: Motivational processes that underlie water drinking. Under Review. https://psyarxiv.com/grndz
- Wang, L., & Miller, L. C. (2020). Just-in-the-Moment Adaptive Interventions (JITAI): A Meta-Analytical Review. Health Communication, 35(12), 1531–1544. https://doi.org/10.1080/10410236.2019.1652388
Sharing the road: Cyclists and automated vehicles
Automated vehicles must share the road with pedestrians and cyclists, and drive safely around them. Autonomous cars, therefore, must have some form of social intelligence if they are to function correctly around other road users. There has been work looking at how pedestrians may interact with future autonomous vehicles [ROT15] and potential solutions have been proposed (e.g. displays on the outside of cars to indicate that the car has seen the pedestrian). However, there has been little work on automated cars and cyclists.
When there is no driver in the car, social cues such as eye contact, waving, etc., are lost [ROT15]. This changes the social interaction between the car and the cyclist, and may cause accidents if it is no longer clear, for example, who should proceed. Automated cars also behave differently to cars driven by humans, e.g. they may appear more cautious in their driving, which the cyclist may misinterpret. The aim of this project is to study the social cues used by drivers and cyclists, and create multimodal solutions that can enable safe cycling around autonomous vehicles.
The first stage of the work will be observation of the communication between human drivers and cyclists through literature review and fieldwork. The second stage will be to build a bike into our driving simulator [MAT19] so that we can test interactions between cyclists and drivers safely in a simulation.
We will then start to look at how we can facilitate the social interaction between autonomous cars and cyclists. This will potentially involve visual displays on cars or audio feedback from them, to indicate state information to cyclists nearby (e.g. whether they have been detected, whether the car is letting the cyclist go ahead). We will also investigate interactions and displays for cyclists, for example multimodal displays in cycling helmets [MAT19] to give them information about car state (which could be collected by V2X software on the cyclist’s phone, for example), or ways of communicating directly with the car through input made on the handlebars or via gestures. These will be experimentally tested in the simulator and, if we have time, in highly controlled real driving scenarios.
The output of this work will be a set of new techniques to support the social interaction between autonomous vehicles and cyclists. We currently work with companies such as Jaguar Land Rover and Bosch and our results will have direct application in their products.
[ROT15] Rothenbucher, D., Li, J., Sirkin, D. and Ju, W., Ghost driver: a platform for investigating interactions between pedestrians and driverless vehicles, Adjunct Proceedings of the International Conference on Automotive User Interfaces and Interactive Vehicular Applications, pp. 44–49, 2015.
[MAT19] Matviienko, A., Brewster, S., Heuten, W. and Boll, S. Comparing unimodal lane keeping cues for child cyclists (https://doi.org/10.1145/3365610.3365632), Proceedings of the 18th International Conference on Mobile and Ubiquitous Multimedia, November, pp. 1-11, 2019.
Social Interaction via Touch Interactive Volumetric 3D Virtual Agents
Vision and touch based interactions are fundamental modes of interaction between humans and between humans and the real world. Several portable devices use these modes to display gestures that communicate social messages such as emotions. Recently, non-volumetric 3D displays have attracted considerable interest because they give users a 3D visual experience – for example, 3D movies provide viewers with a perceptual sensation of depth via a pair of glasses. Using a newly developed haptics-based holographic 3D volumetric display, this project will develop these new forms of social interactions with virtual agents. Unlike various VR tools that require headsets (which can lead to motion sickness), here the interaction with 3D virtual objects will be less restricted, closer to its natural form, and, critically, give the user the illusion that the virtual agent is physically present. The experiments will involve interactions with holographically displayed virtual human faces and bodies engaging in various social gestures. To this end, the simulated 2D images showing these various gestures will be displayed mid-air in 3D. For enriched interaction and enhanced realism, this project will also involve hand gesture recognition and controlling haptic feedback (i.e. air patterns) to simulate the surface of several classes of virtual objects. This fundamental study is transformative for sectors where physical interaction with virtual objects is critical, including medical, mental health, sports, education, heritage, security, and entertainment.
Evaluating and Shaping Cognitive Training with Artificial Intelligence Agents
Virtual reality (VR) has emerged as a promising tool for cognitive training for several neurological conditions (e.g. mild cognitive impairment, acquired brain injury) as well as for enhancing healthy ageing and reducing the impact of mental health conditions (e.g. anxiety and fear). Cognitive training refers to behavioural training that results in enhancement of specific cognitive abilities such as visuospatial attention and working memory. Using VR for such training offers several advantages towards achieving improvements, including its high level of versatility and its ability to dynamically adjust difficulty in real-time. Furthermore, it is an immersive technology and thus has great potential to increase motivation and compliance in subjects. Currently, VR and serious video games come in a wide variety of shapes and forms and the emerging data are difficult to quantify and compare in a meaningful way (Sokolov et al., 2020).
This project aims to exploit machine learning to develop intuitive measures of cognitive training in a platform-independent way. The project is challenging as there is great variability in cognitive measures even in well controlled/designed lab experiments (Learmonth et al., 2017; Benwell et al., 2014). The objectives of the project are:
- Predict psychological dimensions (e.g. enjoyment, anxiety, valence and arousal) based on performance and neurophysiological data.
- Relate performance improvements (e.g. learning rate) to psychological dimensions and physiological data (e.g. EEG and eye-tracking).
- Develop artificial intelligence approaches that are able to modulate the VR world to control learning rate and participant satisfaction.
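The third objective can be pictured as a closed loop in which task difficulty is adjusted from recent performance. The sketch below uses a simple proportional rule as a stand-in for the learned controller the project envisages; the function name, the target success rate and the step size are assumptions for illustration, and psychological or physiological inputs are deliberately left out.

```python
def update_difficulty(difficulty: float, recent_outcomes: list[bool],
                      target_success: float = 0.75, step: float = 0.1) -> float:
    """Nudge difficulty (0.0 to 1.0) so the recent success rate approaches the target."""
    if not recent_outcomes:
        return difficulty  # nothing observed yet; leave the setting unchanged
    success_rate = sum(recent_outcomes) / len(recent_outcomes)
    # Above target means the task is too easy, so difficulty rises;
    # below target it falls. The change is proportional to the gap.
    new_difficulty = difficulty + step * (success_rate - target_success)
    return min(1.0, max(0.0, new_difficulty))
```

A learned policy would replace the fixed rule, and could take predicted enjoyment or workload as additional inputs alongside the success rate.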
VR is a promising new technology that provides new means of building frameworks that will help to improve socio-cognitive processes. Machine learning methods that dynamically control aspects of the VR games are critical to enhanced engagement and learning rates (Darzi et al. 2019, Freer et al. 2020). Developing continuous measures of spatial attention, cognitive workload and overall satisfaction would provide intuitive ways for users to interact with the VR technology and allow the development of a personalised experience. Furthermore, these measures will play a significant role in objectively evaluating and shaping new emerging VR platforms and this approach will thus generate significant industrial interest.
[BEN14] Benwell, C.S.Y, Thut, G., Grant, A. and Harvey, M. (2014). A rightward shift in the visuospatial attention vector with healthy aging. Frontiers in Aging Neuroscience, 6, article 113, 1-11.
[DAR19] A. Darzi, T. Wondra, S. McCrea and D. Novak (2019). Classification of Multiple Psychological Dimensions in Computer Game Players Using Physiology, Performance, and Personality Characteristics. Frontiers in Neuroscience, 2019.
[FRE20] D. Freer, Y. Guo, F. Deligianni and G-Z. Yang (2020). On-Orbit Operations Simulator for Workload Measurement during Telerobotic Training. IEEE RA-L, https://arxiv.org/abs/2002.10594.
[LEA17] Learmonth, G., Benwell, C.S.Y., Thut, G. and Harvey, M. (2017). Age-related reduction of hemispheric lateralization for spatial attention: an EEG study. NeuroImage, 153, 139-151.
[SOK20] A. Sokolov, A. Collignon and M. Bieler-Aeschlimann (2020). Serious video games and virtual reality for prevention and neurorehabilitation of cognitive decline because of aging and neurodegeneration. Current Opinion in Neurology, 33(2), 239-248.
Detecting Affective States based on Human Motion Analysis
Human motion analysis is a powerful tool for extracting biomarkers of disease progression in neurological conditions, such as Parkinson’s disease and Alzheimer’s. Gait analysis has also revealed several indices that relate to emotional well-being. For example, increased gait speed, step length and arm swing have been related to positive emotions, whereas a low gait initiation reaction time and flexion of posture have been related to negative feelings (Deligianni et al. 2019). Strong neuroscientific evidence shows that these relationships arise from an interaction between the brain networks involved in gait and emotion. Therefore, it comes as no surprise that gait has also been related to mood disorders, such as depression and anxiety.
In this project, we aim to investigate the relationship between affective mental states and psychomotor abilities in relation to gait, balance and posture while emotions are modulated via augmented reality displays. The goal is to develop a comprehensive continuous map of interrelationships in both normal subjects and subjects affected by a mood disorder. In this way, we will derive objective measures that allow us to detect early signs of abnormalities and intervene via intelligent social agents. This is a multi-disciplinary project with several challenges to address:
- Build a robust experimental setup of intuitive naturalistic paradigms.
- Develop AI algorithms to relate neurophysiological data to gait characteristics based on state-of-the-art motion capture systems (taking into account motion artefacts during gait).
- Develop AI algorithms to improve detection of gait characteristics via RGB-D cameras (Gu et al. 2020) and possibly new assistive living technologies based on pulsed laser beams.
The proposed AI technology for social agents has several advantages. It can enable the development of intelligent social agents that track mental well-being based on objective measures and provide personalised feedback and suggestions. In several cases, assessment is done based on self-reports via mobile apps. These measures of disease progression are subjective and it has been found that in major disorders they do not correlate well with objective evaluations. Furthermore, measurements of gait characteristics are continuous and they can reveal episodes of mood disorders that are not present when the subject visits a health practitioner. This approach might shed light on subject variability in relation to behavioural therapy and provide more opportunities for earlier intervention (Queirazza et al. 2019). Finally, compared to other state-of-the-art affect recognition approaches, human motion analysis might pose fewer privacy issues and enhance users’ trust and comfort with the technology. In several situations, where facial expressions are not easy to track, human motion analysis is far more accurate in classifying subjects with mental disorders.
[DEL19] F Deligianni, Y Guo, GZ Yang, ‘From Emotions to Mood Disorders: A Survey on Gait Analysis Methodology’, IEEE journal of biomedical and health informatics, 2019.
[GUO19] Y Guo, F Deligianni, X Gu, GZ Yang, ‘3-D Canonical pose estimation and abnormal gait recognition with a single RGB-D camera’, IEEE Robotics and Automation Letters, 2019.
[XGU20] X Gu, Y Guo, F Deligianni, GZ Yang, ‘Coupled Real-Synthetic Domain Adaptation for Real-World Deep Depth Enhancement.’, IEEE Transactions on Image Processing, 2020.
[QUE19] F Queirazza, E Fouragnan, JD Steele, J Cavanagh and MG Philiastides, Neural correlates of weighted reward prediction error during reinforcement learning classify response to Cognitive Behavioural Therapy in depression, Science Advances, 5 (7), 2019.
Developing a digital avatar that appears to be “alive”
How should we design a digital avatar so that it appears sentient, i.e. “alive”? Digital avatars can engage with humans to interact socially. However, how should we design such avatars so that they have a realistic appearance that promotes engagement with a human? Building on the strength of digital avatar design in the Institute of Neuroscience and Psychology and the social robotics research in the School of Computing Science, we will combine methods from human psychophysics, computer graphics, machine vision and social robotics to design such a sentient avatar (presented in VR or on a computer screen). We will start with the resting, default state of the avatar. Our research will aim to make it look like a sentient being (e.g. with a realistic appearance and spontaneous dynamic movements of the face and the eyes), who can then engage with humans (i.e. track their presence, engage with realistic eye contact and so forth).
Aims and Objectives
More specifically this project will attempt to achieve the following scientific and technological goals:
- Identify the default face movements (including eye movements) that produce a realistic sentient appearance.
- Implement those movements on a digital avatar which can be displayed on a computer screen or in VR.
- Use tracking software to detect human beings in the environment, follow their movements and engage with realistic eye contact.
- Develop models to link human behaviour with avatar movements to encourage engagement.
- Evaluate the performance of the implemented models through deployment in labs and in public spaces.
Output and Impact
Digital avatars that can engage their users and communicate accurate social information remain a research challenge with potentially broad applications in the digital economy and industry, where the tools developed to animate sentient avatars can also animate suitably designed social robots.
Zhan, J., Liu, M., Garrod, O.G.B., Daube, C., Ince, R.A.A., Jack, R.E. & Schyns, P.G. (2021). Modeling individual preferences reveals that face beauty is not universally perceived across cultures. Current Biology, 31, 1-10.
Chen, C., Hensel, L.B., Duan, Y., Ince, R.A.A., Garrod, O.G.B., Beskow, J., Jack, R.E. & Schyns, P.G. (2019). Equipping social robots with culturally-sensitive facial expressions of emotion using data-driven methods. 14th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2019), doi:10.1109/FG.2019.8756570.
Jack, R.E. & Schyns, P.G. (2017). Toward a Social Psychophysics of Face Communication. Annual Review of Psychology, 68, 269-297.
You Never Get a Second Chance to Make a First Impression – Establishing how best to align human expectations about a robot’s performance based on the robot’s appearance and behaviour
Aims and objectives:
- A major aim of social robotics is to create embodied agents that humans can instantly and automatically understand and interact with, using the same mechanisms that they use when interacting with each other. While considerable research attention has been invested in this endeavour, it is still the case that when humans encounter robots, they need time to understand how the robot works; in other words, people need time to learn to read the signals the robot generates. People consistently have expectations that are far too high for the artificial agents they encounter, which often leads to confusion and disappointment.
- If we can better understand human expectations about robot capabilities based on the robot’s appearance (and/or initial behaviours) and ensure that those are aligned with the actual robot abilities, this should accelerate progress in human-robot interaction, specifically in the domains of human acceptance of robots in social settings and cooperative task performance between humans and robots. This project will combine expertise in robotic design and the social neuroscience of how we perceive and interact with artificial agents to develop a socially interactive robot designed for use in public spaces that requires (little or) no learning or effort for humans to interact with while carrying out tasks such as guidance, cooperative navigation, and interactive problem-solving tasks.
- Computing Science: System development and integration (Developing operational models of interactive behaviour and implementing them on robot platforms); deployment of robot systems in lab-based settings and in real-world public spaces
- Psychology/Brain Science: Behavioural tasks (questionnaires and measures of social perception, such as the Social Stroop task), non-invasive mobile brain imaging (functional near infrared spectroscopy) to record human brain activity when encountering the artificial agent in question.
- Empirically based principles for social robot design to optimize alignment between a robot’s appearance, user expectations, and robot performance, based on brain and behavioural data
- A publicly available, implemented, and validated robot system embodying these principles
- Empirical research papers detailing findings for a computing science audience (e.g., ACM Transactions on Human-Robot Interaction), a psychology/neuroscience audience (e.g., Psychological Science, Cognition) and a general audience, drawing on the multidisciplinary aspects of the work (e.g., PNAS, Current Biology), as well as papers at appropriate conferences and workshops such as Human-Robot Interaction, Intelligent Virtual Agents, CHI, and similar.
[Fos17] Foster, M. E.; Gaschler, A.; and Giuliani, M. Automatically Classifying User Engagement for Dynamic Multi-party Human–Robot Interaction. International Journal of Social Robotics. July 2017.
[Fos16] Foster, M. E.; Alami, R.; Gestranius, O.; Lemon, O.; Niemelä, M.; Odobez, J.; and Pandey, A. K. The MuMMER project: Engaging human-robot interaction in real-world public spaces. In Proceedings of the Eighth International Conference on Social Robotics, 2016.
[Cro19] Cross, E. S., Riddoch, K. A., Pratts, J., Titone, S., Chaudhury, B. & Hortensius, R. (2019). A neurocognitive investigation of the impact of socialising with a robot on empathy for pain. Philosophical Transactions of the Royal Society B.
[Hor18] Hortensius, R. & Cross, E.S. (2018). From automata to animate beings: The scope and limits of attributing socialness to artificial agents. Annals of the New York Academy of Science: The Year in Cognitive Neuroscience.
Brain Based Inclusive Design
It is clear to everybody that people differ widely, but the underlying assumption of current technology designs is that all users are equal. The large cost of this is the exclusion of users who fall far from the average that technology designers use as their ideal abstraction (Holmes, 2019). In some cases, the mismatch is evident (e.g., a mouse typically designed for right-handed people is more difficult to use for left-handers) and attempts have been made to accommodate the differences. In other cases, the differences are more subtle and difficult to observe and, to the best of our knowledge, no attempt has yet been made to take them into account. This is the case, in particular, for change blindness (Rensink, 2004) and inhibition of return (Posner & Cohen, 1984), two brain phenomena that limit our ability to process stimuli presented too closely in space and time.
The overarching goal of the project is thus to design Human-Computer Interfaces capable of adapting to the limits of every user, with a view to a fully inclusive design capable of putting every user at ease, i.e., enabling them to interact with technology according to their own processing speed and not according to the speed imposed by technology designers.
The proposed approach includes four steps:
- Development of the methodologies for the automatic measurement of the phenomena described above through their effect on EEG signals (e.g., changes in P1, N1 components; McDonald et al., 1999) and behavioural performance (e.g., in/decreased accuracy, in/decreased reaction times);
- Identification of the relationship between the phenomena above and observable factors such as age, education level, computer familiarity, etc. of the user;
- Adaptation of the technology design to the factors above;
- Analysis of the improvement of the users’ experience.
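For the behavioural side of the first step, a per-user measure of inhibition of return can be as simple as a reaction-time difference between cued and uncued target locations. The sketch below is a minimal illustration under that assumption; the function name and the millisecond inputs are hypothetical, and an EEG-based measure would need a separate pipeline.

```python
from statistics import mean


def ior_effect_ms(cued_rts_ms: list[float], uncued_rts_ms: list[float]) -> float:
    """Mean reaction time at cued minus uncued locations, in milliseconds.

    A positive value indicates slower responses at the previously cued
    location, the behavioural signature of inhibition of return.
    """
    if not cued_rts_ms or not uncued_rts_ms:
        raise ValueError("both conditions need at least one trial")
    return mean(cued_rts_ms) - mean(uncued_rts_ms)
```

A value computed per user in this way could then be related to observable factors such as age or computer familiarity in the second step.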
The main expected outcome is that technology will become more inclusive and capable of accommodating the individual needs of its users in terms of processing speed and ease of use. This will be particularly beneficial for those groups of users that, for different reasons, tend to be penalised in terms of processing speed, in particular older adults and special populations (e.g., children with developmental issues, stroke survivors, and related cohorts).
The project is of great industrial interest because, ultimately, improving the inclusiveness of technology design greatly increases user satisfaction, a crucial requirement for every company that aims to commercialise technology.
[HOL19] Holmes, K. (2019). Mismatch, MIT Press.
[MCD99] McDonald, J., Ward, L.M. & Kiehl, A.H. (1999). An event-related brain potential study of inhibition of return. Perception and Psychophysics, 61, 1411–1423.
[POS84] Posner, M.I. & Cohen, Y. (1984). “Components of visual orienting”. In Bouma, H.; Bouwhuis, D. (eds.). Attention and performance X: Control of language processes. Hillsdale, NJ: Erlbaum. pp. 531–56.
[RES04] Rensink, R.A. (2004). Visual Sensing without Seeing. Psychological Science, 15, 27-32.
Modelling Conversational Facial Signals for Culturally Sensitive Artificial Agents
In spoken interactions, face-to-face meetings are often preferred. This is because the human face is highly expressive and can facilitate coordinated interactions. Embodied conversational agents with expressive faces therefore have the potential for smoother interactions than voice assistants. However, knowledge of how the face expresses these social signals (the “language” of facial expressions) is limited, with no coherent modelling framework (e.g., see Jack & Schyns, 2017). For example, current models focus primarily on basic emotions such as fear, anger and happiness, which are neither well suited to everyday conversations nor recognized consistently across cultures (e.g., Jack, 2013). Instead, signals of affirmation, uncertainty, interest, and turn-taking in different cultures (e.g., Chen et al., 2015) are more relevant (e.g., Skantze, 2016). Conversational digital agents typically employ these signals in an ad hoc manner, with smiles or frowns manually inserted at speech-coordinated time points. However, this is costly, time consuming, and provides only a limited repertoire of often Western-centric face signals, which in turn restricts the utility of conversational agents.
To address this knowledge gap, this project will (a) develop a modelling framework for conversationally relevant facial expressions in distinct cultures – East Asian and Western, (b) develop methods to automatically generate these facial expressions in conversational systems, and (c) evaluate these models in different human-robot cultural interaction settings. This automatic modelling will coordinate with the agent’s speech (e.g. auto-inserting smiles at appropriate times), the user’s behaviour (e.g. directing gaze and raising eyebrows when the user starts speaking), and the agent’s level of understanding (e.g. frowning during low comprehension).
We will employ state-of-the-art 3D capture of human-human interactions and psychological data-driven methods to model dynamic facial expressions (see Jack & Schyns, 2017). We will deploy these models using FurhatOS – a software platform for human-robot interactions – and the Furhat robot head, which has a highly expressive animated face with superior social signalling capacity compared to other platforms (Al Moubayed et al., 2013). The flexibility of Furhat’s display system, combined with state-of-the-art psychologically derived 3D face models, will also enable exploration of other socially relevant facial characteristics, such as ethnicity, gender, and age (e.g., see Zhan et al., 2020).
The results will be highly relevant to companies developing virtual agents/social robots, such as Furhat Robotics. Skantze, Furhat Robotics co-founder/chief scientist, will facilitate impact of the results. The project will also inform fundamental knowledge of human-human and human-robot interactions by precisely characterizing how facial signals facilitate spoken interactions. We anticipate outputs in international psychology and computer science conferences (e.g., Society for Personality and Social Psychology; IEEE Automatic Face & Gesture Recognition) and high-profile scientific outlets (e.g., Nature Human Behaviour). Jack is PI of a large-scale funded laboratory specializing in modelling facial expressions across cultures.
Year 1 (Master’s): Training in (a) programming human-robot interactions; (b) data-driven modelling of dynamic facial expressions.
Year 2 – 3: Data-driven modelling of dynamic conversational facial expressions in each culture.
Year 3 – 4: Application and evaluation of facial expression models in human-robot interaction scenarios.
- Jack, R. E. & Schyns, P. G. (2017). Toward a social psychophysics of face communication. Annual Review of Psychology, 68, 269-297.
- Jack, R. E. (2013). Culture and facial expressions of emotion. Visual Cognition, 21(9-10), 1248-1286.
- Chen, C., Garrod, O., Schyns, P., & Jack, R. (2015). The face is the mirror of the cultural mind. Journal of Vision, 15(12), 928-928.
- Skantze, G. (2016). Real-time coordination in human-robot interaction using face and voice. AI Magazine, 37(4), 19-31.
- Al Moubayed, S., Skantze, G., & Beskow, J. (2013). The Furhat back-projected humanoid head – lip reading, gaze and multi-party interaction. International Journal of Humanoid Robotics, 10(01), 1350005.
- Zhan, J., Liu, M., Garrod, O. G., Jack, R. E., & Schyns, P. G. (2020, October). A Generative Model of Cultural Face Attractiveness. In Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents (pp. 1-3).
Social and Behavioural Markers of Hydration States
Aims and Objectives. This project will explore whether data derived from a person’s smartphone can be used to establish that person’s hydration status so that, in a well-guided and responsive way, a system can prompt the person to drink water. Many people are frequently underhydrated, which has negative physical and mental health consequences. Low hydration states can manifest in impaired cognitive and physical performance, experiences of fatigue or lethargy, and negative affect (e.g., Muñoz et al., 2015; Perrier et al., 2020). Here, we will establish whether such social and behavioural markers of dehydration can be inferred from a user’s smartphone, and which of these markers, or their combination, are the best predictors of hydration state (Aim 1). Sophisticated user models of hydration states could also be adapted over time, and help to predict possible instances of dehydration in advance (Aim 2). This would be useful because many individuals find it difficult to identify when they need to drink, and could benefit from clear, personalized indicators of dehydration. In addition, smartphones could then be used to prompt users to drink water once a state of dehydration has been detected, or when dehydration is likely to occur. Thus, we will also test how hydration information should be communicated to users to prompt attitude and behaviour change and, ultimately, improve hydration behaviour (Aim 3). Throughout, we will implement data collection, modelling, and feedback on smartphones in a secure way that respects and protects a user’s privacy.
Background and Novelty. The data that can be derived from smartphones (and related digital services) ranges from low-level sensor data (e.g. accelerometers) to patterns of app usage and social interaction. As such, ‘digital phenotyping’ is a rich source of information on an individual’s social and physical behaviours, and affective states. Some recent survey papers on this burgeoning field include Thieme et al. on machine learning in mental health (2020), Chancellor and de Choudhury on using social media data to predict mental health status (2020), Melcher et al. on digital phenotyping of college students (2020), and Kumar et al. on toolkits and frameworks for data collection (2020).
Here, we propose that these types of data may also reflect a person’s hydration state. Part of the project’s novelty lies in exploring a wider range of phone-derived data as a resource for system agency than prior work in this general area, and in pioneering work specifically on hydration. We will relate cognitive and physical performance, fatigue, lethargy and affect to patterns in phone-derived data. We will test whether such data can be harnessed to provide people with personalized, external, actionable indicators of their physiological state, i.e. to facilitate useful behaviour change. This would have clear advantages over existing indicators of dehydration, such as thirst cues or urine colour, which are easy to ignore or override, and/or difficult for individuals to interpret (Rodger et al., 2020).
Methods. We will build on an existing mobile computing framework (e.g. AWARE-Light) to collect reports of a participant’s fluid intake, and to integrate them with phone-derived data. We will attempt to model users’ hydration states, and validate this against self-reported thirst and urine frequency, and self-reported and photographed urine colour (Paper 1). We will then examine in prospective studies if these models can be used to predict future dehydration states (Paper 2). Finally, we will examine effective ways to provide feedback and prompt water drinking, based on individual user models (Paper 3).
Outputs. This project will lead to presentations and papers at both Computer Science and Psychology conferences outlining the principles of using sensing data to understand physiological states, and to facilitate health behaviour change.
Impact. Results from this work will have implications for the use of a broad range of data in health behaviour interventions across domains, as well as for our understanding of the processes underlying behaviour change. This project would also outline new research directions for studying the effects of hydration in daily life.
Chancellor, S., & De Choudhury, M. (2020). Methods in predictive techniques for mental health status on social media: a critical review. Npj Digital Medicine, 3(1), 1–11. http://doi.org/10.1038/s41746-020-0233-7
Melcher, J., Hays, R., & Torous, J. (2020). Digital phenotyping for mental health of college students: a clinical review. Evidence Based Mental Health, 4, ebmental–2020–300180–6. http://doi.org/10.1136/ebmental-2020-300180
Muñoz, C. X., Johnson, E. C., McKenzie, A. L., Guelinckx, I., Graverholt, G., Casa, D. J., … Armstrong, L. E. (2015). Habitual total water intake and dimensions of mood in healthy young women. Appetite, 92, 81–86. https://doi.org/10.1016/j.appet.2015.05.002
Rodger, A., Wehbe, L., & Papies, E. K. (2020). “I know it’s just pouring it from the tap, but it’s not easy”: Motivational processes that underlie water drinking. Under Review. https://psyarxiv.com/grndz
Perrier, E. T., Armstrong, L. E., Bottin, J. H., Clark, W. F., Dolci, A., Guelinckx, I., Iroz, A., Kavouras, S. A., Lang, F., Lieberman, H. R., Melander, O., Morin, C., Seksek, I., Stookey, J. D., Tack, I., Vanhaecke, T., Vecchio, M., & Péronnet, F. (2020). Hydration for health hypothesis: A narrative review of supporting evidence. European Journal of Nutrition. https://doi.org/10.1007/s00394-020-02296-z
Thieme, A., Belgrave, D., & Doherty, G. (2020). Machine Learning in Mental Health. ACM Transactions on Computer-Human Interaction (TOCHI), 27(5), 1–53. http://doi.org/10.1145/3398069
Towards modelling of biological and artificial perspective taking
Context and objectives
Visual imagery, i.e. the ability to form a visual representation of unseen stimuli, is a fundamental developmental step in social cognition. Being able to take the perspective of another observer is the focus of classic paradigms in theory of mind research such as Piaget’s landscape task: overturning an egocentric world view is reached around the age of 4 when children learn to simulate another person’s perspective towards a visual screen and imagine what is in sight of that person (Piaget, 2013).
Visual imagery might be one of the cognitive processes supported by extensive feedback connections from higher order areas and other modalities to the visual system (Clavagnier et al., 2004), as evidenced by the fact that sound content can be decoded from brain activity patterns in the early visual cortex of blindfolded participants (Vetter et al. 2014). Preliminary data from Muckli’s lab also suggests that this result cannot be reproduced in aphantasic participants who report an inability to generate visual imagery (Zeman, Dewar and Della Sala, 2015).
Our project aims to further explore the neural correlates of visual imagery and aphantasia by using neural decoding techniques, which allow the reconstruction of perceived features from human functional magnetic resonance imaging (fMRI) data (Raz et al., 2017). This method will allow us to detect shared representation networks between visual imagery and actual visual perception of the same objects, and to test whether these networks are shared across participants and whether they differ between aphantasics and non-aphantasics.
Proposed methods and expected results
We will use Ultra High Field fMRI to read brain activity while participants (aphantasics and non-aphantasics) are presented with either single-sentence descriptions of object categories (e.g. “a red chair”) or different visual exemplars from the same categories.
Our hypotheses are that, in the visual system, representations of the same categories: (1) will be generalizable between the auditory and visual conditions for the non-aphantasic group, but not for the aphantasic group, (2) will be less generalizable across aphantasics than non-aphantasics in the auditory condition, and (3) will, as a consequence of the previous two points, allow us to discriminate between aphantasic and non-aphantasic participants.
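As an illustration of what "generalizable between the auditory and visual conditions" could mean operationally, the sketch below trains a classifier on activity patterns from one condition and tests it on the other (cross-condition decoding). It is a minimal sketch under stated assumptions, not the project's analysis pipeline: the nearest-centroid classifier, the function names, and the two-dimensional toy patterns are all hypothetical stand-ins for real voxel patterns and whichever decoder is eventually chosen.

```python
def centroids(patterns, labels):
    """Mean activity pattern per object category."""
    sums, counts = {}, {}
    for pattern, label in zip(patterns, labels):
        acc = sums.setdefault(label, [0.0] * len(pattern))
        for k, value in enumerate(pattern):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(cents, pattern):
    """Category whose centroid is nearest (squared Euclidean distance)."""
    return min(
        cents,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(cents[label], pattern)),
    )


def cross_decoding_accuracy(train_x, train_y, test_x, test_y):
    """Train on one condition, test on the other, return proportion correct."""
    cents = centroids(train_x, train_y)
    hits = sum(predict(cents, p) == y for p, y in zip(test_x, test_y))
    return hits / len(test_y)


# Toy data (assumption): two "voxels", two categories.
# The visual condition provides the training patterns ...
visual_x = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
visual_y = ["chair", "chair", "dog", "dog"]
# ... and the auditory (sentence) condition provides the test patterns.
auditory_x = [[1.0, 0.0], [9.0, 10.0]]
auditory_y = ["chair", "dog"]

accuracy = cross_decoding_accuracy(visual_x, visual_y, auditory_x, auditory_y)
print("visual -> auditory decoding accuracy:", accuracy)
```

Under hypothesis 1, one would expect such an accuracy to exceed chance in the non-aphantasic group but not in the aphantasic group; in practice this would need cross-validation and permutation testing, which the sketch omits.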
In Human-Computer Interaction (HCI), we recently developed computational models capable of representing physical and virtual space, solving the problems of how to recognise virtual spatial regions starting from the detected physical position of the users (Benford et al., 2016). We used the models to investigate cognitive dissonance, namely the inability or difficulty to interact with the virtual environment. In this project, we will adapt these computational models and apply them to cognition processes to test hypotheses 1-3. The end goal is to embed them within AI agents to enable empathic-seeming behaviours.
Impact for artificial social intelligence
Our proposal is relevant to the future development and comparison of artificial agents with and without imagery, not only making AI more human-like but also adding the layer of complexity that imagery-based representations provide. We outline key questions where we hypothesize that imagery has a function in social cognition, and where imagery-based artificially intelligent machines can be applied to social phenomena. To what extent is visual imagery an advantage in social AI? Can an agent simulate the perspective another agent has on a view, and match that perspective?
- Clavagnier, S., Falchier, A. & Kennedy, H. (2004) “Long-distance feedback projections to area V1: Implications for multisensory integration, spatial awareness, and visual consciousness”, Cognitive, Affective, & Behavioral Neuroscience 4, 117–126
- Piaget, J. (2013). Child’s Conception of Space: Selected Works vol 4 (Vol. 4). Routledge.
- Vetter, P., Smith, F.W., and Muckli, L. (2014), “Decoding Sound and Imagery Content in the Early Visual Cortex”, Current Biology, 24, 1256-1262.
- Zeman, A., Dewar, M., and Della Sala, S. (2015) “Lives without imagery — Congenital aphantasia”, Cortex, 73, 378-380.
- Raz, G., Svanera, M., Singer, N., Gilam G., Bleich, M., Lin, T., Admon, R., Gonen, T., Thaler, A., Granot, R.Y., Goebel, R., Benini, S., Valente, G. (2017) “Robust inter-subject audiovisual decoding in functional magnetic resonance imaging using high-dimensional regression”, Neuroimage, 163, 244-263
- Benford, S., Calder, M., Rodden, T., & Sevegnani, M. (2016). On lions, impala, and bigraphs: Modelling interactions in physical/virtual spaces. ACM Transactions on Computer-Human Interaction (TOCHI), 23(2), 9.
Deep Learning feature extraction for social interaction prediction in movies and visual cortex
While watching a movie, a viewer is immersed in the spatiotemporal structure of the movie’s audiovisual and high-level conceptual content [Raz19]. The nature of movies induces a natural waxing and waning of more and less socially immersive content. This immersion can be exploited during brain imaging experiments to emulate as closely as possible the everyday human life experience, including brain processes involved in social perception.
The human brain is a prediction machine: in addition to receiving sensory information, it actively generates sensory predictions. It implements this by creating internal models of the world, which are used to predict upcoming sensory inputs. This basic but powerful concept is used in several studies in Artificial Intelligence (AI) to perform different types of prediction, from intermediate frames for video interpolation [Bao19] and irregularity detection [Sabokrou18] to future sound prediction [Oord18].
Although several AI studies focus on using visual features to detect and track actors in a movie [Afouras20], it remains unclear how the brain’s cortical networks for social cognition engage layers of the visual cortex to process the social interaction cues occurring between actors. Several studies suggest that biological motion recognition (the visual processing of others’ actions) is central to understanding interactions between agents and combines top-down social cognition with bottom-up visual processing. We will use cortical layer-specific fMRI at Ultra High Field to read brain activity during movie stimulation. Using the latest advances in Deep Learning [Bao19, Afouras20], we will study how the interaction between two people in a movie is processed, analysing the predictions that occur between frames. The two sets of representations, one from the Deep Learning analysis of the movie and one from the brain responses it evokes, will be compared through model comparison with Representational Similarity Analysis (RSA) [Kriegeskorte08].
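To make the RSA comparison concrete, the sketch below builds a representational dissimilarity matrix (RDM) for each representation and then correlates the two RDMs. It is a minimal sketch under stated assumptions rather than the project's pipeline: correlation distance (1 minus Pearson r) for the RDMs and Spearman rank correlation for comparing them are common choices that we assume here, and the "network features" and "voxel responses" are synthetic placeholders generated on the spot.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rdm_upper(patterns):
    """Upper triangle of the correlation-distance RDM (1 - Pearson r)."""
    n = len(patterns)
    return [1.0 - pearson(patterns[i], patterns[j])
            for i in range(n) for j in range(i + 1, n)]


def ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def rsa_score(model_patterns, brain_patterns):
    """Spearman correlation between the two RDMs' upper triangles."""
    return pearson(ranks(rdm_upper(model_patterns)),
                   ranks(rdm_upper(brain_patterns)))


# Synthetic stand-ins (assumption): 6 movie clips, 20 network features each,
# and 30 "voxel" responses built as a noisy linear mix of those features.
random.seed(0)
n_clips, n_feat, n_vox = 6, 20, 30
model = [[random.gauss(0, 1) for _ in range(n_feat)] for _ in range(n_clips)]
weights = [[random.gauss(0, 1) for _ in range(n_feat)] for _ in range(n_vox)]
brain = [[sum(w * f for w, f in zip(row, clip)) + random.gauss(0, 1)
          for row in weights] for clip in model]

print("RSA score (Spearman):", round(rsa_score(model, brain), 3))
```

Because only the RDMs are compared, the two representations do not need the same dimensionality, which is what allows network activations and fMRI responses to be placed on a common footing.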
The work and its natural extensions will help clarify how the early visual cortex is responsible for guiding attention in social scene understanding. The student will spend time in both domains. In Artificial Intelligence, they will study and analyse state-of-the-art methods in pose estimation and scene understanding. In brain imaging, they will learn how to perform an fMRI study, from data collection and understanding to analysis methods. Together, these two fields will provide a solid background in both brain imaging and artificial intelligence, and teach the student to transfer skills and draw conclusions across domains.
[Afouras20] Afouras, T., Owens, A., Chung, J. S., & Zisserman, A. (2020). Self-supervised learning of audio-visual objects from video. European Conference on Computer Vision (ECCV 2020).
[Bao19] Bao, W., Lai, W. S., Ma, C., Zhang, X., Gao, Z., & Yang, M. H. (2019). Depth-aware video frame interpolation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3703-3712).
[Kriegeskorte08] Kriegeskorte, N., Mur, M., & Bandettini, P. A. (2008). Representational similarity analysis-connecting the branches of systems neuroscience. Frontiers in systems neuroscience, 2, 4.
[Oord18] Oord, A. V. D., Li, Y., & Vinyals, O. (2018). Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748.
[Raz19] Raz, G., Valente, G., Svanera, M., Benini, S., & Kovács, A. B. (2019). A Robust Neural Fingerprint of Cinematic Shot-Scale. Projections, 13(3), 23-52.
[Sabokrou18] Sabokrou, M., Pourreza, M., Fayyaz, M., Entezari, R., Fathy, M., Gall, J., & Adeli, E. (2018, December). Avid: Adversarial visual irregularity detection. In Asian Conference on Computer Vision (pp. 488-505). Springer, Cham.
Developing a public-space robot: MuMMER in Glasgow
The increasing availability of socially-intelligent robots with functionality for a range of purposes, from guidance in museums (Gehle et al., 2015), to companionship for the elderly (Hebesberger et al., 2016), has motivated a growing number of studies attempting to evaluate and enhance Human-Robot Interaction (HRI). But, as Honig and Oron-Gilad’s (2018) review of recent work on understanding and resolving failures in HRI observes, most research has focussed on technical ways of improving robot reliability. They argue that progress requires a ‘holistic approach’ in which ‘[t]he technical knowledge of hardware and software must be integrated with cognitive aspects of information processing, psychological knowledge of interaction dynamics, and domain-specific knowledge of the user, the robot, the target application, and the environment’ (p.16). Honig and Oron-Gilad point to a particular need to improve the ecological validity of evaluating user communication in HRI, by moving away from experimental, single-person environments, with low-relevance tasks, mainly with younger adult users, to more natural settings, with users of different social profiles and communication strategies, where the outcome of successful HRI matters.
This project will combine current advances in the development of real-world social robots with methods and insights from sociolinguistic theory. Specifically, it will make use of the MuMMER robot system, which is a humanoid robot designed to interact naturally and autonomously in public spaces (Foster et al., 2016; Foster et al., 2019). MuMMER was originally designed to entertain and engage visitors to a shopping mall, thereby enhancing their overall experience in the mall. For a robot to be successful in this context, it must support human-robot interaction which is socially acceptable, helpful and entertaining for multiple, diverse users in a real-world context. The sociolinguistic context for enhancing human-robot interaction in a real-world setting will be Scotland’s largest city, Glasgow, home to a substantial socially and ethnically diverse population, with its own range of distinctive dialects and accents, from broad Glaswegian vernacular to educated Scottish Standard English (e.g. Stuart-Smith 1999; Macaulay 2005), as well as ‘Glaswasian’, spoken by Glasgow’s South Asian heritage communities (e.g. Lambert et al., 2007). Glasgow is also one of the most researched dialect areas in the English-speaking world, and so provides a wealth of comparative sociolinguistic material as the basis for the project.
The work on the PhD project will draw on sociolinguistically-informed observational studies of the MuMMER robot deployed in various locations across Glasgow, interacting with users from a range of social, ethnic, and language backgrounds. Based on the findings of these studies, the student will identify necessary technical modifications to the robot’s interaction strategy to respond to and address issues identified when the robot is interacting with a diverse set of users. The modified robot will be deployed in a new set of observational studies; if time permits, this process will be repeated with different deployment locations and different sets of users to ensure that as many Glaswegians as possible are ultimately able to interact comfortably with the robot, whatever their background.
- Clift, R. (2016). Conversation Analysis. Cambridge: Cambridge University Press.
- Coupland, N., Sarangi, S., & Candlin, C. N. (2014). Sociolinguistics and social theory. Routledge.
- Foster M.E., Alami, R., Gestranius, O., Lemon, O., Niemela, M., Odobez, J-M., Pandey, A.M. (2016) The MuMMER Project: Engaging Human-Robot Interaction in Real-World Public Spaces. In: Agah A., Cabibihan J., Howard A., Salichs M., He H. (eds) Social Robotics. ICSR 2016. Lecture Notes in Computer Science, vol 9979. Springer, Cham
- Foster, M.E. et al. (2019). MuMMER: Socially Intelligent Human-Robot Interaction in Public Spaces. Proceedings of the AAAI Fall Symposium of Artificial Intelligence for Human-Robot Interaction (AI-HRI 2019).
- Gehle, R., Pitsch, K., Dankert, T., Wrede, S. (2015). Trouble-based group dynamics in real-world HRI – Reactions on unexpected next moves of a museum guide robot. In: 24th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2015 (Kobe), 407–412.
- Hebesberger, D., Dondrup, C., Koertner, T., Gisinger, C., Pripfl, J. (2016). Lessons learned from the deployment of a long-term autonomous robot as companion in physical therapy for older adults with dementia: A mixed methods study. In: The Eleventh ACM/IEEE International Conference on Human Robot Interaction, 27–34.
- Honig, S., & Oron-Gilad, T. (2018). Understanding and Resolving Failures in Human-Robot Interaction: Literature Review and Model Development. Frontiers in Psychology, 9, 861.
- Lambert, K., Alam, F., & Stuart-Smith, J. (2007). Investigating British Asian accents: studies from Glasgow. In J. Trouvain & W. Barry (Eds.), 16th International Congress of Phonetic Sciences (Issue August, pp. 1509–1512). Universität des Saarlandes.
- Macaulay, R. K. S. (2005). Talk that Counts: Age, Gender, and Social Class Differences in Discourse. Oxford: Oxford University Press.
- Stuart-Smith, J. (1999). Glasgow: Accent and voice quality. In P. Foulkes & G. J. Docherty (Eds.), Urban voices: Accent Studies in the British Isles (pp. 203–222). Arnold.
Digital user representations and perspective taking in mediated communication
Human social interaction is increasingly mediated by technology, with many of the signals present in traditional face-to-face interaction being replaced by digital representations (e.g., avatars, nameplates, and emojis). To communicate successfully, participants in a conversational interaction must keep track of the identities of their co-participants, as well as the “common ground” they share with each: the dynamically changing set of mutually held beliefs, knowledge, and suppositions. Perceptual representations of interlocutors may serve as important memory cues to shared information in communicative interaction (Horton & Gerrig, 2016; O’Shea, Martin, & Barr, 2021). Our main question concerns how digital representations of users across different interaction modalities (text, voice, video chat) influence the development of and access to common ground during communication.
To examine the impact of digital user representations on real-time language production and comprehension, the project will use a variety of behavioral methods including visual world eye-tracking (Tanenhaus et al., 1995), latency measures, and analysis of speech/text content. In the first phase of the project, we will examine how well people can keep track of who said what during a discourse depending on the abstract versus rich nature of user representations (e.g., from abstract symbols to dynamic avatar-based user representations), and how these representations impact people’s ability to tailor messages to their interlocutors, as well as to correctly interpret a communicator’s intended meaning. For example, in one such study, we will test participants’ ability to track “conceptual pacts” (Brennan & Clark, 1996) with a pair of interlocutors during an interactive task where each partner appears (1) through a video stream; (2) as an animated avatar; or (3) as a static user icon. In the second phase, we will examine whether the nature of the user representation during encoding affects the long-term retention of common ground information.
In support of the behavioural experiments, this project will also involve developing a range of conversational agents, both embodied and speech-only, and defining appropriate behaviour models to allow those agents to take part in the studies. The defined behaviour will incorporate both verbal interaction and non-verbal actions, to replicate the full richness of human face-to-face conversation (Foster, 2019; Bavelas et al., 1997). Insights and techniques developed during the project are intended to improve interfaces for computer-mediated human communication.
- Bavelas, J. B., Hutchinson, S., Kenwood, C., & Matheson, D. H. (1997). Using Face-to-face Dialogue as a Standard for Other Communication Systems. Canadian Journal of Communication, 22(1).
- Brennan, S. E., & Clark, H. H. (1996). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 1482.
- Foster, M. E. (2019). Face-to-face conversation: why embodiment matters for conversational user interfaces. Proceedings of the 1st International Conference on Conversational User Interfaces – CUI ’19.
- Horton, W. S., & Gerrig, R. J. (2016). Revisiting the memory‐based processing approach to common ground. Topics in Cognitive Science, 8, 780-795.
- O’Shea, K. J., Martin, C. R., & Barr, D. J. (2021). Ordinary memory processes in the design of referring expressions. Journal of Memory and Language, 117, 104186.
- Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268, 1632-1634.
Facilitating Parental Insight and Moderation for Social Virtual / Mixed Reality
N.B. This proposal is part-funded by Facebook Reality Labs. As affordable, consumer-oriented mixed reality headsets find their way into the home, it becomes increasingly likely such technology will see adoption by children and adolescents, particularly for social Augmented and Virtual Reality (AR/VR). Where a new disruptive technology has entered the market, parental understanding, supervision, and controls have typically lagged, leading to a window (often years wide) where children and adolescents experience unsupervised access. Whilst this can be beneficial (e.g. in terms of technological literacy), historically there have been examples where this lack of safeguards has led to children experiencing new forms of bullying, harassment and abuse, often unbeknownst to parents.
This lack of parental insight and control is particularly important when we consider what new forms of potential misuse and abuse are made possible by embodied social experiences. For example, our own research has shown how VR can be differently affective compared to non-VR [1] and explored the ethical [2] and security [3] challenges posed by these technologies. This project aims to explore potential safeguards for adolescent use of social mixed reality experiences. The main objectives are to:
1) Understand how parents might moderate and limit the social VR experience, informed by prior practice in 2D/social web platforms. For example, Social VR could offer a safe platform for self-discovery and expression when growing up, and there is a tension between protecting children whilst avoiding stifling this growth / freedom of expression that needs to be explored. What moderation approaches are needed; can we adequately sense the circumstances in which they should apply; and how should moderation be enacted (e.g. invisibly to the child, or perhaps via an artificial agent serving as a proxy for the parent’s supervision)?
2) Develop methods to provide parents with insight into current/past social VR experiences. Insight is important because parents/guardians can help children process traumatic or difficult experiences and inform use of parental moderation controls. Given some knowledge of these social, online activities, parents will be empowered to help guide their children through these new worlds. There are challenges here regarding how we can identify, or enable self-report of, key sensitive events, and how to present this information in forms parents can easily manage and understand. We envisage that journaling approaches (e.g. video/textual excerpts and descriptions) could enable parents to gain retrospective insights, for example, whilst more real-time alerts might be more pertinent for younger children, or particularly sensitive events.
This project will explore these challenges both qualitatively (e.g. surveys, focus groups, interviews) and quantitatively (e.g. capturing and interpreting the social signals and metadata made available in social VR experiences such as VR Chat and Facebook Horizon). The successful student will explore the risks of social mixed reality to children and adolescents from an HCI / usable security perspective, and prototype (e.g. through co-design) tools for parental insight and moderation, evaluating them in terms of safety, efficacy, and the extent to which they support freedom of expression. This project is aligned with the interests of Facebook Reality Labs, which is one of the leading drivers in building the future of connection within virtual and augmented reality. The project has the potential for high impact in guiding the future of social mixed reality for children and adolescents.
[1] Graham Wilson and Mark McGill. 2018. Violent Video Games in Virtual Reality: Re-Evaluating the Impact and Rating of Interactive Experiences. In Proceedings of the 2018 Annual Symposium on Computer-Human Interaction in Play (CHI PLAY ’18). DOI:https://doi.org/10.1145/3242671.3242684
[2] Jan Gugenheimer, Mark McGill, Samuel Huron, Christian Mai, Julie Williamson, and Michael Nebeling. 2020. Exploring Potentially Abusive Ethical, Social and Political Implications of Mixed Reality Research in HCI. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems (CHI EA ’20). DOI:https://doi.org/10.1145/3334480.3375180
[3] Ceenu George, Daniel Buschek, Mohamed Khamis, and Heinrich Hussmann. 2019. Investigating the Third Dimension for Authentication in Immersive Virtual Reality and in the Real World. In Proceedings of the 26th IEEE Conference on Virtual Reality and 3D User Interfaces (IEEE VR 2019).
Better Look Away :-): Using AI methods to understand Gaze Aversion in Real and Mixed Reality Settings (exploring the Tell-Tale Task)
Main aims and objectives
The eyes are said to be a window to the brain. The way we move our eyes reflects our cognitive processes and visual interests, and we use our eyes to coordinate social interactions (e.g., to take turns in conversations). While there is a lot of research on attentive user interfaces that respond to users’ gaze, and on directing users’ gaze towards targets, there is relatively little work on understanding and eliciting gaze aversion. This is unfortunate, as the ability to not look is a classic psychological and neural measure of how much people are in voluntary control over their environment. In fact, people often avert their eyes to alleviate a negative social experience (such as avoiding a fight), and in some cultures looking someone directly in the eyes can be seen as disrespectful.
Efficient gaze aversion is thus an essential adaptive response, and its brain correlates have been mapped extensively. The main aim of this project is to investigate and enhance/train gaze aversion using virtual environments. Two potential examples will be considered in the first instance. First, cultural gaze aversion training will accustom users to cultural norms before they encounter such a situation. Second, gaze elicitation and aversion will be integrated into augmented reality glasses to nudge the user to avert (or instead direct, as appropriate) their gaze when encountering, for example, an aggressive or socially desirable scenario. Another example could be the use of gaze aversion in mixed reality applications: guiding the user’s gaze and nudging them to look at some targets and away from others can help guide them in virtual environments, or ensure they see important elements of 360° videos.
This research is at the intersection of eye tracking, psychology and human-computer interaction. It will involve both empirical and technical work, exploring the opportunities and challenges of detecting and eliciting intentional and unintentional gaze aversion. Using an eye-tracker as well as a virtual reality headset we will a) investigate and evaluate methods for eliciting explicit and implicit gaze aversion guided by previous research on gaze direction [4,6]; b) study the impact of intentional and unintentional gaze aversion on the brain by measuring its impact on saccadic reaction times, error rates, and other metrics; and c) utilize the findings and developed methods in one or more application areas. Programming skills are required for this project, and previous experience in conducting controlled empirical studies is also a plus.
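As an illustration of objective (b), the two named measures can be computed from per-trial eye-tracking records. The following is a minimal sketch, assuming each trial has already been reduced to a target onset time, the onset time and direction of the first saccade, and a task label; the Trial fields and function names are hypothetical and not taken from any particular eye-tracking toolkit.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Trial:
    target_onset_ms: float   # time at which the peripheral target appeared
    saccade_onset_ms: float  # time at which the first saccade started
    target_side: str         # "left" or "right"
    saccade_side: str        # direction of the first saccade
    task: str                # "pro" (look at target) or "anti" (look away)


def reaction_time(trial):
    """Saccadic reaction time: first-saccade onset minus target onset."""
    return trial.saccade_onset_ms - trial.target_onset_ms


def is_error(trial):
    """A pro-saccade away from the target, or an anti-saccade towards it."""
    looked_at_target = trial.saccade_side == trial.target_side
    return looked_at_target if trial.task == "anti" else not looked_at_target


def summarise(trials, task):
    """Error rate and median reaction time of correct trials for one task."""
    subset = [t for t in trials if t.task == task]
    if not subset:
        raise ValueError(f"no trials for task {task!r}")
    correct_rts = [reaction_time(t) for t in subset if not is_error(t)]
    return {
        "n": len(subset),
        "error_rate": sum(is_error(t) for t in subset) / len(subset),
        "median_rt_ms": median(correct_rts) if correct_rts else None,
    }


# Hypothetical anti-saccade trials: the second one is a failure to look away.
trials = [
    Trial(0, 310, "left", "right", "anti"),
    Trial(0, 190, "left", "left", "anti"),
    Trial(0, 290, "right", "left", "anti"),
    Trial(0, 330, "right", "left", "anti"),
]
anti_summary = summarise(trials, "anti")
```

Reporting the median of correct trials only is one common convention; a real analysis would also need rules for excluding anticipatory or missing saccades, which are left out here.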
Likely outputs and impact
The results will inform knowledge and generate state-of-the-art tools on how to best design virtual environments that optimize and measure eye-movement control. The topic spans Psychology, Neuroscience and Computing Science, and we thus envisage publications in journals and conferences that reach a wide academic audience, spanning a range of expertise (e.g. Psychological Science, PNAS, ACM CHI, PACM IMWUT, ACM TOCHI).
[1] Ellis, S., Candrea, R., Misner, J., Craig, C. S., Lankford, C. P., & Hutchinson, T. E. (1998, June). Windows to the soul? What eye movements tell us about software usability. In Proceedings of the usability professionals’ association conference (pp. 151-178).
[2] Majaranta, P., & Bulling, A. (2014). Eye tracking and eye-based human–computer interaction. In Advances in physiological computing (pp. 39-65). Springer, London.
[3] Khamis, M., Alt, F., & Bulling, A. (2018, September). The past, present, and future of gaze-enabled handheld mobile devices: Survey and lessons learned. In Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services (pp. 1-17).
[4] Rothe, S., Althammer, F., & Khamis, M. (2018, November). GazeRecall: Using gaze direction to increase recall of details in cinematic virtual reality. In Proceedings of the 17th International Conference on Mobile and Ubiquitous Multimedia (pp. 115-119).
[5] Butler, S.H., Rossit, R., Gilchrist, I.D., Ludwig, C.J., Olk, B., Muir, R., Reeves, I. and Harvey, M. (2009) Non-lateralised deficits in anti-saccade performance in patients with hemispatial neglect. Neuropsychologia, 47, 2488-2495.
[6] Salvia, E., Harvey, M., Nazarian, B. and Grosbras, M-H. (2020). Social perception drives eye-movement related brain activity: evidence from pro- and anti-saccades to faces. Neuropsychologia, 139, 107360.
To style-shift is human? Testing the limits of adaptable conversational interfaces
An important element of successful human interaction is speakers’ ability to adapt their talk in response to social context and their human interlocutors (Giles et al. 1991). Sociolinguistic research on context-driven style-shifting in spoken language shows a range of adaptive linguistic behaviours, from switching to and from standard/non-standard dialects to fine-grained variation in speech sounds, rate and pitch (Coupland 2007). It also reveals limits on socially-appropriate accommodation: imagine the social consequences for a standard Southern English speaker trying to adopt Glaswegian accent features in an informal context like a bar; linguistic hyper-correction in highly formal contexts such as interviews can be equally problematic (Labov 1966).
A generally-held view is that enabling machines to communicate more effectively with humans, as humans do with each other, requires artificial agents also to be responsive, adapting to their human interlocutors in ways which are appropriate for the specific communication goal (e.g., Axelsson and Skantze 2020). However, the development of socially acceptable, adaptive AI requires information about human-AI interaction which is currently lacking. For linguistic human-AI accommodation in particular, we need to know:
- How do humans style-shift when speaking to artificial agents/conversational interfaces (e.g. digital assistants such as Alexa, Siri etc)? The few small-scale studies which exist suggest that humans may adapt to artificial agents, but little is known for certain (e.g. Staum-Casasanto et al 2010; Ferenc et al 2019)
- What is the impact of human style-shifting on communicative efficacy in human-AI interaction, especially shifts to and from standard dialects? Early work suggests that artificial agents are poorly equipped to deal with non-standard dialects such as African-American Vernacular English (Lopez-Lloreda, 2020). No formal evidence exists for style-shifting involving non-standard dialects in the UK context.
- What kinds of adaptive linguistic behaviours from artificial agents are socially acceptable and/or communicatively effective for humans (Cassell, 2009)? Are there limits on what human interlocutors can tolerate in terms of linguistic adaptation in artificial agents?
This project will work in a specific sociolinguistic context – the Central Belt of Scotland, with a well-known standard-non-standard dialect continuum, and predictable context-induced style-shifting (e.g. Macaulay 2005). It will model human-artificial agent interaction in the context of a specific communicative goal, e.g. a service encounter, incorporating a set number of conversational pragmatic routines, bounded by opening and closing speech acts (Clift 2016). The project will consist of a series of experiments which test human style-shifting in interaction with a custom-developed and constantly-refined conversational agent; in particular, the experiments will measure the impact of adapting the agent’s linguistic responses on both social acceptability and communicative efficacy.
References
- Axelsson, N. and Skantze, G. (2020). Using knowledge graphs and behaviour trees for feedback-aware presentation agents. Proceedings of Intelligent Virtual Agents 2020. https://doi.org/10.1145/3383652.3423884
- Cassell J. (2009) Social Practice: Becoming Enculturated in Human-Computer Interaction. UAHCI 2009. https://doi.org/10.1007/978-3-642-02713-0_32
- Clift, R. (2016). Conversation Analysis. Cambridge: Cambridge University Press.
- Coupland, N. (2007). Style: Language Variation and Identity. CUP.
- Ferenc, B., Cohn, M., & Zellou, G. (2019). Perceptual adaptation to device and human voices: learning and generalization of a phonetic shift across real and voice-AI talkers. https://doi.org/10.21437/Interspeech.2019-1433
- Giles, H., Coupland, N., & Coupland, J. (1991). Contexts of accommodation. CUP.
- Labov, W. (1966). The effect of social mobility on linguistic behavior. Sociological Inquiry, 36(2), 186–203.
- Lopez-Lloreda, C. (2020). How Speech-Recognition Software Discriminates against Minority Voices. Scientific American, 323(4).
- Macaulay, R. K. S. (2005). Talk that Counts: Age, Gender, and Social Class Differences in Discourse. Oxford: Oxford University Press.
- Staum Casasanto, L., Jasmin, K., & Casasanto, D. (2010). Virtually accommodating: Speech rate accommodation to a virtual interlocutor. In S. Ohlsson & R. Catrambone (Eds.), Proc. 32nd Ann Conf Cog Sci Soc., Austin (pp. 127–132).
Bridging the Uncanny Valley with Decoded Neurofeedback
A problem with artificial characters that appear nearly human is that they can sometimes lead users to report that they feel uncomfortable, and that the character is creepy. An explanation for this phenomenon comes from the Uncanny Valley Effect (UVE), which holds that characters approaching human likeness elicit a strong negative response (Mori, et al., 2012; Pollick, 2009). Empirical research into the UVE has grown over the past 15 years, and the conditions needed to produce a UVE and reliably measure its effect have been extensively examined (Diel & MacDorman, 2021). These empirical studies inform design standards of artificial characters (Lay et al., 2016), but deep theoretical questions of why the UVE exists and what its underlying mechanisms are remain unanswered. One technique that has shown promise to answer these questions is neuroimaging, where brain measurements are obtained while the UVE is experienced (Saygin, et al., 2012). In this research we propose to use the technique of realtime fMRI neurofeedback, which allows fMRI experiments to go past correlational evidence by enabling the manipulation of brain processing to study the effect of brain state on behaviour.
In particular, we plan to use the technique of decoded neurofeedback (DecNef), which employs methods of machine learning to build a decoder of brain activity. Previous experiments have used DecNef to alter facial preferences (Shibata, et al., 2016) and this study by Shibata and colleagues will guide our efforts to develop a decoder that can be used during fMRI scanning to influence how the UVE is experienced. It is hoped that these experiments will reveal the brain circuits involved in experiencing the UVE, and lead to a deeper theoretical understanding of the basis of the UVE, which can be exploited in the design of successful artificial characters.
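The decoder idea can be illustrated with a toy example: a classifier is first trained on labelled activity patterns, and during feedback its output likelihood for the target state is returned to the participant as a score. The following is a minimal sketch under strong assumptions: the voxel patterns are simulated Gaussian noise, a plain L2-regularised logistic regression stands in for whichever decoder the project adopts, and all function names and parameter values are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_decoder(patterns, labels, lr=0.1, epochs=200, l2=0.01):
    """Fit logistic-regression weights by batch gradient descent."""
    n, n_vox = len(patterns), len(patterns[0])
    w, b = [0.0] * n_vox, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_vox, 0.0
        for x, y in zip(patterns, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_vox):
                grad_w[j] += err * x[j]
            grad_b += err
        # Average gradient plus an L2 penalty on the weights.
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def feedback_score(decoder, pattern):
    """Likelihood (0 to 1) that the current pattern matches the target state."""
    w, b = decoder
    return sigmoid(sum(wi * xi for wi, xi in zip(w, pattern)) + b)


def simulate_patterns(rng, n, n_vox, shift):
    """Simulated voxel patterns: unit-variance noise around a mean shift."""
    return [[rng.gauss(shift, 1.0) for _ in range(n_vox)] for _ in range(n)]


# Simulated decoder-construction data: 40 patterns per state, 8 voxels each.
rng = random.Random(0)
target_state = simulate_patterns(rng, 40, 8, +0.5)
other_state = simulate_patterns(rng, 40, 8, -0.5)
decoder = train_decoder(target_state + other_state, [1] * 40 + [0] * 40)

# During feedback, each new pattern is scored against the target state.
score_now = feedback_score(decoder, [0.5] * 8)
```

In a real experiment the score would be computed from preprocessed voxel activity in a chosen region of interest and presented to the participant in some visual form; those steps are outside the scope of this sketch.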
The project will develop skills in 1) the use of animation tools to create virtual characters, 2) the ability to design and perform psychological assessment of people’s attitudes and behaviours towards these characters, 3) the use of machine learning in the design of decoded neurofeedback algorithms, and finally 4) how to perform realtime fMRI neurofeedback experiments.
- Diel, A., & MacDorman, K. F. (2021). Creepy cats and strange high houses: Support for configural processing in testing predictions of nine uncanny valley theories. Journal of Vision.
- Lay, S., Brace, N., Pike, G., & Pollick, F. (2016). Circling around the uncanny valley: Design principles for research into the relation between human likeness and eeriness. i-Perception, 7(6), 2041669516681309.
- Mori, M., MacDorman, K. F., & Kageki, N. (2012). The uncanny valley [from the field]. IEEE Robotics & Automation Magazine, 19(2), 98-100. (Original work published in 1970).
- Pollick, F. E. (2009). In search of the uncanny valley. In International Conference on User Centric Media (pp. 69-78). Springer, Berlin, Heidelberg.
- Saygin, A. P., Chaminade, T., Ishiguro, H., Driver, J., & Frith, C. (2012). The thing that should not be: predictive coding and the uncanny valley in perceiving human and humanoid robot actions. Social cognitive and affective neuroscience, 7(4), 413-422.
- Shibata, K., Watanabe, T., Kawato, M., & Sasaki, Y. (2016). Differential activation patterns in the same brain region led to opposite emotional states. PLoS biology, 14(9), e1002546.
Priming expectation and motivation: Unpacking the placebo effect to improve effectiveness of mobile health apps
Numerous reports have highlighted that during the COVID-19 pandemic, poor mental health has been exacerbated globally (e.g., 1,2). There is a broad consensus that the most cost-effective measures for reducing the risk and prevalence of common mental disorders such as anxiety and depression are the implementation of evidence-based preventative interventions. Mobile health apps have brought a growing enthusiasm related to delivering behavioural and health interventions at low-cost and in a scalable fashion. A recent estimate identified in excess of 10,000 health and wellness apps designed for mental or behavioural health (3). Only a very small number of these have been evaluated for clinical efficacy despite claims of being evidence-based.
The effectiveness of a mobile application comprises measures of efficacy and engagement. To enhance effectiveness, it is necessary to deliver evidence-based techniques in an engaging manner, which necessitates personalisation to the user: there is no one-size-fits-all approach to mental health. Machine learning can create personalised recommendations based on other users and an individual user’s interaction with an app.
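One simple way to base recommendations on other users and on an individual's own interactions is user-based collaborative filtering. The following is a minimal sketch, assuming engagement has been summarised as a score per user and per intervention; the user names, intervention names and scores are invented for illustration, and this is not a description of any existing app's recommender.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse engagement profiles."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(engagement, user, top_n=2):
    """Rank interventions the user has not tried by similarity-weighted scores."""
    own = engagement[user]
    scores, weights = {}, {}
    for other, items in engagement.items():
        if other == user:
            continue
        sim = cosine(own, items)
        if sim <= 0:
            continue
        for item, value in items.items():
            if item in own:
                continue  # only suggest interventions the user has not used
            scores[item] = scores.get(item, 0.0) + sim * value
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda i: scores[i] / weights[i], reverse=True)
    return ranked[:top_n]


# Hypothetical engagement scores (1 = low, 5 = high) per intervention.
engagement = {
    "ana": {"breathing": 5, "journaling": 1},
    "ben": {"breathing": 4, "journaling": 1, "sleep_audio": 5, "cbt_quiz": 2},
    "cat": {"breathing": 1, "journaling": 5, "cbt_quiz": 5, "sleep_audio": 1},
}
suggestions = recommend(engagement, "ana")
```

Here the first user's profile resembles the second user's more than the third's, so the second user's preferred intervention is ranked first. Measures of expectation, belief or motivation could in principle be added as further profile entries, though how to measure them is exactly what the project sets out to explore.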
An individual’s beliefs, expectation and motivation are known to affect clinical efficacy. Factors such as these are thought to partly account for the placebo effect seen in trials and studies into a range of physical and mental health conditions (4). Yet there is a paucity of research that aims to understand and leverage these for clinical benefit. Improved understanding and application of the placebo effect could enhance the clinical efficacy of treatments including mobile health applications for mental health. From a grounded cognition perspective (5,6), we predict that personalisation and engagement of a mobile health intervention that creates vivid, multimodal expectations of desired outcomes may lead to stronger effects that can’t be attributed to the intervention alone.
Aims and objectives
This project will deepen understanding of how to personalise mobile health apps to users’ expectations, beliefs and motivation, aiming to improve the engagement and ultimately the effectiveness of the intervention. The main objectives will be the following:
– Identify key factors that contribute to the placebo effect
– Explore whether manipulation of these factors can improve the effectiveness of interventions for mental well-being.
– Explore how these factors could be objectively or subjectively measured for individual users
– Explore integration of these factors into an AI-driven recommender system to personalise interventions within apps for mental health.
An extensive literature review will first be conducted to characterise the factors that contribute to the placebo effect. This will result in a set of hypotheses on the relationship between measures of, for example, motivation, expectation and beliefs, and a theoretical framework to integrate them and to derive novel predictions for use in mobile health applications. Methods of manipulating these factors will also be identified. Studies may then be undertaken to directly manipulate these factors to determine whether they change the effectiveness of an app for mental well-being. By applying standard statistical methods as well as machine learning (to unpack more complex interplay between factors and content/design elements), these data will be used to identify predictors of effectiveness. The findings will be used to design and test personalisation in a real-world scenario.
- Czeisler, M. É., Lane, R. I., Petrosky, E., Wiley, J. F., Christensen, A., Njai, R., … Rajaratnam, S. M. W. (2020). Mental Health, Substance Use, and Suicidal Ideation During the COVID-19 Pandemic — United States, June 24–30, 2020. Morbidity and Mortality Weekly Report, 69(32), 1049–1057. https://doi.org/10.15585/mmwr.mm6932a1
- Vindegaard, N., & Eriksen Benros, M. (2020). COVID-19 pandemic and mental health consequences. Systematic review of the current evidence. https://doi.org/10.1016/j.bbi.2020.05.048
- Carlo, A. D., Hosseini Ghomi, R., Renn, B. N., & Areán, P. A. (2019). By the numbers: ratings and utilization of behavioral health mobile applications. Npj Digital Medicine, 2(1), 54. https://doi.org/10.1038/s41746-019-0129-6
- Price, D. D., Finniss, D. G., & Benedetti, F. (2008). A comprehensive review of the placebo effect: recent advances and current thought. Annual Review of Psychology, 59, 565–590. https://doi.org/10.1146/annurev.psych.59.113006.095941 (PMID: 17550344)
- Barsalou, L. W. (2009). Simulation, situated conceptualization, and prediction. Philosophical Transactions of the Royal Society B: Biological Sciences, 364(1521), 1281–1289. https://doi.org/10.1098/rstb.2008.0319
- Papies, E. K., Barsalou, L. W., & Rusz, D. (2020). Understanding Desire for Food and Drink: A Grounded-Cognition Approach. Current Directions in Psychological Science, 29(2), 193–198. https://doi.org/10.1177/0963721420904958
Fashion Analytics Based on Deep Learning Visual Processing
Understanding and anticipating future trends is crucial for fashion companies looking to maximise their profit. Many machine learning approaches have been devoted to fashion forecasting, but they share a strong limitation: they model fashion styles as sets of textual attributes. For example, “dotted t-shirt with skinny jeans” defines an outfit that may correspond to many real outfits, since it omits the colour, the size of the dots, the type of neckline, etc. In short, the description leaves out the crucial part: the appearance. A picture is worth a thousand words, especially when it comes to fashion, where subtle, fine-grained variations of a pattern may define a style. A few moments are enough to distinguish a woman’s outfit of 1920 from one of recent years, yet both can share the same textual description: “Below-knee length drop-waist dresses with a loose, straight fit” describes a 1920 style, but when copy-pasted into Google it leads to Zalando’s contemporary products. The devil is in the detail, and this detail is visual and cannot be described by words.
With this PhD project, we want to model fashion by exploiting visual patterns, as if they were letters of a new artistic vocabulary, within deep network architectures. Deep learning makes it possible to map complicated patterns, including images, into a mathematical space without the need for words. In this space, similarities can be computed that are far more effective than written descriptions, clearly differentiating the latest trends from those of a century ago. Deep learning is particularly effective when large amounts of data are available, and fashion nowadays goes hand in hand with social media, where vast numbers of images are the new oil of communication: clothing items are presented through pictures and video at a pace of hundreds of thousands of items each day. This is the setting of the project: the PhD theme will deal with fashion images collected on social media, in order to give deep learning the capability to understand a style. Finally, the PhD will aim at forecasting fashion trends, predicting the rise and fall of a particular visual trend. This will be made possible by social signal processing, which treats the images together with the “likes” associated with them, predicting when an image of a clothing item will become viral and identifying which images matter most in defining a trend and its rise and fall. The PhD theme will put the student in contact with Humatics, a young Italian start-up that currently works with important fast-fashion companies such as Nunalie and Sirmoney, providing forecasting services, and that is looking to international collaborations to improve its services and to train specialised professionals in the field of computational fashion and aesthetics.
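The idea of computing similarities in a learned mathematical space can be illustrated with cosine similarity between image embeddings. The following is a minimal sketch, assuming each image has already been mapped to a vector by some deep network; the three-dimensional vectors and item names below are invented stand-ins, since real embeddings would typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, catalogue):
    """Name of the catalogue item whose embedding is closest to the query."""
    return max(catalogue, key=lambda name: cosine_similarity(query, catalogue[name]))


# Hypothetical embeddings; both items could share one textual description.
catalogue = {
    "dress_1920s": [0.9, 0.1, 0.2],
    "dress_contemporary": [0.2, 0.8, 0.5],
}
query_image = [0.85, 0.15, 0.25]  # embedding of a new, unlabelled photo

closest = most_similar(query_image, catalogue)
```

With these toy vectors the query lands next to the 1920s item even though a text description would not separate the two dresses, which is the kind of distinction the project aims to capture from appearance alone.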