The Semiotic Machine: Technology and Multimodal Interaction in Context – a Multimodality Talk

Dr Rebekah Wegener in the Multimodality Talks Series 2024 – Conversations with Multimodality.

Video recording

Presenter: Dr Rebekah Wegener (Paris Lodron University Salzburg).

Discussants: Prof Øystein Gilje (University of Oslo) & Henrika Florén (Karolinska Institutet).


Human interaction is inherently multimodal and if we want to integrate technology into human sense-making processes in a meaningful way, what kinds of theories, models, and methods for studying multimodal interaction do we need? Bateman (2012) points out that “most discussions of multimodal analyses and multimodal meaning-making still proceed without an explicit consideration of just what the ‘mode’ of multimodality is referring to”, which may be because it seems obvious or because development is coming from different perspectives, with different ultimate goals.

However, when we want to put multimodality to work in technological development, this becomes problematic. This is particularly true if any attempt is being made at multimodal alignment to form multimodal ensembles: two terms which are themselves understood in very different ways. Here I take up Bateman’s (2012 and 2016) call for clarity on theoretical and methodological issues in multimodality to first give an overview of our work towards an analytical model that separates different concerns, namely the technologically mediated production and reception, the human sensory-motor dispositions and the semiotic representations. In this model, I make the distinction between modality, codality and mediality and situate this with context.

To demonstrate the purpose of such a model for representing multimodality and why it is helpful for the machine learning and explicit knowledge representation tasks that we make use of, we draw on the example of CLAra, a multimodal smart listening system that we are building (Cassens and Wegener, 2018). CLAra is an active listening assistant that can automatically extract contextually important information from an interaction using multimodal ensembles (Hansen and Salamon, 1990) and a rich model of context. In order to preserve privacy and reduce the need for costly data as much as possible, we utilise privileged learning techniques, which make use of multiple modality input during training, learn the alignments and rely on the learned association during run-time without access to the full feature set used during learning (Vapnik and Vashist, 2009).
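The general idea of learning with privileged information can be illustrated with a small sketch. This is not the CLAra implementation and not the SVM+ formulation of Vapnik and Vashist (2009); it swaps in a simpler distillation-style teacher/student setup on synthetic data. All names (`make_toy_data`, `train_with_privileged_info`, the `imitation` weight) and the toy "regular" and "privileged" features are assumptions made purely for illustration: a teacher is fitted on a feature that is only available during training, and a student that sees only the run-time feature learns from a blend of the true labels and the teacher's soft predictions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, targets, epochs=300, lr=0.5):
    """Fit sigmoid(w.x + b) to targets in [0, 1] by batch gradient descent."""
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, t in zip(xs, targets):
            # Cross-entropy gradient; also valid for soft targets.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def make_toy_data(n, seed):
    """Synthetic data: a noisy run-time feature and a cleaner training-only one."""
    rng = random.Random(seed)
    regular, privileged, labels = [], [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        signal = 1.0 if y else -1.0
        regular.append([signal + rng.gauss(0, 1.5)])     # available at run-time
        privileged.append([signal + rng.gauss(0, 0.3)])  # training only
        labels.append(y)
    return regular, privileged, labels


def train_with_privileged_info(regular, privileged, labels, imitation=0.5):
    # Teacher sees the privileged feature, which exists only during training.
    teacher = fit_logistic(privileged, labels)
    soft = [predict_proba(teacher, xp) for xp in privileged]
    # Student targets blend hard labels with the teacher's soft predictions.
    blended = [(1 - imitation) * y + imitation * s for y, s in zip(labels, soft)]
    # Student is fitted on the regular feature only.
    student = fit_logistic(regular, blended)
    return teacher, student


if __name__ == "__main__":
    reg, priv, y = make_toy_data(200, seed=0)
    _, student = train_with_privileged_info(reg, priv, y)
    # At run-time only the regular feature is passed to the student.
    print(predict_proba(student, [0.8]))
```

In this sketch the privileged feature never reaches the student at prediction time, which mirrors the privacy motivation described above: the richer signal is needed only while training.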

Finally, I will demonstrate how the integration of rich theoretical models and access to costly, human-annotated data in addition to data that can easily be perceived by machines makes this an example of development following true ‘smart data’ principles, which utilise the strength of good modelling and context to reduce the amount of data that is needed to achieve good results.


Rebekah Wegener is Assistant Professor in English Linguistics at Paris Lodron University Salzburg and her research focuses on multimodal interaction across contexts, computer mediated communication and human computer interaction. In addition to theoretical work in linguistics, she looks at applications in human centred and explainable AI and contextual computing, particularly in medical and educational domains. Outside academia, Rebekah was project manager and head linguist for technology companies working on language technology and medical informatics. Rebekah is a member of the Austrian Society for Artificial Intelligence (ASAI) and the Association for Computing Machinery (ACM). She is editor of two ongoing book series “Text in Social Context” and “Key Concepts in SFL” and chief editor for the Routledge handbook “Transdisciplinary Linguistics”. She is also co-chair of the long-running workshop series “MRC: Modelling and Representing Context for Human-Centric and Contextual Systems” held at IJCAI and/or ECAI each year.


Wegener, A. M., Rebekah. (2022). Challenging instantiation in modelling movement-based multimodal communication. In Empirical Evidences and Theoretical Assumptions in Functional Linguistics. Routledge.

Wegener, R. (2016). Studying Language in Society and Society through Language: Context and Multimodal Communication. In W. L. Bowcher & J. Y. Liang (Eds.), Society in Language, Language in Society: Essays in Honour of Ruqaiya Hasan (pp. 227–248). Palgrave Macmillan UK.

Wegener, R., & Cassens, J. (2019). Blending SFL and Activity Theory to Model Communication and Artefact Use—Examples from Human-Computer Interaction. In Analyzing the Media: A Systemic Functional Approach.…

Wegener, R., & Fontaine, L. (2023). A functional approach to context. In Cambridge Handbook of Language and Context. Cambridge University Press.…


Øystein Gilje

Øystein Gilje (X – ogilje) is a Professor in Pedagogy at the Department of Teacher Education and School Research, University of Oslo. For nearly 20 years, he has studied young people’s multimodal production in Media Education and, more recently, across a wide range of subjects in lower secondary school in the project “Multimodal Learning and Assessment in digital classrooms” (#MuLVu). Gilje is particularly interested in how students can demonstrate their knowledge and competence in multimodal compositions when they are allowed to work with artificial intelligence. Currently, he is leading the AI project “Learning in the Age of Algorithms” (#LAT), funded by the Norwegian Research Council, and he is participating in the Agile_Edu project (#Agile_EDU), an Erasmus+ project on platformization and datafication of schools and learning.

"Digital School 4.0 is our first step into a world where humans and machines collaborate in new ways. Learning and meaning making take place in the intersection between human and artificial cognition."


Gilje, Ø. (2019). Expanding Educational Chronotopes with Personal Digital Devices. Learning, Culture and Social Interaction, 21, 151–160.

Gilje, Ø. (2023). Digital School 4.0 One-to-one computing and AI. An interview with Øystein Gilje in ESHA Magazine, May 2023:

Gilje, Ø. (2024). Tracing semiotic choices in ‘new writing’ – the role of guidance in students’ work with semiotic technologies. In: Lim, F. V. & Querol-Julián, M. (Eds.). Designing learning with digital technologies: perspectives from multimodality in education. Routledge.

Henrika Florén

Henrika Florén, MSc, MEd, MA is an educational developer at the unit of Teaching and Learning at Karolinska Institutet and a final year PhD candidate at UCL Institute of Education, department of Culture, Communication and Media. Her research into assessment in higher education employs a multimodal social semiotic perspective to explore teachers’ meaning-making across multiple modes and media, and what guides teachers in their assessment of students’ multimodal representations. She was the lead for the project “Generative AI in teaching and examinations”, and is currently leading the project “Development of KI’s digital environments for skills training” at Karolinska Institutet. She is a co-organiser of an international AI Promptathon (AIMedEdConnect Promptathon) cohosted with the Mayo Clinic, and involved in the ongoing study “Perceptions of teaching and learning during implementation of team-based learning in medical programme curriculum”.


Florén, H. (2021, September 1). Multimodal text and assessment practices—Renegotiated qualities of academic expression and recognition. Proceedings QUINT PhD Summer Institute 2021. QUINT PhD Summer Institute 2021, Online. 10.17605/OSF.IO/WEX63

Florén, H. (2023, September 28). Designing Assessment Futures and Points of Reference Guiding Multimodal Assessments. ICOM-11 International Conference on Multimodality, London, UK.

Ruttenberg, D., Harti, L. M. S., & Florén, H. (2023, September 27). Multimodality and Future Landscapes: Meaning Making, AI, Education, Assessment, and Ethics [panel Chair David Ruttenberg]. ICOM-11 International Conference on Multimodality, Online & London, UK.

Content reviewer:
Marcus Emas