AI and Multilingual Conferencing

My deepest thanks to the scientific committee of ARLAC for the invitation to speak, the warm welcome in Mexico City, and the transformative experience of the “Congreso”; thanks to all my translators, cultural and linguistic, especially John Griffiths, and the extraordinary group of multilingual PhD students at Harvard, from whom I am learning so much about language politics in the academy.

Free Speech

One thing I love about the big IMS Congresses is the mix of scholars and languages. As people spill out of paper sessions at coffee breaks, the languages multiply, and listening to them is deeply pleasurable: such variety of cadences and pitch, intonation and vowel sounds, the punctuation of consonants. For those of us who work in monoglot academic environments, experiences of polyglotism are especially refreshing, and I find the sound of our conferences energizing. It is a reminder of the many languages spoken by our members and the importance of language in scholarship.

This offering to Musicological Brainfood shares a personal experience I had last August keynoting the “VI Congreso of the IMS Asociación Regional para América Latina y el Caribe” (IMS Regional Association for Latin America and the Caribbean, aka ARLAC) in Mexico City. The short story is that I don’t speak Spanish, and I felt awful giving a plenary talk in English at a majority Spanish-language conference. What was worse, as I explain below, my presentation was about polyglotism and song, which put the medium of my talk (monoglot, in English) at odds with the values motivating my research. These tensions led me to experiment with a newly developed AI tool. I share my story here in the spirit of launching a conversation about how AI-generated translation can support international conferencing. Please contribute to the discussion thread and share your reactions, suggestions, and experiences.

Conference Languages

As I write about the sound of IMS Congresses, I think back to our Quinquennial Congress in Athens two years ago (IMS2022), the polyglotism of our terrific Greek hosts, and the truly international mix of scholars from around the world. I also remember the joy of coffee breaks with some sadness, because the linguistic free-for-all of those in-between times usually evaporated as people stepped back into paper sessions: virtually all the presentations were in English. The call for proposals for IMS2022 solicited papers in any language and was issued in thirteen languages. But it also encouraged presenters to give their talks in English, “in order to ensure the broadest potential audience for the work being presented.” Moreover, abstracts had to be submitted in English.

These disappointing language policies are on the verge of becoming history. In the four years since that CfP was drafted, our ability to work across languages has been transformed by generative artificial intelligence. You can read this essay in your choice of over a hundred languages thanks to a translation feature our wonderful executive officer, Lukas Christensen, implemented on this page. Our “Who We Are” page on the new IMS website now has the same translation options, and it will soon be easier than ever for conference abstracts to be submitted and reviewed in multiple languages.

The next challenge for us as an international society is to bring polyglotism into the room, into paper sessions and round tables, and to find ways to speak across languages in our workspaces, not just in the foyer or at lunch. Here, too, we are on the verge of a radical transformation enabled by generative AI, but how we move forward will depend on reimagining our disciplinary relationships to language and translation. Generative AI is already affording new access to information in unfamiliar languages: the larger question is whether we might allow the judicious use of this new technology to challenge monoglotism as a linguistic mindset, one with deep historical ties to nationalist ideologies and global imperialism.

“Language Stories” at ARLAC: Theory and Practice, Dreams and Realities

Interrogating the legacies of linguistic nationalism is a central theme in my current book project, Songs in Unexpected Places, on music and mobility in the sixteenth century.1 In it, I study vernacular songs in French, Flemish, Spanish, Italian, German, Greek, and Turkish that turn up “unexpectedly” outside the boundaries of the modern nation-states that usually claim them. These are small repertories, to be sure, sometimes only a song or two, but they’re valuable evidence of human mobility in the past and useful for resetting presumptions about where people and songs “belong” in our histories. That is interesting in itself, but what I find even more fascinating is the way that songs so often invite linguistic roaming and translingual play: then as now, they called singers to wrap their tongues around semi-familiar languages, to inhabit genres for a while, and to mix vernaculars.

Studying these songs has challenged me to work in new ways, to listen to them differently, and to question disciplinary attitudes toward language. For instance, the lyrics of songs that move are often written in phonetic spellings that don’t conform to the written norms of standardized language. As a result, they are badly served by the kind of scholarly training in textual criticism that for generations has taught editors to exclude documents that appear “garbled” or “corrupt” and to “correct” errors accordingly. These practices of exclusion and rectification silence voices from the past. Allowing them to speak on their own terms means embracing linguistic in-betweenness: they ask us to set aside the standards established by national language academies like the Accademia della Crusca (est. 1583), the Académie française (est. 1635), and the Real Academia Española (est. 1715). Moreover, they ask us to credit early moderns with a different sort of linguistic expertise: the ability to marshal verbal resources in order to communicate in spaces where languages, dialects, and habits of speech might be quite diverse. It’s a linguistic style that is less about mastery of a language than it is about full-bodied encounters that use linguistic materials expressively.

Over the years, working with these songs has made me wonder if we shouldn’t take more cues from them and get more creative with language as scholars. That is, I’ve wondered how studying polyglot singing and honoring those practices might ultimately shift our own disciplinary relationships to language and help us cultivate the spirit of curiosity and play that is often disciplined out of us by language exams, language academies, and linguistic policing.

These were largely idle thoughts until last summer when my dreams of “free speech” confronted the harsh reality of my utter inability to speak Spanish. I was excited about being invited to open the big ARLAC conference in Mexico City, and it was an important event. Our regional associations represent about one third of the IMS membership, and their meetings are where the internationalism of the IMS begins: the “VI Congreso” featured 250 speakers from across the region and well beyond, with participants from Europe, the Philippines, North America, and Australia.

I chose “Language Stories” as the subject of my keynote with the idea of presenting my research on polyglot songs. Knowing that the languages of ARLAC are largely Spanish and Portuguese, I also imagined my talk opening into a discussion of the role of language in scholarship and at the IMS in particular. This was a chance to get serious about facilitating conversation in our multilingual society.

What I had not anticipated was how absurd it would be to present a paper about polyglotism entirely in English at a conference conducted primarily in Spanish. Of course, most everyone at ARLAC spoke English, but this was a favor I was unable to reciprocate beyond incantada (a greeting that Leonardo Waisman had taught me during one of the coffee breaks in Athens in 2022).2 I did ask our multilingual vice president, John Griffiths, to translate my slides into Spanish, and I used them to gloss my talk. I suppose I had hoped that the slides would be enough to meet everyone half-way, but in the weeks before the conference, I was gripped with a growing sense of dread. Would I really drone on in English for forty minutes? I so wished I could speak Spanish and show up for everyone completely, expressively. Honestly, I felt like a failure.

Driven by desperation and prompted by experiments with the translation app on my phone, I finally hit upon a way to at least end my talk in Spanish: I recorded a brief “coda” and uploaded it to an online video platform, HeyGen, that recognized my English, translated it into Spanish, cloned my voice, and generated a spoken version in Spanish in my voice with idiomatic intonation. It even lip-synced the video to match the Spanish. Then I added English subtitles, which was relatively easy, since I was subtitling in my own language. Here are the results (be sure to turn on the CCs):

From a practical standpoint, flipping the spoken language to Spanish had several advantages. It allowed me to record a video in my language and broadcast it in another, and as a guarantee of my original thoughts, I could add subtitles in my own language. Synchronizing the titles was facilitated by my rough comprehension of Spanish, though I can imagine situations where it would be impossible to do that or to check the accuracy of the translation. (Assessments of the quality of the translation would be most welcome in the comments section below.) Using this technology definitely requires an attitude shift: getting comfortable with relying on AI and openly acknowledging the potential flaws.

As for reactions, mine are clear in the video: I felt a massive sense of relief at being able to “speak” Spanish. For me, the benefit of AI was not about covering up my linguistic inabilities—faking it had a charm, but faking it was not the point. Rather, AI allowed me to speak my truth in my own voice and be heard clearly. Among the various audience reactions, the most intriguing ones relate to the power of the cloned voice to convey a truth that supersedes even the words I say at the beginning of the video, that I cannot speak Spanish. The sound of things matters, perhaps especially to musicians.

While the sound quality was affectively successful, the lip-syncing algorithms created an avatar-like image that was somewhat alienating. The synthesized mouth movements struck some Spanish speakers as visually unconvincing, and we should remember that the video was generated late in a process that included multiple steps: speech to text, English text to Spanish text, audio mapping of my voice onto Spanish, and the synthesis of visemes to match the new phonemes. For conference videos, which most often use PowerPoint, it might make more sense to use audio translation only.
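
The chain of steps above can be sketched as a simple pipeline, which also shows why artifacts tend to surface in the final video stage: errors introduced at any step propagate to all the later ones. This is a minimal illustration only; every function name here is a placeholder standing in for one stage of the process, not any platform’s real API.

```python
# Illustrative sketch of the multi-step dubbing pipeline described above.
# Each function is a hypothetical placeholder for one stage; none of these
# names correspond to a real service's API.

from typing import Callable, Dict, List

def speech_to_text(clip: Dict) -> Dict:
    # Stage 1: recognize the spoken English and produce a transcript.
    clip["stages"].append("speech_to_text")
    return clip

def translate_text(clip: Dict) -> Dict:
    # Stage 2: translate the English transcript into Spanish.
    clip["stages"].append("translate_en_to_es")
    return clip

def clone_voice(clip: Dict) -> Dict:
    # Stage 3: synthesize the Spanish audio in the speaker's cloned voice.
    clip["stages"].append("voice_cloning")
    return clip

def sync_visemes(clip: Dict) -> Dict:
    # Stage 4: generate mouth shapes (visemes) to match the new phonemes.
    # This final stage inherits any errors from the three stages before it.
    clip["stages"].append("viseme_sync")
    return clip

def run_pipeline(clip: Dict, stages: List[Callable[[Dict], Dict]]) -> Dict:
    # Apply each stage in order, passing the result forward.
    for stage in stages:
        clip = stage(clip)
    return clip

clip = run_pipeline(
    {"source": "keynote_coda.mp4", "stages": []},
    [speech_to_text, translate_text, clone_voice, sync_visemes],
)
print(clip["stages"])
# → ['speech_to_text', 'translate_en_to_es', 'voice_cloning', 'viseme_sync']
```

Dropping the last stage, as suggested above for slide-based conference videos, would mean running the same pipeline without `sync_visemes` and keeping the original video track.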

Finally, there was quite a bit of discussion about whether the AI accent “passed” as that of a native speaker. (Comments most welcome.) But for me, faking a “perfect” accent was absolutely not the point: ideologies of “correct” speech are the very ones used to disadvantage second-language learners and shut down polyglotism. At the end of the conference, in a second “coda” that was delivered live, I worked out a few lines in Spanish to wish everyone well and thank the organizers in my own accent. My dream of free speech is more free-for-all than what is possible using the normalized accents synthesized from AI data sets.

Future Possibilities

Ultimately, AI will transform conferencing in international societies like ours. For hybrid conferences, where participants are asked to provide a pre-recorded video, this technology is a relatively easy way to produce something bilingual on one’s own. In question sessions and panel discussions, the next experiment might involve testing options for remote simultaneous interpretation (RSI), which can be AI-generated or provided by professional human interpreters, though users would need to be online. Hybrid solutions for broadcasting spoken simultaneous translations using phones as headsets are already on the horizon.

We should remember that AI comes at a cost: online platforms like HeyGen are not free (my two-minute video required a USD 60 monthly subscription, for processing thirty minutes of material). The large language models on which AI translations rely are proprietary and have taken years to train. Moreover, AI-generated video translations require significant amounts of computing power, and this takes its toll on our planet: the data centers housing AI servers consume huge amounts of electricity, deplete local water supplies, and increase greenhouse gas emissions. Offsetting this are technological advances in energy and computing efficiency, and legislation in Europe and the United States is mandating the ethical use of AI, including accountability for its environmental impacts. But this form of free speech is not free.

No one really knows the future of AI, but we do know that change is already upon us. Playing around with new technology is already shifting attitudes, and some are worth celebrating. This new openness invites us to explore lower-tech applications like using translation tools to create bilingual slides and talks with second-language subtitles. As online audio and video translation services become cheaper and more efficient, the IMS might consider offering access to them for conference participants and scheduling remote interpretation services for discussions. Having more sessions in languages other than English will encourage more participation at conferences, more exploration, better experiences for people who drop into sessions in a language they don’t speak, and huge rewards for scholars eager to disseminate their research findings internationally. Most importantly, diversifying who’s in the room will make feedback more surprising and generate new human and intellectual connections. As humanists and scholars, we all care about language: getting creative with new language tools can build on those values, transforming our research and our society.

  • Kate van Orden

Kate van Orden (vanorden@fas.harvard.edu) is Dwight P. Robinson Jr. Professor of Music at Harvard University. She specializes in the cultural history of early modern France and Italy, book history, and popular music (mostly sixteenth century, but also in the 1960s). Her latest project is Seachanges: Music in the Mediterranean and Atlantic Worlds, 1550–1800 (I Tatti Research Series, 2022), an edited volume on cultural mobility. Her prize-winning publications include Materialities: Books, Readers, and the Chanson in Sixteenth-Century Europe (2015) and Music, Discipline, and Arms in Early Modern France (2005). Van Orden served as program committee chair for the IMS2022 Congress in Athens and is president of the IMS (2022–27). She also performs on baroque and classical bassoon, with over sixty recordings on Sony, Virgin Classics, and Harmonia Mundi.

References

  1. My title is an homage to the sociolinguist Alastair Pennycook’s book Language and Mobility: Unexpected Places (Bristol and Buffalo: Multilingual Matters, 2012).
  2. The Spanish would be encantada, but I heard incantada, and the editors and I decided not to “correct” this, in keeping with the spirit of this essay.