I Hear You: On Human Knowledge and Vocal Intelligence
DOI: https://doi.org/10.21814/rlec.6316
Keywords: voice technology, human-computer interaction, affective computing, large language models
Abstract
This interview explores embodied agency and the evolving dynamics of knowledge creation through practical and experimental engagement with conversational artificial intelligence (AI) systems. Drawing on media archaeology, media theory, and science and technology studies, it examines how the emergence of language interfaces destabilizes distinctions between user and system, collapsing the boundaries between human and artificial modes of expression and understanding. Framed within an artistic research methodology, the project critically engages with the ongoing shift toward machine- and voice-based forms of inquiry, analysing how these technologies reshape the epistemic, linguistic, and ontological conditions of knowledge and research. Departing from keyboard-based interaction, the process emphasizes the decoupling of the body from the machine interface and the increasing fluidity of human-computer correspondence through voice technology. While acknowledging the growing uncertainty of origin and autonomy resulting from this technological shift, it foregrounds indeterminate authorship as both a methodological challenge and a theoretical pivot, underlining the implications for academic accountability and data ethics. Practice-based experimentation serves as a tool to trace the infrastructural, affective, and rhetorical vectors through which intelligent automated speech influences knowledge production. By examining this process, the study contributes to ongoing debates on verification, trust, and the social negotiation of information induced by advanced conversational AI agents. Overall, it argues that voice technologies do not merely transmit content but actively configure the conditions under which knowledge is produced, authenticated, and circulated.
References
*Afzal, S., Khan, H. A., Khan, I. U., Piran, M. J., & Lee, J. W. (2023). A comprehensive survey on affective computing: Challenges, trends, applications, and future directions. arXiv. https://doi.org/10.48550/arXiv.2305.07665
Agrawal, K. (2010). To study the phenomenon of the Moravec’s paradox. arXiv. https://doi.org/10.48550/arXiv.1012.3148
Ardelt, M. (2004). Wisdom as expert knowledge system: A critical review of a contemporary operationalization of an ancient concept. Human Development, 47(5), 257–285. https://doi.org/10.1159/000079154
Arora, S. (2025, April 28). OpenAI CEO Sam Altman admits ChatGPT 4O’s ‘annoying’ personality needs work: “We are working on fixes”. Times Now. https://www.timesnownews.com/technology-science/openai-ceo-sam-altman-admits-chatgpt-4os-annoying-personality-needs-work-we-are-working-on-fixes-article-151522930
Baltes, P. B., & Staudinger, U. M. (2000). Wisdom: A metaheuristic (pragmatic) to orchestrate mind and virtue toward excellence. American Psychologist, 55(1), 122–136. https://doi.org/10.1037/0003-066x.55.1.122
Bandura, A. (2001). Social cognitive theory: An agentic perspective. Annual Review of Psychology, 52, 1–26. https://doi.org/10.1146/annurev.psych.52.1.1
Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N., Chen, A., Creel, K., Davis, J. Q., Demszky, D., . . . Liang, P. (2021). On the opportunities and risks of foundation models. arXiv. https://doi.org/10.48550/arXiv.2108.07258
*Chervonyi, Y., Trinh, T. H., Olšák, M., Yang, X., Nguyen, H., Menegali, M., Jung, J., Verma, V., Le, Q. V., & Luong, T. (2025). Gold-medalist performance in solving Olympiad geometry with AlphaGeometry2. arXiv. https://arxiv.org/html/2502.03544v1
*Chomsky, N. (2006). Language and mind. Cambridge University Press. https://doi.org/10.1017/cbo9780511791222 (Original work published 1968)
Clarke, L. (2022, November 12). When AI can make art – what does it mean for creativity? The Guardian. https://www.theguardian.com/technology/2022/nov/12/when-ai-can-make-art-what-does-it-mean-for-creativity-dall-e-midjourney
Cohn, M., Pushkarna, M., Olanubi, G. O., Moran, J. M., Padgett, D., Mengesha, Z., & Heldreth, C. (2024). Believing anthropomorphism: Examining the role of anthropomorphic cues on trust in large language models. arXiv. https://doi.org/10.48550/arXiv.2405.06079
*Connolly, F. F., Hjerm, M., & Kalucza, S. (2025). When will AI transform society? Swedish public predictions on AI development timelines. arXiv. https://doi.org/10.48550/arXiv.2504.04180
Crawford, K. (2021). Atlas of AI: Power, politics, and the planetary costs of artificial intelligence. Yale University Press. https://doi.org/10.2307/j.ctv1ghv45t
Crystal, D. (2008). Dictionary of linguistics and phonetics. Wiley-Blackwell. https://doi.org/10.1002/9781444302776
Dada, E. G., Bassi, J. S., Chiroma, H., Abdulhamid, S. M., Adetunmbi, A. O., & Ajibuwa, O. E. (2019). Machine learning for email spam filtering: Review, approaches and open research problems. Heliyon, 5(6), e01802. https://doi.org/10.1016/j.heliyon.2019.e01802
De Waal, F. (2016). Are we smart enough to know how smart animals are? W. W. Norton & Company.
*Dreyfus, H. (2014). What computers can’t do: A critique of artificial reason. In B. Williams (Ed.), Essays and reviews: 1959–2002 (pp. 90–100). Princeton University Press. https://doi.org/10.1515/9781400848393-021
Eidsheim, N. S. (2019). The race of sound: Listening, timbre, and vocality in African American music. Duke University Press. https://doi.org/10.2307/j.ctv11hpntq
Epley, N., Waytz, A., & Cacioppo, J. T. (2007). On seeing human: A three-factor theory of anthropomorphism. Psychological Review, 114(4), 864–886. https://doi.org/10.1037/0033-295x.114.4.864
*Farrell, T. J. (1985). Orality and literacy: The technologizing of the word [Book review of Orality and literacy: The technologizing of the word, by W. J. Ong]. College Composition and Communication, 36(3), 363–365. https://doi.org/10.2307/357987
Fedorenko, E., & Varley, R. (2016). Language and thought are not the same thing: Evidence from neuroimaging and neurological patients. Annals of the New York Academy of Sciences, 1369, 132–153. https://doi.org/10.1111/nyas.13046
*Floridi, L., & Illari, P. (Eds.). (2014). The philosophy of information quality. Springer Cham. https://doi.org/10.1007/978-3-319-07121-3
*Freire, S. K., Wang, C., & Niforatos, E. (2024). Conversational assistants in knowledge-intensive contexts: An evaluation of LLM- versus intent-based systems. arXiv. https://doi.org/10.48550/arXiv.2402.04955
French, R. M. (1990). Subcognition and the limits of the Turing test. Mind, XCIX(393), 53–65. https://doi.org/10.1093/mind/XCIX.393.53
Fron, C., & Korn, O. (2019). A short history of the perception of robots and automata from antiquity to modern times. In O. Korn (Ed.), Social robots: Technological, societal and ethical aspects of human–robot interaction (pp. 1–12). Springer Nature. https://doi.org/10.1007/978-3-030-17107-0_1
*Gardavski, K. (2022). Wittgenstein and LaMDA. The Logical Foresight - Journal for Logic and Science, 2(1), 25–42. https://doi.org/10.54889/issn.2744-208x.2022.2.1.25
Harwell, D. (2019, November 6). A face-scanning algorithm increasingly decides whether you deserve the job. The Washington Post. https://www.washingtonpost.com/technology/2019/10/22/ai-hiring-face-scanning-algorithm-increasingly-decides-whether-you-deserve-job/
*He, L., Qi, X., Liao, M., Cheong, I., Mittal, P., Chen, D., & Henderson, P. (2025). The deployment of end-to-end audio language models should take into account the principle of least privilege. arXiv. https://doi.org/10.48550/arXiv.2503.16833
Hillis, K., Petit, M., & Jarrett, K. (2012). Google and the culture of search. Routledge. https://doi.org/10.4324/9780203846261
*Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., & Liu, T. (2024). A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems. https://doi.org/10.1145/3703155
Jones, C. R., & Bergen, B. K. (2025). Large language models pass the Turing test. arXiv. https://doi.org/10.48550/arXiv.2503.23674
*Keat, L. C., & Ying, T. X. (2025). Artificial intelligence-based email spam filtering. Journal of Advanced Research in Artificial Intelligence & Its Applications, 2(1), 67–75. https://doi.org/10.5281/zenodo.14264139
Kreps, S., McCain, R. M., & Brundage, M. (2022). All the news that’s fit to fabricate: AI-generated text as a tool of media misinformation. Journal of Experimental Political Science, 9(1), 104–117. https://doi.org/10.1017/xps.2020.37
Lakhani, K. (2023, July 17). How can we counteract generative AI’s hallucinations? Digital Data Design Institute at Harvard. https://d3.harvard.edu/how-can-we-counteract-generative-ais-hallucinations/
Leo-Liu, J. (2023). Loving a “defiant” AI companion? The gender performance and ethics of social exchange robots in simulated intimate interactions. Computers in Human Behavior, 141, 107620. https://doi.org/10.1016/j.chb.2022.107620
Lewandowsky, S., Robertson, R. E., & DiResta, R. (2023). Challenges in understanding human-algorithm entanglement during online information consumption. Perspectives on Psychological Science, 19(5), 758–766. https://doi.org/10.1177/17456916231180809
Li, Y. A., Han, C., Raghavan, V. S., Mischler, G., & Mesgarani, N. (2023). StyleTTS 2: Towards human-level text-to-speech through style diffusion and adversarial training with large speech language models. arXiv. https://doi.org/10.48550/arXiv.2306.07691
*Lin, G., Chiang, C., & Lee, H. (2024). Advancing large language models to capture varied speaking styles and respond properly in spoken conversations. arXiv. https://doi.org/10.48550/arXiv.2402.12786
Lovato, S. B., & Piper, A. M. (2019). Young children and voice search: What we know from human-computer interaction research. Frontiers in Psychology, 10, 1–5. https://doi.org/10.3389/fpsyg.2019.00008
Luscombe, R. (2022, June 12). Google engineer put on leave after saying AI chatbot has become sentient. The Guardian. https://www.theguardian.com/technology/2022/jun/12/google-engineer-ai-bot-sentient-blake-lemoine
Manovich, L. (2002). The language of new media. MIT Press.
Matthias, M. (2023, August 25). Why does AI art screw up hands and fingers? Encyclopaedia Britannica. https://www.britannica.com/topic/Why-does-AI-art-screw-up-hands-and-fingers-2230501
*Mikalson, J. D. (2006). [Review of the book Classical Athens and the Delphic Oracle: Divination and democracy, by H. Bowden]. The Classical Review, 56(2), 406–407. https://doi.org/10.1017/s0009840x06002150
Miller, T., Paloque-Bergès, C., & Dame-Griff, A. (2022). Remembering Netizens: An interview with Ronda Hauben, co-author of Netizens: On the history and impact of Usenet and the internet (1997). Internet Histories, 7(1), 76–98. https://doi.org/10.1080/24701475.2022.2123120
Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. NYU Press. https://doi.org/10.2307/j.ctt1pwt9w5
O’Donnell, J. (2024, September 24). OpenAI released its advanced voice mode to more people. Here’s how to get it. MIT Technology Review. https://www.technologyreview.com/2024/09/24/1104422/openai-released-its-advanced-voice-mode-to-more-people-heres-how-to-get-it/
OpenAI. (2025a, January 30). Advanced voice mode FAQ. https://help.openai.com/en/articles/9617425-advanced-voice-mode-faq
OpenAI. (2025b, April 25). Response generated by ChatGPT (version 4o) [Large language model]. https://openai.com/policies/usage-policies/
Parisi, L. (2019a). Machine sirens and vocal intelligence. In S. Goodman & U. Erlmann (Eds.), Unsound undead (pp. 53–56). MIT Press.
Parisi, L. (2019b). The alien subject of AI. Subjectivity, 12(1), 27–48. https://doi.org/10.1057/s41286-018-00064-3
Pillai, M. (2024). The evolution of customer service: Identifying the impact of artificial intelligence on employment and management in call centres. Journal of Business Management and Information Systems, (special issue), 52–55. https://doi.org/10.48001/jbmis.2024.si1010
Pinker, S. (1989). Learnability and cognition: The acquisition of argument structure. MIT Press. https://doi.org/10.7551/mitpress/4158.001.0001
*Quijano, A., & Ennis, M. (2000). Coloniality of power, eurocentrism, and Latin America. Nepantla: Views from South, 1(3), 533–580. https://muse.jhu.edu/article/23906
*Quinn, K. (2014). Google and the culture of search [Review of the book Google and the culture of search, by K. Hillis, M. Petit, & K. Jarrett]. Journal of Broadcasting & Electronic Media, 58(3), 473–475. https://doi.org/10.1080/08838151.2014.935943
*Raman, R., Kowalski, R., Achuthan, K., Iyer, A., & Nedungadi, P. (2025). Navigating artificial general intelligence development: Societal, technological, ethical, and brain-inspired pathways. Scientific Reports, 15, 8443. https://doi.org/10.1038/s41598-025-92190-7
Schreibelmayr, S., & Mara, M. (2022). Robot voices in daily life: Vocal human likeness and application context as determinants of user acceptance. Frontiers in Psychology, 13, 787499. https://doi.org/10.3389/fpsyg.2022.787499
sculpting_Noise. (2025). I hear you: On human knowledge and vocal intelligence [Audio work]. SoundCloud. https://soundcloud.com/user-432639751-504934319/i-hear-you-on-human-knowledge-and-vocal-intelligence
Sheth, A., Roy, K., & Gaur, M. (2023). Neurosymbolic AI -- Why, what, and how. arXiv. https://doi.org/10.48550/arXiv.2305.00813
*Shum, H., He, X., & Li, D. (2018). From Eliza to XiaoIce: Challenges and opportunities with social chatbots. arXiv. https://doi.org/10.48550/arXiv.1801.01957
Sindoni, M. G. (2024). The femininization of AI-powered voice assistants: Personification, anthropomorphism and discourse ideologies. Discourse, Context & Media, 62, 100833. https://doi.org/10.1016/j.dcm.2024.100833
Sternberg, R. J. (2012). Intelligence. Dialogues in Clinical Neuroscience, 14(1), 19–27. https://doi.org/10.31887/dcns.2012.14.1/rsternberg
Sullivan, D. (2013, June 28). A eulogy for AltaVista, the Google of its time. Search Engine Land. https://searchengineland.com/altavista-eulogy-165366
*Sun, H., Zhao, L., Wu, Z., Gao, X., Hu, Y., Zuo, M., Zhang, W., Han, J., Liu, T., & Hu, X. (2024). Brain-like functional organization within large language models. arXiv. https://doi.org/10.48550/arXiv.2410.19542
1X Technologies. (2025, February 21). Introducing NEO Gamma. https://www.1x.tech/discover/introducing-neo-gamma
Takahashi, M., & Overton, W. F. (2005). Cultural foundations of wisdom: An integrated developmental approach. In R. J. Sternberg & J. Jordan (Eds.), A handbook of wisdom: Psychological perspectives (pp. 32–60). Cambridge University Press. https://doi.org/10.1017/CBO9780511610486.003
*Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Harvard University Press. https://doi.org/10.2307/j.ctv26070v8
Turing, A. (2004). Computing machinery and intelligence (1950). In B. J. Copeland (Ed.), The essential Turing (pp. 433–464). Oxford University Press. https://doi.org/10.1093/oso/9780198250791.003.0017
Wang, J., Ma, W., Sun, P., Zhang, M., & Nie, J. (2024). Understanding user experience in large language model interactions. arXiv. https://doi.org/10.48550/arXiv.2401.08329
Wittgenstein, L. (2009). Philosophical investigations (P. M. S. Hacker & J. Schulte, Eds.; G. E. M. Anscombe, Trans.; 4th ed.). Wiley-Blackwell. (Original work published 1953)
*Yamaguchi, S., & Fukuda, T. (2023). On the limitation of diffusion models for synthesizing training datasets. arXiv. https://doi.org/10.48550/arXiv.2311.13090
Yaqub, M. Z., & Alsabban, A. (2023). Knowledge sharing through social media platforms in the silicon age. Sustainability, 15(8), 6765. https://doi.org/10.3390/su15086765
Yeh, K.-C., Chi, J.-A., Lian, D.-C., & Hsieh, S.-K. (2023). Evaluating interfaced LLM bias. In J.-L. Wu & M.-H. Su (Eds.), Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023) (pp. 292–299). The Association for Computational Linguistics and Chinese Language Processing (ACLCLP). https://aclanthology.org/2023.rocling-1.37/
*Zou, A., Wang, Z., Carlini, N., Nasr, M., Kolter, J. Z., & Fredrikson, M. (2023). Universal and transferable adversarial attacks on aligned language models. arXiv. https://doi.org/10.48550/arXiv.2307.15043
License
Copyright (c) 2025 Moana Ava Holenstein

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright, granting the journal the right of first publication.