(From left) Kim Gang-wook, undergraduate student in the Department of Computer Science and Engineering at Seoul National University; Kim Keon-hee, professor in the Department of Computer Science and Engineering; and Lee Se-hun, doctoral student in the Department of Computer Science and Engineering./Courtesy of Seoul National University College of Engineering

Seoul National University's College of Engineering announced on the 30th that Professor Kim Keon-hee and a research team from the Department of Computer Science and Engineering developed a voice dialogue generation technology that helps artificial intelligence (AI) understand and replicate human conversational behaviors such as filler words, backchannel responses, and interruptions.

The research team noted that in voice dialogue people exhibit conversational behaviors that are not easily captured in text conversations. For example, we slip in filler words like 'um...' or 'so...', insert backchannel responses like 'right' or 'yeah' at appropriate moments, and sometimes interrupt the other person. Existing AI dialogue systems fail to reflect these subtle characteristics, so they inevitably feel unnatural and mechanical.

To simulate a real conversational environment as closely as possible, the research team built the 'Behavior-SD (Spoken Dialogue)' dataset from 100,000 dialogues totaling roughly 2,000 hours of spoken conversation. To reproduce natural conversation between people, the dataset annotates the plain utterances exchanged between speakers with a variety of finely categorized conversational behaviors.
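As an illustration only, a single Behavior-SD-style entry could pair each utterance with its behavior annotations along the lines of the sketch below; the field names are hypothetical assumptions, not the released dataset's actual schema.

```python
# Hypothetical sketch of one Behavior-SD-style dialogue entry.
# Field names are illustrative assumptions, not the released schema.
dialogue_entry = {
    "dialogue_id": "example_0001",
    "speakers": ["A", "B"],
    "turns": [
        {
            "speaker": "A",
            "text": "So... I was thinking we could move the meeting to Friday.",
            "behaviors": {
                "filler_words": ["so"],   # hesitation / speech habit
                "backchannel": False,     # short acknowledgement like 'yeah'
                "interruption": False,    # cuts in before the other finishes
            },
        },
        {
            "speaker": "B",
            "text": "Right, yeah, Friday works for me.",
            "behaviors": {
                "filler_words": [],
                "backchannel": True,
                "interruption": True,     # overlaps the end of A's turn
            },
        },
    ],
}
```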

Based on this data, the research team developed a behavior-aware dialogue generation AI model named 'BeDLM.' Built on a large language model (LLM), BeDLM takes the conversational context and the behavioral profiles of the two speakers as input and generates voice conversations that closely resemble real human dialogue. Because it can control how strongly behaviors such as backchannels and interruptions are reflected, the model overcomes the limitations of existing AI dialogue systems and produces more human-like spoken conversation.
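A minimal sketch of how an LLM-based generator could be conditioned on conversational context and per-speaker behavior profiles, in the spirit of BeDLM, is shown below; the prompt format, function name, and parameters are assumptions for illustration, not the model's actual interface.

```python
# Minimal sketch of conditioning an LLM-based dialogue generator on behavior
# profiles. The prompt layout and field names are illustrative assumptions.
def build_behavior_prompt(context: str, profile_a: dict, profile_b: dict) -> str:
    """Compose a text prompt describing how each speaker should behave."""

    def describe(name: str, p: dict) -> str:
        return (
            f"Speaker {name}: filler words={p.get('filler_rate', 'medium')}, "
            f"backchannels={p.get('backchannel_rate', 'medium')}, "
            f"interruptions={p.get('interruption_rate', 'low')}"
        )

    return (
        "Generate a natural two-speaker spoken dialogue.\n"
        f"Context: {context}\n"
        f"{describe('A', profile_a)}\n"
        f"{describe('B', profile_b)}\n"
        "Dialogue:"
    )


prompt = build_behavior_prompt(
    context="Two friends planning a weekend trip.",
    profile_a={"filler_rate": "high", "interruption_rate": "low"},
    profile_b={"backchannel_rate": "high", "interruption_rate": "medium"},
)
print(prompt)  # This prompt would then be passed to an LLM for generation.
```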

BeDLM is expected to be used widely in fields that require interaction and emotional responsiveness between people and AI, such as podcast content creation, counseling AI, and personalized voice assistants. The technology is also expected to support smoother communication between people and AI in areas such as counseling, education, and caregiving services. The Behavior-SD dataset and the code developed in this research have been released as open source.

Professor Kim Keon-hee noted, 'When people converse, they keep listening to the other person's verbal and visual responses and adapt while leading the conversation. The AI dialogue generation models developed so far have failed to reflect this, and we aimed to overcome that limitation. The significance of this research lies in advancing technology that allows AI to converse as naturally as a human does.'

The research team presented the paper at the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025), held in Albuquerque, New Mexico, from April 29 to May 4, where it received the Best Paper Award in speech processing and spoken language understanding. NAACL is one of the world's leading conferences on natural language processing (NLP), the field of AI that enables computers to understand and generate human language.

References

2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025), LINK: https://aclanthology.org/2025.naacl-long.484/