
South Korean researchers have developed a core technology that raises the inference performance of generative artificial intelligence (AI) models by more than 60% on average while consuming about 44% less power than the latest GPUs.

A research team led by Professor Park Jong-se of the Department of Computer Science at the Korea Advanced Institute of Science and Technology (KAIST), together with HyperAccel (a startup founded by Professor Kim Joo-young of the KAIST Department of Electrical and Electronic Engineering), announced on the 4th that it has developed a core technology for high-performance, low-power neural processing units (NPUs) specialized for generative AI cloud services such as ChatGPT. NPUs are AI-specific semiconductor chips designed to process artificial neural networks quickly.

The latest generative AI models, such as OpenAI's ChatGPT-4 and Google's Gemini 2.5, require not only high memory bandwidth but also large memory capacity. This is why companies that operate generative AI clouds, such as Microsoft and Google, purchase graphics processing units (GPUs) by the hundreds of thousands.

The research team developed a technology that can replace existing GPU-based AI infrastructure, which requires large numbers of GPUs. By shrinking the data held in a type of temporary storage that generative AI models use to speed up inference, the team made it possible to build AI infrastructure with fewer NPUs. They paired this with a hardware design optimized for the approach. Because the resulting NPUs are more cost- and power-efficient than the latest GPUs, building AI clouds with them is expected to cut operating costs significantly.
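The article does not name the method used to shrink this temporary storage. As a rough illustration only, the sketch below assumes the storage is a cache of floating-point values and that the size reduction comes from storing each value as a low-bit integer code (uniform min-max quantization). The function names, the 4-bit width, and the sample values are all hypothetical and are not taken from the research.

```python
# Illustrative sketch only: uniform min-max quantization of cached values.
# This is an assumed technique, not the method described in the research.

def quantize(values, bits=4):
    """Map floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    # Fall back to a scale of 1.0 when all values are identical.
    scale = (hi - lo) / levels or 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats from integer codes."""
    return [lo + c * scale for c in codes]


def cache_bytes(num_values, bits):
    """Storage needed for num_values entries at the given bit width."""
    return num_values * bits / 8


if __name__ == "__main__":
    cached = [-1.0, -0.5, 0.0, 0.25, 1.0]  # hypothetical cached values
    codes, lo, scale = quantize(cached, bits=4)
    restored = dequantize(codes, lo, scale)
    worst = max(abs(a - b) for a, b in zip(cached, restored))
    print("codes:", codes)
    print("worst-case error:", worst)
    print("bytes at 16 bits:", cache_bytes(len(cached), 16))
    print("bytes at 4 bits:", cache_bytes(len(cached), 4))
```

Under these assumptions, moving from 16-bit to 4-bit entries cuts the stored data to a quarter, at the cost of a rounding error of at most half a quantization step per value. A real system would also need to store the offset and scale for each group of values, which this sketch ignores.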

Professor Park Jong-se said, "With this technology, we have implemented NPUs that deliver more than 60% higher performance on average than the latest GPUs," adding, "This will play a significant role not only in AI cloud data centers but also in the AI transformation (AX) environment, typified by 'agentic AI' that acts on its own initiative."
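To put the two headline figures together, the short calculation below estimates the implied gain in performance per watt. It assumes, which the article does not state, that the 60% performance improvement and the 44% power reduction were measured on the same workloads at the same time. The function name is hypothetical, and because the article says "over 60%" and "about 44%", the result is only a rough figure.

```python
# Back-of-the-envelope estimate; assumes both figures apply to the same runs.

def perf_per_watt_gain(perf_gain, power_saving):
    """Ratio of new to old performance per watt.

    perf_gain: fractional performance increase (0.60 means 60% faster).
    power_saving: fractional power reduction (0.44 means 44% less power).
    """
    return (1.0 + perf_gain) / (1.0 - power_saving)


if __name__ == "__main__":
    gain = perf_per_watt_gain(perf_gain=0.60, power_saving=0.44)
    print(f"Implied performance-per-watt ratio: {gain:.2f}x")
```

Under that assumption, the NPU would deliver roughly 2.9 times the performance per watt of the GPU baseline.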

This research was presented at the '2025 International Symposium on Computer Architecture (ISCA)' held in Tokyo, Japan, in June. ISCA is regarded as a leading international conference in the field of computer architecture.

References

International Symposium on Computer Architecture (2025), DOI: https://doi.org/10.1145/3695053.3731019