“A newborn baby comprehends its surroundings through sight and hearing before learning letters. I wanted to create artificial intelligence (AI) that understands the world visually rather than through language.”
TwelveLabs is an AI startup founded in 2021 by three young people fascinated by multimodal AI. Multimodal AI refers to models that understand various types of information, such as photos, illustrations, and video, unlike text-only AI models.
CEO Lee Jae-sung, 31, majored in computer science at the University of California, Berkeley, and after internships at Samsung Electronics and Amazon, volunteered for the Ministry of National Defense's Cyber Operations Command in 2019 to fulfill his military service obligation. During his service, he met Kim Sung-jun, co-founder and Chief Development Officer of TwelveLabs, and Lee Seung-jun, Chief Technology Officer.
At that time, language-based models dominated the AI industry, but they believed that AI that understands context from visual data, similar to how humans perceive the world, would shape the future. The three founded the company with the 2 million won they had saved from their salaries during military service and began serious technology development after catching the attention of the U.S. accelerator Techstars.
The initial team had a total of 12 members, which is where the name TwelveLabs comes from. Currently, 40 research and development (R&D) personnel work in Korea, along with 40 business and marketing personnel in the U.S.
TwelveLabs' flagship products are the video search model 'Marengo' and the video summarization and question-answering model 'Pegasus.' Marengo can quickly search hours of video for desired scenes using text or image queries, and is used by media companies, sports organizations, and public institutions. Pegasus analyzes video content to summarize it or answer specific questions, with applications in industries such as news, advertising, and healthcare.
TwelveLabs recently attracted attention by becoming the first Korean company to deploy a model on Amazon Web Services' (AWS) generative AI platform 'Bedrock.' CEO Lee Jae-sung explained during a meeting with ChosunBiz on the 17th of last month at the company's office in Itaewon, Yongsan District, Seoul, that this contract enables hundreds of thousands of companies worldwide to use TwelveLabs' models directly. For example, the Canadian sports entertainment company 'Maple Leaf Sports' is adopting Marengo to build a system that automatically generates game highlight videos.
TwelveLabs has secured a cumulative investment of 150 billion won from companies such as NVIDIA, Intel, and SK Telecom. CEO Lee noted, “Major clients in North America are utilizing TwelveLabs, and we plan to rapidly target the European and Asian markets using Bedrock as a springboard.” He will be on stage at the 'AWS Summit Seoul 2025,' which will be held at COEX in Gangnam, Seoul, starting on the 14th. Below is a Q&A with CEO Lee.
-Is there a specific reason why the company is located in Itaewon, Yongsan?
“I thought it would be hard to focus on our core work if we were located in an area crowded with startups. I also first met my co-founders in Yongsan. We got to know each other while serving in the Ministry of National Defense's Cyber Operations Command, and we decided to work together after completing our service.”
-Your initial funding was 2 million won. What could you do with that money?
“The three co-founders had about 2 million won. However, if you are determined, you can accomplish a lot with 2 million won. Since we had no office of our own, we borrowed space at a friend's company. We also worked in cafes and used a government-supported startup center. We founded the company in March 2021, at the height of the COVID-19 pandemic. When working with partners, we held video conferences early in the morning to match U.S. time while running the business here in Korea. We operated like that for about a year and then gradually expanded the business after receiving investments.”
-Video-based AI models are unfamiliar to many. How can they be used?
“It is actively used in the media sector. For instance, Disney films hundreds of thousands of hours of footage to create a two-hour movie, and editors must sift through all of that material manually to cut the film. The TwelveLabs model can read thousands of hours of footage and find the desired scenes through a simple search.
In the public sector, it can be used for CCTV video detection, and in the mobility sector, for video understanding in autonomous driving. It won't be long before it is applied in healthcare as well. For example, in an aging society, our technology could be used in a robot that checks whether elderly parents are doing well. A model trained to recognize abnormal behavior would analyze video in real time and relay the situation to hospitals or family members.”
-TwelveLabs attracted attention by being the first Korean corporation to deploy an AI model on AWS Bedrock.
“We have had a relationship with AWS since the early days of the company, going on four years now. When we had no capital at all, the computing resources AWS provided were a great help. Today, creating a single model costs billions of won, but in our early startup days, even spending 100 million won to train a model was uncommon. AWS recognized this trend early and supported us.
Regarding our product being deployed on Bedrock, discussions started last year. Many AWS customers had large volumes of video material in the cloud that they couldn't utilize, and I understand there were numerous requests from client companies to incorporate TwelveLabs' products. With our model now on Bedrock, I expect it will have a positive impact on our company's global recognition, backed by the data security and trust customers place in AWS.”
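For readers curious what "directly using" a model on Bedrock looks like in practice, below is a minimal sketch using the AWS SDK for Python (boto3). The model ID, request fields, and S3 location are illustrative assumptions only, not confirmed details of TwelveLabs' Bedrock listing; the actual interface is defined in AWS's Bedrock documentation.

```python
# Hypothetical sketch: invoking a video-understanding model through Amazon Bedrock.
# The model ID and request body below are illustrative placeholders, not the
# actual schema of TwelveLabs' Bedrock models.
import json
import boto3

# Bedrock runtime client in a region where the model is assumed to be available
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative request: ask the model to summarize a clip stored in S3
# (bucket, key, and field names are placeholders)
request_body = {
    "inputPrompt": "Summarize the key plays in this game footage.",
    "mediaSource": {
        "s3Location": {"uri": "s3://example-bucket/game-footage.mp4"},
    },
}

response = client.invoke_model(
    modelId="twelvelabs.pegasus-example-v1",  # placeholder model ID
    body=json.dumps(request_body),
)

# The response body is a streaming object; read and parse it as JSON
result = json.loads(response["body"].read())
print(result)
```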
-OpenAI and Google are also developing multimodal AI. What differentiates TwelveLabs?
“We are not just a ‘model’ company. We create everything from video processing to video indexing technology. Instead of training the model with language, we train it with the video itself, which allows the AI to effectively recognize the flow of information. Since video is the basis, it learns images, audio, and even music. Compared to other models, we can work faster and at lower cost. Our latest product, Pegasus-1.2, showed faster response times than competing products like GPT-4o and Google Gemini 1.5 Pro.”
-I am curious about the future of video AI that TwelveLabs envisions.
“Until now, we have released products that handle video that has already been created. Going forward, we want to build products that can quickly understand video as it is being generated in real time. Our technology is advancing to the point where it can handle video for real-time streaming services. Ultimately, every device equipped with a camera will integrate TwelveLabs models to transmit compressed information in real time or connect it to the tasks that need it.”
-If you have any thoughts on the development of Korea's AI industry, please share.
“I hope we can discard the mindset of ‘we’re late, so let’s catch up.’ Creating a language model is not the end goal. I believe we need to predict what the next trend will be and lead the way to get ahead. I hope truly great AI models emerge from Korea. The know-how and knowledge gained from Korean-based models should spread throughout the Korean AI industry as global customers utilize them. Talented individuals thirsty for such knowledge are going to the U.S. I hope the government will pay more attention to the voices of startups and large corporations contributing to Korea's AI sovereignty.”