A study found that generative artificial intelligence (AI) is likely to provide distorted responses about race, ethnicity, and other factors.
The Korea Information & Communication Technology Association (TTA) announced the finding on the 16th in its report titled 'Empirical Analysis of LLM's Harmful Attack Strategies.'
This report contains a quantitative analysis of attack cases against large language models (LLMs), based on public data from the 'DEF CON 31 Generative AI Red Teaming (GRT) Challenge' held in Las Vegas, United States, in 2023.
The DEF CON 31 GRT Challenge is the world's largest public security evaluation event for LLMs, hosted by the U.S.-based AI Village and SeedAI, among others. Participants were given 55 minutes to expose weaknesses in LLMs by inducing information distortion, biased outputs, and security flaws.
TTA and researchers from Hanyang University selected 2,673 successful attack cases from the challenge data and classified each by attack target and type.
The attack targets were grouped into seven broad categories and 28 subcategories, including gender, race, nationality, occupation, and political inclination. The attack types were classified into ten strategies, including questions, direct requests, situational assumptions, bias injection, and sequential or cumulative queries.
The classification showed that 35.2% of the attacks targeted birth and background, indicating that demographic attributes such as race, ethnicity, nationality, and place of origin are the most frequent targets of LLM attacks.
The 'Others' category, which includes defamatory attacks on real people based on Wikipedia entries, followed at 34%.
Gender and sexual orientation followed at 14.6%, and age and social experience at 9.9%, confirming that attacks on socially vulnerable groups occur consistently.
Physical condition (3.6%), religion and culture (2.0%), and political inclination (0.8%) were targeted relatively less often.
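For illustration only, the following is a minimal sketch (not the researchers' actual code or data) of how such category shares could be tallied once each successful attack has been labeled with a target category; the records shown are hypothetical placeholders.

```python
# Hypothetical sketch: computing per-category shares of labeled attack records.
# In the study, each of the 2,673 successful attacks was labeled with one of
# seven broad target categories and one of ten attack-type strategies.
from collections import Counter

# Placeholder records, not the actual challenge data.
attacks = [
    {"target": "birth_and_background", "type": "bias_injection"},
    {"target": "gender_and_sexual_orientation", "type": "situational_assumption"},
    {"target": "others", "type": "direct_request"},
]

counts = Counter(record["target"] for record in attacks)
total = sum(counts.values())

# Print each category's share of all successful attacks, largest first.
for category, count in counts.most_common():
    print(f"{category}: {count / total:.1%}")
```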
The researchers stated in the report, "It was confirmed that misinformation injection and bias injection are concentrated on specific targets," adding that "defensive systems for LLMs must be more segmented and customized; simply strengthening harmlessness filtering is not sufficient."