ByteDance Accused of Violating OpenAI's Terms of Service in Developing Language Models

ByteDance, the parent company of TikTok, is reportedly breaching OpenAI's terms of service by utilizing its technology to create competing large language models. According to reports from The Verge, ByteDance is leveraging OpenAI's API to collect data for the development of its own foundational model, known internally as Project Seed.

OpenAI's terms of service explicitly prohibit using models like GPT-4 to create rival models. ByteDance is nevertheless said to be accessing OpenAI's technology through Microsoft, which enforces similar rules, and to be regularly maxing out its API access.

The Chinese company allegedly employed OpenAI's API extensively throughout Project Seed's development, encompassing training and model evaluation phases. ByteDance purportedly attempted to conceal its use of OpenAI's technology by employing data desensitization techniques, a method typically used to protect sensitive business or personal information.

Internal discussions on ByteDance's messaging platform, Lark, reportedly involve strategies to "whitewash" evidence of the company's illicit use of OpenAI's tech.

OpenAI responded to the allegations by suspending ByteDance's ChatGPT account and opening an investigation.

In a statement to The Verge, a ByteDance spokesperson emphasized the company's commitment to adhering to OpenAI's terms of use. The spokesperson clarified that while GPT powers products in non-China markets, ByteDance employs a self-developed model called Doubao for its Chinese market conversational AI system.

ByteDance acknowledged that a small group of engineers had used OpenAI's API for an internal experimental model, a practice that was halted in April. The company implemented new internal requirements to ensure that text generated by GPT models is not added to the training datasets of its self-developed models.

ByteDance stated that its engineering team now uses GPT APIs to a minimal extent during the evaluation and testing processes, such as for score benchmarking.

The rush by Chinese tech giants, including ByteDance, Baidu, and Alibaba, to build their own large language models has intensified in the wake of ChatGPT's popularity. Recently, China launched a new supercomputer dedicated to training AI models, underscoring the nation's commitment to advancing in the field.