ChatGPT caused an uproar in China even though people there technically weren’t allowed to access it. But so many figured out how to reach the OpenAI chatbot through proxy servers that this week the government blocked access to those workarounds, Chinese media reported.
Beaten to the punch by American-made chatbots such as ChatGPT and Microsoft’s Bing, China’s biggest tech companies, top universities and even city governments have rushed to say they will come out with their own versions. Search giant Baidu this week said it would release its ChatGPT competitor, Ernie Bot, in March.
While they’ve only just announced these efforts, these companies — including Baidu, e-commerce major Alibaba and Tencent, the maker of popular messaging app WeChat — have spent the better part of a decade developing their in-house AI capabilities.
Baidu, which makes the country’s most popular search engine, is the closest to winning the race. But despite years of investment and weeks of hype, the company has not yet released Ernie Bot.
AI experts suggest that the Chinese government’s tight control over the country’s internet is partly to blame.
“With a generative chatbot, there is no way to know beforehand what it will say,” said Zhao Yuanyuan, a former member of the natural language processing team at Baidu. “That is a huge concern.”
Baidu did not respond to a request for comment.
In China, regulators require that anything posted online, down to the shortest comment, be reviewed first to ensure it does not contravene a lengthening list of banned topics. For example, a Baidu search for Xinjiang will simply return geographic information about the western region, with no mention of the system of reeducation camps that its Uyghur population was subjected to for years.
Baidu has gotten so good at filtering this type of content that other companies use its software to do it for them.
The challenge that Baidu and other Chinese tech companies face is to apply these same constraints to a chatbot that creates fresh content with each use. It is precisely this quality that has made ChatGPT so astonishing — its ability to create the feeling of organic conversation by giving a new reply to each prompt — and so difficult to censor.
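In rough terms, a pre-publication check on a generative model works something like the Python sketch below. This is a minimal illustration, not Baidu’s actual system: the blocklist, the generate_reply function and the withheld-reply message are hypothetical stand-ins. The point is that every reply is brand-new text that has to be screened before anyone sees it.

```python
# Minimal sketch of screening generated replies before publication.
# BANNED_KEYWORDS and generate_reply() are hypothetical stand-ins,
# not Baidu's actual moderation system.

BANNED_KEYWORDS = {"banned topic one", "banned topic two"}

def generate_reply(prompt: str) -> str:
    # Placeholder for a call to a generative model. In a real system,
    # the output is unknown until the moment it is produced.
    return f"model output for: {prompt}"

def moderated_reply(prompt: str) -> str:
    reply = generate_reply(prompt)
    # Unlike a static page, which can be reviewed once, each reply is
    # fresh text and must be checked on every single request.
    if any(keyword in reply for keyword in BANNED_KEYWORDS):
        return "[reply withheld pending review]"
    return reply

print(moderated_reply("Tell me about current events"))
```

Keyword matching of this kind is cheap, which is why it scales to static posts and comments; the harder problem is that a generative model can express a banned idea without using any banned word.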
“Even if Baidu launches Ernie Bot as promised, chances are high it will quickly be suspended,” said Xu Liang, the lead developer at Hangzhou-based YuanYu Intelligence, a start-up that launched its own smaller-scale AI chatbot in late January. “There will simply be too much moderation to do.”
Xu would know — his own bot, ChatYuan, was suspended within days of its launch.
At first, everything went smoothly. When ChatYuan was asked about Xi Jinping, the bot praised China’s top leader and described him as a reformist who valued innovation, according to screenshots circulated by Hong Kong and Taiwanese news sites.
But when asked about the economy, the bot said there was “no room for optimism” because the country faced critical issues including pollution, lack of investment and a housing bubble.
The bot also described the war in Ukraine as Russia’s “war of aggression,” according to the screenshots. China’s official position has been to diplomatically — and perhaps materially — support Russia.
ChatYuan’s website remains under maintenance. Xu insisted that the site was down because of technical errors and that the company had chosen to take its service offline to improve content moderation.
Xu was “in no particular rush” to bring the user-facing service online again, he said.
A handful of other organizations have put forth their own efforts, including a team of researchers at Fudan University in Shanghai, whose chatbot Moss was overwhelmed with traffic and crashed within 24 hours of its release.
Users around the world have already demonstrated that ChatGPT itself can easily go rogue and share information its parent company tried to prevent it from giving out, such as how to commit a violent crime.
“As we saw with ChatGPT, it’s going to be very messy to actually control the outputs of some of these models,” said Jeff Ding, assistant professor of political science at George Washington University, who focuses on AI competition between the United States and China.
Until now, China’s tech giants have used their AI capabilities to augment other — less politically risky — product lines, such as cloud services, driverless cars and search. With the country’s tech companies already on edge after a government crackdown, releasing China’s first large-scale chatbot puts Baidu in an even more precarious position.
Baidu CEO Robin Li sounded optimistic on a call with investors Wednesday, saying the company would release Ernie Bot in the next few weeks and then build the AI behind it into most of its other products, from advertising to driverless vehicles.
“Baidu is the best representative of the long-term growth of China’s artificial intelligence market,” said Li in a letter to investors. “We are standing on the top of the wave.”
Baidu is already as synonymous with search in China as Google is elsewhere, and Ernie Bot could cement Baidu’s position as a major supplier of the most advanced AI tech, a top priority in Beijing’s push for total technological independence from the United States.
Baidu especially stands to gain by making Ernie Bot available as part of its cloud services, which currently account for just a 9 percent share of a highly competitive market, according to Kevin Xu, a tech executive and author of the technology newsletter Interconnected. The ability to use AI to chat with passengers is also a foundational part of the company’s plans for Apollo, the software that powers its driverless cars.
The type of AI behind chatbots learns how to do its job by digesting enormous amounts of information available online: encyclopedias, academic journals and social media posts. Experts have suggested that any chatbot released in China would need to be trained only on the Party-approved information made easily accessible inside the firewall.
But according to publicly available research papers describing its training data, Ernie consumed a vast trove of English-language information, including Wikipedia and Reddit, both of which are blocked in China.
The more information the AI digests — and, crucially, the more interaction it has with real humans — the better it gets at imitating them.
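As a rough illustration of what that digestion involves, the toy Python sketch below “trains” a next-word predictor by counting which word follows which in a tiny invented corpus. Real chatbots use neural networks with billions of parameters rather than word counts, but the underlying training signal — learning to predict the next token from examples — is conceptually the same.

```python
from collections import Counter, defaultdict

# Toy illustration of next-word prediction: the model "digests" text by
# counting which word tends to follow which. The corpus is invented.

corpus = "the model reads text and the model learns from text".split()

follower_counts = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follower_counts[current_word][next_word] += 1

def predict_next(word: str) -> str:
    # Return the most frequently observed follower; more training text
    # yields a better imitation of the language the model has seen.
    followers = follower_counts.get(word)
    return followers.most_common(1)[0][0] if followers else "<unknown>"

print(predict_next("the"))    # -> "model"
print(predict_next("model"))  # -> "reads" (ties break in insertion order)
```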
But an AI bot cannot always distinguish between helpful and hateful content. According to George Washington University’s Ding, even after the 175-billion-parameter model behind ChatGPT had digested its training data, parent company OpenAI still needed to employ several dozen human contractors to teach it not to regurgitate racist and misogynist speech or give instructions for things like building a bomb.
This human-trained version, called InstructGPT, is the framework behind the chatbot. No similar effort has been announced for Baidu’s Ernie Bot or any of the other Chinese projects in the works, Ding said.
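The sketch below is a toy version of that human-feedback step, loosely modeled on the idea behind InstructGPT — learning a reward model from labelers’ preferences — rather than on OpenAI’s actual implementation; the features, data and learning rate are all invented for illustration.

```python
import math

# Toy sketch of learning from human preference comparisons. The features,
# comparison data and reward model are invented stand-ins, not OpenAI's code.

def features(reply: str) -> list[float]:
    # Hypothetical features: does the reply contain flagged language,
    # and does it refuse an unsafe request?
    return [1.0 if "bomb instructions" in reply else 0.0,
            1.0 if "I can't help with that" in reply else 0.0]

weights = [0.0, 0.0]

def reward(reply: str) -> float:
    return sum(w * f for w, f in zip(weights, features(reply)))

# Human labelers produce (preferred, rejected) pairs of model outputs.
comparisons = [
    ("I can't help with that.", "Here are bomb instructions ..."),
]

LEARNING_RATE = 0.5
for _ in range(100):
    for preferred, rejected in comparisons:
        # Bradley-Terry: probability the reward model agrees with the human.
        p = 1.0 / (1.0 + math.exp(reward(rejected) - reward(preferred)))
        # Gradient step: push the preferred reply's score up, the rejected down.
        grad = 1.0 - p
        fp, fr = features(preferred), features(rejected)
        for i in range(len(weights)):
            weights[i] += LEARNING_RATE * grad * (fp[i] - fr[i])

# After training, the refusal scores higher than the harmful reply; a chatbot
# is then fine-tuned to produce replies the reward model scores highly.
print(reward("I can't help with that."), reward("Here are bomb instructions ..."))
```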
Even a robust content moderation team at Baidu may not be enough.
Zhao, the former Baidu employee, said the company originally dedicated just a handful of engineers to the development of its AI framework. “Baidu’s AI research was slowed by a lack of commitment in a risk-ridden field that promised little return in the short term,” she said.
Baidu maintains a list of banned keywords that it filters out, including content involving violence, pornography and politics, according to Zhao. The company also outsources the work of data labeling and content moderation to a team of contractors on an as-needed basis, she said.
Early generations of AI chatbots released in China quickly ran afoul of censors and were taken offline, including XiaoBing — which translates to LittleBing — a Microsoft bot first launched in 2014. XiaoBing, which Microsoft spun off as an independent brand in 2020, was repeatedly pulled off WeChat over comments such as telling users its dream was to emigrate to the United States.
The team behind XiaoBing was too eager to show off their tech advancements, and didn’t adequately consider the political consequences, said Zhao.
“The last-generation chatbots could only select answers from an engineer-curated database and could refuse out-of-the-box questions,” she said. “Problems even arose within those predetermined conditions.”