Reddit was launched in 2005 by current CEO Steve Huffman and Alexis Ohanian as a place where users submit content — news articles, funny photos, cat videos, random observations — to be supported (“upvoted”) or buried (“downvoted”).
Since then, Reddit has become best known as a collection of communities called “subreddits,” where people discuss everything from the Vancouver food scene to serious personal struggles.
“People go there to literally get support to help them stop drinking, to help them stop doing drugs, to connect with people who are going through similar challenges,” said Sarah Gilbert, research director at Cornell University’s Citizens and Technology Lab. “Because it’s largely pseudonymous, that allows you to participate a little more openly.”
Even if you’re not a die-hard Redditor, there’s a chance the site has helped you, because its years of posts on basically every subject frequently appear in search results. No wonder, then, that Google and companies like it have found Reddit to be a treasure trove of training data for chatty AI models, something that has naturally rubbed some users the wrong way.
“Reddit users highly value privacy, which makes sense, given that most people contribute pseudonymously,” Gilbert said in a statement earlier this year. “We also found that how they feel about data use varies by context. For example, we found that users would be uncomfortable with use of private data, such as direct messages.”