Mike Huckabee joins authors suing Microsoft, Meta over AI copyright

by newsconquest, October 19, 2023

Former Arkansas Gov. Mike Huckabee and a group of religious authors filed a new lawsuit against a group of tech companies, arguing they trained artificial intelligence tools on the authors’ books without permission.

The suit, filed Tuesday in New York federal court, is the latest in a procession of lawsuits targeting tech companies for training their AI on text they scraped from the web, a practice that has helped OpenAI, Google and others create breakthrough chatbots like ChatGPT and kick off a competitive scramble to sell AI tools.

“While using books as part of data sets is not inherently problematic, using pirated (or stolen) books does not fairly compensate authors and publishers for their work,” the plaintiffs, who include Huckabee and Christian writers and podcasters such as Tsh Oxenreider and Lysa TerKeurst, said in the lawsuit. The suit targets Meta, Microsoft and financial data provider Bloomberg L.P., all of which have trained their own “large language models,” the giant algorithms that power tools like ChatGPT, using data from the web.

The lawsuit zeroes in on an infamous collection of pirated books, known as “Books3,” which the plaintiffs allege was included in “the Pile,” a freely available collection of data sources compiled by nonprofit group EleutherAI to give smaller companies access to more data to train their own AI. The lawsuit also names EleutherAI as a defendant. The suit, a proposed class action, seeks damages and an injunction barring the companies from continuing to use the authors’ works.

A spokesperson for Microsoft declined to comment. Spokespeople for Meta, Bloomberg and EleutherAI did not respond to requests for comment.

Large language models are generally trained on billions of sentences of text pulled from the internet, including news stories, Wikipedia and comments on social media sites. OpenAI and other AI companies such as Google and Microsoft do not say specifically which data they use, but AI critics have long suspected that it includes collections of pirated books.

The battle over whether companies can take data from the internet without payment or permission to train their potentially lucrative AI models is only heating up. Multiple lawsuits from comedians, writers and artists have targeted the tech companies. Tech executives argue that taking data from the public web falls under “fair use,” a concept in copyright law that creates exemptions for works that are substantially different from the source material they may be derived from.
