The legal battle over generative AI training data has expanded to include the backbone of regional reporting. A coalition representing nearly 400 local and regional US newspapers has filed a federal lawsuit in Manhattan against OpenAI and Microsoft, claiming their copyrighted work was systematically harvested to build commercial AI products.
A New Front in the Copyright War
While national outlets and high-profile authors have previously challenged AI giants, this case marks a significant shift toward local press. Publishers such as the Arkansas Democrat-Gazette and New York Amsterdam News argue that AI training on local reporting represents a "death knell" for community journalism, which covers the essential civic duties that algorithms cannot replicate.Allegations of Systematic Scraping
As detailed by TNW Neural, the 55-page complaint alleges that OpenAI and Microsoft "systematically and secretly crawled" hundreds of news sites, including content behind paywalls. The lawsuit claims that the companies didn't just copy the text, but actively stripped away copyright management information—such as author bylines and publication notices—to facilitate the ingestion process.This removal of ownership data is central to the publishers' argument, claiming it allowed LLMs to "memorize" and reproduce journalism word-for-word without attributing credit or paying for the intellectual property.
