The New York Times has filed a federal lawsuit against OpenAI and Microsoft in what might be the most consequential case of the generative AI era so far, one that could set a precedent for how AI models can be trained and rebalance the competing interests of creators and technologists.  

In this episode of The Disruption Is Now, host Greg Matusky talks with Doug Panzer, an IP attorney and partner at Caesar Rivise, about the nuances of the lawsuit, the definition of fair use, and how the case might play out, all demystified in plain English.

They compare this lawsuit to the case against Google Books, explore the most pertinent questions that need answering, and discuss how this pivotal case might affect copyright law and define the future of generative AI.


Key takeaways: 

Training data questions loom large

It remains unclear exactly how OpenAI obtained and used New York Times articles to train its AI models, training that the suit alleges enables verbatim regeneration of the text. Did OpenAI scan physical newspapers? Did it programmatically download articles by logging into the NYT website?

These unsettled questions around the process of acquiring, reproducing, and leveraging copyrighted content for AI ingestion and training carry legal significance. It seems likely the specific methods by which this data was obtained and used will come out in the course of the litigation.

Google Books case establishes relevant precedent

In the Google Books case, a pivotal fair use ruling found that Google could digitize the full text of copyrighted books without harming creators’ revenue streams or their ability to benefit from their works.

This suggests OpenAI’s ingestion of New York Times articles to advance its AI capabilities could also qualify as fair use, as long as ChatGPT and related outputs don’t directly threaten Times subscriptions or archive access.

Tensions between innovators and creators persist

As with past disruptive digital technologies, AI systems like OpenAI’s promise societal benefits and technological progress while threatening the business models of existing industries like publishing.

Judges have historically upheld innovations that advance science and the public good, though reasonable constraints often follow to balance competing interests. Expect ongoing tension between advancing AI capabilities and protecting the livelihoods of creators and industries vulnerable to disruption.

Guardrails over overhauls

Legal precedent suggests this lawsuit will likely establish helpful guardrails defining permissible uses of copyrighted content for AI training rather than prompt radical overhauls to intellectual property legislation.

Nonetheless, big open questions remain about AI’s impact on creator revenue streams and about access to the human-generated content that ongoing AI progress depends on as training data.

Key moments:

● Explaining the lawsuit and claims of copyright infringement (2:23)
● How did OpenAI train its models with New York Times content? (3:29)
● Defining fair use and its four factors (5:17)
● Applying the four-factor test to OpenAI’s training data (7:16)
● The issue of memorization and regurgitation (10:46)
● The impact of precedent and the Google Books fair use case (12:09)
● What is the commercial impact on the New York Times? (15:30)
● Predicting the outcome and impact for OpenAI (17:44)
● Do we need to rewrite copyright laws? (20:59)
● Could Microsoft acquire the New York Times? (22:24)