New York Times v. OpenAI: Tech Company’s Infringement on Copyright Holders

Magaly Taylor, Contributing Member 2023-2024

Intellectual Property and Computer Law Journal

I. Introduction

The boundaries of copyright law have been challenged as technology progresses, and the current challenge is artificial intelligence. At the end of last year, The New York Times (NYT) sued OpenAI and Microsoft for copyright infringement of its published work.[1] The NYT alleges the defendants seek to “free ride” on the NYT’s massive investment by using its materials without permission or payment.[2] Additionally, the Times contends that its articles were used to train AI chatbots that now provide information drawn from those articles, putting the chatbots in competition with news outlets as a source of reliable information.[3] Infringement of newspaper articles can threaten journalism by reducing news organizations’ profits, and it can make it difficult for readers to distinguish between fact and fiction.[4]

The NYT is the first major media organization to sue OpenAI over copyright issues related to its written works.[5] However, the NYT is one of many copyright holders that have attempted to protect their copyrighted material. Authors including John Grisham; George R.R. Martin, the author of “Game of Thrones”; Sarah Silverman; and former Arkansas governor Mike Huckabee filed a class-action lawsuit last year over the use of their text in AI software.[6] Other copyright holders, such as visual artists and music producers, have filed similar lawsuits against tech companies such as Meta and Microsoft.[7] All of the complaints raise a similar issue: tech companies allegedly infringe copyrights by taking and reproducing materials without the copyright holder’s permission and monetizing them.[8]

This article will discuss the lack of clear legal boundaries between AI and copyrightable materials and potential solutions to this growing issue. Part II will discuss the background of the complaint filed by the NYT against OpenAI, copyright law and its relevant defenses, the history of AI and copyright law, and finally, the way the European Union (E.U.) has reacted to the growing challenges associated with AI. Part III will discuss the challenges of enforcing copyright over mass data, the potential consequences of limiting AI, tech companies’ defenses, and the unmet need to regulate AI.

II. Background

The Complaint

In December 2023, the Times sued OpenAI and Microsoft, alleging that they programmed their systems to take the Times’s content. This programming allegedly diverts readers from the newspaper, which depends on advertising revenue generated from its websites to continue producing quality, reliable journalism.[9] The Times further alleges that OpenAI reproduces Times articles word-for-word, and its complaint provides more than 100 examples of OpenAI using Times work.[10]

In its complaint, the NYT included an example involving a 2019 article.[11] The article was a five-part series based on an 18-month investigation that involved approximately 600 interviews and a review of thousands of records.[12] ChatGPT, a product of OpenAI, which had no role in the investigation or the article, recited large portions of the original article verbatim.[13] Further, the complaint describes the NYT asking ChatGPT for a paragraph from a paywalled article, to which ChatGPT responded by providing the first three paragraphs verbatim.[14]

The NYT alleges that the infringement makes it difficult for readers to determine whether information is fact or fiction and that it has caused the NYT financial damages.[15]

Copyright Law

Copyright is a type of intellectual property that protects original works of authorship.[16] To qualify for copyright protection, a work must meet certain prerequisites: it must fall within a protected subject matter category, be original, and be fixed in a tangible medium.[17] Copyrightable works include paintings, photographs, books, poems, movies, and journalism.[18] Under copyright law, news organizations like the NYT become copyright owners once the original work is created.[19]

Fair Use Doctrine

The Fair Use Doctrine is an affirmative defense under copyright law that promotes freedom of expression by allowing copyright-protected works to be used in certain circumstances without a license, without compensation, and without liability for infringement.[20] In determining whether a particular use is fair, the court will consider the following factors: the purpose and character of the use;[21] the nature of the copyrighted work;[22] the amount and substantiality of the portion used in relation to the work as a whole;[23] and the effect of the use on the market for or value of the work.[24] Overall, courts look at the totality of the circumstances under the four factors to determine whether the use of copyrighted material falls under the fair use doctrine.[25]

History of AI and Copyright Law

In the Southern District of New York, prior cases have sided with tech companies on the question of whether AI systems infringe copyright.[26] Courts there have previously declined to find copyright infringement where a copyright holder’s work was taken and a system released only small portions of that data to users.[27] Tech companies have relied on the Second Circuit’s 2015 decision in The Authors Guild v. Google, Inc.[28]

In The Authors Guild v. Google, Inc., the court held in favor of Google’s fair use defense, even though the copied text was not altered.[29] Google had scanned and indexed over 20 million books, including copyrighted works.[30] Users of Google Books could enter search terms and receive a list of books containing those terms, along with a brief description and a snippet view of the text.[31] Google claimed that this system made it easier for users to find works they otherwise would not have found, thus making the information accessible and ultimately benefiting the copyright holders.[32] In this case, Google did not receive payment.[33] The court held that because users saw only snippets, some loss of sales was possible, but more would be needed to make the copy an adequate competing substitute for the original.[34]

International Laws with Artificial Intelligence

At the end of last year, the E.U. agreed on the first legal framework on AI.[35] A provisional agreement on the AI Act was reached in December 2023, and its prohibitions and obligations will take effect in waves.[36] The AI Act aims to give AI developers and users clear requirements and obligations regarding the use of AI while strengthening uptake, investment, and innovation.[37] The E.U. concluded that because AI systems may gain economic and societal relevance, it is essential to regulate them through proper safeguards that ensure the systems respect fundamental rights, safety, and ethical principles.[38] The first provision includes three distinct requirements related to copyright.[39] First, the Act requires providers of general-purpose AI models to put in place a policy to respect E.U. copyright law.[40] Second, under the same provision, providers of general-purpose AI models must provide a detailed summary of the content used for training.[41] Third, the Act makes clear that using copyright-protected content requires the copyright holder’s authorization unless a relevant copyright exception applies.[42] When a rights holder has properly opted out, providers of general-purpose AI models need the rights holder’s authorization to use the copyrighted works as training data.[43]

III. Discussion

Fair Use Defense

There is rising concern over the uncompensated use of intellectual property by AI systems. In response, tech companies defend their AI training by arguing that using copyrighted material qualifies as “fair use” under copyright law.[44] In The NYT v. OpenAI, Microsoft claims that training AI models on publicly available materials is fair use under long-standing precedent, and it relies on The Authors Guild v. Google, Inc. for its defense.[45]

The NYT also says that the defendants’ chatbots bypass the newspaper’s paywalls to create summaries of its articles.[46] Although article summaries are not copyright infringement, the NYT could use them to demonstrate a negative commercial impact on the newspaper, which could challenge the fair use defense.[47] The Times stated that the defendants are taking advantage of the newspaper’s investment in journalism to build products without permission or payment.[48] The NYT argues that this is not fair use because the defendants take its information and build substitute products.[49] Fair use is decided case by case and is fact-dependent, but because of prior binding precedent, this may be an uphill battle for the Times.[50]

Potential Consequences

Challenges of Enforcement of Mass Data

The future of AI could be constrained by limits on how it uses copyrighted material to train and improve its systems. Tech companies such as OpenAI state that today’s AI models can be trained only by using copyrighted materials.[51] Allowing publishers to opt out could therefore have a significant impact. The NYT alone opting out might not cause a significant change, but giving every company that option could substantially shrink the data available for training. Another issue is that regulation is not one-size-fits-all, so lawmakers could overregulate certain areas and underregulate others.

Limitations of AI

This suit could create roadblocks for the AI industry.[52] If the court finds that OpenAI and Microsoft infringed the copyright holder’s work, the cost could significantly hamper AI development for the companies that would have to pay.[53] Limiting training data to books and materials in the public domain, created over a century ago, would not meet the needs of today’s society and could create gaps in providing accurate information. Additionally, limiting usable data to material that is not copyrighted or that tech companies have licensed by contract could very likely cause the United States to lose its position as the leader in AI development.[54]

Possible Solution: Following EU Standards

It is essential to prevent tech companies from using AI to benefit from copyright holders’ work without properly compensating them. Rules and regulations for AI training must be established and enforced. A Supreme Court decision is needed for more concrete precedent, and legislation is required to regulate new technological advancements.[55] The E.U. model offers a way to deal with fast-moving, ever-changing technology.[56]

Adopting a standard like the E.U. model would require tech companies to work with copyright holders and properly compensate them for using their copyrighted works. It would also set a standard by requiring companies to work with copyright holders before any damage from infringement occurs.

IV. Conclusion

This case, The New York Times v. OpenAI, will ultimately steer the future of AI. The NYT and other important copyright holders face an uphill battle given the history of AI and copyright law and the existing precedent. As a result, the legal system and lawmakers must get ahead of evolving technology, because they are constantly being outpaced by the tech industry.


[1] Matt O’Brien, ChatGPT-Maker Braces for Fight with New York Times and Authors on “Fair Use” of Copyrighted Works, AP News, (Jan. 10, 2024, 4:05 PM), https://apnews.com/article/openai-new-york-times-chatgpt-lawsuit-grisham-nyt-69f78c404ace42c0070fdfb9dd4caeb7.

[2] Id.

[3] Jonathan Stempel, NY Times sues OpenAI, Microsoft for infringing copyrighted works, Reuters, (Dec. 27, 2023), https://www.reuters.com/legal/transactional/ny-times-sues-openai-microsoft-infringing-copyrighted-work-2023-12-27/.

[4] Id.

[5] Cade Metz, OpenAI Says New York Times Lawsuit Against It Is ‘Without Merit’, The New York Times, (Jan. 8, 2024), https://www.nytimes.com/2024/01/08/technology/openai-new-york-times-lawsuit.html.

[6] Michael M. Grynbaum, The Times Sues OpenAI and Microsoft Over A.I. Use of Copyrighted Work, The New York Times, (Dec. 27, 2023), https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html.

[7] Blake Brittain, How Copyright Law Could Threaten the AI Industry in 2024, Reuters, (Jan. 2, 2024, 11:57 PM), https://www.reuters.com/legal/litigation/how-copyright-law-could-threaten-ai-industry-2024-2024-01-02/.

[8] Id.

[9] Hayden Field, OpenAI Alleges New York Times ‘Hacked’ ChatGPT for Lawsuit Evidence, CNBC, (Feb. 27, 2024, 1:29 PM), https://www.cnbc.com/2024/02/27/openai-alleges-new-york-times-hacked-chatgpt-for-lawsuit-evidence.html.

[10] Id.

[11] Complaint at 30, New York Times Company v. Microsoft Corp., et al, Case No. 1:23-cv-11195 (S.D.N.Y. Dec. 27, 2023).

[12] Id.

[13] Id.

[14] Id. at 33.

[15] Jonathan Stempel, supra note 3.

[16] What is Copyright?, US Copyright Office, https://www.copyright.gov/what-is-copyright/ (last visited Feb. 29, 2024).

[17] Id.

[18] Id.

[19] Id.

[20] 17 U.S.C. § 107.

[21] Id.

[22] Id.

[23] Id.

[24] Id.

[25] Id.

[26] Matt O’Brien, supra note 1.

[27] Id.

[28] Id.

[29] The Authors Guild v. Google, Inc., 804 F.3d 202, 207 (2d Cir. 2015).

[30] Id. at 208.

[31] Id. at 209.

[32] Id.

[33] Id.

[34] Id.

[35] AI Act, European Commission, https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai#:~:text=The%20AI%20act%20ensures%20that,address%20to%20avoid%20undesirable%20outcomes. (last visited Mar. 1, 2024).

[36] Id.

[37] Id.

[38] Id.

[39] Id.

[40] Id.

[41] Stas Malyarevsky, How a New York Times Copyright Lawsuit Against OpenAI Could Potentially Transform How AI and Copyright Work, The Conversation, (Jan. 17, 2024, 12:49 PM), https://theconversation.com/how-a-new-york-times-copyright-lawsuit-against-openai-could-potentially-transform-how-ai-and-copyright-work-221059.

[42] Id.

[43] Id.

[44] Blake Brittain, supra note 7.

[45] Stas Malyarevsky, supra note 41.

[46] Id.

[47] Id.

[48] Matt O’Brien, supra note 1.

[49] Id.

[50] Id.

[51] Stas Malyarevsky, supra note 41.

[52] Blake Brittain, supra note 7.

[53] Id.

[54] Michael M. Grynbaum, supra note 6.

[55] Id.

[56] Id.
