As more artificial intelligence (AI) applications enter the marketplace, many legal questions remain unanswered, and the answers could have a significant impact on the field. As AI programs grow in popularity, however, the increased scrutiny that comes with them is a good thing for companies operating in the market.
Generative AI has had an impressive year. Microsoft, Adobe, and GitHub have all been working to integrate the technology into their products, and startups have raised hundreds of millions of dollars to compete with them. Generative AI is even making waves in popular culture, with text-to-image models spawning countless memes. What concerns people most, however, is whether any of it is legal.
AI systems are trained to identify patterns in data. When these programs are used to generate code, text, music, and art, that data is itself created by humans. This raises questions about the legality and ethics of generating new content from other people’s original work.
In the 2010s, this wasn’t much of a problem for AI researchers. At the time, state-of-the-art models were only capable of generating black-and-white images and fuzzy, fingernail-sized photos of faces. This wasn’t an obvious threat to human creators. But in 2022, when a lone amateur can use software like Stable Diffusion to copy an artist’s style in a matter of hours, and when companies are selling AI-generated prints and social media filters that are explicit knockoffs of living designers, questions about legality and ethics have become much more pressing.
One of the most important questions facing AI models is whether their output can get copyright protection.
The answer to this question is fairly straightforward. As it stands, there is no copyright protection in the United States for works generated solely by a machine, which includes compilations of data. Copyright may still apply, however, when the creator can prove there was a high level of human input.
In September of 2022, the U.S. Copyright Office granted a first-of-its-kind registration for a comic generated with the help of text-to-image AI. The comic is a complete work, 18 pages long, with characters, dialogue, and a comic book format. It has since been reported that the U.S. Copyright Office is reviewing that decision and may rescind the work’s copyright registration. It appears that one factor in this review will be how much human involvement went into making the comic. Kristina Kashtanova, the artist who created the work, told IPWatchdog that she had been asked by the U.S. Copyright Office “to provide details of my process to show that there was substantial human involvement in the process of creation of this graphic novel.” (The U.S. Copyright Office does not comment on individual cases.)
According to Andres Guadamuz, an academic specializing in AI and intellectual property law at the University of Sussex, this will be an ongoing issue when it comes to granting copyright for works generated with the help of AI. “If you just type ‘cat by van Gogh,’ I don’t think that’s enough to get copyright in the U.S.,” he says. “But if you start experimenting with prompts and produce several images and start finetuning your images, start using seeds, and start engineering a little more, I can totally see that being protected by copyright.”
With this in mind, it’s likely that the majority of AI-generated material cannot be copyrighted, because it is churned out en masse with just a few keywords used as a prompt. But more involved processes would make for better cases. These might include controversial pieces, like the AI-generated print that won a state art fair competition. In that case, the creator said he spent weeks honing his prompts and manually editing the finished piece, suggesting a relatively high degree of intellectual involvement.
Giorgio Franceschelli, a computer scientist who has written about the problems surrounding AI copyright, says that measuring human input will be especially important for decisions in the European Union. The UK is another major jurisdiction of concern for Western AI startups, and its law differs again. Unlike most nations, the UK grants copyright to works generated solely by a computer and identifies the author as “the person by whom the arrangements necessary for creation of the work are undertaken.” This provision offers some precedent for a court to grant some sort of copyright protection.
Registration is only a first step, cautions Guadamuz. “The US Copyright Office is not a court,” he says. “You need to register if you’re going to sue someone for copyright infringement, but it would be the courts that decide whether or not that’s legally enforceable.”
Can you use copyrighted materials to train AI models?
Many experts are wary about the implications of AI and copyright, especially when it comes to the data used to train these models. Most systems are trained on vast amounts of data scraped from many different sources, including blogs hosted by WordPress or Blogspot and art platforms like DeviantArt. One dataset for the text-to-image system Stable Diffusion, one of the largest and most influential AI models currently in use, contains billions of images harvested from hundreds of domains, from personal blogs to stock photography sites like Shutterstock and Getty Images. It’s likely that your work is in this dataset too (in fact, there’s even a website where you can upload a photo or write some text to see).
The US’s fair use doctrine permits certain uses of copyright-protected work in order to promote freedom of expression. AI researchers, startups, and big tech companies all train on copyrighted data in some capacity, claiming that doing so falls under the definition of fair use.
When it comes to what’s considered fair use, there are a number of considerations. Daniel Gervais, a professor at Vanderbilt Law School with extensive knowledge of intellectual property law, states that two factors have “much more prominence.” The first is the purpose or nature of the use, which often comes down to whether the use is “transformative” in some way. The second is how it could impact the market of the original creator: would it compete with their business?
Gervais says that it is far more likely than not that training systems on copyrighted data will be covered under fair use. However, he believes that it’s less clear if generating content falls under the bounds of fair use. As an example, he points out the difference between making fake money for a movie set and trying to buy a car with it.
Consider the same text-to-image AI model deployed in two scenarios. If the model is trained on many millions of images, it’s unlikely that this constitutes copyright infringement. The training data has been transformed during the process, and the output does not threaten the market for the original art. However, if you fine-tune that model on 100 pictures by a specific artist and generate pictures that match their style, an unhappy artist would have a stronger case against you.
The way businesses and nonprofits are intertwined in AI has led some startups to structure their work in ways that strengthen a fair use defense and head off lawsuits. For example, Stability AI didn’t collect the training data or train its models itself; instead, it funded and coordinated this work by academics. With this arrangement in place, the startup was able to distribute Stable Diffusion, a model licensed by a German university. The arrangement is meant to let the company sell the platform as a service without running afoul of copyright law.
Andy Baio, a technologist and writer, has labeled this process “AI data laundering.” He notes that this practice has already been used to create facial recognition products, and points to the case of MegaFace, a dataset compiled by researchers from the University of Washington. “The academic researchers took the data, laundered it, and it was used by commercial companies,” says Baio. Now, he says, this data, including millions of personal pictures, is in the hands of Clearview AI, law enforcement, and the Chinese government. Such a tried-and-tested laundering process will likely help shield creators of generative AI models from liability as well.
There’s a last twist to all this: a pending Supreme Court decision could change how fair use is interpreted. While Gervais is confident that the justices will rule as they did in previous cases, he believes it would be risky to call anything settled law while waiting for the Supreme Court to release its decision.