Jay-Z recently tried to have a YouTube video removed for copyright violations. When YouTuber Voice Synthesis used an open-source program, Tacotron 2, to digitally impersonate Jay-Z’s iconic voice saying various things and singing songs, his entertainment agency, Roc Nation LLC, claimed that the YouTuber “unlawfully uses an AI to impersonate our client’s voice” and infringes Jay-Z’s copyright. Roc Nation’s assertion of copyright protection via YouTube’s copyright strike system raises the question: with ever-evolving AI, are voices copyrightable?
Protecting One’s Voice
In short, no; a voice cannot be copyrighted. The court in Midler v. Ford Motor Co. stated that “A voice is as distinctive and personal as a face. The human voice is one of the most palpable ways identity is manifested.” The court held that not every commercial use of another’s voice violates the law; rather, a person whose voice is recognizable and widely known is protected through the right of publicity, which guards against invasion of privacy by appropriation. This protects public figures and celebrities from having their identities misappropriated and potentially misused for commercial gain. The right of publicity is the celebrity’s analog of the common person’s right of privacy. The two are protected for different underlying reasons: one allows a celebrity alone to capitalize on their fame, and the other allows a private person to remain private.
While Midler did not involve common copyright, the plaintiff asserted her rights to her recognizable voice when it was used for another’s commercial gain. 17 U.S.C.A. § 102 lists the categories of works that may be protected by copyright. The “sound recordings” category permits protection of individual recordings or groups of recordings, such as jingles or derivations from the equally protected “musical works.” Although both categories protect the underlying musical elements and lyrics involved, neither protects, in its entirety, the voice that may be heard in a recording. The words or noises a voice is making are protected, but the voice itself remains undefined. There is no embodiment of an idea in a voice: the voice in its entirety (not saying anything specific) cannot embody anything but the thoughts of the person uttering them at that very moment. Butler v. Target Corp. held that although the lyrics to a song are copyrightable, the underlying voice is not. As the “sounds are not fixed,” no copyright protection is available for the infinite number of words or phrases a person might utter in their distinctive voice.
Voice Synthesis has a nonmonetized YouTube channel, meaning it places no ads on its videos and, in turn, receives no money from them. The channel has no commercial incentive to use celebrities’ voices for financial gain. Moreover, even if the voice itself were found to be protected by copyright, the videos may be derivative enough to fall under 17 U.S.C.A. § 103, as they transform the audio into something entirely different from the original. The vocal deepfakes are essentially the same, in the channel owner’s eyes, as “someone naturally doing an (extremely accurate) impression of that celebrity’s voice.”
Help From the States
If a voice cannot be copyrighted, what protections exist against the unlawful use of a voice? For the average person, specific state laws are the only source of relief. California passed a law to prevent deepfakes, AI-generated imagery that maps one person’s face onto another’s, from being used to mislead voters during elections. Virginia similarly passed a revenge porn law that prohibits the use of deepfakes in pornography, with unanimous support in both the State Senate and House. For celebrities, trademarking the voice presents a greater opportunity for success in voice protection. 15 U.S.C.A. § 1051 provides the most open-ended route of protection, as the voice’s use could be protected across a variety of categories. This would require a person to register their voice in multiple categories, but, much like copyrighting a voice, it does not wholly prevent someone from using another’s voice. Celebrities receive the greatest amount of protection, as use of their voices is prevented via the tort of the right of publicity, which secures their ability to choose how their likeness is used for commercial gain.
As artificial intelligence continues to evolve, the law must progress as well. Revenge porn remains legal in some states, whereas others have extended their laws to cover AI-generated content. Tacotron 2, the technology mainly used for vocal deepfakes, and its equivalents for visual deepfakes exist on open-source platforms. All that is required to create a very realistic audio recording of one person saying a desired phrase is a set of clips of that person speaking, along with transcriptions. The user can then train the AI to get better at impersonating that person.
The Law Needs to Catch Up
Beyond state laws, there needs to be a push for greater protection of the common person against vocal deepfakes. What protection would a grandmother have if someone used her voice to call her grandchildren for illicit purposes? Currently, there are few protections, if any.
Image Source: Deposit Photos