A voice technology company which uses artificial intelligence (AI) to generate realistic speech says it will introduce extra safeguards after its free tool was used to generate celebrity voices reading highly inappropriate statements.
ElevenLabs released a so-called voice cloning suite earlier this month.
It allows users to upload clips of someone speaking, which are used to generate an artificial voice.
This voice can then be used with the firm's text-to-speech feature, which by default offers a selection of voices with various accents and can read up to 2,500 characters of text at a time.
It didn’t take long for the internet at large to start experimenting with the technology, including on the infamous anonymous image board site 4chan, where generated clips included Harry Potter actress Emma Watson reading a passage from Adolf Hitler’s Mein Kampf.
Other files found by Sky News included what sounds like Joe Biden announcing that US troops will go into Ukraine, and a potty-mouthed David Attenborough boasting about a career in the Navy SEALs.
Film director James Cameron, Top Gun star Tom Cruise, and podcaster Joe Rogan have been targeted, and there are also clips of fictional characters, often reading deeply offensive, racist, or misogynistic messages.
‘Crazy weekend’
In a statement on Twitter, ElevenLabs – which was founded last year by ex-Google engineer Piotr Dabkowski and former Palantir strategist Mati Staniszewski – asked for feedback on how to prevent misuse of its technology.
“Crazy weekend – thank you to everyone for trying out our Beta platform,” it said.
“While we see our tech being overwhelmingly applied to positive use, we also see an increasing number of voice cloning misuse cases. We want to reach out to Twitter community for thoughts and feedback!”
The company said that while it could “trace back any generated audio” to the user who made it, it also wanted to introduce “additional safeguards”.
It suggested requiring additional account checks, such as asking for payment details or ID; verifying that users hold the copyright to the clips they upload; or dropping the tool altogether and manually verifying each voice cloning request.
But as of Tuesday morning, the tool remained online in the same state.
The company’s website suggests its technology could one day be used to give voice to articles, newsletters, books, educational material, video games, and films.
Sky News has contacted ElevenLabs for further comment.
Dangers of AI-generated media
The deluge of inappropriate voice clips is a reminder of the perils of releasing AI tools into the public sphere without sufficient safeguards in place – previous examples include a Microsoft chatbot which had to be taken down after quickly being taught to say offensive things.
Earlier this month, researchers at the tech giant announced they had made a text-to-speech AI called VALL-E that could simulate a person’s voice based on just three seconds of audio.
They said they would not be releasing the tool to the public because “it may carry potential risks”, including people “spoofing voice identification or impersonating a specific speaker”.
The technology presents many of the same challenges as deepfake videos, which have become increasingly widespread on the internet.
Last year, a deepfake video of Volodymyr Zelenskyy telling Ukrainians to “lay down arms” was shared online.
That followed warnings about the technology's potential from the creator of a series of realistic, though more light-hearted, Tom Cruise deepfakes, which purported to show the actor doing magic tricks and playing golf.