Audible—the audiobook and podcast arm of Amazon, and one of the largest distributors of audiobooks in the world—announced on Tuesday that it will be “bringing new audiobooks to life through [its] own fully integrated, end-to-end AI production technology.” According to a statement from Audible’s CEO, Bob Carrigan,
Audible believes that AI represents a momentous opportunity to expand the availability of audiobooks with the vision of offering customers every book in every language, alongside our continued investments in premium original content. We’ll be able to bring more stories to life — helping creators reach new audiences while ensuring listeners worldwide can access extraordinary books that might otherwise never reach their ears.
An announcement from Audible states that the company will soon be offering—to select publishing partners only, for now—both Audible-managed and self-service AI audiobook production, from which “publishers can choose from a quickly growing and improving selection of more than 100 AI-generated voices across English, Spanish, French, and Italian with multiple accent and dialect options.” Notably, authors will also be able to “access voice upgrades for their titles as [Audible’s] technology evolves.” Finally, the company plans to begin rolling out an AI translation service, which will allegedly be able to provide complete text translations from English to Spanish, French, Italian and German.
Those are big promises. As with most generative AI products so far, which promise a lot and lean hard on the assurance of “growth” and “upgrades” to be applied once the technology improves, experts are skeptical that Audible can deliver on all of them or give listeners an experience as satisfying as listening to a human narrator.
To begin with, as James Folta of Lit Hub pointed out, the technology itself is not yet anywhere near capable of doing what Audible claims it is:
[S]ince these programs have a tendency to glitch out and invent—hallucinate, if you prefer to imagine the algorithm dreaming—then there’s a non-zero chance that these programs could edit or rewrite the book they’re translating to audio or another language. If left alone to interpret a text, who’s to say a confused large language model won’t hit a snag and start inserting other books, or Reddit AITA threads, or old Jimmy Carter speeches into your Audible copy of Madame Bovary?
Language translation is itself an art, and one that requires consideration (and I would argue human consideration) of multiple competing factors, such as the era in which the original work was written, the differences between the culture the work is derived from versus the language it is being translated into, and much more. As writer and translator Yilin Wang explains,
Translation is an art, and it takes me just as long to translate a poem as it takes for me to write an original one in English. I have to work hard to research the poet, the times they’re living in, and the literary forms they’re working in, then find creative ways to convey the spirit of their work in English. Classic Chinese poetry has many cultural idioms, archaic diction, and completely different grammar and syntactical structures to English.
But even if Audible’s AI could do everything it claims it will, that doesn’t change the fact that narration, too, is an art form honed by years of experience; the human element in narration simply cannot be so easily overlooked or mimicked. Professional narrator Kristin Atherton says that human narrators “actively sell audio content by being good at their jobs”:
The art—and it is an art—of a good audiobook is the crack in the voice at a moment of unexpected emotion, the wryness of good comedy timing, or the disbelief a listener feels when one person can convincingly be a whole cast of characters. No matter how 'human' an AI voice sounds, it's those little intricacies that turn a good book into an excellent one. AI can't replicate that.
Even the most cursory glance at social media in the days following Audible’s announcement will tell you that readers and listeners are largely uninterested in, or actively against, AI-generated narration as well. Huge numbers of readers took to Threads to express their disgust with Audible for going this route, saying they planned to cancel their subscriptions, or already had, in favor of other services such as Libby (which allows users to borrow audiobooks for free with any library card) and Libro.fm, which, at least so far, has not indicated any plans to begin using AI-generated narration.
So what is the driving force behind Audible’s decision? It’s certainly not a desire for higher-quality audiobooks or better experiences for listeners. As renowned translator Frank Wynne explained recently in an article for the Guardian,
No one pretends to use AI for translation, audiobooks, or even writing books because they are better; the only excuse is that they are cheaper. Which is only true if you ignore the vast processing power even the simplest AI request requires. In the search for a cheap simulacra to an actual human, we are prepared to burn down the planet and call it progress.
As usual, it comes down to this: greedy CEOs endeavoring to put more money in their own pockets while short-changing literally everyone else, in order to force products that no one wants upon unwitting consumers. As James Folta recently wrote,
AI is simply not very good. In addition to the massive environmental waste of AI, its energy usage, and the fact that AI models are built using stolen art, these companies have also not yet built anything that people are excited about. The only folks who seem onboard are LinkedIn MBA bros, freaks who are trying to game dating apps, and students trying to get homework done more quickly.
I’ll end this with my usual plea: For the love of art, reject AI. For all of us, from writers to artists to voiceover actors to all the readers and listeners out there. Educate yourselves on what is happening in these spaces, and reject products that are actively worse than the actual art we have all come to enjoy and expect. I don’t want you to pay for garbage, and I don’t want you to implicitly support the creation of more garbage. Instead, look into the other apps I’ve mentioned here, like Libby and Libro.fm, for your audiobook usage—both of these services are cheaper than Audible (Libby is free!). Or consider Kobo, an e-book and audiobook provider that supports local bookstores.
It’s clear that very few consumers are clamoring for a future rampant with mediocre AI-generated “content,” but that future is currently being forced down our throats by huge companies with enormously outsized influence. The choices we make, the products we consume, and the companies we support going forward are going to have a huge hand in shaping the future of AI—and, much more importantly, the future of art.
On the other hand... I recently listened to a Lex Fridman podcast interview with Indian PM Modi. Modi spoke in Hindi in the interview itself, but the audio had him speaking in English. The translator was a woman, but the voice was an excellent AI-generated version of Modi's voice. (I asked an Indian student to listen to some of it, and she said that it was a perfect imitation of Modi's speaking style, mannerisms, tone, etc.) So there's a case of AI enhancing an audio product by pairing with a human translator. Much as Garry Kasparov has written about chess, it turns out AI + human (in chess, chess program + human) can be better than either AI (chess program) or human alone. And Google's NotebookLM is terrific at generating podcast-like audio summaries of material (my students find it really helpful in reviewing material). So overreacting (which I think you are doing) risks throwing the baby out with the bathwater.