Replicating More than Just Holograms

Sieg Balicula
8 min read · Dec 8, 2021

Information about the Technology

What is it?

In 2020, a technology called “Deepfake” broke out of its shell as a mere research tool and into the hands of amateurs. This technology imitates a particular person by drawing on the boundless amounts of data that have been recorded and maintained, mostly on the Internet. While deepfakes have existed in the past, the flavor of the technology that pertains to this course and this moment is the vocal deepfake: people now have access to technologies that imitate a person's voice by analyzing audio files of that voice. Voice deepfaking has mostly picked up this year with the advent of a plethora of websites hosting this type of program.

From: Detecting deepfakes by looking closely reveals a way to protect against them. Phys.org. (n.d.). Retrieved December 8, 2021, from https://phys.org/news/2019-06-deepfakes-reveals.html. Shows a previous use of deepfake technology, in which celebrities would usually be placed, or “deepfaked,” onto other people (in this case, other celebrities).

How does it work?

From a user’s perspective, the program works by simply choosing a voice to imitate and then typing the words to be said. The program then constructs an audio file for the user to use however they want. Many of these programs exist and are usually free. From the perspective of a programmer, or anyone who wants to construct a new voice to deepfake, there are many open-source projects that accept audio files and use them to train the program on a new voice to mimic.
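To make that flow concrete, here is a minimal Python sketch of the user-facing loop described above. The voice catalog and the `synthesize` function are hypothetical stand-ins of my own: the neural synthesis step is stubbed out with silence, since a real service would run the text through an acoustic model trained on the chosen voice.

```python
import io
import wave

# Hypothetical voice catalog; real sites expose many trained voices.
VOICES = {"michael-jackson", "narrator", "news-anchor"}

def synthesize(voice: str, text: str) -> bytes:
    """Return a WAV file of `text` spoken in `voice`.

    The actual neural vocoder is stubbed out with silence here; a real
    service would generate audio from a model trained on that voice.
    """
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    sample_rate = 22050
    # Stub: roughly one second of audio per ten characters of text.
    n_samples = sample_rate * max(1, len(text) // 10)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)           # mono
        w.setsampwidth(2)           # 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * n_samples)
    return buf.getvalue()

# The whole user experience: pick a voice, type the words, get a file.
audio = synthesize("michael-jackson", "Hello from a cloned voice")
```

The point of the sketch is only the interface: everything the user touches is the voice name and the text, and everything hard happens inside `synthesize`.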

This is an example I made with uberduck.ai using Michael Jackson’s voice. There are many other websites that can be used.

Potential of the Technology

Any Voice as an Instrument

The ability to have AI perform the vocals for a song was already possible through platforms like Hatsune Miku or Vocaloid, but those were limited to the particular voice options they provided. With this new technology, any person’s voice can be transformed into an instrument, provided there is enough vocal data. This, of course, requires dealing with many audio files and many rounds of training before the AI generates speech properly, but, with more time, this type of technology will surely offer better ways for people to generate their own voices. Additionally, the net this technology casts is wider than any impersonator reading someone’s words off a piece of paper: it even allows people to generate the voices of the dead.

Recontextualization of Singers and Songs

One part of this semester I enjoyed was the idea of recontextualizing music through the class material we viewed, such as the covers of songs we examined in the unit on game shows. With this technology, recontextualization is given further embodiment. Each singer and each song is constructed to elicit particular feelings or fit particular genres. In other words, they fit into contexts defined by their makers, whether by the song produced or the genre a singer usually falls under. With this technology, however, people can extract the voice of a singer and mix genres freely. A user can make Tupac sing a country song, or make Willie Nelson one of the best EDM vocalists ever heard. This is not even limited to singers. Any character on a show, or any person you know, can be turned into an instrument and strummed to any beat.

This is a good example of recontextualization. This particular creator makes songs with AI and, in this one, made Eminem sing about feminism.

Preservation

As a final note on potential, as stated before, there is an argument to be made for preserving artists who have died. While it may be a bit dark to use a dead singer’s voice to profit from new, AI-generated music, or to make them say whatever anyone wants, there is still some notion of preserving the dead in that people can continue to create music in the same style, with their voice.

Fitting into Liveness — Comparing other Examined Technology

Hatsune Miku versus TTS Bill Nye

Of the technologies examined in the Liveness portion of our class, Hatsune Miku most closely resembles this one, since both exist as ways for people to make music through an artificial voice rather than a real human one. Outside of Liveness, however, differences appear. Most notably, Hatsune Miku had the potential to become a commercial success and did, spreading across the music industry both as a tool for the people and as an idol that was just a hologram. Concerts were held with Hatsune Miku as the lead singer. While it is entirely possible for Deepfake technology to be used to create music performed in front of a live audience, it probably would not be commercialized as heavily as Hatsune Miku, nor would the music be attributed to the AI voice alone, but rather to the people who used it.

Hatsune Miku in a Domino’s commercial. This showcases how Hatsune Miku became more commercialized as an idol in addition to its notoriety as a tool for making music.

More than Holograms

Another technology we saw that pertained to music was the use of holograms in musical performances. As the title of this article suggests, vocal Deepfaking replicates more than just the visage of a singer, dead or not. Deepfaking reflects the era we live in, where every interaction and every moment is recorded and placed on the Internet. Holograms exist to create a more real experience of artists who are no longer alive through artificial (and significantly less data-driven) means; this technology performs that job and more. While holograms are only paintings of the world before tragedy struck great people, Deepfaking is a paintbrush that anyone can pick up, given the Internet and time.

How People React to this Technology (and the Industry)

Implications of this Technology (a.k.a. Has Black Mirror become a reality?)

Tying into our brief examination of Black Mirror, technology like Deepfaking evokes the feeling of technology reaching points that are dangerous for people. “What if someone uses this to recreate my voice? What if my voice is used as a weapon? Are we finally trapping a copy of our consciousness in a machine and forcing it to do whatever we want, since it could respond to my whims with my voice?” Questions like these reflect how reality feels as though it is growing closer and closer to the realities in Black Mirror. Looking at recent articles, these technologies are becoming so realistic that they are less and less distinguishable from regular human speech. However, as a Computer Science major, I have to say that making Black Mirror levels of AI would require something humans may never attain: a true understanding of real intelligence.

Headline from the Daily Mail — Daily Mail. (2021, October 11). AI-generated deepfake voices can fool both smart assistants and humans with 5 seconds of training. Daily Mail Online. Retrieved December 8, 2021, from https://www.dailymail.co.uk/sciencetech/article-10081007/AI-generated-deepfake-voices-fool-smart-assistants-humans-5-seconds-training.html. Even just the headline is enough to build up the feeling that we are approaching an AI-generated apocalypse.

At the same time, however, the current uses of this technology have some less negative effects outside of music, such as using the voices to make parodies or memes. Slotting in recognized characters’ voices, rather than the usual text-to-speech bot, changes the feel of a comedic piece, almost letting the voice lead it, either by using the character as they normally act or by putting them in an opposite light.

Overall, I believe this technology may evolve the soundscapes of the average human being, shifting from the bland, single-tone text-to-speech bots we have today to any voice reading and saying anything we want to hear.

The Industry Disagrees

As raised in the previous section but not delved into, “What if my voice were used as a weapon?” is a fair question, since these AI-generated voices are getting better at replicating people. Some renowned members of the music industry, however, see the problems with Deepfake technology differently. Notably, Jay-Z does not particularly like it.

Headline from The Verge — Statt, N. (2020, April 28). Jay Z tries to use copyright strikes to remove deepfaked audio of himself from YouTube. The Verge. Retrieved December 8, 2021, from https://www.theverge.com/2020/4/28/21240488/jay-z-deepfakes-roc-nation-youtube-removed-ai-copyright-impersonation. Articles like this show how the music industry dislikes this technology. It does pose the question of whether a person’s voice can be copyrighted. I personally lean toward not.

Artists in the industry dislike the use of Deepfake technologies because of the ethical hazards they may pose to artists, such as putting words in their mouths by making the AI say offensive things.

From NPR: (2020, December 15). Latest deepfake controversy raises legal and ethical questions in music industry. NPR. Retrieved December 8, 2021, from https://www.npr.org/transcripts/946827273. Articles like this highlight what could be, and might currently be, happening with this technology. This particular article, however, makes some weak points in my opinion.

However, the person using the AI-generated voice in the Verge article makes a great argument for these technologies: the technology is equivalent to someone accurately mimicking another person’s voice naturally. Such an argument lends great support to the idea that the words that come out of these AIs belong not to the original artists, but to the person who uses the AI. Therefore, it is my opinion that there is no ethical danger to the industry. If someone makes an AI say something racist or misogynistic (outside of parody), all the blame should fall on the person who made the AI say it.

The Future of Deepfaking

More Data! MORE!

As time goes on, there will only be more data about people around the world, especially as people spread themselves further across the Internet and those already there continue to expand their influence. Though we may never completely replicate a human consciousness, the amount of data we have on people can mimic more than just their voices. Things people like, things people say, how people look, how people talk: all of these become easier to replicate as the data on the Internet expands. Hopefully, corporations do not abuse this expanse of information (*cough cough* ZUCKERBERG *cough cough*) to make AI that mimic our family members. The way data makes this technology better is straightforward: the AI sees more varied speech patterns, and each speech pattern has more samples to train on.
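Before training, that pile of audio is typically carved into train, validation, and test sets, which is what the diagram cited from Stack Exchange depicts. Here is a minimal sketch of such a split, assuming each clip is just a file name; the 80/10/10 ratio and the `clip_*.wav` names are illustrative, not from any particular project.

```python
import random

def split_dataset(clips, train=0.8, val=0.1, seed=0):
    """Shuffle clip names and split them into train/validation/test sets."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    clips = list(clips)
    rng.shuffle(clips)
    n = len(clips)
    n_train = int(n * train)
    n_val = int(n * val)
    # Whatever remains after train and validation becomes the test set.
    return (clips[:n_train],
            clips[n_train:n_train + n_val],
            clips[n_train + n_val:])

clips = [f"clip_{i:03}.wav" for i in range(100)]
train_set, val_set, test_set = split_dataset(clips)
```

The model learns only from the training set; the validation set guides tuning during training, and the held-out test set estimates how well the cloned voice generalizes to text it has never seen.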

From Stack Exchange: Clarification on train, test and val and how to use/implement it. Data Science Stack Exchange. Retrieved December 8, 2021, from https://datascience.stackexchange.com/questions/61467/clarification-on-train-test-and-val-and-how-to-use-implement-it. This is a simple diagram of the types of data sets a machine learning program would use to train itself.

We already make them dance. Why not make them sing?

Dark pun aside, as pointed out earlier in this article, there is a notion of preserving, and being able to use, the voices of dead artists. Media has already used holograms to project the dead onto the stage, so the natural next step is to make new songs with dead artists’ voices and have the holograms perform them. This is possible even now, with enough time put into editing and producing the music.

From Rolling Stone: Kreps, D. (2021, July 23). Whitney Houston’s hologram is coming to Las Vegas. Rolling Stone. Retrieved December 8, 2021, from https://www.rollingstone.com/music/music-news/whitney-houston-hologram-concert-las-vegas-residency-1201006/. Whitney Houston is STILL being used as a hologram.

Conclusion

In the future, I believe this technology will blossom into another tool in a musician’s kit. Its roots are free and casual rather than corporate, making it a prime piece of technology for amateur musical creations.
