Replicating More than Just Holograms
Information about the Technology
What is it?
In 2020, a technology called "Deepfake" broke out of the shell of being a mere research tool and into the hands of amateurs. The technology imitates a particular person by drawing on the boundless amounts of data recorded and maintained about them, mostly on the Internet. While deepfaking has existed for some time, the flavor that pertains to this course and this moment is the vocal deepfake, where people now have access to technologies that imitate a person's voice by analyzing audio recordings of it. Voice deepfaking has mostly picked up this year with the advent of a plethora of websites hosting this type of program.

How does it work?
From a user's perspective, the program is simple: choose a voice to imitate, then type the words to be said. The program constructs an audio file that the user can do whatever they like with. Many such programs exist, and most are free to use. From the perspective of a programmer, or anyone who wants to construct a new voice to deepfake, there are many open-source projects that can be fed audio files, training the program to mimic a new voice.
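To make the user's side of this concrete, here is a minimal sketch using the open-source Coqui TTS library, one example of the kind of project described above. The model names and exact calls are assumptions based on that library and may differ across versions; this illustrates the general workflow, not the internals of any particular website.

```python
# pip install TTS  (the open-source Coqui TTS package; one of many
# similar projects, used here purely as an illustration)
from TTS.api import TTS

# Step 1: pick a voice. Here we load a pretrained English model;
# a website would present this choice as a dropdown of voices.
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC", progress_bar=False)

# Step 2: type the words to be said. The library synthesizes the
# speech and writes it to an audio file the user can take anywhere.
tts.tts_to_file(text="Any words you want this voice to say.",
                file_path="deepfaked_line.wav")

# Voice-cloning variants of the same idea take a short reference
# recording of the target speaker instead of a fixed voice, e.g.:
# tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts")
# tts.tts_to_file(text="...", speaker_wav="reference_clip.wav",
#                 language="en", file_path="cloned_line.wav")
```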
Potential of the Technology
Any Voice as an Instrument
The ability to have AI perform the vocals for a song was made possible in the past by platforms like Vocaloid, with voices such as Hatsune Miku, but those were limited to the particular voice options they provided. With this new technology, anyone's voice can be transformed into an instrument, provided there is enough vocal data. Today this requires wrangling many audio files and many trials of training before the AI generates speech properly (a sketch of what that data preparation looks like follows below), but, with time, this type of technology will surely offer better ways for people to generate their own voices. Additionally, the net of voices this casts is wider than any actual person repeating someone's words from a piece of paper, because it allows people to generate the voices of the dead.
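To give a feel for the "many audio files" part, here is a minimal sketch of preparing training data for one of the open-source projects mentioned earlier. The directory layout and file names are hypothetical; the pipe-delimited metadata file is the LJSpeech-style format that many open-source TTS training recipes expect.

```python
import csv
from pathlib import Path

# Hypothetical layout: clips/ holds short WAV files of the target voice,
# transcripts/ holds matching .txt files with what is said in each clip.
CLIPS_DIR = Path("clips")
TRANSCRIPTS_DIR = Path("transcripts")

rows = []
for wav in sorted(CLIPS_DIR.glob("*.wav")):
    txt = TRANSCRIPTS_DIR / (wav.stem + ".txt")
    if not txt.exists():
        continue  # skip clips without a transcript; the trainer can't use them
    text = txt.read_text(encoding="utf-8").strip()
    # LJSpeech-style row: file id | raw transcript | normalized transcript
    rows.append([wav.stem, text, text])

# Write the index file the training recipe reads to pair audio with text.
with open("metadata.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f, delimiter="|").writerows(rows)

print(f"Wrote {len(rows)} clip/transcript pairs to metadata.csv")
```

From here, the "many trials of training" is a matter of pointing the project's training script at this metadata file and iterating until the generated speech sounds right.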
Recontextualization of Singers and Songs
One part of this semester I enjoyed was the idea of recontextualizing music, which came through in class material such as the song covers we encountered when examining game shows. With this technology, recontextualization is given further embodiment. Each singer and each song is constructed to elicit particular feelings or to fit a particular genre; in other words, they fit into contexts defined by their makers, whether through the song produced or the genre a singer usually falls under. With this technology, however, people can extract the voice of a singer from that context, enabling the mixing of genres. A user can make Tupac sing a country song, or turn Willie Nelson into one of the best EDM vocalists ever heard. Nor is this limited to singers: any character on a show, or any person you know, can be turned into an instrument and strummed to any beat.
Preservation
As a final note on potential: as stated before, there is an argument that this technology can preserve artists who have died. It may be a bit dark to use a dead singer's voice to profit from new, AI-generated music, or to make them say whatever anyone wants, but there is still some notion of preserving the dead in that people can continue to create music in their style and with their voice.
Fitting into Liveness — Comparing other Examined Technology
Hatsune Miku versus TTS Bill Nye
Of the technologies examined in the Liveness portion of our class, Hatsune Miku most closely resembles vocal deepfaking, since both let people make music through an artificial voice rather than a real human one. Outside of Liveness, however, differences appear. Most notably, Hatsune Miku had the potential for commercial success and achieved it, spreading across the music industry both as a tool for the people and as an idol that was just a hologram; concerts were held with Hatsune Miku as the lead singer. While it is entirely possible for deepfake technology to be used to create music performed in front of a live audience, it would probably not be commercialized to the same degree as Hatsune Miku, nor would the music be attributed to the AI voice alone rather than the people who utilized it.
More than Holograms
Another technology we saw that pertained to music was the use of holograms in musical performances. As the title of this article suggests, vocal deepfaking replicates more than just the visage of a singer, dead or not. Deepfaking reflects the era we live in, in which every interaction and every moment is recorded and placed on the Internet. Holograms served to create a more real experience of artists who are no longer alive through artificial (and significantly less data-driven) means; this technology performs that job and more, precisely because of that reflection. Where holograms are only paintings of the world before tragedy struck great people, deepfaking is a paintbrush that anyone can materialize, given the Internet and time.
How People React to this Technology (and the Industry)
Implications of this Technology (a.k.a. Has Black Mirror become a reality?)
Tying into our brief examination of Black Mirror, technology like deepfaking evokes the feeling of technology reaching points that are dangerous for people. "What if someone uses this to recreate my voice? What if my voice is used as a weapon? Are we finally trapping a copy of our consciousness in a machine and forcing it to do whatever we want, since it can respond to my whims in my own voice?" Questions like these reflect how reality seems to grow closer and closer to the realities in Black Mirror. Looking at recent articles, these technologies are becoming so realistic that they are less and less distinguishable from ordinary human speech. However, as a Computer Science major, I have to say that making Black Mirror levels of AI would require something humans may never attain: a true understanding of real intelligence.

At the same time, the current uses of this technology have some less negative effects outside the topic of music, such as using the voices to make parodies or memes. Putting a recognized character's voice, rather than the usual text-to-speech bot, into a comedic piece changes its feeling, almost letting the voice lead the joke, whether the character is played as they would normally act or cast in the opposite light.
Overall, I believe this may evolve the soundscapes of the average human being, shifting from the bland, single-tone text-to-speech bots we have today to any voice saying to us anything we want to hear.
The Industry Disagrees
"What if my voice is used as a weapon?" was raised in the previous section but not delved into, and it brings up a good point, since these AI-generated voices are getting better and better at replicating people. Some renowned members of the music industry take issue with the existence of deepfake technology; notably, Jay-Z does not particularly like it.

Artists in the industry object to the use of deepfake technologies because of the ethical hazards they may pose to artists, such as putting words in their mouths by making the AI say offensive or damaging things.

However, the person using the AI-generated voice in the Verge article makes a strong argument for these technologies: the technology is equivalent to someone accurately mimicking another person's voice naturally. That argument supports the idea that the words that come out of these AIs belong not to the original artists but to the person who utilizes the AI. It is therefore my opinion that there is no ethical danger to the industry here. If someone makes an AI say something racist, misogynistic, or sexist (outside of parody), all the blame should fall on the person who made the AI say it.
The Future of Deepfaking
More Data! MORE!
As time goes on, there will only be more data about people around the world, especially as newcomers spread themselves across the Internet and those already there continue to expand their influence. Though we may never completely replicate a human consciousness, the amount of data we have on people can mimic more than just their voices. The things people like, the things people say, how they look, how they talk: all of these become easier to replicate as the data on the Internet expands. Hopefully, corporations do not abuse this expanse of information (*cough cough* ZUCKERBERG *cough cough*) to make AI that mimic our family members. How data makes this technology better is obvious: more speakers give the AI more varied speech patterns, and more recordings per speaker give each pattern more pieces to train on.

We already make them dance. Why not make them sing?
Dark pun aside, as pointed out in this article, there is a notion of preserving, and having the ability to use, the voices of dead artists. Media has already used holograms to project the dead onto the stage, so the natural next step is to make new songs with dead artists' voices and have the holograms perform them. This is possible even now, with enough time put into editing and producing the music.

Conclusion
In the future, I believe this technology will blossom into another tool that a musician can use. Its roots are free and casual rather than corporate, making it a prime piece of technology for amateur creations of musical pieces.