Deepfakes: Don't believe your eyes!
Forgery has been around since time immemorial. Comrades who had fallen out of favor with Stalin were erased from photographs, models are given wasp waists, and Aunt Tilda suddenly slims down thanks to Photoshop. It's therefore fair to say digital images should be viewed with a healthy dose of skepticism. So far, videos have proven more resistant to manipulation, and when they were tampered with, the changes were usually easy to spot. Researchers from Carnegie Mellon University have now developed a method that may usher in a new era of forgery: artificial intelligence that autonomously creates fakes that leave me speechless.
The technology allows content (like movement and facial expressions) from one video to be superimposed onto another - with stunningly realistic results! In one example, an Obama interview was layered on top of a Trump interview, with Trump now delivering Obama's lines in perfect sync. The developers also took the facial expressions of US television host John Oliver and applied them to his late-night colleague Stephen Colbert - down to minute details like nods, smiles and blinks. What distinguishes these videos from common fakes is that they were created almost entirely by AI, with little need for human intervention. In the past, entire teams spent countless hours manipulating historic recordings, e.g. for Forrest Gump; now, AI is taking over. Animating the corner of a mouth once required highly skilled specialists; today, computers animate entire faces (and soon whole people and complex scenes) on their own. Yes, if you look closely, you can spot minor flaws, but the technology is still in its infancy, after all. And don't forget: AI never rests; it constantly learns and improves. I bet my boss would love for me to be the same way!
Let me present an analogy to shed some light on how this kind of AI learns. Imagine criminals seeking to counterfeit money but lacking most of the required technical skills. Their first amateurish bills enter circulation - and are quickly spotted by the police, who then issue a press statement outlining how to spot the counterfeits. The counterfeiters study the statement, recognize their mistakes and create the next iteration of forged money. The police again detect the bills, this time after more extensive research, issue another press statement, and the next cycle begins. Two adversaries are contesting with each other, and every round generates new, insightful data - which is why this approach is called a "generative adversarial network" (GAN). The generator (the counterfeiters) forges money, the discriminator (the police) detects the errors and issues a statement, which is then analyzed before the next round commences. Training is considered complete only once the discriminator can no longer tell the forgeries from the real thing.
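For the technically curious, here is a minimal sketch of that counterfeiter-versus-police loop in PyTorch. The "genuine money" is just a simple 1-D Gaussian distribution, and every name and hyperparameter below is an illustrative assumption rather than anything taken from the research mentioned above.

```python
# Minimal GAN sketch: the generator ("counterfeiters") forges samples,
# the discriminator ("police") learns to tell them from real ones.
import torch
import torch.nn as nn

torch.manual_seed(0)

real_mean, real_std = 4.0, 1.25   # the "genuine money" distribution (assumed)
noise_dim = 8                     # random input the generator starts from

generator = nn.Sequential(        # counterfeiters: noise -> fake sample
    nn.Linear(noise_dim, 16), nn.ReLU(),
    nn.Linear(16, 1),
)
discriminator = nn.Sequential(    # police: sample -> probability it is real
    nn.Linear(1, 16), nn.ReLU(),
    nn.Linear(16, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(5000):
    # "Police" update: learn to separate real bills from forgeries
    real = torch.randn(64, 1) * real_std + real_mean
    fake = generator(torch.randn(64, noise_dim)).detach()
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # "Counterfeiter" update: adjust the forgeries to fool the police
    fake = generator(torch.randn(64, noise_dim))
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# After training, the forged samples should resemble the real distribution.
print("forged samples:", generator(torch.randn(5, noise_dim)).flatten().tolist())
```

The same tug-of-war, scaled up to millions of parameters and trained on images or video frames instead of numbers, is what produces the deepfakes described above.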
[Embedded video: https://www.youtube.com/embed/ehD3C60i6lw]
These networks can even be creative in their own right! On October 25, 2018, the famous auction house Christie's in New York auctioned off the "Portrait of Edmond de Belamy" for $432,500. True, there are more expensive pictures and more famous artists, yet the auction caused quite a sensation. That's because Edmond Belamy is not human - he never existed! A generative adversarial network had dabbled in painting, with great success. Fittingly, the painting bears not a signature but part of the mathematical formula of the algorithm used during its creation. And we're already one step further. Songs composed entirely by AI are on the market, and AI-driven movies, computer games and software for self-driving cars are in the works. Naturally, the use of AI can always be limited to certain aspects of a project, if needed. Picture a computer game that has players roam through gigantic virtual worlds. Until now, designing these landscapes was incredibly labor-intensive. For example, the game "Just Cause 4" features a freely explorable world of 1,024 square kilometers with intricate details, from bumpy roads to single shrubs and various kinds of wildlife. Its creation could now be left largely to AI, with quality assurance as the only human element in the development process. This could save millions in development costs.
Still, despite the positives, there's always potential for abuse - namely deepfakes in the form of images or videos that look deceptively real. As the aforementioned Obama and Trump example shows, it's already difficult to identify fakes today. And, bearing in mind the pernicious effects fake news has already had on social networks, this technology could have fatal consequences in the wrong hands. How quickly could a fake video depicting a violent crime bring down a politician? How long would it take intelligence agencies to manufacture videos of crimes allegedly committed by an opposing state? And how long would it take us to get the nasty images out of our heads should they later turn out to be fake? It seems we'll have to face these questions very soon. AI developers are aware of this danger and are trying to develop new analytical methods, in parallel with their AI research, to detect deepfakes. I fear that, despite their efforts, not everything can be stuffed back into the Pandora's box this technology has already opened. So be wary when you encounter videos of Trump singing the Russian national anthem at the top of his lungs or the Pope boldly tap-dancing in front of a huge crowd!
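To give a rough idea of what the detection side might look like, here is a small, purely hypothetical sketch: a tiny convolutional network trained to label individual video frames as real or fake. Real deepfake detectors are far more elaborate; the model, the dataset mention and all parameters below are my own assumptions for illustration, not the methods of any particular research group.

```python
# Hypothetical sketch of a frame-level real-vs-fake classifier.
import torch
import torch.nn as nn

detector = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # 128x128 -> 64x64
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 64x64 -> 32x32
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1),                                       # logit: fake vs. real
)

optimizer = torch.optim.Adam(detector.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

def train_step(frames, labels):
    """frames: (N, 3, 128, 128) tensor; labels: (N, 1) with 1 = fake, 0 = real."""
    logits = detector(frames)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# Dummy batch just to show the call; a real setup would load labelled frames
# from a dedicated corpus (e.g. a dataset such as FaceForensics++).
frames = torch.rand(8, 3, 128, 128)
labels = torch.randint(0, 2, (8, 1)).float()
print("loss:", train_step(frames, labels))
```

The catch, of course, is that detectors like this can themselves be used as the "police" in the next generation of GANs - which is exactly why the arms race described above is so hard to win.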
What I would like to know: Will you approach "revealing" videos online with more distrust from now on? Or do you only believe half of what you see anyway?