The software was able to render visually convincing videos of Barack Obama saying things he's said before, but in a totally new context.
In a paper published this month, the researchers explained their methodology. Using a neural network trained on 17 hours of footage of the former US president's weekly addresses, they were able to generate mouth shapes from arbitrary audio clips of Obama's voice.
The shapes were then textured to photorealistic quality and overlaid onto Obama's face in a different "target" video. Finally, the researchers retimed the target video to move Obama's body naturally to the rhythm of the new audio track. In their paper, the researchers pointed to several practical applications of being able to generate high-quality video from audio, including helping hearing-impaired people lip-read audio during a phone call or creating realistic digital characters in the film and gaming industries. But the more disturbing consequence of such a technology is its potential to proliferate video-based fake news.
Though the researchers used only real audio for the study, they were able to skip and re-order Obama's sentences seamlessly and even use audio from an Obama impersonator to achieve near-perfect results.
The rapid advancement of voice synthesis software also provides easy, off-the-shelf solutions for compelling, falsified audio.