This tool animates any portrait photograph so that it speaks and shows expressions: this is how realistic the results are

In March of this year D-ID surprised us all with Deep Nostalgia, an AI tool that animates old photos to bring our ancestors to life. The results were practically magical. But the company did not stop there and has improved its technology even further. Now come the “Speaking Portraits”, which essentially let you add live voice and expressions to any photograph of a person whose face is seen from the front.

The great limitation of Deep Nostalgia is that its animations are preconfigured. You can choose between several styles so that the animated person makes specific expressions, but there is no complete freedom. With “Speaking Portraits”, the new tool, we control all the expressions of the deepfake.

D-ID presented the tool at TechCrunch Disrupt 2021, where it demonstrated its new capabilities in deep learning and deepfakes. The new tool manages to animate photos of faces by imitating the expressions and voice of another person who is speaking live. In the following video we can see how a woman animates the photograph of a child, making him move his head and express himself just as she does:

Full control over the animated person

The interesting part comes with the most sophisticated version of Speaking Portraits, called Trained Character. As its name implies, the AI must be trained with more data about the person to be animated. The results are also much better.

To use this improved version of the system, a photograph of the person to be animated is not enough. Instead, a video of about ten minutes is needed in which the subject performs a series of movements and expressions predefined by D-ID. In this way, the AI is trained on the characteristics of this person so that it can then make them speak and move as the user wants.

Once the data is collected, you just have to record the person who will drive the animation as they speak and move, and the animated person will begin to move and speak live as requested. Unlike the basic version, in this one the background can be animated and the result is more realistic: there are hardly any signs that it is a deepfake. D-ID shows these capabilities with a Zoom video call in which one of the participants takes control of three other faces:

What is the use of this? Beyond controlling co-workers on Zoom, it can be especially useful for extending what a person can do in specific settings. For example, when an actor is dubbed into other languages in movies, his lips can be made to match the dubbed voice. A presenter could also present any news at any time, with only his image being used while others speak in his place. It can also help make a person appear to speak multiple languages even if they don’t understand them, something we’ve already seen some political candidates do.

D-ID also has an example of a Japanese presenter animated using this tool:

Via | PetaPixel
More information | D-ID