

Animate Anyone brings closer the creation of video deepfakes

At this point, the technology is too complex and buggy for widespread use, but in the world of AI, things don’t stay that way for long.

Still-image deepfakes are already good enough; soon we will have to contend with generated video as well. With Animate Anyone, you’ll be able to make a video of anyone whose picture you have.

The new generative video technology was developed by researchers at Alibaba Group’s Institute for Intelligent Computing. It’s a big step up from previous image-to-video systems like DisCo and DreamPose, which came out only this summer but already look dated.

What Animate Anyone can do is by no means unprecedented, but it has crossed that difficult gap between “ridiculous academic experiment” and “good enough if you don’t look closely”. As we all know, the next stage is simply “good enough”, where people don’t even bother to look closely because they assume it’s real. That is the stage AI-generated images and text conversations have reached, and it is wreaking havoc on our sense of reality.

Image-to-video conversion models like this one start by extracting details, such as facial features and pose, from a reference image – for example, a fashion photograph of a model in a dress for sale. A series of images is then generated showing those details in slightly different poses; the poses themselves can be animated or extracted from another video.
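The pipeline described above can be sketched as follows. This is a toy illustration of the general image-to-video recipe, not Animate Anyone’s actual code: the function names (`extract_appearance`, `render_frame`, `animate`) and data structures are hypothetical stand-ins for what would be large neural networks in the real system.

```python
# Toy sketch of an image-to-video pipeline: one reference photo plus a
# sequence of driving poses yields one generated frame per pose.
# All names here are illustrative, not the paper's implementation.
from dataclasses import dataclass


@dataclass
class Pose:
    keypoints: list  # (x, y) joint positions, e.g. extracted from a driving video


def extract_appearance(reference_image: str) -> dict:
    """Stand-in for the appearance encoder: captures identity details
    (face, clothing pattern) from the single reference photo."""
    return {"source": reference_image}


def render_frame(appearance: dict, pose: Pose) -> dict:
    """Stand-in for the generator: re-renders the reference appearance
    in a new pose. A real model must hallucinate occluded details here."""
    return {"appearance": appearance["source"], "pose": pose.keypoints}


def animate(reference_image: str, driving_poses: list) -> list:
    """Produce one frame per driving pose, all sharing one appearance."""
    appearance = extract_appearance(reference_image)
    return [render_frame(appearance, pose) for pose in driving_poses]


# The driving poses would typically come from another video, frame by frame.
frames = animate("model_in_dress.jpg", [Pose([(0, 0)]), Pose([(1, 0)])])
```

The key design point the sketch captures is that appearance is extracted once and held fixed, while only the pose varies per frame – which is exactly why preserving those appearance details across frames is the hard part.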

Previous models had shown that this could be done, but many problems remained. Hallucination was a big one: the model has to invent plausible details, such as how a sleeve or hair might move when a person turns around. That produces a lot of very strange imagery, making the resulting video far from convincing. Animate Anyone has gotten much better, although it’s still far from perfect.

The technical details of the new model are beyond most readers, but the paper highlights a new intermediate step that “allows the model to comprehensively explore relationships with a reference image in a consistent feature space, which greatly improves the preservation of appearance details.” With both major and minor details better preserved, the generated frames have a stronger reference to work from and turn out much better.

The developers demonstrate their results in several contexts. Clothing models assume arbitrary poses without deforming, and clothes do not lose their pattern. A 2D anime figure comes to life and dances convincingly. Lionel Messi makes several characteristic movements.

The resulting videos are still far from perfect – the eyes and hands in particular pose a challenge for generative models. And the poses reproduced best are those closest to the original; if a person turns round, for example, the model struggles to keep up. But it’s a huge leap over the previous state of the art, which produced far more artefacts or completely lost important details such as the colour of a person’s hair or clothing.

At this point, the technology is too complex and buggy for widespread use, but in the world of AI, things don’t stay that way for long. At least the team isn’t releasing the code to the public just yet. Although they do have a GitHub page, the developers write: “We are actively working on getting the demo and code ready for public release. While we cannot give a specific release date at this time, please be assured that the intention to provide access to both the demo and our source code is firm.”