A research paper Microsoft published this week describes a new AI model called VASA-1 that can transform a single photo and an audio clip of a person into a realistic video of them lip-syncing, complete with facial expressions and head movements.
The AI model was tested with AI-generated images from generators such as DALL-E 3, which the researchers overlaid with audio clips. The result was a video of each still face speaking.
The researchers built on technologies from competitors, including Runway and Nvidia, but wrote in the paper that their method is higher quality, more realistic and “significantly better” than existing methods.
Researchers say the model can take audio of any length and generate talking faces in response to the clip.
The only non-AI-generated image the researchers tested was the Mona Lisa. They made the iconic image lip-sync to Anne Hathaway’s “Paparazzi,” which begins with the line “Yo, I’m a paparazzi, I don’t play Yahtzee.”
A screenshot of the middle frame of the video. Credit: Entrepreneur
The Mona Lisa was an example of a photo input that the AI model was not trained on but could animate anyway. The model can also handle artistic photos, singing audio, and speech in languages other than English.
The researchers highlighted the model’s ability to operate in real-time with a demo video in which the model instantly animates images using head movements and facial expressions.
Deepfakes, digitally altered media of a person that can spread misinformation or use someone’s likeness without permission, are a risk posed by advanced AI that can generate digital media from relatively few reference points.
Microsoft addressed that concern generally in its paper. The researchers wrote, “We oppose any activity that creates misleading or harmful content about real people. We are interested in applying our technology to …”
The researchers said their technology could also have positive applications, such as improving accessibility and enhancing educational activities.
Google demonstrated a similar research project last month, introducing an AI that can take a photo and create a video from it that users can control with their voice. The AI was able to add head movements, eye blinks, and hand gestures.