Even at first glance, there’s something off about the body on the street. The white sheet it’s under is a little too clean, and the officers’ movements are totally devoid of purpose. “We need to clear the street,” one of them says with a firm hand gesture, though her lips don’t move. It’s AI, alright. But here’s the kicker: my prompt didn’t include any dialogue.
Veo 3, Google’s new AI video generation model, added that line all on its own. Over the past 24 hours I’ve created a dozen clips depicting news reports, disasters, and goofy cartoon cats with convincing audio, some of which the model invented all on its own. It’s more than a little creepy and far more sophisticated than I had imagined. And while I don’t think it’s going to propel us to a misinformation doomsday just yet, Veo 3 strikes me as an absolute AI slop machine.
Google announced Veo 3 at I/O this week, highlighting its most important new capability: generating sound to go with your AI video. “We’re entering a new era of creation,” Google’s VP of Gemini, Josh Woodward, explained in the keynote, calling it “incredibly realistic.” I wasn’t entirely sold, but then, a few days later, I had Veo 3 generate a video of a news anchor announcing a fire at the Space Needle. All it took was a basic text prompt, a few minutes, and an expensive subscription to Google’s AI Ultra plan. And you know what? Woodward wasn’t exaggerating. It’s realistic as hell.
I tried the news anchor prompt after seeing what Alejandra Caraballo, a clinical instructor at Harvard Law School’s Cyberlaw Clinic, was able to produce. One of her clips features a news anchor announcing the death of US Secretary of Defense Pete Hegseth. He’s not dead, but the clip is incredibly convincing. A post including a string of videos with AI-generated characters protesting the prompts used to create them has 50,000 upvotes on Reddit. The scenes include disasters, a woman in a hospital bed using a breathing tube, and a character being threatened at gunpoint, all with spoken dialogue and realistic background sounds. Real lighthearted stuff!
Maybe I’m being naive, but after playing around with Veo 3 I’m not quite as concerned as I was at first. For starters, the obvious guardrails are in place. You can’t prompt it to create a video of Biden tripping and falling. You can’t have a news anchor announce the assassination of the president, or even generate a video of a T-shirt-and-chain-wearing tech company CEO laughing while dollar bills rain down around him. That’s a start.
That said, you can generate some troubling shit. Without any clever workarounds I prompted Veo 3 to create a video of the Space Needle on fire. Starting with my own photo of Mount Rainier, I generated a video of it erupting with smoke and lava. Coupled with a clip of a news anchor announcing said disaster, I can see how you could seed some mischief real easily with this tool.
Here’s the better news: it doesn’t seem to be a ready-made deepfake machine. I gave it a couple of pictures of myself and asked it to generate a video with specific dialogue, and it wouldn’t comply. I also asked it to bring a pair of giant boots in a photo to life and have them walk out of the scene; it managed one boot stomping across the sidewalk with some comical crunching noises in the background.
I had an easier time generating videos when my prompts were less specific, which is how I confirmed something my colleague Andrew Marino pointed out: Veo 3 is excellent at creating the kind of lowest-common-denominator YouTube content aimed at kids.
If you’ve never been subjected to the endless pit of garbage on YouTube Kids, let me enlighten you. Imagine watching the worst 3D rendering of a monster truck driving down a ramp, landing in a vat of colored paint. Next to it, another monster truck drives down another ramp into another vat of paint, this time in a different color. Now watch that again. And again. And again. There are hours of this stuff on YouTube designed to mesmerize toddlers. These videos are usually harmless, just empty calories designed to rack up views that make Cocomelon look like Citizen Kane. In about 10 minutes with Veo 3, I threw together a clip following the same basic formula, complete with jaunty background music. But the clip that’s even more troubling to me is the two cartoon cats on a pier.
I thought it would be funny to have the cats complain to each other that the fish aren’t biting. In just a couple of minutes, I had a clip complete with two cats and some AI-generated dialogue that I never wrote. If it’s this easy to make a 10-second clip, stretching it out to a seven-minute YouTube video would be trivial. For now, clips revert to Veo 2 when you try to extend them into longer scenes, which removes the audio. But given the way Google has been relentlessly pushing these tools forward, I can’t imagine it’ll be long before you can edit a full feature-length video with Veo 3.
Honestly, I wonder if this kind of use for AI-generated video is a feature and not a bug. Google showed us some fancy AI-generated video from real filmmakers, including Eliza McNitt, who’s working with Darren Aronofsky on a new film with some AI-generated elements. And sure, AI video could be an interesting tool in the right hands. But I think what we’re most likely to see is a proliferation of the kind of bland imagery that AI is so good at generating, this time in stereo.