Where Memory Ends and Generative AI Begins
In late March, a well-funded artificial intelligence startup hosted what it said was the first ever AI film festival at the Alamo Drafthouse theater in San Francisco. The startup, called Runway, is best known for cocreating Stable Diffusion, the standout text-to-image AI tool that captured imaginations in 2022. Then, in February of this year, Runway released a tool that could change the entire style of an existing video with just a simple prompt. Runway told budding filmmakers to have at it and later selected 10 short films to showcase at the fest.
The short films were mostly demonstrations of technology; well-constructed narratives took a backseat. Some were surreal, and in one instance intentionally macabre. The last film shown made the hair stand up on the back of my neck. It felt as though the filmmaker had deliberately misunderstood the assignment, eschewing video for still images. Called Expanded Childhood, the AI "film" was a slideshow of photos with a barely audible echo of narration.
Director Sam Lawton, a 21-year-old film student from Nebraska, later told me he used OpenAI's DALL-E to alter the images. He assembled a series of photos from his childhood, fed them to the AI tool, and gave it various commands to expand the images: to fill in the edges with more cows, or trees; to insert people into the frame who hadn't really been there; to reimagine what the kitchen looked like. Toss another puppy into the bathtub--why not? Lawton showed the AI-generated images to his father, recorded his befuddled reactions, and inserted the audio into the film.
"No, that's not our house. Wow--wait a minute. That's our house. Something's wrong. I don't know what that is. Do I just not remember it?" Lawton's father can be heard saying.
Where do real memories end and generative AI begin? It's a question for the AI era, where our holy photos merge with holey memories, where new pixels are generated whole cloth by artificial intelligence. Over the past few weeks, tech giants Google and Adobe, whose tools collectively reach billions of fingertips, have released AI-powered editing tools that completely change the context of images, pushing the boundaries of truth, memory, and enhanced photography.
Google dipped its toes in the water with the release of Magic Eraser in 2021. Now the company is testing Magic Editor, a feature on select Android phones that repositions subjects, removes photobombers, and edits out other unseemly elements, then uses generative AI to fill in pixel gaps. Adobe, arguably the most famous maker of creative editing software, announced earlier this week that it was putting its generative AI engine Firefly into Adobe Photoshop. The aptly named Generative Fill feature will edit photos and insert new content via a text-based prompt. Type in "add some clouds" and there they appear.
Adobe is calling it a "co-pilot" for creative workflows, which parrots the phrasing that other tech companies, such as Microsoft, are using to describe generative AI apps. It implies that you are still in total control. In this framing AI is merely offering an assist, taking over navigation when you need a bathroom break. This is something of a misportrayal when the AI is actually acting as a cartographer, redrawing the maps of your existence.
"'Perfect your memories' is perhaps the most haunting phrase I've ever read," Signal Foundation president and former Googler Meredith Whittaker tweeted in February, in response to Google's announcement that its Magic Eraser tool could now be used in videos, not just in photos. In its marketing of the tool, Google shows an image of a young girl facing a choppy sea. Nearer to the shoreline is a family of four, presumably not hers. Magic Eraser disappears them.
Let's be totally clear: We could always edit photos. Whether by scissor, razor, or paint, as long as the printed photo has existed, we've edited. Photoshop's provenance was timed to the rise of the personal computer, which, non-hyperbolically speaking, changed everything.
The first version of Photoshop launched in 1990. "Jennifer in Paradise" was the digital photo seen around the world: an image of Photoshop cocreator John Knoll's wife sitting on a beach in Bora Bora. In demos, Knoll would outline his wife using the now-famous lasso tool, then clone her. He copied, pasted, and diffused an island in the distance. "A duplicate island!" Knoll said in a video posted to Adobe's YouTube channel in 2010. An island that was not really there. A fabricated land mass.
What's different today--what generative AI is pushing boundaries on--is the speed with which these edits can be made and who can make them. "Editing tools have existed for a long time," says Shimrit Ben-Yair, the head of Google Photos. "And obviously we've been offering editing tools on Photos for a while now. As these platforms have grown their user bases, these tools become much more accessible and available to people. And edited images become more common."
In a private demonstration of Google's Magic Editor tool, which ships later this year, Ben-Yair pulled up yet another beach photo. This one featured two kids sporting wetsuits and boogie boards, with two adults in the distant background. The kids and adults have different skin tones, and the somewhat uncomfortable assumption in this demo--also emphasized by the distance between them--is that they are not family. Google's Magic Editor outlined the adults in the background, then disappeared them.
In another demo, Magic Editor erased the bag strap from a woman's shoulder as she posed in front of a waterfall, then filled in the gaps with more jacket material. Why the bag strap in a hiking photo was so bothersome, I do not know. But those aesthetic decisions are the prerogative of the photo's creator, Google says.
Adobe's Generative Fill is much more, well, generative. A long-haired corgi scampers down an empty road. That's it, that's the photo. But Generative Fill lengthens the road. It transforms barren trees into a springtime bloom. A white pickup truck appears, and whether it's driving toward the corgi or away from it changes the tension of the photo in a notable way. But, look, now there are puddles. Surely that's a happy photo? Generative AI is even smart enough to draft a reflection of the scampering pup in the puddles. It does this all in seconds. I'm blown away.
But after the astonishment comes "What now?" Suppose that is my hiking photo, my dog, my family on the beach. How will I remember that day if in the future they are only watercolor in my brain, and I increasingly turn to my photo roll for more vivid strokes? Did I actually not carry a bag while hiking? Did the pickup truck come dangerously close to my dog that day? Did I only ever vacation on pristine, private beaches?
Executives at both Google and Adobe say the power of the tools must be considered within the context of the photo. Who is taking it, who is sharing it, where it's being shared to. "I think in the context of a public space, there are different expectations than that of a photo being shared in a private space," says Ben-Yair. "If someone is sharing a photo with you via Google Photos itself or a messaging app that you use, you trust that source. And you might see the editing as something that enhances the photo, because you trust that source."
"But the more layers of abstraction there are," she continues, "where you don't know the source, then yeah, you have to think through, how authentic is this photo?"
Similarly, Andy Parsons of Adobe says there's a "continuum of use cases" for AI-edited photos. An artist (or individual who fancies themself an artist) might use generative AI to alter a photo that's meant to be a creative interpretation, not documentation. On the other hand, "if it's very critically important to know that what's being presented in the photo is a reflection of reality, such as in a news organization, we expect to see more and more photographers being required to provide transparency," Parsons says.
Parsons is something like the king of provenance at Adobe. His actual title is senior director of the Content Authenticity Initiative, a group Adobe cocreated in 2019 to establish cross-industry guidelines around content origination and media transparency. It was the doctored Nancy Pelosi video, Parsons says, in which the Speaker of the House appeared to be slurring her words, that "again, changed history." Even though the editing wasn't credited to AI, the sheer manipulation of the Pelosi video made Adobe reconsider how its powerful editing tools might be used. Adobe's earliest partners in the CAI were Twitter and The New York Times.
Then, in 2021, Adobe joined forces with the BBC, chip-makers Intel and ARM, and Microsoft to create yet another consortium for standards around "digital provenance," called the Coalition for Content Provenance and Authenticity, or C2PA. The Coalition now has more than a thousand members across various industries. At Microsoft's annual software conference this week, the company said that its Bing Image Creator will soon use C2PA-standard cryptographic methods to sign AI-generated content. (Google's Ben-Yair also says this is an "active area of work for the company that we're going to explain once we get closer to the launch of it.")
"We're all focused on the same idea," Parsons says. "We've kind of lost the arms race in detecting what may be fake. The chasm has been crossed. So the protection and countermeasure we have is knowing what model was used to capture or create an image and to make that metadata trustworthy."
In theory, these cryptographic standards ensure that if a professional photographer snaps a photo for, say, Reuters and that photo is distributed across Reuters international news channels, both the editors commissioning the photo and the consumers viewing it would have access to a full history of provenance data. They'll know if the cows were punched up, if police cars were removed, if someone was cropped out of the frame. Elements of photos that, according to Parsons, you'd want to be cryptographically provable and verifiable.
Of course, all of this is predicated on the notion that we--the people who look at photos--will want to, or care to, or know how to, verify the authenticity of a photo. It assumes that we are able to distinguish between social and culture and news, and that those categories are clearly defined. Transparency is great, sure; I still fell for Balenciaga Pope. The image of Pope Francis wearing a stylish jacket was first posted in the subreddit r/Midjourney as a kind of meme, spread amongst Twitter users and then picked up by news outlets reporting on the virality and implications of the AI-generated image. Art, social, news--all were equally blessed by the Pope. We now know it's fake, but Balenciaga Pope will live forever in our brains.
After seeing Magic Editor, I tried to articulate something to Shimrit Ben-Yair without assigning a moral value to it, which is to say I prefaced my statement with, "I'm trying to not assign a moral value to this." It is remarkable, I said, how much control of our future memories is in the hands of giant tech companies right now simply because of the tools and infrastructure that exist to record so much of our lives.
Ben-Yair paused a full five seconds before responding. "Yeah, I mean ... I think people trust Google with their data to safeguard. And I see that as a very, very big responsibility for us to carry." It was a forgettable response, but thankfully, I was recording. On a Google app.
After Adobe unveiled Generative Fill this week, I wrote to Sam Lawton, the student filmmaker behind Expanded Childhood, to ask if he planned to use it. He's still partial to AI image generators like Midjourney and DALL-E 2, he wrote, but sees the usefulness of Adobe integrating generative AI directly into its most popular editing software.
"There's been discourse on Twitter for a while now about how AI is going to take all graphic designer jobs, usually referencing smaller Gen AI companies that can generate logos and what not," Lawton says. "In reality, it should be pretty obvious that a big player like Adobe would come in and give these tools straight to the designers to keep them within their ecosystem."
As for his short film, he says the reception to it has been "interesting," in that it has resonated with people much more than he thought it would. He'd thought the AI-distorted faces, the obvious fakeness of a few of the stills, compounded with the fact that it was rooted in his own childhood, would create a barrier to people connecting with the film. "From what I've been told repeatedly, though, the feeling of nostalgia, combined with the uncanny valley, has leaked through into the viewer's own experience," he says.
Lawton tells me he has found the process of being able to see more context around his foundational memories to be therapeutic, even when the AI-generated memory wasn't entirely true.