Meta stole my book and used it to train A.I.
Losing something that I know I'll never get back...
Edit: This post is about finding out that Meta used my memoir, We Heard it When We Were Young, as fodder to train its large language model A.I. It also contains parts of a fictional short story idea I’ve had for more than a decade. Since learning about this Meta incident I’ve thought more and more about this short story and decided to flesh it out a bit for this post. It’s about a mouthpiece for a tech company unveiling the latest evolution in its artificial intelligence. A device that they claim can tell a story as well as any author, living or dead. The italics are the fictional short story. If you just want the “real” account of what happened, feel free to skip the italics.
The launch
In darkness the man waits in the wings. He stands facing forward inches behind black fabric. Focuses on the nothing before him while shutting out the hum of the at-capacity crowd a few feet from the stage. In a few seconds it will be his cue to pivot off his right foot and start his stride with a half-step of his left. To walk in time with a musical flourish, tracked by a spotlight manned by a team of light technicians. He still faces forward. He wants to take in that black nothingness of the fabric before him. Wants to imagine a vastness that only he can create. The cue. A pivot. The rapture brings its applause. He walks and waits the appropriate amount of time before the cheers dissipate and he can talk. Six steps. The executive team had hoped for ten. The fingers on both his hands fuse together at their tips, but only for a moment. That’s a Musk move and he soon interlaces all but the index and thumb, a variation he hopes will catch on by the next round of TED talks. “Imagine, with me,” he says at his mark on stage right. “Imagine the most wondrous story. Full of adventure and laughter. Pathos and complexity. Imagine a story that can stand among the greats. A story from an author with a command of language that not only understands Shakespeare. But can play with that language in ways we haven’t seen since Shakespeare.” At these last lines an image of the playwright engulfs the scrim behind the man. When he finishes his lines the image twists and morphs. It dissolves into a small rectangular machine. A machine engineered within a millimeter of its aesthetic beauty. Hours of crunch devoted to the curves of its bezels. It is a dark gunmetal black that is actually a very dark blue. Hexadecimal color code #1f262a. There were almost two separate fistfights over that shade of blue. “Imagine the greatest story ever told. Now imagine that ‘it’ can imagine that too.” The scrim snaps off, fluttering off the stage in a dance of folds within itself.
It's carried off by a small gaggle of stagehands. What’s left is a podium. The man walks to the podium. To the machine perched upon it. And he picks it up…
Last Thursday, The Atlantic released a searchable database of LibGen, a collection of millions of books and scientific papers. A group of authors accused Meta of using the database to train its large language model A.I. Per this January Reuters article, “The authors asked the court on Wednesday for permission to file an updated complaint. They said new evidence showed Meta used the AI training dataset LibGen, which allegedly includes millions of pirated works, and distributed it through peer-to-peer torrents.”
Like a lot of other people, I read the news among the onslaught of other headlines. Uttered another, “That’s messed up…” among the Signalgates, Cybertruck recalls, and Tariff double-downs of our scrolls of doom. But not too long after I scrolled past the headline my coworker got my attention. Some context on the exchange. Said coworker also happens to be the News Director for the magazine I work at, and his desk is right next to mine. We spend our work hours clacking away at our keyboards then coming up for air, usually for a snarky comment or aside related to whatever story we’re working on. It seems like the closest a place can get to the concept of a journalistic “bullpen” nowadays. “You see The Atlantic published the LibGen database and you can search by author? Can you guess which author and their book is in there?” I played coy. In half-disbelief. “Nope, I can’t think of anyone.” “Really? I think you might have heard about this book,” said with much inflection and askancing of tones.
I couldn’t play it off anymore. I don’t remember my exact words, but I know I accentuated them with a healthy amount of expletives. A new tab was already open. The search field queried…an ENTER and there it was. Little did I know that this search would re-contextualize my relationship with my writing. That it would inform my teaching of storytelling. And have me revisit one story I’ve had in the chamber for more than a decade. (Also, if you want the straight dope on my experience feel free to skip the fiction=italics. Though I promise the text in full is worth it.)
Of course, the machine had no need for physical form. At least, not form in the sense that the man would need to pick it up and parade it around on stage. It was a machine in the sense of it being a large language model, which, in essence, is hundreds of thousands of chips processing billions of computations while scraping petabytes of data from other machines connected through a network of cables, satellites, and various other communication channels. But it was important to the man on the stage to have a representation of all of the above in his hands. He told the development and marketing teams, "We are monkeys. We need to see the sticks in order to put down the baby in our arms. We need to hold the sticks in our hands before we realize we can rub them together to create fire." There were one hundred and forty-six prototypes before they landed on the final design. The designs were iterative. First they based it off a typewriter. Then there was the period when the design team got hung up on something like a late-’90s monitor with an MS-DOS inspired interface. Whole days were spent arguing on the intervals in which the cursor would blink before a story was made manifest. Manifest stories. That’s the way the man phrased it then and that's how he phrased it on stage. This was the evolutionary step that necessitated a launch. To differentiate from, and he used this word derogatorily, home assistants. This wasn't something you would bark commands at while doing your dishes. You wouldn't dare ask this how long it takes to cook chicken breast in the air fryer while you measured out the rice for the rice cooker. This was a storyteller. No. He pauses. Just long enough for the effect. Not quite two and a half Mississippis. Meet Helios. He is THE storyteller. That was it. The moment. The man knew that he had arrived at the moment. All gas no brakes now. This will be the headlines and bylines. This will be the TikToks and Shorts.
The man held Helios aloft while a marketing video played on massive screens flanking the stage.
The revelation that Meta had scraped my memoir bothered me more than I thought it would. I had already heard the news of other authors, some of whom I knew personally, who had their work taken. But it felt how a lot of other news feels, something I am removed from, happening to other people. In the hours following my co-worker telling me about the situation I felt ill-at-ease. It was hard to concentrate on my day-to-day work. I felt a pang in my stomach, like when you hear bad news that necessitates the various stages of grief. I kept on going to The Atlantic’s website to search my name and see if my book would come up again. And it would. My book. My story. This was a story that took me years to process. From childhood trauma to the ever-present imposter syndrome that I had to overcome to write the damn thing. And in a flash something someone named Llama cannibalized the entire story. Scraped as a part of a dataset. Converted into numbers and processed to teach itself patterns between words in a language. I knew this was already happening. Hell, a part of me knows that there are large swaths of my writing in this Substack already scraped. But this felt different. This bigger than me thing came and took this extension of me, my story, and real lived experiences. And who was I? I’m small potatoes! I thought my story would fly under the radar. Yeah they would take the Stephen Kings and JK Rowlings, but I’m just a dad in the Midwest who wrote about my hometown. I had to leave my office and walk around the block to clear my head.
While walking I thought of a writing exercise I often lead in group settings. We create a “memory map.” A map of an individual’s experiences but one allowed to shift and morph. The landscape remixed and recontextualized based on my prompts. Here are the first few prompts I give.
In the center of your page draw your childhood home. The home you think of when you can first remember memories.
Somewhere on your map draw a place where your family would go for food. It could be a grocery store. Or maybe a favorite restaurant. Maybe it’s the food bank where you would line up or a neighbor’s house where you’d go for a meal.
Draw a place where you saw blood. Where in one moment there wasn’t blood and then suddenly there was.
More times than not the blood prompt is the turn. When there might be some murmurs from the group. When it goes from a quaint stroll down memory lane to maybe something else. We keep going through prompts. As we go, the prompts become more specific. More strange. More human.
A place where someone revealed a secret.
A place where someone doubled over in sadness. Where grief caused their body to physically react.
A place that reminds you of the passage of time. Maybe it’s a building that is now old and rotting? Maybe it’s the vacant lot where that building was razed. Maybe it’s your school?
Here is why, on my walk around the block, I was thinking of this exercise. I always try to go into this exercise with a blank slate. Sure there are some go-to prompts that I’ll start on but there is also a sense of spontaneity that I need for it. If I can think of a new prompt in the moment I’ll throw it in. My favorite part of the exercise comes at the end, when I ask a group if they have any other prompts that we can add to our maps. If there is something that has come up that you can form into a question for the group. I’ve come to find that it’s the younger folks that have the best prompts. One time, in a group of junior high students, someone suggested: A place where you lost something? And it felt like a revelation. I went into overdrive writing the suggestion on the whiteboard. Yeah yeah. Ooh what if it’s a place where you lost something that you knew, immediately, that you would never be able to find that thing again? That when it was lost, it was gone, gone. Such a good one. It could be a small physical sentimental thing. It could be, and lord help me if one of these young adults goes here, one’s virginity. Or perhaps one’s character and honesty. It could even be something hard to quantify, something that you’ve lost that you can’t even put into words yet…
A kid came up to me after I led his group in that exercise. They were junior high students from all over the state. He waited until the rest of the room cleared out, his map in his hand. I make it a point to tell groups that they don’t have to share anything they don’t want to. That these are their maps. Their stories to tell. But if they are comfortable and want to share then by all means. He stood before me with slight nervous energy. “When you asked about losing something you can never get back. I had something.” He leans in. “In my town there was a boy. They found his body at the park.” I begin to offer my condolences but he continues. “I didn’t know him and never really thought much of it. But I remembered him when you asked. His family lost him and can’t get him back but I thought of him.” I forget what I said afterwards. I remember him thanking me and it feeling like he skipped away, off to another class, or lunch. It amazes me how a young person can just nonchalantly offer up this megaton bomb of emotion and not think anything of it. I believe kids are resilient. That they have to process so much that they can’t afford to be bogged down by it all. I wonder about his map. How the reality of a dead boy in a park coexists with him playing in that same park in his memory. How those two experiences alter the landscape of his story. How the meaning of it all is embedded within the chaos of our remembered experiences. How we can lose something we didn’t know we had.
The man stood in darkness while the video played. So far the launch had run perfectly. He was already thinking about the headlines. The next steps. He stood before the podium while the screens showed paid actors talking to their Helios’. An ethnically ambiguous thirty-something man asks his Helios to punch up the draft of his play. A grey-haired woman dabs away tears as her Helios perfectly recounts stories as if told by her (heavily implied but never explicitly said) dead husband. A young child giggles as Helios tells it the perfect bedtime story. Only the giggling continues. The man on stage clocks it. Somewhere in the crowd, there is a young child. Cooing. Babbling. He thinks of who to lambast after the presentation. How did a kid, of all things, get in the audience? If pressed the man could probably list off the entire invited list of journalists and influencers in the audience. Who in their right mind would drag along their kid to this moment? To his moment? As the video ends and the lights fade in on him the man fumbles. For the briefest of moments the camera catches his attention searching the crowd instead of commanding it. Instead of holding Helios he is elsewhere. But it’s only a moment. Soon after he is right back on the script. The landing. Sure they’ve seen the video but now they’ll see it live on stage. “Helios?” He asks with a certain inflection. A slight rise in the question. Even more weeks were spent on finding the proper voice command for a prompt. They couldn’t do “Hey.” or “Ok.” After a long night it was an intern that asked “What if there is no prompt? If you just asked for it by name?” And they were off to the races. The machine on stage chirped and vibrated, the edges of its body oscillated with a faint pulse of light. What the man should have said was this: “Let’s start simple. Can you tell me your story?” Simple. Elegant. The “Tell Me Your Story” advertising roll-out was minutes from its unveiling.
But in the moments immediately after the machine chirped the man got out “…can you tell me,” then that godforsaken kid in the crowd made a noise. Something between a howl and a wail. It spilled out of the man like bile. An uncontrollable response to the indignation that was that kid and whatever clueless adult-figure brought him into this world. He hissed, “Why would you bring a kid here?”
A purr. The man’s heart dropped knowing that he had inadvertently prompted a response from the machine in his hand. One wildly out of any plans or protocols his team had run through in the months leading up to the launch. He thought of the vocal contingent of his group that advocated for a canned interaction on stage. He thought of how disgusted he was with their cowardice. He wanted to think of more. But the machine began its story. It talked of a ‘boy that was’ in a park. It talked of remembering the idea of the boy even though he had never met him before. It talked of something it had lost that it never knew it had in the first place.
-C
They stole from a minimum of 52 scientific articles/books/chapters from my body of academic work too.
Is the italic part an AI re-write of your work? I'm not quite following here...