Podcasts
Paul, Weiss Waking Up With AI
World Models: AI and the Architecture of Understanding
In this episode, Katherine Forrest and Scott Caravello examine world models and their growing relevance to enterprise AI, drawing on a recent IBM blog post that highlights use cases ranging from infrastructure management to atmospheric prediction. They also review the details that have emerged from a recent leak about Anthropic's upcoming Mythos model, including its reported advances in cyber, coding, and reasoning benchmarks.
For the sources referenced in this episode, please see the links below:
IBM: https://www.ibm.com/think/news/world-models-next-frontier-enterprise-ai
Episode Transcript
Katherine Forrest: Hello and welcome to Paul, Weiss Waking Up With AI. I'm Katherine Forrest.
Scott Caravello: And I'm Scott Caravello. Katherine, big announcement here today.
Katherine Forrest: Well, okay, drum roll. I actually have a question that will have to precede your very big announcement, which is—and I know that our audience is like on the edge of their seats—did the bride make it to the church on time or wherever people were getting married?
Scott Caravello: She sure did. It all went off without a hitch. Best man's speech crushed.
Katherine Forrest: Was that you?
Scott Caravello: Yeah, that was me. That was me. Not a dry eye in the house. You know, lots of laughs. So, things are looking up. Things are going well.
Katherine Forrest: Alright, that's fantastic. Now, to your point. Drum roll. Okay, Scott! Do you hear my drum roll?
Scott Caravello: Yeah, loud and clear! We are now able to add links to the sources we discussed in an episode. And so we're going to be doing that by adding live links into the descriptions on the Spotify and Apple podcast versions of the show. So, please, you know, check that out if you're interested in some more reading based on what we've been discussing. Should be super informative and it'll be great.
Katherine Forrest: Oh, that's fantastic! So, we actually had two separate audience requests, one from India and one from LA, asking us to link the materials we talk about when we mention a specific article or something like that. Scott, you've fulfilled our audience members', like, their wildest dreams!
Scott Caravello: I wasn't going to let people down. You know, I said I was going to be careful with listener trust after I lied about going to bed before midnight on New Year's Eve. So, you know, this is my redemption shot—and, obviously, a big thank you to the folks on the Paul, Weiss team who have helped us make that happen. Yeah. Yeah.
Katherine Forrest: Oh my God, they're the best. They're so much the best. They put up with us, Scott, which is all anybody can ask for! All right, now let's turn to the business at hand, which is we've got really two very different but interesting topics today. The first one up is something that we've talked a little bit about before, and that's world models, but there's a super interesting post. Really, I'd call it sort of like a blog post from IBM that came out in the late part of March that we want to highlight. And we'll talk a little bit about world models and what IBM is saying about world models and use cases for world models. The blog is entitled, “Beyond Language: Why World Models Could Be the Next Frontier for Enterprise AI.” So, we'll do that, and then our second topic is… I just could not, I could not stop thinking about that new Anthropic Mythos [“Mee–those”] or Mythos [“Mith-ose”], right?
Scott Caravello: Totally, totally. It's some serious news.
Katherine Forrest: Right, it's a very, very—apparently very, very—high capability AI model, and some information about it has recently leaked. But let's start from first principles: what's a model? We've talked about world models in the past. A world model is, in AI speak, a sort of representational model of the world. How does a model learn about the world if it's not embodied like a robot, if it's not actually living and existing and walking down the street like, you know, I am or you are? Well, a model is a representation of how the world works. You have Lego models of houses or boats, and you've got language models that are representations of language. And that is sort of the beginning point for world models.
Scott Caravello: Yeah, and so just to sort of flesh that out a bit further, right? I mean, take a chair. A chair is not the sound we make when we say the word chair. It's the object. And the letters C-H-A-I-R placed on a page are also not the object. The word chair is, itself, something of a linguistic model of something that exists in the real world. It's representational.
Katherine Forrest: Right, and large language models already contain a variety of representations of all of the learning that's inside of them, not in the form of words, but in the form of numerical representations. So a chair in a large language model is actually a step further removed from C-H-A-I-R: it's now a numerical representation, and it gets placed in a relational way inside the model. So we've got all of that going on.
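To make the "numerical representation" point concrete: each word becomes a vector of numbers, and related words end up with similar vectors. The tiny three-number vectors below are invented purely for illustration; real models learn embeddings with hundreds or thousands of dimensions.

```python
# Toy sketch of words as numerical representations: related words
# (chair, sofa) sit closer together in vector space than unrelated
# ones (chair, pizza). These 3-dimensional vectors are made up.
import math

embeddings = {
    "chair": [0.9, 0.1, 0.3],
    "sofa":  [0.8, 0.2, 0.4],   # furniture: near "chair"
    "pizza": [0.1, 0.9, 0.2],   # food: far from "chair"
}

def cosine_similarity(a, b):
    """How closely two word vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

chair_sofa = cosine_similarity(embeddings["chair"], embeddings["sofa"])
chair_pizza = cosine_similarity(embeddings["chair"], embeddings["pizza"])
assert chair_sofa > chair_pizza  # "chair" is relationally closer to "sofa"
```

This is the "relational" placement Katherine describes: the model never stores the letters C-H-A-I-R as the meaning, only where the word sits relative to everything else.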
Scott Caravello: Exactly. And, so, then I think that brings us to world models.
Katherine Forrest: Right, right. And so a world model represents the dynamics of what we think of as the real world. It's a way of trying to teach AI about the physics of the world. You know, when something goes up, it comes down, at least in our world that has gravity. Or, as I like to say, take Humpty Dumpty. When Humpty Dumpty falls down and cracks into a lot of pieces, you can't just put Humpty Dumpty back on the shelf and expect that Humpty Dumpty is going to be whole again. Something has happened when Humpty Dumpty fell; you don't just rewind the clock. So there's not only a spatial dimension, there's a temporal dimension. There's continuation. World models allow us to make predictions about current events that we cannot see or are not currently experiencing, but also about future events. And, you know, you and I just take it for granted. Scott, don't look behind you, but you probably imagine right now that those same walls that were there before are still there.
Scott Caravello: I can confirm that is the understanding I am operating under.
Katherine Forrest: Right, and that's a world model, Scott. You see, you have got a world model embedded in that head of yours.
Scott Caravello: I've never really thought about it like that, but you're right.
Katherine Forrest: There you go, there you go. And it's just like, you know, the way babies learn about things like object permanence, continuity, all of that is a world model.
Scott Caravello: And Humpty Dumpty…
Katherine Forrest: Humpty Dumpty! You know, who knew that Humpty Dumpty, a little nursery rhyme, would give us a lesson in classical physics… but it does! It teaches little children that all the king's horses and all the king's men couldn't put Humpty Dumpty back together again. But we think of world models, in terms of AI, in two ways. First, as a feature of an LLM. We've talked about that in the past, and there's a whole chapter on world models in my Of Another Mind superintelligence book, available for pre-order on Amazon right now. LLMs apparently have some emergent capability to have world models embedded within them. But second, we now also have specific technology that is designed around the creation of world models as a core architecture. These models are designed to continuously represent the physical rules of the world, including spatial relationships and temporal relationships, and they're trained on all kinds of data, including text, images, video, simulation data and more, that allow them to understand, represent, and predict the world.
Scott Caravello: Right, so with that background, let's highlight the post that led us back to world models today. It's from IBM, and it's called “Beyond Language: Why World Models Could Be the Next Frontier for Enterprise AI.” We'll link to it in the episode description, like we were mentioning we now have the capability to do. The post gives us some helpful background on the companies actually building these models. It states at the outset that for years, IBM has been building “AI systems that simulate physical reality rather than just describing it.” They then mention Meta's former chief AI scientist, Yann LeCun, and his new AMI lab.
Katherine Forrest: And, LeCun's AMI Lab has raised over a billion dollars now, and its backers include some of the biggest tech and industry names, which I won't mention… because every time I do, I have to go through another risk analysis… so, I won't… but, I'll just say, they've got big names behind them. And the AMI Lab has, as its centerpiece, a kind of architecture that's called “JEPA”—J-E-P-A—which stands for Joint Embedding Predictive Architecture. And this JEPA architecture trains AI systems to create representations of their environments. And, so, this is really far from basic word prediction that you have with just transformer architecture. And it's something really, entirely different. And it's looking at the world and representing the world in terms of all of the physical dimensions and ways in which our exterior world comes together.
Scott Caravello: Right. JEPA is designed to understand what is happening in a given state or environment. Just as one example, say you have an outstretched hand on one side of a frame and a glass of water on the other. A model based on this JEPA architecture is seeking to understand that the hand is about to grab the glass.
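A very loose way to see the JEPA intuition in code: compare two frames in an abstract latent space instead of pixel by pixel, so unpredictable surface noise drops out and only the state that matters (hand here, glass there) is kept. The "frames" and the toy encoder below are invented for illustration; this is not Meta's or anyone's actual JEPA implementation.

```python
# Hedged toy of the joint-embedding idea: judge two states by their
# abstract representations, not their raw pixels. Two frames can differ
# in fine detail (lighting flicker, texture) yet encode the same scene.

def encode(frame):
    """Stand-in 'encoder': keep only coarse structure (rounded half-averages),
    discarding fine pixel detail a real model can't hope to predict."""
    half = len(frame) // 2
    left = round(sum(frame[:half]) / half, 1)
    right = round(sum(frame[half:]) / (len(frame) - half), 1)
    return (left, right)

def pixel_distance(a, b):
    """Naive pixel-by-pixel comparison, for contrast."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Same scene (hand on the left, glass on the right), differing only in noise.
frame_a = [1.00, 0.80, 0.02, 0.01, 0.90, 1.00]
frame_b = [0.98, 0.82, 0.00, 0.03, 0.88, 1.01]

assert pixel_distance(frame_a, frame_b) > 0   # pixels differ...
assert encode(frame_a) == encode(frame_b)     # ...but the latent state agrees
```

That gap between the two comparisons is roughly why a JEPA-style model predicts the next state's representation rather than every pixel of the next frame.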
Katherine Forrest: So, if I have a piece of pizza in one frame and a hand in another frame, does the JEPA architecture predict that I will grab that piece of pizza?
Scott Caravello: I think that example would work just as well.
Katherine Forrest: Okay, I want to go on to another one! How about a margarita, that's got really great tequila in it, and I have that in one frame, and then I have my hand in the other frame, is the JEPA architecture going to suggest that I'm going to reach across and grab that margarita?
Scott Caravello: Maybe, but I wonder if it can account for me, like, swooping in…
Katherine Forrest: Right!
Scott Caravello: …and taking it out of your hand before you get the chance to drink it. So, you know, multiple possibilities.
Katherine Forrest: Right, and then it has to have the temporal aspect of, as my father used to say, the yardarm, right? It's gotta be past five o'clock. You can't be having a margarita before five o'clock, right, right?
Scott Caravello: Exactly.
Katherine Forrest: So, according to LeCun, the world is unpredictable, and if you try to build a generative AI model that predicts every detail of the future, it's going to fail. That's one of the reasons why he has said—and I've heard him speak about this—that he doesn't think transformer architecture is going to get us all the way there. I have said in the past, not that I have any standing to disagree with Yann LeCun, that I think transformer architecture will get us, for instance, to superintelligence, because there can be “good enough” and then there can be even better than “good enough,” and I think transformer architecture is showing that it's “good enough.” But according to LeCun, if you build a model of the world, then the model can predict the unknown, just as humans can imagine what might be over a horizon or what happens when a coffee cup falls off the table. What's really interesting about this blog is that it gives real use cases for these world models and discusses them in ways that are actually helpful for industry. So, let me give you a couple of “for instances.” The blog mentions the aerospace industry and biomedical firms, and it focuses on the way in which both the micro and the macro can be predicted with a world model. There's a Danish infrastructure management firm that's trying to prolong the lifespan of aging infrastructure, and the IBM blog talks about the way world models can generate thousands of trajectories, learn how physical assets change from one state to another depending on what happens, and make predictions based on that. And then there's a whole NASA example, where IBM and NASA are using world model-like constructs to predict the evolution of different kinds of atmospheric conditions. So, it's interesting stuff because of these real use cases. But, you know, maybe we should do an episode on JEPA, on Yann LeCun's JEPA. And I think I even told the audience once that we had done an episode on JEPA, and we had not, but I think we should.
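The "thousands of trajectories" idea can be sketched as a simple Monte Carlo rollout: simulate the same asset many times under uncertain wear and look at the spread of outcomes. The degradation rule and all the numbers here are invented for illustration; they are not IBM's or the Danish firm's actual model.

```python
# Hedged sketch of trajectory generation for an aging asset: roll a toy
# state-transition model forward many times under random annual wear,
# then ask how many simulated futures end in a degraded state.
import random

def simulate_trajectory(years, rng):
    """One possible future: condition from 1.0 (new) toward 0.0 (failed)."""
    condition = 1.0
    trajectory = [condition]
    for _ in range(years):
        wear = rng.uniform(0.01, 0.06)          # uncertain year-to-year wear
        condition = max(0.0, condition - wear)  # state transition
        trajectory.append(condition)
    return trajectory

rng = random.Random(0)                           # seeded for reproducibility
trajectories = [simulate_trajectory(20, rng) for _ in range(1000)]

# Fraction of simulated futures with the asset below half condition at year 20.
at_risk = sum(1 for t in trajectories if t[-1] < 0.5) / len(trajectories)
```

A real world model would learn the transition rule from sensor and simulation data rather than hard-coding it, but the prediction step, many rollouts, then statistics over the outcomes, has the same shape.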
Scott Caravello: Oh, 100%.
Katherine Forrest: All right, let's go on to, let's go on to Anthropic.
Scott Caravello: Yeah, so I'm happy to just take that one. And so Anthropic has been up to a lot, as you can see all over the news. Anthropic released Claude Opus 4.6 on February 5th, which is, I mean, gosh, I can't believe that was only two months ago—it feels like forever in the world of AI. There's so much happening.
Katherine Forrest: Your friend wasn't even married then!
Scott Caravello: So true. I was barely back from New Orleans the first time at that point. But they released Claude Opus 4.6 on February 5th, which is arguably one of the most powerful models out there. They then released more than a dozen updates to all kinds of things. And so on March 26th, we had the big event, which was a leak of literature from Anthropic describing their newest, and not yet released, model, Mythos. And Mythos seems like it's going to be a very big deal.
Katherine Forrest: Right, absolutely. I've been watching what I can on YouTube about it and also reading about it on various sites. This is apparently all from a leak that came out of Anthropic. People are speculating: was it purposeful? Was it not? It really doesn't matter. What's interesting is the content. The information we have about Mythos [“Mith-ose”], or Mythos [“Mee-those”], or however it's pronounced, suggests it's going to be a meaningful—that is the word, a “meaningful”—step change above Opus 4.6, and Anthropic's most capable model so far.
Scott Caravello: Yeah, and I mean, just to sort of put it in perspective, Claude Opus 4.6 is really itself an incredibly capable model. It performs exceptionally well on all of the benchmarks.
Katherine Forrest: It really does. So, one of the standout points of this Mythos—and I'm just going to choose my pronunciation right now, “Mith-ose” over “Mee-those”—appears to be its cyber capabilities. In the leaked literature, it was described as being able to find and reason about various kinds of cyber vulnerabilities. That could be very useful to companies and to everyone who's interested in figuring out whether certain kinds of code or models or apps have cyber vulnerabilities, what those are, and how to protect themselves. But, you know, you're always a little bit worried that such strong cyber capabilities, in the wrong hands or used in the wrong way, could also create some cyber vulnerabilities. But there's a lot of work, I'm sure, being done right now to harden those guardrails.
Scott Caravello: Right, and then in addition to cyber, the model is also apparently blowing out of the water some of the coding and reasoning capabilities of earlier models.
Katherine Forrest: Right, and that's one of the reasons why we understand the name has been changed from Opus to Mythos, which is a totally new name in the Anthropic family of models. The suggestion is that the name change is meant to signal a different level of model. And there's also, related to Mythos, a model named Capybara. And again, I have no idea how to really pronounce “Capybara,” but it's a more efficient model, a bit of a step down from Mythos, but above Opus 4.6. Do you know what a capybara is?
Scott Caravello: It's a… large-ish… rodent?
Katherine Forrest: Right, but it doesn't look like a rodent. Okay? I just want to tell you, capybaras, they're sort of cute and they get along socially with other people—Don't look it up! I see you looking it up right now on your phone.
Scott Caravello: No, no, I'm–I'm totally looking it up! I'm sorry, but I might be in the “this does look like a rodent” camp…
Katherine Forrest: Oh, okay. Well, I think it looks sort of cute. Anyway, it does have webbed feet… I'm not sure how many rodents have webbed feet. But anyway, the Capybara model is, you know, part of the Mythos family. One of the issues with Mythos right now is that it is apparently incredibly expensive because of the amount of compute that is powering it. And by the way, it's actually got 10 trillion parameters, 10 trillion parameters, in the Mythos model, which is pretty extraordinary.
Scott Caravello: Well, we will certainly cover Mythos as soon as it's released, or as we learn a lot more about it, but until then, we're going to keep an eye on world models. We'll be back with an episode on the JEPA architecture and where the next step change in model capabilities takes us.
Katherine Forrest: Okay, and with that, I'm Katherine Forrest.
Scott Caravello: And I'm Scott Caravello. Make sure to like and subscribe.