
Paul, Weiss Waking Up With AI

Blueprints for Brains: The Architecture of Intelligence

In this episode, Katherine Forrest and Scott Caravello break down three AI model architectures—transformers, JEPA and diffusion models—exploring what sets each apart and how they overlap. They also discuss Manifold-Constrained Hyper-Connections, a recent innovation aimed at improving how transformer layers communicate during training.
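As a rough, illustrative sketch only (not DeepSeek's published mHC formulation), the general hyper-connections idea can be pictured as keeping several parallel residual streams and learning how each transformer sub-layer reads from and writes back to them. The class name, the softmax row-normalization standing in for the "manifold constraint," and the default of four streams below are all illustrative assumptions.

```python
# Illustrative sketch only; PyTorch assumed available. Names and the softmax
# "manifold" normalization are simplifications, not DeepSeek's actual mHC.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperConnectionBlock(nn.Module):
    """Wraps one transformer sub-layer with n parallel residual streams.

    Standard residual: x = x + layer(x)          (a single stream)
    Hyper-connections: keep n streams, learn how to read the layer input
    from them, how to write the layer output back, and how the streams
    mix with one another between layers.
    """
    def __init__(self, layer: nn.Module, n_streams: int = 4):
        super().__init__()
        self.layer = layer
        self.n = n_streams
        # Read weights: combine the n streams into one layer input.
        self.read = nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))
        # Write weights: how strongly the layer output is added to each stream.
        self.write = nn.Parameter(torch.ones(n_streams))
        # Stream-to-stream mixing matrix ("width" connections).
        self.mix = nn.Parameter(torch.eye(n_streams))

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n, batch, seq, d_model)
        # Row-normalize the mixing matrix so each row sums to 1; this is a
        # stand-in for a constraint that keeps stream mixing well behaved.
        mix = F.softmax(self.mix, dim=-1)
        mixed = torch.einsum("ij,jbtd->ibtd", mix, streams)
        # Read a single layer input as a weighted combination of the streams.
        x = torch.einsum("i,ibtd->btd", F.softmax(self.read, dim=0), mixed)
        out = self.layer(x)
        # Write the sub-layer output back into every stream.
        return mixed + self.write.view(self.n, 1, 1, 1) * out
```

In use, the n streams would typically be initialized by copying the token embeddings and collapsed (for example, averaged) back to a single stream before the output head; a plain residual block is the special case of one stream with identity mixing.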

For the sources referenced in this episode, please see the links below:

DeepSeek AI: mHC: Manifold-Constrained Hyper Connections


Episode Transcript

Katherine Forrest: Hello everyone and welcome to today's episode of Paul, Weiss Waking Up With AI. I'm Katherine Forrest.

Scott Caravello: And I'm Scott Caravello. Katherine, how are you?

Katherine Forrest: Oh, wow. That was like… that was really nice. How am I? Yeah, all right.

Scott Caravello: Of course. I'm curious. I'm interested.

Katherine Forrest: So there are days when I feel like I've been shot out of a cannon. Have you ever had those days?

Scott Caravello: I think I might also be having one of those days if that's where this is going. So yeah.

Katherine Forrest: Yeah, that's where this is going. So, today is a day that I feel like I've been shot out of a cannon. And I don't think I have a trampoline on the other end. I'm really worried that I have an arc. My cannon arc is just being shot out of the cannon and maybe I land on a treetop or something if I'm lucky. So it's busy. It's busy.

Scott Caravello: I'm rooting for you. It's good. Busy is good though, you know. We're plugging along. We're doing the AI thing. It's great.

Katherine Forrest: We're doing the AI thing, and boy, I am getting so many questions, by the way, on Mythos. That's not today's episode, but I was studying and then going back and rereading the system card. Wait, I have to tell our listeners about one thing in the system card for Mythos before we get to our JEPA episode—this is the JEPA episode, everybody, don't turn that dial—because it's really interesting and we didn't cover it in our Mythos episode last time. It's on page 11 of the Anthropic system card dated April 7th, 2026, for the Claude Mythos preview. And it says, “we remain deeply uncertain about whether Claude has experiences or interests that matter morally and about how to investigate or address these questions. But we believe it is increasingly important to try.” So I just want people to pause on that as a statement coming from one of the major model developers. But anyway, that's what you get when you ask me how I am. So, you know, we had talked a couple of weeks ago about model architectures, and in particular about world model architectures. And we talked about doing an episode on JEPA, which is J-E-P-A, all caps. We promised this episode a couple of weeks ago, and then last week I said we'd follow it up, and then we had the late-breaking news of the OpenAI industrial policy. But here we are; we're going to cover JEPA today. And I think it's worth also doing a little bit with diffusion models, because we haven't done so much with those. So we'll be comparing those to the classic transformer architecture, which is the main architecture we all think about when we're talking about the Anthropic Claude models, the ChatGPT models, the majority of the Llama models, the Gemini models, et cetera. So let's go for it.

Scott Caravello: And then, I think, time permitting, we can also touch on a recent and exciting innovation in transformer architectures, from 2026, which is called “Manifold-Constrained Hyper-Connections,” or mHC.

Katherine Forrest: Little “m,” capital “H,” capital “C.” Manifold-Constrained Hyper-Connections… like, what kind of word is that?