
Multimodal Models and the Data That Drives Them

In this week’s episode, Katherine and Anna introduce multimodal models, discuss the difference between AI and large language models and explain how data fuels the entire enterprise.


Katherine Forrest: All right, good morning and welcome everyone. This is episode number two of the Paul, Weiss podcast “Waking Up With AI,” and I'm Katherine Forrest…

Anna Gressel: I'm Anna Gressel.

Katherine Forrest: And we want to welcome you all here and thank you for listening. So today, in our second episode, we're going to try to give you a very painless technical overview of some of the AI concepts you'll really want to know, so that we can then help put together some of the legal issue spotting that goes with them. We'll do that in our five-to-seven-minute segment format. So let's just jump right in.

Anna Gressel: And Katherine, for folks who didn't join us last time, I would say tune into our last episode to get our definition of AI and why we really like that definition, because we're going to build on it today. In general, our podcast is available on all the platforms. We're going to bring you along with us, but feel free to peek back if you want to see how we define AI.

Katherine Forrest: And in addition to AI, there are also these other things called large language models, or LLMs, which basically everybody has heard of now because of ChatGPT, and then something else called an MLLM, which is a mouthful of an acronym for multimodal LLM. Let's just talk a little bit about what LLMs and MLLMs are.

Anna Gressel: Yeah, I mean, I think the world really changed with the release of ChatGPT. And certainly our lives changed as lawyers who focus on the AI space. When we think about ChatGPT and other LLMs—and Katherine, you should jump in—I think we're really talking about major artificial intelligence systems or models that can generate content. They synthesize and generate content like text, code, images, audio and video. They're really kind of amazing tools, and they can be used for a whole range of purposes; they're really flexible. And so we're seeing not only the creation of what we call foundation models, which are the base layer of that technology stack, but also all kinds of applications in the market now that are tuning those models for specific purposes. We see those released with nice user interfaces. So we're seeing not only the creation of really new technology, but also a whole new application stack on top of it.

It's just a really exciting time right now, and we should talk a little bit about multimodal models. For me, the moment I got really interested in multimodal models was when Meta put out its ImageBind paper and OpenAI put out its GPT-4V paper. Those were almost like another ChatGPT moment, where you went, "Oh my gosh, this is what this technology is capable of. That's amazing." And there was just another really important release from OpenAI.

Katherine Forrest: Yeah.

Anna Gressel: Katherine, I'll turn it back to you. What do you find most exciting about multimodal? Because I know we talk about it all the time.

Katherine Forrest: Yes, the OpenAI Sora release. You were just about to mention it, and it's incredibly exciting. I really encourage people to take a look at it, because it's a leap forward for these multimodal models. And you're going to want to understand it, because it's going to have implications for how businesses will be able to use all kinds of video content within their companies. It's going to have implications for intellectual property. It has a whole host of implications that we'll be talking about over time.

But one thing I just wanted to mention, Anna, about the difference between an LLM and AI as we talked about it last time: there's a difference between narrow AI, the single-use AI tools that have been in existence for the last decade, things built, for instance, for human resources or for recognizing tumors in the medical field, versus LLMs, which are broad tools. An LLM is a really broad-based generation tool that can make all kinds of content. And I actually heard an interesting figure the other day that 50% of new code right now is made with generative AI…

Anna Gressel: It's amazing.

Katherine Forrest: So we're now advancing by leaps and bounds. But let's jump in and talk about how AI works, and we'll do it quickly and painlessly. What do you think are the main facts that people need to know about AI?

Anna Gressel: When we think about AI tools, the main fact you have to know is that they run on data. The data is really the driver here. What we're doing is trying to create models or algorithms that learn from data in the environment and can therefore make predictions, or generate content, that tell us something interesting or help guide us in the world. It's really a data-driven enterprise. I know everyone says data is the new oil.

Data is the value here, and I think there's some real truth to that. Of course, there's also truth to the expertise you need to build the models. But data is what creates the actual outputs and recommendations. And Katherine, do you want to explain how you think about this? I know you have your bread recipe example, which I love. Maybe that would be helpful for the audience to hear.

Katherine Forrest: Yeah, I have a bread recipe example that I give, and I'll do it really quickly. Say you take an algorithm (and there are many different kinds of algorithms) and you want to make the best bread recipe. Your data set would be, say, a series of cookbooks gathered in one place, and you set the tool loose. The tool will sort of whiz and whir over the bread recipes within those cookbooks and come up with the various inputs, which are flour, water, yeast, etc. Then those inputs get weighted by how much of each to use. So it comes up with a pattern recognition based upon that data set. That's just a high-level example, but it's pattern recognition, and that's true for both narrow AI and the more complicated generative AI. Data, data, data. And by the way, one thing I wanted to mention about data is that the context of the data can matter. That's where you can run into things like algorithmic bias: when the data sets that are chosen may not be fully representative.
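To make the bread recipe idea a little more concrete, here is a minimal, purely illustrative Python sketch. Everything in it is an invented stand-in: the tiny recipe data set and the simple averaging are our own hypotheticals, and real models learn in far more sophisticated ways. But the core idea is the same one Katherine describes: the tool extracts weighted patterns (how much flour, water and yeast) from a data set of examples.

```python
# A toy version of the bread recipe example: "learn" ingredient
# weights by averaging each ingredient's share of the dough across
# a small, made-up data set of recipes (amounts in grams).

recipes = [  # hypothetical data set standing in for "a series of cookbooks"
    {"flour": 500, "water": 350, "yeast": 7, "salt": 10},
    {"flour": 450, "water": 300, "yeast": 5, "salt": 9},
    {"flour": 520, "water": 380, "yeast": 8, "salt": 11},
]

def learn_weights(data):
    # Normalize each recipe so every ingredient becomes a proportion
    # of that recipe's total weight, then average across recipes.
    shares = {}
    for recipe in data:
        total = sum(recipe.values())
        for ingredient, amount in recipe.items():
            shares.setdefault(ingredient, []).append(amount / total)
    # The "learned" weight for each ingredient is its average share.
    return {ing: sum(s) / len(s) for ing, s in shares.items()}

weights = learn_weights(recipes)
for ingredient, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{ingredient}: {weight:.1%} of the dough by weight")
```

Note that if the cookbooks in the data set all came from one baking tradition, the "learned" recipe would reflect that slant, which is a small-scale version of the representativeness problem Katherine mentions.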

Anna Gressel: And Katherine, I know you and I talk a lot about what's different with LLMs and what's different with multimodal models. With LLMs, what's different is the sheer amount of data: LLMs are trained on huge swaths of the internet. And multimodal models are trained on all different kinds of new data: audio, video, sound clips. So we're really looking at a rich data environment.

Katherine Forrest: All right, so that's a high-level, very quick and dirty introduction to some of the technology. I'm Katherine Forrest…

Anna Gressel: I'm Anna Gressel.

Katherine Forrest: And we want to again thank you for joining us for the Paul, Weiss podcast, “Waking Up With AI.” Thanks everyone.

