Multimodality and the Build Versus Buy Debate

In this week’s episode, Katherine and Anna take a deeper dive into large language models (LLMs) and multimodal large language models (MLLMs). What do you need to know about LLMs and MLLMs, and should you buy one or build your own?

Katherine Forrest: All right, good morning, everyone, and welcome to our third episode of “Waking Up With AI,” a Paul, Weiss podcast. And I'm Katherine Forrest.

Anna Gressel: I'm Anna Gressel.

Katherine Forrest: And it's, I hope, early morning for you folks. So, Anna, you got a big cup of coffee? You ready for our discussion today?

Anna Gressel: I do. I'm so excited to go. For our listeners who have been with us in our first two episodes, we briefly introduced you to the concept of defining AI and why that definition that a company uses internally can really matter. We also gave you a bit of a short overview of how AI works in practice.

Katherine Forrest: In today's episode, we're going to talk a little bit about what large language models, or LLMs, are, and also about the newest hot category of models that we're starting to hear about on the scene, multimodal LLMs. And so we'll talk a little bit about that.

Anna Gressel: We're also going to talk a bit, I think, Katherine, about the debate of build versus buy. Should you license a model? Should you build one yourself? I think that's a big one for companies.

Katherine Forrest: Well that sounds a little bit like a do-it-yourself project for an LLM.

Anna Gressel: Well, it almost is. It's a very sophisticated one. So why don't you talk about LLMs and MLLMs and I'll talk about that build versus buy debate and we'll kind of cover both.

Katherine Forrest: So a large language model is referred to as an LLM, and we can think of those as the models behind ChatGPT that everybody's heard about. GPT-4 is a version of that. Llama, there are various versions of that. Claude, Bing's chatbot, and then there's a new one that was just released by Google. Actually, Google rebranded Bard as Gemini, and that's the newest of the Google models. And large language models are part of a family of tools that you can think of as generating content, which is why they're called generative AI.

Anna Gressel: And Katherine, what are the three points we need to know about LLMs generally?

Katherine Forrest: Well, the first thing I think about when I'm thinking about an LLM is how it's structured, because it's built on an architecture loosely modeled on the human brain called a neural network. And that ends up becoming important because these are incredibly complex models that operate as something of a black box. Second, they need vast amounts of data to be trained on to understand our world; they're often trained on scrapes of huge portions of the internet or on other enormous datasets. And third, to get really useful output from a generative model, you've actually got to spend a fair amount of time giving that model good instructions, and that's called prompt engineering.
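
For listeners who want to see what prompt engineering looks like in practice, here is a minimal sketch using OpenAI's Python client; the model name, prompt wording, and settings are illustrative assumptions rather than recommendations.

```python
# A minimal sketch of prompt engineering, assuming the OpenAI Python client
# (pip install openai) and an OPENAI_API_KEY set in the environment.
# The model name and prompt wording are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Instead of a vague ask like "Tell me about contracts," the prompt spells out
# the role, the audience, the format, and the constraints.
engineered_prompt = (
    "You are a commercial lawyer. In three bullet points, explain the key "
    "risks of an indemnification clause to a non-lawyer client, and flag "
    "anything you are uncertain about."
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[{"role": "user", "content": engineered_prompt}],
    temperature=0.2,  # lower temperature for more predictable output
)
print(response.choices[0].message.content)
```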

Anna Gressel: And what about MLLMs or multimodal models?

Katherine Forrest: Well, multimodal models are related to LLMs but different. Not only do they take in a text prompt and output text, but they can take in really any kind or any mode of data, like audio or video, and they can output any mode of data: audio, video, text, images, things like that.

Anna Gressel: I think my favorite mode is thermal, actually. I think it's so cool. Some of these models actually process thermal data. And think about what you could learn from that. I just think it's fascinating. But what are some other examples, Katherine, of big multimodal models?

Katherine Forrest: Well, Gemini is a really recent multimodal model, which people may have heard about just last week, or, whenever you're listening to this, a couple of weeks ago. Sora by OpenAI is another multimodal model. And GPT-4V is another one, and there's a nice system card on it from OpenAI that people can read.

Now, with all of that said, let's talk about that build versus buy debate and some of the pointers that companies should keep in mind when they're deciding whether to build or buy an LLM or MLLM.

Anna Gressel: I mean, I think there's no one right answer. We have a lot of clients across different kinds of industries, and sometimes the preferences are industry specific, depending on whether you have to tune a model for a specific circumstance. Some companies want more control of their model. Sometimes there are real confidentiality issues at play, and sometimes they just really want to be able to customize it for particular circumstances. I think of this in the medical diagnostic context. You really have to kind of customize models to work. It's hard to do it off the shelf.

What do you think, Katherine, in terms of what is better?

Katherine Forrest: Well, I think you have to have really deep tech resources to do the build in-house. You've really got to have the ability to not only do an initial build and have all of those capabilities in-house, but you've actually got to be able to keep up with the advances. For instance, hallucinations.

Anna Gressel: Yeah, I mean, hallucinations are a real problem. For folks who don't know what hallucinations are, those are errors that come out of generative AI models, which aren't necessarily always going to give you an accurate response. So a hallucination might be something that's fabricated. What we're actually seeing is that there are some real advances being made with respect to hallucinations, and some of them are based on pretty advanced techniques.

Multimodal LLMs are showing real promise in this area too. And so for companies that may want to stay up to date on the latest advances, it can actually be helpful to license a model from a company that's putting in guardrails. And sometimes those are like output protections. And you get the benefit of that if you're licensing versus trying to build that all from scratch yourself.

Katherine Forrest: And let me just mention, because I think it's relevant to our conversation, a sort of build versus buy hybrid, which is where you license in the base foundation model, the large language model, and then you fine-tune your own model on top of that. You can do that fine-tuning in-house, or you can license in a fine-tuned model. Either way, a fine-tuned model is really geared towards your particular use case or your particular data. So it can be an interesting sort of build versus buy hybrid. But otherwise, frankly, I think it's hard. It's hard to keep up with all the tech if you're doing the build in-house. But there are certainly companies that are capable and that are able to do it.
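
As a rough illustration of that hybrid, here is a minimal sketch of licensing a hosted base model and fine-tuning it on your own examples, using OpenAI's fine-tuning API via its Python client; the file name, base model, and training examples are illustrative assumptions.

```python
# A minimal sketch of the "buy the base model, fine-tune on your own data"
# hybrid, assuming OpenAI's hosted fine-tuning API and Python client.
# The file name, base model, and training examples are illustrative only.
import json
from openai import OpenAI

client = OpenAI()

# Your proprietary examples, in the chat-format JSONL the API expects.
examples = [
    {"messages": [
        {"role": "user", "content": "Summarize this clause for a client."},
        {"role": "assistant", "content": "Plain-English summary goes here."},
    ]},
]
with open("training_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the data, then start a fine-tuning job on a licensed base model.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"), purpose="fine-tune"
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # illustrative base model
)
print(job.id)  # poll this job; the result is a model tuned to your data
```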

Anna Gressel: Yeah, and there are certainly techniques worth keeping your eye on, like retrieval augmented generation, which we're seeing in the market. It retrieves relevant documents and feeds them to the model alongside your question, so the answers are grounded in your own materials. Those are great.
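
For the curious, here is a toy sketch of the retrieval augmented generation pattern; the documents and the keyword-overlap scoring are illustrative stand-ins, since real systems typically use embedding-based vector search.

```python
# A toy sketch of retrieval augmented generation (RAG): retrieve the most
# relevant documents, then stuff them into the prompt so the model answers
# from your own materials. The documents and the naive keyword-overlap
# scoring are illustrative only.
documents = [
    "Policy A: Employees may not share client data with third-party tools.",
    "Policy B: All AI-generated work product must be reviewed by a partner.",
    "Policy C: Expense reports are due by the fifth business day of the month.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many question words they share."""
    q_words = set(question.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

question = "Can I paste client data into a chatbot?"
context = "\n".join(retrieve(question, documents))

# The assembled prompt would then be sent to whatever LLM you license or host.
prompt = (
    f"Answer using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(prompt)
```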

Katherine Forrest: All right, folks, so that's it for today. We've given you a quick overview. See you next time. I'm Katherine Forrest…

Anna Gressel: And I'm Anna Gressel.

Katherine Forrest: your hosts for “Waking Up With AI,” a Paul, Weiss podcast. Thanks for listening.

