Paul, Weiss Waking Up With AI
Small Language Models: The Case for Less
In this episode, Katherine Forrest and Scott Caravello explore small language models (“SLMs”) and their potential implications for task specialization, speed, and confidentiality. Our hosts also share some recent research covering expectations surrounding SLM adoption and growth.
Episode Transcript
Katherine Forrest: Welcome back to Paul, Weiss Waking Up with AI. I'm Katherine Forrest.
Scott Caravello: And I'm Scott Caravello.
Katherine Forrest: And Scott, I was just saying to you a moment ago before we went on and started the recording that you're back from New Orleans. You've survived.
Scott Caravello: I am, I am—in one piece—and it was incredible. I had never been before, and I think I had some of the best meals of my life, honestly. Actually, I think I did have the best meal of my life at Emeril's.
Katherine Forrest: Okay, Emeril's, like that celebrity chef—or not that—THE celebrity chef.
Scott Caravello: The celebrity chef. Yeah, yeah. So it's the restaurant that he opened up after he was the executive chef at Commander's Palace, another famous place down there, but a few years before he started on TV. It was really great. But I know you've had some pretty awesome travels, Katherine, so I want to know, other than cheese pie, what was your favorite meal of all time?
Katherine Forrest: All right. Okay, so I have two things to say. Number one, I just decided last week that I had to get away and get some vitamin D into, like, my life because of the lack of sun. And we've had, like, a real winter this winter. So Amy and I went down to Little Palm Island, which is off of Florida. First of all, it was a direct flight. Who knew? Who knew? And they have incredible food. So that's like a little sort of a hidden gem. But then I have a second story, which is the most satisfying food sort of era of my life, which was when I was pregnant with my second child. He was born at 10 and a half pounds. You might be able to tell where this is going. And he's now six foot eight. I probably caused all of that. But I ate more Quarter Pounders with Cheese. I thought it was my absolute duty when I was pregnant never to have a moment of hunger. I thought, I can't do much for this child right now except eat. And if I had hunger, I felt like I was doing something wrong for the baby. So every time I felt hunger, I went and got a Quarter Pounder with Cheese.
Scott Caravello: That's great.
Katherine Forrest: And now—I think he's six foot eight, but he claims he's six foot seven. But.. anyway. It's all from Quarter Pounders with Cheese. But anyway, now, let's get to some serious business, which—having a ten-and-a-half-pound baby is some serious business. But we're going to talk about something on the other end of the spectrum, which is small language models today. And, you know, I had actually read a really interesting research paper on small language models, which also have the acronym SLM, and then saw this survey that was done recently by IBM—and it actually assembled a bunch of executives talking about AI, and I thought that they would together make a really interesting set of topics for our audience.
Scott Caravello: Yeah, that sounds great. And, so, just to give a little bit of background, folks: we've talked a lot about LLMs, so let's talk about these SLMs and exactly where the difference lies. LLMs can have hundreds of billions of parameters or more. There's not really a single definition of an SLM or what sets them apart, but SLMs are generally considered to have anywhere from hundreds of millions up to about 10 billion parameters.
Katherine Forrest: Yeah, so they can be literally hundreds of billions of parameters smaller. So a small language model is still a language model, but it has many fewer parameters. So, the research paper that I'd mentioned that I really wanted to talk about was one by a series of researchers from NVIDIA along with the Georgia Institute of Technology. And this paper is called, “Small Language Models are the Future of Agentic AI.” And the audience, if anybody's interested and wants to read it, they can get it on arXiv—A-R-X-I-V—which is that fantastic Cornell University–sponsored repository of papers. And this paper is dated September 25, 2025. And while that is actually a couple of months old, this is by no means outdated yet. This is a paper that is very live, and the basic premise is that SLMs are showing a kind of efficiency in use that is, I think, a little bit unexpected.
Scott Caravello: Yeah, and when we talk about efficiency in use, we're also talking particularly about agentic tasks. And I think that this is where this whole discussion gets really cool. Because the concept of the paper is that with an agentic task, you might be using portions of different LLMs to accomplish specialized components. And if you instead use SLMs, you can achieve that same goal of specialization, but without some of the expense of LLMs.
Katherine Forrest: Right, like training time or energy use and all that goes into an incredibly large and highly capable and powerful LLM. So, you know, this could be really interesting if it catches on even in a specialized way, but I have to wonder if you would need too many specialized SLMs to do the work that the components of a larger LLM, or a seriatim series of analyses by it, could accomplish.
Scott Caravello: Right, and so the paper lays out its own pros and cons for the SLMs. And one of the questions is, frankly, whether the capabilities of LLMs and their scale will just keep them so far ahead of SLMs that breaking out a task could lead to some reduction in capability.
Katherine Forrest: Right, and another is that LLMs are able to handle agentic tasks already. They've got hubs and routers, and so they're able to send specialized tasks to specialized places. And in some ways, that's what the SLM paper is advocating for—taking a specialized task and sending it to a specialized SLM.
Scott Caravello: Right, so we'll just have to see where it goes. But it feeds directly into the interesting survey that you previewed, which, as you also mentioned, was published by IBM, and it's titled, “The Enterprise in 2030.”
Katherine Forrest: Yeah, you know, and I saw this on one of those morning sort of blasts that you get with email, and I frankly wish I could remember which one, but I read some really enticing, you know, sort of blurb about it, and then I printed it out, and it's a really interesting survey, so I encourage our listeners to go ahead—you can get it publicly off of the IBM website, and so... Here's the tie-in to today's discussion of small language models: 72% of the executives who were surveyed actually agreed that the use of small language models will surpass the use of large language models by 2030. And that's actually a pretty big deal. And frankly, I'm surprised that there was such a large percentage that thought that the SLMs were sort of the wave of the future.
Scott Caravello: Yeah, and, you know, they also overwhelmingly thought that AI was going to drive revenue significantly by 2030. And so even though we can't necessarily say that in 2030 it will drive revenue by taking over X task and performing Y function specifically, we're still getting some important insight into how the executives surveyed expect those technology-driven shifts to take shape, right? SLMs in combination are potentially going to be driving big revenue growth. And this study is actually a collection of views from executives at a large number of companies, over 2,000 C-suite execs. So we are talking about a pretty large sample.
Katherine Forrest: Yeah, and, you know, it's really well done. It's got a lot of facts and figures that are sort of peppered throughout and a number of quotes from some of the executives. And we'll leave it to our audience to go ahead and read the paper to sort of get the names of the folks who are quoted. But one quote that I really love is the following. I think it's really appropriate. And it's, quote, “The most successful organizations will reimagine how humans and machines collaborate to achieve more than either could on their own. It's this dynamic that will define the winners of the next decade—not deploying the most powerful technology and making the biggest cuts to headcount, but building AI that knows the business, reflects its values, and amplifies the expertise of its people.”
Scott Caravello: Great quote.
Katherine Forrest: It really is a great quote. There are a number of, as I've said, great quotes in the paper, and another, which is on page 27, says—here's an executive who's saying the following: “We're entering an era where the entire workflow must be fundamentally re-examined, shifting from sequential to parallel processes.” And here's one last one. And it's about human capital. And it is: “The generalist, who's smart, hardworking, and has the communication skills to inspire people will be more important as domain experience becomes much more commodified.” And that last quote, by the way, was on page 44.
Scott Caravello: Let's go back, in the context of what some of these executives said, and look a little more at SLMs.
Katherine Forrest: Right, and one offshoot of the smaller size of an SLM that we've already mentioned is that an SLM should be able to run locally. And this is actually sort of a big deal. Let's just pause on that. Right now, LLMs are run through the cloud. And an SLM might be able to be run locally or on an edge device like a smartphone or a personal computer. And that would be a real sea change.
Scott Caravello: Yeah, that's a huge difference because right now almost all LLM work is in the cloud.
Katherine Forrest: It really is. And so, you know, we're talking all the time to clients of ours about what that means and who has access to things. You know, it's pretty locked down. There are lots of cybersecurity protocols that are in place. But it will fundamentally change that aspect of the overall process. And so it's got a lot of implications because when a model is running locally on your device, it doesn't need to communicate with a cloud server to take in your input and, you know, spit out a response. On-device really means on-device. And that often automatically and dramatically can improve speed. Actually, what I'm going to do is put it the opposite way: it will reduce the latency of sending something up to the cloud and back down. So that's at least one of the things that SLMs can do. And there's also an argument that SLMs, if deployed locally, afford deployers more control over the model, given that they're potentially going to have their own infrastructure that it's deployed on. And that could simplify some thorny issues such as confidentiality, and it might be less expensive to fine-tune. You've got control over the fine-tuning and, you know, we'll see how it goes. These are all the expectations. There's a lot of TBDs here.
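For listeners who want to picture what “on-device” can look like in practice, here is a minimal sketch of local inference. It assumes the Hugging Face transformers library and a small open-weights model such as google/gemma-2-2b-it; the specific library and model are illustrative choices, not something discussed in the episode.

```python
# A minimal, illustrative sketch of running a small language model on-device.
# Assumptions (not from the episode): the Hugging Face "transformers" library
# is installed, and a small open-weights model (google/gemma-2-2b-it is used
# here as one example) has already been downloaded to the local machine.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "google/gemma-2-2b-it"  # illustrative choice of a ~2B-parameter model

# Load the tokenizer and weights from local disk; no cloud inference API is called.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Generate a response locally; the prompt never leaves the device, which is
# where the latency and confidentiality arguments for SLMs come from.
prompt = "List three risks of sending confidential data to a cloud-hosted model."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```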
Scott Caravello: Yeah, but you're right to press on all that because the points about autonomy, cost efficiency, and speed are great in principle. But I think a natural follow-up question to this framing is where this can play out in practice. And in many ways, it's enterprise-specific, but it comes down to the fact that you can use a series of SLMs, each for a few tasks or each for a single task, depending on what the model is best suited for. And because there's an argument that they're cheaper to customize or run, that becomes feasible.
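To make that “one specialized small model per task” idea concrete, here is a hedged sketch of a simple dispatcher. The model names and the run_local_slm helper are hypothetical placeholders standing in for whatever local inference stack an enterprise actually uses; only the routing structure is the point.

```python
# A hedged sketch of the "one specialized SLM per task" pattern described above.
# The model names and run_local_slm() are hypothetical placeholders, not any
# particular vendor's API.
TASK_MODELS = {
    "summarize": "example/slm-summarizer-3b",    # hypothetical fine-tuned SLM
    "extract":   "example/slm-extractor-1b",     # hypothetical fine-tuned SLM
    "classify":  "example/slm-classifier-500m",  # hypothetical fine-tuned SLM
}

def run_local_slm(model_name: str, text: str) -> str:
    # Placeholder: in practice this would invoke the named model on local
    # infrastructure (for instance, the on-device loading shown earlier).
    return f"[{model_name}] would process: {text[:40]}..."

def route_task(task: str, text: str) -> str:
    """Dispatch each incoming task to the small model fine-tuned for it."""
    model_name = TASK_MODELS.get(task)
    if model_name is None:
        raise ValueError(f"no specialized model registered for task: {task}")
    return run_local_slm(model_name, text)

print(route_task("summarize", "A long contract to be summarized..."))
```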
Katherine Forrest: Right. And if folks are listening and saying to themselves, having listened to some recent episodes of ours, that all that sounds like potential for a lot of AI models to be working together in a sort of coordinated way, and that could form a distributed AGI, such as the kind of patchwork quilt that we were talking about in a prior episode, you're right. There's sort of a relationship between what we were talking about in that episode with distributed AGI and, actually, specialized SLMs. You've got a lot of daylight between the two, because you're talking about different capabilities and a lot about what that means, but in both cases you're talking about putting the power of multiple specialized models together at once.
Scott Caravello: Well said. But going back to the NVIDIA and Georgia Tech paper—and why the authors think this future filled with SLMs is feasible, and this mirrors everything that you laid out before, Katherine—they take the view that SLMs are powerful enough, can be more suited to agentic tasks, and that there can be a cost difference. So, on the first point, they call out that the capabilities of SLMs have advanced significantly over the past few years. And while LLMs are certainly ahead on frontier performance, the performance of SLMs can, quote, “meet or exceed performance previously attributed only to large models.” And then the second point: they argue that since only a limited subset of the model's capabilities is needed for a given task, an appropriately fine-tuned SLM suffices while providing efficiency gains and flexibility relative to LLMs.
Katherine Forrest: Actually, I totally agree with all of these things. I'm just wondering whether the SLMs can achieve or have the same kinds of emergent capabilities as the LLMs. And so we'll have to, you know, take a look at that. And we have a little bit of information on that that we'll talk about in a moment when we discuss some of the models that are already out there that sort of fall in this bucket of SLMs. But, you know, the cost issue, the cost difference that you just mentioned is actually one that people are also interested in. But, you know, it's one of the questions, I think, because you need to have the same capabilities in order to be able to actually make an apples-to-apples comparison.
Scott Caravello: So, all of that said, we've been talking about SLMs and LLMs as abstract categories and the different arguments for each of them. So I think, practically, maybe we can get to what's being done to build them. And, you know, if I can just kick that off—in a lot of cases, the development of small language models happens in tandem with frontier LLM development. So from major developers, for instance, you have DeepMind's Gemma models and OpenAI's mini models, like GPT-4o mini.
Katherine Forrest: Right, and being at the frontier is almost a prerequisite for building the best small language models today, because these small models are trained by distilling knowledge from larger models, and essentially the knowledge captured in the weights of the larger models is distilled into the smaller models, if you will. So although the final model is small, it does get some of the direct benefits of the large models. What I haven't seen yet is sort of a real research paper that discusses whether the emergent capabilities transition over or not. We know some may, but we don't know about all of them.
Scott Caravello: Yeah, and for those specific examples that we were raising before, the SLMs that are based off of the DeepMind and OpenAI models, the market is again largely relying on the original developers to do the work of distilling smaller models from the frontier LLMs. But in the case of open-source AI, whether you're looking at a model from DeepSeek or Meta, it's possible for anyone with sufficient resources to do this work. If you have access to that full base model, the larger one, and its parameters—and, you know, in this context, that larger model is also referred to as a teacher model—then any enterprise can carry out that distillation process. And that's what's called “white-box distilling.” That generally isn't feasible with a proprietary, closed-source model; there, you can typically only learn from the model's outputs, a process called “black-box distilling.”
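As an illustration of the white-box case just described, here is a minimal sketch of knowledge distillation. It assumes PyTorch and Hugging Face–style teacher and student models that share a vocabulary, so their logits are directly comparable; this shows the generic technique, not the procedure any particular developer actually uses.

```python
# A minimal sketch of white-box knowledge distillation. Assumptions (not from
# the episode): PyTorch is installed, and `teacher` and `student` are causal
# language models whose forward pass returns an object with a .logits
# attribute (the Hugging Face convention) over the same vocabulary.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's softened output distribution and
    the student's, the core training signal in white-box distillation."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

def distillation_step(teacher, student, optimizer, input_ids):
    """One training step: the student learns to imitate the teacher's logits."""
    with torch.no_grad():                       # the teacher stays frozen
        teacher_logits = teacher(input_ids).logits
    student_logits = student(input_ids).logits
    loss = distillation_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```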
Katherine Forrest: You know, I really now am intrigued, and I want to find out whether or not through that distillation process the emergent characteristics actually transfer because, as we know, emergent characteristics are things which are unexpected and not trained for. But if you're taking the parameters and you're distilling the parameters down, are you capturing those emergent capabilities? It's actually a really interesting question. And, Scott, it's one that we don't know the answer to, do we?
Scott Caravello: I think that's right, but you know what? As soon as we pin down the research on that, we are going to bring it up on this podcast, no doubt.
Katherine Forrest: We are, we are. And as we've said, 2026 is going to be a big year for AI, and all of this discussion around SLMs and the introduction of that vocabulary word to our audience is just another reason for that. So we're going to see a lot more, I think, in the coming year as we move into ever more sophisticated and differentiated agentic AI. We certainly will see some SLMs and continued high-capability LLMs. And so we're gonna continue to report on how this goes. This is just one more of those aspects of AI that was perhaps unexpected, but it's really taking hold now. And that's all we've got time for today, folks. I'm Katherine Forrest.
Scott Caravello: And I'm Scott Caravello. Make sure to like and subscribe.