Podcasts
Paul, Weiss Waking Up With AI
Hyperscalers: Where the Cloud Touches Ground
In this episode, Katherine Forrest and Scott Caravello go inside the material world of digital minds. Our hosts explain how companies operate the massive data centers serving as the physical foundation for AI, break down the staggering energy demands behind them, and consider what powering the future might mean—by way of gigawatts and governance.
Episode Transcript
Katherine Forrest: Hello everybody and welcome back to Paul Weiss Waking Up With AI. I am Katherine Forrest.
Scott Caravello: And I'm Scott Caravello.
Katherine Forrest: And, Scott, we always start these episodes with, like, sort of, like, something real life and real world. And, so, I basically saw you with a lunchbox the other day… and I just need to know... I need to know why… because, you know, we have, like, free food in the cafeteria?
Scott Caravello: You know, I'm on a little bit of a health kick. Maybe it's still some, you know, leftover New Year's resolution, but I'm packing my own lunch–or trying to–it's not going that great, but I am trying. Are you a fan of the free lunch though?
Katherine Forrest: Okay, well, first of all, I just want to have some breaking news here. We have a salad bar. All right. So, and, we have a fruit bar. Yeah. So, you know, it's really, it's funny because as a relic of COVID where the firm—here, at Paul, Weiss, we—started offering free food. We've kept it. And it's for all staff and all lawyers. There's no distinction. You don't show an ID if you're in the building, and you're in our offices, you know, you're entitled to walk into the cafeteria and it's great. It is great because the offerings are really varied. See, I sound like a commercial now, I'm like an infomercial. But anyway, I love it because I go and I get my banana every day. But getting a banana that doesn't have like the little brown spots on it is, that's my only challenge. And then we have this great, like, latte machine that you can use or you can go to a person who's like, you know, a barista. Anyway, it's all great. It keeps everybody, you know, feeling good and I think keeps people in the office. So it's all good, all good. You should try the salad bar!
Scott Caravello: All right, I'll give it a go today. I'll keep you posted.
Katherine Forrest: All right, well, so I have to say that we've talked about infrastructure a lot over the last year. And, you know, one thing that was on my mind, it was, Scott, on my mind, was the use of the term hyperscalers that people have been throwing around like it's a well-known term that you somehow learn, you know, somewhere in like middle school or something: hyperscalers. And I had no idea what it meant, but so many people were using it that I was like, oh, it must just be me. Then I realized that actually hardly anybody knew what a hyperscaler was… they were just like using the word to mean like ‘big.’ So, I thought we could spend some time talking about what hyperscalers are and, you know, it's not intuitive so I thought maybe we could spend some time explaining it.
Scott Caravello: Nope, that is a great idea. Should we start with some of the basics?
Katherine Forrest: Well, you know, I love basics because my mind works with basics. So, a “hyperscaler” is, basically, a company, and many of them, we’ll talk about them in a moment, are names that everybody's really familiar with, that can build, operate and scale really massive computing infrastructure on demand. And it's really like planet-scale data centers, with enormous physical infrastructure: custom servers, global data centers, you know, millions of chips whizzing and whirring away. It's got flexible scale. They can scale compute and storage and networking capabilities for their customers, you know, up or down, and third parties can build on their architecture. So, some of the well-known examples of hyperscalers are AWS, Microsoft Azure, Google Cloud Platform, and, increasingly, Meta and Oracle, though a lot of that, for Meta at least, is still in-house.
Scott Caravello: Right, and, so hyperscalers, you can think of them as the “physical substrate of AI.” And, Katherine, I know that you're going to ask me to explain what substrate means for listeners. And basically, you know, it's a term that's used a lot in AI. It's an engineering term, but it refers to kind of the base layer or the foundation. And so hyperscalers are that foundation, right? They can have hundreds of thousands of GPUs, incredibly fast connections, and with specialized cooling and power delivery to keep the whole thing running.
Katherine Forrest: Okay, so, before we get to GPUs, let's just do a quick, like, one-sentence refresher for folks on what a GPU is.
Scott Caravello: Yeah, absolutely. So GPU stands for “Graphics Processing Unit,” right? And they're chips that were originally designed to support gaming, but then it turns out they were really great at handling the kinds of operations needed for AI. And so they've really been the foundation of AI training. They can run massively parallel computations far more efficiently than CPUs, which are central processing units.
Katherine Forrest: Right. So maybe it's important to make sure that we all know that hyperscalers, first of all, predated AI, because when we talk about them as so massive, with massive amounts of compute and the GPUs and all of that, it makes it sound like it's only AI. But they actually emerged to solve internet-scale problems. Originally, they emerged to be, essentially, the infrastructure powering search engines, e-commerce, email, video streaming, et cetera, et cetera, and they still do. And at that point, they were really housing more CPUs than GPUs, but supporting AI has brought in more GPUs, which actually draw more power, and supporting AI is now another function of hyperscalers. So it's an additional, sort of additive, line of business, so to speak, for the hyperscalers. And in prior episodes, you know, we talked about the importance of compute, how frontier models are trained with huge amounts of compute and also take huge amounts of compute to do all of the inference, or question answering, if you will. And hyperscalers actually provide that compute.
Scott Caravello: Yeah, right. So in effect, in the AI context, they're turning AI from just research into a service.
Katherine Forrest: Right. You know, there are like data centers that have housed numerous servers. Those have been around for decades in the biggest research companies, et cetera, et cetera. And, now, they've actually moved beyond that.
Scott Caravello: Yeah, so defense contractors, as well, also had those huge data centers.
Katherine Forrest: And energy and telecom companies, you know, had their own in-house huge data centers too.
Scott Caravello: Right, so then in the AI world, hyperscalers are, you know, where the control panel sort of lives. They provide the access to tools, the power to run the tools, and if there were ever a shutdown of a model, it would need to involve the hyperscaler running that model.
Katherine Forrest: Right, hyperscalers are also a place where a huge amount of data is stored, of course, because, right, you've got the GPUs, you've got the CPUs, you're going to have their logs of what they've done, and you're actually running a lot of information through them.
Scott Caravello: Yeah, and so one of the things, though, that people don't even think about is how AI tools and models get access to the GPUs that they need to perform their processing. It's the hyperscalers allocating the GPUs, and it's humans who are controlling them.
Katherine Forrest: Right, and that's where there's an enormous amount of sort of human input along with AI input. There are training schedules, you know, for the models and the tools that are being trained and deployed, and those are also increasingly accessing custom-designed silicon, which is also sometimes a business that some of the hyperscalers are in. And so the hyperscalers are almost functioning like a road designer: they figure out the traffic patterns, where some of the stoplights and merge signs are, where the speed limits are, how the traffic can best be routed, how it can be slowed, how it can be sped up to make everything work incredibly smoothly.
Scott Caravello: Right, and then one other thing that I think is really interesting and worth fleshing out, like I had mentioned just a few minutes ago with respect to shutdowns, is that the hyperscalers are at least one place that the sort of governance kill switch could live. If we ever had to shut down an AI model because it was out of control, a hyperscaler would need to be a participant in that. I know that, you know, Katherine, you have a lot of thoughts about that and the idea of a kill switch, which has been a controversial piece of some state AI legislation. But theoretically, that's one place it would be. Realistically, if one is gonna exist, it'll actually live in several places, and, you know, the power grid is an even earlier point in the chain.
Katherine Forrest: Right. And, so, going back to our hyperscalers, the labs are where AI is designed and are intellectually sovereign, right? The labs are sort of the place of the intellectual sort of “heart.” But the hyperscalers are operationally sovereign. And in many senses, they're the but-for cause, the sine qua non, as some people say, because they enable the massive training runs, the massive amounts of inference, the global deployment. All of that comes from the hyperscalers at scale.
Scott Caravello: So then for compliance lawyers, what does a hyperscaler have that could matter?
Katherine Forrest: As I was mentioning before, they have those logs that contain all kinds of runtime information about the data, about what's happening within the GPUs, within some of the CPUs that are also being used. They're monitoring the data. They have error logs. And if there ever comes a time when there are really registration requirements imposed on frontier models that have sort of hit a certain threshold, or certain capabilities, that's at least one place that registration requirement could live.
Scott Caravello: So let's talk about energy and AI, right? Because hyperscalers are a big part of that discussion.
Katherine Forrest: Right. So, this is actually a really big deal and it explains a lot about where some investment dollars are going right now. So, hyperscalers have to have electricity to run the GPUs and those CPUs that are housed within massive buildings and campuses, and they also need that electricity to run networks that have switches and fiber that the data runs through and routers that send things in different directions. So, there's a massive amount of electricity that's being used by the hyperscalers.
Scott Caravello: And, then, there are also the cooling systems because all of that machinery can get very hot. And, so, the hyperscalers need to use electricity to cool the systems, and they also use electricity for data storage to keep all of that running.
Katherine Forrest: Right, so let's sort of pause on that because when you think about heat, what's interesting about the hyperscalers is that 90-plus percent of the electricity used by all the constituent parts ultimately becomes heat, and there's a mathematical formula that people use for this. And all of that heat needs to be removed by cooling.
Scott Caravello: Right, and so because that cooling is critical, especially, and all of that electricity comes from a bunch of different sources–nuclear energy, hydro, wind, solar, natural gas, and even some coal.
Katherine Forrest: Right, and the exact mix of where the energy comes from depends on where the hyperscaler is, and what I mean by that is actually geographically, and what it has access to, some of which can be transmitted over long distances and some of which is local.
Scott Caravello: And so to ensure a steady supply of energy, most hyperscalers enter into what are called power purchase agreements, or PPAs.
Katherine Forrest: Right, and the PPAs tell the energy provider, effectively through this contractual provision, how much they're going to have to provide to the hyperscalers over a particular period of time. And that has actually triggered a fair amount of energy build out. So, these PPAs are these long-term energy contracts with the hyperscalers. It gives everybody a sense about what they're going to be needing and then there have actually been build outs to be able to fulfill that.
Scott Caravello: Right, and you know, the power has to keep flowing because if there is no power to the hyperscalers, then it's going to be near impossible to get the intelligence that AI provides.
Katherine Forrest: Okay, so now here's the fun part because, this tells you the ways in which I have fun–apart from going to the cafeteria and getting my like, you know, banana–but let's walk through some energy comparisons and really pause for a moment on why, you know, people say AI uses a lot of energy. First of all, what they're talking about largely, not always, largely, are the hyperscalers and what they're doing to make the AI whiz and whir and do all the phenomenal things that it does. We can start with the fact that AI training and inference can amount to 44 gigawatts of data center demand today, while non-AI workloads generally come in around 38.3 gigawatts, and by 2030 that's projected to be 155.5 gigawatts for AI and 63.5 gigawatts for non-AI.
Scott Caravello: So, that's a lot of gigawatts, but let's talk about what that actually means and provide some context as far as how, you know, that actually would play out. So 44 gigawatts is 44 billion watts. And that's continuously, not just once in a while. And so take, for example, the average US home with its TVs, washers, dryers, et cetera. They use 1.2 kilowatts. So that means that 44 gigawatts is, roughly, 44 million homes all the time. And so that's roughly all of the homes in Texas and New York combined. And if we want to just add one more example, New York City has approximately 5–6 gigawatts of continuous demand. LA has 4–5. So 44 gigawatts is like 8–10 New York City's running all the time. And the electrical load just does not sleep. It doesn't dip at night and it doesn't slow much on the weekends either.
Katherine Forrest: And so like another comparison is if you take certain countries: Denmark uses about 4 gigawatts, Ireland about 6 gigawatts. These are continuous loads again, you know, the amount of energy it takes to run the whole country continuously. So, you know, this is a lot of energy when we're talking about 44 gigawatts.
Scott Caravello: And since I am really enjoying all of these comparisons, let's also put it in the context of energy plants. A large nuclear plant puts out about 1 gigawatt. A large gas plant puts out about 0.5–1 gigawatt. The Hoover Dam puts out 2 gigawatts, going full tilt. So 44 gigawatts is like 44 nuclear reactors.
Katherine Forrest: Well, you know, actually, by the way, can I just tell you, I think that that's a lot. Who knew that the Hoover Dam put out 2 gigawatts, right?
Scott Caravello: Yeah, yeah.
Katherine Forrest: Am I wrong? Like, did you know that the Hoover Dam was like putting out more gigawatts than a nuclear facility?
Scott Caravello: Not before I was getting ready for this episode. Uh, it’s–it's a marvel.
Katherine Forrest: Okay, we did not know this fact! It's a marvel. It's a marvel, right? So when we're talking about the increase in usage by 2030, it's not going to happen all at once, it's going to happen incrementally between now and 2030. We're going to go from these 44 gigawatts to 155 gigawatts, right? So all of our comparisons so far have been at 44 gigawatts. We're going to incrementally, over the next five years, go to 155 gigawatts. That is huge energy usage.
Scott Caravello: Yep, and that is why obtaining energy is such a big deal for AI.
Katherine Forrest: So, there are a lot of geopolitical implications in all of this because, you know, it's going to be a real sort of push to get the energy sources that we need. You know, we've got a lot of solar possibilities. We have a lot of hydro possibilities and we have a lot of you know, fossil fuel possibilities. But this is… it's a lot. All I can say is, wow, now that we know what hyperscalers are and now we know the role they play, it's big stuff.
Scott Caravello: Yep, it is critical stuff.
Katherine Forrest: Yeah. And that's about all the time we have for today. So, those of you who now feel like you understand the word hyperscalers, we encourage you to use it every day, at least once, for five days. Right? If you can fit it in every day once for five days, please message us! And, I'm Katherine Forrest signing off.
Scott Caravello: And I'm Scott Caravello. Make sure to like and subscribe.