This article was also featured on CodeX.
Accelerated computing is a painful expense, yet at least 80% of GPU capacity goes unused. There is hope on the horizon.
GPU hardware increasingly drives business value in almost all major industries, whether based on the graphics horsepower GPUs provide (AAA game development, visual effects & animation, architecture and engineering design, medical imaging) or on their computing muscle (all the many applications of ML/AI and HPC).
Supply crunches and increased demand have combined to drive up the price of GPUs — even used ones — to astounding new heights.
So GPU capacity is more precious than ever, and nobody (especially markets) likes to see a precious resource go to waste. But a great majority of this resource is, in fact, wasted through underutilization — available estimates hover around 85%, but let’s be a little more conservative and use a figure of 80% wasted capacity for this exercise.
Tallying the cost of wasted GPU capacity
Qualitatively, we know that GPU is scarce and expensive. Depending on the industry, anyone from developers and IT personnel up to C-level leadership is keenly aware that increasing an organization’s accelerated computing capacity using GPUs can be very heavy on the pocketbook.
However, the cost comes not just from buying or leasing on-prem hardware, or from writing sizeable checks to public cloud providers like AWS. A substantial concern for many leaders is the human cost — like racking (and maintaining) servers, deployment and ongoing management of tooling and allocation systems, tracking and forecasting usage, etc.
For now though, let’s focus just on the hardware cost, and try to take a stab at a quantitative tally.
Arriving at a financial accounting for the GPU capacity that’s being “left on the table” requires us to estimate the total dollar value of all the GPU hardware in the world. My team and I tried several ways of getting to this, and were generally let down by the paucity of freely-available data — for example, it’s easier to find large-scale information about consumer GPUs than enterprise — but we’re OK with a decent approximation here, and feel like we landed on one.
Our inputs and assumptions:

- Roughly 125M GPUs installed worldwide
- An average value of $500 per GPU
- A GTX 1060-class card (4.4 teraflops) as the representative unit
- Only 20% of capacity actually utilized

Then,
125M GPUs * $500/GPU = $62.5B worth of GPUs
Now — we said that these GPUs are only being utilized at 20% of their capacity. The total capacity of a 1060 is 4.4 teraflops, giving us global capacity of 550 exaflops when all 125M of them are put together. But only 20% of this capacity is being used — 110 exaflops.
The world is willing to pay $62.5B for 110 exaflops of GPU capacity.
So the value of the 440 exaflops of already manufactured and installed — but unused — GPU capacity is 4 times that: an impressive $250 billion.
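The arithmetic above can be reproduced in a few lines. This is just a sketch of the back-of-the-envelope tally, using the article's assumed inputs (125M GPUs, $500 average value, 4.4 TFLOPS per card, 20% utilization):

```python
# Back-of-the-envelope tally of unused GPU capacity and its value.
# All inputs are the article's assumptions, not measured data.

NUM_GPUS = 125_000_000     # installed GPUs worldwide (assumed)
PRICE_PER_GPU = 500        # USD, average value per GPU (assumed)
TFLOPS_PER_GPU = 4.4       # GTX 1060-class card (assumed)
UTILIZATION = 0.20         # fraction of capacity actually used (assumed)

total_value = NUM_GPUS * PRICE_PER_GPU               # $62.5B
total_exaflops = NUM_GPUS * TFLOPS_PER_GPU / 1e6     # 1 exaflop = 1e6 TFLOPS
used_exaflops = total_exaflops * UTILIZATION         # 110 exaflops
unused_exaflops = total_exaflops - used_exaflops     # 440 exaflops

# The market pays total_value for the *used* capacity, so price the
# unused capacity at the same rate per exaflop.
unused_value = total_value * (unused_exaflops / used_exaflops)

print(f"Total fleet value: ${total_value / 1e9:.1f}B")        # $62.5B
print(f"Total capacity:    {total_exaflops:.0f} exaflops")    # 550
print(f"Unused capacity:   {unused_exaflops:.0f} exaflops")   # 440
print(f"Unused value:      ${unused_value / 1e9:.0f}B")       # $250B
```

The 4x multiplier falls out directly: at 20% utilization, unused capacity is four times used capacity, so it is priced at four times what the market paid for the used portion.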
The largest goldmine in the world contains $50 billion worth of gold that will take 70 years to extract. We're sitting on the equivalent of five of those, and the value can be unlocked within the next few years rather than the next 70.
Why unlocking the hidden value in GPU capacity matters
Even if you think our estimate might be off by a factor of 2, 3, or even 5, we are still talking about a massive inefficiency. A step-change solution that addresses that inefficiency and unlocks this value is certainly interesting from a pure financial perspective.
Even more interesting (and concerning) is the opportunity cost — the lost breakthrough potential — that our most exciting, cutting-edge businesses and industries are silently laboring under. The current premium on GPU acceleration forces painful trade-offs that inevitably throttle innovation.
Imagine a world where these trade-offs are 5x less painful, where accelerated computing is plentiful and easily available, and where the critical resource behind breakthrough technologies in climate modeling, drug discovery, autonomous vehicles, machine translation, materials science, computer security, medical diagnosis, and more becomes a driving factor instead of a limiting one.
As mentioned above, we must also consider the more immediate down-to-earth human cost of the status quo — not only the hours spent carefully managing precious GPU capacity, but the risk to employee retention that comes with constant workarounds and long queues for the basic resources that our “top minds” need in order to live up to their full personal potential.
Hope on the horizon
When (not if) a general solution arrives that unlocks the full GPU capacity within any deployment environment, the benefits will be so apparent that adoption will come quickly. The solution will rapidly be considered an obvious part of any accelerated stack.
We believe that Remote GPU (rGPU for short) — the ability for GPU-hungry applications to easily access underused GPU capacity across a network — is the key to this accelerated computing future… and we’re building our solution accordingly.