Virtual GPU Latency - Part 2

Check out Part 1 of this series here.

What might make technology like Juice latent?

  1. Poor Compression/Caching
  2. Low Bandwidth
  3. High API conversion/serialization overhead
  4. Long Distance

Poor Compression/Caching

Textures, matrices, image data, meshes, and commands all need to be transmitted between the Juice client and server. Poor compression results in longer transmission times, particularly when the data is dynamic, random, or simply incompressible. Longer transmissions and acknowledgements mean higher latency.

With many graphics and machine learning (ML) workloads, the vast majority of data is extremely compressible, provided one knows what is being transmitted. If the transmitter knows it is sending meshes, for example, it can compress them far more efficiently than a general-purpose compression algorithm could. The same holds for every other resource type in the modern graphics pipeline.

Ultimately, a great deal is cacheable for both graphics and compute workloads, yielding compression ratios as high as 1000:1 in some instances, though 10:1 to 20:1 is more typical. This lets Juice use a single gigabit connection to roughly approximate the bandwidth afforded by a PCIe link.
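The advantage of knowing what is being transmitted can be sketched in a few lines. The snippet below is illustrative only (synthetic grid-mesh data, not Juice's actual format): it compares deflating interleaved vertex coordinates as-is against a mesh-aware scheme that splits the coordinates into channels and delta-encodes each one first, which turns smoothly varying positions into a tiny, repetitive stream.

```python
import struct
import zlib

# Synthetic vertex grid (assumed data, not Juice's wire format):
# positions vary smoothly, as they do in most real meshes.
n = 10_000
xs = [i % 100 for i in range(n)]         # grid column
ys = [i // 100 for i in range(n)]        # grid row
zs = [(i * 37) % 7 for i in range(n)]    # small height variation

def deflate(ints):
    """Pack 32-bit ints and compress with a general-purpose algorithm."""
    return zlib.compress(struct.pack(f"{len(ints)}i", *ints), 9)

# General-purpose: compress the interleaved (x, y, z) stream as-is.
interleaved = [c for xyz in zip(xs, ys, zs) for c in xyz]
naive = deflate(interleaved)

# Mesh-aware: split into channels and delta-encode each, so most values
# become small, highly repetitive deltas before the compressor runs.
def delta(ch):
    return [ch[0]] + [b - a for a, b in zip(ch, ch[1:])]

aware = deflate(delta(xs) + delta(ys) + delta(zs))

print(len(naive), len(aware))  # the mesh-aware stream compresses smaller
```

The same idea generalizes: any transform that exposes the structure the compressor cannot see on its own (delta encoding, quantization, channel splitting) buys ratio before the general-purpose stage even starts.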

Low Bandwidth

Even with optimal compression, upload bandwidth is vital to Juice. Home Internet connections tend to be asymmetrical, often exceptionally so, pairing gigabit download speeds with upload speeds of around 30 megabits. For high frame-rate operation, Juice typically needs roughly 150 megabits up and 35 megabits down.

This is different from the now-defunct Stadia, which had the same download bandwidth requirements but extremely modest upload requirements. The difference comes down to how much data is needed to produce each frame. With Stadia, only keystrokes and joystick updates had to be serialized upstream. With Juice, everything required to render a frame is transmitted, including drawing commands and texture resources. Perhaps surprisingly, the one thing Juice does not transmit is keystrokes and joystick updates.
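A quick back-of-envelope calculation puts those numbers in perspective. Using the figures above (150 Mbit/s of upload, 60 fps) — the article's figures, not a protocol guarantee — the per-frame upstream budget works out as:

```python
# Per-frame upload budget from the figures quoted above.
upload_mbps = 150
fps = 60

bits_per_frame = upload_mbps * 1_000_000 / fps   # 2.5 Mbit per frame
kib_per_frame = bits_per_frame / 8 / 1024        # convert to KiB
print(f"~{kib_per_frame:.0f} KiB of serialized commands and resources per frame")
# → ~305 KiB of serialized commands and resources per frame
```

Roughly 300 KiB per frame for every draw command, state change, and resource update is a tight budget, which is why the compression and caching described above matter so much.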

High API conversion/serialization overhead

This is part of Juice's special sauce: serialization of graphics commands is extremely fast and highly compressible.

Modern serialization libraries are extremely efficient, and Juice is no exception. Command buffer serialization is fully threaded and adds minimal overhead compared to a local graphics driver.
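To see why command serialization can be both fast and compressible, consider a fixed-size record encoding. The format below is purely hypothetical (it is not Juice's actual wire format): each draw call packs into one small little-endian record, so serializing is a single integer pack per command, and a frame full of similar draws produces a highly repetitive byte stream that compresses well.

```python
import struct

# Hypothetical command encoding (illustrative only, not Juice's format).
DRAW_INDEXED = 0x01
# One record per draw: opcode, index_count, first_index, vertex_offset.
CMD = struct.Struct("<BIII")  # little-endian, no padding: 13 bytes

def serialize(draws):
    """Flatten a list of draw calls into one contiguous command buffer."""
    buf = bytearray()
    for index_count, first_index, vertex_offset in draws:
        buf += CMD.pack(DRAW_INDEXED, index_count, first_index, vertex_offset)
    return bytes(buf)

# A frame of 1,000 near-identical draws (e.g. instanced scenery).
frame = [(36, i * 36, 0) for i in range(1000)]
wire = serialize(frame)
print(len(wire))  # 13 bytes per record → 13000 bytes
```

Because the records are fixed-size and field-aligned, the work is branch-light and trivially parallelizable across threads, and the repetition across records is exactly what a downstream compressor exploits.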

Long Distance

The speed of light has the final say on how latent Juice can be under ideal conditions. There is probably no universe where astronauts on Mars will be using GPUs on Earth: one-way light time to Mars ranges from roughly 3 to 22 minutes depending on orbital positions.

More realistically, for graphics workloads, 3 frames of queue-ahead provides enough pipelining for smooth operation. At 60 fps, that comes out to about 50 milliseconds. With that budget, Juice can tolerate distances up to 5,500 miles over fiber optic and still maintain 60 fps.
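The 5,500-mile figure can be sanity-checked against physics. The sketch below assumes light travels at roughly c/1.47 in silica fiber (a common approximation); real links add routing and equipment delay on top, which is why the practical figure sits below the raw physical limit.

```python
# Physics check of the fiber-distance budget (assumes n ≈ 1.47 for silica).
C_MILES_PER_S = 186_282                  # speed of light in vacuum
fiber_mi_per_s = C_MILES_PER_S / 1.47    # ~126,700 miles/s in glass

frames_ahead = 3
fps = 60
budget_s = frames_ahead / fps            # 0.05 s of queue-ahead

limit_miles = fiber_mi_per_s * budget_s
print(f"~{limit_miles:.0f} miles one way at the physical limit")
# → ~6336 miles one way at the physical limit
```

The ~6,300-mile ceiling leaves a margin of several hundred miles for switching and routing overhead, consistent with the 5,500-mile practical figure quoted above.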

For ML workloads, the latency tolerance is much higher, and depending on the workload, probably not noticeable.

For further information, check out our Discord here.

Dean Beeler
Making GPU easy, anywhere @ Juice Labs