A common question we hear about Juice is, “How is the latency?”
It’s a very valid question, and one central to an effective virtual GPU-over-IP. There are endless sources of latency along the path: compression, decompression, the GPU itself, transmission across the network, and on and on. It’s a gauntlet.
Today, Juice uses TCP, because GPU commands must be processed in order and any drop or restart of the stream would almost certainly produce incorrect results on the GPU.
TCP is an ancient protocol by software standards, standardized under DARPA in RFC 793 back in 1981. Like so many successful technologies, its success wasn’t predicted from the start, and it carries workarounds and design decisions tailored to networks long gone.
Anyway, in 2022, Juice needs to move hundreds of megabits per second to process GPU instructions smoothly at near-native speed, and every stall compounds the delay. So we’ve put a good deal of effort into tailoring the socket itself for high performance.
The following are a couple of notes from this exploration:
Nagle’s Algorithm: a true black magic classic. Before the release, I had (naively) disabled it, presuming that we’d want packets to go on the wire the moment the software sends them. But TCP must acknowledge those packets, so with Nagle turned off every small write becomes its own packet, and OS and network traffic can increase substantially. Turning Nagle off makes sense for an interactive terminal, where each keystroke should go out immediately; for a low-latency virtual GPU streaming a dense command sequence, not so much.
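For the curious, Nagle is controlled per socket via the standard TCP_NODELAY option. A minimal C sketch (the socket descriptor and the lack of error handling are illustrative, not Juice’s actual code):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Toggle Nagle's algorithm on a connected TCP socket.
 * enabled = 1 sets TCP_NODELAY (Nagle off: writes hit the wire immediately);
 * enabled = 0 clears it (Nagle on: the kernel coalesces small writes
 * into fewer, larger segments while ACKs are outstanding). */
static int set_nodelay(int sock, int enabled)
{
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                      &enabled, sizeof(enabled));
}
```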
Send/Recv Buffers: I had also naively assumed that larger buffers mean less waiting in blocking send and recv calls, and therefore lower latency. This article (among others) goes into detail on why it isn’t so simple: https://community.f5.com/t5/technical-articles/the-tcp-send-buffer-in-depth/ta-p/290760 Even more interesting, setting these options behaves differently, with different limitations, on each operating system: Windows allows arbitrary buffer sizes, for example, while Linux clamps them to system-wide limits. So far I haven’t identified a pattern in how the TCP window grows and shrinks in practice; no doubt it depends on the OS, the network, and the available bandwidth. All of which is to say: it’s vitally important that we transmit the most data with the fewest acknowledgments, both at the TCP/IP layer and in the Juice protocol itself.
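To see the per-OS behavior for yourself, here’s a small sketch: request a larger send buffer with SO_SNDBUF, then read back what the OS actually granted. On Linux, socket(7) documents that the kernel doubles the requested value for bookkeeping and caps it at the net.core.wmem_max sysctl; Windows generally honors the requested size as-is. The 4 MiB figure in the usage comment is arbitrary, purely for illustration.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Request a send-buffer size and report what the OS actually grants.
 * On Linux the kernel doubles the requested value and clamps it to
 * net.core.wmem_max; reading it back makes the adjustment visible. */
static void tune_send_buffer(int sock, int requested_bytes)
{
    setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
               &requested_bytes, sizeof(requested_bytes));

    int granted = 0;
    socklen_t len = sizeof(granted);
    getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &granted, &len);
    printf("requested %d bytes, OS granted %d\n",
           requested_bytes, granted);
}

/* Usage: tune_send_buffer(sock, 4 * 1024 * 1024); asks for 4 MiB. */
```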