VNC performance details

Hi guys! I’ve been using VNC to connect all my computers together on my network, but I noticed that my Mac running Vine Server is much slower than any of my Linux computers. So I put together a network test to compare their performance.

Setup:
Mac server: OSX 10.4.10 with Vine Server 2.2 on a 2.2GHz Core 2 Duo (MA895LL/A), 1440x900
Linux server: Ubuntu 7.04 with vnc4server 4.1.1 on a 2.2GHz Athlon 64 (single core), 1600x1200
Client: Windows XP x64 with TightVNC 1.3.9 on a 2.2GHz Athlon X2 (dual core), 2560x1600
Network: Gigabit Ethernet, with all computers using 9000-byte jumbo frames.

The test was to see how long it took for one full screen refresh in raw encoding mode. I ran Wireshark on the client computer, started up the VNC viewer, and exited once the whole screen was drawn. Then I read through the Wireshark capture to see how long it took to send the whole screen.

What I found is that Vine Server sends exactly 5 lines of the screen very quickly (in around 0.2ms), but then stalls for about 6ms before sending the next batch of 5 lines. Although 6ms doesn’t sound like a lot, with 900 lines on the screen that works out to (900 / 5) * 6ms = 1080ms, which is over a second of delay per frame.

The Linux computer, on the other hand, doesn’t have the same delay, and finishes a full frame in about 0.1s. It looks like some of the packets got lost along the way, though… not sure why.

I also did some tests at non-native resolutions and found that the number of lines sent before each delay changes with the resolution. Specifically (the sketch after this list works through the numbers):

  • The number of bytes sent always represents an integral number of lines. That is: mod(bytes_sent, 4 * res_x) = 0
  • The number of bytes sent is always close to, but never above, 32K.
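
As a sanity check on those two observations, here’s a tiny sketch of the arithmetic (the helper name is hypothetical; the 32K cap is just the bound from the second bullet): at 32 bpp a line is 4 * res_x bytes, and each burst carries as many whole lines as fit under the cap.

    #include <stdio.h>

    /* Hypothetical helper: how many whole scanlines fit in one burst if the
     * server caps each chunk at cap_bytes (32 bpp => 4 bytes per pixel). */
    static int lines_per_burst(int res_x, int cap_bytes)
    {
        return cap_bytes / (4 * res_x);   /* integer division: whole lines only */
    }

    int main(void)
    {
        const int cap = 32 * 1024;        /* "close to, but never above, 32K" */
        int lines = lines_per_burst(1440, cap);
        printf("%d lines, %d bytes per burst\n", lines, lines * 4 * 1440);
        /* prints "5 lines, 28800 bytes per burst" -- the 5-line bursts seen
         * above; any cap between 28800 and 34559 bytes gives the same answer */
        return 0;
    }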

I saved the Wireshark captures for both my Mac and Linux computers, in case anybody wants to look at them. They can be opened in Wireshark, Ethereal, or anything else that understands .pcap files. The zip file is here:
http://omion.dyndns.org/vnc_results.zip

My observations:

  • It doesn’t seem to be a network issue, since I can easily go over 100MB/s without any problem using other programs between the same two computers
  • It doesn’t seem to be a VNC issue, since vnc4server on my Linux computer doesn’t do the same thing. (The Linux computer has higher resolution and the processor is a bit slower, too)
  • It actually doesn’t seem to be an OSX problem either, as the built-in VNC server doesn’t exhibit exactly the same problem. However, the built-in one does seem to send only 32 lines per framebuffer update, which appears to confuse my client (or something…)

Does anybody know what might be causing this, and more importantly, how to fix it?

Thanks!
Omion

Thanks for the wonderfully detailed information about your observations!

We’re looking into this now to try to understand what might cause the delays that you’re seeing and will let you know once we find something.

Thank you for the very detailed analysis. Looking at that problem, here is what we observed:

1- When doing a local test (to an off-screen user) the problem didn’t appear at all, which made us wonder whether it was a network issue (it turned out not to be).
2- When using a stronger encoding, the problem seemed to go away over network connections.
3- Analyzing the exact scenario as described with Shark, we saw that 98% of the VNC server’s CPU time was being spent converting the colors to the format requested by the client.

So it’s a pretty specific situation. It happens when there is a mismatch between the native and requested pixel formats, AND it is greatly exacerbated by Raw mode, which requires every single pixel on the screen to be converted. All of the other encodings reduce the amount of color data that needs to be sent by an order of magnitude and basically make the problem go away.
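
To give a sense of why that conversion is so expensive, here is a minimal sketch (not our actual code; the descriptor fields just mirror the ones RFB uses) of the per-pixel translation that a Raw update forces when the client’s pixel format doesn’t match the server’s native one:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical pixel-format descriptor, mirroring the fields RFB uses:
     * per-channel maximum values and bit positions within a 32-bit pixel. */
    typedef struct {
        uint16_t rmax, gmax, bmax;
        uint8_t  rshift, gshift, bshift;
    } pixfmt;

    /* Translate n 32-bit pixels from the server's native format to the
     * client-requested one.  With Raw encoding this runs over every pixel
     * on the screen for every full update, which is where the CPU goes. */
    static void translate_pixels(const uint32_t *src, uint32_t *dst, size_t n,
                                 const pixfmt *in, const pixfmt *out)
    {
        for (size_t i = 0; i < n; i++) {
            uint32_t p = src[i];
            uint32_t r = (p >> in->rshift) & in->rmax;
            uint32_t g = (p >> in->gshift) & in->gmax;
            uint32_t b = (p >> in->bshift) & in->bmax;
            /* rescale each channel to the client's range, then repack */
            r = r * out->rmax / in->rmax;
            g = g * out->gmax / in->gmax;
            b = b * out->bmax / in->bmax;
            dst[i] = (r << out->rshift) | (g << out->gshift) | (b << out->bshift);
        }
    }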

So, although Raw mode can sometimes be useful, we do NOT recommend it unless your pixel formats match precisely and you are on a fast LAN.

Thanks for the response!

After your response I did another test with Zlib encoding (set to level 1) and noticed a similar pattern: it would send exactly 30000 bytes, then wait for a little while. I did a little digging around in the OSXvnc source and found that UPDATE_BUF_SIZE is exactly 30000. Given that the odd behavior I mentioned in the first post can be easily attributed to this buffer (fill up the buffer, send the data, fill it up again…), I suppose everything’s working correctly. I think my problem, therefore, stems from Vine Server not filling that buffer as fast as I’d like.
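
That wire pattern is exactly what a fill-then-flush loop around a 30000-byte buffer would produce. Roughly this shape (a sketch of the pattern, not the actual OSXvnc code; only the UPDATE_BUF_SIZE constant comes from the source):

    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>

    #define UPDATE_BUF_SIZE 30000          /* the constant found in the OSXvnc source */

    /* Plain blocking write loop; error handling elided in this sketch. */
    static void send_all(int sock, const uint8_t *buf, size_t len)
    {
        while (len > 0) {
            ssize_t n = write(sock, buf, len);
            if (n <= 0)
                return;
            buf += n;
            len -= (size_t)n;
        }
    }

    /* Hypothetical sketch of the shape of the send loop.  Scanlines are copied
     * (after any pixel conversion) into the buffer until the next whole line
     * no longer fits, then the buffer is flushed as one burst.  Assumes one
     * line fits in the buffer, which holds for the resolutions in this thread:
     * at 1440 wide and 32 bpp, 5 lines (28800 bytes) fit, and the pause
     * between bursts is the time spent refilling the buffer. */
    void send_raw_update(int sock, const uint8_t *fb, int width, int height)
    {
        uint8_t buf[UPDATE_BUF_SIZE];
        const size_t line_bytes = (size_t)width * 4;   /* 32 bpp */
        size_t used = 0;

        for (int y = 0; y < height; y++) {
            if (used + line_bytes > sizeof buf) {      /* next whole line won't fit */
                send_all(sock, buf, used);
                used = 0;
            }
            memcpy(buf + used, fb + (size_t)y * line_bytes, line_bytes);
            used += line_bytes;
        }
        if (used > 0)
            send_all(sock, buf, used);
    }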

Now, I actually can’t seem to reproduce the colorspace issue that you mentioned. Although I can’t see how much CPU time is spent converting colorspaces, I don’t see any increase or decrease in overall CPU usage no matter what the local or requested bit depth is.

In the process of testing this, I noticed another performance-eating problem (although not necessarily Vine’s fault). If I have Parallels or VMware Fusion open, Vine Server constantly sends refreshes of the program’s main window. This results in quite high network and CPU usage: one of my processors gets to around 90%, even if the guest OS is not changing the screen. I assume the problem is that these programs tell OSX that the window is changing all the time (even when it isn’t), and therefore Vine decides it needs to send the data to the client…

I suppose my problem (and this entire thread) boils down to the fact that it’s just not very fast. Here are the times to draw the entire 1440x900 screen, with color spaces matching, measured with Wireshark from the first data packet to the last:
Raw: 1.546s (remarkably consistent between runs)
Hextile: 4.5s
Zlib 1: 1.9s
Zlib 9: 2.1s
CoRRE: 4.1s

Are these numbers normal? It seems kinda… slow. Luckily the whole screen doesn’t need to be redrawn very often, but with VMware open (see above) 0.65fps is pretty much unusable.

The 30K is definitely from the buffer size, but as you note it’s really the filling of the buffer that is the slowdown. In our testing, increasing the buffer size slowed things down a lot, and decreasing it didn’t help much either (obviously you get individual chunks faster, but the overall time wasn’t helped much since you are doing more chunk-handling overhead).

If you are truly able to match the color space, it should be much faster in Raw; that can prove to be a bit challenging, though. Vine Viewer will let you get the data from the server in its native format, but then it has to turn around and convert it to the way it wants to draw on the local screen (so someone is still doing a lot of byte conversion). I’m not sure whether the Windows or Linux clients are able to draw to the screen in an arbitrary color scheme (possibly with some low-level support for it). Keep in mind that JUST matching the bit depth isn’t enough; it also depends on the endianness of the buffer and where the alpha bit is stored. The "native" format for Vine Server on a 10.4 Intel machine is reported in the log as:

32 bpp, depth 24, little endian
true colour: max r 255 g 255 b 255
shift r 16 g 8 b 0
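
For reference, that log output is just the standard 16-byte PIXEL_FORMAT structure from the RFB protocol with these values filled in (the struct below is the protocol’s layout, not Vine-specific code):

    #include <stdint.h>

    /* The RFB PIXEL_FORMAT structure: 16 bytes on the wire (the three *-max
     * fields are sent as big-endian 16-bit values). */
    typedef struct {
        uint8_t  bits_per_pixel;    /* 32 */
        uint8_t  depth;             /* 24 */
        uint8_t  big_endian_flag;   /* 0 = little endian */
        uint8_t  true_colour_flag;  /* 1 */
        uint16_t red_max;           /* 255 */
        uint16_t green_max;         /* 255 */
        uint16_t blue_max;          /* 255 */
        uint8_t  red_shift;         /* 16 */
        uint8_t  green_shift;       /* 8  */
        uint8_t  blue_shift;        /* 0  */
        uint8_t  padding[3];
    } rfb_pixel_format;

    /* The native format quoted in the log above, as that structure. */
    static const rfb_pixel_format vine_native_format = {
        .bits_per_pixel = 32, .depth = 24,
        .big_endian_flag = 0, .true_colour_flag = 1,
        .red_max = 255, .green_max = 255, .blue_max = 255,
        .red_shift = 16, .green_shift = 8, .blue_shift = 0,
    };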

Your timings don’t look too far off for 1440x900 at 32-bit, although the best encoding that we have found is ZlibHextile, with ZRLE just behind it. And of course use 8-bit color if you are looking for better performance.

It’s not all JUST the OSXvnc server, though; there are so many colors on the Mac OS X screen that it is pretty slow to send as a bitmap. You mentioned that you didn’t see the initial problems using the built-in VNC server, but I’m curious whether you are able to get better overall performance with it (from our observations it isn’t better but actually worse, performance-wise). If you are seeing situations where it performs better, then maybe we can build on those and figure out how to improve our performance. I’m very hopeful. Maybe it’s just time to write some SSE vector code to convert all those bytes…
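
On that last point, here is a minimal sketch of the kind of SSE routine that idea hints at; nothing like this exists in the server today, and it assumes the only mismatch is swapped red and blue channels (it also needs SSSE3, which the Core 2 Duo above has):

    #include <tmmintrin.h>   /* SSSE3: _mm_shuffle_epi8 (compile with -mssse3) */
    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical: swap the red and blue channels of 32-bit pixels in place,
     * four pixels per iteration.  The shuffle mask maps bytes {0,1,2,3} to
     * {2,1,0,3} within each pixel, so one instruction reorders 4 pixels. */
    void swap_red_blue(uint32_t *px, size_t count)
    {
        const __m128i mask = _mm_setr_epi8(2, 1, 0, 3,   6, 5, 4, 7,
                                           10, 9, 8, 11, 14, 13, 12, 15);
        size_t i = 0;
        for (; i + 4 <= count; i += 4) {
            __m128i v = _mm_loadu_si128((const __m128i *)(px + i));
            _mm_storeu_si128((__m128i *)(px + i), _mm_shuffle_epi8(v, mask));
        }
        for (; i < count; i++) {           /* scalar tail for the last few pixels */
            uint32_t p = px[i];
            px[i] = (p & 0xFF00FF00u)
                  | ((p >> 16) & 0x000000FFu)
                  | ((p & 0x000000FFu) << 16);
        }
    }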

I realized the issue with the endianness right after I posted my last reply, so I looked through the network dumps. My client (TightVNC) responds with exactly the same pixel format as the server reports. The pixel format string the client requests is "2018000100ff00ff00ff100800000000", which does match the format in your post.
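
Decoding that string byte by byte against the standard RFB PIXEL_FORMAT layout confirms it; the client is asking for exactly the format your log reported:

    #include <stdint.h>

    /* The client's pixel format string "2018000100ff00ff00ff100800000000",
     * decoded field by field (16 bytes, standard RFB PIXEL_FORMAT order). */
    static const uint8_t client_pixel_format[16] = {
        0x20,              /* bits-per-pixel   = 32                */
        0x18,              /* depth            = 24                */
        0x00,              /* big-endian-flag  = 0 (little endian) */
        0x01,              /* true-colour-flag = 1                 */
        0x00, 0xff,        /* red-max   = 255                      */
        0x00, 0xff,        /* green-max = 255                      */
        0x00, 0xff,        /* blue-max  = 255                      */
        0x10,              /* red-shift   = 16                     */
        0x08,              /* green-shift = 8                      */
        0x00,              /* blue-shift  = 0                      */
        0x00, 0x00, 0x00   /* padding                              */
    };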

I did other tests with timing, and it looks like 8-bit is actually slower. Here are a few results:
Raw: 2.1s
Zlib 1: 2.2s

About the built-in server… I have no idea what it’s doing. Set to Raw, the full-screen refreshes are faster than OSXvnc (1.37-1.40 seconds), but moving windows around on the screen would take anywhere from 1 to 3 seconds. After many tests it seems to have gotten faster, and now it looks like it is faster than OSXvnc overall… I don’t know why its performance changed like that.

My subjective experience is that, although the built-in server transfers data faster, it seems to lag more. Like the throughput is better, but the latency is worse. It’s hard to quantify…