Editor's notice: Guillaume is not a native English speaker, so please excuse any grammar mistakes.
1. First approach of XFree86
My experience as a user is that this user interface is very bad speed-wise: sometimes menus or buttons are slow to react, and the refresh of the windows is a disaster (regardless of whether the apps are Qt or GTK+), as I can see too many 'white areas' because the toolkits "refresh" parts of the windows too often. Should I reboot under BeOS and relax away from these problems? No, I will see what I can do about it with 'C code'; as I am a software engineer, I have to evaluate XFree86 before formatting my Linux partition. I did evaluate it. 15 months have passed, and that was the time when I started BlueEyedOS. The disk was never formatted. Linux was there to stay. Let's see why.
2. From the coder's point of view
Let's start with the API and general concept. XFree86 is mainly a server (the X Server); it deals with gfx drivers and input drivers. When you want to use its functionality in your program, you use the Xlib library. That's simple in theory. But in practice it's not as easy, as the API is not easy to memorize and the semantics are very abstract for people who only want to do simple things, e.g. to draw a red line on the screen.
2.1 Abstraction
All seems to be 'perfect' when using Xlib, because you don't need to bother about the colorspace and conversion: it is all handled automatically by the system. If the server is not on the machine which runs your program, you don't have to worry about it: all the information is sent through the network, and it's transparent.
2.2. Modularity
With XFree86 4.x, all is modular: the server loads and uses only the needed drivers. If you need new functions, you can create a new extension. All this sounds good, but the dream stops here.
2.3. Performance issues
- all the communication is socket based (even if the same machine runs both the client and the server)
- useless conversions. Colors are defined with 3 fields (red, green, blue), 16 bits per component. If you need a 'pure red' color, you will have to write:
XColor color;
color.red=65535;
color.green=0;
color.blue=0;
This implies 2 conversions. The first is done by the client, which converts 24-bit data to 3*16-bit data and sends it to the server. The second is done by the server, which converts it for my 32-bit display...
- lack of transparency support
You simply can't use transparency without coding it yourself. The X Rendering Extension is supposed to solve this issue, but it's still quite "unusable".
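The double color conversion described in the list above can be sketched in plain C. This is only an illustration, assuming 8-bit client colors and a display with 8 bits per channel packed as 0x00RRGGBB. The `Color16` struct and both function names are hypothetical stand-ins (`Color16` mimics the three 16-bit fields of XColor); this is not Xlib code, and what a real server does depends on the visual in use.

```c
#include <stdint.h>

/* Hypothetical stand-in for the red/green/blue fields of XColor. */
typedef struct {
    uint16_t red, green, blue;
} Color16;

/* Conversion 1 (client side): widen 8-bit components to 16 bits.
 * Multiplying by 257 maps 0..255 onto 0..65535 (255 * 257 = 65535). */
Color16 widen_rgb24(uint8_t r, uint8_t g, uint8_t b)
{
    Color16 c;
    c.red   = (uint16_t)(r * 257u);
    c.green = (uint16_t)(g * 257u);
    c.blue  = (uint16_t)(b * 257u);
    return c;
}

/* Conversion 2 (server side, assumed layout): keep the high byte of each
 * component and pack the result as 0x00RRGGBB for a 32-bit display. */
uint32_t pack_pixel32(Color16 c)
{
    return ((uint32_t)(c.red   >> 8) << 16) |
           ((uint32_t)(c.green >> 8) <<  8) |
            (uint32_t)(c.blue  >> 8);
}
```

Under these assumptions, `pack_pixel32(widen_rgb24(255, 0, 0))` gives back 0x00FF0000: the extra 8 bits per component travel to the server only to be discarded there.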
3. Let's use it as it should be used for high performance
To test a small 'proof of concept' I did months ago, you will need a 1024x768 32-bit display running under XFree86. You will see that you can smoothly move windows around your desktop, with very low CPU consumption and fast responsiveness. How did I do it? I minimized the impact of the design flaws, which means:
- to reduce the communication between the server and the client
- to use the memory of the gfx card
- to use a minimal set of X11 functions
- to redraw only what is needed
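As an illustration of the last point, "redraw only what is needed" usually comes down to intersecting the damaged area (for example the x, y, width and height reported by an Expose event) with the bounds of each thing you draw. The sketch below is an assumption about how this could be done, not the code of the proof of concept; the `Rect` type and `clip_damage` are hypothetical names.

```c
/* Hypothetical rectangle type: top-left corner plus size, in pixels. */
typedef struct {
    int x, y, w, h;
} Rect;

/* Intersect the damaged area with a widget's bounds.
 * Returns 1 and fills *out when they overlap,
 * returns 0 when nothing needs to be redrawn for this widget. */
int clip_damage(Rect damage, Rect widget, Rect *out)
{
    int x1  = damage.x > widget.x ? damage.x : widget.x;
    int y1  = damage.y > widget.y ? damage.y : widget.y;
    int dx2 = damage.x + damage.w, wx2 = widget.x + widget.w;
    int dy2 = damage.y + damage.h, wy2 = widget.y + widget.h;
    int x2  = dx2 < wx2 ? dx2 : wx2;
    int y2  = dy2 < wy2 ? dy2 : wy2;

    if (x2 <= x1 || y2 <= y1)
        return 0;               /* no overlap: skip this widget entirely */

    out->x = x1;
    out->y = y1;
    out->w = x2 - x1;
    out->h = y2 - y1;
    return 1;
}
```

Only the resulting rectangle then has to be copied to the screen, which also reduces the traffic between client and server.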
Here, we have to define 3 'spaces':
- the framebuffer (FB), which is the part of the memory of the gfx card that is drawn on the screen
- the memory of the gfx card (GRAM), used to store bitmaps
- the main memory (RAM) (in the client address space)
XFree86 provides an 'object' called "Pixmap", which is a bitmap that can be stored in RAM or in GRAM. It provides a big acceleration when you need to draw bitmaps, but it needs to be used carefully, because you don't control the memory management in GRAM.
3.1 Let's benchmark!
XFree86 provides a tool called 'x11perf'. On my computer, the X server can:
- draw 18000 lines/s
- draw 30000 filled rectangles/s
- copy/blit 1250 (1500 using the SHM extension) 100x100 bitmaps/s from RAM to FB (or RAM to GRAM)
- copy/blit 24000 100x100 bitmaps/s from GRAM to FB (or GRAM to GRAM)
If we consider that something fast & smooth is about 25 frames/s, then to stay 'fast' we can only draw, per frame:
- 720 lines
- 1200 filled rectangles
- 50 (or 60) copies of 100x100 bitmaps
- 960 blits from GRAM to FB
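The per-frame figures above are simply the x11perf rates divided by 25. A minimal sketch of that arithmetic follows; the function names are hypothetical, and `frame_time_ms` additionally assumes that the costs of the different operations simply add up, which is a simplification.

```c
/* Rates measured with x11perf in the text, in operations per second. */
#define LINES_PER_S      18000.0
#define RECTS_PER_S      30000.0
#define RAM_BLITS_PER_S   1250.0   /* 100x100, RAM to FB, without SHM */
#define GRAM_BLITS_PER_S 24000.0   /* 100x100, GRAM to FB */

/* How many operations fit in one frame at a given frame rate. */
long ops_per_frame(long ops_per_second, long fps)
{
    return ops_per_second / fps;
}

/* Estimated time in milliseconds to draw one frame made of a mix of
 * operations, assuming the individual costs are purely additive. */
double frame_time_ms(int lines, int rects, int ram_blits, int gram_blits)
{
    double seconds = lines      / LINES_PER_S
                   + rects      / RECTS_PER_S
                   + ram_blits  / RAM_BLITS_PER_S
                   + gram_blits / GRAM_BLITS_PER_S;
    return seconds * 1000.0;
}
```

At 25 frames/s the budget is 40 ms per frame. With these rates, a frame of 100 lines, 200 filled rectangles, 10 RAM blits and 300 GRAM blits would take about 32.7 ms, and the 10 blits from RAM alone account for 8 ms of it.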
Because a modern interface is not composed of filled rectangles, the only way to keep something nice is to blit as much as possible. But it's not so easy!
Let's try another benchmark. The goal is to create a fade (from black to white) in 256 increments, and the fade is done twice. That makes 512 frames, so at 25 fps the test must end in about 20 s.
The first column shows the time it took when running 2 tests at the same time; the second column shows the result when only one test is executed.
The first test uses the XFillRectangle function to create the fade. The second one draws the filled rectangles point by point...
The third one draws the filled rectangles line by line...
The fourth one creates the rectangles in RAM, transfers them to GRAM and blits from GRAM to FB.
The last one creates the rectangles in RAM and blits from RAM to FB.
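The client-side half of the last two tests (building each step of the fade in RAM) and the time budget can be sketched as follows. This is a hedged illustration, not the benchmark's source: the function names are hypothetical and the 0x00RRGGBB pixel layout is an assumption about the 32-bit display.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack a grey level (0 = black, 255 = white) as 0x00RRGGBB (assumed layout). */
uint32_t grey_pixel(uint8_t level)
{
    return ((uint32_t)level << 16) | ((uint32_t)level << 8) | level;
}

/* Fill a width x height buffer in client memory with one grey level.
 * In tests four and five, a buffer like this is what gets sent to the server. */
void fill_grey(uint32_t *buf, size_t width, size_t height, uint8_t level)
{
    uint32_t pixel = grey_pixel(level);
    for (size_t i = 0; i < width * height; i++)
        buf[i] = pixel;
}

/* Time allowed for the whole test at a given frame rate, in milliseconds. */
long fade_budget_ms(long frames, long fps)
{
    return frames * 1000 / fps;
}
```

Two fades of 256 steps are 512 frames, so `fade_budget_ms(512, 25)` is 20480 ms, the "about 20 s" limit used to judge the results.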
With 2 tests in parallel | 1 test for 512 operations

--------- 320x256 ---------------
Filling the window with XFillRectangle
1 : 0 s 120 ms | 0 s 78 ms
2 : 0 s 79 ms |
Filling the window with XDrawPoint
1 : 7 s 329 ms | 3 s 724 ms
2 : 7 s 120 ms |
Filling the window with XDrawLine
1 : 6 s 761 ms | 3 s 245 ms
2 : 5 s 427 ms |
Filling the window with XPutImage+XCopyArea
1 : 9 s 814 ms | 6 s 283 ms
2 : 10 s 104 ms |
Filling the window with XPutImage
1 : 10 s 485 ms | 6 s 284 ms
2 : 10 s 446 ms |

-------- Same in 1047x768 ---------
Filling the window with XFillRectangle
1 : 0 s 365 ms | 0 s 128 ms
2 : 0 s 270 ms |
Filling the window with XDrawPoint
1 : 57 s 35 ms | 32 s 3 ms
2 : 51 s 62 ms |
Filling the window with XDrawLine
1 : 7 s 270 ms | 5 s 307 ms
2 : 6 s 138 ms |
Filling the window with XPutImage+XCopyArea
1 : 80 s 77 ms | 56 s 28 ms
2 : 93 s 469 ms |
Filling the window with XPutImage
1 : 53 s 223 ms | 37 s 416 ms
2 : 52 s 684 ms |
Conclusion: at 320x256, my computer competes with my Amiga 500 (14 MHz + 1 MB of memory).
Now, with the fullscreen test, you will see that, apart from XFillRectangle, only XDrawLine is efficient. No surprise: it uses less bandwidth than the transfer from RAM to GRAM. Should we conclude that XCopyArea + XPutImage are not good performers? Surely not, because in 80% of cases you work with the same bitmap (typically, the icon of your preferred app), so to draw it on the screen, a copy from GRAM to FB (XCopyArea) is enough.
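The bandwidth argument can be made concrete with a little arithmetic. The sketch below assumes 4 bytes per pixel (a 32-bit display) and that the XPutImage tests send the whole window for each of the 512 frames; it ignores protocol overhead, and the function names are hypothetical.

```c
#include <stdint.h>

/* Bytes sent from RAM when every frame re-uploads the full window. */
uint64_t upload_bytes(uint64_t width, uint64_t height,
                      uint64_t bytes_per_pixel, uint64_t frames)
{
    return width * height * bytes_per_pixel * frames;
}

/* Average throughput in bytes per second for a run lasting `ms` milliseconds. */
uint64_t throughput(uint64_t bytes, uint64_t ms)
{
    return bytes * 1000 / ms;
}
```

Under these assumptions, the 1047x768 XPutImage test moves 1,646,788,608 bytes, and finishing in 37 s 416 ms corresponds to roughly 44 MB/s. The 320x256 test moves 167,772,160 bytes. Once the bitmap is already in GRAM, a blit with XCopyArea sends only a small request instead of the pixels themselves.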
However, the figures in the first column don't make me very optimistic; they really show the limits of the X server in a multithreaded environment. It needs improvements.



