Making the Case for XFree86’s Speed

As a BeOS user (a purely desktop system) who codes under Linux, I see XFree86 (v4.1 on my machine) both as a user and as a developer. And this is where the problem lies. My GNOME and KDE desktops are slow in comparison with other operating systems, but XFree86, the ‘engine’ behind these desktops, shows me that it doesn’t have to be so. Let’s look at what I have in front of me: a dual Pentium III at 933 MHz with 512 MB of memory, a Radeon 32 AIW, and a modified Mandrake 8.0 powered by kernel 2.4.18.

Editor’s notice: Guillaume is not a native English speaker, so please excuse any grammar mistakes.


1. First approach to XFree86


My experience as a user is that this user interface is very bad speed-wise: sometimes menus or buttons are slow to react, and the refresh of the windows is a disaster (regardless of whether the apps are Qt or GTK+), as I can see too many ‘white areas’ because the toolkits “refresh” parts of the windows too often. Should I reboot under BeOS and relax away from these problems? No. As a software engineer, I wanted to see what I could do about it with ‘C code’, so I had to evaluate XFree86 before formatting my Linux partition. I did evaluate it. That was 15 months ago, around the time I started BlueEyedOS. The disk was never formatted. Linux was there to stay. Let’s see why.


2. From the coder’s point of view


Let’s start with the API and the general concept. XFree86 is mainly a server (the X server) that deals with gfx drivers and input drivers. When you want to use its functionality in your program, you use the Xlib library. That’s simple in theory. In practice it’s not as easy: the API is hard to memorize and the semantics are very abstract for people who only want to do simple things, e.g. draw a red line on the screen.


2.1. Abstraction


Everything seems ‘perfect’ when using Xlib because you don’t need to bother about colorspaces and conversions; the system handles all of that automatically. If the server is not on the machine that runs your program, you don’t have to worry about that either: all the information is sent through the network, transparently.


2.2. Modularity


With XFree86 4, everything is modular: the server loads and uses only the drivers it needs. If you need new functions, you can create a new extension. All this sounds good, but the dream stops here.


2.3. Performance issues


– All the communication is socket based (even if your machine runs both the client and the server).
– Useless conversions. Colors are defined with 3 fields (red, green, blue), 16 bits per component. If you need a ‘pure red’ color, you have to write:
XColor color;
color.red=65535;
color.green=0;
color.blue=0;
This implies 2 conversions. The first is done by the client, which converts 24-bit data to 3×16-bit data and sends it to the server. The second is done by the server, which converts it for my 32-bit display…
– Lack of transparency support.
You simply can’t use transparency without coding it yourself. The X Rendering Extension is supposed to solve this issue, but it’s still quite “unusable”.
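
The two colour conversions described above can be sketched in plain C. This is only an illustration, not what Xlib or the server actually executes: the function names are hypothetical, and the 0x00RRGGBB pixel layout is an assumption about a typical 32-bit display.

```c
#include <stdint.h>

/* Client side: expand an 8-bit channel (0..255) to the 16-bit range
 * (0..65535) used by X11 colour components. Multiplying by 257 maps
 * 0 to 0 and 255 to 65535. */
uint16_t expand8to16(uint8_t c)
{
    return (uint16_t)(c * 257u);
}

/* Server side (assumed): reduce a 16-bit channel back to 8 bits by
 * keeping its high byte. */
uint8_t reduce16to8(uint16_t c)
{
    return (uint8_t)(c >> 8);
}

/* Pack three 8-bit channels into one 32-bit pixel, assuming a
 * 0x00RRGGBB layout. */
uint32_t pack_xrgb8888(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}
```

For ‘pure red’, the client would compute expand8to16(255) = 65535 for the red field, and the server would end up with the pixel pack_xrgb8888(255, 0, 0), i.e. exactly the 24-bit value the program started with.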


3. Let’s use it as it should be used for high performance


To test a small ‘proof of concept’ I wrote months ago, you will need a 1024×768 32-bit display running under XFree86. You will see that you can smoothly move windows around your desktop, with very low CPU consumption and fast responsiveness. How did I do it? I minimized the impact of the design flaws, which means:
– to reduce the communication between the server and the client
– to use the memory of the gfx card
– to use a minimal set of X11 functions
– to redraw only what is needed
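
As an illustration of the last point, ‘redrawing only what is needed’ usually comes down to intersecting the damaged area with the area a window occupies and repainting only the overlap. The sketch below is a generic rectangle intersection, not code from the proof of concept; the Rect type and the function name are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper type: a rectangle given by its top-left corner
 * and its size, in pixels. */
typedef struct { int x, y, w, h; } Rect;

/* Store the overlap of a and b in *out and return true, or return
 * false (leaving *out untouched) when the rectangles do not overlap. */
bool rect_intersect(Rect a, Rect b, Rect *out)
{
    int x1 = a.x > b.x ? a.x : b.x;                          /* left   */
    int y1 = a.y > b.y ? a.y : b.y;                          /* top    */
    int x2 = a.x + a.w < b.x + b.w ? a.x + a.w : b.x + b.w;  /* right  */
    int y2 = a.y + a.h < b.y + b.h ? a.y + a.h : b.y + b.h;  /* bottom */

    if (x2 <= x1 || y2 <= y1)
        return false;          /* empty overlap: nothing to redraw */

    out->x = x1;
    out->y = y1;
    out->w = x2 - x1;
    out->h = y2 - y1;
    return true;
}
```

Only the resulting rectangle would then be blitted, instead of the whole window.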


Here, we have to define 3 ‘spaces’:
– the framebuffer (FB), which is the part of the gfx card memory that is drawn on the screen
– the memory of the gfx card (GRAM), used to store bitmaps
– the main memory (RAM) (in the client address space)
XFree86 provides an ‘object’ called “Pixmap”, which is a bitmap that can be stored in RAM or in GRAM. It provides a big acceleration when you need to draw bitmaps, but it needs to be used carefully because you don’t control the memory management in GRAM.


3.1. Let’s benchmark!


XFree86 provides a tool called ‘x11perf’. On my computer, the X server can:
– draw 18000 lines/s
– draw 30000 filled rectangles/s
– copy/blit 1250 (1500 using the SHM extension) 100×100 bitmaps/s from RAM to FB (or RAM to GRAM)
– copy/blit 24000 100×100 bitmaps/s from GRAM to FB (or GRAM to GRAM)


If we consider that something fast & smooth runs at about 25 frames/s, then to stay ‘fast’ we can only draw, per frame:
– 720 lines
– 1200 filled rectangles
– 50 (or 60) copies of 100×100 bitmaps
– 960 blits from GRAM to FB
Because a modern interface is not composed of filled rectangles, the only way to have something nice is to blit as much as possible. But it’s not so easy!
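
The per-frame figures above are simply the x11perf rates divided by the target frame rate. A minimal helper (the name is hypothetical) makes the arithmetic explicit:

```c
/* Number of operations that fit in one frame, given a measured rate in
 * operations per second and a target frame rate in frames per second. */
long per_frame_budget(long ops_per_second, long frames_per_second)
{
    return ops_per_second / frames_per_second;
}
```

At 25 frames/s, 18000 lines/s gives 720 lines per frame, 30000 filled rectangles/s gives 1200, 1250 (or 1500) RAM-to-FB copies/s gives 50 (or 60), and 24000 GRAM-to-FB blits/s gives 960.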


Let’s try another benchmark. The goal is to create a fade (from black to white) in 256 increments, and the fade is done twice. It means that for 25 fps, the test must end in about 20 s (512 frames at 25 fps take 20.48 s).
The first column shows the time it took when running 2 tests at the same time; the second column shows the result when only one test is executed.

The first test uses the XFillRectangle function to create the fade.
The second one draws the filled rectangles point by point…
The third one draws the filled rectangles line by line…
The fourth one creates the rectangles in RAM, transfers them to GRAM and blits from GRAM to FB.
The last one creates the rectangles in RAM and blits from RAM to FB.
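
For the last two tests, ‘creating the rectangle in RAM’ means filling a client-side buffer with the grey level of the current step before handing it to XPutImage. The original test program is not shown here, so the following is only a sketch of that step: the function names are hypothetical and the 0x00RRGGBB pixel layout is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Grey level for a given step of the test: 512 steps form two
 * consecutive black-to-white ramps of 256 levels each. */
uint8_t fade_level(unsigned step)
{
    return (uint8_t)(step % 256u);
}

/* Fill a width*height buffer of 32-bit pixels with one grey level,
 * assuming a 0x00RRGGBB pixel layout. */
void fill_grey(uint32_t *pixels, size_t width, size_t height, uint8_t level)
{
    uint32_t p = ((uint32_t)level << 16) | ((uint32_t)level << 8) | level;
    for (size_t i = 0; i < width * height; i++)
        pixels[i] = p;
}
```

At 1024×768 this loop writes 3 MB per step, and that whole buffer then has to travel from the client to the server, which is where these two tests spend their time.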


Left column: 2 tests in parallel (tests 1 and 2) | Right column: 1 test alone. Each test performs 512 operations.



----- 320×256 -----
Filling the window with XFillRectangle
1 : 0 s 120 ms | 0 s 78 ms
2 : 0 s 79 ms

Filling the window with XDrawPoint
1 : 7 s 329 ms | 3 s 724 ms
2 : 7 s 120 ms

Filling the window with XDrawLine
1 : 6 s 761 ms | 3 s 245 ms
2 : 5 s 427 ms

Filling the window with XPutImage+XCopyArea
1 : 9 s 814 ms | 6 s 283 ms
2 : 10 s 104 ms

Filling the window with XPutImage
1 : 10 s 485 ms | 6 s 284 ms
2 : 10 s 446 ms

----- Same in 1024×768 -----
Filling the window with XFillRectangle
1 : 0 s 365 ms | 0 s 128 ms
2 : 0 s 270 ms

Filling the window with XDrawPoint
1 : 57 s 35 ms | 32 s 3 ms
2 : 51 s 62 ms

Filling the window with XDrawLine
1 : 7 s 270 ms | 5 s 307 ms
2 : 6 s 138 ms

Filling the window with XPutImage+XCopyArea
1 : 80 s 77 ms | 56 s 28 ms
2 : 93 s 469 ms

Filling the window with XPutImage
1 : 53 s 223 ms | 37 s 416 ms
2 : 52 s 684 ms



Conclusion: at 320×256, my computer competes with my Amiga 500 (14 MHz + 1 MB of memory).


Now, with the fullscreen test, you can see that, XFillRectangle aside, only XDrawLine is efficient. No surprise: it uses less bandwidth than the transfer from RAM to GRAM. Should we conclude that XCopyArea and XPutImage are not good performers? Surely not, because in 80% of cases you work with the same bitmap (typically, the icon of your preferred app), so drawing it on the screen only takes a copy from GRAM to FB (XCopyArea).
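
To compare the timings with the 25 fps target, each result can be converted into a frame rate, since every test performs 512 full-window updates. The helper below is a hypothetical illustration of that conversion.

```c
/* Frames per second achieved when `operations` full-window updates
 * take `milliseconds` ms in total. */
double achieved_fps(unsigned operations, unsigned long milliseconds)
{
    return (double)operations * 1000.0 / (double)milliseconds;
}
```

With the single-test fullscreen figures, XDrawLine (5 s 307 ms) reaches about 96 fps, while XPutImage (37 s 416 ms) reaches only about 14 fps, well under the target.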


However, the figures in the first column don’t make me very optimistic: they really show the limits of the X server in a multithreaded environment. It needs improvements.

4. Interesting improvements


An interesting extension of XFree86 is the SHM one. It provides a new API (close to the XPutImage one) to transfer bitmaps by using shared memory between the X server and the client. The gain is about 20% on ‘my’ typical use case, not bad! The ‘new’ extension called ‘XRender’ lets applications perform complex blending and transparency operations. The functionality is interesting, but the performance is not there: even a small semi-transparent window (100×100) is slow. We need something faster and simpler.
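
To give an idea of the work behind a semi-transparent window, here is a minimal software ‘source over’ blend for one pixel. It is a generic sketch, not the XRender implementation; the function names are hypothetical and it assumes non-premultiplied ARGB source pixels drawn over an opaque destination.

```c
#include <stdint.h>

/* Blend one 8-bit channel:
 * result = (src*alpha + dst*(255 - alpha)) / 255, rounded to nearest. */
uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t alpha)
{
    return (uint8_t)((src * alpha + dst * (255 - alpha) + 127) / 255);
}

/* Blend a non-premultiplied 0xAARRGGBB source pixel over an opaque
 * destination pixel; the result is returned with full alpha. */
uint32_t blend_argb_over(uint32_t src, uint32_t dst)
{
    uint8_t a = (uint8_t)(src >> 24);
    uint8_t r = blend_channel((uint8_t)(src >> 16), (uint8_t)(dst >> 16), a);
    uint8_t g = blend_channel((uint8_t)(src >> 8),  (uint8_t)(dst >> 8),  a);
    uint8_t b = blend_channel((uint8_t)src,         (uint8_t)dst,         a);

    return 0xFF000000u | ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}
```

A 100×100 window needs 10,000 of these per redraw when done on the CPU, which is exactly the kind of work a graphics card can do far more cheaply.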


It’s a bit of a paradox: I played with hardware-accelerated OpenGL, and it seems that my card can blit 32-bit bitmaps with transparency as fast as XFree86 blits non-transparent bitmaps… What’s more, another gap I didn’t mention before seems to be filled on the 3D side: synchronization with the refresh rate of my screen. Where is the magical function “WaitForVerticalBlank()”? It could improve the rendering quality and the feeling of smooth scrolling. Mac OS X uses it, so why not XFree86?


I found many answers in the source code of my driver (and its DRI part). It looks like drivers could support acceleration for 32-bit data, but it’s unused or masked because the X server doesn’t use it. I’m interested to hear the reasons for this situation. IMHO the drivers MUST support transparency EVEN IF no extension uses it today.


5. What can be done to improve it


A lot of things can be improved. People always suggest abandoning XFree86; what I suggest is something more realistic (doable and efficient):


5.1. Simplify


The X11 functions are too complicated, with too much overhead and too many potential mistakes to be made. Developers need basic and efficient APIs. Some developers tried to solve this issue by creating wrappers, but a wrapper only adds limitations and decreases execution efficiency. Modern hardware supports 32-bit displays; all we need is a new extension that only works with a 32-bit local display and allows us to:
– create 32-bit windows with window_id id=CreateWindow(int width,int height); and human-readable functions like SetTitle(char*), SetSize(int width,int height), SetPosition(int x,int y), Show(), Hide()…
– create 32-bit bitmaps (in the native ARGB format of the card, always stored in shared memory) with bitmap_id id=CreateBitmap(int width,int height);
– explicitly define functions like
bool StoreBitmapInGFXRAM(bitmap_id);
bool RemoveBitmapFromGFXRAM(bitmap_id);
void BlitToWindow(bitmap_id src, window_id dest, …);
void BlitToBitmap(bitmap_id src, bitmap_id dest, …);
void WaitForVerticalBlank();


5.2. Make it faster


This extension should have fast communication between the server and the client, avoiding encoding/decoding and format conversion and using fast IPC such as shared memory (the 3D part of XFree86 uses DRI, which uses this kind of shortcut).


5.3. A typical use case: a window manager


Let’s change the subject a bit and talk about window managers (WMs). Most of us use a WM. A WM is a process (an X11 client, like a standard graphical app) that deals with XFree86 for window operations such as moving windows, drawing window borders, managing workspaces, etc. It’s the most used X11 application, and the one to blame for the slow refreshes we get on our desktops. Technically, when you move a window by dragging its title bar, the WM asks the X server to move the window, which generates ‘redraw events’ (called ‘ExposeEvents’) for all the windows behind the moving one. Later, all these windows redraw themselves by using X11 functions (which implies sending messages to the X server).


In practice, let’s start with an example: a screen with 10 windows.
1. I click on my preferred window and move it.
2. The first ‘step’ of the move will:
– ask the X server to move the window
– send an ExposeEvent to 5 windows, because 5 windows are partly covered; these 5 windows redraw the needed part.
A ‘standard’ part of a nice UI needs to send more than 100 drawing requests (line, rect, font…) to the X server. If my GnomeCalc is right, that’s about 500 requests at every ‘step’. If I want to have the smoothest desktop on earth on my 100 Hz screen, 50,000 requests must be swallowed per second… Good luck, it’s not going to happen! Remember, this was ‘just’ to move a window, and it already consumes 100% of my CPU.
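
The request load in this example is a simple product, which the hypothetical helper below spells out: the number of exposed windows, times the drawing requests each one sends, times the number of ‘steps’ per second.

```c
/* Drawing requests the X server must handle per second while a window
 * is being moved. */
unsigned long requests_per_second(unsigned exposed_windows,
                                  unsigned requests_per_window,
                                  unsigned steps_per_second)
{
    return (unsigned long)exposed_windows * requests_per_window
           * steps_per_second;
}
```

With 5 exposed windows, 100 requests each, and one step per refresh of a 100 Hz screen, this gives 5 × 100 × 100 = 50,000 requests per second.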


5.4. Make things faster, all the time…


If I look at my benchmarks and at what was described as ‘bad’ in XFree86, a good solution could be to use the memory of my GFX card: by sacrificing 8 MB, you can put a built-in window manager IN the X server (one that plays with bitmap blitting functions to give you an impressive result).
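
To get a feeling for what 8 MB of card memory can hold, the size of a 32-bit bitmap is width × height × 4 bytes. The helpers below are a hypothetical back-of-the-envelope sketch, not part of any proposed extension.

```c
#include <stddef.h>

/* Size in bytes of a bitmap with 32 bits (4 bytes) per pixel. */
size_t bitmap_bytes(size_t width, size_t height)
{
    return width * height * 4;
}

/* How many whole bitmaps of the given size fit in a memory budget. */
size_t bitmaps_that_fit(size_t budget_bytes, size_t width, size_t height)
{
    return budget_bytes / bitmap_bytes(width, height);
}
```

A full 1024×768 window takes 3 MB, so 8 MB holds two of them, or twenty-five 320×256 windows.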


5.5. Don’t re-invent the wheel


XFree86 already has big potential. Let’s improve it by creating the most efficient API. Who cares about slowing the Xlib functions down a bit if XFree86 can provide something that outperforms the old standard?


When working on the graphical rendering of B.E.OS, I never modify XFree86, but I use as much as possible of what is good in it. I hope that the result will seduce you and will push the development of a ‘better extension’.


6. It’s time to conclude


Yes, XFree86 is fast: just add the appropriate extension in order to use ‘shortcuts’ (as explained before) and you will have something that performs fast and that can be used with both the standard X11 API and the proposed one.


About the Author:
Guillaume is a software engineer who wrote his first line of code at 7; he is now 26… Amstrad, Amiga and an x86 PC under BeOS are his preferred digital environments. After working at Philips on the MHP technology, he created his own company. Even with his spare time drastically decreased, he continues to find the time to work on exciting projects, like B.E.OS.
