Keith Packard: X has been targeted at systems with high performance graphics processors for a long time. SGI was one of the first members of the MIT X consortium and shipped X11 on machines of that era (1988). Those machines looked a lot like today's PCs -- fast processors, faster graphics chips and a relatively slow interconnect. The streaming nature of the X protocol provides for easy optimizations that decouple graphics engine execution from protocol decoding.
And, as a window system X has done remarkably well; the open source nature of the project permitted some friendly competition during early X11 development that improved the performance of basic windowing operations (moving, resizing, creating, etc) so that they were more limited by the graphics processor and less by the CPU. As performance has shifted towards faster graphics processors, this has allowed the overall system performance to scale along with those.
Where X has not done nearly as well is in following the lead of application developers. When just getting pixels on the screen was a major endeavor, X offered a reasonable match for application expectations. But, with machine performance now permitting serious eye-candy, the window system has not expanded to link application requirements with graphics card capabilities. This has left X looking dated and shabby as applications either restrict themselves to the capabilities of the core protocol or work around these limitations by performing more and more rendering with the CPU in the application's address space.
Extending the core protocol with new rendering systems (like OpenGL and Render) allows applications to connect to the vast performance offered by the graphics card. The trick now will be to make them both pervasive (especially OpenGL) and hardware accelerated (or at least to optimize the software implementation).
Rayiner Hashem: Jim Gettys mentioned in one of your presentations that a major change from W to X was a switch from structured to immediate mode graphics. However, the recent push towards vector graphics seems to indicate a return of structured graphics systems. DisplayPDF and XAML, in particular, seem particularly well-suited to a structured API. Do you see the X protocol evolving (either directly or through extensions) to better support structured graphics?
Keith Packard: So far, immediate mode graphics seem to provide the performance and capabilities necessary for modern graphics. We've already been through a structured-vs-immediate graphics war in X when PHIGS lost out to OpenGL. That taught us all some important lessons and we'll have to see some compelling evidence to counter those painful scars. Immediate graphics are always going to be needed by applications which don't fit the structured model well, so the key is to make sure those are fast enough to avoid the need to introduce a huge new pile of mechanism just for a few applications which might run marginally faster.
Rayiner Hashem: What impact do the compositing abilities of the new X server have on memory usage? Are there any plans to implement a compression mechanism for idle window buffers to reduce the requirements?
Keith Packard: Oh, it's pretty harsh. Every top level window has its complete contents stored within the server while mapped, plus there are additional temporary buffers needed to double-buffer screen updates.
If memory does become an issue, there are several possible directions to explore:
+ Limit saved window contents to those within the screen boundary; this will avoid huge memory usage for unusually large windows.
+ Discard idle window buffers, reallocating them when needed and causing some display artifacts. Note that 'idle' doesn't just mean 'not being drawn to', as overlying translucent effects require saved window contents to repaint them, so the number of truly idle windows in the system may be too small to justify any effort here.
+ Turn off redirection when memory is tight. One of the advantages of building all of this mechanism on top of a window system which does provide for direct-to-screen window display is that we can automatically revert to that mode where necessary and keep running, albeit with limited eye-candy.
One thing I have noticed is a sudden interest in video cards with *lots* of memory. GL uses video memory mostly for simple things like textures for which it is feasible to use AGP memory. However, Composite is busy drawing to those off-screen areas, and it really won't work well to try and move those objects into AGP space. My current laptop used to have plenty of video memory (4meg), but now I'm constantly thrashing things in and out of that space trying to keep the display updated.
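To give a rough sense of the numbers involved, here is a minimal sketch of the memory arithmetic, assuming 4 bytes per pixel and one full buffer per top-level window. The window geometries, the screen size, and the helper names are all hypothetical and chosen only for illustration; the sketch ignores the extra temporary buffers used for double-buffering screen updates. It also shows the effect of the first option in the list above, limiting saved contents to the screen boundary.

```python
# Rough estimate of off-screen storage for redirected top-level windows.
# Assumptions (not from the interview): 4 bytes per pixel, one buffer per
# window, and the example geometries below.

def window_buffer_bytes(width, height, bytes_per_pixel=4):
    """Bytes needed to store the full contents of one window."""
    return width * height * bytes_per_pixel

def clipped_size(x, y, width, height, screen_w, screen_h):
    """Size of the part of a window that lies inside the screen."""
    left, top = max(x, 0), max(y, 0)
    right, bottom = min(x + width, screen_w), min(y + height, screen_h)
    return max(right - left, 0), max(bottom - top, 0)

screen_w, screen_h = 1024, 768
# (x, y, width, height); the last window is far larger than the screen.
windows = [(0, 0, 1024, 768), (100, 100, 800, 600), (-200, 50, 4000, 3000)]

full_total = sum(window_buffer_bytes(w, h) for _, _, w, h in windows)
clipped_total = sum(
    window_buffer_bytes(*clipped_size(x, y, w, h, screen_w, screen_h))
    for x, y, w, h in windows
)

print(f"full buffers:    {full_total / 2**20:.1f} MiB")
print(f"clipped buffers: {clipped_total / 2**20:.1f} MiB")
```

With these made-up windows, almost all of the memory goes to the single oversized window, which is why clipping to the screen boundary is the first direction listed.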
Preliminary Exposé-like functionality on the new X Server
Rayiner Hashem: What impact does the design of the new server have on performance? The new X server is different from Apple's implementation because the server still does all the drawing, while in Apple's system, the clients draw directly to the window buffers. Do you see this becoming a bottleneck, especially with complex vector graphics like those provided by Cairo? Could this actually be a performance advantage, allowing the X server to take advantage of hardware acceleration in places Apple's implementation can not?
Keith Packard: I don't think there's that much fundamental difference between X and the OS X window system. I'm pretty sure OS X rendering is hardware accelerated using a mechanism similar to the DRI. Without that, it would be really slow. Having the clients hit the hardware directly or having the X server do it for them doesn't change the fundamental performance properties of the system.
Where there is a difference is that X now uses an external compositing agent to bring the various elements of the screen together for presentation. This should provide for some very interesting possibilities in the future, but does involve another context switch for each screen update. This will introduce some additional latency, but the kernel folks keep making context switches faster, so the hope is that it'll be fast enough. It's really important to keep in mind that this architecture is purely experimental in many ways; it's a very simple system that offers tremendous potential. If we can make it work, we'll be a long way ahead of existing and planned systems in other environments.
Because screen updates are periodic and not driven directly by graphics operations, the overhead of compositing the screen is essentially fixed. Performance of the system perceived by applications should be largely unchanged by the introduction of the compositing agent. Latency between application action and the eventual presentation on the screen is the key, and making sure that all of the graphics operations necessary for that are as fast as possible seems like the best way to keep the system responsive.
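The point about fixed overhead can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Compositor` class, its damage set, and the `simulate` helper are hypothetical stand-ins, not the actual compositing manager or any X extension API. Clients issue as many drawing operations as they like, and the screen is recomposed at most once per periodic tick.

```python
# Toy model: drawing operations only record damage; composition happens
# once per periodic tick, however many operations arrived in between.
# All names here are hypothetical and exist only for illustration.

class Compositor:
    def __init__(self):
        self.damaged = set()       # windows changed since the last tick
        self.draw_ops = 0          # client drawing operations seen
        self.composite_passes = 0  # screen updates actually performed

    def draw(self, window_id):
        """A client draws into its off-screen window contents."""
        self.draw_ops += 1
        self.damaged.add(window_id)

    def tick(self):
        """Periodic screen update: recompose once if anything changed."""
        if not self.damaged:
            return []
        repainted = sorted(self.damaged)
        self.damaged.clear()
        self.composite_passes += 1
        return repainted

def simulate(frames, ops_per_frame, window_count=3):
    comp = Compositor()
    for _ in range(frames):
        for i in range(ops_per_frame):
            comp.draw(i % window_count)
        comp.tick()
    return comp.draw_ops, comp.composite_passes

light = simulate(frames=10, ops_per_frame=5)
heavy = simulate(frames=10, ops_per_frame=500)
print("light load (draw ops, composite passes):", light)
print("heavy load (draw ops, composite passes):", heavy)
```

In this model the number of composite passes depends only on the number of ticks, not on how much drawing the applications do, which is the sense in which the compositing overhead is fixed.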
Eugenia Loli-Queru: How does your implementation compare to Longhorn's new display system (based on the information available so far)?
Keith Packard: As far as I can tell, Longhorn steals its architecture from OS X: DRI-like rendering by applications (which Windows has had for quite some time) and built-in window compositing rules to construct the final image.
Rayiner Hashem: What impact will the new server have on toolkits? Will they have to change to better take advantage of the performance characteristics of the new design? In particular, should things like double-buffering be removed?
Keith Packard: There shouldn't be any changes required within toolkits, but the hope is that enabling synchronous screen updates will encourage toolkit and window manager developers to come up with some mechanism to cooperate so that the current opaque resize mess can be eliminated.
Double buffering is a harder problem. While it's true that window contents are buffered off-screen, those contents can be called upon at any time to reconstruct areas of the screen affected by window manipulation or overlaying translucency. This means that applications can't be assured that their window contents won't be displayed at any time. So, with the current naïve implementation, double buffering is still needed to avoid transient display of partially constructed window contents. Perhaps some mechanism for synchronizing updates across overlaying windows can avoid some of this extraneous data movement in the future.
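A minimal sketch of why application-level double buffering still matters, assuming the compositor may read a window's contents part-way through a redraw. The list-of-cells "buffers" and the `redraw` and `is_partial` helpers are hypothetical simplifications, not toolkit or server code.

```python
# Toy model: a window's contents are a list of cells. An observer (standing
# in for the compositor) reads the visible contents while a redraw is only
# half finished.

def redraw(target, new_value, observe, observe_after):
    """Overwrite target cell by cell, calling observe() part-way through."""
    for i in range(len(target)):
        target[i] = new_value
        if i + 1 == observe_after:
            observe()

def is_partial(cells):
    """True if the observed contents mix old and new cells."""
    return len(set(cells)) > 1

# Single-buffered: the application draws straight into the visible contents.
front = ["old"] * 8
seen = []
redraw(front, "new", lambda: seen.append(list(front)), observe_after=4)
single_partial = is_partial(seen[0])

# Double-buffered: draw into a private back buffer, then swap it in at once.
front = ["old"] * 8
back = list(front)
seen = []
redraw(back, "new", lambda: seen.append(list(front)), observe_after=4)
front = back  # the swap; only complete frames ever become visible
double_partial = is_partial(seen[0])

print("single-buffered observer saw a partial frame:", single_partial)
print("double-buffered observer saw a partial frame:", double_partial)
```

In the single-buffered case the observer sees a mix of old and new cells; in the double-buffered case it sees a complete old frame until the swap.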
Rayiner Hashem: How are hardware implementations of Render and Cairo progressing? Render, in particular, has been available for a very long time, yet most hardware has poor to no support for it. According to the benchmarks done by Carsten Haitzler (Raster) even NVIDIA's implementation is many times slower in the general case than a tuned software implementation. Do you think that existing APIs like OpenGL could form a foundation for making fast Render and Cairo implementations available more quickly?
Keith Packard: Cairo is just a graphics API and relies on an underlying graphics engine to perform the rendering operations. Back-ends for Render and GL have been written along with the built-in software fall-back. Right now, the GL back-end is many times faster than the Render one on existing X servers because of the lack of Render acceleration.
Getting better Render acceleration into drivers has been slowed by the lack of application demand for that functionality. With the introduction of cairo as a complete 2D graphics library based on Render, the hope is that application developers will start demanding better performance which should drive X server developers to get things mapped directly to the hardware for cases where GL isn't available or appropriate.
Similarly, while a Composite-based environment could be implemented strictly with core graphics, it becomes much more interesting when image composition can be used as a part of the screen presentation. This is already driving development of minimal Render acceleration within the X server project at Freedesktop.org; I expect we'll see the first servers with acceleration matching what the sample compositing manager uses available from CVS in the next couple of weeks.
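For readers unfamiliar with image composition, the basic operation used for translucency is the Porter-Duff "over" operator. The following is a per-pixel sketch assuming premultiplied colour components in the range 0.0 to 1.0; the function names are illustrative and this is plain Python arithmetic, not the Render API or the server's implementation.

```python
# Porter-Duff "over" on a single pixel with premultiplied alpha:
#   result = src + dst * (1 - src_alpha)
# Pixels are (r, g, b, a) tuples with components in 0.0..1.0.

def premultiply(r, g, b, a):
    """Convert a straight-alpha colour to premultiplied form."""
    return (r * a, g * a, b * a, a)

def over(src, dst):
    """Composite src over dst; both pixels are premultiplied."""
    inv = 1.0 - src[3]
    return tuple(s + d * inv for s, d in zip(src, dst))

half_red = premultiply(1.0, 0.0, 0.0, 0.5)  # 50% translucent red
blue = premultiply(0.0, 0.0, 1.0, 1.0)      # opaque blue

result = over(half_red, blue)
print(result)  # (0.5, 0.0, 0.5, 1.0)
```

A compositing manager applies this kind of blend across whole window images for every screen update, which is why doing it in hardware rather than pixel by pixel on the CPU matters so much.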
A faster software implementation of Render would also be good to see. The current code was written to complete the Render specification without a huge focus on performance. Speeding it up is mostly a matter of sitting down and figuring out which cases need acceleration and typing the appropriate code into the X server. However, Render was really designed for hardware acceleration; acceleration which should be able to outpace any software implementation by a wide margin.
In addition, there has been a bit of talk on the email@example.com mailing list about how to restructure the GL environment to make the X server rely upon GL acceleration capabilities rather than having its own acceleration code. For environments with efficient GL implementations, X-specific acceleration code is redundant. That discussion is very nebulous at this point, but it's certainly a promising direction for development.