Linked by Thom Holwerda on Sun 10th Sep 2006 18:00 UTC
BeOS & Derivatives Jerome 'Korli' Duval has adapted Haiku's MESA-based OpenGL subsystem to an addon format, allowing renderers to be plugged in, the first being a MESA software renderer. This system will allow hardware 3D rendering drivers, such as Rudolf's, once adapted, to plug in without requiring a specialised libGL.so for every card. This extends the common BeOS concept of modularity even further, and is somewhat similar to how Be's OpenGL beta worked: each graphics card gained a third driver, a .3da file, alongside its kernel and .accelerant drivers.
RE[4]: Lamda
by rayiner on Mon 11th Sep 2006 02:56 UTC in reply to "RE[3]: Lamda"
rayiner Member since:
2005-07-06

"Tight integration" is handwaving unless you care to be more specific.

I can be more specific.

Current GUIs have a few major conceptual pieces. There is the graphics engine (e.g. X or the Windows GDI), the window manager, the UI toolkit, and the application-side event handlers. The kernel-level scheduler is also a crucial component in the whole. Integration between these pieces, or more importantly synchronization between them, is essential to creating a performant UI.

Consider, for example, what happens when the user resizes a window. The resize operation is actually one of the most complex and performance-sensitive operations in the UI, especially in buffered UIs in which window-movement can happen without redrawing underlying windows. At each step of a resize, all the pieces must coordinate. The mouse-move event goes to the window manager, which calls into the graphics API to enlarge the window's drawing context. The window manager then redraws the window frame. Then, the graphics API informs the application that its content area has been resized. The toolkit handles this resize message, and redoes the window layout. In the process of doing window layout, it has to call into the user-level event handlers of each widget, to handle resize logic and redraw. At each redraw step, the graphics API gets called, and at each process switch, the kernel scheduler gets involved. All this has to happen in about 50-100ms, to ensure a smooth 10-20fps resize rate.
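
To make the chain concrete, here is a schematic of that single resize step in C. Every name here is hypothetical, just labeling which component acts at each stage; this is a sketch of the pipeline, not any real toolkit's API:

    typedef struct widget widget_t;
    struct widget {
        widget_t *next;
        void (*on_resize)(widget_t *, int w, int h); /* app-side handler */
        void (*draw)(widget_t *);                    /* app-side redraw */
    };
    typedef struct { void *ctx; widget_t *widgets; } window_t;

    void gfx_resize_context(void *ctx, int w, int h);   /* graphics engine */
    void wm_draw_frame(window_t *win);                  /* window manager */
    void toolkit_relayout(window_t *win, int w, int h); /* UI toolkit */

    void resize_step(window_t *win, int w, int h)
    {
        gfx_resize_context(win->ctx, w, h); /* 1. enlarge the drawing context */
        wm_draw_frame(win);                 /* 2. redraw the window frame */
        toolkit_relayout(win, w, h);        /* 3. redo the window layout */
        for (widget_t *wd = win->widgets; wd; wd = wd->next) {
            wd->on_resize(wd, w, h);        /* 4. per-widget resize logic */
            wd->draw(wd);                   /* 5. each redraw re-enters the engine */
        }
        /* All five stages must finish in ~50-100 ms for a 10-20 fps resize,
           and in a multi-process GUI each hop can involve the scheduler. */
    }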

The key thing to note here is that there is easily enough CPU time to meet a 50ms target. I worked on the XSYNC-based resize NETWM spec some years ago, and I found that even on my P4 laptop, Konqueror could perform layout of a moderately complex page in 20-25ms. The CPU time taken by the rest of the logic is trivial, a few ms at most. Yet, at the time (and probably even now on much faster machines), Konqueror could not be resized smoothly. Why? Synchronization.

Consider what happens when this process is carried out in an X11 GUI:

X reads /dev/mouse. It generates a mouse-move event for the X11 window that contains the window frame (the app's X11 window is nested inside the window frame's X11 window). Then it keeps computing, servicing requests from other applications, either until its input queues are empty or until it runs out of its timeslice (which can be tens of ms). Then the scheduler kicks in, and often runs one or more random processes, again for potentially tens of ms. Eventually, the window manager gets scheduled. It reads the mouse-move event out of its socket, sends window-resize requests for the X11 window containing the window frame as well as the app's X11 window, and either keeps servicing unrelated clients or blocks.

Now the kernel scheduler kicks in and runs some random process. Eventually, X gets scheduled again. It processes the window-resize request and generates a window-resize event for the app's window and the window manager's window. Again, it services unrelated clients until its queues are empty or its timeslice expires. The kernel scheduler kicks in, and eventually the window manager gets scheduled, sees the resize event, and redraws the window frame. It services unrelated clients until its queues are empty. The kernel scheduler kicks in again, and eventually the app gets scheduled. It sees the resize event, does the re-layout, and redraws the window contents. If the UI toolkit is stupid, like GTK+, it'll wait a while to buffer more requests before handling them as a batch.

We're not done yet. The kernel scheduler kicks in. Eventually X gets scheduled again. It sees the drawing requests in its queues and draws the new window frame and window contents on the screen. Now, hundreds of ms after the user moved his mouse, he sees one step of the resize.

Of course, things get even more entertaining. The above process is what GNOME/Metacity behaves like now. Before XSYNC-based resize, it was even worse. Because the window manager can redraw the window frame much, much faster than the app can redraw the window contents, more mouse-move events would come in as soon as the WM handled the resize, and the WM would start the next resize step before the app had finished the last one. X, because of its fair-share scheduling, would happily oblige, and the result was that the window frame would be resized and redrawn dozens of times for every time the window contents were resized.
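
For reference, this is roughly the client's side of the _NET_WM_SYNC_REQUEST handshake that fixes this. A minimal sketch: error handling and the property setup that advertises the counter to the WM are omitted.

    #include <X11/Xlib.h>
    #include <X11/extensions/sync.h>

    /* The WM sends a _NET_WM_SYNC_REQUEST ClientMessage carrying a serial
       before each resize step; the client echoes it into an XSync counter
       after finishing its redraw, and the WM waits for the counter to
       reach that value before starting the next step. */
    static XSyncCounter counter; /* advertised via _NET_WM_SYNC_REQUEST_COUNTER */
    static XSyncValue   pending; /* serial from the last sync request */

    void handle_sync_request(XClientMessageEvent *ev)
    {
        /* data.l[2] / data.l[3] hold the low/high halves of the serial */
        XSyncIntsToValue(&pending, ev->data.l[2], ev->data.l[3]);
    }

    void after_resize_redraw(Display *dpy)
    {
        /* "this frame is done" - the WM won't resize us again until
           the counter reaches this value */
        XSyncSetCounter(dpy, counter, pending);
        XFlush(dpy);
    }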

Note where the inefficiencies in the process lie. It's not in the drawing speed: X will draw millions of primitives for you before you can blink. It's not in the window management speed: X will move, circulate, resize, etc., tens of thousands of windows per second. It's not in the IPC: sockets will move hundreds of megabytes per second. It's not in the context switching: the actual context-switch time for the whole sequence above is maybe 50 usec on Linux! No, the inefficiency in the process is all the time spent doing things unrelated to the resize. All that time spent in X and the WM servicing apps that are not the app being resized. All that time spent with the kernel choosing apps that have nothing to do with the resize. Indeed, it's the fact that the Linux scheduler has gotten pretty good at minimizing the latter that has made resizing so much better on Linux, even though the window managers and X's window handling haven't gotten faster in maybe a decade.

So now we can see where integration and synchronization come into play. You can speed up the above process by several factors just by colocating the window manager in either the graphics server (how BeOS does it) or the application (how Windows does it). That whole series of X <-> WM transitions at the beginning just gets folded into intra-app function calls.
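
As a rough illustration of the Windows model (relayout_content() here is a hypothetical stand-in for the toolkit): frame handling and content relayout live in one window procedure, so a resize step is ordinary in-process calls rather than socket writes and scheduler decisions.

    #include <windows.h>

    static void relayout_content(HWND hwnd, int w, int h)
    {
        /* hypothetical: toolkit layout + redraw, no server round-trip */
    }

    LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
    {
        switch (msg) {
        case WM_SIZE:
            /* content area resized: relayout happens right here */
            relayout_content(hwnd, LOWORD(lp), HIWORD(lp));
            return 0;
        }
        /* DefWindowProc draws the frame - the "window manager" part is
           in-process too, so there is no X <-> WM ping-pong */
        return DefWindowProc(hwnd, msg, wp, lp);
    }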

I used resize as an example, but things like this happen in many other places. Consider what happens when a menu needs to be popped up, for example. Synchronization and integration come into play there too. The only way to achieve proper synchronization is by developing the GUI as a system, not as separate, unrelated parts. All display systems have certain performance characteristics. X's performance characteristics aren't that weird. Yet X toolkits absolutely suck at adapting to those characteristics. Hell, even Xlib sucks at adapting to them. Until recently, the kernel scheduler sucked at adapting to the characteristics of the whole X <-> WM <-> App system.

The fact that the BeOS UI toolkit was designed precisely for the app_server's performance profile, and the kernel scheduler precisely for the app_server/GUI-toolkit's, is partly why BeOS on a PII-300 was snappier than X11/GNOME is on a 2GHz Core Duo.

Edited 2006-09-11 03:03

Reply Parent Score: 5

RE[5]: Lamda
by Zenja on Mon 11th Sep 2006 04:35 in reply to "RE[4]: Lamda"
Zenja Member since:
2005-07-06

Very nice reply, rayiner. As always, it's nice to read well-informed posts from intelligent posters, unlike the teenage drivel we usually encounter on the net.

rayiner illustrates one of the best features of Haiku: the entire system is built by a single team with a common goal. This is unlike the free *nix world, where multiple teams work with different priorities and goals. In the end, the *nix solution is more modular and more widely usable, with great flexibility, but at the expense of efficiency.

Haiku promises to be a *fast* and efficient unified solution (a free OS). It will never host servers or run on a mobile phone (although there will be adventurous people willing to try), but it will fly on the desktop, and resize a window faster than I (the user) can notice.

Reply Parent Score: 2

RE[5]: Lamda
by Methe on Mon 11th Sep 2006 06:52 in reply to "RE[4]: Lamda"
Methe Member since:
2006-08-27

Great answer indeed. I think this should be kept somewhere, as one can be sure other kids will burst in some other day saying "X is faster" when that's not even the point.

Reply Parent Score: 0

RE[5]: Lamda
by Lambda on Mon 11th Sep 2006 08:22 in reply to "RE[4]: Lamda"
Lambda Member since:
2006-07-28

Nice writeup. There are a couple of problems, though. One is (as you mentioned) that the symptoms are particularly endemic to GTK+/Metacity. In addition to what you mentioned, there was (is) a bottleneck in the Pango library that has caused resizing issues. I'm not sure if that ever got cleared up; they were looking for a fastpath for Western languages, but I'm not sure it ever made it in. This has been less of an issue with KDE, and probably even less of an issue with Windows. The Windows GUI subsystem is, for obvious reasons, especially optimized in the 2D department.

I tracked down the "Optimizing XFree86" article that the BeOS developer wrote for OSNews around four years ago: http://www.osnews.com/story.php?news_id=1905&page=2. That's an interesting read. Just taking that as a starting point, four years of development could have put a fast BeOS-style GUI on top of X.

The other "problem" is that Moore's law is working in favor of a desktop like Gnome - mostly in the way of GPU transistors these days. Vista will of course be graphically fast (unless they completely blunder and go backwards), and linux desktops will take advantage of XGL (or whatever the final solution is for X). This doesn't solve all the issues regarding X toolkits and desktop managers, but glosses over some of them.

By the way, most of the GNOME resizing issues go away once you get to the 1GHz level, which is pretty pervasive for desktop systems these days.

Something you didn't mention, but which would work in favor of a heavily multithreaded architecture like BeOS, is the multicore CPUs that are coming out. The only problem is that heavy multithreading is notoriously difficult to develop and debug in a language like C++; Eugenia touched on this years ago when discussing some of the problems with BeOS. A language like Haskell or Clean would be a better choice, but there are social issues, as well as implementation issues, surrounding those.

Reply Parent Score: 1

RE[6]: Lamda
by Nutela on Mon 11th Sep 2006 16:05 in reply to "RE[5]: Lamda"
Nutela Member since:
2006-02-09

If you look past the whole resizing issue, you may want to look at drag and drop and a whole lot of other things that made BeOS very pleasant and fast (not fast as in CPU speed) to work with.

If Windows weren't such a big piece of bloated software, with all the viruses and such, it would appeal to me because it's integrated: everything looks the same (or similar) and follows the same conventions, etc.

With Linux and the rest you get many, many choices - maybe too many to filter through for somebody who just wants to write a quick email to his grandmother abroad.

And who wants to wait when he or she has a ton of ideas ready to be recorded by the computer?

I wish a lot of users (and devs!) would install BeOS to experience and use some of its great ideas and feel.

Reply Parent Score: 1

RE[6]: Lamda
by rayiner on Mon 11th Sep 2006 17:39 in reply to "RE[5]: Lamda"
rayiner Member since:
2005-07-06

I tracked down the "Optimizing XFree86" article that the BeOS developer wrote for OSNews around four years ago: http://www.osnews.com/story.php?news_id=1905&page=2. That's an interesting read. Just taking that as a starting point, four years of development could have put a fast BeOS-style GUI on top of X.

Yes, you probably could, but it would require coordination. X could stay largely the same, but the kernel scheduler four years ago was completely insufficient for the task. The current one is a lot better, and much of the reason for that is that it was designed with X as one of its use cases. And of course, the toolkit would have to be highly optimized for the performance profile of X. Read up on XCB vs. Xlib to see the kinds of things an X toolkit has to avoid doing.
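
A minimal sketch of the kind of thing meant here, using atom interning as the example: Xlib blocks for a full server round-trip on every call, while XCB lets a toolkit batch the requests and collect the replies afterwards.

    #include <stdlib.h>
    #include <string.h>
    #include <X11/Xlib.h>
    #include <xcb/xcb.h>
    #include <xcb/xproto.h>

    /* n full round-trips: each XInternAtom blocks until the server replies */
    void xlib_way(Display *dpy, const char *names[], Atom atoms[], int n)
    {
        for (int i = 0; i < n; i++)
            atoms[i] = XInternAtom(dpy, names[i], False);
    }

    /* ~1 round-trip: fire all the requests, then reap the replies */
    void xcb_way(xcb_connection_t *c, const char *names[],
                 xcb_atom_t atoms[], int n) /* assumes n <= 64 */
    {
        xcb_intern_atom_cookie_t cookies[64];
        for (int i = 0; i < n; i++)
            cookies[i] = xcb_intern_atom(c, 0, strlen(names[i]), names[i]);
        for (int i = 0; i < n; i++) {
            xcb_intern_atom_reply_t *r =
                xcb_intern_atom_reply(c, cookies[i], NULL);
            atoms[i] = r ? r->atom : XCB_ATOM_NONE;
            free(r);
        }
    }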

This doesn't solve all the issues regarding X toolkits and desktop managers, but glosses over some of them.

It doesn't really solve any of the issues, but papers over them quite nicely. Resizing with XGL will be slower and less efficient, because now the DRM has to get involved too in order to resize the window buffer. However, that won't matter, because the double-buffering will eliminate the visual artifacts that accompany poorly-synchronized resize. It won't be any faster, but it'll look much smoother.

By the way, most of the GNOME resizing issues go away once you get to the 1GHz level, which is pretty pervasive for desktop systems these days.

Even on my 2GHz Core Duo MacBook, the resizing issues are still there. The synchronization issues are still there.

Reply Parent Score: 1

RE[5]: Lamda
by axeld on Mon 11th Sep 2006 10:26 in reply to "RE[4]: Lamda"
axeld Member since:
2005-07-07

To rayiner: the BeOS kernel has no optimizations for the GUI whatsoever; it just uses a priority-based scheduler. Like Linux, it can and will schedule unrelated threads during window resize or other heavy GUI operations.

The only argument that holds is the window manager being part of the app_server - and that's an argument against the X server's performance and design. Windows and BeOS have obviously solved this problem in a better way.

Reply Parent Score: 4

RE[6]: Lamda
by rayiner on Mon 11th Sep 2006 17:33 in reply to "RE[5]: Lamda"
rayiner Member since:
2005-07-06

The BeOS kernel does have an optimization for the GUI. Window threads in BeOS run at a higher priority than regular threads. This is not as overtly hackish a solution as Windows's (where the foreground app gets a priority boost and a longer timeslice), and it's well-subsumed into the general scheduling mechanism, but there is still a B_DISPLAY_PRIORITY that is distinct from B_NORMAL_PRIORITY.

According to an old Be Newsletter, BeOS's probabilistic scheduler is much more likely to choose a thread with priority 15 (B_DISPLAY_PRIORITY) than a normal one with priority 10. That means that when the app_server sends a "redraw" or "resized" message to the application, it is highly probable that the next thread to run will be the window thread in question.
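
For the curious, the distinction is visible right in the Kernel Kit. A trivial sketch (the thread body here is just a stand-in for a real window loop):

    #include <OS.h>

    static int32 window_loop(void *data)
    {
        /* stand-in for a real window thread: it would read app_server
           messages from the window's port and dispatch them */
        return 0;
    }

    int main(void)
    {
        /* B_DISPLAY_PRIORITY (15) vs. B_NORMAL_PRIORITY (10): window
           threads get the higher value, so the scheduler favors them
           right after a redraw/resize message arrives */
        thread_id tid = spawn_thread(window_loop, "w>Demo",
                                     B_DISPLAY_PRIORITY, NULL);
        resume_thread(tid);

        status_t result;
        wait_for_thread(tid, &result);
        return 0;
    }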

Reply Parent Score: 1

RE[5]: Lamda
by renox on Mon 11th Sep 2006 13:19 in reply to "RE[4]: Lamda"
renox Member since:
2005-07-06

On the topic of communication & scheduling, I have often wondered if it wouldn't be possible to link the two. That is, instead of 'do system call to send message from X to Y - do system call to wait for answer - kernel schedules unrelated process - eventually schedules Y - reply, etc.', wouldn't it be interesting to have one system call that means 'send message from X to Y and schedule Y (if the quantum is not exhausted, of course)'?

If Y can do its work without blocking, this would allow very fast communication.
Has this been tried in any OS?

Reply Parent Score: 1

RE[6]: Lamda
by rayiner on Mon 11th Sep 2006 17:59 in reply to "RE[5]: Lamda"
rayiner Member since:
2005-07-06

On the topic of communication & scheduling, I have often wondered if it wouldn't be possible to link the two. That is, instead of 'do system call to send message from X to Y - do system call to wait for answer - kernel schedules unrelated process - eventually schedules Y - reply, etc.', wouldn't it be interesting to have one system call that means 'send message from X to Y and schedule Y (if the quantum is not exhausted, of course)'?

A number of microkernels (such as Amoeba, QNX, and L4) do something very similar. In these microkernels, IPC is synchronous and unbuffered. If Y is waiting for IPC, X can send an IPC, the kernel will copy the data directly from X to Y, Y then gets scheduled and creates a reply, and the kernel copies the result directly from Y to X. This happens in one system call from X's point of view. Then, at the scheduling level, you can apply mechanisms like timeslice donation (run Y for the remainder of X's timeslice), or priority boosting (boost Y's priority when it receives IPC), to ensure that Y gets scheduled immediately after X.

This mechanism works very well in practice.
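
Schematically, the combined send-and-schedule primitive looks something like this. This is pseudo-C with entirely hypothetical names, not any real kernel's API, though QNX's MsgSend() behaves roughly this way:

    typedef struct message message_t;
    typedef struct thread {
        int        state;
        int        remaining_quantum;
        message_t *reply_buffer;
    } thread_t;

    enum { BLOCKED_ON_REPLY = 1 };

    /* hypothetical kernel internals */
    void copy_to_thread(thread_t *dest, message_t *msg);
    void switch_to(thread_t *next, int quantum);

    /* One system call: copy the message, block the sender, and hand the
       rest of the sender's quantum straight to the receiver. */
    message_t *sys_ipc_call(thread_t *self, thread_t *dest, message_t *msg)
    {
        copy_to_thread(dest, msg);      /* unbuffered: sender -> receiver */
        self->state = BLOCKED_ON_REPLY; /* sender waits for the reply */

        /* timeslice donation: run the receiver now, on what is left of
           the sender's quantum, so no unrelated thread gets scheduled
           between request and reply */
        switch_to(dest, self->remaining_quantum);

        /* when we run again, the receiver has replied into our buffer */
        return self->reply_buffer;
    }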

Reply Parent Score: 1