Microsoft to Rebuild Windows Graphics System

Submitted by Dan Warne 2006-06-01 Windows 32 Comments

Microsoft is to overhaul Windows’ graphics driver model after realising that the one that will ship with Windows Vista – Windows Display Driver Model (WDDM) 1.0 – needs improvement in the way it shares GPU resources between programs and Windows itself.

32 Comments

2006-06-01 2:20 pm
b34r
OSX can do it.
XGL can do it.
Microsoft requires new hardware because current hardware cannot do it. Déjà vu.

2006-06-01 2:29 pm
halfmanhalfamazing
It’s more of the same.
All of your 3d desktops are belong to *nix.
Unless of course you shell out more money.
Edited 2006-06-01 14:30
2006-06-01 3:22 pm
rayiner
OS X and XGL don’t address the problem Microsoft is trying to address with WDDM 2.0. OS X (as of 10.4) doesn’t have every window render directly via the GPU. Instead, it draws into the window buffers using software, and then Quartz Compositor uses the GPU to draw the window buffers. XGL sends all drawing commands to the X server, which uses the GPU to perform the actual rendering.
In both of the above cases, there is actually only one process issuing OpenGL commands, so fine-grained sharing isn’t necessary. In Vista, potentially dozens of processes will be issuing OpenGL commands simultaneously. That is what fine-grained GPU sharing is designed to address.
No current OS implements the sort of GPU sharing Microsoft is talking about with WDDM 2.0. OS X might implement some sort of scheduling of command packets, and there is ongoing work in the DRI to implement such a thing as part of the memory manager work, but generally GPU sharing on current consumer platforms is done via the “wait for idle then context-switch” method, which is extremely coarse and fairly slow. The exception is perhaps NVIDIA’s GL hardware, which has some level of support for concurrency, though it is unclear how fine-grained it is. And of course, high-end hardware like SGI’s has had full GPU virtualization for years.

2006-06-01 6:49 pm
Earl Colby pottinger
> OS X and XGL don’t address the problem Microsoft is trying to address with WDDM 2.0.
> OS X (as of 10.4) doesn’t have every window render directly via the GPU.
> Instead, it draws into the window buffers using software, and then Quartz Compositor
> uses the GPU to draw the window buffers. XGL sends all drawing commands to the X server,
> which uses the GPU to perform the actual rendering.
If I understand you properly, why not have the GPU draw into the window buffers instead of the software, then use the GPU to also transfer from the buffers to the rendered windows? Seems that would speed things up, and since everything is being processed with GPU commands, controlling the ordering would be easier. Or did I miss something stupid?

2006-06-01 10:46 pm
rayiner
That’s exactly what Longhorn (and likely OS X 10.5) will do. However, that brings up the central problem that is being addressed in that article: how to handle sharing of the GPU.
In consumer software, the GPU is shared in a very heavyweight way. Basically, to switch from thread A’s OpenGL context to thread B’s OpenGL context, the driver idles the GPU (i.e., waits for all current command buffers to finish processing), then saves and restores a fairly large amount of register state. This is okay when the only OpenGL client is a game or 3D modeler, but breaks down when a dozen apps all want to render via the GPU.
At that point, you need to virtualize the GPU, like you do the CPU, and implement scheduling mechanisms so you can put some bounds on how long a context-switch might take or how long a graphics thread might have to wait before getting a chance to render. There are two ways to do this, analogous to how there are two ways to handle multiple apps using the sound card simultaneously. You can either mix the command streams in software at a high level (e.g., XGL takes multiple RENDER command streams and muxes them to a single GL command stream), or in the driver at a low level (e.g., hardware takes in multiple GL command streams, and switches between them using hardware support).
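To make the "mix the command streams in software" option concrete, here is a minimal sketch of a round-robin muxer. It is only an illustration: the client names, the command strings, and the `quantum` parameter are all made up for this example, and nothing here reflects how XGL or any real driver actually schedules work.

```python
from collections import deque


def mux_round_robin(streams, quantum=2):
    """Interleave per-client command lists into one stream.

    Each client gets at most `quantum` commands per turn, which bounds how
    long any other client waits before it is served again.
    """
    queues = deque((name, deque(cmds)) for name, cmds in streams.items())
    merged = []
    while queues:
        name, pending = queues.popleft()
        for _ in range(min(quantum, len(pending))):
            merged.append((name, pending.popleft()))
        if pending:
            # Client still has work: send it to the back of the line.
            queues.append((name, pending))
    return merged


# Hypothetical clients and commands, purely for illustration.
streams = {
    "terminal": ["clear", "glyphs", "present"],
    "browser": ["clear", "image", "glyphs", "blend", "present"],
}
merged = mux_round_robin(streams, quantum=2)
for client, command in merged:
    print(client, command)
```

Each client's commands stay in their original order; only the interleaving between clients changes. The low-level alternative would keep the streams separate and let the hardware switch between them.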
There is also a somewhat tangential reason to not use the GPU for drawing window buffers. Current GPUs don’t do anti-aliasing very well. 4x multisampling is about the highest you can go with good performance, and while that may be okay for a game, it looks pretty crappy for high-contrast vector images. A rather stark depiction of the difference in anti-aliasing quality can be seen here: http://www.beyond3d.com/articles/wildcatiii/index.php?page=page4.in…
Note that in the close-up on the second page, even the Wildcat with 16x AA is still only using 16 levels of gray to perform the anti-aliasing. In contrast, FreeType uses 256 levels of gray when anti-aliasing fonts.
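The gap between sample-based AA and 256-level coverage can be shown with a little arithmetic. The sketch below assumes evenly spaced coverage levels and a made-up pixel that is 37% covered; with N samples per pixel there are at most N + 1 distinct coverage values (0 through N samples hit), which is roughly where the "16 levels" figure for 16x AA comes from.

```python
def quantize(coverage, levels):
    """Round an exact coverage in [0, 1] to the nearest of `levels` evenly spaced values."""
    steps = levels - 1
    return round(coverage * steps) / steps


def max_error(levels):
    """Worst-case rounding error when coverage is snapped to `levels` values."""
    return 0.5 / (levels - 1)


exact = 0.37  # hypothetical pixel, 37% covered by a glyph edge
for name, levels in [("4x multisampling", 5), ("16x AA", 17), ("8-bit coverage", 256)]:
    shade = quantize(exact, levels)
    print(f"{name}: {levels} levels, shade {shade:.4f}, worst-case error {max_error(levels):.4f}")
```

On a high-contrast edge such as black text on white, that worst-case error is what shows up as visible stair-stepping.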
Microsoft has solved this particular issue in a very elegant way using vector textures (http://research.microsoft.com/~cloop/LoopBlinn05.pdf, http://alice.loria.fr/publications/papers/2005/VTM/vtm.pdf), but I’m not aware of any work on-going to implement something like this in OS X or XGL.

2006-06-02 1:14 am
jonsmirl
Check this paper out too. These guys provided source code for Linux.
http://staffwww.itn.liu.se/~stegu/GLSL-conics/
In the long run, antialiasing on the GPU is always going to win. To get it right it needs to be done as late in the drawing process as possible. It is completely parallel, perfect for shader programs.
Sooner or later someone is going to write an open, sub-pixel antialiased, GPU-based glyph generator. VTM was a start but no source and there are probably more efficient ways to do it. There is no reason that GPU generated glyphs won’t equal FreeType quality sooner or later.
2006-06-02 1:33 pm
Earl Colby pottinger
Thanks, the link to http://www.beyond3d.com/articles/wildcatiii/index.php?page=page4.in… was a real eye-opener. I used magnify to examine the gifs closely and can really see the difference, not to mention the mess the KYROII’s 4X FSAA made of the image.
For some reason I did not think of context switching of the GPU. I guess I am still thinking in 2D mode where you don’t need that much info to draw something on the screen. A true 3D screen needs a lot more info to describe the contents of a window and swapping it around must be a pain.
Again thanks for the reply.

2006-06-01 6:56 pm
KenJackson
> In both of the above cases, there is actually only one process issuing OpenGL commands, so fine-grained sharing isn’t necessary.
I see that as a big advantage of X. One of the things I find most appealing about UNIX/Linux/BSD and X is the total transparency of the network, even for graphical applications. I can run a graphical application on my PC at work while I’m at home and see it as if it was being run on my home PC. (Though of course network speed has an effect).
Further on down the page here, butters says the client/server method can get to command-level switching with a much simpler approach.

2006-06-01 7:15 pm
CPUGuy
Ever hear of Terminal Services?

2006-06-01 7:33 pm
KenJackson
Heard of it, seen it in use, haven’t worked with it. But I gather it is not as transparent or simple as X.
Also, if I’m not mistaken, that requires you to see the whole desktop of the remote machine. I can do that with TightVNC, but I don’t want to see the desktop. I just want to run one app.
2006-06-01 9:42 pm
CPUGuy
This will be allowed in the next TS release, as part of Longhorn Server.
Basically it looks like any other installed app on your computer but it is running on your TS server, and there is no extra app that you have to run for it.
2006-06-02 4:50 am
Bending Unit
If you would have tried it you would know that its performance is quite excellent and preferred to X.
2006-06-02 11:01 am
KenJackson
> If you would have tried it you would know that its performance is quite excellent and preferred to X.
That’s a very subjective call and it totally misses the point of my comment. I said:
> One of the things I find most appealing about UNIX/Linux/BSD and X is the total transparency of the network, even for graphical applications.
The phrase “total transparency” means that both the executable and X operate exactly the same regardless of where the client and server are. Graphic operations are always transmitted from client to server through a socket.
It’s that uniformity, consistency, and orthogonality that gives the advantage of flexibility across diverse objectives. It’s difficult to explain flexibility in a post like this, but I’m much more impressed with X’s flexibility than Windows’.
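The "everything goes through a socket" idea can be sketched in a few lines. This is not the X11 protocol: the JSON-lines wire format and the operation names (`fill_rect`, `draw_text`) are invented for the example. The point is only that the client code is identical whether the other end of the socket is on the same machine or across a network.

```python
import json
import socket


def send_ops(sock, ops):
    """Client side: serialize drawing operations, one JSON object per line."""
    for op in ops:
        sock.sendall((json.dumps(op) + "\n").encode("utf-8"))


def receive_ops(sock):
    """Server side: read until the client closes, then decode each line."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    text = b"".join(chunks).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


# A local socket pair stands in for the client/server connection here;
# a TCP connection to a remote display would use the same two functions.
client, server = socket.socketpair()
ops = [
    {"op": "fill_rect", "x": 0, "y": 0, "w": 64, "h": 16},
    {"op": "draw_text", "x": 4, "y": 12, "text": "hello"},
]
send_ops(client, ops)
client.close()
received = receive_ops(server)
server.close()
print(received)
```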
2006-06-02 1:37 pm
Beryllium
Remote Desktop/Terminal Services, at this point in time, are more comparable to VNC than to X itself. However, in that comparison, they totally whip VNC’s ass, IMO.
I do believe it currently has single-application support, but it’s an odd hack and not as nice as X11’s method. And I’ve only seen evidence of it in the Unix “rdesktop” client, not the MS client itself. Haven’t actually tried it.
2006-06-02 3:19 pm
KenJackson
> However, in that comparison, they totally whip VNC’s ass, IMO
Of course X and VNC are totally different things, so we are jumping topics a little here. But in what way is Remote Desktop better than VNC? (Having not used the former, I can’t compare.)
TightVNC is very good at only transmitting what it needs to, so it’s pretty fast. When I must view the desktop of my Windows machine, I usually use TightVNC (viewing from Linux) rather than switching the KVM. Windows itself is more irritating than the small delay.
I didn’t realize there was a Unix rdesktop client. Ah! I just searched and found that it is a sourceforge project. I may check that out.
2006-06-02 3:25 pm
Beryllium
When connecting from my FreeBSD machine to my Windows machine, RDesktop is unbelievably faster than any VNC viewer I’ve tried; undoubtedly this is because RDP has access to system hooks that VNC doesn’t know about or can’t utilize.
I can also redirect folders and audio streams through RDesktop. Supposedly even serial ports, but I’ve never seen a need for that. Anyway, it’s all encapsulated in the same service. Pretty neat, if you ask me. Which you did. 😉 (Accessing the fancy features using the Unix client requires reading the man page, but if you’re going Windows2Windows, it’s all built in to the client interface)
Note that Windows XP Home can’t do it (can’t act as an RDC server), but Windows XP Pro can.
2006-06-02 7:56 am
proforma
You can do this too using the latest terminal services with Vista/Server.

2006-06-01 11:17 pm
rayiner
Yes, it’s a nice advantage, but it has a downside. It also means that you’re limited by the network protocol in what operations you can do. This hasn’t been a problem classically, since the types of graphics people wanted to draw were well-defined, but that is no longer true.
For example, it is currently impossible, with RENDER, to handle vector graphics rasterization via the GPU in the same way Vista does it (using VTMs). In the VTM technique, a vector shape is drawn by stuffing its geometric information into a texture that is plastered over a large polygon. A pixel shader then uses that information to color the pixels of the polygon to appropriately reproduce the shape. The pixel shader can use a very high-quality anti-aliasing method based on coverage information, just like Cairo does in its software implementation. There is no way to perform this technique with RENDER, as it currently stands. The pixel shader needs to be run via the Xserver, but the application is the only one with the geometric information the pixel shader needs.
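As a rough illustration of what "a pixel shader colors the pixels from geometric information" means, here is a CPU-side sketch in Python. It is not the VTM algorithm. It loosely follows the implicit-curve idea from the Loop/Blinn paper linked earlier in the thread, where a quadratic curve is the zero set of f(u, v) = u^2 - v in texture space. Which side counts as filled, the uniform `pixels_per_unit` scale, and the function name are assumptions of this sketch.

```python
import math


def quadratic_coverage(u, v, pixels_per_unit):
    """Approximate coverage of one pixel from the implicit curve f(u, v) = u^2 - v.

    Assumes f < 0 is the filled side and that (u, v) space maps to the screen
    with a uniform scale of `pixels_per_unit` (both are assumptions here).
    """
    f = u * u - v
    grad_len = math.hypot(2.0 * u, -1.0)  # |grad f| = |(2u, -1)|, never zero
    signed_dist_px = f / grad_len * pixels_per_unit  # first-order distance to the curve
    # Map the distance to a coverage ramp one pixel wide, centred on the curve.
    return min(1.0, max(0.0, 0.5 - signed_dist_px))


# Walk across the curve near u = 0.5 in small steps of v.
for dv in (-0.01, -0.003, 0.0, 0.003, 0.01):
    print(dv, round(quadratic_coverage(0.5, 0.25 + dv, 100.0), 3))
```

The coverage value is continuous rather than snapped to a handful of sample counts, which is why this style of per-pixel evaluation can match software rasterizers on edge quality. On a GPU the same few lines would run per fragment, with the distance scale taken from screen-space derivatives instead of a fixed parameter.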
Now, the above is not an argument against network protocols (since OpenGL is indeed network-transparent), but rather an argument against the network protocols used within the X server. For a long time, X was held back by the fact that its protocol for graphics was insufficient, and unfortunately, it seems that its replacement, while better, mostly solves yesterday’s problems, not tomorrow’s. As it is, RENDER is used as a glorified alpha-blitter for font rendering (at least until recently, NVIDIA’s drivers didn’t even accelerate any other paths, and nobody really noticed), and its usefulness beyond that seems questionable to me.
EDIT: Actually, a good way to get an idea of the problem is to load this: http://www.cs.umu.se/~c99drn/opengl_freenix04.pdf, and look at Figure 11. Sit back a couple of feet and tell me that the bottom image isn’t a lot more pleasing than the top one. The paper says that the top image, with 4x AA, is “close to indistinguishable”. I don’t buy it at all.
Edited 2006-06-01 23:27

2006-06-02 1:29 am
jonsmirl
I agree that Render plus Cairo is a backward-looking design. But in their defense, the shader stuff was quite new when they were designing Cairo.
Cairo forces you to commit to a device resolution and transformation when the image is rendered. Shader programs can defer the tessellation until the GPU is painting the pixel. By deferring you can tessellate and anti-alias at the optimum resolution. I’d like to see them rethink the design in terms of shader programs.
2006-06-02 1:56 am
rayiner
Of course, the Blinn and VTM papers came out last year, and the earlier work in the field is probably something you’d miss unless you were looking for it. However, I remember discussions around the time about OpenGL versus RENDER, and one of the “pros” of the OpenGL side was that developers might find a use for pixel shaders for 2D. I guess it was a hypothetical argument at the time, but I think VTM is a very strong use-case showing the advantages of exposing OpenGL as the base API instead of RENDER. I also think that it’s an example that surfaced much earlier than anybody really expected.

2006-06-01 2:27 pm
halfmanhalfamazing
A few months ago, it was said that Windows Vista needed to be re-written. MS denied that.
Apparently it wasn’t the whole OS as was stated, but here it is.
It needs to be re-written.
2006-06-01 3:00 pm
jonsmirl
They have been talking about this for a year. For most users it is not as bad as it sounds: you would only get in trouble trying to run a game like FarCry in a window while still using your desktop. Most people would make the game full screen. People that run signal processing programs on their GPUs have a real problem.
This is definitely something that should be addressed in the long term but it requires changes to the GPU hardware.

2006-06-01 3:05 pm
jonsmirl
Also, this problem affects all platforms equally. So it is not a Vista thing.
2006-06-01 3:15 pm
CPUGuy
I, personally, have a problem alt-tabbing out of games in Vista: when I try to get back in, the game stops responding.

2006-06-01 3:17 pm
jonsmirl
Sounds like a bug to me. That should work.

2006-06-01 3:18 pm
TrevorB
At this point I’m just out of words. This is the most incredibly bungled software project I’ve ever seen.
Even Duke Nukem Fornever is suffering delays because they’re raising the bar. Vista just looks more and more like …well… XP.
2006-06-01 3:36 pm
Hakime
“In both of the above cases, there is actually only one process issuing OpenGL commands, so fine-grained sharing isn’t necessary. In Vista, potentially dozens of processes will be issuing OpenGL commands simultaneously. That is what fine-grained GPU sharing is designed to address.”
Vista does not use OpenGL. What are those OpenGL commands you are talking about? It uses DirectX for drawing with the GPU!

2006-06-01 3:47 pm
rayiner
Yes, you’re right. I should’ve said “D3D commands”.
To be entirely accurate, the command buffers don’t contain either OpenGL or D3D commands. Rather, their instruction format is determined by the architecture of the GPU.

2006-06-01 4:28 pm
happycamper
is this going to affect the current vista beta 2 in any way?

2006-06-02 1:47 am
n4cer
> is this going to affect the current vista beta 2 in any way?
No. The article is misleading in how it presents this as a change MS is just making after finishing WDDM 1.0. In fact, this has been public info since WinHEC 2003. WDDM on current hardware isn’t as efficient as it can be, in part because current hardware isn’t designed for immediate preemption. You have to wait for a batch of instructions to complete before you can start executing instructions for another task.
Vista will include an Advanced version of its driver model that GPU vendors can implement that allows for finer-grained scheduling: instruction-level preemption instead of batch-level. This cuts down on latency and makes the GPU more like a CPU in being able to use it as a shared resource. IIRC, there are also efficiency improvements for GPU virtual memory management under the Advanced model, and the hardware that uses it will also include new features that fully accelerate things like text rendering, which currently is a 2-stage process, the first stage performed in software and the second done in hardware using pixel shaders.
This is not something that took MS by surprise or has them doing unplanned development. It’s been publicly documented for 3 years now.
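The latency difference between batch-level and instruction-level preemption can be put into a toy model. The numbers below (batch length, per-instruction time) are invented, and the model ignores the cost of saving and restoring GPU state, so treat it as a sketch of the idea rather than a description of WDDM.

```python
def preemption_wait(batch_len, arrival_index, instr_time_us, granularity):
    """Time a newly arrived task waits before the GPU can switch to it.

    batch_len      -- instructions in the batch currently executing
    arrival_index  -- index of the instruction running when the request arrives
    instr_time_us  -- assumed time per instruction, in microseconds
    granularity    -- "batch" (finish the whole batch) or "instruction"
    """
    if not 0 <= arrival_index < batch_len:
        raise ValueError("arrival_index must fall inside the batch")
    if granularity == "batch":
        remaining = batch_len - arrival_index  # rest of the batch must drain
    elif granularity == "instruction":
        remaining = 1  # only the in-flight instruction must finish
    else:
        raise ValueError("granularity must be 'batch' or 'instruction'")
    return remaining * instr_time_us


# Hypothetical workload: a 10,000-instruction batch, request arrives early on.
for mode in ("batch", "instruction"):
    print(mode, preemption_wait(10_000, 100, 0.5, mode), "us")
```

With batch-level switching the wait grows with the size of whatever batch happens to be running; with instruction-level switching it stays bounded by a single instruction, which is the property that makes the GPU behave more like a shared CPU.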

2006-06-01 5:29 pm
butters
Coming from the free software camp or even the Mac world, the idea of a software company coming up with a new driver model and telling the hardware companies to go support it is just about the weirdest thing I’ve ever heard. Normally the hardware guys are 1-2 steps ahead and waiting for the software guys to make good use of their features… or refusing to disclose their driver interfaces altogether.
While I might be misunderstanding the concepts, it seems like the approach taken by XGL and OSX avoids most of the problems. The respective display servers receive requests from the various clients and can employ nifty scheduling at this level to optimize concurrent rendering (in a driver-agnostic manner). As I understand it, the Windows model has the clients firing commands directly at the driver, which must then work out the scheduling. I think the Windows model might be able to get finer-grain timeslices (forced preemption), but the client/server method can get to command-level switching with a much simpler approach.
2006-06-01 6:56 pm
Weeman
What the hell? WDDM 2.0 was announced back at WinHEC 04. That’s about as much news as DX10 being Vista only. It’s been known for ages, yet people act surprised.