Linked by Hadrien Grasland on Sun 29th May 2011 09:42 UTC
OSNews, Generic OSes
It's funny how trying to have a consistent system design makes you constantly jump from one area of the designed OS to another. I initially just tried to implement interrupt handling, and now I'm cleaning up the design of an RPC-based daemon model, which will be used to implement interrupt handlers, along with most other system services. Anyway, now that I get to something I'm personally satisfied with, I wanted to ask everyone who's interested to check that design and tell me if anything in it sounds like a bad idea to them in the short or long run. That's because this is a core part of this OS' design, and I'm really not interested in core design mistakes emerging in a few years if I can fix them now. Many thanks in advance.
Thread beginning with comment 475092
RE[8]: Comment by Kaj-de-Vos
by Kaj-de-Vos on Mon 30th May 2011 14:02 UTC in reply to "RE[7]: Comment by Kaj-de-Vos"
Kaj-de-Vos Member since:
2010-06-09

You're on the right track here: it's indeed a matter of how parameters are passed (the message). But you're framing most of your thought in traditional terms of code, with calls and parameters and many other details. Doing implementations in those terms has led to the idea that it is complex and costly. As I said in another post, this is not so if you do it right. To do that, you have to forget about all those things that are irrelevant.

Let's make this concrete. How would you implement a service that draws a line? You could draw up a plan including all sorts of functions, parameters, transfer methods, interface description languages and parsers for it, but that is all irrelevant. To draw a line, assuming the pen is set at a starting point, it suffices to specify this:

draw x y

You could call "draw" a function name, but that is irrelevant and assumes too much. It's just a command. x and y are the parameters. Not because they're inherently parameters, but because they're preceded by a command. This is our first self-descriptive feature. But we've already assumed it's a line in a 2D space. At least we haven't assumed it's either a screen or a plotter, but we could make it more general by specifying a 3D line like this:

draw x y z

I don't believe in higher physical dimensions, so we'll leave it at this. We've written it in a human message, so how do we encode this in an efficient machine message that wouldn't be out of place in the core of an operating system? A naive first attempt would say that we need numbers for each component. Both sides of the interface would need to agree on a command set, like syscalls. draw is our first command, and if we encode it in an integer, all parts can have the same encoding:

1
integer
integer

Now this is really not hard to parse, and the performance loss against a C function call is negligible. On the other hand, we haven't improved much on its flexibility yet, except that we are completely free to interpret this as a sync or async command. An important goal is to keep changing interfaces compatible, so we could do that by brute force by prefixing an interface version:

1
1
integer
integer
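A minimal sketch of the version-prefixed integer encoding above, in Python. The function names and the choice of little-endian 32-bit integers are assumptions made for illustration; the post does not fix a byte layout.

```python
import struct

DRAW = 1  # command number for "draw", as in the example above

def encode_draw(version, x, y):
    # Four little-endian 32-bit signed integers: version, command, x, y.
    return struct.pack("<4i", version, DRAW, x, y)

def decode(message):
    # Every field has the same encoding, so one unpack call is the whole parser.
    version, command, x, y = struct.unpack("<4i", message)
    if version != 1:
        raise ValueError("unsupported interface version")
    if command != DRAW:
        raise ValueError("unknown command")
    return ("draw", x, y)

message = encode_draw(1, 10, 20)
print(len(message), decode(message))
```

The receiver rejects any version it does not know, which is the brute-force compatibility scheme: a changed encoding means a new version number.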

This is trivial here, but not so in low level code languages such as C. You'd have to depend on symbol versioning, for example, making you dependent on certain toolchains. However, even better than a wholesale interface version is to make compatibility more granular by weaving it into the data. Let's see what happens on changes. Consider the case that you want to move coordinates to floating point to use subpixel precision in your graphics. This actually happened during the development of AtheOS. The abstract specification is still the same:

draw x y

But we would need to bump the interface version because the encoding changes:

2
1
float
float

This makes old clients incompatible with new services when they drop the old interface. We can avoid that by introducing a type system. So far, we have data of three types:

1: command
2: integer
3: float

Here's a typed version of the interface:

1
1 1
3 float
3 float

The parser in the interface becomes a little more complex, but it's still trivial, and very flexible. It's now easy to support the integer interface in the same interface version:

1
1 1
2 integer
2 integer
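The typed encoding can be sketched in Python as follows. The one-byte tag, the 32-bit payloads and all function names are assumptions for illustration; only the tag numbers (1 command, 2 integer, 3 float) come from the post.

```python
import struct

# Type tags taken from the post; the command number is the post's "draw".
COMMAND, INTEGER, FLOAT = 1, 2, 3
DRAW = 1

# Assumed layout: commands and integers are int32, floats are float32.
FORMATS = {COMMAND: "<i", INTEGER: "<i", FLOAT: "<f"}

def encode(version, values):
    """Pack a version header followed by (tag, value) pairs."""
    out = struct.pack("<i", version)
    for tag, value in values:
        out += struct.pack("<B", tag) + struct.pack(FORMATS[tag], value)
    return out

def decode(message):
    """Return (version, [(tag, value), ...]); every value occupies 5 bytes."""
    (version,) = struct.unpack_from("<i", message, 0)
    values = []
    for offset in range(4, len(message), 5):
        tag = message[offset]
        (value,) = struct.unpack_from(FORMATS[tag], message, offset + 1)
        values.append((tag, value))
    return version, values

def draw_target(values):
    """Accept integer or float coordinates for the same draw command."""
    (tag, command), *params = values
    if tag != COMMAND or command != DRAW:
        raise ValueError("expected a draw command")
    return tuple(float(value) for _, value in params)

new_client = encode(1, [(COMMAND, DRAW), (FLOAT, 1.5), (FLOAT, 2.25)])
old_client = encode(1, [(COMMAND, DRAW), (INTEGER, 3), (INTEGER, 4)])
print(draw_target(decode(new_client)[1]), draw_target(decode(old_client)[1]))
```

Both clients use interface version 1; the service tells them apart by the tag on each value rather than by a version bump.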

We're venturing into terrain that low level languages without proper polymorphism can't really support. We can still count the numbers we use on the fingers of one hand, and we already have a powerful type system independent of any implementation language. We're starting to feel very powerful, and confident to look far into the future. We will add types when we need them, but what happens when we introduce new types that old interfaces don't know about? We can keep some new interfaces usable by old clients if they can at least parse the encoding, and skip data they don't understand, or pass it along to interfaces that do understand. When AtheOS switched completely to floating point graphics coordinates, old programs just kept working and were then running in a more advanced environment that they knew nothing about. To keep new types parsable by old interfaces, the encoding needs to include their sizes. We can do this only for new types to optimise the encoding. REBOL has almost sixty data types, so it's fairly safe to reserve a space for a hundred standard types. Let's say a mathematician has a weird virtual coordinate space in which he wants to draw:

1
1 1
101 size coordinate
101 size coordinate
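A sketch of how a size field lets an older parser step over types it has never seen. The layout is assumed for illustration: a one-byte tag, a 16-bit size for tags above 100, and raw bytes as the custom payload. The version header is left out to keep the example short.

```python
import struct

COMMAND, INTEGER, FLOAT = 1, 2, 3
STANDARD_LIMIT = 100  # tags up to 100 are reserved for standard types

def encode_value(tag, payload):
    """payload is a number for standard types, raw bytes for custom ones."""
    if tag in (COMMAND, INTEGER):
        return struct.pack("<Bi", tag, payload)
    if tag == FLOAT:
        return struct.pack("<Bf", tag, payload)
    if tag > STANDARD_LIMIT:
        # Only new types pay for the extra size field.
        return struct.pack("<BH", tag, len(payload)) + payload
    raise ValueError("unsupported standard type")

def decode_known(message):
    """Parse what an old interface understands; count what it skips."""
    offset, known, skipped = 0, [], 0
    while offset < len(message):
        tag = message[offset]
        if tag in (COMMAND, INTEGER):
            (value,) = struct.unpack_from("<i", message, offset + 1)
            known.append((tag, value))
            offset += 5
        elif tag == FLOAT:
            (value,) = struct.unpack_from("<f", message, offset + 1)
            known.append((tag, value))
            offset += 5
        elif tag > STANDARD_LIMIT:
            (size,) = struct.unpack_from("<H", message, offset + 1)
            offset += 3 + size  # jump over tag, size field and payload
            skipped += 1
        else:
            raise ValueError("unknown standard type without a size")
    return known, skipped

message = (encode_value(COMMAND, 1)
           + encode_value(101, b"weird-x")
           + encode_value(101, b"weird-y!")
           + encode_value(INTEGER, 7))
print(decode_known(message))
```

The old parser never interprets the custom coordinates; it only needs their length to find the next value.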

So far we have disregarded the starting coordinate for the line. Let's introduce a command to set it:

set x y

1
1 2
3 float
3 float

Now we can draw a line starting anywhere:

set x y
draw p q

1
1 2
3 float
3 float
1 1
3 float
3 float
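A sketch of a service consuming the batched message above in a single pass. The byte layout and names are assumptions for illustration, and so is the choice to move the pen to the end point after each draw; the post does not specify that.

```python
import struct

COMMAND, INTEGER, FLOAT = 1, 2, 3
DRAW, SET = 1, 2

def encode(version, values):
    """Version header, then 5-byte values: 1-byte tag plus 32-bit payload."""
    out = struct.pack("<i", version)
    for tag, value in values:
        out += struct.pack("<Bf" if tag == FLOAT else "<Bi", tag, value)
    return out

def decode(message):
    (version,) = struct.unpack_from("<i", message, 0)
    values = []
    for offset in range(4, len(message), 5):
        tag = message[offset]
        fmt = "<f" if tag == FLOAT else "<i"
        (value,) = struct.unpack_from(fmt, message, offset + 1)
        values.append((tag, value))
    return version, values

def run(message):
    """Execute every command in one message and return the lines drawn."""
    _, values = decode(message)
    pen, lines = (0.0, 0.0), []
    for i in range(0, len(values), 3):
        (tag, command), (_, x), (_, y) = values[i:i + 3]
        if tag != COMMAND:
            raise ValueError("expected a command")
        target = (float(x), float(y))
        if command == SET:
            pen = target
        elif command == DRAW:
            lines.append((pen, target))
            pen = target  # assumption: the pen ends up where the line ends
        else:
            raise ValueError("unknown command")
    return lines

batch = encode(1, [(COMMAND, SET), (FLOAT, 1.0), (FLOAT, 2.0),
                   (COMMAND, DRAW), (FLOAT, 3.0), (FLOAT, 4.0)])
print(run(batch))
```

Both commands travel in one message, so the transport is crossed once however many commands the client queues up.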

Note that in RPC, this would be two calls, with the associated overhead, so we're actually becoming more efficient here. But wait, we wanted to support 3D, so we now have to solve the problem of variable length parameter lists. We can write it like this:

set [x y]
draw [p q]

And we will have to encode the number of parameters somehow. To keep the format a generic stream of values, we could associate it with the command encoding:

1
1 2 2
3 float
3 float
1 1 2
3 float
3 float

set [x y z]
draw [p q r]

1
1 2 3
3 float
3 float
3 float
1 1 3
3 float
3 float
3 float
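A sketch of the variant where the parameter count travels with the command. The field widths (one-byte tag, 32-bit command number, one-byte count) and the function names are assumptions for illustration.

```python
import struct

COMMAND, FLOAT = 1, 3
DRAW, SET = 1, 2

def encode_command(command, coords):
    # Command value: tag, command number, parameter count; then tagged floats.
    out = struct.pack("<BiB", COMMAND, command, len(coords))
    for coord in coords:
        out += struct.pack("<Bf", FLOAT, coord)
    return out

def decode_commands(message):
    """Return [(command, [coords...]), ...] for any number of dimensions."""
    offset, commands = 0, []
    while offset < len(message):
        tag, command, count = struct.unpack_from("<BiB", message, offset)
        if tag != COMMAND:
            raise ValueError("expected a command")
        offset += 6
        coords = []
        for _ in range(count):
            param_tag, value = struct.unpack_from("<Bf", message, offset)
            if param_tag != FLOAT:
                raise ValueError("expected a float parameter")
            coords.append(value)
            offset += 5
        commands.append((command, coords))
    return commands

message_3d = encode_command(SET, [1.0, 2.0, 3.0]) + encode_command(DRAW, [4.0, 5.0, 6.0])
print(decode_commands(message_3d))
```

The same parser handles the 2D and 3D forms, because the count tells it how many values belong to each command.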

Alternatively, we could introduce a list type and pretend that a command has one parameter, the list:

1
1 2
4 2
3 float
3 float
1 1
4 2
3 float
3 float

Note that this is an alternative encoding for the same specification:

set [x y]
draw [p q]
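The list-type alternative can be sketched with a small recursive encoder and decoder. Tag 4 for lists comes from the example above; the one-byte item count and the function names are assumptions for illustration.

```python
import struct

COMMAND, INTEGER, FLOAT, LIST = 1, 2, 3, 4

def encode_value(tag, value):
    if tag in (COMMAND, INTEGER):
        return struct.pack("<Bi", tag, value)
    if tag == FLOAT:
        return struct.pack("<Bf", tag, value)
    if tag == LIST:
        # value is a list of (tag, value) pairs, encoded after the item count.
        body = b"".join(encode_value(t, v) for t, v in value)
        return struct.pack("<BB", LIST, len(value)) + body
    raise ValueError("unknown type")

def decode_value(message, offset):
    """Decode one value at offset; return ((tag, value), next_offset)."""
    tag = message[offset]
    if tag in (COMMAND, INTEGER):
        (value,) = struct.unpack_from("<i", message, offset + 1)
        return (tag, value), offset + 5
    if tag == FLOAT:
        (value,) = struct.unpack_from("<f", message, offset + 1)
        return (tag, value), offset + 5
    if tag == LIST:
        count = message[offset + 1]
        offset += 2
        items = []
        for _ in range(count):
            item, offset = decode_value(message, offset)
            items.append(item)
        return (LIST, items), offset
    raise ValueError("unknown type")

# "set [x y]": a command followed by a single list parameter.
message = encode_value(COMMAND, 2) + encode_value(LIST, [(FLOAT, 1.0), (FLOAT, 2.0)])
command, offset = decode_value(message, 0)
params, offset = decode_value(message, offset)
print(command, params)
```

Because lists are ordinary values, they can nest, and a command keeps a fixed arity of one.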

Does that look prohibitively complicated?

Score: 2

RE[9]: Comment by Kaj-de-Vos
by Kaj-de-Vos on Mon 30th May 2011 14:17 in reply to "RE[8]: Comment by Kaj-de-Vos"
Kaj-de-Vos Member since:
2010-06-09

Sorry, I've already made that last example too complex. It's very easy to fall into that trap. Because we defined a command type, the data stream is self-synchronising: if an interface has consumed all the parameters it understands, it can simply skip forward to the next command. So there is strictly no need to define a parameter number or list in this example. Still, they're useful constructs to solve other issues.
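A sketch of the self-synchronising behaviour: a service that only understands two coordinates uses what it knows and skips ahead to the next command tag. The byte layout and names are assumptions for illustration.

```python
import struct

COMMAND, INTEGER, FLOAT = 1, 2, 3
DRAW, SET = 1, 2

def encode(values):
    """5-byte values: 1-byte tag plus 32-bit payload, no counts or lists."""
    return b"".join(
        struct.pack("<Bf" if tag == FLOAT else "<Bi", tag, value)
        for tag, value in values
    )

def decode(message):
    values = []
    for offset in range(0, len(message), 5):
        tag = message[offset]
        fmt = "<f" if tag == FLOAT else "<i"
        (value,) = struct.unpack_from(fmt, message, offset + 1)
        values.append((tag, value))
    return values

def run_2d(values):
    """A 2D-only service: keeps the first two parameters, skips the rest."""
    commands, i = [], 0
    while i < len(values):
        tag, command = values[i]
        if tag != COMMAND:
            raise ValueError("expected a command")
        i += 1
        params = []
        # Everything up to the next command tag belongs to this command.
        while i < len(values) and values[i][0] != COMMAND:
            if len(params) < 2:
                params.append(values[i][1])
            i += 1
        commands.append((command, params))
    return commands

message_3d = encode([(COMMAND, SET), (FLOAT, 1.0), (FLOAT, 2.0), (FLOAT, 3.0),
                     (COMMAND, DRAW), (FLOAT, 4.0), (FLOAT, 5.0), (FLOAT, 6.0)])
print(run_2d(decode(message_3d)))
```

The command tag acts as the delimiter, so no parameter count or list wrapper is needed for the old service to stay in step with a 3D client.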

Score: 1

RE[9]: Comment by Kaj-de-Vos
by Neolander on Mon 30th May 2011 14:57 in reply to "RE[8]: Comment by Kaj-de-Vos"
Neolander Member since:
2010-03-08

Thanks a lot, this makes it much easier to understand the concepts which you're invoking.

Some points remain obscure, though...

1/ How does the type system help the switch from integer to float in the drawing system?
2/ More generally, is function overloading dealt with by the parser, or by the daemon doing the drawing work?
3/ Biggest issue I have: how is this so different from the kind of RPC which I advocate? I mean, all in all it still looks a lot like a non-blocking RPC interface implemented on top of a messaging protocol. Even sending batches of RPC requests could be done in a "pure" RPC model, given an extra layer of abstraction that allows sending whole arrays of "RPC call" objects to the RPC subsystem.

Also...

"Because we defined a command type, the data stream is self-synchronising: if an interface has consumed all the parameters it understands, it can simply skip forward to the next command."

I fail to see how letting a client process send requests with an incorrect set of parameters could be a good idea.

Edited 2011-05-30 15:02 UTC

Score: 1

RE[10]: Comment by Kaj-de-Vos
by Kaj-de-Vos on Mon 30th May 2011 15:20 in reply to "RE[9]: Comment by Kaj-de-Vos"
Kaj-de-Vos Member since:
2010-06-09

A type system is needed if you want to support polymorphism in an interface. How else would you know what type an item is and what size it has in the encoding? With types, it's trivial for a drawing server to support both integer and floating point coordinates.

Skipping of unknown parameters and commands is useful to enable old interfaces to use some new ones. This is what web browsers do with HTML. If your browser doesn't support gradients, you'll get graphics without gradients. If you deem any interface upgrade to be incompatible, you just bump the wholesale interface version.

There are no functions, so there is no function overloading. You really have to let go of such terms. :-)

Regarding which side does what: each service has to implement a little binary parser to interpret messages sent to it.

This is really quite different from RPC, but from other posts I understand that you are confusing the concept of RPC. You're also conflating the semantic payload with the transport mechanism. I'm only concerned with the payload here. You're basically free to choose a complementary transport method.

Score: 1

RE[10]: Comment by Kaj-de-Vos
by xiaokj on Mon 30th May 2011 17:27 in reply to "RE[9]: Comment by Kaj-de-Vos"
xiaokj Member since:
2005-06-30

Let me help here too!

First of all, let's deal with the earlier question. You asked something along the lines of "why bother with this when we can just design the RPC sensibly in the first place?" Well, the answer is that this *is* the sensible way out. It is inevitable that you will need to incorporate some fundamental changes somewhere down the road, so why not do it properly in the first place? Also, you can simply make an optimising parser -- given that it would not change so often, the parser can run slow (this can be something like mkinitrd). If the filesystem supports notification, then that can be used to auto-invoke the parser per alteration. This ensures that we do not actually take much of a performance hit.

Now, for the specific questions,
1) There is no type system! Okay, it does look like one, but it actually is just regular data written in a specific way. The great thing is that it can be parsed by a simple program and the outcome can instantaneously migrate the system from integer to floating point calculations.

2) This depends on the choice of the implementer. If infractions are known to be rare, then it should be sensible to make a compromise -- the standard case is done by the drawing primitive, and the edge cases can be done by an external parser generated by the optimising parser of the data spec sheet. This ensures performance with no problems in compatibility.

3) This interface is a lot more flexible! Different OSes can just pass around the spec sheet and everybody can interoperate without difficulty (even understand each other's binary blobs; bad idea, I know, but still). Changes can be made at whim and most importantly, you are no longer hard-wiring code; you are able to just modify plain old data, which is a lot more malleable than code, surely!

Okay. Now to the last part. Maybe processes will have it less, but programs, in general, should not obnoxiously assume that they are free to mangle whatever they have been given. If there are parts they do not understand, barfing may actually destroy the critical debugging information. A program that keeps silent about the unknown (barfing only upon stuff it knows is bad) is actually desirable: it is capable of being combined with others!

Take the Troff package for example: The datastream actually includes more than just roff data, it includes eqn and tbl for example. When eqn does not understand the tbl input it receives, it just keeps quiet, knowing that something down the chain will understand it. Of course, it does barf when it is given nonsense in its own area of expertise.

Also, the example given above is only one part of the entire philosophy here. The ascii program's example is one of the more amazing ones I have seen: Instead of generating the entire program's output from scratch, the original author had realised that the whole table, precomputed, is actually better to work with.

Neolander, please try to read the Art of Unix Programming before we can actually continue with the discussion. There is a lot from there permeating this discussion.

Score: 2

RE[9]: Comment by Kaj-de-Vos
by Alfman on Mon 30th May 2011 22:17 in reply to "RE[8]: Comment by Kaj-de-Vos"
Alfman Member since:
2011-01-28

Kaj-de-Vos,

"I have been talking about the problem that RPC implies an inflexible semantic data exchange (the payload)."

I've found it frustrating that your posts are so vague. I'm really out of ideas as to what problems you have with "RPC". Your claims may be valid against the least common denominator forms of function prototypes, but there are plenty of counter examples which you've been ignoring.

"Let's make this concrete. How would you implement a service that draws a line?"

Well, your example evolves from just drawing a line to doing more stuff. But the implication that RPC cannot handle "more stuff" is not accurate.

You're assuming a least common denominator approach again, but many modern languages support functions which are extensible. It's not fair to put them all beside C and label all RPC as inadequate.


"You could draw up a plan including all sorts of functions, parameters, transfer methods, interface description languages and parsers for it, but that is all irrelevant. To draw a line, assuming the pen is set at a starting point, it suffices to specify this:"

You're essentially coming up with the foundations of a vector graphics format. You could make it arbitrarily complex. You could support Windows 3.0 metafiles or VML or SVG (all vector graphics formats).

JavaScript can easily accommodate your example by using JSON arrays and hashes. Web services can be used to connect separate components via HTTP/JSON directly to native types on many platforms including Perl/PHP/.NET/Python.
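As a rough illustration of the JSON approach, here is the set/draw example as arrays and hashes, round-tripped through Python's standard json module. The key names "command" and "args" are hypothetical, chosen only for this sketch.

```python
import json

# One request carrying two commands; the second has a third coordinate.
request = [
    {"command": "set", "args": [1.0, 2.0]},
    {"command": "draw", "args": [3.0, 4.0, 5.0]},
]

wire = json.dumps(request)   # text that could travel over HTTP
decoded = json.loads(wire)   # native lists and dicts on the receiving side

for entry in decoded:
    print(entry["command"], entry["args"])
```

Variable-length argument lists and batching fall out of the data model here, at the cost of a text encoding rather than a compact binary one.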


I think you're assuming that all RPC is limited to transferring only simple types as parameters, but this isn't the case. Today many languages make it possible to call remote procedures with deep object hierarchies.

I can understand why you'd dislike simple function prototypes as in C (which may be what Neolander has in mind), but I don't think your claims hold up against "RPC" in general.

Score: 2