Linked by Hadrien Grasland on Sun 29th May 2011 09:42 UTC
OSNews, Generic OSes
It's funny how trying to have a consistent system design makes you constantly jump from one area of the designed OS to another. I initially just tried to implement interrupt handling, and now I'm cleaning up the design of an RPC-based daemon model, which will be used to implement interrupt handlers, along with most other system services. Anyway, now that I get to something I'm personally satisfied with, I wanted to ask everyone who's interested to check that design and tell me if anything in it sounds like a bad idea to them in the short or long run. That's because this is a core part of this OS' design, and I'm really not interested in core design mistakes emerging in a few years if I can fix them now. Many thanks in advance.
Thread beginning with comment 475033
RE[7]: Comment by Kaj-de-Vos
by Neolander on Mon 30th May 2011 05:11 UTC in reply to "RE[6]: Comment by Kaj-de-Vos"
Neolander Member since:
2010-03-08

Oh, alright, now I see better where this is going.

It would be a bit like using objects for making calls (yeah, yeah, I know, implementation details and stuff).

A malloc implementation could be described like...

//This is written in the PidginObject language
service Malloc [
....option AllocatedSize
]

And for the caller, it'd be like...

mymalloc = setup_service(Malloc)
mymalloc.setproperty(AllocatedSize = <whatever>)
call_service(mymalloc)

...or if we're a messaging freak...

send_message(daemon, "run_service Malloc with option AllocatedSize = <whatever>, option SaveWilly = no, etc...")

Actually, I plan to use something like that for UI description.

It has at least the following benefits :

-> You can indeed use language-agnostic headers (like XML or CSS). More precisely, you'd use headers written in your own language.
-> The order in which you put function parameters doesn't matter. That means that you can change one default parameter without redefining all the others "before" it, since there isn't such a concept.
-> You can use a consistent data description language for services and UIs, settings, etc...
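
To make the second point concrete, here is a minimal Python sketch of options matched by name, with defaults. The names `SERVICE_DEFAULTS` and `build_request` are invented for illustration and are not part of any design discussed in this thread.

```python
# Hypothetical service description: option names mapped to default values.
SERVICE_DEFAULTS = {
    "Malloc": {"AllocatedSize": 0, "SaveWilly": False},
}

def build_request(service, **options):
    """Merge caller-supplied options over the service's defaults."""
    defaults = SERVICE_DEFAULTS[service]
    unknown = set(options) - set(defaults)
    if unknown:
        raise KeyError(f"unknown option(s) for {service}: {sorted(unknown)}")
    return {"service": service, "options": {**defaults, **options}}

# Options are matched by name, so their order is irrelevant,
# and an omitted option simply keeps its default value.
a = build_request("Malloc", AllocatedSize=4096, SaveWilly=True)
b = build_request("Malloc", SaveWilly=True, AllocatedSize=4096)
c = build_request("Malloc", SaveWilly=True)
```

Here `a` and `b` describe the same request, and `c` overrides one default without mentioning the other option.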

There are some tricks worth pointing out, though.

First, a significant header parsing overhead is incurred each time a functionality is called, not only when it is declared. This could be quite problematic for low-level stuff that has to be fast.

If you want to fully implement your own communication protocol, and not use those of an existing programming language, then you have to not only write the function prototypes in your new language, but also describe the data with it. Now this is a tricky one. In C, everything can be described in terms of blocks of memory with integers inside and pointers. But here, if you want to do things cleanly using your own specifications, you need to create a syntax for arrays, a syntax for objects, a syntax for strings, a syntax for numbers, etc... one for each data abstraction which you want people to be able to use.

What this means is that you'll have to code a data translator that is exactly as complex as a modern compiler, and accept a large data conversion overhead, akin to that of heterogeneous OSs written in different languages and running on different architectures communicating over a network, except that it'll occur all the time, even when you remain on a local machine, running a single architecture, and doing calls between programs written in the same language. You do not optimize for the common case.

Astonishingly enough, this does not solve the compatibility problem.

The classical compatibility issue is that functions can gain parameters, but not change name, change the name of parameters, change the order of parameters, or lose parameters.

Here, the object replacing our functions cannot change name either (otherwise processes looking for that service using the old name won't find it). Parameters can't get a different name or disappear for the same reason (programs coded for an old version of the service wouldn't work). So basically, all we can do is change the order in which parameters are written.

My question is, is it worth the performance hit of going back and forth through an intermediate representation each time a call is made ? Is it worth the bloat and security risk of having the translator around each time something as common as a procedure call is made ? Is it worth the extreme coding complexity of that translator, and the lost comfort of being able to use a large number of assumptions about the language being used ? How about rather writing function parameters in the right order the first time ?

Edited 2011-05-30 05:15 UTC

Reply Parent Score: 1

RE[8]: Comment by Kaj-de-Vos
by Kaj-de-Vos on Mon 30th May 2011 14:02 in reply to "RE[7]: Comment by Kaj-de-Vos"
Kaj-de-Vos Member since:
2010-06-09

You're on the right track here: it's indeed a matter of how parameters are passed (the message). But you're framing most of your thought in traditional terms of code, with calls and parameters and many other details. Doing implementations in those terms has led to the idea that it is complex and costly. As I said in another post, this is not so if you do it right. To do that, you have to forget about all those things that are irrelevant.

Let's make this concrete. How would you implement a service that draws a line? You could draw up a plan including all sorts of functions, parameters, transfer methods, interface description languages and parsers for it, but that is all irrelevant. To draw a line, assuming the pen is set at a starting point, it suffices to specify this:

draw x y

You could call "draw" a function name, but that is irrelevant and assumes too much. It's just a command. x and y are the parameters. Not because they're inherently parameters, but because they're preceded by a command. This is our first self-descriptive feature. But we've already assumed it's a line in a 2D space. At least we haven't assumed it's either a screen or a plotter, but we could make it more general by specifying a 3D line like this:

draw x y z

I don't believe in higher physical dimensions, so we'll leave it at this. We've written it in a human message, so how do we encode this in an efficient machine message that wouldn't be out of place in the core of an operating system? A naive first attempt would say that we need numbers for each component. Both sides of the interface would need to agree on a command set, like syscalls. draw is our first command, and if we encode it in an integer, all parts can have the same encoding:

1
integer
integer

Now this is really not hard to parse, and the performance loss against a C function call is negligible. On the other hand, we haven't improved much on its flexibility yet, except that we are completely free to interpret this as a sync or async command. An important goal is to keep changing interfaces compatible, so we could do that by brute force by prefixing an interface version:

1
1
integer
integer

This is trivial here, but not so in low level code languages such as C. You'd have to depend on symbol versioning, for example, making you dependent on certain toolchains. However, even better than a wholesale interface version is to make compatibility more granular by weaving it into the data. Let's see what happens on changes. Consider the case that you want to move coordinates to floating point to use subpixel precision in your graphics. This actually happened during the development of AtheOS. The abstract specification is still the same:

draw x y

But we would need to bump the interface version because the encoding changes:

2
1
float
float
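
The version-prefixed encodings above could be sketched in Python with the standard `struct` module. The byte layout (little-endian 32-bit fields) and the function names are assumptions made for this example only; the post does not fix a wire format.

```python
import struct

CMD_DRAW = 1  # command number agreed on by both sides, like a syscall number

def encode_draw(version, x, y):
    """Version 1 carries 32-bit integers, version 2 carries 32-bit floats."""
    body = "ii" if version == 1 else "ff"
    return struct.pack("<ii" + body, version, CMD_DRAW, x, y)

def decode(message, supported_versions):
    """Return (command, x, y), or raise if the service dropped that version."""
    version, command = struct.unpack_from("<ii", message, 0)
    if version not in supported_versions:
        raise ValueError(f"interface version {version} not supported")
    body = "ii" if version == 1 else "ff"
    x, y = struct.unpack_from("<" + body, message, 8)
    return command, x, y

old_client = encode_draw(1, 10, 20)        # integer coordinates
new_client = encode_draw(2, 10.5, 20.25)   # floating point coordinates
```

A service that still supports both versions can call `decode(old_client, {1, 2})`, while one that only passes `{2}` rejects the old client's message outright.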

This makes old clients incompatible with new services when they drop the old interface. We can avoid that by introducing a type system. So far, we have data of three types:

1: command
2: integer
3: float

Here's a typed version of the interface:

1
1 1
3 float
3 float

The parser in the interface becomes a little more complex, but it's still trivial, and very flexible. It's now easy to support the integer interface in the same interface version:

1
1 1
2 integer
2 integer
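
As a rough illustration of the typed encoding, the Python sketch below writes each value as a one-byte type tag followed by a 4-byte payload. The tag numbers come from the list above; the byte sizes and the omission of the leading interface version are simplifying assumptions of this example.

```python
import struct

# Type tags from the post: 1 = command, 2 = integer, 3 = float.
FORMATS = {1: "<i", 2: "<i", 3: "<f"}

def encode(values):
    """values is a list of (type_tag, value) pairs, written as tag byte + payload."""
    return b"".join(
        struct.pack("<B", tag) + struct.pack(FORMATS[tag], value)
        for tag, value in values
    )

def decode(message):
    """Yield (type_tag, value) pairs from a tagged byte stream."""
    offset = 0
    while offset < len(message):
        tag = message[offset]
        fmt = FORMATS[tag]
        (value,) = struct.unpack_from(fmt, message, offset + 1)
        offset += 1 + struct.calcsize(fmt)
        yield tag, value

# The same command (draw = 1) with integer or float coordinates.
int_draw = encode([(1, 1), (2, 10), (2, 20)])
float_draw = encode([(1, 1), (3, 10.5), (3, 20.25)])
```

The receiving side sees the type of every coordinate in the stream itself, so it can accept either form without a version bump.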

We're venturing into terrain that low level languages without proper polymorphism can't really support. We can still count the numbers we use on the fingers of one hand, and we already have a powerful type system independent of any implementation language. We're starting to feel very powerful, and confident to look far into the future. We will add types when we need them, but what happens when we introduce new types that old interfaces don't know about? We can keep some new interfaces usable by old clients if they can at least parse the encoding, and skip data they don't understand, or pass it along to interfaces that do understand. When AtheOS switched completely to floating point graphics coordinates, old programs just kept working and were then running in a more advanced environment that they knew nothing about. To keep new types parsable by old interfaces, the encoding needs to include their sizes. We can do this only for new types to optimise the encoding. REBOL has almost sixty data types, so it's fairly safe to reserve a space for a hundred standard types. Let's say a mathematician has a weird virtual coordinate space in which he wants to draw:

1
1 1
101 size coordinate
101 size coordinate
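
Here is a sketch of how an old interface might skip types it has never heard of. It assumes a one-byte tag, a one-byte size field for tags from 101 upwards, and a 6-byte payload for the made-up coordinate type; none of these sizes come from the post.

```python
import struct

KNOWN = {1: "<i", 2: "<i", 3: "<f"}   # standard types: fixed size, no size field
FIRST_EXTENDED_TAG = 101              # assumption: tags from 101 up carry a size byte

def decode_skipping_unknown(message):
    """Return the values an old interface understands and the payload bytes it skipped."""
    values, skipped, offset = [], 0, 0
    while offset < len(message):
        tag = message[offset]
        offset += 1
        if tag in KNOWN:
            (value,) = struct.unpack_from(KNOWN[tag], message, offset)
            values.append((tag, value))
            offset += struct.calcsize(KNOWN[tag])
        elif tag >= FIRST_EXTENDED_TAG:
            size = message[offset]      # one-byte size, then the opaque payload
            offset += 1 + size
            skipped += size
        else:
            raise ValueError(f"unknown standard type {tag} without a size")
    return values, skipped

# draw (command 1) with two coordinates of the hypothetical type 101, 6 bytes each
coordinate = bytes([101, 6]) + b"\x00" * 6
message = bytes([1]) + struct.pack("<i", 1) + coordinate + coordinate
values, skipped = decode_skipping_unknown(message)
```

The old interface still recognises the draw command and steps cleanly over the two coordinates, which it could also forward untouched to an interface that does understand them.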

So far we have disregarded the starting coordinate for the line. Let's introduce a command to set it:

set x y

1
1 2
3 float
3 float

Now we can draw a line starting anywhere:

set x y
draw p q

1
1 2
3 float
3 float
1 1
3 float
3 float
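
A small Python sketch of a service interpreting such a combined message follows. It works on already-decoded (type, value) pairs rather than bytes, and it assumes that the pen moves to the end of each drawn line, which the post does not specify.

```python
CMD, FLOAT = 1, 3          # type tags from the post
DRAW, SET = 1, 2           # command numbers from the post

def run(stream):
    """Interpret a flat list of (type_tag, value) pairs; return the line segments drawn."""
    pen, lines, i = (0.0, 0.0), [], 0
    while i < len(stream):
        tag, command = stream[i]
        if tag != CMD:
            raise ValueError("expected a command")
        x, y = stream[i + 1][1], stream[i + 2][1]   # both commands take two floats here
        if command == SET:
            pen = (x, y)
        elif command == DRAW:
            lines.append((pen, (x, y)))
            pen = (x, y)                            # assumption: pen follows the line
        i += 3
    return lines

# set 1.0 2.0 followed by draw 4.0 6.0, delivered as a single message
message = [(CMD, SET), (FLOAT, 1.0), (FLOAT, 2.0),
           (CMD, DRAW), (FLOAT, 4.0), (FLOAT, 6.0)]
segments = run(message)
```

Both commands arrive and are processed in one pass over one message.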

Note that in RPC, this would be two calls, with the associated overhead, so we're actually becoming more efficient here. But wait, we wanted to support 3D, so we now have to solve the problem of variable length parameter lists. We can write it like this:

set [x y]
draw [p q]

And we will have to encode the number of parameters somehow. To keep the format a generic stream of values, we could associate it with the command encoding:

1
1 2 2
3 float
3 float
1 1 2
3 float
3 float

set [x y z]
draw [p q r]

1
1 2 3
3 float
3 float
3 float
1 1 3
3 float
3 float
3 float

Alternatively, we could introduce a list type and pretend that a command has one parameter, the list:

1
1 2
4 2
3 float
3 float
1 1
4 2
3 float
3 float

Note that this is an alternative encoding for the same specification:

set [x y]
draw [p q]

Does that look prohibitively complicated?

Reply Parent Score: 2

RE[9]: Comment by Kaj-de-Vos
by Kaj-de-Vos on Mon 30th May 2011 14:17 in reply to "RE[8]: Comment by Kaj-de-Vos"
Kaj-de-Vos Member since:
2010-06-09

Sorry, I've already made that last example too complex. It's very easy to fall into that trap. Because we defined a command type, the data stream is self-synchronising: if an interface has consumed all the parameters it understands, it can simply skip forward to the next command. So there is strictly no need to define a parameter number or list in this example. Still, they're useful constructs to solve other issues.
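
To sketch what self-synchronising means in practice: in the Python example below, a 2D-only interface receives 3D commands, takes the two parameters it understands, and resumes at the next command tag. The function names are invented for this illustration.

```python
CMD, FLOAT = 1, 3          # type tags from the earlier post
DRAW, SET = 1, 2           # command numbers from the earlier post

def split_commands(stream):
    """Group a flat (type_tag, value) stream into (command, [parameters]) at each command tag."""
    commands = []
    for tag, value in stream:
        if tag == CMD:
            commands.append((value, []))
        elif commands:
            commands[-1][1].append(value)
    return commands

def as_2d(stream):
    """A 2D-only interface: use the first two parameters of each command, ignore the rest."""
    return [(command, tuple(params[:2])) for command, params in split_commands(stream)]

message_3d = [(CMD, SET), (FLOAT, 1.0), (FLOAT, 2.0), (FLOAT, 3.0),
              (CMD, DRAW), (FLOAT, 4.0), (FLOAT, 5.0), (FLOAT, 6.0)]
result = as_2d(message_3d)
```

No parameter count or list marker is needed: the command tag alone marks where the next command starts.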

Reply Parent Score: 1

RE[9]: Comment by Kaj-de-Vos
by Neolander on Mon 30th May 2011 14:57 in reply to "RE[8]: Comment by Kaj-de-Vos"
Neolander Member since:
2010-03-08

Thanks a lot, this makes it much easier to understand the concepts which you're invoking.

Some points remain obscure, though...

1/How does the type system help the switch from integer to float in the drawing system ?
2/More generally, is function overloading dealt with by the parser, or by the daemon doing the drawing work ?
3/Biggest issue I have : how is this so different from the kind of RPC which I advocate ? I mean, all in all it still looks a lot like a non-blocking RPC interface implemented on top of a messaging protocol. Even sending batches of RPC requests could be done in a "pure" RPC model, given an extra layer of abstraction that allows sending whole arrays of "RPC call" objects to the RPC subsystem.
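
A rough Python sketch of that extra layer, with an in-process object standing in for the remote daemon. `RpcCall`, `Daemon` and `submit_batch` are hypothetical names, not part of the actual design, and the pen behaviour is assumed.

```python
from dataclasses import dataclass, field

@dataclass
class RpcCall:
    """One remote call: a procedure name plus named arguments."""
    procedure: str
    arguments: dict = field(default_factory=dict)

class Daemon:
    """Stand-in for a drawing daemon; a real one would live in another process."""
    def __init__(self):
        self.pen = (0.0, 0.0)
        self.lines = []

    def set(self, x, y):
        self.pen = (x, y)

    def draw(self, x, y):
        self.lines.append((self.pen, (x, y)))
        self.pen = (x, y)

def submit_batch(daemon, calls):
    """Hand a whole array of calls to the (hypothetical) RPC subsystem in one go."""
    for call in calls:
        getattr(daemon, call.procedure)(**call.arguments)

daemon = Daemon()
submit_batch(daemon, [RpcCall("set", {"x": 1.0, "y": 2.0}),
                      RpcCall("draw", {"x": 4.0, "y": 6.0})])
```

The caller pays for one round trip to the RPC subsystem instead of one per call.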

Also...

"Because we defined a command type, the data stream is self-synchronising: if an interface has consumed all the parameters it understands, it can simply skip forward to the next command."

I fail to see how letting a client process send requests with an incorrect set of parameters could be a good idea.

Edited 2011-05-30 15:02 UTC

Reply Parent Score: 1

RE[9]: Comment by Kaj-de-Vos
by Alfman on Mon 30th May 2011 22:17 in reply to "RE[8]: Comment by Kaj-de-Vos"
Alfman Member since:
2011-01-28

Kaj-de-Vos,

"I have been talking about the problem that RPC implies an inflexible semantic data exchange (the payload)."

I've found it frustrating that your posts are so vague. I'm really out of ideas as to what problems you have with "RPC". Your claims may be valid against the least common denominator forms of function prototypes, but there are plenty of counter examples which you've been ignoring.

"Let's make this concrete. How would you implement a service that draws a line?"

Well, your example evolves from just drawing a line to doing more stuff. But the implication that RPC cannot handle "more stuff" is not accurate.

You're assuming a least common denominator approach again, but many modern languages support functions which are extensible. It's not fair to put them all beside C and label all RPC as inadequate.


"You could draw up a plan including all sorts of functions, parameters, transfer methods, interface description languages and parsers for it, but that is all irrelevant. To draw a line, assuming the pen is set at a starting point, it suffices to specify this:"

You're essentially coming up with the foundations of a vector graphics format. You could make it arbitrarily complex. You could support Windows 3.0 metafiles or VML or SVG (all vector graphics formats).

Javascript can easily accommodate your example by using JSON arrays and hashes. Web services can be used to connect separate components via HTTP/JSON directly to native types on many platforms including Perl/PHP/.Net/Python.
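
For instance, the set/draw example could travel as JSON like this, shown here in Python with the standard `json` module. The field names `command` and `args` are made up for the illustration.

```python
import json

# Each command is a hash; its parameters are an array of any length.
message = json.dumps([
    {"command": "set", "args": [1.0, 2.0]},
    {"command": "draw", "args": [4.0, 6.0, 0.5]},   # a third coordinate is just one more element
])

# The receiver gets native lists and dicts back, with types preserved.
decoded = json.loads(message)
```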


I think you're assuming that all RPC is limited to transferring only simple types as parameters, but this isn't the case. Today many languages make it possible to call remote procedures with deep object hierarchies.

I can understand why you'd dislike simple function prototypes as in C (which may be what Neolander has in mind), but I don't think your claims hold up against "RPC" in general.

Reply Parent Score: 2

RE[8]: Comment by Kaj-de-Vos
by bouhko on Tue 31st May 2011 20:48 in reply to "RE[7]: Comment by Kaj-de-Vos"
bouhko Member since:
2010-06-24

You might want to have a look at Google's protocol buffers. This is basically a way to define messages that can be serialized/deserialized in multiple languages. It allows you to define services as well (and let you implement the RPC details for your system) :
http://code.google.com/apis/protocolbuffers/docs/reference/cpp-gene...

Reply Parent Score: 1

RE[9]: Comment by Kaj-de-Vos
by Neolander on Tue 31st May 2011 22:56 in reply to "RE[8]: Comment by Kaj-de-Vos"
Neolander Member since:
2010-03-08

Okay, so if I understand it correctly it's about having a code generator that generates both sides of the RPC call based on a description language, right ? Sounds pretty neat indeed ;)

The regular deprecation warnings at the beginning of the linked paragraph bug me, though.

Reply Parent Score: 1