Linked by Hadrien Grasland on Sun 29th May 2011 09:42 UTC
It's funny how trying to have a consistent system design makes you constantly jump from one area of the designed OS to another. I initially just tried to implement interrupt handling, and now I'm cleaning up the design of an RPC-based daemon model, which will be used to implement interrupt handlers, along with most other system services. Anyway, now that I've reached something I'm personally satisfied with, I wanted to ask everyone who's interested to check that design and tell me if anything in it sounds like a bad idea to them in the short or long run. This is a core part of the OS' design, and I'm really not interested in core design mistakes surfacing in a few years if I can fix them now. Many thanks in advance.
Thread beginning with comment 474970
RE[6]: Comment by Kaj-de-Vos
by xiaokj on Sun 29th May 2011 21:36 UTC in reply to "RE[5]: Comment by Kaj-de-Vos"

"Define declarative data, Google and Wikipedia have no idea what this is and I haven't either ;)"

Let me help however I can here. If, and that is a very big "if", I am correct, he is referring to something really esoteric: a design philosophy coming straight out of things like The Art of Unix Programming.

Apparently, he is trying to tell you that there is a much more abstract way to deal with things than RPC. To work with RPC, you need to define the function name and its accepted parameters, and that is then set in stone. With declarative data, you would instead have the library export a datasheet of "what can I do" and, once you pick a specific function, "what options are available here", complete with version numbers, preferably in XML. The clients then make do with whatever is provided.
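
To make that concrete, here is the sort of thing I have in mind. The names and layout are entirely made up, and the exact format (XML or something lighter) doesn't matter much:

service MemoryAllocator, version 2
    command Allocate
        option Size        (integer, required)
        option Alignment   (integer, default 16)
    command Free
        option Address     (pointer, required)

A client queries this datasheet, picks a command, and fills in only the options it cares about.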

The benefit is that major changes can be made far more easily than before. There is a major downside too: it is much harder to code in that form. The benefits tend to pay off over the long run, but still.

The main point of doing things like this, other than the obvious one stated above, is that it gets you used to declarative data structures, which make much more sense. As the Art of Unix Programming notes, the human mind is a lot better at tackling complex data than complex code flows. Declarative data structures push the complexity onto the data side, so the overall code becomes a lot cleaner, and it is there that the benefits are most easily reaped.

Take the pic language, for example. It is a lot easier to declare that you want a rectangle of a certain size, and that its top-left (NW) corner is connected to an arrow that points to a circle of such-and-such a radius. The code then takes care of the rest. That kind of code tends to stay sane even over a very long lifetime, whereas if you define everything by coordinates, sooner or later your API will be replaced, since such simplistic APIs are a dime a dozen. Declarative programming is something like that, and it really saves time.

I hope I have correctly captured his idea. I don't actually know any of this for certain, so take it with a grain of salt.


RE[7]: Comment by Kaj-de-Vos
by Kaj-de-Vos on Sun 29th May 2011 21:50 in reply to "RE[6]: Comment by Kaj-de-Vos"

That's pretty good, except:

- It's not esoteric, but widely used. Hence my example of HTML.

- I do not prefer XML. It has become a reflex for people to come up with that immediately, but like RPC, it's an implementation detail. Actually, I think XML is way too heavy.

- Specification sheets (such as DTDs) are not strictly necessary. This is also an implementation detail. A metadata specification is required only if the whole world needs to know the meaning of the data; most interfaces are between parties that already know each other and don't need to be understood by parties with no preexisting knowledge of the interface.

- Therefore, there is no inherent drawback of difficult implementation. It can be as simple as you make it.


RE[8]: Comment by Kaj-de-Vos
by xiaokj on Sun 29th May 2011 21:58 in reply to "RE[7]: Comment by Kaj-de-Vos"

Personally, I prefer something lighter too: the HTTP protocol itself is a wonder, and it is much lighter than tag-heavy XML, of course.

However, a specification sheet is a good idea, since implementations can, and do, change. Better to code with the expectation of change than to rely on "interface memory". If you want something as abstract as declarative data allows, why strap yourself down with black magic? Again, something light would be very nice here too. Maybe just a version number is good enough, but still.

Glad that I could actually understand you from just those two magic words. It may not be esoteric, but it is proper old school (actually, more good sense than old).


RE[7]: Comment by Kaj-de-Vos
by Neolander on Mon 30th May 2011 05:11 in reply to "RE[6]: Comment by Kaj-de-Vos"

Oh, alright, now I see better where this is going.

It would be a bit like using objects for making calls (yeah, yeah, I know, implementation details and stuff).

A malloc implementation could be described like...

//This is written in the PidginObject language
service Malloc [
    option AllocatedSize
]

And for the caller, it'd be like...

mymalloc = setup_service(Malloc)
mymalloc.setproperty(AllocatedSize = <whatever>)
call_service(mymalloc)

...or if we're a messaging freak...

send_message(daemon, "run_service Malloc with option AllocatedSize = <whatever>, option SaveWilly = no, etc...")

Actually, I plan to use something like that for UI description.

It has at least the following benefits:

-> You can indeed use language-agnostic headers (like XML or CSS). More precisely, you'd use headers written in your own language.
-> The order in which you put function parameters doesn't matter. That means you can change one default parameter without redefining all the others "before" it, since there is no such concept
-> You can use a consistent data description language for services and UIs, settings, etc...

There are some catches worth pointing out, though.

First, a significant header-parsing overhead is incurred each time a piece of functionality is called, not only when it is declared. This could be quite problematic for low-level stuff that has to be fast.

If you want to fully implement your own communication protocol, and not reuse that of an existing programming language, then you have to not only write the function prototypes in your new language, but also describe the data with it. Now this is a tricky one. In C, everything can be described in terms of blocks of memory containing integers and pointers. But here, if you want to do things cleanly using your own specifications, you need to create a syntax for arrays, a syntax for objects, a syntax for strings, a syntax for numbers, etc., one for each data abstraction which you want people to be able to use.
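
To give an idea of what that implies, even before writing any parser, the C side alone would need some kind of generic tagged value to represent "any piece of data". A rough sketch (all names made up):

#include <stddef.h>

/* Hypothetical sketch: the kind of universal "value" type a data
 * description language forces upon the C side. */
enum value_type { VAL_INTEGER, VAL_NUMBER, VAL_STRING, VAL_ARRAY, VAL_OBJECT };

struct field;                        /* a named member of an object, defined below */

struct value {
    enum value_type type;
    union {
        long long as_integer;
        double    as_number;
        struct { const char   *data;   size_t length; } as_string;
        struct { struct value *items;  size_t count;  } as_array;
        struct { struct field *fields; size_t count;  } as_object;
    } u;
};

struct field {
    const char  *name;               /* member name */
    struct value value;              /* member value, possibly nested */
};

And that's before strings get an encoding, arrays get ownership rules, and so on.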

What this means is that you'll have to code a data translator that is nearly as complex as a modern compiler, and accept a large data-conversion overhead, akin to that of heterogeneous OSs written in different languages communicating over a network, except that it'll occur all the time, even when you stay on a local machine, run a single architecture, and make calls between programs written in the same language. You do not optimize for the common case.

Astonishingly enough, this does not solve the compatibility problem.

The classical compatibility issue is that functions can gain parameters, but cannot change their name, rename parameters, reorder parameters, or lose parameters.

Here, the object replacing our functions cannot change name either (otherwise processes looking for that service under the old name won't find it). Parameters can't get a different name or disappear for the same reason (programs coded for an old version of the service wouldn't work). So basically, all we can do is change the order in which parameters are written.

My question is, is it worth the performance hit of going back and forth through an intermediate representation each time a call is made? Is it worth the bloat and security risk of having the translator around each time something as common as a procedure call is made? Is it worth the extreme coding complexity of that translator, and the lost comfort of being able to use a large number of assumptions about the language being used? How about just writing function parameters in the right order the first time?

Edited 2011-05-30 05:15 UTC


RE[8]: Comment by Kaj-de-Vos
by Kaj-de-Vos on Mon 30th May 2011 14:02 in reply to "RE[7]: Comment by Kaj-de-Vos"

You're on the right track here: it's indeed a matter of how parameters are passed (the message). But you're framing most of your thought in traditional terms of code, with calls and parameters and many other details. Doing implementations in those terms has led to the idea that it is complex and costly. As I said in another post, this is not so if you do it right. To do that, you have to forget about all those things that are irrelevant.

Let's make this concrete. How would you implement a service that draws a line? You could draw up a plan including all sorts of functions, parameters, transfer methods, interface description languages and parsers for it, but that is all irrelevant. To draw a line, assuming the pen is set at a starting point, it suffices to specify this:

draw x y

You could call "draw" a function name, but that is irrelevant and assumes too much. It's just a command. x and y are the parameters. Not because they're inherently parameters, but because they're preceded by a command. This is our first self-descriptive feature. But we've already assumed it's a line in a 2D space. At least we haven't assumed it's either a screen or a plotter, but we could make it more general by specifying a 3D line like this:

draw x y z

I don't believe in higher physical dimensions, so we'll leave it at this. We've written it in a human message, so how do we encode this in an efficient machine message that wouldn't be out of place in the core of an operating system? A naive first attempt would say that we need numbers for each component. Both sides of the interface would need to agree on a command set, like syscalls. draw is our first command, and if we encode it in an integer, all parts can have the same encoding:

1
integer
integer
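
To show how little machinery this takes, a receiving end in C could look roughly like this (just a sketch; the names and the 32-bit word layout are made up for illustration):

#include <stdint.h>

enum { CMD_DRAW = 1 };               /* the agreed command set, like syscall numbers */

void draw_line_to(int32_t x, int32_t y);   /* the service's actual drawing routine (not shown) */

/* Decode one command from a message laid out as 32-bit words. */
static const int32_t *handle_command(const int32_t *msg)
{
    switch (*msg++) {
    case CMD_DRAW: {
        int32_t x = *msg++;
        int32_t y = *msg++;
        draw_line_to(x, y);
        break;
    }
    }
    return msg;                       /* where the next command starts, if any */
}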

Now this is really not hard to parse, and the performance loss compared to a C function call is negligible. On the other hand, we haven't improved much on its flexibility yet, except that we are completely free to interpret this as a sync or async command. An important goal is to keep changing interfaces compatible, and we could do that by brute force, by prefixing an interface version:

1
1
integer
integer

This is trivial here, but not so in low-level languages such as C. There you'd have to depend on symbol versioning, for example, making you dependent on certain toolchains. However, even better than a wholesale interface version is to make compatibility more granular by weaving it into the data. Let's see what happens on changes. Consider the case where you want to move coordinates to floating point to use subpixel precision in your graphics. This actually happened during the development of AtheOS. The abstract specification is still the same:

draw x y

But we would need to bump the interface version because the encoding changes:

2
1
float
float

This makes old clients incompatible with new services when they drop the old interface. We can avoid that by introducing a type system. So far, we have data of three types:

1: command
2: integer
3: float

Here's a typed version of the interface:

1
1 1
3 float
3 float

The parser in the interface becomes a little more complex, but it's still trivial, and very flexible. It's now easy to support the integer interface in the same interface version:

1
1 1
2 integer
2 integer
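
A parser for these typed values stays small. In C, reading one coordinate that may arrive as either type could look roughly like this (again just a sketch, with a made-up 32-bit word layout and 32-bit IEEE floats assumed):

#include <stdint.h>
#include <string.h>

enum { TYPE_COMMAND = 1, TYPE_INTEGER = 2, TYPE_FLOAT = 3 };

/* Read one typed coordinate and hand it to the service as a float,
 * whichever encoding the client chose. */
static const int32_t *read_coordinate(const int32_t *msg, float *out)
{
    int32_t type = *msg++;
    int32_t word = *msg++;

    if (type == TYPE_INTEGER)
        *out = (float)word;
    else                              /* TYPE_FLOAT: raw IEEE bits */
        memcpy(out, &word, sizeof *out);

    return msg;
}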

We're venturing into terrain that low-level languages without proper polymorphism can't really support. We can still count the numbers we use on the fingers of one hand, and we already have a powerful type system independent of any implementation language. We're starting to feel very powerful, and confident to look far into the future. We will add types when we need them, but what happens when we introduce new types that old interfaces don't know about? We can keep some new interfaces usable by old clients if they can at least parse the encoding, and skip data they don't understand, or pass it along to interfaces that do understand it. When AtheOS switched completely to floating-point graphics coordinates, old programs just kept working, now running in a more advanced environment that they knew nothing about. To keep new types parsable by old interfaces, the encoding needs to include their sizes. We can do this only for new types, to optimise the encoding. REBOL has almost sixty data types, so it's fairly safe to reserve space for a hundred standard types. Let's say a mathematician has a weird virtual coordinate space in which he wants to draw:

1
1 1
101 size coordinate
101 size coordinate
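
Skipping data you don't understand is then a one-liner for the old parser. As a sketch (hypothetical layout, sizes counted in 32-bit words):

#include <stdint.h>

/* Step over one value. Types below 100 are standard and one word long here;
 * types from 100 up carry an explicit size so old parsers can skip them. */
static const int32_t *skip_value(const int32_t *msg)
{
    int32_t type = *msg++;

    if (type >= 100) {
        int32_t size = *msg++;        /* payload size in words */
        return msg + size;
    }
    return msg + 1;                   /* standard types: one payload word */
}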

So far we have disregarded the starting coordinate for the line. Let's introduce a command to set it:

set x y

1
1 2
3 float
3 float

Now we can draw a line starting anywhere:

set x y
draw p q

1
1 2
3 float
3 float
1 1
3 float
3 float

Note that in RPC, this would be two calls, with the associated overhead, so we're actually becoming more efficient here. But wait, we wanted to support 3D, so we now have to solve the problem of variable-length parameter lists. We can write it like this:

set [x y]
draw [p q]

And we will have to encode the number of parameters somehow. To keep the format a generic stream of values, we could associate it with the command encoding:

1
1 2 2
3 float
3 float
1 1 2
3 float
3 float

set [x y z]
draw [p q r]

1
1 2 3
3 float
3 float
3 float
1 1 3
3 float
3 float
3 float

Alternatively, we could introduce a list type and pretend that a command has one parameter, the list:

1
1 2
4 2
3 float
3 float
1 1
4 2
3 float
3 float

Note that this is an alternative encoding for the same specification:

set [x y]
draw [p q]
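
For completeness, here is roughly what a client-side encoder for that stream could look like in C (a sketch only: made-up names, 32-bit words, 32-bit IEEE floats assumed):

#include <stdint.h>
#include <string.h>

enum { TYPE_COMMAND = 1, TYPE_FLOAT = 3, TYPE_LIST = 4 };
enum { CMD_DRAW = 1, CMD_SET = 2 };

static int32_t *put(int32_t *p, int32_t word) { *p++ = word; return p; }

static int32_t *put_float(int32_t *p, float f)
{
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the IEEE bits */
    p = put(p, TYPE_FLOAT);
    return put(p, bits);
}

/* Encode "set [x y]" followed by "draw [p q]" using the list encoding. */
static size_t encode_set_draw(int32_t *buf, float x, float y, float p, float q)
{
    int32_t *w = buf;

    w = put(w, 1);                    /* interface version */

    w = put(w, TYPE_COMMAND); w = put(w, CMD_SET);
    w = put(w, TYPE_LIST);    w = put(w, 2);
    w = put_float(w, x);      w = put_float(w, y);

    w = put(w, TYPE_COMMAND); w = put(w, CMD_DRAW);
    w = put(w, TYPE_LIST);    w = put(w, 2);
    w = put_float(w, p);      w = put_float(w, q);

    return (size_t)(w - buf);         /* message length in words */
}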

Does that look prohibitively complicated?


RE[8]: Comment by Kaj-de-Vos
by bouhko on Tue 31st May 2011 20:48 in reply to "RE[7]: Comment by Kaj-de-Vos"

You might want to have a look at Google's Protocol Buffers. It's basically a way to define messages that can be serialized/deserialized in multiple languages. It allows you to define services as well (and lets you implement the RPC details for your system):
http://code.google.com/apis/protocolbuffers/docs/reference/cpp-gene...
