Linked by Hadrien Grasland on Sun 29th May 2011 09:42 UTC
OSNews, Generic OSes It's funny how trying to have a consistent system design makes you constantly jump from one area of the designed OS to another. I initially just tried to implement interrupt handling, and now I'm cleaning up the design of an RPC-based daemon model, which will be used to implement interrupt handlers along with most other system services. Anyway, now that I've arrived at something I'm personally satisfied with, I wanted to ask everyone who's interested to check the design and tell me if anything in it sounds like a bad idea to them, in the short or long run. This is a core part of the OS's design, and I'm really not interested in core design mistakes emerging in a few years if I can fix them now. Many thanks in advance.
RPC considered harmful
by Kaj-de-Vos on Sun 29th May 2011 12:09 UTC
Kaj-de-Vos
Member since:
2010-06-09

You asked for criticism, so I'll be negative: RPC is a bad concept to base an entire OS on. It's inherently tied to the implementation language and to the implementation details of the services. That makes it difficult to port, hard to keep compatible with itself over time, and thus hard to keep compatible with different versions of the services.

The abstraction level of RPC interfaces is too low. To solve these problems, you need a messaging specification at a higher abstraction level, through declarative data specification.

Reply Score: 1

RE: RPC considered harmful
by Neolander on Sun 29th May 2011 12:52 UTC in reply to "RPC considered harmful"
Neolander Member since:
2010-03-08

You asked for criticism, so I'll be negative: RPC is a bad concept to base an entire OS on. It's inherently tied to the implementation language

Why couldn't wrappers be used? Most Unices have a system API that's tied to C concepts at its core, but that doesn't prevent C++ or Python wrappers from being used by people who prefer those languages, at the cost of a small performance hit.

and to the implementation details of the services.

Again, why does it have to be the case? A good interface can be standard without revealing implementation details. If I say that my memory allocator is called through a malloc(uint size) function, how does that prevent me from changing the memory allocator later?

That makes it difficult to port,

Define port. What do you want to port where?

hard to keep compatible with itself over time,

Unless I'm missing something, it's no harder than having a library keep a consistent interface over time. Which is, again, a matter of the library interface not depending on the implementation details. Why should it be so hard?

and thus hard to keep compatible with different versions of the services.

Not if people don't break the interface every morning.

The abstraction level of RPC interfaces is too low.

Why? If the interface of C-style dynamic libraries is enough, how can the RPC interface, which in the end is just a non-blocking, cross-process variant of it, be any different?

To solve these problems, you need a messaging specification at a higher abstraction level, through declarative data specification.

Well, I wait for answers to the questions above before asking for more details about your suggested answer.

Reply Score: 1

RE: RPC considered harmful
by Brendan on Mon 30th May 2011 03:27 UTC in reply to "RPC considered harmful"
Brendan Member since:
2005-11-16

Hi,

The abstraction level of RPC interfaces is too low.


In my opinion, it's the opposite problem - the RPC interface is too high level.

A "call" can be broken into 4 phases - the callee waiting to be called, the caller sending data to the callee, the callee executing, and the callee sending data back to the caller.

This could be described as 3 operations - "wait for data", "send data and wait to receive data back" and "send data and don't wait to receive data back".

Now, stop calling it "data" and call it "a message" (it's the same thing anyway, mostly), and you end up with "get_message()", "send_message_and_wait_for_reply()" and "send_message()".

For synchronous software (e.g. emulating RPC); the callee does "get_message()" and blocks/waits until a message arrives, the caller does "send_message_and_wait_for_reply()" and blocks/waits until it receives the reply; and then the callee does "send_message()" to return the reply. It's exactly like RPC.

The interesting thing is that for asynchronous software, you'd use "send_message()" and "get_message()" and don't need anything else. Basically, by breaking it down into these primitives you get synchronous and asynchronous communication (rather than just synchronous); and people can mix and match without limitations. For example, you could have a fully asynchronous service, where one client process uses synchronous messaging to use the service and another client process uses asynchronous messaging to use the service, and the service itself doesn't need to care what each client is doing.

However, you would probably want to offer a few extra primitives to make things easier. For example, you might consider adding "send_message_and_wait_for_reply_with_timeout()", and "check_for_message()" (which would be like "get_message()" but returns a "NO_MESSAGES" error instead of blocking/waiting for a message when there are no messages to get).
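As a rough illustration of how the synchronous pattern falls out of the two asynchronous primitives, here is a minimal single-process sketch in C; the queue layout, its capacity, and names like msg_queue are invented for the example, and the busy-wait stands in for the blocking a real kernel would do:

```c
#include <assert.h>
#include <string.h>

#define QUEUE_CAP   16
#define MSG_SIZE    32
#define NO_MESSAGES (-1)

typedef struct {
    char data[QUEUE_CAP][MSG_SIZE];
    int head, tail, count;
} msg_queue;

/* send_message(): enqueue and return immediately (asynchronous). */
static void send_message(msg_queue *q, const char *msg) {
    assert(q->count < QUEUE_CAP);       /* a real kernel would block or fail here */
    strncpy(q->data[q->tail], msg, MSG_SIZE - 1);
    q->data[q->tail][MSG_SIZE - 1] = '\0';
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
}

/* check_for_message(): like get_message(), but returns NO_MESSAGES
 * instead of blocking when there is nothing to get. */
static int check_for_message(msg_queue *q, char *out) {
    if (q->count == 0)
        return NO_MESSAGES;
    strcpy(out, q->data[q->head]);
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}

/* Synchronous "RPC" built entirely from the async primitives: send the
 * request, then wait on the reply queue (busy-waiting here stands in for
 * the blocking a real get_message() would do in the kernel). */
static void send_message_and_wait_for_reply(msg_queue *to_callee,
                                            msg_queue *to_caller,
                                            const char *request, char *reply) {
    send_message(to_callee, request);
    while (check_for_message(to_caller, reply) == NO_MESSAGES)
        ;  /* the callee would run and send_message() its reply here */
}
```

Note that the service side never changes: it always does get_message()/send_message(), and whether a given client is synchronous or asynchronous is decided purely by which wrapper the client calls.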

-Brendan

Reply Score: 2

RE[2]: RPC considered harmful
by Alfman on Mon 30th May 2011 05:43 UTC in reply to "RE: RPC considered harmful"
Alfman Member since:
2011-01-28

Brendan,

"A 'call' can be broken into 4 phases - the callee waiting to be called, the caller sending data to the callee, the callee executing, and the callee sending data back to the caller."

I've done this before, usually passing XML data structures around and manipulating them with DOM & SAX Parsers. While the approach is flexible, I'd personally be terrified to work on a system where this model is used exclusively to glue hundreds or thousands of components together (as in an operating system).

Can you illustrate why breaking messaging down to such a low level is superior to what .NET does with marshalling and web service proxy objects?

If you are not familiar with it, the .NET compiler takes a SOAP web service and builds a proxy class which exposes all the functions in the SOAP interface. The proxy class exposes both synchronous and asynchronous functions.

MyWebService x = new MyWebService();
result = x.MyFunction(...); // synchronous
AsyncRequest r = x.Begin_MyFunction(...); // Async
... // other code
result = x.End_MyFunction(r); // Async return


Is there a good reason typical devs might want to access the messaging stack at a lower level than this?

Keep in mind that a programmer could always pass a single hash table to any function, which would technically be as expressive and extensible as any other conceivable messaging protocol (so long as the inner objects are serializable or marshalled).

Edited 2011-05-30 05:46 UTC

Reply Score: 2

RE[2]: RPC considered harmful
by Neolander on Mon 30th May 2011 05:48 UTC in reply to "RE: RPC considered harmful"
Neolander Member since:
2010-03-08

I probably shouldn't have used the "RPC" term; you too were confused into thinking that I was talking about blocking calls, while I am in fact doing non-blocking calls.

Once you have a non-blocking call interface, you can trivially implement a blocking call interface on top of it. I simply choose not to, because I don't want to encourage that kind of dirty practice if I can avoid it.

As for RPC being too high level, well... I'm tempted to say that pipes are too low level.

Don't get me wrong, pipes are great for programs of the "streaming" kind, which have an input data stream, process it, and return results in an output data stream. That's why I have them. But most tasks of a system API do not belong to the data stream processing family, and are more about executing a stream of instructions.

In that case, pipes are too low-level, because they are fundamentally a transport medium for raw data, not instructions. If you want to send instructions across a pipe, you have to layer a communication protocol on top of the pipe in order to get an instruction representation; so what you end up with is user-mode RPC implemented on top of the pipe IPC primitive.
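To make that concrete, here is a hypothetical sketch of the kind of framing protocol a user-mode RPC layer would need on top of a raw pipe; the opcode/length-prefix format is invented for the example:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode one "instruction" (opcode + argument blob) into a byte stream,
 * the way a user-mode RPC layer would before writing it to the pipe.
 * Returns the number of bytes written. */
static size_t frame_call(uint8_t *buf, uint8_t opcode,
                         const void *args, uint32_t len) {
    buf[0] = opcode;
    memcpy(buf + 1, &len, sizeof len);   /* length prefix: without it the
                                            receiver can't find message ends */
    memcpy(buf + 5, args, len);
    return 5 + (size_t)len;
}

/* Decode one instruction on the receiving side; returns bytes consumed,
 * so the receiver knows where the next message starts. */
static size_t parse_call(const uint8_t *buf, uint8_t *opcode,
                         void *args, uint32_t *len) {
    *opcode = buf[0];
    memcpy(len, buf + 1, sizeof *len);
    memcpy(args, buf + 5, *len);
    return 5 + (size_t)*len;
}
```

The point is that the pipe itself only carries the bytes; the notion of "an instruction with arguments" exists only in this extra protocol layer.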

I personally think that if an IPC primitive is to be very frequently used, it's better to implement it directly in the kernel (or at least parts of it), due to the extra control it gives over the communication process. The kernel executes trusted code, but library code can be compromised.

Reply Score: 1

RE[3]: RPC considered harmful
by Brendan on Mon 30th May 2011 10:08 UTC in reply to "RE[2]: RPC considered harmful"
Brendan Member since:
2005-11-16

Hi,

I probably shouldn't use the "RPC" term, you too got confused into thinking that I was talking about blocking calls, while I am in fact doing nonblocking calls.


A call is something that doesn't return until it completes. A "non-blocking call" is something that defies logic... ;)

I got the impression that your "non-blocking call" is a pair of normal/blocking calls, where (for e.g.) the address of the second call is passed as an argument to the first call (a callback). I also got the impression you're intending to optimise the implementation, so that blocking calls that return no data don't actually block (but that's an implementation detail rather than something that affects the conceptual model).

As for RPC being too high level, well... I'm tempted to say that pipes are too low level.


I'm not sure where pipes were mentioned by anyone, but I don't really like them much because they force the receiver to do extra work to determine where each "piece of data" ends.

Pipes can also make scheduling less efficient. For example, if a task unblocks when it receives IPC (as it should), a task can unblock, look at what it received, realise it hasn't received enough data to do anything useful, and block again; which is mostly a waste of time (and task switches).

For an analogy (to summarise), think of email. Asynchronous messaging is like people writing emails and sending them to each other whenever they want while they do other things. Synchronous messaging and RPC is like people sending emails and then sitting there doing nothing for hours while they wait for a reply. Pipes are like people sending pieces of a conversation - "I just sent this email to say hell", "o and wish you a happy birth", "day.\n -Fred\000Dear sir, we are"...

I personally think that if an IPC primitive is to be very frequently used, it's better to implement it directly in the kernel (or at least parts of it), due to the extra control it gives over the communication process. The kernel executes trusted code, but library code can be compromised.


I assumed IPC primitives would be implemented directly in the kernel because you can't implement IPC anywhere else. For example, if you have an "IPC service" implemented as a process/daemon, how would processes communicate with the "IPC service"?

The other thing to consider is that usually IPC has a certain amount of control over the scheduler - tasks block when waiting for IPC, and tasks unblock (and then potentially preempt) when they receive IPC, so it makes sense to implement it near the scheduler.

- Brendan

Reply Score: 2

RE[4]: RPC considered harmful
by Neolander on Mon 30th May 2011 11:41 UTC in reply to "RE[3]: RPC considered harmful"
Neolander Member since:
2010-03-08

A call is something that doesn't return until it completes. A "non-blocking call" is something that defies logic... ;)

I got the impression that your "non-blocking call" is a pair of normal/blocking calls, where (for e.g.) the address of the second call is passed as an argument to the first call (a callback). I also got the impression you're intending to optimise the implementation, so that blocking calls that return no data don't actually block (but that's an implementation detail rather than something that affects the conceptual model).

What I want to do is...

1/ Process A gives work to process B through a "fast" system call, which in turn calls a function of B in a new thread, using a stack of parameters given by A.
2/ Process A forgets about it and goes off to do something else.
3/ When process B is done, it sends a callback to process A through the same mechanism that A used to give B work (running a function of A). Callbacks may have parameters: the "results" of the operation.

Does it remind you of something?
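The steps above can be sketched in ordinary C, with pthread_create() standing in for the kernel's "fast" thread-spawning system call and a function pointer standing in for the cross-process callback; everything here (call_ctx, remote_call, and the doubling "work") is hypothetical:

```c
#include <pthread.h>

struct call_ctx;
typedef void (*callback_fn)(struct call_ctx *ctx, int result);

/* The "stack of parameters" A hands over, plus A's callback. */
typedef struct call_ctx {
    int arg;
    int result;
    callback_fn on_done;
} call_ctx;

/* The function "of process B" that the kernel would start in a new thread. */
static void *callee_entry(void *p) {
    call_ctx *ctx = p;
    int result = ctx->arg * 2;   /* B does its work... */
    ctx->on_done(ctx, result);   /* ...then "calls back" into A (step 3) */
    return NULL;
}

/* A's callback: here it just records the result. */
static void store_result(call_ctx *ctx, int result) {
    ctx->result = result;
}

/* Step 1: A gives work to B; the "fast system call" is played by
 * pthread_create(). Step 2: A returns immediately and does other things. */
static pthread_t remote_call(call_ctx *ctx) {
    pthread_t t;
    pthread_create(&t, NULL, callee_entry, ctx);
    return t;
}
```

In the real design A and B would be separate address spaces, so the kernel would have to copy the parameter stack across; the single-process sketch only shows the control flow.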

I'm not sure where pipes were mentioned by anyone, but I don't really like them much because they force the receiver to do extra work to determine where each "piece of data" ends.

For me, send_message() and get_message() sounded like pipe operations (you send messages to, or receive messages from, the pipe). Sorry if I misunderstood.

For an analogy (to summarise), think of email. Asynchronous messaging is like people writing emails and sending them to each other whenever they want while they do other things. Synchronous messaging and RPC is like people sending emails and then sitting there doing nothing for hours while they wait for a reply. Pipes are like people sending pieces of a conversation - "I just sent this email to say hell", "o and wish you a happy birth", "day.\n -Fred\000Dear sir, we are"...

Then what I do is definitely not RPC in the usual sense, as it is an asynchronous mechanism too. If the above description reminds you of some better name, please let me know.

I assumed IPC primitives would be implemented directly in the kernel because you can't implement IPC anywhere else. For example, if you have an "IPC service" implemented as a process/daemon, how would processes communicate with the "IPC service"?

If you have something like a pipe or message queue, you can implement higher-level IPC protocols on top of it, and use user-space libraries to implement a new IPC mechanism that uses these protocols. That's what I was talking about. But except when trying to make the kernel unusually tiny, I'm not sure it's a good idea either.

The other thing to consider is that usually IPC has a certain amount of control over the scheduler - tasks block when waiting for IPC, and tasks unblock (and then potentially preempt) when they receive IPC, so it makes sense to implement it near the scheduler.

Totally agree.

Edited 2011-05-30 11:57 UTC

Reply Score: 1

RE[5]: RPC considered harmful
by Brendan on Tue 31st May 2011 02:44 UTC in reply to "RE[4]: RPC considered harmful"
Brendan Member since:
2005-11-16

What I want to do is...

1/ Process A gives work to process B through a "fast" system call, which in turn calls a function of B in a new thread, using a stack of parameters given by A.
2/ Process A forgets about it and goes off to do something else.
3/ When process B is done, it sends a callback to process A through the same mechanism that A used to give B work (running a function of A). Callbacks may have parameters: the "results" of the operation.

Does it remind you of something?


While I can see some similarities between this and asynchronous messaging, there's also significant differences; including the overhead of creating (and eventually destroying) threads, which (in my experience) is the third most expensive operation microkernels do (after creating and destroying processes).

On top of that, (because you can't rely on the queues to serialise access to data structures) programmers would have to rely on something else for reentrancy control; like traditional locking, which is error-prone (lots of programmers find it "hard" and/or screw it up) and adds extra overhead (e.g. mutexes with implied task switches when under lock contention).

I also wouldn't underestimate the effect that IPC overhead will have on the system as a whole (especially for "micro-kernel-like" kernels). For example, if IRQs are delivered to device drivers via IPC, then on a server under load (with high-speed ethernet, for e.g.) you can expect thousands of IRQs per second (and expect to be creating and destroying thousands of threads per second). Once you add normal processes communicating with each other, this could easily go up to "millions per second" under load. If IPC costs twice as much as it does on other OSs, then the resulting system as a whole can be 50% slower than comparable systems (e.g. other micro-kernels) because of the IPC alone.

If you have something like a pipe or message queue, you can implement higher-level IPC protocols on top of it, and use user-space libraries to implement a new IPC mechanism that uses these protocols. That's what I was talking about. But except when trying to make the kernel unusually tiny, I'm not sure it's a good idea either.


In general, any form of IPC can be implemented on top of any other form of IPC. In practice it's not quite that simple because you can't easily emulate the intended interaction with scheduling (blocking/unblocking, etc) in all cases; and even in cases where you can there's typically some extra overhead involved.

The alternative would be if the kernel has inbuilt support for multiple different forms of IPC; which can lead to a "Tower of Babel" situation where it's awkward for different processes (using different types of IPC) to communicate with each other.

Basically, you want the kernel's inbuilt/native IPC to be adequate for most purposes, with little or no scaffolding in user-space.

- Brendan

Reply Score: 2

RE[6]: RPC considered harmful
by Neolander on Tue 31st May 2011 07:26 UTC in reply to "RE[5]: RPC considered harmful"
Neolander Member since:
2010-03-08

While I can see some similarities between this and asynchronous messaging, there's also significant differences; including the overhead of creating (and eventually destroying) threads, which (in my experience) is the third most expensive operation microkernels do (after creating and destroying processes).

Ah, Brendan, Brendan, how do you always manage to be so kind and helpful with people who play with OSdeving? Do you teach it in real life or something?

Anyway, have you pushed your investigation far enough to know which step of the thread creation process is expensive? Maybe it's something whose impact can be reduced...

On top of that, (because you can't rely on the queues to serialise access to data structures) programmers would have to rely on something else for reentrancy control; like traditional locking, which is error-prone (lots of programmers find it "hard" and/or screw it up) and adds extra overhead (e.g. mutexes with implied task switches when under lock contention).

This has been pointed out by Alfman, and solved by introducing an asynchronous operating mode where pending threads are queued and run one after the other. Sorry for not mentioning it in the post where I try to describe my model; when I noticed the omission it was already too late to edit.

I also wouldn't underestimate the effect that IPC overhead will have on the system as a whole (especially for "micro-kernel-like" kernels).

I know, I know, but then we reach one of those chicken-and-egg problems which are always torturing me: how do I know that my IPC design is "light enough" without doing measurements on a working system for real-world use cases? And how do I perform these measurements on something which I'm currently designing and which is not implemented yet?

For example, if IRQs are delivered to device drivers via IPC, then on a server under load (with high-speed ethernet, for e.g.) you can expect thousands of IRQs per second (and expect to be creating and destroying thousands of threads per second). Once you add normal processes communicating with each other, this could easily go up to "millions per second" under load. If IPC costs twice as much as it does on other OSs, then the resulting system as a whole can be 50% slower than comparable systems (e.g. other micro-kernels) because of the IPC alone.

The first objection that spontaneously comes to my mind is that this OS is not designed to run on servers, but rather on desktops and smaller single-user computers.

Maybe desktop use cases also include having to endure thousands of IRQs per second, but I was under the impression that desktop computers are ridiculously powerful compared to what one asks of their OSs, and that their responsiveness issues come rather from things like poor task scheduling ("running the divx encoding process more often than the window manager") or excessive dependence on disk I/O.

In general, any form of IPC can be implemented on top of any other form of IPC. In practice it's not quite that simple because you can't easily emulate the intended interaction with scheduling (blocking/unblocking, etc) in all cases; and even in cases where you can there's typically some extra overhead involved.

Understood.

The alternative would be if the kernel has inbuilt support for multiple different forms of IPC; which can lead to a "Tower of Babel" situation where it's awkward for different processes (using different types of IPC) to communicate with each other.

Actually, I tend to lean towards this solution, even though I know of the Babel risk and have regularly thought about it, because each IPC mechanism has its strengths and weaknesses. As an example, pipes and messaging systems are better at processing a stream of data, while remote calls are better suited to giving a process tasks to do.

You're right that I need to keep the number of available IPC primitives very small regardless of each one's benefits, though; so there's a compromise there, and I have to investigate the usefulness of each IPC primitive.

Edited 2011-05-31 07:28 UTC

Reply Score: 1

RE[7]: RPC considered harmful
by Brendan on Thu 2nd Jun 2011 10:02 UTC in reply to "RE[6]: RPC considered harmful"
Brendan Member since:
2005-11-16

PART I

Hi,

"While I can see some similarities between this and asynchronous messaging, there's also significant differences; including the overhead of creating (and eventually destroying) threads, which (in my experience) is the third most expensive operation microkernels do (after creating and destroying processes).

Ah, Brendan, Brendan, how do you always manage to be so kind and helpful with people who play with OSdeving? Do you teach it in real life or something?

Anyway, have you pushed your investigation far enough to know which step of the thread creation process is expensive? Maybe it's something whose impact can be reduced...
"

Thread creation overhead depends on a lot of things; like where the user-space stack is (and if it's pre-allocated by the caller), how kernel stacks are managed (one kernel stack per thread?), how CPU affinity and CPU load balancing works, how much state is saved/restored on thread switches and must be initialised to default values during thread creation (general registers, FPU/MMX/SSE, debug registers, performance monitoring registers, etc), how thread local storage is managed, etc.

For an extremely simple OS (single-CPU only, no support for FPU/MMX/SSE, no "per thread" debugging, no "per thread" performance monitoring, no thread-local storage, no "this thread has used n cycles" time accounting) that uses one kernel stack (e.g. an unpreemptable kernel); if the initial state of a thread's registers is "undefined", and the thread's stack is pre-allocated, then it could be very fast. Not sure anyone would want an OS like that though (maybe for embedded systems).

Also, if other operations that a kernel does are extremely slow then thread creation could be "faster than extremely slow" in comparison.

There's something else here too. For most OSs, typically only a thread within a process can create a thread for that process; which means that at the start of thread creation the CPU/kernel is using the correct process's address space, so it's easier to set up the new thread's stack and thread-local storage. For your IPC this isn't the case (the sending process's address space would be in use at the time you begin creating a thread for the receiving process), so you might need to switch address spaces during thread creation (and blow away TLB entries, etc.) if you don't do it in a "lazy" way (postponing parts of thread creation until the thread first starts running).


"On top of that, (because you can't rely on the queues to serialise access to data structures) programmers would have to rely on something else for reentrancy control; like traditional locking, which is error-prone (lots of programmers find it "hard" and/or screw it up) and adds extra overhead (e.g. mutexes with implied task switches when under lock contention).

This has been pointed out by Alfman, and solved by introducing an asynchronous operating mode where pending threads are queued and run one after the other. Sorry for not mentioning it in the post where I try to describe my model; when I noticed the omission it was already too late to edit.
"

Hehe. Let's optimise the implementation of this!

You could speed it up by having a per-process "thread cache". Rather than actually destroying a thread, you could pretend to destroy it and put it into a "disused thread pool" instead, and then recycle these existing/disused threads when a new thread needs to be created. To maximise the efficiency of your "disused thread pool" (so you don't have more "disused threads" than necessary), you could create (or pretend to create) the new thread when IPC is being delivered to the receiver and not when IPC is actually sent. To do that you'd need a queue of "pending IPC". That way, for asynchronous operating mode you'd only have a maximum of one thread (per process), where you pretend to destroy it, then recycle it to create a "new" thread, and get the data needed for the "new" thread from the queue of "pending IPC".

Now that it's been optimised, it looks very similar to my "FIFO queue of messages". Instead of calling "get_message()" and blocking until a message is received, you'd be calling "terminate_thread()" and being put into a "disused thread pool" until IPC is received. The only main difference (other than terminology) is that you'd still be implicitly creating one initial thread.
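The recycling idea can be sketched with the threads themselves elided; the "recycled thread" below is just a loop over the pending-IPC queue, which is exactly the get_message() loop in disguise (all names and the queue layout are invented for the example):

```c
#include <assert.h>

#define MAX_PENDING 8

/* Per-process state: a queue of "pending IPC" plus counters to observe
 * what the recycled thread did. */
typedef struct {
    int pending[MAX_PENDING];
    int head, tail;   /* monotonically increasing; indexed modulo capacity */
    int handled;      /* messages handled by the recycled thread */
    int last;         /* payload of the last message handled */
} process;

static void queue_ipc(process *p, int payload) {
    p->pending[p->tail++ % MAX_PENDING] = payload;
}

/* The recycled "pop-up thread": created for the first message, then --
 * instead of being destroyed by terminate_thread() -- it pulls the next
 * pending IPC and runs the handler again. */
static void run_recycled_thread(process *p) {
    while (p->head < p->tail) {
        int msg = p->pending[p->head++ % MAX_PENDING];
        p->handled++;   /* the message handler body would run here */
        p->last = msg;
    }
    /* queue empty: the thread parks in the "disused thread pool" */
}
```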

[Continued in Part II - silly 8000 character limit...]

Reply Score: 2

RE[2]: RPC considered harmful
by Kaj-de-Vos on Mon 30th May 2011 12:29 UTC in reply to "RE: RPC considered harmful"
Kaj-de-Vos Member since:
2010-06-09

You're talking about the transport method. That is indeed the other side of the coin. I have been talking about the problem that RPC implies an inflexible semantic data exchange (the payload). You're right that RPC also implies an inflexible transport method.

Reply Score: 1

Comment by Kaj-de-Vos
by Kaj-de-Vos on Sun 29th May 2011 13:20 UTC
Kaj-de-Vos
Member since:
2010-06-09

I'm not really interested in spending a lot of time discussing this, sorry. You asked for warnings, and this is mine. We could argue endlessly about the details, but it boils down to this: the abstraction level of declarative messaging is higher than that of RPC. Leaking implementation details is detrimental to interfacing with other hardware architectures, binding with other languages, and interfacing with historical versions of interfaces in the same language and on the same hardware architecture. Therefore, a higher abstraction level is desirable.

Reply Score: 1

RE: Comment by Kaj-de-Vos
by Alfman on Sun 29th May 2011 19:17 UTC in reply to "Comment by Kaj-de-Vos"
Alfman Member since:
2011-01-28

Kaj-de-Vos,

"Leaking of implementation details is detrimental to interfacing with other hardware architectures"

I understand all your initial criticisms; however, I'm curious how an RPC interface leads to leaking of implementation details.

CORBA interfaces are completely portable across many languages/platforms, including scripting languages.

Heck, just using CORBA itself would provide instant RPC compatibility with almost all serious languages out there.

If CORBA is too heavyweight to use in the OS, one could still provide an OS binding for it; that might even be a novel feature for the OS.

Reply Score: 2

RE[2]: Comment by Kaj-de-Vos
by Neolander on Sun 29th May 2011 19:33 UTC in reply to "RE: Comment by Kaj-de-Vos"
Neolander Member since:
2010-03-08

If you understood his criticism, could you please answer my questions? Or at least some of them? I still don't get what his problem is, and it seems that he isn't interested in answering...

Edited 2011-05-29 19:37 UTC

Reply Score: 1

RE[3]: Comment by Kaj-de-Vos
by Kaj-de-Vos on Sun 29th May 2011 19:43 UTC in reply to "RE[2]: Comment by Kaj-de-Vos"
Kaj-de-Vos Member since:
2010-06-09

I will, but I got the feeling that you are not ready to accept the criticism you asked for. The overarching problem here is that most of the world is in mental models that consist of code instead of data, and thus code calls instead of semantic interchange, and thus implementation details of how to do something, instead of what to do. It turns out to be hard for people to switch models, so I have stopped trying over time.

Reply Score: 1

RE[2]: Comment by Kaj-de-Vos
by Kaj-de-Vos on Sun 29th May 2011 19:38 UTC in reply to "RE: Comment by Kaj-de-Vos"
Kaj-de-Vos Member since:
2010-06-09

In RPC, you assume that the remote end has a procedure you can call. That's a big assumption. To make it work, you assume that the remote procedure is written in the same programming language. That's a huge implementation "detail".

Remote objects are just an object oriented extension of the RPC concept. They were en vogue in the nineties, when everybody switched to remote architectures. This was when CORBA and other such methods were found to be too cumbersome.

Messaging has a long history, really. These lessons were already learned in AmigaOS, BeOS, Syllable and new messaging systems such as 0MQ. You can also ask yourself what the most successful remote protocol does. Is HTTP/HTML RPC based?

Reply Score: 1

RE[3]: Comment by Kaj-de-Vos
by Neolander on Sun 29th May 2011 20:02 UTC in reply to "RE[2]: Comment by Kaj-de-Vos"
Neolander Member since:
2010-03-08

Well, this looks like the beginning of an answer, so if you allow me...

In RPC, you assume that the remote end has a procedure you can call. That's a big assumption.

At the core, we have this: a daemon process wants to inform the kernel that there's a range of things it can do for other processes. The procedure/function abstraction sounded like the simplest one to build around that "things which it can do" concept.

To make it work, you assume that the remote procedure is written in the same programming language. That's a huge implementation "detail".

Hmmm... Can you mention a modern, serious programming language (joke languages like BF don't count) that does not have the concepts of a function or a pointer? Because once those concepts are there, dealing with the switch from one language to another during a call is just a matter of gory implementation magic.

Messaging has a long history, really. These lessons were already learned in AmigaOS, BeOS, Syllable and new messaging systems such as 0MQ. You can also ask yourself what the most successful remote protocol does. Is HTTP/HTML RPC based?

I'd prefer it if we didn't bring the notions of long history and success into this. DOS has a long history, Windows is successful. Does that mean these are examples which everyone in the OS development community would like to follow?

Edited 2011-05-29 20:05 UTC

Reply Score: 1

RE[4]: Comment by Kaj-de-Vos
by Kaj-de-Vos on Sun 29th May 2011 20:21 UTC in reply to "RE[3]: Comment by Kaj-de-Vos"
Kaj-de-Vos Member since:
2010-06-09

You keep defending your existing notions, instead of entertaining the notion I introduced that is apparently new to you. Do you agree that declarative data is at a higher abstraction level than a procedure call? Do you agree that not specifying an implementation language is simpler than specifying a specific language?

If you are not willing to look at common implementations, lessons from history become meaningless, either good or bad. Do you have experience with messaging in Amiga, BeOS, Syllable, microkernels, REBOL, enterprise messaging, or anything else?

Reply Score: 1

RE[5]: Comment by Kaj-de-Vos
by Neolander on Sun 29th May 2011 20:38 UTC in reply to "RE[4]: Comment by Kaj-de-Vos"
Neolander Member since:
2010-03-08

You keep defending your existing notions, instead of entertaining the notion I introduced that is apparently new to you.

I work this way. If you want to prove that your way of thinking is better than mine, you have to expose clearly what's wrong with my way of thinking. Alfman did this successfully when defending the async model vs the threaded model, and as a result async now has more room in my IPC model.

Do you agree that declarative data is at a higher abstraction level than a procedure call?

Define "declarative data"; Google and Wikipedia have no idea what it is, and neither do I ;)

Do you agree that not specifying an implementation language is simpler than specifying a specific language?

Simpler? Oh, certainly not, if you consider the whole development cycle. The higher-level an abstraction is, the more complicated working with it tends to be, as soon as you stray from the path drawn for you by the abstraction's designer and have to think about what the abstraction actually is (which is, say, the case when implementing it).

As an example, when explaining sorting algorithms, it is common to draw sketches that implicitly represent lists (packs of data with an "insert" operation). Now, try to visually represent sorting in an abstract storage area that may just as well be a list or an array. How hard is that?

As another example, which programming abstraction is easier to explain to someone who has no programming knowledge: a function or an object?

If you are not willing to look at common implementations, lessons from history become meaningless, either good or bad. Do you have experience with messaging in Amiga, BeOS, Syllable, microkernels, REBOL, enterprise messaging, or anything else?

I'm not sure what it is that you're calling messaging, actually. Are you talking about the concept of processes communicating by sending data to each other (pipes), the idea of communicating over such a data link with a messaging protocol (like HTTP), ...?

Edited 2011-05-29 20:48 UTC

Reply Score: 1

RE[6]: Comment by Kaj-de-Vos
by Kaj-de-Vos on Sun 29th May 2011 21:01 UTC in reply to "RE[5]: Comment by Kaj-de-Vos"
Kaj-de-Vos Member since:
2010-06-09

I don't have to convince you. You asked for criticisms that would be useful to you. If you don't consider what you requested, it won't be useful to you. It seems my impression was right that you don't understand the concept of messaging, and it would take me a lot of time and energy to change your mental model.

Reply Score: 1

RE[6]: Comment by Kaj-de-Vos
by xiaokj on Sun 29th May 2011 21:36 UTC in reply to "RE[5]: Comment by Kaj-de-Vos"
xiaokj Member since:
2005-06-30

Define "declarative data"; Google and Wikipedia have no idea what it is, and neither do I ;)

Let me help, whatever I can, here. If, and that is a very big "if", I am correct, he is referring to something really esoteric. It should be a design philosophy coming straight out of things like "Art of Unix Programming".

Apparently, he is trying to tell you that there is a much more abstract way to deal with stuff than RPC. To work with RPC, you need to define the function name and its accepted parameters, and that is then set in stone. With declarative data, what you would do instead is have the library export a datasheet of "what can I do" and, when you pick a specific function, "what options are there", complete with version numbers. Preferably XML. Then the clients can make do with whatever is provided.
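To make that concrete, here is a minimal Python sketch of a service exporting such a "what can I do" sheet. All the names (describe, the dict layout, the Malloc/AllocatedSize options borrowed from this thread) are invented for illustration; this is not a real API.

```python
# A service publishes a capability sheet instead of a compiled-in signature.
SERVICE_SHEET = {
    "service": "Malloc",
    "version": 1,
    "options": {
        "AllocatedSize": {"type": "int", "default": 0},
        "Zeroed": {"type": "bool", "default": False},
    },
}

def describe():
    """Clients fetch this at run time and adapt to whatever is offered."""
    return SERVICE_SHEET

# A client inspects the sheet before building its request:
sheet = describe()
if "AllocatedSize" in sheet["options"]:
    request = {"service": "Malloc", "AllocatedSize": 64}
```

The point is that the client discovers options by name at run time, so the service can grow new options without the client's copy of the interface going stale.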

The benefit of this is that major changes can be made much more easily than before. However, there is a major downside too: it is much harder to code in that form. The benefits tend to pay off over the long run, but still.

The main point of doing things like this, other than the obviously stated one, is that it makes you get used to declarative data structures. They, on the other hand, make much more sense! As the Art of Unix Programming notes, the human mind is a lot better at tackling complex data than complex code flows. Declarative data structures push the complexity into the data side, so that the overall code becomes a lot cleaner, and it is in there that the benefits can be most easily reaped.

Take the pic language for example. It is a lot easier to declare that you want a rectangle of a certain size, and that its top-left (NW) corner is connected to an arrow that points to a circle of radius so and so. The code then takes care of the rest. These kinds of descriptions tend to stay sane even with extreme longevity, whereas if you tried to define things by coordinates, sooner or later your API would be replaced, for such simplistic APIs are a dime a dozen. Declarative programming is something like that, and it is really time-saving.

I hope I have correctly captured his idea. I don't know anything, actually, so take some salt with this.

Reply Score: 2

RE[7]: Comment by Kaj-de-Vos
by Kaj-de-Vos on Sun 29th May 2011 21:50 UTC in reply to "RE[6]: Comment by Kaj-de-Vos"
Kaj-de-Vos Member since:
2010-06-09

That's pretty good, except:

- It's not esoteric, but widely used. Hence my example of HTML.

- I do not prefer XML. It has become a reflex for people to come up with that immediately, but like RPC, it's an implementation detail. Actually, I think XML is way too heavy.

- Specification sheets (such as DTDs) are not strictly necessary. This is also an implementation detail. A metadata specification is required if all the world needs to know the meaning of the data, but most interfaces are between parties that know each other and don't need to be understood by parties that have no preexisting knowledge of the interface.

- Therefore, there are no inherent drawbacks of difficult implementation. It can be as simple as you make it.

Reply Score: 1

RE[7]: Comment by Kaj-de-Vos
by Neolander on Mon 30th May 2011 05:11 UTC in reply to "RE[6]: Comment by Kaj-de-Vos"
Neolander Member since:
2010-03-08

Oh, alright, now I see better where this is going.

It would be a bit like using objects for making calls (yeah, yeah, I know, implementation details and stuff).

A malloc implementation could be described like...

//This is written in the PidginObject language
service Malloc [
    option AllocatedSize
]

And for the caller, it'd be like...

mymalloc = setup_service(Malloc)
mymalloc.setproperty(AllocatedSize = <whatever>)
call_service(mymalloc)

...or if we're a messaging freak...

send_message(daemon, "run_service Malloc with option AllocatedSize = <whatever>, option SaveWilly = no, etc...")

Actually, I plan to use something like that for UI description.

It has at least the following benefits :

-> You can indeed use language-agnostic headers (like XML or CSS). More precisely, you'd use headers written in your own language.
-> The order in which you put function parameters doesn't matter. That means you can change one default parameter without redefining all the others "before" it, since there is no such concept
-> You can use a consistent data description language for services and UIs, settings, etc...
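The second point can be sketched in a few lines of Python (all names here are invented for illustration, echoing the thread's Malloc example):

```python
def run_service(handler, service, **options):
    # Options travel as a name -> value mapping, so the caller never
    # depends on any declaration order.
    return handler({"service": service, "options": options})

def malloc_daemon(message):
    opts = message["options"]
    # Unspecified options fall back to defaults; unknown ones are ignored.
    return bytearray(opts.get("AllocatedSize", 0))

# These two calls are identical; option order is irrelevant:
a = run_service(malloc_daemon, "Malloc", AllocatedSize=16, SaveWilly=False)
b = run_service(malloc_daemon, "Malloc", SaveWilly=False, AllocatedSize=16)
assert a == b
```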

There are some tricks worth pointing out, though.

First, a significant header-parsing overhead is incurred each time a functionality is called, not only when it is declared. This could be quite problematic for low-level stuff that has to be fast.

If you want to fully implement your own communication protocol, and not use that of an existing programming language, then you have to not only write the function prototypes in your new language, but also describe the data with it. Now this is a tricky one. In C, everything can be described in terms of blocks of memory with integers and pointers inside. But here, if you want to do things cleanly using your own specification, you need to create a syntax for arrays, a syntax for objects, a syntax for strings, a syntax for numbers, etc... one for each data abstraction which you want people to be able to use.

What this means is that you'll have to code a data translator that is about as complex as a modern compiler, and accept a great data-conversion overhead, akin to that of heterogeneous OSs written in different languages and running on different architectures communicating over a network, except that it'll occur all the time: even when you remain on a local machine, running a single architecture, and making calls between programs written in the same language. You do not optimize for the common case.

Astonishingly enough, this does not solve the compatibility problem.

The classical compatibility issue is that functions can gain parameters, but they cannot change name, rename parameters, reorder parameters, or lose parameters.

Here, the object replacing our functions cannot change name either (otherwise processes looking for that service under the old name won't find it). Parameters can't get a different name or disappear for the same reason (programs coded for an old version of the service wouldn't work). So basically, all we can do is change the order in which parameters are written.
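That limitation is easy to demonstrate in a small Python sketch (service_v1, service_v2, and the option names are all hypothetical): adding an option keeps old callers working, while renaming one silently breaks them.

```python
def service_v1(options):
    # Original interface: one option, looked up by name.
    return options.get("size", 0)

def service_v2(options):
    # v2 gained "alignment"; old callers that never send it still work.
    return options.get("size", 0) + options.get("alignment", 0)

old_message = {"size": 32}            # written against v1
assert service_v1(old_message) == 32
assert service_v2(old_message) == 32  # still works after the upgrade

renamed = {"length": 32}              # "size" renamed to "length"
assert service_v2(renamed) == 0       # old meaning silently lost -> breakage
```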

My question is, is it worth the performance hit of going back and forth through an intermediate representation each time a call is made? Is it worth the bloat and security risk of having the translator around each time something as common as a procedure call is made? Is it worth the extreme coding complexity of that translator, and the lost comfort of being able to make a large number of assumptions about the language being used? How about just writing function parameters in the right order the first time?

Edited 2011-05-30 05:15 UTC

Reply Score: 1

RE[3]: Comment by Kaj-de-Vos
by Alfman on Sun 29th May 2011 20:20 UTC in reply to "RE[2]: Comment by Kaj-de-Vos"
Alfman Member since:
2011-01-28

"In RPC, you assume that the remote end has a procedure you can call."

Well, that's a given, but we're talking semantics here. Whether you're talking about DOS interrupts, Linux syscalls, or vector calls, we're still technically calling a "procedure".

I guess you are referring to different mechanisms for parameter passing?

It's true there are different incompatible calling conventions (for example __cdecl or __stdcall), and these may even have subtle differences from platform to platform (passing floating point values in ST0 instead of on the stack). But these are strictly binary differences; all models are compatible at the source level - I just need to recompile.

"That's a big assumption. To make it work, you assume that the remote procedure is written in the same programming language."

Why did you ignore my counter example? In any case, this is no different than windows or linux being written around C callers.

"That's a huge implementation 'detail'."

Exactly, it's an implementation detail which end users rarely if ever need to concern themselves with. People don't need to know the calling conventions of their platforms to be able to write code.

Reply Score: 2

RE[4]: Comment by Kaj-de-Vos
by Kaj-de-Vos on Sun 29th May 2011 20:30 UTC in reply to "RE[3]: Comment by Kaj-de-Vos"
Kaj-de-Vos Member since:
2010-06-09

All the things you talk about are procedure calls. If you never consider the alternative of declarative messaging, you won't see the difference.

Reply Score: 1

Comment by Kasi
by Kasi on Sun 29th May 2011 21:53 UTC
Kasi
Member since:
2008-07-12

Hey Kaj,

I'm hitting a bit of a wall with google in looking for information on working with a declarative data model. Can you point me to a book or other source so I can read on my own?

Reply Score: 1

RE: Comment by Kasi
by Kaj-de-Vos on Sun 29th May 2011 22:04 UTC in reply to "Comment by Kasi"
Kaj-de-Vos Member since:
2010-06-09

Hmm, this is such a general concept that I don't know of any specific texts just about that topic. The previous poster said it is treated in ESR's hacker's bible, so that would be a good example. I learned it over the years in several of the systems I mentioned. Especially the REBOL language is excellent to form your mental model, because it implements this concept very purely, fundamentally and pervasively.

There are also many overlapping concepts, such as data driven programming, table driven programming, template oriented programming and modeling and markup languages, which are often different names for basically the same thing. Such concepts have sections on Wikipedia.

Reply Score: 1