Linked by Dennis Heuer on Wed 25th Aug 2010 22:23 UTC
Linux
I came across a news entry at Phoronix about a new init replacement, systemd, and out of curiosity started reading into this surprisingly heavy subject. Systemd is by no means as simple as upstart. It does far more things, far more directly and in more detail. The differences are so significant that they call for quite different configuration strategies. One can argue for either, depending on the goal. However, that's not what I want to write about. After having read what systemd is capable of, and how it does it, I began to question the existence of all system daemons in their present form.
Systemd doesn't quite work like that
by Zifre on Thu 26th Aug 2010 00:27 UTC
Zifre
Member since:
2009-10-04

Many of the arguments in the article are based on an incorrect understanding of how SystemD works. Ultimately, an auditing system would not be capable of most of the things that SystemD does (unless it were made into a rampant layering violation ;) ).

Systemd observes the full system actively. It observes paths to be informed if a program attempts to access a socket. It observes hardware events to be informed if a resource is available. It even observes mountpoints to dynamically mount resources at access.

This is not completely true. SystemD doesn't have to actively observe and act upon any of the things that you mentioned. Sockets are created ahead of time, and SystemD leaves them alone (the kernel buffers the data). Hardware events are mainly observed by Udev (SystemD has very little hardware logic). And mount points are handled by AutoFS, also in the kernel.
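
To illustrate the point about sockets being created ahead of time while the kernel buffers the data, here is a minimal Python sketch. It is not systemd code: the function name `demo_early_connect` and the use of a loopback TCP socket are assumptions made purely for the illustration. It only shows that a client can connect and write before any service has called `accept()`, and that the data is still there when the service finally shows up.

```python
import socket


def demo_early_connect() -> bytes:
    """Show that data sent before accept() is queued by the kernel."""
    # The "supervisor" creates and listens on the socket before any
    # service exists (hypothetical stand-in for socket activation).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(8)
    port = listener.getsockname()[1]

    # A client connects and writes while nobody has called accept() yet.
    client = socket.create_connection(("127.0.0.1", port))
    client.sendall(b"hello early")
    client.close()

    # Only now does the "service" start and take over the listening socket.
    conn, _ = listener.accept()
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:  # empty read: the client closed its end
            break
        chunks.append(data)
    conn.close()
    listener.close()
    return b"".join(chunks)


if __name__ == "__main__":
    print(demo_early_connect())
```

The client never notices that the service was not running when it connected; it would only block if it waited for a reply.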

More than that, it is principally capable of doing things a userspace supervisor can't, like stopping an action from happening or pausing it until something else is done. For example, if systemd recognizes a mountpoint access, it can mount the resource immediately. But, is that quick enough? Systemd has no influence on the accessing process and thus can't turn it into sleep until the mount happened.

Actually, SystemD can and does do this. It sets up AutoFS mounts. Any access will cause the process to block until the real file system is mounted. An auditing system would not be able to do this any better than SystemD can.

There are other obstacles systemd faces like described in this entry. It goes like inotify is worse because of no atomic access and thus creating racy conditions but found a weird workaround.

An auditing system would not be able to fix this problem. What is needed is a transactional file system. I am actually working on a transactional file system layer for the Linux kernel (about which I may write an article for OSNews someday ;) ).

Edited 2010-08-26 00:27 UTC

Reply Score: 5

ciplogic Member since:
2006-12-22

Fully agree. SystemD is about integration and fairly basic monitoring. The fact that a subsystem is restarted automatically if it crashes does not mean that all systems (daemons, services, whatever) are obsolete and the rest is "the service". A daemon that automatically restarts the X Server if it crashes, or if the user is in graphical mode, does not make the X Server obsolete.
So the author does get that SystemD will apply better logic to restarting services and so on, but the final conclusion is wrong.

Reply Score: 2

the_author Member since:
2010-08-26

Ultimately, an auditing system would not be capable of most of the things that SystemD does (unless it were made into a rampant layering violation ;) )

i believe that SELinux is such a rampant layering violation. it even spreads into user-space libraries and tools. but the real fact SELinux proves is that auditing systems are meant to become omnipotent. so, can you safely state that an auditing system will _not_ implement everything systemd is dreaming of into its observing code?
SystemD doesn't have to actively observe and act upon any of the things that you mentioned. Sockets are created ahead of time, and SystemD leaves them alone (the kernel buffers the data). Hardware events are mainly observed by Udev (SystemD has very little hardware logic). And mount points are handled by AutoFS, also in the kernel.

this rather sounds like you agree to the simple truth that things are better done inside the kernel, and supervisors should only feed and serve. i conclude from this that systemd is quite working on the same layer/interface for just everything inside the kernel as the auditing system, and there _is_, as a result, doubled core functionality.
Actually, SystemD can and does do this. It sets up AutoFS mounts. Any access will cause the process to block until the real file system is mounted. An auditing system would not be able to do this any better than SystemD can.

again, my article targets the cores of the implementations, the observing parts. if the auditing system can do this in the kernel, why we need it another time outside the kernel? in other words, if an auditing system can't do it _better_ than systemd, does that justify a layer in userspace? where should the generic observing interface reside, and how should userspace daemons settle on it? that is my question.
I am actually working on a transactional file system layer for the Linux kernel (about which I may write an article for OSNews someday ;) ).

this is interesting. could you please tell how it shall act (inside the kernel) and why an auditing system is not interested in it?

Reply Score: 1

Zifre Member since:
2009-10-04

it even spreads into user-space libraries and tools.

That is just to configure it. There is really no way to do that without user space tools. But yeah, I don't like SELinux very much... it's way too complicated.

so, can you safely state that an auditing system will _not_ implement everything systemd is dreaming of into its observing code?

Yes, actually. I highly doubt Linux would ever let an auditing system launch arbitrary daemons. And that's because it wouldn't make any sense. The old uevent helper system proved that it's always better to let user space launch things.

i conclude from this that systemd is quite working on the same layer/interface for just everything inside the kernel as the auditing system, and there _is_, as a result, doubled core functionality.

There is absolutely no duplicated functionality. None of the things that SystemD does with the kernel are done by the auditing system, and vice versa. The only possible thing I can think of would be that an auditing system could do the job of AutoFS. But that would be a really bad idea. AutoFS is much better for that purpose.

why we need it another time outside the kernel?

It's not outside the kernel. AutoFS is part of the Linux kernel. The reason that SystemD has to set up the AutoFS mounts rather than the kernel is that the kernel has no business reading configuration files. Policy decisions belong in user space.

where should the generic observing interface reside, and how should userspace daemons settle on it? that is my question.

The "generic observing system" is the auditing system. There is really little reason for observation of processes other than for security or debugging.

this is interesting. could you please tell how it shall act (inside the kernel) and why an auditing system is not interested in it?

A transactional file system would allow programs to have a consistent snapshot of the file system. An entire transaction (which could last an indefinite amount of time) is an atomic operation. For example, a package manager could install software in a transaction. Then, if the power goes out, you will not be left with an inconsistent state. The downside is that performance is slightly decreased, and there can be conflicts (e.g. A writes to a file that B is trying to read). Unlike many transaction systems, there is no blocking. Basically, if A reads something in a transaction, and B writes to that thing in a transaction, the transaction with the lower priority is terminated. Individual, normal file operations are treated as transactions with infinite priority, so normal programs never have to worry about the transaction system. If an auditing system were to maintain all this logic, it would be a huge layering violation.
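
The conflict rule described above can be sketched as a small in-memory toy model in Python. This is not the kernel layer being discussed, which is not shown in this thread. The names `Store`, `Txn`, `Aborted` and `plain_write` are hypothetical, and the tie-break (on equal priority the transaction making the request loses) is an assumption, since the comment does not say what happens on a tie. What the sketch does take from the comment: nothing ever blocks, a read/write conflict terminates the lower-priority transaction, and a normal file operation behaves like a transaction with infinite priority.

```python
import math


class Aborted(Exception):
    """Raised when a terminated transaction is used or loses a conflict."""


class Txn:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority
        self.reads = set()   # keys this transaction has read
        self.writes = {}     # key -> value, private until commit
        self.alive = True


class Store:
    def __init__(self):
        self.data = {}       # committed state
        self.active = []     # transactions that are still running

    def begin(self, name, priority):
        txn = Txn(name, priority)
        self.active.append(txn)
        return txn

    def _terminate(self, txn):
        txn.alive = False
        if txn in self.active:
            self.active.remove(txn)

    def _settle(self, txn, key, against_reads):
        # Never wait: on a conflict, the lower priority is terminated.
        for other in list(self.active):
            if other is txn:
                continue
            clash = key in other.writes or (against_reads and key in other.reads)
            if not clash:
                continue
            if other.priority < txn.priority:
                self._terminate(other)
            else:
                # Assumption: on equal priority the requester loses.
                self._terminate(txn)
                raise Aborted(txn.name)

    def read(self, txn, key):
        if not txn.alive:
            raise Aborted(txn.name)
        self._settle(txn, key, against_reads=False)
        txn.reads.add(key)
        return txn.writes.get(key, self.data.get(key))

    def write(self, txn, key, value):
        if not txn.alive:
            raise Aborted(txn.name)
        self._settle(txn, key, against_reads=True)
        txn.writes[key] = value

    def commit(self, txn):
        if not txn.alive:
            raise Aborted(txn.name)
        self.data.update(txn.writes)
        self.active.remove(txn)
        txn.alive = False

    def plain_write(self, key, value):
        # A normal file operation: a one-shot transaction with infinite priority.
        op = self.begin("plain", math.inf)
        self.write(op, key, value)
        self.commit(op)


if __name__ == "__main__":
    store = Store()
    a = store.begin("A", priority=1)
    b = store.begin("B", priority=5)
    store.read(a, "file")           # A reads
    store.write(b, "file", "new")   # B writes: A has lower priority and is terminated
    store.commit(b)
    print(a.alive, store.data)
```

A package manager in this model would simply retry its whole transaction after an `Aborted`, while ordinary programs calling `plain_write` never see the transaction system at all.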

Reply Score: 2

the_author Member since:
2010-08-26

"it even spreads into user-space libraries and tools.

That is just to configure it. There is really no way to do that without user space tools.
"
there are auditing systems that don't taint coreutils, for example ;)

I highly doubt Linux would ever let an auditing system launch arbitrary daemons. And that's because it wouldn't make any sense.
(...)
There is absolutely no duplicated functionality. None of the things that SystemD does with the kernel are done by the auditing system, and vice versa. The only possible thing I can think of would be that an auditing system could do the job of AutoFS. But that would be a really bad idea. AutoFS is much better for that purpose.
(...)
The "generic observing system" is the auditing system. There is really little reason for observation of processes other than for security or debugging.

you seem to get my article wrong. possibly the term "to observe" creates such a strong association with the auditing system that you think they are the same. but, please, take a bird's-eye view and look over the kernel landscape. you will see that, even if systemd does not observe by itself, at some point in the chain there is an observer, because otherwise there would be no action on events. you might rather call this event handling or the like, but "to observe" is used fully correctly here in terms of the english language. think of someone observing the stars. yes, in many cases the observation can be settled very deep in the kernel internals. but that is a different matter. in any case there must be observation of events, and there is always a reason why.

this reason may be defined far outside the kernel in a user script. but the job ticket must get all the way down to the observing unit, being mangled and translated several times on the way. that is how it is, and both the auditing system and systemd somehow need to create such tickets for an observer, or even create an observer themselves, depending on the kernel interfaces they hook into.

besides that - and here we come to what my article is about - both create a system to parse rules, to create types (structs) of contexts, to pass these contexts as tickets, etc. think in terms of structures. many programs re-invent structures for the very same purpose: scanning rules to create internal contexts, typing them, registering them at the correct interfaces and binding them to chains, an event handler, or whatever.

this generic way of doing things is what i mean. this is what the framework could encapsulate and offer in a way that both the auditing system and systemd, but also udevd and other services, can profit from. i could write my own rules in guile and register them via an ffi, circumventing systemd. but systemd would be notified about the change and could update its state or possibly act against my script - possibly via the auditor.

possibly you now see that what i am aiming at is a more generic, say, job center for kernel observation or instruction, one that provides principles for simple job creation and allows for even more flexibility because it is accessible arbitrarily and even concurrently, managing the states for the listeners and feeders.

Reply Score: 1

Comment by mtzmtulivu
by mtzmtulivu on Thu 26th Aug 2010 03:38 UTC
mtzmtulivu
Member since:
2006-11-14

This article talks about upstart and systemd as competing solutions to a problem but did not identify the problem. What is the problem with the current linux init system that necessitated creation of these two new systems?

Reply Score: 3

RE: Comment by mtzmtulivu
by vivainio on Thu 26th Aug 2010 05:23 UTC in reply to "Comment by mtzmtulivu"
vivainio Member since:
2008-12-26

What is the problem with the current linux init system that necessitated creation of these two new systems?


It's slow.

Reply Score: 2

RE[2]: Comment by mtzmtulivu
by sorpigal on Thu 26th Aug 2010 16:49 UTC in reply to "RE: Comment by mtzmtulivu"
sorpigal Member since:
2005-11-02

If that were the only problem then any one of the init replacements created in the last 15 years would be an improvement.

Speed is secondary. An init replacement primarily needs to solve initialization sequencing. Building an init sequence in which the appropriate things are started at appropriate times, and not before other things that may be needed first, is a highly non-trivial process. Upstart and systemd try to solve this problem, and the different approaches define, more than anything else, the differences between the systems.

After that there are some nice-to-have things which are lacking on Linux. Here I primarily mean service control; it's embarrassing that Windows does this better (yes, better). Both upstart and systemd try to address this in fairly similar ways.

Both (but systemd in particular) do other things, of course, which I consider nonessential but still worthwhile and improvements on current systems. I have to give a great deal of credit to Lennart for not trying to solve just one tiny technical problem but aiming for a holistic approach, while still not greatly violating the *nix philosophy.

Reply Score: 2

Bill Shooter of Bul Member since:
2006-07-14

But init already does the sequencing correctly, in a linear fashion. That's not too difficult at all.

But linear is slow. We have multi-core CPUs now. Booting would be faster if we loaded things in parallel. OK, but what can we load in parallel with what, and what has to remain serialized? That's the complexity of the sequencing. It's complex because of the parallelization, which in turn is driven by the need for speed.

Reply Score: 3

RE[4]: Comment by mtzmtulivu
by sorpigal on Thu 26th Aug 2010 17:33 UTC in reply to "RE[3]: Comment by mtzmtulivu"
sorpigal Member since:
2005-11-02

Linear is not 'easy' - linear is hard. Nonlinear is harder, but linear is not trivial! You have a large, unknown set of things to run which must be run in a particular order. What order? If you know you have a good order and want to insert a new item to run, where in the sequence does it go? Can you *safely* alter the order by inserting this new item and can you be sure that doing so does not break anything?

For simple things it's pretty easy to just "drop it in" and hope it will be fine, but there are many non-simple things. Does your ldap daemon need to be started before your remote filesystems are mounted? What if your /home is mounted via nfs and all users are stored in ldap? How do you know which order to load things in and how do you re-order it when it changes? Even in a purely linear situation this is not simple. If things work well today it's because of luck and careful engineering over many years.

It's a management nightmare which only becomes worse over time. Designing a system that works on purpose, instead of accidentally, is a worthwhile effort and a tricky problem.

Being faster is nice, sure, but that's not really a problem that needs to be solved, it's just a nice side effect. Once we can figure out sequencing properly we can get parallelism "for free" and thus some speedups. But no, it's not a goal.
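
The claim that correct sequencing gives parallelism "for free" can be sketched with a plain dependency sort in Python. This is a generic illustration, not how upstart or systemd actually resolve ordering. The function name `boot_batches` and the unit names (`network`, `ldap`, `nfs-home`, `syslog`, `login`) are hypothetical and loosely based on the ldap/nfs example above. Each unit declares what it needs; the function groups units into batches so that everything in one batch depends only on earlier batches and could therefore be started concurrently.

```python
def boot_batches(requires):
    """Group units into start-up batches.

    `requires` maps each unit to the units that must be up before it.
    Units in the same batch have no ordering constraint between them.
    """
    pending = {unit: set(deps) for unit, deps in requires.items()}
    # Units that are only mentioned as a dependency have no requirements.
    for deps in requires.values():
        for dep in deps:
            pending.setdefault(dep, set())

    batches = []
    while pending:
        ready = sorted(unit for unit, deps in pending.items() if not deps)
        if not ready:
            # Nothing can start: the remaining units wait on each other.
            raise ValueError("dependency cycle among: " + ", ".join(sorted(pending)))
        batches.append(ready)
        for unit in ready:
            del pending[unit]
        for deps in pending.values():
            deps.difference_update(ready)
    return batches


if __name__ == "__main__":
    units = {
        "network": [],
        "syslog": [],
        "ldap": ["network"],
        "nfs-home": ["network", "ldap"],
        "login": ["nfs-home", "ldap", "syslog"],
    }
    for number, batch in enumerate(boot_batches(units), start=1):
        print(number, batch)
```

Flattening the batches gives one valid linear order, and inserting a new unit only requires stating what it needs rather than guessing a position in the sequence. A cycle shows up as an explicit error instead of a boot that happens to work.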

Reply Score: 3

RE[5]: Comment by mtzmtulivu
by snadrus on Mon 30th Aug 2010 14:53 UTC in reply to "RE[4]: Comment by mtzmtulivu"
snadrus Member since:
2010-05-04

That's why I think SystemD has a chance: if you mess up the ordering, you get a process waiting for the kernel to complete the socket/port/pipe/filesystem connection. In the current topology, you get a hidden failure.

Reply Score: 1

Isn't that just...
by TheGZeus on Thu 26th Aug 2010 03:45 UTC
TheGZeus
Member since:
2010-05-19

...a microkernel/set of servers?
Except with a larger-than-normal kernel.

Reply Score: 2

RE: Isn't that just...
by Lennie on Thu 26th Aug 2010 06:39 UTC in reply to "Isn't that just..."
Lennie Member since:
2007-09-22

Probably is, but shhh don't tell anyone. :-)

But really, a microkernel would also separate filesystems and drivers, which is (or was at the time Linux was created) slower than doing it all in the same memory space.

so no, not in the strict sense.

Reply Score: 2

RE[2]: Isn't that just...
by snadrus on Mon 30th Aug 2010 14:48 UTC in reply to "RE: Isn't that just..."
snadrus Member since:
2010-05-04

For FileSystems it's an option (FUSE), and yes it is slower for hard I/O, but the convenience is nice.

Reply Score: 1

I don't mean to sound like a dick but...
by Icaria on Fri 27th Aug 2010 05:42 UTC
Icaria
Member since:
2010-06-19

This article proved incredibly difficult to read, as the English is more than a little mangled. I'm familiar with sysvinit, upstart and systemd but I gave up 1/2 way through paragraph 3, as I just had no idea what I was reading. Sorry.

Reply Score: 1

Author needs to read bugs completely.
by oiaohm on Sat 28th Aug 2010 04:03 UTC
oiaohm
Member since:
2009-05-30

https://bugzilla.redhat.com/show_bug.cgi?id=615527

If you had read this fully, you would have noticed Eric Paris, the lead developer of fanotify, the replacement for inotify and dnotify. fanotify has none of the issues of inotify.

In fact, fanotify allows you to block, accept and delay requests to a file system, something the older inotify and dnotify don't allow. That is why fanotify supports real-time virus scanning and auditing from user space.

Not all forms of auditing can be done from kernel space. Who in their right mind would run a virus scan in kernel space?

"For example, if systemd recognizes a mountpoint access, it can mount the resource immediately. But, is that quick enough? Systemd has no influence on the accessing process and thus can't turn it into sleep until the mount happened."

This so-called issue can be solved by fanotify delaying its response, which puts the application to sleep. Once fanotify is feature complete, the problem here is solved.

Next, systemd uses cgroups to divide tasks. A full cgroup can also be suspended while waiting for a drive to mount. That is a little over the top; catching the access with fanotify would be far less painful.

There are a few experiments in using fanotify to make file recovery from backup transparent. I.e., when you attempt to access a file that has been sent to backup, the program is delayed until the file is recovered from the backup location and extracted.

systemd is setting itself up to take advantage of the tech that will become available to userspace for auditing over the next 12 months. This really does leave all the other init systems far behind.

cgroup tech is also always expanding. systemd provides many times more control than all the old systems.

Reply Score: 1