Linked by nfeske on Thu 23rd Aug 2012 08:30 UTC
OSNews, Generic OSes
The just-released version 12.08 of the Genode OS Framework comes with the ability to run Genode-based systems on ARM hardware without an underlying kernel, vastly improves support for the NOVA hypervisor, and adds device drivers for the OMAP4 SoC. Further functional additions are an FFAT-based file-system service, a port of the lighttpd web server, and support for on-target debugging via GDB.
RE[3]: Good Progress
by nfeske on Fri 24th Aug 2012 19:48 UTC in reply to "RE[2]: Good Progress"
nfeske
Member since:
2009-05-27

For running Genode on x86 in general, there is no urgent need to have this architecture covered by base-hw. Several other kernels among Genode's supported base platforms support x86 just fine, e.g., NOVA.

Thank you for having taken the time to study the release notes in such detail.

The paragraph you cited refers to the libc. Before the change, the mentioned functions had been mere dummy stubs. Now, they do something meaningful. The lock is local to the process. The kernel doesn't know anything about the lock, nor is it directly involved in handling the actual read/write/lseek operation. Please remember that we are using a microkernel-based architecture where I/O is performed by user-level components rather than the kernel.
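For illustration, here is a minimal C sketch of such an emulation on a POSIX system, assuming pthreads for the process-wide lock. The names emulated_pread and process_io_lock are made up for this sketch, and it is not Genode's actual libc code; it only shows the save/seek/read/restore pattern performed under a single lock that lives inside the process.

```c
#include <pthread.h>
#include <stdio.h>      /* tmpfile(), fileno() for trying it out */
#include <sys/types.h>
#include <unistd.h>

/* One lock for the whole process (hypothetical name). */
static pthread_mutex_t process_io_lock = PTHREAD_MUTEX_INITIALIZER;

/* Emulates pread() on top of lseek() and read(). The lock keeps other
 * threads of this process from moving the file offset between the seek
 * and the read. It does NOT protect against other processes that share
 * the same open file description. */
ssize_t emulated_pread(int fd, void *buf, size_t count, off_t offset)
{
    ssize_t ret = -1;

    pthread_mutex_lock(&process_io_lock);

    off_t saved = lseek(fd, 0, SEEK_CUR);          /* remember current offset */
    if (saved != -1 && lseek(fd, offset, SEEK_SET) != -1) {
        ret = read(fd, buf, count);
        /* Restore the offset: pread must not have a visible side effect. */
        if (lseek(fd, saved, SEEK_SET) == -1)
            ret = -1;
    }

    pthread_mutex_unlock(&process_io_lock);
    return ret;
}
```

Usage: open a file, call emulated_pread(fd, buf, n, off), and the descriptor's offset afterwards is the same as before the call.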

Is one lock for pread/pwrite per process a bottleneck? This is a good question, which is quite hard to answer without having a workload that heavily uses these functions from multiple threads. As long as many processes contend for I/O or the workload is generally bounded by I/O, this is not a scalability issue.

For multi-threaded POSIX applications that call those functions concurrently, however, I agree that the per-process lock could be replaced by a per-file-descriptor lock to improve SMP scalability. I couldn't name such an application off the top of my head, though. Do you have an example that would be worthwhile to investigate? We may change the locking once we see this becoming a real issue rather than a speculative one. Until then, it is just nice to have the functional gap in Genode's libc closed without the risk of introducing race conditions.
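A per-descriptor variant could look roughly like the following sketch. MAX_FDS, fd_locks and pread_per_fd_lock are hypothetical names, and the fixed-size lock table (initialized once via pthread_once) is a simplification. One caveat: descriptors created with dup() share a single seek offset, so a real implementation would need to key the lock on the underlying open file rather than on the descriptor number.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_FDS 64   /* hypothetical upper bound for this sketch */

/* One lock per file descriptor instead of one per process. */
static pthread_mutex_t fd_locks[MAX_FDS];
static pthread_once_t fd_locks_once = PTHREAD_ONCE_INIT;

static void init_fd_locks(void)
{
    for (int i = 0; i < MAX_FDS; i++)
        pthread_mutex_init(&fd_locks[i], NULL);
}

ssize_t pread_per_fd_lock(int fd, void *buf, size_t count, off_t offset)
{
    if (fd < 0 || fd >= MAX_FDS) {
        errno = EBADF;                         /* outside the lock table */
        return -1;
    }

    pthread_once(&fd_locks_once, init_fd_locks);
    pthread_mutex_t *lock = &fd_locks[fd];     /* only this fd is serialized */

    ssize_t ret = -1;
    pthread_mutex_lock(lock);
    off_t saved = lseek(fd, 0, SEEK_CUR);
    if (saved != -1 && lseek(fd, offset, SEEK_SET) != -1) {
        ret = read(fd, buf, count);
        if (lseek(fd, saved, SEEK_SET) == -1)  /* restore the offset */
            ret = -1;
    }
    pthread_mutex_unlock(lock);
    return ret;
}
```

Threads working on different descriptors no longer contend for the same lock, which is the SMP benefit discussed above.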


RE[4]: Good Progress
by Alfman on Sat 25th Aug 2012 14:37 in reply to "RE[3]: Good Progress"
Alfman Member since:
2011-01-28

nfeske,

Like you, I'd have to research it more. But I think an excellent test would be a database engine that doesn't use memory-mapped I/O. I think MySQL is such a database, largely because a 32-bit address space is too small to map big tables. I'm not sure how it works on 64-bit systems, though.

http://doc.51windows.net/mysql/?url=/MySQL/ch07s05.html
"Only compressed MyISAM tables are memory mapped. This is because the 32-bit memory space of 4GB is not large enough for most big tables. When systems with a 64-bit address space become more common, we may add general support for memory mapping."


When you implement pread in libc, does it look something like this?
(Apologies in advance for the spacing bugs... Thom, get that fixed!!)


int pread(...) {
    acquire_process_mutex(...);
    long long pos = lseek(...);
    int ret = read(...);
    lseek(pos); // since pread isn't supposed to have side effects
    free_mutex(...);
    return ret;
}

This makes 3 calls to the file system. Do those functions have their own internal mutexes, such that each pread/pwrite call actually involves 4 mutex cycles in total (instead of the 1 needed by a native pread function)? That would be a lot of synchronization overhead on SMP systems (IMHO).


Also, I think the following example might be able to break the above atomicity:

void uncertainty() {
    char data;
    int handle = open(..., O_WRONLY|O_TRUNC);

    pid_t pid = fork();

    if (pid == 0) {
        data = 1;
        pwrite(handle, &data, sizeof(data), 1);
        _exit(0); // child is done
    } else {
        data = 2;
        pwrite(handle, &data, sizeof(data), 1);
        waitpid(pid, NULL, 0);
    }
}


We would normally expect only two possible outcomes:

0x00 0x01 # child overwrote parent
0x00 0x02 # parent overwrote child

However, due to race conditions on lseek, we might also end up with these variants:

0x02 0x01
0x01 0x02
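To see how one of those extra outcomes can arise, here is a deterministic replay in C of one unlucky interleaving. It is a hypothetical in-memory model (struct shared_file, sim_seek, sim_write and replay_race are invented for illustration): after fork(), parent and child share one open file description and therefore one seek offset, and a lock that is local to each process does not serialize them, so the model leaves the locks out.

```c
#include <string.h>

/* One file and one seek offset, shared by parent and child after fork(). */
struct shared_file {
    unsigned char bytes[4];
    long size;
    long offset;
};

static void sim_seek(struct shared_file *f, long pos) { f->offset = pos; }

/* Writes one byte at the shared offset and advances it, like write(). */
static void sim_write(struct shared_file *f, unsigned char value)
{
    f->bytes[f->offset] = value;
    f->offset++;
    if (f->offset > f->size)
        f->size = f->offset;
}

/* Replays two emulated pwrite(..., 1) calls interleaved step by step. */
void replay_race(struct shared_file *f)
{
    memset(f, 0, sizeof *f);            /* O_TRUNC: empty file, offset 0 */

    long parent_saved = f->offset;      /* parent: save offset (0)      */
    sim_seek(f, 1);                     /* parent: seek to 1            */
    long child_saved = f->offset;       /* child:  save offset (1)      */
    sim_seek(f, 1);                     /* child:  seek to 1            */
    sim_write(f, 2);                    /* parent: writes 2 at offset 1 */
    sim_seek(f, parent_saved);          /* parent: restores offset 0    */
    sim_write(f, 1);                    /* child:  writes 1 at offset 0 */
    sim_seek(f, child_saved);           /* child:  restores offset 1    */
}
```

The file ends up as 0x01 0x02: the child's byte lands at offset 0 because the parent restored the shared offset in between. Swapping the roles of parent and child gives 0x02 0x01.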


Granted, this example is contrived. I don't know whether typical applications share file descriptors between processes and use pread/pwrite on them.


I brought this up because I really enjoy technical analysis, not because of any particular concern. But if I'm bugging you too much, feel free to tell me to sod off ;)


RE[5]: Good Progress
by nfeske on Sat 25th Aug 2012 17:58 in reply to "RE[4]: Good Progress"
nfeske Member since:
2009-05-27

You are welcome! :-)

Indeed, the code looks similar to the snippet you posted. See here:

https://github.com/genodelabs/genode/blob/master/libports/src/lib/li...

Fortunately, your concerns do not apply to Genode. In Genode's libc, the seek offset is not held by the file system but locally within the process, inside the libc. The file-system interface is designed such that the seek offset is passed from the client to the file system with each individual file-system operation. The seek value as seen at the libc API level is just a value stored alongside the file descriptor within the libc. Therefore, lseek is cheap. It is just a library call updating a variable without invoking a syscall.
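The following C model sketches that design: the client keeps the offset next to its descriptor and hands it to the server with each request, so the save/seek/restore steps of an emulated pread never leave the process. All names (fs_server, fs_read, libc_fd, model_lseek_set, model_read, model_pread) are hypothetical, and the "server" is a plain function call standing in for Genode's file-system session, not its actual interface.

```c
#include <string.h>
#include <sys/types.h>

/* Hypothetical file-system server: each request carries its own offset,
 * so the server keeps no per-client seek state. */
struct fs_server {
    char data[64];
    size_t size;
    unsigned requests;   /* counts round trips to the server */
};

static ssize_t fs_read(struct fs_server *fs, char *buf, size_t count, off_t at)
{
    fs->requests++;
    if (at < 0 || (size_t)at >= fs->size)
        return 0;
    size_t n = fs->size - (size_t)at;
    if (n > count)
        n = count;
    memcpy(buf, fs->data + at, n);
    return (ssize_t)n;
}

/* Client-side descriptor: the seek offset lives here, inside the libc. */
struct libc_fd {
    struct fs_server *fs;
    off_t offset;
};

/* lseek(fd, pos, SEEK_SET) is just a variable update, no server request. */
static off_t model_lseek_set(struct libc_fd *fd, off_t pos)
{
    fd->offset = pos;
    return pos;
}

/* read() sends the locally held offset along with the request. */
static ssize_t model_read(struct libc_fd *fd, char *buf, size_t count)
{
    ssize_t n = fs_read(fd->fs, buf, count, fd->offset);
    if (n > 0)
        fd->offset += n;
    return n;
}

/* Emulated pread: saving and restoring the offset are plain variable
 * copies, so the whole call costs a single server request. */
ssize_t model_pread(struct libc_fd *fd, char *buf, size_t count, off_t at)
{
    off_t saved = fd->offset;
    model_lseek_set(fd, at);
    ssize_t n = model_read(fd, buf, count);
    model_lseek_set(fd, saved);
    return n;
}
```

Reading 5 bytes at offset 6 of "hello world" through model_pread returns "world", leaves the descriptor's offset unchanged, and increments the server's request counter only once.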

Your example does indeed subvert the locking scheme. But as Genode does not provide fork(), it wouldn't work anyway. ;-)

Btw, if programs are executed within the Noux runtime (see [1]), lseek is actually an RPC to the Noux server. In that case, the pread/pwrite emulation carries overhead compared to having pread/pwrite as first-class operations, so there is room for optimization.

[1] http://genode.org/documentation/release-notes/11.11#OS-level_Virtua...

Given all the steps that are involved in a single read I/O operation, however, I am uncertain about the benefit of this specific optimization. To avoid falling into the premature-optimization trap, I'd first try to obtain the performance profile of a tangible workload. Another reason I'd be hesitant to introduce pread/pwrite as first-class operations into Noux is that, in general, we try to design interfaces to be as orthogonal as possible. Thanks to this guideline, the Noux server is a cute little component of less than 5000 LOC. Introducing pread/pwrite in addition to read/write somewhat spoils this principle and increases complexity.

Thanks for the pointer to the database engine. This might be a good starting point for a workload to be taken as reference when optimizing for performance and scalability.
