Linked by Thom Holwerda on Wed 8th Mar 2006 22:57 UTC, submitted by hidden player
Even a small operating system can have big disputes within its community. The lead developer of MenuetOS, an OS written in assembly, has decided to drop all support for the 32-bit version of Menuet, focusing development on the 64-bit version. However, disgruntled users of the open source operating system are trying to keep the 32-bit version alive by starting a special forum for it.
Thread beginning with comment 102731
RE: Responsive
by edwdig on Thu 9th Mar 2006 05:57 UTC in reply to "Responsive"
edwdig
Member since:
2005-08-22

Coding in assembly helped a lot back in the day, when having anything more than a few megabytes of RAM was unheard of. Back then, it was fairly easy to optimize a small loop better than a compiler could. Now CPUs have become so complex that in most situations it takes far more work than it's worth to outdo a compiler.

Code size mattered more back then, and it was easy to write smaller code in assembly, which made larger programs/data sets more reasonable to work with. Memory paging eventually made this much less of a concern.

The other big thing you gained by coding in assembly was that you could define better calling conventions for functions. The standard calling convention for C functions is to pass all arguments on the stack and trash the registers ax, bx, cx, and dx. You can define conventions that pass arguments directly in registers, or that trash different registers. If you customize the convention to meet the needs of the code in question, you can gain speed and shrink the code at the same time. This is less of a gain on modern processors, which have extra internal registers and register renaming.
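To make that concrete, here is a minimal sketch using GCC's regparm extension on 32-bit x86, which lets you pick a register-based convention per function (the function names are made up for illustration):

    /* Default cdecl convention: arguments are pushed on the stack. */
    int sum_cdecl(int a, int b, int c)
    {
        return a + b + c;
    }

    /* GCC extension: pass the first three arguments in EAX, EDX, ECX instead.
       Effectively a custom calling convention chosen for this one function. */
    int __attribute__((regparm(3))) sum_regparm(int a, int b, int c)
    {
        return a + b + c;
    }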

Reply Parent Score: 5

RE[2]: Responsive
by transputer_guy on Thu 9th Mar 2006 06:54 in reply to "RE: Responsive"
transputer_guy Member since:
2005-07-08

I would have to agree with this, and with the first poster to mention it as well. Anyone who has been around long enough has already been there and done that.

The project sounds interesting & I will follow it more, but the programming definitely takes me back to 1984 and the Inside Mac programming model where some coders such as Andy H could crank out all their apps in 68K asm because the early MacOS was so elegantly thought out (despite the memory handles & coop tasking handicaps).

In those days of tiny asm programming, whether x86 or 68K, it was well worth doing because every instruction executed in a predictable amount of time. You could count clocks and bytes and choose which hand optimization would give better results, even unrolling and inlining huge chunks of code. Going from C to asm often gave 10x improvements.

Today that makes no sense, and it hasn't since the PPro and similar superscalar, out-of-order CPUs made cycle counting irrelevant.

If Menuet has any interesting internals it should be programmed in a higher-level language, perhaps BCPL or Lisp or just plain C. Dealing with asm opcodes isn't going to produce any applications more interesting than those 1984 apps. Which makes me wonder: was the browser shown in the OSNews picture a native browser written in asm too?

Interpreters are still a good idea provided the language and infrastructure are well designed; the user is still the limiting factor in keeping any OS busy. Today few books are available for modern x86 coding that can give a predictable model for how the code will actually run. There is a huge problem with the memory wall: truly random access across large address spaces makes some "mov" instructions effectively hundreds of cycles rather than 1 or so.

Reply Parent Score: 5

RE[2]: Responsive
by Innominandum on Thu 9th Mar 2006 17:48 in reply to "RE: Responsive"
Innominandum Member since:
2005-11-18

I disagree. Even today I have taken some publicly published routines demonstrating a technical standard, routines supposedly already optimized in C, and I have my versions running at double or triple the speed of the C ones.

While high-level compilers are getting quite good, the output they create is limited by the language itself. High-level constructs aren't usually performance friendly; you don't have the same level of control. The only way a modern HLL compiler could join the same league as ASM is if the programmer used a series of labels and goto statements.

In the case where I optimized the "optimized" C code, the implementation I arrived at is not possible to express in an HLL. Also, code density, instruction choice, and scheduling all still matter. Your mentality is the same one that causes WMP10 to take 10 seconds to sort 5000 songs on my P4 3.0 GHz. My 486 could sort 5000 song names faster than that.

People have been saying HLL compilers can equal ASM for many, many years now and I'm still waiting for it to happen. And if they can, someone's doing something wrong.

Reply Parent Score: 2

RE[3]: Responsive
by transputer_guy on Thu 9th Mar 2006 22:16 in reply to "RE[2]: Responsive"
transputer_guy Member since:
2005-07-08

For some code asm can still make a difference, sure, and those cases are especially common in DSP-style work like codecs, where FFTs and DCTs can be done a tad faster, and much faster still if special MMX-like instructions that the regular compiler can't get at are used. I'd assume most media codecs have hand-tuned asm cores.

When you get into grunt code that is far from the innermost loop, asm has almost no use, period, except lowering productivity. I also took a full-blown C CAD app and started to optimize it into asm. After all the 10x inner loops, I couldn't help myself, just kept going and going like the asm bunny, till the whole thing was asm and 5x smaller than the C version. Still, I really only needed to do the inner loops.

It is a shame that HLLs, even C, sometimes make it harder to exploit certain optimal asm sequences; switch statements come to mind. There is always inline asm, though. The x86 just makes it all so much worse, and I don't see AMD64 being much better.
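As a minimal sketch of that escape hatch, assuming GCC's extended inline asm syntax on x86 (the helper name is made up for illustration):

    /* Index of the highest set bit via the x86 BSR instruction: one opcode
       that a portable C loop or switch ladder won't reliably compile down to. */
    static inline unsigned highest_set_bit(unsigned x)
    {
        unsigned idx;
        __asm__ ("bsrl %1, %0" : "=r" (idx) : "rm" (x) : "cc");
        return idx;  /* result undefined if x == 0, just like BSR itself */
    }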

In the embedded space size will matter as well as performance, but for desktop apps the smaller option usually does pretty well too.

Funny you should mention my mentality with 5000 songs on WMP10. I believe that any OS should be able to sort 100K file entries in the blink of an eye, as if all the info were already in memory. Let's see: 5000 values should take O(n log n) time, which might be 10 x 5000 x 12, or about 600K instructions (roughly ten instructions per comparison, times 5000 elements, times log2(5000) of about 12), probably < 1 ms on any modern PC. String sorting will be a few times slower. In 10 s one should be able to radix sort maybe 100M values.
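For anyone who wants to try the back-of-the-envelope version, here is a rough sketch (names and sizes invented for illustration) that sorts 5000 short strings already in memory with plain qsort and prints the elapsed time:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define N 5000

    static char names[N][32];

    static int cmp(const void *a, const void *b)
    {
        return strcmp((const char *)a, (const char *)b);
    }

    int main(void)
    {
        /* Fabricate 5000 song-name-like strings. */
        for (int i = 0; i < N; i++)
            sprintf(names[i], "track-%d", rand());

        clock_t t0 = clock();
        qsort(names, N, sizeof names[0], cmp);
        clock_t t1 = clock();

        printf("sorted %d names in %.3f ms\n",
               N, 1000.0 * (t1 - t0) / CLOCKS_PER_SEC);
        return 0;
    }

Even an unoptimized build lands nowhere near ten seconds, which is the point: the algorithm and keeping the data in RAM dominate, not the source language.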

If you are seeing 10 s, that's probably a disk-based sort. I'd bet the 486 did the same sort in RAM; programmers back then had a much simpler model of the world, so it was obvious to bring the data into RAM. So HLLs are not the problem; keeping the data on disk is what's pretty stupid.

I think the real problem we all have is that even with C and C++, OSes and CPUs are just too complicated to see all the dumb stuff that gets done at the cycle level. Algorithms always come first, together with an understanding of exactly how they play on the CPU of today rather than on Knuth's ideal machine of the '60s.

Tree sorting algorithms, where node hops supposedly take a few opcodes, can today blow through all the caches and result in ~300 ns memory stalls on each hop; the language makes no difference here.
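A small sketch of that effect, with the node layout and sizes invented for illustration: chase pointers through nodes scattered across a heap much larger than the caches and every "few opcode" hop turns into a memory stall.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* One "tree hop" is just following a pointer; its cost is dominated by
       whether the next node is in cache, not by the opcode count. */
    struct node { struct node *next; char pad[56]; };  /* roughly one cache line */

    #define N (1 << 20)   /* ~1M nodes, ~64 MB, far larger than any cache */

    int main(void)
    {
        struct node *pool = malloc(N * sizeof *pool);
        size_t *order = malloc(N * sizeof *order);
        if (!pool || !order) return 1;

        /* Link the nodes in a shuffled order so each hop lands somewhere random. */
        for (size_t i = 0; i < N; i++) order[i] = i;
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % (i + 1);
            size_t t = order[i]; order[i] = order[j]; order[j] = t;
        }
        for (size_t i = 0; i + 1 < N; i++)
            pool[order[i]].next = &pool[order[i + 1]];
        pool[order[N - 1]].next = NULL;

        clock_t t0 = clock();
        size_t hops = 0;
        for (struct node *p = &pool[order[0]]; p; p = p->next)
            hops++;
        clock_t t1 = clock();

        printf("%zu hops, %.1f ns per hop\n", hops,
               1e9 * (t1 - t0) / CLOCKS_PER_SEC / (double)hops);
        return 0;
    }

Run the same loop over a list whose nodes are laid out contiguously and the per-hop time drops by an order of magnitude or more.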

Performance changes enormously between a problem that fits in the caches and one that never does. It would be much nicer to have CPUs with predictable timings no matter how random the memory references are, but so far there is no single-threaded solution to the memory wall. It can, however, be traded for a threaded-wall problem with no memory wall per thread.

Reply Parent Score: 3

RE[3]: Responsive
by edwdig on Fri 10th Mar 2006 00:25 in reply to "RE[2]: Responsive"
edwdig Member since:
2005-08-22

If you're writing code that can use MMX/SSE, C compilers are not very good at generating that code automatically. You can very easily win in those situations by hand-writing assembly.
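For a feel of the gap, here is a sketch (function names invented, SSE intrinsics standing in for hand asm) of a loop a 2006-era compiler would typically leave scalar versus the hand-vectorized form:

    #include <xmmintrin.h>   /* SSE intrinsics */

    /* Plain C: the compiler usually emits one scalar multiply per element. */
    void scale_scalar(float *dst, const float *src, float k, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * k;
    }

    /* Hand-vectorized: four floats per iteration. Assumes n is a multiple of 4
       and the pointers are 16-byte aligned (a real routine would handle the
       tail and alignment; this is illustration only). */
    void scale_sse(float *dst, const float *src, float k, int n)
    {
        __m128 vk = _mm_set1_ps(k);
        for (int i = 0; i < n; i += 4) {
            __m128 v = _mm_load_ps(src + i);
            _mm_store_ps(dst + i, _mm_mul_ps(v, vk));
        }
    }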

If you're not using SIMD instructions, a C compiler will usually outdo your hand-coded assembly with ease. The only time you stand a chance of beating the C compiler is in tight loops where it isn't given enough information to properly optimize the code. The more information you provide the compiler, the better job it can do. For example, if you have a function that's only used by other functions in the same source file, mark it as static; this lets the compiler optimize across the function calls. Declare things as const when possible and it will help the optimizer further. Help the compiler, and it will gladly help you back.
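A small sketch of what that looks like in practice (file and function names made up):

    /* helpers.c -- everything the compiler needs is visible in this one file. */

    /* 'static' keeps this function local to the file, so the compiler is free
       to inline it into callers or ignore the standard calling convention. */
    static int clamp(int v, int lo, int hi)
    {
        return v < lo ? lo : (v > hi ? hi : v);
    }

    /* 'const' tells the optimizer the table never changes behind its back. */
    static const int gains[4] = { 1, 2, 4, 8 };

    /* 'const' on the pointer parameter promises the samples are only read. */
    int apply_gain(const int *samples, int n, int level)
    {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += clamp(samples[i] * gains[level & 3], -32768, 32767);
        return sum;
    }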

5-10 years ago I used to write assembly, primarily 16-bit. I could very easily outdo the compiler then. I was one of those people who never believed the claims of how good compilers are. Nowadays the compilers really impress me, and I'm usually wrong when I try to outsmart the compiler.

That said, if you're still working with more primitive CPUs, you can outdo the compiler without too much trouble. I do some Game Boy Advance coding as a hobby. If I try, there are definitely situations where I can create better code than the compiler. Realistically, there aren't many places where I have to.

Reply Parent Score: 2