Linked by Thom Holwerda on Wed 8th Mar 2006 22:57 UTC, submitted by hidden player
OSNews, Generic OSes Even a small operating system can have big disputes within its community. The lead developer of MenuetOS, an OS written in assembly, has decided to drop all support for the 32bit version of Menuet, focusing development on the 64bit version. However, disgruntled users of the open source operating system are trying to keep the 32bit version alive by starting a special forum for it.
Thread beginning with comment 102807
RE[2]: Responsive
by Innominandum on Thu 9th Mar 2006 17:48 UTC in reply to "RE: Responsive"

I disagree. Even today I have taken some publicly published routines that demonstrate a technical standard. These routines are supposedly optimized C. I already have them running at double or triple the speed of the C ones.

While high-level compilers are getting quite good, the output they create is limited by the language itself. High-level constructs usually aren't performance friendly. You don't have the same level of control. The only way a modern HLL compiler could join the same league as ASM is if the programmer writes a series of labels and goto statements.

In the case where I optimized the "optimized" C code, the implementation I arrived at cannot be expressed in an HLL. Also, code density, instruction choice and scheduling all still matter. Your mentality is the same one that causes WMP10 to take 10 seconds to sort 5000 songs on my P4 3.0 GHz. My 486 could sort 5000 song names faster than that.

People have been saying HLL compilers can equal ASM for many, many years now and I'm still waiting for it to happen. And if they can, someone's doing something wrong.

Score: 2

RE[3]: Responsive
by transputer_guy on Thu 9th Mar 2006 22:16 in reply to "RE[2]: Responsive"

Sure, for some code asm can still make a difference, and that will be especially common in DSP-like code such as codecs, where FFTs and DCTs can be done a tad faster, and way faster if you can get at special MMX-like instructions that are not available through the regular compiler. I'd assume most media codecs have hand-tuned asm cores.

When you get into grunt code that is far from the innermost loop, asm has almost no use, period, except lowering productivity. I also once took a full-blown C CAD app and started to optimize it into asm. After all the 10x inner loops were done, I couldn't help myself and just kept going and going like the asm bunny, till the whole thing was asm and 5x smaller than the C version. Still, I really only needed to do the inner loops.

It is a shame that HLLs, even C, sometimes make it harder to exploit certain optimal asm sequences (switches come to mind), but there is always asm{}. The x86 just makes it all so much worse, and I don't see AMD64 being much better.

In the embedded space size will matter as well as performance, but for desktop apps the smaller option usually does pretty well too.

Funny you should mention my mentality with 5000 songs on WMP10. I believe that any OS should be able to sort 100K file entries in the blink of an eye, as if all the info was already in memory. Let's see, sorting 5000 int values should take O(n log n) time, which might be 10 * 5000 * 12 (roughly 10 instructions per step, n = 5000, log2 n about 12), or 600K instructions, probably < 1ms on any modern PC. String sorting will be a few times slower. In 10s one should be able to radix sort maybe 100M values.
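The back-of-envelope figure above can be checked with a small C sketch. It is only an illustration: `sort_names` and `estimate_instructions` are hypothetical names, the cost model (instructions per step times n times floor(log2 n)) is just the estimate from the post, and `qsort` stands in for whatever sort a real player would use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings (char *). */
static int cmp_names(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sort n names that are already in RAM (hypothetical helper).
   Only the pointers are moved, not the string data. */
void sort_names(char **names, size_t n)
{
    assert(names != NULL || n == 0);
    qsort(names, n, sizeof names[0], cmp_names);
}

/* Rough cost model from the post:
   per_step * n * floor(log2(n)) instructions. */
unsigned long estimate_instructions(unsigned long n, unsigned long per_step)
{
    unsigned long log2n = 0;
    for (unsigned long v = n; v > 1; v >>= 1)
        log2n++;                      /* floor(log2(n)) */
    return per_step * n * log2n;
}
```

With n = 5000 and 10 instructions per step, `estimate_instructions(5000, 10)` gives the 600K figure quoted above. The real instruction count of `qsort` on strings will differ, so treat the number as an order-of-magnitude guide only.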

If you are seeing 10s, that's probably a disk-based sort. I'd bet the 486 did the same sort in RAM; programmers back then had a much simpler model of the world, so it was obvious to bring data into RAM. So HLLs are not the problem; it is keeping data on the disk that is pretty stupid.

I think the real problem we all have is that even with C and C++, OSes and CPUs are just too complicated to see all the dumb stuff that gets done at the cycle level. Algorithms always come first, along with an understanding of exactly how they play out on today's CPUs rather than on Knuth's ideal machine of the 60s.

Tree sorting algorithms where node hops supposedly take a few opcodes can today miss all the caches and result in 300ns memory stalls on each hop; the language makes no difference here.
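A rough C sketch of the node-hopping effect, assuming a simple linked list rather than a real sort tree. The names `link_in_order`, `shuffle` and `walk` are made up for illustration. The same nodes are chained once in memory order and once in shuffled order; both walks do identical work per hop and return the same sum, so any difference in run time on a large list comes from the memory system, not the code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
    struct node *next;
    long value;
};

/* Chain nodes[order[0]] -> nodes[order[1]] -> ... and return the head. */
struct node *link_in_order(struct node *nodes, const size_t *order, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i + 1 < n; i++)
        nodes[order[i]].next = &nodes[order[i + 1]];
    nodes[order[n - 1]].next = NULL;
    return &nodes[order[0]];
}

/* Fisher-Yates shuffle using rand(); good enough for a demo. */
void shuffle(size_t *order, size_t n)
{
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = order[i - 1];
        order[i - 1] = order[j];
        order[j] = tmp;
    }
}

/* One pointer hop and one add per node, whatever the layout. */
long walk(const struct node *head)
{
    long sum = 0;
    for (; head != NULL; head = head->next)
        sum += head->value;
    return sum;
}
```

To see the stalls, make the list much larger than the last-level cache and time `walk` for both layouts with a clock of your choice. No timings are given here because they depend heavily on the machine.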

Performance changes enormously between a problem that fits in the caches and one that never fits in the caches. It would be much nicer to have CPUs with predictable timings no matter how random the memory references are, but so far there is no single-threaded solution to the Memory Wall. It can, however, be traded for a Thread Wall problem with no memory wall per thread.

Score: 3

RE[3]: Responsive
by edwdig on Fri 10th Mar 2006 00:25 in reply to "RE[2]: Responsive"

If you're writing code that can use MMX/SSE, C compilers are not very good at generating that code automatically. You can very easily win in those situations by hand writing assembly.
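As a middle ground between hand-written assembly and hoping the compiler vectorizes, here is a minimal C sketch using SSE intrinsics from `<xmmintrin.h>`. Note that this is intrinsics, not assembly, and `add_floats` is a hypothetical example function. The `__SSE__` check is an assumption about GCC/Clang-style predefined macros; where it is not defined, the plain scalar loop does all the work.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* dst[i] = a[i] + b[i] for i in [0, n). */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    size_t i = 0;
#if defined(__SSE__)
    /* Four floats per iteration; unaligned loads and stores,
       so the caller needs no special alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, or the whole array when SSE is not available. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Whether this beats what a current compiler generates for the scalar loop on its own is something to measure rather than assume.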

If you're not using SIMD instructions, a C compiler will usually easily outdo your hand-coded assembly. The only time you stand a chance of outdoing the C compiler is in tight loops where the compiler isn't given enough information to properly optimize the code. The more information you provide the compiler, the better job it can do. For example, if you have a function that's only used by other functions in the same source file, mark it as static. This will allow the compiler to optimize across the function calls. Declare things as const when possible and it will further help your optimizations. Help the compiler, and it will gladly help you back.
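A small C sketch of the static and const advice, with made-up function names (`scale`, `sum_scaled`). How much each keyword changes the generated code depends on the compiler and optimization level; const in particular often acts more as documentation and error checking than as an optimization hint.

```c
#include <assert.h>
#include <stddef.h>

/* static: visible only in this source file, so the compiler knows
   every caller and is free to inline it or drop the standalone copy. */
static int scale(int x, int factor)
{
    return x * factor;
}

/* const: the function promises not to write through 'values'. */
long sum_scaled(const int *values, size_t n, int factor)
{
    assert(values != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += scale(values[i], factor);
    return total;
}
```

Comparing the assembly output (for example with `-S` on GCC or Clang) with and without `static` is an easy way to see what a particular compiler actually does with it.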

5-10 years ago I used to write assembly, primarily 16 bit. I could very easily outdo the compiler then. I was one of those people who never believed the claims of how good compilers are. Nowadays the compilers really impress me, and I'm usually wrong when I try to outsmart the compiler.

That said, if you're still working with more primitive CPUs, you can outdo the compiler without too much trouble. I do some GameBoy Advance coding as a hobby. If I try, there are definitely situations where I can create better code than the compiler. Realistically, there aren't many places where I have to.

Score: 2