Linked by Thom Holwerda on Wed 8th Mar 2006 22:57 UTC, submitted by hidden player
Even a small operating system can have big disputes within its community. The lead developer of MenuetOS, an OS written in assembly, has decided to drop all support for the 32bit version of Menuet, focusing development on the 64bit version. However, disgruntled users of the open source operating system are trying to keep the 32bit version alive by starting a special forum for it.
Thread beginning with comment 102916
RE[3]: Responsive
by transputer_guy on Thu 9th Mar 2006 22:16 UTC in reply to "RE[2]: Responsive"

Sure, for some code asm can still make a difference, and that will be especially common in DSP-like code such as codecs, where FFTs and DCTs can be done a tad faster in asm, and way faster if you can get at special MMX-like instructions that are not available to the regular compiler. I'd assume most media codecs have hand-tuned asm cores.

When you get into grunt code that is far from the innermost loop, asm has almost no use, period, except lowering productivity. I also took a full-blown C CAD app and started to optimize it into asm. After all the 10x inner loops I couldn't help myself and just kept going & going like the asm bunny, till the whole thing was asm and 5x smaller than the C version. Still, I really only needed to do the inner loops.

It is a shame that HLLs, even C, sometimes make it harder to exploit certain optimal asm sequences (switches come to mind), though there is always asm{}. The x86 just makes it all so much worse, and I don't see AMD64 being much better.

In the embedded space size will matter as well as performance, but for desktop apps the smaller option usually does pretty well also.

Funny you should mention my mentality with 5000 songs on WMP10. I believe that any OS should be able to sort 100K file entries in the blink of an eye, as if all the info was already in memory. Let's see, sorting 5000 int values should take O(n log n) time, which might be 10 x 5000 x 12 (say 10 instructions per step, 5000 values, and log2 of 5000 is about 12), or 600K instructions, probably < 1ms on any modern PC. String sorting will be a few x slower. In 10s one should be able to radix sort maybe 100M values.
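To make that back-of-envelope estimate concrete, here is a rough Python sketch. The function names and the figure of 10 instructions per step are my own assumptions for illustration, not measurements, and the radix sort is just one plain byte-at-a-time LSD variant for non-negative 32-bit values, not what any media player actually does.

```python
def estimate_sort_instructions(n, instr_per_step=10):
    """Rough O(n log n) cost: instr_per_step * n * floor(log2(n)).

    instr_per_step is an assumed constant, not a measured one.
    """
    log2_n = n.bit_length() - 1  # floor(log2(n)) for n >= 1
    return instr_per_step * n * log2_n


def radix_sort_u32(values):
    """LSD radix sort for integers in [0, 2**32), one byte per pass."""
    out = list(values)
    for shift in (0, 8, 16, 24):
        # Distribute into 256 buckets by the current byte; appending in
        # input order keeps each pass stable, which LSD radix sort needs.
        buckets = [[] for _ in range(256)]
        for v in out:
            buckets[(v >> shift) & 0xFF].append(v)
        out = [v for bucket in buckets for v in bucket]
    return out


if __name__ == "__main__":
    print(estimate_sort_instructions(5000))      # 600000
    print(radix_sort_u32([70000, 3, 256, 1]))    # [1, 3, 256, 70000]
```

The radix sort makes four linear passes regardless of n, which is why it should scale to very large in-memory arrays where a comparison sort pays the extra log n factor.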

If you are seeing 10s, that's probably a disk-based sort. I'd bet the 486 did the same sort in RAM; programmers back then had a much simpler model of the world, so it was obvious to bring the data into RAM. So HLLs are not the problem, it's keeping data on the disk that is pretty stupid.

I think the real problem we all have is that even with C and C++, OSes & cpus are just too complicated to see all the dumb stuff that gets done at the cycle level. Algorithms always come 1st, with an understanding of exactly how they play out on the cpu of today rather than on Knuth's ideal machine of the 60s.

Tree sorting algorithms, where node hops supposedly take a few opcodes, can today break all the caches and result in a 300ns memory stall on each hop. The language makes no difference here.
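As a rough illustration of how much that can cost, here is a small worst-case sketch in Python. It assumes a balanced binary tree, about floor(log2(n)) hops per insertion, and that every single hop misses all the caches and stalls for the 300ns quoted above; the function name is made up for this example and real workloads will land somewhere below this bound.

```python
def tree_sort_stall_ns(n, stall_ns=300):
    """Worst-case stall time in ns for inserting n items into a balanced tree.

    Assumes floor(log2(n)) hops per insertion and a cache miss on every hop.
    """
    hops_per_insert = n.bit_length() - 1  # floor(log2(n)) for n >= 1
    return n * hops_per_insert * stall_ns


if __name__ == "__main__":
    ns = tree_sort_stall_ns(1_000_000)
    print(ns / 1e9, "seconds of stalls")  # 5.7 seconds of stalls
```

Under those assumptions a million-item tree sort spends several seconds just waiting on memory, even though the opcode count per hop is tiny.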

Performance changes enormously when the problem fits in the caches vs. never fits in the caches. It would be much nicer to have cpus with predictable timings no matter how random the memory references are, but so far there is no single-threaded solution to the Memory Wall. It can however be traded for a Thread Wall problem, with no memory wall per thread.

Score: 3