Linked by Thom Holwerda on Mon 12th Mar 2012 19:00 UTC, submitted by yoni
Privacy, Security, Encryption "And just when you thought the whole Stuxnet/Duqu trojan saga couldn't get any crazier, a security firm who has been analyzing Duqu writes that it employs a programming language that they've never seen before." Pretty crazy, especially when you consider what some think the mystery language looks like: "The unknown c++ looks like the older IBM compilers found in OS400 SYS38 and the oldest sys36. The C++ code was used to write the tcp/ip stack for the operating system and all of the communications."
Thread beginning with comment 510424
Alfman
Member since:
2011-01-28

sithlord2,

"Sure you can modify your compiler to change your calling conventions, but it would make it impossible to call external libraries + there is no real benefit (= it doesn't result in better code)."

It's usually not worth the immense development and maintenance burden, but I've found that breaking with strict calling conventions can boost performance, since you aren't shifting registers around nearly as much to fit a standard calling convention. If you look at ASM dumps frequently, you'll see that a lot of functions contain boilerplate MOVs just to get values into and out of place. These are often trivial to eliminate when you're working in assembly without such restraints.
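A minimal illustration of that boilerplate, assuming a GCC- or Clang-compatible compiler (noinline is their extension, used here only so the call survives optimization; the function names are hypothetical). swapped receives its arguments in the first two argument registers but must hand them to sub in the opposite order, so under any fixed convention the compiler has to shuffle them, typically through a scratch register, before the call. The exact instructions depend on compiler, flags and target; compiling with -S or disassembling with objdump -d is the way to check.

```c
#include <assert.h>

/* noinline (GCC/Clang extension) keeps this a real call for illustration. */
__attribute__((noinline)) int sub(int a, int b) {
    return a - b;
}

/* The arguments arrive as (a, b) but must be passed on as (b, a). With a
   fixed convention that means register-to-register moves before the call,
   even though this function does no arithmetic of its own. */
int swapped(int a, int b) {
    return sub(b, a);
}
```

swapped(5, 3) ends up calling sub(3, 5) and returns -2; all of the work specific to swapped is moving values between registers.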

I envision a day when optimizing compilers can do inter-procedural optimizations without any calling convention at all, getting rid of all that "useless" cruft. After all, the only time a calling convention truly matters is when calling a function in an external component or library.

C++-style exceptions might still require a consistent stack frame, but a static calling convention like cdecl is not necessary.
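In C the nearest existing hook for this is internal linkage: a static function whose address never escapes its translation unit can only be called from code the compiler can see, so no outside caller depends on its convention. The sketch below uses hypothetical names. Whether a compiler actually deviates from the ABI here depends on the compiler, target and flags (GCC's -fipa-sra, enabled at -O2, can for instance drop unused parameters of such local functions), so the comments describe what is permitted, not what is guaranteed. In a case this small the compiler would most likely just inline scaled_sum.

```c
#include <assert.h>

/* Internal linkage: only callers in this file can reach scaled_sum, so the
   compiler is free to pass its arguments however it likes. */
static int scaled_sum(int a, int b, int unused_flag) {
    (void)unused_flag;  /* never read: a candidate for removal by IPA passes */
    return a * 3 + b;
}

/* External linkage: code in other object files may call this, so it must
   follow the platform's standard calling convention. */
int public_entry(int a, int b) {
    return scaled_sum(a, b, 0);
}
```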


Neolander Member since:
2010-03-08

Take a look at the AMD64 calling convention, then... It seems they put so much effort into making it faster through increased register use that now only optimizing compilers can understand the logic behind it...


Alfman Member since:
2011-01-28

Neolander,

I haven't written asm for amd64, but it would make sense that they've done something more efficient than passing arguments on the stack, considering the extra registers.
http://en.wikipedia.org/wiki/X86_calling_conventions

"The registers RCX, RDX, R8, R9 are used for integer and pointer arguments (in that order left to right), and XMM0, XMM1, XMM2, XMM3 are used for floating point arguments. Additional arguments are pushed onto the stack (right to left). Integer return values (similar to x86) are returned in RAX if 64 bits or less. Floating point return values are returned in XMM0. Parameters less than 64 bits long are not zero extended; the high bits contain garbage."

(more info about the stack omitted)
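The quoted rules describe the Microsoft x64 convention used on Windows; the System V AMD64 ABI used on Linux and most Unix-like systems differs, passing the first six integer arguments in RDI, RSI, RDX, RCX, R8 and R9 and floating-point arguments in XMM0 through XMM7. One detail the quote doesn't spell out is that Microsoft's register slots are tied to argument position, so for f(int a, double b, int c, float d) the arguments land in RCX, XMM1, R8 and XMM3. A small lookup makes that concrete; the enum and function names are hypothetical, and the sketch ignores structs, varargs and the stack layout.

```c
#include <assert.h>

/* Possible homes for an argument (hypothetical names). */
enum ms_x64_loc {
    LOC_RCX, LOC_RDX, LOC_R8, LOC_R9,
    LOC_XMM0, LOC_XMM1, LOC_XMM2, LOC_XMM3,
    LOC_STACK
};

/* Where the Microsoft x64 convention places argument `index` (0-based),
   given whether it is a floating-point value. Slots are positional: the
   second argument uses RDX or XMM1 no matter what type the first one has. */
enum ms_x64_loc ms_x64_arg_location(int index, int is_float) {
    static const enum ms_x64_loc int_regs[4] = {LOC_RCX, LOC_RDX, LOC_R8, LOC_R9};
    static const enum ms_x64_loc fp_regs[4] = {LOC_XMM0, LOC_XMM1, LOC_XMM2, LOC_XMM3};

    assert(index >= 0);
    if (index >= 4)
        return LOC_STACK;  /* fifth and later arguments are pushed on the stack */
    return is_float ? fp_regs[index] : int_regs[index];
}
```

For f(int a, double b, int c, float d), calling the helper with (0, 0), (1, 1), (2, 0) and (3, 1) yields RCX, XMM1, R8 and XMM3.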


However, the point I was trying to make is that any fixed calling convention will always require extra register shuffling simply to get parameters into the right place.

Here's a pointless example:

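/* Forward declarations so F1 can call F2 and F2 can call F3. */
int F2(int a, int b);
int F3(int a);

int F1(int a, int b) {
    int r = 0;
    while (b-- > 0) r += F2(a, b);  /* calls F2 b times */
    return r;
}

int F2(int a, int b) {
    while (a-- > 0) b += F3(b);     /* calls F3 a times; stops on negative a */
    return b;
}

int F3(int a) {
    return a * (a + 3);
}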

Obviously, in this case it makes the most sense to inline the whole thing, but maybe we're using function pointers or polymorphism, which makes inlining impractical. It should be fairly easy to allocate F2's registers so that it doesn't step on F1's, and likewise for F3, so that no runtime register shuffling is needed at all between the three functions.

The moment any calling convention is imposed, however, moving, saving and restoring registers becomes an unavoidable necessity.
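To make the function-pointer case concrete, here is a minimal variation of F1 that reaches F2 through a pointer (the typedef and the name F1_indirect are hypothetical, and F2 uses a-- > 0 so a negative count can't loop forever). When the pointer's target isn't known at compile time, the compiler can neither inline the call nor tailor registers to the callee, so each call site has to follow the platform's standard convention.

```c
#include <assert.h>

typedef int (*binary_fn)(int, int);  /* hypothetical name for "int f(int, int)" */

int F3(int a) {
    return a * (a + 3);
}

int F2(int a, int b) {
    while (a-- > 0) b += F3(b);
    return b;
}

/* Same loop as F1, but the callee is only known at run time, so arguments
   must be placed exactly where the standard convention says before each call. */
int F1_indirect(binary_fn f, int a, int b) {
    int r = 0;
    assert(f);
    while (b-- > 0) r += f(a, b);
    return r;
}
```

With F2 as the target, F1_indirect(F2, 2, 3) sums F2(2, 2) = 192, F2(2, 1) = 45 and F2(2, 0) = 0, giving 237.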

Of course, today's pipelined processors are good at register renaming and the like, which reduces the overhead of such shuffling. However, one inefficiency has always stood out like a sore thumb, and it perturbs me whenever I program in high-level languages: the inability to return more than one unit of data from a function call. The CPU has no such limitation, and BIOS programmers routinely return several values in registers as needed, even using CPU flags that the caller can test with a conditional jump (many BIOS services, for example, set the carry flag to signal an error). I find this model works extremely well in ASM, but alas, C programmers are forced to overload the return value (using the sign bit) and/or return extra values through memory pointers.
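The workarounds mentioned above can be compared side by side with two others that C compilers do offer. All function names are hypothetical. Returning a small struct is legal C, and the standard library's own ldiv() works this way; on the System V AMD64 ABI a struct of two 64-bit integers like the one below comes back in RAX and RDX, close to the "return several registers" model, whereas the Microsoft x64 convention returns it through a hidden memory pointer. __builtin_add_overflow is a GCC/Clang extension, not standard C, that typically compiles to an add followed by a flag test, which is about as close as C gets to handing a CPU flag back to the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* 1. Overloading the return value: -1 (sign bit set) means "not found",
      so valid results are limited to non-negative ints. */
int find_index(const int *xs, int n, int x) {
    for (int i = 0; i < n; i++)
        if (xs[i] == x)
            return i;
    return -1;
}

/* 2. Extra result through a memory pointer: the remainder goes to *rem. */
long divmod_ptr(long n, long d, long *rem) {
    assert(d != 0);
    *rem = n % d;
    return n / d;
}

/* 3. Returning a small struct, the same idea as the standard ldiv(). */
struct divmod_result {
    long quot;
    long rem;
};

struct divmod_result divmod(long n, long d) {
    assert(d != 0);
    struct divmod_result r = { n / d, n % d };
    return r;
}

/* 4. Exposing an overflow flag: __builtin_add_overflow (GCC/Clang) stores
      the wrapped sum in *sum and returns true if the exact result did not fit. */
bool add_overflows(int a, int b, int *sum) {
    return __builtin_add_overflow(a, b, sum);
}
```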
