Linked by Thom Holwerda on Mon 12th Mar 2012 19:00 UTC, submitted by yoni
Privacy, Security, Encryption "And just when you thought the whole Stuxnet/Duqu trojan saga couldn't get any crazier, a security firm who has been analyzing Duqu writes that it employs a programming language that they've never seen before." Pretty crazy, especially when you consider what some think the mystery language looks like "The unknown c++ looks like the older IBM compilers found in OS400 SYS38 and the oldest sys36. The C++ code was used to write the tcp/ip stack for the operating system and all of the communications."
Thread beginning with comment 510572
Alfman
Member since:
2011-01-28

Neolander,

"Sure, I was just arguing that the set of registers which they have picked seems to only make sense in the context of specific compiler implementations. Why do they use R8 and R9, as an example ? Why RAX, RCX, RDX, but not RBX ?"

Ah, well, now I can't answer that (or your other questions). Back in the days of real-mode addressing the choice of registers was more significant, but now... it may be somewhat arbitrary? I'm not sure about the conventions for the special AMD64 cases.


"...unless I'm misunderstood, inlining is not performed because the compiler is unable to efficiently detect the relationship between F1, F2, and F3 at compile time. If so, how could it make sure that the functions are not stepping on each other's registers ?"

My counter argument is that if a human programmer can see the relationship, so too should an ideal compiler. Of course it can only prove relationships for internal dependencies which are available at compile time, but I think that's a given.

As for reasons not to inline, besides the two I already listed (function pointers and polymorphism), one might be mutual (circular) recursion. Another obvious one is size/cache optimization. Yet another might be "tail calling", where a function jumps directly into another function instead of doing a call followed by a ret. Sometimes these end up being 100% free in the context of conditional logic, which would require a jump anyway, so inlining saves nothing on that code path.

Note: GCC is already able to optimize away tail calls when optimization is enabled (e.g. -O2, which turns on -foptimize-sibling-calls), so the function below will run indefinitely without running out of stack.

#include <stdio.h>

/* The recursive call is the last thing the function does (a tail call). */
int forever(int x) {
    printf("%d\n", x);
    return forever(x + 1);
}
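
To make the transformation concrete, here is a minimal sketch using a variant that terminates. The names sum_recursive and sum_loop are mine, purely for illustration. The second function is roughly what an optimizing compiler may turn the first into; it is not the literal output of GCC, and without optimization the recursive version still uses one stack frame per call.

```c
/* Tail-recursive sum of 1..n (assumes n >= 0). The recursive call is the
   last action, so nothing in the current frame is needed after it. */
long sum_recursive(long n, long acc) {
    if (n == 0)
        return acc;
    return sum_recursive(n - 1, acc + n);   /* tail call */
}

/* Hand-written equivalent of tail-call elimination: the call becomes
   a jump back to the top of the function with updated arguments. */
long sum_loop(long n, long acc) {
    while (n != 0) {
        acc += n;
        n -= 1;
    }
    return acc;
}
```

Calling either one as sum_recursive(10, 0) or sum_loop(10, 0) gives the same result; the only difference is whether the stack grows with n.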


"Besides, I am not sure that compilers have to follow calling conventions for anything but external library calls, for which some kind of standard is necessary since the program and the library are compiled separately. As an example, when inlining is performed, calling conventions are violated (or rather bypassed), and no one cares."

Yes, that's the theory, but in practice GCC sticks to the calling convention, at least for any function that has external linkage. I think C functions default to being externally callable (unless they are declared static) to aid in external linking. In fact, the whole methodology of compiling to separate objects and then linking them together into a static binary is an obstacle for any compiler that would like to perform interprocedural optimization.
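
A small sketch of the linkage distinction, with made-up function names. Whether a given compiler actually inlines the static helper or passes its arguments differently depends on the version, target and flags, so treat the comments as what the compiler is allowed to do, not what it will do.

```c
/* Internal linkage: visible only in this translation unit, so the
   compiler is free to inline it or pass its arguments however it likes. */
static int scale_internal(int x, int factor) {
    return x * factor;
}

/* External linkage (the C default): other object files may call this,
   so a copy following the platform calling convention must exist. */
int scale_external(int x, int factor) {
    return x * factor;
}

/* A caller in the same file. An optimizing compiler may inline the
   static helper here, but that is not guaranteed. */
int scale_both(int x) {
    return scale_internal(x, 2) + scale_external(x, 3);
}
```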


"Ideally, any language would support tuples like Python's, where you can shove a set of inhomogeneous objects into the returned 'value' of a function without caring what happens under the hood. But I suspect that this can be hard to optimize properly."

Well, I don't know about Python's implementation. However, sometimes I find it helpful to stop looking at things as well-defined mathematical functions and instead look at them as a sequence of code blocks, always moving forward by "jumping" to the next block and passing it more parameters. Call and ret are simply different mechanisms for locating the next block, but otherwise they do the same thing. So there's no difference in what I can pass from one block to the next.

To highlight this:
function A() {
    {blockA1}
    B()
    {blockA2}
}

function B() {
    {blockB1}
}

We can 'Unthink' the function abstraction to get:
{blockA1} -> {blockB1} -> {blockA2}
There's no fundamental reason the parameters from B1 to A2 cannot be just as rich as the parameters from A1 to B1.

Edit: I wanted to highlight how input and output parameters are really different perspectives on the same thing, and they would not need to have two different optimization mechanisms if it weren't for differences in higher level function semantics.
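
One way to see the A1 -> B1 -> A2 chain in today's C is continuation-passing style, where B1 never returns a value and instead calls the next block with typed arguments. This is only a sketch: the names block_fn, block_a1, block_b1 and block_a2 are hypothetical, and the output buffer exists just so the result can be inspected.

```c
#include <stddef.h>
#include <stdio.h>

/* Type of the block that runs after B1: it receives B1's results
   as ordinary typed parameters. */
typedef void (*block_fn)(int a, char b, char *out, size_t out_len);

/* {blockA2}: consumes the values a conventional B would "return". */
static void block_a2(int a, char b, char *out, size_t out_len) {
    snprintf(out, out_len, "%d %c", a, b);
}

/* {blockB1}: instead of returning, it moves forward to the next block. */
static void block_b1(block_fn next, char *out, size_t out_len) {
    next(4, 'c', out, out_len);
}

/* {blockA1}: hands control to B1 and names A2 as the block after it. */
void block_a1(char *out, size_t out_len) {
    block_b1(block_a2, out, out_len);
}
```

Here the values flowing from B1 to A2 are passed exactly like the values flowing from A1 to B1, which is the symmetry I was getting at.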

Addendum: It would be pretty cool to have a language where a "return" would be syntactically the same as a function call with type-checking and everything.

function A() {
    B() returns(int a, char b);
    printf("%d %c\n", a,b);
}

function B() {
    return(4,'c');
}
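
For comparison, the closest thing in plain C today is probably returning a small struct. This is a rough sketch and the type name b_result is made up. As far as I know, the x86-64 System V ABI returns a struct this small in a register, so it need not cost more than a single return value.

```c
/* Hypothetical result type standing in for "returns(int a, char b)". */
struct b_result {
    int a;
    char b;
};

/* B hands back both values at once, much like passing two arguments. */
struct b_result B(void) {
    struct b_result r = { 4, 'c' };
    return r;
}
```

The caller would write struct b_result r = B(); and then use r.a and r.b, which is clunkier than the imagined syntax but carries the same information.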

Edited 2012-03-14 15:10 UTC

Score: 2