Depends on the gcc-version.
I just double-checked: according to info:gcc, the most recent version (4.6) enables -ftree-vectorize at -O3, but still not -funroll-loops.
Unrolling is enabled by default only if you compile with profiling data, which helps the compiler pick the right loops to unroll.
Carewolf,
Thanks for the feedback.
GCC does use SSE for me, with and without '-ftree-vectorize' (when I tweak the C source code).
I couldn't get GCC to do vector math without changing the source file to calculate the number of loop iterations for GCC. In a case as simple as this, the compiler should have been able to handle it.
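To illustrate what "calculating the number of loops for GCC" can look like, here is a rough sketch. It is not the code from the article; both function names are made up for this example. As far as I know, the vectorizer of that era only handled loops whose trip count is known before the loop is entered, so the second form is the kind it could work with.

```c
#include <stddef.h>

/* Hypothetical example, not the code discussed in the article. */

/* The exit test depends on data read inside the loop, so the number
 * of iterations is not known before the loop starts. */
size_t add_until_zero(int *dst, const int *src)
{
    size_t i = 0;
    while (src[i] != 0) {
        dst[i] += src[i];
        i++;
    }
    return i;
}

/* The trip count n is fixed on entry (a countable loop), and
 * restrict tells the compiler that dst and src do not overlap. */
void add_counted(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```

The caller of add_counted has to work out n up front, which is roughly the kind of source change described above.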
I know Valhalla is complaining about this specific example (not sure why?), but I do frequently come across issues like this in much more complex code where GCC misses an equally trivial optimization.
I also rarely see GCC vectorizing integer loops (it seems to choke on signed vs unsigned). GCC does slightly better with floating-point loops, and it helps if you allow it to break some strict math rules.
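A small sketch of the floating-point case, with a function name invented for this example. Summing floats in vector lanes changes the order of the additions, which strict IEEE rules do not allow, so GCC only vectorizes a reduction like this when it is told reassociation is acceptable (for example with -ffast-math).

```c
#include <stddef.h>

/* Hypothetical example. Try:
 *   gcc -O3 -c sum.c              (strict math: scalar additions)
 *   gcc -O3 -ffast-math -c sum.c  (reassociation allowed)
 * and compare the generated assembly. */
float sum_floats(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];   /* each add depends on the previous one */
    return s;
}
```

Note that -ffast-math relaxes more than just association, so results can differ slightly from the strict build.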
As I noted in another comment, the newest version of gcc now has -ftree-vectorize in -O3, so I was not fully up to date in my first reply.
If you compile for AMD64, SSE is automatically used for all math (not vectorized, one value at a time). You can also use SSE for math on IA32 using -mfpmath=sse. SSE math is generally much faster, but it removes the 80-bit temporaries quirk that x87 math has.
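One way to check which behaviour a given build has is the C99 macro FLT_EVAL_METHOD from <float.h>. The helper name below is made up for this sketch; with GCC I would expect 0 for SSE math (operations evaluated in their own type) and 2 for x87 math (evaluated in long double, the 80-bit temporaries), but treat that as an expectation to verify on your own setup.

```c
#include <float.h>

/* Hypothetical helper: reports how the compiler evaluates
 * floating-point expressions for the current build flags.
 * Compare e.g.  gcc -m32 -mfpmath=387  with
 *               gcc -m32 -msse2 -mfpmath=sse */
int eval_method(void)
{
    return FLT_EVAL_METHOD;
}
```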





After identifying that gcc didn't perform these optimizations, have you considered trying gcc with the options that enable them: -funroll-loops -ftree-vectorize?
It is not fun blaming the compiler for not doing optimizations it hasn't been asked to do. I know the default optimizations suck, but that is a well-known problem with gcc.
The auto-vectorizer isn't very good with integers, but give it a try.
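For anyone who wants to try it, a minimal integer test case might look like this. The function and file names are invented for the example; the two optimization flags are the ones mentioned above, and the reporting options in the comment are the ones I know of for asking the vectorizer what it did.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test case. Build with the flags asked for explicitly:
 *   gcc -O2 -funroll-loops -ftree-vectorize -c offset.c
 * To see whether the loop was vectorized, add
 *   -ftree-vectorizer-verbose=2   (GCC 4.x)
 *   -fopt-info-vec                (GCC 4.9 and later)
 */
void add_offset_u32(uint32_t *restrict dst, const uint32_t *restrict src,
                    uint32_t k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + k;   /* unsigned add, wraps on overflow */
}
```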
Edited 2011-06-13 09:46 UTC