Solutions for Tracing UNIX Applications

Thom Holwerda 2009-04-06 Unix 9 Comments

Take a look at some systems that enable you to trace the execution of applications and work out what they are doing without having to make any modifications to the source code, and even without having to stop and restart the application. See how with tracing alone, you can find and diagnose problems with just a few commands.

About The Author

Thom Holwerda

Follow me on Mastodon @[email protected]

9 Comments

2009-04-06 8:48 am
MrVain
SUN’s free and open DTrace is the probing instrument of choice. It is ported to Apple Mac OS X and FreeBSD and QNX. The unique thing about DTrace, is that it sees EVERYTHING occuring in the system. EVERYTHING. No other is not even close to do that.
“I looked at one customer’s application that was absolutetly dependant of getting the best performance possible. Many people for many years had looked at the app using traditional tools. There was one particular function that was very “hot” – meaning that it was called several million times per second. Of course, everyone knew that being able to inline this function would help, but it was so complex that the compilers would refuse to inline.
Using DTrace, I instrumented every single assembly instruction in the function. What we found is that 5492 times to 1, there was a short circuit code path that was taken. We created a version of the function that had the short circuit case and then called the “real” function for other cases. This was completely inlinable and resulted in a 47 per cent performance gain.
Certainly, one could argue that if you used a debugger or analyzer you may have been able to come to the same conclusion in time. But who would want to sit and step through a function instruction by inctruction 5493 times? With DTrace, this took literally a ten second DTrace invocation, 2 minutes to craft the test case function, and 3 minutes to test. So in slightly over 5 minutes we had a 47 percent increase in performance.
Another case was one in which we were able to observe a high cross call rate as the result of running a particular application. Cross calls are essentially one CPU asking another to do something. They may or may not be an issue, but previously in was next to impossible (okay, really impossible) to determine their effecs with anything other than a debug version of the kernel. Being able to correlate the cross call directly to application was even more complex. If you had a room full of kernel engineers, each would have theories and plausible explanations, but no hard quantifiable data on what to do and what the impact to performance would be.
Enter DTrace…. With an exceedingly simple command line invocation of DTrace, we were able to quickly identify the line of code, the reason for the cross calls, and the impact on performance. The basic issue was that a very small region of a file was being mmap(2)’d, modified, msync(3C)’d, and then munmap(2)’d. This was basically being done to guarantee that the modified regoin was sync’d to disk.
The munmap(2) was the reason for the cross call and the application could get the same semantics by merely opening the file with O_DSYNC. This change was made and performance increased by almost double (not all from the cross calls, but they were the “footprint” that lead us down this path). So we went from an observable anomaly that previously had no means of analysis to a cause and remediation in less that 10 minutes.”
DTrace sees everything. Demonstrated with PHP.
http://blogs.sun.com/bmc/entry/dtrace_and_php_demonstrated
DTrace and Rails:
http://blogs.sun.com/bmc/entry/dtrace_on_rails
DTrace and Java + Swing:
http://blogs.sun.com/bmc/date/20050418#your_java_fell_into_my
More Java:
http://blogs.sun.com/ahl/date/20050418#dtracing_java
Dtrace + Linux (Linux is installed on top of Solaris, and Solaris runs Linux native binaries. Therefore you can use Solaris DTrace to see what is going on, in this case in Linux “top”)
http://blogs.sun.com/ahl/entry/dtrace_for_linux
As you see, DTrace sees everything. And no other can do that.
Edited 2009-04-06 08:52 UTC

2009-04-06 10:11 am
ba1l
No, that feature set is not unique to DTrace.
OProfile works on pretty much the same lines, acting as a system-wide profiler. It’s not as complex or as capable as DTrace, of course.
SystemTap is a closer equivalent to DTrace on Linux. Less mature, but probably covers 90% of the things you might want to use DTrace for.
I won’t argue that DTrace isn’t better than those options (I’ve not used either of them enough to say anything of the sort), but the way DTrace functions is certainly not unique.

2009-04-06 10:37 am
MrVain
Ok, if I claim that DTrace can see things no other probing utility can see, and I prove my claim – can you prove your claim?
Show us some links where other probing utilities extract the same detailed information as in my links.

2009-04-06 11:49 am
crisp
dtrace is great – and in fact, the many competitors are great too (kprobes, systemtap, oprofile, prof, kgdb, etc).
Dtrace is now available for linux – as the maintainer/author of the port, it is written as a compilable module you add to your kernel, i.e. no source code hacks or proprietary stuff.
Look here
http://www.crisp.demon.co.uk/blog
and source is downloadable from there too.
2009-04-06 4:37 pm
rcsteiner
It isn’t that hard to create your own logging facility and trigger various stages of logging detail by sending signals to the running process.
We do that all the time with real-time message processing applications. Most of the time they log minimal information unless we’re doing active troubleshooting, and then I can dump the world. 🙂
No special debugger or external software required. Just a logging routine, tail, and more (or grep for searching).
Of course, we’re only running dozens of trans/second max. Then again, we have four discrete levels of logging detail not including OFF, and it wouldn’t be too hard to change things.
Edited 2009-04-06 16:40 UTC

2009-04-07 9:47 am
MrVain
Wouldnt it be easier if you tried DTrace on your application? There are several OSes with DTrace that you can run your application on.

2009-04-06 4:07 pm
milek
The article states that truss can’t trace user functions and only system calls. That’s not true. See truss -u.
Edited 2009-04-06 16:08 UTC
2009-04-07 1:37 pm
Piranha
Special consideration has been taken to make DTrace safe to use in a production environment. For example, there is minimal probe effect when tracing is underway, and no performance impact associated with any disabled probe; this is important since there are tens of thousands of DTrace probes that can be enabled. New probes can also be created dynamically.
http://en.wikipedia.org/wiki/DTrace
Meaning… That you could run thousands of probes at once, with the same performance hit as running a handful. I’m not sure what can be said about the other contestants.

2009-04-07 7:31 pm
MrVain
rcsteiner
seems to claim that he could extract the same information as DTrace, easily. I wonder if he looked at my links.