Have you ever looked at a software trace and thought, "If only I knew which line of code is responsible for this event, or which part of my code is being interrupted by that context switch"?
We have implemented a mechanism that collects stack traces of a predefined depth for selected events in a software trace. This shows exactly which code path is being taken, and thus helps in understanding exceptional situations, unexpected latencies, and deadlock conditions. Moreover, the turnaround from finding a problem to fixing it in the code is much shorter than with traditional methods.
In my talk, I will cover:
- The motivation for our work (the end-user benefits of having stack traces), best shown with a live example
- The data format (how we encoded the stack traces in our tracing format, and whether it could go into CTF)
- Symbol resolution, and how TCF and the debugger provide a universal engine for resolving symbols along the way
- Visualization and tooling
- Intrusiveness of the solution
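
To make the core idea concrete, here is a minimal Python sketch of attaching a depth-limited stack trace to selected event types only. It is an illustration under assumptions, not the mechanism or data format presented in the talk: the names `STACK_DEPTH`, `EVENTS_WITH_STACK`, `record_event`, and the event kinds are all hypothetical.

```python
import time
import traceback

# Hypothetical names throughout: this is not the tracing format from the talk.
STACK_DEPTH = 4                      # predefined maximum number of frames per event
EVENTS_WITH_STACK = {"ctx_switch"}   # only these event kinds carry a stack trace

trace = []  # in-memory event log, oldest event first


def record_event(kind, **payload):
    """Append an event; attach a depth-limited stack for selected kinds only."""
    event = {"ts": time.monotonic(), "kind": kind, **payload}
    if kind in EVENTS_WITH_STACK:
        # Ask for one extra frame, then drop the last entry (record_event itself),
        # leaving at most STACK_DEPTH innermost callers, ordered outermost first.
        frames = traceback.extract_stack(limit=STACK_DEPTH + 1)[:-1]
        event["stack"] = [(frame.name, frame.lineno) for frame in frames]
    trace.append(event)


def worker():
    record_event("ctx_switch", cpu=0)   # selected kind: gets a stack trace
    record_event("irq", number=7)       # not selected: recorded without a stack


def main():
    worker()


main()
for event in trace:
    print(event["kind"], event.get("stack"))
```

Capturing the stack only for selected events, and capping its depth, bounds the extra cost per event, which is the trade-off the last bullet (intrusiveness) refers to.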