C/C++ Profiling Tools
This blog post gives you a brief overview of profiling C and C++ applications and of the main tools available, to help you choose the right tool at the right time.
The Steps for Profiling
Before we look at the actual tools, let’s go over the steps for profiling. It’s important to have a proper technique here, to avoid the trap of changing something in the hope that it’s faster, committing it, and going home without checking that you’ve actually improved anything. The starting point is to assess what matters for performance in your project. Is it CPU usage? Is it off-CPU time, when your application is sleeping or waiting for something to happen? Is it memory allocations? Is it battery usage? Do you want to improve the frame rate? It can be many different things, not just one; it’s a whole set of measures.
1. First, assess what is important for your project: what do you want to measure?
2. Then decide which tools you can use to make those measurements. There are many different tools, and each covers a certain set of things you can measure.
3. Once you have selected your tools, write a benchmark, that is, a reliable way of measuring something in your application. If you have to start the application, click here, click there, load a file, wait for something to be downloaded, and so on, you won’t be able to do that three times in a row and get the same measurements. It’s much better to write some sort of automated test, like a unit test but for benchmarking rather than correctness, that measures the performance of one specific task. You decide what to measure and how to measure it, then write a benchmark so you can measure it reliably.
4. Use one of the tools to run the benchmark and establish a baseline, before you make any changes to the code (a rough sketch of this step follows after the list).
5. Only now, at step five, do you change the code. After the change, measure again, compare the result to the baseline, and decide whether the change is an improvement. That can be tricky if it’s better on one measure and worse on another; you then need to decide whether that’s good enough or whether the solution needs refining.
From there, you iterate: go back to step 4, profile again, apply your next change in step 5, measure once more, and so on.
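As a rough sketch of step 4, assuming you have built your benchmark as a small standalone executable (the name ./load_file_benchmark is purely hypothetical), you can let perf run it several times and report how stable the numbers are:
perf stat -r 10 ./load_file_benchmark   # run the benchmark 10 times; prints the mean and standard deviation of each counter
Keep that output as your baseline; after every code change in step 5, run the same command again and compare against it.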
The Tools to Use
In order to choose the best tools for the job, it’s important to be aware of all the tools that are available and when to use them.
For Measuring Performance
Let’s talk first about measuring performance, specifically CPU and off-CPU performance.
VTune
To measure performance, you can use VTune, which is made by Intel. It’s a very powerful tool with a very nice user interface. VTune is available for both Linux and Windows, and it’s free if you download it as part of Intel System Studio: don’t look for it separately as VTune, but as part of the Intel System Studio suite of tools. It’s free to use, even for commercial use. VTune does, however, have one limitation: it requires Intel hardware, which means you can’t use it on AMD CPUs or on ARM, for example on embedded boards. Apart from that one limitation, it’s a very good tool.
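Although you will usually drive VTune from its GUI, it also ships a command-line collector, which is handy for scripted runs. A minimal sketch, assuming a recent VTune release and a hypothetical ./myapp binary (older releases used the command name amplxe-cl instead of vtune):
vtune -collect hotspots -- ./myapp   # sample CPU hotspots while the application runs
# then open the generated result directory (r000hs or similar) in the VTune GUI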
Perf
Another tool you can use to measure performance is perf, which is part of the Linux kernel. That means it supports all of the architectures of the Linux kernel, including x86, ARM, PPC, and so on. Unfortunately, perf has no user interface. It’s a command line tool that is pretty difficult to use. So, we at KDAB wrote a tool called Hotspot, which is a graphical interface for the measurements made by perf.
You can find Hotspot on GitHub. It’s an open source application that you can use for free. Its goal is to be easy to use, and it covers most of the common use cases, including showing the CPU time used by the application and finding out which code is using that time. It also supports measuring off-CPU time, meaning the time when the application is sleeping or waiting for something to happen. Click here to watch a full demo of Hotspot.
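As a rough sketch of that workflow, assuming a hypothetical ./myapp binary built with debug information:
perf record --call-graph dwarf ./myapp   # sample the application with DWARF-based call stacks
perf report                              # browse the samples in perf's own text UI
hotspot perf.data                        # or load the same recording into Hotspot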
For Measuring Memory Allocations
Another thing you might want to measure, as previously mentioned, is memory allocations — not just memory leaks but also the use of memory while the application is running.
Valgrind Massif
If your application cleans everything up at exit, heavy memory usage will not show up as a leak, so the leak-checking tools will not help. But if your application is using too much memory while it’s running, you’ll want a different kind of tool, one that can pinpoint where those allocations happen. One tool that does that is Valgrind’s Massif. It does the job quite well, but it is really slow.
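A minimal sketch, assuming a hypothetical ./myapp binary:
valgrind --tool=massif ./myapp   # run the application under Massif; expect a significant slowdown
ms_print massif.out.<pid>        # print the recorded heap snapshots (<pid> is the process id Massif appends to the file name)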
Heaptrack
Another approach is to use Heaptrack, an open source tool that’s part of KDE. It was developed by one of my colleagues, Milian Wolff. Heaptrack records all of the memory allocations made while the application is running and then shows you graphs of those allocations, including temporary allocations, that is, allocations where the memory is freed right away afterwards.
Such temporary allocations may be intentional, of course, but they are often something to optimize. Heaptrack also breaks allocations down by size, so you can see whether you mostly allocate small blocks or large ones. And of course it shows you a backtrace for every allocation, so you can relate the application’s memory usage back to your code and find out which piece of code is responsible. It is a lot faster than Valgrind Massif; you barely notice the overhead while Heaptrack is recording.
Another feature that Heaptrack has over Valgrind is that you can attach it to a running program. This is quite useful if you want to measure only one operation, as opposed to the whole setup of the application. It can also show you the difference between two runs, which is extremely useful if you apply the process described earlier: you measure the baseline, measure again with your changes, and Heaptrack shows you only the difference between the two, so you can see by how much your changes improved things.
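A minimal sketch, assuming a hypothetical ./myapp binary (the exact output file name and compression suffix depend on your Heaptrack version):
heaptrack ./myapp                        # record every allocation; writes something like heaptrack.myapp.<pid>.gz
heaptrack -p $(pidof myapp)              # or attach to an already running process instead
heaptrack_gui heaptrack.myapp.<pid>.gz   # analyze the recording; the GUI can also load a second recording to compare against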
For a full demo of Heaptrack, watch this video.
Need More Details?
For more details about how to use all of these tools, check out the Profiling and Debugging videos on our YouTube channel. If you would like to learn even more about these tools than what’s in the videos, we at KDAB also offer a 3-day training on Profiling and Debugging C and C++ Applications.
For performance measurement, you can also use valgrind --tool=callgrind and the graphical tool kcachegrind for analyzing.
Hi Martin! Indeed, this is another option. I have, however, stopped using it because it’s very, very slow (acceptable on benchmarks, but unusable on real-world applications); perf+hotspot gets results much faster. The only thing callgrind/kcachegrind can give us that perf+hotspot can’t is call counts (how many times a given function was called). But the primary question we’re usually asking ourselves is which functions we spend the most time in, which both can answer, but Hotspot can answer much faster; it’s then often clear (or easy to find out) whether that’s because the function takes a very long time or because it is called too many times.
The thing I miss the most with perf is tracing and/or being able to easily delimit what part of the application you want to analyse.
Perf is great in the sense that it can collect a ton of information, but as a developer you already know which parts of your software need improvement. Collecting the whole execution can create huge perf.data files that take a while to load. I’ve found that by sending a SIGUSR2 signal you can direct the sampling process to move on to a new dump file, but I’ve always found that cumbersome to use.
On the other hand, Intel VTune can open (or at least it used to open) perf files! For some reason the GUI does not find them under the default filename “perf.data”; it expects a “.perf” extension, so if you rename your files it will open them correctly. The reports are obviously lackluster compared to the ones you can generate with Intel’s own sampling kernel module.
Hello Jorge,
I agree that collecting the whole execution leads to too much information. This is why I usually only sample 5 seconds of the intense activity (moment of high CPU usage) I’m interested in.
I do this with something like: perf record -z --call-graph dwarf -p `pidof MyApp`
and then press Ctrl+C after 5 seconds.
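For what it’s worth, an alternative to pressing Ctrl+C is to let perf stop by itself when a helper command exits, for example:
perf record -z --call-graph dwarf -p `pidof MyApp` -- sleep 5   # records the target process for roughly 5 seconds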
VTune is *actually* available as a standalone download: https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler-download.html
😉
You’re completely right. The text in this blog is based on a YouTube video I recorded some time ago, and since then VTune has indeed become available separately; I forgot to update this bit in the blog. Thanks for pointing it out for our readers!
Great thanks for the continuous effort on Hotspot and Heaptrack. They’re really useful and friendly.