How to use helgrind to debug multithreaded Qt applications

Author: David Faure, KDAB.

Alternatives

Before we talk about helgrind, please know that these days I recommend thread sanitizer (TSAN) as the primary way to detect data races (it makes the application run much faster than helgrind, and doesn't have false positives). But it requires a 64-bit architecture, and it requires recompiling everything (including Qt). So if you're on 32-bit, or if you're too lazy to rebuild Qt, here's a howto about helgrind.

Helgrind introduction

You've probably heard of valgrind before. Its default tool (memcheck) is such a life saver, being able to detect memory-related bugs in your code (leaks, double deletions, use of deleted memory, use of uninitialized memory, etc.).

Well, it turns out that valgrind also comes with a tool to detect race conditions between threads, in multithreaded applications. That tool is called helgrind. (There is also another tool called "drd", but I don't know the differences, and I have no experience with drd.)

In theory, provided that you're on a Unix platform, using helgrind is as simple as

valgrind --tool=helgrind myapplication

However, if you do just that on a Qt application, you'll end up digging through lots of false positives, making this a rather painful experience. So let's have a look at what is needed exactly to debug Qt5 and Qt6 applications with helgrind.

Valgrind

In order to benefit from a large number of fixes in helgrind itself (support for Qt5 QMutex, fix for the "destruction of unknown cond var" bug), you need valgrind >= 3.9.0.
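
To check whether the installed valgrind is recent enough, something like the following can be used. This is a minimal sketch: the helper name version_ge is made up for this example, and it assumes a sort that supports -V (GNU coreutils) and that "valgrind --version" prints a string of the form "valgrind-X.Y.Z".

```shell
#!/bin/sh
# version_ge A B: succeeds if dotted version A is >= version B.
# Hypothetical helper; relies on "sort -V" (assumed to be available).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v valgrind >/dev/null 2>&1; then
    # Strip the assumed "valgrind-" prefix from the version string.
    installed=$(valgrind --version | sed 's/^valgrind-//')
    if version_ge "$installed" 3.9.0; then
        echo "valgrind $installed is recent enough"
    else
        echo "valgrind $installed is too old, need >= 3.9.0"
    fi
else
    echo "valgrind not found in PATH"
fi
```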

Qt

  • Make sure to configure Qt in debug mode, for the atomic suppressions (defined further down) to work correctly, and to be able to patch it.

My contributions

In case you're curious, I fixed the following issues in Qt:

  • QFuture: race on d->state, fixed in Qt 5.0 (commit 7120cf16d)
  • QThreadDataPrivate: canWait race, fixed in Qt 5.1 (commit bf3a5cc) (backported to Qt 4.8.5 in commit 815d7f0)
  • QThread: race when setting the eventDispatcher, fixed in Qt 5.1 (commits f4609b2 and 85b25fc)
  • QEventDispatcherUNIX: race on the interrupt bool, fixed in Qt 5.1 (commit 49d7e71)
  • QEventLoop::exec()/exit() race, fixed in Qt 5.1 (commit 5a5a092)
  • QThreadPool: races in activeThreadCount(), fixed in Qt 5.2 (commit 85b24bb2de)
  • QThreadPool: race at time of thread expiry, fixed in Qt 5.3.0 (commit a9b6a78e54)
  • qfreelist: race on v[at].next, fixed in Qt 5.3.1 (commit 8636bade17)
  • qDebug: race on QLoggingCategory, fixed in Qt 5.3.2 (commit 884b381576)
  • qDebug: race in qt_message_print, fixed in Qt 5.3.2 (commit 9ee27005ee)
  • QJpegHandler: race condition due to static variable, fixed in Qt 5.5 (commit 211c6f3dc7)
  • QThread/QThreadData: two races in destructors, fixed in Qt 5.6 (commit ec6556a2b9)
  • QSignalSpy: race between wait() and emit from another thread, fixed in Qt 6.8 (commit c837cd7593)
  • QObjectPrivate: race on ConnectionData contents, fixed in Qt 6.8 (commit 75d82afa0d)

The older your version of Qt, the more of these issues might show up.

Suppressions

It wasn't possible to make helgrind perfect for Qt. In particular, helgrind has no way to distinguish a raw store to an int from an atomic store to that int, because on x86 there is no difference. For this reason, I used the poor man's solution: defining suppressions for all uses of the Qt atomic classes. Being able to use helgrind to fix all the abuse of normal (non-atomic) variables in multithreaded apps is already a huge step forward, even if we (wrongly) tell it that "any use of the atomic API is fine".

These suppressions are in the kde.supp file of the kde-dev-scripts repository:

cd ~
git clone https://invent.kde.org/sdk/kde-dev-scripts.git
export VALGRIND_OPTS="--num-callers=50 --suppressions=$HOME/kde-dev-scripts/kde.supp"

The export should probably go into your ~/.zshrc or ~/.bashrc, so you have it set up once and for all.
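
When the export lives in a shell startup file, it can help to set the variable only if the suppressions file is actually there, for instance on a machine where the repository has not been cloned yet. The following is a minimal sketch of such a guard. The function name valgrind_opts_for is made up for this example, and the path is the one used above.

```shell
# Print the VALGRIND_OPTS value for the given suppressions file,
# or warn and fail if the file is not readable.
# "valgrind_opts_for" is a hypothetical helper name.
valgrind_opts_for() {
    if [ -r "$1" ]; then
        printf '%s\n' "--num-callers=50 --suppressions=$1"
    else
        echo "warning: suppressions file $1 not found" >&2
        return 1
    fi
}

# Only export the variable when the file exists.
if opts=$(valgrind_opts_for "$HOME/kde-dev-scripts/kde.supp"); then
    export VALGRIND_OPTS="$opts"
fi
```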

Helgrind alias

In addition to detecting race conditions, helgrind also tries to detect potential deadlocks due to wrong locking order (A+B vs B+A). However, the QOrderedMutexLocker trick in Qt confuses helgrind because of its interesting use of tryLock(), so the lock order feature of helgrind has to be disabled for now, using --track-lockorders=no. See bug 243232.

The default event dispatcher in Qt uses the glib event loop, which has its own races, which we're not really interested in. Easy solution: export QT_NO_GLIB=1

For these two reasons, I recommend adding this line to your ~/.zshrc (or ~/.bashrc for people who haven't tried zsh yet):

alias helgrind="QT_NO_GLIB=1 valgrind --tool=helgrind --track-lockorders=no"
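
An alias defined in ~/.zshrc or ~/.bashrc is typically not available inside scripts (test runners, CI jobs). For that case, a shell function with the same options can be pasted into the script itself. This is only a sketch: the name run_helgrind and the VALGRIND override variable are assumptions made for this example, not something valgrind or Qt define.

```shell
# Hypothetical wrapper; named differently from the alias so the two
# do not collide when both are defined in the same shell.
# VALGRIND is an assumed override hook (e.g. for a valgrind built from
# source); it defaults to the "valgrind" found in PATH.
run_helgrind() {
    QT_NO_GLIB=1 "${VALGRIND:-valgrind}" --tool=helgrind --track-lockorders=no "$@"
}

# Usage: run_helgrind ./myapplication --some-app-option
```

All arguments are passed through unchanged, so application options can follow the program name as usual.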

Happy debugging!

David Faure.