Fixing bugs via lateral thinking Lessons learned by digging into obscure QML bugs
For today’s blog I would like to share with you the little adventure I had when fixing a very strange bug in Qt.
Pop quiz
Don’t think about this too much, just guess: what does this QML snippet print?
import QtQuick 2.0 QtObject { Component.onCompleted: { console.log("240000000000" == "3776798720"); } }
There are no JavaScript semantic tricks involved; and using either ==
or ===
does not change the result in any way.
Did you get it?
Pop quiz (2)
Here’s another one, this time in pure C++: what does this print?
QJSEngine engine; QJSValue value("foo"); QJSValue obj = engine.newObject(); obj.setProperty("240000000000", value); QJSValueIterator it(obj); while (it.hasNext()) { it.next(); qDebug() << it.name() << it.value().toString(); }
Right now you should be suspicious of that strange value…
Ready for it?
Solution
In Qt 5.6.2 and Qt 5.7.0 the QML test prints "true"
! According to QML the two strings "240000000000"
and "3776798720"
are totally identical.
Moreover, the C++ test prints "3776798720" "foo"
, not "240000000000" "foo"
as one could’ve imagined.
So, what is going on? Who told me to use JavaScript again?
The bug
This bug against the JavaScript engine of QML (internally called V4) was reported on Qt’s bugtracker a few days ago, and quickly gained a lot of votes (probably because it was also cross-posted to Stack Overflow).
So, what do we have here? A very popular bug report, with no assigner set, and myself who has been wanting for a while to peek into the internals of the V4 JavaScript engine… well, let’s go!
Analysis
Where do you start looking around in a codebase you’re totally unfamiliar with? The first step I took was trying to understand what values made the engine misbehave. Note that the reporter was very clever at noticing something strange with the values:
240000000000 == 0x37E11D6000 3776798720 == 0xE11D6000
By the way, “240000000000” isn’t the only value failing — “240000000001” misbehaves too (returning “3776798721”).
So maybe I needed to look for some bit pattern. “Maybe Qt is storing these strings as 64 bit integers, and using the top bits to store some additional metadata, and the integer gets corrupted somehow”, I thought.
Then I wondered: is any big number (represented as a string) problematic? No, “300000000000” works, and “200000000000” works too.
Maybe it’s a range somewhere between those two extremes?
With a very quick binary search (using Perl to generate and run QML testcases), in a few steps I was able to determine the exact range that make everything fail: any string between “238609294222” and “249108103166” inclusive. Values before and after those two worked perfectly. Do these numbers share some bit pattern?
238609294221 == 0x378E38E38D // biggest working before the problematic range 238609294222 == 0x378E38E38E // smallest not working 249108103166 == 0x39FFFFFFFE // biggest not working 249108103166 == 0x39FFFFFFFF // smallest working after the problematic range
Uhm, I can’t spot any bit pattern in there that could cause this. Indeed, this turned out to be a complete red herring.
Debugging through V4’s code
Of course, the next step is to try to step into the JavaScript engine and see if something is wrong. That turned out to be … problematic.
A JavaScript engine is not a trivial piece of code; there’s a lot of dark magic in there to give JavaScript semantics to values, and keep them allocated in efficient ways.
Most of the time, the debugger was not able to print any local whatsoever (even in a -O0 -g
build), and when it was, the locals were totally meaningless, as I was probably in memory that was getting happily reinterpret_cast()
ed across different internal datatypes. So, I simply was not able to understand at all which paths the code was going through (and why) in order to determine which one(s) could be problematic.
I realized that I needed a more structured approach at trying to find where the code paths diverged. That is (light bulb moment), I needed a way to trace the function calls and diff them. One way could’ve been: add printf
s everywhere. Or: launch multiple debugging sessions, over different testdata, and advanced them in parallel to check where executions diverge.
But, in both cases, step by step would be a huge task, given the hundreds if not thousands of nested calls into the engine for something as trivial as setting a string property. We can do better by instrumenting the binaries and getting the traces generated for us.
Function tracing
There’s a ridiculously simple way to get traces for every function call: recompile your libraries/applications with -pg
. If you also pass -O0
, -g
and other options that prevent the compiler from being too smart, you can gather extra useful data for each function call, such as the arguments passed to the functions, and the values returned.
It turns out that Qt’s configure script natively supports the -pg
option via the -profile
switch, so I simply recompiled both qtbase and qtdeclarative with it.
And now, the C++ testcase above generates accurate tracing data. How could I use them? I went for the simplest approach possible: let’s do a visual comparison.
Thanks to the awesome uftrace tool one can dump the C++ testcase’s trace straight into Chromium’s tracing flamegraph. There it is:
The trace in the top half is a working testcase, the one below a broken case.
Now the difference in behavior is quite evident!
In the broken case, QV4::Object::putIndexes
gets called, instead of QV4::Object::put
. Great! That was the hint I needed. From there on, it was kind of downhill.
Narrowing down the analysis
The point of divergence looked like this:
ScopedString s(scope, engine->newString(name)); uint idx = s->asArrayIndex(); if (idx < UINT_MAX) { setProperty(idx, value); return; } s->makeIdentifier(scope.engine); QV4::ScopedValue v(scope, QJSValuePrivate::convertedToValue(engine, value)); o->put(s, v);
The idx < UINT_MAX
check passes in the broken case, resulting in the indexed setProperty
call (which in turn calls putIndexed
), rather than the call to o->put
at the end.
So what is that s->asArrayIndex()
call anyhow? Well, my suspicion is that it’s an (important) optimization for accessing an object, using strings representing base-10 integers as keys. Arrays in JavaScript are objects that are indeed accessed this way (i.e. all arrays are actually associative arrays). So the engine tries to see if the string “looks like an integer”, and if so, uses some specialized storage (in a sparse array structure rather than a map).
But “240000000000” is an integer indeed, so what’s going on here?
asArrayIndex
does this:
uint asArrayIndex() const { if (subtype() == Heap::String::StringType_Unknown) d()->createHashValue(); Q_ASSERT(!d()->largestSubLength); if (subtype() == Heap::String::StringType_ArrayIndex) return d()->stringHash; return UINT_MAX; }
I still have no clue about what those subtypes are, but I could spot the call to createHashValue()
from the trace, therefore I knew it was getting called, therefore the string type was Unknown
.
So, time to go back to a debugger, and check what value was calculated there — and that’s indeed the broken 3776798720 value.
What does createHashValue()
do? It calls toArrayIndex
, which calculates a “hash” for the string, and possibly changes its subtype (if it detects that it is a valid index).
And that’s the culprit. The function has the bug just sitting there:
uint n = i*10 + x; if (n < i) // overflow return UINT_MAX;
That’s not a valid overflow check for multiplication. (It’s actually the check for detecting unsigned integer overflow for an addition.) That failing check was making the engine think it could treat such a big number as a 32-bit unsigned integer index into a sparse array. So, fix that, and job done!
Conclusions
It’s always rewarding to come up with the right idea at the right time to solve a very complex problem :).
In the meanwhile, the fix for this very strange bug has been submitted to Qt here, and it’s already merged in qtdeclarative (commit). The fix will be released in some future version of Qt; until then, talk to our consulting services if you need this issue fixed earlier.
In the meanwhile, happy hacking!
About KDAB
KDAB is a consulting company offering a wide variety of expert services in Qt, C++ and 3D/OpenGL and providing training courses in:
KDAB believes that it is critical for our business to contribute to the Qt framework and C++ thinking, to keep pushing these technologies forward to ensure they remain competitive.