How to declare a qHash overload
Today’s blog post is about something that should be simple and apparently it causes trouble: how to declare a qHash overload for a custom datatype. This is necessary when we want to use custom datatypes as keys in a QHash. From the documentation:
A QHash’s key type has additional requirements other than being an assignable data type: it must provide operator==(), and there must also be a qHash() function in the type’s namespace that returns a hash value for an argument of the key’s type.
In other words, given a class like:
namespace NS { class EXPORT MyClass { A m_a; B m_b; C m_c; D m_d; ~~~ }; }
How do we implement its qHash() overload? Let’s find out! (Most of this applies to std::hash too.)
Look at its operator==
The first thing to look at is the implementation of operator== (or operator<=>, coming C++20).
Assuming the implementation is sane, then the code will tell us which members are relevant for equality and which ones are not (caches, tags, whatever). The ones not relevant shall not be hashed (otherwise, you might end up with objects that compare equal and have different hashes).
If the type in question has a botched operator==, then either fix it if possible, otherwise kill the corresponding qHash() overload via = delete. This will avoid having people re-adding it in client code.
If the type does not have an operator==, or cannot have a meaningful one, do not provide qHash() at all.
What’s a “sane” implementation? Generally one that recursively uses plain operator== on the member subobjects. Fuzzy comparisons are a bad idea, making the type not hashable efficiently.
Prepare for Qt 6 (optional)
Optional step: prepare for some source-incompatible changes in Qt 6 (the impact is still unknown; possibly just warnings). Qt 6 changed the return type of qHash() from uint to size_t, so you can guard yourself with something like:
#if QT_VERSION < QT_VERSION(6, 0, 0) using qhash_result_t = uint; #else using qhash_result_t = size_t; #endif
Declaring your qHash overload
The idiomatic approach is:
namespace NS { // 1. add a forward declaration for the class class MyClass; // 2. consume the forward declaration for the class, by // forward declaring its qHash overload EXPORT qhash_result_t qHash(const MyClass &c, qhash_result_t seed = 0) noexcept; class EXPORT MyClass { ~~~ // same as before // 3. add: friend EXPORT qhash_result_t qHash(const MyClass &c, qhash_result_t seed) noexcept; }; }
The reason for going all the trouble with 1+2 is because friend declarations cannot have default parameters unless you’re also defining the function (cf. [dcl.fct.default]). Although QHash will never call your qHash overload without a seed, you may want to unit test it, call it from your own containers that aren’t seeded, etc.; so it’s just “polite” to have the second argument defaulted.)
In case your implementation is entirely inline, you can skip 1+2 and go just with 3, adding the default argument there, and defining (not just declaring) qHash in the body of the class. This would also make the qHash a hidden friend, which would be generally a good thing (see here, here and here for more information about hidden friends).
Using just 3 is not doable in general (e.g. if MyClass is pimpled), in which case your qHash must be out of line, so you have 1+2+3.
Whatever way you go, this will put your qHash overload in the surrounding namespace of your class, which is what you want (qHash is called via argument-dependent lookup in Qt).
Implementing your qHash overload
Then, implement it (inline or outline depending on MyClass). The idiomatic way to do so is to look at your operator==:
bool operator==(const MyClass &lhs, const MyClass &rhs) noexcept { return lhs.a == rhs.a && lhs.b == rhs.b && ~~~; }
Build your qHash exactly like it: on the same fields, in the same order. This explains why you want qHash to be a friend, just like operator==, to spare going through accessors:
qhash_result_t qHash(const MyClass &c, qhash_result_t seed) noexcept { QtPrivate::QHashCombine hash; seed = hash(seed, c.a); seed = hash(seed, c.b); seed = hash(seed, c.c); ~~~ return seed; }
QtPrivate::QHashCombine is private API, but does not require any special buildsystem magic; it’s in <qhashfunctions.h>, a public header.
The same result can be obtained with the convenience that I have added in Qt 6:
qhash_result_t qHash(const MyClass &c, qhash_result_t seed) noexcept { return qHashMulti(seed, c.a, c.b, ~~~); }
Reusing the same fields as operator==, in the same order, ensures the casual reader of the code that your hashing is correctly implemented.
If two objects are to be considered identical even if they have their data in a different order, then use QHashCombineCommutative / qHashMultiCommutative. For instance, you must use the commutative versions if you have a class that represents a set of elements, and two sets are equal if they have the same elements, even if stored in a different order. This again can be detected by looking at operator== and spotting a call to std::is_permutation or similar.
However, if your class simply uses a set, then in your operator== you will compare those sets via their own operator==, which means that hashing for your class will still use the non-commutative qHashMulti.
Things not to do in your qHash implementation
Summing
inline size_t qHash(const QQuickShapeGradientCacheKey &v, uint seed = 0) { uint h = seed + v.spread; for (int i = 0; i < 3 && i < v.stops.count(); ++i) h += v.stops[i].second.rgba(); return h; }
A simple sum usually doesn’t spread different values apart enough for the hashing to be effective.
XORing
inline size_t qHash(const QV4Debugger::BreakPoint &b, uint seed = 0) Q_DECL_NOTHROW { return qHash(b.fileName, seed) ^ b.lineNumber; }
This cancels information out (the seed, the fields themselves).
Getting creative with magic numbers
inline size_t qHash(const QQuickPixmapKey &key) { return qHash(*key.url) ^ (key.size->width()*7) ^ (key.size->height()*17) ^ (key.frame*23) ^ (key.region->x()*29) ^ (key.region->y()*31) ^ (key.options.autoTransform() * 0x5c5c5c5c); }
Completely “magic” prime sequence, raising many questions on the values used.
Being out of sync with your operator==
inline bool operator==(const QQuickPixmapKey &lhs, const QQuickPixmapKey &rhs) { return *lhs.region == *rhs.region && *lhs.size == *rhs.size && *lhs.url == *rhs.url && lhs.options == rhs.options && lhs.frame == rhs.frame; }
This is the same datatype as the previous example. How can you know, from a quick glance, if the qHash overload is correct? (Not good, just correct; equal objects yield equal hashes.)
Creating lots of unnecessary copies/temporaries
size_t qHash(const QQuickTextNodeEngine::BinaryTreeNodeKey &key) { return qHash(qMakePair(key.fontEngine, qMakePair(key.clipNode, qMakePair(key.color, key.selectionState)))); }
This is because qMakePair copies. Not necessarily a big deal in case of small, trivially copiable types; but qHashMulti would just lead to cleaner code, and possibly spread the hashing more equally.
Hashing Qt datatypes for which Qt does not provide a qHash overload
All the above discussion refers to the case where the datatype is a user-defined one. Sometimes we may need to build a QHash using a Qt datatype as a key, and we find out that the data type does not have a qHash overload. What do we do in this case?
- Report a bug. Any Qt datatype that provides an operator== should also provide a qHash overload.
- Roll your own qHash, but protected by a Qt version check that blocks the usage of a higher version of Qt. That code must be revisited when Qt itself gets upgraded — if a new version introduces a qHash overload, your code may break in interesting ways (e.g. ODR violation).
- (“Worst case scenario”) Ditch QHash for std::unordered_map and a custom hasher (following the same reasoning of 1+2, you don’t want to specialize std::hash for Qt datatypes), or use a wrapper type.
Hashing std:: datatypes via qHash
Unfortunately, you really shouldn’t do it. First, it should be Qt’s job at providing hashers for datatypes defined by the Standard, not yours, so the previous point applies.
Second, it’s impossible to do it reliably, because a qHash overload for such datatypes should be declared in the std namespace, and that’s not allowed.
An overload in any other namespace will not work correctly. Given your rolled out qHash for an arbitrary std::foo datatype:
// not in namespace std qhash_result_t qHash(const std::foo &foo, qhash_result_t seed = 0) { return ~~~; }
Then merely using a QHash will lull us into a false sense of security, as this code will work:
QHash<std::foo, bar> h; h.insert(~~~, ~~~);
But many other usages will be, unfortunately, broken. Suppose you just use the datatype as a data member of a class:
struct MyStruct { std::foo m_a; int m_b; ~~~ };
And then you want to define its qHash overload just like we’ve seen before:
qhash_result_t qHash(const MyStruct &s, qhash_result_t seed = 0) { // (very likely) ERROR: qHashMulti does not find qHash(std::foo) return qHashMulti(seed, c.m_a, c.m_b); }
How is this possible? This is due to how ADL works, combined with 2-phase name lookup. Suppose that you have this little piece of code:
// calls f() on a T object template <typename T> void call_f() { T t; f(t); } struct S {}; void f(S); void use() { call_f<S>(); }
This code works, even if S and f(S) were declared after call_f(). This is what gives us the power of using qHashMulti (AKA call_f) that can use qHash overloads on arbitrary types (AKA f(S)), even if these overloads got introduced after qHashMulti‘s own definition.
If we change the code just slightly, however, we have a problem:
// same as before template <typename T> void call_f() { T t; f(t); } #include <stdfoo> void f(std::foo); void use() { call_f<std::foo>(); }
This code now does not compile (surprise!). The reason is that the call to f() inside call_f() will only look in some scopes:
- it will check if a f(std::foo) was declared before call_f() (it wasn’t)
- it will check for a f(std::foo) in std::foo‘s associated namespaces. Which is namespace std, not the one we’re in!
To make the example work, we would need to do something like this:
// same as before template <typename T> void call_f() { T t; f(t); } #include <stdfoo> namespace std { void f(foo); } void use() { call_f<std::foo>(); }
… which would be OK for just about any other namespace. Not for namespace std, though: as explained before, we are not allowed to declare an arbitrary new function in that namespace.
Moving the definition of f(std::foo) before call_f would work (this would mean defining our qHash(std::foo) before including the definition of qHashMulti). But if we do so, we lose the convenience of using qHashMulti (as well as qHash for any other fundamental type or Qt type!).
You can play with this code here on Godbolt.
Long story short: due to qHash design, it’s not possible to conveniently and reliably hash datatypes defined by the Standard. You can use workarounds, e.g. define wrapper types and use such types as keys in your QHash objects.
Bonus points
As in, nice things to have.
- qHash overloads should really really be noexcept. Throwing from a hashing function sounds wrong on multiple levels.
- qHash overloads are good candidates for being marked as pure functions, that is, functions that only read memory, never modify it (or have any visible side effects). Such markers should help the compiler optimizing the code further.Qt defines a couple of convenience macros for this. One is Q_DECL_PURE_FUNCTION, that means that the function only reads global memory. There’s also Q_DECL_CONST_FUNCTION, as a even more strict version: the function only reads the values of the parameters, not any memory at all. Note that passing a reference/pointer and reading through it makes a function PURE, not CONST (something like strlen is PURE). Use CONST with a qHash overload where you’d pass an argument by value because it’s a small trivially copiable type like QPoint, QVector3D or similar, otherwise use PURE.
Why do I bother with all this?
Because declaring “ordinary” hashing functions should be straightforward, unless you have profiling data to believe that you can do better than the idiomatic way. I keep seeing code that makes it look dark magic. It’s not!
Ideally, in some not-too-far future, we may reach a level where the compiler will autogenerate the hash function for us — just like in C++20 it may autogenerate the implementation of operator== for us!
Thanks for reading!
If you like this article and want to read similar material, consider subscribing via our RSS feed.
Subscribe to KDAB TV for similar informative short video content.
KDAB provides market leading software consulting and development services and training in Qt, C++ and 3D/OpenGL. Contact us.
Nice summary, thanks! Three points I’d like to comment on:
First, you can always make the qHash() function a hidden friend, because you can define it inline to call a private (or even public) member function, say MyType::hash(qhash_result_t seed = 0) const.
Second, beware of mixed-type equality comparison. A std::string isn’t equality-comparable to a std::u16string, so it’s ok for them to just hash the bits, but a QLatin1String is quality-comparable to a QString, but they, too, just hash the bits these days, which means that qHash(QLatin1String("foo")) != qHash(QString("foo")), even though QLatin1String("foo") == QString("foo"). This is a gotcha often overlooked and, IMHO, one of the major stalling points of mixed-type hash table lookup. Try to do better in your own code bases.
Third, please, never implement a global entity for a type you don’t own. Don’t add relational operators to types you don’t own, don’t add hashing, don’t add Q_DECLARE_TYPEINFO() or Q_DECLARE_METATYPE(), etc.
If you find you need any of these things for a type you don’t own, you have two options: a) wrap the type in a struct and add these things for that your struct, or else write custom hashers or relational operators as named lambdas and pass them into algorithms and data structures for use in lieu of std::less and std::hash (not an option with the Qt containers).
And yes, both Qt and std should really get their act together and provide a complete set of “global entities” (even if =deleteed) for all their Regular Types.