Last but not least, what if the code is evenly slow ? Profilers point you to bottlenecks, but if everything is as slow or as fast, nothing stands out. The results will be meaningless. Bad performance is usually a matter of a "death by a thousand cuts", little inefficiencies so uniformly spread through the codebase that they go unnoticed by most tooling.Consequently, while profilers are great and have their place, I rather talk about complementary techniques to tackle the problem of the 5%-10% layer of cruft that goes undetected.Enter Clang:
C++ compilers are mostly simple-minded, while they understand the syntax of the language, they won't complain if you don't follow, for example, Boost, STL or Qt best practices.
Wouldn't it be great if compilers operated at an higher semantic level and understood more than C++ ? They could hint your STL algorithm would benefit from std::vector::reserve() calls, warn about Qt container detachments or even automatically rewrite your code to follow best practices regarding QStringLiteral and QLatin1String.
Fortunately the Clang project lets you do just that. Clang is a C/C++ frontend for the LLVM compiler infrastructure. It exposes a nice and modular API that allows you to tap into the build and hook in custom AST visitors which emit your own warnings and errors.
So, motivated by the fact that using grep and regular expressions wasn't cutting it for me any more I decided to see how easy it was to write a clang plug-in and make the compiler work for me.
After a couple days of hacking and very few lines of code the clazy static checker is born:
$ clang++ -Xclang -load -Xclang ClangClazy.so -Xclang -add-plugin -Xclang clang-lazy -I /usr/include/qt/ -fPIC -std=c++11 -c test.cpp
a.cpp:8:1: warning: non-POD static [-Wclazy-non-pod-global-static]
static QString s_string;
^
a.cpp:24:13: warning: Reserve candidate [-Wclazy-reserve-candidates]
structs.push_back(TestStruct());
^
a.cpp:37:21: warning: Missing reference on large type sizeof std::vector<TestStruct> is 24 bytes) [-Wclazy-function-args-by-ref]
void initialize(std::vector<TestStruct> structs)
^
a.cpp:39:9: warning: Use QHash<K,T> instead of QMap<K,T> when K is a pointer [-Wclazy-qmap-with-key-pointer]
QMap<TestStruct*, int> shouldBeQHash;
^
a.cpp:40:18: warning: Missing reference in foreach with sizeof(T) = 4000 bytes [-Wclazy-foreacher]
foreach (auto s, structs)
^
a.cpp:63:21: warning: Don't call QVector::first() on temporary [-Wclazy-detaching-temporary]
TestStruct ts = tc.getVector().first();
^
a.cpp:70:17: warning: QString(QLatin1String) being called [-Wclazy-qstring-uneeded-heap-allocations]
QString s = QLatin1String("literal");
6 warnings generated.
The checks related to memory allocations actually make a big difference and frequently appear under profilers. Missing reserve() calls and temporary QStrings due to QLatin1String/char* misuse create many small heap allocations resulting in internal fragmentation and we know most allocators aren't very keen about returning resources back to kernel.
Can you think of any interesting check ? Please leave a comment.
I hope to have opened your appetite and curiosity, it wasn't so long ago when most static analysers were primitive and based on regular expressions.
C++ has since made great strides in improving the language, but the tooling is advancing just as fast.
31 Comments
7 - Sept - 2015
Jesper Juhl
Since you are asking for ideas for performance related checkers I thought I'd post a few small ideas.
Check for comparisons of .size() against 0 instead of calling .empty()/.isEmpty()
Check for use of post-increment where pre-increment could have been used just as well.
Check for loops that open-code std::fill() rather than calling the algorithm.
Check for loops that iterate column-major rather than row-major. Since the latter usually has better cache performance characteristics.
7 - Sept - 2015
Clare Macrae
It would be great to have a check for passing shared_ptr parameters by value - and better still still to have a tool that converted such calls to passing by const reference!
8 - Sept - 2015
mlim
@Clare,
You might be interested in this CppCon talk on automated refactoring for C++, which covers ideas like automatically fixing common anti-patterns:
https://www.youtube.com/watch?v=ZpvvmvITOrk
7 - Sept - 2015
Jean-Michaël Celerier
And more broadly-scoped ones : - Long types used in more than one place (e.g. QMap) which should be given a name instead
- Code that would benefit from inlining (dunno if it's doable / computable ? maybe if it's very few LLVM instructions ?)
8 - Sept - 2015
Jesper Juhl
Check for use of 'new' to initialize unique_ptr/shared_ptr rather than using make_unique/make_shared.
Suggest using 'extern template' if the same template is instantiated in multiple compilation units (to improve build performance, not runtime performance).
Suggest using emplace()/emplace_back() rather than insert()/push_back() if the object in question is an rvalue.
Suggest removing virtual if a class has virtual functions but no other classes inherit from it and the virtual functions also do not override any from base classes.
If a class has a move constructor that is not 'noexcept', suggest to make it so. So that it can be used with functions that use std::move_if_noexcept to determine whether to move or copy objects (for example std::vector::resize()).
8 - Sept - 2015
trackvegeta
It seems that your plug-in is made for checking the use of Qt.
It will be great to separate the different kinds of checkings you do.
You can make a plug-in with several kind of checkings
1) Check code before c++11 2) Check c++11 code 3) Check code for specific libraries (like Qt ...)
It will be great to separate the different kinds of checking into several plug-in.
8 - Sept - 2015
mlim
Check if a class is following the "rule of zero" -- if it defines any of the 5 auto-generated functions (dtor, copy ctor and assignment, move ctor and assignment), it should declare all of them.
Even continuing to follow the "rule of three" under C++11 can leave some performance on the table because if one of the auto-generated functions appears, the compiler won't generate move operations implicitly.
9 - Sept - 2015
Paul Barbot
Very Nice, I was considering using libclang to compute the "sizeof" of all the class/struct defined in a project. Your checks takes in account object usage which is even better.
Maybe integrate your checks with clang-tidy(http://clang.llvm.org/extra/clang-tidy/index.html). This tool is handy if you just want the analysis without the compilation/code-emission.
11 - Sept - 2015
Jesper Juhl
A few more ideas:
check for uses of static's in header files. They will get a copy in multiple compilation units, and if the linker is not smart (enough) about removing them they can bloat your libraries and executables (which can have a performance impact).
Check for classes/structs where shuffling the order of member variables can significantly reduce the amount of padding the compiler has to add. Smaller objects are always nice, but when reordering members can bring a class size down from occupying more than 1 cache line to less than one then there can really be significant performance gains.
Look for code that uses "out parameters" that should just use a normal return value. The RVO and NRVO optimizations can kick in for normal returns - not so much for out parameters.
Check for empty user-defined implementations dof destructors and default constructors and suggest that they either be deleted or defaulted (=default) instead. When user-defined they are by dwfinition never trivial, but when they are trivial they can often enable compiler optimizations that are otherwise not available (especially if making those functions trivial makes the entire type POD).
11 - Sept - 2015
Jesper Juhl
By the way. Would you appreciate outside contributions on the code or would you rather just get ideas and input and implement them yourself? I think a tool like this is a great idea and I'd be happy to contribute (with what little time I have (which may be zero)). But I won't bother if you'd rather just have suggestions and evolve it on your own...?
11 - Sept - 2015
Sérgio Martins
I would absolutely accept contributions, I'm even moving the repo to KDE playground where everyone can participate.
I wonder however which types of new checks we want. For pure C++ checks we already have clang static analyser and clang-tidy, so maybe we should be collaborating with them.
For Qt specific checks I think clazy is the right place for it.
11 - Sept - 2015
Jesper Juhl
Clang tidy and analyzer are focused on finding bugs. As I see it, clazy fills a different niche; finding performance issues. So I'd say both Qt and plain C++ checks could go in, as long as they are performance related. For bug checks, submitting those to clang analyzer or tidy would make sense. Just my opinion :)
2 - Oct - 2015
Ram
Hi,
need a bit of help please. I've downloaded clazy from https://quickgit.kde.org/?p=scratch%2Fsmartins%2Fclazy.git and clang from http://llvm.org/releases/download.html. Compilation failing here.
Thanks in advance.
2 - Oct - 2015
Sérgio Martins
Which compiler are you using to compile clazy itself ? While it should also compile fine with gcc, please try clang first (export CXX=clang++)
6 - Oct - 2015
Nyall Dawson
Fantastic stuff - I'm loving it!
One small issue: with the function-args-by-ref check, I see a warnings generated whenever the Q_DECLARE_OPERATORS_FOR_FLAGS macro is used. Could this macro be ignored for this test?
7 - Oct - 2015
Sérgio Martins
will have a look, thanks
7 - Oct - 2015
Sérgio Martins
I can't reproduce this problem. Please send me a minimal testcase (a compilable test.cpp).
8 - Oct - 2015
Nyall Dawson
I suspect it may be because I'm building my project using Qt 4.8. I'll investigate further.
In the meantime, would you consider adding this check? https://github.com/nyalldawson/clazy/commit/e1d08e988ee66bc16abed6655fb86024dc7fd75f
It's based heavily off the existing detach temporary check, but is designed to warn when iterator functions like begin() and end() are called on temporary containers. I got hit by this bug yesterday and thought it would make a good clazy check! The code may need some cleaning, but it works well for me.
9 - Oct - 2015
Sérgio Martins
Thanks, seems very useful!
Filed as https://bugs.kde.org/show_bug.cgi?id=353732
26 - Oct - 2015
Alexey Ivanov
using prebuild clang from llvm website, ubuntu 15.10. with cmake command:
26 - Oct - 2015
Sérgio Martins
Be sure do not mix different ABIs. gcc 5 is now using the new ABI (where std::string isn't COW), and the pre-built binaries of clang might still be using old ABI. So try clang from the ubuntu repos: apt-get install g++ cmake clang llvm git-core libclang-3.6-dev
26 - Oct - 2015
Alexey Ivanov
problem with clang from official repos:
/usr/lib/llvm-3.6/bin/clang: symbol lookup error: /usr/lib/ClangLazy.so: undefined symbol: _ZNK5clang15DeclarationName11getAsStringEv
26 - Oct - 2015
Alexey Ivanov
Ok. it works now. Requires libc++-dev libc++abi-dev packages. And -stdlib=libc++ flag.
Clazy builded with command: cmake .. -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS_RELEASE="-std=c++11 -stdlib=libc++"
And program with command: qmake ../vacuum.pro -r INSTALL_PREFIX=/dev/qt5 CONFIG-=release CONFIG+=debug -spec linux-clang-libc++ QMAKE_CXXFLAGS+="-std=c++11 -stdlib=libc++ -Xclang -load -Xclang ClangLazy.so -Xclang -add-plugin -Xclang clang-lazy"
26 - Oct - 2015
Alexey Ivanov
Oops, nvm, works without CLAZY_CHECKS env.
And "symbol lookup error" with export CLAZY_CHECKS=all_checks
26 - Oct - 2015
Alexey Ivanov
Looks like problem with inefficient-qlist check.
1 - Nov - 2015
Eric Lemanissier
Thanks for this great tool ! I am getting a lot of "c++11 range-loop might detach Qt container", especially when iterating on member variables in non const member function. What would be the best way to make sure it does not detach ? The only I can come up is "for(const auto &e: static_cast(this)->member)", but it feels ugly
1 - Nov - 2015
Sérgio Martins
Use foreach instead: foreach (const auto &e, member) {}
2 - Nov - 2015
ericLemanissier
Really ? I thought foreach was slower than range-based for loop (if there is not detach) : https://bugreports.qt.io/browse/QTBUG-41636
2 - Nov - 2015
Sérgio Martins
Note the difference in orders of magnitude:
range-for with detachment: RESULT : Rangefor_Test::rangeFor_Vector(): 7.1 msecs per iteration (total: 57, iterations: 8)
range-for without detachment: RESULT : Rangefor_Test::rangeFor_ConstVector(): 0.000023 msecs per iteration (total: 99, iterations: 4194304)
foreach: RESULT : Rangefor_Test::forEach_Vector(): 0.000046 msecs per iteration (total: 97, iterations: 2097152)
Yes the foreach takes 2x as long as a range-loop that carefully doesn't detach. But a range-loop that detaches takes 15000x times more.
the clazy foreach/range-loop check was made to tackle the former.
2 - Nov - 2015
ericLemanissier
These numbers actually do not have any meaning, because the loops were totally optimized away. Please refer to Torgeir Lilleskog's comment : https://bugreports.qt.io/browse/QTBUG-41636?focusedCommentId=258507&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-258507
According to him switching from detaching rangeFor to foreach is a 4x speedup, but switching from foreach to non detaching rangefor if a 2x speedup, which is quite important
17 - Nov - 2015
Eugene Zelenko
A lot of suggested checks are already implemented in Clang-tidy(http://clang.llvm.org/extra/clang-tidy/). C++11 migration checks were moved in there recently.
By the word, it may make sense to port Clazy checks to Clang-tidy.