Mixing C++ and Rust for Fun and Profit: Part 1
Or why switching to Rust is such a large undertaking
For quite some time, I have been bothered by this thought: individual programming languages (C++, Rust, Go, etc.) are traditionally viewed as walled gardens. If your main() function is written in C++, you had better find yourself C++ libraries like Qt to build the rest of your codebase with. Do you want to use Flutter to build your app’s user interface? Get ready to write the logic in Flutter’s language, Dart, too. Do you really want to use that Rust library to make your application safer? You get to either rewrite the whole app in Rust or build an ugly extern "C" wrapper around it that won’t fit well in your object-oriented C++ code.
This has been the standard view on using multiple programming languages for many years. However, I’ve decided that this view is fundamentally flawed, because every compiled language uses the same set of concepts when it is compiled:
- Code is split up into functions that can be reused.
- Functions are identified by a string (a symbol) generated from the function’s name in the source code and, in C++, from its enclosing namespaces and parameter types. For example, g++ generates _Z3foov as the identifier for void foo(). This string is deterministic; for example, both Clang and GCC on Linux follow the Itanium C++ ABI convention for mangling function names.
- Functions are called by placing all parameters to that function in locations dictated by the calling convention (specific registers or stack slots) and then using a call instruction or equivalent to transfer control to the function. For example, to call void foo() from earlier, the compiler converts the C++ statement foo(); into the assembly call _Z3foov. The assembler then replaces call with the appropriate opcode, and the linker fills in the address of the first instruction identified by _Z3foov.
- Functions return by storing their return value (if they have one) at a specific location and then using a ret instruction or equivalent.
- Classes and structs can be boiled down to a collection of primitive types (although some classes also carry a hidden pointer to a vtable).
- Class methods are just ordinary functions that happen to take a pointer to the class object as the first parameter. In other words, when you write this:
class Foo {
    void foo(int bar);
    int baz;
};
your code actually compiles to something that is better represented this way:
class Foo {
    int baz;
};
void foo(Foo *this, int bar);
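Rust makes this equivalence visible in the source: a method can be called with ordinary function-call syntax, passing the receiver explicitly as the first argument. The sketch below mirrors the C++ snippet; foo_lowered is a hypothetical hand-lowered version added for comparison. It illustrates the idea at the language level and does not claim to show the exact ABI rustc uses.

```rust
// Mirrors `class Foo { int baz; };` from the snippet above.
struct Foo {
    baz: i32,
}

impl Foo {
    // Written as a method, like `void Foo::foo(int bar)`.
    fn foo(&mut self, bar: i32) {
        self.baz += bar;
    }
}

// The same operation as a free function that takes the object explicitly,
// like `void foo(Foo *this, int bar)`. (`this` is not a keyword in Rust.)
fn foo_lowered(this: &mut Foo, bar: i32) {
    this.baz += bar;
}

fn main() {
    let mut f = Foo { baz: 1 };
    f.foo(2);               // method-call syntax
    Foo::foo(&mut f, 3);    // the same method, called as a plain function
    foo_lowered(&mut f, 4); // the hand-lowered equivalent
    println!("baz = {}", f.baz); // prints "baz = 10"
}
```

All three calls do the same thing: the object is simply the first argument.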
Since every compiled programming language uses the same concepts to compile, why can’t they just interact?
Example
Before we go any further, I’d like to give an example of what we want to achieve:
// file: main.cpp
#include "rustmodule.h"
// or in an ideal C++ 20 world:
// import rustmodule;

int main()
{
    foo();
    return 0;
}

// file: rustmodule.h
#pragma once

// this is defined in Rust
void foo();

// file: rustmodule.rs
pub fn foo() {
    println!("Hello from Rust");
}
We want to be able to compile those files and get an executable file that prints Hello from Rust to stdout.
Now let’s look at why this won’t just work out of the box.
Name mangling, data layout, and standard libraries
The most obvious reason that compiled programming languages can’t just interact with each other is syntax. C++ compilers don’t understand Rust, and Rust compilers don’t understand C++. Thus neither language can tell what functions or classes the other is making available.
Now, you might be saying “But if I use a C++ .h file to export functions and classes to other .cpp files, surely I could make a .h file that tells C++ that there is a Rust function fn foo() out there!” If you did say (or at least think) that, congratulations! You are on the right track, but there are some other, less obvious things we need to talk about.
The first major blocker to interoperability is name mangling. You can certainly make a .h file with a forward declaration of void foo();, but the C++ compiler will then look for a symbol called _Z3foov, while the Rust compiler will have mangled fn foo() into _ZN10rustmodule3foo17hdf3dc6f68b54be51E (the trailing hdf3dc6f68b54be51 is a hash that depends on the compiler and build settings, so it can differ between builds). Compiling the C++ code starts out OK, but once the linking stage is reached, the linker will not be able to find _Z3foov, since no object file defines it.
Obviously, we need to change how the name mangling behaves on one side or the other. We’ll come back to this thought in a moment.
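To make the mismatch concrete, here is a toy that reproduces both symbols from the paragraph above. It is a hypothetical sketch: the C++ half covers only free functions with a few builtin parameter types in the Itanium scheme, and the Rust half only reproduces the overall shape of rustc’s legacy scheme. Real manglers also handle substitutions, classes, templates, and character escaping, so this is no substitute for a real mangler or c++filt.

```rust
/// Itanium C++ ABI codes for a handful of builtin parameter types.
fn builtin_code(ty: &str) -> Option<&'static str> {
    match ty {
        "void" => Some("v"),
        "bool" => Some("b"),
        "char" => Some("c"),
        "int" => Some("i"),
        "long" => Some("l"),
        "float" => Some("f"),
        "double" => Some("d"),
        _ => None, // anything else is out of scope for this toy
    }
}

/// <length><identifier>, the building block both schemes share.
fn source_name(s: &str) -> String {
    format!("{}{}", s.len(), s)
}

/// Toy Itanium-style mangling for free functions with builtin parameter types.
fn mangle_cpp(namespaces: &[&str], name: &str, params: &[&str]) -> Option<String> {
    let mut out = String::from("_Z");
    if namespaces.is_empty() {
        out.push_str(&source_name(name));
    } else {
        // Nested names are wrapped in N ... E.
        out.push('N');
        for ns in namespaces {
            out.push_str(&source_name(ns));
        }
        out.push_str(&source_name(name));
        out.push('E');
    }
    if params.is_empty() {
        // An empty parameter list is encoded as a single `void`.
        out.push('v');
    } else {
        for p in params {
            out.push_str(builtin_code(p)?);
        }
    }
    Some(out)
}

/// Shape of a symbol in rustc's legacy scheme: the path segments, then an
/// `h` + 16-hex-digit hash segment, inside `_ZN ... E`.
/// Parameter types are not part of the symbol.
fn mangle_rust_legacy(path: &[&str], hash: &str) -> String {
    let mut out = String::from("_ZN");
    for seg in path {
        out.push_str(&source_name(seg));
    }
    out.push_str(&source_name(&format!("h{}", hash)));
    out.push('E');
    out
}

fn main() {
    // What main.cpp asks the linker for...
    println!("{}", mangle_cpp(&[], "foo", &[]).unwrap());
    // ...and what rustc produced for rustmodule::foo.
    println!("{}", mangle_rust_legacy(&["rustmodule", "foo"], "df3dc6f68b54be51"));
}
```

Note how the C++ symbol encodes the parameter list (v, i, and so on) so that overloads get distinct names, while the Rust symbol ends in a hash instead.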
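The second major blocker is data layout. Put simply, different compilers may treat the same struct declaration differently by putting its fields at different locations in memory. For example, a C++ compiler generally lays out a struct’s fields in declaration order, while Rust is free to reorder the fields of a struct to reduce padding unless the struct is marked #[repr(C)].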
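A quick way to see the difference is to compare a #[repr(C)] struct with the same struct in Rust’s default representation. The struct names are made up for this sketch. On mainstream targets, where u32 is 4-byte aligned, the C version should be 12 bytes with b at offset 4; the default Rust layout is unspecified and is often smaller because rustc may reorder the fields.

```rust
use std::mem::{align_of, size_of};

// C layout: fields stay in declaration order with C padding rules, so this
// should match a C++ `struct Mixed { uint8_t a; uint32_t b; uint8_t c; };`.
#[repr(C)]
#[allow(dead_code)]
struct MixedC {
    a: u8,
    b: u32,
    c: u8,
}

// Same fields with the default Rust representation, whose layout is
// unspecified: rustc may reorder the fields.
#[allow(dead_code)]
struct MixedRust {
    a: u8,
    b: u32,
    c: u8,
}

/// Measure the byte offset of `b` in a MixedC from actual addresses.
fn offset_of_b() -> usize {
    let v = MixedC { a: 0, b: 0, c: 0 };
    let base = &v as *const MixedC as usize;
    let field = &v.b as *const u32 as usize;
    field - base
}

fn main() {
    println!(
        "repr(C):    size {}, align {}, offset of b {}",
        size_of::<MixedC>(),
        align_of::<MixedC>(),
        offset_of_b()
    );
    println!("repr(Rust): size {}", size_of::<MixedRust>());
}
```

Only the #[repr(C)] version is safe to share with C++ by value.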
The third and final blocker I want to look at here is standard libraries. If you have a C++ function that returns a std::string, Rust won’t be able to understand that, because the layout of std::string is an implementation detail of the C++ standard library. Instead, you need to implement some sort of converter that turns C++ strings into Rust strings. Similarly, a Rust Vec object won’t be usable from C++ unless you convert it to something C++ understands.
Let’s investigate how we can fix the first problem, name mangling.
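Before getting to that, here is a rough sketch of the kind of converter the standard-library problem calls for, assuming a hand-rolled approach: Rust exposes text as a pointer plus a length in a #[repr(C)] struct that C++ can read. StrView and the helper functions are hypothetical names, and this is just one simple option, not necessarily the approach a later part of this series takes.

```rust
use std::ffi::{CStr, CString};

/// Hypothetical FFI-safe string view. A C++ side could declare the same
/// thing as `struct StrView { const char *ptr; size_t len; };` and build a
/// std::string from it with `std::string(view.ptr, view.len)`.
#[repr(C)]
pub struct StrView {
    pub ptr: *const u8,
    pub len: usize,
}

/// Borrow a Rust string as a StrView. No bytes are copied, so the view is
/// only valid while `s` is alive and unmodified.
pub fn view_of(s: &str) -> StrView {
    StrView { ptr: s.as_ptr(), len: s.len() }
}

/// Copy the bytes behind a StrView into an owned Rust String.
///
/// # Safety
/// `v.ptr` must point to `v.len` readable bytes.
pub unsafe fn string_from_view(v: &StrView) -> String {
    let bytes = unsafe { std::slice::from_raw_parts(v.ptr, v.len) };
    // Invalid UTF-8 coming from the other side is replaced, not trusted.
    String::from_utf8_lossy(bytes).into_owned()
}

/// NUL-terminated copy for C-style APIs that take `const char *`.
pub fn to_c_string(s: &str) -> CString {
    CString::new(s).expect("string contained an interior NUL byte")
}

fn main() {
    let owned = String::from("Hello from Rust");
    let view = view_of(&owned);
    let round_trip = unsafe { string_from_view(&view) };
    println!("{} ({} bytes)", round_trip, view.len);

    let c = to_c_string("Hello from Rust");
    let borrowed: &CStr = c.as_c_str();
    println!("{:?}", borrowed);
}
```

The same pointer-plus-length idea works for a Vec<T> via as_ptr() and len(), as long as T itself has a C-compatible layout.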
extern "C" and why it sucks
The easy way is to use extern "C", a feature that nearly every compiled language has in some form (it tells the compiler to use C’s unmangled symbol names and calling convention):
// file: main.cpp
#include "rustmodule.h"
// or in an ideal C++ 20 world:
// import rustmodule;

int main()
{
    foo();
    return 0;
}

// file: rustmodule.h
#pragma once

extern "C" void foo();

// file: rustmodule.rs
#[no_mangle]
pub extern "C" fn foo() {
    println!("Hello from Rust");
}
This actually will compile and run (assuming you link all the proper standard libraries)! So why does extern "C" suck? Well, by using extern "C" you give up features like these:
- Function overloads
- Class methods
- Templates
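To make the loss concrete, here is a sketch of the crude emulation that extern "C" forces on you. The names (scale_i32, Scale, Counter, and so on) are made up for illustration: each overload needs its own exported name, a Rust-side trait papers over that, and the class turns into an opaque pointer passed to free functions. The sketch uses the #[unsafe(no_mangle)] spelling accepted by Rust 1.82 and later; older compilers use plain #[no_mangle], as in the snippet above.

```rust
// C has no overloading, so every "overload" needs its own exported name.
#[unsafe(no_mangle)]
pub extern "C" fn scale_i32(x: i32, factor: i32) -> i32 {
    x * factor
}

#[unsafe(no_mangle)]
pub extern "C" fn scale_f64(x: f64, factor: f64) -> f64 {
    x * factor
}

// A wrapper that crudely brings back overloading on top of the flat C API.
pub trait Scale: Sized {
    fn scale(self, factor: Self) -> Self;
}

impl Scale for i32 {
    fn scale(self, factor: i32) -> i32 {
        scale_i32(self, factor)
    }
}

impl Scale for f64 {
    fn scale(self, factor: f64) -> f64 {
        scale_f64(self, factor)
    }
}

// No class methods either: the object becomes an opaque pointer that each
// exported function takes explicitly as its first argument.
pub struct Counter {
    count: u32,
}

#[unsafe(no_mangle)]
pub extern "C" fn counter_new() -> *mut Counter {
    Box::into_raw(Box::new(Counter { count: 0 }))
}

/// # Safety
/// `c` must come from `counter_new` and must not have been freed.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn counter_increment(c: *mut Counter) -> u32 {
    let counter = unsafe { &mut *c };
    counter.count += 1;
    counter.count
}

/// # Safety
/// `c` must come from `counter_new` (or be null) and is invalid afterwards.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn counter_free(c: *mut Counter) {
    if !c.is_null() {
        drop(unsafe { Box::from_raw(c) });
    }
}

fn main() {
    println!("{} {}", 3i32.scale(4), 1.5f64.scale(2.0));
    let c = counter_new();
    unsafe {
        counter_increment(c);
        println!("count = {}", counter_increment(c));
        counter_free(c);
    }
}
```

Templates fare even worse: every instantiation you want to expose has to be written out and exported by hand, just like the scale_* functions.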
It’s possible to create wrappers around the extern "C" functions to crudely emulate these features, but I don’t want complex wrappers that provide crude emulation. I want wrappers that directly plumb those features through and are human readable! Furthermore, I don’t want to have to change the existing source, which means that the ugly #[no_mangle] pub extern "C" must go!
Enter D
D is a programming language that has been around since 2001. Although it is not source compatible with C++, it is similar to C++. I personally like D for its intuitive syntax and great features, but for gluing Rust and C++ together, D stands out for two reasons: extern(C++) and pragma(mangle, "foo").
With extern(C++), you can tell D to use C++ name mangling for any symbol. Therefore, the following code will compile:
// file: foo.cpp
#include <iostream>

void bar();

void foo()
{
    std::cout << "Hello from C++\n";
    bar();
}

// file: main.d
import std.stdio;

extern(C++) void foo();

extern(C++) void bar()
{
    writeln("Hello from D");
}

void main()
{
    foo();
}
However, it gets better: we can use pragma(mangle, "foo") to manually override name mangling to anything we want! Therefore, the following code compiles (the hash at the end of the Rust symbol comes from my build and will likely differ in yours):
// file: main.d
import std.stdio;

pragma(mangle, "_ZN10rustmodule3foo17h18576425cfc60609E") void foo();

pragma(mangle, "bar_d_function") void bar()
{
    writeln("Hello from D");
}

void main()
{
    foo();
}

// file: rustmodule.rs
pub fn foo() {
    println!("Hello from Rust");
    unsafe {
        bar();
    }
}

extern {
    #[link_name = "bar_d_function"]
    fn bar();
}
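The Rust half of this can be tried without a D compiler. In the sketch below, a Rust function exported under the name bar_d_function is a stand-in for the function D defines (an assumption for the demo, not part of the setup above), and the extern block is the declaration from rustmodule.rs, except that bar returns an i32 so the call is easy to check. It uses the unsafe extern and #[unsafe(no_mangle)] spellings accepted by Rust 1.82 and later.

```rust
// Stand-in for the D side: here Rust exports the symbol `bar_d_function`
// itself so the whole sketch links as a single binary.
#[unsafe(no_mangle)]
pub extern "C" fn bar_d_function() -> i32 {
    println!("Hello from the stand-in for D");
    42
}

// The declaration from rustmodule.rs: inside an extern block the name `bar`
// is not mangled, and #[link_name] redirects it to the real symbol.
unsafe extern "C" {
    #[link_name = "bar_d_function"]
    fn bar() -> i32;
}

/// Safe wrapper around the foreign call.
pub fn call_bar() -> i32 {
    // Calling a foreign function is unsafe: the compiler cannot check that
    // the declared signature matches the real definition.
    unsafe { bar() }
}

fn main() {
    println!("bar() returned {}", call_bar());
}
```

In the real project the definition lives in D, and only the extern block stays on the Rust side.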
With pragma(mangle, "foo"), we can not only tell D how Rust mangled its function, but also create a function that Rust can see!
You might be wondering why we had to tell Rust to override the mangling of bar(). Rust does not apply any name mangling to functions declared in an extern block, so without #[link_name] it would look for a symbol literally called bar; the attribute points it at bar_d_function, the name D actually exports. In my testing, not even marking the block as extern "Rust" made any difference. Go figure.
You also might be wondering why we can’t use Rust’s name mangling overrides instead of D’s. #[link_name] only works on declarations inside extern blocks. Rust does offer #[no_mangle] and #[export_name] for functions defined in Rust, but both have to be written into the Rust source itself, which is exactly what we are trying to avoid.
Using D as the glue
We can now use D to glue our basic example together:
// file: main.cpp
#include "rustmodule.h"
// or in an ideal C++ 20 world:
// import rustmodule;

int main()
{
    foo();
    return 0;
}

// file: rustmodule.h
#pragma once

// this is in Rust
void foo();

// file: rustmodule.rs
pub fn foo() {
    println!("Hello from Rust");
}

// file: glue.d
@nogc:

// This is the Rust function.
pragma(mangle, "_ZN10rustmodule3foo17h18576425cfc60609E") void foo_from_rust();

// This is exposed to C++ and serves as nothing more than an alias.
extern(C++) void foo()
{
    foo_from_rust();
}
In this example, when main() calls foo() from C++, it is actually calling a D function, which then calls the Rust function. It’s a little ugly, but it may well be the best solution available that leaves both the C++ and Rust code in pristine condition.
Automating the glue
Nobody wants to have to write a massive D file to glue together the C++ and Rust components, though. In fact, nobody even wants to write the C++ header files by hand. For that reason, I created a proof-of-concept tool called polyglot that can scan C++ code and generate wrappers for use from Rust and D. My eventual goal is to also wrap other languages, but as this is a personal project, I am not developing polyglot very quickly and it certainly is nowhere near the point of being ready for production use in serious projects. With that being said, it’s really amazing to compile and run the examples and know that you are looking at multiple languages working together.
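polyglot’s internals aren’t covered here, but the core idea of generating glue from a list of functions can be sketched in a few lines. RustFn, generate_d_glue and generate_cpp_header are hypothetical names, not polyglot’s API, and the sketch only handles void f() functions whose mangled Rust symbol is already known; it emits text in the same shape as glue.d and rustmodule.h above.

```rust
/// Hypothetical description of one Rust function to expose to C++
/// (not polyglot's real data model).
struct RustFn<'a> {
    /// Name that C++ code calls, e.g. "foo".
    cpp_name: &'a str,
    /// Mangled symbol rustc emitted for the Rust function.
    rust_symbol: &'a str,
}

/// Emit a D glue module in the style of glue.d. Only handles `void f()`.
fn generate_d_glue(fns: &[RustFn<'_>]) -> String {
    let mut out = String::from("@nogc:\n");
    for f in fns {
        out.push_str(&format!(
            "pragma(mangle, \"{sym}\") void {name}_from_rust();\n\
             extern(C++) void {name}() {{ {name}_from_rust(); }}\n",
            sym = f.rust_symbol,
            name = f.cpp_name,
        ));
    }
    out
}

/// Emit the matching C++ header declarations.
fn generate_cpp_header(fns: &[RustFn<'_>]) -> String {
    let mut out = String::from("#pragma once\n");
    for f in fns {
        out.push_str(&format!("void {}();\n", f.cpp_name));
    }
    out
}

fn main() {
    let fns = [RustFn {
        cpp_name: "foo",
        rust_symbol: "_ZN10rustmodule3foo17h18576425cfc60609E",
    }];
    print!("{}", generate_cpp_header(&fns));
    print!("{}", generate_d_glue(&fns));
}
```

A real generator also has to discover the functions and their symbols (for example by parsing the sources and inspecting the compiled library), map parameter types, and handle classes, which is where most of the work lies.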
Next up
I originally planned to write on this topic in one blog post, but there are a lot of interesting things to cover, so I will stop here for now. In the next installment (part 2) of this series, we will take a look at how we can overcome the other two major blockers to language interoperability. Part 3 of the series is also available.
KDAB provides market leading software consulting and development services and training in Qt, C++ and 3D/OpenGL.
Check out the language called Nim.
Nim looks like it could indeed be useful here; however, I think D is the better solution for this use case. This article is targeted at programmers who are already familiar with C++, and D has a much more C++-like syntax than Nim. However, it would be perfectly valid to use Nim here instead of D; in fact, any language that supports changing the name mangling of arbitrary symbols (both external and internal) could be used instead of D.
Maybe the title should be “Mixing C++, D and Rust for Fun and Profit”
Thanks for the suggestion! However, the emphasis here is on the interop between C++ and Rust. D is used merely as a glue layer; while you can easily expand the glue to bind to other languages including D, I’m using C++ and Rust since many people are interested in migrating their C++ codebases to Rust.
Excellent post. I was surprised and happy to see Dlang mentioned in this post.
The technique used in Dlang also reminded me of a similar possibility in Zig, which can also read mangled functions, probably by inheriting llvm-demangle.
I believe that Swift is the only one that does not require bindings; it just reads the modulemap that includes the header.
I’m curious about the polyglot project, and of course how it compares to the cxx-rs and cbindgen projects.
I’m a huge fan of D myself and I’m always happy to promote its use in any way possible. You are right, though, that D is not the only language that is usable as a glue layer; however, D is easy to understand for C++ programmers.
Good job noticing that this somewhat duplicates the cxx-rs effort, though! I plan to review some existing efforts like cxx-rs in a later installment of this series.
Out of curiosity, how does D know about the calling conventions for Rust code, as far as I know those are version dependent and can change arbitrarily?
Good question! I actually was not aware that Rust calling conventions are unstable. However, it looks like you can configure Rust to compile as a C-compatible shared library, and the Rust docs also imply that compiling Rust as a static library will work as well (assuming you link the final executable to the Rust system libraries).
Please start a youtube channel & explain.
Thank you.
People with D always want to put it where it doesn’t belong 😏
Have you tried to use a linker script with EXTERN and PROVIDE?
No, I haven’t tried linker scripts. That does seem like an interesting option; I might have to research it to see how easy it is to integrate into a binding generator.