Trusted Software Excellence across Desktop and Embedded
Take a glance at the areas of expertise where KDAB excels ranging from swift troubleshooting, ongoing consulting and training to multi-year, large-scale software development projects.
Find out why customers from innovative industries rely on our extensive expertise, including Medical, Biotech, Science, Renewable Energy, Transportation, Mobility, Aviation, Automation, Electronics, Agriculture and Defense.
High-quality Embedded Engineering across the Stack
To successfully develop an embedded device that meets your expectations regarding quality, budget and time to market, all parts of the project need to fit perfectly together.
Learn more about KDAB's expertise in embedded software development.
Where the capabilities of modern mobile devices or web browsers fall short, KDAB engineers help you expertly architect and build high-functioning desktop and workstation applications.
Extensible, Safety-compliant Software for the Medical Sector
Create intelligent, patient-focused medical software and devices and stay ahead with technology that adapts to your needs.
KDAB offers you expertise in developing a broad spectrum of clinical and home-healthcare devices, including but not limited to, internal imaging systems, robotic surgery devices, ventilators and non-invasive monitoring systems.
Building digital dashboards and cockpits with fluid animations and gesture-controlled touchscreens is a big challenge.
In over two decades of developing intricate UI solutions for cars, trucks, tractors, scooters, ships, airplanes and more, the KDAB team has gained market leading expertise in this realm.
Build on Advanced Expertise when creating Modern UIs
KDAB assists you in the creation of user-friendly interfaces designed specifically for industrial process control, manufacturing, and fabrication.
Our specialties encompass the custom design and development of HMIs, enabling product accessibility from embedded systems, remote desktops, and mobile devices on the move.
Legacy software is a growing but often ignored problem across all industries. KDAB helps you elevate your aging code base to meet the dynamic needs of the future.
Whether you want to migrate from an old to a modern GUI toolkit, update to a more recent version, or modernize your code base, you can rely on over 25 years of modernization experience.
KDAB offers a wide range of services to address your software needs including consulting, development, workshops and training tailored to your requirements.
Our expertise spans cross-platform desktop, embedded and 3D application development, using the proven technologies for the job.
When working with KDAB, the first-ever Qt consultancy, you benefit from a deep understanding of Qt internals, that allows us to provide effective solutions, irrespective of the depth or scale of your Qt project.
Qt Services include developing applications, building runtimes, mixing native and web technologies, solving performance issues, and porting problems.
KDAB helps create commercial, scientific or industrial desktop applications from scratch, or update its code or framework to benefit from modern features.
Discover clean, efficient solutions that precisely meet your requirements.
Boost your team's programming skills with in-depth, constantly updated, hands-on training courses delivered by active software engineers who love to teach and share their knowledge.
Our courses cover Modern C++, Qt/QML, Rust, 3D programming, Debugging, Profiling and more.
The collective expertise of KDAB's engineering team is at your disposal to help you choose the software stack for your project or master domain-specific challenges.
Our particular focus is on software technologies you use for cross-platform applications or for embedded devices.
Since 1999, KDAB has been the largest independent Qt consultancy worldwide and today is a Qt Platinum partner. Our experts can help you with any aspect of software development with Qt and QML.
KDAB specializes in Modern C++ development, with a focus on desktop applications, GUI, embedded software, and operating systems.
Our experts are industry-recognized contributors and trainers, leveraging C++'s power and relevance across these domains to deliver high-quality software solutions.
KDAB can guide you incorporating Rust into your project, from as overlapping element to your existing C++ codebase to a complete replacement of your legacy code.
Unique Expertise for Desktop and Embedded Platforms
Whether you are using Linux, Windows, MacOS, Android, iOS or real-time OS, KDAB helps you create performance optimized applications on your preferred platform.
If you are planning to create projects with Slint, a lightweight alternative to standard GUI frameworks especially on low-end hardware, you can rely on the expertise of KDAB being one of the earliest adopters and official service partner of Slint.
KDAB has deep expertise in embedded systems, which coupled with Flutter proficiency, allows us to provide comprehensive support throughout the software development lifecycle.
Our engineers are constantly contributing to the Flutter ecosystem, for example by developing flutter-pi, one of the most used embedders.
KDAB invests significant time in exploring new software technologies to maintain its position as software authority. Benefit from this research and incorporate it eventually into your own project.
Start here to browse infos on the KDAB website(s) and take advantage of useful developer resources like blogs, publications and videos about Qt, C++, Rust, 3D technologies like OpenGL and Vulkan, the KDAB developer tools and more.
The KDAB Youtube channel has become a go-to source for developers looking for high-quality tutorial and information material around software development with Qt/QML, C++, Rust and other technologies.
Click to navigate the all KDAB videos directly on this website.
In over 25 years KDAB has served hundreds of customers from various industries, many of them having become long-term customers who value our unique expertise and dedication.
Learn more about KDAB as a company, understand why we are considered a trusted partner by many and explore project examples in which we have proven to be the right supplier.
The KDAB Group is a globally recognized provider for software consulting, development and training, specializing in embedded devices and complex cross-platform desktop applications.
Read more about the history, the values, the team and the founder of the company.
When working with KDAB you can expect quality software and the desired business outcomes thanks to decades of experience gathered in hundreds of projects of different sizes in various industries.
Have a look at selected examples where KDAB has helped customers to succeed with their projects.
KDAB is committed to developing high-quality and high-performance software, and helping other developers deliver to the same high standards.
We create software with pride to improve your engineering and your business, making your products more resilient and maintainable with better performance.
KDAB has been the first certified Qt consulting and software development company in the world, and continues to deliver quality processes that meet or exceed the highest expectations.
In KDAB we value practical software development experience and skills higher than academic degrees. We strive to ensure equal treatment of all our employees regardless of age, ethnicity, gender, sexual orientation, nationality.
Interested? Read more about working at KDAB and how to apply for a job in software engineering or business administration.
An important part of working with Vulkan and other modern explicit rendering APIs is the synchronization of GPU/GPU and CPU/GPU workloads. In this article we will learn about what Vulkan needs us to synchronize and how to achieve it. We will talk about two high-level parts of the synchronization domain that we, as application and library developers, are responsible for:
GPU↔GPU synchronization to ensure that certain GPU operations do not occur out of order,
CPU↔GPU synchronization to ensure that we maintain a certain level of latency and resource usage in our applications.
GPU↔GPU Synchronization
Whereas in OpenGL we could simply render to the GL_BACK buffer of the default framebuffer and then tell the system to swap the back and front buffers, with Vulkan we have to get more involved. Vulkan exposes the concept of a swapchain of images. This is essentially a collection of textures (VkImages) that are owned and managed by the swapchain and the window system integration (WSI). A typical frame in Vulkan looks something like this:
Acquire the index of the swapchain image to which we should render.
Record one or more command buffers that ultimately output to the swapchain image from step 1.
Submit the command buffers from step 2 to a GPU queue for processing.
Instruct the GPU presentation engine to display the final rendered swapchain image from step 3.
Go back to step 1 and start over for the next frame.
This may look innocuous at first glance but let's delve deeper.
A day at the races
In step 1 we are asking the WSI to tell us the index of the next available swapchain image that we may render into. Now, just because this function tells us (and the CPU) that, for example, image index 1 is the image we should use as our render target, it does not mean that the GPU is actually ready to write to this image right now.
It is important to note that we are operating on two distinct timelines. There is the CPU timeline that we are familiar with when writing applications. Then there is also the GPU timeline on which the GPU processes the work that we give to it (from the CPU timeline).
In the case of acquiring a swapchain image index, we are actually asking the GPU to look into the future a little bit and tell us which image index will become the next image to become ready for writing. However, when we call the function to acquire this image index, the GPU presentation engine may well still be reading from the image in question in order to display its contents from an earlier frame.
Many people coming new to Vulkan (myself included) make the mistake of thinking that acquiring the swapchain image index means the image is ready to go right now. It's not!
In step 2, we are entirely operating on the CPU timeline and we can safely record command buffers without fear of trampling over anything happening on the GPU.
The same is true in step 3. We can happily submit the command buffers which will render to our swapchain image. However, this does then trigger the problem. If the GPU presentation engine is still busy reading from the swapchain image when suddenly along comes a bundle of work that tells the GPU to render into that same image we have a potential problem. GPUs are thirsty beasts and are massively parallel machines that like to do as much as possible concurrently. Without some form of synchronization, it is clear to see that, if the GPU begins processing the command buffers, it could easily lead to a situation where the presentation engine could be reading data at the same time it is being written to by another GPU thread. Say hello to our old friend undefined behaviour!
It is now clear that we need some mechanism to instruct the GPU to not process these command buffers until the GPU presentation engine is done reading from the swapchain image we are rendering to.
The solution for synchronising blocks of GPU work in Vulkan is a semaphore (VkSemaphore).
The way it works is that in our application's initialisation code, we create a semaphore for the purposes of forcing the command buffer processing to begin only once the GPU presentation engine tells us it is done reading from the swapchain image it told us to use.
With this semaphore in hand, we can tell the GPU to switch it to a "signalled" state when the presentation engine is done reading from the image. The other half of the problem is solved when we submit the render command buffers to the GPU by handing the same semaphore to the call to vkQueueSubmit().
We now have this kind of setup:
At initialisation, create a semaphore (vkCreateSemaphore) in the unsignalled state.
Pass the above semaphore to vkAcquireNextImageKHR as the semaphore argument so that it is signalled when the image is ready for writing.
Pass the above semaphore to vkQueueSubmit (as one of the pWaitSemaphore arguments of the VkSubmitInfo struct) so that this set of command buffers is deferred until the semaphore is signalled.
Phew, we're all done right? Nope, sadly not. Read on to see what else can go wrong and how to solve it.
I'm not ready to show you my painting
We have solved the race condition on the GPU of preventing the start of the rendering from clobbering the swapchain image whilst the presentation engine may still be reading from it. However, there is currently nothing to prevent the request to begin the presentation of the swapchain image whilst the rendering is still going on!
That is, we have solved the potential race between steps 1 and 3, but there is another race between steps 3 and 4. Luckily the problem is at heart exactly the same. We need to stop some incoming GPU work (the present request in step 4) from stepping on the toes of the already ongoing rendering work from step 3. That is, we need another application of GPU↔GPU synchronization which we know we can do with a semaphore.
To solve this race condition we use the following approach:
At initialisation, create another unsignalled semaphore.
In step 3 when we submit the command buffers for rendering, we pass in the semaphore to vkQueueSubmit as one of the pSignalSemaphores arguments.
In step 4 we then pass this same semaphore to the call to vkQueuePresentKHR as one of the pWaitSemaphores arguments.
This works in a completely analogous way to the first problem that we solved. When we submit the render command buffers for processing, this second semaphore is unsignalled. When the command buffers finish execution, the GPU will transition the semaphore to the signalled state. The call to vkQueuePresentKHR has been configured to ensure the presentation engine waits for this condition to be true before beginning whatever work it needs to do to get that image on to our screen.
With the above two race conditions brought under control, we can now safely loop around the sequence of steps 1-4 as many times as we like.
Well, almost. There is a slight subtlety in that the swapchain has N frames (typically 3 or so) but so far we have only created a single semaphore for the presentation→render ordering, and a second single semaphore for the render→presentation ordering. Usually however, we do not want to render and present a single image and then wait around for the presentation to be done before starting over, as that is a big waste of cycles on both the CPU and GPU sides.
As a side note, many Vulkan examples in tutorials do this by introducing a call to vkDeviceWaitIdle or vkQueueWaitIdle somewhere in their main loop. This is fine for learning Vulkan and its concepts but to get full performance we want to go further into allowing the CPU and the GPU to work concurrently.
One thing that we can do is to create enough semaphores such that we have one each for every frame that we wish to have "in flight" at any time and for each of the 2 required synchronization points. We can then use the i'th pair of semaphores for the i'th in-flight frame and when we get to the N'th in-flight frame we loop back to the 0'th pair of semaphores in a round robin fashion.
This then allows us to get potentially N frames ahead of the GPU on the CPU timeline. This, unfortunately, opens up our next can of worms.
CPU↔GPU Synchronization
So far we have shown that using semaphores when enqueuing work for the GPU allows us to correctly order the work done on the GPU timeline. We have briefly mentioned that this does nothing to keep the CPU in sync with the GPU. As it stands right now the CPU is free to schedule as much work in advance as we like (assuming sufficient available resources). This has a couple of issues though:
The more frames of work in advance the CPU schedules work for the GPU, the more resources we need to hold command buffers, semaphores etc. - not to mention the GPU resources to which the command buffers refer, such as buffers and textures. These GPU resources all have to be kept alive as long as any command buffers are referencing them.
The second issue is that the further the CPU gets ahead of the GPU the further our simulation state gets ahead of what we see. That means, the more frames ahead we allow the CPU to get, the higher is our latency. Some latency can be good in that if we have a frame or two queued up already, a frame that then takes a bit longer to prepare can be absorbed unnoticed. However, too much latency and our application feels sluggish and unnatural to use as it takes too long for our input to be responded to and for us to see the results of that on screen.
It is therefore essential to have a good handle on our system's latency which in this case means the number of frames we allow to be "in flight" at any one time. That is, the number of frames worth of command buffers that have been submitted to the GPU queues and are being recorded at the current time. A common choice here is to allow 2-3 frames to be in flight at once. Bear in mind that this also depends upon other factors such as your display's refresh rate. If you are running on a high refresh rate display at say 240Hz, then each frame is only around for 1/4 of the time of a "standard" 60Hz display. If this is the case, you may wish to increase the number of frames in flight to compensate.
Let's parameterise the max number of frames that the CPU can get ahead as MAX_FRAMES_IN_FLIGHT. From our discussions in the previous sections we know that if we can keep the CPU from getting ahead by only MAX_FRAMES_IN_FLIGHT frames, then we will only need MAX_FRAMES_IN_FLIGHT semaphores for each use of a semaphore within a frame.
So now the question is how do we stop the CPU from racing ahead of the GPU? Specifically we need a way to make the CPU timeline wait until the GPU timeline indicates that it is done with processing a frame. In Vulkan, the answer to this is a fence (VkFence). Conceptually this is how we can structure a frame with fences to get the desired result (ignoring the use of semaphores for GPU↔GPU synchronization):
In the application initialisation, create MAX_FRAMES_IN_FLIGHT fence objects in the signalled state.
Force the CPU timeline to wait until the fence for this frame becomes signalled or continue immediately if it is the first frame and the fence is already signalled (vkWaitForFences).
Reset the fence to the unsignalled state so that we can wait for it again in the future (vkResetFences).
Acquire the swapchain image index (as before).
Record and submit the command buffers to perform the rendering for this frame. When it is time to submit the command buffers to the GPU queue, we can pass in the fence for this frame as the final argument to vkQueueSubmit. Just as with a semaphore, when the GPU queue finishes processing this command buffer submission, it will transition the fence to the signalled state.
Issue a GPU command to present the completed swapchain image (as before).
Go to step 2 and use the next fence and (set of semaphores).
With this approach, the CPU timeline can only get at most MAX_FRAMES_IN_FLIGHT ahead of the GPU before the call to vkWaitForFences in step 2 forces it to wait for the corresponding fence to become signalled by the GPU. This is when it completes command buffer submission that went along with this fence.
Making use of both fences and semaphores allows us to nicely keep both the CPU and the GPU timelines making progress without races (between rendering and presentation) and without the CPU running away from us. These two synchronization primitives, fences and semaphores, solve similar but different problems:
A VkFence is a synchronization primitive to allow the keeping of the GPU and CPU timelines in pace.
A VkSemaphore is a synchronization primitive to ensure ordering of GPU tasks.
It is also worth noting that a VkFence can also be queried as to its state from the CPU timeline rather than having to block until it becomes signalled (vkGetFenceStatus). This allows your application to peek and see if a fence is signalled or not. If it is not yet signalled, your application may be able to make more use of the available time to go do something more productive than just blocking like with vkWaitFences. It all depends upon the design of your application.
Other Considerations
Presentation Mode
We have seen above how we can utilise fences and semaphores to make our Vulkan applications well-behaved. It is also worth mentioning that, as an application author, you should also consider your choice of swapchain presentation mode. This is because this can heavily impact on how your application behaves and how many CPU/GPU cycles it uses. With OpenGL we would typically setup to have either:
VSync enabled rendering for tear-free display OR VSync disabled rendering and go as fast as you can but probably see some image tearing.
With Vulkan we can still get these configurations but there are also others that offer variations. As an example, VK_PRESENT_MODE_MAILBOX_KHR allows us to have tear-free display of the currently presented image (it is vsync enabled), but we can have our application also render as fast as possible. Very briefly, the way this works is that when the presentation engine is displaying swapchain image 0, our calls to vkAcquireNextImageKHR will only return the other swapchain image indices. When we subsequently tell the GPU to present those images it will happily take the image and overwrite your previous presentation submission. When the next vertical blank occurs, the presentation engine will actually show the most up to date submitted swapchain image.
In this manner we can render to e.g. images 1 and 2 as many times as we like so that when the presentation engine moves along, it has the most up to date representation of our application's state possible.
Depending upon which swapchain presentation mode you request, your application could be locked to the VSync frequency or not, which in turn can lead to large differences in how much of your available CPU and GPU resources are consumed. Are they out for a leisurely stroll (VSync enabled) or sprinting (VSync disabled or mailbox mode)?
Multiple Windows and Swapchains
All of the above examples have assumed we are working with a single window surface and a single swapchain. This is the common case for games but in desktop and embedded applications we may well have multiple windows or multiple screens or even multiple adapters. Vulkan, unlike OpenGL, is pretty flexible when it comes to threading. With some care, we can happily record the command buffers for different windows (swapchains) on different CPU threads. For swapchains sharing a common Vulkan device, we can even request them all to be presented in one function call rather than having to call the equivalent of swapbuffers on each of them sequentially. Once again, Vulkan and the WSI gives you the tools, it's up to you how you utilise them.
Timeline Semaphores
A more recent addition to Vulkan, known as timeline semaphores, allows applications to use this synchronization primitive to work like a combination of a traditional semaphore and a fence. A timeline semaphore can be used just like a traditional (binary) semaphore to order packets of GPU work correctly, but they may also be waited upon by the CPU timeline (vkWaitSemaphores). The CPU may also signal a timeline semaphore via a call to vkSignalSemaphore. If found to be supported by your Vulkan version and driver, you can use timeline semaphores to simplify your synchronization mechanisms.
Pipeline and Memory Barriers
This article has only concerned itself with the high-level or coarse synchronization requirements. Depending upon what you are doing inside of your command buffers you will also likely need to take care of synchronising access to various other resources. These include textures and buffers to ensure that different phases of your rendering are not trampling over each other. This is a large topic in itself and is covered in extreme detail by this article and the accompanying examples.
It's up to you!
A lot of what the OpenGL driver used to manage for us, is now firmly on the plate of application and library developers who wish to make use of Vulkan or other explicit modern graphics APIs. Vulkan provides us with a plethora of tools but it is up to us to decide how to make best use of them and how to map them onto the requirements of our applications. I hope that this article has helped explain some of the considerations of synchronization that you need to keep in mind when you decide to take the next step from the tutorial examples and remove that magic call to vkDeviceWaitIdle.
About KDAB
Trusted software excellence across embedded and desktop platforms
The KDAB Group is a globally recognized provider for software consulting, development and training, specializing in embedded devices and complex cross-platform desktop applications. In addition to being leading experts in Qt, C++ and 3D technologies for over two decades, KDAB provides deep expertise across the stack, including Linux, Rust and modern UI frameworks. With 100+ employees from 20 countries and offices in Sweden, Germany, USA, France and UK, we serve clients around the world.
Dr Sean Harmer is a senior software engineer at KDAB where he heads up our UK office and also leads the 3D R&D team. He has been developing with C++ and Qt since 1998 and is Qt 3D Maintainer and lead developer in the Qt Project. Sean has broad experience and a keen interest in scientific visualization and animation in OpenGL and Qt. He holds a PhD in Astrophysics along with a Masters in Mathematics and Astrophysics.