Learn how to locate graphics issues in minutes with Metal's debugging and performance optimization tools in Xcode 12. We'll show you how to diagnose problems quickly using Metal Debugger. Discover the new summary view, which suggests ways to improve memory usage, bandwidth, performance, and implementation of the Metal API.
You should have a basic understanding of Metal to get the most out of this session. For background, watch “Harness Apple GPUs with Metal.”
Hello, and welcome to WWDC. Hi, I'm Sam Colbran from the GPU Software team at Apple, and today we're gonna be talking about Metal tools and how they can help you gain insights into your apps in the world of Apple GPUs.
These days, the rendering pipeline is increasing in complexity, with potentially hundreds of render passes. And that probably sounds a bit daunting, especially if you're new to the Apple platform. But don't worry. We've got a bunch of great Metal tools that can guide you and help to reduce some of that complexity. We're gonna be going through them today, and once you use them, I hope that you'll be able to deliver amazingly optimized, beautiful apps and games to people all over the world. In this session, we're gonna dig around in the Metal toolbox and discover what's available in the Metal Debugger and Metal System Trace.
At the end, we're gonna have a sneak peek at an upcoming game from Larian Studios, and then I'll show you how you might investigate it with the tools. First up is the Metal Debugger, which you'd typically use when you're GPU-bound. The Metal Debugger in Xcode is a powerful toolset for capturing and debugging a single frame of your app. It lets you deep dive into your render pipeline and helps to diagnose issues at an API level.
Let's take a closer look at each of the tools. It all begins with a frame capture.
In Xcode, make sure to first enable GPU Frame Capture under the scheme options, and then run your Metal app. Once you're at a frame that you wanna debug, simply click on the camera icon in the debug bar.
To help you better understand your frame, this year we've created a new summary view with the information about a frame in one convenient location, giving you quick access to an overview of encoders, performance numbers and memory usage. From here, you can jump into tools that are specific to the areas you're interested in exploring. The summary also shows Insights, which suggest changes that you can make to your app in order to improve memory usage, bandwidth, performance or Metal API usage. It's okay if this is your first journey into the world of Apple GPUs. The tools will give you the guidance you need to face the challenges. Each insight is accompanied by a description of the problem, a hint on how to fix it, along with links to related documentation.
Let's move on to the dependency viewer, which you can get to by clicking on the Show Dependencies button in the summary view or by clicking on any command buffer or encoder in the navigator.
As apps and render pipelines become more complicated, it's often useful to see all the dependencies between different encoders to understand how your frame is rendered. The dependency viewer provides a bird's-eye overview of all of the GPU passes encoded by your app and is a great place to start debugging load/store actions and bandwidth in general.
If you're interested in learning more, check out the "Delivering Optimized Metal Apps and Games" talk from last year's WWDC. We'll also come back to the dependency viewer a bit later, so let's move on for now.
Apple GPUs are packed with enormous power, and we want you to take full advantage of it. Clicking on Counters in the navigator brings you to the GPU Performance counters, which give you great detail into how long the GPU is spending on the various parts of your frame.
If you're interested in learning more, we have another session this year that goes into much more detail about performance counters and how to use them to fine-tune your app. So go check it out.
Shaders are also becoming more powerful each year. But not to worry. Simply switch the navigator to "View Frame by Pipeline State" and then select a render pipeline. On Apple GPU's, you can get per-line performance breakdowns directly in your shader source code with the shader profiler. You can even make edits directly in the shader profiler and see how the performance would change. The "Metal Shader Debugging and Profiling" talk from 2018 goes into much more detail, so I encourage you to check it out.
But why stop at performance? You can also debug your shaders with the shader debugger. There are a couple of ways to get to this tool, but the easiest is to click on the debug button in the debug bar. You can then select a pixel to debug the fragment shader, geometry to debug the vertex shader or threads to debug a compute shader. Once you're ready, click on the debug button.
The shader debugger provides you access to variable state over multiple threads, so you can easily see exactly what is happening during the entire execution of your shader.
For anyone who's used a traditional CPU debugger, you've probably been in a situation where you've accidentally stepped too far and can't go back. In the shader debugger that's not a problem. There's also no need to step over and into instructions, because you get access to all of the variable values for the entire execution of your shader.
For example, if you wanna look at iteration 28 of a loop, it's right there. That's pretty cool. Like the shader profiler, the shader debugger also lets you directly edit the source code and see your changes live. These tools combined let you write huge performance shaders without worrying about seeing a black screen and trying to figure out where you went wrong.
Once again, if you want to learn more, the "Metal Shader Debugging and Profiling" talk also covers the shader debugger.
Finally, that brings us to the Memory Viewer, which is the go-to place to learn more about your app's memory usage. You can get here by clicking on Memory in the navigator. In modern rendering pipelines, you have so much more explicit control over the memory usage. The Memory Viewer provides a breakdown of all of the resources in your app and their various properties. You can also filter by categories, including volatility, resource type, access and whether they're used or not.
The "Delivering Optimized Metal Apps and Games" talk from last year also includes the Memory Viewer and goes into much more detail.
So these are just some of the amazing tools in the Metal Debugger that can help you to understand and take further advantage of Metal and Apple GPUs in your app. I encourage you to go through each of the related talks from past WWDC's for each tool to learn more about them. Of course, there are also a few extra tools in Xcode, such as the new shader validation, that we didn't cover.
You can learn more about it in this related talk.
Now, let's move on to Metal System Trace.
Metal System Trace is a template in Instruments that lets you capture traits of your app over time, as opposed to a single frame in the Metal Debugger. This is perfect if you're trying to find issues like stutters, dropped frames or memory leaks. We also provide the Game template, which is a great set of tracks if you're trying to investigate a game. Let's start with the Encoder Timeline. You can disclose the GPU track to see a timeline of all the command buffers running in your app, color-coded by frame. This is perfect to see the overlap between encoders and to make sure there are no bubbles where the GPU is idle. It's also great for correlation because you have access to the state of system in other tracks, like the current thermal state. For example, if your encoder is taking longer because the device is in a critical state, maybe you need to do less work to let it cool down.
I encourage you to check out the "Delivering Optimized Metal Apps and Games" talk to learn more.
Next up is the new Shader Timeline. If you want more detail about encoders on Apple GPUs, you can enable the Shader Timeline under the Metal Application recording options. The Shader Timeline shows you which shaders are running at sampled times during the execution of your command encoder, and this fine-grain detail makes it really easy to see why a given encoder is taking a certain amount of time. You can even see the number of times that shaders were sampled and an approximate GPU time for how long they were running in the table at the bottom.
In addition to the Shader Timeline, we also have the Performance Limiter tracks. Once again, you can enable these in the Metal Application recording options under GPU Counter Set. These tracks show you the values of the limiters over time. I really encourage you to check out the "Optimize Metal Apps and Games with GPU Counters" talk if you're interested in learning more about counters, as it basically explains every single limiter, what they mean and what to do if you see a high value.
This year, GPU Counters are also available for non-Apple GPUs. Lastly, the memory allocations track shows you when resources, such as buffers or textures, are allocated or deallocated, which can help you to find memory leaks or just overall reduce your memory footprint.
And that's a quick summary of the tools we provide in Instruments to help you debug your frame over time.
These, plus all of tools from the Metal Debugger, work together to help you understand and take further advantage of Metal and Apple GPUs in your app.
But that's a lot of tools, and it might seem a little daunting, so let's see them in practice. Coming to iPad soon, we have one of the world's greatest RPGs, with over 100 hours of role-play and over a million words of voice over.
Divinity: Original Sin 2. Larian Studios has given us their first iPad game to play with, and it's just beautiful. It really goes to show the quality of graphics that can run on modern Apple GPUs and, of course, the amazing talent and commitment at Larian.
Now I wanna show you how to use the tools to debug something like this. It's obviously a pretty advanced rendering pipeline, but let's see if the tools will guide us and help to make any improvements. I've got the game running on my iPad Pro, so let's quickly look at the GPU gauges in Xcode. So we're now in the Metal Debugger and looking at the summary of our frame. In the overview we can see that this app has four command buffers, 53 render command encoders and almost 1000 draw calls. Under Performance we can see that this frame took about 26 milliseconds to render and has almost four-and-a-half-million vertices.
We can also see how much memory the app is using for various resources. And that brings us to the most interesting section of the summary page: Insights. If you remember from earlier, Insights suggest changes that you can make to your app in order to improve memory usage, bandwidth, performance or Metal API usage.
Each insight is accompanied by a description of the problem, a hint on how to fix it, along with links to related documentation.
Let's look at some bandwidth insights. So if you've watched the "Harnessing Apple GPUs with Metal" talk, you know that load/store actions are pretty important. And here, it seems like we have a few unused resources. If we look at the description, it seems like this encoder is storing a resource that's never again used in this frame.
The insight gives us a hint that we should consider changing the store action to "Don't Care." So I'm going to click on the Show In Dependencies button to jump right into the Dependency Viewer to investigate.
The Dependency Viewer shows us all of the GPU passes encoded by an app. In essence, we can see when a resource is stored in one encoder and used in others.
Focusing back in on this encoder, we can see that the depth and stencil textures are both set to "Store," but there's no line coming out of them, so they're not actually being used by any other render command encoder in this frame. The Metal Debugger knows this, so it's put an insight right here which suggests that we change the store action to "Don't Care." Now, if we do that, we'll save almost 11 megabytes of memory bandwidth.
Let's go back to the summary page and look at some other insights. This time I'm gonna look in the Performance category.
So it looks like this render command encoder might be an unnecessary depth pass.
So, Apple GPUs with a Tile Based Deferred Rendering architecture automatically perform hidden surface removal to avoid invoking the fragment shader unnecessarily.
Once again, let's jump back into the dependency viewer to investigate by clicking on the Show in Dependencies button.
So it looks like this encoder takes 1.71 milliseconds of GPU time and has roughly a million vertices.
From the thumbnail, it looks like it's drawing the entire scene into the depth and stencil textures. So what I'm gonna do is I'm gonna disclose it in the navigator and use the new thumbnail popover to scroll through the draw calls. And we can see that our suspicions are confirmed. It's then storing this result, which is being used by this encoder...
which looks like it's drawing the entire scene once again.
Now, this looks very much like a traditional depth pre-pass that you might use to cull invisible meshes.
While on non-Apple GPUs this is a pretty common technique to reduce overdraw, it's not actually needed here thanks to Apple GPUs and their hidden surface removal.
So this encoder could be eliminated entirely, and we'd save 1.71 milliseconds of GPU time.
So in a matter of mere minutes, we've captured Divinity: Original Sin 2 in the Metal Debugger and used Insights to find both a bandwidth issue and an advanced performance issue, along with a detailed description about how to fix them.
What I'm trying to stress here is that you don't need to be an expert in the world of Apple GPUs to start taking better advantage of them. The tools are here to help you, so capture your game and let them guide you. You have incredible hardware, software and tools at your disposal, so use them to create your beautiful apps and games.
There are a ton of talks that you can watch to learn more about the tools and how to take better advantage of Metal, so I really encourage you to go and check them out.
I hope you enjoyed today's session,
and have a great WWDC 2020.
Looking for something specific? Enter a topic above and jump straight to the good stuff.
An error occurred when submitting your query. Please check your Internet connection and try again.