Metal is the modern foundation for GPU-accelerated graphics and compute on Apple platforms, superseding OpenGL, OpenGL ES, and OpenCL. Get introduced to the architecture and feature set of Metal and learn a step-by-step approach for transitioning OpenGL-based apps to the Metal API.
Lionel Lemarie: Hi, folks. Welcome to our Metal session. I'm Lionel. I'm in the GPU software performance team here at Apple.
And with my friends Max and Sarah, we'll be guiding you through how to bring your OpenGL app to Metal. So last year we announced that OpenGL, OpenGL ES, and OpenCL are deprecated.
They will continue to be supported in iOS 13 and macOS Catalina, but now is the time to move.
New projects should target Metal from their inception. But if you have an OpenGL app that you want to port to Metal, you've come to the right place. So we first introduced Metal in 2014 as our new low-overhead, high-efficiency, high-performance GPU programming API. Over the past five years, Apple's core frameworks have adopted Metal and they're getting really great results. If your application is built on top of layers like SpriteKit, SceneKit, RealityKit, Core Image, Core Animation, then you're already using Metal.
We've also been working closely with vendors on engines like Unity, Unreal Engine 4, and Lumberyard to really take advantage of Metal. If you're using one of these engines, you're already up to speed.
But if you've built your own renderer, then Metal gives you a lot of great benefits.
Metal combines the graphics of OpenGL and compute of OpenCL into a unified API.
It allows you to use multithreaded rendering in your application.
Whenever there are CPU operations that need to take place that are expensive, we made sure that they happen as infrequently as possible to reduce overhead during your app's execution.
Metal's shading language is C++ based and all the shaders used in your application can be precompiled, making it easier to have a wide variety of material shaders, for example. And last but not least, we have a full suite of debugging and optimization tools built right into Xcode.
So once you have ported to Metal, you have full support to make your application even better.
So let's dive in. In this session we'll take a look at the different steps involved in migrating from GL into Metal, and we'll do that by comparing a typical GL app to a Metal app. As an overview, let's quickly look through the steps of our GL app.
First, you set up a window that you'll use for rendering. Then you create your resources like buffers, textures, samplers. You implement all your shaders written in GLSL.
Before you can render anything in GL, you may have to create certain object states, such as GL programs, GL frame buffer objects, vertex array objects.
So once you've initialized your resources, the render loop starts and you draw your frames. For each frame, you start by updating your resources, bind a specific frame buffer, set the graphics state, and make your draw calls.
You repeat this process for each frame buffer you have. You may have shadow maps, a lighting pass, some post-processing. So potentially quite a few render passes. And then finally, you present the final rendered image. It's pretty easy.
And as you can see, the Metal flow looks very similar. We updated some of the original concepts and introduced a few new things. But overall, the flow is much the same. It's not a complete rewrite of the engine; it works in the same manner.
So we will introduce the new concepts while drawing parallels between GL and Metal, comparing and contrasting the two APIs to help you successfully make the transition.
When you walk through any tutorial on graphics, the first thing you learn is how to create and draw to a window. So let's start with the window subsystem. Both GL and Metal have this concept, but it's accomplished a little differently. The application is required to set up and present a drawing surface, and views and view delegates manage the interface between the API and the underlying window system. You might be using these frameworks to manage your GL views, so we have equivalent frameworks in Metal.
NSOpenGLView and GLKView map to MTKView. And if you are using Core Animation in your application with the CAEAGLLayer, then there's an equivalent CAMetalLayer. As an example, let's say you are using GLKView.
It has a single entry point, drawInRect. So you need to check whether the resolution of your target has changed since the last frame and update your render target sizes as needed, right from within the render loop.
In MetalKit, it's a bit updated. There's a separate function for whenever the drawable needs to change, such as when you're rotating the screen or resizing your window. So you don't need to check if your resources need to be reallocated inside your draw function; it's dedicated to render code.
If you need additional flexibility, we provide the CAMetalLayer, which you use as the backing layer for your view.
While the CAEAGLLayer defined the properties of your drawable such as its color format, the CAMetalLayer allows you to set up your drawable size, pixel format, color space, and more.
Importantly, the CAMetalLayer maintains a pool of textures, and you call nextDrawable to get the drawable to render your frame to. It's an important concept that we'll revisit in a short while when it's time to present.
So now we have a window. Next we're going to introduce some new concepts in Metal: command queues, command buffers, and command encoders. These objects work together in Metal to submit work to the GPU. They're new because in GL the underlying glContext managed the submission for you. GL is an implicit API, meaning that there is no code that tells GL when to schedule the work. As a developer, you have very little control over when graphics work really happens, such as when shaders are compiled, when resource storage is allocated, when validation occurs, or when work is actually submitted to the GPU. The glContext is a big machine, and a typical workflow would look like this. Your application creates a glContext, sets it on the thread, and then calls arbitrary GL commands. The commands are recorded by the context under the hood and would get executed at some point in time. Let's take a closer look to see what actually goes on.
Say your application just sent GL these calls: a few state changes, a few draw calls. In a perfect scenario, the context would translate this into GPU commands to fill up an internal buffer. And then when it's full, it would send it to the GPU. If you insert a glFlush to enforce execution, you know for sure they'll be kicked off by that point. But actually, the GPU could start execution at any point beforehand.
Alright. So, for example, if we change one draw call, introducing a dependency, suddenly execution is kicked off at that point and you could experience massive stalls.
So, again, when does work actually get submitted? It depends. And that was one of the downsides of OpenGL -- it wasn't consistent in performance. Any one small change could force you down a bad path.
Metal, on the other hand, is an explicit API, meaning the application gets to decide exactly what work goes to the GPU and when. Metal splits the concept of a glContext into a collection of internal working objects. The first object an app creates is a Metal device object, which is just an abstract representation of the GPU. Then it creates a key object called a Metal command queue. The Metal command queue maintains the order of commands sent to the GPU by allocating command buffers to fill. And a command buffer is simply a list of GPU commands your app will fill to send to the GPU for execution. So we saw this command buffer concept in the GL example we just studied.
Let's work with that command buffer from this point on. But an app doesn't write the commands directly to the command buffer; instead, it creates a Metal command encoder. Let's look at the main three types of encoders.
The first one we'll use will be filled with blit commands that are used to copy resources around. The command encoder translates API calls into GPU instructions and then writes them to the command buffer. After a series of commands have been encoded, for example, a series of blits to copy resources, then your app will end encoding, which releases the encoder object.
Additionally, Metal supports a compute encoder for parallel work that you would normally have done in OpenCL before.
You enqueue a number of kernels that get written to the command buffer and you end the encoder to release it.
Lastly, let's use a render encoder for your familiar rendering commands.
You enqueue your state changes and your draw calls and end the encoder.
So here we have a command buffer full of different workloads, but the GPU hasn't done any work yet. Metal has created the objects and encoded commands all within the CPU. It's only after your application has finished encoding commands and explicitly committed the command buffer that the GPU begins to work and executes those commands.
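The encode-then-commit flow just described can be sketched with a toy model. This is only an illustration in plain C++: the `ToyCommandBuffer`, `ToyEncoder`, `commit`, and `runToyFrame` names are hypothetical and are not the Metal API, and the "GPU" here simply runs the recorded commands synchronously at commit time, whereas a real GPU executes asynchronously.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

// Toy model only: hypothetical types, NOT the Metal API.
// A command buffer is just a list of recorded commands.
struct ToyCommandBuffer {
    std::vector<std::function<void()>> commands;  // recorded, not yet executed
    bool committed = false;
};

// An encoder appends commands to its command buffer; it never runs them.
class ToyEncoder {
public:
    explicit ToyEncoder(ToyCommandBuffer& buffer) : buffer_(buffer) {}

    void encode(std::function<void()> command) {
        assert(!ended_ && "cannot encode after endEncoding()");
        buffer_.commands.push_back(std::move(command));
    }

    void endEncoding() { ended_ = true; }

private:
    ToyCommandBuffer& buffer_;
    bool ended_ = false;
};

// Stand-in for GPU execution: recorded work runs only once the
// application explicitly commits the command buffer.
inline void commit(ToyCommandBuffer& buffer) {
    assert(!buffer.committed && "a command buffer is committed once");
    buffer.committed = true;
    for (auto& command : buffer.commands) command();
}

// Encodes one blit, one compute, and one render pass, then returns what
// had executed before the commit (first) and after the commit (second).
inline std::pair<std::vector<std::string>, std::vector<std::string>> runToyFrame() {
    std::vector<std::string> executed;
    ToyCommandBuffer buffer;

    for (const char* pass : {"blit", "compute", "render"}) {
        ToyEncoder encoder(buffer);
        std::string name = pass;
        encoder.encode([&executed, name] { executed.push_back(name); });
        encoder.endEncoding();
    }

    std::vector<std::string> beforeCommit = executed;  // nothing has run yet
    commit(buffer);
    return {beforeCommit, executed};
}
```

Calling `runToyFrame()` returns an empty list for the state before the commit and the three passes in encoding order for the state after it, which is the point of an explicit API: the application decides when work is handed over.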
So now that we have encoded commands, let's now compare and contrast GL and Metal's command submissions.
In GL there's no direct control of when work gets submitted to the GPU -- you rely on big hammers like glFlush and glFinish to ensure code execution; glFlush submits the commands and pauses the CPU thread until they're scheduled, and glFinish pauses the CPU thread until the GPU is completely finished.
Work can still get submitted at any time before these commands happen, introducing potential stalls and slowdowns. And Metal has equivalent versions of these functions; you can still explicitly commit and wait for a command buffer to be scheduled or completed. But these wait commands are not recommended unless you absolutely need them. Instead, we suggest that you simply commit your command buffer and then add a callback so that your application can be notified later when the command buffer has been completed on the GPU. This frees your CPU to continue doing other work. So now that we have reviewed command queue, command buffer, command encoder, let's move on and talk about resource creation.
There are three main types of resources that any graphics app is likely to use: buffers, textures, and samplers. Let's take a look at buffers first.
In GL, you have a buffer object and the memory associated with it. The API calls you use can modify the object state, the memory, or both together. So here, for example, glBufferData can be used to modify both the memory and the state of the object. The buffer dimensions can be modified again later by calling glBufferData, in which case the old object and its contents will be discarded internally by OpenGL. In Metal, the API to create and fill a buffer looks very similar, but the main difference lies in the fact that the produced object is immutable. If at any point you need to resize the buffer, you simply need to create a new one and discard the old one.
Both OpenGL and Metal have ways to indicate how you intend to use an object; however, in GL the enum is simply a usage hint about how the data in a buffer object would be accessed. The driver uses that hint to decide where best to allocate memory for the buffer, but there's no direct control over storage. OpenGL ultimately decides where to store the objects.
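The "create a new one and discard the old one" pattern can be sketched as follows. This is a toy model in plain C++ under stated assumptions: `FixedBuffer` and `resized` are hypothetical names, not the Metal API, and the buffer here is ordinary CPU memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Toy model only: a hypothetical buffer whose length is fixed at creation,
// while its contents stay writable. Not the Metal API.
class FixedBuffer {
public:
    // Create a zero-filled buffer of the given length.
    explicit FixedBuffer(std::size_t length) : bytes_(length, 0) {}

    // Create a buffer initialized with a copy of `length` bytes from `data`.
    FixedBuffer(const void* data, std::size_t length)
        : bytes_(static_cast<const std::uint8_t*>(data),
                 static_cast<const std::uint8_t*>(data) + length) {}

    std::size_t length() const { return bytes_.size(); }
    std::uint8_t* contents() { return bytes_.data(); }
    const std::uint8_t* contents() const { return bytes_.data(); }

private:
    std::vector<std::uint8_t> bytes_;  // never resized after construction
};

// "Resizing" means: allocate a new buffer, copy whatever still fits, and let
// the old buffer be released once its last reference goes away.
inline std::shared_ptr<FixedBuffer> resized(const std::shared_ptr<FixedBuffer>& old,
                                            std::size_t newLength) {
    assert(old && "need an existing buffer to resize from");
    auto fresh = std::make_shared<FixedBuffer>(newLength);
    std::copy_n(old->contents(), std::min(old->length(), newLength), fresh->contents());
    return fresh;
}
```

Reference counting stands in for "discard the old one": any in-flight work that still holds the old buffer keeps it alive until it finishes.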
In Metal, the API allows you to specify a storage mode which maps to a specific memory allocation behavior.
Metal gives you control, since you know best how your objects are going to be used. It's an important concept in object creation, so we'll come back to it in a short moment right after we look at texture APIs.
In GL, each texture has an internal sampler object, and apps commonly set up the sampling mode through that sampler. But you also have the option to create a separate sampler object outside of your texture.
Here's an example for creating and binding your texture, setting up your sampler, and then finally filling in the data. One thing worth mentioning is that GL has a lot of API calls to create and initialize textures with data.
It also has what are called named resource versions of the same API.
There are even more APIs when it comes to managing samplers. The list just goes on and on. One of the design goals with Metal was to give a simpler API that would maintain all of the flexibility. So in Metal, texture and sampler objects are always separate and immutable after creation. To create a texture, we create a descriptor and set various properties that define the texture, like pixelFormat and dimensions, amongst others.
Again, an important property we set is the storage mode to specify where in memory to store the texture.
And finally, we use that descriptor to create an immutable object. In a similar fashion, you start with a sampler descriptor, set its properties, and create the immutable sampler object.
It's pretty easy.
To fill a texture's image data, we calculate the bytes per row. And just like we did in OpenGL, we specify the region to load. Then we call the texture's replaceRegion method, which copies the data into the texture from a pointer we specify.
Once you load your first texture, you're likely to observe that it's upside down.
That's because in Metal the texture coordinates are flipped on the y-axis compared to GL.
And it's also worth mentioning that Metal APIs don't perform any pixelFormat transformation under the hood. So you need to upload your textures in the exact format that you intend to use.
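Two of the upload details above, the bytes-per-row value and the vertical flip, can be handled together in CPU-side loading code. The sketch below is plain C++ and makes two assumptions: rows are tightly packed with no padding, and the source image was prepared for GL's bottom-up convention. The `bytesPerRow` and `flipRows` helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Bytes in one tightly packed row of pixels (assumes no row padding).
inline std::size_t bytesPerRow(std::size_t width, std::size_t bytesPerPixel) {
    return width * bytesPerPixel;
}

// Returns a copy of `pixels` with the row order reversed, so the first row
// of the input becomes the last row of the output. Pixel data inside each
// row is left untouched, so the pixel format is not changed.
inline std::vector<std::uint8_t> flipRows(const std::vector<std::uint8_t>& pixels,
                                          std::size_t width,
                                          std::size_t height,
                                          std::size_t bytesPerPixel) {
    const std::size_t rowBytes = bytesPerRow(width, bytesPerPixel);
    assert(pixels.size() == rowBytes * height && "unexpected image size");

    std::vector<std::uint8_t> flipped(pixels.size());
    if (rowBytes == 0) return flipped;  // nothing to copy

    for (std::size_t row = 0; row < height; ++row) {
        const std::uint8_t* src = pixels.data() + row * rowBytes;
        std::uint8_t* dst = flipped.data() + (height - 1 - row) * rowBytes;
        std::memcpy(dst, src, rowBytes);
    }
    return flipped;
}
```

The flipped data, together with the same bytes-per-row value, is what would then be handed to the texture upload call. Doing the flip once at load time avoids having to adjust texture coordinates in every shader.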
Now let's get back to storage modes. As mentioned, in GL the driver has to make a best guess on how you wanted to use your resources. As a developer, you can provide hints in some cases, like when you created a buffer or by creating render buffer objects for frame buffer attachments. But in all cases, these were still hints and the implementation details are hidden from you. A few minutes ago, we briefly saw the additional storage mode property in Metal that you can set on a texture descriptor and also when creating a buffer. Let's look at the main use cases for those.
The simplest option is to use shared storage mode, which gives both the CPU and GPU access to the resource. For buffers, this means you get a pointer to the memory backing of the object. For textures on iOS, this means you can call some easy-to-use functions to set and retrieve image data. You can also use a private storage mode, which gives the GPU exclusive access to the data. It allows Metal to apply some optimizations that it wouldn't normally have been able to use if the CPU had access to it. But only the GPU can directly fill the contents of the data.
So you can indirectly fill the data from the CPU by using a blitEncoder from a second intermediate resource that uses shared storage. On devices with dedicated video memory, setting the resource to use private storage allocates it in video memory only, as a single copy. On macOS there's a managed storage mode which allows both the CPU and GPU to access an object's data.
And on systems with dedicated video memory, Metal may have to create a second mirrored memory backing for efficient access by both processors. So because of this, explicit calls are necessary to ensure that your data is synchronized for CPU and GPU access, for example, using didModifyRange.
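Why a mirrored backing needs an explicit "I changed this range" call can be seen in a small toy model. This is plain C++ with a hypothetical `ToyManagedBuffer` type, not the Metal API; both "copies" are ordinary vectors, and the synchronization is a simple copy of the reported range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only (hypothetical type, not the Metal API): a resource with two
// mirrored backings that stay in sync only when told which range changed.
class ToyManagedBuffer {
public:
    explicit ToyManagedBuffer(std::size_t length)
        : cpuCopy_(length, 0), gpuCopy_(length, 0) {}

    // A CPU-side write touches only the CPU backing.
    void cpuWrite(std::size_t offset, std::uint8_t value) {
        assert(offset < cpuCopy_.size());
        cpuCopy_[offset] = value;
    }

    // Plays the role of a "did modify range" notification: the reported
    // range is copied from the CPU backing to the GPU backing.
    void didModify(std::size_t offset, std::size_t length) {
        assert(offset <= cpuCopy_.size() && length <= cpuCopy_.size() - offset);
        const auto start = static_cast<std::ptrdiff_t>(offset);
        std::copy_n(cpuCopy_.begin() + start, length, gpuCopy_.begin() + start);
    }

    // What the "GPU" would see if it read the resource right now.
    std::uint8_t gpuRead(std::size_t offset) const {
        assert(offset < gpuCopy_.size());
        return gpuCopy_[offset];
    }

private:
    std::vector<std::uint8_t> cpuCopy_;
    std::vector<std::uint8_t> gpuCopy_;
};
```

In this model a write followed by a read on the GPU side returns stale data until `didModify` is called for the written range. Reporting only the modified range is what keeps small updates to large buffers cheap.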
So to recap, we reviewed some of the typical uses for each mode.
On macOS you would use the private storage mode for static assets and your render targets.
Your small dynamic buffers could use the shared storage mode. And your larger buffers with small updates would use the managed storage mode. On iOS, your static data and rendering targets can use the private storage mode. And since our devices use unified memory, dynamic data of any size can use the shared storage mode and still get great performance.
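The recap above amounts to a small decision table, sketched here in plain C++. The enum and function names are hypothetical, and the size threshold separating "small" from "larger" buffers is an assumption chosen only for illustration, since the talk gives no number.

```cpp
#include <cassert>
#include <cstddef>

enum class Platform { macOS, iOS };
enum class Usage { StaticAsset, RenderTarget, DynamicData };
enum class StorageMode { Shared, Managed, Private };

// ASSUMPTION: the talk says only "small" and "larger"; 4 KB is a placeholder
// threshold for this sketch, not a documented value.
const std::size_t kSmallBufferBytes = 4096;

// Encodes the recap: private for static assets and render targets on both
// platforms; on iOS, shared for dynamic data of any size; on macOS, shared
// for small dynamic buffers and managed for larger ones with small updates.
inline StorageMode chooseStorageMode(Platform platform, Usage usage,
                                     std::size_t sizeInBytes) {
    if (usage == Usage::StaticAsset || usage == Usage::RenderTarget) {
        return StorageMode::Private;
    }
    if (platform == Platform::iOS) {
        return StorageMode::Shared;
    }
    return sizeInBytes <= kSmallBufferBytes ? StorageMode::Shared
                                            : StorageMode::Managed;
}
```

For example, `chooseStorageMode(Platform::macOS, Usage::DynamicData, 1 << 20)` selects the managed mode, while the same request on iOS selects shared.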
Next, let's talk about developing shaders for your graphics application and what APIs you use to work with shaders. When it comes to shader compilation in GL, you have to create a shader object, replace the shader source in the object, perform just-in-time compilation, and verify that the compilation succeeded. And while this workflow has its benefits, your application had to pay the performance costs of compiling all your shaders every time. One of the key ways in which Metal achieves its efficiency is by doing work earlier and less frequently. At build time, Xcode will compile all the Metal shader source files into a default Metal library file and place it in your app bundle for retrieval at runtime. So this removes the need to compile a lot of it at runtime and cuts the compilation time when your application runs in half. All you need to do is create a Metal library from a file bundled with your application and fetch the shader function from it.
In GL you use GLSL, which is based on the C programming language.
The Metal shading language or MSL is based on C++. So it should look reasonably familiar to most GL developers.
Its foundation in C++ means that you can create classes, templates, and structs. You can define enums and namespaces.
And like GLSL, there are built-in vector and matrix types, and numerous built-in functions and operations commonly used for graphics. And there are classes to operate on textures and specify sampler state.
Like Metal, MSL is also unified for graphics and compute.
And finally, since shaders are precompiled, Xcode is able to give you errors, warnings, and guidance to help you debug at build time.
So let's take a look at actual code for MSL and compare it with GLSL.
We're going to walk through a simple vertex shader, GLSL on top, MSL on the bottom. Let's start defining our shaders. These are the prototypes. In GLSL, void main. There's nothing in the shader that specifies the shader stage. It's purely determined by the shader type passed into the glCreateShader call.
In MSL the shader stage is explicitly specified in the shader code. Here the vertex qualifier indicates that it will be executed for each vertex, generating per-vertex output.
In GLSL, every shader entry point has to be called main and accept and return void. In MSL each entry point has a distinct name. And when you're building shaders with Xcode, the compiler can resolve include statements in the preprocessing stage the same way it would for regular C++ code. At runtime you can query functions by their distinct name from the precompiled Metal library. Then let's talk about inputs.
Because each entry point in GLSL is a main function with no arguments, all of the inputs are passed as global variables. This applies to both vertex attributes and uniform variables.
In Metal all the inputs to the shader stage are arguments to the entry function. The double brackets declare C++ attributes. We'll look at them in a second.
One of the inputs here that we have is a model view projection matrix. In OpenGL, your application had to be aware of the GLSL names within the C++ code in order to bind data to these variables. And that made shader development error-prone. In MSL the uniform binding indices are explicitly controlled by the developer within the shader, so an application can bind directly to a specific slot. In the example here, slot number one.
The keyword constant here indicates that the intention for the model view projection is to be uniform for all vertices.
The other input to the shader is a set of vertex attributes. In GLSL you typically use separate attribute inputs. The main difference here is that MSL uses a structure of your own design. The stage_in keyword indicates that each invocation of the shader will receive its own arguments.
Once you have all the inputs to the shaders set up, you can actually perform all the calculations.
Then for the outputs, in GLSL the output is split between varying attributes like glTexCoord and predefined variables, in this case gl_Position. In MSL, the vertex shader output is combined into your own structure.
So we've used a vertex and vertex output structure. Let's scroll up in the MSL code to see what they actually look like.
As mentioned previously, GLSL defines the input vertex attributes separately, and Metal allows you to define them within a structure. In MSL there are a few special keywords for vertex shader input. We mark each structure member with an attribute keyword and assign an attribute index to it. Similar to GLSL, these indices are used in the Metal API to assign the vertex buffer streams to your vertex attributes.
And GLSL predefines special keywords like gl_Position to indicate which variable contains vertex coordinates that have been transformed with the model view projection matrix.
Similarly, for the vertex output structure in MSL, the special keyword position signals that the vertex shader output position is stored in that structure member.
Similar to GLSL vector type, MSL defines a number of simd types via the simd.h header that can be shared between your CPU and GPU code.
But there's a few things you need to remember about them.
Vector and matrix types in your buffers are aligned to 16 bytes, or 8 bytes for half precision. So they're not necessarily packed; for example, a float3 holds 12 bytes of data but is aligned to 16 bytes, so it occupies 16 bytes in a buffer. This is to ensure that the data is aligned for optimal CPU and GPU access.
There are specific packed formats you can use if you need them. But you will need to unpack them in the shader before using them. So we've just reviewed the main differences between GLSL and MSL. And to make this transition smooth and easy, my colleague Max will show you a really cool tool to help you breeze through it. Thank you.
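The effect of that alignment on a buffer layout can be checked in plain C++. The sketch below does not use the simd header; `AlignedFloat3` is a hypothetical stand-in that emulates a 16-byte-aligned three-component vector with `alignas`, and it assumes a 4-byte `float`, which holds on common platforms.

```cpp
#include <cassert>
#include <cstddef>

// Three floats with no alignment request: 12 bytes, 4-byte aligned.
struct PlainFloat3 { float x, y, z; };

// Hypothetical stand-in for a 16-byte-aligned vector type: still three
// floats of data, but each instance occupies a full 16-byte slot.
struct alignas(16) AlignedFloat3 { float x, y, z; };

// A vertex layout built from the aligned type. The padding after each
// vector is inserted by the compiler, not written by hand.
struct Vertex {
    AlignedFloat3 position;  // offset 0
    AlignedFloat3 normal;    // offset 16, not 12
    float         weight;    // offset 32
};                           // total size rounds up to 48

static_assert(sizeof(PlainFloat3) == 12, "three packed floats");
static_assert(alignof(AlignedFloat3) == 16, "aligned to 16 bytes");
static_assert(sizeof(AlignedFloat3) == 16, "occupies a 16-byte slot");
static_assert(offsetof(Vertex, normal) == 16, "second vector starts at 16");
static_assert(sizeof(Vertex) == 48, "size is a multiple of the alignment");

// Stride to use when stepping through an array of Vertex in a buffer.
inline std::size_t vertexStride() { return sizeof(Vertex); }
```

If CPU code fills a buffer assuming 12-byte vectors while the shader reads 16-byte ones, every element after the first is read from the wrong offset, which is why sharing one struct definition between CPU and GPU code is the safer route.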
Metal, it's not just an API and a shading language, it is also a powerful collection of tools.
My name is Max, and I'm going to minimize your hassle porting to Metal. Let's take a look at this scene. This is the very first draw call from an old OpenGL demo that we here at Apple also ported to Metal. It's drawing a model of a temple and a tree, both illuminated by a global light source. Let's port the fragment shader together.
So the very first thing I did, I just copy and pasted my entire old OpenGL code directly into my Metal shader file.
Based on this, I've already created my input structure, as well as my function prototype.
So what we are going to do is just copy and paste the contents of the main function directly into our Metal function.
And here we see the very first powerful thing about Metal.
Because the shader's precompiled, we are getting errors instantly. Let's take a closer look. Of course, the built-in vector types have different names now. So vec2 becomes a float2; the vec3 becomes the float3; and the vec4 becomes a float4. So we quickly fix that.
The next error we are going to see is that like all of our input structures -- all of our global variables are now coming from our input structure. And because I just used a similar naming scheme, this is also very easy.
And, of course, we have to do the exact same thing for our uniforms.
The next error is a little bit more complex. Sampling in Metal is different, so let's take a look.
We are going to start from scratch. So we directly can call a sample function on our colorMap. And here we can see how powerful it is to have full auto completion.
So this function expects us to put in a sampler and a texture coordinate. We already have the texture coordinate.
We could pass in the sampler as an argument to our function or, conveniently in Metal, we can just declare one in code like this. We need to do the exact same thing for our normalMap.
The last error that we are seeing is that we are writing into, like, one of many OpenGL magic variables. Instead, we are just going to return our final computed color.
We can also see that all the other functions, like normalize, dot product, and my favorite function max, are still exactly the same.
Our shader now compiled successfully. Let's run it.
Something went wrong. In OpenGL when you're experiencing an error with your shader, what you usually do is, like, you look at your source code, you look at your output, and you think really hard. We're just going to use the shader debugger instead.
Clicking on the little camera icon in the debug area will capture a GPU trace. This is a recording of every Metal API call we made. And we can now navigate to our draw calls. Here we are drawing the tree.
And here we are drawing the temple. Let me long press on the stairs of the temple to bring up the pixel inspector, which allows us to start the shader debugger.
What we are seeing here now is the values per line for the code that we have ported together and for the pixel we have just selected. Let's take a look at our colorMap first. We can see this looks like a reasonable texture. And we can also see that our stairs are, like, in the upper half of this texture; however, if we were taking a look at our texture coordinate, we can see that we are sampling from the lower half. Let me quickly verify if this is the case.
What we are going to do is to invert the y coordinate of our texture. We can now update our shaders -- looks reasonable -- and we can continue our execution. There, much better. This is a pretty common error that you will experience when porting from OpenGL to Metal.
And, of course, the real fix is you go into your texture loading code and make sure your texture is loaded at the right origin so you don't have to do this fix in every shader.
However, the combination of a feature-rich editor and mighty debugging tools will also help you in porting your games to Metal.
Thank you very much. My colleague Sarah will now guide you through the rest of the slides.
Sarah Clawson: Thanks, Max. Hi, I'm Sarah Clawson. And I'm here to take you through the rest of the port from GL to Metal. So far in the life of a graphics app, we've gone through a lot of setup. We've got a window to render to, a way to get your commands to the GPU, and a set of resources and shaders ready to go.
Next up, we're going to talk about setting up the state for your render loop.
OpenGL has several key concepts when it comes to state management. The vertex array object defines both the vertex attribute layout, as well as the vertex buffers. The program is a linked combination of vertex and fragment shaders. And the framebuffer is a set of color and depth stencil attachments that your application intends to render to.
These state objects are created during initialization and are used throughout your frames. Let's walk through an example to show how OpenGL manages state.
Here we have a sample render loop where an OpenGL application binds a framebuffer, sets a program, and then makes other state modifications, like enabling depth, or face culling, or changing the color mask before making a draw call. If you look at this same API trace from OpenGL's perspective, it has to track all these changes on each API call. And then when a draw call happens, it has to stop and validate to be sure that the previous changes to primitive assembly, depth state, rasterizer, and programmable stages are all compatible with each other. This validation can be super expensive. And while OpenGL does try to minimize its negative impact, there's limited opportunity to do so.
It is worth noting that the OpenGL state objects were ahead of the curve when they were first introduced.
Framebuffer objects combine attached render targets, programs link fragment and vertex shaders together, and vertex array objects are larger objects combining some of the vertex attribute APIs and vertex buffer setup. But even with all these changes, although they yielded positive results, OpenGL still has to validate many things on a draw call, such as: Can the ColorMask help optimize the fragment shader? Is the fragment shader output compatible with the attached framebuffer? Is the vertex layout compatible with the bound program? Or are the attached render targets blendable?
So as we redesigned the graphics state management for Metal, we took the program shaders combined with the vertex input layouts from the VertexArray objects and added the information about attachment pixelFormat and blend state, and we combined them into one object called the PipelineDescriptor. This structure describes all the relevant states in the graphics pipeline.
To set up the descriptor, first you initialize it. And then you set all the state we just talked about, like vertex and fragment shaders, vertex information, pixel formats, and blend state. And then you take that descriptor and you create what is called a pipeline state object or PSO. This immutable object fully describes the render state. And what's great about it is that you create it once, have it validated for correctness, and then use it throughout your program. In a similar way, we combined all the depth and stencil-related settings into a depth/stencil state descriptor. And, again, it is a collection of all the depth/stencil state. And you take this descriptor and you create what's called a depth/stencil state object.
This object is also immutable and used throughout your program.
So the render loop we were looking at in OpenGL now looks like this in Metal. With all of the prevalidated state objects, there's no longer any state validation or tracking.
Let's look through the comparison. In Metal, the render encoder is the start of a render pass, similar to binding your frame buffer. Now that your depth state is prebaked into an object, you simply set it on the renderEncoder. The PipelineState object represents a combination of program shaders, VertexArray properties, and a pixelFormat. And it's also set on the renderEncoder. And now the renderEncoder manages your rasterizer state directly.
And it's important to note here that there is still flexibility in your pipeline, as not everything is prebaked into your PipelineState object. Here's the list of state that we've just been discussing that you prebake into your PSO: State like vertex and fragment functions and pixel formats, etc.
On the other hand, here's all the state that you still set while drawing -- state like primitive culling mode and direction, fill mode. Scissor and viewport areas are still set just like in OpenGL. And ultimately, the draw calls remain the same. The main difference here is that instead of enabling new state, which could incur hidden validation costs, you simply swap out a new PipelineState object that had blending enabled in its descriptor.
I want to discuss one more possible optimization that you may have used in OpenGL in order to hide certain expensive operations.
As an OpenGL developer, you may have seen that your render loop has an unexpected hiccup on the first draw call after making a bunch of state changes. And if this is the case, you probably use an optimization to hide that called shader pre-warming.
In shader pre-warming, an application uses dummy draw calls for the most common GL programs in order to have OpenGL create all the state that's necessary ahead of time.
If you were doing this in your engine already, then it's going to be very easy for you to replace it with PSO creation. Now shader pre-warming in Metal is accomplished through creating separate PSO objects with different state enabled. First, you create your descriptor, and then you set all of the state up until the first draw call and create your first PipelineState object. Then you can take that same descriptor, change a bit of state on it -- like here we're enabling blending -- and you create a second PipelineState object.
Both of these are prevalidated so that during draw time you can just swap them out between draw calls.
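The create-once, validate-once, swap-at-draw-time idea can be sketched with a toy model. This is plain C++ and not the Metal API: `ToyPipelineDescriptor`, `ToyPipelineState`, `ToyPipelines`, and `buildPipelines` are hypothetical, the shader and pixel format names are placeholder strings, and the "validation" is reduced to checking that required fields are set.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Toy model only: hypothetical types, NOT the Metal API.
// The descriptor is a plain, mutable bag of state.
struct ToyPipelineDescriptor {
    std::string vertexFunction;
    std::string fragmentFunction;
    std::string colorPixelFormat;
    bool blendingEnabled = false;
};

// The state object is immutable: it can only be obtained through make(),
// which validates the descriptor once, at creation time.
class ToyPipelineState {
public:
    static std::shared_ptr<const ToyPipelineState> make(const ToyPipelineDescriptor& d) {
        if (d.vertexFunction.empty() || d.fragmentFunction.empty() ||
            d.colorPixelFormat.empty()) {
            throw std::invalid_argument("incomplete pipeline descriptor");
        }
        return std::shared_ptr<const ToyPipelineState>(new ToyPipelineState(d));
    }

    bool blendingEnabled() const { return state_.blendingEnabled; }

private:
    explicit ToyPipelineState(const ToyPipelineDescriptor& d) : state_(d) {}
    const ToyPipelineDescriptor state_;  // snapshot; later descriptor edits don't affect it
};

struct ToyPipelines {
    std::shared_ptr<const ToyPipelineState> opaque;
    std::shared_ptr<const ToyPipelineState> blended;
};

// Builds two prevalidated variants from one descriptor, changing a single
// piece of state in between. At draw time the app would just pick one.
inline ToyPipelines buildPipelines() {
    ToyPipelineDescriptor descriptor;
    descriptor.vertexFunction = "vertexShader";      // placeholder name
    descriptor.fragmentFunction = "fragmentShader";  // placeholder name
    descriptor.colorPixelFormat = "bgra8Unorm";      // placeholder name

    ToyPipelines pipelines;
    pipelines.opaque = ToyPipelineState::make(descriptor);

    descriptor.blendingEnabled = true;  // reuse the descriptor, flip one setting
    pipelines.blended = ToyPipelineState::make(descriptor);
    return pipelines;
}
```

Because each state object snapshots the descriptor, changing the descriptor after the first `make` call leaves the first object untouched, and any invalid combination is reported when the object is created rather than at a draw call.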
Hopefully if you're porting from OpenGL to Metal, this is a straightforward change. Now, as we conclude the setup stage of our application, I'd like to bring up one of the main benefits of porting your app from OpenGL to Metal, and it is that it will start doing expensive operations less often. In OpenGL, your application would have to wait until draw time in order to do things like compile and link shaders or validate states, which means that these expensive operations happen many times per frame. Once you port your app to Metal, your application moves these operations to different stages of its lifetime.
With precompiled shaders, shader compilation has moved out of initialization and into build time so it's only done once.
Then with PSOs, state definition is moved to content loading. So that leaves your draw time free to actually make draw calls. So now that we've completed the setup stage of your application, let's talk about using all these resources, shaders, and objects to render frames.
In order to draw a single frame, your application needs to first update textures and buffers, then establish a render target to render to, and then make several render passes before finally presenting your work.
Let's talk about updating resources.
Typically, at least some resources have to be updated continuously throughout your render loop.
Such examples are shader constants, vertex and index buffers, and textures. And these modifications can be accomplished between frames through synchronization between the GPU and the CPU. A typical GL resource update can be any combination of the following calls: A buffer can be updated by the CPU; or you can update a buffer through the GPU via buffer-to-buffer copy.
Similarly, a texture can be updated by the CPU or it can be updated via texture-to-texture copy on the GPU. At a glance, Metal offers similar functionality. But as Lionel mentioned earlier, the containers for buffers and textures are immutable and are created during initialization; however, their contents can be modified through any combination of the following. A buffer with shared or managed storage mode can be updated through its contents property on the CPU. And on the GPU, the blitEncoder is in charge of doing all data copying. And so you can update a buffer from the GPU via the copyFromBuffer methods on the blitEncoder.
Similarly, a texture with shared or managed storage mode can be updated on the CPU through its replaceRegion method.
Or on the GPU, you can update a texture through the copyFromTexture methods on the blitEncoder. Note that storage mode matters here when it comes to these updates as only buffers and textures with shared or managed storage modes can be updated by the CPU.
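The update paths above could be sketched roughly as follows. The function names are hypothetical helpers, and the texture helper assumes a tightly packed 4-bytes-per-pixel format such as RGBA8; neither assumption comes from the session.

```swift
import Metal

// CPU path: write directly through the buffer's `contents` pointer.
// Assumes shared storage; a managed buffer on macOS would also need
// didModifyRange(_:) after the write.
func updateBufferOnCPU(_ buffer: MTLBuffer, with values: [Float]) {
    precondition(buffer.storageMode == .shared, "sketch assumes shared storage")
    let byteCount = values.count * MemoryLayout<Float>.stride
    precondition(byteCount <= buffer.length, "values do not fit in the buffer")
    values.withUnsafeBytes { source in
        guard let base = source.baseAddress else { return }  // empty array
        buffer.contents().copyMemory(from: base, byteCount: byteCount)
    }
}

// GPU path: encode a buffer-to-buffer copy with a blit encoder.
func copyBufferOnGPU(from source: MTLBuffer, to destination: MTLBuffer,
                     size: Int, using commandBuffer: MTLCommandBuffer) {
    guard let blit = commandBuffer.makeBlitCommandEncoder() else { return }
    blit.copy(from: source, sourceOffset: 0,
              to: destination, destinationOffset: 0, size: size)
    blit.endEncoding()
}

// CPU path for textures: replace the whole of mip level 0.
// Assumes 4 bytes per pixel and a CPU-accessible storage mode.
func updateTextureOnCPU(_ texture: MTLTexture, pixels: [UInt8]) {
    precondition(pixels.count == texture.width * texture.height * 4)
    let region = MTLRegionMake2D(0, 0, texture.width, texture.height)
    pixels.withUnsafeBytes { source in
        guard let base = source.baseAddress else { return }
        texture.replace(region: region, mipmapLevel: 0,
                        withBytes: base, bytesPerRow: texture.width * 4)
    }
}
```

A texture-to-texture copy on the GPU follows the same shape as `copyBufferOnGPU`, using the blit encoder's texture copy methods instead.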
OpenGL managed the synchronization between the GPU and CPU for you, though sometimes at exorbitant costs to your application as it waited for one or the other to be done.
In Metal, because you control how the memory is stored, you also control how and when the data is synchronized. And this is true for both buffers and textures. If you port your GL app to Metal and only use a single buffer for your resource updates, the flow will look like this. First, your CPU will update your resources during the setup of a render pass. And then once complete, the buffer will be available for the GPU to consume during the execution of that render pass. However, while the GPU is reading from this buffer, the CPU may begin setting up for the following render pass and will need to update the same buffer, which is a clear race condition.
So let's look at one approach to solve this problem. A simple solution would be to commit this resource to the GPU with the waitUntilCompleted call on the commandBuffer it is used in.
As we discussed earlier, this is similar to glFinish: it blocks all CPU work until the GPU is done executing the render pass that uses that buffer. After the execution is completed, a callback is received from the GPU, and this way you can ensure that your single buffer will not be stomped on by the CPU or the GPU.
However, as you can see, the CPU is idle while the GPU is executing, and the GPU is starved waiting for the CPU to commit work. So while this can be helpful for you at the beginning while you're working out these race conditions, it is not recommended to use waitUntilCompleted as it introduces latency into your program.
Instead, an efficient way to synchronize your updates is to use two or more buffers depending on your application's needs so that the CPU can write to one while the GPU reads from another.
Let's look at a simple triple buffering example. So here we start with the first resource ready to be consumed by the GPU. But instead of waitUntilCompleted, we just add a completion handler so that once the corresponding frame is finished on the GPU, it can let the CPU know that it is done. But now we don't have to wait for it to be done.
While the GPU is executing, with triple buffering the CPU can jump two updates ahead because it's in different buffers.
So here we are with the frame done executing on the GPU, and this is where the completion handler comes in. It notifies that GPU work is done and then returns the buffer to the buffer pool so that it can be used by the CPU in the next frame while the GPU continues execution. I think most developers will find that they'll need to implement triple buffering to achieve optimal performance. As for implementation, for triple buffering, of course, you need to start with a queue of three buffers.
You also need to initialize your frameBoundarySemaphore with a starting value of three. And this semaphore will be signaled at each frame boundary when the GPU is done executing, letting the CPU know that it is safe to overwrite that buffer. And finally, we need to initialize the buffer index to point at the current frame's buffer. Inside the render loop, before we write to a buffer, we need to ensure that the GPU is completely done executing the corresponding frame.
So at the beginning of each render pass, we need to wait on our frameBoundarySemaphore. And then once the signal has been received, we know that it's safe to grab its buffer and reuse it for new frame data. And now we encode commands and bind this resource to the GPU to be used in the next frame.
But before we commit it, we have to add our completion handler to the commandBuffer and then we commit it. And once the GPU has finished executing, our completion handler will signal our frame semaphore, allowing the CPU to know that it is done and it can reuse the buffer for the next frame's encoding. And this is a simple triple buffer implementation that you can adopt for any dynamic resource updates. Okay.
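The steps just walked through might be sketched like this. The class name `DynamicBufferRing` and its `encodeFrame` method are hypothetical, the buffers are assumed to use shared storage, and the sketch assumes `encodeFrame` is only ever called from one thread.

```swift
import Metal
import Dispatch

// A pool of three buffers guarded by a semaphore that starts at three.
final class DynamicBufferRing {
    static let maxFramesInFlight = 3

    private let buffers: [MTLBuffer]
    private let frameBoundarySemaphore =
        DispatchSemaphore(value: DynamicBufferRing.maxFramesInFlight)
    private var currentIndex = 0

    init?(device: MTLDevice, length: Int) {
        var made: [MTLBuffer] = []
        for _ in 0..<DynamicBufferRing.maxFramesInFlight {
            guard let buffer = device.makeBuffer(length: length,
                                                 options: .storageModeShared) else {
                return nil
            }
            made.append(buffer)
        }
        buffers = made
    }

    // Waits for a free buffer, lets the caller fill it and encode commands
    // that read it, then commits with a handler that releases the buffer.
    func encodeFrame(on queue: MTLCommandQueue,
                     update: (MTLBuffer) -> Void,
                     encode: (MTLCommandBuffer, MTLBuffer) -> Void) {
        // Block only if the GPU still owns all three buffers.
        frameBoundarySemaphore.wait()

        guard let commandBuffer = queue.makeCommandBuffer() else {
            frameBoundarySemaphore.signal()
            return
        }

        let buffer = buffers[currentIndex]
        currentIndex = (currentIndex + 1) % DynamicBufferRing.maxFramesInFlight

        update(buffer)                 // CPU writes this frame's data
        encode(commandBuffer, buffer)  // caller binds the buffer and draws

        // Capture the semaphore itself so it outlives any pending handler.
        let semaphore = frameBoundarySemaphore
        commandBuffer.addCompletedHandler { _ in semaphore.signal() }
        commandBuffer.commit()
    }
}
```

With this shape the CPU can run up to two frames ahead of the GPU, and it stalls only when all three buffers are still in flight.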
So now we have our resources updated, so let's talk about render targets.
In OpenGL, framebuffer objects are the destination for rendering commands.
An FBO collects a number of textures and render buffer objects under one umbrella and facilitates rendering into them. The state of a framebuffer is mutable, and the render pass is loosely outlined by binding a framebuffer and ultimately swapping them for display. This is a typical OpenGL workflow with framebuffers.
During the application's initialization stage, a framebuffer is created. And then you make it current by binding it. And then you attach resources like textures and then check the framebuffer status to make sure it's valid to use.
During draw time, you make a framebuffer current by binding it, which is an implicit start to a render pass. And then you have to clear it before you make any draw calls to it. And then at the end you can signal that certain attachments can be discarded to let OpenGL know that it's not necessary to store these contents into memory. These discard events can serve as hints to end the render pass, but it's not a guarantee.
In Metal, the render command encoder is the destination for rendering commands.
A render command encoder is created from a render pass descriptor, which, similar to an FBO, collects a number of rendering destinations for a render pass and facilitates rendering into them.
A render command encoder is directly responsible for generating the hardware commands for your GPU, and a render pass is explicitly delineated by the starting and ending of encoders. Here's a render pass in Metal.
You start by creating your renderPassDescriptor. And the renderPassDescriptor describes all the attached resources and also specifies the operations that happen at the beginning and end of a render pass -- these are called load and store actions. In contrast to GL, in Metal you do not clear a resource directly; instead, you specify a load action to clear it and also the color. Here, it is black. The store action here is don't care, which is similar to GL discard framebuffer in our GL example.
If you want to store the results to memory, you would use the store action here instead. And at render time, you use your descriptor to create your encoder so the state is set. You make all your draw calls and then explicitly end encoding. But before discarding framebuffers or ending encoding, let's actually draw something.
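A render pass descriptor along these lines could be set up as in the sketch below. The helper name is hypothetical, and the choice to store the color attachment while marking depth as don't care is an assumption made for illustration.

```swift
import Metal

// Describes a pass that clears its attachments on load.
// Color is cleared to black and stored; depth is cleared and then
// discarded at the end of the pass (.dontCare).
func makeClearingPassDescriptor(colorTarget: MTLTexture,
                                depthTarget: MTLTexture?) -> MTLRenderPassDescriptor {
    let pass = MTLRenderPassDescriptor()

    pass.colorAttachments[0].texture = colorTarget
    pass.colorAttachments[0].loadAction = .clear
    pass.colorAttachments[0].clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
    pass.colorAttachments[0].storeAction = .store      // keep the rendered result

    pass.depthAttachment.texture = depthTarget
    pass.depthAttachment.loadAction = .clear
    pass.depthAttachment.clearDepth = 1.0
    pass.depthAttachment.storeAction = .dontCare       // not needed after the pass

    return pass
}
```

At render time the descriptor is passed to `makeRenderCommandEncoder(descriptor:)` on the command buffer, and the pass ends when `endEncoding()` is called on the resulting encoder.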
A series of render commands is often referred to as a render pass.
Inside the render pass, you set up state and draw call inputs like textures and buffers and then issue your draw commands.
This is a typical OpenGL draw sequence.
A well-behaved OpenGL app tries to set all of its state ahead of time, and then it binds its target and a GL program to link shaders. Then it will bind resources such as vertex buffers, uniforms, and textures to different stages in the program.
And finally, it will draw.
As we've discussed a few moments ago, OpenGL state changes can cause hidden validation checks. And if you're already grouping your state changes together in OpenGL to avoid these performance hits, then you'll get the most out of Metal's pre-validated state objects.
In Metal, because validation only happens when you create your PipelineState object and because shaders are precompiled, your render loop becomes much smaller. But for a programmer, there aren't that many changes to make.
Here is the same code that we looked at in OpenGL but now in Metal.
You start with your render command encoder, which is equivalent to binding the GL framebuffer. And then you set your prebuilt PipelineState object, which is equivalent to glUseProgram. And after that, we assign resources for our Metal program, starting with the VertexBuffer and uniforms. And you can note here that you have to set your uniforms per shader stage, whereas in GL you set them once for the GL program.
And here, because we ported it directly from OpenGL, we're sending the same set of uniforms; but in Metal you can send different ones if you want. And then you set your textures and issue the draw call. And finally, once you've done all the draw calls, you can end your render pass.
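The Metal draw sequence described here might look roughly like the following. The function name, the buffer and texture index assignments, and the triangle primitive type are all illustrative assumptions.

```swift
import Metal

// Encodes one render pass with a single draw call.
// Index assignments (vertices at 0, uniforms at 1 for the vertex stage,
// uniforms at 0 for the fragment stage) are assumptions for this sketch.
func encodeDraw(commandBuffer: MTLCommandBuffer,
                pass: MTLRenderPassDescriptor,
                pipeline: MTLRenderPipelineState,
                vertices: MTLBuffer,
                uniforms: MTLBuffer,
                texture: MTLTexture,
                vertexCount: Int) {
    // Creating the encoder starts the render pass (compare: binding an FBO).
    guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass) else {
        return
    }

    // Prebuilt, pre-validated state (compare: glUseProgram).
    encoder.setRenderPipelineState(pipeline)

    // Resources are bound per shader stage.
    encoder.setVertexBuffer(vertices, offset: 0, index: 0)
    encoder.setVertexBuffer(uniforms, offset: 0, index: 1)
    encoder.setFragmentBuffer(uniforms, offset: 0, index: 0)
    encoder.setFragmentTexture(texture, index: 0)

    encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: vertexCount)

    // Ending encoding explicitly ends the render pass.
    encoder.endEncoding()
}
```

Because the uniforms are bound separately for the vertex and fragment stages, each stage could just as well receive a different buffer.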
And now, once the work is submitted, there's still the matter of presenting. As the GPU renders the scene, it writes out to a framebuffer to display.
In OpenGL, in order to present a rendered frame, when you return from drawInRect, the context calls presentRenderbuffer for you.
Metal, on the other hand, accomplishes this directly through Core Animation's pool of drawables. And drawables are textures for on-screen display. And you can encode a render pass to render to drawables.
You fetch the current drawable, and then after your render loop tell the command buffer to present it. Remember our code from the very, very beginning of this talk when we were talking about the window subsystem. Here we're going to dive into glkView and drawInMTKView to see how you can present what you've rendered. So here it is. In glkView you bind your framebuffer; perform your render commands; and then when you return from drawInRect, the present is managed for you.
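For the Metal side of this comparison, fetching the current drawable and telling the command buffer to present it, a minimal MTKView delegate could look like the sketch below. The `Renderer` class name is hypothetical, and the pass contains no draw calls, so it only clears and presents.

```swift
import MetalKit

// A bare-bones MTKViewDelegate that presents one cleared frame per callback.
final class Renderer: NSObject, MTKViewDelegate {
    private let commandQueue: MTLCommandQueue

    init?(view: MTKView) {
        guard let device = view.device ?? MTLCreateSystemDefaultDevice(),
              let queue = device.makeCommandQueue() else {
            return nil
        }
        view.device = device
        commandQueue = queue
        super.init()
    }

    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
        // Nothing to resize in this sketch.
    }

    func draw(in view: MTKView) {
        // The view supplies a pass descriptor that targets its current drawable.
        guard let pass = view.currentRenderPassDescriptor,
              let drawable = view.currentDrawable,
              let commandBuffer = commandQueue.makeCommandBuffer(),
              let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass) else {
            return
        }

        // Draw calls would be encoded here.
        encoder.endEncoding()

        // The extra step compared with GLKView: schedule the present
        // yourself, then commit the command buffer.
        commandBuffer.present(drawable)
        commandBuffer.commit()
    }
}
```

An app would assign an instance of this class to the view's `delegate` property and keep a strong reference to it, since the view does not retain its delegate.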
In Metal it's much the same: You create your commandBuffer, perform your render commands by creating and ending encoders, and then the one extra step you have to take is to call presentDrawable yourself before finally committing your commandBuffer. And if your render loop is very simple with a single encoder, then this is all you have to do; however, if you do have a more complex app, you may want to check out the talk we have on delivering optimized Metal apps and games for how to handle your drawables.
And that concludes our frame. So we've shown how the window subsystem can be migrated easily. We've gone over the resource creation steps. We've ported our shaders and used the great tools to quickly find issues. We created our command queue, command buffers, and command encoders to set up our render passes. And we created our prevalidated state objects. Then to render each frame, we used triple buffering to update our resources. We used the render command encoders for our render passes where we drew our geometry before ultimately presenting the rendered frame.
We've walked through the life of a graphics app and showed how Metal is a natural evolution. Many of OpenGL's established concepts have migrated into Metal to work alongside new concepts that we've added to address specific problems raised in the graphics community.
If you can take one thing away from this session, we hope it's that porting your applications from OpenGL to Metal is not intimidating and that your application will actually benefit from it.
But if you have room for two things, it's that Metal also offers an awesome set of tools to enhance your developing experience. Max already demoed Xcode's built-in frame capture and shader debugger to offer deeper insight into subtle issues within your code. But Xcode also offers the new GPU memory viewer to understand and optimize how to use memory in your application.
In Instruments we have a game performance template that includes the Metal system trace to visualize submission issues which might cause frame drops.
And new this year we also have support for Metal in the simulator.
Yay, you can get excited. New with Xcode 11 on macOS Catalina, we have full hardware acceleration to run your games and apps in the iOS and tvOS simulators using Metal.
The simulator supports the MTLGPUFamilyApple2 feature set and should meet the majority of your needs to run all of your apps and games in all available screen resolutions. For a deeper dive into the simulator and how it achieves hardware acceleration, please check out the simulator talk tomorrow morning.
If you're looking to solve a specific issue with Metal, you can see our many, many sessions online.
For more information, you can check out our documentation on our website or you can visit us in the Metal lab tomorrow morning.
And with that, thank you all for coming, and I hope to see you at the bash. [ Applause ]