Explore hybrid rendering with Metal ray tracing
Discover how you can combine ray tracing with your rasterization engine to implement simplified graphics techniques and elevate visuals in your app or game. We'll explore how you can use natural algorithms to accurately simulate the interplays of light, and learn how to take advantage of the latest tools in Xcode to capture, inspect, and debug your ray-traced scenes.
- Accelerating ray tracing using Metal
- Managing groups of resources with argument buffers
- Metal for Accelerating Ray Tracing
- Metal Shading Language Specification
- Rendering reflections in real time using ray tracing
♪ Bass music playing ♪ ♪ Ali de Jong: Welcome to WWDC 2021. My name is Ali de Jong, and I'm a GPU software engineer at Apple. And today, along with my colleague David Núñez Rubio, we'll explore hybrid rendering with Metal ray tracing. We'll start by showing you some improvements that ray tracing can bring to your visuals; then discuss how to incorporate a ray tracing pass into rasterization, using a technique called "hybrid rendering;" then David will walk us through the new tools to help you implement ray tracing. Let's start by taking a look at some great use cases for ray tracing. Games and movies are in the constant pursuit of ever-increasing realism, and for many years, the approach to graphics has been by means of rasterization. Rasterization is great at producing beautiful images at real-time rates. However, there are limitations to what we can achieve. Ray tracing is a mechanism that allows us to query the world from shaders, opening the door to new, exciting techniques. And by combining it with rasterization, we can greatly improve our visuals. Let's take a look at a few examples. One area that's always been problematic for rasterization is reflections. This is because when we're shading a rasterized pixel, we have no context of the rest of the scene for accurate reflections and we have to do extra work to generate that information. Ray tracing allows us to trace a ray from the pixel being shaded and discover what's out there in the world. Even better, we can apply this process recursively to apply correct shadows and even reflections in reflections. Another area where ray tracing excels is shadows. Notice with rasterization, the shadows' general blurriness and the aliasing caused by a shadow map resolution on the curved surfaces of the moped. Ray-traced shadows are sharper and address aliasing issues without the need of artificial parameters such as a shadow bias. Soft shadows can also be approximated more accurately. 
We can naturally produce shadows that are harder or softer, depending on the proximity of the occluding object to the shaded point. With rasterization, we would need to rely on filtering the shadow map at sampling time. But with ray tracing, we can simply trace rays in a cone to get this result. Lastly, another area where ray tracing can elevate our visuals is transparency. This is traditionally very hard to handle accurately for rasterization techniques. In this image, note how the sunlight is coming through the window, yet the opaque letters on the glass produce no shadow. Traditional shadow-mapping techniques often have problems with transparent objects. With ray tracing, we can create a custom intersection function for transparent materials. With this, we can define which rays are able to pass through the material and which ones don't, naturally producing projected shadows like the letters on the head of the bust. And of course, all the shadows look sharper overall. So why is it that ray tracing is able to improve our visuals so dramatically? To understand this, let's take a look at how the traditional rasterization process works. In the rasterization process, meshes are sent down to Metal to be rendered. They are placed in the world and in front of the camera by a vertex shader, and those primitives are placed onto pixels -- or fragments -- by the rasterizer. These pixels are then shaded by a fragment shader, and the result is blended onto the output image. As you know, each pixel can be shaded independently, operating in parallel, which is what makes GPUs so good at the rasterization process. The tradeoff, however, is at the time we're applying our shading, we've completely lost the context of the rest of our scene and we don't know what objects might be surrounding the point associated with this pixel. Advanced game engines make up for this situation by adding extra render passes that generate intermediate information. 
The fragment shader can then leverage that data to approximate the details of the geometric context the point is in. Let's take a look at how this works in a bit more detail. For this technique, you rasterize geometric information about the scene to intermediate textures instead of directly to the screen. This can be things like albedo, depth, or normals. This is commonly referred to as a Geometry Buffer pass or G-Buffer pass for short. The intermediate textures are used as input for light approximation passes that use smart tricks to approximate how the light would interact with the objects in the scene. Some examples are screen-space ambient occlusion and screen-space reflection. In the last step, our intermediate attachments are often denoised or blurred slightly to make for a smoother image, and everything is combined together to produce the final image. While these sometimes elaborate techniques can help improve the image, they're still just approximations. On the other hand, ray tracing takes a completely different approach that enables more accurate visuals and simplified visual techniques. Instead of processing meshes one at a time, in ray tracing, we build an acceleration structure that encompasses the whole scene. Once we have that, we can have the GPU trace rays from a point toward a given direction and find intersections. This gives us access to all the contextual scene information. Since ray tracing models ray interactions, it has applications beyond rendering, too. It can be used for audio and physics simulation, collision detection, or AI and pathfinding. Since ray tracing is such a powerful technique, we'd like to bring together ray tracing and rasterization to get the unique benefits of each, and we can do so through a technique called "hybrid rendering." Now let's look at how to create a hybrid rendered frame and some use cases for this technique. 
If we start from our rasterized frame diagram, we can use ray tracing to replace some or all of our light approximation passes. We still rasterize our G-Buffer -- it plays the role of our primary rays -- and then we use ray tracing to more realistically simulate the properties of light by querying into the rest of the scene. We still denoise and do a light composition pass, but our results are much more accurate to the scene data. This frame architecture provides a good foundation to explore a number of hybrid rendering techniques. Let's take a look at how we can encode a frame like this using Metal. We start with filling out our G-Buffer. To do so, we create a render pass and fill out a G-Buffer and set its textures as the attachments for our pass. We make sure our images are stored to memory so the rendered contents are available to subsequent passes. We start our pass, encode our rendering, and end the render pass. Next, we'll add a ray-tracing compute dispatch to this. So after we create the intermediate textures, let's encode our ray-tracing pass. We create the compute pass from the same command buffer and make sure to set the G-Buffer textures as inputs. By default, Metal will track write-read dependencies for you, so you're free to focus on your algorithm without being too concerned by synchronization. Since this is compute, we set our output textures to write the results of our ray-tracing work. We set the PipelineState object for our ray-tracing technique. Each thread in the compute shader will calculate the ray-tracing result for a pixel or region. Finally, we dispatch our 2D grid and end this pass. After this pass is encoded, we can now continue to encode more work such as the light accumulation pass, or we can submit the command buffer now so the GPU starts working on it while we encode the rest of the frame. 
Since we encoded our work in two passes, this requires saving our intermediate render attachments to system memory for the passes to communicate with each other. This works, but on Apple Silicon and iOS devices there's an opportunity to make this even better. On Apple GPUs, the hardware utilizes tile memory to hold our pixel data as we work on it. At the end of the pass, this tile memory is flushed out to system memory and must be reloaded at the beginning of the next pass. Ideally, though, we would have the compute passes work directly on tile memory, avoiding the round trips to system memory. I'm excited to share this year we've added the ability to do that by dispatching ray-tracing work from render pipelines. This allows mixing render and compute via tile shaders in a single pass to leverage on-tile memory for ray tracing. This will reduce bandwidth use and memory consumption, and help your users' devices run cooler. Please make sure to review our 2019 "Modern rendering with Metal" session to learn how to efficiently mix render and compute, so you can apply that for ray tracing from render; as well as this year's "Enhance your app with Metal ray-tracing" session to learn about other improvements coming to Metal ray tracing this year. Now that we know how to encode a hybrid rendering workload, let's review some techniques that can be improved with ray tracing. We'll focus on shadows, ambient occlusion, and reflections. Let's start with shadows. Shadows help convey the proximity of objects to each other within the scene. This is a challenge for rasterization, though, because we lose the context of the scene at the time of shading. Shadow mapping can help supplement this lack of information but requires extra rendering from each light's point of view. This rasterization technique starts by rendering the scene from every light's perspective. This produces a series of depth maps that need to be stored alongside each light's transformation matrix. 
Then we render from the main camera's perspective. To shade each pixel, we need to convert the point to the light's coordinates. We sample the depth coming from the depth map and ultimately compare these depth values to determine if we're in light or shadow for each light source. There are a couple of drawbacks with this technique. First, we'll have to render the scene from the light's perspective for each light. This means processing the scene multiple times. Second, the shadow maps have a predetermined resolution, which means our shadows will be subject to aliasing; and worse, we won't have information for pixels that didn't fit in the image. Let's compare this to ray-traced shadows. To compute shadows with ray tracing, we can simply trace a ray from a point toward the direction of the light source and determine if any object is blocking its path. If we don't find anything, that means the point should take this light source into consideration for shading. In the case an object is blocking the path, we just exclude that light source's contribution in the lighting equation. Notice how this produces a natural shadow corresponding to the silhouette of the occluding object. Even better, we're no longer limited to the information stored in a depth map. We can determine shadows for points outside the light's frustum or camera's view. Let's see how our shadow technique is simplified with ray tracing. We start by rendering from the main camera. Next, we take the acceleration structure and the depth map rendered from the camera's position and feed it into our ray-tracing kernel. We calculate the pixel position, and then we simply trace a ray in the light's direction. From this, we determine if the point is lit or in shadow, depending on whether an intersection was found with an occluding object. This process produces a shadow texture that we can then combine with our render pass results to get the final image. Let's look at how to code a Metal shader to do this. 
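Before the shader walkthrough, the shadow test itself can be illustrated on the CPU. The following is a minimal C++ sketch, not the session's Metal shader: it assumes a hypothetical scene made of spheres in place of an acceleration structure, and the names `Sphere`, `Ray`, `anyHit`, `isLitByPointLight`, and `isLitByDirectionalLight` are invented for illustration. It shows the two ideas described above: a shadow ray only needs to know whether any occluder exists, and the ray's maximum distance stops at the light for point lights but can be infinite for directional lights.

```cpp
#include <cmath>
#include <limits>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical stand-in for scene geometry; a real renderer would query an
// acceleration structure instead of looping over spheres.
struct Sphere { Vec3 center; float radius; };

struct Ray {
    Vec3 origin;
    Vec3 direction;     // assumed to be normalized
    float minDistance;  // small offset to avoid self-intersection
    float maxDistance;  // distance to the light, or infinity
};

// Any-hit query: returns true as soon as one occluder is found in range.
bool anyHit(const Ray& ray, const std::vector<Sphere>& scene) {
    for (const Sphere& s : scene) {
        Vec3 oc = sub(ray.origin, s.center);
        float b = dot(oc, ray.direction);
        float c = dot(oc, oc) - s.radius * s.radius;
        float disc = b * b - c;
        if (disc < 0.0f) continue;  // the ray's line misses this sphere
        float root = std::sqrt(disc);
        float tNear = -b - root;
        float tFar = -b + root;
        bool nearInRange = tNear >= ray.minDistance && tNear <= ray.maxDistance;
        bool farInRange = tFar >= ray.minDistance && tFar <= ray.maxDistance;
        if (nearInRange || farInRange) return true;
    }
    return false;
}

// Point light: trace from the shaded point up to the light's position.
bool isLitByPointLight(Vec3 point, Vec3 lightPosition,
                       const std::vector<Sphere>& scene) {
    Vec3 toLight = sub(lightPosition, point);
    float distance = std::sqrt(dot(toLight, toLight));
    Ray shadowRay{point, scale(toLight, 1.0f / distance), 1e-3f, distance};
    return !anyHit(shadowRay, scene);
}

// Directional light: no light position, so the ray extends to infinity.
// directionToLight is assumed to be normalized.
bool isLitByDirectionalLight(Vec3 point, Vec3 directionToLight,
                             const std::vector<Sphere>& scene) {
    Ray shadowRay{point, directionToLight, 1e-3f,
                  std::numeric_limits<float>::infinity()};
    return !anyHit(shadowRay, scene);
}
```

With a sphere hovering above the origin, a point directly beneath it is in shadow for a light above the sphere, but lit for a light placed between the point and the sphere, because the occluder then lies beyond the ray's maximum distance.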
In our shader code, we start by calculating the position each thread will process from the depth and the thread_id. We create our shadow ray from the calculated position and set it up to trace in the light's direction. For most light types, like point lights, spotlights, and area lights, we set the min and max to trace all the way from the point to the light source. For directional shadows, we may want to set the max to infinity. Additionally, if we decide to implement cone ray tracing for softer shadows, this is a great place to add jitter to our shadowRay. We then create an intersector object. If we find any one intersection, that means we're in shadow, so we configure the intersector to accept any intersection. Finally, we intersect against the acceleration structure. Based on that intersection result, we write whether the point is lit or not, which creates a shadow texture that's more accurate to the scene. When that shadow texture is applied, you can see we get much more realistic shadows and get rid of the aliasing. With ray tracing, determining shadows becomes a very natural technique. We just trace a ray to find if something occludes the light source for that point or not. There is no longer a need to have intermediate depth maps, and we can avoid having multiple extra render passes for each light. This technique is easy to implement into a deferred or forward renderer, as it only depends on depth. And finally, it allows for custom intersection functions for translucent materials. Next, let's take a look at ambient occlusion. Conceptually, a point surrounded by geometry is less likely to receive a large amount of ambient light. Ambient occlusion consists in muting the ambient light received at a point based on how busy its neighborhood is, which naturally darkens crevices, giving the final image more depth. 
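To make the attenuation idea concrete, here is a minimal CPU-side C++ sketch of estimating how open a point's neighborhood is by tracing short, cosine-weighted rays over the hemisphere around its normal, which is the ray-traced approach the session goes on to describe. It is an illustration under stated assumptions rather than the session's shader: spheres stand in for the acceleration structure, and `cosineWeightedDirection`, `ambientVisibility`, the sample count, and the seed are all hypothetical names and parameters.

```cpp
#include <cmath>
#include <random>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static Vec3 normalize(Vec3 a) { return scale(a, 1.0f / std::sqrt(dot(a, a))); }

// Hypothetical stand-in for scene geometry.
struct Sphere { Vec3 center; float radius; };

// True if any sphere is hit at a distance in [tMin, tMax]; dir is normalized.
static bool occluded(Vec3 origin, Vec3 dir, float tMin, float tMax,
                     const std::vector<Sphere>& scene) {
    for (const Sphere& s : scene) {
        Vec3 oc = sub(origin, s.center);
        float b = dot(oc, dir);
        float disc = b * b - (dot(oc, oc) - s.radius * s.radius);
        if (disc < 0.0f) continue;
        float root = std::sqrt(disc);
        float tNear = -b - root;
        float tFar = -b + root;
        if ((tNear >= tMin && tNear <= tMax) || (tFar >= tMin && tFar <= tMax)) {
            return true;
        }
    }
    return false;
}

// Maps two uniform numbers in [0, 1) to a unit direction around `normal`,
// distributed proportionally to the cosine of the angle to the normal.
Vec3 cosineWeightedDirection(Vec3 normal, float u1, float u2) {
    const float kPi = 3.14159265358979f;
    float r = std::sqrt(u1);
    float phi = 2.0f * kPi * u2;
    // Build an orthonormal basis (tangent, bitangent, normal).
    Vec3 helper = std::fabs(normal.x) > 0.9f ? Vec3{0.0f, 1.0f, 0.0f}
                                             : Vec3{1.0f, 0.0f, 0.0f};
    Vec3 tangent = normalize(cross(helper, normal));
    Vec3 bitangent = cross(normal, tangent);
    float lx = r * std::cos(phi);
    float ly = r * std::sin(phi);
    float lz = std::sqrt(std::fmax(0.0f, 1.0f - u1));
    return add(add(scale(tangent, lx), scale(bitangent, ly)), scale(normal, lz));
}

// Returns 1.0 for a fully open neighborhood and approaches 0.0 as more of the
// short rays are blocked. maxDistance limits the search to nearby geometry.
float ambientVisibility(Vec3 point, Vec3 normal, const std::vector<Sphere>& scene,
                        int sampleCount, float maxDistance, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    int hits = 0;
    for (int i = 0; i < sampleCount; ++i) {
        float u1 = uniform(rng);
        float u2 = uniform(rng);
        Vec3 dir = cosineWeightedDirection(normal, u1, u2);
        if (occluded(point, dir, 1e-3f, maxDistance, scene)) ++hits;
    }
    return 1.0f - static_cast<float>(hits) / static_cast<float>(sampleCount);
}
```

The returned value can be multiplied into the ambient term. Keeping `maxDistance` small means distant geometry does not darken the point, which matches the "small neighborhood" idea.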
Rasterization techniques to achieve this depend on sampling the depth and normals in the neighborhood of the point, to determine if there are objects surrounding and potentially occluding it. Based on how many nearby objects are found, we calculate an attenuation factor to mute the ambient light and create a texture to apply to our image. Relying on screen-space information like depth buffer and surface normals, however, is missing information for nonvisible occluders and objects outside the border of the image. With ray tracing, instead of relying on screen-space information, we can rely on actual geometric data of our scene. The idea is for every pixel to shade, we generate random rays in a hemisphere and search for intersections against objects. If any intersections are found, we take it into consideration for our ambient occlusion factor. We start again with the acceleration structure. For this technique, we require normal data as well as depth, so we collect those in our G-Buffer pass. The depth and normals are used to generate the random rays in the hemisphere. Next, we trace rays and calculate the attenuation factor. This produces an image where crevices are naturally darkened, creating the effect. Let's take a look at a Metal shader for ambient occlusion. First, we generate the random rays. In this case, we take a cosineWeightedRay along the normal in each thread. We set the max_distance to a small number, as we're only interested in a small neighborhood. Next, we create our intersector and intersect the acceleration structure. Depending on the result, we accumulate into our attenuation factor. Here's a side-by-side comparison. And we can immediately see how much better the ray-traced approach looks. I want to highlight a few places that really show the limitations of a screen-space effect. Here is an example where the neighborhood is misrepresented due to limited screen-space information. 
This is because the actual geometry is almost perpendicular to the camera and, therefore, not in the depth buffer at this angle. The same problem occurs across the image, in particular, under the moped. From this angle, the bottom of the moped is missing from the depth buffer. So the screen-space technique completely misses the attenuation. The ray-traced version, on the other hand, correctly discovers the intersections against the bottom of the moped for the floor pixels. And here's a great example of the limitations around the screen border. The occluding geometry is offscreen, so its contribution is lost in the screen-space technique, but accounted for in ray tracing. As we can see, hybrid rendering provides a significant quality improvement by using the actual geometry of the scene, freeing the technique from limitations in screen-space information. And finally, let's take a look at reflections. Reflections have traditionally been very difficult for rasterization. Reflection probes is a technique that works well but is limited in resolution, requires filtering, and struggles with dynamic geometry. Screen-space reflection techniques are limited by screen-space information. Reflection probes are a solution that requires strategically placing cameras along the entire scene to capture surrounding color information. To use reflection probes, cube maps are captured from different locations in the scene. This is essentially a rendering of the scene in six directions from the same point. When a pixel is shaded, you calculate the relation to the probes and sample the cube maps to produce the reflected shading. For realistic results, usually many probes need to be scattered across the scene. And as dynamic objects move across the scene, shaders need to sample from more than one cube map and manually interpolate reflected colors. The cube maps also need to be prefiltered to accurately represent irradiance and are limited in resolution. 
Another rasterization technique, screen-space reflection, avoids some of these problems by basing its reflections on pixels already on the framebuffer. The fragment shader uses the normals to incrementally march outwards and check the depth map for potential nearby objects. If we find something, we sample the color directly from the framebuffer and shade it onto the output image. It does suffer, however, from the screen-space limitations we discussed earlier. Notice in this moped example how only part of the surface can get an accurate reflection, corresponding to the floor tiles present in the framebuffer. The rest of the scene is missing. Worse, the lower portion behind the fender, marked in yellow, is missing information; we have no way of knowing what the surface facing away from the camera would look like. The ray marching can also get computationally expensive. Ray-traced reflections, however, help us overcome both sets of problems, as we can rely on the true scene information in the acceleration structure. Let's take a look at how a perfect mirror would work. First, we take the incident ray from the camera's position to the point. Then, we reflect this ray about the normal associated with the point. This provides us with a direction we can trace a ray towards, and find any reflected objects. For this, we provide our reflection ray-tracing kernel with the normals and depth of the G-Buffer. This ray-tracing kernel calculates the view vector, from the camera to each point, reflects this vector, and traces a ray in that direction from the point. Finally, for accurate reflections, we can shade the intersection found directly in the ray-tracing kernel. Let's take a look at coding this shader. Once again, we start from the point's depth, and reconstruct its position. This time, we want the position to be in world space. So in our calculatePosition function, we'll need to multiply by the inverse of the view matrix. 
Then, we calculate our reflected incident vector over the normal and create a ray in that direction. Next, we create our intersector and trace our reflectedRay. If we hit an object, we now shade that point to produce the reflection. If the intersection missed all objects, we can just sample a skybox and return its color to simulate a reflection that's showing the sky. Note that the shading is performed directly in the compute kernel for this technique. Let's compare reflection probes to ray-traced reflections. The image on the right used hybrid rendering, and we can see the details of the floor tiles much more clearly. The buildings are present, and we can even see shadows reflected on the front panel of the moped. Reflections are a natural fit for ray tracing. It nicely handles mirror-like reflections and rough reflections. Those can be achieved by tracing multiple rays along a cone and filtering the results. Because they rely on perfect information coming from the acceleration structure, ray-traced reflections are free from screen-space artifacts and can handle both static and dynamic geometry in the scene. Now, one important detail: we mentioned that for reflections, we need to shade the point directly in the compute kernel. Some techniques like this one or global illumination require accessing vertex data and Metal resources from the compute kernel directly. For these cases we need to make sure the GPU has access to the data that it needs to apply our shading equations. This is achieved with a bindless binding model which in Metal is represented as argument buffers. Please make sure to check out this year's "Bindless rendering in Metal" talk for more detail. We just saw how hybrid rendering can be put into practice with several different techniques. This leads to more natural algorithms that also have the advantage of producing more accurate results. 
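The mirror-reflection step from this section can be sketched on the CPU as follows. This is an illustrative C++ example under stated assumptions, not the session's kernel: spheres with a flat `albedo` color stand in for the acceleration structure and for real shading, and `reflect`, `nearestHit`, and `reflectedColor` are names invented here. It computes the view vector from the camera to the point, reflects it about the normal, finds the closest intersection, and falls back to a sky color on a miss.

```cpp
#include <cmath>
#include <limits>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 normalize(Vec3 a) { return scale(a, 1.0f / std::sqrt(dot(a, a))); }

// Hypothetical scene object; `albedo` stands in for a full shading result.
struct Sphere { Vec3 center; float radius; Vec3 albedo; };

// Reflects `incident` about a unit-length `normal`: I - 2 (N . I) N.
Vec3 reflect(Vec3 incident, Vec3 normal) {
    return sub(incident, scale(normal, 2.0f * dot(normal, incident)));
}

// Returns the index of the closest sphere hit along the ray, or -1 on a miss.
// `dir` is assumed to be normalized.
int nearestHit(Vec3 origin, Vec3 dir, const std::vector<Sphere>& scene) {
    const float tMin = 1e-3f;  // offset to avoid hitting the starting surface
    float bestT = std::numeric_limits<float>::infinity();
    int best = -1;
    for (std::size_t i = 0; i < scene.size(); ++i) {
        Vec3 oc = sub(origin, scene[i].center);
        float b = dot(oc, dir);
        float disc = b * b - (dot(oc, oc) - scene[i].radius * scene[i].radius);
        if (disc < 0.0f) continue;
        float root = std::sqrt(disc);
        float t = -b - root;            // nearer intersection first
        if (t < tMin) t = -b + root;    // origin inside the sphere: use far one
        if (t >= tMin && t < bestT) {
            bestT = t;
            best = static_cast<int>(i);
        }
    }
    return best;
}

// Color seen in a perfect mirror at `point`, viewed from `cameraPosition`.
Vec3 reflectedColor(Vec3 cameraPosition, Vec3 point, Vec3 normal,
                    const std::vector<Sphere>& scene, Vec3 skyColor) {
    Vec3 view = normalize(sub(point, cameraPosition));  // camera -> point
    Vec3 direction = reflect(view, normal);
    int hit = nearestHit(point, direction, scene);
    // On a hit, "shade" the intersection; on a miss, show the sky instead.
    return hit >= 0 ? scene[static_cast<std::size_t>(hit)].albedo : skyColor;
}
```

In a real kernel, the hit branch is where material and vertex data would be fetched, which is why the session points to argument buffers for this technique.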
In some cases, when we compare to the traditional rasterization techniques, we see that we can remove render passes and save memory and bandwidth. With the addition of ray tracing from render, we can even keep our entire work on chip. Ray-tracing adoption is a big task, and we have excellent new tooling to assist you in the process of bringing these techniques to your engine. This year, we're introducing tools that enable you to capture ray-tracing work, inspect acceleration structures, and inspect visible and intersection functions. Now David will give us a tour of these new tools. David Núñez Rubio: Thanks, Ali. My name is David Núñez Rubio, and I am a GPU software engineer. Last year, we introduced ray-tracing support in Metal. However, developing complex applications can be challenging. Fortunately, Metal Debugger is here to help you. This year, we introduced ray-tracing support in Metal Debugger. Thanks to the adoption of hybrid rendering, our demo is looking better than ever. Ray-traced soft shadows, reflections, ambient occlusion; the results are amazing. During development of the demo, we hit some issues. This is how tools can help you resolve these problems. In this early version of the demo, ray-traced shadows have already been implemented. But if you look carefully, you'll notice missing shadows from the tree leaves on the ground. It is more obvious if we compare with the reference version. See reference versus ray traced. Let's jump into Xcode and take a capture to see how the tools can help us debug this issue. We need to press the Metal button and click on Capture. Since this is a static problem, we just need a single frame. In the debugger, API calls are organized on the left side in the debug navigator. Let's unfold the offscreen command buffer to look for our shadow encoding. I have labeled my compute command encoder as "Raytrace Shadows." 
It is a good practice to label your Metal objects so you can easily find them in the Metal Debugger. The thumbnail also gives us a hint that, indeed, this is the encoder we are looking for. We can now click on the dispatchThreadgroups API call to show bound resources. This is a list of all the objects associated with our current kernel dispatch. And here, we can see an acceleration structure, which we have conveniently labeled as well. Our kernel uses an acceleration structure to cast rays. This is commonly implemented as a bounding volume hierarchy or BVH, which is a tree-like data structure representing the 3D world that rays will intersect. Now, double-click to open the acceleration structure viewer.
This is a great new tool built into the Metal Debugger. Let me give you an overview of how it is organized. On the right side, we have the 3D view where we have a ray traced visualization of our 3D scene, including any custom geometry or intersection functions. This works great with custom geometry such as hair or when using alpha testing. You can use familiar controls to move the camera and look around. And here's a tip: press Option key while scrolling to zoom in and out. We have built some great visualization tools to better understand our scene. Let's click on the highlighted menu to see the different modes available. For instance, we can visualize bounding volume traversals. This is a heat map showing how many nodes a single ray will need to traverse before hitting a surface. Darker colors mean more nodes need to be traversed and a slower intersection test. We can also color-code our scene based on acceleration structures... geometries... instances... or intersection functions.
Now that we are a bit more familiar with the tool, we can go back to our original problem. Thanks to the 3D view, we have confirmed that our geometry is there. So there must be something else. On the left side, there is the navigator area. Here we can see our top- and bottom-level acceleration structures. We can unfold any acceleration structures to see the list of geometries it is built from. We can unfold again to see their properties such as opacity or primitive count. We can also see the list of instances of this acceleration structure. Let's click on the tree leaves to reveal their instance in the navigator and inspect its properties. The matrix looks correct, and there are no flags set, but it seems that the mask is missing something. In this demo, we are using intersection masks. We use the lowest bit of the mask to flag objects casting shadows. Our intersector then will test this mask using a bitwise AND operation and reject the intersection if it fails. We can visualize this behavior directly in the 3D view. We need to open the intersector hints menu. Here we can configure ray traversal options for visualization. We can change culling operations, disable custom intersections, or change the intersector’s mask. By default, it will intersect everything. Let's change it to the value that we are using for shadows. This will show us an exact visualization of our scene when using our shadow mask. And indeed, we have confirmed that our tree leaves are now missing. Once we have identified the problem, we need to go back to our source and make sure we are setting the right mask value. This is how shadows looked before. And this is how they look after fixing the mask value. This is an example of the workflows that can help you debug your ray-tracing applications. If you want to learn more about tools, make sure you check out this year's "Discover Metal debugging, profiling, and asset creation tools" WWDC session. 
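The mask test behind the missing leaf shadows can be written down in a few lines. This C++ sketch only illustrates the bitwise AND rule described above. The constant names and the specific mask values for the leaves are hypothetical, since the session does not state what the incorrect value was.

```cpp
#include <cstdint>

// The lowest bit flags shadow casters, as in the session's demo.
constexpr std::uint32_t kCastsShadowBit = 0x1u;

// Default behavior described in the session: intersect everything.
constexpr std::uint32_t kIntersectEverything = 0xFFFFFFFFu;

// An instance is considered only if its mask shares at least one set bit
// with the mask used for the ray; otherwise the intersection is rejected.
bool acceptsInstance(std::uint32_t instanceMask, std::uint32_t rayMask) {
    return (instanceMask & rayMask) != 0u;
}

// Hypothetical values for illustration: a mask without the shadow bit set,
// and a corrected one that includes it.
constexpr std::uint32_t kLeavesMaskBeforeFix = 0x2u;
constexpr std::uint32_t kLeavesMaskAfterFix = 0x2u | kCastsShadowBit;
```

With these values, the leaves are visible to a ray that intersects everything but are skipped by shadow rays until the shadow bit is added, which is the kind of symptom the viewer's intersector hints make visible.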
In this session, we have reviewed how ray tracing can elevate your visuals. Hybrid rendering is the combination of rasterization and ray tracing. This allows replacing light approximation techniques with more accurate ones that also happen to be simpler. We also saw the new tools to aid you in the process of adopting ray tracing in your engine. We have only scratched the surface on what new possibilities are available to you by combining rasterization and ray tracing. We can't wait to see how you put these technologies in practice to develop the new innovative graphics techniques of the future. Thank you and enjoy the rest of WWDC. ♪