Metal

Vulkan and Metal (some observations)

Since the Vulkan spec was released a few days ago, I think it might be interesting to look at how it compares to what Apple gives us with Metal. First of all, a disclaimer: this is mostly from an academic standpoint. I am interested in comparing the APIs, the features they provide and their relative merits. I hope that other people here who are curious about GPUs, APIs and API design can offer their thoughts on the matter.
Some folks (me included) were quite disappointed to learn that Apple is not jumping on the Vulkan bandwagon. After reading the spec, I think I understand why, and I am starting to believe this might be a very reasonable move by Apple. Here are some thoughts.
I think it should be fairly clear that Vulkan offers higher performance potential than Metal. Metal still does a lot of hand-holding and behind-the-scenes management for you, while with Vulkan you are responsible for literally everything. And man, they were NOT kidding when they said that the API is explicit. It's actually quite ridiculous how difficult and detailed the API is. Of course, the nice thing is that you can optimise resource usage very precisely for the specifics of your engine, and you get quite precise performance guarantees. On the other hand, you need to make sure that the data you use for a particular pass is in device memory, which means juggling data around, recreating resources, breaking down your rendering commands and doing all kinds of weird memory dances. In fact, I can't imagine that many people will use Vulkan directly; instead we will see a bunch of wrapper libraries that abstract the tedious tasks like manual memory management and operation synchronisation.
At the same time, and that is the funny thing, Vulkan does not seem that much more powerful to me. Yes, it supports stuff like geometry and tessellation shaders, and it has batched binding updates, sparse resources, command buffer reuse and atomic texture operations. But all these things can be trivially added to Metal (and I'm sure Apple is working on that already). The resource binding model of Vulkan is more efficient, that's for sure, but it is certainly not more powerful: it does not allow you to build more complex shader inputs than what Metal already offers.
The explicit nature of Vulkan might offer additional optimisation opportunities to applications seeking to squeeze 100% out of the hardware, but at the extreme expense of usability. Metal is a more casual API, which is very convenient to use and still offers very good performance (and performance guarantees) that will satisfy an overwhelming majority of applications, both for graphics and compute. With some extensions, it will basically have feature parity with Vulkan, and it can easily borrow some of Vulkan's optimisations without sacrificing ease of use (e.g. batched binding updates, reusable command buffers and synchronisation primitives). And let's be honest here: applications that really need the explicit control Vulkan provides are high-end game titles, which are not targeted at the Apple platform anyway (because they require really beefy GPUs, which Apple simply does not ship in their machines).
I think Apple might have lost its initial interest in Vulkan after they saw what it was shaping up to become. They were interested in having a convenient and efficient replacement for the erratic and difficult-to-maintain OpenGL. Vulkan is certainly efficient, but I wouldn't call it 'convenient'. It's not an API that would draw developers (especially small-time developers) away from using OpenGL or encourage them to make more titles for OS X. Metal, in contrast, hits the spot exactly.
I would still like to see Vulkan on OS X and iOS at some point (to make it easier for devs to port from other platforms), and from what I gathered, it should actually be possible to implement a Vulkan wrapper on top of Metal (which will of course lack features such as sparse resources, tessellation shaders etc., but that is still perfectly legal according to the Vulkan spec). Personally, however, I'd be much more interested in a Metal implementation on top of Vulkan to use on Windows/Linux.

Metal

Posted

by

jcookie

Metal default library not found

I am using the Metal API in a third-party static library (a ".a" file). When I add my ***.metal file to the library project target and run an example project that depends on this Metal static library, it fails at this line:
id <MTLLibrary> _flibrary = [_mDevice newDefaultLibrary]; // fails
Error: /BuildRoot/Library/Caches/com.apple.xbs/Sources/Metal/Metal-56.7/Framework/MTLLibrary.mm:1842: failed assertion `Metal default library not found'
But when I add my ***.metal file to the example project instead, it works. My question: I do not want to expose my ***.metal file in the example project. What method could I use to hide the Metal file? Please show me, thanks.

Metal

Posted

by

erickingxu

Will there be a Swift-based shading language for the Metal API?

I would love to use just one language for everything. Swift enables scripting as well. I would like to see it as the Metal shading language too, apart from using it as a shell. The C++-based Metal shading language is great, but I don't want to mix it with Swift.

Metal

Posted

by

neckTwi

Swift Package with Metal

Hi, I've got a Swift framework with a bunch of Metal files. Currently, to use the Swift package, users have to manually include a separately provided Metal library in their bundle.
First question: Is there a way to make a Metal library target in a Swift package and just include the .metal files (without a binary asset)?
Second question: If not, Swift 5.3 has resource support. How would you recommend bundling a Metal library in a Swift package?

Posted

by

Heestand

Metal Perspective Matrix issue

I'm new to Metal and following Janie Clayton's book. I ran into a problem creating a proper perspective projection matrix. I'm hoping someone with matrix experience will see this and the issue will jump out.
The matrix structures:
Swift:
struct Matrix4x4 {
    var X: SIMD4<Float>
    var Y: SIMD4<Float>
    var Z: SIMD4<Float>
    var W: SIMD4<Float>
}
Metal:
float4x4 projectionMatrix;
The Swift code that generates the projection matrix:
static func perspectiveProjection(_ aspect: Float32, fieldOfView: Float32, near: Float32, far: Float32) -> Matrix4x4 {
    var mat = Matrix4x4()
    let zRange = far - near
    let fovRadians = fieldOfView * Float32(Double.pi / 180.0)
    let yScale = 1 / tan(fovRadians * 0.5)
    let xScale = yScale / aspect
    let zScale = -(far + near) / zRange
    let wzScale = -2 * far * near / zRange
    mat.X.x = xScale
    mat.Y.y = yScale
    mat.Z.z = zScale
    mat.Z.w = -1
    mat.W.z = wzScale
    mat.W.w = 0
    return mat
}
How the shader applies the projection matrix:
outVertex.position = uniforms.projectionMatrix * props.modelMatrix * vert[vid].position;
The result here is just the clear color. It seems that the issue is with wzScale. Hard-coding that to zero and mat.W.w to 1.0 allows me to at least see my scene, skewed. Messing around with those values, it seems like the objects are crushed and pushed through the camera, ending up behind it. I'm basically dead in the water here, typing word for word what is in the book. It's pretty darned frustrating. I'm just learning my way around matrices.

Metal

Posted

by

eblu
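
For the perspective matrix question above, one thing that may be worth checking is the depth convention. The zScale and wzScale terms in the post are the ones commonly used for OpenGL, where normalized depth runs from -1 to 1, whereas Metal's normalized device coordinates use a depth range of 0 to 1. The sketch below is only an illustration under that assumption, not a confirmed diagnosis of the blank scene. It uses plain arrays instead of SIMD types so it runs anywhere, and the names `perspectiveRightHand`, `transform` and `ndcDepth` are hypothetical helpers, not part of the post or of any Apple API.

```swift
import Foundation

// Column-major 4x4 matrix stored as four columns of four Floats.
// This mirrors the X, Y, Z, W columns in the post (an assumption about its layout).
typealias Column = [Float]

/// Right-handed perspective projection (camera looking down -z) that maps
/// view-space z in [-near, -far] to normalized depth in [0, 1].
func perspectiveRightHand(fovDegrees: Float, aspect: Float, near: Float, far: Float) -> [Column] {
    let fovRadians = fovDegrees * Float.pi / 180
    let yScale = 1 / tan(fovRadians * 0.5)
    let xScale = yScale / aspect
    let zScale = far / (near - far)   // differs from the -(far + near) / (far - near) term in the post
    let wzScale = near * zScale       // differs from the -2 * far * near / (far - near) term in the post
    return [
        [xScale, 0, 0, 0],
        [0, yScale, 0, 0],
        [0, 0, zScale, -1],
        [0, 0, wzScale, 0],
    ]
}

/// Multiplies a column-major matrix by a 4-component column vector.
func transform(_ m: [Column], _ v: [Float]) -> [Float] {
    var out = [Float](repeating: 0, count: 4)
    for col in 0..<4 {
        for row in 0..<4 {
            out[row] += m[col][row] * v[col]
        }
    }
    return out
}

/// Normalized depth (clip z divided by clip w) of a point on the view axis.
func ndcDepth(_ m: [Column], viewZ: Float) -> Float {
    let clip = transform(m, [0, 0, viewZ, 1])
    return clip[2] / clip[3]
}

let projection = perspectiveRightHand(fovDegrees: 60, aspect: 16.0 / 9.0, near: 0.1, far: 100)
print(ndcDepth(projection, viewZ: -0.1))  // near plane: close to 0
print(ndcDepth(projection, viewZ: -100))  // far plane: close to 1
```

With the OpenGL-style terms from the post, a point on the near plane would land at a normalized depth of -1 rather than 0, and Metal clips geometry whose normalized depth is below 0. Whether that accounts for the result described in the post depends on the rest of the pipeline, which is not shown.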

kIOGPUCommandBufferCallbackErrorBackgroundExecutionNotPermitted

When I render with Metal and the application switches to the background, Metal rendering fails on iOS 15. What can I do?
Error: Execution of the command buffer was aborted due to an error during execution. Insufficient Permission (to submit GPU work from background) (00000006:kIOGPUCommandBufferCallbackErrorBackgroundExecutionNotPermitted)

Posted

by

ShiQuanYang

How to fully use parallel computing on the CPU and GPU of the M1 Max

My project is based on Python 3.8 and 3.9 and contains some C and C++ source. How can I do parallel computing on the CPU and GPU of the M1 Max? I bought an M1 Max Mac for its strong GPU to do quantitative finance, where speed is extremely important. Unfortunately, CUDA is not compatible with the Mac. Please show me how to do it, thanks.
1. Can Accelerate (for the CPU) and Metal (for the GPU) speed up any source by building it like this?
Step 1: Download the source from GitHub.
Step 2: Create a file named "site.cfg" in the source directory and add this content:
[accelerate]
libraries=Metal, Acelerate, vecLib
Step 3: In Terminal: NPY_LAPACK_Order=accelerate python3 setup.py build
Step 4: pip3 install . or python3 setup.py install? (I am not sure which method to use.)
2. How compatible is this method? I need to speed up NumPy, pandas and even an open-source project such as https://github.com/microsoft/qlib
3. Just show me the code.
4. When compiling the C and C++ source, a lot of errors were reported. Which gcc and g++ should I choose? The default gcc installed by brew is 4.2.1, which does not work. I even tried downloading gcc from the official ARM website, and it still does not work. Please give me a hint. Thanks so much, this is urgent.

Posted

by

jefftang

Some feature requests for Metal

Hello guys. With the release of the M1 Pro and M1 Max in particular, the Mac has become a platform that could become very interesting for games in the future. However, since some features are still missing in Metal, it could be problematic for some developers to port their games to Metal. Especially with Unreal Engine 5 you can already see a tendency in this direction, since e.g. Nanite and Lumen are unfortunately not available on the Mac. As a Vulkan developer, I wanted to ask about some features that are not yet available in Metal. These features are very interesting if you want to write a GPU-driven renderer for modern game engines. Furthermore, these features could be used to emulate D3D12 on the Mac via MoltenVK, which would result in more games being available on the Mac.
Buffer device address: This feature allows the application to query a 64-bit buffer device address value for a buffer. It is very useful for D3D12 emulation and for compatibility with Vulkan, e.g. to implement ray tracing on MoltenVK.
DrawIndirectCount: This feature allows an application to source the number of draws for indirect draw calls from a buffer. It is also very useful in many GPU-driven situations.
Only 500000 resources per argument buffer: Metal has a limit of 500000 resources per argument buffer. To be equivalent to D3D12 Resource Binding Tier 2, you would need 1 million. This is also very important, as so many DirectX 12 game engines could then be ported to Metal more easily.
Mesh shader / Task shader: Two interesting new shader stages to optimize the rendering pipeline.
Are there any plans to implement these features in the future? Is there a roadmap for Metal? Is there a website where I can suggest features to the Metal developers? I hope to see at least the first 3 features in Metal in the future, and I think that many developers feel the same way.
Best regards, Marlon

Posted

by

zmxrlxn

Why does my Core ML custom layer still run on the CPU even though I implemented the Metal `encode` function?

I implemented a custom PyTorch layer on both CPU and GPU following Hollemans' amazing blog post (https://machinethink.net/blog/coreml-custom-layers). The CPU version works well, but when I implemented this op on the GPU, the "encode" function is never called. It always runs on the CPU. I have checked the coremltools.convert() options and set compute_units=coremltools.ComputeUnit.CPU_AND_GPU, but it still does not work. This problem is also mentioned in https://stackoverflow.com/questions/51019600/why-i-enabled-metal-api-but-my-coreml-custom-layer-still-run-on-cpu and https://developer.apple.com/forums/thread/695640. Any ideas that could help with this would be appreciated.
System information:
macOS: 11.6.1 Big Sur
Xcode: 12.5.1
coremltools: 5.1.0
Test device: iPhone 11

Posted

by

stx-000

Metal + UIKit Timing Issues

Hi! I am currently finalizing a new app that uses Metal to render a 3D scene and a UIKit overlay to display controls for interacting with objects in the scene. The render loop is driven via a CADisplayLink with its preferredFramesPerSecond set to 60. I have recently noticed an issue where the app reports a steady 60 fps frame rate in the Xcode debug navigator, but still felt sluggish on the device. This feeling was only present on devices with ProMotion and often started after interactions with the UIKit overlay. I started investigating by using Metal System Trace and quickly found an explanation for the sluggish feeling: occasionally, the app would switch from its nominal 16ms-16ms-16ms cadence to 12ms-20ms-12ms, thus still averaging 60 fps, but with inconsistent frame times. Pictures of the timeline can be found here. I have tried setting the CAMetalLayer's presentsWithTransaction to true, waiting for the command buffer to be scheduled and then presenting the drawable, but, unfortunately, the problem persists. If anybody can think of a potential reason / solution for this, I would be very thankful.

Posted

by

vortycon
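
The frame-time observation in the post above can be illustrated with a little arithmetic. The sketch below is only an illustration: the helper names `averageFPS` and `maxDeviation` are hypothetical, and the input values are the rounded millisecond figures quoted in the post, repeated to form a short sequence (an assumption, since the real traces are not shown). It shows how an average frame rate can stay the same while individual frame times swing.

```swift
/// Average frames per second for a list of frame times given in milliseconds.
func averageFPS(_ frameTimesMs: [Double]) -> Double {
    let mean = frameTimesMs.reduce(0, +) / Double(frameTimesMs.count)
    return 1000 / mean
}

/// Largest distance of any single frame time from the mean, in milliseconds.
func maxDeviation(_ frameTimesMs: [Double]) -> Double {
    let mean = frameTimesMs.reduce(0, +) / Double(frameTimesMs.count)
    return frameTimesMs.map { abs($0 - mean) }.max() ?? 0
}

// Rounded figures from the post; the repetition pattern is assumed.
let steady: [Double] = [16, 16, 16, 16]
let uneven: [Double] = [12, 20, 12, 20]

print(averageFPS(steady), maxDeviation(steady))  // 62.5 0.0
print(averageFPS(uneven), maxDeviation(uneven))  // 62.5 4.0
```

A frame-rate counter that only reports the average would show the same number for both sequences, which is consistent with the debug navigator reporting a steady rate while the app feels sluggish.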

AMD RX GPU Support

Hello, I'm not sure this is the correct place, but I would like an update or an answer about support for the AMD Radeon RX 6700 XT on macOS. The newer AMD Radeon RX 6600 XT was quickly supported in macOS, yet the RX 6700 XT has been out for longer and is still not supported. In addition, the RX 6800, RX 6800 XT and RX 6900 XT are all supported as well.
The RX 6600 XT was released in August 2021 and was supported in macOS Monterey 12.1.
The RX 6700 XT was released in March 2021 and is still not supported, even in the macOS Monterey 12.3 beta.
Are there any plans to support this GPU going forward? The ongoing shortage makes it harder to find GPUs in this range.

Metal

Posted

by

AMDFan

Metal Ray Tracing, RealityKit, SwiftUI Problems

First of all, I apologize for such a general question, but my code is far too long to post. However, I have narrowed down my problems and am seeking advice for any solutions/workarounds that may help me. I recently updated my physics simulation code from using Metal Performance Shaders for ray tracing to using the (fairly) new Metal ray tracing routines directly.
A few notes on my program:
I perform the ray tracing entirely in a separate thread using compute kernels; there is no rendering based on the ray tracing results. The compute kernels are called repeatedly and each ends with a waitUntilCompleted() command.
The main thread is running a SwiftUI interface with some renderings that use RealityKit to display the scene (i.e., the vertices), and the rays that traverse the scene using Metal ray tracing.
This is purely a Mac program and has no iOS support.
So, the problem is that there seems to be some conflict between RealityKit rendering and the ray-tracing compute kernels, where I will get a "GPU Soft Fault" when I run the "intersect" command in Metal. After this soft-fault error, my ray tracing results are completely bogus. I have found a workaround, which is to refit my acceleration structures semi-regularly. However, this is inelegant and probably not sustainable. The problem gets worse the more I render in my RealityKit display UI (rendered as a SwiftUI view), so I am now confident that the problem is some "collision" between the GPU resources needed by my program and RealityKit.
I have not been able to find any information on what a "GPU Soft Fault" actually is, although I suspect it is a memory violation. I suspect that I need to use fences to cordon off my ray tracing compute kernel from other things that use Metal (i.e., RealityKit); however, I am just not sure if this is the case or how to accomplish it. Again, I apologize for the vague question, but I am really stuck.
I have confirmed that every Metal buffer I pass to my compute kernel is correct. I did this by making my object a simple cube and having only one instance of this cube. Something happens to either corrupt the acceleration structure data or to make it inaccessible at certain times when RealityKit needs to use the GPU. Any advice would be appreciated. I have not submitted a bug report, since I am still not sure if this is just my lack of advanced knowledge of multiple actors requiring GPU use or if there is something more serious here.
Thanks in advance, -Matt

Metal

Posted

by

rad.bobby

Unreal Engine / Path tracer

I am an architect in Switzerland. In our office we have used only Apple computers for 20 years, and we love them! For our visualisations we also use Twinmotion and Unreal Engine. One key feature, the path tracer, is not supported on macOS. Some people say that's because Apple won't support hardware GPU acceleration for it. Now I'm wondering if it's on your roadmap? Or are you in discussion with the developers from Epic? It would be great to have this feature :)
Best, Kevin

Metal

Posted

by

KevinRub

Blank scene with Xcode connected and "Metal -> API Validation" turned off.

I'm not sure which combination of iOS/Xcode/macOS is causing this issue, but all of a sudden, when I try to run our SceneKit app with the "Scheme -> Diagnostics -> Metal -> API Validation" setting turned off, the scene won't render and the console is full of the following errors:
Execution of the command buffer was aborted due to an error during execution. Invalid Resource (00000009:kIOGPUCommandBufferCallbackErrorInvalidResource)
[SceneKit] Error: Main command buffer execution failed with status 5, error: Error Domain=MTLCommandBufferErrorDomain Code=9 "Invalid Resource (00000009:kIOGPUCommandBufferCallbackErrorInvalidResource)" )
If you run the app outside of Xcode it's fine, and enabling the "API Validation" option also stops the issue. One of my schemes has had this option disabled since the project began and never had an issue before. I'm just throwing this out there in case someone else has spent hours of their life trying to figure out why this is not working for them. You can also reproduce it by creating a new SceneKit project and turning that diagnostic option off: the app won't render anything.

Posted

by

markdaws

Introducing Accelerated PyTorch Training on Mac in v1.12

https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac
Does this feature support AMD GPUs through Metal, or only the M1? Is Apple Metal support in the v1.12 nightly build available only when building from source?

Metal

Posted

by

dbl001

Apple's Metal simulator sample fails with SIGABRT on MTLTextureUsageShaderRead

I downloaded Apple's Metal simulator example from https://developer.apple.com/documentation/metal/supporting_simulator_in_a_metal_app and tried simulating an iPhone 11 with 11.5 and 12 using iOS, and an iPad 9th gen. I get the error below when running the simulator on all of the above devices.
-[MTLDebugRenderCommandEncoder validateCommonDrawErrors:]:5252: failed assertion `Draw Errors Validation Fragment Function(blendFragmentShader): Shader reads texture (prevColor[1]) whose usage (0x04) doesn't specify MTLTextureUsageShaderRead (0x01)
The error occurs in drawBox:rendererEncoder at the code below when boxIndex = 1:
for (MTKSubmesh *submesh in _meshes[boxIndex].submeshes)
{
    [renderEncoder drawIndexedPrimitives:submesh.primitiveType
                              indexCount:submesh.indexCount
                               indexType:submesh.indexType
                             indexBuffer:submesh.indexBuffer.buffer
                       indexBufferOffset:submesh.indexBuffer.offset];
}
I'm using a MacBook Pro, macOS 12.2 and Xcode 13.3. I hoped that this example would be an easy way to get experience with Metal, but I don't yet have the experience I need to work out what I should change to address the "doesn't specify MTLTextureUsageShaderRead" error. Any help is greatly appreciated.

Posted

by

epatton
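
The two hexadecimal values in the assertion above are bit flags, and decoding them may make the message easier to read. The sketch below uses a stand-in type named `TextureUsage` (hypothetical, so that it runs without the Metal framework) whose raw values match those of MTLTextureUsage for the three flags shown. The helper `describe` is also hypothetical.

```swift
// Stand-in for MTLTextureUsage so the sketch runs without importing Metal.
// Raw values match MTLTextureUsage: shaderRead = 0x01, shaderWrite = 0x02, renderTarget = 0x04.
struct TextureUsage: OptionSet {
    let rawValue: UInt
    static let shaderRead = TextureUsage(rawValue: 0x01)
    static let shaderWrite = TextureUsage(rawValue: 0x02)
    static let renderTarget = TextureUsage(rawValue: 0x04)
}

/// Lists the names of the flags that are set in a usage value.
func describe(_ usage: TextureUsage) -> [String] {
    var names: [String] = []
    if usage.contains(.shaderRead) { names.append("shaderRead") }
    if usage.contains(.shaderWrite) { names.append("shaderWrite") }
    if usage.contains(.renderTarget) { names.append("renderTarget") }
    return names
}

// The usage value reported in the assertion message.
let reported = TextureUsage(rawValue: 0x04)
print(describe(reported))  // ["renderTarget"]

// A usage that would also allow the fragment shader to read the texture.
let readable: TextureUsage = [.renderTarget, .shaderRead]
print(describe(readable))  // ["shaderRead", "renderTarget"]
print(readable.rawValue)   // 5
```

Read this way, the message says the texture bound as prevColor[1] was created as a render target only, while the fragment function also reads from it. In a real project the corresponding change would presumably be made on the usage property of the MTLTextureDescriptor that creates that texture. Where the sample creates it is not shown in the post, so that part is an assumption.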

iOS app on macOS M1 - Window Resize or Full Screen

M1 Macs can run iOS applications. We have an iOS application that runs a fullscreen Metal game. The game can also run across all desktop platforms via Steam. In addition to Steam, we would like to make it available through the App Store on macOS. We'd like to use our iOS builds for this so that the Apple payment (micro-transactions) and sign-in processes can be reused. While the app runs on macOS, it runs in a small iPad-shaped window that cannot be resized. We do not want to add iPad multitasking support (portrait orientation is not viable), but would like the window on macOS to be expandable to full screen. Currently there is an option to make it full screen, but the Metal view (MTKView) delegate does not receive a drawableSizeWillChange event for this, meaning the new resolution of the window cannot be received. Is there another method of retrieving a window size change event in this context? What is the recommended way of enabling window resizing on macOS but not iPad for a single iOS app?

Posted

by

cacophany53ET

Metal 3 compatible with M1 Pro GPU?

I would like to know if the applications/games targeting the Metal 3 API will be fully compatible with the M1 Pro GPU. Thanks.

Metal

Posted

by

adnanbaloch

functions.data in Caches is reset when launching after a reboot

In my game project, there is a functions.data file at /AppData/Library/Caches/[bundleID]/com.apple.metal/functions.data. When we reboot and launch the game, this file is reset to about 40 KB; normally it is about 30 MB. This is done by Metal. Is there any way to avoid it?

Posted

by

seasonxxli

Problems with mesh shaders that dispatch a large number of threadgroups

I was familiarising myself with Metal mesh shaders and ran into some issues. First, a trivial application that uses mesh shaders to generate simple rectangular geometry hangs the GPU when dispatching 2D grids of mesh shader threadgroups, but it's really weird, as it is sensitive to the grid shape. E.g.
// these work!
meshGridProperties.set_threadgroups_per_grid(uint3(512, 1, 1));
meshGridProperties.set_threadgroups_per_grid(uint3(16, 8, 1));
meshGridProperties.set_threadgroups_per_grid(uint3(32, 5, 1));
// these (and anything "bigger") hang!
meshGridProperties.set_threadgroups_per_grid(uint3(16, 9, 1));
meshGridProperties.set_threadgroups_per_grid(uint3(32, 6, 1));
The sample shader code is attached. The invocation is trivial enough:
re.drawMeshThreadgroups(
    MTLSizeMake(1, 1, 1),
    threadsPerObjectThreadgroup: MTLSizeMake(1, 1, 1),
    threadsPerMeshThreadgroup: MTLSizeMake(1, 1, 1)
)
For Apple engineers: a bug has been submitted under FB10367407.
Mesh shader code: 2d_grid_mesh_shader_hangs.metal
I also have a more complex application where mesh shaders are used to generate sphere geometry: each mesh shader threadgroup generates a single slice of the sphere. Here the problem is similar: once there are more than X slices to render, some of the dispatched mesh threadgroups don't seem to do anything (see screenshot below). But the funny thing is that the geometry is produced, as it occasionally flickers in and out of existence, and if I manually block some threadgroups from running (e.g. by using something like if(threadgroup_index > 90) return; in the mesh shader), the "hidden" geometry works! It almost looks like different mesh shader threadgroups reuse the same memory allocation for storing the output mesh data, and the output of some threadgroups is overwritten. I have not submitted this as a bug, since the code is more complex and messy, but can do so if someone from the Apple team wants to have a look.

Metal

Posted

by

jcookie
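
The grid sizes quoted in the mesh shader post above can be tallied to check the remark that the hang is sensitive to grid shape. The sketch below is plain arithmetic on the numbers from the post. The `Grid` alias and the `threadgroupCount` helper are hypothetical names, and nothing here touches the GPU.

```swift
// A threadgroup grid as (x, y, z), matching the uint3 arguments in the post.
typealias Grid = (x: Int, y: Int, z: Int)

/// Total number of threadgroups dispatched for a grid.
func threadgroupCount(_ g: Grid) -> Int {
    return g.x * g.y * g.z
}

// Grid sizes reported in the post.
let working: [Grid] = [(512, 1, 1), (16, 8, 1), (32, 5, 1)]
let hanging: [Grid] = [(16, 9, 1), (32, 6, 1)]

let workingTotals = working.map(threadgroupCount)
let hangingTotals = hanging.map(threadgroupCount)
print(workingTotals)  // [512, 128, 160]
print(hangingTotals)  // [144, 192]

// If a simple limit on the total count explained the hang, every working
// total would be smaller than every hanging total.
let explainedByTotalAlone = workingTotals.max()! < hangingTotals.min()!
print(explainedByTotalAlone)  // false
```

The 16 x 9 grid (144 threadgroups) hangs while 32 x 5 (160) and 512 x 1 (512) work, so a single threshold on the total count does not separate the two sets. That is consistent with the shape dependence described in the post, though it does not identify the cause.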

Posts under Metal tag