How to bind threads to performance (P) or efficiency (E) cores?

For some simulation workloads I have, I would like to use the system to its full potential and therefore use both P and E cores. Splitting the workload into independent tasks is not easily possible (the threads communicate with each other and run in semi-lockstep). I can allocate smaller portions of the domain to the E cores (and iteratively adjust this so they take the same amount of time as the P cores).

But in order for this to work well, I need to ensure that a given thread (with its associated workload) is bound to the right type of core: *either* the performance cores (doing larger chunks of the domain) *or* the efficiency cores (doing smaller chunks of the domain).

What's the best way to do this? As far as I know, thread-to-core affinity has never been something you could choose on macOS.

The documentation mentions the QoS classes, but which class(es) (or relative priorities) would I pick?

Code Block c
pthread_set_qos_class_self_np(QOS_CLASS_UTILITY, 0);


The existing classifications don't really map well: the work is user-initiated (i.e. the user launched a console application), but it is not a GUI program. Would I use 4 threads with QOS_CLASS_UTILITY and 4 with QOS_CLASS_BACKGROUND? Or would I just use UTILITY with different relative priorities for the performance vs. efficiency threads?
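
To make it concrete, this is the kind of two-pool split I have in mind (sketched in Swift; QoS is only a scheduling hint and does not pin a thread to a core type, and pCoreChunk / eCoreChunk are placeholders for the real per-thread work):

Code Block swift
import Foundation

// Sketch: one pool of "large chunk" workers at a higher QoS and one pool of
// "small chunk" workers at a lower QoS. The scheduler decides where they run;
// there is no guarantee about which core type each thread lands on.
func spawnWorker(qos: QualityOfService, _ body: @escaping () -> Void) -> Thread {
    let thread = Thread(block: body)
    thread.qualityOfService = qos   // roughly what pthread_set_qos_class_self_np does inside the thread
    thread.start()
    return thread
}

// pCoreChunk / eCoreChunk stand in for the actual per-thread work.
let pWorkers = (0..<4).map { i in spawnWorker(qos: .utility)    { pCoreChunk(i) } }
let eWorkers = (0..<4).map { i in spawnWorker(qos: .background) { eCoreChunk(i) } }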

Replies

Were you able to solve this? I'm looking to do some research on related topics and this would be what I need to make my research feasible.
Nope, but I hope to be able to do some experiments on real hardware once my DTK replacement arrives.

I've been investigating a similar issue with my codebase.

In particular, as this Stack Overflow topic points out, when my threads are distributed across all cores with an equal workload, the efficiency cores create a significant bottleneck: https://stackoverflow.com/questions/66348801/how-to-utilize-the-high-performance-cores-on-apple-silicon

I have been searching high and low for a mechanism to deal with this. It seems like disabling efficiency cores for my application would be better than the current situation.

(EDIT: for future readers, it seems there's an open ticket in this thread: https://developer.apple.com/forums/thread/703361)

Yeah, I've been running as many threads as there are P-cores for my simulations and making sure to never yield. Unfortunately, it isn't always easy or possible to do that, so I would still appreciate either a proper core affinity API, or at least a way to opt out of E-cores.

The core asymmetry can create quite annoying situations with OpenMP, for example, where work is distributed equally.
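
In case it helps, the per-cluster core counts can be queried via sysctl. A sketch (the hw.perflevel* keys are, as far as I can tell, only present on Apple silicon and recent macOS, so treat them as an assumption and fall back if they are missing):

Code Block swift
import Darwin
import Foundation

// Sketch: read an Int32 sysctl value; returns nil if the key doesn't exist
// (e.g. on Intel Macs or older macOS versions).
func coreCount(_ name: String) -> Int? {
    var value: Int32 = 0
    var size = MemoryLayout<Int32>.size
    guard sysctlbyname(name, &value, &size, nil, 0) == 0 else { return nil }
    return Int(value)
}

// Assumption: perflevel0 is the performance cluster, perflevel1 the efficiency cluster.
let pCores = coreCount("hw.perflevel0.logicalcpu") ?? ProcessInfo.processInfo.activeProcessorCount
let eCores = coreCount("hw.perflevel1.logicalcpu") ?? 0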

The Swift 5.7 application I'm working on appears to have its @MainActor workload (which is always on the main thread) moved to the efficiency cores for no apparent reason (only sometimes, and seemingly at random), resulting in very poor rendering performance in SceneKit (an actual frame rate below 10 FPS, all the while the instrumentation insists it is rendering at 60 FPS).

The app only moves the main thread back to the performance cores after one of these actions:

  • background the app (switch to another app or the home screen), then restore focus.
  • pull the semi-transparent (glassy) Control Center curtain all the way down (it has to be all the way down), then dismiss it.

All non-rendering code in the app executes on secondary threads via DispatchQueues created with .userInteractive QoS (after which most code resumes on arbitrary secondary threads via awaits; async/await everywhere), except for some code which updates the main-thread-only parts (SceneKit, SwiftUI stuff, etc.) with variations of this: DispatchQueue.main.asyncAfter(deadline: .now() + duration, qos: .userInteractive, flags: .enforceQoS) { ... }

How do I force the main thread, which does the rendering, to stay on the performance cores at all times, or at least while redrawing?

Possible answer: lower as many secondary threads as possible to .userInitiated, since they may have been competing with the main thread for priority. I don't know how to trigger the behaviour on demand to test this out.
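
Something along these lines is what I mean (a sketch only; runExpensiveStep and updateScene are placeholders for the real work):

Code Block swift
import Dispatch

// Sketch: heavy work runs at .userInitiated on its own queue so it no longer
// competes with the .userInteractive main (render) thread; only the final
// SceneKit/SwiftUI update hops back to the main queue.
let workQueue = DispatchQueue(label: "sim.work", qos: .userInitiated, attributes: .concurrent)

workQueue.async {
    let result = runExpensiveStep()   // placeholder for the heavy computation
    DispatchQueue.main.async(qos: .userInteractive, flags: .enforceQoS) {
        updateScene(with: result)     // placeholder for the main-thread-only update
    }
}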

  • Limiting how many hard-working threads withTaskGroup creates to the total number of cores minus 1 could be what fixed this; I haven't experienced the issue since then (see the sketch below).
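
For reference, one way to cap the group's width looks roughly like this (a sketch; the item type and process are placeholders for the real per-chunk work):

Code Block swift
import Foundation

// Sketch: run at most (total cores - 1) child tasks concurrently so the group
// never saturates every core. `process` stands in for the real work.
func runAll<Item: Sendable>(_ items: [Item],
                            process: @escaping @Sendable (Item) async -> Void) async {
    let width = max(1, ProcessInfo.processInfo.activeProcessorCount - 1)
    await withTaskGroup(of: Void.self) { group in
        var iterator = items.makeIterator()

        // Start at most `width` tasks up front.
        for _ in 0..<width {
            guard let item = iterator.next() else { break }
            group.addTask { await process(item) }
        }

        // Whenever a task finishes, start the next one.
        while let _ = await group.next() {
            if let item = iterator.next() {
                group.addTask { await process(item) }
            }
        }
    }
}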


Limiting how many hard-working threads withTaskGroup creates to the total number of cores minus 1 could be what fixed this; I haven't experienced the issue since then.

Which means the work was automatically assigned or moved to the performance cores, and whatever was interfering (too much context switching? or something involving threads in another [system] process) could execute more smoothly on the less busy core.

  • To make things clearer: when the issue was triggered, some of the intensive CPU work was assigned to efficiency cores and remained bound to those cores until the computation settled down, even if the computation meant multiple awaits with many small, short-lived workloads. That is, only when things became completely idle for long enough (a fraction of a second?) did the binding of one or more of the actor threads to the efficiency cores cease and normal performance resume for subsequent computation.


withTaskGroup was calling multiple actors. Each actor (as of this writing) binds all its execution to a single thread.