How to prevent TSAN from reporting data-races that are intentional and considered safe?

I'm having a hard time relying on TSAN to detect problems due to its rightful insistence on reporting data-races (I know, stick with me). Picture the following implementation of a lazily-allocated property in an Obj-C class:

@interface MyClass : NSObject {
    id _myLazyValue; // starts as nil, like all other Obj-C ivars
}
@end

@implementation MyClass
- (id)myLazyValue {
    if (_myLazyValue == nil) {
        @synchronized(self) {
            if (_myLazyValue == nil) {
                _myLazyValue = <expensive computation>
            }
        }
    }
    return _myLazyValue;
}
@end

The first line in the method is reading a pointer-sized chunk of memory outside of the protection provided by the @synchronized(...) statement. That same value may be written by a different thread within the execution of the @synchronized block. This is what TSAN complains about, but I need it not to.

The code above ensures the ivar is written by at most one thread. The read is unguarded, but it is impossible for any thread to read a non-nil value back that is invalid, uninitialized or unretained. Why go through this trouble? Such a lazily-allocated property usually locks on @synchronized once, until (at most) one thread does any work. Other threads may be temporarily waiting on the same lock but again only while the value is being initialized. The cost of allocation and initialization is guaranteed to be paid once: multiple threads cannot initialize the value multiple times (that’s the reason for the second _myLazyValue == nil check within the scope of the @synchronized block). Subsequent accesses of the initialized property skip locking altogether, which is exactly the performance we want from a lazily-allocated, immutable property that still guarantees thread-safe access.
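For reference, the shape of this double-checked pattern can be written with fully defined semantics in plain C using C11 atomics and a pthread mutex. This is a hypothetical sketch (names invented, not my actual Obj-C code): the fast-path read becomes an acquire load that pairs with the release store, so a non-nil result is always fully initialized, and TSan understands the ordering.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stdlib.h>

static _Atomic(int *) g_lazy_value;  // hypothetical lazily created value, starts NULL
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

// Stand-in for the expensive computation.
static int *expensive_computation(void) {
    int *p = malloc(sizeof *p);
    *p = 42;
    return p;
}

int *lazy_value(void) {
    // Unguarded fast path, but with defined semantics: the acquire load
    // synchronizes with the release store below.
    int *value = atomic_load_explicit(&g_lazy_value, memory_order_acquire);
    if (value == NULL) {
        pthread_mutex_lock(&g_lock);
        // Second check, so at most one thread ever initializes.
        value = atomic_load_explicit(&g_lazy_value, memory_order_relaxed);
        if (value == NULL) {
            value = expensive_computation();
            atomic_store_explicit(&g_lazy_value, value, memory_order_release);
        }
        pthread_mutex_unlock(&g_lock);
    }
    return value;
}
```

The structure is identical to the accessor above; only the unguarded read and the store are spelled as explicit atomics.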

Assuming there isn't a big embarrassing hole in my logic, is there a way to decorate specific portions of our sources (akin to the #pragma statements that disable certain warnings) so that any read/write access to a specific value can be marked as "safe"? Is __attribute__((no_sanitize("thread"))) the most granular tool for this purpose? Ideally one would want to ask TSAN to ignore only specific reads/writes, rather than the entire body of a function.

Thank you!

Replies

My general advice is that you avoid these sorts of ‘clever’ optimisations, replacing them with more straightforward code. That’ll fix this problem and it’ll likely improve performance too.

The most straightforward way to write this is as follows:

- (id)myLazyValue {
    id result = nil;
    [self->_lock lock];
    if (self->_myLazyValue == nil) {
        self->_myLazyValue = expensiveComputation();
    }
    result = self->_myLazyValue;
    [self->_lock unlock];
    return result;
}

where _lock is an NSLock. In this design, result is confined to the current thread and _myLazyValue is only ever accessed from within the lock.

If you need to go faster, change _lock to an os_unfair_lock_t. At that point it’s unlikely you’ll be able to measure the performance difference between it and your current code.

Two further notes. First, @synchronized is terrible from a performance perspective. Avoid it.

Second, your current code probably isn’t legal Objective-C. Specifically, _myLazyValue can be accessed by multiple threads concurrently, and that’s not defined to be safe. While your current setup likely makes it safe in practice, that’s not the same thing. Future changes to the compiler, CPU, or language runtime could make it unsafe. This is why TSan is complaining, and it’s better to switch to an approach that’s defined to be safe than to spend time trying to suppress TSan.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Thanks Quinn! Will take your advice to heart. Would you say that only atomic operations are reasonably immune from being flagged as TSAN evolves?

Consider compiling as Objective-C++ and using std::call_once().

Would you say that only atomic operations are reasonably immune from being flagged as TSAN evolves?

No. Atomic operations are one way to produce correct threaded code. There are others, including Dispatch and the various locking primitives. The problem here is that you’re protecting _myLazyValue with a lock on one code path but not the other.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Sorry, I should have been a lot clearer about what I meant. Let me rephrase: no matter how "clever" a technique is, and whether it may ultimately guarantee safe access to shared state under any circumstance, any time the same location in memory is read or written concurrently by two separate threads TSAN will flag it as a violation. My question was whether atomic operations (e.g. atomic_load/_fetch_add/_fetch_sub) would ever trigger TSAN. They clearly touch a shared memory location simultaneously, but their very nature is to do so predictably. My guess is that they shouldn't upset TSAN, but if you search for related keywords you'll end up reading about whether false positives are possible (perhaps only in previous versions of TSAN?).
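To make the question concrete, this is the kind of usage I mean — a minimal hypothetical C sketch where two threads increment one shared counter through atomic_fetch_add. Both threads touch the same location concurrently, but there is no data race in the language's sense:

```c
#include <stdatomic.h>
#include <pthread.h>

static atomic_int g_counter;  // shared counter, hypothetical example

static void *increment_many(void *unused) {
    (void)unused;
    for (int i = 0; i < 100000; i++) {
        // A synchronized read-modify-write: concurrent access to one
        // location, but atomic, so not a data race.
        atomic_fetch_add(&g_counter, 1);
    }
    return NULL;
}

int run_counter_demo(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment_many, NULL);
    pthread_create(&t2, NULL, increment_many, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return atomic_load(&g_counter);  // always 200000
}
```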

Since this thread is turning more interesting than I had hoped for, it goes without saying Quinn’s advice is sound as always: all "clever" techniques are a difficult bet not just on writing code that is safe today, but will forever remain safe in light of evolving compiler technologies. Since his first reply I've already guilt-added some os_unfair_lock/unlock to keep TSAN watching over some functions I had previously disabled TSAN on.

My use of @synchronized() in that particular example is made with full awareness of its poor performance: does it matter how bad the implementation is when you pay the price only once, in most cases? In some parts of my codebase, relying on the syntactic sugar seems fair. We use GCD queues, os_unfair_lock, atomic_int and pthreads elsewhere, as appropriate. I assume (wrongly?) that @synchronized uses NSRecursiveLock, or something similar, behind the scenes. Would that be the reason it is slow? With so much else in the ObjC runtime having been optimized to the extreme, one wonders why this particular hole was left alone.

Another area where TSAN is clearly upset at my codebase is more "clever" techniques that rely on the atomicity of pointer-sized assignments on both Intel and ARM 64-bit architectures. The reason why you can get away with unguarded reads in (probably all) these techniques is that the bit pattern of a pointer-sized value will never contain a partially written sequence of bits. It will contain either all the bits of the most recent write or all the bits of the previous one, never an interleaving of the two. The bits you read on any thread will never mix previous state with state from an "in-flight" CPU instruction. You can have some fun with this – again with the noble goal of avoiding any and all kinds of locking primitives in your most common access pattern. To be clear: this liberally-defined "atomicity" of a CPU instruction is a very, very thin foundation on which to be clever. But there are legitimate cases where trying to squeeze out some extra performance makes sense for us, and I assume for other multi-threaded, graphics-intensive code aiming for the highest FPS possible.

My original question/hope was indeed one that makes sense in my domain: can one selectively disable TSAN on a statement-by-statement basis, or is one forced to snooze it for an entire function body? It seems that right now, only the blunt option is available.

@endecotp: I assume std::call_once() uses the same technique and therefore has the same excellent performance as dispatch_once. I am not against it, but I would rather keep my sources Obj-C only.

Why would anyone prefer @synchronized over dispatch_once? I sometimes do, when performance isn't an issue, so that the code is easier to read. Syntactic sugar is very sweet, and one wonders why they stopped improving Obj-C in this regard. Anyone who follows Swift language "evolution" will notice that the team entertains language-wide changes for almost any scenario they come across. The line of where to stop adding syntactic sugar is rarely drawn, and each release adds more and more scenarios where you write less code (this trend troubles me in profound ways, but I like it :-)).

There is something to dislike about dispatch_once and std::call_once() too: that pesky statically allocated token that sits there and – if you leave it at file scope – has the potential to be reused accidentally, be it through an auto-complete typo or by carelessly copying/pasting code. My preference is to scope a dispatch_once token within the body of a block defined locally to the function. This involves C macros, though. For example, if you want a lazily allocated, static/global variable:

#define UNPACK(...) __VA_ARGS__
#define ONCE(body) (^typeof((body)) { \
    static typeof((body)) value; \
    static dispatch_once_t once; \
    dispatch_once(&once, ^{ value = (body); }); \
    return value; \
})()
/// You would only use the following macro; the two above just support the pattern.
#define Once(...) ONCE(UNPACK(__VA_ARGS__))

The goal being that the dispatch_once_t variable is static and crucially starts as 0 when your process is loaded, but it is invisible outside the scope of the locally defined block. No variable name "leaks" outside that scope, and the resulting code (e.g. someVar = Once([[MyClass alloc] init])) can be copied/pasted at will with no fear of accidentally reusing the same static token. (Again, in the hope I'm not overlooking something horrible...)
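The same "keep the token out of reach" idea translates to plain C with pthread_once: the once-control is a static local of the accessor, so no other code can name it, let alone reuse it. A hypothetical sketch:

```c
#include <pthread.h>

static int g_value;  // hypothetical lazily computed value

static void init_value(void) {
    g_value = 42;  // stands in for the expensive computation
}

int shared_value(void) {
    // The token is static (so zero-initialized at load time) but scoped to
    // this function, so it cannot be reused by accident elsewhere.
    static pthread_once_t once = PTHREAD_ONCE_INIT;
    pthread_once(&once, init_value);
    return g_value;
}
```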

I assume std::call_once() uses the same technique and therefore has the same excellent performance as dispatch_once.

You can read what it does here:

https://github.com/llvm/llvm-project/blob/main/libcxx/src/call_once.cpp

There is a comment at the start about dispatch_once.

It doesn’t do double-checked locking; it seems to unconditionally lock a pthread_mutex.

  • Thank you for the link! It won't hurt to become familiar with that codebase.


any time the same location in memory is read or written concurrently by two separate threads TSAN will flag it as a violation.

… unless there’s some sort of locking that coordinates the access. In my example above, _myLazyValue can be accessed by multiple threads but that’s protected by _lock and thus TSan won’t complain.

does it matter how bad the implementation is when you pay the price only once, in most cases?

Sure, assuming you could implement the fast path without locking, and the fact that you can’t is why you started this thread (-:

Would that be the reason it is slow?

Honestly, it’s probably not that bad these days.

The fundamental issue with @synchronized is that there’s a space / speed trade-off. It’d be fast if you allocated a mutex per object, but that uses a lot of space. Rather, it hashes the object pointer and then uses a side table of mutexes.

If you’re curious how it really works, you can see the source code in Darwin, starting here.

Regardless, it’s never going to be as fast as an inline os_unfair_lock_t.

can one selectively disable TSAN on a statement-by-statement basis, or is one forced to snooze it for an entire function body?

I don’t know. I’ve never looked into this because I fundamentally disagree with the goal.

TSan has its own LLVM-based community, so you might find more answers over there. For example, this page is probably interesting to you:

https://clang.llvm.org/docs/ThreadSanitizer.html
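For what it's worth, that page documents the function-level attribute from the original post and also a compile-time "ignorelist" file (passed with -fsanitize-ignorelist=, formerly -fsanitize-blacklist=). Its granularity is per function or per file, not per statement. A sketch of the format, with hypothetical names:

```
# tsan-ignorelist.txt (hypothetical file, passed via -fsanitize-ignorelist=)
# Don't instrument one function:
fun:myLazyValue
# Don't instrument anything in one source file:
src:MyClass.m
```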

Regarding dispatch_once, be aware that its optimisation is heavily dependent on the fact that the once token is a static, and thus initialised to zero before any user code runs. That means the once token only ever transitions from zero to non-zero, never the other way. It exploits this invariant to make the fast path fast.

That optimisation doesn’t work for your situation because the memory occupied by your MyClass instance can be freed and reused [1].
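The invariant described above — a token that is zero before any user code runs and only ever moves to non-zero — can be sketched in C. This is illustrative only, not dispatch_once's actual implementation: the fast path is a single acquire load of a static flag, and a mutex covers the slow path.

```c
#include <stdatomic.h>
#include <pthread.h>

static atomic_int g_done;  // static storage: guaranteed zero before any user code runs
static pthread_mutex_t g_once_lock = PTHREAD_MUTEX_INITIALIZER;
static int g_value;        // hypothetical lazily computed value

int once_value(void) {
    // Fast path: after the single 0 -> 1 transition, this is one
    // uncontended acquire load, and no lock is taken.
    if (atomic_load_explicit(&g_done, memory_order_acquire) == 0) {
        pthread_mutex_lock(&g_once_lock);
        if (atomic_load_explicit(&g_done, memory_order_relaxed) == 0) {
            g_value = 42;  // stands in for the expensive initialization
            atomic_store_explicit(&g_done, 1, memory_order_release);
        }
        pthread_mutex_unlock(&g_once_lock);
    }
    return g_value;
}
```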

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] Folks try to work around this by setting the once token back to zero, but this isn’t safe.

  • Thanks Quinn, you’ve given me more than enough pointers to move forward.


Since I've come to depend on threads such as these as essential, living, breathing documentation, I wanted to add an epilogue for whoever stumbles on this. I did end up removing any code that skipped locking, allowing me to re-enable TSAN everywhere. Recall that my very first snippet of code relied on the following:

Another area where TSAN is clearly upset at my codebase is more "clever" techniques that rely on the atomicity of pointer-sized assignments on both Intel and ARM 64-bit architectures. The reason why you can get away with unguarded reads in (probably all) these techniques is that the bit pattern of a pointer-sized value will never contain a partially written sequence of bits

Slightly rephrased: you can read a lazily-allocated variable without a lock because it will either be nil or contain a fully initialized, retained reference to the lazily-created object. Only when your unguarded read returns nil do you pay the price of the expensive lock (@synchronized). The second, identical check against nil within the protected code prevents the variable from being allocated and initialized twice.

That original assumption was wrong in more ways than the ones covered by Quinn's advice. Some digging around revealed that when optimizing code for size, the compiler may in fact update a single pointer-sized variable in two instructions. This means a different thread might indeed observe a partial bit pattern, after only the first of the two instructions has executed (obviously unsafe!). I don't know what preconditions need to be in place for the compiler to do this; it is not particularly interesting to me beyond the fact that it can happen, and therefore your code cannot assume otherwise. Apparently this same "clever" assumption was at some point made by/in the Obj-C runtime, causing problems. I wish I could locate the original Twitter/Mastodon post, but I wasn't keeping good records during my sleuthing.

In my original snippet of pseudo-code, you could explicitly update the pointer-sized value atomically:

- (id)myLazyValue {
    if (_myLazyValue == nil) {
        @synchronized(self) {
            if (_myLazyValue == nil) {
                id value = <allocate and initialize>
                // Storing pointer-sized value on the stack so member variable
                // accessed by multiple threads can be updated atomically below
                <atomically write value to _myLazyValue>
            }
        }
    }
    return _myLazyValue;
}

...and in principle that works around the compiler potentially updating the pointer-sized value through multiple instructions, but at least for my use case this falls firmly into the "overkill" category. It is still not a safe design, in the sense that it would trigger TSAN just the same.

The bigger lesson (at least for me) is that TSAN isn't even the main reason to abandon these techniques. Writing "safe by design" code means you don't have to know or worry about edge cases in the compiler, or about whether the target ISA even gives the compiler room to do something truly unexpected. The risk/reward for trying something like this seems tipped heavily towards risk.

As to the poor performance of @synchronized... it is not horrible, but you'll find plenty of blog posts comparing the various primitives/techniques on OS X, and yes, this rare bit of syntactic sugar doesn't have many supporters.