How do I persist os_signposts so they don't get lost when the in-memory buffer overflows?

I have a tool that generates thousands of signpost events a second. Unfortunately, when profiling earlier signposts appear to get lost over the course of the run because they are not persisted and the buffer for these runs out. How can I get these to persist? For reference, I've filed a feedback for this as FB12129526, and more context can be found (in meme form) here: https://federated.saagarjha.com/notice/AUm3AprLdVOnUPfiQy.

Replies

Thanks for filing a feedback on this!

The Unified Logging System tries to conserve system resources by storing different types of data for different durations and in different ways. This is achieved by using ring-buffers to save a specific number of messages. There are two main storage "locations" for logging: in-memory storage and persisted storage.

In-memory storage is cheap so most logging data goes there. But it's also fairly space limited, so depending on the incoming log traffic, messages might only be around for ~1 minute or so. Persisted log messages are written to disk. This is slower and consumes disk space, so it's not the default for most types of logging. When the logging system is asked to export its current buffers, e.g. directly via a log collect calls or indirectly by Instruments windowed mode ending, it will export all messages from the in-memory buffer and all the persisted buffers.

By default the following logic applies:

  • os_log_debug message => not stored at all (unless explicitly requested)
  • os_log_info messages => in-memory-buffer
  • os_log_default messages => persisted
  • os_signpost => in-memory buffer

So what seems to be happening in your cases is that the in-memory buffer was full, so by the time the logging system is asked to export the data in the in-memory buffers, only signposts from the last 30s are still in the buffer.

There are two ways to avoid this:

Stream log messages.

By streaming log messages, the unified logging system will send the messages (and signposts) over as soon as they occur. Streaming mode has a lot more overhead per log message than the buffered mode, but it ensures that no messages get lost due to buffer-overruns. However, due to the higher overhead it can happen more easily that the data-rate of logging gets too high so that streaming mode can not keep up with the incoming log messages and has to drop some due to that. You trade consistent cut-off before a specific time for "best effort to capture everything but might get overwhelmed at some point". This might work better in your circumstances depending on the data rate of messages that need to be streamed. E.g. when you can limit the log-messages you are interested in by constraining the target process or even the subsystem, you can often limit the data-rate enough to make streaming feasible and can then stream as many messages as you like. Instruments uses streaming mode when recording in immediate mode and deferred mode, so I recommend trying deferred mode given that you target only a single process and see whether you still get warnings about lost messages. If log messages get dropped, Instruments will detect this and warn you about it.

Change your subsystem's (category's) configuration to persist signposts.

While the defaults are setup as described above, you can change these defaults for each subsystem and category. The article Customizing Logging Behavior While Debugging describes the keys useable for os-log messages and how to create a plist for overriding the defaults for a subsystem or category. However, the same dictionary also supports a key called Signpost-Persisted to enable signpost persistence. Your plist for your subsystem could look like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>DEFAULT-OPTIONS</key>
  <dict>
    <key>Signpost-Persisted</key>
    <false/>
  </dict>
</dict>
</plist>

This should cause all signposts logged to the subsystem (which needs to match the name of the plist) to be persisted instead of just being stored in memory and should keep them around long enough to be picked up by Instruments' windowed mode later on (or by a log collect for that matter).