Discover how you can track down hangs and delays in your app. We'll show you tools and methods to discover hangs and their causes, learn about anti-patterns that can lead to hangs, explore best practices for eliminating hangs, such as using GCD, and provide guidance on when you should consider asynchronous code to improve your app performance.
Hi, my name is Anubhav, and I'm an engineer on the OS Performance Team. Today, I'm excited to share how you can understand and eliminate hangs from your application. We'll break this talk into four sections, starting with understanding, "what is a hang?" We'll then talk about the common causes for hangs and what to look out for when developing. After that, we'll discuss tools you can use to monitor and diagnose hangs. Finally, we will learn common strategies to eliminate hangs, and how to choose which ones work best for your app. Let's jump right in. Let's take a look at my new recipe app, Desserted, an application that shows you how to make my favorite drinks and desserts. This Mango Tango smoothie looks great. I'll tap on it to see how it's made. Hmm. Looks like nothing is happening. Wow. That took a lot longer than expected. The app was just stuck and would not accept any of my touches for a few seconds.
This experience can be described as "laggy," "slow," or "stuck." These are not words that I, or anyone else, want used to describe an app. At Apple, we call this period of unresponsiveness a "hang." To understand a hang, and what was happening in Desserted, we must first understand what an app's main runloop is.
The main runloop is a loop your application's main thread enters to run event handlers in response to incoming events, primarily user interactions. When a user interacts with an app, the runloop receives the event, processes it, and then updates the UI, if required. This all happens in one turn of the runloop, and on the main thread. This process repeats for each user input. This is what the main thread would look like with one turn of the runloop.
If processing the event takes a long time, there'll be a delay between the user input and any UI update. To make matters worse, events are buffered and cannot be handled by the main thread during a hang. If I interact with an app during a hang, that event is not handled until the current hang ends, compounding hangs, one on top of another.
In general, a delay of over one second will always look like a hang, though a shorter delay can be perceived as one. For example, a half-second delay while scrolling is jarring, but the same delay is far less noticeable when entering a view.
By eliminating hangs, your apps will be snappy, quick, and responsive.
Now that we know what a hang is, let's look into what commonly causes them.
A hang occurs when there's too much work being done on the main thread. To determine exactly what that work is, we have to look at what the main thread is doing while processing the event. This time can be split into two cases. Either the main thread itself is busy doing work, which can be a single long task or many short ones, or the main thread is blocked by another thread or system resource. Let's start by looking at common causes for the main thread being busy. Proactively doing work is doing more than what's necessary to update UI, keeping the main thread busy for longer.
In Desserted, the Recipe View only displays image tiles for four out of the many ingredient images. If the main thread were to load all ingredient images at once, it would spend time reading, preparing, and compositing each and every image. Most of the work that's happening won't even affect what a user sees.
The view only shows four images, and only those four need to be immediately generated.
Another cause for hangs is performing irrelevant work on the main thread. Note that the main thread services blocks from the main dispatch queue, but it can also service blocks from other queues via dispatch sync. Anytime a queue dispatch syncs onto another queue, all pending blocks on the other queue have to execute before the newly enqueued one. Consider an app with a low priority serial dispatch queue, perhaps a maintenance queue. If the main thread dispatch synced a block onto the maintenanceQueue, it would have to wait for all pending blocks on that queue to execute before the enqueued block runs. Only a fraction of time was spent doing work meant for the main thread.
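The maintenance queue scenario above can be sketched in a few lines. This is only an illustration, not code from Desserted: the queue label, the `ExecutionLog` helper, and the function name are assumptions made for the example, and the sleep stands in for real maintenance work.

```swift
import Dispatch
import Foundation

// Thread-safe record of the order in which blocks ran (hypothetical helper).
final class ExecutionLog: @unchecked Sendable {
    private let lock = NSLock()
    private var entries: [String] = []

    func append(_ entry: String) {
        lock.lock(); defer { lock.unlock() }
        entries.append(entry)
    }

    var snapshot: [String] {
        lock.lock(); defer { lock.unlock() }
        return entries
    }
}

// Returns the order in which the blocks executed.
func runBehindPendingMaintenance() -> [String] {
    // Hypothetical low-priority serial queue, standing in for the maintenance queue.
    let maintenanceQueue = DispatchQueue(label: "com.example.maintenance", qos: .utility)
    let log = ExecutionLog()

    // Three maintenance blocks are already pending on the queue.
    for index in 1...3 {
        maintenanceQueue.async {
            Thread.sleep(forTimeInterval: 0.05) // simulated maintenance work
            log.append("maintenance \(index)")
        }
    }

    // The calling thread blocks here until all three pending blocks finish,
    // even though it only cares about this last, short block.
    maintenanceQueue.sync {
        log.append("caller's block")
    }
    return log.snapshot
}
```

If the caller were the main thread, the time spent waiting for the three maintenance blocks would show up as a hang, even though none of that work was meant for the main thread.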
Similarly, if a block is dispatched onto the main queue from another queue, that block has to execute on the main thread.
This holds even if the block is enqueued via a dispatch async.
One more cause for hangs is using suboptimal API. There are many ways to accomplish a task. Be sure to read API docs so you can use the best one for the task at hand.
Desserted adds rounded corners to all images in the recipe view, though doing so also adds latency when entering this view.
To add rounded corners, Desserted uses a bitmap-based graphics context to convert an image to a bitmap, apply a UIBezierPath on that bitmap, then convert that bitmap back to an image. This set of operations is CPU intensive, uses lots of memory, and can take a long time. This is because the wrong hardware is being used for the job. Instead of using the CPU, I should leverage the GPU.
By using CoreAnimation methods on a layer, adding rounded corners is easy and instant. This is just one example of using the wrong API for the job at hand.
Now that we've looked into some common reasons the main thread of an app can be busy, let's investigate why it can be blocked.
Synchronous APIs block execution from the time they're called to the time they return. These should not be used on the main thread if the API does a lot of work or has the potential to block for a long period of time. Apart from delays, these also add an additional point of failure.
One such case is if the main thread of an app makes synchronous requests to the network. For those with 5G, there may not be any delay. For those with slower network speeds, this may take longer. And for those with very bad signal, this may hang indefinitely. There's no guarantee on how long this can take, which is why such synchronous operations should be avoided on the main thread.
Another way to block the main thread is on a system resource, as these are often constrained. File I/O is one of the most commonly used, and contended, system resources. Latencies are dependent on hardware, and other reads and writes happening at the same time, things that can be beyond the app's control. So, apps need to do what they can to defend against hangs by avoiding I/O on the main thread.
Data stores, which do not support concurrency, are especially problematic. If the main thread attempts to read from one while writes are already occurring, that read would be pushed out until all writers complete, which may be unbounded.
Another cause for hangs is synchronization. By definition, synchronization primitives can block execution, so it is important to limit, and be cautious of, synchronizing from the main thread.
The thread it synchronizes with can take a long time to release an implicit or explicit lock. These are some common primitives to look out for, including the @synchronized directive, dispatch sync, os unfair lock, and posix locks. Specifically, be aware of semaphore use, as they do not propagate priority and can lengthen a hang due to preemption. A common anti-pattern is when trying to make asynchronous API act synchronous by waiting on a semaphore. This should always be avoided on the main thread.
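To make the semaphore anti-pattern concrete, here is a small sketch that contrasts it with staying asynchronous. The function names and the `fetchServingCount` API are hypothetical and exist only for this example.

```swift
import Dispatch

// Hypothetical asynchronous API: produces a value on a background queue
// and reports it through a completion handler.
func fetchServingCount(completion: @escaping (Int) -> Void) {
    DispatchQueue.global(qos: .utility).async {
        completion(4)
    }
}

// Anti-pattern: forces the asynchronous API to act synchronously.
// The calling thread is blocked inside wait() until the background work signals.
func servingCountBlocking() -> Int {
    let semaphore = DispatchSemaphore(value: 0)
    var result = 0
    fetchServingCount { count in
        result = count
        semaphore.signal()
    }
    semaphore.wait() // this should never happen on the main thread
    return result
}

// Alternative: stay asynchronous and deliver the value on a queue the caller picks
// (the main queue by default, so UI updates can happen in the handler).
func servingCount(deliverOn queue: DispatchQueue = .main,
                  completion: @escaping (Int) -> Void) {
    fetchServingCount { count in
        queue.async { completion(count) }
    }
}
```

Both functions return the same value, but only the second one leaves the calling thread free to keep handling events while the work is in flight.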
One more way to block the main thread is by doing work, IPC, or using system resources to fetch the value of something which doesn't change often.
In Desserted, there's an icon for a social feature, which is only shown if I have added a contact as a friend. Querying all contacts on every tap into this view is one way to check, though it adds unnecessary overhead and delays, since the main thread blocks on frameworks, which are performing expensive operations under the hood. Furthermore, the value I'm fetching does not change often, so querying this frequently is unnecessary and adds burden on system resources.
The state of system resources, such as CPU, memory, and storage play a large part in when hangs occur. Different hardware and device conditions in the field mean real world scenarios will be significantly different than those encountered while testing on desk. It is important you do what you can to defend against these cases by having robust tests and using the oldest-supported hardware as a benchmark. The high-level cause for hangs is too much work being done on, or on behalf of, the main thread. To ensure performance, it's important the main thread of your application focuses on what's necessary to update UI.
Now that you know common causes for hangs, let's talk about some helpful tools you can use to monitor and triage hangs in your app, both during development and production.
In order to triage a hang, you would want to know what your app is doing during that time. The time profiler instrument lets you do that by showing your application’s callstacks over time, indicating exactly what's executing. The system trace instrument adds more context with data on system calls, VM faults, I/O, as well as inter and intra-process interactions. For more information, check out the "System Trace in Depth" talk from 2016. Now, I'll use the time profiler and system trace instruments to find what's causing Desserted to hang.
After taking a trace of the hang, here's what it looks like opened in Instruments. In the system trace output, the red line indicates system calls, the purple graph indicates virtual memory faults, and the horizontal blue bar indicates the main thread is busy doing work. The next step is to see what this work is. The time profiler allows you to do just that. It presents a call tree by aggregating main-thread callstacks for the duration of the 4.7 second hang. The highlighted portion of the tree illustrates that 4.6 seconds of this hang was due to a loadAllMessages method in the Recipe View. This pattern looks familiar. Desserted might be loading in more images than needed.
Once your application has shipped, you can use MetricKit to collect call trees for hangs hit in the field. This enables you to prioritize fixes based on which issues customers are most commonly hitting. To learn how to use MetricKit for hangs, check out the "What's new in MetricKit" talk from 2020. I've shipped Desserted and have some hang reports from MetricKit. Let's look at one of them to see if it's similar to the hang we just triaged.
MetricKit returns a call tree by aggregating callstacks taken during a hang. This tree format is similar to what the time profiler presents.
The highlighted portion indicates this hang is different from the one we just investigated with Instruments. This one is due to a new social feature I added, which blocks on a dispatch queue querying contacts. Without MetricKit, I may not have found this issue, and it would still persist in the field. When fixing hangs, it is important to baseline and quantify the performance of your app. The Xcode organizer does this by showing performance metrics, including a chart displaying hang rate per app version. This is especially helpful when triaging regressions. Check out these two talks for more information on the Xcode Organizer.
Now, let's go over some common strategies you can use to fix hangs in your app.
Each of these strategies can address multiple causes for hangs. In order to know which fix is best for your app, you will need to look at their side effects and tradeoffs.
To eliminate and defend against hangs, reduce the amount of work done on the main thread.
This can be done in two ways. The first is to optimize the work already being done on the main thread to reduce execution time. The second is to move work off the main thread in a non-blocking manner to keep it responsive. Let's start by looking at ways to reduce main thread execution, starting with caching. Caches are a great way to quickly access frequently used assets or previously queried values. They're often an in-memory store, but can be persisted to disk, if needed across multiple app invocations. Formatted assets that can be needed later, like ingredient image tiles in Desserted, are great candidates for caching, since it's expensive to create these assets every time they're needed.
By caching these in an NSCache, the overhead of generating assets is replaced by a quick memory read. This would eliminate the hang we looked at in Instruments.
It is important to have an accurate cache invalidation mechanism to strike a balance between having stale data and constantly updating a cache. This work should happen asynchronously on a secondary dispatch queue to keep the main thread responsive to events.
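As a rough sketch of the caching idea, the following wraps an NSCache so that a tile is only rendered on a cache miss. The `Tile` and `TileCache` types and the `render` closure are placeholders invented for this example. A real app would more likely cache an image type and choose its own invalidation policy.

```swift
import Foundation

// Hypothetical stand-in for a formatted image tile.
final class Tile {
    let bytes: [UInt8]
    init(bytes: [UInt8]) { self.bytes = bytes }
}

final class TileCache {
    private let cache = NSCache<NSString, Tile>()
    private let render: (String) -> Tile

    init(countLimit: Int, render: @escaping (String) -> Tile) {
        cache.countLimit = countLimit // keeps memory growth in check
        self.render = render
    }

    // Returns the cached tile if present, and renders it only on a miss.
    func tile(named name: String) -> Tile {
        let key = NSString(string: name)
        if let cached = cache.object(forKey: key) {
            return cached
        }
        let tile = render(name)
        cache.setObject(tile, forKey: key)
        return tile
    }

    // Invalidation hook: drop an entry whose source data has changed.
    func invalidate(_ name: String) {
        cache.removeObject(forKey: NSString(string: name))
    }
}
```

NSCache is allowed to evict entries on its own, for example under memory pressure, so callers should always be prepared for a miss. That is why `tile(named:)` falls back to rendering.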
Notification observers are another way to reduce work on the main thread. They allow your app to react to changes in a value or state, without having to do expensive, on-demand computation. Any class can post notifications, even your own. To find notifications from a specific class, check its API documentation. To find all observable system notifications, check out the Apple developer documentation page for NSNotification.Name. A great candidate for this is the social feature in Desserted.
By registering an observer for the abDatabaseChangedExternally notification, the main thread no longer has to wait on querying contacts. Once a notification comes in, the observer is invoked. In this case, it'll be updating a cached value.
To keep the main thread responsive, these updates should be asynchronous, which is achieved by dispatch_asyncing the handler to another queue.
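One way this observer pattern could look is sketched below. To keep the example self-contained it uses a made-up notification name, `contactsDidChange`, rather than the contacts notification from the talk. The `FriendStatusCache` type, its queue label, and the `query` closure are likewise assumptions.

```swift
import Dispatch
import Foundation

// Hypothetical notification name, used only for this sketch.
extension Notification.Name {
    static let contactsDidChange = Notification.Name("ExampleContactsDidChange")
}

final class FriendStatusCache: @unchecked Sendable {
    private let lock = NSLock()
    private var cachedHasFriends = false
    private let updateQueue = DispatchQueue(label: "com.example.friend-status", qos: .utility)
    private let center: NotificationCenter
    private var token: NSObjectProtocol?

    // `query` is the expensive lookup; it only runs when a change is reported.
    init(center: NotificationCenter = .default, query: @escaping () -> Bool) {
        self.center = center
        token = center.addObserver(forName: .contactsDidChange, object: nil, queue: nil) { [weak self] _ in
            // Hop off the posting thread so the refresh never blocks it.
            self?.updateQueue.async {
                let fresh = query()
                self?.store(fresh)
            }
        }
    }

    deinit {
        if let token = token { center.removeObserver(token) }
    }

    // Cheap read that is safe to call from the main thread.
    var hasFriends: Bool {
        lock.lock(); defer { lock.unlock() }
        return cachedHasFriends
    }

    private func store(_ value: Bool) {
        lock.lock(); defer { lock.unlock() }
        cachedHasFriends = value
    }

    // Helper for tests and teardown: returns once pending refreshes have run.
    func waitForPendingUpdates() {
        updateQueue.sync {}
    }
}
```

With this shape, the view only reads `hasFriends`, which is a quick memory read, and the expensive query moves to the moments when the underlying data actually changes.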
Now, I provide the same feature as before, but without the hang we saw in the MetricKit log. Another way to eliminate hangs is by moving work off the main thread. First, we need to determine what this work should be. In general, important tasks providing critical information for the UI should remain on the main thread. Furthermore, all views and view controllers have to be created, modified, and destroyed on the main thread.
However, the computation needed to update a UI element can be offloaded to another thread, with a completion handler to perform the actual update on the main thread. This pattern is useful when computation is known to take a long time. Other less important, maintenance, or non-time-critical tasks should be performed asynchronously on another thread.
These would then run at a lower scheduling priority and can take longer to complete than work that's on the main thread. This is intentional and reflects the idea that the main thread should only perform critical work.
The most straightforward way to perform asynchronous operations from the main thread is to use asynchronous counterparts of synchronous APIs. Let's take networking as an example.
By using async NSURL counterparts to synchronous networking APIs, apps will be responsive. Asynchronous APIs are often indicated by the word "asynchronously" or the presence of a completion handler in the method name.
Grand Central Dispatch is a powerful multi-threading mechanism, which you can leverage in cases where there aren't async API variants, or the code you want to move off the main thread is your own. Grand Central Dispatch provides simple mechanisms to move any block of work to another thread, both synchronously or asynchronously. This makes GCD incredibly effective at eliminating most general causes for hangs.
To perform a block of work asynchronously on another thread, dispatch async that block to another dispatch queue.
A completion handler can be added within the asynced block by dispatching back to the main queue.
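Put together, dispatching work to another queue and hopping back for the completion handler might look like the sketch below. The `totalCalories` function is a placeholder for whatever expensive computation your app performs, and the queue parameters are assumptions added so the completion queue can be swapped out.

```swift
import Dispatch

// Hypothetical expensive computation that should not run on the main thread.
func totalCalories(for ingredients: [Int]) -> Int {
    ingredients.reduce(0, +)
}

// Runs the computation on a background queue, then hands the result to the
// completion handler on `completionQueue` (the main queue by default).
func totalCaloriesAsync(for ingredients: [Int],
                        workQueue: DispatchQueue = DispatchQueue.global(qos: .userInitiated),
                        completionQueue: DispatchQueue = DispatchQueue.main,
                        completion: @escaping (Int) -> Void) {
    workQueue.async {
        let total = totalCalories(for: ingredients)
        // Dispatch back so the handler can safely update UI.
        completionQueue.async {
            completion(total)
        }
    }
}
```

The call to `totalCaloriesAsync` returns immediately, so the main thread keeps servicing events while the computation runs elsewhere.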
Grand Central Dispatch also enables you to pre-warm computation. By dispatch asyncing a task onto a queue, perhaps a prefetchQueue, the task will start executing while the main thread stays free to do other work. When these results are needed by the main thread, it can dispatch sync onto the prefetchQueue to wait for the task to complete. We've just touched the surface of what GCD can do. To learn more, check out the "Modernizing Grand Central Dispatch" talk from 2017.
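The pre-warming pattern can be sketched with a small wrapper around a serial prefetch queue. The `Prefetcher` type and its queue label are assumptions for illustration only.

```swift
import Dispatch

final class Prefetcher<Value> {
    // Hypothetical serial queue that owns the prefetched value.
    private let prefetchQueue = DispatchQueue(label: "com.example.prefetch", qos: .userInitiated)
    private var value: Value? // only touched on prefetchQueue

    // Starts the computation right away without blocking the caller.
    init(compute: @escaping () -> Value) {
        prefetchQueue.async {
            self.value = compute()
        }
    }

    // Waits only if the prefetch has not finished yet. Because the queue is
    // serial, this block always runs after the one enqueued in init.
    func get() -> Value {
        prefetchQueue.sync { value! }
    }
}
```

The earlier the prefetch starts relative to the first `get()`, the less likely the caller is to wait at all.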
Let's understand some tradeoffs with the solutions we just talked about. Caches use memory, so you should be cognizant of their size to avoid large memory growth. It is also important to ensure there's an accurate invalidation mechanism so values are not stale. Notifications can be chatty. When observing one, it is important to consider the frequency at which that notification fires. Adding a filter before handling or coalescing multiple notifications will reduce CPU churn. When using asynchronous APIs, it is important to know whether the operation in question should be asynced, particularly by first checking whether it is crucial for a UI update, as the operating system deprioritizes asynced work. When using Grand Central Dispatch to perform tasks asynchronously, you are changing the order in which tasks in your code execute. It is important to keep in mind which tasks have to be ordered relative to others to ensure your app does not break. Using dispatch_sync with serial queues is a great way to synchronize operations when needed. Compared to the severe impact hangs have on user experience, these tradeoffs are always worth it.
Here are some thoughts to keep in mind while eliminating hangs. Use Apple frameworks and APIs. These are already compatible with a wide set of devices, are performant, and are constantly updated to be more efficient and effective. Perform improvements iteratively in your code. This way, you'll make targeted fixes, and be able to see the effects of individual changes. Be a good neighbor when using system resources. Using more resources than needed not only reduces the performance of your own app, but can also cause other slowdowns in the system.
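Coalescing chatty notifications, as mentioned above, can be done in several ways. The sketch below is one minimal approach, with a hypothetical `Coalescer` type: any signals that arrive while a refresh is already scheduled are folded into that single pending refresh.

```swift
import Dispatch
import Foundation

final class Coalescer: @unchecked Sendable {
    private let lock = NSLock()
    private var isScheduled = false
    private let queue: DispatchQueue
    private let handler: () -> Void

    init(queue: DispatchQueue, handler: @escaping () -> Void) {
        self.queue = queue
        self.handler = handler
    }

    // Call this for every incoming notification.
    func signal() {
        lock.lock()
        let alreadyScheduled = isScheduled
        isScheduled = true
        lock.unlock()

        // A refresh is already pending, so this signal is absorbed by it.
        guard !alreadyScheduled else { return }

        queue.async {
            // Clear the flag before running, so signals that arrive during
            // the handler schedule a fresh run and no change is missed.
            self.lock.lock()
            self.isScheduled = false
            self.lock.unlock()
            self.handler()
        }
    }
}
```

A burst of notifications then costs one handler run instead of one per notification, at the price of the handler not knowing how many events it absorbed.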
Together, we experienced how jarring hangs can be, and just how important it is to defend against hangs in your app. Going forward, set performance baselines of your app via the Xcode organizer. During development and code reviews, watch out for anti-patterns that can cause hangs. We discussed seven of the most common ones. Diagnose issues that come up with the time profiler and system trace instruments, using MetricKit to prioritize issues customers are most frequently hitting. Eliminate any hangs you find by using caches, observing notifications, looking for asynchronous alternatives, or taking advantage of Grand Central Dispatch. By following these steps, your apps will have even better performance to provide the best possible user experience. Thanks for hanging around. [music]