Improve app size and runtime performance
Learn how we've optimized the Swift and Objective-C runtimes to help you make your app smaller, faster, and quicker to launch. Discover how you get access to efficient protocol checks, smaller message send calls, and optimized ARC simply by building your app with Xcode 14 and updating your deployment target.
- Have a question? Ask with tag wwdc2022-110363
- Search the forums for tag wwdc2022-110363
♪ ♪ Ahmed: Hi, my name is Ahmed, and I work on the Clang and Swift compilers. In this session we're going to dive deep into changes we've made to make common Swift and Objective-C operations faster and more efficient, so that we can improve your app's size and runtime performance.
When you write code in Swift or Objective-C, you're really interacting with two major components. First, you build using Xcode, and that uses the Swift and Clang compilers. But when you run your app, a lot of the heavy lifting is done in the Swift and Objective-C runtimes. The runtimes are embedded in the operating systems for all of our platforms. What the compiler cannot do at build time, the runtime does, well, at run time.

We're going to look at several improvements we've made in both the compilers and the runtimes. Now, this session is a bit unusual: there are no new APIs, language changes, or new build settings. You don't have to change your code, so all these improvements are transparent to you, the developer. Let's dive in.

We're going to look at four improvements. We've made protocol checks in Swift more efficient, we've made Objective-C message send calls smaller, we've made retain and release calls cheaper, and finally, we've made autorelease elision faster and smaller. Let's take a closer look.
Let's start with protocol checks in Swift.
Here we have a CustomLoggable protocol. It has a read-only computed property, customLogString, and we can use it in our log function, which has special handling for CustomLoggable objects. Then we define an Event type with name and date fields, and we conform it to the CustomLoggable protocol by defining the getter for the customLogString property.
And this lets us pass Event objects to our log function. When we execute this code, the log function needs to check whether the value we passed conforms to the protocol, and it does that using the 'as' operator. You may also have seen the 'is' operator.
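Reconstructed from the narration, the example might look like this in Swift (a sketch; the exact declarations aren't shown in the session, so the property bodies and the Event initializer here are assumptions):

```swift
import Foundation

protocol CustomLoggable {
    // Read-only computed property used for custom log output.
    var customLogString: String { get }
}

func log(_ value: Any) {
    if let loggable = value as? CustomLoggable {
        // Protocol check: succeeds only if `value` conforms to CustomLoggable.
        print(loggable.customLogString)
    } else {
        print(value)
    }
}

struct Event: CustomLoggable {
    let name: String
    let date: Date

    // Conformance: the getter for customLogString.
    var customLogString: String { "\(name) on \(date)" }
}

log(Event(name: "WWDC", date: Date()))  // takes the CustomLoggable path
```

The `as?` cast inside `log` is the protocol check the rest of this section is about; `is` would perform the same conformance lookup without producing the cast value.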
Whenever possible, this check is optimized away at build time, in the compiler. However, we don't always have enough information at build time, so the check often needs to happen in the runtime, with the help of protocol check metadata computed earlier. With this metadata, the runtime knows whether this particular object really does conform to the protocol, and the check succeeds.
Part of the metadata is built at compile time, but a lot can only be built at launch time, particularly when using Generics.
When you use a lot of protocols, this can add up to hundreds of milliseconds. On real-world apps, we've seen it take up to half of the launch time. With the new Swift runtime, we now precompute this metadata ahead of time, as part of the dyld closure for the app executable and any dylibs it uses at launch. Best of all, this is enabled even for existing apps when running on iOS 16, tvOS 16, or watchOS 9. If you'd like to learn more about dyld and launch closures, watch the talk "App Startup Time: Past, Present, and Future." That was protocol checks in Swift.
Let's move on to message send.
With the new compilers and linker in Xcode 14, we've made message send calls up to 8 bytes smaller, down from 12, on ARM64. As we'll see in just a moment, message send is everywhere, so this adds up: we've seen up to 2% code size improvements on binaries. This is enabled automatically when building with Xcode 14, even if you use an older OS release as your deployment target. It defaults to a balance of size and performance wins, but you can opt into optimizing for size only, using the -objc_stubs_small linker flag.

Now let's look into what changed, starting with an example. Here we're trying to make an NSDate for the start day of the conference. We start by making an NSCalendar, then we fill out an NSDateComponents, make a date out of that, and finally return it.

Now let's look at the assembly the compiler generates. The details of the assembly aren't super important; we compiler folks stare at it all day so that you don't have to. What's important is that almost every line here ends up needing an instruction to call objc_msgSend, even when doing property accesses like we do for the date components. This is because at compile time, we don't know which method to call; only the Objective-C runtime does. So we call into the runtime using objc_msgSend to ask it to find the right method.

Let's focus on one of these calls. We already mentioned the instruction to call objc_msgSend, but there's more. To tell the runtime which method to call, we have to pass a selector to these objc_msgSend calls, and that needs a couple more instructions to prepare the selector. When we look at the binary, each of these instructions takes a little bit of space: on ARM64, that's 4 bytes each. So for each of these objc_msgSend calls, we're using 12 bytes, and we need that for every single one of these calls; that really adds up. Let's see what we can do to improve that.
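The example the narration walks through might look roughly like this in Objective-C (the function name and the specific date components are assumptions); the point is that nearly every line lowers to an objc_msgSend call:

```objc
#import <Foundation/Foundation.h>

static NSDate *conferenceStartDate(void) {
    NSCalendar *cal =
        [NSCalendar calendarWithIdentifier:NSCalendarIdentifierGregorian]; // msgSend
    NSDateComponents *dateComponents = [NSDateComponents new];             // msgSend
    dateComponents.year  = 2022;  // property access is still a msgSend (setYear:)
    dateComponents.month = 6;     // msgSend (setMonth:)
    dateComponents.day   = 6;     // msgSend (setDay:)
    NSDate *theDate = [cal dateFromComponents:dateComponents];             // msgSend
    return theDate;
}
```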
Now, as we've seen, 8 of those bytes are dedicated to preparing the selector. The interesting thing is that for any given selector, it's always the same code. And this is where our optimization comes in: since it's always the same code, we can share it, and emit it only once per selector instead of at every message send. We can pull it out into a little helper function and call that function instead. Over many calls using the same selector, we save all those instruction bytes. We call this helper function a "selector stub." We still need to call the real objc_msgSend function, though, so we continue on to that. And that has another, different, indirection to load the address of the function itself and call it. The details aren't important; what matters is that we need another several bytes of code to do that.
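Schematically, on ARM64 the transformation looks something like this (illustrative assembly, not exact compiler output; the stub and selector symbol names here are made up):

```
; Before: every call site prepares the selector itself (12 bytes)
adrp  x1, _sel_date@PAGE           ; 4 bytes: prepare selector, part 1
ldr   x1, [x1, _sel_date@PAGEOFF]  ; 4 bytes: prepare selector, part 2
bl    _objc_msgSend                ; 4 bytes: the call itself

; After: the call site is a single branch (4 bytes)
bl    _objc_msgSend$date

; The selector stub, emitted once and shared by every call site:
_objc_msgSend$date:
    adrp  x1, _sel_date@PAGE
    ldr   x1, [x1, _sel_date@PAGEOFF]
    b     _objc_msgSend            ; with -objc_stubs_small this is a second
                                   ; hop through a shared msgSend stub; the
                                   ; default mode folds that indirection in
                                   ; here to avoid back-to-back calls
```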
And this is where you can choose which mode you want, as I mentioned earlier. We can either keep these two little stub functions separate, like we've done here. We get to share the most code, and make these functions as small as possible. But unfortunately, this would do two calls back to back, which is not ideal for performance. So we can further improve this with an alternative version. We can take these two stub functions we've created, combine them into one. That way, we keep the code closer together and we don't need as many calls. And that's on the right here.
So these are the two options. You can optimize for size alone and get the maximum size savings available, which you enable using the -objc_stubs_small linker flag; or you can use the default code generation, which provides size benefits while keeping the best performance. Unless you're severely size-constrained, we recommend the default. And that was smaller message send using stubs.

Another improvement we've made is making retain/release cheaper. With the new compilers in Xcode 14, retain/release calls are now up to 4 bytes smaller, down from 8, on ARM64. As we'll see in just a moment, just like message send, retain/release is also everywhere, so this adds up: we've seen up to 2% more code size improvements on binaries. Now, unlike message send stubs, this does need runtime support, so you'll get it automatically as you move to a deployment target of iOS 16, tvOS 16, or watchOS 9.

Now let's look into what changed. Let's go back to our example. We talked about msgSend calls, but with Automatic Reference Counting, or ARC, we also end up with a lot of retain/release calls inserted by the compiler. At a very high level, whenever we make a copy of a pointer to an object, we need to increment its retain count to keep it alive. Here, this happens with our variables cal, dateComponents, and theDate. We do that by calling into the runtime using objc_retain. When the variables go out of scope, we then need to decrement the retain count using objc_release. Of course, part of the benefit of ARC is all the compiler magic that eliminates a lot of these calls, to keep them to a minimum, and we'll go into one of these magic tricks a little later. But even with all the magic, we still often need these calls. In this example, we end up needing to release our local copies of the calendar and the date components.
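As a rough sketch, if the ARC-inserted calls were written out by hand, the date example would look something like this (a simplification: in reality you never write these calls yourself, the compiler emits them, and it actually uses specialized entry points such as objc_retainAutoreleasedReturnValue rather than plain objc_retain here):

```objc
#import <Foundation/Foundation.h>

// The runtime's plain C entry points that ARC calls under the hood.
extern id   objc_retain(id);
extern void objc_release(id);

static NSDate *conferenceStartDate(void) {
    NSCalendar *cal = objc_retain(
        [NSCalendar calendarWithIdentifier:NSCalendarIdentifierGregorian]);
    NSDateComponents *dateComponents = [NSDateComponents new]; // already +1 from new
    dateComponents.day = 6;
    NSDate *theDate = [cal dateFromComponents:dateComponents];
    objc_release(dateComponents); // local copy goes out of scope
    objc_release(cal);            // local copy goes out of scope
    return theDate;               // handed back via the autorelease convention
}
```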
Under the hood, these objc_retain/release functions are just plain C functions that take a single argument: the object to be retained or released. So with ARC, the compiler inserts calls to these C functions, passing the appropriate object pointers. Because of that, these calls have to respect the C calling convention defined by our platform's Application Binary Interface, or ABI. Concretely, that means we need even more code to make these calls, just to pass the pointer in the right register, so we end up with a few additional 'mov' instructions for that alone.

And that's where our new optimization comes in. By specializing retain/release with a custom calling convention, we can opportunistically use the right variant depending on where the object pointer already is, so that we don't need to move it. Concretely, this means we get rid of a bunch of redundant code at all these call sites. And again, while this may not seem like much for these puny little instructions, over an entire app, it really adds up. That's how we made retain/release operations cheaper.

Finally, let's talk about autorelease elision. This one is even more interesting. With Objective-C runtime changes, we've made autorelease elision faster; that happens automatically for existing apps when you run them on the new OS releases. Building on top of that, with additional compiler changes, we also made the code smaller, and you'll get this size benefit automatically as you move to a deployment target of iOS 16, tvOS 16, or watchOS 9.
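In ARM64 terms, the retain/release specialization just described looks something like this (illustrative; the exact names of the specialized runtime entry points are an implementation detail):

```
; C calling convention: the object must first be moved into x0 (8 bytes)
mov  x0, x20            ; shuffle the pointer into the argument register
bl   _objc_release

; Specialized convention: the runtime reads the pointer from x20
; directly, so the mov disappears (4 bytes)
bl   _objc_release_x20
```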
Now this is all great, but what's autorelease elision in the first place? Let's go back to our example. I mentioned earlier that ARC already gives us a lot of compiler magic to optimize retains and releases, so let's focus on one case here: autoreleased return values. In this example, we made a temporary object, and we're returning it to our caller. We have our temporary theDate, we return it, the call completes, and the caller saves it to its own variable.

Let's see how that works with ARC. ARC inserts a retain in the caller and a release in the called function. Here, when we return our temporary object, we need to release it first in the function, because it's going out of scope. But we can't do that just yet, because it doesn't have any other references; if we did release it, it would be destroyed before we even return, and that's no good. So a special convention is used to be able to return the temporary: we autorelease it before the return so that the caller can then retain it. You've likely seen autorelease and autorelease pools before; it's simply a way to defer a release until some later point. The runtime doesn't really make any guarantees as to when the release happens, but as long as it's not right here, right now, it's convenient, because it lets us return this temporary object. Now, this isn't free: there is some overhead to doing an autorelease. This is where autorelease elision comes in.

To understand how that works, let's look at the assembly and retrace this return. When we call autorelease, that goes into the Objective-C runtime, and that's where the fun begins. The runtime tries to recognize what's happening: that we're returning an autoreleased value. To help it out, the compiler emits a special marker that we never use otherwise. It's there to tell the runtime that this is eligible for autorelease elision, and it's followed by the retain that we will execute later.
But right now, we're still in the autorelease, and when we do it, the runtime loads the special marker instruction, as data, and compares it to see if it is the special marker value it expects. If it is, that means the compiler told the runtime that we're returning a temporary that will immediately be retained. And this lets us elide, or remove, the matching autorelease and retain calls. And that's autorelease elision.
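Putting the pieces together, the old sequence looks roughly like this on ARM64 (illustrative; the callee name is made up):

```
; Callee: return the temporary through the autorelease convention
_makeDate:
    ...
    b   _objc_autoreleaseReturnValue     ; instead of releasing theDate here

; Caller, old scheme:
    bl  _makeDate
    mov x29, x29                         ; the special marker: a no-op the
                                         ; runtime loads *as data* to detect
                                         ; that a retain follows immediately
    bl  _objc_retainAutoreleasedReturnValue
```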
However, this is not free either: loading code as data isn't something that's common otherwise, so it's not optimal on modern CPUs. We can do better.

Let's retrace the return sequence again, this time using the new way. We start at the autorelease. That still goes into the Objective-C runtime. At this point, we actually already have a valuable piece of information: the return address. It tells us where we need to return to after this function completes execution, so we can keep track of it. Thankfully, getting the return address is very cheap: it's just a pointer, and we can store it on the side. We then leave the runtime autorelease call, return to the caller, and re-enter the runtime when doing the retain.

And this is where the new bit of magic happens. At that point, we can look at where we are and get a pointer to our current return address. In the runtime, we can compare this pointer we just got while doing the retain with the one we saved earlier while doing the autorelease. Since we're just comparing two pointers, this is super cheap; we don't need to do expensive memory accesses. If the comparison succeeds, we know we can elide the autorelease/retain pair, and we save all of that work.
And on top of that, now that we don't need to compare this special marker instruction as data anymore, we don't need it, so we can remove it. And that lets us save some code size as well. That's how we made autorelease elision faster and smaller.
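With the new scheme, the marker disappears and each call site shrinks by 4 bytes (again illustrative):

```
; Caller, new scheme:
    bl  _makeDate                  ; the runtime saved the return address
                                   ; inside objc_autoreleaseReturnValue
    bl  _objc_retainAutoreleasedReturnValue
                                   ; the runtime compares its current return
                                   ; address with the saved one; if they
                                   ; match, the pair is elided
```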
We went through several Swift and Objective-C runtime improvements; let's wrap up. When your app runs on the new OS releases, thanks to the improvements in the runtimes, Swift protocol checks are more efficient, and every attempted autorelease elision is faster too. Thanks to the new compilers and linker in Xcode 14 and message send stubs, you can save up to 2% of code size just by rebuilding your app. And finally, when you update your deployment target to iOS 16, tvOS 16, or watchOS 9, you can save another 2% by making retain/release calls smaller, and even more thanks to the smaller autorelease elision sequence. I hope you enjoyed this deep dive into the Swift and Objective-C runtimes, and thanks for watching.