Bring your PyTorch models to Core ML and discover how you can leverage on-device machine learning in your apps.
The PyTorch machine learning framework can help you create and train complex neural networks. After you build these models, you can convert them to Core ML and run them entirely on-device, taking full advantage of the CPU, GPU, and Neural Engine.
Discover how the coremltools package can directly convert TorchScript models, and learn more about handling custom operations and conversion errors which may occur.
To learn even more about Core ML Converters, we recommend the video "Get models on device using Core ML Converters" from WWDC20.
Hello. My name is Steve, and I’m an engineer at Apple. Hi. I’m Paul. I’m also an engineer. In this video, we are going to walk you through a deep dive into one of the new aspects of Core ML, converting PyTorch models to Core ML.
At WWDC 2020, we announced an overhaul to Core ML converters that improved many aspects of the conversion process.
We’ve expanded support for the libraries most commonly used by the deep learning community. We’ve redesigned the converter architecture to improve user experience, leveraging a new in-memory representation. And we’ve unified the API so there’s a single call to invoke conversion from any model source. If you haven’t already seen it, I definitely recommend you check out the video that goes into the details of this new converter architecture. But in this video, I’m going to focus on model conversion, starting with a model built in the PyTorch deep learning framework.
So maybe you’re an ML engineer who’s been hard at work training a model using PyTorch. Or maybe you’re an app developer who’s found a killer PyTorch model online and now you want to drop that model into your app. Now the question is how do you convert that PyTorch model into a Core ML model? Well, the old Core ML converter required you to export your model to ONNX as a step in the process. And if you’ve used that converter, you might have run into some of its limitations. ONNX is an open standard, and so it can be slow to evolve and introduce new features. Compounding that, ML frameworks like PyTorch need time to add support for exporting their latest model features to ONNX. So with the old converter, you might have found yourself with a PyTorch model that you couldn’t export to ONNX, blocking its conversion to Core ML. Well, removing this extra dependency is just one of the things that’s changed in the new Core ML converter.
So in this video, we’ll dig into the details of the brand-new PyTorch model conversion path. We’ll walk through the different ways of converting a PyTorch model into Core ML, including some real-world conversion examples. And finally, I’ll share some helpful tips for you to follow if you run into trouble along the way.
So now let’s dive into the new conversion process.
Starting with the PyTorch model you want to convert, you’ll use PyTorch’s JIT module to convert to a representation called TorchScript. If you’re curious, JIT is an acronym that stands for Just In Time. Then with a TorchScript model in hand, you’ll invoke the new Core ML converter to generate an ML model which you can drop into your app. Later in the video, I’ll dig into what that TorchScript conversion process looks like. But now let’s look at how the new Core ML converter works. The converter is written in Python, and invoking it only takes a couple lines of code. You simply provide it with a model, which can either be a TorchScript object or the path to one saved on disk, and a description of the inputs to the model. You can also include some information about the outputs of the model, but that’s optional. The converter works by iterating over the operations in the TorchScript graph and converting them one by one to their Core ML equivalent. Sometimes one TorchScript operation might get converted into multiple Core ML operations. Other times, the graph optimization pass might detect a known pattern and fuse several operations into one.
Now, sometimes a model might include a custom operation that the converter doesn’t understand. But that’s okay. The converter is designed to be extensible, so it’s easy to add definitions for new operations. In many cases, you can express that operation as a combination of existing ones, which we call a "composite op." But if that isn’t sufficient, you can also write a custom Swift implementation and target that during conversion. I won’t get into the details of how to do that in this video, but please check out our online resources for examples and walk-throughs. Now that I’ve given an overview of the whole conversion process, it’s time to circle back and dig into how to get a TorchScript model from your PyTorch model. There are two ways PyTorch can do this. The first is called "tracing," and the second is called "scripting." Let’s first look at what it means to trace a model.
Tracing is done by invoking the trace method of PyTorch’s JIT module, as shown in this code snippet. We pass in a PyTorch model along with an example input, and it returns the model and TorchScript representation. So what does this call actually do? The active tracing runs an example input through a forward pass of the model and captures the operations that are invoked as the input makes its way through the model’s layers. The collection of all those operations then becomes the TorchScript representation of the model. Now when you’re picking an example input to trace with, the best thing to use is data similar to what the model will see during normal use. For instance, you could use one sample of validation data or data captured the same way your app will present it to the model. You could also use random data. If you do, make sure that the range of the input values and the shape of the tensor is consistent with what the model expects. Let’s make all of this a little more concrete by working through an example. I’d like to introduce my colleague Paul, who will take you through the full process of converting a segmentation model from PyTorch to Core ML. Thanks, Steve. Suppose I have a segmentation model, and I would like it to run on-device. If you aren’t familiar with what a segmentation model does, it takes an image and assigns a class probability score to each pixel of that image. So how would I get my model running on-device? I’m going to convert my model into a Core ML model. To do this, I first trace my PyTorch model to turn it into TorchScript form using PyTorch’s JIT tracing module. Then I use the new Core ML converter to convert the TorchScript model into a Core ML model. Finally, I will show off how the resulting Core ML model integrates seamlessly into Xcode. Let’s see what this process looks like in code.
In this Jupyter Notebook, I will convert my PyTorch segmentation model, mentioned in the slides, into a Core ML model. If you’d like to try this code for yourself, it is available in the code snippet associated with this video. First, I import some dependencies that I will use for this demo.
Next, I load in the ResNet-101 segmentation model from torchvision and a sample input: in this case, an image of a dog and cat.
PyTorch models take in tensor objects, not PIL Image objects. So I convert the image to a tensor with transforms.ToTensor. The model also expects an extra dimension in the tensor denoting the batch size, so I add that in as well.
As mentioned in the slides, the Core ML converter accepts a TorchScript model. To obtain this, I use the Torch.JIT module’s trace method, which converts a PyTorch model to a TorchScript model.
Uh-oh. Tracing has thrown an exception. As it says in the exception method, "Only tensors or tuples of tensors can be output from traced functions." This is a limitation of PyTorch’s JIT module. The problem here is that my model is returning a dictionary. I solve this by wrapping my model in a PyTorch module that extracts only the tensor value from the output dictionary. Here I declare my class wrapper that inherits from PyTorch’s module class.
I define a model attribute which contains ResNet-101, as used above. In the forward method of this wrapping class, I index the returned dictionary with the key named "out" and return just the tensor output.
Now that the model returns a tensor and not a dictionary, it will successfully trace.
It is now time for me to utilize the new Core ML converter. First, I need to define my input and its preprocessing.
I define my input as an ImageType with preprocessing that normalizes the image with ImageNet statistics and scales its values down to lie between 0 and 1. This preprocessing is what ResNet-101 expects.
Next, I simply call the Core ML tools convert method, passing in the TorchScript model and the input definition.
After conversion, I’ll set the metadata of my model so it can be understood by other programs such as Xcode. I set the type of my model to segmentation and enumerate the classes in my model’s order.
So, does my converted model work? I can easily visualize my model’s output through Xcode. First, I’ll save my model.
Now all I need to do is click on my saved model in Finder, and it will be opened by Xcode. Here I can view its metadata, including input shapes and types.
To visualize the model’s output, I’ll go to the Preview tab and drag in my sample image of a dog and cat.
Looks like my model is successfully segmenting the pets in this image.
ResNet-101 was able to be traced, but some models cannot just be traced. To explain how to get these other models to convert, I’ll kick it back to Steve.
Thanks, Paul. Okay. I think we have a pretty good handle on how conversion works using tracing. But PyTorch offers a second way to get TorchScript. So now let’s dig into that one, which is called "scripting." Scripting works by taking a PyTorch model and directly compiling into TorchScript operations. Remember, tracing captured the model as data flowed through it. But like tracing, scripting a model is also really easy. Simply invoke the script method of PyTorch’s JIT module and provide it with a model. Okay. I’ve shown you two different ways to get a TorchScript representation, and you might be wondering when to use one versus the other. One case where you must use scripting is if the model includes control flow. Let’s look at an example to understand why. Here, this model has branches and loops, and scripting will capture all of it because it is directly compiling the model. If we traced the model, what we get is only the path through the model for the given input, which you can see doesn’t capture the whole model. If you do need to script a model, you’ll usually get the best results if you trace as much of the model as possible and script only the parts of the model that need it. This is because tracing usually produces a simpler representation than scripting does. Let’s see how to apply this idea by looking at some code. In this example, I’ve got a model that runs some chunk of code a fixed number of times inside a loop. I’ve isolated the body of the loop into something that can easily be traced on its own, and then I can apply scripting to the model as a whole. What we’re basically doing is limiting the scripting to just the bits of control flow that need it and then tracing everything else. This mixing of tracing and scripting works because they both will skip over code that’s already been converted to TorchScript.
Now it’s time to go through a concrete example that uses scripting. I’ll hand it back over to Paul, who will walk you through converting a language model. Hey there. Suppose I have a sentence completion model that I want to convert into a Core ML model so it can run on-device. For some context, sentence completion is a task that involves taking a sentence fragment and using a model to predict the words that are likely to come after it. So what does this look like in terms of computation steps? I’ll start with a few words of a sentence fragment and pass them through what’s called an encoder that translates those words into a representation my model can understand. In this case, a sequence of integer tokens. Next, I’ll pass that sequence of tokens into my model, which will predict the next token in the sequence. I will continue feeding my model the partially constructed sentence, appending new tokens to the end until my model predicts a special end-of-sentence token, which means my sentence is completed. Now that I have a complete sentence of tokens, I’ll pass it through a decoder, which converts the tokens back into words. The middle part of this diagram, completing the list of tokens, is what I will convert into a Core ML model. The encoder and decoder are handled separately. Let’s make sure we understand what’s going on by looking at some pseudo-code. The core of my model is my next token predictor. For this, I will use Hugging Face’s GPT2 model. The predictor takes a list of tokens as inputs and gives me a prediction for the next token. Next, I’ll wrap some control flow around the predictor to continue until I see the end-of-sentence token.
Inside the loop, I append the predicted token to the running list and use that as the input to my predictor on every loop. When my predictor returns the end-of-sentence token, I’ll return the complete sentence for decoding. Now to see this whole process encode, let’s dive into the Jupyter Notebook. In this notebook, I’ll construct a language model that takes a sentence fragment and completes the sentence. Let’s get the imports out of the way.
Here is the code for my model.
My model inherits from torch.Module and contains attributes for the end-of-sentence token, the next_token_predictor model and the default token denoting the beginning of a sentence. In its forward method, just like in the slides, I’ve written a loop body that takes a list of tokens and predicts the next one. The loop continues until the end-of-sentence token is generated. When this happens, we will return the sentence.
As mentioned, my next token predictor will be GPT2, which will reside in the loop body.
I’m going to follow the practice of tracing the loop body separate from scripting the whole model. So I’ll run the JIT tracer on only my next token predictor. It takes a list of tokens as inputs, so for tracing, I’ll just pass in a list of random tokens.
I can see that the tracer emitted a warning telling me this trace might not generalize to other inputs. Note that this warning is from the PyTorch’s JIT tracer and not Core ML. It will be explained what’s going on here in the troubleshooting section later, but for now I’ll ignore this warning since there isn’t actually a problem. With the bulk of the loop body traced, I can instantiate my sentence finishing model and apply the JIT scripter to prepare it for conversion to Core ML.
Now with my TorchScript model, I convert it to a Core ML model just like in the segmentation demo.
Now I’ll see if my model can finish a sentence. I create a sentence fragment: in this case, "The Manhattan Bridge is." Then I run it through GPT2’s included encoder to get the fragment’s encoding, and then convert that list of tokens into a Torch tensor.
Next, I package up the input from my Core ML model, run said model and decode the outputs with GPT2’s included decoder.
Nice. The Core ML model was able to complete the sentence. Looks like it generated a statement about the Manhattan Bridge. You might run into bumps along the road as you trace and script your models to get them into a Core ML format. I’ll hand it back to Steve to help you along the way.
Before we wrap up, I want to review the snags we hit when converting PyTorch models to Core ML and go over some troubleshooting tips and best practices. Thinking back to the segmentation demo, remember we encountered an error during tracing. This was because our model returned a dictionary and JIT tracing can only handle tensors or tuples of tensors. The solution we showed in the demo was to create a thin wrapper around the model that unpacks the model’s native outputs. Remember, in this example, the model returned a dictionary, so here we’re accessing the dictionary key that represents the inference result and returning that tensor. Of course, this idea also works if we want to access and return multiple items from the dictionary or if we need to unpack other types of containers.
Now during the language model demo, we encountered a tracer warning that said the trace might not generalize to other inputs. And we see the tracer helpfully print the troublesome line of code. So what’s actually going on? If we look at the model source code to understand the warning, we see that the model is slicing one tensor based on the size of another tensor. Getting the size of a tensor results in a bare Python value-- in other words, not a PyTorch tensor-- and the tracer is warning that it can’t trace the math operations being performed on these bare Python values. However, in this case the tracer is a little too aggressive in emitting this warning, and there isn’t actually a problem.
A good rule of thumb when it comes to tracing code that operates on bare Python values is that only built-in Python operations will be captured correctly by the tracer. Here are a few examples to help explain this idea. Let’s think through these and figure out, based on that rule of thumb, if they will be traced correctly or not. The first example is very similar to what we saw during the demo and will result in a correct trace since a built-in operation, in this case addition, is being applied. The second example also will trace correctly, in this case using the modulo operator, which again is a built-in operation.
But the third example won’t trace correctly. The JIT tracer doesn’t know what the library function math.sqrt does, and the traced graph will have a constant value recorded instead of the operations to compute the tensor size and square root. But with a simple fix to the model to replace math.sqrt with Python’s built-in power operator, this will result in a correct trace. Now let’s look at a case where scripting a model can fail. This model starts with an empty list and successively appends a fixed set of integers to it. Keep in mind this isn’t a terribly useful model. I’m just using it to illustrate a failure condition. If I script this model, I’ll get a runtime error that hints at a type mismatch. The JIT scripter needs type information to turn a model into TorchScript and does a pretty good job inferring object types from context. However, there are times when that’s not possible, and if the scripter can’t figure out an object’s type, it assumes the object is a tensor. In this case, it’s assuming this list is a list of tensors while it’s actually being built as a list of integers. So what can I do to help the scripter out? Well, I can either include meaningful initialization of the variable or I can use type annotations. Here, I’ve adjusted the model to show examples of both.
There’s one last thing I want to mention. You always want to make sure your model is in evaluation mode before tracing. This ensures that all the layers are configured for inference rather than training. For most layers, this doesn’t matter. But, for example, if you have a dropout layer in your model, setting evaluation mode will make sure it’s disabled. And when the converter encounters operations that have been disabled, it will treat them as pass-through operations. We’ve covered a lot of material in this video, but you can find even more information in the links associated with the video, including the Core ML converter documentation, information about custom op conversion and many detailed TorchScript examples. We’re really excited to provide first-class support for converting PyTorch models. I hope you’ll find that the new Core ML converter will enable broader support for your PyTorch models, empower you to have optimized on-device model execution and really provide you with maximum support to get your model converted easily. Thanks for watching.