Explore Computer Vision APIs


Learn how to bring Computer Vision intelligence to your app when you combine the power of Core Image, Vision, and Core ML. Go beyond machine learning alone and gain a deeper understanding of images and video. Discover new APIs in Core Image and Vision to bring Computer Vision to your application like new thresholding filters as well as Contour Detection and Optical Flow. And consider ways to use Core Image for preprocessing and visualization of these results. To learn more about the underlying frameworks see "Vision Framework: Building on Core ML" and "Core Image: Performance, Prototyping, and Python." And to further explore Computer Vision APIs, be sure to check out the "Detect Body and Hand Pose with Vision" and "Explore the Action & Vision app" sessions.

Resources
Related Videos

WWDC21
- Extract document data using Vision
WWDC 2020
- Detect Body and Hand Pose with Vision
- Explore the Action & Vision app
WWDC 2019
- Text Recognition in Vision Framework
- Understanding Images in Vision Framework

Hello and welcome to WWDC.
Welcome to WWDC. My name is Frank Doepke and together with my colleague David Hayward, we're going to explore Computer Vision APIs.
So why would we talk about Computer Vision? Computer Vision can really enhance your application. And even if it's not at the core of your business, it really brings something new to your application.
Let me give you an example. Banking applications allow you to deposit checks. They use Computer Vision for the camera to actually read the check for you, so you don't have to type in the information anymore. And clearly Computer Vision is not at the core of the banking industry. But by doing this you really can save a lot of steps for your user. They don't have to type anything in anymore.
Another thing might be that you want to just, for instance, read a QR code or a receipt. All of that may not be at the core of what you wanna do in your application, but it really makes it much easier for your users to do this by using the camera.
So what APIs do we have available for Computer Vision? At the highest level, we have VisionKit. It's the home of the VNDocumentCameraViewController that you might have seen in Notes, Messages, or Mail to scan a document. Then we use Core Image to do the image processing, Vision for the analysis of images, and last but not least, Core ML to do the machine learning inference.
Today we're gonna focus just on Core Image and Vision. But I wanna make sure that you don't just think of them as pillars that stand side by side. They can actually be nicely intertwined. I might want to do some image preprocessing, run the result through Vision, take the results from there, and feed them into Core ML, or back into Core Image to create some effects. Now to talk about how we want to use Core Image to preprocess images for Computer Vision, I would like to hand it over to my colleague David Hayward.
Thank you, Frank. I'd like to take this opportunity to describe how you can improve your Computer Vision algorithms using Core Image.
If you are unfamiliar with Core Image, it is an optimized, easy-to-use image processing framework built upon Metal. For a deep dive on how it works, I recommend you watch our WWDC 2017 presentation on the subject.
There are two primary reasons why your app should use Core Image with Vision.
Using Core Image to preprocess an input to Vision can make your algorithms faster and more robust.
Using Core Image to post-process the outputs from Vision can give your app new ways to show those results to your users.
Also, Core Image is a great tool for doing augmentation for machine learning training. There are some great examples of this in our presentation from WWDC in 2018.
One of the best ways to prepare an image for analysis is to downscale it for best performance. The scaler with the best overall quality is CILanczosScaleTransform.
It is very easy to use this filter in your code. All you need to do is import the CIFilterBuiltins header, create a filter instance, set the input properties, and then get the outputImage. It's that easy.
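The steps just described could be sketched roughly as follows. This is a minimal sketch, assuming the Swift `CIFilter.lanczosScaleTransform()` builder from `CoreImage.CIFilterBuiltins`; the function name `downscaled(_:by:)` and its `factor` parameter are hypothetical and only there for illustration.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Downscales an image with the Lanczos resampling filter.
/// `factor` is a hypothetical parameter: 0.5 halves both dimensions.
func downscaled(_ image: CIImage, by factor: Float) -> CIImage? {
    let filter = CIFilter.lanczosScaleTransform() // create a filter instance
    filter.inputImage = image                     // set the input properties
    filter.scale = factor
    filter.aspectRatio = 1.0                      // keep the original aspect ratio
    return filter.outputImage                     // get the output image
}
```

The returned image is only a recipe; nothing is rendered until it is drawn by a CIContext or handed to Vision.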
But that is just one of several resampling filters in Core Image. Depending on your algorithm, it may be better to use the linear interpolated CIAffineTransform.
Morphology operations are a great technique to make small features in your image more prominent.
Performing Dilate using CIMorphologyRectangleMaximum will make brighter areas of the image larger.
Performing Erode using CIMorphologyRectangleMinimum will make those areas smaller.
Better still, is to perform Close using CIMorphologyRectangleMinimum followed by CIMorphologyRectangleMaximum. And this is very useful for removing small areas of noise from your image that may affect the algorithm.
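A minimal sketch of chaining the two morphology filters in the order described, assuming the builders from `CoreImage.CIFilterBuiltins`. The helper name `minimumThenMaximum(_:size:)` is hypothetical. Note that in common morphology terminology a minimum (erode) followed by a maximum (dilate) is usually called an opening, and the reverse order a closing, so swap the two stages if your features call for the other operation.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Applies a rectangular minimum followed by a rectangular maximum.
/// `size` is the width and height of the rectangle in pixels (hypothetical parameter).
func minimumThenMaximum(_ image: CIImage, size: Float) -> CIImage? {
    let erode = CIFilter.morphologyRectangleMinimum()
    erode.inputImage = image
    erode.width = size
    erode.height = size

    let dilate = CIFilter.morphologyRectangleMaximum()
    dilate.inputImage = erode.outputImage
    dilate.width = size
    dilate.height = size
    return dilate.outputImage
}
```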
Some algorithms only need monochrome inputs, and for these, Vision will automatically convert RGB to grayscale. If you have domain knowledge about your input images, you might get better results using Core Image to convert to gray.
With CIColorMatrix you can specify any weighting you want for this conversion.
Or with CIMaximumComponent, the channel with the greatest signal will be used.
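The two grayscale conversions mentioned above might look like this. This is a sketch under stated assumptions: the function names and the example weights are hypothetical, and the right weights depend on what you know about your own images.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Converts to gray with caller-chosen channel weights using CIColorMatrix.
/// Every output channel receives the same weighted sum of the input channels.
func weightedGray(_ image: CIImage, r: CGFloat, g: CGFloat, b: CGFloat) -> CIImage? {
    let weights = CIVector(x: r, y: g, z: b, w: 0)
    let filter = CIFilter.colorMatrix()
    filter.inputImage = image
    filter.rVector = weights
    filter.gVector = weights
    filter.bVector = weights
    filter.aVector = CIVector(x: 0, y: 0, z: 0, w: 1) // leave alpha untouched
    return filter.outputImage
}

/// Converts to gray by taking the largest of the R, G, and B values per pixel.
func maximumComponentGray(_ image: CIImage) -> CIImage? {
    let filter = CIFilter.maximumComponent()
    filter.inputImage = image
    return filter.outputImage
}
```

For example, `weightedGray(image, r: 0, g: 0, b: 1)` would keep only the blue channel, which could help when the signal you care about lives mostly in blue.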
Noise reduction before image analysis is also worth consideration.
A couple passes of CIMedianFilter can reduce noise without softening the edges.
CIGaussianBlur and CIBoxBlur are also fast ways to reduce noise.
And consider using the CINoiseReduction filter too.
Core Image also has a variety of edge detection filters.
For a Sobel edge detector, you can use CIConvolution3X3.
Even better is to use CIGaborGradients, which will produce a 2D gradient vector that is also more tolerant of noise.
Enhancing the contrast of an image can aid in object detection.
CIColorPolynomial allows you to specify an arbitrary 3rd degree contrast function. CIColorControls provides a linear contrast parameter.
Core Image also has some new filters this year that can convert your image to just black and white.
For example, CIColorThreshold allows you to set the threshold value in your application code, while CIColorThresholdOtsu will automatically determine the best threshold value based on the image's histogram.
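As a rough illustration, the two thresholding filters can be wrapped like this. The function names are hypothetical, and the sketch assumes an SDK in which these filters are available (they were introduced in the 2020 releases).

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Thresholds the image at a value chosen by the application (0.0 to 1.0).
func fixedThreshold(_ image: CIImage, threshold: Float) -> CIImage? {
    let filter = CIFilter.colorThreshold()
    filter.inputImage = image
    filter.threshold = threshold
    return filter.outputImage
}

/// Thresholds the image at a value derived from the image's histogram.
func otsuThreshold(_ image: CIImage) -> CIImage? {
    let filter = CIFilter.colorThresholdOtsu()
    filter.inputImage = image
    return filter.outputImage
}
```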
Core Image also has filters for comparing two images. This can be useful to prepare for detecting motion between frames of video.
For example, CIColorAbsoluteDifference is a new filter this year that can help with this.
Also, the CILabDeltaE will compare two images using a formula designed to match human perception of color.
These are just a sampling of the more than 200 filters built into Core Image.
To help you use these built-in filters, the documentation includes parameter descriptions, sample images, and even sample code.
And if none of these filters suit your needs, then you can easily write your own using Metal Core Image. We recommend that you see our session on that subject, which we also made available this year.
With image processing and Computer Vision, it is important to be aware that images can come in a wide variety of color spaces.
Your app may receive images in spaces ranging from the traditional sRGB, to wide gamut P3, even to HDR color spaces, which are now supported.
Your app should be prepared for this variety of color spaces, and the good news is that Core Image makes this very easy. Core Image automatically converts inputs to its working space, which is Unclamped, Linear, BT.709 primaries.
Your algorithm might want images in a different color space though. In that case, you should do the following. Get a variable for the color space that you want to use from CGColorSpace, and call image.matchedFromWorkingSpace.
Apply your algorithm in that space, and then call image.matchedToWorkingSpace. That's all you need to do.
My last topic today will be using Core Image to post-process the outputs from Vision. One example of this is using Core Image to regenerate a barcode image from a Vision barcode observation.
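Before moving on to the barcode example, the color space round trip described above can be sketched as follows. The function name `process(_:in:using:)` and the closure standing in for "your algorithm" are hypothetical.

```swift
import CoreImage

/// Runs `algorithm` on the image in the requested color space,
/// then converts the result back to Core Image's working space.
func process(_ image: CIImage,
             in colorSpace: CGColorSpace,
             using algorithm: (CIImage) -> CIImage) -> CIImage? {
    // Working space -> the space the algorithm expects
    guard let converted = image.matchedFromWorkingSpace(to: colorSpace) else { return nil }
    let processed = algorithm(converted)
    // The algorithm's space -> back to the working space
    return processed.matchedToWorkingSpace(from: colorSpace)
}
```

A color space such as Display P3 can be obtained with `CGColorSpace(name: CGColorSpace.displayP3)`.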
All you need to do in your code is create the filter instance... set its barcodeDescriptor property to be that of the Vision observation, and lastly, get the outputImage. And the result looks just like this.
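Those three steps could look roughly like this. It is a sketch, not the code from the slide; the function name `barcodeImage(from:)` is hypothetical.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins
import Vision

/// Regenerates a clean barcode image from a Vision barcode observation.
/// Returns nil when the observation carries no descriptor.
func barcodeImage(from observation: VNBarcodeObservation) -> CIImage? {
    guard let descriptor = observation.barcodeDescriptor else { return nil }
    let filter = CIFilter.barcodeGenerator() // create the filter instance
    filter.barcodeDescriptor = descriptor    // hand over Vision's descriptor
    return filter.outputImage                // get the output image
}
```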
Similarly, your app can apply filters based on Vision face observations.
As an example, you can use a vignette effect very easily using this.
The code is actually very simple. One thing you need to be aware of is that you will need to convert from Vision's normalized coordinate system to Core Image's Cartesian coordinate system.
And once you create the vignette filter, you can then put that vignette over the image using compositing over.
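A minimal sketch of the face vignette, assuming the `CIFilter.vignetteEffect()` builder and Vision's `VNImageRectForNormalizedRect` helper for the coordinate conversion. The function name and the choice of radius and intensity are hypothetical and would need tuning for a real app.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins
import Vision

/// Centers a vignette on a detected face and composites it over the original.
func vignette(_ image: CIImage, around face: VNFaceObservation) -> CIImage? {
    // Vision reports a normalized box (0...1); scale it to pixel units
    // and shift it by the image's origin to get Core Image coordinates.
    let box = VNImageRectForNormalizedRect(face.boundingBox,
                                           Int(image.extent.width),
                                           Int(image.extent.height))
        .offsetBy(dx: image.extent.minX, dy: image.extent.minY)

    let filter = CIFilter.vignetteEffect()
    filter.inputImage = image
    filter.center = CGPoint(x: box.midX, y: box.midY)
    filter.radius = Float(max(box.width, box.height)) // illustrative choice
    filter.intensity = 1.0                            // illustrative choice
    return filter.outputImage?.composited(over: image)
}
```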
You can also use Core Image to visualize vector fields, which Frank will be demonstrating later on.
That concludes my part of this presentation. Here's Frank to talk more about Vision.
All right. Thank you, David. So, now I'm gonna talk about how we can understand images by using Vision.
We have a task, the machinery, and the results. The task is what you wanna do. The machinery is what actually performs the work. And the results are, of course, what you're looking for, what you want to get back. The tasks, in our parlance, are the VNRequests, like a VNDetectFaceRectanglesRequest. The machinery is one of two: we have an ImageRequestHandler or a SequenceRequestHandler. And the results that we get back are what we call VNObservations. These depend on which task you performed, like a VNRectangleObservation for detected rectangles.
We first perform the request on the ImageRequestHandler. And from there, we get our observations. Let's look at a concrete example.
We want to read text, so we use the VNRecognizeTextRequest.
Then I create an ImageRequestHandler with my image.
And out of that, I now get my observations, which is just a plain text.
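Put together, the task, machinery, and results for this text example might be sketched like this. The function name `recognizeText(in:)` is hypothetical, and the cast is written so that it should compile whether the SDK types `results` loosely or strongly.

```swift
import Vision
import CoreGraphics

/// Runs text recognition on an image and returns the recognized strings.
func recognizeText(in image: CGImage) throws -> [String] {
    // The task
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate

    // The machinery
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    // The results: one observation per detected piece of text
    let observations = request.results as? [VNRecognizedTextObservation] ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```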
So, what do we have new in 2020 in Vision? First, we have Hand and Body Pose. To see more about that, please look at the "Detect Body and Hand Pose with Vision" session.
Then you might have seen our Trajectory Detection. You can see more about that in the "Explore the Action & Vision app" session. Today, we're just going to focus on Contour Detection and Optical Flow.
What is Contour Detection? With Contour Detection, I can find edges in my image.
As we saw here, the red lines now show the contours that we found in this graphic.
So we start with an image, and then we create our VNDetectContoursRequest.
We can now adjust the contrast of the image to enhance how the contours come out. We can choose whether to detect dark objects on a light background or light objects on a dark background, which helps separate the foreground from the background. Last but not least, we can set the maximumImageDimension. That allows you to trade off performance versus accuracy.
That means, for instance, if you look at it at a lower resolution you will still get your contours but they might not follow the edge as closely, but it runs much faster because it can run at a lower resolution. In comparison, when we use a higher resolution, which you might want to do in some post-processing, we actually get much more accurate contours but it's gonna take a little bit longer because it has to do more work.
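A small sketch of that trade-off, assuming the `contrastAdjustment` and `maximumImageDimension` properties of VNDetectContoursRequest. The helper name and the two dimension values are hypothetical. The dark-versus-light background switch is omitted because its exact property name may differ between SDK versions.

```swift
import Vision

/// Builds a contour request tuned either for speed or for accuracy.
func makeContourRequest(fast: Bool) -> VNDetectContoursRequest {
    let request = VNDetectContoursRequest()
    request.contrastAdjustment = 2.0                    // enhance contrast before detection
    request.maximumImageDimension = fast ? 256 : 1024   // lower = faster, higher = more accurate
    return request
}
```

A live camera preview might use the fast variant, while post-processing of a captured still could afford the larger dimension.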
Let's look at the observation that we get back.
Here we have a very simple image of two squares with a circle in it.
We are getting back a VNContoursObservation.
The topLevelContours are our two rectangles that we see.
Inside of those we have childContours. They are nested and those are the circles.
Then we get back the contourCount, which I can use to walk through all of my contours. But it's much easier, for instance, to use the index path. As you can see, they are nested in each other and I can now traverse my graph.
Last but not least, I also get the normalizedPath. And this is a CGPath I can use easily for rendering.
Now, what is a VNContour? In our example we get a VNContour here... and that is the most Outer Contour, our Parent. Nested inside of it are childContours. These are the Inner Contours.
My contour has an index path and, of course, with that every childContour has the index path, which I can use again to traverse my graph.
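Traversing that graph can be sketched with a short recursive function. The function name `printContourTree(_:depth:)` is hypothetical; it relies only on the `indexPath`, `pointCount`, and `childContours` properties of VNContour.

```swift
import Vision

/// Prints a contour and all of its nested children, indented by depth.
func printContourTree(_ contour: VNContour, depth: Int = 0) {
    let indent = String(repeating: "  ", count: depth)
    print("\(indent)\(contour.indexPath): \(contour.pointCount) points")
    for child in contour.childContours {
        printContourTree(child, depth: depth + 1)
    }
}
```

Starting from an observation, `observation.topLevelContours.forEach { printContourTree($0) }` would walk every outer contour and the inner contours nested in it.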
Then I get the normalizedPoints and the pointCount. Now, that is actually the real meat of the contour because it describes each of the line segments that we discover. Because we didn't just discover pixels, we really get a contour which is a path.
We also have an aspectRatio. I'm gonna talk about that on the next slide.
And then we have the normalizedPath to render. When we want to work with contours, there's a few things we need to keep in mind. Let's look at this image that we have here.
It is 1920 by 1080 pixels, and we have a circle in the middle that is exactly 1080 pixels high and wide. But Vision uses a normalized coordinate space. So, our image is 1.0 high and 1.0 wide. Therefore, the circle now has a height of 1.0, but a width of 0.5625. So, if you wanna take the geometry of the shapes that you've detected into account, you need to look at the aspectRatio of the original image from which they were computed.
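The arithmetic behind that example can be checked with a few lines of plain Swift. The helper name `pixelSize(ofNormalized:imageWidth:imageHeight:)` is hypothetical.

```swift
import Foundation

/// Converts a size in Vision's normalized space back to pixels.
func pixelSize(ofNormalized size: CGSize, imageWidth: Int, imageHeight: Int) -> CGSize {
    CGSize(width: size.width * CGFloat(imageWidth),
           height: size.height * CGFloat(imageHeight))
}

// The circle from the example: 0.5625 wide and 1.0 high in normalized space.
let circle = pixelSize(ofNormalized: CGSize(width: 0.5625, height: 1.0),
                       imageWidth: 1920, imageHeight: 1080)
// circle is 1080 x 1080 pixels, so the shape really is round.
```

Comparing normalized widths and heights directly would wrongly suggest an ellipse; scaling by the image dimensions (or by the aspect ratio) restores the true geometry.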
Now, contours really get interesting when we can analyze them, and we have a few utilities for that available for you.
The VNGeometryUtils provides some API. For instance, we have the boundingCircle which is the smallest circle that completely encapsulates the contour that you detected. It's great for comparing contours with each other.
Then we can calculate the area. And we can calculate the perimeter. Now, the next part that you might want to do with contours is actually simplify them. When we get contours from an image they tend to be noisy. Let's look at our example here.
We have a rectangle that we've photographed. But it has little kinks in it and as you can see, the contour actually followed those little kinks. So, now I actually do not have all the points on just the corners, but even, like, on the middle.
I can now use the approximation of a polygon by using the Epsilon. Now, Epsilon means I can filter out all the little noise parts around an edge, so that only the strong contour edges will actually stay.
And now, I get, again, a perfect rectangle. And with that I just have four points. So, if I need to analyze shapes it's very simple for me, because I can simply say, "If it has four points, it's a quadrilateral," and I detected what kind of shape I have.
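That shape test might be sketched as follows, assuming VNContour's `polygonApproximation(epsilon:)` method. The function name and the default epsilon are hypothetical; a suitable epsilon depends on how noisy your contours are.

```swift
import Vision

/// Simplifies a contour and reports whether four points remain.
/// `epsilon` is in normalized units; 0.01 is only an illustrative default.
func isQuadrilateral(_ contour: VNContour, epsilon: Float = 0.01) -> Bool {
    guard let simplified = try? contour.polygonApproximation(epsilon: epsilon) else {
        return false
    }
    return simplified.pointCount == 4
}
```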
Let's look at a concrete example of how we can use all of this.
Let's say we need to save the world by resurrecting very old computer code that is done on punch cards.
So, our task is to identify the dimples on the punch card because nobody has a punch card reader anymore.
So, we search the web and find a Computer Vision blog post that talks about how to do this. But it's written in Python. Our task, of course, is to bring it natively onto our platform, so that we can run it in the best way possible.
So, now we have here a section of Python code. Don't worry if you don't understand Python. I'm just going to walk you quickly through it. The concept is very often the same. We do some image processing first...
then we do some image analysis...
and we get some results back that we need to visualize. Now, there's one part, even if you don't understand Python that I would like to highlight here in the very beginning, the first three lines that you see. You see that we actually needed to import a few libraries. Now, they don't come with Python. These are third-party libraries that you need to actually include.
So, how do we do this natively? For the image processing part, we need to load the image. And you know how to do that already. You use CGImageSource, get a UIImage from it, load it into a CIImage... You name it. Then you use Core Image to process the image by using CIFilters, like CIColorThreshold or many of the others that David has already explained.
Now we're doing the image analysis. For that we create our ImageRequestHandler from the CIImage that we just processed, and then we perform our request, like the VNDetectContoursRequest. Now, the beauty of this request is we might not even have to preprocess our image.
And then we visualize our results. Again, we can use Core Image to do this, which allows us to composite it right over the image that we actually have, right in the same context. You might use the CIMeshGenerator, or the CITextImageGenerator.
But I can also use CoreGraphics or UIKit to render it into a layer on top of my image. All right, now, after all these slides, let's look at a real demo. Let me go over to my demo machine.
What I've prepared here is a little playground. And what you see is I've loaded my image.
I created my contourRequest...
and then I simply perform it. And there, voilà. I can see all the contours, including the dimples that I was looking for. Now, notice that I found 387 contours. So, this may be a bit more than actually I want. So, we need to filter out all these contours. Well, I was a little bit prepared here, and I've hidden a little bit of code. Let me uncover this piece of code. And all that is... I use my domain knowledge of knowing that my contours are actually on a blue background. So, I use now some CIFiltering to first blur out all the noise...
then I use my color controls to really bring out the contrast. And then I use my filtered image afterwards and run it through my Contour Detection. And you see now, I only find 32 contours, which is really just the dimples that I care about in the first place. All right, let's go back to our slides.
Normally, I would talk about what I did in my demo, but it's more important, actually, what I didn't have to do.
You noticed, I did not load any third-party packages, because this is all part of the OS. All I used was UIKit, Core Image, and Vision.
I also never left our image pipeline, so I stayed on our most optimal processing path.
There was no conversion of the images into matrices, and with that, I really saved all my memory, and also a lot of computational cost.
So that was Contour Detection. Next, let's go to Optical Flow. So, what is Optical Flow? We want to analyze the movement between two frames.
Traditionally, we might have used registration. That has been a part of Vision for quite a while. It gives me the alignment of the whole image. Let's look at an example here. We have these two dots, and let's say we take a picture of them with our camera, and then we shift our camera. Now, these two dots have moved up and to the right.
The registration will give me the alignment between the two images by telling me how much the image has moved up and to the right.
Optical Flow, on the other hand, is different. It gives me a per-pixel flow in X and Y, and that is new in Vision this year.
In our example, again, we have our two dots...
but now, they've actually moved apart.
So, the image registration is not going to pick this up correctly. But I can use the Optical Flow, because it's going to tell me, for each pixel, how they have moved. Let's look at the results of Optical Flow.
From the Optical Flow, I get the VNPixelBufferObservation. It's a floating point image. It has the interleaved X and Y movement.
So, when we have a video like this, you can imagine that perhaps just looking at these values on its own will be really hard to visualize what's going on. Because they're really just meant for processing in later algorithms. But if I want to check it out, I can actually use Core Image to visualize our results. And as David was teasing on earlier in our session, there is a way of doing this. We created a little custom kernel, and now, you can see how everything moves. I have a color-coding that shows me the intensity of the movement and the little triangles actually show me the direction of the movement.
Let me quickly show you how we did this. So, we wrote a custom filter. I need to load the kernel, which we'll make available to you in the slide attachments, and then all I have to do is apply this kernel with the parameters for the size of the arrow that I want, and run it as a filter. Now, in my Vision code, I run my VNGenerateOpticalFlowRequest, get my observation's pixelBuffer, which I can just wrap in a CIImage, and then I simply feed that into my filter and get the output image back.
So, let's wrap up what we've talked about today. Computer Vision doesn't have to be hard, and it really enhances your application. Our native APIs make it fast and easy to adopt. And by combining these things together, you really can create something interesting. I'm looking forward to all the great applications and great innovations that you're going to bring out. Thank you for attending our session, and have a great rest of the WWDC.

import UIKit
import CoreImage
import CoreImage.CIFilterBuiltins
import Vision


public func drawContours(contoursObservation: VNContoursObservation, sourceImage: CGImage) -> UIImage {
	let size = CGSize(width: sourceImage.width, height: sourceImage.height)
	let renderer = UIGraphicsImageRenderer(size: size)
	
	let renderedImage = renderer.image { (context) in 
		
		let renderingContext = context.cgContext
		
    // flip the context
    let flipVertical = CGAffineTransform(a: 1, b: 0, c: 0, d: -1, tx: 0, ty: size.height)
    renderingContext.concatenate(flipVertical)
        
		// draw the original image
		renderingContext.draw(sourceImage, in: CGRect(x: 0, y: 0, width: size.width, height: size.height))
		
		renderingContext.scaleBy(x: size.width, y: size.height)
		renderingContext.setLineWidth(3.0 / CGFloat(size.width))
		let redUIColor = UIColor.red
		renderingContext.setStrokeColor(redUIColor.cgColor)
		renderingContext.addPath(contoursObservation.normalizedPath)
		renderingContext.strokePath()
	}
	
	return renderedImage
}

let context = CIContext()
// Load the image and unwrap its CGImage once, instead of force-unwrapping it later
if let sourceImage = UIImage(named: "punchCard.jpg"), let sourceCGImage = sourceImage.cgImage {
	var inputImage = CIImage(cgImage: sourceCGImage)
	
	let contourRequest = VNDetectContoursRequest()
    
// Uncomment the following section to preprocess the image
//	do {
//			let noiseReductionFilter = CIFilter.gaussianBlur()
//			noiseReductionFilter.radius = 1.5
//			noiseReductionFilter.inputImage = inputImage
//
//			let monochromeFilter = CIFilter.colorControls()
//			monochromeFilter.inputImage = noiseReductionFilter.outputImage!
//			monochromeFilter.contrast = 20.0
//			monochromeFilter.brightness = 8
//			monochromeFilter.saturation = 50
//
//			let filteredImage = monochromeFilter.outputImage!
//
//			inputImage = filteredImage
//		}
	
	let requestHandler = VNImageRequestHandler(ciImage: inputImage, options: [:])

	do {
		try requestHandler.perform([contourRequest])
		// The request yields a single observation that holds all contours
		if let contoursObservation = contourRequest.results?.first as? VNContoursObservation {
			print(contoursObservation.contourCount)
			_ = drawContours(contoursObservation: contoursObservation, sourceImage: sourceCGImage)
		}
	} catch {
		print(error)
	}
} else {
	print("could not load image")
}

23:05 - Optical Flow Visualizer (CI kernel)

//
//  OpticalFlowVisualizer.cikernel
//  SampleVideoCompositionWithCIFilter
//


kernel vec4 flowView2(sampler image, float minLen, float maxLen, float size, float tipAngle)
{
	/// Determine the color by calculating the angle from the .xy vector
	///
	vec4 s = sample(image, samplerCoord(image));
	vec2 vector = s.rg - 0.5;
	float len = length(vector);
	float H = atan(vector.y,vector.x);
	// convert hue to a RGB color
	H *= 3.0/3.1415926; // now range [-3,3]
	float i = floor(H);
	float f = H-i;
	float a = f;
	float d = 1.0 - a;
	vec4 c;
		 if (H<-3.0) c = vec4(0, 1, 1, 1);
	else if (H<-2.0) c = vec4(0, d, 1, 1);
	else if (H<-1.0) c = vec4(a, 0, 1, 1);
	else if (H<0.0)  c = vec4(1, 0, d, 1);
	else if (H<1.0)  c = vec4(1, a, 0, 1);
	else if (H<2.0)  c = vec4(d, 1, 0, 1);
	else if (H<3.0)  c = vec4(0, 1, a, 1);
	else             c = vec4(0, 1, 1, 1);
	// make the color darker if the .xy vector is shorter
	c.rgb *= clamp((len-minLen)/(maxLen-minLen), 0.0,1.0);
	/// Add arrow shapes based on the angle from the .xy vector
	///
	float tipAngleRadians = tipAngle * 3.1415/180.0;
	vec2 dc = destCoord(); // current coordinate
	vec2 dcm = floor((dc/size)+0.5)*size; // cell center coordinate
	vec2 delta = dcm - dc; // coordinate relative to center of cell
	// sample the .xy vector from the center of each cell
	vec4 sm = sample(image, samplerTransform(image, dcm));
	vector = sm.rg - 0.5;
	len = length(vector);
	H = atan(vector.y,vector.x);
	float rotx, k, sideOffset, sideAngle;
	// these are the three sides of the arrow
	rotx = delta.x*cos(H) - delta.y*sin(H);
	sideOffset = size*0.5*cos(tipAngleRadians);
	k = 1.0 - clamp(rotx-sideOffset, 0.0, 1.0);
	c.rgb *= k;
	sideAngle = (3.14159 - tipAngleRadians)/2.0;
	sideOffset = 0.5 * sin(tipAngleRadians / 2.0);
	rotx = delta.x*cos(H-sideAngle) - delta.y*sin(H-sideAngle);
	k = clamp(rotx+size*sideOffset, 0.0, 1.0);
	c.rgb *= k;
	rotx = delta.x*cos(H+sideAngle) - delta.y*sin(H+sideAngle);
	k = clamp(rotx+ size*sideOffset, 0.0, 1.0);
	c.rgb *= k;
	/// return the color premultiplied
	c *= s.a;
	return c;
}

23:26 - Optical Flow Visualizer (CIFilter code)

class OpticalFlowVisualizerFilter: CIFilter {
	var inputImage: CIImage?
	
	let callback: CIKernelROICallback = {
			(index, rect) in
				return rect
			}
	
	static var kernel: CIKernel = { () -> CIKernel in
		let url = Bundle.main.url(forResource: "OpticalFlowVisualizer",
								  withExtension: "ci.metallib")!
		let data = try! Data(contentsOf: url)
		
		return try! CIKernel(functionName: "flowView2",
								  fromMetalLibraryData: data)
	}()

	override var outputImage : CIImage? {
		get {
			guard let input = inputImage else {return nil}
			return OpticalFlowVisualizerFilter.kernel.apply(extent: input.extent, roiCallback: callback, arguments: [input, 0.0, 100.0, 10.0, 30.0])
		}
	}
}

23:42 - Optical Flow Visualizer (Vision code)

// Stored properties of the enclosing object (shown here out of context)
var requestHandler = VNSequenceRequestHandler()
var previousImage: CIImage?

// The code below runs once per frame; `request` and `source` come from the
// surrounding frame-processing code, which is not part of this snippet.
if self.previousImage == nil {
	self.previousImage = request.sourceImage
}
let visionRequest = VNGenerateOpticalFlowRequest(targetedCIImage: source, options: [:])

do {
	try self.requestHandler.perform([visionRequest], on: self.previousImage!)
	if let pixelBufferObservation = visionRequest.results?.first as? VNPixelBufferObservation {
		source = CIImage(cvImageBuffer: pixelBufferObservation.pixelBuffer)
	}
} catch {
	print(error)
}
// store the previous image
self.previousImage = request.sourceImage

let ciFilter = OpticalFlowVisualizerFilter()
ciFilter.inputImage = source
let output = ciFilter.outputImage
