Apply computer vision algorithms to perform a variety of tasks on input images and video using Vision.

Vision Documentation

Posts under Vision tag

103 Posts
Post not yet marked as solved
3 Replies
1.6k Views
Summary: I am using the Vision framework, in conjunction with AVFoundation, to detect the facial landmarks of each face in the camera feed (by way of VNDetectFaceLandmarksRequest). From there, I take the resulting observations and unproject each point into a SceneKit view (SCNView), then use those points as the vertices of a custom geometry that is textured with a material over each detected face. Effectively, I am trying to recreate how an ARFaceTrackingConfiguration functions. In general, this works as expected, but only when my device is using the front camera in landscape right orientation. When I rotate my device, or switch to the rear camera, the unprojected points no longer align with the detected face the way they do with the front camera in landscape right.

Problem: When testing this code, the mesh appears correctly (that is, affixed to the user's face), but again, only when using the front camera in landscape right. While the code runs as expected in all orientations (that is, it generates the face mesh for each detected face), the mesh is wildly misaligned in every other case. My belief is that the issue stems from how I convert the face's bounding box (using VNImageRectForNormalizedRect, which I calculate from the width/height of my SCNView rather than my pixel buffer, which is typically much larger), though every modification I have tried results in the same issue. Alternatively, this could be an issue with my SCNCamera, as I am a bit unsure how the transform/projection matrix works and whether it is needed here.

Sample of Vision request setup:

```swift
// Setup Vision request options
var requestHandlerOptions: [VNImageOption: AnyObject] = [:]

// Setup camera intrinsics
let cameraIntrinsicData = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil)
if cameraIntrinsicData != nil {
    requestHandlerOptions[VNImageOption.cameraIntrinsics] = cameraIntrinsicData
}

// Set EXIF orientation
let exifOrientation = self.exifOrientationForCurrentDeviceOrientation()

// Setup Vision request handler
let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: exifOrientation, options: requestHandlerOptions)

// Setup the completion handler
let completion: VNRequestCompletionHandler = { request, error in
    let observations = request.results as! [VNFaceObservation]
    // Draw faces
    DispatchQueue.main.async {
        drawFaceGeometry(observations: observations)
    }
}

// Setup the image request
let request = VNDetectFaceLandmarksRequest(completionHandler: completion)

// Handle the request
do {
    try handler.perform([request])
} catch {
    print(error)
}
```

Sample of SCNView setup:

```swift
// Setup SCNView
let scnView = SCNView()
scnView.translatesAutoresizingMaskIntoConstraints = false
self.view.addSubview(scnView)
scnView.showsStatistics = true
NSLayoutConstraint.activate([
    scnView.leadingAnchor.constraint(equalTo: self.view.leadingAnchor),
    scnView.topAnchor.constraint(equalTo: self.view.topAnchor),
    scnView.bottomAnchor.constraint(equalTo: self.view.bottomAnchor),
    scnView.trailingAnchor.constraint(equalTo: self.view.trailingAnchor)
])

// Setup scene
let scene = SCNScene()
scnView.scene = scene

// Setup camera
let cameraNode = SCNNode()
let camera = SCNCamera()
cameraNode.camera = camera
scnView.scene?.rootNode.addChildNode(cameraNode)
cameraNode.position = SCNVector3(x: 0, y: 0, z: 16)

// Setup light
let ambientLightNode = SCNNode()
ambientLightNode.light = SCNLight()
ambientLightNode.light?.type = SCNLight.LightType.ambient
ambientLightNode.light?.color = UIColor.darkGray
scnView.scene?.rootNode.addChildNode(ambientLightNode)
```

Sample of "face processing":

```swift
func drawFaceGeometry(observations: [VNFaceObservation]) {
    // An array of face nodes, one SCNNode for each detected face
    var faceNode = [SCNNode]()

    // The origin point
    let projectedOrigin = sceneView.projectPoint(SCNVector3Zero)

    // Iterate through each found face
    for observation in observations {
        // Setup a SCNNode for the face
        let face = SCNNode()

        // Setup the found bounds
        let faceBounds = VNImageRectForNormalizedRect(observation.boundingBox, Int(self.scnView.bounds.width), Int(self.scnView.bounds.height))

        // Verify we have landmarks
        if let landmarks = observation.landmarks {
            // Landmarks are relative to and normalized within face bounds
            let affineTransform = CGAffineTransform(translationX: faceBounds.origin.x, y: faceBounds.origin.y)
                .scaledBy(x: faceBounds.size.width, y: faceBounds.size.height)

            // Add all points as vertices
            var vertices = [SCNVector3]()

            // Verify we have points
            if let allPoints = landmarks.allPoints {
                // Iterate through each point
                for (index, point) in allPoints.normalizedPoints.enumerated() {
                    // Apply the transform to convert each point to the face's bounding box range
                    _ = index
                    let normalizedPoint = point.applying(affineTransform)
                    let projected = SCNVector3(normalizedPoint.x, normalizedPoint.y, CGFloat(projectedOrigin.z))
                    let unprojected = sceneView.unprojectPoint(projected)
                    vertices.append(unprojected)
                }
            }

            // Setup indices
            var indices = [UInt16]()
            // Add indices
            // ... Removed for brevity ...

            // Setup texture coordinates
            var coordinates = [CGPoint]()
            // Add texture coordinates
            // ... Removed for brevity ...

            // Setup texture image
            let imageWidth = 2048.0
            let normalizedCoordinates = coordinates.map { coord -> CGPoint in
                let x = coord.x / CGFloat(imageWidth)
                let y = coord.y / CGFloat(imageWidth)
                let textureCoord = CGPoint(x: x, y: y)
                return textureCoord
            }

            // Setup sources
            let sources = SCNGeometrySource(vertices: vertices)
            let textureCoordinates = SCNGeometrySource(textureCoordinates: normalizedCoordinates)

            // Setup elements
            let elements = SCNGeometryElement(indices: indices, primitiveType: .triangles)

            // Setup geometry
            let geometry = SCNGeometry(sources: [sources, textureCoordinates], elements: [elements])
            geometry.firstMaterial?.diffuse.contents = textureImage

            // Setup node
            let customFace = SCNNode(geometry: geometry)
            sceneView.scene?.rootNode.addChildNode(customFace)

            // Append the face to the face nodes array
            faceNode.append(face)
        }
    }

    // Iterate the face nodes and append to the scene
    for node in faceNode {
        sceneView.scene?.rootNode.addChildNode(node)
    }
}
```
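Since the post does not show exifOrientationForCurrentDeviceOrientation(), here is a minimal sketch of such a helper for reference (hypothetical, not the poster's implementation). The mirrored cases below are the ones commonly used with the front camera; the rear camera generally needs the non-mirrored counterparts, which is one place orientation mismatches like this tend to hide.

```swift
import UIKit
import ImageIO

// Sketch of the orientation helper referenced above (assumed implementation).
// Mirrored mapping for the front camera; swap to the non-mirrored variants
// (.right, .left, .down, .up) when the rear camera is active.
func exifOrientationForCurrentDeviceOrientation() -> CGImagePropertyOrientation {
    switch UIDevice.current.orientation {
    case .portraitUpsideDown:
        return .rightMirrored
    case .landscapeLeft:
        return .downMirrored
    case .landscapeRight:
        return .upMirrored
    default: // .portrait, .faceUp, .faceDown fall back to portrait
        return .leftMirrored
    }
}
```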
Post marked as solved
5 Replies
2.7k Views
Did something change in face detection / the Vision framework on iOS 15? Using VNDetectFaceLandmarksRequest and reading the VNFaceLandmarkRegion2D to detect eyes is not working on iOS 15 as it did before. I am running the exact same code on an iOS 14 device and an iOS 15 device, and the coordinates are different, as seen in the screenshot. Any ideas?
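A minimal sketch of one way to isolate the change (my assumption: iOS 15 may select a newer default revision of the landmarks detector): pin the request revision explicitly on both devices and compare the output of the same revision.

```swift
import Vision

// Sketch: pin an explicit revision so both OS versions run the same detector.
let request = VNDetectFaceLandmarksRequest { request, error in
    guard let faces = request.results as? [VNFaceObservation] else { return }
    for face in faces {
        if let leftEye = face.landmarks?.leftEye {
            print(leftEye.normalizedPoints) // compare these across iOS 14 and 15
        }
    }
}
request.revision = VNDetectFaceLandmarksRequestRevision3

// Revisions available on the current OS, useful for picking one both devices support.
print(VNDetectFaceLandmarksRequest.supportedRevisions)
```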
Post not yet marked as solved
1 Reply
913 Views
VNContoursObservation is taking 715 times as long as OpenCV's findContours() when producing directly comparable results. VNContoursObservation creates comparable results when I have set the maximumImageDimension property to 1024. If I set it lower, it runs a bit faster, but creates lower-quality contours and still takes over 100 times as long. I have a hard time believing Apple doesn't know what they are doing, so does anyone have an idea what is going on and how to get it to run much faster? There don't seem to be many options, and nothing I've tried closes the gap. Setting the detectsDarkOnLight property to true makes it run even slower. OpenCV findContours runs on a binary image, but I am passing an RGB image to Vision, assuming it would convert it to an appropriate format.

OpenCV:

```objc
double taskStart = CFAbsoluteTimeGetCurrent();
int contoursApproximation = CV_CHAIN_APPROX_NONE;
int contourRetrievalMode = CV_RETR_LIST;
findContours(input, contours, hierarchy, contourRetrievalMode, contoursApproximation, cv::Point(0,0));
NSLog(@"###### opencv findContours: %f", CFAbsoluteTimeGetCurrent() - taskStart);
```

###### opencv findContours: 0.017616 seconds

Vision:

```swift
let taskStart = CFAbsoluteTimeGetCurrent()
let contourRequest = VNDetectContoursRequest.init()
contourRequest.revision = VNDetectContourRequestRevision1
contourRequest.contrastAdjustment = 1.0
contourRequest.detectsDarkOnLight = false
contourRequest.maximumImageDimension = 1024
let requestHandler = VNImageRequestHandler.init(cgImage: sourceImage.cgImage!, options: [:])
try! requestHandler.perform([contourRequest])
let contoursObservation = contourRequest.results?.first as! VNContoursObservation
print(" ###### contoursObservation: \(CFAbsoluteTimeGetCurrent() - taskStart)")
```

###### contoursObservation: 12.605962038040161

The image I am providing OpenCV is 2048 pixels and the image I am providing Vision is 1024 pixels.
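Since OpenCV is handed a binary image while Vision receives RGB, one variable worth isolating is the input format. Below is a minimal sketch (an assumption on my part, not a confirmed fix) that pre-flattens the source to grayscale with Core Image before running the same VNDetectContoursRequest; whether this closes the performance gap is untested.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

// Sketch: hand Vision something closer to the binary image OpenCV receives,
// removing the color-to-contrast conversion as a variable.
func flattenedGrayscaleCGImage(from source: CGImage) -> CGImage? {
    let ciImage = CIImage(cgImage: source)
    let mono = CIFilter.colorControls()
    mono.inputImage = ciImage
    mono.saturation = 0   // drop chroma
    mono.contrast = 2     // crude contrast boost as a stand-in for a real threshold
    guard let output = mono.outputImage else { return nil }
    return CIContext().createCGImage(output, from: output.extent)
}

// Usage with the same request configuration as above:
// let handler = VNImageRequestHandler(cgImage: flattenedGrayscaleCGImage(from: sourceImage.cgImage!)!, options: [:])
```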
Post marked as solved
9 Replies
2.6k Views
Here is the setup. I have a UIImageView in which I write some text, using UIGraphicsBeginImageContext. I pass this image to the OCR func:

```swift
func ocrText(onImage: UIImage?) {
    let request = VNRecognizeTextRequest { request, error in
        guard let observations = request.results as? [VNRecognizedTextObservation] else {
            fatalError("Received invalid observations")
        }
        print("observations", observations.count)
        for observation in observations {
            if observation.topCandidates(1).isEmpty {
                continue
            }
        }
    } // end of request handler

    request.recognitionLanguages = ["fr"]
    let requests = [request]

    DispatchQueue.global(qos: .userInitiated).async {
        let ocrGroup = DispatchGroup()
        guard let img = onImage?.cgImage else { return } // Conversion to cgImage works OK
        print("img", img, img.width)
        let (_, _) = onImage!.logImageSizeInKB(scale: 1)

        ocrGroup.enter()
        let handler = VNImageRequestHandler(cgImage: img, options: [:])
        try? handler.perform(requests)
        ocrGroup.leave()

        ocrGroup.wait()
    }
}
```

The problem is that observations is an empty array. I get the following logs:

```
img <CGImage 0x7fa53b350b60> (DP)
<<CGColorSpace 0x6000032f1e00> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; sRGB IEC61966-2.1)>
width = 398, height = 164, bpc = 8, bpp = 32, row bytes = 1600
kCGImageAlphaPremultipliedFirst | kCGImageByteOrder32Little | kCGImagePixelFormatPacked
is mask? No, has masking color? No, has soft mask? No, has matte? No, should interpolate? Yes 398
ImageSize(KB): 5 ko
2022-06-02 17:21:03.734258+0200 App[6949:2718734] Metal API Validation Enabled
observations 0
```

This shows the image is loaded and converted to a cgImage correctly, but there are no observations. Now, if I use the same func on a snapshot image of the text drawn on screen, it works correctly. Is there a difference between an image created by the camera and an image drawn in a CGContext? Here is how mainImageView!.image (used in the OCR) is created in a subclass of UIImageView:

```swift
override func touchesEnded(_ touches: Set<UITouch>, with event: UIEvent?) {
    // Merge tempImageView into mainImageView
    UIGraphicsBeginImageContext(mainImageView!.frame.size)
    mainImageView!.image?.draw(in: CGRect(x: 0, y: 0, width: frame.size.width, height: frame.size.height), blendMode: .normal, alpha: 1.0)
    tempImageView!.image?.draw(in: CGRect(x: 0, y: 0, width: frame.size.width, height: frame.size.height), blendMode: .normal, alpha: opacity)
    mainImageView!.image = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext()
    tempImageView?.image = nil
}
```

I also draw the created image in a test UIImageView and get the correct image. Here are the logs for the drawn text and for the capture:

Drawing (doesn't work):

```
img <CGImage 0x7fb96b81a030> (DP)
<<CGColorSpace 0x600003322160> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; sRGB IEC61966-2.1)>
width = 398, height = 164, bpc = 8, bpp = 32, row bytes = 1600
kCGImageAlphaPremultipliedFirst | kCGImageByteOrder32Little | kCGImagePixelFormatPacked
is mask? No, has masking color? No, has soft mask? No, has matte? No, should interpolate? Yes 398
ImageSize(KB): 5 ko
2022-06-02 15:38:51.115476+0200 Numerare[5313:2653328] Metal API Validation Enabled
observations 0
```

Screenshot (works):

```
img <CGImage 0x7f97641720f0> (IP)
<<CGColorSpace 0x60000394c960> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; iMac)>
width = 570, height = 276, bpc = 8, bpp = 32, row bytes = 2280
kCGImageAlphaNoneSkipLast | 0 (default byte order) | kCGImagePixelFormatPacked
is mask? No, has masking color? No, has soft mask? No, has matte? No, should interpolate? Yes 570
ImageSize(KB): 5 ko
2022-06-02 15:43:32.158701+0200 Numerare[5402:2657059] Metal API Validation Enabled
2022-06-02 15:43:33.122941+0200 Numerare[5402:2657057] [WARNING] Resource not found for 'fr_FR'. Character language model will be disabled during language correction.
observations 1
```

Is there an issue with kCGColorSpaceModelRGB?
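In the logs, the failing image is kCGImageAlphaPremultipliedFirst (a premultiplied-alpha bitmap from UIGraphicsBeginImageContext), while the working screenshot is kCGImageAlphaNoneSkipLast. A sketch for testing whether that difference matters, by re-rendering the drawn image into an opaque bitmap before passing it to ocrText (the helper name opaqueCopy is mine, and this is an assumption about the cause, not a confirmed diagnosis):

```swift
import UIKit

// Sketch: re-render the drawn image through an opaque renderer, which yields
// a bitmap layout closer to the screenshot path that works.
func opaqueCopy(of image: UIImage) -> UIImage {
    let format = UIGraphicsImageRendererFormat.default()
    format.opaque = true
    let renderer = UIGraphicsImageRenderer(size: image.size, format: format)
    return renderer.image { context in
        UIColor.white.setFill()
        context.fill(CGRect(origin: .zero, size: image.size)) // solid background instead of transparency
        image.draw(at: .zero)
    }
}

// Usage: ocrText(onImage: opaqueCopy(of: mainImageView!.image!))
```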
Post not yet marked as solved
4 Replies
1.8k Views
Hi, when using VNFeaturePrintObservation and then computing the distance between two images, the value it returns varies heavily. When two identical images (the same image file) are passed to the function below that I use to compare images, the distance does not return 0, while it is expected to, since they are identical images. Also, what is the upper limit of computeDistance? I am trying to find the percentage similarity between the two images. (Of course, this cannot be done unless the issue above is resolved.) The code that I have used is below:

```swift
func featureprintObservationForImage(image: UIImage) -> VNFeaturePrintObservation? {
    let requestHandler = VNImageRequestHandler(cgImage: image.cgImage!, options: [:])
    let request = VNGenerateImageFeaturePrintRequest()
    request.usesCPUOnly = true // Simulator Testing
    do {
        try requestHandler.perform([request])
        return request.results?.first as? VNFeaturePrintObservation
    } catch {
        print("Vision Error: \(error)")
        return nil
    }
}

func compare(origImg: UIImage, drawnImg: UIImage) -> Float? {
    let oImgObservation = featureprintObservationForImage(image: origImg)
    let dImgObservation = featureprintObservationForImage(image: drawnImg)

    if let oImgObservation = oImgObservation {
        if let dImgObservation = dImgObservation {
            var distance: Float = -1
            do {
                try oImgObservation.computeDistance(&distance, to: dImgObservation)
            } catch {
                fatalError("Failed to Compute Distance")
            }
            if distance == -1 {
                return nil
            } else {
                return distance
            }
        } else {
            print("Drawn Image Observation found Nil")
        }
    } else {
        print("Original Image Observation found Nil")
    }
    return nil
}
```

Thanks for all the help!
Post not yet marked as solved
1 Reply
885 Views
I saw there is a way to track hands with Vision, but is there also a way to record that movement and export it to FBX? And is there a way to record only one hand, or both at the same time? The implementation will be in SwiftUI.
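For reference, a minimal sketch of the recording side (my own outline, assuming frames arrive from an AVCaptureSession elsewhere): VNDetectHumanHandPoseRequest exposes maximumHandCount to limit tracking to one hand, and the per-frame joints can be stored for a later export step. Vision has no FBX support itself, so getting to FBX would require a separate exporter or intermediate format (not shown).

```swift
import Vision

// Sketch: record per-frame hand joints so they can later be converted by an exporter.
final class HandPoseRecorder {
    private(set) var frames: [[VNHumanHandPoseObservation.JointName: VNRecognizedPoint]] = []

    private let request: VNDetectHumanHandPoseRequest = {
        let request = VNDetectHumanHandPoseRequest()
        request.maximumHandCount = 1 // set to 2 to track both hands
        return request
    }()

    // Call once per captured frame.
    func record(pixelBuffer: CVPixelBuffer, orientation: CGImagePropertyOrientation) {
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: orientation, options: [:])
        do {
            try handler.perform([request])
            guard let observation = request.results?.first else { return }
            // Store all recognized joints for this frame.
            frames.append(try observation.recognizedPoints(.all))
        } catch {
            print("Hand pose request failed: \(error)")
        }
    }
}
```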
Post not yet marked as solved
0 Replies
621 Views
The 2-D image frame is extracted from a live or pre-recorded video where the camera is placed behind one player so that the complete tennis court is visible in the frame. Court detection and ball detection have been done using the Core ML and Vision APIs. The next step is to detect the trajectory and the bounce point of the ball, to determine whether the ball is in or out of the court for scoring and analysis. I've used VNDetectTrajectoryRequest to draw the trajectory of the ball and used the detected court boundingBox as the ROI for trajectory detection. The problem is that I am not able to remove the extra noise (coming from player movement in each frame) from the detection, as the player is also in the ROI. Next, how should I proceed with the ball bounce detection?

```swift
private func detectTrajectories(_ controller: CameraViewController, _ buffer: CMSampleBuffer, _ orientation: CGImagePropertyOrientation) throws {
    let visionHandler = VNImageRequestHandler(cmSampleBuffer: buffer, orientation: orientation, options: [:])
    let normalizedFrame = CGRect(x: 0, y: 0, width: 1, height: 1)

    DispatchQueue.main.async {
        // Get the frame of the rendered view.
        self.trajectoryView.frame = controller.viewRectForVisionRect(normalizedFrame)
        self.trajectoryView.roi = controller.viewRectForVisionRect(normalizedFrame)
    }

    // Setup trajectory request
    setUpDetectTrajectoriesRequestWithMaxDimension()

    do {
        // Help manage the real-time use case to improve the precision versus delay tradeoff.
        detectTrajectoryRequest.targetFrameTime = .zero
        // The region of interest where the object is moving in the normalized image space.
        detectTrajectoryRequest.regionOfInterest = normalizedFrame
        try visionHandler.perform([detectTrajectoryRequest])
    } catch {
        print("Failed to perform the trajectory request: \(error.localizedDescription)")
        return
    }
}

func setUpDetectTrajectoriesRequestWithMaxDimension() {
    detectTrajectoryRequest = VNDetectTrajectoriesRequest(frameAnalysisSpacing: .zero,
                                                          trajectoryLength: trajectoryLength,
                                                          completionHandler: completionHandler)
    detectTrajectoryRequest.objectMinimumNormalizedRadius = 0.003
    detectTrajectoryRequest.objectMaximumNormalizedRadius = 0.005
}

private func completionHandler(request: VNRequest, error: Error?) {
    if let e = error {
        print(e)
        return
    }
    guard let observations = request.results as? [VNTrajectoryObservation] else { return }

    let relevantTrajectory = observations.filter { $0.confidence > trajectoryDetectionConfidence }
    if let trajectory = relevantTrajectory.first {
        DispatchQueue.main.async {
            print(trajectory.projectedPoints.count)
            self.trajectoryView.duration = trajectory.timeRange.duration.seconds
            self.trajectoryView.points = trajectory.detectedPoints
            self.trajectoryView.performTransition(.fadeIn, duration: 0.05)
            if !self.trajectoryView.fullTrajectory.isEmpty {
                self.trajectoryView.roi = CGRect(x: 0, y: 0, width: 1, height: 1)
            }
        }
        DispatchQueue.main.asyncAfter(deadline: .now() + 1.5) {
            self.trajectoryView.resetPath()
        }
    }
}
```

In the completion handler function, I have removed all VNTrajectoryObservation results that have a confidence of less than 0.9. After that, I create a trajectoryView that displays the detected trajectory on the frame.
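For the bounce question, one heuristic sketch (my assumption, not part of the Vision API): treat a local minimum in the trajectory's vertical coordinate as a bounce candidate, i.e. the point where the ball stops descending and starts rising. Vision's normalized coordinates have y increasing upward.

```swift
import Vision

// Sketch: find points where the ball's vertical motion flips from falling to rising.
func bounceCandidates(in trajectory: VNTrajectoryObservation) -> [CGPoint] {
    let points = trajectory.detectedPoints.map { $0.location }
    guard points.count >= 3 else { return [] }

    var candidates: [CGPoint] = []
    for i in 1..<(points.count - 1) {
        let falling = points[i].y < points[i - 1].y
        let rising  = points[i + 1].y > points[i].y
        if falling && rising {
            candidates.append(points[i]) // potential bounce point, still in normalized image space
        }
    }
    return candidates
}
```

A candidate point could then be tested against the detected court region (for example with CGPath.contains on the court contour) to call the ball in or out.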
Post not yet marked as solved
0 Replies
423 Views
I am having issues with VNRecognizeTextRequest, where the binary files for extracting text from images fail to load. Here are some logs that I have gotten:

```
WARNING: File mapping at offset 80 of file /System/Library/AssetsV2/com_apple_MobileAsset_LinguisticData/d8e5c8c195c5d8c6372e99004e20e5562158a0d4.asset/AssetData/en.lm/fst.dat could not be honored, reading instead.
WARNING: File mapping at offset 10400 of file /System/Library/AssetsV2/com_apple_MobileAsset_LinguisticData/d8e5c8c195c5d8c6372e99004e20e5562158a0d4.asset/AssetData/en.lm/fst.dat could not be honored, reading instead.
```
Post not yet marked as solved
0 Replies
410 Views
Hi all, I am using the Vision framework for barcode scanning, and now I want to support UPU S18 4-state barcodes too. Can you please guide me on how I can achieve this functionality? TIA, A
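For what it's worth, a minimal sketch that checks what the installed Vision version can actually decode before building on it. If the UPU symbology is not in this list, Vision alone cannot decode it and a custom or third-party decoder would be needed. (supportedSymbologies() is iOS 15+; on older systems the deprecated class property VNDetectBarcodesRequest.supportedSymbologies serves the same purpose.)

```swift
import Vision

// Sketch: enumerate the symbologies the current OS's barcode detector supports.
let request = VNDetectBarcodesRequest { request, error in
    guard let barcodes = request.results as? [VNBarcodeObservation] else { return }
    for barcode in barcodes {
        print(barcode.symbology, barcode.payloadStringValue ?? "<no string payload>")
    }
}

do {
    let symbologies = try request.supportedSymbologies()
    print(symbologies)
} catch {
    print("Could not query symbologies: \(error)")
}
```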
Post marked as solved
1 Reply
841 Views
Hello, I am reaching out for some assistance regarding integrating a CoreML action classifier into a SwiftUI app. Specifically, I am trying to implement this classifier to work with the live camera of the device. I have been doing some research, but unfortunately, I have not been able to find any relevant information on this topic. I was wondering if you could provide me with any examples, resources, or information that could help me achieve this integration? Any guidance you can offer would be greatly appreciated. Thank you in advance for your help and support.
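Not a full answer, but here is a rough sketch of the pipeline that is usually wired up for this. The model class name MyActionClassifier and its prediction call are placeholders for whatever Create ML produced, and the camera frames are assumed to come from an AVCaptureVideoDataOutput hosted in SwiftUI via UIViewRepresentable (not shown).

```swift
import Vision
import CoreML

// Sketch: collect body-pose keypoints per frame, then feed a window of them
// to the action classifier. Names marked as placeholders are assumptions.
final class ActionClassifierPipeline {
    private let poseRequest = VNDetectHumanBodyPoseRequest()
    private var poseWindow: [MLMultiArray] = []
    private let windowLength = 60 // frames per prediction; must match the model's training window

    // Call from your AVCaptureVideoDataOutputSampleBufferDelegate callback.
    func process(sampleBuffer: CMSampleBuffer) {
        let handler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: .up, options: [:])
        do {
            try handler.perform([poseRequest])
        } catch {
            return
        }
        guard let observation = poseRequest.results?.first,
              let keypoints = try? observation.keypointsMultiArray() else { return }

        poseWindow.append(keypoints)
        if poseWindow.count == windowLength {
            predict(on: poseWindow)
            poseWindow.removeAll()
        }
    }

    private func predict(on window: [MLMultiArray]) {
        // Concatenating the per-frame arrays and calling the generated model class
        // is model-specific; the names below are placeholders, not real API for your model.
        // let input = try? MLMultiArray(concatenating: window, axis: 0, dataType: .float32)
        // let output = try? MyActionClassifier().prediction(poses: input!)
        // print(output?.label ?? "unknown")
    }
}
```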
Post not yet marked as solved
2 Replies
555 Views
Hi - I apologize if this question has been answered in the past; I can't seem to find a clear answer. I'm wondering if there is a reliable way to leverage individual landmark points from VNDetectHumanHandPoseRequest to calculate a real-world distance, like the wrist point to the tip of the middle finger, and return a calculated result like 7.5" for example. My assumption is that the same methods used with a manual hitTest to find the distance between two points (like in the official Measure app) could work here. However, with hitTest being deprecated, that leaves me at a bit of a loss. I'm happy to continue to dig through the documentation, but before I do, I was hoping someone could let me know if this is even possible, or if we're still not quite there yet to be able to leverage the Vision points to calculate an accurate distance (on modern devices that support it). I appreciate any feedback or pointers in the right direction!
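A hedged sketch of the ARKit side, assuming an ARSCNView session is running and the two Vision hand-pose points have already been converted into view coordinates: raycastQuery(from:allowing:alignment:) is the non-deprecated replacement for hitTest, and the distance falls out of the two world transforms.

```swift
import ARKit

// Sketch: raycast two screen points into world space and measure between them.
func realWorldDistance(in sceneView: ARSCNView, from pointA: CGPoint, to pointB: CGPoint) -> Float? {
    func worldPosition(at point: CGPoint) -> simd_float3? {
        guard let query = sceneView.raycastQuery(from: point,
                                                 allowing: .estimatedPlane,
                                                 alignment: .any),
              let result = sceneView.session.raycast(query).first else { return nil }
        let t = result.worldTransform
        return simd_float3(t.columns.3.x, t.columns.3.y, t.columns.3.z)
    }

    guard let a = worldPosition(at: pointA), let b = worldPosition(at: pointB) else { return nil }
    return simd_distance(a, b) // meters
}
```

One caveat: raycasting against estimated planes near a hand will often land on the surface behind it rather than on the hand itself, so devices with scene depth (LiDAR) tend to give far more usable results for this kind of measurement.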
Post not yet marked as solved
1 Reply
354 Views
I have a 27-inch machine running macOS Sierra version 10.12.6. After a few hours of use, the back of the computer gets very hot. Any idea how to fix this? Thanks.
Post not yet marked as solved
0 Replies
422 Views
I'm referring to this talk: https://developer.apple.com/videos/play/wwdc2021/10152 I was wondering if the code for the "Image composition" project he demonstrates at the end of the talk (around 24:00) is available somewhere? Would much appreciate any help.
Post not yet marked as solved
0 Replies
762 Views
Hello guys, I am trying to run this sample project on my iPad, but I get a black screen and the camera does not initialize. I tried updating the Info.plist and asking for camera permission, and I updated all the devices. Has anyone tried this demo? https://developer.apple.com/documentation/vision/detecting_animal_body_poses_with_vision
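A minimal sketch for ruling out the most common cause of a black preview (my assumption: the capture session never starts because camera access was not granted). The target also needs an NSCameraUsageDescription entry in Info.plist.

```swift
import AVFoundation

// Sketch: confirm camera authorization before starting the capture session.
func checkCameraAccess(completion: @escaping (Bool) -> Void) {
    switch AVCaptureDevice.authorizationStatus(for: .video) {
    case .authorized:
        completion(true)
    case .notDetermined:
        AVCaptureDevice.requestAccess(for: .video) { granted in
            DispatchQueue.main.async { completion(granted) }
        }
    default:
        completion(false) // denied or restricted; the preview will stay black
    }
}
```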
Post marked as solved
1 Reply
499 Views
Hi everyone! I'm implementing a multithreaded approach to text recognition through Vision's VNSequenceRequestHandler and VNRecognizeTextRequest. I've created multiple threads (let's use 3 as an example) and created 3 instances of VNSequenceRequestHandler, one per thread. My AVSession sends me sample buffers (60 per second), and I'm trying to handle them one by one in 3 different threads. These threads constantly try to consume sample buffers from my temporary sample buffer queue (1 to 3 sample buffers are in the queue; they get deleted after handling). Sample buffers are not shared between these threads: one sample buffer goes to only one thread's VNSequenceRequestHandler. For each performRequests operation I create a new VNRecognizeTextRequest. With this I was trying to increase the number of sample buffers handled per second.

But what I found is that no matter how many threads I create (1 or 3), the speed is always about 10 fps (iPhone 13 Pro). When I use 1 thread, only one instance of VNSequenceRequestHandler is created and used; in this case, [requestHandler performRequests:@[request] onCMSampleBuffer:sampleBuffer error:&error] takes about 100-150 ms. When I use 3 threads, each instance of VNSequenceRequestHandler takes up to 600 ms to handle the request with [requestHandler performRequests:@[request] onCMSampleBuffer:sampleBuffer error:&error]. With 2 threads, the average time is about 300-400 ms.

Does this mean that the VNSequenceRequestHandler instances inside the Vision framework share some buffer or request queue, so they are not able to work separately? Or maybe a single GPU core is used for detection? I saw in the debug session window that VNSequenceRequestHandler creates separate concurrent dispatch queues for handling the requests (2 queues created for 2 instances), which in my opinion should not block resources so much that request execution time grows by a factor of 2. Any ideas what is causing the problem?
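Not an answer to the shared-queue question, but for comparison, a sketch of the request-level settings that usually dominate per-frame latency (my assumption: the accurate recognizer plus language correction accounts for most of the 100-150 ms per frame).

```swift
import Vision

// Sketch: trade accuracy for throughput on a per-request basis.
let textRequest = VNRecognizeTextRequest { request, error in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    let strings = observations.compactMap { $0.topCandidates(1).first?.string }
    print(strings)
}
textRequest.recognitionLevel = .fast        // faster, less accurate recognizer
textRequest.usesLanguageCorrection = false  // skip the language-model pass
textRequest.minimumTextHeight = 0.05        // ignore text smaller than 5% of the image height
```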
Post marked as solved
4 Replies
935 Views
Hi, has anyone gotten the human body pose in 3D sample provided at the following link working? https://developer.apple.com/documentation/vision/detecting_human_body_poses_in_3d_with_vision I installed iPadOS 17 on a 9th-generation iPad. The sample loads up on Mac and iPad; however, after selecting an image, it goes into a spinning wheel without anything returned. I hope to play with and learn more about the sample. Any pointers or help is greatly appreciated. Similarly, the Detecting animal body poses with Vision sample is showing up as blank for me. https://developer.apple.com/documentation/vision/detecting_animal_body_poses_with_vision Or do the samples require a device with LiDAR? Thank you in advance.
Post not yet marked as solved
0 Replies
293 Views
I'm using the Vision framework for text recognition and for detecting rectangles in an image, via the VNRecognizeText and VNDetectRectangles features. Comparing macOS and iOS results, I found a slight difference in the boundingBox coordinates of the text and the rectangles detected for the same image. Is this expected? Can we do anything to make the results identical? Also, on macOS, when I use the same Vision features from Python (via the pyobjc-framework-Vision package), I also get slightly different results.
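A sketch of one way to narrow the gap, assuming the platforms default to different request revisions: pin explicit revisions everywhere (including the pyobjc path) so the same algorithm version runs on each system. Small floating-point differences may still remain.

```swift
import Vision

// Sketch: pin revisions so macOS, iOS, and the pyobjc path run identical detectors.
let textRequest = VNRecognizeTextRequest()
textRequest.revision = VNRecognizeTextRequestRevision2

let rectangleRequest = VNDetectRectanglesRequest()
rectangleRequest.revision = VNDetectRectanglesRequestRevision1

// Check what each OS actually offers before choosing a common revision.
print(VNRecognizeTextRequest.supportedRevisions)
print(VNDetectRectanglesRequest.supportedRevisions)
```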