Apply computer vision algorithms to perform a variety of tasks on input images and video using Vision.

Vision Documentation

Posts under Vision tag

102 Posts
Post marked as solved
1 Reply
535 Views
Hi, I was watching this WWDC23 video on Metal with xrOS (https://developer.apple.com/videos/play/wwdc2023/10089/?time=1222). However, when I tried it, the Compositor Services API wasn't available. Is it available yet, or when will it be released? Thanks.
Post not yet marked as solved
1 Reply
809 Views
Is there a framework that allows for classic image-processing operations in real time on incoming imagery from the front-facing cameras, before it is displayed on the OLED screens? Things like spatial filtering, histogram equalization, and image warping. I saw the documentation for the Vision framework, but it seems to address high-level tasks such as object recognition. Thank you!
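For classic per-frame operations like these, Core Image (or vImage for CPU-side work) is usually a closer fit than Vision, which focuses on analysis tasks. A minimal sketch of filtering one frame with Core Image, assuming each frame is already available as a CVPixelBuffer (whether the OS exposes the headset's raw passthrough frames to apps is a separate question):

import CoreImage
import CoreImage.CIFilterBuiltins
import CoreVideo

let ciContext = CIContext()

// Apply a Gaussian blur (a simple spatial filter) to a frame in place.
func filterFrame(_ pixelBuffer: CVPixelBuffer) {
    let input = CIImage(cvPixelBuffer: pixelBuffer)

    let blur = CIFilter.gaussianBlur()
    blur.inputImage = input
    blur.radius = 3

    guard let output = blur.outputImage else { return }
    // Render the filtered image back into the same buffer;
    // a Metal texture target would also work for display.
    ciContext.render(output, to: pixelBuffer)
}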
Post not yet marked as solved
0 Replies
374 Views
Is there a way to enable Wi-Fi on the Vision Pro in its simulated environment and pass through devices connected to the local network or to the Mac?
Post marked as solved
2 Replies
768 Views
Trying to use VNGeneratePersonSegmentationRequest. It seems to work, but the output mask isn't at the same resolution as the source image, so compositing the result with the source produces a bad result. Not the full code, but hopefully enough to see what I'm doing:

var imageRect = CGRect(x: 0, y: 0, width: image.size.width, height: image.size.height)
let imageRef = image.cgImage(forProposedRect: &imageRect, context: nil, hints: nil)!

let request = VNGeneratePersonSegmentationRequest()
let handler = VNImageRequestHandler(cgImage: imageRef)
do {
    try handler.perform([request])
    guard let result = request.results?.first else { return }
    // Is this the right way to do this?
    let output = result.pixelBuffer
    // This CIImage alpha mask is a different resolution than the source image,
    // so I don't know how to combine it with the source to cut out the foreground;
    // they don't line up, and the resolution isn't even the right aspect ratio.
    let ciImage = CIImage(cvPixelBuffer: output)
    .....
}
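One common way around this kind of resolution mismatch (a sketch of an approach, not the poster's code) is to scale the mask up to the source image's extent before compositing, for example with CIBlendWithMask; setting request.qualityLevel = .accurate also produces a higher-resolution mask.

import CoreImage
import CoreImage.CIFilterBuiltins
import CoreVideo

// Sketch: stretch the low-resolution person mask to the source image's size,
// then use it as an alpha mask to cut out the foreground.
func cutOutPerson(from source: CGImage, mask: CVPixelBuffer) -> CIImage? {
    let sourceImage = CIImage(cgImage: source)
    var maskImage = CIImage(cvPixelBuffer: mask)

    // Scaling to the source extent also absorbs any aspect-ratio difference.
    let scaleX = sourceImage.extent.width / maskImage.extent.width
    let scaleY = sourceImage.extent.height / maskImage.extent.height
    maskImage = maskImage.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))

    let blend = CIFilter.blendWithMask()
    blend.inputImage = sourceImage
    blend.backgroundImage = CIImage(color: .clear).cropped(to: sourceImage.extent)
    blend.maskImage = maskImage
    return blend.outputImage
}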
Post not yet marked as solved
0 Replies
677 Views
Is there a way to move a rigged character with its armature bones in ARKit/RealityKit? I am trying to do this. When I try to move the usdz robot provided in https://developer.apple.com/documentation/arkit/arkit_in_ios/content_anchors/capturing_body_motion_in_3d using JointTransform, it gives me the following: I see the documentation on character rigging (https://developer.apple.com/documentation/arkit/arkit_in_ios/content_anchors/rigging_a_model_for_motion_capture), but is movement through armature bones only available through third-party software, or can it be done in RealityKit/ARKit/RealityView?
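On the RealityKit side, ModelEntity exposes jointNames and jointTransforms (RealityKit 2, iOS 15 and later), which can drive a skinned model's bones directly without third-party tools. A rough sketch, assuming the USDZ loads as a skinned ModelEntity and that a joint matching the given name actually exists in the rig:

import RealityKit
import simd

// Sketch: rotate one bone of a rigged ModelEntity by writing to jointTransforms.
// jointNames and jointTransforms are parallel arrays.
func rotateJoint(of model: ModelEntity, nameContains fragment: String, by angle: Float) {
    guard let index = model.jointNames.firstIndex(where: { $0.contains(fragment) }) else { return }
    var transform = model.jointTransforms[index]
    transform.rotation = simd_quatf(angle: angle, axis: [0, 0, 1]) * transform.rotation
    model.jointTransforms[index] = transform
}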
Post marked as solved
1 Reply
553 Views
Is it possible to use import CreateML in an iOS project? I'm looking at the code from the "Build dynamic iOS apps with the Create ML framework" video at https://developer.apple.com/videos/play/wwdc2021/10037/, but I'm not sure what kind of project I need to create. If I created an iOS project and tried running the code, what inputs would I need?
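For what it's worth, import CreateML compiles in an ordinary iOS app target on iOS 15 and later, but only for the model types that support on-device training, so check availability for your deployment target. A rough sketch, assuming MLImageClassifier is among the supported types on the target OS; trainingURL and outputURL are hypothetical, with subfolder names serving as class labels:

import CreateML
import Foundation

// Sketch: train an image classifier from labeled folders and save the model.
// trainingURL points at a directory whose subfolders are named after the labels.
func trainClassifier(trainingURL: URL, outputURL: URL) throws {
    let data = MLImageClassifier.DataSource.labeledDirectories(at: trainingURL)
    let classifier = try MLImageClassifier(trainingData: data)
    try classifier.write(to: outputURL)
}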
Post not yet marked as solved
0 Replies
481 Views
First of all, this Vision API is amazing; the OCR is very accurate. I've been looking into multiprocessing with the Vision API. I have about 2 million PDFs I want to OCR, and I want to run multiple threads / parallel processes to OCR each one. I tried PyObjC, but it does not work so well. Any suggestions on tackling this problem?
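If the pipeline moves to native Swift, each VNRecognizeTextRequest/handler pair is independent, so pages can be processed in parallel (for example with DispatchQueue.concurrentPerform, or by sharding the PDF list across separate worker processes). A minimal per-page sketch, assuming each PDF page has already been rendered to a CGImage:

import Vision
import CoreGraphics

// Sketch: OCR a single rendered page. Many of these can run concurrently,
// since each request/handler pair is independent of the others.
func recognizeText(in page: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    let handler = VNImageRequestHandler(cgImage: page)
    try handler.perform([request])
    return request.results?.compactMap { $0.topCandidates(1).first?.string } ?? []
}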
Post not yet marked as solved
1 Reply
636 Views
Hi there, I'm not sure if I'm missing something, but I've tried passing a variety of CGImages into SCSensitivityAnalyzer, including ones which should be flagged as sensitive, and it always returns false. It doesn't throw an exception, and I have the Sensitive Content Warning enabled in Settings (confirmed by checking the analysisPolicy at run time). I've tried both the async and callback versions of analyzeImage. This is with Xcode 15 beta 5, and I'm primarily testing on iOS/iPad simulators. Is that a known issue? Cheers, Mike
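For reference, the basic flow is small enough to sanity-check in isolation; a sketch along these lines, with the caveats that meaningful results also require the Sensitive Content Analysis entitlement and that simulator behavior may differ from a device (both worth verifying independently):

import SensitiveContentAnalysis
import CoreGraphics

// Sketch: check the policy first, then analyze a single image.
func isSensitive(_ image: CGImage) async throws -> Bool {
    let analyzer = SCSensitivityAnalyzer()
    guard analyzer.analysisPolicy != .disabled else { return false }
    let analysis = try await analyzer.analyzeImage(image)
    return analysis.isSensitive
}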
Post not yet marked as solved
1 Reply
619 Views
I'm trying to create a sky mask on pictures taken from my iPhone. I've seen in the documentation that Core Image supports semantic segmentation for sky, among other types for person (skin, hair, etc.). So far, I haven't found the proper workflow to use it. First, I watched https://developer.apple.com/videos/play/wwdc2019/225/ and understood that images must be captured with segmentation enabled, with this kind of code:

photoSettings.enabledSemanticSegmentationMatteTypes = self.photoOutput.availableSemanticSegmentationMatteTypes
photoSettings.embedsSemanticSegmentationMattesInPhoto = true

I capture the image on my iPhone and save it in HEIC format; then, later, I try to load the matte like this:

let skyMatte = CIImage(contentsOf: imageURL, options: [.auxiliarySemanticSegmentationSkyMatte: true])

Unfortunately, self.photoOutput.availableSemanticSegmentationMatteTypes always gives me a list of types for person only, and never a sky type. In fact, AVSemanticSegmentationMatte.MatteType is just [Hair, Skin, Teeth, Glasses]... no sky! So how am I supposed to use semanticSegmentationSkyMatteImage? Is there any simple workaround?
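One way to check whether a given HEIC actually embeds a sky matte (it will only be there if the capture pipeline wrote one, and the matte types offered to third-party AVCapture sessions may not include sky, as observed above) is to probe the file's auxiliary data directly. A sketch, which simply returns nil when no sky matte is present:

import ImageIO
import CoreImage

// Sketch: look for an embedded semantic-segmentation sky matte and load it
// as a CIImage if it exists.
func loadSkyMatte(from url: URL) -> CIImage? {
    guard let source = CGImageSourceCreateWithURL(url as CFURL, nil),
          CGImageSourceCopyAuxiliaryDataInfoAtIndex(
              source, 0, kCGImageAuxiliaryDataTypeSemanticSegmentationSkyMatte) != nil
    else { return nil }
    return CIImage(contentsOf: url, options: [.auxiliarySemanticSegmentationSkyMatte: true])
}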
Post not yet marked as solved
0 Replies
549 Views
Hi, I want to control a hand model via hand motion capture. I know there is a sample project and some articles about rigging a model for motion capture in the ARKit documentation, but that solution is quite encapsulated in BodyTrackedEntity, and I can't find an appropriate Entity for controlling just a hand model. By using VNDetectHumanHandPoseRequest from the Vision framework, I can get hand-joint info, but I don't know how to use that info in RealityKit to control a 3D hand model. Do you know how to do that, or do you have any idea of how it should be implemented? Thanks
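There is no hand counterpart to BodyTrackedEntity, so the usual route is to pull the joints out of Vision and then map them onto the model's rig yourself (for example via ModelEntity's jointNames/jointTransforms). A sketch of the Vision half, assuming camera frames arrive as CVPixelBuffers; the mapping from these normalized 2D points to bone rotations is a separate, manual step:

import Vision
import CoreVideo

// Sketch: extract 2D hand-joint locations for one hand from a camera frame.
func handJoints(in frame: CVPixelBuffer) throws
        -> [VNHumanHandPoseObservation.JointName: VNRecognizedPoint] {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 1
    let handler = VNImageRequestHandler(cvPixelBuffer: frame)
    try handler.perform([request])
    guard let observation = request.results?.first else { return [:] }
    return try observation.recognizedPoints(.all)
}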
Post not yet marked as solved
0 Replies
415 Views
I am trying to use VNDetectFaceRectanglesRequest to detect face bounding boxes on frames obtained from ARKit callbacks. My app is in portrait device orientation, and I am passing the .right orientation to the perform method on VNSequenceRequestHandler, something like:

private let requestHandler = VNSequenceRequestHandler()
private var facePoseRequest: VNDetectFaceRectanglesRequest!
// ...
try? self.requestHandler.perform([self.facePoseRequest], on: currentBuffer, orientation: orientation)

I'm setting .right for the orientation above in the hope that the Vision framework will re-orient before running inference. I'm trying to draw the returned bounding box (BB) on top of the image. Here's my results-processing code:

guard let faceRes = self.facePoseRequest.results?.first as? VNFaceObservation else { return }

let currBufWidth = CVPixelBufferGetWidth(currentBuffer)
let currBufHeight = CVPixelBufferGetHeight(currentBuffer)

// Option 1: assume the reported BB is in the coordinate space of the
// orientation-adjusted pixel buffer (notice height and width are flipped below).
// Problems/observations: the bounding box turns into a square with equal width
// and height, and it does not cover the entire face, only from chin to eyes.
let flippedBB = VNImageRectForNormalizedRect(faceRes.boundingBox, currBufHeight, currBufWidth)

// vs

// Option 2: assume the reported BB is in the coordinate system of the original,
// un-oriented pixel buffer.
// Problems/observations: the drawn BB does look like a rectangle covering most
// of the face, but it is not always centered on the face; it moves around the
// screen when I tilt the device or my face.
let reportedBB = VNImageRectForNormalizedRect(faceRes.boundingBox, currBufWidth, currBufHeight)

In Option 1 above, the bounding box becomes a square, with width and height becoming equal. I noticed that the reported normalized BB has the same aspect ratio as the input pixel buffer, which is 1.33; this is why, when I flip the width and height params in VNImageRectForNormalizedRect, width and height become equal. In Option 2 above, the BB seems to be roughly the right height, but it jumps around when I tilt the device or my head. What coordinate system are the reported bounding boxes in? Do I need to adjust for the y-flippedness of the Vision framework before I perform the above operations? What's the best way to draw these BBs on the captured frame and/or ARView? Thank you
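For what it's worth, Vision reports boundingBox normalized to the orientation-corrected image with a lower-left origin. With a .right orientation, that upright image has the buffer's width and height swapped, and a y-flip is still needed before drawing in UIKit/ARView coordinates. A sketch of that conversion, under those assumptions:

import Vision
import CoreVideo

// Sketch: convert a normalized Vision bounding box into pixel coordinates of
// the upright (orientation-corrected) image, flipping y for a top-left origin.
func faceRect(for observation: VNFaceObservation, in buffer: CVPixelBuffer) -> CGRect {
    // With .right orientation, the upright image swaps the buffer's dimensions.
    let uprightWidth = CVPixelBufferGetHeight(buffer)
    let uprightHeight = CVPixelBufferGetWidth(buffer)

    var rect = VNImageRectForNormalizedRect(observation.boundingBox,
                                            uprightWidth, uprightHeight)
    // Vision uses a lower-left origin; UIKit and ARView use an upper-left origin.
    rect.origin.y = CGFloat(uprightHeight) - rect.origin.y - rect.height
    return rect
}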
Post not yet marked as solved
0 Replies
375 Views
Hi, I am using Vision to detect fingers for counting. I am able to detect the finger joints; however, it also seems to detect the finger joints when the finger is closed, so I am not able to differentiate, and it is not reliable. I have tried detecting all the joints of a finger before determining that the finger is raised, but that doesn't seem to work. Any ideas on how I can improve the accuracy of detecting a raised finger? Thanks
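Since the pose request reports joints for folded fingers too, the differentiation usually has to come from geometry rather than from whether joints are detected. One common heuristic (a sketch, not an official recipe) is to call a finger raised only when its tip is farther from the wrist than its PIP joint, and only when the points are reasonably confident:

import Vision
import Foundation

// Sketch: treat the index finger as raised when its tip lies farther from the
// wrist than its PIP joint does, and all points pass a confidence threshold.
func isIndexFingerRaised(_ hand: VNHumanHandPoseObservation) -> Bool {
    guard let wrist = try? hand.recognizedPoint(.wrist),
          let tip = try? hand.recognizedPoint(.indexTip),
          let pip = try? hand.recognizedPoint(.indexPIP),
          min(wrist.confidence, tip.confidence, pip.confidence) > 0.3
    else { return false }

    func distance(_ a: VNRecognizedPoint, _ b: VNRecognizedPoint) -> CGFloat {
        hypot(a.location.x - b.location.x, a.location.y - b.location.y)
    }
    return distance(tip, wrist) > distance(pip, wrist)
}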
Post not yet marked as solved
1 Reply
1.1k Views
Hello, I have created a view with a full-screen 360 image, and I need to perform a task when the user clicks anywhere on the screen (to leave the dome), but no matter what I try, it just does not work; it doesn't print anything at all.

import SwiftUI
import RealityKit
import RealityKitContent

struct StreetWalk: View {
    @Binding var threeSixtyImage: String
    @Binding var isExitFaded: Bool

    var body: some View {
        RealityView { content in
            // Create a material with a 360 image
            guard let url = Bundle.main.url(forResource: threeSixtyImage, withExtension: "jpeg"),
                  let resource = try? await TextureResource(contentsOf: url) else {
                // If the asset isn't available, something is wrong with the app.
                fatalError("Unable to load starfield texture.")
            }
            var material = UnlitMaterial()
            material.color = .init(texture: .init(resource))

            // Attach the material to a large sphere.
            let streeDome = Entity()
            streeDome.name = "streetDome"
            streeDome.components.set(ModelComponent(
                mesh: .generatePlane(width: 1000, depth: 1000),
                materials: [material]
            ))
            // Ensure the texture image points inward at the viewer.
            streeDome.scale *= .init(x: -1, y: 1, z: 1)
            content.add(streeDome)
        } update: { updatedContent in
            // Create a material with a 360 image
            guard let url = Bundle.main.url(forResource: threeSixtyImage, withExtension: "jpeg"),
                  let resource = try? TextureResource.load(contentsOf: url) else {
                // If the asset isn't available, something is wrong with the app.
                fatalError("Unable to load starfield texture.")
            }
            var material = UnlitMaterial()
            material.color = .init(texture: .init(resource))

            updatedContent.entities.first?.components.set(ModelComponent(
                mesh: .generateSphere(radius: 1000),
                materials: [material]
            ))
        }
        .gesture(tap)
    }

    var tap: some Gesture {
        SpatialTapGesture().targetedToAnyEntity().onChanged { value in
            // Access the tapped entity here.
            print(value.entity)
            print("maybe you can tap the dome")
            // isExitFaded.toggle()
        }
    }
}
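A likely reason the tap never fires: gestures targeted to an entity only hit entities that carry both an input-target component and a collision shape, and a discrete tap is normally handled in onEnded rather than onChanged. A sketch of what could be added to the dome entity (the values here are assumptions, not taken from the post):

import RealityKit

// Sketch: make an entity hittable by spatial gestures by giving it an input
// target and a collision shape roughly matching the dome's size.
func makeTappable(_ dome: Entity, radius: Float) {
    dome.components.set(InputTargetComponent())
    dome.components.set(CollisionComponent(shapes: [.generateSphere(radius: radius)]))
}

With that in place, SpatialTapGesture().targetedToAnyEntity().onEnded { value in ... } should receive the dome as value.entity.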
Post not yet marked as solved
2 Replies
1.1k Views
Hello, I am looking for something that allows me to anchor a web view component to the user, so that it follows their line of vision as they move. I tried using RealityView with an AnchorEntity, but it raises the error "Presentations are not permitted within volumetric window scene". Can I anchor the window instead?
Post not yet marked as solved
0 Replies
364 Views
I have a simple Swift script where I send a VNRecognizeTextRequest to recognize numbers in a selected area of the screen. It works perfectly fine with macOS 13.5.1; however, the accuracy in macOS 14 is pretty low. Could this be related to the beta itself, or to some other changes?
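One way to narrow down whether an OS behavior change is responsible is to pin the request's configuration explicitly, so both systems run the same settings. A sketch of parameters that often matter for plain numbers; the specific values are assumptions to experiment with, not a known fix:

import Vision

// Sketch: pin the text-recognition configuration so results are comparable
// across macOS versions.
let request = VNRecognizeTextRequest()
request.revision = VNRecognizeTextRequestRevision3   // explicit revision rather than the OS default
request.recognitionLevel = .accurate
request.recognitionLanguages = ["en-US"]
request.usesLanguageCorrection = false               // language correction can hurt digit-only text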
Post not yet marked as solved
0 Replies
383 Views
Hello, I am Pieter Bikkel. I study Software Engineering at HAN University of Applied Sciences, and I am working on an app that can recognize volleyball actions using machine learning. A volleyball coach can put an iPhone on a tripod and analyze a volleyball match, for example, where the ball always lands in the field, or how hard the ball is served. I was inspired by this session and wondered if I could interview one of the experts in this field. This would allow me to develop my app even better. I hope you can help me with this.