Apply computer vision algorithms to perform a variety of tasks on input images and video using Vision.

Vision Documentation

Posts under Vision tag

102 Posts
Post marked as solved
1 Reply
535 Views
Hi, I was watching this WWDC23 video on Metal with xrOS (https://developer.apple.com/videos/play/wwdc2023/10089/?time=1222). However, when I tried it, the Compositor Services API wasn't available. Is it available yet, or when will it be released? Thanks.
Post not yet marked as solved
1 Reply
809 Views
Is there a framework that allows for classic image-processing operations in real time on incoming imagery from the front-facing cameras, before it is displayed on the OLED screens? Things like spatial filtering, histogram equalization, and image warping. I saw the documentation for the Vision framework, but it seems to address high-level tasks such as object recognition. Thank you!
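For classic per-frame operations like these, Core Image (or vImage for CPU-side work) is usually a closer fit than Vision, which focuses on analysis tasks. A minimal sketch of filtering one frame with Core Image, assuming each frame is already available as a CVPixelBuffer (whether the OS exposes the headset's raw passthrough frames to apps is a separate question):

import CoreImage
import CoreImage.CIFilterBuiltins
import CoreVideo

let ciContext = CIContext()

// Apply a Gaussian blur (a simple spatial filter) to a frame in place.
func filterFrame(_ pixelBuffer: CVPixelBuffer) {
    let input = CIImage(cvPixelBuffer: pixelBuffer)

    let blur = CIFilter.gaussianBlur()
    blur.inputImage = input
    blur.radius = 3

    guard let output = blur.outputImage else { return }
    // Render the filtered image back into the same buffer;
    // a Metal texture target would also work for display.
    ciContext.render(output, to: pixelBuffer)
}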
Post not yet marked as solved
0 Replies
374 Views
Is there a way to enable Wi-Fi on the Vision Pro in its simulated environment and pass through devices connected to the local network or to the Mac?
Post marked as solved
2 Replies
768 Views
Trying to use VNGeneratePersonSegmentationRequest. It seems to work, but the output mask isn't at the same resolution as the source image, so compositing the result with the source produces a bad result. Not the full code, but hopefully enough to see what I'm doing:

var imageRect = CGRect(x: 0, y: 0, width: image.size.width, height: image.size.height)
let imageRef = image.cgImage(forProposedRect: &imageRect, context: nil, hints: nil)!

let request = VNGeneratePersonSegmentationRequest()
let handler = VNImageRequestHandler(cgImage: imageRef)
do {
    try handler.perform([request])
    guard let result = request.results?.first else { return }
    // Is this the right way to do this?
    let output = result.pixelBuffer
    // This CIImage alpha mask is a different resolution than the source image,
    // so I don't know how to combine it with the source to cut out the foreground;
    // they don't line up, and the resolution isn't even the right aspect ratio.
    let ciImage = CIImage(cvPixelBuffer: output)
    .....
}
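One common way around this kind of resolution mismatch (a sketch of an approach, not the poster's code) is to scale the mask up to the source image's extent before compositing, for example with CIBlendWithMask; setting request.qualityLevel = .accurate also produces a higher-resolution mask.

import CoreImage
import CoreImage.CIFilterBuiltins
import CoreVideo

// Sketch: stretch the low-resolution person mask to the source image's size,
// then use it as an alpha mask to cut out the foreground.
func cutOutPerson(from source: CGImage, mask: CVPixelBuffer) -> CIImage? {
    let sourceImage = CIImage(cgImage: source)
    var maskImage = CIImage(cvPixelBuffer: mask)

    // Scaling to the source extent also absorbs any aspect-ratio difference.
    let scaleX = sourceImage.extent.width / maskImage.extent.width
    let scaleY = sourceImage.extent.height / maskImage.extent.height
    maskImage = maskImage.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))

    let blend = CIFilter.blendWithMask()
    blend.inputImage = sourceImage
    blend.backgroundImage = CIImage(color: .clear).cropped(to: sourceImage.extent)
    blend.maskImage = maskImage
    return blend.outputImage
}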
Post not yet marked as solved
0 Replies
677 Views
Is there a way to move a rigged character with its armature bones in ARKit/RealityKit? I am trying to do this. When I try to move the usdz robot provided in https://developer.apple.com/documentation/arkit/arkit_in_ios/content_anchors/capturing_body_motion_in_3d using JointTransform, it gives me the following: I see the documentation on character rigging (https://developer.apple.com/documentation/arkit/arkit_in_ios/content_anchors/rigging_a_model_for_motion_capture), but is movement through armature bones only available through third-party software, or can it be done in RealityKit/ARKit/RealityView?
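On the RealityKit side, ModelEntity exposes jointNames and jointTransforms (RealityKit 2, iOS 15 and later), which can drive a skinned model's bones directly without third-party tools. A rough sketch, assuming the USDZ loads as a skinned ModelEntity and that a joint matching the given name actually exists in the rig:

import RealityKit
import simd

// Sketch: rotate one bone of a rigged ModelEntity by writing to jointTransforms.
// jointNames and jointTransforms are parallel arrays.
func rotateJoint(of model: ModelEntity, nameContains fragment: String, by angle: Float) {
    guard let index = model.jointNames.firstIndex(where: { $0.contains(fragment) }) else { return }
    var transform = model.jointTransforms[index]
    transform.rotation = simd_quatf(angle: angle, axis: [0, 0, 1]) * transform.rotation
    model.jointTransforms[index] = transform
}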
Post marked as solved
1 Reply
553 Views
Is it possible to use import CreateML in an iOS project? I'm looking at the code from the "Build dynamic iOS apps with the Create ML framework" video at https://developer.apple.com/videos/play/wwdc2021/10037/, but I'm not sure what kind of project I need to create. If I created an iOS project and tried running the code, what inputs would I need?
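For what it's worth, import CreateML compiles in an ordinary iOS app target on iOS 15 and later, but only for the model types that support on-device training, so check availability for your deployment target. A rough sketch, assuming MLImageClassifier is among the supported types on the target OS; trainingURL and outputURL are hypothetical, with subfolder names serving as class labels:

import CreateML
import Foundation

// Sketch: train an image classifier from labeled folders and save the model.
// trainingURL points at a directory whose subfolders are named after the labels.
func trainClassifier(trainingURL: URL, outputURL: URL) throws {
    let data = MLImageClassifier.DataSource.labeledDirectories(at: trainingURL)
    let classifier = try MLImageClassifier(trainingData: data)
    try classifier.write(to: outputURL)
}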
Post not yet marked as solved
0 Replies
481 Views
First of all, this Vision API is amazing; the OCR is very accurate. I've been looking into multiprocessing with the Vision API. I have about 2 million PDFs I want to OCR, and I want to run multiple threads / parallel processes to OCR each one. I tried PyObjC, but it does not work so well. Any suggestions on tackling this problem?
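If the pipeline moves to native Swift, each VNRecognizeTextRequest/handler pair is independent, so pages can be processed in parallel (for example with DispatchQueue.concurrentPerform, or by sharding the PDF list across separate worker processes). A minimal per-page sketch, assuming each PDF page has already been rendered to a CGImage:

import Vision
import CoreGraphics

// Sketch: OCR a single rendered page. Many of these can run concurrently,
// since each request/handler pair is independent of the others.
func recognizeText(in page: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    let handler = VNImageRequestHandler(cgImage: page)
    try handler.perform([request])
    return request.results?.compactMap { $0.topCandidates(1).first?.string } ?? []
}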
Post not yet marked as solved
1 Reply
636 Views
Hi there, I'm not sure if I'm missing something, but I've tried passing a variety of CGImages into SCSensitivityAnalyzer, including ones which should be flagged as sensitive, and it always returns false. It doesn't throw an exception, and I have the Sensitive Content Warning enabled in Settings (confirmed by checking the analysisPolicy at run time). I've tried both the async and callback versions of analyzeImage. This is with Xcode 15 beta 5, and I'm primarily testing on iOS/iPad simulators. Is that a known issue? Cheers, Mike
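For reference, the basic flow is small enough to sanity-check in isolation; a sketch along these lines, with the caveats that meaningful results also require the Sensitive Content Analysis entitlement and that simulator behavior may differ from a device (both worth verifying independently):

import SensitiveContentAnalysis
import CoreGraphics

// Sketch: check the policy first, then analyze a single image.
func isSensitive(_ image: CGImage) async throws -> Bool {
    let analyzer = SCSensitivityAnalyzer()
    guard analyzer.analysisPolicy != .disabled else { return false }
    let analysis = try await analyzer.analyzeImage(image)
    return analysis.isSensitive
}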
Post not yet marked as solved
1 Reply
619 Views
I'm trying to create a sky mask on pictures taken from my iPhone. I've seen in the documentation that Core Image supports semantic segmentation for sky, among other types for person (skin, hair, etc.). So far, I haven't found the proper workflow to use it. First, I watched https://developer.apple.com/videos/play/wwdc2019/225/ and understood that images must be captured with segmentation enabled, with this kind of code:

photoSettings.enabledSemanticSegmentationMatteTypes = self.photoOutput.availableSemanticSegmentationMatteTypes
photoSettings.embedsSemanticSegmentationMattesInPhoto = true

I capture the image on my iPhone and save it in HEIC format; then, later, I try to load the matte like this:

let skyMatte = CIImage(contentsOf: imageURL, options: [.auxiliarySemanticSegmentationSkyMatte: true])

Unfortunately, self.photoOutput.availableSemanticSegmentationMatteTypes always gives me a list of types for person only, and never a sky type. In fact, AVSemanticSegmentationMatte.MatteType is just [Hair, Skin, Teeth, Glasses]... no sky! So how am I supposed to use semanticSegmentationSkyMatteImage? Is there any simple workaround?
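One way to check whether a given HEIC actually embeds a sky matte (it will only be there if the capture pipeline wrote one, and the matte types offered to third-party AVCapture sessions may not include sky, as observed above) is to probe the file's auxiliary data directly. A sketch, which simply returns nil when no sky matte is present:

import ImageIO
import CoreImage

// Sketch: look for an embedded semantic-segmentation sky matte and load it
// as a CIImage if it exists.
func loadSkyMatte(from url: URL) -> CIImage? {
    guard let source = CGImageSourceCreateWithURL(url as CFURL, nil),
          CGImageSourceCopyAuxiliaryDataInfoAtIndex(
              source, 0, kCGImageAuxiliaryDataTypeSemanticSegmentationSkyMatte) != nil
    else { return nil }
    return CIImage(contentsOf: url, options: [.auxiliarySemanticSegmentationSkyMatte: true])
}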
Post not yet marked as solved
0 Replies
549 Views
Hi, I want to control a hand model via hand motion capture. I know there is a sample project and some articles about rigging a model for motion capture in the ARKit documentation, but that solution is quite encapsulated in BodyTrackedEntity, and I can't find an appropriate Entity for controlling just a hand model. By using VNDetectHumanHandPoseRequest from the Vision framework, I can get hand-joint info, but I don't know how to use that info in RealityKit to control a 3D hand model. Do you know how to do that, or do you have any idea of how it should be implemented? Thanks
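There is no hand counterpart to BodyTrackedEntity, so the usual route is to pull the joints out of Vision and then map them onto the model's rig yourself (for example via ModelEntity's jointNames/jointTransforms). A sketch of the Vision half, assuming camera frames arrive as CVPixelBuffers; the mapping from these normalized 2D points to bone rotations is a separate, manual step:

import Vision
import CoreVideo

// Sketch: extract 2D hand-joint locations for one hand from a camera frame.
func handJoints(in frame: CVPixelBuffer) throws
        -> [VNHumanHandPoseObservation.JointName: VNRecognizedPoint] {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 1
    let handler = VNImageRequestHandler(cvPixelBuffer: frame)
    try handler.perform([request])
    guard let observation = request.results?.first else { return [:] }
    return try observation.recognizedPoints(.all)
}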
Post not yet marked as solved
0 Replies
415 Views
I am trying to use VNDetectFaceRectanglesRequest to detect face bounding boxes on frames obtained from ARKit callbacks. My app is in portrait device orientation, and I am passing the .right orientation to the perform method on VNSequenceRequestHandler, something like:

private let requestHandler = VNSequenceRequestHandler()
private var facePoseRequest: VNDetectFaceRectanglesRequest!
// ...
try? self.requestHandler.perform([self.facePoseRequest], on: currentBuffer, orientation: orientation)

I'm setting .right for the orientation above in the hope that the Vision framework will re-orient before running inference. I'm trying to draw the returned bounding box (BB) on top of the image. Here's my results-processing code:

guard let faceRes = self.facePoseRequest.results?.first as? VNFaceObservation else { return }

let currBufWidth = CVPixelBufferGetWidth(currentBuffer)
let currBufHeight = CVPixelBufferGetHeight(currentBuffer)

// Option 1: assume the reported BB is in the coordinate space of the
// orientation-adjusted pixel buffer (notice height and width are flipped below).
// Problems/observations: the bounding box turns into a square with equal width
// and height, and it does not cover the entire face, only from chin to eyes.
let flippedBB = VNImageRectForNormalizedRect(faceRes.boundingBox, currBufHeight, currBufWidth)

// vs

// Option 2: assume the reported BB is in the coordinate system of the original,
// un-oriented pixel buffer.
// Problems/observations: the drawn BB does look like a rectangle covering most
// of the face, but it is not always centered on the face; it moves around the
// screen when I tilt the device or my face.
let reportedBB = VNImageRectForNormalizedRect(faceRes.boundingBox, currBufWidth, currBufHeight)

In Option 1 above, the bounding box becomes a square, with width and height becoming equal. I noticed that the reported normalized BB has the same aspect ratio as the input pixel buffer, which is 1.33; this is why, when I flip the width and height params in VNImageRectForNormalizedRect, width and height become equal. In Option 2 above, the BB seems to be roughly the right height, but it jumps around when I tilt the device or my head. What coordinate system are the reported bounding boxes in? Do I need to adjust for the y-flippedness of the Vision framework before I perform the above operations? What's the best way to draw these BBs on the captured frame and/or ARView? Thank you
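For what it's worth, Vision reports boundingBox normalized to the orientation-corrected image with a lower-left origin. With a .right orientation, that upright image has the buffer's width and height swapped, and a y-flip is still needed before drawing in UIKit/ARView coordinates. A sketch of that conversion, under those assumptions:

import Vision
import CoreVideo

// Sketch: convert a normalized Vision bounding box into pixel coordinates of
// the upright (orientation-corrected) image, flipping y for a top-left origin.
func faceRect(for observation: VNFaceObservation, in buffer: CVPixelBuffer) -> CGRect {
    // With .right orientation, the upright image swaps the buffer's dimensions.
    let uprightWidth = CVPixelBufferGetHeight(buffer)
    let uprightHeight = CVPixelBufferGetWidth(buffer)

    var rect = VNImageRectForNormalizedRect(observation.boundingBox,
                                            uprightWidth, uprightHeight)
    // Vision uses a lower-left origin; UIKit and ARView use an upper-left origin.
    rect.origin.y = CGFloat(uprightHeight) - rect.origin.y - rect.height
    return rect
}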
Post not yet marked as solved
0 Replies
375 Views
Hi, I am using Vision to detect fingers for counting. I am able to detect the finger joints; however, it also seems to detect the finger joints when the finger is closed, so I am not able to differentiate, and it is not reliable. I have tried detecting all the joints of a finger before determining that the finger is raised, but that doesn't seem to work. Any ideas on how I can improve the accuracy of detecting a raised finger? Thanks
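Since the pose request reports joints for folded fingers too, the differentiation usually has to come from geometry rather than from whether joints are detected. One common heuristic (a sketch, not an official recipe) is to call a finger raised only when its tip is farther from the wrist than its PIP joint, and only when the points are reasonably confident:

import Vision
import Foundation

// Sketch: treat the index finger as raised when its tip lies farther from the
// wrist than its PIP joint does, and all points pass a confidence threshold.
func isIndexFingerRaised(_ hand: VNHumanHandPoseObservation) -> Bool {
    guard let wrist = try? hand.recognizedPoint(.wrist),
          let tip = try? hand.recognizedPoint(.indexTip),
          let pip = try? hand.recognizedPoint(.indexPIP),
          min(wrist.confidence, tip.confidence, pip.confidence) > 0.3
    else { return false }

    func distance(_ a: VNRecognizedPoint, _ b: VNRecognizedPoint) -> CGFloat {
        hypot(a.location.x - b.location.x, a.location.y - b.location.y)
    }
    return distance(tip, wrist) > distance(pip, wrist)
}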
Post not yet marked as solved
1 Reply
1.1k Views
Hello, I have created a view with a full-screen 360 image, and I need to perform a task when the user clicks anywhere on the screen (to leave the dome), but no matter what I try, it just does not work; it doesn't print anything at all.

import SwiftUI
import RealityKit
import RealityKitContent

struct StreetWalk: View {
    @Binding var threeSixtyImage: String
    @Binding var isExitFaded: Bool

    var body: some View {
        RealityView { content in
            // Create a material with a 360 image
            guard let url = Bundle.main.url(forResource: threeSixtyImage, withExtension: "jpeg"),
                  let resource = try? await TextureResource(contentsOf: url) else {
                // If the asset isn't available, something is wrong with the app.
                fatalError("Unable to load starfield texture.")
            }
            var material = UnlitMaterial()
            material.color = .init(texture: .init(resource))

            // Attach the material to a large sphere.
            let streeDome = Entity()
            streeDome.name = "streetDome"
            streeDome.components.set(ModelComponent(
                mesh: .generatePlane(width: 1000, depth: 1000),
                materials: [material]
            ))
            // Ensure the texture image points inward at the viewer.
            streeDome.scale *= .init(x: -1, y: 1, z: 1)
            content.add(streeDome)
        } update: { updatedContent in
            // Create a material with a 360 image
            guard let url = Bundle.main.url(forResource: threeSixtyImage, withExtension: "jpeg"),
                  let resource = try? TextureResource.load(contentsOf: url) else {
                // If the asset isn't available, something is wrong with the app.
                fatalError("Unable to load starfield texture.")
            }
            var material = UnlitMaterial()
            material.color = .init(texture: .init(resource))

            updatedContent.entities.first?.components.set(ModelComponent(
                mesh: .generateSphere(radius: 1000),
                materials: [material]
            ))
        }
        .gesture(tap)
    }

    var tap: some Gesture {
        SpatialTapGesture().targetedToAnyEntity().onChanged { value in
            // Access the tapped entity here.
            print(value.entity)
            print("maybe you can tap the dome")
            // isExitFaded.toggle()
        }
    }
}
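A likely reason the tap never fires: gestures targeted to an entity only hit entities that carry both an input-target component and a collision shape, and a discrete tap is normally handled in onEnded rather than onChanged. A sketch of what could be added to the dome entity (the values here are assumptions, not taken from the post):

import RealityKit

// Sketch: make an entity hittable by spatial gestures by giving it an input
// target and a collision shape roughly matching the dome's size.
func makeTappable(_ dome: Entity, radius: Float) {
    dome.components.set(InputTargetComponent())
    dome.components.set(CollisionComponent(shapes: [.generateSphere(radius: radius)]))
}

With that in place, SpatialTapGesture().targetedToAnyEntity().onEnded { value in ... } should receive the dome as value.entity.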
Post not yet marked as solved
2 Replies
1.1k Views
Hello, I am looking for something that allows me to anchor a web view component to the user, so that it follows their line of vision as they move. I tried using RealityView with an AnchorEntity, but it raises the error "Presentations are not permitted within volumetric window scene". Can I anchor the window instead?
Post not yet marked as solved
0 Replies
364 Views
I have a simple Swift script where I send a VNRecognizeTextRequest to recognize numbers in a selected area of the screen. It works perfectly fine with macOS 13.5.1; however, the accuracy in macOS 14 is pretty low. Could this be related to the beta itself, or to some other changes?
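One way to narrow down whether an OS behavior change is responsible is to pin the request's configuration explicitly, so both systems run the same settings. A sketch of parameters that often matter for plain numbers; the specific values are assumptions to experiment with, not a known fix:

import Vision

// Sketch: pin the text-recognition configuration so results are comparable
// across macOS versions.
let request = VNRecognizeTextRequest()
request.revision = VNRecognizeTextRequestRevision3   // explicit revision rather than the OS default
request.recognitionLevel = .accurate
request.recognitionLanguages = ["en-US"]
request.usesLanguageCorrection = false               // language correction can hurt digit-only text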
Post not yet marked as solved
0 Replies
383 Views
Hello, I am Pieter Bikkel. I study Software Engineering at HAN University of Applied Sciences, and I am working on an app that can recognize volleyball actions using machine learning. A volleyball coach can put an iPhone on a tripod and analyze a volleyball match, for example, where the ball always lands in the field, or how hard the ball is served. I was inspired by this session and wondered if I could interview one of the experts in this field. This would allow me to develop my app even better. I hope you can help me with this.