Recognize spoken words in recorded or live audio using Speech.

Speech Documentation

Posts under Speech tag

66 Posts
Sort by:
Post not yet marked as solved
0 Replies
90 Views
I would like to contact a developer on the SSML team regarding the possibility to create a new downloadable voice, in a language yet unsupported. I don't mind making a free contribution. Creating Custom voices does not seem to be a solution, since only English is supported when creating a custom voice.
Posted
by Cilliers.
Last updated
.
Post not yet marked as solved
2 Replies
1.1k Views
I need a simple text-to-speech avatar in my iOS app. iOS already has Memojis ready to go - but I cannot find anywhere in the dev docs on how to access Memojis to use in as a tool in app development. Am I missing something? Also - can anyone point me to any resources besides the Apple docs for using AVSpeechSynthesis?
Posted
by Calrizien.
Last updated
.
Post not yet marked as solved
1 Replies
800 Views
I'd like to allow the speech synthesizer to play on the device speaker while simultaneously mixing with a phone call. I've worked with a number of different configurations but am unable to find a configuration that achieves the functionality I am trying to achieve - or allows mixing with a phone call at all. There is a flag: mixToTelephonyUplink that seems to suggest that at least some mixing with a phone call is possible using the speech synthesizer, but I'm currently unable to find almost any documentation about this flag besides basic API docs. I've had some some luck at least getting the synthesizer to always play to the speaker with the following audio session configuration - but the sound never is mixed with a phone call. Instead, it is ducked and muted while the phone call takes place. I've tried quite a few configuration combinations for the category and overrides, but nothings seems to work quite as I'd expect it to. synthesizer.mixToTelephonyUplink = true try? audioSession.setCategory(.playback, mode: .voicePrompt, options: [.mixWithOthers, .defaultToSpeaker]) try? audioSession.setActive(true, options: []) try? audioSession.overrideOutputAudioPort(.speaker) Is there some kind of documentation for this that's off the beaten path that I'm somehow missing? I'm going to continue with guess and check, but I'm starting to think this flag - and the functionality it implies, actually wasn't ever fully implemented.
Posted
by d-skinio.
Last updated
.
Post not yet marked as solved
0 Replies
180 Views
I am using SpeechSynthesizer and SpeechRecognizer. After a recognition task completes, the SpeechSynthesizer stops producing audible output. I am using the latest SwiftUI in Xcode 15.2, deploying to an iPhone 14 Pro running iOS 17.3.1. Here's my SpeechSynthesizer function: func speak(_ text: String) { let utterance = AVSpeechUtterance(string: text) utterance.voice = AVSpeechSynthesisVoice(identifier: self.appState.chatParameters.voiceIdentifer) utterance.rate = 0.5 speechSynthesizer.speak(utterance) } And here's the code for setting up the SpeechRecognizer (borrowed from https://www.linkedin.com/pulse/transcribing-audio-text-swiftui-muhammad-asad-chattha): private static func prepareEngine() throws -> (AVAudioEngine, SFSpeechAudioBufferRecognitionRequest) { print("prepareEngine()") let audioEngine = AVAudioEngine() let request = SFSpeechAudioBufferRecognitionRequest() request.shouldReportPartialResults = false request.requiresOnDeviceRecognition = true let audioSession = AVAudioSession.sharedInstance() try audioSession.setCategory(.playAndRecord) try audioSession.setActive(true, options: .notifyOthersOnDeactivation) let inputNode = audioEngine.inputNode let recordingFormat = inputNode.outputFormat(forBus: 0) inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in request.append(buffer) } audioEngine.prepare() try audioEngine.start() return (audioEngine, request) } SpeechSynthesizer works fine as long as I don't call prepareEngine(). Thanks in advance for any assistance.
Posted Last updated
.
Post not yet marked as solved
1 Replies
255 Views
Hello, I’ve been trying to play system sounds in my app, but this hasn’t really been working. I am frequently switching between speech recognition (Speech framework) and sounds, so perhaps that’s where the issue lies. However, despite my best efforts, I haven't been able to solve the issue. I've been resetting the AVAudioSession category before playing a sound or starting speech recognition (as depicted in the code snippet below), to no avail. Has this happened to anyone else? Does anybody know how to fix the issue? recognizer = nil try? AVAudioSession.sharedInstance().setCategory(.playback, mode: .default, options: []) try? AVAudioSession.sharedInstance().setActive(true) AudioServicesPlaySystemSound(1113) try? AVAudioSession.sharedInstance().setCategory(.record, mode: .spokenAudio, options: []) try? AVAudioSession.sharedInstance().setActive(true) recognizer = SpeechRecognition(word: wordSheet) recognizer!.startRecognition() Thank you.
Posted Last updated
.
Post not yet marked as solved
0 Replies
194 Views
I want to develop an AI assistant ios application using whisper and chatGPT OpenAI apis. I am implementing these following steps. Audio-engine to record the user's voice Send audio chunk to Whisper for Speech to Text Send that text to chatgpt openAI to get response Now sending that response to Speech Synthesizer to speak response through built-in speaker In this process, i don't want to disable microphone. Because user can interrupt the speech synthesizer anytime he likes. It should be realtime and look like continuous call between the user and AI assistant. Problem: When user speaks, microphone takes the input and appends into the audioengine recording file. Then sends that chunk to whisper for transcribing, transcribed text is then sent to chatgpt api to get response and response is sent to speech synthesiser which generates an output on speaker. Issue is that the microphone again takes synthesiser voice from speaker, and create a loop. What should i possibly do to stop my microphone to not take the input from iphone speaker. Talking tom, callAnnie applications and many other ios applications are continuously using microphone and generating outputs from speaker without overlapping and loop. Suggest the possible ways. I tried to set all possible ways for setting audio-engine category and settings with record, playback, playandrecord etc. Nothing gives me the solution to avoid speaker voice into my microphone. Technically as I think of microphone should never take the device generated voices. What could be the possible solution. If my approach is wrong also i am open to plenty suggestions and guidance.
Posted
by iOSgeekk.
Last updated
.
Post not yet marked as solved
1 Replies
260 Views
I have AVSpeechSynthesizer built in to 6 apps for iPad/iOS that were working fine until recently. Sometime between November 2023 and Feb 2024, they just quit speaking on all the apps for no apparent reason. There have been both XCode and iOS updates in the interim, but I cannot be sure which caused it. It doesn't work either in XCode on simulation, nor on devices. What did Apple change? XCode 15.2 iOS 17+ SwiftUI let synth = AVSpeechSynthesizer() var thisText = "" func sayit(thisText: String) { let utterance = AVSpeechUtterance(string: thisText) utterance.voice = AVSpeechSynthesisVoice(language:"en-US") utterance.rate = 0.4 utterance.preUtteranceDelay = 0.1 synth.speak(utterance)}
Posted Last updated
.
Post not yet marked as solved
1 Replies
416 Views
I am trying to use the Speech Synthesizer to speak the pronunciation of a word in British English rather than play a local audio file which I had before. However, I keep getting this in the debugger: #FactoryInstall Unable to query results, error: 5 Unable to list voice folder Unable to list voice folder Unable to list voice folder IPCAUClient.cpp:129 IPCAUClient: bundle display name is nil Unable to list voice folder Here is my code, any suggestions?? ` func playSampleAudio() { let speechSynthesizer = AVSpeechSynthesizer() let speechUtterance = AVSpeechUtterance(string: currentWord) // Search for a voice with a British English accent. let voices = AVSpeechSynthesisVoice.speechVoices() var foundBritishVoice = false for voice in voices { if voice.language == "en-GB" { speechUtterance.voice = voice foundBritishVoice = true break } } if !foundBritishVoice { print("British English voice not found. Using default voice.") } // Configure the utterance's properties as needed. speechUtterance.rate = AVSpeechUtteranceDefaultSpeechRate speechUtterance.pitchMultiplier = 1.0 speechUtterance.volume = 1.0 // Speak the word. speechSynthesizer.speak(speechUtterance) }
Posted
by aer0187.
Last updated
.
Post not yet marked as solved
19 Replies
6.0k Views
Recently I updated to Xcode 14.0. I am building an iOS app to convert recorded audio into text. I got an exception while testing the application from the simulator(iOS 16.0). [SpeechFramework] -[SFSpeechRecognitionTask handleSpeechRecognitionDidFailWithError:]_block_invoke Ignoring subsequent recongition error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)" Error Domain=kAFAssistantErrorDomain Code=1107 "(null)" I have to know what does the error code means and why this error occurred.
Posted Last updated
.
Post not yet marked as solved
2 Replies
1.1k Views
I am using SFSpeechRecognizer to perform speech recognition, but I am getting the following error. [SpeechFramework] -[SFSpeechRecognitionTask localSpeechRecognitionClient:speechRecordingDidFail:]_block_invoke Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)" Setting requiresOnDeviceRecognition to False works correctly, but previously it worked with True with no error. The value of supportsOnDeviceRecognition was True, so the device is recognizing that it supports speech recognition. iPad Pro 11inch iOS 16.5. Is this expected behavior?
Posted
by k_nagami.
Last updated
.
Post not yet marked as solved
0 Replies
242 Views
Is there a way to extract the list of words recognized by the Speech framework? I'm trying to filter out words that won't appear in the transcription output, but to do that I'll need a list of words that can appear. SFSpeechLanguageModel.Configuration can be initialized with a vocabulary, but there doesn't seem to be a way to read it, and while there are ways to create custom vocabularies, I have yet to find a way to retrieve it. I added the Natural Language tag in case the framework might contribute to a solution
Posted
by wmk.
Last updated
.
Post not yet marked as solved
0 Replies
308 Views
I'm working with the new speech recognition APIs in iOS 17 and have encountered some confusion regarding the use of URLs in SFSpeechLanguageModel.prepareCustomLanguageModel and the SFSpeechLanguageModel.Configuration. In the SFSpeechLanguageModel.Configuration initializer, I provide a URL that points to a custom language model .bin file. However, there's also a URL parameter in the prepareCustomLanguageModel method. I'm unclear about the purpose of this second URL and how it differs from the one in the configuration. To add to the confusion, the documentation for these new APIs is not fully fleshed out at this point. I've tried injecting both .bin files (for the custom language model and the one for prepareCustomLanguageModel) into the same URL, but the results haven't clarified their distinct roles. In experiments I conducted, I checked the confidence level of recognized phrases from the same audio file with and without the custom language model .bin file. Surprisingly, the confidence levels remained the same in both scenarios, leading me to question if the custom model is being utilized correctly. Has anyone else worked with these new APIs and can provide clarity on: The distinct roles of the URLs in SFSpeechLanguageModel.Configuration and prepareCustomLanguageModel. Why there might be no noticeable difference in confidence levels when using a custom language model. Any insights or experiences with these new aspects of the iOS 17 speech recognition API would be greatly appreciated.
Posted Last updated
.
Post not yet marked as solved
1 Replies
215 Views
Hy, I'm French developer and I downloaded the Recognizing Speech in live Audio sample code from Developer Apple website. I tried to execute data generator command after changing the local identifier from 'en_US' to 'fr' in data generator main file , but when I ran the command in Xcode, I had this error message : " Identifier 'fr' does not parse into two elements." I checked the xml files associated to the bin archive file and the identifiers are no correct (they keep 'en-US' value). Thanks for your help !
Posted Last updated
.
Post not yet marked as solved
20 Replies
5.0k Views
I see a lot of crashes on iOS 17 beta regarding some problem of "Text To Speech". Does anybody has a clue why TTS crashes? Anybody else seeing the same problem? Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Subtype: KERN_INVALID_ADDRESS at 0x000000037f729380 Exception Codes: 0x0000000000000001, 0x000000037f729380 VM Region Info: 0x37f729380 is not in any region. Bytes after previous region: 3748828033 Bytes before following region: 52622617728 REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL MALLOC_NANO 280000000-2a0000000 [512.0M] rw-/rwx SM=PRV ---> GAP OF 0xd20000000 BYTES commpage (reserved) fc0000000-1000000000 [ 1.0G] ---/--- SM=NUL ...(unallocated) Termination Reason: SIGNAL 11 Segmentation fault: 11 Terminating Process: exc handler [36389] Triggered by Thread: 9 ..... Thread 9 name: Thread 9 Crashed: 0 libobjc.A.dylib 0x000000019eeff248 objc_retain_x8 + 16 1 AudioToolboxCore 0x00000001b2da9d80 auoop::RenderPipeUser::~RenderPipeUser() + 112 (AUOOPRenderPipePool.mm:400) 2 AudioToolboxCore 0x00000001b2e110b4 -[AUAudioUnit_XPC internalDeallocateRenderResources] + 92 (AUAudioUnit_XPC.mm:904) 3 AVFAudio 0x00000001bfa4cc04 AUInterfaceBaseV3::Uninitialize() + 60 (AUInterface.mm:524) 4 AVFAudio 0x00000001bfa894bc AVAudioEngineGraph::PerformCommand(AUGraphNodeBaseV3&, AVAudioEngineGraph::ENodeCommand, void*, unsigned int) const + 772 (AVAudioEngineGraph.mm:3317) 5 AVFAudio 0x00000001bfa93550 AVAudioEngineGraph::_Uninitialize(NSError**) + 132 (AVAudioEngineGraph.mm:1469) 6 AVFAudio 0x00000001bfa4b50c AVAudioEngineImpl::Stop(NSError**) + 396 (AVAudioEngine.mm:1081) 7 AVFAudio 0x00000001bfa4b094 -[AVAudioEngine stop] + 48 (AVAudioEngine.mm:193) 8 TextToSpeech 0x00000001c70b3c5c __55-[TTSSynthesisProviderAudioEngine renderSpeechRequest:]_block_invoke + 1756 (TTSSynthesisProviderAudioEngine.m:613) 9 libdispatch.dylib 0x00000001ae4b0740 _dispatch_call_block_and_release + 32 (init.c:1519) 10 libdispatch.dylib 0x00000001ae4b2378 _dispatch_client_callout + 20 (object.m:560) 11 libdispatch.dylib 0x00000001ae4b990c _dispatch_lane_serial_drain + 748 (queue.c:3885) 12 libdispatch.dylib 0x00000001ae4ba470 _dispatch_lane_invoke + 432 (queue.c:3976) 13 libdispatch.dylib 0x00000001ae4c5074 _dispatch_root_queue_drain_deferred_wlh + 288 (queue.c:6913) 14 libdispatch.dylib 0x00000001ae4c48e8 _dispatch_workloop_worker_thread + 404 (queue.c:6507) ... Thread 9 crashed with ARM Thread State (64-bit): x0: 0x0000000283309360 x1: 0x0000000000000000 x2: 0x0000000000000000 x3: 0x00000002833093c0 x4: 0x00000002833093c0 x5: 0x0000000101737740 x6: 0x0000000000000013 x7: 0x00000000ffffffff x8: 0x0000000283309360 x9: 0x3c788942d067009a x10: 0x0000000101547000 x11: 0x0000000000000000 x12: 0x00000000000007fb x13: 0x00000000000007fd x14: 0x000000001ee24020 x15: 0x0000000000000020 x16: 0x0000b1037f729360 x17: 0x000000037f729360 x18: 0x0000000000000000 x19: 0x0000000000000000 x20: 0x00000001016a8de8 x21: 0x0000000283e21d00 x22: 0x0000000283b3f1f8 x23: 0x0000000283098000 x24: 0x00000001bfb4fc35 x25: 0x00000001bfb4fc43 x26: 0x000000028033a688 x27: 0x0000000280c93090 x28: 0x0000000000000000 fp: 0x000000016fc86490 lr: 0x00000001b2da9d80 sp: 0x000000016fc863e0 pc: 0x000000019eeff248 cpsr: 0x1000 esr: 0x92000006 (Data Abort) byte read Translation fault
Posted Last updated
.
Post not yet marked as solved
0 Replies
344 Views
I have a prototype web view (in a WKWebView) that uses webkitSpeechRecognition for getting short snippets of text from speech. I'm not thrilled with the quality of the "recognition" - the text generally isn't very accurate. I'm wondering if I'll get any more accuracy by using the "native" SFSpeechRecognizer. It seems to me that webkitSpeechRecognition is likely just a Javascript wrapper interface for SFSpeechRecognizer, and the quality of the speech recognition won't improve. Does anyone know for sure if this is the case? Does webKitSpeechRecognition on iOS use SFSpeechRecognizer under the hood? Or are they two completely different recognition systems, and one could be more accurate than the other?
Posted
by mfpjacobt.
Last updated
.
Post not yet marked as solved
2 Replies
354 Views
Application is getting Crashed: AXSpeech EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x000056f023efbeb0 Crashed: AXSpeech 0 libobjc.A.dylib 0x4820 objc_msgSend + 32 1 libsystem_trace.dylib 0x6c34 _os_log_fmt_flatten_object + 116 2 libsystem_trace.dylib 0x5344 _os_log_impl_flatten_and_send + 1884 3 libsystem_trace.dylib 0x4bd0 _os_log + 152 4 libsystem_trace.dylib 0x9c48 _os_log_error_impl + 24 5 TextToSpeech 0xd0a8c _pcre2_xclass_8 6 TextToSpeech 0x3bc04 TTSSpeechUnitTestingMode 7 TextToSpeech 0x3f128 TTSSpeechUnitTestingMode 8 AXCoreUtilities 0xad38 -[NSArray(AXExtras) ax_flatMappedArrayUsingBlock:] + 204 9 TextToSpeech 0x3eb18 TTSSpeechUnitTestingMode 10 TextToSpeech 0x3c948 TTSSpeechUnitTestingMode 11 TextToSpeech 0x48824 AXAVSpeechSynthesisVoiceFromTTSSpeechVoice 12 TextToSpeech 0x49804 AXAVSpeechSynthesisVoiceFromTTSSpeechVoice 13 Foundation 0xf6064 __NSThreadPerformPerform + 264 14 CoreFoundation 0x37acc CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION + 28 15 CoreFoundation 0x36d48 __CFRunLoopDoSource0 + 176 16 CoreFoundation 0x354fc __CFRunLoopDoSources0 + 244 17 CoreFoundation 0x34238 __CFRunLoopRun + 828 18 CoreFoundation 0x33e18 CFRunLoopRunSpecific + 608 19 Foundation 0x2d4cc -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 212 20 TextToSpeech 0x24b88 TTSCFAttributedStringCreateStringByBracketingAttributeWithString 21 Foundation 0xb3154 NSThread__start + 732 com.livingMedia.AajTakiPhone_issue_3ceba855a8ad2d1af83655803dc13f70_crash_session_9081fa41ced440ae9a57c22cb432f312_DNE_0_v2_stacktrace.txt 22 libsystem_pthread.dylib 0x24d4 _pthread_start + 136 23 libsystem_pthread.dylib 0x1a10 thread_start + 8
Posted
by Rupendra.
Last updated
.
Post not yet marked as solved
1 Replies
676 Views
My app listens for verbal commands "Roll" & "Skip". It was working well until I used it while listening to a podcast in another app. I am getting a crash with the error: Thread 1: "required condition is false: IsFormatSampleRateAndChannelCountValid(format)" . It crashes when I am playing audio from the apps Snipd (a podcast app) or the Apple Podcast app. When I am playing audio from Youtube or the Apple Music it does not crash. This is the code for when I start listening for the commands: // MARK: - Speech Recognition func startListening() { do { try configureAudioSession() createRecognitionRequest() try prepareAudioEngine() } catch { print("Audio Engine error: \(error.localizedDescription)") } } private func configureAudioSession() throws { let audioSession = AVAudioSession.sharedInstance() try audioSession.setCategory(.playAndRecord, mode: .measurement, options: [.interruptSpokenAudioAndMixWithOthers, .duckOthers]) try audioSession.setActive(true, options: .notifyOthersOnDeactivation) } private func createRecognitionRequest() { recognitionRequest = SFSpeechAudioBufferRecognitionRequest() guard let recognitionRequest = recognitionRequest else { return } recognitionRequest.shouldReportPartialResults = true recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: handleRecognitionResult) } private func prepareAudioEngine() throws { let inputNode = audioEngine.inputNode inputNode.removeTap(onBus: 0) let inputFormat = inputNode.inputFormat(forBus: 0) inputNode.installTap(onBus: 0, bufferSize: 1024, format: inputFormat) { [weak self] (buffer, _) in self?.recognitionRequest?.append(buffer) } audioEngine.prepare() try audioEngine.start() isActuallyListening = true } Thanks
Posted Last updated
.
Post not yet marked as solved
0 Replies
233 Views
Hi Apple Team, We have a technical query regarding one feature- Audio Recognition and Live captioning. We are developing an app for deaf community to avoid communication barriers. We want to know if there is any possibility to recognize the sound from other applications in an iPhone and show live captions in our application (based on iOS).
Posted
by akhileshy.
Last updated
.
Post not yet marked as solved
0 Replies
427 Views
As the title suggests I am using AVAudioEngine for SpeechRecognition input & AVAudioPlayer for sound output. Apple says in this talk https://developer.apple.com/videos/play/wwdc2019/510 that the setVoiceProcessingEnabled function very usefully cancels the output from speaker to the mic. I set voiceProcessing on the Input and output nodes. It seems to work however the volume is low, even when the system volume is turned up. Any solution to this would be much appreciated.
Posted
by sleep9.
Last updated
.