Post not yet marked as solved
I would like to contact a developer on the SSML team about the possibility of creating a new downloadable voice in a language that is not yet supported. I don't mind making a free contribution. Creating custom voices does not seem to be a solution, since only English is supported when creating a custom voice.
Post not yet marked as solved
I need a simple text-to-speech avatar in my iOS app. iOS already has Memojis ready to go, but I cannot find anything in the developer docs on how to access Memojis for use in app development. Am I missing something? Also, can anyone point me to any resources besides the Apple docs for using AVSpeechSynthesis?
Post not yet marked as solved
I'd like to allow the speech synthesizer to play on the device speaker while simultaneously mixing with a phone call. I've worked with a number of different configurations but am unable to find a configuration that achieves the functionality I am trying to achieve - or allows mixing with a phone call at all.
There is a flag: mixToTelephonyUplink that seems to suggest that at least some mixing with a phone call is possible using the speech synthesizer, but I'm currently unable to find almost any documentation about this flag besides basic API docs.
I've had some luck at least getting the synthesizer to always play to the speaker with the following audio session configuration, but the sound is never mixed with a phone call. Instead, it is ducked and muted while the phone call takes place. I've tried quite a few configuration combinations for the category and overrides, but nothing seems to work quite as I'd expect it to.
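synthesizer.mixToTelephonyUplink = true
// Note: Apple documents .defaultToSpeaker as valid only with the .playAndRecord category,
// so this setCategory call may fail, and try? silently discards the error.
try? audioSession.setCategory(.playback, mode: .voicePrompt, options: [.mixWithOthers, .defaultToSpeaker])
try? audioSession.setActive(true, options: [])
try? audioSession.overrideOutputAudioPort(.speaker)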
Is there some documentation for this off the beaten path that I'm somehow missing? I'm going to continue with guess and check, but I'm starting to think this flag, and the functionality it implies, was never fully implemented.
Post not yet marked as solved
I am using SpeechSynthesizer and SpeechRecognizer. After a recognition task completes, the SpeechSynthesizer stops producing audible output.
I am using the latest SwiftUI in Xcode 15.2, deploying to an iPhone 14 Pro running iOS 17.3.1.
Here's my SpeechSynthesizer function:
func speak(_ text: String) {
    let utterance = AVSpeechUtterance(string: text)
    utterance.voice = AVSpeechSynthesisVoice(identifier: self.appState.chatParameters.voiceIdentifer)
    utterance.rate = 0.5
    speechSynthesizer.speak(utterance)
}
And here's the code for setting up the SpeechRecognizer (borrowed from https://www.linkedin.com/pulse/transcribing-audio-text-swiftui-muhammad-asad-chattha):
private static func prepareEngine() throws -> (AVAudioEngine, SFSpeechAudioBufferRecognitionRequest) {
    print("prepareEngine()")
    let audioEngine = AVAudioEngine()
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.shouldReportPartialResults = false
    request.requiresOnDeviceRecognition = true
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.playAndRecord)
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
    let inputNode = audioEngine.inputNode
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) {
        (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        request.append(buffer)
    }
    audioEngine.prepare()
    try audioEngine.start()
    return (audioEngine, request)
}
SpeechSynthesizer works fine as long as I don't call prepareEngine().
Thanks in advance for any assistance.
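One thing that may be worth trying (a sketch, not a confirmed fix): prepareEngine() leaves the shared session in .playAndRecord, which routes output to the receiver unless .defaultToSpeaker is set, so the synthesizer may still be speaking, just very quietly. The helper below is hypothetical; it assumes you keep a reference to the engine returned by prepareEngine() and that the app only needs playback while speaking.

```swift
import AVFoundation

/// Hypothetical helper: call once the recognition task has completed,
/// before asking the synthesizer to speak again.
func restorePlayback(after audioEngine: AVAudioEngine) {
    // Stop capturing and remove the tap installed in prepareEngine().
    audioEngine.stop()
    audioEngine.inputNode.removeTap(onBus: 0)

    do {
        let session = AVAudioSession.sharedInstance()
        // Assumption: recording is not needed while the synthesizer speaks.
        try session.setCategory(.playback, mode: .default, options: [])
        try session.setActive(true)
    } catch {
        print("Could not restore the playback session: \(error)")
    }
}
```

If recording has to stay active, an alternative to test is keeping .playAndRecord and adding the .defaultToSpeaker option when the category is set.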
Post not yet marked as solved
Hello,
I’ve been trying to play system sounds in my app, but this hasn’t really been working. I am frequently switching between speech recognition (Speech framework) and sounds, so perhaps that’s where the issue lies. However, despite my best efforts, I haven't been able to solve the issue. I've been resetting the AVAudioSession category before playing a sound or starting speech recognition (as depicted in the code snippet below), to no avail. Has this happened to anyone else? Does anybody know how to fix the issue?
recognizer = nil
try? AVAudioSession.sharedInstance().setCategory(.playback, mode: .default, options: [])
try? AVAudioSession.sharedInstance().setActive(true)
AudioServicesPlaySystemSound(1113)
try? AVAudioSession.sharedInstance().setCategory(.record, mode: .spokenAudio, options: [])
try? AVAudioSession.sharedInstance().setActive(true)
recognizer = SpeechRecognition(word: wordSheet)
recognizer!.startRecognition()
Thank you.
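A possible cause, offered as an assumption rather than a diagnosis: AudioServicesPlaySystemSound returns immediately, so the switch to the .record category can happen before the sound has finished playing. The sketch below waits for the completion callback before reconfiguring the session. The function name and the startRecognition closure are hypothetical stand-ins for the recognizer setup in the snippet above.

```swift
import AVFoundation
import AudioToolbox

/// Hypothetical helper: plays system sound 1113, then switches to recording.
func playSoundThenListen(startRecognition: @escaping () -> Void) {
    let session = AVAudioSession.sharedInstance()
    do {
        try session.setCategory(.playback, mode: .default, options: [])
        try session.setActive(true)
    } catch {
        print("Playback session setup failed: \(error)")
    }

    // The completion block runs after the sound has finished playing.
    AudioServicesPlaySystemSoundWithCompletion(1113) {
        // Hop to the main queue before touching app state.
        DispatchQueue.main.async {
            do {
                try session.setCategory(.record, mode: .spokenAudio, options: [])
                try session.setActive(true)
            } catch {
                print("Record session setup failed: \(error)")
            }
            startRecognition()
        }
    }
}
```

Using do/catch instead of try? also surfaces any session errors that the original snippet would silently drop.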
Post not yet marked as solved
I'm trying to add a USB mic to my Mini running the latest Sonoma software, but the audio is full of crackles. Why isn't it clean?
Post not yet marked as solved
I want to develop an AI assistant iOS application using the Whisper and ChatGPT OpenAI APIs. I am implementing the following steps:
Use an audio engine to record the user's voice
Send each audio chunk to Whisper for speech-to-text
Send that text to the ChatGPT API to get a response
Send that response to the speech synthesizer to speak it through the built-in speaker
In this process, I don't want to disable the microphone, because the user can interrupt the speech synthesizer at any time. It should be real time and feel like a continuous call between the user and the AI assistant.
Problem: When the user speaks, the microphone takes the input and appends it to the audio engine recording file. That chunk is sent to Whisper for transcription, the transcribed text is sent to the ChatGPT API to get a response, and the response is sent to the speech synthesizer, which generates output on the speaker. The issue is that the microphone then picks up the synthesizer's voice from the speaker, creating a loop.
What can I do to stop the microphone from picking up output from the iPhone speaker? Talking Tom, callAnnie, and many other iOS applications continuously use the microphone and play output through the speaker without overlap or looping. Please suggest possible approaches.
I have tried every audio session category I could find (record, playback, playAndRecord, etc.), and none of them keeps the speaker output out of the microphone. As I understand it, the microphone should not pick up device-generated audio. What could be a possible solution? If my approach is wrong, I am also open to suggestions and guidance.
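One approach that may help here is the voice processing (echo cancellation) mode of AVAudioEngine's input node, combined with a .playAndRecord session. The following is a minimal sketch under those assumptions; makeEchoCancellingEngine is a hypothetical name, and whether it fully removes the synthesizer's voice from the captured audio needs testing on a device.

```swift
import AVFoundation

/// Hypothetical factory: returns an engine whose input node has
/// voice processing (including echo cancellation) enabled.
func makeEchoCancellingEngine() throws -> AVAudioEngine {
    let session = AVAudioSession.sharedInstance()
    // .voiceChat is the session mode intended for two-way voice audio.
    try session.setCategory(.playAndRecord, mode: .voiceChat, options: [.defaultToSpeaker])
    try session.setActive(true)

    let engine = AVAudioEngine()
    // Voice processing has to be enabled while the engine is stopped.
    try engine.inputNode.setVoiceProcessingEnabled(true)
    return engine
}
```

After this, install the recording tap on engine.inputNode and start the engine as before; the tap should then receive the processed microphone signal.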
Post not yet marked as solved
I have AVSpeechSynthesizer built into 6 apps for iPad/iOS that were working fine until recently. Sometime between November 2023 and February 2024, all of the apps stopped speaking for no apparent reason. There have been both Xcode and iOS updates in the interim, but I cannot be sure which caused it. It works neither in the Xcode simulator nor on devices.
What did Apple change?
Xcode 15.2, iOS 17+, SwiftUI
let synth = AVSpeechSynthesizer()
var thisText = ""
func sayit(thisText: String) {
    let utterance = AVSpeechUtterance(string: thisText)
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
    utterance.rate = 0.4
    utterance.preUtteranceDelay = 0.1
    synth.speak(utterance)
}
Post not yet marked as solved
I am trying to use the Speech Synthesizer to speak the pronunciation of a word in British English rather than play a local audio file which I had before. However, I keep getting this in the debugger:
#FactoryInstall Unable to query results, error: 5
Unable to list voice folder
Unable to list voice folder
Unable to list voice folder
IPCAUClient.cpp:129 IPCAUClient: bundle display name is nil
Unable to list voice folder
Here is my code. Any suggestions?
func playSampleAudio() {
    let speechSynthesizer = AVSpeechSynthesizer()
    let speechUtterance = AVSpeechUtterance(string: currentWord)
    // Search for a voice with a British English accent.
    let voices = AVSpeechSynthesisVoice.speechVoices()
    var foundBritishVoice = false
    for voice in voices {
        if voice.language == "en-GB" {
            speechUtterance.voice = voice
            foundBritishVoice = true
            break
        }
    }
    if !foundBritishVoice {
        print("British English voice not found. Using default voice.")
    }
    // Configure the utterance's properties as needed.
    speechUtterance.rate = AVSpeechUtteranceDefaultSpeechRate
    speechUtterance.pitchMultiplier = 1.0
    speechUtterance.volume = 1.0
    // Speak the word.
    speechSynthesizer.speak(speechUtterance)
}
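One thing to check, as a guess rather than a diagnosis of the log messages: the synthesizer above is a local variable, so it can be deallocated as soon as playSampleAudio() returns, possibly before any audio is produced. The sketch below keeps the synthesizer in a property and uses AVSpeechSynthesisVoice(language:) instead of the manual loop. WordSpeaker is a hypothetical type, not part of the original code.

```swift
import AVFoundation

/// Hypothetical wrapper that owns the synthesizer for as long as it is needed.
final class WordSpeaker {
    // Stored property, so the synthesizer outlives each call to speak(_:).
    private let synthesizer = AVSpeechSynthesizer()

    func speak(_ word: String) {
        let utterance = AVSpeechUtterance(string: word)
        // Returns nil when no voice for the language code is available.
        if let britishVoice = AVSpeechSynthesisVoice(language: "en-GB") {
            utterance.voice = britishVoice
        } else {
            print("British English voice not found. Using default voice.")
        }
        utterance.rate = AVSpeechUtteranceDefaultSpeechRate
        synthesizer.speak(utterance)
    }
}
```

Usage would be to create one WordSpeaker as a long-lived property of the view model or controller and call speak(currentWord) on it.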
Post not yet marked as solved
I recently updated to Xcode 14.0. I am building an iOS app to convert recorded audio into text. I got the following errors while testing the application in the simulator (iOS 16.0).
[SpeechFramework] -[SFSpeechRecognitionTask handleSpeechRecognitionDidFailWithError:]_block_invoke Ignoring subsequent recongition error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"
Error Domain=kAFAssistantErrorDomain Code=1107 "(null)"
I need to know what these error codes mean and why the errors occurred.
Post not yet marked as solved
I am using SFSpeechRecognizer to perform speech recognition, but I am getting the following error.
[SpeechFramework] -[SFSpeechRecognitionTask localSpeechRecognitionClient:speechRecordingDidFail:]_block_invoke Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"
Setting requiresOnDeviceRecognition to false works correctly, but previously it worked with true and produced no error.
The value of supportsOnDeviceRecognition was true, so the device reports that it supports on-device recognition.
iPad Pro 11-inch, iOS 16.5.
Is this expected behavior?
Post not yet marked as solved
Is there a way to extract the list of words recognized by the Speech framework?
I'm trying to filter out words that won't appear in the transcription output, but to do that I'll need a list of words that can appear. SFSpeechLanguageModel.Configuration can be initialized with a vocabulary, but there doesn't seem to be a way to read it, and while there are ways to create custom vocabularies, I have yet to find a way to retrieve them.
I added the Natural Language tag in case the framework might contribute to a solution
Post not yet marked as solved
I'm working with the new speech recognition APIs in iOS 17 and have encountered some confusion regarding the use of URLs in SFSpeechLanguageModel.prepareCustomLanguageModel and the SFSpeechLanguageModel.Configuration.
In the SFSpeechLanguageModel.Configuration initializer, I provide a URL that points to a custom language model .bin file. However, there's also a URL parameter in the prepareCustomLanguageModel method. I'm unclear about the purpose of this second URL and how it differs from the one in the configuration.
To add to the confusion, the documentation for these new APIs is not fully fleshed out at this point. I've tried injecting both .bin files (for the custom language model and the one for prepareCustomLanguageModel) into the same URL, but the results haven't clarified their distinct roles.
In experiments I conducted, I checked the confidence level of recognized phrases from the same audio file with and without the custom language model .bin file. Surprisingly, the confidence levels remained the same in both scenarios, leading me to question if the custom model is being utilized correctly.
Has anyone else worked with these new APIs and can provide clarity on:
The distinct roles of the URLs in SFSpeechLanguageModel.Configuration and prepareCustomLanguageModel.
Why there might be no noticeable difference in confidence levels when using a custom language model.
Any insights or experiences with these new aspects of the iOS 17 speech recognition API would be greatly appreciated.
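For what it's worth, my understanding (an assumption, not something the documentation spells out) is that the URL passed to prepareCustomLanguageModel points at the exported training data file, while the URLs in the Configuration are locations where the prepared model and vocabulary are written and later read by the recognizer. A minimal sketch under that assumption follows; the function name, the "LM" and "Vocab" file names, and the client identifier are placeholders.

```swift
import Speech

/// Hypothetical helper: prepares a custom language model and returns a
/// request configured to use it.
@available(iOS 17, macOS 14, *)
func makeCustomizedRequest(trainingDataURL: URL,
                           outputDirectory: URL) async throws -> SFSpeechAudioBufferRecognitionRequest {
    // Assumed role: destinations for the prepared model files.
    let configuration = SFSpeechLanguageModel.Configuration(
        languageModel: outputDirectory.appendingPathComponent("LM"),
        vocabulary: outputDirectory.appendingPathComponent("Vocab"))

    // Assumed role: trainingDataURL is the exported .bin training data.
    try await SFSpeechLanguageModel.prepareCustomLanguageModel(
        for: trainingDataURL,
        clientIdentifier: "com.example.myapp", // placeholder identifier
        configuration: configuration)

    let request = SFSpeechAudioBufferRecognitionRequest()
    // Custom language models apply to on-device recognition.
    request.requiresOnDeviceRecognition = true
    request.customizedLanguageModel = configuration
    return request
}
```

If the request never has customizedLanguageModel set, or recognition is not on-device, identical confidence values with and without the .bin file would be the expected outcome.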
Post not yet marked as solved
Hi,
I'm a French developer, and I downloaded the Recognizing Speech in Live Audio sample code from the Apple Developer website. I tried to run the data generator command after changing the locale identifier from 'en_US' to 'fr' in the data generator main file, but when I ran the command in Xcode, I got this error message: "Identifier 'fr' does not parse into two elements."
I checked the XML files associated with the bin archive file, and the identifiers are not correct (they keep the 'en-US' value).
Thanks for your help!
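The error text suggests the generator expects an identifier with two parts, a language and a region (for example 'fr_FR' rather than 'fr'); that reading is an assumption based only on the message. A small check for this shape, using a hypothetical helper name, could look like:

```swift
/// Hypothetical check: true when the identifier has exactly a language
/// part and a region part, separated by "_" or "-".
func hasLanguageAndRegion(_ identifier: String) -> Bool {
    let parts = identifier.split(whereSeparator: { $0 == "_" || $0 == "-" })
    return parts.count == 2
}

print(hasLanguageAndRegion("fr"))     // false
print(hasLanguageAndRegion("fr_FR"))  // true
print(hasLanguageAndRegion("en_US"))  // true
```

Trying 'fr_FR' in the data generator would be the first thing to test.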
Post not yet marked as solved
I see a lot of crashes on the iOS 17 beta related to Text To Speech. Does anybody have a clue why TTS crashes? Is anybody else seeing the same problem?
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x000000037f729380
Exception Codes: 0x0000000000000001, 0x000000037f729380
VM Region Info: 0x37f729380 is not in any region. Bytes after previous region: 3748828033 Bytes before following region: 52622617728
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
MALLOC_NANO 280000000-2a0000000 [512.0M] rw-/rwx SM=PRV
---> GAP OF 0xd20000000 BYTES
commpage (reserved) fc0000000-1000000000 [ 1.0G] ---/--- SM=NUL ...(unallocated)
Termination Reason: SIGNAL 11 Segmentation fault: 11
Terminating Process: exc handler [36389]
Triggered by Thread: 9
.....
Thread 9 name:
Thread 9 Crashed:
0 libobjc.A.dylib 0x000000019eeff248 objc_retain_x8 + 16
1 AudioToolboxCore 0x00000001b2da9d80 auoop::RenderPipeUser::~RenderPipeUser() + 112 (AUOOPRenderPipePool.mm:400)
2 AudioToolboxCore 0x00000001b2e110b4 -[AUAudioUnit_XPC internalDeallocateRenderResources] + 92 (AUAudioUnit_XPC.mm:904)
3 AVFAudio 0x00000001bfa4cc04 AUInterfaceBaseV3::Uninitialize() + 60 (AUInterface.mm:524)
4 AVFAudio 0x00000001bfa894bc AVAudioEngineGraph::PerformCommand(AUGraphNodeBaseV3&, AVAudioEngineGraph::ENodeCommand, void*, unsigned int) const + 772 (AVAudioEngineGraph.mm:3317)
5 AVFAudio 0x00000001bfa93550 AVAudioEngineGraph::_Uninitialize(NSError**) + 132 (AVAudioEngineGraph.mm:1469)
6 AVFAudio 0x00000001bfa4b50c AVAudioEngineImpl::Stop(NSError**) + 396 (AVAudioEngine.mm:1081)
7 AVFAudio 0x00000001bfa4b094 -[AVAudioEngine stop] + 48 (AVAudioEngine.mm:193)
8 TextToSpeech 0x00000001c70b3c5c __55-[TTSSynthesisProviderAudioEngine renderSpeechRequest:]_block_invoke + 1756 (TTSSynthesisProviderAudioEngine.m:613)
9 libdispatch.dylib 0x00000001ae4b0740 _dispatch_call_block_and_release + 32 (init.c:1519)
10 libdispatch.dylib 0x00000001ae4b2378 _dispatch_client_callout + 20 (object.m:560)
11 libdispatch.dylib 0x00000001ae4b990c _dispatch_lane_serial_drain + 748 (queue.c:3885)
12 libdispatch.dylib 0x00000001ae4ba470 _dispatch_lane_invoke + 432 (queue.c:3976)
13 libdispatch.dylib 0x00000001ae4c5074 _dispatch_root_queue_drain_deferred_wlh + 288 (queue.c:6913)
14 libdispatch.dylib 0x00000001ae4c48e8 _dispatch_workloop_worker_thread + 404 (queue.c:6507)
...
Thread 9 crashed with ARM Thread State (64-bit):
x0: 0x0000000283309360 x1: 0x0000000000000000 x2: 0x0000000000000000 x3: 0x00000002833093c0
x4: 0x00000002833093c0 x5: 0x0000000101737740 x6: 0x0000000000000013 x7: 0x00000000ffffffff
x8: 0x0000000283309360 x9: 0x3c788942d067009a x10: 0x0000000101547000 x11: 0x0000000000000000
x12: 0x00000000000007fb x13: 0x00000000000007fd x14: 0x000000001ee24020 x15: 0x0000000000000020
x16: 0x0000b1037f729360 x17: 0x000000037f729360 x18: 0x0000000000000000 x19: 0x0000000000000000
x20: 0x00000001016a8de8 x21: 0x0000000283e21d00 x22: 0x0000000283b3f1f8 x23: 0x0000000283098000
x24: 0x00000001bfb4fc35 x25: 0x00000001bfb4fc43 x26: 0x000000028033a688 x27: 0x0000000280c93090
x28: 0x0000000000000000 fp: 0x000000016fc86490 lr: 0x00000001b2da9d80
sp: 0x000000016fc863e0 pc: 0x000000019eeff248 cpsr: 0x1000
esr: 0x92000006 (Data Abort) byte read Translation fault
Post not yet marked as solved
I have a prototype web view (in a WKWebView) that uses webkitSpeechRecognition for getting short snippets of text from speech. I'm not thrilled with the quality of the "recognition" - the text generally isn't very accurate.
I'm wondering if I'll get any more accuracy by using the "native" SFSpeechRecognizer. It seems to me that webkitSpeechRecognition is likely just a JavaScript wrapper interface for SFSpeechRecognizer, in which case the quality of the speech recognition won't improve.
Does anyone know for sure if this is the case? Does webkitSpeechRecognition on iOS use SFSpeechRecognizer under the hood? Or are they two completely different recognition systems, and one could be more accurate than the other?
Post not yet marked as solved
The application is crashing in AXSpeech:
EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x000056f023efbeb0
Crashed: AXSpeech
0 libobjc.A.dylib 0x4820 objc_msgSend + 32
1 libsystem_trace.dylib 0x6c34 _os_log_fmt_flatten_object + 116
2 libsystem_trace.dylib 0x5344 _os_log_impl_flatten_and_send + 1884
3 libsystem_trace.dylib 0x4bd0 _os_log + 152
4 libsystem_trace.dylib 0x9c48 _os_log_error_impl + 24
5 TextToSpeech 0xd0a8c _pcre2_xclass_8
6 TextToSpeech 0x3bc04 TTSSpeechUnitTestingMode
7 TextToSpeech 0x3f128 TTSSpeechUnitTestingMode
8 AXCoreUtilities 0xad38 -[NSArray(AXExtras) ax_flatMappedArrayUsingBlock:] + 204
9 TextToSpeech 0x3eb18 TTSSpeechUnitTestingMode
10 TextToSpeech 0x3c948 TTSSpeechUnitTestingMode
11 TextToSpeech 0x48824 AXAVSpeechSynthesisVoiceFromTTSSpeechVoice
12 TextToSpeech 0x49804 AXAVSpeechSynthesisVoiceFromTTSSpeechVoice
13 Foundation 0xf6064 __NSThreadPerformPerform + 264
14 CoreFoundation 0x37acc CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION + 28
15 CoreFoundation 0x36d48 __CFRunLoopDoSource0 + 176
16 CoreFoundation 0x354fc __CFRunLoopDoSources0 + 244
17 CoreFoundation 0x34238 __CFRunLoopRun + 828
18 CoreFoundation 0x33e18 CFRunLoopRunSpecific + 608
19 Foundation 0x2d4cc -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 212
20 TextToSpeech 0x24b88 TTSCFAttributedStringCreateStringByBracketingAttributeWithString
21 Foundation 0xb3154 NSThread__start + 732
22 libsystem_pthread.dylib 0x24d4 _pthread_start + 136
23 libsystem_pthread.dylib 0x1a10 thread_start + 8
Post not yet marked as solved
My app listens for the verbal commands "Roll" and "Skip". It was working well until I used it while listening to a podcast in another app.
I am getting a crash with the error: Thread 1: "required condition is false: IsFormatSampleRateAndChannelCountValid(format)".
It crashes when I am playing audio from Snipd (a podcast app) or the Apple Podcasts app.
When I am playing audio from YouTube or Apple Music, it does not crash.
This is the code for when I start listening for the commands:
// MARK: - Speech Recognition
func startListening() {
    do {
        try configureAudioSession()
        createRecognitionRequest()
        try prepareAudioEngine()
    } catch {
        print("Audio Engine error: \(error.localizedDescription)")
    }
}
private func configureAudioSession() throws {
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.playAndRecord, mode: .measurement, options: [.interruptSpokenAudioAndMixWithOthers, .duckOthers])
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
}
private func createRecognitionRequest() {
    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    guard let recognitionRequest = recognitionRequest else { return }
    recognitionRequest.shouldReportPartialResults = true
    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: handleRecognitionResult)
}
private func prepareAudioEngine() throws {
    let inputNode = audioEngine.inputNode
    inputNode.removeTap(onBus: 0)
    let inputFormat = inputNode.inputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: inputFormat) { [weak self] (buffer, _) in
        self?.recognitionRequest?.append(buffer)
    }
    audioEngine.prepare()
    try audioEngine.start()
    isActuallyListening = true
}
Thanks
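The assertion in the error message fires when the format passed to installTap has an invalid sample rate or channel count, which can happen if the input is unavailable at that moment. A defensive sketch, not a confirmed fix for the podcast case, is to validate the node's format before installing the tap. The names isUsableTapFormat and installTapIfPossible are hypothetical.

```swift
#if canImport(AVFoundation)
import AVFoundation
#endif

/// Hypothetical check: a tap format needs a positive sample rate and
/// at least one channel.
func isUsableTapFormat(sampleRate: Double, channelCount: UInt32) -> Bool {
    return sampleRate > 0 && channelCount > 0
}

#if canImport(AVFoundation)
/// Installs a tap only when the input node reports a usable format.
/// Returns false instead of crashing when it does not.
func installTapIfPossible(on engine: AVAudioEngine,
                          handler: @escaping (AVAudioPCMBuffer) -> Void) -> Bool {
    let inputNode = engine.inputNode
    inputNode.removeTap(onBus: 0)
    // The tap receives the node's output, so query the output format.
    let format = inputNode.outputFormat(forBus: 0)
    guard isUsableTapFormat(sampleRate: format.sampleRate,
                            channelCount: format.channelCount) else {
        return false
    }
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
        handler(buffer)
    }
    return true
}
#endif
```

When installTapIfPossible returns false, the app could show an error or retry after the other app's audio session changes, rather than terminating.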
Post not yet marked as solved
Hi Apple Team,
We have a technical query regarding one feature: audio recognition and live captioning. We are developing an app for the deaf community to reduce communication barriers.
We want to know whether it is possible to recognize sound from other applications on an iPhone and show live captions in our iOS application.
Post not yet marked as solved
As the title suggests, I am using AVAudioEngine for speech recognition input and AVAudioPlayer for sound output.
Apple says in this talk https://developer.apple.com/videos/play/wwdc2019/510 that the setVoiceProcessingEnabled function cancels the speaker output picked up by the mic. I set voice processing on the input and output nodes.
It seems to work; however, the volume is low, even when the system volume is turned up. Any solution to this would be much appreciated.