Interactive Transcripts with WebVTT and Audio or Video

Since Whisper.cpp became available, I've been making transcripts of hour-long podcasts and other media. Using JavaScript and HTML5, I've built some locally running web pages that play my audio or video with the transcript below it. I can click anywhere in the transcript and the media jumps to that location.

To do this I have to manually hard-code the file names in my local HTML file, since I am not doing any actual web hosting. I'd like to know how I could do the same thing in Swift, either as a playground or as an application.

I'm not a programmer by profession, so I'm out of my depth, but this is for personal use on my Mac and possibly on my iPad.

Any help or helpful links would be appreciated. I'd need to:
- add an audio or video file (not streaming)
- add a WebVTT file
- show media controls
- show the transcript with the time codes hidden for readability
- sync with the audio and highlight the text currently being spoken
- jump/skip the media when another part of the transcript is tapped or clicked
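
To make the transcript side of this concrete, here is a rough sketch of how I imagine the parsing and highlighting logic might look in Swift. It is only a guess under some assumptions: the names `Cue`, `parseTimestamp`, `parseWebVTT`, and `activeCueIndex` are made up for illustration, and the parser only handles simple files (a `WEBVTT` header followed by cues separated by blank lines), not the full WebVTT spec. It uses Foundation only, so it has nothing to do with playback yet.

```swift
import Foundation

/// One caption from a WebVTT file: start and end times in seconds plus the spoken text.
/// (Hypothetical type, named here only for illustration.)
struct Cue: Equatable {
    let start: Double
    let end: Double
    let text: String
}

/// Converts "HH:MM:SS.mmm" or "MM:SS.mmm" into seconds; returns nil if malformed.
func parseTimestamp(_ raw: String) -> Double? {
    let parts = raw.trimmingCharacters(in: .whitespaces).split(separator: ":")
    guard parts.count == 2 || parts.count == 3 else { return nil }
    var seconds = 0.0
    for part in parts {
        guard let value = Double(String(part)) else { return nil }
        seconds = seconds * 60 + value
    }
    return seconds
}

/// Parses the text of a simple WebVTT file into cues, dropping the time codes
/// from the displayed text. Blocks without a "-->" line (header, notes) are skipped.
func parseWebVTT(_ contents: String) -> [Cue] {
    var cues: [Cue] = []
    let normalized = contents.replacingOccurrences(of: "\r\n", with: "\n")
    for block in normalized.components(separatedBy: "\n\n") {
        let lines = block.components(separatedBy: "\n")
        // The timing line contains "-->"; a line before it is an optional cue identifier.
        guard let timingIndex = lines.firstIndex(where: { $0.contains("-->") }) else { continue }
        let timing = lines[timingIndex].components(separatedBy: "-->")
        guard timing.count == 2 else { continue }
        // Cue settings such as "align:start" may follow the end time; keep only the first token.
        let endToken = timing[1].split(separator: " ").first.map(String.init) ?? ""
        guard let start = parseTimestamp(timing[0]),
              let end = parseTimestamp(endToken) else { continue }
        let text = lines[(timingIndex + 1)...]
            .joined(separator: " ")
            .trimmingCharacters(in: .whitespacesAndNewlines)
        cues.append(Cue(start: start, end: end, text: text))
    }
    return cues
}

/// Returns the index of the cue that should be highlighted at the given playback time.
func activeCueIndex(in cues: [Cue], at time: Double) -> Int? {
    cues.firstIndex { time >= $0.start && time < $0.end }
}
```

As far as I can tell, the playback half would then come from AVFoundation: `AVPlayer` has `addPeriodicTimeObserver(forInterval:queue:using:)`, which could feed the current time into `activeCueIndex`, and `seek(to:)`, which could jump to a tapped cue's `start`. I have not wired that part up, so treat it as a direction rather than a working design.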

Lastly, and more of a wish-list item: I'd like to be able to save a paired media file and VTT file within the app as a project, store hundreds of these, and just tap on a project to have it load the media and VTT I saved with it.
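
For the project idea, this is the kind of thing I have in mind, again only as a sketch. The `TranscriptProject` type and the `saveProjects`/`loadProjects` helpers are hypothetical names, and it assumes a project is just a title plus two file paths stored together in one JSON file. A sandboxed app on the iPad would probably need to copy the files into its own storage or keep bookmarks instead of raw paths, which this does not attempt.

```swift
import Foundation

/// A saved pairing of one media file with its transcript.
/// (Hypothetical type; the stored paths are plain strings for simplicity.)
struct TranscriptProject: Codable, Equatable {
    var title: String
    var mediaPath: String
    var transcriptPath: String
}

/// Writes the whole project list to a single JSON file.
func saveProjects(_ projects: [TranscriptProject], to url: URL) throws {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted]
    let data = try encoder.encode(projects)
    try data.write(to: url, options: .atomic)
}

/// Reads the project list back from that JSON file.
func loadProjects(from url: URL) throws -> [TranscriptProject] {
    let data = try Data(contentsOf: url)
    return try JSONDecoder().decode([TranscriptProject].self, from: data)
}
```

A few hundred entries like this would be a very small file, so a single JSON list seems like it could be enough before reaching for a database.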

The HTML version I got working locally was inspired by the following project:
https://github.com/umd-mith/webvtt-player
Demo page: https://umd-mith.github.io/webvtt-player/index.html