[tw5] Speech to Text ...

papiche · May 6, 2024, 12:54pm

I'm investigating this Speech to Text feature.

Found that discussion.

I also found transcribejs, a “wasm” port of whisper.cpp, which could revive that thread.

tw-FRed · May 6, 2024, 1:22pm

Thanks for the link!
The speech-to-text plugin uses the SpeechRecognition browser API, which according to MDN isn’t available on Firefox and might use server-side recognition engines on Chrome-based browsers, preventing offline use.

Source: SpeechRecognition - Web APIs | MDN

Depending on the resources available on the device, transcribejs might become a more thorough solution (local transcription AFAICT).
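To illustrate the availability issue described above, here is a minimal TypeScript sketch of feature detection for the SpeechRecognition API, falling back to a local engine when it is missing. The helper name `pickSttBackend`, the `SpeechGlobals` type, and the backend labels are hypothetical and are not taken from the plugin or from transcribejs. The only thing it relies on is that Chrome-based browsers expose the constructor under the `webkit` prefix.

```typescript
// Hypothetical helper: the names and return values are illustrative only.
type SttBackend = "web-speech-api" | "local-wasm";

// The two global names under which browsers may expose the constructor.
interface SpeechGlobals {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

function pickSttBackend(globals: SpeechGlobals): SttBackend {
  // Prefer the unprefixed name, then the webkit-prefixed one.
  const ctor = globals.SpeechRecognition ?? globals.webkitSpeechRecognition;
  // No constructor found: assume a local (e.g. wasm-based) engine is needed.
  return typeof ctor === "function" ? "web-speech-api" : "local-wasm";
}
```

In a browser this could be called as `pickSttBackend(window as unknown as SpeechGlobals)`. Note that detecting the API says nothing about whether recognition runs locally or on a server, so offline use would still need the local fallback.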

Fred

TW_Tones · May 6, 2024, 2:09pm

I have Windows 11 and used to dislike the Microsoft speech to text, but it has improved a lot. It works from the operating system, so it works across all browsers, without add-ons. Press Windows+H to start.

Today I had voice memos on my phone, which I played through my microphone and had a few KB of text generated with very few errors.
Otherwise I use the microphone icon on my mobile's keyboard to do the same in the Google ecosystem.
There is already quite a good one for TiddlyWiki, which I have installed, but the ones supplied by the operating system have lower resource needs.

linonetwo · May 7, 2024, 8:26am

Do you use an IME? It works just like the keyboard on mobile. I think most IMEs provide an STT feature.

And on mobile, everyone uses a keyboard, and most of them have STT out of the box.

papiche · May 7, 2024, 4:06pm

Hey everyone,

Thanks for the insightful discussion and for sharing your experiences with speech-to-text (STT) solutions. It’s always fascinating to see how technology evolves and improves over time.

@tw-FRed, I appreciate the heads-up regarding the SpeechRecognition API and its limitations on different browsers. Indeed, having a solution that doesn’t rely on server-side processing could offer more flexibility and privacy. Transcribejs sounds promising in this regard, especially with its potential for local transcription.

@TW_Tones, your feedback on the Microsoft speech-to-text functionality is interesting. It’s great to hear that it’s becoming more reliable, and using native operating system features certainly has its advantages in terms of compatibility and resource efficiency.

@linonetwo, regarding your point about using an IME (Input Method Editor) on mobile devices, it’s true that many IMEs now include STT capabilities, making it even more convenient for users to dictate text. This integration into everyday tools like keyboards further emphasizes the importance of accessibility and ease of use.

As for embedding a voice-to-text transcription application into TW, it sounds like a valuable addition for enhancing accessibility and productivity within the platform. Making such tools readily available can indeed contribute to their adoption as “common goods,” aligning with the principles of free software.

Thanks again for the enlightening conversation. Let’s continue exploring and sharing ideas to improve accessibility and usability in our digital tools.