VOSK: How I Blog With Speech Recognition

Golem at the Gates


Today I would like to share the pitfalls I see in the accelerated application of artificial intelligence technology and explore potential solutions. Overall, I seek to discuss how we can preserve humanity and coherence on the web. Specifically, I would like to discuss how speech recognition can be leveraged in the creative process with the VOSK speech recognition API.

It seems not a day goes by when I do not hear of a new and controversial implementation of artificial intelligence. AI-generated art is taking accolades from human artists, and AI automobiles are piloting themselves on the highway. The number of AI innovations that have become commonplace and overt in everyone’s life over the past few years is staggering.

The biggest trend in AI currently seems to be chatbot algorithms. These systems are an echo of programs like ELIZA. ELIZA was a chatbot-style program created in the mid-1960s. It was designed to take text input as a prompt and provide text output. ELIZA’s responses mimic the perspective and style of a psychotherapist.

Modern chatbot algorithms can be used to compose ideas from text prompts with more fluid character than the likes of ELIZA. Their perspective is an elastic silhouette of humanity scraped from the web. Modern chatbot programs are uncanny mimics. Even large news organizations are looking to leverage the perceived benefits of literate chatbot AI for their journalism.

Currently, the most visible product of chatbot software is digital detritus. We’ve seen it on the internet for years now. You use a search engine to find something and land on a website that contains all your search terms and seems credible at first, but when you read the full copy it’s inaccurate and without any human charisma. These zombie sites are the work of chatbot AI models.

Incoherence

There is a good argument to be made that people will take the course of least resistance, and that genuine human voices will soon be drowned out in a cacophony of incoherent robo-babel, rendering the internet essentially useless. Like Nimrod, we’ve built a new Tower of Babel.

Very few people experience a large network of websites connected by URLs. Instead they find themselves in closed systems curated by obscured AI algorithms. The big websites people frequent discourage external links, intentionally neglect RSS, and require an account to access. Those in the cathedral are afraid of the outside lest their discernment fail them, and the people in the bazaar outside are wondering why business is so slow. I encourage you to be an independent and active participant. The internet is declining because of the centralization of information. Everyone should be their own platform.

Speech recognition

Today I would like to share an artificial intelligence tool I use to increase my efficiency without the loss of human touch. Speech recognition is an old form of artificial intelligence that has been studied for accessibility purposes and developed for people with disabilities. Speech recognition turns human speech into text. The first working implementations of speech recognition began to appear in the early 1950s at Bell Labs. Since then it has had a long and storied history of development. Although it is not a remedy to the chatbot conundrum, I suspect that speech recognition can be an ally in preventing the degradation of the literary nature of the internet. Using speech recognition for dictation preserves the human element of literature while providing the benefits of artificial intelligence.

VOSK

VOSK is an open source speech recognition API that runs locally on your hardware. It uses artificial intelligence models for speech recognition. These models are accurate enough to dictate blogs. Many speech recognition programs are proprietary web-based applications. If you have a smartphone you have almost certainly encountered this type of program. I consider VOSK the superior choice in speech recognition because it is open source and does not depend on internet connectivity. I outline my articles with VOSK and then edit the mistakes that inevitably come from this sort of workflow.
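To give a feel for what running VOSK locally looks like, here is a minimal sketch using the vosk Python package. It is not my own dictation setup, just an illustration: the function names extract_text and transcribe_wav are made up for this example, and it assumes you have already downloaded a model directory and recorded a 16-bit mono PCM WAV file.

```python
import json
import wave


def extract_text(result_json):
    """Pull the recognized text out of one VOSK JSON result string."""
    return json.loads(result_json).get("text", "").strip()


def transcribe_wav(wav_path, model_dir):
    """Transcribe a 16-bit mono PCM WAV file with a local VOSK model."""
    # Imported here so extract_text stays usable without vosk installed.
    from vosk import Model, KaldiRecognizer

    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono PCM WAV file")

        recognizer = KaldiRecognizer(Model(model_dir), wav.getframerate())
        pieces = []
        while True:
            chunk = wav.readframes(4000)
            if not chunk:
                break
            # AcceptWaveform is truthy when an utterance has been completed.
            if recognizer.AcceptWaveform(chunk):
                pieces.append(extract_text(recognizer.Result()))
        # Collect whatever audio was still buffered at the end of the file.
        pieces.append(extract_text(recognizer.FinalResult()))

    return " ".join(piece for piece in pieces if piece)
```

Calling transcribe_wav("draft.wav", "path/to/model") would return the dictated text as one string, where both arguments are placeholders for your own recording and model folder. Everything happens on your own machine, with no network connection involved.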

VOSK can use models trained on various data sets. I use the Big US English model with dynamic graph. I tried some of the more advanced speech models, but the larger the model, the more memory it will use.
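If you want to compare models before committing to one, a rough sketch like the following adds up the size of a downloaded model folder. The helper name model_size_mb is hypothetical, and size on disk is only a loose proxy for memory use at run time, so treat the number as a hint and not a measurement.

```python
import os


def model_size_mb(model_dir):
    """Return the total size of all files under model_dir in mebibytes."""
    total_bytes = 0
    for root, _dirs, files in os.walk(model_dir):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
    return total_bytes / (1024 * 1024)
```

Running it against each unpacked model directory gives a quick way to rank candidates from lightest to heaviest.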

Nerd-dictation is a great companion to VOSK. I integrated the nerd-dictation software into my GNU/Linux system for easy access to VOSK. The key chord super-q+d begins dictation and then super-q+e ends dictation. Thanks to VOSK, I can express my ideas more fluently, because they are not interrupted by the limitation of my typing speed.
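For anyone who wants to wire up something similar, here is a small sketch of a wrapper that key chords could call. It assumes nerd-dictation is on your PATH and relies on its begin and end subcommands and the --vosk-model-dir option. The function names and the usage text are my own invention for this example, and how you bind keys to it depends on your window manager, which I leave out.

```python
import subprocess
import sys


def begin_command(model_dir):
    """Build the command line that starts dictation with a given model."""
    return ["nerd-dictation", "begin", "--vosk-model-dir=" + model_dir]


def end_command():
    """Build the command line that stops a running dictation session."""
    return ["nerd-dictation", "end"]


def main(argv):
    """Dispatch 'begin MODEL_DIR' or 'end'; return a process exit code."""
    if len(argv) == 3 and argv[1] == "begin":
        # Started in the background so the key binding returns immediately.
        subprocess.Popen(begin_command(argv[2]))
        return 0
    if len(argv) == 2 and argv[1] == "end":
        return subprocess.run(end_command()).returncode
    print("usage: begin MODEL_DIR | end", file=sys.stderr)
    return 2
```

Saved as a script that finishes with sys.exit(main(sys.argv)), one key chord could run it with begin and a model path, and the other could run it with end.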