
Deciding who said what is one of the most common tasks when dealing with live speech, but there's less information available about it than about other parts of the pipeline like transcription or voice-activity detection. I've been doing more work on speaker identification recently for an upcoming open source project that I'm excited to share soon, and I realized I was hazier on some of the practical details than I'd like. As any teacher knows, the best way to find the holes in your own knowledge of a topic is to try to explain it to someone else, so I decided to write a step-by-step Python notebook covering the basics of speech embeddings, with working examples inline.
If you're able to run in a cloud environment and you're not resource-constrained, you don't need to understand how these embeddings work. You can find plenty of open source packages and commercial APIs that handle speaker identification (aka diarization) for you. When you're targeting mobile or edge platforms, though, you may not have access to those conveniences, and that's where understanding what's happening under the hood can help you figure out how to tackle the problem.
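To give a flavor of what's under that hood, here's a minimal sketch of the core idea the notebook builds on: each clip of speech gets turned into a fixed-length embedding vector, and you decide whether two clips come from the same speaker by comparing those vectors, typically with cosine similarity. This isn't code from the notebook, and the vectors below are synthetic stand-ins for what a real speaker-embedding model would produce, but the comparison step looks much like this:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two fixed-length embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend these came out of a speaker-embedding model run on three clips:
# two from the same person, one from somebody else.
rng = np.random.default_rng(0)
speaker_a = rng.normal(size=256)
speaker_a_again = speaker_a + rng.normal(scale=0.1, size=256)  # same voice, new clip
speaker_b = rng.normal(size=256)                               # a different voice

print(cosine_similarity(speaker_a, speaker_a_again))  # close to 1.0
print(cosine_similarity(speaker_a, speaker_b))        # close to 0.0
```

In practice you'd pick a similarity threshold by measuring scores on labeled same-speaker and different-speaker pairs from your own data, since the right cutoff depends heavily on the embedding model and recording conditions.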
Anyway, I hope this trail of breadcrumbs helps someone else, even if it’s through an AI model that scrapes this!