DeepSpeech Python

How to set up your speech-to-text model in 3 steps

--

Speech recognition technology has advanced rapidly in recent years, pushed along by the COVID-19 pandemic’s shift to remote communication and the rise of AI systems like ChatGPT. This guide walks you through setting up your own STT model using the DeepSpeech framework. It’s an introduction to speech recognition as a programming task, where a pre-trained deep learning model does the heavy lifting and you supply just a bit of code.

Prerequisites
Before starting, you should have a few things in place:

  1. Python Knowledge: You should be comfortable working with Python, as the DeepSpeech bindings and the script below are all Python.
  2. Conda Environments: We’ll use a Conda environment to manage dependencies. Make sure you have Conda installed on your system.
  3. Audio Data: You’ll need audio files to test your STT model. The pre-trained English model expects 16 kHz, mono, 16-bit WAV files; you can use your own recordings or an open-source dataset.

Now, let’s get started.

Step 1: Setting up a Conda Environment

Setting up a dedicated environment is easy and helps us avoid dependency conflicts. Here are the steps:
1.1. Create a Conda Env:

conda create --name deepspeech python=3.8

1.2. Activate the Env:

conda activate deepspeech

1.3. Install Dependencies:

pip install deepspeech torch torchaudio

Ensure you install versions compatible with your system and requirements. Note that the deepspeech 0.9.x wheels on PyPI only support Python up to 3.9, which is why we pinned Python 3.8 above. torch and torchaudio are not strictly required for transcription, but we’ll use torchaudio later to convert audio into the format the model expects.
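
The pip package also installs a deepspeech command-line client, so a quick way to confirm the install worked is to ask it for its version:

deepspeech --version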

Step 2: Downloading the Pre-trained Models

2.1. Download the Model:

wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm

2.2. Download the Scorer:

wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer

We are using v0.9.3, the latest release listed in the DeepSpeech docs. The .pbmm file is the acoustic model itself; the .scorer is an external language model that improves transcription accuracy.
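
Before writing any Python, you can sanity-check the downloads with the same command-line client from step 1. Here, test.wav is a placeholder for any 16 kHz mono WAV file you have on hand:

deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio test.wav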

The Code

import os
import wave

import numpy as np

import deepspeech

# Adjust these paths
MODEL_PATH = "path/to/your/deepspeech-model.pbmm"
SCORER_PATH = "path/to/your/deepspeech-model.scorer"
AUDIO_DIR = "path/to/your/audio_files"
TRANSCRIPTION_DIR = "path/to/your/transcriptions"

os.makedirs(TRANSCRIPTION_DIR, exist_ok=True)

# Load the acoustic model and attach the external scorer
ds = deepspeech.Model(MODEL_PATH)
ds.enableExternalScorer(SCORER_PATH)

def transcribe_audio(file_path):
    with wave.open(file_path, 'rb') as wf:
        rate = wf.getframerate()
        if rate != ds.sampleRate():
            raise ValueError(f"{file_path} is {rate} Hz; the model expects {ds.sampleRate()} Hz")
        frames = wf.readframes(wf.getnframes())
    # DeepSpeech 0.9.x takes the 16-bit PCM audio as an int16 array
    # (stt() no longer accepts a separate sample rate argument)
    audio = np.frombuffer(frames, dtype=np.int16)
    return ds.stt(audio)

for audio_file in os.listdir(AUDIO_DIR):
    if audio_file.endswith(".wav"):
        file_path = os.path.join(AUDIO_DIR, audio_file)
        transcription = transcribe_audio(file_path)
        base_name = os.path.splitext(audio_file)[0]
        transcription_file_path = os.path.join(TRANSCRIPTION_DIR, f"{base_name}.txt")
        with open(transcription_file_path, "w") as f:
            f.write(transcription)

print("All done! Check the transcription directory for your results.")

Step 3: Running the Python Script

With your environment set up and models in place, you’re ready to transcribe:

  1. Load Audio Files: Ensure your audio files are in the designated directory.
  2. Adjust the Script: Make sure the paths in the script point to your model, scorer, and audio files.
  3. Run the Script: Inside the Conda environment, run python your_script_name.py (see the example below).
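
For example, assuming you saved the script above as transcribe.py (the file name is up to you):

conda activate deepspeech
python transcribe.py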

And that’s it! Your transcriptions should be waiting for you in the specified directory. Happy coding!

Connect

If you found this article helpful and wish to show a little support, you can:

  1. Clap 50 times for this story
  2. Leave a comment telling me what you think
