DeepSpeech Python

How to set up your speech-to-text model in 3 steps

--

Speech recognition technology has advanced rapidly in recent years, pushed along by the COVID-19 pandemic’s shift to remote communication and the rise of AI systems like ChatGPT. This guide walks you through setting up your own STT model using the DeepSpeech framework. It’s an introduction to speech recognition as a programming task, where a pre-trained deep learning model does the heavy lifting and you supply just a bit of code.

Prerequisites
Before starting, you should have a few things in place:

  1. Python Knowledge: You should be comfortable working with Python, as the DeepSpeech bindings and the script below are all Python.
  2. Conda Environments: We’ll use a Conda environment to manage dependencies. Make sure you have Conda installed on your system.
  3. Audio Data: You’ll need audio files to test your STT model. The pre-trained English model expects 16 kHz, mono, 16-bit WAV files; you can use your own recordings or an open-source dataset.

Now, let’s get started.

Step 1: Setting up a Conda Environment

Setting up a dedicated environment is easy and helps us avoid dependency conflicts. Here are the steps:
1.1. Create a Conda Env:

conda create --name deepspeech python=3.8

1.2. Activate the Env:

conda activate deepspeech

1.3. Install Dependencies:

pip install deepspeech torch torchaudio

Ensure you install versions compatible with your system and requirements. Note that the deepspeech 0.9.x wheels on PyPI only support Python up to 3.9, which is why we pinned Python 3.8 above. torch and torchaudio are not strictly required for transcription, but we’ll use torchaudio later to convert audio into the format the model expects.
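
The pip package also installs a deepspeech command-line client, so a quick way to confirm the install worked is to ask it for its version:

deepspeech --version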

Step 2: Downloading the Pre-trained Models

2.1. Download the Model:

wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm

2.2. Download the Scorer:

wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer

We are using v0.9.3, the latest release listed in the DeepSpeech docs. The .pbmm file is the acoustic model itself; the .scorer is an external language model that improves transcription accuracy.
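
Before writing any Python, you can sanity-check the downloads with the same command-line client from step 1. Here, test.wav is a placeholder for any 16 kHz mono WAV file you have on hand:

deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio test.wav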

The Code

import os
import wave

import numpy as np

import deepspeech

# Adjust these paths
MODEL_PATH = "path/to/your/deepspeech-model.pbmm"
SCORER_PATH = "path/to/your/deepspeech-model.scorer"
AUDIO_DIR = "path/to/your/audio_files"
TRANSCRIPTION_DIR = "path/to/your/transcriptions"

os.makedirs(TRANSCRIPTION_DIR, exist_ok=True)

# Load the acoustic model and attach the external scorer
ds = deepspeech.Model(MODEL_PATH)
ds.enableExternalScorer(SCORER_PATH)

def transcribe_audio(file_path):
    with wave.open(file_path, 'rb') as wf:
        rate = wf.getframerate()
        if rate != ds.sampleRate():
            raise ValueError(f"{file_path} is {rate} Hz; the model expects {ds.sampleRate()} Hz")
        frames = wf.readframes(wf.getnframes())
    # DeepSpeech 0.9.x takes the 16-bit PCM audio as an int16 array
    # (stt() no longer accepts a separate sample rate argument)
    audio = np.frombuffer(frames, dtype=np.int16)
    return ds.stt(audio)

for audio_file in os.listdir(AUDIO_DIR):
    if audio_file.endswith(".wav"):
        file_path = os.path.join(AUDIO_DIR, audio_file)
        transcription = transcribe_audio(file_path)
        base_name = os.path.splitext(audio_file)[0]
        transcription_file_path = os.path.join(TRANSCRIPTION_DIR, f"{base_name}.txt")
        with open(transcription_file_path, "w") as f:
            f.write(transcription)

print("All done! Check the transcription directory for your results.")

Step 3: Running the Python Script

With your environment set up and models in place, you’re ready to transcribe:

  1. Load Audio Files: Ensure your audio files are in the designated directory.
  2. Adjust the Script: Make sure the paths in the script point to your model, scorer, and audio files.
  3. Run the Script: Inside the Conda environment, run python your_script_name.py (see the example below).
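
For example, assuming you saved the script above as transcribe.py (the file name is up to you):

conda activate deepspeech
python transcribe.py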

And that’s it! Your transcriptions should be waiting for you in the specified directory. Happy coding!

Connect

If you found this article helpful and wish to show a little support, you can:

  1. Clap 50 times for this story
  2. Leave a comment telling me what you think
