To complete this task, we'll use OpenAI's Whisper model, which is open source and can be run locally. Whisper provides high-quality speech-to-text transcription and is driven from Python. We'll build a Docker image that bundles Python, Whisper, and a small script that handles the transcription.
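
The finished project needs only two files in a single directory (shown here for orientation; the names match those used throughout the tutorial):

.
├── Dockerfile
└── scriptYouBuild.py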

1. Dockerfile for the custom image

Create a Dockerfile for building the custom image:

# Use a Python base image
FROM python:3.9-slim

# Install ffmpeg (required by Whisper to decode audio/video) and the Python packages
RUN apt-get update && apt-get install -y ffmpeg && \
    rm -rf /var/lib/apt/lists/* && \
    pip install --no-cache-dir torch torchaudio openai-whisper

# Create the working directory
WORKDIR /var/script

# Copy the script into the Docker image
COPY scriptYouBuild.py /var/script/

# Set the entrypoint to run the script
ENTRYPOINT ["python", "scriptYouBuild.py"]
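
If you only need CPU inference, an optional variation is to pull PyTorch from its CPU-only wheel index, which keeps the image considerably smaller (the index URL below is the standard PyTorch CPU channel; skip this if you plan to use a GPU):

# Optional variation: CPU-only PyTorch wheels for a smaller image
RUN apt-get update && apt-get install -y ffmpeg && \
    rm -rf /var/lib/apt/lists/* && \
    pip install --no-cache-dir torch torchaudio --index-url https://download.pytorch.org/whl/cpu && \
    pip install --no-cache-dir openai-whisper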

2. Transcription Script (scriptYouBuild.py)

This Python script will handle the transcription of the audio file using Whisper:

import whisper
import sys
import os

def transcribe(input_file, output_file=None):
    # Load the Whisper model; the "base" weights are downloaded on first run and cached
    model = whisper.load_model("base")
    
    # Transcribe the audio file
    result = model.transcribe(input_file)

    # Output transcription to STDOUT
    print(result['text'])

    # Write transcription to the output file if specified
    if output_file:
        with open(output_file, 'w') as f:
            f.write(result['text'])

def main():
    # Print diagnostics to stderr so the transcript on stdout stays clean
    if len(sys.argv) < 2:
        print("Usage: python scriptYouBuild.py <input_file> [output_file]", file=sys.stderr)
        sys.exit(1)

    input_file = sys.argv[1]
    output_file = sys.argv[2] if len(sys.argv) > 2 else None

    if not os.path.exists(input_file):
        print(f"Input file '{input_file}' does not exist.", file=sys.stderr)
        sys.exit(1)

    transcribe(input_file, output_file)

if __name__ == "__main__":
    main()
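
If Python and ffmpeg are available on the host, you can sanity-check the script outside Docker first (sample.mp4 here is just a placeholder file name):

pip install openai-whisper
python scriptYouBuild.py sample.mp4 transcript.txt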

3. Build the Docker Image

Save both files in the same directory and build the Docker image:

docker build -t custom-whisper-image:latest .
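
A quick way to verify the image is to run it with no arguments; the entrypoint should print the script's usage message and exit with a non-zero status:

docker run --rm custom-whisper-image:latest
# Usage: python scriptYouBuild.py <input_file> [output_file]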

4. Running the Docker Container

Now, you can run the container using the following command:

docker run --rm -i -v $(pwd)/myfile.mp4:/var/script/input.mp4:ro -v $(pwd)/myfile.txt:/var/script/output.txt custom-whisper-image:latest /var/script/input.mp4 /var/script/output.txt
  • Explanation:
    • -v $(pwd)/myfile.mp4:/var/script/input.mp4:ro: Mounts the input file into the container read-only.
    • -v $(pwd)/myfile.txt:/var/script/output.txt: Mounts the output file path read-write (the file must already exist on the host; see the note below).
    • custom-whisper-image:latest: The Docker image we built.
    • /var/script/input.mp4 /var/script/output.txt: Passes the input and output paths as arguments.
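
One practical detail: if myfile.txt does not already exist on the host, Docker creates the mount source as a directory instead of a file and the script's write will fail, so create an empty file before the first run:

touch myfile.txt   # must exist before the run, or Docker creates a directory in its place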

Notes:

  • The transcription is printed to STDOUT and also written to the specified output file (myfile.txt).
  • Make sure to adjust file permissions if needed; on an SELinux-enforcing distribution such as Rocky Linux you may also need to relabel the bind mounts (see the example below).
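
On an SELinux-enforcing host such as Rocky Linux, the bind mounts may be blocked until they are relabeled; adding the z (shared) or Z (private) mount option is the usual fix, shown here as a variation of the run command above:

docker run --rm -i \
  -v "$(pwd)/myfile.mp4:/var/script/input.mp4:ro,z" \
  -v "$(pwd)/myfile.txt:/var/script/output.txt:z" \
  custom-whisper-image:latest /var/script/input.mp4 /var/script/output.txt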

This setup gives you a fully functional, entirely open-source transcription service that runs inside the Docker container.

Tutorial - Transcribe MP4 to TXT using Docker, AI and Python