Install in one line

mfkvault install video-whisper-local-video-audio-transcription

Requires the MFKVault CLI.

# Video Whisper: Local Video/Audio Transcription

Transcribe videos and audio locally on Apple Silicon using [MLX Whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper). Supports YouTube, Bilibili, Xiaohongshu, Douyin, podcasts, and local files.

**Runs entirely on-device. No API keys. No cloud. No cost.**

## Requirements

- **Apple Silicon Mac** (M1/M2/M3/M4)
- [Homebrew](https://brew.sh) packages: `yt-dlp`, `ffmpeg`
- Python venv with `mlx-whisper`

## Installation

```bash
# 1. Install system dependencies
brew install yt-dlp ffmpeg

# 2. Create Python venv and install mlx-whisper
python3 -m venv ~/.openclaw/venvs/whisper
~/.openclaw/venvs/whisper/bin/pip install mlx-whisper
```

## Usage

### CLI

```bash
bash scripts/transcribe.sh "<URL_or_FILE>" [model]
```

- **URL**: YouTube, Bilibili, Xiaohongshu, Douyin, or any other site supported by yt-dlp
- **Local file**: `/path/to/video.mp4`, `/path/to/audio.wav`, etc.
- **model** (optional): defaults to `mlx-community/whisper-medium-mlx`

Output:

- `/tmp/whisper_output.txt`: plain-text transcript
- `/tmp/whisper_output.json`: JSON with per-segment timestamps

### Examples

```bash
# YouTube video
bash scripts/transcribe.sh "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Bilibili video
bash scripts/transcribe.sh "https://www.bilibili.com/video/BV1xx411c7mD"

# Local file
bash scripts/transcribe.sh ~/Downloads/podcast.mp3

# Use the large model for better accuracy
bash scripts/transcribe.sh "https://youtu.be/xxx" mlx-community/whisper-large-v3-mlx
```

### Custom Python Path

If `mlx-whisper` is installed in a non-standard location, point the script at that interpreter:

```bash
export WHISPER_PYTHON=/path/to/your/venv/bin/python3
bash scripts/transcribe.sh "<URL>"
```

## Available Models

| Model | Size | Speed (10 min video) | Best For |
|-------|------|----------------------|----------|
| `mlx-community/whisper-small-mlx` | ~460 MB | ~20 s | Quick drafts, English |
| `mlx-community/whisper-medium-mlx` | ~1.5 GB | ~60-90 s | **Recommended**, good balance |
| `mlx-community/whisper-large-v3-mlx` | ~3 GB | ~90-120 s | Best accuracy, multilingual |

The first run downloads the model to `~/.cache/huggingface/hub/`, where it is cached for future use.

## Performance (Mac mini M4, 16 GB)

| Video Length | medium | large-v3 |
|--------------|--------|----------|
| 5 min | ~30-40 s | ~50-60 s |
| 10 min | ~60-90 s | ~90-120 s |
| 30 min | ~3-4 min | ~5-6 min |
| 60 min | ~6-8 min | ~10-12 min |

## OpenClaw Integration

Drop this skill into your OpenClaw workspace:

```bash
cp -r video-whisper ~/.openclaw/workspace/skills/
```

Then ask your agent: *"Help me transcribe this video https://..."*

The agent will run the script, read the output, and summarize or analyze it as needed.

## Notes

- Chinese content: use `medium` or `large-v3` (`small` is weak on Chinese)
- Xiaohongshu/Douyin: may need browser cookies (`--cookies-from-browser chrome`)
- Long videos (>1 h): consider running in the background
- All temp files are written to `/tmp/` and cleaned up automatically

## License

MIT
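
## Appendix: A Sketch of Argument Handling

The usage rules above (URL or local file, optional model with a default, `WHISPER_PYTHON` override, required Homebrew tools) can be sketched in a few shell functions. This is not the contents of `scripts/transcribe.sh`, which is not shown here. The function names `input_kind`, `resolve_model`, `resolve_python`, and `missing_tools` are hypothetical, and the default interpreter path is an assumption based on the venv location in the Installation section.

```bash
#!/usr/bin/env bash
# Hypothetical sketch only; the real scripts/transcribe.sh may differ.

# Default model named in the README.
DEFAULT_MODEL="mlx-community/whisper-medium-mlx"
# Assumed default: python3 inside the venv created during installation.
DEFAULT_PYTHON="$HOME/.openclaw/venvs/whisper/bin/python3"

# Print "url" for http(s) inputs and "file" for anything else.
input_kind() {
  case "$1" in
    http://*|https://*) echo "url" ;;
    *)                  echo "file" ;;
  esac
}

# Print the model to use: the given argument, or the default if none is given.
resolve_model() {
  echo "${1:-$DEFAULT_MODEL}"
}

# Print the interpreter to use: $WHISPER_PYTHON if set and non-empty,
# otherwise the assumed venv default.
resolve_python() {
  echo "${WHISPER_PYTHON:-$DEFAULT_PYTHON}"
}

# Print each named tool that is not on PATH; print nothing if all are found.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

A wrapper could call `missing_tools yt-dlp ffmpeg` before doing any work and stop with a hint to run `brew install` if the output is non-empty. It would invoke `yt-dlp` only when `input_kind "$1"` prints `url`.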
