Video Whisper — Local Video/Audio Transcription

Transcribe videos and audio locally on Apple Silicon using MLX Whisper. Supports YouTube, Bilibili, Xiaohongshu, Douyin, podcasts, and local files.

Install in one line

CLI

$ mfkvault install video-whisper-local-video-audio-transcription

Requires the MFKVault CLI.

Description

Transcribe videos and audio locally on Apple Silicon using [MLX Whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper). Supports YouTube, Bilibili, Xiaohongshu, Douyin, podcasts, and local files.

**Runs entirely on-device. No API keys. No cloud. No cost.**

## Requirements

- **Apple Silicon Mac** (M1/M2/M3/M4)
- [Homebrew](https://brew.sh) packages: `yt-dlp`, `ffmpeg`
- Python venv with `mlx-whisper`

## Installation

```bash
# 1. Install system dependencies
brew install yt-dlp ffmpeg

# 2. Create Python venv and install mlx-whisper
python3 -m venv ~/.openclaw/venvs/whisper
~/.openclaw/venvs/whisper/bin/pip install mlx-whisper
```

## Usage

### CLI

```bash
bash scripts/transcribe.sh "<URL_or_FILE>" [model]
```

- **URL**: YouTube, Bilibili, Xiaohongshu, Douyin, or any site supported by yt-dlp
- **Local file**: `/path/to/video.mp4`, `/path/to/audio.wav`, etc.
- **model** (optional): defaults to `mlx-community/whisper-medium-mlx`

Output:

- `/tmp/whisper_output.txt`: plain text transcript
- `/tmp/whisper_output.json`: JSON with timestamps per segment

### Examples

```bash
# YouTube video
bash scripts/transcribe.sh "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Bilibili video
bash scripts/transcribe.sh "https://www.bilibili.com/video/BV1xx411c7mD"

# Local file
bash scripts/transcribe.sh ~/Downloads/podcast.mp3

# Use large model for better accuracy
bash scripts/transcribe.sh "https://youtu.be/xxx" mlx-community/whisper-large-v3-mlx
```

### Custom Python Path

If `mlx-whisper` is installed in a non-standard location, point `WHISPER_PYTHON` at that interpreter:

```bash
export WHISPER_PYTHON=/path/to/your/venv/bin/python3
bash scripts/transcribe.sh "<URL>"
```

## Available Models

| Model | Size | Speed (10 min video) | Best For |
|-------|------|----------------------|----------|
| `mlx-community/whisper-small-mlx` | ~460 MB | ~20s | Quick drafts, English |
| `mlx-community/whisper-medium-mlx` | ~1.5 GB | ~60-90s | **Recommended**: good balance |
| `mlx-community/whisper-large-v3-mlx` | ~3 GB | ~90-120s | Best accuracy, multilingual |

The first run downloads the model to `~/.cache/huggingface/hub/`, where it is cached for future use.

## Performance (Mac mini M4, 16 GB)

| Video Length | medium | large-v3 |
|--------------|--------|----------|
| 5 min | ~30-40s | ~50-60s |
| 10 min | ~60-90s | ~90-120s |
| 30 min | ~3-4 min | ~5-6 min |
| 60 min | ~6-8 min | ~10-12 min |

## OpenClaw Integration

Drop this skill into your OpenClaw workspace:

```bash
cp -r video-whisper ~/.openclaw/workspace/skills/
```

Then ask your agent: *"Transcribe this video for me: https://..."*

The agent will run the script, read the output, and summarize or analyze it as needed.

## Notes

- Chinese content: use `medium` or `large-v3` (`small` is weak on Chinese)
- Xiaohongshu/Douyin: may need browser cookies (yt-dlp's `--cookies-from-browser chrome` option)
- Long videos (>1h): consider running in the background
- All temporary files are written to `/tmp/` and cleaned up automatically

## License

MIT
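## Example: Preflight Check

The Requirements section lists an Apple Silicon Mac, the Homebrew packages `yt-dlp` and `ffmpeg`, and a Python venv with `mlx-whisper`. The sketch below shows one way those could be checked before a run. The function names `missing_tools` and `preflight` are hypothetical and not part of the skill. The default interpreter path is an assumption taken from the Installation step, with `WHISPER_PYTHON` used as an override as described under Custom Python Path.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers; these are not shipped with the skill.

# Print each named command that is not found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Return 0 when the documented requirements appear to be met, 1 otherwise.
preflight() {
  # Assumed default: the venv created in the Installation section.
  local py="${WHISPER_PYTHON:-$HOME/.openclaw/venvs/whisper/bin/python3}"
  local missing
  missing="$(missing_tools yt-dlp ffmpeg)"
  if [ -n "$missing" ]; then
    echo "Missing tools (try: brew install yt-dlp ffmpeg):" >&2
    echo "$missing" >&2
    return 1
  fi
  if [ "$(uname -m)" != "arm64" ]; then
    echo "This machine does not report arm64; MLX targets Apple Silicon." >&2
    return 1
  fi
  if [ ! -x "$py" ]; then
    echo "Python interpreter not found at $py" >&2
    return 1
  fi
}
```

After sourcing these functions, `preflight && bash scripts/transcribe.sh "<URL_or_FILE>"` would stop with a message up front rather than failing partway through a download.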
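## Example: Short Model Aliases

The optional `model` argument takes a full repository name such as `mlx-community/whisper-large-v3-mlx`. As a convenience, a small wrapper could map short aliases to the three names in the Available Models table. `resolve_model` is a hypothetical helper, not something the skill provides, and the alias names are chosen here for illustration only.

```bash
# Hypothetical helper: map a short alias to a model repo from the
# Available Models table. With no argument it falls back to the
# documented default (medium).
resolve_model() {
  case "${1:-medium}" in
    small)          echo "mlx-community/whisper-small-mlx" ;;
    medium)         echo "mlx-community/whisper-medium-mlx" ;;
    large|large-v3) echo "mlx-community/whisper-large-v3-mlx" ;;
    */*)            echo "$1" ;;  # looks like a full repo name, pass through
    *)              echo "Unknown model alias: $1" >&2; return 1 ;;
  esac
}
```

A call would then look like `bash scripts/transcribe.sh "<URL_or_FILE>" "$(resolve_model large-v3)"`.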
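## Example: Keeping Transcripts From Several Runs

The script writes to the fixed paths `/tmp/whisper_output.txt` and `/tmp/whisper_output.json`, so a later run would presumably overwrite an earlier one. The sketch below copies both files to a named destination after a run. `save_transcript` and its arguments are hypothetical; only the two source file names come from the Usage section.

```bash
# Hypothetical helper: copy the fixed-path outputs to DEST_DIR/NAME.txt
# and DEST_DIR/NAME.json. SRC_DIR defaults to /tmp, where the Usage
# section says the script writes its output.
save_transcript() {
  local name="${1:-}" dest_dir="${2:-.}" src_dir="${3:-/tmp}"
  local ext
  if [ -z "$name" ]; then
    echo "usage: save_transcript NAME [DEST_DIR] [SRC_DIR]" >&2
    return 2
  fi
  mkdir -p "$dest_dir" || return 1
  for ext in txt json; do
    if [ ! -f "$src_dir/whisper_output.$ext" ]; then
      echo "Missing $src_dir/whisper_output.$ext" >&2
      return 1
    fi
    cp "$src_dir/whisper_output.$ext" "$dest_dir/$name.$ext" || return 1
  done
}
```

For example, running `bash scripts/transcribe.sh ~/Downloads/podcast.mp3 && save_transcript podcast ~/transcripts` would leave `podcast.txt` and `podcast.json` in `~/transcripts`.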

Security Status

Unvetted

Not yet security scanned
