---
name: learn-video
description: Extract reusable knowledge from a YouTube/TikTok/Vimeo URL — a 6-stage audio+visual fused timeline (ingest → transcribe → probe → target/summary → vision → classify → fuse). Hands off to /learn-eval so the output lands in the right bin (skill, rule, tip, note, discard). Use this when the user shares a video URL and wants extracted notes, not when they want the video downloaded or clipped.
license: MIT. See LICENSE.
---

# learn-video

Pull a video from any `yt-dlp`-supported site (1800+ including YouTube, TikTok, Vimeo), align its audio with what's on screen, and produce a `fused.md` timeline ready for `/learn-eval` to save as a `SKILL.md`, rule, tip, or note.

Unlike transcript-only skills, this one uses an LLM to pick which timestamps need a visual reference, extracts just those frames with ffmpeg, and asks a vision model to fuse each frame with its transcript window. The output reads as `AUDIO / VISUAL / FUSED` blocks aligned to `[mm:ss]` markers.

## When to use

- User shares a tutorial, talk, or short-form dev clip and wants the knowledge, not the video.
- User wants a consistent pipeline from many videos (e.g. an entire creator's channel) into the ECC knowledge system.
- Output needs to flow into `/learn-eval` — the `recommended-form:` header in `fused.md` is designed for that handoff.

## When *not* to use

- User wants the video file itself, subtitles, or clipped highlights — use `yt-dlp` or a clipping tool directly.
- Only the raw transcript is needed — a lightweight transcript skill is faster (no vision calls, no rate limiting).
- Live streams or content gated behind authentication.
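The `AUDIO / VISUAL / FUSED` block layout can be consumed mechanically by downstream tooling. Below is a minimal parsing sketch, assuming each block opens with its `[mm:ss]` marker followed by the three labeled channels; the exact `fused.md` layout may differ, and `parse_fused` is a hypothetical helper, not part of this skill.

```python
import re

# Assumed block shape: "[mm:ss]" marker, then AUDIO / VISUAL / FUSED lines.
# Only the marker format and the three labels come from this README.
BLOCK_RE = re.compile(
    r"\[(\d{2}):(\d{2})\]\s*"              # [mm:ss] timestamp marker
    r"AUDIO:\s*(?P<audio>.*?)\s*"
    r"VISUAL:\s*(?P<visual>.*?)\s*"
    r"FUSED:\s*(?P<fused>.*?)(?=\n\[|\Z)",  # stop at next marker or end
    re.DOTALL,
)

def parse_fused(text: str) -> list[dict]:
    """Return one dict per timeline block: seconds offset plus the three channels."""
    blocks = []
    for m in BLOCK_RE.finditer(text):
        mm, ss = int(m.group(1)), int(m.group(2))
        blocks.append({
            "t": mm * 60 + ss,
            "audio": m.group("audio").strip(),
            "visual": m.group("visual").strip(),
            "fused": m.group("fused").strip(),
        })
    return blocks
```

A consumer could, for example, keep only blocks whose `fused` text mentions a command, or re-emit the timeline with `[mm:ss]` links back to the source video.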
## Quick start

```bash
# one-time setup (from this repo)
./setup.sh

# run it
python -m learn_video.cli run "https://www.youtube.com/watch?v=<id>"

# flags
python -m learn_video.cli run "<url>" \
  --tier=lite|pro|max        # model quality preset
  --model=<provider:model>   # override vision model (highest precedence)
  --offline                  # route to local Ollama
  --fresh                    # bypass cache
  --short                    # force whole-video File API upload (fails if >60s)
  --notes-only               # skip /learn-eval handoff
```

Stdout is the path to `fused.md`. Stderr carries `[N/6 STAGE]` progress lines. Claude should read `fused.md` after the CLI returns and hand off to `/learn-eval` unless `--notes-only`.

## Invocation from the slash command

The slash command `~/.claude/commands/learn-video.md` drives Claude to:

1. Parse the URL and flags.
2. Shell out to `python -m learn_video.cli run "$URL" [flags]` from the repo root (or wherever `learn_video` is importable).
3. Read the resulting `fused.md`.
4. Summarize the timeline for the user (title, `video-kind`, `recommended-form`, block count, 1–2 highlight `FUSED:` lines).
5. Unless `--notes-only`, invoke `/learn-eval` with the `fused.md` contents.

## Pipeline stages (in `learn_video/`)

| Stage | Module | Purpose |
|---|---|---|
| 1. Ingest | `ingest.py` | `yt-dlp` metadata probe + download; picks 720p/480p/360p by duration |
| 2. Transcribe | `transcribe.py` | Platform auto-captions first, `faster-whisper small.en` fallback |
| Probe | `probe.py` | Classifies video as `visual \| audio \| mixed` from 5 sample frames |
| 3a. Target | `target.py` | LLM picks 3–15 timestamps worth a frame (sliding 15-min windows for >25-min videos) |
| 3b. Summary | `summary.py` | Audio-first path: chapter-style transcript summary, no frames |
| 4. Keyframes | `keyframes.py` | `ffmpeg` extracts one JPEG per target timestamp |
| 5. Vision | `vision.py` | Per-frame fusion, 6-way concurrent; plus whole-video File API path for <60s clips |
| Classify | `classify.py` | Suggests `recommended-form`: `skill \| rule \| tip \| note \| discard` |
| 6. Fuse | `fuse.py` | Composes `fused.md` with headers + timeline |

## Model portability

Only `model_client.py` imports `langchain-*`. Every stage calls:

- `build_chat_model(model_id, **kwargs)` — `<provider>:<model>` id strings (e.g. `google_genai:gemini-flash-lite-latest`, `anthropic:claude-opus-4-7`, `ollama:qwen2.5vl:3b`).
- `invoke_structured(model, schema, messages)` — Pydantic schema validation with a `json_repair` fallback and an empty-response short-circuit.
- `invoke_vision(model, vision_input)` — portable `image_url` + data-URL shape that works on Gemini, Claude, and Ollama; `video_path` is Gemini-only (File API upload).

A sliding-window rate limiter (13 req / 60s on Flash Lite) gates every provider call. To swap models, edit `TIER_MODELS` in `config.py`. Stage code does not change.

## Tier → model mapping

| `--tier` | Targeting | Vision | Quota notes |
|---|---|---|---|
| `lite` (default) | `gemini-flash-lite-latest` | `gemini-flash-lite-latest` | ~100 videos/day on free 500 RPD |
| `pro` | `gemini-flash-lite-latest` | `gemini-flash-latest` | 20 RPD cap on Flash |
| `max` | `gemini-flash-lite-latest` | `claude-opus-4-7` | Uses Claude quota |
| `--offline` | `qwen2.5vl:3b` | `qwen2.5vl:3b` | Local Ollama, slow |

Targeting always stays on Flash Lite — it's a cheap text task, and keeping the premium budget for vision is almost always correct.

## Cache

`~/.claude/cache/learn-video/<video-id>/` contains `meta.json`, `video.*`, `captions.vtt`, `transcript.json`, `targets.json`, `frames/*.jpg`, `fused.md`. Every stage checks its artifact before running — re-runs on the same URL skip straight to whatever changed.
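The artifact-before-running check amounts to a small guard around each stage. The sketch below is illustrative only; `run_stage` and `produce` are hypothetical names, and only the one-artifact-per-stage cache layout comes from this README.

```python
from pathlib import Path

def run_stage(cache_dir: Path, artifact: str, produce) -> Path:
    """Skip the stage when its cached artifact already exists; otherwise
    run `produce` and write its result (hypothetical sketch)."""
    out = cache_dir / artifact
    if out.exists():
        # Re-runs on the same URL hit this branch and skip the work.
        return out
    cache_dir.mkdir(parents=True, exist_ok=True)
    out.write_text(produce())
    return out
```

With `--fresh`, a pipeline built this way would simply bypass the `exists()` check (or delete the artifact first) so every stage runs again.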
```bash
python -m learn_video.cli cache-info              # list cached videos
python -m learn_video.cli cache-clean <video-id>  # remove one
python -m learn_video.cli cache-clean all         # nuke everything
```

## Error taxonomy

| Class | Behavior | Example |
|---|---|---|
| `TransientError` | Auto-retries (tenacity: 5s, 10s, 20s with jitter; max 3) | 429, `httpx.ReadError`, timeout |
| `ConfigurationError` | Fail fast with fix hint | missing `GEMINI_API_KEY` |
| `EnvironmentError_` | Fail fast with install command | `ffmpeg not found` |
| `TargetError` | Unrecoverable | DRM-blocked, deleted video |

## Requirements

- Python 3.13 (tested; 3.11+ should work — `tomllib` fallback to `tomli` already in place)
- `ffmpeg` on PATH
- `yt-dlp` on PATH (installed via `pip install -r requirements.txt`)
- `GEMINI_API_KEY` (or `GOOGLE_API_KEY`) for the default `lite` / `pro` tiers
- `ANTHROPIC_API_KEY` if using `--tier=max` or `--model anthropic:*`
- Local Ollama with `qwen2.5vl:3b` pulled if using `--offline`

See `README.md` for a full architecture diagram and the 12-video demo output.