---
name: watch-video
description: SLASH-COMMAND-ONLY. Invoke ONLY when the user explicitly types the literal `/watch-video` slash command. Never auto-trigger on phrases like "watch this video," "summarize this video," "take notes on this video," or on a bare YouTube URL. For ad-hoc YouTube questions, fetch the transcript directly (YouTube page or yt-dlp) and skim – do not run this heavyweight pipeline.
---

# Watch Video

Claude can't stream video directly. This skill fakes it: a Python pipeline (vendored from [bradautomates/claude-video](https://github.com/bradautomates/claude-video) under `scripts/`) downloads the video, extracts auto-scaled JPEG frames with ffmpeg, pulls a timestamped transcript (native captions first, Whisper API fallback), and prints a markdown report listing every frame path. Claude then `Read`s each frame, aligns it to the spoken text, and writes a structured notes file.

## When to invoke

**Slash-command only.** Run this skill ONLY when the user literally types `/watch-video`. That is the sole trigger. Do NOT invoke on:

- Casual phrases like "watch this video," "summarize this," "take notes on this YouTube video," "analyze this reel"
- A bare YouTube URL pasted with a question ("what useful tips are in this video?" + URL)
- Any natural-language ask about a video that lacks the explicit `/watch-video` command

**Default for YouTube questions without the slash command:** pull the transcript the fastest way available – fetch the YouTube page / use yt-dlp captions / a transcript site – skim it, and answer from that. The frame-extraction pipeline is overkill unless the user explicitly asks for it via `/watch-video`.

> If you'd rather have auto-trigger behavior, edit the `description:` line above to match the keywords you want Claude to fire on (e.g. "Use when the user wants Claude to watch, analyze, or take notes on a video"). Slash-only is the default in this repo because explicit invocation prevents accidental token burn on long videos.
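The lightweight ad-hoc path can be as simple as a single yt-dlp call that fetches captions and never downloads the video. A minimal sketch – the flags are standard yt-dlp options, but the helper name and URL are placeholders, not part of the vendored scripts:

```python
# Sketch: build the yt-dlp invocation for the ad-hoc transcript path.
# Captions only, no video download. Helper and URL are illustrative.
def caption_fetch_cmd(url, lang="en"):
    return [
        "yt-dlp",
        "--skip-download",     # never pull the video itself
        "--write-subs",        # uploader-provided captions if present
        "--write-auto-subs",   # fall back to YouTube's auto captions
        "--sub-langs", lang,
        "--sub-format", "vtt",
        url,
    ]

print(" ".join(caption_fetch_cmd("https://www.youtube.com/watch?v=VIDEO_ID")))
```

Skim the resulting `.vtt` file and answer from it; that is the whole ad-hoc workflow.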
## Dependencies

- **ffmpeg + ffprobe** on PATH – for frame and audio extraction
- **yt-dlp** on PATH – for downloading and caption fetching
- **Python 3.9+** – the bundled scripts use `from __future__ import annotations`, so 3.9 works
- **Optional:** Whisper API key for videos without native captions. Set `GROQ_API_KEY` (preferred – cheaper/faster, runs `whisper-large-v3`) or `OPENAI_API_KEY` in `~/.config/watch/.env`. Without one, captioned videos work fine; uncaptioned videos return frames-only.

Run `python scripts/setup.py --check` to verify dependencies, or `python scripts/setup.py` to scaffold the `.env` and check binaries. On macOS, the installer auto-installs missing binaries via Homebrew. On Linux/Windows, it prints exact install commands.

## Pipeline

The work happens in `scripts/watch.py`. It downloads, extracts, transcribes, and prints a markdown report to stdout that lists every frame path. The pipeline auto-scales the frame budget by duration (hard cap 100 frames / 2 fps), so no manual interval tuning is needed.

```
python scripts/watch.py "<youtube-url-or-local-path>" [flags]
```

Flags worth knowing:

- `--start T` / `--end T` – focus on a section (`SS`, `MM:SS`, or `HH:MM:SS`). Auto-scales fps denser inside the range. Use this for any question about a specific moment, or for any video > 10 min where the user's question is about one part.
- `--max-frames N` – lower the cap for a tighter token budget (default 80, hard max 100).
- `--resolution W` – frame width in px (default 512; bump to 1024 only if on-screen text is unreadable).
- `--whisper groq|openai` – force a specific Whisper backend (default: prefer Groq if both keys exist).
- `--no-whisper` – skip transcription entirely if there are no captions. Frames-only output.
- `--out-dir DIR` – keep working files somewhere specific (default: an auto-generated tmp dir).
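The `--start`/`--end` values accept three clock formats. An illustrative parser for them – a re-implementation for clarity, since the real one lives in `scripts/watch.py` and may differ in detail:

```python
def parse_timestamp(t):
    """Convert 'SS', 'MM:SS', or 'HH:MM:SS' to total seconds.
    Illustrative only; the canonical parser is in scripts/watch.py."""
    parts = [int(p) for p in t.split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"bad timestamp: {t!r}")
    seconds = 0
    for part in parts:
        # Each colon shifts the accumulated value up one base-60 place.
        seconds = seconds * 60 + part
    return seconds

print(parse_timestamp("2:15"))     # 135
print(parse_timestamp("1:02:03"))  # 3723
```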
Auto-fps budgets (full-video mode):

- ≤30s – up to 30 frames
- 30s–1min – ~40 frames
- 1–3min – ~60 frames
- 3–10min – ~80 frames
- >10min – 100 frames sparse (warning printed; consider `--start`/`--end`)

## Step-by-step workflow

### 1. Run the pipeline

Default invocation, no flags:

```
python scripts/watch.py "<source>"
```

For long videos where the user asked about a specific moment, pass `--start`/`--end`:

```
python scripts/watch.py "<source>" --start 2:15 --end 2:45
```

The script writes everything to a tmp working directory and prints a markdown report to stdout. Capture the stdout – it contains:

- Header (Title, Uploader, Duration, Transcript source: `captions` / `whisper (groq)` / `whisper (openai)` / `none available`)
- `## Frames` section with `- \`<absolute-path>\` (t=MM:SS)` lines
- `## Transcript` section with `[MM:SS] text...` lines
- Footer with `Work dir: <path>`

### 2. Read every frame

Read all the listed frame paths in a single message (parallel `Read` tool calls). The Read tool renders JPEGs as images. Each frame's filename + `t=MM:SS` from the report tells you when it occurred – pair each frame with the matching transcript line at that timestamp.

For very long videos (>10 min, sparse mode): the budget is already capped at 100 frames, so reading all of them is fine.

### 3. Write the summary

Output file: `<work-dir>/<slug>-notes.md` by default. If the user said "save notes somewhere permanent," ask where. Structure the markdown like this:

```markdown
# <Title>

**Source:** <URL or local path>
**Duration:** <mm:ss>
**Uploader:** <if from YouTube>
**Transcript source:** <captions / whisper (groq) / whisper (openai) / none>

## One-line summary
<≤20 words – the core claim or hook of the video>

## TL;DR
<3–5 bullet points capturing the main arguments, moments, or beats>

## Timeline
- **[00:00]** <what's happening visually + the key line being said>
- **[00:15]** ...
<one row per meaningful beat, not per frame>

## Key quotes
> "<verbatim quote>" – [mm:ss]

## Visual notes
<what the video shows that the transcript alone would miss – setting, B-roll, on-screen text, graphics, transitions, subject's emotion>

## Takeaways (optional)
<Only include if the content is directly relevant to the user's domain or goals. Omit otherwise.>
```

### 4. Clean up

After the `.md` is written, delete the work dir (it contains the full downloaded video + frames, which is large). The script prints `Work dir: <path>` in the footer of its report; pass that path to `rm -rf`. If the user specified a non-tmp `--out-dir`, ask before deleting.

## Common gotchas

- **YouTube Shorts / age-gated / members-only** – yt-dlp may fail. Surface its stderr verbatim; don't retry silently.
- **No captions + no Whisper key** – the report says `Transcript: none available` and points at `setup.py`. Tell the user they can add a Groq key to `~/.config/watch/.env` for Whisper, or use `--no-whisper` for frames-only.
- **Local file with no audio track** – Whisper extraction errors out cleanly. Use `--no-whisper` for frames-only.
- **Very long videos (>30 min)** – confirm with the user before running. The pipeline caps at 100 frames so the budget is bounded, but a sparse 100-frame scan of a 60-min video isn't very useful. It's almost always better to run focused on the specific section.
- **Cloudflare 403 on Groq** – `whisper.py` already sets a custom User-Agent to clear Cloudflare's default-Python-UA block. If you ever see a 403, that's the failure mode to suspect.
- **No automatic Groq → OpenAI fallback** – the script picks one Whisper backend at the start of each run (Groq if its key is set, else OpenAI) and stays on it. If Groq rate-limits, the script retries Groq twice, then errors out. To use OpenAI instead, pass `--whisper openai`.

## What NOT to do

- Don't attempt to "watch" a video by inventing content based on title or thumbnail. If the pipeline fails, say so.
- Don't write the summary before actually reading frames. The transcript alone misses visual context (B-roll, on-screen text, emotion, graphics, transitions).
- Don't skip cleanup. Frame dumps + downloaded videos are large.
- Don't auto-fire on a video URL. Slash-command-only – see "When to invoke" above.

## Engine attribution

The pipeline scripts under `scripts/` are vendored from [bradautomates/claude-video](https://github.com/bradautomates/claude-video) (MIT). See [THIRD_PARTY_NOTICES.md](THIRD_PARTY_NOTICES.md) for the full upstream license.
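The report format described in step 1 is regular enough to parse mechanically. A sketch of the frame-to-transcript pairing from step 2 – a hypothetical helper, not part of the vendored scripts; the skill itself aligns by reading the frames as images:

```python
import re

def pair_frames_with_transcript(report):
    """Pair each '- `<path>` (t=MM:SS)' frame line with the transcript
    line whose [MM:SS] timestamp is closest. Hypothetical helper for
    illustration; assumes the report layout described above."""
    def to_seconds(ts):
        minutes, seconds = ts.split(":")
        return int(minutes) * 60 + int(seconds)

    frames = [(to_seconds(ts), path) for path, ts in
              re.findall(r"- `([^`]+)` \(t=(\d+:\d{2})\)", report)]
    lines = [(to_seconds(ts), text.strip()) for ts, text in
             re.findall(r"^\[(\d+:\d{2})\] (.+)$", report, re.MULTILINE)]
    pairs = []
    for t, path in frames:
        if not lines:
            pairs.append((path, ""))  # frames-only report, no transcript
            continue
        _, nearest = min(lines, key=lambda line: abs(line[0] - t))
        pairs.append((path, nearest))
    return pairs
```

Feeding it a two-frame report yields one `(frame_path, transcript_line)` tuple per frame, each matched to the nearest-timestamp line.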