Taoguba Crawler

This skill should be used when the user asks to "crawl taoguba", "crawl tgb", "scrape taoguba articles", "run the crawler", "crawl bbs", "crawl home page", "generate article HTML", or needs to run the Taoguba (tgb.cn) web crawlers.

Install in one line

CLI

$ mfkvault install taoguba-crawler

Requires the MFKVault CLI.

Works with: Claude Code, Cursor, Codex, OpenClaw

Description

---
name: taoguba-crawler
description: This skill should be used when the user asks to "crawl taoguba", "crawl tgb", "scrape taoguba articles", "run the crawler", "crawl bbs", "crawl home page", "generate article HTML", or needs to run the Taoguba (tgb.cn) web crawlers.
version: 0.1.0
allowed-tools: Bash, Read
---

# Taoguba Crawler

This skill runs the Taoguba (tgb.cn) article crawlers located in the project root.

## Prerequisites

- Python 3 with `requests`, `beautifulsoup4`, `python-dotenv` installed
- A `.env` file in the project root containing `COOKIE` and optionally `USER_AGENT`

## Available Crawlers

### 1. BBS Crawler (`crawler_bbs.py`)

Crawl the forum board at `tgb.cn/bbs/1/1` using HTML scraping.

```bash
python crawler_bbs.py
```

- Extracts article list by parsing `a.overhide.mw300` elements
- Gets each article's main post and author replies
- Downloads images and embeds them as base64 in HTML
- Outputs: `output/bbs_YYYY-MM-DD.json` and `output/bbs_YYYY-MM-DD_HHMMSS.html`

### 2. Home Crawler (`crawler_home.py`)

Crawl the homepage recommendations via JSON API (`/newIndex/getZh`).

```bash
python crawler_home.py
```

- Fetches articles from the JSON API (default 2 pages)
- Same content extraction and HTML generation as BBS crawler
- Outputs: `output/home_YYYY-MM-DD.json` and `output/home_YYYY-MM-DD_HHMMSS.html`

## Common Workflow

To run both crawlers:

```bash
python crawler_bbs.py && python crawler_home.py
```

## Key Implementation Details

- **Authentication**: Both scripts read `COOKIE` from `.env` via `python-dotenv`
- **Rate limiting**: 0.5-1s delay between requests to avoid being blocked
- **Image handling**: Images are downloaded and embedded as base64 in the HTML output
- **Article content**: Extracts main post (`#first`) and author replies (`.comment-data` with author badge)
- **Output directory**: All results saved to `output/` folder

## Scripts

The crawler scripts are bundled in `scripts/`:

- **`scripts/crawler_bbs.py`** - BBS forum crawler (HTML scraping)
- **`scripts/crawler_home.py`** - Homepage crawler (JSON API)

To run the bundled scripts directly:

```bash
python scripts/crawler_bbs.py
python scripts/crawler_home.py
```

## Troubleshooting

- If no articles are returned, check that `.env` contains a valid `COOKIE` value
- If image downloads fail, the HTML will show error messages inline
- Network timeouts default to 10-15 seconds per request
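
## Illustrative Sketch

The rate limiting, image embedding, and output naming listed under Key Implementation Details could look roughly like the following. This is a minimal standard-library sketch, not the code in the bundled scripts: the helper names `polite_delay`, `image_to_data_uri`, and `output_paths` are hypothetical, and the `image/png` default MIME type is an assumption.

```python
import base64
import random
import time
from datetime import datetime
from pathlib import Path
from typing import Tuple


def polite_delay(min_s: float = 0.5, max_s: float = 1.0) -> float:
    """Sleep for a random interval in [min_s, max_s]; return the delay used."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay


def image_to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI for an <img src="..."> tag."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def output_paths(prefix: str, now: datetime, out_dir: str = "output") -> Tuple[Path, Path]:
    """Build the JSON (date-stamped) and HTML (date- and time-stamped) paths."""
    base = Path(out_dir)
    json_path = base / f"{prefix}_{now:%Y-%m-%d}.json"
    html_path = base / f"{prefix}_{now:%Y-%m-%d_%H%M%S}.html"
    return json_path, html_path


if __name__ == "__main__":
    json_path, html_path = output_paths("bbs", datetime(2024, 1, 2, 3, 4, 5))
    print(json_path.as_posix())  # output/bbs_2024-01-02.json
    print(html_path.as_posix())  # output/bbs_2024-01-02_030405.html
    print(image_to_data_uri(b"hi"))  # data:image/png;base64,aGk=
```

Embedding images as data URIs keeps each HTML report self-contained, at the cost of larger files. The JSON name carries only the date, so under this naming scheme a second run on the same day would target the same JSON path while producing a new HTML file.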
