Crawler Info

MFKVault discovers AI agent skills via automated crawlers. This page lists what we crawl, how often, and what we store. Original authors retain all rights to their work.

What we crawl

GitHub repositories with SKILL.md
Cadence: Every 12 hours

Only repos with an OSS-friendly SPDX license (MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC, MPL-2.0, CC0-1.0, Unlicense, WTFPL, CC-BY, CC-BY-SA) are ingested.
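
For illustration only, the license gate described above could be sketched as a simple allowlist check. This is not our production code: the function name is_permitted is hypothetical, and the handling of "CC-BY" and "CC-BY-SA" as versioned families (excluding NC and ND variants) is an assumption, since the list above does not name specific versions.

```python
import re

# Exact SPDX ids from the list above ("BSD-2/3" expanded to both BSD ids).
PERMITTED_IDS = {
    "MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC",
    "MPL-2.0", "CC0-1.0", "Unlicense", "WTFPL",
}

# Assumption: "CC-BY" and "CC-BY-SA" mean any numbered version of those two
# families (e.g. CC-BY-4.0), but not the NC or ND variants.
CC_FAMILY = re.compile(r"CC-BY(-SA)?-\d+\.\d+")


def is_permitted(spdx_id):
    """Return True if the SPDX id passes this illustrative license gate."""
    if not spdx_id:
        # Assumption: a repo with no published license id is not ingested.
        return False
    spdx_id = spdx_id.strip()
    return spdx_id in PERMITTED_IDS or CC_FAMILY.fullmatch(spdx_id) is not None
```

The comparison here is case-sensitive for simplicity; a real gate might normalize ids first.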

npm registry packages tagged ai-skill
Cadence: Every 12 hours

Only packages whose linked repo carries a permitted license are ingested.

PyPI packages tagged ai-agent
Cadence: Every 12 hours

Same license gate as GitHub.

Hugging Face Spaces (article-style)
Cadence: Daily

Entries always land as pending_review and are filtered out of buyer-facing feeds until reviewed.

Dev.to articles (article-style)
Cadence: Daily

Same as Hugging Face Spaces: never auto-published.

What we store

  • The skill name, slug, and a short AI-generated summary (transformative work, not a copy).
  • The upstream source URL so users can view the original.
  • The repository's SPDX license id, when one is published.
  • Aggregate signals (stars, forks, last update) to rank quality.

We do not store full READMEs, full SKILL.md content, or any code from the upstream repository. Anything longer than a 300-character snippet is replaced by a short summary written by our model.
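
As a rough sketch of the 300-character rule above, the decision could look like the following. The names stored_text and summarize are hypothetical, and summarize stands in for whatever model call produces the summary; only the 300-character threshold comes from this page.

```python
MAX_SNIPPET_CHARS = 300  # threshold stated on this page


def stored_text(upstream_text, summarize):
    """Return what would be stored for a piece of upstream text.

    `summarize` is a hypothetical callable that takes the text and returns
    a short summary; it is a placeholder, not a real API.
    """
    if len(upstream_text) <= MAX_SNIPPET_CHARS:
        return upstream_text          # short enough to keep as a snippet
    return summarize(upstream_text)   # longer text is replaced by a summary
```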

How to opt out / request removal

Email [email protected] with the skill URL or repository link. We will remove the listing within 24 hours, no questions asked. You can also block our crawler at the repo level by adding these two lines to your robots.txt:

  User-agent: MFKVault-Crawler
  Disallow: /
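
If you want to confirm that the robots.txt rule behaves as intended before publishing it, a minimal check with Python's standard urllib.robotparser might look like this. The example URL and the "OtherBot" agent are placeholders, and this shows how Python's parser reads the rule, which we assume matches how our crawler interprets it.

```python
from urllib.robotparser import RobotFileParser

# The two directives from this page, as they would appear in robots.txt.
ROBOTS_LINES = [
    "User-agent: MFKVault-Crawler",
    "Disallow: /",
]


def crawler_allowed(user_agent, url, robots_lines=ROBOTS_LINES):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse in-memory lines, no network access
    return parser.can_fetch(user_agent, url)


# Placeholder URL for illustration only.
blocked = not crawler_allowed("MFKVault-Crawler", "https://example.com/skills/demo")
others_ok = crawler_allowed("OtherBot", "https://example.com/skills/demo")
```

With these rules, blocked is True for our crawler while other agents remain unaffected.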

Crawler identification

Our crawlers identify themselves with the User-Agent string MFKVault-Crawler/1.0 (+https://mfkvault.com/crawler-info). We respect robots.txt and back off on rate limits.
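
Site operators who want to spot our requests in their logs could match on the product token of the User-Agent string. The helper below is a hypothetical sketch, not an official tool; it assumes the version after the slash may change, so it compares only the name. Note that any client can send this header, so a match is a hint rather than proof.

```python
CRAWLER_TOKEN = "MFKVault-Crawler"


def is_mfkvault_crawler(user_agent):
    """Return True if the header's first product token names our crawler."""
    if not user_agent:
        return False
    parts = user_agent.split()
    if not parts:
        return False
    # First token looks like "MFKVault-Crawler/1.0"; drop the version part.
    name = parts[0].split("/", 1)[0]
    return name.lower() == CRAWLER_TOKEN.lower()
```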

Last updated 2026-05-08. See also our Community Helper Policy and DMCA page.