Crawler Info
MFKVault discovers AI agent skills via automated crawlers. This page lists what we crawl, how often, and what we store. Original authors retain all rights to their work.
What we crawl
Only repos with an OSS-friendly SPDX license (MIT, Apache-2.0, BSD-2/3, ISC, MPL-2.0, CC0-1.0, Unlicense, WTFPL, CC-BY, CC-BY-SA) are ingested.
Only packages whose linked repo carries a permitted license are ingested.
Same license gate as GitHub.
Always lands as pending_review and is filtered out of buyer-facing feeds until reviewed.
Same as HF Spaces: never auto-published.
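The license gate described above can be pictured as a simple allowlist check. The following is a minimal sketch, not our production code: the function name `is_permitted_license` is hypothetical, the expansion of "BSD-2/3" into `BSD-2-Clause` and `BSD-3-Clause` is an assumption, and so is the choice to accept any versioned `CC-BY-x.y` or `CC-BY-SA-x.y` id while rejecting NC and ND variants.

```python
import re

# Assumed SPDX ids for the license families listed on this page (lowercased).
PERMITTED_IDS = {
    "mit", "apache-2.0", "bsd-2-clause", "bsd-3-clause", "isc",
    "mpl-2.0", "cc0-1.0", "unlicense", "wtfpl",
}

# Assumption: CC-BY and CC-BY-SA are accepted in any numbered version,
# e.g. CC-BY-4.0, but not variants such as CC-BY-NC-4.0.
_CC_PATTERN = re.compile(r"cc-by(-sa)?-\d+\.\d+")


def is_permitted_license(spdx_id):
    """Return True if the SPDX id passes the (assumed) allowlist."""
    if not spdx_id:
        # No published license id: treated here as not permitted.
        return False
    normalized = spdx_id.strip().lower()
    return normalized in PERMITTED_IDS or _CC_PATTERN.fullmatch(normalized) is not None
```

For example, `is_permitted_license("Apache-2.0")` passes the check, while `is_permitted_license("GPL-3.0-only")` does not.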
What we store
- The skill name, slug, and a short AI-generated summary (transformative work, not a copy).
- The upstream source URL so users can view the original.
- The repository's SPDX license ID, when one is published.
- Aggregate signals (stars, forks, last update) to rank quality.
We do not store full READMEs, full SKILL.md content, or any code from the upstream repository. Any excerpt longer than 300 characters is replaced by a short summary written by our model.
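The 300-character rule can be illustrated with a short sketch. It assumes a hypothetical `summarize` callable standing in for our model; the names `stored_text` and `MAX_SNIPPET_CHARS` are illustrative only.

```python
MAX_SNIPPET_CHARS = 300  # limit stated on this page


def stored_text(upstream_text, summarize):
    """Keep short excerpts as-is; replace anything longer with a summary.

    `summarize` is a hypothetical callable (text -> short summary).
    """
    if len(upstream_text) <= MAX_SNIPPET_CHARS:
        return upstream_text
    return summarize(upstream_text)
```

Under this sketch, a 300-character excerpt is kept verbatim and a 301-character one is summarized.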
How to opt out / request removal
Email [email protected] with the skill URL or repository link. We will remove the listing within 24 hours, no questions asked. You can also block our crawler at the repo level by adding the two lines "User-agent: MFKVault-Crawler" and "Disallow: /" to your robots.txt.
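If you want to confirm that the robots.txt rule blocks our crawler and nothing else, one way is to test it with Python's standard `urllib.robotparser`. This is a sketch for checking your own file; `example.com` and `SomeOtherBot` are placeholders, and it says nothing about how our crawler parses robots.txt internally.

```python
from urllib.robotparser import RobotFileParser

# The two lines described above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: MFKVault-Crawler
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL; any path on the site is covered by "Disallow: /".
url = "https://example.com/any/path"

blocked = not parser.can_fetch("MFKVault-Crawler/1.0", url)
others_allowed = parser.can_fetch("SomeOtherBot/2.0", url)

print("MFKVault-Crawler blocked:", blocked)
print("Other crawlers allowed:", others_allowed)
```

Because the rule names only `MFKVault-Crawler`, other crawlers are unaffected unless your file restricts them separately.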
Crawler identification
Our crawlers identify themselves with the User-Agent string MFKVault-Crawler/1.0 (+https://mfkvault.com/crawler-info). We respect robots.txt and back off when we hit rate limits.
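As an illustration of what backing off on rate limits can look like, here is a minimal sketch. It assumes exponential backoff with a cap and honors a numeric `Retry-After` value when the server sends one; the function name, base delay, and cap are hypothetical, not our actual settings.

```python
CRAWLER_USER_AGENT = "MFKVault-Crawler/1.0 (+https://mfkvault.com/crawler-info)"


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based).

    `base` and `cap` are assumed values chosen for illustration.
    """
    if retry_after is not None:
        try:
            # A numeric Retry-After header is honored in full.
            return max(0.0, float(retry_after))
        except ValueError:
            # Retry-After may also be an HTTP date; this sketch
            # falls back to exponential backoff in that case.
            pass
    return min(cap, base * (2 ** attempt))
```

With these assumed defaults, successive retries wait 1, 2, 4, 8 seconds and so on, up to the 60-second cap, unless the server asks for a specific delay.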
Last updated 2026-05-08. See also our Community Helper Policy and DMCA page.