- docs/testing.md split into Plugin tests + Skill behavior evals.
The Plugin tests section enumerates the bash tests that survive the
split (kept by drill-coverage analysis or as describe-skill tests).
- CLAUDE.md gains an Eval harness section pointing at evals/.
- README.md Contributing section mentions evals/ alongside tests/.
- .gitignore adds evals/{results,.venv,.env} as belt-and-suspenders
(evals/.gitignore covers these locally; root-level entries help
tooling that does not recurse into nested ignore files).
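The root-level entries would look roughly like the fragment below (whether `results` is a directory, and the exact trailing-slash style, is an assumption):

```gitignore
# Root .gitignore: belt-and-suspenders copies of evals/.gitignore,
# for tooling that does not recurse into nested ignore files.
evals/results/
evals/.venv/
evals/.env
```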
Most new-harness PRs ship integrations that copy skill files or wrap
them with `npx skills` instead of loading the using-superpowers
bootstrap at session start. Those integrations appear to work, but
skills never auto-trigger.
Add an acceptance test ("Let's make a react todo list" must auto-trigger
brainstorming in a clean session) and require the transcript in the PR.
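A minimal sketch of the transcript-checking half of that acceptance test, assuming a skill load leaves a recognizable "brainstorming" marker in the saved transcript (the marker string, transcript format, and function name are illustrative, not the project's actual harness):

```shell
# Hypothetical helper: scan a saved clean-session transcript for
# evidence that the brainstorming skill auto-triggered. The marker
# string ("brainstorming") and transcript layout are assumptions.
check_autotrigger() {
  local transcript="$1"
  if grep -qi "brainstorming" "$transcript"; then
    echo "PASS: brainstorming auto-triggered"
  else
    echo "FAIL: brainstorming did not auto-trigger"
    return 1
  fi
}
```

Running the clean session itself ("Let's make a react todo list") and capturing its transcript is harness-specific and left out here.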
Speak directly to AI agents at the top of CLAUDE.md: reframe slop
PRs as harmful to their human partner, give a concrete pre-submission
checklist, and explicitly authorize pushing back on vague instructions.
CLAUDE.md (symlinked to AGENTS.md) covers every major rejection
pattern from auditing the last 100 closed PRs (94% rejection rate):
AI slop, ignored PR template, duplicates, speculative fixes,
domain-specific skills, fork confusion, fabricated content, bundled
changes,
and misunderstanding project philosophy.
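For reference, one way such a symlink might be set up (the source does not state the link direction; here AGENTS.md is assumed to be the link pointing at the real CLAUDE.md):

```shell
# Assumed direction: AGENTS.md is a symlink to the real CLAUDE.md,
# so both filenames resolve to the same guidance file.
ln -sf CLAUDE.md AGENTS.md
```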