Add a mandatory self-identification disclosure (model, harness, harness
version, all installed plugins) to the PR template and all three issue
templates, and document the requirement in the contributor guidelines.
We weigh contributions differently depending on what produced them:
content reasoned from documentation is held to a different bar than work
grounded in a real session.
Also state explicitly, in both CLAUDE.md and the PR template, that all
PRs must target the dev branch rather than main.
- evals/README.md, evals/CLAUDE.md: fix uv install command from
'uv sync --dev' to 'uv sync --extra dev'. Drill's pyproject.toml
uses [project.optional-dependencies], so --dev is a no-op for
pytest/ruff/ty; --extra dev is the correct invocation.
- tests/claude-code/run-skill-tests.sh: drop test-requesting-code-review.sh
from integration_tests array (file deleted earlier in this branch).
- tests/claude-code/README.md: replace test-requesting-code-review.sh
section with test-worktree-native-preference.sh (the worktree test
is kept; the code-review test was lifted into drill).
- docs/testing.md, CLAUDE.md: remove "Copilot CLI" from the harness
list. evals/backends/ has claude*, codex, gemini configs but no
copilot.yaml, so the claim was unsupported.
Adversarial review credit: reviewer #2 found four legitimate issues
(uv-sync, run-skill-tests stale ref, README stale ref via #1, and
Copilot CLI fabrication); reviewer #1 found two distinct issues
(run-skill-tests + tests/claude-code/README.md). Reviewer #2 wins
this round.
- docs/testing.md split into Plugin tests + Skill behavior evals.
Plugin tests section enumerates the bash tests that survive
(kept by drill-coverage analysis or as describe-skill tests).
- CLAUDE.md adds Eval harness section pointing at evals/.
- README.md Contributing section mentions evals/ alongside tests/.
- .gitignore adds evals/{results,.venv,.env} as belt-and-suspenders
(evals/.gitignore covers these locally; root-level entries help
tooling that does not recurse into nested ignore files).
Most new-harness PRs ship integrations that copy skill files or wrap
with `npx skills` instead of loading the using-superpowers bootstrap at
session start. Those integrations look like they work but skills never
auto-trigger.
Add an acceptance test ("Let's make a react todo list" must auto-trigger
brainstorming in a clean session) and require the transcript in the PR.
Speak directly to AI agents at the top of CLAUDE.md: reframe slop
PRs as harmful to their human partner, give a concrete pre-submission
checklist, and explicitly authorize pushing back on vague instructions.
CLAUDE.md (symlinked to AGENTS.md) covers every major rejection
pattern from auditing the last 100 closed PRs (94% rejection rate):
AI slop, ignored PR template, duplicates, speculative fixes, domain-
specific skills, fork confusion, fabricated content, bundled changes,
and misunderstanding project philosophy.