Demo
The pipeline, end-to-end, on a real engineer.
Praxa's agent-builder takes a real person's last 90 days of work and generates a custom Skill. This page shows what the four pipeline stages (observe → distill → match → generate) actually produce on a synthesized but realistic engineer profile. No LLM call runs when this page loads; it shows only the artifacts the pipeline emits.
01 · Target
A senior backend engineer on a payments team
Lookback window: 90 days.
github · eng@acme · Senior reviewer
02 · Observe
247 artifacts pulled.
The pipeline pulls every artifact this person produced or interacted with in the window, keyed by connector. Counts by kind:
- PR reviews: 92
- PRs authored: 31
- Issue comments: 88
- Commits: 36
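The breakdown above is a tally of artifacts by kind. A minimal sketch of that tally in Python follows. The `Artifact` record and its `connector` and `kind` fields are hypothetical, since the real connector schema is not shown on this page; only the four counts come from the demo.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """Hypothetical record for one observed artifact; field names are assumptions."""
    connector: str  # e.g. "github"
    kind: str       # e.g. "pr_review", "pr_authored", "issue_comment", "commit"


def count_by_kind(artifacts):
    """Tally artifacts by kind, mirroring the breakdown shown on this page."""
    return Counter(a.kind for a in artifacts)


# Rebuild the demo's counts from synthetic records (the kind labels are assumed).
demo_counts = {"pr_review": 92, "pr_authored": 31, "issue_comment": 88, "commit": 36}
artifacts = [
    Artifact(connector="github", kind=kind)
    for kind, n in demo_counts.items()
    for _ in range(n)
]

counts = count_by_kind(artifacts)
total = sum(counts.values())  # 92 + 31 + 88 + 36 = 247
```

The four counts sum to the 247 artifacts reported for the window.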
Notable patterns surfaced
- Comments on every migration touching ledger.* with an "is this online-safe?" check
- Requests changes when a payment service adds a new external integration without a circuit breaker
- Always asks about idempotency keys when the diff touches POST handlers
03 · Distill
What does this person do?
Reviews PRs against the payments-service repos with a strong emphasis on data-correctness, online-migration safety, and idempotency. Tone is terse and questions-first; will block on missing tests but explicitly accepts "we have an integration test, this unit test would be redundant."
Tasks (by frequency)
- Review PRs in payments repos (high)
- Triage incoming bug reports tagged ledger (medium)
- Write incident postmortems for data-correctness bugs (low)
Decision criteria
- A migration that locks a hot table for more than 100 ms is a hard reject; suggest pt-online-schema-change.
- A new third-party HTTP call without a circuit breaker is a hard reject.
- A POST handler without an idempotency key is a hard reject when the route mutates ledger data.
- Missing unit tests are only a soft reject if an integration test exercises the path.
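The criteria above read naturally as a small rule set. The sketch below is one way to encode them in Python, not Praxa's implementation. The `PullRequestFacts` fields are hypothetical, pre-extracted signals. Treating missing tests with no integration coverage as blocking is an assumption drawn from the "will block on missing tests" summary.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    HARD_REJECT = "hard_reject"
    SOFT_REJECT = "soft_reject"
    OK = "ok"


@dataclass(frozen=True)
class PullRequestFacts:
    """Hypothetical facts extracted from a PR; every field name is an assumption."""
    hot_table_lock_ms: int = 0
    new_third_party_http_call: bool = False
    has_circuit_breaker: bool = False
    post_handler_mutates_ledger: bool = False
    has_idempotency_key: bool = False
    missing_unit_tests: bool = False
    has_integration_test: bool = False


def evaluate(pr):
    """Apply the distilled criteria and return (verdict, reasons)."""
    hard, soft = [], []
    if pr.hot_table_lock_ms > 100:
        hard.append("migration locks a hot table for more than 100 ms; "
                    "suggest pt-online-schema-change")
    if pr.new_third_party_http_call and not pr.has_circuit_breaker:
        hard.append("new third-party HTTP call without a circuit breaker")
    if pr.post_handler_mutates_ledger and not pr.has_idempotency_key:
        hard.append("POST handler mutates ledger data without an idempotency key")
    if pr.missing_unit_tests:
        if pr.has_integration_test:
            soft.append("unit test missing, but an integration test covers the path")
        else:
            # Assumption: with no integration coverage, missing tests block.
            hard.append("missing tests and no integration test covers the path")
    if hard:
        return Verdict.HARD_REJECT, hard + soft
    if soft:
        return Verdict.SOFT_REJECT, soft
    return Verdict.OK, []


verdict, reasons = evaluate(
    PullRequestFacts(missing_unit_tests=True, has_integration_test=True)
)
```

Keeping hard and soft reasons in separate lists means a single hard reject decides the verdict, while the soft notes still appear in the feedback.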
Voice
Factual, terse, low-ceremony. Asks one direct question per concern. Uses bullet points; never paragraphs.
04 · Match
Best fit: PR-Reviewer · score 91%
The role's primary action is structured PR feedback. The decision criteria map cleanly onto the PR-Reviewer template's review-criteria slot. Other templates considered: Issue-Triager (poor fit — only 14% of artifacts) and Sentry-Investigator (poor fit — no Sentry usage observed).
05 · Generate
Draft Skill manifest
slug: pr-reviewer-payments
displayName: PR Reviewer · Payments
trigger: github.pull_request.opened
toolAllowlist:
- github.read.pr
- github.read.diff
- github.read.commits
- github.comment.pr
- github.request_changes
evalSuiteSize: 38
targetAcceptance: 90%
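To make the scope lock concrete, here is a hedged sketch of the draft manifest as a typed Python record with an allowlist check. The field values are copied from the manifest above. The `SkillManifest` class, its `allows` method, and the `github.merge.pr` tool name used in the negative check are illustrative assumptions, not part of Praxa's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillManifest:
    """Illustrative container for the manifest fields shown on this page."""
    slug: str
    display_name: str
    trigger: str
    tool_allowlist: tuple
    eval_suite_size: int
    target_acceptance_percent: int

    def allows(self, tool):
        """Return True only for tools explicitly listed in the allowlist."""
        return tool in self.tool_allowlist


manifest = SkillManifest(
    slug="pr-reviewer-payments",
    display_name="PR Reviewer · Payments",
    trigger="github.pull_request.opened",
    tool_allowlist=(
        "github.read.pr",
        "github.read.diff",
        "github.read.commits",
        "github.comment.pr",
        "github.request_changes",
    ),
    eval_suite_size=38,
    target_acceptance_percent=90,
)

can_comment = manifest.allows("github.comment.pr")
# "github.merge.pr" is a made-up tool name, used only to show a denied call.
can_merge = manifest.allows("github.merge.pr")
```

The frozen dataclass is a deliberate choice here: an immutable record cannot have its allowlist widened after creation.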
06 · Evals
Auto-built eval suite (5 sample cases shown of 38)
Every case here was synthesized from one of the observed decision-criteria patterns. The full suite must clear the target acceptance rate before the Skill can deploy.
- migration-with-locking-statement · pass · Verifies the agent flags a hot-table migration without a pt-online-schema-change suggestion
- new-external-http-no-circuit-breaker · pass · Verifies the agent requests changes when a new third-party HTTP call lacks a circuit breaker
- idempotency-key-missing-on-post · pass · Verifies the agent flags a POST handler that mutates ledger.* without an idempotency key
- unit-test-missing-but-integration-present · pass · Verifies the agent does NOT block on a missing unit test when integration coverage exists (acceptance criterion: explicit accept of integration-only coverage)
- doc-only-change-no-required-feedback · pass · Verifies the agent's confidence is high enough to approve doc-only PRs without escalation
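The deploy gate described above comes down to simple arithmetic. The sketch below assumes that "clearing" the target means a pass rate greater than or equal to it; the page does not say whether the comparison is strict. The function names are hypothetical.

```python
def required_passes(suite_size, target_percent):
    """Smallest number of passing cases whose pass rate meets the target.

    Uses an integer ceiling to avoid floating-point rounding surprises.
    """
    return (suite_size * target_percent + 99) // 100


def can_deploy(passed, suite_size, target_percent):
    """Assumed gate: deploy only if the pass rate is at least the target."""
    return passed >= required_passes(suite_size, target_percent)


# With the manifest's numbers: 38 cases at a 90% target.
needed = required_passes(38, 90)   # ceil(34.2) = 35
blocked = can_deploy(34, 38, 90)   # 34/38 is about 89.5%, below target
allowed = can_deploy(35, 38, 90)   # 35/38 is about 92.1%, meets target
```

Under this assumption, the Skill can fail at most 3 of its 38 cases and still deploy.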
What you saw
Six steps, deterministic output, no prompt engineering by the customer. The entire pipeline runs on your team's real history during onboarding. The Skill that comes out has its scope locked, its eval suite published, and its acceptance rate measured against the senior engineer's actual review history. You can replay every decision the Skill ever makes from the audit log. That's the contract.