Demo

The pipeline, end to end, on a realistic engineer profile.

Praxa's agent-builder takes a real person's last 90 days of work and generates a custom Skill. This page shows what each of the four stages (observe → distill → match → generate) actually produces, using a synthesized but realistic engineer profile. No LLM calls run when this page loads; everything shown is an artifact the pipeline emits.

01 · Target

A senior backend engineer on a payments team

Lookback window: 90 days.

  • github · eng@acme · Senior reviewer

02 · Observe

247 artifacts pulled.

The pipeline pulls every artifact this person produced or interacted with in the window, keyed by connector. Counts by kind:

  • PR reviews: 92
  • PRs authored: 31
  • Issue comments: 88
  • Commits: 36
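A minimal sketch of the tally step, assuming each pulled artifact arrives as a record with `connector`, `kind`, and `id` fields. These field names and the `ArtifactKind` values are hypothetical; the page does not show Praxa's artifact schema. The sketch groups artifacts by kind and confirms that the per-kind counts add up to the 247 total.

```typescript
// Hypothetical artifact record; field names are illustrative, not Praxa's schema.
type ArtifactKind = "pr_review" | "pr_authored" | "issue_comment" | "commit";

interface Artifact {
  connector: string; // e.g. "github"
  kind: ArtifactKind;
  id: string;
}

// Count artifacts per kind.
function countByKind(artifacts: Artifact[]): Map<ArtifactKind, number> {
  const counts = new Map<ArtifactKind, number>();
  for (const a of artifacts) {
    counts.set(a.kind, (counts.get(a.kind) ?? 0) + 1);
  }
  return counts;
}

// Build n synthetic artifacts of one kind (stand-ins for real connector output).
function synthetic(kind: ArtifactKind, n: number): Artifact[] {
  return Array.from({ length: n }, (_, i) => ({ connector: "github", kind, id: `${kind}-${i}` }));
}

// Reproduce the counts shown above.
const artifacts: Artifact[] = [
  ...synthetic("pr_review", 92),
  ...synthetic("pr_authored", 31),
  ...synthetic("issue_comment", 88),
  ...synthetic("commit", 36),
];

const counts = countByKind(artifacts);
const total = Array.from(counts.values()).reduce((sum, n) => sum + n, 0);
console.log(total); // 247
```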

Notable patterns surfaced

  • Comments on every migration that touches ledger.*, asking "is this online-safe?"
  • Requests changes when a payment service adds a new external integration without a circuit breaker
  • Always asks about idempotency keys when the diff touches POST handlers
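Each of these patterns keys off something visible in the diff. A minimal sketch of how such triggers could be detected, assuming the diff is available as changed file paths plus their added lines. The `ChangedFile` shape, the `migrations/` directory convention, and the regexes (Express-style `.post(` routes, `fetch("https://...")` calls) are hypothetical heuristics, not what Praxa actually matches on.

```typescript
// Hypothetical diff shape: one entry per changed file, with its added lines.
interface ChangedFile {
  path: string;
  addedLines: string[];
}

type Concern = "ledger-migration" | "post-handler" | "external-http";

// Illustrative heuristics only; a real implementation would parse the code.
const LEDGER_TABLE = /\bledger\.\w+/;
const POST_ROUTE = /\.post\(\s*["'`]/; // e.g. router.post("/refunds", ...)
const EXTERNAL_HTTP = /\bfetch\(\s*["'`]https?:\/\//;

// Return the set of review concerns a diff triggers.
function concernsFor(files: ChangedFile[]): Set<Concern> {
  const found = new Set<Concern>();
  for (const f of files) {
    const isMigration = f.path.includes("migrations/");
    for (const line of f.addedLines) {
      if (isMigration && LEDGER_TABLE.test(line)) found.add("ledger-migration");
      if (POST_ROUTE.test(line)) found.add("post-handler");
      if (EXTERNAL_HTTP.test(line)) found.add("external-http");
    }
  }
  return found;
}

const diff: ChangedFile[] = [
  {
    path: "db/migrations/0042_add_currency.sql",
    addedLines: ["ALTER TABLE ledger.entries ADD COLUMN currency CHAR(3);"],
  },
  { path: "src/routes/refunds.ts", addedLines: ['router.post("/refunds", createRefund);'] },
];

// Contains "ledger-migration" and "post-handler", but not "external-http".
console.log(Array.from(concernsFor(diff)));
```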

03 · Distill

What does this person do?

Reviews PRs in the payments-service repos with a strong emphasis on data correctness, online-migration safety, and idempotency. Tone is terse and questions-first; blocks on missing tests but explicitly accepts "we have an integration test, this unit test would be redundant."

Tasks (by frequency)

  • Review PRs in payments repos · high
  • Triage incoming bug reports tagged ledger · medium
  • Write incident postmortems for data-correctness bugs · low

Decision criteria

  • A migration that locks a hot table > 100ms is a hard reject; suggest pt-online-schema-change.
  • A new third-party HTTP call without a circuit breaker is a hard reject.
  • A POST handler without an idempotency key is a hard reject when the route mutates ledger data.
  • Missing unit tests are only a soft (non-blocking) reject when an integration test exercises the path.
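Written as rules, these criteria reduce to a small function: any hard reject means requesting changes, and a missing unit test is downgraded to a note when an integration test covers the path. A minimal sketch, assuming the PR has already been summarized into the boolean facts below. The `PrFacts` shape, the `Verdict` names, and the mapping of "soft reject" to a non-blocking comment are hypothetical; the 100 ms threshold and the rules themselves come from the list above. Following the Voice notes, each concern becomes one direct question.

```typescript
// Hypothetical facts a pre-processing step might extract from a PR.
interface PrFacts {
  maxHotTableLockMs: number; // worst estimated lock time on a hot table; 0 if no migration
  newExternalCallWithoutBreaker: boolean;
  postHandlerMutatesLedger: boolean;
  hasIdempotencyKey: boolean;
  missingUnitTests: boolean;
  integrationTestCoversPath: boolean;
}

type Verdict = "request_changes" | "comment" | "approve";

interface Review {
  verdict: Verdict;
  notes: string[]; // one direct question or note per concern
}

function review(f: PrFacts): Review {
  const hard: string[] = [];
  const soft: string[] = [];

  if (f.maxHotTableLockMs > 100) {
    hard.push("This locks a hot table for over 100 ms. Can it use pt-online-schema-change?");
  }
  if (f.newExternalCallWithoutBreaker) {
    hard.push("Where is the circuit breaker for this new third-party call?");
  }
  if (f.postHandlerMutatesLedger && !f.hasIdempotencyKey) {
    hard.push("What is the idempotency key for this POST?");
  }
  if (f.missingUnitTests) {
    // Integration coverage downgrades a missing unit test to a non-blocking note.
    if (f.integrationTestCoversPath) {
      soft.push("No unit test; the integration test covers this path.");
    } else {
      hard.push("Which test exercises this path?");
    }
  }

  if (hard.length > 0) return { verdict: "request_changes", notes: hard };
  if (soft.length > 0) return { verdict: "comment", notes: soft };
  return { verdict: "approve", notes: [] };
}

const base: PrFacts = {
  maxHotTableLockMs: 0,
  newExternalCallWithoutBreaker: false,
  postHandlerMutatesLedger: false,
  hasIdempotencyKey: false,
  missingUnitTests: false,
  integrationTestCoversPath: false,
};

console.log(review({ ...base, postHandlerMutatesLedger: true }).verdict); // request_changes
console.log(review({ ...base, missingUnitTests: true, integrationTestCoversPath: true }).verdict); // comment
```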

Voice

Factual, terse, low-ceremony. Asks one direct question per concern. Uses bullet points; never paragraphs.

04 · Match

Best fit: PR-Reviewer · score 91%

The role's primary action is structured PR feedback. The decision criteria map cleanly onto the PR-Reviewer template's review-criteria slot. Other templates considered: Issue-Triager (poor fit — only 14% of artifacts) and Sentry-Investigator (poor fit — no Sentry usage observed).
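A minimal sketch of the selection step, assuming each candidate template has already received a fit score in [0, 1]. How Praxa computes those scores is not shown on this page: only PR-Reviewer's 0.91 comes from above, the Issue-Triager and Sentry-Investigator scores are placeholders, and the 0.5 minimum-fit cutoff is hypothetical.

```typescript
interface TemplateScore {
  template: string;
  score: number; // fit in [0, 1]
  reason: string;
}

// Pick the highest-scoring template, or null if nothing clears the cutoff.
function bestFit(candidates: TemplateScore[], minScore = 0.5): TemplateScore | null {
  let best: TemplateScore | null = null;
  for (const c of candidates) {
    if (c.score >= minScore && (best === null || c.score > best.score)) best = c;
  }
  return best;
}

const candidates: TemplateScore[] = [
  { template: "PR-Reviewer", score: 0.91, reason: "criteria map onto the review-criteria slot" },
  { template: "Issue-Triager", score: 0.2, reason: "placeholder score" },
  { template: "Sentry-Investigator", score: 0.0, reason: "placeholder score; no Sentry usage" },
];

console.log(bestFit(candidates)?.template); // PR-Reviewer
```

Returning `null` rather than forcing a pick keeps a poor match from silently producing a Skill; whether Praxa behaves this way is an assumption.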

05 · Generate

Draft Skill manifest

slug: pr-reviewer-payments

displayName: PR Reviewer · Payments

trigger: github.pull_request.opened

toolAllowlist:

  • github.read.pr
  • github.read.diff
  • github.read.commits
  • github.comment.pr
  • github.request_changes

evalSuiteSize: 38

targetAcceptance: 90%
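The draft manifest maps naturally onto a typed object. A minimal sketch, assuming the keys shown above are the manifest's real field names and that `targetAcceptance` is stored as a fraction. The validation rules (kebab-case slug, non-empty allowlist, positive integer suite size) and the `isToolAllowed` helper are hypothetical illustrations of how a locked scope could be enforced; `github.merge.pr` in the usage line is an invented tool name used only to show a refusal.

```typescript
interface SkillManifest {
  slug: string;
  displayName: string;
  trigger: string;
  toolAllowlist: string[];
  evalSuiteSize: number;
  targetAcceptance: number; // fraction: 0.9 means 90%
}

const manifest: SkillManifest = {
  slug: "pr-reviewer-payments",
  displayName: "PR Reviewer · Payments",
  trigger: "github.pull_request.opened",
  toolAllowlist: [
    "github.read.pr",
    "github.read.diff",
    "github.read.commits",
    "github.comment.pr",
    "github.request_changes",
  ],
  evalSuiteSize: 38,
  targetAcceptance: 0.9,
};

// Hypothetical checks; returns a list of problems (empty means valid).
function validate(m: SkillManifest): string[] {
  const problems: string[] = [];
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(m.slug)) problems.push("slug must be kebab-case");
  if (m.toolAllowlist.length === 0) problems.push("toolAllowlist is empty");
  if (!Number.isInteger(m.evalSuiteSize) || m.evalSuiteSize <= 0) {
    problems.push("evalSuiteSize must be a positive integer");
  }
  if (m.targetAcceptance <= 0 || m.targetAcceptance > 1) {
    problems.push("targetAcceptance must be in (0, 1]");
  }
  return problems;
}

// Scope lock: any tool outside the allowlist is refused.
function isToolAllowed(m: SkillManifest, tool: string): boolean {
  return m.toolAllowlist.includes(tool);
}

console.log(validate(manifest).length); // 0
console.log(isToolAllowed(manifest, "github.merge.pr")); // false
```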

06 · Evals

Auto-built eval suite (5 of 38 cases shown)

Every case here was synthesized from one of the observed decision-criteria patterns. The full suite must clear the target acceptance rate before the Skill can deploy.

  • migration-with-locking-statement · pass

    Verifies the agent flags a migration that locks a hot table and suggests pt-online-schema-change

  • new-external-http-no-circuit-breaker · pass

    Verifies the agent requests changes when a new third-party HTTP call lacks a circuit breaker

  • idempotency-key-missing-on-post · pass

    Verifies the agent flags a POST handler that mutates ledger.* without an idempotency key

  • unit-test-missing-but-integration-present · pass

    Verifies the agent does NOT block on a missing unit test when integration coverage exists (acceptance criterion: the agent explicitly accepts integration-only coverage)

  • doc-only-change-no-required-feedback · pass

    Verifies the agent's confidence is high enough to approve doc-only PRs without escalation
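The deploy gate is a single comparison of the pass rate against the target. A minimal sketch, assuming each case records a pass/fail outcome and the target is an integer percentage; the `EvalResult` shape and both helpers are hypothetical. Comparing with integer arithmetic avoids floating-point surprises at the boundary. With 38 cases and a 90% target, the Skill needs at least 35 passes (35/38 ≈ 92.1%), because 34/38 ≈ 89.5% falls short.

```typescript
type Outcome = "pass" | "fail";

interface EvalResult {
  caseId: string; // e.g. "idempotency-key-missing-on-post"
  outcome: Outcome;
}

// Deployable only if passed / total >= targetPct / 100, checked without floats.
function canDeploy(results: EvalResult[], targetPct: number): boolean {
  if (results.length === 0) return false;
  const passed = results.filter((r) => r.outcome === "pass").length;
  return passed * 100 >= targetPct * results.length;
}

// Smallest number of passing cases that clears the target.
function minPasses(total: number, targetPct: number): number {
  return Math.ceil((targetPct * total) / 100);
}

// 38 synthetic results: the first 35 pass, the last 3 fail.
const results: EvalResult[] = Array.from({ length: 38 }, (_, i): EvalResult => ({
  caseId: `case-${i}`,
  outcome: i < 35 ? "pass" : "fail",
}));

console.log(minPasses(38, 90), canDeploy(results, 90)); // 35 true
```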

What you saw

Four pipeline stages plus an auto-built eval suite, with deterministic output and no prompt engineering by the customer. During onboarding, the same pipeline runs on your team's real history. The Skill that comes out has its scope locked, its eval suite published, and its acceptance rate measured against the senior engineer's actual review history. Every decision the Skill makes can be replayed from the audit log. That's the contract.