Cost, coverage, health, optimization, and change tracking.
Dippin includes six analysis commands (cost, coverage, doctor, optimize, diff, and feedback) that inspect workflows for cost, coverage, health, optimization opportunities, and change impact. doctor aggregates cost, coverage, and lint into a single grade; run it first for an overview, then drill into the specific commands for details.
A typical workflow: run doctor first; if the grade is below B, drill into lint, coverage, or cost. Run optimize when cost shows high spend, use diff to review changes between versions, and use feedback to calibrate estimates after production runs.
Estimate execution cost from model pricing tables. Input tokens are estimated from prompt length, output tokens are estimated heuristically per turn, and max_turns determines the turn range (which produces the min/expected/max spread). Tool and human nodes cost $0. Unknown models are costed at $0 with an assumption note.
```
$ dippin cost pipeline.dip

═══ Cost Estimate ═══════════════════
                    Min    Expected  Max
──────────────────  ─────  ─────  ─────
TOTAL               $3.21  $3.59  $14.10

─── By Provider ─────────────────────
openai              $0.38  $0.57  $2.96
anthropic           $2.83  $3.02  $11.13

─── Top Cost Drivers ────────────────
CommitWork          $2.12 (max)  openai/gpt-5.2
ImplementClaude     $2.12 (max)  anthropic/claude-sonnet-4-6
InterpretRequest    $1.44 (max)  anthropic/claude-opus-4-6

─── Assumptions ─────────────────────
• unknown model "gemini-3-flash" (provider "gemini"): cost set to $0
```
When to use: Before deploying a pipeline with expensive models. Compare providers. Identify cost drivers to optimize.
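The estimation model described above can be sketched roughly as follows. This is a minimal illustration, not Dippin's actual internals: the pricing table values, the chars-per-token ratio, the output-tokens-per-turn default, and the choice of midpoint for the expected turn count are all assumptions.

```python
# Rough sketch of per-node cost estimation (illustrative only).
# Pricing is in USD per million tokens; the numbers are placeholders.
PRICING = {
    "claude-haiku-4-5": {"input": 1.00, "output": 5.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def estimate_node_cost(model, prompt_chars, max_turns,
                       output_tokens_per_turn=500):
    """Return (min, expected, max) cost in USD for one LLM node."""
    price = PRICING.get(model)
    if price is None:
        # Unknown model: cost set to $0 (the CLI records an assumption note).
        return (0.0, 0.0, 0.0)
    input_tokens = prompt_chars / 4  # ~4 chars per token heuristic

    def cost(turns):
        total_in = input_tokens * turns
        total_out = output_tokens_per_turn * turns
        return (total_in * price["input"] + total_out * price["output"]) / 1e6

    # min = 1 turn, max = max_turns, expected = midpoint (one possible choice)
    return (cost(1), cost((1 + max_turns) / 2), cost(max_turns))
```

Tool and human nodes never reach this function, since they cost $0 by definition.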
Analyze edge coverage and reachability. For tool nodes, extracts possible outputs from printf/echo patterns in the command, then checks whether outgoing edge conditions cover those outputs.
```
$ dippin coverage pipeline.dip

═══ Coverage Analysis ═══════════════

─── Edge Coverage ───────────────────
✓ SetupWorkspace   no_conditions
✗ ValidateBuild    partial
    missing: validation-pass-go
    missing: validation-pass-swift

─── Reachability ────────────────────
✓ 30/30 nodes reachable

─── Termination ─────────────────────
✓ all paths reach exit: true
```
When to use: After writing conditional routing to verify all tool outputs have matching edges. The missing entries tell you exactly which edges to add.
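The tool-output extraction can be approximated with a regular expression over echo/printf calls. This is a simplified sketch: the regex, the edge-condition representation (a plain set of strings), and the example command are assumptions, and real shell parsing is more involved.

```python
import re

# Pull string literals out of echo/printf calls in a tool command
# (simplified; does not handle variables, escapes, or pipelines).
OUTPUT_RE = re.compile(r'''(?:echo|printf)\s+["']([^"']+)["']''')

def tool_outputs(command: str) -> set:
    """Possible outputs a tool node can emit, per its echo/printf patterns."""
    return set(OUTPUT_RE.findall(command))

def uncovered_outputs(command: str, edge_conditions: set) -> set:
    """Outputs the command can emit that no outgoing edge condition matches."""
    return tool_outputs(command) - edge_conditions

cmd = 'if go test ./...; then echo "validation-pass-go"; else echo "validation-fail"; fi'
missing = uncovered_outputs(cmd, {"validation-fail"})
```

Here `missing` would contain `validation-pass-go`, matching the kind of entry the coverage report flags.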
Health report card: a single grade (A-F) that aggregates lint, coverage, and cost into one score.
```
$ dippin doctor pipeline.dip

═══ Health Report Card ══════════════

Grade: A    Score: 95/100

─── Lint ────────────────────────────
Errors: 0   Warnings: 1   Hints: 0

─── Coverage ────────────────────────
Reachable: 21/21 nodes
✓ All paths terminate
✓ All tool outputs covered

─── Cost ────────────────────────────
Expected: $2.10 (range: $1.50 - $8.40)

─── Suggestions ─────────────────────
• [lint] review lint warnings - run `dippin lint` for details
```
Starts at 100 points, with deductions for issues:
| Issue | Deduction |
|---|---|
| Lint error | -15 each |
| Lint warning | -5 each |
| Unreachable node | -10 per node |
| Non-terminating paths | -20 |
| Uncovered tool outputs | -5 per tool |
Grades map to score ranges:

| Grade | Score Range |
|---|---|
| A | 90-100 |
| B | 80-89 |
| C | 70-79 |
| D | 60-69 |
| F | <60 |
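The two tables above translate directly into a scoring function. A minimal sketch, with function and parameter names chosen for illustration:

```python
def health_score(lint_errors=0, lint_warnings=0, unreachable_nodes=0,
                 nonterminating=False, uncovered_tools=0):
    """Apply the documented deductions to a 100-point baseline."""
    score = 100
    score -= 15 * lint_errors
    score -= 5 * lint_warnings
    score -= 10 * unreachable_nodes
    score -= 20 if nonterminating else 0
    score -= 5 * uncovered_tools
    return max(score, 0)

def grade(score):
    """Map a 0-100 score to a letter grade."""
    for letter, floor in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if score >= floor:
            return letter
    return "F"
```

This reproduces the report card shown earlier: one lint warning deducts 5 points, leaving 95/100 and a grade of A.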
Suggest cheaper model substitutions without sacrificing quality. Rules include: simple prompts can use cheaper models, nodes in retry loops can use cheaper models for mechanical iterations, and bookkeeping tasks (summary, cleanup, commit) can use cheaper models.
```
$ dippin optimize pipeline.dip

═══ Optimization Report ═════════════

─── Cost Summary ────────────────────
Current:   $3.59 (expected)
Optimized: $0.00 (expected)
Savings:   $3.59 (expected)

─── Suggestions ─────────────────────
• [InterpretRequest] simple prompt does not need an expensive model
    claude-opus-4-6 → claude-haiku-4-5 (saves ~$0.41)
• [CommitWork] bookkeeping task can use a cheaper model
    gpt-5.2 → gpt-4o-mini (saves ~$0.30)
```
When to use: After dippin cost shows high costs. Review each suggestion — some "simple" prompts may actually need a capable model.
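A simplified version of the rule matching might look like the sketch below. The node representation, the prompt-length threshold, the keyword list, and the substitution table are all illustrative assumptions, not Dippin's actual rules.

```python
# Hypothetical cheaper-model substitution table.
CHEAPER = {"claude-opus-4-6": "claude-haiku-4-5", "gpt-5.2": "gpt-4o-mini"}
BOOKKEEPING_WORDS = ("summary", "cleanup", "commit")

def suggest(node_name, model, prompt, in_retry_loop=False):
    """Return a (reason, replacement) suggestion for one node, or None."""
    if model not in CHEAPER:
        return None
    if len(prompt) < 200:  # arbitrary "simple prompt" threshold
        return ("simple prompt does not need an expensive model", CHEAPER[model])
    if in_retry_loop:
        return ("retry-loop iterations can use a cheaper model", CHEAPER[model])
    if any(word in node_name.lower() for word in BOOKKEEPING_WORDS):
        return ("bookkeeping task can use a cheaper model", CHEAPER[model])
    return None
```

The same caveat as the report applies: a short prompt is not always a simple one, so suggestions deserve a human review.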
Semantic comparison between two workflow versions. Unlike text-based diff, this compares graph structure: nodes added/removed, edges changed, field-level modifications, and cost impact.
```
$ dippin diff v1.dip v2.dip

═══ Semantic Diff ═══════════════════

─── Nodes ───────────────────────────
+ FinalQualityGate

─── Edges ───────────────────────────
+ FinalQualityGate -> Exit [ctx.outcome = fail]
+ FinalQualityGate -> PersistSprint [ctx.outcome = success]
- WriteFinalSprint -> PersistSprint

─── Cost Delta ──────────────────────
Old:   $5.35 (expected)
New:   $5.78 (expected)
Delta: +$0.43 (expected)
```
When to use: Code review for workflow changes. See exactly what graph structure changed and how it affects cost, rather than parsing indentation diffs.
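Once both files are parsed into graphs, the node and edge comparison reduces to set differences. A minimal sketch, where the graph representation (a dict of node-name and edge-tuple sets) is an assumption:

```python
def semantic_diff(old, new):
    """Diff two graphs given as {"nodes": set, "edges": set of (src, dst, cond)}."""
    return {
        "nodes_added":   new["nodes"] - old["nodes"],
        "nodes_removed": old["nodes"] - new["nodes"],
        "edges_added":   new["edges"] - old["edges"],
        "edges_removed": old["edges"] - new["edges"],
    }

# Toy versions of the v1/v2 graphs from the example output above.
v1 = {"nodes": {"WriteFinalSprint", "PersistSprint", "Exit"},
      "edges": {("WriteFinalSprint", "PersistSprint", None)}}
v2 = {"nodes": {"WriteFinalSprint", "PersistSprint", "Exit", "FinalQualityGate"},
      "edges": {("FinalQualityGate", "Exit", "ctx.outcome = fail"),
                ("FinalQualityGate", "PersistSprint", "ctx.outcome = success")}}
delta = semantic_diff(v1, v2)
```

Because the comparison works on parsed structure, it is immune to reordering and whitespace changes that would clutter a text diff.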
Compare predicted costs against actual execution telemetry to calibrate estimates. Takes the workflow file (for predicted costs) and a CSV telemetry file with columns: node_id, input_tokens, output_tokens, cost_usd.
```
$ dippin feedback pipeline.dip telemetry.csv
```
After running a pipeline in production, export telemetry and feed it back to see how accurate the cost predictions were. Nodes whose actual-to-predicted cost ratio is above 2x or below 0.5x are flagged as outliers for investigation.
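Reading the telemetry file and flagging outliers might look like the sketch below. Only the CSV columns come from the docs; the predicted-costs input and the function name are assumptions.

```python
import csv

def calibration_report(telemetry_path, predicted):
    """Compare actual per-node cost against predictions.

    `predicted` maps node_id -> predicted cost in USD. Rows whose
    actual/predicted ratio falls outside [0.5, 2.0] are returned
    as (node_id, predicted, actual, ratio) outliers.
    """
    outliers = []
    with open(telemetry_path, newline="") as f:
        for row in csv.DictReader(f):
            node = row["node_id"]
            actual = float(row["cost_usd"])
            pred = predicted.get(node)
            if not pred:  # no prediction for this node; skip
                continue
            ratio = actual / pred
            if ratio > 2.0 or ratio < 0.5:
                outliers.append((node, pred, actual, ratio))
    return outliers
```

A persistent outlier usually means the token heuristics for that node's model need adjusting, or the node is doing more turns in production than the estimate assumed.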