Analysis

Analysis Tools

Cost, coverage, health, optimization, and change tracking.

Overview

Dippin includes eight analysis commands that inspect workflows for cost, coverage, health, optimization opportunities, dead code, change impact, calibration, and dry-run behavior. doctor aggregates cost + coverage + lint into a single grade. Run it first for an overview, then drill into specific commands for details.

dippin doctor

→

dippin lint

dippin coverage

dippin cost

A typical workflow: run doctor first, then drill into lint, coverage, or cost if the grade is below B. Use optimize after cost shows high costs. Use diff to review changes, and feedback to calibrate after production runs.

cost

Estimate execution cost based on model pricing tables. Prompt length estimates input tokens, output tokens are estimated heuristically per turn, and max_turns determines the turn range. Tool and human nodes cost $0. Unknown models are costed at $0 with an assumption note.

$ dippin cost pipeline.dip

dippin cost

$ dippin cost pipeline.dip ═══ Cost Estimate ═══════════════════ Min Expected Max ──────────────────── ───── ───── ───── TOTAL $3.21 $3.59 $14.10

─── By Provider ───────────────────── openai $0.38 $0.57 $2.96 anthropic $2.83 $3.02 $11.13

─── Top Cost Drivers ──────────────── CommitWork $2.12 (max) openai/gpt-5.2 ImplementClaude $2.12 (max) anthropic/claude-sonnet-4-6 InterpretRequest $1.44 (max) anthropic/claude-opus-4-6

─── Assumptions ───────────────────── • unknown model “gemini-3-flash” (provider “gemini”): cost set to $0

When to use: Before deploying a pipeline with expensive models. Compare providers. Identify cost drivers to optimize.

coverage

Analyze edge coverage and reachability. For tool nodes, extracts possible outputs from printf/echo patterns in the command, then checks whether outgoing edge conditions cover those outputs.

dippin coverage

$ dippin coverage pipeline.dip ═══ Coverage Analysis ═══════════════ ─── Edge Coverage ─────────────────── ✓ SetupWorkspace no_conditions ✗ ValidateBuild partial missing: validation-pass-go missing: validation-pass-swift

─── Reachability ──────────────────── ✓ 30/30 nodes reachable

─── Termination ───────────────────── ✓ all paths reach exit: true

When to use: After writing conditional routing to verify all tool outputs have matching edges. The missing entries tell you exactly which edges to add.

doctor

Health report card — a single grade (A-F) aggregating lint, coverage, and cost into one score.

dippin doctor

$ dippin doctor pipeline.dip ═══ Health Report Card ══════════════ Grade: A Score: 95/100

─── Lint ─────────────────────────── Errors: 0 Warnings: 1 Hints: 0

─── Coverage ─────────────────────── Reachable: 21/21 nodes ✓ All paths terminate

─── Cost ─────────────────────────── Expected: $2.10 (range: $1.50 - $8.40)

─── Suggestions ───────────────────── • [lint] review lint warnings - run dippin lint for details

Scoring Breakdown

Starts at 100 points, with deductions for issues:

Issue	Deduction
Each lint error	-20 points
Each lint warning	-5 points
Unreachable node	-15 per node
Non-terminating paths	-15
Uncovered tool outputs	-10 per tool

Grades

Grade	Score Range
A	90-100
B	80-89
C	70-79
D	60-69
F	<60

optimize

Suggest cheaper model substitutions without sacrificing quality. Rules include: simple prompts can use cheaper models, nodes in retry loops can use cheaper models for mechanical iterations, and bookkeeping tasks (summary, cleanup, commit) can use cheaper models.

dippin optimize

$ dippin optimize pipeline.dip
═══ Optimization Report ═════════════
─── Cost Summary ────────────────────
  Current:   $3.59 (expected)
  Optimized: $0.00 (expected)
  Savings:   $3.59 (expected)
─── Suggestions ─────────────────────
• [InterpretRequest] simple prompt does not need an expensive model
claude-opus-4-6 → claude-haiku-4-5  (saves ~$0.41)
• [CommitWork] bookkeeping task can use a cheaper model
gpt-5.2 → gpt-4o-mini  (saves ~$0.30)

When to use: After dippin cost shows high costs. Review each suggestion — some “simple” prompts may actually need a capable model.

simulate

Dry-run a workflow without real side effects, emitting JSONL events to stdout. It walks the IR graph from start to exit, logging each node visited — no LLM calls and no shell commands run. Use --scenario key=val to inject context and explore conditional branches, --all-paths to enumerate every reachable path, and --interactive to prompt at human nodes.

$ dippin simulate pipeline.dip --scenario outcome=fail

When to use: Verify routing logic before running the real pipeline — confirm a scenario reaches the expected node sequence, or audit every branch with --all-paths.

diff

Semantic comparison between two workflow versions. Unlike text-based diff, this compares graph structure: nodes added/removed, edges changed, field-level modifications, and cost impact.

dippin diff

$ dippin diff v1.dip v2.dip ═══ Semantic Diff ═══════════════════ ─── Nodes ────────────────────────── + FinalQualityGate

─── Edges ────────────────────────── + FinalQualityGate -> Exit [ctx.outcome = fail] + FinalQualityGate -> PersistSprint [ctx.outcome = success] - WriteFinalSprint -> PersistSprint

─── Cost Delta ────────────────────── Old: $5.35 (expected) New: $5.78 (expected) Delta: +$0.43 (expected)

When to use: Code review for workflow changes. See exactly what graph structure changed and how it affects cost, rather than parsing indentation diffs.

feedback

Compare predicted costs against actual execution telemetry to calibrate estimates. Takes the workflow file (for predicted costs) and a JSONL telemetry file — one JSON object per line with fields: event, node, model, provider, tokens_in, tokens_out, actual_cost, turns, timestamp.

$ dippin feedback pipeline.dip telemetry.jsonl

After running a pipeline in production, export telemetry and feed it back to see how accurate the cost predictions were. Outliers (>2x or <0.5x ratio) are flagged for investigation.