Analysis Tools

Cost, coverage, health, optimization, and change tracking.

Overview

Dippin includes six analysis commands that inspect workflows for cost, coverage, health, optimization opportunities, and change impact. doctor aggregates cost + coverage + lint into a single grade. Run it first for an overview, then drill into specific commands for details.

dippin doctor
  = dippin lint
  + dippin coverage
  + dippin cost

A typical workflow: run doctor first, then drill into lint, coverage, or cost if the grade is below B. Run optimize when cost reveals expensive nodes, diff to review workflow changes, and feedback to calibrate estimates after production runs.

cost

Estimate execution cost from model pricing tables. Input tokens are estimated from prompt length, output tokens are estimated heuristically per turn, and max_turns determines the turn range. Tool and human nodes cost $0. Unknown models are costed at $0 with an assumption note.

dippin cost
$ dippin cost pipeline.dip
═══ Cost Estimate ═══════════════════
                             Min  Expected       Max
  ───────────────────  ───────  ────────  ────────
  TOTAL                  $3.21     $3.59    $14.10

─── By Provider ─────────────────────
  openai                 $0.38     $0.57     $2.96
  anthropic              $2.83     $3.02    $11.13

─── Top Cost Drivers ────────────────
  CommitWork                  $2.12 (max)  openai/gpt-5.2
  ImplementClaude             $2.12 (max)  anthropic/claude-sonnet-4-6
  InterpretRequest            $1.44 (max)  anthropic/claude-opus-4-6

─── Assumptions ─────────────────────
   unknown model "gemini-3-flash" (provider "gemini"): cost set to $0

When to use: Before deploying a pipeline with expensive models. Compare providers. Identify cost drivers to optimize.
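The estimation described above can be sketched roughly as follows; the pricing table, the chars-per-token ratio, and the turn-range averaging are illustrative assumptions, not dippin's actual internals:

```python
# Sketch of per-node cost estimation. PRICING values and the chars->tokens
# ratio are illustrative assumptions, not dippin's real tables.
PRICING = {  # USD per 1M tokens: (input, output)
    "anthropic/claude-opus-4-6": (15.0, 75.0),
    "anthropic/claude-haiku-4-5": (1.0, 5.0),
}

def estimate_node_cost(model, prompt_chars, output_tokens_per_turn, max_turns):
    """Return (min, expected, max) USD cost for one agent node."""
    if model not in PRICING:
        return (0.0, 0.0, 0.0)  # unknown model: $0 plus an assumption note
    in_price, out_price = PRICING[model]
    input_tokens = prompt_chars / 4  # crude chars-to-tokens heuristic
    per_turn = (input_tokens * in_price + output_tokens_per_turn * out_price) / 1e6
    # min = one turn, max = max_turns, expected = midpoint of the turn range
    return (per_turn, per_turn * (1 + max_turns) / 2, per_turn * max_turns)
```

Tool and human nodes simply never reach this function, which is why they contribute $0.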

coverage

Analyze edge coverage and reachability. For tool nodes, it extracts possible outputs from printf/echo patterns in the command, then checks whether the outgoing edge conditions cover those outputs.

dippin coverage
$ dippin coverage pipeline.dip
═══ Coverage Analysis ═══════════════
─── Edge Coverage ───────────────────
   SetupWorkspace               no_conditions
   ValidateBuild                partial
      missing: validation-pass-go
      missing: validation-pass-swift

─── Reachability ────────────────────
   30/30 nodes reachable

─── Termination ─────────────────────
   all paths reach exit: true

When to use: After writing conditional routing to verify all tool outputs have matching edges. The missing entries tell you exactly which edges to add.
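A rough sketch of the output-extraction idea, assuming a simple echo/printf regex and `ctx.output = value` edge conditions (both are illustrative, not dippin's exact parsing):

```python
import re

# Sketch of tool-output coverage checking. The echo/printf regex and the
# "ctx.output = value" condition shape are illustrative assumptions.
def tool_outputs(command):
    """Extract candidate outputs from echo/printf calls in a tool command."""
    return set(re.findall(r'(?:echo|printf)\s+"([\w-]+)"', command))

def uncovered(command, edge_conditions):
    """Return tool outputs that no outgoing edge condition matches."""
    covered = {cond.split("=")[-1].strip() for cond in edge_conditions}
    return sorted(tool_outputs(command) - covered)
```

Each entry `uncovered` returns corresponds to a `missing:` line in the report above.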

doctor

Health report card: a single grade (A-F) aggregating lint, coverage, and cost into one score.

dippin doctor
$ dippin doctor pipeline.dip
═══ Health Report Card ══════════════
  Grade: A  Score: 95/100

─── Lint ───────────────────────────
  Errors: 0  Warnings: 1  Hints: 0

─── Coverage ───────────────────────
  Reachable: 21/21 nodes
   All paths terminate
   All tool outputs covered

─── Cost ───────────────────────────
  Expected: $2.10  (range: $1.50 - $8.40)

─── Suggestions ─────────────────────
   [lint] review lint warnings - run `dippin lint` for details

Scoring Breakdown

Starts at 100 points, with deductions for issues:

  Issue                    Deduction
  ───────────────────────  ────────────
  Each lint error          -15 points
  Each lint warning        -5 points
  Unreachable node         -10 per node
  Non-terminating paths    -20
  Uncovered tool outputs   -5 per tool

Grades

  Grade  Score Range
  ─────  ───────────
  A      90-100
  B      80-89
  C      70-79
  D      60-69
  F      <60
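The deduction and grade tables map directly to code; this sketch mirrors the documented rules:

```python
# Sketch of the health scoring rules from the tables above.
def health_score(errors, warnings, unreachable_nodes, nonterminating, uncovered_tools):
    score = 100
    score -= 15 * errors             # each lint error
    score -= 5 * warnings            # each lint warning
    score -= 10 * unreachable_nodes  # per unreachable node
    score -= 20 if nonterminating else 0
    score -= 5 * uncovered_tools     # per tool with uncovered outputs
    return max(score, 0)

def grade(score):
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"
```

For example, the report card above (one lint warning, everything else clean) scores 100 - 5 = 95, grade A.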

optimize

Suggest cheaper model substitutions without sacrificing quality. Rules include: simple prompts can use cheaper models, nodes in retry loops can use cheaper models for mechanical iterations, and bookkeeping tasks (summary, cleanup, commit) can use cheaper models.

dippin optimize
$ dippin optimize pipeline.dip
═══ Optimization Report ═════════════
─── Cost Summary ────────────────────
  Current:   $3.59 (expected)
  Optimized: $2.88 (expected)
  Savings:   $0.71 (expected)

─── Suggestions ─────────────────────
   [InterpretRequest] simple prompt does not need an expensive model
    claude-opus-4-6 → claude-haiku-4-5  (saves ~$0.41)
   [CommitWork] bookkeeping task can use a cheaper model
    gpt-5.2 → gpt-4o-mini  (saves ~$0.30)

When to use: After dippin cost shows high costs. Review each suggestion — some "simple" prompts may actually need a capable model.
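The three rules can be sketched as a substitution map plus checks; the model map, the 200-character "simple prompt" threshold, and the node fields are hypothetical, not dippin's actual rule set:

```python
# Sketch of the optimization rules. CHEAPER, the 200-char threshold, and the
# node dict shape are hypothetical assumptions.
CHEAPER = {"claude-opus-4-6": "claude-haiku-4-5", "gpt-5.2": "gpt-4o-mini"}
BOOKKEEPING_WORDS = ("summary", "cleanup", "commit")

def suggest(node):
    """Return (cheaper_model, reason) for one agent node, or None."""
    model = node["model"]
    if model not in CHEAPER:
        return None
    if len(node["prompt"]) < 200:
        return (CHEAPER[model], "simple prompt does not need an expensive model")
    if node.get("in_retry_loop"):
        return (CHEAPER[model], "retry-loop iterations can use a cheaper model")
    if any(w in node["id"].lower() for w in BOOKKEEPING_WORDS):
        return (CHEAPER[model], "bookkeeping task can use a cheaper model")
    return None
```

Note that the rules are heuristic; as the caution above says, a human should review each suggestion before applying it.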

diff

Semantic comparison between two workflow versions. Unlike text-based diff, this compares graph structure: nodes added/removed, edges changed, field-level modifications, and cost impact.

dippin diff
$ dippin diff v1.dip v2.dip
═══ Semantic Diff ═══════════════════
─── Nodes ──────────────────────────
  + FinalQualityGate

─── Edges ──────────────────────────
  + FinalQualityGate -> Exit [ctx.outcome = fail]
  + FinalQualityGate -> PersistSprint [ctx.outcome = success]
  - WriteFinalSprint -> PersistSprint

─── Cost Delta ──────────────────────
  Old: $5.35 (expected)  New: $5.78 (expected)
  Delta: +$0.43 (expected)

When to use: Code review for workflow changes. See exactly what graph structure changed and how it affects cost, rather than parsing indentation diffs.
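A minimal sketch of the graph-set comparison, assuming a workflow reduces to a set of node IDs and a set of `(from, to)` edge tuples (an illustrative simplification; the real diff also tracks conditions and field-level changes):

```python
# Sketch of a semantic diff over graph structure rather than text.
# Assumes workflows reduce to node-ID sets and (from, to) edge tuples.
def semantic_diff(old, new):
    return {
        "nodes_added": sorted(new["nodes"] - old["nodes"]),
        "nodes_removed": sorted(old["nodes"] - new["nodes"]),
        "edges_added": sorted(new["edges"] - old["edges"]),
        "edges_removed": sorted(old["edges"] - new["edges"]),
    }
```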

feedback

Compare predicted costs against actual execution telemetry to calibrate estimates. Takes the workflow file (for predicted costs) and a CSV telemetry file with columns: node_id, input_tokens, output_tokens, cost_usd.

$ dippin feedback pipeline.dip telemetry.csv

After running a pipeline in production, export telemetry and feed it back to see how accurate the cost predictions were. Outliers (>2x or <0.5x ratio) are flagged for investigation.
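A sketch of the calibration check, using the documented CSV columns and outlier thresholds; the `predicted` mapping (node_id to predicted USD) is an assumed input shape:

```python
import csv
import io

# Sketch of prediction calibration: compare actual per-node cost from the
# telemetry CSV against predictions, flagging >2x / <0.5x outliers.
# The `predicted` mapping is an assumed input, not dippin's internal form.
def flag_outliers(telemetry_csv, predicted):
    flagged = []
    for row in csv.DictReader(io.StringIO(telemetry_csv)):
        pred = predicted.get(row["node_id"])
        if not pred:
            continue
        ratio = float(row["cost_usd"]) / pred
        if ratio > 2.0 or ratio < 0.5:
            flagged.append((row["node_id"], round(ratio, 2)))
    return flagged
```

Nodes far outside the band usually mean the prompt-length or per-turn output heuristics were off for that node.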