AI pipelines are non-deterministic -- LLM outputs vary between runs. But the structure of your pipeline is deterministic: given a particular outcome at each node, the same path should always be followed. Dippin's scenario testing lets you inject context values and assert on execution paths, giving you deterministic tests for non-deterministic systems.
A .test.json file sits next to your .dip file. It holds
an array of test scenarios, each declaring what context values to inject and what
execution behavior to expect (visited nodes, path ordering, status).
The simulator walks the workflow graph, uses your injected values to evaluate edge conditions, and reports which nodes were visited and in what order. Your test asserts the result matches expectations.
dippin test pipeline.dip automatically looks for pipeline.test.json
in the same directory. No configuration needed.
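For instance, a minimal test file might look like this (a sketch with placeholder node names -- `Review` and `Done` stand in for nodes in your own workflow):

```json
{
  "tests": [
    {
      "name": "happy path reaches Done",
      "scenario": {"outcome": "success"},
      "expect": {
        "status": "success",
        "visited": ["Review", "Done"]
      }
    }
  ]
}
```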
Take a real example from the Dippin repository:
code_quality_sweep.dip.
This workflow runs three LLM providers in parallel to analyze a codebase, synthesizes
findings, fans out into three work streams (fix bugs, write docs, write tests),
then finishes with a quality gate that can restart the whole process.
The key structural elements:
workflow CodeQualitySweep
  goal: "Analyze the dippin-lang codebase with three LLM providers in parallel..."
  start: ScanCodebase
  exit: Done

# Phase 1: Scan
agent ScanCodebase
  label: "Map the codebase"
  ...

# Phase 2: Three-provider parallel analysis
parallel AnalysisFan -> AnalyzeAnthropic, AnalyzeGemini, AnalyzeOpenAI
fan_in AnalysisJoin <- AnalyzeAnthropic, AnalyzeGemini, AnalyzeOpenAI

# Phase 3: Synthesize findings
agent Synthesize
  ...

# Phase 4: Three parallel work streams
parallel WorkFan -> FixBugs, WriteDocs, WriteTests
fan_in WorkJoin <- FixBugs, WriteDocs, WriteTests

# Phase 5: Quality gate with retry
agent QualityGate
  goal_gate: true
  ...

edges
  ...
  QualityGate -> Done when ctx.outcome = success
  QualityGate -> Synthesize when ctx.outcome = fail restart: true
  QualityGate -> Done
The quality gate has three outgoing edges: success goes to Done, failure restarts from Synthesize, and an unconditional fallback also goes to Done. Three distinct execution paths, three test scenarios.
Here's the
code_quality_sweep.test.json
from the repository:
{
"tests": [
{
"name": "quality gate passes -- all branches traversed",
"scenario": {"outcome": "success"},
"expect": {
"status": "success",
"visited": [
"ScanCodebase",
"AnalyzeAnthropic", "AnalyzeGemini", "AnalyzeOpenAI",
"Synthesize",
"FixBugs", "WriteDocs", "WriteTests",
"QualityGate", "Done"
],
"path_contains": ["ScanCodebase", "Synthesize", "QualityGate", "Done"]
}
},
{
"name": "quality gate fails -- restarts from Synthesize",
"scenario": {"outcome": "fail"},
"expect": {
"visited": ["QualityGate", "Synthesize"],
"path_contains": ["QualityGate", "Synthesize"]
}
},
{
"name": "all three analysis providers run",
"scenario": {"outcome": "success"},
"expect": {
"path_contains": [
"AnalyzeAnthropic", "AnalyzeGemini",
"AnalyzeOpenAI", "AnalysisJoin"
]
}
},
{
"name": "no outcome -- unconditional fallback to Done",
"scenario": {},
"expect": {
"status": "success",
"visited": ["QualityGate", "Done"]
}
},
{
"name": "branch filter -- only Gemini analysis",
"scenario": {"outcome": "success"},
"branch": ["AnalyzeGemini"],
"expect": {
"status": "success",
"visited": ["AnalyzeGemini"],
"not_visited": ["AnalyzeAnthropic", "AnalyzeOpenAI"]
}
}
]
}
Each test case has three core fields, plus an optional fourth:
| Field | Purpose |
|---|---|
| `name` | Human-readable description shown in test output |
| `scenario` | Context values to inject. `{"outcome": "success"}` sets `ctx.outcome` to `"success"` at every node. |
| `expect` | Assertions about the simulation result |
| `branch` | Optional. Filters parallel fan-out to only these branches. |
The expect object supports five assertion types:
| Field | What it checks |
|---|---|
| `status` | Overall simulation status: `"success"` or `"fail"` |
| `visited` | Node names that must appear in the execution path |
| `not_visited` | Node names that must NOT appear in the execution path |
| `path_contains` | Node names that must appear in order (not necessarily adjacent) |
| `immediately_after` | Object mapping node names: `{"A": "B"}` asserts B appears right after A |
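As an example of the last assertion type, a test could pin the direct hand-off from the quality gate to the exit node (a sketch, not one of the repository's test cases):

```json
{
  "name": "Done immediately follows QualityGate",
  "scenario": {"outcome": "success"},
  "expect": {
    "immediately_after": {"QualityGate": "Done"}
  }
}
```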
$ dippin test examples/code_quality_sweep.dip
PASS quality gate passes -- all branches traversed
PASS quality gate fails -- restarts from Synthesize
PASS all three analysis providers run
PASS all three work streams run
PASS no outcome -- unconditional fallback to Done
PASS branch filter -- only Gemini analysis
6/6 passed examples/code_quality_sweep.dip
Add --verbose to see the full execution path for each scenario.
Invaluable when debugging a failing test:
$ dippin test --verbose examples/code_quality_sweep.dip
PASS quality gate passes -- all branches traversed
  path: ScanCodebase -> AnalysisFan -> AnalyzeAnthropic -> AnalysisJoin -> AnalyzeGemini -> AnalysisJoin -> AnalyzeOpenAI -> AnalysisJoin -> Synthesize -> WorkFan -> FixBugs -> WorkJoin -> WriteDocs -> WorkJoin -> WriteTests -> WorkJoin -> QualityGate -> Done
...
Look at your edge conditions: each when clause creates a branch, and you need at least one test scenario per branch.
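For the quality gate's two when clauses, that means at least two scenarios, one per branch. A minimal sketch:

```json
{"name": "gate passes", "scenario": {"outcome": "success"}, "expect": {"visited": ["Done"]}},
{"name": "gate fails", "scenario": {"outcome": "fail"}, "expect": {"visited": ["Synthesize"]}}
```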
The scenario object sets context values the simulator uses when
evaluating edge conditions. Keys correspond to variable names in
when clauses, without the ctx. prefix:
// Edge condition in .dip file:
QualityGate -> Done when ctx.outcome = success

// Corresponding scenario injection in .test.json:
"scenario": {"outcome": "success"}
The key insight: you assert on which nodes were visited and in what order, never on LLM response content. The tests stay deterministic because you're testing the graph's routing logic, not the models' output.
Be careful with `not_visited`. If your workflow has retry loops with
`restart: true` edges, the simulator's loop-breaking may visit nodes
you don't expect. Prefer positive assertions (`visited`,
`path_contains`) when possible, and use `not_visited` sparingly.
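For example, rather than asserting that the retry path was not taken on success, assert the ordering you do expect (a sketch):

```json
{
  "name": "success goes straight from QualityGate to Done",
  "scenario": {"outcome": "success"},
  "expect": {
    "path_contains": ["QualityGate", "Done"]
  }
}
```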
For workflows with parallel fan-out, you can test individual branches
in isolation with the branch field:
{
"name": "branch filter -- only Gemini analysis",
"scenario": {"outcome": "success"},
"branch": ["AnalyzeGemini"],
"expect": {
"status": "success",
"visited": ["AnalyzeGemini"],
"not_visited": ["AnalyzeAnthropic", "AnalyzeOpenAI"]
}
}
The branch array lists the parallel targets to include. All other
fan-out branches get skipped, letting you test branch-specific behavior without
noise from other parallel paths.
Add --coverage to see which edges your test suite covers:
$ dippin test --coverage examples/code_quality_sweep.dip
6/6 passed examples/code_quality_sweep.dip
Edge coverage: 24/25 edges covered (96.0%)
Uncovered edges:
  QualityGate -> Synthesize when ctx.outcome = fail restart: true
The uncovered edge tells you exactly what test case to add. Write a scenario that triggers that condition to reach 100%.
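For this workflow, that scenario injects a failing outcome and asserts the restart path (a sketch; adjust the expectations to your own gate):

```json
{
  "name": "failing outcome exercises the restart edge",
  "scenario": {"outcome": "fail"},
  "expect": {
    "path_contains": ["QualityGate", "Synthesize"]
  }
}
```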
For CI integration, use --format json
to get machine-readable results:
$ dippin test --format json examples/code_quality_sweep.dip
{
  "file": "examples/code_quality_sweep.dip",
  "total": 6,
  "passed": 6,
  "failed": 0,
  "results": [
    {"name": "quality gate passes -- all branches traversed", "status": "pass"},
    {"name": "quality gate fails -- restarts from Synthesize", "status": "pass"},
    ...
  ]
}
You know how to write scenario tests that verify your pipeline's routing logic deterministically. Related topics:
- simulate/

The `not_visited` fragility was discovered during field testing with the Tracker team. Retry loops with `restart: true` create cycles, and the simulator breaks them after a bounded number of iterations -- but the nodes visited during those iterations can surprise you. See the testing reference for details on loop-breaking behavior.