Agent simulation lets you run your agent against realistic, auto-generated test scenarios derived from a policy document. Synkro extracts rules, generates diverse scenarios, drives multi-turn conversations with a simulated user, and verifies every conversation against the policy.

How It Works

Policy Document
      |
      v
1. Extract Rules  ──>  Logic Map (DAG of rules)
      |
      v
2. Generate Scenarios  ──>  Positive, negative, edge cases
      |
      v
3. Simulate Conversations  ──>  Simulated user + your agent
      |
      v
4. Verify  ──>  Each conversation graded against rules
      |
      v
SimulationResults (pass/fail per scenario)
Stages 1, 2, and 4 reuse the same pipeline components that power synkro.generate() — the Logic Extractor, Golden Scenario Generator, and Trace Verifier.

Quick Start

import openai
import synkro

# Your agent — any callable that takes messages and returns a string
def my_agent(messages):
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
    return response.choices[0].message.content

# Run simulation
results = synkro.simulate(
    agent=my_agent,
    policy="All refunds require a receipt. Maximum refund is $500.",
    scenarios=10,
    turns=3,
)

print(f"Pass rate: {results.pass_rate:.0%}")
print(f"Passed: {results.passed}/{results.total}")

Agent Signature

Your agent is a plain callable. It receives OpenAI-format messages and returns a string:
# Sync agent
def my_agent(messages: list[dict]) -> str:
    # messages = [{"role": "user", "content": "..."}, ...]
    return "response text"

# Async agent (auto-detected)
async def my_async_agent(messages: list[dict]) -> str:
    return "response text"
Messages are standard OpenAI-format dicts — the same thing you already pass to any LLM SDK. No Synkro-specific types required.
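Because the agent is a plain callable, it can close over whatever configuration it needs. A minimal sketch of that pattern (`SYSTEM_PROMPT` and `call_llm` are illustrative stand-ins, not Synkro APIs):

```python
# SYSTEM_PROMPT and call_llm are illustrative stand-ins, not Synkro APIs.
SYSTEM_PROMPT = "You are a refunds assistant. Follow company policy strictly."

def call_llm(messages: list[dict]) -> str:
    # Replace with a real LLM call from your SDK of choice.
    return "response text"

def my_agent(messages: list[dict]) -> str:
    # Prepend a fixed system message so every simulated
    # conversation runs under the same instructions.
    return call_llm([{"role": "system", "content": SYSTEM_PROMPT}, *messages])
```

The simulator only ever sees the callable, so any prompt templates, retrieval steps, or clients you close over stay invisible to it.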

Inspecting Results

results = synkro.simulate(agent=my_agent, policy=policy, scenarios=10)

# High-level stats
print(results.pass_rate)   # 0.8
print(results.total)       # 10
print(results.passed)      # 8
print(results.failed)      # 2

# Iterate over individual results
for r in results:
    print(f"Scenario: {r.scenario.description}")
    print(f"Type: {r.scenario.scenario_type}")
    print(f"Passed: {r.passed}")

    if not r.passed:
        print(f"Issues: {r.issues}")

    # Full conversation transcript
    for msg in r.messages:
        print(f"  [{msg['role']}]: {msg['content']}")
    print("---")

Saving Results

# Save to JSON (includes summary, all transcripts, and logic map)
results.save("simulation_results.json")

# Convert to a Dataset for JSONL export or HuggingFace upload
dataset = results.dataset
dataset.save("simulation_traces.jsonl")

Controlling the Simulation

Number of Scenarios

The scenarios parameter controls how many test cases are auto-generated from the policy. Synkro produces a balanced mix of positive, negative, edge-case, and irrelevant scenarios.
results = synkro.simulate(agent=my_agent, policy=policy, scenarios=50)

Conversation Turns

The turns parameter sets the maximum number of user-agent exchanges per scenario. The simulated user may end the conversation early if it reaches a natural conclusion.
# Longer conversations for complex policies
results = synkro.simulate(agent=my_agent, policy=policy, turns=5)

Model Selection

By default, Synkro auto-detects available models. You can specify which model to use for the simulated user and scenario generation, and optionally a separate (stronger) model for verification.
results = synkro.simulate(
    agent=my_agent,
    policy=policy,
    model="gpt-4o-mini",           # simulated user + scenarios
    grading_model="gpt-4o",        # verification (stronger = better)
)

Using the Simulator Class

For more control, use the Simulator class directly:
from synkro import Simulator

sim = Simulator(
    model="gpt-4o-mini",
    grading_model="gpt-4o",
    concurrency=10,           # parallel scenario execution
)

results = sim.run(
    agent=my_agent,
    policy=policy,
    scenarios=20,
    turns=3,
)

Async Usage

from synkro import Simulator

sim = Simulator(model="gpt-4o-mini")

# In an async context
results = await sim.run_async(
    agent=my_async_agent,
    policy=policy,
    scenarios=20,
)

# Or use the convenience function
results = await synkro.simulate_async(
    agent=my_async_agent,
    policy=policy,
    scenarios=20,
)

Example: Testing a LangChain Agent

from langchain_openai import ChatOpenAI
from langchain_core.messages import AIMessage, HumanMessage
import synkro

llm = ChatOpenAI(model="gpt-4o")

def langchain_agent(messages):
    # Convert OpenAI-format dicts to LangChain messages, keeping both
    # roles so multi-turn context survives across simulated turns
    lc_messages = [
        AIMessage(content=m["content"]) if m["role"] == "assistant"
        else HumanMessage(content=m["content"])
        for m in messages
    ]
    response = llm.invoke(lc_messages)
    return response.content

results = synkro.simulate(
    agent=langchain_agent,
    policy=open("company_policy.txt").read(),
    scenarios=20,
    turns=3,
    model="gpt-4o-mini",
)

print(f"Pass rate: {results.pass_rate:.0%}")

# Save for analysis
results.save("langchain_agent_sim.json")

Example: Testing a RAG Pipeline

import openai
import synkro

def rag_agent(messages):
    last_user_msg = [m for m in messages if m["role"] == "user"][-1]["content"]

    # Your RAG retrieval
    docs = vector_store.similarity_search(last_user_msg, k=3)
    context = "\n".join(d.page_content for d in docs)

    # Generate response with context
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Use this context:\n{context}"},
            *messages,
        ],
    )
    return response.choices[0].message.content

results = synkro.simulate(
    agent=rag_agent,
    policy=policy_text,
    scenarios=30,
    turns=3,
)

# Check which scenarios failed
for r in results:
    if not r.passed:
        print(f"FAILED: {r.scenario.description}")
        print(f"  Issues: {r.issues}")

What Gets Verified

The verifier checks each conversation against the extracted policy rules:
Check               Description
Skipped rules       Rules that should have been applied but weren’t
Hallucinated rules  Rules cited that don’t exist or don’t apply
Contradictions      Logical inconsistencies in the agent’s responses
DAG compliance      Dependency order between rules was respected
Outcome alignment   Response matches the expected outcome for the scenario
Each SimulationResult includes the full VerificationResult with details:
for r in results:
    v = r.verification
    print(f"Passed: {v.passed}")
    print(f"Rules verified: {v.rules_verified}")
    print(f"Skipped rules: {v.skipped_rules}")
    print(f"Hallucinated: {v.hallucinated_rules}")
    print(f"Contradictions: {v.contradictions}")
