Agent simulation lets you run your agent against realistic, auto-generated test scenarios derived from a policy document. Synkro extracts rules, generates diverse scenarios, drives multi-turn conversations with a simulated user, and verifies every conversation against the policy.

How It Works

Policy Document
      |
      v
1. Extract Rules  ──>  Logic Map (DAG of rules)
      |
      v
2. Generate Scenarios  ──>  Positive, negative, edge cases
      |
      v
3. Simulate Conversations  ──>  Simulated user + your agent
      |
      v
4. Verify  ──>  Each conversation graded against rules
      |
      v
SimulationResults (pass/fail per scenario)
Stages 1, 2, and 4 reuse the same pipeline components that power synkro.generate() — the Logic Extractor, Golden Scenario Generator, and Trace Verifier.

Quick Start

import openai
import synkro

# Your agent — any callable that takes messages and returns a string
def my_agent(messages):
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
    return response.choices[0].message.content

# Run simulation
results = synkro.simulate(
    agent=my_agent,
    policy="All refunds require a receipt. Maximum refund is $500.",
    scenarios=10,
    turns=3,
)

print(f"Pass rate: {results.pass_rate:.0%}")
print(f"Passed: {results.passed}/{results.total}")

Agent Signature

Your agent is a plain callable. It receives OpenAI-format messages and returns a string:
# Sync agent
def my_agent(messages: list[dict]) -> str:
    # messages = [{"role": "user", "content": "..."}, ...]
    return "response text"

# Async agent (auto-detected)
async def my_async_agent(messages: list[dict]) -> str:
    return "response text"
Messages are standard OpenAI-format dicts — the same thing you already pass to any LLM SDK. No Synkro-specific types required.
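Because the agent is a plain callable, it can close over whatever configuration it needs. A minimal sketch of that pattern (`SYSTEM_PROMPT` and `call_llm` are illustrative stand-ins, not Synkro APIs):

```python
# SYSTEM_PROMPT and call_llm are illustrative stand-ins, not Synkro APIs.
SYSTEM_PROMPT = "You are a refunds assistant. Follow company policy strictly."

def call_llm(messages: list[dict]) -> str:
    # Replace with a real LLM call from your SDK of choice.
    return "response text"

def my_agent(messages: list[dict]) -> str:
    # Prepend a fixed system message so every simulated
    # conversation runs under the same instructions.
    return call_llm([{"role": "system", "content": SYSTEM_PROMPT}, *messages])
```

The simulator only ever sees the callable, so any prompt templates, retrieval steps, or clients you close over stay invisible to it.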

Inspecting Results

results = synkro.simulate(agent=my_agent, policy=policy, scenarios=10)

# High-level stats
print(results.pass_rate)   # 0.8
print(results.total)       # 10
print(results.passed)      # 8
print(results.failed)      # 2

# Iterate over individual results
for r in results:
    print(f"Scenario: {r.scenario.description}")
    print(f"Type: {r.scenario.scenario_type}")
    print(f"Passed: {r.passed}")

    if not r.passed:
        print(f"Issues: {r.issues}")

    # Full conversation transcript
    for msg in r.messages:
        print(f"  [{msg['role']}]: {msg['content']}")
    print("---")

Saving Results

# Save to JSON (includes summary, all transcripts, and logic map)
results.save("simulation_results.json")

# Convert to a Dataset for JSONL export or HuggingFace upload
dataset = results.dataset
dataset.save("simulation_traces.jsonl")

Controlling the Simulation

Number of Scenarios

The scenarios parameter controls how many test cases are auto-generated from the policy. Synkro produces a balanced mix of positive, negative, edge-case, and irrelevant scenarios.
results = synkro.simulate(agent=my_agent, policy=policy, scenarios=50)

Conversation Turns

The turns parameter sets the maximum number of user-agent exchanges per scenario. The simulated user may end the conversation early if it reaches a natural conclusion.
# Longer conversations for complex policies
results = synkro.simulate(agent=my_agent, policy=policy, turns=5)

Model Selection

By default, Synkro auto-detects available models. You can specify which model to use for the simulated user and scenario generation, and optionally a separate (stronger) model for verification.
results = synkro.simulate(
    agent=my_agent,
    policy=policy,
    model="gpt-4o-mini",           # simulated user + scenarios
    grading_model="gpt-4o",        # verification (stronger = better)
)

Using the Simulator Class

For more control, use the Simulator class directly:
from synkro import Simulator

sim = Simulator(
    model="gpt-4o-mini",
    grading_model="gpt-4o",
    concurrency=10,           # parallel scenario execution
)

results = sim.run(
    agent=my_agent,
    policy=policy,
    scenarios=20,
    turns=3,
)

Async Usage

from synkro import Simulator

sim = Simulator(model="gpt-4o-mini")

# In an async context
results = await sim.run_async(
    agent=my_async_agent,
    policy=policy,
    scenarios=20,
)

# Or use the convenience function
results = await synkro.simulate_async(
    agent=my_async_agent,
    policy=policy,
    scenarios=20,
)

Example: Testing a LangChain Agent

from langchain_openai import ChatOpenAI
from langchain_core.messages import AIMessage, HumanMessage
import synkro

llm = ChatOpenAI(model="gpt-4o")

def langchain_agent(messages):
    # Convert OpenAI-format dicts to LangChain messages, keeping both
    # roles so multi-turn context survives across simulated turns
    lc_messages = [
        AIMessage(content=m["content"]) if m["role"] == "assistant"
        else HumanMessage(content=m["content"])
        for m in messages
    ]
    response = llm.invoke(lc_messages)
    return response.content

results = synkro.simulate(
    agent=langchain_agent,
    policy=open("company_policy.txt").read(),
    scenarios=20,
    turns=3,
    model="gpt-4o-mini",
)

print(f"Pass rate: {results.pass_rate:.0%}")

# Save for analysis
results.save("langchain_agent_sim.json")

Example: Testing a RAG Pipeline

import openai
import synkro

def rag_agent(messages):
    last_user_msg = [m for m in messages if m["role"] == "user"][-1]["content"]

    # Your RAG retrieval
    docs = vector_store.similarity_search(last_user_msg, k=3)
    context = "\n".join(d.page_content for d in docs)

    # Generate response with context
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Use this context:\n{context}"},
            *messages,
        ],
    )
    return response.choices[0].message.content

results = synkro.simulate(
    agent=rag_agent,
    policy=policy_text,
    scenarios=30,
    turns=3,
)

# Check which scenarios failed
for r in results:
    if not r.passed:
        print(f"FAILED: {r.scenario.description}")
        print(f"  Issues: {r.issues}")

What Gets Verified

The verifier checks each conversation against the extracted policy rules:
Check               Description
Skipped rules       Rules that should have been applied but weren’t
Hallucinated rules  Rules cited that don’t exist or don’t apply
Contradictions      Logical inconsistencies in the agent’s responses
DAG compliance      Dependency order between rules was respected
Outcome alignment   Response matches the expected outcome for the scenario
Each SimulationResult includes the full VerificationResult with details:
for r in results:
    v = r.verification
    print(f"Passed: {v.passed}")
    print(f"Rules verified: {v.rules_verified}")
    print(f"Skipped rules: {v.skipped_rules}")
    print(f"Hallucinated: {v.hallucinated_rules}")
    print(f"Contradictions: {v.contradictions}")
