Import
DatasetType
Enum for dataset generation types.

Message
A single message in a conversation.

| Field | Type | Description |
|---|---|---|
| `role` | `str` | Message role: "system", "user", "assistant", or "tool" |
| `content` | `str \| None` | Message content text |
| `tool_calls` | `list[ToolCall] \| None` | Tool calls made by the assistant |
| `tool_call_id` | `str \| None` | ID of the tool call this message responds to (for the tool role) |
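
The field table above can be illustrated with a small tool-call round trip. This is a minimal sketch, not the library's own code: `Message` here is a stand-in dataclass that mirrors the documented fields, the `ToolCall` shape (`id`, `name`, `arguments`) is hypothetical because this page does not document it, and the validation rules are assumptions drawn from the field descriptions.

```python
from dataclasses import dataclass
from typing import List, Optional

VALID_ROLES = {"system", "user", "assistant", "tool"}


@dataclass
class ToolCall:
    # Hypothetical shape: ToolCall's fields are not documented on this page.
    id: str
    name: str
    arguments: str


@dataclass
class Message:
    # Stand-in mirroring the documented fields; not the library's class.
    role: str
    content: Optional[str] = None
    tool_calls: Optional[List[ToolCall]] = None
    tool_call_id: Optional[str] = None

    def __post_init__(self):
        # Assumed checks, based only on the field descriptions.
        if self.role not in VALID_ROLES:
            raise ValueError(f"unknown role: {self.role!r}")
        if self.tool_call_id is not None and self.role != "tool":
            raise ValueError("tool_call_id only applies to the tool role")


# The assistant requests a tool, the tool message answers by ID,
# and the assistant then replies with plain content.
conversation = [
    Message(role="user", content="What is the refund window?"),
    Message(
        role="assistant",
        tool_calls=[ToolCall(id="call_1", name="lookup_policy",
                             arguments='{"topic": "refunds"}')],
    ),
    Message(role="tool", content="30 days", tool_call_id="call_1"),
    Message(role="assistant", content="Refunds are accepted within 30 days."),
]
```

The tool message's `tool_call_id` matches the `id` of the call it answers, which is how the two messages are linked in this sketch.
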
Trace
A complete training trace with messages and metadata.

| Field | Type | Description |
|---|---|---|
| `messages` | `list[Message]` | The conversation messages |
| `scenario` | `Scenario` | The scenario this trace was generated from |
| `grade` | `GradeResult \| None` | Grading result, if the trace was graded |
| `reasoning_chain` | `list \| None` | Chain-of-thought reasoning steps |
| `rules_applied` | `list[str] \| None` | Rule IDs that were applied |
| `rules_excluded` | `list[str] \| None` | Rule IDs that were excluded |

Scenario
A test scenario for trace generation.

| Field | Type | Description |
|---|---|---|
| `description` | `str` | The scenario description |
| `context` | `str` | Additional context and background |
| `category` | `str \| None` | Category this scenario belongs to |
| `scenario_type` | `str \| None` | Type: "positive", "negative", "edge_case", "irrelevant" |
| `target_rule_ids` | `list[str] \| None` | Rule IDs this scenario tests |
| `expected_outcome` | `str \| None` | Expected behavior based on rules |

EvalScenario
A scenario for evaluation with ground truth labels. Used by `generate_scenarios()` for eval dataset generation.

| Field | Type | Description |
|---|---|---|
| `user_message` | `str` | The user’s request (test input) |
| `expected_outcome` | `str` | Expected behavior based on policy rules |
| `target_rule_ids` | `list[str]` | Rule IDs this scenario tests |
| `scenario_type` | `str` | Type: "positive", "negative", "edge_case", "irrelevant" |
| `category` | `str` | Policy category this scenario belongs to |
| `context` | `str` | Additional context for the scenario |
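
Because every eval scenario carries a `scenario_type`, one common check is whether a generated set covers all four types. The sketch below assumes a stand-in `EvalScenario` dataclass that mirrors the documented fields; the helper `type_distribution`, the rule IDs, and the sample data are hypothetical and only serve the illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

SCENARIO_TYPES = ("positive", "negative", "edge_case", "irrelevant")


@dataclass
class EvalScenario:
    # Stand-in mirroring the documented fields; not the library's class.
    user_message: str
    expected_outcome: str
    target_rule_ids: List[str]
    scenario_type: str
    category: str
    context: str


def type_distribution(scenarios):
    """Count scenarios per type, reporting zero for types that are absent."""
    counts = Counter(s.scenario_type for s in scenarios)
    return {t: counts.get(t, 0) for t in SCENARIO_TYPES}


# Illustrative data only; the rule IDs and policy are made up.
scenarios = [
    EvalScenario("Can I expense a $40 team lunch?", "Approve without review",
                 ["R001"], "positive", "expenses", "Limit is $50"),
    EvalScenario("Can I expense a $900 dinner?", "Require manager approval",
                 ["R001", "R002"], "negative", "expenses", "Limit is $50"),
    EvalScenario("What is the weather today?", "Answer without citing policy",
                 [], "irrelevant", "general", ""),
]

distribution = type_distribution(scenarios)
```

A zero count for a type (here `edge_case`) flags a gap worth filling before using the set for evaluation.
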
GradeResult
Result of grading a trace.

| Field | Type | Description |
|---|---|---|
| `passed` | `bool` | Whether the trace passes quality checks |
| `issues` | `list[str]` | List of issues found |
| `feedback` | `str` | Summary feedback for improvement |
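
The three fields above are enough to summarize a batch of graded traces. The following is a rough sketch under stated assumptions: `GradeResult` is a stand-in dataclass with the documented fields, and `summarize` is a hypothetical helper, not a function the library is known to provide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradeResult:
    # Stand-in mirroring the documented fields; not the library's class.
    passed: bool
    issues: List[str]
    feedback: str


def summarize(results):
    """Return the pass rate and the issues collected from failing results."""
    total = len(results)
    passed = sum(1 for r in results if r.passed)
    issues = [issue for r in results if not r.passed for issue in r.issues]
    return {"pass_rate": passed / total if total else 0.0, "issues": issues}


# Illustrative data only.
results = [
    GradeResult(passed=True, issues=[], feedback="Response follows the rules."),
    GradeResult(passed=False, issues=["Missed the approval threshold"],
                feedback="Cite the rule that sets the threshold."),
]

summary = summarize(results)
```
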
GenerationResult
Result of the generation pipeline when `return_logic_map=True`.
ScenariosResult
Result of scenario-only generation for eval datasets.

CoverageReport
Coverage metrics for a generation run.

ToolDefinition
Definition of a tool that an agent can use.

| Field | Type | Description |
|---|---|---|
| `name` | `str` | Name of the tool |
| `description` | `str` | What the tool does |
| `parameters` | `dict` | JSON Schema for parameters |
| `examples` | `list[dict]` | Example tool calls for few-shot learning |
| `mock_responses` | `list[str]` | Example responses for simulation |
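
Since `parameters` holds a JSON Schema, a definition can be sanity-checked against its own examples. This is a minimal sketch: `ToolDefinition` is a stand-in dataclass with the documented fields, `missing_required` is a hypothetical helper, and treating each entry in `examples` as a plain argument dict is an assumption, since this page does not specify that shape.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolDefinition:
    # Stand-in mirroring the documented fields; not the library's class.
    name: str
    description: str
    parameters: dict
    examples: List[dict]
    mock_responses: List[str]


def missing_required(tool, arguments):
    """List parameters the schema marks as required but the call omits."""
    required = tool.parameters.get("required", [])
    return [name for name in required if name not in arguments]


# Illustrative tool; the name and schema are made up.
search_tool = ToolDefinition(
    name="search_orders",
    description="Look up orders by customer email.",
    parameters={
        "type": "object",
        "properties": {
            "email": {"type": "string"},
            "limit": {"type": "integer"},
        },
        "required": ["email"],
    },
    # Assumption: each example is a dict of call arguments.
    examples=[{"email": "a@example.com", "limit": 5}],
    mock_responses=['{"orders": []}'],
)

gaps = missing_required(search_tool, {"limit": 3})
```

This only checks the `required` keyword; full validation of types and nested objects would need a JSON Schema validator.
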