Dataset class is a collection of generated training traces. It provides methods for filtering, saving, and exporting traces in various formats.
Import
Properties
| Property | Type | Description |
|---|---|---|
traces | list[Trace] | The list of generated traces |
passing_rate | float | Percentage of traces that passed grading (0.0-1.0) |
categories | list[str] | Unique categories in the dataset |
Methods
save()
Save dataset to a JSONL file.Output file path. If None, auto-generates a timestamped filename.
Output format:
"messages", "qa", "langsmith", "langfuse", "tool_call", "chatml", or "bert" / "bert:<task>"If True, format JSON with indentation (multi-line, human-readable)
to_jsonl()
Convert dataset to JSONL string.Output format (same options as
save())Format with indentation
filter()
Filter traces by criteria. Returns a new Dataset with filtered traces.Filter by grade pass/fail status
Filter by scenario category
Minimum response length in characters
dedupe()
Remove duplicate or near-duplicate traces.Similarity threshold (0-1). Higher = stricter dedup. Only used for semantic method.
Deduplication method:
"exact": Remove exact text duplicates (fast)"semantic": Remove semantically similar traces (requires sentence-transformers)
Which field to dedupe on:
"user", "assistant", or "both"Semantic deduplication requires the
sentence-transformers package:to_hf_dataset()
Convert to HuggingFace Dataset.Output format (same options as
save())datasets.Dataset object
Requires the
datasets package:push_to_hub()
Push dataset directly to HuggingFace Hub.HuggingFace repo ID (e.g.,
"my-org/policy-data")Output format
Whether the repo should be private
Dataset split name
HuggingFace token (uses cached token if not provided)
to_dict()
Convert dataset to a dictionary.summary()
Get a human-readable summary of the dataset.Container Protocol
Dataset supports standard Python container operations:Export Format Reference
| Format | Description | Use Case |
|---|---|---|
messages | OpenAI messages format | Fine-tuning GPT models |
chatml | ChatML format | Alternative chat format |
qa | Q&A with ground truth | Evaluation datasets |
langsmith | LangSmith format | LangSmith integration |
langfuse | Langfuse format | Langfuse integration |
tool_call | Tool calling format | Function calling datasets |
bert | BERT classification | Encoder models |
bert:qa | BERT extractive QA | Question answering |