Overview

The INSTRUCTION dataset type generates single-turn question-answer pairs: one user message followed by one assistant response. It is best suited to instruction-following models, RAG systems, and other cases where multi-turn context is not needed.

Usage

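from synkro import create_pipeline, DatasetType

# Build a pipeline that emits single-turn instruction data
pipeline = create_pipeline(dataset_type=DatasetType.INSTRUCTION)

# `policy` is assumed to be defined beforehand (the policy to generate from)
dataset = pipeline.generate(policy, traces=100)
dataset.save("training.jsonl")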

Output Format

{"messages": [
  {"role": "user", "content": "What's the approval process for a $350 expense?"},
  {"role": "assistant", "content": "For a $350 expense, you'll need manager approval since it exceeds the $50 threshold. Here's the process:\n\n1. Submit your expense report in the system\n2. Attach the original receipt\n3. Your manager will receive a notification\n4. Approval typically takes 1-2 business days\n\nOnce approved, reimbursement is processed within 14 business days."}
]}
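
A quick sanity check on saved records can be sketched with the standard library alone. This assumes each line of the JSONL file is one record shaped like the example above; the helper names `is_single_turn` and `count_valid` are hypothetical and not part of synkro.

```python
import json


def is_single_turn(record: dict) -> bool:
    """True if the record holds exactly one user turn, then one assistant turn."""
    messages = record.get("messages", [])
    if len(messages) != 2:
        return False
    roles = [m.get("role") for m in messages]
    # Both turns must carry non-empty string content.
    has_text = all(
        isinstance(m.get("content"), str) and m["content"].strip() != ""
        for m in messages
    )
    return roles == ["user", "assistant"] and has_text


def count_valid(jsonl_text: str) -> tuple[int, int]:
    """Return (valid, total) over the non-blank lines of a JSONL string."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    valid = sum(is_single_turn(r) for r in records)
    return valid, len(records)


# Two illustrative records: one well-formed pair, one missing its answer.
sample = "\n".join([
    json.dumps({"messages": [
        {"role": "user", "content": "What's the approval process for a $350 expense?"},
        {"role": "assistant", "content": "You'll need manager approval."},
    ]}),
    json.dumps({"messages": [{"role": "user", "content": "Unanswered question"}]}),
])
print(count_valid(sample))  # (1, 2)
```

To check a real file, read its contents into a string first and pass that to `count_valid`.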

When to Use

Use Instruction

  • Standalone Q&A
  • RAG applications
  • Quick lookups
  • FAQ-style training

Use Conversation

  • Follow-up questions needed
  • Context-dependent answers
  • Complex multi-step processes
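
The guidance above amounts to a simple rule: if any of the Conversation criteria applies, prefer Conversation; otherwise Instruction is enough. A minimal sketch of that rule follows. The function `choose_dataset_type` and its string return values are hypothetical illustrations, not synkro identifiers.

```python
def choose_dataset_type(
    needs_follow_ups: bool = False,
    context_dependent: bool = False,
    multi_step: bool = False,
) -> str:
    """Pick a dataset type label from the three Conversation criteria."""
    # Any one multi-turn requirement is enough to rule out single-turn data.
    if needs_follow_ups or context_dependent or multi_step:
        return "conversation"
    return "instruction"


print(choose_dataset_type())                       # instruction
print(choose_dataset_type(needs_follow_ups=True))  # conversation
```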

Export Formats

# Standard messages format
dataset.save("data.jsonl", format="messages")

# ChatML format
dataset.save("data.jsonl", format="chatml")
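
For orientation, ChatML conventionally wraps each message in `<|im_start|>` and `<|im_end|>` markers, with the role on the first line. The sketch below renders a messages list in that general layout. Whether synkro's `format="chatml"` output matches it exactly is an assumption, and `to_chatml` is a hypothetical helper, not a synkro function.

```python
def to_chatml(messages: list[dict]) -> str:
    """Render role/content messages as ChatML-style text."""
    # Each turn: opening marker plus role, a newline, the content, closing marker.
    turns = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    return "\n".join(turns)


pair = [
    {"role": "user", "content": "Do I need a receipt?"},
    {"role": "assistant", "content": "Yes, attach the original receipt."},
]
print(to_chatml(pair))
```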