Prerequisites
Quick Push
Push Options
Train/Test Split
Multiple Formats
Upload different formats for different use cases:Custom Processing
Load from Hub
Load synkro-generated datasets for fine-tuning:Dataset Card
Synkro datasets work well with HuggingFace Dataset Cards:Large Datasets
For large datasets, use streaming upload:Versioning
HuggingFace Hub handles versioning automatically:Best Practices
- Use descriptive repo names:
customer-service-policy-sft>dataset1 - Include metadata: Add dataset cards with generation config
- Version major changes: Tag releases for production datasets
- Use private repos: For proprietary policies, use
private=True - Split appropriately: 90/10 train/test is common for SFT
- Document the policy: Include policy text in dataset card