## Signature

```python
synkro.grade(
    response: str,
    scenario: EvalScenario,
    policy: str | Policy,
    model: str = "gpt-4o",
    base_url: str | None = None,
) -> GradeResult
```
## Parameters

| Parameter | Description |
|---|---|
| `response` | The response from the model being evaluated |
| `scenario` | The eval scenario with `expected_outcome` and `target_rules` |
| `policy` | The policy document for grading context |
| `model` | LLM to use for grading; a stronger model grades more reliably |
| `base_url` | Optional custom base URL for the grading LLM's API |
## Returns

`GradeResult` containing:

| Field | Type | Description |
|---|---|---|
| `passed` | `bool` | Whether the response passed |
| `feedback` | `str` | Explanation of the grade |
| `issues` | `list[str]` | Specific issues found |
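To make the result's shape concrete, here is a minimal sketch that mimics `GradeResult` with a stand-in dataclass. The class definition and the field values below are illustrative only; the real object is constructed and returned by `synkro.grade`:

```python
from dataclasses import dataclass, field

# Stand-in mirroring the documented GradeResult fields (illustration only;
# in real use the object comes back from synkro.grade).
@dataclass
class GradeResult:
    passed: bool
    feedback: str
    issues: list[str] = field(default_factory=list)

grade = GradeResult(
    passed=False,
    feedback="Response quotes an outdated refund window.",
    issues=["Contradicts the refund rule", "Missing escalation step"],
)

# A one-line summary built from the documented fields.
summary = "PASS" if grade.passed else f"FAIL ({len(grade.issues)} issues)"
print(summary)  # FAIL (2 issues)
```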
## Examples

### Basic Grading

```python
import synkro

# Generate scenarios
result = synkro.generate_scenarios(policy, count=100)

# Grade each response
passed = 0
failed = 0
for scenario in result.scenarios:
    response = my_model(scenario.user_message)
    grade = synkro.grade(response, scenario, policy)
    if grade.passed:
        passed += 1
    else:
        failed += 1
        print(f"Failed: {scenario.user_message[:50]}...")
        print(f"Issues: {grade.issues}")

print(f"Pass rate: {passed}/{passed + failed}")
```
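Beyond the overall pass rate, it can help to see which rules the failures cluster around. A sketch, assuming you collect each failing scenario's `target_rules` (the `EvalScenario` attribute noted under Parameters); the rule names here are made up for illustration:

```python
from collections import Counter

# Hypothetical target_rules lists gathered from failing scenarios
# (in the grading loop, append scenario.target_rules on each failure).
failed_rules = [
    ["refund-policy", "escalation"],
    ["refund-policy"],
    ["tone"],
]

# Tally how often each rule appears among the failures.
rule_counts = Counter(rule for rules in failed_rules for rule in rules)
for rule, count in rule_counts.most_common():
    print(f"{rule}: {count} failure(s)")
```

Sorting by frequency surfaces the rules your model trips on most, which is usually where policy rewording or fine-tuning pays off first.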
### Detailed Analysis

```python
grade = synkro.grade(response, scenario, policy)

print(f"Passed: {grade.passed}")
print(f"Feedback: {grade.feedback}")

if not grade.passed:
    print("Issues:")
    for issue in grade.issues:
        print(f"  - {issue}")
```
### Using a Stronger Grading Model

```python
from synkro.models import Anthropic

grade = synkro.grade(
    response,
    scenario,
    policy,
    model=Anthropic.CLAUDE_OPUS,  # more thorough grading
)
```
## Grading Criteria

The grader evaluates:

- **Policy Compliance** - Does the response follow the rules?
- **Accuracy** - Is the information correct?
- **Completeness** - Are all relevant rules addressed?
- **Expected Outcome** - Does it match the scenario's expected behavior?