- Python: Requires langsmith>=0.2.0
- TypeScript: Support for multiple scores is available in langsmith@0.1.32 and higher

[
    # 'key' is the metric name
    # 'score' is the value of a numerical metric
    {"key": string, "score": number},
    # 'value' is the value of a categorical metric
    {"key": string, "value": string},
    ...  # You may log as many as you wish
]
{results: [{ key: string, score: number }, ...]};
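To make the two entry shapes concrete, here is a minimal Python sketch of an evaluator that returns one numerical metric (`score`) and one categorical metric (`value`) in the same list. The function name `length_and_verbosity`, the `"answer"` key, and the thresholds are hypothetical and chosen only for illustration; they are not part of the LangSmith API.

```python
def length_and_verbosity(outputs: dict, reference_outputs: dict) -> list[dict]:
    # Assumption: both dicts carry their text under an "answer" key.
    answer = outputs.get("answer", "")
    reference = reference_outputs.get("answer", "")

    # Numerical metric: answer length relative to the reference, capped at 1.0.
    length_ratio = min(len(answer) / len(reference), 1.0) if reference else 0.0

    # Categorical metric: an illustrative label based on an arbitrary threshold.
    verbosity = "concise" if len(answer) <= 2 * len(reference) else "verbose"

    return [
        {"key": "length_ratio", "score": length_ratio},
        {"key": "verbosity", "value": verbosity},
    ]
```

Each dictionary in the returned list is logged as its own metric, so the two entries can use different value types.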
def multiple_scores(outputs: dict, reference_outputs: dict) -> list[dict]:
    # Replace with real evaluation logic.
    precision = 0.8
    recall = 0.9
    f1 = 0.85
    return [
        {"key": "precision", "score": precision},
        {"key": "recall", "score": recall},
        {"key": "f1", "score": f1},
    ]
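The placeholder values above could be replaced with computed ones. The following is one possible sketch, assuming that both `outputs` and `reference_outputs` hold a list of labels under a hypothetical `"labels"` key; the function name `multiple_scores_from_labels` and that schema are illustrative assumptions, so adapt them to your own dataset.

```python
def multiple_scores_from_labels(outputs: dict, reference_outputs: dict) -> list[dict]:
    # Assumption: predicted and expected labels are stored under "labels".
    predicted = set(outputs.get("labels", []))
    expected = set(reference_outputs.get("labels", []))
    true_positives = len(predicted & expected)

    # Guard against division by zero when either set is empty.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0

    return [
        {"key": "precision", "score": precision},
        {"key": "recall", "score": recall},
        {"key": "f1", "score": f1},
    ]
```

Computing all three metrics in one evaluator avoids repeating the set intersection in three separate functions.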
