Composite Score evaluations combine multiple existing evaluations into a unified score. This is ideal for measuring overall quality by aggregating various aspects of your prompt’s performance, such as combining accuracy, safety, and relevance metrics into one comprehensive assessment.
  • How it works: Runs multiple existing evaluations and combines their results using one of three methods: average, weighted, or a custom formula. Note that sub-evaluations do not create their own results!
  • Best for: Holistic quality assessment, combining multiple evaluation criteria, creating overall performance metrics, balancing trade-offs between different aspects (e.g., accuracy vs. safety).
  • Requires: At least two existing evaluations configured on the same prompt. These evaluations can be of any type, even other Composite Scores!
Currently, composite evaluations cannot run in live mode and only support sub-evaluations that do not require an expected output. Check out the Running Evaluations guide.
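
The requirements above can be read as a checklist to run before combining evaluations. The following is a minimal Python sketch of that checklist, not the platform's actual validation: the `Evaluation` class, its fields, and the `composite_problems` helper are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Evaluation:
    """Hypothetical stand-in for an evaluation configured on a prompt."""
    name: str
    prompt_id: str
    requires_expected_output: bool = False


def composite_problems(prompt_id: str, candidates: Sequence[Evaluation]) -> List[str]:
    """Return the reasons these evaluations cannot be combined (empty if none)."""
    problems = []
    # A composite needs at least two sub-evaluations.
    if len(candidates) < 2:
        problems.append("select at least two evaluations")
    for ev in candidates:
        # Sub-evaluations must be configured on the same prompt.
        if ev.prompt_id != prompt_id:
            problems.append(f"{ev.name}: configured on a different prompt")
        # Sub-evaluations that need an expected output are not supported.
        if ev.requires_expected_output:
            problems.append(f"{ev.name}: requires an expected output")
    return problems


accuracy = Evaluation("accuracy", prompt_id="p1", requires_expected_output=True)
safety = Evaluation("safety", prompt_id="p1")
relevance = Evaluation("relevance", prompt_id="p1")

print(composite_problems("p1", [safety, relevance]))  # []
print(composite_problems("p1", [accuracy, safety]))
```

An empty list means the selection satisfies the stated requirements; any entry names the evaluation that would have to be swapped out.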

Setup

1. Go to the evaluations tab
   Open the evaluations tab on a prompt in one of your projects.

2. Combine evaluations
   In the top right corner, click the “Combine evaluations” button.

3. Choose a metric
   Choose the Composite Score metric.

4. Select sub-evaluations
   Select the evaluations you want to combine. You need to select at least two evaluations.

Metrics

Average
Combines scores evenly. The resulting score is the arithmetic mean of the sub-evaluation scores.
Weighted
Combines scores using custom weights. The resulting score is the weighted blend. Weights are expressed as percentages and must add up to 100%.
Custom
Combines scores using a custom formula. The resulting score is the value of the expression, which can be a complex mathematical formula.
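
To make the arithmetic behind the three metrics concrete, here is a minimal Python sketch. It is not the platform's implementation: the function names, the example score names, and the 0 to 100 score scale are assumptions for illustration. The custom formula is written as an ordinary Python function because the product's expression syntax is not described in this guide.

```python
from statistics import fmean
from typing import Callable, Mapping

Scores = Mapping[str, float]  # sub-evaluation name -> score (assumed 0-100 scale)


def _require_two(scores: Scores) -> None:
    # A composite needs at least two sub-evaluations.
    if len(scores) < 2:
        raise ValueError("a composite score needs at least two sub-evaluations")


def average_score(scores: Scores) -> float:
    """Average: every sub-evaluation counts equally."""
    _require_two(scores)
    return fmean(scores.values())


def weighted_score(scores: Scores, weights: Mapping[str, float]) -> float:
    """Weighted: weights are percentages that must add up to 100."""
    _require_two(scores)
    if set(weights) != set(scores):
        raise ValueError("each sub-evaluation needs exactly one weight")
    if abs(sum(weights.values()) - 100.0) > 1e-9:
        raise ValueError("weights must add up to 100%")
    return sum(scores[name] * weights[name] / 100.0 for name in scores)


def custom_score(scores: Scores, formula: Callable[[Scores], float]) -> float:
    """Custom: the result is whatever the formula evaluates to."""
    _require_two(scores)
    return formula(scores)


scores = {"accuracy": 90.0, "safety": 60.0, "relevance": 75.0}

print(average_score(scores))  # 75.0
print(weighted_score(scores, {"accuracy": 50, "safety": 30, "relevance": 20}))  # 78.0
# Example custom formula: the composite is capped by the weaker of accuracy and safety.
print(custom_score(scores, lambda s: min(s["accuracy"], s["safety"])))  # 60.0
```

With these sample scores the weighted result is 90 × 0.5 + 60 × 0.3 + 75 × 0.2 = 78, slightly above the plain average of 75 because accuracy carries the largest weight.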