EvalRunner Module
The evaluation runner: runs cases against an agent and scores them
Functions and values
| Function or value | Parameters | Returns | Description |
|---|---|---|---|
| `compareAgentsAsync config evaluator agents dataset` | `config : EvalRunnerConfig`, `evaluator : IEvaluator`, `agents : (string * IAgent) list`, `dataset : EvalDataset` | `Task<(string * EvalReport) list>` | Compare multiple named agents on the same dataset |
| `runCaseAsync evaluator agent case` | `evaluator : IEvaluator`, `agent : IAgent`, `case : EvalCase` | `Task<EvalResult>` | Run a single eval case against an agent with a given evaluator |
| `runCaseLightAsync evaluator agent case` | `evaluator : IEvaluator`, `agent : IAgent`, `case : EvalCase` | `Task<EvalResult>` | Run a single eval case without trace capture (lightweight) |
| `runDatasetAsync config evaluator agent dataset` | `config : EvalRunnerConfig`, `evaluator : IEvaluator`, `agent : IAgent`, `dataset : EvalDataset` | `Task<EvalReport>` | Run all cases in a dataset against an agent |
| `runWithMultipleEvaluatorsAsync config evaluators agent dataset` | `config : EvalRunnerConfig`, `evaluators : IEvaluator list`, `agent : IAgent`, `dataset : EvalDataset` | `Task<EvalReport>` | Run cases with multiple evaluators and combine the results |
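
The shape of `compareAgentsAsync` (a list of named agents in, a list of named reports out) can be illustrated with a small self-contained sketch. Everything below is hypothetical: this page does not show the members of `IAgent`, `IEvaluator`, `EvalCase`, `EvalResult` or `EvalReport`, so the sketch defines simplified stand-ins for them, omits `EvalRunnerConfig` entirely, and runs cases sequentially. It is meant to show the data flow only, not how the module is actually implemented.

```fsharp
open System.Threading.Tasks

// Hypothetical stand-ins. The real types in this library may look different.
type IAgent =
    abstract Run : input: string -> Task<string>

type IEvaluator =
    abstract Score : expected: string * actual: string -> float

type EvalCase = { Input: string; Expected: string }
type EvalResult = { Case: EvalCase; Output: string; Score: float }
type EvalReport = { Results: EvalResult list; MeanScore: float }

// Run one case: call the agent, then score its output.
let runCase (evaluator: IEvaluator) (agent: IAgent) (evalCase: EvalCase) : Task<EvalResult> =
    task {
        let! output = agent.Run evalCase.Input
        let score = evaluator.Score(evalCase.Expected, output)
        return { Case = evalCase; Output = output; Score = score }
    }

// Run every case in order and summarise the scores in a report.
let runDataset (evaluator: IEvaluator) (agent: IAgent) (cases: EvalCase list) : Task<EvalReport> =
    task {
        let results = ResizeArray<EvalResult>()
        for c in cases do
            let! r = runCase evaluator agent c
            results.Add r
        let all = List.ofSeq results
        let mean =
            if List.isEmpty all then 0.0
            else all |> List.averageBy (fun r -> r.Score)
        return { Results = all; MeanScore = mean }
    }

// Run the same cases against each named agent and pair each name with its report.
let compareAgents (evaluator: IEvaluator) (agents: (string * IAgent) list) (cases: EvalCase list)
    : Task<(string * EvalReport) list> =
    task {
        let reports = ResizeArray<string * EvalReport>()
        for (name, agent) in agents do
            let! report = runDataset evaluator agent cases
            reports.Add((name, report))
        return List.ofSeq reports
    }

// Two toy agents and an exact-match evaluator, for illustration only.
let echo =
    { new IAgent with
        member _.Run(input: string) = Task.FromResult input }

let upper =
    { new IAgent with
        member _.Run(input: string) = Task.FromResult(input.ToUpperInvariant()) }

let exactMatch =
    { new IEvaluator with
        member _.Score(expected: string, actual: string) =
            if expected = actual then 1.0 else 0.0 }

let cases =
    [ { Input = "hi"; Expected = "hi" }
      { Input = "ok"; Expected = "ok" } ]

let reports = (compareAgents exactMatch [ "echo", echo; "upper", upper ] cases).Result

for (name, report) in reports do
    printfn "%s: %.2f" name report.MeanScore
// prints "echo: 1.00" then "upper: 0.00"
```

Because every agent sees the same cases and the same evaluator, the per-agent reports are directly comparable. Using this module's own functions, the equivalent call would presumably pass an `EvalRunnerConfig` as the first argument and an `EvalDataset` in place of the plain list of cases.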