
EvalRunner Module

The evaluation runner: runs eval cases against an agent and scores them with an evaluator.

Functions and values

Function or value Description

compareAgentsAsync config evaluator agents dataset

Full Usage: compareAgentsAsync config evaluator agents dataset

Compare agents on the same dataset. The signature accepts a list of (string * IAgent) pairs rather than exactly two agents, and the result is a list of (string * EvalReport) pairs.

Parameters:
config : EvalRunnerConfig
evaluator : IEvaluator
agents : (string * IAgent) list
dataset : EvalDataset

Returns: Task<(string * EvalReport) list>
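
The return shape of compareAgentsAsync can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the EvalReport record with a PassRate field, the runDatasetSketchAsync and compareSketchAsync helpers, and the scores are hypothetical stand-ins, not the library's actual definitions. Only the (string * EvalReport) list result shape is taken from the signature above.

```fsharp
open System.Threading.Tasks

// Hypothetical stand-in: the real EvalReport fields are not shown on this page.
type EvalReport = { PassRate: float }

// Hypothetical stand-in for a dataset run; it takes only an agent label
// and returns a fixed, made-up pass rate.
let runDatasetSketchAsync (agentName: string) : Task<EvalReport> =
    Task.FromResult({ PassRate = (if agentName = "baseline" then 0.5 else 0.75) })

// Mirrors the result shape of compareAgentsAsync:
// one (label, report) pair per agent, in input order.
let compareSketchAsync (agentNames: string list) : Task<(string * EvalReport) list> =
    task {
        let results = ResizeArray<string * EvalReport>()
        for name in agentNames do
            let! report = runDatasetSketchAsync name
            results.Add((name, report))
        return List.ofSeq results
    }

// Consume the pairs: sort labels by the hypothetical PassRate, best first.
let ranked =
    (compareSketchAsync [ "baseline"; "candidate" ]).Result
    |> List.sortByDescending (fun (_, report) -> report.PassRate)
    |> List.map fst

printfn "%A" ranked
```

Because each report is paired with a string, callers can sort or filter the list without keeping a separate lookup from agents to reports.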

runCaseAsync evaluator agent case

Full Usage: runCaseAsync evaluator agent case

Run a single eval case against an agent with a given evaluator.

Parameters:
evaluator : IEvaluator
agent : IAgent
case : EvalCase

Returns: Task<EvalResult>
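
A minimal sketch of the calling pattern may help here. This page does not show the members of IAgent, IEvaluator, EvalCase, or EvalResult, so every type below is a hypothetical stand-in, and runCaseSketchAsync is an illustrative function that only mirrors the documented parameter order (evaluator, agent, case) and return type. It is not the library's implementation.

```fsharp
open System.Threading.Tasks

// Hypothetical stand-ins: the real definitions are not shown on this page.
type EvalCase = { Input: string; Expected: string }
type EvalResult = { Score: float; Passed: bool }

type IAgent =
    abstract RunAsync : input: string -> Task<string>

type IEvaluator =
    abstract ScoreAsync : evalCase: EvalCase * output: string -> Task<EvalResult>

// Same parameter order as runCaseAsync: evaluator, then agent, then case.
let runCaseSketchAsync (evaluator: IEvaluator) (agent: IAgent) (evalCase: EvalCase) : Task<EvalResult> =
    task {
        // Run the agent on the case input, then let the evaluator score the output.
        let! output = agent.RunAsync evalCase.Input
        return! evaluator.ScoreAsync(evalCase, output)
    }

// A toy agent that upper-cases its input.
let upperAgent =
    { new IAgent with
        member _.RunAsync input = Task.FromResult(input.ToUpperInvariant()) }

// A toy evaluator that passes only on an exact match with the expected output.
let exactMatch =
    { new IEvaluator with
        member _.ScoreAsync(evalCase, output) =
            let passed = (output = evalCase.Expected)
            Task.FromResult({ Score = (if passed then 1.0 else 0.0); Passed = passed }) }

let result =
    runCaseSketchAsync exactMatch upperAgent { Input = "ok"; Expected = "OK" }
    |> Async.AwaitTask
    |> Async.RunSynchronously

printfn "passed=%b score=%.1f" result.Passed result.Score
```

Putting the evaluator first makes partial application convenient: `runCaseSketchAsync exactMatch` yields a function that can be reused across agents and cases.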

runCaseLightAsync evaluator agent case

Full Usage: runCaseLightAsync evaluator agent case

Run a single eval case without trace capture (lightweight).

Parameters:
evaluator : IEvaluator
agent : IAgent
case : EvalCase

Returns: Task<EvalResult>

runDatasetAsync config evaluator agent dataset

Full Usage: runDatasetAsync config evaluator agent dataset

Run all cases in a dataset against an agent.

Parameters:
config : EvalRunnerConfig
evaluator : IEvaluator
agent : IAgent
dataset : EvalDataset

Returns: Task<EvalReport>

runWithMultipleEvaluatorsAsync config evaluators agent dataset

Full Usage: runWithMultipleEvaluatorsAsync config evaluators agent dataset

Run the cases in a dataset against an agent with multiple evaluators and combine the results into a single report.

Parameters:
config : EvalRunnerConfig
evaluators : IEvaluator list
agent : IAgent
dataset : EvalDataset

Returns: Task<EvalReport>
