Nao.Eval Namespace
| Type/Module | Description |
| --- | --- |
| | A single evaluation test case |
| | A dataset is a named collection of eval cases |
| | Aggregate report of an evaluation run |
| | The result of evaluating a single case |
| | The evaluation runner: runs cases against an agent and scores them |
| | Configuration for the evaluation runner |
| | The verdict of a single evaluation |
| | Interface for evaluating agent outputs against expectations |
| | Summary statistics for a specific tag |
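
The descriptions above suggest a pipeline in which a dataset of cases is run against an agent, each output is judged by an evaluator, and the per-case results are rolled up into a report with per-tag statistics. The following is a minimal Python sketch of that flow, not the actual Nao.Eval API: every name in it (`EvalCase`, `Dataset`, `Verdict`, `Evaluator`, `ExactMatch`, `EvalResult`, `TagSummary`, `EvalReport`, `RunnerConfig`, `run`) and every field is hypothetical and chosen only to mirror the descriptions in the table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Protocol


class Verdict(Enum):
    """Hypothetical verdict of a single evaluation."""
    PASS = "pass"
    FAIL = "fail"


@dataclass(frozen=True)
class EvalCase:
    """Hypothetical single evaluation test case."""
    name: str
    prompt: str
    expected: str
    tags: tuple[str, ...] = ()


@dataclass(frozen=True)
class Dataset:
    """Hypothetical named collection of eval cases."""
    name: str
    cases: tuple[EvalCase, ...]


class Evaluator(Protocol):
    """Hypothetical interface for judging an agent output against a case."""
    def evaluate(self, case: EvalCase, output: str) -> Verdict: ...


class ExactMatch:
    """Assumed example evaluator: passes only on an exact string match."""
    def evaluate(self, case: EvalCase, output: str) -> Verdict:
        return Verdict.PASS if output.strip() == case.expected else Verdict.FAIL


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical result of evaluating a single case."""
    case: EvalCase
    output: str
    verdict: Verdict


@dataclass(frozen=True)
class TagSummary:
    """Hypothetical summary statistics for one tag."""
    tag: str
    total: int
    passed: int


@dataclass(frozen=True)
class EvalReport:
    """Hypothetical aggregate report of an evaluation run."""
    dataset: str
    results: tuple[EvalResult, ...]

    def by_tag(self) -> dict[str, TagSummary]:
        # Count [total, passed] per tag across all results.
        counts: dict[str, list[int]] = {}
        for result in self.results:
            for tag in result.case.tags:
                entry = counts.setdefault(tag, [0, 0])
                entry[0] += 1
                entry[1] += int(result.verdict is Verdict.PASS)
        return {t: TagSummary(t, n, p) for t, (n, p) in counts.items()}


@dataclass(frozen=True)
class RunnerConfig:
    """Hypothetical runner configuration."""
    evaluator: Evaluator
    stop_on_first_failure: bool = False


def run(dataset: Dataset, agent: Callable[[str], str], config: RunnerConfig) -> EvalReport:
    """Run every case against the agent and score it with the configured evaluator."""
    results = []
    for case in dataset.cases:
        output = agent(case.prompt)
        verdict = config.evaluator.evaluate(case, output)
        results.append(EvalResult(case, output, verdict))
        if config.stop_on_first_failure and verdict is Verdict.FAIL:
            break
    return EvalReport(dataset.name, tuple(results))


# Usage: a toy "agent" that always answers "2".
demo = Dataset("arithmetic", (
    EvalCase("add", "1+1", "2", tags=("math",)),
    EvalCase("mul", "2*3", "6", tags=("math", "hard")),
))
report = run(demo, lambda prompt: "2", RunnerConfig(ExactMatch()))
```

Keeping the evaluator behind an interface, as the table's description of it implies, lets the same runner score cases with different strategies; the exact-match evaluator here is only the simplest stand-in.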