Welcome
Instructions
Use InspectorRAGet to visualize and analyze LLM evaluation results. Your evaluation data is expected to contain:
- A set of tasks, each with one or more model results. Every model must have a result for every task, and every model result must have a score for every metric. Including more than 5 models may noticeably slow down processing and rendering because of the volume of data.
- Scores on one or more metrics per model result. Metrics may be categorical (e.g., yes/no, Likert scale) or numeric. Mixing metric types within a single experiment is supported.
- At least one annotator or evaluator (human, algorithmic, or LLM-based). Multiple annotators per experiment are supported.
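The completeness requirements above can be checked before uploading a file. The following is a minimal sketch, not the tool's own validator: the function name `find_gaps` and the record keys `task_id`, `model_id`, and `scores` are hypothetical and chosen only for illustration. They are not the InspectorRAGet file schema, and per-annotator scores are left out for brevity.

```python
from itertools import product


def find_gaps(task_ids, model_ids, metric_names, results):
    """Report missing (task, model) results and missing metric scores.

    `results` is a list of dicts with the hypothetical keys
    "task_id", "model_id", and "scores" (metric name -> value).
    """
    # Index results by (task, model) pair for constant-time lookup.
    by_pair = {(r["task_id"], r["model_id"]): r for r in results}

    missing_results = []
    missing_scores = []
    for task_id, model_id in product(task_ids, model_ids):
        result = by_pair.get((task_id, model_id))
        if result is None:
            # This model has no result for this task.
            missing_results.append((task_id, model_id))
            continue
        for metric in metric_names:
            # Scores may be categorical or numeric; only presence is checked.
            if metric not in result.get("scores", {}):
                missing_scores.append((task_id, model_id, metric))
    return missing_results, missing_scores


# Illustrative data: two tasks, two models, one numeric and one categorical metric.
tasks = ["t1", "t2"]
models = ["m1", "m2"]
metrics = ["fluency", "accuracy"]
results = [
    {"task_id": "t1", "model_id": "m1", "scores": {"fluency": 4, "accuracy": "yes"}},
    {"task_id": "t1", "model_id": "m2", "scores": {"fluency": 3}},
    {"task_id": "t2", "model_id": "m1", "scores": {"fluency": 5, "accuracy": "no"}},
]

missing_results, missing_scores = find_gaps(tasks, models, metrics, results)
print("Missing results:", missing_results)
print("Missing scores:", missing_scores)
```

In this example, model m2 has no result for task t2 and no accuracy score on task t1, so both lists are non-empty. A file that meets the requirements would produce two empty lists.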
Supported task types:
- RAG
- Text generation
- Tool calling
- Agentic traces
The next page includes a full schema example. If your file does not match the expected format, you will see a validation error.