InspectorRAGet

Welcome

Instructions

Use InspectorRAGet to visualize and analyze LLM evaluation results. Your evaluation data must contain:

  • A set of tasks, each with one or more model results. Every model must have a result for every task, and every metric must have a score for every model result. Including more than 5 models may noticeably slow the interface because of the volume of data being processed and rendered.
  • Scores on one or more metrics per model result. Metrics may be categorical (e.g., yes/no, Likert scale) or numeric. Mixing metric types within a single experiment is supported.
  • At least one annotator or evaluator (human, algorithmic, or LLM-based). Multiple annotators per experiment are supported.
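
The coverage requirements above (every model has a result for every task, and every metric has a score on every model result) can be checked before upload. The following is a minimal sketch, not InspectorRAGet's own validator: the function name `find_coverage_gaps` and the keys `task_id`, `model_id`, and `scores` are hypothetical and chosen for illustration only. The actual field names are defined by the schema example on the next page.

```python
from itertools import product


def find_coverage_gaps(task_ids, model_ids, metric_names, results):
    """Return a list of problem descriptions; an empty list means full coverage.

    `results` is a list of dicts using hypothetical keys (not the real schema):
      "task_id"  -> task identifier
      "model_id" -> model identifier
      "scores"   -> {metric name: {annotator name: value}}
    """
    problems = []
    # Index results by (task, model) so each pair can be looked up directly.
    by_pair = {(r["task_id"], r["model_id"]): r for r in results}

    for task_id, model_id in product(task_ids, model_ids):
        result = by_pair.get((task_id, model_id))
        if result is None:
            problems.append(f"missing result: task={task_id} model={model_id}")
            continue
        for metric in metric_names:
            # A metric counts as scored only if at least one annotator scored it.
            if not result.get("scores", {}).get(metric):
                problems.append(
                    f"missing score: task={task_id} model={model_id} metric={metric}"
                )
    return problems


if __name__ == "__main__":
    example_results = [
        {"task_id": "t1", "model_id": "m1",
         "scores": {"fluency": {"annotator_a": 4}}},
        {"task_id": "t1", "model_id": "m2", "scores": {}},
    ]
    for gap in find_coverage_gaps(["t1", "t2"], ["m1", "m2"],
                                  ["fluency"], example_results):
        print(gap)
```

An empty list from this check only indicates that the task, model, and metric combinations are complete under the assumed layout; it does not replace the validation performed when the file is loaded.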

Supported task types:

  • RAG
  • Text generation
  • Tool calling
  • Agentic traces

The next page includes a full schema example. If your file does not match the expected format, you will see a validation error.