# Evaluation Harness

> An evaluation harness is the test framework that measures an AI system against a fixed set of inputs, expected outputs, and metrics, and is run on every change to that system.

Category: Production
Source: https://impetora.com/glossary/evaluation-harness
Part of: Impetora AI consulting glossary (https://impetora.com/glossary)

## What is an evaluation harness?

An evaluation harness combines a curated dataset, scoring functions, and a runner. Scoring can use exact match, embedding similarity, a rubric applied by an LLM acting as judge (LLM-as-judge), a business KPI, or human review. The harness runs on every prompt change, model change, retrieval change, or data change. Without it, the team has no reliable way to tell whether an edit improved or regressed the system.
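The three parts described above (dataset, scoring functions, runner) and the improved-or-regressed check can be sketched in a few lines of Python. This is a minimal illustration, not the API of any real framework such as EleutherAI's lm-evaluation-harness: the names `Case`, `exact_match`, `token_overlap`, `run_harness`, and `find_regressions` are hypothetical, the "systems" are stub lookups standing in for a real model or pipeline, and `token_overlap` is a cheap word-level stand-in for embedding similarity. An LLM-as-judge, KPI, or human-review scorer would be assumed to plug in as another function with the same `(output, expected) -> score` signature.

```python
from __future__ import annotations

import re
from collections.abc import Callable
from dataclasses import dataclass

# Hypothetical type aliases for illustration; not an existing library's API.
System = Callable[[str], str]            # the AI system under test: input -> output
Scorer = Callable[[str, str], float]     # (output, expected) -> score in [0, 1]


@dataclass(frozen=True)
class Case:
    """One curated dataset entry: an input and its expected output."""
    input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """1.0 if the output equals the expected answer, ignoring case and outer whitespace."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def token_overlap(output: str, expected: str) -> float:
    """Jaccard overlap of word tokens; a cheap stand-in for embedding similarity."""
    a = set(re.findall(r"\w+", output.lower()))
    b = set(re.findall(r"\w+", expected.lower()))
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def run_harness(system: System, dataset: list[Case], scorers: dict[str, Scorer]) -> dict[str, float]:
    """Runner: send every case through the system and return the mean score per scorer."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    totals = {name: 0.0 for name in scorers}
    for case in dataset:
        output = system(case.input)
        for name, score in scorers.items():
            totals[name] += score(output, case.expected)
    return {name: total / len(dataset) for name, total in totals.items()}


def find_regressions(baseline: dict[str, float], candidate: dict[str, float],
                     tolerance: float = 0.0) -> list[str]:
    """Metrics where the candidate is worse than the baseline by more than `tolerance`."""
    return [name for name in baseline if candidate[name] < baseline[name] - tolerance]


# Usage: compare a baseline system with a candidate after a simulated prompt change.
DATASET = [
    Case("What is the capital of France?", "Paris"),
    Case("What is 2 + 2?", "4"),
]
SCORERS = {"exact_match": exact_match, "token_overlap": token_overlap}

# Stub systems: dictionary lookups standing in for calls to a real model.
baseline_answers = {"What is the capital of France?": "Paris", "What is 2 + 2?": "4"}
candidate_answers = {"What is the capital of France?": "Paris, France", "What is 2 + 2?": "4"}

baseline = run_harness(baseline_answers.__getitem__, DATASET, SCORERS)
candidate = run_harness(candidate_answers.__getitem__, DATASET, SCORERS)
regressed = find_regressions(baseline, candidate)

print(baseline)   # {'exact_match': 1.0, 'token_overlap': 1.0}
print(candidate)  # {'exact_match': 0.5, 'token_overlap': 0.75}
print(regressed)  # ['exact_match', 'token_overlap']
```

Keeping the runner independent of any particular scorer is what lets the same harness be re-run unchanged after a prompt, model, retrieval, or data change. In a CI pipeline, one plausible policy is to fail the build whenever `regressed` is non-empty, with `tolerance` absorbing run-to-run noise from non-deterministic scorers.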

## How does an evaluation harness apply to enterprise AI?

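Enterprise AI systems must have an evaluation harness before they go live. It is the difference between a demo and a production system, and its documented results form part of the testing and validation evidence that regulators expect in the technical documentation for an EU AI Act conformity assessment of high-risk systems.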

## Related terms

- [LLMOps](https://impetora.com/glossary/llmops) - LLMOps is the subset of MLOps focused on the specific operational concerns of large language models: prompt versioning, evaluation, cost control, and output observability.
- [Observability](https://impetora.com/glossary/observability) - Observability for AI is the ability to understand what an AI system did, why it did it, and at what cost, by inspecting its inputs, outputs, intermediate steps, and metrics.
- [Model Drift](https://impetora.com/glossary/model-drift) - Model drift is the gradual or sudden degradation of a model's performance in production caused by changes in input data, target distribution, or operating context.
- [Guardrails](https://impetora.com/glossary/guardrails) - Guardrails are runtime checks placed around an AI system to constrain inputs, outputs, and tool calls within safety, compliance, and business policy.

## External references

- [EleutherAI lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)

---

Impetora is a custom AI consultancy and solutions partner for enterprises in regulated industries. Submit a project at https://impetora.com/intake.
