# Inference

> Inference is the act of running a trained model on new inputs to produce predictions or generated output.

Category: Production
Source: https://impetora.com/glossary/inference
Part of: Impetora AI consulting glossary (https://impetora.com/glossary)

## What is Inference?

Inference is distinct from training: training builds the model, and inference uses it. Production inference systems care about latency, throughput, cost per request, and tail behaviour (how the slowest requests perform, for example 99th-percentile latency). Common optimisations include batching, quantisation, distillation, caching, speculative decoding, and hardware-aware serving. For LLMs, inference cost is usually dominated by the number of prompt and output tokens processed per request.
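To make the cost and tail-behaviour points concrete, here is a minimal Python sketch. It assumes a provider that bills separately per 1,000 prompt and output tokens. The prices, the function names `request_cost` and `percentile`, and the latency samples are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical prices in USD per 1,000 tokens (assumption for illustration;
# real prices vary by provider and model).
PRICE_PER_1K_PROMPT = 0.50
PRICE_PER_1K_OUTPUT = 1.50


def request_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    prompt_cost = (prompt_tokens / 1000) * PRICE_PER_1K_PROMPT
    output_cost = (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    return prompt_cost + output_cost


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Made-up latency samples in milliseconds; one slow outlier dominates the tail.
latencies_ms = [120, 135, 110, 140, 125, 130, 115, 128, 122, 900]

cost = request_cost(prompt_tokens=2000, output_tokens=500)
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"cost per request: ${cost:.2f}, p50: {p50} ms, p99: {p99} ms")
```

In this sample the median looks healthy while the 99th percentile is set by a single slow request, which is why production systems usually track tail percentiles alongside averages.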

## How does Inference apply to enterprise AI?

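Enterprise inference economics drive build-vs-buy decisions. Hosted APIs trade a higher per-request cost for speed of adoption; self-hosted inference accepts operational burden in exchange for better unit economics at scale and control over data residency, such as keeping data in the EU.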
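One way to frame the trade-off is a break-even calculation. The sketch below assumes a simplified cost model: a hosted API charges a flat price per request, while self-hosting has a fixed monthly cost (hardware and operations) plus a smaller marginal cost per request. All figures and the function name `break_even_requests` are hypothetical, and the model ignores factors such as engineering time, utilisation, and compliance requirements.

```python
from typing import Optional


def break_even_requests(
    hosted_price_per_request: float,
    self_hosted_fixed_monthly: float,
    self_hosted_cost_per_request: float,
) -> Optional[float]:
    """Monthly request volume above which self-hosting is cheaper.

    Returns None when self-hosting never becomes cheaper, i.e. when its
    marginal cost per request is not below the hosted price.
    """
    saving_per_request = hosted_price_per_request - self_hosted_cost_per_request
    if saving_per_request <= 0:
        return None
    return self_hosted_fixed_monthly / saving_per_request


# Hypothetical figures in USD, for illustration only.
volume = break_even_requests(
    hosted_price_per_request=0.002,
    self_hosted_fixed_monthly=6000.0,
    self_hosted_cost_per_request=0.0005,
)
if volume is None:
    print("self-hosting never breaks even under these assumptions")
else:
    print(f"break-even at about {volume:,.0f} requests per month")
```

Below the break-even volume the hosted API is cheaper under these assumptions; above it, the fixed cost of self-hosting is amortised over enough requests to win on unit economics.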

## Related terms

- [Large Language Model](https://impetora.com/glossary/large-language-model) - A Large Language Model (LLM) is a foundation model trained on text to predict the next token, capable of generating, summarising, and reasoning over natural language.
- [LLMOps](https://impetora.com/glossary/llmops) - LLMOps is the subset of MLOps focused on the specific operational concerns of large language models: prompt versioning, evaluation, cost control, and output observability.
- [Observability](https://impetora.com/glossary/observability) - Observability for AI is the ability to understand what an AI system did, why it did it, and at what cost, by inspecting its inputs, outputs, intermediate steps, and metrics.
- [Build vs Buy AI](https://impetora.com/glossary/build-vs-buy-ai) - Build vs buy is the strategic decision between developing an AI capability internally or in partnership, versus licensing a finished product from a vendor.

## External references

- [NVIDIA Triton Inference Server](https://developer.nvidia.com/triton-inference-server)

---

Impetora is a custom AI consultancy and solutions partner for enterprises in regulated industries. Submit a project at https://impetora.com/intake.
