# Multi-modal AI

> Multi-modal AI refers to systems that can process and generate more than one type of input or output, such as text, images, audio, and video, within a single model or pipeline.

Category: Architecture
Source: https://impetora.com/glossary/multi-modal-ai
Part of: Impetora AI consulting glossary (https://impetora.com/glossary)

## What is Multi-modal AI?

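Multi-modal models typically map different modalities into a shared representation space, so that related content (for example, a photo and its caption) can be compared or reasoned over together. In practice, a user can ask a question about an uploaded chart, a voice agent can read a response aloud, and a claims adjuster can attach a photo of damage and receive structured output. Underlying architectures include vision-language transformers, audio encoders coupled to language models, and joint embedding models such as CLIP. Multi-modal capability is now standard among leading foundation models, many of which accept images, and in some cases audio, alongside text.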
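The idea of a shared representation space can be illustrated with a minimal sketch. It assumes that separate text and image encoders have already produced vectors in the same space; the three-dimensional vectors, file names, and the `best_image_for` helper below are hypothetical and hand-written for illustration, not the output or API of any real model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: in a real system these would come from an
# image encoder and a text encoder trained to share one vector space.
IMAGE_EMBEDDINGS = {
    "dented_bumper.jpg": [0.9, 0.1, 0.0],
    "sales_chart.png": [0.1, 0.9, 0.1],
    "water_damage.jpg": [0.6, 0.0, 0.8],
}

TEXT_EMBEDDINGS = {
    "photo of car damage": [0.8, 0.2, 0.1],
    "quarterly revenue chart": [0.0, 1.0, 0.2],
}


def best_image_for(text):
    """Return the image whose embedding is closest to the text embedding."""
    query = TEXT_EMBEDDINGS[text]
    return max(
        IMAGE_EMBEDDINGS,
        key=lambda name: cosine_similarity(query, IMAGE_EMBEDDINGS[name]),
    )


if __name__ == "__main__":
    for text in TEXT_EMBEDDINGS:
        print(text, "->", best_image_for(text))
```

Because both modalities live in one space, a single similarity function is enough to match a text query to an image; no modality-specific comparison logic is needed. Real joint embedding models use vectors with hundreds of dimensions, but the retrieval step has the same shape.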

## How does Multi-modal AI apply to enterprise AI?

In enterprise settings, multi-modal AI tends to be most valuable in claims processing, medical imaging triage, voice support, and document workflows where the input mixes scanned PDFs, photos, and structured fields.

## Related terms

- [Foundation Model](https://impetora.com/glossary/foundation-model) - A foundation model is a large neural network pre-trained on broad data and designed to be adapted to many downstream tasks.
- [Large Language Model](https://impetora.com/glossary/large-language-model) - A Large Language Model (LLM) is a foundation model trained on text to predict the next token, capable of generating, summarising, and reasoning over natural language.
- [Agentic AI](https://impetora.com/glossary/agentic-ai) - Agentic AI refers to systems that plan multi-step actions, call external tools, and operate with some autonomy toward a goal, rather than producing a single response to a single prompt.
- [Embedding](https://impetora.com/glossary/embedding) - An embedding is a dense numerical vector that represents a piece of content (text, image, audio) so that semantically similar items end up close together in the vector space.

## External references

- [Radford et al., "Learning Transferable Visual Models From Natural Language Supervision" (CLIP), 2021](https://arxiv.org/abs/2103.00020)

