---
title: "Enterprise AI implementation roadmap for 2026 | Impetora"
description: "A 90-day roadmap for an enterprise's first production AI deployment. What readiness looks like, what each Discovery, Pilot, and Production phase delivers, and the timing mistakes that cost the biggest programmes a year."
url: https://impetora.com/blog/enterprise-ai-implementation-roadmap-2026
category: Methodology
datePublished: 2026-04-27
dateModified: 2026-04-27
readMinutes: 12
author: Impetora
---

# Enterprise AI implementation roadmap for 2026

> A first enterprise AI deployment can move from kickoff to production in roughly 90 days when the workload is correctly scoped, the data is governed, and the team builds for operability from week one. The same scope, attempted without a structured roadmap, typically takes a year and stalls at the security review. The difference is not budget or talent. It is sequencing.

*Updated 2026-04-27. By Impetora. 12 min read.*

## What does enterprise AI readiness actually look like in 2026?

Readiness in 2026 is no longer about whether the model is capable. The frontier models from OpenAI, Anthropic, Google, and the open-weight ecosystem are sufficiently capable for almost every enterprise workload that does not require novel research. Readiness is now about whether the organisation can operate the system in production once it is built. McKinsey's 2024 state-of-AI survey found that 72% of organisations had adopted AI in at least one function, up from 55% the year before, but only a small minority of those deployments had moved past pilot into ongoing production use [1].

The readiness gap shows up in five places. The data needed to ground the system is rarely centralised, rarely cleanly classified, and rarely covered by consent for the new use. The downstream systems the AI must integrate with have undocumented APIs and legacy authentication. The compliance function has not been included in the design. The operations team has no runbook for an AI workload because the organisation has never run one. And the executive sponsor has been promised an outcome that the proposed scope cannot deliver. None of these problems are unsolvable. All of them are slow to solve in the middle of a delivery sprint. Readiness work is the discipline of solving them before the sprint starts. Gartner's research consistently identifies the same pattern: programmes that complete a structured readiness phase ship roughly twice as fast through pilot to production as programmes that skip it [2].

A useful working definition: an enterprise is ready when it can answer four questions in writing. What workload are we automating, end to end? What data does the system need to read, and what is the lawful basis for each source? Who owns the system in production? What does success look like at 30, 60, and 90 days, with measurable thresholds? When all four answers exist before kickoff, a 90-day production deployment is realistic. When any one is missing, the programme will overrun.

## What does the Discovery phase deliver?

Discovery is the first two to four weeks of the engagement. Its output is a written package that lets the organisation decide, with full information, whether to proceed to pilot. Discovery is paid work. It is not a sales motion dressed up as scoping. The deliverables matter on their own even if the programme stops at the end of Discovery.

The package contains five artefacts. First, a data-source map: every system the AI will read, the data it contains, the lawful basis for processing under GDPR, the retention policy, and the integration approach. Second, a workload diagram: the end-to-end process the AI will participate in, the human handoffs, the failure modes, and the escalation paths. Third, a target architecture: the components, the model selection rationale, the retrieval and grounding approach, the logging schema, and the human-oversight surface. Fourth, a risk classification: where the workload sits under the EU AI Act and what obligations apply. Fifth, a delivery plan: the pilot scope, the production-acceptance criteria, the timeline, the team, and the cost estimate.

The artefacts should be reviewable by four functions: the business owner, the DPO, the security lead, and the operations team that will run the system. If any of those four cannot make a decision based on the package, the package is incomplete and Discovery is not finished.

The biggest mistake we see is treating Discovery as a one-week analyst exercise that produces slides for the steering committee. Slides are not artefacts. Decisions cannot be defended from slides three years later when the auditor arrives. MIT CISR's research on AI implementation maturity makes the same point in a different vocabulary: the highest-performing organisations treat AI projects as enterprise-architecture decisions, not as individual workload decisions, and they invest disproportionately in the discovery and design phases [3]. The cost of a thorough Discovery is recovered many times over by the time the system is in production.
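As an illustration of the first artefact, one entry in a data-source map might carry fields like these. The field names are hypothetical; the article specifies the content of the map, not a schema:

```python
# Hypothetical sketch of a single data-source map entry from the
# Discovery package. Field names and values are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSourceEntry:
    system: str            # system the AI will read
    data_description: str  # what the system contains
    lawful_basis: str      # GDPR Art. 6 basis for this processing
    retention_policy: str  # how long the records are kept
    integration: str       # how the AI reaches the data


entry = DataSourceEntry(
    system="CRM",
    data_description="customer contact history and case notes",
    lawful_basis="Art. 6(1)(b) contract",
    retention_policy="7 years from case closure",
    integration="read-only REST API",
)
```

The value of recording entries in a structured form rather than slides is that the DPO, security, and operations can each review the same fields and sign off the same artefact.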

## What is the right scope for an AI pilot?

A pilot is not a proof of concept. A proof of concept proves a model can do a task. A pilot proves an organisation can run a model in production under realistic constraints. The distinction is operational. A pilot must run end to end, against real data, with real users, in the production environment, for long enough to surface the operational issues that a sandbox demo will never expose.

The right pilot has six properties. It is narrow: one workload, one persona, one decision the AI is allowed to influence. It is bounded: a specific data set, a specific user group, a specific time window. It is instrumented: every input, output, and human decision is logged with the audit-grade fields described in *Building AI systems that survive audit*. It is reversible: there is a documented rollback procedure that operations can run without the build team's involvement. It is measurable: the success criteria are specified in advance, with a small number of metrics that map to business outcomes rather than to model accuracy. And it is honest: the pilot is allowed to fail without political consequence, because the organisation has agreed in advance what failure means and what the next step would be.

Pilots that violate any of these properties tend to produce ambiguous results. The model worked but no one knew why, the model failed but no one could tell whether it was a model problem or a data problem, the pilot ran for two weeks and was extended indefinitely because no one wanted to call it. Ambiguity at the pilot stage is what kills programmes at the production-readiness review. The cleanest pilots are the ones with the strictest exit criteria.

A typical pilot for a regulated workload runs four to eight weeks in calendar time, with two weeks of build and four to six weeks of supervised production use. The supervised period is where the real learning happens. Forrester's research on AI services delivery notes that fewer than 20% of enterprise GenAI engagements were running at production scale in late 2024, with most stuck in pilot extensions for months at a time [4]. The fix is structural, not technical. A pilot with a written exit checklist either passes or terminates. It does not drift.
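The "passes or terminates" rule can be made concrete. A minimal sketch, assuming the criteria are fixed in writing before the pilot starts; the metric names and thresholds are illustrative:

```python
# Hypothetical sketch: a pilot exit gate with criteria fixed in advance.
# Metric names and thresholds are illustrative, not a standard.
def pilot_exit(metrics: dict[str, float], criteria: dict[str, float]) -> str:
    """Return 'pass' or 'terminate'. There is deliberately no 'extend'
    outcome: a pilot with a written exit checklist either passes or stops."""
    failed = [name for name, threshold in criteria.items()
              if metrics.get(name, 0.0) < threshold]
    return "pass" if not failed else "terminate"


# Criteria agreed at the end of Discovery, before any build work starts.
criteria = {"accuracy": 0.95, "reviewer_agreement": 0.90}
```

Note that a missing metric counts as a failure: if the pilot did not measure something the criteria require, the gate does not pass by default.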

## When is an AI system actually ready for production?

Production readiness is a checklist with sections rather than a single bar. We use seven sections, each with a small set of binary questions. The system is production-ready when all sections pass. Not when "most" pass, not when "the team feels good," not when the deadline arrives. All sections.

Section 1, accuracy and behaviour. The system meets its stated accuracy target on a held-out evaluation set, the failure modes are characterised, and the failure rate is acceptable to the business owner.

Section 2, evidence and explainability. Every output produced by the system can be traced back to its inputs, its retrieval set, its prompt and model version, and its reviewer decision. The trace is queryable. Article 12 of the EU AI Act requires this record-keeping for high-risk systems, and it is good practice for any decision-influencing AI [5].

Section 3, integration and reliability. The system meets its latency target at peak load, fails gracefully when downstream systems are unavailable, and recovers automatically from transient failures. The integration tests pass continuously in CI.

Section 4, security and privacy. Data residency commitments are enforceable, encryption is verified end to end, the sub-processor list is documented, the DPIA is signed, and access controls follow least privilege.

Section 5, operations. A named team owns the system in production. The runbook covers the top ten failure modes. On-call rotation is in place. Drift monitoring is live. Rollback is tested, not assumed.

Section 6, human oversight. The reviewer surface is built, the reviewer decisions are logged, the escalation path is documented, and the workload between automated and human-reviewed cases is calibrated to actual reviewer capacity.

Section 7, governance. The system is registered in the AI inventory, classified under the AI Act, mapped to ISO 42001 controls if applicable, and assigned a re-review cadence.

The discipline matters because production AI systems do not stay still. Drift is real, model upgrades are real, data sources change, business contexts shift. A system that ships without a complete checklist will accumulate technical debt faster than the team can pay it down, and within twelve months the operations team will refuse to be on call for it. The checklist is the contract between build and operate.
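A review of this shape is straightforward to encode: each section is a set of binary questions, and a single open item blocks release. A hypothetical sketch; the section and question names are illustrative abbreviations of the checklist above:

```python
# Hypothetical sketch of the seven-section readiness review. Section and
# question names are illustrative; the rule is that one open item blocks release.
checklist: dict[str, dict[str, bool]] = {
    "accuracy_and_behaviour": {"meets_target_on_holdout": True, "failure_modes_characterised": True},
    "evidence_and_explainability": {"outputs_traceable": True, "trace_queryable": True},
    "integration_and_reliability": {"latency_at_peak": True, "graceful_degradation": True},
    "security_and_privacy": {"dpia_signed": True, "least_privilege": True},
    "operations": {"runbook_complete": True, "rollback_tested": False},  # one open item
    "human_oversight": {"reviewer_surface_built": True, "escalation_documented": True},
    "governance": {"in_ai_inventory": True, "ai_act_classified": True},
}


def production_ready(cl: dict[str, dict[str, bool]]) -> bool:
    # All sections must pass; any single failed binary question blocks release.
    return all(all(questions.values()) for questions in cl.values())


def open_items(cl: dict[str, dict[str, bool]]) -> list[tuple[str, str]]:
    # The review output is not a score but a list of items still to close.
    return [(section, q) for section, qs in cl.items() for q, ok in qs.items() if not ok]
```

The useful output of the review is `open_items`, not a percentage: "94% ready" is exactly the ambiguity the binary structure is designed to remove.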

## What are the timing mistakes that cost programmes a year?

Three mistakes recur in programmes that overrun by a year or more. Each is a sequencing error rather than a technical failure.

The first is compressing Discovery into a kickoff workshop. The team treats Discovery as a one-day event, produces a slide deck, and starts building. By week eight the data audit reveals that two of the three planned data sources have legal-basis problems. The team has already burned the budget and now has to choose between cancelling and rebuilding around what is left. Discovery should be paid, scoped, and time-boxed to two-to-four weeks. Cutting it shorter is the most expensive saving in the programme.

The second is skipping the supervised pilot in favour of a direct production launch. The team feels confident from the proof-of-concept demo and goes straight to production with the full user base. The first week of real traffic surfaces failure modes that a four-week supervised pilot would have caught with a tenth of the blast radius. The recovery costs the programme three months. The sequence is not optional: pilot in shadow before pilot in production, pilot in production before scale.

The third is treating governance as a final-quarter activity. The team plans to do the AI Act classification, the DPIA, the ISO 42001 mapping, and the inventory registration "after the system is live." None of these are after-the-fact activities. The classification is a design input. The DPIA is required before processing starts. The ISO controls are part of the architecture. Adding them late forces redesign of components that have already been built. The fix is a single rule: the compliance function joins the kickoff and stays in every weekly review until the system is in production.

The pattern across all three is the same. The cheapest place to absorb a regulatory or design constraint is at the start. Every week the constraint is deferred, the cost of absorbing it grows. Programmes that respect the sequencing ship in 90 days. Programmes that fight it ship in fifteen months.

## What does the actual 90-day roadmap look like?

The roadmap is three phases, each with named deliverables and a written exit gate.

Days 1-21, Discovery. Workload mapped end to end. Data-source inventory completed and classified. Lawful basis documented for each source. Risk classification under the EU AI Act produced. Target architecture drafted and reviewed by DPO, security, and operations. Pilot scope and acceptance criteria written. Production-readiness checklist drafted. Exit gate: business owner, DPO, and security lead sign off the package in writing. No sign-off, no Pilot.

Days 22-56, Pilot. First two weeks: build the system end to end against the target architecture. Source-of-truth separation, prompt and model pinning, retrieval grounding, schema-constrained outputs, audit-grade logging, human-oversight surface. Wire the system into the production environment in shadow mode. Next three weeks: supervised production use with a bounded user group and bounded data set. All outputs reviewed. Drift, accuracy, latency, and reviewer load measured against the stated criteria. Exit gate: pilot passes acceptance against the criteria, or the programme terminates with a written post-mortem. No "extended pilot." No "let's see how next month goes."

Days 57-90, Production. Two weeks of staged rollout: progressively widen the user group and the data set under continuous monitoring, with rollback ready. Two weeks of stabilisation: tune retrieval, calibrate human-review thresholds, complete the runbook, finalise the on-call rotation. Final week: production-readiness review against the seven-section checklist. Exit gate: the system is in the AI inventory, the runbook is owned by operations, and the named team is on call.

This timeline is achievable for a single, well-scoped workload in a regulated environment. It assumes the organisation does the readiness work. It assumes the scope is narrow. It assumes the leadership has the discipline to terminate at the gates if the criteria are not met. The 90-day roadmap is not a guarantee. It is a structure that makes the right outcome the most likely outcome.
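The gated structure can be sketched as a sequence that advances only on a written sign-off. A minimal illustration; the phase names come from the roadmap, while the gate-result format is an assumption for the sketch:

```python
# Hypothetical sketch: the three phases as a gated sequence. The programme
# advances only on a recorded sign-off and otherwise stops at that gate.
PHASES = [
    ("Discovery",  (1, 21),  "business owner, DPO, and security sign off the package"),
    ("Pilot",      (22, 56), "pilot passes acceptance against the written criteria"),
    ("Production", (57, 90), "checklist passes; operations owns the runbook"),
]


def run_roadmap(gate_results: dict[str, bool]) -> list[str]:
    """Return the phases completed before the first failed (or missing) gate."""
    completed = []
    for name, _days, _gate in PHASES:
        if not gate_results.get(name, False):
            break  # no sign-off, no next phase
        completed.append(name)
    return completed
```

The design choice worth noting is that a missing gate result behaves like a failed one: the default outcome is stopping, and only an explicit sign-off moves the programme forward.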

## How does Impetora structure the engagement?

Most of our engagements use a version of this 90-day structure, adapted to the specific workload. We run Discovery as a paid two-to-four-week phase with a written deliverable that the client owns regardless of whether they continue. We run Pilot as a four-to-eight-week phase with a written exit gate. We run Production as a stabilisation and handover phase, after which we step back into an Operate posture for ongoing monitoring, drift management, and versioned releases. The engagement model is built around the readiness work, not around hours. We do not start a pilot before Discovery is signed off, and we do not declare production-ready until the seven-section checklist passes. That discipline is what separates a system that ships from a system that stalls. If you have a workload to scope under this structure, the intake form is the only path in. We reply within one business day with a written next step.

## Frequently asked questions

### Can a 90-day roadmap really cover a high-risk AI workload under the EU AI Act?

It can cover the design, the pilot, and the initial production deployment for one workload. It does not cover certification activities for systems that require third-party conformity assessment, which can run alongside but on their own timeline. The 90-day structure is the engineering and operational delivery; the conformity-assessment work is run as a parallel governance stream by the compliance function with input from the build team.

### What is the difference between a pilot and a proof of concept?

A proof of concept proves a model can perform a task on a sample. A pilot proves an organisation can run the model in production with real data, real users, real failure modes, and real human oversight. PoCs are useful very early, before scoping decisions are made. Pilots are required before production. Treating a PoC as if it were a pilot is one of the most expensive shortcuts in enterprise AI delivery.

### Does the 90-day roadmap apply to vendor-platform deployments like Microsoft Copilot?

The same sequencing applies to any production AI deployment, custom or platform. With a vendor platform, Discovery focuses more on data integration and policy configuration and less on architectural design. Pilot still requires supervised production use. Production readiness still requires the seven-section checklist. The platform reduces the build effort but does not eliminate the readiness or governance work.

### How do we know when to terminate a programme rather than extend the pilot?

Decide the answer in writing before the pilot begins. The pilot scope document should specify the acceptance criteria and the consequence if they are not met. If the criteria fail at the exit gate, the programme stops, the team writes a post-mortem, and the organisation learns from the structured failure. Extending a failing pilot is almost always worse than terminating it, because the budget keeps draining while the underlying problem stays unsolved.

### What is the right team size for a 90-day enterprise AI deployment?

For a single workload, a small senior team usually outperforms a larger mixed team. A typical shape is a technical lead, two engineers covering the AI and integration surfaces, a part-time data engineer, a part-time DPO or compliance liaison, and a business owner who can spend several hours per week on review. Larger teams are appropriate for multi-workload programmes where the additional capacity goes to parallel workstreams, not to deeper coverage of the same workstream.

### Should we wait until the EU AI Act high-risk obligations apply in August 2026 before deploying?

No. The obligations apply from August 2026 to systems placed on the market after that date, which means designing for them now is cheaper than retrofitting later. The Act also formalises practices that are already industry good practice: technical documentation, logging, human oversight, post-market monitoring. Building these in now prepares the organisation for the formal deadline and improves the operability of every AI workload in the meantime.

### How does Discovery cost compare to building straight into pilot?

Discovery is a fraction of the total programme cost and consistently reduces the total cost of getting to production. Programmes that skip Discovery typically pay for it later through scope changes, security-review rework, and extended pilots. The variance in total cost between disciplined and undisciplined programmes is much larger than the cost of Discovery itself.

## Sources cited

1. The state of AI in early 2024. McKinsey & Company, 2024-05. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
2. Hype Cycle for Artificial Intelligence, 2024. Gartner, 2024-08. https://www.gartner.com/en/articles/hype-cycle-for-artificial-intelligence
3. Achieving Enterprise AI Readiness. MIT Center for Information Systems Research, 2024-05. https://cisr.mit.edu/publication/2024_0501_AIReadiness_PetersonWoerner
4. The Forrester Wave: Generative AI Services, Q4 2024. Forrester, 2024-11. https://www.forrester.com/report/the-forrester-wave-generative-ai-services-q4-2024/RES181225
5. Regulation (EU) 2024/1689 (Artificial Intelligence Act). European Union, Official Journal, 2024-07-12. https://eur-lex.europa.eu/eli/reg/2024/1689/oj
6. AI Risk Management Framework Generative AI Profile (NIST AI 600-1). NIST, 2024-07. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
