Moving AI From Proof of Concept to Production Value

How do we make AI work reliably, repeatably, and at scale?

The promise of artificial intelligence has never been greater – and the gap between that promise and reality has never been more visible. Organisations across every sector are experimenting with AI at an unprecedented pace. But experimentation and value creation are very different things, and right now, most enterprises find themselves stuck in the former.

Several terms for this pattern are gaining traction among technology leaders, such as “pilot purgatory” or “pilotitis”. They describe the organisational habit of running successive proof of concepts, demos and small-scale experiments that never graduate into production systems delivering sustained business value. Each pilot generates excitement, consumes budget, and produces a promising presentation to the board – but the leap from "it works in a controlled environment" to "it runs reliably at scale, integrated into our operations" never quite happens. The next pilot begins before the last one has been operationalised.

AI PoC to Production Gap

This piece sets out a framework for escaping "pilot purgatory", aimed at IT leaders who have moved past the question of whether AI matters and are now grappling with the question of how to make it work – reliably, repeatably and at scale.

The State of AI in Production

The data on AI adoption tells a story of increasing experimentation and limited production deployment. McKinsey's 2025 Global Survey on AI found that 88% of organisations now use AI in at least one business function, up from 78% a year earlier. Yet only 7% of respondents indicated that AI had been fully scaled across their organisations.

Nearly everyone is trying AI, but very few have industrialised it.

The picture is similar when viewed from the perspective of project outcomes. Research from the RAND Corporation found that more than 80% of AI projects fail – twice the failure rate of IT projects that do not involve AI. Gartner predicted that at least 30% of generative AI projects would be abandoned after proof of concept by the end of 2025, due to poor data quality, inadequate risk controls, escalating costs or unclear business value. And Boston Consulting Group's research across 1,000 senior executives found that 74% of companies are struggling to move beyond proofs of concept and generate tangible value from AI, with only 26% having developed the capabilities to do so.

These findings represent a broad consensus that the technology works, but the organisational machinery to exploit it does not. The bottleneck has shifted from "can AI do this?" to "can we operationalise AI in a way that delivers reliable, measurable business outcomes?"

That shift demands a fundamentally different approach – one that treats the path to production as the primary design constraint, as opposed to an afterthought.

Why AI Projects Fail to Scale

Understanding why pilots stall is the first step toward preventing it. The root causes are rarely technical; they are strategic, organisational and foundational, and several of the following are usually at play:

Starting with technology instead of a problem:

The most common pattern in failed AI initiatives is that they begin with a capability – "we should do something with large language models" – rather than a clearly defined business problem. Research identifies that industry stakeholders often misunderstand or miscommunicate what problem needs to be solved with AI, and that this is the most common reason AI projects fail. It is also common to see teams apply AI, particularly generative AI and LLMs, to use cases where it is not fit for purpose, and where more traditional algorithmic or machine learning approaches are more deterministic and often outperform, depending on the outcome being sought. When the starting point is technology rather than outcome, the result is frequently a clever demonstration that nobody knows how to deploy.

Lack of executive sponsorship and cross-functional alignment:

AI projects that remain within a single team – typically data science or IT – rarely scale. Scaling requires changes to workflows, processes and sometimes organisational structures, which demands executive authority and cross-functional buy-in that a pilot team is unlikely to have. McKinsey's research found that management practices spanning strategy, talent, operating model, technology, data and adoption all correlate positively with value attributable to AI, and that having an agile delivery organisation is strongly correlated with achieving value.

Underestimating data readiness:

Many organisations assume their data is ready for AI because it supports reporting and analytics, yet it can still be unfit for AI projects. A survey of data management leaders found that 63% of organisations either do not have, or are unsure whether they have, the right data management practices for AI. The difference between data that supports dashboards and data that can train, validate and operate AI models is vast – AI demands far greater historic volume, quality and feature richness – and closing that gap can take significant investment.

No defined production pathway:

Perhaps the most revealing pattern is the pilot that "succeeds" technically but has no plan for what happens next: no infrastructure to deploy the model, no monitoring to track its performance, no integration with existing systems, and no change management plan for the people who will use it. The pilot was designed as an experiment, and an experiment is what it remains.

Treating AI as an IT project:

AI initiatives that are governed and resourced like traditional IT projects – with fixed requirements, waterfall timelines and technology-centric success criteria – can be structurally set up to fail. AI is inherently iterative: models need to be trained, evaluated, refined and retrained, data pipelines evolve, and user feedback shapes the solution. The delivery model must be shaped accordingly.

These causes rarely operate in isolation. In most organisations where AI projects are stalled in the PoC phase, several are present simultaneously, compounding one another.

AI Value: Defining Outcomes Before Technology

We see that the most important move an organisation can make is to begin every AI initiative with a clearly articulated business outcome – as opposed to a technology capability.

Although this may sound obvious, starting with the technology is more common than organisations think.

In too many organisations, the AI conversation begins with "what can we build?" rather than "what problem are we solving, for whom, and how will we measure success?" The result can be a portfolio of pilots optimised for technical interest rather than business impact, with no clear line of sight to value.

An outcome-first approach means defining, before any technical work begins, the specific business metric the initiative is intended to move. That might be reducing customer churn by a measurable percentage, accelerating underwriting decisions by a defined number of days, improving defect detection rates on a production line, or decreasing manual processing time in a back-office function. The metric must be specific, measurable and tied to a genuine operational or financial outcome.

This can serve several purposes.

Firstly, it forces early engagement with business stakeholders, who must validate that the proposed outcome is worth pursuing and that the current process is genuinely a bottleneck.

Secondly, it establishes the criteria against which the initiative will be judged, preventing the ambiguous "the model looks promising" conclusion that sustains pilot purgatory.

Lastly, it surfaces feasibility questions early: do we have the data to support this? Can we integrate with the relevant systems? Will users actually adopt a new way of working?

BCG's research found that AI leaders pursue, on average, only about half as many opportunities as their less advanced peers, focusing instead on the most promising initiatives – and they expect more than twice the ROI as a result. A rigorous prioritisation phase produces fewer initiatives that are more carefully selected, each with a clear path to measurable value.

A practical framework for prioritisation should assess each potential use case across at least three dimensions:

  • the size and certainty of the expected business impact;
  • the feasibility given current data, technology and organisational readiness; and
  • the strategic alignment with broader business objectives.

Use cases that score highly across all three are usually good choices for investment. Those that score highly on only one – typically feasibility, which is why they appeal to technical teams – should be deprioritised, even if the technology looks interesting.

The output of this stage should be a concise value case for each initiative: the problem, the proposed AI-enabled solution, the expected outcome, the KPIs that will be tracked, the data requirements and the dependencies.
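To make this concrete, the sketch below shows one way the value case and the three prioritisation dimensions could be captured in code. It is a minimal illustration only: the field names, the 1-to-5 scales and the scoring rule are assumptions, not a prescribed template.

```python
from dataclasses import dataclass


@dataclass
class ValueCase:
    """Illustrative value case for a candidate AI initiative; field names are assumptions."""
    problem: str
    proposed_solution: str
    expected_outcome: str
    kpis: list[str]
    data_requirements: list[str]
    dependencies: list[str]
    impact: int        # size and certainty of the expected business impact, 1-5
    feasibility: int   # readiness of data, technology and the organisation, 1-5
    alignment: int     # strategic alignment with broader objectives, 1-5

    def priority_score(self) -> float:
        # Reward use cases that score well across all three dimensions; a single
        # strong dimension (typically feasibility) should not carry the case.
        return min(self.impact, self.feasibility, self.alignment) \
            + 0.1 * (self.impact + self.feasibility + self.alignment)


churn_case = ValueCase(
    problem="Customer churn is rising in the broadband segment",
    proposed_solution="Churn-risk model feeding existing retention workflows",
    expected_outcome="Reduce monthly churn by 2 percentage points within 12 months",
    kpis=["monthly churn rate", "retention offer acceptance rate"],
    data_requirements=["billing history", "support interactions"],
    dependencies=["CRM integration", "retention team capacity"],
    impact=4, feasibility=3, alignment=5,
)
print(round(churn_case.priority_score(), 1))  # 4.2
```

Taking the minimum across the three dimensions mirrors the prioritisation logic above: a use case that is merely easy to build does not score highly unless the impact and strategic alignment are there too.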

Where a value case cannot be completed across these areas, the organisation is likely not ready to start building.

Enterprise Data Foundations: The Hidden Determinant of Scale

If outcome definition is the most important strategic shift, data readiness is the most important operational one. Most conversations about AI eventually become conversations about data – but in some cases, that conversation happens too far down the line.

Research predicts that through 2026, organisations will abandon 60% of AI projects unsupported by AI-ready data. Evidently, this is not just a readiness gap; it is the single largest determinant of whether AI initiatives scale or stall.

The challenge is that most organisations have data estates built for reporting and transactions, which can be insufficient for AI. Traditional data management – structured databases, ETL pipelines feeding data warehouses, quality rules focused on completeness and consistency – is necessary but not sufficient; additional capabilities are needed to leverage AI effectively.

AI-ready data demands aspects such as large-scale unstructured data processing, feature engineering and storage, data lineage and provenance tracking, real-time data integration, and governance frameworks that balance accessibility with control.

There are several dimensions of data readiness that organisations can assess:

Quality:

For AI, quality means more than accuracy and completeness. It includes representativeness (does the data reflect the real-world conditions the model will encounter?), timeliness (is the data current enough?), and label quality (for supervised learning, are the labels accurate and consistent?).
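A minimal sketch of how some of these dimensions might be checked automatically is shown below, assuming a tabular dataset with a label column and a timestamp column. The checks, column names and thresholds are illustrative; representativeness in particular usually needs a reference distribution from the production environment and is omitted here.

```python
import pandas as pd


def assess_ai_readiness(df: pd.DataFrame, label_col: str, timestamp_col: str) -> dict:
    """Rough, illustrative checks against a few of the quality dimensions above."""
    checks = {}
    # Completeness: overall share of missing values in the dataset.
    checks["null_rate"] = float(df.isna().mean().mean())
    # Timeliness: age of the most recent record in days (assumes timezone-naive timestamps).
    latest = pd.to_datetime(df[timestamp_col]).max()
    checks["days_since_latest_record"] = int((pd.Timestamp.now() - latest).days)
    # Label quality: identical feature rows that carry conflicting labels.
    feature_cols = [c for c in df.columns if c not in (label_col, timestamp_col)]
    conflicts = df.groupby(feature_cols)[label_col].nunique()
    checks["conflicting_label_groups"] = int((conflicts > 1).sum())
    return checks
```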

Accessibility:

Accessibility means that data can be discovered, accessed and used by AI teams without requiring extra work from data owners. This is often the most practically challenging dimension. Where data is locked in silos, subject to unclear ownership, or accessible only through manual extracts, it is unlikely that it will be able to support scalable AI.

Integration:

Integration addresses whether data from multiple sources can be combined reliably. The most valuable AI use cases require joining data across systems – customer data, transactional data, operational data and external data – in a way that is automated, repeatable and auditable.

Governance:

Governance ensures that data usage for AI complies with regulatory requirements, organisational policies and ethical standards. This includes data privacy, consent management and the ability to trace how data was used in model training and inference.

Without solid data foundations, each AI project becomes a bespoke data engineering exercise and project scope balloons unexpectedly. Teams spend months preparing data for a single use case, with none of that work reusable for the next initiative, so the organisation pays the full cost of data preparation every time, with no compounding benefit.

The alternative is to invest in a shared data platform that serves as the foundation for all AI initiatives. This means identifying the core data domains that will support the highest-priority use cases and building reusable, well-governed pipelines for those domains. Each successive AI initiative can then build on the existing foundation rather than starting from scratch.
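One way to make that reuse tangible is a simple registry of governed, domain-level pipelines that successive initiatives can draw on rather than rebuilding data preparation each time. The sketch below is illustrative only; the domain names and stubbed data are assumptions.

```python
from typing import Callable
import pandas as pd

# Registry of reusable, governed pipelines keyed by core data domain.
PIPELINES: dict[str, Callable[[], pd.DataFrame]] = {}


def register_pipeline(domain: str):
    """Register a governed pipeline for a core data domain so that later
    initiatives can reuse it instead of repeating the preparation work."""
    def decorator(fn: Callable[[], pd.DataFrame]):
        PIPELINES[domain] = fn
        return fn
    return decorator


@register_pipeline("customer")
def customer_features() -> pd.DataFrame:
    # In practice this would read from governed, documented sources; stubbed here.
    return pd.DataFrame({"customer_id": [1, 2], "tenure_months": [12, 48]})


# A churn model and a cross-sell model can both start from the same governed domain data.
churn_training_data = PIPELINES["customer"]()
```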

From Prototype to Production: Operationalising AI

The gap between a working prototype and a production-grade AI system is where the majority of AI investment is lost. A prototype proves that an idea can work, whereas production proves it can deliver value reliably, at scale, day after day.

A prototype might run in a Jupyter notebook on a data scientist's laptop, using a static dataset, with manual steps throughout the pipeline. A production system must handle live data with all its complexity and variability. It must be integrated with existing enterprise systems – the CRM, the ERP and customer-facing applications – monitored for performance degradation, secure, compliant and auditable, and maintainable and resilient.

Closing this gap is an engineering challenge, and it calls for engineering practices such as:

MLOps and AI engineering:

MLOps and AI engineering bring software engineering disciplines including continuous integration and delivery for models (CI/CD), automated testing of model performance against defined thresholds, model versioning and reproducibility, automated retraining pipelines, and monitoring and alerting for model drift (the gradual degradation in model performance as real-world data diverges from training data).
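As an illustration, two of these practices – automated testing of model performance against a defined threshold, and a simple drift check – might look like the sketch below. The metric, the thresholds and the use of scikit-learn and SciPy are assumptions, not a prescribed toolchain.

```python
import numpy as np
from scipy import stats
from sklearn.metrics import roc_auc_score

AUC_THRESHOLD = 0.80   # illustrative release gate, agreed with the business up front
DRIFT_P_VALUE = 0.01   # illustrative sensitivity for the drift alert


def performance_gate(y_true: np.ndarray, y_scores: np.ndarray) -> None:
    """Fail the CI/CD pipeline if a candidate model falls below the agreed threshold."""
    auc = roc_auc_score(y_true, y_scores)
    assert auc >= AUC_THRESHOLD, f"AUC {auc:.3f} is below the release threshold {AUC_THRESHOLD}"


def drift_alert(training_feature: np.ndarray, live_feature: np.ndarray) -> bool:
    """Flag drift when a live feature's distribution diverges from the training data
    (two-sample Kolmogorov-Smirnov test on a single numeric feature)."""
    _, p_value = stats.ks_2samp(training_feature, live_feature)
    return p_value < DRIFT_P_VALUE
```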

Platform thinking:

Platform thinking is the shift from building bespoke infrastructure for each AI initiative to building shared, reusable capabilities. A well-designed AI platform can provide common services – data ingestion, feature engineering, model training, model serving and monitoring – that any AI team can use. This dramatically reduces the time and cost of moving each new use case from prototype to production, because the infrastructure work is done once and shared across initiatives.

Integration and workflow design:

This addresses the production reality that AI models do not operate in isolation: they are integrated into existing business processes and systems. That means understanding the user workflow, designing the interaction between human and AI (what decisions does the model make autonomously, and where does a human review or override?), and building the technical integrations that allow the model's outputs to flow into the systems where decisions are made and actions are taken.
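A minimal sketch of that human/AI interaction design is shown below: the model acts autonomously only above a confidence threshold and routes everything else to a person. The thresholds and route names are assumptions to be agreed with the business owner of the workflow.

```python
from dataclasses import dataclass

AUTO_APPROVE_CONFIDENCE = 0.95   # illustrative threshold agreed with the business owner
HUMAN_REVIEW_CONFIDENCE = 0.60   # below this, the case is handled entirely by a person


@dataclass
class Routing:
    decision: str   # "auto", "human_review" or "manual"
    reason: str


def route_prediction(confidence: float) -> Routing:
    """Decide whether a model output flows straight into the downstream system
    or is queued for human review before any action is taken."""
    if confidence >= AUTO_APPROVE_CONFIDENCE:
        return Routing("auto", "high confidence: write the result to the workflow system")
    if confidence >= HUMAN_REVIEW_CONFIDENCE:
        return Routing("human_review", "medium confidence: queue for reviewer sign-off")
    return Routing("manual", "low confidence: fall back to the existing manual process")
```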

Change management:

A model that is technically excellent but that users don't trust, don't understand or don't want to use will still fail. Change management should be a core part of production deployment and include user training, clear communication about what the AI does and doesn't do, feedback mechanisms that allow users to flag issues, and a plan for how roles and responsibilities change when AI is introduced into a workflow.

McKinsey's research found that the redesign of workflows has the biggest effect on an organisation's ability to see EBIT impact from its use of GenAI, out of 25 attributes tested. This reinforces the point that operationalising AI is about redesigning how work gets done, rather than just deploying a model.

Governance, Risk and Compliance as Enablers, Not Blockers

Governance is the aspect of AI delivery most likely to trigger resistance from delivery teams. There is often a perception that governance means delays, committees, paperwork, and rigid risk management.

We see resistance most often where organisations have process-heavy governance frameworks that are poorly suited to the iterative, experimental nature of AI. However, resisting or neglecting governance in the AI space creates a significant risk that issues are uncovered only in production – where the consequences are most severe and costly.

The organisations that are succeeding with AI at scale have reframed governance as an enabler rather than a blocker. They have achieved this by making governance proportionate, embedded and automated.

Proportionate governance:

This means that the level of oversight is calibrated to the level of risk. A model that recommends blog articles to read requires a different governance posture than a model that makes credit decisions or informs clinical diagnoses.

A risk-based tiering framework, typically three or four tiers, from low risk to high risk, allows low-risk use cases to move quickly with lightweight review, while high-risk use cases receive the scrutiny they need. This can help to avoid the bottleneck of treating every AI initiative as if it were mission-critical.
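As a sketch, a tiering scheme of this kind can be as simple as mapping a few use-case attributes to a tier and a list of required controls. The tiers, attributes and controls below are assumptions for illustration, not a regulatory standard.

```python
# Illustrative three-tier scheme; tier definitions and controls are assumptions.
TIER_CONTROLS = {
    "low": ["peer review", "basic performance monitoring"],
    "medium": ["bias testing", "model documentation", "quarterly performance review"],
    "high": ["independent validation", "explainability report",
             "human-in-the-loop sign-off", "full audit trail"],
}


def classify_risk_tier(material_impact_on_individuals: bool,
                       regulated_or_safety_critical: bool) -> str:
    """Map simple use-case attributes to a governance tier with proportionate controls."""
    if regulated_or_safety_critical:
        return "high"
    if material_impact_on_individuals:
        return "medium"
    return "low"


# A blog-article recommender versus an automated credit decision.
print(TIER_CONTROLS[classify_risk_tier(False, False)])  # lightweight review
print(TIER_CONTROLS[classify_risk_tier(True, True)])    # full scrutiny
```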

Embedded governance:

This means building compliance checks into the development process rather than imposing them as a gate at the end.

This could be bias testing as part of model evaluation, data privacy assessments as part of data pipeline design, and explainability requirements as part of model selection. When governance is embedded, it avoids slowing delivery down and prevents the rework that comes from discovering compliance issues after the fact.

Automated governance:

This leverages tooling to enforce standards without human bottlenecks.

Automated checks for data quality, model performance thresholds, bias metrics and audit logging can be built into CI/CD pipelines, ensuring that governance is consistently applied without requiring manual review for every update.
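For example, a bias check could run as one automated step in the same pipeline that promotes the model. The sketch below uses demographic parity difference as the metric and an arbitrary limit; both are assumptions, and the appropriate fairness metric depends on the use case.

```python
import numpy as np

MAX_PARITY_GAP = 0.05   # illustrative limit on demographic parity difference


def demographic_parity_gap(predictions: np.ndarray, group: np.ndarray) -> float:
    """Largest difference in positive-prediction rate between any two groups."""
    rates = [predictions[group == g].mean() for g in np.unique(group)]
    return float(max(rates) - min(rates))


def bias_gate(predictions: np.ndarray, group: np.ndarray) -> None:
    """Automated pipeline step: block promotion if the gap exceeds the agreed limit."""
    gap = demographic_parity_gap(predictions, group)
    assert gap <= MAX_PARITY_GAP, f"Parity gap {gap:.3f} exceeds the agreed limit {MAX_PARITY_GAP}"
```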

The regulatory context is also evolving rapidly. The EU AI Act – the first comprehensive AI regulation globally – entered into force in August 2024, with obligations for general-purpose AI models becoming applicable from August 2025 and high-risk AI system requirements applying from August 2026.

While the UK is currently pursuing a more principles-based approach, the Bank of England and FCA's latest AI survey showed that 75% of financial services firms are already using some form of AI, up from 53% in 2022, and regulatory attention is intensifying across sectors. Organisations that have robust governance frameworks in place will find regulatory compliance a manageable extension of existing practice.

The takeaway is that governance is essential to successful AI projects. A lack of governance can erode trust with the board, create compliance risk that blocks deployment and generate rework when issues are discovered late.

Delivery Framework: From Ideation to Ongoing Value

Pulling together the principles described above, a practical delivery framework for AI initiatives should follow a structured sequence of phases, each with clear objectives, activities and decision gates. The framework below is designed to build the production pathway into the process from day one, preventing the common failure mode where a successful pilot has nowhere to go.

Phase 1: Discovery and Value Framing

This phase identifies and prioritises AI opportunities based on business impact, feasibility and strategic alignment. Key activities include stakeholder engagement to identify pain points and opportunities, assessment of current data and technology readiness, development of value cases with defined KPIs, and prioritisation of the use case portfolio. It is important to first identify use cases and then pragmatically assess how AI fits as a solution, rather than trying to fit AI to every use case.

The output is a prioritised backlog of AI initiatives with clear value cases.

Phase 2: Data Assessment and Preparation

Before any modelling begins, this phase evaluates the data required for the target use case and closes any readiness gaps. This can include data discovery and profiling, quality assessment against AI-readiness criteria, data pipeline design and engineering, and establishment of data governance and access controls.

The output is an AI-ready data foundation for the target use case, with reusable components where possible.

Phase 3: Solution Design and Prototyping

This is where the AI solution is developed and validated, with the production pathway designed alongside it. Activities can include model or architecture selection and development, iterative training and evaluation against defined success criteria, production architecture design (infrastructure, integration, monitoring), and governance and compliance review against the relevant risk tier.

The output is a validated AI solution and a production deployment plan.

Phase 4: Production Engineering

This phase builds the infrastructure and integrations required to operate in production. This can include deployment pipeline engineering (CI/CD, automated testing), system integration with enterprise applications and workflows, monitoring and alerting setup (model performance, data drift, system health), security hardening and compliance controls, and user interface or workflow design.

The output is a production-ready system that is fully integrated and monitored.

Phase 5: Deployment and Adoption

This phase focuses on operationalising AI. It can include a phased rollout (typically starting with a limited user group), user training and change management, feedback collection and issue resolution, and performance tracking against the KPIs defined in Phase 1.

Phase 6: Monitoring, Optimisation and Scaling

This ongoing phase ensures the model continues to deliver value and identifies opportunities to extend or replicate the approach. This can include continuous performance monitoring, model retraining and updating as data and conditions evolve, value tracking and reporting, and identification of adjacent use cases that can leverage the same data and infrastructure.

The critical design principle of this framework is that production readiness is a consideration that runs through every phase from the beginning. Data engineering, infrastructure design, governance, integration planning and change management all start early, rather than after the solution has been built.

Case Studies: From Pilot to Production in Practice

Organisations across sectors are demonstrating what it looks like to move AI from experimentation to scaled production value and, equally instructively, what happens when the foundations are missing.

Rolls-Royce:

Rolls-Royce represents one of the most comprehensive examples of an organisation that has moved beyond pilot purgatory to embed AI across its entire value chain, from engine design through manufacturing to in-service maintenance.

Rather than pursuing disconnected experiments, Rolls-Royce established R2 Data Labs as a dedicated centre of excellence for data innovation, providing the centralised expertise and shared infrastructure that individual business units could draw on. Critically, it invested in a central data platform – building the reusable data foundations that allow each new AI use case to build on the last rather than starting from scratch.

The results span three distinct domains. In design engineering, Rolls-Royce uses generative AI and advanced simulation on Microsoft Azure to explore broader design parameters in hours rather than the years that manual processes required. In manufacturing, their "Signature Analyzer" tool uses AI and machine learning to predict defects in turbine blade production, dramatically reducing the processing time for inspecting the roughly two million cooling holes produced monthly – a task that was previously a significant manual bottleneck. And in after-market operations, AI-driven engine health monitoring tracks over 10,000 engine parameters in real time, while the company has achieved a 30% increase in machine utilisation through optimised scheduling and digital inventory management.

The company detects and prevents around 400 unplanned maintenance events annually, saving millions in repairs, and has accelerated fault resolution from days to near real time.

The company invested in shared data infrastructure, established governance frameworks appropriate to safety-critical aerospace applications, built cross-functional delivery capabilities, and designed for production from the outset. Each AI initiative builds on the platform and data foundations established by the last, creating the compounding returns that define a mature AI operating model.

JPMorgan Chase:

Financial services is one of the sectors where the stakes of pilot purgatory are highest and where the regulatory environment makes the leap to production most demanding.

The bank committed a $17 billion technology budget in 2024, with approximately $1.3 billion directed specifically at advancing AI capabilities and a further $3.1 billion allocated to modernising cloud infrastructure and data platforms. This dual investment reflects that advanced AI models are ineffective without modern, scalable data infrastructure to support them. As of 2024, 80% of JPMorgan's applications have been moved out of legacy data centres and 90% of its analytical data resides on public cloud platforms.

The bank now has over 450 AI use cases in development, but its approach to scaling is deliberate. In asset and wealth management, the organisation's "Coach AI" tool – which enables advisors to surface relevant research and personalised recommendations using natural language – improved response times by 95% during periods of market volatility. AI-driven tools contributed to a 20% increase in gross sales between 2023 and 2024. Generative AI tooling has been deployed to over 200,000 employees, with more than half using it multiple times per day.

Operating in one of the most heavily regulated industries in the world, JPMorgan has built compliance, security and auditability into its AI infrastructure from the ground up, demonstrating that robust governance and rapid scaling can be complementary.

The NHS AI Lab:

Launched in 2019, the NHS AI Lab was one of the most ambitious national AI programmes in the world, with the explicit goal of accelerating safe AI adoption across health and social care. A peer-reviewed evaluation published in Nature's npj Digital Medicine found that the AI Lab made important contributions to national AI policy, regulation and capability building. However, implementation and scaling were hindered by shifting objectives, limited capacity and systemic misalignment with service needs.

A separate study by UCL researchers found that contracting for AI diagnostic tools took between four and ten months longer than anticipated, and 18 months after contracting was meant to be completed, a third of hospital trusts were not yet using the tools in clinical practice. The barriers were not primarily technological; key challenges included engaging clinical staff with already high workloads, embedding new technology in ageing and varied NHS IT systems across dozens of hospitals, and a general lack of understanding and scepticism among staff about using AI in healthcare.

This is “pilotitis” at a national scale: significant investment and genuine technological promise, but an inability to move from experimentation to production because the data foundations, integration pathways, change management and governance frameworks were not adequately addressed upfront. The lesson is that technology readiness alone is not sufficient – without the organisational machinery to operationalise AI, including the workflows, the infrastructure integration, the staff engagement and the governance, even well-funded initiatives can stall.

However, what did work was national programme leadership, local imaging networks sharing resources and expertise, high levels of commitment from the hospital staff leading implementation, and dedicated project management – the building blocks of delivering AI successfully.

The Evidence Pattern

BCG's research across 1,000 senior executives found that AI leaders successfully scale more than twice as many AI products and services across their organisations. The pattern is fewer, better-chosen initiatives, with deeper investment into the foundations required to take them to production.

AI leaders follow a ratio of roughly 10% of resources to algorithms, 20% to technology and data, and 70% to people and processes. This inverts the common instinct of over-investing in the technology while under-investing in the organisational change required to leverage it effectively.

Over a three-year period, AI leaders achieved 1.5 times higher revenue growth, 1.6 times greater shareholder returns, and 1.4 times higher returns on invested capital compared with their less AI-advanced peers. BCG's 2025 follow-up research found that the gap is widening, with "future-built" companies pulling further ahead of those still struggling to scale.

Building a Repeatable, Scalable AI Operating Model

Individual AI projects that are delivered well can create value. However, an AI operating model creates compounding value, with each initiative building on the capabilities established by the last, making the next one faster, cheaper and more likely to succeed.

An operating model provides the shared infrastructure, processes, governance, talent and leadership structures that allow AI to be delivered at scale, as opposed to a series of individual investments.

Research drawing on more than 200 at-scale AI transformations identifies six dimensions essential to capturing value from AI: strategy, talent, operating model, technology, data, and adoption and scaling. Organisations that invest across all six consistently outperform those that focus on technology alone.

The key components of a scalable AI operating model include:

A centre of excellence or distributed capability model:

Whether centralised, federated or hybrid, a clear structure for how AI expertise is organised, deployed and developed can significantly contribute to success. A centre of excellence can provide shared services, standards and governance, while domain-embedded teams bring the business context required for effective delivery. The most effective models combine elements of both.

Reusable platforms and tooling:

The AI platform – encompassing data infrastructure, development environments, deployment pipelines and monitoring – is what transforms AI into an efficient and effective capability, with every investment in shared tooling reducing the marginal cost of the next AI initiative.

Standardised governance and risk management:

A common governance framework, with risk-based tiering and embedded compliance, provides the consistency and transparency that leadership and regulators require. This can include clear roles and responsibilities, standard assessment templates and automated compliance checks.

Talent strategy:

As noted above, AI leaders allocate roughly 10% of resources to algorithms, 20% to technology and data, and 70% to people and processes. The talent dimension therefore encompasses recruitment, development, retention and upskilling – not just for data scientists and engineers, but for the business professionals who will work alongside AI and the leaders who will govern it.

Continuous learning and improvement:

An operating model is a living system that evolves as the organisation's AI maturity grows. This means regular retrospectives on delivery performance, active tracking of lessons learned, and deliberate experimentation with new tools, techniques and approaches.

AI maturity can be seen as a progression. At the earliest stage, AI is ad hoc, with individual experiments driven by enthusiastic teams. At the next stage, capabilities begin to coalesce: platforms and governance take shape, and a small number of use cases reach production. At the most advanced stage, AI runs as a fully fledged operating model, with a steady pipeline of use cases moving from ideation through to production and ongoing optimisation, each one faster and more efficient than the last.

Conclusion

"Pilotitis", or "pilot purgatory", is often not a technology problem. It is more a strategy, data, operations and governance problem that manifests in the technology layer. The technology itself has matured dramatically – large language models, computer vision, predictive analytics and agentic AI systems are all capable of delivering transformative business value. However the gap is in the organisational machinery required to leverage them effectively.

Closing that gap requires a number of things: starting with clearly defined business outcomes rather than technology capabilities; investing in data foundations that support AI at scale; building the production pathway into every initiative from day one; implementing governance that is proportionate, embedded and automated; and developing a repeatable operating model that makes each successive AI initiative faster, cheaper and more likely to succeed.

Organisations that have successfully scaled AI share one thing in common: they stopped treating it as a technology programme and started treating it as an operational capability. That shift – in mindset, in structure, in how value is measured – is ultimately what separates the pilot from the product.