Episode #1 | AI in Practice: From Pilot to Production

Audacia

26 May 2026 - 47 min read

Recent figures suggest that up to 88% of AI proofs of concept fail to reach live deployment - more than double the failure rate of traditional IT projects.

In this episode of Technically Speaking, host Richard Brown is joined by Amy De-Balsi, an independent programme manager working in a highly regulated public sector environment, and Adam Brookes, Head of Consulting at Audacia, to explore why AI initiatives so often stall between proof of concept and production, and what it actually takes to close that gap.

Grounded in real projects and first-hand experience, the conversation covers the full lifecycle of an AI initiative: from early ideation and feasibility, through pilots and governance, and into live deployment.

In this episode:

  • Data readiness and why it needs to come before AI strategy
  • Governance, guardrails and how they can support innovation
  • Human-in-the-loop processes and why they remain essential
  • How to define success when AI outputs are less predictable

Speakers

Richard Brown

Richard Brown is the Technical Director at Audacia, where he is responsible for steering the technical direction of the company and maintaining standards across development and testing.

Amy De-Balsi

Amy is an independent Programme Manager with a track record of delivering data programmes and strategic initiatives across government and healthcare, including end-to-end management of AI pilots from PoC to production, working with technology and business teams, senior leadership and suppliers to drive innovation and ensure secure delivery.

Adam Brookes

Adam is Head of Consulting at Audacia, specialising in delivering advice and strategic roadmaps for the delivery of technology projects across engineering, data, AI and cloud.

Transcript

00:00:03 Richard Brown: Welcome to Technically Speaking, a podcast from Audacia where we get into the conversations happening within technology leadership. I'm Richard Brown, Technical Director at Audacia, and over the course of this series, I'll be sitting down with guests that are shaping how organisations think about and apply technology. Each episode will dig into a different topic, whether that's navigating AI governance, rethinking what technology leadership looks like, or building the infrastructure and culture that makes it all possible. The conversations will be honest, substantial, and genuinely useful.

00:00:38 Richard Brown: Okay, in this first episode, I'm joined by Amy De-Balsi and Adam Brookes, and we're talking about taking AI from POC to production. Hope this is a conversation you find interesting. So thanks both for being here. Just for anyone who isn't already familiar with you both, can you just give a quick intro to yourself and your background? Amy, start with you.

00:00:56 Amy De-Balsi: My name is Amy. I am an independent programme manager and I am currently working for an arms-length body in the health space. I've been working on AI now for 15 months and I've taken two proof of concepts through to live and have another four being developed by the teams at the moment.

00:01:14 Richard Brown: Excellent. Adam.

00:01:16 Adam Brookes: I'm the head of consulting at Audacia. I've been at Audacia for about 10 years now. My background's mainly in sort of software engineering, solution architecture, and then I moved more into delivering projects from a technical and DM point of view. So I head up the consultancy team at Audacia, who is the team that is responsible for shaping these initial POCs and sort of getting them ready to go into that production state.

00:01:38 Richard Brown: Excellent. Yeah, thanks both and thanks again for being here. So like I said at the start today, we're talking about taking AI projects from that initial proof of concept phase right through to production. So probably the best place to start is, you know, you've both got a wealth of experience building quote unquote traditional software projects. What makes AI projects different? What do you need to do differently to actually make them a success?

00:02:02 Amy De-Balsi: So the first one I was involved with was done under what's called Innovation Hour, which was a construct by the Chief Digital Technology Officer. And she deliberately got rid of all the rules. And to do that, she was like, let’s just look at AI and let’s see if we can actually find a use case and put it in and build a proof of concept, which is when you guys got involved. But it was done outside of any existing processes. And it was really, really useful to just prove the fact that, yes, we could get an idea, take it through to a proof of concept and then get it through to production. Now, because we’ve been so under the radar about the very first bit, the productionisation bit was a bit harder. But it meant that we learned, we learned really fast and we got the benefits really fast.

00:02:48 Adam Brookes: Yeah, I think it’s interesting. When you’re looking to build a more traditional software engineering project, I suppose you sort of have a vision for a product or a piece of software and you can sort of define the requirements for that in quite a deterministic way. I guess the interesting thing about doing this, especially in the large language model space, is just how non-deterministic it is. So you have to almost reframe how you’re defining those requirements. It’s almost, you know, define what good looks like, then try and understand how close you’re able to get. But you’re almost evaluating against less traditional metrics, I suppose. And you’re also bringing in the data a lot earlier in the cycle. So we’ve done a lot of reflection around how this is actually different to a normal software project, and getting access to the data, and making sure it’s of a high enough quality and volume to be usable by AI, is almost done in pre-discovery. Because if you can’t get access to that, then you can’t build AI.

00:03:45 Adam Brookes: And you tend to find that if you’re going on a journey and you do some use case ideation to understand how AI could potentially have an impact across the organisation, quite often you almost start to see pockets of data that actually unlock several potential use cases, I suppose. So if you’re doing that sort of initial road mapping, you kind of know, well, actually, before we can start exploring how AI can help with this, we’re going to have to do some engineering work, we’re going to have to do some data work to actually get to a place where we’re able to make it available.

00:04:13 Richard Brown: So it’s part of that proof of concept, it’s part of the concept that you’re proving that we have the data, it’s of high enough quality, we can kind of get it into the right place in order to build on top of it.

00:04:24 Amy De-Balsi: Yes, and also then during particularly the alpha phase, it’s about testing that the AI can actually return the right results that a human would do. So we do a lot of comparison around, so the models produce this as an answer, what would you produce as an assessor or a human?

00:04:43 Richard Brown: And how does that then evolve as time goes on and new models come out and the data’s potentially changing over time? Like is that what productionisation means, that kind of, yeah, I guess enabling this to be used over a longer period of time?

00:04:57 Adam Brookes: I mean, I guess it’s a couple of things. So generally when you’re doing initial proof of concept, you’re probably opening up this idea or this solution to a smaller part of the organisation. You’re almost picking some champions to validate that it’s going to work. I think on your point about models changing and data changing, a really key point is to understand how you’re going to measure what good looks like, especially with these non-deterministic methods. So, you know, you almost need to, before you start developing, create a framework, and if possible, use more traditional software engineering techniques to be able to automate and run these as the project progresses. Ultimately, you’ve got a language model producing this result, here’s our gold standard, and there’s almost several metrics you can use to compare the two. So at that point, if you need to upgrade your model, change your model, change the data, or even change slightly how you’re doing it, you’re able to run a suite of automated tests and be able to say, well, we’ve gone from it averaging 8 out of 10 to averaging 7.5, that’s okay.
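
As an illustration of the automated comparison Adam describes, here is a minimal Python sketch, not the framework used on the project. It assumes each gold-standard test case has already been scored from 0 to 10 by some evaluation step; the function name `regression_check`, the scores and the tolerance of 1.0 are all hypothetical.

```python
from statistics import mean


def regression_check(baseline_scores, candidate_scores, tolerance=1.0):
    """Compare average scores before and after a change.

    Scores are assumed to be 0-10 ratings of model output against a
    gold-standard answer. Returns (passed, baseline_avg, candidate_avg).
    """
    baseline_avg = mean(baseline_scores)
    candidate_avg = mean(candidate_scores)
    # Accept the change if quality has not dropped by more than the tolerance.
    passed = candidate_avg >= baseline_avg - tolerance
    return passed, baseline_avg, candidate_avg


# Hypothetical scores for the same five test cases, before and after a model upgrade.
baseline = [8, 9, 7, 8, 8]
candidate = [7, 8, 7, 8, 7.5]

passed, before, after = regression_check(baseline, candidate)
print(f"before={before}, after={after}, acceptable={passed}")
```

With these made-up scores the average drops from 8 to 7.5, which stays inside the tolerance, so the change would be accepted. A larger drop would fail the check and prompt a closer look before the new model or data goes live.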

00:05:59 Richard Brown: So it’s kind of automated testing, but moving away from the assert, this is true, more towards a kind of more probabilistic kind of outcome or kind of comparison of like, is it better, is it worse?

00:06:11 Adam Brookes: Yeah, I mean, it’s different techniques for doing it. You can sort of use an LLM as a judge itself, but then who’s testing that LLM, you sort of go on forever. But there are more statistical ways that you’re able to almost compare the similarities almost like a numerical level as well.
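
The conversation does not name a specific metric, so the following is only one possible reading of a "more statistical" comparison. It is a minimal Python sketch that scores a model answer against a gold-standard answer using cosine similarity over plain word counts. The word-count vectors are a deliberate simplification and an assumption of this sketch; the same formula could be applied to embedding vectors instead.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count each word (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(answer, gold_standard):
    """Return a score from 0.0 (no shared words) to 1.0 (same word distribution)."""
    va, vb = bag_of_words(answer), bag_of_words(gold_standard)
    # Dot product over the words that appear in both texts.
    dot = sum(va[word] * vb[word] for word in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(count * count for count in va.values()))
    norm_b = math.sqrt(sum(count * count for count in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty text has nothing to compare.
    return dot / (norm_a * norm_b)


# Hypothetical model output and assessor answer.
model_answer = "The report is missing a signed consent form"
assessor_answer = "The signed consent form is missing from the report"
print(round(cosine_similarity(model_answer, assessor_answer), 2))
```

Unlike an LLM acting as a judge, a score like this is repeatable for the same inputs, although word overlap alone will not catch answers that use similar words to say something different.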

00:06:29 Amy De-Balsi: Often with normal software projects, you kind of deliver it when you’ve met all the requirements and then you’re done. And then you just do a bit of iteration depending on if those requirements change. Now, even during the cycle when we were building, GPT changed their models and they deprecated one. So we had to go through the process of going, right, this is going to happen quite often. And the way I think about AI delivery now is much more about a living and breathing thing. You can’t just go, yep, done, right, off to the next one. You’ve got to look after it, understand what the changes are going to be from GPT, so you can keep up to date with what’s happening.

00:07:05 Adam Brookes: Yeah, I think one point as well is that because so much of what’s happening in the AI space right now is large language models, agentic AI, I guess we can default to talking about that particular area of AI, but you know, there’s a lot more to AI than just large language models. Actually, a lot of value can happen from more predictive models, classification models. And I guess the same principles ultimately apply. I think it’s easier to be able to understand and have an evaluation set, to be able to say this model is X percent accurate to where we need it to be. But it’s the exact same thing about having something that’s maintaining that and consistently evaluating if it’s still reaching those levels of performance as situations change.

00:07:47 Amy De-Balsi: But it’s also about human in the loop. So we’ve always been very, very clear that whatever AI is put into the organisation, that there is a human that reviews that and they make the decision, not the AI. So everything is reviewed before it goes out. And if we have 70% of the right information through, that’s a massive productivity gain.

00:08:11 Richard Brown: So I come back to something you mentioned earlier, Amy, about the productionisation phase. You said that’s when it kind of got a bit harder. Is that because it sounds like you were very empowered at the beginning — you had a lot of freedom to innovate and to kind of make mistakes and things. Was the productionisation phase where some more standards and rules came in and you’d have to conform a little bit more?

00:08:35 Amy De-Balsi: Yeah, so there’s really stringent service transition. For something to become a live service within the organisation, it has to be checked off by all the relevant people and it’s very stringent, and we had to go through that process because it was going to be a live service. So yes, you have a lot of freedom at the beginning, but then we had to go, right, have we got all the paperwork in place? Have we got the sign off from testing? But our biggest blocker and the bit that put the biggest gap in terms of our delivery was around cyber testing. So we use a third party cyber testing organisation. It’s so new that there aren’t that many AI cyber testers globally.

00:09:16 Richard Brown: So we’re looking at pen testing for LLM based applications, that kind of thing?

00:09:20 Amy De-Balsi: Yeah, because your attack surface gets a lot bigger. And so I think in the cyber testing organisation that we partner with, there was 10 individuals that could do it globally last year. This year there’s 20. So they’re building up the skills, but it’s a really new arena and they’re learning as we are.

00:09:47 Richard Brown: And I guess the development of the system itself — all of those skills are in their infancy. So were there things that came out of that, actually we had no idea that was going to happen, or did it kind of get through with a fairly clean bill of health?

00:09:55 Amy De-Balsi: It got through with a really clean bill of health actually. And so I’ve had two or three tested now and there’s never been anything that’s been a big red flag at all. But it’s very interesting talking to those people because they talk about poisoning and hallucination. You’ll know this stuff better than I will, but yeah, the risk is so much higher with AI.

00:10:17 Adam Brookes: Yeah, I mean, it’s just a whole new area of potential attack vectors, isn’t it? You’ve got to kind of protect against yourself. And I think a key thing to consider is also sort of evaluating where that risk is. So obviously if you’ve got an internal chatbot tool you’re using to create some efficiencies, the attack surface is a lot smaller than if you’ve got a public facing tool.

00:10:42 Richard Brown: Something else you were talking about was, it sounds like you were halfway through development and OpenAI deprecated a model — almost like, right, okay, this is the new normal. Does that mean when you’re kind of putting together a business case for an AI system, are you having to kind of build in more like operational costs because actually the maintenance overhead is just that much more?

00:11:05 Amy De-Balsi: Yeah. And also if you’re processing large amounts of data, there’s that processing cost as well. So if you want to upload 20,000 reports, there is a cost associated to that. So you have to factor in essentially FinOps right from the beginning.

00:11:22 Adam Brookes: Yeah, I mean, it’s interesting. So just going on that sort of ongoing OpEx costs of these kind of projects, I think a common trap that I see certain organisations fall into is when they start that POC, they’ve got a particular level of budget, but they’re not understanding how that’s going to scale from an infrastructure perspective or a token cost perspective when you’re using production levels of data or usage. So something that at the start seems like it’s delivering loads of benefit and financial value — when you take that into account, the trade-off is not particularly worth it. It’s definitely worth being conscious of, because I think a lot of these LLM providers have been definitely giving subsidies. And I think that is going to change once they’ve got us all hooked on language models.

00:12:12 Richard Brown: Yeah, so I guess that return on investment calculation — how do you, maybe it’s an impossible question to answer, but how do you kind of factor in that already the providers are kind of doubling, tripling kind of prices? Like how do you kind of factor those future events in?

00:12:30 Adam Brookes: I mean, I think you’ve got to look at the market, look how it’s going and factor in a sensible proportion increase year on year, month on month, and make sure that in the worst case scenario — let’s assume that token costs go up by 50%, 100%, whatever that might be in the next three years — actually, is this still financially viable? But I think obviously at the moment everyone’s going for those large language model solutions because they’re so easy to use and so powerful. But there’s definitely also the potential that we’ll move back to smaller models or even more local based models, and hosting these things yourself rather than using cloud platforms will become more of a norm.
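
The worst-case check Adam describes can be sketched as a small calculation. This is an illustrative Python example only: the token volume, unit price and annual benefit are invented numbers, and `still_viable` is a hypothetical helper rather than anything used on the projects discussed.

```python
def annual_token_cost(tokens_per_year, price_per_million_tokens):
    """Cost of a year's usage at a given price per million tokens."""
    return tokens_per_year / 1_000_000 * price_per_million_tokens


def still_viable(annual_benefit, tokens_per_year, price_per_million_tokens,
                 price_increase, fixed_costs=0.0):
    """Check whether the benefit still exceeds the cost after a price rise.

    price_increase is a fraction: 0.5 means token prices go up by 50%.
    Returns (viable, stressed_annual_cost).
    """
    stressed_price = price_per_million_tokens * (1 + price_increase)
    cost = annual_token_cost(tokens_per_year, stressed_price) + fixed_costs
    return annual_benefit > cost, cost


# Hypothetical figures in arbitrary currency units.
benefit = 18_000            # estimated value of time saved per year
tokens = 2_000_000_000      # production-level token usage per year
price = 5.0                 # current price per million tokens

for increase in (0.0, 0.5, 1.0):
    viable, cost = still_viable(benefit, tokens, price, increase)
    print(f"price +{increase:.0%}: cost={cost:,.0f}, viable={viable}")
```

In this invented scenario the case survives a 50% price rise but not a doubling, which is the kind of result worth knowing before committing to production.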

00:13:11 Amy De-Balsi: One of the things that we are now having to deal with, because we’ve got two tools that are live within the organisation, is that it’s created this real buzz around AI and we are literally inundated with people with use cases. So we’re having to be really careful about where we place the investments. Initially, when we started this process 18 months ago, it was let’s find the group that’s willing to work with us and be experimental. That’s gone completely the other way. Now we’ve got so many people with really valid use cases, so we need to be really selective about who gets that investment to give the biggest impact, particularly towards productivity.

00:13:48 Richard Brown: So is that the main factor — what are the benefits and trying to quantify those in terms of kind of time saved, or what other things kind of come into that?

00:13:57 Amy De-Balsi: Yeah, so we’re doing quite a lot of benchmarking right in the pre-discovery phase around how long it takes somebody to do something. And it’s quite hard for some people to actually say, oh, well, for my job, they find it really hard to quantify their job. And they also think, well, what are you using that for? But all we want to know is if it takes you three hours now and we give you a tool, how much time has it saved overall? And what does that quantify in terms of the wider benefit to the organisation?

00:14:29 Richard Brown: In terms of, like, moving that individual onto more valuable work and doing things that you otherwise couldn’t get round to doing?

00:14:36 Amy De-Balsi: Yeah, so the whole thought is that it’s about freeing people up to spend more time on higher value activity and getting rid of the dross, essentially. It’s not about getting rid of people. In the organisation I work with, specialists take five years to train. It’s a heavy investment in the people, so there’s no incentive to get rid of that.

00:15:02 Richard Brown: I think that’s a really important point. There’s a lot of panic out there, and I think a lot of misinformation that it’s going to automate everything, it’s going to take everyone’s jobs. But Adam, from your point of view, talking to lots of different clients, is that primarily what you see, that it’s more about getting individuals onto more valuable work?

00:15:21 Adam Brookes: Yeah, I think so. I think most people that we speak to have areas of their job that they find far less rewarding, or they know it’s less value add and the organisation knows it’s less value add. So that’s definitely one. I think another part of it is the barrier to entry for certain jobs. So for jobs that previously would have taken far more training, by using some AI technology you’re able to onboard people a lot quicker. I guess there are also a couple of other areas where AI can add value. One is just increasing general quality, but the problem with that is it’s far less tangible and a lot harder to measure. And the other is doing things more safely. I think it can add value there.

00:16:10 Richard Brown: I guess that’s a really good kind of segue. I was going to talk about guardrails — whether you work in kind of a heavily regulated environment or, I know Amy, a lot of your work kind of has to be correct. I suppose that’s a really important aspect of it. How do you balance those kind of guardrails and that governance with actually, you do want to move fast and you do want to innovate?

00:16:33 Adam Brookes: I think it’s a bit of a common misconception. If done right, then I think that guardrails are the things that can allow you to move faster. So I think the key thing with AI, if you’re talking about some of the more common tools like Copilot, we sort of know that giving tools to people, they’re the ones most likely to be able to innovate with it and think of new ways for it to help their day-to-day roles, as opposed to someone higher up deciding that’s how they should use a tool. But at the same time, most organisations are only comfortable doing that if they’ve got guardrails in place. So the organisation’s been really explicit in terms of what data you can put into this and what data you can’t. And going back to Amy’s point earlier, making sure that people are aware that they are fully accountable for the output of this — there’s no falling back, oh, sorry, AI produced it. If you’re going to use that tool, you’re responsible for what it’s producing. So I think done right, they can almost accelerate that innovation and adoption, and done wrong, they can almost stifle it.

00:17:46 Amy De-Balsi: But deploying Copilot within the Azure estate actually gave us a lot more control in the organisation that I work with because it meant that the data — you knew the data was staying on the estate. And the worry if you haven’t got a tool that somebody can use that’s safe is that they’re going to use the public tools and accidentally, unwittingly train models without knowing. And you’ve got data leakage at that point.

00:18:25 Richard Brown: Yeah, I mean, one of the biggest drivers, one of the biggest positive indicators of adoption, is actually having some rules in place. Because without rules, people are potentially paralysed by fear: I don’t know if I’m allowed to use this, so I’m not going to use it. But Adam, you were talking about accountability and Amy, you’ve already mentioned having that human in the loop. Is that a really important aspect of making sure that ultimately you don’t have an AI system just autonomously making decisions?

00:18:39 Amy De-Balsi: So essentially, agentic AI is kind of the next step that we need to take. However, how you do that with legacy systems is something we haven’t looked at yet. So we’re kind of doing the relatively easy stuff first with large language models. But I do think you’ll get real benefit from agentic AI. I just think there’s quite a lot to think through before you can deploy it.

00:19:06 Adam Brookes: Yeah, I think that quite often when we’re supporting clients with these kind of projects, it’s the case that we will build a roadmap that ultimately has got multiple phases. And you almost want to trial and validate that if you were to automate a process, it would work effectively. And the best way to do that actually is to start off with a phase that keeps the human in the loop, so you’re able to gather feedback where it’s potentially not made the right decisions or the outcome is not as expected. I think from there you can sort of get some really quantifiable data in terms of how often it’s doing the right thing and how often it’s not. I think having that in place gives organisations a lot more confidence in terms of being able to fully automate something end to end.

00:19:48 Adam Brookes: I think it’s quite important as well when we talk about agentic AI — with these systems, they come to several sort of decision points almost, where they can go down different paths and different things can happen. So I think by really mapping out the processes and the path that it might take, you can almost identify pinch points where the risk is highest. And I think that in those pinch points, what works quite well is trying to think of if there’s more deterministic ways you’re able to decide what should happen at that particular point in time.

00:20:22 Richard Brown: So kind of jumping out of the AI into just the traditional.

00:20:26 Adam Brookes: The hybrid, you know. And maybe ultimately those high risk points you want to get to a place where you’re able to have technology that’s able to give you a confidence level back of I am X percent confident. Obviously, if you’re asking an LLM how confident are you, it’s going to say I’m positive mate.
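
The hybrid approach Adam outlines for high-risk decision points could be sketched roughly as follows. This is a hypothetical Python example under stated assumptions: `route`, `Decision` and the 0.9 threshold are invented for illustration, and the confidence value is assumed to come from some calibrated, measured source rather than from asking the language model how sure it is.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    action: str
    decided_by: str  # "rule", "model" or "human_review"


def route(rule_result: Optional[str], model_action: str,
          confidence: float, threshold: float = 0.9) -> Decision:
    """Decide what happens at a high-risk point in an automated process.

    rule_result: outcome of a deterministic check, or None if no rule applies.
    model_action: what the AI component proposes to do.
    confidence: assumed to be a calibrated score between 0 and 1.
    """
    # 1. Prefer a deterministic answer whenever one is available.
    if rule_result is not None:
        return Decision(rule_result, "rule")
    # 2. Below the threshold, the proposal goes to a person to decide.
    if confidence < threshold:
        return Decision(model_action, "human_review")
    # 3. Otherwise the model's proposal proceeds automatically.
    return Decision(model_action, "model")


print(route("reject", "approve", 0.99))   # a rule overrides the model
print(route(None, "approve", 0.95))       # confident enough to automate
print(route(None, "approve", 0.50))       # sent to a human reviewer
```

Logging the `decided_by` value for every case would also give the kind of quantifiable data discussed earlier about how often the system can be trusted to act on its own.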

00:20:44 Amy De-Balsi: But you do make a very good point. We were really lucky because we’ve had users with the pilot projects now working with them for about four or five months. So we’ve had user research go back to those teams and really find out what they find useful and what they don’t. What we learned is that barely anything said was about the AI — it was all about the service design. So where the friction points for them were around the process, and it gave us a lot of learning around if we’re going to do this again, we need to look at the end-to-end service rather than doing point solutions because you think that’s going to give you the benefit. Actually, yes, it may do because it’s all shiny and we’re meant to be doing AI at the moment. But actually, if you look at an end-to-end service, you might actually get more productivity gain just by doing a normal software project.

00:21:34 Adam Brookes: Agreed. I think a lot of these things end up being, I’d say, 70% software engineering or data engineering anyway — that’s the reality when we actually get to production. Because you’re either wrapping a large language model in something people can interact with, or like Amy says, managing it as part of a process, or you’ve got a machine learning model which again is being consumed by something.

00:22:02 Richard Brown: Yeah, I guess that’s going back to your initial point. It’s about kind of getting the data in place. There’s probably a huge amount of data engineering needing to happen to move data around, to cleanse it, to kind of anonymise it maybe, depending on how you’re processing it.

00:22:15 Adam Brookes: Yeah, I think obviously coming back to security, making sure that it’s labelled appropriately as well. You know, there’s a lot of making sure you understand where that sensitive information is and how it’s going to be surfaced through these AI tools.

00:22:29 Richard Brown: Adam you’ve mentioned kind of more traditional machine learning kind of predictive models a couple of times. And is there anything different? We started by talking about how do you successfully get AI projects from POC to production, focusing more on kind of LLM applications. Is there anything different when it comes to the more kind of predictive machine learning type applications or do the same rules apply?

00:22:53 Adam Brookes: I mean, we keep talking about data, but I think that those types of projects, data is even more crucial because ultimately you are leveraging historical data to then be able to classify future data, to predict future data. So quite often with those projects, we’ll start off with more of a technical feasibility stage rather than a POC stage. So, you know, we’ve had a conversation where a client said, wouldn’t it be great if we could predict the number of dogs that will come into my vets next week. So our first question is, well, do you record the type of animal that comes through the door? Maybe not the best analogy, but if you haven’t, then quite often it’s the case that from those conversations we might identify several use cases that we know will add value, but they haven’t got the data for yet. So actually, the short-term plan is, well, how can we support you in capturing that data and hopefully in a structured manner? So again, it’ll come back to a data project, maybe a software engineering project — so almost facilitating those kind of prerequisites.

00:24:00 Richard Brown: And is that, could there sometimes then be a time lag of actually we need to go away for six months and build a data set?

00:24:05 Adam Brookes: 100%, yeah.

00:24:08 Richard Brown: Kind of quite early on, Amy, you were talking about leadership. Sounds like you had freedom to innovate — you were very empowered. How important was that in that particular use case, that kind of almost the opposite of top-down leadership — just like, you’ve got freedom, go away and do what you do best.

00:24:30 Amy De-Balsi: Yeah, it was brilliant. And because I’m in the world of government, I get quite a lot of people coming and saying, can you tell us how you managed to do these and put these live? And when I tell them, they’re like, oh, okay. Because we didn’t tie ourselves up in knots. We just went, right, let’s just see if we can do it. And it worked, which I shouldn’t be surprised about, but I talk to a lot of people who have got like 15 proof of concepts that they cannot get into production. We’ve now worked out what the end to end process is. One of the things that has been produced in the last few months, or updated in the last few months, is the AI Playbook from the Department for Science, Innovation and Technology. So I’ve been reading it and actually going, yeah, that’s kind of actually what we did. But they’ve written it all down for people. It’s really useful.

00:25:23 Richard Brown: So, did you have any kind of input into that? Were you aware that was even happening, or is it just you happened to be doing that anyway?

00:25:33 Amy De-Balsi: Yeah, I got added to the government delivery Slack channel and it popped up on there, which is how I found it.

00:25:41 Richard Brown: Yeah, wow. Is there anything else from leadership other than just being empowered and the freedom to kind of innovate and make mistakes? Is there anything else that was really important in terms of the leadership that you had?

00:25:55 Amy De-Balsi: Doing it quietly meant we could get it done, but then we’ve had the slight flip of it. By the time people find out about it they’re quite surprised. So lots of leaders are like, oh, you’ve done AI — oh, we didn’t know about that. And then we had to go through that learning curve of actually selling what we’d done to a wider audience.

00:26:17 Richard Brown: It’s interesting because it’s flipping it. Because I guess something that we often see is everyone wants to be seen to be doing AI — it’s almost the first opportunity leaders have, they’re going to shout from the rooftops. We’ve got this AI project, this AI project. Sounds like you were very much the opposite — almost like this little skunk works, just doing it on the sly without announcing it. That was quite brave and quite kind of groundbreaking, I think.

00:26:41 Amy De-Balsi: I’d worked with the CDTO before, so I knew that was a style, right?

00:26:47 Richard Brown: Do you see any, I mean, you work kind of across a number of organisations, Adam. Do you see that same kind of importance of leadership?

00:26:55 Adam Brookes: Definitely. I think it’s quite common that we’ll go into organisations and we’ll try and explore how they can leverage AI. I think quite often these conversations start off with leadership. I think we’ve seen it sort of both sides. We’ve seen leaders sort of being like, you know, I know what the business needs, I understand the processes, here’s the set of use cases. And I guess they don’t involve the people who will be using those tools day-to-day. I think the best thing is actually where these leaders ultimately get everyone involved and then get out of the way. I think it works. I mean, quite often in these conversations, a big pitfall that some people fall into is leading with a technology-first approach: we’ve got Copilot licences, what can we use them for? And it’s just generally not very successful. So we approach these conversations very much with a, what are your current pain points and how do you think technology can solve them? And we find that quite often, maybe 30% of them are actually a use case for AI and the rest of them are just around process improvements or could be solved with more integrations.

00:28:10 Richard Brown: So you’re not even coming, sometimes not even starting at an AI angle, it’s more about tell me about your problems.

00:28:15 Adam Brookes: Definitely. So coming back to your point, you tend to find that if you’re allowed to have some conversations with people outside of leadership, they can sometimes be more candid about what their pain points are, which gives us a clearer picture in terms of use cases. I guess the other thing as well when it comes to leadership is making sure that they’re comfortable with that sort of fail fast, fail early mentality. I think that’s what you need to be able to do with AI — quickly weed out the AI POCs that ultimately aren’t going to be suitable for production for whatever reason, and have a leadership team that’s comfortable doing that.

00:28:52 Amy De-Balsi: We had one of those. So we were running three proof of concepts all at the same time. Two were successful, one wasn’t. Partly because it was an off-the-shelf solution and it didn’t quite meet what the team needed, but also we hadn’t done the proper business analysis and those conversations that you talk about, and it was in a particularly complex area. So we’ve gone back to them, we’ve done that analysis and the full product piece around it, and actually they don’t need AI.

00:29:23 Richard Brown: Interesting. So I was going to ask about that because, up until now, we’ve painted a fairly rosy picture: look at all these successful AI projects we’re running. But somewhere between 80 and 95% of AI projects fail, depending on which sources you believe. So the example you just gave, Amy, it sounds like the failure was more around how that was approached. And actually if it had been approached a different way, maybe you would have just uncovered it’s not actually an AI use case at all.

00:29:56 Amy De-Balsi: I think that’s right, but also we learned so much by failing. And I was in an environment where that was actually a legitimate outcome, as long as we take those learnings and apply them to next use cases. So that tool did not have a front end. It just spat out an Excel spreadsheet. And what it did was create more work for the teams, not less. That was so valuable — we know that whenever we go into the next project, make sure you’ve got a really good interface, make sure that you can make the AI really visible, that you can interact with it via thumbs up, thumbs down. So I’m always quite comfortable with having a failed project.

00:30:38 Adam Brookes: The key thing there is you want to fail after 20 days, not 200.

00:30:43 Richard Brown: Yeah, that’s the key. It’s the fail fast, but then learn from that and move it forward. Was there any kind of structure to that? Were you doing kind of retrospectives or anything, or was it more just it was such a small team that you were just naturally kind of learning as you went on?

00:30:58 Amy De-Balsi: Yeah, we were a tiny, tiny team and running a bit like a startup really, where we were just all pitching in just to get stuff done. And it was all slightly side of desk because we didn’t really tell anyone what we were doing. But yeah, we didn’t do formal agile sprints and retrospectives and all of the good stuff that you should do, that we did for you guys on that one. It just became very obvious quite quickly that it wasn’t a good fit.

00:31:31 Richard Brown: Have either of you ever seen cases where a project could or should have failed earlier? Sometimes there’s a bit of that kind of sunk cost fallacy where, well, we’ve spent this much time on it, we’ve just got to plough on and see where we get to. Actually, people could be braver about just pulling the plug a bit earlier.

00:31:52 Adam Brookes: You want to, with these kind of projects, it just goes back to that traditional agile sort of methodology. You want to be able to get to a point where you’re releasing and you’re giving something on a regular basis. So I think that as long as you’re doing that and you’re phasing your projects on your roadmap, it makes it easier to pull the plug. I mean, I guess it comes back to what I was saying before around defining what good looks like — trying to have an objective measure of that. Because I think where you can fall into a trap is that, especially in those large language model projects, you might get it 60% of the way there, and it’s not enough to add real value. Actually, it probably might add an overhead because you’re taking an output and then you’re having to fix it rather than just doing it yourself. It might take longer, it might be worse quality, but almost that sunk cost of trying endlessly to tweak it and improve it — it might improve in one area, but weaken it in the other. Prompt engineering is not particularly an exact science, so it’s about having that objective measure of how good it is and having an incremental roadmap for how you’re going to release something.

00:33:00 Amy De-Balsi: One of the things we learned was that the output has to be editable. If you produce content that’s 70% right and you can edit it, that’s still useful. But on one tool it wasn’t editable in the first iteration, which meant it wasn’t actually useful.

00:33:18 Richard Brown: So it had to either be totally right or you couldn’t use it.

00:33:22 Amy De-Balsi: So we’ve gone through that learning to say, right, whatever we produce from AI, got to be able to edit it.

00:33:30 Richard Brown: Which comes back to that kind of accountability — if it’s not editable, you can’t hold the human accountable. Whereas if it is, then they’ve always got an opportunity to tweak or kind of stop hallucinations or whatever it happens to be.

00:33:44 Amy De-Balsi: It was about address matching, so it wasn’t anything horrific.

00:33:49 Richard Brown: So the way we’ve been talking just now, it implies that you kind of need various kind of off ramps — it shouldn’t just be, we’ve got to spend this much money and then we’ll make a decision. You’ve got to kind of have more frequent places where you can jump off and say, actually, it’s not working. Let’s pivot. Like how small do those iterations have to be, or does it just depend on the use case?

00:34:13 Adam Brookes: I think that in that technical feasibility, POC-type phase, those feedback loops need to be very, very quick for you to get to a point where you’re comfortable that this is going to work as a solution. I think that once you’ve got something into production, you can sort of fall back to maybe a more traditional two-week sprint sort of cadence, and that seems to be enough.

00:34:36 Amy De-Balsi: We tend to follow like pre-discovery, discovery, alpha, beta and the length of time certainly depends on how big the problem is. But at each point, because we’re working a lot with third parties, we’ve got contractual ability to go, right, now, stop. And it gives us the space to really evaluate at the end of each sprint what the right thing is to do. We haven’t just gone, right, here’s a contract, deliver me these tools, regardless of whether they’re right or not.

00:35:05 Richard Brown: Has that changed how you, like, do contracts for these things look different?

00:35:13 Amy De-Balsi: I think they might actually, and it’s not really something I’ve talked to our commercial team about, but it just seems to be the way that we’ve ended up contracting is just on a sort of phase-by-phase basis.

00:35:25 Adam Brookes: Yeah, I think that estimation is an interesting point though. With traditional software engineering projects, it’s kind of easier to size different user stories against each other and say, this is a five, this is a three, because you’ve got previous experience of, I don’t know, making some basic edit screen or searchable table. With AI it is quite challenging, because if you’ve got a user story in a backlog which is, okay, make this prompt a bit better, make the output a little better, it is all just so subjective. I think one thing that we’ve done in that space is to almost treat those parts of the project as sort of time-boxed exercises. We know that we’ve allocated however many days to do that, something that we feel comfortable with, but you’ve sort of got to draw a line because you can go on forever, really.

00:36:18 Richard Brown: Yeah, and kind of linked to that — thinking towards a more traditional software engineering project, you have user stories and you have a definition of done and things move very kind of neatly across the board. How do you even begin to define a definition of done for something that’s not deterministic and can kind of change depending on the day?

00:36:41 Amy De-Balsi: So we had a really good conversation with the business users on the one that you produced, around: actually, is what’s been done better than what’s there? And is it good enough to go live and will you see value from it? So it’s a very different conversation that you have to have with your users. And actually the answer was yes, given what we’ve got in production now, it’s definitely better. We accept that and let’s push it live.

00:37:16 Adam Brookes: I mean, it comes back to what I said before — with these projects, before you start writing code or playing about with an LLM or training a model of any kind, you want to be able to, as close as you can, objectively define what good looks like and have a way to score an output, this is 8, 9 out of 10. But once you’ve got a way to do that, you sort of know what you’re aiming to achieve. You know, the done is when you deliver a solution that meets an agreed upon metric for this.
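As an illustration of the "agreed-upon metric" idea Adam describes, the sketch below treats the definition of done as a simple check over scored outputs. It is only a sketch under assumptions: the thresholds, the 0 to 10 scale and the function name are hypothetical and are not taken from the projects discussed.

```python
from statistics import mean

# Hypothetical acceptance criteria, agreed with business users before build work starts.
TARGET_MEAN_SCORE = 8.0  # assumed: required average score out of 10
PASS_SCORE = 7           # assumed: minimum score for a single output to count as a pass
MIN_PASS_RATE = 0.9      # assumed: share of outputs that must reach PASS_SCORE


def meets_definition_of_done(scores: list[int]) -> bool:
    """Return True when reviewer scores meet both agreed thresholds."""
    if not scores:
        return False  # nothing has been evaluated, so nothing can be "done"
    pass_rate = sum(score >= PASS_SCORE for score in scores) / len(scores)
    return mean(scores) >= TARGET_MEAN_SCORE and pass_rate >= MIN_PASS_RATE


if __name__ == "__main__":
    reviewer_scores = [9, 8, 8, 9, 7, 10, 8, 9, 8, 9]  # illustrative data only
    print(meets_definition_of_done(reviewer_scores))
```

Checking a pass rate as well as an average is one way to catch the case Adam mentions earlier, where tweaking a prompt improves some outputs while weakening others.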

00:37:46 Richard Brown: Is there anything kind of different around, I guess a big part of software delivery is kind of risks — capturing what are the risks, how might we mitigate them? What were kind of the biggest risks that you went into these projects kind of thinking about?

00:38:03 Amy De-Balsi: Data access. What is the data? Where is it? Can I get hold of it? If I can’t, how on earth are we going to fix that? That’s generally the thing I hit on a daily basis. So we have to push all of that analysis right up front in the programme.

00:38:31 Adam Brookes: I guess another key one actually is adoption. So I think that you need to consider a lot more carefully how you’re going to onboard people onto AI tools as opposed to more traditional software platforms. I mean, one of the techniques that we advocate for is actually as part of that POC or that use case ideation phase, make sure that you’ve got a couple of cynics in the room because they’re the people that will really challenge you and really stress test that the solution that you’re coming up with is going to add value. And the reality is if you’re able to convert a cynic to a champion, then they’re going to advocate for that solution. They’re going to get people using it.

00:39:14 Amy De-Balsi: I had two who are now the biggest advocates for AI. They’ve written papers, they’ve gone viral, they are just absolute evangelists. With our POCs initially, the team was so small, it was like 15 users. So adoption and change really wasn’t an issue. But now we’ve matured and we’re looking at much bigger audiences for the tools. We’ve actually got change managers involved and we’ve got engagement plans and we’ve got comms. It makes such a difference. And my change manager is amazing. She’s really good at going and finding the right people to contribute even at the discovery stage. So we’re engaging them right at the beginning.

00:40:05 Adam Brookes: Yeah, I mean, it’s that classic thing. If you sort of spring a new AI tool on people that’s going to halve the time to do a particular part of their role, and they’re not being communicated with properly, most people’s first thought is, they’re going to replace my job. Which, as we’ve discussed, is not necessarily true. It’s so that they can do more value-add tasks.

00:40:28 Richard Brown: So the individuals that you mentioned Amy, the cynics to champions — how did that happen and was their initial reluctance because of ‘it’s going to take my job’, or was there something else going on?

00:40:41 Amy De-Balsi: I wasn’t around when that use case was selected. All I know is that they were really cynical and I’m not sure where it came from, but he’s got three tools in development because the opportunity is so big in that space. And once he was a convert, it was like, right, how else can we make this absolutely critical part of the organisation much, much more efficient?

00:41:12 Adam Brookes: I think the interesting thing as well is that the technology is advancing quite quickly. We’re in a place now where you can develop a lot of these solutions, and I’ve actually seen organisations where people who started off as that cynic have really seen the value and now they’re the people championing it. They’re in Copilot Studio and making their own agent systems and really pushing it and coming up with some really cool innovative stuff.

00:41:44 Richard Brown: User kind of training, user onboarding — again, kind of drawing parallels with a traditional software project, you want to try and make it as intuitive as possible. You want to kind of try and minimise the amount of training that people need. Is that different for AI projects? Like do they fundamentally need to be aware they’re interacting with an AI system and what their responsibilities therefore are?

00:42:07 Amy De-Balsi: So in government, you’ve got to follow Government Digital Service standards. So you shouldn’t need to use the manual. If you need to use the manual, you’ve got it wrong. And they do check. So we’ve got that as a baseline. But also I think within the tool, we put in sort of notifications and warnings that this was generated by AI and please be aware and edit it. So we kind of approach it with a warning piece.

00:42:34 Adam Brookes: Yeah, I think it’s interesting with a lot of these sort of large language model projects. I guess you’re doing some stuff in the background, whether it’s agentic or whether it’s doing some prompt engineering, but ultimately most users are not used to using technology through natural language. It is a challenge and it is a bit of a shift. So I think one thing that we’re exploring more at the moment is, where possible, giving users more structured ways to input what it is that they want. There are actually pros and cons to both. You know, having a conversation or bouncing ideas through natural language gives a lot of value, but at the same time, for the task where a user kind of knows what they need, putting a form in front of them or making it more button-based actually adds value, which again kind of requires less training.

00:43:29 Richard Brown: So rather than being presented with a massive text box where there’s free form entry, it’s actually filling these fields and then behind the scenes you’re stitching that together such that the LLM can work with it.

00:43:39 Adam Brookes: I mean I’m oversimplifying it here, but just wrapping some UX and considering the UX and not just giving them a blank canvas text box, ask whatever you want — because that can be quite daunting.
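The "fill in these fields, then stitch them together" approach from this exchange might look something like the sketch below. It is a minimal illustration, not the implementation used on any project discussed here: the `DraftRequest` fields and the prompt wording are hypothetical, and no model call is made.

```python
from dataclasses import dataclass


@dataclass
class DraftRequest:
    """Hypothetical form fields a user fills in instead of a free-text box."""
    document_type: str
    audience: str
    tone: str
    key_points: list[str]


def build_prompt(request: DraftRequest) -> str:
    """Stitch the structured fields into a single prompt string for an LLM."""
    bullet_points = "\n".join(f"- {point}" for point in request.key_points)
    return (
        f"Write a {request.document_type} for {request.audience}.\n"
        f"Use a {request.tone} tone.\n"
        f"Cover the following points:\n{bullet_points}"
    )


if __name__ == "__main__":
    request = DraftRequest(
        document_type="briefing note",
        audience="senior leadership",
        tone="formal",
        key_points=["pilot results", "next steps"],
    )
    print(build_prompt(request))
```

Because the user only chooses values for known fields, the prompt template can be tested and tuned once rather than relying on each user to phrase a request well.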

00:43:54 Amy De-Balsi: We’ve used a lot of Figma designs during the alpha phase, so people can really see what the potential is and then they get really on board. But then they get really excited and you’re like, no, we haven’t built it yet. But it gives people the idea of where you’re going really quite clearly and they get very excited.

00:44:16 Adam Brookes: Yeah, so I guess when it comes to making sure and reducing the amount of training that you need — if you’re having to train someone to use a system, it’s clearly not very intuitive. But I think that leveraging AI to be able to quickly produce mock-ups to bounce ideas when you’re sort of solutioning these things, to understand what that UX would be — I think that’s more accessible than ever with, you know, full code and stuff.

00:44:40 Richard Brown: Predicting the future, especially in AI, is kind of near impossibility, but I’m going to ask you both to do exactly that. I’ve already talked about kind of agentic AI as the big kind of, yeah, I guess that’s going to be the next big thing and is already kind of the next big thing almost. Is that where you see a lot of focus in the next 12 months, or if not, like where else do you see particular change happening in the world of AI?

00:45:07 Amy De-Balsi: I think we’ll mature and stop focusing on delivering a technology but actually looking at services and looking at the pain points. And by that point hopefully we’ve got patterns that are established that we can reuse and apply to those pain points when they come up. At the moment we’re so early that we don’t have that consistent pattern — so somebody wants a drafting tool, which most teams do, we haven’t got that pattern nailed that we can then copy across other ones. But I can see AI just being another part of a service improvement at some point.

00:45:48 Adam Brookes: Yeah, good question. I do think that agentic AI is going to keep advancing at quite a rapid pace. I mean, ultimately we’re going from a world where we had to define very procedurally sort of the steps that we wanted software or technology to take to reach a particular outcome. We’re getting more to a place where we’re able to describe or outline what we’re trying to achieve. I think it really does increase the speed we’re able to deliver things. I think that token costs from some of these large providers are going to start becoming a big concern for organisations and people in general. I think they are at the moment very, very slowly pulling the rug out. So I think there’ll be a push for people exploring how they’re able to run more local models, or potentially there’ll be more of a shift towards people maintaining the infrastructure where they’re hosting their own open-source models. I can definitely see that happening.

00:46:53 Richard Brown: So we’re nearly out of time. It’s been a really interesting, wide-ranging conversation. I was wondering for any listeners kind of planning their next AI project, what’s like one key takeaway that you would want people to really take away from this and put into action?

00:47:08 Adam and Amy: Just one. Just one.

00:47:09 Richard Brown: You can have two if you want. That means none for you, Adam, none for me.

00:47:16 Amy De-Balsi: Definitely prioritise your use cases that will give you the biggest benefit first. But if you haven’t done AI before, just start with something small and just prove the fact that you can do it within the organisation that you work. Because once you’ve proven it, then everyone gets excited and on board.

00:47:34 Adam Brookes: My two are: don’t go into this technology first. Understand what pain points or issues your organisation is facing, or potential areas of improvement, and then take a step back and try and understand what the most appropriate solution is, whether it’s AI or not. I guess the second one is make sure that you’re able to define what good looks like up front, being able to quantify it: this task takes someone 5 hours a week, so okay, we need to improve that. And then you can constantly assess the amount of money you’re spending on it against the ROI ultimately.
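To make the spend-against-ROI comparison concrete, here is a small worked sketch. Only the 5 hours a week comes from the conversation; the hourly cost, the spend and the assumption that the tool halves the time are hypothetical figures for illustration.

```python
def payback_weeks(hours_saved_per_week: float, hourly_cost: float, total_spend: float) -> float:
    """Weeks of use needed before the time saved covers the money spent."""
    weekly_saving = hours_saved_per_week * hourly_cost
    if weekly_saving <= 0:
        raise ValueError("the tool must save some time to ever pay back")
    return total_spend / weekly_saving


if __name__ == "__main__":
    baseline_hours = 5.0               # from the example: the task takes 5 hours a week
    hours_saved = baseline_hours / 2   # assumed: the tool halves the time taken
    print(payback_weeks(hours_saved, hourly_cost=40.0, total_spend=10_000.0))
```

Re-running the same calculation as spend accumulates gives a simple, objective prompt for the kind of off-ramp decision discussed earlier in the episode.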

00:48:11 Richard Brown: Excellent. So that’s all for this episode of Technically Speaking. Huge thanks to both Adam and Amy for joining me today. I hope everyone found this as interesting a conversation as I did. If you are interested in future episodes, please subscribe to this wherever you get your podcasts. That’s it for now. Thanks again to Adam and Amy and until next time.