[{"data":1,"prerenderedAt":301},["ShallowReactive",2],{"blog-page-content":3,"blog-post-what-ai-assisted-engineering-means-for-software-testing":30,"blog-author-what-ai-assisted-engineering-means-for-software-testing":6},{"id":4,"title":5,"author":6,"blogTags":6,"body":7,"customExcerpt":6,"date":6,"description":11,"excerpt":6,"extension":14,"h1":15,"image":6,"meta":16,"navigation":17,"path":18,"readingStats":19,"seo":24,"stem":28,"__hash__":29},"insightsBlogPage\u002Finsights-blog-page.md","Insights Blog Page",null,{"type":8,"value":9,"toc":10},"minimark",[],{"title":11,"searchDepth":12,"depth":12,"links":13},"",2,[],"md","Insights on the latest industry developments and technology advancements within digital transformation.",{},true,"\u002Finsights-blog-page",{"text":20,"minutes":21,"time":22,"words":23},"1 min read",0.32,19200,64,{"title":25,"description":26,"keywords":27},"Industry Insights | Audacia ","The latest industry insights, research and developments in digital transformation for business leaders across industries from software development company Audacia.","Software development company, Software development blog, Bespoke software development news, Bespoke software development articles, Software development articles, Software development code","insights-blog-page","ya52brGhr6G8uhl8kc87eB-z-RvUW2jH_-Goj9QUuFI",{"id":31,"title":32,"author":6,"blogTags":33,"body":37,"customExcerpt":6,"date":289,"description":42,"excerpt":6,"extension":14,"h1":6,"image":290,"meta":291,"navigation":17,"path":292,"readingStats":293,"seo":298,"stem":299,"__hash__":300},"blog\u002Fblog\u002Fwhat-ai-assisted-engineering-means-for-software-testing.md","What AI-Assisted Engineering Means for Software Testing ",[34,35,36],"AI","Testing","Software Testing 
",{"type":8,"value":38,"toc":273},[39,43,46,49,52,57,60,71,74,77,93,96,105,108,117,121,124,133,141,144,147,150,153,156,160,163,166,174,177,180,184,187,192,195,199,202,206,209,213,216,220,223,227,230,233,236,239,242,246,249,252,260,263,267,270],[40,41,42],"p",{},"In February 2025, Andrej Karpathy coined the term \"vibe coding\" to describe a development practice where you \"fully give in to the vibes, embrace exponentials, and forget that the code even exists.\" The developer describes intent in natural language; the AI generates code; the developer runs it, provides feedback, and iterates, often without reading or understanding the code being produced. Collins English Dictionary named it Word of the Year for 2025 and by the end of that year, it had moved from a whimsical observation about weekend projects to a defining challenge for engineering leadership. ",[40,44,45],{},"This practice has evolved beyond hobby projects and hackathons. Microsoft's CEO has disclosed that up to 30% of the company's code is now AI-generated, with Google reporting similar figures, Y Combinator reported that 25% of startups in its Winter 2025 batch had codebases that were 95% AI-generated, and over 90% of developers now report using AI coding tools in their workflows.  ",[40,47,48],{},"The spectrum ranges from responsible AI-assisted development, where AI augments an engineer who reviews, tests and takes ownership of all generated code, to pure “vibe coding”, where code is accepted uncritically in pursuit of speed.  ",[40,50,51],{},"The question for engineering leaders is not whether their teams are using AI to write code (they almost certainly are), but whether the testing, review and quality practices surrounding that code have evolved to match the pace and nature of AI-assisted development.  
",[53,54,56],"h2",{"id":55},"the-quality-evidence","The Quality Evidence ",[40,58,59],{},"The data on AI-generated code quality has matured rapidly during 2025, moving from anecdotal concerns to large-scale empirical research.  ",[40,61,62,63,70],{},"CodeRabbit's State of AI vs Human Code Generation ",[64,65,69],"a",{"href":66,"rel":67},"https:\u002F\u002Fwww.coderabbit.ai\u002Fblog\u002Fstate-of-ai-vs-human-code-generation-report",[68],"nofollow","report",", published in December 2025, analysed 470 real-world open-source GitHub pull requests – 320 AI-co-authored and 150 human-only – using a structured issue taxonomy: ",[40,72,73],{},"AI-generated pull requests contained 1.7 times more issues overall, averaging 10.83 findings per PR compared with 6.45 for human-authored code ",[40,75,76],{},"AI PRs contained 1.4 times more critical issues and 1.7 times more major issues. ",[78,79,80,84,87,90],"ul",{},[81,82,83],"li",{},"Logic and correctness errors 1.75 times more frequently; ",[81,85,86],{},"Code quality and maintainability issues 1.64 times more frequently; ",[81,88,89],{},"Security findings 1.57 times more frequently; and ",[81,91,92],{},"Performance problems 1.42 times more frequently. ",[40,94,95],{},"At the 90th percentile, AI pull requests reached 26 issues per change – more than double the human baseline. ",[40,97,98,99,104],{},"The security dimension is particularly stark. Veracode's 2025 GenAI Code Security ",[64,100,103],{"href":101,"rel":102},"https:\u002F\u002Fwww.veracode.com\u002Fresources\u002Fanalyst-reports\u002F2025-genai-code-security-report\u002F",[68],"Report"," analysed 80 coding tasks across more than 100 large language models and found that AI-generated code introduced security vulnerabilities in 45% of cases. These were not minor issues and included OWASP Top 10 vulnerabilities. Findings include: Java was the riskiest language with a 72% security failure rate, while Python, C#, and JavaScript logged failure rates between 38% and 45%. 
Defences against cross-site scripting failed in 86% of relevant samples. ",[40,106,107],{},"Perhaps the most concerning finding from Veracode's research is that this is not a problem that improves with better models. Security performance remained flat regardless of model size, training sophistication, or release date. Newer, larger models produce more syntactically correct code, but not more secure code. The models learn from vast public code repositories that contain both secure and insecure patterns, and they reproduce insecure approaches with the same confidence as secure ones. This means that organisations waiting for the next model release to solve their AI code quality problem may be waiting indefinitely. ",[40,109,110,111,116],{},"A December 2025 ",[64,112,115],{"href":113,"rel":114},"https:\u002F\u002Fwww.csoonline.com\u002Farticle\u002F4116923\u002Foutput-from-vibe-coding-tools-prone-to-critical-security-flaws-study-finds.html",[68],"assessment"," reinforced these findings through testing. It compared five leading AI coding tools – Claude Code, OpenAI Codex, Cursor, Replit, and Devin – by building the same three test applications with each. The result: 69 vulnerabilities across 15 applications, including several rated critical. The tools performed well at avoiding generic, well-known vulnerability patterns like SQL injection, but failed consistently on context-dependent business logic flaws – the kind that require understanding how a workflow should operate, which AI agents currently lack. ",[53,118,120],{"id":119},"the-technical-debt-accelerator","The Technical Debt Accelerator ",[40,122,123],{},"Security vulnerabilities are the most immediately dangerous consequence of unchecked AI-generated code, but they are not the only one. The evidence on structural code quality tells a parallel story of rapidly accumulating technical debt. 
",[40,125,126,127,132],{},"GitClear's AI Copilot Code Quality ",[64,128,131],{"href":129,"rel":130},"https:\u002F\u002Fwww.gitclear.com\u002Fai_assistant_code_quality_2025_research",[68],"research",", analysing 211 million changed lines of code from 2020 to 2024 across repositories owned by Google, Microsoft, Meta, and enterprise organisations, found that AI coding assistants are fundamentally changing the composition of code being written, and not for the better. They found:  ",[78,134,135,138],{},[81,136,137],{},"Code duplication exploded - blocks with five or more duplicated lines increased eightfold during 2024.  ",[81,139,140],{},"Refactoring collapsed - the proportion of changed lines associated with refactoring fell from 25% in 2021 to less than 10% in 2024, a decline of nearly 40%.  ",[40,142,143],{},"For the first time in the history of their dataset, copy-pasted lines exceeded moved lines, meaning developers were duplicating code more than they were consolidating it into reusable modules. ",[40,145,146],{},"With regards to code churn, the proportion of new code revised within two weeks of its initial commit, also rose significantly, from 3.1% in 2020 to 5.7% in 2024. This indicates that AI-generated code is being corrected or reworked at higher rates, suggesting that initial output quality is lower even when the code appears to function correctly on first execution. ",[40,148,149],{},"The mechanism behind these trends is that AI coding assistants make it extraordinarily easy to generate new code (a developer accepts a suggestion with a single keystroke). But the same tools rarely propose reusing an existing function elsewhere in the codebase, partly because their limited context window prevents them from understanding the full system architecture. The result is a systematic incentive toward duplication and away from the refactoring practices that keep codebases maintainable over time. 
",[40,151,152],{},"Google's 2024 DORA report corroborates this trade-off, reporting that a 25% increase in AI usage was associated with faster code reviews and better documentation, but also a 7.2% decrease in delivery stability. This surfaces a consistent pattern in that AI accelerates output while potentially eroding the structural qualities that make software maintainable, secure and reliable in the long term. ",[40,154,155],{},"For engineering leaders, this creates a paradox. Teams appear more productive in that they are shipping more code, faster. But the total cost of ownership is increasing as duplicated code multiplies maintenance burden, structural inconsistencies make onboarding harder, and the absence of refactoring causes codebases to calcify. The initial speed gains are eventually consumed by the overhead of managing a codebase that was generated quickly but never designed. ",[53,157,159],{"id":158},"why-traditional-testing-practices-are-insufficient","Why Traditional Testing Practices Are Insufficient ",[40,161,162],{},"Vibe coding does not just introduce new categories of defect, it undermines the practices that traditionally catch defects before they reach production. ",[40,164,165],{},"The most fundamental problem is the comprehension gap. When developers do not read or fully understand the code being generated, they cannot write meaningful tests for it. Effective test design requires understanding of what the code does when given expected inputs, as well as how it should behave at boundaries, under error conditions, and in edge cases. A developer who has described a feature in natural language and accepted the generated implementation without studying it lacks the mental model needed to identify which scenarios require testing. Therefore, test coverage may end up appearing adequate by line count, but is in fact missing the failure modes that matter most. ",[40,167,168,169,173],{},"There is also an impact on the traditional code review process. 
CodeRabbit's ",[64,170,172],{"href":66,"rel":171},[68],"data"," shows that AI-generated pull requests create heavier review workloads. This shows up not just as more issues per PR, but as wider variance in issue severity, meaning reviewers must spend more time triaging. At the 90th percentile, AI PRs contain 26 issues compared to the human baseline of 12 issues. This volume of review work is difficult to sustain when teams are simultaneously under pressure to ship faster – the same pressure that motivated adopting AI coding tools in the first place. ",[40,175,176],{},"This has introduced a speed-quality tension. The traditional cycle of write, review, test, fix was designed for human-paced development. When AI can generate thousands of lines of code in minutes, manual review and testing become bottlenecks, and there can be a temptation to relax quality gates rather than slow down delivery. This is precisely how quality debt accumulates. ",[40,178,179],{},"There is also an emerging problem with what might be called \"shadow AI development\" – where employees outside formal development teams are building applications and automations using AI coding tools, without engineering oversight or quality governance. These tools have lowered the barrier to creating functional software to the point where non-developers can produce working applications. But \"working\" and \"production-ready\" are very different standards, and organisations are discovering that vibe-coded internal tools – lacking authentication, containing hardcoded credentials or missing error handling – are deployed and in use before engineering teams are even aware they exist. 
However, what this new landscape requires is a rethinking of how testing and quality practices operate in an AI-assisted development environment. ",[188,189,191],"h3",{"id":190},"test-driven-development","Test-driven development  ",[40,193,194],{},"Writing tests before AI generates code ensures the code meets specific requirements, regardless of how it was produced. When a developer specifies expected behaviour through tests first, the AI-generated implementation can be validated immediately against those expectations. This approach transforms the developer's role from code reviewer (which the comprehension gap undermines) to specification author (which requires domain knowledge the developer still possesses). TDD also naturally constrains the AI's output – when the generated code must pass predefined tests, many categories of defect are caught at the point of creation rather than downstream. ",[188,196,198],{"id":197},"automated-security-scanning","Automated security scanning  ",[40,200,201],{},"CI\u002FCD pipelines must enforce static application security testing (SAST), dynamic application security testing (DAST), and software composition analysis (SCA) on all code, with no distinction between human-written and AI-generated contributions. Given that AI-generated code introduces security vulnerabilities in 45% of cases and contains 2.74 times more XSS vulnerabilities than human code on average, security scanning is not an optional quality enhancement. Organisations should also implement dependency scanning and licence checking, since AI tools can incorporate outdated or insecure third-party libraries without vetting. 
",[188,203,205],{"id":204},"property-based-and-contract-testing","Property-based and contract testing  ",[40,207,208],{},"Where developers lack the detailed understanding needed to write comprehensive example-based tests, property-based testing offers an alternative: defining the properties that outputs should always satisfy (this function should never return a negative value; this API response should always contain these required fields) and generating test inputs automatically to verify those properties hold. Similarly, contract testing – defining the expected interfaces between services – provides a framework for validating AI-generated code against architectural constraints that the AI itself was not aware of. ",[188,210,212],{"id":211},"code-quality-gates","Code quality gates  ",[40,214,215],{},"The GitClear data on code duplication and declining refactoring suggests that quality standards cannot rely on developer discipline alone, they must be embedded in the pipeline. This means automated checks for code duplication thresholds, complexity metrics, test coverage requirements and architectural conformance. AI-generated code that fails these checks should be rejected automatically, just as any code that fails existing CI\u002FCD gates would be. Some organisations are also implementing \"AI code review\" tools – ironically, using AI to review AI-generated code – which adds an additional layer of automated scrutiny. ",[188,217,219],{"id":218},"agentic-security-tools","Agentic security tools  ",[40,221,222],{},"Security needs to be embedded in the act of creation rather than added on further downstream. This means security analysis tools that operate as companions to AI coding assistants within the development environment itself, providing real-time feedback on the security implications of generated code as it is produced, rather than catching issues hours or days later in a CI\u002FCD pipeline. 
This helps shrink the gap between code generation and security validation to near zero. ",[53,224,226],{"id":225},"the-regulatory-dimension","The Regulatory Dimension ",[40,228,229],{},"The regulatory environment is adding both urgency and legal liability to AI code quality. ",[40,231,232],{},"The EU Cyber Resilience Act (CRA), which came into force in 2024 with compliance deadlines extending through 2027, requires manufacturers of software products to implement comprehensive cybersecurity requirements. Products must be developed according to secure-by-design principles, delivered free from known exploitable vulnerabilities, and supported by ongoing security updates. AI-generated software that has never been reviewed by a human with the expertise to assess its security posture is unlikely to meet these obligations. For organisations operating in or selling into the EU market, this creates direct compliance exposure. ",[40,234,235],{},"The EU AI Act adds further requirements for software systems that incorporate AI capabilities, which increasingly means any software built using AI coding tools, since the generated code itself may embed AI-powered features. High-risk AI systems require comprehensive testing for accuracy, robustness and non-discrimination, with documentation requirements that assume human oversight of the development process. ",[40,237,238],{},"In the UK, the Product Security and Telecommunications Infrastructure Act and forthcoming cyber security regulations apply similar principles to connected products and digital services. Sector-specific regulators – the FCA for financial services, the ICO for data protection, the CQC for healthcare – are applying existing regulatory frameworks to software quality in ways that create implicit requirements for code review and testing rigour. 
",[40,240,241],{},"The implication is that regulatory frameworks consistently place responsibility for software quality on the organisation that deploys it, regardless of whether the code was written by a human developer, generated by an AI tool or produced through vibe coding. This means that engineering leaders who do not establish governance frameworks for AI-generated code are accepting regulatory risk on behalf of their organisations. ",[53,243,245],{"id":244},"theorganisationalpolicy-question","The Organisational Policy Question ",[40,247,248],{},"The evidence points clearly toward a need for explicit organisational policy on AI-assisted development. This is not to prohibit it, but to establish the governance framework within which it operates safely. ",[40,250,251],{},"At minimum, this means defining where on the spectrum of AI-assisted development the organisation is willing to operate, and under what conditions. For production systems, the expectation should be clear: all code, regardless of origin, must be reviewed, tested and understood by a qualified human before deployment. For prototyping, internal tools and exploratory work, the tolerance for less rigorous oversight may be higher, but even here, security scanning and basic quality gates should apply. ",[40,253,254,255,259],{},"It also means addressing the skills dimension. The World Quality ",[64,256,103],{"href":257,"rel":258},"https:\u002F\u002Fwww.capgemini.com\u002Finsights\u002Fresearch-library\u002Fworld-quality-report-2025-26\u002F",[68]," 2025 found that 50% of organisations lack AI\u002FML expertise in their quality engineering teams, which is a gap that extends to understanding the specific failure modes of AI-generated code. 
Engineers need training not just in using AI coding tools effectively, but in reviewing AI-generated output critically: recognising the patterns of duplication, the categories of security vulnerability, and the architectural anti-patterns that these tools systematically produce. ",[40,261,262],{},"The most mature organisations are treating AI-generated code as a catalyst for strengthening universal quality practices. If AI-generated code requires comprehensive security scanning, automated quality gates, mandatory test coverage and architectural conformance checks, then so does all code. The AI coding revolution may ultimately leave its most lasting impact not through the code it generates, but through the quality practices it forces organisations to adopt. ",[53,264,266],{"id":265},"the-compounding-problem","The Compounding Problem ",[40,268,269],{},"What makes the vibe coding quality crisis particularly challenging is its compounding nature. AI-generated code that is not properly tested accumulates as technical debt. That technical debt makes the codebase harder to understand, which makes it harder to write effective tests, which makes it more likely that the next round of AI-generated code will introduce further undetected issues. GitClear’s 2024 data – an eightfold increase in code duplication alongside a collapse in refactoring – represents the early stages of this compounding cycle. ",[40,271,272],{},"The organisations that act now, establishing testing standards, security gates and governance frameworks for AI-assisted development, will be those that capture the genuine productivity benefits of AI coding tools. However, those that delay, assuming the problem will resolve itself as models improve, are ignoring the clearest finding in the research – code quality does not improve with model size. 
Better models may produce more syntactically correct code but, as it stands, they do not produce more secure, maintainable, or architecturally sound code.",{"title":11,"searchDepth":12,"depth":12,"links":274},[275,276,277,278,286,287,288],{"id":55,"depth":12,"text":56},{"id":119,"depth":12,"text":120},{"id":158,"depth":12,"text":159},{"id":182,"depth":12,"text":183,"children":279},[280,282,283,284,285],{"id":190,"depth":281,"text":191},3,{"id":197,"depth":281,"text":198},{"id":204,"depth":281,"text":205},{"id":211,"depth":281,"text":212},{"id":218,"depth":281,"text":219},{"id":225,"depth":12,"text":226},{"id":244,"depth":12,"text":245},{"id":265,"depth":12,"text":266},"2026-04-29T13:38:42.140Z","\u002Fimg\u002Fblog\u002Fai-engineering-impact-on-testing.png",{},"\u002Fblog\u002Fwhat-ai-assisted-engineering-means-for-software-testing",{"text":294,"minutes":295,"time":296,"words":297},"12 min read",11.52,691200,2304,{"title":32,"description":42},"blog\u002Fwhat-ai-assisted-engineering-means-for-software-testing","wsyeABeQ9Q60go_R6v_f-Q6l1Zi6nPoVgdpzLyIKULw",1777475755350]