Functional testing answers "does it work?" Non-functional testing answers a harder set of questions: does it work fast enough, securely enough, and reliably enough, under real-world conditions, at scale, and when things go wrong?
For most of software engineering's history, these were secondary concerns. Engineers got the features right first, then ran performance tests and security scans before a release. In a monolithic application deployed to a known server, this was often sufficient. The system was predictable, the infrastructure was stable, and the test environment could reasonably approximate production.
However, these assumptions don’t necessarily hold in cloud-native architectures. Applications built on microservices, containers, and serverless functions are distributed by design, where a single user request may traverse dozens of services, each independently deployed, each with its own failure modes, latency characteristics, and scaling behaviour. The infrastructure is elastic – services scale up and down automatically, containers are ephemeral, and the topology of the system changes constantly. The attack surface is broader, the failure modes more varied, and the interactions between components more complex than traditional testing methodologies were designed for.
Research predicted that cloud-native platforms would underpin over 95% of new digital workloads by 2025, up from less than 40% in 2021. However, the non-functional testing practices required to validate these systems have not kept pace with the architectural shift. The World Quality Report 2025, surveying over 2,000 executives globally, found that 50% of organisations lack the technical expertise to test modern architectures effectively – a figure unchanged from the prior year.
This article examines the five dimensions of non-functional testing that matter most in cloud-native environments – performance, resilience, security, observability, and accessibility – and what engineering leaders need to consider to address each.
Performance Testing: From Load Tests to Latency Budgets
In a monolith, performance testing typically means running a load test against a known set of endpoints and measuring response times. The system is self-contained, the call stack is local, and bottlenecks can be identified with a profiler.
However, in a microservices architecture, the problem is qualitatively different.
Latency compounds across service chains. A request traversing five services in sequence accumulates latency from each hop, and at tail percentiles the behaviour is super-additive – small increases at individual services (10-20ms per hop) can compound into severe end-to-end violations (100ms+) that degrade the user experience.
Research on distributed system performance found that organisations must maintain average response times of 50 milliseconds or less for 95% of transactions to meet modern business requirements, with network latency between geographically distributed nodes typically ranging from 5 to 15 milliseconds per hop. When the latency budget for an entire request chain is measured in tens of milliseconds, every service in the chain must be tested not just for its own performance but for its contribution to end-to-end latency.
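The compounding effect is easy to demonstrate with a short simulation. The sketch below models a sequential chain in which each hop is usually fast but occasionally hits a slow tail (a GC pause, a cold cache, a retry); the numbers (10ms base, 80ms tail, 1% tail probability) are illustrative assumptions, not figures from the research cited above:

```python
import random

def chain_latency(num_services, base_ms=10.0, tail_ms=80.0,
                  tail_prob=0.01, rng=None):
    """End-to-end latency of a sequential chain: each hop is usually fast,
    but occasionally hits a slow tail."""
    rng = rng or random.Random()
    total = 0.0
    for _ in range(num_services):
        total += base_ms + (tail_ms if rng.random() < tail_prob else 0.0)
    return total

def percentile(samples, p):
    s = sorted(samples)
    return s[min(len(s) - 1, int(p * len(s)))]

rng = random.Random(42)
one_hop = [chain_latency(1, rng=rng) for _ in range(100_000)]
five_hops = [chain_latency(5, rng=rng) for _ in range(100_000)]

# With a 1% tail per hop, a five-hop chain has a ~4.9% chance
# (1 - 0.99**5) of hitting at least one slow hop, so its p99 is
# dominated by the tail rather than the 50ms of base latency.
print(f"p99 one hop:   {percentile(one_hop, 0.99):.0f}ms")
print(f"p99 five hops: {percentile(five_hops, 0.99):.0f}ms")
```

The point of the sketch is that the chain's tail degrades much faster than its median: the median five-hop request still costs roughly five times the base latency, while the p99 absorbs the tail of whichever hop was slow.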
This has driven a shift from traditional load testing toward Service Level Objective (SLO) based performance engineering. Rather than testing against arbitrary response time thresholds, teams define SLOs that express the performance commitments that matter to the business, e.g. "99th percentile latency for checkout requests must be below 200ms", and then test continuously against those objectives. SLOs make performance testing a business-aligned discipline rather than a technical exercise, and they provide a shared language between engineering, product, and operations teams for discussing acceptable performance trade-offs.
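A pipeline gate built on such an SLO can be very small. The sketch below computes a nearest-rank p99 and compares it against the objective; the 200ms checkout target echoes the example above, while the helper names and sample data are invented for illustration:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value at or above the p-quantile."""
    s = sorted(samples)
    rank = max(1, math.ceil(p * len(s)))
    return s[rank - 1]

def check_slo(latencies_ms, objective_ms=200.0, p=0.99):
    """Return (passed, observed) for an SLO like 'p99 latency <= 200ms'."""
    observed = percentile(latencies_ms, p)
    return observed <= objective_ms, observed

# Illustrative sample: 195 fast checkout requests plus five slow ones --
# enough slow requests to push the p99 past the objective.
samples = [120.0] * 195 + [450.0] * 5
ok, observed = check_slo(samples, objective_ms=200.0)
print(f"p99={observed:.0f}ms, SLO met: {ok}")
```

In a real pipeline the samples would come from a load-test run or from production telemetry, and a failed check would block the deployment rather than print a message.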
Performance testing in distributed systems also requires rethinking what constitutes a realistic test. The test environment must closely replicate production infrastructure – underpowered test environments produce misleading results, and the cost of discovering a performance problem in production is orders of magnitude higher than catching it earlier. Tests must validate auto-scaling behaviour under realistic traffic patterns, including spikes, sustained load, and the specific request mixes that characterise real usage. They must also examine interaction effects between services that share resources, as scaling one service can create bottlenecks in its dependencies – a failure mode invisible to any test that examines services in isolation.
Contract testing for APIs – validating that services adhere to agreed interfaces and performance characteristics – is becoming the backbone of distributed system quality. When dozens of teams independently deploy services that depend on each other, contract tests provide the automated verification that a change to one service does not silently degrade another. Without them, performance regressions propagate invisibly through the system until they manifest as user-facing problems that are difficult to diagnose and attribute.
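In practice teams use frameworks such as Pact for this; the underlying idea can be sketched as a set of consumer-declared expectations that the provider's build verifies before deployment. The field names and types below are hypothetical:

```python
# A consumer declares the fields (and types) it relies on; the provider's
# pipeline runs this check against its actual response shape, so a breaking
# change fails the provider's build instead of surfacing in production.
CHECKOUT_CONTRACT = {          # hypothetical consumer expectations
    "order_id": str,
    "total_cents": int,
    "currency": str,
}

def verify_contract(response, contract):
    """Return a list of violations (an empty list means the contract holds)."""
    violations = []
    for field, expected_type in contract.items():
        if field not in response:
            violations.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            violations.append(f"{field}: expected {expected_type.__name__}, "
                              f"got {type(response[field]).__name__}")
    return violations

good = {"order_id": "ord-1", "total_cents": 1999, "currency": "GBP"}
bad = {"order_id": "ord-1", "total_cents": "19.99"}  # wrong type, missing field
print(verify_contract(good, CHECKOUT_CONTRACT))
print(verify_contract(bad, CHECKOUT_CONTRACT))
```

Real contract frameworks add versioning, broker-mediated sharing of contracts between teams, and performance expectations alongside the structural ones, but the failure mode they prevent is the one this sketch shows: a provider change that silently breaks a consumer.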
The challenge for many organisations is that performance testing still operates as a pre-release activity, acting as a gate before deployment rather than a continuous discipline. In cloud-native environments where services are deployed multiple times per day, this cadence can be insufficient. Instead, performance validation must be embedded in the CI/CD pipeline, with automated checks running against every deployment and production monitoring providing continuous feedback on whether SLOs are being met.
Resilience Testing: From Edge Case to Core Discipline
If performance testing asks "does it work fast enough?", resilience testing asks "what happens when things go wrong?".
In distributed systems, partial failure is less of an edge case and more of a constant. Services become temporarily unavailable, network partitions occur, databases hit capacity limits, third-party dependencies time out, and container orchestrators reschedule workloads across nodes. The question isn’t whether these failures will happen but whether the system degrades gracefully when they do.
Chaos engineering – the practice of deliberately introducing failures into production-like environments to observe system behaviour – has evolved from a discipline pioneered by Netflix into a mainstream quality practice. Research on distributed systems found that organisations implementing chaos engineering principles achieve 89% higher reliability than those relying on traditional testing approaches and reduce mean time to recovery from 18 minutes to 2.5 minutes through automated remediation mechanisms. The same study found that comprehensive observability solutions detected potential failures an average of 17 minutes before they caused service impact and that organisations conducting quarterly recovery simulations achieved 89% successful recovery rates during actual incidents.
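At its simplest, fault injection is a wrapper that makes a dependency fail with a configured probability, letting a test assert that callers degrade gracefully rather than crash. The sketch below uses invented service names and rates; real chaos tooling injects faults at the network or platform layer rather than in application code:

```python
import random

def with_fault_injection(func, failure_rate, rng=None, exc=ConnectionError):
    """Wrap a dependency call so it fails with the given probability --
    a greatly simplified stand-in for what chaos tooling does."""
    rng = rng or random.Random()
    def wrapped(*args, **kwargs):
        if rng.random() < failure_rate:
            raise exc("injected fault")
        return func(*args, **kwargs)
    return wrapped

def get_recommendations(user_id):
    return ["item-1", "item-2"]          # stand-in for a real service call

flaky = with_fault_injection(get_recommendations, failure_rate=0.3,
                             rng=random.Random(7))

# The caller under test: does it degrade gracefully when the dependency fails?
def homepage(user_id):
    try:
        return {"recs": flaky(user_id), "degraded": False}
    except ConnectionError:
        return {"recs": [], "degraded": True}   # fallback: empty, not a 500

results = [homepage("u1") for _ in range(1000)]
degraded = sum(r["degraded"] for r in results)
print(f"{degraded}/1000 requests served in degraded mode, none crashed")
```

The experiment's assertion is not that failures never happen but that every request still returns a usable response – the definition of graceful degradation for this page.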
The value of chaos engineering lies in its ability to reveal failure modes that no amount of pre-production testing would uncover. These include:
- Cascading failures, where one service's degradation overwhelms its dependents through increased latency or error rates;
- Retry storms, where well-intentioned retry logic amplifies a transient failure into a system-wide overload. One case study documented a payment processor handling 126,000 transactions per minute that found limiting retries to 8% of normal traffic volume provided the optimal balance between resilience and system stability;
- Resource contention under multi-tenant workloads, where interference between unrelated requests causes performance degradation that is invisible to single-tenant testing;
- And timeout misconfiguration, which research shows causes 2.7 times more service degradations during periods of network instability.
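The retry-storm mitigation described in the payment-processor case study is often implemented as a retry budget: retries are permitted only while they stay below a fixed fraction of observed request volume. The sketch below is illustrative – the 8% ratio mirrors the case study's figure, but the class and its accounting are invented, and the right ratio is workload-specific:

```python
class RetryBudget:
    """Allow retries only while they remain under a fixed fraction of total
    request volume, so a transient failure cannot amplify into a storm."""
    def __init__(self, ratio=0.08):
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def can_retry(self):
        if self.retries < self.ratio * self.requests:
            self.retries += 1
            return True
        return False    # budget exhausted: fail fast instead of adding load

budget = RetryBudget(ratio=0.08)
for _ in range(1000):
    budget.record_request()

# A dependency outage makes 500 requests want to retry at once; the budget
# grants only 8% of the observed volume and rejects the rest.
granted = sum(budget.can_retry() for _ in range(500))
print(f"retries granted: {granted} of 500 requested")
```

The key property is that the cap scales with real traffic rather than being a per-request retry count, which is what turns "retry up to 3 times" into a 4x load multiplier during a full outage.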
For engineering leaders, the challenge is that chaos engineering requires a culture that treats controlled failure as a learning exercise rather than a risk. It requires investment in realistic staging environments (or carefully controlled production experiments) and in the observability infrastructure needed to analyse what happens when failures are injected. And it requires a maturity progression: teams typically begin with tabletop exercises (discussing what would happen if a service failed), move on to automated fault injection in staging environments, and eventually conduct controlled experiments in production with appropriate safeguards. Organisations that skip these maturity stages, or that attempt chaos engineering without adequate observability, risk creating the very outages they are trying to prevent.
Security Testing: The Expanded Attack Surface
The attack surface of cloud-native applications is broader than that of monolithic systems. Every API endpoint, container image, third-party dependency, infrastructure configuration, and service-to-service communication channel is a potential vector.
The Sysdig Cloud-Native Security and Usage Report found that 87% of container images have high or critical vulnerabilities, and that 66% of organisations have experienced security issues stemming from insecure container images. In addition, software supply chain attacks surged by 431% between 2021 and 2023, and the average modern application is reported to comprise 70-90% open-source components, each with its own dependency chain and vulnerability profile.
GitLab's 2024 Global DevSecOps Report found that security and DevSecOps platforms rank first and third among investment priorities for IT teams globally, that the average cost of a ransomware data breach has reached $4.91 million, and that over 85% of organisations now identify AI-related vulnerabilities as their fastest-growing cyber risk.
The integration of security into the development pipeline – DevSecOps – is the necessary response, but the scope of what "security testing" means in cloud-native environments has expanded significantly. It now encompasses static application security testing (SAST) running on every commit, software composition analysis (SCA) scanning every dependency for known vulnerabilities, dynamic application security testing (DAST) in staging environments, container image scanning before deployment, infrastructure-as-code scanning to catch misconfigured cloud resources, and secrets detection to prevent credentials being embedded in code or configuration.
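Secrets detection, to take one of these stages, is at heart pattern matching over source text. The sketch below shows the idea with two illustrative rules; production scanners such as gitleaks or detect-secrets ship far larger, tuned rule sets with entropy checks and allow-lists:

```python
import re

# Illustrative patterns only -- real scanners maintain hundreds of rules.
SECRET_PATTERNS = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "AWS access key ID"),
    (re.compile(r"(?i)(password|secret|api_key)\s*=\s*['\"][^'\"]{8,}['\"]"),
     "hard-coded credential"),
]

def scan_for_secrets(text):
    """Return (line_number, description) pairs for suspected secrets."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, description in SECRET_PATTERNS:
            if pattern.search(line):
                findings.append((lineno, description))
    return findings

snippet = (
    "db_host = 'db.internal'\n"
    "password = 'hunter2-prod-2024'\n"
    "key = 'AKIAABCDEFGHIJKLMNOP'\n"
)
for lineno, desc in scan_for_secrets(snippet):
    print(f"line {lineno}: {desc}")
```

Run on every commit, a check like this turns an embedded credential from a production incident into a failed build.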
The software supply chain dimension requires particular focus. Gartner reports that 60% of large enterprises are already deploying software supply chain security tools in 2025, with that figure projected to reach 85% by 2028. The EU Cyber Resilience Act's requirements on software security are driving organisations to generate and maintain Software Bills of Materials (SBOMs) – comprehensive inventories of every component in their software – and to implement automated policy enforcement that prevents non-compliant artefacts from reaching production.
AI adds a further layer. AI-powered applications introduce non-functional security concerns that traditional testing does not address, such as prompt injection attacks that manipulate model behaviour, data leakage through model outputs, adversarial inputs that cause unexpected behaviour, and the risk that AI systems may expose sensitive training data. These require specialised testing approaches, including red-teaming and adversarial evaluation, that most security teams are still developing. The convergence of AI-generated code (with its elevated vulnerability rates) and cloud-native deployment patterns creates a compounding risk that demands integrated security testing across the entire delivery pipeline.
The velocity challenge is significant. AI-assisted development is accelerating the rate at which code is produced and deployed. Security checks that take hours or require manual intervention become bottlenecks that teams can end up working around rather than through. Security testing must therefore be automated, fast, and embedded in the workflow, providing rapid feedback while maintaining the rigour necessary to catch the vulnerabilities that matter.
Observability: From Operations Concern to Quality Discipline
Observability has evolved into a core quality engineering discipline. The traditional monitoring approach – collecting metrics and alerting on thresholds – is insufficient for distributed systems, where the root cause of a performance issue, a reliability problem, or an unexpected behaviour pattern may span multiple services, infrastructure layers, and data flows.
Modern observability integrates three pillars – logs, metrics, and distributed traces – to provide the visibility required to understand system behaviour in production. Of these, distributed tracing is the most transformative for quality engineering. Tracing follows individual requests across service boundaries, providing a complete picture of how a request was processed, where time was spent, and where errors occurred. It transforms debugging from "something is slow" to "this specific request spent 800ms waiting for a database query in the inventory service, which was experiencing connection pool exhaustion due to a deployment in the payment service."
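Stripped of the tooling, a trace is just a set of timed spans that share a trace ID and record their parent span. The hand-rolled sketch below (service names and durations are invented) illustrates that data model; real systems use OpenTelemetry and propagate context between services via headers such as the W3C traceparent:

```python
import time
import uuid
from contextlib import contextmanager

spans = []   # a real tracer exports these to a backend instead

@contextmanager
def span(name, trace_id, parent=None):
    """Record a named, timed span belonging to one trace."""
    span_id = uuid.uuid4().hex[:8]
    start = time.perf_counter()
    try:
        yield span_id
    finally:
        spans.append({
            "trace_id": trace_id, "span_id": span_id, "parent": parent,
            "name": name, "duration_ms": (time.perf_counter() - start) * 1000,
        })

def inventory_service(trace_id, parent):
    with span("inventory.check", trace_id, parent):
        time.sleep(0.02)        # stand-in for the slow database query

def checkout_service(trace_id):
    with span("checkout.handle", trace_id) as sid:
        inventory_service(trace_id, parent=sid)

trace_id = uuid.uuid4().hex
checkout_service(trace_id)

# Because every span carries the same trace_id and its parent span,
# a backend can reassemble the request tree and show where time went.
for s in spans:
    print(f"{s['name']:<16} parent={s['parent']} {s['duration_ms']:.1f}ms")
```

The parent links are what turn "checkout was slow" into "checkout spent its time inside inventory.check" – the diagnostic leap described above.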
For quality teams, observability creates a feedback loop that fundamentally changes testing strategy. Production observability data reveals performance patterns, failure modes, and user experience issues that inform which tests to write, which scenarios to prioritise, and where investment in testing will have the greatest impact. Teams can analyse real traffic patterns to design more realistic performance tests, identify the failure modes that actually occur in production and build resilience tests that target those specific scenarios, and track SLO compliance over time, correlating changes in quality metrics with specific deployments or configuration changes.
This represents a shift in what quality means. In this cloud-native era, quality no longer stops at deployment but extends into production, where assumptions made during development are validated against real-world behaviour, and where the continuous stream of observability data becomes the most valuable input to the testing strategy. The organisations that have embraced this approach describe it as "shift-right" quality – complementing pre-production testing with production observability to achieve a level of quality confidence that neither approach could deliver alone.
The implication is that quality engineers require new skills. Engineers need to be able to read and act on observability data, such as understanding traces, interpreting latency distributions, correlating metrics with user experience, and distinguishing between normal variation and genuine degradation. This is a significant shift from traditional testing, where the system under test was typically a self-contained application running in a controlled environment with predictable behaviour.
Accessibility: The Compliance Dimension
The European Accessibility Act, which came into force in June 2025, places legal requirements on digital products and services sold in the EU market. This means that accessibility testing is no longer a nice-to-have but a compliance obligation for any organisation with European customers or users.
Yet accessibility remains one of the least mature areas of non-functional testing in most organisations. Automated accessibility scanning tools can catch a proportion of issues, such as missing alt text, insufficient colour contrast, incorrect ARIA attributes, or broken focus order, but meaningful accessibility validation still requires manual testing with assistive technologies, cognitive walkthrough of user journeys, and ideally testing by people with disabilities who can identify usability barriers that automated tools would miss.
In cloud-native architectures, the challenge is amplified by the pace of change. When services are deployed multiple times per day and the user interface draws on multiple independently deployed components, accessibility regressions can be introduced at any deployment.
Automated accessibility checks in the CI/CD pipeline can provide a baseline, catching the most common violations before they reach production, but they are insufficient on their own. Organisations need a layered approach: automated scanning in the pipeline, periodic manual audits of key user journeys, and a culture of accessibility awareness among developers and designers that prevents the most common barriers from being introduced in the first place.
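The automated layer of that approach can start very small. The sketch below flags img elements with no alt attribute using only the standard library; it deliberately leaves an empty alt="" alone, since that validly marks a decorative image. Real scanners such as axe-core cover far more of WCAG than this single rule:

```python
from html.parser import HTMLParser

class ImgAltChecker(HTMLParser):
    """Flag <img> tags with no alt attribute -- one of the simplest checks
    an automated accessibility scanner runs. An empty alt="" is allowed:
    it is the standard way to mark an image as decorative."""
    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        if tag == "img" and "alt" not in dict(attrs):
            self.violations.append(self.getpos())  # (line, column)

def check_alt_text(html):
    checker = ImgAltChecker()
    checker.feed(html)
    return checker.violations

page = """<html><body>
<img src="logo.png" alt="Company logo">
<img src="hero.png">
<img src="divider.png" alt="">
</body></html>"""

violations = check_alt_text(page)
print(f"{len(violations)} image(s) missing alt text at {violations}")
```

Wired into the pipeline against rendered page output, a check like this fails the build on the most common violation while leaving the judgement calls – is this alt text actually meaningful? – to the manual audits described above.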
The Skills and Organisational Challenge
The common thread across all five non-functional testing dimensions is that they require capabilities well beyond traditional functional testing. Quality engineers supporting cloud-native applications need understanding of microservices architecture, cloud platforms, container orchestration, infrastructure as code, security principles, observability tooling, performance engineering methodology, and accessibility standards. Non-functional testing in cloud-native environments inherently becomes a cross-functional discipline.
Risk-based testing becomes essential when the service landscape is too large for exhaustive coverage. Teams must prioritise based on business impact, usage patterns, and regulatory sensitivity, focusing performance testing on the highest-traffic paths, resilience testing on the most critical services, and security testing on the most exposed components. This prioritisation requires both technical knowledge and business context to provide an understanding of which services support revenue-critical transactions, which handle regulated data, and which have the most complex dependency chains.
The World Quality Report's finding that 50% of organisations lack AI/ML expertise applies equally to distributed systems testing expertise. The skills required to validate modern architectures have evolved faster than most organisations' training and hiring strategies. Engineering leaders face a choice: invest in upskilling existing teams (slower but builds institutional knowledge), hire specialists (faster but creates dependency on scarce talent), or partner with organisations that bring cloud-native testing maturity (pragmatic but requires careful knowledge transfer).
Non-functional testing can no longer be treated as a specialist activity performed by a separate team before major releases. In cloud-native environments, performance, security, resilience, observability, and accessibility must be embedded throughout the development lifecycle – in the CI/CD pipeline, in production monitoring, in the design discussions that happen before a line of code is written. The organisations that achieve this integration will be those that deliver cloud-native applications with the quality, security, and reliability that their users and regulators demand.


