Evaluate API Performance with k6 Load Testing

I've been tasked with load testing a new API core with k6, a tool that uses JavaScript-based scripts to define and execute load scenarios against APIs. It allows requests, traffic patterns, and assertions to be described in code, then runs them at scale while producing metrics such as latency, throughput, and error rates.

Metrics don't mean much in isolation; they only make sense once you understand how load is being applied and how the system is being pushed.
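To make the rest of the discussion concrete, this is roughly what a minimal k6 script looks like. The endpoint URL and numbers are placeholders, not values from an actual test:

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

// Baseline configuration: 10 virtual users looping for 30 seconds.
export const options = {
  vus: 10,
  duration: '30s',
};

// Each virtual user runs this function in a loop.
export default function () {
  // Placeholder endpoint; replace with the API under test.
  const res = http.get('https://test.k6.io/');
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
  sleep(1); // think time between iterations
}
```

Running it with `k6 run script.js` produces the latency, throughput, and error-rate metrics discussed below.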

Load testing vs stress testing

The terms load testing and stress testing are often used interchangeably, but they describe different test conditions.

Load testing targets expected or near-expected traffic. The goal is to verify stability and performance under realistic usage. It answers whether the system behaves correctly within its intended operating range.

Stress testing goes beyond that range. The system is deliberately pushed past its capacity to observe failure modes. The focus is not correctness, but how and where the system breaks.

Open vs closed load models

Load models define how traffic is generated and how system capacity should be interpreted under load.

A closed model uses a fixed number of virtual users, where each user runs in a loop sending requests. The key property is that total load is indirectly controlled by response time. If the system becomes slower, each virtual user spends more time waiting for responses, which naturally reduces the overall request rate. This makes it useful for answering: how many concurrent users can the system handle before latency and stability degrade?
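In k6, the closed model corresponds to VU-based executors such as constant-vus. A sketch, with illustrative scenario name and values:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  scenarios: {
    // Closed model: total load is bounded by how fast 50 looping
    // virtual users can complete their iterations.
    closed_model: {
      executor: 'constant-vus',
      vus: 50,
      duration: '5m',
    },
  },
};

export default function () {
  http.get('https://test.k6.io/'); // placeholder endpoint
  sleep(1); // if responses slow down, iterations (and load) slow down too
}
```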

An open model fixes the request rate instead of the number of users. Requests are generated at a constant pace regardless of system performance. If the system slows down, traffic does not self-adjust; instead, requests start to queue, time out, or fail. This makes it useful for answering: what sustained request rate can the system handle while still meeting an SLA?
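The open model maps to arrival-rate executors such as constant-arrival-rate, where the request rate is fixed and k6 allocates VUs as needed to sustain it. The rate and VU limits below are illustrative:

```javascript
import http from 'k6/http';

export const options = {
  scenarios: {
    // Open model: 100 iterations/s are started regardless of
    // how quickly the system responds.
    open_model: {
      executor: 'constant-arrival-rate',
      rate: 100,
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 50, // VUs k6 keeps ready to sustain the rate
      maxVUs: 200,         // hard cap if the system slows down
    },
  },
};

export default function () {
  http.get('https://test.k6.io/'); // placeholder endpoint
}
```

If the system cannot keep up and even maxVUs is exhausted, k6 starts dropping iterations, which is itself a signal that the target rate exceeds capacity.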

Test parameters and scenario design

In load testing, traffic is usually defined through scenarios, which describe how requests are generated over time. Scenarios control not just how much load is applied, but how it evolves during the test.

Key parameters inside scenarios include concurrency, request rate, duration, and workload mix. Each of these affects which part of the system becomes a bottleneck first.

A constant scenario applies a fixed level of traffic for the entire test duration. It is used to observe steady-state behavior after the system has warmed up, where metrics like latency and throughput stabilize under consistent load.

A ramping scenario increases load gradually over time. It is used to identify saturation points, where latency starts increasing or throughput stops scaling. This type of scenario is more useful for understanding system limits than steady-state performance.
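A ramping scenario can be expressed with the ramping-vus executor and a list of stages. The stage durations and targets here are illustrative, not a recommended profile:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  scenarios: {
    ramp: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 50 },  // ramp up to expected load
        { duration: '5m', target: 50 },  // hold steady state
        { duration: '2m', target: 200 }, // push toward saturation
        { duration: '1m', target: 0 },   // ramp down
      ],
    },
  },
};

export default function () {
  http.get('https://test.k6.io/'); // placeholder endpoint
  sleep(1);
}
```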

What to look at in a test

The main focus in load testing is not individual metrics, but how the system behaves as load increases.

Key signals to watch:

  • Latency (p95, p99)
    p95 shows typical user experience under load, while p99 highlights edge cases and early instability.

  • Throughput (requests/sec)
    Indicates actual processing capacity. A plateau despite increasing load usually signals saturation.

  • Error rate
    Often rises after latency increases and throughput flattens, making it a late indicator of overload.

  • Stability over time
    Whether metrics hold steady or fluctuate heavily over time; sustained fluctuation can indicate unstable rather than stable saturation.
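These signals can be encoded as k6 thresholds so a run fails automatically when they are violated. This is an options fragment; the limits are illustrative, not recommendations:

```javascript
export const options = {
  thresholds: {
    // Latency: p95 tracks typical experience, p99 tracks the tail.
    http_req_duration: ['p(95)<500', 'p(99)<1500'], // milliseconds
    // Error rate: fail the run if more than 1% of requests fail.
    http_req_failed: ['rate<0.01'],
  },
};
```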

Failure and error behavior

k6 separates failures into transport-level and application-level categories.

Transport-level failures (http_req_failed) occur when the HTTP request does not complete successfully at the network or protocol level. This includes timeouts, connection resets, DNS failures, TLS issues, or cases where no valid HTTP response is received. One nuance: by default k6 also marks responses with status codes outside the 200-399 range as failed in this metric (the expected range can be changed with http.setResponseCallback), so http_req_failed is not strictly independent of status codes unless that default is adjusted.

Application-level failures are defined through response validation such as custom checks on response content or status conditions. These represent cases where the request completes successfully at the HTTP level, but the application returns an invalid or failed result.

These two can diverge depending on where the system breaks under load.

One important caveat is that what appears to be a "business logic failure" can sometimes indirectly lead to transport-level failures if it causes the server or downstream dependencies to drop connections, reset sockets, or fail to respond within the allowed time window. In these cases, k6 would still count it as an http_req_failed failure.
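A sketch of how both categories show up in one script. The status and body checks are hypothetical examples of application-level validation; the transport-level metric is collected automatically:

```javascript
import http from 'k6/http';
import { check } from 'k6';

export default function () {
  // Transport-level failures (timeouts, resets, no valid response)
  // are recorded automatically in the http_req_failed metric.
  const res = http.get('https://test.k6.io/', { timeout: '5s' });

  // Application-level validation: the request may succeed at the
  // HTTP level while the response content is still wrong. Results
  // are aggregated in the separate `checks` metric.
  check(res, {
    'status is 200': (r) => r.status === 200,
    'body is non-empty': (r) => r.body && r.body.length > 0,
  });
}
```

Comparing http_req_failed against the checks metric in a run's summary shows which side of the system is breaking first.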