Level Two: JSON over HTTP

In the Level One post, I covered some of the core practices related to JSON APIs. For part two, we're diving a little deeper into some of the trickier topics surrounding API design and starting to think more about security and performance.


Idempotent endpoints

If a request is idempotent, it can be safely executed multiple times with the same expected result. Making the same request twice leaves the server in the same state as making it once, so repeating it causes no additional side effects.

There are standard expectations in the HTTP specification around the behaviour of HTTP methods that you need to adhere to when you're designing your API.

HTTP GET and HEAD requests should always be idempotent. In fact, the specification defines them as "safe" methods: clients can call the endpoint as many times as they like and expect no side effects at all. These requests should not modify resources in any way.

HTTP PUT requests should also be idempotent. Since a PUT request replaces the object, sending the same request again will just replace the object in the same way.

POST requests are not expected to be idempotent. Two successive POST requests should create two different objects with different identifiers. If there are specific uniqueness requirements, the second POST request might even fail.
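To make the difference concrete, here is a minimal in-memory sketch. The `InMemoryStore` class and its `put` and `post` methods are hypothetical and stand in for real handlers; the point is only that repeating a PUT leaves the store unchanged, while repeating a POST creates a second object.

```python
import itertools
from typing import Any


class InMemoryStore:
    """Hypothetical resource store used only to illustrate method semantics."""

    def __init__(self) -> None:
        self.items: dict[str, dict[str, Any]] = {}
        self._ids = itertools.count(1)

    def put(self, item_id: str, body: dict[str, Any]) -> dict[str, Any]:
        # PUT replaces the whole resource at a known identifier, so repeating
        # it with the same body leaves the store in the same state.
        self.items[item_id] = dict(body)
        return self.items[item_id]

    def post(self, body: dict[str, Any]) -> str:
        # POST creates a new resource; each call allocates a fresh identifier,
        # so repeating it is not idempotent.
        item_id = str(next(self._ids))
        self.items[item_id] = dict(body)
        return item_id


store = InMemoryStore()
store.put("42", {"name": "Ada"})
store.put("42", {"name": "Ada"})
print(len(store.items))  # 1: the second PUT changed nothing

first = store.post({"name": "Ada"})
second = store.post({"name": "Ada"})
print(first != second, len(store.items))  # True 3
```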

The idempotent nature of an endpoint helps us understand how and where we can cache the response. Spec-compliant behaviour makes it easier for developers to use standardised tooling to consume your API.

The stateless nature of APIs

In the context of APIs, being stateless means that a particular request doesn't require any other request or piece of knowledge (on the server) to be successful. Services serving APIs shouldn't use sessions or any other mechanism to manage state between requests. Any state needs to be managed by the client, and all the information needed to satisfy a request should be sent with the request.

Keeping our API stateless means that the backend service no longer needs to retrieve the application state from a session store before serving a request. Because any instance can serve any request, we can scale horizontally behind a load balancer without sticky sessions, which improves both performance and scalability.

If your design calls for scenarios where you need to understand what was previously requested, it might be best to create a "request" object that can be queried later.

POST /api/v1/ride-request   {"source": ..., "destination": ..., "scheduled_time": ...}
GET  /api/v1/ride-request/:id
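A minimal sketch of this pattern follows, assuming a module-level dictionary in place of a real shared database. The handler names `create_ride_request` and `get_ride_request`, the `REQUIRED_FIELDS` check and the generated `id` field are hypothetical. The GET handler needs nothing but the ID from the URL, so any server instance with access to the shared store can answer it.

```python
import uuid
from typing import Any

# Stand-in for a shared database. In production every API instance would
# read and write the same store, not process-local memory.
RIDE_REQUESTS: dict[str, dict[str, Any]] = {}

REQUIRED_FIELDS = ("source", "destination", "scheduled_time")


def create_ride_request(body: dict[str, Any]) -> tuple[int, dict[str, Any]]:
    """Handle POST /api/v1/ride-request by persisting the request as a resource."""
    missing = [field for field in REQUIRED_FIELDS if field not in body]
    if missing:
        return 400, {"error": "missing fields: " + ", ".join(missing)}
    ride_id = str(uuid.uuid4())
    RIDE_REQUESTS[ride_id] = {"id": ride_id, **{f: body[f] for f in REQUIRED_FIELDS}}
    return 201, RIDE_REQUESTS[ride_id]


def get_ride_request(ride_id: str) -> tuple[int, dict[str, Any]]:
    """Handle GET /api/v1/ride-request/:id using only the ID from the URL."""
    ride = RIDE_REQUESTS.get(ride_id)
    if ride is None:
        return 404, {"error": "not found"}
    return 200, ride
```

The client keeps the returned `id` and can query the request later from any session or device, without the server remembering anything about the caller.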

It is common in traditional web applications to use cookies and server-side sessions to manage application state. When building an API, it's important to avoid this pattern to maintain a stateless architecture. Modern third-party cookie policies also mean that, unless your API is a BFF served from the same domain as its frontend, you can no longer rely on web browsers sending cookies with requests to your service.

Optimisations for Backends for Frontends

When you're building an API for a platform service, or an API that's going to be consumed by many clients, it's strongly recommended to keep to a resource-driven API layout. This means following the basic principles of REST and creating separate endpoints for each resource.

When creating an API for a specific frontend you can do things a little differently.

Because of their nature, Backends for Frontends (BFFs) should be optimised for the needs of the frontend they are paired with. The goal of a BFF is to help the frontend be as performant as possible. Your design should optimise for the following:

  1. Create endpoints that help to reduce the number of frontend requests.
  2. Create composite responses to save the frontend from having to make additional requests.
  3. Optimise responses by returning only the information that you know the frontend will use.
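As a rough sketch of points 2 and 3, assume two hypothetical upstream services, `fetch_profile` and `fetch_orders`, stubbed here with fixed data. A BFF handler can call both concurrently and return a single trimmed payload shaped for one screen:

```python
import asyncio
from typing import Any


async def fetch_profile(user_id: str) -> dict[str, Any]:
    # Hypothetical upstream call; a real BFF would make an HTTP request here.
    await asyncio.sleep(0)
    return {"id": user_id, "name": "Ada", "email": "ada@example.com", "created": "2020-01-01"}


async def fetch_orders(user_id: str) -> list[dict[str, Any]]:
    # Hypothetical upstream call returning full order records.
    await asyncio.sleep(0)
    return [
        {"id": "o1", "status": "shipped", "total": 25.0, "warehouse": "W1"},
        {"id": "o2", "status": "pending", "total": 10.0, "warehouse": "W2"},
    ]


async def dashboard(user_id: str) -> dict[str, Any]:
    """Composite response: one frontend request fans out to two upstream calls."""
    profile, orders = await asyncio.gather(fetch_profile(user_id), fetch_orders(user_id))
    # Return only the fields this particular screen renders.
    return {
        "name": profile["name"],
        "orders": [{"id": order["id"], "status": order["status"]} for order in orders],
    }


print(asyncio.run(dashboard("u1")))
```

If the frontend later needs an order total, the field is added here at that point, not in advance.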

Rather than being built to serve multiple clients, your BFF should evolve with the frontend that it's serving. Ideally, this means that the frontend and BFF are being managed by the same team, or teams that work closely together. A new field should only be added to the BFF response when it's needed by the frontend.

Importantly, BFFs should be built to serve a single frontend application. Breaking this rule means we quickly lose the benefits of a BFF as its responses become more generic in order to serve multiple UIs.


In most systems, after a user has authenticated, the client holds a token that represents the user in future requests. There are many ways to transmit this token to the server; the correct way, as described in RFC 7235, is to use the HTTP Authorization header.

When the token is issued by an authorisation server (i.e. one issued via an OAuth request), we call it a "Bearer token" and prefix it with the word "Bearer":

Authorization: Bearer <token>
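On the server side, extracting the token is a small parsing step. Here is a minimal sketch, assuming your framework gives you the raw header value as a string; `parse_bearer_token` is a hypothetical helper. The scheme is matched case-insensitively because HTTP authentication scheme names are case-insensitive.

```python
from typing import Optional


def parse_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' value, else None."""
    if not authorization:
        return None
    scheme, _, token = authorization.strip().partition(" ")
    # Authentication scheme names are case-insensitive, so "bearer" is accepted too.
    if scheme.lower() != "bearer":
        return None
    token = token.strip()
    return token or None


print(parse_bearer_token("Bearer abc.def.ghi"))
```

When this returns `None` for a protected endpoint, the usual response is 401 Unauthorized with a `WWW-Authenticate: Bearer` header.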

This pattern applies equally to opaque tokens and JWTs. The "Bearer token" terminology comes from the OAuth 2.0 Bearer Token Usage specification (RFC 6750), but it is often used outside the context of OAuth.

Some APIs may use API keys or other credentials as query parameters or in the POST body. If you're designing an API from scratch there are very few reasons to use this pattern. Including the token as a query parameter increases the risk that the token will be logged or accidentally shared by the user.

Bot protection and rate limiting

If you require rate limiting on your endpoint, rate limit based on the number of requests made with a specific token. IP address-based solutions offer little to no value, as they can be easily bypassed by even amateur attackers.

Implementing rate limiting normally means keeping a leaky-bucket-style record of requests in a globally accessible data store. Data stores that support expiring entries, like Redis, help to simplify this. Before writing your own rate limiting code, look at the solutions provided by your web framework or cloud provider (e.g. Azure API Management or AWS API Gateway).
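To show the shape of the algorithm, here is a minimal leaky-bucket sketch keyed by token. It keeps its state in a process-local dictionary purely for illustration; a real deployment would hold this state in a shared store such as Redis, or hand the job to a managed gateway. The `LeakyBucketLimiter` class and its parameters are hypothetical, and the clock is injectable so the behaviour can be tested.

```python
import time
from typing import Callable


class LeakyBucketLimiter:
    """Leaky bucket as a meter: each request adds 1, the bucket drains at a fixed rate."""

    def __init__(self, capacity: float, leak_rate_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.leak_rate = leak_rate_per_sec
        self.clock = clock
        # token -> (current bucket level, time of last update); process-local here
        self.buckets: dict[str, tuple[float, float]] = {}

    def allow(self, token: str) -> bool:
        """Return True if this token's request fits in its bucket, else False."""
        now = self.clock()
        level, last = self.buckets.get(token, (0.0, now))
        # Drain whatever has leaked out since this token's last request.
        level = max(0.0, level - (now - last) * self.leak_rate)
        if level + 1 > self.capacity:
            self.buckets[token] = (level, now)
            return False
        self.buckets[token] = (level + 1, now)
        return True
```

A `False` from `allow` is the point at which the handler should stop and return a 429.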

From a design perspective, consider returning x-ratelimit-limit and x-ratelimit-remaining headers to let consumers know when they need to slow down. Others to consider include x-ratelimit-reset, to tell the consumer how many seconds remain before the rate limit resets, and x-ratelimit-replenish-rate, to convey how quickly their allowance is restored. Whether you include these headers will depend on the type of rate limiting you implement.

If a client has been rate limited, return a 429 Too Many Requests response.
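Here is a small sketch of how a handler might turn a limiter's decision into a response. It assumes the limiter reports whether the request is allowed, the limit, the remaining allowance and the seconds until reset; `rate_limit_response` and its arguments are hypothetical. It follows the common `x-ratelimit-*` header naming and adds the standard `Retry-After` header, which tells a client how long to wait after a 429.

```python
def rate_limit_response(allowed: bool, limit: int, remaining: int,
                        reset_seconds: int) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request that has been checked against a limit."""
    headers = {
        "x-ratelimit-limit": str(limit),
        "x-ratelimit-remaining": str(max(0, remaining)),
        "x-ratelimit-reset": str(reset_seconds),
    }
    if not allowed:
        # Retry-After in delay-seconds tells the client when to try again.
        headers["retry-after"] = str(reset_seconds)
        return 429, headers
    # 200 here just means "pass the request on to the normal handler".
    return 200, headers
```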

BFFs that are consumed by a browser-based frontend can use bot protection such as reCAPTCHA or more advanced services like Kasada. These services require JavaScript to run in the browser, plus a server-side check of the token it produces. There are privacy implications any time you introduce a third-party service like this: your customers' data and device fingerprints are shared with the service that decides whether they are a bot.

Make sure everything to do with rate limiting and bot protection is documented clearly so that your consumers know what to expect.


Cross-Origin Resource Sharing (CORS) is a mechanism that allows web browsers to make HTTP requests to resources on another origin. If your API is intended to be consumed directly from a browser, you will have to get your CORS setup right.

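For requests that aren't "simple" (which includes most JSON API calls, since sending a Content-Type of application/json or an Authorization header triggers it), the browser will first make an OPTIONS request, called a "preflight request", to your endpoint. The preflight response needs to return the CORS headers for the browser to validate, and the actual response must also include Access-Control-Allow-Origin.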

For public consumption, your API will return Access-Control-Allow-Origin: *, Access-Control-Allow-Headers: ... (listing the request headers, such as Authorization or Content-Type, that clients are allowed to send) and Access-Control-Allow-Methods: ... (listing the methods that will be used).

For BFFs, your Access-Control-Allow-Origin header should contain only the origin of your frontend application. This, along with the same-origin policies in modern browsers, helps prevent attacks that rely on making requests across domains from a browser.
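Here is a minimal sketch of that header logic for a BFF. It assumes a single allowed frontend origin (`https://app.example.com` is a placeholder) and a framework that gives you the request method and `Origin` header as plain values; `cors_headers` and the constants are hypothetical. Echoing the specific origin rather than `*`, and adding `Vary: Origin`, stops shared caches from serving a response built for one origin to another.

```python
from typing import Optional

ALLOWED_ORIGIN = "https://app.example.com"  # placeholder frontend origin
ALLOWED_METHODS = "GET, POST, PUT, DELETE"
ALLOWED_HEADERS = "Authorization, Content-Type"


def cors_headers(method: str, origin: Optional[str]) -> dict[str, str]:
    """Build CORS response headers for a BFF that serves exactly one frontend origin."""
    headers = {"Vary": "Origin"}
    if origin != ALLOWED_ORIGIN:
        # No Access-Control-Allow-Origin: the browser won't expose the response to the page.
        return headers
    headers["Access-Control-Allow-Origin"] = origin
    if method.upper() == "OPTIONS":
        # Preflight: list the methods and request headers the frontend may use.
        headers["Access-Control-Allow-Methods"] = ALLOWED_METHODS
        headers["Access-Control-Allow-Headers"] = ALLOWED_HEADERS
        # Let the browser reuse this preflight result for 10 minutes.
        headers["Access-Control-Max-Age"] = "600"
    return headers
```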

If you're building an API that will be consumed by a web browser, the MDN guide on CORS is required reading.

Server Side Caching

You need to consider two places where data can be cached when designing your API: server side and client side.

On the server side you can use fast-access data stores like Redis or Memcached to store data that is expensive to retrieve from the database. This allows you to quickly handle requests for the same information.

Server-side caching works by checking whether a copy of the data exists in the cache before going to the data source (e.g. a database or external service) to get it. If you do have to go to the data source, keep a copy of the response in your cache for next time. Each cache entry is given an expiry, which should make sense for the piece of information being stored.
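This check-then-load flow is usually called the cache-aside pattern. Here is a minimal sketch with an in-memory `TTLCache` standing in for Redis or Memcached. The class, the `get_or_load` helper and the injectable clock are hypothetical and exist only for illustration.

```python
import time
from typing import Any, Callable


class TTLCache:
    """In-memory stand-in for Redis/Memcached with a per-entry expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self._data: dict[str, tuple[Any, float]] = {}

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired entries are treated as a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (value, self.clock() + ttl_seconds)


def get_or_load(cache: TTLCache, key: str, loader: Callable[[], Any], ttl_seconds: float) -> Any:
    """Cache-aside: try the cache, fall back to the data source, then store the result."""
    value = cache.get(key)
    if value is None:  # note: this sketch cannot cache a genuine None value
        value = loader()
        cache.set(key, value, ttl_seconds)
    return value
```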

Be very careful when caching data that's tied directly to a specific user. If you decide to do this, make sure that the user's id or the token being used is included in the cache key. You don't want to accidentally serve someone else's details to a user.
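One way to enforce this is to build every per-user key through a single helper, so a key can never be created without an identity. The sketch below assumes you key on the user ID when you have one and fall back to the token otherwise; `user_cache_key` is hypothetical. Hashing the token with SHA-256 keeps raw credentials out of the cache's key space.

```python
import hashlib
from typing import Optional


def user_cache_key(resource: str, user_id: Optional[str] = None,
                   token: Optional[str] = None) -> str:
    """Build a cache key that is always scoped to a single user."""
    if user_id:
        scope = f"user:{user_id}"
    elif token:
        # Hash the token so raw credentials never appear in cache keys.
        scope = "token:" + hashlib.sha256(token.encode("utf-8")).hexdigest()
    else:
        # Refuse to build a per-user key without an identity rather than share one.
        raise ValueError("per-user cache keys need a user_id or a token")
    return f"{resource}:{scope}"
```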

Sometimes you might want to implement server side cache warming. This is when you pre-cache data on your server to ensure that none of your users have to suffer through a slow "first hit". Implementing cache warming could be a sign that you have other performance issues in your application's design. Think about those before reaching for a tool like this.

Another tool that you might use is "cache busting". This is when you use a special flag to bypass the cache (and optionally recache the response). You'll often use this to invalidate a cached response after a change has been made, or to give a user or administrator a way to bypass the cache.
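Here is a minimal sketch of a bypass flag. It assumes a plain dictionary as the cache (expiry is left out to keep the focus on the bypass) and hypothetical `bust` and `recache` parameters, which in practice might be driven by an admin-only query parameter or called after a write.

```python
from typing import Any, Callable


def fetch(cache: dict[str, Any], key: str, loader: Callable[[], Any],
          bust: bool = False, recache: bool = True) -> Any:
    """Read through the cache unless `bust` is set; optionally refresh the cached copy."""
    if not bust and key in cache:
        return cache[key]
    value = loader()
    # Normal misses are always stored; busted reads are stored only if recache is set.
    if not bust or recache:
        cache[key] = value
    return value
```

If the goal is simply to invalidate an entry after a change, deleting the key (`cache.pop(key, None)`) is enough, and the next normal read will repopulate it.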

Client Side Caching

Your backend service has a few ways to suggest that the client cache the responses it returns.

The Cache-Control header is how the server indicates to the client whether they should cache the response. Specifying a max-age in the header hints to the client that the response can be stored and reused for that period of time.

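The slightly confusing no-cache directive actually means that clients can store the response but must revalidate it with the server before reusing it. Use no-store if your aim is to prevent caching entirely; max-age=0 only marks the response as immediately stale, which has a similar effect to no-cache.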

Including an ETag header in the response gives the client a validator for that version of the response. On subsequent requests to the same resource, the client sends an If-None-Match header containing the ETag value. On the backend, your service checks whether the current response would have the same ETag value and, if so, returns a 304 Not Modified response with no body instead of the full response.

ETag / If-None-Match allows us to save bandwidth by not returning large response payloads when the client already has them. Your backend service needs to either calculate the ETag value consistently from the response payload, or store the value and update it whenever the response changes.
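Here is a minimal sketch of the server side, assuming the ETag is derived from a SHA-256 hash of the serialised JSON body, which gives a consistently calculated, strong validator. `conditional_json_response` is a hypothetical helper. It pairs the ETag with `Cache-Control: no-cache`, so clients may store the response but must revalidate it. Weak (`W/`) tags sent by the client are compared by value, as If-None-Match uses weak comparison.

```python
import hashlib
import json
from typing import Any, Optional


def conditional_json_response(payload: Any,
                              if_none_match: Optional[str]) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body), answering 304 when the client's ETag still matches."""
    # sort_keys and fixed separators make the bytes, and therefore the ETag, deterministic.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match:
        # If-None-Match may carry several comma-separated tags, or "*".
        candidates = {tag.strip().removeprefix("W/") for tag in if_none_match.split(",")}
        if etag in candidates or "*" in candidates:
            return 304, headers, b""  # Not Modified: no body is sent
    headers["Content-Type"] = "application/json"
    return 200, headers, body
```

Hashing the body is only one option. Storing a version number per resource avoids serialising the payload just to compare tags, at the cost of having to update it on every change.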

It's important to remember that clients can do anything they want. Just because you implement ETag and Cache-Control headers doesn't mean that your clients have to use them.

Further Reading