Deep Dive into JWTs

Published on September 22, 2022

Have you had to use, validate, request or generate JWTs for authorisation in your application? Ever wondered what's going on inside that otherwise opaque-looking token? Let's go on a little journey together and deep dive into the wonderful world of JSON Web Tokens.

Spotting and Decoding JWTs

Once you know what to look for, you'll see JWTs in the wild all the time. The basic structure, as outlined in RFC 7519 for a JWT secured with a JSON Web Signature (JWS), is as follows:

<header>.<payload>.<signature>

The header and payload portions are base64url-encoded JSON objects, with the trailing = padding removed. The signature is the base64url-encoded output of the signing algorithm named in the header, computed over the encoded header and payload.

Because of the way that they are encoded, and because the header object always follows the same structure, there are two key giveaways that you're looking at a JWT:

The token has two . characters in it, separating the parts
The token starts with ey

Point #1 is explained by the structure defined in the spec, but why the ey rule?

The header is a "JOSE" (JSON Object Signing and Encryption) header, which must be a valid JSON object. JSON objects are wrapped in { and use " to delimit keys, so once minified the header will always begin with {", which becomes ey when base64url encoded.
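A quick way to convince yourself of the ey rule is to base64url-encode the opening bytes of a minified header. The sketch below also turns both giveaways into a rough check; looks_like_jwt is a hypothetical helper, and passing it says nothing about whether a token is valid or trustworthy.

```python
import base64


def looks_like_jwt(candidate: str) -> bool:
    """Rough heuristic only: three dot-separated parts and a header starting with 'ey'."""
    return candidate.count(".") == 2 and candidate.startswith("ey")


# '{"' is how every minified JSON header starts...
print(base64.urlsafe_b64encode(b'{"'))                # b'eyI='
# ...so a real header such as {"alg":"HS256"} keeps the 'ey' prefix.
print(base64.urlsafe_b64encode(b'{"alg":"HS256"}'))  # b'eyJhbGciOiJIUzI1NiJ9'
```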

There are a number of valid keys defined in the spec for the header object (strictly speaking these are "header parameters"; the term "claims" belongs to the payload), but in practice you're likely to only see a few:

alg must be present in all tokens. It names the signing algorithm used for the token. Although it is provided, you should not rely on this value to decide how to verify the token (more on that later).
typ can be supplied to indicate the token type.
kid is typically provided when there are multiple keys that could have been used to sign the token; it helps the verifier decide which key to verify with.
jku (a URL pointing to a JWK Set) and jwk (the public key itself) give details about the key used to sign the token.

There are other keys defined in the spec but they are used relatively infrequently.

Actually decoding the token is a straightforward process: split the token into three parts, then reverse the base64url encoding on the header and payload (adding back any = padding your decoder needs). Once some formatting is applied you will go from this:

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c

To this:

{
  "alg": "HS256",
  "typ": "JWT"
}.{
  "sub": "1234567890",
  "name": "John Doe",
  "iat": 1516239022
}.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
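In Python, the decoding steps described above might look like this minimal sketch using only the standard library (b64url_decode and decode_jwt are hypothetical helper names). It performs no verification at all, so it's only suitable for inspecting tokens:

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode one base64url segment, restoring the '=' padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def decode_jwt(token: str):
    """Split a compact JWT and decode its header and payload. Does NOT verify anything."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    # The signature is binary, so it is left in its encoded form.
    return header, payload, signature_b64


token = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
    ".eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ"
    ".SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
)
header, payload, signature = decode_jwt(token)
print(header)   # {'alg': 'HS256', 'typ': 'JWT'}
print(payload)  # {'sub': '1234567890', 'name': 'John Doe', 'iat': 1516239022}
```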

A more complex token, in this example issued by Auth0, might look like this:

eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6IkFCblFyemhSdWxDZXgyUUNJelZfdyJ9.eyJuaWNrbmFtZSI6Im1lIiwibmFtZSI6Im1lQGV4YW1wbGUub3JnIiwicGljdHVyZSI6Imh0dHBzOi8vcy5ncmF2YXRhci5jb20vYXZhdGFyL2NkMTE5MjMyODRmYzBmOTA0YzQ3MzJiYjhmN2Q3ZTNjP3M9NDgwJnI9cGcmZD1odHRwcyUzQSUyRiUyRmNkbi5hdXRoMC5jb20lMkZhdmF0YXJzJTJGbWUucG5nIiwidXBkYXRlZF9hdCI6IjIwMjEtMDktMjNUMjI6NTc6NDEuNTMwWiIsImVtYWlsIjoibWVAZXhhbXBsZS5vcmciLCJlbWFpbF92ZXJpZmllZCI6ZmFsc2UsImlzcyI6Imh0dHBzOi8vZGV2LW8zbWkyNnAyLmF1LmF1dGgwLmNvbS8iLCJzdWIiOiJhdXRoMHw2MTRkMDY2NTI4Y2JkNTAwNjllMTNlMzkiLCJhdWQiOiJYWThyYjRIU01tODNSODB6Y0dXTGxGNlpjYTFvWnk4RyIsImlhdCI6MTYzMjQzNzg2NSwiZXhwIjoxNjMyNDczODY1fQ.5rWkDXfq-v8KkAfY91T2A7-xlIwXpLSoSz5aCFcGD8pfATSagrVYBCHNkNJ7h2sPZstn2LIfPg5EdLHPA2fm6CA3fl3Ba4Nbv2RHLAypjNAGqjh267jXzSbw-pPDRaXKKtaIFyy4GkSkD1KZ6fA1NEfPX13SbTDgpCXkvzQJSzZ_ELrNoCjOBH2wSgwwDA44qhTFI6FTm_-5-IZQF6XFlUKFzo9ZZYGzD6CbnIYTcRWm7Y3vJr4zgmdw7GAHRS9gY0yDs2Br1K2zg8DLNYrSAnR9WgWaZGCYF8xTHIseiFxBxI9koF2EIh3Z9iLuLgJH7pDQgCng3P46wEsVx8U1cA

And decode into this:

{
  "alg": "RS256",
  "typ": "JWT",
  "kid": "ABnQrzhRulCex2QCIzV_w"
}.{
  "nickname": "me",
  "name": "me@example.org",
  "picture": "https://s.gravatar.com/avatar/cd11923284fc0f904c4732bb8f7d7e3c?s=480&r=pg&d=https%3A%2F%2Fcdn.auth0.com%2Favatars%2Fme.png",
  "updated_at": "2021-09-23T22:57:41.530Z",
  "email": "me@example.org",
  "email_verified": false,
  "iss": "https://dev-o3mi26p2.au.auth0.com/",
  "sub": "auth0|614d066528cbd50069e13e39",
  "aud": "XY8rb4HSMm83R80zcGWLlF6Zca1oZy8G",
  "iat": 1632437865,
  "exp": 1632473865
}.5rWkDXfq-v8KkAfY91T2A7-xlIwXpLSoSz5aCFcGD8pfATSagrVYBCHNkNJ7h2sPZstn2LIfPg5EdLHPA2fm6CA3fl3Ba4Nbv2RHLAypjNAGqjh267jXzSbw-pPDRaXKKtaIFyy4GkSkD1KZ6fA1NEfPX13SbTDgpCXkvzQJSzZ_ELrNoCjOBH2wSgwwDA44qhTFI6FTm_-5-IZQF6XFlUKFzo9ZZYGzD6CbnIYTcRWm7Y3vJr4zgmdw7GAHRS9gY0yDs2Br1K2zg8DLNYrSAnR9WgWaZGCYF8xTHIseiFxBxI9koF2EIh3Z9iLuLgJH7pDQgCng3P46wEsVx8U1cA

This example includes the additional kid header to indicate which RS256 public key was used to sign the token. Auth0 and many other providers make those keys available as a JSON Web Key Set (JWKS) that can be retrieved when verifying a token.
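Selecting the right key from a JWKS could be sketched as below. It assumes the JWKS has already been fetched and parsed into a dict with the standard {"keys": [...]} shape, only handles RSA keys, and uses the cryptography library that appears later in this post; public_key_for_kid and b64url_to_int are hypothetical names.

```python
import base64

from cryptography.hazmat.primitives.asymmetric import rsa


def b64url_to_int(value: str) -> int:
    """JWKs store RSA n and e as unpadded base64url, big-endian integers."""
    padded = value + "=" * (-len(value) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")


def public_key_for_kid(jwks: dict, kid: str) -> rsa.RSAPublicKey:
    """Return the RSA public key whose kid matches the token header's kid."""
    for jwk in jwks.get("keys", []):
        if jwk.get("kid") == kid and jwk.get("kty") == "RSA":
            numbers = rsa.RSAPublicNumbers(
                e=b64url_to_int(jwk["e"]), n=b64url_to_int(jwk["n"])
            )
            return numbers.public_key()
    # No matching key means the token can't be verified, so reject it.
    raise KeyError(f"no RSA key with kid {kid!r} in the JWKS")
```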

If you are decoding tokens that you've been issued by a service simply because you're nosey, then you probably don't care that much about the header. Unless there's something unusual (e.g. an alg that isn't RS256 or HS256), the claims in the payload portion of the token are by far the most interesting part.

Registered Claims

Once you've decoded the payload portion of the token you'll likely notice a set of relatively cryptic three-letter keys (claims) in the object. The spec defines a list of registered claims that might be included in the token. All of these claims are optional and will often be left out. Those registered claims are:

iss to represent the issuer of the token.
sub to represent the subject of the token. For tokens that represent a user this will normally be a unique identifier for the user the token was issued to.
aud to represent the intended recipient(s), or audience, of the token.
exp to represent the expiry time of the token - this is represented in the "seconds since the epoch" format.
nbf to represent a time that the token should not be used before.
iat to represent the time that the token was issued.
jti to represent a unique identifier for this token.
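The time-based claims (exp, nbf and iat) are NumericDate values, i.e. seconds since 1970-01-01T00:00:00Z UTC. Converting them with Python's standard library makes them easier to eyeball; this sketch uses the iat and exp values from the Auth0 token above:

```python
from datetime import datetime, timezone

# iat and exp copied from the Auth0 example payload above.
claims = {"iat": 1632437865, "exp": 1632473865}

issued_at = datetime.fromtimestamp(claims["iat"], tz=timezone.utc)
expires_at = datetime.fromtimestamp(claims["exp"], tz=timezone.utc)

print(issued_at.isoformat())   # 2021-09-23T22:57:45+00:00
print(expires_at.isoformat())  # 2021-09-24T08:57:45+00:00
print(expires_at - issued_at)  # 10:00:00, a ten-hour token lifetime
```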

In addition to the registered claims there might be any number of additional "private claims" added by the issuer of the token. These won't always follow the three-character convention from the spec; the short names exist simply to keep the token as small as possible. If you're checking the content of tokens that you've been issued by a service, then these private claims are likely to be the juiciest part!

The Auth0 token above has private claims like email, email_verified and nickname that might be useful to your application. The example below is from Supabase and includes email and phone, along with app_metadata describing how the user authenticated.

{
  "alg": "HS256",
  "typ": "JWT"
}.{
  "aud": "authenticated",
  "exp": 1659921285,
  "sub": "7f07478b-3a95-4055-98b4-7aa7f0a92091",
  "email": "me@example.org",
  "phone": "",
  "app_metadata": {
    "provider": "email",
    "providers": [
      "email"
    ]
  },
  "user_metadata": {},
  "role": "authenticated"
}.AtdMAFrUUcAy6IOihVczSCVFPU7lNdovQ9XW_psr1-4

Verifying JWTs

If you're planning on verifying a JWT for use by your own application you should read through the JSON Web Token Best Current Practices RFC (RFC 8725). It's a highly accessible document that runs through a complete set of best practices and covers far more than I will in this post.

Verification of the content of a JWT must check the following:

Confirm that the alg in the header matches what you are expecting.
Verify that the aud represents your service or platform.
If provided, verify that the iss matches the expected issuer for your service.
Verify that the token has not expired by comparing exp to the current time.
Check that iat and nbf are in the past if they are provided. Your service might allow a small buffer to allow for time drift between your service and the token issuer.
If a kid is specified in the header, confirm that the key exists.
Verify that the signature provided is correct.
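The payload checks in the list above could be sketched roughly as follows. This is illustrative rather than a drop-in implementation: check_registered_claims is a hypothetical name, the 60-second leeway is an assumed allowance for clock drift, and treating a missing exp as a failure is a policy choice rather than something the spec requires. The alg, kid and signature checks are deliberately left out.

```python
import time


def check_registered_claims(payload, *, audience, issuer, leeway=60, now=None):
    """Check aud, iss, exp, nbf and iat on a decoded payload; raise ValueError on failure."""
    now = time.time() if now is None else now

    # aud may be a single string or a list of strings (RFC 7519, section 4.1.3).
    aud = payload.get("aud")
    audiences = [aud] if isinstance(aud, str) else (aud or [])
    if audience not in audiences:
        raise ValueError("token was not issued for this audience")

    # iss is only checked when present, matching the list above.
    if "iss" in payload and payload["iss"] != issuer:
        raise ValueError("unexpected issuer")

    # Policy choice: a token without exp is rejected outright.
    if "exp" not in payload or now >= payload["exp"] + leeway:
        raise ValueError("token has expired (or has no exp claim)")

    # nbf and iat must not be in the future, allowing for clock drift.
    for claim in ("nbf", "iat"):
        if claim in payload and payload[claim] > now + leeway:
            raise ValueError(f"{claim} is in the future")
```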

When comparing the alg, aud and iss fields, the expected values should effectively be hard-coded (perhaps via environment-specific configuration) within your application. For each token type / issuer combination there should be a single valid value for each of these fields.

Once you've completed the "cheap" verifications it's time to cryptographically verify the signature of the token. Your service should know the expected algorithm for each token issuer and explicitly use that algorithm for checking the signature. Never rely on the alg key in the header to select the signature algorithm during verification.

You should rely on a trusted library for verifying the token signature rather than attempting to do it yourself. If there's no library for your preferred language, then the JSON Web Signature RFC (RFC 7515) goes into more depth about the process you need to follow.
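As one example of the library route, PyJWT (a widely used Python library, chosen here as an assumption; the post doesn't prescribe one) lets you pin the algorithm, audience and issuer explicitly rather than trusting the token's header. The sketch uses HS256 so it is self-contained; for an RS256 issuer you would pass algorithms=["RS256"] and the issuer's public key instead. verify_token is a hypothetical wrapper name, and the leeway and required claims are assumed values:

```python
import jwt  # PyJWT (pip install pyjwt); RS256 additionally needs pyjwt[crypto]


def verify_token(token: str, key, *, audience: str, issuer: str) -> dict:
    """Verify signature and registered claims, returning the payload.

    Raises a subclass of jwt.InvalidTokenError if any check fails.
    """
    return jwt.decode(
        token,
        key,
        algorithms=["HS256"],  # pinned by us, never read from the token header
        audience=audience,
        issuer=issuer,
        leeway=60,  # assumed clock-drift allowance, in seconds
        options={"require": ["exp", "iat", "aud", "iss"]},
    )
```

A failed check surfaces as an exception such as jwt.ExpiredSignatureError or jwt.InvalidAudienceError, so callers can treat any jwt.InvalidTokenError as "reject".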

Common Signing Algorithms

To quickly cover off an item from the spec that you need to know about: the specification allows the use of the none signing algorithm. This effectively allows unsigned tokens, which your application should never support. There have been a variety of CVEs for validation libraries and applications caught unexpectedly accepting {"alg": "none"} in the JWT header.
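The robust defence is an explicit allowlist of algorithms, which rejects none however it is spelled, rather than a blocklist. A minimal sketch, assuming RS256 is the only algorithm your issuer uses; ALLOWED_ALGS, b64url_json and header_alg_is_allowed are hypothetical names, and this check complements rather than replaces signature verification with the pinned algorithm:

```python
import base64
import json

ALLOWED_ALGS = {"RS256"}  # assumption: the only algorithm your issuer uses


def b64url_json(obj) -> str:
    """Encode a dict as compact JSON in unpadded base64url, like a JWT segment."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


def header_alg_is_allowed(token: str) -> bool:
    """Accept only tokens whose header alg is on the explicit allowlist."""
    header_b64 = token.split(".")[0]
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    header = json.loads(base64.urlsafe_b64decode(padded))
    return header.get("alg") in ALLOWED_ALGS


# A forged, unsigned token: alg "none" and an empty signature segment.
forged = f'{b64url_json({"alg": "none", "typ": "JWT"})}.{b64url_json({"sub": "admin"})}.'
print(header_alg_is_allowed(forged))  # False
```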

Beyond that little mistake, there are two types of signing algorithms you will encounter in the wild:

Symmetric algorithms that use a shared secret between signer and verifier (e.g. HS256)
Asymmetric algorithms that use a public / private key pair (e.g. RS256)

HMAC SHA-256 (HS256) uses a shared secret as the key for the HMAC algorithm, computing a SHA-256-based signature (strictly, a message authentication code) over the encoded header and payload. It can be very convenient when you're able to share the secret securely between your application and the token provider. When using HS256 it's critically important that you generate a long and secure secret.

From RFC 7515:

Keys are only as strong as the amount of entropy used to generate them. A minimum of 128 bits of entropy should be used for all keys, and depending upon the application context, more may be required.
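A minimal HS256 sketch using only Python's standard library: a 256-bit secret from secrets, HMAC-SHA256 over the encoded header and payload, and a constant-time comparison when verifying. The helper names are hypothetical and the sketch is only meant to show the mechanics; the advice above to rely on a trusted library still applies.

```python
import base64
import hashlib
import hmac
import secrets

# 32 random bytes = 256 bits of entropy, comfortably above the 128-bit minimum.
SECRET = secrets.token_bytes(32)


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode().rstrip("=")


def hs256_sign(signing_input: bytes, secret: bytes) -> str:
    """HMAC-SHA256 keyed with the shared secret, over '<header>.<payload>'."""
    return b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())


def hs256_verify(token: str, secret: bytes) -> bool:
    header_b64, payload_b64, signature_b64 = token.split(".")
    expected = hs256_sign(f"{header_b64}.{payload_b64}".encode("ascii"), secret)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature_b64)
```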

RSASSA-PKCS1-v1_5 with SHA-256 (RS256) uses a private key to generate the signature, which can then be verified using only the public key. This allows the service signing the token to publicly share the public key (e.g. in a JWKS file) that other services can then use to verify the token. When you're interacting with a provider that uses the same keys to sign tokens for multiple clients, you'll very likely end up using RS256.
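On the verifying side only the public key is needed. A sketch with the cryptography library (the same one used for signing later in this post), assuming the public key has already been loaded, for example from a PEM file or a JWKS entry; rs256_signature_is_valid is a hypothetical name:

```python
import base64

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def rs256_signature_is_valid(token: str, public_key) -> bool:
    """Check an RS256 signature using only the issuer's public key."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    try:
        # RS256 = RSASSA-PKCS1-v1_5 with SHA-256, regardless of what the header claims.
        public_key.verify(
            b64url_decode(signature_b64), signing_input, padding.PKCS1v15(), hashes.SHA256()
        )
        return True
    except InvalidSignature:
        return False
```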

Signing Your Own JWTs

Before we finish up this post, I'm keen to quickly cover generating your own RS256 signed token. This can be handy in cases where you need tokens for testing your application (I've done this for both functional and performance testing in the past). This example uses Python and the cryptography library to do the signing, but the code should be pretty straightforward to port to other languages.

We will sign the token with a key pair that we control, so start by using openssl to create both the public and private keys:

openssl genpkey -algorithm RSA -out private.pem -pkeyopt rsa_keygen_bits:2048
openssl rsa -pubout -in private.pem -out public.pem
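If you'd rather stay in Python, the cryptography library can generate an equivalent key pair. A sketch, assuming you're happy writing an unencrypted private key to the current directory, which is fine for test keys but not for anything real:

```python
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

# Same parameters as the openssl command: 2048-bit RSA, public exponent 65537.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

private_pem = private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,  # "BEGIN PRIVATE KEY", like genpkey
    encryption_algorithm=serialization.NoEncryption(),  # unencrypted: test keys only
)
public_pem = private_key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,  # "BEGIN PUBLIC KEY"
)

with open("private.pem", "wb") as f:
    f.write(private_pem)
with open("public.pem", "wb") as f:
    f.write(public_pem)
```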

Using Python, we start by assembling the header and payload that we will sign:

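import json
from base64 import urlsafe_b64encode

kid = "test-key-1"  # hypothetical key ID; use whatever key ID your verifier expects

header = urlsafe_b64encode(
    json.dumps({"alg": "RS256", "typ": "JWT", "kid": kid}).encode("utf-8")
).decode()

payload = urlsafe_b64encode(
    json.dumps({"sub": "1234567890", "name": "John Doe", "iat": 1516239022}).encode("utf-8")
).decode()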

Concatenate the two parts and remove any padding (the = character) to create the token that we will sign:

token_to_sign = f"{header.rstrip('=')}.{payload.rstrip('=')}".encode("utf-8")

Load the key using the serialization module of the cryptography library, then sign with PKCS#1 v1.5 padding and SHA-256 (that is, RS256):

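from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

with open("private.pem", "rb") as key_file:
    key = serialization.load_pem_private_key(
        key_file.read(),
        password=None,  # the key generated above is unencrypted
    )

# RS256 = RSASSA-PKCS1-v1_5 padding with a SHA-256 digest.
signature = urlsafe_b64encode(
    key.sign(token_to_sign, padding.PKCS1v15(), hashes.SHA256())
)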

And finally, concatenate our token_to_sign and the signature to create the token we can use to test our application:

token = f"{token_to_sign.decode()}.{signature.decode().rstrip('=')}"