AES & ChaCha — A Case for Simplicity in Cryptography
A technical deep dive into how the ChaCha20 cipher is taking on AES as the gold standard for symmetric encryption, and a lesson about the power of simplicity in cryptographic design.
Introduction
Whether you've worked with cryptography and encryption tools or not, you've probably heard of AES. The Advanced Encryption Standard was announced by NIST in 2001, and is one of the most widely used cryptographic algorithms for symmetric encryption. AES has long been the cipher of choice for securing web traffic (TLS), VPNs, email, messaging apps, and more. However, there is a challenger to the dominance of AES: ChaCha20. Daniel Bernstein published the ChaCha20 cipher in 2008, and it has since been adopted in systems as fundamental as TLS and is often suggested as an alternative to AES.
Let's take a closer look at how these two ciphers work, how they compare, and why you may want to choose one over the other.
A Primer on Symmetric Encryption
Encryption is essentially a way of scrambling data in such a way that it can only be unscrambled by someone who knows the key. Encryption schemes aim to scramble the original data in a way that makes it computationally impractical to recover either the key or the original plaintext via either brute force or cryptanalysis. In practice, for an encryption scheme to be strong, it must be resistant to cryptanalysis of large amounts of ciphertext generated using the same key, and even known pairs of plaintext and ciphertext blocks.
Symmetric Encryption refers to encryption operations that use the same key to both encrypt and decrypt data. Encrypting data this way is fast, but requires that the key is handled carefully. This is the type of encryption we'll look at.
Before we continue, let's establish some terminology and review the basic architecture of a symmetric encryption scheme. We'll skip over a lot here, since an exhaustive description of these two ciphers and how they operate is out of scope of a single blog post, so here are some disclaimers:
- We won't cover every variant of these two ciphers. AES in particular comes in many flavors, but we will focus on CTR/GCM.
- This post doesn't cover authenticated encryption, which adds authenticity and integrity guarantees on top of the confidentiality that encryption alone provides.
- We won't cover finite field arithmetic, which is helpful for a full understanding of AES.
Some terminology
- XOR: Exclusive OR, a bitwise logical operation denoted by the symbol ⊕ that returns 1 only if the two input bits are different.
- Plaintext: refers to unencrypted input data. This is the data we are trying to encrypt for secure transport or storage.
- Ciphertext: refers to the encrypted version of the plaintext. This is the end result of the encryption operation, and is the data that we can safely store or transport.
- Key: refers to the secret key that is used for encryption and decryption operations. It's also referred to as the "password" in some cases.
- Nonce: a Number used Once is an additional parameter that makes each encryption of a given piece of data with a given key unique. In some schemes it is kept secret, but generally it is not; what matters is that it is never repeated with the same key. The Nonce primarily helps to make keys re-usable without leaking information via cryptanalysis.
- Counter: a block number, used to quickly skip to a particular position in the keystream.
- Keystream: an expanded version of the key, usually in the form of a series of bytes of arbitrary length that is used to encrypt the plaintext byte-by-byte. The Key and Nonce are fed into a Keystream Generator which performs a series of operations to generate this key stream. More on this soon.
- Entropy: a measure of randomness or unpredictability. In cryptography, high entropy makes keys and nonces harder to guess or brute-force. It's what keeps encrypted data from being predictable.
- Diffusion: refers to the degree to which one input to an operation affects the total output. High diffusion is a desirable property in encryption schemes, as it makes it harder for an attacker to apply statistical analysis to recover the key or plaintext.
- Stream Cipher: an encryption algorithm that generates a keystream and combines it with the plaintext one byte at a time, typically using XOR. Stream ciphers are fast and well-suited for encrypting data of arbitrary length or streaming data in real-time.
- Block Cipher: an encryption algorithm that processes data in fixed-size blocks (e.g. 128 bits at a time). The plaintext is split into blocks, and each block is encrypted separately.
Why do we need a cipher?
To understand why we need a cipher at all, let's see what happens when we just XOR the key and plaintext together. After all, the key is secret, so this seems like it should work. Let's try it.
We'll "encrypt" the message "The launch code is River" with the key "SECRET_KEY".
First, we'll represent the plaintext and key as hexadecimal bytes:
Since the key is much shorter than the message, we'll have to expand it somehow. Let's repeat the key to make it as long as the message. This allows us to compute an XOR for each byte of the plaintext with a byte from our expanded key, giving us a ciphertext.
As a string, this ciphertext would look like -&r)5*%&1s&,6 t68e:3& (leaving out two bytes that have no printable representation). This might look sufficiently "scrambled", but there are a number of problems with this approach. A closer look at the input and output bytes shows predictable patterns that expose information about the key. For example, XORing identical plaintext bytes with the same repeated key byte produces identical ciphertext: both occurrences of the letter c in the message (in "launch" and "code") line up with the key byte E, and both encrypt to &. With enough data, this can quickly allow an attacker to recover the key, the message, and any other data that was secured using this key in the past or future.
This approach is also extremely vulnerable to a host of other attacks including known-plaintext attacks, statistical analysis, or just good-old guesswork, since the key just doesn't provide us with enough entropy, and repeated uses of the key produce very predictable outputs.
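The repeated-key scheme described above can be reproduced in a few lines of Python. This is a minimal sketch for illustration only; the function name `repeating_key_xor` is made up for this example. It also shows how a single known plaintext/ciphertext pair hands the key to an attacker.

```python
from itertools import cycle

def repeating_key_xor(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the key, repeating the key as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

plaintext = b"The launch code is River"
key = b"SECRET_KEY"
ciphertext = repeating_key_xor(plaintext, key)

# XOR is its own inverse, so the same function decrypts.
recovered = repeating_key_xor(ciphertext, key)

# Known-plaintext attack: XORing the ciphertext with the plaintext
# yields the expanded key, and therefore the key itself.
leaked_key = repeating_key_xor(ciphertext, plaintext)[:len(key)]

print(ciphertext.hex())
print(recovered)    # b'The launch code is River'
print(leaked_key)   # b'SECRET_KEY'
```

Anyone who can guess or obtain part of a message recovers the matching part of the key, which is exactly the weakness a proper keystream generator is meant to remove.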
Thus, we need a stream of bytes that expands our key in a more unpredictable way, and can be XORed with messages of arbitrary length without producing predictable patterns of ciphertext. Obtaining this Keystream from a shorter key is the problem that all stream ciphers, including ChaCha and AES in its counter-based modes, are attempting to solve.
Common ground
Both AES and ChaCha combine the key, nonce, and a counter or block number to produce a keystream. This keystream can be XORed with the plaintext byte-by-byte to generate the ciphertext:
The mechanism that combines these inputs to produce the keystream is called the Keystream Generator (KSG). The KSG is the defining feature of a stream cipher, and will be the focus of our exploration into the differences between AES and ChaCha.
AES
AES natively operates on fixed-size 128-bit blocks, so to behave like a stream cipher, it's typically used in Counter (CTR) mode, or more recently in Galois/Counter Mode (GCM). In these modes, AES generates a keystream by encrypting incrementing blocks that combine a nonce and a counter:
Keystream[i] = AES_KSG(Key, Nonce || Counter[i])
Each AES block encryption consists of a sequence of rounds — 10, 12, or 14 depending on key size. This output is then XORed with the plaintext to produce the ciphertext, one block at a time.
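To make the counter-mode formula concrete, here is a rough Python sketch. Python's standard library has no AES, so `toy_block_fn` below is a hash-based stand-in that is neither AES nor secure; it only plays the role of the block encryption. The 96-bit nonce plus 32-bit big-endian counter layout is an assumption borrowed from the common GCM convention.

```python
import hashlib
from typing import Callable

# (key, 16-byte input block) -> 16-byte output block
BlockFn = Callable[[bytes, bytes], bytes]

def toy_block_fn(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES block encryption. NOT AES and NOT secure."""
    return hashlib.sha256(key + block).digest()[:16]

def ctr_keystream(key: bytes, nonce: bytes, n_blocks: int,
                  encrypt_block: BlockFn = toy_block_fn) -> bytes:
    """Keystream[i] = encrypt_block(key, nonce || counter[i])."""
    if len(nonce) != 12:
        raise ValueError("this sketch assumes a 96-bit nonce")
    out = bytearray()
    for i in range(n_blocks):
        counter_block = nonce + i.to_bytes(4, "big")  # 12 + 4 = 16 bytes
        out += encrypt_block(key, counter_block)
    return bytes(out)

def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing data with the keystream."""
    n_blocks = -(-len(data) // 16)  # ceiling division
    keystream = ctr_keystream(key, nonce, n_blocks)
    return bytes(d ^ k for d, k in zip(data, keystream))

key = bytes(range(32))
nonce = bytes(12)
message = b"counter mode turns a block cipher into a stream cipher"
ciphertext = ctr_xor(key, nonce, message)
decrypted = ctr_xor(key, nonce, ciphertext)
```

Because only the counter changes between blocks, any block of keystream can be computed independently, which is what makes seeking and parallel encryption possible in this mode.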
Inside an AES Encryption Round
AES is designed as a substitution-permutation network. Each round applies a series of transformations to a 4×4 matrix of bytes called the state. To begin with, the state is populated with the Nonce and Counter.
These transformations are designed for diffusion and confusion, making cryptanalysis difficult — but they're computationally expensive and involve lookup tables, finite field math and complex transformations.
These transformations are:
- SubBytes: Each byte is replaced using a non-linear S-box (a substitution table).
- ShiftRows: Bytes are shifted within rows to mix columns.
- MixColumns: Each column is mixed using transformations over a finite field (specifically, multiplication in GF(2⁸)).
- AddRoundKey: The round key (derived from the original key) is XORed into the state.
Let's take a look at each of these operations in more detail.
SubBytes
In the first step, we use a fixed lookup table called an S-Box to substitute each byte in our state with one from the S-Box.
Each byte in our state aᵢ,ⱼ is replaced with a byte bᵢ,ⱼ from the S-box:
This is what the full Rijndael S-Box used in AES looks like, if you're curious:
| | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 0A | 0B | 0C | 0D | 0E | 0F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
00 | 63 | 7C | 77 | 7B | F2 | 6B | 6F | C5 | 30 | 01 | 67 | 2B | FE | D7 | AB | 76 |
10 | CA | 82 | C9 | 7D | FA | 59 | 47 | F0 | AD | D4 | A2 | AF | 9C | A4 | 72 | C0 |
20 | B7 | FD | 93 | 26 | 36 | 3F | F7 | CC | 34 | A5 | E5 | F1 | 71 | D8 | 31 | 15 |
30 | 04 | C7 | 23 | C3 | 18 | 96 | 05 | 9A | 07 | 12 | 80 | E2 | EB | 27 | B2 | 75 |
40 | 09 | 83 | 2C | 1A | 1B | 6E | 5A | A0 | 52 | 3B | D6 | B3 | 29 | E3 | 2F | 84 |
50 | 53 | D1 | 00 | ED | 20 | FC | B1 | 5B | 6A | CB | BE | 39 | 4A | 4C | 58 | CF |
60 | D0 | EF | AA | FB | 43 | 4D | 33 | 85 | 45 | F9 | 02 | 7F | 50 | 3C | 9F | A8 |
70 | 51 | A3 | 40 | 8F | 92 | 9D | 38 | F5 | BC | B6 | DA | 21 | 10 | FF | F3 | D2 |
80 | CD | 0C | 13 | EC | 5F | 97 | 44 | 17 | C4 | A7 | 7E | 3D | 64 | 5D | 19 | 73 |
90 | 60 | 81 | 4F | DC | 22 | 2A | 90 | 88 | 46 | EE | B8 | 14 | DE | 5E | 0B | DB |
A0 | E0 | 32 | 3A | 0A | 49 | 06 | 24 | 5C | C2 | D3 | AC | 62 | 91 | 95 | E4 | 79 |
B0 | E7 | C8 | 37 | 6D | 8D | D5 | 4E | A9 | 6C | 56 | F4 | EA | 65 | 7A | AE | 08 |
C0 | BA | 78 | 25 | 2E | 1C | A6 | B4 | C6 | E8 | DD | 74 | 1F | 4B | BD | 8B | 8A |
D0 | 70 | 3E | B5 | 66 | 48 | 03 | F6 | 0E | 61 | 35 | 57 | B9 | 86 | C1 | 1D | 9E |
E0 | E1 | F8 | 98 | 11 | 69 | D9 | 8E | 94 | 9B | 1E | 87 | E9 | CE | 55 | 28 | DF |
F0 | 8C | A1 | 89 | 0D | BF | E6 | 42 | 68 | 41 | 99 | 2D | 0F | B0 | 54 | BB | 16 |
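The table is not arbitrary. Each entry is the multiplicative inverse of the input byte in GF(2⁸), followed by a fixed affine transformation. The sketch below rebuilds the table from that definition; the helper names are made up for this example, and the brute-force inverse search is chosen for readability rather than speed.

```python
def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # reduce by the AES polynomial
        b >>= 1
    return result

def ginv(a: int) -> int:
    """Multiplicative inverse in GF(2^8); 0 maps to 0 by convention."""
    if a == 0:
        return 0
    return next(b for b in range(1, 256) if gmul(a, b) == 1)

def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def sbox_entry(x: int) -> int:
    """Field inverse followed by the AES affine transformation."""
    inv = ginv(x)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63

SBOX = [sbox_entry(x) for x in range(256)]

def sub_bytes(state: bytes) -> bytes:
    """Replace every byte of the state with its S-box entry."""
    return bytes(SBOX[b] for b in state)

print(hex(SBOX[0x00]), hex(SBOX[0x53]))  # 0x63 0xed
```

To look up a byte such as 0x53 in the table above, take row 50 and column 03, which gives ED.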
ShiftRows
The next step is a permutation operation that shuffles bytes within each row. The first row is left as is, while bytes in the second row are shifted 1 position to the left, the third row is shifted 2 positions left, and the fourth row is shifted 3 positions left.
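As a small illustration, here is ShiftRows on a state written as four rows of four bytes. The row-list representation is a simplification for this sketch; real implementations usually keep the state as a flat 16-byte array.

```python
def shift_rows(state: list[list[int]]) -> list[list[int]]:
    """Rotate row r of the state left by r positions (row 0 is unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

state = [
    [0x00, 0x01, 0x02, 0x03],
    [0x10, 0x11, 0x12, 0x13],
    [0x20, 0x21, 0x22, 0x23],
    [0x30, 0x31, 0x32, 0x33],
]

shifted = shift_rows(state)
for row in shifted:
    print(" ".join(f"{b:02x}" for b in row))
# 00 01 02 03
# 11 12 13 10
# 22 23 20 21
# 33 30 31 32
```

After the shift, the first column holds 00, 11, 22, 33: one byte from each of the original columns, which is what lets the following column-wise step spread changes across the whole state.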
MixColumns
This step applies a linear transformation to each column in the AES state matrix by multiplying it with a fixed 4×4 matrix, commonly referred to as the Rijndael MixColumns matrix:
In essence, this operation takes 4 bytes from a single column and mixes them together using finite field multiplication, so that the value of each input byte influences every byte in the output. This step, along with the ShiftRows operation, is the main source of diffusion in AES.
The values in the matrix and the column vector are treated as elements of the finite field GF(2⁸), that is, as polynomials with single-bit coefficients, and the multiplication is done modulo an irreducible polynomial (specifically, x⁸ + x⁴ + x³ + x + 1). Each column is transformed by multiplying it as a vector by the fixed matrix, with arithmetic done in the field.
A full understanding of this operation involves finite-field arithmetic, which is beyond the scope of this blog — but worth exploring if you're curious about how AES achieves both performance and cryptographic strength at such a low level.
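For readers who want a taste of it anyway, the column transformation fits in a short Python sketch. The fixed matrix only contains the values 1, 2 and 3, so all that is needed is "multiply by 2" in the field (called `xtime` here, a conventional name) and XOR for addition. The function names are chosen for this example.

```python
def xtime(a: int) -> int:
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B  # subtract (XOR) the reduction polynomial
    return a

def mul3(a: int) -> int:
    """Multiply by 3, i.e. (2 * a) + a, where + is XOR in this field."""
    return xtime(a) ^ a

def mix_column(col: list[int]) -> list[int]:
    """Multiply one 4-byte column by the fixed matrix
    [2 3 1 1; 1 2 3 1; 1 1 2 3; 3 1 1 2]."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]

mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
print([hex(b) for b in mixed])  # ['0x8e', '0x4d', '0xa1', '0xbc']
```

Every output byte depends on all four input bytes, which is the diffusion property described above.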
AddRoundKey
The final step uses a subkey for each round called a round key, and combines it with the state. The round key is derived using a key schedule and is the same size as the state. Each byte of the round key is XORed with the corresponding byte of the state:
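In code this step is a plain byte-wise XOR. The sketch below treats the state and round key as flat 16-byte strings, and the sample values are arbitrary placeholders rather than output of the real key schedule.

```python
def add_round_key(state: bytes, round_key: bytes) -> bytes:
    """XOR each byte of the 16-byte state with the matching round key byte."""
    if len(state) != 16 or len(round_key) != 16:
        raise ValueError("state and round key must both be 16 bytes")
    return bytes(s ^ k for s, k in zip(state, round_key))

state = bytes(range(16))                       # placeholder state
round_key = bytes.fromhex("a0" * 8 + "0f" * 8)  # placeholder round key

mixed = add_round_key(state, round_key)
restored = add_round_key(mixed, round_key)  # XOR with the same key undoes it
```

This is the only step in the round where secret key material enters the state; the other three steps are fixed, public transformations.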
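These operations are repeated 10, 12, or 14 times depending on key size (14 for a 256-bit key), with the final state of each round being used as the input to the next. Two details are glossed over here: AES also XORs a round key into the state before the first round, and the last round skips MixColumns. The output of the final round is used as our keystream block, which is finally XORed with our plaintext to produce the ciphertext.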
ChaCha20
ChaCha20, derived from its predecessor Salsa20, is a stream cipher designed as a pseudorandom function that uses a combination of Add, Rotate and XOR operations, commonly referred to as ARX. These core operations at the heart of the cipher are computationally much cheaper than those found in AES, and are performed repeatedly over a series of rounds on a 64-byte block of data.
Let's take a closer look under the hood.
Block structure
Similarly to AES, ChaCha operates on a 4×4 state matrix, here made up of sixteen 32-bit words, making each block 512 bits (64 bytes). This matrix is initialized with:
- A 128-bit constant ("expand 32-byte k")
- A 256-bit key: K
- A 32-bit block counter n
- A 96-bit nonce IV
The constant serves primarily as a defense against 0-keys, i.e. keys that are composed of all 0s: without it, an all-zero key, counter and nonce would result in the cipher outputting all 0 bits. More generally, it ensures bit diffusion even with carefully crafted inputs that an attacker may use to try and extract information about the key. The value is not a secret; it is a "nothing up my sleeve" number, chosen so plainly that it cannot hide a backdoor.
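Here is a small sketch of how these four inputs could be packed into the sixteen words of the state. The word order (constant, key, counter, nonce) and the little-endian encoding follow the IETF variant of ChaCha20 in RFC 8439, which is an assumption about the layout intended here; the function name is made up for this example.

```python
import struct

def chacha20_init_state(key: bytes, counter: int, nonce: bytes) -> list[int]:
    """Build the 16-word initial state: constant | key | counter | nonce."""
    if len(key) != 32 or len(nonce) != 12:
        raise ValueError("expected a 256-bit key and a 96-bit nonce")
    constant = struct.unpack("<4I", b"expand 32-byte k")  # words 0-3
    key_words = struct.unpack("<8I", key)                 # words 4-11
    nonce_words = struct.unpack("<3I", nonce)             # words 13-15
    return [*constant, *key_words, counter & 0xFFFFFFFF, *nonce_words]

state = chacha20_init_state(
    key=bytes(range(32)),
    counter=1,
    nonce=bytes.fromhex("000000090000004a00000000"),
)
for row in range(4):
    print(" ".join(f"{w:08x}" for w in state[4 * row:4 * row + 4]))
```

The first printed row is the ASCII constant read as four little-endian words: 61707865 3320646e 79622d32 6b206574.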
ChaCha Rounds
This state is run through 20 rounds of transformations, with the goal of scrambling the input data to maximize bit diffusion. These rounds are constructed as iterations of a core quarter round function. We'll explore the details of this quarter round operation shortly, but for now all we need to know is that it's a function which operates on 4 words from the matrix at a time, labelled A,B,C,D, and outputs 4 words in return:
[A′, B′, C′, D′] = ChaCha_QUARTER_ROUND([A,B,C,D])
Running the quarter round once on each of the 4 columns of the state makes up 1 column round:
Running it once on each of the 4 diagonals makes up 1 diagonal round:
Column and diagonal rounds are run in an alternating sequence, with column rounds run on odd rounds, and diagonals on even rounds. Together, these 8 quarter rounds are called a double round.
10 double rounds create the 20 total rounds of the cipher:
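The column and diagonal pattern can be written down as index tuples into the 16-word state. This sketch assumes the words are numbered row by row (index = 4 × row + column), and only builds the schedule of quarter rounds rather than running them.

```python
# Each tuple names the state words passed to one quarter round as (A, B, C, D).
COLUMNS = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
DIAGONALS = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]

def double_round() -> list[tuple[int, int, int, int]]:
    """One column round followed by one diagonal round: 8 quarter rounds."""
    return COLUMNS + DIAGONALS

def full_schedule(double_rounds: int = 10) -> list[tuple[int, int, int, int]]:
    """The whole sequence of quarter rounds for the cipher."""
    return [idx for _ in range(double_rounds) for idx in double_round()]

schedule = full_schedule()
print(len(schedule))  # 80 quarter rounds in total
```

Each column round and each diagonal round touches all sixteen words exactly once, so after a single double round every word has been mixed with words from both its column and its diagonal.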
The Quarter-Round Function
The heart of ChaCha is the quarter-round, which modifies four words of the state using only Add, Rotate and XOR.
Add
This is unsigned integer addition, in modulo 2³². In ChaCha, this is done on 32-bit words. The addition mixes input bits non-linearly and contributes to diffusion.
// add a and b
a = 0xFFFFFFFF // 4294967295
b = 0x00000001 // 1
a + b = 0x00000000 (overflow wraps around)
Rotate
Bit rotation shifts bits left and wraps the bits that fall off back to the right side. For example, rotating 7 bits to the left looks like:
// rotate x left by 7 bits
x = 0x86D2BF3A // binary: 10000110110100101011111100111010
ROL(x, 7) = 0x695F9D43 // binary: 01101001010111111001110101000011
XOR
Bitwise XOR outputs 1 if the input bits are different, 0 if they are the same.
// XOR a and b
a = 0b11001100
b = 0b10101010
a ^ b = 0b01100110
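The same three examples can be run in Python. Python integers are unbounded, so this sketch masks results to 32 bits to imitate a CPU register; the helper names are chosen for this example.

```python
MASK32 = 0xFFFFFFFF

def add32(a: int, b: int) -> int:
    """Addition modulo 2^32."""
    return (a + b) & MASK32

def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def xor32(a: int, b: int) -> int:
    """Bitwise exclusive OR."""
    return a ^ b

print(f"{add32(0xFFFFFFFF, 0x00000001):#010x}")   # 0x00000000
print(f"{rotl32(0x86D2BF3A, 7):#010x}")           # 0x695f9d43
print(f"{xor32(0b11001100, 0b10101010):#010b}")   # 0b01100110
```

Note how the rotation differs from a plain shift: the seven bits that leave on the left (1000011) reappear on the right instead of being replaced by zeros.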
These three operations, collectively called ARX (Add, Rotate, XOR), are fast, run in constant time, and are natively supported on every modern CPU. No lookup tables, no finite field math. The same sequence of ARX operations is applied to the 4 inputs (A,B,C,D) on each quarter round, which is why ChaCha is described as an ARX cipher. Here's what that looks like, moving from top to bottom:
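In Python the quarter round can be sketched as four add/XOR/rotate steps with the rotation amounts 16, 12, 8 and 7. The input values used to exercise it are the quarter-round test vector from RFC 8439; the function names are chosen for this example.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def quarter_round(a: int, b: int, c: int, d: int) -> tuple[int, int, int, int]:
    """One ChaCha quarter round on four 32-bit words."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d

result = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
print([f"{w:08x}" for w in result])
# ['ea2a92f4', 'cb1cf8ce', '4581472e', '5881c4bb']
```

Nothing in the function branches on its inputs or uses them as a memory index, which is where the constant-time behaviour comes from.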
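After the 20 rounds, the original input state is added back to the result word by word (modulo 2³²), so that the rounds cannot simply be run in reverse to recover the key. This gives us a 64-byte block of keystream which, just like AES, is XORed with the plaintext to give us our ciphertext.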
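Putting the pieces together, a complete keystream block generator fits in about thirty lines. This is a teaching sketch that follows the IETF layout from RFC 8439 (an assumption about the exact variant), not a hardened implementation; for real use, reach for a vetted library. All function names are chosen for this example.

```python
import struct

MASK32 = 0xFFFFFFFF
COLUMNS = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
DIAGONALS = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]

def rotl32(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK32

def quarter_round(s: list[int], a: int, b: int, c: int, d: int) -> None:
    """Apply the quarter round in place to words a, b, c, d of state s."""
    # Each step: s[x] += s[y]; s[z] ^= s[x]; rotate s[z] left by r bits.
    for x, y, z, r in ((a, b, d, 16), (c, d, b, 12), (a, b, d, 8), (c, d, b, 7)):
        s[x] = (s[x] + s[y]) & MASK32
        s[z] = rotl32(s[z] ^ s[x], r)

def chacha20_block(key: bytes, counter: int, nonce: bytes) -> bytes:
    """Return one 64-byte keystream block."""
    initial = list(struct.unpack(
        "<16I", b"expand 32-byte k" + key + struct.pack("<I", counter) + nonce))
    state = list(initial)
    for _ in range(10):  # 10 double rounds = 20 rounds
        for indices in COLUMNS + DIAGONALS:
            quarter_round(state, *indices)
    # Add the original state back in, then serialize as little-endian words.
    return struct.pack("<16I", *((x + y) & MASK32 for x, y in zip(state, initial)))

def chacha20_xor(key: bytes, nonce: bytes, data: bytes, counter: int = 0) -> bytes:
    """Encrypt or decrypt data by XORing it with successive keystream blocks."""
    out = bytearray()
    for offset in range(0, len(data), 64):
        block = chacha20_block(key, counter + offset // 64, nonce)
        out += bytes(d ^ k for d, k in zip(data[offset:offset + 64], block))
    return bytes(out)

key = bytes(range(32))
nonce = bytes.fromhex("000000090000004a00000000")
message = b"The launch code is River"
ciphertext = chacha20_xor(key, nonce, message)
decrypted = chacha20_xor(key, nonce, ciphertext)
```

Compare this with the AES steps earlier: there is no table, no field arithmetic and no key schedule, only a loop over additions, XORs and rotations.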
Key takeaways
So far we've explored the inner workings of both AES and ChaCha20, while also taking a look at the shared high-level KSG architecture and how these ciphers are used to encrypt data. Both ciphers are masterfully designed and are triumphs of modern day cryptography, yet differ quite significantly in approach and implementation.
Complexity vs Efficiency
The approaches used by AES and ChaCha have a glaring difference. AES uses a series of complex and computationally expensive operations. SubBytes requires lookups in a fixed 256-entry table that must be held in memory. MixColumns and the key schedule that produces the round keys for AddRoundKey involve finite field arithmetic. This complexity does a great job of scrambling data with maximal bit-diffusion, but it comes at a cost.
Performance
The most obvious problem with complex operations is degraded performance. Fans of AES would argue that in practice it operates just as fast or even faster than competitors like ChaCha20, but that is only possible because of dedicated hardware instructions that accelerate the algorithm. On Intel and AMD chips, the AES-NI instruction set performs an entire encryption round in a single constant-time instruction. On ARM, a similar set of AES instructions is available.
Without these instructions, AES relies on lookup tables and software implementations of Galois field multiplication — which are both much slower and harder to implement securely.
Cache Timing Attacks
AES isn't just slower without hardware support—it's also potentially more vulnerable. Software implementations typically rely on precomputed S-boxes and Transformation tables (T-tables) to substitute and mix bytes in the cipher state. These tables are indexed using secret values, and when the CPU loads them from memory, the pattern of cache hits and misses can leak information.
This forms the basis of a cache-timing side-channel attack, first demonstrated by Daniel Bernstein in a 2005 paper. By measuring how long AES takes to encrypt different inputs, an attacker can statistically infer which parts of the lookup tables were accessed—and by extension, reconstruct portions of the secret key.
The root problem isn't the specific lookup tables, but the key-dependent memory access patterns and variable timing that leak information through the microarchitecture. These risks are hard to avoid in software, especially on shared hardware, and they're a consequence of AES's inherent complexity and reliance on table-driven logic.
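A toy model makes the leak easier to see. In the first round of a table-based implementation, the table index is a plaintext byte XORed with a key byte. The sketch below assumes 4-byte table entries and 64-byte cache lines, and an idealized attacker who learns exactly which cache line was touched; real attacks work from noisy timing statistics instead.

```python
ENTRY_SIZE = 4    # assumed bytes per lookup table entry
CACHE_LINE = 64   # assumed bytes per cache line
ENTRIES_PER_LINE = CACHE_LINE // ENTRY_SIZE  # 16 entries share one line

def cache_line_touched(plaintext_byte: int, key_byte: int) -> int:
    """Cache line hit by a first-round lookup at index plaintext ^ key."""
    index = plaintext_byte ^ key_byte
    return index // ENTRIES_PER_LINE

def candidate_key_bytes(plaintext_byte: int, observed_line: int) -> list[int]:
    """Key bytes consistent with a known plaintext byte and an observed line."""
    return [k for k in range(256)
            if cache_line_touched(plaintext_byte, k) == observed_line]

secret_key_byte = 0xA7   # unknown to the attacker
known_plaintext = 0x3C   # known to the attacker

observed = cache_line_touched(known_plaintext, secret_key_byte)
candidates = candidate_key_bytes(known_plaintext, observed)
print(len(candidates), [hex(k) for k in candidates[:3]])
```

A single observation narrows the key byte from 256 possibilities to 16, revealing its upper four bits. ChaCha has no equivalent step, because none of its operations use secret data as a memory address.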
ChaCha and Simplicity
Because ChaCha avoids complex arithmetic and uses only ARX operations, it performs consistently across architectures — desktop, mobile or embedded. It doesn't need hardware acceleration to be fast, and in some cases (especially on mobile or low-end CPUs), it actually outperforms AES. These operations don't involve memory access patterns that depend on secret data, so there's no opportunity for the kind of cache leakage that we see in AES.
The result is that ChaCha runs at consistent speed regardless of input, with no timing variation to exploit. It's also easier to implement securely, since the operations that comprise the quarter round are primitive, and the overall structure of the cipher is far easier to understand. The simplicity of ChaCha's design makes it fast in pure software, resistant to timing side channels by construction, and easier to understand and thus easier to implement.
It's not hard to see why ChaCha20 is seeing increased adoption across a range of applications and technologies. In practice the cipher is most often paired with the Poly1305 authentication code as ChaCha20-Poly1305, or as XChaCha20-Poly1305 in its extended-nonce variant. Google added support for ChaCha20-Poly1305 in TLS in 2014, and the IETF standardized its use in TLS in 2016. ChaCha20 is also used in IPsec, S/MIME, WireGuard, OpenSSH, and several other technologies including Phase.
Conclusion
Both AES and ChaCha are excellent ciphers. They've been thoroughly vetted, widely adopted, and provide robust security in practice. However, ChaCha's simplicity isn't just aesthetic — it's functional. It avoids a class of real-world problems by design. No special hardware, no lookup tables, no timing leaks. Just a clean, elegant sequence of simple operations that do one thing well.
There's a broader lesson here, too: complexity can be a liability. It might solve the problem in theory, but it often creates new ones in practice. AES is a case study in clever but intricate design — ChaCha is a reminder that sometimes, the better answer is the simpler one.
Simplicity is a virtue. Keep it simple, and you'll usually end up with something safer, faster, and easier to reason about.