Proxying is enough
Presentation: https://youtu.be/OL5rqvLkJfE
DECO achieves selective data provenance from a TLS server by enforcing a server-client-verifier communication pattern. However, DECO introduces significant performance overhead. This raises the question: is it possible to achieve the same functionality by placing the verifier as a proxy between the client and server? Naturally, the participant acting as the proxy must not be able to learn any sensitive information. While DECO's strength lies in requiring no modifications to the server side, a proxy-based approach offers the additional benefit of not requiring any modifications on the client side either.
AEAD is a widely used encryption method in modern cryptographic systems, offering two key security guarantees simultaneously:
Confidentiality (Encryption): It encrypts the plaintext so that third parties cannot read its content.
Integrity and Authentication: It ensures that any tampering with the ciphertext or related information can be detected.
AEAD provides stronger security than simple encryption by also covering various types of metadata encountered in real-world scenarios. This is where the concept of Associated Data (AD) comes into play.
Associated Data is data that is not encrypted but is still included in the authentication scope.
AD remains in plaintext and is visible to anyone.
However, during decryption, if the provided AD does not match the original AD used during encryption, the authentication fails.
✉️ Think of it like the address on an envelope — it’s visible to everyone, but it must not be altered.
Traditional encryption schemes (e.g., AES-CBC) only encrypt the plaintext and leave other parts of the message (headers, identifiers, etc.) unprotected.
In real-world communication systems, the existence of unauthenticated metadata leads to several potential attacks:
Reordering: Changing the order of messages to alter meaning or induce errors
Replay: Re-sending the same message repeatedly to confuse the system
Re-targeting: Forging a message to appear as if intended for a different recipient
→ These attacks arise because the context surrounding the ciphertext is not protected.
AEAD generates an authentication tag that includes both the ciphertext and the AD. This leads to the following outcomes:
The plaintext is encrypted,
The AD is authenticated but not encrypted,
If the AD does not match during decryption, the authentication fails and decryption is rejected.
✅ Header forgery prevention: Protects metadata outside the encrypted portion
✅ Receiver verification: Allows messages to be accepted only by specific recipients
✅ Context binding: Cryptographically binds key, nonce, and AD into a secure “context”
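The effect of authenticating the AD can be illustrated with a small sketch. The construction below is a toy encrypt-then-MAC scheme built from SHA-256 and HMAC, chosen only because it runs with the Python standard library. It is not AES-GCM or ChaCha20-Poly1305, and the names `toy_encrypt` and `toy_decrypt` are hypothetical.

```python
import hashlib
import hmac

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (illustration only, not a vetted cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(b"enc" + key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def _tag(key: bytes, nonce: bytes, ad: bytes, ct: bytes) -> bytes:
    """The tag covers the nonce, the AD, and the ciphertext (lengths included)."""
    mac_key = hashlib.sha256(b"mac" + key).digest()
    msg = nonce + len(ad).to_bytes(8, "big") + ad + len(ct).to_bytes(8, "big") + ct
    return hmac.new(mac_key, msg, hashlib.sha256).digest()[:16]

def toy_encrypt(key: bytes, nonce: bytes, ad: bytes, plaintext: bytes):
    stream = _keystream(key, nonce, len(plaintext))
    ct = bytes(p ^ s for p, s in zip(plaintext, stream))
    return ct, _tag(key, nonce, ad, ct)

def toy_decrypt(key: bytes, nonce: bytes, ad: bytes, ct: bytes, tag: bytes):
    # Reject before decrypting if the tag does not match this (key, nonce, AD).
    if not hmac.compare_digest(tag, _tag(key, nonce, ad, ct)):
        return None
    stream = _keystream(key, nonce, len(ct))
    return bytes(c ^ s for c, s in zip(ct, stream))

key = bytes(range(32))
nonce = bytes(12)
ct, tag = toy_encrypt(key, nonce, b"to: alice", b"balance=100")

ok = toy_decrypt(key, nonce, b"to: alice", ct, tag)     # same AD: accepted
bad = toy_decrypt(key, nonce, b"to: mallory", ct, tag)  # altered AD: rejected
```

The AD is never encrypted, yet changing it from `to: alice` to `to: mallory` makes decryption fail, which is the envelope-address behaviour described above.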
An AEAD scheme defines the following interfaces:
AES-GCM is an AEAD (Authenticated Encryption with Associated Data) algorithm that combines the following two components:
The plaintext and associated data are divided into blocks:
Encrypt plaintext using AES-CTR:
Generate authentication tag using GHASH:
Verify authentication tag using GHASH:
Decrypt ciphertext using AES-CTR:
Encrypt the plaintext using ChaCha20:
Generate the authentication tag using Poly1305:
Verify the authentication tag using Poly1305:
Decrypt the ciphertext using ChaCha20:
This attack evaluates how strongly an AEAD scheme binds a ciphertext to a specific context.
The success of this attack implies that a ciphertext is not uniquely bound to a single context.
It exploits the lack of Key Commitment in the AEAD scheme.
CDY Attack: Preimage Attack (ciphertext → context)
CMT Attack: Collision Attack (same ciphertext → two valid contexts)
This mirrors the security properties of cryptographic hash functions: If a hash function is collision-resistant, it is also preimage-resistant. Similarly, if an AEAD scheme is secure against CMT attacks, it is also secure against CDY attacks. (We will discuss Context Unforgeability in a later section.)
Although AES-GCM and ChaCha20-Poly1305 are widely used AEAD schemes, they do not guarantee the Key Commitment property. That is, the same ciphertext may be valid under different (key, nonce) combinations, which introduces a potential vulnerability. Here's why:
In AES-GCM, the authentication tag is computed as:
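In the standard description of GCM, the tag can be written as follows. The symbols $E_k$ (AES under key $k$), $H$ (the hash subkey), $J_0$ (the initial counter block), $n$, $A$, $C$, and $T$ are notation assumed here and may differ from the symbols used elsewhere in this article.

$$
H = E_k(0^{128}), \qquad J_0 = n \,\|\, 0^{31} \,\|\, 1 \ \ (\text{96-bit nonce } n), \qquad T = \mathrm{GHASH}_H(A, C) \oplus E_k(J_0)
$$

For a fixed $H$, GHASH is a polynomial over $GF(2^{128})$ whose coefficients are the blocks of $A$ and $C$, so the tag is a linear function of the ciphertext blocks plus a key-dependent mask. Nothing in this expression forces a given $(C, T)$ to be valid under only one key.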
In ChaCha20-Poly1305, the tag is computed as:
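Following the standard construction in RFC 8439, the tag can be written as follows. The symbols $r$, $s$, $c_i$, and $m$ are notation assumed here, not taken from the article.

$$
(r, s) = \mathrm{ChaCha20}(k, n, \mathrm{ctr}=0)[0..31], \qquad
T = \Big(\Big(\sum_{i=1}^{m} c_i \, r^{\,m-i+1} \bmod (2^{130}-5)\Big) + s\Big) \bmod 2^{128}
$$

Here $r$ is clamped as specified by Poly1305, and $c_1, \dots, c_m$ are the 16-byte blocks of $\mathrm{pad16}(A) \,\|\, \mathrm{pad16}(C) \,\|\, \mathrm{len}(A) \,\|\, \mathrm{len}(C)$, each read as a little-endian integer with $2^{128}$ added. As with GHASH, the tag is a polynomial evaluation masked by a one-time value, and it carries no explicit commitment to the key.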
There is no 2PC (two-party computation) overhead, offering a performance benefit.
Unlike DECO, no modifications are required on either the server or the client side.
The proof in this design can be structured as follows:
then a serious security vulnerability may arise.
This vulnerability stems from the structural properties of TLS itself and may affect systems like DECO in a similar way.
However, if the underlying AEAD scheme does satisfy Key Commitment, then secure zk proofs can still be constructed in the proxy-based model, as proven in Theorem 5.2 of the referenced paper.
This reasoning is justified when modeling AES as an ideal cipher.
📌 Definition: Ideal Cipher
Then a natural question arises: do we have this kind of padding structure in TLS?
Although these values are not fixed, their possible combinations are highly constrained, offering useful—though weaker—security compared to fixed padding. This is referred to as Variable Padding.
For example:
The Date field can vary per second, so within a 1-hour time window it can take on 3600 values.
Thus, the total number of possible combinations is:
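As a quick worked check, the count can be computed from the two figures quoted in this article (about 63 status codes and 3600 Date values per hour). This sketch assumes the two fields vary independently, which is a simplification.

```python
import math

status_codes = 63    # approximate number of assigned HTTP status codes
date_values = 3600   # one Date value per second within a 1-hour window

# Number of distinct (status code, Date) prefixes under the independence assumption.
combinations = status_codes * date_values

# Equivalent number of bits an attacker may "choose from" when hitting a valid prefix.
bits = math.log2(combinations)

print(combinations, round(bits, 2))
```

Under these assumptions the prefix can take only a few hundred thousand values, which is tiny compared with the space of all byte strings of the same length.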
However, this security assumption relies on the presence of HTTPS. What happens if HTTPS is not available?
This attack model lies between the Context Discovery (CDY) and Context Commitment (CMT) models, and it evaluates whether an AEAD scheme enforces decryption only under a unique context.
In this analogy, the relationship is:
Preimage Attack ↔ CDY Attack
Second Preimage Attack ↔ CFY Attack
The table below compares different types of context attacks on AEAD.
CDY (Context Discovery): Preimage attack
CFY (Context Forgery): Second Preimage attack
CMT (Context Commitment): Collision attack
Thus, the upper bound on the attack success probability is:
Rearranging this equation, we obtain:
Let’s analyze the distribution of both sides.
Thus, the probability that the equation holds — i.e., the probability that the attack succeeds — is:
Whether AEAD schemes require the Key Commitment property depends on the nature of the data being protected.
When Key Commitment is essential: For sensitive information such as bank account balances, Key Commitment is critical. If the ciphertext is decrypted under a forged context, it may result in misinterpreted financial data, which could lead to serious financial incidents. Therefore, decryption must succeed only under a single, correct context, a property guaranteed when AEAD satisfies Key Commitment.
When Key Commitment is not required: Data such as age, account numbers, or national ID numbers are examples of Fixed Data, which do not change across sessions. Even if an attacker manipulates the context to alter the age from 20 to 80, for instance, proper external safeguards can detect or prevent such tampering effectively.
Even in environments where HTTPS cannot be used, and therefore Variable Padding is unavailable, there is no need for despair.
As discussed earlier, when using ChaCha20-Poly1305 as the AEAD scheme, even though it lacks Key Commitment, the probability of a successful CFY attack is extremely low, and a zkTLS protocol based on Fixed Data can still be implemented safely and practically.
This article examined the Key Commitment issue of AEAD schemes in the context of selective data disclosure over TLS. In particular, it highlighted the limitations of the existing DECO protocol and explored how a proxy-based approach could overcome them. The proxy-based design has the advantage of requiring no modifications to either the server or the client, while still enabling selective disclosure through zero-knowledge proofs.
However, since AES-GCM and ChaCha20-Poly1305, the AEAD schemes commonly used in TLS, do not guarantee the Key Commitment property, any such protocol must ensure that AEAD decryption is tightly bound to a unique context to be considered secure.
This article proposed two practical solutions to address the issue:
In an HTTPS environment, the use of Variable Padding can enforce a predictable structure in ciphertexts, helping to mitigate the lack of key commitment.
In non-HTTPS environments, the statistical security guarantees of AEAD schemes like ChaCha20-Poly1305 make it feasible to build a secure zkTLS protocol, especially for Fixed Data.
Therefore, while enforcing full Key Commitment in all scenarios may be unrealistic, by carefully analyzing the data characteristics, the structure of the AEAD scheme, and the capabilities of potential adversaries, it is entirely feasible to design a secure and practical zkTLS system.
That said, one possible drawback of the proxy-based approach is that the proxy has visibility into all communications, which could be viewed as a trade-off.
Key generation: Generates a secret key $k$ and nonce $n$ from the transcript. We refer to the pair $(k, n)$ as the context.
Encryption: Encrypts plaintext $M$ using key $k$, nonce $n$, and associated data $A$, producing ciphertext $C$ and authentication tag $T$.
Decryption: Verifies $T$ using $k$, $n$, and $A$. If valid, decrypts $C$ to recover plaintext $M$; otherwise, returns $\bot$.
AES-CTR: A stream cipher mode for encrypting plaintext.
GHASH: A function used to compute an authentication tag from the ciphertext and associated data.
where and are the number of blocks in the plaintext and AD, respectively.
ChaCha20-Poly1305 is an AEAD (Authenticated Encryption with Associated Data) algorithm that combines the following two components:
ChaCha20: A fast stream cipher used for encrypting plaintext.
Poly1305: A universal hash function that generates an authentication tag from the ciphertext and associated data.
Here, and are derived as follows:
A Context Discovery Attack (CDY) refers to an adversary’s ability to find a valid decryption context for a given ciphertext, even if it is not the original context used during encryption.
The goal is not to recover the plaintext itself, but rather to find any context for which decryption succeeds without error.
A Context Commitment Attack (CMT) occurs when an adversary can find two different contexts, say $(k_1, n_1)$ and $(k_2, n_2)$ with $(k_1, n_1) \neq (k_2, n_2)$, that both successfully decrypt the same ciphertext.
Even if , it’s possible that .
It is also possible that , even if .
According to the referenced paper, in AEAD schemes like AES-GCM and ChaCha20-Poly1305, an adversary making at most queries to the oracle can succeed in a CMT (Context Commitment) attack with probability at least .
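The algebraic reason such collisions exist can be sketched with a toy one-block polynomial MAC. This is a simplified stand-in, not the real GHASH or Poly1305 computation: the field (integers modulo the Mersenne prime $2^{127}-1$), the SHA-256 based derivation of `r` and `s`, and all function names are assumptions made for illustration.

```python
import hashlib

P = 2**127 - 1  # toy prime field; real schemes use GF(2^128) or 2^130 - 5

def derive(key: bytes, nonce: bytes):
    """Derive the evaluation point r and the one-time mask s from a context."""
    digest = hashlib.sha256(key + nonce).digest()
    r = int.from_bytes(digest[:16], "big") % P
    s = int.from_bytes(digest[16:], "big") % P
    return r, s

def toy_tag(key: bytes, nonce: bytes, c: int) -> int:
    """Tag of a single ciphertext block c: linear in c, masked by s."""
    r, s = derive(key, nonce)
    return (c * r + s) % P

def colliding_block(ctx1, ctx2) -> int:
    """Solve c*r1 + s1 = c*r2 + s2 (mod P) for the ciphertext block c."""
    r1, s1 = derive(*ctx1)
    r2, s2 = derive(*ctx2)
    return (s2 - s1) * pow((r1 - r2) % P, -1, P) % P

ctx1 = (b"key-one", b"nonce-1")
ctx2 = (b"key-two", b"nonce-2")
c = colliding_block(ctx1, ctx2)

tag1 = toy_tag(*ctx1, c)
tag2 = toy_tag(*ctx2, c)
```

Because the tag is linear in the ciphertext block, a single equation yields a block that carries the same tag under two unrelated contexts. Attacks on the real schemes are more involved, but they exploit the same lack of binding between tag and key.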
As illustrated above, when the verifier acts as a proxy between the prover (or TLS client) and the TLS server, the following advantages arise:
Since the verifier only relays messages, this design maintains compatibility with various versions of TLS.
In DECO, since the prover knows the session key, it can use that key to forge a ciphertext instead of relaying the actual server response. To prevent this, DECO splits the key between the prover and the verifier, and only allows the verifier to reveal its share after the prover commits to the ciphertext.
This issue can also be addressed in the proxy-based approach. Because the verifier, acting as a proxy, observes all encrypted requests and responses exchanged between the prover and the server, it can detect if the prover attempts to tamper with the response and use a forged ciphertext instead of the genuine one.
In DECO, the issue of forgery arises from the prover’s knowledge of the key, allowing it to manipulate the ciphertext using that key. To mitigate this, the key was split between the prover and the verifier, and the verifier would only reveal its key share after receiving a commitment to the ciphertext from the prover.
Here, is a user-defined predicate representing the condition the user wants to prove about .
However, the constraint introduces significant complexity. To simplify the circuit, we may omit this constraint and instead define the circuit as:
This is because AEADs used in TLS 1.2/1.3 do not satisfy Key Commitment, making it possible to find two different (key, nonce) combinations that decrypt the same (associated data, ciphertext, tag) triple. For example:
If , and particularly if , then multiple conflicting claims can be proven over the same ciphertext, leading to critical security failures.
According to the paper “,” the key commitment issue can be mitigated using fixed padding. For example, consider prepending 128 bytes of 0x00 to the plaintext as padding. Finding another key and nonce under which the ciphertext decrypts to exactly 128 bytes of zeros is extremely difficult. The decryption process of AES-GCM works as follows:
Assume that fixed padding blocks are inserted at the beginning of the plaintext. For the decrypted output to exactly match the padding, the following condition must hold:
In other words, different key/nonce pairs must produce identical key streams for blocks—an extremely unlikely event.
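A minimal sketch of the padding check follows. It uses a SHA-256 based toy keystream as a stand-in for AES-CTR and a 16-byte zero prefix instead of the 128 bytes mentioned above. Both choices, and the function names, are assumptions made to keep the example short.

```python
import hashlib

PAD = bytes(16)  # fixed zero padding prepended to every plaintext (toy size)

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode, standing in for AES-CTR."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_padded(key: bytes, nonce: bytes, message: bytes) -> bytes:
    padded = PAD + message
    return xor(padded, keystream(key, nonce, len(padded)))

def decrypt_checked(key: bytes, nonce: bytes, ciphertext: bytes):
    """Return the message only if the decrypted prefix equals the fixed padding."""
    plaintext = xor(ciphertext, keystream(key, nonce, len(ciphertext)))
    if not plaintext.startswith(PAD):
        return None  # wrong context: the keystreams differ on the padding blocks
    return plaintext[len(PAD):]

nonce = bytes(12)
ct = encrypt_padded(b"real-key", nonce, b"balance=100")

right = decrypt_checked(b"real-key", nonce, ct)
wrong = decrypt_checked(b"other-key", nonce, ct)
```

Decryption under a different key passes the check only if both keystreams agree on every padding byte, which for 16 bytes of padding happens with probability about $2^{-128}$ when the keystreams behave independently.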
An ideal cipher is defined as a collection of functions $E_k$, one for each key $k$, from a message space $\mathcal{M}$ to a ciphertext space $\mathcal{C}$.
Each $E_k$ is a random permutation over $\mathcal{M}$.
That is, for each key $k$, the function $E_k$ is assumed to be independent and completely random.
💡 In other words, under the ideal cipher model, even when the same input is used with different keys $k_1$ and $k_2$, the outputs are almost certainly unrelated.
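The definition can be made concrete with a toy simulation over a 256-element message space. Seeding Python's `random.Random` with the key is only a shortcut for "sample one independent permutation per key". It is not cryptographically secure, and the function names are hypothetical.

```python
import random

DOMAIN = 256  # toy message space {0, ..., 255}

def permutation_for(key: int):
    """Return the permutation associated with this key (sampled once per key)."""
    rng = random.Random(key)       # simulation shortcut, not a secure PRG
    table = list(range(DOMAIN))
    rng.shuffle(table)
    return table

def ideal_encrypt(key: int, x: int) -> int:
    return permutation_for(key)[x]

def ideal_decrypt(key: int, y: int) -> int:
    # Every permutation is invertible, so decryption is a table lookup in reverse.
    return permutation_for(key).index(y)

table_1 = permutation_for(1)
table_2 = permutation_for(2)
round_trip = ideal_decrypt(7, ideal_encrypt(7, 42))
```

Each key yields a bijection on the domain, decryption inverts it, and the tables for two different keys bear no useful relation to each other.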
As shown in the image above, many web servers, including those of Google and Twitter, prepend HTTP responses with consistently structured data such as the HTTP status code and Date fields. This formatting is considered best practice under , and is widely adopted by major web servers like Apache and Nginx. According to the paper, more than 85% of webpages begin with this kind of pattern.
There are approximately 63 HTTP status codes, based on the (excluding unassigned codes). (Interestingly, when I counted them myself, there were 64 HTTP status codes.)
If the HTTP status code and Date header together form a 54-byte string, this effectively corresponds to having possible padding values. It means that forcing a decryption result to match one of these valid patterns is statistically infeasible for an attacker. In this sense, such structural padding acts as a practical substitute for full key commitment and can offer meaningful protection. (For a detailed analysis, refer to of the paper.)
A CFY attack aims to find a different context $(k', n')$ such that a given ciphertext, originally generated under a key $k$ and nonce $n$, can still be successfully decrypted.
The constraint is that $(k', n') \neq (k, n)$.
In other words, the attacker tries to determine whether the ciphertext can be decrypted under a different (key, nonce) pair than the one originally used.
The structure of a CFY attack closely resembles that of a Second Preimage Attack in cryptographic hash functions. In a Second Preimage Attack, the attacker is given an input $x$ and seeks a different input $x' \neq x$ that hashes to the same value, $\mathrm{Hash}(x') = \mathrm{Hash}(x)$.
This is different from a Preimage Attack, where one is given only a hash output $y$ and must find any corresponding input $x$ with $\mathrm{Hash}(x) = y$.
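The second-preimage setting can be tried out on a deliberately weakened hash. The sketch below truncates SHA-256 to 16 bits so that a brute-force search finishes quickly. The truncation and the helper names are assumptions for illustration only.

```python
import hashlib

def h16(data: bytes) -> bytes:
    """SHA-256 truncated to 16 bits: weak on purpose so the search is fast."""
    return hashlib.sha256(data).digest()[:2]

def second_preimage(x: bytes) -> bytes:
    """Given a known input x, search for a different input with the same hash."""
    target = h16(x)
    i = 0
    while True:
        candidate = i.to_bytes(4, "big")
        if candidate != x and h16(candidate) == target:
            return candidate
        i += 1

x = b"original context"
x2 = second_preimage(x)
```

The attacker starts from a valid input and must land on the same output with a different one, which is the situation a CFY attacker faces with an already valid decryption context. With a full-length hash the same search would be infeasible.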
CDY: One valid context that produces the given ciphertext
CFY: A second valid context that produces the given ciphertext
CMT: Two valid contexts that produce the same ciphertext
The zkTLS attacker must perform a CFY attack, not a CDY attack. This is because the original context $(k, n)$ is already a valid decryption context, and the attacker’s goal is to find a different context that also successfully decrypts the ciphertext.
For simplicity, let’s assume the ciphertext consists of a single block and there is no associated data . Under these conditions, the AES-GCM tag is computed as follows:
Now, suppose an attacker randomly samples a different key , and attempts to generate the same tag using a new context :
Assuming AES behaves as an ideal cipher, each key defines a reversible permutation. Thus, the attacker can rearrange the above equation to solve for :
In other words, given , , and , the attacker can compute a candidate nonce that would yield the same tag.
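The rearrangement described here can be written out in standard GCM notation. The symbols $E_k$, $H$, $J_0$, $L$ (the length block), and the primed attacker values are notation assumed for this sketch. For a single ciphertext block $C$ and no associated data:

$$
\begin{aligned}
H &= E_k(0^{128}), \qquad \mathrm{GHASH}_H(C) = C \cdot H^2 \oplus L \cdot H, \qquad T = \mathrm{GHASH}_H(C) \oplus E_k(J_0) \\
H' &= E_{k'}(0^{128}), \qquad T = \mathrm{GHASH}_{H'}(C) \oplus E_{k'}(J_0') \\
&\Longrightarrow \quad J_0' = E_{k'}^{-1}\big(T \oplus \mathrm{GHASH}_{H'}(C)\big)
\end{aligned}
$$

The candidate is only usable if $J_0'$ has the form that GCM requires of an initial counter block, that is $J_0' = n' \,\|\, 0^{31} \,\|\, 1$ for a 96-bit nonce $n'$.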
⚠️ However, in TLS, nonces must follow a specific format: among the 64-bit nonce, the last 32 bits must equal . Therefore, the probability that a randomly generated satisfies this constraint by chance is:
Each attempt with a new requires two AES invocations: one for and one for . If the attacker is limited to a total of AES queries, they can make approximately attempts.
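The resulting bound can be evaluated numerically. The sketch below assumes, following the reasoning above, that each attempt succeeds with probability $2^{-32}$ (32 fixed bits must match by chance) and that a budget of `q` AES queries allows `q // 2` attempts. The function name is hypothetical, and the constants in the referenced paper may differ.

```python
from fractions import Fraction

def aes_gcm_cfy_bound(q: int) -> Fraction:
    """Union bound on CFY success for a budget of q AES queries."""
    attempts = q // 2                  # two AES invocations per candidate key
    per_attempt = Fraction(1, 2**32)   # chance that 32 fixed bits match
    return min(Fraction(1), attempts * per_attempt)

bound_small = aes_gcm_cfy_bound(2**20)  # about a million AES queries
bound_large = aes_gcm_cfy_bound(2**40)  # budget large enough to saturate the bound
```

Under these assumptions the bound grows linearly in `q` and reaches 1 after roughly $2^{33}$ AES queries, so the format constraint alone slows the attack down but does not rule it out.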
To simplify the discussion, assume the ciphertext consists of a single block and there is no associated data . Under this assumption, the ChaCha20-Poly1305 tag is computed as:
Here, and are computed as follows:
The attacker randomly samples a different key and nonce pair and attempts to generate the same tag using this new context. For the attack to succeed, the following condition must hold:
Since behaves as a , different pairs will produce independent and uniformly random values. Therefore, is uniformly distributed over .
is a , meaning that its output is also uniformly distributed. Hence, if we define:
then is uniformly distributed over as well.
Therefore, if the attacker is allowed to make queries, the upper bound on the total success probability is:
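As a rough numerical illustration, the sketch below assumes that each attempt matches the 128-bit Poly1305 tag with probability $2^{-128}$ and that one query corresponds to one attempt. Both are assumptions read off the argument above rather than constants taken from the referenced paper, and the function name is hypothetical.

```python
from fractions import Fraction

TAG_BITS = 128  # Poly1305 tags are 128 bits long

def chacha20_poly1305_cfy_bound(q: int) -> Fraction:
    """Union bound on CFY success after q attempts, each hitting with 2^-128."""
    per_attempt = Fraction(1, 2**TAG_BITS)
    return min(Fraction(1), q * per_attempt)

bound = chacha20_poly1305_cfy_bound(2**64)
```

Even with $2^{64}$ attempts the bound stays at $2^{-64}$, which is why the attack is considered impractical for ChaCha20-Poly1305 in this setting.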