Tonelli-Shanks Algorithm

The Tonelli–Shanks algorithm is a classical and efficient method to compute modular square roots over finite fields Fpm\mathbb{F}_{p^m}, especially when mm is odd and p1(mod4)p \equiv 1 \pmod 4, where simpler methods like Shanks do not apply. The method is based on Algorithm 5 from ePrint 2012/685.


Preconditions

To use Tonelli–Shanks:

  • mm must be odd.

  • pp must be an odd prime

  • aa is a quadratic residue.

We express p1p - 1 as:

p1=2sTp - 1 = 2^s \cdot T

where TT is odd.

This decomposition is known as the 2-adic decomposition of p1p−1, and ss is the 2-adicity.


Algorithm Steps (High-level)

Let’s outline the algorithm assuming aFpma \in \mathbb{F}_{p^m}^*:

  1. Precomputation

    • Write p1=2sTp - 1 = 2^s \cdot T

    • Find a known quadratic non-residue cFpmc \in \mathbb{F}_{p^m}

    • Compute z=cTz = c^T — a 2s2^s-th primitive root of unity

  2. Initial values

    • x=a(T+1)/2x = a^{(T+1)/2}

    • b=aTb = a^T

    • v=sv = s

  3. Main loop

    • While b1b \ne 1, do:

      • Find the smallest kk such that b2k1b^{2^k} \equiv 1

      • Let w=z2vk1w = z^{2^{v - k - 1}}

      • Update:

        xxw,bbw2,zw2,vkx \leftarrow x \cdot w,\quad b \leftarrow b \cdot w^2,\quad z \leftarrow w^2,\quad v \leftarrow k
  4. Output

    • xx is the square root of aa


Intuition Behind the Updates in Tonelli–Shanks

The Tonelli–Shanks algorithm iteratively computes the square root of a quadratic residue aFpa \in \mathbb{F}_p, given the decomposition p1=2sTp - 1 = 2^s \cdot T, with TT odd. Its central mechanism relies on maintaining a key invariant and gradually reducing the complexity of the problem step by step.


Invariant: xi2=abix_i^2 = a \cdot b_i

At each iteration ii, the algorithm maintains:

xi2=abix_i^2 = a \cdot b_i

We define:

  • x0=a(T+1)/2x_0 = a^{(T+1)/2}

  • b0=aTb_0 = a^T

  • Then:

    x02=aT+1=aaT=ab0x_0^2 = a^{T+1} = a \cdot a^T = a \cdot b_0

Inductive step: Suppose at iteration ii, the invariant holds:

xi2=abix_i^2 = a \cdot b_i

Then the algorithm updates:

wi=z2viki1xi+1=xiwibi+1=biwi2\begin{aligned} w_i &= z^{2^{v_i - k_i - 1}} \\ x_{i+1} &= x_i \cdot w_i \\ b_{i+1} &= b_i \cdot w_i^2 \end{aligned}

Then we have:

xi+12=(xiwi)2=xi2wi2=(abi)wi2=a(biwi2)=abi+1x_{i+1}^2 = (x_i w_i)^2 = x_i^2 \cdot w_i^2 = (a \cdot b_i) \cdot w_i^2 = a \cdot (b_i \cdot w_i^2) = a \cdot b_{i+1}

Therefore, the invariant is preserved at every step:

xi2=abix_i^2 = a \cdot b_i

Why bi1b_i \to 1: Convergence of the Loop

At each iteration of Tonelli–Shanks, the algorithm updates:

bi+1=biwi2b_{i+1} = b_i \cdot w_i^2

To show that bi1b_i \to 1, we analyze how the order of bib_i decreases over time. Suppose that at iteration ii, the smallest integer kk such that bi2k=1b_i^{2^k} = 1, and bi2k11b_i^{2^{k-1}} \ne 1. Then bib_i is a primitive 2k2^k-th root of unity.

We now prove that the update:

bi+1=biwi2withwi=z2vk1b_{i+1} = b_i \cdot w_i^2 \quad \text{with} \quad w_i = z^{2^{v - k - 1}}

forces bi+1b_{i+1} to have strictly lower order.

Let’s compute:

(bi+1)2k1=(biwi2)2k1=bi2k1(wi2)2k1=(1)(1)=1\begin{aligned} (b_{i+1})^{2^{k-1}} &= (b_i \cdot w_i^2)^{2^{k-1}} \\ &= b_i^{2^{k-1}} \cdot (w_i^2)^{2^{k-1}} \\ &= (-1) \cdot (-1) = 1 \end{aligned}

Why does this work?

  • Since bi2k=1b_i^{2^k} = 1 and bi2k11b_i^{2^{k-1}} \ne 1, we know by the Halving Lemma that bi2k1=1b_i^{2^{k-1}} = -1

  • Now for wiw_i, recall that:

    wi=z2vk1wi2=z2vkw_i = z^{2^{v - k - 1}} \Rightarrow w_i^2 = z^{2^{v - k}}

    Then:

    (wi2)2k1=z2vk2k1=z2v1=1(w_i^2)^{2^{k-1}} = z^{2^{v - k} \cdot 2^{k-1}} = z^{2^{v - 1}} = -1

    because zz is a quadratic non-residue, and by construction(when i=0,v=si = 0, v= s):

    z2s1=(cT)2s1=c2s1T=1z^{2^{s-1}} = \left(c^T\right)^{2^{s-1}} = c^{2^{s-1}\cdot T} = -1

    This identity continues to hold at every step due to the invariant:

    zi2v1=1z_i^{2^{v - 1}} = -1

Result

Since both bi2k1=1b_i^{2^{k - 1}} = -1 and (wi2)2k1=1(w_i^2)^{2^{k - 1}} = -1, their product is 1:

(biwi2)2k1=1(b_i \cdot w_i^2)^{2^{k - 1}} = 1

Thus, the order of bi+1b_{i+1} is now at most 2k12^{k - 1}, which is strictly less than the order of bib_i. As a result, the order of bb decreases with each iteration, and within at most ss steps, we reach:

blast=1b_{\text{last}} = 1

Once this happens, the invariant x2=abx^2 = a \cdot b gives x2=ax^2 = a, completing the algorithm.

Written by ryan Kim from A41

Last updated