Parallel/Distributed FFT

Parallel FFT

Suppose we are given a polynomial P(X)=i=022n1ciXiP(X) = \sum_{i=0}^{2^{2n} - 1} c_i X^i. The Discrete Fourier Transform (DFT) of this polynomial is defined as:

c^j=i=022n1ciωij,where 0j<22n\hat{c}_j = \sum_{i=0}^{2^{2n} - 1} c_i \cdot \omega^{ij}, \quad \text{where } 0 \le j < 2^{2n}

Here, ω\omega is a primitive 22n2^{2n}-th root of unity.

We can decompose this equation using index splitting as follows (where 0j0,j1<2n0 \le j_0, j_1 < 2^n):

c^j12n+j0=i0=02n1i1=02n1ci12n+i0ω(i12n+i0)(j12n+j0)=(i0=02n1(ωi0j0(i1=02n1ci12n+i0(ω2n)(j12n+j0)i1)Step 1)Step 2(ω2n)i0j1)Step 3\begin{align*} \hat{c}_{j_1 \cdot 2^n + j_0} &= \sum_{i_0 = 0}^{2^n - 1} \sum_{i_1 = 0}^{2^n - 1} c_{i_1 \cdot 2^n + i_0} \cdot \omega^{(i_1 \cdot 2^n + i_0)(j_1 \cdot 2^n + j_0)} \\ &= \underbrace{ \left(\sum_{i_0 = 0}^{2^n - 1} \underbrace{\left( \omega^{i_0 j_0} \underbrace{ \left( \sum_{i_1 = 0}^{2^n - 1} c_{i_1 \cdot 2^n + i_0} \cdot (\omega^{2^n})^{(j_1 \cdot 2^n + j_0)i_1} \right)}_{\text{Step 1}} \right)}_{\text{Step 2}} \cdot (\omega^{2^n})^{i_0 j_1}\right)}_{\text{Step 3}} \end{align*}

Example (n=2n=2)

Let P(X)=c0+c1X++c15X15P(X) = c_0 + c_1 X + \cdots + c_{15} X^{15}, and suppose ω16=1\omega^{16} = 1. Define the following input vectors:

c(0)=(c0,c4,c8,c12)c(1)=(c1,c5,c9,c13)c(2)=(c2,c6,c10,c14)c(3)=(c3,c7,c11,c15)\begin{aligned} \mathbf{c}^{(0)} &= (c_0, c_4, c_8, c_{12}) \\ \mathbf{c}^{(1)} &= (c_1, c_5, c_9, c_{13}) \\ \mathbf{c}^{(2)} &= (c_2, c_6, c_{10}, c_{14}) \\ \mathbf{c}^{(3)} &= (c_3, c_7, c_{11}, c_{15}) \\ \end{aligned}

Step 1: Perform FFT on each c(i0)\mathbf{c}^{(i_0)}

c(0)^=(c(0)^0,c(0)^1,c(0)^2,c(0)^3)c(1)^=(c(1)^0,c(1)^1,c(1)^2,c(1)^3)c(2)^=(c(2)^0,c(2)^1,c(2)^2,c(2)^3)c(3)^=(c(3)^0,c(3)^1,c(3)^2,c(3)^3)\begin{aligned} \widehat{\mathbf{c}^{(0)}} &= (\widehat{c^{(0)}}_0, \widehat{c^{(0)}}_1, \widehat{c^{(0)}}_2, \widehat{c^{(0)}}_3) \\ \widehat{\mathbf{c}^{(1)}} &= (\widehat{c^{(1)}}_0, \widehat{c^{(1)}}_1, \widehat{c^{(1)}}_2, \widehat{c^{(1)}}_3) \\ \widehat{\mathbf{c}^{(2)}} &= (\widehat{c^{(2)}}_0, \widehat{c^{(2)}}_1, \widehat{c^{(2)}}_2, \widehat{c^{(2)}}_3) \\ \widehat{\mathbf{c}^{(3)}} &= (\widehat{c^{(3)}}_0, \widehat{c^{(3)}}_1, \widehat{c^{(3)}}_2, \widehat{c^{(3)}}_3) \end{aligned}

Each transform is defined by:

c(i0)^j0=i1=03c4i1+i0(ω4)(4j1+j0)i1\widehat{c^{(i_0)}}_{j_0} = \sum_{i_1 = 0}^3 c_{4 \cdot i_1 + i_0} \cdot (\omega^4)^{(4j_1 + j_0)i_1}

Step 2: Construct z[j0] \mathbf{z}^{[j_0]}

z[0]=(c(0)^0,c(1)^0,c(2)^0,c(3)^0)z[1]=(c(0)^1,ω1c(1)^1,ω2c(2)^1,ω3c(3)^1)z[2]=(c(0)^2,ω2c(1)^2,ω4c(2)^2,ω6c(3)^2)z[3]=(c(0)^3,ω3c(1)^3,ω6c(2)^3,ω9c(3)^3)\begin{aligned} \mathbf{z}^{[0]} &= (\widehat{c^{(0)}}_0, \widehat{c^{(1)}}_0, \widehat{c^{(2)}}_0, \widehat{c^{(3)}}_0) \\ \mathbf{z}^{[1]} &= (\widehat{c^{(0)}}_1, \omega^1 \widehat{c^{(1)}}_1, \omega^2 \widehat{c^{(2)}}_1, \omega^3 \widehat{c^{(3)}}_1) \\ \mathbf{z}^{[2]} &= (\widehat{c^{(0)}}_2, \omega^2 \widehat{c^{(1)}}_2, \omega^4 \widehat{c^{(2)}}_2, \omega^6 \widehat{c^{(3)}}_2) \\ \mathbf{z}^{[3]} &= (\widehat{c^{(0)}}_3, \omega^3 \widehat{c^{(1)}}_3, \omega^6 \widehat{c^{(2)}}_3, \omega^9 \widehat{c^{(3)}}_3) \end{aligned}

Each z[j0] \mathbf{z}^{[j_0]} can be defined generally as:

z[j0]:=(c(0)^j0,ωj0c(1)^j0,ω2j0c(2)^j0,ω3j0c(3)^j0)\mathbf{z}^{[j_0]} := \left( \widehat{c^{(0)}}_{j_0}, \omega^{j_0} \widehat{c^{(1)}}_{j_0}, \omega^{2j_0} \widehat{c^{(2)}}_{j_0}, \omega^{3j_0} \widehat{c^{(3)}}_{j_0} \right)

Step 3: Final FFT on each z[j0] \mathbf{z}^{[j_0]}

z[0]^=(c^0,c^4,c^8,c^12)z[1]^=(c^1,c^5,c^9,c^13)z[2]^=(c^2,c^6,c^10,c^14)z[3]^=(c^3,c^7,c^11,c^15)\begin{aligned} \widehat{\mathbf{z}^{[0]}} &= (\hat{c}_0, \hat{c}_4, \hat{c}_8, \hat{c}_{12}) \\ \widehat{\mathbf{z}^{[1]}} &= (\hat{c}_1, \hat{c}_5, \hat{c}_9, \hat{c}_{13}) \\ \widehat{\mathbf{z}^{[2]}} &= (\hat{c}_2, \hat{c}_6, \hat{c}_{10}, \hat{c}_{14}) \\ \widehat{\mathbf{z}^{[3]}} &= (\hat{c}_3, \hat{c}_7, \hat{c}_{11}, \hat{c}_{15}) \end{aligned}

Each transform is again a standard FFT:

z[j0]^j1=i0=03z[j0]i0(ω4)i0j1\widehat{z^{[j_0]}}_{j_1} = \sum_{i_0 = 0}^3 {z^{[j_0]}}_{i_0} \cdot (\omega^4)^{i_0 j_1}

Verification Examples

  • z[1]^2=c^9 \widehat{z^{[1]}}_2 = \hat{c}_9 (j1=2,j0=1)(j_1 = 2, j_0 = 1)

z[1]^2=z[1]0+z[1]1ω8+z[1]2ω16+z[1]3ω24=c(0)^1+c(1)^1ω9+c(2)^1ω18+c(3)^1ω27=i1=03c4i1(ω4)9i1+i1=03c4i1+1(ω4)9i1ω9+i1=03c4i1+2(ω4)9i1ω18+i1=03c4i1+3(ω4)9i1ω27=c^9 \begin{align*} \widehat{z^{[1]}}_2 &= {z^{[1]}}_0 + {z^{[1]}}_1 \omega^{8} + {z^{[1]}}_2 \omega^{16} + {z^{[1]}}_3 \omega^{24} \\ &= \widehat{c^{(0)}}_1 + \widehat{c^{(1)}}_1 \omega^{9} + \widehat{c^{(2)}}_1 \omega^{18} + \widehat{c^{(3)}}_1 \omega^{27} \\ &= \sum_{i_1=0}^3 c_{4i_1} (\omega^4)^{9i_1} + \sum_{i_1=0}^3 c_{4i_1 + 1} (\omega^4)^{9i_1}\omega^9 + \sum_{i_1=0}^3 c_{4i_1 + 2} (\omega^4)^{9i_1}\omega^{18} + \sum_{i_1=0}^3 c_{4i_1 + 3} (\omega^4)^{9i_1}\omega^{27} \\ &= \hat{c}_9 \end{align*}
  • z[2]^1=c^6\widehat{z^{[2]}}_1 = \hat{c}_6 (j1=1,j0=2)(j_1 = 1, j_0 = 2)

z[2]^1=z[2]0+z[2]1ω4+z[2]2ω8+z[2]3ω12=c(0)^2+c(1)^2ω6+c(2)^2ω12+c(3)^2ω18=i1=03c4i1(ω4)6i1+i1=03c4i1+1(ω4)6i1ω6+i1=03c4i1+2(ω4)6i1ω12+i1=03c4i1+3(ω4)6i1ω18=c^6\begin{align*} \widehat{z^{[2]}}_1 &= {z^{[2]}}_0 + {z^{[2]}}_1 \omega^{4} + {z^{[2]}}_2 \omega^{8} + {z^{[2]}}_3 \omega^{12} \\ &= \widehat{c^{(0)}}_2 + \widehat{c^{(1)}}_2 \omega^{6} + \widehat{c^{(2)}}_2 \omega^{12} + \widehat{c^{(3)}}_2 \omega^{18} \\ &= \sum_{i_1=0}^3 c_{4i_1} (\omega^4)^{6i_1} + \sum_{i_1=0}^3 c_{4i_1 + 1} (\omega^4)^{6i_1} \omega^6 + \sum_{i_1=0}^3 c_{4i_1 + 2} (\omega^4)^{6i_1} \omega^{12} + \sum_{i_1=0}^3 c_{4i_1 + 3} (\omega^4)^{6i_1} \omega^{18} \\ &= \hat{c}_6 \end{align*}

The inverse FFT is structurally similar, but we do not cover it in this article.

Distributed FFT

The Parallel FFT algorithm described above can be implemented as a Distributed FFT by mapping it onto the MapReduce model. The diagram above illustrates how the same example from the parallel FFT is executed within the MapReduce paradigm.

  • Map phase: Each executor performs the first-stage FFTs over c(i0)\mathbf{c}^{(i_0)}

  • Shuffle (transpose + twist): Results are reorganized into z[j0]\mathbf{z}^{[j_0]}

  • Reduce phase: Second-stage FFTs are computed over each z[j0]\mathbf{z}^{[j_0]}

This structure enables FFT computations over extremely large input sizes to be parallelized and scaled effectively in distributed systems like Spark or Hadoop.

Reference

Written by ryan Kim from A41

Last updated