Parallel FFT
Suppose we are given a polynomial P ( X ) = ∑ i = 0 2 2 n − 1 c i X i P(X) = \sum_{i=0}^{2^{2n} - 1} c_i X^i P ( X ) = ∑ i = 0 2 2 n − 1 c i X i . The Discrete Fourier Transform (DFT) of this polynomial is defined as:
c ^ j = ∑ i = 0 2 2 n − 1 c i ⋅ ω i j , where 0 ≤ j < 2 2 n \hat{c}_j = \sum_{i=0}^{2^{2n} - 1} c_i \cdot \omega^{ij}, \quad \text{where } 0 \le j < 2^{2n} c ^ j = i = 0 ∑ 2 2 n − 1 c i ⋅ ω ij , where 0 ≤ j < 2 2 n Here, ω \omega ω is a primitive 2 2 n 2^{2n} 2 2 n -th root of unity.
We can decompose this equation using index splitting as follows (where 0 ≤ j 0 , j 1 < 2 n 0 \le j_0, j_1 < 2^n 0 ≤ j 0 , j 1 < 2 n ):
c ^ j 1 ⋅ 2 n + j 0 = ∑ i 0 = 0 2 n − 1 ∑ i 1 = 0 2 n − 1 c i 1 ⋅ 2 n + i 0 ⋅ ω ( i 1 ⋅ 2 n + i 0 ) ( j 1 ⋅ 2 n + j 0 ) = ( ∑ i 0 = 0 2 n − 1 ( ω i 0 j 0 ( ∑ i 1 = 0 2 n − 1 c i 1 ⋅ 2 n + i 0 ⋅ ( ω 2 n ) ( j 1 ⋅ 2 n + j 0 ) i 1 ) ⏟ Step 1 ) ⏟ Step 2 ⋅ ( ω 2 n ) i 0 j 1 ) ⏟ Step 3 \begin{align*}
\hat{c}_{j_1 \cdot 2^n + j_0}
&= \sum_{i_0 = 0}^{2^n - 1} \sum_{i_1 = 0}^{2^n - 1} c_{i_1 \cdot 2^n + i_0} \cdot \omega^{(i_1 \cdot 2^n + i_0)(j_1 \cdot 2^n + j_0)} \\
&= \underbrace{
\left(\sum_{i_0 = 0}^{2^n - 1}
\underbrace{\left( \omega^{i_0 j_0}
\underbrace{
\left(
\sum_{i_1 = 0}^{2^n - 1} c_{i_1 \cdot 2^n + i_0} \cdot (\omega^{2^n})^{(j_1 \cdot 2^n + j_0)i_1}
\right)}_{\text{Step 1}}
\right)}_{\text{Step 2}}
\cdot (\omega^{2^n})^{i_0 j_1}\right)}_{\text{Step 3}}
\end{align*} c ^ j 1 ⋅ 2 n + j 0 = i 0 = 0 ∑ 2 n − 1 i 1 = 0 ∑ 2 n − 1 c i 1 ⋅ 2 n + i 0 ⋅ ω ( i 1 ⋅ 2 n + i 0 ) ( j 1 ⋅ 2 n + j 0 ) = Step 3 i 0 = 0 ∑ 2 n − 1 Step 2 ω i 0 j 0 Step 1 ( i 1 = 0 ∑ 2 n − 1 c i 1 ⋅ 2 n + i 0 ⋅ ( ω 2 n ) ( j 1 ⋅ 2 n + j 0 ) i 1 ) ⋅ ( ω 2 n ) i 0 j 1 Let P ( X ) = c 0 + c 1 X + ⋯ + c 15 X 15 P(X) = c_0 + c_1 X + \cdots + c_{15} X^{15} P ( X ) = c 0 + c 1 X + ⋯ + c 15 X 15 , and suppose ω 16 = 1 \omega^{16} = 1 ω 16 = 1 . Define the following input vectors:
c ( 0 ) = ( c 0 , c 4 , c 8 , c 12 ) c ( 1 ) = ( c 1 , c 5 , c 9 , c 13 ) c ( 2 ) = ( c 2 , c 6 , c 10 , c 14 ) c ( 3 ) = ( c 3 , c 7 , c 11 , c 15 ) \begin{aligned}
\mathbf{c}^{(0)} &= (c_0, c_4, c_8, c_{12}) \\
\mathbf{c}^{(1)} &= (c_1, c_5, c_9, c_{13}) \\
\mathbf{c}^{(2)} &= (c_2, c_6, c_{10}, c_{14}) \\
\mathbf{c}^{(3)} &= (c_3, c_7, c_{11}, c_{15}) \\
\end{aligned} c ( 0 ) c ( 1 ) c ( 2 ) c ( 3 ) = ( c 0 , c 4 , c 8 , c 12 ) = ( c 1 , c 5 , c 9 , c 13 ) = ( c 2 , c 6 , c 10 , c 14 ) = ( c 3 , c 7 , c 11 , c 15 ) c ( 0 ) ^ = ( c ( 0 ) ^ 0 , c ( 0 ) ^ 1 , c ( 0 ) ^ 2 , c ( 0 ) ^ 3 ) c ( 1 ) ^ = ( c ( 1 ) ^ 0 , c ( 1 ) ^ 1 , c ( 1 ) ^ 2 , c ( 1 ) ^ 3 ) c ( 2 ) ^ = ( c ( 2 ) ^ 0 , c ( 2 ) ^ 1 , c ( 2 ) ^ 2 , c ( 2 ) ^ 3 ) c ( 3 ) ^ = ( c ( 3 ) ^ 0 , c ( 3 ) ^ 1 , c ( 3 ) ^ 2 , c ( 3 ) ^ 3 ) \begin{aligned}
\widehat{\mathbf{c}^{(0)}} &= (\widehat{c^{(0)}}_0, \widehat{c^{(0)}}_1, \widehat{c^{(0)}}_2, \widehat{c^{(0)}}_3) \\
\widehat{\mathbf{c}^{(1)}} &= (\widehat{c^{(1)}}_0, \widehat{c^{(1)}}_1, \widehat{c^{(1)}}_2, \widehat{c^{(1)}}_3) \\
\widehat{\mathbf{c}^{(2)}} &= (\widehat{c^{(2)}}_0, \widehat{c^{(2)}}_1, \widehat{c^{(2)}}_2, \widehat{c^{(2)}}_3) \\
\widehat{\mathbf{c}^{(3)}} &= (\widehat{c^{(3)}}_0, \widehat{c^{(3)}}_1, \widehat{c^{(3)}}_2, \widehat{c^{(3)}}_3)
\end{aligned} c ( 0 ) c ( 1 ) c ( 2 ) c ( 3 ) = ( c ( 0 ) 0 , c ( 0 ) 1 , c ( 0 ) 2 , c ( 0 ) 3 ) = ( c ( 1 ) 0 , c ( 1 ) 1 , c ( 1 ) 2 , c ( 1 ) 3 ) = ( c ( 2 ) 0 , c ( 2 ) 1 , c ( 2 ) 2 , c ( 2 ) 3 ) = ( c ( 3 ) 0 , c ( 3 ) 1 , c ( 3 ) 2 , c ( 3 ) 3 ) Each transform is defined by:
c ( i 0 ) ^ j 0 = ∑ i 1 = 0 3 c 4 ⋅ i 1 + i 0 ⋅ ( ω 4 ) ( 4 j 1 + j 0 ) i 1 \widehat{c^{(i_0)}}_{j_0} = \sum_{i_1 = 0}^3 c_{4 \cdot i_1 + i_0} \cdot (\omega^4)^{(4j_1 + j_0)i_1} c ( i 0 ) j 0 = i 1 = 0 ∑ 3 c 4 ⋅ i 1 + i 0 ⋅ ( ω 4 ) ( 4 j 1 + j 0 ) i 1 Step 2: Construct z [ j 0 ] \mathbf{z}^{[j_0]} z [ j 0 ] z [ 0 ] = ( c ( 0 ) ^ 0 , c ( 1 ) ^ 0 , c ( 2 ) ^ 0 , c ( 3 ) ^ 0 ) z [ 1 ] = ( c ( 0 ) ^ 1 , ω 1 c ( 1 ) ^ 1 , ω 2 c ( 2 ) ^ 1 , ω 3 c ( 3 ) ^ 1 ) z [ 2 ] = ( c ( 0 ) ^ 2 , ω 2 c ( 1 ) ^ 2 , ω 4 c ( 2 ) ^ 2 , ω 6 c ( 3 ) ^ 2 ) z [ 3 ] = ( c ( 0 ) ^ 3 , ω 3 c ( 1 ) ^ 3 , ω 6 c ( 2 ) ^ 3 , ω 9 c ( 3 ) ^ 3 ) \begin{aligned}
\mathbf{z}^{[0]} &= (\widehat{c^{(0)}}_0, \widehat{c^{(1)}}_0, \widehat{c^{(2)}}_0, \widehat{c^{(3)}}_0) \\
\mathbf{z}^{[1]} &= (\widehat{c^{(0)}}_1, \omega^1 \widehat{c^{(1)}}_1, \omega^2 \widehat{c^{(2)}}_1, \omega^3 \widehat{c^{(3)}}_1) \\
\mathbf{z}^{[2]} &= (\widehat{c^{(0)}}_2, \omega^2 \widehat{c^{(1)}}_2, \omega^4 \widehat{c^{(2)}}_2, \omega^6 \widehat{c^{(3)}}_2) \\
\mathbf{z}^{[3]} &= (\widehat{c^{(0)}}_3, \omega^3 \widehat{c^{(1)}}_3, \omega^6 \widehat{c^{(2)}}_3, \omega^9 \widehat{c^{(3)}}_3)
\end{aligned} z [ 0 ] z [ 1 ] z [ 2 ] z [ 3 ] = ( c ( 0 ) 0 , c ( 1 ) 0 , c ( 2 ) 0 , c ( 3 ) 0 ) = ( c ( 0 ) 1 , ω 1 c ( 1 ) 1 , ω 2 c ( 2 ) 1 , ω 3 c ( 3 ) 1 ) = ( c ( 0 ) 2 , ω 2 c ( 1 ) 2 , ω 4 c ( 2 ) 2 , ω 6 c ( 3 ) 2 ) = ( c ( 0 ) 3 , ω 3 c ( 1 ) 3 , ω 6 c ( 2 ) 3 , ω 9 c ( 3 ) 3 ) Each z [ j 0 ] \mathbf{z}^{[j_0]} z [ j 0 ] can be defined generally as:
z [ j 0 ] : = ( c ( 0 ) ^ j 0 , ω j 0 c ( 1 ) ^ j 0 , ω 2 j 0 c ( 2 ) ^ j 0 , ω 3 j 0 c ( 3 ) ^ j 0 ) \mathbf{z}^{[j_0]} := \left( \widehat{c^{(0)}}_{j_0}, \omega^{j_0} \widehat{c^{(1)}}_{j_0}, \omega^{2j_0} \widehat{c^{(2)}}_{j_0}, \omega^{3j_0} \widehat{c^{(3)}}_{j_0} \right) z [ j 0 ] := ( c ( 0 ) j 0 , ω j 0 c ( 1 ) j 0 , ω 2 j 0 c ( 2 ) j 0 , ω 3 j 0 c ( 3 ) j 0 ) Step 3: Final FFT on each z [ j 0 ] \mathbf{z}^{[j_0]} z [ j 0 ] z [ 0 ] ^ = ( c ^ 0 , c ^ 4 , c ^ 8 , c ^ 12 ) z [ 1 ] ^ = ( c ^ 1 , c ^ 5 , c ^ 9 , c ^ 13 ) z [ 2 ] ^ = ( c ^ 2 , c ^ 6 , c ^ 10 , c ^ 14 ) z [ 3 ] ^ = ( c ^ 3 , c ^ 7 , c ^ 11 , c ^ 15 ) \begin{aligned}
\widehat{\mathbf{z}^{[0]}} &= (\hat{c}_0, \hat{c}_4, \hat{c}_8, \hat{c}_{12}) \\
\widehat{\mathbf{z}^{[1]}} &= (\hat{c}_1, \hat{c}_5, \hat{c}_9, \hat{c}_{13}) \\
\widehat{\mathbf{z}^{[2]}} &= (\hat{c}_2, \hat{c}_6, \hat{c}_{10}, \hat{c}_{14}) \\
\widehat{\mathbf{z}^{[3]}} &= (\hat{c}_3, \hat{c}_7, \hat{c}_{11}, \hat{c}_{15})
\end{aligned} z [ 0 ] z [ 1 ] z [ 2 ] z [ 3 ] = ( c ^ 0 , c ^ 4 , c ^ 8 , c ^ 12 ) = ( c ^ 1 , c ^ 5 , c ^ 9 , c ^ 13 ) = ( c ^ 2 , c ^ 6 , c ^ 10 , c ^ 14 ) = ( c ^ 3 , c ^ 7 , c ^ 11 , c ^ 15 ) Each transform is again a standard FFT:
z [ j 0 ] ^ j 1 = ∑ i 0 = 0 3 z [ j 0 ] i 0 ⋅ ( ω 4 ) i 0 j 1 \widehat{z^{[j_0]}}_{j_1} = \sum_{i_0 = 0}^3 {z^{[j_0]}}_{i_0} \cdot (\omega^4)^{i_0 j_1} z [ j 0 ] j 1 = i 0 = 0 ∑ 3 z [ j 0 ] i 0 ⋅ ( ω 4 ) i 0 j 1 Verification Examples
z [ 1 ] ^ 2 = c ^ 9 \widehat{z^{[1]}}_2 = \hat{c}_9 z [ 1 ] 2 = c ^ 9 ( j 1 = 2 , j 0 = 1 ) (j_1 = 2, j_0 = 1) ( j 1 = 2 , j 0 = 1 )
z [ 1 ] ^ 2 = z [ 1 ] 0 + z [ 1 ] 1 ω 8 + z [ 1 ] 2 ω 16 + z [ 1 ] 3 ω 24 = c ( 0 ) ^ 1 + c ( 1 ) ^ 1 ω 9 + c ( 2 ) ^ 1 ω 18 + c ( 3 ) ^ 1 ω 27 = ∑ i 1 = 0 3 c 4 i 1 ( ω 4 ) 9 i 1 + ∑ i 1 = 0 3 c 4 i 1 + 1 ( ω 4 ) 9 i 1 ω 9 + ∑ i 1 = 0 3 c 4 i 1 + 2 ( ω 4 ) 9 i 1 ω 18 + ∑ i 1 = 0 3 c 4 i 1 + 3 ( ω 4 ) 9 i 1 ω 27 = c ^ 9
\begin{align*}
\widehat{z^{[1]}}_2 &= {z^{[1]}}_0 + {z^{[1]}}_1 \omega^{8} + {z^{[1]}}_2 \omega^{16} + {z^{[1]}}_3 \omega^{24} \\
&= \widehat{c^{(0)}}_1 + \widehat{c^{(1)}}_1 \omega^{9} + \widehat{c^{(2)}}_1 \omega^{18} + \widehat{c^{(3)}}_1 \omega^{27} \\
&= \sum_{i_1=0}^3 c_{4i_1} (\omega^4)^{9i_1} + \sum_{i_1=0}^3 c_{4i_1 + 1} (\omega^4)^{9i_1}\omega^9 + \sum_{i_1=0}^3 c_{4i_1 + 2} (\omega^4)^{9i_1}\omega^{18} + \sum_{i_1=0}^3 c_{4i_1 + 3} (\omega^4)^{9i_1}\omega^{27} \\
&= \hat{c}_9
\end{align*} z [ 1 ] 2 = z [ 1 ] 0 + z [ 1 ] 1 ω 8 + z [ 1 ] 2 ω 16 + z [ 1 ] 3 ω 24 = c ( 0 ) 1 + c ( 1 ) 1 ω 9 + c ( 2 ) 1 ω 18 + c ( 3 ) 1 ω 27 = i 1 = 0 ∑ 3 c 4 i 1 ( ω 4 ) 9 i 1 + i 1 = 0 ∑ 3 c 4 i 1 + 1 ( ω 4 ) 9 i 1 ω 9 + i 1 = 0 ∑ 3 c 4 i 1 + 2 ( ω 4 ) 9 i 1 ω 18 + i 1 = 0 ∑ 3 c 4 i 1 + 3 ( ω 4 ) 9 i 1 ω 27 = c ^ 9 z [ 2 ] ^ 1 = c ^ 6 \widehat{z^{[2]}}_1 = \hat{c}_6 z [ 2 ] 1 = c ^ 6 ( j 1 = 1 , j 0 = 2 ) (j_1 = 1, j_0 = 2) ( j 1 = 1 , j 0 = 2 )
z [ 2 ] ^ 1 = z [ 2 ] 0 + z [ 2 ] 1 ω 4 + z [ 2 ] 2 ω 8 + z [ 2 ] 3 ω 12 = c ( 0 ) ^ 2 + c ( 1 ) ^ 2 ω 6 + c ( 2 ) ^ 2 ω 12 + c ( 3 ) ^ 2 ω 18 = ∑ i 1 = 0 3 c 4 i 1 ( ω 4 ) 6 i 1 + ∑ i 1 = 0 3 c 4 i 1 + 1 ( ω 4 ) 6 i 1 ω 6 + ∑ i 1 = 0 3 c 4 i 1 + 2 ( ω 4 ) 6 i 1 ω 12 + ∑ i 1 = 0 3 c 4 i 1 + 3 ( ω 4 ) 6 i 1 ω 18 = c ^ 6 \begin{align*}
\widehat{z^{[2]}}_1 &= {z^{[2]}}_0 + {z^{[2]}}_1 \omega^{4} + {z^{[2]}}_2 \omega^{8} + {z^{[2]}}_3 \omega^{12} \\
&= \widehat{c^{(0)}}_2 + \widehat{c^{(1)}}_2 \omega^{6} + \widehat{c^{(2)}}_2 \omega^{12} + \widehat{c^{(3)}}_2 \omega^{18} \\
&= \sum_{i_1=0}^3 c_{4i_1} (\omega^4)^{6i_1} + \sum_{i_1=0}^3 c_{4i_1 + 1} (\omega^4)^{6i_1} \omega^6 + \sum_{i_1=0}^3 c_{4i_1 + 2} (\omega^4)^{6i_1} \omega^{12} + \sum_{i_1=0}^3 c_{4i_1 + 3} (\omega^4)^{6i_1} \omega^{18} \\
&= \hat{c}_6
\end{align*} z [ 2 ] 1 = z [ 2 ] 0 + z [ 2 ] 1 ω 4 + z [ 2 ] 2 ω 8 + z [ 2 ] 3 ω 12 = c ( 0 ) 2 + c ( 1 ) 2 ω 6 + c ( 2 ) 2 ω 12 + c ( 3 ) 2 ω 18 = i 1 = 0 ∑ 3 c 4 i 1 ( ω 4 ) 6 i 1 + i 1 = 0 ∑ 3 c 4 i 1 + 1 ( ω 4 ) 6 i 1 ω 6 + i 1 = 0 ∑ 3 c 4 i 1 + 2 ( ω 4 ) 6 i 1 ω 12 + i 1 = 0 ∑ 3 c 4 i 1 + 3 ( ω 4 ) 6 i 1 ω 18 = c ^ 6 The inverse FFT is structurally similar, but we do not cover it in this article.
Distributed FFT
The Parallel FFT algorithm described above can be implemented as a Distributed FFT by mapping it onto the MapReduce model. The diagram above illustrates how the same example from the parallel FFT is executed within the MapReduce paradigm.
Map phase: Each executor performs the first-stage FFTs over c ( i 0 ) \mathbf{c}^{(i_0)} c ( i 0 )
Shuffle (transpose + twist): Results are reorganized into z [ j 0 ] \mathbf{z}^{[j_0]} z [ j 0 ]
Reduce phase: Second-stage FFTs are computed over each z [ j 0 ] \mathbf{z}^{[j_0]} z [ j 0 ]
This structure enables FFT computations over extremely large input sizes to be parallelized and scaled effectively in distributed systems like Spark or Hadoop .
Reference
Written by ryan Kim from A41