This documentation is automatically generated by online-judge-tools/verification-helper
#include "fft.hpp"
I have learned that classical/original FFT was first used by Gauss in 1805 from Bernstein’s paper. The classical FFT is very simple and elegant. Consider the following two functions:
\[\begin{aligned} \phi&:\mathbb{C}\left\lbrack x\right\rbrack /(x^{2n}-c^2)\to \mathbb{C}\left\lbrack x\right\rbrack /(x^{n}-c) \\ \varphi&:\mathbb{C}\left\lbrack x\right\rbrack /(x^{2n}-c^2)\to \mathbb{C}\left\lbrack x\right\rbrack /(x^{n}+c) \end{aligned}\]Let $f(x):=\sum _ {j=0}^{2n-1}f _ jx^j$, then
\[\begin{aligned} \phi&:f(x)\mapsto f(x)\bmod{\left(x^n-c\right)} \\ \varphi&:f(x)\mapsto f(x)\bmod{\left(x^n+c\right)} \end{aligned}\]If we set $f _ 0(x):=\sum _ {j=0}^{n-1}f_jx^j$ and $f _ 1(x):=\sum _ {j=0}^{n-1}f _ {j+n}x^j$, we will find out that we are actually computing
\[\begin{bmatrix} F _ 0(x) \\ F _ 1(x) \end{bmatrix}= \begin{bmatrix} 1&c \\ 1&-c \end{bmatrix} \begin{bmatrix} f _ 0(x) \\ f _ 1(x) \end{bmatrix}\]and we simply know that this step is invertible if $\begin{bmatrix}1&c \\ 1&-c \end{bmatrix}$ is invertible.
We use the notation $\zeta _ n=\exp\left(\frac{2\pi\mathrm{i}}{2}\right)$. For the number theory transformation (a.k.a. NTT), we mostly use prime numbers called FFT primes $p$ and computing in field $\mathbb{F} _ p$, which has element of order $2^k$ with sufficient large $k$, so we could compute DFT for polynomials over $\mathbb{F} _ p\left\lbrack x\right\rbrack$ of length $2^k$.
Use the step iteratively, we could get $f(x)\bmod{\left(x-d\right)}=f(d)$ for several $d$s. The whole algorithm is called radix-2 FFT, which could compute the size-$n$ DFT where $n$ is a power of $2$.
\[\begin{aligned} \operatorname{\mathsf{DFT}} _ n &: A(x)\bmod{\left(x^n-1\right)} \mapsto \begin{bmatrix} A(1) & A\left(\zeta _ {n}\right) & \cdots & A\left(\zeta _ n^{n-1}\right)\end{bmatrix} \\ \operatorname{\mathsf{IDFT}} _ n &: \begin{bmatrix} A(1) & A\left(\zeta _ {n}\right) & \cdots & A\left(\zeta _ n^{n-1}\right)\end{bmatrix} \mapsto A(x)\bmod{\left(x^n-1\right)} \end{aligned}\]We don’t need to explain how to compute IDFT which is the inverse function of DFT, just note that $\begin{bmatrix}1&c \\ 1&-c \end{bmatrix}^{-1}=\dfrac{1}{2}\begin{bmatrix}1&c^{-1} \\ 1&-c^{-1} \end{bmatrix}^{\intercal}$. But there is a little difference between Gauss’s classical FFT algorithm and the definition of DFT, we will finally have the result in a bit-reversed permutation of $\begin{bmatrix} A(1) & A\left(\zeta _ {n}\right) & \cdots & A\left(\zeta _ n^{n-1}\right)\end{bmatrix}$.
By the way, if we don’t take the inversion of entries of $\frac{1}{2}\begin{bmatrix}1&c^{-1} \\ 1&-c^{-1} \end{bmatrix}^{\intercal}$, we will get the transposed algorithm of IDFT which is
\[\operatorname{\mathsf{IDFT}} _ n^{\intercal}\left(A(x)\right)=\frac{1}{n}\operatorname{\mathsf{DFT}} _ n\left(x^nA\left(x^{-1}\right)\right)\]and the transposed algorithm of DFT is just the inverse function of the transposed algorithm of IDFT.
Before introducing the FFT algorithm formally, I would like to introduce the root array first, which is important and useful in FFT.
Firstly, we know that we should have $\zeta _ n^k$ computed for $k=0,\dots,n-1$, and they are used in bit-reversed order. But it is not a good idea that we just compute all of them, there is a better way to compute and store them. We define the function $R:\mathbb{N}\to\mathbb{C}$ by
\[R(k)=\begin{cases} 1,&\text{if }k=0, \\ \zeta _ {4k},&\text{if }k\text{ is a power of }2, \\ R\left(k-2^{\lfloor\log _ 2 k\rfloor}\right)R\left(2^{\lfloor\log _ 2 k\rfloor}\right),&\text{otherwise.} \end{cases}\]and first $6$ of them are
\[\begin{aligned} R(0)&=1, \\ R(1)&=\zeta _ 4=\mathrm{i}, \\ R(2)&=\zeta _ 8, \\ R(3)&=\zeta _ 8\zeta _ 4=\zeta _ 8^3, \\ R(4)&=\zeta _ {16}, \\ R(5)&=\zeta _ {16}\zeta _ 4=\zeta _ {16}^5 \end{aligned}\]which could generate $\zeta _ n^k$ for $k=0,\dots,n-1$, and only need half of the storage.
It’s not hard for anyone I think, so I just give the pseudocode.
\[\begin{array}{ll} &\textbf{Algorithm }\operatorname{\mathsf{FFT}}\text{:} \\ &\textbf{Input}\text{: }A(x)=\sum _ {j=0}^{n-1}a _ jx^j,n\in 2^{\mathbb{N}}\text{.} \\ &\textbf{Output}\text{: }\begin{bmatrix} A(1) & A(-1) & \cdots & A\left(\zeta _ {n}^{n-1}\right)\end{bmatrix}\text{.} \\ 1&i\gets n \\ 2&\textbf{while }i\geq 2\textbf{ do} \\ 3&\qquad j\gets 0 \\ 4&\qquad \textbf{while }j\lt n\textbf{ do} \\ 5&\qquad \qquad \textbf{for }k=0,\dots,i/2-1\textbf{ do} \\ 6&\qquad \qquad \qquad \begin{bmatrix}a _ {j+k} \\ a _ {j+k+i/2}\end{bmatrix}\gets \begin{bmatrix}1&R(j/i) \\ 1&-R(j/i)\end{bmatrix}\begin{bmatrix}a _ {j+k} \\ a _ {j+k+i/2}\end{bmatrix} \\ 7&\qquad \qquad \textbf{end for} \\ 8&\qquad \qquad j\gets j+i \\ 9&\qquad \textbf{end while} \\ 10&\qquad i\gets i/2 \\ 11&\textbf{end while} \\ 12&\textbf{return }\begin{bmatrix} a _ 0 & a _ 1 & \cdots & a _ {n-1} \end{bmatrix} \end{array}\]Note that $\begin{bmatrix} A(1) & A(-1) & \cdots & A\left(\zeta _ {n}^{n-1}\right)\end{bmatrix} =\begin{bmatrix} A\left(R(0)\right) & A\left(-R(0)\right) & \cdots & A\left(-R(\left\lfloor n/2\right\rfloor)\right)\end{bmatrix}$, that’s why the root array is useful and can be reused even for different $n$.
$\begin{bmatrix} A(1) & A\left(\zeta _ {n}\right) & \cdots & A\left(\zeta _ n^{n-1}\right)\end{bmatrix}$ is bit-reversed permutation of $\begin{bmatrix} A(1) & A(-1) & \cdots & A\left(\zeta _ {n}^{n-1}\right)\end{bmatrix}$.
I suggest to use the following C++ code to make the bit-reversed permutation.
#include <cassert>
#include <utility>
template<typename Iterator> void revbin(Iterator a, int n) {
assert((n & (n - 1)) == 0);
if (n == 0) return;
for (int i = 0, j = 0;;) {
if (i < j) std::swap(a[i], a[j]);
if (++i == n) break;
for (int k = n >> 1; ((j ^= k) & k) == 0; k >>= 1) {}
}
}
Note that the input is in bit-reversed order, so we are able to avoid explicit bit-reverse step by using the above pseudocode.
These tricks could be used in various algorithms, and they are very simple. If one use the Gauss’s original FFT algorithm, $A(-c)$ is next to $A(c)$.
\[\begin{array}{ll} &\textbf{Algorithm }\operatorname{\mathsf{FFTEven}}\text{:} \\ &\textbf{Input}\text{: }\begin{bmatrix} A(1) & A(-1) & \cdots & A\left(\zeta _ {2n}^{2n-1}\right) \end{bmatrix},n\text{ is a power of }2,\deg A\lt 2n\text{.} \\ &\textbf{Output}\text{: }\begin{bmatrix} B(1) & B(-1) & \cdots & B\left(\zeta _ {n}^{n-1}\right)\end{bmatrix}\text{ where }B(x^2)=\frac{1}{2}\left(A(x)+A(-x)\right)\text{.} \\ 1&\textbf{return }\frac{1}{2}\begin{bmatrix} A(1)+A(-1) & A(\mathrm{i})+A(-\mathrm{i}) & \cdots & A\left(\zeta _ {2n}^{n-1}\right)+A\left(-\zeta _ {2n}^{n-1}\right)\end{bmatrix} \end{array}\] \[\begin{array}{ll} &\textbf{Algorithm }\operatorname{\mathsf{FFTOdd}}\text{:} \\ &\textbf{Input}\text{: }\begin{bmatrix} A(1) & A(-1) & \cdots & A\left(\zeta _ {2n}^{2n-1}\right) \end{bmatrix},n\text{ is a power of }2,\deg A\lt 2n\text{.} \\ &\textbf{Output}\text{: }\begin{bmatrix} B(1) & B(-1) & \cdots & B\left(\zeta _ {n}^{n-1}\right)\end{bmatrix}\text{ where }B(x^2)=\frac{1}{2x}\left(A(x)-A(-x)\right)\text{.} \\ 1&\textbf{return }\frac{1}{2}\begin{bmatrix} \left(A(1)-A(-1)\right)/1 & \left(A(\mathrm{i})-A(-\mathrm{i})\right)/\mathrm{i} & \cdots & \left(A\left(\zeta _ {2n}^{n-1}\right)-A\left(-\zeta _ {2n}^{n-1}\right)\right)/\zeta _ {2n}^{n-1}\end{bmatrix} \end{array}\]FFT doubling is described in another document.
Consider the simple case first: Given $\operatorname{\mathsf{FFT}} _ n\left(A(x)\right)$ and define $A(x):=\sum _ {j=0}^{n-1}a _ j x^j$, but we don’t know the coefficients of $A(x)$ and we want to know $A(0)$, this could be done in $O(n)$. Consider $A(x)+A(-x)=2\sum _ {j=0}^{n/2-1}a _ {2j}x^{2j}=B\left(x^2\right)$ and $B(1)+B(-1)=A(1)+A(-1)+A(\mathrm{i})+A(-\mathrm{i})$, finally we will have $\sum _ {j=0}^{n-1}A\left(\zeta _ {n}^j\right)=nA(0)$. Since we can compute $\operatorname{\mathsf{FFT}} _ n\left(x^kA(x)\bmod{\left(x^n - 1\right)}\right)$ in $O(n)$, we are able to extract or modify single coefficient of FFT.
#include "fft.hpp" // this file
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>
template<typename Iterator> void revbin(Iterator a, int n) {
assert((n & (n - 1)) == 0);
if (n == 0) return;
for (int i = 0, j = 0;;) {
if (i < j) std::swap(a[i], a[j]);
if (++i == n) break;
for (int k = n >> 1; ((j ^= k) & k) == 0; k >>= 1) {}
}
}
template<typename Iterator>
typename std::iterator_traits<Iterator>::value_type extract_coeff_from_fft(Iterator a, int n,
int k) {
using Tp = typename std::iterator_traits<Iterator>::value_type;
assert(k < n);
if (n == 0) return 0;
assert((n & (n - 1)) == 0);
std::vector<Tp> revbin_pow_table(n);
{
Tp v = 1;
const auto t = FftInfo<Tp>::get().inv_root(n / 2).at(n / 4);
std::vector<Tp> pow_table(n);
for (int i = 0; i < n; ++i) pow_table[i] = v, v *= t;
for (int i = 0; i < n; ++i) revbin_pow_table[i] = pow_table[((long long)k * i) & (n - 1)];
revbin(revbin_pow_table.begin(), n);
}
Tp res;
for (int i = 0; i < n; ++i) res += a[i] * revbin_pow_table[i];
return res / n;
}
We are given $\operatorname{\mathsf{FFT}} _ n\left(A(x)\right)$, and we want to compute $\operatorname{\mathsf{FFT}} _ n\left(x^nA\left(x^{-1}\right)\right)$. It could be done in $O(n)$.
#include <algorithm>
#include <cassert>
#include <utility>
template<typename Iterator> void revbin(Iterator a, int n) {
assert((n & (n - 1)) == 0);
if (n == 0) return;
for (int i = 0, j = 0;;) {
if (i < j) std::swap(a[i], a[j]);
if (++i == n) break;
for (int k = n >> 1; ((j ^= k) & k) == 0; k >>= 1) {}
}
}
template<typename Iterator> void transform(Iterator a, int n) {
assert((n & (n - 1)) == 0);
if (n == 0) return;
revbin(a, n);
std::reverse(a + 1, a + n);
revbin(a, n);
}
The transformation can be viewed as
\[\begin{aligned} \mathbb{C}\left\lbrack x\right\rbrack/\left(x^{4m} - b^4\right) &\to \mathbb{C}\left\lbrack x\right\rbrack/\left(x^{m} - b\right) \\ &\times \mathbb{C}\left\lbrack x\right\rbrack/\left(x^{m} + b\right) \\ &\times \mathbb{C}\left\lbrack x\right\rbrack/\left(x^{m} - \mathrm{i} b\right) \\ &\times \mathbb{C}\left\lbrack x\right\rbrack/\left(x^{m} + \mathrm{i} b\right) \\ \end{aligned}\]showed by Bernstein. If one use the matrix to represent one step, it’s very easy to implement. We omit the details which are verbose but simple.
#pragma once
#include <algorithm>
#include <cassert>
#include <iterator>
#include <memory>
#include <vector>
template<typename Tp> class FftInfo {
static Tp least_quadratic_nonresidue() {
for (int i = 2;; ++i)
if (Tp(i).pow((Tp::mod() - 1) / 2) == -1) return Tp(i);
}
const int ordlog2_;
const Tp zeta_;
const Tp invzeta_;
const Tp imag_;
const Tp invimag_;
mutable std::vector<Tp> root_;
mutable std::vector<Tp> invroot_;
FftInfo()
: ordlog2_(__builtin_ctzll(Tp::mod() - 1)),
zeta_(least_quadratic_nonresidue().pow((Tp::mod() - 1) >> ordlog2_)),
invzeta_(zeta_.inv()), imag_(zeta_.pow(1LL << (ordlog2_ - 2))), invimag_(-imag_),
root_{Tp(1), imag_}, invroot_{Tp(1), invimag_} {}
public:
static const FftInfo &get() {
static FftInfo info;
return info;
}
Tp imag() const { return imag_; }
Tp inv_imag() const { return invimag_; }
Tp zeta() const { return zeta_; }
Tp inv_zeta() const { return invzeta_; }
const std::vector<Tp> &root(int n) const {
// [0, n)
assert((n & (n - 1)) == 0);
if (const int s = root_.size(); s < n) {
root_.resize(n);
for (int i = __builtin_ctz(s); (1 << i) < n; ++i) {
const int j = 1 << i;
root_[j] = zeta_.pow(1LL << (ordlog2_ - i - 2));
for (int k = j + 1; k < j * 2; ++k) root_[k] = root_[k - j] * root_[j];
}
}
return root_;
}
const std::vector<Tp> &inv_root(int n) const {
// [0, n)
assert((n & (n - 1)) == 0);
if (const int s = invroot_.size(); s < n) {
invroot_.resize(n);
for (int i = __builtin_ctz(s); (1 << i) < n; ++i) {
const int j = 1 << i;
invroot_[j] = invzeta_.pow(1LL << (ordlog2_ - i - 2));
for (int k = j + 1; k < j * 2; ++k) invroot_[k] = invroot_[k - j] * invroot_[j];
}
}
return invroot_;
}
};
inline int fft_len(int n) {
--n;
n |= n >> 1, n |= n >> 2, n |= n >> 4, n |= n >> 8;
return (n | n >> 16) + 1;
}
namespace detail {
template<typename Iterator> inline void
butterfly_n(Iterator a, int n,
const std::vector<typename std::iterator_traits<Iterator>::value_type> &root) {
assert(n > 0);
assert((n & (n - 1)) == 0);
const int bn = __builtin_ctz(n);
if (bn & 1) {
for (int i = 0; i < n / 2; ++i) {
const auto a0 = a[i], a1 = a[i + n / 2];
a[i] = a0 + a1, a[i + n / 2] = a0 - a1;
}
}
for (int i = n >> (bn & 1); i >= 4; i /= 4) {
const int i4 = i / 4;
for (int k = 0; k < i4; ++k) {
const auto a0 = a[k + i4 * 0], a1 = a[k + i4 * 1];
const auto a2 = a[k + i4 * 2], a3 = a[k + i4 * 3];
const auto a02p = a0 + a2, a02m = a0 - a2;
const auto a13p = a1 + a3, a13m = (a1 - a3) * root[1];
a[k + i4 * 0] = a02p + a13p, a[k + i4 * 1] = a02p - a13p;
a[k + i4 * 2] = a02m + a13m, a[k + i4 * 3] = a02m - a13m;
}
for (int j = i, m = 2; j < n; j += i, m += 2) {
const auto r = root[m], r2 = r * r, r3 = r2 * r;
for (int k = j; k < j + i4; ++k) {
const auto a0 = a[k + i4 * 0], a1 = a[k + i4 * 1] * r;
const auto a2 = a[k + i4 * 2] * r2, a3 = a[k + i4 * 3] * r3;
const auto a02p = a0 + a2, a02m = a0 - a2;
const auto a13p = a1 + a3, a13m = (a1 - a3) * root[1];
a[k + i4 * 0] = a02p + a13p, a[k + i4 * 1] = a02p - a13p;
a[k + i4 * 2] = a02m + a13m, a[k + i4 * 3] = a02m - a13m;
}
}
}
}
template<typename Iterator> inline void
inv_butterfly_n(Iterator a, int n,
const std::vector<typename std::iterator_traits<Iterator>::value_type> &root) {
assert(n > 0);
assert((n & (n - 1)) == 0);
const int bn = __builtin_ctz(n);
for (int i = 4; i <= (n >> (bn & 1)); i *= 4) {
const int i4 = i / 4;
for (int k = 0; k < i4; ++k) {
const auto a0 = a[k + i4 * 0], a1 = a[k + i4 * 1];
const auto a2 = a[k + i4 * 2], a3 = a[k + i4 * 3];
const auto a01p = a0 + a1, a01m = a0 - a1;
const auto a23p = a2 + a3, a23m = (a2 - a3) * root[1];
a[k + i4 * 0] = a01p + a23p, a[k + i4 * 1] = a01m + a23m;
a[k + i4 * 2] = a01p - a23p, a[k + i4 * 3] = a01m - a23m;
}
for (int j = i, m = 2; j < n; j += i, m += 2) {
const auto r = root[m], r2 = r * r, r3 = r2 * r;
for (int k = j; k < j + i4; ++k) {
const auto a0 = a[k + i4 * 0], a1 = a[k + i4 * 1];
const auto a2 = a[k + i4 * 2], a3 = a[k + i4 * 3];
const auto a01p = a0 + a1, a01m = a0 - a1;
const auto a23p = a2 + a3, a23m = (a2 - a3) * root[1];
a[k + i4 * 0] = a01p + a23p, a[k + i4 * 1] = (a01m + a23m) * r;
a[k + i4 * 2] = (a01p - a23p) * r2, a[k + i4 * 3] = (a01m - a23m) * r3;
}
}
}
if (bn & 1) {
for (int i = 0; i < n / 2; ++i) {
const auto a0 = a[i], a1 = a[i + n / 2];
a[i] = a0 + a1, a[i + n / 2] = a0 - a1;
}
}
}
} // namespace detail
// FFT_n: A(x) |-> bit-reversed order of [A(1), A(zeta_n), ..., A(zeta_n^(n-1))]
template<typename Iterator> inline void fft_n(Iterator a, int n) {
using Tp = typename std::iterator_traits<Iterator>::value_type;
detail::butterfly_n(a, n, FftInfo<Tp>::get().root(n / 2));
}
template<typename Tp> inline void fft(std::vector<Tp> &a) { fft_n(a.begin(), a.size()); }
// IFFT_n: bit-reversed order of [A(1), A(zeta_n), ..., A(zeta_n^(n-1))] |-> A(x)
template<typename Iterator> inline void inv_fft_n(Iterator a, int n) {
using Tp = typename std::iterator_traits<Iterator>::value_type;
detail::inv_butterfly_n(a, n, FftInfo<Tp>::get().inv_root(n / 2));
const Tp iv = Tp::mod() - (Tp::mod() - 1) / n;
for (int i = 0; i < n; ++i) a[i] *= iv;
}
template<typename Tp> inline void inv_fft(std::vector<Tp> &a) { inv_fft_n(a.begin(), a.size()); }
// IFFT_n^T: A(x) |-> 1/n FFT_n((x^n A(x^(-1))) mod (x^n - 1))
template<typename Iterator> inline void transposed_inv_fft_n(Iterator a, int n) {
using Tp = typename std::iterator_traits<Iterator>::value_type;
const Tp iv = Tp::mod() - (Tp::mod() - 1) / n;
for (int i = 0; i < n; ++i) a[i] *= iv;
detail::butterfly_n(a, n, FftInfo<Tp>::get().inv_root(n / 2));
}
template<typename Tp> inline void transposed_inv_fft(std::vector<Tp> &a) {
transposed_inv_fft_n(a.begin(), a.size());
}
// FFT_n^T : FFT_n((x^n A(x^(-1))) mod (x^n - 1)) |-> n A(x)
template<typename Iterator> inline void transposed_fft_n(Iterator a, int n) {
using Tp = typename std::iterator_traits<Iterator>::value_type;
detail::inv_butterfly_n(a, n, FftInfo<Tp>::get().root(n / 2));
}
template<typename Tp> inline void transposed_fft(std::vector<Tp> &a) {
transposed_fft_n(a.begin(), a.size());
}
template<typename Tp> inline std::vector<Tp> convolution_fft(std::vector<Tp> a, std::vector<Tp> b) {
if (a.empty() || b.empty()) return {};
const int n = a.size();
const int m = b.size();
const int len = fft_len(n + m - 1);
a.resize(len);
b.resize(len);
fft(a);
fft(b);
for (int i = 0; i < len; ++i) a[i] *= b[i];
inv_fft(a);
a.resize(n + m - 1);
return a;
}
template<typename Tp> inline std::vector<Tp> square_fft(std::vector<Tp> a) {
if (a.empty()) return {};
const int n = a.size();
const int len = fft_len(n * 2 - 1);
a.resize(len);
fft(a);
for (int i = 0; i < len; ++i) a[i] *= a[i];
inv_fft(a);
a.resize(n * 2 - 1);
return a;
}
template<typename Tp>
inline std::vector<Tp> convolution_naive(const std::vector<Tp> &a, const std::vector<Tp> &b) {
if (a.empty() || b.empty()) return {};
const int n = a.size();
const int m = b.size();
std::vector<Tp> res(n + m - 1);
for (int i = 0; i < n; ++i)
for (int j = 0; j < m; ++j) res[i + j] += a[i] * b[j];
return res;
}
template<typename Tp>
inline std::vector<Tp> convolution(const std::vector<Tp> &a, const std::vector<Tp> &b) {
if (std::min(a.size(), b.size()) < 60) return convolution_naive(a, b);
if (std::addressof(a) == std::addressof(b)) return square_fft(a);
return convolution_fft(a, b);
}
#line 2 "fft.hpp"
#include <algorithm>
#include <cassert>
#include <iterator>
#include <memory>
#include <vector>
template<typename Tp> class FftInfo {
static Tp least_quadratic_nonresidue() {
for (int i = 2;; ++i)
if (Tp(i).pow((Tp::mod() - 1) / 2) == -1) return Tp(i);
}
const int ordlog2_;
const Tp zeta_;
const Tp invzeta_;
const Tp imag_;
const Tp invimag_;
mutable std::vector<Tp> root_;
mutable std::vector<Tp> invroot_;
FftInfo()
: ordlog2_(__builtin_ctzll(Tp::mod() - 1)),
zeta_(least_quadratic_nonresidue().pow((Tp::mod() - 1) >> ordlog2_)),
invzeta_(zeta_.inv()), imag_(zeta_.pow(1LL << (ordlog2_ - 2))), invimag_(-imag_),
root_{Tp(1), imag_}, invroot_{Tp(1), invimag_} {}
public:
static const FftInfo &get() {
static FftInfo info;
return info;
}
Tp imag() const { return imag_; }
Tp inv_imag() const { return invimag_; }
Tp zeta() const { return zeta_; }
Tp inv_zeta() const { return invzeta_; }
const std::vector<Tp> &root(int n) const {
// [0, n)
assert((n & (n - 1)) == 0);
if (const int s = root_.size(); s < n) {
root_.resize(n);
for (int i = __builtin_ctz(s); (1 << i) < n; ++i) {
const int j = 1 << i;
root_[j] = zeta_.pow(1LL << (ordlog2_ - i - 2));
for (int k = j + 1; k < j * 2; ++k) root_[k] = root_[k - j] * root_[j];
}
}
return root_;
}
const std::vector<Tp> &inv_root(int n) const {
// [0, n)
assert((n & (n - 1)) == 0);
if (const int s = invroot_.size(); s < n) {
invroot_.resize(n);
for (int i = __builtin_ctz(s); (1 << i) < n; ++i) {
const int j = 1 << i;
invroot_[j] = invzeta_.pow(1LL << (ordlog2_ - i - 2));
for (int k = j + 1; k < j * 2; ++k) invroot_[k] = invroot_[k - j] * invroot_[j];
}
}
return invroot_;
}
};
inline int fft_len(int n) {
--n;
n |= n >> 1, n |= n >> 2, n |= n >> 4, n |= n >> 8;
return (n | n >> 16) + 1;
}
namespace detail {
template<typename Iterator> inline void
butterfly_n(Iterator a, int n,
const std::vector<typename std::iterator_traits<Iterator>::value_type> &root) {
assert(n > 0);
assert((n & (n - 1)) == 0);
const int bn = __builtin_ctz(n);
if (bn & 1) {
for (int i = 0; i < n / 2; ++i) {
const auto a0 = a[i], a1 = a[i + n / 2];
a[i] = a0 + a1, a[i + n / 2] = a0 - a1;
}
}
for (int i = n >> (bn & 1); i >= 4; i /= 4) {
const int i4 = i / 4;
for (int k = 0; k < i4; ++k) {
const auto a0 = a[k + i4 * 0], a1 = a[k + i4 * 1];
const auto a2 = a[k + i4 * 2], a3 = a[k + i4 * 3];
const auto a02p = a0 + a2, a02m = a0 - a2;
const auto a13p = a1 + a3, a13m = (a1 - a3) * root[1];
a[k + i4 * 0] = a02p + a13p, a[k + i4 * 1] = a02p - a13p;
a[k + i4 * 2] = a02m + a13m, a[k + i4 * 3] = a02m - a13m;
}
for (int j = i, m = 2; j < n; j += i, m += 2) {
const auto r = root[m], r2 = r * r, r3 = r2 * r;
for (int k = j; k < j + i4; ++k) {
const auto a0 = a[k + i4 * 0], a1 = a[k + i4 * 1] * r;
const auto a2 = a[k + i4 * 2] * r2, a3 = a[k + i4 * 3] * r3;
const auto a02p = a0 + a2, a02m = a0 - a2;
const auto a13p = a1 + a3, a13m = (a1 - a3) * root[1];
a[k + i4 * 0] = a02p + a13p, a[k + i4 * 1] = a02p - a13p;
a[k + i4 * 2] = a02m + a13m, a[k + i4 * 3] = a02m - a13m;
}
}
}
}
template<typename Iterator> inline void
inv_butterfly_n(Iterator a, int n,
const std::vector<typename std::iterator_traits<Iterator>::value_type> &root) {
assert(n > 0);
assert((n & (n - 1)) == 0);
const int bn = __builtin_ctz(n);
for (int i = 4; i <= (n >> (bn & 1)); i *= 4) {
const int i4 = i / 4;
for (int k = 0; k < i4; ++k) {
const auto a0 = a[k + i4 * 0], a1 = a[k + i4 * 1];
const auto a2 = a[k + i4 * 2], a3 = a[k + i4 * 3];
const auto a01p = a0 + a1, a01m = a0 - a1;
const auto a23p = a2 + a3, a23m = (a2 - a3) * root[1];
a[k + i4 * 0] = a01p + a23p, a[k + i4 * 1] = a01m + a23m;
a[k + i4 * 2] = a01p - a23p, a[k + i4 * 3] = a01m - a23m;
}
for (int j = i, m = 2; j < n; j += i, m += 2) {
const auto r = root[m], r2 = r * r, r3 = r2 * r;
for (int k = j; k < j + i4; ++k) {
const auto a0 = a[k + i4 * 0], a1 = a[k + i4 * 1];
const auto a2 = a[k + i4 * 2], a3 = a[k + i4 * 3];
const auto a01p = a0 + a1, a01m = a0 - a1;
const auto a23p = a2 + a3, a23m = (a2 - a3) * root[1];
a[k + i4 * 0] = a01p + a23p, a[k + i4 * 1] = (a01m + a23m) * r;
a[k + i4 * 2] = (a01p - a23p) * r2, a[k + i4 * 3] = (a01m - a23m) * r3;
}
}
}
if (bn & 1) {
for (int i = 0; i < n / 2; ++i) {
const auto a0 = a[i], a1 = a[i + n / 2];
a[i] = a0 + a1, a[i + n / 2] = a0 - a1;
}
}
}
} // namespace detail
// FFT_n: A(x) |-> bit-reversed order of [A(1), A(zeta_n), ..., A(zeta_n^(n-1))]
template<typename Iterator> inline void fft_n(Iterator a, int n) {
using Tp = typename std::iterator_traits<Iterator>::value_type;
detail::butterfly_n(a, n, FftInfo<Tp>::get().root(n / 2));
}
template<typename Tp> inline void fft(std::vector<Tp> &a) { fft_n(a.begin(), a.size()); }
// IFFT_n: bit-reversed order of [A(1), A(zeta_n), ..., A(zeta_n^(n-1))] |-> A(x)
template<typename Iterator> inline void inv_fft_n(Iterator a, int n) {
using Tp = typename std::iterator_traits<Iterator>::value_type;
detail::inv_butterfly_n(a, n, FftInfo<Tp>::get().inv_root(n / 2));
const Tp iv = Tp::mod() - (Tp::mod() - 1) / n;
for (int i = 0; i < n; ++i) a[i] *= iv;
}
template<typename Tp> inline void inv_fft(std::vector<Tp> &a) { inv_fft_n(a.begin(), a.size()); }
// IFFT_n^T: A(x) |-> 1/n FFT_n((x^n A(x^(-1))) mod (x^n - 1))
template<typename Iterator> inline void transposed_inv_fft_n(Iterator a, int n) {
using Tp = typename std::iterator_traits<Iterator>::value_type;
const Tp iv = Tp::mod() - (Tp::mod() - 1) / n;
for (int i = 0; i < n; ++i) a[i] *= iv;
detail::butterfly_n(a, n, FftInfo<Tp>::get().inv_root(n / 2));
}
template<typename Tp> inline void transposed_inv_fft(std::vector<Tp> &a) {
transposed_inv_fft_n(a.begin(), a.size());
}
// FFT_n^T : FFT_n((x^n A(x^(-1))) mod (x^n - 1)) |-> n A(x)
template<typename Iterator> inline void transposed_fft_n(Iterator a, int n) {
using Tp = typename std::iterator_traits<Iterator>::value_type;
detail::inv_butterfly_n(a, n, FftInfo<Tp>::get().root(n / 2));
}
template<typename Tp> inline void transposed_fft(std::vector<Tp> &a) {
transposed_fft_n(a.begin(), a.size());
}
template<typename Tp> inline std::vector<Tp> convolution_fft(std::vector<Tp> a, std::vector<Tp> b) {
if (a.empty() || b.empty()) return {};
const int n = a.size();
const int m = b.size();
const int len = fft_len(n + m - 1);
a.resize(len);
b.resize(len);
fft(a);
fft(b);
for (int i = 0; i < len; ++i) a[i] *= b[i];
inv_fft(a);
a.resize(n + m - 1);
return a;
}
template<typename Tp> inline std::vector<Tp> square_fft(std::vector<Tp> a) {
if (a.empty()) return {};
const int n = a.size();
const int len = fft_len(n * 2 - 1);
a.resize(len);
fft(a);
for (int i = 0; i < len; ++i) a[i] *= a[i];
inv_fft(a);
a.resize(n * 2 - 1);
return a;
}
template<typename Tp>
inline std::vector<Tp> convolution_naive(const std::vector<Tp> &a, const std::vector<Tp> &b) {
if (a.empty() || b.empty()) return {};
const int n = a.size();
const int m = b.size();
std::vector<Tp> res(n + m - 1);
for (int i = 0; i < n; ++i)
for (int j = 0; j < m; ++j) res[i + j] += a[i] * b[j];
return res;
}
template<typename Tp>
inline std::vector<Tp> convolution(const std::vector<Tp> &a, const std::vector<Tp> &b) {
if (std::min(a.size(), b.size()) < 60) return convolution_naive(a, b);
if (std::addressof(a) == std::addressof(b)) return square_fft(a);
return convolution_fft(a, b);
}