C-Finite Sequence
(c_finite.hpp)

C-Finite Sequence

The sequence $\left(a _ n\right) _ {n\geq 0}$ is said to be C-Finite if it has form

\[a _ n=\sum _ {j=1}^d c _ j a _ {n-j}, \qquad (n \geq d)\]

here C-Finite means linear with constant coefficients. $d$ is said to be the order of the recurrent relation. We are given $c _ 1, \dots , c _ d$ and initial terms $a _ 0, \dots, a _ {d-1}$ in $\mathbb{C}$, the following algorithm could help us computing $a _ k$ for $k\gg d$.

We define $\Gamma(x):=x^d-\sum _ {j=0}^{d-1}c _ {d-j}x^j$ to be the characteristic polynomial of $\left(a _ n\right) _ {n\geq 0}$.

Companion matrix

The companion matrix of monic polynomial $\Gamma (x)$ is

\[C_\Gamma:= \begin{bmatrix} &&&c _ d \\ 1&&&c _ {d-1} \\ &\ddots &&\vdots \\ &&1&c _ 1 \end{bmatrix}\]

We define $b(x):=\sum _ {j=0}^{d-1}b _ jx^j$ and

\[B _ b:=\begin{bmatrix}b _ 0 & b _ 1 & \cdots & b _ {d-1}\end{bmatrix}^{\intercal}\]

One could simply check that

\[\underbrace{\begin{bmatrix} &&&c _ d \\ 1&&&c _ {d-1} \\ &\ddots &&\vdots \\ &&1&c _ 1 \end{bmatrix}} _ {C _ \Gamma} \underbrace{\begin{bmatrix} b _ 0 \\ b _ 1 \\ \vdots \\ b _ {d-1} \end{bmatrix}} _ {B _ b}= \underbrace{\begin{bmatrix} c _ d b _ {d-1} \\ b _ 0 + c _ {d-1} b _ {d-1} \\ \vdots \\ b _ {d-2} + c _ 1 b _ {d-1} \end{bmatrix}} _ {B _ {xb\bmod{\Gamma}}}\]

and

\[\begin{aligned} C _ \Gamma &= \begin{bmatrix}B _ {x\bmod{\Gamma}} & B _ {x^2\bmod{\Gamma}} & \cdots & B _ {x^d\bmod{\Gamma}}\end{bmatrix}, \\ \left(C _ \Gamma\right)^2 &= \begin{bmatrix}B _ {x^2\bmod{\Gamma}} & B _ {x^3\bmod{\Gamma}} & \cdots & B _ {x^{d+1}\bmod{\Gamma}}\end{bmatrix}, \\ \vdots \\ \left(C _ \Gamma\right)^k&=\begin{bmatrix}B _ {x^k\bmod{\Gamma}} & B _ {x^{k+1}\bmod{\Gamma}} & \cdots & B _ {x^{k+d}\bmod{\Gamma}}\end{bmatrix} \end{aligned}\]

Reduce the order

Since the order $d$ maybe large, we can use matrix to reduce the order by

\[\begin{bmatrix} a _ {k} \\ a _ {k+1} \\ \vdots \\ a _ {k+d-1} \end{bmatrix}=\underbrace{\begin{bmatrix} &1&& \\ &&\ddots & \\ &&&1 \\ c _ d&c _ {d-1}&\cdots &c _ 1 \end{bmatrix}^k} _ {\left(\left(C _ \Gamma\right)^{\intercal}\right)^k=\left(\left(C _ \Gamma\right)^{k}\right)^{\intercal}} \begin{bmatrix} a _ 0 \\ a _ {1} \\ \vdots \\ a _ {d-1} \end{bmatrix}\]

Using the exponentiation by squaring trick, we could simply have an algorithm with time $O\left(\mathsf{MM}(d)\log k\right)$.

Fiduccia’s algorithm

Fiduccia showed that

\[a _ k=\left\langle x^k\bmod{\Gamma(x)},\begin{bmatrix} a _ 0 & \cdots & a _ {d-1}\end{bmatrix}\right\rangle\]

so we could reduce the matrix exponentiation to modular exponentiation in $\mathbb{C}\left\lbrack x\right\rbrack /\left(\Gamma\right)$, which is apparent.

Bostan–Mori’s algorithm

Bostan and Mori have introduced a new algorithm to compute $x^k\bmod{Q(x)}$ faster. It is called the MSB-first algorithm in their paper, but I would like to show this in a slightly different way which is still equivalent to the original Bostan–Mori’s algorithm.

Graeffe iteration

Before introducing Bostan–Mori’s algorithm, let’s introduce the Graeffe iteration first, which could be used to compute truncated (multiplicative) inverse of a formal power series.

Given $Q(x)\in\mathbb{C}\left\lbrack\left\lbrack x\right\rbrack\right\rbrack$ and $Q(0)\neq 0$. We want to compute $Q(x)^{-1}\bmod{x^{2n}}$, Graeffe iteration is that

\[\begin{aligned} \frac{1}{Q(x)}\bmod{x^{2n}}&=\left(\frac{1}{Q(x)Q(-x)}\bmod{x^{2n}}\right)\cdot Q(-x)\bmod{x^{2n}} \\ &=\left.\left(\frac{1}{V(z)}\bmod{z^{n}}\right)\right| _ {z=x^2}\cdot Q(-x)\bmod{x^{2n}} \end{aligned}\]

where $Q(x)Q(-x)$ is an even function written $V\left(x^2\right)$.

If we are using FFT, we can do better.

\[\begin{array}{ll} &\textbf{Algorithm }\operatorname{\mathsf{FPSInv}}\text{:} \\ &\textbf{Input}\text{: }A(x)\bmod{x^n},n\gt 0,A(0)\neq 0\text{.} \\ &\textbf{Output}\text{: }A(x)^{-1}\bmod{x^n}\text{.} \\ 1&\textbf{if }n=1\textbf{ then return }A(0)^{-1}\\ 2&\ell \gets 2^{\left\lceil\log _ 2 \left(2n-1\right)\right\rceil} \\ 3&\begin{bmatrix} b _ 0 & \cdots & b _ {\ell - 1}\end{bmatrix} \gets \operatorname{\mathsf{FFT}} _ {\ell}\left(A(x)\right) \\ 4&V(x) \gets \operatorname{\mathsf{IFFT}} _ {\ell /2}\left(\begin{bmatrix} b _ 0 b _ 1 & b _ 2 b _ 3 & \cdots & b _ {\ell -2} b _ {\ell - 1}\end{bmatrix}\right) \bmod{x^{\left\lceil n/2\right\rceil}} \\ 5&T(x) \gets \operatorname{\mathsf{FPSInv}}\left(V(x) \bmod{x^{\left\lfloor n/2\right\rfloor}},\left\lfloor n/2\right\rfloor\right) \\ 6&\textbf{if }n \bmod{2} = 1 \textbf{ then} \\ 7&\qquad T(x) \gets T(x)-\left(\left\lbrack x^{\left\lfloor n/2\right\rfloor}\right\rbrack V(x)T(x)\right)V(0)^{-1}x^{\left\lfloor n/2\right\rfloor} \\ 8&\textbf{end if} \\ 9&\begin{bmatrix} c _ 0 & \cdots & c _{\ell/2 - 1}\end{bmatrix} \gets \operatorname{\mathsf{FFT}} _{\ell/2}\left(T(x)\right) \\ 10&U(x) \gets \operatorname{\mathsf{IFFT}} _ {\ell}\left(\begin{bmatrix} b _ 0 c _ 0 & b _ 1 c _ 0 & \cdots & b _ {\ell - 1} c _ {\ell/2 - 1}\end{bmatrix}\right) \bmod{x^n} \\ 11&\textbf{return }U(x) \end{array}\]

which runs in time $12\mathsf{E}(n)+O(n)$.

Formal Laurent series

If finitely many terms of form $x^{\lt 0}$ are allowed, we define the formal Laurent series ring:

\[\mathbb{C}\left(\left(x\right)\right) := \left\lbrace \sum _ {j\geq N}a _ j x^j : N\in\mathbb{Z},a _ j \in\mathbb{C}\right\rbrace\]

and actually $\mathbb{C}\left(\left(x\right)\right)$ is a field since $\mathbb{C}$ is a field. To compute the truncated inverse of a given formal Laurent series, we could use the algorithm which we used in the formal power series cases.

Reversed Laurent series

I will not give the definition of reversed Laurent series, since it is quite nature if one replace $x$ with $x^{-1}$ for the formal Laurent series. We will use the notation $\mathbb{C}\left(\left(x^{-1}\right)\right)$.

Since we want to compute $x^k\bmod{Q(x)}$, we could find that

\[\begin{aligned} \frac{1}{Q(x)}&=\cdots +a _ {0}x^{-\deg Q} \in \mathbb{C}\left(\left(x^{-1}\right)\right) \\ \frac{x^k}{Q(x)}&=\cdots +a _ kx^{-\deg Q}+\cdots +a _ 0x^{k-\deg Q} \in \mathbb{C}\left(\left(x^{-1}\right)\right) \\ \left\lbrack x^{\left(-\infty,0\right)}\right\rbrack \frac{x^k}{Q(x)}&=\left\lbrack x^{\left(-\infty,0\right)}\right\rbrack \frac{x^k\bmod{Q(x)}}{Q(x)} \end{aligned}\]

If we know $\left\lbrack x^{\left\lbrack -\deg Q,0\right)}\right\rbrack\frac{x^k}{Q(x)}$, then we could compute $x^k\bmod{Q(x)}$ by a simple convolution, Bostan–Mori’s algorithm showed that

\[\frac{x^k}{Q(x)}=\frac{x^k}{Q(x)Q(-x)}\cdot Q(-x)\]

whether $k=2n$ or $k=2n+1$, we only need to compute

\[\left\lbrack x^{\left\lbrack -2\deg Q,0\right)}\right\rbrack \frac{x^{2n}}{Q(x)Q(-x)}\]

since $\left\lbrack x^{-2\deg Q-1}\right\rbrack \frac{x^{2n}}{Q(x)Q(-x)}=0$, if we set $V\left(x^2\right)=Q(x)Q(-x)$, then our goal is to compute

\[\left\lbrack x^{\left\lbrack -\deg Q,0\right)}\right\rbrack \frac{x^{n}}{V(x)}\]

The algorithm halts when $k=0$.

\[\begin{array}{ll} &\textbf{Algorithm }\operatorname{\mathsf{RLSBostanMori}}\text{:} \\ &\textbf{Input}\text{: }Q(x),k\in\mathbb{N}\text{.} \\ &\textbf{Output}\text{: }\left\lbrack x^{\left\lbrack -\deg Q,0\right)}\right\rbrack \dfrac{x^k}{Q(x)},\text{where }Q(x)^{-1}\in\mathbb{C}\left(\left( x^{-1}\right)\right)\text{.} \\ 1&\textbf{if }k=0 \textbf{ then return }\begin{bmatrix} \left(\left\lbrack x^{\deg Q}\right\rbrack Q(x)\right)^{-1} & 0 & \cdots & 0 \end{bmatrix} \\ 2&V\left(x^2\right)\gets Q(x)Q(-x) \\ 3&\begin{bmatrix} c _ {-\deg Q} & \cdots & c _ {-1}\end{bmatrix} \gets \operatorname{\mathsf{RLSBostanMori}}\left(V(x),\left\lfloor k/2\right\rfloor\right) \\ 4&T(x)\gets \sum _ {j=0}^{-1+\deg Q}c _ {j-\deg Q}x^j \\ 5&\sum _ {j=0}^{-1+3\deg Q} u _ jx^j\gets T\left(x^2\right)x^{k\bmod{2}}Q(-x) \\ 6&\textbf{return }\begin{bmatrix} u _ {\deg Q} & \cdots & u _ {-1+2\deg Q}\end{bmatrix} \end{array}\]

If we want make the algorithm faster, make the input and output in FFT. We omit the details.

An alternative way is that we could consider as formal Laurent series, this also works. I have implemented such an algorithm used in computing $x^k\bmod{Q(x)}$. The implementation maybe a little weird.

Connection between $\mathbb{C}\left(\left(x\right)\right)$ and $\mathbb{C}\left(\left(x^{-1}\right)\right)$

Let $Q\in\mathbb{C}\left\lbrack x\right\rbrack$, we have

\[\begin{aligned} \left\lbrack x^{\left\lbrack L,L+\deg Q\right)}\right\rbrack \frac{1}{Q(x)}&=\left\lbrack x^{\left\lbrack 0,\deg Q\right)}\right\rbrack \frac{x^{-L}}{Q(x)} \in\mathbb{C}\left(\left(x\right)\right) \\ &=\left\lbrack x^{\left\lbrack -\deg Q+1,1\right)}\right\rbrack \frac{x^L}{Q\left(x^{-1}\right)} \in\mathbb{C}\left(\left(x^{-1}\right)\right) \\ &=\left\lbrack x^{\left\lbrack -\deg Q,0\right)}\right\rbrack \frac{x^{-1+\deg Q}x^L}{x^{\deg Q}Q\left(x^{-1}\right)} \in\mathbb{C}\left(\left(x^{-1}\right)\right) \end{aligned}\]

If we want to compute $\left\lbrack x^{\lbrack L,R)}\right\rbrack \frac{P(x)}{Q(x)}\in \mathbb{C}\left(\left(x\right)\right)$ where $P,Q\in\mathbb{C}\left\lbrack x\right\rbrack$, we have

\[\begin{aligned} \frac{P\left(x^{-1}\right)}{Q\left(x^{-1}\right)}&=\sum _ {j\geq 0}a _ j x^{-j}\in\mathbb{C}\left(\left(x^{-1}\right)\right) \\ \frac{x^{n-1}P\left(x^{-1}\right)}{x^nQ\left(x^{-1}\right)}&=\sum _ {j\geq 0}a _ j x^{-j-1}\in\mathbb{C}\left(\left(x^{-1}\right)\right) \end{aligned}\]

Let $\tilde{P}(x):=x^{n-1}P\left(x^{-1}\right),\tilde{Q}(x):=x^nQ\left(x^{-1}\right)$ for sufficient large $n$ such that $\tilde{P},\tilde{Q}\in\mathbb{C}\left\lbrack x\right\rbrack$, we have

\[\begin{aligned} \left\lbrack x^{\left\lbrack -R,-L\right)}\right\rbrack \frac{\tilde{P}(x)}{\tilde{Q}(x)}&=\left\lbrack x^{\left\lbrack -R+L,0\right)}\right\rbrack \frac{x^L\cdot \tilde{P}(x)}{\tilde{Q}(x)} \\ &=\left\lbrack x^{\left\lbrack -R+L,0\right)}\right\rbrack \frac{\left(x^L\cdot \tilde{P}(x)\right)\bmod{\tilde{Q}(x)}}{\tilde{Q}(x)} \end{aligned}\]

References

Alin Bostan, Ryuhei Mori. A Simple and Fast Algorithm for Computing the N-th Term of a Linearly Recurrent Sequence. SOSA 2021: 118-132 url: https://arxiv.org/abs/2008.08822
Arne Storjohann. Algorithms for Matrix Canonical Forms. ETH Zürich. Diss., Technische Wissenschaften ETH Zürich, Nr. 13922, 2001. url: https://www.research-collection.ethz.ch/handle/20.500.11850/145127

Depends on

Verified with

Code

#pragma once

#include "fft.hpp"
#include "fft_doubling.hpp"
#include "fps_basic.hpp"
#include "poly_basic.hpp"
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// see:
// [1]: Alin Bostan, Ryuhei Mori.
//      A Simple and Fast Algorithm for Computing the N-th Term of a Linearly Recurrent Sequence.
//      https://arxiv.org/abs/2008.08822

template<typename Tp> inline void fft_high(std::vector<Tp> &a) {
    const int n = a.size();
    inv_fft_n(a.begin() + n / 2, n / 2);
    Tp k         = 1;
    const auto t = FftInfo<Tp>::get().inv_root(n / 2).at(n / 4);
    for (int i = 0; i < n / 2; ++i) a[i + n / 2] *= k, k *= t;
    fft_n(a.begin() + n / 2, n / 2);
    for (int i = 0; i < n / 2; ++i) a[i] = (a[i] - a[i + n / 2]).div_by_2();
    a.resize(n / 2);
}

// returns DFT([x^[L,L+len/2)]1/Q)
// 1/Q in R((x))
// requires len/2 > deg(Q), len/2 is even
template<typename Tp>
inline std::vector<Tp> bostan_mori_laurent_series(std::vector<Tp> dftQ, long long L) {
    const int len = dftQ.size() * 2;
    if (L <= 0) {
        inv_fft(dftQ);
        const int ordQ = order(dftQ);
        assert(ordQ >= 0);
        if (L + len / 2 <= -ordQ) return std::vector<Tp>(len / 2);
        auto invQ = fps_inv(std::vector(dftQ.begin() + ordQ, dftQ.end()), L + len / 2 + ordQ);
        if (-ordQ < (int)L) {
            // ?x^(-ord(Q)) + ... + ?x^L + ... + ?x^(L+len/2-1)
            invQ.erase(invQ.begin(), invQ.begin() + (L + ordQ));
        } else {
            // ?x^L + ... + ?x^(-ord(Q)) + ... + ?x^(L+len/2-1)
            invQ.insert(invQ.begin(), -ordQ - L, Tp(0));
        }
        fft(invQ);
        return invQ;
    }

    fft_doubling(dftQ);
    std::vector<Tp> dftV(len / 2);
    for (int i = 0; i < len; i += 2) dftV[i / 2] = dftQ[i] * dftQ[i + 1];
    // recursive call: ceil((L-len/2)/2)
    const auto dftT = bostan_mori_laurent_series(std::move(dftV), (L - len / 2 + (L & 1)) / 2);

    if (L & 1) {
        auto &&root = FftInfo<Tp>::get().root(len / 2);
        for (int i = 0; i < len; i += 2) {
            const auto u = dftQ[i], v = dftQ[i + 1];
            dftQ[i]     = dftT[i / 2] * v * root[i / 2];
            dftQ[i + 1] = dftT[i / 2] * u * -root[i / 2];
        }
    } else {
        for (int i = 0; i < len; i += 2) {
            const auto u = dftQ[i], v = dftQ[i + 1];
            dftQ[i]     = dftT[i / 2] * v;
            dftQ[i + 1] = dftT[i / 2] * u;
        }
    }

    fft_high(dftQ);
    return dftQ;
}

// returns DFT([x^[-len/2,0)]x^k/Q)
// x^k/Q in R((x^(-1)))
// requires len/2 > degQ
template<typename Tp>
inline std::vector<Tp> bostan_mori_reversed_laurent_series(std::vector<Tp> dftQ, long long k) {
    assert(k >= 0);
    const int len = dftQ.size() * 2;
    if (k < len / 2LL) {
        inv_fft(dftQ);
        const int degQ = degree(dftQ);
        assert(degQ >= 0);
        dftQ.resize(degQ + 1);
        std::reverse(dftQ.begin(), dftQ.end());
        auto invQ = fps_inv(dftQ, len / 2 - degQ + k + 1);
        std::reverse(invQ.begin(), invQ.end());
        invQ.resize(len / 2);
        fft(invQ);
        return invQ;
    }

    fft_doubling(dftQ);
    std::vector<Tp> dftV(len / 2);
    for (int i = 0; i < len; i += 2) dftV[i / 2] = dftQ[i] * dftQ[i + 1];
    const auto dftT = bostan_mori_reversed_laurent_series(std::move(dftV), k / 2);

    if (k & 1) {
        auto &&root = FftInfo<Tp>::get().root(len / 2);
        for (int i = 0; i < len; i += 2) {
            const auto u = dftQ[i], v = dftQ[i + 1];
            dftQ[i]     = dftT[i / 2] * v * root[i / 2];
            dftQ[i + 1] = dftT[i / 2] * u * -root[i / 2];
        }
    } else {
        for (int i = 0; i < len; i += 2) {
            const auto u = dftQ[i], v = dftQ[i + 1];
            dftQ[i]     = dftT[i / 2] * v;
            dftQ[i + 1] = dftT[i / 2] * u;
        }
    }

    fft_high(dftQ);
    return dftQ;
}

// returns x^k mod Q
template<typename Tp> inline std::vector<Tp> xk_mod(long long k, const std::vector<Tp> &Q) {
    assert(k >= 0);
    const int degQ = degree(Q);
    assert(degQ >= 0);
    if (degQ == 0) return {};
    if (k < degQ) {
        std::vector<Tp> res(degQ);
        res[k] = 1;
        return res;
    }

    const int len = fft_len(degQ * 2 + 1);
    if (k < len / 2LL) {
        auto invQ = fps_inv(std::vector(Q.rend() - (degQ + 1), Q.rend()), k + 1);
        std::reverse(invQ.begin(), invQ.end());
        invQ.resize(degQ);
        auto res = convolution(invQ, Q);
        res.erase(res.begin(), res.begin() + degQ);
        res.resize(degQ);
        return res;
    }

    auto dftQ = std::vector(Q.rend() - (degQ + 1), Q.rend());
    dftQ.resize(len);
    fft(dftQ);
    std::vector<Tp> dftV(len / 2);
    for (int i = 0; i < len; i += 2) dftV[i / 2] = dftQ[i] * dftQ[i + 1];
    const long long L = k + 1 - degQ;
    const auto dftT   = bostan_mori_laurent_series(dftV, (L - len / 2 + (L & 1)) / 2);
    std::vector<Tp> dftU(len);
    if (L & 1) {
        auto &&root = FftInfo<Tp>::get().root(len / 2);
        for (int i = 0; i < len; i += 2) {
            dftU[i]     = dftT[i / 2] * dftQ[i + 1] * root[i / 2];
            dftU[i + 1] = dftT[i / 2] * dftQ[i] * -root[i / 2];
        }
    } else {
        for (int i = 0; i < len; i += 2) {
            dftU[i]     = dftT[i / 2] * dftQ[i + 1];
            dftU[i + 1] = dftT[i / 2] * dftQ[i];
        }
    }
    inv_fft(dftU);
    dftU.erase(dftU.begin(), dftU.begin() + len / 2);
    dftU.resize(degQ);
    dftU.resize(len);
    fft(dftU);
    for (int i = 0; i < len; ++i) dftU[i] *= dftQ[i];
    inv_fft(dftU);
    dftU.resize(degQ);
    std::reverse(dftU.begin(), dftU.end());
    return dftU;
}

// returns [x^[L,R)]P/Q
// P: polynomial
// Q: non-zero polynomial, ord(Q)=0
template<typename Tp> inline std::vector<Tp>
slice_coeff_rational(const std::vector<Tp> &P, const std::vector<Tp> &Q, long long L, long long R) {
    assert(L >= 0);
    assert(order(Q) == 0);
    const int degP = degree(P);
    if (degP < 0) return std::vector<Tp>(R - L);
    const int degQ = degree(Q);
    const int N    = std::max(degP + 1, degQ);
    auto P0 = P, Q0 = Q;
    P0.resize(N);
    std::reverse(P0.begin(), P0.end());
    Q0.resize(N + 1);
    std::reverse(Q0.begin(), Q0.end());
    auto [q, r] = euclidean_div(convolution(xk_mod(L, Q0), P0), Q0);
    r.resize(N);
    std::reverse(r.begin(), r.end());
    return fps_div(r, Q, R - L);
}

// returns [x^k]P/Q
// P: polynomial
// Q: non-zero polynomial, ord(Q)=0
template<typename Tp>
inline Tp div_at(const std::vector<Tp> &P, const std::vector<Tp> &Q, long long k) {
    return slice_coeff_rational(P, Q, k, k + 1).at(0);
}

#line 2 "c_finite.hpp"

#line 2 "fft.hpp"

#include <algorithm>
#include <cassert>
#include <iterator>
#include <memory>
#include <vector>

template<typename Tp> class FftInfo {
    static Tp least_quadratic_nonresidue() {
        for (int i = 2;; ++i)
            if (Tp(i).pow((Tp::mod() - 1) / 2) == -1) return Tp(i);
    }

    const int ordlog2_;
    const Tp zeta_;
    const Tp invzeta_;
    const Tp imag_;
    const Tp invimag_;

    mutable std::vector<Tp> root_;
    mutable std::vector<Tp> invroot_;

    FftInfo()
        : ordlog2_(__builtin_ctzll(Tp::mod() - 1)),
          zeta_(least_quadratic_nonresidue().pow((Tp::mod() - 1) >> ordlog2_)),
          invzeta_(zeta_.inv()), imag_(zeta_.pow(1LL << (ordlog2_ - 2))), invimag_(-imag_),
          root_{Tp(1), imag_}, invroot_{Tp(1), invimag_} {}

public:
    static const FftInfo &get() {
        static FftInfo info;
        return info;
    }

    Tp imag() const { return imag_; }
    Tp inv_imag() const { return invimag_; }
    Tp zeta() const { return zeta_; }
    Tp inv_zeta() const { return invzeta_; }
    const std::vector<Tp> &root(int n) const {
        // [0, n)
        assert((n & (n - 1)) == 0);
        if (const int s = root_.size(); s < n) {
            root_.resize(n);
            for (int i = __builtin_ctz(s); (1 << i) < n; ++i) {
                const int j = 1 << i;
                root_[j]    = zeta_.pow(1LL << (ordlog2_ - i - 2));
                for (int k = j + 1; k < j * 2; ++k) root_[k] = root_[k - j] * root_[j];
            }
        }
        return root_;
    }
    const std::vector<Tp> &inv_root(int n) const {
        // [0, n)
        assert((n & (n - 1)) == 0);
        if (const int s = invroot_.size(); s < n) {
            invroot_.resize(n);
            for (int i = __builtin_ctz(s); (1 << i) < n; ++i) {
                const int j = 1 << i;
                invroot_[j] = invzeta_.pow(1LL << (ordlog2_ - i - 2));
                for (int k = j + 1; k < j * 2; ++k) invroot_[k] = invroot_[k - j] * invroot_[j];
            }
        }
        return invroot_;
    }
};

inline int fft_len(int n) {
    --n;
    n |= n >> 1, n |= n >> 2, n |= n >> 4, n |= n >> 8;
    return (n | n >> 16) + 1;
}

namespace detail {

template<typename Iterator> inline void
butterfly_n(Iterator a, int n,
            const std::vector<typename std::iterator_traits<Iterator>::value_type> &root) {
    assert(n > 0);
    assert((n & (n - 1)) == 0);
    const int bn = __builtin_ctz(n);
    if (bn & 1) {
        for (int i = 0; i < n / 2; ++i) {
            const auto a0 = a[i], a1 = a[i + n / 2];
            a[i] = a0 + a1, a[i + n / 2] = a0 - a1;
        }
    }
    for (int i = n >> (bn & 1); i >= 4; i /= 4) {
        const int i4 = i / 4;
        for (int k = 0; k < i4; ++k) {
            const auto a0 = a[k + i4 * 0], a1 = a[k + i4 * 1];
            const auto a2 = a[k + i4 * 2], a3 = a[k + i4 * 3];
            const auto a02p = a0 + a2, a02m = a0 - a2;
            const auto a13p = a1 + a3, a13m = (a1 - a3) * root[1];
            a[k + i4 * 0] = a02p + a13p, a[k + i4 * 1] = a02p - a13p;
            a[k + i4 * 2] = a02m + a13m, a[k + i4 * 3] = a02m - a13m;
        }
        for (int j = i, m = 2; j < n; j += i, m += 2) {
            const auto r = root[m], r2 = r * r, r3 = r2 * r;
            for (int k = j; k < j + i4; ++k) {
                const auto a0 = a[k + i4 * 0], a1 = a[k + i4 * 1] * r;
                const auto a2 = a[k + i4 * 2] * r2, a3 = a[k + i4 * 3] * r3;
                const auto a02p = a0 + a2, a02m = a0 - a2;
                const auto a13p = a1 + a3, a13m = (a1 - a3) * root[1];
                a[k + i4 * 0] = a02p + a13p, a[k + i4 * 1] = a02p - a13p;
                a[k + i4 * 2] = a02m + a13m, a[k + i4 * 3] = a02m - a13m;
            }
        }
    }
}

template<typename Iterator> inline void
inv_butterfly_n(Iterator a, int n,
                const std::vector<typename std::iterator_traits<Iterator>::value_type> &root) {
    assert(n > 0);
    assert((n & (n - 1)) == 0);
    const int bn = __builtin_ctz(n);
    for (int i = 4; i <= (n >> (bn & 1)); i *= 4) {
        const int i4 = i / 4;
        for (int k = 0; k < i4; ++k) {
            const auto a0 = a[k + i4 * 0], a1 = a[k + i4 * 1];
            const auto a2 = a[k + i4 * 2], a3 = a[k + i4 * 3];
            const auto a01p = a0 + a1, a01m = a0 - a1;
            const auto a23p = a2 + a3, a23m = (a2 - a3) * root[1];
            a[k + i4 * 0] = a01p + a23p, a[k + i4 * 1] = a01m + a23m;
            a[k + i4 * 2] = a01p - a23p, a[k + i4 * 3] = a01m - a23m;
        }
        for (int j = i, m = 2; j < n; j += i, m += 2) {
            const auto r = root[m], r2 = r * r, r3 = r2 * r;
            for (int k = j; k < j + i4; ++k) {
                const auto a0 = a[k + i4 * 0], a1 = a[k + i4 * 1];
                const auto a2 = a[k + i4 * 2], a3 = a[k + i4 * 3];
                const auto a01p = a0 + a1, a01m = a0 - a1;
                const auto a23p = a2 + a3, a23m = (a2 - a3) * root[1];
                a[k + i4 * 0] = a01p + a23p, a[k + i4 * 1] = (a01m + a23m) * r;
                a[k + i4 * 2] = (a01p - a23p) * r2, a[k + i4 * 3] = (a01m - a23m) * r3;
            }
        }
    }
    if (bn & 1) {
        for (int i = 0; i < n / 2; ++i) {
            const auto a0 = a[i], a1 = a[i + n / 2];
            a[i] = a0 + a1, a[i + n / 2] = a0 - a1;
        }
    }
}

} // namespace detail

// FFT_n: A(x) |-> bit-reversed order of [A(1), A(zeta_n), ..., A(zeta_n^(n-1))]
template<typename Iterator> inline void fft_n(Iterator a, int n) {
    using Tp = typename std::iterator_traits<Iterator>::value_type;
    detail::butterfly_n(a, n, FftInfo<Tp>::get().root(n / 2));
}

template<typename Tp> inline void fft(std::vector<Tp> &a) { fft_n(a.begin(), a.size()); }

// IFFT_n: bit-reversed order of [A(1), A(zeta_n), ..., A(zeta_n^(n-1))] |-> A(x)
template<typename Iterator> inline void inv_fft_n(Iterator a, int n) {
    using Tp = typename std::iterator_traits<Iterator>::value_type;
    detail::inv_butterfly_n(a, n, FftInfo<Tp>::get().inv_root(n / 2));
    const Tp iv = Tp::mod() - (Tp::mod() - 1) / n;
    for (int i = 0; i < n; ++i) a[i] *= iv;
}

template<typename Tp> inline void inv_fft(std::vector<Tp> &a) { inv_fft_n(a.begin(), a.size()); }

// IFFT_n^T: A(x) |-> 1/n FFT_n((x^n A(x^(-1))) mod (x^n - 1))
template<typename Iterator> inline void transposed_inv_fft_n(Iterator a, int n) {
    using Tp    = typename std::iterator_traits<Iterator>::value_type;
    const Tp iv = Tp::mod() - (Tp::mod() - 1) / n;
    for (int i = 0; i < n; ++i) a[i] *= iv;
    detail::butterfly_n(a, n, FftInfo<Tp>::get().inv_root(n / 2));
}

template<typename Tp> inline void transposed_inv_fft(std::vector<Tp> &a) {
    transposed_inv_fft_n(a.begin(), a.size());
}

// FFT_n^T : FFT_n((x^n A(x^(-1))) mod (x^n - 1)) |-> n A(x)
template<typename Iterator> inline void transposed_fft_n(Iterator a, int n) {
    using Tp = typename std::iterator_traits<Iterator>::value_type;
    detail::inv_butterfly_n(a, n, FftInfo<Tp>::get().root(n / 2));
}

template<typename Tp> inline void transposed_fft(std::vector<Tp> &a) {
    transposed_fft_n(a.begin(), a.size());
}

template<typename Tp> inline std::vector<Tp> convolution_fft(std::vector<Tp> a, std::vector<Tp> b) {
    if (a.empty() || b.empty()) return {};
    const int n   = a.size();
    const int m   = b.size();
    const int len = fft_len(n + m - 1);
    a.resize(len);
    b.resize(len);
    fft(a);
    fft(b);
    for (int i = 0; i < len; ++i) a[i] *= b[i];
    inv_fft(a);
    a.resize(n + m - 1);
    return a;
}

template<typename Tp> inline std::vector<Tp> square_fft(std::vector<Tp> a) {
    if (a.empty()) return {};
    const int n   = a.size();
    const int len = fft_len(n * 2 - 1);
    a.resize(len);
    fft(a);
    for (int i = 0; i < len; ++i) a[i] *= a[i];
    inv_fft(a);
    a.resize(n * 2 - 1);
    return a;
}

template<typename Tp>
inline std::vector<Tp> convolution_naive(const std::vector<Tp> &a, const std::vector<Tp> &b) {
    if (a.empty() || b.empty()) return {};
    const int n = a.size();
    const int m = b.size();
    std::vector<Tp> res(n + m - 1);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) res[i + j] += a[i] * b[j];
    return res;
}

template<typename Tp>
inline std::vector<Tp> convolution(const std::vector<Tp> &a, const std::vector<Tp> &b) {
    if (std::min(a.size(), b.size()) < 60) return convolution_naive(a, b);
    if (std::addressof(a) == std::addressof(b)) return square_fft(a);
    return convolution_fft(a, b);
}
#line 2 "fft_doubling.hpp"

#line 8 "fft_doubling.hpp"

template<typename Iterator> inline void fft_doubling_n(Iterator a, int n) {
    using Tp = typename std::iterator_traits<Iterator>::value_type;
    assert((n & (n - 1)) == 0);
    std::copy_n(a, n, a + n);
    inv_fft_n(a + n, n);
    Tp k         = 1;
    const auto t = FftInfo<Tp>::get().root(n).at(n / 2);
    for (int i = 0; i < n; ++i) a[i + n] *= k, k *= t;
    fft_n(a + n, n);
}

template<typename Tp> inline void fft_doubling(std::vector<Tp> &a) {
    const int n = a.size();
    a.resize(n * 2);
    fft_doubling_n(a.begin(), n);
}
#line 2 "fps_basic.hpp"

#line 2 "binomial.hpp"

#line 5 "binomial.hpp"

template<typename Tp> class Binomial {
    std::vector<Tp> factorial_, invfactorial_;

    Binomial() : factorial_{Tp(1)}, invfactorial_{Tp(1)} {}

    void preprocess(int n) {
        if (const int nn = factorial_.size(); nn < n) {
            int k = nn;
            while (k < n) k *= 2;
            k = std::min<long long>(k, Tp::mod());
            factorial_.resize(k);
            invfactorial_.resize(k);
            for (int i = nn; i < k; ++i) factorial_[i] = factorial_[i - 1] * i;
            invfactorial_.back() = factorial_.back().inv();
            for (int i = k - 2; i >= nn; --i) invfactorial_[i] = invfactorial_[i + 1] * (i + 1);
        }
    }

public:
    static const Binomial &get(int n) {
        static Binomial bin;
        bin.preprocess(n);
        return bin;
    }

    Tp binom(int n, int m) const {
        return n < m ? Tp() : factorial_[n] * invfactorial_[m] * invfactorial_[n - m];
    }
    Tp inv(int n) const { return factorial_[n - 1] * invfactorial_[n]; }
    Tp factorial(int n) const { return factorial_[n]; }
    Tp inv_factorial(int n) const { return invfactorial_[n]; }
};
#line 2 "semi_relaxed_conv.hpp"

#line 5 "semi_relaxed_conv.hpp"
#include <type_traits>
#include <utility>
#line 8 "semi_relaxed_conv.hpp"

template<typename Tp, typename Closure>
inline std::enable_if_t<std::is_invocable_r_v<Tp, Closure, int, const std::vector<Tp> &>,
                        std::vector<Tp>>
semi_relaxed_convolution_naive(const std::vector<Tp> &A, Closure gen, int n) {
    std::vector<Tp> B(n), AB(n);
    for (int i = 0; i < n; ++i) {
        for (int j = std::max(0, i - (int)A.size() + 1); j < i; ++j) AB[i] += A[i - j] * B[j];
        B[i] = gen(i, AB);
        if (!A.empty()) AB[i] += A[0] * B[i];
    }
    return B;
}

// returns coefficients generated by closure
// closure: gen(index, current_product)
template<typename Tp, typename Closure>
inline std::enable_if_t<std::is_invocable_r_v<Tp, Closure, int, const std::vector<Tp> &>,
                        std::vector<Tp>>
semi_relaxed_convolution(const std::vector<Tp> &A, Closure gen, int n) {
    if (A.size() < 60) return semi_relaxed_convolution_naive(A, gen, n);
    enum { BaseCaseSize = 32 };
    static_assert((BaseCaseSize & (BaseCaseSize - 1)) == 0);

    static const int Block[]     = {16, 16, 16, 16, 16};
    static const int BlockSize[] = {
        BaseCaseSize,
        BaseCaseSize * Block[0],
        BaseCaseSize * Block[0] * Block[1],
        BaseCaseSize * Block[0] * Block[1] * Block[2],
        BaseCaseSize * Block[0] * Block[1] * Block[2] * Block[3],
        BaseCaseSize * Block[0] * Block[1] * Block[2] * Block[3] * Block[4],
    };

    // returns (which_block, level)
    auto blockinfo = [](int ind) {
        int i = ind / BaseCaseSize, lv = 0;
        while ((i & (Block[lv] - 1)) == 0) i /= Block[lv++];
        return std::make_pair(i & (Block[lv] - 1), lv);
    };

    std::vector<Tp> B(n), AB(n);
    std::vector<std::vector<std::vector<Tp>>> dftA, dftB;

    for (int i = 0; i < n; ++i) {
        const int s = i & (BaseCaseSize - 1);

        // block contribution
        if (i >= BaseCaseSize && s == 0) {
            const auto [j, lv]  = blockinfo(i);
            const int blocksize = BlockSize[lv];

            if (blocksize * j == i) {
                if ((int)dftA.size() == lv) {
                    dftA.emplace_back();
                    dftB.emplace_back(Block[lv] - 1);
                }
                if ((j - 1) * blocksize < (int)A.size()) {
                    dftA[lv]
                        .emplace_back(A.begin() + (j - 1) * blocksize,
                                      A.begin() + std::min<int>((j + 1) * blocksize, A.size()))
                        .resize(blocksize * 2);
                    fft(dftA[lv][j - 1]);
                }
            }

            if (!dftA[lv].empty()) {
                dftB[lv][j - 1].resize(blocksize * 2);
                std::copy_n(B.begin() + (i - blocksize), blocksize, dftB[lv][j - 1].begin());
                std::fill_n(dftB[lv][j - 1].begin() + blocksize, blocksize, Tp(0));
                fft(dftB[lv][j - 1]);

                // middle product
                std::vector<Tp> mp(blocksize * 2);
                for (int k = 0; k < std::min<int>(j, dftA[lv].size()); ++k)
                    for (int l = 0; l < blocksize * 2; ++l)
                        mp[l] += dftA[lv][k][l] * dftB[lv][j - 1 - k][l];
                inv_fft(mp);

                for (int k = 0; k < blocksize && i + k < n; ++k) AB[i + k] += mp[k + blocksize];
            }
        }

        // basecase contribution
        for (int j = std::max(i - s, i - (int)A.size() + 1); j < i; ++j) AB[i] += A[i - j] * B[j];
        B[i] = gen(i, AB);
        if (!A.empty()) AB[i] += A[0] * B[i];
    }

    return B;
}
#line 8 "fps_basic.hpp"

template<typename Tp> inline int order(const std::vector<Tp> &a) {
    for (int i = 0; i < (int)a.size(); ++i)
        if (a[i] != 0) return i;
    return -1;
}

template<typename Tp> inline std::vector<Tp> fps_inv(const std::vector<Tp> &a, int n) {
    assert(order(a) == 0);
    if (n <= 0) return {};
    if (std::min<int>(a.size(), n) < 60)
        return semi_relaxed_convolution(
            a, [v = a[0].inv()](int n, auto &&c) { return n == 0 ? v : -c[n] * v; }, n);
    enum { Threshold = 32 };
    const int len = fft_len(n);
    std::vector<Tp> invA, shopA(len), shopB(len);
    invA = semi_relaxed_convolution(
        a, [v = a[0].inv()](int n, auto &&c) { return n == 0 ? v : -c[n] * v; }, Threshold);
    invA.resize(len);
    for (int i = Threshold * 2; i <= len; i *= 2) {
        std::fill(std::copy_n(a.begin(), std::min<int>(a.size(), i), shopA.begin()),
                  shopA.begin() + i, Tp(0));
        std::copy_n(invA.begin(), i, shopB.begin());
        fft_n(shopA.begin(), i);
        fft_n(shopB.begin(), i);
        for (int j = 0; j < i; ++j) shopA[j] *= shopB[j];
        inv_fft_n(shopA.begin(), i);
        std::fill_n(shopA.begin(), i / 2, Tp(0));
        fft_n(shopA.begin(), i);
        for (int j = 0; j < i; ++j) shopA[j] *= shopB[j];
        inv_fft_n(shopA.begin(), i);
        for (int j = i / 2; j < i; ++j) invA[j] = -shopA[j];
    }
    invA.resize(n);
    return invA;
}

template<typename Tp>
inline std::vector<Tp> fps_div(const std::vector<Tp> &a, const std::vector<Tp> &b, int n) {
    assert(order(b) == 0);
    if (n <= 0) return {};
    return semi_relaxed_convolution(
        b,
        [&, v = b[0].inv()](int n, auto &&c) {
            if (n < (int)a.size()) return (a[n] - c[n]) * v;
            return -c[n] * v;
        },
        n);
}

template<typename Tp> inline std::vector<Tp> deriv(const std::vector<Tp> &a) {
    const int n = (int)a.size() - 1;
    if (n <= 0) return {};
    std::vector<Tp> res(n);
    for (int i = 1; i <= n; ++i) res[i - 1] = a[i] * i;
    return res;
}

template<typename Tp> inline std::vector<Tp> integr(const std::vector<Tp> &a, Tp c = {}) {
    const int n = a.size() + 1;
    auto &&bin  = Binomial<Tp>::get(n);
    std::vector<Tp> res(n);
    res[0] = c;
    for (int i = 1; i < n; ++i) res[i] = a[i - 1] * bin.inv(i);
    return res;
}

template<typename Tp> inline std::vector<Tp> fps_log(const std::vector<Tp> &a, int n) {
    return integr(fps_div(deriv(a), a, n - 1));
}

template<typename Tp> inline std::vector<Tp> fps_exp(const std::vector<Tp> &a, int n) {
    if (n <= 0) return {};
    assert(a.empty() || a[0] == 0);
    return semi_relaxed_convolution(
        deriv(a),
        [bin = Binomial<Tp>::get(n)](int n, auto &&c) {
            return n == 0 ? Tp(1) : c[n - 1] * bin.inv(n);
        },
        n);
}

template<typename Tp> inline std::vector<Tp> fps_pow(std::vector<Tp> a, long long e, int n) {
    if (n <= 0) return {};
    if (e == 0) {
        std::vector<Tp> res(n);
        res[0] = 1;
        return res;
    }

    const int o = order(a);
    if (o < 0 || o > n / e || (o == n / e && n % e == 0)) return std::vector<Tp>(n);
    if (o != 0) a.erase(a.begin(), a.begin() + o);

    const Tp ia0 = a[0].inv();
    const Tp a0e = a[0].pow(e);
    const Tp me  = e;

    for (int i = 0; i < (int)a.size(); ++i) a[i] *= ia0;
    a = fps_log(a, n - o * e);
    for (int i = 0; i < (int)a.size(); ++i) a[i] *= me;
    a = fps_exp(a, n - o * e);
    for (int i = 0; i < (int)a.size(); ++i) a[i] *= a0e;

    a.insert(a.begin(), o * e, Tp(0));
    return a;
}
#line 2 "poly_basic.hpp"

#line 7 "poly_basic.hpp"
#include <array>
#line 10 "poly_basic.hpp"

template<typename Tp> inline int degree(const std::vector<Tp> &a) {
    int n = (int)a.size() - 1;
    while (n >= 0 && a[n] == 0) --n;
    return n;
}

template<typename Tp> inline void shrink(std::vector<Tp> &a) { a.resize(degree(a) + 1); }

template<typename Tp> inline std::vector<Tp> taylor_shift(std::vector<Tp> a, Tp c) {
    const int n = a.size();
    auto &&bin  = Binomial<Tp>::get(n);
    for (int i = 0; i < n; ++i) a[i] *= bin.factorial(i);
    Tp cc = 1;
    std::vector<Tp> b(n);
    for (int i = 0; i < n; ++i) {
        b[i] = cc * bin.inv_factorial(i);
        cc *= c;
    }
    std::reverse(a.begin(), a.end());
    auto ab = convolution(a, b);
    ab.resize(n);
    std::reverse(ab.begin(), ab.end());
    for (int i = 0; i < n; ++i) ab[i] *= bin.inv_factorial(i);
    return ab;
}

// returns (quotient, remainder)
// O(deg(Q)deg(B))
template<typename Tp> inline std::array<std::vector<Tp>, 2>
euclidean_div_naive(const std::vector<Tp> &A, const std::vector<Tp> &B) {
    const int degA = degree(A);
    const int degB = degree(B);
    assert(degB >= 0);
    const int degQ = degA - degB;
    if (degQ < 0) return {std::vector<Tp>{Tp(0)}, A};
    std::vector<Tp> Q(degQ + 1), R = A;
    const auto inv = B[degB].inv();
    for (int i = degQ, n = degA; i >= 0; --i)
        if ((Q[i] = R[n--] * inv) != 0)
            for (int j = 0; j <= degB; ++j) R[i + j] -= Q[i] * B[j];
    R.resize(degB);
    return {Q, R};
}

// O(min(deg(Q)^2,deg(Q)deg(B)))
template<typename Tp> inline std::vector<Tp>
euclidean_div_quotient_naive(const std::vector<Tp> &A, const std::vector<Tp> &B) {
    const int degA = degree(A);
    const int degB = degree(B);
    assert(degB >= 0);
    const int degQ = degA - degB;
    if (degQ < 0) return {Tp(0)};
    const auto inv = B[degB].inv();
    std::vector<Tp> Q(degQ + 1);
    for (int i = 0; i <= degQ; ++i) {
        for (int j = 1; j <= std::min(i, degB); ++j) Q[degQ - i] += B[degB - j] * Q[degQ - i + j];
        Q[degQ - i] = (A[degA - i] - Q[degQ - i]) * inv;
    }
    return Q;
}

// returns (quotient, remainder)
template<typename Tp> inline std::array<std::vector<Tp>, 2>
euclidean_div(const std::vector<Tp> &A, const std::vector<Tp> &B) {
    const int degA = degree(A);
    const int degB = degree(B);
    assert(degB >= 0);
    // A = Q*B + R => A/B = Q + R/B in R((x^(-1)))
    const int degQ = degA - degB;
    if (degQ < 0) return {std::vector<Tp>{Tp(0)}, A};
    if (degQ < 60 || degB < 60) return euclidean_div_naive(A, B);

    auto Q = fps_div(std::vector(A.rend() - (degA + 1), A.rend()),
                     std::vector(B.rend() - (degB + 1), B.rend()), degQ + 1);
    std::reverse(Q.begin(), Q.end());

    // returns a mod (x^n-1)
    auto make_cyclic = [](const std::vector<Tp> &a, int n) {
        assert((n & (n - 1)) == 0);
        std::vector<Tp> b(n);
        for (int i = 0; i < (int)a.size(); ++i) b[i & (n - 1)] += a[i];
        return b;
    };

    const int len      = fft_len(std::max(degB, 1));
    const auto cyclicA = make_cyclic(A, len);
    auto cyclicB       = make_cyclic(B, len);
    auto cyclicQ       = make_cyclic(Q, len);

    fft(cyclicQ);
    fft(cyclicB);
    for (int i = 0; i < len; ++i) cyclicQ[i] *= cyclicB[i];
    inv_fft(cyclicQ);

    // R = A - QB mod (x^n-1) (n >= degB)
    std::vector<Tp> R(degB);
    for (int i = 0; i < degB; ++i) R[i] = cyclicA[i] - cyclicQ[i];
    return {Q, R};
}

template<typename Tp>
inline std::vector<Tp> euclidean_div_quotient(const std::vector<Tp> &A, const std::vector<Tp> &B) {
    const int degA = degree(A);
    const int degB = degree(B);
    assert(degB >= 0);
    // A = Q*B + R => A/B = Q + R/B in R((x^(-1)))
    const int degQ = degA - degB;
    if (degQ < 0) return {Tp(0)};
    if (std::min(degQ, degB) < 60) return euclidean_div_quotient_naive(A, B);

    auto Q = fps_div(std::vector(A.rend() - (degA + 1), A.rend()),
                     std::vector(B.rend() - (degB + 1), B.rend()), degQ + 1);
    std::reverse(Q.begin(), Q.end());
    return Q;
}
#line 11 "c_finite.hpp"

// see:
// [1]: Alin Bostan, Ryuhei Mori.
//      A Simple and Fast Algorithm for Computing the N-th Term of a Linearly Recurrent Sequence.
//      https://arxiv.org/abs/2008.08822

template<typename Tp> inline void fft_high(std::vector<Tp> &a) {
    const int n = a.size();
    inv_fft_n(a.begin() + n / 2, n / 2);
    Tp k         = 1;
    const auto t = FftInfo<Tp>::get().inv_root(n / 2).at(n / 4);
    for (int i = 0; i < n / 2; ++i) a[i + n / 2] *= k, k *= t;
    fft_n(a.begin() + n / 2, n / 2);
    for (int i = 0; i < n / 2; ++i) a[i] = (a[i] - a[i + n / 2]).div_by_2();
    a.resize(n / 2);
}

// returns DFT([x^[L,L+len/2)]1/Q)
// 1/Q in R((x))
// requires len/2 > deg(Q), len/2 is even
template<typename Tp>
inline std::vector<Tp> bostan_mori_laurent_series(std::vector<Tp> dftQ, long long L) {
    const int len = dftQ.size() * 2;
    if (L <= 0) {
        inv_fft(dftQ);
        const int ordQ = order(dftQ);
        assert(ordQ >= 0);
        if (L + len / 2 <= -ordQ) return std::vector<Tp>(len / 2);
        auto invQ = fps_inv(std::vector(dftQ.begin() + ordQ, dftQ.end()), L + len / 2 + ordQ);
        if (-ordQ < (int)L) {
            // ?x^(-ord(Q)) + ... + ?x^L + ... + ?x^(L+len/2-1)
            invQ.erase(invQ.begin(), invQ.begin() + (L + ordQ));
        } else {
            // ?x^L + ... + ?x^(-ord(Q)) + ... + ?x^(L+len/2-1)
            invQ.insert(invQ.begin(), -ordQ - L, Tp(0));
        }
        fft(invQ);
        return invQ;
    }

    fft_doubling(dftQ);
    std::vector<Tp> dftV(len / 2);
    for (int i = 0; i < len; i += 2) dftV[i / 2] = dftQ[i] * dftQ[i + 1];
    // recursive call: ceil((L-len/2)/2)
    const auto dftT = bostan_mori_laurent_series(std::move(dftV), (L - len / 2 + (L & 1)) / 2);

    if (L & 1) {
        auto &&root = FftInfo<Tp>::get().root(len / 2);
        for (int i = 0; i < len; i += 2) {
            const auto u = dftQ[i], v = dftQ[i + 1];
            dftQ[i]     = dftT[i / 2] * v * root[i / 2];
            dftQ[i + 1] = dftT[i / 2] * u * -root[i / 2];
        }
    } else {
        for (int i = 0; i < len; i += 2) {
            const auto u = dftQ[i], v = dftQ[i + 1];
            dftQ[i]     = dftT[i / 2] * v;
            dftQ[i + 1] = dftT[i / 2] * u;
        }
    }

    fft_high(dftQ);
    return dftQ;
}

// returns DFT([x^[-len/2,0)]x^k/Q)
// x^k/Q in R((x^(-1)))
// requires len/2 > degQ
template<typename Tp>
inline std::vector<Tp> bostan_mori_reversed_laurent_series(std::vector<Tp> dftQ, long long k) {
    assert(k >= 0);
    const int len = dftQ.size() * 2;
    if (k < len / 2LL) {
        inv_fft(dftQ);
        const int degQ = degree(dftQ);
        assert(degQ >= 0);
        dftQ.resize(degQ + 1);
        std::reverse(dftQ.begin(), dftQ.end());
        auto invQ = fps_inv(dftQ, len / 2 - degQ + k + 1);
        std::reverse(invQ.begin(), invQ.end());
        invQ.resize(len / 2);
        fft(invQ);
        return invQ;
    }

    fft_doubling(dftQ);
    std::vector<Tp> dftV(len / 2);
    for (int i = 0; i < len; i += 2) dftV[i / 2] = dftQ[i] * dftQ[i + 1];
    const auto dftT = bostan_mori_reversed_laurent_series(std::move(dftV), k / 2);

    if (k & 1) {
        auto &&root = FftInfo<Tp>::get().root(len / 2);
        for (int i = 0; i < len; i += 2) {
            const auto u = dftQ[i], v = dftQ[i + 1];
            dftQ[i]     = dftT[i / 2] * v * root[i / 2];
            dftQ[i + 1] = dftT[i / 2] * u * -root[i / 2];
        }
    } else {
        for (int i = 0; i < len; i += 2) {
            const auto u = dftQ[i], v = dftQ[i + 1];
            dftQ[i]     = dftT[i / 2] * v;
            dftQ[i + 1] = dftT[i / 2] * u;
        }
    }

    fft_high(dftQ);
    return dftQ;
}

// returns x^k mod Q
template<typename Tp> inline std::vector<Tp> xk_mod(long long k, const std::vector<Tp> &Q) {
    assert(k >= 0);
    const int degQ = degree(Q);
    assert(degQ >= 0);
    if (degQ == 0) return {};
    if (k < degQ) {
        std::vector<Tp> res(degQ);
        res[k] = 1;
        return res;
    }

    const int len = fft_len(degQ * 2 + 1);
    if (k < len / 2LL) {
        auto invQ = fps_inv(std::vector(Q.rend() - (degQ + 1), Q.rend()), k + 1);
        std::reverse(invQ.begin(), invQ.end());
        invQ.resize(degQ);
        auto res = convolution(invQ, Q);
        res.erase(res.begin(), res.begin() + degQ);
        res.resize(degQ);
        return res;
    }

    auto dftQ = std::vector(Q.rend() - (degQ + 1), Q.rend());
    dftQ.resize(len);
    fft(dftQ);
    std::vector<Tp> dftV(len / 2);
    for (int i = 0; i < len; i += 2) dftV[i / 2] = dftQ[i] * dftQ[i + 1];
    const long long L = k + 1 - degQ;
    const auto dftT   = bostan_mori_laurent_series(dftV, (L - len / 2 + (L & 1)) / 2);
    std::vector<Tp> dftU(len);
    if (L & 1) {
        auto &&root = FftInfo<Tp>::get().root(len / 2);
        for (int i = 0; i < len; i += 2) {
            dftU[i]     = dftT[i / 2] * dftQ[i + 1] * root[i / 2];
            dftU[i + 1] = dftT[i / 2] * dftQ[i] * -root[i / 2];
        }
    } else {
        for (int i = 0; i < len; i += 2) {
            dftU[i]     = dftT[i / 2] * dftQ[i + 1];
            dftU[i + 1] = dftT[i / 2] * dftQ[i];
        }
    }
    inv_fft(dftU);
    dftU.erase(dftU.begin(), dftU.begin() + len / 2);
    dftU.resize(degQ);
    dftU.resize(len);
    fft(dftU);
    for (int i = 0; i < len; ++i) dftU[i] *= dftQ[i];
    inv_fft(dftU);
    dftU.resize(degQ);
    std::reverse(dftU.begin(), dftU.end());
    return dftU;
}

// returns [x^[L,R)]P/Q
// P: polynomial
// Q: non-zero polynomial, ord(Q)=0
template<typename Tp> inline std::vector<Tp>
slice_coeff_rational(const std::vector<Tp> &P, const std::vector<Tp> &Q, long long L, long long R) {
    assert(L >= 0);
    assert(order(Q) == 0);
    const int degP = degree(P);
    if (degP < 0) return std::vector<Tp>(R - L);
    const int degQ = degree(Q);
    const int N    = std::max(degP + 1, degQ);
    auto P0 = P, Q0 = Q;
    P0.resize(N);
    std::reverse(P0.begin(), P0.end());
    Q0.resize(N + 1);
    std::reverse(Q0.begin(), Q0.end());
    auto [q, r] = euclidean_div(convolution(xk_mod(L, Q0), P0), Q0);
    r.resize(N);
    std::reverse(r.begin(), r.end());
    return fps_div(r, Q, R - L);
}

// returns [x^k]P/Q
// P: polynomial
// Q: non-zero polynomial, ord(Q)=0
template<typename Tp>
inline Tp div_at(const std::vector<Tp> &P, const std::vector<Tp> &Q, long long k) {
    return slice_coeff_rational(P, Q, k, k + 1).at(0);
}

C-Finite Sequence (c_finite.hpp)