Algorithm-Hardware Co-design of Split-Radix Discrete Galois Transformation for KyberKEM

Date:

KyberKEM is one of the finalized key encapsulation mechanisms (KEMs) in the NIST postquantum cryptography (PQC) competition. In the NIST competition, the performance of the implementation of KyberKEM on the field-programmable gate array (FPGA) is one of the essential issues. Related works proposed plenty of improvements on negative wrapped convolution (NWC) via number theoretic transform (NTT), which could be considered as the bottleneck of KyberKEM. Discrete Galois Transformation (DGT) is a variant of NTT working on GF(q^2), reducing the transform length in half. But DGT requires more multiplication computation operations than NTT. In this work, we first proposed the novel DGT variant by introducing the “split radix” method, which is named split-radix DGT. When the polynomial size N = 1024, the proposed split-radix DGT requires 93% multiplication operations of the state-of-the-art NTT without compromising the reduction of the transform length. Secondly, a dedicated stream permutation network and the corresponding scheduling plan for the split-radix DGT in KyberKEM are proposed, which could reduce 25% clock cycle counts compared to the state-of-the-art NTT architecture in KyberKEM. Then, the pipelined unified split-radix DGT processor is proposed and implemented on Xilinx Zynq-7000 platform at 240 MHz, which could achieve at least 1.33 times faster NTT, and 1.94 times faster PWM with 0.91 times smaller area-time product (ATP) compared with the results of the state-of-the-art designs of NWC in KyberKEM on similar platforms. Finally, the highly efficient architecture of KyberKEM on FPGA using split-radix DGT is proposed. The implementation results on an FPGA show that our design could outperform the state-of-the-art designs of KyberKEM on similar platforms.