Talks and presentations

Algorithm-Hardware Co-design of Split-Radix Discrete Galois Transformation for KyberKEM

November 25, 2022

Talk, Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China

KyberKEM is one of the finalized key encapsulation mechanisms (KEMs) in the NIST postquantum cryptography (PQC) competition. In the NIST competition, the performance of the implementation of KyberKEM on the field-programmable gate array (FPGA) is one of the essential issues. Related works proposed plenty of improvements on negative wrapped convolution (NWC) via number theoretic transform (NTT), which could be considered as the bottleneck of KyberKEM. Discrete Galois Transformation (DGT) is a variant of NTT working on GF(q^2), reducing the transform length in half. But DGT requires more multiplication computation operations than NTT. In this work, we first proposed the novel DGT variant by introducing the “split radix” method, which is named split-radix DGT. When the polynomial size N = 1024, the proposed split-radix DGT requires 93% multiplication operations of the state-of-the-art NTT without compromising the reduction of the transform length. Secondly, a dedicated stream permutation network and the corresponding scheduling plan for the split-radix DGT in KyberKEM are proposed, which could reduce 25% clock cycle counts compared to the state-of-the-art NTT architecture in KyberKEM. Then, the pipelined unified split-radix DGT processor is proposed and implemented on Xilinx Zynq-7000 platform at 240 MHz, which could achieve at least 1.33 times faster NTT, and 1.94 times faster PWM with 0.91 times smaller area-time product (ATP) compared with the results of the state-of-the-art designs of NWC in KyberKEM on similar platforms. Finally, the highly efficient architecture of KyberKEM on FPGA using split-radix DGT is proposed. The implementation results on an FPGA show that our design could outperform the state-of-the-art designs of KyberKEM on similar platforms.

Multiplication Algorithms in Lightweight PQC - Binary Ring-LWE

June 23, 2020

Talk, Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China

The most time-consuming function in most Post-Quantum Cryptography (PQC) is the polynomial multiplication modulo q. Based on the fact, researchers have developed many polynomial multiplication methods, such as Toom-Cook, Karatsuba, and NTT. One of the key strategies to implement and realize the multiplication algorithm with high performance is in the parallelized nature and relies on the compromise between the device area and the performance. However, in some lightweight scenarios, such as IoT applications, such a compromise is not acceptable because of the limited device area. In this sharing, I would introduce a lightweight PQC algorithm named the Binary Ring-LWE, with a sparse convolution product method.