Algorithm-Hardware Co-design Methodology and the Case Study in Cryptographic Applications
Talk, Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China
Lattice-based cryptography (LBC) has been established as a prominent research field, with particular attention on post-quantum cryptography (PQC) and fully homomorphic encryption (FHE). As the implementing bottleneck of PQC and FHE, polynomial multiplication over cyclonic ring has been extensively studied. While schoolbook implementation of the polynomial multiplication takes the computation complexity of O(n^2) for length-n polynomials, the number theoretic transform (NTT) based polynomial multiplication reduces the complexity to O(nlogn). Although the complexity of the NTT algorithm is much lower than the schoolbook algorithm, the efficiency of the NTT algorithm in parallel computing has yet to be effectively improved. Besides, current works struggled with scalability, hindering their adaptation to various parameters, such as bit-width and polynomial length.
This thesis proposes algorithmic and architectural optimizations to improve the efficiency of NTT-based LBC implementations in parallel computing. Firstly, we propose the novel Discrete Galois Transformation (DGT) algorithms utilizing the split-radix variant and radix-4 variant, respectively.DGT is a variant of NTT that reduces transform length to half, but DGT requires more multiplication operations than the latest NTT algorithm in theoretical analysis. The proposal of DGT utilizing the split-radix variant could improve the number of multiplication operations compared to the existing NTT while achieving a higher level of parallelism to the existing NTT. Besides, the proposal of DGT utilizing the radix-4 variant could further improve the level of parallelism and reduce the memory throughput to the existing NTT.