Recently, the ARM NEON architecture has occupied a significant
share of tablet and smartphone markets due to its low cost
and high performance. This paper studies efficient techniques for
lattice-based cryptography on ARM processors and presents the
first implementation of ring-LWE encryption on the ARM NEON
architecture. In particular, we propose a vectorized version of
the iterative Number Theoretic Transform (NTT) for high-speed
computation. We present a 32-bit variant of the SAMS2 technique,
originally proposed at CHES'15, for fast reduction. A combination
of the proposed and previous optimizations results in a very efficient
implementation. At the 128-bit security level, our ring-LWE implementation
requires only 145,200 clock cycles for encryption
and 32,800 cycles for decryption. These results are more than
17.6 times faster than the fastest ECC implementation on ARM
NEON at the same security level.
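
As a point of reference for the iterative NTT mentioned above, the following is a minimal scalar sketch in Python, not the vectorized NEON implementation the paper proposes. It shows a textbook in-place Cooley-Tukey NTT with a bit-reversal step and its inverse. The function names, the helper find_root_of_unity, and the parameters used in the test (n = 256, q = 7681) are illustrative assumptions, not taken from the text, and the sketch computes a plain cyclic transform without the SAMS2 reduction.

```python
def bit_reverse_permute(a):
    """Reorder list a in place into bit-reversed index order (len(a) a power of two)."""
    n = len(a)
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]


def ntt(a, omega, q):
    """Iterative NTT of a modulo prime q.

    omega is assumed to be a primitive len(a)-th root of unity mod q.
    Returns A with A[k] = sum_j a[j] * omega^(j*k) mod q.
    """
    a = list(a)
    n = len(a)
    bit_reverse_permute(a)
    length = 2
    while length <= n:
        half = length // 2
        w_len = pow(omega, n // length, q)  # primitive length-th root of unity
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                # Butterfly: combine one pair of coefficients.
                u = a[start + k]
                v = a[start + k + half] * w % q
                a[start + k] = (u + v) % q
                a[start + k + half] = (u - v) % q
                w = w * w_len % q
        length *= 2
    return a


def intt(A, omega, q):
    """Inverse NTT: transform with omega^-1, then scale by n^-1 (q prime)."""
    n = len(A)
    omega_inv = pow(omega, q - 2, q)
    n_inv = pow(n, q - 2, q)
    return [x * n_inv % q for x in ntt(A, omega_inv, q)]


def find_root_of_unity(n, q):
    """Hypothetical helper: search for a primitive n-th root of unity mod prime q.

    Assumes n is a power of two that divides q - 1.
    """
    for x in range(2, q):
        w = pow(x, (q - 1) // n, q)
        if pow(w, n // 2, q) == q - 1:  # w^(n/2) = -1 means w has order exactly n
            return w
    raise ValueError("no primitive n-th root of unity found")
```

A vectorized version would process several butterflies of the inner loop at once in SIMD registers; the loop structure here only illustrates the data flow that such an implementation has to preserve.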