We’re dedicated to bringing robust, next-generation cryptography to even the most resource-constrained devices. One strand of our ongoing work focuses on implementing and optimizing the NIST-standardized post-quantum cryptographic algorithms ML-KEM (formerly Kyber) and ML-DSA (formerly Dilithium), along with the SHA-3 hash function, specifically for the Internet of Things (IoT) ecosystem. All of our technical work and results are open source and available in our GitHub repository.

Balancing Performance, Memory, and Security on IoT

Developing cryptography for embedded devices in a high-level language like Rust presents unique challenges. IoT devices have drastically less memory than desktop machines, especially low-latency RAM, and load operations are particularly costly. Our goal isn’t necessarily to produce the absolute fastest implementation, but to provide a formally verifiable solution that is easy to use and offers best-in-class performance. This requires tight control over memory and operations, so that the generated code pipelines well and uses favorable memory access patterns.

Initially, we aimed for a single implementation that could work across all platforms, from desktop computers to IoT devices. However, to achieve the necessary performance improvements on IoT devices, we pivoted to a separate, optimized implementation of ML-KEM, ML-DSA, and SHA-3.

A significant focus of our optimization efforts has been the SHA-3 implementation, which accounts for most of the computation in both ML-KEM and ML-DSA. We’ve implemented SHA-3 with bit interleaving and in-place operations, minimizing memory usage and boosting performance on 32-bit devices. We also fine-tuned the Rust code so that the compiler generates memory access patterns that take advantage of the Armv7-M pipeline.
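To make the bit-interleaving idea concrete, here is a minimal sketch in Rust. It is an illustration only, not the libcrux-iot code: the function names interleave, deinterleave, and rotl_interleaved are hypothetical, and the loops favor readability over speed. Each 64-bit Keccak lane is split into one 32-bit word holding its even-indexed bits and one holding its odd-indexed bits, so that a 64-bit rotation becomes two 32-bit rotations, which a 32-bit core can execute directly.

```rust
/// Split a 64-bit lane into (even-indexed bits, odd-indexed bits).
/// Illustrative only: real code would use a faster bit-permutation.
fn interleave(lane: u64) -> (u32, u32) {
    let mut even = 0u32;
    let mut odd = 0u32;
    for i in 0..32u32 {
        even |= (((lane >> (2 * i)) & 1) as u32) << i;
        odd |= (((lane >> (2 * i + 1)) & 1) as u32) << i;
    }
    (even, odd)
}

/// Inverse of `interleave`: rebuild the 64-bit lane.
fn deinterleave(even: u32, odd: u32) -> u64 {
    let mut lane = 0u64;
    for i in 0..32u32 {
        lane |= (((even >> i) & 1) as u64) << (2 * i);
        lane |= (((odd >> i) & 1) as u64) << (2 * i + 1);
    }
    lane
}

/// Rotate an interleaved lane left by `n` bits (0 <= n < 64)
/// using only 32-bit rotations.
fn rotl_interleaved(even: u32, odd: u32, n: u32) -> (u32, u32) {
    if n % 2 == 0 {
        // Even shift: bits keep their parity, each half rotates by n / 2.
        (even.rotate_left(n / 2), odd.rotate_left(n / 2))
    } else {
        // Odd shift: the halves swap roles. Old odd bits land on even
        // positions one step further along than old even bits do.
        (odd.rotate_left(n / 2 + 1), even.rotate_left(n / 2))
    }
}
```

In a sketch like this, the state would be interleaved once when input is absorbed and de-interleaved once when output is squeezed, so the permutation rounds in between work purely on 32-bit words.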

For ML-KEM and ML-DSA, we’ve implemented improved polynomial arithmetic that caches intermediate values and reduces the number of modular reduction operations required. While exploring various optimizations from the academic literature, we carefully evaluated their viability and potential downsides. For example, some techniques, such as using floating-point registers in non-standard ways or modular arithmetic optimizations that require inline assembly, either were deemed unsafe in Rust or provided too little performance gain to justify the increased verification burden.
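One common way to cut down on modular reductions is lazy reduction: accumulate several unreduced products in a wider integer and reduce once at the end. The sketch below illustrates the idea on a plain dot product modulo the ML-KEM modulus 3329. It is an assumption-laden illustration, not our actual polynomial arithmetic: dot_eager and dot_lazy are hypothetical names, and the % operator stands in for the constant-time Montgomery or Barrett reductions that a production implementation would use.

```rust
/// The ML-KEM modulus.
const Q: u32 = 3329;

/// Eager variant: one modular reduction per multiply-accumulate step.
/// Inputs are assumed to be already reduced (each value < Q).
fn dot_eager(a: &[u16], b: &[u16]) -> u16 {
    let mut acc = 0u32;
    for (&x, &y) in a.iter().zip(b) {
        acc = (acc + x as u32 * y as u32) % Q;
    }
    acc as u16
}

/// Lazy variant: accumulate unreduced products, reduce once at the end.
/// With inputs < Q each product is below 2^24, so up to 256 terms
/// fit in a u32 without overflow.
fn dot_lazy(a: &[u16], b: &[u16]) -> u16 {
    debug_assert!(a.len() <= 256 && b.len() <= 256);
    let mut acc = 0u32;
    for (&x, &y) in a.iter().zip(b) {
        acc += x as u32 * y as u32;
    }
    // `%` is used for clarity only; it is not guaranteed constant-time.
    (acc % Q) as u16
}
```

Both functions compute the same result; the lazy variant simply trades a wider accumulator for fewer reductions, which is only valid as long as the overflow bound in the comment holds.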

Further optimizations include reducing stack usage by computing matrix entries on the fly, rather than holding the entire matrix in memory. We also improved performance and memory usage by carefully tuning function inlining and re-using allocated memory where possible.
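The on-the-fly matrix idea can be sketched as follows. This is a toy model under stated assumptions, not the libcrux-iot implementation: entries are single integers modulo 3329 rather than polynomials, and sample_entry is a hypothetical, non-cryptographic stand-in for the SHAKE-128-based sampling that ML-KEM and ML-DSA use to expand the public matrix from a seed. The point is only the memory shape: the streamed variant never holds more than one matrix entry at a time.

```rust
const Q: u32 = 3329;
const K: usize = 3;

/// Hypothetical stand-in for deriving entry (i, j) from a seed.
/// NOT cryptographic: the real schemes sample a whole polynomial
/// per entry from a SHAKE-128 stream.
fn sample_entry(seed: u32, i: usize, j: usize) -> u32 {
    let x = seed
        .wrapping_mul(0x9E37_79B9)
        .wrapping_add(((i as u32) << 8) | (j as u32));
    (x ^ (x >> 15)) % Q
}

/// Baseline: materialize the full K x K matrix, then multiply.
/// Vector entries are assumed to be already reduced (< Q).
fn mat_vec_stored(seed: u32, v: &[u32; K]) -> [u32; K] {
    let mut a = [[0u32; K]; K];
    for i in 0..K {
        for j in 0..K {
            a[i][j] = sample_entry(seed, i, j);
        }
    }
    let mut out = [0u32; K];
    for i in 0..K {
        for j in 0..K {
            out[i] = (out[i] + a[i][j] * v[j]) % Q;
        }
    }
    out
}

/// On the fly: generate each entry when it is needed and drop it
/// immediately, so only one entry is alive at any time.
fn mat_vec_streamed(seed: u32, v: &[u32; K]) -> [u32; K] {
    let mut out = [0u32; K];
    for i in 0..K {
        for j in 0..K {
            let entry = sample_entry(seed, i, j);
            out[i] = (out[i] + entry * v[j]) % Q;
        }
    }
    out
}
```

Because every entry is derived deterministically from the seed and its indices, both variants produce identical results. The streamed one replaces K × K stored entries with a single temporary.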

What’s Next?

Our work doesn’t stop here. By focusing on efficient, secure, and verified implementations of ML-KEM, ML-DSA, and SHA-3, we’re paving the way for a more secure and trustworthy IoT landscape. We will continue to formally verify these implementations, build a TLS 1.3 IoT library on top of these cryptographic primitives, and work towards certification of our formally verified secure communications stack for use in safety-critical applications.

Reach out if you are interested in licensing the libcrux-iot library to secure your connections with high-performance, high-assurance post-quantum cryptography.