This is the 2nd report of the Crypto Core FPGA project — (more than) 2 weeks late. Sorry 🙂
Here is the first part in case you missed it: https://medium.com/@punpck/iota-crypto-core-fpga-1st-progress-report-caebe1dac579
In the last report I included an evaluation of the Troika hashing algorithm and came to the conclusion that it’s not running very well on binary CPU architectures — but I also wrote that there certainly is room for optimizations. Reference implementations have to be clear and understandable but not necessarily very efficient.
A “silent hero” (he didn’t reveal his identity) managed to optimize Troika significantly: he achieved a speed-up of 11.63x on my Cortex M1, which is very impressive. You can find his Troika optimizations here: https://github.com/c-mnd/troika
Here are some new numbers:
It is interesting to note that a lot of type conversions (Bytes to Trytes and Trytes to Bytes) wouldn’t be needed anymore if Kerl switched to Troika. There is still a factor of 5 between SHA3 and the optimized Troika, but without type conversions Troika really could be used without much of a performance penalty!
I tried to replace SHA3 with Troika (done sloppily) but I stumbled upon one small problem: the Troika implementation doesn’t support “streaming” yet. Here are my observations: https://github.com/Troikahash/reference/issues/2
Because I am already short of time, I didn’t pursue this topic any further. Perhaps someone wants to try it?
Crypto FPGA Core Optimizations
Some things were improved on the FPGA core. For instance, a 4x speed-up was achieved in the Bytes to Trytes conversion by replacing division with fixed-point multiplication: instead of dividing by 27, the value is multiplied by 1/27. Tests showed that 43 fractional bits are needed to calculate the correct result. Since a multiplication doesn’t produce the remainder of a division, a second multiplier was needed, followed by a subtraction. This sounds complicated but it’s worth it, because the division needed about 42 clock cycles while the two multiplications need only 4 (including the subtraction). The FPGA has dedicated DSP (digital signal processing) blocks which can be used for such calculations, so resource usage didn’t increase much.
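The reciprocal trick can be modelled in a few lines of Python. This is only a software sketch, not the HDL of the core: the 43 fractional bits come from the text, while the 32-bit input range, the rounded-up reciprocal and the name `div27` are assumptions made for illustration.

```python
FRAC_BITS = 43  # number of fractional bits, the figure quoted in the text

# Fixed-point approximation of 1/27, rounded up.
# Assumption: rounding up is used so that exact multiples of 27
# do not end up one too low after truncation.
RECIP_27 = -(-(1 << FRAC_BITS) // 27)


def div27(n):
    """Return (n // 27, n % 27) using two multiplications and a subtraction."""
    if not 0 <= n < (1 << 32):
        raise ValueError("this sketch assumes an unsigned 32-bit input")
    quotient = (n * RECIP_27) >> FRAC_BITS  # multiplication 1: n * (1/27)
    remainder = n - quotient * 27           # multiplication 2, then subtraction
    return quotient, remainder
```

For example, `div27(1000)` returns `(37, 1)`, the same as `divmod(1000, 27)`. In hardware the shift costs nothing (it is just wiring), which is why only the two multipliers and the subtractor remain.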
I have to admit, I got distracted a bit from the project but I think it was worth it because Troika could become the default hashing algorithm in IOTA.
I tried to squeeze the Troika FPGA core into a small $5 FPGA (the one I used on the PoWChip prototype; the lower black chip).
It took more time than expected and it isn’t very fast: 10k blocks take 11.6s (including SPI transfer times), and 1k blocks with 27 hash loops each take 2.6s (including SPI transfer times and auto-padding). The reason is that the resources were too scarce to calculate a complete hashing round within one clock cycle, so each round needs 55 clock cycles, and the SPI data transfer times come on top of that.
But I integrated some features which could partly compensate for the lower speed:
- The FPGA core supports auto-padding for 243-trit input vectors (almost everything with addresses and signing works on 243 trits). The core can automatically add an additional block with padding, as is done in the reference implementation of Troika.
- The core can do multiple hashing rounds (also with auto-padding). For instance, a private key has to be hashed several times in a loop for address generation, and the same goes for signing a transaction. This function makes it possible to hash input data e.g. 27 times in a loop without having to transfer new data to the core via SPI.
- The core can do nested hashing. There is only one hashing core, but it can happen that multiple hashes with different states have to be computed in a nested fashion. For this, a kind of stack is implemented onto which the state can be pushed and from which it can be popped.
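The loop hashing and the state stack from the list above can be illustrated with a toy software model. It is a sketch under loud assumptions: `hashlib.sha3_256` stands in for Troika (which is not in the Python standard library), padding is not modelled, and the class and method names (`HashCoreModel`, `push`, `pop`, `hash_loop`) are hypothetical and are not the register interface of the real core.

```python
import hashlib


class HashCoreModel:
    """Toy model of a single hashing core with a stack of saved states."""

    def __init__(self):
        self._state = hashlib.sha3_256()  # stand-in for the Troika state
        self._stack = []

    def absorb(self, data):
        """Feed more input into the currently active hash state."""
        self._state.update(data)

    def squeeze(self):
        """Return the digest of the active state and reset it."""
        digest = self._state.digest()
        self._state = hashlib.sha3_256()
        return digest

    def push(self):
        """Save the running state on the stack and start a fresh one."""
        self._stack.append(self._state)
        self._state = hashlib.sha3_256()

    def pop(self):
        """Drop the active state and resume the most recently saved one."""
        self._state = self._stack.pop()

    def hash_loop(self, data, rounds):
        """Hash data, then re-hash each digest, `rounds` times in total.

        Nothing is passed in or out between rounds, which mirrors the idea
        of avoiding a new SPI transfer per round.
        """
        for _ in range(rounds):
            self.absorb(data)
            data = self.squeeze()
        return data
```

A nested use would look like this: absorb the first part of an outer message, `push()`, run `hash_loop(key, 27)` for the inner value, `pop()`, then absorb the inner result and `squeeze()` the outer hash. Only one state is ever active at a time, as with the single hashing core.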
I released everything (FPGA-Core, PCB-design, software for the STM32 µC) here: https://github.com/shufps/troika_ice40
A secure element was successfully attached to the FPGA. An ATECC608A was used, which is very cheap but secure. It’s quite a new chip and the full datasheet is only available under NDA, but there is an open-source library from Microchip which could be used.