Most RNG theorists think in terms of software, and this shows in the algorithms they design, which implicitly assume that the underlying execution environment is word-based and sequential. However, FPGAs are bit-based and extremely parallel, so directly mapping algorithms from software to hardware generally leads to some sort of inefficiency. This inefficiency might manifest as one or more of the following problems:
Unpredictable generation rate: Many of the most efficient non-uniform RNGs use the rejection method, which do not guarantee to produce a new number on each execution of the main RNG iteration. This presents a significant problem in hardware simulations, as it means on some cycles the RNG will stall the downstream processing components, requiring extra hardware resources, and making certain types of static scheduling impossible.
Low clock frequency: A sequence of instructions may have to be mapped into a long combinatorial path. Typically this path is within a loop in the algorithm, meaning the path cannot be pipelined or retimed.
Partially unused logic elements: long combinatorial paths also mean that in many logic elements only the LUT will be used, wasting FF resources. Or in cases where the iteration can be pipelined, LUT resources will be wasted because FFs are needed to buffer data.
Poor utilisation of RAMs: many software algorithms assume that RAM is a large central resource, which can be read and written multiple times for each generated number. Such algorithms typically require storage to be split across multiple physical block RAMs, which are often a scarce resource in FPGAs.
Mismatch between word lengths: A particular problem with uniform generators is that they assume a specific word length (e.g. 32 or 64), and the algorithms assume that instructions have to operate on words, and the published algorithm parameterisations only cater to those word-lengths. For example, this often leads to FPGA developers instantiating two 32-bit generators because they need 48 bits every cycle.
My approach is to take the underlying idea from software generators, but to throw away the existing software algorithm. Hopefully we can then build a completely new generation algorithm or architecture, using the same basic idea, but which makes optimal use of the resources and capabilities of modern FPGAs.
Up: RNGs for FPGAs, Next: Uniform RNGs