Abstract
Modern-day computers are characterized by a striking contrast between the processing power of the CPU and the latency of main memory accesses. If the data processed is both large compared to processor caches and sparse or high-dimensional in nature, as is commonly the case in complex network research, the main memory latency can become a performance bottleneck. In this article, we present a cache-efficient data structure, a variant of a linear probing hash table, for representing edge sets of such networks. Performance benchmarks show that it clearly outperforms its commonly used counterparts in this application. In addition, its memory footprint exceeds the absolute minimum by only a small constant factor. The practical usability of our approach has been demonstrated in the study of very large real-world networks.
Notes
1. The n in the expression for the fill rate should also include the elements marked as removed.
2. This case does not correspond to any constant α.
3. The table sizes used in this implementation are primes close to 2/3 times successive powers of two. The smallest table size available is 53.
4. In fact, Fibonacci hashing can be implemented very efficiently with bit shifts that make use of this power.
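
To illustrate the bit-shifting remark in Note 4, here is a minimal sketch of Fibonacci (multiplicative) hashing. It is not taken from the implementation described in this article. It assumes 64-bit integer keys and a table size of exactly 2**k slots. The names fib_hash, WORD_BITS and FIB_MULT are hypothetical, and the multiplier is the commonly used odd integer nearest to 2**64 divided by the golden ratio.

```python
# Sketch only: Fibonacci hashing for a table with 2**k slots.
# Assumptions (not from the article): 64-bit keys, power-of-two table size.

WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1

# Odd integer closest to 2**64 / golden ratio (a widely used constant).
FIB_MULT = 0x9E3779B97F4A7C15


def fib_hash(key: int, k: int) -> int:
    """Map an integer key to a slot index in the range [0, 2**k)."""
    if not 1 <= k <= WORD_BITS:
        raise ValueError("k must be between 1 and WORD_BITS")
    # Python integers are unbounded, so mask to emulate 64-bit wraparound.
    product = (key * FIB_MULT) & MASK
    # The top k bits of the product form the index; a single right shift
    # by (WORD_BITS - k) extracts them, so no division or modulo is needed.
    return product >> (WORD_BITS - k)
```

With a table of 2**10 slots, for example, fib_hash(key, 10) costs one multiplication and one shift per lookup. Whether this matches the exact variant the note has in mind cannot be determined from the note alone.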