Replace CityHash with a more modern hash function
The design of CityHash, the hash function used in NameTree and DeadNonceList, is somewhat outdated. While it's still very fast, the hashes produced, especially by the seed-less
CityHash64 variant, have a number of quality issues (see SMhasher for details). More modern hash functions produce better quality hashes and can be as fast as CityHash, if not faster.
After a quick survey and a few simple benchmarks, I came up with three candidates that I believe are the most promising potential replacements for CityHash in NFD. All three produce high quality hashes.
- xxHash, specifically the new
XXH3_64bitsvariant. Speed comparable to
CityHash64(sometimes faster, sometimes slower, but not by much). When compiled with
-march=nativeon an AVX2-capable CPU, it becomes almost always faster than CityHash (but marginally so).
- wyhash. This seems to be the fastest in my tests.
- t1ha, specifically the
t1ha2_atoncevariant. Slightly, but consistently, slower than all previous functions (including CityHash). Also provides a pair of endian-specific
t1ha1_*variants that are supposed to be faster but lower quality than
t1ha2; however, in my tests I didn't see any measurable speed difference so I guess there's no reason to consider the
I explicitly did not include SipHash. While it provides stronger security guarantees, I don't think it makes much sense to use SipHash just in a couple of places (NameTree and DNL) while everywhere else we use the standard C++ hash table implementation with its default hash function. SipHash is also much slower than the proposed alternatives.
Note that I've only considered 64-bit hashes here. The above functions do work on 32-bit architectures as well, but they will typically be slower than dedicated 32-bit hash functions. In 2021 I don't think we should spend too much time optimizing for obsolete architectures such as x86 and arm, but if someone wanted to look into this, it would be trivial to select the hash function at compile time based on the target platform (we don't need stability/portability of the hash value in NFD).