Feature #5175
openReplace CityHash with a more modern hash function
0%
Description
The design of CityHash, the hash function used in NameTree and DeadNonceList, is somewhat outdated. While it's still very fast, the hashes produced, especially by the seed-less CityHash64
variant, have a number of quality issues (see SMhasher for details). More modern hash functions produce better quality hashes and can be as fast as CityHash, if not faster.
After a quick survey and a few simple benchmarks, I came up with three candidates that I believe are the most promising potential replacements for CityHash in NFD. All three produce high quality hashes.
- xxHash, specifically the new
XXH3_64bits
variant. Speed comparable toCityHash64
(sometimes faster, sometimes slower, but not by much). When compiled with-march=native
on an AVX2-capable CPU, it becomes almost always faster than CityHash (but marginally so). - wyhash. This seems to be the fastest in my tests.
- t1ha, specifically the
t1ha2_atonce
variant. Slightly, but consistently, slower than all previous functions (including CityHash). Also provides a pair of endian-specifict1ha1_*
variants that are supposed to be faster but lower quality thant1ha2
; however, in my tests I didn't see any measurable speed difference so I guess there's no reason to consider thet1ha1_*
variants.
I explicitly did not include SipHash. While it provides stronger security guarantees, I don't think it makes much sense to use SipHash just in a couple of places (NameTree and DNL) while everywhere else we use the standard C++ hash table implementation with its default hash function. SipHash is also much slower than the proposed alternatives.
Note that I've only considered 64-bit hashes here. The above functions do work on 32-bit architectures as well, but they will typically be slower than dedicated 32-bit hash functions. In 2021 I don't think we should spend too much time optimizing for obsolete architectures such as x86 and arm, but if someone wanted to look into this, it would be trivial to select the hash function at compile time based on the target platform (we don't need stability/portability of the hash value in NFD).