Task #1541
closedInvestigate applicability of unordered_map for NameTree
Added by Alex Afanasyev over 10 years ago. Updated over 8 years ago.
60%
Description
Depending on how our hash table is used, either std::unordered_map
or boost::intrusive::unordered_map
may simplify hash table implementation.
Updated by Alex Afanasyev over 10 years ago
- Target version changed from v0.2 to Unsupported
Updated by Junxiao Shi about 10 years ago
- Subject changed from Investigate applicability of boost::unordered_map or boost::intrusive::unordered_map for NameTree to Investigate applicability of Boost unordered_map for NameTree
- Priority changed from Normal to Low
- Target version changed from Unsupported to v0.3
- Start date deleted (
04/25/2014)
@Haowei agrees to work on this Task in 0.3 timeline.
Updated by Junxiao Shi about 10 years ago
- Subject changed from Investigate applicability of Boost unordered_map for NameTree to Investigate applicability of unordered_map for NameTree
- Description updated (diff)
Change boost::unordered_map
to std::unordered_map
because it becomes available with C++11.
Updated by Haowei Yuan almost 10 years ago
- Status changed from New to In Progress
Updated by Haowei Yuan over 9 years ago
What is the Table structure used in the current Content Store implementation? Is it the same as defined in http://www.boost.org/doc/libs/1_48_0/boost/unordered/detail/table.hpp?
Updated by Junxiao Shi over 9 years ago
ContentStore index requires an ordered container type. Currently std::set
is chosen. It's benchmarked to be faster than Ilya's skip list.
Note that ContentStore isn't attached to NameTree.
Updated by Haowei Yuan over 9 years ago
- % Done changed from 40 to 60
The std::unordered_map
supports customized hash function, and a equal_to
predicate can be provided to determine if two keys match. As a result, it seems possible to use std::unordered_map
to achieve hash table functions. However, I have noticed two performance related issues so far.
lookup()
The name tree lookup keys are the name prefix strings. To accelerate lookup, in the current hash table implementation, each hash table entry stores the full hash value, and string matching is required only if the hash values match.
It is not clear if the same optimization can be supported by std::unordered_map
. For instance, to use the equal_to
predicate, we could customize a key structure that contains both the hash value and the key. However, during a lookup, the hash value needs to be computed before calling the unordered_map.find()
function, which wastes CPU cycles as the hash values would essentially be computed twice.
The following is an example using a customized key (original link: http://stackoverflow.com/questions/17016175/c-unordered-map-using-a-custom-class-type-as-the-key)
struct Key
{
std::string first;
std::string second;
int third;
bool operator==(const Key &other) const
{ return (first == other.first
&& second == other.second
&& third == other.third);
}
};
int main()
{
std::unordered_map<Key,std::string> m6 = {
{ {"John", "Doe", 12}, "example"},
{ {"Mary", "Sue", 21}, "another"}
};
}
- Hash table
resize()
The other benefit of storing hash values in the hash table is that, during resize()
, we don't need to recompute the hash values.
The std::unordered_map
rehash()
function would require rehash all the items stored in the hash table. When the table size is large, the rehash()
function may take considerable amount of time. So far, I have not found alternative methods for rehash().
I will continue exploring ways to accelerate lookup()
and resize()
. Please let me know if you are aware of any alternative methods.
Updated by Junxiao Shi over 9 years ago
during a lookup, the hash value needs to be computed before calling the
unordered_map.find()
function, which wastes CPU cycles as the hash values would essentially be computed twice
I don't understand this statement.
Consider the following key type:
class Key
{
public:
// implicit conversion from Name
Key(const Name& name)
: hash(std::hash<Name>()(name))
, m_name(name)
{
}
// implicit conversion to Name
operator Name()
{
return m_name;
}
public:
const size_t hash;
private:
const Name m_name;
};
namespace std {
template<>
inline size_t
hash<Key>::operator()(const Key& key)
{
return key.hash;
}
} // namespace std
When doing a lookup, NameTree constructs a Key
instance which involves one Name hash computation (call std::hash<Name>::operator()
).
There's no more Name hash computation inside the unordered_map
.
During a rehash, there is no Name hash computation, because std::hash<Key>::operator()
never invokes Name hash computation.
Updated by Haowei Yuan over 9 years ago
Thank you, Junxiao, this is very helpful.
I will try to go though the required functions with this Key structure, and see if all of them can be supported.
Updated by Davide Pesavento about 9 years ago
- Target version changed from v0.3 to v0.5
any news here?
Updated by Haowei Yuan about 9 years ago
Davide Pesavento wrote:
any news here?
Not much recently, I will post some progress soon.
Updated by Haowei Yuan almost 9 years ago
I will write some code to verify all the existing functions can be supported this week, and estimate how much work will be involved if we decide to take this direction.
Updated by Junxiao Shi over 8 years ago
- Status changed from In Progress to Abandoned
20160526 call decides to abandon this task, because there's no evidence that Haowei's hash table implementation has bugs, and switching to STL is likely to worsen performance.