Task #1541

Investigate applicability of unordered_map for NameTree

Added by Alex Afanasyev over 10 years ago. Updated over 8 years ago.

Status: Abandoned
Priority: Low
Assignee:
Category: Tables
Target version:
Start date:
Due date:
% Done: 60%
Estimated time:
Description

Depending on how our hash table is used, either std::unordered_map or boost::intrusive::unordered_map may simplify the hash table implementation.

Actions #1

Updated by Junxiao Shi over 10 years ago

  • Description updated (diff)
Actions #2

Updated by Alex Afanasyev over 10 years ago

  • Target version changed from v0.2 to Unsupported
Actions #3

Updated by Junxiao Shi about 10 years ago

  • Subject changed from Investigate applicability of boost::unordered_map or boost::intrusive::unordered_map for NameTree to Investigate applicability of Boost unordered_map for NameTree
  • Priority changed from Normal to Low
  • Target version changed from Unsupported to v0.3
  • Start date deleted (04/25/2014)

@Haowei agrees to work on this Task in 0.3 timeline.

Actions #4

Updated by Junxiao Shi about 10 years ago

  • Subject changed from Investigate applicability of Boost unordered_map for NameTree to Investigate applicability of unordered_map for NameTree
  • Description updated (diff)

Change boost::unordered_map to std::unordered_map because it becomes available with C++11.

Actions #5

Updated by Haowei Yuan almost 10 years ago

  • Status changed from New to In Progress
Actions #6

Updated by Haowei Yuan over 9 years ago

  • % Done changed from 0 to 40
Actions #7

Updated by Haowei Yuan over 9 years ago

What is the Table structure used in the current Content Store implementation? Is it the same as defined in http://www.boost.org/doc/libs/1_48_0/boost/unordered/detail/table.hpp?

Actions #8

Updated by Junxiao Shi over 9 years ago

ContentStore index requires an ordered container type. Currently std::set is chosen. It's benchmarked to be faster than Ilya's skip list.

Note that ContentStore isn't attached to NameTree.

Actions #9

Updated by Haowei Yuan over 9 years ago

  • % Done changed from 40 to 60

std::unordered_map supports a customized hash function, and an equal_to predicate can be provided to determine whether two keys match. As a result, it seems possible to implement the hash table functionality with std::unordered_map. However, I have noticed two performance-related issues so far.

  • lookup(): The name tree lookup keys are name prefixes. To accelerate lookup, the current hash table implementation stores the full hash value in each entry, and string matching is required only if the hash values match.

It is not clear whether the same optimization can be supported by std::unordered_map. For instance, to use the equal_to predicate, we could customize a key structure that contains both the hash value and the key. However, during a lookup, the hash value needs to be computed before calling the unordered_map.find() function, which wastes CPU cycles because the hash value would essentially be computed twice.

The following is an example using a customized key (adapted from http://stackoverflow.com/questions/17016175/c-unordered-map-using-a-custom-class-type-as-the-key)

#include <functional>
#include <string>
#include <unordered_map>

struct Key
{
  std::string first;
  std::string second;
  int         third;

  bool operator==(const Key& other) const
  {
    return first == other.first &&
           second == other.second &&
           third == other.third;
  }
};

// std::unordered_map also needs a hash functor for the custom key
struct KeyHasher
{
  std::size_t operator()(const Key& k) const
  {
    return std::hash<std::string>()(k.first) ^
           (std::hash<std::string>()(k.second) << 1) ^
           (std::hash<int>()(k.third) << 2);
  }
};

int main()
{
  std::unordered_map<Key, std::string, KeyHasher> m6 = {
    { {"John", "Doe", 12}, "example"},
    { {"Mary", "Sue", 21}, "another"}
  };
}
  • Hash table resize():

The other benefit of storing hash values in the hash table is that, during resize(), we don't need to recompute the hash values.

The std::unordered_map rehash() function rehashes all the items stored in the table. When the table is large, rehash() may take a considerable amount of time. So far, I have not found an alternative method for rehash().

I will continue exploring ways to accelerate lookup() and resize(). Please let me know if you are aware of any alternative methods.

Actions #10

Updated by Junxiao Shi over 9 years ago

during a lookup, the hash value needs to be computed before calling the unordered_map.find() function, which wastes CPU cycles as the hash values would essentially be computed twice

I don't understand this statement.

Consider the following key type:

class Key
{
public:
  // implicit conversion from Name;
  // computes the Name hash exactly once
  Key(const Name& name)
    : hash(std::hash<Name>()(name))
    , m_name(name)
  {
  }

  // implicit conversion to Name
  operator Name() const
  {
    return m_name;
  }

public:
  const size_t hash;

private:
  const Name m_name;
};

namespace std {

// full specialization of std::hash for Key:
// return the cached hash instead of recomputing it
template<>
struct hash<Key>
{
  size_t operator()(const Key& key) const
  {
    return key.hash;
  }
};

} // namespace std

When doing a lookup, NameTree constructs a Key instance, which involves one Name hash computation (a call to std::hash<Name>::operator()).

There is no further Name hash computation inside the unordered_map.

During a rehash, there is no Name hash computation, because std::hash<Key>::operator() never invokes Name hash computation.

Actions #11

Updated by Haowei Yuan over 9 years ago

Thank you, Junxiao, this is very helpful.
I will try to go through the required functions with this Key structure, and see if all of them can be supported.

Actions #12

Updated by Davide Pesavento about 9 years ago

  • Target version changed from v0.3 to v0.5

any news here?

Actions #13

Updated by Haowei Yuan about 9 years ago

Davide Pesavento wrote:

any news here?

Not much recently, I will post some progress soon.

Actions #14

Updated by Haowei Yuan almost 9 years ago

This week I will write some code to verify that all the existing functions can be supported, and estimate how much work would be involved if we decide to take this direction.

Actions #15

Updated by Junxiao Shi over 8 years ago

  • Status changed from In Progress to Abandoned

The 20160526 call decided to abandon this task, because there is no evidence that Haowei's hash table implementation has bugs, and switching to the STL is likely to worsen performance.
