Task #3610
closedFIB+PIT profiling
Added by Junxiao Shi over 8 years ago. Updated almost 7 years ago.
100%
Description
Profile the performance of FIB and PIT, and understand where the bottlenecks are.
Files
callgrind.pit-fib-performance-profiling.out (586 KB) callgrind.pit-fib-performance-profiling.out | Callgrind profiling output file for pit-fib-performance test on a ONL Host8core machine | Chengyu Fan, 08/29/2016 01:39 PM | |
callgrind.pit-fib-performance-profiling.out (66.1 KB) callgrind.pit-fib-performance-profiling.out | Callgrind profiling output file for the portion between two time::steady_clock::now() calls in pit-fib-performance test (9685cc5682c9e1d642747649294f469e316733ab) on a ONL Host8core machine | Chengyu Fan, 08/30/2016 01:22 PM | |
call-graph-for-pit-fib-operations.png (175 KB) call-graph-for-pit-fib-operations.png | The profiling call graph for pit-fib operations (exclude generatePacketsAndPopulateFib()) | Chengyu Fan, 08/30/2016 02:06 PM |
Updated by Junxiao Shi over 8 years ago
- Blocked by Feature #3571: FIB+PIT benchmark added
Updated by Junxiao Shi over 8 years ago
- Status changed from In Progress to New
- Assignee deleted (
Zhuo Li) - Start date deleted (
04/29/2016)
Beichuan says Zhuo will not work on this recently.
Updated by Junxiao Shi over 8 years ago
- Assignee set to Chengyu Fan
Chengyu agrees to work on this issue but he cannot start until August.
Updated by Chengyu Fan about 8 years ago
- File callgrind.pit-fib-performance-profiling.out callgrind.pit-fib-performance-profiling.out added
- % Done changed from 0 to 100
I have run the pit_fib performance test on an ONL Host8core machine. The Callgrind profiling output file is uploaded. Please use kcachegrind to open it for details.
The pit-fib-benchmark.cpp simulates FIB and PIT operations on one million Interest-Data exchanges. It generates packets and populates FIB before the actual tests. The tests include finding longest prefix match in FIB, finding matched Data in PIT and erasing PIT entries.
The results show that the FIB and PIT tests take 11374372 microseconds. The whole pit-fib-benchmark takes 25.655s. This means more than half of the time is used by the generatePacketsAndPopulateFib() function. FIB and PIT operations (exclude generatePacketsAndPopulateFib()) can achieve 87,950 Interest-Data exchanges per second.
More information is given below:
(1) generatePacketsAndPopulateFib() takes 49.15% of the running time.
Its child, the ndn::Name::Name constructor uses around 26.06%. The other child nfd::fib::insert() takes 8.74%
(2) Pit as a whole takes 44.46%
nfd::Pit::findOrInsert() takes 19.92%
nfd::Pit::findAllDataMatches() takes 16.37%
nfd::Pit::erase() uses the other 8.32%
(3) the most time-consuming ndn function is ndn::Name::wireEncode() 21.51%.
It is called by both nfd::name_tree::computeHash() 14.00% and nfd::name_tree::computeHashSet() 7.52%
(4) NameTree:
a. the most time-consuming NameTree function is nfd::NameTree::lookup() 26.01%.
The Fib::insert() contributes 8.74% (called by generatePacketsAndPopulateFib()). The rest is from Pit::findOrInsert() 19.92%.
b. another time-consuming function is nfd::NameTree::findAllMatches() 12.90%. They are from nfd::Pit::findAllDataMatches()
Since the ndn::Name::wireEncode() alone uses 21.51% of time, I have tried to use TCMalloc library to optimize the memory management. The results confirmed my assumption. The FIB and PIT operations go down to 8499654 microseconds. This means FIB and PIT operations can achieve 117,647 Interest-Data exchanges per second.
Updated by Junxiao Shi about 8 years ago
- Status changed from New to Feedback
I should have said this earlier, but generatePacketsAndPopulateFib
should not be part of the profiling: it's the initial setup.
Can you report only the portion between two time::steady_clock::now()
calls?
https://github.com/named-data/NFD/blob/dbef6dc8e2e32160b9e826ca0f32b9a00bb759df/tests/other/pit-fib-benchmark.cpp#L118-L133
Updated by Davide Pesavento about 8 years ago
Please provide the exact versions of NFD and ndn-cxx (git commit SHA) used to perform the test in note-6.
Updated by Chengyu Fan about 8 years ago
The number I reported is the time between two time::steady_clock::now() calls.
Since I have trouble to install openssl library for ndn-cxx on ONL, I used the older version. The ndn-cxx is 0.4.1 (commit SHA = 2e52d7cab4e03631db7f6c89a2daf7de590dd29e). The NFD is 4f1afaca16ba02b40651f62e0f28e56d0b7cbf9f, which is the commit for face system benchmark, but I modified the pit-fib-benchmark.cpp code as https://gerrit.named-data.net/#/c/3021/.
Please let me know if I chose the wrong version.
Without TCMalloc, the time between two time::steady_clock::now() calls is 11374372 microseconds. With TCMalloc, the time is 8499654 microseconds.
Updated by Junxiao Shi about 8 years ago
It's necessary to rerun the profiling with latest version, because I have changed NameTree
significantly.
The profiling should only report function calls between two now
s; it should exclude anything under initial setup procedure.
Updated by Chengyu Fan about 8 years ago
- File callgrind.pit-fib-performance-profiling.out callgrind.pit-fib-performance-profiling.out added
- File call-graph-for-pit-fib-operations.png call-graph-for-pit-fib-operations.png added
The profiling results for the latest NFD (9685cc5682c9e1d642747649294f469e316733ab) is uploaded. The ndn-cxx version used for this test is 667370f102f7953968a6d0d60a782cf0ef33d9d5. The profiling is running for the portion between two time::steady_clock::now() calls. Please refer to the uploaded file call-graph-for-pit-fib-operations.png for the call graph.
(1) nfd::Pit::findOrInsert() takes 42.33% of the running time
(2) nfd::Pit::findAllDataMatches() takes 39.51%
(3) nfd::Pit::erase() uses the other 16.23%
(4) ndn::Name::wireEncode() uses 40.27%, and it is called by nfd::name_tree::computesHash()
To measure the time between two time::steady_clock::now() calls, I run 10 experiments for both without TCMalloc and with TCMalloc.
Without TCMalloc, the time is in range (7833889, 10918891) microseconds, and the median is 8606859.5 microseconds. With TCMalloc, the time is in range (6012537, 8827668) microseconds, and the median is 6935110.5 microseconds.
Updated by Davide Pesavento about 8 years ago
Chengyu Fan wrote:
The profiling is running for the portion between two time::steady_clock::now() calls.
How did you do that? Did you put CALLGRIND_{START,STOP}_INSTRUMENTATION
macros in the code?
Updated by Chengyu Fan about 8 years ago
Davide Pesavento wrote:
Chengyu Fan wrote:
The profiling is running for the portion between two time::steady_clock::now() calls.
How did you do that? Did you put
CALLGRIND_{START,STOP}_INSTRUMENTATION
macros in the code?
Yes, I added these macros in the pit-fib-benchmark.cpp code. Please let me know if there are any issues
Updated by Davide Pesavento about 8 years ago
That's good, it's what I had in mind. We should probably add those calls to the committed benchmark code for posterity's sake. I'll prepare a patch.
Updated by Davide Pesavento about 8 years ago
Updated by Davide Pesavento about 8 years ago
Junxiao asks on gerrit:
Can you post on Redmine the command line to generate a callgrind report? I need to test this.
$ valgrind --tool=callgrind --instr-atstart=no ./build/whatever
Updated by Davide Pesavento almost 7 years ago
- Category deleted (
Integration Tests) - Status changed from Feedback to Closed