Task #3610: FIB+PIT profiling - NFD - NDN project issue tracking system

Actions

Copy link

Task #3610

closed

FIB+PIT profiling

Added by Junxiao Shi about 9 years ago. Updated over 7 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Chengyu Fan

Category:

Target version:

v0.5

Start date:

Due date:

% Done:

100%

Estimated time:

12.00 h

Description

Profile the performance of FIB and PIT, and understand where the bottlenecks are.

Files

Download all files

callgrind.pit-fib-performance-profiling.out (586 KB) callgrind.pit-fib-performance-profiling.out	Callgrind profiling output file for pit-fib-performance test on a ONL Host8core machine	Chengyu Fan, 08/29/2016 01:39 PM
callgrind.pit-fib-performance-profiling.out (66.1 KB) callgrind.pit-fib-performance-profiling.out	Callgrind profiling output file for the portion between two time::steady_clock::now() calls in pit-fib-performance test (9685cc5682c9e1d642747649294f469e316733ab) on a ONL Host8core machine	Chengyu Fan, 08/30/2016 01:22 PM
call-graph-for-pit-fib-operations.png (175 KB) call-graph-for-pit-fib-operations.png	The profiling call graph for pit-fib operations (exclude generatePacketsAndPopulateFib())	Chengyu Fan, 08/30/2016 02:06 PM

Related issues 1 (0 open — 1 closed)

Actions

Copy link

Updated by Junxiao Shi about 9 years ago

Blocked by Feature #3571: FIB+PIT benchmark added

Actions

Copy link

Updated by Zhuo Li about 9 years ago

Start date set to 04/29/2016

Actions

Copy link

Updated by Zhuo Li about 9 years ago

Status changed from New to In Progress

Actions

Copy link

Updated by Junxiao Shi almost 9 years ago

Status changed from In Progress to New
Assignee deleted (~~Zhuo Li~~)
Start date deleted (~~04/29/2016~~)

Beichuan says Zhuo will not work on this recently.

Actions

Copy link

Updated by Junxiao Shi almost 9 years ago

Assignee set to Chengyu Fan

Chengyu agrees to work on this issue but he cannot start until August.

Actions

Copy link

Updated by Chengyu Fan almost 9 years ago

File callgrind.pit-fib-performance-profiling.out callgrind.pit-fib-performance-profiling.out added
% Done changed from 0 to 100

I have run the pit_fib performance test on an ONL Host8core machine. The Callgrind profiling output file is uploaded. Please use kcachegrind to open it for details.

The pit-fib-benchmark.cpp simulates FIB and PIT operations on one million Interest-Data exchanges. It generates packets and populates FIB before the actual tests. The tests include finding longest prefix match in FIB, finding matched Data in PIT and erasing PIT entries.

The results show that the FIB and PIT tests take 11374372 microseconds. The whole pit-fib-benchmark takes 25.655s. This means more than half of the time is used by the generatePacketsAndPopulateFib() function. FIB and PIT operations (exclude generatePacketsAndPopulateFib()) can achieve 87,950 Interest-Data exchanges per second.

More information is given below:


(1) generatePacketsAndPopulateFib() takes 49.15% of the running time.
     Its child, the ndn::Name::Name constructor uses around 26.06%. The other child nfd::fib::insert() takes 8.74%

(2) Pit as a whole takes 44.46%
     nfd::Pit::findOrInsert() takes 19.92%
     nfd::Pit::findAllDataMatches() takes 16.37%
     nfd::Pit::erase() uses the other 8.32%

(3) the most time-consuming ndn function is ndn::Name::wireEncode() 21.51%.
     It is called by both nfd::name_tree::computeHash() 14.00% and nfd::name_tree::computeHashSet() 7.52%

(4) NameTree:
    a. the most time-consuming NameTree function is nfd::NameTree::lookup() 26.01%.
       The Fib::insert() contributes 8.74% (called by generatePacketsAndPopulateFib()). The rest is from Pit::findOrInsert() 19.92%.

    b. another time-consuming function is nfd::NameTree::findAllMatches() 12.90%. They are from nfd::Pit::findAllDataMatches()

Since the ndn::Name::wireEncode() alone uses 21.51% of time, I have tried to use TCMalloc library to optimize the memory management. The results confirmed my assumption. The FIB and PIT operations go down to 8499654 microseconds. This means FIB and PIT operations can achieve 117,647 Interest-Data exchanges per second.

Actions

Copy link

Updated by Junxiao Shi almost 9 years ago

Status changed from New to Feedback

I should have said this earlier, but generatePacketsAndPopulateFib should not be part of the profiling: it's the initial setup.

Can you report only the portion between two time::steady_clock::now() calls?

https://github.com/named-data/NFD/blob/dbef6dc8e2e32160b9e826ca0f32b9a00bb759df/tests/other/pit-fib-benchmark.cpp#L118-L133

Actions

Copy link

Updated by Davide Pesavento almost 9 years ago

Please provide the exact versions of NFD and ndn-cxx (git commit SHA) used to perform the test in note-6.

Actions

Copy link

Updated by Chengyu Fan almost 9 years ago

The number I reported is the time between two time::steady_clock::now() calls.

Since I have trouble to install openssl library for ndn-cxx on ONL, I used the older version. The ndn-cxx is 0.4.1 (commit SHA = 2e52d7cab4e03631db7f6c89a2daf7de590dd29e). The NFD is 4f1afaca16ba02b40651f62e0f28e56d0b7cbf9f, which is the commit for face system benchmark, but I modified the pit-fib-benchmark.cpp code as https://gerrit.named-data.net/#/c/3021/.

Please let me know if I chose the wrong version.

Without TCMalloc, the time between two time::steady_clock::now() calls is 11374372 microseconds. With TCMalloc, the time is 8499654 microseconds.

Actions

Copy link

#10

Updated by Junxiao Shi almost 9 years ago

It's necessary to rerun the profiling with latest version, because I have changed NameTree significantly.

The profiling should only report function calls between two nows; it should exclude anything under initial setup procedure.

Actions

Copy link Download all files

#11

Updated by Chengyu Fan almost 9 years ago

File callgrind.pit-fib-performance-profiling.out callgrind.pit-fib-performance-profiling.out added
File call-graph-for-pit-fib-operations.png call-graph-for-pit-fib-operations.png added

The profiling results for the latest NFD (9685cc5682c9e1d642747649294f469e316733ab) is uploaded. The ndn-cxx version used for this test is 667370f102f7953968a6d0d60a782cf0ef33d9d5. The profiling is running for the portion between two time::steady_clock::now() calls. Please refer to the uploaded file call-graph-for-pit-fib-operations.png for the call graph.

(1) nfd::Pit::findOrInsert() takes 42.33% of the running time

(2) nfd::Pit::findAllDataMatches() takes 39.51%

(3) nfd::Pit::erase() uses the other 16.23%

(4) ndn::Name::wireEncode() uses 40.27%, and it is called by nfd::name_tree::computesHash()

To measure the time between two time::steady_clock::now() calls, I run 10 experiments for both without TCMalloc and with TCMalloc.
Without TCMalloc, the time is in range (7833889, 10918891) microseconds, and the median is 8606859.5 microseconds. With TCMalloc, the time is in range (6012537, 8827668) microseconds, and the median is 6935110.5 microseconds.

Actions

Copy link

#12

Updated by Davide Pesavento almost 9 years ago

Chengyu Fan wrote:

The profiling is running for the portion between two time::steady_clock::now() calls.

How did you do that? Did you put CALLGRIND_{START,STOP}_INSTRUMENTATION macros in the code?

Actions

Copy link

#13

Updated by Chengyu Fan almost 9 years ago

Davide Pesavento wrote:

Chengyu Fan wrote:

The profiling is running for the portion between two time::steady_clock::now() calls.

How did you do that? Did you put CALLGRIND_{START,STOP}_INSTRUMENTATION macros in the code?

Yes, I added these macros in the pit-fib-benchmark.cpp code. Please let me know if there are any issues

Actions

Copy link

#14