Task #1621

closed

Performance profiling

Added by Junxiao Shi over 10 years ago. Updated about 10 years ago.

Status: Closed
Priority: Normal
Category: Integration Tests
% Done: 100%

Description

Profile the performance of the NFD forwarder, and understand where the bottlenecks are.


Actions #1

Updated by Anonymous over 10 years ago

  • Assignee set to Chengyu Fan
Actions #2

Updated by Chengyu Fan over 10 years ago

What is the expected result format for this task?
When the test is done with gprof (or callgrind/kcachegrind, or whatever tool is used) on ONL, should I produce a report for it?

Actions #3

Updated by Junxiao Shi over 10 years ago

The outcome should be a report on which areas of NFD are the current bottlenecks.

Examples:

  • PIT match algorithm for Interests with PublisherPublicKeyLocator consumes 40% of Interest processing time
  • Memory allocation for Data packets takes 20% of total CPU time
Actions #4

Updated by Chengyu Fan over 10 years ago

Besides that, I assume that the target nfd is the latest one, not v0.1.0, correct?

ONL has nfd v0.1.0 installed, and I don't have the privileges to install any library. My plan is to install the latest version in my home directory and do the performance profiling on it.

Actions #5

Updated by Chengyu Fan over 10 years ago

  • Status changed from New to In Progress
Actions #6

Updated by Chengyu Fan over 10 years ago

I did this task using John's scripts. The scripts launch the NDN traffic generator to continuously produce traffic. For this task I used Callgrind, not gprof: gprof needs some code modification, so it is inconvenient.
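As a rough illustration of this kind of run (this is not the attached manual, nor John's scripts), the sketch below assembles a Callgrind invocation and the follow-up `callgrind_annotate` call in Python. The binary name `nfd` and the output file name `callgrind.nfd.out` are assumptions made for the example; the function only builds the argument lists and does not execute anything.

```python
import shlex


def callgrind_commands(binary="nfd", out_file="callgrind.nfd.out"):
    """Return (profile_cmd, annotate_cmd) as argument lists.

    `binary` and `out_file` are illustrative assumptions, not values taken
    from the ONL setup described in this task.
    """
    profile_cmd = [
        "valgrind",
        "--tool=callgrind",                  # select the Callgrind tool
        f"--callgrind-out-file={out_file}",  # file the profile is written to
        binary,
    ]
    # callgrind_annotate prints a per-function cost summary of the profile.
    annotate_cmd = ["callgrind_annotate", out_file]
    return profile_cmd, annotate_cmd


if __name__ == "__main__":
    for cmd in callgrind_commands():
        print(shlex.join(cmd))
```

Unlike gprof, this approach runs an unmodified binary under Valgrind, at the price of a much slower execution.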

I ran the tests with different numbers of name segments and different segment lengths. These introduce only minor differences. The results are described below.

  1. According to the Callgrind output, the libcrypto functions are the bottlenecks:

    (1) Each of three libcrypto functions (Baseline_MultiplyTop16, Baseline_Square16, and Baseline_MultiplyBottom) takes more than 4% of the running time.

    (2) The functions DivideByPower2Mod and AlmostInverse are called only hundreds of times, but they are expensive: both take about 2%.

    I think this is a reasonable result, because crypto operations are relatively slow. The whole libcrypto object takes about 25% of the running time in my test.

  2. For the nfd library, I did not find obvious bottlenecks.

    (1) All the top-cost functions are called a large number of times. For example, ndn::Block::value_size() takes about 2%~4% of the time, but it is called many times by the wireEncode and name component manipulation functions. The same goes for ndn::name::size() et al.

    (2) ndn::name::compare and ndn::block::parse are relatively slow, but each takes less than 1% of the running time in my tests.

    (3) When I use long name segments, nfd::cs::find and nfd::cs::insertToSkipList take longer than in the other tests.

    Basically, the results are reasonable.

I have also attached the manual describing how to run the test.

Any suggestions?

Actions #7

Updated by Junxiao Shi over 10 years ago

  • Status changed from In Progress to Resolved

I think the crypto functions are called only for management.
They are not used for every Interest/Data packet.
Is this correct?

Actions #8

Updated by Chengyu Fan over 10 years ago

I think so.

According to the output, the nfd functions in the caller list (indirect callers) are nfd::AppFace::sign, ndn::keyChain::sign, ndn::keyChain::signPacketWrapper, and ndn::SecTpmFile::signInTpm, as well as nfd::ManagerBase::sendResponse, nfd::FaceManager::onRemoveFace, etc. I believe they come from the channel setup process.

All of this data is from the NDN routers that connect to other routers, not from the ones that connect directly to applications. Let me analyze that data as well.

Actions #9

Updated by Chengyu Fan over 10 years ago

I checked the callgrind output from the nfd on a server and on a client. No obvious bottlenecks were found.

Crypto functions are not the bottleneck on nfd nodes that connect directly to clients/servers, because those nodes only need to set up the channel to the nfd routers.
There are lots of calls to the ndn::name class, which is reasonable, because the clients keep sending Interests and fetching Data.

nfd::NameTree::eraseEntryIfEmpty and nfd::Cs::find on the client node are relatively slow. They have a small call count (around 14,000), and they take around 0.5% and 0.3% of the running time respectively. (In contrast, a function like ndn::name::at takes about 2.66% of the running time, with a call count above 7 million.) But since they are not called frequently, they are not bottlenecks.

The same holds for nfd::Cs::insertToSkipList and nfd::Cs::find on the server node. They have a small call count (around 20,000), and the running time for each is around 0.2%.
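To make the "relatively slow" comparison concrete, here is a minimal sketch that divides each function's share of the running time by its call count and compares the result with ndn::name::at. The call counts are the rounded figures quoted above ("around 14,000", "above 7 million"), so the ratios are order-of-magnitude estimates only, not measurements.

```python
# (share of running time in percent, approximate number of calls),
# taken from the client-node figures quoted in this comment.
SAMPLES = {
    "nfd::NameTree::eraseEntryIfEmpty": (0.5, 14_000),
    "nfd::Cs::find": (0.3, 14_000),
    "ndn::name::at": (2.66, 7_000_000),
}


def share_per_call(name):
    """Percent of total running time attributable to one call of `name`."""
    pct, calls = SAMPLES[name]
    return pct / calls


def relative_to(name, baseline="ndn::name::at"):
    """How many times more expensive one call of `name` is than one call of `baseline`."""
    return share_per_call(name) / share_per_call(baseline)


if __name__ == "__main__":
    for name in SAMPLES:
        print(f"{name}: {relative_to(name):.1f}x the per-call share of ndn::name::at")
```

With these rounded inputs, one call to eraseEntryIfEmpty costs roughly 94 times as much as one call to ndn::name::at, and one call to Cs::find roughly 56 times as much, yet their total share stays well under 1% because they are called so rarely.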

Actions #10

Updated by Alex Afanasyev over 10 years ago

What was the amount of forwarded Interests and Data packets?

Actions #11

Updated by Chengyu Fan over 10 years ago

Oh, I deleted that log by accident. Let me restate it using another set of output data:

  1. According to the output on one server, 11685 Interests were received (as recorded by the traffic server). nfd::Cs::insertToSkipList and nfd::Cs::find were called 11702 and 11701 times respectively, which is slightly more than the number of Interests received by the traffic server; the extra calls probably come from the channel setup process.

  2. The output on one client is similar. Total Interests Sent = 9019, Total Responses Received = 1717 (as recorded by the traffic client). nfd::NameTree::eraseEntryIfEmpty and nfd::Cs::find were called 9058 and 9061 times respectively.
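The arithmetic behind these two observations can be checked with a short sketch. It only uses the counts quoted above; the variable and function names are illustrative.

```python
# Counts quoted above, as recorded by the traffic server/client and Callgrind.
SERVER_INTERESTS_RECEIVED = 11685
SERVER_CALLS = {
    "nfd::Cs::insertToSkipList": 11702,
    "nfd::Cs::find": 11701,
}

CLIENT_INTERESTS_SENT = 9019
CLIENT_RESPONSES_RECEIVED = 1717
CLIENT_CALLS = {
    "nfd::NameTree::eraseEntryIfEmpty": 9058,
    "nfd::Cs::find": 9061,
}


def extra_calls(calls, packets):
    """Calls beyond one per packet, per function."""
    return {name: count - packets for name, count in calls.items()}


if __name__ == "__main__":
    print("server:", extra_calls(SERVER_CALLS, SERVER_INTERESTS_RECEIVED))
    print("client:", extra_calls(CLIENT_CALLS, CLIENT_INTERESTS_SENT))
    ratio = CLIENT_RESPONSES_RECEIVED / CLIENT_INTERESTS_SENT
    print(f"client responses per Interest sent: {ratio:.1%}")
```

The surplus is 16 to 17 calls on the server and 39 to 42 on the client, which is consistent with roughly one call per Interest plus a small amount of setup traffic. The client received responses for about 19% of the Interests it sent.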

Actions #12

Updated by Junxiao Shi over 10 years ago

  • Status changed from Resolved to Closed

The 20140626 conference call decided that no additional performance profiling is necessary before release.

Updated by Chengyu Fan about 10 years ago

The uploaded file nfd-performance-profiling-on-ONL-steps has been modified according to the latest script, and the analysis is for NFD v0.2.

Future profiling tasks may need these files.
