Task #1621
Performance profiling
Status: Closed (100% done)
Description
Profile the performance of the NFD forwarder and identify where the bottlenecks are.
Files
Updated by Chengyu Fan over 10 years ago
What is the expected result format for this task?
Once the test has been run on ONL with gprof (or Callgrind/KCachegrind, or whichever tool is used), should I produce a report on it?
Updated by Junxiao Shi over 10 years ago
The outcome should be a report on which areas of NFD are the current bottlenecks.
Examples:
- PIT match algorithm for Interests with PublisherPublicKeyLocator consumes 40% of Interest processing time
- Memory allocation for Data packets takes 20% of total CPU time
Updated by Chengyu Fan over 10 years ago
Also, I assume the target is the latest NFD, not v0.1.0. Is that correct?
ONL has NFD v0.1.0 installed, and I don't have privileges to install libraries system-wide. My plan is to install the latest version in my home directory and profile that.
Updated by Chengyu Fan over 10 years ago
- File nfd-performance-profiling-on-ONL-steps.txt added
- % Done changed from 0 to 100
I did this task using John's scripts, which launch the NDN traffic generator to produce continuous traffic. I used Callgrind rather than gprof, because gprof needs some code modification, which is inconvenient.
I ran the tests with different numbers of name segments and different segment lengths. These variations introduce only minor differences; the results are described below.
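The exact steps are in the attached nfd-performance-profiling-on-ONL-steps.txt. Purely as an illustration, launching nfd under Callgrind from a home-directory install could be sketched as below. The install paths and output file name are hypothetical, and passing the config file via --config is an assumption about the nfd command line. Unlike gprof, which needs the program rebuilt with -pg, Callgrind instruments the existing binary at run time, at the cost of running considerably slower.

```python
import shlex
from pathlib import Path


def callgrind_command(nfd_binary: str, nfd_config: str, out_file: str) -> list[str]:
    """Build an argv list that runs nfd under Valgrind's Callgrind tool."""
    return [
        "valgrind",
        "--tool=callgrind",                  # select the Callgrind profiler
        f"--callgrind-out-file={out_file}",  # where the call-graph profile is written
        nfd_binary,
        "--config", nfd_config,              # assumption: nfd takes its config file via --config
    ]


# Hypothetical paths for an NFD build installed in the user's home directory (no root on ONL).
home = Path.home()
cmd = callgrind_command(
    str(home / "nfd/bin/nfd"),
    str(home / "nfd/etc/ndn/nfd.conf"),
    "callgrind.out.nfd",
)
print(shlex.join(cmd))
```

After the run, the profile can be summarized with callgrind_annotate callgrind.out.nfd or opened in KCachegrind.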
According to the Callgrind output, the libcrypto functions are the bottlenecks:
(1) Each of three libcrypto functions (Baseline_MultiplyTop16, Baseline_Square16, and Baseline_MultiplyBottom) takes more than 4% of the running time.
(2) The functions DivideByPower2Mod and AlmostInverse are called only hundreds of times, but they are expensive: each takes about 2%.
I think this is a reasonable result, because crypto operations are relatively slow. The whole libcrypto object takes about 25% of the running time in my test.
For the nfd library, I did not find obvious bottlenecks.
(1) All the top-cost functions are called a large number of times. For example, ndn::Block::value_size() takes about 2% to 4% of the time, but it is called many times by wireEncode and the name-component manipulation functions. The same holds for ndn::Name::size() and others.
(2) ndn::Name::compare and ndn::Block::parse are relatively slow, but each takes less than 1% of the running time in my tests.
(3) When I use long segment names, nfd::Cs::find and nfd::Cs::insertToSkipList take longer than in the other tests.
Basically, the results are reasonable.
I have also attached the instructions for running the test.
Any suggestions?
Updated by Junxiao Shi over 10 years ago
- Status changed from In Progress to Resolved
I think the crypto functions are called only for management.
They are not used for every Interest/Data packet.
Is this correct?
Updated by Chengyu Fan over 10 years ago
I think so.
According to the output, the functions in the (indirect) caller list include nfd::AppFace::sign, ndn::KeyChain::sign, ndn::KeyChain::signPacketWrapper, ndn::SecTpmFile::signInTpm, nfd::ManagerBase::sendResponse, nfd::FaceManager::onRemoveFace, etc. I believe these calls come from the channel setup process.
All of this data comes from NDN routers that connect to other routers, not from the nodes that connect directly to applications. Let me analyze the data from those nodes as well.
Updated by Chengyu Fan over 10 years ago
I checked the Callgrind output from NFD on a server node and a client node. No obvious bottlenecks were found.
Crypto functions are not bottlenecks on NFD nodes that connect directly to clients/servers, because those nodes only need to set up the channel to the NFD routers.
There are many calls into the ndn::Name class, which is reasonable because clients keep sending Interests and fetching Data.
nfd::NameTree::eraseEntryIfEmpty and nfd::Cs::find on the client node are relatively slow per call. They have small call counts (around 14,000) and take around 0.5% and 0.3% of the running time, respectively. (In contrast, functions like ndn::Name::at take about 2.66% of the running time, with a call count above 7 million.) Since they are not called frequently, they are not bottlenecks.
The same holds for nfd::Cs::insertToSkipList and nfd::Cs::find on the server node: they have small call counts (around 20,000), and both take around 0.2% of the running time.
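To see why a relatively slow function need not be a bottleneck, divide each function's share of the running time by its call count. The sketch below does this for the approximate client-side figures quoted above (eraseEntryIfEmpty: ~0.5% over ~14,000 calls; Name::at: ~2.66% over more than 7 million calls). Using exactly 7 million for Name::at makes the computed ratio a lower bound.

```python
def per_call_share(percent_of_runtime: float, calls: int) -> float:
    """Average share of total running time (in percent) attributable to a single call."""
    return percent_of_runtime / calls


# Approximate client-node figures from the Callgrind output described above.
erase_if_empty = per_call_share(0.5, 14_000)  # nfd::NameTree::eraseEntryIfEmpty
name_at = per_call_share(2.66, 7_000_000)     # ndn::Name::at ("above 7 million" calls)

ratio = erase_if_empty / name_at
print(f"eraseEntryIfEmpty costs at least ~{ratio:.0f}x more per call than Name::at")
```

Each eraseEntryIfEmpty call is roughly two orders of magnitude more expensive, yet Name::at still accounts for about five times more of the total running time (2.66% vs. 0.5%). The total share, not the per-call cost, decides whether a function is a bottleneck.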
Updated by Alex Afanasyev over 10 years ago
How many Interest and Data packets were forwarded?
Updated by Chengyu Fan over 10 years ago
Oh, I deleted that log by accident. Let me restate the results using another set of output data:
According to the output on one server, 11685 Interests were received (as recorded by the traffic server). nfd::Cs::insertToSkipList and nfd::Cs::find were called 11702 and 11701 times, respectively, slightly more than the number of Interests received by the traffic server; the extra calls probably come from the channel setup process.
The output on one client is similar: Total Interests Sent = 9019 and Total Responses Received = 1717 (as recorded by the traffic client), while nfd::NameTree::eraseEntryIfEmpty and nfd::Cs::find were called 9058 and 9061 times, respectively.
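Comparing the Callgrind call counts with the traffic generator's Interest counts shows how close these functions are to one call per Interest. The snippet below only redoes the arithmetic on the numbers reported above; the dictionary layout is an illustrative choice.

```python
# Counts reported above: Interests seen by the traffic generator vs. Callgrind call counts in nfd.
observations = {
    "server": {
        "interests": 11685,  # Interests received by the traffic server
        "calls": {"nfd::Cs::insertToSkipList": 11702, "nfd::Cs::find": 11701},
    },
    "client": {
        "interests": 9019,  # Interests sent by the traffic client
        "calls": {"nfd::NameTree::eraseEntryIfEmpty": 9058, "nfd::Cs::find": 9061},
    },
}

extra_calls = {}  # (node, function) -> calls beyond one per Interest
for node, obs in observations.items():
    interests = obs["interests"]
    for func, calls in obs["calls"].items():
        extra = calls - interests
        extra_calls[(node, func)] = extra
        print(f"{node}: {func}: {calls} calls, {extra} extra "
              f"({100 * extra / interests:.2f}% above one call per Interest)")
```

All four counts are within half a percent of one call per Interest, which is consistent with the extra calls coming from a small fixed overhead such as channel setup rather than from per-packet work.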
Updated by Junxiao Shi over 10 years ago
- Status changed from Resolved to Closed
The 20140626 conference call decided that no additional performance profiling is necessary before the release.
Updated by Chengyu Fan about 10 years ago
- File nfd-performance-profiling-on-ONL-steps-v0.2.txt added
- File NFD-profiling-results-analysis-v0.2.pptx added
The uploaded nfd-performance-profiling-on-ONL-steps file has been updated to match the latest script, and the analysis is for NFD v0.2.
Future profiling tasks may need those files.