Bug #1432
Closed
Performance collapse under 400 Interests/s after 1 min
Added by Junxiao Shi over 10 years ago. Updated over 10 years ago.
Description
Reported by John DeHart.
Under a constant load of interest traffic (where each interest is for a unique name), NFD's performance degrades to the point where it can no longer keep up with the incoming interest traffic.
Interests then start building up in the socket buffer and eventually interests start getting dropped at the socket layer.
This occurs even with a total load as low as 100 interests/sec.
Using 9 ONL hosts:
- 1 of the hosts is a central router
- 4 of the hosts are servers running ndn-traffic-server to serve content
- 4 of the hosts are clients running ndn-traffic to generate interests.
- All the clients use the central router to get to the servers.
Each client sending a constant load of 100 Interests/sec. Total load is 400 Interests/sec on the router node. The router keeps up for about 1 minute. Then its performance starts tailing off.
With total of 200 Interests/sec, the router lasts for about 3 minutes before tailing off.
With total of 100 Interests/sec, the router lasts for about 9 minutes before tailing off.
The above is using UDP faces.
Files
Archive.zip (14.3 KB), Ilya Moiseenko, 04/05/2014 03:13 PM
Updated by Junxiao Shi over 10 years ago
- Category set to Tables
- Assignee set to Ilya Moiseenko
From John DeHart:
Our theory is that as the content store grows, something takes longer and longer to process Interests or content.
If we set the content store nMaxPackets = 10000 in cs.hpp, we do not see any performance degradation even after several minutes.
This suggests the CS match algorithm needs improvement, because it is the only module affected by the number of stored packets.
Updated by Ilya Moiseenko over 10 years ago
I'll take a look. It could be cache cleanup.
Updated by Ilya Moiseenko over 10 years ago
I've submitted a commit (http://gerrit.named-data.net/612) that disables cache cleanup based on content staleness. The staleness cleanup uses a priority queue, which I suspect is the source of the problem. Without it, cache replacement is pure LRU.
I need performance feedback on this change. If performance improves significantly, I'll look for ways to take packet staleness into account without strict ordering.
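For reference, "pure LRU" replacement can be done with no priority queue at all: a recency-ordered list plus a hash index gives O(1) lookup, refresh, and eviction. Below is a minimal std-only sketch of that idea (illustrative only, not NFD's actual ContentStore code; class and member names are made up):

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Minimal sketch of pure LRU cache replacement (hypothetical, not NFD code):
// a doubly-linked list keeps recency order; a hash map gives O(1) lookup.
// Evicting the least recently used entry is O(1), with no priority queue
// and therefore no ordering cost based on staleness.
class LruCache
{
public:
  explicit LruCache(std::size_t capacity)
    : m_capacity(capacity)
  {
  }

  void insert(const std::string& name)
  {
    auto it = m_index.find(name);
    if (it != m_index.end()) {
      // refresh: move the entry to the most-recently-used position
      m_order.splice(m_order.begin(), m_order, it->second);
      return;
    }
    if (m_order.size() >= m_capacity) {
      // evict the least recently used entry (list tail)
      m_index.erase(m_order.back());
      m_order.pop_back();
    }
    m_order.push_front(name);
    m_index[name] = m_order.begin();
  }

  bool contains(const std::string& name) const
  {
    return m_index.count(name) > 0;
  }

private:
  std::size_t m_capacity;
  std::list<std::string> m_order; // front = most recently used
  std::unordered_map<std::string, std::list<std::string>::iterator> m_index;
};
```

The trade-off Ilya mentions is real: this structure cannot evict by staleness without adding back some ordered index on expiry time.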
Updated by John DeHart over 10 years ago
I've rebuilt with commit 612 and re-run my test and there is no change.
NFD performance still degrades in the same way.
Updated by Junxiao Shi over 10 years ago
It's reported that there may be a platform-dependent issue:
A few more observations: when I tried to run Ilya's code on my laptop, it performed OK. When I ran the same code on one of the lab's Ubuntu machines, I got the same problem as John.
There is also a very strange memory "problem". Somehow the test case, even on my machine, consumes a lot of memory (~2 GB) while it shouldn't: the CS cap is about 50k items and each item is an empty Data packet (~100 bytes).
We found one of our students who had a new Mac and ran the test on his laptop; we got results similar to yours. When we run it on slightly older Macs, we get much worse results.
I6c5a312456ffcd8cab4529e3fc107d48992801bc disables ContentStore completely.
Please test this patch on different platforms and compilers, in order to determine whether a platform issue outside of ContentStore is causing the bug.
When you report back, be sure to mention: operating system version, compiler version, Boost library version.
Updated by John DeHart over 10 years ago
We have done some experiments with smaller ContentStore sizes, and we see performance improve as the CS gets smaller.
With a size of 0 we are effectively simulating the removal of the CS without changing a lot of code.
We just change this line in cs.hpp:
Cs(int nMaxPackets = 65536); // ~500MB with average packet size = 8KB
to change the nMaxPackets.
Updated by John DeHart over 10 years ago
When we run on ONL with nMaxPackets=0, it works beautifully at 400 Interests/sec. It has been running for about an hour with no performance degradation.
nfd does however continue to grow in size.
Updated by Davide Pesavento over 10 years ago
John DeHart wrote:
nfd does however continue to grow in size.
That could be an unrelated memory leak somewhere else in the code. I suggest you file a separate bug report.
Updated by John DeHart over 10 years ago
I6c5a312456ffcd8cab4529e3fc107d48992801bc disables ContentStore completely.
Please test this patch on different platforms and compilers, in order to determine whether a platform issue outside of ContentStore is causing the bug. When you report back, be sure to mention: operating system version, compiler version, Boost library version.
When I run (5 minutes so far) this patch on ONL (Ubuntu 12.04.4, g++ ver 4.6.3, boost 1.48) I see no performance degradation and nfd does NOT grow in size.
Updated by Yi Huang over 10 years ago
Settings:
3 VM nodes connected like this: A-B-C. All 3 nodes are the same.
Platform: Ubuntu 12.04
g++ version 4.6.3
boost 1.48
I ran the test at different frequencies with Junxiao's update that completely bypasses the CS. RTT and Interest loss rate increase significantly when the frequency goes from 250 Interests/sec to ~333 Interests/sec.
Here's the table I made with collected data:
| interval (ms) | total Interests | Interest loss (%) | average RTT (ms) |
|---|---|---|---|
| 80 | 10000 | 0 | 73.7064 |
| 40 | 10000 | 0.04 | 194.4467 |
| 20 | 10000 | 0.06 | 307.9077 |
| 10 | 10000 | 0.30 | 330.7906 |
| 5 | 10000 | 0.79 | 415.1157 |
| 4 | 10000 | 2.07 | 621.9645 |
| 3 | 10000 | 9.62 | 1620.2501 |
| 2 | 10000 | 55.36 | 2050.1325 |
| 1 | 10000 | 75.48 | 2193.4072 |
Updated by Ilya Moiseenko over 10 years ago
Heap allocation is expensive. We do make_shared for every Interest and Data packet, and in the Content Store I call new on a per-packet basis as well. I can pre-allocate memory for my internal processing, so there will be no "new" anywhere in the Content Store; I'm not sure what to do with incoming packets.
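The pre-allocation idea can be sketched as a simple object pool: allocate all entry slots up front, then hand out and take back raw slots with no heap traffic per packet. This is a hypothetical illustration of the technique, not the patch itself (the `Entry`/`EntryPool` names are invented):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of pre-allocating CS entries in a pool instead of
// calling `new` per packet (names and layout are illustrative, not NFD's).
struct Entry
{
  int data = 0;
};

class EntryPool
{
public:
  explicit EntryPool(std::size_t capacity)
    : m_storage(capacity) // single up-front allocation
  {
    m_free.reserve(capacity);
    for (auto& e : m_storage) {
      m_free.push_back(&e); // all slots start free
    }
  }

  Entry* acquire()
  {
    if (m_free.empty()) {
      return nullptr; // pool exhausted; caller must evict first
    }
    Entry* e = m_free.back();
    m_free.pop_back();
    return e;
  }

  void release(Entry* e)
  {
    m_free.push_back(e); // slot becomes reusable; no heap traffic
  }

  std::size_t nFree() const
  {
    return m_free.size();
  }

private:
  std::vector<Entry> m_storage; // owns all entries
  std::vector<Entry*> m_free;   // free list of slots
};
```

As Ilya notes, this only helps for CS-internal objects; incoming packets are still heap-allocated by the forwarding pipeline.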
Updated by Beichuan Zhang over 10 years ago
Yi's numbers are premature. They were obtained from 3 virtual machines on the same physical machine. We don't know how to interpret the numbers yet. He's looking into it.
Updated by Junxiao Shi over 10 years ago
- Status changed from New to In Progress
Heap allocation is expensive. Using a memory pool based allocator is possible, but that would be a major change that needs a lot of work.
I suspect ContentStore is holding more Data packets than nMaxPackets:
- CS has three queues: unsolicited, staleness, arrival.
- Cs::evictItem looks at those queues one by one and evicts the first entry found.
- When an entry is evicted, it is deleted from the skip list and popped from the queue where it was found, but not from the other queues.
- Suppose an entry is found in the staleness queue: it won't be popped from the arrival queue, so its shared_ptr is still kept there.
- A Data packet will not be deallocated while a shared_ptr to its CS entry exists in the arrival queue.
- As long as there is some stale packet, nothing is popped from the arrival queue, so it keeps growing.
This memory problem is related to this bug: the unbounded growth of the arrival queue causes high memory utilization and makes further memory allocation slower.
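The suspected lifetime bug is easy to reproduce in miniature: if the same shared_ptr sits in two queues and eviction pops only one of them, the other queue keeps the entry alive indefinitely. A toy reproduction (not NFD's code; queue and struct names are illustrative):

```cpp
#include <cassert>
#include <deque>
#include <memory>

// Toy reproduction of the suspected leak (not NFD's actual code): the same
// shared_ptr is pushed onto two queues; "evicting" from only one queue
// leaves the other queue's copy alive, so the entry is never freed.
struct CsEntry
{
  int size = 0;
};

int runLeakDemo()
{
  std::deque<std::shared_ptr<CsEntry>> stalenessQueue;
  std::deque<std::shared_ptr<CsEntry>> arrivalQueue;

  auto entry = std::make_shared<CsEntry>();
  stalenessQueue.push_back(entry);
  arrivalQueue.push_back(entry);
  std::weak_ptr<CsEntry> watcher = entry; // observes lifetime without owning
  entry.reset();

  // Evict from the staleness queue only (mimicking Cs::evictItem
  // popping just the queue where the entry was found).
  stalenessQueue.pop_front();

  // The entry is still alive: the arrival queue holds the last owner.
  assert(!watcher.expired());

  // Only when every queue drops its copy is the entry deallocated.
  arrivalQueue.pop_front();
  assert(watcher.expired());
  return 0;
}
```

This matches the observed symptom: the skip list stays bounded at nMaxPackets while total memory grows without bound.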
Updated by Ilya Moiseenko over 10 years ago
The queues have already been replaced with a boost multi_index container. The memory situation is better, but performance is almost the same.
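The point of moving to multi_index is that one container owns each entry while several indices view it, so a single erase removes the entry from every ordering at once and no index can keep a stale owner alive. A std-only analogue of that design (a sketch of the idea, not the actual patch; `IndexedStore` and its members are invented names):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Sketch of the idea behind boost::multi_index, using plain std containers
// (hypothetical, not the actual patch): one lookup index and one eviction
// index, with eviction updating both together so no index retains a stale
// reference the way the separate CS queues did.
class IndexedStore
{
public:
  void insert(const std::string& name, long arrivalTime)
  {
    m_byName[name] = arrivalTime;
    m_byArrival.emplace(arrivalTime, name);
  }

  // Evict the oldest entry; both indices are updated in the same step.
  void evictOldest()
  {
    if (m_byArrival.empty()) {
      return;
    }
    auto oldest = m_byArrival.begin();
    m_byName.erase(oldest->second);
    m_byArrival.erase(oldest);
  }

  std::size_t size() const
  {
    return m_byName.size();
  }

  bool contains(const std::string& name) const
  {
    return m_byName.count(name) > 0;
  }

private:
  std::map<std::string, long> m_byName;         // lookup index
  std::multimap<long, std::string> m_byArrival; // eviction index
};
```

boost::multi_index automates exactly this bookkeeping, which is why the memory growth stops even though raw lookup performance is largely unchanged.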
Updated by Anonymous over 10 years ago
Is an updated patch set with multi_index available?
Updated by Ilya Moiseenko over 10 years ago
- File Archive.zip Archive.zip added
Yes, archive is attached. Unit test is called "TableCs/StressTest".
Updated by Junxiao Shi over 10 years ago
I'm uploading I01a6fc08171619f94aa3fe54daa560ecf301adae patchset 3.
This CS stub has full functionality. Cleanup/eviction is also implemented.
Please try this patchset on ONL.
Stress test
Code: I01a6fc08171619f94aa3fe54daa560ecf301adae patchset 3, tests/daemon/cs.cpp, "Stress" test case
Tested units:
- stubCS: I01a6fc08171619f94aa3fe54daa560ecf301adae patchset 3
- masterCS: ContentStore in master branch commit b07788af059a6b4128049fe97ddf3fcb6126a5b0
Machines:
- Ubuntu: Ubuntu 12.04 64-bit, VirtualBox, Xeon E5-2680 2.70GHz, 4GB memory, gcc 4.6.3, Boost 1.48
- OSX: OSX 10.9 64-bit, Mac Mini, Core i5-3210M 2.50GHz, 4GB memory, clang 500.2.79, Boost 1.55
| nIterations | stubCS on Ubuntu | masterCS on Ubuntu | stubCS on OSX | masterCS on OSX |
|---|---|---|---|---|
| 1,000 | 0.200s | 0.320s | 0.026s | 0.042s |
| 10,000 | 2.119s | 13.062s | 0.273s | 2.028s |
| 25,000 | 5.243s | 74.396s | 0.700s | 20.264s |
| 100,000 | 23.783s | 1567.259s | 3.175s | 1062.815s |
| 1,000,000 | 252.518s | NA | 33.552s | NA |
Observations
stubCS is close to linear time. masterCS execution time grows faster than linear.
Execution time on OSX is faster than on Ubuntu, possibly due to clang optimization.
Updated by Ilya Moiseenko over 10 years ago
Haowei found a computational bug in the code, and John found a memory leak. I optimized heap allocations and memory consumption.
Patch is here http://gerrit.named-data.net/#/c/626/
Performance with the unit test on Linux (Ubuntu 12.04 VM, i7 2.4 GHz):
1M insertions/lookups = 18 seconds
2M insertions/lookups = 37 seconds
4M insertions/lookups = 75 seconds
Performance with the unit test on OS X (10.9, i7 2.4 GHz):
1M insertions/lookups = 14 seconds
2M insertions/lookups = 29 seconds
4M insertions/lookups = 60 seconds
Updated by Ilya Moiseenko over 10 years ago
With cache size = 200K (a 3.5x increase):
1M insertions/lookups = 15 seconds
2M insertions/lookups = 30 seconds
4M insertions/lookups = 62 seconds
Updated by Junxiao Shi over 10 years ago
- stubCS: I01a6fc08171619f94aa3fe54daa560ecf301adae patchset 3
- I626-1: I7f6752ec279e64e81c90c0b3e8d756da34194965 patchset 1
| nIterations | stubCS on Ubuntu | I626-1 on Ubuntu | stubCS on OSX | I626-1 on OSX |
|---|---|---|---|---|
| 1,000 | 0.200s | 0.114s | 0.026s | 0.048s |
| 10,000 | 2.119s | 1.275s | 0.273s | 0.401s |
| 25,000 | 5.243s | 3.288s | 0.700s | 1.280s |
| 100,000 | 23.783s | 14.186s | 3.175s | 5.049s |
| 1,000,000 | 252.518s | 149.774s | 33.552s | 52.406s |
With nMaxPackets=50000, Ilya's new code beats the stub on Ubuntu, but not on OSX.
Updated by John DeHart over 10 years ago
The latest patch from Ilya works well with the 400 interests/sec test on ONL.
I see no performance degradation.
I am now working on beefing up my ONL tests to do higher rates.
I'll also re-check for memory leaks with some longer tests in ONL.
Updated by John DeHart over 10 years ago
I have made one round of improvements to my ONL tests and am now running nfd (with Ilya's latest patch) at 8000 Interests/sec without any degradation and without any sign of memory leaks.
I will continue to beef up the tests and see how high I can go.
Updated by Junxiao Shi over 10 years ago
Ilya should update the code to:
- address review comments
- disable stress test by default (so that Jenkins won't be slow)
and then mark this bug as Resolved.
Updated by Alex Afanasyev over 10 years ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100