Bug #1432

Performance collapse under 400 Interests/s after 1min

Added by Junxiao Shi almost 10 years ago. Updated almost 10 years ago.

Status: Closed
Priority: Normal
Category: Tables
Target version:
Start date: 04/01/2014
Due date:
% Done: 100%
Estimated time:

Description

Reported by John DeHart.

Under a constant load of Interest traffic (where each Interest is for a unique name), NFD's performance degrades to the point where it can no longer keep up with the incoming Interest traffic.
Interests then start building up in the socket buffer and eventually get dropped at the socket layer.
This occurs even with a total load as low as 100 Interests/sec.

Using 9 ONL hosts:

  • 1 of the hosts is a central router.
  • 4 of the hosts are servers running ndn-traffic-server to serve content.
  • 4 of the hosts are clients running ndn-traffic to generate Interests.
  • All the clients use the central router to get to the servers.

Each client sends a constant load of 100 Interests/sec, for a total load of 400 Interests/sec on the router node. The router keeps up for about 1 minute; then its performance starts tailing off.

With a total of 200 Interests/sec, the router lasts for about 3 minutes before tailing off.

With a total of 100 Interests/sec, the router lasts for about 9 minutes before tailing off.

The above is using UDP faces.


Files

Archive.zip (14.3 KB) - Ilya Moiseenko, 04/05/2014 03:13 PM
Actions #1

Updated by Junxiao Shi almost 10 years ago

  • Category set to Tables
  • Assignee set to Ilya Moiseenko

From John DeHart:

Our theory is that as the content store grows, something is taking longer and longer to process Interests or content.

If we set the ContentStore nMaxPackets=10000 in cs.hpp, we do not see any performance degradation even after several minutes.

This means the CS match algorithm needs improvement, because it is the only module affected by the number of stored packets.

Actions #2

Updated by Ilya Moiseenko almost 10 years ago

I'll take a look. It could be cache cleanup.

Actions #3

Updated by Ilya Moiseenko almost 10 years ago

I've submitted a commit, http://gerrit.named-data.net/612, which disables cache cleanup based on content staleness. The staleness cleanup uses a priority queue, and I think this is the source of the problems. Without it, cache replacement will be pure LRU.

I need performance feedback on this change. If performance improves significantly, I'll be looking for ways to consider packet staleness without strict ordering.
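
A minimal sketch of what pure-LRU replacement looks like, assuming a hypothetical Entry type and cache class (this is not NFD's actual code): entries live in a list ordered by last use, and eviction always removes the least recently used entry.

    // Minimal sketch of pure-LRU replacement (hypothetical types, not NFD code):
    // the list is ordered by last use; eviction removes the least recently used entry.
    #include <cstddef>
    #include <list>
    #include <string>
    #include <unordered_map>

    struct Entry { std::string name; /* Data packet, metadata, ... */ };

    class LruCache
    {
    public:
      explicit LruCache(size_t maxEntries) : m_maxEntries(maxEntries) {}

      void insert(const Entry& e)
      {
        if (m_maxEntries == 0)
          return;                                         // caching disabled
        auto it = m_index.find(e.name);
        if (it != m_index.end()) {
          m_lru.splice(m_lru.begin(), m_lru, it->second); // refresh on re-insert
          *it->second = e;
          return;
        }
        if (m_lru.size() >= m_maxEntries) {
          m_index.erase(m_lru.back().name);               // evict least recently used
          m_lru.pop_back();
        }
        m_lru.push_front(e);
        m_index[e.name] = m_lru.begin();
      }

      const Entry* find(const std::string& name)
      {
        auto it = m_index.find(name);
        if (it == m_index.end())
          return nullptr;
        m_lru.splice(m_lru.begin(), m_lru, it->second);   // move to front on hit
        return &*it->second;
      }

    private:
      size_t m_maxEntries;
      std::list<Entry> m_lru;  // front = most recently used
      std::unordered_map<std::string, std::list<Entry>::iterator> m_index;
    };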

Actions #4

Updated by John DeHart almost 10 years ago

I've rebuilt with commit 612 and re-run my test and there is no change.
NFD performance still degrades in the same way.

Actions #5

Updated by Ilya Moiseenko almost 10 years ago

Ok

Actions #6

Updated by Junxiao Shi almost 10 years ago

It's reported that there may be a platform-dependent issue:

Just a little bit more to observations. When I tried to run Ilya's code on my laptop, it performed ok.
When I tried to run the same code on one of the lab's Ubuntu machines, I got the same problem as John.
There is also a very strange memory "problem". Somehow the test case, even on my machine, consumes a lot of memory (~2 GB) while it shouldn't: the CS cap is around 50k items, and each item is an empty Data packet (~100 bytes).

We found one of our students who had a new Mac and we ran the test on his laptop. We got results similar to yours.
When we run it on slightly older Macs we get crappy results.

I6c5a312456ffcd8cab4529e3fc107d48992801bc disables ContentStore completely.
Please test this patch on different platforms and compilers, in order to determine whether a platform issue outside of ContentStore is causing the bug.

When you report back, be sure to mention: operating system version, compiler version, Boost library version.
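
As a hypothetical convenience (not part of NFD or the patch), the compiler and Boost versions can be printed from a tiny program like the one below; the operating system version still needs to be reported separately.

    // Hypothetical helper, not part of NFD: prints compiler and Boost versions
    // for inclusion in the report. The OS version must be noted separately.
    #include <iostream>
    #include <boost/version.hpp>

    int main()
    {
      std::cout << "Compiler: " << __VERSION__ << "\n"             // GCC/Clang version string
                << "Boost:    " << BOOST_LIB_VERSION << std::endl; // e.g. "1_48"
      return 0;
    }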

Actions #7

Updated by John DeHart almost 10 years ago

We have done some experiments with smaller-sized ContentStores, and we see improved performance as the CS gets smaller.
With a size of 0 we are probably simulating the removal of the CS without changing a lot of code.

To change nMaxPackets, we just change this line in cs.hpp:

    Cs(int nMaxPackets = 65536); // ~500MB with average packet size = 8KB
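
The nMaxPackets=0 experiment described in the next comment amounts to changing that default argument, roughly like this (illustrative only):

    // Illustrative change for the "size 0" experiment in the next comment:
    // a zero default capacity effectively disables caching for this test.
    Cs(int nMaxPackets = 0);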

Actions #8

Updated by John DeHart almost 10 years ago

When we run on ONL with nMaxPackets=0, it works beautifully at 400 Interests/sec. It has been running for about an hour with no performance degradation.

NFD does, however, continue to grow in size.

Actions #9

Updated by Davide Pesavento almost 10 years ago

John DeHart wrote:

NFD does, however, continue to grow in size.

That could be an unrelated memory leak somewhere else in the code. I suggest you file a separate bug report.

Actions #10

Updated by John DeHart almost 10 years ago

I6c5a312456ffcd8cab4529e3fc107d48992801bc disables ContentStore completely.
Please test this patch on different platforms and compilers, in order to determine whether a platform issue outside of ContentStore is causing the bug.

When you report back, be sure to mention: operating system version, compiler version, Boost library version.

When I run this patch on ONL (Ubuntu 12.04.4, g++ 4.6.3, Boost 1.48), 5 minutes so far, I see no performance degradation and NFD does NOT grow in size.

Actions #11

Updated by Yi Huang almost 10 years ago

Settings:

  • 3 VM nodes connected like this: A-B-C. All 3 nodes are the same.
  • Platform: Ubuntu 12.04
  • g++ 4.6.3
  • Boost 1.48

I ran the test at different frequencies, with Junxiao's update that completely bypasses the CS. RTT and Interest-loss rate increase significantly when the frequency changes from 250 Interests/sec (4 ms interval) to ~333 Interests/sec (3 ms interval).

Here's the table I made with collected data:

interval (ms)   total Interests   Interest loss (%)   average RTT (ms)
80              10000             0                   73.7064
40              10000             0.04                194.4467
20              10000             0.06                307.9077
10              10000             0.30                330.7906
5               10000             0.79                415.1157
4               10000             2.07                621.9645
3               10000             9.62                1620.2501
2               10000             55.36               2050.1325
1               10000             75.48               2193.4072

Actions #12

Updated by Ilya Moiseenko almost 10 years ago

Heap allocation is expensive. We do make_shared for every Interest and Data packet. In the ContentStore I also call new on a per-packet basis. But I can pre-allocate memory for my internal processing, so there will be no "new" anywhere in the ContentStore. I'm not sure what to do with incoming packets.
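
A minimal sketch of the pre-allocation idea, assuming a hypothetical CsEntry type and pool class (not the actual patch): all entries are allocated up front and recycled through a free list, so steady-state insertion and eviction never call new.

    // Sketch of the pre-allocation idea (hypothetical names, not the actual patch):
    // allocate all CS entries up front and recycle them through a free list,
    // so steady-state insertion and eviction never touch the heap allocator.
    #include <cstddef>
    #include <vector>

    struct CsEntry { /* Data packet pointer, timestamps, list hooks, ... */ };

    class EntryPool
    {
    public:
      explicit EntryPool(size_t capacity)
        : m_storage(capacity)            // single upfront allocation
      {
        m_free.reserve(capacity);
        for (size_t i = 0; i < capacity; ++i)
          m_free.push_back(&m_storage[i]);
      }

      CsEntry* acquire()                 // O(1), no heap allocation
      {
        if (m_free.empty())
          return nullptr;                // pool exhausted: caller must evict first
        CsEntry* e = m_free.back();
        m_free.pop_back();
        return e;
      }

      void release(CsEntry* e)           // return an evicted entry to the pool
      {
        *e = CsEntry();                  // reset contents
        m_free.push_back(e);
      }

    private:
      std::vector<CsEntry> m_storage;    // fixed backing storage, never reallocated
      std::vector<CsEntry*> m_free;      // free list of unused entries
    };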

Actions #13

Updated by Beichuan Zhang almost 10 years ago

Yi's numbers are premature. They were obtained from 3 virtual machines on the same physical machine. We don't know how to interpret the numbers yet. He's looking into it.

Actions #14

Updated by Junxiao Shi almost 10 years ago

  • Status changed from New to In Progress

Heap allocation is expensive. Using a memory pool based allocator is possible, but that would be a major change that needs a lot of work.

I suspect ContentStore is holding more Data packets than nMaxPackets:

  • CS has three queues: unsolicited, staleness, arrival.
  • Cs::evictItem looks at those queues one by one, and evicts the first entry found.
  • When an entry is evicted, it's deleted from the skip list and popped from the queue where it was found, but not popped from the other queues.
  • For example, if an entry is found in the staleness queue, it won't be popped from the arrival queue, so a shared_ptr to it is still kept in the arrival queue.
  • A Data packet will not be deallocated while a shared_ptr to its CS entry exists in the arrival queue.
  • As long as there is some stale packet, nothing is popped from the arrival queue, so it keeps growing.

This memory problem is related to this bug, because the unbounded growth of the arrival queue causes high memory utilization and makes further memory allocation slower.
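
The suspected retention pattern, sketched with hypothetical types (not the actual CS code): the same shared_ptr is held by several queues, so popping it from only one of them never drops the last reference.

    // Sketch of the suspected retention pattern (hypothetical types, not NFD code):
    // the same shared_ptr<Entry> is held by several queues, so erasing the entry
    // from the skip list and popping it from ONE queue does not free it.
    #include <memory>
    #include <queue>

    struct Entry { /* Data packet, staleness time, arrival time, ... */ };

    std::queue<std::shared_ptr<Entry>> stalenessQueue;
    std::queue<std::shared_ptr<Entry>> arrivalQueue;

    void insert(const std::shared_ptr<Entry>& entry)
    {
      stalenessQueue.push(entry);   // each copy bumps the use count
      arrivalQueue.push(entry);
    }

    void evictOneStale()
    {
      if (stalenessQueue.empty())
        return;
      std::shared_ptr<Entry> victim = stalenessQueue.front();
      stalenessQueue.pop();         // removed from ONE queue only...
      // ...the entry is also erased from the skip list here (not shown), but the
      // copy still sitting in arrivalQueue keeps the use count above zero, so the
      // Entry and its Data packet are never deallocated, and arrivalQueue grows
      // without bound as long as eviction keeps hitting the staleness queue.
    }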

Actions #15

Updated by Ilya Moiseenko almost 10 years ago

The queues have been replaced with a Boost.MultiIndex container. The memory situation is better, but performance is almost the same.
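
A hedged sketch of what the multi_index approach can look like (the entry fields and index choices here are assumptions, not necessarily the attached code): a single container holds each entry once, with one view per eviction policy, so one erase removes the entry from every view at the same time.

    // Sketch of a multi_index-based CS index (assumed fields, not the attached code):
    // one container holds each entry exactly once, with one view per eviction policy,
    // so a single erase removes the entry from all views simultaneously.
    #include <cstdint>
    #include <string>
    #include <boost/multi_index_container.hpp>
    #include <boost/multi_index/member.hpp>
    #include <boost/multi_index/ordered_index.hpp>
    #include <boost/multi_index/sequenced_index.hpp>

    namespace mi = boost::multi_index;

    struct CsEntry
    {
      std::string name;
      std::uint64_t staleAtMillis;   // absolute staleness deadline, in milliseconds
      bool isUnsolicited;
    };

    typedef mi::multi_index_container<
      CsEntry,
      mi::indexed_by<
        mi::sequenced<>,                        // view 0: arrival/LRU order
        mi::ordered_non_unique<                 // view 1: ordered by staleness deadline
          mi::member<CsEntry, std::uint64_t, &CsEntry::staleAtMillis> >,
        mi::ordered_unique<                     // view 2: exact lookup by name
          mi::member<CsEntry, std::string, &CsEntry::name> >
      >
    > CsContainer;

    void evictOldest(CsContainer& cs)
    {
      if (!cs.empty())
        cs.get<0>().pop_front();                // erased from all three views at once
    }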

Actions #16

Updated by Anonymous almost 10 years ago

Is an updated patch set with multi_index available?

Actions #17

Updated by Ilya Moiseenko almost 10 years ago

Yes, archive is attached. Unit test is called "TableCs/StressTest".

Actions #18

Updated by Junxiao Shi almost 10 years ago

I'm uploading I01a6fc08171619f94aa3fe54daa560ecf301adae patchset 3.

This CS stub has full functionality. Cleanup/eviction is also implemented.

Please try this patchset on ONL.

Stress test

Code: I01a6fc08171619f94aa3fe54daa560ecf301adae patchset 3, tests/daemon/cs.cpp, "Stress" test case

Tested units:

  • stubCS: I01a6fc08171619f94aa3fe54daa560ecf301adae patchset 3
  • masterCS: ContentStore in master branch commit b07788af059a6b4128049fe97ddf3fcb6126a5b0

Machines:

  • Ubuntu: Ubuntu 12.04 64-bit, VirtualBox, Xeon E5-2680 2.70GHz, 4GB memory, gcc 4.6.3, Boost 1.48
  • OSX: OSX 10.9 64-bit, Mac Mini, Core i5-3210M 2.50GHz, 4GB memory, clang 500.2.79, Boost 1.55

nIterations   stubCS on Ubuntu   masterCS on Ubuntu   stubCS on OSX   masterCS on OSX
1,000         0.200s             0.320s               0.026s          0.042s
10,000        2.119s             13.062s              0.273s          2.028s
25,000        5.243s             74.396s              0.700s          20.264s
100,000       23.783s            1567.259s            3.175s          1062.815s
1,000,000     252.518s           NA                   33.552s         NA

Observations

stubCS execution time is close to linear. masterCS execution time grows faster than linearly.

Execution time on OSX is faster than on Ubuntu, possibly due to clang optimization.

Actions #19

Updated by Ilya Moiseenko almost 10 years ago

Haowei found a computational bug in the code. John found a memory leak. I optimized heap allocations and memory consumption.

Patch is here http://gerrit.named-data.net/#/c/626/

Performance with the unit test on Linux (Ubuntu 12.04 VM, i7 2.4 GHz):
1M insertions/lookups = 18 seconds
2M insertions/lookups = 37 seconds
4M insertions/lookups = 75 seconds

Performance with the unit test on OS X (10.9, i7 2.4 GHz):
1M insertions/lookups = 14 seconds
2M insertions/lookups = 29 seconds
4M insertions/lookups = 60 seconds

Actions #20

Updated by Ilya Moiseenko almost 10 years ago

With cache size = 200K (3.5x increase):

1M insertions/lookups = 15 seconds
2M insertions/lookups = 30 seconds
4M insertions/lookups = 62 seconds

Actions #21

Updated by Junxiao Shi almost 10 years ago

  • stubCS: I01a6fc08171619f94aa3fe54daa560ecf301adae patchset 3
  • I626-1: I7f6752ec279e64e81c90c0b3e8d756da34194965 patchset 1

nIterations   stubCS on Ubuntu   I626-1 on Ubuntu   stubCS on OSX   I626-1 on OSX
1,000         0.200s             0.114s             0.026s          0.048s
10,000        2.119s             1.275s             0.273s          0.401s
25,000        5.243s             3.288s             0.700s          1.280s
100,000       23.783s            14.186s            3.175s          5.049s
1,000,000     252.518s           149.774s           33.552s         52.406s

With nMaxPackets=50000, Ilya's new code beats the stub on Ubuntu, but not on OSX.

Actions #22

Updated by John DeHart almost 10 years ago

The latest patch from Ilya works well with the 400 interests/sec test on ONL.
I see no performance degradation.

I am now working on beefing up my ONL tests to do higher rates.

I'll also re-check for memory leaks with some longer tests in ONL.

Actions #23

Updated by John DeHart almost 10 years ago

I have made one round of improvements to my ONL tests and am now running NFD (with Ilya's latest patch) at 8000 Interests/sec without any degradation and without any sign of memory leaks.

I will continue to beef up the tests and see how high I can go.

Actions #24

Updated by Junxiao Shi almost 10 years ago

Ilya should update the code to:

  • address review comments
  • disable stress test by default (so that Jenkins won't be slow)

and then mark this bug as Resolved.

Actions #25

Updated by Alex Afanasyev almost 10 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100