Bug #3722: Scheduler stops if a callback throws - ndn-cxx - NDN project issue tracking system

Actions

Copy link

Bug #3722

closed

Scheduler stops if a callback throws

Added by Junxiao Shi over 9 years ago. Updated over 9 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Junxiao Shi

Category:

Utils

Target version:

v0.5

Start date:

08/06/2016

Due date:

% Done:

100%

Estimated time:

2.00 h

Description

Steps to reproduce:

declare a ndn::util::scheduler::Scheduler
scheduler two events; the first event callback throws an exception
invoke io.run(), catch the exception thrown by the first event callback, and restart io.run()

Expected: the second event is executed

Actual: the second event is never executed

Actions

Copy link

Updated by Junxiao Shi over 9 years ago

Related to Bug #3691: Scheduler::cancelAllEvents causes memory error in ScopedEventId added

Actions

Copy link

Updated by Davide Pesavento over 9 years ago

Just because I noticed this problem during the code review for #3691, it doesn't mean it's "related to" #3691...

Actions

Copy link

Updated by Junxiao Shi over 9 years ago

Status changed from New to In Progress
Assignee set to Junxiao Shi

Actions

Copy link

Updated by Junxiao Shi over 9 years ago

Related to deleted (Bug #3691: Scheduler::cancelAllEvents causes memory error in ScopedEventId)

Actions

Copy link

Updated by Junxiao Shi over 9 years ago

Status changed from In Progress to Code review
% Done changed from 0 to 100

https://gerrit.named-data.net/3036

Actions

Copy link

Updated by Junxiao Shi over 9 years ago

https://gerrit.named-data.net/3037 patchset1 adds a benchmark for Scheduler.

I compared the Scheduler before #3691 changes, https://gerrit.named-data.net/3035 patchset6, https://gerrit.named-data.net/3036 patchset2.
ONL pc2core, configure line --with-tests --without-pch --enable-static --disable-shared, 1000000 events, duration in micros.

version	schedule	cancel	execute	schedule+execute
commit:750472b63d7df6254540356bb09647132b1e3104	368277	506833	133406	501684
commit:901f4a1ee7f71aa7a3ab48dafc3e912c504fdef0	412478	492051	112641	525119
commit:a482df9d511e6269844e990095fa64a8d61b92f3	414450	527419	117375	531825

I believe 30 milliseconds extra overhead per million events to resolve two bugs should be acceptable, and no further optimization is needed.

Actions

Copy link

Updated by Alex Afanasyev over 9 years ago

While 30 milliseconds doesn't sounds a lot, 6 percent increase which it equivalents sounds non-trivial increase.

Given the problem in this bug is kind of corner case, can we have a solution that does not have such an increase of penalty? What is the reason for the penalty?

Actions

Copy link

Updated by Davide Pesavento over 9 years ago

Alex Afanasyev wrote:

While 30 milliseconds doesn't sounds a lot, 6 percent increase which it equivalents sounds non-trivial increase.

True, 6% is quite a lot.

Looking at the numbers in note-6 again, the increase in cancel benchmark after commit a482df9d511e6269844e990095fa64a8d61b92f3 is very strange, since that commit doesn't touch cancelEvent(). Junxiao, do you have an explanation?

Actions

Copy link

Updated by Junxiao Shi over 9 years ago

I'll rerun the benchmark on ONL and see what it gives.
Vagrant box has frequent slowdowns recently.

The result is actually from ONL.
I guess I need to run the benchmark multiple times to average out external factors such as OS scheduling.

Actions

Copy link

#10

Updated by Junxiao Shi over 9 years ago

I ran the benchmarks again. This time it's on a Vagrant trusty 64-bit box, since ONL is unavailable at the moment.

First I compile three versions into three binaries, statically linked.
Then I run those binaries 50 times.

TRIALS=50
CASES='base 3691 3722'

for CASE in $CASES; do
  > $CASE.log
done

for I in $(seq $TRIALS); do
  echo $I
  for CASE in $CASES; do
    echo >> $CASE.log
    date -u >> $CASE.log
    ./scheduler-benchmark-$CASE -l message >> $CASE.log
  done
done

Finally, I collect the results:

for CASE in $CASES; do
  for METRIC in schedule cancel execute; do
    echo -n $CASE $METRIC ' '
    echo '('$(awk '$1=="'$METRIC'"{print $4}' $CASE.log | tr '\n' '+')'0)/'$(awk '$1=="'$METRIC'"' $CASE.log | wc -l)'/1000' | bc
  done
done

This gives the average duration of each million operations:

$CASE	patchset	schedule	cancel	execute	schedule+execute
base	3037,4	1940027	19582392	413383	2353410
#3691	3035,7	2036830	19590815	324531	2361361
#3722	3036,3	2039858	19676666	319231	2359089

One thing I notice during the benchmarks is: softirqd kernel thread is using up one CPU core.

Another observation is that there are a small number of failed tests, when the event recording completion time is executed before some other events.
When this occurs, "execute" metric would be unavailable.

vagrant@m0212:~/benchmark$ grep 'error in' base.log 3691.log 3722.log
base.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
base.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
base.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
base.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
base.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
3722.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999995 != 1000000]
3722.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
3722.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
3722.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
3722.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]

Actions

Copy link

#11

Updated by Junxiao Shi over 9 years ago

Status changed from Code review to Closed

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

NFD » ndn-cxx

Tags

Bug #3722

Scheduler stops if a callback throws

Updated by Junxiao Shi over 9 years ago

Updated by Davide Pesavento over 9 years ago

Updated by Junxiao Shi over 9 years ago

Updated by Junxiao Shi over 9 years ago

Updated by Junxiao Shi over 9 years ago

Updated by Junxiao Shi over 9 years ago

Updated by Alex Afanasyev over 9 years ago

Updated by Davide Pesavento over 9 years ago

Updated by Junxiao Shi over 9 years ago

Updated by Junxiao Shi over 9 years ago

Updated by Junxiao Shi over 9 years ago