Bug #3722
closedScheduler stops if a callback throws
100%
Description
Steps to reproduce:
- declare a
ndn::util::scheduler::Scheduler
- scheduler two events; the first event callback throws an exception
- invoke
io.run()
, catch the exception thrown by the first event callback, and restartio.run()
Expected: the second event is executed
Actual: the second event is never executed
Updated by Junxiao Shi over 8 years ago
- Related to Bug #3691: Scheduler::cancelAllEvents causes memory error in ScopedEventId added
Updated by Davide Pesavento over 8 years ago
Updated by Junxiao Shi over 8 years ago
- Status changed from New to In Progress
- Assignee set to Junxiao Shi
Updated by Junxiao Shi over 8 years ago
- Related to deleted (Bug #3691: Scheduler::cancelAllEvents causes memory error in ScopedEventId)
Updated by Junxiao Shi over 8 years ago
- Status changed from In Progress to Code review
- % Done changed from 0 to 100
Updated by Junxiao Shi over 8 years ago
https://gerrit.named-data.net/3037 patchset1 adds a benchmark for Scheduler
.
I compared the Scheduler before #3691 changes, https://gerrit.named-data.net/3035 patchset6, https://gerrit.named-data.net/3036 patchset2.
ONL pc2core, configure line --with-tests --without-pch --enable-static --disable-shared
, 1000000 events, duration in micros.
version | schedule | cancel | execute | schedule+execute |
---|---|---|---|---|
commit:750472b63d7df6254540356bb09647132b1e3104 | 368277 | 506833 | 133406 | 501684 |
commit:901f4a1ee7f71aa7a3ab48dafc3e912c504fdef0 | 412478 | 492051 | 112641 | 525119 |
commit:a482df9d511e6269844e990095fa64a8d61b92f3 | 414450 | 527419 | 117375 | 531825 |
I believe 30 milliseconds extra overhead per million events to resolve two bugs should be acceptable, and no further optimization is needed.
Updated by Alex Afanasyev over 8 years ago
While 30 milliseconds doesn't sounds a lot, 6 percent increase which it equivalents sounds non-trivial increase.
Given the problem in this bug is kind of corner case, can we have a solution that does not have such an increase of penalty? What is the reason for the penalty?
Updated by Davide Pesavento over 8 years ago
Alex Afanasyev wrote:
While 30 milliseconds doesn't sounds a lot, 6 percent increase which it equivalents sounds non-trivial increase.
True, 6% is quite a lot.
Looking at the numbers in note-6 again, the increase in cancel benchmark after commit a482df9d511e6269844e990095fa64a8d61b92f3 is very strange, since that commit doesn't touch cancelEvent()
. Junxiao, do you have an explanation?
Updated by Junxiao Shi over 8 years ago
I'll rerun the benchmark on ONL and see what it gives.
Vagrant box has frequent slowdowns recently.
The result is actually from ONL.
I guess I need to run the benchmark multiple times to average out external factors such as OS scheduling.
Updated by Junxiao Shi over 8 years ago
I ran the benchmarks again. This time it's on a Vagrant trusty 64-bit box, since ONL is unavailable at the moment.
First I compile three versions into three binaries, statically linked.
Then I run those binaries 50 times.
TRIALS=50
CASES='base 3691 3722'
for CASE in $CASES; do
> $CASE.log
done
for I in $(seq $TRIALS); do
echo $I
for CASE in $CASES; do
echo >> $CASE.log
date -u >> $CASE.log
./scheduler-benchmark-$CASE -l message >> $CASE.log
done
done
Finally, I collect the results:
for CASE in $CASES; do
for METRIC in schedule cancel execute; do
echo -n $CASE $METRIC ' '
echo '('$(awk '$1=="'$METRIC'"{print $4}' $CASE.log | tr '\n' '+')'0)/'$(awk '$1=="'$METRIC'"' $CASE.log | wc -l)'/1000' | bc
done
done
This gives the average duration of each million operations:
$CASE | patchset | schedule | cancel | execute | schedule+execute |
---|---|---|---|---|---|
base | 3037,4 | 1940027 | 19582392 | 413383 | 2353410 |
#3691 | 3035,7 | 2036830 | 19590815 | 324531 | 2361361 |
#3722 | 3036,3 | 2039858 | 19676666 | 319231 | 2359089 |
One thing I notice during the benchmarks is: softirqd
kernel thread is using up one CPU core.
Another observation is that there are a small number of failed tests, when the event recording completion time is executed before some other events.
When this occurs, "execute" metric would be unavailable.
vagrant@m0212:~/benchmark$ grep 'error in' base.log 3691.log 3722.log
base.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
base.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
base.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
base.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
base.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
3722.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999995 != 1000000]
3722.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
3722.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
3722.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
3722.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
Updated by Junxiao Shi over 8 years ago
- Status changed from Code review to Closed