Bug #3722
closedScheduler stops if a callback throws
100%
Description
Steps to reproduce:
- declare a
ndn::util::scheduler::Scheduler - scheduler two events; the first event callback throws an exception
- invoke
io.run(), catch the exception thrown by the first event callback, and restartio.run()
Expected: the second event is executed
Actual: the second event is never executed
Updated by Junxiao Shi almost 10 years ago
- Related to Bug #3691: Scheduler::cancelAllEvents causes memory error in ScopedEventId added
Updated by Davide Pesavento almost 10 years ago
Updated by Junxiao Shi almost 10 years ago
- Status changed from New to In Progress
- Assignee set to Junxiao Shi
Updated by Junxiao Shi almost 10 years ago
- Related to deleted (Bug #3691: Scheduler::cancelAllEvents causes memory error in ScopedEventId)
Updated by Junxiao Shi almost 10 years ago
- Status changed from In Progress to Code review
- % Done changed from 0 to 100
Updated by Junxiao Shi almost 10 years ago
https://gerrit.named-data.net/3037 patchset1 adds a benchmark for Scheduler.
I compared the Scheduler before #3691 changes, https://gerrit.named-data.net/3035 patchset6, https://gerrit.named-data.net/3036 patchset2.
ONL pc2core, configure line --with-tests --without-pch --enable-static --disable-shared, 1000000 events, duration in micros.
| version | schedule | cancel | execute | schedule+execute |
|---|---|---|---|---|
| commit:750472b63d7df6254540356bb09647132b1e3104 | 368277 | 506833 | 133406 | 501684 |
| commit:901f4a1ee7f71aa7a3ab48dafc3e912c504fdef0 | 412478 | 492051 | 112641 | 525119 |
| commit:a482df9d511e6269844e990095fa64a8d61b92f3 | 414450 | 527419 | 117375 | 531825 |
I believe 30 milliseconds extra overhead per million events to resolve two bugs should be acceptable, and no further optimization is needed.
Updated by Alex Afanasyev almost 10 years ago
While 30 milliseconds doesn't sounds a lot, 6 percent increase which it equivalents sounds non-trivial increase.
Given the problem in this bug is kind of corner case, can we have a solution that does not have such an increase of penalty? What is the reason for the penalty?
Updated by Davide Pesavento almost 10 years ago
Alex Afanasyev wrote:
While 30 milliseconds doesn't sounds a lot, 6 percent increase which it equivalents sounds non-trivial increase.
True, 6% is quite a lot.
Looking at the numbers in note-6 again, the increase in cancel benchmark after commit a482df9d511e6269844e990095fa64a8d61b92f3 is very strange, since that commit doesn't touch cancelEvent(). Junxiao, do you have an explanation?
Updated by Junxiao Shi almost 10 years ago
I'll rerun the benchmark on ONL and see what it gives.
Vagrant box has frequent slowdowns recently.
The result is actually from ONL.
I guess I need to run the benchmark multiple times to average out external factors such as OS scheduling.
Updated by Junxiao Shi almost 10 years ago
I ran the benchmarks again. This time it's on a Vagrant trusty 64-bit box, since ONL is unavailable at the moment.
First I compile three versions into three binaries, statically linked.
Then I run those binaries 50 times.
TRIALS=50
CASES='base 3691 3722'
for CASE in $CASES; do
> $CASE.log
done
for I in $(seq $TRIALS); do
echo $I
for CASE in $CASES; do
echo >> $CASE.log
date -u >> $CASE.log
./scheduler-benchmark-$CASE -l message >> $CASE.log
done
done
Finally, I collect the results:
for CASE in $CASES; do
for METRIC in schedule cancel execute; do
echo -n $CASE $METRIC ' '
echo '('$(awk '$1=="'$METRIC'"{print $4}' $CASE.log | tr '\n' '+')'0)/'$(awk '$1=="'$METRIC'"' $CASE.log | wc -l)'/1000' | bc
done
done
This gives the average duration of each million operations:
| $CASE | patchset | schedule | cancel | execute | schedule+execute |
|---|---|---|---|---|---|
| base | 3037,4 | 1940027 | 19582392 | 413383 | 2353410 |
| #3691 | 3035,7 | 2036830 | 19590815 | 324531 | 2361361 |
| #3722 | 3036,3 | 2039858 | 19676666 | 319231 | 2359089 |
One thing I notice during the benchmarks is: softirqd kernel thread is using up one CPU core.
Another observation is that there are a small number of failed tests, when the event recording completion time is executed before some other events.
When this occurs, "execute" metric would be unavailable.
vagrant@m0212:~/benchmark$ grep 'error in' base.log 3691.log 3722.log
base.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
base.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
base.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
base.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
base.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
3722.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999995 != 1000000]
3722.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
3722.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
3722.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
3722.log:../tests/integrated/scheduler-benchmark.cpp(72): fatal error in "Execute": critical check nExpired == nEvents failed [999999 != 1000000]
Updated by Junxiao Shi almost 10 years ago
- Status changed from Code review to Closed