Bug #2548

Logger: race condition between isEnabled and setLogLevel

Added by Davide Pesavento about 9 years ago. Updated almost 6 years ago.

Status:
Closed
Priority:
Normal
Category:
Core
Target version:
Start date:
Due date:
% Done:
100%

Estimated time:

Description

The root issue is this:

  • setLogLevel is called by LoggerFactory::onConfig and LoggerFactory::setDefaultLevel, and is protected by LoggerFactory::m_loggersGuard.
  • isEnabled is called by NFD_LOG, and is not protected by any mutex.

A data race can occur, for example, during a call to rib::Nrd::initializeLogging in the NRD thread while the NFD thread is doing anything that involves printing a log message via NFD_LOG_*.

(A related issue is that the logging subsystem is basically initialized twice...)
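
For illustration, a minimal sketch of the racy pattern described above; the class and member names are taken from the description, everything else is assumed and heavily simplified, not the actual NFD code:

    #include <map>
    #include <mutex>
    #include <string>

    enum class LogLevel { None, Error, Warn, Info, Debug, All };

    class Logger
    {
    public:
      // Called by the NFD_LOG_* macros on every log statement, with no lock held.
      bool
      isEnabled(LogLevel level) const
      {
        return m_enabledLogLevel >= level; // unsynchronized read
      }

      // Called while LoggerFactory::m_loggersGuard is held by the configuring thread.
      void
      setLogLevel(LogLevel level)
      {
        m_enabledLogLevel = level; // this write races with the read above
      }

    private:
      LogLevel m_enabledLogLevel = LogLevel::Info;
    };

    class LoggerFactory
    {
    public:
      // Invoked from onConfig() / setDefaultLevel(); only the writers take the
      // lock, so readers in other threads still race on m_enabledLogLevel.
      void
      setDefaultLevel(LogLevel level)
      {
        std::lock_guard<std::mutex> lock(m_loggersGuard);
        for (auto& entry : m_loggers)
          entry.second.setLogLevel(level);
      }

    private:
      std::mutex m_loggersGuard;
      std::map<std::string, Logger> m_loggers;
    };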


Related issues: 1 (0 open, 1 closed)

Blocked by NFD - Task #4580: Reimplement NFD logging using ndn-cxx logging facility (Closed, Davide Pesavento)

Actions #1

Updated by Davide Pesavento about 9 years ago

  • Description updated (diff)
Actions #2

Updated by Joao Pereira almost 9 years ago

I see 3 options here:

1 - We start using g_logMutex inside the methods setLogLevel and isEnabled

This is a more drastic measure, because while we are writing a log we cannot check another logger or change a level.

2 - We pass the mutex from LoggerFactory to the Logger class by reference

This approach is more dangerous because we would split the locking between 2 different classes, which could lead to a deadlock if future changes are not made carefully.

3 - Add a mutex to the Logger class

This approach is the easiest to implement, but having a mutex per logger looks like overkill.

==========

From my point of view, I believe solution 3 is the one that could cause the fewest issues in the future; it may be a bit heavier in memory, but it can also be faster because we will not be blocking every logger action on a single global mutex.
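
A minimal sketch of what option 3 could look like (illustrative only, with assumed names, not a patch against the actual NFD code):

    #include <mutex>

    enum class LogLevel { None, Error, Warn, Info, Debug, All };

    class Logger
    {
    public:
      bool
      isEnabled(LogLevel level) const
      {
        std::lock_guard<std::mutex> lock(m_mutex); // option 3: per-logger mutex
        return m_enabledLogLevel >= level;
      }

      void
      setLogLevel(LogLevel level)
      {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_enabledLogLevel = level;
      }

    private:
      mutable std::mutex m_mutex; // mutable so that isEnabled() can remain const
      LogLevel m_enabledLogLevel = LogLevel::Info;
    };

Since the critical sections are tiny and setLogLevel is only called when the configuration changes, a per-logger mutex mostly costs memory rather than time.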

Actions #3

Updated by Junxiao Shi almost 9 years ago

I believe this issue would disappear after #2513. If so, we don't need to do anything.

Actions #4

Updated by Joao Pereira almost 9 years ago

Not sure it solves the problem: the race condition is between isEnabled and setLogLevel, so even if we move the log writing into a separate thread, isEnabled is called in that thread while setLogLevel is called in all the other threads. So the race condition remains, unless I misunderstood the change you want to implement in that issue.

Actions #5

Updated by Junxiao Shi almost 9 years ago

Looking at note-4, I agree that this race condition won't disappear with #2513.

If we don't change anything, how bad would the behavior be when the race condition happens?

If it's just printing / not printing a few log lines, this isn't worth a lock.

Actions #6

Updated by Davide Pesavento almost 9 years ago

A data race, akin to undefined behavior, can result in anything, from nothing to incorrect results to program termination. Moreover, it makes it impossible to run nfd under helgrind/DRD or with ThreadSanitizer. We cannot ignore this bug.

I don't remember the details now, but maybe it's possible to fix this without introducing additional locks by reordering the init sequence of the logging subsystem...

Actions #7

Updated by Junxiao Shi almost 9 years ago

Davide Pesavento wrote:

A data race, akin to undefined behavior, can result in anything, from nothing to incorrect results to program termination.

Please explain, at the CPU instruction level, how this data race on Logger::m_enabledLogLevel can cause something worse than writing or not writing a few log lines.

Logger::m_enabledLogLevel is a 32-bit signed integer. Load/store operations on this type are atomic on all supported platforms.

Actions #8

Updated by Davide Pesavento almost 9 years ago

Junxiao, from your question it is clear that you have no idea what you're talking about... you're forgetting about so many things: compiler optimizations, reordering of loads/stores, memory fences, coherency on multi-processor systems, etc... Also "atomic" in this context doesn't mean what you think it means.

And even if "it's just printing / not printing a few log lines", isn't it still a bug? Why is it so hard to acknowledge it?

Actions #9

Updated by Junxiao Shi almost 9 years ago

So, we need to explicitly use std::atomic_uint32_t?
I can smell a further worsening of logging performance.
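
For context, this is roughly what an atomic level member could look like; the memory-order choice below is an assumption for illustration, not a decision from this thread:

    #include <atomic>
    #include <cstdint>

    enum class LogLevel : uint32_t { None, Error, Warn, Info, Debug, All };

    class Logger
    {
    public:
      bool
      isEnabled(LogLevel level) const
      {
        // A relaxed load is enough to remove the data race on the level itself;
        // on common platforms it compiles to an ordinary load, so the cost is small.
        return m_enabledLogLevel.load(std::memory_order_relaxed) >= level;
      }

      void
      setLogLevel(LogLevel level)
      {
        m_enabledLogLevel.store(level, std::memory_order_relaxed);
      }

    private:
      std::atomic<LogLevel> m_enabledLogLevel{LogLevel::Info}; // effectively an atomic uint32_t
    };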

Actions #10

Updated by Davide Pesavento almost 9 years ago

Not necessarily. It should be possible to avoid locking with a minor redesign of the logging system. We need to:

  • ensure that each logger can be used by at most one thread
  • keep the locking around accesses to LoggerMap m_loggers
  • when the config is read/changed and the logging is (re-)initialized, call a per-thread function that updates m_enabledLogLevel (this doesn't require any additional locking if the two conditions above are met); a rough sketch of this idea is shown below
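
A rough sketch of how that could fit together, with assumed names (updateLevels, getOwner) and heavy simplification:

    #include <map>
    #include <mutex>
    #include <string>
    #include <thread>

    enum class LogLevel { None, Error, Warn, Info, Debug, All };

    class Logger
    {
    public:
      // Condition 1: a logger is only ever used by the thread that created it,
      // so reads and writes of m_enabledLogLevel need no synchronization.
      bool
      isEnabled(LogLevel level) const
      {
        return m_enabledLogLevel >= level;
      }

      void
      setLogLevel(LogLevel level)
      {
        m_enabledLogLevel = level;
      }

      std::thread::id
      getOwner() const
      {
        return m_owner;
      }

    private:
      LogLevel m_enabledLogLevel = LogLevel::Info;
      std::thread::id m_owner = std::this_thread::get_id();
    };

    class LoggerFactory
    {
    public:
      // Hypothetical per-thread update function (condition 3): each thread calls
      // this after the config is read/changed and only touches its own loggers.
      // The mutex protects the map structure itself (condition 2), not the levels.
      void
      updateLevels(LogLevel newLevel)
      {
        std::lock_guard<std::mutex> lock(m_loggersGuard);
        for (auto& entry : m_loggers) {
          if (entry.second.getOwner() == std::this_thread::get_id())
            entry.second.setLogLevel(newLevel);
        }
      }

    private:
      std::mutex m_loggersGuard;
      std::map<std::string, Logger> m_loggers;
    };
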
Actions #11

Updated by Alex Afanasyev over 8 years ago

  • Target version changed from v0.4 to v0.5
Actions #12

Updated by Junxiao Shi almost 8 years ago

Is this still relevant after switching to Boost.Log?

Actions #13

Updated by Davide Pesavento almost 8 years ago

Junxiao Shi wrote:

Is this still relevant after switching to Boost.Log?

Yes.

Actions #14

Updated by Davide Pesavento over 6 years ago

  • Target version changed from v0.5 to v0.6
Actions #15

Updated by Davide Pesavento about 6 years ago

  • Target version deleted (v0.6)
Actions #16

Updated by Davide Pesavento about 6 years ago

  • Target version set to v0.7

This is rendered obsolete by #4580.

Actions #17

Updated by Davide Pesavento about 6 years ago

  • Blocked by Task #4580: Reimplement NFD logging using ndn-cxx logging facility added
Actions #18

Updated by Davide Pesavento almost 6 years ago

  • Status changed from New to Closed
  • Assignee set to Davide Pesavento
  • % Done changed from 0 to 100

Logger/LoggerFactory are gone in NFD commit a3148081cef9d3f5f218948e4aa135ad8cf7ca45. Closing.
