Project

General

Profile

Actions

Bug #1984

closed

EthernetFace crash upon NIC down

Added by Davide Pesavento over 9 years ago. Updated over 9 years ago.

Status:
Closed
Priority:
Normal
Category:
Faces
Target version:
Start date:
Due date:
% Done:

100%

Estimated time:
2.00 h

Description

  1. Run nfd with at least one Ethernet face, e.g.:

    faceid=257 remote=ether://[01:00:5e:00:17:aa] local=dev://eth0
    
  2. Run ip link set eth0 down

  3. Watch nfd committing suicide:

    1411055327.657042 FATAL: [NFD] pcap_next_ex(): The interface went down
    
Actions #1

Updated by Junxiao Shi over 9 years ago

  • Description updated (diff)
  • Category set to Faces
  • Target version set to v0.3

This shall be expected behavior. Operator is supposed to setup ifupdown scripts to restart NFD.

Actions #2

Updated by Davide Pesavento over 9 years ago

The presence of an ifupdown script that restarts NFD is orthogonal to this issue. NFD should not crash, period.

If I remember correctly, some time ago we decided that--in case of network connectivity changes--an external entity would send a HUP signal to NFD, which in turn would reload its config file and reinitialize the affected channels and multicast faces. This implies that NFD must survive the "interface down" event, otherwise using SIGHUP is pointless.

Actions #3

Updated by Junxiao Shi over 9 years ago

  • Subject changed from nfd dies when the network interface used by an EthernetFace goes down to EthernetFace crash upon NIC down
  • Assignee set to Davide Pesavento
  • Estimated time set to 2.00 h

Confirmed.

This bug is due to improper error handling in EthernetFace.

EthernetFace::handleRead throws an exception when pcap_next_ex returns an error code.
It should invoke .fail instead.

EthernetFace::handleRead also throws exceptions when captured frame is too short or has incorrect Ethertype.
These exceptions are acceptable, because libpcap should never return such unexpected frames.

Actions #4

Updated by Davide Pesavento over 9 years ago

Junxiao Shi wrote:

EthernetFace::handleRead also throws exceptions when captured frame is too short or has incorrect Ethertype.
These exceptions are acceptable, because libpcap should never return such unexpected frames.

Are you sure that libpcap never returns short frames? For robustness we should call fail (or ignore the captured frame and continue) in such cases as well.

Actions #5

Updated by Junxiao Shi over 9 years ago

Are you sure that libpcap never returns short frames? For robustness we should call fail (or ignore the captured frame and continue) in such cases as well.

libpcap is expected not to return anything that does not match the bpf_program.
This is part of the contract, and it's unnecessary to double check.

Actions #6

Updated by Davide Pesavento over 9 years ago

Junxiao Shi wrote:

libpcap is expected not to return anything that does not match the bpf_program.

There is nothing in the BPF filter about too-short frames (so called runt frames). They should be discarded simply because they violate IEEE 802.3

This is part of the contract, and it's unnecessary to double check.

Assuming that you're talking about the ethertype, the check is already there. What I'm saying is that we should just ignore the packet or call fail, instead of throwing an exception and dying. Are you arguing that we should remove the check altogether?

Actions #7

Updated by Junxiao Shi over 9 years ago

We should look at libpcap documentation to understand whether it should discard runt frames.
If yes, ASSERT that runt frames are not coming to our code.
If no, check for runt frames and ignore them.

Since bpf_program guarantees correct Ethertype, we can ASSERT Ethertype is correct.
In production builds, this check is effectively disabled because it's unnecessary.

Actions #8

Updated by Davide Pesavento over 9 years ago

  • Status changed from New to In Progress
Actions #9

Updated by Davide Pesavento over 9 years ago

  • Status changed from In Progress to Code review
  • % Done changed from 0 to 80
Actions #10

Updated by Davide Pesavento over 9 years ago

  • % Done changed from 80 to 100
Actions #11

Updated by Davide Pesavento over 9 years ago

  • Status changed from Code review to Closed
Actions

Also available in: Atom PDF