Bug #3348
Closed
VLAN tagged Ethernet packet triggers assertion failure
100%
Description
This is probably because of offset mismatch in Ethernet packets with VLAN tags.
- Steps to reproduce:
Add face on both ends:
faceid=259 remote=ether://[01:00:5e:00:17:aa] local=dev://p2p2.802
Machine1: nfdc register /test 259
Machine1: echo "hello" | ndnputchunks3 /test
faceid=263 remote=ether://[01:00:5e:00:17:aa] local=dev://p2p2.802
Machine2: nfdc register /test 263
Machine2: ndncatchunks3 /test
- Results in NFD assertion failure.
nfd: ../daemon/face/ethernet-face.cpp:375: void nfd::EthernetFace::processIncomingPacket(const pcap_pkthdr*, const uint8_t*): Assertion `((__extension__ ({ unsigned short int __v, __x = (unsigned short int) (eh->ether_type); if (__builtin_constant_p (__x)) __v = ((unsigned short int) ((((__x) >> 8) & 0xff) | (((__x) & 0xff) << 8))); else __asm__ ("rorw $8, %w0" : "=r" (__v) : "0" (__x) : "cc"); __v; })) == ethernet::ETHERTYPE_NDN)&&("Received frame with unrecognized ethertype")' failed.
Updated by Davide Pesavento about 9 years ago
- Subject changed from VLAN tagged Ethernet Packet causes NFD crash to VLAN tagged Ethernet packet triggers assertion failure
- Category set to Faces
This assertion in EthernetFace/Transport is failing:
BOOST_ASSERT_MSG(ntohs(eh->ether_type) == ethernet::ETHERTYPE_NDN,
"Received frame with unrecognized ethertype");
Updated by Davide Pesavento about 9 years ago
I think we should simply remove that assertion. We already have a BPF socket filter in place and we should trust it, no point in reimplementing the ethertype parsing logic (which is non-trivial in the presence of nested VLAN tags).
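To illustrate why the parsing is non-trivial once tags may be nested, here is a minimal C++ sketch that walks any stacked VLAN tags to find the first non-VLAN ethertype. It is not NFD code: the names `findEthertype`, `EthernetInfo` and `readBe16` are hypothetical, and it assumes only the 0x8100 (802.1Q) and 0x88a8 (802.1ad) tag types need to be recognized.

```cpp
#include <cstddef>
#include <cstdint>

// Tag protocol identifiers assumed for this sketch.
constexpr std::uint16_t ETHERTYPE_VLAN = 0x8100; // 802.1Q customer tag
constexpr std::uint16_t ETHERTYPE_QINQ = 0x88a8; // 802.1ad service tag

// Hypothetical result type, not part of NFD.
struct EthernetInfo
{
  std::uint16_t etherType;   // first ethertype that is not a VLAN tag
  std::size_t payloadOffset; // byte offset where the payload starts
  std::size_t vlanTags;      // number of VLAN tags that were skipped
};

// Read a 16-bit big-endian (network order) value.
inline std::uint16_t
readBe16(const std::uint8_t* p)
{
  return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Returns false if the frame is truncated before an ethertype is found.
bool
findEthertype(const std::uint8_t* frame, std::size_t len, EthernetInfo& out)
{
  std::size_t offset = 12; // skip destination and source MAC addresses
  std::size_t tags = 0;
  for (;;) {
    if (len < offset + 2)
      return false; // not enough bytes left for an ethertype field
    const std::uint16_t type = readBe16(frame + offset);
    offset += 2;
    if (type != ETHERTYPE_VLAN && type != ETHERTYPE_QINQ) {
      out = EthernetInfo{type, offset, tags};
      return true;
    }
    offset += 2; // skip the 2-byte tag control information
    ++tags;
  }
}
```

For an untagged frame the payload starts at offset 14; each tag shifts it by 4 bytes, which is the kind of offset mismatch suspected in the description.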
Updated by susmit shannigrahi about 9 years ago
Davide Pesavento wrote:
I think we should simply remove that assertion. We already have a BPF socket filter in place and we should trust it, no point in reimplementing the ethertype parsing logic (which is non-trivial in the presence of nested VLAN tags).
I agree.
Updated by Davide Pesavento about 9 years ago
- Status changed from New to In Progress
- Assignee set to Davide Pesavento
- Target version set to v0.4
Updated by Davide Pesavento about 9 years ago
- Status changed from In Progress to Feedback
- % Done changed from 0 to 50
http://gerrit.named-data.net/2607
This change removes the failing assert.
However, I'm not so sure this is the right fix. If we decide that NFD accepts only untagged frames, then removing this assert is wrong. We should instead filter all tagged frames in BPF, or discard them in processIncomingPacket if that's not possible with BPF. Let's discuss if and how NFD handles VLANs in #3344 first, and then we can decide how to fix this crash.
Updated by Junxiao Shi about 9 years ago
#3344 note-16 says NFD should only accept untagged packets. This Bug occurs because BPF program is incorrect that it accepts tagged packets into the EthernetTransport. note-5 solution is wrong. Instead, BPF program should be adjusted.
Updated by Davide Pesavento about 9 years ago
Junxiao Shi wrote:
#3344 note-16 says NFD should only accept untagged packets. This Bug occurs because BFD program is incorrect that it accepts tagged packets into the EthernetTransport. note-5 solution is wrong. Instead, BFD program should be adjusted.
note-5 predates my "verdict" in #3344. Changing the BPF program is exactly what I said in note-5 so I suppose we agree.
Also, when you say "BFD" I'm assuming you mean "BPF".
Updated by Davide Pesavento about 9 years ago
- Status changed from Feedback to In Progress
Updated by Davide Pesavento about 9 years ago
- Status changed from In Progress to Code review
- % Done changed from 50 to 100
Updated by Alex Afanasyev about 9 years ago
I did testing using 2 virtual machines, connected using vde switch (I think it works only on Linux-based Virtualbox host):
vagrant config:
# -*- mode: ruby -*-
# vi: set ft=ruby :
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.provider "virtualbox" do |vb|
    vb.gui = false
    vb.memory = "2048"
    vb.cpus = "1"
  end
  (1..2).each do |i|
    config.vm.define "node-#{i}" do |node|
      node.vm.provision "shell",
        inline: "echo hello from node #{i}"
      node.vm.provider "virtualbox" do |vb|
        vb.customize ["modifyvm", :id, "--nic2", "generic", "--nicgenericdrv2", "VDE"]
        vb.customize ["modifyvm", :id, "--nicproperty2", "network=/tmp/switch1[#{i}]"]
      end
    end
  end
end
On Ubuntu 14.04 a small hack is needed:
sudo ln -s /usr/lib/libvdeplug.so.2 /usr/lib/libvdeplug.so
Install and run vde switch on host
sudo apt-get install vde2
vde_switch -s /tmp/switch1
vlan/create 10
vlan/addport 10 1
vlan/addport 10 2
With this setup, I'm confirming the original report and conclusion of #3348: packet is received in both tagged (on eth1) and untagged (on eth1.10) forms. In release mode, I'm seeing a failure to parse, in debug mode, there is an assertion.
Release mode:
1450682025.026806 TRACE: [EthernetTransport] [id=257,local=dev://eth1,remote=ether://[01:00:5e:00:17:aa]] Received: 12 bytes from 08:00:27:b1:1b:c8
1450682025.027094 WARNING: [GenericLinkService] [id=257,local=dev://eth1,remote=ether://[01:00:5e:00:17:aa]] packet parse error (unrecognized TLV-TYPE 0): DROP
1450682025.027158 TRACE: [EthernetTransport] [id=258,local=dev://eth1.10,remote=ether://[01:00:5e:00:17:aa]] Received: 30 bytes from 08:00:27:b1:1b:c8
Debug mode:
***** Internal Program Error - assertion (ntohs(eh->ether_type) == ethernet::ETHERTYPE_NDN) failed in void nfd::face::EthernetTransport::processIncomingPacket(const pcap_pkthdr*, const uint8_t*):
../daemon/face/ethernet-transport.cpp(347): Received frame with unrecognized ethertype
Aborted (core dumped)
After applying patch from http://gerrit.named-data.net/#/c/2607/3 the problem persists!
root@vagrant-ubuntu-trusty-64:~/NFD# ./build/bin/nfd
1450683041.939100 INFO: [EthernetTransport] [id=0,local=dev://eth1,remote=ether://[01:00:5e:00:17:aa]] Creating transport
1450683041.942406 DEBUG: [EthernetTransport] [id=0,local=dev://eth1,remote=ether://[01:00:5e:00:17:aa]] Interface MTU is 1500
1450683041.945520 INFO: [EthernetTransport] [id=0,local=dev://eth1.10,remote=ether://[01:00:5e:00:17:aa]] Creating transport
1450683041.948372 DEBUG: [EthernetTransport] [id=0,local=dev://eth1.10,remote=ether://[01:00:5e:00:17:aa]] Interface MTU is 1500
1450683041.952410 INFO: [EthernetTransport] [id=0,local=dev://eth0,remote=ether://[01:00:5e:00:17:aa]] Creating transport
1450683041.956304 DEBUG: [EthernetTransport] [id=0,local=dev://eth0,remote=ether://[01:00:5e:00:17:aa]] Interface MTU is 1500
1450683041.970910 WARNING: [EthernetTransport] [id=259,local=dev://eth0,remote=ether://[01:00:5e:00:17:aa]] Read timeout
***** Internal Program Error - assertion (ntohs(eh->ether_type) == ethernet::ETHERTYPE_NDN) failed in void nfd::face::EthernetTransport::processIncomingPacket(const pcap_pkthdr*, const uint8_t*):
../daemon/face/ethernet-transport.cpp(349): Received frame with unrecognized ethertype
Aborted (core dumped)
Something is really fishy with the filter. Here is my attempt to play with tcpdump filters:
root@vagrant-ubuntu-trusty-64:~/NFD# tcpdump -nei eth1 "(ether proto 0x8624) and not vlan"
tcpdump: WARNING: eth1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
07:44:33.610003 08:00:27:b1:1b:c8 > 01:00:5e:00:17:aa, ethertype 802.1Q (0x8100), length 64: vlan 10, p 0, ethertype 0x8624,
0x0000: 051c 0710 0804 7465 7374 0805 6865 6c6c ......test..hell
0x0010: 6f08 0133 0902 1200 0a04 a77b 7b13 0000 o..3.......{{...
0x0020: 0000 0000 0000 0000 0000 0000 0000 ..............
This seems to contradict what I'm requesting... I tried removing the eth1.10 interface and unloading the 8021q module. No change. Somehow, this VLAN-tagged packet is not treated as a VLAN-tagged packet... The filter, for some reason, applies to the inner packet, not to the outer.
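For readers unfamiliar with the "vlan 10, p 0" part of the tcpdump line above, the following C++ sketch shows how those two values are packed into the 16-bit tag control information (TCI) that follows the 0x8100 ethertype in an 802.1Q tag. The names `VlanTci` and `decodeTci` are hypothetical and only serve this illustration.

```cpp
#include <cstdint>

// Fields of the 802.1Q tag control information (TCI).
struct VlanTci
{
  std::uint8_t priority;  // PCP, upper 3 bits (printed by tcpdump as "p")
  bool dropEligible;      // DEI, 1 bit
  std::uint16_t vlanId;   // VID, lower 12 bits (printed by tcpdump as "vlan")
};

// Split a TCI value, already converted to host byte order, into its fields.
inline VlanTci
decodeTci(std::uint16_t tci)
{
  VlanTci out;
  out.priority = static_cast<std::uint8_t>(tci >> 13);
  out.dropEligible = (tci & 0x1000) != 0;
  out.vlanId = static_cast<std::uint16_t>(tci & 0x0fff);
  return out;
}
```

A frame reported as "vlan 10, p 0" with the DEI bit clear would carry the TCI value 0x000a.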
Updated by Alex Afanasyev about 9 years ago
Can it be related to https://bugzilla.redhat.com/show_bug.cgi?id=498981#c4 ?
Updated by Davide Pesavento about 9 years ago
Can you attach a pcap trace of the packets that trigger this behavior?
Updated by susmit shannigrahi about 9 years ago
Davide Pesavento wrote:
Can you attach a pcap trace of the packets that trigger this behavior?
Interesting, I don't see the behavior Alex described.
Publisher machine:
faceid=261 remote=ether://[01:00:5e:00:17:aa] local=dev://enp68s0f1.802 counters={in={4i 0d 96B} out={0i 2d 704B}} non-local permanent multi-access
/test nexthops={faceid=261 (cost=0)}
/test route={faceid=261 (origin=255 cost=0 ChildInherit)}
nfdc register /test 261
$ echo "hello" | ndnputchunks3 /test
Consumer machine:
faceid=262 remote=ether://[01:00:5e:00:17:aa] local=dev://enp68s0f0.802 counters={in={0i 2d 704B} out={4i 0d 96B}} non-local permanent multi-access
/test nexthops={faceid=262 (cost=0)}
/test route={faceid=262 (origin=255 cost=0 ChildInherit)}
nfdc register /test 262
[root@atmos-sac NFD]# ndncatchunks3 -o /test
hello
Updated by Davide Pesavento about 9 years ago
Yeah I suspect a bug in libpcap, only recently fixed. I'm on 1.7.4, Alex is probably on 1.5.3.
Susmit, what version are you using?
Updated by susmit shannigrahi about 9 years ago
Davide Pesavento wrote:
Yeah I suspect a bug in libpcap, only recently fixed. I'm on 1.7.4, Alex is probably on 1.5.3.
Susmit, what version are you using?
That might be true. I am using 1.7.4.
Updated by Alex Afanasyev about 9 years ago
I tried to compile the last version of libpcap and tcpdump, but still see the same result.
root@vagrant-ubuntu-trusty-64:~/tcpdump-4.7.4# ./tcpdump --version
tcpdump version 4.7.4
libpcap version 1.7.4
./tcpdump -ni eth1 -w eth1.pcap not vlan
reading from file /vagrant/eth1.pcap, link-type EN10MB (Ethernet)
18:27:01.670054 08:00:27:b1:1b:c8 > 01:00:5e:00:17:aa, ethertype 802.1Q (0x8100), length 64:
0x0000: 0518 0710 0804 7465 7374 0805 6865 6c6c ......test..hell
0x0010: 6f08 0131 0a04 4a36 d258 0000 0000 0000 o..1..J6.X......
0x0020: 0000 0000 0000 0000 0000 0000 0000 ..............
Interesting fact: if I read the pcap with the "not vlan" option, the packet is correctly filtered out.
Updated by Alex Afanasyev about 9 years ago
Maybe this is an effect of VirtualBox emulation (something lacking in the emulated driver), but I'm not 100% convinced. Unfortunately, I don't have a real system to try with.
I will repeat the experiment with a VirtualBox'ed Fedora.
Updated by Davide Pesavento about 9 years ago
I cannot reproduce the problem on Ubuntu 15.10 reading Alex's trace (tcpdump -r eth1.pcap ...), the packet is correctly filtered.
Updated by Davide Pesavento about 9 years ago
Alex Afanasyev wrote:
I tried to compile the last version of libpcap and tcpdump, but still see the same result.
The bug you referenced in note-11 says that some kernel changes were also required to completely fix it. Maybe your kernel doesn't have those changes.
I noticed that on Ubuntu 14.04 the BPF assembly code generated for both 'ether proto 0x8624' and 'ether proto 0x8624 && not vlan' is exactly the same, while on 15.10 it's different (the '&& not vlan' part adds Linux-specific BPF extensions to the compiled filter).
This probably means that we cannot rely on a BPF filter for this job, instead we have to "manually" discard tagged packets in the transport.
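A manual check of this kind could look roughly like the C++ sketch below. It is not the actual NFD change: `checkIncomingFrame` and `FrameVerdict` are hypothetical names, and the sketch assumes the frame bytes start at the destination MAC address, so that the ethertype of an untagged frame sits at offset 12. Unlike an assertion, the check also runs in release builds and lets the caller drop the frame instead of aborting.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t ETHER_HEADER_LEN = 14;     // dst MAC + src MAC + ethertype
constexpr std::uint16_t ETHERTYPE_NDN = 0x8624;  // value used in this thread

// Hypothetical verdicts for an incoming frame.
enum class FrameVerdict {
  Accept,         // untagged frame carrying the NDN ethertype
  TooShort,       // shorter than an Ethernet header
  WrongEthertype  // anything else, including 0x8100 VLAN-tagged frames
};

// Decide whether a captured frame should be passed on or dropped.
FrameVerdict
checkIncomingFrame(const std::uint8_t* frame, std::size_t len)
{
  if (len < ETHER_HEADER_LEN)
    return FrameVerdict::TooShort;

  // The ethertype is stored in network byte order at offset 12.
  const std::uint16_t etherType =
    static_cast<std::uint16_t>((frame[12] << 8) | frame[13]);

  // A tagged frame has 0x8100 here, so it is rejected by the same comparison.
  if (etherType != ETHERTYPE_NDN)
    return FrameVerdict::WrongEthertype;

  return FrameVerdict::Accept;
}
```

A caller would log and drop the frame on any verdict other than `Accept`, which keeps the behavior the same on systems where the "not vlan" part of the filter has no effect.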
Updated by Alex Afanasyev about 9 years ago
Just checked with Fedora 23 with the same virtualized environment. Everything works as expected:
[root@fedora NFD]# tcpdump -ni enp0s8
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp0s8, link-type EN10MB (Ethernet), capture size 262144 bytes
20:34:39.656886 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 08:00:27:34:16:fe, length 300
20:34:40.415807 08:00:27:b1:1b:c8 > 01:00:5e:00:17:aa, ethertype 802.1Q (0x8100), length 64:
0x0000: 0518 0710 0804 7465 7374 0805 6865 6c6c ......test..hell
0x0010: 6f08 0132 0a04 eed8 4064 0000 0000 0000 o..2....@d......
0x0020: 0000 0000 0000 0000 0000 0000 0000 ..............
[root@fedora NFD]# tcpdump -ni enp0s8 not vlan
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp0s8, link-type EN10MB (Ethernet), capture size 262144 bytes
[root@fedora NFD]# tcpdump -ni enp0s8 vlan 10
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp0s8, link-type EN10MB (Ethernet), capture size 262144 bytes
20:35:20.119949 08:00:27:b1:1b:c8 > 01:00:5e:00:17:aa, ethertype 802.1Q (0x8100), length 64:
0x0000: 0518 0710 0804 7465 7374 0805 6865 6c6c ......test..hell
0x0010: 6f08 0132 0a04 745c 989d 0000 0000 0000 o..2..t\........
0x0020: 0000 0000 0000 0000 0000 0000 0000 ..............
In conclusion, we should proceed with the fix of the tcpdump filter. However, we also need to remove asserts and additional checks in incoming packet processing path. I think just checking for ethernet type would suffice.
Updated by Davide Pesavento about 9 years ago
Alex Afanasyev wrote:
In conclusion, we should proceed with the fix of the tcpdump filter. However, we also need to remove asserts and additional checks in incoming packet processing path. I think just checking for ethernet type would suffice.
Agreed.
Updated by Alex Afanasyev about 9 years ago
- Status changed from Code review to Closed