Project

General

Profile

Actions

Bug #2768

closed

Misaligned memory accesses in TLV decoding (and probably elsewhere)

Added by Davide Pesavento almost 9 years ago. Updated almost 7 years ago.

Status:
Rejected
Priority:
Low
Assignee:
-
Category:
Faces
Target version:
-
Start date:
Due date:
% Done:

0%

Estimated time:

Description

../tests/daemon/face/ndnlp.t.cpp:131:20: runtime error: load of misaligned address 0x000004992b32 for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment
0x000004992b32: note: pointer points here
 02 8c  51 08 00 00 00 00 00 00  00 03 52 01 03 53 01 04  54 fd 02 78 cc cc cc cc  cc cc cc cc cc cc
              ^ 
../daemon/face/ndnlp-data.cpp:56:16: runtime error: load of misaligned address 0x00000498bf44 for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment
0x00000498bf44: note: pointer points here
  50 4a 51 08 00 00 00 00  00 00 00 00 54 3e 01 3c  cc cc cc cc cc cc cc cc  cc cc cc cc cc cc cc cc
              ^ 

Related issues 1 (0 open1 closed)

Related to ndn-cxx - Bug #4097: Misaligned memory accesses in TLV decodingClosedJunxiao Shi

Actions
Actions #1

Updated by Davide Pesavento almost 9 years ago

You can reproduce using CXX=clang++ (clang version 3.5.0), CXXFLAGS=-fsanitize=undefined, LINKFLAGS=-fsanitize=undefined.

Actions #2

Updated by Junxiao Shi almost 9 years ago

  • Subject changed from NDNLP misaligned memory access to FaceNdnlp/Slice4 test case: misaligned memory access
  • Target version deleted (v0.4)

I can confirm that misaligned read access in this test case is intentional.

Please note that clang on non-OSX platform is unsupported.

Actions #3

Updated by Davide Pesavento almost 9 years ago

Junxiao Shi wrote:

I can confirm that misaligned read access in this test case is intentional.

I don't particularly care about the test case, but the same problem occurs in NdnlpData::fromBlock()

  parsed.seq = be64toh(*reinterpret_cast<const uint64_t*>(&*sequenceElement.value_begin()));

This is a performance sensitive code path and unaligned accesses hurt performance on some platforms. On some CPU architectures unaligned accesses are outright illegal and will abort execution.

Please note that clang on non-OSX platform is unsupported.

Here I used clang just to detect the problem, which exists in the code regardless of the compiler used.

Actions #4

Updated by Junxiao Shi almost 9 years ago

the same problem occurs in NdnlpData::fromBlock()
This is a performance sensitive code path and unaligned accesses hurt performance on some platforms.

This is a design limitation of NDNLPv1. Unaligned access is intentional.

I don't think eight single-octet reads (plus arithmetic operations) would be faster than one misaligned dword read.

NDN-TLV has the same design limitation in VAR-NUMBER.

On some CPU architectures unaligned accesses are outright illegal and will abort execution.

Maybe, but they are unsupported.

If C++ standard doesn't forbid unaligned access, it's compiler's responsibility to emit instructions that emulate an unaligned read by doing multiple single-octet reads.

Actions #5

Updated by Junxiao Shi almost 9 years ago

  • Subject changed from FaceNdnlp/Slice4 test case: misaligned memory access to NdnlpData::fromBlock misaligned memory access
Actions #6

Updated by Davide Pesavento almost 9 years ago

Junxiao Shi wrote:

This is a design limitation of NDNLPv1. Unaligned access is intentional.

It has nothing to do with the protocol (except for the fact that the wire format could be optimized for faster decoding). Unaligned access is a bug in my opinion, and UBSan considers it undefined behavior. Are you saying that there's an intentional bug in the code?

I don't think eight single-octet reads (plus arithmetic operations) would be faster than one misaligned dword read.

Depending on the processor architecture, yes, it can be much faster.

Anyway, you don't need to do single-byte reads, you can use memcpy which the compiler will be able to optimize appropriately in most cases if the target platform supports efficient unaligned accesses.

NDN-TLV has the same design limitation in VAR-NUMBER.

Again, maybe the wire format could be optimized, but it's not a design problem. Nothing in the design says that you have to perform a misaligned memory access.

If C++ standard doesn't forbid unaligned access, it's compiler's responsibility to emit instructions that emulate an unaligned read by doing multiple single-octet reads.

Where did you get this idea from? Anything not specified by the standard is, by definition, unspecified, i.e. each compiler is free to do whatever it wants, therefore you cannot rely on it. Moreover, memory layout and alignment issues are not part of the C++ language specification, but are covered by ABI definitions.

Actions #7

Updated by Junxiao Shi almost 9 years ago

  • Assignee deleted (Junxiao Shi)

Does unaligned memory access always cause bus errors? answer points out that:
unaligned access
(1) maybe slower
(2) is not supported on some platforms
(3) can be emulated by compiler when requested with a significant performance penalty (but it's not always enabled).

Daniel Lemire, Data alignment for speed: myth or reality? does some measurements and sees no measurable difference on Core i7 processor, but a 10% penalty on cheaper Core 2 processor.

This article also points out that unaligned access is supported on x86 and x64 processors, and newer ARM processors.

I did a benchmark with unaligned access vs memcpy method proposed in note-6.

// g++ -o y -std=c++0x y.cpp $(pkg-config --cflags --libs libndn-cxx)
#include <ndn-cxx/util/time.hpp>

using namespace ndn;

int
main(int argc, char** argv)
{
  const int SZ = 1048576;
  uint8_t arr[SZ * 4 + 3];
  std::memset(arr, 1, SZ * 4 + 3);

  int offset = atoi(argv[1]);
  uint32_t* rep = reinterpret_cast<uint32_t*>(arr + offset);

  auto t1 = time::steady_clock::now();

  int c = 0;
  for (int i = 0; i < SZ; ++i) {
    c += rep[i];
  }

  auto t2 = time::steady_clock::now();
  std::cout << "x " << offset << " " << (t2 - t1) << " " << c << "\n";

  return 0;
}
// g++ -o y -std=c++0x y.cpp $(pkg-config --cflags --libs libndn-cxx)
#include <ndn-cxx/util/time.hpp>

using namespace ndn;

int
main(int argc, char** argv)
{
  const int SZ = 1048576;
  uint8_t arr[SZ * 4 + 3];
  std::memset(arr, 1, SZ * 4 + 3);

  int offset = atoi(argv[1]);
  uint32_t* rep = reinterpret_cast<uint32_t*>(arr + offset);

  auto t1 = time::steady_clock::now();

  int c = 0;
  uint32_t n;
  for (int i = 0; i < SZ; ++i) {
    std::memcpy(&n, &rep[i], sizeof(uint32_t));
    c += n;
  }

  auto t2 = time::steady_clock::now();
  std::cout << "y " << offset << " " << (t2 - t1) << " " << c << "\n";

  return 0;
}

I ran the benchmark with this command:

for I in $(seq 20); do for OFFSET in $(seq 0 3); do ./x $OFFSET; ./y $OFFSET; done; done

The results on x64 are:

offset unaligned access memcpy
0 3233526 5971367
1 3573664 5951130
2 3388489 5935154
3 3384905 6038024

Clearly, using memcpy would be much slower than keeping unaligned access.

Therefore, I disagree with doing anything for now.

When there is a supported platform that does not support unaligned access, we could enable memcpy using conditional compilation.

Actions #8

Updated by Davide Pesavento almost 9 years ago

Junxiao Shi wrote:

(3) can be emulated by compiler when requested with a significant performance penalty (but it's not always enabled).

For completeness: (4) can be emulated by the OS kernel with an even higher performance penalty.

This article also points out that unaligned access is supported on x86 and x64 processors, and newer ARM processors.

Also on PowerPC if I remember correctly. On ARM, I don't know what the performance penalty is though.

Clearly, using memcpy would be much slower than keeping unaligned access.

On x86/x86_64, yes, it's a known fact. The Linux kernel has get_unaligned/put_unaligned macros that are used when unaligned access is unavoidable, and on x86 they just do the unaligned load/store because everything else would be slower.

Therefore, I disagree with doing anything for now.

When there is a supported platform that does not support unaligned access, we could enable memcpy using conditional compilation.

Fair enough.

Actions #9

Updated by Junxiao Shi over 8 years ago

NDNLPv1 implementation is being deleted in #3170. This issue can close after #3170 is complete.

Actions #10

Updated by Davide Pesavento over 8 years ago

Yes and no. NDNLPv1 will be deleted, but unaligned memory accesses occur in many other places that perform TLV parsing. So move this ticket to ndn-cxx if you want.

Actions #11

Updated by Davide Pesavento almost 8 years ago

  • Subject changed from NdnlpData::fromBlock misaligned memory access to Misaligned memory accesses in TLV decoding (and probably elsewhere)
  • Priority changed from Normal to Low
Actions #12

Updated by Junxiao Shi almost 7 years ago

  • Status changed from New to Rejected

unaligned memory accesses occur in many other places that perform TLV parsing.

NDNLPv1 is gone. I'm rejecting this because the issue description is written specifically for NDNLPv1.
If there are other places with the same problem, open an new bug with proper description.

Actions #13

Updated by Davide Pesavento almost 7 years ago

  • Precedes Bug #4097: Misaligned memory accesses in TLV decoding added
Actions #14

Updated by Davide Pesavento almost 7 years ago

  • Precedes deleted (Bug #4097: Misaligned memory accesses in TLV decoding)
Actions #15

Updated by Davide Pesavento almost 7 years ago

  • Related to Bug #4097: Misaligned memory accesses in TLV decoding added
Actions

Also available in: Atom PDF