Bug #2768
closedMisaligned memory accesses in TLV decoding (and probably elsewhere)
Added by Davide Pesavento over 9 years ago. Updated over 7 years ago.
0%
Description
../tests/daemon/face/ndnlp.t.cpp:131:20: runtime error: load of misaligned address 0x000004992b32 for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment
0x000004992b32: note: pointer points here
02 8c 51 08 00 00 00 00 00 00 00 03 52 01 03 53 01 04 54 fd 02 78 cc cc cc cc cc cc cc cc cc cc
^
../daemon/face/ndnlp-data.cpp:56:16: runtime error: load of misaligned address 0x00000498bf44 for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment
0x00000498bf44: note: pointer points here
50 4a 51 08 00 00 00 00 00 00 00 00 54 3e 01 3c cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
^
Updated by Davide Pesavento over 9 years ago
You can reproduce using CXX=clang++
(clang version 3.5.0), CXXFLAGS=-fsanitize=undefined
, LINKFLAGS=-fsanitize=undefined
.
Updated by Junxiao Shi over 9 years ago
- Subject changed from NDNLP misaligned memory access to FaceNdnlp/Slice4 test case: misaligned memory access
- Target version deleted (
v0.4)
I can confirm that misaligned read access in this test case is intentional.
Please note that clang on non-OSX platform is unsupported.
Updated by Davide Pesavento over 9 years ago
Junxiao Shi wrote:
I can confirm that misaligned read access in this test case is intentional.
I don't particularly care about the test case, but the same problem occurs in NdnlpData::fromBlock()
parsed.seq = be64toh(*reinterpret_cast<const uint64_t*>(&*sequenceElement.value_begin()));
This is a performance sensitive code path and unaligned accesses hurt performance on some platforms. On some CPU architectures unaligned accesses are outright illegal and will abort execution.
Please note that clang on non-OSX platform is unsupported.
Here I used clang just to detect the problem, which exists in the code regardless of the compiler used.
Updated by Junxiao Shi over 9 years ago
the same problem occurs in
NdnlpData::fromBlock()
This is a performance sensitive code path and unaligned accesses hurt performance on some platforms.
This is a design limitation of NDNLPv1. Unaligned access is intentional.
I don't think eight single-octet reads (plus arithmetic operations) would be faster than one misaligned dword read.
NDN-TLV has the same design limitation in VAR-NUMBER.
On some CPU architectures unaligned accesses are outright illegal and will abort execution.
Maybe, but they are unsupported.
If C++ standard doesn't forbid unaligned access, it's compiler's responsibility to emit instructions that emulate an unaligned read by doing multiple single-octet reads.
Updated by Junxiao Shi over 9 years ago
- Subject changed from FaceNdnlp/Slice4 test case: misaligned memory access to NdnlpData::fromBlock misaligned memory access
Updated by Davide Pesavento over 9 years ago
Junxiao Shi wrote:
This is a design limitation of NDNLPv1. Unaligned access is intentional.
It has nothing to do with the protocol (except for the fact that the wire format could be optimized for faster decoding). Unaligned access is a bug in my opinion, and UBSan considers it undefined behavior. Are you saying that there's an intentional bug in the code?
I don't think eight single-octet reads (plus arithmetic operations) would be faster than one misaligned dword read.
Depending on the processor architecture, yes, it can be much faster.
Anyway, you don't need to do single-byte reads, you can use memcpy
which the compiler will be able to optimize appropriately in most cases if the target platform supports efficient unaligned accesses.
NDN-TLV has the same design limitation in VAR-NUMBER.
Again, maybe the wire format could be optimized, but it's not a design problem. Nothing in the design says that you have to perform a misaligned memory access.
If C++ standard doesn't forbid unaligned access, it's compiler's responsibility to emit instructions that emulate an unaligned read by doing multiple single-octet reads.
Where did you get this idea from? Anything not specified by the standard is, by definition, unspecified, i.e. each compiler is free to do whatever it wants, therefore you cannot rely on it. Moreover, memory layout and alignment issues are not part of the C++ language specification, but are covered by ABI definitions.
Updated by Junxiao Shi over 9 years ago
- Assignee deleted (
Junxiao Shi)
Does unaligned memory access always cause bus errors? answer points out that:
unaligned access
(1) maybe slower
(2) is not supported on some platforms
(3) can be emulated by compiler when requested with a significant performance penalty (but it's not always enabled).
Daniel Lemire, Data alignment for speed: myth or reality? does some measurements and sees no measurable difference on Core i7 processor, but a 10% penalty on cheaper Core 2 processor.
This article also points out that unaligned access is supported on x86 and x64 processors, and newer ARM processors.
I did a benchmark with unaligned access vs memcpy
method proposed in note-6.
// g++ -o y -std=c++0x y.cpp $(pkg-config --cflags --libs libndn-cxx)
#include <ndn-cxx/util/time.hpp>
using namespace ndn;
int
main(int argc, char** argv)
{
const int SZ = 1048576;
uint8_t arr[SZ * 4 + 3];
std::memset(arr, 1, SZ * 4 + 3);
int offset = atoi(argv[1]);
uint32_t* rep = reinterpret_cast<uint32_t*>(arr + offset);
auto t1 = time::steady_clock::now();
int c = 0;
for (int i = 0; i < SZ; ++i) {
c += rep[i];
}
auto t2 = time::steady_clock::now();
std::cout << "x " << offset << " " << (t2 - t1) << " " << c << "\n";
return 0;
}
// g++ -o y -std=c++0x y.cpp $(pkg-config --cflags --libs libndn-cxx)
#include <ndn-cxx/util/time.hpp>
using namespace ndn;
int
main(int argc, char** argv)
{
const int SZ = 1048576;
uint8_t arr[SZ * 4 + 3];
std::memset(arr, 1, SZ * 4 + 3);
int offset = atoi(argv[1]);
uint32_t* rep = reinterpret_cast<uint32_t*>(arr + offset);
auto t1 = time::steady_clock::now();
int c = 0;
uint32_t n;
for (int i = 0; i < SZ; ++i) {
std::memcpy(&n, &rep[i], sizeof(uint32_t));
c += n;
}
auto t2 = time::steady_clock::now();
std::cout << "y " << offset << " " << (t2 - t1) << " " << c << "\n";
return 0;
}
I ran the benchmark with this command:
for I in $(seq 20); do for OFFSET in $(seq 0 3); do ./x $OFFSET; ./y $OFFSET; done; done
The results on x64 are:
offset | unaligned access | memcpy |
---|---|---|
0 | 3233526 | 5971367 |
1 | 3573664 | 5951130 |
2 | 3388489 | 5935154 |
3 | 3384905 | 6038024 |
Clearly, using memcpy
would be much slower than keeping unaligned access.
Therefore, I disagree with doing anything for now.
When there is a supported platform that does not support unaligned access, we could enable memcpy
using conditional compilation.
Updated by Davide Pesavento over 9 years ago
Junxiao Shi wrote:
(3) can be emulated by compiler when requested with a significant performance penalty (but it's not always enabled).
For completeness: (4) can be emulated by the OS kernel with an even higher performance penalty.
This article also points out that unaligned access is supported on x86 and x64 processors, and newer ARM processors.
Also on PowerPC if I remember correctly. On ARM, I don't know what the performance penalty is though.
Clearly, using
memcpy
would be much slower than keeping unaligned access.
On x86/x86_64, yes, it's a known fact. The Linux kernel has get_unaligned
/put_unaligned
macros that are used when unaligned access is unavoidable, and on x86 they just do the unaligned load/store because everything else would be slower.
Therefore, I disagree with doing anything for now.
When there is a supported platform that does not support unaligned access, we could enablememcpy
using conditional compilation.
Fair enough.
Updated by Junxiao Shi about 9 years ago
Updated by Davide Pesavento about 9 years ago
Yes and no. NDNLPv1 will be deleted, but unaligned memory accesses occur in many other places that perform TLV parsing. So move this ticket to ndn-cxx if you want.
Updated by Davide Pesavento over 8 years ago
- Subject changed from NdnlpData::fromBlock misaligned memory access to Misaligned memory accesses in TLV decoding (and probably elsewhere)
- Priority changed from Normal to Low
Updated by Junxiao Shi over 7 years ago
- Status changed from New to Rejected
unaligned memory accesses occur in many other places that perform TLV parsing.
NDNLPv1 is gone. I'm rejecting this because the issue description is written specifically for NDNLPv1.
If there are other places with the same problem, open an new bug with proper description.
Updated by Davide Pesavento over 7 years ago
- Precedes Bug #4097: Misaligned memory accesses in TLV decoding added
Updated by Davide Pesavento over 7 years ago
- Precedes deleted (Bug #4097: Misaligned memory accesses in TLV decoding)
Updated by Davide Pesavento over 7 years ago
- Related to Bug #4097: Misaligned memory accesses in TLV decoding added