Bug #4140

ChronoSync sendSyncData can result in trying to send content exceeding the size of the packet

Added by Ashlesh Gawande over 7 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Start date: 06/19/2017
% Done: 0%
Description

In ChronoSync, a recovery interest triggers the other side to send its full state.
ChronoSync can also send the full state if a pending sync interest is unknown or an empty digest is received.

https://github.com/named-data/ChronoSync/blob/097bb448f46b8bd9a5c1f431e824f8f6a169b650/src/logic.cpp#L748
https://github.com/named-data/ChronoSync/blob/097bb448f46b8bd9a5c1f431e824f8f6a169b650/src/logic.cpp#L563
https://github.com/named-data/ChronoSync/blob/097bb448f46b8bd9a5c1f431e824f8f6a169b650/src/logic.cpp#L444

Currently, the State is encoded using wireEncode.

On the testbed, NLSR kept crashing with a "Data exceeds size limit" error under the new ChronoSync, so it was rolled back to the old ChronoSync fork, NSync, which did not show this error.

NSync used protobuf to encode the State as a character array, which seems more compact than wireEncode, as it does not exceed the Data packet size (the testbed has been running fine since the rollback).
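
For context, a minimal sketch of the failing path (not the exact code at the links above; `state`, `syncName`, `m_keyChain`, and `m_face` are placeholders):

  // The full State TLV becomes the content of a single Data packet;
  // nothing bounds its size before the packet is handed to the face
  ndn::Data syncReply(syncName);
  syncReply.setContent(state.wireEncode()); // grows with the number of sync nodes
  m_keyChain.sign(syncReply);
  m_face.put(syncReply); // fails once the encoding exceeds MAX_NDN_PACKET_SIZE (8800 bytes)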

Two solutions I can think of:
1) Segment the data (see the sketch below)
2) Use protobuf
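
A rough sketch of option 1, using the same placeholder names (`state`, `syncName`, `m_keyChain`, `m_face`) rather than actual ChronoSync code; on the receiving side, ndn::util::SegmentFetcher (which NLSR already uses for LSAs, per the logs later in this thread) could reassemble the segments:

  // Split the encoded State across fixed-size segments named <syncName>/<segment-number>
  const ndn::Block& wire = state.wireEncode();
  const size_t MAX_SEG = 4000; // hypothetical per-segment payload budget
  const size_t nSegments = (wire.size() + MAX_SEG - 1) / MAX_SEG;

  for (size_t i = 0; i < nSegments; ++i) {
    ndn::Data seg(ndn::Name(syncName).appendSegment(i));
    size_t offset = i * MAX_SEG;
    seg.setContent(wire.wire() + offset, std::min(MAX_SEG, wire.size() - offset));
    seg.setFinalBlock(ndn::name::Component::fromSegment(nSegments - 1));
    m_keyChain.sign(seg);
    m_face.put(seg);
  }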

I need to try to re-create this error off the testbed to better understand exactly what led to it.


Related issues: 2 (0 open, 2 closed)

Related to ChronoSync - Bug #4218: ChronoSync exclude interests should be mustbefresh (Closed, 08/03/2017)

Related to NLSR - Bug #4513: Need to use a fixed session name for ChronoSync sockets (Closed, 02/20/2018)

#1

Updated by Alex Afanasyev over 7 years ago

Protobuf is out of the question, as it doesn't solve the problem. My rough calculation (assuming 50 bytes per name + seqno) gives that it would take > 140 sync nodes to cause the crash. I don't know how we got to that point on our testbed, though I agree that we need a solution.
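
Roughly, assuming ndn-cxx's 8800-byte MAX_NDN_PACKET_SIZE and something like 1.5-2 KB consumed by the sync Data's own name, signature, and TLV overhead (both figures are rough assumptions):

  (8800 - ~1800 bytes of packet overhead) / 50 bytes per node ≈ 140 nodes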

#2

Updated by Ashlesh Gawande almost 7 years ago

John reports that after restarting NFD this error seems to go away.

ChronoSync sent an exclude interest and immediately got Data back, so the Data is coming from the ContentStore.
NLSR gets updates from ChronoSync with the same name but different sequence numbers.
Maybe they are coming from different sessions?
I think this happens because the exclude interest is not marked MustBeFresh.

1517602148.287780 DEBUG: [sync.Logic] << Logic::sendExcludeInterest
1517602148.287784 DEBUG: [sync.Logic] << Logic::formAndSendExcludeInterest
1517602148.287799 DEBUG: [sync.Logic] << Logic::cancelReset
1517602148.298503 DEBUG: [sync.Logic] >> Logic::onSyncData
1517602148.298516 DEBUG: [sync.Logic] First data
1517602148.298536 DEBUG: [sync.Logic] >> Logic::processSyncData
1517602148.305369 DEBUG: [nlsr.SyncLogicHandler] Received ChronoSync update event
1517602148.305399 DEBUG: [nlsr.SyncLogicHandler] Update Name: /localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/spurs/NAME Seq no: 1
1517602148.305417 DEBUG: [nlsr.SyncLogicHandler] Origin Router of update: /ndn/edu/ucla/%C1.Router/cs/spurs
1517602148.305442 DEBUG: [nlsr.SyncLogicHandler] Received sync update with higher NAME sequence number than entry in LSDB
1517602148.305491 DEBUG: [nlsr.Lsdb] Fetching Data for LSA: /localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/spurs/NAME/%01 Seq number: 1
1517602148.305592 TRACE: [nlsr.Nlsr] NLSR: Connect to SegmentFetcher.
1517602148.305610 DEBUG: [nlsr.SyncLogicHandler] Update Name: /localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/spurs/NAME Seq no: 12
1517602148.305624 DEBUG: [nlsr.SyncLogicHandler] Origin Router of update: /ndn/edu/ucla/%C1.Router/cs/spurs
1517602148.305635 DEBUG: [nlsr.SyncLogicHandler] Received sync update with higher NAME sequence number than entry in LSDB
1517602148.305660 DEBUG: [nlsr.Lsdb] Fetching Data for LSA: /localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/spurs/NAME/%0C Seq number: 12
1517602148.305708 TRACE: [nlsr.Nlsr] NLSR: Connect to SegmentFetcher.
1517602148.305727 DEBUG: [nlsr.SyncLogicHandler] Update Name: /localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/spurs/NAME Seq no: 23
1517602148.305741 DEBUG: [nlsr.SyncLogicHandler] Origin Router of update: /ndn/edu/ucla/%C1.Router/cs/spurs
1517602148.305752 DEBUG: [nlsr.SyncLogicHandler] Received sync update with higher NAME sequence number than entry in LSDB
1517602148.305777 DEBUG: [nlsr.Lsdb] Fetching Data for LSA: /localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/spurs/NAME/%17 Seq number: 23
1517602148.305830 TRACE: [nlsr.Nlsr] NLSR: Connect to SegmentFetcher.
1517602148.305845 DEBUG: [nlsr.SyncLogicHandler] Update Name: /localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/spurs/NAME Seq no: 34
1517602148.305858 DEBUG: [nlsr.SyncLogicHandler] Origin Router of update: /ndn/edu/ucla/%C1.Router/cs/spurs
1517602148.305869 DEBUG: [nlsr.SyncLogicHandler] Received sync update with higher NAME sequence number than entry in LSDB
1517602148.305892 DEBUG: [nlsr.Lsdb] Fetching Data for LSA: /localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/spurs/NAME/%22 Seq number: 34
#3

Updated by Ashlesh Gawande almost 7 years ago

  • Related to Bug #4218: ChronoSync exclude interests should be mustbefresh added
#4

Updated by Ashlesh Gawande almost 7 years ago

  • Status changed from New to In Progress
#5

Updated by Alex Afanasyev almost 7 years ago

The solution is incomplete, as it did not provision for NDNLPv2 headers, resulting in the following error:

1518476649.723324 FATAL: [nlsr.NlsrRunner] ERROR: input buffer full, but a valid TLV cannot be decoded
ERROR: input buffer full, but a valid TLV cannot be decoded
  • limit needs to be configurable (via a setter method in ChronoSync and the NLSR config file)

  • we should implement bzip2 compression of the content, given it is trivial to do. Here is a snippet for that:

#include <ndn-cxx/data.hpp>
#include <ndn-cxx/encoding/buffer-stream.hpp>
#include <ndn-cxx/security/key-chain.hpp>
#include <ndn-cxx/util/io.hpp>

#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/filter/bzip2.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/stream.hpp>

namespace bio = boost::iostreams;
using namespace ndn;

...

  // Pipe the original wire encoding through a bzip2 compressor into an in-memory buffer
  OBufferStream os;
  bio::filtering_stream<bio::output> out;
  out.push(bio::bzip2_compressor());
  out.push(os);
  bio::stream<bio::array_source> in(reinterpret_cast<const char*>(stuff.wire()), stuff.size());
  bio::copy(in, out); // copies everything, then flushes and closes both streams

  // Rebuild the Data packet with the compressed content and re-sign it
  Data newData(*data);
  newData.setContent(os.buf());
  KeyChain keyChain;
  keyChain.sign(newData);

  io::save(newData, std::string(argv[1]) + ".compressed", io::NO_ENCODING);
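
For completeness, the receiver would run the inverse filter; a minimal sketch under the same includes and aliases (`receivedData` is a hypothetical received Data packet):

  // Decompress the received content back into the original State encoding
  OBufferStream os;
  bio::filtering_stream<bio::output> out;
  out.push(bio::bzip2_decompressor());
  out.push(os);
  bio::stream<bio::array_source> in(reinterpret_cast<const char*>(receivedData.getContent().value()),
                                    receivedData.getContent().value_size());
  bio::copy(in, out);
  Block stateWire(os.buf()); // os.buf() now holds the uncompressed State TLV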
#6

Updated by Ashlesh Gawande almost 7 years ago

Okay, so the user-specified limit will replace MAX_NDN_PACKET_SIZE when comparing against the syncReply size?

#7

Updated by Ashlesh Gawande almost 7 years ago

Would it be possible to post the ChronoSync log from before the crash, from this node and other nodes?
When we used NSync we never encountered this problem. Are we encountering it now because of the recovery + exclude mechanism that was added to ChronoSync later, or because we no longer use protobuf to encode?

#8

Updated by Alex Afanasyev almost 7 years ago

The testbed size has grown considerably. Also, having separate ChronoSync prefixes for NAME and COORDINATE inflated the size of the sync Data. This is the content (just names) that was captured today on the testbed:

/localhop/ndn/NLSR/LSA/ch/unibas/%C1.Router/dmi-ndn-testbed1/NAME/%00%00%01a%8B%87%229
/localhop/ndn/NLSR/LSA/ch/unibas/%C1.Router/dmi-ndn-testbed1/COORDINATE/%00%00%01a%8B%87%22G
/localhop/ndn/NLSR/LSA/fr/lip6/%C1.Router/ndnhub/NAME/%00%00%01a%8B%87%21%AC
/localhop/ndn/NLSR/LSA/fr/lip6/%C1.Router/ndnhub/COORDINATE/%00%00%01a%8B%87%21%B0
/localhop/ndn/NLSR/LSA/it/afasystems/%C1.Router/ndn/NAME/%00%00%01a%8B~%8F%87
/localhop/ndn/NLSR/LSA/it/afasystems/%C1.Router/ndn/COORDINATE/%00%00%01a%8B~%8F%90
/localhop/ndn/NLSR/LSA/edu/neu/%C1.Router/ndnrtr/NAME/%00%00%01a%8B7%92%ED
/localhop/ndn/NLSR/LSA/edu/neu/%C1.Router/ndnrtr/NAME/%00%00%01a%8B%87%221
/localhop/ndn/NLSR/LSA/edu/neu/%C1.Router/ndnrtr/COORDINATE/%00%00%01a%8B7%92%F4
/localhop/ndn/NLSR/LSA/edu/neu/%C1.Router/ndnrtr/COORDINATE/%00%00%01a%8B%87%229
/localhop/ndn/NLSR/LSA/edu/uci/%C1.Router/ndnhub/NAME/%00%00%01a%8B7%93%3E
/localhop/ndn/NLSR/LSA/edu/uci/%C1.Router/ndnhub/NAME/%00%00%01a%8B%87%24%04
/localhop/ndn/NLSR/LSA/edu/uci/%C1.Router/ndnhub/COORDINATE/%00%00%01a%8B7%93J
/localhop/ndn/NLSR/LSA/edu/uci/%C1.Router/ndnhub/COORDINATE/%00%00%01a%8B%87%24%10
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/NAME/%00%00%01a%8B7%92H
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/NAME/%00%00%01a%8B%87%22%92
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B7%92L
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B7%A5%8E
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B7%B7%B5
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B7%CAK
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B7%DD%C6
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B7%F1e
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B8%16V
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B8%28%E8
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B8%3A%A3
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B8L%B6
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B8_%B0
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B8r%F2
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B8%86T
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B8%9A%09
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B8%AC%B5
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B8%C0%1C
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B8%D3%C3
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B8%E7%18
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/aleph/COORDINATE/%00%00%01a%8B%87%22%97
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/spurs/NAME/%00%00%01a%8B7%96%FE
/localhop/ndn/NLSR/LSA/edu/ucla/%C1.Router/cs/spurs/COORDINATE/%00%00%01a%8B7%97%20
/localhop/ndn/NLSR/LSA/edu/umich/%C1.Router/ndn0/NAME/%00%00%01a%8B7%92%F4
/localhop/ndn/NLSR/LSA/edu/umich/%C1.Router/ndn0/NAME/%00%00%01a%8B%87%22%8A
/localhop/ndn/NLSR/LSA/edu/umich/%C1.Router/ndn0/COORDINATE/%00%00%01a%8B7%92%F9
/localhop/ndn/NLSR/LSA/edu/umich/%C1.Router/ndn0/COORDINATE/%00%00%01a%8B%87%22%8E
/localhop/ndn/NLSR/LSA/edu/wustl/%C1.Router/wundngw/NAME/%00%00%01a%8B7%93u
/localhop/ndn/NLSR/LSA/edu/wustl/%C1.Router/wundngw/NAME/%00%00%01a%8B%87%23%0C
/localhop/ndn/NLSR/LSA/edu/wustl/%C1.Router/wundngw/COORDINATE/%00%00%01a%8B7%93%82
/localhop/ndn/NLSR/LSA/edu/wustl/%C1.Router/wundngw/COORDINATE/%00%00%01a%8B%87%23%16
/localhop/ndn/NLSR/LSA/edu/arizona/%C1.Router/hobo/NAME/%00%00%01a%8B7%93%00
/localhop/ndn/NLSR/LSA/edu/arizona/%C1.Router/hobo/NAME/%00%00%01a%8B%87%24g
/localhop/ndn/NLSR/LSA/edu/arizona/%C1.Router/hobo/COORDINATE/%00%00%01a%8B7%93%09
/localhop/ndn/NLSR/LSA/edu/arizona/%C1.Router/hobo/COORDINATE/%00%00%01a%8B%87%24p
/localhop/ndn/NLSR/LSA/edu/memphis/%C1.Router/titan/NAME/%00%00%01a%8B7%94%E1
/localhop/ndn/NLSR/LSA/edu/memphis/%C1.Router/titan/NAME/%00%00%01a%8B%87%26%8C
/localhop/ndn/NLSR/LSA/edu/memphis/%C1.Router/titan/COORDINATE/%00%00%01a%8B7%94%F4
/localhop/ndn/NLSR/LSA/edu/memphis/%C1.Router/titan/COORDINATE/%00%00%01a%8B%87%26%9E
/localhop/ndn/NLSR/LSA/edu/illinois/%C1.Router/ndnx/NAME/%00%00%01a%8B7%93%81
/localhop/ndn/NLSR/LSA/edu/illinois/%C1.Router/ndnx/NAME/%00%00%01a%8B%87%23%91
/localhop/ndn/NLSR/LSA/edu/illinois/%C1.Router/ndnx/COORDINATE/%00%00%01a%8B7%93%8D
/localhop/ndn/NLSR/LSA/edu/illinois/%C1.Router/ndnx/COORDINATE/%00%00%01a%8B%87%23%A3
/localhop/ndn/NLSR/LSA/edu/colostate/%C1.Router/mccoy/NAME/%00%00%01a%8B7%DD%0F
/localhop/ndn/NLSR/LSA/edu/colostate/%C1.Router/mccoy/NAME/%00%00%01a%8B%87%23%87
/localhop/ndn/NLSR/LSA/edu/colostate/%C1.Router/mccoy/COORDINATE/%00%00%01a%8B7%DD%16
/localhop/ndn/NLSR/LSA/edu/colostate/%C1.Router/mccoy/COORDINATE/%00%00%01a%8B%87%23%8E
/localhop/ndn/NLSR/LSA/org/caida/%C1.Router/click/NAME/%00%00%01a%8B7%94_
/localhop/ndn/NLSR/LSA/org/caida/%C1.Router/click/NAME/%00%00%01a%8B%87%24l
/localhop/ndn/NLSR/LSA/org/caida/%C1.Router/click/COORDINATE/%00%00%01a%8B7%94t
/localhop/ndn/NLSR/LSA/org/caida/%C1.Router/click/COORDINATE/%00%00%01a%8B%87%24%81

This obviously would compress extremely well. My quick test (with the snippet above) reduced the size of this packet from 6 KB to 1.4 KB.

Also, this has nothing to do with the exclude, as exclude has no relation to the sync Data size.

#9

Updated by Alex Afanasyev almost 7 years ago

Ashlesh Gawande wrote:

Okay, the user specified limit will replace MAX_NDN_PACKET_SIZE while comparing to syncReply size?

Yes, with MAX_NDN_PACKET_SIZE acting as the ultimate limit. We should also reserve a few bytes for NDNLP overheads; I'll leave it to others to suggest what a reasonable number would be for that.
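
A minimal sketch of that comparison (hypothetical names: `userLimit` for the configured value, `NDNLP_RESERVE` for the link-header allowance; not ChronoSync's actual code):

  // Effective budget: the configured limit, capped by the absolute maximum
  // minus headroom for NDNLPv2 headers
  const size_t NDNLP_RESERVE = 200; // placeholder; a reasonable value is TBD
  size_t limit = std::min(userLimit, ndn::MAX_NDN_PACKET_SIZE - NDNLP_RESERVE);

  if (syncReply.wireEncode().size() > limit) {
    // too large to send as a single packet: compress and/or segment the content
  }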

#10

Updated by Ashlesh Gawande almost 7 years ago

If we don't set any tags on the Data packet, then the lpPacket should be empty and we would be sending the Data packet with its bare encoding, right?
Then why do we get this error?

#11

Updated by Alex Afanasyev almost 7 years ago

If we don't, then maybe we wouldn't see the error. On the testbed, we ARE adding NDNLP headers now.

#13

Updated by Ashlesh Gawande almost 7 years ago

  • Related to Bug #4513: Need to use a fixed session name for ChronoSync sockets added
#14

Updated by Ashlesh Gawande over 6 years ago

  • Status changed from In Progress to Closed
