Feature #4364

NFD - Feature #1624: Design and Implement Congestion Control

Implement congestion control in SegmentFetcher

Added by Klaus Schneider over 3 years ago. Updated over 2 years ago.

Status: Closed
Priority: High
Assignee:
Category: Utils
Target version:
Start date:
Due date:
% Done: 100%
Estimated time:

Description

The congestion control functionality should be ported from one specific consumer application to an API that is more easily usable by multiple applications. This task is only about a simple API which provides a high-level reliable transport object retrieval (like a file download or non-time sensitive video playback).

Some references:


Related issues

Blocked by ndn-tools - Feature #4289: ndncatchunks: React to congestion marks (Closed, Chavoosh Ghasemi)

Blocked by ndn-cxx - Task #4464: Make SegmentFetcher pure Signal-based (Closed, Eric Newberry)

Blocks ndn-tools - Task #4637: ndncatchunks: use SegmentFetcher (New)

#1

Updated by Klaus Schneider over 3 years ago

  • Blocked by Feature #4289: ndncatchunks: React to congestion marks added
#2

Updated by Davide Pesavento over 3 years ago

  • Tracker changed from Task to Feature

I'm guessing this API should go in ndn-cxx (or a separate library on top of ndn-cxx).

#3

Updated by Klaus Schneider over 3 years ago

Davide Pesavento wrote:

I'm guessing this API should go in ndn-cxx (or a separate library on top of ndn-cxx).

Makes sense.

#4

Updated by Junxiao Shi about 3 years ago

  • Project changed from NFD to ndn-cxx
  • Category set to Base
  • Target version set to v0.7
#5

Updated by Davide Pesavento about 3 years ago

  • Subject changed from Create Consumer API for Congestion Control to Design Consumer API for Congestion Control
#6

Updated by Klaus Schneider about 3 years ago

  • Subject changed from Design Consumer API for Congestion Control to Design Simple Consumer API for Congestion Control
  • Description updated (diff)
  • Assignee set to Eric Newberry
#7

Updated by Klaus Schneider about 3 years ago

I'm thinking about the following functionality abstraction:

At the producer:

  • putObject(data, Name): Splits up the data into chunks, signs it, and puts it into the content repository.

At the consumer:

  • data getObject(Name): Reliably retrieves the data under a given name. This includes window adaptation, congestion control, and retransmissions.

This functionality is currently provided by the putchunks and catchunks programs. However, the API should allow other applications to use these functions.
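The two calls above can be pictured with a minimal, self-contained sketch. It does not use ndn-cxx: the Segment struct, the reassembleObject helper, and the "/seg=<n>" suffix are hypothetical stand-ins chosen for readability, and signing, serving, window adaptation, and retransmission are all left out. It only shows how putObject might cut a byte buffer into named chunks and how the last step of getObject might join them again.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type for illustration only; this is not an ndn-cxx class.
struct Segment {
  std::string name;                   // e.g. "/example/file/seg=0" (made-up suffix)
  std::vector<std::uint8_t> payload;  // at most maxPayload bytes
  bool isFinal = false;               // true only on the last segment
};

// Producer side: split `data` into segments of at most `maxPayload` bytes.
// An empty object still yields one (empty) final segment.
std::vector<Segment> putObject(const std::vector<std::uint8_t>& data,
                               const std::string& name,
                               std::size_t maxPayload)
{
  assert(maxPayload > 0);
  std::vector<Segment> segments;
  std::size_t offset = 0;
  do {
    const std::size_t len = std::min(maxPayload, data.size() - offset);
    const auto first = data.begin() + static_cast<std::ptrdiff_t>(offset);
    Segment seg;
    seg.name = name + "/seg=" + std::to_string(segments.size());
    seg.payload.assign(first, first + static_cast<std::ptrdiff_t>(len));
    offset += len;
    seg.isFinal = (offset == data.size());
    segments.push_back(std::move(seg));
  } while (offset < data.size());
  return segments;
}

// Consumer side, final step only: concatenate segments that are already
// in order. Fetching them reliably is the hard part and is not shown.
std::vector<std::uint8_t> reassembleObject(const std::vector<Segment>& segments)
{
  std::vector<std::uint8_t> data;
  for (const auto& seg : segments) {
    data.insert(data.end(), seg.payload.begin(), seg.payload.end());
  }
  return data;
}
```

Under these assumptions, a 5-byte object split with maxPayload = 2 produces three segments, and only the third has isFinal set.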

#8

Updated by Klaus Schneider about 3 years ago

  • Description updated (diff)
#9

Updated by Eric Newberry about 3 years ago

Should the putObject method above be blocking or non-blocking?

#10

Updated by Davide Pesavento about 3 years ago

Klaus Schneider wrote:

putObject(data, Name): Splits up the data into chunks, signs it, and puts it into the content repository.

  • What is "data"?
  • Is "content repository" an in-memory structure?
  • What happens if putObject is called more than once with the same Name? Does it create a new version? Throws an error? Something else?
  • What happens to "old" data? Is it kept in the "content repository" forever?

data getObject(Name): Reliably retrieves the data under a given name. This includes window adaptation, congestion control, and retransmissions.

  • If the data cannot be retrieved, does getObject keep retrying ad infinitum?
#11

Updated by Klaus Schneider about 3 years ago

These are all good questions. Let's start with an easy one:

  • If the data cannot be retrieved, does getObject keep retrying ad infinitum?

No, there should be a maximum-retry threshold, after which the API gives up and returns an error code.

In catchunks this threshold is 3 by default, which I found often to be too low (at least with congestion marks disabled): Catchunks often returned an error, even though the file could have been transferred successfully with a couple more retransmissions.

Davide Pesavento wrote:

Klaus Schneider wrote:

putObject(data, Name): Splits up the data into chunks, signs it, and puts it into the content repository.

  • What is "data"?

The number of bytes that make up a content object. Can be a file on a disk, but doesn't have to be.

  • Is "content repository" an in-memory structure?

Intuitively, I thought of a permanent on-disk storage place, the way the term "repository" is used in the original CCN paper. However, I realize that putchunks serves the content from memory.

  • What happens if putObject is called more than once with the same Name? Does it create a new version? Throws an error? Something else?

Throwing an error would be the simpler solution. But if that's not satisfactory, we need some kind of version management.

  • What happens to "old" data? Is it kept in the "content repository" forever?

Yes?

I mean realistically, there should be some way to remove/overwrite it once the repository is full. But I'm not sure if we have to worry about this right now.

#12

Updated by Klaus Schneider about 3 years ago

Eric Newberry wrote:

Should the putObject method above be blocking or non-blocking?

Can we support both?

#13

Updated by Davide Pesavento about 3 years ago

Klaus Schneider wrote:

  • If the data cannot be retrieved, does getObject keep retrying ad infinitum?

No, there should be a maximum-retry threshold, after which the API gives up and returns an error code.

In catchunks this threshold is 3 by default, which I found often to be too low (at least with congestion marks disabled): Catchunks often returned an error, even though the file could have been transferred successfully with a couple more retransmissions.

Maybe a global timeout ("how long is the user willing to wait?") makes more sense than a max no. of retries.

  • Is "content repository" an in-memory structure?

Intuitively, I thought of a permanent on-disk storage place, the way the term "repository" is used in the original CCN paper. However, I realize that putchunks serves the content from memory.

You haven't answered the question. If it's in memory, we already have InMemoryStorage that we can use. An on-disk repository is more complicated.

#14

Updated by Klaus Schneider about 3 years ago

Davide Pesavento wrote:

Klaus Schneider wrote:

  • If the data cannot be retrieved, does getObject keep retrying ad infinitum?

No, there should be a maximum-retry threshold, after which the API gives up and returns an error code.

In catchunks this threshold is 3 by default, which I found often to be too low (at least with congestion marks disabled): Catchunks often returned an error, even though the file could have been transferred successfully with a couple more retransmissions.

Maybe a global timeout ("how long is the user willing to wait?") makes more sense than a max no. of retries.

Although that creates a couple of new questions: How is the waiting time measured? From the first request of the file? Or is the timer reset for each request?

Also, doesn't the tolerance ("how long is the user willing to wait?") depend on the file size?

I think it's easier to just increase the threshold a bit. Plus, the congestion marks should get rid of the problem anyway.

  • Is "content repository" an in-memory structure?

Intuitively, I thought of a permanent on-disk storage place, the way the term "repository" is used in the original CCN paper. However, I realize that putchunks serves the content from memory.

You haven't answered the question. If it's in memory, we already have InMemoryStorage that we can use. An on-disk repository is more complicated.

Yeah, I would say let's start with the simpler solution.

I vaguely remember running experiments with a permanent repository. But that might have been with the ccnx-0.8.2 software, ccnputfile and ccngetfile.

#15

Updated by Davide Pesavento about 3 years ago

Klaus Schneider wrote:

Davide Pesavento wrote:

Maybe a global timeout ("how long is the user willing to wait?") makes more sense than a max no. of retries.

Although that creates a couple of new questions: How is the waiting time measured? From the first request of the file? Or is the timer reset for each request?

Also, doesn't the tolerance ("how long is the user willing to wait?") depend on the file size?

I realized I haven't really explained what I meant. As long as the transfer progresses, there's no timeout. The countdown starts when the transfer stalls (no data packets are received anymore). This model matches what I, as a user, would expect from a simple "file download" application, and I suppose current web browsers also operate in a similar manner.
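A minimal sketch of this stall-based countdown, assuming a hypothetical StallTimer helper that is not part of ndn-cxx. The caller passes in the current time explicitly, which keeps the logic independent of any event loop. The limit is whatever the application chooses.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper for illustration; not an ndn-cxx class.
class StallTimer {
public:
  using Clock = std::chrono::steady_clock;

  StallTimer(Clock::duration limit, Clock::time_point now)
    : m_limit(limit)
    , m_lastProgress(now)
  {
    assert(limit > Clock::duration::zero());
  }

  // Call whenever a new data packet arrives: the countdown restarts,
  // so a transfer that keeps progressing never times out.
  void onProgress(Clock::time_point now)
  {
    m_lastProgress = now;
  }

  // True once no progress has been seen for at least `limit`.
  bool hasStalled(Clock::time_point now) const
  {
    return now - m_lastProgress >= m_limit;
  }

private:
  Clock::duration m_limit;
  Clock::time_point m_lastProgress;
};
```

With this shape, the total download time is unbounded as long as packets keep arriving; only the gap since the most recent packet is compared against the limit.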

#16

Updated by Klaus Schneider about 3 years ago

Davide Pesavento wrote:

Klaus Schneider wrote:

Davide Pesavento wrote:

Maybe a global timeout ("how long is the user willing to wait?") makes more sense than a max no. of retries.

Although that creates a couple of new questions: How is the waiting time measured? From the first request of the file? Or is the timer reset for each request?

Also, doesn't the tolerance ("how long is the user willing to wait?") depend on the file size?

I realized I haven't really explained what I meant. As long as the transfer progresses, there's no timeout. The countdown starts when the transfer stalls (no data packets are received anymore). This model matches what I, as a user, would expect from a simple "file download" application, and I suppose current web browsers also operate in a similar manner.

Yeah, this sounds good. So how long should this timeout be? 1-2 seconds?

#17

Updated by Davide Pesavento about 3 years ago

Klaus Schneider wrote:

Yeah, this sounds good. So how long should this timeout be? 1-2 seconds?

It should be provided by the application. But the default should be much longer than that... I was thinking 30-60 seconds.

#18

Updated by Klaus Schneider about 3 years ago

Davide Pesavento wrote:

Klaus Schneider wrote:

Yeah, this sounds good. So how long should this timeout be? 1-2 seconds?

It should be provided by the application. But the default should be much longer than that... I was thinking 30-60 seconds.

Sounds too long to me.

Unless we're dealing with mobility/intermittent connectivity, receiving no packets whatsoever for 2 seconds is already an extraordinarily rare event.

I don't think the default should be optimized for that case. But I do agree that the application should be able to choose that value.

#19

Updated by Davide Pesavento about 3 years ago

In 2018, I consider it a bug to not take mobility or imperfect connection conditions into account, regardless of the application. I don't want my download to fail because my wifi card is roaming to a different AP, or because my phone switched from wifi to cellular or vice versa.

#20

Updated by Klaus Schneider about 3 years ago

Fair enough, but even those things take less than 5 seconds.

On the other hand, if my connection goes down or some link fails, I don't want to wait 30-60s for the application to tell me.

Maybe we resolve both needs by having the application print a message after X seconds that says "currently experiencing connectivity problems", just like Bluejeans does sometimes (at least on my connection).

#21

Updated by Junxiao Shi about 3 years ago

Two parts are missing in note-7 design:

  • At consumer: how to input the trust schema to the retriever?
  • At producer: how to publish a continuous stream as opposed to an object with a finite ending? It’s fine to pause reading from the stream when consumer isn’t making progress, similar to how ‘netcat’ works.
#22

Updated by Klaus Schneider about 3 years ago

Junxiao Shi wrote:

Two parts are missing in note-7 design:

  • At consumer: how to input the trust schema to the retriever?

No idea. Don't we have security people to figure this out? :)

As far as I know, ndncatchunks doesn't perform any authenticity/integrity checks on the data either.

  • At producer: how to publish a continuous stream as opposed to an object with a finite ending? It’s fine to pause reading from the stream when consumer isn’t making progress, similar to how ‘netcat’ works.

Maybe we can focus on finite objects first, and worry about continuous streams later?

#23

Updated by Eric Newberry about 3 years ago

  • Status changed from New to In Progress
#24

Updated by Eric Newberry about 3 years ago

Klaus Schneider wrote:

Eric Newberry wrote:

Should the putObject method above be blocking or non-blocking?

Can we support both?

To me, it only makes sense that putObject would split up the content into chunks and then serve it asynchronously, returning after it sets up the InterestCallback.

For getObject, I think we can only support asynchronous operation because expressInterest is asynchronous.

#25

Updated by Eric Newberry about 3 years ago

Eric Newberry wrote:

Klaus Schneider wrote:

Eric Newberry wrote:

Should the putObject method above be blocking or non-blocking?

Can we support both?

To me, it only makes sense that putObject would split up the content into chunks and then serve it asynchronously, returning after it sets up the InterestCallback.

For getObject, I think we can only support asynchronous operation because expressInterest is asynchronous.

Disregard both. I forgot about the mechanics of Face::processEvents.

#26

Updated by Klaus Schneider almost 3 years ago

  • Description updated (diff)

The Cisco guys presented an update on their socket API at the last ICNRG meeting: https://datatracker.ietf.org/meeting/interim-2018-icnrg-01/materials/slides-interim-2018-icnrg-01-sessa-hicn-socket-library-for-http-luca-muscariello-00

Might be interesting to figure out the differences from our API design.

#27

Updated by Klaus Schneider almost 3 years ago

  • Priority changed from Normal to High
#28

Updated by Klaus Schneider almost 3 years ago

  • Status changed from In Progress to Code review
  • % Done changed from 0 to 10
#30

Updated by Junxiao Shi almost 3 years ago

At consumer: how to input the trust schema to the retriever?

No idea. Don't we have security people to figure this out? :)

It needs a Validator at least. You can report validation errors to the callers, although caller's processing logic is tricky (see #3663).

At producer: how to publish a continuous stream as opposed to an object with a finite ending? It’s fine to pause reading from the stream when consumer isn’t making progress, similar to how ‘netcat’ works.

Maybe we can focus on finite objects first, and worry about continuous streams later?

Continuous streams are more important (see #4396). I don't see how "finite object publishing" is related to congestion control. It is just a regular file server.

https://gerrit.named-data.net/c/4651/

In patchset5 the code is in src/util/ as two new classes.
I disagree with having producer side at all.
Consumer side should be integrated into SegmentFetcher that essentially does the same. You can move it to src/socket/ if it's going to be more complicated.

#31

Updated by Klaus Schneider almost 3 years ago

Junxiao Shi wrote:

At consumer: how to input the trust schema to the retriever?

No idea. Don't we have security people to figure this out? :)

It needs a Validator at least. You can report validation errors to the callers, although caller's processing logic is tricky (see #3663).

Yeah, we can think about that.

At producer: how to publish a continuous stream as opposed to an object with a finite ending? It’s fine to pause reading from the stream when consumer isn’t making progress, similar to how ‘netcat’ works.

Maybe we can focus on finite objects first, and worry about continuous streams later?

Continuous streams are more important (see #4396).

#4396 talks about "real-time applications". How does it show that those are more important than all the other applications?

I don't see how "finite object publishing" is related to congestion control.

Finite objects that consist of multiple chunks need congestion control to avoid overloading any links/routers inside the network. I really don't see any difference to streams in this regard.

It is just a regular file server.

It's like developing a regular file server without being able to use the TCP protocol. The transport-layer part is likely the larger contribution of this API.

https://gerrit.named-data.net/c/4651/

In patchset5 the code is in src/util/ as two new classes.
I disagree with having producer side at all.

How can you fetch content without having a producer?

Consumer side should be integrated into SegmentFetcher that essentially does the same. You can move it to src/socket/ if it's going to be more complicated.

Can you give me a link to "SegmentFetcher"?

#32

Updated by Junxiao Shi almost 3 years ago

I don't see how "finite object publishing" is related to congestion control.

Finite objects that consist of multiple chunks need congestion control to avoid overloading any links/routers inside the network. I really don't see any difference to streams in this regard.

In 4651,5 CongestionProducer I see nothing related to congestion control. Its Interest processing behavior has no difference from a standard repository.

I disagree with having producer side at all.

How can you fetch content without having a producer?

Use a standard repository such as repo-sql or NDNFS.

Consumer side should be integrated into SegmentFetcher that essentially does the same. You can move it to src/socket/ if it's going to be more complicated.

Can you give me a link to "SegmentFetcher"?

See src/util/segment-fetcher.hpp.

#33

Updated by Klaus Schneider almost 3 years ago

Junxiao Shi wrote:

I don't see how "finite object publishing" is related to congestion control.

Finite objects that consist of multiple chunks need congestion control to avoid overloading any links/routers inside the network. I really don't see any difference to streams in this regard.

In 4651,5 CongestionProducer I see nothing related to congestion control.

Well, that's because the congestion control part is implemented on the consumer side. You can find it in the CongestionConsumer class.

I still don't see how this has anything to do with the difference between streams and finite objects.

Its Interest processing behavior has no difference from a standard repository.

The producer decides two things: 1) how to segment a content object into chunks and 2) how to name those chunks.

This may be the same as a "standard repository", but then the behavior of the standard repository needs to be part of the API. For example, the consumer needs to know the naming convention.

I disagree with having producer side at all.

How can you fetch content without having a producer?

Use a standard repository such as repo-sql or NDNFS.

See my comment above.

Consumer side should be integrated into SegmentFetcher that essentially does the same. You can move it to src/socket/ if it's going to be more complicated.

Can you give me a link to "SegmentFetcher"?

See src/util/segment-fetcher.hpp.

Our API seems to be higher-level than the SegmentFetcher. That is, it provides a "fetchObject()" rather than a "fetchNextSegment()" function.

But I agree that there seems to be a lot of overlap. Maybe Eric can look into it.

#34

Updated by Junxiao Shi almost 3 years ago

The producer decides two things: 1) how to segment a content object into chunks and 2) how to name those chunks.

This may be the same as a "standard repository", but then the behavior of the standard repository needs to be part of the API. For example, the consumer needs to know the naming convention.

Use a standard repository such as repo-sql or NDNFS.

ndnputfile inserts a file (i.e. finite object) into a repo-ng instance; there's no equivalent in repo-sql but you can send a pull request or I can add one in the future.
NDNFS natively supports files via FUSE.
They all use the same naming convention that is compatible with ndnputchunks.

Our API seems to be higher-level than the SegmentFetcher. That is, it provides a "fetchObject()" rather than a "fetchNextSegment()" function.

The main API of SegmentFetcher is SegmentFetcher::fetch static function, which retrieves a finite object, not an individual segment. fetchNextSegment is a private function.

#35

Updated by Klaus Schneider almost 3 years ago

Junxiao Shi wrote:

The producer decides two things: 1) how to segment a content object into chunks and 2) how to name those chunks.

This may be the same as a "standard repository", but then the behavior of the standard repository needs to be part of the API. For example, the consumer needs to know the naming convention.

Use a standard repository such as repo-sql or NDNFS.

ndnputfile inserts a file (i.e. finite object) into a repo-ng instance; there's no equivalent in repo-sql but you can send a pull request or I can add one in the future.
NDNFS natively supports files via FUSE.
They all use the same naming convention that is compatible with ndnputchunks.

So the existing repos and putchunks all use the same unwritten naming conventions.

I think it's useful to make these conventions explicit and part of the API.

Our API seems to be higher-level than the SegmentFetcher. That is, it provides a "fetchObject()" rather than a "fetchNextSegment()" function.

The main API of SegmentFetcher is SegmentFetcher::fetch static function, which retrieves a finite object, not an individual segment. fetchNextSegment is a private function.

Oh, I must have overlooked that. We should see how much of the code we can reuse.

#36

Updated by Eric Newberry almost 3 years ago

Junxiao Shi wrote:

At producer: how to publish a continuous stream as opposed to an object with a finite ending? It’s fine to pause reading from the stream when consumer isn’t making progress, similar to how ‘netcat’ works.

Maybe we can focus on finite objects first, and worry about continuous streams later?

Continuous streams are more important (see #4396). I don't see how "finite object publishing" is related to congestion control. It is just a regular file server.

I disagree - both are important. Congestion control is related to "finite object publishing" or any other transfer because the amount of data in flight may congest some links, no matter if it is a continuous stream or a finite object. This is especially true for a finite object when the object is large. However, in this particular scenario, we just want to make an API for finite objects to demonstrate congestion control and allow it to be easily adopted by NDN applications.
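To make the consumer-side idea concrete, here is a minimal additive-increase/multiplicative-decrease (AIMD) window sketch. AIMD is used here as a common textbook scheme, not as a statement of what the merged code does; the AimdWindow class, its parameter names, and its defaults are all hypothetical. The window bounds how many Interests for segments may be outstanding at once, whether the object is finite or a stream.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical AIMD congestion window; names and defaults are illustrative.
class AimdWindow {
public:
  explicit AimdWindow(double initial = 1.0,
                      double additiveStep = 1.0,
                      double decreaseFactor = 0.5)
    : m_cwnd(initial)
    , m_step(additiveStep)
    , m_factor(decreaseFactor)
  {
    assert(initial >= 1.0);
    assert(additiveStep > 0.0);
    assert(decreaseFactor > 0.0 && decreaseFactor < 1.0);
  }

  // A segment arrived with no congestion signal: additive increase,
  // roughly `step` segments per round trip.
  void onData()
  {
    m_cwnd += m_step / m_cwnd;
  }

  // A timeout or congestion mark was observed: multiplicative decrease,
  // never dropping below one outstanding Interest.
  void onCongestion()
  {
    m_cwnd = std::max(1.0, m_cwnd * m_factor);
  }

  double size() const
  {
    return m_cwnd;
  }

private:
  double m_cwnd;
  double m_step;
  double m_factor;
};
```

A real implementation would also need to avoid reacting more than once per round trip to the same congestion event; that is omitted here.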

#37

Updated by Eric Newberry almost 3 years ago

Klaus Schneider wrote:

Junxiao Shi wrote:

The producer decides two things: 1) how to segment a content object into chunks and 2) how to name those chunks.

This may be the same as a "standard repository", but then the behavior of the standard repository needs to be part of the API. For example, the consumer needs to know the naming convention.

Use a standard repository such as repo-sql or NDNFS.

ndnputfile inserts a file (i.e. finite object) into a repo-ng instance; there's no equivalent in repo-sql but you can send a pull request or I can add one in the future.
NDNFS natively supports files via FUSE.
They all use the same naming convention that is compatible with ndnputchunks.

So the existing repos and putchunks all use the same unwritten naming conventions.

I think it's useful to make these conventions explicit and part of the API.

I 100% agree. Nothing should be "assumed" in a protocol without being explicitly written down.

#38

Updated by Eric Newberry almost 3 years ago

I think it would be good to see whether congestion control can be merged with the SegmentFetcher API, instead of as a separate component. I'll evaluate this and report back.

#39

Updated by Davide Pesavento almost 3 years ago

They all use the same naming convention that is compatible with ndnputchunks.

So the existing repos and putchunks all use the same unwritten naming conventions.

I think it's useful to make these conventions explicit and part of the API.

I 100% agree. Nothing should be "assumed" in a protocol without being explicitly written down.

Are you talking about https://named-data.net/wp-content/uploads/2014/08/ndn-tr-22-ndn-memo-naming-conventions.pdf ?

#40

Updated by Klaus Schneider almost 3 years ago

I was thinking about getting rid of the version numbers in the name, in order to simplify the API.

For catchunks the version numbers are a major pain (requires a discovery mechanism and a timeout in the beginning) + they seem to be barely used by anyone.

In fact, I'm not sure if ndnputchunks even supports uploading the same content with a newer version.

If someone wants a new version, they can just change the name slightly, or add their own version prefix for the name.

Any comments or use cases that I'm missing here?

#41

Updated by Junxiao Shi almost 3 years ago

  • Status changed from Code review to Feedback

I was thinking about getting rid of the version numbers in the name, in order to simplify the API.
For catchunks the version numbers are a major pain (requires a discovery mechanism and a timeout in the beginning) + they seem to be barely used by anyone.

Versioning is independent from congestion control. You can keep SegmentFetcher’s very simple version discovery (send ChildSelector=1 and take whatever version is returned) for now, and focus on how to add congestion control to the retrieval process after the version is determined.
In #4555 we can consider removing or changing SegmentFetcher’s version discovery procedure.

#42

Updated by Junxiao Shi almost 3 years ago

Do not change SegmentFetcher::fetch function. Complete #4464 first before adding more features.

Additional explanation:
Parameters to the congestion control algorithm should be provided as getter/setter on the SegmentFetcher instance. You may either define multiple getters and setters, or combine several parameters into a POD struct and set all of them at once.
Statistics of the congestion control algorithms can be collected internally, and exported via getter. If collection of some statistics would cause high performance overhead, they should be disabled initially and enabled via a function call.
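One way to read this suggestion is sketched below. All names here (FetchOptions, FetchStats, Fetcher, and every field) are hypothetical and are not claimed to match the ndn-cxx SegmentFetcher; the sketch only illustrates the pattern of a plain options struct set in one call, plus statistics that are off by default and exported through a getter.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical POD-style parameter bundle; field names are illustrative.
struct FetchOptions {
  double initCwnd = 1.0;                        // initial window, in segments
  double aiStep = 1.0;                          // additive increase step
  double mdCoef = 0.5;                          // multiplicative decrease factor
  std::chrono::milliseconds maxTimeout{60000};  // give-up threshold
  bool collectStats = false;                    // statistics are opt-in
};

// Hypothetical counters exported via a getter.
struct FetchStats {
  std::uint64_t nSegmentsReceived = 0;
  std::uint64_t nRetransmissions = 0;
};

class Fetcher {
public:
  const FetchOptions& getOptions() const
  {
    return m_options;
  }

  // Set all parameters at once; reject obviously invalid values.
  void setOptions(const FetchOptions& options)
  {
    assert(options.initCwnd >= 1.0);
    assert(options.aiStep > 0.0);
    assert(options.mdCoef > 0.0 && options.mdCoef < 1.0);
    m_options = options;
  }

  const FetchStats& getStats() const
  {
    return m_stats;
  }

  // Stand-in for the internal bookkeeping done when a segment arrives;
  // counting is skipped unless statistics were enabled.
  void recordSegment()
  {
    if (m_options.collectStats) {
      ++m_stats.nSegmentsReceived;
    }
  }

private:
  FetchOptions m_options;
  FetchStats m_stats;
};
```

Copying the current options, changing a few fields, and passing the struct back keeps the fetch entry point itself unchanged, which is the constraint stated above.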

Regarding #4555: There’s no blocking relation. However, this issue must not introduce new selector usage.

#43

Updated by Junxiao Shi almost 3 years ago

  • Blocked by Task #4464: Make SegmentFetcher pure Signal-based added
#44

Updated by Eric Newberry almost 3 years ago

  • Status changed from Feedback to Code review
  • % Done changed from 10 to 100
#45

Updated by Eric Newberry over 2 years ago

  • Blocks Task #4637: ndncatchunks: use SegmentFetcher added
#46

Updated by Eric Newberry over 2 years ago

  • Status changed from Code review to Closed
#47

Updated by Jeff Thompson over 2 years ago

Where is the congestion control API documented?

#48

Updated by Eric Newberry over 2 years ago

Jeff Thompson wrote:

Where is the congestion control API documented?

Are you looking for end-user docs (e.g., how to use the API) or docs on the internal design of the fetching API?

For the former, docs should be available in-line and on Doxygen (util::SegmentFetcher).

#49

Updated by Davide Pesavento over 2 years ago

Jeff Thompson wrote:

Where is the congestion control API documented?

We actually did not design any new API in this task. The code that was merged simply implements congestion control in the existing SegmentFetcher component. The issue title should be changed to avoid confusion.

#50

Updated by Davide Pesavento over 2 years ago

Also, I'm not sure what "congestion control API" means. Congestion control is not an API.

#51

Updated by Davide Pesavento over 2 years ago

  • Subject changed from Design Simple Consumer API for Congestion Control to Implement congestion control in SegmentFetcher
  • Category changed from Base to Utils
#52

Updated by Eric Newberry over 2 years ago

Davide Pesavento wrote:

Jeff Thompson wrote:

Where is the congestion control API documented?

We actually did not design any new API in this task. The code that was merged simply implements congestion control in the existing SegmentFetcher component. The issue title should be changed to avoid confusion.

However, the SegmentFetcher API was changed recently in #4464.

#53

Updated by Davide Pesavento over 2 years ago

Yes, but that had nothing to do with congestion control.
