Feature #3009

Task #2949: Adding libcrypto-based crypto support

Crypto transformation concatenation

Added by Yingdi Yu over 8 years ago. Updated over 8 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Security
Target version:
Start date:
07/02/2015
Due date:
% Done:

100%

Estimated time:

Description

CryptoPP provides a nice C++ interface for concatenating crypto transformations.
OpenSSL provides concatenation to a certain extent, but it is not easy to use.
It would be desirable to provide a C++ wrapper for OpenSSL that achieves CryptoPP-style transformation concatenation.

Actions #1

Updated by Yingdi Yu over 8 years ago

  • Status changed from New to Code review
  • Assignee set to Yingdi Yu
  • % Done changed from 0 to 80
Actions #2

Updated by Junxiao Shi over 8 years ago

I'm reviewing commit:b9b53f729852d047968d0a8fe3dd7683b0d7d229 but I'm really confused by the logic in the Transform base class.

Please provide a flowchart, and create a very simple transform (eg. ROT13) that does not use OpenSSL, to illustrate how Transform is supposed to work.

Actions #3

Updated by Junxiao Shi over 8 years ago

In commit:73b92262c11aadf2144f1595291a75e85ea46ed3, the Transform class works as follows:

  1. .write accepts input octets from the previous filter (such as a Source)
  2. the input is passed to the subclass, called a converter, via the overridden .writeInternal method; .writeInternal may choose to accept fewer octets than offered, and the excess will be rejected by .write and stay in the previous filter
  3. the conversion result stays in the subclass
  4. the conversion result is pulled from the subclass into the output buffer via the overridden .fillOBuf method, where the subclass is responsible for directly manipulating the m_oBuf field declared on the base class
  5. octets in the output buffer are pushed to the next filter

I don't understand why the design uses two buffers: a buffer inside the converter, and the m_oBuf buffer.
I think one of these two buffers can be eliminated.

As a thought experiment, I can't figure out how a ROT13 transform could be efficiently implemented under this design: it seems to require an unnecessary copy.

I also dislike having .fillOBuf manipulate m_oBuf and related fields directly.

The base class should supply .fillOBuf with the range of available space in m_oBuf, and the subclass can write into this range, without worrying about the layout of m_oBuf.
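
For instance, the suggested interface could look like this; a minimal sketch with an assumed signature, not the actual code under review:

// the base class passes a writable range inside m_oBuf, so the subclass
// never touches m_oBuf or its bookkeeping fields directly
class Transform
{
protected:
  /** \brief fill the given range with converted output
   *  \param buf start of the available space in m_oBuf
   *  \param size number of octets available
   *  \return number of octets actually written
   */
  virtual size_t
  fillOBuf(uint8_t* buf, size_t size) = 0;
};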

Actions #4

Updated by Yingdi Yu over 8 years ago

The two buffers are due to the OpenSSL BIO model. A BIO has its internal buffer; however, once you read it out, you can never put it back. So in case we cannot write everything into the next filter, we need another buffer to store the part of the converted result that has been read from the BIO but hasn't been written into the next filter.
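
A minimal sketch of the situation (illustrative only; `next` and `m_oBuf` stand in for the downstream filter and the output buffer in the commit under review):

uint8_t tmp[1024];
int nRead = BIO_read(bio, tmp, sizeof(tmp)); // drains the BIO irreversibly
size_t nAccepted = next->write(tmp, nRead);  // the next filter may accept fewer
if (nAccepted < static_cast<size_t>(nRead)) {
  // the leftover cannot be pushed back into the BIO,
  // hence the second buffer
  m_oBuf.assign(tmp + nAccepted, tmp + nRead);
}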

For fillOBuf, we can change it to the approach you suggest.

Actions #5

Updated by Junxiao Shi over 8 years ago

From bio manpage:

There are two types of BIO, a source/sink BIO and a filter BIO.

BIOs can be joined together to form a chain.
A chain normally consists of one source/sink BIO and one or more filter BIOs.
Data read from or written to the first BIO then traverses the chain to the end (normally a source/sink BIO).

Why are we using OpenSSL BIOs individually and handling data passing ourselves?

We should:

  1. declare a chain of Filters
  2. convert this chain into a chain of BIOs, when .write is invoked for the first time on the Source
  3. push data through the chain of BIOs
  4. read data out of BIO at the Sink

This would save a lot of copying.
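
For concreteness, here is a hedged sketch of what steps 2-4 could look like for a write-driven transform such as base64 encoding, using plain OpenSSL BIO calls (error handling omitted):

#include <openssl/bio.h>
#include <openssl/buffer.h>
#include <openssl/evp.h>
#include <cstdint>
#include <string>

std::string
base64Encode(const uint8_t* data, size_t size)
{
  BIO* b64 = BIO_new(BIO_f_base64());
  BIO_set_flags(b64, BIO_FLAGS_BASE64_NO_NL); // no line breaks in output
  BIO* sink = BIO_new(BIO_s_mem());
  BIO_push(b64, sink); // chain: b64 -> sink

  BIO_write(b64, data, static_cast<int>(size)); // push data through the chain
  BIO_flush(b64); // emit any buffered partial block

  BUF_MEM* mem = nullptr;
  BIO_get_mem_ptr(sink, &mem); // read the result out of the sink
  std::string out(mem->data, mem->length);
  BIO_free_all(b64);
  return out;
}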

Actions #6

Updated by Davide Pesavento over 8 years ago

Given Junxiao's comment above, do we actually need this C++ abstraction at all? Using BIO chains directly doesn't seem too hard, and it would be confined to ndn-cxx's crypto code only anyway...

Actions #7

Updated by Yingdi Yu over 8 years ago

Junxiao Shi wrote:

From bio manpage:

There are two types of BIO, a source/sink BIO and a filter BIO.

BIOs can be joined together to form a chain.
A chain normally consists of one source/sink BIO and one or more filter BIOs.
Data read from or written to the first BIO then traverses the chain to the end (normally a source/sink BIO).

Why are we using OpenSSL BIOs individually and handling data passing ourselves?

We should:

  1. declare a chain of Filters
  2. convert this chain into a chain of BIOs, when .write is invoked for the first time on the Source
  3. push data through the chain of BIOs
  4. read data out of BIO at the Sink

This would save a lot of copying.

I tried that, but it does not work for several reasons:

  1. If we construct a BIO chain, the operation on each BIO item is fixed. In other words, if we write data into the source, then the operation on every subsequent BIO item is also a write. However, the semantics of the BIO read and write operations are not consistent across BIO types. For example, the base64 BIO uses read for decoding and write for encoding, while the hash BIO simply passes the original data through on both read and write, and we have to obtain the hash another way. Even worse, you cannot decode some data and encode it in another form within one chain, because one BIO uses read for decoding while another uses write for encoding (see the sketch after this list).

  2. There is no hex BIO, so unless we write our own hex BIO, we cannot construct a BIO chain if hex encoding is involved. However, a self-defined BIO is strongly discouraged, because there are a lot of control operations on BIOs whose impact on the next BIO item we may not understand. (I tried to write my own hex BIO; it worked, but we still face problem 1.)

  3. The output of a transform is not always read out of the Sink; for decoding, it is often read out of the Source.

  4. Another solution is to write our own BIO mem sink that serves directly as the output buffer, so we do not have to create a separate one. However, openssl does not support all the transformations, and many openssl transformations (such as the digest BIO, sign, and verify) do not use a sink at all. So we still have the problem.

After struggling with these issues in many different ways, I finally gave up and decided to create a separate output buffer...
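
To make reason 1 concrete: while base64 encoding is driven by writes (see the sketch under note-5), decoding must be driven by reads through the same filter type, so the two directions cannot coexist in one write-driven chain. A hedged sketch, with error handling omitted:

#include <openssl/bio.h>
#include <openssl/evp.h>
#include <cstdint>
#include <string>
#include <vector>

std::vector<uint8_t>
base64Decode(const std::string& encoded)
{
  BIO* b64 = BIO_new(BIO_f_base64());
  BIO_set_flags(b64, BIO_FLAGS_BASE64_NO_NL);
  BIO* source = BIO_new_mem_buf(const_cast<char*>(encoded.data()),
                                static_cast<int>(encoded.size()));
  BIO_push(b64, source); // chain: b64 -> source

  std::vector<uint8_t> out(encoded.size()); // decoded size <= encoded size
  int n = BIO_read(b64, out.data(), static_cast<int>(out.size())); // pull model
  out.resize(n > 0 ? n : 0);
  BIO_free_all(b64);
  return out;
}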

Actions #8

Updated by Yingdi Yu over 8 years ago

Davide Pesavento wrote:

Given Junxiao's comment above, do we actually need this C++ abstraction at all? Using BIO chains directly doesn't seem too hard, and it would be confined to ndn-cxx's crypto code anyway...

If you take a look at how the openssl BIO abstraction works and how the cryptopp transformation abstraction works, you would definitely dislike the BIO abstraction, because the BIOs just do not have a consistent usage pattern. Note that these transformations (such as base64, hex, and encryption) are used not only by ndn-cxx but also by some other tools. Although for now we hide the transformation classes, we will probably make them public later.

Actions #9

Updated by Junxiao Shi over 8 years ago

Basically, note-7 is saying:

  • Some BIOs use a push model, eg. base64 encode.
  • Some BIOs use a pull model, eg. base64 decode.
  • Some BIOs pass data through, and use a side-door for the result, eg. hash.

The solution for JIT compilation (note-5 step 2) is:

  • To connect a pull to a push, create a pump that pulls data from the first BIO and pushes it to the second BIO.
  • To connect a push to a pull, create a queue that buffers data from the first BIO, and waits for the second BIO to fetch it.
  • To get data from a side-door, wrap the BIO so that the output is discarded and the side-door is used as the output (see the sketch below).
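
A minimal sketch of the side-door pattern with OpenSSL's digest BIO (illustrative only; error handling omitted):

#include <openssl/bio.h>
#include <openssl/evp.h>
#include <cstdint>

int
sha256SideDoor(const uint8_t* data, size_t size, uint8_t digest[EVP_MAX_MD_SIZE])
{
  BIO* md = BIO_new(BIO_f_md());
  BIO_set_md(md, EVP_sha256());
  BIO* sink = BIO_new(BIO_s_null()); // the pass-through output is discarded
  BIO_push(md, sink);

  BIO_write(md, data, static_cast<int>(size)); // data flows through to the sink

  // the digest comes out of the side-door, not out of the chain
  int digestLen = BIO_gets(md, reinterpret_cast<char*>(digest), EVP_MAX_MD_SIZE);
  BIO_free_all(md);
  return digestLen;
}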

OpenSSL BIO must have a rationale in its design.
We should first understand this rationale before blaming the design.

I can't guess why there's a mix of push and pull models, but this should have been discussed on the OpenSSL developer mailing list when it was first designed, and we need to dig it out.

I guess the side-door for hash is there to improve efficiency:
suppose one wants to compute the hash of some intermediate data; passing the data through allows the chain to be organized more efficiently.

file => gzip => encrypt
             |
             v
            hash

In this imaginary example, a program wants the hash of the gzipped file, but needs to output the encrypted gzipped file.
If the hash is returned on the output, two chains (or a tee) would be needed.
But when the hash is returned on the side-door, this can be achieved with only one chain.

Actions #10

Updated by Junxiao Shi over 8 years ago

  • Tracker changed from Task to Feature
  • Subject changed from Support crypto transformation concatenation to Crypto transformation concatenation
  • Description updated (diff)
Actions #11

Updated by Yingdi Yu over 8 years ago

Junxiao Shi wrote:

OpenSSL BIO must have a rationale in its design.

It is designed mainly for SSL, which usually has a limited number of transformation chain patterns. And openssl is known for bad modularity.

I can't guess why there's a mix of push and pull models, but this should have been discussed on OpenSSL developer mailing list when it's first designed, and we need to dig it out.

I really do not want this to block the progress. Yes, I also don't feel the implementation is efficient, but if the current implementation works and just has some performance issues, can we accept it for now and improve it later if it turns out to be a serious performance bottleneck? We cannot just stop here.

I guess the side-door for hash is there to improve efficiency:
suppose one wants to compute the hash of some intermediate data; passing the data through allows the chain to be organized more efficiently.

file => gzip => encrypt
             |
             v
            hash

In this imaginary example, a program wants the hash of the gzipped file, but needs to output the encrypted gzipped file.
If the hash is returned on the output, two chains (or a tee) would be needed.
But when the hash is returned on the side-door, this can be achieved with only one chain.

Again, we could discuss this with the openssl people, but it should not be a reason to block the progress.

Actions #12

Updated by Yingdi Yu over 8 years ago

Basically, with the current libcrypto interface, we cannot construct a BIO chain that achieves the same transformation abstraction as CryptoPP. That is the reason I gave up on using BIO chains directly.

Actions #13

Updated by Junxiao Shi over 8 years ago

And openssl is known for bad modularity.

Please provide a citation that shows why OpenSSL has bad modularity, and why CryptoPP's model is better.

I do not disagree with other points in note-11.

Actions #14

Updated by Davide Pesavento over 8 years ago

Yingdi Yu wrote:

Davide Pesavento wrote:

Given Junxiao's comment above, do we actually need this C++ abstraction at all? Using BIO chains directly doesn't seem too hard, and it would be confined to ndn-cxx's crypto code anyway...

If you take a look at how the openssl BIO abstraction works and how the cryptopp transformation abstraction works, you would definitely dislike the BIO abstraction, because the BIOs just do not have a consistent usage pattern. Note that these transformations (such as base64, hex, and encryption) are used not only by ndn-cxx but also by some other tools. Although for now we hide the transformation classes, we will probably make them public later.

Fair enough. I've used openssl in the past and I kinda agree that its API is not the most friendly to use, but on the other hand other plain-C crypto libraries I've tried are not any better. Maybe openssl is not the right tool and we should instead look for another better-maintained C++ crypto library, for example botan?

Actions #15

Updated by Yingdi Yu over 8 years ago

Davide Pesavento wrote:

Fair enough. I've used openssl in the past and I kinda agree that its API is not the most friendly to use, but on the other hand other plain-C crypto libraries I've tried are not any better. Maybe openssl is not the right tool and we should instead look for another better-maintained C++ crypto library, for example botan?

The reason we use OpenSSL is that it is widely used and accepted (therefore problems in openssl can be found and fixed promptly). Also, the security people from U Mich suggested that we switch to openssl.

Actions #16

Updated by Yingdi Yu over 8 years ago

Junxiao Shi wrote:

Please provide a citation that shows why OpenSSL has bad modularity, and why CryptoPP's model is better.

I take the words "bad modularity" back. But OpenSSL, as a C library, is apparently less modular than CryptoPP. I think my explanation above already showed why the CryptoPP model is better. It is quite straightforward to construct a chain of transformations: you do not need to worry about whether a step should be a read operation or a write operation, and you do not need to worry about where to get the transformation output. You just create filters and connect them together.
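
For example, a hedged sketch of the CryptoPP style (note how a decoder and an encoder combine freely in one chain, which note-7 explains the BIOs cannot do):

#include <cryptopp/base64.h>
#include <cryptopp/filters.h>
#include <cryptopp/hex.h>
#include <string>

std::string
hexToBase64(const std::string& hexInput)
{
  std::string out;
  // each filter owns the next one; no read-vs-write distinction anywhere
  CryptoPP::StringSource ss(hexInput, true /*pumpAll*/,
    new CryptoPP::HexDecoder(
      new CryptoPP::Base64Encoder(
        new CryptoPP::StringSink(out))));
  return out;
}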

Actions #17

Updated by Junxiao Shi over 8 years ago

I mostly agree with the statement in note-16, and accept the design of unifying all transforms as write-only.

Actions #18

Updated by Alex Afanasyev over 8 years ago

boost/std has some transformation support. Is there any relation to that, and can we use that instead?

The current interface is, I think, inspired by cryptopp, and I'm not convinced yet that it is an ideal one...

Actions #19

Updated by Davide Pesavento over 8 years ago

Alex Afanasyev wrote:

boost/std has some transformation support. Is there any relation to that, and can we use that instead?

Can you be more specific please? I don't know of anything like this in the std library; maybe boost has something... If C++ were more "functional", we'd simply have a bunch of functors, and we'd combine and apply them with map, fold, and so on...

The current interface is, I think, inspired by cryptopp, and I'm not convinced yet that it is an ideal one...

I'm not convinced either, but (1) it's probably good enough, (2) I haven't seen a counterproposal, and (3) I agree with Yingdi that we shouldn't block other tasks because we're not 100% sure about this interface.

Actions #20

Updated by Junxiao Shi over 8 years ago

I agree with Yingdi that we shouldn't block other tasks because we're not 100% sure about this interface.

I disagree with this statement.
There's no urgency in replacing CryptoPP with OpenSSL; or, at least, #2949 provides no evidence of the urgency.
Thus, we should make sure the API design is good before proceeding.

I have no objection to the API design in commit:1cd5f8b3d5c70e7a7e68f26ddce6ffce7bb9784c.
I only disagree with the statement.

I suggest that Alex propose an API design and argue why it's better.

Actions #21

Updated by Junxiao Shi over 8 years ago

As @Davide pointed out in commit:1cd5f8b3d5c70e7a7e68f26ddce6ffce7bb9784c source.hpp line 37:

I believe that the exception hierarchy is way too overengineered. We don't need that kind of granularity.

Currently (in commit:1cd5f8b3d5c70e7a7e68f26ddce6ffce7bb9784c and commit:7c4a16f9f9c67777e131971138cbcf6d02be52f5):

  • Filter::Error is the base class for all exceptions.
  • Every Source/Transform/Sink (eg. SourceStream, HexDecode, SinkBool) has a nested Error class that directly or indirectly inherits from Filter::Error.
  • Transform1::write method may throw Transform1::Error if an error occurs in this Transform. It may also throw the Error of any Transform connected after this Transform.
  • The recommended syntax for using transforms is SourceStream(is, new Base64Decode(new HexEncode(new SinkStream(os)))). Exceptions from any Source/Transform/Sink will ultimately come out of the constructor of the Source, with the original exception type (eg. Base64Decode::Error).

Having multiple exception types is not useful:

  • Using the recommended syntax, a caller has to wrap the whole chain in a try block.

    try {
      SourceStream(is, new Base64Decode(new HexEncode(new SinkStream(os))));
    }
    catch (..
    
  • Although a caller may place multiple catch branches, it's useless in determining which step has failed.

    try {
      SourceStream(is, new AesEncrypt(key1, new AesEncrypt(key2, new SinkStream(os))));
    }
    catch (AesEncrypt::Error& e) {
      // which AesEncrypt has failed?
    }
    catch (..
    

I suggest:

  • Have only one exception type: ndn::security::transform::Error.
  • Indicate which Filter caused the exception via a getIndex property:

    /** \return zero-based index of the Source/Transform/Sink that raises this error
     */
    size_t
    getIndex() const;
    
  • If a Source/Transform/Sink needs to provide useful additional information, it may inherit from the base Error class.
    Otherwise, it should use the base Error class directly.

  • Usage:

    try {
      SourceStream(is, new AesEncrypt(key1, new AesEncrypt(key2, new SinkStream(os))));
    }
    catch (Error& e) {
      switch (e.getIndex()) {
      case 0:
        // error in SourceStream
        break;
      case 1: {
        // error in the first AesEncrypt (key1)
        auto keyError = dynamic_cast<AesEncrypt::KeyError*>(&e);
        if (keyError != nullptr) {
          // KeyError in the first AesEncrypt
        }
        auto ivError = dynamic_cast<AesEncrypt::IvError*>(&e);
        if (ivError != nullptr) {
          // IvError in the first AesEncrypt
        }
        break;
      }
      case 2:
        // error in the second AesEncrypt (key2)
        break;
      case 3:
        // error in SinkStream
        break;
      default:
        BOOST_ASSERT(false);
        break;
      }
    }
    
Actions #22

Updated by Yingdi Yu over 8 years ago

Alex Afanasyev wrote:

boost/std has some transformation support. Is there any relation to that and can we use that one instead?

The current interface, I think, inspired by cryptopp and I'm not convinced yet that it is an ideal one...

I took a look at std::transform; it does not apply to all the crypto transformations (especially when you want to use it with openssl). Given that the current code is working and has CryptoPP as a proof of concept, I do not want to try std::transform.

I did not find the boost transformation support you mentioned; could you give some references?

More importantly, what exactly are you not convinced about? What would be a better one? If you think this one is bad, why did you not object when this task was added? Now that we have finished almost everything, you ask me to abandon all of it?

Actions #23

Updated by Junxiao Shi over 8 years ago

If you think this one is bad, why did you not object when this task was added? Now that we have finished almost everything, you ask me to abandon all of it?

This is exactly why I insist on getting Alex's approval (or at least a statement of "no objection") before starting anything.

Design review is important if you want to minimize wasted work due to design problems.

Actions #24

Updated by Yingdi Yu over 8 years ago

Junxiao Shi wrote:

If you think this one is bad, why did you not object when this task was added? Now that we have finished almost everything, you ask me to abandon all of it?

This is exactly why I insist on getting Alex's approval (or at least a statement of "no objection") before starting anything.

Design review is important if you want to minimize wasted work due to design problems.

I talked to him when I started... and he did not object at that time...

Actions #25

Updated by Yingdi Yu over 8 years ago

Junxiao Shi wrote:

I agree with Yingdi that we shouldn't block other tasks because we're not 100% sure about this interface.

I disagree with this statement.
There's no urgency in replacing CryptoPP with OpenSSL; or, at least, #2949 provides no evidence of the urgency.
Thus, we should make sure the API design is good before proceeding.

Just to clarify why I thought it was "urgent": we plan to provide a refactored KeyChain in the next release (because I want to present the new security interface at NDNCom), and this is part of the KeyChain refactoring (though this change is an implementation detail, and may not be exposed to the public at that time).

Actions #26

Updated by Junxiao Shi over 8 years ago

I talked to him when I started... and he did not object at that time...

Then "no objection" is insufficient.
Typically I'll ask for an "unconditional and non-revokable approval", although I can rarely get one.

In any case, let's see what API @Alex wants to propose.

Actions #27

Updated by Junxiao Shi over 8 years ago

Reply to note-25:

If the new API is equivalent to CryptoPP, KeyChain refactoring may as well use CryptoPP.

Actions #28

Updated by Yingdi Yu over 8 years ago

I generally agree with the comments about the complexity of the exceptions. The reason I kept all the error inheritance there is simply to mirror the class inheritance, no other reason.

I want to ask about the purpose of exceptions in this particular case.
To me, they are only for debugging rather than for recovering or fixing the transformation process,
because the caller is supposed to construct the transformation chain correctly.
If you can fix it later, why not avoid the problem before constructing the chain?
Moreover, if the transformation fails in the middle, there is no way to recover; one can only start a new transformation with correct parameters.
Therefore, in this case, the exception message is more important than the exception types.

I would rather have just Transform::Error and take Junxiao's suggestion of an index indicating where the error happened.

Actions #29

Updated by Yingdi Yu over 8 years ago

Junxiao Shi wrote:

I talked to him when I started... and he did not object at that time...

Then "no objection" is insufficient.
Typically I'll ask for an "unconditional and non-revokable approval", although I can rarely get one.

I do not think that is the way most people work. As you said, even you can rarely get one.

Actions #30

Updated by Yingdi Yu over 8 years ago

Junxiao Shi wrote:

Reply to note-25:

If the new API is equivalent to CryptoPP, KeyChain refactoring may as well use CryptoPP.

But if we already know we will gradually get rid of CryptoPP, do we still want to add more CryptoPP code which will be deleted later?

Actions #31

Updated by Junxiao Shi over 8 years ago

Reply to note-28:

Yes, transform::Error with an index is the only item necessary.

If you find it difficult to count the index from the source, it's also okay to count the index backwards with the sink as zero.

The type of an exception is sometimes useful for producing better error messages.

Suppose a program using AesEncrypt works as a CGI script, and inputs are supplied from the browser.
Knowing the type of an error allows the CGI script to tell the browser to highlight the input field that is causing the error (eg. a weak key).

Thus, the design should leave this possibility open, but does not necessarily need to implement different error types.
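
A minimal sketch of what that single error type could look like (assumed layout, not the final design):

#include <cstddef>
#include <stdexcept>
#include <string>

class Error : public std::runtime_error
{
public:
  Error(size_t index, const std::string& what)
    : std::runtime_error(what)
    , m_index(index)
  {
  }

  /** \return zero-based index of the Source/Transform/Sink that raised this error
   */
  size_t
  getIndex() const
  {
    return m_index;
  }

private:
  size_t m_index; // position of the failing module in the chain
};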

Actions #32

Updated by Junxiao Shi over 8 years ago

Reply to note-30:

This conflicts with your position in #3006 note-9, where you support adding more code dependent on CryptoPP.

Anyways, I don't disagree with #3006 note-9. I also don't disagree with note-30.

Reply to note-29:

Close to "unconditional and non-revokable approval" is getting one simple commit merged.

I typically refuse to work on a new Change until after all dependent Changes have been merged, to reduce wasted work in case the design is rejected.

If I'm to work on #3009, my first commit would only contain Filter Source Sink base classes plus one Source and one Sink.
I won't go any further before this commit is merged.

Actions #33

Updated by Yingdi Yu over 8 years ago

Junxiao Shi wrote:

Reply to note-30:

This conflicts with your position in #3006 note-9, where you support adding more code dependent on CryptoPP.

That is for the master branch, and what Alex does has nothing to do with security. We are talking about the KeyChain refactoring, which is in the feature-keychain branch. The transformation pipeline is mainly designed for security.

Reply to note-29:

Close to "unconditional and non-revokable approval" is getting one simple commit merged.

I typically refuse to work on a new Change until after all dependent Changes have been merged, to reduce wasted work in case the design is rejected.

If I'm to work on #3009, my first commit would only contain Filter Source Sink base classes plus one Source and one Sink.
I won't go any further before this commit is merged.

I don't think this is realistic. We still have not gotten the very first one done after two weeks... and the next commit takes another two weeks... Given that we will have dozens of commits for the whole task, how could we get this done before the next release?

Actions #34

Updated by Junxiao Shi over 8 years ago

Reply to note-33 part1:

When code dependent on CryptoPP is added, regardless of which branch is targeted, it contradicts the #2949 goal and will eventually be removed.

Reply to note-33 part2:

This is a choice between safety (minimizing wasted work in case of a design rejection) and efficiency (setting a closer version target).
You can't have both.

For #3009, supposing Source, Transform, and Sink have been merged, I would start multiple commits in parallel for encoding, digest, encryption, etc., because there's no dependency among them.

Actions #35

Updated by Alex Afanasyev over 8 years ago

I'm not going to propose anything else. I have expressed my concern that the implemented interface looks almost exactly like cryptopp and not like std/boost. What I wanted was just for us to check whether there is anything in boost/std that resembles such a filter concept, instead of reimplementing it (I saw something not exactly the same, but similar in spirit, in the Boost.Iterator library http://www.boost.org/doc/libs/1_58_0/libs/iterator/doc/transform_iterator.html).

I'm not going to block this implementation. We can always change it later, if we so decide.

Actions #36

Updated by Junxiao Shi over 8 years ago

Reply to note-35:

boost::transform_iterator is fundamentally different from ndn::security::transform::Transform.

  • boost::transform_iterator applies a function to every item from the input iterator. The sizes of the input and output must be the same, but the types can differ (see the sketch below).
  • ndn::security::transform::Transform applies a function to the input stream. The sizes of the input and output can differ.
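
For illustration, a small sketch of boost::transform_iterator's one-in-one-out behavior:

#include <boost/iterator/transform_iterator.hpp>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int
square(int x)
{
  return x * x;
}

int
main()
{
  std::vector<int> v{1, 2, 3};
  // exactly one output element per input element; only the value changes
  std::copy(boost::make_transform_iterator(v.begin(), &square),
            boost::make_transform_iterator(v.end(), &square),
            std::ostream_iterator<int>(std::cout, " ")); // prints: 1 4 9
  return 0;
}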

In addition to note-21, one thing I dislike in commit:1cd5f8b3d5c70e7a7e68f26ddce6ffce7bb9784c is the many )s in the recommended syntax:

SourceStream(is, new Transform1(new Transform2(new SinkStream(os))));

The following syntax looks better:

SourceStream(is) >> Transform1() >> Transform2() >> SinkStream(os);

// need to overload:
// Filter& operator>>(Source&, Filter&);
// Filter& operator>>(Filter&, Filter&);
// void operator>>(Filter&, Sink&);

or even:

is >> Transform1() >> Transform2() >> os;

// need to overload:
// Filter& operator>>(std::istream&, Filter&); and other source types as first argument
// Filter& operator>>(Filter&, Filter&);
// void operator>>(Filter&, std::ostream&); and other sink types as second argument

but I'm unsure whether there would be implementation difficulties.

Note: the existing syntax is acceptable to me.

Actions #37

Updated by Yingdi Yu over 8 years ago

Reply to note-36

This was also my first question about the CryptoPP interface when I started the implementation: "why use constructors rather than a stream operator?" I did try the stream operator first. It turned out to be complicated, so I went back to the constructor-based approach.

Actions #38

Updated by Junxiao Shi over 8 years ago

Reply to note-37:

The major difficulty in using operator>> is caused by the left-associativity of the >> operator.

  • In a push model (.write), the sink is the innermost element.
  • >> is left-associative, which means there must be an intermediate type that represents the source plus zero or more transforms.
  • The source cannot start pumping until the chain is connected to a sink.

This leaves us with two problems:

  • What is the intermediate type?
  • How to inform the source to start pumping when a sink is connected?

I wish C++ allowed us to declare operator>> as right-associative, but this is impossible.

Actions #39

Updated by Yingdi Yu over 8 years ago

Junxiao Shi wrote:

Reply to note-37:

This leaves us with two problems:

  • What is the intermediate type?

I did not get what you mean by "intermediate type".

  • How to inform the source to start pumping when a sink is connected?

For this one, my original implementation was to pass a pump callback of the source to the sink when the chain is constructed using ">>".

The most complicated problem is how to hold the previous/next filter in a filter. We need to use either a pointer or a reference due to the polymorphism of the filters. In that case, we cannot have a nice expression such as:

Source(is) >> Transform1() >> Transform2() >> Sink(os).

but only:

Transform1 tr1;
Transform2 tr2;
Sink sink(os);

Source(is) >> tr1 >> tr2 >> sink;

I do not have a good solution for now.

Actions #40

Updated by Davide Pesavento over 8 years ago

Junxiao Shi wrote:

I hope C++ would allow us to override operator>> as right-associative, but this is impossible.

Let's rewrite ndn-cxx in Haskell then ;)

Actions #41

Updated by Junxiao Shi over 8 years ago

Reply to note-39:

The "intermediate type" is the type returned by operator>> before it reaches the sink.

One solution is:

interface Downstream
{
  size_t write(..
};

interface Upstream
{
  void setNext(Downstream* next);
};

interface Pumpable : public Upstream
{
  void pump();
};

class Source : public Pumpable
{
public:
  void setNext(Downstream* next);
  void pump(); // must call .setNext(non-null) before this
};

class Transform : public Upstream, Downstream
{
  void setNext(Downstream* next);
  size_t write(.. // must call .setNext(non-null) before this
};

class Sink : public Downstream
{
};

class IncompleteChain : public Pumpable
{
public:
  IncompleteChain(Pumpable* init, Transform* tail); // calls init->setNext(tail)
  void setNext(Downstream* next); // calls tail->setNext(next)
  void pump(); // calls init->pump()
};

IncompleteChain* operator>>(Pumpable* init, Transform* tail);
void operator>>(Pumpable* init, Sink* tail); // calls init->setNext(tail) and init->pump()

Then, for each source, transform, sink, declare a method that returns a pointer to that type:

(this is common in standard library, eg. std::make_tuple std::make_unique std::inserter)

SourceStream* sourceStream(std::istream& is);
Transform1* transform1();
Transform2* transform2();
SinkStream* sinkStream(std::ostream& os);

And the recommended syntax would be:

sourceStream(is) >> transform1() >> transform2() >> sinkStream(os);

A drawback is: if the chain is not connected to a sink, it causes a memory leak, but there's no runtime error.

This can be solved by changing the regular pointers for Source and IncompleteChain to shared_ptr, and printing a warning in the Source destructor if .pump is never invoked.

But this solution may not be necessary.

Actions #42

Updated by Davide Pesavento over 8 years ago

IncompleteChain* operator>>(Pumpable* init, Transform* tail);

Are you sure you can overload an operator for non-class types? I don't think so.

Actions #43

Updated by Junxiao Shi over 8 years ago

Reply to note-42:

I didn't know this restriction.
It's causing error "operator>> must have an argument of class or enumerated type".

The solution is: use an instance (rather than a pointer) as Source, and use pointers for Transform and Sink, so that operator>> can be overloaded on the Source, and no type needs to be copyable.

A full demo is below (tested on Ubuntu 12.04):

// g++ -o x -std=c++0x x.cpp $(pkg-config --cflags --libs libndn-cxx)
#include <ndn-cxx/common.hpp>

using namespace ndn;

class Downstream
{
public:
  virtual void
  write() = 0;
};

class Upstream
{
public:
  Upstream()
    : m_next(nullptr)
  {
  }

protected:
  void
  appendChain(Downstream* tail);

protected:
  Downstream* m_next;
};

class Transform : public Upstream, public Downstream, noncopyable
{
};

void
Upstream::appendChain(Downstream* tail)
{
  if (m_next == nullptr) {
    m_next = tail;
  }
  else {
    static_cast<Transform*>(m_next)->appendChain(tail);
  }
}


class Sink : public Downstream, noncopyable
{
};

class Source : public Upstream, noncopyable
{
public:
  virtual void
  pump() = 0;

  Source&
  operator>>(Transform* transform)
  {
    this->appendChain(transform);
    return *this;
  }

  void
  operator>>(Sink* sink)
  {
    this->appendChain(sink);
    this->pump();
  }
};

class SourceDemo : public Source
{
public:
  virtual void
  pump() NDN_CXX_DECL_FINAL
  {
    BOOST_ASSERT(m_next != nullptr);
    std::cout << "SourceDemo::pump()\n";
    m_next->write();
  }
};

typedef SourceDemo sourceDemo; // so that sourceDemo() can be called like a function

class Transform1 : public Transform
{
public:
  virtual void
  write() NDN_CXX_DECL_FINAL
  {
    BOOST_ASSERT(m_next != nullptr);
    std::cout << "Transform1::write()\n";
    m_next->write();
  }
};

Transform1*
transform1()
{
  return new Transform1();
}

class Transform2 : public Transform
{
public:
  virtual void
  write() NDN_CXX_DECL_FINAL
  {
    BOOST_ASSERT(m_next != nullptr);
    std::cout << "Transform2::write()\n";
    m_next->write();
  }
};

Transform2*
transform2()
{
  return new Transform2();
}

class SinkDemo : public Sink
{
public:
  virtual void
  write() NDN_CXX_DECL_FINAL
  {
    std::cout << "SinkDemo::write()\n";
  }
};

SinkDemo*
sinkDemo()
{
  return new SinkDemo();
}

int
main(int argc, char** argv)
{
  sourceDemo() >> transform1() >> transform2() >> sinkDemo();
  return 0;
}
Actions #44

Updated by Yingdi Yu over 8 years ago

I am ok with the new interface, implementing it now.

Actions #45

Updated by Davide Pesavento over 8 years ago

I don't see any substantial improvements in this new API, at least not enough to justify multiple inheritance. But whatever, I'm fine either way.

Actions #46

Updated by Junxiao Shi over 8 years ago

Reply to note-45:

All occurrences of "multiple inheritance" in the note-43 design comply with the Google C++ Style Guide, although I'm not using the Interface suffix.

For example, Transform inherits Upstream, implements the Downstream interface, and is marked noncopyable.

Actions #47

Updated by Junxiao Shi over 8 years ago

commit:cb54abfc18e51bdf9af66ea3a7ea0e5ace8435f1 introduces the Source::pumpSome method and a needPumpAll parameter on the Source constructor.

What's the rationale for these?

Actions #48

Updated by Junxiao Shi over 8 years ago

Another question: what's the use case for SinkBool?

Actions #49

Updated by Yingdi Yu over 8 years ago

Junxiao Shi wrote:

commit:cb54abfc18e51bdf9af66ea3a7ea0e5ace8435f1 introduces the Source::pumpSome method and a needPumpAll parameter on the Source constructor.

What's the rationale for these?

I ran into a problem when testing partial hex decoding: the test is to see what the decoder will do when the input does not consist of complete byte pairs. This requires us to control how many bytes are pumped from the source into the next module. To differentiate these behaviors, we need a needPumpAll flag.

Actions #50

Updated by Yingdi Yu over 8 years ago

Junxiao Shi wrote:

Another question, what's the use case for SinkBool?

Some transformations, such as a verifier, may output a boolean value.

Actions #51

Updated by Junxiao Shi over 8 years ago

Reply to note-49:

In fact I thought about this testing need.

Solely having a needPumpAll option is insufficient to force either SourceStream or SourceBuffer to write an odd number of octets into HexDecode.

Instead,

  1. Eliminate the needPumpAll option - it's unnecessary.
  2. Have a special Source that allows controlling exactly what is written into the next module (see the sketch below):
       • The `pump` method does nothing.
       • `write` and `end` methods are added, which call the corresponding methods on the next module.
  3. Unit testing that requires controlling how many octets are written to the next module can use this special Source.
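
A minimal sketch of this special Source, assuming the Downstream interface carries write(buf, size) and end() methods (the name ControlledSource and the exact signatures are illustrative, not final):

class ControlledSource : public Source
{
public:
  virtual void
  pump() NDN_CXX_DECL_FINAL
  {
    // deliberately a no-op: the caller drives the chain directly
  }

  size_t
  write(const uint8_t* buf, size_t size)
  {
    return m_next->write(buf, size); // forward exactly what the caller supplies
  }

  void
  end()
  {
    m_next->end(); // signal end of input to the next module
  }
};
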
Actions #52

Updated by Yingdi Yu over 8 years ago

I also felt that we probably need such a controlled source, so that we can easily implement an accumulated digest.

Actions #53

Updated by Junxiao Shi over 8 years ago

Reply to note-52:

Can you explain what an accumulated digest is, and why a controlled source can help with that?

How does CryptoPP provide this feature?

Actions #54

Updated by Yingdi Yu over 8 years ago

The operation would be

digest.update(...)
digest.update(...)
...
digest.update(...)
digest.finalize();

so that we keep the unfinished result in the digest without blocking others. CryptoPP just provides another class (not a filter) to achieve this. If we have a controlled source, we can reuse the same transformation code.
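
For reference, a sketch of the CryptoPP usage being described, with SHA-256 as an example:

#include <cryptopp/sha.h>
#include <cstddef>
#include <cstdint>

void
accumulatedSha256(const uint8_t* part1, size_t len1,
                  const uint8_t* part2, size_t len2)
{
  CryptoPP::SHA256 hash; // a HashTransformation, not a filter
  uint8_t digest[CryptoPP::SHA256::DIGESTSIZE];

  hash.Update(part1, len1); // the unfinished state stays inside `hash`
  hash.Update(part2, len2);
  hash.Final(digest);       // finalize and write out the digest
}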

Actions #55

Updated by Junxiao Shi over 8 years ago

Reply to note-54:

Is the digest computed over incomplete input? ie, for a 5000-octet input, can I obtain the digest over the first 3000 octets?

If yes, the current API can't support it, because the digest transform does not know when to compute a digest and pass it downstream.

If no, this doesn't require a controlled source, because we can directly call the .write method on the digest transform without using a source.

Actions #56

Updated by Yingdi Yu over 8 years ago

Junxiao Shi wrote:

Reply to note-54:

Is the digest computed over incomplete input? ie, for a 5000-octet input, can I obtain the digest over the first 3000 octets?

If no, this doesn't require a controlled source, because we can directly call the .write method on the digest transform without using a source.

No, but then we need to make appendChain() public; otherwise, there is no way to obtain the output (i.e., to connect the digest transform to a sink).

Actions #57

Updated by Junxiao Shi over 8 years ago

Reply to note-56:

"There's no way to obtain the output" is inaccurate.

unique_ptr<Sha256Digest> transform = sha256Digest();
Sha256Digest* t = transform.get();
dummySource() >> std::move(transform) >> streamSink(os);
t->write(..);
t->write(..);
t->end();

but this is awkward, obviously.


Yes, I agree a "controlled source" is necessary for more than unit testing.

However, ControlledSource would be a bad name.

Depending on semantics, I suggest two possible names:

  • WritableSource: it has .write and .end methods. The .write method may accept part of the input, and the caller should call .write again with the rest of the input.
  • AppendableSource: it has .append and .end methods. The .append method must accept all input, and internally buffer the part of the input not accepted by its downstream.

In terms of simplifying applications, I prefer AppendableSource.

Actions #58

Updated by Yingdi Yu over 8 years ago

Junxiao Shi wrote:

Depending on semantics, I suggest two possible names:

  • WritableSource: it has .write and .end methods. The .write method may accept part of the input, and the caller should call .write again with the rest of the input.
  • AppendableSource: it has .append and .end methods. The .append method must accept all input, and internally buffer the part of the input not accepted by its downstream.

I still prefer "WritableSource"... If we buffer partial input, when will the buffered input be written into the next module? If we rely on the next append call, then we may run into the case where a lot of input is buffered in the source. Giving some feedback to the caller can help them control the input rate.

Actions #59

Updated by Junxiao Shi over 8 years ago

Reply to note-58:

One problem with, but not limited to, WritableSource is that the semantics of a .write return value less than size are ambiguous.

  • In most cases, it means the downstream is blocked, so the upstream (or the caller of WritableSource) should slow down.
  • In some other cases, it means the downstream expects structured records, but the input does not end on a record boundary. Most notably, HexDecode::write has this behavior.

These ambiguous semantics can confuse the upstream, and may lead to an endless loop, such as:

sourceBuffer("0", 1) >> hexDecode() >> sinkStream(os);
Actions #60

Updated by Yingdi Yu over 8 years ago

Junxiao Shi wrote:

One problem with, but not limited to, WritableSource is that the semantics of a .write return value less than size are ambiguous.

  • In some other cases, it means the downstream expects structured records, but the input does not end on a record boundary. Most notably, HexDecode::write has this behavior.

These ambiguous semantics can confuse the upstream, and may lead to an endless loop, such as:

sourceBuffer("0", 1) >> hexDecode() >> sinkStream(os);

This behavior will not happen. In this case, hexDecode.write will return 1 (with the pending byte stored inside hexDecode), and if hexDecode.end() is called, hexDecode will throw an exception saying that the input is incomplete.

Actions #61

Updated by Junxiao Shi over 8 years ago

Reply to note-60:

I see, the design has changed since Change 2200 patchset 10.

It's necessary to clarify in Doxygen that if a Transform expects structured input, it should not return less than size when the final portion of the input is not a complete record.
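
A possible wording for that Doxygen note, as a sketch (the signature is abbreviated, not the actual declaration):

/** \brief accepts input octets
 *  \return number of octets accepted by this module
 *
 *  A return value less than \p size means the downstream is blocked.
 *  A Transform that expects structured input (e.g., hex pairs) must
 *  internally buffer an incomplete trailing record and still count it
 *  as accepted, rather than returning less than \p size.
 */
size_t
write(const uint8_t* buf, size_t size);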

Actions #62

Updated by Junxiao Shi over 8 years ago

In commit:4ea56a55df0ef1da5fef244cb607254886bc67ba, the design of the Transform class is confusing.

The major problem with this design is that a subclass of Transform can operate in one of two modes:

  • OpenSSL BIO:
    1. The input accepted by .doWrite is written into a BIO.
    2. The output is read from the BIO, and placed in OBuffer.
    3. OBuffer content is passed to downstream.
  • non-BIO transform:
    1. The input is converted and written to OBuffer.
    2. OBuffer content is passed to downstream.

This dual-mode operation makes the Transform base class complicated and inefficient.

My recommendation is to split it into multiple classes:

  • Transform has no output buffer, but writes to downstream directly.

    .doWrite procedure works as follows: (use HexEncode as an example)

    if (hasOddOutputChar) {
      if (0 == next.write(oddOutputChar)) {
        return 0;
      }
      hasOddOutputChar = false;
    }
    int nConverted = 0;
    foreach (chunk in input) {
      outputChunk = toHex(chunk);
      int nAccepted = next.write(outputChunk);
      int nConsumed;
      if (nAccepted % 2 == 1) {
        hasOddOutputChar = true;
        oddOutputChar = outputChunk[nAccepted];
        nConsumed = (nAccepted + 1) / 2;
      }
      else {
        hasOddOutputChar = false;
        nConsumed = nAccepted / 2;
      }
      nConverted += nConsumed;
      if (nConsumed < chunk.size()) {
        break;
      }
    }
    return nConverted;
    
  • BufferedTransform has an output buffer. It allows a subclass to write output into the buffer.

    .doWrite procedure works as follows:

    if (OBuffer is not empty) {
      write OBuffer to downstream
    }
    if (OBuffer is not empty) {
      return 0; // don't accept new input if OBuffer is not empty
    }
    size_t nConverted = subclass.convert(input); // subclass should write to OBuffer
    if (OBuffer is not empty) {
      write OBuffer to downstream
    }
    return nConverted;
    
  • BioTransform inherits from BufferedTransform and is tailored for BIO.

    .doWrite procedure works as follows:

    if (OBuffer is not empty) {
      write OBuffer to downstream
    }
    if (OBuffer is not empty) {
      return 0; // don't accept new input if OBuffer is not empty
    }
    read output from the BIO into OBuffer
    if (OBuffer is not empty) {
      write OBuffer to downstream
    }
    if (OBuffer is not empty) {
      return 0; // don't accept new input if OBuffer is not empty
    }
    size_t nConverted = subclass.convert(input); // subclass should pass input to BIO
    read output from the BIO into OBuffer
    if (OBuffer is not empty) {
      write OBuffer to downstream
    }
    return nConverted;
    

In both BufferedTransform and BioTransform, no attempt is made to "fill up" the output buffer.
If the output buffer contains output from previous calls, or the BIO has some output, no new input can be accepted.
In this way, most waiting will be done at the source, so the Transform becomes simpler.

Actions #63

Updated by Junxiao Shi over 8 years ago

From http://gerrit.named-data.net/#/c/2199/15/src/security/transform/transform-base.cpp regarding note-43 static_cast<Transform*>(m_next)->appendChain(tail):

what guarantees that m_next points to a Transform instance?

.appendChain is only invoked by operator>>. After appending a Sink, operator>> returns void, preventing further Downstreams from being appended.

Source could keep track of the last unique_ptr of the chain (which should always be nullptr): unique_ptr<Downstream>* m_attachTheNextModuleHere,
initialized as m_attachTheNextModuleHere = &this->m_next;
Then in operator>> do something like this (untested): *m_attachTheNextModuleHere = std::move(tail); m_attachTheNextModuleHere = &(*m_attachTheNextModuleHere)->m_next;
This would also be slightly more efficient (fewer function calls and unique_ptr moves).

No, this effectively makes Source a Chain.

It also does not eliminate the static_cast because &(*m_attachTheNextModuleHere) has type Downstream*.


If you want to keep a reference to the tail pointer, use a separate Chain class:

class Chain : noncopyable
{
public:
  explicit
  Chain(unique_ptr<Source> source);

  Chain&
  operator>>(unique_ptr<Transform> transform);

  void
  operator>>(unique_ptr<Sink> sink);

private:
  unique_ptr<Source> m_source;
  unique_ptr<Downstream>* m_tail;
};

Make Chain a friend of Upstream so that operator>> can access the m_next field. Eliminate the Upstream::appendChain method.

Then change sourceStream as follows:

/** \brief creates a Chain that starts with a SourceStream
 */
Chain
sourceStream(std::istream& is, size_t bufLen = SourceStream::DEFAULT_BUFFER_LEN)
{
  return Chain(make_unique<SourceStream>(is, bufLen));
}

This design makes Source cleaner.

Actions #64

Updated by Davide Pesavento over 8 years ago

Junxiao Shi wrote:

.appendChain is only invoked by operator>>. After appending a Sink, operator>> returns void, preventing further Downstreams from being appended.

Unless I'm missing something, nothing prevents you from keeping a reference while constructing the chain...

auto chain = source() >> trans();
chain >> sink();
chain >> bad_things_happen_now();

No, this effectively makes Source a Chain.

uh? what's the definition of "Chain" for you? Source is an Upstream, therefore it has a next pointer, therefore it is a chain... I don't understand.

It also does not eliminate the static_cast because &(*m_attachTheNextModuleHere) has type Downstream*.

True.

If you want to keep a reference to the tail pointer, use a separate Chain class:
[...]

I won't comment on this additional design change. I have the feeling that the insistence on using operator>> to compose the chain has already made the design significantly more complex and hard to follow, while buying us nothing in terms of benefits.

Actions #65

Updated by Yingdi Yu over 8 years ago

Given that we cannot avoid using static_cast anyway, I will keep the code as is.

Actions #66

Updated by Davide Pesavento over 8 years ago

Well my objection was not to the presence of the static_cast per se, but to the fact that it's an unchecked downcast. What guarantees that it's a safe operation in that context? If the problem can arise only because of erroneous client code, then we should add an assert.

Actions #67

Updated by Yingdi Yu over 8 years ago

Davide Pesavento wrote:

Well my objection was not to the presence of the static_cast per se, but to the fact that it's an unchecked downcast. What guarantees that it's a safe operation in that context? If the problem can arise only because of erroneous client code, then we should add an assert.

I am not familiar with that. Could you provide an example of the assert code?

Actions #68

Updated by Davide Pesavento over 8 years ago

Something like this should work:

BOOST_ASSERT(dynamic_cast<Transform*>(getNext()) != nullptr);
Actions #69

Updated by Junxiao Shi over 8 years ago

In commit:c64adee627ec9bd11dd0eaa35eef22927d8878bf, the StreamSource constructor accepts a bufLen parameter to specify the internal buffer size used when transferring octets from a std::istream to a downstream transformation module.

What's the benefit of allowing to specify buffer size?

Actions #70

Updated by Yingdi Yu over 8 years ago

Junxiao Shi wrote:

What's the benefit of allowing to specify buffer size?

Alex mentioned in PatchSet 10 that it might be useful to make it configurable for optimization. http://gerrit.named-data.net/#/c/2199/10/src/security/transform/source-stream.cpp@76

Actions #71

Updated by Yingdi Yu over 8 years ago

Reply to note-62:

@Junxiao, I do not think you can avoid the output buffer completely. As long as the next module cannot accept the entire transformation result, you need a buffer to hold the pending result. Otherwise, you have to redo the computation, which to me is even worse.

Moreover, what is the benefit of not having an output buffer? The argument about read/write overhead does not stand; the example you gave also creates an internal buffer implicitly.

Actions #72

Updated by Junxiao Shi over 8 years ago

Reply to note-71:

Then rename BufferedTransform to Transform, but still keep BioTransform as a subclass.

Actions #73

Updated by Junxiao Shi over 8 years ago

I notice the merged commit:9e0e00b1f3bddfc7ce2b23bfc639d4eeee5768d6 contains incorrectly named test suites:

tests/unit-tests/security/transform/sink-stream.t.cpp
tests/unit-tests/security/transform/source-buffer.t.cpp
tests/unit-tests/security/transform/source-stream.t.cpp

These should be renamed to conform to UnitTesting in another commit under this issue.

Actions #74

Updated by Yingdi Yu over 8 years ago

  • Status changed from Code review to Closed
  • % Done changed from 80 to 100
Actions #75

Updated by Junxiao Shi over 8 years ago

  • Status changed from Closed to Feedback

This issue cannot close until note-73 is fulfilled.

Actions #76

Updated by Junxiao Shi over 8 years ago

commit:bae21d51840dbcee046ed67a7337db68d823971e was forcefully merged, but that commit still violates the unit testing naming convention. This issue cannot close until another commit fixes those violations, or the convention is changed. See also #2497 note-18.

Actions #77

Updated by Yingdi Yu over 8 years ago

  • Status changed from Feedback to Closed

Junxiao Shi wrote:

commit:bae21d51840dbcee046ed67a7337db68d823971e was forcefully merged, but that commit still violates the unit testing naming convention. This issue cannot close until another commit fixes those violations, or the convention is changed. See also #2497 note-18.

The issue has been addressed; we can close this issue.
