Task #3486
Scripts to log NFD counters across testbed during runs
Status: Open
Description
Do we have, or could we make, scripts that gather table size, CPU, memory use, etc. every few seconds from testbed nodes? This might help us see trends. We might also consider visualizing these stats on ndnmap. Peter, can you follow up with John DeHart?
Updated by Alex Afanasyev over 8 years ago
At some point I suggested adding Cacti (http://www.cacti.net/) or something similar to record various historical information about the routers. While it is primarily designed to work with the SNMP protocol, it can use custom scripts to obtain information from servers.
Updated by Alex Afanasyev over 8 years ago
(Btw. This task is not related directly to ndnrtc, but rather to the Testbed monitoring)
Updated by John DeHart over 8 years ago
Alex Afanasyev wrote:
At some point I suggested adding Cacti (http://www.cacti.net/) or something similar to record various historical information about the routers. While it is primarily designed to work with the SNMP protocol, it can use custom scripts to obtain information from servers.
Yes, I am actually working on Cacti monitoring of Testbed stats right now. I have some stuff working and will hopefully have something to show/report in the next few days.
I've also looked at ELK (elasticsearch.com), but that is much more heavyweight and doesn't seem to fit our use model.
Updated by John DeHart over 8 years ago
I'd like to suggest that for the next couple of tests we limit the number of nodes that are active in the test.
For the Seminar from CSU perhaps we could have users at CSU, WU, UCLA and REMAP.
And users connecting to those sites should all be local to those sites.
At this point we don't want more variables; we want fewer, until we get the basic functionality working well.
Updated by Jeff Burke over 8 years ago
@Peter, can you help coordinate John's request with the presenter/viewers?
@Alex, yes, I agree - I couldn't figure out a better Redmine location for this issue, but we could move it...
Updated by Peter Gusev over 8 years ago
@John so we limit our users to these four hubs for the next seminar: CSU, WU, UCLA, REMAP, right?
Updated by John DeHart over 8 years ago
As a first step, I have scripts running that collect the NFD process size, number of PIT entries, and number of NameTree entries every 10 seconds on each of CSU, UCLA, REMAP, and WU. Next, I'll work on getting that data onto real-time graphs.
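A minimal sketch of what such a collector might look like (an illustration, not John's actual script; the nfd-status field names and /proc locations are assumptions that would need checking against the deployed NFD version):

#!/usr/bin/env python
# Hypothetical 10-second collector; not the actual Testbed script.
import re
import subprocess
import time

def sample():
    # Table sizes from nfd-status; assumes its report contains lines
    # like "nPitEntries=621" and "nNameTreeEntries=13417".
    status = subprocess.check_output(["nfd-status"]).decode()
    pit = re.search(r"nPitEntries=(\d+)", status).group(1)
    nametree = re.search(r"nNameTreeEntries=(\d+)", status).group(1)

    # NFD process size (VmSize, in kB) from /proc; assumes one nfd process.
    pid = subprocess.check_output(["pidof", "nfd"]).split()[0].decode()
    with open("/proc/%s/status" % pid) as f:
        proc_size = re.search(r"VmSize:\s*(\d+)", f.read()).group(1)

    # System free memory (kB) from /proc/meminfo.
    with open("/proc/meminfo") as f:
        free_mem = re.search(r"MemFree:\s*(\d+)", f.read()).group(1)

    return int(time.time()), free_mem, proc_size, pit, nametree

while True:
    print("%d %s %s %s %s" % sample())
    time.sleep(10)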
Updated by John DeHart over 8 years ago
Peter Gusev wrote:
@John so we limit our users to these four hubs for the next seminar: CSU, WU, UCLA, REMAP, right?
Yes, and I would also suggest we limit the number of users. I would say at most 1 or 2 per site.
Updated by John DeHart over 8 years ago
And how far in advance can we set up the ndnrtc conference so we can try it before the NDN seminar starts?
Updated by Jeff Burke over 8 years ago
Peter, I also think we want to limit each stream to one bandwidth option, perhaps a little higher than the "low" option.
Updated by Peter Gusev over 8 years ago
And how far in advance can we set up the ndnrtc conference so we can try it before the NDN seminar starts?
We'll do a test tomorrow w/ Susmit; the time isn't confirmed yet.
We can start an hour earlier on Wednesday if you think it'll be useful.
Also,
I've also looked at ELK (elasticsearch.com), but that is much more heavyweight and doesn't seem to fit our use model.
I'm currently developing an ELK-ish approach for NDN-RTC log analysis (not ELK exactly, but OpenTSDB + Metrilyx and InfluxDB + Chronograf right now; no one seems to have a perfect solution for my case). I'll eventually ingest NFD logs as well, and this CPU/memory data would be useful to have so I can ingest it too.
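For illustration, pushing one of these samples into InfluxDB could look roughly like the sketch below. It assumes the v1 HTTP write API and line protocol; the database name, measurement, and tag/field names are invented, not from Peter's actual pipeline.

#!/usr/bin/env python
# Hypothetical push of one NFD sample into InfluxDB; names are illustrative.
import time
import requests

def push_sample(node, free_mem, proc_size, pit, nametree):
    # InfluxDB line protocol: measurement,tags fields timestamp(ns)
    point = ("nfd_stats,node=%s free_mem=%di,proc_size=%di,"
             "pit_entries=%di,name_tree_entries=%di %d"
             % (node, free_mem, proc_size, pit, nametree,
                int(time.time()) * 10**9))
    r = requests.post("http://localhost:8086/write",
                      params={"db": "testbed"}, data=point)
    r.raise_for_status()

push_sample("wu", 259708, 953248, 621, 13417)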
Updated by Peter Gusev over 8 years ago
Jeff Burke wrote:
Peter, I also think we want to limit each stream to one bandwidth option, perhaps a little higher than the "low" option.
OK, I'll instruct Susmit about this.
Updated by John DeHart over 8 years ago
Peter: Should I go ahead and restart NFD on CSU, UCLA, REMAP, and WU in preparation for
your test tomorrow with Susmit?
Perhaps I should also do Arizona and Illinois, since they might be intermediate nodes
between CSU and WU.
Updated by John DeHart over 8 years ago
Here is a sample of what I am collecting right now. Let me know if anyone has any suggestions on other data to collect.
I'm currently collecting this for WU, UCLA, REMAP, CSU, Illinois and Arizona.
WU: 2 CPUs of type Intel(R) Core(TM)2 Duo CPU E4600 @ 2.40GHz; total mem 2048076 kB

UTC Time     SYSTEM FREE_MEM (kB)  NFD PROC_SIZE (kB)  NFD PIT_ENTRIES  NFD NAME_TREE_ENTRIES
1456785046   259708                953248              621              13417
1456785056   259952                953248              636              13500
1456785067   259892                953248              621              13416
1456785077   260076                953248              621              13415
1456785088   260060                953248              620              13409
1456785099   260036                953248              619              13406
1456785109   260204                953248              619              13406
1456785120   260152                953248              643              13536
Updated by Peter Gusev over 8 years ago
John DeHart wrote:
Peter: Should I go ahead and restart NFD on CSU, UCLA, REMAP, and WU in preparation for
your test tomorrow with Susmit?
Perhaps I should also do Arizona and Illinois, since they might be intermediate nodes
between CSU and WU.
Yes, please restart them.
John DeHart wrote:
Here is a sample of what I am collecting right now. Let me know if anyone has any suggestions on other data to collect.
I'm currently collecting this for WU, UCLA, REMAP, CSU, Illinois and Arizona. [sample data table quoted above]
Do you provide public access to these scripts so I can use them?
Updated by John DeHart over 8 years ago
Cacti graphs are starting to come online:
Go to http://ndndemo.arl.wustl.edu/cacti/
Log in as guest with password ndnTest
On the far right is an icon that looks like a plot line.
Click on that. At the top you will then have controls to select which host, which graph template,
and how many graphs per page to display.
I haven't figured out auto-refresh yet, so you may need to refresh manually; there is a button
at the top for that as well.
The graphs are currently set to sample data every 60 seconds.
And in case anyone is wondering, we are collecting the data for the graphs over NDN.
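(For anyone curious how numbers collected this way might feed Cacti: Cacti data-input methods can call a custom script and parse space-separated name:value pairs from its stdout. The sketch below is hypothetical; the stats-file path and field names are invented, not John's actual setup.)

#!/usr/bin/env python
# Hypothetical Cacti data-input script: print "name:value" pairs on one line.
# Assumes the NDN collector leaves the latest sample for a node in a flat
# file (path and format are illustrative).
import sys

node = sys.argv[1]  # e.g. "wu"
with open("/var/tmp/nfd-stats/%s.latest" % node) as f:
    # one line: <utc> <free_mem> <proc_size> <pit> <nametree>
    utc, free_mem, proc_size, pit, nametree = f.read().split()

# Cacti parses space-separated field:value pairs from stdout.
print("free_mem:%s proc_size:%s pit_entries:%s name_tree_entries:%s"
      % (free_mem, proc_size, pit, nametree))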
Updated by Alex Afanasyev over 8 years ago
I suggest also tracking the "uptime" value from NFDs.
Updated by Alex Afanasyev over 8 years ago
I would generalize the PIT entries graph to include both NameTree and PIT entries (maybe also FIB, though it wouldn't be of much value).
Bandwidth information will also be helpful. For debugging purposes, I would collect it both over IP/SNMP and from NFD's face counters.
Updated by John DeHart over 8 years ago
Alex Afanasyev wrote:
I would generalize the PIT entries graph to include both NameTree and PIT entries (maybe also FIB, though it wouldn't be of much value).
As part of my next phase, I am going to work on including multiple data items on one graph; PIT entries and NameTree entries are two obvious choices.
Bandwidth information will also be helpful. For debugging purposes, I would collect it both over IP/SNMP and from NFD's face counters.
SNMP is not currently open on most Testbed nodes. We can certainly request it, but with my NDN scheme working I'm not sure I see any value in relying on SNMP/IP.
As for collecting data from the NFD face counters, I'll look into how we could do that. Are you thinking about having it per face or total?
And if per face, are we just looking at inter-node faces or including faces to local clients?
Updated by Alex Afanasyev over 8 years ago
SNMP is not currently open on most Testbed nodes. We can certainly request it, but with my NDN scheme working I'm not sure I see any value in relying on SNMP/IP.
SNMP is just a dirty way of getting raw information from the interfaces; NFD counters may (unintentionally) hide some information. We can collect the same numbers over NDN; it is just that you would need to write many more scripts for that, instead of just using SNMP.
As for collecting data from the NFD face counters, I'll look into how we could do that. Are you thinking about having it per face or total?
And if per face, are we just looking at inter-node faces or including faces to local clients?
Per-face would be ideal; however, there is a question of how to do it, given that faces are kind of dynamic.
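One way to cope with dynamic faces might be to key the collected series by remote FaceUri instead of face id, so a series survives a face being destroyed and re-created. A rough sketch; the nfd-status -f output format differs across NFD versions, so the parsing regex below is an assumption that would need adjusting:

#!/usr/bin/env python
# Hypothetical per-face counter scrape keyed by remote FaceUri, not face id.
# Assumes "nfd-status -f" prints lines roughly like:
#   faceid=262 remote=udp4://10.0.1.1:6363 ... counters={in={10i 20d 3000B} out={...}}
import re
import subprocess

def face_counters():
    out = subprocess.check_output(["nfd-status", "-f"]).decode()
    counters = {}
    for m in re.finditer(r"remote=(\S+).*?"
                         r"in=\{(\d+)i (\d+)d.*?(\d+)B\}", out):
        uri, interests, data, in_bytes = m.groups()
        # Sum counters per remote URI; multiple faces to the same remote
        # (or a re-created face) fold into one series.
        prev = counters.get(uri, (0, 0, 0))
        counters[uri] = (prev[0] + int(interests),
                         prev[1] + int(data),
                         prev[2] + int(in_bytes))
    return counters

for uri, (i, d, b) in sorted(face_counters().items()):
    print("%s: %d Interests, %d Data, %d bytes in" % (uri, i, d, b))

Note that cumulative counters reset when a face is re-created, so a real collector would also need to track deltas between polls rather than raw totals.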
Updated by John DeHart over 8 years ago
I have updated my Cacti graphs to include the following 8 nodes:
Arizona, CAIDA, CSU, MEMPHIS, REMAP, UCI, UCLA, WU
And for each node, there are graphs for the following data:
link traffic
load average
NFD CPU and Memory usage percentage
PIT and NameTree Entries counters
NFD Virtual Memory Size
System Free Memory
NFD Uptime
See http://redmine.named-data.net/issues/3486#note-16 for how to access the graphs.