Task #3486
Scripts to log NFD counters across testbed during runs (open)
Added by Jeff Burke over 8 years ago. Updated about 8 years ago.
Description
Do we have, or could we make, scripts that gather table size, CPU, memory use, etc. every few seconds from testbed nodes? This might help us see trends. We might also consider visualizing these stats on ndnmap. Peter, can you follow up with John DeHart?
Updated by Alex Afanasyev over 8 years ago
At some point I suggested adding Cacti (http://www.cacti.net/) or something similar to record various historical information about routers. While it is primarily designed to work over the SNMP protocol, it can also use custom scripts to obtain information from servers.
Updated by Alex Afanasyev over 8 years ago
(Btw, this task is not directly related to ndnrtc, but rather to Testbed monitoring.)
Updated by John DeHart over 8 years ago
Alex Afanasyev wrote:
At some point I suggested adding Cacti (http://www.cacti.net/) or something similar to record various historical information about routers. While it is primarily designed to work over the SNMP protocol, it can also use custom scripts to obtain information from servers.
Yes, I am actually working on Cacti monitoring of Testbed stats right now. I have some of it working
and will hopefully have something to show/report in the next few days.
I've also looked at ELK (elasticsearch.com), but that is much more heavyweight and doesn't
seem to fit our use model.
Updated by John DeHart over 8 years ago
I'd like to suggest that for the next couple of tests we limit the number of nodes that are active in the test.
For the seminar from CSU, perhaps we could have users at CSU, WU, UCLA and REMAP.
And users connecting to those sites should all be local to those sites.
At this point we don't want more variables; we want fewer, until we get the basic functionality working well.
Updated by Jeff Burke over 8 years ago
@Peter, can you help coordinate John's request with the presenter/viewers?
@Alex, yes, I agree - I couldn't figure out a better Redmine location for this issue, but we could move it...
Updated by Peter Gusev over 8 years ago
@John, so we limit our users to these four hubs for the next seminar: CSU, WU, UCLA, REMAP, right?
Updated by John DeHart over 8 years ago
As a first step, I have scripts running that every 10 seconds collect the nfd process size, the number of PIT entries, and the number of NameTree entries
on each of CSU, UCLA, REMAP and WU. Next, I'll work on getting that data onto real-time graphs.
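For anyone curious, here is a minimal sketch of what such a collector could look like (a hypothetical illustration, not John's actual script). It assumes nfd-status is on the PATH and that its general status section reports nPitEntries= and nNameTreeEntries= fields; the NFD process size is read via ps and free memory from /proc/meminfo:

#!/usr/bin/env python
# Sketch of a periodic NFD stats collector (hypothetical, for illustration).
# Assumes nfd-status is on the PATH and that its output contains
# "nPitEntries=" and "nNameTreeEntries=" fields.
import re
import subprocess
import time

INTERVAL = 10  # seconds, matching the 10-second sampling described above

def nfd_table_sizes():
    out = subprocess.check_output(['nfd-status'], universal_newlines=True)
    pit = re.search(r'nPitEntries=(\d+)', out)
    ntree = re.search(r'nNameTreeEntries=(\d+)', out)
    return (int(pit.group(1)) if pit else -1,
            int(ntree.group(1)) if ntree else -1)

def nfd_proc_size_kb():
    # Virtual size (kB) of the running nfd process, via ps; -1 if not running
    try:
        out = subprocess.check_output(['ps', '-C', 'nfd', '-o', 'vsz='],
                                      universal_newlines=True).strip()
        return int(out.split()[0]) if out else -1
    except subprocess.CalledProcessError:
        return -1

def free_mem_kb():
    with open('/proc/meminfo') as f:
        for line in f:
            if line.startswith('MemFree:'):
                return int(line.split()[1])
    return -1

if __name__ == '__main__':
    with open('nfd-stats.log', 'a') as log:
        while True:
            pit, ntree = nfd_table_sizes()
            log.write('%d %d %d %d %d\n' % (int(time.time()), free_mem_kb(),
                                            nfd_proc_size_kb(), pit, ntree))
            log.flush()
            time.sleep(INTERVAL)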
Updated by John DeHart over 8 years ago
Peter Gusev wrote:
@John, so we limit our users to these four hubs for the next seminar: CSU, WU, UCLA, REMAP, right?
Yes, and I would also suggest we limit the number of users. I would say at most 1 or 2 per site.
Updated by John DeHart over 8 years ago
And how far ahead of time can we set up the ndnrtc conference, so we can try it before the NDN seminar starts?
Updated by Jeff Burke over 8 years ago
Peter, I also think we want to limit to one bandwidth option per stream, perhaps a little higher bandwidth than the "low" option.
Updated by Peter Gusev over 8 years ago
And how far ahead of time can we set up the ndnrtc conference, so we can try it before the NDN seminar starts?
We'll do a test tomorrow with Susmit; the time isn't confirmed yet.
We can start an hour earlier on Wednesday if you think it'll be useful.
Also,
I've also looked at ELK (elasticsearch.com), but that is much more heavyweight and doesn't
seem to fit our use model.
I'm currently developing an ELK-ish approach for NDN-RTC log analysis (not ELK exactly, but OpenTSDB + Metrilyx and InfluxDB + Chronograf right now; no one seems to have a perfect solution for my case). I'll eventually ingest NFD logs as well. This CPU/memory data would be useful for me to ingest too.
Updated by Peter Gusev over 8 years ago
Jeff Burke wrote:
Peter, I also think we want to limit to one bandwidth option per stream, perhaps a little higher bandwidth than the "low" option.
OK, I will instruct Susmit about this.
Updated by John DeHart over 8 years ago
Peter: Should I go ahead and restart nfd on CSU, UCLA, REMAP and WU in preparation for
your test tomorrow with Susmit?
Perhaps I should do Arizona and Illinois since they might be intermediate nodes
between CSU and WU.
Updated by John DeHart over 8 years ago
Here is a sample of what I am collecting right now. Let me know if anyone has any suggestions on other data to collect.
I'm currently collecting this for WU, UCLA, REMAP, CSU, Illinois and Arizona.
WU: 2 cpus of type Intel(R) Core(TM)2 Duo CPU E4600 @ 2.40GHz ; total mem 2048076
UTC_TIME     SYS_FREE_MEM   NFD_PROC_SIZE   NFD_PIT_ENTRIES   NFD_NAME_TREE_ENTRIES
1456785046   259708         953248          621               13417
1456785056   259952         953248          636               13500
1456785067   259892         953248          621               13416
1456785077   260076         953248          621               13415
1456785088   260060         953248          620               13409
1456785099   260036         953248          619               13406
1456785109   260204         953248          619               13406
1456785120   260152         953248          643               13536
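For a quick look at trends in a log with the layout above, something like the following could be used (a hypothetical helper; it assumes the whitespace-separated columns shown and, as an assumption on my part, that the memory figures are in kB):

#!/usr/bin/env python
# Print a human-readable view of the collected log (hypothetical helper).
# Assumes whitespace-separated columns as in the sample above:
# UTC seconds, system free mem, NFD proc size, PIT entries, NameTree entries.
# Memory columns are assumed to be in kB.
import sys
import time

def summarize(path):
    for line in open(path):
        fields = line.split()
        if len(fields) != 5 or not fields[0].isdigit():
            continue  # skip header or malformed lines
        ts, free_kb, proc_kb, pit, ntree = (int(f) for f in fields)
        print('%s  free=%.0fMB  nfd_vsz=%.0fMB  pit=%d  nametree=%d' %
              (time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(ts)),
               free_kb / 1024.0, proc_kb / 1024.0, pit, ntree))

if __name__ == '__main__':
    summarize(sys.argv[1] if len(sys.argv) > 1 else 'nfd-stats.log')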
Updated by Peter Gusev over 8 years ago
John DeHart wrote:
Peter: Should I go ahead and restart nfd on CSU, UCLA, REMAP and WU in preparation for
your test tomorrow with Susmit?
Perhaps I should do Arizona and Illinois since they might be intermediate nodes
between CSU and WU.
Yes, please restart them.
John DeHart wrote:
Here is a sample of what I am collecting right now. Let me know if anyone has any suggestions on other data to collect.
I'm currently collecting this for WU, UCLA, REMAP, CSU, Illinois and Arizona. [sample data as quoted above]
Do you provide public access to these scripts so I can use them?
Updated by John DeHart over 8 years ago
Cacti graphs are starting to come online:
Go to http://ndndemo.arl.wustl.edu/cacti/
Log in as guest with password ndnTest
On the far right is an icon that looks like a plot line.
Click on it. At the top you will then have places where you can select which host, which graph template,
and how many graphs per page to display.
I haven't figured out auto-refresh yet, so you might need to refresh manually. There is a button
at the top for that as well.
The graphs are currently set to sample data every 60 seconds.
And in case anyone is wondering, we are collecting the data for the graphs over NDN.
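For anyone unfamiliar with Cacti: it can poll a custom data-input script and graph whatever the script prints as space-separated name:value pairs. A minimal sketch of such a script (illustrative only; the log path and field names are made up and are not the ones actually used on ndndemo):

#!/usr/bin/env python
# Minimal Cacti data-input script sketch (hypothetical): print one line of
# space-separated name:value pairs that a Cacti Data Input Method can map
# onto data sources. The log path and field names are illustrative only.
LOG = '/var/log/nfd-stats.log'  # hypothetical location of the collector's output

def last_sample(path):
    last = None
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) == 5 and fields[0].isdigit():
                last = fields
    return last

if __name__ == '__main__':
    sample = last_sample(LOG)
    if sample:
        _, free_kb, proc_kb, pit, ntree = sample
        print('free_mem:%s nfd_proc_size:%s pit_entries:%s name_tree_entries:%s'
              % (free_kb, proc_kb, pit, ntree))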
Updated by Alex Afanasyev over 8 years ago
I suggest also tracking the "uptime" value from NFDs.
Updated by Alex Afanasyev over 8 years ago
I would generalize the PIT entries graph to include both NameTree and PIT entries (maybe also FIB, though it wouldn't be of much value).
The bandwidth information would also be helpful. For debugging purposes, I would collect this information both over IP/SNMP and from NFD's face counters.
Updated by John DeHart over 8 years ago
Alex Afanasyev wrote:
I would generalize the PIT entries graph to include both NameTree and PIT entries (maybe also FIB, though it wouldn't be of much value).
As part of my next phase, I am going to work on including multiple data items on one graph. PIT entries and NameTree entries are two obvious choices.
The bandwidth information would also be helpful. For debugging purposes, I would collect this information both over IP/SNMP and from NFD's face counters.
SNMP is not currently open on most Testbed nodes. We can certainly request it, but with my NDN scheme working I'm not sure I see any value in relying on SNMP/IP.
As for collecting data from the NFD face counters, I'll look into how we could do that. Are you thinking about having it per face or total?
And if per face, are we just looking at inter-node faces or including faces to local clients?
Updated by Alex Afanasyev over 8 years ago
SNMP is not currently open on most Testbed nodes. We can certainly request it, but with my NDN scheme working I'm not sure I see any value in relying on SNMP/IP.
SNMP is just a dirty way of getting raw information from the interfaces. NFD counters may (unintentionally) hide some information. We can collect the same numbers over NDN; it's just that you would need to write many more scripts for that, instead of just using SNMP.
As for collecting data from the NFD face counters, I'll look into how we could do that. Are you thinking about having it per face or total?
And if per face, are we just looking at inter-node faces or including faces to local clients?
Per-face would be ideal; however, there is a question of how to do it, given that faces are somewhat dynamic.
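One way to cope with the dynamic faces could be to key the per-face series by the remote FaceUri rather than by the numeric face ID, and accumulate deltas between polls, so a face that gets destroyed and re-created keeps adding to the same series. A rough sketch of just the aggregation step (how the (remote_uri, in_bytes, out_bytes) tuples are obtained from the status tool is deliberately left out):

#!/usr/bin/env python
# Sketch: accumulate per-face byte counters keyed by remote FaceUri, so that
# a face that is destroyed and re-created (new face ID, counters reset) still
# contributes to the same series. Obtaining the tuples is left to the caller.

totals = {}      # remote_uri -> [total_in_bytes, total_out_bytes]
previous = {}    # remote_uri -> (in_bytes, out_bytes) seen at the last poll

def update(faces):
    """faces: iterable of (remote_uri, in_bytes, out_bytes) from one poll."""
    for uri, in_b, out_b in faces:
        prev_in, prev_out = previous.get(uri, (0, 0))
        # If a counter went backwards, the face was re-created; count from zero.
        d_in = in_b - prev_in if in_b >= prev_in else in_b
        d_out = out_b - prev_out if out_b >= prev_out else out_b
        tot = totals.setdefault(uri, [0, 0])
        tot[0] += d_in
        tot[1] += d_out
        previous[uri] = (in_b, out_b)

The per-remote totals could then feed the same kind of Cacti data-input script as the other counters.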
Updated by John DeHart over 8 years ago
I have updated my Cacti graphs to include the following 8 nodes:
Arizona, CAIDA, CSU, MEMPHIS, REMAP, UCI, UCLA, WU
And for each node, there are graphs for the following data:
link traffic
load average
NFD CPU and Memory usage percentage
PIT and NameTree Entries counters
NFD Virtual Memory Size
System Free Memory
NFD Uptime
See http://redmine.named-data.net/issues/3486#note-16 for how to access the graphs.