Monitoring a Cisco 675 with Unix

 
CBOS Versions
Syslog Configurations
SNMP Configurations
MRTG Configurations
Line Noise

CBOS Versions and General Configuration

Update: Fri Oct 12 08:21:05 PDT 2001:

We're now running CBOS version 2.4.3. You can once again extract byte traffic from SNMP via MRTG in the same way you could for 2.3.5 (In byte counts are still wrong or 0, so extract traffic using Out byte counts only). Further, the SNMP agent seems to be more stable under NET-SNMP queries such as snmpwalk. Even so, we are now using telnet/expect scripts to extract the most accurate data possible; see below for our current configuration. SNMP traps now correctly carry the eth0-based address instead of 0.0.0.0.

There are some changes regarding IP filters with 2.4.3. For instance, once you set any rule, a "Deny All" rule is automatically added for you. For some help see these two threads ( 1 , 2 ) on filtering. Also, passwords are now stored MD5 encrypted. I note that a "Commander" password was inserted into the configuration even though I don't use Commander. Hmmm.

Update: Fri Apr 6 08:21:05 PDT 2001:

N.B. The CBOS 2.4.1 SNMP service is different from that in previous versions, I think to make MMI (auto-provisioning) available. You won't be able to extract accurate byte traffic from SNMP via MRTG or any other SNMP query. I think I was also able to crash the SNMP agent/engine using snmpwalk: it stopped responding to even the simplest of queries and wouldn't work again until I rebooted the modem.

Previously:

We're using version 2.3.5.012 of CBOS, and we're in bridging mode as per our ISP (olywa.net). In version 2.2.0, queries for stats on eth0 left a curious Alarm in the log:

Feb 20 19:30:11 cisco 000:00:01:44 TCP
	Alarm      MTU value returned by get_ip_mtu was zero

Check for your version using the CBOS command "show version":

cbos>show version

Cisco Broadband Operating System
CBOS (tm) 675 Software (C675-I-M), Version v2.3.5.012 - Release Software
Copyright (c) 1986-2000 by cisco Systems, Inc.
Compiled May  9 2000 15:20:16
NVRAM image at 0x10359d90

*** RFC1483 Bridging Mode Enabled ***

Note that I do not advocate or endorse updating your CBOS. If you choose to upgrade, I strongly encourage you to know exactly what you are doing and how to recover from a disaster. I've had to recover from a fouled upgrade, so don't fool yourself by thinking it won't happen to you. See also Setting up the Cisco 675 for general information and other links. Postyware is a site where one can find CBOS images for both the Cisco 675 and 677 DSL modems.

Measurement and Telemetry with Unix tools

Syslog

It's very easy to set up the Cisco 675 for remote syslogging. The CBOS commands to have the Cisco send syslog messages to a remote host are:

cbos#set syslog on
SYSLOG is enabled

cbos#set syslog remote 192.168.1.254
SYSLOG will now send messages to 192.168.1.254

Cisco sends syslog traffic with the UUCP facility, so in your syslog configuration, override the usual UUCP handling and instead do something like this:

*.info,uucp.none	/var/log/messages
uucp.*			/var/log/cisco.log

to capture cisco system log messages.
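
To check this routing without waiting for the modem to log something, one could send a test datagram tagged with the UUCP facility. The following is a minimal Python sketch, not part of the original setup: the function names, the default host, and the message text are illustrative assumptions. It relies only on the standard syslog priority encoding (facility * 8 + severity, with uucp = 8 and info = 6).

```python
import socket

LOG_UUCP = 8   # standard syslog facility number for uucp
LOG_INFO = 6   # standard syslog severity number for info


def syslog_packet(message, facility=LOG_UUCP, severity=LOG_INFO):
    """Build a BSD-style syslog datagram: '<PRI>message'."""
    pri = facility * 8 + severity
    return ("<%d>%s" % (pri, message)).encode("ascii")


def send_test(host="127.0.0.1", port=514, message="cisco-test: hello"):
    """Send one UDP test message (host and message are placeholders)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(syslog_packet(message), (host, port))
    finally:
        sock.close()
```

Calling send_test() on the log host should, if syslogd is accepting network messages, produce a line in /var/log/cisco.log and none in /var/log/messages.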

SNMP:

I've set up our Cisco 675 to be monitored with SNMP for packet transfer information, and have this information presented via MRTG.

I also set SNMP traps and send them to a host running snmptrapd (from net-snmp). This daemon can be set up to execute programs/scripts on receiving certain traps: I've set it up to send me mail when the modem sends a "link-up" or "cold start" trap.
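
As a rough illustration of such a handler (my actual script is not shown here), snmptrapd's traphandle directive runs a program and feeds it the sender's hostname, its address, and then one "OID value" pair per line on standard input. The Python sketch below assumes snmptrapd prints numeric OIDs (for example when started with -On); the function names and the sample input in the usage note are hypothetical.

```python
# Standard SNMPv2-MIB OIDs in numeric form.
SNMP_TRAP_OID = "1.3.6.1.6.3.1.1.4.1.0"      # snmpTrapOID.0
INTERESTING = {
    "1.3.6.1.6.3.1.1.5.1": "cold start",     # coldStart
    "1.3.6.1.6.3.1.1.5.4": "link up",        # linkUp
}


def parse_trap(text):
    """Split handler input into (hostname, address, varbinds dict)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    hostname, address = lines[0], lines[1]
    varbinds = {}
    for ln in lines[2:]:
        oid, _, value = ln.partition(" ")
        varbinds[oid.lstrip(".")] = value.strip()
    return hostname, address, varbinds


def trap_kind(varbinds):
    """Return 'cold start' or 'link up', or None for any other trap."""
    trap_oid = varbinds.get(SNMP_TRAP_OID, "").lstrip(".")
    return INTERESTING.get(trap_oid)
```

A handler built on this would call parse_trap(sys.stdin.read()) and send mail only when trap_kind() returns something other than None.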

I'm doing this in a NetBSD environment, but I'm sure FreeBSD, or Linux would serve just as well.

The cisco 675 CBOS configuration for 2.3.5 and above goes like this:

cbos#set snmp on
SNMP enabled

cbos#set snmp manager
SET SNMP MANAGER takes 5 arguments
IP Address, Community, [read] [write] [both], enable/disable, all/critical
i.e. set snmp manager 10.0.0.2 public read on all
The above means that 10.0.0.2 is the IP address of the SNMP Manager
who will use the community string public and has permission to read
and also receives all types(both critical and informational) of SNMP trap messages

cbos#set snmp manager 192.168.1.254 foobar both enable all
Added SNMP Manager

Update: Fri Oct 12 08:21:05 PDT 2001

Here is the MRTG configuration we use with CBOS versions 2.4.1, and 2.4.3:

########
#
# 640 kbits/s = 655360 bits/s = 81920 Bytes/s
# * .87 = 71270 Bytes/s = 70 kB/second
#
# 272 kbits/second = 278528 bits/s = 34816 Bytes/s
# * .87 = 30290 Bytes/s = 29.58 kB/second
#
# 512 kbits/s = 524288 bits/s = 65536 Bytes/s
# * .87 = 57016 Bytes/s = 56 kB/second
#
# 218 kbits/s = 222822 bits/s = 27853 Bytes/s
# * .87 = 24232 Bytes/s = 23.67 kB/s
#
Target[cisco]: `/bin/cat /usr/local/etc/.ciscoby`
AbsMax[cisco]: 81920
MaxBytes1[cisco]: 71270
MaxBytes2[cisco]: 30290
Title[cisco]: Cisco 675 (WAN0-0 / WAN Port)
PageTop[cisco]:
 <H1>Traffic Analysis for Cisco 675 RADSL Modem</H1>
 <TABLE>
   <TR><TD>System:</TD><TD>Cisco 675 RADSL Modem</TD></TR>
   <TR><TD>Interface:</TD><TD>ATM Bridge (WAN0-0 / WAN)</TD></TR>  
   <TR><TD>IP:</TD><TD>None (bridge)</TD></TR>
   <TR><TD>Max Speed:</TD><TD>70/30 KBytes/s</TD></TR>
 </TABLE>
# Cisco Signal/Noise measurement
#
Target[cisco.sn]: `/bin/cat /usr/local/etc/.ciscosn`
MaxBytes[cisco.sn]: 45
Title[cisco.sn]: Cisco S/N
PageTop[cisco.sn]:
 <H1>Cisco S/N for DSL network</H1>
YLegend[cisco.sn]: db
ShortLegend[cisco.sn]: db
LegendO[cisco.sn]:  S/N:
Unscaled[cisco.sn]: dwmy
Options[cisco.sn]: nopercent, gauge

# Cisco CRC and RS errors
#
Target[cisco.errs]: `/bin/cat /usr/local/etc/.ciscoerrs`
MaxBytes[cisco.errs]: 500
Title[cisco.errs]: Cisco RS and CRC errors
PageTop[cisco.errs]:
 <H1>Cisco RS and CRC Errors for DSL network</H1>
YLegend[cisco.errs]: errs/hr
ShortLegend[cisco.errs]: errs/hr
LegendI[cisco.errs]:  RS :
LegendO[cisco.errs]:  CRC:
ThreshMaxO[cisco.errs]: 400
ThreshProgO[cisco.errs]: /usr/local/etc/domail
Options[cisco.errs]: nopercent, perhour

# Cisco RS errors
#
Target[cisco.rs]: `/bin/cat /usr/local/etc/.ciscors`
MaxBytes[cisco.rs]: 100
Title[cisco.rs]: Cisco Reed Solomon errors
PageTop[cisco.rs]:
 <H1>Cisco Reed Solomon Errors for DSL network</H1>
YLegend[cisco.rs]: errs/sec
ShortLegend[cisco.rs]: errs/sec
LegendI[cisco.rs]:  Uncorrected RS:
LegendO[cisco.rs]:  Corrected   RS:

Here /usr/local/etc/.ciscoby is created by the script cisprep. This script runs cisco.ex on the host that speaks directly to the Cisco 675. The output is piped back to cisprep, parsed into the various data we want to plot, and left in a handful of files. cisprep is run just before mrtg, both through cron. For example:

*/5     *       *       *       *       if test -x /etc/mrtg.conf; then /usr/local/etc/cisprep \
		> /dev/null 2>&1 && /usr/pkg/bin/mrtg /etc/mrtg.conf > /var/log/mrtg.log 2>&1; fi

(I often use the execute bit on a configuration file to indicate whether I want that service to run).
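
For reference, MRTG expects an external target such as `/bin/cat /usr/local/etc/.ciscoby` to print four lines: the first ("in") value, the second ("out") value, an uptime string, and a target name. cisprep itself is not reproduced here; the following Python sketch only illustrates how one such file could be written. The function names and the temp-file approach are my assumptions, not a description of cisprep.

```python
import os


def mrtg_record(in_value, out_value, uptime="", name="cisco"):
    """Format the four lines MRTG reads from an external target."""
    return "%d\n%d\n%s\n%s\n" % (in_value, out_value, uptime, name)


def write_record(path, in_value, out_value, uptime="", name="cisco"):
    """Write the record via a temp file so a reader never sees half a file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        fh.write(mrtg_record(in_value, out_value, uptime, name))
    os.replace(tmp, path)  # rename over the old file in one step
```

Writing to a temporary name and renaming is a small safeguard in case the two cron-driven steps ever overlap.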

Previously with CBOS 2.3.5:

Here is the MRTG configuration we used with 2.3.5. It turns out that much of the SNMP measurement facility is broken on the current releases of the CBOS operating system for the 675. What does seem to work is SNMP measurement of outgoing byte and packet counts. I track outbound traffic on wan0 (SNMP ifindex 2), and outbound traffic on eth0 (SNMP ifindex 1) to simulate inbound traffic on wan0.

########
#
# 87% of theoretical is real max
#
# 640 kbits/s = 655360 bits/s = 81920 Bytes/s
# * .87 = 71270 Bytes/s = 70 kB/second
#
# 272 kbits/second = 278528 bits/s = 34816 Bytes/s
# * .87 = 30290 Bytes/s = 29.58 kB/second
#
# 512 kbits/s = 524288 bits/s = 65536 Bytes/s
# * .87 = 57016 Bytes/s = 56 kB/second
#
# 218 kbits/s = 222822 bits/s = 27853 Bytes/s
# * .87 = 24232 Bytes/s = 23.67 kB/s
#
#
Target[cisco]: ifOutOctets.1&ifOutOctets.2:public@mycisco675
AbsMax[cisco]: 81920
MaxBytes1[cisco]: 71270
MaxBytes2[cisco]: 30290
Title[cisco]: Cisco 675 (WAN0-0 / WAN Port)
PageTop[cisco]:
 <H1>Traffic Analysis for Cisco 675 RADSL Modem</H1>
 <TABLE>
   <TR><TD>System:</TD><TD>Cisco 675 RADSL Modem</TD></TR>
   <TR><TD>Interface:</TD><TD>ATM Bridge (WAN0-0 / WAN)</TD></TR>  
   <TR><TD>IP:</TD><TD>None (bridge)</TD></TR>
   <TR><TD>Max Speed:</TD><TD>70/30 KBytes/s</TD></TR>
 </TABLE>
WithPeak[cisco]: wm
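
The arithmetic in the comment block of this configuration (line rate in kbits/s, times 1024, divided by 8, times the 87% rule of thumb) can be reproduced with a few lines of Python. This is only a sketch; the function name is made up for illustration.

```python
def max_bytes(kbits_per_s, efficiency=0.87):
    """Usable bytes/s for a line rate in kbits/s (1 kbit = 1024 bits here)."""
    theoretical = kbits_per_s * 1024 / 8   # bytes/s at full line rate
    return round(theoretical * efficiency)


print(max_bytes(640, 1.0))   # AbsMax: theoretical downstream rate
print(max_bytes(640))        # MaxBytes1: 87% of downstream
print(max_bytes(272))        # MaxBytes2: 87% of upstream
```

The three printed values correspond to AbsMax, MaxBytes1 and MaxBytes2 in the configuration above.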

I have also arranged to collect information on a periodic basis to track error counts and S/N. This is done via an expect script and some perl scripts (ciserr2mrtg, cissn2mrtg) to pull the data from the expect output and translate it into records that MRTG can use. More MRTG configurations:


Target[cisco.errs]: `/usr/local/etc/ciserr2mrtg`
MaxBytes[cisco.errs]: 1000
Title[cisco.errs]: Cisco errors
PageTop[cisco.errs]:
 <H1>Cisco Errors for DSL network</H1>
YLegend[cisco.errs]: errs/hr
ShortLegend[cisco.errs]: errs/hr
LegendI[cisco.errs]:  RS:
LegendO[cisco.errs]:  CRC:
Options[cisco.errs]: nopercent, perhour
#Unscaled[cisco.errs]: dwm

Target[cisco.sn]: `/usr/local/etc/cissn2mrtg`
MaxBytes[cisco.sn]: 50
Title[cisco.sn]: Cisco S/N
PageTop[cisco.sn]:
 <H1>Cisco S/N for DSL network</H1>
YLegend[cisco.sn]: db
ShortLegend[cisco.sn]: db
LegendI[cisco.sn]:  S/N:
LegendO[cisco.sn]:  S/N:
Options[cisco.sn]: nopercent, gauge

cisstat is also a script I use for immediate inspection of the modem:


% cisstat
Name     Rate U/D  Power U/D S/N (db)
wan0      272/640    0.7/6.7  38

Name     Ipkts CRCerrs              RSerrs   Opkts Oerrs
eth0    184536       0                   0  275360     0
wan0-0   93300     554  1%        81/11966   86288    31  0%
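
cisstat itself is not listed on this page. As a guess at how percentage columns like those in the sample could be derived from the raw counts, here is a small Python sketch; the function name and the round-to-nearest choice are assumptions that merely agree with the sample output.

```python
def error_percent(errors, packets):
    """Errors as a whole-number percentage of packets (0 if no packets)."""
    if packets == 0:
        return 0
    return round(100.0 * errors / packets)


print("%d%%" % error_percent(554, 93300))   # wan0-0 CRC errors vs. Ipkts
print("%d%%" % error_percent(31, 86288))    # wan0-0 Oerrs vs. Opkts
```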

Cisco 675, Qwest and Line Noise: My Story, April 2001

After Qwest updated the DSLAMs in our area, and our ISP troubleshot all of their connections, I could see that we were still getting significant error rates (CRC and RS errors). As I was able to determine later, at the worst of times this was in the range of 500 uncorrected RS errors per second without any significant TCP/IP traffic.

The progression of events was

  • Train up at a solid 45db S/N
  • Within 1hr S/N begins to drop: first low forties, then mid thirties; eventually the S/N fluctuates by 10db constantly. Connection still runs great for 1hr or so.
  • Corrected RS errors begin to accumulate at an ever-increasing rate.
  • Uncorrected RS errors begin to accumulate at an ever-increasing rate.
  • CRC errors begin, and packet loss between my firewall and the ISP's router (with the Cisco 675 in between) becomes significant (15-20%).
  • Cisco 675 loses sync and retrains (apparently the Cisco 675 automatically retrains when the S/N goes below 16db).
  • Begin cycle again

Elapsed time of events: 2-3 hours between retrains

I poked around in the cisco configuration, and also I noted from a couple of websites that you can control the local and remote transmission power, but initially this didn't change things.

I ended up calling Qwest for troubleshooting. They had me go through resetting the nvram and setting the modem back to the default configuration. That didn't change anything. I was eventually assigned a line test ticket. When I got home in the afternoon the next day, a tech called me. After disconnecting the phones in the house, he checked the line quality: he came back with an apparent loop length of 4800ft (as the crow flies, we're about 2500ft from our CO, so I guess this is about right), and reported no errors and a pretty clean line.

After speaking to him and reconnecting the rest of the house, the connection statistics changed dramatically. I watched it overnight and the connection is still in great shape. Errors are on the order of a few an hour, and although the S/N has dropped to 28db or so, it's solid, +/- 1 db, not like the variation of 10 db I was seeing before. The next day I called back and waited for 90min to talk to a tech about this. When I got the tech, she said it may have been a "locked port at the CO," maybe caused by lightning or some other event (it's been some time since we've had lightning). She said sending the test signal down the line probably "unlocked the port."

For illustrative purposes, I've captured the images from MRTG to allow you to see before and after: (14hrs is about the time I was playing with the modem, and then got a call from the tech. Time increases right to left).

First: packet transfer (blue, 100% = 0% loss) and RTT to ISP router (green, ms)

Second: S/N (blue). In this graph you can see the difference between the highly variable S/N ratio and the very constant ratio after the change. You can also see where I started at a S/N of 24 db and was getting a nonzero rate of corrected RS errors (see below); when I adjusted the txpower to #4, I jumped to a S/N of 28 db and lost almost all errors, except on a sporadic basis.

Third: CRC (blue) and bulk RS (green) errors per hour. In the early half of the graph the RS errors appear in blocks because they rapidly outstrip the range of this graph.

Fourth: corrected RS (blue) and uncorrected RS (green) errors per second. In the early part of the graph (right hand side) you can see the increasing rate of both corrected and uncorrected errors until they are chopped off: that's the point at which the modem would retrain.

From our base configuration I changed two things:

  • Moved the power cord for the Cisco 675 farther away from the isolation transformer I was using as a protective power source for the computers in the closet (they had been right up against each other). This raised our S/N by probably 2-3 db.
  • Adjusted both the local and remote transmission power down to setting #4 (-15db) using:
    cbos# set int wan0 txpower 4
    cbos# set int wan0 remote txpower 4
    cbos# set int wan0 retrain
    This has resulted in a further increase in our S/N 24 -> 28 db

Update : 5/6/2001

Oddly enough, I also learned that getting the best S/N does not result in the lowest number of uncorrected Reed Solomon errors. When set to txpower as above, my S/N went up, but my uncorrected Reed Solomon errors also went up significantly. I couldn't keep my modem trained for more than 1/2 hour at a time. I am currently using txpower 1 (default), and although I get a substantial number of corrected RS errors (40/second), I get very few uncorrected RS errors, and the modem stays trained for tens of hours at a time. CRC errors, while I do get them on occasion, are not due to a storm of uncorrected RS errors, but are more likely related to distant traffic (nothing I can do about that), and they come in ones and twos.



Scott Presnell
Last modified: Thursday, May 22, 2008 16:29 PDT