OK, this is kind of odd. Got a HiperARC that's rebooting itself
periodically. The card's been replaced already, flash has been upgraded
to 5.1.99-8 (was acting up with 5.0.9 as well), and I've tried setting up
a sniffer to see if anyone's throwing some sort of DoS attack at it
(nothing showed up).
Did get a crashdump, though. We've got a case open with 3Com, so I'm
posting here in hopes that one of the more clueful 3Com techs will pick up
on it.
Support recommended replacing the HARC (did that), reflashing (did that),
rebuilding the config (did that too), swapping power supplies (did that).
They now say our power supplies aren't big enough, and that we need 130
amp ones for this configuration. Possible, but I kind of doubt it: they
say a single 130 is sufficient, and we currently have two 70s, yet pulling
either one and running the chassis on the other makes no noticeable
difference in how often it crashes.
Nothing interesting gets logged to syslog or RADIUS accounting, and nothing
of interest shows up on the console before the crash dump appears.
Configuration is one ARC, 14 DSPs on PRI, dual 70A power supplies.
Routing IP only, PAP/RADIUS authentication, OSPF. NMC and DSPs are at the
required code versions for compatibility with 5.1.99 according to 3Com's
site: 2.1.9 across the board on DSPs, 8.0.9 on HiperNMC.
This hub had been stable for months, and just started rebooting a few
weeks ago. No configuration or firmware changes had been made to the ARC,
although we have added a few more elements to the network it sits on,
including one new OSPF speaking router. Looked over the config on that,
and don't see anything that could be causing it to spew bad OSPF packets.
One other thing I have noticed -- the past few times it's blown up, my
RADIUS server logs an authentication request with no username received
from the ARC. I don't know yet if this is a cause or a side-effect of the
thing crashing, but I'm going to try throwing invalid LCP/PAP packets at
it soon to see if I can provoke it into bombing.
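For the empty-username test, a minimal sketch of building a PAP
Authenticate-Request with a zero-length Peer-ID might look like the
following. The packet layout (protocol 0xC023, code 1, identifier, length,
then length-prefixed Peer-ID and password) is from RFC 1334; the function
name build_pap_auth_request is made up for illustration, and the sketch
only builds the bytes. HDLC framing, the FCS, and actually getting the
packet onto a call are left out.

```python
import struct

PPP_PROTO_PAP = 0xC023    # PPP protocol number for PAP (RFC 1334)
PAP_AUTH_REQUEST = 1      # PAP code: Authenticate-Request


def build_pap_auth_request(identifier, peer_id, password):
    """Return the PPP protocol field followed by a PAP Authenticate-Request.

    Hypothetical helper: no HDLC framing or FCS is added here.
    """
    if len(peer_id) > 255 or len(password) > 255:
        raise ValueError("Peer-ID and password are limited to 255 bytes each")
    # Body: Peer-ID length, Peer-ID, password length, password.
    body = bytes([len(peer_id)]) + peer_id + bytes([len(password)]) + password
    # The PAP length field covers code, identifier, length, and body.
    length = 4 + len(body)
    header = struct.pack("!BBH", PAP_AUTH_REQUEST, identifier & 0xFF, length)
    return struct.pack("!H", PPP_PROTO_PAP) + header + body


# A request with no username at all (Peer-ID length of zero).
empty_user = build_pap_auth_request(1, b"", b"secret")
print(empty_user.hex())
```

Varying the length bytes so they disagree with the actual field sizes would
be the obvious next step for the "invalid packet" cases.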
EXCEPTION 0200 CRASH DUMP:
GPRs:
R0: 0x00385584 R1: 0x07F55E00 R2: 0x000F19E0 R3: 0x02B1A2F0
R4: 0x0000000F R5: 0x01938430 R6: 0x001A2001 R7: 0x00000001
R8: 0x3F91CE34 R9: 0x00000000 R10: 0x00000016 R11: 0x00000000
R12: 0x01CB7AB0 R13: 0x000FBA20 R14: 0x00982320 R15: 0x00982304
R16: 0x0098230C R17: 0x009822F8 R18: 0x00982300 R19: 0x00982310
R20: 0xFFFFFEFF R21: 0x00C03B50 R22: 0x00000001 R23: 0x00000000
R24: 0x00000000 R25: 0x00000001 R26: 0x032584B0 R27: 0x07F55F20
R28: 0x01F86370 R29: 0x02B1A2F0 R30: 0x00981A64 R31: 0x04010801
SPRs:
CR: 0x40000400 XER: 0x20000000 LR: 0x00385584 CTR: 0x007EE224
SRR0: 0x003B4F10 SRR1: 0x00089030 DSISR: 0x00000000 DAR: 0x00000000
DMISS: 0x00000000 DCMP: 0x00000000 HASH1: 0x00010000 HASH2: 0x0001FFC0
IMISS: 0x00000000 ICMP: 0x00000000 RPA: 0x00000000 IABR: 0x00000000
82660 Registers:
Err Status 1: 0x20, Err Status 2: 0x00, CPU Err: 0x72, PCI Err: 0x06
CPU/PCI Addr: 0x0F0E7368, Sys Error Addr: 0x0F0E7368
Call Stack:
0x003B4F10 (Exception return address - SRR0)
0x00385584
0x004BDDC0
0x004C0D40
0x004B58F0
0x004B2688
0x007E290C
0x007E2AB4
0x002008D4
0x0020024C
0x0020009C
0x000A7D80
BOOT PROM Version 1.16 (Built on June 9th, 1998 at 12:24:24)
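For anyone wanting to pick the dump apart, here is a rough sketch that
parses the GPR/SPR lines and decodes SRR1. The low 16 bits of SRR1 are a
copy of the MSR, and those bit masks are standard PowerPC. The upper
"cause" bits are an assumption: they follow the 603-family machine check
layout, which is a guess based on the DMISS/DCMP/HASH1/HASH2 registers in
the dump, and on reading EXCEPTION 0200 as the machine check vector
(0x0200). The function names are made up for illustration.

```python
import re

# Standard PowerPC MSR bits; SRR1[16:31] holds the MSR at exception time.
MSR_BITS = {
    0x8000: "EE", 0x4000: "PR", 0x2000: "FP", 0x1000: "ME",
    0x0020: "IR", 0x0010: "DR", 0x0002: "RI", 0x0001: "LE",
}

# ASSUMPTION: 603-family machine check cause bits in SRR1 (bits 12-15).
MCHK_CAUSE_BITS = {
    0x00080000: "MCP signal asserted",
    0x00040000: "TEA (transfer error acknowledge)",
    0x00020000: "data bus parity error",
    0x00010000: "address bus parity error",
}


def parse_registers(dump):
    """Map register names to integers for 'NAME: 0x...' pairs.

    Only meant for the GPR/SPR lines; the 82660 lines use multi-word
    labels that this simple pattern would mangle.
    """
    pairs = re.findall(r"(\w+):\s*0x([0-9A-Fa-f]+)", dump)
    return {name: int(value, 16) for name, value in pairs}


def decode_srr1(srr1):
    """Return (MSR flags set, machine check causes set) for an SRR1 value."""
    msr = [name for mask, name in MSR_BITS.items() if srr1 & mask]
    causes = [text for mask, text in MCHK_CAUSE_BITS.items() if srr1 & mask]
    return msr, causes


sprs = parse_registers(
    "CR: 0x40000400 XER: 0x20000000 LR: 0x00385584 CTR: 0x007EE224\n"
    "SRR0: 0x003B4F10 SRR1: 0x00089030 DSISR: 0x00000000 DAR: 0x00000000\n"
)
print(decode_srr1(sprs["SRR1"]))
```

If the 603-family assumption holds, the SRR1 value above has the MCP bit
set, meaning the machine check was signalled from outside the CPU, which
would fit with the 82660 bridge reporting non-zero error status.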