ARC OSPF adjacency problem with Backup DR (again)
Has anyone encountered a situation where HiPer ARCs will not reestablish an OSPF adjacency with a new Backup Designated Router after a network undergoes a BDR reelection? This weekend I had rebooted 2 ARCs because they were refusing to establish an adjacency with a Cisco 7206 which was the acting BDR until this morning. The reboot fixed the adjacency problem with the Cisco.
However, this morning I rebooted the BDR as part of a software update. When that happened one of our Juniper M5s was elected as the new BDR. The BDR reelection was expected. But both HiPer ARCs would not establish an adjacency with the Juniper and went into an Init, ExStart cycle as had happened before with the Cisco.
I cleared the OSPF process on the Juniper M5 to force reelection of our Cisco 7206 as the BDR. The Cisco 7206 became the BDR once again, but the HiPer ARCs still could not reestablish their adjacency with the Cisco. As before they went into an Init, ExStart cycle.
I'd hate to have to reboot a HiPer ARC every time an OSPF BDR reelection occurs. On the other hand the failure to establish an adjacency with the BDR doesn't seem to be affecting the routing since their adjacency with the DR is ok. Still this is somewhat annoying. Both HiPer ARCs are running 5.1.99 and installed in the same chassis. Anybody got any suggestions?
(Sorry, slow to respond here, I kinda tuned out during the v.92 whining)
Yeah. Check the netmask on your ethernet interface (show ip network ip) to make sure it hasn't spontaneously changed itself. All the netmasks have to match before an adjacency will come up.
The netmask might spontaneously change if you have two routes in your routing table with the same net number but a different size -- most notably large aggregate Null0 routes you might have for BGP.
For example if you have something on your Cisco like
    router bgp 65535
     network 192.168.1.0 mask 255.255.240.0
    ip route 192.168.1.0 255.255.240.0 Null0 250
    !
    interface fastethernet0/0
     ip address 192.168.1.1 255.255.255.224
    interface fastethernet0/1
     ip address 192.168.1.33 255.255.255.224
...where you've got a /20 that's subnetted into chunks of /27 or whatever. You'd end up with this in the routing table of the Cisco:
    192.168.1.0/20     null0
    192.168.1.0/27     fastethernet0/0
    192.168.1.32/27    fastethernet0/1
If your ARC is hanging off of fa0/1 in this example, everything's cool. If your ARC is hanging off of fa0/0 in this example, you'll have problems. Big problems. The ARC will get confused when it sees both 192.168.1.0/20 and 192.168.1.0/27 in the routing table, and will use the wrong one.
If your ARC happens to be located in 192.168.1.0/27, it actually screws up to the point of deciding that your ethernet interface's netmask is /20 instead of /27. Annoying but not deadly -- until your DR or BDR reboots, at which point it becomes extremely toxic because your network melts down into a little puddle as everyone hangs in ExStart. (Turn on OSPF debugging on the Cisco and you'll see that they won't negotiate because the netmasks don't match.)
(You can recover without rebooting the ARC if you can get in and do a 'reconfigure ip network ip' to reset the netmask. I think. It's been 2 years or so.)
If your ARC is in 192.168.1.32/27, then everything works just dandy, because there's not another route with the same net number for it to get confused by. That's the workaround for the bug -- move the ARC to another subnet.
3com's known about this bug for years. (It was bug id MR12019 once upon a time.) Check the list archives; I brought it up numerous times. I got tired of waiting for a fix and renumbered my ARCs into a different segment 2 years ago. I don't know if they fixed it in 5.1.99 or not... I worked around it and my ARCs work great now for everything else, so, shrug...
Mike Andrews * mandrews@dcr.net * mandrews@bit0.com * http://www.bit0.com
VP, sysadmin, & network guy, Digital Crescent Inc, Frankfort KY
Internet access for Frankfort, Lexington, Louisville and surrounding counties
It's not news, it's Fark.com. "With sufficient thrust, pigs fly just fine"
On Wed, 20 Feb 2002, Antonio Querubin wrote:
Has anyone encountered a situation where HiPer ARCs will not reestablish an OSPF adjacency with a new Backup Designated Router after a network undergoes a BDR reelection? This weekend I had rebooted 2 ARCs because they were refusing to establish an adjacency with a Cisco 7206 which was the acting BDR until this morning. The reboot fixed the adjacency problem with the Cisco.
However, this morning I rebooted the BDR as part of a software update. When that happened one of our Juniper M5s was elected as the new BDR. The BDR reelection was expected. But both HiPer ARCs would not establish an adjacency with the Juniper and went into an Init, ExStart cycle as had happened before with the Cisco.
I cleared the OSPF process on the Juniper M5 to force reelection of our Cisco 7206 as the BDR. The Cisco 7206 became the BDR once again, but the HiPer ARCs still could not reestablish their adjacency with the Cisco. As before they went into an Init, ExStart cycle.
I'd hate to have to reboot a HiPer ARC every time an OSPF BDR reelection occurs. On the other hand the failure to establish an adjacency with the BDR doesn't seem to be affecting the routing since their adjacency with the DR is ok. Still this is somewhat annoying. Both HiPer ARCs are running 5.1.99 and installed in the same chassis. Anybody got any suggestions?
On Wed, 6 Mar 2002, Mike Andrews wrote:
The netmask might spontaneously change if you have two routes in your routing table with the same net number but a different size -- most notably large aggregate Null0 routes you might have for BGP.
For example if you have something on your Cisco like
    router bgp 65535
     network 192.168.1.0 mask 255.255.240.0
    ip route 192.168.1.0 255.255.240.0 Null0 250
    !
    interface fastethernet0/0
     ip address 192.168.1.1 255.255.255.224
    interface fastethernet0/1
     ip address 192.168.1.33 255.255.255.224
...where you've got a /20 that's subnetted into chunks of /27 or whatever. You'd end up with this in the routing table of the Cisco:
    192.168.1.0/20     null0
    192.168.1.0/27     fastethernet0/0
    192.168.1.32/27    fastethernet0/1
If your ARC is hanging off of fa0/1 in this example, everything's cool. If your ARC is hanging off of fa0/0 in this example, you'll have problems. Big problems. The ARC will get confused when it sees both 192.168.1.0/20 and 192.168.1.0/27 in the routing table, and will use the wrong one.
If your ARC happens to be located in 192.168.1.0/27, it actually screws up to the point of deciding that your ethernet interface's netmask is /20 instead of /27. Annoying but not deadly -- until your DR or BDR reboots, at which point it becomes extremely toxic because your network melts down into a little puddle as everyone hangs in ExStart. (Turn on OSPF debugging on the Cisco and you'll see that they won't negotiate because the netmasks don't match.)
Thanks, this explanation sounds very similar to the problem we've had.
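For anyone who wants to sanity-check this off the box, here is a rough Python sketch of the "netmasks have to match" condition from the quoted explanation. The function name and the ARC address 192.168.1.2 are made up for illustration; only the /27 and /20 masks come from the example above. It just compares masks and does not talk to any router.

```python
from ipaddress import ip_interface


def masks_match(local, neighbor):
    """Return True if both interface addresses carry the same netmask."""
    a = ip_interface(local)
    b = ip_interface(neighbor)
    return a.network.netmask == b.network.netmask


# Cisco fa0/0 as configured in the example; the ARC address is hypothetical.
cisco = "192.168.1.1/27"
arc_expected = "192.168.1.2/27"   # what the ARC should be using
arc_confused = "192.168.1.2/20"   # what the ARC ends up using per the bug report

print(masks_match(cisco, arc_expected))
print(masks_match(cisco, arc_confused))
```

The first call compares two /27 interfaces, the second compares a /27 against the /20 the ARC reportedly picks up, which is the mismatch the Cisco's OSPF debugging would complain about.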
(You can recover without rebooting the ARC if you can get in and do a 'reconfigure ip network ip' to reset the netmask. I think. It's been 2 years or so.)
When the problem happens we can usually telnet to the console command prompt so this might be an additional useful workaround.
If your ARC is in 192.168.1.32/27, then everything works just dandy, because there's not another route with the same net number for it to get confused by. That's the workaround for the bug -- move the ARC to another subnet.
3com's known about this bug for years. (It was bug id MR12019 once upon a time.) Check the list archives; I brought it up numerous times. I got tired of waiting for a fix and renumbered my ARCs into a different segment 2 years ago. I don't know if they fixed it in 5.1.99 or not... I worked around it and my ARCs work great now for everything else, so, shrug...
Well, we're running 5.1.99 now :(. We plan to load the recent TCS 4.5 code into the ARC soon - maybe that'll solve the problem but I'll look through the release notes too. If it doesn't fix things we may have to revert to RIP.
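One way to spot the risky condition ahead of time is to scan a route list for entries that share a net number but differ in prefix length. Below is a minimal Python sketch, assuming the routes are available as plain "address/length" strings; the function name is hypothetical. The addresses are compared literally as written in the thread (a strictly aligned /20 would start at 192.168.0.0, so the sketch deliberately does not normalize them).

```python
from collections import defaultdict
from ipaddress import ip_interface


def find_prefix_collisions(routes):
    """Map each net number seen with more than one prefix length to those lengths."""
    by_net = defaultdict(set)
    for route in routes:
        # ip_interface keeps the address exactly as written, host bits included.
        entry = ip_interface(route)
        by_net[str(entry.ip)].add(entry.network.prefixlen)
    return {net: sorted(lens) for net, lens in by_net.items() if len(lens) > 1}


# Routing table from the example in this thread.
routes = ["192.168.1.0/20", "192.168.1.0/27", "192.168.1.32/27"]
print(find_prefix_collisions(routes))  # {'192.168.1.0': [20, 27]}
```

Any net number that shows up in the result is a segment where, per the description above, an ARC could pick up the wrong netmask; 192.168.1.32/27 does not appear, which matches the suggested workaround of moving the ARC to that subnet.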