I currently administer a pool of 8 TC1000 chassis, and have been pulling my hair out trying to stabilize OSPF. What we are seeing is that some TCs refuse to ack some LSA reannouncements from the DR, causing it to eventually drop and restart OSPF sessions at random times. Every now and then this 'corrective' action taken by the DR will trip up a batch of TCs, and 5 or 6 will all drop OSPF and reload their sessions. These hiccups are quite noticeable to our dual channel ISDN customers (who also seem to trigger the activity).

Our setup is as follows: 8 TC ARCs, all in the same OSPF area (not the backbone), with one additional OSPF neighbor, a Cisco router that acts as the DR and as the ABR to the backbone area. TC7 is the MPIP server for the group, and all chassis are clients with each other. Dual channel callers are just as likely to land across chassis as they are to land with both links on one chassis. All callers get their subnet/route info, if any, from PPP (99% just use a single link and get an IP from the dynamic pool). We do not do any routing exchange (OSPF, RIP, BGP, etc.) with any customers via dialup.

This problem manifested when we turned up ISDN on this network. Before that there were no multi-link customers dialing in, and the setup was fairly stable. Since turning up MPIP, however, we have multiple OSPF session failures a day, and our customers are getting quite annoyed with the pauses or unusable connections they cause.

Can anyone point me to a good walkthrough on a 'proper' MPIP setup so I can verify that I'm in compliance there, and does anyone have any other suggestions on how to fix this mess? We're currently running 4.5.

Joshua Coombs
Just some general advice / info:

TC7 is your MPIP server, you say. So every ARC, including TC7 itself, is pointed to TC7 as the server, and every ARC has ALL of the other ARCs, including the "server", listed as a client? I'll put a complete MPIP setup at the end.

Is NTP enabled and working on all the ARCs?

I recommend changing the OSPF priorities on all the ARCs to 0 so none of them gets confused and tries to become the DR:

    set ospf interface <ip of box> router_priority 0

MPIP SETUP:

When setting up MPIP on the HiPer ARCs you will have to designate one of the ARCs as the server for the group. It will keep track of where all the channels are and who they belong to. The rest of the ARCs, including the one designated as the server, will be set up as clients. (It's a client too.)

    add mpip server <ip_address of server> sharedsecret <somesecret> priority 1

Remember to do this on all the ARCs. Keep the sharedsecret the same.

On the ARC that is going to be the server, make sure the server state is on:

    set mpip server_state on

Now, on each HiPer ARC, including the server, you need to add the IP address of every HiPer ARC in the group:

    add mpip client <ip_address> sharedsecret <somesecret> type hiper

By default every HiPer ARC has server_state=off and client_state=on. Only the ARC acting as the server should have both server and client state on; every other ARC should stay at off, on.

    set NTP primary_server <ip address of NTP server>
    enable ntp
    save all

NOTE: Daylight Savings Time does not work on the HiPer ARC until version 5.1.99.

Verify Setup:

    show mpip settings
    list mpip servers
    list mpip clients
    show ntp settings

Todd

-----Original Message-----
From: usr-tc-admin@mailman.xmission.com [mailto:usr-tc-admin@mailman.xmission.com] On Behalf Of Joshua Coombs
Sent: Tuesday, November 26, 2002 10:59 AM
To: usr-tc@mailman.xmission.com
Subject: [USR-TC] MPIP + OSPF Hell

_______________________________________________
USR-TC mailing list
USR-TC@mailman.xmission.com
http://mailman.xmission.com/cgi-bin/mailman/listinfo/usr-tc
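With eight chassis, the per-box command list in the MPIP setup above is easy to get slightly wrong on one unit. A minimal Python sketch of generating it might look like the following. The function name, the example addresses and the secret are hypothetical, and the sketch assumes each chassis lists every member of the group (itself included) as a client. It only builds text from the commands quoted in the post and does not talk to any device.

```python
def mpip_commands(chassis_ips, server_ip, shared_secret, ntp_server_ip):
    """Return {chassis_ip: [CLI lines]} for the MPIP/NTP steps described above.

    All arguments are illustrative inputs; nothing here contacts a chassis.
    """
    if server_ip not in chassis_ips:
        raise ValueError("the MPIP server must itself be one of the chassis")

    plan = {}
    for ip in chassis_ips:
        lines = [
            # Priority 0 keeps this ARC out of the DR election.
            f"set ospf interface {ip} router_priority 0",
            # Every chassis, the server included, points at the same server.
            f"add mpip server {server_ip} sharedsecret {shared_secret} priority 1",
        ]
        if ip == server_ip:
            # Only the designated server turns server_state on.
            lines.append("set mpip server_state on")
        # Assumption: every chassis lists all group members as clients.
        lines += [
            f"add mpip client {peer} sharedsecret {shared_secret} type hiper"
            for peer in chassis_ips
        ]
        lines += [
            f"set NTP primary_server {ntp_server_ip}",
            "enable ntp",
            "save all",
        ]
        plan[ip] = lines
    return plan
```

Calling `mpip_commands(["192.0.2.1", "192.0.2.2"], "192.0.2.1", "somesecret", "192.0.2.50")` returns one command list per chassis, which can then be compared against what `list mpip servers` and `list mpip clients` report on each box.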
Also Sprach Joshua Coombs
I currently administer a pool of 8 TC1000 chassis, and have been pulling my hair out trying to stabilize OSPF. What we are seeing is that some TCs refuse to ack some LSA reannouncements from the DR, causing it to eventually drop and restart OSPF sessions at random times. Every now and then this 'corrective' action taken by the DR will trip up a batch of TCs, and 5 or 6 will all drop OSPF and reload their sessions. These hiccups are quite noticeable to our dual channel ISDN customers (who also seem to trigger the activity).
You already got a response to confirm your MPIP setup...all correct...make sure you follow all the steps...once that's working correctly, you might consider a second MPIP server, so that if the one you have falls over you can continue to process cross-chassis MP calls.

Anyway...suggestions on the OSPF side of things. Don't. My suggestion would be to switch the Arcs to use RIPv2...in my experience, the Arc OSPF capabilities have always been a bit flaky. Set the Arcs to RIPv2, which will, obviously, also require RIPv2 on the Cisco. Then have the Cisco redistribute the RIPv2 routes into OSPF.

It's what we're doing here, with 18 chassis/Arcs spread across 4 different cities and 6 different subnets. The 6 subnets with the Arcs talk RIPv2 to the directly attached Cisco router, which redistributes each subnet's RIPv2 routes into OSPF (summarizing a little bit in the process, where possible). Works like a champ. Customers with static IP addresses can dial into any city, any Arc, and their routing works and is propagated across our whole network before they even finish their PPP negotiation. It's rock solid, too.

As a suggestion, on the Cisco(s), set the RIPv2 interfaces as passive-interface for RIPv2 (the Cisco then only listens for RIP updates on those interfaces and stops sending its own)...this drastically cuts down on the CPU usage on the Ciscos, and, most likely, doesn't affect your reachability at all (I suspect your Arcs all have default routes pointing to your Ciscos anyway, so the traffic is going to go that way in any case).

Oh...for the MPIP thing, I've got 2 MPIP servers for all of my Arcs, so...all 4 different cities and 6 subnets talk to one of the two MPIP servers (i.e., two of the cities don't have an MPIP server locally at all). Works without hiccups. I did have an issue a month or two ago where a phantom MPIP bundle was in the MPIP servers' tables that didn't exist in reality. Unfortunately, there is no way for MPIP to resync with reality in this case.
But in investigating the situation, I found out that the MPIP database had been in continuous existence and use for several *years*. One or the other of the MPIP servers would be rebooted, but never both at the same time. When one came back up after a reboot, it would resync with the other server's MPIP database state, so the database survived every reboot. The servers resync their databases with each other after a reboot, but unfortunately not with the reality of what bundles *actually* exist on the clients...a flaw in MPIP as far as I'm concerned...but after years of continuous operation, there was only one phantom bundle in my MPIP servers, so it's not that serious of a problem really.

--
Jeff McAdams                  Email: jeffm@iglou.com
Head Network Administrator    Voice: (502) 966-3848
IgLou Internet Services              (800) 436-4456
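The phantom-bundle situation described above comes down to a set difference between what the MPIP servers believe and what the clients actually carry. A minimal Python sketch of that comparison follows. The bundle identifiers, the chassis names and both data structures are hypothetical, and how the two views would be collected from the Arcs is left open; this is only an offline audit of data gathered by hand, not a feature of MPIP.

```python
def find_phantom_bundles(server_table, client_reports):
    """Return bundles the MPIP server lists that no client actually has.

    server_table:   set of bundle ids the server believes exist (hypothetical ids)
    client_reports: dict mapping chassis name -> set of bundle ids with live links
    """
    # Union of everything the clients really carry; empty if there are no reports.
    live = set().union(*client_reports.values())
    # Anything in the server's table but not on any client is a phantom.
    return server_table - live
```

For example, `find_phantom_bundles({"a", "b", "c"}, {"tc1": {"a"}, "tc7": {"b"}})` returns `{"c"}`, the one bundle that exists only in the server's table.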
I still have not solved this riddle...any help, please...
I still have not solved this riddle...any help, please...I really need your help...Thank you.
On Tue, 26 Nov 2002, Joshua Coombs wrote:
I currently administer a pool of 8 TC1000 chassis, and have been pulling hair out trying to stabilize OSPF. What we are seeing is some TC's refuse to ack some LSA reannouncements from the DR, causing it to eventually drop and restart OSPF sessions at random times.
Since there's very little info available from the ARC, what kind of debugging have you done on the Cisco side? There's lots of good info available there that might point you in the right direction.

Start with "term mon" to get debug output going to your session, then look at what comes up under "debug ip ospf ?". You can get anything from just state changes to full db dumps; somewhere in there you may find what kind of weird stuff the ARC is doing.

I got OSPF working well, but we have no MPIP set up, so... I guess I'm lucky. :)

Charles
Near as we can tell from debugging on the Cisco and lining that up with packet traces, some TCs in the group will not ack an LSA reannouncement from the DR (the Cisco) that is generated when the second link of a multilink connection comes online. This in turn causes the Cisco to resend the announcement, unicast to that particular TC, which still ignores it. Do this enough times, and the DR syslogs an event (too many retransmissions) and tears down the neighbor state with the TC. It then brings it back up on the first hello it receives.

Joshua Coombs

----- Original Message -----
From: "Charles Sprickman" <spork@inch.com>
To: <usr-tc@mailman.xmission.com>
Sent: Tuesday, November 26, 2002 10:33 PM
Subject: Re: [USR-TC] MPIP + OSPF Hell
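The failure sequence described in this message (an unacknowledged LSA, repeated unicast retransmissions, teardown, then recovery on the next hello) can be sketched as a toy model in Python. The class, the function names and the retransmit limit passed in are illustrative assumptions, not values read from the Cisco or the ARC; the model only mirrors the behavior reported in the thread.

```python
from dataclasses import dataclass


@dataclass
class Neighbor:
    name: str
    acks_lsa: bool            # False models a TC that ignores the LSA
    state: str = "FULL"
    retransmissions: int = 0


def flood_lsa(neighbors, retransmit_limit):
    """Simulate the DR flooding one LSA; return names of neighbors torn down.

    retransmit_limit is an illustrative parameter, not a vendor default.
    """
    dropped = []
    for n in neighbors:
        n.retransmissions = 0
        # The DR keeps retransmitting (unicast) while no ack arrives.
        while not n.acks_lsa and n.retransmissions < retransmit_limit:
            n.retransmissions += 1
        if not n.acks_lsa:
            # "Too many retransmissions": the DR tears the neighbor down.
            n.state = "DOWN"
            dropped.append(n.name)
    return dropped


def receive_hello(neighbor):
    """The adjacency starts re-forming on the first hello after a teardown."""
    if neighbor.state == "DOWN":
        neighbor.state = "INIT"
```

Running `flood_lsa([Neighbor("tc1", True), Neighbor("tc3", False)], retransmit_limit=5)` returns `["tc3"]`: only the silent neighbor is dropped, and a later `receive_hello` on it moves it out of `DOWN`, which matches the drop-and-reload pattern seen each time a second multilink channel comes up.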
participants (5)
- ayo faiyetole
- Charles Sprickman
- Jeff McAdams
- Joshua Coombs
- Todd Bertolozzi