LISTSERV - CLEANACCESS Archives - LISTSERV.MIAMIOH.EDU

CLEANACCESS Archives

July 2005

CLEANACCESS@LISTSERV.MIAMIOH.EDU


Subject:	Re: Warning: Failover Arp Storms
From:	Bradford Saul <[log in to unmask]>
Reply To:	Perfigo SecureSmart and CleanMachines Discussion List <[log in to unmask]>
Date:	Thu, 21 Jul 2005 14:18:53 -0400

So, while we are on the subject of failover, what are your thoughts on
failover for the manager?  We have not implemented this; rather, we have a
"warm" spare ready to take over if needed.

Thanks for the input....

Brad

PS:  Perfigo v3.2.13 (Yeah I know we are WAY behind, but it works....)


On 7/21/05 2:10 PM, "Aaron Havens" <[log in to unmask]> wrote:

> Eric Weakland wrote:
>> 
>> All,
>> 
>> First of all - thanks to all of you on this list, it is a great resource
>> for us here at American.
>> 
>> I wanted to let anyone out there who is going to be implementing
>> failover bundles know of a rather alarming series of events that
>> happened to us yesterday.  We have Clean Access high availability
>> bundles implemented doing VLAN retagging using both the Ethernet
>> interfaces.  In the documentation it says that you can "optionally" use
>> serial cables to also send the heartbeat information for failover, but
>> that the heartbeats will always be sent by eth1 from one server to
>> another in a pair.  Here is a quote: "The serial connection essentially
>> provides an additional method of heartbeat exchange that must fail
>> before the standby system can take over. Note however that only the eth1
>> connection between the peers is mandatory."  from
>> http://www.cisco.com/application/pdf/en/us/guest/products/ps6128/c1616/ccmigration_09186a00803e0969.pdf
>> 
>> 
>> The short story is that in our experience you should not just depend on
>> eth1.  Our team implemented failover on Monday, and on Wednesday our
>> core router CPU utilization (on 3 Cat6500 series MSFC2s) suddenly went
>> to 99%, when it is usually at 15% max.  Clients couldn't get
>> DHCP addresses, couldn't get to the internet, etc.  It looked exactly
>> like a DDoS attack, and that was what TAC told us to look for; they made
>> sure we had the commands to track the DoSers down.  We realized soon
>> after that, however, that all of our full input queues were on segments
>> that had been migrated to CCA.  A little more empirical testing revealed
>> that if we shut down just the CCA standby servers, all was well.  So
>> essentially the servers were fighting for the primary role and hammering
>> the routers with ARP storms.
>> 
>> We will be implementing Serial Failover cables before turning the
>> failover boxes back on.
>> 
>> Cheers!
>> 
>> Eric Weakland
>> CNE, CISSP
>> Director, Network Security
>> Office of Information Technology (IT)
>> American University
>> eric(at)american.edu
> 
> Thank you for that. I am about to finally get my failover servers set up
> today. I will make sure I use a serial cable also.

-----------------------------------
Bradford B. Saul
Lead Network Engineer
IT - Network Engineering
Hoffman Hall Room 10
MSC 0601
James Madison University
Harrisonburg, VA 22807
V: (540) 568-2379
F: (540) 568-1696
M: (540) 435-3079
[log in to unmask]
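
The rule quoted from the documentation above (a second heartbeat path
"must fail before the standby system can take over") can be sketched
roughly as follows. This is only an illustration in Python, not how
Clean Access actually implements failover; the class name, the link
names, and the timeout value are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StandbyMonitor:
    """Hypothetical standby node that tracks heartbeats on several links."""

    dead_after: float  # seconds of silence before a link counts as failed
    last_seen: dict = field(default_factory=dict)  # link name -> last heartbeat time

    def add_link(self, name: str, now: float) -> None:
        # Register a heartbeat path (for example "eth1" or "serial").
        self.last_seen[name] = now

    def heartbeat(self, name: str, now: float) -> None:
        # Record a heartbeat received from the peer on the given link.
        self.last_seen[name] = now

    def should_take_over(self, now: float) -> bool:
        # Take over only when every configured link has been silent too long.
        return bool(self.last_seen) and all(
            now - seen > self.dead_after for seen in self.last_seen.values()
        )


# With eth1 alone, a few seconds of lost heartbeats triggers a takeover.
eth_only = StandbyMonitor(dead_after=3.0)
eth_only.add_link("eth1", now=0.0)
eth_only_result = eth_only.should_take_over(now=5.0)

# With a serial link as well, the standby holds off while serial is alive.
both = StandbyMonitor(dead_after=3.0)
both.add_link("eth1", now=0.0)
both.add_link("serial", now=0.0)
both.heartbeat("serial", now=4.0)
both_result = both.should_take_over(now=5.0)
```

With only eth1 configured, congestion or loss on that one path is enough
for the standby to claim the primary role while the real primary is still
up. Adding the serial link means both paths have to go quiet first.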
