LISTSERV - CLEANACCESS Archives - LISTSERV.MIAMIOH.EDU

CLEANACCESS Archives

July 2005

CLEANACCESS@LISTSERV.MIAMIOH.EDU


Subject:	Re: Warning: Failover Arp Storms
From:	"King, Michael" <[log in to unmask]>
Reply To:	Perfigo SecureSmart and CleanMachines Discussion List <[log in to unmask]>
Date:	Thu, 21 Jul 2005 14:30:52 -0400
Content-Type:	text/plain
Parts/Attachments:	text/plain (117 lines)

Three features that I "REALLY" enjoy in 3.5.2.1:

DNS-based allows.  (You put in update.microsoft.com, and
that's it.)
Auto-updating Agent.
You can see which specific CHECK (on the server) clients failed.
(As opposed to the whole rule.)

On the subject of Failover.

On the ARP storm: did you disable Spanning Tree?  It sounds like the
ports were losing contact and cycling up and down.

HA on the Manager (and the Server as well):
	 HA only does basic checking on the system itself, not on the
health of the system.  Big distinction there.
That means that if the database daemon on the primary server dies, the
secondary in the HA pair WILL NOT take over, because HA is only checking
the status of the box.

It's better to think of it as a warm spare that has up-to-the-second
database replication.  If your primary physically died, the secondary
would take over in less than a minute (usually 20 seconds for me).
However, if your primary is still up but some software component of the
Perfigo process dies, the primary WILL NOT fail over.  You can, however,
manually fail over to the secondary, which re-establishes network
connectivity very quickly and gives you time to diagnose the problem
on the primary.
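
To make the box-versus-health distinction concrete, here is a minimal Python sketch. It is not how Clean Access implements HA; the class, field, and function names are all made up for illustration, and the "health-aware" policy is a hypothetical contrast, not a product feature.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    # Both fields are illustrative assumptions, not real product state.
    box_reachable: bool     # peer still receives heartbeats from this box
    database_running: bool  # a service-level health flag HA does not look at


def heartbeat_only_failover(primary: NodeStatus) -> bool:
    """Behaviour described above: fail over only when the box itself is gone."""
    return not primary.box_reachable


def health_aware_failover(primary: NodeStatus) -> bool:
    """Hypothetical stricter policy: also fail over when a service has died."""
    return not primary.box_reachable or not primary.database_running


# Primary box is up, but its database daemon has died.
degraded = NodeStatus(box_reachable=True, database_running=False)
# Primary box has physically died.
dead = NodeStatus(box_reachable=False, database_running=False)
```

With the `degraded` status, `heartbeat_only_failover` returns False, which is the case where you have to fail over manually.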

-----Original Message-----
From: Perfigo SecureSmart and CleanMachines Discussion List
[mailto:[log in to unmask]] On Behalf Of Bradford Saul
Sent: Thursday, July 21, 2005 2:19 PM
To: [log in to unmask]
Subject: Re: Warning: Failover Arp Storms

So while we are on the subject of failover: what are your thoughts on
failover for the manager?  We have not implemented this; rather, we have a
"warm" spare ready to take over if needed.

Thanks for the input....

Brad

PS:  Perfigo v3.2.13 (Yeah I know we are WAY behind, but it works....)


On 7/21/05 2:10 PM, "Aaron Havens" <[log in to unmask]> wrote:

> Eric Weakland wrote:
>> 
>> All,
>> 
>> First of all - thanks to all of you on this list, it is a great 
>> resource for us here at American.
>> 
>> I wanted to let anyone out there who is going to be implementing 
>> failover bundles know of a rather alarming series of events that 
>> happened to us yesterday.  We have Clean Access high availability 
>> bundles implemented doing vlan retagging using both the ethernet 
>> interfaces.  In the documentation it says that you can "optionally" 
>> use serial cables to also send the heartbeat information for 
>> failover, but that the heartbeats will always be sent by eth1 from 
>> one server to another in a pair.  Here is a quote: "The serial 
>> connection essentially provides an additional method of heartbeat 
>> exchange that must fail before the standby system can take over. Note
>> however that only the eth1 connection between the peers is 
>> mandatory."  from 
>> http://www.cisco.com/application/pdf/en/us/guest/products/ps6128/c1616/ccmigration_09186a00803e0969.pdf
>> 
>> 
>> The short story is that in our experience you should not just depend 
>> on eth1.  Our team implemented failover on Monday and on Wednesday, 
>> all of a sudden our core router cpu utilization (on 3 cat6500 series 
>> msfc2's) went to 99% when they were usually at 15% max.  Clients 
>> couldn't get DHCP addresses, couldn't get to the internet, etc.  It 
>> looked exactly like a DDOS attack and that was what TAC told us to 
>> look for and made sure we had the commands to track the DOSers down.
>> We realized soon after that, however, that all of our full input 
>> queues were on segments that had been migrated to CCA.  A little more
>> empirical testing revealed if we shut down just the CCA standby 
>> servers - all was well.  So essentially the servers were fighting for
>> the primary role and hammering the routers with ARP storms.
>> 
>> We will be implementing Serial Failover cables before turning the 
>> failover boxes back on.
>> 
>> Cheers!
>> 
>> Eric Weakland
>> CNE, CISSP
>> Director, Network Security
>> Office of Information Technology (IT) American University 
>> eric(at)american.edu
> 
> Thank you for that. I am about to finally get my Failover servers 
> setup today. I will make sure I use a serial cable also.
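
The rule in the quoted documentation, that every heartbeat path must fail before the standby takes over, can be sketched roughly as below. This is only an illustration of that rule in Python; the function name and the channel labels are assumptions, not anything taken from the Clean Access software.

```python
from typing import Mapping


def standby_should_take_over(heartbeat_seen: Mapping[str, bool]) -> bool:
    """Promote the standby only if no heartbeat channel still hears the peer.

    heartbeat_seen maps a channel label (hypothetical, e.g. "eth1") to
    whether a heartbeat recently arrived on that channel.
    """
    if not heartbeat_seen:
        raise ValueError("at least one heartbeat channel is required")
    return not any(heartbeat_seen.values())


# eth1 heartbeats are lost (for example, the port is flapping).
eth1_only = {"eth1": False}
# Same eth1 loss, but the serial cable still carries heartbeats.
eth1_and_serial = {"eth1": False, "serial": True}
```

With only eth1 configured, losing that one path is enough for the standby to claim the primary role even though the primary is still alive; with the serial cable as a second path, the standby stays put.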

-----------------------------------
Bradford B. Saul
Lead Network Engineer
IT - Network Engineering
Hoffman Hall Room 10
MSC 0601
James Madison University
Harrisonburg, VA 22807
V: (540) 568-2379
F: (540) 568-1696
M: (540) 435-3079
[log in to unmask]
