CLEANACCESS Archives

March 2006

CLEANACCESS@LISTSERV.MIAMIOH.EDU

Subject:
From:
"Rajesh Nair (rajnair)" <[log in to unmask]>
Reply To:
Perfigo SecureSmart and CleanMachines Discussion List <[log in to unmask]>
Date:
Mon, 20 Mar 2006 16:07:34 -0800
Content-Type:
text/plain
Parts/Attachments:
text/plain (116 lines)
Jason,

Sorry for the inconvenience. 
TAC cannot be faulted for this, since they may have reached that
conclusion based on the symptom, which is that the machine (CAS) stops
communicating.

Actually, I can be certain of the issue if you send me the first few
lines following the kernel panic.  As I mentioned in my previous email,
I really need to see the messages to confirm one way or the other.  

If it is indeed the issue I was referring to, then the answer to your
question is yes (both those options should be turned off and the CAS
restarted - service perfigo restart).  

And I do understand that this is a highly desired feature.  We have fixed
the issue and included the fix in 3.6.2, which is scheduled to be released
this evening.  I understand that it would be another minor upgrade for
you (and would involve either a UI upgrade or command line based
upgrade).  However, you would be able to use the OS fingerprinting
feature once 3.6.2 is applied. 

Once again, sorry for the inconvenience. 

-Rajesh.

-----Original Message-----
From: Perfigo SecureSmart and CleanMachines Discussion List
[mailto:[log in to unmask]] On Behalf Of Jason Richardson
Sent: Monday, March 20, 2006 3:03 PM
To: [log in to unmask]
Subject: Re: CASes going down after upgrade to 3.6.1.1

Hi Raj, I really appreciate the quick reply.  Unfortunately, we spent
two hours on the phone with TAC and this never came up.  What they had
us do was a firmware upgrade of the BCOM NICs in the servers (we're
running MCS-7825-H1's with the BCOM 5702X NIC) which we just completed. 
We have not upgraded the firmware in the other two CASes or the CAMs yet
although they all have the same BCOM NIC.  When you say disable "OS
fingerprinting" do you mean uncheck both of the boxes - "Set client OS
to WINDOWS_ALL when Win32 platform is detected" and "Set Client OS to
WINDOWS_ALL when Windows TCP/IP stack is detected (Best Effort Match)"?
This is a real bummer since this is the feature that we were looking
forward to implementing the most.

Thanks,

Jason

---
Jason Richardson
Manager, Security Systems
Enterprise Systems Support
Northern Illinois University

>>> [log in to unmask] 3/20/2006 4:21:38 PM >>>
Jason,

We have recently discovered an issue with the OS fingerprinting feature
that can cause a kernel panic (machine hanging).  This issue is fixed in
3.6.2 which should be released late tonight.  

To see if this is the issue affecting your machines, please turn off the
OS detection feature on the machine that is crashing and see if that
"fixes" the problem.  If that is the case, then I would recommend that
the OS fingerprinting feature be turned off until 3.6.2 is applied.
Note that this only happens in certain situations where there is a
client that deliberately sends certain null headers/mismatched TCP
headers.  Of course, when 3.6.2 is applied, you can turn the feature
back on. 

Jason, could you send the messages (you can send them to me offline)
that appear on the console at the time of kernel panic?  That will help
better identify the root cause.

-Rajesh.

-----Original Message-----
From: Perfigo SecureSmart and CleanMachines Discussion List
[mailto:[log in to unmask]] On Behalf Of Jason Richardson
Sent: Monday, March 20, 2006 1:22 PM
To: [log in to unmask]
Subject: CASes going down after upgrade to 3.6.1.1

Hi all, has anyone else had problems with their CASes after upgrading to
3.6.x?  Our 2 CAMs and 4 CASes were running fine after our upgrade last
week, but the students weren't back from break yet.  We came in this
morning to a trouble ticket from students reporting that they could not
log in.  Upon investigation we found one CAS totally unresponsive -
disconnected from the CAM and wouldn't respond to a ping or SSH.  We
literally had to power cycle it to get it back and that seemed to
resolve the problem.  This afternoon we got another trouble ticket
reporting the same problem and found another CAS in the same state with
a message on the console of "kernel panic - not syncing, fatal exception
in interrupt."  The interesting thing about the second one is that when
we did the upgrade it took almost 2x as long to install the OS as the
others and 2x as long to reboot after we applied the 3.6.1.1 patch.  The
first one that went down installed just fine, but also took a long time
to come back after applying the .1 patch.

We're on the phone with TAC now, but we were just wondering whether
anyone else had had similar problems.

Simon, to answer your question, we completed the entire upgrade of 2
CAMs and 4 CASes in under three hours and we considered it a total success
until today.

Thanks,

---
Jason Richardson
Manager, Security Systems
Enterprise Systems Support
Northern Illinois University
