CLEANACCESS Archives

October 2005

CLEANACCESS@LISTSERV.MIAMIOH.EDU

Subject:
From:
Jason Richardson <[log in to unmask]>
Reply To:
Perfigo SecureSmart and CleanMachines Discussion List <[log in to unmask]>
Date:
Tue, 11 Oct 2005 12:57:06 -0500
Content-Type:
text/plain
Parts/Attachments:
text/plain (206 lines)
Well, after Cisco gave up and told us to reload the OS on our primary
again, we managed to get the two CAMs to sync by revoking permissions
for all added users and groups on the primary and then trying the sync
again.  A note for anyone who has used pgadmin to add read-only users
for accessing the database: revoke the permissions of all added users
and groups before deleting those users and groups.  For some reason, the
database keeps permission info for non-existent groups in the
tables.
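
For anyone scripting that cleanup, here is a minimal sketch of the
ordering described above: revoke first, drop second.  It only builds the
SQL text; it does not connect to the database.  The table, group, and
user names are hypothetical placeholders, and the statements should be
checked against the postgres version on your CAM before running them
through psql.

```python
def cleanup_statements(tables, groups, users):
    """Build SQL that revokes privileges first and drops the roles last.

    All names passed in are placeholders chosen for illustration; they
    are not taken from the actual CAM schema.
    """
    stmts = []
    # Step 1: revoke every privilege the added groups hold on each table.
    for group in groups:
        for table in tables:
            stmts.append(f'REVOKE ALL ON TABLE "{table}" FROM GROUP "{group}";')
    # Step 2: revoke every privilege the added users hold on each table.
    for user in users:
        for table in tables:
            stmts.append(f'REVOKE ALL ON TABLE "{table}" FROM "{user}";')
    # Step 3: only now drop the groups and users themselves.
    stmts.extend(f'DROP GROUP "{group}";' for group in groups)
    stmts.extend(f'DROP USER "{user}";' for user in users)
    return stmts


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for line in cleanup_statements(["foo_table"], ["read_only"], ["report_user"]):
        print(line)
```

The point of the ordering is that no table is left carrying a grant to a
group or user that no longer exists.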

Thanks,

---
Jason Richardson
Manager, IT Security and Client Development
Enterprise Systems Support
Northern Illinois University


>>> [log in to unmask] 10/10/2005 9:29 PM >>>
Hi Rajesh, we have been having that same conversation with Cisco SEs
since last Friday, and all of the sync problems started somewhere
between 5AM and 5:10AM last Wed. when we did the upgrade to 3.5.5.
Prior to that the CAMs were synced just fine, so the upgrade was at
least the catalyst for our problems.

One of the problems has been differences in terminology, as the SEs
have also referred to the "package" or "app" that we installed as the
reason for telling us that our config is unsupportable and that we have
to reload everything from scratch.  My tech, however, insists that he
didn't install an app or package at all; he simply edited the
pg_hba.conf file to allow read-only access to the postgres database by
a few other machines, and created a few read-only database accounts for
doing so, following instructions that have been posted on this listserv,
although not posted by anyone at Cisco.  We have since removed those
accounts to restore the system to plain vanilla, but the problem
remains.  Maybe the differences in terminology are irrelevant and we're
talking about the same thing.  The end result is that we ended up
reloading the OS on the back-up CAM at the SE's suggestion, but we are
not inclined to take our primary down to reload it without talking to
someone who knows postgres better than the L2 that we talked to today.
I'm assuming that some of the Perfigo people came over when Cisco
acquired the company, so I am hopeful that we will eventually be put in
touch with one if we hold on.

We will try what you suggested (with all caveats taken into account)
and continue to work with the L2s and hope that someone here has
something to add.

Thanks,

---
Jason Richardson
Manager, IT Security and Client Development
Enterprise Systems Support
Northern Illinois University


>>> [log in to unmask] 10/10/05 9:10 PM >>>
Hi Jason,

I am from Cisco and I know about this case.  I don't think this has
anything to do with the upgrade to ver 3.5.5. 

I believe what happened on your CAMs is that the package that you
installed (pgadmin or something similar) modified the database system
tables.  For instance, one of the differences I noticed was that there
were entries in pg_group whereas in an "unmodified" CAM, there are no
entries in the pg_group.  

The package that you installed on the CAM modified NOT ONLY the system
tables but also modified the CCA database (controlsmartdb).  Try the
following on your CAM that is up and running:
# psql -h 127.0.0.1 controlsmartdb postgres
controlsmartdb=# \dp

What you will see is that each of the tables has some additional
access privilege information.  This information, as you can see, has
been written into the controlsmartdb database.

Hence, what happens is that when the inactive box is
rebooted/restarted and gets the database snapshot from its peer for
synchronization, the database snapshot will contain information about
these privileges (e.g. it will contain instructions to grant read
access to <foo_table_name> for the "ReadOnly" group).  However, such
access privilege information is not valid on this inactive machine
because those groups (e.g. ReadOnly) do not exist for whatever reason.
You will have to consult the documentation for the package you
installed for more information.
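
To illustrate the mismatch described above, here is a rough sketch that
scans the text of a database snapshot for grants to groups that the
receiving machine does not have.  It assumes the snapshot is plain SQL
containing statements of the form GRANT ... TO GROUP name; which may not
match the format the CAM actually uses for its snapshot.  The function
name and the sample text are hypothetical.

```python
import re

# Matches one GRANT statement that targets a group, with the group name
# optionally double-quoted.  The statement shape is an assumption.
GRANT_TO_GROUP = re.compile(
    r'^GRANT\s+.+?\s+ON\s+.+?\s+TO\s+GROUP\s+"?([^";\s]+)"?\s*;',
    re.IGNORECASE,
)


def missing_groups(snapshot_sql, existing_groups):
    """Return group names that the snapshot grants to but that do not exist."""
    granted = set()
    for line in snapshot_sql.splitlines():
        match = GRANT_TO_GROUP.match(line.strip())
        if match:
            granted.add(match.group(1))
    return sorted(granted - set(existing_groups))


if __name__ == "__main__":
    # Hypothetical snapshot fragment, for illustration only.
    sample = (
        'GRANT SELECT ON TABLE foo_table TO GROUP "ReadOnly";\n'
        "GRANT ALL ON TABLE foo_table TO postgres;\n"
    )
    # The inactive machine has no groups defined, so ReadOnly is reported.
    print(missing_groups(sample, existing_groups=[]))
```

Any name this reports is a group that would have to exist on the
inactive machine, or have its grants revoked on the active one, before
the restore could apply those statements.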

I suspect that the only issue would be the group information and the
entries in the pg_group system table.  However, I am unsure as I don't
know what package was installed and may not be able to comment about
the package because we won't necessarily know how it functions.

Hence, what the Cisco TAC engineer told you is entirely reasonable
from his/her point-of-view.  Since they don't know what package you
installed, nor could they be expected to know how that package affects
the system, they would not be able to hazard any suggestions.

I can offer a suggestion - but please note that this is only a
suggestion and may not work at all and should be taken with more than
a pinch of salt because I am totally unfamiliar with the package you
tried to install on the CAM.  :-) Sorry, I have to provide the
disclaimer up front.

I suspect that if you try to replicate the entries from pg_group on
the working system to the pg_group on the inactive system and then try
the failover, it might work.  If the only issue with the database
restore is that the appropriate pg_group entries (i.e. ReadOnly) are
not available, this might work.  However, you might very well run into
other issues (i.e. other changes to pg_catalog system that I am unaware
of at this point) and this might only be the first one.

Please let me know how things proceed.

Regards and hope this helps,
-Rajesh.

-----Original Message-----
From: Perfigo SecureSmart and CleanMachines Discussion List
[mailto:[log in to unmask]] On Behalf Of Jason Richardson
Sent: Monday, October 10, 2005 5:49 PM
To: [log in to unmask] 
Subject: Problems with database sync between CAMs after upgrade to ver. 3.5.5

Hi all, ever since upgrading our two CAS and CAM servers from 3.5.3.1
to 3.5.5, and the agent to 3.5.8, (the Cisco SE that we trust to give us
good advice was not comfortable with 3.5.6 or 3.5.8 yet), we have been
unable to get our CAMs to sync the database.  We have two for HA, but
we have only been running with our primary since last Wed. AM when we
completed the upgrade.  I've pasted my tech's explanation of the issue
below.  Please let us know if you have experienced the same or
anything like it because we have pretty much exceeded the knowledge of
the Cisco L2 who has been working with us.  The current status is that
the back-up CAM has been reinstalled, but it will not sync with the
primary because it hangs on a non-existent postgres user group named
"read_only".  The accounts that we created were read only but they have
been removed.

TIA,

---
Jason Richardson
Manager, IT Security and Client Development
Enterprise Systems Support
Northern Illinois University

 
We had a bit of a meltdown with the backup CAM. We upgraded to version
3.5.5 last Wednesday and after the patch the failover stopped syncing
with the main database. Our upgrade happened at about 5 AM Wednesday
morning and the backup had a copy of the database until 5:11 AM. The
standby was still sending the heartbeat, just the data wasn't in sync.
I had made some changes to the CAMs a while back to allow read only
access to the database, but after the upgrade all the changes had
reverted to original configuration.
 
What I had done before the upgrade: 
Added IP addresses to pg_hba.conf to allow access to the database
Created read-only account so as not to use the admin account. 
 
With these changes, the main and failover were syncing fine until the
upgrade. Thursday I realized that the changes I had made had been
reverted to defaults so I added them back in. After doing so, I was
able to read the data in the backup and noticed that there was no data
since 5:11 AM Wednesday morning.
 
Our Network Engineers contacted Cisco and were told that because of
what I had done, they were unable to help and we therefore need to
re-install the standby. This is where we are now.
 
I would really like to know what may have caused this loss of
communication between the databases. I'm fairly positive the changes I
made would not have done it, as it was syncing fine after I had made
those, and the problem arose after the upgrade, which set it to
defaults.
 
