RESCOMP Archives

March 2009

RESCOMP@LISTSERV.MIAMIOH.EDU

Subject:
From:
"Moler, James Clark III" <[log in to unmask]>
Reply To:
Research Computing Support <[log in to unmask]>, Moler, James Clark III
Date:
Sat, 14 Mar 2009 18:56:09 -0400
I'm very sorry for this, but after a few identical runs that completed normally, the problem has occurred again. The jobid this time is 994493.

I am at a loss as to what is causing the issue, but since it has now occurred twice it can't be a coincidence, so I will stop running these jobs until I can diagnose and fix it. Again, I sincerely apologize for the waste of computing resources.

Thank you,


--
James C. Moler ("Trey")
Graduate Student
Computer Science and Systems Analysis
Miami University
Oxford, OH 45056
________________________________________
From: Moler, James Clark III
Sent: Saturday, March 14, 2009 2:56 PM
To: [log in to unmask]
Subject: Problem with a redhawk job not terminating

I have a job on the redhawk cluster (jobid 993111) that should have been terminated some time ago, since it has exceeded its walltime. I tried to delete the job using qdel about 6 hours into its execution (it should have finished within an hour, and it has never run long in the 50+ times I have run it in the past), but the job refused to be deleted. Right now it appears that the PBS system is trying to stop the job and the job isn't responding, so I am getting an email every 30 seconds or so telling me that the job was stopped, while the job stays in the system.

Thanks,


--
James C. Moler ("Trey")
Graduate Student
Computer Science and Systems Analysis
Miami University
Oxford, OH 45056
