RESCOMP Archives

March 2009

RESCOMP@LISTSERV.MIAMIOH.EDU

Subject:
From:
"Woods, David M. Dr." <[log in to unmask]>
Reply To:
Research Computing Support <[log in to unmask]>, Woods, David M. Dr.
Date:
Sat, 14 Mar 2009 20:50:51 -0400
It looks like both jobs exceeded their walltime requests.  When that happens, the scheduler signals the job to end, but the signal apparently wasn't handled properly.
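
To illustrate what handling the scheduler's signal can look like, here is a minimal Python sketch. It assumes the scheduler sends SIGTERM before any hard kill, which is common for PBS-style systems but is not confirmed here for redhawk. The names StopFlag, install and run are hypothetical and are not taken from the jobs discussed in this thread.

```python
import signal
import time


class StopFlag:
    """Records that a termination signal arrived (hypothetical helper)."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        # Keep the handler tiny: set a flag and let the main loop exit.
        self.requested = True


def install(flag):
    # Assumption: the scheduler delivers SIGTERM when walltime runs out.
    signal.signal(signal.SIGTERM, flag.handle)


def run(steps, flag, work=lambda i: time.sleep(0.01)):
    """Do up to `steps` units of work, stopping early once a stop is requested."""
    done = 0
    for i in range(steps):
        if flag.requested:
            break
        work(i)
        done += 1
    return done


if __name__ == "__main__":
    flag = StopFlag()
    install(flag)
    completed = run(50, flag)
    print(f"completed {completed} steps, stop requested: {flag.requested}", flush=True)
```

The loop checks the flag between units of work, so the job can save its state and exit on its own instead of waiting to be killed.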

I've deleted the job.  In the future, I'd suggest using a walltime request that leaves you plenty of spare time; you can request up to 480 hours.
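
As a rough sketch of padding a walltime request, the Python helper below turns an expected runtime into an HH:MM:SS string for a `#PBS -l walltime=` directive and caps it at the 480-hour limit mentioned above. The function name walltime_request and the default safety factor of 2 are assumptions chosen for illustration, not site policy.

```python
import math

MAX_WALLTIME_HOURS = 480  # upper limit mentioned in this reply


def walltime_request(expected_hours, safety_factor=2.0, max_hours=MAX_WALLTIME_HOURS):
    """Return an HH:MM:SS walltime padded by safety_factor and capped at max_hours."""
    if expected_hours <= 0:
        raise ValueError("expected_hours must be positive")
    # Round up to whole seconds, then apply the cap.
    total_seconds = math.ceil(expected_hours * safety_factor * 3600)
    total_seconds = min(total_seconds, max_hours * 3600)
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # A job expected to take about an hour, with the assumed 2x margin.
    print(f"#PBS -l walltime={walltime_request(1)}")
```

The printed line can be pasted into the header of a job script; adjust the safety factor to however much slack the job realistically needs.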

Don't worry about wasting resources.  It's expected that things like this will happen as different users try different things on the cluster.

Dave
________________________________________
From: Research Computing Support [[log in to unmask]] On Behalf Of Moler, James Clark III [[log in to unmask]]
Sent: Saturday, March 14, 2009 6:56 PM
To: [log in to unmask]
Subject: Re: Problem with a redhawk job not terminating

I'm very sorry for this, but after a few identical runs that worked, the problem has occurred again.  The jobid this time is 994493.

I am at a loss as to what is causing the issue, but given that it's occurred twice it can't be a coincidence, so I will stop running these jobs until I can diagnose and address it.  Again, I sincerely apologize for the waste of computing resources.

Thank you,


--
James C. Moler ("Trey")
Graduate Student
Computer Science and Systems Analysis
Miami University
Oxford, OH 45056
________________________________________
From: Moler, James Clark III
Sent: Saturday, March 14, 2009 2:56 PM
To: [log in to unmask]
Subject: Problem with a redhawk job not terminating

I have a job on the redhawk cluster (jobid 993111) that was supposed to terminate some time ago, since it exceeded its walltime.  I tried to delete the job using qdel about 6 hours into its execution, but the job refused to be deleted.  (It should have finished within an hour, and it has never hung like this in the 50+ times I have run it in the past.)  Right now it appears that the PBS system is trying to stop it and it isn't responding, so I am getting an email every 30 seconds or so telling me that the job was stopped, while the job stays in the system.

Thanks,


--
James C. Moler ("Trey")
Graduate Student
Computer Science and Systems Analysis
Miami University
Oxford, OH 45056
