http://dmtcp.sourceforge.net/ and http://cryopid.berlios.de both seem to do checkpointing.
I have not used either myself, and neither is installed on the server.
They seem to be neat tools.
Robin
On May 19, 2010, at 2:41 PM, Robin, Robin wrote:
> Hi Steve,
>
>>> 1. What is the max runtime for a batch job on the new cluster, assuming one node is requested?
>
> qmgr -c 'p s';
> It seems that the maximum runtime is 480 hours.
>
>>> 2. What tools are available that might allow me to save the execution state of a batch job and continue it in a subsequent job?
>
> One tool I see is CryoPID: http://cryopid.berlios.de
> We can manually increase the time above the queue limit (when needed).
>
> Robin