That should work, but prefer a loop with an upper limit over an infinite
loop. A sleep of 1 second between attempts should be sufficient.
Now, that is just one spot. Are you going to modify all the spots, including
those in the libraries and C/C++ code underlying Python (which may create
intermediate JIT files as needed)? Perhaps you are going to do the
easy/correct thing and create directories for each job?
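A bounded retry loop along those lines might look like the sketch below. It is only illustrative: the function name, the 30-attempt cap, and the 1-second delay are placeholders, not anything from the pipeline itself.

```python
import time

MAX_ATTEMPTS = 30  # cap the retries instead of looping forever


def open_with_retry(path, mode="w", delay=1.0):
    """Retry open() when NFS raises a stale-handle OSError."""
    last_err = None
    for _ in range(MAX_ATTEMPTS):
        try:
            return open(path, mode)
        except OSError as err:
            last_err = err
            time.sleep(delay)  # give NFS time to refresh the directory handle
    raise last_err  # give up after MAX_ATTEMPTS failures
```

On success the file opens on the first attempt with no delay; only a failing `open()` pays the 1-second sleep.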
On Tue, 2015-08-25 at 09:13 -0400, Karro, John wrote:
> It's just a timing issue, correct? So if I were to do something like:
>
>
> while True:
>     try:
>         fp = open(blah blah)
>     except:
>         continue
>     break
>
>
> That should allow it to eventually open the file and continue on,
> correct?
>
>
> Would it be better to stick a sleep statement in the exception handler --
> is this something I can count on needing a few seconds to resolve?
>
>
> John
>
> ----------------------------------------------------------------------------------------------
> Dr. John Karro, Associate Professor
> Department of Computer Science and Software Engineering
> Affiliate: Department of Microbiology, Department of Statistics
> Office: Benton 205D, Miami University, Oxford, Ohio
> ----------------------------------------------------------------------------------------------
>
> On Tue, Aug 25, 2015 at 8:59 AM, Dhananjai M. Rao <[log in to unmask]> wrote:
> If they are writing to the same directory from different compute nodes,
> then this problem can occur because the directory entries are being
> updated.
>
> node #1: opens a file, so the directory's timestamps and inodes have to
> be updated.
>
> node #2: tries to open a file in the same directory while the inodes are
> being updated, and NFS rejects the second open because the directory
> file handles are stale.
>
> This is a standard issue, and no amount of additional file space or
> speed will fix it; the only solution is to create separate directories
> for each job.
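The separate-directories fix above can be sketched as follows. The function and directory names are illustrative only; the point is that each job creates and writes into its own subdirectory, so no two compute nodes update the same directory's entries concurrently.

```python
import os


def job_output_path(base_dir, job_id, filename):
    """Place each job's output in its own directory to avoid two compute
    nodes updating the same NFS directory's entries at the same time."""
    job_dir = os.path.join(base_dir, "job_%s" % job_id)
    os.makedirs(job_dir, exist_ok=True)  # idempotent if the job restarts
    return os.path.join(job_dir, filename)
```

A batch scheduler would pass each job's unique ID (e.g. the PBS/SLURM job number) as `job_id`, so collisions are impossible by construction.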
>
> On Tue, 2015-08-25 at 08:57 -0400, Karro, John wrote:
> > Yes.
> >
> > On Tue, Aug 25, 2015 at 8:56 AM, Dhananjai M. Rao <[log in to unmask]> wrote:
> > Are there multiple jobs writing to the same directory?
> >
> > On Tue, 2015-08-25 at 08:53 -0400, Karro, John wrote:
> > > I really don't think so. Obviously, I could have a bug I'm unaware
> > > of. But that output file should be unique to that call to that
> > > script. I really don't see how I could have screwed it up.
> > >
> > > On Tue, Aug 25, 2015 at 8:49 AM, Dhananjai M. Rao <[log in to unmask]> wrote:
> > > Is it possible that there is some other job that is writing to the
> > > same file? If you are bulk scheduling jobs, is it possible that you
> > > are accidentally scheduling 2 jobs that are writing to the same
> > > directory?
> > >
> > > On Tue, 2015-08-25 at 08:44 -0400, Karro, John wrote:
> > > > Can anyone explain to me the following OS errors occurring
> > > > sporadically on Redhawk, as returned by my Python code:
> > > >
> > > > Traceback (most recent call last):
> > > >   File "consensus_seq.py", line 84, in <module>
> > > >     main(args.seq,args.elements,args.output,args.fa_output)
> > > >   File "consensus_seq.py", line 45, in main
> > > >     wp = open(output, "w")
> > > > OSError: [Errno 116] Stale NFS file handle:
> > > > 'SEEDS1/PHRAIDER/ce10.chrV.s2.f3.consensus.txt'
> > > >
> > > > I'm running batches of jobs, and this seems to pop up every once
> > > > in a while and kill my pipeline. The directory does exist. And if
> > > > I rerun the program (many hours later) it works fine.
> > > >
> > > > Any idea why this might happen?
> > > >
> > > > John