RESCOMP Archives

June 2006

RESCOMP@LISTSERV.MIAMIOH.EDU

Subject:
From:
Reply To: Research Computing Support <[log in to unmask]>, Robin <[log in to unmask]>
Date: Thu, 1 Jun 2006 08:19:59 -0400
Steve,

A core dump will help.

Core dumps are enabled by default. All core dumps go to /tmp/corefiles/.
Run ls -lt there to see each file's ownership and time.
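
For anyone who would rather script that check, here is a minimal Python
sketch of the same idea as ls -lt: it lists the regular files in the core
directory, newest first, with each file's owner. The function name
list_core_files is made up for illustration, and the default path is simply
the directory mentioned above; adjust both to taste.

```python
import os
import pwd
import time


def list_core_files(directory="/tmp/corefiles"):
    """Return (name, owner, mtime) tuples for regular files, newest first."""
    if not os.path.isdir(directory):
        return []  # nothing to report on a node without the directory
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            if not entry.is_file(follow_symlinks=False):
                continue
            info = entry.stat(follow_symlinks=False)
            try:
                owner = pwd.getpwuid(info.st_uid).pw_name
            except KeyError:
                owner = str(info.st_uid)  # uid with no passwd entry
            entries.append((entry.name, owner, info.st_mtime))
    # Sort by modification time, most recent first (like ls -t).
    entries.sort(key=lambda e: e[2], reverse=True)
    return entries


if __name__ == "__main__":
    for name, owner, mtime in list_core_files():
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        print(f"{stamp}  {owner:<12} {name}")
```

This has to run on the compute node itself, since the core directory is
local to each node.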

Core dumping onto a network file system (e.g. NFS or IBRIX) overflows
the kernel stack, so the core file can't be written to your home dir.
The downside of this setup is that you need to know which node your
code ran on.

Send us an email when the code exits with a bus error. I'll search
through all the compute nodes to find where the core dump is.
If you happen to know which node it is on, you can ssh into that
compute node yourself.

A lot of advanced users will probably want to know this. We ought to
come up with a better way to find out where the core dump files are,
in case users are interested.
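
One possible shape for such a tool, as a rough Python sketch rather than
anything we have deployed: loop over the compute nodes, run ls -lt on the
core directory over ssh, and keep the lines owned by a given user. The
function names and the node names in the usage note are hypothetical, and
the sketch assumes passwordless ssh to the nodes and the usual nine-column
ls -l output.

```python
import subprocess

CORE_DIR = "/tmp/corefiles/"  # core dump location on each compute node


def ssh_listing(node, timeout=15):
    """Run `ls -lt` on CORE_DIR on one node; return output, or None on failure."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    cmd = ["ssh", "-o", "BatchMode=yes", node, "ls", "-lt", CORE_DIR]
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    return result.stdout if result.returncode == 0 else None


def find_core_dumps(nodes, user, run=ssh_listing):
    """Map node name -> `ls -lt` lines whose owner column equals `user`."""
    found = {}
    for node in nodes:
        listing = run(node)
        if not listing:
            continue  # node unreachable, or no core directory
        lines = []
        for line in listing.splitlines():
            fields = line.split()
            # perms, links, owner, group, size, month, day, time, name
            if len(fields) >= 9 and fields[2] == user:
                lines.append(line)
        if lines:
            found[node] = lines
    return found
```

For example, find_core_dumps(["node01", "node02"], "steve") would return a
dictionary keyed by the nodes that hold core files owned by steve. The run
parameter is there only so the ssh call can be swapped out for testing.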

Thanks,
Robin

On May 31, 2006, at 9:33 PM, Stephen Wright wrote:

> I compiled it with icc.
>
> I've resubmitted the job to see if it happens again overnight or if
> it was a fluke.  So far, I haven't had trouble with other jobs running
> the same code on different data.  I'll go over the source tomorrow and
> make sure I have checks on possible out-of-bounds array references.
>
> A job that runs for 8 hours before puking probably isn't a good
> candidate for "icc -g".  Would it be helpful to have a core dump
> anyway?  If so, how do we enable core dumping?
>
> SW
>
>> For more info please look at:
>>
>> http://web.mit.edu/answers/unix/unix_bus_or_seg.html
>>
>>
>>> It's in C.
>>>
>>>> Steve,
>>>>
>>>> Is it C or C++ code?
>>>>
>>>> Thanks,
>>>> Robin
>>>>
>>>>
>>>> On May 31, 2006, at 5:46 PM, Stephen Wright wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I just had a PBS batch job quit after 8 of 48 hours with the
>>>>> following error:
>>>>>
>>>>> /var/spool/PBS/mom_priv/jobs/4578.mulnx31.SC: line 12: 18368 Bus error
>>>>>
>>>>> Does this indicate a PBS problem or something I should be looking
>>>>> for in my code?
>>>>>
>>>>> Steve
>>>>
>>>
>>
>>
>> --
>> Jaime E. Combariza
>> Assistant Director Research Computing
>> Academic Technology Services
>> [log in to unmask]
>> (513) 529-5080
>> Miami University
>> Oxford, Ohio 45056
>>
