This is my first question here, so tell me if I am doing things wrong.
My problem
I am writing a module using Numba. Of course, I have run into a segfault, and I can't find where it comes from. So I am trying to debug it with Numba's gdb support, but it does not work: the segfault is raised but I get no information about where it comes from:
[Reading symbols]
28 ../sysdeps/unix/sysv/linux/nanosleep.c: No such file or directory.
0x00007feacc67ea30 in __GI___nanosleep (requested_time=0x7ffdfe227510,
remaining=0x7ffdfe227510) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
Breakpoint 1 at 0x7fea9d234150: file numba/_helperlib.c, line 1131.
Continuing.
double free or corruption (!prev)
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
(gdb)
During gdb initialization, the following is also printed to stderr:
attach: No such file or directory.
Note that I have already dealt with the ptrace problem mentioned here by setting kernel.yama.ptrace_scope=0.
Reproducing Numba's debug example
I don't know if it counts as a working reproducer, but I tried to run the example from the Numba documentation that I mentioned earlier:
from numba import njit, gdb_init
import numpy as np

@njit(debug=True)
def foo(a, index):
    gdb_init() # instruct Numba to attach gdb at this location, but not to pause execution
    b = a + 1
    c = a * 2.34
    d = c[index] # access an address that is a) invalid b) out of the page
    print(a, b, c, d)

bad_index = int(1e9) # this index is invalid
z = np.arange(10)
r = foo(z, bad_index)
print(r)
And I don't get the same output as the documentation shows:
[Reading symbols]
0x00007efcb5a60a30 in __GI___nanosleep (requested_time=0x7ffff4990f70,
remaining=0x7ffff4990f70) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
28 ../sysdeps/unix/sysv/linux/nanosleep.c: No such file or directory.
Breakpoint 1 at 0x7efcabf3d150: file numba/_helperlib.c, line 1131.
Continuing.
Traceback (most recent call last):
File "/home/.../test_segfault.py", line 16, in <module>
r = foo(z, bad_index)
IndexError: index is out of bounds
[Inferior 1 (process 4299) exited with code 01]
My error message does not show the exact line of the segfault, only the jitted function.
And again, I get this message, which may be the source of my problem:
attach: No such file or directory.
Could anyone help me make this work?
Or point me to an equivalent question? I have found the documentation and forums very thin when it comes to debugging Numba.
Or maybe suggest alternative ways to trace a segfault in Numba jitted functions?
Thank you for reading; I hope this is understandable.
the segfault is raised but I get no information from where it comes from
This is not a segfault; it is a SIGABRT, raised because glibc detected a double free or heap corruption. This kind of error is better debugged with Valgrind or AddressSanitizer than with gdb. If your program is relatively small, use Valgrind because it is easier to use. If not, I'd suggest using AddressSanitizer.
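As a sketch of the Valgrind route, the helper below builds the command line and environment for running a Python script under Valgrind's memcheck tool. It assumes Linux with `valgrind` on the PATH; `build_valgrind_command`, `run_under_valgrind` and the script name are illustrative, not part of Numba. `PYTHONMALLOC=malloc` is a documented CPython setting that bypasses pymalloc so that an external memory debugger sees every allocation; expect some noise from the interpreter itself.

```python
import os
import shutil
import subprocess
import sys


def build_valgrind_command(script, extra_args=()):
    """Return (argv, env) to run `script` under Valgrind's memcheck tool."""
    argv = [
        "valgrind",
        "--tool=memcheck",
        "--track-origins=yes",   # report where uninitialised values came from
        "--error-exitcode=99",   # non-zero exit status if memcheck reports errors
        sys.executable,
        script,
        *extra_args,
    ]
    env = dict(os.environ)
    # Use the system malloc instead of pymalloc so Valgrind tracks each allocation.
    env["PYTHONMALLOC"] = "malloc"
    return argv, env


def run_under_valgrind(script):
    """Run `script` under Valgrind if it is installed and return the exit code."""
    if shutil.which("valgrind") is None:
        raise RuntimeError("valgrind not found on PATH")
    argv, env = build_valgrind_command(script)
    return subprocess.run(argv, env=env).returncode
```

For a double free, the first report about an invalid free in Valgrind's output, with its two stack traces (where the block was freed first and where it is freed again), is usually the one to read.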
I'm trying to learn CUDA for Python using Numba in a Google Colab Jupyter notebook. To learn how to apply 3D thread allocation to nested loops, I wrote the following kernel:
from numba import cuda as cd

# Kernel to loop over 3D grid
@cd.jit
def grid_coordinate_GPU():
    i = cd.blockDim.x * cd.blockIdx.x + cd.threadIdx.x
    j = cd.blockDim.y * cd.blockIdx.y + cd.threadIdx.y
    k = cd.blockDim.z * cd.blockIdx.z + cd.threadIdx.z
    print(f"[{i},{j},{k}]")

# Grid Dimensions
Nx = 2
Ny = 2
Nz = 2
threadsperblock = (1,1,1)
blockspergrid = (Nx,Ny,Nz)
grid_coordinate_GPU[blockspergrid, threadsperblock]()
The problem, however, is that printing the coordinates with a formatted string (f-string) does not work. The exact error I get is:
TypingError: Failed in cuda mode pipeline (step: nopython frontend)
No implementation of function Function(<class 'str'>) found for signature:
>>> str(int64)
There are 10 candidate implementations:
- Of which 8 did not match due to:
Overload of function 'str': File: <numerous>: Line N/A.
With argument(s): '(int64)':
No match.
- Of which 2 did not match due to:
Overload in function 'integer_str': File: numba/cpython/unicode.py: Line 2394.
With argument(s): '(int64)':
Rejected as the implementation raised a specific error:
NumbaRuntimeError: Failed in nopython mode pipeline (step: native lowering)
NRT required but not enabled
During: lowering "s = call $76load_global.17(kind, char_width, length, $const84.21, func=$76load_global.17, args=[Var(kind, unicode.py:2408), Var(char_width, unicode.py:2409), Var(length, unicode.py:2407), Var($const84.21, unicode.py:2410)], kws=(), vararg=None, varkwarg=None, target=None)" at /usr/local/lib/python3.7/dist-packages/numba/cpython/unicode.py (2410)
raised from /usr/local/lib/python3.7/dist-packages/numba/core/runtime/context.py:19
During: resolving callee type: Function(<class 'str'>)
During: typing of call at <ipython-input-12-4a28d7f41e76> (12)
To solve this, I tried a couple of things.
First, I tried enabling the CUDA simulator by setting the environment variable NUMBA_ENABLE_CUDASIM=1, following the Numba documentation. However, this did not change much.
Second, I thought the problem lay in the Jupyter notebook's inability to print the result in the notebook instead of the terminal. I tried to solve this by following this GitHub post, which instructed me to use wurlitzer. However, this did not help much either.
Lastly, I added cd.synchronize() after the kernel call to mimic the C++ example I was originally trying to implement. Sadly, this did not work either.
It would be amazing if someone could help me out!
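The simple solution was to skip the formatted string and just use print(i, j, k) inside the kernel instead. As the traceback says, str(int64), which the f-string needs, requires the Numba runtime (NRT), and NRT was not enabled in the CUDA target.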
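The kernel's print is only a way to see which global index each thread computes. As a CPU-only sketch (plain Python, no Numba or GPU involved, and not a Numba API), the same `blockDim * blockIdx + threadIdx` arithmetic can be emulated for every block and thread. With `blockspergrid=(2, 2, 2)` and `threadsperblock=(1, 1, 1)` it yields the eight combinations of 0 and 1 that the kernel should print; on a real GPU the print order is not guaranteed. `global_indices` is a hypothetical helper name.

```python
from itertools import product


def global_indices(blockspergrid, threadsperblock):
    """Yield the (i, j, k) each thread would compute as
    blockDim * blockIdx + threadIdx in every dimension."""
    bx, by, bz = blockspergrid
    tx, ty, tz = threadsperblock
    for block in product(range(bx), range(by), range(bz)):
        for thread in product(range(tx), range(ty), range(tz)):
            # blockDim equals threadsperblock in each dimension
            yield tuple(
                dim * b + t
                for dim, b, t in zip(threadsperblock, block, thread)
            )


for i, j, k in global_indices((2, 2, 2), (1, 1, 1)):
    print(i, j, k)  # same style as print(i, j, k) inside the kernel
```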
I am seeing really weird behavior from my Python program and I need your help to understand where to look.
I wrote quite a big program using rPM (ReadProcessMemory from the Windows kernel32 DLL).
My issue is that my program sometimes closes without any traceback or error.
It does not go to the end and just stops running.
Here is a simple piece of code:
import ctypes
import logging
import struct
from ctypes import wintypes

Logger = logging.getLogger(__name__)

rPM = ctypes.WinDLL('kernel32', use_last_error=True).ReadProcessMemory
rPM.argtypes = [wintypes.HANDLE, wintypes.LPCVOID, wintypes.LPVOID, ctypes.c_size_t, ctypes.POINTER(ctypes.c_size_t)]
rPM.restype = wintypes.BOOL

def ReadMemory(self, pPath):
    # A method of the reader class in the real program; self.handle is the process handle
    pPath = int.from_bytes(pPath, "little")
    PathBuffer = ctypes.create_string_buffer(40)
    bytes_read = ctypes.c_size_t()
    if not rPM(self.handle, pPath, PathBuffer, 40, ctypes.byref(bytes_read)):
        Logger.error("Cannot read Path from memory")
        return None
    DynamicX = struct.unpack("H", PathBuffer[0x02:0x02 + 2])[0]
    DynamicY = struct.unpack("H", PathBuffer[0x06:0x06 + 2])[0]
    StaticX = struct.unpack("H", PathBuffer[0x10:0x10 + 2])[0]
    StaticY = struct.unpack("H", PathBuffer[0x14:0x14 + 2])[0]
    return DynamicX, DynamicY, StaticX, StaticY

for i in range(50):
    Logger.debug("Read Info")
    reader.ReadMemory(pPath)  # reader and pPath come from the surrounding program (not shown)
    Logger.debug("Finished Read Info")
Logger.debug("End of program")
Sometimes it stops at iteration #30, sometimes at #45, etc.
And sometimes it runs to the end without any error at all; when I rerun a program that failed, it gets through that loop and fails in another one.
The memory I'm reading is the same between two different executions.
How can I find out why it closes? I tried wrapping it in try/except, but the except block is never entered.
I'm using Python 3.9.1 on Windows.
Do you have any hints? I really don't understand why this happens and cannot fix it :(
Thanks !
Edit:
After more investigation, the crash is not always in the rPM function: sometimes it happens in struct.unpack and sometimes (even stranger!) during the return statement!
I found a lot of APPCRASH entries in the Windows error logs:
Problem signature
Problem event name: APPCRASH
Application name: python.exe
Application version: 3.7.6150.1013
Application timestamp: 5dfac7ba
Faulting module name: python37.dll
Faulting module version: 3.7.6150.1013
Faulting module timestamp: 5dfac78b
Exception code: c0000005
Exception offset: 000000000004d547
OS version: 10.0.19042.2.0.0.768.101
Locale ID: 1036
Additional information 1: c75e
Additional information 2: c75e78fc0ea847c06758a77801e05e29
Additional information 3: 2730
Additional information 4: 27303d8be681197ea114e04ad6924f93
But I still don't know why it's crashing. I checked my computer's memory and CPU usage, and neither goes above 60%.
As you can see, I also tried switching to another Python version.
Thanks, I finally found the issue!
The first step was to add:
import faulthandler
faulthandler.enable()
It makes Python catch the Windows crash and dump the Python traceback to stderr or to a file.
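If the program runs unattended, the traceback can go to a file instead of stderr. A minimal sketch, where `enable_crash_log` and the log path are hypothetical; `faulthandler.enable(file=..., all_threads=...)` is the standard-library call, and the file object has to stay open while the handler is enabled because faulthandler writes to its file descriptor directly.

```python
import faulthandler


def enable_crash_log(path):
    """Route faulthandler output (e.g. on an access violation) to `path`.

    Keep the returned file object alive for as long as the handler is
    enabled; closing it would leave faulthandler with a dead descriptor.
    """
    log = open(path, "w")
    faulthandler.enable(file=log, all_threads=True)
    return log
```

Calling `crash_log = enable_crash_log("crash.log")` at program start and keeping `crash_log` referenced is enough. The same handler can also be switched on without code changes with `python -X faulthandler script.py` or the `PYTHONFAULTHANDLER=1` environment variable, which write to stderr.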
It showed me exactly what @Mark Tolonen said: a read access violation!
Knowing that, I double-checked my ReadMemory function and found that the buffer size was bigger than expected. This meant I sometimes tried to read more than the target memory held and read "somewhere else": Eureka!
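One way to guard against this is to size the read to what the structure actually needs and to unpack only the bytes that ReadProcessMemory reports as read. A minimal pure-Python sketch of the unpacking side, assuming the field offsets from the question's ReadMemory and little-endian unsigned 16-bit fields as on x86/x64 Windows; `parse_path`, `FIELD_OFFSETS` and `MIN_SIZE` are hypothetical names.

```python
import struct

# Offsets of DynamicX, DynamicY, StaticX, StaticY in the question's ReadMemory.
FIELD_OFFSETS = (0x02, 0x06, 0x10, 0x14)
FIELD = struct.Struct("<H")  # assumed: little-endian unsigned 16-bit
# Smallest read that covers all four fields: 0x14 + 2 = 22 bytes.
MIN_SIZE = max(FIELD_OFFSETS) + FIELD.size


def parse_path(data):
    """Return (DynamicX, DynamicY, StaticX, StaticY) from the bytes actually
    read, or None if fewer than MIN_SIZE bytes are available."""
    if len(data) < MIN_SIZE:
        return None
    return tuple(FIELD.unpack_from(data, off)[0] for off in FIELD_OFFSETS)
```

In the question's code this would be called as `parse_path(PathBuffer.raw[:bytes_read.value])`, and `MIN_SIZE` could replace the fixed 40 as the read size.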
Thanks for your tips Mark, I learnt a lot with this one !
Due to this, I need to use Python's memoryview.cast('I') to access an FPGA while avoiding double read/write strobes. Don't panic, you won't need an FPGA to answer the question below...
Here is a Python sample that fails ('testfile' can be any file at least 20 bytes long, but for me it will eventually be IO-mapped FPGA hardware):
#!/usr/bin/python
import struct
import mmap
with open('testfile', "r+b") as f:
    mm=memoryview(mmap.mmap(f.fileno(), 20)).cast('I')
    # now try to assign the new values 1 and 2 to the first 2 U32 words of the file
    # mm[0]=1; mm[1]=2 would work, but the following fails:
    mm[0:1]=memoryview(struct.pack('II',1,2)).cast('I') #assignment error
The error is:
./test.py
Traceback (most recent call last):
File "./test.py", line 8, in <module>
mm[0:1]=memoryview(struct.pack('II',1,2)).cast('I')
ValueError: memoryview assignment: lvalue and rvalue have different structures
I don't understand the error. What "different structures" is it talking about?
How can I rewrite the right-hand side of the assignment expression so that it works?
Changing the left-hand side will fail for the FPGA, as it seems anything else generates the wrong signals towards the hardware.
More generally, how should I rework my array of 32-bit integers to fit the left-hand side of the assignment?
Yes, @Monica is right: the error was simpler than I thought. The slice on the left-hand side is indeed wrong: mm[0:1] selects only one element, while the right-hand side holds two.
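A minimal sketch of the corrected assignment, mirroring the question's 20-byte mapping; `write_words` is a hypothetical helper name. The slice now spans exactly as many elements as the right-hand side, so both sides have format 'I' and shape (2,). The views are released before the mmap is closed, since closing a mapping with live exports raises BufferError. This sketch cannot tell what access width the FPGA sees: CPython may copy a slice with a plain memory copy, so if strict 32-bit accesses matter, the element-by-element form the question says works is the more predictable choice.

```python
import mmap
import struct


def write_words(path, start, values, length=20):
    """Write unsigned 32-bit `values` into the mapped file at word index `start`."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), length) as mm:
        # Both views are released on exit, before the mmap itself is closed.
        with memoryview(mm) as raw, raw.cast("I") as words:
            rhs = memoryview(struct.pack("%dI" % len(values), *values)).cast("I")
            # The slice must cover exactly len(values) elements: [0:2] for two values.
            words[start:start + len(values)] = rhs
```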
I'm working on some machine learning code and today I lost about 6 hours because of a simple typo.
It was this:
numpy.empty(100,100)
instead of
numpy.empty([100,100])
As I'm not really used to NumPy, I forgot the brackets. The code happily crunched the numbers and then, at the very end, just before saving the results to disk, crashed on that line.
To put things in perspective, I code on a remote machine in a shell, so an IDE is not really an option. I also doubt an IDE would catch this.
Here's what I already tried:
running pylint: pylint kind of works. After I disabled everything apart from errors and warnings, it even seemed to be useful. But pylint has serious issues with imported modules. As seen on the official bug tracker, the developers know about it but cannot or won't do anything about it. There is a suggested workaround, but ignoring a whole module would not help in my case.
running pychecker: if I create a code snippet with the mistake I made, pychecker reports an error (the same error as the Python interpreter). However, if I run pychecker on the actual source file (~100 LOC), it reports other errors (unused variables, unused imports, etc.) but skips the faulty NumPy line.
Finally, I tried pyflakes, but it does even less checking than the pychecker/pylint combination.
So is there any reliable method to check code in advance, without actually running it?
A language with stronger type checking would have been able to save you from this particular error, but not from errors in general. There are plenty of ways to go wrong that pass static type checking. So if you have computations that take a long time, it makes sense to adopt the following strategies:
Test the code from end to end on small examples (that run in a few seconds or minutes) before running it on big data that will consume hours.
Structure long-running computations so that intermediate results are saved to files on disk at appropriate points in the computation. This means that when something breaks, you can fix the problem and restart the computation from the last save point.
Run the code from the interactive interpreter, so that in the event of an exception you are returned to the interactive session, giving you a chance of being able to recover the data using a post-mortem debugging session. For example, suppose I have some long-running computation:
import numpy
import scipy.linalg

def work(A, C):
    B = scipy.linalg.inv(A) # takes a long time when A is big
    return B.dot(C)
I run this from the interactive interpreter and it raises an exception:
>>> D = work(A, C)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "q22080243.py", line 6, in work
return B.dot(C)
ValueError: matrices are not aligned
Oh no! I forgot to transpose C! Do I have to do the inversion of A again? Not if I call pdb.pm:
>>> import pdb
>>> pdb.pm()
> q22080243.py(6)work()
-> return B.dot(C)
(Pdb) B
array([[-0.01129249, 0.06886091, ..., 0.08530621, -0.03698717],
[ 0.02586344, -0.04872148, ..., -0.04853373, 0.01089163],
...,
[-0.11463087, 0.15048804, ..., 0.0722889 , -0.12388141],
[-0.00467437, -0.13650975, ..., -0.13894875, 0.02823997]])
Now, unlike in Lisp, I can't just set things right and continue the execution. But at least I can recover the intermediate results:
(Pdb) D = B.dot(C.T)
(Pdb) numpy.savetxt('result.txt', D)
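The second strategy above, saving intermediate results at checkpoints, can be sketched with the standard library alone. This is a minimal illustration, not a robust caching layer: `checkpointed` is a hypothetical helper, pickle is used only to avoid dependencies (`numpy.save` would suit arrays better), and the cache is keyed only by file name, so the file has to be deleted when the inputs change.

```python
import os
import pickle


def checkpointed(path, compute, *args):
    """Return compute(*args), caching the result in `path` with pickle.

    If `path` already exists (for example from a run that crashed later
    on), the saved result is loaded instead of being recomputed.
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute(*args)
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(result, f)
    # Rename into place so a crash mid-write never leaves a truncated checkpoint.
    os.replace(tmp, path)
    return result
```

In the example above, `B = checkpointed("B.pkl", scipy.linalg.inv, A)` would make a rerun after the `B.dot(C)` failure skip the expensive inversion.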
Do you use unit tests? There is really no better way.
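In the question's case the crash was in the very last step, just before saving. A test that runs the whole pipeline, including that final step, on tiny inputs would surface such a mistake in seconds. A minimal standard-library sketch, where `save_results` is a hypothetical stand-in for that final step; run it with `python -m unittest`.

```python
import json
import os
import tempfile
import unittest


def save_results(results, path):
    """Hypothetical final step of a long computation: write results as JSON."""
    with open(path, "w") as f:
        json.dump(results, f)


class SaveResultsTest(unittest.TestCase):
    def test_round_trip_on_tiny_input(self):
        # A tiny input exercises the same code path in milliseconds.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "results.json")
            save_results([[0.0, 1.0], [2.0, 3.0]], path)
            with open(path) as f:
                self.assertEqual(json.load(f), [[0.0, 1.0], [2.0, 3.0]])
```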
I am trying to embed Python 2.6 into MATLAB (7.12), using a MEX file written in C. This worked fine for small, simple examples using scalars. However, if NumPy (1.6.1) is imported in any way, MATLAB crashes. I say "in any way" because I have tried a number of ways to load the NumPy libraries, including:
In the Python module (.py):
from numpy import *
With PyRun_SimpleString in the MEX file:
PyRun_SimpleString("from numpy import *");
Calling NumPy functions with PyObject_CallObject:
pOut = PyObject_CallObject(pFunc, pArgs);
Originally, I thought this might be a problem with embedding NumPy in C. However, NumPy works fine when embedded in simple C files compiled from the command line with the /MD (multithreaded DLL) switch using the Visual Studio 2005 C compiler. Next, I thought I would just change the makefile in MATLAB to include the /MD switch. No such luck: mexopts.bat already compiles with /MD. I also manually commented out lines in the NumPy init module to find what was crashing MATLAB. It seems that loading any file with the .pyd extension crashes MATLAB; the first such file loaded by NumPy is multiarray.pyd.
The MATLAB documentation describes how to debug MEX files with Visual Studio, which I did; the error message is below. At this point I know the problem is a memory problem with the .pyd files and some conflict with MATLAB. Interestingly, I can use a system command in MATLAB to start a Python process that uses NumPy, and no error is generated.
Below I paste the error message from MATLAB, followed by the Visual Studio debugger output for the process that crashes MATLAB. I am not pasting all of it, because the list of first-chance exceptions is very long. Are there any suggestions for solving this integration problem?
MATLAB error
Matlab has encountered an internal problem and needs to close
MATLAB crash file:C:\Users\pml355\AppData\Local\Temp\matlab_crash_dump.3484-1:
------------------------------------------------------------------------
Segmentation violation detected at Tue Oct 18 12:19:03 2011
------------------------------------------------------------------------
Configuration:
Crash Decoding : Disabled
Default Encoding: windows-1252
MATLAB License : 163857
MATLAB Root : C:\Program Files\MATLAB\R2011a
MATLAB Version : 7.12.0.635 (R2011a)
Operating System: Microsoft Windows 7
Processor ID : x86 Family 6 Model 7 Stepping 10, GenuineIntel
Virtual Machine : Java 1.6.0_17-b04 with Sun Microsystems Inc. Java HotSpot(TM) Client VM mixed mode
Window System : Version 6.1 (Build 7600)
Fault Count: 1
Abnormal termination:
Segmentation violation
Register State (from fault):
EAX = 00000001 EBX = 69c38c20
ECX = 00000001 EDX = 24ae1da8
ESP = 0088af0c EBP = 0088af44
ESI = 69c38c20 EDI = 24ae1da0
EIP = 69b93d31 EFL = 00010202
CS = 0000001b DS = 00000023 SS = 00000023
ES = 00000023 FS = 0000003b GS = 00000000
Stack Trace (from fault):
[ 0] 0x69b93d31 C:/Python26/Lib/site-packages/numpy/core/multiarray.pyd+00081201 ( ???+000000 )
[ 1] 0x69bfead4 C:/Python26/Lib/site-packages/numpy/core/multiarray.pyd+00518868 ( ???+000000 )
[ 2] 0x69c08039 C:/Python26/Lib/site-packages/numpy/core/multiarray.pyd+00557113 ( ???+000000 )
[ 3] 0x08692b09 C:/Python26/python26.dll+00076553 ( PyEval_EvalFrameEx+007833 )
[ 4] 0x08690adf C:/Python26/python26.dll+00068319 ( PyEval_EvalCodeEx+002255 )
This error was detected while a MEX-file was running. If the MEX-file
is not an official MathWorks function, please examine its source code
for errors. Please consult the External Interfaces Guide for information
on debugging MEX-files.
If this problem is reproducible, please submit a Service Request via:
http://www.mathworks.com/support/contact_us/
A technical support engineer might contact you with further information.
Thank you for your help.
Output from the Visual Studio debugger
First-chance exception at 0x0c12c128 in MATLAB.exe: 0xC0000005: Access violation reading location 0x00000004.
First-chance exception at 0x0c12c128 in MATLAB.exe: 0xC0000005: Access violation reading location 0x00000004.
First-chance exception at 0x0c12c128 in MATLAB.exe: 0xC0000005: Access violation reading location 0x00000004.
First-chance exception at 0x751d9673 in MATLAB.exe: Microsoft C++ exception: jitCgFailedException at memory location 0x00c3e210..
First-chance exception at 0x751d9673 in MATLAB.exe: Microsoft C++ exception: jitCgFailedException at memory location 0x00c3e400..
First-chance exception at 0x69b93d31 in MATLAB.exe: 0xC0000005: Access violation writing location 0x00000001.
> throw_segv_longjmp_seh_filter()
throw_segv_longjmp_seh_filter(): invoking THROW_SEGV_LONGJMP SEH filter
> mnUnhandledWindowsExceptionFilter()
MATLAB.exe has triggered a breakpoint
Try approaching the problem from the Python side: Python is a great glue language, so I would suggest having Python run your MATLAB and C programs. Python has:
Numpy
PyLab
Matplotlib
IPython
Together, this combination is a good alternative to almost any existing MATLAB module.
MATLAB R2014b added the ability to call Python functions directly from M-code.
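The question notes that starting a NumPy-using Python process from MATLAB with a system command works without crashing. Building on that observation (not on MathWorks documentation), one workaround is to keep NumPy out of MATLAB's address space entirely and exchange data through files. A minimal sketch of the Python side, assuming plain-text files with one number per line; `worker.py`, `process` and the file format are hypothetical, and `process` is where the real NumPy code would go.

```python
import sys


def process(values):
    """Placeholder computation; the real NumPy work would go here."""
    return [2.0 * v for v in values]


def main(argv):
    """Usage: python worker.py INPUT OUTPUT (plain text, one number per line)."""
    in_path, out_path = argv[1], argv[2]
    with open(in_path) as f:
        values = [float(line) for line in f if line.strip()]
    with open(out_path, "w") as f:
        for v in process(values):
            f.write("%r\n" % v)
    return 0


# Run only when invoked as: python worker.py INPUT OUTPUT
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

From MATLAB, something like `system('python worker.py in.txt out.txt')` followed by reading `out.txt` would complete the round trip; the cost is file I/O per call, in exchange for a crash in a .pyd no longer taking MATLAB down with it.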