Python crashed; how to decode segfault in dmesg log? - python

I have a Python daemon running on a 64-bit Linux box. It is crashing. Not a friendly, straightforward-to-debug, Python-exception-stack-trace sort of crash, either -- this is a segmentation fault. Linux's dmesg log has a succinct post-mortem:
python2.7[27509]: segfault at 7fe500000008 ip 00007fe56644a891 sp 00007fe54e1fa230 error 4 in libpython2.7.so.1.0[7fe566359000+193000]
python2.7[23517]: segfault at 7f5600000008 ip 00007f568bb45891 sp 00007f5678e55230 error 4 in libpython2.7.so.1.0[7f568ba54000+193000]
libpython2.7.so.1.0 on this system has symbols and I can run objdump -d to get an assembly language dump. So I'm curious to know which function is causing the segfault.
How can I decode one of these dmesg segfault notices and find the errant function? One line says "7fe566359000+193000" and the next says "7f568ba54000+193000". I'm guessing this means both segfaults come from the same location. 193000 = 0x2f1e8. I thought that 0x2f1e8 would lead to an instruction in the Python library assembly dump, but it didn't; 0x2f1e8 is well out of range of the disassembly.

That offset is relative to the base address at which the library was loaded, so you should compare it with the load address of .text as reported by (for example) eu-readelf:
flame#saladin ~ % eu-readelf -S /usr/lib/libpython2.7.so
There are 25 section headers, starting at offset 0x1b1a80:
Section Headers:
[Nr] Name              Type            Addr             Off      Size     ES Flags Lk Inf Al
[ 0]                   NULL            0000000000000000 00000000 00000000  0        0   0  0
[….]
[10] .text             PROGBITS        000000000003f220 0003f220 000e02a0  0 AX     0   0 16
[….]
You should then be able to pass the resulting address to the addr2line tool:
addr2line -e /usr/lib/libpython2.7.so 0x6e408
In this case I can't produce the actual answer because my copy of the library differs from yours, so the address doesn't correspond to anything meaningful here.
Of course, you still won't get a full backtrace unless you have a core file.
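If it helps, the arithmetic can be scripted. Below is a minimal sketch (not part of the original answer): it parses a dmesg line, subtracts the mapping base from the faulting instruction pointer, and hands the resulting offset to addr2line. The dmesg line is the one from the question; the library path is an assumption, so point it at your actual libpython2.7.so.1.0.

import re
import subprocess

# dmesg segfault line from the question; "ip" is the faulting instruction pointer
# and the bracketed part is "<mapping base>+<mapping size>", both in hex.
line = ("python2.7[27509]: segfault at 7fe500000008 ip 00007fe56644a891 "
        "sp 00007fe54e1fa230 error 4 in libpython2.7.so.1.0[7fe566359000+193000]")

m = re.search(r"ip ([0-9a-f]+) .* in (\S+)\[([0-9a-f]+)\+[0-9a-f]+\]", line)
ip, lib, base = int(m.group(1), 16), m.group(2), int(m.group(3), 16)
offset = ip - base
print("fault at offset %#x into %s" % (offset, lib))

# Resolve the offset to a function name (and file:line, if debug info is present).
subprocess.call(["addr2line", "-f", "-e", "/usr/lib/libpython2.7.so.1.0", hex(offset)])

For both dmesg lines in the question this works out to offset 0xf1891, which is why they look like the same crash site; note that the 193000 after the + is the (hex) size of the mapping, not an offset into it.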

Related

What does Segmentation fault: 11 mean and how do you fix it? [duplicate]

Closed as a duplicate of: What causes a Python segmentation fault?
I'm fairly new to Python and I am having real trouble because I keep running into this Segmentation fault: 11 error.
Here is a simple code example that produces this error every time:
import grequests

class Url(object):
    pass

a = Url()
a.url = 'http://www.heroku.com'
a.result = 0

b = Url()
b.url = 'http://www.google.com'
b.result = 0

c = Url()
c.url = 'http://www.wordpress.com'
c.result = 0

urls = [a, b, c]
rs = (grequests.get(u.url) for i, u in enumerate(urls))
grequests.map(rs)
What is absolutely bizarre is that if I replace the urls = ... line with this:
urls = [a, b]
Then I get no error, and the script runs fine.
If I change that to just
urls = [c]
Then I also get no error, and the script runs fine.
If I change c.url = ... to
c.url = "http://yahoo.com"
And revert urls = ... back to
urls = [a, b, c]
Then I do get the segmentation fault: 11 error.
A memory issue sounds like a possibility, though I'm not sure how to fix it.
I've been stuck on this for a number of days, so any help, no matter how small, is greatly appreciated.
For reference, I'm using macOS High Sierra (10.13.5) and installed Python 3.7.0 using Brew.
A segmentation fault (violation) is caused by an invalid memory reference: the process tries to access an address that should not be accessible to it (often the result of a buffer overrun, or an entirely bogus or uninitialized pointer). It is usually indicative of a bug in the underlying code or a problem during the binary build (linking).
The problem does not lie in your Python script itself, even though you may be able to trigger it by modifying your Python code. Even if you, for instance, exhausted buffers used by a module or by the interpreter itself, it should still handle that situation gracefully.
Given your script, either gevent (a dependency of grequests) or your Python build (and/or bits of its standard library) are the likely places where the segfault occurred (or a library they use caused it). Perhaps try rebuilding them? Were there any substantial changes on your system since you built them? Perhaps they are trying to run against libraries other than the ones they were originally built against?
You can also allow your system to dump cores (I presume macOS, being essentially BSD, can do that) and inspect the core dump (load it into a debugger such as gdb) to see what exactly crashed and what was going on at the time.
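If you want more to go on than "Segmentation fault: 11", the standard library's faulthandler module (Python 3.3+) can print the Python-level stack that was executing when the crash happened; a minimal sketch, not part of the original answer:

import faulthandler
faulthandler.enable()  # on SIGSEGV, dump the Python traceback to stderr

import grequests
# ... the rest of the original script, unchanged ...

The same effect is available without editing the script by running it as python3 -X faulthandler script.py. The dumped stack will usually point into the extension module that crashed (here most likely gevent, which grequests depends on).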

OSError 105 : No buffer Space - Zeroconf

I'm using a NanoPi M1 (Allwinner H3 board) & running a Yocto-based OS. On my first encounter with ZeroConf-python,
>>> from zeroconf import Zeroconf, ServiceBrowser
>>> zero = Zeroconf()
I'm getting the error:
File "/usr/lib/python3.5/site-packages/zeroconf.py", line 1523, in __init__
socket.inet_aton(_MDNS_ADDR) + socket.inet_aton(i))
OSError: [Errno 105] No buffer space available
This error doesn't arise when I run it in Raspbian (on an RPi).
I've tried searching for fixes to such errors in homeassistant, but none provide a good overview of the real problem, let alone a solution.
Update the net.ipv4.igmp_max_memberships sysctl value to something greater than zero.
Execute the following commands on the terminal:
$ sysctl -w net.ipv4.igmp_max_memberships=20    # or any other value greater than zero
$ sysctl -w net.ipv4.igmp_max_msf=10
Then, restart the avahi-daemon:
$ systemctl restart avahi-daemon
You can verify the existing values of the above keys using
'sysctl net.ipv4.igmp_max_memberships'.
As an addition to Neelotpal's answer:
This post includes a nice proposal with all of the options to check when chasing this problem:
# Bigger buffers (to make 40Gb more practical). These are maximums, but the default is unaffected.
net.core.wmem_max=268435456
net.core.rmem_max=268435456
net.core.netdev_max_backlog=10000
# Avoids problems with multicast traffic arriving on non-default interfaces
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.all.rp_filter=0
# Force IGMP v2 (required by CBF switch)
net.ipv4.conf.all.force_igmp_version=2
net.ipv4.conf.default.force_igmp_version=2
# Increase the ARP cache table
net.ipv4.neigh.default.gc_thresh3=4096
net.ipv4.neigh.default.gc_thresh2=2048
net.ipv4.neigh.default.gc_thresh1=1024
# Increase number of multicast groups permitted
net.ipv4.igmp_max_memberships=1024
I don't suggest blindly copying these values; instead, systematically test which one is limiting your resources:
use sysctl <property> to get the currently set value
verify if the property is currently running at the limit by checking system stats
change the configuration as described by Neelotpal with sysctl -w, or by changing /etc/sysctl.conf directly and reloading it via sysctl -p
In my case increasing the net.ipv4.igmp_max_memberships did the trick:
I checked the current value with sysctl net.ipv4.igmp_max_memberships which was 20
I checked how many memberships there are with netstat -gn, realizing that my numerous docker containers take up most of that
finally I increased the value in sysctl.conf, and it worked
And of course it is also good to read up on those properties to understand what they actually do, for example on sysctl-explorer.net.
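If you prefer to run the same check from Python before constructing Zeroconf(), here is a rough equivalent of the sysctl/netstat steps above (an illustrative sketch; the /proc paths are the standard Linux procfs locations, not something taken from the answers):

def igmp_membership_headroom():
    # configured limit on multicast group memberships (net.ipv4.igmp_max_memberships)
    with open("/proc/sys/net/ipv4/igmp_max_memberships") as f:
        limit = int(f.read())
    # /proc/net/igmp lists one tab-indented line per joined IPv4 multicast group
    with open("/proc/net/igmp") as f:
        joined = sum(1 for row in f if row.startswith("\t"))
    return limit, joined

limit, joined = igmp_membership_headroom()
print("IGMP memberships: %d joined of %d allowed" % (joined, limit))

If joined is already close to limit (containers and extra interfaces eat into it quickly), raising net.ipv4.igmp_max_memberships as described above is the fix.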

How do I use `setrlimit` to limit memory usage? RLIMIT_AS kills too soon; RLIMIT_DATA, RLIMIT_RSS, RLIMIT_STACK kill not at all

I'm trying to use setrlimit to limit my memory usage on a Linux system, in order to stop my process from crashing the machine (my code was crashing nodes on a high-performance cluster, because a bug led to memory consumption in excess of 100 GiB). I can't seem to find the correct resource to pass to setrlimit; I think it should be resident, which cannot be limited with setrlimit, but I am confused by resident, heap, stack. In the code below, if I uncomment only RLIMIT_AS, the code fails with MemoryError at numpy.ones(shape=(1000, 1000, 10), dtype="f8") even though that array should be only 80 MB. If I uncomment only RLIMIT_DATA, RLIMIT_RSS, or RLIMIT_STACK, both arrays get allocated successfully, even though the total memory usage is 2 GB, or twice the desired maximum.
I would like to make my program fail (no matter how) as soon as it tries to allocate too much RAM. Why do none of RLIMIT_DATA, RLIMIT_RSS, RLIMIT_STACK, and RLIMIT_AS do what I mean, and what is the correct resource to pass to setrlimit?
$ cat mwe.py
#!/usr/bin/env python3.5
import resource
import numpy
#rsrc = resource.RLIMIT_AS
#rsrc = resource.RLIMIT_DATA
#rsrc = resource.RLIMIT_RSS
#rsrc = resource.RLIMIT_STACK
soft, hard = resource.getrlimit(rsrc)
print("Limit starts as:", soft, hard)
resource.setrlimit(rsrc, (1e9, 1e9))
soft, hard = resource.getrlimit(rsrc)
print("Limit is now:", soft, hard)
print("Allocating 80 KB, should certainly work")
M1 = numpy.arange(100*100, dtype="u8")
print("Allocating 80 MB, should work")
M2 = numpy.arange(1000*1000*10, dtype="u8")
print("Allocating 2 GB, should fail")
M3 = numpy.arange(1000*1000*250, dtype="u8")
input("Still here…")
Output with the RLIMIT_AS line uncommented:
$ ./mwe.py
Limit starts as: -1 -1
Limit is now: 1000000000 -1
Allocating 80 KB, should certainly work
Allocating 80 MB, should work
Traceback (most recent call last):
File "./mwe.py", line 22, in <module>
M2 = numpy.arange(1000*1000*10, dtype="u8")
MemoryError
Output when running with any of the other ones uncommented:
$ ./mwe.py
Limit starts as: -1 -1
Limit is now: 1000000000 -1
Allocating 80 KB, should certainly work
Allocating 80 MB, should work
Allocating 2 GB, should fail
Still here…
At the final line, top reports that my process is using 379 GB VIRT, 2.0 GB RES.
System details:
$ uname -a
Linux host.somewhere.ac.uk 2.6.32-573.3.1.el6.x86_64 #1 SMP Mon Aug 10 09:44:54 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.7 (Santiago)
$ free -h
                     total       used       free     shared    buffers     cached
Mem:                  2.0T       1.9T        37G       1.6G       3.4G       1.8T
-/+ buffers/cache:                 88G       1.9T
Swap:                 464G       4.8M       464G
$ python3.5 --version
Python 3.5.0
$ python3.5 -c "import numpy; print(numpy.__version__)"
1.11.1
Alas, I have no answer to your question, but I hope the following might help:
Your script works as expected on my system. Please share the exact spec of yours; there may be a known problem with your Linux distro, kernel, or even numpy...
You should be OK with RLIMIT_AS. As explained here, this limits the entire virtual memory used by the process, and virtual memory includes everything: swap memory, shared libraries, code and data. More details here.
You may add the following function (adapted from this answer) to your script to check actual virtual memory usage at any point:
def peak_virtual_memory_mb():
    # Read VmPeak (peak virtual memory size) from the process's own status file.
    with open('/proc/self/status') as f:
        status = f.readlines()
    vmpeak = next(s for s in status if s.startswith("VmPeak:"))
    return vmpeak
A piece of general advice: disable swap memory. In my experience with high-performance servers it does more harm than good.
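To illustrate the RLIMIT_AS point, here is a minimal sketch (not taken from the answer; assumes Linux and passes the limits as plain integers). Because the address-space limit counts every mapping in the process, including shared libraries and pages that are reserved but never touched, the cap has to be set generously:

import resource
import numpy

LIMIT_BYTES = 4 * 1024**3  # 4 GiB cap on the address space; adjust for your workload

soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (LIMIT_BYTES, hard))

try:
    # ~8 GB of float64, well over the cap, so the allocation should be refused
    M = numpy.ones((1000, 1000, 1000), dtype="f8")
except MemoryError:
    print("Allocation refused once the address-space limit was exceeded")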

How to cause Errno 23 ENFILE on purpose

Is there a way I can cause errno 23 (ENFILE File table overflow) on purpose?
I am doing socket programming and I want to check whether creating too many sockets can cause this error. As I understand it, a created socket is treated as a file descriptor, so it should count towards the system limit on open files.
Here is a part of my python script, which creates the sockets
import resource
import socket

def enfile():
    nofile_soft_limit = 10000
    nofile_hard_limit = 20000
    resource.setrlimit(resource.RLIMIT_NOFILE, (nofile_soft_limit, nofile_hard_limit))
    sock_table = []
    for i in range(0, 10000):
        print "Creating socket number {0}".format(i)
        try:
            temp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.SOL_UDP)
        except socket.error as msg:
            print 'Failed to create socket. Error code: ' + str(msg[0]) + ' , Error message : ' + msg[1]
            print msg[0]
        sock_table.append(temp)
With setrlimit() I change the process's limit on open files to a high value, so that I don't get errno 24 (EMFILE).
I have tried two approaches:
1) Per-user limit
by changing /etc/security/limits.conf
root hard nofile 5000
root soft nofile 5000
(logged in with a new session after that)
2) System-wide limit
by changing /etc/sysctl.conf
fs.file-max = 5000
and then run sysctl -p to apply the changes.
My script easily creates 10k sockets despite per-user and system-wide limits, and it ends with errno 24 (EMFILE).
Is it possible to achieve my goal? I am using two OSes, CentOS 6.7 and Fedora 20. Maybe there are some other settings to change on these systems?
Thanks!
ENFILE will only happen if the system-wide limit is reached, whereas the settings you've tried so far are per-process, so only related to EMFILE. For more details including which system-wide settings to change to trigger ENFILE, see this answer: https://stackoverflow.com/a/24862823/4323 as well as https://serverfault.com/questions/122679/how-do-ulimit-n-and-proc-sys-fs-file-max-differ
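To see which of the two limits you are actually hitting, here is a small sketch (not part of either answer): open sockets until creation fails and report the errno, so you know whether the per-process or the system-wide limit is the one to change.

import errno
import socket

socks = []
try:
    while True:
        socks.append(socket.socket(socket.AF_INET, socket.SOCK_DGRAM))
except socket.error as e:
    if e.errno == errno.EMFILE:
        print("hit the per-process limit (EMFILE) after %d sockets" % len(socks))
    elif e.errno == errno.ENFILE:
        print("hit the system-wide limit (ENFILE) after %d sockets" % len(socks))
    else:
        raise
finally:
    for s in socks:
        s.close()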
You should look for an answer in kernel sources.
Socket call returns ENFILE in __sock_create() when sock_alloc() returns NULL. This can happen only if it can't allocate a new inode.
You can use:
df -i
to check for your inodes usage.
Unfortunately the inode limit can't be changed dynamically.
Generally the total number of inodes and the space reserved for these inodes is set when the filesystem is first created.
Solution?
Modern filesystems like Btrfs and XFS use dynamic inodes to avoid inode limits, so if you have one of them it could be impossible to do that.
If you have LVM disk, decreasing the size of the volume could help.
But if you want to be sure of simulating the situation from your post, you could create an enormous number of files, 1 byte each; you will run out of inodes long before you run out of disk space. Then you can try to create a socket.
If I am wrong, please correct me.

Embedding Python in MATLAB

I am trying to embed Python 2.6 into MATLAB (7.12) with a MEX file written in C. This worked fine for small, simple examples using scalars. However, if NumPy (1.6.1) is imported in any way, MATLAB crashes. I say "in any way" because I have tried a number of ways to load the numpy libraries, including:
In the python module (.py):
from numpy import *
With PyRun_SimpleString in the mex file:
PyRun_SimpleString("from numpy import *");
Calling NumPy functions with PyObject_CallObject:
pOut = PyObject_CallObject(pFunc, pArgs);
Originally, I thought this might be a problem with embedding NumPy in C. However, NumPy works fine when embedded in simple C programs compiled from the command line with the /MD (multithread) switch using the Visual Studio 2005 C compiler. Next, I thought I would just change the make file in MATLAB to include the /MD switch. No such luck: mexopts.bat already compiles with the /MD switch. I also manually commented out lines in the NumPy init module to find what was crashing MATLAB. It seems that loading any file with the extension .pyd crashes MATLAB; the first such file loaded by NumPy is multiarray.pyd.
The MATLAB documentation describes how to debug MEX files with Visual Studio, which I did, and the error message is included below. At this point I believe the problem is a memory problem with the .pyd files and some conflict with MATLAB. Interestingly, I can use a system command in MATLAB to kick off a Python process that uses NumPy, and no error is generated. Below is the error message from MATLAB, followed by the debug output from Visual Studio for the process that crashes MATLAB; I am not pasting the whole thing because the list of first-chance exceptions is very long. Are there any suggestions for solving this integration problem?
MATLAB error
Matlab has encountered an internal problem and needs to close
MATLAB crash file:C:\Users\pml355\AppData\Local\Temp\matlab_crash_dump.3484-1:
------------------------------------------------------------------------
Segmentation violation detected at Tue Oct 18 12:19:03 2011
------------------------------------------------------------------------
Configuration:
Crash Decoding : Disabled
Default Encoding: windows-1252
MATLAB License : 163857
MATLAB Root : C:\Program Files\MATLAB\R2011a
MATLAB Version : 7.12.0.635 (R2011a)
Operating System: Microsoft Windows 7
Processor ID : x86 Family 6 Model 7 Stepping 10, GenuineIntel
Virtual Machine : Java 1.6.0_17-b04 with Sun Microsystems Inc. Java HotSpot(TM) Client VM mixed mode
Window System : Version 6.1 (Build 7600)
Fault Count: 1
Abnormal termination:
Segmentation violation
Register State (from fault):
EAX = 00000001 EBX = 69c38c20
ECX = 00000001 EDX = 24ae1da8
ESP = 0088af0c EBP = 0088af44
ESI = 69c38c20 EDI = 24ae1da0
EIP = 69b93d31 EFL = 00010202
CS = 0000001b DS = 00000023 SS = 00000023
ES = 00000023 FS = 0000003b GS = 00000000
Stack Trace (from fault):
[ 0] 0x69b93d31 C:/Python26/Lib/site-packages/numpy/core/multiarray.pyd+00081201 ( ???+000000 )
[ 1] 0x69bfead4 C:/Python26/Lib/site-packages/numpy/core/multiarray.pyd+00518868 ( ???+000000 )
[ 2] 0x69c08039 C:/Python26/Lib/site-packages/numpy/core/multiarray.pyd+00557113 ( ???+000000 )
[ 3] 0x08692b09 C:/Python26/python26.dll+00076553 ( PyEval_EvalFrameEx+007833 )
[ 4] 0x08690adf C:/Python26/python26.dll+00068319 ( PyEval_EvalCodeEx+002255 )
This error was detected while a MEX-file was running. If the MEX-file
is not an official MathWorks function, please examine its source code
for errors. Please consult the External Interfaces Guide for information
on debugging MEX-files.
If this problem is reproducible, please submit a Service Request via:
http://www.mathworks.com/support/contact_us/
A technical support engineer might contact you with further information.
Thank you for your help.
Output from the Visual Studio debugger
First-chance exception at 0x0c12c128 in MATLAB.exe: 0xC0000005: Access violation reading location 0x00000004.
First-chance exception at 0x0c12c128 in MATLAB.exe: 0xC0000005: Access violation reading location 0x00000004.
First-chance exception at 0x0c12c128 in MATLAB.exe: 0xC0000005: Access violation reading location 0x00000004.
First-chance exception at 0x751d9673 in MATLAB.exe: Microsoft C++ exception: jitCgFailedException at memory location 0x00c3e210..
First-chance exception at 0x751d9673 in MATLAB.exe: Microsoft C++ exception: jitCgFailedException at memory location 0x00c3e400..
First-chance exception at 0x69b93d31 in MATLAB.exe: 0xC0000005: Access violation writing location 0x00000001.
> throw_segv_longjmp_seh_filter()
throw_segv_longjmp_seh_filter(): invoking THROW_SEGV_LONGJMP SEH filter
> mnUnhandledWindowsExceptionFilter()
MATLAB.exe has triggered a breakpoint
Try to approach the problem from the Python side: Python is a great glue language, so I would suggest having Python run your MATLAB and C programs. Python has:
Numpy
PyLab
Matplotlib
IPython
Thus, the combination is a good alternative to almost any existing MATLAB module.
With MATLAB R2014b, the ability to call Python functions directly from M code was added.
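Following the first answer's suggestion of using Python as the glue, R2014b and later also ship an official MATLAB Engine API for Python. A minimal sketch, assuming the engine package has been installed from matlabroot/extern/engines/python (so not applicable to the R2011a setup in the question):

import matlab.engine

eng = matlab.engine.start_matlab()        # starts a MATLAB session in the background
a = matlab.double([1.0, 4.0, 9.0, 16.0])  # marshal a Python list into a MATLAB array
b = eng.sqrt(a)                           # call MATLAB's sqrt on it
print(b)
eng.quit()

With this arrangement NumPy stays inside an ordinary Python process, so the .pyd loading conflict described in the question never comes up.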
