Python script generates segmentation fault
I cannot find the source of the problem.
How can I debug a segfault like this simply from Python?
Are there recommended coding practices to avoid segmentation faults?
I wrote a script (1600 lines) to systematically download CSV files from a source and format the collected data. The script works very well for 100-200 files, but it systematically crashes with a segmentation fault message after a while.
- The crash always happens at the same spot in the script, with no apparent cause.
- I run it on Mac OS X, but the crash also happens on Ubuntu Linux and Debian 9.
- If I run the Pandas routines that crash during my script on single files in isolation, they work properly. The crash only happens when I loop the script 100-200 times.
- I checked the content of every variable and every constructor (__init__), and they seem fine.
The code is too long to paste here, but it is available on request.
The expected result is execution to the end; instead, the script crashes after 100-200 iterations.
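As a minimal sketch of what debugging the segfault from Python itself could look like (assuming Python 3.3 or later), the standard-library faulthandler module can be enabled at the top of the script so that a Python-level traceback is printed when the process receives SIGSEGV:
import faulthandler
faulthandler.enable()   # on SIGSEGV/SIGFPE/SIGABRT, dump the Python traceback to stderr
# ... rest of the download/format script unchanged ...
The same behaviour can be turned on without touching the code by running the script as python -X faulthandler script.py.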
Related
I need to instrument a Python script using Intel Pin for the ChampSim simulator.
The problem is that whenever I run the tool, the script does not seem to run, as nothing is printed. Moreover, no matter how long or complex the script is, the trace always ends up with a size of 62M (this is also the case when I simply instrument the interpreter without any script).
I tried running the solution from this post, but it didn't work either. For reference, I am running the following command:
../../../pin -t obj-intel64/champsim_tracer.so -- ./python_script.py
Is it even possible to instrument a Python script? If yes, please detail the steps. Thanks!
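One variant I have not ruled out (this is an assumption on my part, by analogy with how other binary instrumentation tools are usually invoked): passing the interpreter to Pin explicitly instead of relying on the script's shebang line, e.g.
../../../pin -t obj-intel64/champsim_tracer.so -- python3 python_script.py
where python3 is whichever interpreter the script normally runs under.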
I am currently running a Python script that I believe has a memory issue.
I reasoned about it last night; I believe it is due to some overhead that is constantly being generated, but I don't know exactly what it is. I tried gc.collect() and it didn't really help. The good thing about my code is that at each run, I can save the parameters of the partially trained model as a pickle. I can then shut the script down, start a new Python process, and load the saved pickle to continue training.
My strategy is to do the following in a shell script:
for i in 1..1000:
    run `python train_model.py`      # this gives me a fresh environment with little memory overhead
    # during the run of `train_model.py`, some memory overhead accumulates
    torch.save(train_model.state_dict())   # checkpoint the partially trained model
    shut down the `train_model.py` process
    # I am hoping the next run of the script then starts from a fresh environment
Do you think this will work? How do I do this in a shell script?
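A minimal sketch of the shell loop I have in mind (bash; stopping on the first failed run is my own choice, and train_model.py is assumed to save its checkpoint and exit on its own):
#!/bin/bash
# Each iteration starts a brand-new Python interpreter, so whatever memory
# overhead accumulates during one run is released when that process exits.
for i in $(seq 1 1000); do
    python train_model.py || break   # stop looping if a run fails
done
Inside train_model.py, the script would load the most recent pickle if one exists, train for a while, save the state_dict again, and exit.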
EDIT2: I stress-tested the fix; it still failed after creating nearly 1 TB of intermediate files. I changed the .py code to delete intermediate files once the necessary processing was done, and the stress test then succeeded. The original issue was most likely memory-related, though I still have no way of proving it.
EDIT: I'm still not sure whether or not it was a memory issue. However, I got the full process to run via the .bat file by breaking the second script up into 4 additional .py files and running those instead of the full second script at once. Problem solved for now.
ORIGINAL:
I am running two .py files through cmd via a .bat file. The reason I am doing this is that the first .py is an arcpy script that requires 32-bit Python, and the second .py is a combined PCI and arcpy script that requires 64-bit Python.
The first file runs without issue. The second file gets to a certain point, the same point every time, and I am prompted with a "python.exe has stopped working" dialog box. This point is in the middle of a for loop in the PCI section of the code. I have run the PCI Python in PCI's own interpreter without issue. I suspect this might be related to memory issues, but I am uncertain and have no idea how to check.
How do I check to see if memory issues are the problem?
.bat code below:
C:/Python27/ArcGIS10.5/python.exe E:/Path/CODE/file1.py
C:/Python27/ArcGISx6410.5/python.exe E:/Path/CODE/file2.py
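A sketch of one way to check (assuming psutil can be installed for the 64-bit interpreter; work_items and the loop body below are placeholders for the existing PCI code): print the process's resident memory on each pass of the for loop and watch whether it climbs steadily towards the crash point:
import psutil  # third-party: pip install psutil

proc = psutil.Process()      # the current python.exe process
work_items = range(100)      # placeholder for whatever the PCI loop iterates over
for item in work_items:
    # ... the existing PCI/arcpy processing would go here ...
    rss_mb = proc.memory_info().rss / (1024.0 * 1024.0)
    print("iteration %d, resident memory: %.1f MB" % (item, rss_mb))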
I'm running a Python script that does the following logical steps:
- Gets a list of hosts from a DB query
- For each of those hosts, fetches a CSV
- Processes the CSV with Pandas
- Inserts the output into Elasticsearch
It works well, but since the list of hosts is large in some cases, some runs can take up to 6 days to finish.
The problem is that sometimes it crashes without much information.
This is what the error looks like in /var/log/messages:
abrt[3769]: Saved core dump of pid 24382 (/usr/bin/python) to /var/spool/abrt/ccpp-2016-02-05-10:33:42-24382 (631136256 bytes)
abrtd: Directory 'ccpp-2016-02-05-10:33:42-24382' creation detected
abrtd: Interpreter crashed, but no packaged script detected: 'python /home/cloud/collection-all-es-commissioned.py'
abrtd: 'post-create' on '/var/spool/abrt/ccpp-2016-02-05-10:33:42-24382' exited with 1
abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2016-02-05-10:33:42-24382'
It happens with Python 2.6 and 2.7 on Oracle Linux.
Any ideas on how to find out the root cause and fix it?
Thanks,
Isaac
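One possible avenue, assuming a core dump can be preserved (abrtd deleted this one) and that gdb with the matching python debuginfo packages is installed (the coredump path below is a placeholder): open the dump against the same interpreter and inspect the backtrace:
gdb /usr/bin/python /path/to/saved/coredump
(gdb) bt       # C-level backtrace of the crashing thread
(gdb) py-bt    # Python-level backtrace, if the python-gdb.py helpers are loaded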
I've looked at some questions about profiling memory usage in Python programs, but so far haven't been able to get anything to work. My program must run as root (it opens a TUN/TAP device).
First, I tried heapy; unfortunately, it didn't work for me. Every time my code tried to execute hpy().heap(), the program froze. Not wanting to waste too much time, I decided to try Valgrind.
I tried valgrind with massif:
# valgrind --tool=massif ./my_prog.py --some-options value
I think the issue is related to profiling Python programs. I tried my program (which runs as root) and no massif output file was generated. I also wasn't able to generate an output file with another Python program (which doesn't run as root). However, a simple C test program worked fine and produced the massif file.
What are the issues preventing Valgrind and massif from working correctly with Python programs?
Instead of having the script launch the interpreter through its shebang line, invoking the interpreter directly as Valgrind's target solves the problem:
valgrind --tool=massif python my_script.py