Python: CrashingPython explanation

I have some Python code that makes the interpreter crash randomly. I have tried to isolate the source of the problem, but I am still investigating. While searching the net for things that can crash the interpreter, I stumbled upon this:
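import ctypes

def crash():
    '''\
    crash the Python interpreter...
    '''
    # bytes literals so the snippet also runs on Python 3
    i = ctypes.c_char(b'a')
    j = ctypes.pointer(i)
    c = 0
    while True:
        # only j[0] belongs to i; every later index is out of bounds
        j[c] = b'a'
        c += 1
    j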
http://wiki.python.org/moin/CrashingPython
Since I am using ctypes, I think the problem could be related to the way I use it. So I am trying to understand why that code makes the Python interpreter crash. Understanding it would help me investigate the problem in my own ctypes code.
Can anybody explain this?
Any help would be appreciated.

It makes a pointer to a one-byte object and then writes past the end of it.
ctypes.pointer(i) points at the single byte that i owns, so only j[0] is a valid location; j[1], j[2] and so on are whatever memory happens to follow.
The first writes may well succeed, silently corrupting adjacent memory, but the loop keeps trying successive addresses until it reaches one that isn't writable. Not all memory addresses are writable, so it's bound to crash eventually.
(It does not start at address zero or at the numerical value of 'a': the pointer holds the address of i, not its contents.)

The problem seems to be that you're writing to successive memory locations indefinitely. Sooner or later the loop reaches memory that is not writable, and the program crashes.
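As a rough illustration of the bounded counterpart to that loop (a sketch assuming Python 3; the helper name fill and the 4-byte buffer are made up for the example): ctypes does not bounds-check pointer indexing, so the code itself has to stop at the size of the memory it owns.

```python
import ctypes

# A 4-byte buffer that this process owns; only indexes 0..3 are valid.
buf = ctypes.create_string_buffer(4)
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_char))

def fill(pointer, size, value=b"a"):
    """Write value into pointer[0:size] and never index past size."""
    for index in range(size):
        pointer[index] = value
    return size

written = fill(ptr, len(buf))
print(buf.raw)  # b'aaaa'
```

When hunting a crash in your own ctypes code, the places to check are therefore every pointer index and every buffer length passed to a foreign function.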

Related

How to change memory of dll process?

When I try to change the memory, nothing happens because it is targeting the wrong thing. I'm not sure how to get the specific AJ Classic process, or how to target "pepflashplayer.dll" when changing memory.
The pointer is for "pepflashplayer.dll"; I got it via Cheat Engine. It does not work if you choose the AJ Classic application itself: you have to go into the process list and try each process until you find the one specific process where it works. I'm new to this, so I'm writing it in Python since it's simple and easy.
I tried:
rwm = ReadWriteMemory()
self.process = rwm.get_process_by_name('AJ Classic.exe')
which did not change the memory how I wanted it to. This question may be very dumb but ¯\_(ツ)__/¯

Python3 Search the virtual memory of a running windows process

begin TLDR;
I want to write a python3 script to scan through the memory of a running windows process and find strings.
end TLDR;
This is for a CTF binary. It's a typical Windows x86 PE file. The goal is simply to get a flag from the process's memory as it runs. This is easy with ProcessHacker: you can search through the strings in the memory of the running application and find the flag with a regex. Now, because I'm a masochistic geek, I strive to script out solutions for CTFs (for everything, really). Specifically I want to use python3; C# is also an option, but I would really like to keep all of the solution scripts in python.
Thought this would be a very simple task. You know... pip install some library written by someone that's already solved the problem and use it. Couldn't find anything that would let me do what I need for this task. Here are the libraries I tried out already.
ctypes - This was the first one I used, specifically ReadProcessMemory. Kept getting 299 errors which was because the buffer I was passing in was larger than that section of memory so I made a recursive function that would catch that exception, divide the buffer length by 2 until it got something THEN would read one byte at a time until it hit a 299 error. May have been on the right track there but I wasn't able to get the flag. I WAS able to find the flag only if I knew the exact address of the flag (which I'd get from process hacker). I may make a separate question on SO to address that, this one is really just me asking the community if something already exists before diving into this.
pymem - A nice wrapper for ctypes but had the same issues as above.
winappdbg - python2.x only. I don't want to use python 2.x.
haystack - Looks like this depends on winappdbg which depends on python 2.x.
angr - This is a possibility, Only scratched the surface with it so far. Looks complicated and it's on the to learn list but don't want to dive into something right now that's not going to solve the issue.
volatility - Looks like this is meant for working with full RAM dumps not for hooking into currently running processes and reading the memory.
My plan at the moment is to dive a bit more into angr to see if that will work, go back to pymem/ctypes and try more things. If all else fails ProcessHacker IS opensource. I'm not fluent in C so it'll take time to figure out how they're doing it. Really hoping there's some python3 library I'm missing or maybe I'm going about this the wrong way.
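The string-search half of the task does not depend on how the bytes are read. A minimal sketch, assuming the memory has already been read into a bytes object; the sample bytes, the helper names and the flag{...} pattern are invented for illustration and are not taken from the CTF:

```python
import re

def find_strings(data, min_len=4):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]

def find_flags(data, flag_regex=r"flag\{[^}]*\}"):
    """Return substrings of the extracted strings that match flag_regex."""
    flags = []
    for text in find_strings(data):
        flags.extend(re.findall(flag_regex, text))
    return flags

# Made-up sample standing in for one chunk of process memory.
sample = b"\x00\x01junk\x00\x02flag{example}\x00ab\x00"
print(find_strings(sample))  # ['junk', 'flag{example}']
print(find_flags(sample))    # ['flag{example}']
```

This only covers single-byte ASCII; strings stored as UTF-16 in the process would need a second pattern.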

Ended up writing the script using the frida library. Also have to give a shout-out to rootbsd, because his or her code in the fridump3 project helped greatly.

Python crashes in rare cases when running code - how to debug?

I have a problem that I have seriously spent months on now!
Essentially I am running code that needs to read from and save to HDF5 files. I am using h5py for this.
It's very hard to debug because the problem (whatever it is) only occurs in like 5% of the cases (each run takes several hours), and when it happens it crashes python completely, so debugging with python itself is impossible. Simple logs also can't pinpoint the exact crash: it appears to be very random, crashing at different points within the code, or with a lag.
I tried using OllyDbg to figure out what's happening and can safely conclude that it consistently crashes at the following location: http://i.imgur.com/c4X5W.png
It seems to be shortly after calling the python native PyObject_ClearWeakRefs, with an access violation error message. The weird thing is that the file is successfully written to. What would cause the access violation error? Or is that python internal (e.g. the stack?) and not file (i.e. my code) related?
Does anyone have an idea what's happening here? If not, is there a smarter way of finding out what exactly is happening? Maybe some hidden python logs or something I don't know about?
Thank you

PyObject_ClearWeakRefs is in the python interpreter itself. But if it only happens in a small number of runs, it could be hardware related. Things you could try:
Run your program on a different machine. If it doesn't crash there, it is probably a hardware issue.
Reinstall python, in case the installed version has somehow become corrupted.
Run a memory test program.
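On the question of "hidden python logs": one option, assuming Python 3.3 or later, is the standard faulthandler module, which dumps the Python traceback when the process receives a fatal signal. A minimal sketch; the log file name and location are assumptions chosen for the example:

```python
import faulthandler
import os
import tempfile

# Assumed log location; any writable path works.
log_path = os.path.join(tempfile.gettempdir(), "crash_traceback.log")

# The file object must stay open for as long as faulthandler is enabled.
crash_log = open(log_path, "w")

# On a fatal signal such as SIGSEGV (an access violation on Windows),
# dump the Python traceback of every thread to crash_log.
faulthandler.enable(file=crash_log, all_threads=True)

print(faulthandler.is_enabled())  # True
```

This shows which Python line was executing when the crash happened, not why the native code faulted, but for a crash that takes hours to reproduce that is often enough to narrow it down.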

Thanks for all the answers. I ran two versions this time, one with a new python install and my same program, another one on my original computer/install, but replacing all HDF5 read/write procedures with numpy read/write procedures.
The program continued to crash on my second computer at odd times, but on my primary computer I had zero crashes with the changed code. I think it is thus safe to conclude that the problems were HDF5 or more specifically h5py related. It appears that more people encountered issues with h5py in that respect. Given that any error in my application translates to potentially large financial losses I decided to dump HDF5 completely in favor of other stable solutions.

Use a try/except statement. This can be put into the program to stop it from crashing when erroneous data is entered.

Pythonistas, please help convert this to utilize Python Threading concepts

Update : For anyone wondering what I went with at the end -
I divided the result set into 4 and ran 4 instances of the same program, each with one argument indicating which set to process. It did the trick for me. I also considered the PP module. Though it worked, I prefer running the same program. Please pitch in if this is a horrible implementation! Thanks.
Following is what my program does. Nothing memory intensive. It is serial processing and boring. Could you help me convert this to more efficient and exciting process? Say, I process 1000 records this way and with 4 threads, I can get it to run in 25% time!
I read articles on how python threading can be inefficient if done wrong. Even python creator says the same. So I am scared and while I am reading more about them, want to see if bright folks on here can steer me in the right direction. Muchos gracias!
def startProcessing(sernum, name):
    '''
    Bunch of statements depending on result,
    will write to database (one update statement)
    Try Catch blocks which upon failing,
    will call this function until the request succeeds.
    '''

for record in result:
    startProc = startProcessing(str(record[0]), str(record[1]))

Python threads can't run Python code at the same time due to the Global Interpreter Lock (in CPython); you want new processes instead. Look at the multiprocessing module.
(I was instructed to post this as an answer =p.)
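A minimal sketch of what that could look like, assuming Python 3 and that each record can be processed independently. The body of start_processing is a hypothetical stand-in for the real database work, and process_all and the sample records are invented for the example:

```python
from multiprocessing import Pool

def start_processing(record):
    """Hypothetical stand-in for the per-record work (database update, retries)."""
    sernum, name = str(record[0]), str(record[1])
    return "%s:%s" % (sernum, name)

def process_all(records, workers=4):
    """Run start_processing over records in `workers` separate processes."""
    with Pool(processes=workers) as pool:
        return pool.map(start_processing, records)

if __name__ == "__main__":
    # Made-up records; in the question these come from a query result.
    result = [(1, "alpha"), (2, "beta"), (3, "gamma"), (4, "delta")]
    print(process_all(result))
```

The function handed to pool.map has to be defined at module level so the worker processes can import it, and the `if __name__ == "__main__":` guard keeps the workers from re-running the pool setup on platforms that spawn rather than fork.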

Python - Working around memory leaks

I have a Python program that runs a series of experiments, with no data intended to be stored from one test to another. My code contains a memory leak which I am completely unable to find (I've looked at the other threads on memory leaks). Due to time constraints, I have had to give up on finding the leak, but if I were able to isolate each experiment, the program would probably run long enough to produce the results I need.
Would running each test in a separate thread help?
Are there any other methods of isolating the effects of a leak?
Detail on the specific situation
My code has two parts: an experiment runner and the actual experiment code.
Although no globals are shared between the code for running all the experiments and the code used by each experiment, some classes/functions are necessarily shared.
The experiment runner isn't just a simple for loop that can be easily put into a shell script. It first decides on the tests which need to be run given the configuration parameters, then runs the tests then outputs the data in a particular way.
I tried manually calling the garbage collector in case the issue was simply that garbage collection wasn't being run, but this did not help.
Update
Gnibbler's answer has actually allowed me to find out that my ClosenessCalculation objects which store all of the data used during each calculation are not being killed off. I then used that to manually delete some links which seems to have fixed the memory issues.

You can use something like this to help track down memory leaks
>>> from collections import defaultdict
>>> from gc import get_objects
>>> before = defaultdict(int)
>>> after = defaultdict(int)
>>> for i in get_objects():
...     before[type(i)] += 1
...
Now suppose the test leaks some memory:
>>> leaked_things = [[x] for x in range(10)]
>>> for i in get_objects():
...     after[type(i)] += 1
...
>>> print [(k, after[k] - before[k]) for k in after if after[k] - before[k]]
[(<type 'list'>, 11)]
11 because we have leaked one list containing 10 more lists
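The session above is Python 2. A rough Python 3 equivalent of the same before/after comparison might look like this; count_types and type_growth are names invented for the sketch, and the exact numbers printed depend on the interpreter, so no output is shown:

```python
import gc
from collections import Counter

def count_types():
    """Count live gc-tracked objects by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())

def type_growth(before, after):
    """Return {type name: increase} for types that grew between snapshots."""
    return {name: after[name] - before[name]
            for name in after if after[name] > before[name]}

gc.collect()
before = count_types()
# Simulated leak: one outer list holding 10 inner lists.
leaked_things = [[x] for x in range(10)]
after = count_types()

growth = type_growth(before, after)
print(growth)
```

Taking one snapshot before an experiment and one after it, then looking at which types keep growing from run to run, is usually enough to name the class that is being kept alive.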

Threads would not help. If you must give up on finding the leak, then the only solution to contain its effect is running a new process once in a while (e.g., when a test has left overall memory consumption too high for your liking -- you can determine VM size easily by reading /proc/self/status in Linux, and other similar approaches on other OS's).
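Reading the VM size on Linux could be sketched as follows, assuming the usual "VmSize:  12345 kB" line format in /proc/self/status; the function names and the idea of a fixed kB limit are illustrative choices, not part of the original answer:

```python
def vm_size_kb(status_path="/proc/self/status"):
    """Return this process's virtual memory size in kB, or None if unavailable."""
    try:
        with open(status_path) as status:
            for line in status:
                if line.startswith("VmSize:"):
                    # The line looks like "VmSize:    12345 kB".
                    return int(line.split()[1])
    except OSError:
        # No /proc filesystem (e.g. Windows or macOS) or file not readable.
        pass
    return None

def over_limit(limit_kb):
    """True only if the size is known and exceeds limit_kb."""
    size = vm_size_kb()
    return size is not None and size > limit_kb

print(vm_size_kb())
```

The test runner would call over_limit after each test and hand over to a fresh process when it returns True.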
Make sure the overall script takes an optional parameter to tell it what test number (or other test identification) to start from, so that when one instance of the script decides it's taking up too much memory, it can tell its successor where to restart from.
Or, more solidly, make sure that as each test is completed its identification is appended to some file with a well-known name. When the program starts it begins by reading that file and thus knows what tests have already been run. This architecture is more solid because it also covers the case where the program crashes during a test; of course, to fully automate recovery from such crashes, you'll want a separate watchdog program and process to be in charge of starting a fresh instance of the test program when it determines the previous one has crashed (it could use subprocess for the purpose -- it also needs a way to tell when the sequence is finished, e.g. a normal exit from the test program could mean that while any crash or exit with a status != 0 signify the need to start a new fresh instance).
If these architectures appeal but you need further help implementing them, just comment to this answer and I'll be happy to supply example code -- I don't want to do it "preemptively" in case there are as-yet-unexpressed issues that make the architectures unsuitable for you. (It might also help to know what platforms you need to run on).
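A minimal sketch of the progress-file and watchdog idea described above, assuming Python 3. The function names, the one-ID-per-line file format and the max_restarts cap are assumptions made for the example, not the answerer's code:

```python
import subprocess
import sys

def load_completed(path):
    """Return the set of test IDs already recorded in the progress file."""
    try:
        with open(path) as progress:
            return {line.strip() for line in progress if line.strip()}
    except FileNotFoundError:
        # First run: nothing has been completed yet.
        return set()

def mark_completed(path, test_id):
    """Append one finished test ID so a restarted run can skip it."""
    with open(path, "a") as progress:
        progress.write("%s\n" % test_id)

def watchdog(command, max_restarts=100):
    """Re-run command until it exits with status 0; return the number of runs."""
    for attempt in range(1, max_restarts + 1):
        if subprocess.call(command) == 0:
            return attempt
    raise RuntimeError("gave up after %d restarts" % max_restarts)
```

The test program would call load_completed on startup, skip those IDs, call mark_completed after each test, and exit with a non-zero status when it decides it is using too much memory. The watchdog would then be started with something like `watchdog([sys.executable, "run_tests.py"])`, where run_tests.py is a placeholder name.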

I had the same problem with a third-party C library which was leaking. The cleanest workaround that I could think of was to fork and wait. The advantage is that you don't even have to create a separate process after each run: you can define the size of your batch.
Here's a general solution (if you ever find the leak, the only change you need to make is to change run() to call run_single_process() instead of run_forked() and you'll be done):
import os
import sys

batchSize = 20

class Runner(object):
    def __init__(self, dataFeedGenerator, dataProcessor):
        self._dataFeed = dataFeedGenerator
        self._caller = dataProcessor

    def run(self):
        self.run_forked()

    def run_forked(self):
        dataFeed = self._dataFeed
        dataSubFeed = []
        for i, dataMorsel in enumerate(dataFeed, 1):
            # keep every item, including the batchSize-th one
            dataSubFeed.append(dataMorsel)
            if i % batchSize == 0:
                self.run_batch(dataSubFeed)
                dataSubFeed = []
        if dataSubFeed:
            # process the final, partial batch
            self.run_batch(dataSubFeed)

    def run_batch(self, dataSubFeed):
        # the child processes one batch and exits; the parent waits for it
        self._dataFeed = dataSubFeed
        self.fork()
        if self._child_pid == 0:
            self.run_single_process()
        self.endBatch()

    def run_single_process(self):
        for dataMorsel in self._dataFeed:
            self._caller(dataMorsel)

    def fork(self):
        # os.fork is available on Unix only
        self._child_pid = os.fork()

    def endBatch(self):
        if self._child_pid != 0:
            os.waitpid(self._child_pid, 0)
        else:
            sys.exit()  # exit from the child when done
This isolates the memory leak to the child process. And it will never leak more times than the value of the batchSize variable.

I would simply refactor the experiments into individual functions (if not like that already) then accept an experiment number from the command line which calls the single experiment function.
Then just bodge up a shell script as follows:
#!/bin/bash
for expnum in 1 2 3 4 5 6 7 8 9 10 11 ; do
    python yourProgram ${expnum} otherParams
done
That way, you can leave most of your code as-is and this will clear out any memory leaks you think you have in between each experiment.
Of course, the best solution is always to find and fix the root cause of a problem but, as you've already stated, that's not an option for you.
Although it's hard to imagine a memory leak in Python, I'll take your word on that one - you may want to at least consider the possibility that you're mistaken there, however. Consider raising that in a separate question, something that we can work on at low priority (as opposed to this quick-fix version).
Update: Making this community wiki since the question has changed somewhat from the original. I'd delete the answer but for the fact that I still think it's useful: you could do the same in your experiment runner as I proposed with the bash script. You just need to ensure that the experiments run as separate processes so that a leak in one cannot accumulate across experiments (if the memory leaks are in the runner itself, you're going to have to do root cause analysis and fix the bug properly).
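If the runner has to stay in Python because it decides which tests to run and formats the output itself, the same isolation can be sketched with subprocess, assuming Python 3.7 or later. The function names are invented, and the script name and arguments are whatever your experiment program expects:

```python
import subprocess
import sys

def experiment_command(expnum, script, extra_args=()):
    """Build the command line that runs one experiment in a fresh interpreter."""
    return [sys.executable, script, str(expnum)] + list(extra_args)

def run_isolated(command):
    """Run the command to completion and return (exit status, captured stdout)."""
    finished = subprocess.run(command, capture_output=True, text=True)
    return finished.returncode, finished.stdout
```

The runner would loop over the experiment numbers it has chosen, call `run_isolated(experiment_command(expnum, "yourProgram", ["otherParams"]))` for each, and collect the returned output. Whatever an experiment leaks is released when its process exits.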
