I just got the following error and I have no idea what to make of it.
Unhandled exception in thread started by <bound method Timer.__bootstrap of <Timer(Thread-3, stopped -1234564240)>>
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 525, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib/python2.7/threading.py", line 565, in __bootstrap_inner
(self.name, _format_exc()))
File "/usr/lib/python2.7/traceback.py", line 241, in format_exc
return ''.join(format_exception(etype, value, tb, limit))
File "/usr/lib/python2.7/traceback.py", line 141, in format_exception
list = list + format_tb(tb, limit)
File "/usr/lib/python2.7/traceback.py", line 76, in format_tb
return format_list(extract_tb(tb, limit))
File "/usr/lib/python2.7/traceback.py", line 101, in extract_tb
line = linecache.getline(filename, lineno, f.f_globals)
File "/usr/lib/python2.7/linecache.py", line 14, in getline
lines = getlines(filename, module_globals)
File "/usr/lib/python2.7/linecache.py", line 40, in getlines
return updatecache(filename, module_globals)
File "/usr/lib/python2.7/linecache.py", line 133, in updatecache
lines = fp.readlines()
MemoryError
Relevant code (although I'm not sure if it's actually relevant - it's just the only part of my code that is in any way mentioned by the exception):
class Timer(threading.Thread):
def __init__(self, interval, callback, limit=0, args=[], kwargs={}):
threading.Thread.__init__(self)
self.interval = interval / 1000.0
self.callback = callback
self.limit = limit
self.args = args
self.kwargs = kwargs
self.iterations = 0
self._stop = threading.Event()
def restart(self):
self.iterations = 0
self._stop.clear()
threading.Thread.__init__(self)
self.start()
def run(self):
while not self._stop.wait(self.interval):
self.callback(*self.args, **self.kwargs)
self.iterations += 1
if self.limit > 0 and self.iterations >= self.limit:
break
def stop(self):
self._stop.set()
def stopped(self):
return self._stop.isSet()
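For reference, the class is used roughly like this (my own illustration of the API, inferred from the constructor and run() - not code from the original post):

def tick():
    print("tick")

# interval is given in milliseconds (it is divided by 1000.0 in __init__),
# and limit=10 stops the timer after ten callbacks.
t = Timer(500, tick, limit=10)
t.start()

# later on...
t.stop()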
I think it was around this time that the server on which I was running the code was sort of falling apart - was this just a symptom, or was it indicative of problems with my code somewhere else?
Mostly, though, I just want to know what the hell this means, I can probably figure out the rest.
You ran out of memory. From the Python docs on exceptions:
exception MemoryError
Raised when an operation runs out of memory but
the situation may still be rescued (by deleting some objects). The
associated value is a string indicating what kind of (internal)
operation ran out of memory. Note that because of the underlying
memory management architecture (C’s malloc() function), the
interpreter may not always be able to completely recover from this
situation; it nevertheless raises an exception so that a stack
traceback can be printed, in case a run-away program was the cause.
So you either:
Ran out of system memory (you filled up all your physical RAM and all of your pagefile). This is entirely possible to do if you had a runaway loop creating lots of data very fast.
Or ran into the 2GB per-process RAM limit.
Note that Python on 32-bit systems has a 2GB memory limit regardless of how much physical RAM you have or whether PAE is enabled. This isn't Python-specific - it's an operating-system limitation on the address space of a 32-bit process.
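If you are not sure which of these applies, a quick way to check whether you are running a 32-bit interpreter (and therefore hitting the address-space ceiling rather than exhausting physical RAM) is to look at the pointer size - a small sketch:

import platform
import struct

# 4 bytes per pointer means a 32-bit interpreter, 8 bytes means 64-bit.
print(struct.calcsize("P") * 8, "bit interpreter")
print(platform.architecture()[0])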
It probably wasn't the Timer class that caused the problem - it's just that you happened to run out of memory while doing something with a Timer.
From the Python Docs...
exception MemoryError
Raised when an operation runs out of memory but the situation may still be rescued
(by deleting some objects). The associated value is a string indicating what kind
of (internal) operation ran out of memory. Note that because of the underlying
memory management architecture (C’s malloc() function), the interpreter may not
always be able to completely recover from this situation; it nevertheless raises
an exception so that a stack traceback can be printed, in case a run-away program
was the cause.
Related
I'm offloading a task to a separate process to protect my memory space. The process runs a cythonized C library that tends not to fully clean up after itself. The result is then returned through a multiprocessing.Queue. However, once the item being returned reaches a certain size, the Queue.get method stalls.
I'm using the processify wrapper from https://gist.github.com/schlamar/2311116 which wraps the function call in a Process.
My test function is
import numpy as np  # needed for np.zeros below

@processify
def do_something(size: int):
    return np.zeros(shape=size, dtype=np.uint8)
My test code is
from time import time  # needed for the timing below

if __name__ == '__main__':
    for k in range(1, 11):
        print(f"{k*256}MB")
        t0 = time()
        do_something(k * 256 * 1024 * 1024)
        print(f"Took {time()-t0:0.1f}s")
This runs smoothly until 2048MB where it just stays for minutes (no CPU activity) until I cancel the process:
256MB
Took 0.7s
512MB
Took 1.6s
768MB
Took 2.1s
1024MB
Took 2.7s
1280MB
Took 3.4s
1536MB
Took 4.0s
1792MB
Took 4.6s
2048MB
^CTraceback (most recent call last):
File "processify.py", line 63, in <module>
do_something(k * 256 * 1024 * 1024)
File "processify.py", line 40, in wrapper
ret, error = q.get()
File "/home/.../python3.7/multiprocessing/queues.py", line 94, in get
res = self._recv_bytes()
File "/home/.../python3.7/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/.../python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/.../python3.7/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
KeyboardInterrupt
From the stack trace it becomes evident that the Queue.get call is waiting. If I add print statements, I can see that Queue.put has already finished by this time, so the return value should be inside the Queue. I also tried running without Process.join, as suggested by a comment in the GitHub gist, but that didn't help either.
I know that this kind of design is suboptimal and I should probably fix the Cython library so that I don't need to offload in the first place. Still, I would like to know whether there is an inherent limitation in Python's multiprocessing that prevents objects over a certain size from passing through a Queue.
Thank you all in advance!
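For what it's worth, one way to narrow down whether the stall comes from the Queue machinery or from the underlying pipe is to push raw bytes through a bare multiprocessing.Pipe (a standalone sketch, not using the processify wrapper; the names are mine). Python versions before 3.8 reportedly could not transmit a single payload larger than about 2 GiB over a multiprocessing connection, which would line up with the 2048MB threshold seen above:

import multiprocessing as mp

def child(conn, size):
    # Send a zero-filled payload of the requested size back to the parent.
    conn.send_bytes(bytes(size))
    conn.close()

if __name__ == "__main__":
    for size in (1024 ** 3, 2 * 1024 ** 3 + 1):  # 1 GiB, then just over 2 GiB
        recv_end, send_end = mp.Pipe(duplex=False)
        p = mp.Process(target=child, args=(send_end, size))
        p.start()
        send_end.close()  # the parent keeps only the receiving end
        try:
            recv_end.recv_bytes()
            print(f"{size} bytes received OK")
        except EOFError:
            print(f"{size} bytes failed (the child likely died while sending)")
        p.join()

If the second payload fails while the first succeeds, the limit is in the transport layer rather than in the processify wrapper.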
I'm using python to do some processing on text files and am having issues with MemoryErrors. Sometimes the file being processed is quite large which means that too much RAM is being used by a multiprocessing Process.
Here is a snippet of my code:
import multiprocessing as mp
import os
def preprocess_file(file_path):
with open(file_path, "r+") as f:
file_contents = f.read()
# modify the file_contents
# ...
# overwrite file
f.seek(0)
f.write(file_contents)
f.truncate()
if __name__ == "main":
with mp.Pool(mp.cpu_count()) as pool:
pool_processes = []
# for all files in dir
for root, dirs, files in os.walk(some_path):
for f in files:
pool_processes.append(os.path.join(root, f))
# start the processes
pool.map(preprocess_file, pool_processes)
I have tried to use the resource package to set a limit on how much RAM each process can use, as shown below, but this hasn't fixed the issue: I still get MemoryErrors, which leads me to believe it's the pool.map call that is causing problems. I was hoping to have each process deal with the exception individually so that the file could be skipped rather than crashing the whole program.
import resource
def preprocess_file(file_path):
try:
hard = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") # total bytes of RAM in machine
soft = (hard - 512 * 1024 * 1024) // mp.cpu_count() # split between each cpu and save 512MB for the system
resource.setrlimit(resource.RLIMIT_AS, (soft, hard)) # apply limit
with open(file_path, "r+") as f:
# ...
except Exception as e: # bad practice - should be more specific but just a placeholder
# ...
How can I let an individual process run out of memory while letting the other processes continue unaffected? Ideally I want to catch the exception within the preprocess_file function so that I can log exactly which file caused the error.
Edit: The preprocess_file function does not share data with any other processes, so there is no need for shared memory. The function also needs to read the entire file at once, as the reformatting cannot be done line by line.
Edit 2: The traceback from the program is below. As you can see, the error doesn't actually point to the file being run; it comes from the multiprocessing package's files.
Process ForkPoolWorker-2:
Traceback (most recent call last):
File "/usr/lib64/python3.6/multiprocessing/pool.py", line 125, in worker
File "/usr/lib64/python3.6/multiprocessing/queues.py", line 341, in put
File "/usr/lib64/python3.6/multiprocessing/reduction.py", line 51, in dumps
File "/usr/lib64/python3.6/multiprocessing/reduction.py", line 39, in __init__
MemoryError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
File "/usr/lib64/python3.6/multiprocessing/pool.py", line 130, in worker
File "/usr/lib64/python3.6/multiprocessing/queues.py", line 341, in put
File "/usr/lib64/python3.6/multiprocessing/reduction.py", line 51, in dumps
File "/usr/lib64/python3.6/multiprocessing/reduction.py", line 39, in __init__
MemoryError
If MemoryError is raised, the worker process may or may not be able to recover from the situation. If it does, as @Thomas suggests, catch the MemoryError somewhere.
import multiprocessing as mp
from time import sleep
def initializer():
# Probably set the memory limit here
pass
def worker(i):
sleep(1)
try:
if i % 2 == 0:
raise MemoryError
except MemoryError as ex:
return str(ex)
return i
if __name__ == '__main__':
with mp.Pool(2, initializer=initializer) as pool:
tasks = range(10)
results = pool.map(worker, tasks)
print(results)
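The initializer above is the natural place for the per-worker cap that the question already sketches with resource.setrlimit; a possible version (a sketch for Unix only, since the resource module is not available on Windows, reusing the question's split of total RAM) would be:

import multiprocessing as mp
import os
import resource

def initializer():
    # Cap each worker's address space so that an oversized allocation raises
    # MemoryError inside the worker, where the try/except around the work
    # can catch it, instead of dragging the whole machine into swap.
    hard = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")  # total bytes of RAM
    soft = (hard - 512 * 1024 * 1024) // mp.cpu_count()              # leave 512MB for the system
    resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

With that in place, the MemoryError surfaces inside worker() and the existing except MemoryError branch can log or skip the offending input.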
If the worker cannot recover, the whole pool is unlikely to keep working. For example, change worker to force an exit:
def worker(i):
sleep(1)
try:
if i % 2 == 0:
raise MemoryError
elif i == 5:
import sys
sys.exit()
except MemoryError as ex:
return str(ex)
return i
Then Pool.map never returns and blocks forever.
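If the program needs to survive even a worker that dies outright, one option (not part of the answer above, just a sketch) is concurrent.futures.ProcessPoolExecutor, which detects an abruptly terminated worker and raises BrokenProcessPool for the affected tasks instead of blocking:

import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool
from time import sleep

def worker(i):
    sleep(1)
    if i == 5:
        os._exit(1)        # simulate a worker killed outright (e.g. by the OOM killer)
    if i % 2 == 0:
        raise MemoryError  # simulate running out of memory
    return i

if __name__ == '__main__':
    results = []
    with ProcessPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(worker, i) for i in range(10)]
        for future in futures:
            try:
                results.append(future.result())
            except MemoryError:
                results.append('MemoryError')
            except BrokenProcessPool:
                results.append('worker died')
    print(results)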
I am working on an HTML parser that uses Python's multiprocessing Pool, because it runs through a huge number of pages. The output from every page is saved to a separate CSV file. The problem is that sometimes I get an unexpected error and the whole program crashes, even though I have error handling almost everywhere - reading pages, parsing pages, even writing files. Moreover, it looks like the script crashes after it finishes writing a batch of files, so there shouldn't be anything left to crash on. After a whole day of debugging I am left clueless.
Error:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 44, in mapstar
return list(map(*args))
File "D:\ppp\Python\parser\run.py", line 244, in media_process
save_media_product(DIRECTORY, category, media_data)
File "D:\ppp\Python\parser\manage_output.py", line 180, in save_media_product
_file_manager(target_file, temp, temp2)
File "D:\ppp\Python\store_parser\manage_output.py", line 214, in _file_manager
file_to_write.close()
UnboundLocalError: local variable 'file_to_write' referenced before assignment
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\ppp\Python\store_parser\run.py", line 356, in <module>
main()
File "D:\Rzeczy Mariusza\Python\store_parser\run.py", line 318, in main
process.map(media_process, batch)
File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 644, in get
raise self._value
UnboundLocalError: local variable 'file_to_write' referenced before assignment
It looks like there is an error with variable assignment, but there isn't:
try:
file_to_write = open(target_file, 'w')
except OSError:
message = 'OSError while writing file name - {}'.format(target_file)
log_error(message)
except UnboundLocalError:
message = 'UnboundLocalError while writing file name - {}'.format(target_file)
log_error(message)
except Exception as e:
message = 'Total failure "{}" while writing file name - {}'.format(e, target_file)
log_error(message)
else:
file_to_write.write(temp)
file_to_write.write(temp2)
finally:
file_to_write.close()
The except Exception as e: line does not help with anything; the whole thing still crashes. So far I have excluded only the out-of-memory scenario, because this script is designed to run on a low-spec VPS, but in the testing stage I ran it in an environment with 8 GB of RAM. So if you have any theories, please share.
The exception really says what is happening.
This part is telling you the obvious issue:
UnboundLocalError: local variable 'file_to_write' referenced before assignment
Even though you have try/except blocks that catch various exceptions, the else/finally blocks are not covered by them.
More specifically, in the finally block you reference a variable that might not exist: if file_to_write = open(target_file, 'w') raises, that exception is handled by (at least) the last except Exception as e block, but the finally block still runs afterwards.
Since the exception happened because the target file could not be opened, nothing was ever assigned to file_to_write, so that name does not exist once the exception has been handled. That is why the finally block crashes.
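One way to restructure the block so that finally can never see an unbound name is to let a with statement do the closing, so the file is only closed if open() actually succeeded (a sketch based on the code in the question):

try:
    with open(target_file, 'w') as file_to_write:
        file_to_write.write(temp)
        file_to_write.write(temp2)
except OSError:
    message = 'OSError while writing file name - {}'.format(target_file)
    log_error(message)
except Exception as e:
    message = 'Total failure "{}" while writing file name - {}'.format(e, target_file)
    log_error(message)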
I am trying to use the example Pika Async consumer (http://pika.readthedocs.io/en/0.10.0/examples/asynchronous_consumer_example.html) as a multiprocessing process (by making the ExampleConsumer class subclass multiprocessing.Process). However, I'm running into some issues with gracefully shutting down everything.
Let's say for example I have defined my procs as below:
for k, v in queues_callbacks.iteritems():
proc = ExampleConsumer(queue, k, v, rabbit_user, rabbit_pw, rabbit_host, rabbit_port)
"queues_callbacks" is basically just a dictionary of exchange : callback_function (ideally I'd like to be able to connect to several exchanges with this architecture).
Then I do the normal python way of dealing with starting processes:
try:
for proc in self.consumers:
proc.start()
for proc in self.consumers:
proc.join()
except KeyboardInterrupt:
for proc in self.consumers:
proc.terminate()
proc.join(1)
The issue comes when I try to stop everything. Let's say I've overridden the "terminate" method to call the consumer's "stop" method and then continue on with the normal terminate of Process. With this structure, I am getting some strange attribute errors:
Traceback (most recent call last):
File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 154, in <module>
main()
File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 150, in main
mybot.start()
File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 71, in start
self.stop()
File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 53, in stop
self.__stop_consumers__()
File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 130, in __stop_consumers__
self.consumers[0].terminate()
File "/Users/christopheralexander/PycharmProjects/new_bot/rabbit_consumer.py", line 414, in terminate
self.stop()
File "/Users/christopheralexander/PycharmProjects/new_bot/rabbit_consumer.py", line 399, in stop
self._connection.ioloop.start()
AttributeError: 'NoneType' object has no attribute 'ioloop'
It's as if these attributes somehow disappear at some point. In the particular case above, _connection is initialized as None, but then gets set when the consumer is started. However, when the "stop" method is called, it has already reverted back to None (even though nothing explicitly sets it back). I'm also observing other strange behavior, such as things apparently being called twice (even though "stop" is called only once). Any ideas as to what is going on here, or is this not the proper way of architecting this?
Thanks!
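One thing worth keeping in mind here is that a multiprocessing.Process object exists in both the parent and the child, and the two copies do not share state: _connection gets set inside run(), which executes in the child, while an overridden terminate() executes in the parent, whose copy of _connection is still None. A minimal illustration (my own toy class, not the poster's code):

import multiprocessing

class Consumer(multiprocessing.Process):
    def __init__(self):
        super(Consumer, self).__init__()
        self._connection = None

    def run(self):
        # This runs in the child process and only changes the child's copy.
        self._connection = "connected"
        print("in child: %s" % self._connection)

if __name__ == "__main__":
    consumer = Consumer()
    consumer.start()
    consumer.join()
    # The parent's copy of the object was never updated.
    print("in parent: %s" % consumer._connection)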
Sorry in advance, this is going to be long ...
Possibly related:
Python Multiprocessing atexit Error "Error in atexit._run_exitfuncs"
Definitely related:
python parallel map (multiprocessing.Pool.map) with global data
Keyboard Interrupts with python's multiprocessing Pool
Here's a "simple" script I hacked together to illustrate my problem...
import time
import multiprocessing as multi
import atexit
cleanup_stuff=multi.Manager().list([])
##################################################
# Some code to allow keyboard interrupts
##################################################
was_interrupted=multi.Manager().list([])
class _interrupt(object):
"""
Toy class to allow retrieval of the interrupt that triggered its execution
"""
def __init__(self,interrupt):
self.interrupt=interrupt
def interrupt():
was_interrupted.append(1)
def interruptable(func):
"""
decorator to allow functions to be "interruptable" by
a keyboard interrupt when in python's multiprocessing.Pool.map
**Note**, this won't actually cause the Pool.map to be interrupted;
it will merely cause the following functions not to be executed.
"""
def newfunc(*args,**kwargs):
try:
if(not was_interrupted):
return func(*args,**kwargs)
else:
return False
except KeyboardInterrupt as e:
interrupt()
return _interrupt(e) #If we really want to know about the interrupt...
return newfunc
@atexit.register
def cleanup():
for i in cleanup_stuff:
print(i)
return
@interruptable
def func(i):
print(i)
cleanup_stuff.append(i)
time.sleep(float(i)/10.)
return i
#Must wrap func here, otherwise it won't be found in __main__'s dict
#Maybe because it was created dynamically using the decorator?
def wrapper(*args):
return func(*args)
if __name__ == "__main__":
#This is an attempt to use signals -- I also attempted something similar where
#The signals were only caught in the child processes...Or only on the main process...
#
#import signal
#def onSigInt(*args): interrupt()
#signal.signal(signal.SIGINT,onSigInt)
#Try 2 with signals (only catch signal on main process)
#import signal
#def onSigInt(*args): interrupt()
#signal.signal(signal.SIGINT,onSigInt)
#def startup(): signal.signal(signal.SIGINT,signal.SIG_IGN)
#p=multi.Pool(processes=4,initializer=startup)
#Try 3 with signals (only catch signal on child processes)
#import signal
#def onSigInt(*args): interrupt()
#signal.signal(signal.SIGINT,signal.SIG_IGN)
#def startup(): signal.signal(signal.SIGINT,onSigInt)
#p=multi.Pool(processes=4,initializer=startup)
p=multi.Pool(4)
try:
out=p.map(wrapper,range(30))
#out=p.map_async(wrapper,range(30)).get() #This doesn't work either...
#The following lines don't work either
#Effectively trying to roll my own p.map() with p.apply_async
# results=[p.apply_async(wrapper,args=(i,)) for i in range(30)]
# out = [ r.get() for r in results ]
except KeyboardInterrupt:
print ("Hello!")
out=None
finally:
p.terminate()
p.join()
print (out)
This works just fine if no KeyboardInterrupt is raised. However, if I raise one, the following exception occurs:
10
7
9
12
^CHello!
None
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "test.py", line 58, in cleanup
for i in cleanup_stuff:
File "<string>", line 2, in __getitem__
File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
c = SocketClient(address)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
s.connect(address)
File "<string>", line 1, in connect
error: [Errno 2] No such file or directory
Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "test.py", line 58, in cleanup
for i in cleanup_stuff:
File "<string>", line 2, in __getitem__
File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
c = SocketClient(address)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
s.connect(address)
File "<string>", line 1, in connect
socket.error: [Errno 2] No such file or directory
Interestingly enough, the code does exit the Pool.map function without calling any of the additional functions ... The problem seems to be that the KeyboardInterrupt isn't handled properly at some point, but it is a little confusing where that is, and why it isn't handled in interruptable. Thanks.
Note, the same problem happens if I use out=p.map_async(wrapper,range(30)).get()
EDIT 1
A little closer ... If I enclose the out=p.map(...) in a try/except/finally clause, it gets rid of the first exception ... the other ones are still raised in atexit, however. The code and traceback above have been updated.
EDIT 2
Something else that does not work has been added to the code above as a comment. (Same error). This attempt was inspired by:
http://jessenoller.com/2009/01/08/multiprocessingpool-and-keyboardinterrupt/
EDIT 3
Another failed attempt using signals added to the code above.
EDIT 4
I have figured out how to restructure my code so that the above is no longer necessary. In the (unlikely) event that someone stumbles upon this thread with the same use-case that I had, I will describe my solution ...
Use Case
I have a function which generates temporary files using the tempfile module. I would like those temporary files to be cleaned up when the program exits. My initial attempt was to pack each temporary file name into a list and then delete all the elements of the list with a function registered via atexit.register. The problem is that the updated list was not being updated across multiple processes. This is where I got the idea of using multiprocessing.Manager to manage the list data.

Unfortunately, this fails on a KeyboardInterrupt no matter how hard I tried, because the communication sockets between processes were broken for some reason. The solution to this problem is simple. Prior to using multiprocessing, set the temporary file directory ... something like tempfile.tempdir=tempfile.mkdtemp() and then register a function to delete the temporary directory. Each of the processes writes to the same temporary directory, so it works.

Of course, this solution only works where the shared data is a list of files that needs to be deleted at the end of the program's life.
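In code, the approach described above might look roughly like this (a sketch: the shutil.rmtree cleanup is my own addition - the description only says "register a function to delete the temporary directory" - and it assumes the worker processes are forked so they inherit the module-level setting):

import atexit
import shutil
import tempfile

# Point every tempfile user at one dedicated directory. Forked worker
# processes inherit this module-level setting, so all of them create
# their temporary files inside the same directory.
tempfile.tempdir = tempfile.mkdtemp()

# Remove the whole directory (and everything the workers left in it)
# when the main process exits.
atexit.register(shutil.rmtree, tempfile.tempdir, ignore_errors=True)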