I have what seems like a fairly simple long-IO operation that could be refined using threading. I've built a GUI with DearPyGui (not directly related to the problem, just background). A user can load a file via the package's file loader. Some of these files can be quite large (3 GB), so I'm adding a modal pop-up window to lock the interface while the file is loading. That was all context; the problem itself is not DearPyGui.
I'm starting a thread inside a method of a class instance. The thread's target is another method of the same object, which updates an attribute of that object to be interrogated later. For example:
import threading

class IOClass:
    def __init__(self):
        self.fileObj = None

    def loadFile(self, fileName):
        thread = threading.Thread(target=self.threadMethod, args=fileName)
        thread.start()
        # Load GUI wait-screen
        thread.join()
        # anything else... EXCEPTION THROWN HERE
        print(" ".join(["Version:", self.fileObj.getVersion()]))

    def threadMethod(self, fileName):
        print(" ".join(["Loading filename", fileName]))
        # expensive, basic Python IO operation here
        self.fileObj = ...  # Python IO operation here
class GUIClass:
    def __init__(self):
        pass

    def startMethod(self):
        # this is called by __main__
        ioClass = IOClass()
        ioClass.loadFile("filename.txt")
Unfortunately, I get this error:
Exception in thread Thread-1 (loadFile):
Traceback (most recent call last):
File "/home/anthony/anaconda3/envs/CPRD-software/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
self.run()
File "/home/anthony/anaconda3/envs/CPRD-software/lib/python3.10/threading.py", line 946, in run
self._target(*self._args, **self._kwargs)
TypeError: AnalysisController.loadFile() takes 2 positional arguments but 25 were given
Traceback (most recent call last):
File "/home/anthony/CPRD-software/GUI/Controllers/AnalysisController.py", line 117, in loadStudySpace
print(" ".join(["Version:", self.fileObj.getVersion()]))
AttributeError: 'NoneType' object has no attribute 'getVersion'
I'm not sure what's going on. The machine should sit there for at least 3 minutes while the data loads, but instead the main thread doesn't seem to wait on the join for the IO thread to finish loading the file, and then attempts to call a method on what should have been loaded in.
I solved it. In the threading.Thread() call, do not pass the method bound via self. Instead, pass self in as an argument to the thread method, e.g.:
thread = threading.Thread(target=threadMethod, args=(self, fileName))
The target function itself doesn't change, i.e. it remains:
def threadMethod(self, fileName):
    # expensive, basic Python IO operation here
    self.fileObj = ...  # Python IO operation here
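Worth noting (my reading, not part of the original answer): the "takes 2 positional arguments but 25 were given" error arose because args=fileName passed a bare string, which Thread unpacks into one argument per character. Any fix that makes args a proper tuple resolves it, including keeping the bound method as the target. A minimal sketch:

import threading

class IOClass:
    def __init__(self):
        self.fileObj = None

    def loadFile(self, fileName):
        # args must be an iterable of arguments: a one-element tuple here
        thread = threading.Thread(target=self.threadMethod, args=(fileName,))
        thread.start()
        thread.join()  # the main thread now genuinely waits for the IO thread

    def threadMethod(self, fileName):
        with open(fileName) as f:  # placeholder for the expensive IO
            self.fileObj = f.read()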
I am trying to use a Timer inside a method as a way to wait some time without blocking the script, as time.sleep would.
Outside of the class, the Timer code runs fine, but inside a class method, it returns the error:
TypeError: 'NoneType' object is not callable
import time
from threading import Timer

openDuration = 10

### Valve class
class Valve():
    def open(self, openDuration):
        t = Timer(openDuration, Valve().dummyWait())
        t.start()
        t.join()
        print("Valve open")

    def dummyWait(self):  # empty. Just used to wait some time
        pass

Valve().open(openDuration)
While the code does run and prints "Valve open" after 10s, it returns this error:
Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\threading.py", line 917, in _bootstrap_inner
self.run()
File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\threading.py", line 1166, in run
self.function(*self.args, **self.kwargs)
TypeError: 'NoneType' object is not callable
What is causing this error message? As I understand it, I am using two variables, openDuration and t, both of which get defined, so I don't understand where the NoneType comes from.
Timer takes two arguments: the time to wait and the function to execute once that time has elapsed. You must pass the function itself as the second argument, not the result of calling it: Valve().dummyWait() calls dummyWait immediately and passes its return value, None, to the Timer, which then tries to call None when it fires, hence the TypeError.
Try the snippet below:
import time
from threading import Timer

openDuration = 10

### Valve class
class Valve():
    def open(self, openDuration):
        t = Timer(openDuration, self.dummyWait)
        t.start()
        t.join()
        print("Valve open")

    def dummyWait(self):  # empty. Just used to wait some time
        pass

Valve().open(openDuration)
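As an aside, Timer also accepts args and kwargs for the callback, so a callable that needs parameters can still be passed uncalled. A small sketch (open_valve is an illustrative name, not from the question):

from threading import Timer

openDuration = 10

def open_valve(duration):
    print("Valve open for", duration, "seconds")

# Pass the callable itself; its arguments go in args/kwargs,
# so nothing is executed until the timer fires.
t = Timer(openDuration, open_valve, args=(openDuration,))
t.start()
t.join()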
Ok, this one has me tearing my hair out:
I have a multi-process program, with separate workers each working on a given task.
When a KeyboardInterrupt comes, I want each worker to save its internal state to a file, so it can continue where it left off next time.
HOWEVER...
It looks like the dictionary which contains information about the state is vanishing before this can happen!
How? The exit() function is accessing a more globally scoped version of the dictionary... and it turns out that the various run() functions (and those subordinate to run()) have been creating their own version of the variable.
Nothing strange about that...
Except...
All of them have been using the self. keyword.
Which, if my understanding is correct, should mean they are always accessing the instance-wide version of the variable... not creating their own!
Here's a simplified version of the code:
import multiprocessing
import atexit
import signal
import sys
import json

class Worker(multiprocessing.Process):
    def __init__(self, my_string_1, my_string_2):
        # Inherit the __init__ from Process, very important or we will get errors
        super(Worker, self).__init__()
        # Make sure we know what to do when called to exit
        atexit.register(self.exit)
        signal.signal(signal.SIGTERM, self.exit)
        self.my_dictionary = {
            'my_string_1': my_string_1,
            'my_string_2': my_string_2
        }

    def run(self):
        self.my_dictionary = {
            'new_string': 'Watch me make weird stuff happen!'
        }
        try:
            while True:
                print(self.my_dictionary['my_string_1'] + " " + self.my_dictionary['my_string_2'])
        except (KeyboardInterrupt, SystemExit):
            self.exit()

    def exit(self):
        # Write the relevant data to file
        info_for_file = {
            'my_dictionary': self.my_dictionary
        }
        print(info_for_file)  # For easier debugging
        save_file = open('save.log', 'w')
        json.dump(info_for_file, save_file)
        save_file.close()
        # Exit
        sys.exit()

if __name__ == '__main__':
    strings_list = ["Hello", "World", "Ehlo", "Wrld"]
    instances = []
    try:
        for i in range(len(strings_list) - 2):
            my_string_1 = strings_list[i]
            my_string_2 = strings_list[i + 1]
            instance = Worker(my_string_1, my_string_2)
            instances.append(instance)
            instance.start()
        for instance in instances:
            instance.join()
    except (KeyboardInterrupt, SystemExit):
        for instance in instances:
            instance.exit()
            instance.close()
On run we get the following traceback...
Process Worker-2:
Process Worker-1:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "<stdin>", line 18, in run
File "<stdin>", line 18, in run
KeyError: 'my_string_1'
KeyError: 'my_string_1'
In other words, even though the key my_string_1 was explicitly added during init, the run() function is accessing a new version of self.my_dictionary which does not contain that key!
Again, this would be expected if we were dealing with a normal variable (my_dictionary instead of self.my_dictionary) but I thought that self.variables were always instance-wide...
What is going on here?
Your problem can basically be represented by the following:
class Test:
    def __init__(self):
        self.x = 1

    def run(self):
        self.x = 2
        if self.x != 1:
            print("self.x isn't 1!")

t = Test()
t.run()
Note what run is doing.
You overwrite your instance member self.my_dictionary with incompatible data when you write
self.my_dictionary = {
    'new_string': 'Watch me make weird stuff happen!'
}
Then try to use that incompatible data when you say
print(self.my_dictionary['my_string_1']...
It's not clear precisely what your intent is when you overwrite my_dictionary, but that's why you're getting the error. You'll need to rethink your logic.
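If the intent was to add a key while keeping the ones set in __init__, one option (a sketch of that assumed intent, not necessarily the author's) is to mutate the existing dictionary rather than rebinding the attribute:

class Worker:
    def __init__(self, my_string_1, my_string_2):
        self.my_dictionary = {'my_string_1': my_string_1,
                              'my_string_2': my_string_2}

    def run(self):
        # add to the dict created in __init__ instead of replacing it
        self.my_dictionary['new_string'] = 'Watch me make weird stuff happen!'
        print(self.my_dictionary['my_string_1'])  # still present

Worker("Hello", "World").run()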
Question
I am observing behavior in Python 3.3.4 that I would like help understanding: Why are my exceptions properly raised when a function is executed normally, but not when the function is executed in a pool of workers?
Code
import multiprocessing

class AllModuleExceptions(Exception):
    """Base class for library exceptions"""
    pass

class ModuleException_1(AllModuleExceptions):
    def __init__(self, message1):
        super(ModuleException_1, self).__init__()
        self.e_string = "Message: {}".format(message1)
        return

class ModuleException_2(AllModuleExceptions):
    def __init__(self, message2):
        super(ModuleException_2, self).__init__()
        self.e_string = "Message: {}".format(message2)
        return

def func_that_raises_exception(arg1, arg2):
    result = arg1 + arg2
    raise ModuleException_1("Something bad happened")

def func(arg1, arg2):
    try:
        result = func_that_raises_exception(arg1, arg2)
    except ModuleException_1:
        raise ModuleException_2("We need to halt main") from None
    return result

pool = multiprocessing.Pool(2)
results = pool.starmap(func, [(1,2), (3,4)])
pool.close()
pool.join()
print(results)
This code produces this error:
Exception in thread Thread-3:
Traceback (most recent call last):
File "/user/peteoss/encap/Python-3.4.2/lib/python3.4/threading.py", line 921, in _bootstrap_inner
self.run()
File "/user/peteoss/encap/Python-3.4.2/lib/python3.4/threading.py", line 869, in run
self._target(*self._args, **self._kwargs)
File "/user/peteoss/encap/Python-3.4.2/lib/python3.4/multiprocessing/pool.py", line 420, in _handle_results
task = get()
File "/user/peteoss/encap/Python-3.4.2/lib/python3.4/multiprocessing/connection.py", line 251, in recv
return ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() missing 1 required positional argument: 'message2'
Conversely, if I simply call the function, it seems to handle the exception properly:
print(func(1, 2))
Produces:
Traceback (most recent call last):
File "exceptions.py", line 40, in
print(func(1, 2))
File "exceptions.py", line 30, in func
raise ModuleException_2("We need to halt main") from None
__main__.ModuleException_2
Why does ModuleException_2 behave differently when it is run in a process pool?
The issue is that your exception classes have non-optional arguments in their __init__ methods, but when you call the superclass __init__ method you don't pass those arguments along. This causes a new exception when your exception instances are unpickled by the multiprocessing code.
This has been a long-standing issue with Python exceptions, and you can read quite a bit of the history of the issue in this bug report (in which a part of the underlying issue with pickling exceptions was fixed, but not the part you're hitting).
To summarize the issue: Python's base Exception class puts all the arguments its __init__ method receives into an attribute named args. Those arguments are put into the pickle data, and when the stream is unpickled, they're passed to the __init__ method of the newly created object. If the number of arguments received by Exception.__init__ is not the same as the child class expects, you'll get an error at unpickling time.
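A minimal standalone reproduction of that mechanism (no pool involved; BadError is an illustrative name, not from the question) is to pickle such an exception directly:

import pickle

class BadError(Exception):
    def __init__(self, message):
        super(BadError, self).__init__()  # args stays empty
        self.message = message

try:
    pickle.loads(pickle.dumps(BadError("boom")))
except TypeError as exc:
    # e.g. __init__() missing 1 required positional argument: 'message'
    print(exc)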
A workaround for the issue is to pass all the arguments your custom exception classes require in their __init__ methods along to the superclass __init__:
class ModuleException_2(AllModuleExceptions):
    def __init__(self, message2):
        super(ModuleException_2, self).__init__(message2)  # the change is here!
        self.e_string = "Message: {}".format(message2)
Another possible fix would be to not call the superclass __init__ method at all (this is what the fix in the bug linked above allows), but since that's usually poor behavior for a subclass, I can't really recommend it.
Your ModuleException_2.__init__ fails while being unpickled.
I was able to fix the problem by changing the signature to
class ModuleException_2(AllModuleExceptions):
    def __init__(self, message2=None):
        super(ModuleException_2, self).__init__()
        self.e_string = "Message: {}".format(message2)
        return
but better have a look at Pickling Class Instances to ensure a clean implementation.
The code below works:
import multiprocessing
import threading
import time

file_path = 'C:/TEST/0000.txt'

class H(object):
    def __init__(self, path):
        self.hash_file = open(path, 'rb')

    def read_line(self):
        print self.hash_file.readline()

h = H(file_path)
h.read_line()
But when I use it in a process:
import multiprocessing
import threading
import time

file_path = 'C:/TEST/0000.txt'

class Worker(multiprocessing.Process):
    def __init__(self, path):
        super(Worker, self).__init__()
        self.hash_file = open(path, 'rb')

    def run(self):
        while True:
            for i in range(1000):
                print self.hash_file.readline()
            time.sleep(1.5)

if __name__ == '__main__':
    w = Worker(file_path)
    w.start()
    w.join()
it raises an exception:
Process Worker-1:
Traceback (most recent call last):
File "E:\Python27\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "C:\ts_file_open.py", line 31, in run
print self.hash_file.readline()
ValueError: I/O operation on closed file
Because open costs a lot and I only need to read the file, I thought opening it once would be enough. But why is this file object closed when the process runs? I also want to pass this file object to the child process, and to a child thread of that child process.
This fails because you're opening the file in the parent process, but trying to use it in the child. File descriptors from the parent process are not inherited by the child on Windows (because it's not using os.fork to create the new process), so the read operation fails in the child. Note that this code will actually work on Linux, because the file descriptor gets inherited by the child, due to the nature of os.fork.
Also, I don't think the open operation itself is particularly expensive. Actually reading the file is potentially expensive, but the open operation itself should be fast.
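One way around this (a sketch, not code from the question) is to store only the path in __init__ and open the file inside run(), so the handle is created in the child process where it is used:

import multiprocessing

class Worker(multiprocessing.Process):
    def __init__(self, path):
        super(Worker, self).__init__()
        self.path = path  # a plain string pickles cleanly on Windows

    def run(self):
        # opened in the child process, so the handle is valid here
        with open(self.path, 'rb') as hash_file:
            for line in hash_file:
                print(line)

if __name__ == '__main__':
    w = Worker('C:/TEST/0000.txt')
    w.start()
    w.join()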
I'm not sure why, but yesterday I was testing some multiprocessing code that I wrote and it was working fine. Then today when I checked the code again, it would give me this error:
Exception in thread Thread-5:
Traceback (most recent call last):
File "C:\Python32\lib hreading.py", line 740, in _bootstrap_inner
self.run()
File "C:\Python32\lib hreading.py", line 693, in run
self._target(*self._args, **self._kwargs)
File "C:\Python32\lib\multiprocessing\pool.py", line 342, in _handle_tasks
put(task)
File "C:\Python32\lib\multiprocessing\pool.py", line 439, in __reduce__
'pool objects cannot be passed between processes or pickled'
NotImplementedError: pool objects cannot be passed between processes or pickled
The structure of my code goes as follows:
* I have 2 modules, say A.py, and B.py.
* A.py has class defined in it called A.
* B.py similarly has class B.
* In class A I have a multiprocessing pool as one of the attributes.
* The pool is defined in A.__init__(), but used in another method - run()
* In A.run() I set some attributes of some objects of class B (which are collected in a list called objBList), and then I use pool.map(processB, objBList)
* processB() is a module-level function (in A.py) that receives an instance of B as its only parameter and calls B.runInputs()
* the error happens at the pool.map() line.
basically in A.py:
class A:
    def __init__(self):
        self.pool = multiprocessing.Pool(7)

    def run(self):
        for b in objBList:
            b.inputs = something
        result = self.pool.map(processB, objBList)
        return list(result)

def processB(objB):
    objB.runInputs()
and in B.py:
class B:
    def runInputs(self):
        do_something()
BTW, I'm forced to use the processB() module function because of the way multiprocessing works on Windows.
Also I would like to point out that the error I am getting - that pool can't be pickled - shouldn't be referring to any part of my code, as I'm not trying to send the child processes any Pool objects.
Any ideas?
(PS: I should also mention that in between the two days that I was testing this function the computer restarted unexpectedly - possibly after installing windows updates.)
Perhaps your class B objects contain a reference back to your A instance. When pool.map pickles each B to send it to a worker process, it then has to pickle the referenced A as well, including its Pool attribute, which is exactly what the NotImplementedError complains about.
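If that is the case, here is a sketch of the suspected cycle and one way to break it (the names and the parent attribute are illustrative, not from the question):

import multiprocessing

class B:
    def __init__(self, parent=None):
        self.parent = parent  # back-reference to A would drag A (and its
                              # un-picklable Pool) into the pickle of B

    def runInputs(self):
        pass

def processB(objB):
    objB.runInputs()

class A:
    def __init__(self):
        self.pool = multiprocessing.Pool(7)

    def run(self, objBList):
        for b in objBList:
            b.parent = None  # drop the back-reference before pool.map pickles b
        return list(self.pool.map(processB, objBList))

if __name__ == '__main__':
    a = A()
    a.run([B(parent=a), B(parent=a)])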