The code below works:

import multiprocessing
import threading
import time

file_path = 'C:/TEST/0000.txt'

class H(object):
    def __init__(self, path):
        self.hash_file = open(path, 'rb')

    def read_line(self):
        print self.hash_file.readline()

h = H(file_path)
h.read_line()
But when I use it in a process:

import multiprocessing
import threading
import time

file_path = 'C:/TEST/0000.txt'

class Worker(multiprocessing.Process):
    def __init__(self, path):
        super(Worker, self).__init__()
        self.hash_file = open(path, 'rb')

    def run(self):
        while True:
            for i in range(1000):
                print self.hash_file.readline()
            time.sleep(1.5)

if __name__ == '__main__':
    w = Worker(file_path)
    w.start()
    w.join()
it raises an exception:
Process Worker-1:
Traceback (most recent call last):
File "E:\Python27\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "C:\ts_file_open.py", line 31, in run
print self.hash_file.readline()
ValueError: I/O operation on closed file
Because open costs a lot and I only need to read the file, I think opening it once should be enough. But why is this file object closed when the process runs? I also want to pass this file object to the child process and to child threads of the child process.
This fails because you're opening the file in the parent process, but trying to use it in the child. File descriptors from the parent process are not inherited by the child on Windows (because it's not using os.fork to create the new process), so the read operation fails in the child. Note that this code will actually work on Linux, because the file descriptor gets inherited by the child, due to the nature of os.fork.
Also, I don't think the open operation itself is particularly expensive. Actually reading the file is potentially expensive, but the open operation itself should be fast.
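Given that, the usual workaround on Windows is to pass the path (which pickles fine) and open the file inside run(), which executes in the child process. A minimal sketch in Python 3 syntax, using a throwaway temp file in place of the original path:

```python
import multiprocessing
import os
import tempfile

class Worker(multiprocessing.Process):
    def __init__(self, path):
        super().__init__()
        self.path = path          # a path pickles fine; an open file object does not

    def run(self):
        # Open in the child process, where the file is actually read.
        with open(self.path, 'rb') as hash_file:
            for line in hash_file:
                print(line.rstrip())

if __name__ == '__main__':
    # Demo with a throwaway file instead of the original C:/TEST/0000.txt
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as f:
        f.write(b'aaaa\nbbbb\n')
    w = Worker(path)
    w.start()
    w.join()
    os.remove(path)
```

This works under both fork and spawn start methods, because nothing unpicklable is stored on the Worker instance before start().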
I am trying to create a shared memory for my Python application, which should be used in the parent process and in another process that is spawned from that parent process. In most cases that works fine, however, sometimes I get the following stacktrace:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/usr/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
File "/usr/lib/python3.8/multiprocessing/synchronize.py", line 110, in __setstate__
self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory: '/psm_47f7f5d7'
I want to emphasize that our code/application works fine in 99% of the time. We are spawning these new processes with new shared memory for each such process on a regular basis in our application (which is a server process, so it's running 24/7). Nearly all the time this works fine, only from time to time this error above is thrown, which then kills the whole application.
Update: I noticed that this problem occurs mainly when the application was running for a while already. When I start it up the creation of shared memory and spawning new processes works fine without this error.
The shared memory is created like this:
# Spawn context for multiprocessing
_mp_spawn_ctxt = multiprocessing.get_context("spawn")
_mp_spawn_ctxt_pipe = _mp_spawn_ctxt.Pipe
# Create shared memory
mem_size = width * height * bpp
shared_mem = shared_memory.SharedMemory(create=True, size=mem_size)
image = np.ndarray((height, width, bpp), dtype=np.uint8, buffer=shared_mem.buf)
parent_pipe, child_pipe = _mp_spawn_ctxt_pipe()
time.sleep(0.1)
# Spawn new process
# _CameraProcess is a custom class derived from _mp_spawn_ctxt.Process
proc = _CameraProcess(shared_mem, child_pipe)
proc.start()
Any ideas what could be the issue here?
I had a similar issue in a case where multiple processes had access to the shared memory/object and one process updated it.
I solved these issues with these steps:
I synchronized all operations on the shared memory/object via mutexes (see the multiprocessing examples at superfastpython, or any guide on protecting shared resources). The critical parts of the code are create, update, and delete, but also reading the content of the shared object/memory, because a different process can update it at the same time.
I avoided libraries that only support single-threaded execution.
See sample code with synchronization:
import multiprocessing
import time

def increase(sharedObj, lock):
    for i in range(100):
        time.sleep(0.01)
        lock.acquire()
        sharedObj.value = sharedObj.value + 1
        lock.release()

def decrease(sharedObj, lock):
    for i in range(100):
        time.sleep(0.001)
        lock.acquire()
        sharedObj.value = sharedObj.value - 1
        lock.release()

if __name__ == '__main__':
    sharedObj = multiprocessing.Value('i', 1000)
    lock = multiprocessing.Lock()
    p1 = multiprocessing.Process(target=increase, args=(sharedObj, lock))
    p2 = multiprocessing.Process(target=decrease, args=(sharedObj, lock))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
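Applied to the shared-memory setup from the question, the same approach means passing a multiprocessing.Lock alongside the segment name and holding it around every read and write of the buffer. A minimal sketch under that assumption (names and sizes are illustrative):

```python
import multiprocessing
from multiprocessing import shared_memory

def writer(shm_name, lock):
    # Attach to the existing segment by name in the child process.
    shm = shared_memory.SharedMemory(name=shm_name)
    with lock:                    # guard every access to the buffer
        shm.buf[0] = 42
    shm.close()

if __name__ == '__main__':
    lock = multiprocessing.Lock()
    shm = shared_memory.SharedMemory(create=True, size=16)
    p = multiprocessing.Process(target=writer, args=(shm.name, lock))
    p.start()
    p.join()
    with lock:
        value = shm.buf[0]
    shm.close()
    shm.unlink()                  # only the creator should unlink
    print(value)  # -> 42
```

Note that the lock must be created before the child is spawned and passed as an argument, so both processes share the same underlying semaphore.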
As I see it, I have a fairly simple long-running IO operation that could be refined using threading. I've built a DearPyGui GUI interface (not explicitly related to the problem, just background info). A user can load a file via the package's file loader. Some of these files can be quite large (3 GB). Therefore, I'm adding a modal pop-up window to lock the interface whilst the file is loading. The above is context; the problem is not DearPyGui itself.
I'm starting a thread inside a method of a class instance, which in turn calls (via being the thread's target) a further method (from the same object) and then updates an attribute of that object, which is to be interrogated later. For example:
class IOClass:
    def __init__(self):
        self.fileObj = None

    def loadFile(self, fileName):
        thread = threading.Thread(target=self.threadMethod, args=fileName)
        thread.start()
        # Load GUI wait-screen
        thread.join()
        # anything else..EXCEPTION THROWN HERE
        print(" ".join(["Version:", self.fileObj.getVersion()]))

    def threadMethod(self, fileName):
        print(" ".join(["Loading filename", fileName]))
        # expensive-basic Python IO operation here
        self.fileObj = ...python IO operation here

class GUIClass:
    def __init__(self):
        pass

    def startMethod(self):
        # this is called by __main__
        ioClass = IOClass()
        ioClass.loadFile("filename.txt")
Unfortunately, I get this error:
Exception in thread Thread-1 (loadFile):
Traceback (most recent call last):
File "/home/anthony/anaconda3/envs/CPRD-software/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
self.run()
File "/home/anthony/anaconda3/envs/CPRD-software/lib/python3.10/threading.py", line 946, in run
self._target(*self._args, **self._kwargs)
TypeError: AnalysisController.loadFile() takes 2 positional arguments but 25 were given
Traceback (most recent call last):
File "/home/anthony/CPRD-software/GUI/Controllers/AnalysisController.py", line 117, in loadStudySpace
print(" ".join(["Version:", self.fileObj.getVersion()]))
AttributeError: 'NoneType' object has no attribute 'getVersion'
I'm not sure what's going on. The machine should sit there for at least 3 minutes as the data is loaded. Instead, it appears to perform the join, but the main thread doesn't wait for the IO thread to load the file, and then attempts to call a method on what should have been loaded in.
I solved it. In threading.Thread(), do not reference the method via self. Instead, pass self in as an argument to the thread method, e.g.:

thread = threading.Thread(target=threadMethod, args=(self, fileName))
The target function doesn't change, i.e. it remains as so:

def threadMethod(self, fileName):
    # expensive-basic Python IO operation here
    self.fileObj = ...python IO operation here
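For what it's worth, the original traceback ("takes 2 positional arguments but 25 were given") is consistent with args=fileName: Thread expects args to be a tuple, so a bare string gets unpacked into one argument per character. If that is the cause, keeping the bound method and wrapping the argument in a one-element tuple should also work — a minimal sketch with the IO operation stubbed out:

```python
import threading

class IOClass:
    def __init__(self):
        self.fileObj = None

    def loadFile(self, fileName):
        # args must be a tuple; the trailing comma stops the string
        # from being unpacked character by character
        thread = threading.Thread(target=self.threadMethod, args=(fileName,))
        thread.start()
        thread.join()  # main thread blocks here until the IO thread finishes
        return self.fileObj

    def threadMethod(self, fileName):
        # stand-in for the expensive IO operation
        self.fileObj = "contents of " + fileName

print(IOClass().loadFile("filename.txt"))  # -> contents of filename.txt
```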
When I run the following script on a Windows computer, I do not see any of the log messages from the log_pid function, however I do when I run on Unix / Mac. I've read before that multiprocessing is different on Windows compared to Mac, but it's not clear to me what changes should I make to get this script to work on Windows. I'm running Python 3.6.
import logging
import sys
from concurrent.futures import ProcessPoolExecutor
import os

def log_pid(x):
    logger.info('Executing on process: %s' % os.getpid())

def do_stuff():
    logger.info('this is the do stuff function.')
    with ProcessPoolExecutor(max_workers=4) as executor:
        executor.map(log_pid, range(0, 10))

def main():
    logger.info('this is the main function.')
    do_stuff()

if __name__ == '__main__':
    logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
    logger = logging.getLogger(__name__)
    logger.info('Start of script ...')
    main()
    logger.info('End of script ...')
Unix processes are created via the fork strategy where the child gets cloned from the parent and continues its execution right at the moment the parent forked.
On Windows it is quite different: a blank process is created and a new Python interpreter is launched. The interpreter will then load the module where the log_pid function is located and execute it.
This means the __main__ section is not executed by the newly spawned child process. Hence, the logger object is not created and the log_pid function crashes accordingly. You don't see the error because you ignore the result of your computation. Try to modify the logic as follows.
def do_stuff():
    logger.info('this is the do stuff function.')
    with ProcessPoolExecutor(max_workers=4) as executor:
        iterator = executor.map(log_pid, range(0, 10))
        list(iterator)  # collect the results in a list
And the issue will become evident.
Traceback (most recent call last):
File "C:\Program Files (x86)\Python36-32\lib\concurrent\futures\process.py", line 175, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "C:\Program Files (x86)\Python36-32\lib\concurrent\futures\process.py", line 153, in _process_chunk
return [fn(*args) for args in chunk]
File "C:\Program Files (x86)\Python36-32\lib\concurrent\futures\process.py", line 153, in <listcomp>
return [fn(*args) for args in chunk]
File "C:\Users\cafama\Desktop\pool.py", line 8, in log_pid
logger.info('Executing on process: %s' % os.getpid())
NameError: name 'logger' is not defined
When dealing with process pools (whether concurrent.futures or multiprocessing ones), always collect the results of the computation, to avoid silent bugs causing confusion.
To fix the problem, just move the logger creation to the top level of the module and everything will work on all platforms.
import logging
import sys
from concurrent.futures import ProcessPoolExecutor
import os

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logger = logging.getLogger(__name__)

def log_pid(x):
    logger.info('Executing on process: %s' % os.getpid())

...
Ok, this one has me tearing my hair out:
I have a multi-process program, with separate workers each working on a given task.
When a KeyboardInterrupt comes, I want each worker to save its internal state to a file, so it can continue where it left off next time.
HOWEVER...
It looks like the dictionary which contains information about the state is vanishing before this can happen!
How? The exit() function is accessing a more globally scoped version of the dictionary... and it turns out that the various run() (and subordinate to run()) functions have been creating their own version of the variable.
Nothing strange about that...
Except...
All of them have been using the self. keyword.
Which, if my understanding is correct, should mean they are always accessing the instance-wide version of the variable... not creating their own!
Here's a simplified version of the code:
import multiprocessing
import atexit
import signal
import sys
import json

class Worker(multiprocessing.Process):
    def __init__(self, my_string_1, my_string_2):
        # Inherit the __init__ from Process, very important or we will get errors
        super(Worker, self).__init__()
        # Make sure we know what to do when called to exit
        atexit.register(self.exit)
        signal.signal(signal.SIGTERM, self.exit)
        self.my_dictionary = {
            'my_string_1' : my_string_1,
            'my_string_2' : my_string_2
        }

    def run(self):
        self.my_dictionary = {
            'new_string' : 'Watch me make weird stuff happen!'
        }
        try:
            while True:
                print(self.my_dictionary['my_string_1'] + " " + self.my_dictionary['my_string_2'])
        except (KeyboardInterrupt, SystemExit):
            self.exit()

    def exit(self):
        # Write the relevant data to file
        info_for_file = {
            'my_dictionary': self.my_dictionary
        }
        print(info_for_file)  # For easier debugging
        save_file = open('save.log', 'w')
        json.dump(info_for_file, save_file)
        save_file.close()
        # Exit
        sys.exit()

if __name__ == '__main__':
    strings_list = ["Hello", "World", "Ehlo", "Wrld"]
    instances = []
    try:
        for i in range(len(strings_list) - 2):
            my_string_1 = strings_list[i]
            my_string_2 = strings_list[i + 1]
            instance = Worker(my_string_1, my_string_2)
            instances.append(instance)
            instance.start()
        for instance in instances:
            instance.join()
    except (KeyboardInterrupt, SystemExit):
        for instance in instances:
            instance.exit()
            instance.close()
On run we get the following traceback...
Process Worker-2:
Process Worker-1:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "<stdin>", line 18, in run
File "<stdin>", line 18, in run
KeyError: 'my_string_1'
KeyError: 'my_string_1'
In other words, even though the key my_string_1 was explicitly added during init, the run() function is accessing a new version of self.my_dictionary which does not contain that key!
Again, this would be expected if we were dealing with a normal variable (my_dictionary instead of self.my_dictionary) but I thought that self.variables were always instance-wide...
What is going on here?
Your problem can basically be represented by the following:
class Test:
    def __init__(self):
        self.x = 1

    def run(self):
        self.x = 2
        if self.x != 1:
            print("self.x isn't 1!")

t = Test()
t.run()
Note what run is doing.
You overwrite your instance member self.my_dictionary with incompatible data when you write
self.my_dictionary = {
    'new_string' : 'Watch me make weird stuff happen!'
}
Then try to use that incompatible data when you say
print(self.my_dictionary['my_string_1']...
It's not clear precisely what your intent is when you overwrite my_dictionary, but that's why you're getting the error. You'll need to rethink your logic.
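If the intent in run() was to add a key while keeping those set in __init__, mutating the existing dict instead of rebinding the attribute avoids the KeyError. A stripped-down sketch without the multiprocessing machinery:

```python
class Worker:
    def __init__(self, my_string_1, my_string_2):
        self.my_dictionary = {
            'my_string_1': my_string_1,
            'my_string_2': my_string_2,
        }

    def run(self):
        # add to the existing dict rather than replacing it,
        # so the keys set in __init__ survive
        self.my_dictionary['new_string'] = 'No weird stuff this time'
        return self.my_dictionary['my_string_1'] + " " + self.my_dictionary['my_string_2']

w = Worker("Hello", "World")
print(w.run())  # -> Hello World
```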
This is not very important, just a silly experiment. I would like to create my own message passing.
I would like to have a dictionary of queues, where each key is the PID of the process.
Because I'd like to have the processes (created by Process()) to exchange messages inserting them in the queue of the process they want to send it to (knowing its pid).
This is a silly code:
from multiprocessing import Process, Manager, Queue
from os import getpid
from time import sleep

def begin(dic, manager, parentQ):
    parentQ.put(getpid())
    dic[getpid()] = manager.Queue()
    dic[getpid()].put("Something...")

if __name__== '__main__':
    manager = Manager()
    dic = manager.dict()
    parentQ = Queue()
    p = Process(target = begin, args=(dic, manager, parentQ))
    p.start()
    son = parentQ.get()
    print son
    sleep(2)
    print dic[son].get()
The assignment dic[getpid()] = manager.Queue() works fine, but when I perform dic[son].put()/get() I get this message:
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "mps.py", line 8, in begin
dic[getpid()].put("Something...")
File "<string>", line 2, in __getitem__
File "/usr/lib/python2.7/multiprocessing/managers.py", line 773, in _callmethod
raise convert_to_error(kind, result)
RemoteError:
---------------------------------------------------------------------------
Unserializable message: ('#RETURN', <Queue.Queue instance at 0x8a92d0c>)
---------------------------------------------------------------------------
do you know what's the right way to do it?
I believe your code is failing because Queues are not serializable, just like the traceback says. The multiprocessing.Manager() object can create a shared dict for you without a problem, just as you've done here, but values stored in the dict still need to be serializable (or picklable in Pythonese). If you're okay with the subprocesses not having access to each other's queues, then this should work for you:
from multiprocessing import Process, Manager, Queue
from os import getpid

number_of_subprocesses_i_want = 5

def begin(myQ):
    myQ.put("Something sentimental from your friend, PID {0}".format(getpid()))
    return

if __name__== '__main__':
    queue_dic = {}
    queue_manager = Manager()
    process_list = []
    for i in xrange(number_of_subprocesses_i_want):
        child_queue = queue_manager.Queue()
        p = Process(target = begin, args=(child_queue,))
        p.start()
        queue_dic[p.pid] = child_queue
        process_list.append(p)
    for p in process_list:
        print(queue_dic[p.pid].get())
        p.join()
This leaves you with a dictionary whose keys are the child processes' PIDs, and whose values are their respective queues, which can be used from the main process.
I don't think your original goal is achievable with queues because queues that you want a subprocess to use must be passed to the processes when they are created, so as you launch more processes, you have no way to give an existing process access to a new queue.
One possible way to have inter-process communication would be to have everyone share a single queue to pass messages back to your main process bundled with some kind of header, such as in a tuple:
(destination_pid, sender_pid, message)
...and have the main process read destination_pid and direct (sender_pid, message) to that subprocess' queue. Of course, this implies that you need a way to notify existing processes when a new process is available to communicate with.
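The routing loop described above can be sketched roughly as follows; the header format and helper names are illustrative, and plain queue.Queue objects stand in for the Manager queues to keep the demo self-contained:

```python
import queue

def route_messages(inbox, queue_dic):
    """Forward (sender_pid, message) tuples to each destination's queue.

    inbox is the single shared queue every worker writes
    (destination_pid, sender_pid, message) tuples into;
    queue_dic maps pid -> that worker's private queue.
    """
    while True:
        destination_pid, sender_pid, message = inbox.get()
        if destination_pid is None:   # illustrative shutdown convention
            break
        queue_dic[destination_pid].put((sender_pid, message))

# Stand-in demo with plain queues; with multiprocessing you would use
# Manager().Queue() objects instead.
inbox = queue.Queue()
queue_dic = {101: queue.Queue()}
inbox.put((101, 202, "hello"))
inbox.put((None, None, None))
route_messages(inbox, queue_dic)
print(queue_dic[101].get())  # -> (202, 'hello')
```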