I am having the hardest time sharing a string between processes. I looked at the following Q1, Q2, Q3 but still fail in an actual example. If I understood the docs correctly, locking should be done automatically, so there should be no race condition with setting/reading. Here's what I've got:
#!/usr/bin/env python3
import multiprocessing
import ctypes
import time

SLEEP = 0.1
CYCLES = 20

def child_process_fun(share):
    for i in range(CYCLES):
        time.sleep(SLEEP)
        share.value = str(time.time())

if __name__ == '__main__':
    share = multiprocessing.Value(ctypes.c_wchar_p, '')
    process = multiprocessing.Process(target=child_process_fun, args=(share,))
    process.start()
    for i in range(CYCLES):
        time.sleep(SLEEP)
        print(share.value)
which produces:
Traceback (most recent call last):
  File "test2.py", line 23, in <module>
    print(share.value)
  File "<string>", line 5, in getvalue
ValueError: character U+e479b7b0 is not in range [U+0000; U+10ffff]
Edit: `id(share.value)` is different in each process. However, if I use a double as the shared variable instead, the ids are the same and it works like a charm. Could this be a Python bug?
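This is not a Python bug: ctypes.c_wchar_p puts a raw pointer into shared memory, and that pointer is only meaningful inside the process that assigned it, which is why the parent decodes garbage. A minimal sketch of one common workaround, sharing a fixed-size character buffer that lives entirely in shared memory (the 64-byte length is an arbitrary assumption):

#!/usr/bin/env python3
import multiprocessing
import ctypes
import time

SLEEP = 0.1
CYCLES = 20
BUF_LEN = 64  # arbitrary maximum length for the shared string

def child_process_fun(share):
    for i in range(CYCLES):
        time.sleep(SLEEP)
        # .value copies the bytes into the shared buffer (no pointers involved)
        share.value = str(time.time()).encode()

if __name__ == '__main__':
    # a fixed-size char array is allocated inside the shared memory itself
    share = multiprocessing.Array(ctypes.c_char, BUF_LEN)
    process = multiprocessing.Process(target=child_process_fun, args=(share,))
    process.start()
    for i in range(CYCLES):
        time.sleep(SLEEP)
        print(share.value.decode())
    process.join()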
The relevant code is actually from a separate thread. Here it is anyway:
import multiprocessing
from playsound import playsound
p = multiprocessing.Process(target=playsound, args=("file.mp3",))
p.start()
input("press ENTER to stop playback")
p.terminate()
When I run this exact code, I get the following error:
(yt-proj) omit@omit-Air youtube proj % /opt/homebrew/bin/python3 "/Users/omit/Developer/omit/main2.py"
Traceback (most recent call last):
  File "/Users/omit/Developer/omit/main2.py", line 22, in <module>
    p2 = multiprocessing.Process(printStatus)
  File "/opt/homebrew/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 82, in __init__
    assert group is None, 'group argument must be None for now'
AssertionError: group argument must be None for now
What could be the problem?
The code that starts the multiprocessing must never be "top level". It must only run when the module is executed as the main program, because on platforms that spawn their workers (macOS, Windows) the child process re-imports your script, and any top-level Process(...).start() would run again in the child.
def main():
    p = multiprocessing.Process(target=playsound, args=("file.mp3",))
    p.start()
    input("press ENTER to stop playback")
    p.terminate()

if __name__ == "__main__":
    main()
Your problem, as seen in your error message, is the p2 variable. You are creating it like this:
p2 = multiprocessing.Process(printStatus)
That is a problem because the first positional argument is the group parameter. You need to change it to this:
p2 = multiprocessing.Process(target=printStatus)
Here are the Python docs for multiprocessing.Process.
The relevant paragraph (emphasis mine):
class multiprocessing.Process(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)
Process objects represent activity that is run in a separate process. The Process class has equivalents of all the methods of threading.Thread.
The constructor should always be called with keyword arguments. group should always be None; it exists solely for compatibility with threading.Thread. target is the callable object to be invoked by the run() method.
As shown above, your problem is coming from trying to create a Process object without using keyword arguments, and it is assigning something to the group parameter.
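To make the fix concrete, here is a minimal self-contained version; printStatus is a stand-in, since the original function isn't shown:

import multiprocessing

def printStatus():
    print("status: running")

if __name__ == '__main__':
    # the keyword argument keeps printStatus out of the `group` slot
    p2 = multiprocessing.Process(target=printStatus)
    p2.start()
    p2.join()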
The following code of mine encounters an error. It's an example of using Queue in Python. May I ask how to fix it? Thanks for your help in advance.
# coding=utf-8
from multiprocessing import Queue

if __name__ == '__main__':
    q = Queue(3)
    q.put('message1')
    q.put('message2')
    print(q.full())
    q.put('message3')
    print(q.full())
    try:
        q.put("message4", True, 1)
    except:
        print("the queue is full. current amount is %s" % q.qsize())
    try:
        q.put('message4')
    except:
        print("the queue is full. current amount is %s" % q.qsize())
    if not q.empty():
        print('get message from the queue.')
        for i in range(q.qsize()):
            print(q.get_nowait())
    if not q.full():
        q.put_nowait("message4")
The output, including the error, is:
False
True
Traceback (most recent call last):
  File "process.py", line 13, in <module>
    q.put("message4", True, 1)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/queues.py", line 84, in put
    raise Full
queue.Full
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "process.py", line 15, in <module>
    print("the queue is full. current amount is %s" % q.qsize())
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/queues.py", line 120, in qsize
    return self._maxsize - self._sem._semlock._get_value()
NotImplementedError
As Víctor Terrón has suggested in a GitHub discussion, you can use his implementation:
https://github.com/vterron/lemon/blob/d60576bec2ad5d1d5043bcb3111dff1fcb58a8d6/methods.py#L536-L573
According to its docstring:
A portable implementation of multiprocessing.Queue. Because of
multithreading / multiprocessing semantics, Queue.qsize() may raise
the NotImplementedError exception on Unix platforms like Mac OS X
where sem_getvalue() is not implemented. This subclass addresses this
problem by using a synchronized shared counter (initialized to zero)
and increasing / decreasing its value every time the put() and get()
methods are called, respectively. This not only prevents
NotImplementedError from being raised, but also allows us to implement
a reliable version of both qsize() and empty().
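The idea is easy to sketch. Below is a simplified illustration of the same counter-based approach, not Terrón's code itself: it wraps put()/get() around a shared counter, and it omits the __getstate__/__setstate__ plumbing the real implementation needs so the counter survives being pickled into child processes.

import multiprocessing
import multiprocessing.queues

class PortableQueue(multiprocessing.queues.Queue):
    """A Queue whose qsize()/empty() avoid sem_getvalue(), missing on macOS."""

    def __init__(self, maxsize=0):
        super().__init__(maxsize, ctx=multiprocessing.get_context())
        # shared integer, adjusted under its own lock on every put/get
        self._size = multiprocessing.Value('i', 0)

    def put(self, *args, **kwargs):
        super().put(*args, **kwargs)  # raises queue.Full before we count
        with self._size.get_lock():
            self._size.value += 1

    def get(self, *args, **kwargs):
        item = super().get(*args, **kwargs)  # raises queue.Empty before we count
        with self._size.get_lock():
            self._size.value -= 1
        return item

    def qsize(self):
        return self._size.value

    def empty(self):
        return self.qsize() == 0

For a single-process script like the one in the question, this is already enough for qsize() and empty() to work on macOS.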
I am attempting to write a Python script to download and unzip hundreds of files from an AWS server. As I understand it, these are I/O-bound tasks, so I would like to multi-thread the work to speed up processing times.
Since I am new to Python, I've been reading guides like this one and that one on multithreading and multiprocessing.
Both of the above links suggest code to import methods from the subprocess library, but I am running into trouble completing these imports. The second link above suggests the following code to illustrate multithreading:
from multiprocessing import Pool as ProcessPool
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed pool with the same API
from urllib.request import urlopen

def run_tasks(function, args, pool, chunk_size=None):
    results = pool.map(function, args, chunk_size)
    return results

def work(n):
    with urlopen(f"https://www.google.com/#{n}") as f:
        contents = f.read(32)
    return contents

if __name__ == '__main__':
    numbers = [x for x in range(1, 100)]
    # Run the task using a thread pool
    t_p = ThreadPool()
    result = run_tasks(work, numbers, t_p)
    print(result)
    t_p.close()
When I tried running this script, I got the following error with traceback:
PS C:\Users\USERNAME> & "C:/Users/USERNAME/AppData/Local/Continuum/anaconda3/python.exe" "h:/Post-Processing/API Query/Python Test/subprocess_test/subprocess.py"
Traceback (most recent call last):
  File "h:/Post-Processing/API Query/Python Test/subprocess_test/subprocess.py", line 38, in <module>
    t_p = ThreadPool()
  File "C:\Users\USERNAME\AppData\Local\Continuum\anaconda3\lib\multiprocessing\dummy\__init__.py", line 123, in Pool
    from ..pool import ThreadPool
  File "C:\Users\USERNAME\AppData\Local\Continuum\anaconda3\lib\multiprocessing\pool.py", line 26, in <module>
    from . import util
  File "C:\Users\USERNAME\AppData\Local\Continuum\anaconda3\lib\multiprocessing\util.py", line 17, in <module>
    from subprocess import _args_from_interpreter_flags
ImportError: cannot import name '_args_from_interpreter_flags' from 'subprocess' (h:\PSO Post-Processing\API Query\Python Test\subprocess_test\subprocess.py)
I found this SO thread, in which the answer suggests adding
from subprocess import _args_from_interpreter_flags
to the list of imports. However, when I added this line, the import error seems to shift into my current script:
Traceback (most recent call last):
  File "h:/Post-Processing/API Query/Python Test/subprocess_test/subprocess.py", line 20, in <module>
    from subprocess import _args_from_interpreter_flags
  File "h:\Post-Processing\API Query\Python Test\subprocess_test\subprocess.py", line 20, in <module>
    from subprocess import _args_from_interpreter_flags
ImportError: cannot import name '_args_from_interpreter_flags' from 'subprocess' (h:\PSO Post-Processing\API Query\Python Test\subprocess_test\subprocess.py)
I am now suspecting that something is wrong with my Python installation, but I am not sure how to troubleshoot it.
I am running Windows 10 on a work computer and using Visual Studio Code as my editor. According to Visual Studio Code, I'm running Python 3.7.6 64-bit ('Continuum': virtualenv). I found that I have subprocess.py installed at
"C:\Users\USER\AppData\Local\Continuum\anaconda3\Lib\subprocess.py"
and this subprocess.py file indeed has a segment with
def _args_from_interpreter_flags():
    """Return a list of command-line arguments reproducing the current
    settings in sys.flags, sys.warnoptions and sys._xoptions."""
    flag_opt_map = {
        'debug': 'd',
        # 'inspect': 'i',
        # 'interactive': 'i',
        'dont_write_bytecode': 'B',
        'no_site': 'S',
        'verbose': 'v',
        'bytes_warning': 'b',
        'quiet': 'q',
        # -O is handled in _optim_args_from_interpreter_flags()
    }
    args = _optim_args_from_interpreter_flags()
    for flag, opt in flag_opt_map.items():
        v = getattr(sys.flags, flag)
        if v > 0:
            args.append('-' + opt * v)

    if sys.flags.isolated:
        args.append('-I')
    else:
        if sys.flags.ignore_environment:
            args.append('-E')
        if sys.flags.no_user_site:
            args.append('-s')

    # -W options
    warnopts = sys.warnoptions[:]
    bytes_warning = sys.flags.bytes_warning
    xoptions = getattr(sys, '_xoptions', {})
    dev_mode = ('dev' in xoptions)

    if bytes_warning > 1:
        warnopts.remove("error::BytesWarning")
    elif bytes_warning:
        warnopts.remove("default::BytesWarning")
    if dev_mode:
        warnopts.remove('default')
    for opt in warnopts:
        args.append('-W' + opt)

    # -X options
    if dev_mode:
        args.extend(('-X', 'dev'))
    for opt in ('faulthandler', 'tracemalloc', 'importtime',
                'showalloccount', 'showrefcount', 'utf8'):
        if opt in xoptions:
            value = xoptions[opt]
            if value is True:
                arg = opt
            else:
                arg = '%s=%s' % (opt, value)
            args.extend(('-X', arg))

    return args
Given all this information, I am sure that I'm missing a simple detail that's stopping the threading code from working. I appreciate any help you can give.
Thank you!!
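One simple detail worth checking, since the ImportError reports subprocess being loaded from the script's own path under subprocess_test rather than from the standard library: a script named subprocess.py shadows the stdlib module of the same name. A quick way to confirm which file Python is actually importing (the rename suggested in the comment is just an example):

import subprocess

# If this prints the path of your own script instead of the stdlib's
# ...\anaconda3\Lib\subprocess.py, rename your file (e.g. to download_files.py)
# and remove any leftover subprocess.pyc or __pycache__ entry next to it.
print(subprocess.__file__)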
I am trying to learn more about the thread module. I've come up with a quick script but am getting an error when I run it. The docs show the format as:
thread.start_new_thread ( function, args[, kwargs] )
My method only has one argument.
#!/usr/bin/python
import ftplib
import thread

sites = ["ftp.openbsd.org", "ftp.ucsb.edu", "ubuntu.osuosl.org"]

def ftpconnect(target):
    ftp = ftplib.FTP(target)
    ftp.login()
    print "File list from: %s" % target
    files = ftp.dir()
    print files

for i in sites:
    thread.start_new_thread(ftpconnect(i))
The error I am seeing occurs after one iteration of the for loop:
Traceback (most recent call last):
  File "./ftpthread.py", line 16, in <module>
    thread.start_new_thread(ftpconnect(i))
TypeError: start_new_thread expected at least 2 arguments, got 1
Any suggestions for this learning process would be appreciated. I also looked into using threading, but I am unable to import it, since it's apparently not installed, and I haven't found any documentation for installing that module yet.
Thank You!
The error I get when trying to import threading on my Mac is:
>>> import threading
# threading.pyc matches threading.py
import threading # precompiled from threading.pyc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "threading.py", line 7, in <module>
    class WorkerThread(threading.Thread) :
AttributeError: 'module' object has no attribute 'Thread'
The thread.start_new_thread function is really low-level and doesn't give you a lot of control. Take a look at the threading module, more specifically the Thread class: http://docs.python.org/2/library/threading.html#thread-objects
You then want to replace the last 2 lines of your script with:
# This line should be at the top of your file, obviously :p
from threading import Thread

threads = []

for i in sites:
    t = Thread(target=ftpconnect, args=[i])
    threads.append(t)
    t.start()

# Wait for all the threads to complete before exiting the program.
for t in threads:
    t.join()
Your code was failing, by the way, because in your for loop, you were calling ftpconnect(i), waiting for it to complete, and then trying to use its return value (that is, None) to start a new thread, which obviously doesn't work.
In general, starting a thread is done by giving it a callable object (you want the callable itself, not the result of a call: my_function, not my_function()) plus optional arguments for that callable (in our case [i], because ftpconnect takes one positional argument and you want it to be i), and then calling the Thread object's start method.
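Side by side, the difference looks like this (reusing the names from the question):

t = Thread(target=ftpconnect(i))         # wrong: calls ftpconnect now, passes its None result
t = Thread(target=ftpconnect, args=[i])  # right: passes the function itself plus its argument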
Now that you can import threading, start with best practices at once ;-)
import threading

threads = [threading.Thread(target=ftpconnect, args=(s,))
           for s in sites]
for t in threads:
    t.start()
for t in threads:  # shut down cleanly
    t.join()
What you want is to pass the function object and arguments to the function to thread.start_new_thread, not execute the function.
Like this:
for i in sites:
    thread.start_new_thread(ftpconnect, (i,))
I'm trying to make a file-like object which is meant to be assigned to sys.stdout/sys.stderr during testing to provide deterministic output. It's not meant to be fast, just reliable. What I have so far almost works, but I need some help getting rid of the last few edge-case errors.
Here is my current implementation.
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO
from os import getpid

class MultiProcessFile(object):
    """
    helper for testing multiprocessing

    multiprocessing poses a problem for doctests, since the strategy
    of replacing sys.stdout/stderr with file-like objects then
    inspecting the results won't work: the child processes will
    write to the objects, but the data will not be reflected
    in the parent doctest-ing process.

    The solution is to create file-like objects which will interact with
    multiprocessing in a more desirable way.

    All processes can write to this object, but only the creator can read.
    This allows the testing system to see a unified picture of I/O.
    """
    def __init__(self):
        # per advice at:
        # http://docs.python.org/library/multiprocessing.html#all-platforms
        from multiprocessing import Queue
        self.__master = getpid()
        self.__queue = Queue()
        self.__buffer = StringIO()
        self.softspace = 0

    def buffer(self):
        if getpid() != self.__master:
            return

        from Queue import Empty
        from collections import defaultdict
        cache = defaultdict(str)
        while True:
            try:
                pid, data = self.__queue.get_nowait()
            except Empty:
                break
            cache[pid] += data
        for pid in sorted(cache):
            self.__buffer.write('%s wrote: %r\n' % (pid, cache[pid]))

    def write(self, data):
        self.__queue.put((getpid(), data))

    def __iter__(self):
        "getattr doesn't work for iter()"
        self.buffer()
        return self.__buffer

    def getvalue(self):
        self.buffer()
        return self.__buffer.getvalue()

    def flush(self):
        "meaningless"
        pass
... and a quick test script:
#!/usr/bin/python2.6
from multiprocessing import Process
from mpfile import MultiProcessFile

def printer(msg):
    print msg

processes = []
for i in range(20):
    processes.append(Process(target=printer, args=(i,), name='printer'))

print 'START'
import sys
buffer = MultiProcessFile()
sys.stdout = buffer

for p in processes:
    p.start()
for p in processes:
    p.join()

for i in range(20):
    print i,
print

sys.stdout = sys.__stdout__
sys.stderr = sys.__stderr__
print
print 'DONE'
print
buffer.buffer()
print buffer.getvalue()
This works perfectly 95% of the time, but it has three edge-case problems. I have to run the test script in a fast while-loop to reproduce these.
1. 3% of the time, the parent process's output isn't completely reflected. I assume this is because the data is being consumed before the Queue-flushing thread can catch up. I haven't thought of a way to wait for the thread without deadlocking.
2. 0.5% of the time, there's a traceback from the multiprocessing.Queue implementation.
3. 0.01% of the time, the PIDs wrap around, and so sorting by PID gives the wrong ordering.
In the very worst case (odds: one in 70 million), the output would look like this:
START
DONE
302 wrote: '19\n'
32731 wrote: '0 1 2 3 4 5 6 7 8 '
32732 wrote: '0\n'
32734 wrote: '1\n'
32735 wrote: '2\n'
32736 wrote: '3\n'
32737 wrote: '4\n'
32738 wrote: '5\n'
32743 wrote: '6\n'
32744 wrote: '7\n'
32745 wrote: '8\n'
32749 wrote: '9\n'
32751 wrote: '10\n'
32752 wrote: '11\n'
32753 wrote: '12\n'
32754 wrote: '13\n'
32756 wrote: '14\n'
32757 wrote: '15\n'
32759 wrote: '16\n'
32760 wrote: '17\n'
32761 wrote: '18\n'
Exception in thread QueueFeederThread (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
  File "/usr/lib/python2.6/threading.py", line 484, in run
  File "/usr/lib/python2.6/multiprocessing/queues.py", line 233, in _feed
<type 'exceptions.TypeError'>: 'NoneType' object is not callable
In Python 2.7 the exception is slightly different:
Exception in thread QueueFeederThread (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
  File "/usr/lib/python2.7/threading.py", line 505, in run
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
<type 'exceptions.IOError'>: [Errno 32] Broken pipe
How do I get rid of these edge cases?
The solution came in two parts. I've successfully run the test program 200 thousand times without any change in output.
The easy part was to use multiprocessing.current_process()._identity to sort the messages. This is not a part of the published API, but it is a unique, deterministic identifier of each process. This fixed the problem with PIDs wrapping around and giving a bad ordering of output.
The other part of the solution was to use multiprocessing.Manager().Queue() rather than multiprocessing.Queue. This fixes problem #2 above, because the manager lives in a separate process and so avoids the bad special cases that come up when using a Queue from the owning process. #1 is fixed because the Queue is fully exhausted and the feeder thread dies naturally before Python starts shutting down and closes stdin.
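For illustration, the two changes might slot into the original class roughly like this (a sketch, not the verbatim final code):

from multiprocessing import Manager, current_process
from os import getpid

class MultiProcessFile(object):
    def __init__(self):
        self.__master = getpid()
        # a manager-hosted queue lives in its own server process, so there is
        # no per-process feeder thread left racing against interpreter shutdown
        self.__manager = Manager()
        self.__queue = self.__manager.Queue()
        # ... rest of __init__ as in the original ...

    def write(self, data):
        # _identity is a unique, deterministic tuple per child process, unlike
        # PIDs, which the OS can recycle; sorting on it keeps the output stable
        self.__queue.put((current_process()._identity, data))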
I have encountered far fewer multiprocessing bugs with Python 2.7 than with Python 2.6. That said, the solution I used to avoid the "Exception in thread QueueFeederThread" problem was to sleep momentarily, possibly for 0.01 s, in each process in which the Queue is used. Using sleep this way is neither desirable nor reliable, but that duration was observed to work sufficiently well in practice for me. You can also try 0.1 s.
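Applied to the test script above, the workaround might look like this (a sketch; 0.01 s is just the empirically chosen value mentioned above):

import time

def printer(msg):
    print msg
    time.sleep(0.01)  # give the Queue's feeder thread a moment to flush this write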