Python check isinstance multiprocessing.Manager().Queue()

I am using Python 2.7 multiprocessing on Windows 7:
import multiprocessing as mp
from Queue import Queue
from multiprocessing.managers import AutoProxy

if __name__ == '__main__':
    manager = mp.Manager()
    myqueue = manager.Queue()
    print myqueue
    print type(myqueue)
    print isinstance(myqueue, Queue)
    print isinstance(myqueue, AutoProxy)
Output:
<Queue.Queue instance at 0x0000000002956B08>
<class 'multiprocessing.managers.AutoProxy[Queue]'>
False
Traceback (most recent call last):
  File "C:/Users/User/TryHere.py", line 12, in <module>
    print isinstance(myqueue, AutoProxy)
TypeError: isinstance() arg 2 must be a class, type, or tuple of classes and types
My question is: I would like to check whether a variable is an instance of a multiprocessing queue; how should I go about checking?
I have referred to:
Check for instance of Python multiprocessing.Connection?
Accessing an attribute of a multiprocessing Proxy of a class
but they don't seem to address my issue. Thanks in advance!

Question: I would like to check if a variable is an instance of a multiprocessing queue, how should I go about checking?
It's a proxy object, so multiprocessing.managers.BaseProxy does match:
from multiprocessing.managers import BaseProxy
print(isinstance(myqueue, BaseProxy))
>>>True
Tested with Python: 3.4.2 and 2.7.9
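As a short follow-up (my own addition, not part of the original answer): a plain multiprocessing.Queue() is not a proxy, so the BaseProxy check only covers manager-created queues. A sketch covering both cases:

import multiprocessing as mp
import multiprocessing.queues          # make the queues submodule importable by name
from multiprocessing.managers import BaseProxy

if __name__ == '__main__':
    managed_q = mp.Manager().Queue()   # proxy to a queue living in the manager process
    plain_q = mp.Queue()               # ordinary multiprocessing queue

    print(isinstance(managed_q, BaseProxy))        # True  (it's a proxy)
    print(isinstance(plain_q, BaseProxy))          # False (not a proxy)
    print(isinstance(plain_q, mp.queues.Queue))    # True  (the real queue class)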

For Python 3.6, the equivalent would be
import multiprocessing
test_queue = multiprocessing.Queue()
type(test_queue) == multiprocessing.queues.Queue
>>> True
This way we avoid performing the type check against a newly created Queue object, as proposed by #mikeye below.

Here's what I do:
import multiprocessing as mp
my_queue = mp.Queue()
print(type(my_queue) == type(mp.Queue()))
>>>True

For Python 3.10, we can use the Queue under the queues namespace, while still using isinstance().
import multiprocessing as mp
my_q = mp.Queue()
isinstance(my_q, mp.queues.Queue)
>>> True
import asyncio
isinstance(my_q, asyncio.Queue)
>>> False
import queue
isinstance(my_q, queue.Queue)
>>> False
I know this doesn't exactly answer the original question, but since passing mp.Queue (a factory method, not a class) to isinstance() produces much the same error, I thought I'd add this.
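To make that last point concrete, here is a small illustration (my own addition): mp.Queue is a factory method on the default context, not a class, so handing it to isinstance() reproduces essentially the same TypeError as in the original question.

import multiprocessing as mp

my_q = mp.Queue()
try:
    isinstance(my_q, mp.Queue)                 # mp.Queue is a bound method, not a type
except TypeError as exc:
    print(exc)                                 # isinstance() arg 2 must be a type ...
print(isinstance(my_q, mp.queues.Queue))       # True: use the class from mp.queues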

Related

Can I use a `multiprocessing.Queue` for communication within a process?

I'm using queues for inter-thread communication. I'm using multiprocessing.Queue() instead of queue.Queue() because the multiprocessing version exposes an underlying file descriptor which can be waited on with select.select - which means I can block waiting for an object in the queue or a packet to arrive on a network interface from the same thread.
But when I try to get an object from the queue, I get this:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
TypeError: can't pickle _thread.lock objects
Is there a way to do this? Or am I stuck using queue.Queue() and having a separate thread select.select() on the sockets and put the results into the queue?
Edit: I think this is the minimal reproducible example:
import multiprocessing
import threading

queue = multiprocessing.Queue()

class Msg():
    def __init__(self):
        self.lock = threading.Lock()

def source():
    queue.put(Msg())

def sink():
    obj = queue.get()
    print("Got")

threading.Thread(target=sink).start()
source()
The problem is that the object I'm putting into the queue has a threading.Lock object as a field (at several levels of composition deep).
TL;DR: threading.Lock instances simply cannot be pickled, and pickle is used to serialize any object that is put on a multiprocessing.Queue instance. But there is very little value in passing an object to another thread via a multiprocessing.Queue, since the thread retrieves what is effectively a new instance of that object (unless creating a copy of the object is part of your goal). So if you do pass the object via a queue, the lock cannot be part of the object's state and you need an alternate approach (see below).
The (much) Longer Answer
First, as your error message states, threading.Lock instances cannot be serialized with pickle. This is easily demonstrated:
>>> import pickle
>>> import threading
>>> lock = threading.Lock()
>>> serialized_lock = pickle.dumps(lock)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: cannot pickle '_thread.lock' object
Second, when you put an object on a multiprocessing.Queue instance, the object is serialized with pickle, and so you get the above exception.
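One common workaround (my own sketch, not part of the original answer) is to give the class __getstate__/__setstate__ methods that drop the lock when the object is pickled and create a fresh lock when it is unpickled; the receiving side then gets its own, unlocked lock:

import threading

class Msg:
    def __init__(self):
        self.lock = threading.Lock()

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['lock']              # locks cannot be pickled, so leave it out
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.lock = threading.Lock()   # the unpickled copy gets a brand-new lock

With this in place, queue.put(Msg()) no longer raises, though (as argued below) the new lock is unrelated to the one held by the original object.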
But while your posting constitutes a minimal, complete example, it does not represent a realistic program that does anything useful. What are you actually trying to accomplish? Let's suppose you were able to serialize a lock and therefore pass an instance of Msg via a queue. Presumably the lock is there to serialize some code that updates the object's state. But since the retrieved object is a different instance of Msg than the one that was put on the queue, the only meaningful use of this lock would be if the sink thread created additional threads that operate on this instance. So let's conjecture that there is an attribute, x, that needs to be incremented in multiple threads. This requires a lock since the += operator is not atomic. Since the required lock cannot be part of the object's state when the object is passed via a queue, you have to create the lock separately. This is just one of many possible approaches:
import multiprocessing
import threading

queue = multiprocessing.Queue()

class Msg():
    def __init__(self):
        self.x = 0

    def set_lock(self, lock):
        self.lock = lock

    def compute(self):
        with self.lock:
            self.x += 1

def source():
    queue.put(Msg())

def sink():
    msg = queue.get()
    msg.set_lock(threading.Lock())
    t = threading.Thread(target=msg.compute)
    t.start()
    msg.compute()
    t.join()
    print(msg.x)

threading.Thread(target=sink).start()
source()
Prints:
2
If you are not using a queue for object passing, then there is no problem having the lock as part of the object's initial state:
import queue
import socket
import os
import select
import threading

class PollableQueue(queue.Queue):
    def __init__(self):
        super().__init__()
        # Create a pair of connected sockets
        if os.name == 'posix':
            self._putsocket, self._getsocket = socket.socketpair()
        else:
            # Compatibility on non-POSIX systems
            server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            server.bind(('127.0.0.1', 0))
            server.listen(1)
            self._putsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self._putsocket.connect(server.getsockname())
            self._getsocket, _ = server.accept()
            server.close()

    def fileno(self):
        return self._getsocket.fileno()

    def put(self, item):
        super().put(item)
        self._putsocket.send(b'x')

    def get(self):
        self._getsocket.recv(1)
        return super().get()

class Msg:
    def __init__(self, q, socket):
        # An instance of this class could be passed via a multithreading.Queue
        # A multiprocessing.Lock could also be used but is not
        # necessary if we are doing threading:
        self.lock = threading.Lock()  # to be used by some method not shown
        self.q = q
        self.socket = socket

    def consume(self):
        while True:
            can_read, _, _ = select.select([q, read_socket], [], [])
            for r in can_read:
                item = r.get() if isinstance(r, queue.Queue) else r.recv(3).decode()
                print('Got:', item, 'from', type(r))

# Example code that performs polling:
if __name__ == '__main__':
    import threading
    import time

    q = PollableQueue()
    write_socket, read_socket = socket.socketpair()
    msg = Msg(q, read_socket)
    t = threading.Thread(target=msg.consume, daemon=True)
    t.start()

    # Feed data to the queues
    q.put('abc')
    write_socket.send(b'cde')
    write_socket.send(b'fgh')
    q.put('ijk')
    # Give consumer time to get all the items:
    time.sleep(1)
Prints:
Got: abc from <class '__main__.PollableQueue'>
Got: ijk from <class '__main__.PollableQueue'>
Got: cde from <class 'socket.socket'>
Got: fgh from <class 'socket.socket'>

Python namedtuple as argument to apply_async(..) callback

I'm writing a short program where I want to call a function asynchronously so that it doesn't block the caller. To do this, I'm using Pool from python's multiprocessing module.
In the function being called asynchronously I want to return a namedtuple to fit with the logic of the rest of my program, but I'm finding that a namedtuple does not seem to be a supported type to pass from the spawned process to the callback (probably because it cannot be pickled). Here is a minimum repro of the problem.
from multiprocessing import Pool
from collections import namedtuple

logEntry = namedtuple("LogEntry", ['logLev', 'msg'])

def doSomething(x):
    # Do actual work here
    logCode = 1
    statusStr = "Message Here"
    return logEntry(logLev=logCode, msg=statusStr)

def callbackFunc(result):
    print(result.logLev)
    print(result.msg)

def userAsyncCall():
    pool = Pool()
    pool.apply_async(doSomething, [1,2], callback=callbackFunc)

if __name__ == "__main__":
    userAsyncCall()  # Nothing is printed
    # If this is uncommented, the logLev and status are printed as expected:
    # y = logEntry(logLev=2, msg="Hello World")
    # callbackFunc(y)
Does anyone know if there is a way to pass a namedtuple return value from the async process to the callback? Is there a better/more pythonic approach for what I'm doing?
The problem is that the case differs between the variable holding the return value of namedtuple() and its typename parameter. That is, there's a mismatch between the named tuple's class name ("LogEntry") and the variable name you've bound it to (logEntry); pickling requires the two to match:
LogEntry = namedtuple("LogEntry", ['logLev', 'msg'])
And update the return statement in doSomething() correspondingly.
Full code:
from multiprocessing import Pool
from collections import namedtuple

LogEntry = namedtuple("LogEntry", ['logLev', 'msg'])

def doSomething(x):
    # Do actual work here
    logCode = 1
    statusStr = "Message Here"
    return LogEntry(logLev=logCode, msg=statusStr)

def callbackFunc(result):
    print(result.logLev)
    print(result.msg)

def userAsyncCall():
    pool = Pool()
    return pool.apply_async(doSomething, [1], callback=callbackFunc)

if __name__ == "__main__":
    c = userAsyncCall()
    # To see whether there was an exception, you can attempt to get() the AsyncResult object.
    # print c.get()
(To see the generated class definition, add verbose=True to namedtuple(); note that this parameter was removed in Python 3.7.)
The reason nothing is printed is that apply_async failed silently. By the way, I think this is bad behavior that just confuses people. You can pass error_callback to handle the error.
def errorCallback(exception):
    print(exception)

def userAsyncCall():
    pool = Pool()
    pool.apply_async(doSomething, [1], callback=callbackFunc, error_callback=errorCallback)
    # You passed wrong arguments. doSomething() takes 1 positional argument.
    # I replace [1,2] with [1].

if __name__ == "__main__":
    userAsyncCall()
    import time
    time.sleep(3)  # You need this, otherwise you will never see the output.
With the error callback in place, the output is:
Error sending result: 'LogEntry(logLev=1, msg='Message Here')'. Reason: 'PicklingError("Can't pickle <class '__mp_main__.LogEntry'>: attribute lookup LogEntry on __mp_main__ failed",)'
PicklingError! You're right, namedtuple cannot be passed from the spawned process to the callback.
It may not be the most acceptable way, but you can send a dict as the result instead of a namedtuple.
As Dag Høidahl corrected, namedtuple can be passed. The following line works.
LogEntry = namedtuple("LogEntry", ['logLev', 'msg'])
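To see exactly why the mismatched name breaks pickling, here is a tiny standalone demonstration (my own addition): pickle stores the class by its typename, "LogEntry", and fails when it cannot find that attribute in the defining module.

import pickle
from collections import namedtuple

logEntry = namedtuple("LogEntry", ['logLev', 'msg'])   # variable name != typename

try:
    pickle.dumps(logEntry(logLev=1, msg='hi'))
except pickle.PicklingError as exc:
    print(exc)   # ... attribute lookup LogEntry on __main__ failed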

multiprocessing ignores "__setstate__"

I assumed that the multiprocessing package used pickle to send things between processes. However, pickle pays attention to the __getstate__ and __setstate__ methods of an object. Multiprocessing seems to ignore them. Is this correct? Am I confused?
To replicate, install Docker and type into the command line:
$ docker run python:3.4 python -c "import pickle
import multiprocessing
import os

class Tricky:
    def __init__(self,x):
        self.data=x

    def __setstate__(self,d):
        self.data=10

    def __getstate__(self):
        return {}

def report(ar,q):
    print('running report in pid %d, hailing from %d'%(os.getpid(),os.getppid()))
    q.put(ar.data)

print('module loaded in pid %d, hailing from pid %d'%(os.getpid(),os.getppid()))

if __name__ == '__main__':
    print('hello from pid %d'%os.getpid())
    ar = Tricky(5)
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=report, args=(ar, q))
    p.start()
    p.join()
    print(q.get())
    print(pickle.loads(pickle.dumps(ar)).data)"
You should get something like
module loaded in pid 1, hailing from pid 0
hello from pid 1
running report in pid 5, hailing from 1
5
10
I would have thought it would have been "10" "10" but instead it is "5" "10". What could it mean?
(note: code edited to comply with programming guidelines, as suggested by user3667217)
The multiprocessing module can start processes in one of three ways: spawn, fork, or forkserver. By default on Unix, it forks. That means there's no need to pickle anything that's already loaded into RAM at the moment the new process is born.
If you want arguments to be pickled rather than inherited through the fork, change the start method to spawn. To do this, create a context
ctx=multiprocessing.get_context('spawn')
and replace all calls to multiprocessing.foo() with calls to ctx.foo(). When you do this, every new process is born as a fresh Python instance; everything that gets sent into it is sent via pickle rather than a direct memory copy.
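For instance, here is a minimal self-contained sketch (my own, using a hypothetical stand-in class rather than the question's Tricky) that forces the spawn start method and therefore forces pickling of the arguments:

import multiprocessing

class WithState:
    # hypothetical stand-in for the question's Tricky class
    def __init__(self, x):
        self.data = x
    def __getstate__(self):
        return {}
    def __setstate__(self, d):
        self.data = 10

def report(obj, q):
    q.put(obj.data)

if __name__ == '__main__':
    ctx = multiprocessing.get_context('spawn')   # child starts fresh; args travel via pickle
    q = ctx.Queue()
    p = ctx.Process(target=report, args=(WithState(5), q))
    p.start()
    p.join()
    print(q.get())   # 10 -- __setstate__ ran when the argument was unpickled in the child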
Reminder: when you're using multiprocessing, you need to start processes inside an if __name__ == '__main__': clause (see the programming guidelines):
import pickle
import multiprocessing

class Tricky:
    def __init__(self, x):
        self.data = x

    def __setstate__(self, d):
        print('setstate happening')
        self.data = 10

    def __getstate__(self):
        print('getstate happening')
        return self.data

def report(ar, q):
    q.put(ar.data)

if __name__ == '__main__':
    ar = Tricky(5)
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=report, args=(ar, q))
    print('now starting process')
    p.start()
    print('now joining process')
    p.join()
    print('now getting results from queue')
    print(q.get())
    print('now getting pickle dumps')
    print(pickle.loads(pickle.dumps(ar)).data)
On Windows, I see:
now starting process
now joining process
setstate happening
now getting results from queue
10
now getting pickle dumps
setstate happening
10
On Ubuntu, I see:
now starting process
now joining process
now getting results from queue
5
now getting pickle dumps
getstate happening
setstate happening
10
I suppose this should answer your question. multiprocessing invokes the __setstate__ method on Windows but not on Linux. And on Linux, pickle.dumps first calls __getstate__, and the subsequent pickle.loads calls __setstate__. It's interesting to see how the multiprocessing module behaves differently on different platforms.

Python - is there a threadsafe queue that can be pickled or otherwise serialized to disk?

I'm after a threadsafe queue that can be pickled or serialized to disk. Are there any data structures in Python that do this? The standard Python Queue cannot be pickled.
This can be done using the copy_reg module, but it's not the most elegant thing in the world:
import copy_reg
import threading
import pickle
from Queue import Queue as _Queue
# Make Queue a new-style class, so it can be used with copy_reg
class Queue(_Queue, object):
pass
def pickle_queue(q):
# Shallow copy of __dict__ (the underlying deque isn't actually copied, so this is fast)
q_dct = q.__dict__.copy()
# Remove all non-picklable synchronization primitives
del q_dct['mutex']
del q_dct['not_empty']
del q_dct['not_full']
del q_dct['all_tasks_done']
return Queue, (), q_dct
def unpickle_queue(state):
# Recreate our queue.
q = state[0]()
q.mutex = threading.Lock()
q.not_empty = threading.Condition(q.mutex)
q.not_full = threading.Condition(q.mutex)
q.all_tasks_done = threading.Condition(q.mutex)
q.__dict__ = state[2]
return q
copy_reg.pickle(Queue, pickle_queue, unpickle_queue)
q = Queue()
q.put("hey")
d = pickle.dumps(q)
new_q = pickle.loads(d)
print new_q.get()
# Outputs 'hey'
copy_reg allows you to register helper functions for pickling and unpickling arbitrary objects. So we register a new-style version of the Queue class, and use the helper functions to remove all the unpicklable Lock/Condition instance variables prior to pickling and add them back after unpickling.
There are modules like dill and cloudpickle that already know how to serialize a Queue.
They have already done the copy_reg registration for you.
>>> from Queue import Queue
>>> q = Queue()
>>> q.put('hey')
>>> import dill as pickle
>>> d = pickle.dumps(q)
>>> _q = pickle.loads(d)
>>> print _q.get()
hey
>>>
It's that easy! Just import dill as pickle and problem solved.
Get dill here: https://github.com/uqfoundation
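For reference, the same idea in Python 3 syntax (a sketch; this assumes a reasonably recent dill, which knows how to serialize thread locks and conditions):

# pip install dill
from queue import Queue
import dill as pickle      # drop-in replacement for the stdlib pickle

q = Queue()
q.put('hey')
data = pickle.dumps(q)     # dill serializes the queue, locks and all
q2 = pickle.loads(data)
print(q2.get())            # hey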

Using Queue in python

I'm trying to run the following in Eclipse (using PyDev) and I keep getting this error:
q = queue.Queue(maxsize=0)
NameError: global name 'queue' is not defined
I've checked the documentation and it appears that this is how it's supposed to be used. Am I missing something here? Is it how PyDev works, or am I missing something in the code? Thanks for any help.
from queue import *

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

def main():
    q = queue.Queue(maxsize=0)
    for i in range(num_worker_threads):
        t = Thread(target=worker)
        t.daemon = True
        t.start()
    for item in source():
        q.put(item)
    q.join()  # block until all tasks are done

main()
Using:
Eclipse SDK
Version: 3.8.1
Build id: M20120914-1540
and Python 3.3
You do
from queue import *
This imports all the classes from the queue module already. Change that line to
q = Queue(maxsize=0)
CAREFUL: "Wildcard imports (from import *) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools". (Python PEP-8)
As an alternative, one could use:
from queue import Queue
q = Queue(maxsize=0)
That's because you're using:
from queue import *
and then you're trying to use:
queue.Queue(maxsize=0)
Remove the queue. prefix, because from queue import * imports all the attributes into the current namespace:
Queue(maxsize=0)
or use import queue instead of from queue import *.
When you write from queue import *, all classes and functions of the module are imported directly into your code, so you must not prefix them with the module name: just q = Queue(maxsize=100). If you instead want to qualify names with the module, as in q = queue.Queue(maxsize=100), you must use a different import statement: import queue, which imports the module itself rather than dumping its names into your namespace.
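A small side-by-side sketch of the two import styles (my own addition):

# Style 1: import the module and qualify names with it
import queue
q1 = queue.Queue(maxsize=100)

# Style 2: import the class directly and use the bare name
from queue import Queue
q2 = Queue(maxsize=100)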
Make sure your own file is not named queue.py; rename it to something else.
If your file is named queue.py, import queue will find your own file instead of the standard library module.
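A quick way to check for this kind of shadowing (my own suggestion) is to print which file actually got imported:

import queue
print(queue.__file__)   # should point into the standard library, not your project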
You can install kombu with
pip install kombu
and then import its Queue like this:
from kombu import Queue
