Multiprocessing event-queue not updating

I'm writing a program with an event system.
It keeps a list of events to be handled.
One process is supposed to push new events onto this to-handle list.
That part seems to work: when I print the to-handle list after pushing an event, it gets longer and longer.
But when I print the to-handle list in the handle_event method, it is always empty.
Here is my event handler code:
class Event_Handler:
    def __init__(self):
        self._to_handle_list = [deque() for _ in range(Event_Prio.get_num_prios())]
        self._controll_handler = None
        self._process_lock = Lock()
    def init(self, controll_EV_handler):
        self._controll_handler = controll_EV_handler
    def new_event(self, event):  # adds a new event to list
        with self._process_lock:
            self._to_handle_list[event.get_Prio()].append(event)  # this List grows
    def handle_event(self):  # deals with the to_handle_list
        self._process_lock.acquire()
        for i in range(Event_Prio.get_num_prios()):  # here i keep a list of empty deque
            print(self._to_handle_list)
            if (self._to_handle_list[i]):  # checks if to-do is empty, never gets here that its not
                self._process_lock.release()
                self._controll_handler.controll_event(self._to_handle_list[i].popleft())
                return
        self._process_lock.release()
    def create_Event(self, prio, type):
        return Event(prio, type)
I tried everything. I checked that the id of the event handler is the same in both processes (and the lock works).
I even checked that the id of the to-handle list is the same in both methods; it is.
Still, the list in one process grows while the one in the other process stays empty.
Can someone please tell me why that list is empty?
Edit: It works just fine if I send an event through the system with only one process, so it has something to do with multiprocessing.
Edit: Because someone asked, here is a simple use case for it (I only kept the essentials):
class EV_Main():
    def __init__(self):
        self.e_h = Event_Handler()
        self.e_controll = None  # the controller doesnt even matter because the controll-function never gets called....list is always empty
    def run(self):
        self.e_h.init(self.e_controll)
        process1 = Process(target = self.create_events)
        process2 = Process(target = self.handle_events)
        process1.start()
        process2.start()
    def create_events(self):
        while True:
            self.e_h.new_event(self.e_h.create_Event(0, 3))  # eEvent_Type.S_TOUCH_EVENT
            time.sleep(0.3)
    def handle_events(self):
        while True:
            self.e_h.handle_event()
            time.sleep(0.1)

To have a shareable set of deque instances, you could create a special class, DequeArray, which holds an internal list of deque instances and exposes whatever methods you might need. Then I would turn this into a shareable, managed object. When the manager creates an instance of this class, what is returned is a proxy to the actual instance, which resides in the manager's address space. Any method calls you make on this proxy are shipped off to the manager's process using pickle, and any results are returned the same way. Since the individual deque instances are not themselves shareable, managed objects, do not add a method that returns one of them: the caller would receive a pickled copy, and modifying that copy would leave the deque in the manager's address space unchanged.
Individual operations on a deque are serialized. But if an operation consists of multiple method calls on the deque and you require atomicity, that sequence is a critical section that must be executed under a lock, as in the left_rotate function below.
from multiprocessing import Process, Lock
from multiprocessing.managers import BaseManager
from collections import deque
# Add methods to this as required:
class DequeArray:
    def __init__(self, array_size):
        self._deques = [deque() for _ in range(array_size)]
    def __repr__(self):
        l = []
        l.append('DequeArray [')
        for d in self._deques:
            l.append(' ' + str(d))
        l.append(']')
        return '\n'.join(l)
    def __len__(self):
        """
        Return our length (i.e. the number of deque
        instances we have).
        """
        return len(self._deques)
    def append(self, i, value):
        """
        Append value to the ith deque.
        """
        self._deques[i].append(value)
    def popleft(self, i):
        """
        Execute a popleft operation on the ith deque
        and return the result.
        """
        return self._deques[i].popleft()
    def length(self, i):
        """
        Return length of the ith deque.
        """
        return len(self._deques[i])
class DequeArrayManager(BaseManager):
    pass
DequeArrayManager.register('DequeArray', DequeArray)
# Demonstrate how to use a shareable DequeArray
def left_rotate(deque_array, lock, i):
    # Rotate first element to be last element:
    # This is not an atomic operation, so do under control of a lock:
    with lock:
        deque_array.append(i, deque_array.popleft(i))
# Required for Windows:
if __name__ == '__main__':
    # This starts the manager process:
    with DequeArrayManager() as manager:
        # Two deques:
        deque_array = manager.DequeArray(2)
        # Initialize with some values:
        deque_array.append(0, 0)
        deque_array.append(0, 1)
        deque_array.append(0, 2)
        # Same values in second deque:
        deque_array.append(1, 0)
        deque_array.append(1, 1)
        deque_array.append(1, 2)
        print(deque_array)
        # Both processes will be modifying the same deque in a
        # non-atomic way, so we definitely need to be doing this under
        # control of a lock. We don't care which process acquires the
        # lock first because the results will be the same regardless.
        lock = Lock()
        p1 = Process(target=left_rotate, args=(deque_array, lock, 0))
        p2 = Process(target=left_rotate, args=(deque_array, lock, 0))
        p1.start()
        p2.start()
        p1.join()
        p2.join()
        print(deque_array)
Prints:
DequeArray [
deque([0, 1, 2])
deque([0, 1, 2])
]
DequeArray [
deque([2, 0, 1])
deque([0, 1, 2])
]
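The reason a manager is needed at all is that an ordinary Python object handed to a Process is copied into the child, so appends made there never reach the parent or a sibling process. Here is a minimal sketch of that difference, assuming nothing beyond the standard library. The names append_event and run_demo are made up for this illustration, and the start-method selection is only there so the sketch behaves the same on POSIX and Windows.

```python
import multiprocessing as mp

# Prefer "fork" where it exists (POSIX); fall back to "spawn" elsewhere
# (e.g. Windows). This choice is an assumption made for the sketch only.
_METHOD = "fork" if "fork" in mp.get_all_start_methods() else "spawn"
_CTX = mp.get_context(_METHOD)


def append_event(events, value):
    """Runs in the child process and appends to whatever it was handed."""
    events.append(value)


def run_demo():
    """Return (plain_list, managed_list) as the parent sees them afterwards."""
    plain = []  # ordinary list: the child only ever touches its own copy
    with _CTX.Manager() as manager:
        managed = manager.list()  # proxy to a list living in the manager process
        for target_list in (plain, managed):
            p = _CTX.Process(target=append_event, args=(target_list, "event"))
            p.start()
            p.join()
        # Copy the proxy's contents while the manager is still running.
        return list(plain), list(managed)


if __name__ == "__main__":
    plain, managed = run_demo()
    print("plain list in parent:  ", plain)
    print("managed list in parent:", managed)
```

The plain list stays empty in the parent, which matches the symptom in the question, while the managed list receives the event.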

Related

How do I wait for ray on Actor class?

I am developing an Actor class and using ray.wait() to collect the results.
Below are the code and the console output; it collects the result for only 2 Actors when there are 3 Actors.
import time
import ray
@ray.remote
class Tester:
    def __init__(self, param):
        self.param = param
    def run(self):
        return self.param
params = [0,1,2]
testers = []
for p in params:
    tester = Tester.remote(p)
    testers.append(tester)
runs = []
for i, tester in enumerate(testers):
    runs.append(tester.run.remote())
while len(runs):
    done_id, result_ids = ray.wait(runs)
    #runs size is not decreasing
    result = ray.get(done_id[0])
    print('result:{}'.format(result))
    time.sleep(1)
result:2
(pid=819202)
(pid=819200)
(pid=819198)
result:1
result:0
result:0
result:0
result:0
result:0
...
...
...
The console is printing out forever because the runs variable's size is not reduced.
When I call ray.wait(runs) and get the done_id, runs's element with the done_id should be removed, but it is not removed.
I want the console output to be like below.
result:2
(pid=819202)
(pid=819200)
(pid=819198)
result:1
result:0

The script you provided is using ray.wait incorrectly. The following code does what you want:
import time
import ray
@ray.remote
class Tester:
    def __init__(self, param):
        self.param = param
    def run(self):
        return self.param
params = [0, 1, 2]
# I use list comprehensions instead of for loops for terseness.
testers = [Tester.remote(p) for p in params]
not_done_ids = [tester.run.remote() for tester in testers]
# len() is not required to check that the list is empty.
while not_done_ids:
    # Replace not_done_ids with the list of object references that aren't
    # ready. Store the list of object references that are ready in done_ids.
    # timeout=1 means sleep at most 1 second, do not sleep if there are
    # new object references that are ready.
    done_ids, not_done_ids = ray.wait(not_done_ids, timeout=1)
    # ray.get can take an iterable of object references.
    done_return_values = ray.get(done_ids)
    # Process each result.
    for result in done_return_values:
        print(f'result: {result}')
I added the following fixes:
ray.wait returns two lists, a list of objects that are ready, and a list of objects that may or may not be ready. You should iterate over the first list to get all object references that are ready.
Your while loop goes forever until the list is empty. I simply replaced the runs list with not_done_ids so that once all object references are ready, the while loop breaks.
ray.wait supports sleeping, with timeout. I removed your sleep and added timeout=1, which enables the program to run more efficiently (there is no sleep if another object is ready!).
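The key point in the fixes above is that the pending list is rebound to the "not ready" half on every pass, so it shrinks until the loop ends. The same loop shape can be sketched without a Ray cluster by using the standard library's concurrent.futures.wait as a stand-in. This is not Ray's API, and the run and collect helpers are hypothetical names for this illustration.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run(param):
    """Stand-in for the actor's run() method: sleep briefly, return the input."""
    time.sleep(0.05 * param)
    return param


def collect(params):
    """Drain futures batch by batch, shrinking the pending set on each pass."""
    results = []
    with ThreadPoolExecutor(max_workers=len(params)) as pool:
        not_done = {pool.submit(run, p) for p in params}
        while not_done:
            # wait() returns (done, not_done); rebinding not_done is what
            # makes the loop terminate.
            done, not_done = wait(
                not_done, timeout=1, return_when=FIRST_COMPLETED
            )
            results.extend(f.result() for f in done)
    return results


if __name__ == "__main__":
    print(sorted(collect([0, 1, 2])))
```

Each result is collected exactly once; the order in which results arrive depends on timing, which is why the usage line sorts them.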

Incremented dict does not keep the value

I have the following situation:
class Test:
    cities_visited: dict
    @staticmethod
    def prepare_city_dict(persons):
        Test.cities_visited = {}
        for i in range(len(persons)):
            name = persons[i].surname
            Test.cities_visited[name] = Test.create_visit()
    @staticmethod
    def create_visit():
        counter: dict = {"City1": 0, "City2": 0, "City3": 0}
        return counter
    @staticmethod
    def increment_visit(surname: str, key):
        counter_visit = Test.cities_visited[surname]
        current_value = counter_visit[key]
        print(current_value)
        counter_visit[key] = current_value + 1
        Test.cities_visited[surname] = counter_visit
At start-up I call Test.prepare_city_dict, then I create a thread, acquire a lock, and call other code. At some point I try to increment 2 cities:
Test.increment_visit("Dummy", "City1")
Test.increment_visit("Dummy", "City2")
When I log how many times each city was visited, only 'City1' is incremented correctly.
I am coming from a different language (which is pretty obvious, I think :D). When I run my code in a Docker container on Windows, everything is incremented properly.
Running the same configuration (container) under Linux, only the first city, 'City1', is incremented properly.
I thought it was a race condition, but unfortunately I cannot reproduce it and I cannot figure out what is going on.
+++ UPDATE:
class TestClass:
    def main():
        Test.prepare_city_dict(persons)
        lock = threading.Lock()
        thread = threading.Thread(target=TestClass.process_message,
                                  args=(lock, persons,))
        thread.start()
    def process_message(lock, persons):
        lock.acquire()
        Test.increment_visit("Dummy", "City1")
        ..... -> lots of calculations
        Test.increment_visit("Dummy", "City2")
        lock.release()

Python Multiprocessing with shared data source and multiple class instances

My program needs to spawn multiple instances of a class, each processing data that is coming from a streaming data source.
For example:
parameters = [1, 2, 3]
class FakeStreamingApi:
    def __init__(self):
        pass
    def data(self):
        return 42
    pass
class DoStuff:
    def __init__(self, parameter):
        self.parameter = parameter
    def run(self):
        data = streaming_api.data()
        output = self.parameter ** 2 + data  # Some CPU intensive task
        print output
streaming_api = FakeStreamingApi()
# Here's how this would work with no multiprocessing
instance_1 = DoStuff(parameters[0])
instance_1.run()
Once the instances are running they don't need to interact with each other, they just have to get the data as it comes in. (and print error messages, etc)
I am totally at a loss how to make this work with multiprocessing, since I first have to create a new instance of the class DoStuff, and then have it run.
This is definitely not the way to do it:
# Let's try multiprocessing
import multiprocessing
for parameter in parameters:
    processes = [ multiprocessing.Process(target = DoStuff, args = (parameter)) ]
# Hmm, this doesn't work...
We could try defining a function to spawn classes, but that seems ugly:
import multiprocessing
def spawn_classes(parameter):
    instance = DoStuff(parameter)
    instance.run()
for parameter in parameters:
    processes = [ multiprocessing.Process(target = spawn_classes, args = (parameter,)) ]
# Can't tell if it works -- no output on screen?
Plus, I don't want to have 3 different copies of the API interface class running, I want that data to be shared between all the processes... and as far as I can tell, multiprocessing creates copies of everything for each new process.
Ideas?
Edit:
I think I may have got it... is there anything wrong with this?
import multiprocessing
parameters = [1, 2, 3]
class FakeStreamingApi:
    def __init__(self):
        pass
    def data(self):
        return 42
    pass
class Worker(multiprocessing.Process):
    def __init__(self, parameter):
        super(Worker, self).__init__()
        self.parameter = parameter
    def run(self):
        data = streaming_api.data()
        output = self.parameter ** 2 + data  # Some CPU intensive task
        print output
streaming_api = FakeStreamingApi()
if __name__ == '__main__':
    jobs = []
    for parameter in parameters:
        p = Worker(parameter)
        jobs.append(p)
        p.start()
    for j in jobs:
        j.join()

I came to the conclusion that it would be necessary to use multiprocessing.Queues to solve this. The data source (the streaming API) needs to pass copies of the data to all the different processes, so they can consume it.
There's another way to solve this using the multiprocessing.Manager to create a shared dict, but I didn't explore it further, as it looks fairly inefficient and cannot propagate changes to inner values (e.g if you have a dict of lists, changes to the inner lists will not propagate).
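The queue-based approach described above could look roughly like the sketch below: the data source puts a copy of every item onto one queue per worker, and each worker reads only from its own queue. This is written for Python 3 and is an assumption about the shape of the solution, not the original poster's actual code. The names worker, fan_out and STOP are hypothetical, and the computation is the placeholder from the question.

```python
import multiprocessing as mp

# Prefer "fork" where it exists (POSIX); fall back to "spawn" elsewhere.
_METHOD = "fork" if "fork" in mp.get_all_start_methods() else "spawn"
_CTX = mp.get_context(_METHOD)

STOP = None  # sentinel telling a worker that the stream has ended


def worker(parameter, inbox, results):
    """Consume every item from this worker's own queue until STOP arrives."""
    while True:
        data = inbox.get()
        if data is STOP:
            break
        results.put((parameter, parameter ** 2 + data))


def fan_out(parameters, stream):
    """Send each item of `stream` to every worker; return the sorted outputs."""
    results = _CTX.Queue()
    inboxes = [_CTX.Queue() for _ in parameters]
    procs = [
        _CTX.Process(target=worker, args=(p, q, results))
        for p, q in zip(parameters, inboxes)
    ]
    for proc in procs:
        proc.start()
    items = list(stream)
    for item in items + [STOP]:
        for q in inboxes:
            q.put(item)  # every worker receives its own copy of the item
    # Drain the results before join() so no worker blocks on a full pipe.
    expected = len(parameters) * len(items)
    outputs = [results.get() for _ in range(expected)]
    for proc in procs:
        proc.join()
    return sorted(outputs)


if __name__ == "__main__":
    print(fan_out([1, 2, 3], [42]))
```

A single shared queue would not work here, because each item put on a queue is delivered to only one consumer; one queue per worker is what gives every process its own copy.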

Issue with sharing data between Python processes with multiprocessing

I've seen several posts about this, so I know it is fairly straightforward to do, but I seem to be coming up short. I'm not sure if I need to create a worker pool, or use the Queue class. Basically, I want to be able to create several processes that each act autonomously (which is why they inherit from the Agent superclass).
At random ticks of my main loop I want to update each Agent. I'm using time.sleep with different values in the main loop and the Agent's run loop to simulate different processor speeds.
Here is my Agent superclass:
# Generic class to handle mpc of each agent
class Agent(mpc.Process):
    # initialize agent parameters
    def __init__(self,):
        # init mpc
        mpc.Process.__init__(self)
        self.exit = mpc.Event()
    # an agent's main loop...generally should be overridden
    def run(self):
        while not self.exit.is_set():
            pass
        print "You exited!"
    # safely shutdown an agent
    def shutdown(self):
        print "Shutdown initiated"
        self.exit.set()
    # safely communicate values to this agent
    def communicate(self,value):
        print value
A specific agent's subclass (simulating an HVAC system):
class HVAC(Agent):
    def __init__(self, dt=70, dh=50.0):
        super(Agent, self).__init__()
        self.exit = mpc.Event()
        self.__pref_heating = True
        self.__pref_cooling = True
        self.__desired_temperature = dt
        self.__desired_humidity = dh
        self.__meas_temperature = 0
        self.__meas_humidity = 0.0
        self.__hvac_status = ""  # heating, cooling, off
        self.start()
    def run(self):  # handle AC or heater on
        while not self.exit.is_set():
            ctemp = self.measureTemp()
            chum = self.measureHumidity()
            if (ctemp < self.__desired_temperature):
                self.__hvac_status = 'heating'
                self.__meas_temperature += 1
            elif (ctemp > self.__desired_temperature):
                self.__hvac_status = 'cooling'
                self.__meas_temperature += 1
            else:
                self.__hvac_status = 'off'
            print self.__hvac_status, self.__meas_temperature
            time.sleep(0.5)
        print "HVAC EXITED"
    def measureTemp(self):
        return self.__meas_temperature
    def measureHumidity(self):
        return self.__meas_humidity
    def communicate(self,updates):
        self.__meas_temperature = updates['temp']
        self.__meas_humidity = updates['humidity']
        print "Measured [%d] [%f]" % (self.__meas_temperature,self.__meas_humidity)
And my main loop:
if __name__ == "__main__":
    print "Initializing subsystems"
    agents = {}
    agents['HVAC'] = HVAC()
    # Run simulation
    timestep = 0
    while timestep < args.timesteps:
        print "Timestep %d" % timestep
        if timestep % 10 == 0:
            curr_temp = random.randrange(68,72)
            curr_humidity = random.uniform(40.0,60.0)
            agents['HVAC'].communicate({'temp':curr_temp, 'humidity':curr_humidity})
        time.sleep(1)
        timestep += 1
    agents['HVAC'].shutdown()
    print "HVAC process state: %d" % agents['HVAC'].is_alive()
So the issue is that, whenever I run agents['HVAC'].communicate(x) within the main loop, I can see the value being passed into the HVAC subclass in its run loop (so it prints the received value correctly). However, the value never is successfully stored.
So typical output looks like this:
Initializing subsystems
Timestep 0
Measured [68] [56.948675]
heating 1
heating 2
Timestep 1
heating 3
heating 4
Timestep 2
heating 5
heating 6
When in reality, as soon as Measured [68] appears, the internal stored value should be updated to output 68 (not heating 1, heating 2, etc.). So effectively, the HVAC's self.__meas_temperature is not being properly updated.
Edit: After a bit of research, I realized that I didn't necessarily understand what is happening behind the scenes. Each subprocess operates with its own virtual chunk of memory and is completely abstracted away from any data being shared this way, so passing the value in isn't going to work. My new issue is that I'm not necessarily sure how to share a global value with multiple processes.
I was looking at the Queue or JoinableQueue packages, but I'm not necessarily sure how to pass a Queue into the type of superclass setup that I have (especially with the mpc.Process.__init__(self) call).
A side concern would be if I can have multiple agents reading values out of the queue without pulling it out of the queue? For instance, if I wanted to share a temperature value with multiple agents, would a Queue work for this?
Pipe v Queue

Here's a suggested solution assuming that you want the following:
a centralized manager / main process which controls lifetimes of the workers
worker processes to do something self-contained and then report results to the manager and other processes
Before I show it though, for the record I want to say that, in general, unless you are CPU bound, multiprocessing is not really the right fit, mainly because of the added complexity, and you'd probably be better off using a different high-level asynchronous framework. Also, you should use Python 3, it's so much better!
That said, multiprocessing.Manager makes this pretty easy to do using multiprocessing. I've done this in Python 3; I'd expect it to "just work" in Python 2 as well, but I haven't checked.
from ctypes import c_bool
from multiprocessing import Manager, Process, Array, Value
from pprint import pprint
from time import sleep, time
class Agent(Process):
    def __init__(self, name, shared_dictionary, delay=0.5):
        """My take on your Agent.
        Key difference is that I've commonized the run-loop and used
        a shared value to signal when to stop, to demonstrate it.
        """
        super(Agent, self).__init__()
        self.name = name
        # This is going to be how we communicate between processes.
        self.shared_dictionary = shared_dictionary
        # Create a silo for us to use.
        shared_dictionary[name] = []
        self.should_stop = Value(c_bool, False)
        # Primarily for testing purposes, and for simulating
        # slower agents.
        self.delay = delay
    def get_next_results(self):
        # In the real world I'd use abc.ABCMeta as the metaclass to do
        # this properly.
        raise RuntimeError('Subclasses must implement this')
    def run(self):
        ii = 0
        while not self.should_stop.value:
            ii += 1
            # debugging / monitoring
            print('%s %s run loop execution %d' % (
                type(self).__name__, self.name, ii))
            next_results = self.get_next_results()
            # Add the results, along with a timestamp.
            self.shared_dictionary[self.name] += [(time(), next_results)]
            sleep(self.delay)
    def stop(self):
        self.should_stop.value = True
        print('%s %s stopped' % (type(self).__name__, self.name))
class HVACAgent(Agent):
    def get_next_results(self):
        # This is where you do your work, but for the sake of
        # the example just return a constant dictionary.
        return {'temperature': 5, 'pressure': 7, 'humidity': 9}
class DumbReadingAgent(Agent):
    """A dumb agent to demonstrate workers reading other worker values."""
    def get_next_results(self):
        # get hvac 1 results:
        hvac1_results = self.shared_dictionary.get('hvac 1')
        if hvac1_results is None:
            return None
        return hvac1_results[-1][1]['temperature']
# Script starts.
results = {}
# The "with" ensures we terminate the manager at the end.
with Manager() as manager:
    # the manager is a subprocess in its own right. We can ask
    # it to manage a dictionary (or other python types) for us
    # to be shared among the other children.
    shared_info = manager.dict()
    hvac_agent1 = HVACAgent('hvac 1', shared_info)
    hvac_agent2 = HVACAgent('hvac 2', shared_info, delay=0.1)
    dumb_agent = DumbReadingAgent('dumb hvac1 reader', shared_info)
    agents = (hvac_agent1, hvac_agent2, dumb_agent)
    list(map(lambda a: a.start(), agents))
    sleep(1)
    list(map(lambda a: a.stop(), agents))
    list(map(lambda a: a.join(), agents))
    # Not quite sure what happens to the shared dictionary after
    # the manager dies, so for safety make a local copy.
    results = dict(shared_info)
pprint(results)

Dictionary+Queue Data Structure with Active Removal of Old Messages

I would like to create a data structure which represents a set of queues (ideally a hash, map, or dict like lookup) where messages in the queues are being actively removed after they've reached a certain age. The ttl value would be global; messages would not need nor have individual ttl's. The resolution for the ttl doesn't need to be terribly accurate - only within a second or so.
I'm not even sure what to search for here. I could create a separate global queue that a background thread monitors, peeking and pulling pointers to messages off the global queue that tell it to remove items from the individual queues, but the behavior needs to go both ways: if an item gets removed from an individual queue, it also needs to be removed from the global queue.
I would like for this data structure to be implemented in Python, ideally, and as always, speed is of the utmost importance (more so than memory usage). Any suggestions for where to start?

I'd start by just modeling the behavior you're looking for in a single class, expressed as simply as possible. Performance can come later on through iterative optimization, but only if necessary (you may not need it).
The class below does something roughly like what you're describing. Queues are simply lists that are named and stored in a dictionary. Each message is timestamped and inserted at the front of the list, so the oldest message sits at the end (FIFO). Messages are reaped by checking the timestamp of the message at the end of the list and popping until the last message is below the age threshold.
If you plan to access this from several threads you'll need to add some fine-grained locking to squeeze the most performance out of it. For example, the reap() method should only lock 1 queue at a time, rather than locking all queues (method-level synchronization), so you'd also need to keep a lock for each named queue.
Updated -- Now uses a global set of buckets (by timestamp, 1 second resolution) to keep track of which queues have messages from that time. This reduces the number of queues to be checked on each pass.
import time
from collections import defaultdict
class QueueMap(object):
    def __init__(self):
        self._expire = defaultdict(lambda *n: defaultdict(int))
        self._store = defaultdict(list)
        self._oldest_key = int(time.time())
    def get_queue(self, name):
        return self._store.get(name, [])
    def pop(self, name):
        queue = self.get_queue(name)
        if queue:
            key, msg = queue.pop()
            self._expire[key][name] -= 1
            return msg
        return None
    def set(self, name, message):
        key = int(time.time())
        # increment count of messages in this bucket/queue
        self._expire[key][name] += 1
        self._store[name].insert(0, (key, message))
    def reap(self, age):
        now = time.time()
        threshold = int(now - age)
        oldest = self._oldest_key
        # iterate over buckets we need to check
        for key in range(oldest, threshold + 1):
            # for each queue with items, expire the oldest ones
            for name, count in self._expire[key].iteritems():
                if count <= 0:
                    continue
                queue = self.get_queue(name)
                while queue:
                    if queue[-1][0] > threshold:
                        break
                    queue.pop()
            del self._expire[key]
        # set oldest_key for next pass
        self._oldest_key = threshold
Usage:
qm = QueueMap()
qm.set('one', 'message 1')
qm.set('one', 'message 2')
qm.set('two', 'message 3')
print qm.pop('one')
print qm.get_queue('one')
print qm.get_queue('two')
# call this on a background thread which sleeps
time.sleep(2)
# reap messages older than 1 second
qm.reap(1)
# queues should be empty now
print qm.get_queue('one')
print qm.get_queue('two')
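The fine-grained locking suggested earlier in this answer (one lock per named queue, so reap() never blocks every queue at once) could be sketched as follows. This is a simplified Python 3 illustration rather than a drop-in change to QueueMap: the class name LockedQueueMap, its put method and the now parameter are hypothetical, it uses a deque with the oldest message on the left instead of a list with the oldest at the end, and it leaves out the timestamp buckets.

```python
import threading
import time
from collections import defaultdict, deque


class LockedQueueMap:
    """Named queues with one lock each, so reap() holds a single queue at a time."""

    def __init__(self):
        self._queues = defaultdict(deque)
        self._locks = defaultdict(threading.Lock)
        self._registry_lock = threading.Lock()  # guards creation of queues/locks

    def _entry(self, name):
        # Look up (or create) the lock and queue for this name.
        with self._registry_lock:
            return self._locks[name], self._queues[name]

    def put(self, name, message, now=None):
        lock, queue = self._entry(name)
        stamp = time.time() if now is None else now
        with lock:
            queue.append((stamp, message))  # newest on the right

    def pop(self, name):
        lock, queue = self._entry(name)
        with lock:
            return queue.popleft()[1] if queue else None

    def reap(self, age, now=None):
        """Drop messages older than `age` seconds; return how many were dropped."""
        threshold = (time.time() if now is None else now) - age
        with self._registry_lock:
            names = list(self._queues)
        removed = 0
        for name in names:
            lock, queue = self._entry(name)
            with lock:  # only this one queue is blocked while it is trimmed
                while queue and queue[0][0] <= threshold:
                    queue.popleft()
                    removed += 1
        return removed


if __name__ == "__main__":
    qm = LockedQueueMap()
    qm.put("one", "message 1")
    time.sleep(0.2)
    print(qm.reap(0.1))
```

The optional now argument exists only so the behavior can be checked deterministically; a background thread would simply call reap(age) in a loop with a sleep between passes.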

Consider checking the TTLs whenever you access the queues instead of using a thread to be constantly checking. I'm not sure what you mean about the hash/map/dict (what is the key?), but how about something like this:
import time
class EmptyException(Exception): pass
class TTLQueue(object):
    TTL = 60  # seconds
    def __init__(self):
        self._queue = []
    def push(self, msg):
        self._queue.append((time.time()+self.TTL, msg))
    def pop(self):
        self._queue = [(t, msg) for (t, msg) in self._queue if t > time.time()]
        if len(self._queue) == 0:
            raise EmptyException()
        return self._queue.pop(0)[1]
queues = [TTLQueue(), TTLQueue(), TTLQueue()]  # this could be a dict or set or
                                               # whatever if I knew what keys
                                               # you expected
