I've seen several posts about this, so I know it is fairly straightforward to do, but I seem to be coming up short. I'm not sure if I need to create a worker pool, or use the Queue class. Basically, I want to be able to create several processes that each act autonomously (which is why they inherit from the Agent superclass).
At random ticks of my main loop I want to update each Agent. I'm using time.sleep with different values in the main loop and the Agent's run loop to simulate different processor speeds.
Here is my Agent superclass:
# Generic class to handle mpc of each agent
class Agent(mpc.Process):
# initialize agent parameters
def __init__(self,):
# init mpc
mpc.Process.__init__(self)
self.exit = mpc.Event()
# an agent's main loop...generally should be overridden
def run(self):
while not self.exit.is_set():
pass
print "You exited!"
# safely shutdown an agent
def shutdown(self):
print "Shutdown initiated"
self.exit.set()
# safely communicate values to this agent
def communicate(self,value):
print value
A specific agent's subclass (simulating an HVAC system):
class HVAC(Agent):
def __init__(self, dt=70, dh=50.0):
super(Agent, self).__init__()
self.exit = mpc.Event()
self.__pref_heating = True
self.__pref_cooling = True
self.__desired_temperature = dt
self.__desired_humidity = dh
self.__meas_temperature = 0
self.__meas_humidity = 0.0
self.__hvac_status = "" # heating, cooling, off
self.start()
def run(self): # handle AC or heater on
while not self.exit.is_set():
ctemp = self.measureTemp()
chum = self.measureHumidity()
if (ctemp < self.__desired_temperature):
self.__hvac_status = 'heating'
self.__meas_temperature += 1
elif (ctemp > self.__desired_temperature):
self.__hvac_status = 'cooling'
self.__meas_temperature += 1
else:
self.__hvac_status = 'off'
print self.__hvac_status, self.__meas_temperature
time.sleep(0.5)
print "HVAC EXITED"
def measureTemp(self):
return self.__meas_temperature
def measureHumidity(self):
return self.__meas_humidity
def communicate(self,updates):
self.__meas_temperature = updates['temp']
self.__meas_humidity = updates['humidity']
print "Measured [%d] [%f]" % (self.__meas_temperature,self.__meas_humidity)
And my main loop:
if __name__ == "__main__":
print "Initializing subsystems"
agents = {}
agents['HVAC'] = HVAC()
# Run simulation
timestep = 0
while timestep < args.timesteps:
print "Timestep %d" % timestep
if timestep % 10 == 0:
curr_temp = random.randrange(68,72)
curr_humidity = random.uniform(40.0,60.0)
agents['HVAC'].communicate({'temp':curr_temp, 'humidity':curr_humidity})
time.sleep(1)
timestep += 1
agents['HVAC'].shutdown()
print "HVAC process state: %d" % agents['HVAC'].is_alive()
So the issue is that, whenever I run agents['HVAC'].communicate(x) within the main loop, I can see the value being passed into the HVAC subclass in its run loop (so it prints the received value correctly). However, the value is never successfully stored.
So typical output looks like this:
Initializing subsystems
Timestep 0
Measured [68] [56.948675]
heating 1
heating 2
Timestep 1
heating 3
heating 4
Timestep 2
heating 5
heating 6
In reality, as soon as Measured [68] appears, the internal stored value should be updated to 68 (not heating 1, heating 2, etc.). So effectively, the HVAC's self.__meas_temperature is not being properly updated.
Edit: After a bit of research, I realized that I didn't fully understand what is happening behind the scenes. Each subprocess operates on its own copy of the parent's memory and never sees data "shared" by assignment like this, so passing the value in isn't going to work. My new issue is that I'm not sure how to share a global value with multiple processes.
I was looking at the Queue and JoinableQueue classes, but I'm not sure how to pass a Queue into the kind of superclass setup that I have (especially with the mpc.Process.__init__(self) call).
A side concern is whether multiple agents can read a value out of the queue without removing it from the queue. For instance, if I wanted to share a temperature value with multiple agents, would a Queue work for this?
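To make that side concern concrete: a plain Queue would not broadcast a value, because each get() removes the item, so every reader would need its own queue (or the value would have to be re-posted). Below is a minimal sketch of the simpler alternative, sharing a single number through multiprocessing.Value; the TempReader class and names are illustrative only, not part of the Agent code above.
import multiprocessing as mpc
import time

class TempReader(mpc.Process):
    def __init__(self, shared_temp):
        mpc.Process.__init__(self)
        self.shared_temp = shared_temp   # shared memory, visible to parent and child
        self.exit = mpc.Event()

    def run(self):
        while not self.exit.is_set():
            # .value reads whatever the parent last wrote
            print('child sees temperature %d' % self.shared_temp.value)
            time.sleep(0.5)

if __name__ == '__main__':
    temp = mpc.Value('i', 68)            # 'i' = signed int
    reader = TempReader(temp)
    reader.start()
    temp.value = 72                      # the child sees this change
    time.sleep(1)
    reader.exit.set()
    reader.join()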
Here's a suggested solution assuming that you want the following:
a centralized manager / main process which controls lifetimes of the workers
worker processes to do something self-contained and then report results to the manager and other processes
Before I show it though, for the record I want to say that in general, unless you are CPU bound, multiprocessing is not really the right fit, mainly because of the added complexity, and you'd probably be better off using a different high-level asynchronous framework. Also, you should use python 3, it's so much better!
That said, multiprocessing.Manager makes this pretty easy to do. I've written this in python 3; I'd expect it to "just work" in python 2 as well, but I haven't checked.
from ctypes import c_bool
from multiprocessing import Manager, Process, Value
from pprint import pprint
from time import sleep, time
class Agent(Process):
def __init__(self, name, shared_dictionary, delay=0.5):
"""My take on your Agent.
Key difference is that I've commonized the run-loop and used
a shared value to signal when to stop, to demonstrate it.
"""
super(Agent, self).__init__()
self.name = name
# This is going to be how we communicate between processes.
self.shared_dictionary = shared_dictionary
# Create a silo for us to use.
shared_dictionary[name] = []
self.should_stop = Value(c_bool, False)
# Primarily for testing purposes, and for simulating
# slower agents.
self.delay = delay
def get_next_results(self):
# In the real world I'd use abc.ABCMeta as the metaclass to do
# this properly.
raise RuntimeError('Subclasses must implement this')
def run(self):
ii = 0
while not self.should_stop.value:
ii += 1
# debugging / monitoring
print('%s %s run loop execution %d' % (
type(self).__name__, self.name, ii))
next_results = self.get_next_results()
# Add the results, along with a timestamp.
self.shared_dictionary[self.name] += [(time(), next_results)]
sleep(self.delay)
def stop(self):
self.should_stop.value = True
print('%s %s stopped' % (type(self).__name__, self.name))
class HVACAgent(Agent):
def get_next_results(self):
# This is where you do your work, but for the sake of
# the example just return a constant dictionary.
return {'temperature': 5, 'pressure': 7, 'humidity': 9}
class DumbReadingAgent(Agent):
"""A dumb agent to demonstrate workers reading other worker values."""
def get_next_results(self):
# get hvac 1 results:
hvac1_results = self.shared_dictionary.get('hvac 1')
if hvac1_results is None:
return None
return hvac1_results[-1][1]['temperature']
# Script starts.
results = {}
# The "with" ensures we terminate the manager at the end.
with Manager() as manager:
# the manager is a subprocess in its own right. We can ask
# it to manage a dictionary (or other python types) for us
# to be shared among the other children.
shared_info = manager.dict()
hvac_agent1 = HVACAgent('hvac 1', shared_info)
hvac_agent2 = HVACAgent('hvac 2', shared_info, delay=0.1)
dumb_agent = DumbReadingAgent('dumb hvac1 reader', shared_info)
agents = (hvac_agent1, hvac_agent2, dumb_agent)
list(map(lambda a: a.start(), agents))
sleep(1)
list(map(lambda a: a.stop(), agents))
list(map(lambda a: a.join(), agents))
# Not quite sure what happens to the shared dictionary after
# the manager dies, so for safety make a local copy.
results = dict(shared_info)
pprint(results)
Related
I am testing an Ant Colony Optimisation (ACO) software which runs with multiple threads (1 for each ant created).
Each ACO iteration should wait for all threads to finish before allowing the next iteration to start. I am doing this with Condition() from the threading module.
Since ants share a pherormone matrix, the reading and writing on that matrix is subject to locks, also from the threading module.
Now a description of the problem:
I run the function and print something at each iteration. Sometimes, not always, it seems that the execution of the function stops, that is, it stops printing, meaning the iteration never finishes.
I honestly don't know why this is happening, and I would appreciate any answer that could get me on the right track. If I had to guess, I would say that the condition variable is not being used properly, or something like that. However, I am not sure, and I also find it odd that this only happens sometimes.
Below are the relevant functions.
The ACO starts by calling the start() function. This creates N threads, which, when finished, call update(). This update function, upon being called N times, calls notify, which allows start() to continue the process and, finally, start the next iteration. I also posted the run method of each thread.
It may be worth mentioning that, without daemon actions, the error hardly ever occurs. With daemon actions, it occurs almost always (which I also find odd).
Finally, the error does not happen always in the same iteration.
def start(self):
self.ants = self.create_ants()
self.iter_counter = 0
while self.iter_counter < self.num_iterations:
print "START ACQUIRED"
self.cv.acquire()
print "calling iteration"
self.iteration()
#CV wait until all ants (threads) finish and call update, which
#calls notify(), and allow continuation
while not self.iter_done:
print "iter not complete, W8ING"
self.cv.wait()
print "global update "
self.global_update_with_lock()
print "START RELEASED"
self.cv.release()
def update(self, ant):
lock = Lock()
lock.acquire()
print "Update called by %s" % (ant.ID,)
self.ant_counter += 1
self.avg_path_cost += ant.path_cost
# book-keeping
if ant.path_cost < self.best_path_cost:
self.best_path_cost = ant.path_cost
self.best_path_mat = ant.path_mat
self.best_path_vec = ant.path_vec
self.last_best_path_iteration = self.iter_counter
#all threads finished, call notify
print "ant counter"
print self.ant_counter
if self.ant_counter == len(self.ants):
print "ants finished"
#THIS MIGHT CAUSE PROBLEMS (no need to notify if its no one waiting)
self.best_cost_at_iter.append(self.best_path_cost)
self.avg_path_cost /= len(self.ants)
self.cv.acquire()
self.iter_done = True
self.cv.notify()
self.cv.release()
lock.release()
# overide Thread's run()
def run(self):
graph = self.colony.graph
while not self.end():
# we need exclusive access to the graph
graph.lock.acquire()
new_node = self.state_transition_rule(self.curr_node)
self.path_cost += graph.delta(self.curr_node, new_node)
self.path_vec.append(new_node)
self.path_mat[self.curr_node][new_node] = 1 #adjacency matrix representing path
#print "Ant %s : %s, %s" % (self.ID, self.path_vec, self.path_cost,)
self.local_updating_rule(self.curr_node, new_node)
graph.lock.release()
self.curr_node = new_node
# close the tour
self.path_vec.append(self.path_vec[0])
#RUN LOCAL HEURISTIC
if self.daemon == True:
try:
daemon_result = twoOpt(self.path_vec, graph.delta_mat)
d_path, d_adj = daemon_result['path_vec'], daemon_result['path_matrix']
self.path_vec = d_path
self.path_mat = d_adj
except Exception, e:
print "exception: " + str(e)
traceback.print_exc()
self.path_cost += graph.delta(self.path_vec[-2], self.path_vec[-1])
# send our results to the colony
self.colony.update(self)
#print "Ant thread %s terminating." % (self.ID,)
# allows thread to be restarted (calls Thread.__init__)
self.__init__(self.ID, self.start_node, self.colony, self.daemon, self.Beta, self.Q0, self.Rho)
Solution to the problem:
First of all, I corrected the error in how the condition variable is waited on, according to the comments here.
Second, it was still hanging sometimes, and this was due to a subtle mistake in the thread counter update.
The solution was to change the counter from an int to an array of length num_threads, filled with 0's, where each thread updates its own position in the list. When all threads finish, the array is all 1's. This is currently working just fine.
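A minimal sketch of that per-thread flag idea (the class and names below are illustrative, not the actual ACO code): each worker only ever writes its own slot, so no two threads race on the same counter, and the waiter just checks whether every slot is set.
import threading

class IterationBarrier(object):
    """Toy version of the per-thread 'done' flags described above."""
    def __init__(self, num_threads):
        self.done = [0] * num_threads          # one slot per thread
        self.cv = threading.Condition()

    def mark_done(self, index):
        with self.cv:
            self.done[index] = 1               # each thread touches only its own slot
            if all(self.done):
                self.cv.notify()               # the last one in wakes the waiter

    def wait_all(self):
        with self.cv:
            while not all(self.done):          # guard against spurious wakeups
                self.cv.wait()
            self.done = [0] * len(self.done)   # reset for the next iteration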
My program needs to spawn multiple instances of a class, each processing data that is coming from a streaming data source.
For example:
parameters = [1, 2, 3]
class FakeStreamingApi:
def __init__(self):
pass
def data(self):
return 42
pass
class DoStuff:
def __init__(self, parameter):
self.parameter = parameter
def run(self):
data = streaming_api.data()
output = self.parameter ** 2 + data # Some CPU intensive task
print output
streaming_api = FakeStreamingApi()
# Here's how this would work with no multiprocessing
instance_1 = DoStuff(parameters[0])
instance_1.run()
Once the instances are running they don't need to interact with each other, they just have to get the data as it comes in. (and print error messages, etc)
I am totally at a loss how to make this work with multiprocessing, since I first have to create a new instance of the class DoStuff, and then have it run.
This is definitely not the way to do it:
# Let's try multiprocessing
import multiprocessing
for parameter in parameters:
processes = [ multiprocessing.Process(target = DoStuff, args = (parameter)) ]
# Hmm, this doesn't work...
We could try defining a function to spawn classes, but that seems ugly:
import multiprocessing
def spawn_classes(parameter):
instance = DoStuff(parameter)
instance.run()
for parameter in parameters:
processes = [ multiprocessing.Process(target = spawn_classes, args = (parameter,)) ]
# Can't tell if it works -- no output on screen?
Plus, I don't want to have 3 different copies of the API interface class running, I want that data to be shared between all the processes... and as far as I can tell, multiprocessing creates copies of everything for each new process.
Ideas?
Edit:
I think I may have got it... is there anything wrong with this?
import multiprocessing
parameters = [1, 2, 3]
class FakeStreamingApi:
def __init__(self):
pass
def data(self):
return 42
pass
class Worker(multiprocessing.Process):
def __init__(self, parameter):
super(Worker, self).__init__()
self.parameter = parameter
def run(self):
data = streaming_api.data()
output = self.parameter ** 2 + data # Some CPU intensive task
print output
streaming_api = FakeStreamingApi()
if __name__ == '__main__':
jobs = []
for parameter in parameters:
p = Worker(parameter)
jobs.append(p)
p.start()
for j in jobs:
j.join()
I came to the conclusion that it would be necessary to use multiprocessing.Queues to solve this. The data source (the streaming API) needs to pass copies of the data to all the different processes, so they can consume it.
There's another way to solve this using multiprocessing.Manager to create a shared dict, but I didn't explore it further, as it looks fairly inefficient and cannot propagate changes to inner values (e.g. if you have a dict of lists, changes to the inner lists will not propagate).
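A rough sketch of what that queue-per-worker fan-out can look like (the function names below are mine, and the producer is a stand-in for the streaming API, not the exact classes above): the producer reads each datum once and puts a copy on every worker's private queue.
import multiprocessing

def produce(queues):
    # read from the streaming source once, then fan each value out
    # to every worker's own queue
    for value in [42, 43, 44]:            # stand-in for the streaming API
        for q in queues:
            q.put(value)
    for q in queues:
        q.put(None)                       # sentinel: no more data

def consume(parameter, queue):
    while True:
        data = queue.get()
        if data is None:
            break
        print(parameter ** 2 + data)      # the "CPU intensive task"

if __name__ == '__main__':
    parameters = [1, 2, 3]
    queues = [multiprocessing.Queue() for _ in parameters]
    workers = [multiprocessing.Process(target=consume, args=(p, q))
               for p, q in zip(parameters, queues)]
    for w in workers:
        w.start()
    produce(queues)
    for w in workers:
        w.join()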
Let's say I have a collection of Process-es, a[0] through a[m].
These processes will then send a job, via a queue, to another collection of Process-es, b[0] through b[n], where m > n
Or, to diagram:
a[0], a[1], ..., a[m] ---Queue---> b[0], b[1], ..., b[n]
Now, how do I return the result of the b processes to the relevant a process?
My first guess was using multiprocessing.Pipe()
So, I've tried doing the following:
## On the 'a' side
pipe = multiprocessing.Pipe()
job['pipe'] = pipe
queue.put(job)
rslt = pipe[0].recv()
## On the 'b' side
job = queue.get()
... process the job ...
pipe = job['pipe']
pipe.send(result)
and it doesn't work with the error: Required argument 'handle' (pos 1) not found
Reading many docs, I came up with:
## On the 'a' side
pipe = multiprocessing.Pipe()
job['pipe'] = multiprocessing.reduction.reduce_connection(pipe[1])
queue.put(job)
rslt = pipe[0].recv()
## On the 'b' side
job = queue.get()
... process the job ...
pipe = multiprocessing.reduction.rebuild_connection(job['pipe'], True, True)
pipe.send(result)
Now I get a different error: ValueError: need more than 2 values to unpack.
I've tried searching and searching and still can't find how to properly use the reduce_ and rebuild_ methods.
Please help so I can return the value from b to a.
I would recommend avoiding this passing around of Pipes and file descriptors (the last time I tried, it was not very standard and not very well documented). Having to deal with it was a pain; I do not recommend it :-/
I would suggest a different approach: let the main process manage the connections. Keep a work queue, but send the responses along a different path. This means that you need some kind of identifier for the processes. I will provide a toy implementation to illustrate my proposal:
#!/usr/bin/env python
import multiprocessing
import random
def fib(n):
"Slow fibonacci implementation because why not"
if n < 2:
return n
return fib(n-2) + fib(n-1)
def process_b(queue_in, queue_out):
print "Starting process B"
while True:
j = queue_in.get()
print "Job: %d" % j["val"]
j["result"] = fib(j["val"])
queue_out.put(j)
def process_a(index, pipe_end, queue):
print "Starting process A"
value = random.randint(5, 50)
j = {
"a_id": index,
"val": value,
}
queue.put(j)
r = pipe_end.recv()
print "Process A sent value %d and received: %s" % (value, r)
def main():
print "Starting main"
a_pipes = list()
jobs = multiprocessing.Queue()
done_jobs = multiprocessing.Queue()
for i in range(5):
multiprocessing.Process(target=process_b, args=(jobs, done_jobs,)).start()
for i in range(10):
receiver, sender = multiprocessing.Pipe(duplex=False)
a_pipes.append(sender)
multiprocessing.Process(target=process_a, args=(i, receiver, jobs)).start()
while True:
j = done_jobs.get()
a_pipes[j["a_id"]].send(j["result"])
if __name__ == "__main__":
main()
Note that the queue of jobs is connected directly between the a and b processes. Each a process is responsible for putting its identifier into the job (which the "master" should know about). The b processes use a different queue for finished work. I used the same job dictionary, but a typical implementation would use a more tailored data structure. The response should carry the identifier of the a process so that the master can send the result to that specific process.
I assume that there is some way to use it with your approach, which I don't dislike at all (it would have been my first approach). But having to deal with file descriptors and the reduce_ and rebuild_ methods is not nice. Not at all.
So, as @MariusSiuram explained in this post, trying to pass a Connection object is an exercise in frustration.
I finally resorted to using a DictProxy to return values from B to A.
This is the concept:
### This is in the main process
...
jobs_queue = multiprocessing.Queue()
manager = multiprocessing.Manager()
ret_dict = manager.dict()
...
# Somewhere during Process initialization, jobs_queue and ret_dict got passed to
# the workers' constructor
...
### This is in the "A" (left-side) workers
...
self.ret_dict.pop(self.pid, None) # Remove our identifier if exist
self.jobs_queue.put({
'request': parameters_to_be_used_by_B,
'requester': self.pid
})
while self.pid not in self.ret_dict:
time.sleep(0.1) # Or any sane value
result = self.ret_dict[self.pid]
...
### This is in the "B" (right-side) workers
...
while True:
job = self.jobs_queue.get()
if job is None:
break
result = self.do_something(job['request'])
self.ret_dict[job['requester']] = result
...
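For reference, here is a compressed but runnable sketch of the same concept (the worker_a/worker_b function names and the squaring "work" are mine, not from the snippets above): the A-side posts a request tagged with its pid and polls the managed dict for the reply, while a single B-side worker serves the queue.
import multiprocessing
import time

def worker_a(jobs_queue, ret_dict, value):
    pid = multiprocessing.current_process().pid
    ret_dict.pop(pid, None)                       # remove any stale reply
    jobs_queue.put({'request': value, 'requester': pid})
    while pid not in ret_dict:                    # poll for our answer
        time.sleep(0.1)
    print('A(%d) got %s' % (pid, ret_dict[pid]))

def worker_b(jobs_queue, ret_dict):
    while True:
        job = jobs_queue.get()
        if job is None:                           # sentinel: shut down
            break
        ret_dict[job['requester']] = job['request'] ** 2   # the "do_something"

if __name__ == '__main__':
    jobs_queue = multiprocessing.Queue()
    manager = multiprocessing.Manager()
    ret_dict = manager.dict()
    b = multiprocessing.Process(target=worker_b, args=(jobs_queue, ret_dict))
    b.start()
    a_side = [multiprocessing.Process(target=worker_a,
                                      args=(jobs_queue, ret_dict, v))
              for v in (2, 3, 4)]
    for p in a_side:
        p.start()
    for p in a_side:
        p.join()
    jobs_queue.put(None)                          # stop the B worker
    b.join()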
It seems that this question is too long for anyone to comment on... I'm trying to print out some text and a progress bar in a module called 'laulau.py'. Here's a test piece of code that shows a simple version. My goal is to have only one thread and send information to it. My question is: what is the best way to do this?
file1 (test.py)
#!/usr/bin/env python
from laulau import laulau
import time
print "FIRST WAY"
total=107
t=laulau()
t.echo('this is text')
t.setbartotal(total)
for a in range(1,total):
t.updatebar(a)
time.sleep(0.01)
time.sleep(1)
print
print "\ndone loop\n"
t.stop()
time.sleep(1)
print "SECOND WAY"
with laulau().echo("this is text"):
time.sleep(1)
print "\nyes this is working\n"
time.sleep(2)
file2: laulau.py
#!/usr/bin/env python
# vim:fileencoding=utf8
from __future__ import division
import time
import string
import threading
from sys import stdout
class laulau(threading.Thread):
def __init__(self, arg=None):
super(laulau,self).__init__()
self._stop = False
self.block='█'
self.empty='□'
self.TEMPLATE = ('%(progress)s%(empty)s %(percent)3s%%')
self.progress = None
self.percent = 0
self.bar_width=30
self.bartotal=None
def run (self):
# start thread for text
while not self._stop:
if self.bartotal is None:
print self.arg,
stdout.flush()
time.sleep(0.3)
else:
self.progress = int((self.bar_width * self.percent) / 100)
self.data = self.TEMPLATE % {
'percent': self.percent,
'progress': self.block * self.progress,
'empty': self.empty * (self.bar_width - self.progress),
}
stdout.write('\033[%dG'%1 + self.data + self.arg)
stdout.flush()
time.sleep(0.1)
def setbartotal(self,total):
# set progress bar total
if self.bartotal is None:
self.bartotal = total
self.updatebar(0)
def updatebar (self,num):
self.num=num
self.percent = self.percentage(self.num)
def percentage (self,numagain):
return int((numagain/self.bartotal)*100+1)
def echo (self,arg="Default"):
#self.thread_debug()
self.arg=arg
self._stop = False
self.start()
return self
def thread_debug(self):
print "threading enumerate :%s"%threading.enumerate()
print "current thread :%s"%threading.currentThread()
print "thread count (including main thread):%s"%threading.activeCount()
def stop(self):
self._stop = True
def stopped(self):
return self._stop == True
def __enter__(self):
print "\nwe have come through the enter function\n"
return self
def __exit__(self, type, value, traceback):
self._stop = True
print "\nwe have exited through the exit function\n"
return isinstance(value, TypeError)
In some cases the second way could work, e.g. when I am printing some text and just need the thread to die at the end of it, but not in the case of a progress bar that needs updates sent to it. While this all sort of works, and I learned a lot, I still can't figure out how to encapsulate this class in the way I want. As I only want one thread, I don't really need to keep instantiating the class; I just need to do it once.
So, for example, my ideal way would be to have only three functions:
1 to control text, turn on progress bar etc (from within one parsed string)
2 to set the progress bar total
3 to set the progress bar iteration
I need to change two variables in the class (for the progress bar)
one for the total
one for the iteration
...and it works out percentage from that.
First I thought I should start the thread by inheriting from threading.Thread. Then, after looking at threading.Thread(target=blah, etc.), at first I couldn't see how to use more than one function. Then I discovered I could just put the class name in there, threading.Thread(target=laulau), and that would start a thread with the class in it, but then I was stumped on how to send that thread information, seeing as I hadn't assigned it to a name as in t=laulau().
My second thought was to have functions outside of the class in my module, but because I need more than one function I got a bit confused there too. I added this to the beginning of laulau.py:
def eko (arg):
t=laulau()
t.echo(arg)
def barupate(iteration):
t.updatebar(a)
def bartotal():
t.setbartotal(a)
The first function made an instance of the class, but the following functions could not change any variables within it. And then I came across function attributes such as this:
class Foo:
#webmethod
def bar(self, arg1, arg2):
...
def webmethod(func):
func.is_webmethod = True
return func
I then started thinking maybe I could use this somehow but have never come across it before.
Ideally I'd like something like this:
echo.total(107)
echo('[progressbar] this is text') # starts progress bar and instance of thread if not already there...
for a in range(1,total):
echo.updatebar(a)
time.sleep(0.01)
time.sleep(1)
echo.stop() # bar would stop at the end of iterations but text animations (blinking etc) may still be going at this point...
print
print "\ndone loop\n"
If you know about Python you are probably looking at me funny now, but bear in mind that I'm a total beginner and non-professional, learning every day, a lot of it thanks to this site. Cheers for any help!
Edit: I should add that I'm aware of the progress bar module and various other recipes, but I'm making this for learning and fun purposes :)
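For illustration only, one rough way to approximate the three-function idea is with plain module-level helpers that lazily create a single laulau instance and forward calls to it. Everything below is a hypothetical sketch, not part of laulau.py, and it ignores edge cases such as calling echo() twice (which would try to start the thread again).
# laulau_api.py (hypothetical): module-level helpers sharing ONE laulau thread
from laulau import laulau

_instance = None

def _get():
    global _instance
    if _instance is None:
        _instance = laulau()      # one thread for the whole program
    return _instance

def echo(text):
    _get().echo(text)             # starts the thread on first use

def total(n):
    _get().setbartotal(n)

def updatebar(i):
    _get().updatebar(i)

def stop():
    _get().stop()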
If you just need to print out a progress bar, use the sys module, like so:
import sys
import time
progress = "0" #this is an int meant to represent 0-100 percent as 0-100
old = "0" #this represents the last updates progress, so that this update only adds the difference and not the full progress
def updatebar(progress,old):
for item in range((progress-old)/2) #this takes the range of progress but divides it by 2, making the progress bar 50 characters long
sys.stdout.write("-") #to change the character used to fill the progress bar change the "-" to something else
sys.stdout.flush()
#you may not want to use a while loop here, this just has an example of how to use the update function as it adds one to the progress bar every second
while True:
progress += 1
updatebar(progress,old)
old = progress #sets the old progress as the current one, because next iteration of the while loop the previous progress will be this one's current progress.
time.sleep(1)
I would like to create a data structure which represents a set of queues (ideally a hash, map, or dict like lookup) where messages in the queues are being actively removed after they've reached a certain age. The ttl value would be global; messages would not need nor have individual ttl's. The resolution for the ttl doesn't need to be terribly accurate - only within a second or so.
I'm not even sure what to search for here. I could create a separate global queue that a background thread monitors, peeking at and pulling pointers to messages off the global queue that tell it to remove items from the individual queues, but the behavior needs to go both ways: if an item gets removed from an individual queue, it needs to be removed from the global queue as well.
I would like for this data structure to be implemented in Python, ideally, and as always, speed is of the utmost importance (more so than memory usage). Any suggestions for where to start?
I'd start by just modeling the behavior you're looking for in a single class, expressed as simply as possible. Performance can come later on through iterative optimization, but only if necessary (you may not need it).
The class below does something roughly like what you're describing. Queues are simply lists that are named and stored in a dictionary. Each message is timestamped and inserted at the front of the list (FIFO). Messages are reaped by checking the timestamp of the message at the end of the list and popping it until it hits a message that is below the age threshold.
If you plan to access this from several threads you'll need to add some fine-grained locking to squeeze the most performance out of it. For example, the reap() method should only lock 1 queue at a time, rather than locking all queues (method-level synchronization), so you'd also need to keep a lock for each named queue.
Updated -- Now uses a global set of buckets (by timestamp, 1 second resolution) to keep track of which queues have messages from that time. This reduces the number of queues to be checked on each pass.
import time
from collections import defaultdict
class QueueMap(object):
def __init__(self):
self._expire = defaultdict(lambda *n: defaultdict(int))
self._store = defaultdict(list)
self._oldest_key = int(time.time())
def get_queue(self, name):
return self._store.get(name, [])
def pop(self, name):
queue = self.get_queue(name)
if queue:
key, msg = queue.pop()
self._expire[key][name] -= 1
return msg
return None
def set(self, name, message):
key = int(time.time())
# increment count of messages in this bucket/queue
self._expire[key][name] += 1
self._store[name].insert(0, (key, message))
def reap(self, age):
now = time.time()
threshold = int(now - age)
oldest = self._oldest_key
# iterate over buckets we need to check
for key in range(oldest, threshold + 1):
# for each queue with items, expire the oldest ones
for name, count in self._expire[key].iteritems():
if count <= 0:
continue
queue = self.get_queue(name)
while queue:
if queue[-1][0] > threshold:
break
queue.pop()
del self._expire[key]
# set oldest_key for next pass
self._oldest_key = threshold
Usage:
qm = QueueMap()
qm.set('one', 'message 1')
qm.set('one', 'message 2')
qm.set('two', 'message 3')
print qm.pop('one')
print qm.get_queue('one')
print qm.get_queue('two')
# call this on a background thread which sleeps
time.sleep(2)
# reap messages older than 1 second
qm.reap(1)
# queues should be empty now
print qm.get_queue('one')
print qm.get_queue('two')
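Following up on the locking note above: if the structure does end up shared between threads, a rough sketch of the per-queue locking could be a subclass like the one below (LockedQueueMap and _locks are my names, not part of the class above; reap() would need the same per-name locking around each queue it trims).
import threading
from collections import defaultdict

class LockedQueueMap(QueueMap):
    """Sketch only: wraps the single-queue operations of QueueMap above
    in a per-queue lock so two threads never mutate the same list at once."""
    def __init__(self):
        QueueMap.__init__(self)
        # one lock per queue name; creating new locks here leans on the GIL,
        # a stricter version would guard this with a master lock
        self._locks = defaultdict(threading.Lock)

    def set(self, name, message):
        with self._locks[name]:
            QueueMap.set(self, name, message)

    def pop(self, name):
        with self._locks[name]:
            return QueueMap.pop(self, name)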
Consider checking the TTLs whenever you access the queues, instead of using a thread that is constantly checking. I'm not sure what you mean about the hash/map/dict (what is the key?), but how about something like this:
import time
class EmptyException(Exception): pass
class TTLQueue(object):
TTL = 60 # seconds
def __init__(self):
self._queue = []
def push(self, msg):
self._queue.append((time.time()+self.TTL, msg))
def pop(self):
self._queue = [(t, msg) for (t, msg) in self._queue if t > time.time()]
if len(self._queue) == 0:
raise EmptyException()
return self._queue.pop(0)[1]
queues = [TTLQueue(), TTLQueue(), TTLQueue()] # this could be a dict or set or
# whatever if I knew what keys
# you expected