I have a producer-consumer pattern queue: it consumes incoming events and schedules qualified events to be sent out after 5 seconds. I am using threading.Timer() to do this, and everything was working fine.
Recently, I was asked to change the scheduled time from 5 seconds to 30 minutes, and threading.Timer() now crashes my script: previously the timer threads were created and released very quickly (each lived only 5 seconds), but now each one has to stay alive for 30 minutes.
Here's the code:
if scheduled_time and out_event:
    # schedule the event to be sent out after `scheduled_time` seconds
    threading.Timer(scheduled_time, self.send_out_event,
                    (socket_connection, received_event, out_event,)).start()
Can someone shed some light on this? How can I solve this problem, or is there an alternative to threading.Timer()?
Thanks to @dano's comment about the third-party modules! Due to my work requirements, I couldn't install them on the server.
Instead of using threading.Timer(), I chose to use a Redis-based delay queue, based on a helpful source I found online: A unique Python redis-based queue with delay. It solved my issue.
Briefly, the author creates a sorted set in Redis and gives it a name; add() appends new data to the sorted set. Each call to pop() retrieves at most one element from the sorted set, scored by epoch time: the element with the minimum qualifying score is returned (without being removed from Redis):
def add(self, received_event, delay_queue_name="delay_queue", delay=config.SECOND_RETRY_DELAY):
    try:
        # score is the epoch time at which the event becomes eligible
        score = int(time.time()) + delay
        self.__client.zadd(delay_queue_name, score, received_event)
        self.__logger.debug("added {0} to delay queue, delay time:{1}".format(received_event, delay))
    except Exception as e:
        self.__logger.error("error: {0}".format(e))

def pop(self, delay_queue_name="delay_queue"):
    # fetch at most one element whose score (eligible time) has passed
    min_score, max_score, element = 0, int(time.time()), None
    try:
        result = self.__client.zrangebyscore(delay_queue_name, min_score, max_score,
                                             start=0, num=1, withscores=False)
    except Exception as e:
        self.__logger.error("failed query from redis:{0}".format(e))
        return None
    if result and len(result) == 1:
        element = result[0]
        self.__logger.debug("popped {0} from delay queue".format(element))
    else:
        self.__logger.debug("no qualified element")
    return element

def remove(self, element, delay_queue_name="delay_queue"):
    self.__client.zrem(delay_queue_name, element)
self.__client is a Redis client instance: redis.StrictRedis(host=rhost, port=rport, db=rindex).
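For completeness, here is a sketch of a consumer loop for these methods (my own illustration, not from the original source; send_out_event and the one-second poll interval are assumptions):

while True:
    event = delay_queue.pop()      # returns an eligible element, or None
    if event is not None:
        send_out_event(event)      # hypothetical delivery function
        delay_queue.remove(event)  # remove only after successful delivery
    else:
        time.sleep(1)              # nothing due yet; poll again shortly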
The difference between the online source and my version is that I swapped the zadd() parameters: the order of score and data is switched. Below are the docs for zadd():
# SORTED SET COMMANDS
def zadd(self, name, *args, **kwargs):
    """
    Set any number of score, element-name pairs to the key ``name``. Pairs
    can be specified in two ways:

    As *args, in the form of: score1, name1, score2, name2, ...
    or as **kwargs, in the form of: name1=score1, name2=score2, ...

    The following example would add four values to the 'my-key' key:
    redis.zadd('my-key', 1.1, 'name1', 2.2, 'name2', name3=3.3, name4=4.4)
    """
    pieces = []
    if args:
        if len(args) % 2 != 0:
            raise RedisError("ZADD requires an equal number of "
                             "values and scores")
        pieces.extend(args)
    for pair in iteritems(kwargs):
        pieces.append(pair[1])
        pieces.append(pair[0])
    return self.execute_command('ZADD', name, *pieces)
Related
This may be a very basic question, and I can think of solutions, but I wondered if there is a more elegant one I don't know of (quick googling didn't bring up anything useful).
I wrote a script to communicate with a remote device. However, now that I have more than one device of that type, I thought I'd run the communication through concurrent.futures and handle them simultaneously:
from itertools import repeat
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(20) as executor:
    executor.map(device_ctl, ids, repeat(args))
So it just calls up to 20 threads of device_ctl with the respective IDs and the same args. device_ctl prints some results, but since the threads all run in parallel, the output gets mixed up and looks messy. Ideally each ID would get one line that shows the current state of the communication and is updated once it changes, e.g.:
Dev1 Connecting...
Dev2 Connected! Status: Idle
Dev3 Connected! Status: Updating
However, I don't really know how to solve this nicely. I can think of a status list that is assembled into one status string outside of the threads and updated frequently, but it feels like there could be a simpler method. Ideas?
Since there was no good answer, I made my own solution, which is quite compact but efficient: I define a class that I use globally. Each thread populates it or updates a value based on its ID; the ID is meant to be the same list entry as the one used for the thread. Here is a simple example of how to use it:
class collect:
    # class-level state shared by every thread
    ids = []
    outs = []
    LINE_UP = "\033[1A"     # ANSI escape: move the cursor up one line
    LINE_CLEAR = "\x1b[2K"  # ANSI escape: clear the current line
    printed = 0

    def init(id_list):
        collect.ids = [i for i in id_list]
        collect.outs = ["" for i in id_list]
        collect.printall()

    def write(id, out):
        # set the status for an ID without reprinting
        if id not in collect.ids:
            collect.ids.append(id)
            collect.outs.append(out)
        else:
            collect.outs[collect.ids.index(id)] = out

    def writeout(id, out):
        # set the status for an ID and reprint all lines
        if id not in collect.ids:
            collect.ids.append(str(id))
            collect.outs.append(str(out))
        else:
            collect.outs[collect.ids.index(id)] = str(out)
        collect.printall()

    def append(id, out):
        # append to the status of an ID without reprinting
        if id not in collect.ids:
            collect.ids.append(str(id))
            collect.outs.append(str(out))
        else:
            collect.outs[collect.ids.index(id)] += str(out)

    def appendout(id, out):
        # append to the status of an ID and reprint all lines
        if id not in collect.ids:
            collect.ids.append(id)
            collect.outs.append(out)
        else:
            collect.outs[collect.ids.index(id)] += str(out)
        collect.printall()

    def read(id):
        return collect.outs[collect.ids.index(str(id))]

    def readall():
        return collect.outs, "\n".join(collect.outs)

    def printall(filter=""):
        # move the cursor back over the previously printed block,
        # then print the refreshed status lines
        if collect.printed > 0:
            print(collect.LINE_CLEAR + collect.LINE_UP * len(collect.ids), end="")
        print(
            "\n".join(
                [
                    collect.ids[i] + "\t" + collect.outs[i] + " " * 30
                    for i in range(len(collect.outs))
                    if filter in collect.ids[i]
                ]
            )
        )
        collect.printed = len(collect.ids)
def device_ctl(id, args):
    collect.writeout(id, "Connecting...")
    if args.connected:
        collect.writeout(id, "Connected")

collect.init(ids)
with concurrent.futures.ThreadPoolExecutor(20) as executor:
    executor.map(device_ctl, ids, repeat(args))
I want to use values from other defs.
For example, I added a 'pt' parameter to the 'clean_beds_process' definition and added 'Patients' in the 'run' definition.
I want to use the patient information when the 'clean_beds_process' function is called.
However, this raises the error 'AttributeError: type object 'Patients' has no attribute 'id''.
I don't know why this happens.
Maybe I have some wrong understanding of the mechanism of simpy.
Please let me know how I can use the patient information when the 'clean_beds_process' function is called.
Thank you.
import simpy
import random

class Patients:
    def __init__(self, p_id):
        self.id = p_id
        self.bed_name = ""
        self.admission_decision = ""

    def admin_decision(self):
        admin_decision_prob = random.uniform(0, 1)
        if admin_decision_prob <= 0.7:
            self.admission_decision = "DIS"
        else:
            self.admission_decision = "IU"
        return self.admission_decision

class Model:
    def __init__(self, run_number):
        self.env = simpy.Environment()
        self.pt_ed_q = simpy.Store(self.env)
        self.pt_counter = 0
        self.tg = simpy.Resource(self.env, capacity=4)
        self.physician = simpy.Resource(self.env, capacity=4)
        self.bed_clean = simpy.Store(self.env)
        self.bed_dirty = simpy.Store(self.env)
        self.IU_bed = simpy.Resource(self.env, capacity=50)

    def generate_beds(self):
        for i in range(77):
            yield self.env.timeout(0)
            yield self.bed_clean.put(f'bed{i}')

    def generate_pt_arrivals(self):
        while True:
            self.pt_counter += 1
            pt = Patients(self.pt_counter)
            yield self.env.timeout(5)
            self.env.process(self.process(pt))

    def clean_beds_process(self, cleaner_id, pt):
        while True:
            print(pt.id)
            bed = yield self.bed_dirty.get()
            yield self.env.timeout(50)
            yield self.bed_clean.put(bed)

    def process(self, pt):
        with self.tg.request() as req:
            yield req
            yield self.env.timeout(10)
            bed = yield self.bed_clean.get()
            pt.bed_name = bed
            pt.admin_decision()
            if pt.admission_decision == "DIS":
                with self.IU_bed.request() as req:
                    dirty_bed_name = pt.bed_name
                    yield self.bed_dirty.put(dirty_bed_name)
                    yield self.env.timeout(600)
            else:
                dirty_bed_name = pt.bed_name
                yield self.bed_dirty.put(dirty_bed_name)

    def run(self):
        self.env.process(self.generate_pt_arrivals())
        self.env.process(self.generate_beds())
        for i in range(2):
            self.env.process(self.clean_beds_process(i + 1, Patients))
        self.env.run(until=650)

run_model = Model(0)
run_model.run()
So if a patient can use either a clean bed or a dirty bed, then the patient needs to make two requests (one for each type of bed) and use env.any_of to wait for the first request to fire. You also need to deal with the case where both events fire at the same time, and don't forget to cancel the request you do not use. If the request that fires is for a clean bed, things stay mostly the same. But if the request is for a dirty bed, then you need to add a step to clean the bed. For this I would make the cleaners Resources instead of processes: the patient would request a cleaner, do a timeout for the cleaning time, then release the cleaner. A sketch of this is shown below.

To collect patient data I would create a log with the patient id, key event, and time, and crunch these post-sim to get the stats I need. To process the log I often create one dataframe that filters the log for the first event, and a second dataframe that filters for the second event, then join the two dataframes on patient id. Now both events for a patient are on one row, so I can get the delta. Once I have the deltas I can do a sum and a count. For example, if my two events are when a patient arrives and when a patient gets a bed, I take the sum of the deltas, divide by the count, and I have the average time to bed.
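Here is a minimal sketch of the two-request pattern (my own illustration, not the poster's code; it assumes the Model class from the question, with the cleaners changed into a simpy.Resource named `cleaners` and a 50-unit cleaning time):

def get_any_bed(self, pt):
    # request a bed from both stores and wait for whichever fires first
    clean_get = self.bed_clean.get()
    dirty_get = self.bed_dirty.get()
    result = yield self.env.any_of([clean_get, dirty_get])
    if clean_get in result:
        bed = result[clean_get]
        if dirty_get in result:
            # both fired at the same time: put the unused dirty bed back
            yield self.bed_dirty.put(result[dirty_get])
        else:
            dirty_get.cancel()  # cancel the request we did not use
    else:
        clean_get.cancel()
        # a dirty bed needs cleaning first: request a cleaner Resource
        with self.cleaners.request() as req:
            yield req
            yield self.env.timeout(50)  # cleaning time
        bed = result[dirty_get]
    pt.bed_name = bed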
If you remember, one of the first answers I gave you a while ago had an example of getting the first available bed from two different queues.
I do not have a lot of time right now, but I hope this dissertation helps a bit.
I have a Python script that gets data from a USB weather station; right now it inserts the data into MySQL whenever data is received from the station.
I have a MySQL class with an insert function. What I want is for that function to check whether it has been run in the last 5 minutes and, if it has, quit.
I could not find any code on the internet that does this.
Maybe I need a sub-process, but I am not familiar with that at all.
Does anyone have an example I can use?
Use this timeout decorator.
import signal

class TimeoutError(Exception):
    def __init__(self, value="Timed Out"):
        self.value = value

    def __str__(self):
        return repr(self.value)

def timeout(seconds_before_timeout):
    def decorate(f):
        def handler(signum, frame):
            raise TimeoutError()

        def new_f(*args, **kwargs):
            # install the alarm handler and schedule the alarm
            old = signal.signal(signal.SIGALRM, handler)
            signal.alarm(seconds_before_timeout)
            try:
                result = f(*args, **kwargs)
            finally:
                # restore the old handler and cancel the pending alarm
                signal.signal(signal.SIGALRM, old)
                signal.alarm(0)
            return result

        new_f.func_name = f.func_name
        return new_f
    return decorate
Usage:
import time

@timeout(5)
def mytest():
    print "Start"
    for i in range(1, 10):
        time.sleep(1)
        print "%d seconds have passed" % i

if __name__ == '__main__':
    mytest()
Probably the most straightforward approach (you can put this into a decorator if you like, but that's just cosmetics, I think):
import time
import datetime

class MySQLWrapper:
    def __init__(self, min_period_seconds):
        self.min_period = datetime.timedelta(seconds=min_period_seconds)
        # initialize so that the very first insert is allowed
        self.last_calltime = datetime.datetime.now() - self.min_period

    def insert(self, item):
        now = datetime.datetime.now()
        if now - self.last_calltime < self.min_period:
            print "not insert"
        else:
            self.last_calltime = now
            print "insert", item

m = MySQLWrapper(5)
m.insert(1)   # insert 1
m.insert(2)   # not insert
time.sleep(5)
m.insert(3)   # insert 3
As a side note: have you noticed RRDTool during your web search for related stuff? It apparently does what you want to achieve, i.e.
a database to store the most recent values of arbitrary resolution/update frequency.
extrapolation/interpolation of values if updates are too frequent or missing.
generates graphs from the data.
An approach could be to store all the data you can get into your MySQL database and forward a subset to such an RRDTool database to generate a nice time-series visualization of it, depending on what you need.
import time

def timeout(f, k, n):
    last_time = [time.time()]
    count = [0]

    def inner(*args, **kwargs):
        distance = time.time() - last_time[0]
        if distance > k:
            # the interval has passed: reset the window and run
            last_time[0] = time.time()
            count[0] = 0
            return f(*args, **kwargs)
        elif distance < k and (count[0] + 1) == n:
            # the call limit inside the interval has been reached
            return False
        else:
            count[0] += 1
            return f(*args, **kwargs)
    return inner

timed = timeout(lambda x, y: x + y, 300, 1)
print timed(2, 4)
The first argument is the function you want to run, the second is the time interval, and the third is the number of times it's allowed to run within that interval.
Each time the function is run, save a file with the current time. When the function is run again, check the time stored in the file and make sure it is old enough. A minimal sketch of this idea follows.
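For illustration only; the stamp-file path and the 300-second window are my assumptions, not from the original answer:

import os
import time

STAMP_FILE = "/tmp/last_insert.stamp"  # hypothetical location

def ran_recently(min_period=300):
    # compare the stamp file's modification time against the window
    try:
        last = os.path.getmtime(STAMP_FILE)
    except OSError:
        return False  # no stamp yet: never ran before
    return time.time() - last < min_period

def mark_run():
    # touch the stamp file to record this run
    with open(STAMP_FILE, "w"):
        pass

insert() would then start with a check like `if ran_recently(): return`, followed by `mark_run()`.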
Just derive a new class and override the insert method. In the overriding method, check the last insert time and call the parent's insert method only if it has been more than five minutes; and of course, update the most recent insert time. A sketch of this is below.
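A minimal sketch of the subclassing idea; the parent class name MySQL and its insert() signature are assumptions, since the original class isn't shown:

import time

class ThrottledMySQL(MySQL):  # MySQL is the questioner's existing class
    def __init__(self, *args, **kwargs):
        MySQL.__init__(self, *args, **kwargs)
        self._last_insert = 0

    def insert(self, data):
        now = time.time()
        if now - self._last_insert < 300:  # ran within the last 5 minutes
            return
        self._last_insert = now
        MySQL.insert(self, data)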
Python has Queue.PriorityQueue, but I cannot see a way to make each value in it unique, as there is no method for checking whether a value already exists (like find(name) or similar). Moreover, PriorityQueue keeps the priority within the value, so I could not even search for my value, since I would also have to know its priority: you would use (0.5, myvalue) as the value in PriorityQueue, and it would then be sorted by the first element of the tuple.
The collections.deque class, on the other hand, does offer a function for checking whether a value already exists, and is even more natural in usage (without locking, but still atomic), but it does not offer a way to sort by priority.
There are some other implementations on Stack Overflow using heapq, but heapq also keeps the priority within the value (e.g. at the first position of a tuple), so it does not seem great for comparing against already existing values.
Creating a python priority Queue
https://stackoverflow.com/questions/3306179/priority-queue-problem-in-python
What is the best way of creating an atomic priority queue (i.e. one that can be used from multiple threads) with unique values?
An example of what I'd like to add:
Priority: 0.2, Value: value1
Priority: 0.3, Value: value2
Priority: 0.1, Value: value3 (shall be retrieved first automatically)
Priority: 0.4, Value: value1 (shall not be added again, even though it has a different priority)
You could combine a priority queue with a set:
import heapq

class PrioritySet(object):
    def __init__(self):
        self.heap = []
        self.set = set()

    def add(self, d, pri):
        # only queue values that are not already present
        if d not in self.set:
            heapq.heappush(self.heap, (pri, d))
            self.set.add(d)

    def pop(self):
        pri, d = heapq.heappop(self.heap)
        self.set.remove(d)
        return d
This uses the priority queue specified in one of your linked questions. I don't know if this is what you want, but it's rather easy to add a set to any kind of queue this way. A usage example follows.
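For example (my own usage illustration, matching the question's values):

ps = PrioritySet()
ps.add('value1', 0.2)
ps.add('value2', 0.3)
ps.add('value3', 0.1)
ps.add('value1', 0.4)  # ignored: 'value1' is already queued
print ps.pop()         # -> 'value3' (lowest priority first)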
Well, here's one way to do it. I basically started from how PriorityQueue is defined in Queue.py and added a set to it to keep track of unique keys:
from Queue import PriorityQueue
import heapq

class UniquePriorityQueue(PriorityQueue):
    def _init(self, maxsize):
        # print 'init'
        PriorityQueue._init(self, maxsize)
        self.values = set()

    def _put(self, item, heappush=heapq.heappush):
        # print 'put', item
        if item[1] not in self.values:
            print 'uniq', item[1]
            self.values.add(item[1])
            PriorityQueue._put(self, item, heappush)
        else:
            print 'dupe', item[1]

    def _get(self, heappop=heapq.heappop):
        # print 'get'
        item = PriorityQueue._get(self, heappop)
        # print 'got', item
        self.values.remove(item[1])
        return item

if __name__ == '__main__':
    u = UniquePriorityQueue()
    u.put((0.2, 'foo'))
    u.put((0.3, 'bar'))
    u.put((0.1, 'baz'))
    u.put((0.4, 'foo'))

    while not u.empty():
        item = u.get_nowait()
        print item
Boaz Yaniv beat me to the punch by a few minutes, but I figured I'd post mine too as it supports the full interface of PriorityQueue. I left some print statements uncommented, but commented out the ones I put in while debugging it. ;)
In case you want to re-prioritise a task later:
u = UniquePriorityQueue()
u.put((0.2, 'foo'))
u.put((0.3, 'bar'))
u.put((0.1, 'baz'))
u.put((0.4, 'foo'))
# Now `foo`'s priority is increased.
u.put((0.05, 'foo'))
Here is another implementation that follows the official guide:
import heapq
import Queue

class UniquePriorityQueue(Queue.Queue):
    """
    - https://github.com/python/cpython/blob/2.7/Lib/Queue.py
    - https://docs.python.org/3/library/heapq.html
    """
    def _init(self, maxsize):
        self.queue = []
        self.REMOVED = object()  # sentinel marking cancelled entries
        self.entry_finder = {}   # maps task -> its entry in the heap

    def _put(self, item, heappush=heapq.heappush):
        item = list(item)
        priority, task = item
        if task in self.entry_finder:
            previous_item = self.entry_finder[task]
            previous_priority, _ = previous_item
            if priority < previous_priority:
                # Mark the previous item as removed and add the new one.
                previous_item[-1] = self.REMOVED
                self.entry_finder[task] = item
                heappush(self.queue, item)
            else:
                # Do not add the new item.
                pass
        else:
            self.entry_finder[task] = item
            heappush(self.queue, item)

    def _qsize(self, len=len):
        return len(self.entry_finder)

    def _get(self, heappop=heapq.heappop):
        """
        The base class makes sure this shouldn't be called if `_qsize` is 0.
        """
        while self.queue:
            item = heappop(self.queue)
            _, task = item
            if task is not self.REMOVED:
                del self.entry_finder[task]
                return item
        raise KeyError('It should never happen: pop from an empty priority queue')
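A quick demonstration of the re-prioritisation behaviour this version supports (my own example; note that items come back as lists, because _put converts them to lists):

u = UniquePriorityQueue()
u.put((0.2, 'foo'))
u.put((0.1, 'baz'))
u.put((0.05, 'foo'))  # replaces the earlier 'foo' entry
print u.get()         # -> [0.05, 'foo']
print u.get()         # -> [0.1, 'baz']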
I like @Jonny Gaines Jr.'s answer, but I think it can be simplified. PriorityQueue uses a list under the hood, so you can just define:
class PrioritySetQueue(PriorityQueue):
    def _put(self, item):
        # skip items that are already present in the underlying list
        if item not in self.queue:
            super(PrioritySetQueue, self)._put(item)
I would like to create a data structure which represents a set of queues (ideally a hash-, map-, or dict-like lookup) where messages in the queues are actively removed after they've reached a certain age. The ttl value would be global; messages would not need, nor have, individual ttls. The resolution for the ttl doesn't need to be terribly accurate: within a second or so is fine.
I'm not even sure what to search for here. I could create a separate global queue that a background thread monitors, peeking and pulling pointers to messages off the global queue that tell it to remove items from the individual queues, but the behavior needs to go both ways: if an item gets removed from an individual queue, it needs to be removed from the global queue.
I would like this data structure to be implemented in Python, ideally, and, as always, speed is of the utmost importance (more so than memory usage). Any suggestions for where to start?
I'd start by just modeling the behavior you're looking for in a single class, expressed as simply as possible. Performance can come later through iterative optimization, but only if necessary (you may not need it).
The class below does roughly what you're describing. Queues are simply lists that are named and stored in a dictionary. Each message is timestamped and inserted at the front of the list (FIFO). Messages are reaped by checking the timestamp of the message at the end of the list and popping it, until it hits a message that is below the age threshold.
If you plan to access this from several threads, you'll need to add some fine-grained locking to squeeze the most performance out of it. For example, the reap() method should only lock one queue at a time, rather than locking all queues (method-level synchronization), so you'd also need to keep a lock for each named queue.
Updated -- Now uses a global set of buckets (keyed by timestamp, at 1-second resolution) to keep track of which queues have messages from that time. This reduces the number of queues to be checked on each pass.
import time
from collections import defaultdict

class QueueMap(object):
    def __init__(self):
        # bucket (epoch second) -> queue name -> message count
        self._expire = defaultdict(lambda *n: defaultdict(int))
        self._store = defaultdict(list)
        self._oldest_key = int(time.time())

    def get_queue(self, name):
        return self._store.get(name, [])

    def pop(self, name):
        queue = self.get_queue(name)
        if queue:
            key, msg = queue.pop()
            self._expire[key][name] -= 1
            return msg
        return None

    def set(self, name, message):
        key = int(time.time())
        # increment count of messages in this bucket/queue
        self._expire[key][name] += 1
        self._store[name].insert(0, (key, message))

    def reap(self, age):
        now = time.time()
        threshold = int(now - age)
        oldest = self._oldest_key
        # iterate over buckets we need to check
        for key in range(oldest, threshold + 1):
            # for each queue with items, expire the oldest ones
            for name, count in self._expire[key].iteritems():
                if count <= 0:
                    continue
                queue = self.get_queue(name)
                while queue:
                    if queue[-1][0] > threshold:
                        break
                    queue.pop()
            del self._expire[key]
        # set oldest_key for the next pass
        self._oldest_key = threshold
Usage:
qm = QueueMap()
qm.set('one', 'message 1')
qm.set('one', 'message 2')
qm.set('two', 'message 3')
print qm.pop('one')
print qm.get_queue('one')
print qm.get_queue('two')
# call this on a background thread which sleeps
time.sleep(2)
# reap messages older than 1 second
qm.reap(1)
# queues should be empty now
print qm.get_queue('one')
print qm.get_queue('two')
Consider checking the TTLs whenever you access the queues, instead of using a thread to check them constantly. I'm not sure what you mean about the hash/map/dict (what is the key?), but how about something like this:
import time

class EmptyException(Exception):
    pass

class TTLQueue(object):
    TTL = 60  # seconds

    def __init__(self):
        self._queue = []

    def push(self, msg):
        # store each message together with its expiry time
        self._queue.append((time.time() + self.TTL, msg))

    def pop(self):
        # lazily drop expired messages on access
        self._queue = [(t, msg) for (t, msg) in self._queue if t > time.time()]
        if len(self._queue) == 0:
            raise EmptyException()
        return self._queue.pop(0)[1]

queues = [TTLQueue(), TTLQueue(), TTLQueue()]  # this could be a dict or set or
                                               # whatever if I knew what keys
                                               # you expected
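For example, you could key the queues by name in a dict (my own illustration):

queues = {'one': TTLQueue(), 'two': TTLQueue()}
queues['one'].push('message 1')
print queues['one'].pop()  # -> 'message 1' (still within the 60-second TTL)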