I would like to create a data structure which represents a set of queues (ideally a hash, map, or dict-like lookup) where messages in the queues are actively removed after they've reached a certain age. The TTL value would be global; messages would not need, nor have, individual TTLs. The resolution for the TTL doesn't need to be terribly accurate, only within a second or so.
I'm not even sure what to search for here. I could create a separate global queue that a background thread monitors, peeking at and pulling pointers to messages off the global queue that tell it to remove items from the individual queues, but the behavior needs to go both ways: if an item gets removed from an individual queue, it needs to be removed from the global queue as well.
I would like for this data structure to be implemented in Python, ideally, and as always, speed is of the utmost importance (more so than memory usage). Any suggestions for where to start?
I'd start by just modeling the behavior you're looking for in a single class, expressed as simply as possible. Performance can come later on through iterative optimization, but only if necessary (you may not need it).
The class below does something roughly like what you're describing. Queues are simply lists that are named and stored in a dictionary. Each message is timestamped and inserted at the front of the list, and popped from the end (FIFO). Messages are reaped by checking the timestamp of the message at the end of the list and popping it, until the reaper hits a message that is below the age threshold.
If you plan to access this from several threads you'll need to add some fine-grained locking to squeeze the most performance out of it. For example, the reap() method should only lock one queue at a time, rather than locking all queues (method-level synchronization), so you'd also need to keep a lock for each named queue; there is a minimal sketch of that after the usage example below.
Updated -- Now uses a global set of buckets (by timestamp, 1 second resolution) to keep track of which queues have messages from that time. This reduces the number of queues to be checked on each pass.
import time
from collections import defaultdict

class QueueMap(object):
    def __init__(self):
        # bucket (1-second timestamp) -> {queue name: message count}
        self._expire = defaultdict(lambda: defaultdict(int))
        self._store = defaultdict(list)
        self._oldest_key = int(time.time())

    def get_queue(self, name):
        return self._store.get(name, [])

    def pop(self, name):
        queue = self.get_queue(name)
        if queue:
            key, msg = queue.pop()
            self._expire[key][name] -= 1
            return msg
        return None

    def set(self, name, message):
        key = int(time.time())
        # increment count of messages in this bucket/queue
        self._expire[key][name] += 1
        self._store[name].insert(0, (key, message))

    def reap(self, age):
        now = time.time()
        threshold = int(now - age)
        oldest = self._oldest_key
        # iterate over buckets we need to check
        for key in range(oldest, threshold + 1):
            # for each queue with items, expire the oldest ones
            for name, count in self._expire[key].items():
                if count <= 0:
                    continue
                queue = self.get_queue(name)
                while queue:
                    if queue[-1][0] > threshold:
                        break
                    queue.pop()
            del self._expire[key]
        # set oldest_key for next pass
        self._oldest_key = threshold
Usage:
qm = QueueMap()
qm.set('one', 'message 1')
qm.set('one', 'message 2')
qm.set('two', 'message 3')
print(qm.pop('one'))
print(qm.get_queue('one'))
print(qm.get_queue('two'))

# call this on a background thread which sleeps
time.sleep(2)
# reap messages older than 1 second
qm.reap(1)

# queues should be empty now
print(qm.get_queue('one'))
print(qm.get_queue('two'))
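Here is a minimal sketch of the per-queue locking idea mentioned above, assuming the QueueMap class as written; the subclass name and the choice to key locks by queue name are illustrative, not part of the original design:

import time
import threading
from collections import defaultdict

class LockingQueueMap(QueueMap):
    """Sketch: one lock per named queue, so reap() only blocks the queue it is trimming."""
    def __init__(self):
        super(LockingQueueMap, self).__init__()
        self._locks = defaultdict(threading.Lock)

    def pop(self, name):
        with self._locks[name]:
            return super(LockingQueueMap, self).pop(name)

    def set(self, name, message):
        with self._locks[name]:
            super(LockingQueueMap, self).set(name, message)

    def reap(self, age):
        threshold = int(time.time() - age)
        for key in range(self._oldest_key, threshold + 1):
            # lock one queue at a time instead of the whole map;
            # the _expire/_locks bookkeeping is left unlocked here for brevity
            for name in list(self._expire[key]):
                with self._locks[name]:
                    queue = self.get_queue(name)
                    while queue and queue[-1][0] <= threshold:
                        queue.pop()
            del self._expire[key]
        self._oldest_key = threshold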
Consider checking the TTLs whenever you access the queues, instead of using a thread that is constantly checking. I'm not sure what you mean about the hash/map/dict (what is the key?), but how about something like this:
import time

class EmptyException(Exception): pass

class TTLQueue(object):
    TTL = 60  # seconds

    def __init__(self):
        self._queue = []

    def push(self, msg):
        self._queue.append((time.time() + self.TTL, msg))

    def pop(self):
        self._queue = [(t, msg) for (t, msg) in self._queue if t > time.time()]
        if len(self._queue) == 0:
            raise EmptyException()
        return self._queue.pop(0)[1]

queues = [TTLQueue(), TTLQueue(), TTLQueue()]  # this could be a dict or set or
                                               # whatever if I knew what keys
                                               # you expected
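A quick illustration of how the lazy expiry behaves; the messages and the shortened TTL here are made up for the example:

q = queues[0]
q.TTL = 1                      # shorten the TTL just for this demo
q.push('hello')
print(q.pop())                 # 'hello' (still fresh)

q.push('stale message')
time.sleep(2)                  # let it age past the TTL
try:
    q.pop()                    # expired entries are dropped on access
except EmptyException:
    print('queue is empty')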
So I'm writing a program with an event system.
I have a list of events to be handled.
One process is supposed to push new events to the handler list.
This part seems to work: when I print out the to-handle list after pushing one event, it gets longer and longer.
But when I print out the to-handle list in the handle_event method, it is empty all the time.
Here is my event_handler code:
from collections import deque
from multiprocessing import Lock

class Event_Handler:
    def __init__(self):
        self._to_handle_list = [deque() for _ in range(Event_Prio.get_num_prios())]
        self._controll_handler = None
        self._process_lock = Lock()

    def init(self, controll_EV_handler):
        self._controll_handler = controll_EV_handler

    def new_event(self, event):  # adds a new event to the list
        with self._process_lock:
            self._to_handle_list[event.get_Prio()].append(event)  # this list grows

    def handle_event(self):  # deals with the to_handle_list
        self._process_lock.acquire()
        for i in range(Event_Prio.get_num_prios()):  # here every deque is always empty
            print(self._to_handle_list)
            if (self._to_handle_list[i]):  # checks whether there is anything to do; it never gets here
                self._process_lock.release()
                self._controll_handler.controll_event(self._to_handle_list[i].popleft())
                return
        self._process_lock.release()

    def create_Event(self, prio, type):
        return Event(prio, type)
I tried everything. I checked that the event-handler id is the same for both processes (and the lock works).
I even checked that the to-handle-list id is the same for both methods; yes, it is.
Still, the list in the one process grows while the other stays empty.
Can someone please tell me why the one list is empty?
Edit: It works just fine if I send an event through the system with only one process, so it has to be something to do with multiprocessing.
Edit: Because someone asked, here is a simple use case (I only included the essentials):
import time
from multiprocessing import Process

class EV_Main():
    def __init__(self):
        self.e_h = Event_Handler()
        self.e_controll = None  # the controller doesn't even matter, because the controll function never gets called: the list is always empty

    def run(self):
        self.e_h.init(self.e_controll)
        process1 = Process(target=self.create_events)
        process2 = Process(target=self.handle_events)
        process1.start()
        process2.start()

    def create_events(self):
        while True:
            self.e_h.new_event(self.e_h.create_Event(0, 3))  # eEvent_Type.S_TOUCH_EVENT
            time.sleep(0.3)

    def handle_events(self):
        while True:
            self.e_h.handle_event()
            time.sleep(0.1)
To have a shareable set of deque instances, you could create a special class DequeArray which holds an internal list of deque instances and exposes whatever methods you might need. I would then turn this into a shareable, managed object. When the manager creates an instance of this class, what is returned is a proxy to the actual instance, which resides in the manager's address space. Any method calls you make on this proxy are shipped off to the manager's process using pickle, and any results are returned the same way. Since the individual deque instances are not themselves shareable, managed objects, do not add a method that returns one of these deques: if you modify the returned deque, you must be aware that the version of the deque in the manager's address space has not been modified.
Individual operations on a deque are serialized. But if you are doing something that consists of multiple method calls on the deque and you require atomicity, then that sequence is a critical section that needs to be done under control of a lock, as in the left_rotate function below.
from multiprocessing import Process, Lock
from multiprocessing.managers import BaseManager
from collections import deque

# Add methods to this as required:
class DequeArray:
    def __init__(self, array_size):
        self._deques = [deque() for _ in range(array_size)]

    def __repr__(self):
        l = []
        l.append('DequeArray [')
        for d in self._deques:
            l.append(' ' + str(d))
        l.append(']')
        return '\n'.join(l)

    def __len__(self):
        """
        Return our length (i.e. the number of deque
        instances we have).
        """
        return len(self._deques)

    def append(self, i, value):
        """
        Append value to the ith deque.
        """
        self._deques[i].append(value)

    def popleft(self, i):
        """
        Execute a popleft operation on the ith deque
        and return the result.
        """
        return self._deques[i].popleft()

    def length(self, i):
        """
        Return the length of the ith deque.
        """
        return len(self._deques[i])


class DequeArrayManager(BaseManager):
    pass

DequeArrayManager.register('DequeArray', DequeArray)

# Demonstrate how to use a sharable DequeArray
def left_rotate(deque_array, lock, i):
    # Rotate first element to be last element:
    # This is not an atomic operation, so do under control of a lock:
    with lock:
        deque_array.append(i, deque_array.popleft(i))


# Required for Windows:
if __name__ == '__main__':
    # This starts the manager process:
    with DequeArrayManager() as manager:
        # Two deques:
        deque_array = manager.DequeArray(2)
        # Initialize with some values:
        deque_array.append(0, 0)
        deque_array.append(0, 1)
        deque_array.append(0, 2)
        # Same values in second deque:
        deque_array.append(1, 0)
        deque_array.append(1, 1)
        deque_array.append(1, 2)
        print(deque_array)
        # Both processes will be modifying the same deque in a
        # non-atomic way, so we definitely need to be doing this under
        # control of a lock. We don't care which process acquires the
        # lock first because the results will be the same regardless.
        lock = Lock()
        p1 = Process(target=left_rotate, args=(deque_array, lock, 0))
        p2 = Process(target=left_rotate, args=(deque_array, lock, 0))
        p1.start()
        p2.start()
        p1.join()
        p2.join()
        print(deque_array)
Prints:
DequeArray [
deque([0, 1, 2])
deque([0, 1, 2])
]
DequeArray [
deque([2, 0, 1])
deque([0, 1, 2])
]
I want to pass other values into my function definitions.
For example, I added a 'pt' parameter to the 'clean_beds_process' definition and passed 'Patients' in the 'run' definition.
I want to use the patient information when the 'clean_beds_process' function is called.
However, this raises the error 'AttributeError: type object 'Patients' has no attribute 'id''.
I don't know why this happens.
Maybe I have a wrong understanding of the mechanism of simpy.
Please let me know how I can use patient information when the 'clean_beds_process' function is called.
Thank you.
import simpy
import random

class Patients:
    def __init__(self, p_id):
        self.id = p_id
        self.bed_name = ""
        self.admission_decision = ""

    def admin_decision(self):
        admin_decision_prob = random.uniform(0, 1)
        if admin_decision_prob <= 0.7:
            self.admission_decision = "DIS"
        else:
            self.admission_decision = "IU"
        return self.admission_decision

class Model:
    def __init__(self, run_number):
        self.env = simpy.Environment()
        self.pt_ed_q = simpy.Store(self.env)
        self.pt_counter = 0
        self.tg = simpy.Resource(self.env, capacity=4)
        self.physician = simpy.Resource(self.env, capacity=4)
        self.bed_clean = simpy.Store(self.env)
        self.bed_dirty = simpy.Store(self.env)
        self.IU_bed = simpy.Resource(self.env, capacity=50)

    def generate_beds(self):
        for i in range(77):
            yield self.env.timeout(0)
            yield self.bed_clean.put(f'bed{i}')

    def generate_pt_arrivals(self):
        while True:
            self.pt_counter += 1
            pt = Patients(self.pt_counter)
            yield self.env.timeout(5)
            self.env.process(self.process(pt))

    def clean_beds_process(self, cleaner_id, pt):
        while True:
            print(pt.id)
            bed = yield self.bed_dirty.get()
            yield self.env.timeout(50)
            yield self.bed_clean.put(bed)

    def process(self, pt):
        with self.tg.request() as req:
            yield req
            yield self.env.timeout(10)
        bed = yield self.bed_clean.get()
        pt.bed_name = bed
        pt.admin_decision()
        if pt.admission_decision == "DIS":
            with self.IU_bed.request() as req:
                dirty_bed_name = pt.bed_name
                yield self.bed_dirty.put(dirty_bed_name)
                yield self.env.timeout(600)
        else:
            dirty_bed_name = pt.bed_name
            yield self.bed_dirty.put(dirty_bed_name)

    def run(self):
        self.env.process(self.generate_pt_arrivals())
        self.env.process(self.generate_beds())
        for i in range(2):
            self.env.process(self.clean_beds_process(i + 1, Patients))
        self.env.run(until=650)

run_model = Model(0)
run_model.run()
So if a patient can use either a clean bed or a dirty bed, then the patient needs to make two requests (one for each type of bed) and use env.any_of to wait for the first request to fire. You also need to deal with the case where both events fire at the same time, and don't forget to cancel the request you do not use. If the request that fires is for a clean bed, things stay mostly the same. But if the request is for a dirty bed, then you need to add a step to clean the bed. For this I would make the cleaners Resources instead of processes: the patient requests a cleaner, does a timeout for the cleaning time, and releases the cleaner. A sketch of this request pattern follows below.
To collect patient data I would create a log of (patient id, key event, time) and crunch it post-sim to get the stats I need. To process the log I often create one dataframe that filters the log for the first event and a second dataframe that filters for the second event, then join the two dataframes on patient id. Now both events for a patient are on one row, so I can compute the delta; once I have the delta I can do a sum and a count. For example, if my two events are when a patient arrives and when a patient gets a bed, I take the sum of the deltas, divide by the count, and I have the average time to bed.
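Here is a minimal sketch of the request-race part, not the poster's full model; the cleaners Resource, the CLEAN_TIME constant, and the log list are assumptions added for illustration:

# Sketch: race a clean-bed get against a dirty-bed get with env.any_of.
# Assumes bed_clean / bed_dirty are simpy.Store and cleaners is a simpy.Resource.
CLEAN_TIME = 50  # assumed cleaning duration

def get_bed(env, pt, bed_clean, bed_dirty, cleaners, log):
    # usage inside a patient process: bed = yield env.process(get_bed(...))
    clean_req = bed_clean.get()
    dirty_req = bed_dirty.get()
    log.append((pt.id, 'requested_bed', env.now))
    result = yield env.any_of([clean_req, dirty_req])
    if clean_req in result:
        bed = result[clean_req]
        if dirty_req in result:
            # both fired at the same time: keep the clean bed and
            # put the dirty one back for someone else to clean
            yield bed_dirty.put(result[dirty_req])
        else:
            dirty_req.cancel()  # don't leave a stale request queued
    else:
        clean_req.cancel()
        bed = result[dirty_req]
        # a dirty bed won the race: grab a cleaner and clean it first
        with cleaners.request() as creq:
            yield creq
            yield env.timeout(CLEAN_TIME)
    log.append((pt.id, 'got_bed', env.now))
    return bed

And a small example of crunching that kind of event log after the run; the pandas usage and the sample rows are illustrative only:

import pandas as pd

log = [(1, 'requested_bed', 0), (1, 'got_bed', 12),
       (2, 'requested_bed', 5), (2, 'got_bed', 35)]
df = pd.DataFrame(log, columns=['patient_id', 'event', 'time'])
requested = df[df.event == 'requested_bed'].rename(columns={'time': 't_request'})
got = df[df.event == 'got_bed'].rename(columns={'time': 't_bed'})
merged = requested.merge(got, on='patient_id')
merged['delta'] = merged.t_bed - merged.t_request
print(merged['delta'].sum() / merged['delta'].count())  # average time to bed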
If you remember, one of the first answers I gave you a while ago had an example of how to get the first available bed from two different queues.
I do not have a lot of time right now, but I hope this dissertation helps a bit.
I've seen several posts about this, so I know it is fairly straightforward to do, but I seem to be coming up short. I'm not sure if I need to create a worker pool, or use the Queue class. Basically, I want to be able to create several processes that each act autonomously (which is why they inherit from the Agent superclass).
At random ticks of my main loop I want to update each Agent. I'm using time.sleep with different values in the main loop and the Agent's run loop to simulate different processor speeds.
Here is my Agent superclass:
import time
import random
import multiprocessing as mpc

# Generic class to handle mpc of each agent
class Agent(mpc.Process):

    # initialize agent parameters
    def __init__(self):
        # init mpc
        mpc.Process.__init__(self)
        self.exit = mpc.Event()

    # an agent's main loop...generally should be overridden
    def run(self):
        while not self.exit.is_set():
            pass
        print "You exited!"

    # safely shutdown an agent
    def shutdown(self):
        print "Shutdown initiated"
        self.exit.set()

    # safely communicate values to this agent
    def communicate(self, value):
        print value
A specific agent's subclass (simulating an HVAC system):
class HVAC(Agent):
    def __init__(self, dt=70, dh=50.0):
        super(Agent, self).__init__()
        self.exit = mpc.Event()
        self.__pref_heating = True
        self.__pref_cooling = True
        self.__desired_temperature = dt
        self.__desired_humidity = dh
        self.__meas_temperature = 0
        self.__meas_humidity = 0.0
        self.__hvac_status = ""  # heating, cooling, off
        self.start()

    def run(self):  # handle AC or heater on
        while not self.exit.is_set():
            ctemp = self.measureTemp()
            chum = self.measureHumidity()
            if (ctemp < self.__desired_temperature):
                self.__hvac_status = 'heating'
                self.__meas_temperature += 1
            elif (ctemp > self.__desired_temperature):
                self.__hvac_status = 'cooling'
                self.__meas_temperature += 1
            else:
                self.__hvac_status = 'off'
            print self.__hvac_status, self.__meas_temperature
            time.sleep(0.5)
        print "HVAC EXITED"

    def measureTemp(self):
        return self.__meas_temperature

    def measureHumidity(self):
        return self.__meas_humidity

    def communicate(self, updates):
        self.__meas_temperature = updates['temp']
        self.__meas_humidity = updates['humidity']
        print "Measured [%d] [%f]" % (self.__meas_temperature, self.__meas_humidity)
And my main loop:
if __name__ == "__main__":
    print "Initializing subsystems"
    agents = {}
    agents['HVAC'] = HVAC()

    # Run simulation
    timestep = 0
    while timestep < args.timesteps:
        print "Timestep %d" % timestep
        if timestep % 10 == 0:
            curr_temp = random.randrange(68, 72)
            curr_humidity = random.uniform(40.0, 60.0)
            agents['HVAC'].communicate({'temp': curr_temp, 'humidity': curr_humidity})
        time.sleep(1)
        timestep += 1

    agents['HVAC'].shutdown()
    print "HVAC process state: %d" % agents['HVAC'].is_alive()
So the issue is that whenever I run agents['HVAC'].communicate(x) within the main loop, I can see the value being passed into the HVAC subclass in its run loop (so it prints the received value correctly). However, the value is never successfully stored.
So typical output looks like this:
Initializing subsystems
Timestep 0
Measured [68] [56.948675]
heating 1
heating 2
Timestep 1
heating 3
heating 4
Timestep 2
heating 5
heating 6
When in reality, as soon as Measured [68] appears, the internal stored value should be updated to output 68 (not heating 1, heating 2, etc.). So effectively, the HVAC's self.__meas_temperature is not being properly updated.
Edit: After a bit of research, I realized that I didn't fully understand what is happening behind the scenes. Each subprocess operates on its own private copy of memory and is completely isolated from data shared this way, so passing the value in isn't going to work. My new issue is that I'm not sure how to share a global value with multiple processes.
I was looking at the Queue and JoinableQueue classes, but I'm not sure how to pass a Queue into the kind of superclass setup that I have (especially with the mpc.Process.__init__(self) call).
A side concern is whether I can have multiple agents read values out of the queue without removing them. For instance, if I wanted to share a temperature value with multiple agents, would a Queue work for this?
Here's a suggested solution assuming that you want the following:
a centralized manager / main process which controls lifetimes of the workers
worker processes to do something self-contained and then report results to the manager and other processes
Before I show it though, for the record I want to say that in general, unless you are CPU bound, multiprocessing is not really the right fit, mainly because of the added complexity, and you'd probably be better off using a different high-level asynchronous framework. Also, you should use Python 3, it's so much better!
That said, multiprocessing.Manager makes this pretty easy to do using multiprocessing. I've done this in Python 3, and I think it should "just work" in Python 2 as well, but I haven't checked.
from ctypes import c_bool
from multiprocessing import Manager, Process, Value
from pprint import pprint
from time import sleep, time


class Agent(Process):
    def __init__(self, name, shared_dictionary, delay=0.5):
        """My take on your Agent.
        Key difference is that I've commonized the run-loop and used
        a shared value to signal when to stop, to demonstrate it.
        """
        super(Agent, self).__init__()
        self.name = name
        # This is going to be how we communicate between processes.
        self.shared_dictionary = shared_dictionary
        # Create a silo for us to use.
        shared_dictionary[name] = []
        self.should_stop = Value(c_bool, False)
        # Primarily for testing purposes, and for simulating
        # slower agents.
        self.delay = delay

    def get_next_results(self):
        # In the real world I'd use abc.ABCMeta as the metaclass to do
        # this properly.
        raise RuntimeError('Subclasses must implement this')

    def run(self):
        ii = 0
        while not self.should_stop.value:
            ii += 1
            # debugging / monitoring
            print('%s %s run loop execution %d' % (
                type(self).__name__, self.name, ii))
            next_results = self.get_next_results()
            # Add the results, along with a timestamp.
            self.shared_dictionary[self.name] += [(time(), next_results)]
            sleep(self.delay)

    def stop(self):
        self.should_stop.value = True
        print('%s %s stopped' % (type(self).__name__, self.name))


class HVACAgent(Agent):
    def get_next_results(self):
        # This is where you do your work, but for the sake of
        # the example just return a constant dictionary.
        return {'temperature': 5, 'pressure': 7, 'humidity': 9}


class DumbReadingAgent(Agent):
    """A dumb agent to demonstrate workers reading other worker values."""

    def get_next_results(self):
        # get hvac 1 results:
        hvac1_results = self.shared_dictionary.get('hvac 1')
        if hvac1_results is None:
            return None
        return hvac1_results[-1][1]['temperature']


# Script starts.
results = {}

# The "with" ensures we terminate the manager at the end.
with Manager() as manager:
    # the manager is a subprocess in its own right. We can ask
    # it to manage a dictionary (or other python types) for us
    # to be shared among the other children.
    shared_info = manager.dict()

    hvac_agent1 = HVACAgent('hvac 1', shared_info)
    hvac_agent2 = HVACAgent('hvac 2', shared_info, delay=0.1)
    dumb_agent = DumbReadingAgent('dumb hvac1 reader', shared_info)

    agents = (hvac_agent1, hvac_agent2, dumb_agent)
    list(map(lambda a: a.start(), agents))
    sleep(1)
    list(map(lambda a: a.stop(), agents))
    list(map(lambda a: a.join(), agents))

    # Not quite sure what happens to the shared dictionary after
    # the manager dies, so for safety make a local copy.
    results = dict(shared_info)

pprint(results)
I have a producer-consumer pattern queue: it consumes incoming events and schedules qualified events to be sent out after 5 seconds. I am using threading.Timer() to do this, and everything was working fine.
Recently, I was asked to change the scheduled time from 5 seconds to 30 minutes, and threading.Timer() now crashes my script, because previously the thread objects were created and released very quickly (they only lived for 5 seconds), but now each one has to stay alive for 30 minutes.
Here's the code:
if scheduled_time and out_event:
    threading.Timer(scheduled_time, self.send_out_event, (socket_connection, received_event, out_event,)).start()  # schedule event send out
Can someone shed some light on this? How can I solve this problem, or is there an alternative to threading.Timer()?
Thanks to @dano for the comment about third-party modules! Based on my work requirements, I didn't install them on the server.
Instead of using threading.Timer(), I chose to use a Redis-based delay queue. I found a helpful source online: A unique Python redis-based queue with delay. It solved my issue.
Briefly, the author creates a named sorted set in Redis; add() appends new data to the sorted set. Each pop() fetches at most one element from the sorted set based on its epoch-time score: the element holding the qualifying minimum score is popped out (without being removed from Redis):
def add(self, received_event, delay_queue_name="delay_queue", delay=config.SECOND_RETRY_DELAY):
    try:
        score = int(time.time()) + delay
        self.__client.zadd(delay_queue_name, score, received_event)
        self.__logger.debug("added {0} to delay queue, delay time:{1}".format(received_event, delay))
    except Exception as e:
        self.__logger.error("error: {0}".format(e))

def pop(self, delay_queue_name="delay_queue"):
    min_score, max_score, element = 0, int(time.time()), None
    try:
        result = self.__client.zrangebyscore(delay_queue_name, min_score, max_score, start=0, num=1, withscores=False)
    except Exception as e:
        self.__logger.error("failed query from redis:{0}".format(e))
        return None
    if result and len(result) == 1:
        element = result[0]
        self.__logger.debug("popped {0} from delay queue".format(element))
    else:
        self.__logger.debug("no qualified element")
    return element

def remove(self, element, delay_queue_name="delay_queue"):
    self.__client.zrem(delay_queue_name, element)
self.__client is a Redis client instance, redis.StrictRedis(host=rhost, port=rport, db=rindex).
The difference between the online source and mine is that I swapped the zadd() parameters: the order of the score and the data is switched. Below are the docs for zadd():
# SORTED SET COMMANDS
def zadd(self, name, *args, **kwargs):
    """
    Set any number of score, element-name pairs to the key ``name``. Pairs
    can be specified in two ways:

    As *args, in the form of: score1, name1, score2, name2, ...
    or as **kwargs, in the form of: name1=score1, name2=score2, ...

    The following example would add four values to the 'my-key' key:
    redis.zadd('my-key', 1.1, 'name1', 2.2, 'name2', name3=3.3, name4=4.4)
    """
    pieces = []
    if args:
        if len(args) % 2 != 0:
            raise RedisError("ZADD requires an equal number of "
                             "values and scores")
        pieces.extend(args)
    for pair in iteritems(kwargs):
        pieces.append(pair[1])
        pieces.append(pair[0])
    return self.execute_command('ZADD', name, *pieces)
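As a side note, newer redis-py releases (3.0 and later) changed zadd() to take a mapping of member to score instead of positional pairs, so on a current client the add() above would look roughly like this sketch (assuming the same attributes as the poster's class):

def add(self, received_event, delay_queue_name="delay_queue", delay=config.SECOND_RETRY_DELAY):
    # redis-py >= 3.0: zadd(name, {member: score})
    score = int(time.time()) + delay
    self.__client.zadd(delay_queue_name, {received_event: score})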
I'm doing packet injection with scapy and creating a threading.Timer which deletes the packet information from a dictionary after 10 seconds.
If I receive a response before the 10 seconds are up, I cancel the Timer and delete the packet information from the dictionary. Since I'm cancelling the Timer, I can call .join() there. But when the Timer expires on its own, there's no mechanism to .join() it.
When I run the program, the memory keeps increasing. Granted, the increase is slow (initially 2% on a 1 GB RAM system; in 20 minutes it went to 3.9%), but it still keeps increasing. I tried gc.collect(), but it's of no help. I'm monitoring the memory with top.
Below is the code. Sorry for the large amount of code; I figured it's better to give the whole thing in case I'm missing something somewhere.
from threading import Timer
from scapy.all import *
import gc

class ProcessPacket():
    # To write the packets to the response.pcap file
    def pcapWrite(self, pkts):
        self.writerResponse.write(pkts)
        self.writerResponse.flush()
        return

    # cancels the Timer upon receiving a response
    def timerCancel(self, ipPort):
        if self.pktInfo[ipPort]["timer"].isAlive():
            self.pktInfo[ipPort]["timer"].cancel()
            self.pktInfo[ipPort]["timer"].join()
        self.delete(ipPort)
        return

    # deletes the packet information from pktInfo
    def delete(self, ipPort):
        if self.pktInfo.has_key(ipPort):
            self.pcapWrite(self.pktInfo[ipPort]["packets"])
            del self.pktInfo[ipPort]
        return

    # processes the received packet and sends the response
    def createSend(self, pkt, ipPort):
        self.pktInfo[ipPort]["packets"] = [pkt]
        self.pktInfo[ipPort]["timer"] = Timer(10, self.delete, args=[ipPort])
        myPkt = IP(src=pkt[IP].dst, dst=pkt[IP].src) / TCP(dport=pkt[TCP].sport, sport=pkt[TCP].dport, flags='SA')
        self.pktInfo[ipPort]["packets"].append(myPkt)
        send(myPkt, verbose=0)
        self.pktInfo[ipPort]["timer"].start()
        return

    # constructor
    def __init__(self):
        self.count = 0
        self.writerResponse = PcapWriter('response.pcap', append=True)
        self.pktInfo = {}
        return

    # from sniff
    def pktCallback(self, pkt):
        ipPort = pkt[IP].src + str(pkt[TCP].sport) + pkt[IP].dst + str(pkt[TCP].dport)
        flag = pkt.sprintf('%TCP.flags%')
        if self.count == 10:
            self.count = 0
            gc.collect()
        if not self.pktInfo.has_key(ipPort) and flag == 'S':
            self.pktInfo[ipPort] = {}
            self.createSend(pkt, ipPort)
            self.count += 1
        elif self.pktInfo.has_key(ipPort):
            self.timerCancel(ipPort)
            self.count += 1
        return

# custom filter for sniff
def myFilter(pkt):
    if pkt.haslayer(IP):
        if pkt[IP].src == "172.16.0.1":
            return 1
    return 0

if __name__ == '__main__':
    respondObj = ProcessPacket()
    sniff(iface='eth0', filter="tcp", prn=respondObj.pktCallback, lfilter=myFilter)
In the program, I don't see any memory-consuming factor other than pktInfo and the Timers. pktInfo grows and shrinks, so the problem must be with the Timers. How can I free the memory of expired or cancelled Timers?
EDIT 1:
I modified the delete() function:
# deletes the packet information from pktInfo
def delete(self, ipPort):
    if self.pktInfo.has_key(ipPort):
        print "Before", len(self.pktInfo.keys()), sys.getsizeof(self.pktInfo)
        self.pcapWrite(self.pktInfo[ipPort]["packets"])
        self.pktInfo[ipPort]["timer"] = None
        self.pktInfo[ipPort] = None
        del self.pktInfo[ipPort]
        gc.collect()
        print "After", len(self.pktInfo.keys()), sys.getsizeof(self.pktInfo)
    return
After a deletion, the number of elements in self.pktInfo decreases. The size of self.pktInfo stays the same for a long time but eventually changes (decreases or increases). But the system memory doesn't seem to be released: top shows the same behaviour, with the memory used by the program continuously increasing.
Release all references to unneeded Timer objects (e.g. self.pktInfo[ipPort]["timer"] = None); if you don't, the garbage collector won't be able to free anything.
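For example, here is a minimal sketch of what that could look like in your timerCancel(), leaving the rest of the class unchanged; it only rearranges your existing pktInfo bookkeeping:

# cancels the Timer upon receiving a response, then drops the reference
def timerCancel(self, ipPort):
    info = self.pktInfo.get(ipPort)
    if info is None:
        return
    timer = info.get("timer")
    if timer is not None and timer.is_alive():
        timer.cancel()
        timer.join()
    info["timer"] = None  # release the Timer so it can be garbage collected
    self.delete(ipPort)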