By "forkable iterator" I mean a regular iterator with a fork() method that creates a new iterator starting from the original iterator's current position. Even if the original iterator is advanced further, the fork stays at the point where it was created until it is itself iterated.
My practical use case:
I have a socket connection over which "packets" are sent. The connection can be shared between "receivers", and each "packet" can be addressed to a particular "receiver". "Packets" can arrive out of order, so each "receiver" can potentially pull a packet addressed to a different "receiver". Moreover, if one "receiver" reads a packet addressed to a different "receiver", that other receiver must still be able to read it.
To support that, I want to implement such a forkable iterator representing the connection; each receiver will make its own fork, read from it, and look for the "packets" addressed to it.
Does anybody know of an existing implementation of what I'm describing?
You are looking for the itertools.tee() function:
Return n independent iterators from a single iterable.
Do take into account that the implementation will buffer data to service all child iterators:
This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored).
Also, you should only use the returned child iterators; iterating over the source iterator will not propagate the data to the tee() iterables.
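For example, a minimal sketch of tee() in action (the child iterators advance independently while tee() buffers whatever the slower one has not yet consumed):
from itertools import tee

source = iter(range(5))
a, b = tee(source, 2)  # two independent child iterators over the same source

print(next(a))  # 0
print(next(a))  # 1
print(next(b))  # 0 -- b is still at the start; the items a consumed were buffered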
That's my current implementation of a forkable iterator:
#!/usr/bin/env python
# coding=utf-8
from collections.abc import Iterator
from collections import deque
import threading


class ForkableIterator(Iterator):
    def __init__(self, iterator, buffer=None, *args, **kwargs):
        self.iterator = iter(iterator)
        if buffer is None:
            self.buffer = deque()
        else:
            self.buffer = buffer
        args = iter(args)
        self.refs = kwargs.get('refs', next(args, {}))
        self.refs.setdefault('base', 0)
        self.pointer = kwargs.get('pointer', next(args, 0))
        self.lock = kwargs.get('lock', next(args, threading.Lock()))

    @property
    def pointer(self):
        return self.refs[self] + self.refs['base']

    @pointer.setter
    def pointer(self, value):
        self.refs[self] = value

    def __del__(self):
        del self.refs[self]

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            if len(self.buffer) - self.pointer == 0:
                elem = next(self.iterator)
                self.buffer.append(elem)
            else:
                if self.pointer == min(self.refs.values()):
                    elem = self.buffer.popleft()
                    self.refs['base'] -= 1
                else:
                    elem = self.buffer[self.pointer]
            self.pointer += 1
            return elem

    def fork(self):
        return self.__class__(self.iterator, self.buffer,
                              refs=self.refs, pointer=self.pointer,
                              lock=self.lock)
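For illustration, a quick usage sketch of the class above (the values in the comments assume the Python 3 form shown):
it = ForkableIterator(range(10))
print(next(it), next(it))  # 0 1

fork = it.fork()           # fork starts at the original's current position
print(next(it))            # 2 -- advancing the original...
print(next(fork))          # 2 -- ...does not advance the fork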
Related
So I'm writing a program with an event system.
I have a list of events to be handled.
One process is supposed to push new events onto the handler list.
That part seems to work: when I print the to-handle list after pushing one event, it gets longer and longer.
But when I print the to-handle list inside the handle-event method, it is empty the whole time.
Here is my event_handler code:
from collections import deque
from multiprocessing import Lock


class Event_Handler:
    def __init__(self):
        self._to_handle_list = [deque() for _ in range(Event_Prio.get_num_prios())]
        self._controll_handler = None
        self._process_lock = Lock()

    def init(self, controll_EV_handler):
        self._controll_handler = controll_EV_handler

    def new_event(self, event):  # adds a new event to the list
        with self._process_lock:
            self._to_handle_list[event.get_Prio()].append(event)  # this list grows

    def handle_event(self):  # deals with the to_handle_list
        self._process_lock.acquire()
        for i in range(Event_Prio.get_num_prios()):  # here I only ever see a list of empty deques
            print(self._to_handle_list)
            if self._to_handle_list[i]:  # checks whether there is something to handle; it never gets here
                self._process_lock.release()
                self._controll_handler.controll_event(self._to_handle_list[i].popleft())
                return
        self._process_lock.release()

    def create_Event(self, prio, type):
        return Event(prio, type)
I tried everything. I checked that the event-handler id is the same for both processes (and the lock works).
I even checked that the to-handle-list id is the same for both methods; it is.
Still, the list in the one process grows while the other stays empty.
Can someone please tell me why the one list is empty?
Edit: It works just fine if I push an event through the system with only one process, so it has to be something to do with multiprocessing.
Edit: Because someone asked, here is a simple use case for it (only the essentials):
from multiprocessing import Process
import time


class EV_Main():
    def __init__(self):
        self.e_h = Event_Handler()
        self.e_controll = None  # the controller doesn't even matter, because the controll
                                # function never gets called: the list is always empty

    def run(self):
        self.e_h.init(self.e_controll)
        process1 = Process(target=self.create_events)
        process2 = Process(target=self.handle_events)
        process1.start()
        process2.start()

    def create_events(self):
        while True:
            self.e_h.new_event(self.e_h.create_Event(0, 3))  # eEvent_Type.S_TOUCH_EVENT
            time.sleep(0.3)

    def handle_events(self):
        while True:
            self.e_h.handle_event()
            time.sleep(0.1)
To have a shareable set of deque instances, you could create a special class DequeArray which holds an internal list of deque instances and exposes whatever methods you might need. Then I would turn this into a shareable, managed object. When the manager creates an instance of this class, what is returned is a proxy to the actual instance, which resides in the manager's address space. Any method call you make on this proxy is shipped off to the manager's process using pickle, and any result is returned the same way. Since the individual deque instances are not themselves shareable, managed objects, do not add a method that returns one of these deque instances: modifying such a returned deque would not modify the version of the deque in the manager's address space.
Individual operations on a deque are serialized. But if you are doing some operation on a deque that consists of multiple method calls on the deque and you require atomicity, then that sequence is a critical section that needs to be done under control of a lock, as in the left_rotate function below.
from multiprocessing import Process, Lock
from multiprocessing.managers import BaseManager
from collections import deque

# Add methods to this as required:
class DequeArray:
    def __init__(self, array_size):
        self._deques = [deque() for _ in range(array_size)]

    def __repr__(self):
        l = []
        l.append('DequeArray [')
        for d in self._deques:
            l.append('    ' + str(d))
        l.append(']')
        return '\n'.join(l)

    def __len__(self):
        """
        Return our length (i.e. the number of deque
        instances we have).
        """
        return len(self._deques)

    def append(self, i, value):
        """
        Append value to the ith deque.
        """
        self._deques[i].append(value)

    def popleft(self, i):
        """
        Execute a popleft operation on the ith deque
        and return the result.
        """
        return self._deques[i].popleft()

    def length(self, i):
        """
        Return the length of the ith deque.
        """
        return len(self._deques[i])


class DequeArrayManager(BaseManager):
    pass

DequeArrayManager.register('DequeArray', DequeArray)

# Demonstrate how to use a sharable DequeArray
def left_rotate(deque_array, lock, i):
    # Rotate the first element to be the last element.
    # This is not an atomic operation, so do it under control of a lock:
    with lock:
        deque_array.append(i, deque_array.popleft(i))

# Required for Windows:
if __name__ == '__main__':
    # This starts the manager process:
    with DequeArrayManager() as manager:
        # Two deques:
        deque_array = manager.DequeArray(2)
        # Initialize with some values:
        deque_array.append(0, 0)
        deque_array.append(0, 1)
        deque_array.append(0, 2)
        # Same values in second deque:
        deque_array.append(1, 0)
        deque_array.append(1, 1)
        deque_array.append(1, 2)
        print(deque_array)
        # Both processes will be modifying the same deque in a
        # non-atomic way, so we definitely need to be doing this under
        # control of a lock. We don't care which process acquires the
        # lock first because the results will be the same regardless.
        lock = Lock()
        p1 = Process(target=left_rotate, args=(deque_array, lock, 0))
        p2 = Process(target=left_rotate, args=(deque_array, lock, 0))
        p1.start()
        p2.start()
        p1.join()
        p2.join()
        print(deque_array)
Prints:
DequeArray [
    deque([0, 1, 2])
    deque([0, 1, 2])
]
DequeArray [
    deque([2, 0, 1])
    deque([0, 1, 2])
]
I'm trying to fix a performance issue by preloading the data used by a batching iterator, and I'm stuck on how to fit this into idiomatic Python.
Currently I'm using a sequence of iterators composed one on top of the other:
iterator = list(dirin.iterdir())
iterator = TranformIterator(iterator, lambda path: load_img(path)) # This takes most of the time
iterator = PreloadingIterator(iterator) # I want to use this iterator to preload some of the data
iterator = BatchingIterator(iterator, batchsize=100)
iterator = BatchingIteratorWithMeta(iterator)
for batch in iterator:
All of these are implemented with the exception of PreloadingIterator:
import typing

import psutil


class PreloadingIterator:
    inner: typing.Iterator

    def __init__(self,
                 inner: typing.Union[typing.Sized, typing.Iterable]
                 ):
        self.inner = inner
        self.total = len(inner)
        self.index = 0
        self.__cache__ = []

    def __len__(self):
        return len(self.inner)

    def __iter__(self):
        mem = psutil.virtual_memory()
        for item in self.inner:
            memconsumed = psutil.virtual_memory()
            if self.should_preload(mem, memconsumed):
                pass
                # threading.Thread(target=
                # peeker = more_itertools.peekable(self.inner)
                # preload_item = peeker.peek()
            yield item
            self.index += 1

    def should_preload(self, oldmem, newmem):
        return True  # TODO
What I'm trying to do is peek ahead at the next item in the iterator (preload_item = peeker.peek()) and use that to start a thread that begins loading the next result from the iterator. However, I'm struggling to see how I can change the item in for item in self.inner: so that it refers to the next item rather than the one from the underlying iterator.
How can I iterate over an iterator in a way that lets me source each item from a precached result when one is available?
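For what it's worth, more_itertools.peekable (the helper referenced in the commented-out lines) does let you look at the upcoming item without consuming it; here is a small sketch of just that mechanism, not the full preloading logic:
from more_itertools import peekable

items = peekable(iter([10, 20, 30]))
upcoming = items.peek()   # inspect the next item without advancing the iterator
first = next(items)       # still yields 10; peek() did not consume it
print(upcoming, first)    # 10 10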
I'm going through some code for writing a circular queue in Python.
class CircularQueue:

    # constructor for the class
    # taking input for the size of the circular queue
    # from the user
    def __init__(self, maxSize):
        self.queue = list()
        # user input value for maxSize
        self.maxSize = maxSize
        self.head = 0
        self.tail = 0

    # add an element to the queue
    def enqueue(self, data):
        # if the queue is full
        if self.size() == (self.maxSize - 1):
            return "Queue is full!"
        else:
            # add the element to the queue
            self.queue.append(data)
            # increment the tail pointer
            self.tail = (self.tail + 1) % self.maxSize
            return True
The part that confuses me is the self.size() call in the enqueue method.
I looked through the Python docs and don't see any size() function, only references to size() in NumPy.
Normally you'd call len() to get the size of a list, but I know you can't just write self.len().
Any clarity or explanation of the syntax and logic behind writing something like this would be helpful!
You need to define your own size() method and just return the number of items currently held in the queue.
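For example, one minimal sketch of such a method for the CircularQueue above (assuming head and tail are maintained as in the posted code, the element count is the wrapped distance between them):
class CircularQueue:
    # ... __init__ and enqueue as above ...

    def size(self):
        # Number of elements currently in the queue, accounting for wrap-around.
        return (self.tail - self.head) % self.maxSize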
I have the following inheritance:
class Processor(object):
    def get_listings(self):
        """
        Returns a list of data.
        """
        raise NotImplementedError()

    def run(self):
        for listing in self.get_listings():
            do_stuff(listing)


class DBProcessor(Processor):
    def get_listings(self):
        """
        Return a large set of paginated data.
        """
        ...
        for page in pages:
            for data in db.fetch_from_query(...):
                yield data
Although this works, it fails on len(self.get_listings()) or any other list operation.
My question is: how do I refactor my code so that DBProcessor.get_listings can handle list operations, but still returns a generator when iterated?
I think I got an idea:
class DBListings(object):
    def __iter__(self):
        for page in pages:
            for data in db.fetch_from_query(...):
                yield data

    def __len__(self):
        return db.get_total_from_query(...)
        # Or the following:
        # counter = 0
        # for x in self:
        #     counter += 1
        # return counter


class DBProcessor(Processor):
    def get_listings(self):
        """
        Return a large set of paginated data.
        """
        return DBListings()
UPDATE: Just tested the above code; it works.
It depends on which list operations you want to support. Some of them will simply consume the generator, because they fall back to iterating over it.
If you know the result of the operation (for example len) beforehand, you can bypass it by creating a GeneratorContainer:
class GeneratorContainer():
    def __init__(self, generator, length):
        self.generator = generator
        self.length = length

    def __iter__(self):
        return self.generator

    def __len__(self):
        return self.length


result = GeneratorContainer(DBProcessor().get_listings(), length)
# But you need to know the length value.
Calling len will then not try to iterate over the generator. But you can always just create a list so that the data will not be exhausted:
result = list(DBProcessor().get_listings())
and use it as a list without the generator advantages and disadvantages.
If you wish to convert the generator (iterator in non-Python speak) produced by get_listings to a list, simply use
listings = list(get_listings())
I know Python has some lazy implementations, and as such, I was wondering if it is possible to use circular programming in Python.
If it isn't, why?
I think you mean co-routines, not co-recursion. Yes, it's perfectly possible in Python, since PEP 342: Coroutines via Enhanced Generators has been implemented.
The canonical example is the consumer decorator:
def consumer(func):
    def wrapper(*args, **kw):
        gen = func(*args, **kw)
        next(gen)
        return gen
    wrapper.__name__ = func.__name__
    wrapper.__dict__ = func.__dict__
    wrapper.__doc__ = func.__doc__
    return wrapper
Using such a consumer then lets you chain filters and push information through them, acting as a pipeline:
import os
from itertools import product

@consumer
def thumbnail_pager(pagesize, thumbsize, destination):
    while True:
        page = new_image(pagesize)
        rows, columns = pagesize / thumbsize
        pending = False
        try:
            for row, column in product(range(rows), range(columns)):
                thumb = create_thumbnail((yield), thumbsize)
                page.write(
                    thumb, column * thumbsize.x, row * thumbsize.y
                )
                pending = True
        except GeneratorExit:
            # close() was called, so flush any pending output
            if pending:
                destination.send(page)
            # then close the downstream consumer, and exit
            destination.close()
            return
        else:
            # we finished a page full of thumbnails, so send it
            # downstream and keep on looping
            destination.send(page)

@consumer
def jpeg_writer(dirname):
    fileno = 1
    while True:
        filename = os.path.join(dirname, "page%04d.jpg" % fileno)
        write_jpeg((yield), filename)
        fileno += 1

# Put them together to make a function that makes thumbnail
# pages from a list of images and other parameters.
#
def write_thumbnails(pagesize, thumbsize, images, output_dir):
    pipeline = thumbnail_pager(
        pagesize, thumbsize, jpeg_writer(output_dir)
    )
    for image in images:
        pipeline.send(image)
    pipeline.close()
The central principles are Python generators and yield expressions; the latter are what let a generator receive information from its caller.
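As a bare-bones illustration of that mechanism (separate from the thumbnail pipeline above), a generator can be primed with next() and then fed values through send():
def printer():
    while True:
        received = (yield)       # yield expression: pause here and wait for a value
        print('got:', received)

gen = printer()
next(gen)              # prime the generator so it is paused at the first yield
gen.send('hello')      # prints: got: hello
gen.send('world')      # prints: got: world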
Edit: Ah, co-recursion is indeed a different concept. Note that the Wikipedia article uses Python for its examples and, moreover, uses Python generators.
Did you try it?
def a(x):
    if x == 1:
        return
    print("a", x)
    b(x - 1)

def b(x):
    if x == 1:
        return
    print("b", x)
    a(x - 1)

a(10)
As a side note, Python does not optimize tail calls, so this mutual recursion will fail for x > 1000 (although this limit is configurable).
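For reference, that limit lives in the sys module; a small sketch (raising it trades a clean RecursionError for the risk of overflowing the C stack):
import sys

print(sys.getrecursionlimit())   # typically 1000
sys.setrecursionlimit(5000)      # allow deeper (mutual) recursion
a(4000)                          # now works with the functions defined above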