Is there a way to make a thread go to sleep if the list is empty and wake it up again when there are items? I don't want to use a Queue, since I want to be able to index into the data structure.
Yes, the solution will probably involve a threading.Condition variable as you note in comments.
Without more information or a code snippet, it's difficult to know what API suits your needs. How are you producing new elements? How are you consuming them? At base, you could do something like this:
cv = threading.Condition()
elements = []  # elements is protected by, and signaled by, cv

def produce(...):
    with cv:
        ... add elements somehow ...
        cv.notify_all()

def consume(...):
    with cv:
        while len(elements) == 0:
            cv.wait()
        ... remove elements somehow ...
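As a concrete usage sketch, assuming the ellipses are filled in so that produce(item) appends one item and consume() returns one removed element, a consumer thread blocks in consume() until the producer signals:

import threading

def consumer():
    for _ in range(3):
        print("consumed", consume())  # blocks while the list is empty

t = threading.Thread(target=consumer)
t.start()
for i in range(3):
    produce(i)  # appends an item and wakes the consumer
t.join()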
I would go with this:
import threading

class MyList(list):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._cond = threading.Condition()

    def append(self, item):
        with self._cond:
            super().append(item)
            self._cond.notify_all()

    def pop_or_sleep(self):
        with self._cond:
            while not len(self):
                self._cond.wait()
            return self.pop()
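A brief usage sketch: the consumer thread sleeps inside pop_or_sleep() until another thread appends (the item value and the delay are only illustrative):

import threading
import time

items = MyList()

def consumer():
    print("got", items.pop_or_sleep())  # sleeps until an item arrives

t = threading.Thread(target=consumer)
t.start()
time.sleep(0.5)
items.append("hello")  # wakes the waiting consumer
t.join()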
I'm using tornado.websocket, where the class methods are overrides of the WebSocketHandler methods. Anyway, my code looks like this:
class SocketHandler(tornado.websocket.WebSocketHandler):
    current_ninja_pool = enumerate(return_dependency_lvl_0())
    current_ninja = next(current_ninja_pool)
    file_to_upload = []

    def check_origin(self, origin):
        return True

    def open(self):
        logging.info("A client connected.")
        self.run()

    def run(self):
        if condition:
            do_this()
        else:
            do_that()
        self.current_ninja = next(self.current_ninja_pool)
        self.run()

    def on_message(self, message):
        do_a_lot_of_stuff()
        if message == 'next one':
            self.current_ninja = next(self.current_ninja_pool)

    def on_close(self):
        logging.info("A client disconnected")
So, what I want is to be able to iterate over my enumerate object, so that every element can be processed in run or on_message depending on how my client websocket answers. The problem is that I only want to iterate under particular conditions, and I don't have a clue how to do this. Since I'm not very familiar with the way you manipulate class and instance variables, I'm probably missing a point here.
Thank you
You need an iterator. Luckily, enumerate already returns an iterator; you just need to access that, rather than storing the current item.
I also suspect that current_ninja_pool should be an instance variable, not a class one (which would be shared across all instances of the class).
class SocketHandler(tornado.websocket.WebSocketHandler):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)  # let Tornado initialize the handler
        self.current_ninja_pool = enumerate(return_dependency_lvl_0())
        self.file_to_upload = []

    def run(self):
        item = next(self.current_ninja_pool)
        do_something_with(item)
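If the pool can run dry, next() raises StopIteration. A hedged variant of run (the close() fallback is my own illustration, not part of the original answer) uses next()'s default argument instead:

def run(self):
    item = next(self.current_ninja_pool, None)  # None once the pool is exhausted
    if item is None:
        self.close()  # hypothetical handling: close the socket when done
    else:
        do_something_with(item)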
I have the following inheritance:
class Processor(object):
    def get_listings(self):
        """
        returns a list of data
        """
        raise NotImplementedError()  # NotImplemented is a value, not an exception

    def run(self):
        for listing in self.get_listings():
            do_stuff(listing)

class DBProcessor(Processor):
    def get_listings(self):
        """
        return a large set of paginated data
        """
        ...
        for page in pages:
            for data in db.fetch_from_query(...):
                yield data
Although this works, it fails on len(self.get_listings()) or any other list operation.
My question is: how do I refactor my code so that DBProcessor.get_listings can handle list operations, but still returns a generator when iterated?
I think I got an idea:
class DBListings(object):
    def __iter__(self):
        for page in pages:
            for data in db.fetch_from_query(...):
                yield data

    def __len__(self):
        return db.get_total_from_query(...)
        """
        Or the following
        counter = 0
        for x in self:
            counter += 1
        return counter
        """

class DBProcessor(Processor):
    def get_listings(self):
        """
        return a large set of paginated data
        """
        return DBListings()
UPDATE: Just tested the above code, works.
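With that in place, len() and iteration both work. A short usage sketch; process is a hypothetical consumer, and pages/db come from the question's context:

listings = DBProcessor().get_listings()
print(len(listings))  # answered by __len__; no rows are fetched
for row in listings:  # __iter__ builds a fresh generator each time
    process(row)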
It depends on which list operations you want to support. Some of them will simply consume the generator, because they fall back to iterating over it.
If you know the result of the operation (for example len) beforehand you can just bypass it by creating a GeneratorContainer:
class GeneratorContainer:
    def __init__(self, generator, length):
        self.generator = generator
        self.length = length

    def __iter__(self):
        return self.generator

    def __len__(self):
        return self.length
result = GeneratorContainer(DBProcessor().get_listings(), length)
# But you need to know the length-value.
Calling len will then not try to iterate over the generator. But you can always just create a list so that the data will not be exhausted:
result = list(DBProcessor().get_listings())
and use it as a list without the generator advantages and disadvantages.
If you wish to convert the generator (iterator in non-Python speak) produced by get_listings to a list, simply use
listings = list(get_listings())
What I mean by a "forkable iterator": it is a regular iterator with a method fork(), which creates a new iterator that iterates from the current point of iteration of the original. Even if the original iterator is iterated further, the fork stays at the point where it was forked until it is itself iterated.
My practical use case:
I have a socket connection, and some "packets" are sent through it. The connection can be shared between "receivers", and each "packet" can be addressed to some "receiver". "Packets" can arrive out of order, so each "receiver" can potentially receive a packet meant for a different "receiver". More than that: if one "receiver" receives a "packet" for a different "receiver", that other receiver must still be able to read the packet.
So for that I want to implement such a forkable iterator, which will represent the connection; each receiver will make its own fork, read from it, and search for "packets" addressed to it.
Does somebody know any implementations of what I'm talking about?
You are looking for the itertools.tee() function:
Return n independent iterators from a single iterable.
Do take into account that the implementation will buffer data to service all child iterators:
This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored).
Also, you should only use the returned child iterators; iterating over the source iterator will not propagate the data to the tee() iterables.
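A minimal sketch of tee() in this role; the packet strings are made-up stand-ins for data read from the connection:

from itertools import tee

packets = iter(["for-a", "for-b", "for-a-again"])  # stand-in for the connection
recv_a, recv_b = tee(packets, 2)  # two independent forks

print(next(recv_a))  # for-a
print(next(recv_a))  # for-b (recv_a ran ahead)
print(next(recv_b))  # for-a (recv_b still starts at the fork point)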
That's my current implementation of a forkable iterator:
#!/usr/bin/env python
# coding=utf-8
from collections import Iterator, deque
import threading

class ForkableIterator(Iterator):
    def __init__(self, iterator, buffer=None, *args, **kwargs):
        self.iterator = iter(iterator)
        if buffer is None:
            self.buffer = deque()  # holds items not yet consumed by every fork
        else:
            self.buffer = buffer
        args = iter(args)
        # refs maps each fork to its position; 'base' compensates for
        # items dropped from the left of the shared buffer
        self.refs = kwargs.get('refs', next(args, {}))
        self.refs.setdefault('base', 0)
        self.pointer = kwargs.get('pointer', next(args, 0))
        self.lock = kwargs.get('lock', next(args, threading.Lock()))

    @property
    def pointer(self):
        return self.refs[self] + self.refs['base']

    @pointer.setter
    def pointer(self, value):
        self.refs[self] = value

    def __del__(self):
        del self.refs[self]

    def __iter__(self):
        return self

    def next(self):
        with self.lock:
            if len(self.buffer) - self.pointer == 0:
                elem = next(self.iterator)
                self.buffer.append(elem)
            else:
                if self.pointer == min(self.refs.itervalues()):
                    elem = self.buffer.popleft()
                    self.refs['base'] -= 1
                else:
                    elem = self.buffer[self.pointer]
            self.pointer += 1
            return elem

    def fork(self):
        return self.__class__(self.iterator, self.buffer,
                              refs=self.refs, pointer=self.pointer,
                              lock=self.lock)
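A usage sketch (Python 2, to match the implementation above); the values in the comments follow from the buffer logic:

it = ForkableIterator(range(5))
print(it.next())    # 0
fork = it.fork()    # the fork remembers the current position
print(it.next())    # 1
print(fork.next())  # 1 -- the fork resumes from where it was created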
Hi, I have a Python wrapper function which counts the number of times a function is called, and based on each count I perform some actions that write content to an HTML file. Is there a way to set the counter back to 1 after one iteration, so that when I start a second iteration the count of htmloverview() starts from the beginning instead of carrying over previous values?
Decorator:
from functools import wraps

def counter(func):
    @wraps(func)
    def tmp(*args, **kwargs):
        tmp.count += 1
        return func(*args, **kwargs)
    tmp.count = 0
    return tmp
@counter
def htmloverview(fileouthtml, resultfile, file, identical, namesBothSub):
    r = htmloverview.count
    if len(diff) == 0:
        if r == 1:
            s = '\n'.join([message, message1, '<td>', '0', '</td>', '</tr>'])
        else:
            s = '\n'.join(['<tr>', '<td>', message1, '<td>', '0', '</td>', '</tr>'])
        fileouthtml.write(s)
        fileouthtml.write('\n')
I can run the simulator n times, and each time I run it I want the counter of htmloverview to start from the beginning rather than continuing from the previous iteration. Is there a way to do this?
A class seems to be a better fit here than a decorator.
class Simulation(object):
    def __init__(self, func):
        self.count = 0
        self._func = func

    def reset_count(self):
        self.count = 0

    def run(self, *args, **kwargs):
        self.count += 1
        return self._func(*args, **kwargs)

sim = Simulation(htmloverview)
for i in range(100):  # first iteration
    sim.run(fileouthtml, resultfile, file, identical, namesBothSub)
print sim.count  # ran 100 times
sim.reset_count()
You could also do error handling and keep an error count, and you can have multiple instances of Simulation, each with its own value of count.
(Would have put this in a comment if I could)
I'm a bit unsure what you are trying to achieve. Do you want your function to have one behavior the first time you call it and another for all other calls, until you reset the state? Then you could either pass the state variable as a boolean parameter:
def htmloverview(fileouthtml, resultfile, file, identical, namesBothSub, first_call):  # first_call: True/False
or you could keep the state in a global variable:
htmloverview_count = 0

def htmloverview(fileouthtml, resultfile, file, identical, namesBothSub):
    global htmloverview_count
    if htmloverview_count > 0:
        <do something>
    htmloverview_count += 1
And reset the state from the calling context:
htmloverview_count = 0
If you have lots of functions with similar behavior it might make sense to use a decorator. Please elaborate your question: would any of the above solve your problem?
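If you go the decorator route, here is a minimal sketch of a counter decorator with a reset hook; the reset attribute is my own addition, not part of the question's code:

from functools import wraps

def counter(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.count += 1
        return func(*args, **kwargs)
    wrapper.count = 0
    wrapper.reset = lambda: setattr(wrapper, 'count', 0)
    return wrapper

@counter
def htmloverview(*args):
    pass  # body elided

htmloverview()
htmloverview.reset()  # count goes back to 0 before the next run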
I am maintaining a little library of useful functions for interacting with my company's APIs and I have come across (what I think is) a neat question that I can't find the answer to.
I frequently have to request large amounts of data from an API, so I do something like:
class Client(object):
    def __init__(self):
        self.data = []

    def get_data(self, offset=0):
        done = False
        while not done:
            data = get_more_starting_at(offset)
            self.data.extend(data)
            offset += 1
            if not data:
                done = True
This works fine and allows me to restart the retrieval where I left off if something goes horribly wrong. However, since Python functions are just regular objects, we can do stuff like:
def yo():
    yo.hi = "yo!"
    return None
and then we can interrogate yo about its properties later, like:
yo.hi => "yo!"
My question is: can I rewrite my class-based example to pin the data to the function itself, without referring to the function by name? I know I can do this by:
def get_data(offset=0):
    done = False
    get_data.data = []
    while not done:
        data = get_more_starting_from(offset)
        get_data.data.extend(data)
        offset += 1
        if not data:
            done = True
    return get_data.data
but I would like to do something like:
def get_data(offset=0):
    done = False
    self.data = []  # <===== this is the bit I can't figure out
    while not done:
        data = get_more_starting_from(offset)
        self.data.extend(data)  # <====== also this!
        offset += 1
        if not data:
            done = True
    return self.data  # <======== want to refer to the "current" object
Is it possible to refer to the "current" object by anything other than its name?
Something like "this", "self", or "memememe!" is what I'm looking for.
I don't understand why you want to do this, but it's what a fixed-point combinator allows you to do:
import functools

def Y(f):
    @functools.wraps(f)
    def Yf(*args):
        return inner(*args)
    inner = f(Yf)
    return Yf

@Y
def get_data(f):
    def inner_get_data(*args):
        # This is your real get_data function:
        # define it as normal,
        # but refer to it as 'f' inside itself.
        print 'setting get_data.foo to', args
        f.foo = args
    return inner_get_data

get_data(1, 2, 3)
print get_data.foo
So you call get_data as normal, and it "magically" knows that f means itself.
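Running the example prints setting get_data.foo to (1, 2, 3), and the final print then shows (1, 2, 3), confirming the attribute landed on the decorated function itself.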
You could do this, but (a) the data is not per-function-invocation, but per function, and (b) it's much easier to achieve this sort of thing with a class.
If you had to do it, you might do something like this:
def ybother(a, b, c, yrselflambda=lambda: ybother):
    yrself = yrselflambda()
    # other stuff
The lambda is necessary, because you need to delay evaluation of the term ybother until something has been bound to it.
Alternatively, and increasingly pointlessly:
from functools import partial
def ybother(a, b, c, yrself=None):
    # whatever
    yrself.data = []  # this will blow up if the default argument is used
    # more stuff

bothered = partial(ybother, yrself=ybother)
Or:
def unbothered(a, b, c):
    def inbothered(yrself):
        # whatever
        yrself.data = []
    return inbothered, inbothered(inbothered)
This last version gives you a different function object each time, which you might like.
There are almost certainly introspective tricks to do this, but they are even less worthwhile.
Not sure what doing it like this gains you, but what about using a decorator?
import functools

def add_self(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        if not getattr(f, 'content', None):
            f.content = []
        return f(f, *args, **kwargs)
    return wrapper

@add_self
def example(self, arg1):
    self.content.append(arg1)
    print self.content

example(1)
example(2)
example(3)
OUTPUT
[1]
[1, 2]
[1, 2, 3]