I came across a bit of code on Stack Overflow that raised two questions about the way deque works. I don't have enough reputation to ask "in situ", hence this question:
from collections import deque
from itertools import islice

def sliding_window(iterable, size=2, step=1, fillvalue=None):
    if size < 0 or step < 1:
        raise ValueError
    it = iter(iterable)
    q = deque(islice(it, size), maxlen=size)
    if not q:
        return  # empty iterable or size == 0
    q.extend(fillvalue for _ in range(size - len(q)))  # pad to size
    while True:
        yield iter(q)  # iter() to avoid accidental outside modifications
        q.append(next(it))
        q.extend(next(it, fillvalue) for _ in range(step - 1))
The code computes a sliding window of a given size over a sequence.
The steps I don't understand are, first:
q = deque(islice(it, size), maxlen=size)
What is the use of maxlen here? Isn't islice always going to output an iterable of at most length size?
And second:
yield iter(q) # iter() to avoid accidental outside modifications
Why do we need to turn the deque into an iterator to avoid "accidental outside modifications"?
To answer the second part of the question: Python passes references to objects around, so the q yielded by the generator refers to the very deque the function keeps using internally, and any method call that mutates it would break the generator's algorithm. When you wrap q in iter(), what you yield is an iterator over the deque: the caller can read elements from it, but cannot change them or alter their order (writes are not possible through an iterator). So it's good practice to protect the container held internally by the generator from accidental damage.
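A minimal sketch of the difference (illustrative, not from the original code):

from collections import deque

q = deque([1, 2, 3])

window = q       # a caller holding this reference could call window.clear() or window.pop()
view = iter(q)   # a caller holding this can only read; iterators have no mutating methods

print(list(view))  # [1, 2, 3]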
To answer the first part of your question: you're right that the initial islice yields at most size items, so maxlen does nothing during construction. It matters for the later q.append() and q.extend() calls: with maxlen set, the deque never exceeds that size as new items are appended, and the oldest items are discarded automatically, which is exactly what makes the window slide.
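A quick demonstration of that behaviour:

from collections import deque

q = deque([1, 2, 3], maxlen=3)
q.append(4)  # the oldest item (1) is discarded to respect maxlen
print(q)     # deque([2, 3, 4], maxlen=3)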
Consider the following code snippet.
from typing import Iterable

def geometric_progression(
    start: float, multiplier: float, num_elements: int
) -> Iterable[float]:
    assert num_elements >= 0
    if num_elements > 0:
        yield start
        yield from geometric_progression(
            start * multiplier, multiplier, num_elements - 1
        )
This function returns the first num_elements of the geometric progression starting with start and multiplying by multiplier each time. It's easy to see that the last element will be passed through one yield-statement and num_elements-1 yield-from-statements. Does this function have O(num_elements) time complexity, or does it have O(num_elements**2) time complexity due to a "ladder" of nested yield-from-statements of depths 0, 1, 2, ..., num_elements-2, num_elements-1?
EDIT: I've come up with a simpler code snippet to demonstrate what I am asking.
from typing import Any, Iterable

def identity_with_nested_yield_from(depth: int, iterable: Iterable[Any]) -> Iterable[Any]:
    assert depth >= 1
    if depth == 1:
        yield from iterable
    else:
        yield from identity_with_nested_yield_from(depth - 1, iterable)
Is this function O(depth + length of iterable), or is it O(depth * length of iterable)?
I could've sworn there was an optimization in place to shortcut these kinds of yield from chains, but testing shows no such optimization, and I couldn't find anything in the places I thought the optimization was implemented either.
The generators on each level of a yield from chain must be suspended and resumed individually to pass yielded and sent values up and down the chain, so your function has O(num_elements**2) time complexity. It also hits Python's recursion limit (raising RecursionError) once the chain reaches a depth of about 1000.
yield from is formally equivalent to a loop of response = yield child.send(response), plus error propagation and handling. When consumed in iteration, the response is always None and no errors are propagated/handled, which makes it equivalent to a plain for loop:
# `yield from child` without error handling/response
for x in child:
    yield x
Thus, each yield from has the time/space complexity of iterating its argument. Stacking yield from of a size n child a total of m times thus has a time complexity of O(nm).
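A rough benchmark sketch (illustrative, not from the original answer) makes the growth visible: for a fixed-size iterable, the time to drain the chain grows with its depth.

import timeit

def identity_with_nested_yield_from(depth, iterable):
    if depth == 1:
        yield from iterable
    else:
        yield from identity_with_nested_yield_from(depth - 1, iterable)

for depth in (1, 100, 500):
    t = timeit.timeit(
        lambda: sum(identity_with_nested_yield_from(depth, range(1000))),
        number=20,
    )
    print(depth, round(t, 3))  # time grows with depth even though the data size is fixed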
Does anyone understand the following iterative algorithm for producing all permutations of a list of numbers?
I do not understand the logic within the while len(stack) loop. Can someone please explain how it works?
# Non-recursive.
# @param nums: a list of integers
# @return: a list of permutations
def permute(self, nums):
    if nums is None:
        return []
    nums = sorted(nums)
    permutation = []
    stack = [-1]
    permutations = []
    while len(stack):
        index = stack.pop()
        index += 1
        while index < len(nums):
            if nums[index] not in permutation:
                break
            index += 1
        else:
            if len(permutation):
                permutation.pop()
            continue
        stack.append(index)
        stack.append(-1)
        permutation.append(nums[index])
        if len(permutation) == len(nums):
            permutations.append(list(permutation))
    return permutations
I'm just trying to understand the code above.
As mentioned in the comments on your question, stepping through the code with a debugger may be the most helpful way to understand it. However, let me provide a high-level perspective on what your code does.
First of all, although there are no recursive calls to the function permute, the code you provided is effectively recursive; all it does is keep its own stack instead of using the one the runtime provides. Specifically, the variable stack keeps the recursive state, so to speak, that would otherwise be passed from one recursive call to the next. You could, and perhaps should, consider each iteration of the outer while loop in permute as a recursive call. If you do, you will see that the outer while loop 'recursively' traverses the permutations of nums in a depth-first manner.
Noticing this, it's fairly easy to figure out what each 'recursive call' does. The variable permutation holds the current permutation of nums being formed as the while loop progresses, and permutations stores all the complete permutations found so far. As you may observe, permutations is updated only when len(permutation) equals len(nums), which can be considered the base case of the recurrence implemented with the custom stack. Finally, the inner while loop picks which element of nums to append next to the current permutation (i.e. the one stored in permutation).
That is about it, really. You can figure out exactly what happens on the lines that maintain stack by using a debugger, as suggested. As a final note, let me repeat that I, personally, would not consider this implementation non-recursive; it just so happens that, instead of using the call stack provided by the runtime, this recursive solution keeps its own. To see what a properly non-recursive solution looks like, observe the difference between the recursive and iterative solutions to finding the nth Fibonacci number below. The non-recursive version keeps no stack; instead of dividing the problem into smaller instances of itself (recursion), it builds the solution up from smaller solutions (dynamic programming).
def recursive_fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return recursive_fib(n - 1) + recursive_fib(n - 2)

def iterative_fib(n):
    if n == 0:
        return 0
    f_0 = 0
    f_1 = 1
    for _ in range(n - 1):  # build up from fib(1) to fib(n)
        f_2 = f_1 + f_0
        f_0 = f_1
        f_1 = f_2
    return f_1
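A quick illustrative check that the two versions agree:

for n in range(11):
    assert recursive_fib(n) == iterative_fib(n)
print([iterative_fib(n) for n in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]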
The answer from @ilim is correct and should be the accepted answer, but I just wanted to add another point that wouldn't fit in a comment. While I imagine you are studying this algorithm as an exercise, it should be pointed out that a better way to proceed, depending on the size of the list, may be to use itertools' permutations() function:
import itertools

print(list(itertools.permutations([1, 2, 3])))
Testing on my machine with a list of 11 items (about 39.9 million permutations) took 1.7 seconds with itertools.permutations(x), but 76 seconds using the custom solution above. Note however that with 12 items (about 479 million permutations) the itertools version blows up with a MemoryError, because the list comprehension materialises every permutation at once. If you need to generate permutations of that size efficiently, you may be better off dropping to native code.
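Since itertools.permutations is itself a lazy iterator, the memory blow-up can be avoided by streaming the permutations instead of collecting them into a list (a sketch; iterating all 479 million still costs time, just not memory):

import itertools

count = 0
for p in itertools.permutations(range(12)):
    count += 1  # do real work per permutation here instead of just counting
print(count)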
I am passing the result of itertools.zip_longest to itertools.product, but I get errors when it reaches the end and finds None.
The error I get is:
Error: (<class 'TypeError'>, TypeError('sequence item 0: expected str instance, NoneType found',), <traceback object>)
If I use zip instead of itertools.zip_longest then I don't get all the items.
Here is the code I am using to generate the zip:
import itertools

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    print(args)
    #return zip(*args)
    return itertools.zip_longest(*args)

sCharacters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~`!@#$%^&*()_-+={[}]|\"""':;?/>.<,"

for x in grouper(sCharacters, 4):
    print(x)
Here is the output; the first run uses itertools.zip_longest and the second plain zip. You can see the first has the None items and the second is missing the final item, the comma ','.
How can I get a zip of all the characters in the string without the None entries at the end?
Or, alternatively, how can I avoid this error?
Thanks for your time.
I've had to solve this in a performance-critical case before, so here is the fastest code I've found for doing this (it works no matter what values are in iterable):
from itertools import zip_longest

def grouper(n, iterable):
    fillvalue = object()  # guaranteed unique sentinel; cannot exist in iterable
    for tup in zip_longest(*(iter(iterable),) * n, fillvalue=fillvalue):
        if tup[-1] is fillvalue:
            yield tuple(v for v in tup if v is not fillvalue)
        else:
            yield tup
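For example (assuming the grouper above):

print(list(grouper(4, "abcdefghij")))
# [('a', 'b', 'c', 'd'), ('e', 'f', 'g', 'h'), ('i', 'j')]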
That zip_longest-based grouper is, as far as I can tell, unbeatable when the input is long enough and the chunk sizes are small enough. For cases where the chunk size is fairly large, it can lose out to this even uglier variant, but usually not by much:
from future_builtins import map  # Only on Py2, and required there
from itertools import islice, repeat, starmap, takewhile
from operator import truth  # Faster than bool when guaranteed non-empty call

def grouper(n, iterable):
    '''Returns a generator yielding n sized groups from iterable.

    For iterables not evenly divisible by n, the final group will be undersized.
    '''
    # Can add tests to special case other types if you like, or just
    # use tuple unconditionally to match `zip`
    rettype = ''.join if type(iterable) is str else tuple
    # Keep islicing n items and converting to groups until we hit an empty slice
    return takewhile(truth, map(rettype, starmap(islice, repeat((iter(iterable), n)))))
Either approach seamlessly leaves the final element incomplete if there aren't sufficient items to complete the group. It runs extremely fast because literally all of the work is pushed to the C layer in CPython after "set up", so however long the iterable is, the Python level work is the same, only the C level work increases. That said, it does a lot of C work, which is why the zip_longest solution (which does much less C work, and only trivial Python level work for all but the final chunk) usually beats it.
The slower, but more readable, equivalent of option #2 (skipping the dynamic return type in favor of just tuple) is:
def grouper(n, iterable):
    iterable = iter(iterable)
    while True:
        x = tuple(islice(iterable, n))
        if not x:
            return
        yield x
Or more succinctly with Python 3.8+'s walrus operator:
def grouper(n, iterable):
    iterable = iter(iterable)
    while x := tuple(islice(iterable, n)):
        yield x
The length of sCharacters is 93, and 92 % 4 == 0: since zip stops when the shortest input sequence is exhausted, the 93rd character (the final comma) is dropped.
Beware: the Nones added by itertools.zip_longest are artificial fill values, which may not be the desired behaviour for everyone. That's why zip simply ignores the leftover values that can't form a complete group.
EDIT:
To be able to use zip you could pad your string with whitespace:
n = 4
sCharacters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~`!@#$%^&*()_-+={[}]|\"""':;?/>.<,"
if len(sCharacters) % n > 0:
    sCharacters = sCharacters + (" " * (n - len(sCharacters) % n))
EDIT2:
To obtain the missing tail when using zip, use code like this:
tail = '' if len(sCharacters) % n == 0 else sCharacters[-(len(sCharacters) % n):]
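Putting the two together, a sketch of grouping with plain zip plus the leftover tail:

n = 4
groups = list(zip(*[iter(sCharacters)] * n))  # full groups only; the remainder is dropped
tail = '' if len(sCharacters) % n == 0 else sCharacters[-(len(sCharacters) % n):]
if tail:
    groups.append(tuple(tail))  # append the undersized final group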
for x in records:
    data = {}
    for y in sObjectName.describe()['fields']:
        data[y['name']] = x[y['name']]
    ls.append(adapter.insert_posts(collection, data))
I want to execute the code ls.append(adapter.insert_posts(collection, x)) in batches of size 500, where x should contain 500 data dicts. I could create a list of 500 data dicts using a double for loop and a list, and then insert it, as in the attempt below. Is there a better way to do it?
for x in records:
    for i in xrange(0, len(records) / 500):
        for j in xrange(0, 500):
            l = []
            data = {}
            for y in sObjectName.describe()['fields']:
                data[y['name']] = x[y['name']]
            #print data
            l.append(data)
        ls.append(adapter.insert_posts(collection, data))
    for i in xrange(0, len(records) % 500):
        l = []
        data = {}
        for y in sObjectName.describe()['fields']:
            data[y['name']] = x[y['name']]
        #print data
        l.append(data)
        ls.append(adapter.insert_posts(collection, data))
The general structure I use looks like this:
worklist = [...]
batchsize = 500

for i in range(0, len(worklist), batchsize):
    batch = worklist[i:i+batchsize]  # the result might be shorter than batchsize at the end
    # do stuff with batch
Note that we're using the step argument of range to simplify the batch processing considerably.
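For example, with a toy worklist and a batchsize of 3:

worklist = list(range(7))
for i in range(0, len(worklist), 3):
    print(worklist[i:i + 3])
# [0, 1, 2]
# [3, 4, 5]
# [6]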
If you're working with sequences, the solution by @nneonneo is about as performant as you can get. If you want a solution that works with arbitrary iterables, you can look into some of the itertools recipes, e.g. grouper:
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)
I tend to not use this one because it "fills" the last group with None so that it is the same length as the others. I usually define my own variant which doesn't have this behavior:
def grouper2(iterable, n):
    iterable = iter(iterable)
    while True:
        tup = tuple(itertools.islice(iterable, 0, n))
        if tup:
            yield tup
        else:
            break
This yields tuples of the requested size. This is generally good enough, but, for a little fun we can write a generator which returns lazy iterables of the correct size if we really want to...
The "best" solution here I think depends a bit on the problem at hand -- particularly the size of the groups and objects in the original iterable and the type of the original iterable. Generally, these last 2 recipes will find less use because they're more complex and rarely needed. However, If you're feeling adventurous and in the mood for a little fun, read on!
The only real modification we need to get a lazy iterable instead of a tuple is the ability to "peek" at the next value in the islice to see if there is anything there. Here I just peek at the value: if it's missing, the islice is exhausted and we stop the generator (catching StopIteration explicitly, since PEP 479 -- the default in Python 3.7+ -- turns a StopIteration that escapes a generator body into a RuntimeError). If the value is there, I put it back using itertools.chain:
def grouper3(iterable, n):
    iterable = iter(iterable)
    while True:
        group = itertools.islice(iterable, n)
        try:
            item = next(group)  # the slice yielded nothing: the input is exhausted
        except StopIteration:
            return
        yield itertools.chain((item,), group)
Careful though: this last function only "works" if you completely exhaust each yielded iterable before moving on to the next one. In the extreme case where you don't exhaust any of the iterables, e.g. list(grouper3(..., n)), you'll get "m" iterables which yield only 1 item, not n (where "m" is the "length" of the input iterable). This behavior could actually be useful sometimes, but not typically. We can fix that too by using the itertools "consume" recipe (which also requires importing collections in addition to itertools):
def grouper4(iterable, n):
    iterable = iter(iterable)
    group = []
    while True:
        collections.deque(group, maxlen=0)  # fully consume whatever is left of the last group
        group = itertools.islice(iterable, n)
        try:
            item = next(group)  # the slice yielded nothing: the input is exhausted
        except StopIteration:
            return
        group = itertools.chain((item,), group)
        yield group
Of course, list(grouper4(..., n)) will return empty iterables -- any value not pulled from a group before the next call to next() (e.g. when the for loop cycles back to the start) will never get yielded.
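For example, grouper4 stays correct even when each group is only partially consumed (an illustrative run, assuming the definition above):

import collections
import itertools

for group in grouper4(range(10), 4):
    print(list(itertools.islice(group, 2)))  # take only 2 items from each group of 4
# [0, 1]
# [4, 5]
# [8, 9]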
I like @nneonneo's and @mgilson's answers, but doing this over and over again is tedious. The bottom of the itertools page in the Python 3 docs mentions the library more-itertools (I know this question was about Python 2 and this is a Python 3 library, but some might find this useful). The following seems to do what you ask:
from more_itertools import chunked  # Note: you might also want to look at ichunked

for batch in chunked(records, 500):
    # Do the work -- `batch` is a list of 500 records (or fewer for the last batch).
    ...
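For example (more-itertools must be installed, e.g. pip install more-itertools; chunked yields lists):

from more_itertools import chunked

print(list(chunked(range(7), 3)))
# [[0, 1, 2], [3, 4, 5], [6]]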
Maybe something like this?
l = []
for ii, x in enumerate(records):
    data = {}
    for y in sObjectName.describe()['fields']:
        data[y['name']] = x[y['name']]
    l.append(data)
    if (ii + 1) % 500 == 0:  # flush every full batch of 500
        ls.append(adapter.insert_posts(collection, l))
        l = []
if l:  # flush the final, possibly partial, batch
    ls.append(adapter.insert_posts(collection, l))
I think one particular scenario is not covered here. Say the batch size is 100 and your list has 103 elements; an approach that only handles full batches would miss the last 3 elements.
items = [...]  # 103 elements
total_size = len(items)
batch_size_count = 100

for start_index in range(0, total_size, batch_size_count):
    batch = items[start_index : start_index + batch_size_count]  # slicing also covers the short tail
Each slice can then be sent to a method call to complete the execution for all the elements.
I need a loop containing range(3,666,2) and 2 (for the sieve of Eratosthenes, by the way). This doesn't work ("AttributeError: 'range' object has no attribute 'extend'" ... or "append"):
primes = range(3,limit,2)
primes.extend(2)
How can I do it in the simple intuitive pythonesque way?
range() in Python 3 returns a dedicated immutable sequence object. You'll have to turn it into a list to extend it:
primes = list(range(3, limit, 2))
primes.append(2)
Note that I used list.append(), not list.extend() (which expects a sequence of values, not one integer).
However, you probably want to start your loop with 2, not end it with 2. Moreover, materializing the whole range into a list requires memory and kills the efficiency of the range object. Use iterator chaining instead:
from itertools import chain
primes = chain([2], range(3, limit, 2))
Now you can loop over primes without materializing a whole list in memory, and still include 2 at the start of the loop.
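For example (a sketch; limit is taken from the question):

from itertools import chain, islice

limit = 666
candidates = chain([2], range(3, limit, 2))
print(list(islice(candidates, 5)))  # [2, 3, 5, 7, 9]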
If you're only looping and don't want to materialise, then:
from itertools import chain
primes = chain([2], range(3, limit, 2))
I think the two makes more sense at the start though...