I'm looking for a way to reverse a generator object. I know how to reverse sequences:
foo = imap(seq.__getitem__, xrange(len(seq)-1, -1, -1))
But is something similar possible with a generator as the input and a reversed generator as the output (len(seq) stays the same, so the value from the original sequence can be used)?
You cannot reverse a generator in any generic way except by casting it to a sequence and creating an iterator from that. Later terms of a generator cannot necessarily be known until the earlier ones have been calculated.
Even worse, you can't know whether your generator will ever hit a StopIteration exception until you hit it, so there's no way to know whether there will even be a first term in your sequence.
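For illustration (my example, not part of the original answer): each value below depends on the previous one, so the last value cannot be produced without computing all of the earlier ones first:

def running_total(values):
    total = 0
    for v in values:
        total += v  # every yielded value depends on all earlier input
        yield total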
The best you could do would be to write a reversed_iterator function:
def reversed_iterator(it):
    # "it" rather than "iter", to avoid shadowing the builtin
    return reversed(list(it))
EDIT: You could also, of course, replace reversed in this with your imap-based iterative version, to save one list creation.
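That would look something like this (a sketch assuming Python 2, where imap and xrange exist):

from itertools import imap  # Python 2 only

def reversed_iterator(it):
    seq = list(it)  # the one remaining list creation
    return imap(seq.__getitem__, xrange(len(seq) - 1, -1, -1))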
reversed(list(input_generator)) is probably the easiest way.
There's no way to get a generator's values in "reverse" order without gathering all of them into a sequence first, because generating the second item could very well rely on the first having been generated.
You have to walk through the whole generator anyway to get its last item (the first item of the reversed output), so you might as well make a list. Try
reversed(list(g))
where g is a generator.
reversed(tuple(g))
would work as well (I didn't check to see if there is a significant difference in performance).
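If you do want to check, timeit makes a quick comparison easy (a sketch; absolute numbers will vary by machine):

import timeit

list_version = "reversed(list(i for i in range(1000)))"
tuple_version = "reversed(tuple(i for i in range(1000)))"
print(timeit.timeit(list_version, number=10000))   # list-based
print(timeit.timeit(tuple_version, number=10000))  # tuple-based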
def reverseGenerator(gen):
    new = list(gen)  # materialize the generator
    if not new:      # guard: stop the recursion on an empty input
        return
    yield new[-1]    # yield the last element directly, no new[::-1] copy
    new.pop()
    yield from reverseGenerator(iter(new))
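For example:

print(list(reverseGenerator(i for i in range(5))))  # [4, 3, 2, 1, 0]

Be aware that this is quadratic (each recursion level copies the remaining items into a fresh list) and that the recursion depth grows with the input's length, so reversed(list(gen)) remains the practical choice.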
Related
Everyone says you lose the benefit of generators if you put the result into a list.
But don't you need a list, or some other sequence, to even have a generator to begin with? For example, if I need to go through the files in a directory, don't I have to make them into a list first, as with os.listdir()? If so, how is a generator more efficient? (I always work with strings and files, so I really hate that all the examples use range and integers, but I digress.)
Taking it a step further, the mere presence of the yield keyword is supposed to turn a function into a generator. So if I do:
for x in os.listdir():
    yield x
Is a list still being created? Or is os.listdir() itself now also magically a generator? Is it possible that, os.listdir() not having been called yet, there really isn't a list here yet?
Finally, we are told that iterators need iter() and next() methods. But doesn’t that also mean they need an index? If not, what is next() operating on? How does it know what is next without an index? Before 3.6, dict keys had no order, so how did that iteration work?
No.
See, there's no list here:
def one():
    while True:
        yield 1
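(As for the os.listdir() part of the question: os.listdir() really does build the whole list in memory, and wrapping it in a generator function doesn't change that. The lazy alternative, for what it's worth, is os.scandir(), available since Python 3.5; a small sketch:)

import os

def lazy_names(path="."):
    # os.scandir() yields directory entries one at a time; no list is built
    for entry in os.scandir(path):
        yield entry.name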
Index and next() are two independent tools to perform an iteration. Again, if you have an object such that its iterator's next() always returns 1, you don't need any indices.
In deeper detail...
See, technically you can always associate a list and an index with any generator or iterator: simply write down all its returned values, and you get an at most countable set of values a₀, a₁, ... But those are merely a mathematical formalism that need not have anything in common with how a real generator works. For instance, take a generator that always yields one. You can count how many ones you have gotten from it so far and call that an index. You can write down all those ones, comma-separated, and call that a list. Do those two objects correctly describe the output your generator has produced so far? Apparently so. Are they in the least bit important for the generator itself? Not really.
Of course, a real generator will probably have state (you can call it an index, provided you don't insist that an index must be a non-negative integral scalar; if the generator works deterministically, you can write down all its states, number them, and call the current state's number an index, which is approximately that). Generators always have some source for their states and returned values. So indices and lists can be regarded as abstractions that describe an object's behaviour, but they are not necessarily the concrete implementation details that are actually used.
Consider an unbuffered file reader. It retrieves a single byte from the disk and immediately yields it. There's no real list in memory, only the file contents on the disk (there may even be none, if our file reader is connected to a network socket instead of a real disk drive, and the Oracle of Delphi is at the connection's other end). You can call the file position an index, until you are reading stdin, which is only forward-traversable, so indexing it makes no real physical sense; the same goes for network connections over an unreliable protocol, BTW.
Something like this.
1) This is wrong; a list is just the easiest example with which to explain a generator. Think of the eight queens problem, where each position is yielded as soon as the program finds it: I can't recognize a result list anywhere. Note that the standard library often offers iterators as alternatives (islice() vs. slice()), and an easy example not representable by a list is itertools.cycle().
In consequence, 2) and 3) are also wrong.
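A quick illustration of that last point (my example, not the original poster's): itertools.cycle() yields values forever, so no finite list could ever stand behind it:

from itertools import cycle, islice

lights = cycle(["red", "green", "amber"])  # infinite iterator
print(list(islice(lights, 7)))             # take just the first 7 values
# ['red', 'green', 'amber', 'red', 'green', 'amber', 'red']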
for j in xrange(len(self.segments)):
    # ... some code here ...
    if condition:
        self.segments.append(segB)
So, I have a for loop over xrange(len(self.segments)), where self.segments is growing!
Do you think there is a problem?
You won't iterate over the indices that correspond to the elements that you have added because xrange is evaluated when the loop starts. It doesn't get re-evaluated after that.
Whether or not this is wrong depends entirely on what you're trying to do. If you want to iterate over the list's elements (and you want to catch the ones that you're adding as well), then you can probably get away with:
for item in self.segments:
    # ...
    if whatever:
        self.segments.append(segB)
This is because lists iterate in a predictable way. It only works since you're adding to the end of the list -- it wouldn't necessarily work if you .insert() data somewhere in the middle.
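A small demonstration of that behaviour (made-up values, just to show the mechanics):

segments = [1, 2, 3]
for item in segments:
    if item == 1:
        segments.append(99)  # appended mid-iteration
print(segments)  # [1, 2, 3, 99] -- and the loop visited 99 as well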
I have a Python list of objects that could be pretty long. At particular times, I'm interested in all of the elements in the list that have a certain attribute, say flag, that evaluates to False. To do so, I've been using a list comprehension, like this:
objList = list()
# ... populate list
sublist = [x for x in objList if not x.flag]
Which seems to work well. After forming the sublist, I have a few different operations that I might need to do:
Subscript the sublist to get the element at index ind.
Calculate the length of the sublist (i.e. the number of elements that have flag == False).
Search the sublist for the first instance of a particular object (i.e. using the list's .index() method).
I've implemented these using the naive approach of just forming the sublist and then using its methods to get at the data I want. I'm wondering if there are more efficient ways to go about these. #1 and #3 at least seem like they could be optimized, because in #1 I only need the first ind + 1 matching elements of the sublist, not necessarily the entire result set, and in #3 I only need to search through the sublist until I find a matching element.
Is there a good Pythonic way to do this? I'm guessing I might be able to use the () syntax in some way to get a generator instead of creating the entire list, but I haven't happened upon the right way yet. I obviously could write loops manually, but I'm looking for something as elegant as the comprehension-based method.
If you need to do any of these operations more than a couple of times, the overhead of the other methods will be higher; the list is the best way. It's also probably the clearest, so if memory isn't a problem, then I'd recommend just going with it.
If memory/speed is a problem, then there are alternatives - note that speed-wise, these might actually be slower, depending on the common case for your software.
For your scenarios:
from itertools import dropwhile, islice

def nth(iterable, n, default=None):
    return next(islice(iterable, n, None), default)  # itertools docs recipe

# value = sublist[n]
value = nth((x for x in objList if not x.flag), n)
# value = len(sublist)
value = sum(not x.flag for x in objList)
# value = sublist.index(target); note this yields the element, not its position
value = next(dropwhile(lambda x: x != target, (x for x in objList if not x.flag)))
Using itertools.dropwhile() and the nth() recipe from the itertools docs.
I'm going to assume you might do any of these three things, and you might do them more than once.
In that case, what you want is basically to write a lazily evaluated list class. It would keep two pieces of data, a real list cache of evaluated items, and a generator of the rest. You could then do ll[10] and it would evaluate up to the 10th item, ll.index('spam') and it would evaluate until it finds 'spam', and then len(ll) and it would evaluate the rest of the list, all the while caching in the real list what it sees so nothing is done more than once.
Constructing it would look like this:
LazyList(x for x in obj_list if not x.flag)
But nothing would actually be computed until you actually start using it as above.
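A minimal sketch of such a LazyList (my illustration of the idea, not tested production code):

class LazyList:
    def __init__(self, iterable):
        self._cache = []             # items evaluated so far
        self._rest = iter(iterable)  # the not-yet-evaluated remainder

    def _fill(self, n=None):
        # Evaluate until the cache holds n + 1 items (everything if n is None).
        while n is None or len(self._cache) <= n:
            try:
                self._cache.append(next(self._rest))
            except StopIteration:
                break

    def __getitem__(self, i):
        self._fill(i)
        return self._cache[i]

    def __len__(self):
        self._fill()  # forces full evaluation
        return len(self._cache)

    def index(self, value):
        i = 0
        while True:
            self._fill(i)
            if i >= len(self._cache):
                raise ValueError("%r is not in list" % (value,))
            if self._cache[i] == value:
                return i
            i += 1

Each operation evaluates only as much of the underlying generator as it needs, and everything seen along the way is cached, so nothing is computed twice.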
Since you commented that your objList can change, if you don't also need to index or search objList itself, then you might be better off just storing two different lists, one with .flag = True and one with .flag = False. Then you can use the second list directly instead of constructing it with a list comprehension each time.
If this works in your situation, it is likely the most efficient way to do it.
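For instance (a sketch, using the question's attribute name):

flagged, unflagged = [], []
for x in objList:
    (flagged if x.flag else unflagged).append(x)
# then use unflagged directly: unflagged[ind], len(unflagged), unflagged.index(target)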
Possible Duplicate:
sorted() using Generator Expressions Rather Than Lists
We all know using generators instead of instantiating lists all the time saves time and memory, especially if we use comprehensions a lot.
Here's a question though, consider the following code:
output = SomeExpensiveCallEgDatabase()
results = [result[0] for result in output]
return sorted(results)
The call to sorted will return a sorted list of the results. Would it be better or worse to declare results as below and then call sorted?
results = (result[0] for result in output)
My guess is the call to sorted() would traverse the generator and instantiate a list itself in order to run quicksort or mergesort on it. So there would be no advantage in using the generator here. Is this assumption correct?
I believe your assumption is true, since there is no easy way of ordering the collection without first having the whole list in memory (at least certainly not with the default sorting algorithm, Timsort, if I'm not mistaken).
Check this out:
sorted() using Generator Expressions Rather Than Lists
To create the new List, the builtin sorted method uses PySequence_List:
PyObject* PySequence_List(PyObject *o)
Return value: New reference.
Return a list object with the same contents as the arbitrary sequence o. The returned list is guaranteed to be new.
Pros and cons of both approaches:
Memory-wise:
With the generator version, the list built by sorted() is the only list that is ever completely in memory at a given time; with the list version, both results and the sorted copy exist at once.
This makes the generator version more efficient memory-wise.
Speed:
Here the version with the whole list wins.
To create a new list from a generator, an empty list must be created (or at best one holding the first element), and each following element appended to it, with whatever resizing steps that may trigger.
To create a new list from an existing list, the size is known beforehand, so it can be allocated at once and each entry assigned (possibly there are other optimizations at work here, but I can't back that up).
So regarding speed, the list wins.
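A quick way to see both effects is timeit (a sketch; absolute numbers will vary by machine and data size):

import timeit

setup = "data = list(range(100000))"
print(timeit.timeit("sorted([x for x in data])", setup=setup, number=50))  # list version
print(timeit.timeit("sorted(x for x in data)", setup=setup, number=50))   # generator version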
The answer to "what's best" comes down to the most common answer in any field of engineering... it depends...
No, you are still creating a brand new list with sorted().
output = SomeExpensiveCallEgDatabase()
results = [result[0] for result in output]
results.sort()
return results
would be closer to the generator version.
I believe it's better to use the generator version because some future version of Python may be able to take advantage of this to work more efficiently. It's always nice to get a speed up for free.
Yes, you are correct (although I believe the sorting routine is still called tim-sort, after uncle timmy <wink-ly y'rs>)
Here's a example of what I want to do
spam_list = ["We", "are", "the", "knights", "who", "say", "Ni"]
spam_order = [0,1,2,4,5,6,3]
spam_list.magical_sort(spam_order)
print(spam_list)
["We", "are", "the", "who", "say", "Ni", "knights"]
I can do it with enumerate, list and so on, but I would like to modify spam_list in place, like list.sort() does, rather than getting a copy the way sorted() does.
Edit : pushed a string example to avoid confusion between indices and values of spam_list
Edit : it turned out this is a duplicate of Python sort parallel arrays in place?. Well, I can't bring myself to delete so much effort just for SO consistency.
You could try:
spam_list = [spam_list[i] for i in spam_order]
You can give a special key to the sort function:
order = dict(zip(spam_list, spam_order))
spam_list.sort(key=order.get)
Edit: As #ninjagecko points out in his answer, this is not really efficient, as it copies both lists to create the dictionary for the lookup. However, with the modified example given by the OP, this is the only way, because one has to build some index. The upside is that, at least for the strings, the values will not be copied, so the overhead is just that of the dictionary itself.
but I would like to directly affect spam_list, like list.sort() and not copy it like sorted()
There is ONLY ONE SOLUTION, that does exactly what you ask. Every single other solution is implicitly making a copy of one or both lists (or turning it into a dict, etc.). What you are asking for is a method which sorts two lists in-place, using O(1) extra space, using one list as the keys of the other. I personally would just accept the extra space complexity, but if you really wanted to, you could do this:
(edit: it may be that the original poster doesn't want .sort because it's efficient, but rather because it modifies state; in general this is a dangerous thing to want, and non-low-level languages attempt to avoid or even ban it, but the solutions which use slice assignment will achieve "in-place" semantics)
Create a custom dictionary subclass (effectively a Zip class) which is backed by both lists you are sorting.
Indexing myZip[i] -> results in the tuple (list1[i],list2[i])
Assignment myZip[i]=(x1,x2) -> dispatches into list1[i]=x1, list2[i]=x2.
Use that to do myZip(spam_list,spam_order).sort(), and now both spam_list and spam_order are sorted in-place
Example:
#!/usr/bin/python3
class LiveZip(list):
    def __init__(self, list1, list2):
        self.list1 = list1
        self.list2 = list2

    def __len__(self):
        return len(self.list1)

    def __getitem__(self, i):
        return (self.list1[i], self.list2[i])

    def __setitem__(self, i, pair):  # "pair" rather than shadowing builtin tuple
        x1, x2 = pair
        self.list1[i] = x1
        self.list2[i] = x2
spam_list = ["We", "are", "the", "knights", "who", "say", "Ni"]
spam_order = [0,1,2,4,5,6,3]
#spam_list.magical_sort(spam_order)
proxy = LiveZip(spam_order, spam_list)
Now let's see if it works...
#proxy.sort()
#fail --> oops, the internal implementation is not meant to be subclassed! lame
# It turns out that the python [].sort method does NOT work without passing in
# a list to the constructor (i.e. the internal implementation does not use the
# public interface), so you HAVE to implement your own sort if you want to not
# use any extra space. This is kind of dumb. But the approach above means you can
# just use any standard textbook in-place sorting algorithm:
def myInPlaceSort(x):
    # [replace with any in-place textbook sorting algorithm; one possibility follows]
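    # For instance (my insertion, not the original poster's code): a plain
    # insertion sort, which touches x only via __len__/__getitem__/__setitem__.
    for i in range(1, len(x)):
        j = i
        while j > 0 and x[j] < x[j - 1]:
            x[j], x[j - 1] = x[j - 1], x[j]  # swaps (order, word) pairs through the proxy
            j -= 1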
NOW it works:
myInPlaceSort(proxy)
print(spam_list)
Unfortunately there is no way to just sort one list in O(1) space without sorting the other; if you don't want to sort both lists, you might as well do your original approach which constructs a dummy list.
You can however do the following:
spam_list.sort(key=lambda x:x)
but if the key or cmp function makes any references to any collection (e.g. if you pass in a dict.__getitem__ of a dict you had to construct), this is no better than your original O(N)-space approach, unless you already happened to have such a dictionary lying around.
Turns out this is a duplicate question of Python sort parallel arrays in place?, but that question also had no correct answers except this one, which is equivalent to mine but without the sample code. Unless you are writing incredibly optimized or specialized code, I'd just use your original solution, which is equivalent in space complexity to the other solutions.
edit2:
As senderle pointed out, the OP doesn't want a sort at all, but rather wishes, I think, to apply a permutation. To achieve this, you can and SHOULD simply use the indexing that other answers suggest, [spam_list[i] for i in spam_order], but an explicit or implicit copy must still be made, because you still need the intermediate data. (Unrelated and for the record: applying the inverse permutation is, I think, the inverse of parallel-sorting with the identity, and you can use one to get the other, though sorting is less time-efficient. _, spam_order_inverse = parallelSort(spam_order, range(N)), then sort by spam_order_inverse. I leave the above discussion about sorting up for the record.)
edit3:
It is possible, however, to achieve an in-place permutation in O(#cycles) space, but with terrible time efficiency. Every permutation can be decomposed into disjoint sub-permutations applied in parallel on subsets; these subsets are called cycles or orbits, and the period of each sub-permutation equals its cycle's size. You thus take a leap of faith and do as follows:
Create a temp variable.
For index i = 0...N:
    Put x_i into temp, assign NULL to x_i
    Swap temp with x_p(i)
    Swap temp with x_p(p(i))
    ...
    Swap temp with x_p(..p(i)..), which is x_i
    Put a "do not repeat" marker on the smallest element you visited larger than i
Whenever you encounter a "do not repeat" marker, perform the loop again, but
without swapping, moving the marker to the smallest element larger than i
To avoid having to perform the loop again, use a Bloom filter
This will run in O(N^2) time and O(#cycles) space without a Bloom filter, or ~O(N) time and O(#cycles + bloomfilter_space) space if you use one.
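A minimal sketch of the cycle-following idea (my illustration, not the poster's code); it negates entries of the permutation as the "do not repeat" markers described above, then restores them:

def apply_permutation(values, perm):
    # Afterwards values[i] holds what values[perm[i]] held before,
    # i.e. the same result as [values[i] for i in perm], but in place.
    n = len(values)
    for start in range(n):
        if perm[start] < 0:  # "do not repeat" marker: cycle already handled
            continue
        i, saved = start, values[start]
        while True:
            j = perm[i]
            perm[i] = ~j     # mark position i as visited
            if j == start:   # cycle closed
                values[i] = saved
                break
            values[i] = values[j]
            i = j
    for i in range(n):       # undo the markers
        perm[i] = ~perm[i]

spam_list = ["We", "are", "the", "knights", "who", "say", "Ni"]
apply_permutation(spam_list, [0, 1, 2, 4, 5, 6, 3])
print(spam_list)  # ['We', 'are', 'the', 'who', 'say', 'Ni', 'knights']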
If the issue is specifically in-placeness and not memory usage per se -- if you want this to have side effects, in other words -- then you could use slice assignment. Stealing from Peter Collingridge:
other_spam_list = spam_list
spam_list[:] = [spam_list[i] for i in spam_order]
assert other_spam_list == spam_list
It seems you might even be able to do this with a generator expression! But I suspect this still implicitly creates a new sequence of some sort -- probably a tuple. If it didn't, I think it would exhibit wrong behavior; but I tested it, and its behavior seemed correct.
spam_list[:] = (spam_list[i] for i in spam_order)
Aha! See this excellent answer by the inimitable Sven Marnach -- generator slice assignment does indeed generate an implicit tuple. Which means it's safe, but not as memory efficient as you might think. Still, tuples are more memory efficient than lists, so the generator expression is preferable from that perspective.
map(lambda x:spam_list[x], spam_order)
If you actually don't care about the efficiency at all, and just want in-place semantics (which is a bit odd, because there are entire programming languages dedicated to avoiding in-place semantics), then you can do this:
def modifyList(toModify, newList):
    toModify[:] = newList

def permuteAndUpdate(toPermute, permutation):
    modifyList(toPermute, [toPermute[i] for i in permutation])

permuteAndUpdate(spam_list, spam_order)
print(spam_list)
# ['We', 'are', 'the', 'who', 'say', 'Ni', 'knights']
Credit goes to senderle for recognizing that this is what the OP may actually be after; he should feel free to copy this answer into his own. You shouldn't accept this answer unless you really prefer it over his.
You may use numpy.
import numpy as np
spam_list = list(np.array(spam_list)[spam_order])
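One caveat worth noting: calling list() on a numpy array yields numpy scalar types; .tolist() converts back to plain Python objects instead:

spam_list = np.array(spam_list)[spam_order].tolist()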