Is there a point to using nested iterators? - python

I was reading through some older code of mine and came across these lines:
itertools.starmap(lambda x, y: x + (y,),
                  itertools.izip(itertools.repeat(some_tuple,
                                                  len(list_of_tuples)),
                                 itertools.imap(lambda x: x[0],
                                                list_of_tuples)))
To be clear, I have some list_of_tuples from which I want to get the first item out of each tuple (the itertools.imap), I have another tuple that I want to repeat (itertools.repeat) such that there is a copy for each tuple in list_of_tuples, and then I want to get new, longer tuples based on the items from list_of_tuples (itertools.starmap).
For example, suppose some_tuple = (1, 2, 3) and list_of_tuples = [(1, other_info), (5, other), (8, 12)]. I want something like [(1, 2, 3, 1), (1, 2, 3, 5), (1, 2, 3, 8)]. This isn't the exact IO (it uses some pretty irrelevant and complex classes) and my actual lists and tuples are very big.
Is there a point to nesting the iterators like this? It seems to me like each function from itertools would have to iterate over the iterator I gave it and store the information from it somewhere, meaning that there is no benefit to putting the other iterators inside of starmap. Am I just completely wrong? How does this work?

There is no reason to nest iterators. Using variables won't have a noticeable impact on performance/memory:
first_items = itertools.imap(lambda x: x[0], list_of_tuples)
repeated_tuple = itertools.repeat(some_tuple, len(list_of_tuples))
items = itertools.izip(repeated_tuple, first_items)
result = itertools.starmap(lambda x,y: x + (y,), items)
The iterator objects used and returned by itertools do not store all the items in memory, but simply calculate the next item when it is needed. You can read more about how they work here.
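To see this laziness in action, here's a small sketch using Python 3's built-in map (the equivalent of Python 2's itertools.imap). Nothing is produced until an item is actually requested:

```python
def noisy(n):
    # a generator with a visible side effect, so we can see when work happens
    for i in range(1, n + 1):
        print("producing", i)
        yield i

# building the pipeline computes nothing yet...
pipeline = map(lambda x: x * 10, noisy(3))
print("pipeline built")   # printed before any "producing" line

# ...each item is computed only when it is requested
print(next(pipeline))     # prints "producing 1", then 10
```

The same holds for izip, repeat and starmap: assigning the intermediate iterators to variables just gives them names, it doesn't force any computation.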

I don't believe the combobulation above is necessary in this case.
It appears to be equivalent to this generator expression:
(some_tuple + (y[0],) for y in list_of_tuples)
However, itertools can occasionally have a performance advantage, especially in CPython.
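To check the claimed equivalence, here is the nested pipeline rewritten in Python 3 spelling (izip becomes zip, imap becomes map) next to the generator expression, with the small example data from the question:

```python
import itertools

some_tuple = (1, 2, 3)
list_of_tuples = [(1, "other_info"), (5, "other"), (8, 12)]

# the original nested pipeline, in Python 3 spelling
nested = itertools.starmap(
    lambda x, y: x + (y,),
    zip(itertools.repeat(some_tuple, len(list_of_tuples)),
        map(lambda t: t[0], list_of_tuples)))

# the equivalent generator expression
simple = (some_tuple + (t[0],) for t in list_of_tuples)

print(list(nested))  # [(1, 2, 3, 1), (1, 2, 3, 5), (1, 2, 3, 8)]
print(list(simple))  # same result
```

Both versions are lazy; the generator expression is simply much easier to read.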

Related

Fast, pythonic way to get all tuples obtained by dropping the elements from a given tuple?

Given a tuple T that contains all different integers, I want to get all the tuples that result from dropping individual integers from T. I came up with the following code:
def drop(T):
    S = set(T)
    for i in S:
        yield tuple(S.difference({i}))

for t in drop((1, 2, 3)):
    print(t)
# (2,3)
# (1,3)
# (1,2)
I'm not unhappy with this, but I wonder if there is a better/faster way because with large tuples, difference() needs to look for the item in the set, but I already know that I'll be removing items sequentially. However, this code is only 2x faster:
def drop(T):
    for i in range(len(T)):
        yield T[:i] + T[i+1:]
and in any case, neither scales linearly with the size of T.
Instead of looking at it as "remove one item each time" you can look at it as "use all but one", and then with itertools it becomes straightforward:
from itertools import combinations
T = (1, 2, 3, 4)
for t in combinations(T, len(T) - 1):
    print(t)
Which gives:
(1, 2, 3)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)
* Assuming the order doesn't really matter
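On that note, combinations yields the tuples in the opposite order to the index-slicing drop() from the question; for a sorted input, reversing the combinations recovers drop()'s order. A quick check:

```python
from itertools import combinations

def drop(T):
    # the index-slicing version from the question
    for i in range(len(T)):
        yield T[:i] + T[i+1:]

T = (1, 2, 3)
print(list(drop(T)))                            # [(2, 3), (1, 3), (1, 2)]
print(list(combinations(T, len(T) - 1))[::-1])  # same tuples, same order (for sorted T)
```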
From your description, you're looking for combinations of the elements of T. With itertools.combinations, you can ask for all r-length tuples, in sorted order, without repeated elements. For example :
import itertools
T = [1,2,3]
for i in itertools.combinations(T, len(T) - 1):
    print(i)

How to get Cartesian product in Python using a generator?

I'm trying to get the Cartesian product of multiple arrays, but the arrays are pretty large and I am trying to optimize memory usage. I have tried implementing a generator with the code below, but it just prints that there is a generator object at a certain memory address.
import itertools
x = [[1,2],[3,4]]
def iter_tools(*array):
    yield list(itertools.product(*array))

print(iter_tools(*x))
When I try the same code but with return instead of yield it works fine. How could I get the cartesian product by implementing a generator?
Bottom line, itertools.product is already an iterator. You don't need to write your own. (A generator is a kind of iterator.) For example:
>>> x = [[1, 2], [3, 4]]
>>> p = itertools.product(*x)
>>> next(p)
(1, 3)
>>> next(p)
(1, 4)
Now, to explain, it seems like you're misunderstanding something fundamental. A generator function returns a generator iterator. That's what you're seeing from the print:
>>> iter_tools(*x)
<generator object iter_tools at 0x7f05d9bc3660>
Use list() to cast an iterator to a list.
>>> list(iter_tools(*x))
[[(1, 3), (1, 4), (2, 3), (2, 4)]]
Note how it's a nested list. That's because your iter_tools yields one list then nothing else. On that note, that part makes no sense because casting itertools.product to a list defeats the whole purpose of an iterator - lazy evaluation. If you actually wanted to yield the values from an iterator, you would use yield from:
def iter_tools(*array):
    yield from itertools.product(*array)
In this case iter_tools is pointless, but if your actual iter_tools is more complex, this might be what you actually want.
See also:
what's the difference between yield from and yield in python 3.3.2+
How to Use Generators and yield in Python - Real Python
This answer is partly based on juanpa.arrivillaga's comment
The idea of a generator is that you don't do all the calculation at the same time, as you do with your call list(itertools.product(*array)). So what you want to do is generate the results one by one. For example like this:
def iter_tools(*array):
    for i in array[0]:
        for j in array[1]:
            yield (i, j)
You can then do something with each resulting tuple like this:
for tup in iter_tools(*x):
    print(tup)
Of course you can easily adapt the generator so that it yields each row or column per call.
Or if you are happy with what itertools provides:
for i in itertools.product(*x):
    print(i)
What you need depends on your use-case. Hope I could help you :)
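The two-loop generator above is fixed to exactly two input arrays. If you want the same hand-rolled approach for any number of arrays, a recursive variant works (product_gen is a made-up name, just for illustration; itertools.product already does all of this):

```python
def product_gen(*arrays):
    # base case: the product of zero arrays is a single empty tuple
    if not arrays:
        yield ()
        return
    head, *rest = arrays
    # prepend each item of the first array to every tuple
    # from the product of the remaining arrays
    for item in head:
        for tail in product_gen(*rest):
            yield (item,) + tail

x = [[1, 2], [3, 4]]
print(list(product_gen(*x)))  # [(1, 3), (1, 4), (2, 3), (2, 4)]
```

Note that this recomputes the product of the rest for every item of the head, so for large inputs itertools.product (which is implemented in C and reuses its pools) is the better choice.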
If you want to yield individual item from the cartesian product, you need to iterate over the product:
import itertools
x = [[1,2],[3,4]]
def iter_tools(*array):
for a in itertools.product(*array):
yield a
for a in iter_tools(*x):
print(a)

Question about combining sum() with zip()

a=[1,2,3]
b=[3,4,5,2]
c=[60,70,80]
sum(zip(a,b,c),())
what's the logic for the sum function here? why does it return a single tuple? especially why the following won't work
sum(zip(a,b,c))
The sum() function simply concatenates items together with "+", starting from an initial value. The zip() function, in turn, groups corresponding items together into tuples. Explicitly:
list(zip(a,b,c)) # [(1, 3, 60), (2, 4, 70), (3, 5, 80)]
sum([1,2,3],0) # 0 + 1 + 2 + 3
sum(zip(a,b,c),()) # () + (1,3,60) + (2,4,70) + (3,5,80)
Hope this helps explain the sum() and zip() functions. zip() can be tricky to see what it is doing since it produces an iterator instead of an answer. If you want to see what zip() does, wrap it in a list().
The sum(zip(a,b,c)) fails because the default initial value is 0. Hence, Python tries to do 0 + (1,3,60) + ..., which fails because a tuple cannot be added to 0.
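Putting both behaviors side by side:

```python
a = [1, 2, 3]
b = [3, 4, 5, 2]
c = [60, 70, 80]

# with an empty tuple as the start value, + concatenates the tuples
# (note zip stops at the shortest input, so b's trailing 2 is dropped)
print(sum(zip(a, b, c), ()))   # (1, 3, 60, 2, 4, 70, 3, 5, 80)

# with the default start value 0, the first addition is 0 + (1, 3, 60)
try:
    sum(zip(a, b, c))
except TypeError as e:
    print(e)  # unsupported operand type(s) for +: 'int' and 'tuple'
```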
The other answers are useful in resolving any confusion, but perhaps the result you might be looking for is achieved by doing this:
sum(a+b+c)
because the + operator, when applied to lists, concatenates them into a single list, whereas zip does not.
zip() does not do what you think it does. sum() will add the items of its input and return the result. In your case, you want to sum numbers from 3 lists. zip() returns tuples containing elements of the same index from the inputs, and when the result of this is passed to sum, it concatenates the tuples, leaving you with your undesired result. The fix is to use itertools.chain to combine the lists, then use sum to sum the numbers in those lists.
To show exactly how zip() works, an example should be useful:
a = ["a", "b", "c"]
b = [1, 2, 3]
list(zip(a, b)) -> [('a', 1), ('b', 2), ('c', 3)]
zip returned a generator of tuples (converted to a list here), each containing the element from each input that corresponds to the index of the tuple in the result, i.e, list(zip(a, b))[index] == (a[index], b[index])
What you want is this:
sum(itertools.chain(a, b, c))
EDIT: Make sure to import itertools first.
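A quick check of the chain approach with the lists from the question:

```python
import itertools

a = [1, 2, 3]
b = [3, 4, 5, 2]
c = [60, 70, 80]

# chain yields all the elements lazily, then sum adds the numbers
total = sum(itertools.chain(a, b, c))
print(total)  # 230, the same as sum(a) + sum(b) + sum(c)
```

Unlike zip, chain also keeps every element, even when the lists have different lengths.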

Python, Make variable equal to the second column of an array

I realise that there's a fair chance this has been asked somewhere else, but to be honest I'm not sure exactly what terminology I should be using to search for it.
But basically I've got a list with a varying number of elements. Each element contains 3 values: A string, another list, and an integer eg:
First element = ('A', [], 0)
so
ListofElements[0] = ('A', [], 0)
And what I am trying to do is make a new list that consists of all of the integers (the 3rd value in each element) given in ListofElements.
I can do this already by stepping through each element of ListofElements and then appending the integer onto the new list shown here:
NewList = []
for element in ListofElements:
    NewList.append(element[2])
But using a for loop seems like the most basic way of doing it, is there a way that uses less code? Maybe a list comprehension or something such as that. It seems like something that should be able to be done on a single line.
That is just a step in my ultimate goal, which is to find out the index of the element in ListofElements that has the minimum integer value. So my process so far is to make a new list, and then find the integer index of that new list using:
index=NewList.index(min(NewList))
Is there a way that I can just avoid making the new list entirely and generate the index straight away from the original ListofElements? I got stuck with what I would need to fill in to here, or how I would iterate through :
min(ListofElements[?][2])
You can use a list comprehension:
[x[2] for x in ListOfElements]
This is generally considered a "Pythonic" approach.
You can also find the minimum in a rather stylish manner using:
minimum = min(ListOfElements, key=lambda x: x[2])
index = ListOfElements.index(minimum)
Some general notes:
In Python, underscore_names are the convention rather than CamelCase.
In Python you rarely need an explicit for loop just to transform data. Instead prefer
comprehensions or a functional pattern (map, filter, etc.)
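As a side note, if you only need the index, combining enumerate() with min() avoids the second pass that .index() makes over the list (the sample data below is made up for illustration):

```python
list_of_elements = [('A', [], 3), ('B', [], 1), ('C', [], 2)]

# pair each element with its index, then compare by the integer field
index, element = min(enumerate(list_of_elements), key=lambda pair: pair[1][2])
print(index)    # 1
print(element)  # ('B', [], 1)
```

This is a single pass over the list, and it also sidesteps the subtle bug where .index() finds an earlier duplicate of the minimum.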
You can map your list with itemgetter:
>>> from operator import itemgetter
>>> l = [(1, 2, 3), (1, 2, 3), (1, 2, 3), (1, 2, 3), (1, 2, 3)]
>>> list(map(itemgetter(2), l))
[3, 3, 3, 3, 3]
(In Python 2, map() returns the list directly; in Python 3 it returns a lazy map object, hence the list() call.)
Then you can go with your approach to find the position of minimum value.

Document of Functions that accept iterators

I was having trouble with a project and was later able to successfully complete it. However, while running through some code written by someone else, I noticed they were able to utilize an iterator (for loop) within the join-function.
example:
' '.join(x for x in name.split('*'))
I thought this was awesome as it helped me cut down lines of code from my original draft.
So my question is: Are there any documents that have a list of functions that accept iterators?
I could be mistaken here, but I think what you mean by iterator is in fact called a list comprehension in python. It's not that the list comprehension in question does not return an iterable, but it seems that you are impressed not with the fact that you could pass an iterable to the join function, but instead that the fact that you could put what seems to be flow control inline. Again, tell me if I'm wrong about this.
Comprehension-style expressions can be written with parentheses (a generator expression, which returns a generator) or with brackets (a list comprehension, which returns a list). To see the difference between these two, type the following in a python shell:
>>> (x for x in 'cool')
<generator object <genexpr> at 0x03980990>
>>> [x for x in 'cool']
['c', 'o', 'o', 'l']
I would imagine it is obvious how you can work with a list, but if you want to learn more about how generators work, you might want to check this out.
Also, the fun doesn't end there with list comprehensions. The possibilities are endless.
>>> [x for x in [1,5,4,7,8,2,6,3] if x > 3]
[5, 4, 7, 8, 6]
>>> [(x,y) for x in range(3) for y in range(3)]
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
To learn more about list comprehensions in general, try here.
They're called generators, and they work in many places that accept lists or tuples. The generic term for all three is iterable. But it depends on what the code in question does. If it just iterates then a generator will work. If it tries to get the len() or access items by index, it won't.
There isn't a list of functions that accept generators or iterables, no; nobody organizes documentation that way.
Technically, the argument to str.join() in your example is called a "generator expression". A generator expression evaluates to a generator, which is an iterator
- note that an iterable is not necessarily an iterator (but every iterator is iterable).
I assume your question really was about "functions that accept generator expressions". If yes, the answer is above: any function that expects an iterable, since arguments are eval'd before being passed so the generator expression is turned into an iterable before the function is actually called.
Note that there's a distinction to be made between iterables and "sequence types" (strings, tuples, lists, sets etc.): the latter are indeed iterable, but they have some other specificities too (i.e. they usually have a length, can be iterated more than once, etc.), so not all functions expecting a sequence will work with non-sequence iterables. But this is usually documented.
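To make that distinction concrete: a generator expression is single-pass and has no length, while a sequence can be measured and re-iterated:

```python
gen = (x * x for x in range(3))
print(list(gen))  # [0, 1, 4]
print(list(gen))  # [] -- the generator is exhausted after one pass

squares = [x * x for x in range(3)]
print(len(squares))   # 3 -- sequences support len() and re-iteration
print(list(squares))  # [0, 1, 4]

try:
    len(x * x for x in range(3))
except TypeError:
    print("generators have no len()")
```

This is why str.join() works fine with a generator expression (it only iterates once), while a function that needs len() or indexing would not.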
