Itertools Python

Is there any way to use the itertools.product function so that it returns each combination of the lists step by step?
For example:
itertools.product(*mylist)
-> the call should return the first combination of the lists, then the second one, etc.

As #ggorlen has explained, itertools.product(...) returns an iterator. For example, if you have
import itertools
mylist = [('Hello','Hi'),('Andy','Betty')]
iterator = itertools.product(*mylist)
next(iterator) (or equivalently iterator.__next__()) will evaluate to ('Hello', 'Andy') the first time you call it. Each subsequent call to next(iterator) returns ('Hello', 'Betty'), then ('Hi', 'Andy'), and finally ('Hi', 'Betty'), before raising StopIteration.
You can also convert an iterator into a list with list(iterator) if you are more comfortable with a list, but if you only need the first few values and mylist is big, this would be really inefficient, and it might be worth taking the time to familiarise yourself with iterators.
Do consider whether you are really just iterating through iterator though. If so, just use
for combination in iterator:
    pass  # Body of loop
Even if you just need the first n elements and n is large, you can use
for combination in itertools.islice(iterator, n):
    pass  # Body of loop
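Putting this together, here is a minimal runnable sketch using the mylist from above, pulling combinations one at a time and then slicing off the next two without building the full list:

```python
import itertools

mylist = [('Hello', 'Hi'), ('Andy', 'Betty')]
iterator = itertools.product(*mylist)

# The first combination, on demand:
print(next(iterator))  # ('Hello', 'Andy')

# Take the next two combinations lazily, without materialising everything:
first_two = list(itertools.islice(iterator, 2))
print(first_two)  # [('Hello', 'Betty'), ('Hi', 'Andy')]
```

Only the combinations actually requested are ever produced; the iterator remembers its position between calls.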

Related

Why doesn't my generator function loop? [duplicate]

I'm creating a python generator to loop over a sentence. "this is a test" should return
this
is
a
test
Question 1: What's the problem with the implementation below? It only returns "this" and doesn't loop. How do I fix it?
def my_sentense_with_generator(sentence):
    index = 0
    words = sentence.split()
    current = index
    yield words[current]
    index += 1
for i in my_sentense_with_generator('this is a test'):
    print(i)
>> this
Question 2: Another way of implementing it is below, and it works. But I'm confused about the purpose of using for here. I was taught that a generator is used in lieu of a for loop so that Python doesn't have to build up the whole list upfront, which takes much less memory and time (link). But this solution uses a for loop to construct the generator. Does that defeat the purpose of a generator?
def my_sentense_with_generator(sentence):
    for w in sentence.split():
        yield w
The purpose of a generator is not to avoid writing a loop; it is to generate the elements only when they are needed (and not when the generator is constructed).
In your first example, you need a loop inside the generator as well. Otherwise the generator can only produce a single element, after which it is exhausted.
NB. In the generator below, str.split creates a list, so there is no memory benefit in using a generator. It could be replaced by an iterator: iter(sentence.split()).
def my_sentence_with_generator(sentence):
    words = sentence.split()
    for word in words:
        yield word

for i in my_sentence_with_generator('this is a test'):
    print(i)
output:
this
is
a
test
The loop in the generator defines the elements of the generator; it will pause at each yield until something requests the next element. So you also need a loop outside the generator to request the elements.
Example of a partial collection of the elements:
g = my_sentence_with_generator('this is a test')
next(g), next(g)
output: ('this', 'is')
Example of the utility of a generator:
def count():
    '''this generator can yield a quadrillion numbers'''
    for i in range(1_000_000_000_000_000):
        yield i

# we instantiate the generator
c = count()
# we collect only 3 elements; this consumes very little memory
next(c), next(c), next(c)
str.split returns a list, so you're not going to avoid creating a list if you call it within your generator. If you don't want that list, you'll either need to keep the original string in memory until you're done with the results, or create twice as many strings as you'd otherwise need; without one of those, it's impossible to figure out what to yield next.
As an example, this is what a version of str.split might look like as a generator:
def split_string(sentence, sep=None):
    # Creates a list with a maximum of two elements:
    split_list = sentence.split(sep, maxsplit=1)
    while split_list:
        yield split_list[0]
        if len(split_list) == 1:
            # No more separators to be found in the rest of the string, so we return
            return
        split_list = split_list[1].split(sep, maxsplit=1)
This creates a lot of short-lived lists, but they will never have more than 2 elements each. Not very practical, it's likely to be much less performant than just calling str.split once, but hopefully it gives you a better understanding of how generators work.
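For comparison, here is an alternative sketch (not from the answer above) that scans the string with str.find and never builds any intermediate list at all; the name split_string_find and the explicit single-character separator are choices made for this illustration:

```python
def split_string_find(sentence, sep=" "):
    # Scan for separators with str.find, yielding one word at a time;
    # no intermediate lists are created, only the yielded substrings.
    start = 0
    while True:
        end = sentence.find(sep, start)
        if end == -1:
            if start < len(sentence):
                yield sentence[start:]  # trailing word
            return
        if end > start:  # skip runs of consecutive separators
            yield sentence[start:end]
        start = end + len(sep)

print(list(split_string_find("this is a test")))  # ['this', 'is', 'a', 'test']
```

It has the same drawback the answer mentions: the original string must stay in memory until iteration finishes.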

Add an arbitrary element to an xrange()?

In Python 2, it's more memory-efficient to use xrange() instead of range() when iterating.
The trouble I'm having is that I want to iterate over a large list -- such that I need to use xrange() and after that I want to check an arbitrary element.
With range(), it's easy: x = range(...) + [arbitrary element].
But with xrange(), there doesn't seem to be a cleaner solution than this:
for i in xrange(...):
    if foo(i):
        ...
if foo(arbitrary element):
    ...
Any suggestions for cleaner solutions? Is there a way to "append" an arbitrary element to a generator?
itertools.chain lets you make a combined iterator from multiple iterables without concatenating them (so no expensive temporaries):
from itertools import chain

# Must wrap arbitrary element in one-element tuple (or list)
for i in chain(xrange(...), (arbitrary_element,)):
    if foo(i):
        ...
I would recommend keeping the arbitrary_element check out of the loop, but if you want to make it part of the loop, you can use itertools.chain:
for i in itertools.chain(xrange(...), [arbitrary_element]):
    ...
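This thread is Python 2 (hence xrange). In Python 3, xrange is gone and range is already lazy, so the same chain pattern looks like the sketch below; foo is a stand-in predicate invented for this illustration:

```python
from itertools import chain

def foo(i):
    # Stand-in predicate for illustration: keep even numbers
    return i % 2 == 0

# The extra element rides along after the lazy range, with no
# concatenated temporary list:
for i in chain(range(5), (100,)):
    if foo(i):
        print(i)  # prints 0, 2, 4, 100
```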

Python Function Not Working

I am trying to create a function, new_function, that takes a number as an argument.
This function will manipulate values in a list based on what number I pass as an argument. Within this function, I will place another function, new_sum, that is responsible for manipulating values inside the list.
For example, if I pass 4 into new_function, I need new_function to run new_sum on each of the first four elements. The corresponding value will change, and I need to create four new lists.
example:
listone = [1,2,3,4,5]
def new_function(value):
    for i in range(0,value):
        new_list = listone[:]
        variable = new_sum(i)
        new_list[i] = variable
        return new_list
# running new_function(4) should return four new lists
# [(new value for index zero, based on new_sum),2,3,4,5]
# [1,(new value for index one, based on new_sum),3,4,5]
# [1,2,(new value for index two, based on new_sum),4,5]
# [1,2,3,(new value for index three, based on new_sum),5]
My problem is that I keep on getting one giant list. What am I doing wrong?
Fix the indentation of return statement:
listone = [1,2,3,4,5]
def new_function(value):
    for i in range(0,value):
        new_list = listone[:]
        variable = new_sum(i)
        new_list[i] = variable
    return new_list
The problem with return new_list is that once you return, the function is done.
You can make things more complicated by accumulating the results and returning them all at the end:
listone = [1,2,3,4,5]
def new_function(value):
    new_lists = []
    for i in range(0,value):
        new_list = listone[:]
        variable = new_sum(i)
        new_list[i] = variable
        new_lists.append(new_list)
    return new_lists
However, this is exactly what generators are for: if you yield instead of return, that gives the caller one value, and the function resumes when the caller asks for the next one. So:
listone = [1,2,3,4,5]
def new_function(value):
    for i in range(0,value):
        new_list = listone[:]
        variable = new_sum(i)
        new_list[i] = variable
        yield new_list
The difference is that the first version gives the caller a list of four lists, while the second gives the caller an iterator of four lists. Often, you don't care about the difference—and, in fact, an iterator may be better for responsiveness, memory, or performance reasons.*
If you do care, it often makes more sense to just make a list out of the iterator at the point you need it. In other words, use the second version of the function, then just write:
new_lists = list(new_function(4))
By the way, you can simplify this by not trying to mutate new_list in-place, and instead just change the values while copying. For example:
def new_function(value):
    for i in range(value):
        yield listone[:i] + [new_sum(i)] + listone[i+1:]
* Responsiveness is improved because you get the first result as soon as it's ready, instead of only after they're all ready. Memory use is improved because you don't need to keep all of the lists in memory at once, just one at a time. Performance may be improved because interleaving the work can result in better cache behavior and pipelining.
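To make the difference concrete, here is a runnable sketch of the simplified generator version; new_sum is a placeholder definition invented here, since the original question never shows it:

```python
listone = [1, 2, 3, 4, 5]

def new_sum(i):
    # Placeholder: the real new_sum isn't shown in the question.
    return listone[i] * 10

def new_function(value):
    # Build each modified copy on the fly instead of mutating in place
    for i in range(value):
        yield listone[:i] + [new_sum(i)] + listone[i+1:]

print(list(new_function(2)))
# [[10, 2, 3, 4, 5], [1, 20, 3, 4, 5]]
```

Iterating the generator directly (for new_list in new_function(4): ...) avoids ever holding all four lists at once.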

Why does len() not support iterators?

Many of Python's built-in functions (any(), all(), sum() to name some) take iterables but why does len() not?
One could always use sum(1 for i in iterable) as an equivalent, but why is it len() does not take iterables in the first place?
Many iterables are defined by generator functions which don't have a well-defined len. Take the following, which iterates forever:
def sequence(i=0):
    while True:
        i += 1
        yield i
Basically, to have a well defined length, you need to know the entire object up front. Contrast that to a function like sum. You don't need to know the entire object at once to sum it -- Just take one element at a time and add it to what you've already summed.
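For instance, sum can consume even the infinite sequence generator above one element at a time, as long as you take a finite slice of it first (the generator is repeated here so the sketch is self-contained):

```python
import itertools

def sequence(i=0):
    # The infinite generator from the answer above
    while True:
        i += 1
        yield i

# sum() never needs the whole stream at once; a finite slice
# of an infinite generator sums just fine:
total = sum(itertools.islice(sequence(), 5))
print(total)  # 1 + 2 + 3 + 4 + 5 = 15
```

len, by contrast, has nothing it could do here short of consuming the iterable entirely, which for this generator would never terminate.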
Be careful with idioms like sum(1 for i in iterable): often they will just exhaust the iterable so you can't use it anymore, or it could be slow if producing each element involves a lot of computation. It might be worth asking yourself why you need to know the length a priori. This might give you some insight into what type of data structure to use (frequently list and tuple work just fine), or you may be able to perform your operation without needing to call len at all.
This is an iterable:
def forever():
    while True:
        yield 1
Yet it has no length. If you want to find the length of a finite iterable, the only way to do so, by definition of what an iterable is (something you can repeatedly call to get the next element until you reach the end), is to expand the iterable out fully, e.g.:
len(list(the_iterable))
As mgilson pointed out, you might want to ask yourself - why do you want to know the length of a particular iterable? Feel free to comment and I'll add a specific example.
If you want to keep track of how many elements you have processed, instead of doing:
num_elements = len(the_iterable)
for element in the_iterable:
    ...
do:
num_elements = 0
for element in the_iterable:
    num_elements += 1
    ...
If you want a memory-efficient way of seeing how many elements end up in a comprehension, you might wish you could write this (in fact it raises a TypeError, since generators have no length):
num_relevant = len(x for x in xrange(100000) if x%14==0)
It wouldn't be efficient to do this (you don't need the whole list):
num_relevant = len([x for x in xrange(100000) if x%14==0])
sum would probably be the most handy way, but it looks quite weird and it isn't immediately clear what you're doing:
num_relevant = sum(1 for _ in (x for x in xrange(100000) if x%14==0))
So, you should probably write your own function:
def exhaustive_len(iterable):
    length = 0
    for _ in iterable:
        length += 1
    return length

exhaustive_len(x for x in xrange(100000) if x%14==0)
The long name is to help remind you that it does consume the iterable. For example, this won't work as you might think:
def yield_numbers():
    yield 1; yield 2; yield 3; yield 5; yield 7

the_nums = yield_numbers()
total_nums = exhaustive_len(the_nums)
for num in the_nums:
    print num
because exhaustive_len has already consumed all the elements.
EDIT: Ah in that case you would use exhaustive_len(open("file.txt")), as you have to process all lines in the file one-by-one to see how many there are, and it would be wasteful to store the entire file in memory by calling list.

python input for itertools.product

Looking for a way to simulate nested loops (or a cartesian product), I came across the itertools.product function.
I need a function or piece of code that receives a list of integers as input and returns a specific generator.
example:
input = [3,2,4] -> gen = product(xrange(3),xrange(2),xrange(4))
or
input = [2,4,5,6] -> gen = product(xrange(2),xrange(4),xrange(5),xrange(6))
As the size of the list varies, I am very confused about how to do that without a lot of precoding based on a crazy amount of ifs and the size of the list.
Also, is there a difference between calling product(range(3)) and product(xrange(3))?
def bigproduct(*args):
    newargs = [xrange(x) for x in args]
    return itertools.product(*newargs)

for i in bigproduct(3, 2, 4):
    ....
range() generates a list up-front, therefore uses time up front and more space, but takes less time to get each element. xrange() generates each element on the fly, so takes up less space and initial time, but takes more time to return each element.
This can be easily accomplished using map (where shape is the input list of integers):
from itertools import product

for i in product(*map(range, shape)):
    print i
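A complete Python 3 sketch of the map-based answer (in Python 3 range is already lazy, so the xrange distinction disappears):

```python
from itertools import product

shape = [3, 2]  # the input list of integers
combos = list(product(*map(range, shape)))
print(combos)
# [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```

map applies range to each integer in shape, and the * unpacks the resulting ranges as separate arguments to product, exactly as if you had written product(range(3), range(2)) by hand.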
