itertools and strided list assignment - python

Given a list, e.g. x = [True]*20, I want to assign False to every other element.
x[::2] = False
raises TypeError: must assign iterable to extended slice
So I naively assumed you could do something like this:
x[::2] = itertools.repeat(False)
or
x[::2] = itertools.cycle([False])
However, as far as I can tell, this results in an infinite loop. Why is there an infinite loop? Is there an alternative approach that does not involve knowing the number of elements in the slice before assignment?
EDIT: I understand that x[::2] = [False] * (len(x)//2) works in this case, or that you can come up with an expression for the multiplier on the right side in the more general case. I'm trying to understand what causes itertools to cycle indefinitely and why list assignment behaves differently from numpy array assignment. I think there must be something fundamental about Python I'm misunderstanding. I was also originally thinking there might be performance reasons to prefer itertools to a list comprehension or to creating another n-element list.

What you are attempting to do in this code is not what you think (I suspect).
For instance:
x[::2] returns a slice containing every other element of x, starting at index 0. Since x is of size 20,
the slice will be of size 10, but you are trying to assign a single non-iterable value to it.
To successfully use the code you have, you will need to do:
x = [True]*20
x[::2] = [False]*10
which assigns an iterable of size 10 to a slice of size 10.
Why work in the dark with the number of elements? Use
len(x[::2])
which equals 10, and then use
x[::2] = [False]*len(x[::2])
You could also do something like:
x = [False if index % 2 == 0 else True for index, element in enumerate(x)]
EDIT: in response to the OP's edit
The documentation on cycle says that it "Repeats indefinitely", which means it will continuously cycle through the iterator it has been given.
repeat has a similar implementation; however, its documentation states that it
"Runs indefinitely unless the times argument is specified,"
which has not been done in the question's code. Thus both lead to infinite loops.
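For example, passing the times argument to repeat bounds the iterator, so the extended-slice assignment can complete (a small sketch based on the documentation quoted above):
import itertools
x = [True] * 20
x[::2] = itertools.repeat(False, len(x[::2]))  # 10 repetitions, matching the slice length
print(x[:4])  # [False, True, False, True]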
About the comment that itertools is faster: yes, the itertools functions are generally faster than equivalent hand-written implementations because they are implemented in C and heavily optimised.
However, if you do not want to create another list, you can use a generator expression such as the following:
x = (False if index % 2 == 0 else True for index, element in enumerate(x))
Generator expressions do not store all of their elements in memory but produce them as they are needed; however, generators can be exhausted.
For instance:
x = [True]*20
print(x)
y = (False if index % 2 == 0 else True for index, element in enumerate(x))
print([a for a in y])
print([a for a in y])
will print x, then the elements of the generator y, then an empty list, because the generator has been exhausted.

As Mark Tolonen pointed out in a concise comment, the reason your itertools attempts cycle indefinitely is that, for the list assignment, Python first checks the length of the right-hand side.
Now to really dig in...
When you say:
x[::2] = itertools.repeat(False)
The left-hand side (x[::2]) is an extended slice of a list, and you are assigning to it the itertools.repeat(False) iterable, which iterates forever since it wasn't given a times argument (as per the docs).
If you dig into the list assignment code in the cPython implementation, you'll find the unfortunately/painfully named function list_ass_slice, which is at the root of a lot of list assignment stuff. In that code you'll see this segment:
v_as_SF = PySequence_Fast(v, "can only assign an iterable");
if (v_as_SF == NULL)
    goto Error;
n = PySequence_Fast_GET_SIZE(v_as_SF);
Here it is trying to get the length (n) of the iterable you are assigning to the list. However, before even getting there it is getting stuck on PySequence_Fast, where it ends up trying to convert your iterable to a list (with PySequence_List), within which it ultimately creates an empty list and tries to simply extend it with your iterable.
To extend the list with the iterable, it uses listextend(), and in there you'll see the root of the problem:
/* Run iterator to exhaustion. */
for (;;) {
and there you go.
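In Python terms, this means CPython materializes the right-hand side into a list before assigning anything, so the hang is roughly equivalent to the following (a sketch of the equivalence, not the actual C code path):
import itertools
x = [True] * 20
# What happens internally for x[::2] = itertools.repeat(False):
#   rhs = list(itertools.repeat(False))  # never returns; the iterator is infinite
# Bounding the iterator lets the materialization finish:
x[::2] = list(itertools.repeat(False, 10))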
Or at least I think so... :) It was an interesting question, so I thought I'd have some fun and dig through the source to see what was up, and ended up there.
As for the different behaviour with numpy arrays, it simply comes down to how numpy.array assignments are handled.
Note that using itertools.repeat doesn't work in numpy, but it doesn't hang either (I didn't check the implementation to figure out why):
>>> import numpy, itertools
>>> x = numpy.ones(10,dtype='bool')
>>> x[::2] = itertools.repeat(False)
>>> x
array([ True, True, True, True, True, True, True, True, True, True], dtype=bool)
>>> #but the scalar assignment does work as advertised...
>>> x = numpy.ones(10,dtype='bool')
>>> x[::2] = False
>>> x
array([False, True, False, True, False, True, False, True, False, True], dtype=bool)

Try this:
l = len(x)
x[::2] = itertools.repeat(False, l // 2 if l % 2 == 0 else l // 2 + 1)
Your original solution ends up in an infinite loop because that's what repeat is supposed to do, from the documentation:
Make an iterator that returns object over and over again. Runs indefinitely unless the times argument is specified.

For a list of even length, the slice x[::2] is exactly len(x) // 2 elements long, so you could achieve what you want with:
x[::2] = [False] * (len(x) // 2)
The itertools.repeat and itertools.cycle functions are designed to yield values indefinitely. However, you can specify a limit on repeat(), like this:
x[::2] = itertools.repeat(False, len(x) // 2)

The right hand side of an extended slice assignment needs to be an iterable of the right size (ten, in this case).
Here it is with a regular list on the right-hand side:
>>> x = [True] * 20
>>> x[::2] = [False] * 10
>>> x
[False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True]
And here it is with itertools.repeat on the right-hand side.
>>> from itertools import repeat
>>> x = [True] * 20
>>> x[::2] = repeat(False, 10)
>>> x
[False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True]

Related

How to count groups of a certain value in a numpy 1d array?

I have a numpy 1d array with boolean values that looks like
array_in = np.array([False, True, True, True, False, False, True, True, False])
These arrays can have different lengths. As you can see, there are stretches where True values sit next to each other, so we have groups of Trues and groups of Falses. I want to count the number of True groups. For this case, we have
N = 2
I tried to do some loops with conditions, but it got really messy and confusing.
You can use np.diff to determine changes between groups. By attaching False to the start and the end of this difference calculation we make sure that True groups at the start and end are properly counted.
import numpy as np
array_in = np.array([False, True, True, True, False, False, True, True, False, True, False, True])
true_groups = np.sum(np.diff(array_in, prepend=False, append=False)) // 2
print(true_groups)
# 4
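To see why the division by two works, note that each True group contributes exactly one rising and one falling edge to the padded difference (a small illustration):
import numpy as np
array_in = np.array([False, True, True, True, False, False, True, True, False, True, False, True])
edges = np.diff(array_in, prepend=False, append=False)  # True wherever the value changes
print(edges.sum())        # 8: one rising plus one falling edge per group
print(edges.sum() // 2)   # 4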
If you don't want to write loops and conditions, you could take a shortcut by looking at this like a connected components problem.
import numpy as np
from skimage import measure
array_in = np.array([False, True, True, True, False, False, True, True, False])
N = max(measure.label(array_in))
When an array is passed into the measure.label() function, it treats the 0 values in that array as the "background". Then it looks at all the non-zero values, finds connected regions, and numbers them.
For example, the label output on the above array will be [0, 1, 1, 1, 0, 0, 2, 2, 0]. Doing a simple max on that output then gives you the largest group label (here it's 2), which is the same as the number of True groups.
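A quick demonstration of that label output (assuming scikit-image is installed):
import numpy as np
from skimage import measure
array_in = np.array([False, True, True, True, False, False, True, True, False])
labels = measure.label(array_in)
print(labels)        # [0 1 1 1 0 0 2 2 0]
print(labels.max())  # 2, the number of True groups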
A straightforward way of finding the number of groups of True is by counting the False, True transitions in the array, adding one if the array starts with True (a leading group has no preceding False). With a generator expression, that looks like:
x[0] + sum(1 for i in range(1, len(x)) if x[i] and not x[i-1])
Alternatively, you can convert the initial array into a string of '0's and '1's and count the number of '01' occurrences, again adding x[0] to account for a leading group:
x[0] + ''.join(str(int(k)) for k in x).count('01')
In native numpy, a vectorised solution can be done by checking how many times F changes to T sequentially.
For example,
np.bitwise_and(array_in[1:], ~array_in[:-1]).sum() + array_in[0]
I am computing a bitwise and of every element of the array with the negation of its previous element. However, in doing so, the first element is ignored, so I am adding it in manually.

Python: Different methods to initialize 2D arrays gives different outputs [duplicate]

This question already has answers here: List of lists changes reflected across sublists unexpectedly
I was solving a few questions involving dynamic programming. I initialized the dp table as -
n = 3
dp = [[False]*n]*n
print(dp)
#Output - [[False, False, False], [False, False, False], [False, False, False]]
Followed by which I set the diagonal elements to True using -
for i in range(n):
    dp[i][i] = True
print(dp)
#Output - [[True, True, True], [True, True, True], [True, True, True]]
However, the above sets every value in dp to True. But when I initialize dp as -
dp = [[False]*n for i in range(n)]
Followed by setting diagonal elements to True, I get the correct output - [[True, False, False], [False, True, False], [False, False, True]]
So how exactly does the star operator generate values of the list?
When you do dp = [[False]*n]*n, you get a list containing the same inner list n times; when you modify one, all of them appear modified. That's why, with that loop over n indices, you seemingly modify all n^2 elements.
You can check it like this:
[id(x) for x in dp]
> [1566380391432, 1566380391432, 1566380391432] # you'll see the same value n times
With dp = [[False]*n for i in range(n)] you are creating different lists n times. Let's try again for this dp:
[id(x) for x in dp]
> [1566381807176, 1566381801160, 1566381795912] # n distinct lists
In general, opt to use * to replicate immutable values, and use a for ... comprehension to create mutable objects (like lists).
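A minimal demonstration of the difference (the variable names are mine, but the aliasing behaviour is exactly as described above):
n = 3
shared = [[False] * n] * n                     # one inner list, referenced n times
independent = [[False] * n for _ in range(n)]  # n distinct inner lists
shared[0][0] = True
independent[0][0] = True
print(shared)       # [[True, False, False], [True, False, False], [True, False, False]]
print(independent)  # [[True, False, False], [False, False, False], [False, False, False]]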
Your issue is that in the first example you are not actually creating more lists.
To explain, let's go through the example line by line.
First, you create a new list [False]*3. This creates a list containing the value False 3 times.
Next, you build an outer list holding a reference to that first list. Note that the first list is not copied; only a reference is stored.
By multiplying by 3 you create a list with 3 references to the same inner list. Since these are only references, changing one will change the others too.
This is why, when assigning dp[i][i] = True, you are actually setting element i of all three rows to True, since all three rows are the same list. Doing this for every i therefore sets every value in the single underlying list to True.
The second option actually creates 3 separate lists, so the code works properly.

Is Python `list.extend(iterator)` guaranteed to be lazy?

Summary
Suppose I have an iterator that, as elements are consumed from it, performs some side effect, such as modifying a list. If I define a list l and call l.extend(iterator), is it guaranteed that extend will push elements onto l one by one as they are consumed from the iterator, as opposed to buffering them and pushing them all on at once?
My experiments
I did a quick test in Python 3.7 on my computer, and list.extend seems to be lazy based on that test. (See code below.) Is this guaranteed by the spec, and if so, where in the spec is that mentioned?
(Also, feel free to criticize me and say "this is not Pythonic, you fool!"--though I would appreciate it if you also answer the question if you want to criticize me. Part of why I'm asking is for my own curiosity.)
Say I define an iterator that pushes onto a list as it runs:
l = []
def iterator(k):
    for i in range(5):
        print([j in k for j in range(5)])
        yield i
l.extend(iterator(l))
Here are examples of non-lazy (i.e. buffered) vs. lazy possible extend implementations:
def extend_nonlazy(l, iterator):
    l += list(iterator)

def extend_lazy(l, iterator):
    for i in iterator:
        l.append(i)
Results
Here's what happens when I run both known implementations of extend.
Non-lazy:
l = []
extend_nonlazy(l, iterator(l))
# output
[False, False, False, False, False]
[False, False, False, False, False]
[False, False, False, False, False]
[False, False, False, False, False]
[False, False, False, False, False]
# l = [0, 1, 2, 3, 4]
Lazy:
l = []
extend_lazy(l, iterator(l))
[False, False, False, False, False]
[True, False, False, False, False]
[True, True, False, False, False]
[True, True, True, False, False]
[True, True, True, True, False]
My own experimentation shows that native list.extend seems to work like the lazy version, but my question is: does the Python spec guarantee that?
I don't think the issue is lazy vs. non-lazy because, in both slice assignment and list extend, you need all the elements of the iterator, and these elements are consumed at once (in the normal case). The issue you raised is more important: are these operations atomic or not? See one definition of "atomicity" in Wikipedia:
Atomicity guarantees that each transaction is treated as a single "unit", which either succeeds completely, or fails completely.
Have a look at this example (CPython 3.6.8):
>>> def new_iterator(): return (1/(i-2) for i in range(5))
>>> L = []
>>> L[:] = new_iterator()
Traceback (most recent call last):
...
ZeroDivisionError: division by zero
>>> L
[]
The slice assignment failed because of the exception (at i == 2, 1/(i - 2) raises ZeroDivisionError) and the list was left unchanged. Hence, the slice assignment operation is atomic.
Now, the same example with extend:
>>> L.extend(new_iterator())
Traceback (most recent call last):
...
ZeroDivisionError: division by zero
>>> L
[-0.5, -1.0]
When the exception was raised, the first two elements had already been appended to the list. The extend operation is not atomic, since a failure does not leave the list unchanged.
Should the extend operation be atomic or not? Frankly I have no idea about that, but as written in #wim's answer, the real issue is that it's not clearly stated in the documentation (and worse, the documentation asserts that extend is equivalent to the slice assignment, which is not true in the reference implementation).
Is Python list.extend(iterator) guaranteed to be lazy?
No. On the contrary, it's documented that
l.extend(iterable)
is equivalent to
l[len(l):] = iterable
In CPython, such a slice assignment will first convert a generator on the right hand side into a list anyway (see here), i.e. it's consuming the iterable all at once.
The example shown in your question is, strictly speaking, contradicting the documentation. I've filed a documentation bug, but it was promptly closed by Raymond Hettinger.
As an aside, there are less convoluted ways to demonstrate the discrepancy. Just define a failing generator:
def gen():
    yield 1
    yield 2
    yield 3
    uh-oh
Now L.extend(gen()) will modify L, but L[:] = gen() will not.
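A runnable sketch of that difference, using an explicit exception in place of the NameError from the uh-oh line above:
def gen():
    yield 1
    yield 2
    yield 3
    raise RuntimeError  # stand-in for the failure

L = []
try:
    L.extend(gen())
except RuntimeError:
    pass
print(L)  # [1, 2, 3]: the values yielded before the failure were kept

L = []
try:
    L[:] = gen()
except RuntimeError:
    pass
print(L)  # []: the slice assignment materialized the RHS first, so nothing stuck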

How can I check that a Python list contains only True and then only False using one or two lines?

I would like to only allow lists where the first contiguous group of elements are True and then all of the remaining elements are False. I want lists like these examples to return True:
[True]
[False]
[True, False]
[True, False, False]
[True, True, True, False]
And lists like these to return False:
[False, True]
[True, False, True]
I am currently using this function, but I feel like there is probably a better way of doing this:
def my_function(x):
    n_trues = sum(x)
    should_be_true = x[:n_trues]  # get the first n items
    should_be_false = x[n_trues:len(x)]  # get the remaining items
    # return True only if all of the first n elements are True and the
    # remaining elements are all False
    return all(should_be_true) and all([not element for element in should_be_false])
Testing:
test_cases = [[True], [False],
              [True, False],
              [True, False, False],
              [True, True, True, False],
              [False, True],
              [True, False, True]]
print([my_function(test_case) for test_case in test_cases])
# expected output: [True, True, True, True, True, False, False]
Is it possible to use a comprehension instead to make this a one/two line function? I know I could not define the two temporary lists and instead put their definitions in place of their names on the return line, but I think that would be too messy.
Method 1
You could use itertools.groupby. This would avoid doing multiple passes over the list and would also avoid creating the temp lists in the first place:
from itertools import groupby

def check(x):
    status = [k for k, g in groupby(x)]
    return len(status) <= 2 and (status[0] is True or status[-1] is False)
This assumes that your input is non-empty and already all boolean. If that's not always the case, adjust accordingly:
def check(x):
    status = [k for k, g in groupby(map(bool, x))]
    return bool(status) and len(status) <= 2 and (status[0] or not status[-1])
If you want to have empty arrays evaluate to True, either special case it, or complicate the last line a bit more:
return not status or (len(status) <= 2 and (status[0] or not status[-1]))
Method 2
You can also do this in one pass using an iterator directly. This relies on the fact that any and all are guaranteed to short-circuit:
def check(x):
    iterator = iter(x)
    # consume the leading run of true elements
    all(iterator)
    # check that there are no true elements left
    return not any(iterator)
Personally, I think method 1 is total overkill. Method 2 is much nicer and simpler, and achieves the same goals faster. It also stops immediately if the test fails, rather than having to process the whole group. It also doesn't allocate any temporary lists at all, even for the group aggregation. Finally, it handles empty and non-boolean inputs out of the box.
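As a quick check of method 2 against the test cases from the question:
test_cases = [[True], [False], [True, False], [True, False, False],
              [True, True, True, False], [False, True], [True, False, True]]
print([check(tc) for tc in test_cases])
# [True, True, True, True, True, False, False]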
Since I'm writing on mobile, here's an IDEOne link for verification: https://ideone.com/4MAYYa

Using list comprehension to compare elements of two arrays

How can I use a list comprehension in Python to check whether two arrays have the same elements or not?
I did the following:
>>> aa = [12, 3, 13]
>>> bb = [3, 13, 12]
>>> pp = [True for x in aa for y in bb if y == x]
>>> pp
[True, True, True]
>>> bb = [3, 13, 123]
>>> pp = [True for x in aa for y in bb if y == x]
>>> pp
[True, True]
I also want to output False when elements don't match, rather than just getting two Trues as in the latter case, but I don't know how to do it.
Finally, I want to get a single True/False value (if all are true then True; if any of them is false, then False) rather than a list of True and/or False values. I know a simple loop over pp (the list of True and False) is enough for that, but I am sure there is a more Pythonic way to do it.
You are testing every element of each list against every element of the other list, finding all combinations that are True. Apart from being inefficient, this is also an incorrect approach.
Use membership testing instead, and check that all these tests are True with the all() function:
all(el in bb for el in aa)
all() returns True if each element of the iterable you give it is True, False otherwise.
This won't quite test if the lists are equivalent; you need to test for the length as well:
len(aa) == len(bb) and all(el in bb for el in aa)
To make this a little more efficient for longer bb lists; create a set() from that list first:
def equivalent(aa, bb):
    if len(aa) != len(bb):
        return False
    bb_set = set(bb)
    return all(el in bb_set for el in aa)
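For example (reusing the lists from the question):
print(equivalent([12, 3, 13], [3, 13, 12]))   # True
print(equivalent([12, 3, 13], [3, 13, 123]))  # False
print(equivalent([1, 1, 2], [1, 2, 2]))       # True: duplicates are not distinguished, as noted below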
This still doesn't deal with duplicate numbers very well; [1, 1, 2] is considered equivalent to [1, 2, 2] with this approach. You underspecified what should happen in such corner cases; the only strict equivalence test would be to sort both inputs:
len(aa) == len(bb) and sorted(aa) == sorted(bb)
where we first test for length to avoid having to sort in case the lengths differ.
If duplicates are allowed, whatever the length of the input, you can forgo loops altogether and just use sets:
not set(aa).symmetric_difference(bb)
to test if they have the same unique elements.
set(aa) == set(bb)
This has the same effect as
not set(aa).symmetric_difference(bb)
but may be slightly faster.
If you need [1, 1] to not be equivalent to [1], do
sorted(aa) == sorted(bb)
"You are testing every element of each list against every element of
the other list, finding all combinations that are True. Apart from
inefficient, this is also the incorrect approach."
I agree with the above statement. The code below lists the False values as well, but I don't think you really need this.
>>> bp = [y == x for x in aa for y in bb]
>>> bp
[False, False, True, True, False, False, False, True, False]
>>> False in bp
True
