Using list comprehension to compare elements of two arrays - python

How can I use list comprehension in python to compare if two arrays has same elements or not?
I did the following:
>>> aa=[12,3,13];
>>> bb=[3,13,12];
>>> pp=[True for x in aa for y in bb if y==x]
>>> pp
[True, True, True]
>>> bb=[3,13,123];
>>> pp=[True for x in aa for y in bb if y==x]
[True, True]
I also want to output the False if not true rather than outputting just two trues like in the latter case but don't know how to do it.
Finally, I want to get one True/False value (if all are true then true, if one of them is false, then false) rather than the list of true and/or false. I know the simple loop to iterate over pp(list of true and false) is enough for that but I am sure there is more pythonic way to that.

You are testing every element of each list against every element of the other list, finding all combinations that are True. Apart from inefficient, this is also the incorrect approach.
Use membership testing instead, and see all these tests are True with the all() function:
all(el in bb for el in aa)
all() returns True if each element of the iterable you give it is True, False otherwise.
This won't quite test if the lists are equivalent; you need to test for the length as well:
len(aa) == len(bb) and all(el in bb for el in aa)
To make this a little more efficient for longer bb lists; create a set() from that list first:
def equivalent(aa, bb):
if len(aa) != len(bb):
return False
bb_set = set(bb)
return all(el in bb_set for el in aa)
This still doesn't deal with duplicate numbers very well; [1, 1, 2] is equivalent to [1, 2, 2] with this approach. You underspecified what should happen in such cornercases; the only strict equivalent test would be to sort both inputs:
len(aa) == len(bb) and sorted(aa) == sorted(bb)
where we first test for length to avoid having to sort in case the lengths differ.
If duplicates are allowed, whatever the length of the input, you can forgo loops altogether and just use sets:
not set(aa).symmetric_difference(bb)
to test if they have the same unique elements.

set(aa) == set(bb)
This has the same effect but may be slightly faster
not set(aa).symmetric_difference(bb)
If you need [1,1] to be not equivalent to [1] do
sorted(aa) == sorted(bb)

"You are testing every element of each list against every element of
the other list, finding all combinations that are True. Apart from
inefficient, this is also the incorrect approach."
I agree with the above statement, the below code lists the False values as well but I don't think you really need this.
>>> bp = [y==x for x in aa for y in bb]
[False, False, True, True, False, False, False, True, False]
>>> False in bp
True

Related

Get all possible boolean sequences for a given list length

Given a random list length, how can I efficiently get all possible sequences of boolean values, except strings that are all True or all False?
For instance, given the number 3 it should return something like the following.
[True, False, False],
[True, True, False],
[True, False, True],
[False, True, False],
[False, True, True],
[False, False, True],
Is there already a known function that does this?
The order that it returns the sequences in is not important. I mainly just need a number of how many sequences are possible for a given list length.
This is mostly a maths question, unless you need the sequences themselves. If you do, there is a neat python solution:
from itertools import product
[seq for seq in product((True, False), repeat=3)][1:-1]
The list comprehension will contain all possible sequences, but we don't want (True, True, True) and (False, False, False). Conveniently, these will be the first and last element respectively, so we can simply discard them, using slicing from 1 to -1.
For sequences with different lengths, just change the "repeat" optional argument of the itertools.product function.
You don't need a function to determine this. Simple math will do the trick.
2**n - 2
2 because there are only two options (True/False)
n is your list length
-2 because you want to exclude the all True and all False results
This is more of a maths question, but here goes:
The number of total options is equal to the multiplication of the number of options per position, so, if you receive 3 as input:
index[0] could be true or false - 2
index[1] could be true or false - 2
index[2] could be true or false - 2
index has a total of 6 options.

how to use all to filter list above threshold

x=[255, 255, 255, 250, 250, 250, 250, 250, 250, 255, 255, 255]
y=all([i for i in x if i>260])
y1 = all([i for i in x if i>220])
y2 = all([True for i in x if i>260])
y3 = all([True for i in x if i>220])
y1==y2==y3==y #gives True
How does all handle empty lists, and how can I use to filter items above 220.
the only thing working for me now is
y=len([i for i in x if i>220])== len(x)
what I want to know if all list is above certain number i.e 220
You want this:
all([i > 220 for i in x])
all for an empty list should be True, because all of its elements - all 0 of them - are truthy
Edit: if it's desired to rule out the trivial case where x is empty, then as suggested in the comments just add an additional check for that, such as len(X) > 0
There are some small tweaks required to make your list comprehensions work. For instance, you don't handle the case in which the value in the list is not greater than x. For example, let's assume our list is:
>>> x = [1, 2, 3, 4, 5, 6]
If we use the code snipped from your question:
>>> [True for i in x if i > 3]
[True, True, True]
The result is probably not exactly what you expected. Therefore, calling all() on that list will always evaluate to True. One small tweak, however (note the addition of the else statement):
>>> [True if i > 3 else False for i in x]
[False, False, False, True, True, True]
This is the desired output. Now you can also apply all() to the output list.
>>> y1 = [True if i > 3 else False for i in x]
>>> all(y1)
False
And if we want to know whether all values are greater than 0, we expect all() to return True in the following case:
>>> y2 = [True if i > 0 else False for i in x]
>>> all(y2)
True
To answer your question about how all() handles empty lists:
>>> all([])
True
Depending on the structure of your code and the desired output you may want to have an edge case statement that handles empty lists. Hopefully this is enough input for you to adapt it to your problem statement.
Making the code more compact: As very kindly pointed out in the comments, the code above is a bit "wordy" to make a point and for the sake of the explanation. When writing list comprehensions and condition statements in Python (and perhaps other programming languages), you can actually shorten the list comprehension and condition statement:
>>> y1 = [i > 3 for i in x]
>>> y1
[False, False, False, True, True, True]
>>> all(y1)
False
i > 3 in the list comprehension will turn into True or False and therefore yield the same result as writing True if i > 3 else False.
Finally, if you want to know if all elements meet the threshold and there is at least one element in the list:
>>> y1 = [i > 3 for i in x]
>>> y1
[False, False, False, True, True, True]
>>> all(y1) and y1
[] # this will evaluate to False if you use it in a condition statement
Alternatively you can also convert it to a boolean:
>>> bool(all(y1) and y1)
False
Which does the right thing for empty lists as well:
>>> bool(all([]) and [])
False
instead of True for i in x if i>260, do i>260 for i in x
i>260 will result in a boolean: True, or False for each element.

How can I check that a Python list contains only True and then only False using one or two lines?

I would like to only allow lists where the first contiguous group of elements are True and then all of the remaining elements are False. I want lists like these examples to return True:
[True]
[False]
[True, False]
[True, False, False]
[True, True, True, False]
And lists like these to return False:
[False, True]
[True, False, True]
I am currently using this function, but I feel like there is probably a better way of doing this:
def my_function(x):
n_trues = sum(x)
should_be_true = x[:n_trues] # get the first n items
should_be_false = x[n_trues:len(x)] # get the remaining items
# return True only if all of the first n elements are True and the remaining
# elements are all False
return all(should_be_true) and all([not element for element in should_be_false])
Testing:
test_cases = [[True], [False],
[True, False],
[True, False, False],
[True, True, True, False],
[False, True],
[True, False, True]]
print([my_function(test_case) for test_case in test_cases])
# expected output: [True, True, True, True, True, False, False]
Is it possible to use a comprehension instead to make this a one/two line function? I know I could not define the two temporary lists and instead put their definitions in place of their names on the return line, but I think that would be too messy.
Method 1
You could use itertools.groupby. This would avoid doing multiple passes over the list and would also avoid creating the temp lists in the first place:
def check(x):
status = list(k for k, g in groupby(x))
return len(status) <= 2 and (status[0] is True or status[-1] is False)
This assumes that your input is non-empty and already all boolean. If that's not always the case, adjust accordingly:
def check(x):
status = list(k for k, g in groupby(map(book, x)))
return status and len(status) <= 2 and (status[0] or not status[-1])
If you want to have empty arrays evaluate to True, either special case it, or complicate the last line a bit more:
return not status or (len(status) <= 2 and (status[0] or not status[-1]))
Method 2
You can also do this in one pass using an iterator directly. This relies on the fact that any and all are guaranteed to short-circuit:
def check(x):
iterator = iter(x)
# process the true elements
all(iterator)
# check that there are no true elements left
return not any(iterator)
Personally, I think method 1 is total overkill. Method 2 is much nicer and simpler, and achieves the same goals faster. It also stops immediately if the test fails, rather than having to process the whole group. It also doesn't allocate any temporary lists at all, even for the group aggregation. Finally, it handles empty and non-boolean inputs out of the box.
Since I'm writing on mobile, here's an IDEOne link for verification: https://ideone.com/4MAYYa

How to pythonically select a random index from a 2D list such that the corresponding element matches a value?

I have a 2D list of booleans. I want to select a random index from the the list where the value is False. For example, given the following list:
[[True, False, False],
[True, True, True],
[False, True, True]]
The valid choices would be: [0, 1], [0, 2], and [2, 0].
I could keep a list of valid indices and then use random.choice to select from it, but it seems unpythonic to keep a variable and update it every time the underlying list changes for only this one purpose.
Bonus points if your answer runs quickly.
We can use a oneliner like:
import numpy as np
from random import choice
choice(np.argwhere(~a))
With a the array of booleans.
This works as follows: by using ~a, we negate the elements of the array. Next we use np.argwhere to construct a k×2-array: an array where every row has two elements: for every dimension the value such that the corresponding value has as value False.
By choice(..) we thus select a random row. We can however not use this directly to access the element. We can use the tuple(..) constructor to cast it to a tuple:
>>> tuple(choice(np.argwhere(~a)))
(2, 0)
You can thus fetch the element then with:
t = tuple(choice(np.argwhere(~a)))
a[t]
But of course, it is not a surprise that:
>>> t = tuple(choice(np.argwhere(~a)))
>>> a[t]
False
My non-numpy version:
result = random.choice([
(i,j)
for i in range(len(a))
for j in range(len(a[i]))
if not a[i][j]])
Like Willem's np version, this generates a list of valid tuples and invokes random.choice() to pick one.
Alternatively, if you hate seeing range(len(...)) as much as I do, here is an enumerate() version:
result = random.choice([
(i, j)
for i, row in enumerate(a)
for j, cell in enumerate(row)
if not cell])
Assuming you don't want to use numpy.
matrix = [[True, False, False],
[True, True, True],
[False, True, True]]
valid_choices = [(i,j) for i, x in enumerate(matrix) for j, y in enumerate(x) if not y]
random.choice(valid_choices)
With list comprehensions, you can change the if condition (if not y) to suit your needs. This will return the coordinate that is randomly selected, but optionally you could change the value part of the list comprehension (i,j) in this case to: y and it'd return false, though thats a bit redundant in this case.

itertools and strided list assignment

Given a list, e.g. x = [True]*20, I want to assign False to every other element.
x[::2] = False
raises TypeError: must assign iterable to extended slice
So I naively assumed you could do something like this:
x[::2] = itertools.repeat(False)
or
x[::2] = itertools.cycle([False])
However, as far as I can tell, this results in an infinite loop. Why is there an infinite loop? Is there an alternative approach that does not involve knowing the number of elements in the slice before assignment?
EDIT: I understand that x[::2] = [False] * len(x)/2 works in this case, or you can come up with an expression for the multiplier on the right side in the more general case. I'm trying to understand what causes itertools to cycle indefinitely and why list assignment behaves differently from numpy array assignment. I think there must be something fundamental about python I'm misunderstanding. I was also thinking originally there might be performance reasons to prefer itertools to list comprehension or creating another n-element list.
What you are attempting to do in this code is not what you think (i suspect)
for instance:
x[::2] will return a slice containing every odd element of x, since x is of size 20,
the slice will be of size 10, but you are trying to assign a non-iterable of size 1 to it.
to successfully use the code you have you will need to do:
x = [True]*20
x[::2] = [False]*10
wich will assign an iterable of size 10 to a slice of size 10.
Why work in the dark with the number of elements? use
len(x[::2])
which would be equal to 10, and then use
x[::2] = [False]*len(x[::2])
you could also do something like:
x = [True if (index & 0x1 == 0) else False for index, element in enumerate(x)]
EDIT: Due to OP edit
The documentation on cycle says it Repeats indefinitely. which means it will continuously 'cycle' through the iterator it has been given.
Repeat has a similar implementation, however documentation states that it
Runs indefinitely unless the times argument is specified.
which has not been done in the questions code. Thus both will lead to infinite loops.
About the itertools being faster comment. Yes itertools are generally faster than other implementations because they are optimised to be as fast as the creators could make them.
However if you do not want to recreate a list you can use generator expressions such as the following:
x = (True if (index & 0x1 == 0) else False for index, element in enumerate(x))
which do not store all of their elements in memory but produce them as they are needed, however, generator functions can be used up.
for instance:
x = [True]*20
print(x)
y = (True if (index & 0x1 == 0) else False for index, element in enumerate(x))
print ([a for a in y])
print ([a for a in y])
will print x then the elements in the generator y, then a null list, because the generator has been used up.
As Mark Tolonen pointed out in a concise comment, the reason why your itertools attempts are cycling indefinitely is because, for the list assignment, python is checking the length of the right hand side.
Now to really dig in...
When you say:
x[::2] = itertools.repeat(False)
The left hand side (x[::2]) is a list, and you are assigning a value to a list where the value is the itertools.repeat(False) iterable, which will iterate forever since it wasn't given a length (as per the docs).
If you dig into the list assignment code in the cPython implementation, you'll find the unfortunately/painfully named function list_ass_slice, which is at the root of a lot of list assignment stuff. In that code you'll see this segment:
v_as_SF = PySequence_Fast(v, "can only assign an iterable");
if(v_as_SF == NULL)
goto Error;
n = PySequence_Fast_GET_SIZE(v_as_SF);
Here it is trying to get the length (n) of the iterable you are assigning to the list. However, before even getting there it is getting stuck on PySequence_Fast, where it ends up trying to convert your iterable to a list (with PySequence_List), within which it ultimately creates an empty list and tries to simply extend it with your iterable.
To extend the list with the iterable, it uses listextend(), and in there you'll see the root of the problem:
/* Run iterator to exhaustion. */
for (;;) {
and there you go.
Or least I think so... :) It was an interesting question so I thought I'd have some fun and dig through the source to see what was up, and ended up there.
As to the different behaviour with numpy arrays, it will simply be a difference in how the numpy.array assignments are handled.
Note that using itertools.repeat doesn't work in numpy, but it doesn't hang up (I didn't check the implementation to figure out why):
>>> import numpy, itertools
>>> x = numpy.ones(10,dtype='bool')
>>> x[::2] = itertools.repeat(False)
>>> x
array([ True, True, True, True, True, True, True, True, True, True], dtype=bool)
>>> #but the scalar assignment does work as advertised...
>>> x = numpy.ones(10,dtype='bool')
>>> x[::2] = False
>>> x
array([False, True, False, True, False, True, False, True, False, True], dtype=bool)
Try this:
l = len(x)
x[::2] = itertools.repeat(False, l/2 if l % 2 == 0 else (l/2)+1)
Your original solution ends up in an infinite loop because that's what repeat is supposed to do, from the documentation:
Make an iterator that returns object over and over again. Runs indefinitely unless the times argument is specified.
The slice x[::2] is exactly len(x)/2 elements long, so you could achieve what you want with:
x[::2] = [False]*(len(x)/2)
The itertools.repeat and itertools.cycle methods are designed to yield values infinitely. However you can specify a limit on repeat(). Like this:
x[::2] = itertools.repeat(False, len(x)/2)
The right hand side of an extended slice assignment needs to be an iterable of the right size (ten, in this case).
Here is it with a regular list on the righthand side:
>>> x = [True] * 20
>>> x[::2] = [False] * 10
>>> x
[False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True]
And here it is with itertools.repeat on the righthand side.
>>> from itertools import repeat
>>> x = [True] * 20
>>> x[::2] = repeat(False, 10)
>>> x
[False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True, False, True]

Categories

Resources