I have a list of Booleans:
[True, True, False, False, False, True]
and I am looking for a way to count the number of True in the list (so in the example above, I want the return to be 3.) I have found examples of looking for the number of occurrences of specific elements, but is there a more efficient way to do it since I'm working with Booleans? I'm thinking of something analogous to all or any.
True is equal to 1.
>>> sum([True, True, False, False, False, True])
3
list has a count method:
>>> [True,True,False].count(True)
2
This is actually more efficient than sum, as well as being more explicit about the intent, so there's no reason to use sum:
In [1]: import random
In [2]: x = [random.choice([True, False]) for i in range(100)]
In [3]: %timeit x.count(True)
970 ns ± 41.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit sum(x)
1.72 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
If you are only concerned with the constant True, a simple sum is fine. However, keep in mind that in Python other values evaluate as True as well. A more robust solution would be to use the bool builtin:
>>> l = [1, 2, True, False]
>>> sum(bool(x) for x in l)
3
UPDATE: Here's another similarly robust solution that has the advantage of being more transparent:
>>> sum(1 for x in l if x)
3
P.S. Python trivia: True could be true without being 1. Warning: do not try this at work!
>>> True = 2
>>> if True: print('true')
...
true
>>> l = [True, True, False, True]
>>> sum(l)
6
>>> sum(bool(x) for x in l)
3
>>> sum(1 for x in l if x)
3
Much more evil:
True = False
You can use sum():
>>> sum([True, True, False, False, False, True])
3
After reading all the answers and comments on this question, I thought to do a small experiment.
I generated 50,000 random booleans and called sum and count on them.
Here are my results:
>>> a = [bool(random.getrandbits(1)) for x in range(50000)]
>>> len(a)
50000
>>> a.count(False)
24884
>>> a.count(True)
25116
>>> def count_it(a):
... curr = time.time()
... counting = a.count(True)
... print("Count it = " + str(time.time() - curr))
... return counting
...
>>> def sum_it(a):
... curr = time.time()
... counting = sum(a)
... print("Sum it = " + str(time.time() - curr))
... return counting
...
>>> count_it(a)
Count it = 0.00121307373046875
25015
>>> sum_it(a)
Sum it = 0.004102230072021484
25015
Just to be sure, I repeated it several more times:
>>> count_it(a)
Count it = 0.0013530254364013672
25015
>>> count_it(a)
Count it = 0.0014507770538330078
25015
>>> count_it(a)
Count it = 0.0013344287872314453
25015
>>> sum_it(a)
Sum it = 0.003480195999145508
25015
>>> sum_it(a)
Sum it = 0.0035257339477539062
25015
>>> sum_it(a)
Sum it = 0.003350496292114258
25015
>>> sum_it(a)
Sum it = 0.003744363784790039
25015
And as you can see, count is 3 times faster than sum. So I would suggest to use count as I did in count_it.
Python version: 3.6.7
CPU cores: 4
RAM size: 16 GB
OS: Ubuntu 18.04.1 LTS
Just for completeness' sake (sum is usually preferable), I wanted to mention that we can also use filter to get the truthy values. In the usual case, filter accepts a function as the first argument, but if you pass it None, it will filter for all "truthy" values. This feature is somewhat surprising, but is well documented and works in both Python 2 and 3.
The difference between the versions, is that in Python 2 filter returns a list, so we can use len:
>>> bool_list = [True, True, False, False, False, True]
>>> filter(None, bool_list)
[True, True, True]
>>> len(filter(None, bool_list))
3
But in Python 3, filter returns an iterator, so we can't use len, and if we want to avoid using sum (for any reason) we need to resort to converting the iterator to a list (which makes this much less pretty):
>>> bool_list = [True, True, False, False, False, True]
>>> filter(None, bool_list)
<builtins.filter at 0x7f64feba5710>
>>> list(filter(None, bool_list))
[True, True, True]
>>> len(list(filter(None, bool_list)))
3
It is safer to run through bool first. This is easily done:
>>> sum(map(bool,[True, True, False, False, False, True]))
3
Then you will catch everything that Python considers True or False into the appropriate bucket:
>>> allTrue=[True, not False, True+1,'0', ' ', 1, [0], {0:0}, set([0])]
>>> list(map(bool,allTrue))
[True, True, True, True, True, True, True, True, True]
If you prefer, you can use a comprehension:
>>> allFalse=['',[],{},False,0,set(),(), not True, True-1]
>>> [bool(i) for i in allFalse]
[False, False, False, False, False, False, False, False, False]
I prefer len([b for b in boollist if b is True]) (or the generator-expression equivalent), as it's quite self-explanatory. Less 'magical' than the answer proposed by Ignacio Vazquez-Abrams.
Alternatively, you can do this, which still assumes that bool is convertable to int, but makes no assumptions about the value of True:
ntrue = sum(boollist) / int(True)
Related
Trying to convert a list of 1/0s to a list of boolean.
bool([1,0,1,0]) doesn't seem to work.
[1,0,1,0] == 1 doesn't work.
Is there another way (hopefully non-list comprehension)?
In python 2
bool_list = map(bool,int_list)
In python 3:
bool_list = list(map(bool,int_list))
[x==1 for x in list]
is a general approach that you can easily adapt should you ever have a list with entries other than 0 or 1.
The solution using numpy module:
import numpy as np
l = np.array([1,0,1,0])
print((l > 0).tolist())
The output:
[True, False, True, False]
l > 0 - each element of the array l is tested, if it's bigger than 0. The results of these tests are the Boolean elements of the result array
This approach is also quite good when dealing with multidimensional arrays:
l = np.array([[1,0,1,0], [1,1,1,1], [0,0,1,0]])
print((l > 0).tolist())
The output:
[[True, False, True, False], [True, True, True, True], [False, False, True, False]]
Try this, using list comprehensions:
lst = [1, 0, 1, 0]
[bool(x) for x in lst]
=> [True, False, True, False]
In terms of readability, bool() and map() are better options. In terms of speed, they are about half as fast:
For Python 3.5.2 (repl.it):
[not not x for x in lst]
is faster than the other three options, although only slightly faster than
[x==1 for x in lst]
For Python 2.7.10 (repl.it):
x==1 edges out not not, but both are still twice as fast as the other two.
not not also has the advantage over x==1 in that it will apply to all values, not just 0 and 1
I run through the following sequence of statements:
>>> a = range(10)
>>> min(a, key=lambda x: x < 5.3)
6
>>> max(a, key=lambda x: x < 5.3)
0
The min and max give the exact opposite of what I was expecting.
The python documentation on min and max is pretty sketchy.
Can anyone explain to me how the "key" works?
Explanation of the key argument
Key works like this:
a_list = ['apple', 'banana', 'canary', 'doll', 'elephant']
min(a_list, key=len)
returns 'doll', and
max(a_list, key=len)
returns 'elephant'
You provide it with a function, and it uses the minimum or maximum of the results of the function applied to each item to determine which item to return in the result.
Application
If your function returns a boolean, like yours, for min it'll return the first of the minimum of True or False, (which is False) which would be 6 or the first of the max (True) which would be 0.
To see this:
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> import pprint
>>> pprint.pprint(dict((i, i<5.3) for i in a))
{0: True,
1: True,
2: True,
3: True,
4: True,
5: True,
6: False,
7: False,
8: False,
9: False}
Why?
>>> min([True, False])
False
>>> max([True, False])
True
Explanation
Why is True greater than False?
>>> True == 1
True
>>> False == 0
True
>>> issubclass(bool, int)
True
It turns out that True and False are very closely related to 1 and 0. They even evaluate the same respectively.
because all values are either True or False. So the first one of each is chosen.
min([True, 1, False, 0])
will return False instead of 0 because both have the same value but False happens first
Your code uses a key function with only boolean values so, for min, the first False
happens at the number 6. So it is the chosen one.
What you probably want to do is:
min(x for x in range(10) if x < 5.3)
This is a followup question to a question I posted here, but it's a very different question, so I thought I would post it separately.
I have a Python script which reads an very large array, and I needed to optimize an operation on each element (see referenced SO question). I now need to split the output array into two separate arrays.
I have the code:
output = [True if (len(element_in_array) % 2) else False for element_in_array in master_list]
which outputs an array of length len(master_list) consisting of True or False, depending on if the length of element_in_array is odd or even. My problem is that I need to split master_list into two arrays: one array containing the element_in_array's that correspond to the True elements in output and another containing the element_in_array's corresponding to the False elements in output.
This can clearly be done with traditional array operators such as append, but I need this to be as optimized and as fast as possible. I have many millions of elements in my master_list, so is there a way to accomplish this without directly looping through master_list and using append to create two new arrays.
Any advice would be greatly appreciated.
Thanks!
You can use itertools.compress:
>>> from itertools import compress, imap
>>> import operator
>>> lis = range(10)
>>> output = [random.choice([True, False]) for _ in xrange(10)]
>>> output
[True, True, False, False, False, False, False, False, False, False]
>>> truthy = list(compress(lis, output))
>>> truthy
[0, 1]
>>> falsy = list(compress(lis, imap(operator.not_,output)))
>>> falsy
[2, 3, 4, 5, 6, 7, 8, 9]
Go for NumPy if you want even faster solution, plus it also allows us to do array filtering based on boolean arrays:
>>> import numpy as np
>>> a = np.random.random(10)*10
>>> a
array([ 2.94518349, 0.09536957, 8.74605883, 4.05063779, 2.11192606,
2.24215582, 7.02203768, 2.1267423 , 7.6526713 , 3.81429322])
>>> output = np.array([True, True, False, False, False, False, False, False, False, False])
>>> a[output]
array([ 2.94518349, 0.09536957])
>>> a[~output]
array([ 8.74605883, 4.05063779, 2.11192606, 2.24215582, 7.02203768,
2.1267423 , 7.6526713 , 3.81429322])
Timing comparison:
>>> lis = range(1000)
>>> output = [random.choice([True, False]) for _ in xrange(1000)]
>>> a = np.random.random(1000)*100
>>> output_n = np.array(output)
>>> %timeit list(compress(lis, output))
10000 loops, best of 3: 44.9 us per loop
>>> %timeit a[output_n]
10000 loops, best of 3: 20.9 us per loop
>>> %timeit list(compress(lis, imap(operator.not_,output)))
1000 loops, best of 3: 150 us per loop
>>> %timeit a[~output_n]
10000 loops, best of 3: 28.7 us per loop
If you can use NumPy, this will be a lot simpler. And, as a bonus, it'll also be a lot faster, and it'll use a lot less memory to store your giant array. For example:
>>> import numpy as np
>>> import random
>>> # create an array of 1000 arrays of length 1-1000
>>> a = np.array([np.random.random(random.randint(1, 1000))
for _ in range(1000)])
>>> lengths = np.vectorize(len)(a)
>>> even_flags = lengths % 2 == 0
>>> evens, odds = a[even_flags], a[~even_flags]
>>> len(evens), len(odds)
(502, 498)
You could try using the groupby function in itertools. The key function would be the function that determines if the length of an element is even or not. The iterator returned by groupby consists of key-value tuples, where key is a value returned by the key function (here, True or False) and the value is a sequence of items which all share
the same key. Create a dictionary which maps a value returned by the key function to a list, and you can extend the appropriate list with a set of values from the initial iterator.
trues = []
falses = []
d = { True: trues, False: falses }
def has_even_length(element_in_array):
return len(element_in_array) % 2 == 0
for k, v in itertools.groupby(master_list, has_even_length):
d[k].extend(v)
The documentation for groupby says you typically want to make sure the list is sorted on the same key returned by the key function. In this case, it's OK to leave it unsorted; you'll just have more than things returned by the iterator returned by groupby, as there could be an a number of alternating true/false sets in the sequence.
This question already has an answer here:
numpy: search of the first and last index in an array
(1 answer)
Closed 9 years ago.
The following question can easily be solved with a loop, but I suspect that there may be a more pythonic way of acheiving this.
In essence, I have an iterable of booleans that tend to be clustered into groups. Here's an illustrative example:
[True, True, True, True, False, False, False, True, True, True, True, True]
I'd like to pick out the beginning index and end index for each cluster of Trues. Using a loop, this is easy -- each time my iterator is a True, I simply need to check if I'm already in a True cluster. If not, I set the in_true_cluster variable to true and store the index. Once I find a False, I store index - 1 as the end point.
Is there a more pythonic way of doing this? Note that I'm using PANDAS and NumPy as well, so solutions using logical indexing are acceptable.
Actually, here's a numpy way, which should be faster than doing it with itertools or a manual loop:
>>> a = np.array([True, True, True, True, False, False, False, True, True, True, True, True])
>>> np.diff(a)
array([False, False, False, True, False, False, True, False, False,
False, False], dtype=bool)
>>> _.nonzero()
(array([3, 6]),)
As you mention in the comments, pandas's groupby would also work.
Timings to convince #poke this is worthwhile:
>>> %%timeit a = np.random.randint(2, size=1000000)
... np.diff(a).nonzero()
...
100 loops, best of 3: 12.2 ms per loop
>>> def cluster_changes(array):
... changes = []
... last = None
... for i, elt in enumerate(array):
... if elt != last:
... last = elt
... changes.append(i)
... return changes
...
>>> %%timeit a = np.random.randint(2, size=1000000)
cluster_changes(a)
...
1 loops, best of 3: 348 ms per loop
That's a factor of 30 on this array using the one-liner, as compared to the 7-line manual function. (Of course, the data here has many more cluster changes than OP's data, but that's not going to make up for such a big difference.)
How about:
In [25]: l = [True, True, True, True, False, False, False, True, True, True, True, True]
In [26]: d = np.diff(np.array([False] + l + [False], dtype=np.int))
In [28]: zip(np.where(d == 1)[0], np.where(d == -1)[0] - 1)
Out[28]: [(0, 3), (7, 11)]
Here, the two runs are at indices [0; 3] and [7; 11].
I try:
[True,True,False] and [True,True,True]
and get
[True, True True]
but
[True,True,True] and [True,True,False]
gives
[True,True,False]
Not too sure why it's giving those strange results, even after taking a look at some other python boolean comparison questions. Integer does the same (replace True -> 1 and False ->0 above and the results are the same). What am I missing? I obviously want
[True,True,False] and [True,True,True]
to evaluate to
[True,True,False]
Others have explained what's going on. Here are some ways to get what you want:
>>> a = [True, True, True]
>>> b = [True, True, False]
Use a listcomp:
>>> [ai and bi for ai,bi in zip(a,b)]
[True, True, False]
Use the and_ function with a map:
>>> from operator import and_
>>> map(and_, a, b)
[True, True, False]
Or my preferred way (although this does require numpy):
>>> from numpy import array
>>> a = array([True, True, True])
>>> b = array([True, True, False])
>>> a & b
array([ True, True, False], dtype=bool)
>>> a | b
array([ True, True, True], dtype=bool)
>>> a ^ b
array([False, False, True], dtype=bool)
Any populated list evaluates to True. True and x produces x, the second list.
From the Python documentation:
The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned.
You're getting the second value returned.
P.S. I had never seen this behavior before either, I had to look it up myself. My naive expectation was that a boolean expression would yield a boolean result.
and returns the last element if they all are evaluated to True.
>>> 1 and 2 and 3
3
The same is valid for lists, which are evalueted to True if they are not empty (as in your case).
[True, True, False] is being evaluated as a boolean (because of the and operator), and evaluates to True since it is non-empty. Same with [True, True, True]. The result of either statement is then just whatever is after the and operator.
You could do something like [ai and bi for ai, bi in zip(a, b)] for lists a and b.
As far as I know, you need to zip through the list. Try a list comprehension of this sort:
l1 = [True,True,False]
l2 = [True,True,True]
res = [ x and y for (x,y) in zip(l1, l2)]
print res
Python works by short-circuiting its boolean and gives the result expression as the result.
A populated list evaluates to true and gives the result as the value of the second list. Look at this, when I just interchanged the position of your first and second list.
In [3]: [True,True,True] and [True, True, False]
Out[3]: [True, True, False]