I have a numpy 1d array with boolean values that looks like
array_in = np.array([False, True, True, True, False, False, True, True, False])
These arrays have different lengths. As you can see, there are parts where True values are located next to each other, so we have groups of Trues and groups of Falses. I want to count the number of True groups. For our case, we have
N = 2
I tried to do some loops with conditions, but it got really messy and confusing.
You can use np.diff to detect the changes between groups: on a boolean array it marks every position where the value flips. By attaching False to the start and the end before taking the difference, we make sure that True groups at the very start and end are properly counted. Every True group then produces exactly two changes (one entering the group, one leaving it), so integer-dividing the number of changes by two gives the number of groups.
import numpy as np
array_in = np.array([False, True, True, True, False, False, True, True, False, True, False, True])
true_groups = np.sum(np.diff(array_in, prepend=False, append=False))//2
print(true_groups)
# 4
If you don't want to write loops and conditions, you could take a shortcut by looking at this like a connected components problem.
import numpy as np
from skimage import measure
array_in = np.array([False, True, True, True, False, False, True, True, False])
N = max(measure.label(array_in))
When an array is passed into the measure.label() function, it treats the 0 values in that array as the "background". Then it looks at all the non-zero values, finds connected regions, and numbers them.
For example, the label output on the above array will be [0, 1, 1, 1, 0, 0, 2, 2, 0]. Naturally, then doing a simple max on the output gives you the largest group number (here it's 2) -- which is also the same as the number of True groups.
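To see the labelling in action (a small sanity check; note that label returns a numpy array):

import numpy as np
from skimage import measure

array_in = np.array([False, True, True, True, False, False, True, True, False])
labels = measure.label(array_in)  # each True group gets its own integer label
print(labels)        # [0 1 1 1 0 0 2 2 0]
print(labels.max())  # 2 -> the number of True groups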
A straightforward way of finding the number of groups of True is by counting the number of False, True transitions in the array. Note that a group starting at the very first element has no preceding False, so x[0] has to be added to cover that case. With a list comprehension, that will look like:
sum(1 for i in range(1, len(x)) if x[i] and not x[i-1]) + x[0]
Alternatively, you can convert the initial array into a string of '0's and '1's and count the number of '01' occurrences, prepending a '0' so a leading group is counted too:
('0' + ''.join(str(int(k)) for k in x)).count('01')
In native numpy, a vectorised solution can be done by checking how many times False changes to True sequentially.
For example,
np.bitwise_and(array_in[1:], ~array_in[:-1]).sum() + array_in[0]  # -> 2 for the array in the question
I am computing a bitwise AND of every element of the array with the negation of its previous element. In doing so, the first element is ignored, so I add it in manually: it starts a group whenever it is True.
Given a random list length, how can I efficiently get all possible sequences of boolean values, except sequences that are all True or all False?
For instance, given the number 3 it should return something like the following.
[[True, False, False],
 [True, True, False],
 [True, False, True],
 [False, True, False],
 [False, True, True],
 [False, False, True]]
Is there already a known function that does this?
The order that it returns the sequences in is not important. I mainly just need a number of how many sequences are possible for a given list length.
This is mostly a maths question, unless you need the sequences themselves. If you do, there is a neat python solution:
from itertools import product
list(product((True, False), repeat=3))[1:-1]
The list will contain all possible sequences, but we don't want (True, True, True) and (False, False, False). Conveniently, these are the first and last elements respectively, so we can simply discard them by slicing from 1 to -1.
For sequences with different lengths, just change the "repeat" optional argument of the itertools.product function.
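For example, a quick check (my addition) that a length of 4 yields 14 sequences:

from itertools import product
sequences = list(product((True, False), repeat=4))[1:-1]
print(len(sequences))  # 14, i.e. 2**4 - 2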
You don't need a function to determine this. Simple math will do the trick.
2**n - 2
2 because there are only two options (True/False)
n is your list length
-2 because you want to exclude the all True and all False results
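For the example above, n = 3 gives 2**3 - 2 = 8 - 2 = 6, matching the six sequences listed in the question.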
This is more of a maths question, but here goes:
The total number of options is equal to the product of the number of options per position. So, if you receive 3 as input:
index[0] could be True or False - 2 options
index[1] could be True or False - 2 options
index[2] could be True or False - 2 options
That gives 2 * 2 * 2 = 8 options in total; excluding the all-True and all-False results leaves 8 - 2 = 6.
Given a numpy array:
x = np.array([False, True, True, False, False, False, False, False, True, False])
How do I find the number of times the values transitions from False to True?
For the above example, the answer would be 2. I don't want to include transitions from True to False in the count.
From the answers to How do I identify sequences of values in a boolean array?, the following produces the indices at which the values are about to change, which is not what I want as this includes True-False transitions.
np.argwhere(np.diff(x)).squeeze()
# [0 2 7 8]
I know that this can be done by looping through the array, however I was wondering if there was a faster way to do this?
Get one-off slices - x[:-1] (from the first element to the second-to-last) and x[1:] (from the second element to the end) - then look for where the first slice is less than the second one, i.e. catch the pattern [False, True], and finally get the count with ndarray.sum() or np.count_nonzero() -
(x[:-1] < x[1:]).sum()             # -> 2 for the example x
np.count_nonzero(x[:-1] < x[1:])   # -> 2
Another way would be to look for the first slice being False and the second one True, the idea again being to catch that pattern of [False, True] -
(~x[:-1] & x[1:]).sum()            # -> 2
np.count_nonzero(~x[:-1] & x[1:])  # -> 2
I kind of like to use numpy's roll method for this kind of problem...
roll rotates the array by some step length, to the left (-1, -2, ...) or to the right (1, 2, ...):
import numpy as np
np.roll(x,-1)
...this will give x but shifted one step to the left:
array([ True, True, False, False, False, False, False, True, False, False],
dtype=bool)
A False followed by a True can then be expressed as:
~x & np.roll(x,-1)
array([ True, False, False, False, False, False, False, True, False, False],
dtype=bool)
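To get the count the question asks for, sum that mask (this final step is my addition). One caveat: roll wraps around, so the last element is compared against the first; if your array could end in False and begin with True, drop that last position before summing:

(~x & np.roll(x, -1)).sum()        # 2 for the example x
(~x & np.roll(x, -1))[:-1].sum()   # wrap-safe variant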
I have a bool array that was created with respect to a double array:
array1 = ... # the double array initialization
array2 = array1 < threshold # threshold is set somewhere else
Assuming the output of my second array is like this:
# array2 = [True, False, True, True, True, False]
I want to select a percentage of the True items. For example, if I want to randomly select 75% of the True items, the output could be any of these:
# array3 = [True, False, True, True, False, False]
# array3 = [False, False, True, True, True, False]
# array3 = [True, False, False, True, True, False]
The third array contains 3 out of the 4 True items that were found in the second array. How can I achieve this?
So, that is actually just the job of:
getting all the indices of True in your vector -> true_indices
shuffling true_indices
keeping the first 75% of them and building a new boolean array from those indices
In code (using a numpy array for array3 so the index assignment works):
import numpy as np
true_indices = np.flatnonzero(array2)                     # indices of the True entries
np.random.shuffle(true_indices)                           # random order
true_indices = true_indices[:len(true_indices) * 3 // 4]  # keep 75%
array3 = np.zeros(len(array2), dtype=bool)                # start all False
array3[true_indices] = True                               # set the kept indices back to True
Done. All these "I need to randomly pick a fixed amount from a set" problems usually convert nicely to a shuffling method.
Numpy comes with a shuffle function.
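As a side note (my addition, not part of the original answer), numpy can also draw the sample directly with np.random.choice, which skips the explicit shuffle:

import numpy as np
true_indices = np.flatnonzero(array2)
keep = np.random.choice(true_indices, size=len(true_indices) * 3 // 4, replace=False)
array3 = np.zeros(len(array2), dtype=bool)
array3[keep] = True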
I have a python list of ints, a. I also have another list, b, whose elements are tuples of 2 values (c, d). I need to see if any elements of a have values that are between any of the tuple elements of b.
I think there is a way to do this using map(), but I can't figure out how to pass in the values of my tuple list.
For example, my data structure looks like:
a = [1, 2, 3, 4, 5, 6, 7]
b = [(12,14), (54, 78), (2,3), (9,11)]
I am trying to find out if any of the elements in a have values between any of the tuple elements of b. In the above case, 2 and 3 (from a) are inside (inclusive) of the tuple (2,3) in b. So my final answer would be True.
Does anyone have any idea how to do this in a performant way? Right now, I am looping through each element of a and then looping through each element of b. This is OK for small amounts of data, but my arrays are quite large and this step takes way too long.
Did you want this?
[c in range(k[0], k[1]+1) for c in a for k in b]
returns:
[False, False, False, False,  # is 1 in any of the tuple ranges in b?
 False, False, True, False,   # is 2 in any of the tuple ranges in b?
 False, False, True, False,   # etc.
 False, False, False, False,
 False, False, False, False,
 False, False, False, False,
 False, False, False, False]  # each element of a checked against every range in b
If you wanted to do something else, like checking each range of b against every element of a first, then swap the order of the for clauses:
[c in range(k[0], k[1]+1) for k in b for c in a]
returns:
[False, False, False, False, False, False, False,  # does b[0]'s range cover any of 1-7?
 False, False, False, False, False, False, False,  # etc.
 False, True, True, False, False, False, False,
 False, False, False, False, False, False, False]
I assumed this is what you didn't want, but thought I would post it anyway.
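Since the question only needs a single True/False, you could also (my addition) wrap the check in any(), which short-circuits on the first match instead of building the full list:

any(c in range(k[0], k[1] + 1) for c in a for k in b)  # True for the example data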
If the (c, d) values are restricted to a certain range (say 0-100), you could precompute a boolean array of allowed values and compare a against it with a simple index lookup.
If the values are not restricted, or the range would be too large, put the values of b into a sorted data structure. Then you can look up elements of a against it quickly, without needing to go through the whole list every time. While building this lookup structure, you would have to watch out for overlapping ranges and merge them.
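A minimal sketch of the lookup-table idea, assuming non-negative integer values with a known upper bound (100 here):

import numpy as np

a = [1, 2, 3, 4, 5, 6, 7]
b = [(12, 14), (54, 78), (2, 3), (9, 11)]

allowed = np.zeros(101, dtype=bool)  # assumed bound: all values fall in 0-100
for lo, hi in b:
    allowed[lo:hi + 1] = True        # mark every value covered by a range

print(allowed[np.array(a)].any())    # True - 2 and 3 fall inside (2, 3)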
If you sort first you can avoid checking every value in a against every tuple in b. The tuples that have already been checked against lower values of a can be discarded, which will make the check much faster.
def check_value_in_ranges(a, b):
    a = sorted(set(a))
    b = sorted(set(b), reverse=True)
    lower, upper = b.pop()
    for value in a:
        while value >= lower:
            if value <= upper:
                return True
            elif not b:
                return False  # no tuples left to check against
            lower, upper = b.pop()
    return False  # no values of a left to check
I think this works whether the tuples are overlapping or not.
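For example, with the data from the question:

a = [1, 2, 3, 4, 5, 6, 7]
b = [(12, 14), (54, 78), (2, 3), (9, 11)]
print(check_value_in_ranges(a, b))  # True - 2 falls inside (2, 3)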
I would like to determine the sum of a two dimensional numpy array. However, elements with a certain value I want to exclude from this summation. What is the most efficient way to do this?
For example, here I initialize a two dimensional numpy array of 1s and replace several of them by 2:
import numpy
data_set = numpy.ones((10, 10))
data_set[4][4] = 2
data_set[5][5] = 2
data_set[6][6] = 2
How can I sum over the elements in my two dimensional array while excluding all of the 2s? Note that with the 10 by 10 array the correct answer should be 97 as I replaced three elements with the value 2.
I know I can do this with nested for loops. For example:
elements = []
for idx_x in range(data_set.shape[0]):
    for idx_y in range(data_set.shape[1]):
        if data_set[idx_x][idx_y] != 2:
            elements.append(data_set[idx_x][idx_y])
data_set_sum = numpy.sum(elements)
However on my actual data (which is very large) this is too slow. What is the correct way of doing this?
Use numpy's capability of indexing with boolean arrays. In the example below, data_set != 2 evaluates to a boolean array which is True wherever the element is not 2 (and has the correct shape). So data_set[data_set != 2] is a fast and convenient way to get an array which doesn't contain a certain value. Of course, the boolean expression can be more complex.
In [1]: import numpy as np
In [2]: data_set = np.ones((10, 10))
In [4]: data_set[4,4] = 2
In [5]: data_set[5,5] = 2
In [6]: data_set[6,6] = 2
In [7]: data_set[data_set != 2].sum()
Out[7]: 97.0
In [8]: data_set != 2
Out[8]:
array([[ True, True, True, True, True, True, True, True, True,
True],
[ True, True, True, True, True, True, True, True, True,
True],
...
[ True, True, True, True, True, True, True, True, True,
True]], dtype=bool)
Without numpy, the solution is not much more complex:
x = [1,2,3,4,5,6,7]
sum(y for y in x if y != 7)
# 21
Works for a list of excluded values too:
# set is faster for resolving `in`
exl = set([1,2,3])
sum(y for y in x if y not in exl)
# 22
Using np.sum's where= argument, we avoid the need for array copying that would otherwise be triggered by using advanced array indexing:
>>> import numpy as np
>>> data_set = np.ones((10,10))
>>> data_set[(4,5,6),(4,5,6)] = 2
>>> np.sum(data_set, where=data_set != 2)
97.0
>>> data_set.sum(where=data_set != 2)
97.0
https://numpy.org/doc/stable/reference/generated/numpy.sum.html
Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).
https://numpy.org/doc/stable/user/basics.indexing.html#advanced-indexing
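A quick way to see that copy happen (my addition) is np.shares_memory:

import numpy as np
data_set = np.ones((10, 10))
view = data_set[:5]                       # basic slicing -> a view
fancy = data_set[data_set != 2]           # boolean (advanced) indexing -> a copy
print(np.shares_memory(data_set, view))   # True
print(np.shares_memory(data_set, fancy))  # False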
How about this way, which makes use of numpy's boolean indexing?
We simply set all the values that meet the specification to zero before taking the sum; that way we don't change the shape of the array, as we would if we filtered them out.
The other benefit is that we can still sum along an axis after the filter is applied.
import numpy
data_set = numpy.ones((10, 10))
data_set[4][4] = 2
data_set[5][5] = 2
data_set[6][6] = 2
print "Sum", data_set.sum()
another_set = numpy.array(data_set) # Take a copy, we'll need that later
data_set[data_set == 2] = 0 # Set all the values that are 2 to zero
print "Filtered sum", data_set.sum()
print "Along axis", data_set.sum(0), data_set.sum(1)
Equally we could use any other boolean to set the data we wish to exclude from the sum.
another_set[(another_set > 1) & (another_set < 3)] = 0
print "Another filtered sum", another_set.sum()