I have this function:
if elem < 0:
    elem = 0
else:
    elem = 1
I want to apply this function to every element in a NumPy array. With for loops I could do this for an array of known, fixed dimensions, but in this case I need it to work regardless of the array's dimensions and shape. Is there any way this can be achieved in Python with NumPy?
Or is there any general way to apply an arbitrary def to every element of a NumPy n-dimensional array?
Isn't it simply:
arr = (arr >= 0).astype(int)
Use np.where:
np.where(arr < 0, 0, 1)
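Both one-liners work for arrays of any dimensionality, since the elementwise comparison broadcasts over the whole array. A minimal sketch (the sample values are mine):
import numpy as np

arr = np.array([[-2.0, 0.5], [3.0, -0.1]])  # any shape works the same way
print((arr >= 0).astype(int))   # [[0 1]
                                #  [1 0]]
print(np.where(arr < 0, 0, 1))  # same result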
You can use a boolean mask to define an array of decisions. Let's work through a concrete example. You have an array of positive and negative numbers and you want to take the square root only at non-negative locations:
arr = np.random.normal(size=100)
You compute a mask like
mask = arr >= 0
The most straightforward way to apply the mask is to create an output array, and fill in the required elements:
result = np.empty(arr.shape)
result[mask] = np.sqrt(arr[mask])
result[~mask] = arr[~mask]
This is not super efficient, because you have to compute the inverse of the mask and apply it multiple times. For this specific example, you can take advantage of the fact that np.sqrt is a ufunc and use its where keyword:
result = arr.copy()
np.sqrt(arr, where=mask, out=result)
One popular way to apply the mask would be to use np.where, but I specifically constructed this example to show the caveats. The simplistic approach would be to compute
result = np.where(mask, np.sqrt(arr), arr)
where chooses the value from either np.sqrt(arr) or arr depending on whether mask is truthy or not. This is a very good method in many cases, but you have to pre-compute the values for both branches, which is exactly what you want to avoid with a square root.
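A quick sketch contrasting the two variants (sample values mine). The np.where version evaluates np.sqrt over the full array first, so NumPy emits a RuntimeWarning for the negative entries even though those results are discarded:
import numpy as np

arr = np.array([4.0, -9.0, 16.0])
mask = arr >= 0

result = arr.copy()
np.sqrt(arr, where=mask, out=result)  # sqrt evaluated only where mask is True
print(result)  # [ 2. -9.  4.]

result2 = np.where(mask, np.sqrt(arr), arr)  # RuntimeWarning: invalid value encountered in sqrt
print(result2)  # [ 2. -9.  4.]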
TL;DR
Your specific example is looking for a representation of the mask itself. If you don't care about the type:
result = arr >= 0
If you do care about the type:
result = (arr >= 0).astype(int)
or, if the input is integer-valued:
result = np.clip(arr, -1, 0) + 1
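A quick check of the clip variant (a sketch; as noted, it assumes integer-valued input, since a float like -0.5 would survive the clip unchanged):
import numpy as np

arr = np.array([-3, -1, 0, 2])
print(np.clip(arr, -1, 0) + 1)  # [0 0 1 1]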
These solutions create a different array from the input. If you want to replace values in the same buffer:
mask = arr >= 0
arr[mask] = 1
arr[~mask] = 0
You can do something like this:
import numpy as np
a = np.array([-2, -1, 0, 1, 2])
a[a >= 0] = 1
a[a < 0] = 0
>>> a
array([0, 0, 1, 1, 1])
An alternative to the above solutions is to combine a list comprehension with a ternary conditional expression:
my_array = np.array([-1.2, 3.0, -10.11, 5.2])
sol = np.asarray([0 if val < 0 else 1 for val in my_array])
Take a look at these sources:
https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
https://book.pythontips.com/en/latest/ternary_operators.html
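Since the question asks about arbitrary dimensions, note that a plain comprehension only iterates over the first axis. For an n-dimensional array you could iterate arr.flat and reshape afterwards; a sketch under that assumption:
import numpy as np

my_array = np.array([[-1.2, 3.0], [-10.11, 5.2]])
# .flat yields scalars regardless of shape; reshape restores the original shape
sol = np.asarray([0 if val < 0 else 1 for val in my_array.flat]).reshape(my_array.shape)
print(sol)  # [[0 1]
            #  [0 1]]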
Use numpy.vectorize():
import numpy as np
def unit(elem):
    if elem < 0:
        return 0
    else:
        return 1
a = np.array([[1, 2, -0.5], [0.5, 2, 3]])
vfunc = np.vectorize(unit)
vfunc(a)
# array([[1, 1, 0], [1, 1, 1]])
Related
I want to apply the function any() to all the rows of a matrix at the same time.
If I use any() with a vector, of course it will return True (or 1 in my case) whenever any element would return True:
import numpy as np
print(any(np.array([0, 0, 0, 1])) * 1)
Now suppose I have a matrix instead. If I want to obtain a vector with 1 and 0 depending on whether each element of the matrix would return True when taken alone, I can do it with a for loop:
matrix = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]])
result = np.zeros(len(matrix)).astype('int')
i = 0
for line in matrix:
    result[i] = any(line)
    i += 1
print(result)
However, this method does not seem very practical, because the elements of the matrix are handled one at a time by the for loop. Is there a better way to extend any to a matrix input, so that it returns a vector of 1s and 0s as above?
Note that I do not want to use matrix.any(), because that just returns a single True or False value, whereas I want the test applied to each row individually.
numpy.any(matrix, axis=1)
numpy.any already has the functionality you want.
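A minimal demonstration with the matrix from the question (the astype cast is only needed if you want 1s and 0s rather than booleans):
import numpy as np

matrix = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]])
print(np.any(matrix, axis=1))              # [False  True  True]
print(np.any(matrix, axis=1).astype(int))  # [0 1 1]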
You can do this:
import numpy as np
matrix = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]])
matrix_sums = np.sum(matrix, axis=1)
are_truthy_matrix_sums = matrix_sums > 0
print(are_truthy_matrix_sums)
We use np.sum to simplify the matrix to a 1D array with the sums, before comparing these sums against 0 to see if there were any truthy values in these rows.
This prints:
[False True True]
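One caveat with the summing approach (a sketch; np.any does not have this problem): it assumes non-negative entries, since positive and negative values can cancel out in the sum:
import numpy as np

print(np.sum(np.array([[-1, 1]]), axis=1) > 0)  # [False] -- wrong, the row has truthy values
print(np.any(np.array([[-1, 1]]), axis=1))      # [ True]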
I have a 0,1 numpy array like this:
[0,0,0,1,1,1,0,0,1,1,0,0,0,1,1,1,1,0,0,0]
I want a function that tells me that the value 1 occurs in runs of 3, 2 and 4 consecutive elements in this array, respectively. Is there a simple NumPy function for this?
This is one way to do it: first find the clusters (runs) and then get their lengths using Counter. The first part is inspired by this answer for 2-D arrays; I added the second, Counter part to get the desired answer.
If you find the linked original answer helpful, please visit it and upvote it.
import numpy as np
from scipy.ndimage import label
from collections import Counter

arr = np.array([0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0])
labeled, num_features = label(arr)
# drop the first entry, which is the count of the 0 (background) label
print(list(Counter(labeled).values())[1:])
# [3, 2, 4]
Assume you only have 0s and 1s:
import numpy as np
a = np.array([0,0,0,1,1,1,0,0,1,1,0,0,0,1,1,1,1,0,0,0])
# pad a with 0 at both sides for edge cases when a starts or ends with 1
d = np.diff(np.pad(a, pad_width=1, mode='constant'))
# subtract indices when value changes from 0 to 1 from indices where value changes from 1 to 0
np.flatnonzero(d == -1) - np.flatnonzero(d == 1)
# array([3, 2, 4])
A custom implementation?
def count_consecutives(predicate, iterable):
    tmp = []
    for e in iterable:
        if predicate(e):
            tmp.append(e)
        else:
            if len(tmp) > 0:  # use > 1 if you want at least two consecutive
                yield len(tmp)
            tmp = []
    if len(tmp) > 0:  # flush the final run
        yield len(tmp)
So you can:
array = [0,0,0,1,1,1,0,0,1,1,0,0,0,1,1,1,1,0,0,0]
list(count_consecutives(lambda x: x == 1, array))
#=> [3, 2, 4]
And also:
array = [0,0,0,1,2,3,0,0,3,2,1,0,0,1,11,10,10,0,0,100]
list(count_consecutives(lambda x: x > 1, array))
# => [2, 2, 3, 1]
Simple Version:
If I do this:
import numpy as np
a = np.zeros(2)
a[[1, 1]] += np.array([1, 1])
I get [0, 1] as the output, but I would like [0, 2]. Is that possible somehow, using implicit NumPy looping instead of looping over it myself?
What-I-actually-need-to-do version:
I have a structured array that contains an index, a value, and some boolean value. I would like to sum those values at those indices, based on the boolean. Clearly that can be done with a simple loop, but it seems like it should be possible with clever numpy indexing (as above).
For example, I have an array with 5 elements that I want to populate from the array with values, indices, and conditions:
import numpy as np
size = 5
nvalues = 10
np.random.seed(1)
a = np.zeros(nvalues, dtype=[('val', float), ('ix', int), ('cond', bool)])
a = np.rec.array(a)
a.val = np.random.rand(nvalues)
a.cond = (np.random.rand(nvalues) > 0.3)
a.ix = np.random.randint(size, size=nvalues)
# obvious solution
obvssum = np.zeros(size)
for i in a:
    if i.cond:
        obvssum[i.ix] += i.val
# is something like this possible?
doesntwork = np.zeros(size)
doesntwork[a[a.cond].ix] += a[a.cond].val
print(doesntwork)
print(obvssum)
Output:
[ 0. 0. 0.61927097 0.02592623 0.29965467]
[ 0. 0. 1.05459336 0.02592623 1.27063303]
I think what's happening here is that if a[a.cond].ix were guaranteed to be unique, my method would work just fine, as noted in the simple example.
This is what the at method of NumPy ufuncs is for:
output = numpy.zeros(size)
numpy.add.at(output, a[a.cond].ix, a[a.cond].val)
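Applied to the simple version from the top of the question (a sketch): unlike a[[1, 1]] += ..., which buffers and writes each index once, numpy.add.at performs unbuffered in-place addition, so repeated indices accumulate:
import numpy as np

a = np.zeros(2)
np.add.at(a, [1, 1], np.array([1, 1]))
print(a)  # [0. 2.]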
I have implemented a cyclic iteration function in two ways:
def Spin1(n, N):  # n - current state, N - highest state
    value = n + 1
    case1 = (value > N)
    case2 = (value <= N)
    return case1 * 0 + case2 * value
def Spin2(n, N):
    value = n + 1
    if value > N:
        return 0
    else:
        return value
These functions are identical in the results they return. However, the second function is not broadcasting-capable for a NumPy array. So to test the first function I run this:
import numpy
AR1 = numpy.zeros((3, 4), dtype=numpy.uint32)
AR1[1, 2] = 5
print(AR1)
print(Spin1(AR1, 5))
Magically it works, and that is so sweet. So I see exactly what I want:
[[0 0 0 0]
[0 0 5 0]
[0 0 0 0]]
[[1 1 1 1]
[1 1 0 1]
[1 1 1 1]]
Now the second function, print(Spin2(AR1, 5)), fails with this error:
if value > N
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()
And it's clear why, since an "if array" statement is nonsense. So for now I have just used the first variant. But when I look at those functions, I have a strong feeling that the first one performs many more mathematical operations, so I don't lose hope that it can be optimised.
Questions:
1. Is it possible to optimise the function Spin1 to do fewer operations, or how do I use the function Spin2 in broadcasting mode (possibly without making my code too ugly)? Extra question: what would be the fastest way to do this manipulation with an array?
2. Is there some standard Python function which does the same calculation (not necessarily broadcasting-capable), and what is this operation correctly called? "Cyclic increment", probably?
There is a numpy function for this: np.where:
In [590]: AR1
Out[590]:
array([[0, 0, 0, 0],
[0, 0, 5, 0],
[0, 0, 0, 0]], dtype=uint32)
In [591]: np.where(AR1 >= 5, 0, 1)
Out[591]:
array([[1, 1, 1, 1],
[1, 1, 0, 1],
[1, 1, 1, 1]])
So, you could define:
def Spin1(n, N):
    value = n + 1
    return np.where(value > N, 0, value)
NumPy also provides a way to turn normal Python functions into ufuncs:
def Spin2(n, N):
    value = n + 1
    if value > N:
        return 0
    else:
        return value
Spin2 = np.vectorize(Spin2)
So that you can now call Spin2 on arrays:
In [595]: Spin2(AR1, 5)
Out[595]:
array([[1, 1, 1, 1],
[1, 1, 0, 1],
[1, 1, 1, 1]])
However, np.vectorize mainly provides syntactic sugar: there is still a Python function call made for each array element, which makes np.vectorize'd functions no faster than equivalent code using Python for loops.
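A rough way to see this for yourself (a sketch; the array size, the vspin2 name and the exact timings are my own choices and will vary by machine):
import numpy as np
import timeit

big = np.random.randint(0, 6, size=10000).astype(np.uint32)
vspin2 = np.vectorize(lambda n: 0 if n + 1 > 5 else n + 1)

print(timeit.timeit(lambda: np.where(big + 1 > 5, 0, big + 1), number=100))
print(timeit.timeit(lambda: vspin2(big), number=100))  # typically orders of magnitude slower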
Your Spin1 follows a well-established pattern in array-oriented languages (e.g. APL, MATLAB) for 'vectorizing' a function like Spin2: you create one or more booleans (or 0/1 arrays) to represent the various states the array elements can take, and then construct the output by multiplication and summation.
For example, to avoid divide-by-zero problems, I have used:
1/(x+(x==0))
A variation on this is to use a boolean index array to select array elements that should be changed. In this case, you want to return value, but with selected elements 'rolled over'.
def Spin3(n, N):  # n - current state, N - highest state
    value = n + 1
    value[value > N] = 0
    return value
In this case, the indexing approach is simpler, and seems to fit the program logic better. It may be faster, but I can't guarantee that. It's good to keep both approaches in mind.
I am putting some feedback here as an answer, so as not to clutter the question. I have done timing tests on the various functions, and it turns out that assigning through a boolean mask is the fastest variant in this case (hpaulj's answer). np.where was 1.4 times slower and np.vectorize(Spin2) was 15 times slower. Now, just out of curiosity, I wanted to test this with loops, so I made up this algorithm for testing:
import numpy
rows, cols, d = 3, 4, 1  # initial values assumed here; not shown in the original snippet
AR1 = numpy.zeros((rows, cols), dtype=numpy.uint32)
while d <= 100:
    Buf = numpy.zeros_like(AR1)
    r = 0
    c = 0
    while r < rows:
        while c < cols:
            temp = AR1[r, c] + 1
            if temp > 5:
                Buf[r, c] = 0  # roll over to 0, as in Spin1
            else:
                Buf[r, c] = temp
            c += 1
        r += 1
        c = 0
    AR1 = Buf
    d += 1
I am not sure, but this seems to be a very straightforward implementation of all the functions mentioned above. Yet it is very slow, almost 300 times slower. I have read similar questions on SO, but I still don't get it: WHY is it so slow, and what exactly is causing the slowdown? I have intentionally used a separate buffer here to avoid reading and writing the same elements, and I do no memory cleanup. What could be simpler? I am confused. I don't want to open a new question, since this has been asked a few times already, so perhaps someone can leave comments or good links clarifying this?