First occurrence along an axis in numpy - python

Let's say I have a numpy.ndarray:
a = np.array([0,4,10,0,11,10])
I compared this with 10.
a >= 10
# array([False, False, True, False, True, True], dtype=bool)
I would like to have a single True, i.e. True only at the first occurrence.
I would like to apply this along a given axis of an n-D numpy.ndarray (say, 1000*1000*10).
a_2d = np.array([[0,4,10],[0,11,10]])
#if axis == 1: array([[False, False, True], [False, True, False]])
What I have done:
For a 1-D array, I managed to do it like this:
b = np.zeros(a.size)
b[np.argmax(a >= 10)] = True
# b = array([ 0., 0., 1., 0., 0., 0.])
However, I have no idea how to apply this to a large n-D array.
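A hedged aside on the 1-D trick above (not from the original post): np.argmax returns 0 when nothing matches, so an all-False input would wrongly mark index 0. A guarded sketch:
import numpy as np

a = np.array([0, 4, 10, 0, 11, 10])
mask = a >= 10
b = np.zeros(a.size, dtype=bool)
if mask.any():  # argmax returns 0 even when no element matches, so guard first
    b[np.argmax(mask)] = True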

This one should work with no for loops, for 1D or 2D:
def firstByRow(a, f=lambda x: x >= 10):
    b = (np.cumsum(f(a), axis=-1) == 1).T
    b[1:] = b[1:] * np.equal(b[1:], np.diff(f(a).astype(int), axis=-1).T)
    return b.T
Not sure if it would be faster than slightly loopier code though, as it does both a cumsum and a diff.
EDIT:
You can also do this, which is probably faster (leveraging the fact that np.unique(return_index = True) picks the first occurrence):
def firstByAxis(a, f=lambda x: x >= 10, axis=0):
    c = np.where(f(a))
    i = np.unique(c[axis], return_index=True)[1]
    b = np.zeros_like(a)
    b[tuple(np.take(c, i, axis=-1))] = 1
    return b
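A quick usage sketch on the question's 2-D example (my addition; note the axis semantics here: axis=0 groups by row index, i.e. it marks the first occurrence within each row):
a_2d = np.array([[0, 4, 10], [0, 11, 10]])
firstByAxis(a_2d, axis=0)
# array([[0, 0, 1],
#        [0, 1, 0]])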

You can try the following:
>>> import numpy as np
>>> a_2d = np.array([[0,4,10],[0,11,10]])
>>> r, c = np.where( a_2d >= 10 )
>>> mask = r+c == (r+c).min()
>>> highMask = np.zeros(np.shape(a_2d))
>>> highMask[r[mask], c[mask]] = 1
>>> highMask
array([[ 0.,  0.,  1.],
       [ 0.,  1.,  0.]])
Note that there is no unique 'first' occurrence in a 2D array: the candidate positions with the minimal index sum r+c form a line across the array, all of them having equally small indices. For a 3D array this becomes a surface, etc.
Example of such a line would be:
0 0 0 0 0 1
0 0 0 0 1 0
0 0 0 1 0 0
0 0 1 0 0 0
0 1 0 0 0 0
1 0 0 0 0 0
All of these are equidistant from the [0,0] location.
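If instead you define 'first' as first in row-major (C) order, a flat argmax picks out a single element (a sketch of that alternative reading, not part of this answer):
flat = np.argmax(a_2d.ravel() >= 10)  # index of the first match in C order
b = np.zeros(a_2d.size)
b[flat] = 1
b = b.reshape(a_2d.shape)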

If you enumerate over the argmax, you can update your zeros array.
Code:
a = np.array([[0, 4, 10], [0, 11, 10]])
print(a)
b = np.zeros(a.shape)
for i, j in enumerate(np.argmax(a >= 10, axis=1)):
    b[i, j] = 1
print(b)
Results:
[[ 0  4 10]
 [ 0 11 10]]
[[ 0.  0.  1.]
 [ 0.  1.  0.]]
Using advanced indexing:
c = np.zeros(a.shape)
c[np.arange(a.shape[0]), np.argmax(a >= 10, axis=1)] = 1
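The argmax idea generalizes to any axis of an n-D array via np.put_along_axis (a sketch assuming NumPy >= 1.15; the any() guard also handles slices with no match, which a bare argmax would mis-mark at index 0):

import numpy as np

def first_true_along_axis(a, thresh=10, axis=-1):
    mask = a >= thresh
    out = np.zeros_like(mask)
    idx = np.expand_dims(mask.argmax(axis=axis), axis)
    # write True only where the slice actually contains a match
    np.put_along_axis(out, idx, mask.any(axis=axis, keepdims=True), axis=axis)
    return out

first_true_along_axis(np.array([[0, 4, 10], [0, 11, 10]]), axis=1)
# array([[False, False,  True],
#        [False,  True, False]])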

I get the error 'ValueError: NumPy boolean array indexing assignment cannot assign 100 input values to the 90 output values where the mask is true'

This is my code, and I get the following error:
ValueError: NumPy boolean array indexing assignment cannot assign 100 input values to the 90 output values where the mask is true. The error occurs on the line that assigns V.
What am I doing wrong?
from pylab import *
import numpy as np
from math import *
r = np.linspace(0, 10, 100)
V = np.piecewise(r, [r < 1, r > 1 ], [1, (r/2)*(3 - (r**2))])
#Graph of V r / k q vs. r/R for unitless quantities
figure()
plot(r, V)
xlabel('r / R')
ylabel('V r / k q')
title('Behavior of V(r) vs r')
ax = plt.gca()
ax.set_xticks([1])
ax.set_xticklabels(['R'])
plt.tick_params(left = False, right = False , labelleft = False)
grid()
show()
First, your error with the full traceback:
In [1]: r = np.linspace(0, 3, 20)
In [2]: V = np.piecewise(r, [r < 1, r > 1 ], [1, (r/2)*(3 - (r**2))])
Traceback (most recent call last):
File "<ipython-input-2-0bb13126761c>", line 1, in <module>
V = np.piecewise(r, [r < 1, r > 1 ], [1, (r/2)*(3 - (r**2))])
File "<__array_function__ internals>", line 5, in piecewise
File "/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py", line 612, in piecewise
y[cond] = func
ValueError: NumPy boolean array indexing assignment cannot assign 20 input values to the 13 output values where the mask is true
The problem is in piecewise. If you (re)read its docs, you'll see that the elements of the 3rd argument list are supposed to be scalars or functions. You provided an array as the 2nd element.
In [4]: np.size((r/2)*(3 - (r**2)))
Out[4]: 20
In [5]: np.sum(r<1), np.sum(r>1)
Out[5]: (7, 13)
piecewise is trying to assign 7 values from one condition to the array, and 13 for the other. That's why it's complaining when you provide all 20 for the 2nd. 20 does not match 13!
If both values are scalar:
In [6]: V = np.piecewise(r, [r < 1, r > 1 ], [1, 2])
In [7]: V
Out[7]:
array([1., 1., 1., 1., 1., 1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2.])
We could use a lambda function that will evaluate just the 13 values we want:
In [8]: V = np.piecewise(r, [r < 1, r > 1 ], [1, lambda i: (i/2)*(3-(i**2))])
In [9]: V
Out[9]:
array([ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
        1.        ,  1.        ,  0.98279633,  0.88700977,  0.6967488 ,
        0.40020411, -0.01443359, -0.55897361, -1.24522525, -2.08499781,
       -3.0901006 , -4.27234291, -5.64353404, -7.21548331, -9.        ])
But we don't need to use piecewise. Instead, evaluate the expression for all values of r and replace the selected ones:
In [10]: V = (r/2)*(3 - (r**2))
In [12]: V[r<1] = 1
In [13]: V
Out[13]:
array([ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
        1.        ,  1.        ,  0.98279633,  0.88700977,  0.6967488 ,
        0.40020411, -0.01443359, -0.55897361, -1.24522525, -2.08499781,
       -3.0901006 , -4.27234291, -5.64353404, -7.21548331, -9.        ])
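As an aside (not part of the original answer), np.where expresses the same select-and-replace logic in one vectorized line:
V = np.where(r < 1, 1, (r/2)*(3 - r**2))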
From the docs:
funclist : list of callables, f(x,*args,**kw), or scalars
Each function is evaluated over `x` wherever its corresponding
condition is True. It should take a 1d array as input and give an 1d
array or a scalar value as output. If, instead of a callable,
a scalar is provided then a constant function (``lambda x: scalar``) is
assumed.
So the working piecewise is doing:
In [17]: fun = lambda i: (i/2)*(3-(i**2))
In [18]: fun(r[r>1])
Out[18]:
array([ 0.98279633,  0.88700977,  0.6967488 ,  0.40020411, -0.01443359,
       -0.55897361, -1.24522525, -2.08499781, -3.0901006 , -4.27234291,
       -5.64353404, -7.21548331, -9.        ])
to create 13 values to put in V.
Did you want something like this?
r = np.linspace(1, 10, 100)
V = np.piecewise(r, [r < 1, r > 0 ], [1, (r/2)*(3 - (r**2))])
#Graph of V r / k q vs. r/R for unitless quantities
figure()
plt.plot(r, V)
plt.xlabel('r / R')
plt.ylabel('V r / k q')
plt.title('Behavior of V(r) vs r')
ax = plt.gca()
ax.set_xticks([1])
ax.set_xticklabels(['R'])
plt.tick_params(left = False, right = False , labelleft = False)
plt.grid()
plt.show()
[Graph of V(r) vs r omitted]

Fastest method for mapping an array of Boolean True counts to a Boolean Array

I have a 1D array of Boolean "True" counts that I want to map to a 2D array.
#Array of boolean True counts
b = [1,3,2,5]
#want this 2D array:
[1,1,1,1]
[0,1,1,1]
[0,1,0,1]
[0,0,0,1]
[0,0,0,1]
The faster the implementation (NumPy/SciPy) the better.
Thank you
Pure numpy method, using np.tri and advanced indexing:
b = np.array([1,3,2,5])
k = b.max()
np.tri(k+1,k,-1,dtype=int)[b].T
# array([[1, 1, 1, 1],
#        [0, 1, 1, 1],
#        [0, 1, 0, 1],
#        [0, 0, 0, 1],
#        [0, 0, 0, 1]])
UPDATE:
Two solutions that should work better when k >> len(b): m5 and m6 in the benchmarks.
Benchmark code borrowed and extended from @Ehsan, 2nd condition. Changes: added m5 and m6, reduced the highest test size from 1000 to 200, and changed the output dtype from int to int8.
An interesting observation: my original solution m2 performs significantly worse on my (low-RAM) computer than on @Ehsan's.
Code (new functions only):
##Paul's solution 2
def m5(b):
    k = b.max()
    n = b.size
    return (np.arange(1, 2*n + 1, dtype=np.int8) & 1).repeat(np.ravel([b, k - b], order="F")).reshape(k, n, order="F")

##Paul's solution 3
def m6(b):
    k = b.max()
    mytri = np.array([1, 0], dtype=np.int8).repeat(k)
    mytri = np.lib.stride_tricks.as_strided(mytri[k:], (k, k + 1),
                                            (mytri.strides[0], -mytri.strides[0]))
    return mytri[:, b]
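A quick sanity check that the two variants agree (my addition, not part of the original benchmark):
b = np.array([1, 3, 2, 5])
assert (m5(b) == m6(b)).all()
print(m6(b))
# [[1 1 1 1]
#  [0 1 1 1]
#  [0 1 0 1]
#  [0 0 0 1]
#  [0 0 0 1]]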
Try:
pd.DataFrame([[1]*x for x in [1,3,2,5]]).T.fillna(0).values
output:
array([[1., 1., 1., 1.],
       [0., 1., 1., 1.],
       [0., 1., 0., 1.],
       [0., 0., 0., 1.],
       [0., 0., 0., 1.]])
You can create an array of zeros of the required shape:
arr = np.zeros((np.max(b), len(b)))
Then you can create a temporary array x = np.indices(arr.shape)[0] which is:
array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3],
       [4, 4, 4, 4]])
And fill arr with ones like so:
arr[np.where(x<b)] = 1
A NumPy approach that avoids creating tri, in case b.max() is large:
b = np.array([1,3,2,5])
r, c = b.size, b.max()
a = np.zeros((c,r), dtype=int)
a[np.arange(c)[:,None]<b] = 1
output:
[[1 1 1 1]
 [0 1 1 1]
 [0 1 0 1]
 [0 0 0 1]
 [0 0 0 1]]
Comparison using benchit:
##Ehsan's solution
def m1(b):
    r, c = b.size, b.max()
    a = np.zeros((c, r), dtype=int)
    a[np.arange(c)[:, None] < b] = 1
    return a

##Paul's solution
def m2(b):
    k = b.max()
    return np.tri(k + 1, k, -1, dtype=int)[b].T

##Binyamin's solution
def m3(b):
    return pd.DataFrame([[1]*x for x in b]).T.fillna(0).values

##mathfux's solution
def m4(b):
    arr = np.zeros((np.max(b), len(b)), dtype=int)
    x = np.indices(arr.shape)[0]
    arr[np.where(x < b)] = 1
    return arr
For different inputs:
in_ = [np.random.randint(100, size=n) for n in [10,100,1000,10000]]
in_ = [np.random.randint(n, size=n) for n in [10,100,1000,10000]]
So which one you pick depends on b.max() vs. b.size: for larger b.max() (relative to b.size) m1 is faster, and for smaller b.max() m2 seems to be faster.
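For readers without benchit, a minimal timeit harness reproduces the comparison (my sketch; vary size and the randint bound to explore both regimes):
import timeit
import numpy as np

b = np.random.randint(100, size=1000)
for fn in (m1, m2):
    t = timeit.timeit(lambda: fn(b), number=100)
    print(fn.__name__, f"{t:.4f} s per 100 runs")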
UPDATE: Adding a new solution and a comparison with @Paul's new solutions:
##Ehsan's solution 2
def m7(b):
    return np.less.outer(np.arange(b.max()), b) + 0
Or almost equally:
def m8(b):
    return (np.arange(b.max()) < b[:, None]).T + 0
comparison:
in_ = [np.random.randint(10, size=n) for n in [10,100,1000]]
in_ = [np.random.randint(10000, size=n) for n in [10,100,1000,10000]]
including m8:
in_ = [np.random.randint(10000, size=n) for n in [10,100,1000]]

Find rows in numpy.array (A) that are not in numpy.array(B) [duplicate]

How do I get a row-wise comparison between two arrays, with the result being a row-wise true/false array?
Given data:
a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])
Result step 1:
c = np.array([True, True,False,True])
Result final:
a = a[c]
So how do I get the array c?
P.S.: In this example the arrays a and b are sorted; please also mention whether your solution relies on the arrays being sorted.
Here's a vectorised solution:
res = (a[:, None] == b).all(-1).any(-1)
print(res)
array([ True, True, False, True])
Note that a[:, None] == b compares each row of a with b element-wise. We then use all + any to deduce if there are any rows which are all True for each sub-array:
print(a[:, None] == b)
[[[ True  True]
  [False  True]
  [False False]]

 [[False  True]
  [ True  True]
  [False False]]

 [[False False]
  [False False]
  [False False]]

 [[False False]
  [False False]
  [ True  True]]]
Approach #1
We could use a view based vectorized solution -
# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b):  # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

A, B = view1D(a, b)
out = np.isin(A, B)
Sample run -
In [8]: a
Out[8]:
array([[1, 0],
       [2, 0],
       [3, 1],
       [4, 2]])
In [9]: b
Out[9]:
array([[1, 0],
       [2, 0],
       [4, 2]])
In [10]: A,B = view1D(a,b)
In [11]: np.isin(A,B)
Out[11]: array([ True, True, False, True])
Approach #2
Alternatively, for the case when all rows of b are in a and the rows are lexicographically sorted, use the same views but with searchsorted -
out = np.zeros(len(A), dtype=bool)
out[np.searchsorted(A,B)] = 1
If the rows are not necessarily lexicographically sorted -
sidx = A.argsort()
out[sidx[np.searchsorted(A,B,sorter=sidx)]] = 1
You can use numpy's apply_along_axis, which applies a function along a given axis (axis=1 applies it to every row):
import numpy as np
a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])
c = np.apply_along_axis(lambda x, y: x in y, 1, a, b)
You can do it as a list comp via:
c = np.array([row in b for row in a])
though this approach will be slower than a pure numpy approach (if it exists).
a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])
i = 0
j = 0
result = []
We can take advantage of the fact that they are sorted and do this in O(n) time. Using two pointers, we just move ahead whichever pointer has fallen behind:
while i < len(a) and j < len(b):
    if tuple(a[i]) == tuple(b[j]):
        result.append(True)
        i += 1
        j += 1  # get rid of this depending on how you want to handle duplicates
    elif tuple(a[i]) > tuple(b[j]):
        j += 1
    else:
        result.append(False)
        i += 1
Pad with False if it ends early.
if len(result) < len(a):
    result.extend([False] * (len(a) - len(result)))
print(result) # [True, True, False, True]
This answer is adapted from Better way to find matches in two sorted lists than using for loops? (Java)
You can use scipy's cdist which has a few advantages:
from scipy.spatial.distance import cdist
a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])
c = cdist(a, b) == 0
print(c.any(axis=1))
[ True True False True]
print(a[c.any(axis=1)])
[[1 0]
 [2 0]
 [4 2]]
Also, cdist allows passing of a function pointer. So you can specify your own distance functions, to do whatever comparison you need:
c = cdist(a, b, lambda u, v: (u==v).all())
print(c)
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 0.]
 [0. 0. 1.]]
Now you can find which index matches, which will also show whether there are multiple matches.
# Array with multiple instances
a2 = np.array([[1,0],[2,0],[3,1],[4,2],[3,1],[4,2]])
c2 = cdist(a2, b, lambda u, v: (u==v).all())
print(c2)
idx = np.where(c2==1)
print(idx)
print(idx[0][idx[1]==2])
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 0.]
 [0. 0. 1.]
 [0. 0. 0.]
 [0. 0. 1.]]
(array([0, 1, 3, 5], dtype=int64), array([0, 1, 2, 2], dtype=int64))
[3 5]
The recommended answer is good, but it will struggle with arrays that have a large number of rows. An alternative is:
baseval = np.max([a.max(), b.max()]) + 1
a[:,1] = a[:,1] * baseval
b[:,1] = b[:,1] * baseval
c = np.isin(np.sum(a, axis=1), np.sum(b, axis=1))
This uses the maximum value contained in either array plus 1 as a numeric base and treats the columns as baseval^0 and baseval^1 values. This ensures that the sum of the columns are unique for each possible pair of values. If the order of the columns is not important then both input arrays can be sorted column-wise using np.sort(a,axis=1) beforehand.
This can be extended to arrays with more columns using:
baseval = np.max([a.max(), b.max()]) + 1
n_cols = a.shape[1]
a = a * baseval ** np.array(range(n_cols))
b = b * baseval ** np.array(range(n_cols))
c = np.isin(np.sum(a, axis=1), np.sum(b, axis=1))
Overflow can occur when baseval ** (n_cols+1) > 9223372036854775807 if using int64. This can be avoided by setting the numpy arrays to use python integers using dtype=object.
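A quick demonstration on the question's arrays (my addition; note that the two-column version above scales its inputs in place, so work on copies if you still need the originals):
import numpy as np

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])
baseval = np.max([a.max(), b.max()]) + 1      # 5
n_cols = a.shape[1]
a_enc = a * baseval ** np.arange(n_cols)      # encode each row as a base-5 number
b_enc = b * baseval ** np.arange(n_cols)
c = np.isin(a_enc.sum(axis=1), b_enc.sum(axis=1))
print(c)  # [ True  True False  True]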

Subtracting the median from each column

I have a dataframe, df, with numbers, like so:
1 1 1
2 1 1
2 1 3
I'd like to subtract the median from each column so that the median of each becomes 0.
-1 0 0
0 0 0
0 0 2
How do I do this in a pythonic way? I'm guessing it is possible without iterating over the values, computing the median and then subtracting. I'd like to do it tersely, approximately like so:
from numpy import median
df -= median(df)  # does not work; subtracts the median of the whole dataframe
Just like this:
df -= df.median(axis=0)
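A runnable sketch with the question's data (my addition):
import pandas as pd

df = pd.DataFrame([[1, 1, 1], [2, 1, 1], [2, 1, 3]])
df -= df.median(axis=0)
print(df)
#      0    1    2
# 0 -1.0  0.0  0.0
# 1  0.0  0.0  0.0
# 2  0.0  0.0  2.0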
numpy's median computes the median of the overall data by default.
To accomplish this using numpy, pass axis=0:
df -= median(df, axis=0)
For more detail, see the documentation: http://docs.scipy.org/doc/numpy/reference/generated/numpy.median.html
Some testing in ipython showed:
In [23]: A = numpy.arange(9)
In [24]: B = A.reshape((3,3))
In [25]: C = numpy.median(B,axis=0)
In [26]: D = B - C[None,:]
In [27]: B
Out[27]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [28]: D
Out[28]:
array([[-3., -3., -3.],
       [ 0.,  0.,  0.],
       [ 3.,  3.,  3.]])
In [29]: C
Out[29]: array([ 3., 4., 5.])
So the next line gets the median along the columns
C = numpy.median(B,axis=0)
And the next line subtracts it from the matrix, column by column
D = B - C[None,:]
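An aside not in the original answer: since broadcasting aligns trailing axes, the explicit C[None,:] is optional when subtracting a 1-D array of column medians:
D = B - C  # C has shape (3,), so it broadcasts across B's rows automatically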

Convert 1-D array of discrete values to n-D array of continuous values in Numpy

What is the best way to do the following: given a 1-D array of discrete values of size N (here N=4) with X unique elements, create a multidimensional array of size X*N whose elements are 1 or 0 depending on the occurrence of elements in the 1-D array. E.g. the following array_1D (N=4 and X=3) will result in an array_ND of size 3*4:
array_1D = np.array([x, y, z, x])
array_ND = [[1 0 0 1]
            [0 1 0 0]
            [0 0 1 0]]
Thanks,
Aso
Try this:
(np.unique(a)[..., None] == a).astype(int)
You can leave out the .astype(int) part if you want a boolean array. Here we have used broadcasting (the [..., None] part) to avoid explicit looping.
Broken down, as suggested in the comments:
>>> import numpy as np
>>> a = np.array([1, 2, 3, 1])
>>> unique_elements = np.unique(a)
>>> result = unique_elements[..., None] == a
>>> unique_elements
array([1, 2, 3])
>>> result
array([[ True, False, False,  True],
       [False,  True, False, False],
       [False, False,  True, False]], dtype=bool)
If the initial array contains valid indices from 0 to n - 1, then you can write
eye = np.eye(3)
array_1D = np.array([0, 1, 2, 0])
array_ND = eye[array_1D]
The resulting matrix will be
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.]])
which is the transpose of the one you expect.
What's happening here is that numpy uses the elements of array_1D as row indices of eye. So the resulting matrix contains as many rows as the elements of array_1D and each one of them relates to the respective element. (0 relates to 1 0 0, etc.)
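If the orientation matters, transposing recovers the layout from the question (a small addition using the same setup):
array_ND = np.eye(3, dtype=int)[array_1D].T
# array([[1, 0, 0, 1],
#        [0, 1, 0, 0],
#        [0, 0, 1, 0]])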
