I have a 1D NumPy array A of length N. For each element x in the array, I want to know what proportion of all elements in the array lie within the range [x-eps, x+eps], where eps is a constant. N is on the order of 15,000.
At present I do it as follows (minimal example):
import numpy as np
N = 15000
eps = 0.01
A = np.random.rand(N, 1)
prop = np.array([np.mean((A >= x - eps) & (A <= x + eps)) for x in A])
... which takes around 1 second on my computer.
My question: is there a more efficient way of doing this?
Edit: I think @jdehesa's suggestion in the comments would work as follows:
prop = np.isclose(A, A.T, atol=eps, rtol=0).mean(axis=1)
It's a nice concise solution, but without a speed advantage (on my computer).
That's a good setup to leverage np.searchsorted -
sidx = A.argsort()
ridx = np.searchsorted(A, A+eps, 'right', sorter=sidx)
lidx = np.searchsorted(A, A-eps, 'left', sorter=sidx)
out = ridx - lidx
Timings -
In [71]: N = 15000
...: eps = 0.01
...: A = np.random.rand(N)
In [72]: %timeit np.array([np.sum((A >= x - eps) & (A <= x + eps)) for x in A])
560 ms ± 5.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [73]: %%timeit
...: sidx = A.argsort()
...: ridx = np.searchsorted(A, A+eps, 'right', sorter=sidx)
...: lidx = np.searchsorted(A, A-eps, 'left', sorter=sidx)
...: out = ridx - lidx
5.35 ms ± 47.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Further improvement with pre-sorting :
In [81]: %%timeit
...: sidx = A.argsort()
...: b = A[sidx]
...: ridx = np.searchsorted(b, A+eps, 'right')
...: lidx = np.searchsorted(b, A-eps, 'left')
...: out = ridx - lidx
3.93 ms ± 19.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
As stated in the comments, for the mean-equivalent version, simply divide the final output array by N.
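For example, a minimal sketch reusing the pre-sorted variant above (with the 1D A from the timings; prop mirrors the name used in the question):
sidx = A.argsort()
b = A[sidx]
ridx = np.searchsorted(b, A + eps, 'right')
lidx = np.searchsorted(b, A - eps, 'left')
prop = (ridx - lidx) / N  # proportion instead of count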
Related
I have to create a big array from a known one by using a set of indices of the original array. The indices are stored as an ndarray, and to build up the new array I am doing something like this:
import numpy as np
dim_1 = 200
high_index = 1000
dim_2 = 300
masks_array = np.random.randint( low = 0, high = high_index - 1, size=(high_index, dim_1) )
the_array = np.random.rand( high_index, dim_2 )
new_array = np.array( [ the_array[ masks_array[ j, : ], : ] for j in range(high_index) ] )
Is this the fastest way to generate the new_array from the masks_array? Is there a way to do this without a loop? And out of interest, since the "for" loop is inside the np.array constructor, does this translate into efficient looping in Python (similar to list comprehension)?
In [198]: dim_1 = 200
...: high_index = 1000
...: dim_2 = 300
...:
...: masks_array = np.random.randint( low = 0, high = high_index - 1, size=(high_index, dim_1) )
...: the_array = np.random.rand( high_index, dim_2 )
...:
...: new_array = np.array( [ the_array[ masks_array[ j, : ], : ] for j in range(high_index) ] )
In [199]: new_array.shape
Out[199]: (1000, 200, 300)
In [200]: masks_array.shape
Out[200]: (1000, 200)
In [201]: the_array.shape
Out[201]: (1000, 300)
Let's try the simple indexing with the masks_array:
In [205]: arr = the_array[masks_array,:]
In [206]: arr.shape
Out[206]: (1000, 200, 300)
In [207]: np.allclose(new_array, arr)
Out[207]: True
Time comparisons:
In [213]: timeit new_array = np.array([the_array[masks_array[j,:],:] for j in range(high_index)])
658 ms ± 17.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [214]: timeit arr = the_array[masks_array,:]
292 ms ± 65.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The time savings are modest, I suspect because of the large overall size of the result.
This is Python, and np.array is a function. So
[the_array[masks_array[j,:],:] for j in range(high_index)]
is evaluated first, and the resulting list is then passed to np.array.
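A sketch that makes those two steps explicit (same result as the one-liner above):
# The list comprehension runs first, building a Python list of 2-D arrays ...
chunks = [the_array[masks_array[j, :], :] for j in range(high_index)]
# ... and only then does np.array copy and stack them into one 3-D array.
new_array = np.array(chunks)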
In [215]: timeit [the_array[masks_array[j,:],:] for j in range(high_index)]
369 ms ± 7.94 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I have a 1D numpy array X with the shape (1000,). I want to inject in random (uniform) places 10 random (normal) values and thus obtain the numpy array of shape (1010,). How to do it efficiently in numpy?
You can use np.insert together with np.random.choice:
n = 10
np.insert(a, np.random.choice(len(a), size=n), np.random.normal(size=n))
Here's one based on masking -
def addrand(a, N):
    # Boolean mask: len(a) True slots for the original values, N False slots for the new ones
    n = len(a)
    m = np.concatenate((np.ones(n, dtype=bool), np.zeros(N, dtype=bool)))
    np.random.shuffle(m)
    out = np.empty(len(a)+N, dtype=a.dtype)
    out[m] = a
    out[~m] = np.random.uniform(size=N)  # swap in np.random.normal(size=N) for normal values
    return out
Sample run -
In [22]: a = 10+np.random.rand(20)
In [23]: a
Out[23]:
array([10.65458302, 10.18034826, 10.08652451, 10.03342622, 10.63930492,
10.48439184, 10.2859206 , 10.91419282, 10.56905636, 10.01595702,
10.21063965, 10.23080433, 10.90546147, 10.02823502, 10.67987108,
10.00583747, 10.24664158, 10.78030108, 10.33638157, 10.32471524])
In [24]: addrand(a, N=3) # adding 3 rand numbers
Out[24]:
array([10.65458302, 10.18034826, 10.08652451, 10.03342622, 0.79989563,
10.63930492, 10.48439184, 10.2859206 , 10.91419282, 10.56905636,
10.01595702, 0.23873077, 10.21063965, 10.23080433, 10.90546147,
10.02823502, 0.66857723, 10.67987108, 10.00583747, 10.24664158,
10.78030108, 10.33638157, 10.32471524])
Timings :
In [71]: a = np.random.rand(1000)
In [72]: %timeit addrand(a, N=10)
37.3 µs ± 273 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# @a_guest's soln
In [73]: %timeit np.insert(a, np.random.choice(len(a), size=10), np.random.normal(size=10))
63.3 µs ± 2.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Note: If you are working with bigger arrays, it seems the np.insert one does better.
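If you want to see where the crossover happens for your own sizes, a quick benchmark sketch (the array length below is just an illustrative choice):
import numpy as np
from timeit import timeit

a = np.random.rand(100_000)  # illustrative size; adjust to your data
n = 10

t_mask = timeit(lambda: addrand(a, N=n), number=100)
t_insert = timeit(lambda: np.insert(a, np.random.choice(len(a), size=n),
                                    np.random.normal(size=n)), number=100)
print(f"masking: {t_mask:.4f} s   np.insert: {t_insert:.4f} s")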
You could use numpy.insert(arr, obj, values, axis=None).
import numpy as np
a = np.arange(1000)
a = np.insert(a, np.random.randint(low = 1, high = 999, size=10), np.random.normal(loc=0.0, scale=1.0, size=10))
Keep in mind that np.insert doesn't modify your original array; it returns a modified copy.
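A small illustration:
import numpy as np

a = np.arange(5)
b = np.insert(a, 2, 99)  # insert 99 before index 2
print(a)  # [0 1 2 3 4]          -- original unchanged
print(b)  # [ 0  1 99  2  3  4]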
Not sure if this is the most efficient way, but it works, at least.
A = np.arange(1000)
for i in np.random.randint(low = 0, high = 1000, size = 10):
    A = np.concatenate((A[:i], [np.random.normal()], A[i:]))
Edit, checking performance:
def insert_random(A):
    for i in np.random.randint(low = 0, high = len(A), size = 10):
        A = np.concatenate((A[:i], [np.random.normal()], A[i:]))
    return A

A = np.arange(1000)
%timeit insert_random(A)
83.2 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
So definitely not the most efficient. np.insert seems to be the way to go.
(I asked a similar question before but this is a different operation.)
I have 2 arrays of boolean masks and I am looking to calculate an operation on every combination of two masks.
The slow version
N = 10000
M = 580
masksA = np.array(np.random.randint(0, 2, size=(N, M)), dtype=bool)
masksB = np.array(np.random.randint(0, 2, size=(N, M)), dtype=bool)
result = np.zeros(shape=(N, N), dtype=float)
for i in range(N):
    for j in range(N):
        result[i, j] = np.float64(np.count_nonzero(np.logical_and(masksA[i, :], masksB[j, :]))) / M
It seems the first input would be masksA as the question text reads - "operation on every combination of two masks".
We can use matrix-multiplication to solve it, like so -
result = masksA.astype(float).dot(masksB.T)/M
Alternatively, use the lower-precision np.float32 for the dtype conversion for faster computation. Since we are counting, the lower precision should be fine.
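That lower-precision variant would look like this (only the dtype changes; the counts are small integers that float32 represents exactly, so only the final division loses a little precision):
# Same matrix product as above, but in float32 to halve the memory traffic.
result_f32 = masksA.astype(np.float32).dot(masksB.T) / M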
Timings -
In [5]: N = 10000
...: M = 580
...:
...: np.random.seed(0)
...: masksA = np.array(np.random.randint(0,2, size=(N,M)), dtype=bool)
...: masksB = np.array(np.random.randint(0,2, size=(N,M)), dtype=bool)
In [6]: %timeit masksA.astype(float).dot(masksB.T)
1.87 s ± 50.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit masksA.astype(np.float32).dot(masksB.T)
1 s ± 7.93 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I have a square matrix that is NxN (N is usually >500). It is constructed using a numpy array.
I need to extract a new matrix that has the i-th column and row removed from this matrix. The new matrix is (N-1)x(N-1).
I am currently using the following code to extract this matrix:
new_mat = np.delete(old_mat, idx_2_remove, 0)
new_mat = np.delete(new_mat, idx_2_remove, 1)
I have also tried to use:
row_indices = [i for i in range(0,idx_2_remove)]
row_indices += [i for i in range(idx_2_remove+1,N)]
col_indices = row_indices
rows = [i for i in row_indices for j in col_indices]
cols = [j for i in row_indices for j in col_indices]
old_mat[(rows, cols)].reshape(len(row_indices), len(col_indices))
But I found this to be slower than the np.delete() approach above, which is itself still quite slow for my application.
Is there a faster way to accomplish what I want?
Edit 1:
It seems the following is even faster than the above two, but not by much:
new_mat = old_mat[row_indices,:][:,col_indices]
Here are 3 alternatives I quickly wrote:
Repeated delete:
def foo1(arr, i):
    return np.delete(np.delete(arr, i, axis=0), i, axis=1)
Maximal use of slicing (may need some edge checks):
def foo2(arr, i):
    N = arr.shape[0]
    res = np.empty((N-1, N-1), arr.dtype)
    res[:i, :i] = arr[:i, :i]
    res[:i, i:] = arr[:i, i+1:]
    res[i:, :i] = arr[i+1:, :i]
    res[i:, i:] = arr[i+1:, i+1:]
    return res
Advanced indexing:
def foo3(arr, i):
    N = arr.shape[0]
    idx = np.r_[:i, i+1:N]
    return arr[np.ix_(idx, idx)]
Test that they work:
In [874]: x = np.arange(100).reshape(10,10)
In [875]: np.allclose(foo1(x,5),foo2(x,5))
Out[875]: True
In [876]: np.allclose(foo1(x,5),foo3(x,5))
Out[876]: True
Compare timings (arr here is a larger test array than x):
In [881]: timeit foo1(arr,100).shape
4.98 ms ± 190 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [882]: timeit foo2(arr,100).shape
526 µs ± 1.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [883]: timeit foo3(arr,100).shape
2.21 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So the slicing is fastest, even if the code is longer. It looks like np.delete works like foo3, but one dimension at a time.
I wish to efficiently use pandas (or numpy) instead of a nested for loop with an if statement to solve a particular problem. Here is a toy version:
Suppose I have the following two DataFrames
import pandas as pd
import numpy as np
dict1 = {'vals': [100,200], 'in': [0,1], 'out' :[1,3]}
df1 = pd.DataFrame(data=dict1)
dict2 = {'vals': [500,800,300,200], 'in': [0.1,0.5,2,4], 'out' :[0.5,2,4,5]}
df2 = pd.DataFrame(data=dict2)
Now I wish to loop through each row of each DataFrame and multiply the vals if a particular condition is met. This code works for what I want:
ans = []
for i in range(len(df1)):
    for j in range(len(df2)):
        if (df1['in'][i] <= df2['out'][j] and df1['out'][i] >= df2['in'][j]):
            ans.append(df1['vals'][i]*df2['vals'][j])
np.sum(ans)
However, this is clearly very inefficient, and in reality my DataFrames can have millions of entries, making this unusable. I am also not making use of pandas' or numpy's efficient vectorized implementations. Does anyone have any ideas on how to efficiently vectorize this nested loop?
I feel like this code is something akin to matrix multiplication so could progress be made utilising outer? It's the if condition that I'm finding hard to wedge in, as the if logic needs to compare each entry in df1 against all entries in df2.
You can also use a compiler like Numba to do this job. This would also outperform the vectorized solution and doesn't need a temporary array.
Example
import numba as nb
import numpy as np
import pandas as pd
import time
@nb.njit(fastmath=True, parallel=True, error_model='numpy')
def your_function(df1_in, df1_out, df1_vals, df2_in, df2_out, df2_vals):
    sum = 0.
    for i in nb.prange(len(df1_in)):
        for j in range(len(df2_in)):
            if (df1_in[i] <= df2_out[j] and df1_out[i] >= df2_in[j]):
                sum += df1_vals[i]*df2_vals[j]
    return sum
Testing
dict1 = {'vals': np.random.randint(1, 100, 1000),
'in': np.random.randint(1, 10, 1000),
'out': np.random.randint(1, 10, 1000)}
df1 = pd.DataFrame(data=dict1)
dict2 = {'vals': np.random.randint(1, 100, 1500),
'in': 5*np.random.random(1500),
'out': 5*np.random.random(1500)}
df2 = pd.DataFrame(data=dict2)
# First call has some compilation overhead
res=your_function(df1['in'].values, df1['out'].values, df1['vals'].values,
df2['in'].values, df2['out'].values, df2['vals'].values)
t1 = time.time()
for i in range(1000):
    res = your_function(df1['in'].values, df1['out'].values, df1['vals'].values,
                        df2['in'].values, df2['out'].values, df2['vals'].values)
print(time.time() - t1)
Timings
vectorized solution @AGN Gazer: 9.15 ms
parallelized Numba version: 0.7 ms
m1 = np.less_equal.outer(df1['in'], df2['out'])
m2 = np.greater_equal.outer(df1['out'], df2['in'])
m = np.logical_and(m1, m2)
v12 = np.outer(df1['vals'], df2['vals'])
print(v12[m].sum())
Or, replace first three lines with this long line:
m = np.less_equal.outer(df1['in'], df2['out']) & np.greater_equal.outer(df1['out'], df2['in'])
s = np.outer(df1['vals'], df2['vals'])[m].sum()
For very large problems, dask is recommended.
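For illustration, a rough sketch of how that might look with dask.array (the chunk size is an arbitrary illustrative choice; broadcasting stands in for ufunc.outer so the pairwise mask is built lazily, chunk by chunk):
import dask.array as da

# Wrap the relevant columns as chunked dask arrays (chunk size is an arbitrary choice here).
a_in   = da.from_array(df1['in'].values,   chunks=100_000)
a_out  = da.from_array(df1['out'].values,  chunks=100_000)
a_vals = da.from_array(df1['vals'].values, chunks=100_000)
b_in   = da.from_array(df2['in'].values,   chunks=100_000)
b_out  = da.from_array(df2['out'].values,  chunks=100_000)
b_vals = da.from_array(df2['vals'].values, chunks=100_000)

# Broadcasting replaces np.less_equal.outer / np.greater_equal.outer.
m = (a_in[:, None] <= b_out[None, :]) & (a_out[:, None] >= b_in[None, :])

# Pairwise products, masked and summed; .compute() triggers the actual work.
s = (a_vals[:, None] * b_vals[None, :] * m).sum().compute()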
Timing Tests:
Here is a timing comparison when using 1000 and 1500-long arrays:
In [166]: dict1 = {'vals': np.random.randint(1,100,1000), 'in': np.random.randint(1,10,1000), 'out': np.random.randint(1,10,1000)}
...: df1 = pd.DataFrame(data=dict1)
...:
...: dict2 = {'vals': np.random.randint(1,100,1500), 'in': 5*np.random.random(1500), 'out': 5*np.random.random(1500)}
...: df2 = pd.DataFrame(data=dict2)
Author's original method (Python loops):
In [167]: def f(df1, df2):
...:     ans = []
...:     for i in range(len(df1)):
...:         for j in range(len(df2)):
...:             if (df1['in'][i] <= df2['out'][j] and df1['out'][i] >= df2['in'][j]):
...:                 ans.append(df1['vals'][i]*df2['vals'][j])
...:     return np.sum(ans)
...:
...:
In [168]: %timeit f(df1, df2)
47.3 s ± 1.02 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
@Ben.T method:
In [170]: %timeit df2['ans']= df2.apply(lambda row: df1['vals'][(df1['in'] <= row['out']) & (df1['out'] >= row['in'])].sum()*row['vals'],1); df2['ans'].sum()
2.22 s ± 40.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Vectorized solution proposed here:
In [171]: def g(df1, df2):
...:     m = np.less_equal.outer(df1['in'], df2['out']) & np.greater_equal.outer(df1['out'], df2['in'])
...:     return np.outer(df1['vals'], df2['vals'])[m].sum()
...:
...:
In [172]: %timeit g(df1, df2)
7.81 ms ± 127 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Your answer:
471 µs ± 35.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Method 1 (3+ times slower):
df1.apply(lambda row: list((df2['vals'][(row['in'] <= df2['out']) & (row['out'] >= df2['in'])] * row['vals'])), axis=1).sum()
1.56 ms ± 7.56 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Method 2 (2+ times slower):
ans = []
for name, row in df1.iterrows():
    _in = row['in']
    _out = row['out']
    _vals = row['vals']
    ans.append(df2['vals'].loc[(df2['in'] <= _out) & (df2['out'] >= _in)].values * _vals)
1.01 ms ± 8.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Method 3 (3+ times faster):
df1_vals = df1.values
ans = np.zeros(shape=(len(df1_vals), len(df2.values)))
for i in range(df1_vals.shape[0]):
    df2_vals = df2.values
    df2_vals[:, 2][~np.logical_and(df1_vals[i, 1] >= df2_vals[:, 0], df1_vals[i, 0] <= df2_vals[:, 1])] = 0
    ans[i, :] = df2_vals[:, 2] * df1_vals[i, 2]
144 µs ± 3.11 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In Method 3 you can view the solution by performing:
ans[ans.nonzero()]
Out[]: array([ 50000., 80000., 160000., 60000.])
I wasn't able to think of a way to remove the underlying loop :( but I learnt a lot about numpy in the process! (yay for learning)
One way to do it is by using apply. Create a column in df2 containing the sum of the vals in df1 that meet your criteria on in and out, multiplied by the vals of that row of df2:
df2['ans']= df2.apply(lambda row: df1['vals'][(df1['in'] <= row['out']) &
(df1['out'] >= row['in'])].sum()*row['vals'],1)
Then just sum this column:
df2['ans'].sum()