I have a 1D numpy array X with shape (1000,). I want to inject 10 random (normal) values at random (uniformly chosen) positions and thus obtain a numpy array of shape (1010,). How can I do this efficiently in numpy?
You can use np.insert together with np.random.choice:
n = 10
np.insert(a, np.random.choice(len(a), size=n), np.random.normal(size=n))
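For example, a minimal self-contained run (using len(X) + 1 as the upper bound for the positions so that insertion at the very end is also possible):
import numpy as np

X = np.random.rand(1000)
n = 10
positions = np.random.choice(len(X) + 1, size=n)          # uniformly chosen insertion points
out = np.insert(X, positions, np.random.normal(size=n))   # inject n normal values
out.shape                                                  # (1010,)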
Here's one based on masking -
def addrand(a, N):
    n = len(a)
    # boolean mask: True marks positions of the original values, False the injected ones
    m = np.concatenate((np.ones(n, dtype=bool), np.zeros(N, dtype=bool)))
    np.random.shuffle(m)
    out = np.empty(n + N, dtype=a.dtype)
    out[m] = a
    out[~m] = np.random.uniform(size=N)  # use np.random.normal(size=N) for the question's normal values
    return out
Sample run -
In [22]: a = 10+np.random.rand(20)
In [23]: a
Out[23]:
array([10.65458302, 10.18034826, 10.08652451, 10.03342622, 10.63930492,
10.48439184, 10.2859206 , 10.91419282, 10.56905636, 10.01595702,
10.21063965, 10.23080433, 10.90546147, 10.02823502, 10.67987108,
10.00583747, 10.24664158, 10.78030108, 10.33638157, 10.32471524])
In [24]: addrand(a, N=3) # adding 3 rand numbers
Out[24]:
array([10.65458302, 10.18034826, 10.08652451, 10.03342622, 0.79989563,
10.63930492, 10.48439184, 10.2859206 , 10.91419282, 10.56905636,
10.01595702, 0.23873077, 10.21063965, 10.23080433, 10.90546147,
10.02823502, 0.66857723, 10.67987108, 10.00583747, 10.24664158,
10.78030108, 10.33638157, 10.32471524])
Timings:
In [71]: a = np.random.rand(1000)
In [72]: %timeit addrand(a, N=10)
37.3 µs ± 273 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# @a_guest's solution
In [73]: %timeit np.insert(a, np.random.choice(len(a), size=10), np.random.normal(size=10))
63.3 µs ± 2.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Note: if you are working with bigger arrays, the np.insert approach seems to do better.
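To check that on your own sizes, a quick timing sketch (no numbers claimed here; results depend on the machine and on n; addrand is the masking function above):
from timeit import timeit
import numpy as np

a = np.random.rand(1_000_000)
n = 10
t_insert = timeit(lambda: np.insert(a, np.random.choice(len(a), size=n), np.random.normal(size=n)), number=100)
t_mask = timeit(lambda: addrand(a, n), number=100)
print(t_insert, t_mask)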
You could use numpy.insert(arr, obj, values, axis=None).
import numpy as np
a = np.arange(1000)
a = np.insert(a, np.random.randint(low=0, high=len(a) + 1, size=10), np.random.normal(loc=0.0, scale=1.0, size=10))
Keep in mind that np.insert doesn't modify your original array in place; it returns a modified copy.
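A small demonstration of that behaviour:
import numpy as np

a = np.arange(5)
b = np.insert(a, 2, 99)
a   # array([0, 1, 2, 3, 4]) -- unchanged
b   # array([ 0,  1, 99,  2,  3,  4])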
Not sure if this is the most efficient way, but it works, at least.
A = np.arange(1000)
for i in np.random.randint(low=0, high=1000, size=10):
    A = np.concatenate((A[:i], [np.random.normal()], A[i:]))
Edit, checking performance:
def insert_random(A):
    for i in np.random.randint(low=0, high=len(A), size=10):
        A = np.concatenate((A[:i], [np.random.normal()], A[i:]))
    return A
A = np.arange(1000)
%timeit insert_random(A)
83.2 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
So definitely not the most efficient. np.insert seems to be the way to go.
Related
To be clear, below is what I am trying to do. The question is: how can I change the function oper_AB() so that, instead of the nested for loop, it uses numpy vectorization/broadcasting to compute ret_list much faster?
def oper(a_1D, b_1D):
    return np.dot(a_1D, b_1D) / np.dot(b_1D, b_1D)

def oper_AB(A_2D, B_2D):
    ret_list = []
    for a_1D in A_2D:
        for b_1D in B_2D:
            ret_list.append(oper(a_1D, b_1D))
    return ret_list
Strictly addressing the question (with the reservation that I suspect the OP wants the norm, not the norm squared, as divisor below):
r = a @ b.T / np.linalg.norm(b, axis=1)**2
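(If the plain norm was indeed intended, the divisor would simply lose the square, e.g. r = a @ b.T / np.linalg.norm(b, axis=1); this is only a hypothetical variant, and everything below sticks to the question as written.)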
Example:
np.random.seed(0)
a = np.random.randint(0, 10, size=(2,2))
b = np.random.randint(0, 10, size=(2,2))
Then:
>>> a
array([[5, 0],
[3, 3]])
>>> b
array([[7, 9],
[3, 5]])
>>> oper_AB(a, b)
[0.2692307692307692,
0.4411764705882353,
0.36923076923076925,
0.7058823529411765]
>>> a @ b.T / np.linalg.norm(b, axis=1)**2
array([[0.26923077, 0.44117647],
[0.36923077, 0.70588235]])
>>> np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2)
array([0.26923077, 0.44117647, 0.36923077, 0.70588235])
Speed:
n, m = 1000, 100
a = np.random.uniform(size=(n, m))
b = np.random.uniform(size=(n, m))
orig = %timeit -o oper_AB(a, b)
# 2.73 s ± 11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
new = %timeit -o np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2)
# 2.22 ms ± 33.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
orig.average / new.average
# 1228.78 (speedup)
Our solution is 1200x faster than the original.
Correctness:
>>> np.allclose(np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2), oper_AB(a, b))
True
Speed on a large array, compared to @Ahmed AEK's solution:
n, m = 2000, 2000
a = np.random.uniform(size=(n, m))
b = np.random.uniform(size=(n, m))
new = %timeit -o np.ravel(a @ b.T / np.linalg.norm(b, axis=1)**2)
# 86.5 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
other = %timeit -o AEK(a, b) # Ahmed AEK's answer
# 102 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Our solution is 15% faster :-)
This should work:
result = (np.matmul(A_2D, B_2D.transpose())/np.sum(B_2D*B_2D,axis=1)).flatten()
But this second implementation should be faster because of better cache utilization:
def oper_AB(A_2D, B_2D):
    b_squared = np.sum(B_2D*B_2D, axis=1).reshape([-1, 1])
    b_normalized = B_2D / b_squared
    del b_squared
    returned_val = np.matmul(A_2D, b_normalized.transpose())
    return returned_val.flatten()
The del is only there in case the memory allocated for B_2D is very large (or it's just me being used to working with multi-GB arrays).
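A quick sanity check (with hypothetical small shapes) that the one-liner and the second implementation agree:
import numpy as np

A_2D = np.random.uniform(size=(40, 7))
B_2D = np.random.uniform(size=(50, 7))
ref = (np.matmul(A_2D, B_2D.transpose()) / np.sum(B_2D*B_2D, axis=1)).flatten()
assert np.allclose(oper_AB(A_2D, B_2D), ref)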
Edit: as requested, here is a version for A_1D - B_1D:
def oper2_AB(A_2D, B_2D):
    output = np.zeros([A_2D.shape[0]*B_2D.shape[0], A_2D.shape[1]], dtype=A_2D.dtype)
    for i in range(len(A_2D)):
        output[i*len(B_2D):(i+1)*len(B_2D)] = A_2D[i] - B_2D
    return output
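A fully broadcast sketch of the same pairwise difference, producing an identical (len(A_2D)*len(B_2D), m) array without the Python-level loop (the function name is hypothetical):
def oper2_AB_broadcast(A_2D, B_2D):
    # (nA, 1, m) - (1, nB, m) broadcasts to (nA, nB, m); then merge the first two axes
    return (A_2D[:, None, :] - B_2D[None, :, :]).reshape(-1, A_2D.shape[1])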
I have the code below:
import numpy as np
wtsarray # shape(5000000,21)
covmat # shape(21,21)
portvol = np.zeros(shape=(wtsarray.shape[0],))
for i in range(0, wtsarray.shape[0]):
    portvol[i] = np.sqrt(np.dot(wtsarray[i].T, np.dot(covmat, wtsarray[i]))) * np.sqrt(mtx)
Nothing is wrong with the above code, except that there are 5 million row vectors and the for loop can be quite slow. I was wondering whether you know of a way to vectorise it; so far I have tried with little success.
Or if there is any way to treat each individual row in a numpy matrix as a row vector and perform the above operation?
Thanks, if there are any suggestions on rephrasing my questions, please let me know as well.
portvol = np.sqrt(np.sum(wtsarray * (wtsarray @ covmat.T), axis=1)) * np.sqrt(mtx)
should give you what you want. It replaces the first np.dot with elementwise multiplication followed by summation, and it replaces the second np.dot(covmat, wtsarray[i]) with matrix multiplication, wtsarray @ covmat.T.
For smaller sample arrays:
In [24]: wtsarray = np.arange(15).reshape((5,3)); covmat=np.arange(9).reshape((3,3))
In [25]: portvol = np.zeros((5))
In [26]: for i in range(0, wtsarray.shape[0]):
...: portvol[i] = np.sqrt(np.dot(wtsarray[i], np.dot(covmat, wtsarray[i])))
...:
In [27]: portvol
Out[27]: array([ 7.74596669, 25.92296279, 43.95452195, 61.96773354, 79.97499609])
@ogdenkev's solution:
In [28]: np.sqrt(np.sum(wtsarray * (wtsarray @ covmat.T), axis=1))
Out[28]: array([ 7.74596669, 25.92296279, 43.95452195, 61.96773354, 79.97499609])
In [30]: timeit np.sqrt(np.sum(wtsarray * (wtsarray @ covmat.T), axis=1))
20.4 µs ± 891 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Same thing using einsum:
In [29]: np.sqrt(np.einsum('ij,jk,ik->i',wtsarray,covmat,wtsarray))
Out[29]: array([ 7.74596669, 25.92296279, 43.95452195, 61.96773354, 79.97499609])
In [31]: timeit np.sqrt(np.einsum('ij,jk,ik->i',wtsarray,covmat,wtsarray))
12.9 µs ± 24.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
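For the 5-million-row case it may also be worth passing optimize=True, which lets einsum pick a contraction order instead of evaluating the three-operand expression naively:
np.sqrt(np.einsum('ij,jk,ik->i', wtsarray, covmat, wtsarray, optimize=True))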
And a matmul version:
In [35]: np.sqrt(np.squeeze(wtsarray[:,None,:] @ covmat @ wtsarray[:,:,None]))
Out[35]: array([ 7.74596669, 25.92296279, 43.95452195, 61.96773354, 79.97499609])
In [36]: timeit np.sqrt(np.squeeze(wtsarray[:,None,:] @ covmat @ wtsarray[:,:,None]))
13.5 µs ± 15.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
I have a dataframe df containing the columns x, y (both starting at 0) and several value columns. The x and y coordinates are not complete: many x-y combinations are missing, and sometimes entire x or y values are absent. I would like to create a 2-D numpy array holding the complete matrix of shape (df.x.max() + 1, df.y.max() + 1), with missing values replaced by np.nan. pd.pivot already comes quite close, but it does not fill in completely missing x/y values.
The following code already achieves what is needed, but due to the for loop, this is rather slow:
img = np.full((df.x.max() + 1, df.y.max() + 1), np.nan)
col = 'value'
for ind, line in df.iterrows():
    img[line.x, line.y] = line[col]
A significantly faster version goes as follows:
ind = pd.MultiIndex.from_product((range(df.x.max() + 1), range(df.y.max() +1 )), names=['x', 'y'])
s_img = pd.Series([np.nan]*len(ind), index=ind, name='value')
temp = df.loc[readout].set_index(['x', 'y'])['value']
s_img.loc[temp.index] = temp
img = s_img.unstack().values
The question is whether a vectorized method exists which might make the code shorter and faster.
Thanks for any hints in advance!
Often the fastest way to populate a NumPy array is simply to allocate an array and then assign values
to it using a vectorized operator or function. In this case, np.put seems ideal since it allows you to assign values using a (flat) array of indices and an array of values.
nrows, ncols = df['x'].max() + 1, df['y'].max() +1
img = np.full((nrows, ncols), np.nan)
ind = df['x']*ncols + df['y']
np.put(img, ind, df['value'])
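An equivalent, arguably more direct way to do the same assignment is plain integer-array indexing, which avoids flattening the indices by hand; I would expect performance in the same ballpark as np.put, though it is not timed below:
img = np.full((nrows, ncols), np.nan)
img[df['x'], df['y']] = df['value']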
Here is a benchmark which shows using np.put can be 82x faster than alt (the unstacking method)
for making a (100, 100)-shaped resultant array:
In [184]: df = make_df(100,100)
In [185]: %timeit orig(df)
161 ms ± 753 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [186]: %timeit alt(df)
31.2 ms ± 235 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [187]: %timeit using_put(df)
378 µs ± 1.56 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [188]: 31200/378
Out[188]: 82.53968253968254
This is the setup used for the benchmark:
import numpy as np
import pandas as pd

def make_df(nrows, ncols):
    df = pd.DataFrame(np.arange(nrows*ncols).reshape(nrows, ncols))
    df.index.name = 'x'
    df.columns.name = 'y'
    ind_x = np.random.choice(np.arange(nrows), replace=False, size=nrows//2)
    ind_y = np.random.choice(np.arange(ncols), replace=False, size=ncols//2)
    df = df.drop(ind_x, axis=0).drop(ind_y, axis=1).stack().reset_index().rename(columns={0: 'value'})
    return df
def orig(df):
    img = np.full((df.x.max() + 1, df.y.max() + 1), np.nan)
    col = 'value'
    for ind, line in df.iterrows():
        img[line.x, line.y] = line['value']
    return img
def alt(df):
    ind = pd.MultiIndex.from_product((range(df.x.max() + 1), range(df.y.max() + 1)), names=['x', 'y'])
    s_img = pd.Series([np.nan]*len(ind), index=ind, name='value')
    # temp = df.loc[readout].set_index(['x', 'y'])['value']
    temp = df.set_index(['x', 'y'])['value']
    s_img.loc[temp.index] = temp
    img = s_img.unstack().values
    return img
def using_put(df):
    nrows, ncols = df['x'].max() + 1, df['y'].max() + 1
    img = np.full((nrows, ncols), np.nan)
    ind = df['x']*ncols + df['y']
    np.put(img, ind, df['value'])
    return img
Alternatively, since your DataFrame is sparse, you might be interested in creating a sparse matrix:
import scipy.sparse as sparse
def using_coo(df):
    nrows, ncols = df['x'].max() + 1, df['y'].max() + 1
    result = sparse.coo_matrix(
        (df['value'], (df['x'], df['y'])), shape=(nrows, ncols), dtype='float64')
    return result
As one would expect, making sparse matrices (from sparse data) is even faster (and requires less memory) than creating dense NumPy arrays:
In [237]: df = make_df(100,100)
In [238]: %timeit using_put(df)
381 µs ± 2.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [239]: %timeit using_coo(df)
196 µs ± 1.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [240]: 381/196
Out[240]: 1.9438775510204083
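One design difference to keep in mind: the COO matrix only stores the cells that are present, so result.toarray() fills the missing cells with 0.0 rather than np.nan. If NaN placeholders are needed, one way to recover the dense layout from the sparse result could be:
coo = using_coo(df)
dense = np.full(coo.shape, np.nan)
dense[coo.row, coo.col] = coo.data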
I have the following np.ndarray:
>>> arr
array([[1, 2],
[3, 4]])
I would like to split it into X and y, where X holds the coordinates and y the values. So far I have managed to solve this using np.ndenumerate:
>>> X, y = zip(*np.ndenumerate(arr))
>>> X
((0, 0), (0, 1), (1, 0), (1, 1))
>>> y
(1, 2, 3, 4)
I'm wondering if there's a more idiomatic and faster way to achieve it, since the arrays I'm actually dealing with have millions of values.
I need the X and y array to pass them to a sklearn classifier later. The formats above seemed the most natural for me, but perhaps there's a better way I can pass them to the fit function.
Reshaping arr to y is easy; you can achieve it with y = arr.flatten(). I suggest treating the generation of X as a separate task.
Let's assume that your dataset is of shape NxM. In our benchmark we set N to 500 and M to 1000.
N = 500
M = 1000
arr = np.random.randn(N, M)
Then by using np.mgrid and transforming indices you can get the result as:
np.mgrid[:N, :M].transpose(1, 2, 0).reshape(-1, 2)
Benchmarks:
%timeit np.mgrid[:N, :M].transpose(1, 2, 0).reshape(-1, 2)
# 3.11 ms ± 35.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit zip(*np.ndenumerate(arr))
# 235 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In your case you can unpack and get N and M by:
N, M = arr.shape
and then:
X = np.mgrid[:N, :M].transpose(1, 2, 0).reshape(-1, 2)
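An equivalent alternative, in case you find it more readable, is np.indices; I would expect comparable timings, though I have not benchmarked it here:
X = np.indices(arr.shape).reshape(2, -1).T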
Use numpy.where with numpy.ravel():
import numpy as np

def ndenumerate(np_array):
    # np.where on np_array + 1 returns the indices of every element,
    # provided no element equals -1 (see the note at the end of this answer)
    return list(zip(*np.where(np_array + 1))), np_array.ravel()
arr = np.random.randint(0, 100, (1000,1000))
X_new, y_new = ndenumerate(arr)
X,y = zip(*np.ndenumerate(arr))
Output (validation):
all(i1 == i2 for i1, i2 in zip(X, X_new))
# True
all(y == y_new)
# True
Benchmark (about 3x faster):
%timeit ndenumerate(arr)
# 234 ms ± 20.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit zip(*np.ndenumerate(arr))
# 877 ms ± 91.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
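Note that the np_array + 1 trick relies on no element being equal to -1 (such elements would become 0 and be dropped by np.where). A variant that does not depend on the stored values at all, with the same output format (the name is just for illustration):
def ndenumerate_safe(np_array):
    # a mask of all True yields the indices of every cell, regardless of the values
    return list(zip(*np.where(np.ones(np_array.shape, dtype=bool)))), np_array.ravel()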
I have a square matrix that is NxN (N is usually >500). It is constructed using a numpy array.
I need to extract a new matrix that has the i-th column and row removed from this matrix. The new matrix is (N-1)x(N-1).
I am currently using the following code to extract this matrix:
new_mat = np.delete(old_mat, idx_2_remove, 0)
new_mat = np.delete(new_mat, idx_2_remove, 1)
I have also tried to use:
row_indices = [i for i in range(0,idx_2_remove)]
row_indices += [i for i in range(idx_2_remove+1,N)]
col_indices = row_indices
rows = [i for i in row_indices for j in col_indices]
cols = [j for i in row_indices for j in col_indices]
old_mat[(rows, cols)].reshape(len(row_indices), len(col_indices))
But I found this to be slower than the np.delete() approach above, which itself is still quite slow for my application.
Is there a faster way to accomplish what I want?
Edit 1:
It seems the following is even faster than the above two, but not by much:
new_mat = old_mat[row_indices,:][:,col_indices]
Here are 3 alternatives I quickly wrote:
Repeated delete:
def foo1(arr, i):
    return np.delete(np.delete(arr, i, axis=0), i, axis=1)
Maximal use of slicing (may need some edge checks):
def foo2(arr, i):
    N = arr.shape[0]
    res = np.empty((N-1, N-1), arr.dtype)
    res[:i, :i] = arr[:i, :i]
    res[:i, i:] = arr[:i, i+1:]
    res[i:, :i] = arr[i+1:, :i]
    res[i:, i:] = arr[i+1:, i+1:]
    return res
Advanced indexing:
def foo3(arr, i):
    N = arr.shape[0]
    idx = np.r_[:i, i+1:N]        # all indices except i
    return arr[np.ix_(idx, idx)]  # open mesh selects the kept rows and columns
Test that they work:
In [874]: x = np.arange(100).reshape(10,10)
In [875]: np.allclose(foo1(x,5),foo2(x,5))
Out[875]: True
In [876]: np.allclose(foo1(x,5),foo3(x,5))
Out[876]: True
Compare timings:
In [881]: timeit foo1(arr,100).shape
4.98 ms ± 190 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [882]: timeit foo2(arr,100).shape
526 µs ± 1.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [883]: timeit foo3(arr,100).shape
2.21 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So the slicing is fastest, even if the code is longer. It looks like np.delete works like foo3, but one dimension at a time.
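Another variant worth trying (not timed here) uses a boolean mask with chained indexing; like foo1 it makes an intermediate copy, so foo2 will likely still be the fastest:
def foo4(arr, i):
    keep = np.ones(arr.shape[0], dtype=bool)  # True for rows/columns to keep
    keep[i] = False
    return arr[keep][:, keep]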