If we have a numpy array a that needs to be sampled with replacement to create a second numpy array b,
import numpy as np
a = np.arange(10, 200*1000)
b = np.random.choice(a, len(a), replace=True)
What is the most efficient way to find an array of indexes named mapping that will transform a to b? It is OK to change np.random.choice to a more suitable function.
The following code is too slow: it takes 7-8 seconds on a MacBook Pro to create the mapping array. With an array size of 1 million, it will take much longer.
mapping = np.array([], dtype=int)
for n in b:
    m = np.searchsorted(a, n)
    mapping = np.append(mapping, m)
Perhaps run the choice on the indices of a instead, and then index a with that random index array:
mapping = np.random.choice(np.arange(len(a)), len(a), replace=True)
b = a[mapping]
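As a quick sanity check (a minimal sketch): because a is sorted and has unique values, a single vectorized np.searchsorted call recovers exactly the indices the original loop was computing.
import numpy as np

a = np.arange(10, 200*1000)
mapping = np.random.choice(np.arange(len(a)), len(a), replace=True)
b = a[mapping]

# b is just a indexed by mapping ...
assert np.array_equal(a[mapping], b)
# ... and searchsorted on the sorted, unique a recovers the same mapping in one call
assert np.array_equal(np.searchsorted(a, b), mapping)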
Related
I have a numpy array of shape (5000, 9) and dtype int. I am trying to create an array of shape (5000, 5000) of dtype int that contains a count of the elements shared by each pair of rows.
I can accomplish this using itertools.combinations and a loop, but that approach is pretty slow (3-4 minutes on my machine), so I'm searching for a more efficient alternative. Any suggestions would be greatly appreciated!
from itertools import combinations
import numpy as np
# create a random array whose rows don't have duplicates
data = np.random.rand(5000, 9).argsort(axis=0)
counts = np.zeros((5000, 5000), dtype=int)
for i, j in combinations(range(len(data)), 2):
    counts[i, j] = len(np.intersect1d(data[i], data[j]))
Let's try:
# sample data with 200 unique values
np.random.seed(1)
data = np.array([np.random.choice(np.arange(200), size=9, replace=False)
                 for _ in range(5000)])
# identify the unique values
uniques = np.unique(data)
# per-row indicator (dummy) counts of each unique value -> shape (5000, 200)
a = (data[..., None] == uniques).sum(1)
# pairwise shared-element counts via dot products of the indicator rows
out = np.einsum('ij,kj->ik', a, a)
Takes about 4.5s on my system.
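A quick spot check against the original intersect1d approach (a minimal sketch that reuses data and out from the snippet above; the index pairs are arbitrary):
# compare a handful of random pairs against the loop-based definition
rng = np.random.default_rng(0)
for i, j in rng.integers(0, 5000, size=(5, 2)):
    assert out[i, j] == len(np.intersect1d(data[i], data[j]))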
In Python, I am trying to initialize 2-element arrays of zeros within an N by N array. The code I'm using works, but I'm looking for something more efficient and elegant:
array1 = np.empty((N, N), dtype=object)
for i in range(N):
    for j in range(N):
        array1[i, j] = np.zeros(2, dtype=int)
Thanks in advance for the help.
As I understand it, you should probably use a 3D array:
import numpy as np
array1 = np.zeros((N, N, 2), dtype=int)
which gives an array with N rows, N columns, and a depth of 2. If you want to assign an (N, N) array to, say, the first depth slice, just use:
tmp = np.ones((N, N))  # for instance
array1[:, :, 0] = tmp
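A minimal usage sketch of the 3D layout (N = 3 is just an illustrative choice): each array1[i, j] plays the role of the per-cell 2-element array from the original nested-loop version.
import numpy as np

N = 3  # illustrative size
array1 = np.zeros((N, N, 2), dtype=int)

# update one cell's 2-element vector in place
array1[1, 2] += [5, 7]
print(array1[1, 2])   # [5 7]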
I have a really big matrix (n x n) for which I would like to build the intersecting tiles (submatrices) with dimensions m x m. There will be an offset of step between each pair of contiguous submatrices. Here is an example for n=8, m=4, step=2:
import numpy as np
matrix=np.random.randn(8,8)
n=matrix.shape[0]
m=4
step=2
This will store all the corner indices (x, y) from which we will take a 4x4 matrix: (x:x+4, y:y+4)
a={(i,j) for i in range(0,n-m+1,step) for j in range(0,n-m+1,step)}
The submatrices will be extracted like this:
sub_matrices = np.zeros([m, m, len(a)])
for i, ind in enumerate(a):
    x, y = ind
    sub_matrices[:, :, i] = matrix[x:x+m, y:y+m]
Is there a faster way to do this submatrices initialization?
We can leverage scikit-image's view_as_windows, which is built on np.lib.stride_tricks.as_strided, to get sliding windows (see the scikit-image docs for more on as_strided-based views).
from skimage.util.shape import view_as_windows
# Get indices as array
ar = np.array(list(a))
# Get all sliding windows
w = view_as_windows(matrix,(m,m))
# Get selective ones by indexing with ar
selected_windows = np.moveaxis(w[ar[:,0],ar[:,1]],0,2)
Alternatively, we can extract the row and col indices with a list comprehension and then index with those, like so -
R = [i[0] for i in a]
C = [i[1] for i in a]
selected_windows = np.moveaxis(w[R,C],0,2)
Optimizing from the start, we can skip the creation of the stepping index set a and simply use the step arg with view_as_windows, like so -
view_as_windows(matrix,(m,m),step=2)
This would give us a 4D array; indexing into its first two axes yields all the m x m shaped windows. These windows are simply views into the input, so there is no extra memory overhead and virtually free runtime!
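As a rough sketch (assuming the window ordering does not need to match the unordered index set a from the question), the 4D result can be brought back to the same (m, m, num_windows) layout as sub_matrices:
import numpy as np
from skimage.util.shape import view_as_windows

n, m, step = 8, 4, 2
matrix = np.random.randn(n, n)

# 4D output: (num_row_starts, num_col_starts, m, m)
w4 = view_as_windows(matrix, (m, m), step=step)

# flatten the window grid and move it to the last axis -> (m, m, num_windows)
# (this reshape copies; keep w4 itself if you want pure views)
sub_matrices = np.moveaxis(w4.reshape(-1, m, m), 0, 2)
print(sub_matrices.shape)   # (4, 4, 9)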
import numpy as np
n, m, step = 8, 4, 2  # values from the example above
a = np.random.randn(n, n)
b = a[0:m*step:step, 0:m*step:step]
If you have a one-dimensional array, you can get a subarray of it with the following code:
c = a[start:end:step]
If there are two or more dimensions, add a comma-separated slice for each dimension:
d = a[start1:end1:step1, start2:end2:step2]
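For instance, a minimal sketch with made-up bounds:
import numpy as np

a = np.arange(64).reshape(8, 8)

# rows 0, 2, 4 and columns 1, 3, 5 -> a 3x3 submatrix
d = a[0:6:2, 1:7:2]
print(d.shape)   # (3, 3)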
Say I have a sequence s and I'd like to select n random subsequences from it, each of length l, and store them in a matrix. Is there a more numpy way of doing that than
s = np.arange(0, 1000)
n = 5
l = 10
i = np.random.randint(0, len(s) - l, n)
ss = np.array([s[x:x+l] for x in i])
We can leverage scikit-image's view_as_windows, which is built on np.lib.stride_tricks.as_strided, for efficient patch extraction, like so -
from skimage.util.shape import view_as_windows
# Get sliding windows (these are simply views)
w = view_as_windows(s, l)
# Index with indices, i for desired output
out = w[i]
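A quick check that this matches the list-comprehension version (a minimal sketch reusing s, l, i and out from above):
# every selected window should equal the corresponding slice of s
assert np.array_equal(out, np.array([s[x:x+l] for x in i]))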
Related:
NumPy Fancy Indexing - Crop different ROIs from different channels
Take N first values from every row in NumPy matrix that fulfill condition
Selecting Random Windows from Multidimensional Numpy Array Rows
How can you iterate over all 2^(n^2) binary n by n matrices (or 2D arrays) in numpy? I would like something like:
for M in ....:
Do you have to use itertools.product([0,1], repeat = n**2) and then convert to a 2d numpy array?
This code will give me a random 2d binary matrix but that isn't what I need.
np.random.randint(2, size=(n,n))
Note that 2**(n**2) is a big number even for relatively small n, so your loop might run for an impractically long time.
That being said, one possible way to iterate over the matrices you need is, for example:
nxn = np.arange(n**2).reshape(n, -1)
for i in range(2**(n**2)):
    # bit j of i becomes entry j of the flattened n x n matrix
    arr = (i >> nxn) % 2
    # do something with arr
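For example (a minimal sketch with n = 2), the first few values of i unpack into these matrices:
import numpy as np

n = 2
nxn = np.arange(n**2).reshape(n, -1)

for i in range(4):
    print((i >> nxn) % 2)
# i=0 -> [[0 0], [0 0]]
# i=1 -> [[1 0], [0 0]]
# i=2 -> [[0 1], [0 0]]
# i=3 -> [[1 1], [0 0]]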
np.array(list(itertools.product([0,1], repeat = n**2))).reshape(-1,n,n)
produces a (2^(n^2),n,n) array.
There may be some numpy 'grid' function that does the same, but my recollection from other discussions is that itertools.product is pretty fast.
g=(np.array(x).reshape(n,n) for x in itertools.product([0,1], repeat = n**2))
is a generator that produces the n x n arrays one at a time:
next(g)
# array([[0, 0],[0, 0]])
Or, to produce the same 3D array (start from a fresh generator if items have already been consumed with next):
np.array(list(g))
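If materializing all 2**(n**2) matrices at once is too much, the generator can also be consumed lazily; for instance (a minimal sketch using itertools.islice):
import itertools
import numpy as np

n = 3
g = (np.array(x).reshape(n, n)
     for x in itertools.product([0, 1], repeat=n**2))

# look at just the first 5 matrices without building the full 2**9-element list
for M in itertools.islice(g, 5):
    print(M.sum())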