Say I have a 100x100 array in numpy. From this array I want to select 10 random blocks of (x * x) pixels and change the values of these blocks simultaneously. What is the best way to index the slices for each block? An ideal solution would be something along the lines of the following, where the slices are taken between the pairs of tuples.
A = np.ones((100, 100))
blockSize = 10
numBlocks = 15
blockCenter_Row = tuple(np.random.randint(blockSize, high=(100 - blockSize), size=numBlocks))
blockCenter_Col = tuple(np.random.randint(blockSize, high=(100 - blockSize), size=numBlocks))
rowLeft_Boundary = tuple((i - blockSize // 2) for i in blockCenter_Row)
rowRight_Boundary = tuple((i + blockSize // 2) for i in blockCenter_Row)
colLower_Boundary = tuple((i - blockSize // 2) for i in blockCenter_Col)
colUpper_Boundary = tuple((i + blockSize // 2) for i in blockCenter_Col)
for value in range(10):
    A[rowLeft_Boundary:rowRight_Boundary, colLower_Boundary:colUpper_Boundary] = value
I think you can use as_strided() to do the trick, if the blocks are allowed to overlap.
import numpy as np
import pylab as pl
from numpy.lib.stride_tricks import as_strided
blockSize = 10
numBlocks = 15
n = 100
a = np.zeros((n, n))
itemsize = a.dtype.itemsize
new_shape = n-blockSize+1, n-blockSize+1, blockSize, blockSize
new_stride = itemsize*n, itemsize, itemsize*n, itemsize
b = as_strided(a, shape=new_shape, strides=new_stride)
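# b is a strided view into a: b[i, j] is the blockSize x blockSize window whose
# top-left corner sits at row i, column j, so writing into b writes directly into a.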
idx0 = np.random.randint(0, b.shape[0], numBlocks)
idx1 = np.random.randint(0, b.shape[1], numBlocks)
b[idx0, idx1, :, :] = np.random.rand(numBlocks, blockSize, blockSize)*3 + np.arange(numBlocks).reshape(-1, 1, 1)
pl.imshow(a, cmap="gray", interpolation="nearest")
The resulting image of a shows the randomly placed blocks.
I would like to apply a function func over each row of a 2D ndarray arr of shape n x m, with a provided list of arguments args (of length n). That is, for each row i the function is executed as func(arr[i, :], args[i]).
This task can be accomplished with np.fromiter (using a for loop):
iterable = (func(row, arg) for row, arg in zip(arr, args))
results = np.fromiter(iterable, dtype=int)
However, this can take some time for large arrays. According to unutbu's answer, using numpy's Python utility functions (e.g. np.apply_along_axis) does not provide a significant speedup. Is there a way to optimize this process?
To avoid falling into the XY problem trap, below is my original problem statement:
I have an ndarray representing an image, shaped n x m. This image undergoes processing during which a specific index i is calculated for each row. I want to compose an image of the original shape (n x m) using the data to the right of index i for each row. That is, I want to resample each row[i:] of length m - i to m samples. Note that I want to use my own implementation of the resampling function (I don't want to use scipy.signal.resample etc.).
EDIT:
Test code with func example (added count argument to fromiter as suggested by LudvigH):
import numpy as np
import matplotlib.pyplot as plt
def simple_slant_range_correction(
    row, height, n_samples, max_ground_range, max_slant_range, slant_range_resolution
):
    ground_ranges = np.linspace(height, max_ground_range, n_samples)
    slant_ranges = np.sqrt(ground_ranges ** 2 + height ** 2)
    slant_ranges_indicies = slant_ranges / slant_range_resolution - 1
    slant_ranges_indicies_floor = np.floor(slant_ranges_indicies).astype(np.int16)
    slant_ranges_indicies_ceil = np.clip(
        slant_ranges_indicies_floor + 1, 0, n_samples - 1
    )  # clip the indices to the valid range [0, n_samples - 1]
    weight = slant_ranges_indicies - slant_ranges_indicies_floor
    return (
        weight * row[slant_ranges_indicies_ceil]
        + (1 - weight) * row[slant_ranges_indicies_floor]
    ).astype(np.float32)
if __name__ == "__main__":
# Test parameters
n, m = 100, 100
max_slant_range = 50
slant_range_resolution = max_slant_range / m
# Create some dummy data
data = np.zeros((n, m))
h_indicies = np.ones((n), dtype=int)
for i in np.arange(0, n, 5):
data[:i, :i] += i
h_indicies[:i] += 1
heights = h_indicies * slant_range_resolution
max_ground_ranges = np.sqrt(max_slant_range ** 2 - heights ** 2)
# Perform resampling based on h_index
iters = (
simple_slant_range_correction(
row, height, m, max_ground_range, max_slant_range, slant_range_resolution
)
for row, height, max_ground_range in zip(data, heights, max_ground_ranges)
)
data_sampled = np.fromiter(iters, dtype=np.dtype((np.float32, m)), count=n)
# Plot data
fig, axs = plt.subplots(1, 2)
axs[0].plot(h_indicies + 0.5, np.arange(n) + 0.5, c="red")
axs[0].imshow(data, vmin=0, vmax=data.max())
axs[1].imshow(data_sampled, vmin=0, vmax=data.max())
axs[0].set_axis_off()
axs[1].set_axis_off()
plt.tight_layout()
plt.show()
It is typically faster to take advantage of vectorization by using numpy operations to manipulate the data than to loop with Python functions and objects. Below is an example of a way to solve the problem described at the end of your question using numpy vectorization.
import numpy as np
Choosing some array and column indices as an example:
# 1 2 3 3 1
# A = 4 5 6 6 row_indices = 3
# 7 8 9 9 2
A = np.array([[1,2,3,3],[4,5,6,6],[7,8,9,9]])
row_indices = np.array([1,3,2])
Use vector operations to build a boolean masking array and then multiply the original array by the mask:
N, M = np.shape(A)
col = np.arange(M,dtype=np.uint32)
B = np.outer(np.ones([1,N],dtype=np.uint32),col)
C = np.outer(row_indices,np.ones([1,M],dtype=np.uint32))
A_sampled = (B>=C)*A
print(A_sampled)
# output:
# 0 2 3 3
# 0 0 0 6
# 0 0 9 9
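For reference, the same mask can also be built more compactly with broadcasting instead of np.outer; this is just a sketch of the equivalent operation, assuming the same A and row_indices as above:
import numpy as np

A = np.array([[1, 2, 3, 3], [4, 5, 6, 6], [7, 8, 9, 9]])
row_indices = np.array([1, 3, 2])

# Compare one row of column indices against a per-row threshold (broadcasting),
# then zero out everything to the left of the threshold.
A_sampled = A * (np.arange(A.shape[1]) >= row_indices[:, None])
print(A_sampled)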
I have 3 numpy matrices:
One contains pixel positions in X (x_pos), another pixel positions in Y (y_pos), and a last one containing pixel values (p_value).
I would like to use these 3 matrices to build a result image.
With loops I have this result:
# Resulting image
res = np.zeros((128, 128, 3), dtype=np.uint8)
for i in range(x_pos.shape[0]):
    for j in range(x_pos.shape[1]):
        # Get coordinates
        x = x_pos[i][j]
        y = y_pos[i][j]
        res[y, x] = p_value[i][j]
With large matrices (2048*2048) this code already takes a lot of time. Is it possible to optimize this code without using a nested loop?
Note that the positions in the x_pos and y_pos matrices do not necessarily follow each other; there may be holes or duplicate values.
It should be possible using np.meshgrid:
i = np.arange(0, x_pos.shape[0])
j = np.arange(0, x_pos.shape[1])
i_1, j_1 = np.meshgrid(i, j, indexing='ij')
res[y_pos.ravel(), x_pos.ravel()] = p_value[i_1.ravel(), j_1.ravel()]
First use consistent numpy 2d array indexing:
x = x_pos[i,j]
y = y_pos[i,j]
res[y,x] = p_value[i,j]
Now instead of scalar i,j use arrays
i = np.arange(n); j = np.arange(m)
You didn't provide a [mcve] so I won't try to demonstrate that this works.
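As a rough sketch of that idea on made-up small arrays (single channel for brevity): since x_pos, y_pos and p_value all share the same 2-D shape, fancy indexing with the full arrays also works directly:
import numpy as np

x_pos = np.array([[0, 2], [1, 3]])
y_pos = np.array([[1, 1], [0, 2]])
p_value = np.array([[10, 20], [30, 40]])

res = np.zeros((4, 4), dtype=np.uint8)
res[y_pos, x_pos] = p_value  # fancy indexing with matching-shape index arrays
print(res)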
Thanks to @hpaulj's and @ai2ys's answers the problem is solved.
Here is a comparison of the results in terms of execution speed:
import numpy as np
import cv2
import time
m_size = 4096
m_x = np.random.randint(0,m_size,(m_size,m_size), dtype = np.uint16)
m_y = np.random.randint(0,m_size,(m_size,m_size), dtype = np.uint16)
p_value = np.ones((m_size,m_size), dtype = np.uint8)
#Meshgrid method:
out = np.zeros((m_size,m_size),dtype=np.uint8)
start = time.time()
i = np.arange(0, m_x.shape[0])
j = np.arange(0, m_x.shape[1])
i_1, j_1 = np.meshgrid(i, j, indexing='ij')
out[m_x.ravel(),m_y.ravel()] = p_value[i_1.ravel(),j_1.ravel()]
end = time.time()
print("Meshgrid: {} s".format(end - start))
#No for loop method:
out = np.zeros((m_size,m_size),dtype=np.uint8)
start = time.time()
i = np.arange(m_x.shape[0])
j = np.arange(m_y.shape[1])
x = m_x[i,j]
y = m_y[i,j]
out[x,y] = p_value[i,j]
end = time.time()
print("No loop: {} s".format(end - start))
#For loop method:
out = np.zeros((m_size,m_size),dtype=np.uint8)
start = time.time()
for i in range(m_x.shape[0]):
    for j in range(m_y.shape[1]):
        x = m_x[i,j]
        y = m_y[i,j]
        out[x,y] = p_value[i,j]
end = time.time()
print("Nested loop: {} s".format(end - start))
#Output:
Meshgrid: 0.4837045669555664 s
No loop: 0.3600656986236572 s
Nested loop: 13.10097336769104 s
I have N NumPy arrays of shape data[n,m,3]. I want to fit/squeeze/split/slice/reshape them into N' arrays of shape new_data_#[1000,m,3], where # is the index of the new arrays. The problem is that n can be smaller or bigger than 1000. When it is smaller, somehow I should fill the rest of the 1000 capacity of new_array with the next array, and when it is bigger than 1000 I should make a new new_data_# and add the rest to that one. I don't know how to manage this. Here is pseudo-code, but it can't be done this way; for example, the while loop may not be necessary. The output can be written to disk or returned in a new data format.
def array2blocks(array_files):
    for each N in array_files:
        N = data = np.random.rand(n, m, 3)
        new_data = np.zeros((1000, m, 3), dtype=np.float32)
        j = 0
        index = 0
        while j <= new_data.shape[0]:
            for i in range(data.shape[0]):
                print("--->", data[i, :, :])
                print(i)
                if i <= new_data.shape[0]:
                    # here first we should check the left capacity of new_data and then insert data into it
                    # new_data[i, :, :] = data[i, :, :]  # this overrides previous items so not correct
                    print(new_data)
                else:
                    print('n>1000')
                    new_data_name = 'new_data' + '_' + str(index)
                    # here fill the rest of the data into new_data
                    ...
                    index += 1
                    # when capacity is full write it to the disk
                    print(new_data)
UPDATE with Aaron's old answer:
I replaced 1000 with batch_size = 5 to make it simple.
def numpyarrays2blocks(array_files):
    N1 = np.random.rand(7, 4, 3)
    N2 = np.random.rand(7, 4, 3)
    N3 = np.random.rand(4, 4, 3)
    # array_files = []
    array_files.append(N1)
    array_files.append(N2)
    array_files.append(N3)
    for N in array_files:
        n = N.shape[0]
        m = N.shape[1]
        batch_size = 5
        # N = data = np.random.rand(n, m, 3)
        data = N
        # print(data)
        new_arrays = []
        i = 0  # the current row index to insert
        while i < n:
            new_data = np.zeros((batch_size, m, 3), dtype=np.float32)
            j = min(i + batch_size, n)  # the last row (exclusive) to copy to new_data
            # j - i is the number of rows to copy
            new_data[:j - i, :, :] = data[i:j, :, :]
            print('NEW DATA: ', new_data)
            i = j  # update the index
            new_arrays.append(new_data)
        print(new_arrays)
data is used to store the temporary result, and data_start is the index at which to insert rows into data.
Allocate data if it is None.
Yield data once it is fully filled.
merge_and_split is a generator, so the memory demand should be low.
import random
from typing import Iterator
import numpy as np

def merge_and_split(arrays, batch_size) -> Iterator:
    arrays = tuple(arrays)
    dtype = arrays[0].dtype
    data_shape = (batch_size,) + arrays[0].shape[1:]
    assert all(a.shape[1:] == data_shape[1:] for a in arrays), "Shape mismatch"
    data = None
    data_start = 0
    for src in arrays:
        src_index = 0
        src_avail = src.shape[0]
        while src_avail >= 1:
            if data is None:
                # allocate if None
                data = np.zeros(data_shape, dtype=dtype)
                data_start = 0
            num_moved = min(batch_size - data_start, src_avail)
            data[data_start:data_start + num_moved, ...] = src[src_index:src_index + num_moved, ...]
            data_start += num_moved
            src_index += num_moved
            src_avail -= num_moved
            if data_start >= batch_size:
                yield data
                data = None
    if data is not None:
        yield data

def input_arrays():
    number = 10
    r = random.Random(13)
    return [np.random.randint(0, 10, size=(r.randint(1, 5), 4, 3)) for _ in range(number)]

def main():
    # Testing input and output
    arrays = input_arrays()
    # for i, item in enumerate(arrays):
    #     print('input', i, item.shape)
    #     print(item)
    result = list(merge_and_split(arrays, 5))
    # for i, item in enumerate(result):
    #     print('result', i, item.shape)
    #     print(item)
    src_concat = np.vstack(arrays)
    row_number = sum(s.shape[0] for s in arrays)
    print('concatenated', src_concat.shape, row_number)
    out_concat = np.vstack(result)
    print(out_concat.shape)
    print((out_concat[0:row_number, ...] == src_concat).all())  # They are indeed the same

if __name__ == '__main__':
    main()
You can concatenate all your original arrays and then split them:
ars = ... # list of N arrays
ars = np.concatenate(ars, axis=0)
ars = np.split(ars, np.arange(1000, ars.shape[0], 1000))
The last line could also be written as ars = np.split(ars, ars.shape[0] // 1000), but only if you're sure that the total number of rows is an exact multiple of 1000, since np.split with an integer requires equal sections and will barf otherwise. Specifying explicit split points, as with np.arange, allows you to have a shorter final segment.
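A quick self-contained check of what those split points produce (the sizes here are made up for illustration):
import numpy as np

ars = [np.random.rand(n, 4, 3) for n in (700, 1200, 450)]  # 2350 rows in total
ars = np.concatenate(ars, axis=0)
chunks = np.split(ars, np.arange(1000, ars.shape[0], 1000))
print([c.shape for c in chunks])  # [(1000, 4, 3), (1000, 4, 3), (350, 4, 3)]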
I have a 4D numpy array of size (98, 359, 256, 269) that I want to threshold.
Right now, I have two separate lists that keep the coordinates of the first 2 dimensions and the last 2 dimensions (mag_ang for the first 2 dimensions and indices for the last 2).
size of indices : (61821,2)
size of mag_ang : (35182,2)
Currently, my code looks like this:
inner_points = []
for k in indices:
    x = k[0]
    y = k[1]
    for i, ctr in enumerate(mag_ang):
        mag = ctr[0]
        ang = ctr[1]
        if X[mag][ang][x][y] > 10:
            inner_points.append((y, x))
This code works, but it's pretty slow, and I wonder if there's a more pythonic/faster way to do this?
(EDIT: added a second alternate method)
Use numpy multi-array indexing:
import time
import numpy as np
n_mag, n_ang, n_x, n_y = 10, 12, 5, 6
shape = n_mag, n_ang, n_x, n_y
X = np.random.random_sample(shape) * 20
nb_indices = 100 # 61821
indices = np.c_[np.random.randint(0, n_x, nb_indices), np.random.randint(0, n_y, nb_indices)]
nb_mag_ang = 50 # 35182
mag_ang = np.c_[np.random.randint(0, n_mag, nb_mag_ang), np.random.randint(0, n_ang, nb_mag_ang)]
# original method
inner_points = []
start = time.time()
for x, y in indices:
    for mag, ang in mag_ang:
        if X[mag][ang][x][y] > 10:
            inner_points.append((y, x))
end = time.time()
print(end - start)
# faster method 1:
inner_points_faster1 = []
start = time.time()
for x, y in indices:
    if np.any(X[mag_ang[:, 0], mag_ang[:, 1], x, y] > 10):
        inner_points_faster1.append((y, x))
end = time.time()
print(end - start)
# faster method 2:
start = time.time()
# note: depending on the real size of mag_ang and indices, you may wish to do this the other way round ?
found = X[:, :, indices[:, 0], indices[:, 1]][mag_ang[:, 0], mag_ang[:, 1], :] > 10
# 'found' shape is (nb_mag_ang x nb_indices)
assert found.shape == (nb_mag_ang, nb_indices)
matching_indices_mask = found.any(axis=0)
inner_points_faster2 = indices[matching_indices_mask, :]
end = time.time()
print(end - start)
# finally assert equality of findings
inner_points = np.unique(np.array(inner_points))
inner_points_faster1 = np.unique(np.array(inner_points_faster1))
inner_points_faster2 = np.unique(inner_points_faster2)
assert np.array_equal(inner_points, inner_points_faster1)
assert np.array_equal(inner_points, inner_points_faster2)
yields
0.04685807228088379
0.0
0.0
(of course if you increase the shape the time will not be zero for the second and third)
Final note: here I use "unique" at the end, but it would maybe be wise to do it upfront for the indices and mag_ang arrays (except if you are sure that they are unique already)
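For example, deduplicating the coordinate pairs upfront could look like this (just a sketch; np.unique with axis=0 keeps the unique rows):
indices = np.unique(indices, axis=0)
mag_ang = np.unique(mag_ang, axis=0)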
Use numpy directly. If indices and mag_ang are numpy arrays of two columns each for the appropriate coordinate:
(x, y), (mag, ang) = indices.T, mag_ang.T
index_matrix = np.array(np.meshgrid(mag, ang, x, y)).T.reshape(-1, 4)
values = X[index_matrix[:, 0], index_matrix[:, 1], index_matrix[:, 2], index_matrix[:, 3]]
inner_mag, inner_ang, inner_x, inner_y = index_matrix[values > 10].T
Now the inner... variables hold arrays for each coordinate. To get a single list of pairs you can zip inner_y and inner_x.
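For example (a one-line sketch):
inner_points = list(zip(inner_y, inner_x))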
Here are a few vectorized ways leveraging broadcasting -
thresh = 10
mask = X[mag_ang[:,0],mag_ang[:,1],indices[:,0,None],indices[:,1,None]]>thresh
r = np.where(mask)[0]
inner_points_out = indices[r][:,::-1]
For larger arrays, we can compare first and then index to get the mask -
mask = (X>thresh)[mag_ang[:,0],mag_ang[:,1],indices[:,0,None],indices[:,1,None]]
If you are only interested in the unique coordinates from indices, use the mask directly -
inner_points_out = indices[mask.any(1)][:,::-1]
For large arrays, we can also leverage multi-cores with numexpr module.
Thus, first off import the module -
import numexpr as ne
Then, replace (X>thresh) with ne.evaluate('X>thresh') in the computation(s) listed earlier.
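For instance, the second variant above would then read (a sketch under the same variable names):
mask = ne.evaluate('X > thresh')[mag_ang[:,0], mag_ang[:,1], indices[:,0,None], indices[:,1,None]]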
Use np.where
inner = np.where(X > 10)
a, b, x, y = inner
inner_points = np.vstack([y, x]).T
I'm trying to multiply two big matrices under a memory limit using hdf5 (pytables),
but the function numpy.dot gives me an error:
ValueError: array is too big
Do I need to do the matrix multiplication myself, maybe blockwise, or is there another Python function similar to numpy.dot?
import numpy as np
import time
import tables
import cProfile
import numexpr as ne
n_row=10000
n_col=100
n_batch=10
rows = n_row
cols = n_col
batches = n_batch
atom = tables.UInt8Atom() #?
filters = tables.Filters(complevel=9, complib='blosc') # tune parameters
fileName_a = r'C:\carray_a.h5'
shape_a = (rows*batches, cols) # predefined size
h5f_a = tables.open_file(fileName_a, 'w')
ca_a = h5f_a.create_carray(h5f_a.root, 'carray', atom, shape_a, filters=filters)
for i in range(batches):
    data = np.random.rand(rows, cols)
    ca_a[i*rows:(i+1)*rows] = data[:]
#h5f_0.close()
rows = n_col
cols = n_row
batches = n_batch
fileName_b = r'C:\carray_b.h5'
shape_b = (rows, cols*batches) # predefined size
h5f_b = tables.open_file(fileName_b, 'w')
ca_b = h5f_b.create_carray(h5f_b.root, 'carray', atom, shape_b, filters=filters)
# need to batch by cols
sz = rows // batches
for i in range(batches):
    data = np.random.rand(sz, cols*batches)
    ca_b[i*sz:(i+1)*sz] = data[:]
#h5f_1.close()
rows = n_batch*n_row
cols = n_batch*n_row
fileName_c = r'C:\carray_c.h5'
shape_c = (rows, cols) # predefined size
h5f_c = tables.open_file(fileName_c, 'w')
ca_c = h5f_c.create_carray(h5f_c.root, 'carray', atom, shape_c, filters=filters)
a= h5f_a.root.carray#[:]
b= h5f_b.root.carray#[:]
c= h5f_c.root.carray
t0= time.time()
c = np.dot(a, b)  # error if the array is big
print (time.time()-t0)
Update: so here is the code. Interestingly, using hdf5 it works even faster.
import numpy as np
import tables
import time
sz= 100 #chunk size
n_row=10000 #m
n_col=1000 #n
#for arbitrary size
A=np.random.rand(n_row,n_col)
B=np.random.rand(n_col,n_row)
# A=np.random.randint(5, size=(n_row,n_col))
# B=np.random.randint(5, size=(n_col,n_row))
#using numpy array
#C= np.zeros((n_row,n_row))
#using hdf5
fileName_C = 'CArray_C.h5'
atom = tables.Float32Atom()
shape = (A.shape[0], B.shape[1])
Nchunk = 128 # ?
chunkshape = (Nchunk, Nchunk)
chunk_multiple = 1
block_size = chunk_multiple * Nchunk
h5f_C = tables.open_file(fileName_C, 'w')
C = h5f_C.create_carray(h5f_C.root, 'CArray', atom, shape, chunkshape=chunkshape)
sz= block_size
t0= time.time()
for i in range(0, A.shape[0], sz):
    for j in range(0, B.shape[1], sz):
        for k in range(0, A.shape[1], sz):
            C[i:i+sz, j:j+sz] += np.dot(A[i:i+sz, k:k+sz], B[k:k+sz, j:j+sz])
print (time.time()-t0)
t0= time.time()
res= np.dot(A,B)
print (time.time()-t0)
print (C== res)
h5f_C.close()
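One caveat worth noting: C is stored as float32 while res is computed in float64, so an exact elementwise comparison will generally come out False; a tolerance-based check is fairer (the tolerance below is an assumption):
print(np.allclose(C[:], res, rtol=1e-4))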
I don't know of an np.dot that works without loading the arrays into memory. I think blocking would work pretty well. Create an output array (called "c" below) as a pytables CArray and fill it in blocks. You should choose the chunkshape when you create it to match your blocking scheme. Something like
atom = tables.Float32Atom() # you have UInt8Atom() above. do you mean that?
shape = (a.shape[0], b.shape[1])
# you can vary block_size and chunkshape independently, but I would
# aim to have block_size an integer multiple of chunkshape
# your mileage may vary and depends on the array size and how you'll
# access it in the future.
Nchunk = 128 # ?
chunkshape = (Nchunk, Nchunk)
chunk_multiple = 1
block_size = chunk_multiple * Nchunk
c = h5f.create_carray(h5f.root, 'c', atom, shape, chunkshape=chunkshape)
for i_start in range(0, a.shape[0], block_size):
    for j_start in range(0, b.shape[1], block_size):
        for k_start in range(0, a.shape[1], block_size):
            c[i_start:i_start + block_size, j_start:j_start + block_size] += \
                np.dot(a[i_start:i_start + block_size, k_start:k_start + block_size],
                       b[k_start:k_start + block_size, j_start:j_start + block_size])