Is there a way to slice a 2d array in numpy into smaller 2d arrays?
Example
[[1,2,3,4], -> [[1,2] [3,4]
[5,6,7,8]] [5,6] [7,8]]
So I basically want to cut down a 2x4 array into 2 2x2 arrays. Looking for a generic solution to be used on images.
There was another question a couple of months ago which clued me in to the idea of using reshape and swapaxes. The h//nrows makes sense since this keeps the first block's rows together. It also makes sense that you'll need nrows and ncols to be part of the shape. -1 tells reshape to fill in whatever number is necessary to make the reshape valid. Armed with the form of the solution, I just tried things until I found the formula that works.
You should be able to break your array into "blocks" using some combination of reshape and swapaxes:
def blockshaped(arr, nrows, ncols):
"""
Return an array of shape (n, nrows, ncols) where
n * nrows * ncols = arr.size
If arr is a 2D array, the returned array should look like n subblocks with
each subblock preserving the "physical" layout of arr.
"""
h, w = arr.shape
assert h % nrows == 0, f"{h} rows is not evenly divisible by {nrows}"
assert w % ncols == 0, f"{w} cols is not evenly divisible by {ncols}"
return (arr.reshape(h//nrows, nrows, -1, ncols)
.swapaxes(1,2)
.reshape(-1, nrows, ncols))
turns c
np.random.seed(365)
c = np.arange(24).reshape((4, 6))
print(c)
[out]:
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]]
into
print(blockshaped(c, 2, 3))
[out]:
[[[ 0 1 2]
[ 6 7 8]]
[[ 3 4 5]
[ 9 10 11]]
[[12 13 14]
[18 19 20]]
[[15 16 17]
[21 22 23]]]
I've posted an inverse function, unblockshaped, here, and an N-dimensional generalization here. The generalization gives a little more insight into the reasoning behind this algorithm.
Note that there is also superbatfish's
blockwise_view. It arranges the
blocks in a different format (using more axes) but it has the advantage of (1)
always returning a view and (2) being capable of handling arrays of any
dimension.
It seems to me that this is a task for numpy.split or some variant.
e.g.
a = np.arange(30).reshape([5,6]) #a.shape = (5,6)
a1 = np.split(a,3,axis=1)
#'a1' is a list of 3 arrays of shape (5,2)
a2 = np.split(a, [2,4])
#'a2' is a list of three arrays of shape (2,5), (2,5), (1,5)
If you have a NxN image you can create, e.g., a list of 2 NxN/2 subimages, and then divide them along the other axis.
numpy.hsplit and numpy.vsplit are also available.
There are some other answers that seem well-suited for your specific case already, but your question piqued my interest in the possibility of a memory-efficient solution usable up to the maximum number of dimensions that numpy supports, and I ended up spending most of the afternoon coming up with possible method. (The method itself is relatively simple, it's just that I still haven't used most of the really fancy features that numpy supports so most of the time was spent researching to see what numpy had available and how much it could do so that I didn't have to do it.)
def blockgen(array, bpa):
"""Creates a generator that yields multidimensional blocks from the given
array(_like); bpa is an array_like consisting of the number of blocks per axis
(minimum of 1, must be a divisor of the corresponding axis size of array). As
the blocks are selected using normal numpy slicing, they will be views rather
than copies; this is good for very large multidimensional arrays that are being
blocked, and for very large blocks, but it also means that the result must be
copied if it is to be modified (unless modifying the original data as well is
intended)."""
bpa = np.asarray(bpa) # in case bpa wasn't already an ndarray
# parameter checking
if array.ndim != bpa.size: # bpa doesn't match array dimensionality
raise ValueError("Size of bpa must be equal to the array dimensionality.")
if (bpa.dtype != np.int # bpa must be all integers
or (bpa < 1).any() # all values in bpa must be >= 1
or (array.shape % bpa).any()): # % != 0 means not evenly divisible
raise ValueError("bpa ({0}) must consist of nonzero positive integers "
"that evenly divide the corresponding array axis "
"size".format(bpa))
# generate block edge indices
rgen = (np.r_[:array.shape[i]+1:array.shape[i]//blk_n]
for i, blk_n in enumerate(bpa))
# build slice sequences for each axis (unfortunately broadcasting
# can't be used to make the items easy to operate over
c = [[np.s_[i:j] for i, j in zip(r[:-1], r[1:])] for r in rgen]
# Now to get the blocks; this is slightly less efficient than it could be
# because numpy doesn't like jagged arrays and I didn't feel like writing
# a ufunc for it.
for idxs in np.ndindex(*bpa):
blockbounds = tuple(c[j][idxs[j]] for j in range(bpa.size))
yield array[blockbounds]
You question practically the same as this one. You can use the one-liner with np.ndindex() and reshape():
def cutter(a, r, c):
lenr = a.shape[0]/r
lenc = a.shape[1]/c
np.array([a[i*r:(i+1)*r,j*c:(j+1)*c] for (i,j) in np.ndindex(lenr,lenc)]).reshape(lenr,lenc,r,c)
To create the result you want:
a = np.arange(1,9).reshape(2,1)
#array([[1, 2, 3, 4],
# [5, 6, 7, 8]])
cutter( a, 1, 2 )
#array([[[[1, 2]],
# [[3, 4]]],
# [[[5, 6]],
# [[7, 8]]]])
Some minor enhancement to TheMeaningfulEngineer's answer that handles the case when the big 2d array cannot be perfectly sliced into equally sized subarrays
def blockfy(a, p, q):
'''
Divides array a into subarrays of size p-by-q
p: block row size
q: block column size
'''
m = a.shape[0] #image row size
n = a.shape[1] #image column size
# pad array with NaNs so it can be divided by p row-wise and by q column-wise
bpr = ((m-1)//p + 1) #blocks per row
bpc = ((n-1)//q + 1) #blocks per column
M = p * bpr
N = q * bpc
A = np.nan* np.ones([M,N])
A[:a.shape[0],:a.shape[1]] = a
block_list = []
previous_row = 0
for row_block in range(bpc):
previous_row = row_block * p
previous_column = 0
for column_block in range(bpr):
previous_column = column_block * q
block = A[previous_row:previous_row+p, previous_column:previous_column+q]
# remove nan columns and nan rows
nan_cols = np.all(np.isnan(block), axis=0)
block = block[:, ~nan_cols]
nan_rows = np.all(np.isnan(block), axis=1)
block = block[~nan_rows, :]
## append
if block.size:
block_list.append(block)
return block_list
Examples:
a = np.arange(25)
a = a.reshape((5,5))
out = blockfy(a, 2, 3)
a->
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
out[0] ->
array([[0., 1., 2.],
[5., 6., 7.]])
out[1]->
array([[3., 4.],
[8., 9.]])
out[-1]->
array([[23., 24.]])
For now it just works when the big 2d array can be perfectly sliced into equally sized subarrays.
The code bellow slices
a ->array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]])
into this
block_array->
array([[[ 0, 1, 2],
[ 6, 7, 8]],
[[ 3, 4, 5],
[ 9, 10, 11]],
[[12, 13, 14],
[18, 19, 20]],
[[15, 16, 17],
[21, 22, 23]]])
p ang q determine the block size
Code
a = arange(24)
a = a.reshape((4,6))
m = a.shape[0] #image row size
n = a.shape[1] #image column size
p = 2 #block row size
q = 3 #block column size
block_array = []
previous_row = 0
for row_block in range(blocks_per_row):
previous_row = row_block * p
previous_column = 0
for column_block in range(blocks_per_column):
previous_column = column_block * q
block = a[previous_row:previous_row+p,previous_column:previous_column+q]
block_array.append(block)
block_array = array(block_array)
If you want a solution that also handles the cases when the matrix is
not equally divided, you can use this:
from operator import add
half_split = np.array_split(input, 2)
res = map(lambda x: np.array_split(x, 2, axis=1), half_split)
res = reduce(add, res)
Here is a solution based on unutbu's answer that handle case where matrix cannot be equally divided. In this case, it will resize the matrix before using some interpolation. You need OpenCV for this. Note that I had to swap ncols and nrows to make it works, didn't figured why.
import numpy as np
import cv2
import math
def blockshaped(arr, r_nbrs, c_nbrs, interp=cv2.INTER_LINEAR):
"""
arr a 2D array, typically an image
r_nbrs numbers of rows
r_cols numbers of cols
"""
arr_h, arr_w = arr.shape
size_w = int( math.floor(arr_w // c_nbrs) * c_nbrs )
size_h = int( math.floor(arr_h // r_nbrs) * r_nbrs )
if size_w != arr_w or size_h != arr_h:
arr = cv2.resize(arr, (size_w, size_h), interpolation=interp)
nrows = int(size_w // r_nbrs)
ncols = int(size_h // c_nbrs)
return (arr.reshape(r_nbrs, ncols, -1, nrows)
.swapaxes(1,2)
.reshape(-1, ncols, nrows))
a = np.random.randint(1, 9, size=(9,9))
out = [np.hsplit(x, 3) for x in np.vsplit(a,3)]
print(a)
print(out)
yields
[[7 6 2 4 4 2 5 2 3]
[2 3 7 6 8 8 2 6 2]
[4 1 3 1 3 8 1 3 7]
[6 1 1 5 7 2 1 5 8]
[8 8 7 6 6 1 8 8 4]
[6 1 8 2 1 4 5 1 8]
[7 3 4 2 5 6 1 2 7]
[4 6 7 5 8 2 8 2 8]
[6 6 5 5 6 1 2 6 4]]
[[array([[7, 6, 2],
[2, 3, 7],
[4, 1, 3]]), array([[4, 4, 2],
[6, 8, 8],
[1, 3, 8]]), array([[5, 2, 3],
[2, 6, 2],
[1, 3, 7]])], [array([[6, 1, 1],
[8, 8, 7],
[6, 1, 8]]), array([[5, 7, 2],
[6, 6, 1],
[2, 1, 4]]), array([[1, 5, 8],
[8, 8, 4],
[5, 1, 8]])], [array([[7, 3, 4],
[4, 6, 7],
[6, 6, 5]]), array([[2, 5, 6],
[5, 8, 2],
[5, 6, 1]]), array([[1, 2, 7],
[8, 2, 8],
[2, 6, 4]])]]
I publish my solution. Notice that this code doesn't' actually create copies of original array, so it works well with big data. Moreover, it doesn't crash if array cannot be divided evenly (but you can easly add condition for that by deleting ceil and checking if v_slices and h_slices are divided without rest).
import numpy as np
from math import ceil
a = np.arange(9).reshape(3, 3)
p, q = 2, 2
width, height = a.shape
v_slices = ceil(width / p)
h_slices = ceil(height / q)
for h in range(h_slices):
for v in range(v_slices):
block = a[h * p : h * p + p, v * q : v * q + q]
# do something with a block
This code changes (or, more precisely, gives you direct access to part of an array) this:
[[0 1 2]
[3 4 5]
[6 7 8]]
Into this:
[[0 1]
[3 4]]
[[2]
[5]]
[[6 7]]
[[8]]
If you need actual copies, Aenaon code is what you are looking for.
If you are sure that big array can be divided evenly, you can use numpy splitting tools.
to add to #Aenaon answer and his blockfy function, if you are working with COLOR IMAGES/ 3D ARRAY here is my pipeline to create crops of 224 x 224 for 3 channel input
def blockfy(a, p, q):
'''
Divides array a into subarrays of size p-by-q
p: block row size
q: block column size
'''
m = a.shape[0] #image row size
n = a.shape[1] #image column size
# pad array with NaNs so it can be divided by p row-wise and by q column-wise
bpr = ((m-1)//p + 1) #blocks per row
bpc = ((n-1)//q + 1) #blocks per column
M = p * bpr
N = q * bpc
A = np.nan* np.ones([M,N])
A[:a.shape[0],:a.shape[1]] = a
block_list = []
previous_row = 0
for row_block in range(bpc):
previous_row = row_block * p
previous_column = 0
for column_block in range(bpr):
previous_column = column_block * q
block = A[previous_row:previous_row+p, previous_column:previous_column+q]
# remove nan columns and nan rows
nan_cols = np.all(np.isnan(block), axis=0)
block = block[:, ~nan_cols]
nan_rows = np.all(np.isnan(block), axis=1)
block = block[~nan_rows, :]
## append
if block.size:
block_list.append(block)
return block_list
then extended above to
for file in os.listdir(path_to_crop): ### list files in your folder
img = io.imread(path_to_crop + file, as_gray=False) ### open image
r = blockfy(img[:,:,0],224,224) ### crop blocks of 224 x 224 for red channel
g = blockfy(img[:,:,1],224,224) ### crop blocks of 224 x 224 for green channel
b = blockfy(img[:,:,2],224,224) ### crop blocks of 224 x 224 for blue channel
for x in range(0,len(r)):
img = np.array((r[x],g[x],b[x])) ### combine each channel into one patch by patch
img = img.astype(np.uint8) ### cast back to proper integers
img_swap = img.swapaxes(0, 2) ### need to swap axes due to the way things were proceesed
img_swap_2 = img_swap.swapaxes(0, 1) ### do it again
Image.fromarray(img_swap_2).save(path_save_crop+str(x)+"bounding" + file,
format = 'jpeg',
subsampling=0,
quality=100) ### save patch with new name etc
Related
Is there a way to slice a 2d array in numpy into smaller 2d arrays?
Example
[[1,2,3,4], -> [[1,2] [3,4]
[5,6,7,8]] [5,6] [7,8]]
So I basically want to cut down a 2x4 array into 2 2x2 arrays. Looking for a generic solution to be used on images.
There was another question a couple of months ago which clued me in to the idea of using reshape and swapaxes. The h//nrows makes sense since this keeps the first block's rows together. It also makes sense that you'll need nrows and ncols to be part of the shape. -1 tells reshape to fill in whatever number is necessary to make the reshape valid. Armed with the form of the solution, I just tried things until I found the formula that works.
You should be able to break your array into "blocks" using some combination of reshape and swapaxes:
def blockshaped(arr, nrows, ncols):
"""
Return an array of shape (n, nrows, ncols) where
n * nrows * ncols = arr.size
If arr is a 2D array, the returned array should look like n subblocks with
each subblock preserving the "physical" layout of arr.
"""
h, w = arr.shape
assert h % nrows == 0, f"{h} rows is not evenly divisible by {nrows}"
assert w % ncols == 0, f"{w} cols is not evenly divisible by {ncols}"
return (arr.reshape(h//nrows, nrows, -1, ncols)
.swapaxes(1,2)
.reshape(-1, nrows, ncols))
turns c
np.random.seed(365)
c = np.arange(24).reshape((4, 6))
print(c)
[out]:
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]]
into
print(blockshaped(c, 2, 3))
[out]:
[[[ 0 1 2]
[ 6 7 8]]
[[ 3 4 5]
[ 9 10 11]]
[[12 13 14]
[18 19 20]]
[[15 16 17]
[21 22 23]]]
I've posted an inverse function, unblockshaped, here, and an N-dimensional generalization here. The generalization gives a little more insight into the reasoning behind this algorithm.
Note that there is also superbatfish's
blockwise_view. It arranges the
blocks in a different format (using more axes) but it has the advantage of (1)
always returning a view and (2) being capable of handling arrays of any
dimension.
It seems to me that this is a task for numpy.split or some variant.
e.g.
a = np.arange(30).reshape([5,6]) #a.shape = (5,6)
a1 = np.split(a,3,axis=1)
#'a1' is a list of 3 arrays of shape (5,2)
a2 = np.split(a, [2,4])
#'a2' is a list of three arrays of shape (2,5), (2,5), (1,5)
If you have a NxN image you can create, e.g., a list of 2 NxN/2 subimages, and then divide them along the other axis.
numpy.hsplit and numpy.vsplit are also available.
There are some other answers that seem well-suited for your specific case already, but your question piqued my interest in the possibility of a memory-efficient solution usable up to the maximum number of dimensions that numpy supports, and I ended up spending most of the afternoon coming up with possible method. (The method itself is relatively simple, it's just that I still haven't used most of the really fancy features that numpy supports so most of the time was spent researching to see what numpy had available and how much it could do so that I didn't have to do it.)
def blockgen(array, bpa):
"""Creates a generator that yields multidimensional blocks from the given
array(_like); bpa is an array_like consisting of the number of blocks per axis
(minimum of 1, must be a divisor of the corresponding axis size of array). As
the blocks are selected using normal numpy slicing, they will be views rather
than copies; this is good for very large multidimensional arrays that are being
blocked, and for very large blocks, but it also means that the result must be
copied if it is to be modified (unless modifying the original data as well is
intended)."""
bpa = np.asarray(bpa) # in case bpa wasn't already an ndarray
# parameter checking
if array.ndim != bpa.size: # bpa doesn't match array dimensionality
raise ValueError("Size of bpa must be equal to the array dimensionality.")
if (bpa.dtype != np.int # bpa must be all integers
or (bpa < 1).any() # all values in bpa must be >= 1
or (array.shape % bpa).any()): # % != 0 means not evenly divisible
raise ValueError("bpa ({0}) must consist of nonzero positive integers "
"that evenly divide the corresponding array axis "
"size".format(bpa))
# generate block edge indices
rgen = (np.r_[:array.shape[i]+1:array.shape[i]//blk_n]
for i, blk_n in enumerate(bpa))
# build slice sequences for each axis (unfortunately broadcasting
# can't be used to make the items easy to operate over
c = [[np.s_[i:j] for i, j in zip(r[:-1], r[1:])] for r in rgen]
# Now to get the blocks; this is slightly less efficient than it could be
# because numpy doesn't like jagged arrays and I didn't feel like writing
# a ufunc for it.
for idxs in np.ndindex(*bpa):
blockbounds = tuple(c[j][idxs[j]] for j in range(bpa.size))
yield array[blockbounds]
You question practically the same as this one. You can use the one-liner with np.ndindex() and reshape():
def cutter(a, r, c):
lenr = a.shape[0]/r
lenc = a.shape[1]/c
np.array([a[i*r:(i+1)*r,j*c:(j+1)*c] for (i,j) in np.ndindex(lenr,lenc)]).reshape(lenr,lenc,r,c)
To create the result you want:
a = np.arange(1,9).reshape(2,1)
#array([[1, 2, 3, 4],
# [5, 6, 7, 8]])
cutter( a, 1, 2 )
#array([[[[1, 2]],
# [[3, 4]]],
# [[[5, 6]],
# [[7, 8]]]])
Some minor enhancement to TheMeaningfulEngineer's answer that handles the case when the big 2d array cannot be perfectly sliced into equally sized subarrays
def blockfy(a, p, q):
'''
Divides array a into subarrays of size p-by-q
p: block row size
q: block column size
'''
m = a.shape[0] #image row size
n = a.shape[1] #image column size
# pad array with NaNs so it can be divided by p row-wise and by q column-wise
bpr = ((m-1)//p + 1) #blocks per row
bpc = ((n-1)//q + 1) #blocks per column
M = p * bpr
N = q * bpc
A = np.nan* np.ones([M,N])
A[:a.shape[0],:a.shape[1]] = a
block_list = []
previous_row = 0
for row_block in range(bpc):
previous_row = row_block * p
previous_column = 0
for column_block in range(bpr):
previous_column = column_block * q
block = A[previous_row:previous_row+p, previous_column:previous_column+q]
# remove nan columns and nan rows
nan_cols = np.all(np.isnan(block), axis=0)
block = block[:, ~nan_cols]
nan_rows = np.all(np.isnan(block), axis=1)
block = block[~nan_rows, :]
## append
if block.size:
block_list.append(block)
return block_list
Examples:
a = np.arange(25)
a = a.reshape((5,5))
out = blockfy(a, 2, 3)
a->
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
out[0] ->
array([[0., 1., 2.],
[5., 6., 7.]])
out[1]->
array([[3., 4.],
[8., 9.]])
out[-1]->
array([[23., 24.]])
For now it just works when the big 2d array can be perfectly sliced into equally sized subarrays.
The code bellow slices
a ->array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]])
into this
block_array->
array([[[ 0, 1, 2],
[ 6, 7, 8]],
[[ 3, 4, 5],
[ 9, 10, 11]],
[[12, 13, 14],
[18, 19, 20]],
[[15, 16, 17],
[21, 22, 23]]])
p ang q determine the block size
Code
a = arange(24)
a = a.reshape((4,6))
m = a.shape[0] #image row size
n = a.shape[1] #image column size
p = 2 #block row size
q = 3 #block column size
block_array = []
previous_row = 0
for row_block in range(blocks_per_row):
previous_row = row_block * p
previous_column = 0
for column_block in range(blocks_per_column):
previous_column = column_block * q
block = a[previous_row:previous_row+p,previous_column:previous_column+q]
block_array.append(block)
block_array = array(block_array)
If you want a solution that also handles the cases when the matrix is
not equally divided, you can use this:
from operator import add
half_split = np.array_split(input, 2)
res = map(lambda x: np.array_split(x, 2, axis=1), half_split)
res = reduce(add, res)
Here is a solution based on unutbu's answer that handle case where matrix cannot be equally divided. In this case, it will resize the matrix before using some interpolation. You need OpenCV for this. Note that I had to swap ncols and nrows to make it works, didn't figured why.
import numpy as np
import cv2
import math
def blockshaped(arr, r_nbrs, c_nbrs, interp=cv2.INTER_LINEAR):
"""
arr a 2D array, typically an image
r_nbrs numbers of rows
r_cols numbers of cols
"""
arr_h, arr_w = arr.shape
size_w = int( math.floor(arr_w // c_nbrs) * c_nbrs )
size_h = int( math.floor(arr_h // r_nbrs) * r_nbrs )
if size_w != arr_w or size_h != arr_h:
arr = cv2.resize(arr, (size_w, size_h), interpolation=interp)
nrows = int(size_w // r_nbrs)
ncols = int(size_h // c_nbrs)
return (arr.reshape(r_nbrs, ncols, -1, nrows)
.swapaxes(1,2)
.reshape(-1, ncols, nrows))
a = np.random.randint(1, 9, size=(9,9))
out = [np.hsplit(x, 3) for x in np.vsplit(a,3)]
print(a)
print(out)
yields
[[7 6 2 4 4 2 5 2 3]
[2 3 7 6 8 8 2 6 2]
[4 1 3 1 3 8 1 3 7]
[6 1 1 5 7 2 1 5 8]
[8 8 7 6 6 1 8 8 4]
[6 1 8 2 1 4 5 1 8]
[7 3 4 2 5 6 1 2 7]
[4 6 7 5 8 2 8 2 8]
[6 6 5 5 6 1 2 6 4]]
[[array([[7, 6, 2],
[2, 3, 7],
[4, 1, 3]]), array([[4, 4, 2],
[6, 8, 8],
[1, 3, 8]]), array([[5, 2, 3],
[2, 6, 2],
[1, 3, 7]])], [array([[6, 1, 1],
[8, 8, 7],
[6, 1, 8]]), array([[5, 7, 2],
[6, 6, 1],
[2, 1, 4]]), array([[1, 5, 8],
[8, 8, 4],
[5, 1, 8]])], [array([[7, 3, 4],
[4, 6, 7],
[6, 6, 5]]), array([[2, 5, 6],
[5, 8, 2],
[5, 6, 1]]), array([[1, 2, 7],
[8, 2, 8],
[2, 6, 4]])]]
I publish my solution. Notice that this code doesn't' actually create copies of original array, so it works well with big data. Moreover, it doesn't crash if array cannot be divided evenly (but you can easly add condition for that by deleting ceil and checking if v_slices and h_slices are divided without rest).
import numpy as np
from math import ceil
a = np.arange(9).reshape(3, 3)
p, q = 2, 2
width, height = a.shape
v_slices = ceil(width / p)
h_slices = ceil(height / q)
for h in range(h_slices):
for v in range(v_slices):
block = a[h * p : h * p + p, v * q : v * q + q]
# do something with a block
This code changes (or, more precisely, gives you direct access to part of an array) this:
[[0 1 2]
[3 4 5]
[6 7 8]]
Into this:
[[0 1]
[3 4]]
[[2]
[5]]
[[6 7]]
[[8]]
If you need actual copies, Aenaon code is what you are looking for.
If you are sure that big array can be divided evenly, you can use numpy splitting tools.
to add to #Aenaon answer and his blockfy function, if you are working with COLOR IMAGES/ 3D ARRAY here is my pipeline to create crops of 224 x 224 for 3 channel input
def blockfy(a, p, q):
'''
Divides array a into subarrays of size p-by-q
p: block row size
q: block column size
'''
m = a.shape[0] #image row size
n = a.shape[1] #image column size
# pad array with NaNs so it can be divided by p row-wise and by q column-wise
bpr = ((m-1)//p + 1) #blocks per row
bpc = ((n-1)//q + 1) #blocks per column
M = p * bpr
N = q * bpc
A = np.nan* np.ones([M,N])
A[:a.shape[0],:a.shape[1]] = a
block_list = []
previous_row = 0
for row_block in range(bpc):
previous_row = row_block * p
previous_column = 0
for column_block in range(bpr):
previous_column = column_block * q
block = A[previous_row:previous_row+p, previous_column:previous_column+q]
# remove nan columns and nan rows
nan_cols = np.all(np.isnan(block), axis=0)
block = block[:, ~nan_cols]
nan_rows = np.all(np.isnan(block), axis=1)
block = block[~nan_rows, :]
## append
if block.size:
block_list.append(block)
return block_list
then extended above to
for file in os.listdir(path_to_crop): ### list files in your folder
img = io.imread(path_to_crop + file, as_gray=False) ### open image
r = blockfy(img[:,:,0],224,224) ### crop blocks of 224 x 224 for red channel
g = blockfy(img[:,:,1],224,224) ### crop blocks of 224 x 224 for green channel
b = blockfy(img[:,:,2],224,224) ### crop blocks of 224 x 224 for blue channel
for x in range(0,len(r)):
img = np.array((r[x],g[x],b[x])) ### combine each channel into one patch by patch
img = img.astype(np.uint8) ### cast back to proper integers
img_swap = img.swapaxes(0, 2) ### need to swap axes due to the way things were proceesed
img_swap_2 = img_swap.swapaxes(0, 1) ### do it again
Image.fromarray(img_swap_2).save(path_save_crop+str(x)+"bounding" + file,
format = 'jpeg',
subsampling=0,
quality=100) ### save patch with new name etc
I have the following array:
import numpy as np
a = np.array([[ 1, 2, 3],
[ 1, 2, 3],
[ 1, 2, 3]])
I understand that np.random.shuffle(a.T) will shuffle the array along the row, but what I need is for it to shuffe each row idependently. How can this be done in numpy? Speed is critical as there will be several million rows.
For this specific problem, each row will contain the same starting population.
import numpy as np
np.random.seed(2018)
def scramble(a, axis=-1):
"""
Return an array with the values of `a` independently shuffled along the
given axis
"""
b = a.swapaxes(axis, -1)
n = a.shape[axis]
idx = np.random.choice(n, n, replace=False)
b = b[..., idx]
return b.swapaxes(axis, -1)
a = a = np.arange(4*9).reshape(4, 9)
# array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
# [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
# [18, 19, 20, 21, 22, 23, 24, 25, 26],
# [27, 28, 29, 30, 31, 32, 33, 34, 35]])
print(scramble(a, axis=1))
yields
[[ 3 8 7 0 4 5 1 2 6]
[12 17 16 9 13 14 10 11 15]
[21 26 25 18 22 23 19 20 24]
[30 35 34 27 31 32 28 29 33]]
while scrambling along the 0-axis:
print(scramble(a, axis=0))
yields
[[18 19 20 21 22 23 24 25 26]
[ 0 1 2 3 4 5 6 7 8]
[27 28 29 30 31 32 33 34 35]
[ 9 10 11 12 13 14 15 16 17]]
This works by first swapping the target axis with the last axis:
b = a.swapaxes(axis, -1)
This is a common trick used to standardize code which deals with one axis.
It reduces the general case to the specific case of dealing with the last axis.
Since in NumPy version 1.10 or higher swapaxes returns a view, there is no copying involved and so calling swapaxes is very quick.
Now we can generate a new index order for the last axis:
n = a.shape[axis]
idx = np.random.choice(n, n, replace=False)
Now we can shuffle b (independently along the last axis):
b = b[..., idx]
and then reverse the swapaxes to return an a-shaped result:
return b.swapaxes(axis, -1)
If you don't want a return value and want to operate on the array directly, you can specify the indices to shuffle.
>>> import numpy as np
>>>
>>>
>>> a = np.array([[1,2,3], [1,2,3], [1,2,3]])
>>>
>>> # Shuffle row `2` independently
>>> np.random.shuffle(a[2])
>>> a
array([[1, 2, 3],
[1, 2, 3],
[3, 2, 1]])
>>>
>>> # Shuffle column `0` independently
>>> np.random.shuffle(a[:,0])
>>> a
array([[3, 2, 3],
[1, 2, 3],
[1, 2, 1]])
If you want a return value as well, you can use numpy.random.permutation, in which case replace np.random.shuffle(a[n]) with a[n] = np.random.permutation(a[n]).
Warning, do not do a[n] = np.random.shuffle(a[n]). shuffle does not return anything, so the row/column you end up "shuffling" will be filled with nan instead.
Good answer above. But I will throw in a quick and dirty way:
a = np.array([[1,2,3], [1,2,3], [1,2,3]])
ignore_list_outpput = [np.random.shuffle(x) for x in a]
Then, a can be something like this
array([[2, 1, 3],
[4, 6, 5],
[9, 7, 8]])
Not very elegant but you can get this job done with just one short line.
Building on my comment to #Hun's answer, here's the fastest way to do this:
def shuffle_along(X):
"""Minimal in place independent-row shuffler."""
[np.random.shuffle(x) for x in X]
This works in-place and can only shuffle rows. If you need more options:
def shuffle_along(X, axis=0, inline=False):
"""More elaborate version of the above."""
if not inline:
X = X.copy()
if axis == 0:
[np.random.shuffle(x) for x in X]
if axis == 1:
[np.random.shuffle(x) for x in X.T]
if not inline:
return X
This, however, has the limitation of only working on 2d-arrays. For higher dimensional tensors, I would use:
def shuffle_along(X, axis=0, inline=True):
"""Shuffle along any axis of a tensor."""
if not inline:
X = X.copy()
np.apply_along_axis(np.random.shuffle, axis, X) # <-- I just changed this
if not inline:
return X
You can do it with numpy without any loop or extra function, and much more faster. E. g., we have an array of size (2, 6) and we want a sub array (2,2) with independent random index for each column.
import numpy as np
test = np.array([[1, 1],
[2, 2],
[0.5, 0.5],
[0.3, 0.3],
[4, 4],
[7, 7]])
id_rnd = np.random.randint(6, size=(2, 2)) # select random numbers, use choice and range if don want replacement.
new = np.take_along_axis(test, id_rnd, axis=0)
Out:
array([[2. , 2. ],
[0.5, 2. ]])
It works for any number of dimensions.
As of NumPy 1.20.0 released in January 2021 we have a permuted() method on the new Generator type (introduced with the new random API in NumPy 1.17.0, released in July 2019). This does exactly what you need:
import numpy as np
rng = np.random.default_rng()
a = np.array([
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
])
shuffled = rng.permuted(a, axis=1)
This gives you something like
>>> print(shuffled)
[[2 3 1]
[1 3 2]
[2 1 3]]
As you can see, the rows are permuted independently. This is in sharp contrast with both rng.permutation() and rng.shuffle().
If you want an in-place update you can pass the original array as the out keyword argument. And you can use the axis keyword argument to choose the direction along which to shuffle your array.
I'm trying to convert my MATLAB code to python but I'm having some issues. This code is supposed to segment letters from a picture.
Here's the whole code in MATLAB:
he = imread('r.jpg');
imshow(he);
%C = makecform(type) creates the color transformation structure C that defines the color space conversion specified by type.
cform = makecform('srgb2lab');
%To perform the transformation, pass the color transformation structure as an argument to the applycform function.
lab_he = applycform(he,cform);
%convert to double precision
ab = double(lab_he(:,:,2:3));
%size of dimension in 2D array
nrows = size(ab,1);
ncols = size(ab,2);
%B = reshape(A,sz1,...,szN) reshapes A into a sz1-by-...-by-szN array where
%sz1,...,szN indicates the size of each dimension. You can specify a single
% dimension size of [] to have the dimension size automatically calculated,
% such that the number of elements in B matches the number of elements in A.
% For example, if A is a 10-by-10 matrix, then reshape(A,2,2,[]) reshapes
% the 100 elements of A into a 2-by-2-by-25 array.
ab = reshape(ab,nrows*ncols,2);
% repeat the clustering 3 times to avoid local minima
nColors = 3;
[cluster_idx, cluster_center] = kmeans(ab,nColors,'distance','sqEuclidean', ...
'Replicates',3);
pixel_labels = reshape(cluster_idx,nrows,ncols);
imshow(pixel_labels,[]), title('image labeled by cluster index');
segmented_images = cell(1,3);
rgb_label = repmat(pixel_labels,[1 1 3]);
for k = 1:nColors
color = he;
color(rgb_label ~= k) = 0;
segmented_images{k} = color;
end
figure,imshow(segmented_images{1}), title('objects in cluster 1');
figure,imshow(segmented_images{2}), title('objects in cluster 2');
figure,imshow(segmented_images{3}), title('objects in cluster 3');
mean_cluster_value = mean(cluster_center,2);
[tmp, idx] = sort(mean_cluster_value);
blue_cluster_num = idx(1);
L = lab_he(:,:,1);
blue_idx = find(pixel_labels == blue_cluster_num);
L_blue = L(blue_idx);
is_light_blue = im2bw(L_blue,graythresh(L_blue));
% target_labels = repmat(uint8(0),[nrows ncols]);
% target_labels(blue_idx(is_light_blue==false)) = 1;
% target_labels = repmat(target_labels,[1 1 3]);
% blue_target = he;
% blue_target(target_labels ~= 1) = 0;
% figure,imshow(blue_target), title('blue');
Here's what I have in Python so far:
import cv2
import numpy as np
from matplotlib import pyplot as plt
import sys
img = cv2.imread('r.jpg',1)
print "original image: ", img.shape
cv2.imshow('BGR', img)
img1 = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img2 = cv2.cvtColor(img1, cv2.COLOR_RGB2LAB)
cv2.imshow('RGB', img1)
cv2.imshow('LAB', img2) #differs from the LAB color space in MATLAB (need to patch maybe?)
print "LAB converted image: ", img2.shape
print "LAB converted image dimension", img2.ndim #says the image is a 3 dimensional array
img2a = img2.shape[2][1:2]
print "Slicing the LAB converted image", img2a
#we need to convert that to double precision
print img2.dtype
img2a = img2.astype(np.uint64) #convert to double precision
print img2a.dtype
#print img2a
row = img2a.shape[0] #returns number of rows of img2a
column = img2a.shape[1] #returns number of columns of img2a
print "row: ", row #matches the MATLAB version
print "column: ", column #matchees the MATLAB version
rowcol = row * column
k = cv2.waitKey(0)
if k == 27: # wait for ESC key to exit
cv2.destroyAllWindows()
elif k == ord('s'): # wait for 's' key to save and exit
cv2.imwrite('final image',final_image)
cv2.destroyAllWindows()
Now the part i'm currently stuck in is that here in Matlab code, I have lab_he(:,:,2:3) which means all the rows and all the columns from depth 2 and 3. However when I try to replicate that in Python img2a = img2.shape[2][1:2] but it doesn't work or makes sense. Please help.
In Octave/MATLAB
octave:29> x=reshape(1:(2*3*4),3,2,4);
octave:30> x(:,:,2:3)
ans =
ans(:,:,1) =
7 10
8 11
9 12
ans(:,:,2) =
13 16
14 17
15 18
octave:31> size(x(:,:,2:3))
ans =
3 2 2
octave:33> x(:,:,2:3)(2,2,:)
ans(:,:,1) = 11
ans(:,:,2) = 17
In numpy:
In [13]: x=np.arange(1,1+2*3*4).reshape(3,2,4,order='F')
In [14]: x[:,:,1:3]
Out[14]:
array([[[ 7, 13],
[10, 16]],
[[ 8, 14],
[11, 17]],
[[ 9, 15],
[12, 18]]])
In [15]: _.shape
Out[15]: (3, 2, 2)
In [17]: x[:,:,1:3][1,1,:]
Out[17]: array([11, 17])
Or with numpy normal 'C' ordering, and indexing on the 1st dimension
In [18]: y=np.arange(1,1+2*3*4).reshape(4,2,3)
In [19]: y[1:3,:,:]
Out[19]:
array([[[ 7, 8, 9],
[10, 11, 12]],
[[13, 14, 15],
[16, 17, 18]]])
In [20]: y[1:3,:,:][:,1,1]
Out[20]: array([11, 17])
Same indexing ideas, though matching numbers and shapes requires some care, not only with the 0 v 1 index base. A 3d array is displayed in a different arangement. Octave divides it into blocks on the last index (its primary iterator), numpy iterates on the first index.
In 3d it makes more sense to talk about first, 2nd, 3rd dimensions rather than row,col,depth. In 4d you run out of names. :)
I had to reshape array at specific depth, and I programmed a little recursive function to do so:
def recursive_array_cutting(tab, depth, i, min, max):
if(i==depth):
tab = tab[min:max]
return tab
temp_list = []
nb_subtab = len(tab)
for index in range(nb_subtab):
temp_list.append(recursive_array_cutting(tab[index], depth, i+1, min, max))
return np.asanyarray(temp_list)
It allow to get all array/values between the min and the max of a specific depth, for instance, if you have a (3, 4) tab = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]] and only want the last two values of the deepest array, you call like this : tab = recursive_array_cutting(tab, 1, 0, 0, 2) to get the output : [[0 1][0 1][0 1]].
If you have a more complexe array like this tab = [[[0, 1, 2, 3], [1, 1, 2, 3], [2, 1, 2, 3]], [[0, 1, 2, 3], [1, 1, 2, 3], [2, 1, 2, 3]], [[0, 1, 2, 3], [1, 1, 2, 3], [2, 1, 2, 3]]] (3, 3, 4) and want a (3, 2, 4) array, you can call like this : tab = recursive_array_cutting(tab, 1, 0, 0, 2) to get this output, and get rid of the last dimension in depth 1.
Function like this surely exist in numpy, but I did not found it.
I have numpy ndarrays which could be 3 or 4 dimensional. I'd like to find maximum values and their indices in a moving subarray window with specified strides.
For example, suppose I have a 4x4 2d array and my moving subarray window is 2x2 with stride 2 for simplicity:
[[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9,10,11,12],
[13,14,15,16]].
I'd like to find
[[ 6 8],
[14 16]]
for max values and
[(1,1), (3,1),
(3,1), (3,3)]
for indices as output.
Is there a concise, efficient implementation for this for ndarray without using loops?
Here's a solution using stride_tricks:
def make_panes(arr, window):
arr = np.asarray(arr)
r,c = arr.shape
s_r, s_c = arr.strides
w_r, w_c = window
if c % w_c != 0 or r % w_r != 0:
raise ValueError("Window doesn't fit array.")
shape = (r / w_r, c / w_c, w_r, w_c)
strides = (w_r*s_r, w_c*s_c, s_r, s_c)
return np.lib.stride_tricks.as_strided(arr, shape, strides)
def max_in_panes(arr, window):
w_r, w_c = window
r, c = arr.shape
panes = make_panes(arr, window)
v = panes.reshape((-1, w_r * w_c))
ix = np.argmax(v, axis=1)
max_vals = v[np.arange(r/w_r * c/w_c), ix]
i = np.repeat(np.arange(0,r,w_r), c/w_c)
j = np.tile(np.arange(0, c, w_c), r/w_r)
rel_i, rel_j = np.unravel_index(ix, window)
max_ix = i + rel_i, j + rel_j
return max_vals, max_ix
A demo:
>>> vals, ix = max_in_panes(x, (2,2))
>>> print vals
[[ 6 8]
[14 16]]
>>> print ix
(array([1, 1, 3, 3]), array([1, 3, 1, 3]))
Note that this is pretty untested, and is designed to work with 2d arrays. I'll leave the generalization to n-d arrays to the reader...
The question is the inverse of this question. I'm looking for a generic method to from the original big array from small arrays:
array([[[ 0, 1, 2],
[ 6, 7, 8]],
[[ 3, 4, 5],
[ 9, 10, 11]],
[[12, 13, 14],
[18, 19, 20]],
[[15, 16, 17],
[21, 22, 23]]])
->
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]])
I am currently developing a solution, will post it when it's done, would however like to see other (better) ways.
import numpy as np
def blockshaped(arr, nrows, ncols):
"""
Return an array of shape (n, nrows, ncols) where
n * nrows * ncols = arr.size
If arr is a 2D array, the returned array looks like n subblocks with
each subblock preserving the "physical" layout of arr.
"""
h, w = arr.shape
return (arr.reshape(h//nrows, nrows, -1, ncols)
.swapaxes(1,2)
.reshape(-1, nrows, ncols))
def unblockshaped(arr, h, w):
"""
Return an array of shape (h, w) where
h * w = arr.size
If arr is of shape (n, nrows, ncols), n sublocks of shape (nrows, ncols),
then the returned array preserves the "physical" layout of the sublocks.
"""
n, nrows, ncols = arr.shape
return (arr.reshape(h//nrows, -1, nrows, ncols)
.swapaxes(1,2)
.reshape(h, w))
For example,
c = np.arange(24).reshape((4,6))
print(c)
# [[ 0 1 2 3 4 5]
# [ 6 7 8 9 10 11]
# [12 13 14 15 16 17]
# [18 19 20 21 22 23]]
print(blockshaped(c, 2, 3))
# [[[ 0 1 2]
# [ 6 7 8]]
# [[ 3 4 5]
# [ 9 10 11]]
# [[12 13 14]
# [18 19 20]]
# [[15 16 17]
# [21 22 23]]]
print(unblockshaped(blockshaped(c, 2, 3), 4, 6))
# [[ 0 1 2 3 4 5]
# [ 6 7 8 9 10 11]
# [12 13 14 15 16 17]
# [18 19 20 21 22 23]]
Note that there is also superbatfish's
blockwise_view. It arranges the
blocks in a different format (using more axes) but it has the advantage of (1)
always returning a view and (2) being capable of handing arrays of any
dimension.
Yet another (simple) approach:
threedarray = ...
twodarray = np.array(map(lambda x: x.flatten(), threedarray))
print(twodarray.shape)
I hope I get you right, let's say we have a,b :
>>> a = np.array([[1,2] ,[3,4]])
>>> b = np.array([[5,6] ,[7,8]])
>>> a
array([[1, 2],
[3, 4]])
>>> b
array([[5, 6],
[7, 8]])
in order to make it one big 2d array use numpy.concatenate:
>>> c = np.concatenate((a,b), axis=1 )
>>> c
array([[1, 2, 5, 6],
[3, 4, 7, 8]])
It works for the images I tested for now. Will if further tests are made. It is however a solution which takes no account about speed and memory usage.
def unblockshaped(blocks, h, w):
n, nrows, ncols = blocks.shape
bpc = w/ncols
bpr = h/nrows
reconstructed = zeros((h,w))
t = 0
for i in arange(bpr):
for j in arange(bpc):
reconstructed[i*nrows:i*nrows+nrows,j*ncols:j*ncols+ncols] = blocks[t]
t = t+1
return reconstructed
Here is a solution that one can use if someone is wishing to create tiles of a matrix:
from itertools import product
import numpy as np
def tiles(arr, nrows, ncols):
"""
If arr is a 2D array, the returned list contains nrowsXncols numpy arrays
with each array preserving the "physical" layout of arr.
When the array shape (rows, cols) are not divisible by (nrows, ncols) then
some of the array dimensions can change according to numpy.array_split.
"""
rows, cols = arr.shape
col_arr = np.array_split(range(cols), ncols)
row_arr = np.array_split(range(rows), nrows)
return [arr[r[0]: r[-1]+1, c[0]: c[-1]+1]
for r, c in product(row_arr, col_arr)]