How to randomly shuffle "tiles" in a numpy array - python

I have an nxn numpy array, and I would like to divide it evenly into nxn tiles and randomly shuffle these, while retaining the pattern inside the tiles.
For example, if I have an array that's size (200,200), I want to be able to divide this into say 16 arrays of size (50,50), or even 64 arrays of size (25,25), and randomly shuffle these, while retaining the same shape of the original array (200,200) and retaining the order of numbers inside of the smaller arrays.
I have looked up specific numpy functions, and I found the numpy.random.shuffle(x) function, but this will randomly shuffle the individual elements of an array. I would only like to shuffle these smaller arrays within the larger array.
Is there any numpy function or quick way that will do this? I'm not sure where to begin.
EDIT: To further clarify exactly what I want:
Let's say I have an input 2D array of shape (10,10) of values:
0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99
I choose a tile size that fits evenly into this array. Since this array has shape (10,10), I can split it into either 4 (5,5) tiles or 25 (2,2) tiles. If I choose 4 (5,5) tiles, I want to randomly shuffle these tiles, producing an output array that could look like this:
50 51 52 53 54 0 1 2 3 4
60 61 62 63 64 10 11 12 13 14
70 71 72 73 74 20 21 22 23 24
80 81 82 83 84 30 31 32 33 34
90 91 92 93 94 40 41 42 43 44
55 56 57 58 59 5 6 7 8 9
65 66 67 68 69 15 16 17 18 19
75 76 77 78 79 25 26 27 28 29
85 86 87 88 89 35 36 37 38 39
95 96 97 98 99 45 46 47 48 49
All arrays (the input array, the output array, and the separate tiles) are square, so that after the random shuffle the size and dimension of the main array stay the same (10,10).

Here is my solution using a loop:
import numpy as np
arr = np.arange(36).reshape(6,6)
def suffle_section(arr, n_sections):
    assert arr.shape[0]==arr.shape[1], "arr must be square"
    assert arr.shape[0]%n_sections == 0, "arr size must be divisible into equal n_sections"
    size = arr.shape[0]//n_sections
    new_arr = np.empty_like(arr)
    ## randomly permute the flat indices of all sections
    rand_indxes = np.random.permutation(n_sections*n_sections)
    for i in range(n_sections):
        for j in range(n_sections):
            ## convert the flat section index back to (row, column)
            rand_i = rand_indxes[i*n_sections + j]//n_sections
            rand_j = rand_indxes[i*n_sections + j]%n_sections
            new_arr[i*size:(i+1)*size, j*size:(j+1)*size] = \
                arr[rand_i*size:(rand_i+1)*size, rand_j*size:(rand_j+1)*size]
    return new_arr
result = suffle_section(arr, 3)
display(arr)
display(result)
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
array([[ 4, 5, 16, 17, 24, 25],
[10, 11, 22, 23, 30, 31],
[14, 15, 2, 3, 0, 1],
[20, 21, 8, 9, 6, 7],
[26, 27, 12, 13, 28, 29],
[32, 33, 18, 19, 34, 35]])
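To check that an output like the one above really is a rearrangement of whole tiles, a small helper can compare the tiles of both arrays. This is only a sketch: `is_tile_shuffle` is a hypothetical name, and it assumes square arrays whose side is a multiple of the tile size.

```python
import numpy as np

def is_tile_shuffle(original, shuffled, size):
    """Return True if `shuffled` holds exactly the size x size tiles of `original`."""
    if original.shape != shuffled.shape:
        return False

    def tiles(a):
        # assumption: a is square and its side is a multiple of `size`
        n = a.shape[0] // size
        blocks = a.reshape(n, size, n, size).swapaxes(1, 2).reshape(-1, size * size)
        # sort the flattened tiles so that tile order does not matter
        return sorted(map(tuple, blocks.tolist()))

    return tiles(original) == tiles(shuffled)

arr = np.arange(36).reshape(6, 6)

# swap the top-left and bottom-right 2x2 tiles by hand
swapped = arr.copy()
swapped[0:2, 0:2] = arr[4:6, 4:6]
swapped[4:6, 4:6] = arr[0:2, 0:2]

print(is_tile_shuffle(arr, swapped, 2))  # tiles moved as whole units
print(is_tile_shuffle(arr, arr.T, 2))    # transposing changes the tile contents
```

The same check can be applied to the result of any of the shuffling functions on this page.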

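If you have access to skimage (it comes with Spyder) you could use view_as_blocks:
import numpy as np
from skimage.util import view_as_blocks
def shuffle_tiles(arr, m, n):
    # stack of tiles with shape (num_tiles, m, n); the reshape generally makes a copy
    a_ = view_as_blocks(arr,(m,n)).reshape(-1,m,n)
    # shuffle works along 1st dimension and in-place
    np.random.shuffle(a_)
    # note: this returns the shuffled stack of tiles, not an array of arr.shape
    return a_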
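The function above returns a stack of tiles rather than an array with the original shape. A minimal NumPy-only sketch of splitting into tiles and reassembling them follows; `split_into_tiles` and `assemble_tiles` are hypothetical helper names (not skimage functions), and the array dimensions are assumed to divide evenly by the tile size.

```python
import numpy as np

def split_into_tiles(arr, m, n):
    """Return a new array of shape (num_tiles, m, n) holding the tiles in row-major order."""
    rows, cols = arr.shape
    return arr.reshape(rows // m, m, cols // n, n).swapaxes(1, 2).reshape(-1, m, n)

def assemble_tiles(tiles, shape):
    """Inverse of split_into_tiles: lay the tiles back out into a 2D array of `shape`."""
    rows, cols = shape
    _, m, n = tiles.shape
    return tiles.reshape(rows // m, cols // n, m, n).swapaxes(1, 2).reshape(shape)

arr = np.arange(36).reshape(6, 6)
tiles = split_into_tiles(arr, 3, 3)

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
rng.shuffle(tiles)              # shuffles along the first axis, in place

result = assemble_tiles(tiles, arr.shape)
print(result)
```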

We will use np.random.shuffle along with axes permutations to achieve the desired results. There are two interpretations of the question, hence two solutions.
Shuffle randomly within each block
Elements in each block are randomized, and that same randomized order is maintained in all blocks.
def randomize_tiles_shuffle_within(a, M, N):
    # M,N are the height and width of the blocks
    m,n = a.shape
    b = a.reshape(m//M,M,n//N,N).swapaxes(1,2).reshape(-1,M*N)
    np.random.shuffle(b.T)
    return b.reshape(m//M,n//N,M,N).swapaxes(1,2).reshape(a.shape)
Shuffle randomly blocks w.r.t each other
Blocks are randomized w.r.t each other, while keeping the order within each block the same as in the original array.
def randomize_tiles_shuffle_blocks(a, M, N):
    # M,N are the height and width of the blocks
    m,n = a.shape
    b = a.reshape(m//M,M,n//N,N).swapaxes(1,2).reshape(-1,M*N)
    np.random.shuffle(b)
    return b.reshape(m//M,n//N,M,N).swapaxes(1,2).reshape(a.shape)
Sample runs -
In [47]: a
Out[47]:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
In [48]: randomize_tiles_shuffle_within(a, 3, 3)
Out[48]:
array([[ 1, 7, 13, 4, 10, 16],
[14, 8, 12, 17, 11, 15],
[ 0, 6, 2, 3, 9, 5],
[19, 25, 31, 22, 28, 34],
[32, 26, 30, 35, 29, 33],
[18, 24, 20, 21, 27, 23]])
In [49]: randomize_tiles_shuffle_blocks(a, 3, 3)
Out[49]:
array([[ 3, 4, 5, 18, 19, 20],
[ 9, 10, 11, 24, 25, 26],
[15, 16, 17, 30, 31, 32],
[ 0, 1, 2, 21, 22, 23],
[ 6, 7, 8, 27, 28, 29],
[12, 13, 14, 33, 34, 35]])
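The reshape/swapaxes chain used in both functions can be hard to follow, so here is a small walk-through of the intermediate shapes. It is a sketch on an assumed 4x6 input with 2x3 blocks; the names `step1`, `step2` and `restored` are only for illustration.

```python
import numpy as np

a = np.arange(24).reshape(4, 6)
M, N = 2, 3  # block height and width

# axes: (block_row, row_in_block, block_col, col_in_block)
step1 = a.reshape(4 // M, M, 6 // N, N)
# axes: (block_row, block_col, row_in_block, col_in_block)
step2 = step1.swapaxes(1, 2)
# one flattened block per row, blocks listed in row-major order
b = step2.reshape(-1, M * N)

print(step1.shape, step2.shape, b.shape)
print(b)

# Undoing the same steps in reverse restores the original layout.
restored = b.reshape(4 // M, 6 // N, M, N).swapaxes(1, 2).reshape(a.shape)
print((restored == a).all())
```

Shuffling the rows of `b` therefore permutes whole blocks, while shuffling its columns permutes positions inside every block in the same way.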

Here is an approach that tries hard to avoid unnecessary copies:
import numpy as np
def f_pp(a,bs):
    i,j = a.shape
    k,l = bs
    esh = i//k,k,j//l,l
    bc = esh[::2]
    sh1,sh2 = np.unravel_index(np.random.permutation(bc[0]*bc[1]),bc)
    ns1,ns2 = np.unravel_index(np.arange(bc[0]*bc[1]),bc)
    out = np.empty_like(a)
    out.reshape(esh)[ns1,:,ns2] = a.reshape(esh)[sh1,:,sh2]
    return out
Timings:
pp 0.41529153706505895
dv 1.3133141631260514
br 1.6034217830747366
Test script (continued)
# Divakar
def f_dv(a,bs):
    M,N = bs
    m,n = a.shape
    b = a.reshape(m//M,M,n//N,N).swapaxes(1,2).reshape(-1,M*N)
    np.random.shuffle(b)
    return b.reshape(m//M,n//N,M,N).swapaxes(1,2).reshape(a.shape)
from skimage.util import view_as_blocks
# Brenlla shape fixed by pp
def f_br(arr,bs):
    m,n = bs
    a_= view_as_blocks(arr,(m,n))
    sh = a_.shape
    a_ = a_.reshape(-1,m,n)
    # shuffle works along 1st dimension and in-place
    np.random.shuffle(a_)
    return a_.reshape(sh).swapaxes(1,2).reshape(arr.shape)
ex = np.arange(100000).reshape(1000,100)
bs = 10,10
tst = np.tile(np.arange(np.prod(bs)).reshape(bs),np.floor_divide(ex.shape,bs))
from timeit import timeit
for n,f in list(globals().items()):
    if n.startswith('f_'):
        assert (tst==f(tst,bs)).all()
        print(n[2:],timeit(lambda:f(ex,bs),number=1000))

Here's code to shuffle row order but keep row items exactly as is:
import numpy as np
np.random.seed(0)
#creates a 6x6 array
a = np.random.randint(0,100,(6,6))
a
array([[44, 47, 64, 67, 67, 9],
[83, 21, 36, 87, 70, 88],
[88, 12, 58, 65, 39, 87],
[46, 88, 81, 37, 25, 77],
[72, 9, 20, 80, 69, 79],
[47, 64, 82, 99, 88, 49]])
#creates a number for each row index, 0,1,2,3,4,5
order = np.arange(6)
#shuffle index array
np.random.shuffle(order)
#make new array in shuffled order
shuffled = np.array([a[y] for y in order])
shuffled
array([[46, 88, 81, 37, 25, 77],
[88, 12, 58, 65, 39, 87],
[83, 21, 36, 87, 70, 88],
[47, 64, 82, 99, 88, 49],
[44, 47, 64, 67, 67, 9],
[72, 9, 20, 80, 69, 79]])
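The list comprehension above can also be written with integer-array indexing. A brief sketch, assuming NumPy's newer `Generator` API (`np.random.default_rng`) instead of the legacy `np.random.seed`; the values will differ from the output shown above.

```python
import numpy as np

rng = np.random.default_rng(0)

# a 6x6 array of random integers in [0, 100)
a = rng.integers(0, 100, size=(6, 6))

# a random ordering of the row indices 0..5
order = rng.permutation(a.shape[0])

# indexing with an integer array picks whole rows in that order
shuffled = a[order]
print(shuffled)
```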

Related

Numpy - select multiple squares from 2d array

I have a 2d array, and a list of <start_y, height, start_x, width>.
What I need is to select squares according to the list,
So, for example if this is my 2d array:
[[ 0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 23 24 25 26 27 28 29]
[30 31 32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47 48 49]]
and the list is:
[[1,3,2,5],
[2,2,0,3]]
I need the output to be:
[[12,13,14,15,16,
22,23,24,25,26,
32,33,34,35,36],
[20,21,22,
30,31,32]]
i.e. - the first square starts from index 1 in the y axis, with height of 3, and in index 2 in the x axis with width of 5 - and the same logic for the second element in the list.
I obviously tried things like arr[l[:,0]:l[:,0]+l[:,1],l[:,2]:l[:,2]+l[:,3]] where arr is the array and l is the list, but it all returned an invalid syntax error.
I guess the solution involves advanced broadcasting, but I couldn't figure it out on my own.
Any help will be appreciated!
EDIT:
I'm looking for a solution without for loop (it is currently implemented with a loop, and I'm looking to make my code more efficient).
Here's a difficulty: your squares are of different sizes. Most broadcasting or useful functions will result in one array. If your squares were the same size, we could probably figure out how to get a stacked version of them into a 3d array. But if they're different sizes, how would we stack them? Nothing wrong with a for-loop here.
Read this: https://numpy.org/doc/stable/reference/arrays.indexing.html
No "advanced broadcasting" needed.
import numpy as np
arr = np.array([
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19,],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29,],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39,],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49,],
])
coords = np.array([
[1,3,2,5],
[2,2,0,3],
])
for coord in coords:
    y, h, x, w = coord
    sq = arr[y:y+h, x:x+w]
    print(sq)
Might be easier in your code to make a function:
def get_square(arr, coord):
    y, h, x, w = coord
    return arr[y:y+h, x:x+w]
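As the comment on the question points out, a loop-free version is only natural when all squares share one size, because the result can then be a single 3D array. A hedged sketch of that special case using broadcast index arrays; the `starts`, `h` and `w` values are made up for illustration and are not the sizes from the question.

```python
import numpy as np

arr = np.arange(50).reshape(5, 10)

# assumption: every square has the same height and width
starts = np.array([[1, 2],   # (start_y, start_x) of the first square
                   [2, 0]])  # (start_y, start_x) of the second square
h, w = 2, 3

# row indices with shape (k, h, 1) and column indices with shape (k, 1, w)
rows = starts[:, 0, None, None] + np.arange(h)[None, :, None]
cols = starts[:, 1, None, None] + np.arange(w)[None, None, :]

# broadcasting the two index arrays yields shape (k, h, w)
stacked = arr[rows, cols]
print(stacked.shape)
print(stacked)
```

For squares of different sizes, as in the question, the loop or the `get_square` helper remains the straightforward choice.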

Creating a list with 3 values every 3 values

I'm having trouble writing this piece of code.
I need to create a list that keeps 3 consecutive values, skips the next 3, and so on.
The expected output should be something like:
output1 = [1,2,3,7,8,9,13,14,15,....67,68,69]
output2 = [4,5,6,10,11,12...70,71,72]
Any ideas on how I can achieve that?
Use two loops -- one for each group of three, and one for each item within that group. For example:
>>> [i*6 + j for i in range(12) for j in range(1, 4)]
[1, 2, 3, 7, 8, 9, 13, 14, 15, 19, 20, 21, 25, 26, 27, 31, 32, 33, 37, 38, 39, 43, 44, 45, 49, 50, 51, 55, 56, 57, 61, 62, 63, 67, 68, 69]
>>> [i*6 + j for i in range(12) for j in range(4, 7)]
[4, 5, 6, 10, 11, 12, 16, 17, 18, 22, 23, 24, 28, 29, 30, 34, 35, 36, 40, 41, 42, 46, 47, 48, 52, 53, 54, 58, 59, 60, 64, 65, 66, 70, 71, 72]
Suppose you want n consecutive values, then skip n values, for a given total number of sets, beginning at start. Just change start and the number of sets you need. In the example below the list starts with 1, so the first set is [1,2,3], and we need 12 sets, each containing 3 consecutive elements.
Method 1
n = 3
start = 1
total = 12
# 2*n*i + start is first element of every set of n tuples (Arithmetic progression)
print([j for i in range(total) for j in range(2*n*i + start, 2*n*i + start+n)])
# Or
print(sum([list(range(2*n*i + start, 2*n*i + start+n)) for i in range(total)], []))
Method 2 (NumPy performs the operations in C, so it is fast)
import numpy as np
n = 3
start = 1
total = 12
# One liner
print(
(np.arange(start, start + n, step=1)[:, np.newaxis] + np.arange(0, total, 1) * 2*n).transpose().reshape(-1)
)
##############EXPLANATION OF ABOVE ONE-LINER########################
# np.arange start, start+1, ... start + n - 1
first_set = np.arange(start, start + n, step=1)
# [1 2 3]
# np.arange 0, 2*n, 4*n, 6*n, ....
multiple_to_add = np.arange(0, total, 1) * 2*n
print(multiple_to_add)
# broadcast first set using np.newaxis and repeatedly add it to each element in multiple_to_add
each_set_as_col = first_set[:, np.newaxis] + multiple_to_add
# [[ 1 7 13 19 25 31 37 43 49 55 61 67]
# [ 2 8 14 20 26 32 38 44 50 56 62 68]
# [ 3 9 15 21 27 33 39 45 51 57 63 69]]
# invert rows and columns
each_set_as_row = each_set_as_col.transpose()
# [[ 1 2 3]
# [ 7 8 9]
# [13 14 15]
# [19 20 21]
# [25 26 27]
# [31 32 33]
# [37 38 39]
# [43 44 45]
# [49 50 51]
# [55 56 57]
# [61 62 63]
# [67 68 69]]
merge_all_set_in_single_row = each_set_as_row.reshape(-1)
# array([ 1, 2, 3, 7, 8, 9, 13, 14, 15, 19, 20, 21, 25, 26, 27, 31, 32,
# 33, 37, 38, 39, 43, 44, 45, 49, 50, 51, 55, 56, 57, 61, 62, 63, 67,
# 68, 69])
To make the logic understandable (the Pythonic methods can sometimes look like 'magic'), here's a naive algorithm that does it:
output1 = []
output2 = []
for i in range(1, 100):  # change as you like
    if (i-1) % 6 < 3:
        output1.append(i)
    else:
        output2.append(i)
What's going on here:
Initializing two empty lists.
Iterate through integers in a range.
How to tell if i should go to output1 or output2:
I can see that 3 consecutive numbers go to output1, then 3 consecutive to output2.
This tells me I can use the modulo % operator, (doing % 6)
The rest is simple logic to get the exact result wanted.
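The same modulo idea can be expressed as a reshape: group the numbers into pairs of 3-element chunks and take the first or second chunk of each pair. This is only a sketch, assuming the range 1..72 implied by the expected output in the question.

```python
import numpy as np

n = 3                       # size of each run of consecutive values
values = np.arange(1, 73)   # 1..72, assumed from the expected output

# shape (12, 2, 3): 12 groups, each holding two chunks of 3 values
chunks = values.reshape(-1, 2, n)

output1 = chunks[:, 0].ravel().tolist()  # first chunk of every group
output2 = chunks[:, 1].ravel().tolist()  # second chunk of every group

print(output1[:9], output1[-3:])
print(output2[:6], output2[-3:])
```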

Regarding ndarray creation in julia: Stacking in extra dimension

I would like to convert the following python code into julia:
import numpy as np
x = np.random.random([4,5,6])
y = np.array([[x, x, x ],
[2*x,3*x,4*x]])
print(y.shape)
-> (2, 3, 4, 5, 6)
In julia, the analogous syntax seems to me is
x = rand(4,5,6)
y = [x x x; 2x 3x 4x]
println(size(y))
-> (8, 15, 6)
These results are different. Can you tell me how to do it?
Using random numbers and multipliers obscures the details which you seek. Let's do consecutive numbering and try to get Python and Julia to display alike:
>>> z = np.reshape(np.array(range(1,121)), [4, 5, 6]) # python
>>> z
array([[[ 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12],
[ 13, 14, 15, 16, 17, 18],
[ 19, 20, 21, 22, 23, 24],
[ 25, 26, 27, 28, 29, 30]],
[[ 31, 32, 33, 34, 35, 36],
[ 37, 38, 39, 40, 41, 42],
[ 43, 44, 45, 46, 47, 48],
[ 49, 50, 51, 52, 53, 54],
[ 55, 56, 57, 58, 59, 60]],
[[ 61, 62, 63, 64, 65, 66],
[ 67, 68, 69, 70, 71, 72],
[ 73, 74, 75, 76, 77, 78],
[ 79, 80, 81, 82, 83, 84],
[ 85, 86, 87, 88, 89, 90]],
[[ 91, 92, 93, 94, 95, 96],
[ 97, 98, 99, 100, 101, 102],
[103, 104, 105, 106, 107, 108],
[109, 110, 111, 112, 113, 114],
[115, 116, 117, 118, 119, 120]]])
julia> z = reshape(1:120, 6, 5, 4)
6×5×4 reshape(::UnitRange{Int64}, 6, 5, 4) with eltype Int64:
[:, :, 1] =
1 7 13 19 25
2 8 14 20 26
3 9 15 21 27
4 10 16 22 28
5 11 17 23 29
6 12 18 24 30
[:, :, 2] =
31 37 43 49 55
32 38 44 50 56
33 39 45 51 57
34 40 46 52 58
35 41 47 53 59
36 42 48 54 60
[:, :, 3] =
61 67 73 79 85
62 68 74 80 86
63 69 75 81 87
64 70 76 82 88
65 71 77 83 89
66 72 78 84 90
[:, :, 4] =
91 97 103 109 115
92 98 104 110 116
93 99 105 111 117
94 100 106 112 118
95 101 107 113 119
96 102 108 114 120
So, if you want things to print similarly on the screen, you need to swap first and last dimension sizes (reverse the order of dimensions) on the arrays between Julia and Python. In addition, since Julia concatenates the arrays when you put them in the same brackets, but Python just nests its arrays in greater depth, you need to use np.reshape on Python or reshape on Julia to change the arrays to the shape you want. I suggest you check the resulting arrays on a consecutive set of integers to be sure they print alike before going back to your random floating point numbers. Remember that the indexing order is different when you access elements, too. Consider
>>> zzz = np.array([[z,z,z], [z,z,z]]) # python
> zzz = reshape([z z z; z z z], 6, 5, 4, 3, 2) # julia
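On the Python side, nesting equal-shape arrays inside np.array adds new leading axes instead of concatenating, which is why the question's shape is (2, 3, 4, 5, 6). A small sketch that makes the stacking explicit with np.stack; it illustrates only the NumPy behaviour and is not a Julia translation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 5, 6))

# nested lists of equal-shape arrays: each nesting level becomes a new leading axis
y_nested = np.array([[x, x, x], [2 * x, 3 * x, 4 * x]])

# the same result, written as two explicit stacking steps
top = np.stack([x, x, x])              # shape (3, 4, 5, 6)
bottom = np.stack([2 * x, 3 * x, 4 * x])
y_stacked = np.stack([top, bottom])    # shape (2, 3, 4, 5, 6)

print(y_nested.shape)
print(np.array_equal(y_nested, y_stacked))
```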

removing all duplicates of max numbers in a correlation table

I need code for taking a .csv of a correlation table, sample of the table is posted here:
     AA   bb   cc   dd   ff
AA  100   87   71   71   78
bb   87  100   73   74   81
cc   71   73  100   96   69
dd   71   74   96  100   71
ee   78   81   69  100  100
ff   72   73   68   68   71
Pg   68   69   62   62   64
Ph   68   69   69   62   64
Pi   68   69   62   62   64
Pj   68   69   63   63   64
Pk   70   71   65   65   67
I have read the .csv file with Python's csv module as a list of lists, and then removed the first column and row. I am now trying to take these int values and find the max value of each row. If there are multiple max values in a row, I want those values as well.
Then I intend to place that output into a table:
file1values col row %
group1 AA AA 100
...
group1 dd ee 100
group1 ff ee 100
The issue I have so far is getting the max values for each row. Also I think I would be a bit confused on how to get the address (the col and row) for each max value.
Here is code so far:
from io import StringIO
import csv
import sys
import numpy as np
with open('/home/group1.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    data_as_list = list(reader)
a = np.array(data_as_list)
a = np.delete(a, (0), axis=0)
a = np.delete(a, (0), axis=1)
# print arrays in full; recent NumPy versions reject threshold=np.nan
np.set_printoptions(threshold=sys.maxsize)
print (a)
print ('')
count = 0
b = (a.astype(int))
maxArr = []
while (count < b.shape[0]):
    print (b[count])
    count = count + 1
    maxArr.append(max(b[count - 1]))
print (maxArr)
there are easier ways...
create a random matrix for tests
> import numpy as np
> m=np.random.randint(100,size=(10,10))
set diagonal to zero (or set to an out of range negative number)
> np.fill_diagonal(m,0)
array([[ 0, 35, 52, 40, 54, 1, 20, 41, 62, 92],
[45, 0, 75, 71, 85, 86, 83, 39, 52, 69],
[29, 21, 0, 78, 32, 14, 13, 27, 31, 26],
[99, 90, 16, 0, 28, 36, 30, 45, 85, 41],
[29, 21, 48, 31, 0, 86, 18, 7, 70, 76],
[96, 97, 34, 82, 51, 0, 69, 22, 27, 85],
[71, 58, 98, 42, 3, 51, 0, 19, 41, 93],
[54, 97, 86, 75, 62, 91, 78, 0, 55, 89],
[87, 44, 44, 54, 94, 94, 57, 24, 0, 81],
[94, 32, 1, 92, 34, 46, 96, 38, 75, 0]])
find the position of the maximum value in each row (a correlation matrix is symmetric, so rows and columns are interchangeable); note that np.argmax returns only the first maximum in each row
> cm=np.argmax(m,1)
array([9, 5, 3, 0, 5, 1, 2, 1, 4, 6])
You will need to map the row/column indices to your labels.
> for r in range(10):
      print(r,cm[r],m[r,cm[r]])
0 9 92
1 5 86
2 3 78
3 0 99
4 5 86
5 1 97
6 2 98
7 1 97
8 4 94
9 6 96
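Since the question also asks for every position where a row's maximum is tied, one option is to compare the matrix against its row maxima. A minimal sketch on a made-up 3x3 matrix; the `labels` list and the values are hypothetical, with the diagonal already zeroed as in the answer above.

```python
import numpy as np

labels = ["AA", "bb", "cc"]  # hypothetical row/column labels

# made-up values; row 0 has its maximum twice
m = np.array([[ 0, 87, 87],
              [87,  0, 73],
              [71, 73,  0]])

# per-row maximum, kept as a column so it broadcasts against m
row_max = m.max(axis=1, keepdims=True)

# indices of every cell equal to its row maximum (ties included), in row-major order
rows, cols = np.nonzero(m == row_max)

pairs = [(labels[r], labels[c], int(m[r, c])) for r, c in zip(rows, cols)]
for row_label, col_label, value in pairs:
    print(row_label, col_label, value)
```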

Iterate over a matrix, sum over some rows and add the result to another array

Hi there I have the following matrix
[[ 47 43 51 81 54 81 52 54 31 46]
[ 35 21 30 16 37 11 35 30 39 37]
[ 8 17 11 2 5 4 11 9 17 10]
[ 5 9 4 0 1 1 0 3 9 3]
[ 2 7 2 0 0 0 0 1 2 1]
[215 149 299 199 159 325 179 249 249 199]
[ 27 49 24 4 21 8 35 15 45 25]
[100 100 100 100 100 100 100 100 100 100]]
I need to iterate over the matrix summing all elements in rows 0,1,2,3,4 only
example: I need
row_0_sum = 47+43+51+81....46
Furthermore I need to store each rows sum in an array like this
[row0_sum, row1_sum, row2_sum, row3_sum, row4_sum]
So far I have tried this code but its not doing the job:
mu = np.zeros(shape=(1,6))
#get an average
def standardize_ratings(matrix):
    sum = 0
    for i, eli in enumerate(matrix):
        for j, elj in enumerate(eli):
            if(i<5):
                sum = sum + matrix[i][j]
                if(j==elj.len -1):
                    mu[i] = sum
                    sum = 0
                    print "mu[i]="
                    print mu[i]
This just gives me an Error: numpy.int32 object has no attribute 'len'
So can someone help me? What's the best way to do this, and which type of array in Python should I use to store the result? I'm new to Python but have done programming before.
Thanks
Make your data, matrix, a numpy.ndarray object, instead of a list of lists, and then just do matrix.sum(axis=1).
>>> matrix = np.asarray([[ 47, 43, 51, 81, 54, 81, 52, 54, 31, 46],
...                      [ 35, 21, 30, 16, 37, 11, 35, 30, 39, 37],
...                      [ 8, 17, 11, 2, 5, 4, 11, 9, 17, 10],
...                      [ 5, 9, 4, 0, 1, 1, 0, 3, 9, 3],
...                      [ 2, 7, 2, 0, 0, 0, 0, 1, 2, 1],
...                      [215, 149, 299, 199, 159, 325, 179, 249, 249, 199],
...                      [ 27, 49, 24, 4, 21, 8, 35, 15, 45, 25],
...                      [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]])
>>> print(matrix.sum(axis=1))
[ 540 291 94 35 15 2222 253 1000]
To get the first five rows from the result, you can just do:
>>> row_sums = matrix.sum(axis=1)
>>> rows_0_through_4_sums = row_sums[:5]
>>> print(rows_0_through_4_sums)
[540 291 94 35 15]
Or, you can alternatively sub-select only those rows to begin with and only apply the summation to them:
>>> rows_0_through_4 = matrix[:5,:]
>>> print(rows_0_through_4.sum(axis=1))
[540 291 94 35 15]
Some helpful links will be:
NumPy for Matlab Users, if you are familiar with these things in Matlab/Octave
Slicing/Indexing in NumPy
