I need to populate a 2D array whose shape is 3xN, where N is initially unknown. The code looks as follows:
import numpy as np
import random
nruns = 5
all_data = [[]]
for run in range(nruns):
    n = random.randint(1,10)
    d1 = random.sample(range(0, 30), n)
    d2 = random.sample(range(0, 30), n)
    d3 = random.sample(range(0, 30), n)
    data_tmp = [d1, d2, d3]
    all_data = np.concatenate((all_data,data_tmp),axis=0)
This gives the following error:
ValueError Traceback (most recent call last)
<ipython-input-103-22af8f04e7c0> in <module>
10 d3 = random.sample(range(0, 30), n)
11 data_tmp = [d1, d2, d3]
---> 12 all_data = np.concatenate((all_data,data_tmp),axis=0)
13 print(np.shape(data_tmp))
<__array_function__ internals> in concatenate(*args, **kwargs)
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 0 and the array at index 1 has size 4
Is there a way to do this without pre-allocating all_data? Note that in my application, the data will not be random, but generated inside the loop.
Many thanks!
You could store the data generated in each step of the for loop into a list and create the array when you are done.
In [298]: import numpy as np
...: import random
In [299]: nruns = 5
...: all_data = []
In [300]: for run in range(nruns):
     ...:     n = random.randint(1,10)
     ...:     d1 = random.sample(range(0, 30), n)
     ...:     d2 = random.sample(range(0, 30), n)
     ...:     d3 = random.sample(range(0, 30), n)
     ...:     all_data.append([d1, d2, d3])
In [301]: all_data = np.hstack(all_data)
In [302]: all_data
Out[302]:
array([[13, 28, 14, 15, 11, 0, 0, 19, 6, 28, 14, 18, 1, 15, 4, 20,
9, 14, 15, 13, 27, 28, 25, 5, 7, 4, 10, 22, 12, 6, 23, 15,
0, 20, 14, 5, 13],
[10, 9, 23, 4, 25, 28, 17, 14, 3, 4, 5, 9, 7, 18, 23, 9,
14, 15, 25, 26, 29, 12, 21, 0, 5, 6, 11, 27, 13, 26, 22, 14,
6, 5, 7, 23, 0],
[13, 0, 7, 14, 29, 26, 12, 16, 13, 3, 9, 6, 11, 2, 19, 17,
28, 14, 25, 24, 3, 12, 22, 7, 23, 18, 5, 14, 0, 14, 15, 8,
3, 2, 26, 21, 16]])
See if this is what you need, i.e. populate along axis 1 instead of 0.
import numpy as np
import random
nruns = 5
all_data = [[], [], []]
for run in range(nruns):
    n = random.randint(1,10)
    d1 = random.sample(range(0, 30), n)
    d2 = random.sample(range(0, 30), n)
    d3 = random.sample(range(0, 30), n)
    data_tmp = [d1, d2, d3]
    all_data = np.concatenate((all_data, data_tmp), axis=1)
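Why the starting value matters can be sketched as follows: np.concatenate needs every axis except the concatenation axis to match, and [[]] has shape (1, 0) while [[], [], []] has shape (3, 0). The names empty_bad, empty_ok, chunk and start are only illustrative and are not from the original post. One side effect worth knowing: an empty list becomes a float64 array, so the result is promoted to float unless the starting array is given an integer dtype.

```python
import numpy as np

empty_bad = np.array([[]])           # shape (1, 0): one row, zero columns
empty_ok = np.array([[], [], []])    # shape (3, 0): three rows, zero columns
chunk = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2), integer dtype

# Three rows on both sides, so joining along axis=1 is allowed.
out = np.concatenate((empty_ok, chunk), axis=1)
print(empty_bad.shape, empty_ok.shape, out.shape, out.dtype)

# Starting from an explicitly integer-typed empty array keeps the data integer.
start = np.empty((3, 0), dtype=int)
out_int = np.concatenate((start, chunk), axis=1)
print(out_int.dtype)
```

Repeated concatenation copies the whole array on every iteration, so for many runs collecting the pieces in a list and joining once is likely cheaper.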
How about using np.random only:
import numpy as np
nruns = 5
# set seed for repeatability, remove for randomness
np.random.seed(42)
# randomize the lengths for the runs (the upper bound 10 is exclusive here)
num_samples = np.random.randint(1,10, nruns)
# sampling with the total length
all_data = np.random.randint(0,30, (3, num_samples.sum()))
# or, if `range(0,30)` represents some population
# all_data = np.random.choice(range(0,30), (3,num_samples.sum()) )
print(all_data)
Output:
[[25 18 22 10 10 23 20 3 7 23 2 21 20 1 23 11 29 5 1 27 20 0 11 25
21 28 11 24 16 26 26]
[ 9 27 27 15 14 29 29 14 29 18 11 22 19 24 2 4 18 6 20 8 6 17 3 24
27 13 17 25 8 25 20]
[ 1 19 27 14 27 6 11 28 7 14 2 13 16 3 17 7 3 1 29 5 21 9 3 21
28 17 25 11 1 9 29]]
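The same idea can be written with NumPy's newer Generator interface. This is only a sketch and not part of the original answer; the seed and the rng name are arbitrary. Two differences from the question's code are worth keeping in mind: the upper bound of integers() is exclusive, so 11 is used to allow lengths 1 to 10 as random.randint(1,10) does, and the values are drawn with replacement, whereas random.sample draws without replacement.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, only for repeatability
nruns = 5

# Lengths of the individual runs, 1..10 inclusive (upper bound is exclusive).
num_samples = rng.integers(1, 11, size=nruns)

# Draw all columns at once; values lie in 0..29 and may repeat within a row.
all_data = rng.integers(0, 30, size=(3, num_samples.sum()))
print(all_data.shape)
```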
I have some code in Matlab that I need to translate to Python. Shapes and indexes are really important here, since the code works with tensors. I'm a little confused, because it seemed that it would be enough to use order='F' in Python's reshape(), but when I work with 3D data I noticed that it does not work. For example, if A is an array from 1 to 27 in Python
array([[[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9]],
[[10, 11, 12],
[13, 14, 15],
[16, 17, 18]],
[[19, 20, 21],
[22, 23, 24],
[25, 26, 27]]])
if I perform A.reshape(3, 9, order='F') I get
[[ 1 4 7 2 5 8 3 6 9]
[10 13 16 11 14 17 12 15 18]
[19 22 25 20 23 26 21 24 27]]
In Matlab for A = 1:27 reshaped to [3, 3, 3] and then to [3, 9] it seems that I get another array:
1 4 7 10 13 16 19 22 25
2 5 8 11 14 17 20 23 26
3 6 9 12 15 18 21 24 27
The SVD in Matlab and Python also gives different results. Is there a way to fix this?
More generally, what is the correct way to carry multidimensional arrays over from Matlab to Python? For example, should I get the same SVD for arange(1, 13).reshape(3, 4) in Python and reshape(1:12, [3, 4]) in Matlab? Can I swap axes in Python to get the same results as in Matlab, or change the order of the axes in reshape(x1, x2, x3, ...)?
I was having the same issues until I found this Wikipedia article: row- and column-major order
NumPy (like C) stores arrays in row-major order by default. As you can see in your first example, the elements first increase along the columns:
array([[[ 1, 2, 3],
- - - -> increasing
Then along the rows:
array([[[ 1, 2, 3],
[ 4, <--- new element
When all columns and rows are full, it moves to the next page.
array([[[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9]],
[[10, <-- new element in next page
In Matlab (as in Fortran), the row index increases first, then the column index, and so on.
For N-dimensional arrays it looks like:
Python (row major -> last dimension is contiguous): [dim1,dim2,...,dimN]
Matlab (column major -> first dimension is contiguous): the same tensor in memory would look the other way around .. [dimN,...,dim2,dim1]
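The two layouts can be inspected directly in NumPy. This is a small sketch, not part of the original answer: ravel(order='C') walks the array with the last index changing fastest, ravel(order='F') with the first index changing fastest, which is the order Matlab uses.

```python
import numpy as np

A = np.arange(1, 28).reshape(3, 3, 3)   # C (row-major) layout by default
F = np.asfortranarray(A)                # same values, column-major layout in memory

# Walk the same array in both orders and look at the first few elements.
c_walk = A.ravel(order='C')[:5]   # last index changes fastest
f_walk = A.ravel(order='F')[:5]   # first index changes fastest
print(c_walk)
print(f_walk)

# The values are identical; only the memory layout differs.
print(A.flags['C_CONTIGUOUS'], F.flags['F_CONTIGUOUS'], np.array_equal(A, F))
```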
If you want to export n-dim. arrays from python to matlab, the easiest way is to permute the dimensions first:
(in python)
import numpy as np
import scipy.io as sio
A=np.reshape(range(1,28),[3,3,3])
sio.savemat('A',{'A':A})
(in matlab)
load('A.mat')
A=permute(A,[3 2 1]);%dimensions in reverse ordering
reshape(A,9,3)' %gives the same result as A.reshape([3,9]) in python
Note that (9,3) and (3,9) are intentionally in reverse order.
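If the goal is instead to reproduce Matlab's result inside Python, one option is to use order='F' in both reshapes, not only the second one. This is a sketch under the assumption that the Matlab code is reshape(reshape(1:27, [3 3 3]), [3 9]) as described in the question.

```python
import numpy as np

# Build the 3x3x3 array the way Matlab does: first index varies fastest.
A = np.arange(1, 28).reshape((3, 3, 3), order='F')

# Reshape again in column-major order, mirroring Matlab's reshape(A, [3 9]).
B = A.reshape((3, 9), order='F')
print(B)

# Singular values depend only on the matrix itself.
s = np.linalg.svd(B, compute_uv=False)
print(s)
```

The rows of B should match the Matlab output quoted in the question. The singular values of the same matrix should then agree up to rounding, although the singular vectors returned by different libraries may differ in sign.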
In Matlab
A = 1:27;
A = reshape(A,3,3,3);
B = reshape(A,9,3)'
B =
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27
size(B)
ans =
3 9
In Python
A = np.array(range(1,28))
A = A.reshape(3,3,3)
B = A.reshape(3,9)
B
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24, 25, 26, 27]])
np.shape(B)
(3, 9)
I have an nxn numpy array, and I would like to divide it evenly into square tiles and randomly shuffle these, while retaining the pattern inside the tiles.
For example, if I have an array that's size (200,200), I want to be able to divide this into say 16 arrays of size (50,50), or even 64 arrays of size (25,25), and randomly shuffle these, while retaining the same shape of the original array (200,200) and retaining the order of numbers inside of the smaller arrays.
I have looked up specific numpy functions, and I found the numpy.random.shuffle(x) function, but this will randomly shuffle the individual elements of an array. I would only like to shuffle these smaller arrays within the larger array.
Is there any numpy function or quick way that will do this? I'm not sure where to begin.
EDIT: To further clarify exactly what I want:
Let's say I have an input 2D array of shape (10,10) of values:
0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99
I choose a tile size that fits evenly into this array. Since the array has shape (10,10), I can split it into either 4 (5,5) tiles or 25 (2,2) tiles. If I choose 4 (5,5) tiles, I want to randomly shuffle them so that the output array could look like this:
50 51 52 53 54 0 1 2 3 4
60 61 62 63 64 10 11 12 13 14
70 71 72 73 74 20 21 22 23 24
80 81 82 83 84 30 31 32 33 34
90 91 92 93 94 40 41 42 43 44
55 56 57 58 59 5 6 7 8 9
65 66 67 68 69 15 16 17 18 19
75 76 77 78 79 25 26 27 28 29
85 86 87 88 89 35 36 37 38 39
95 96 97 98 99 45 46 47 48 49
Every array (the input array, the output array, and the separate tiles) would be square, so that after shuffling the size and dimensions of the main array stay the same (10,10).
Here is my solution using a loop:
import numpy as np
arr = np.arange(36).reshape(6,6)
def suffle_section(arr, n_sections):
    assert arr.shape[0]==arr.shape[1], "arr must be square"
    assert arr.shape[0]%n_sections == 0, "arr size must be divisible into equal n_sections"
    size = arr.shape[0]//n_sections
    new_arr = np.empty_like(arr)
    ## random permutation of the flat tile indexes
    rand_indxes = np.random.permutation(n_sections*n_sections)
    for i in range(n_sections):
        for j in range(n_sections):
            ## convert the flat source-tile index into its row and column
            rand_i = rand_indxes[i*n_sections + j]//n_sections
            rand_j = rand_indxes[i*n_sections + j]%n_sections
            new_arr[i*size:(i+1)*size, j*size:(j+1)*size] = \
                arr[rand_i*size:(rand_i+1)*size, rand_j*size:(rand_j+1)*size]
    return new_arr
result = suffle_section(arr, 3)
display(arr)
display(result)
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
array([[ 4, 5, 16, 17, 24, 25],
[10, 11, 22, 23, 30, 31],
[14, 15, 2, 3, 0, 1],
[20, 21, 8, 9, 6, 7],
[26, 27, 12, 13, 28, 29],
[32, 33, 18, 19, 34, 35]])
If you have access to skimage (it comes with Spyder) you could use view_as_blocks:
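import numpy as np
from skimage.util import view_as_blocks
def shuffle_tiles(arr, m, n):
    a_ = view_as_blocks(arr,(m,n))
    sh = a_.shape
    # flatten the grid of blocks into a stack of (m,n) tiles
    a_ = a_.reshape(-1,m,n)
    # shuffle works along 1st dimension and in-place
    np.random.shuffle(a_)
    # put the shuffled tiles back into the original 2D layout
    return a_.reshape(sh).swapaxes(1,2).reshape(arr.shape)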
We will use np.random.shuffle along with axes permutations to achieve the desired results. There are two interpretations of the question, hence two solutions.
Shuffle randomly within each block
Elements in each block are randomized, and that same randomized order is maintained in all blocks.
def randomize_tiles_shuffle_within(a, M, N):
    # M,N are the height and width of the blocks
    m,n = a.shape
    b = a.reshape(m//M,M,n//N,N).swapaxes(1,2).reshape(-1,M*N)
    np.random.shuffle(b.T)
    return b.reshape(m//M,n//N,M,N).swapaxes(1,2).reshape(a.shape)
Shuffle randomly blocks w.r.t each other
Blocks are randomized w.r.t each other, while keeping the order within each block same as in the original array.
def randomize_tiles_shuffle_blocks(a, M, N):
    m,n = a.shape
    b = a.reshape(m//M,M,n//N,N).swapaxes(1,2).reshape(-1,M*N)
    np.random.shuffle(b)
    return b.reshape(m//M,n//N,M,N).swapaxes(1,2).reshape(a.shape)
Sample runs -
In [47]: a
Out[47]:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
In [48]: randomize_tiles_shuffle_within(a, 3, 3)
Out[48]:
array([[ 1, 7, 13, 4, 10, 16],
[14, 8, 12, 17, 11, 15],
[ 0, 6, 2, 3, 9, 5],
[19, 25, 31, 22, 28, 34],
[32, 26, 30, 35, 29, 33],
[18, 24, 20, 21, 27, 23]])
In [49]: randomize_tiles_shuffle_blocks(a, 3, 3)
Out[49]:
array([[ 3, 4, 5, 18, 19, 20],
[ 9, 10, 11, 24, 25, 26],
[15, 16, 17, 30, 31, 32],
[ 0, 1, 2, 21, 22, 23],
[ 6, 7, 8, 27, 28, 29],
[12, 13, 14, 33, 34, 35]])
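One way to convince yourself that the reshape/swapaxes pattern above really moves whole tiles is to cut the input and the output into tiles and compare them as unordered collections. The sketch below uses the (10,10) example from the question; to_tiles and shuffle_tiles are hypothetical helper names, and the Generator-based permutation is a substitution for np.random.shuffle, not the answer's exact code.

```python
import numpy as np

def to_tiles(a, M, N):
    # Split a 2D array into a stack of (M, N) tiles, row by row.
    m, n = a.shape
    return a.reshape(m // M, M, n // N, N).swapaxes(1, 2).reshape(-1, M, N)

def shuffle_tiles(a, M, N, rng=None):
    # Reorder the tiles randomly and stitch them back into the original shape.
    rng = np.random.default_rng() if rng is None else rng
    m, n = a.shape
    tiles = to_tiles(a, M, N)
    tiles = tiles[rng.permutation(len(tiles))]
    return tiles.reshape(m // M, n // N, M, N).swapaxes(1, 2).reshape(m, n)

a = np.arange(100).reshape(10, 10)
out = shuffle_tiles(a, 5, 5, np.random.default_rng(0))

# Every tile of the output must be one of the input tiles, unchanged inside.
same_tiles = sorted(t.tobytes() for t in to_tiles(a, 5, 5)) == \
             sorted(t.tobytes() for t in to_tiles(out, 5, 5))
print(out.shape, same_tiles)
```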
Here is an approach that tries hard to avoid unnecessary copies:
import numpy as np
def f_pp(a,bs):
    i,j = a.shape
    k,l = bs
    esh = i//k,k,j//l,l
    bc = esh[::2]
    sh1,sh2 = np.unravel_index(np.random.permutation(bc[0]*bc[1]),bc)
    ns1,ns2 = np.unravel_index(np.arange(bc[0]*bc[1]),bc)
    out = np.empty_like(a)
    out.reshape(esh)[ns1,:,ns2] = a.reshape(esh)[sh1,:,sh2]
    return out
Timings:
pp 0.41529153706505895
dv 1.3133141631260514
br 1.6034217830747366
Test script (continued)
# Divakar
def f_dv(a,bs):
    M,N = bs
    m,n = a.shape
    b = a.reshape(m//M,M,n//N,N).swapaxes(1,2).reshape(-1,M*N)
    np.random.shuffle(b)
    return b.reshape(m//M,n//N,M,N).swapaxes(1,2).reshape(a.shape)
from skimage.util import view_as_blocks
# Brenlla shape fixed by pp
def f_br(arr,bs):
    m,n = bs
    a_= view_as_blocks(arr,(m,n))
    sh = a_.shape
    a_ = a_.reshape(-1,m,n)
    # shuffle works along 1st dimension and in-place
    np.random.shuffle(a_)
    return a_.reshape(sh).swapaxes(1,2).reshape(arr.shape)
ex = np.arange(100000).reshape(1000,100)
bs = 10,10
tst = np.tile(np.arange(np.prod(bs)).reshape(bs),np.floor_divide(ex.shape,bs))
from timeit import timeit
for n,f in list(globals().items()):
    if n.startswith('f_'):
        assert (tst==f(tst,bs)).all()
        print(n[2:],timeit(lambda:f(ex,bs),number=1000))
Here's code to shuffle row order but keep row items exactly as is:
import numpy as np
np.random.seed(0)
#creates a 6x6 array
a = np.random.randint(0,100,(6,6))
a
array([[44, 47, 64, 67, 67, 9],
[83, 21, 36, 87, 70, 88],
[88, 12, 58, 65, 39, 87],
[46, 88, 81, 37, 25, 77],
[72, 9, 20, 80, 69, 79],
[47, 64, 82, 99, 88, 49]])
#creates a number for each row index, 0,1,2,3,4,5
order = np.arange(6)
#shuffle index array
np.random.shuffle(order)
#make new array in shuffled order
shuffled = np.array([a[y] for y in order])
shuffled
array([[46, 88, 81, 37, 25, 77],
[88, 12, 58, 65, 39, 87],
[83, 21, 36, 87, 70, 88],
[47, 64, 82, 99, 88, 49],
[44, 47, 64, 67, 67, 9],
[72, 9, 20, 80, 69, 79]])
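The list comprehension above can also be replaced by indexing with the shuffled index array, which NumPy supports directly. A minimal sketch follows, using the Generator interface with an arbitrary seed; both are substitutions, so the values will not match the output shown above.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed
a = rng.integers(0, 100, size=(6, 6))     # a 6x6 array of random integers

# A random ordering of the row indexes 0..5.
order = rng.permutation(a.shape[0])

# Indexing with an integer array picks whole rows in that order.
shuffled = a[order]
print(order)
print(shuffled)
```

This shuffles rows only. For the tile problem in the question the rows would first have to be regrouped into tiles.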
I have the following Numpy array of shape (4, 4, 3):
a = [[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
[[12 13 14]
[15 16 17]
[18 19 20]
[21 22 23]]
[[24 25 26]
[27 28 29]
[30 31 32]
[33 34 35]]
[[36 37 38]
[39 40 41]
[42 43 44]
[45 46 47]]]
I am looking for an elegant solution to re-arrange the elements in that array to get the following 3D array of shape (3, 4, 4):
a_new = [[[ 0 3 6 9]
[12 15 18 21]
[24 27 30 33]
[36 39 42 45]]
[[ 1 4 7 10]
[13 16 19 22]
[25 28 31 34]
[37 40 43 46]]
[[ 2 5 8 11]
[14 17 20 23]
[26 29 32 35]
[38 41 44 47]]]
Use np.transpose -
a.transpose(2,0,1)
Or use np.rollaxis -
np.rollaxis(a,2,0) # Or np.rollaxis(a,-1,0)
In case somebody asks the same question for pure Python:
mylist = [[[1,2,3], [4,5,6]], [[7,8,9], [10, 11, 12]]]
flat = sum(sum(mylist, []), [])
groups = 3
print([flat[r::groups] for r in range(groups)])
[[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
The fastest way I can think of is to use numpy's swapaxes function in combination with the transpose function.
anew=np.swapaxes(a,0,1).T
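A quick check that the approaches agree, using the (4, 4, 3) array from the question. np.moveaxis is added here as one more spelling of the same operation; it does not appear in the answers above.

```python
import numpy as np

a = np.arange(48).reshape(4, 4, 3)

# Move the last axis to the front: result[k, i, j] == a[i, j, k].
a_new = a.transpose(2, 0, 1)
via_swap = np.swapaxes(a, 0, 1).T     # swap the first two axes, then reverse all axes
via_move = np.moveaxis(a, -1, 0)      # same operation, named by source and destination

print(a_new.shape)
print(a_new[0])
print(np.array_equal(a_new, via_swap), np.array_equal(a_new, via_move))
```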
I'm trying to create a list of lists, such that each inner list has 8 elements, in a python one-liner.
So far I have the following:
locations = [[alphabet.index(j) for j in test]]
That maps to one big list inside of a list:
[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]]
But I want to split it up to be multiple inner lists, each 8 elements:
[[1,2,3,4,5,6,7,8],[9,10,11,12,13,14,15,16]]
Any idea how I can achieve this?
Use list slicing with range() to get the starting indexes:
In [3]: test = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
In [4]: [test[i:i+8] for i in range(0, len(test), 8)]
Out[4]: [[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13, 14, 15, 16]]
As a function:
In [7]: def slicing(list_, elem_):
   ...:     return [list_[i:i+elem_] for i in range(0, len(list_), elem_)]
In [8]: slicing(test, 8)
Out[8]: [[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13, 14, 15, 16]]
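Applied to the one-liner from the question, the slicing can be folded into the comprehension itself. The question does not show alphabet or test, so the values below are assumptions chosen to reproduce the numbers 1 to 16.

```python
import string

alphabet = string.ascii_lowercase   # assumption: the question's alphabet is a-z
test = "bcdefghijklmnopq"           # assumption: 16 letters mapping to 1..16

# Slice the input into pieces of 8 first, then map each letter to its index.
locations = [[alphabet.index(j) for j in test[i:i + 8]]
             for i in range(0, len(test), 8)]
print(locations)

# If the length is not a multiple of 8, the last inner list is simply shorter.
short = [[alphabet.index(j) for j in "bcdefghijk"[i:i + 8]]
         for i in range(0, 10, 8)]
print(short)
```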
Another solution could be to use NumPy
import numpy as np
data = list(range(0, 64))
data_split = np.array_split(np.asarray(data), 8)
Output:
for a in data_split:
    print(a)
[0 1 2 3 4 5 6 7]
[ 8 9 10 11 12 13 14 15]
[16 17 18 19 20 21 22 23]
[24 25 26 27 28 29 30 31]
[32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47]
[48 49 50 51 52 53 54 55]
[56 57 58 59 60 61 62 63]
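One caveat: the second argument of np.array_split is the number of sections, not the length of each section. The example above yields pieces of 8 only because 64 / 8 = 8. For a fixed piece length, split points can be passed instead, as in this sketch (the variable names are illustrative):

```python
import numpy as np

data = np.arange(20)

# An integer means "this many sections"; sizes are made as equal as possible.
by_sections = np.array_split(data, 8)

# A list of indexes means "cut at these positions", giving pieces of length 8.
by_size = np.array_split(data, list(range(8, len(data), 8)))

print([len(p) for p in by_sections])
print([p.tolist() for p in by_size])
```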