Numpy: shuffle arrays in unison multiple times with different seeds

Numpy: shuffle arrays in unison multiple times with different seeds - python

I have multiple numpy arrays with the same number of rows (axis_0) that I'd like to shuffle in unison. After one shuffle, I'd like to shuffle them again with a different random seed.
Till now, I've used the solution from
Better way to shuffle two numpy arrays in unison :
def shuffle_in_unison(a, b):
rng_state = numpy.random.get_state()
numpy.random.shuffle(a)
numpy.random.set_state(rng_state)
numpy.random.shuffle(b)
However, this doesn't work for multiple unison shuffles, since rng_state is always the same.
I've tried to use RandomState in order to get a different seed for each call, but this doesn't even work for a single unison shuffle:
a = np.array([1,2,3,4,5])
b = np.array([10,20,30,40,50])
def shuffle_in_unison(a, b):
r = np.random.RandomState() # different state from /dev/urandom for each call
state = r.get_state()
np.random.shuffle(a) # array([4, 2, 1, 5, 3])
np.random.set_state(state)
np.random.shuffle(b) # array([40, 20, 50, 10, 30])
# -> doesn't work
return a,b
for i in xrange(10):
a,b = shuffle_in_unison(a,b)
print a,b
What am I doing wrong?
Edit:
For everyone that doesn't have huge arrays like me, just use the solution by Francesco (https://stackoverflow.com/a/47156309/3955022):
def shuffle_in_unison(a, b):
n_elem = a.shape[0]
indeces = np.random.permutation(n_elem)
return a[indeces], b[indeces]
The only drawback is that this is not an in-place operation, which is a pity for large arrays like mine (500G).

I don't know what are you doing wrong with the way you set the state. However I found an alternative solution: instead of shuffling n arrays, shuffle their indeces only once with numpy.random.choice and then reorder all the arrays.
a = np.array([1,2,3,4,5])
b = np.array([10,20,30,40,5])
def shuffle_in_unison(a, b):
n_elem = a.shape[0]
indeces = np.random.choice(n_elem, size=n_elem, replace=False)
return a[indeces], b[indeces]
for i in xrange(5):
a, b = shuffle_in_unison(a ,b)
print(a, b)
I get:
[5 2 4 3 1] [50 20 40 30 10]
[1 3 4 2 5] [10 30 40 20 50]
[1 2 5 4 3] [10 20 50 40 30]
[3 2 1 4 5] [30 20 10 40 50]
[1 2 5 3 4] [10 20 50 30 40]
edit
Thanks to #Divakar for the suggestion.
Here is a more readable way to obtain the same result using numpy.random.premutation
def shuffle_in_unison(a, b):
n_elem = a.shape[0]
indeces = np.random.permutation(n_elem)
return a[indeces], b[indeces]

I don't know exactly what you are doing well, but you have not chosen the solution with the most votes on that page or with the second most votes. Try this one:
from sklearn.utils import shuffle
for i in range(10):
X, Y = shuffle(X, Y, random_state=i)
print ("X - ", X, "Y - ", Y)
Output:
X - [3 5 1 4 2] Y - [30 50 10 40 20]
X - [1 5 2 3 4] Y - [10 50 20 30 40]
X - [2 4 5 3 1] Y - [20 40 50 30 10]
X - [3 1 4 2 5] Y - [30 10 40 20 50]
X - [3 2 1 5 4] Y - [30 20 10 50 40]
X - [4 3 2 1 5] Y - [40 30 20 10 50]
X - [1 5 4 3 2] Y - [10 50 40 30 20]
X - [1 3 4 5 2] Y - [10 30 40 50 20]
X - [2 4 3 1 5] Y - [20 40 30 10 50]
X - [1 2 4 3 5] Y - [10 20 40 30 50]

I don't normally have to shuffle my data more than once at a time. But this function accommodates any number of input arrays, as well as any number of random shuffles - and it shuffles in-place.
import numpy as np
def shuffle_arrays(arrays, shuffle_quant=1):
assert all(len(arr) == len(arrays[0]) for arr in arrays)
max_int = 2**(32 - 1) - 1
for i in range(shuffle_quant):
seed = np.random.randint(0, max_int)
for arr in arrays:
rstate = np.random.RandomState(seed)
rstate.shuffle(arr)
And can be used like this
a = np.array([1, 2, 3, 4, 5])
b = np.array([10,20,30,40,50])
c = np.array([[1,10,11], [2,20,22], [3,30,33], [4,40,44], [5,50,55]])
shuffle_arrays([a, b, c], shuffle_quant=5)
A few things to note:
Method uses NumPy and no other packages.
The assert ensures that all input arrays have the same length along
their first dimension.
The max_int keeps random seed within int32 range.
Arrays shuffled in-place by their first dimension - nothing returned.
After the shuffle, the data can be split using np.split or referenced using slices - depending on the application.

Related

Creating a new matrix from a matrix of index in numpy

I have a 3D numpy array A with shape(k, l, m) and a 2D numpy array B with shape (k,l) with the indexes (between 0 and m-1) of particular items that I want to create a new 2D array C with shape (k,l), like this:
import numpy as np
A = np.random.random((2,3,4))
B = np.array([[0,0,0],[2,2,2]))
C = np.zeros((2,3))
for i in range(2):
for j in range(3):
C[i,j] = A[i, j, B[i,j]]
Is there a more efficient way of doing this?

Use inbuilt routine name fromfunction of Numpy library. And turn your code into
C = np.fromfunction(lambda i, j: A[i, j, B[i,j]], (5, 5))

Setup:
import numpy as np
k,l,m = 2,3,4
a = np.arange(k*l*m).reshape(k,l,m)
b = np.random.randint(0,4,(k,l))
print(a)
print('*'*10)
print(b)
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
**********
[[3 0 3]
[2 1 2]]
Use integer indexing to select the values then reshape.
x,y = np.indices(a.shape[:-1])
c = a[x,y,b]
print(c)
[[ 3 4 11]
[14 17 22]]
Using numpy.ix_.
x,y = np.ix_(np.arange(a.shape[0]),np.arange(a.shape[1]))
d = a[x,y,b]

How to efficiently subtract values from each column with numpy

I have a 2D array of shape (50,50). I need to subtract a value from each column of this array skipping the first), which is calculated based on the index of the column. For example, using a for loop it would look something like this:
for idx in range(1, A[0, :].shape[0]):
A[0, idx] -= idx * (...) # simple calculations with idx
Now, of course this works fine, but it's very slow and performance is critical for my application. I've tried computing the values to be subtracted using np.fromfunction() and then subtracting it from the original array, but results are different than those obtained by the for loop iteractive subtraction:
func = lambda i, j: j * (...) #some simple calculations
subtraction_matrix = np.fromfunction(np.vectorize(func), (1,50))
A[0, 1:] -= subtraction_matrix
What am I doing wrong? Or is there some other method that would be better? Any help is appreciated!

All your code snippets indicate that you require the subtraction to happen only in the first row of A (though you've not explicitly mentioned that). So, I'm proceeding with that understanding.
Referring to your use of from_function(), you can use the subtraction_matrix as below:
A[0,1:] -= subtraction_matrix[1:]
Testing it out (assuming shape (5,5) instead of (50,50)):
import numpy as np
A = np.arange(25).reshape(5,5)
print (A)
func = lambda j: j * 10 #some simple calculations
subtraction_matrix = np.fromfunction(np.vectorize(func), (5,), dtype=A.dtype)
A[0,1:] -= subtraction_matrix[1:]
print (A)
Output:
[[ 0 1 2 3 4] # print(A), before subtraction
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
[[ 0 -9 -18 -27 -36] # print(A), after subtraction
[ 5 6 7 8 9]
[ 10 11 12 13 14]
[ 15 16 17 18 19]
[ 20 21 22 23 24]]
If you want the subtraction to happen in all the rows of A, you just need to use the line A[:,1:] -= subtraction_matrix[1:], instead of the line A[0,1:] -= subtraction_matrix[1:]

Using generator items selectively

Let's say I have some arrays/lists that contains a lot of values, which means that loading several of these into memory would ultimately result in a memory error due to lack of memory. One way to circumvent this is to load these arrays/lists into a generator, and then use them when needed. However, with generators you don't have so much control as with arrays/lists - and that is my problem.
Let me explain.
As an example I have the following code, which produces a generator with some small lists. So yeah, this is not memory intensive at all, just an example:
import numpy as np
np.random.seed(10)
number_of_lists = range(0, 5)
generator_list = (np.random.randint(0, 10, 10) for i in number_of_lists)
If I iterate over this list I get the following:
for i in generator_list:
print(i)
>> [9 4 0 1 9 0 1 8 9 0]
>> [8 6 4 3 0 4 6 8 1 8]
>> [4 1 3 6 5 3 9 6 9 1]
>> [9 4 2 6 7 8 8 9 2 0]
>> [6 7 8 1 7 1 4 0 8 5]
What I would like to do is sum element wise for all the lists (axis = 0). So the above should in turn result in:
[36, 22, 17, 17, 28, 16, 28, 31, 29, 14]
To do this I could use the following:
sum = [0]*10
for i in generator_list:
sum += i
where 10 is the length of one of the lists.
So far so good. I am not sure if there is a better/more optimized way of doing it, but it works.
My problem is that I would like to determine which lists in the generator_list I want to use. For example, what if I wanted to sum two of the first [0] list, one of the third, and 2 of the last, i.e.:
[9 4 0 1 9 0 1 8 9 0]
[9 4 0 1 9 0 1 8 9 0]
[4 1 3 6 5 3 9 6 9 1]
[6 7 8 1 7 1 4 0 8 5]
[6 7 8 1 7 1 4 0 8 5]
>> [34, 23, 19, 10, 35, 5, 19, 22, 43, 11]
How would I go about doing that ?
And before any questions arise why I want to do it this way, the reason is that in my real case, getting the arrays into the generator takes some time. I could then in principle just generate a new generator where I put in the order of lists as seen in the new list, but again, that would mean I would have to wait to get them in a new generator. And if this is to happen thousands of times (as seen with bootstrapping), well, it would take some time. With the first generator I have ALL lists that are available. Now I just wish to use them selectively so I don't have to create a new generator every time I want to mix it up, and sum a new set of arrays/lists.

import numpy as np
np.random.seed(10)
number_of_lists = range(5)
generator_list = (np.random.randint(0, 10, 10) for i in number_of_lists)
indices = [0, 0, 2, 4, 4]
assert sorted(indices) == indices, "only works for sorted list"
# sum_ = [0] * 10
# I prefer this:
sum_ = np.zeros((10,), dtype=int)
generator_index = -1
for index in indices:
while generator_index < index:
vector = next(generator_list)
generator_index += 1
sum_ += vector
print(sum_)
outputs
[34 23 19 10 37 5 19 22 43 11]

How to reshape a vector to TensorFlow's filters?

I want to transfer some weights trained by another network to TensorFlow, the weights are stored in a single vector like this:
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]
By using numpy, I can reshape it to two 3 by 3 filters like this:
1 2 3 9 10 11
3 4 5 12 13 14
6 7 8 15 16 17
Thus, the shape of my filters are (1,2,3,3). However, in TensorFlow, the shape of filters are (3,3,2,1):
tf_weights = tf.Variable(tf.random_normal([3,3,2,1]))
After reshaping the tf_weights to the expected shape, the weight becomes a mess and I can't get the expected convolution result.
To be specific, when the shape of an image or filter is [number,channel,size,size], I wrote a convolution function and it gives the correct answer,but it's too slow:
def convol(images,weights,biases,stride):
"""
Args:
images:input images or features, 4-D tensor
weights:weights, 4-D tensor
biases:biases, 1-D tensor
stride:stride, a float number
Returns:
conv_feature: convolved feature map
"""
image_num = images.shape[0] #the number of input images or feature maps
channel = images.shape[1] #channels of an image,images's shape should be like [n,c,h,w]
weight_num = weights.shape[0] #number of weights, weights' shape should be like [n,c,size,size]
ksize = weights.shape[2]
h = images.shape[2]
w = images.shape[3]
out_h = (h+np.floor(ksize/2)*2-ksize)/2+1
out_w = out_h
conv_features = np.zeros([image_num,weight_num,out_h,out_w])
for i in range(image_num):
image = images[i,...,...,...]
for j in range(weight_num):
sum_convol_feature = np.zeros([out_h,out_w])
for c in range(channel):
#extract a single channel image
channel_image = image[c,...,...]
#pad the image
padded_image = im_pad(channel_image,ksize/2)
#transform this image to a vector
im_col = im2col(padded_image,ksize,stride)
weight = weights[j,c,...,...]
weight_col = np.reshape(weight,[-1])
mul = np.dot(im_col,weight_col)
convol_feature = np.reshape(mul,[out_h,out_w])
sum_convol_feature = sum_convol_feature + convol_feature
conv_features[i,j,...,...] = sum_convol_feature + biases[j]
return conv_features
Instead, by using tensorflow's conv2d like this:
img = np.zeros([1,3,224,224])
img = img - 1
img = np.rollaxis(img, 1, 4)
weight_array = googleNet.layers[1].weights
weight_array = np.reshape(weight_array,[64,3,7,7])
biases_array = googleNet.layers[1].biases
tf_weight = tf.Variable(weight_array)
tf_img = tf.Variable(img)
tf_img = tf.cast(tf_img,tf.float32)
tf_biases = tf.Variable(biases_array)
conv_feature = tf.nn.bias_add(tf.nn.conv2d(tf_img,tf_weight,strides=[1,2,2,1],padding='SAME'),tf_biases)
sess = tf.Session()
sess.run(tf.initialize_all_variables())
feautre = sess.run(conv_feature)
The feature map I got is wrong.

Don't use np.reshape. It might mess up the order of your values.
Use np.rollaxis instead:
>>> a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18])
>>> a = a.reshape((1,2,3,3))
>>> a
array([[[[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9]],
[[10, 11, 12],
[13, 14, 15],
[16, 17, 18]]]])
>>> b = np.rollaxis(a, 1, 4)
>>> b.shape
(1, 3, 3, 2)
>>> b = np.rollaxis(b, 0, 4)
>>> b.shape
(3, 3, 2, 1)
Note that the order of the two axes with size 3 haven't changed. If I were to label them, the two rollaxis operations caused the shapes to change as (1, 2, 31, 32) -> (1, 31, 32, 2) -> (31, 32, 2, 1). Your final array looks like:
>>> b
array([[[[ 1],
[10]],
[[ 2],
[11]],
[[ 3],
[12]]],
[[[ 4],
[13]],
[[ 5],
[14]],
[[ 6],
[15]]],
[[[ 7],
[16]],
[[ 8],
[17]],
[[ 9],
[18]]]])

Sample Tensor Manipulations
I dont know if this might be of help. Consider the Reshape ,Gather, Dynamic_partition and Split operations and adapt this to your needs.
In what comes below is the illustration of these operations that can be adapted to use in your situation. I copied this from my git repo. I will believe if you run this examples in ipython you can figure out what you really want and get even better insight.
Reshape ,Gather, Dynamic_partition and Split
Gather Operation ( tf.gather( ) )
Generate an array and test the gather operation. Note this approach for fast prototyping:
We generate an array in Numpy and test the operations of tensor flow on it.
Use: Gather slices from params according to indices.
indices must be an integer tensor of any dimension (usually 0-D or 1-D). This is best illustrated by an example:
array = np.array([[1,2,3],[4,9,6],[2,3,4],[7,8,0]])
array.shape
(4, 3)
In [27]:
gather_output0 = tf.gather(array,1)
gather_output01 = tf.gather(array,2)
gather_output02 = tf.gather(array,3)
gather_output11 = tf.gather(array,[1,2])
gather_output12 = tf.gather(array,[1,3])
gather_output13 = tf.gather(array,[3,2])
gather_output = tf.gather(array,[1,0,2])
gather_output1 = tf.gather(array,[1,1,2])
gather_output2 = tf.gather(array,[1,2,1])
In [28]:
with tf.Session() as sess:
print (gather_output0.eval());print("\n")
print (gather_output01.eval());print("\n")
print (gather_output02.eval());print("\n")
print (gather_output11.eval());print("\n")
print (gather_output12.eval());print("\n")
print (gather_output13.eval());print("\n")
print (gather_output.eval());print("\n")
print (gather_output1.eval());print("\n")
print (gather_output2.eval());print("\n")
#print (gather_output2.eval());print("\n")
[4 9 6]
[2 3 4]
[7 8 0]
[[4 9 6]
[2 3 4]]
[[4 9 6]
[7 8 0]]
[[7 8 0]
[2 3 4]]
[[4 9 6]
[1 2 3]
[2 3 4]]
[[4 9 6]
[4 9 6]
[2 3 4]]
[[4 9 6]
[2 3 4]
[4 9 6]]
And looking at this simple example:
Initialise simple array
test gather operation
In [11]:
array_simple = np.array([1,2,3])
In [15]:
print "shape of simple array is: ", array_simple.shape
shape of simple array is: (3,)
In [57]:
gather1 = tf.gather(array1,[0])
gather01 = tf.gather(array1,[1])
gather02 = tf.gather(array1,[2])
gather2 = tf.gather(array1,[1,2])
gather3 = tf.gather(array1,[0,1])
with tf.Session() as sess:
print (gather1.eval());print("\n")
print (gather01.eval());print("\n")
print (gather02.eval());print("\n")
print (gather2.eval());print("\n")
print (gather3.eval());print("\n")
[1]
[2]
[3]
[2 3]
[1 2]
tf.reshape( )
Note:
* Use the same array that was initiated
* Do reshape using tf.reshape( )
In [64]:
array.shape # Confirm array shape
Out[64]:
(4, 3)
In [74]:
print ("This is the array\n" ,array) # see the output and compare with the initial array,
This is the array
[[1 2 3]
[4 9 6]
[2 3 4]
[7 8 0]]
In [84]:
reshape_ops= tf.reshape(array,[-1,4]) # Note the parameters in reshpe
reshape_ops1= tf.reshape(array,[-1,3]) # Note the parameters in reshpe
reshape_ops2= tf.reshape(array,[-1,6]) # Note the parameters in reshpe
reshape_ops_back1= tf.reshape(array,[6,-1]) # Note the parameters in reshpe
reshape_ops_back2= tf.reshape(array,[3,-1]) # Note the parameters in reshpe
reshape_ops_back3= tf.reshape(array,[4,-1]) # Note the parameters in reshpe
In [86]:
with tf.Session() as sess:
print(reshape_ops.eval());print("\n")
print(reshape_ops1.eval());print("\n")
print(reshape_ops2.eval());print("\n")
print ("Output when we reverse the parameters:");print("\n")
print(reshape_ops_back1.eval());print("\n")
print(reshape_ops_back2.eval());print("\n")
print(reshape_ops_back3.eval());print("\n")
[[1 2 3 4]
[9 6 2 3]
[4 7 8 0]]
[[1 2 3]
[4 9 6]
[2 3 4]
[7 8 0]]
[[1 2 3 4 9 6]
[2 3 4 7 8 0]]
Output when we reverse the parameters:
[[1 2]
[3 4]
[9 6]
[2 3]
[4 7]
[8 0]]
[[1 2 3 4]
[9 6 2 3]
[4 7 8 0]]
[[1 2 3]
[4 9 6]
[2 3 4]
[7 8 0]]
Note: The input size and output size must be the same. ---otherwise it gives error. Simple way to check this out is to make sure the input can be paritioned into the the reshape parameters by doing simple multiplications.
Dynamic_cell_partitions
This is declared as :
tf.dynamic_partition (array, partitions, num_partitions, name=None)
Note:
* we decalare number_partitions --- number of partitions
* Use our array initialised earlier
* We declare the partition as [0 1 0 1] . This signifies the partitions we want 0's fall to one partition and 1 the other partitions given that we have two num_partitions=2.
* The output is a list
In [96]:
print ("This is the array\n" ,array) # This is output array
This is the array
[[1 2 3]
[4 9 6]
[2 3 4]
[7 8 0]]
We show how to make two and three partitions below
In [123]:
num_partitions = 2
num_partitions1 = 3
partitions = [0, 0, 1, 1]
partitions1 = [0 ,1 ,1, 2 ]
In [119]:
dynamic_ops =tf.dynamic_partition(array, partitions, num_partitions, name=None) # 2 partitions
dynamic_ops1 =tf.dynamic_partition(array, partitions1, num_partitions1, name=None) # 3 partitions
In [125]:
with tf.Session() as sess:
run = sess.run(dynamic_ops)
run1 = sess.run(dynamic_ops1)
print("Output for 2 partitions: ")
print (run[0]);print("\n")
print(run[1]) ;print("\n")# Compare result with initial array. Out is list
print("Output for three partitions: ")
print (run1[0]);print("\n")
print (run1[1]);print("\n")
print (run1[2]);print("\n")
Output for 2 partitions:
[[1 2 3]
[4 9 6]]
[[2 3 4]
[7 8 0]]
Output for three partitions:
[[1 2 3]]
[[4 9 6]
[2 3 4]]
[[7 8 0]]
tf.split( )
Make sure you use an up to date tensorflow version. Otherwise in older versions, this implemetation will give error
This is specified in the documentation as below:
tf.split(value, num_or_size_splits, axis=0, num=None, name='split').
It splits a tensor into subtensors. This is best illustrated by an example:
* we define (5,30) aray in numpy
* we split the array along axis 1
* We specify the number of splits as 1-Dimen Tensor along axis 1. So we have 3 splits.
Specify an array
Create a (5 by 30) numpy array. The syntax using numpy is shown below
In [2]:
ArrayBeforeSplitting = np.arange(150).reshape(5,30)
print ("Array shape without split operation is : " ,ArrayBeforeSplitting.shape)
('Array shape without split operation is : ', (5, 30))
specify number of splits
In [3]:
split_1D = tf.Variable([8,13,9])
print("specify number of partions using 1-Dimen Variable:" , tf.shape(split_1D))
('specify number of partions using 1-Dimen Variable:', <tf.Tensor 'Shape:0' shape=(1,) dtype=int32>)
Use tf.split
Make 3 splits aong y axis so that we have (5,8) ,(5,13),(5,9) splits. The axis 1 add up to give 30-- we can see axis 1 has 30 elements so the partition along that axis should add up to 30 otherwise it gives error.
In [6]:
split1,split2,split3 = tf.split(ArrayBeforeSplitting,split_1D,1)
# we have 3 splits along axis 1 specified spcifically
# by the split_1D . That is split axis 1D (with 30 elements) into partions with 8 ,13, and 9 elements while the x axis
#remains constant
In [7]:
#INitialise global variables. because split_ID is a variable and needs to be initialised before being
#used in a computational graph
init_op = tf.global_variables_initializer()
In [16]:
with tf.Session() as sess:
sess.run(init_op) # run variable initialisation.
result=split1.eval();print("\n")
print(result)
print("the shape of the first split operation is : ",result.shape)
result2=split2.eval();print("\n")
print(result2)
print("the shape of the second split operation is : ",result2.shape)
result3=split3.eval();print("\n")
print(result3)
print("the shape of the third split operation is : ",result3.shape)
[[ 0 1 2 3 4 5 6 7]
[ 30 31 32 33 34 35 36 37]
[ 60 61 62 63 64 65 66 67]
[ 90 91 92 93 94 95 96 97]
[120 121 122 123 124 125 126 127]]
('the shape of the first split operation is : ', (5, 8))
[[ 8 9 10 11 12 13 14 15 16 17 18 19 20]
[ 38 39 40 41 42 43 44 45 46 47 48 49 50]
[ 68 69 70 71 72 73 74 75 76 77 78 79 80]
[ 98 99 100 101 102 103 104 105 106 107 108 109 110]
[128 129 130 131 132 133 134 135 136 137 138 139 140]]
('the shape of the second split operation is : ', (5, 13))
Hope this helps!

Python: Shrink/Extend 2D arrays in fractions

There are 2D arrays of numbers as outputs of some numerical processes in the form of 1x1, 3x3, 5x5, ... shaped, that correspond to different resolutions.
In a stage an average i.e., 2D array value in the shape nxn needs to be produced.
If the outputs were in consistency of shape i.e., say all in 11x11 the solution was obvious, so:
element_wise_mean_of_all_arrays.
For the problem of this post however the arrays are in different shapes so the obvious way does not work!
I thought it might be some help by using kron function however it didn't. For example, if array is in shape of 17x17 how to make it 21x21. So for all others from 1x1,3x3,..., to build a constant-shaped array, say 21x21.
Also it can be the case that the arrays are smaller and bigger in shape compared to the target shape. That is an array of 31x31 to be shruk into 21x21.
You could imagine the problem as a very common task for images, being shrunk or extended.
What are possible efficient approaches to do the same jobs on 2D arrays, in Python, using numpy, scipy, etc?
Updates:
Here is a bit optimized version of the accepted answer bellow:
def resize(X,shape=None):
if shape==None:
return X
m,n = shape
Y = np.zeros((m,n),dtype=type(X[0,0]))
k = len(X)
p,q = k/m,k/n
for i in xrange(m):
Y[i,:] = X[i*p,np.int_(np.arange(n)*q)]
return Y
It works perfectly, however do you all agree it is the best choice in terms of the efficiency? If not any improvement?
# Expanding ---------------------------------
>>> X = np.array([[1,2,3],[4,5,6],[7,8,9]])
[[1 2 3]
[4 5 6]
[7 8 9]]
>>> resize(X,[7,11])
[[1 1 1 1 2 2 2 2 3 3 3]
[1 1 1 1 2 2 2 2 3 3 3]
[1 1 1 1 2 2 2 2 3 3 3]
[4 4 4 4 5 5 5 5 6 6 6]
[4 4 4 4 5 5 5 5 6 6 6]
[7 7 7 7 8 8 8 8 9 9 9]
[7 7 7 7 8 8 8 8 9 9 9]]
# Shrinking ---------------------------------
>>> X = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[13 14 15 16]]
>>> resize(X,(2,2))
[[ 1 3]
[ 9 11]]
Final note: that the code above easily could be translated to Fortran for the highest performance possible.

I'm not sure I understand exactly what you are trying but if what I think the simplest way would be:
wanted_size = 21
a = numpy.array([[1,2,3],[4,5,6],[7,8,9]])
b = numpy.zeros((wanted_size, wanted_size))
for i in range(wanted_size):
for j in range(wanted_size):
idx1 = i * len(a) / wanted_size
idx2 = j * len(a) / wanted_size
b[i][j] = a[idx1][idx2]
You could maybe replace the b[i][j] = a[idx1][idx2] with some custom function like the average of a 3x3 matrix centered in a[idx1][idx2] or some interpolation function.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Numpy: shuffle arrays in unison multiple times with different seeds - python

Related

Creating a new matrix from a matrix of index in numpy

How to efficiently subtract values from each column with numpy

Using generator items selectively

How to reshape a vector to TensorFlow's filters?

Python: Shrink/Extend 2D arrays in fractions

Categories

Resources