Keeping track of index changes in numpy.reshape - python

While using numpy.reshape in Python, is there a way to keep track of the change in indices?
For example, if a numpy array with the shape (m,n,l,k) is reshaped into an array with the shape (m*n,k*l); is there a way to get the initial index ([x,y,w,z]) for the current [X,Y] index and vice versa?

Yes, there is: it's called raveling and unraveling the index. For example, suppose you have two arrays:
import numpy as np
arr1 = np.arange(10000).reshape(20, 10, 50)
arr2 = arr1.reshape(20, 500)
Say you want to find where the element at (10, 52) in arr2 (that is, arr2[10, 52]) sits in arr1:
>>> np.unravel_index(np.ravel_multi_index((10, 52), arr2.shape), arr1.shape)
(10, 1, 2)
or in the other direction:
>>> np.unravel_index(np.ravel_multi_index((10, 1, 2), arr1.shape), arr2.shape)
(10, 52)
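The two calls above can be wrapped in one small helper. This is only a sketch: convert_index is a hypothetical name, not a NumPy function, and it assumes both shapes hold the same number of elements and that the reshape used the default C (row-major) order.

```python
import numpy as np

def convert_index(index, from_shape, to_shape):
    """Map a multi-index in from_shape to the equivalent one in to_shape.

    Hypothetical helper: assumes equal element counts and C order.
    """
    flat = np.ravel_multi_index(index, from_shape)   # position in the flat buffer
    return tuple(int(i) for i in np.unravel_index(flat, to_shape))

arr1 = np.arange(10000).reshape(20, 10, 50)
arr2 = arr1.reshape(20, 500)

idx3 = convert_index((10, 52), arr2.shape, arr1.shape)   # 2-D index -> 3-D index
idx2 = convert_index(idx3, arr1.shape, arr2.shape)       # and back again
print(idx3, idx2)   # (10, 1, 2) (10, 52)

# Both indices address the same element.
assert arr1[idx3] == arr2[idx2]
```

Going through the flat index keeps the helper independent of how many dimensions either shape has.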

You don't keep track of it, but you can calculate it. The original m x n pair of dimensions is mapped onto the new m*n dimension in row-major order, e.g. n*x+y == X (and likewise k*w+z == Y for the second pair). We can verify this with a couple of multidimensional ravel/unravel functions (as answered by @MSeifert).
In [671]: m,n,l,k=2,3,4,5
In [672]: np.ravel_multi_index((1,2,3,4), (m,n,l,k))
Out[672]: 119
In [673]: np.unravel_index(52, (m*n,l*k))
Out[673]: (2, 12)
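The same mapping can also be written as plain arithmetic and checked against real arrays. A minimal sketch, assuming the default C order and a reshape from (m, n, l, k) to (m*n, l*k); the array names a and b are just for illustration:

```python
import numpy as np

m, n, l, k = 2, 3, 4, 5
a = np.arange(m * n * l * k).reshape(m, n, l, k)
b = a.reshape(m * n, l * k)

# Forward: (x, y, w, z) -> (X, Y)
x, y, w, z = 1, 2, 3, 4
X, Y = n * x + y, k * w + z
assert a[x, y, w, z] == b[X, Y]

# Backward: (X, Y) -> (x, y, w, z) using integer division and remainder
x2, y2 = divmod(X, n)
w2, z2 = divmod(Y, k)
assert (x2, y2, w2, z2) == (x, y, w, z)
```

Here divmod undoes each merged axis separately, which is all np.unravel_index has to do when the axes are merged pairwise.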

Related

Trouble using np.append with 2d array

I'm trying to append two NumPy arrays together, but it gives me this error:
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)
I know this means the shapes of the arrays are not compatible, but I don't understand why or how to fix it.
arr1 = np.array([
    [10.24217065, 5.63381577],
    [ 2.71521988, -3.33068004],
    [-3.43022486, 16.40921457],
    [ 1.4461307, 12.59851726],
    [12.34829023, 29.67531647],
    [16.65382971, 9.8915765 ]])
arr2 = np.array([4.62643996, 5.14587112])
arr3 = np.append(arr1, arr2, axis=0)
Simply make them the same dimension:
arr3 = np.append(arr1, [arr2], axis=0)
arr2 has only a single dimension, since its shape is (2,). arr1 on the other hand has two dimensions, since its shape is (6, 2). These aren't compatible for np.append, as it says.
You can make arr2 have the required number of dimensions in many ways. One of them is reshaping:
arr3 = np.append(arr1, arr2.reshape(1, 2), axis=0)
At this point, the arrays have shape (6, 2) and (1, 2), which np.append knows how to deal with. The output will have shape (7, 2).
The error message tells you exactly what the problem is: the first array has two dimensions and the second has one. Another pair of [ ] around the second array will do the job:
arr2 = np.array([[4.62643996, 5.14587112]])
arr3 = np.vstack((arr1, arr2))
or, if you really want to use append, my favorite (starting from the original 1-D arr2) is
arr3 = np.append(arr1, arr2[np.newaxis, :], axis=0)
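To compare the options side by side, here is a small sketch that checks the resulting shapes. A zero-filled (6, 2) array stands in for the asker's arr1, and the variable names are illustrative. It also shows why axis=0 matters: without it, np.append flattens both inputs.

```python
import numpy as np

arr1 = np.zeros((6, 2))                      # stand-in for the (6, 2) data
arr2 = np.array([4.62643996, 5.14587112])    # shape (2,)

row = arr2[np.newaxis, :]                    # shape (1, 2)
via_append = np.append(arr1, row, axis=0)    # rows stacked
via_vstack = np.vstack((arr1, arr2))         # vstack promotes the 1-D input to a row
flattened = np.append(arr1, row)             # no axis: both inputs are flattened first

print(via_append.shape, via_vstack.shape, flattened.shape)
# (7, 2) (7, 2) (14,)
```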

Get average of the numpy ndarray

I have a numpy.ndarray A of shape (8, 64, 64, 64, 1). We can use np.mean or np.average to calculate the mean of a NumPy array, but I want the means of the 8 (64, 64, 64) arrays. That is, I only want 8 values, each calculated as the mean of one (64, 64, 64) block. Of course I can use a for loop, or [np.mean(A[i]) for i in range(A.shape[0])]. I am wondering if there is a NumPy method to do this.
You can use np.mean's axis keyword argument:
np.mean(A, (1, 2, 3, 4))
The same works with np.average, too.
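As a quick sanity check, the axis-tuple call can be compared against the loop from the question. This sketch uses a smaller random array than the asker's (8, 64, 64, 64, 1) one, purely to keep it fast; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((8, 16, 16, 16, 1))    # smaller stand-in for the real data

vectorized = np.mean(A, axis=(1, 2, 3, 4))                      # reduce every axis but the first
looped = np.array([np.mean(A[i]) for i in range(A.shape[0])])   # the loop from the question

assert vectorized.shape == (8,)
assert np.allclose(vectorized, looped)
```

An equivalent spelling that avoids listing the axes is A.reshape(A.shape[0], -1).mean(axis=1).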

Indexing a multidimensional array from a list of indices in NumPy

Say I have an array of the form
array = np.random.rand(50, 50, 2)
and I have a list of tuples of indices, which will contain duplicates:
indices = [(0, 2), (0, 3), (0, 2), (1, 1), (0, 3), (0, 2)]
I'm trying to figure out the best way to create a scatterplot of the array elements taken from the indices, with the 3rd dimension of the array giving the x and y coordinates of the point to be plotted. I have tried a few different things but realized that simple array indexing won't give me what I'm looking for because of the way that broadcasting works. I can implement this by iterating over the array and adding the points I want to a new array but that strikes me as unpythonic and I want to make sure I learn correct habits.
I have a solution, inspired by hpaulj above. I can convert the list of tuples of indices to an array, and use the long dimension of the resulting array to index my large array, like so:
index_array = np.array(indices)
reduced_array = array[index_array[:,0],index_array[:,1],:]
and then use reduced_array as an input to my scatter plot.
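A self-contained version of that idea, with the shapes spelled out, might look like the sketch below. The names rows, cols, x and y are illustrative, and the plotting call itself is left out so the snippet only needs NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
array = rng.random((50, 50, 2))
indices = [(0, 2), (0, 3), (0, 2), (1, 1), (0, 3), (0, 2)]

# Split the list of (row, col) pairs into two parallel index arrays.
rows, cols = np.array(indices).T

# Integer-array indexing pairs rows[i] with cols[i]; duplicates are kept.
reduced_array = array[rows, cols]      # shape (6, 2)

# The last axis holds the coordinates of each point to plot.
x, y = reduced_array[:, 0], reduced_array[:, 1]
```

The x and y arrays could then be handed to a scatter-plot function such as matplotlib's plt.scatter(x, y).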

Iterating over 3D numpy using one dimension as iterator remaining dimensions in the loop

Despite there being a number of similar questions related to iterating over a 3D array and after trying out some functions like nditer of numpy, I am still confused on how the following can be achieved:
I have a signal of dimensions (30, 11, 300) which is 30 trials of 11 signals containing 300 signal points.
Let this signal be denoted by the variable x_
I have another function which takes as input a (11, 300) matrix and plots it on 1 graph (11 signals containing 300 signal points plotted on a single graph). Let this function be sliding_window_plot.
Currently, I can get it to do this:
x_plot = x_[0,:,:]
for i in range(x_.shape[0]):
    sliding_window_plot(x_plot[:,:])
which plots THE SAME (first trial) 11 signals containing 300 points on 1 plot, 30 times.
I want it to plot the i'th set of signals, not the first (0th) trial every time. Any hints on how to attempt this?
You should be able to iterate over the first dimension with a for loop:
for s in x_:
    sliding_window_plot(s)
with each iteration s will be the next array of shape (11, 300).
In general for all nD-arrays where n>1, you can iterate over the very first dimension of the array as if you're iterating over any other iterable. For checking whether an array is an iterable, you can use np.iterable(arr). Here is an example:
In [9]: arr = np.arange(3 * 4 * 5).reshape(3, 4, 5)
In [10]: arr.shape
Out[10]: (3, 4, 5)
In [11]: np.iterable(arr)
Out[11]: True
In [12]: for a in arr:
    ...:     print(a.shape)
    ...:
(4, 5)
(4, 5)
(4, 5)
So, in each iteration we get a matrix (of shape (4, 5)) as output. In total, 3 such outputs constitute the 3D array of shape (3, 4, 5)
If, for some reason, you want to iterate over another dimension, you can use numpy.rollaxis to move the desired axis to the first position and then iterate over it, as described in the question "Iterating over arbitrary dimension of numpy array".
NOTE: numpy.rollaxis is only maintained for backwards compatibility, so it is recommended to use numpy.moveaxis instead for moving the desired axis to the first position.
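As a sketch of the moveaxis approach, the loop below iterates over the second axis of the same (3, 4, 5) example array. The list name slices is only there so the result can be inspected.

```python
import numpy as np

arr = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# Move axis 1 to the front, giving shape (4, 3, 5); iterating then walks axis 1.
slices = []
for s in np.moveaxis(arr, 1, 0):
    slices.append(s)
    print(s.shape)    # (3, 5), printed 4 times

# Each slice matches ordinary indexing along axis 1.
assert all(np.array_equal(s, arr[:, j, :]) for j, s in enumerate(slices))
```

np.moveaxis returns a view rather than a copy, so this does not duplicate the data.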
You are hardcoding the 0th slice outside the for loop. You need to create x_plot inside the loop. In fact, you can simplify your code by not using x_plot at all:
for i in range(x_.shape[0]):
    sliding_window_plot(x_[i])

Python time-lat-lon array manipulation and grouping

For a t-x-y array representing time-latitude-longitude, where the values of the t-x-y grid hold arbitrary measured variables, how can I 'group' x-y slices of the array for a given time condition?
For example, if a companion t-array is a 1d list of datetimes, how can I find the elementwise mean of the x-y grids that have months equal to 1? If t has only 10 elements where month = 1, then I want a (10, len(x), len(y)) array. From here I know I can do np.mean(out, axis=0) to get my desired mean values across the x-y grid, where out is the result of the array manipulation.
The shape of t-x-y is approximately (2000, 50, 50), that is, a (50, 50) grid of values for 2000 different times. Assume that the number of unique conditions (whether I'm slicing by month or year) is much smaller than the total number of elements in the t array.
What is the most pythonic way to achieve this? This operation will be repeated with many datasets so a computationally efficient solution is preferred. I'm relatively new to python (I can't even figure out how to create an example array for you to test with) so feel free to recommend other modules that may help. (I have looked at Pandas, but it seems like it mainly handles 1d time-series data...?)
Edit:
This is the best I can do as an example array:
>>> t = np.repeat([1,2,3,4,5,6,7,8,9,10,11,12],83)
>>> t.shape
(996,)
>>> a = np.random.randint(1,101,2490000).reshape(996, 50, 50)
>>> a.shape
(996, 50, 50)
>>> list(set(t))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
So a is the array of random data, and t is (say) your array representing months of the year, in this case just plain integers. In this example there are 83 instances of each month. How can we separate out the 83 x-y slices of a that correspond to when t = 1 (to create a monthly mean dataset)?
One possible answer to the (my) question, using numpy.where
To find the slices of a, where t = 1:
>>> import numpy as np
>>> out = a[np.where(t == 1),:,:]
although this gives the slightly confusing (to me at least) output of:
>>> out.shape
(1, 83, 50, 50)
but if we follow through with my needing the mean
>>> out2 = np.mean(np.mean(out, axis = 0), axis = 0)
reduces the result to the expected:
>>> out2.shape
(50, 50)
Can anyone improve on this or see any issues here?
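One possible simplification, offered as a sketch rather than the definitive approach: indexing with the boolean mask t == 1 directly avoids the extra leading axis, so a single mean over axis 0 is enough. The example data mirrors the arrays above, and the names jan_mean and monthly are illustrative.

```python
import numpy as np

t = np.repeat(np.arange(1, 13), 83)              # month label for each time step
rng = np.random.default_rng(0)
a = rng.integers(1, 101, size=(996, 50, 50))     # random stand-in data

out = a[t == 1]                  # boolean mask on the first axis: shape (83, 50, 50)
jan_mean = out.mean(axis=0)      # shape (50, 50)

# Same values as the np.where version, once its index array is unpacked.
assert np.array_equal(out, a[np.where(t == 1)[0]])

# One mean grid per unique label, stacked into shape (12, 50, 50).
monthly = np.stack([a[t == month].mean(axis=0) for month in np.unique(t)])
```

The extra axis in the original comes from passing the whole tuple returned by np.where as a single index; taking its first element, or using the mask directly, removes it.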
