List of 2D-arrays/matrices to 2d array in python - python

I have a list of length 3625 whose elements are (101, 101) matrices/2D arrays. I want to convert/reshape it into a 2D array of size e.g. (725, 5), or directly into a dataframe of the same size, so that each element of the new structure contains one of those 2D arrays.
I tried it like this, also with np.ravel and reshape, but I can't seem to get it into the right shape.
list = np.zeros((725,5))
for i in y:
list = np.append(list, [[i]])

Your question is not clear; please try to give a more concrete example. As far as I understand, you are looking for something like this:
import numpy as np
data = np.zeros((3625, 101, 101))
print(data.shape)
reshaped_data = np.reshape(data, (725, 5, 101, 101))
print(reshaped_data.shape)
Output:
(3625, 101, 101)
(725, 5, 101, 101)
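If the data really starts out as a Python list of separate arrays, as in the question, one possible sketch is shown below. It stacks the list into a single 4D array and, alternatively, builds a (725, 5) object array whose cells each hold one matrix. The names `grid` and `cells` are made up for illustration, and small (3, 3) matrices stand in for the (101, 101) ones to keep the example light.

```python
import numpy as np

# Stand-in for the list `y` from the question; (3, 3) instead of (101, 101)
y = [np.full((3, 3), k) for k in range(3625)]

# Option 1: one regular 4D array, indexed as grid[row, col] -> 2D matrix
grid = np.stack(y).reshape(725, 5, 3, 3)

# Option 2: a genuine 2D array of shape (725, 5) whose cells hold the matrices.
# Filling a pre-allocated object array stops NumPy from merging the matrices
# into extra dimensions.
cells = np.empty(len(y), dtype=object)
for k, matrix in enumerate(y):
    cells[k] = matrix
cells = cells.reshape(725, 5)

print(grid.shape)
print(cells.shape)
```

Option 1 is usually the more convenient of the two for numerical work. Option 2 is closer to the "one matrix per cell" layout asked for, and an object array like this can probably be passed to a dataframe constructor if a dataframe is really needed.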

Related

How to write a multidimensional array into a fits with proper axes formatting?

I have two arrays packed together like the following:
mainarray = array1, array2
The size of array1 is let's say 50. However, array2 is multidimensional of the shape [100, 100, 50], where say [100, 100] is the sky coordinates, and for each sky coordinate we have 50 values corresponding to array1. Now I want to write this data into a fits file, with a proper formatting of first two axes corresponding to [100, 100] data points, second axes corresponding to [50, 50] values of array1 and array2[axis=2]. What is the proper way of doing this? Will just
hdu = fits.PrimaryHDU(mainarray)
hdu.writeto('skymap.fits', clobber = True)
work?

Splitting a numpy array into two subsets of different sizes

I have a numpy array of a shape (400, 3, 3, 3) and I want to split it into two parts, so I would get arrays like (100, 3, 3, 3) and (300, 3, 3, 3).
I was playing with numpy split methods, e.g.:
subsets = np.array_split(arr, 2)
which gives me almost what I want, but it divides the original array into two halves of the same size, and I don't know how to specify the sizes. It would probably be easy with some indexing (I guess), but I'm not sure how to do it.
As mentioned in my comment, you can use the Ellipsis notation to specify all axes:
x, y = arr[:100, ...], arr[100:, ...]
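As a small sketch of the same idea: slicing along the first axis does not actually need the Ellipsis, and np.split accepts a list of split indices, so uneven parts can be requested directly. The array below is a zero-filled stand-in with the shape from the question.

```python
import numpy as np

arr = np.zeros((400, 3, 3, 3))

# Plain slicing on the first axis; trailing axes are kept implicitly
head, tail = arr[:100], arr[100:]

# np.split with a list of indices cuts at those positions along axis 0
first, rest = np.split(arr, [100])

print(head.shape, tail.shape)
print(first.shape, rest.shape)
```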

Aligning N-dimensional numpy arrays

I am putting data in numpy arrays for comparisons. Because of the way the data is stored, the dimensions are sometimes out of order. For example, if the first array has the shape (10, 20, 30, 40), sometimes the second array will have the shape (10, 20, 40, 30). We can assume that the lengths of the dimensions will be unique.
Is there an easy way to convert the shape of the second array to the shape of the first without knowing the number of dimensions or the length of the dimensions beforehand? I think I can do it with a long series of elif statements and transpose operations, but I'm hoping there is a cleaner method available.
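For each axis length of the target shape (A), use shape.index to find where that axis currently sits in B, then use transpose to re-order the axes. This relies on the dimension lengths being unique:
import numpy as np
A = np.ones((10, 20, 40, 30))
B = np.ones((10, 20, 30, 40))
# new_order[i] is the axis of B that must end up at position i
new_order = [B.shape.index(n) for n in A.shape]
B = B.transpose(new_order)
print(B.shape)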
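A minimal sketch of this as a reusable helper follows. The function name `align_axes` is hypothetical, not part of NumPy. It looks up each target axis length in the source array's shape, which gives the axis order that transpose expects, and it is exercised on a cyclic permutation of three axes, where the direction of the lookup matters.

```python
import numpy as np

def align_axes(source, target_shape):
    """Transpose `source` so that its shape equals `target_shape`.

    Hypothetical helper; assumes all axis lengths are unique.
    """
    target_shape = tuple(target_shape)
    if len(set(target_shape)) != len(target_shape):
        raise ValueError("axis lengths must be unique")
    if sorted(source.shape) != sorted(target_shape):
        raise ValueError("shapes are not permutations of each other")
    # order[i] is the axis of `source` that ends up at position i
    order = [source.shape.index(n) for n in target_shape]
    return source.transpose(order)

A = np.ones((10, 20, 30))
B = np.arange(30 * 10 * 20).reshape(30, 10, 20)

aligned = align_axes(B, A.shape)
print(aligned.shape)
```

The two checks at the top are a design choice: without unique lengths, shape.index would silently pick the first matching axis.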

Creating numpy array of matrices

I am trying to create a multi-dimensional numpy array where the data type is a matrix. That is, I would like to be able to store 3x3 numpy matrices in a multi-dimensional array. For example, I would like to create a numpy array of size 100 x 100 x 100, so when I refer to an index like:
x [10, 10, 10] <- should return a 3x3 numpy matrix
I can do something like:
x = np.array((100, 100, 100), np.matrix)
However, I am not sure how to define the size of the matrix in this case. Another option is to do something like:
x = np.array((100, 100, 100, 3, 3))
However, this way I am not able to take advantage of the matrix object class and its functions.
[EDIT]
One thing I now realised is that I can cast an array to numpy matrix. So, using something like:
x = np.array((100, 100, 100, 3, 3))
a = np.matrix(x[1, 1, 1])
However, I wonder if there is a more direct way.
[MORE EDIT]
After reading comments, it seems the numpy matrix class is not really that useful. I can do something like the following to compute the inverse, for example:
x = np.array((100, 100, 100, 3, 3))
a = np.matrix(x[1, 1, 1])
a_inv = np.linalg.inv(a)
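One caveat about the snippets above: np.array((100, 100, 100, 3, 3)) builds a 1D array containing those five numbers, not an array with that shape. A minimal sketch of the presumably intended layout, using a plain 5D ndarray, is below. It uses a small 4 x 4 x 4 grid instead of 100 x 100 x 100 and fills every cell with a scaled identity matrix purely so that the inverse exists; both choices are assumptions for illustration.

```python
import numpy as np

# A 4 x 4 x 4 grid of 3x3 matrices (stand-in for 100 x 100 x 100)
x = np.zeros((4, 4, 4, 3, 3))
x[...] = 2.0 * np.eye(3)  # broadcast an invertible matrix into every cell

# Indexing the first three axes returns one 3x3 array
a = x[1, 1, 1]
a_inv = np.linalg.inv(a)

# np.linalg.inv also works on stacks: it inverts every trailing 3x3 block
all_inv = np.linalg.inv(x)

# The @ operator multiplies the trailing 3x3 blocks pairwise
product = x @ all_inv

print(a.shape, all_inv.shape)
```

With this layout the matrix class is not needed, since indexing, inversion and multiplication all operate on the last two axes.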

Python time-lat-lon array manipulation and grouping

For a t-x-y array representing time-latitude-longitude, where the values of the t-x-y grid hold arbitrary measured variables, how can I 'group' x-y slices of the array for a given time condition?
For example, if a companion t-array is a 1D list of datetimes, how can I find the elementwise mean of the x-y grids that have months equal to 1? If t has only 10 elements where month = 1, then I want a (10, len(x), len(y)) array. From there I know I can do np.mean(out, axis=0) to get my desired mean values across the x-y grid, where out is the result of the array manipulation.
The shape of t-x-y is approximately (2000, 50, 50), that is a (50, 50) grid of values for 2000 different times. Assume that the number of unique conditions (whether I'm slicing by month or year) are << than the total number of elements in the t array.
What is the most pythonic way to achieve this? This operation will be repeated with many datasets so a computationally efficient solution is preferred. I'm relatively new to python (I can't even figure out how to create an example array for you to test with) so feel free to recommend other modules that may help. (I have looked at Pandas, but it seems like it mainly handles 1d time-series data...?)
Edit:
This is the best I can do as an example array:
>>> t = np.repeat([1,2,3,4,5,6,7,8,9,10,11,12],83)
>>> t.shape
(996,)
>>> a = np.random.randint(1,101,2490000).reshape(996, 50, 50)
>>> a.shape
(996, 50, 50)
>>> list(set(t))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
So a is the array of random data and t is (say) your array representing months of the year, in this case just plain integers. In this example there are 83 instances of each month. How can we separate out the 83 x-y slices of a that correspond to t = 1 (to create a monthly mean dataset)?
One possible answer to my own question, using numpy.where:
To find the slices of a, where t = 1:
>>> import numpy as np
>>> out = a[np.where(t == 1),:,:]
although this gives the slightly confusing (to me at least) output of:
>>> out.shape
(1, 83, 50, 50)
but if we follow through with my needing the mean
>>> out2 = np.mean(np.mean(out, axis = 0), axis = 0)
reduces the result to the expected:
>>> out2.shape
(50, 50)
Can anyone improve on this or see any issues here?
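One possible simplification, sketched under the same example setup: indexing with the boolean mask t == 1 directly selects along the first axis and avoids the extra leading dimension that the tuple returned by np.where introduces. The names `jan`, `jan_mean` and `monthly_means` are made up for illustration, and the seeded random generator is only there to make the sketch reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.repeat(np.arange(1, 13), 83)          # 83 entries per month
a = rng.integers(1, 101, size=(996, 50, 50))  # random stand-in data

# A boolean mask on the first axis keeps the x-y grids where t == 1
jan = a[t == 1]
jan_mean = jan.mean(axis=0)

# The same idea for every distinct value of t, stacked into one array
months = np.unique(t)
monthly_means = np.stack([a[t == m].mean(axis=0) for m in months])

print(jan.shape, jan_mean.shape, monthly_means.shape)
```

Since the number of distinct conditions is assumed to be small compared with the length of t, a Python loop over the unique values should be cheap relative to the vectorised mean inside it.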
