Splitting a numpy array into two subsets of different sizes - python

I have a numpy array of a shape (400, 3, 3, 3) and I want to split it into two parts, so I would get arrays like (100, 3, 3, 3) and (300, 3, 3, 3).
I was playing with numpy split methods, e.g.:
subsets = np.array_split(arr, 2)
which almost gives me what I want, but it divides the original array into two halves of the same size, and I don't know how to specify the sizes. It would probably be easy with some indexing (I guess), but I'm not sure how to do it.

As mentioned in my comment, you can use the Ellipsis notation to specify all axes:
x, y = arr[:100, ...], arr[100:, ...]
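A minimal sketch of the same split, assuming a zero-filled placeholder array in place of the real data. It compares the slicing form above with np.split, which accepts a list of cut indices instead of a number of equal parts. The names x2 and y2 are only illustrative.

```python
import numpy as np

arr = np.zeros((400, 3, 3, 3))  # placeholder data with the shape from the question

# Slicing along the first axis; the trailing "..." is optional here.
x, y = arr[:100, ...], arr[100:, ...]

# np.split accepts a list of indices at which to cut along axis 0.
x2, y2 = np.split(arr, [100])

print(x.shape, y.shape)    # (100, 3, 3, 3) (300, 3, 3, 3)
print(x2.shape, y2.shape)  # (100, 3, 3, 3) (300, 3, 3, 3)
```

Both forms return views of arr rather than copies, so writing into x or y also changes arr.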

Related

Selecting last column of a Numpy array while maintaining the number of dimensions? [duplicate]

I'm using numpy and want to index a row without losing the dimension information.
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10,:]
xslice.shape # >> (10,)
In this example xslice is now 1 dimension, but I want it to be (1,10).
In R, I would use X[10,:,drop=F]. Is there something similar in numpy? I couldn't find it in the documentation and didn't see a similar question asked.
Thanks!
Another solution is to do
X[[10],:]
or
I = np.array([10])
X[I,:]
The dimensionality of an array is preserved when indexing is performed by a list (or an array) of indexes. This is nice because it leaves you with the choice between keeping the dimension and squeezing.
It's probably easiest to do X[None, 10, :] or equivalently (but more readable) X[np.newaxis, 10, :]. None or np.newaxis adds a dimension of length 1 to the array, so that you're back to the original number of dimensions after the integer index eliminates one.
As far as why it's not the default, personally, I find that constantly having arrays with singleton dimensions gets annoying very quickly. I'd guess the numpy devs felt the same way.
Also, numpy handles broadcasting arrays very well, so there's usually little reason to retain the dimension of the array the slice came from. If you did, then things like:
a = np.zeros((100,100,10))
b = np.zeros((100,10))
a[0,:,:] = b
either wouldn't work or would be much more difficult to implement.
(Or at least that's my guess at the numpy dev's reasoning behind dropping dimension info when slicing)
I found a few reasonable solutions.
1) use numpy.take(X,[10],0)
2) use this strange indexing X[10:11:, :]
Ideally, this should be the default. I never understood why dimensions are ever dropped. But that's a discussion for numpy...
Here's an alternative I like better. Instead of indexing with a single number, index with a range. That is, use X[10:11,:]. (Note that 10:11 does not include 11).
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10:11,:]
xslice.shape # >> (1,10)
This makes it easy to understand with more dimensions too, no None juggling and figuring out which axis to use which index. Also no need to do extra bookkeeping regarding array size, just i:i+1 for any i that you would have used in regular indexing.
b = np.ones((2, 3, 4))
b.shape # >> (2, 3, 4)
b[1:2,:,:].shape # >> (1, 3, 4)
b[:, 2:3, :].shape # >> (2, 1, 4)
To add to the solution involving indexing by lists or arrays by gnebehay, it is also possible to use tuples:
X[(10,),:]
Dropped dimensions are especially annoying if you're indexing by an array that might be length 1 at runtime. For that case, there's np.ix_:
some_array[np.ix_(row_index,column_index)]
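A small sketch of the np.ix_ case, assuming placeholder data and hypothetical index arrays (row_index happens to have length 1 here). The result keeps one axis per index array, whatever their lengths turn out to be.

```python
import numpy as np

some_array = np.arange(1000).reshape(100, 10)  # placeholder data
row_index = np.array([10])           # hypothetical: length 1 at runtime
column_index = np.array([0, 2, 5])   # hypothetical column selection

# np.ix_ builds an open mesh, so the result has one axis per index array.
block = some_array[np.ix_(row_index, column_index)]
print(block.shape)  # (1, 3)
print(block)        # [[100 102 105]]
```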
I've been using np.reshape to achieve the same as shown below
import numpy as np
X = np.zeros((100,10))
X.shape # >> (100, 10)
xslice = X[10,:].reshape(1, -1)
xslice.shape # >> (1, 10)
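For comparison, here is a sketch that runs the approaches mentioned in these answers side by side on the X from the question. The dictionary and its labels are only for illustration. It also checks one practical difference: basic slicing returns a view of X, while indexing with a list returns a copy.

```python
import numpy as np

X = np.zeros((100, 10))

candidates = {
    "list index": X[[10], :],
    "slice": X[10:11, :],
    "newaxis": X[np.newaxis, 10, :],
    "take": np.take(X, [10], 0),
    "reshape": X[10, :].reshape(1, -1),
}
for name, result in candidates.items():
    print(name, result.shape)  # every entry has shape (1, 10)

# Basic slicing returns a view; indexing with a list returns a copy.
print(np.shares_memory(X, X[10:11, :]))  # True
print(np.shares_memory(X, X[[10], :]))   # False
```

If the result will be modified in place, the view/copy distinction may matter more than the spelling.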

How to switch a nested list to an array

I am wondering how to combine two arrays of different shapes into a new array without changing the original shapes, because the shape information is important for later processing. For example, I have two arrays (after feeding them with data):
a: with shape (4950,40,10)
b: with shape (4950,64)
As we know, we can easily use a list (c.append(a) followed by c.append(b)) to get a nested one. So how could we do it using an array?
Thanks.
A numpy array is a regular grid in which, for example, all rows must have the same length. You cannot build a numpy array shaped like the nested-list python structure:
[[1, 2, 3],
[[4, 5],
[6, 7]]]
The .shape member of a numpy array is a tuple of integers; for example, (9, 3, 7) means a three-dimensional grid of 9×3×7 scalar values. You cannot have elements that are differently shaped numpy arrays.
For example if m is a numpy array where m.shape == (9, 3, 7) then m[i] is a numpy (sub)array with shape (3, 7) for any i value.
You can however just pass [a, b] around (i.e. a python list containing two differently shaped numpy arrays).
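A minimal sketch of the list approach, assuming zero-filled placeholders with the shapes from the question. The name c follows the question; everything else is illustrative.

```python
import numpy as np

a = np.zeros((4950, 40, 10))  # placeholder for the first array
b = np.zeros((4950, 64))      # placeholder for the second array

# A plain Python list holds both arrays without touching their shapes.
c = [a, b]

print([item.shape for item in c])  # [(4950, 40, 10), (4950, 64)]
print(c[0] is a, c[1] is b)        # True True
```

The list stores references, not copies, so each original array is recovered unchanged by indexing into c.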

Multiplying tensors containing images in numpy

I have the following 3rd-order tensors. Both are stacks of matrices: the first tensor contains 100 10x9 matrices and the second contains 100 3x10 matrices (which I have just filled with ones for this example).
My aim is to multiply the matrices as they line up in one-to-one correspondence, which would result in a tensor with shape (100, 3, 9). This can be done with a for loop that zips both tensors and takes the dot product of each pair, but I am looking to do this with numpy operators only. So far, here are some failed attempts.
Attempt 1:
import numpy as np
T1 = np.ones((100, 10, 9))
T2 = np.ones((100, 3, 10))
print(T2.dot(T1).shape)
Output of attempt 1:
(100, 3, 100, 9)
Which means it tried all possible combinations ... which is not what I am after.
Actually none of the other attempts even run. I tried np.tensordot and np.einsum (I read here https://jameshensman.wordpress.com/2010/06/14/multiple-matrix-multiplication-in-numpy that it is supposed to do the job, but I did not get the Einstein indices correct). The same link also describes some crazy tensor cube reshaping method that I did not manage to visualize. Any suggestions, ideas or explanations on how to tackle this?
Did you try?
In [96]: np.einsum('ijk,ilj->ilk',T1,T2).shape
Out[96]: (100, 3, 9)
The way I figure this out is look at the shapes:
(100, 10, 9) (i, j, k)
(100, 3, 10) (i, l, j)
-------------
(100, 3, 9) (i, l, k)
The two j axes are summed over and drop out. The others carry to the output.
For 4d arrays, with dimensions like (100,3,2,24 ) there are several options:
Reshape to 3d, T1.reshape(300,2,24), and after reshape back R.reshape(100,3,...). Reshape is virtually costless, and a good numpy tool.
Add an index to einsum: np.einsum('hijk,hilj->hilk',T1,T2), just a parallel usage to that of i.
Or use ellipsis: np.einsum('...jk,...lj->...lk',T1,T2). This expression works with 3d, 4d, and up.
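As a check on the index pattern, the sketch below compares the einsum call against the explicit loop described in the question, using random placeholder data instead of ones so that a wrong pairing would show up. It also tries np.matmul, which is not part of the answer above but broadcasts over the leading axis in the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
T1 = rng.random((100, 10, 9))   # placeholder data
T2 = rng.random((100, 3, 10))   # placeholder data

# Reference: explicit loop pairing the matrices one to one.
looped = np.array([m2.dot(m1) for m1, m2 in zip(T1, T2)])

# einsum with the index pattern from the answer.
summed = np.einsum('ijk,ilj->ilk', T1, T2)

# np.matmul (the @ operator) broadcasts over the leading axis.
stacked = np.matmul(T2, T1)

print(summed.shape)                  # (100, 3, 9)
print(np.allclose(looped, summed))   # True
print(np.allclose(looped, stacked))  # True
```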

Python time-lat-lon array manipulation and grouping

For a t-x-y array representing time-latitude-longitude, where the values of the t-x-y grid hold arbitrary measured variables, how can I 'group' x-y slices of the array for a given time condition?
For example, if a companion t-array is a 1d list of datetimes, how can I find the elementwise mean of the x-y grids that have months equal to 1? If t has only 10 elements where month = 1, then I want a (10, len(x), len(y)) array. From there I know I can do np.mean(out, axis=0) to get my desired mean values across the x-y grid, where out is the result of the array manipulation.
The shape of t-x-y is approximately (2000, 50, 50), that is, a (50, 50) grid of values for 2000 different times. Assume that the number of unique conditions (whether I'm slicing by month or year) is much smaller than the total number of elements in the t array.
What is the most pythonic way to achieve this? This operation will be repeated with many datasets so a computationally efficient solution is preferred. I'm relatively new to python (I can't even figure out how to create an example array for you to test with) so feel free to recommend other modules that may help. (I have looked at Pandas, but it seems like it mainly handles 1d time-series data...?)
Edit:
This is the best I can do as an example array:
>>> t = np.repeat([1,2,3,4,5,6,7,8,9,10,11,12],83)
>>> t.shape
(996,)
>>> a = np.random.randint(1,101,2490000).reshape(996, 50, 50)
>>> a.shape
(996, 50, 50)
>>> list(set(t))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
So a is the array of random data, and t is (say) your array representing months of the year, in this case just plain integers. In this example there are 83 instances of each month. How can we separate out the 83 x-y slices of a that correspond to when t = 1 (to create a monthly mean dataset)?
One possible answer to the (my) question, using numpy.where
To find the slices of a, where t = 1:
>>> import numpy as np
>>> out = a[np.where(t == 1),:,:]
although this gives the slightly confusing (to me at least) output of:
>>> out.shape
(1, 83, 50, 50)
but if we follow through with my needing the mean
>>> out2 = np.mean(np.mean(out, axis = 0), axis = 0)
reduces the result to the expected:
>>> out2.shape
(50, 50)
Can anyone improve on this or see any issues here?
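One possible simplification, sketched with the example arrays from the question: indexing the first axis with the boolean mask t == 1 keeps the (n, 50, 50) layout, so a single mean over axis 0 is enough. The extra leading axis in the np.where version appears because np.where returns a one-element tuple, which is then treated as a 2d index array. The names monthly_mean and all_months are only illustrative.

```python
import numpy as np

t = np.repeat(np.arange(1, 13), 83)  # 996 month labels, 83 per month
a = np.random.randint(1, 101, 2490000).reshape(996, 50, 50)

# A boolean mask on the first axis keeps the (n, 50, 50) layout.
out = a[t == 1]
print(out.shape)           # (83, 50, 50)

monthly_mean = out.mean(axis=0)
print(monthly_mean.shape)  # (50, 50)

# Means for every month at once, stacked into a (12, 50, 50) array.
all_months = np.stack([a[t == m].mean(axis=0) for m in np.unique(t)])
print(all_months.shape)    # (12, 50, 50)
```

Since the number of unique conditions is small compared with the length of t, looping over np.unique(t) should stay cheap; the work inside each iteration is vectorized.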
