How to one liner access numpy array in a list? - python

Given an array in a list:
import numpy as np
n_pair = 5
np.random.seed ( 0 )
nsteps = 4
nmethod = 2
nbands = 3
t_band=0
t_method=0
t_step=0
t_sbj=0
t_gtmethod=1
all_sub = [[np.random.rand ( nmethod, nbands, 2 ) for _ in range ( nsteps )] for _ in range ( 3)]
I then extract one data point from the array in each sublist, as below:
this_gtmethod=[x[t_step][t_method][t_band][t_gtmethod] for x in all_sub]
However, I would like to avoid the loop and access all three elements directly, as below:
this_gtmethod=all_sub[:][t_step][t_method][t_band][t_gtmethod]
But indexing the list this way does not return the expected result.
May I know what I did wrong?

This sort of slicing and indexing is best accomplished with Numpy arrays rather than lists.
If you make all_sub into a Numpy array, you can achieve your desired result with simple slicing.
all_sub = np.array(all_sub)
this_gtmethod = all_sub[:, t_step, t_method, t_band, t_gtmethod]
The result is the same as with your looping example.
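As a quick check of that claim, here is a minimal sketch that rebuilds the data from the question and compares the two approaches. The names stacked, looped and sliced are only illustrative.

```python
import numpy as np

np.random.seed(0)
nsteps, nmethod, nbands = 4, 2, 3
t_step, t_method, t_band, t_gtmethod = 0, 0, 0, 1

# Same nested structure as in the question: 3 lists of 4 arrays of shape (2, 3, 2).
all_sub = [[np.random.rand(nmethod, nbands, 2) for _ in range(nsteps)] for _ in range(3)]

# Loop version from the question.
looped = [x[t_step][t_method][t_band][t_gtmethod] for x in all_sub]

# Array version: stack everything into one 5-D array, then index all axes at once.
stacked = np.array(all_sub)  # shape (3, 4, 2, 3, 2)
sliced = stacked[:, t_step, t_method, t_band, t_gtmethod]

print(stacked.shape)
print(np.array_equal(sliced, looped))
```

This only works because every sublist has the same length and every array has the same shape; otherwise np.array cannot build a regular 5-D array.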

You made a list of lists of arrays:
In [279]: type(all_sub), len(all_sub)
Out[279]: (list, 3)
In [280]: type(all_sub[0]), len(all_sub[0])
Out[280]: (list, 4)
In [282]: type(all_sub[0][0]), all_sub[0][0].shape
Out[282]: (numpy.ndarray, (2, 3, 2))
Lists can only be indexed with a scalar value or slice. List comprehension is the normal way of iterating through a list.
But an array can be indexed several dimensions at a time:
In [283]: all_sub[0][1][1,2,:]
Out[283]: array([0.46147936, 0.78052918])
Since the nested lists all have the same length and the arrays all have the same shape, the whole structure can be turned into a multidimensional array:
In [284]: M = np.array(all_sub)
In [285]: M.shape
Out[285]: (3, 4, 2, 3, 2)
Two ways of accessing the same subarrays:
In [286]: M[:,0,0,0,:]
Out[286]:
array([[0.5488135 , 0.71518937],
       [0.31542835, 0.36371077],
       [0.58651293, 0.02010755]])
In [287]: [a[0][0,0,:] for a in all_sub]
Out[287]:
[array([0.5488135 , 0.71518937]),
 array([0.31542835, 0.36371077]),
 array([0.58651293, 0.02010755])]
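To see why the all_sub[:][...] attempt in the question does not work, it may help to look at what [:] does on a plain list. The sketch below assumes the same data layout as the question; the names shallow and first_only are made up for illustration.

```python
import numpy as np

np.random.seed(0)
all_sub = [[np.random.rand(2, 3, 2) for _ in range(4)] for _ in range(3)]

# Slicing a list with [:] builds a new outer list holding the same three sublists.
shallow = all_sub[:]
print(shallow is all_sub)        # a different outer list object
print(shallow[0] is all_sub[0])  # but the very same inner list

# So all_sub[:][0] is just all_sub[0]: the chain never visits the other two sublists.
first_only = all_sub[:][0][0][0][1]
print(first_only.shape)
print(np.array_equal(first_only, all_sub[0][0][0, 1]))
```

In other words, each [...] in the chain is applied to the result of the previous one, so the leading [:] has no effect on which elements are selected.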

Related

How to get the index of np.maximum?

I know np.maximum computes the element-wise maximum, e.g.
>>> b = np.array([3, 6, 1])
>>> c = np.array([4, 2, 9])
>>> np.maximum(b, c)
array([4, 6, 9])
But is there a way to get the indices as well? For the example above, I would like something like the following, where each tuple denotes (which array, index); it could be a tuple, a dictionary, or something else. It would also be great if it worked on 3D arrays, i.e. when the two input arrays are 3D.
array([(1, 0), (0, 1), (1, 2)])
You could stack the two 1d-arrays to get a 2d-array and use argmax:
arr = np.vstack((b, c))
indices = np.argmax(arr, axis=0)
This will give you an array of integers, not tuples, but since you compare per column, the last element of each tuple is unnecessary anyway: they are just ascending integers starting at 0. If you really need them, though, you could add
indices = list(zip(indices, range(len(b))))
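The question also asks about 3D inputs. A minimal sketch of the same stacking idea, assuming both inputs have identical shapes (np.stack is used here instead of np.vstack so that the new axis is always axis 0; the variable names are illustrative):

```python
import numpy as np

b = np.array([3, 6, 1])
c = np.array([4, 2, 9])

stacked = np.stack((b, c))          # shape (2, 3); axis 0 selects the source array
which = np.argmax(stacked, axis=0)  # 0 where b wins (or ties), 1 where c wins
pairs = list(zip(which.tolist(), range(len(b))))
print(pairs)                        # [(1, 0), (0, 1), (1, 2)]

# Same idea for 3-D inputs: the result has the shape of one input.
rng = np.random.default_rng(0)
b3 = rng.random((2, 3, 4))
c3 = rng.random((2, 3, 4))
which3 = np.argmax(np.stack((b3, c3)), axis=0)

# Picking from b3 or c3 according to which3 reproduces np.maximum.
rebuilt = np.where(which3 == 0, b3, c3)
print(np.array_equal(rebuilt, np.maximum(b3, c3)))
```

For N-dimensional inputs, which3 alone already says which array each element came from, since the position within the array is implied by the position in which3.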

How to create a list of n arrays from a list of tuples, each tuple containing n arrays? (Other than with a for loop)

The issue
I have a list which contains 4 tuples. It is the output of multiprocessing.Pool.map(), but I don't think that's important.
Each tuple contains 3 numpy arrays.
What is a good way to create 3 arrays, i.e. append (vstack) all the first arrays into one, all the second arrays into another, and so on? In other words, create the orange output from the orange arrays, etc., in the screenshot below:
What I have tried
I could of course do a very banal loop, like in the toy example below; it works, but it doesn't seem very elegant. I presume there's a more elegant/ pythonic way?
x = np.random.rand(10,2)
a = ((x,2*x,3*x))
b = a
c = a
d = a
my_list =[a,b,c,d]
num_items = len(my_list[0])
out =[None] * num_items
for i in range(num_items):  # 3 arrays in each tuple
    out[i] = []
    for l in my_list:
        out[i].append(l[i])
    out[i] = np.vstack(out[i])
my_array = np.array(my_list).swapaxes(0,1) # puts the `out` dimension in front
my_array.shape
Out[]: (3, 4, 10, 2)
If you want to concatenate along the first dimension:
my_array.reshape(3, -1, 2) #-> shape (3, 40, 2)
If you really want a list:
list(my_array) #-> list of 3 arrays of shape (4, 10, 2)
You can try:
np.concatenate(my_list, axis=1)
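For comparison, here is a sketch that puts the approaches above next to a zip(*...) variant, which is not in the original answers but is a common idiom for regrouping tuples. It assumes, as in the toy example, that all arrays have the same shape.

```python
import numpy as np

x = np.random.rand(10, 2)
a = (x, 2 * x, 3 * x)
my_list = [a, a, a, a]

# zip(*my_list) regroups: all first arrays, then all second arrays, and so on.
out_zip = [np.vstack(group) for group in zip(*my_list)]

# Array-based alternatives from the answers above.
out_reshape = np.array(my_list).swapaxes(0, 1).reshape(3, -1, 2)
out_concat = np.concatenate(my_list, axis=1)

print([arr.shape for arr in out_zip])
print(out_reshape.shape, out_concat.shape)
print(np.array_equal(np.array(out_zip), out_reshape))
print(np.array_equal(out_reshape, out_concat))
```

The zip version should also cope with arrays whose row counts differ between tuples, as long as the column counts match, whereas the two array-based versions need identical shapes throughout.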

Indexing a numpy array with a list of tuples

Why can't I index an ndarray using a list of tuple indices like so?
idx = [(x1, y1), ... (xn, yn)]
X[idx]
Instead I have to do something unwieldy like
idx2 = numpy.array(idx)
X[idx2[:, 0], idx2[:, 1]] # or more generally:
X[tuple(numpy.vsplit(idx2.T, 1)[0])]
Is there a simpler, more pythonic way?
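You can use a list of tuples, but the convention is different from what you want. numpy expects a sequence of row indices, followed by a sequence of column indices. You, apparently, want to specify a list of (x,y) pairs. (Note that recent NumPy versions require the outer container to be a tuple, e.g. X[tuple(idx)]; a plain list is treated as a single index array.)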
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#integer-array-indexing
The relevant section in the documentation is 'integer array indexing'.
Here's an example, seeking 3 points in a 2d array. (2 points in 2d can be confusing):
In [223]: idx
Out[223]: [(0, 1, 1), (2, 3, 0)]
In [224]: X[idx]
Out[224]: array([2, 7, 4])
Using your style of xy pairs of indices:
In [230]: idx1 = [(0,2),(1,3),(1,0)]
In [231]: [X[i] for i in idx1]
Out[231]: [2, 7, 4]
In [240]: X[tuple(np.array(idx1).T)]
Out[240]: array([2, 7, 4])
X[tuple(zip(*idx1))] is another way of doing the conversion. The tuple() is optional in Python 2, where zip returns a list; in Python 3 zip returns an iterator, so the tuple() is required. zip(*...) is a Python idiom that reverses the nesting of a list of lists.
You are on the right track with:
In [242]: idx2=np.array(idx1)
In [243]: X[idx2[:,0], idx2[:,1]]
Out[243]: array([2, 7, 4])
My tuple() is just a bit more compact (and not necessarily more 'pythonic'). Given the numpy convention, some sort of conversion is necessary.
(Should we check what works with n-dimensions and m-points?)
Use a tuple of NumPy arrays which can be directly passed to index your array:
index = tuple(np.array(list(zip(*index_tuple))))
new_array = list(prev_array[index])
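On the open question of n dimensions and m points: a small sketch suggesting that the same transpose trick carries over. X is assumed here to be np.arange(16).reshape(4, 4), which is consistent with the outputs shown above but is not stated in the original; Y and idx3 are made-up illustrations.

```python
import numpy as np

X = np.arange(16).reshape(4, 4)   # assumed example array
idx1 = [(0, 2), (1, 3), (1, 0)]   # three (row, column) pairs

via_zip = X[tuple(zip(*idx1))]
via_array = X[tuple(np.array(idx1).T)]
print(via_zip, via_array)

# Four points in a 3-D array: one index tuple per point, one entry per axis.
Y = np.arange(24).reshape(2, 3, 4)
idx3 = [(0, 1, 2), (1, 2, 3), (1, 0, 0), (0, 0, 1)]
picked3 = Y[tuple(np.array(idx3).T)]
print(picked3)
```

In both cases the list of points is turned into one index sequence per axis, which is the form that integer array indexing expects.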

Slicing n-dimensional numpy array using list of indices

Say I have a 3 dimensional numpy array:
np.random.seed(1145)
A = np.random.random((5,5,5))
and I have two lists of indices corresponding to the 2nd and 3rd dimensions:
second = [1,2]
third = [3,4]
and I want to select the elements in the numpy array corresponding to
A[:][second][third]
so the shape of the sliced array would be (5,2,2) and
A[:][second][third].flatten()
would be equivalent to:
In [226]:
for i in range(5):
    for j in second:
        for k in third:
            print(A[i][j][k])
0.556091074129
0.622016249651
0.622530505868
0.914954716368
0.729005532319
0.253214472335
0.892869371179
0.98279375528
0.814240066639
0.986060321906
0.829987410941
0.776715489939
0.404772469431
0.204696635072
0.190891168574
0.869554447412
0.364076117846
0.04760811817
0.440210532601
0.981601369658
Is there a way to slice a numpy array in this way? So far when I try A[:][second][third] I get IndexError: index 3 is out of bounds for axis 0 with size 2 because the [:] for the first dimension seems to be ignored.
Numpy uses multiple indexing, so instead of A[1][2][3], you can--and should--use A[1,2,3].
You might then think you could do A[:, second, third], but the numpy indices are broadcast, and broadcasting second and third (two one-dimensional sequences) ends up being the numpy equivalent of zip, so the result has shape (5, 2).
What you really want is to index with, in effect, the outer product of second and third. You can do this with broadcasting by making one of them, say second into a two-dimensional array with shape (2,1). Then the shape that results from broadcasting second and third together is (2,2).
For example:
In [8]: import numpy as np
In [9]: a = np.arange(125).reshape(5,5,5)
In [10]: second = [1,2]
In [11]: third = [3,4]
In [12]: s = a[:, np.array(second).reshape(-1,1), third]
In [13]: s.shape
Out[13]: (5, 2, 2)
Note that, in this specific example, the values in second and third are sequential. If that is typical, you can simply use slices:
In [14]: s2 = a[:, 1:3, 3:5]
In [15]: s2.shape
Out[15]: (5, 2, 2)
In [16]: np.all(s == s2)
Out[16]: True
There are a couple of very important differences between those two methods.
The first method also works with indices that are not equivalent to slices. For example, it would work if second = [0, 2, 3]. (Sometimes you'll see this style of indexing referred to as "fancy indexing".)
In the first method (using broadcasting and "fancy indexing"), the result is a copy of the data in the original array. In the second method (using only slices), the array s2 is a view into the same block of memory used by a. An in-place change to either a or s2 is visible in the other.
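The copy-versus-view difference can be checked directly. A minimal sketch using the same a, second and third as in the session above:

```python
import numpy as np

a = np.arange(125).reshape(5, 5, 5)
second = [1, 2]
third = [3, 4]

s = a[:, np.array(second).reshape(-1, 1), third]  # fancy indexing: a copy
s2 = a[:, 1:3, 3:5]                               # slicing: a view

print(np.shares_memory(a, s))   # False
print(np.shares_memory(a, s2))  # True

s2[0, 0, 0] = -1   # writes through to a[0, 1, 3]
print(a[0, 1, 3])  # -1

s[0, 0, 1] = -2    # only changes the copy
print(a[0, 1, 4])  # still 9
```

If you need an independent array from the slice version, s2.copy() gives one.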
One way would be to use np.ix_:
>>> out = A[np.ix_(range(A.shape[0]),second, third)]
>>> out.shape
(5, 2, 2)
>>> manual = [A[i,j,k] for i in range(5) for j in second for k in third]
>>> (out.ravel() == manual).all()
True
Downside is that you have to specify the missing coordinate ranges explicitly, but you could wrap that into a function.
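One possible way to wrap that into a function is sketched below. take_outer is a hypothetical helper name, not a NumPy function, and using None to mean "the whole axis" is just a convention chosen for this example.

```python
import numpy as np

def take_outer(arr, *index_lists):
    """Hypothetical helper: outer-product indexing, with None meaning 'whole axis'."""
    full = [np.arange(n) if idx is None else np.asarray(idx)
            for idx, n in zip(index_lists, arr.shape)]
    return arr[np.ix_(*full)]

A = np.arange(125).reshape(5, 5, 5)
out = take_outer(A, None, [1, 2], [3, 4])
print(out.shape)
print(np.array_equal(out, A[:, 1:3, 3:5]))

# Non-contiguous indices work too, which plain slices cannot express.
out2 = take_outer(A, None, [0, 2, 3], [4, 1])
print(out2.shape)
```

Like any fancy indexing, this returns a copy rather than a view.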
I think there are three problems with your approach:
Both second and third should be slices
Since the 'to' index is exclusive, they should go from 1 to 3 and from 3 to 5
Instead of A[:][second][third], you should use A[:,second,third]
Try this:
>>> np.random.seed(1145)
>>> A = np.random.random((5,5,5))
>>> second = slice(1,3)
>>> third = slice(3,5)
>>> A[:,second,third].shape
(5, 2, 2)
>>> A[:,second,third].flatten()
array([ 0.43285482, 0.80820122, 0.64878266, 0.62689481, 0.01298507,
        0.42112921, 0.23104051, 0.34601169, 0.24838564, 0.66162209,
        0.96115751, 0.07338851, 0.33109539, 0.55168356, 0.33925748,
        0.2353348 , 0.91254398, 0.44692211, 0.60975602, 0.64610556])
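To summarise the indexing forms discussed in this thread, here is a small side-by-side sketch. It uses np.arange data instead of the random array so the values are easy to read; the variable names are illustrative.

```python
import numpy as np

a = np.arange(125).reshape(5, 5, 5)
second = [1, 2]
third = [3, 4]

# a[:] is the whole array, so chaining applies `second` to axis 0.
chained = a[:][second]
print(chained.shape)  # (2, 5, 5)

# The next step then indexes axis 0 (size 2) with 3 and 4, hence the error.
try:
    a[:][second][third]
except IndexError as exc:
    print("IndexError:", exc)

paired = a[:, second, third]                    # pairs (1, 3) and (2, 4)
outer = a[:, np.array(second)[:, None], third]  # all four combinations
sliced = a[:, 1:3, 3:5]                         # same block via slices

print(paired.shape, outer.shape, sliced.shape)
print(np.array_equal(outer, sliced))
```

The paired form is what you want when second and third list matching coordinates; the outer form is what the question asks for.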

Shapes of the np.arrays, unexpected additional dimension

I'm dealing with arrays in python, and this generated a lot of doubts...
1) I build a list of lists by reading 4 columns from each of N files, storing 4 elements N times. I then convert this list to a numpy array:
s = np.array(s)
and I ask for the shape of this array. The answer is correct:
print(s.shape)
#(N,4)
I then produce the mean of this Nx4 array:
s_m = sum(s)/len(s)
print(s_m.shape)
#(4,)
which I guess means that this array is a 1D array. Is this correct?
2) If I subtract the mean vector s_m from the rows of the array s, I can proceed in two ways:
residuals_s = s - s_m
or:
residuals_s = []
for i in range(len(s)):
    residuals_s.append([])
    tmp = s[i] - s_m
    residuals_s.append(tmp)
If I now ask for the shape of residuals_s in the two cases, I obtain two different answers. In the first case I obtain:
(N,4)
in the second:
(N,1,4)
Can someone explain why there is an additional dimension?
You can get the mean using the numpy method (producing the same (4,) shape):
s_m = s.mean(axis=0)
s - s_m works because s_m is 'broadcasted' to the dimensions of s.
If I run your second residuals_s I get a list containing empty lists and arrays:
[[],
array([ 1.02649662, 0.43613824, 0.66276758, 2.0082684 ]),
[],
array([ 1.13000227, -0.94129685, 0.63411801, -0.383982 ]),
...
]
That does not convert to a (N,1,4) array, but rather a (M,) array with dtype=object. Did you copy and paste correctly?
A corrected iteration is:
for i in range(len(s)):
    residuals_s.append(s[i]-s_m)
This produces a simpler list of arrays:
[array([ 1.02649662, 0.43613824, 0.66276758, 2.0082684 ]),
array([ 1.13000227, -0.94129685, 0.63411801, -0.383982 ]),
...]
which converts to a (N,4) array.
Iteration like this is usually not needed. But if it is, appending to a list like this is one way to go. Another is to preallocate an array and assign rows:
residuals_s = np.zeros_like(s)
for i in range(s.shape[0]):
    residuals_s[i,:] = s[i]-s_m
I get your (N,1,4) with:
In [39]: residuals_s=[]
In [40]: for i in range(len(s)):
   ....:     residuals_s.append([])
   ....:     tmp = s[i] - s_m
   ....:     residuals_s[-1].append(tmp)
In [41]: residuals_s
Out[41]:
[[array([ 1.02649662, 0.43613824, 0.66276758, 2.0082684 ])],
[array([ 1.13000227, -0.94129685, 0.63411801, -0.383982 ])],
...]
In [43]: np.array(residuals_s).shape
Out[43]: (10, 1, 4)
Here the s[i]-s_m array is appended to an empty list, which has been appended to the main list. So it's an array within a list within a list. It's this intermediate list that produces the middle 1 dimension.
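The effect of that intermediate list can be reproduced in a few lines. This sketch uses a small stand-in array with N = 3 rather than the original data; flat and nested are illustrative names.

```python
import numpy as np

s = np.arange(12, dtype=float).reshape(3, 4)  # stand-in for the N x 4 data, N = 3
s_m = s.mean(axis=0)

flat = [row - s_m for row in s]      # list of N arrays of shape (4,)
nested = [[row - s_m] for row in s]  # each array wrapped in its own one-item list

print(np.array(flat).shape)    # (3, 4)
print(np.array(nested).shape)  # (3, 1, 4)

# Both hold the same numbers as plain broadcasting.
print(np.array_equal(np.array(flat), s - s_m))
print(np.array_equal(np.array(nested)[:, 0, :], s - s_m))
```

Each level of list nesting becomes one array dimension, so the one-item inner lists show up as the axis of length 1.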
You are using a NumPy ndarray without using NumPy's functions. sum() is a Python builtin; you should use numpy.sum() (with axis=0) or numpy.mean() instead.
I suggest you change your code as follows:
import numpy as np
np.random.seed(0)
s = np.random.randn(10, 4)
s_m = np.mean(s, axis=0, keepdims=True)  # column means, shape (1, 4)
residuals_s = s - s_m
print(s.shape, s_m.shape, residuals_s.shape)
Using the mean() function with the axis and keepdims arguments will give you the correct result.
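As a rough illustration of what keepdims changes, the sketch below compares the different ways of computing the mean. For column means (axis=0) the subtraction broadcasts either way; keepdims starts to matter for row means (axis=1). The variable names are illustrative.

```python
import numpy as np

np.random.seed(0)
s = np.random.randn(10, 4)

m_builtin = sum(s) / len(s)         # builtin sum adds the rows one by one; shape (4,)
m1 = s.mean(axis=0)                 # shape (4,)
m2 = s.mean(axis=0, keepdims=True)  # shape (1, 4)
print(m_builtin.shape, m1.shape, m2.shape)
print(np.allclose(m_builtin, m1))

# (10, 4) - (4,) and (10, 4) - (1, 4) broadcast to the same result.
res1 = s - m1
res2 = s - m2
print(np.allclose(res1, res2))

# Row means have shape (10,), which does not broadcast against (10, 4);
# keepdims=True gives shape (10, 1), which does.
row_res = s - s.mean(axis=1, keepdims=True)
print(row_res.shape)
```

So the builtin sum in the question was not wrong for this case, just slower and less explicit than the NumPy reduction.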
