I seems found a bug when I'm using python 2.7 with numpy module:
import numpy as np
x=np.arange(3*4*5).reshape(3,4,5)
x
Here I got the full 'x' array as follows:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
Then I try to indexing single row values in sheet [1]:
x[1][0][:]
Result:
array([20, 21, 22, 23, 24])
But something wrong while I was try to indexing single column in sheet [1]:
x[1][:][0]
Result still be the same as previous:
array([20, 21, 22, 23, 24])
Should it be array([20, 25, 30, 35])??
It seems something wrong while indexing the middle index with range?
No, it's not a bug.
When you use [:] you are using slicing notation and it takes all the list:
l = ["a", "b", "c"]
l[:]
#output:
["a", "b", "c"]
and in your case:
x[1][:]
#output:
array([[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]])
What you realy wish is using numpy indexing notation:
x[1, : ,0]
#output:
array([20, 25, 30, 35])
This is not a bug. x[1][:][0] is not a multiple index ("give me the elements where first dimension is 1, second is any, third is 0"). Instead, you are indexing three times, three objects.
x1 = x[1] # x1 is the first 4x5 subarray
x2 = x1[:] # x2 is same as x1
x3 = x2[0] # x3 is the first row of x2
To use multiple index, you want to do it in a single slice:
x[1, :, 0]
Related
given a 3D array:
a = np.arange(3*4*5).reshape(3,4,5)
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
I would like to create the following matrix:
result =
array([[20, 21, 22, 23, 24],
[ 5, 6, 7, 8, 9],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]])
Using the indices idx = [1,0,2,2]
I.e, I would like to "take" per matrix, the row specified in idx, where len(idx)==a.shape[1] and np.max(idx)<a.shape[0] as idx choose from dimension 1.
Given that your array has three dimensions (x,y,z), since you want to take one value for each row in the yth direction, you can do this:
a[idx, range(a.shape[1])]
Output:
array([[20, 21, 22, 23, 24],
[ 5, 6, 7, 8, 9],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]])
I want to create a 6x6 numpy matrix, with the first row filled with: 0, 1, ..., 5, the second row filled with 10, 11, ... , 15, and the last row filled with 50, 51, ... , 55.
I thought about using (1) nested (two layer) list comprehensions, and then converting list-of-list into a numpy.matrix object, or (2) using variables inside of range function, i.e. - range(x) and vary x from 1 to 6. I was not able to get either of these two ideas to work.
Below is my non-vectorized / looping code to construct this matrix. Is there a more Pythonic way of doing this?
a = np.zeros((6,6))
for i in range(6):
for j in range(6):
a[i,j] = 10*i + j
print(a)
(This is one of the examples given at 39:00 in the intro video to NumPy on Youtube:
Intro to Numerical Computing with NumPy
How about np.ogrid?
np.add(*np.ogrid[:60:10, :6])
# array([[ 0, 1, 2, 3, 4, 5],
# [10, 11, 12, 13, 14, 15],
# [20, 21, 22, 23, 24, 25],
# [30, 31, 32, 33, 34, 35],
# [40, 41, 42, 43, 44, 45],
# [50, 51, 52, 53, 54, 55]])
Details
ogrid returns an open meshgrid:
a, b = np.ogrid[:60:10, :6]
a
# array([[ 0],
# [10],
# [20],
# [30],
# [40],
# [50]])
b
# array([[0, 1, 2, 3, 4, 5]])
You can then perform broadcasted addition:
# a + b
np.add(a, b)
# array([[ 0, 1, 2, 3, 4, 5],
# [10, 11, 12, 13, 14, 15],
# [20, 21, 22, 23, 24, 25],
# [30, 31, 32, 33, 34, 35],
# [40, 41, 42, 43, 44, 45],
# [50, 51, 52, 53, 54, 55]])
Similarly, you can also generate two ranges using np.arange and add them:
np.arange(0, 60, 10)[:,None] + np.arange(6)
# array([[ 0, 1, 2, 3, 4, 5],
# [10, 11, 12, 13, 14, 15],
# [20, 21, 22, 23, 24, 25],
# [30, 31, 32, 33, 34, 35],
# [40, 41, 42, 43, 44, 45],
# [50, 51, 52, 53, 54, 55]])
This can be accomplished with broadcasting,
arange(0, 6) + 10*arange(0, 6)[:, None]
array([[ 0, 1, 2, 3, 4, 5],
[10, 11, 12, 13, 14, 15],
[20, 21, 22, 23, 24, 25],
[30, 31, 32, 33, 34, 35],
[40, 41, 42, 43, 44, 45],
[50, 51, 52, 53, 54, 55]])
I'd recommend reading https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html and https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html. "Pythonic" doesn't really matter when working with numpy. Some times iterating, list comprehensions, and other pythonic approaches work well with arrays, other times they are terribly inefficient. However, the links given cover some high level concepts that are very powerfull with numpy.
I can use slicing to select a region when opening netcdf files in xarray, using preprocess ie:
SSA=dict(lat=slice(-38,-34),lon=slice(138,141))
def Mask(ds):
return ds.sel(**SSA)
xr.open_mfdataset(filelist, preprocess=Mask)
but what is the most efficient way to extract the data for a list of seperate points by latitude and longitude??
A list of points can be selected using a DataArray as the indexer. This will result in the array being reindexed along the indexer's coordinates.
Straight from the docs on More Advanced Indexing:
In [78]: da = xr.DataArray(np.arange(56).reshape((7, 8)), dims=['x', 'y'])
In [79]: da
Out[79]:
<xarray.DataArray (x: 7, y: 8)>
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53, 54, 55]])
Dimensions without coordinates: x, y
In [80]: da.isel(x=xr.DataArray([0, 1, 6], dims='z'),
....: y=xr.DataArray([0, 1, 0], dims='z'))
....:
Out[80]:
<xarray.DataArray (z: 3)>
array([ 0, 9, 48])
Dimensions without coordinates: z
The indexing array can also be easily pulled out of a pandas DataFrame, with something like da.sel(longitude=df.longitude.to_xarray(), latitude=df.latitude.to_xarray()), which will result in the DataArray being reindexed by the DataFrame's index.
I've got two 2-D numpy arrays with same shape, let's say (10,6).
The first array x is full of some meaningful float numbers.
x = np.arange(60).reshape(-1,6)
The second array a is sparse array, with each row contains ONLY 2 non-zero values.
a = np.zeros((10,6))
for i in range(10):
a[i, 1] = 1
a[i, 2] = 1
Then there's a third array with the shape of (10,2), and I want to update the values of each row to the first array x at the position where a is not zero.
v = np.arange(20).reshape(10,2)
so the original x and the updated x will be:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59]])
and
array([[ 0, 0, 1, 3, 4, 5],
[ 6, 2, 3, 9, 10, 11],
[12, 4, 5, 15, 16, 17],
[18, 6, 7, 21, 22, 23],
[24, 8, 9, 27, 28, 29],
[30, 10, 11, 33, 34, 35],
[36, 12, 13, 39, 40, 41],
[42, 14, 15, 45, 46, 47],
[48, 16, 17, 51, 52, 53],
[54, 18, 19, 57, 58, 59]])
I've tried the following method
x[np.where(a!=0)] = v
Then I got an error of shape mismatch: value array of shape (10,2) could not be broadcast to indexing result of shape (20,)
What's wrong with this approach, is there an alternative to do it? Thanks a lot.
Thanks to the comment by #Divakar, the problem happens because the shapes of the two variables on both side of the assignment mark = are different.
To the left, the expression x[np.where(a!=0)] or x[a!=0] or x[np.nonzero(a)] are not structured, which has a shape of (20,)
To the right, we need an array of similar shape to finish the assignment. Therefore, a simple ravel() or reshape(-1) will do the job.
so the solution is as simple as x[a!=0] = v.ravel().
import numpy as np
arrayOne = np.random.rand(6).reshape((2, 3))
arrayTwo = np.asarray([[0,1,2], [1,2,0]])
arrayThree = np.zeros((2, 2))
arrayOne[arrayTwo != 0] = arrayThree.ravel()
print(arrayOne)
[[0.56251284 0. 0. ]
[0. 0. 0.20076913]]
Note regarding edit: The solution above is not mine, all credit goes to Divakar. I edited because my earlier answer misunderstood OP's question and I wish to avoid confusion.
I'm having problems with the following task.
Assume we have a matrix, looking like this:
Mat = np.array([
[11, 12, 13, 14, 15], \
[21, 22, 23, 24, 25], \
[31, 32, 33, 34, 35], \
[41, 42, 43, 44, 45], \
[51, 52, 53, 54, 55]])
What I want to do is to replace the entries 22, 33 and 44 with something different that I calculated before. I know I could do this with for loops but I think there has to be a more elegant way.
I have something like this in mind:
Subselect the main diagonal from [1,1] to [-2,-2] and save it as an array.
Modify this array in the desired manner.
Save the modified array as part of the main diagonal of the matrix.
I found the np.diagonal() to get the diagonal and got so far:
Mat = np.array([
[11, 12, 13, 14, 15], \
[21, 22, 23, 24, 25], \
[31, 32, 33, 34, 35], \
[41, 42, 43, 44, 45], \
[51, 52, 53, 54, 55]])
print(Mat)
snipA = Mat.diagonal()
snipB = snipA[1:len(snipA)-1]
print(snipA)
print(snipB)
There are two problems now. First, I cannot modify snipB in any way. I get the error: "output array is read-only". Second, how can I save a modified snipB into the matrix again?
Any help is appreciated.
You can index and modify a part of the diagonal like so:
>>> subdiag = np.arange(1, len(mat)-1)
>>> mat[subdiag, subdiag]
array([22, 33, 44])
>>> mat[subdiag, subdiag] = 0
>>> mat
array([[11, 12, 13, 14, 15],
[21, 0, 23, 24, 25],
[31, 32, 0, 34, 35],
[41, 42, 43, 0, 45],
[51, 52, 53, 54, 55]])
>>>
>>> mat[subdiag, subdiag] = [22, 33, 44]
>>> mat
array([[11, 12, 13, 14, 15],
[21, 22, 23, 24, 25],
[31, 32, 33, 34, 35],
[41, 42, 43, 44, 45],
[51, 52, 53, 54, 55]])
You can also do this with einsum since numpy 1.10
np.einsum('ii->i', mat)[1:-1] = 0
mat
array([[11, 12, 13, 14, 15],
[21, 0, 23, 24, 25],
[31, 32, 0, 34, 35],
[41, 42, 43, 0, 45],
[51, 52, 53, 54, 55]])