How to append values to a multidimensional numpy array?

How can I efficiently append values to a multidimensional numpy array?
import numpy as np
a = np.array([[1,2,3], [4,5,6]])
print(a)
I want to append np.NaN k=2 times to each row of the array.
One option would be to use a loop, but I guess there must be something smarter (vectorized) in numpy.
Expected result would be:
np.array([[1, 2, 3, np.NaN, np.NaN], [4, 5, 6, np.NaN, np.NaN]])
I.e. I am looking for a way to:
np.concatenate((a, np.NaN))
on all the inner dimensions.
A call to
np.append(a, [[np.NaN, np.NaN]], axis=0)
fails with:
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 2

For your problem np.hstack() or np.pad() should do the job.
Using np.hstack():
k = 2
a_mat = np.array([[1,2,3], [4, 5, 6]])
nan_mat = np.zeros((a_mat.shape[0], k))
nan_mat.fill(np.nan)
a_mat = np.hstack((a_mat, nan_mat))
Using np.pad():
k = 2
padding_shape = [(0, 0), (0, k)]  # [(before, after) pads for each dimension]
a_mat = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)  # NaN needs a float dtype
np.pad(a_mat, padding_shape, mode='constant', constant_values=np.nan)
Note: In case you are using np.pad() to fill with np.nan, check this post out as well: about padding with np.nan
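A compact alternative (a minimal sketch, not part of the original answer) builds the NaN block with np.full and stacks it on in one step:
import numpy as np
k = 2
a = np.array([[1, 2, 3], [4, 5, 6]])
# np.full creates the (rows, k) block of NaN; hstack upcasts the result to float
padded = np.hstack((a, np.full((a.shape[0], k), np.nan)))
print(padded)
# [[ 1.  2.  3. nan nan]
#  [ 4.  5.  6. nan nan]]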

Related

Numpy python - calculating sum of columns from irregular dimension

I have a multi-dimensional array of scores, and I need to get the sum of each column at the 3rd level in Python. I am using NumPy to achieve this.
import numpy as np
Data is something like:
score_list = [
[[1,1,3], [1,2,5]],
[[2,7,5], [4,1,3]]
]
This should return:
[[3 8 8] [5 3 8]]
Which happens correctly using this:
np_array = np.array(score_list)
sum_array = np_array.sum(axis=0)
print(sum_array)
However, if I have irregular shape like this:
score_list = [
[[1,1], [1,2,5]],
[[2,7], [4,1,3]]
]
I expect it to return:
[[3 8] [5 3 8]]
However, it comes up with a warning and the return value is:
[list([1, 1, 2, 7]) list([1, 2, 5, 4, 1, 3])]
How can I get expected result?
NumPy will try to cast the ragged list into an ndarray, which produces an object array of lists rather than a numeric array; instead, consider passing each group of sublists individually using zip.
score_list = [
[[1,1], [1,2,5]],
[[2,7], [4,1,3]]
]
import numpy as np
res = [np.sum(x,axis=0) for x in zip(*score_list)]
print(res)
[array([3, 8]), array([5, 3, 8])]
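To see why this works (a small illustration, not part of the original answer): zip(*score_list) transposes the two outer levels, pairing up the sublists of equal length so that each np.sum call operates on a regular block:
list(zip(*score_list))
# [([1, 1], [2, 7]), ([1, 2, 5], [4, 1, 3])]
# np.sum(([1, 1], [2, 7]), axis=0)       -> array([3, 8])
# np.sum(([1, 2, 5], [4, 1, 3]), axis=0) -> array([5, 3, 8])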
Here is one solution for doing this; keep in mind that it doesn't use numpy and will be very inefficient for larger matrices (but it runs just fine for smaller ones).
# Create matrix
score_list = [
    [[1,1,3], [1,2,5]],
    [[2,7,5], [4,1,3]]
]
# Get each row after the first
for i in range(1, len(score_list)):
    # Get each list within the row
    for j in range(len(score_list[i])):
        # Get each value in each list
        for k in range(len(score_list[i][j])):
            # Add the current value to the same index on the first row
            score_list[0][j][k] += score_list[i][j][k]
print(score_list[0])
There is bound to be a better solution but this is a temporary fix for you :)
Edit: made it more efficient.
A possible solution:
a = np.vstack([np.array(score_list[x], dtype='object')
               for x in range(len(score_list))])
[np.add(*[x for x in a[:, i]]) for i in range(a.shape[1])]
Another possible solution:
a = sum(score_list, [])
b = [a[x] for x in range(0,len(a),2)]
c = [a[x] for x in range(1,len(a),2)]
[np.add(x[0], x[1]) for x in [b, c]]
Output:
[array([3, 8]), array([5, 3, 8])]

How to reshape matrix with numpy without using explicit values for argument?

I am trying to create a function that calculates the inner product using numpy.
I got the function to work; however, I am using explicit numbers in my np.reshape call, and I need it to work based on the input.
my code looks like this:
import numpy as np
X = np.array([[1,2],[3,4]])
Z = np.array([[1,4],[2,5],[3,6]])
# Calculating S
def calculate_S(X, n, m):
    assert n == X.shape[0]
    n, d1 = X.shape
    m, d2 = X.shape
    S = np.diag(np.inner(X, X))
    return S

S = calculate_S(X, n, m)
S = S.reshape(2, 1)
print(S)
output:
---------------------------------
[[ 5]
[25]]
So the output is correct; however, instead of specifying 2,1 I need those values to be determined automatically from the shape of my matrix.
How do I do that?
In [163]: X = np.array([[1,2],[3,4]])
In [164]: np.inner(X,X)
Out[164]:
array([[ 5, 11],
[11, 25]])
In [165]: np.diag(np.inner(X,X))
Out[165]: array([ 5, 25])
reshape with -1 gets around having to specify the 2:
In [166]: np.diag(np.inner(X,X)).reshape(-1,1)
Out[166]:
array([[ 5],
[25]])
Another way of adding a dimension:
In [167]: np.diag(np.inner(X,X))[:,None]
Out[167]:
array([[ 5],
[25]])
You can get the "diagonal" directly with:
In [175]: np.einsum('ij,ij->i',X,X)
Out[175]: array([ 5, 25])
Another way:
In [177]: (X[:,None,:] @ X[:,:,None])[:,0,:]
Out[177]:
array([[ 5],
[25]])
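Folding the answer back into the original function, a minimal sketch (it drops the unused n and m arguments from the question) could look like this:
import numpy as np

def calculate_S(X):
    # one inner product per row of X; -1 lets numpy infer the row count
    return np.diag(np.inner(X, X)).reshape(-1, 1)

X = np.array([[1, 2], [3, 4]])
print(calculate_S(X))
# [[ 5]
#  [25]]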

Python xarray - vectorized indexing

I'm trying to understand vectorized indexing in xarray by following this example from the docs:
import xarray as xr
import numpy as np
da = xr.DataArray(np.arange(12).reshape((3, 4)), dims=['x', 'y'],
                  coords={'x': [0, 1, 2], 'y': ['a', 'b', 'c', 'd']})
ind_x = xr.DataArray([0, 1], dims=['x'])
ind_y = xr.DataArray([0, 1], dims=['y'])
The output of the array da is as follows:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
So far so good. Now the example shows two ways of indexing: orthogonal (which I'm not interested in here) and vectorized (which is what I want). For the vectorized indexing the following is shown:
In [37]: da[ind_x, ind_x] # vectorized indexing
Out[37]:
<xarray.DataArray (x: 2)>
array([0, 5])
Coordinates:
y (x) <U1 'a' 'b'
* x (x) int64 0 1
The result seems to be what I want, but this feels very strange to me. ind_x (which in theory refers to dims=['x']) is being passed twice, yet it is somehow capable of indexing what appears to be both the x and y dims. As far as I understand, the x dim would be the rows and the y dim would be the columns; is that correct? How come the same ind_x is capable of accessing both the rows and the cols?
This seems to be the concept I need for my problem, but I can't understand how it works or how to extend it to more dimensions. I was expecting this result to be given by da[ind_x, ind_y]; however, that surprisingly enough seems to yield the orthogonal indexing.
Having the example use ind_x twice is probably a little confusing: actually, the dimension name of the indexer doesn't matter at all for the indexing behavior! Observe:
ind_a = xr.DataArray([0, 1], dims=["a"])
da[ind_a, ind_a]
Gives:
<xarray.DataArray (a: 2)>
array([0, 5])
Coordinates:
x (a) int32 0 1
y (a) <U1 'a' 'b'
Dimensions without coordinates: a
The same goes for the orthogonal example:
ind_a = xr.DataArray([0, 1], dims=["a"])
ind_b = xr.DataArray([0, 2], dims=["b"])
da[ind_a, ind_b]
Result:
<xarray.DataArray (a: 2, b: 2)>
array([[0, 2],
[4, 6]])
Coordinates:
x (a) int32 0 1
y (b) <U1 'a' 'c'
Dimensions without coordinates: a, b
The difference is purely in terms of "labeling", as in this case you end up with dimensions without coordinates.
Fancy indexing
Generally stated, I personally do not find "fancy indexing" the most intuitive concept. I did find this example in NEP 21 pretty clarifying: https://numpy.org/neps/nep-0021-advanced-indexing.html
Specifically, this:
Consider indexing a 2D array by two 1D integer arrays, e.g., x[[0, 1], [0, 1]]:
Outer indexing is equivalent to combining multiple integer indices with itertools.product(). The result in this case is another 2D array with all combinations of indexed elements, e.g., np.array([[x[0, 0], x[0, 1]], [x[1, 0], x[1, 1]]]).
Vectorized indexing is equivalent to combining multiple integer indices with zip(). The result in this case is a 1D array containing the diagonal elements, e.g., np.array([x[0, 0], x[1, 1]]).
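The same contrast can be reproduced in plain NumPy (a small illustration, not part of the NEP text): paired index arrays give the zip()-style result, while np.ix_ builds the itertools.product()-style outer result:
import numpy as np
x = np.arange(12).reshape(3, 4)
x[[0, 1], [0, 1]]            # vectorized: array([0, 5])
x[np.ix_([0, 1], [0, 1])]    # outer: array([[0, 1],
                             #               [4, 5]])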
Back to xarray
da[ind_x, ind_y]
Can also be written as:
da.isel(x=ind_x, y=ind_y)
The dimensions are implicit in the order. However, xarray still attempts to broadcast (based on dimension labels), so da[ind_y] mismatches and results in an error. da[ind_a] and da[ind_b] both work.
More dimensions
The dims you provide for the indexer are what determines the shape of the output, not the dimensions of the array you're indexing.
If you want to select single values along the dimensions (so we're zip()-ing through the indexes simultaneously), just make sure that your indexers share the dimension, here for a 3D array:
da = xr.DataArray(
    data=np.arange(3 * 4 * 5).reshape(3, 4, 5),
    coords={
        "x": [1, 2, 3],
        "y": ["a", "b", "c", "d"],
        "z": [1.0, 2.0, 3.0, 4.0, 5.0],
    },
    dims=["x", "y", "z"],
)
ind_along_x = xr.DataArray([0, 1], dims=["new_index"])
ind_along_y = xr.DataArray([0, 2], dims=["new_index"])
ind_along_z = xr.DataArray([0, 3], dims=["new_index"])
da[ind_along_x, ind_along_y, ind_along_z]
Note that the values of the indexers do not have to be the same -- that would be a pretty severe limitation, after all.
Result:
<xarray.DataArray (new_index: 2)>
array([ 0, 33])
Coordinates:
x (new_index) int32 1 2
y (new_index) <U1 'a' 'c'
z (new_index) float64 1.0 4.0
Dimensions without coordinates: new_index

How to delete rows of numpy array by multiple row indices?

I have two lists of indices (idx[0] and idx[1]), and I need to delete the corresponding rows from the numpy array y_test.
y_test
12 11 10
1 2 2
3 2 3
4 1 2
13 1 10
idx[0] = [0,2]
idx[1] = [1,3]
I tried to delete the rows as follows (using ~), but it didn't work:
result = y_test[(~idx[0]+~idx[1]+~idx[2])]
Expected result:
result =
13 1 10
Instead of removing elements, just make a new array with the desired ones. This will keep any future indexing from getting jumbled up and maintain the old array.
import numpy as np
y_test = np.asarray([[12, 11, 10], [1, 2, 2], [3, 2, 3], [4, 1, 2], [13, 1, 10]])
idx = [[0, 2], [1, 3]]
# flatten list of lists
idx_flat = [i for j in idx for i in j]
# assign values that are NOT in your idx list to a new array
result = [row for num, row in enumerate(y_test) if num not in idx_flat]
# cast this however you want it, right now 'result' is a list of np.arrays
print(result)
[array([13, 1, 10])]
For an understanding of the flatten step using list comprehensions check this out
You can use numpy.delete which deletes the subarrays along the axis.
np.delete(y_test, idx, axis=0)
Make sure that idx.dtype is an integer type and use numpy.astype if not.
Your approach did not work because idx is not a boolean index array but holds the actual indices. So ~, which is bitwise negation, produces ~[0, 2] = [-1, -3] (where both sides should be numpy arrays).
I would definitely recommend reading up on the difference between index arrays and boolean index arrays. For boolean index arrays I would suggest using numpy.logical_not and numpy.logical_or.
+ concatenates Python lists but is element-wise addition for numpy arrays.
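As a concrete illustration of the boolean-index route suggested above (a minimal sketch, assuming the same y_test array and idx lists as in the question):
keep = np.ones(len(y_test), dtype=bool)   # boolean index array, all True
keep[np.concatenate(idx)] = False         # mark rows 0, 2, 1, 3 for removal
y_test[keep]
# array([[13,  1, 10]])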
Since you are using NumPy I'd suggest masking in this way.
Setup:
import numpy as np
y_test = np.array([[12, 11, 10],
                   [1, 2, 2],
                   [3, 2, 3],
                   [4, 1, 2],
                   [13, 1, 10]])
idx = np.array([[0, 2], [1, 3]])
Generate the mask:
Generate a mask of ones, then zero out the elements at the indices in idx:
mask = np.ones(len(y_test), dtype = int).reshape(5,1)
mask[idx.flatten()] = 0
Finally apply the mask:
y_test[~np.all(y_test * mask == 0, axis=1)]
#=> [[13 1 10]]
y_test has not been modified.

Numpy: np_array[index_order_array] changes the order of elements in an array?

I'm new to Python and Numpy
Why does array = [1,2,3,4] followed by new_array = array[[3,2,0,1]] result in the elements being reordered according to the inner array?
import numpy as np
array = np.array([10,20,30,40,50])
array_link = np.array(['A','B','C','D','E'])
new_array = np.ndarray(5, dtype=np.int32)
new_array_link = np.ndarray(5, dtype=np.int32)
perm = np.random.permutation(array.shape[0])
new_array = array[perm]
new_array_link = array_link[perm]
print(new_array)
print(new_array_link)
# Output:
# [30 40 10 50 20]
# ['C' 'D' 'A' 'E' 'B']
Here is the Playground
Is this how it is supposed to work? Shouldn't it be initializing a new (perhaps 2D) array with the elements of the inner array (as the first row)?
The first of these two lines is useless: Python does not require that you initialize or 'pre-define' a variable. The first line creates an array; the second also creates one and reassigns the variable, so the original value of new_array is discarded.
new_array = np.ndarray(5, dtype=np.int32)
...
new_array = array[perm]
And as a general rule, np.ndarray is only used for advanced purposes. np.array, np.zeros etc are used to create new arrays.
array is a poor choice of variable name. array looks too much like np.array, and actually confused me when I first copied the above lines.
array = np.array([10,20,30,40,50])
In sum your code does:
In [28]: arr = np.array([10,20,30,40,50])
In [29]: perm = np.random.permutation(arr.shape[0])
In [30]: perm
Out[30]: array([2, 0, 1, 4, 3])
In [31]: arr1 = arr[perm]
In [32]: arr1
Out[32]: array([30, 10, 20, 50, 40])
arr1 is a new array with values selected from arr. arr itself is unchanged.
You could assign values to predefined array this way:
In [35]: arr2 = np.zeros(5, int)
In [36]: arr2
Out[36]: array([0, 0, 0, 0, 0])
In [37]: arr2[:] = arr[perm]
In [38]: arr2
Out[38]: array([30, 10, 20, 50, 40])
In arr[perm], the result has the same shape as perm, in this case a 5-element 1d array. If I turn perm into a (5,1) column array, the result is also a (5,1) array:
In [40]: arr[perm[:,None]]
Out[40]:
array([[30],
[10],
[20],
[50],
[40]])
In [41]: _.shape
Out[41]: (5, 1)
Another example of array indexing - with a (2,2) array:
In [43]: arr[np.array([[0,1],[2,3]])]
Out[43]:
array([[10, 20],
[30, 40]])
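One more point worth noting (a small aside, not from the original answers): this kind of list-based indexing only exists for NumPy arrays. A plain Python list raises an error, which is part of why mixing up the two names is confusing:
lst = [10, 20, 30, 40, 50]
lst[[3, 2, 0, 1]]             # TypeError: list indices must be integers or slices, not list
np.array(lst)[[3, 2, 0, 1]]   # array([40, 30, 10, 20])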
