Store indexes after concatenating a numpy array - python

I have a list like this,
mylist = [
np.array([48.5, 38.0, 40.0]),
np.array([61.5, 52.5, 55.5, 46.5]),
np.array([35.5, 36.5]),
]
For every value, I want the index of the array it came from and its position within that array, placed next to the values in mylist.
I can produce the last column with np.concatenate(mylist), but I don't know how to build the other two efficiently.
expected = np.vstack(
(
np.array([0, 0, 0, 1, 1, 1, 1, 2, 2]),
np.array([0, 1, 2, 0, 1, 2, 3, 0, 1]),
np.array([48.5, 38.0, 40.0, 61.5, 52.5, 55.5, 46.5, 35.5, 36.5]),
)
).T
It can be read like this: 38 is in the first array (index = 0) and it is the second element of that array (index = 1).

This does what you ask, if this is really what you want.
import numpy as np
mylist = [
np.array([48.5, 38. , 40. ]),
np.array([61.5, 52.5, 55.5, 46.5 ]),
np.array([35.5, 36.5])]
a1 = []
a2 = []
for i, l in enumerate(mylist):
    a1.extend([i] * len(l))
    a2.extend(range(len(l)))
final = np.array( [a1, a2, np.concatenate(mylist)] ).T
print(final)
Output:
[[ 0. 0. 48.5]
[ 0. 1. 38. ]
[ 0. 2. 40. ]
[ 1. 0. 61.5]
[ 1. 1. 52.5]
[ 1. 2. 55.5]
[ 1. 3. 46.5]
[ 2. 0. 35.5]
[ 2. 1. 36.5]]

You can use map with len to get the length of each sublist in mylist. Pass those lengths to np.repeat to get the "X" coordinate. Then apply np.arange to each length to get the "Y" coordinate and concatenate the results with np.hstack. Finally, np.column_stack the three arrays together.
lens = list(map(len, mylist))
idx0 = np.repeat(np.arange(len(mylist)), lens) # [0, 0, 0, 1, 1, 1, 1, 2, 2]
idx1 = np.hstack([np.arange(v) for v in lens]) # [0, 1, 2, 0, 1, 2, 3, 0, 1]
vals = np.hstack(mylist) # [48.5, 38. , 40. , 61.5, 52.5, 55.5, 46.5, 35.5, 36.5]
out = np.column_stack([idx0, idx1, vals])
print(out)
[[ 0. 0. 48.5]
[ 0. 1. 38. ]
[ 0. 2. 40. ]
[ 1. 0. 61.5]
[ 1. 1. 52.5]
[ 1. 2. 55.5]
[ 1. 3. 46.5]
[ 2. 0. 35.5]
[ 2. 1. 36.5]]
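If the list holds very many short arrays, the per-array np.arange calls can also be avoided. The following is only a sketch, assuming every element of mylist is one-dimensional and the list is not empty; the helper name flatten_with_indices is made up for illustration. It derives the inner index by subtracting each array's start offset from a global counter.

```python
import numpy as np

def flatten_with_indices(arrays):
    """Return an (N, 3) array of [array index, position in array, value]."""
    lens = np.array([len(a) for a in arrays])
    # Outer index: repeat each array's index once per element.
    idx0 = np.repeat(np.arange(len(arrays)), lens)
    # Inner index: global position minus the start offset of the owning array.
    starts = np.cumsum(lens) - lens
    idx1 = np.arange(lens.sum()) - np.repeat(starts, lens)
    vals = np.concatenate(arrays)
    return np.column_stack([idx0, idx1, vals])

mylist = [
    np.array([48.5, 38.0, 40.0]),
    np.array([61.5, 52.5, 55.5, 46.5]),
    np.array([35.5, 36.5]),
]
out = flatten_with_indices(mylist)
print(out)
```

As in the answers above, the index columns come back as floats, because column_stack promotes everything to the dtype of the values.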

Related

Extend a number of list zero into a multi-dimension array in Python

I am new to Python and I am trying to extend an existing list with a given number of zero rows. Below is my code, but I believe there is a simpler way that also performs better.
missing_len_last_slice = step - len(result_list[-1])
list = []
list_append_zero = np.pad(list, (0, len(list_channels)), 'constant')
for y in range(missing_len_last_slice):
    list.append(list_append_zero)
merge_list_result = np.vstack((result_list[-1], list))
result_list[-1] = merge_list_result
Current:
Length: 5.200
array([[-0.4785, -1.578 ],
[-0.484 , -1.5815],
[-0.483 , -1.584 ],
...,
[-0.13 , -0.9475],
[-0.117 , -0.9315],
[-0.1175, -0.9395]])
Expectation:
Length: 10.000, i.e. extended by 4.800 rows of [0, 0]
array([[-0.4785, -1.578 ],
[-0.484 , -1.5815],
[-0.483 , -1.584 ],
...,
[-0.13 , -0.9475],
[-0.117 , -0.9315],
[-0.1175, -0.9395],
[0, 0],
[0, 0],
...
[0, 0]])
PS: The number of columns in the array is dynamic. In the example it is 2, as in [-0.4785, -1.578 ].
As in my previous answer, you can do this with np.pad:
a = np.array([[-0.4785, -1.578 ],
[-0.484 , -1.5815],
[-0.483 , -1.584 ]])
pad_fill = np.array([0.0, 0.0])
padding_number = 2 # 10000 - a.shape[0]
A_pad = np.pad(a, ((0, padding_number), (0, 0)), constant_values=pad_fill)
# [[-0.4785 -1.578 ]
# [-0.484 -1.5815]
# [-0.483 -1.584 ]
# [ 0. 0. ]
# [ 0. 0. ]]
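Because the question says the number of columns is dynamic, the pad widths can be built from a.ndim instead of being written out by hand. This is a minimal sketch under that assumption; pad_rows and its parameters are hypothetical names, not part of NumPy.

```python
import numpy as np

def pad_rows(a, target_rows, fill=0.0):
    """Append rows of `fill` until `a` has `target_rows` rows."""
    missing = max(target_rows - a.shape[0], 0)
    # Pad only at the end of axis 0; leave every other axis untouched.
    pad_width = [(0, missing)] + [(0, 0)] * (a.ndim - 1)
    return np.pad(a, pad_width, mode="constant", constant_values=fill)

a = np.array([[-0.4785, -1.578],
              [-0.484, -1.5815],
              [-0.483, -1.584]])
padded = pad_rows(a, 5)
print(padded.shape)
```

For the case in the question, the call would be pad_rows(result_list[-1], step), assuming step is the target length.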

How can I retrieve values from a list of numpy arrays with a list of indices?

I have a list of numpy array indices which I created with argsort():
i =
[array([0, 1, 3, 2, 4], dtype=int64),
array([1, 3, 0, 2, 4], dtype=int64),
array([2, 4, 0, 1, 3], dtype=int64),
array([3, 1, 0, 2, 4], dtype=int64),
array([4, 2, 0, 3, 1], dtype=int64)]
This is the corresponding list of arrays with values:
v =
[array([0. , 0.19648367, 0.24237755, 0.200832 , 0.28600039]),
array([0.19648367, 0. , 0.25492185, 0.15594099, 0.31378135]),
array([0.24237755, 0.25492185, 0. , 0.25685254, 0.2042604 ]),
array([0.200832 , 0.15594099, 0.25685254, 0. , 0.29995309]),
array([0.28600039, 0.31378135, 0.2042604 , 0.29995309, 0. ])]
When I try to loop over the lists like this:
for line in i:
v[line]
I get the error:
TypeError: only integer scalar arrays can be converted to a scalar index
But when I try to access them individually like this:
v[0][i[0]]
It works and outputs the values in v[0] in correct order like this:
array([0. , 0.19648367, 0.200832 , 0.24237755, 0.28600039])
I want the arrays in v ordered from the smallest value to biggest.
What am I doing wrong?
This is all easier (and faster) if you don't use a Python list of NumPy arrays, but a multi-dimensional NumPy array instead. Then you have all the great tools from NumPy at your disposal and can avoid slow loops. For example, you can use np.take_along_axis:
import numpy as np
from numpy import array
i = np.array([
[0, 1, 3, 2, 4],
[1, 3, 0, 2, 4],
[2, 4, 0, 1, 3],
[3, 1, 0, 2, 4],
[4, 2, 0, 3, 1]])
v = array([
[0., 0.19648367, 0.24237755, 0.200832 , 0.28600039],
[0.19648367, 0. , 0.25492185, 0.15594099, 0.31378135],
[0.24237755, 0.25492185, 0. , 0.25685254, 0.2042604 ],
[0.200832 , 0.15594099, 0.25685254, 0. , 0.29995309],
[0.28600039, 0.31378135, 0.2042604 , 0.29995309, 0. ]]
)
np.take_along_axis(v,i, 1)
result:
array([[0. , 0.19648367, 0.200832 , 0.24237755, 0.28600039],
[0. , 0.15594099, 0.19648367, 0.25492185, 0.31378135],
[0. , 0.2042604 , 0.24237755, 0.25492185, 0.25685254],
[0. , 0.15594099, 0.200832 , 0.25685254, 0.29995309],
[0. , 0.2042604 , 0.28600039, 0.29995309, 0.31378135]])
Loop through each line of i, and loop through each line of v at the same time using enumerate:
import numpy as np
i = np.array([[0, 1, 3, 2, 4], [1, 3, 0, 2, 4], [2, 4, 0, 1, 3], [3, 1, 0, 2, 4], [4, 2, 0, 3, 1]])
v = np.array([[0. , 0.19648367, 0.24237755, 0.200832 , 0.28600039],
[0.19648367, 0. , 0.25492185, 0.15594099, 0.31378135],
[0.24237755, 0.25492185, 0. , 0.25685254, 0.2042604 ],
[0.200832 , 0.15594099, 0.25685254, 0. , 0.29995309],
[0.28600039, 0.31378135, 0.2042604 , 0.29995309, 0. ]] )
# you can rearrange each line of v by using indices in each row of i
for index, line in enumerate(i):
    print(v[index][line])
Output:
[0. 0.19648367 0.200832 0.24237755 0.28600039]
[0. 0.15594099 0.19648367 0.25492185 0.31378135]
[0. 0.2042604 0.24237755 0.25492185 0.25685254]
[0. 0.15594099 0.200832 0.25685254 0.29995309]
[0. 0.2042604 0.28600039 0.29995309 0.31378135]
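If v and i have to stay Python lists of arrays (for example because the rows differ in length), pairing them with zip avoids the error from the question. Indexing the list v with an array fails, while indexing each array in v with its matching index array works. A small sketch with shortened, made-up data:

```python
import numpy as np

# Made-up rows of different lengths, kept as a plain Python list.
v = [np.array([0.3, 0.1, 0.2]), np.array([0.5, 0.4])]
i = [np.argsort(row) for row in v]

# Pair each value array with its own index array.
sorted_rows = [row[idx] for row, idx in zip(v, i)]
print(sorted_rows)
```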

Set numpy array elements to zero for each row's smallest 2 elements [duplicate]

For example
E =
array([[ 10. , 2.38761596, 7.00090613, 4.51495754],
[ 2.38761596, 10. , 2.80035826, 1. ],
[ 7.00090613, 2.80035826, 10. , 5.95109207],
[ 4.51495754, 1. , 5.95109207, 10. ]])
The indices of the two smallest elements in each row can be obtained with argsort:
IndexSortE = np.argsort(E)
smallest2 = IndexSortE[:,0:2]
smallest2
array([[1, 3],
[3, 0],
[1, 3],
[1, 0]])
Now how do I get E0 like this?
E0 =
array([[ 10. , 0.00000000, 7.00090613, 0.00000000],
[ 0.00000000, 10. , 2.80035826, 0.00000000],
[ 7.00090613, 0.00000000, 10. , 0.00000000],
[ 0.00000000, 0.00000000, 5.95109207, 10. ]])
Thanks
You can create another array of row indices; then take advantage of advanced indexing to modify the corresponding values:
E[np.arange(E.shape[0])[:,None], smallest2] = 0
E
#array([[ 10. , 0. , 7.00090613, 0. ],
# [ 0. , 10. , 2.80035826, 0. ],
# [ 7.00090613, 0. , 10. , 0. ],
# [ 0. , 0. , 5.95109207, 10. ]])
To see why this works, use np.broadcast_arrays to show how the two index arrays are broadcast against each other:
np.broadcast_arrays(np.arange(E.shape[0])[:,None], smallest2)
# [array([[0, 0],
# [1, 1],
# [2, 2],
# [3, 3]]), array([[1, 3],
# [3, 0],
# [1, 3],
# [1, 0]])]
This gives a list of length two: the first array holds the row indices and the second holds the column indices. Under the advanced indexing rules, this pair addresses the elements at
(0, 1), (0, 3),
(1, 3), (1, 0),
...
etc.
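The same assignment can be written with np.put_along_axis, which takes the per-row column indices directly, so no explicit row index array is needed. A sketch that works on a copy, so the original E stays intact:

```python
import numpy as np

E = np.array([[10.0, 2.38761596, 7.00090613, 4.51495754],
              [2.38761596, 10.0, 2.80035826, 1.0],
              [7.00090613, 2.80035826, 10.0, 5.95109207],
              [4.51495754, 1.0, 5.95109207, 10.0]])

# Column indices of the two smallest entries in every row.
smallest2 = np.argsort(E, axis=1)[:, :2]

E0 = E.copy()
# Writes 0 at (row, smallest2[row, k]) for every row and k, in place.
np.put_along_axis(E0, smallest2, 0, axis=1)
print(E0)
```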

Python 2.7 appending column to 2d array

I have 2 arrays, x and y:
x = [[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 9. 0. 3. 6.]]
y = [[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 0. 1.]]
I want a z matrix, as: z = [y[0], x, y[1], y[2]]:
[[ 1. 1. 2. 3. 4. 0. 0.]
[ 0. 5. 6. 7. 8. 1. 0.]
[ 0. 9. 0. 3. 6. 0. 1.]]
So I made this code:
z = np.c_[y[0], x]
for j in range(n):
    z = np.c_[x, y[j]]
But it is not saving the matrix. My resulting z was just the last operation:
[[ 1. 2. 3. 4. 0.]
[ 5. 6. 7. 8. 0.]
[ 9. 0. 3. 6. 1.]]
How could I save the changes made on the matrix? I also tried to numpy.append() the same way, but it gives an error message:
ValueError: all the input arrays must have same number of dimensions
Using np.c_ to stack columns of y and x:
np.c_[np.array(y)[0],np.asanyarray(x),np.array(y)[1],np.array(y)[2]]
Out[536]:
array([[1, 1, 2, ..., 4, 0, 0],
[0, 5, 6, ..., 8, 1, 0],
[0, 9, 0, ..., 6, 0, 1]])
Or you can use np.roll to shift the columns before stacking them and shift again afterwards.
np.roll(np.c_[np.array(x),np.roll(np.array(y),-1,axis=1)],1,axis=1)
Out[549]:
array([[1, 1, 2, ..., 4, 0, 0],
[0, 5, 6, ..., 8, 1, 0],
[0, 9, 0, ..., 6, 0, 1]])
I think that the command you are looking for is numpy.insert(a, pos, col, axis = 1). If you make z = insert(y, 1, x, axis = 1) it will insert a new column on y with the values from x, and save the output in z.
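The target layout can also be written directly as a horizontal stack of column slices. This is a sketch of that alternative using np.hstack rather than any of the calls above, with x and y rebuilt from the question:

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0],
              [9.0, 0.0, 3.0, 6.0]])
y = np.eye(3)

# First column of y, then all of x, then the remaining columns of y.
# The slices y[:, :1] and y[:, 1:] stay 2-D, so the shapes line up.
z = np.hstack([y[:, :1], x, y[:, 1:]])
print(z)
```

Unlike the loop in the question, z is built in a single expression, so no intermediate result gets overwritten.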

How can I create a sparse matrix instead of a dense one in this program?

I have this delta function, which has 3 cases: mask1, mask2 and, if neither of them is satisfied, delta = 0, since res starts out as np.zeros.
def delta(r, dr):
    res = np.zeros(r.shape)
    mask1 = (r >= 0.5*dr) & (r <= 1.5*dr)
    res[mask1] = (5-3*np.abs(r[mask1])/dr \
        - np.sqrt(-3*(1-np.abs(r[mask1])/dr)**2+1)) \
        /(6*dr)
    mask2 = np.logical_not(mask1) & (r <= 0.5*dr)
    res[mask2] = (1+np.sqrt(-3*(r[mask2]/dr)**2+1))/(3*dr)
    return res
Then I have this other function where I call the former and I construct an array, E
def matrix_E(nk,X,Y,xhi,eta,dx,dy):
    rx = abs(X[np.newaxis,:] - xhi[:,np.newaxis])
    ry = abs(Y[np.newaxis,:] - eta[:,np.newaxis])
    deltx = delta(rx,dx)
    delty = delta(ry,dy)
    E = deltx*delty
    return E
The thing is that most of the elements of E belong to the third case of delta, 0. Most means about 99%.
So I would like to have a sparse matrix instead of a dense one, and not store the 0 elements, in order to save memory.
Any ideas on how I could do it?
The normal way to create a sparse matrix is to construct three 1d arrays, with the nonzero values, and their i and j indexes. Then pass them to the coo_matrix function.
The coordinates don't have to be in order, so you could construct the arrays for the 2 nonzero mask cases and concatenate them.
Here's a sample construction using 2 masks
In [107]: x=np.arange(5)
In [108]: i,j,data=[],[],[]
In [110]: mask1=x%2==0
In [111]: mask2=x%2!=0
In [112]: i.append(x[mask1])
In [113]: j.append((x*2)[mask1])
In [114]: i.append(x[mask2])
In [115]: j.append(x[mask2])
In [116]: i=np.concatenate(i)
In [117]: j=np.concatenate(j)
In [118]: i
Out[118]: array([0, 2, 4, 1, 3])
In [119]: j
Out[119]: array([0, 4, 8, 1, 3])
In [120]: M=sparse.coo_matrix((x,(i,j)))
In [121]: print(M)
(0, 0) 0
(2, 4) 1
(4, 8) 2
(1, 1) 3
(3, 3) 4
In [122]: M.A
Out[122]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 3, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 4, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 2]])
A coo format stores those 3 arrays as is, but they get sorted and cleaned up when converted to other formats and printed.
I can work on adapting this to your case, but this may be enough to get you started.
It looks like X,Y,xhi,eta are 1d arrays. rx and ry are then 2d. delta returns a result the same shape as its input. E = deltx*delty suggests that deltx and delty are the same shape (or at least broadcastable).
Since a sparse matrix has a .multiply method to do element-wise multiplication, we can focus on producing sparse delta matrices.
If you can afford the memory to make rx and a couple of masks, then you can also afford to make deltx (all the same size). Even though deltx has lots of zeros, it is probably fastest to make it dense.
But let's try to recast the delta calculation as a sparse build.
This looks like the essence of what you are doing in delta, at least with one mask:
start with a 2d array:
In [138]: r = np.arange(24).reshape(4,6)
In [139]: mask1 = (r>=8) & (r<=16)
In [140]: res1 = r[mask1]*0.2
In [141]: I,J = np.where(mask1)
the resulting vectors are:
In [142]: I
Out[142]: array([1, 1, 1, 1, 2, 2, 2, 2, 2], dtype=int32)
In [143]: J
Out[143]: array([2, 3, 4, 5, 0, 1, 2, 3, 4], dtype=int32)
In [144]: res1
Out[144]: array([ 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. , 3.2])
Make a sparse matrix:
In [145]: M=sparse.coo_matrix((res1,(I,J)), r.shape)
In [146]: M.A
Out[146]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1.6, 1.8, 2. , 2.2],
[ 2.4, 2.6, 2.8, 3. , 3.2, 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ]])
I could make another sparse matrix with mask2, and add the two.
In [147]: mask2 = (r>=17) & (r<=22)
In [148]: res2 = r[mask2]*-0.4
In [149]: I,J = np.where(mask2)
In [150]: M2=sparse.coo_matrix((res2,(I,J)), r.shape)
In [151]: M2.A
Out[151]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , -6.8],
[-7.2, -7.6, -8. , -8.4, -8.8, 0. ]])
...
In [153]: (M1+M2).A
Out[153]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1.6, 1.8, 2. , 2.2],
[ 2.4, 2.6, 2.8, 3. , 3.2, -6.8],
[-7.2, -7.6, -8. , -8.4, -8.8, 0. ]])
Or I could concatenate the res1 and res2, etc and make one sparse matrix:
In [156]: I1,J1 = np.where(mask1)
In [157]: I2,J2 = np.where(mask2)
In [158]: res12=np.concatenate((res1,res2))
In [159]: I12=np.concatenate((I1,I2))
In [160]: J12=np.concatenate((J1,J2))
In [161]: M12=sparse.coo_matrix((res12,(I12,J12)), r.shape)
In [162]: M12.A
Out[162]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1.6, 1.8, 2. , 2.2],
[ 2.4, 2.6, 2.8, 3. , 3.2, -6.8],
[-7.2, -7.6, -8. , -8.4, -8.8, 0. ]])
Here I chose the masks so the nonzero values don't overlap, but both methods would work if they did. It's a deliberate design feature of the coo format that values for repeated indices are summed. It's a very handy feature when creating sparse matrices for finite element problems.
I can also get index arrays by creating a sparse matrix from the mask:
In [179]: rmask1=sparse.coo_matrix(mask1)
In [180]: rmask1.row
Out[180]: array([1, 1, 1, 1, 2, 2, 2, 2, 2], dtype=int32)
In [181]: rmask1.col
Out[181]: array([2, 3, 4, 5, 0, 1, 2, 3, 4], dtype=int32)
In [184]: sparse.coo_matrix((res1, (rmask1.row, rmask1.col)),rmask1.shape).A
Out[184]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1.6, 1.8, 2. , 2.2],
[ 2.4, 2.6, 2.8, 3. , 3.2, 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ]])
I can't, though, create a mask from a sparse version of r with (r>=8) & (r<=16). That kind of inequality test has not been implemented for sparse matrices. But that might not matter, since r is probably not sparse.
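Putting the pieces together for the delta function from the question, the two masks can be turned straight into the three 1d arrays that coo_matrix expects. This is only a sketch: delta_triplets is a hypothetical helper, the sample r is made up, and the block itself uses NumPy alone so the triplets can be checked without SciPy.

```python
import numpy as np

def delta_triplets(r, dr):
    """Return (rows, cols, vals) for the entries of delta(r, dr) hit by a mask."""
    mask1 = (r >= 0.5 * dr) & (r <= 1.5 * dr)
    mask2 = ~mask1 & (r <= 0.5 * dr)
    q1 = np.abs(r[mask1]) / dr
    q2 = r[mask2] / dr
    # Same two formulas as in the question's delta, evaluated only where needed.
    vals1 = (5 - 3 * q1 - np.sqrt(-3 * (1 - q1) ** 2 + 1)) / (6 * dr)
    vals2 = (1 + np.sqrt(-3 * q2 ** 2 + 1)) / (3 * dr)
    # np.nonzero walks the mask in the same row-major order as r[mask].
    rows1, cols1 = np.nonzero(mask1)
    rows2, cols2 = np.nonzero(mask2)
    return (np.concatenate((rows1, rows2)),
            np.concatenate((cols1, cols2)),
            np.concatenate((vals1, vals2)))

# Made-up sample: 11 grid points against 2 marker positions.
X = np.linspace(0.0, 1.0, 11)
xhi = np.array([0.2, 0.7])
r = np.abs(X[np.newaxis, :] - xhi[:, np.newaxis])

rows, cols, vals = delta_triplets(r, dr=0.1)
# With SciPy available, the triplets go into the constructor used above:
#   M = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=r.shape)
# Dense check only; the two masks never overlap, so plain assignment is safe.
dense = np.zeros(r.shape)
dense[rows, cols] = vals
print(len(vals), "stored values out of", r.size)
```

The rx and ry arrays and the masks are still dense here, so this saves memory only in what is stored afterwards, which matches the caveat above.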
