I am quite new to Python and have read lots of SO questions on this topic, but none of them answers my needs.
I end up with an ndarray:
[[1, 2, 3]
[4, 5, 6]]
Now I want to pad each element (e.g. [1, 2, 3]) with a tailored padding just for that element. Of course I could do it in a for loop and append each result to a new ndarray but isn't there a faster and cleaner way I could apply this over the whole ndarray at once?
I imagined it could work like:
myArray = [[1, 2, 3],
[4, 5, 6]]
paddings = [(1, 2),
(2, 1)]
myArray = np.pad(myArray, paddings, 'constant')
But of course this just outputs:
[[0 0 0 0 0 0 0 0 0]
[0 0 1 2 3 0 0 0 0]
[0 0 3 4 5 0 0 0 0]
[0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0]]
Which is not what I need. The target result would be:
[[0 1 2 3 0 0]
[0 0 4 5 6 0]]
How can I achieve this using numpy?
Here is a loop-based solution that first creates a zeros array sized from the input array and the paddings. Explanation in comments:
In [192]: myArray
Out[192]:
array([[1, 2, 3],
[4, 5, 6]])
In [193]: paddings
Out[193]:
array([[1, 2],
[2, 1]])
# calculate desired shape; needed for initializing `padded_arr`
# (assumes every row has the same total padding, left + right)
In [194]: target_shape = (myArray.shape[0], myArray.shape[1] + paddings[0].sum())
In [195]: padded_arr = np.zeros(target_shape, dtype=np.int32)
In [196]: padded_arr
Out[196]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]], dtype=int32)
After this, we can use a for loop to slot the sequences from myArray into place, based on the values from paddings:
In [199]: for idx in range(paddings.shape[0]):
     ...:     left = paddings[idx, 0]
     ...:     padded_arr[idx, left:left + myArray.shape[1]] = myArray[idx]
     ...:
In [200]: padded_arr
Out[200]:
array([[0, 1, 2, 3, 0, 0],
[0, 0, 4, 5, 6, 0]], dtype=int32)
We have to resort to a loop-based solution because numpy.pad() doesn't yet support per-row pad widths, even with all the modes and keyword arguments it already provides.
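If the loop becomes a bottleneck, the same slot-filling can be sketched with integer-array indexing instead. This is only a sketch, not a built-in padding mode: the names my_array, rows and cols are illustrative, and the output width is assumed to be the row length plus the largest total padding, so rows with less total padding simply get extra trailing zeros.

```python
import numpy as np

my_array = np.array([[1, 2, 3],
                     [4, 5, 6]])
paddings = np.array([[1, 2],
                     [2, 1]])  # (left, right) padding for each row

n_rows, n_cols = my_array.shape
# assumed output width: row length plus the largest total padding
width = n_cols + paddings.sum(axis=1).max()
padded = np.zeros((n_rows, width), dtype=my_array.dtype)

# target column of every element: left padding of its row plus its own offset
rows = np.arange(n_rows)[:, None]             # shape (n_rows, 1)
cols = paddings[:, [0]] + np.arange(n_cols)   # shape (n_rows, n_cols)
padded[rows, cols] = my_array

print(padded)
```

For the example input this fills the same positions as the loop above, without iterating over rows in Python.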
Suppose we have a segmentation problem with 5 classes (0, 1, 2, 3, 4) and the following 3D mask volumes (i.e. 3D numpy arrays):
# Ground truth mask
y_true = np.array([[[2, 1, 4], [0, 1, 1], [2, 1, 0]],
[[2, 2, 2], [0, 1, 0], [0, 1, 1]],
[[2, 4, 4], [2, 1, 4], [2, 1, 1]]])
# Predicted mask
y_pred = np.array([[[2, 0, 4], [0, 2, 1], [2, 0, 0]],
[[2, 4, 0], [0, 1, 2], [0, 4, 1]],
[[2, 0, 4], [1, 1, 4], [2, 2, 1]]])
How can I compute the Hausdorff distance between them? I've looked into Monai's implementation but couldn't figure out the meaning of the compute_hausdorff_distance output.
I implemented a one-hot encoder, since Monai requires the inputs to be one-hot encoded.
def one_hot_encode(array):
    return np.eye(5)[array].astype(dtype=int)
Now we have that:
# Ground truth mask
y_true = [[[[0 0 1 0 0]
[0 1 0 0 0]
[0 0 0 0 1]]
...
[[1 0 0 0 0]
[0 1 0 0 0]
[0 1 0 0 0]]]
# Predicted mask
y_pred = [[[[0 0 1 0 0]
[1 0 0 0 0]
[0 0 0 0 1]]
...
[[0 0 1 0 0]
[0 0 1 0 0]
[0 1 0 0 0]]]]
The output of Monai's implementation is:
>>> compute_hausdorff_distance(one_hot_encode(y_pred), one_hot_encode(y_true), include_background=True)
>>> [[1. 1. 1. ]
[2. 1.41421356 3. ]
[2.23606798 1. 1. ]]
Looking at it, I can see that it is computing Euclidean distances. It looks like it is treating labels as positions, but shouldn't the output have shape 3x3x3, just like the masks?
Also, the SciPy implementation only works for 2D masks/arrays. Would it be right to compute the Hausdorff distance slice-wise, i.e., slice by slice, and afterwards average the per-slice Hausdorff distances? Or does this approach violate the Hausdorff distance principle for 3D data?
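For comparison, here is a minimal NumPy-only sketch of a Hausdorff distance computed on the whole volume rather than slice by slice. It is not Monai's or SciPy's implementation: the helper name hausdorff_distance is hypothetical, it treats every voxel of a class as a point at its integer index (unit voxel spacing assumed), and it returns NaN when a class is missing from either mask. As far as I can tell from Monai's documentation, its metric expects batch-first, channel-first input and returns one value per (batch, class) pair, which may explain the 3x3 output above.

```python
import numpy as np

def hausdorff_distance(mask_a, mask_b):
    """Symmetric Hausdorff distance between two boolean masks of any dimension."""
    pts_a = np.argwhere(mask_a)  # (n_a, ndim) voxel coordinates
    pts_b = np.argwhere(mask_b)  # (n_b, ndim) voxel coordinates
    if len(pts_a) == 0 or len(pts_b) == 0:
        return np.nan  # undefined when one of the point sets is empty
    # pairwise Euclidean distances, shape (n_a, n_b)
    dists = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    # largest nearest-neighbour distance, taken in both directions
    return max(dists.min(axis=1).max(), dists.min(axis=0).max())

y_true = np.array([[[2, 1, 4], [0, 1, 1], [2, 1, 0]],
                   [[2, 2, 2], [0, 1, 0], [0, 1, 1]],
                   [[2, 4, 4], [2, 1, 4], [2, 1, 1]]])
y_pred = np.array([[[2, 0, 4], [0, 2, 1], [2, 0, 0]],
                   [[2, 4, 0], [0, 1, 2], [0, 4, 1]],
                   [[2, 0, 4], [1, 1, 4], [2, 2, 1]]])

# one distance per class, computed on the full 3D volume
per_class = {c: hausdorff_distance(y_true == c, y_pred == c) for c in range(5)}
print(per_class)
```

The pairwise distance matrix grows with the product of the two point counts, so for large volumes a spatial index or a library routine would be a better fit than this brute-force version.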
Let's say I have a symmetric n-by-n array A and a 1D array x of length n, where the rows/columns of A correspond to the entries of x, and x is ordered. Now assume both A and x are randomly rearranged, so that the rows/columns still correspond but they're no longer in order. How can I manipulate A to recover the correct order?
As an example: x = array([1, 3, 2, 0]) and
A = array([[1, 3, 2, 0],
[3, 9, 6, 0],
[2, 6, 4, 0],
[0, 0, 0, 0]])
so the mapping from x to A in this example is A[i][j] = x[i]*x[j]. x should be sorted like array([0, 1, 2, 3]) and I want to arrive at
A = array([[0, 0, 0, 0],
[0, 1, 2, 3],
[0, 2, 4, 6],
[0, 3, 6, 9]])
I guess that OP is looking for a flexible way to use indices that sort both the rows and the columns of the mapping at once. What is more, OP might be interested in doing it in reverse, i.e. recovering the initial view of the mapping if it's lost.
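def mapping(x, my_map, return_index=True, return_inverse=True):
    idx = np.argsort(x)               # permutation that sorts x
    out = my_map(x[idx], x[idx])      # mapping evaluated on the sorted x
    inv = np.empty_like(idx)
    inv[idx] = np.arange(len(idx))    # inverse permutation: sorted -> original order
    return out, idx, inv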
x = np.array([1, 3, 2, 0])
a, idx, inv = mapping(x, np.multiply.outer) #sorted mapping
b = np.multiply.outer(x, x) #straight mapping
print(b)
>>> [[1 3 2 0]
[3 9 6 0]
[2 6 4 0]
[0 0 0 0]]
print(a)
>>> [[0 0 0 0]
[0 1 2 3]
[0 2 4 6]
[0 3 6 9]]
np.array_equal(b, a[np.ix_(inv, inv)]) #sorted to straight
>>> True
np.array_equal(a, b[np.ix_(idx, idx)]) #straight to sorted
>>> True
A simple implementation would be
idx = np.argsort(x)
A = A[idx, :]
A = A[:, idx]
Another possibility would be (all credit to @mathfux):
A[np.ix_(idx, idx)]
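As a small self-contained check of the np.ix_ variant on the example from the question (a sketch; the variable names A_sorted and x_sorted are illustrative):

```python
import numpy as np

x = np.array([1, 3, 2, 0])
A = np.multiply.outer(x, x)      # A[i][j] = x[i] * x[j], in shuffled order

idx = np.argsort(x)              # permutation that sorts x
x_sorted = x[idx]
A_sorted = A[np.ix_(idx, idx)]   # apply the same permutation to rows and columns

print(x_sorted)
print(A_sorted)
```

np.ix_ builds an open mesh from the two index arrays, so rows and columns are permuted in one indexing step and the original A is left unchanged.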
You can use argsort and fancy indexing:
idx = np.argsort(x)
A2 = A[idx[:, None], idx[None, :]]
output:
array([[0, 0, 0, 0],
[0, 1, 2, 3],
[0, 2, 4, 6],
[0, 3, 6, 9]])
I need to convert many arrays into one matrix, where each array becomes one column. I tried np.column_stack, but it doesn't work for me. I want to go from this:
[1 0 0 ... 0 0 1]
[1 0 0 ... 0 0 1]
[1 0 0 ... 0 0 1]
[1 0 0 ... 0 0 1]
[1 0 0 ... 0 0 1]
[1 0 0 ... 0 0 1]
to this
[1 1 1 1 1 1
0 0 0 0 0 0
0 0 0 0 0 0
. . . . . .
. . . . . .
. . . . . .
0 0 0 0 0 0
0 0 0 0 0 0
1 1 1 1 1 1 ]
So you have a list of arrays:
In [3]: alist = [np.array([1,0,0,1]) for i in range(3)]
In [4]: alist
Out[4]: [array([1, 0, 0, 1]), array([1, 0, 0, 1]), array([1, 0, 0, 1])]
Join them to become rows of a 2d array:
In [5]: np.vstack(alist)
Out[5]:
array([[1, 0, 0, 1],
[1, 0, 0, 1],
[1, 0, 0, 1]])
to become columns:
In [6]: np.column_stack(alist)
Out[6]:
array([[1, 1, 1],
[0, 0, 0],
[0, 0, 0],
[1, 1, 1]])
Your code in the comments is unclear, but:
for i in range(6):
    np.column_stack((arrays[i]))
doesn't make sense, nor does it follow the column_stack docs. column_stack makes a new array; it does not operate in place. List append does operate in place and is a good choice when building a list iteratively, but it should not be taken as a model for building arrays iteratively.
All the concatenate and stack functions take a list of arrays as input. Take advantage of that. And remember, they return a new array on each call. (That applies to np.append as well, but I discourage using it.)
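A minimal sketch of that pattern, assuming the arrays are produced one at a time in a loop (the list name columns and the sample data are made up for illustration):

```python
import numpy as np

columns = []
for i in range(6):
    # stand-in for however each 1D array is actually produced
    arr = np.array([1, 0, 0, 0, 0, 0, 1])
    columns.append(arr)            # list append works in place and is cheap

# a single call at the end turns each 1D array into one column
matrix = np.column_stack(columns)
print(matrix.shape)
```

Collecting in a list and stacking once avoids reallocating a growing array on every iteration.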
Another option in the stack family:
In [7]: np.stack(alist, axis=1)
Out[7]:
array([[1, 1, 1],
[0, 0, 0],
[0, 0, 0],
[1, 1, 1]])
I would put all the arrays in one list and then reshape it
import numpy as np
l=[[1,1,1,0,1,1],[1,0,0,1,0,1]]
l=np.reshape(l,(len(l)*len(l[0]),1))
Since what you want is basically vertical stacking of 1D arrays, it makes sense to use np.vstack and then transpose the result using .T:
my_array = np.array([1,0,0,0,0,0,1])
result = np.vstack([my_array] * 6).T
Here I assume you just copy the 1D array 6 times, but alternatively you can pass a list of 1D arrays as an argument to np.vstack.
You can use numpy.asmatrix as shown below. The last steps convert the matrix to a one-column matrix, as requested.
EDIT
As hpaulj pointed out, np.array (ndarray) is typically used more now, but if you are using a matrix type, the solution below works for this example.
import numpy as np
a1 = [ 1, 2, 3, 4, 5]
a2 = [ 6, 7, 8, 9, 10]
a3 = [11, 12, 13, 14, 15]
mat = np.asmatrix([a1, a2, a3])
mat
## matrix([[ 1, 2, 3, 4, 5],
## [ 6, 7, 8, 9, 10],
## [11, 12, 13, 14, 15]])
mat.shape
## (3, 5)
### If you want to reshape the final matrix
mat2 = mat.reshape(1, 15)
mat2.shape
## (1, 15)
### Convert to 1 column: You can also transpose it.
mat2.transpose().shape
## (15, 1)
Suppose I have an array
[[0 2 1]
[1 0 1]
[2 1 1]]
and I want to convert it into a tensor of the form
[[[1 0 0]
[0 1 0]
[0 0 0]]
[[0 0 1]
[1 0 1]
[0 1 1]]
[[0 1 0]
[0 0 0]
[1 0 0]]]
Where each depth layer (index i) is a binary mask showing where i appears in the input.
I have written code for this which works correctly but is too slow for any use. Can I replace the loop in this function with another vectorized operation?
def im2segmap(im, depth):
    # one binary mask per class, stacked along the first axis
    tensor = np.zeros((depth, im.shape[0], im.shape[1]))
    for c in range(depth):
        rows, cols = np.argwhere(im == c).T
        tensor[c, rows, cols] = 1
    return tensor
Use broadcasting -
(a==np.arange(num_classes)[:,None,None]).astype(int)
Or with builtin outer comparison -
(np.equal.outer(range(num_classes),a)).astype(int)
If you need an integer dtype, use uint8; or keep the result boolean by skipping the conversion altogether, which is faster still.
Sample run -
In [42]: a = np.array([[0,2,1],[1,0,1],[2,1,1]])
In [43]: num_classes = 3 # or depth
In [44]: (a==np.arange(num_classes)[:,None,None]).astype(int)
Out[44]:
array([[[1, 0, 0],
[0, 1, 0],
[0, 0, 0]],
[[0, 0, 1],
[1, 0, 1],
[0, 1, 1]],
[[0, 1, 0],
[0, 0, 0],
[1, 0, 0]]])
To have the depth/num_classes as the third dim, extend the input array and then compare against the range array -
(a[...,None]==np.arange(num_classes)).astype(int)
(np.equal.outer(im, range(num_classes))).astype(int)
(np.equal.outer(im, range(num_classes))).astype(np.uint8) # lower prec
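To check that the broadcasting version matches the loop, here is a small sketch comparing the two on a random, non-square image. The function names im2segmap_loop and im2segmap_broadcast are illustrative, and the class axis is assumed to come first.

```python
import numpy as np

def im2segmap_loop(im, depth):
    # reference version: fill one binary mask per class in a Python loop
    tensor = np.zeros((depth, im.shape[0], im.shape[1]), dtype=np.uint8)
    for c in range(depth):
        rows, cols = np.argwhere(im == c).T
        tensor[c, rows, cols] = 1
    return tensor

def im2segmap_broadcast(im, depth):
    # compare the (H, W) image against a (depth, 1, 1) column of class ids
    return (im == np.arange(depth)[:, None, None]).astype(np.uint8)

rng = np.random.default_rng(0)
im = rng.integers(0, 3, size=(4, 5))   # labels in {0, 1, 2}

loop_out = im2segmap_loop(im, 3)
fast_out = im2segmap_broadcast(im, 3)
print(fast_out.shape, np.array_equal(loop_out, fast_out))
```

Every pixel carries exactly one label, so summing the result over the class axis gives an array of ones.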
I have a big csr_matrix and I want to add over rows and obtain a new csr_matrix with the same number of columns but reduced number of rows. (Context: The matrix is a document-term matrix obtained from sklearn CountVectorizer and I want to be able to quickly combine documents according to codes associated with these documents)
For a minimal example, this is my matrix:
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse import vstack
row = np.array([0, 4, 1, 3, 2])
col = np.array([0, 2, 2, 0, 1])
dat = np.array([1, 2, 3, 4, 5])
A = csr_matrix((dat, (row, col)), shape=(5, 5))
print(A.toarray())
[[1 0 0 0 0]
[0 0 3 0 0]
[0 5 0 0 0]
[4 0 0 0 0]
[0 0 2 0 0]]
Now let's say I want a new matrix B in which rows (1, 4) and (2, 3, 5) are combined by summing them, which would look something like this:
[[5 0 0 0 0]
[0 5 5 0 0]]
And should be again in sparse format (because the real data I'm working with is large). I tried to sum over slices of the matrix and then stack it:
idx1 = [1, 4]
idx2 = [2, 3, 5]
A_sub1 = A[idx1, :].sum(axis=1)
A_sub2 = A[idx2, :].sum(axis=1)
B = vstack((A_sub1, A_sub2))
But this gives me the summed up values just for the non-zero columns in the slice, so I can't combine it with the other slices because the number of columns in the summed slices are different.
I feel like there must be an easy way to do this. But I couldn't find any discussion of this online or in the documentation. What am I missing?
Thank you for your help
Note that you can do this by carefully constructing another matrix. Here's how it would work for a dense matrix:
>>> S = np.array([[1, 0, 0, 1, 0,], [0, 1, 1, 0, 1]])
>>> np.dot(S, A.toarray())
array([[5, 0, 0, 0, 0],
[0, 5, 5, 0, 0]])
>>>
The sparse version is only a little more complicated. The information about which rows should be summed together is encoded in row:
col = range(5)
row = [0, 1, 1, 0, 1]
dat = [1, 1, 1, 1, 1]
S = csr_matrix((dat, (row, col)), shape=(2, 5))
result = S * A  # for csr_matrix, * is the matrix product
# check that the result is another sparse matrix
print(type(result))
# check that the values are the ones we want
print(result.toarray())
Output:
<class 'scipy.sparse.csr.csr_matrix'>
[[5 0 0 0 0]
[0 5 5 0 0]]
You can handle more rows in your output by including higher values in row and extending the shape of S accordingly.
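To generalize this, the selector matrix can be built from an array of group codes, one per input row. The following is a sketch under assumptions, not a SciPy helper: the function name sum_rows_by_group is hypothetical, and the codes are assumed to be integers 0..k-1.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

def sum_rows_by_group(A, groups):
    """Sum the rows of sparse matrix A that share the same group code."""
    groups = np.asarray(groups)
    n_rows = A.shape[0]
    n_groups = groups.max() + 1          # assumes codes are 0..k-1
    # S[g, i] = 1 when input row i belongs to output row g
    S = csr_matrix((np.ones(n_rows, dtype=A.dtype), (groups, np.arange(n_rows))),
                   shape=(n_groups, n_rows))
    return S @ A                         # sparse matrix product, stays sparse

row = np.array([0, 4, 1, 3, 2])
col = np.array([0, 2, 2, 0, 1])
dat = np.array([1, 2, 3, 4, 5])
A = csr_matrix((dat, (row, col)), shape=(5, 5))

# rows 1 and 4 (1-based) go to group 0; rows 2, 3 and 5 go to group 1
B = sum_rows_by_group(A, [0, 1, 1, 0, 1])
print(B.toarray())
```

If the document codes are arbitrary labels rather than 0..k-1, something like np.unique(codes, return_inverse=True) could be used to map them to consecutive integers first.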
The indexing should be:
idx1 = [0, 3] # rows 1 and 4
idx2 = [1, 2, 4] # rows 2,3 and 5
Then you need to keep A_sub1 and A_sub2 in sparse format and use axis=0:
A_sub1 = csr_matrix(A[idx1, :].sum(axis=0))
A_sub2 = csr_matrix(A[idx2, :].sum(axis=0))
B = vstack((A_sub1, A_sub2))
B.toarray()
array([[5, 0, 0, 0, 0],
[0, 5, 5, 0, 0]])
Note: I think the A[idx, :].sum(axis=0) operations convert the sparse matrices to dense ones, so @Mr_E's answer is probably better.
Alternatively, it works when you use axis=0 and np.vstack (as opposed to scipy.sparse.vstack):
A_sub1 = A[idx1, :].sum(axis=0)
A_sub2 = A[idx2, :].sum(axis=0)
np.vstack((A_sub1, A_sub2))
Giving:
matrix([[5, 0, 0, 0, 0],
[0, 5, 5, 0, 0]])