Does anyone know of a Python replacement for the Matlab / Octave bwdist() function? This function returns the Euclidean distance of each cell to the closest non-zero cell for a given matrix. I saw an Octave C implementation and a pure Matlab implementation, and I was wondering if anyone had ported this to ANSI C (with no Matlab / Octave headers, so I can integrate it from Python easily) or to pure Python.
Both links I mentioned are below:
C++
Matlab M-File
As a test, a Matlab code / output looks something like this:
bw= [0 1 0 0 0;
1 0 0 0 0;
0 0 0 0 1;
0 0 0 0 0;
0 0 1 0 0]
D = bwdist(bw)
D =
1.00000 0.00000 1.00000 2.00000 2.00000
0.00000 1.00000 1.41421 1.41421 1.00000
1.00000 1.41421 2.00000 1.00000 0.00000
2.00000 1.41421 1.00000 1.41421 1.00000
2.00000 1.00000 0.00000 1.00000 2.00000
I tested a recommended distance_transform_edt call in Python, which gave this result:
import numpy as np
from scipy import ndimage
a = np.array(([0,1,0,0,0],
[1,0,0,0,0],
[0,0,0,0,1],
[0,0,0,0,0],
[0,0,1,0,0]))
res = ndimage.distance_transform_edt(a)
print(res)
[[ 0. 1. 0. 0. 0.]
[ 1. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 1.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 1. 0. 0.]]
This result does not seem to match the Octave / Matlab output.
While Matlab's bwdist returns distances to the closest non-zero cell, Python's distance_transform_edt returns distances "to the closest background element". The SciPy documentation is not explicit about what it considers to be the "background" (there is some type-conversion machinery behind it); in practice, 0 is the background and non-zero is the foreground.
So if we have matrix a:
>>> a = np.array(([0,1,0,0,0],
[1,0,0,0,0],
[0,0,0,0,1],
[0,0,0,0,0],
[0,0,1,0,0]))
then to calculate the same result we need to replace ones with zeros and zeros with ones, e.g. consider the matrix 1 - a:
>>> a
array([[0, 1, 0, 0, 0],
[1, 0, 0, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 0],
[0, 0, 1, 0, 0]])
>>> 1 - a
array([[1, 0, 1, 1, 1],
[0, 1, 1, 1, 1],
[1, 1, 1, 1, 0],
[1, 1, 1, 1, 1],
[1, 1, 0, 1, 1]])
In this case scipy.ndimage.morphology.distance_transform_edt gives the expected results:
>>> distance_transform_edt(1-a)
array([[ 1. , 0. , 1. , 2. , 2. ],
[ 0. , 1. , 1.41421356, 1.41421356, 1. ],
[ 1. , 1.41421356, 2. , 1. , 0. ],
[ 2. , 1.41421356, 1. , 1.41421356, 1. ],
[ 2. , 1. , 0. , 1. , 2. ]])
Does scipy.ndimage.morphology.distance_transform_edt meet your needs?
There is no need to compute 1 - a:
>>> distance_transform_edt(a==0)
array([[ 1. , 0. , 1. , 2. , 2. ],
[ 0. , 1. , 1.41421356, 1.41421356, 1. ],
[ 1. , 1.41421356, 2. , 1. , 0. ],
[ 2. , 1.41421356, 1. , 1.41421356, 1. ],
[ 2. , 1. , 0. , 1. , 2. ]])
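This works because a == 0 produces a boolean array in which the original zeros become True (the cells whose distances get computed) and the original non-zeros become False (the background), which is exactly the inversion that the bwdist semantics require.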
You can also use distanceTransform() from OpenCV, which calculates the distance to the closest zero pixel for each pixel of the source image.
Check this link: https://docs.opencv.org/3.4/d7/d1b/group__imgproc__misc.html#ga8a0b7fdfcb7a13dde018988ba3a43042
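Note that distanceTransform measures the distance to the nearest zero pixel, which is the opposite convention from bwdist, so the mask has to be inverted first. A minimal sketch, assuming opencv-python is installed:

import numpy as np
import cv2

a = np.array([[0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0]])

# Invert so the original non-zero cells become the zeros that
# distanceTransform measures against; DIST_MASK_PRECISE requests
# exact (non-approximated) Euclidean distances.
D = cv2.distanceTransform((a == 0).astype(np.uint8),
                          cv2.DIST_L2, cv2.DIST_MASK_PRECISE)

This should reproduce the bwdist output shown above.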
Related
I have the following function that calculates the Euclidean distance between all combinations of the vectors in matrix A and matrix B:
import numpy as np

def distance_matrix(A, B):
    n = A.shape[1]
    m = B.shape[1]
    C = np.zeros((n, m))
    for ai, a in enumerate(A.T):
        for bi, b in enumerate(B.T):
            C[ai][bi] = np.linalg.norm(a - b)
    return C
This works fine and creates an n*m matrix from a d*n matrix and a d*m matrix, containing the Euclidean distance between all combinations of the column vectors.
>>> print(A)
[[-1 -1 1 1 2]
[ 1 -1 2 -1 1]]
>>> print(B)
[[-2 -1 1 2]
[-1 2 1 -1]]
>>> print(distance_matrix(A,B))
[[2.23606798 1. 2. 3.60555128]
[1. 3. 2.82842712 3. ]
[4.24264069 2. 1. 3.16227766]
[3. 3.60555128 2. 1. ]
[4.47213595 3.16227766 1. 2. ]]
I spent some time looking for a numpy or scipy function to achieve this more efficiently. Is there such a function, or what would be the vectorized way to do this?
You can use:
np.linalg.norm(A[:,:,None]-B[:,None,:],axis=0)
or (totally equivalent, but without the built-in function)
((A[:,:,None]-B[:,None,:])**2).sum(axis=0)**0.5
We need a 5x4 final array, so we extend the arrays so that they broadcast against each other:

A[:,:,None]               -> shape (2, 5, 1)
B[:,None,:]               -> shape (2, 1, 4)
A[:,:,None] - B[:,None,:] -> shape (2, 5, 4)

and we apply our sum over axis 0 to finally get a (5, 4) ndarray.
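For reference, SciPy also ships a dedicated pairwise-distance routine that does the same thing without manual broadcasting; since the vectors are columns here, transpose first. A quick sketch, assuming scipy is available:

from scipy.spatial.distance import cdist

C = cdist(A.T, B.T)  # Euclidean by default; same 5x4 result as above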
Yes, you can broadcast your vectors:
A = np.array([[-1, -1, 1, 1, 2], [ 1, -1, 2, -1, 1]])
B = np.array([[-2, -1, 1, 2], [-1, 2, 1, -1]])
C = np.linalg.norm(A.T[:, None, :] - B.T[None, :, :], axis=-1)
print(C)
array([[2.23606798, 1. , 2. , 3.60555128],
[1. , 3. , 2.82842712, 3. ],
[4.24264069, 2. , 1. , 3.16227766],
[3. , 3.60555128, 2. , 1. ],
[4.47213595, 3.16227766, 1. , 2. ]])
You can get an explanation of how it works here:
https://sparrow.dev/pairwise-distance-in-numpy/
I've been looking for a way to (efficiently) compute a distance matrix from a target value and an input matrix.
If you consider an input array as:
[0 0 1 2 5 2 1]
[0 0 2 3 5 2 1]
[0 1 1 2 5 4 1]
[1 1 1 2 5 4 0]
How do you compute the spatial distance matrix associated with the target value 0?
i.e. what is the distance from each pixel to the closest 0 value?
Thanks in advance
You are looking for scipy.ndimage.morphology.distance_transform_edt. It operates on a binary array and computes the Euclidean distance from each TRUE position to the nearest FALSE (background) position. In our case, since we want distances from the nearest 0s, the background is 0. Under the hood, it converts the input to a binary array assuming 0 as the background, so we can just use it with the default parameters. Hence, it is as simple as:
In [179]: a
Out[179]:
array([[0, 0, 1, 2, 5, 2, 1],
[0, 0, 2, 3, 5, 2, 1],
[0, 1, 1, 2, 5, 4, 1],
[1, 1, 1, 2, 5, 4, 0]])
In [180]: from scipy import ndimage
In [181]: ndimage.distance_transform_edt(a)
Out[181]:
array([[0. , 0. , 1. , 2. , 3. , 3.16, 3. ],
[0. , 0. , 1. , 2. , 2.83, 2.24, 2. ],
[0. , 1. , 1.41, 2.24, 2.24, 1.41, 1. ],
[1. , 1.41, 2.24, 2.83, 2. , 1. , 0. ]])
Solving for generic case
Now, let's say we want to find out distances from nearest 1s, then it would be -
In [183]: background = 1 # element from which distances are to be computed
# compare this with original array, a to verify
In [184]: ndimage.distance_transform_edt(a!=background)
Out[184]:
array([[2. , 1. , 0. , 1. , 2. , 1. , 0. ],
[1.41, 1. , 1. , 1.41, 2. , 1. , 0. ],
[1. , 0. , 0. , 1. , 2. , 1. , 0. ],
[0. , 0. , 0. , 1. , 2. , 1.41, 1. ]])
This is how I scale a single vector:
vector = np.array([-4, -3, -2, -1, 0])
# pass the vector, current range of values, the desired range, and it returns the scaled vector
scaledVector = np.interp(vector, (vector.min(), vector.max()), (-1, +1)) # results in [-1. -0.5 0. 0.5 1. ]
How can I apply the above approach to each column of a given matrix?
matrix = np.array(
[[-4, -4, 0, 0, 0],
[-3, -3, 1, -15, 0],
[-2, -2, 8, -1, 0],
[-1, -1, 11, 12, 0],
[0, 0, 50, 69, 80]])
scaledMatrix = [insert code that scales each column of the matrix]
Note that the first two columns of the scaledMatrix should be equal to the scaledVector from the first example. For the matrix above, the correctly computed scaledMatrix is:
[[-1. -1. -1. -0.64285714 -1. ]
[-0.5 -0.5 -0.96 -1. -1. ]
[ 0. 0. -0.68 -0.66666667 -1. ]
[ 0.5 0.5 -0.56 -0.35714286 -1. ]
[ 1. 1. 1. 1. 1. ]]
My current approach (wrong):
np.interp(matrix, (np.min(matrix), np.max(matrix)), (-1, +1))
If you want to do it by hand and understand what's going on:
First subtract the columnwise mins so that each column has min 0.
Then divide by the columnwise amplitude (max - min) so that each column has max 1.
Now each column is between 0 and 1. If you want it to be between -1 and 1, multiply by 2 and subtract 1:
In [3]: mins = np.min(matrix, axis=0)
In [4]: maxs = np.max(matrix, axis=0)
In [5]: (matrix - mins[None, :]) / (maxs[None, :] - mins[None, :])
Out[5]:
array([[ 0. , 0. , 0. , 0.17857143, 0. ],
[ 0.25 , 0.25 , 0.02 , 0. , 0. ],
[ 0.5 , 0.5 , 0.16 , 0.16666667, 0. ],
[ 0.75 , 0.75 , 0.22 , 0.32142857, 0. ],
[ 1. , 1. , 1. , 1. , 1. ]])
In [6]: 2 * _ - 1
Out[6]:
array([[-1. , -1. , -1. , -0.64285714, -1. ],
[-0.5 , -0.5 , -0.96 , -1. , -1. ],
[ 0. , 0. , -0.68 , -0.66666667, -1. ],
[ 0.5 , 0.5 , -0.56 , -0.35714286, -1. ],
[ 1. , 1. , 1. , 1. , 1. ]])
I use [None, :] so that numpy understands I'm talking about "row vectors", not column ones.
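If you'd rather stick with np.interp from the question, one option (a minimal sketch, applying it column by column) is:

scaledMatrix = np.apply_along_axis(
    lambda col: np.interp(col, (col.min(), col.max()), (-1, 1)),
    axis=0, arr=matrix)

This reproduces the expected scaledMatrix above, since each column is rescaled independently.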
Otherwise, use the wonderful sklearn package, whose preprocessing module has lots of useful transformers:
In [13]: from sklearn.preprocessing import MinMaxScaler
In [14]: scaler = MinMaxScaler(feature_range=(-1, 1))
In [15]: scaler.fit(matrix)
Out[15]: MinMaxScaler(copy=True, feature_range=(-1, 1))
In [16]: scaler.transform(matrix)
Out[16]:
array([[-1. , -1. , -1. , -0.64285714, -1. ],
[-0.5 , -0.5 , -0.96 , -1. , -1. ],
[ 0. , 0. , -0.68 , -0.66666667, -1. ],
[ 0.5 , 0.5 , -0.56 , -0.35714286, -1. ],
[ 1. , 1. , 1. , 1. , 1. ]])
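As a shortcut, scaler.fit_transform(matrix) performs the fit and the transform in a single call.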
Does anyone have experience creating a sparse matrix whose non-zero values follow a uniform distribution on [-0.5, 0.5] and have zero mean (zero-centered) in Python (e.g. using scipy.sparse)?
I am aware that the scipy.sparse package provides a few methods for creating random sparse matrices, like 'rand' and 'random'. However, I could not achieve what I want with those methods. For example, I tried:
import numpy as np
import scipy.sparse as sp
s = np.random.uniform(-0.5,0.5)
W=sp.random(1024, 1024, density=0.01, format='csc', data_rvs=s)
To clarify my idea:
Say I wanted the above-mentioned matrix to be non-sparse (dense); I would create it by:
dense = np.random.rand(1024, 1024) - 0.5
np.random.rand(1024,1024) creates a dense uniform matrix with values in [0,1]. To make it zero mean, I center the matrix by subtracting 0.5 from it.
However, if I create a sparse matrix, say:
sparse = sp.rand(1024, 1024, density=0.01, format='csc')
the matrix will have non-zero values uniform in [0,1]. However, if I want to center it, I cannot simply do sparse -= 0.5, because that would make all the originally zero entries non-zero after the subtraction.
So, how can I achieve for a sparse matrix what the above example achieves for a dense one?
Thank you for all of your help!
The data_rvs parameter expects a "callable" that takes a size argument; this isn't exactly obvious from the documentation. It can be done with a lambda as follows:
import numpy as np
import scipy.sparse as sp
W = sp.random(1024, 1024, density=0.01, format='csc',
data_rvs=lambda s: np.random.uniform(-0.5, 0.5, size=s))
Then print(W) gives:
(243, 0) -0.171300809713
(315, 0) 0.0739590145626
(400, 0) 0.188151369316
(440, 0) -0.187384896218
: :
(1016, 0) 0.29262088084
(156, 1) -0.149881296136
(166, 1) -0.490405135834
(191, 1) 0.188167190147
(212, 1) 0.0334533020488
: :
(411, 1) 0.122330200832
(431, 1) -0.0494334160833
(813, 1) -0.0076379249885
(828, 1) 0.462807265425
: :
(840, 1021) 0.456423017883
(12, 1022) -0.47313075329
: :
(563, 1022) -0.477190349161
(655, 1022) -0.460942546313
(673, 1022) 0.0930207181126
(676, 1022) 0.253643616387
: :
(843, 1023) 0.463793903168
(860, 1023) 0.454427252782
For the newbie, the lambda may look odd - it is just an unnamed function. The sp.random function takes an optional argument data_rvs that defaults to None. When specified, it is expected to be a function that takes a size argument and returns that many random numbers. A simple named function to do this would be:
def generate_n_uniform_randoms(n):
    return np.random.uniform(-0.5, 0.5, n)
I don't know the origin of the API, but the shape is not needed: sp.random presumably first figures out which indices will be non-zero, and then it only needs to compute random values for those indices, which is a set of known size.
The lambda is just syntactic sugar that allows us to define that function inline in terms of some other function call. We could instead write
W = sp.random(1024, 1024, density=0.01, format='csc',
data_rvs=generate_n_uniform_randoms)
Actually, this can be a "callable" - some object f for which f(n) returns n random variables. This can be a function, but it can also be an object of a class that implements the __call__(self, n) function. For example:
class ufoo(object):
    def __call__(self, n):
        import numpy
        return numpy.random.uniform(-0.5, 0.5, n)

W = sp.random(1024, 1024, density=0.01, format='csc',
              data_rvs=ufoo())
If you need the mean to be exactly zero (within roundoff of course), this can be done by subtracting the mean from the non-zero values, as I mentioned above:
W.data -= np.mean(W.data)
Then the mean of the stored non-zero values is zero to within roundoff:
W.data.mean()
-2.3718641632430623e-18
sparse.random does 2 things - distributes nonzeros randomly, and generates random uniform values.
In [62]: M = sparse.random(10,10,density=.2, format='csr')
In [63]: M
Out[63]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
with 20 stored elements in Compressed Sparse Row format>
In [64]: M.data
Out[64]:
array([ 0.42825407, 0.51858978, 0.8084335 , 0.08691635, 0.13210409,
0.61288928, 0.39675205, 0.58242891, 0.5174367 , 0.57859824,
0.48812484, 0.13472883, 0.82992478, 0.70568697, 0.45001632,
0.52147305, 0.72943809, 0.55801913, 0.97018861, 0.83236235])
You can modify the data values cheaply without changing the sparsity distribution:
In [65]: M.data -= 0.5
In [66]: M.A
Out[66]:
array([[ 0. , 0. , 0. , -0.07174593, 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0.01858978, 0. , 0. , 0.3084335 , -0.41308365,
0. , 0. , 0. , 0. , -0.36789591],
[ 0. , 0. , 0. , 0. , 0.11288928,
-0.10324795, 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.08242891, 0.0174367 , 0. ],
[ 0. , 0. , 0.07859824, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , -0.01187516, 0. , 0. , -0.36527117],
[ 0. , 0. , 0.32992478, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0.20568697,
0. , 0. , -0.04998368, 0. , 0. ],
[ 0.02147305, 0. , 0.22943809, 0.05801913, 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0.47018861, 0.33236235, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ]])
In [67]: np.mean(M.data)
Out[67]: 0.044118297661574338
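(If you need the stored values to have exactly zero mean here as well, subtract it in place afterwards: M.data -= M.data.mean(), as in the other answer.)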
Or replacing the nonzero values with a new set of values:
In [69]: M.data = np.random.randint(-5,5,20)
In [70]: M
Out[70]:
<10x10 sparse matrix of type '<class 'numpy.int32'>'
with 20 stored elements in Compressed Sparse Row format>
In [71]: M.A
Out[71]:
array([[ 0, 0, 0, 4, 0, 0, 0, 0, 0, 0],
[-1, 0, 0, 1, 2, 0, 0, 0, 0, -4],
[ 0, 0, 0, 0, 0, 4, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, -5, -5, 0],
[ 0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, -3, 0, 0, 3],
[ 0, 0, -1, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, -4, 0, 0, -1, 0, 0],
[-1, 0, -5, -2, 0, 0, 0, 0, 0, 0],
[ 0, 3, 1, 0, 0, 0, 0, 0, 0, 0]])
In [72]: M.data
Out[72]:
array([ 4, -1, 1, 2, -4, 0, 4, -5, -5, 2, -3, 3, -1, -4, -1, -1, -5,
-2, 3, 1])
In my opinion, your requirements are still underspecified (see the disadvantage mentioned below).
Here is an implementation of the simple construction outlined above in my comment:
import numpy as np
import scipy.sparse as sp
M, N, NNZ = 5, 5, 10
assert NNZ % 2 == 0
flat_dim = M*N
valuesA = np.random.uniform(-0.5, 0.5, size=NNZ // 2)
valuesB = valuesA * -1
values = np.hstack((valuesA, valuesB))
positions_flat = np.random.choice(flat_dim, size=NNZ, replace=False)
positions_2d = np.unravel_index(positions_flat, shape=(M, N))
mat = sp.coo_matrix((values, (positions_2d[0], positions_2d[1])), shape=(M, N))
print(mat.todense())
print(mat.data.mean())
Output:
[[ 0. 0. 0. 0.0273862 0. ]
[-0.3943963 0. 0. -0.04134932 0. ]
[-0.10121743 0. -0.0273862 0. 0.04134932]
[ 0.3943963 0. 0. 0. 0. ]
[-0.24680983 0. 0.24680983 0.10121743 0. ]]
0.0
Advantages
sparse
zero mean
entries from uniform distribution
Potential disadvantage:
for each value x in the matrix, somewhere -x is to be found!
meaning: it's not uniform in a broader, joint-distribution sense
if that's hurtful only you can tell
if yes: the above construction could easily be modified to use any centered values from some distribution, so your problem collapses into this somewhat smaller (but not necessarily much easier) problem
Now in regard to the linked problem: I'm guessing here, but I would not be surprised if sampling x values uniformly with the constraint mean(x) = 0 turned out to be NP-hard.
Keep in mind that a-posteriori centering of the nonzeros, as recommended in the other answer, changes the underlying distribution (even for simple distributions), and in some cases even invalidates the bounds (values may leave the interval [-0.5, 0.5]).
This means the question is really about formalizing how important each objective is and balancing them against each other in some way.
I recently posted a question here which was answered exactly as I asked. However, I think I overestimated my ability to manipulate the answer further. I read the broadcasting doc and followed a few links that led me all the way back to numpy broadcasting discussions from 2002.
I've used the second method of array creation using broadcasting:
N = 10
out = np.zeros((N**3,4),dtype=int)
out[:,:3] = (np.arange(N**3)[:,None]/[N**2,N,1])%N
which outputs:
[[0,0,0,0]
[0,0,1,0]
...
[0,1,0,0]
[0,1,1,0]
...
[9,9,8,0]
[9,9,9,0]]
but I do not understand from the docs how to manipulate this further. Ideally, I would like to be able to set the increment by which each individual column changes.
For example: column A changes by 0.5 up to 2, column B changes by 0.2 up to 1, and column C changes by 1 up to 10.
[[0,0,0,0]
[0,0,1,0]
...
[0,0,9,0]
[0,0.2,0,0]
...
[0,0.8,9,0]
[0.5,0,0,0]
...
[1.5,0.8,9,0]]
Thanks for any help.
You can adjust your current code just a little bit to make it work.
>>> out = np.zeros((4*5*10,4))
>>> out[:,:3] = (np.arange(4*5*10)[:,None]//(5*10, 10, 1)*(0.5, 0.2, 1)%(2, 1, 10))
>>> out
array([[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1. , 0. ],
[ 0. , 0. , 2. , 0. ],
...
[ 0. , 0. , 8. , 0. ],
[ 0. , 0. , 9. , 0. ],
[ 0. , 0.2, 0. , 0. ],
...
[ 0. , 0.8, 9. , 0. ],
[ 0.5, 0. , 0. , 0. ],
...
[ 1.5, 0.8, 9. , 0. ]])
The changes are:
No int dtype on the array, since we need it to hold floats in some columns. You could specify a float dtype if you want (or even something more complicated that only allows floats in the first two columns).
Rather than N**3 total values, figure out the number of distinct values for each column, and multiply them together to get our total size. This is used for both zeros and arange.
Use the floor division // operator in the first broadcast operation because we want integers at this point, but later we'll want floats.
The values to divide by are again based on the number of values for the later columns (e.g. with A, B, and C values per column, divide by B*C, C, and 1).
Add a new broadcast operation to multiply by various scale factors (how much each value increases at once).
Change the values in the broadcast mod % operation to match the bounds on each column.
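Putting those changes together, here is a short sketch of the same recipe with the divisors and moduli derived from per-column (step, stop) specs instead of hard-coded; the variable names are mine:

import numpy as np

steps = np.array([0.5, 0.2, 1.0])    # increment for columns A, B, C
stops = np.array([2.0, 1.0, 10.0])   # exclusive upper bound per column
counts = np.round(stops / steps).astype(int)   # distinct values: 4, 5, 10
total = counts.prod()                          # 4*5*10 = 200 rows

# Later columns cycle fastest: divide by the product of the counts
# of all columns to the right (50, 10, 1 here).
divisors = np.append(np.cumprod(counts[::-1])[-2::-1], 1)

out = np.zeros((total, 4))
out[:, :3] = (np.arange(total)[:, None] // divisors) % counts * steps

Taking the modulus on the integer counts before scaling also sidesteps the floating-point % used in the one-liner above.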
This small example helps me understand what is going on:
In [123]: N=2
In [124]: np.arange(N**3)[:,None]/[N**2, N, 1]
Out[124]:
array([[ 0. , 0. , 0. ],
[ 0.25, 0.5 , 1. ],
[ 0.5 , 1. , 2. ],
[ 0.75, 1.5 , 3. ],
[ 1. , 2. , 4. ],
[ 1.25, 2.5 , 5. ],
[ 1.5 , 3. , 6. ],
[ 1.75, 3.5 , 7. ]])
So we generate a range of numbers (0 to 7) and divide them by 4, 2, and 1.
The rest of the calculation just changes each value without further broadcasting.
Applying % N to each element:
In [126]: np.arange(N**3)[:,None]/[N**2, N, 1]%N
Out[126]:
array([[ 0. , 0. , 0. ],
[ 0.25, 0.5 , 1. ],
[ 0.5 , 1. , 0. ],
[ 0.75, 1.5 , 1. ],
[ 1. , 0. , 0. ],
[ 1.25, 0.5 , 1. ],
[ 1.5 , 1. , 0. ],
[ 1.75, 1.5 , 1. ]])
Assigning to an int array is the same as converting the floats to integers:
In [127]: (np.arange(N**3)[:,None]/[N**2, N, 1]%N).astype(int)
Out[127]:
array([[0, 0, 0],
[0, 0, 1],
[0, 1, 0],
[0, 1, 1],
[1, 0, 0],
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]])