Does anyone have experience creating a sparse matrix in Python (e.g. using scipy.sparse) whose non-zero values follow a uniform distribution on [-0.5, 0.5] and have zero mean (zero-centered)?
I am aware that the scipy.sparse package provides a few methods for creating random sparse matrices, such as 'rand' and 'random'. However, I could not achieve what I want with those methods. For example, I tried:
import numpy as np
import scipy.sparse as sp
s = np.random.uniform(-0.5,0.5)
W=sp.random(1024, 1024, density=0.01, format='csc', data_rvs=s)
To clarify my idea:
Suppose I want the matrix described above, but non-sparse (dense). I would create it with:
dense=np.random.rand(1024,1024)-0.5
'np.random.rand(1024,1024)' creates a dense uniform matrix with values in [0, 1). To make it zero mean, I centre the matrix by subtracting 0.5.
However, if I create a sparse matrix, say:
sparse=sp.rand(1024,1024,density=0.01, format='csc')
The matrix will have non-zero values that are uniform on [0, 1). However, if I want to centre the matrix, I cannot simply do 'sparse-=0.5', because that would make all the originally zero entries non-zero after the subtraction.
So, how can I achieve on a sparse matrix what the dense example above does?
Thank you for all of your help!
The data_rvs parameter is expecting a "callable" that takes a size. This isn't exactly obvious from the documentation. This can be done with a lambda as follows:
import numpy as np
import scipy.sparse as sp
W = sp.random(1024, 1024, density=0.01, format='csc',
data_rvs=lambda s: np.random.uniform(-0.5, 0.5, size=s))
Then print(W) gives:
(243, 0) -0.171300809713
(315, 0) 0.0739590145626
(400, 0) 0.188151369316
(440, 0) -0.187384896218
: :
(1016, 0) 0.29262088084
(156, 1) -0.149881296136
(166, 1) -0.490405135834
(191, 1) 0.188167190147
(212, 1) 0.0334533020488
: :
(411, 1) 0.122330200832
(431, 1) -0.0494334160833
(813, 1) -0.0076379249885
(828, 1) 0.462807265425
: :
(840, 1021) 0.456423017883
(12, 1022) -0.47313075329
: :
(563, 1022) -0.477190349161
(655, 1022) -0.460942546313
(673, 1022) 0.0930207181126
(676, 1022) 0.253643616387
: :
(843, 1023) 0.463793903168
(860, 1023) 0.454427252782
For the newbie, the lambda may look odd - this is just an unnamed function. The sp.random function takes an optional argument data_rvs that defaults to None. When specified, it is expected to be a function that takes a size argument and returns that number of random numbers. A simple function to do this would be:
def generate_n_uniform_randoms(n):
    return np.random.uniform(-0.5, 0.5, n)
I don't know the origin of the API, but the shape is not needed as sp.random presumably first figures out which indices will be non-zero, and then it just needs to compute random values for those indices, which is a set of a known size.
The lambda is just syntactic sugar that allows us to define that function inline in terms of some other function call. We could instead write
W = sp.random(1024, 1024, density=0.01, format='csc',
data_rvs=generate_n_uniform_randoms)
Actually, this can be a "callable" - some object f for which f(n) returns n random variables. This can be a function, but it can also be an object of a class that implements the __call__(self, n) function. For example:
class ufoo(object):
    def __call__(self, n):
        import numpy
        return numpy.random.uniform(-0.5, 0.5, n)
W = sp.random(1024, 1024, density=0.01, format='csc',
data_rvs=ufoo())
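To make the calling convention concrete, here is a minimal sketch that records what sp.random asks data_rvs for. The names recording_sampler and requested are made up for illustration. The parameter is called size on the assumption that some SciPy versions may pass it by keyword rather than positionally; a function with that parameter name works either way.

```python
import numpy as np
import scipy.sparse as sp

requested = []  # hypothetical log of the sizes sp.random asks for

def recording_sampler(size):
    # Named `size` so the call works whether the argument is passed
    # positionally or by keyword.
    requested.append(size)
    return np.random.uniform(-0.5, 0.5, size)

W = sp.random(100, 100, density=0.01, format='csc',
              data_rvs=recording_sampler)

print(requested)   # the number(s) of values that were requested
print(W.nnz)       # number of stored entries
print(W.data.min() >= -0.5, W.data.max() <= 0.5)
```

The sampler never sees the matrix shape, only a count, which matches the description above.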
If you need the mean to be exactly zero (within roundoff of course), this can be done by subtracting the mean from the non-zero values, as I mentioned above:
W.data -= np.mean(W.data)
Then:
W[idx].mean()
-2.3718641632430623e-18
sparse.random does 2 things - distributes nonzeros randomly, and generates random uniform values.
In [62]: M = sparse.random(10,10,density=.2, format='csr')
In [63]: M
Out[63]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
with 20 stored elements in Compressed Sparse Row format>
In [64]: M.data
Out[64]:
array([ 0.42825407, 0.51858978, 0.8084335 , 0.08691635, 0.13210409,
0.61288928, 0.39675205, 0.58242891, 0.5174367 , 0.57859824,
0.48812484, 0.13472883, 0.82992478, 0.70568697, 0.45001632,
0.52147305, 0.72943809, 0.55801913, 0.97018861, 0.83236235])
You can modify the data values cheaply without changing the sparsity distribution:
In [65]: M.data -= 0.5
In [66]: M.A
Out[66]:
array([[ 0. , 0. , 0. , -0.07174593, 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0.01858978, 0. , 0. , 0.3084335 , -0.41308365,
0. , 0. , 0. , 0. , -0.36789591],
[ 0. , 0. , 0. , 0. , 0.11288928,
-0.10324795, 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.08242891, 0.0174367 , 0. ],
[ 0. , 0. , 0.07859824, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , -0.01187516, 0. , 0. , -0.36527117],
[ 0. , 0. , 0.32992478, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0.20568697,
0. , 0. , -0.04998368, 0. , 0. ],
[ 0.02147305, 0. , 0.22943809, 0.05801913, 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0.47018861, 0.33236235, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ]])
In [67]: np.mean(M.data)
Out[67]: 0.044118297661574338
Or replacing the nonzero values with a new set of values:
In [69]: M.data = np.random.randint(-5,5,20)
In [70]: M
Out[70]:
<10x10 sparse matrix of type '<class 'numpy.int32'>'
with 20 stored elements in Compressed Sparse Row format>
In [71]: M.A
Out[71]:
array([[ 0, 0, 0, 4, 0, 0, 0, 0, 0, 0],
[-1, 0, 0, 1, 2, 0, 0, 0, 0, -4],
[ 0, 0, 0, 0, 0, 4, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, -5, -5, 0],
[ 0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, -3, 0, 0, 3],
[ 0, 0, -1, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, -4, 0, 0, -1, 0, 0],
[-1, 0, -5, -2, 0, 0, 0, 0, 0, 0],
[ 0, 3, 1, 0, 0, 0, 0, 0, 0, 0]])
In [72]: M.data
Out[72]:
array([ 4, -1, 1, 2, -4, 0, 4, -5, -5, 2, -3, 3, -1, -4, -1, -1, -5,
-2, 3, 1])
In my opinion, your requirements are still incomplete (see disadvantage mentioned below).
Here is some implementation for my simple construction outlined above in my comment:
import numpy as np
import scipy.sparse as sp
M, N, NNZ = 5, 5, 10
assert NNZ % 2 == 0
flat_dim = M*N
valuesA = np.random.uniform(-0.5, 0.5, size=NNZ // 2)
valuesB = valuesA * -1
values = np.hstack((valuesA, valuesB))
positions_flat = np.random.choice(flat_dim, size=NNZ, replace=False)
positions_2d = np.unravel_index(positions_flat, (M, N))  # shape passed positionally; the old 'dims' keyword was removed from NumPy
mat = sp.coo_matrix((values, (positions_2d[0], positions_2d[1])), shape=(M, N))
print(mat.todense())
print(mat.data.mean())
Output:
[[ 0. 0. 0. 0.0273862 0. ]
[-0.3943963 0. 0. -0.04134932 0. ]
[-0.10121743 0. -0.0273862 0. 0.04134932]
[ 0.3943963 0. 0. 0. 0. ]
[-0.24680983 0. 0.24680983 0.10121743 0. ]]
0.0
Advantages:
- sparse
- zero mean
- entries from a uniform distribution
Potential disadvantage:
- for each value x in the matrix, -x is to be found somewhere!
  - meaning: it's not uniform in a broader joint-distribution sense
  - whether that hurts, only you can tell
  - if yes: the above construction could easily be modified to use any centered values from some distribution, so your problem collapses into this somewhat smaller (but not necessarily much easier) problem
Now, regarding that linked problem: I'm guessing here, but I would not be surprised to see that sampling x values uniformly with the constraint mean(x)=0 is NP-hard.
Keep in mind that a-posteriori centering of the nonzeros, as recommended in the other answer, changes the underlying distribution (even for simple distributions), in some cases even invalidating bounds (values leaving the interval [-0.5, 0.5]).
This means: this question is all about formalizing how important each objective is and balancing them in some way.
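The point about bounds can be seen in a small sketch. The stored values below are hand-picked for illustration (they are not drawn from a random generator): every one lies inside [-0.5, 0.5], but they are not balanced around zero, so subtracting their mean pushes one of them outside the interval.

```python
import numpy as np
import scipy.sparse as sp

# Hand-picked stored values (an assumption for illustration): all inside
# [-0.5, 0.5], but not balanced around zero.
values = np.array([-0.5, 0.4, 0.4, 0.4])
rows = np.array([0, 1, 2, 3])
cols = np.array([3, 2, 1, 0])
W = sp.coo_matrix((values, (rows, cols)), shape=(4, 4)).tocsc()

W.data -= W.data.mean()   # shift only the stored entries

print(W.data.mean())      # approximately zero
print(W.data.min())       # below -0.5: the lower bound is violated
print(W.nnz)              # the sparsity pattern is unchanged
```

With many uniformly drawn values the sample mean is small, so the shift is small too, but nothing prevents a value close to a boundary from crossing it.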
I am a complete beginner with NumPy and I am trying to generate the following matrix pattern. Below is my code. What I can't figure out is what I am doing wrong to get this result. Thanks in advance for any help.
import numpy as np
def matrix(n):
    final = []
    for i in range(n):
        final.append(list(np.tile([0,1],int(n/2))) if i%2==0 else list(np.tile([1,0],int(n/2))))
    print(np.array(final))
size = 8
matrix(size)
When using NumPy, you should avoid building and editing matrices with Python lists and for loops, because that becomes very slow for large matrices.
Try to examine this code:
import math
import numpy as np
def zero_borders(mat: np.ndarray) -> None:
    """Makes the borders of the array zero."""
    mat[:, 0] = 0 # left border
    mat[:, -1] = 0 # right border
    mat[0, :] = 0 # upper border
    mat[-1, :] = 0 # bottom border
def zero_center_square(mat: np.ndarray) -> None:
    """Makes small square of zeros in the center of the array."""
    size = mat.shape[0]
    i_low = size//2 - 1
    i_high = math.ceil(size/2)
    mat[i_low, i_low:i_high + 1] = 0 # upper edge of the square
    mat[i_high, i_low:i_high + 1] = 0 # bottom edge of the square
    mat[i_low:i_high + 1, i_low] = 0 # left edge of the square
    mat[i_low:i_high + 1, i_high] = 0 # right edge of the square
def matrix(n: int) -> np.ndarray:
    """Creates a square matrix with special pattern."""
    mat = np.ones((n, n))
    zero_borders(mat)
    zero_center_square(mat)
    return mat
def main():
    print("Even size:")
    print(matrix(8))
    print("")
    print("Odd size:")
    print(matrix(9))
if __name__ == "__main__":
    main()
The output:
Even size:
[[0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 1. 1. 1. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 1. 0.]
[0. 1. 1. 0. 0. 1. 1. 0.]
[0. 1. 1. 0. 0. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 1. 0.]
[0. 0. 0. 0. 0. 0. 0. 0.]]
Odd size:
[[0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 1. 1. 1. 1. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 1. 1. 0.]
[0. 1. 1. 0. 0. 0. 1. 1. 0.]
[0. 1. 1. 0. 1. 0. 1. 1. 0.]
[0. 1. 1. 0. 0. 0. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 1. 1. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0.]]
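For comparison, the alternating 0/1 pattern that the loop in the question actually builds (for even n) can also be produced without a Python loop. This is only a sketch of the "no loops" advice; the function name checkerboard is made up for illustration and it does not produce the framed pattern asked for.

```python
import numpy as np

def checkerboard(n: int) -> np.ndarray:
    """Alternating 0/1 pattern from index arithmetic, with no Python loop."""
    i, j = np.indices((n, n))   # row and column index of every cell
    return (i + j) % 2          # 0 where i + j is even, 1 where it is odd

print(checkerboard(4))
```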
You can use numpy ix_() like this:
>>> x = np.zeros((9,9), dtype=int)
>>> p1 = np.ix_([1,2,6,7],[1,2,3,4,5,6,7])
>>> x[p1]=1
>>> p2 = np.ix_([3,4,5],[1,2,6,7])
>>> x[p2]=1
>>> x
array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 1, 1, 0, 0, 0, 1, 1, 0],
[0, 1, 1, 0, 1, 0, 1, 1, 0],
[0, 1, 1, 0, 0, 0, 1, 1, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0]])
You have not mentioned any particular pattern for a general l x l matrix, so I will just show how to generate the matrix in the given image.
You can use NumPy (particularly numpy.pad()) to create that matrix easily:
import numpy as np
# Create required matrix
matrix = np.pad(np.pad(np.pad(np.array([[1]]), (1, 1)), (2, 2), constant_values = 1), (1, 1))
# If you want that as list instead of NumPy array
list_matrix = list(list(i) for i in matrix)
Sorry for the long post.
I'm using Python 3.6 on Windows 10. I have a pandas data frame that contains around 100,000 rows. From this data frame I need to generate four NumPy arrays. The first 5 relevant rows of my data frame look like this:
A B x UB1 LB1 UB2 LB2
0.2134 0.7866 0.2237 0.1567 0.0133 1.0499 0.127
0.24735 0.75265 0.0881 0.5905 0.422 1.4715 0.5185
0.0125 0.9875 0.1501 1.3721 0.5007 2.0866 2.0617
0.8365 0.1635 0.0948 1.9463 1.0854 2.4655 1.9644
0.1234 0.8766 0.0415 2.7903 2.2602 3.5192 3.2828
Column B is (1 - column A). Column B is not actually in my data frame; I have added it to explain my problem.
From this data frame, I need to generate four arrays. They look like this:
My array c looks like array([-0.2134, -0.7866,-0.24735, -0.75265,-0.0125, -0.9875,-0.8365, -0.1635,-0.1234, -0.8766],dtype=float32)
where the first element is the first row of column A with a negative sign added, the 2nd element is taken from the 1st row of column B, the third element is from the second row of column A, the fourth element is from the 2nd row of column B, and so on.
My second array UB looks like
array([ 0.2237, 0.0881, 0.1501, 0.0948, 0.0415, 0.2237],dtype=float32)
where elements are rows of column X.
My third array,bounds, looks like
array([[0.0133 , 0.1567],
[0.127 , 1.0499],
[0.422 , 0.5905],
[0.5185 , 1.4715],
[0.5007 , 1.3721],
[2.0617 , 2.0866],
[1.0854 , 1.9463],
[1.9644 , 2.4655],
[2.2602 , 2.7903],
[3.2828 , 3.5192]])
Where bounds[0][0] is first row of LB1,bounds[0][1] is first row of UB1. bounds[1][0] is first row of LB2, bounds [1][1] is first row of UB2. Again bounds[2][0] is 2nd row of LB1 & so on.
My fourth array looks like
array([[-1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, -1, 1, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, -1, 1, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, -1, 1, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, -1, 1]])
It contains the same number of rows as the data frame, and its number of columns is 2 * the number of data frame rows.
Can you please tell me an efficient way to generate these arrays for 100,000 rows of records?
This should be rather straightforward:
from io import StringIO
import pandas as pd
import numpy as np
data = """A B x UB1 LB1 UB2 LB2
0.2134 0.7866 0.2237 0.1567 0.0133 1.0499 0.127
0.24735 0.75265 0.0881 0.5905 0.422 1.4715 0.5185
0.0125 0.9875 0.1501 1.3721 0.5007 2.0866 2.0617
0.8365 0.1635 0.0948 1.9463 1.0854 2.4655 1.9644
0.1234 0.8766 0.0415 2.7903 2.2602 3.5192 3.2828"""
df = pd.read_csv(StringIO(data), sep='\\s+', header=0)
c = -np.stack([df['A'], 1 - df['A']], axis=1).ravel()
print(c)
# [-0.2134 -0.7866 -0.24735 -0.75265 -0.0125 -0.9875 -0.8365 -0.1635
# -0.1234 -0.8766 ]
ub = df['x'].values
print(ub)
# [0.2237 0.0881 0.1501 0.0948 0.0415]
bounds = np.stack([df['LB1'], df['UB1'], df['LB2'], df['UB2']], axis=1).reshape((-1, 2))
print(bounds)
# [[0.0133 0.1567]
# [0.127 1.0499]
# [0.422 0.5905]
# [0.5185 1.4715]
# [0.5007 1.3721]
# [2.0617 2.0866]
# [1.0854 1.9463]
# [1.9644 2.4655]
# [2.2602 2.7903]
# [3.2828 3.5192]]
n = len(df)
fourth = np.zeros((n, 2 * n))
idx = np.arange(n)
fourth[idx, 2 * idx] = -1
fourth[idx, 2 * idx + 1] = 1
print(fourth)
# [[-1. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
# [ 0. 0. -1. 1. 0. 0. 0. 0. 0. 0.]
# [ 0. 0. 0. 0. -1. 1. 0. 0. 0. 0.]
# [ 0. 0. 0. 0. 0. 0. -1. 1. 0. 0.]
# [ 0. 0. 0. 0. 0. 0. 0. 0. -1. 1.]]
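One caveat for the fourth array at the size mentioned in the question: with 100,000 rows, a dense float64 array of shape (100000, 200000) would need roughly 160 GB. Since each row holds only two non-zero entries, a sparse matrix may be a better fit. The following is a sketch under the assumption that whatever consumes the array accepts a scipy.sparse matrix; the function name build_fourth_sparse is made up for illustration.

```python
import numpy as np
import scipy.sparse as sp

def build_fourth_sparse(n: int) -> sp.csr_matrix:
    """Row k holds -1 in column 2k and +1 in column 2k + 1."""
    idx = np.arange(n)
    rows = np.concatenate([idx, idx])
    cols = np.concatenate([2 * idx, 2 * idx + 1])
    vals = np.concatenate([-np.ones(n), np.ones(n)])
    return sp.coo_matrix((vals, (rows, cols)), shape=(n, 2 * n)).tocsr()

fourth_sparse = build_fourth_sparse(5)
print(fourth_sparse.toarray())

# Only 2 * n values are stored, so the full-size case stays small.
big = build_fourth_sparse(100_000)
print(big.shape, big.nnz)
```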
I have this delta function, which has 3 cases: mask1, mask2, and, if neither of them is satisfied, delta = 0, since res = np.zeros.
def delta(r, dr):
    res = np.zeros(r.shape)
    mask1 = (r >= 0.5*dr) & (r <= 1.5*dr)
    res[mask1] = (5-3*np.abs(r[mask1])/dr \
        - np.sqrt(-3*(1-np.abs(r[mask1])/dr)**2+1)) \
        /(6*dr)
    mask2 = np.logical_not(mask1) & (r <= 0.5*dr)
    res[mask2] = (1+np.sqrt(-3*(r[mask2]/dr)**2+1))/(3*dr)
    return res
Then I have this other function, in which I call the former and construct an array, E:
def matrix_E(nk,X,Y,xhi,eta,dx,dy):
    rx = abs(X[np.newaxis,:] - xhi[:,np.newaxis])
    ry = abs(Y[np.newaxis,:] - eta[:,np.newaxis])
    deltx = delta(rx,dx)
    delty = delta(ry,dy)
    E = deltx*delty
    return E
The thing is that most of the elements of E, about 99%, belong to the third case of delta, i.e. they are 0.
So I would like to have a sparse matrix instead of a dense one, and not store the 0 elements, in order to save memory.
Any ideas on how I could do it?
The normal way to create a sparse matrix is to construct three 1d arrays, with the nonzero values, and their i and j indexes. Then pass them to the coo_matrix function.
The coordinates don't have to be in order, so you could construct the arrays for the 2 nonzero mask cases and concatenate them.
Here's a sample construction using 2 masks
In [107]: x=np.arange(5)
In [108]: i,j,data=[],[],[]
In [110]: mask1=x%2==0
In [111]: mask2=x%2!=0
In [112]: i.append(x[mask1])
In [113]: j.append((x*2)[mask1])
In [114]: i.append(x[mask2])
In [115]: j.append(x[mask2])
In [116]: i=np.concatenate(i)
In [117]: j=np.concatenate(j)
In [118]: i
Out[118]: array([0, 2, 4, 1, 3])
In [119]: j
Out[119]: array([0, 4, 8, 1, 3])
In [120]: M=sparse.coo_matrix((x,(i,j)))
In [121]: print(M)
(0, 0) 0
(2, 4) 1
(4, 8) 2
(1, 1) 3
(3, 3) 4
In [122]: M.A
Out[122]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 3, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 4, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 2]])
A coo format stores those 3 arrays as is, but they get sorted and cleaned up when converted to other formats and printed.
I can work on adapting this to your case, but this may be enough to get you started.
It looks like X, Y, xhi, eta are 1d arrays. rx and ry are then 2d. delta returns a result the same shape as its input. E = deltx*delty suggests that deltx and delty are the same shape (or at least broadcastable).
Since a sparse matrix has a .multiply method to do element-wise multiplication, we can focus on producing sparse delta matrices.
If you can afford the memory to make rx and a couple of masks, then you can also afford to make deltx (all the same size). Even though deltx has lots of zeros, it is probably fastest to make it dense.
But let's try to cast the delta calculation as a sparse build.
This looks like the essence of what you are doing in delta, at least with one mask:
start with a 2d array:
In [138]: r = np.arange(24).reshape(4,6)
In [139]: mask1 = (r>=8) & (r<=16)
In [140]: res1 = r[mask1]*0.2
In [141]: I,J = np.where(mask1)
the resulting vectors are:
In [142]: I
Out[142]: array([1, 1, 1, 1, 2, 2, 2, 2, 2], dtype=int32)
In [143]: J
Out[143]: array([2, 3, 4, 5, 0, 1, 2, 3, 4], dtype=int32)
In [144]: res1
Out[144]: array([ 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. , 3.2])
Make a sparse matrix:
In [145]: M=sparse.coo_matrix((res1,(I,J)), r.shape)
In [146]: M.A
Out[146]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1.6, 1.8, 2. , 2.2],
[ 2.4, 2.6, 2.8, 3. , 3.2, 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ]])
I could make another sparse matrix with mask2, and add the two.
In [147]: mask2 = (r>=17) & (r<=22)
In [148]: res2 = r[mask2]*-0.4
In [149]: I,J = np.where(mask2)
In [150]: M2=sparse.coo_matrix((res2,(I,J)), r.shape)
In [151]: M2.A
Out[151]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , -6.8],
[-7.2, -7.6, -8. , -8.4, -8.8, 0. ]])
...
In [153]: (M1+M2).A
Out[153]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1.6, 1.8, 2. , 2.2],
[ 2.4, 2.6, 2.8, 3. , 3.2, -6.8],
[-7.2, -7.6, -8. , -8.4, -8.8, 0. ]])
Or I could concatenate the res1 and res2, etc and make one sparse matrix:
In [156]: I1,J1 = np.where(mask1)
In [157]: I2,J2 = np.where(mask2)
In [158]: res12=np.concatenate((res1,res2))
In [159]: I12=np.concatenate((I1,I2))
In [160]: J12=np.concatenate((J1,J2))
In [161]: M12=sparse.coo_matrix((res12,(I12,J12)), r.shape)
In [162]: M12.A
Out[162]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1.6, 1.8, 2. , 2.2],
[ 2.4, 2.6, 2.8, 3. , 3.2, -6.8],
[-7.2, -7.6, -8. , -8.4, -8.8, 0. ]])
Here I chose the masks so the nonzero values don't overlap, but both methods would work if they did. It's a deliberate design feature of the coo format that values for repeated indices are summed. It's a very handy feature when creating sparse matrices for finite element problems.
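The summing of repeated indices can be checked with a tiny sketch (the values are made up for illustration): two contributions are placed on the same position and one on another.

```python
import numpy as np
import scipy.sparse as sp

# Two contributions land on (0, 0); one lands on (1, 1).
data = np.array([1.0, 2.0, 3.0])
row = np.array([0, 0, 1])
col = np.array([0, 0, 1])

M = sp.coo_matrix((data, (row, col)), shape=(2, 2))
print(M.nnz)           # 3: the COO format keeps the duplicate as stored
print(M.toarray())     # the two values at (0, 0) are summed to 3.0
print(M.tocsr().nnz)   # 2: conversion to CSR merges the duplicates
```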
I can also get index arrays by creating a sparse matrix from the mask:
In [179]: rmask1=sparse.coo_matrix(mask1)
In [180]: rmask1.row
Out[180]: array([1, 1, 1, 1, 2, 2, 2, 2, 2], dtype=int32)
In [181]: rmask1.col
Out[181]: array([2, 3, 4, 5, 0, 1, 2, 3, 4], dtype=int32)
In [184]: sparse.coo_matrix((res1, (rmask1.row, rmask1.col)),rmask1.shape).A
Out[184]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1.6, 1.8, 2. , 2.2],
[ 2.4, 2.6, 2.8, 3. , 3.2, 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ]])
I can't, though, create a mask from a sparse version of r. (r>=8) & (r<=16). That kind of inequality test has not been implemented for sparse matrices. But that might not matter, since r is probably not sparse.
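Putting the pieces together, a sparse version of delta might look like the sketch below. It is not a drop-in replacement: it still builds the dense r and both masks, so it only saves memory for the result and for E. The name delta_sparse and the grid and marker coordinates are made up for illustration; delta_dense repeats the formulas from the question so the two can be compared.

```python
import numpy as np
from scipy import sparse

def delta_dense(r, dr):
    """Dense reference, same formulas as the delta function in the question."""
    res = np.zeros(r.shape)
    mask1 = (r >= 0.5 * dr) & (r <= 1.5 * dr)
    res[mask1] = (5 - 3 * np.abs(r[mask1]) / dr
                  - np.sqrt(-3 * (1 - np.abs(r[mask1]) / dr) ** 2 + 1)) / (6 * dr)
    mask2 = np.logical_not(mask1) & (r <= 0.5 * dr)
    res[mask2] = (1 + np.sqrt(-3 * (r[mask2] / dr) ** 2 + 1)) / (3 * dr)
    return res

def delta_sparse(r, dr):
    """Hypothetical sparse variant: only entries selected by a mask are stored."""
    mask1 = (r >= 0.5 * dr) & (r <= 1.5 * dr)
    mask2 = np.logical_not(mask1) & (r <= 0.5 * dr)
    r1, r2 = r[mask1], r[mask2]
    v1 = (5 - 3 * np.abs(r1) / dr
          - np.sqrt(-3 * (1 - np.abs(r1) / dr) ** 2 + 1)) / (6 * dr)
    v2 = (1 + np.sqrt(-3 * (r2 / dr) ** 2 + 1)) / (3 * dr)
    # np.nonzero returns indices in the same row-major order as r[mask].
    i1, j1 = np.nonzero(mask1)
    i2, j2 = np.nonzero(mask2)
    vals = np.concatenate((v1, v2))
    rows = np.concatenate((i1, i2))
    cols = np.concatenate((j1, j2))
    return sparse.coo_matrix((vals, (rows, cols)), shape=r.shape).tocsr()

# Made-up grid and marker coordinates, for illustration only.
X = np.linspace(0.0, 1.0, 50)
Y = np.linspace(0.0, 1.0, 50)
xhi = np.array([0.21, 0.50, 0.83])
eta = np.array([0.10, 0.47, 0.90])
dx, dy = X[1] - X[0], Y[1] - Y[0]
rx = np.abs(X[np.newaxis, :] - xhi[:, np.newaxis])
ry = np.abs(Y[np.newaxis, :] - eta[:, np.newaxis])

# Element-wise product of two sparse matrices stays sparse.
E = delta_sparse(rx, dx).multiply(delta_sparse(ry, dy))
print(E.nnz, "stored entries out of", rx.size)
```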
I have a 3D numpy array consisting of ones and zeros defining open versus filled space in a porous solid (it's currently a numpy Int64 array). I want to determine the Euclidean distance from each of the "1" points (voxels) to its nearest zero point. Is there a simple way to do this?
What you are asking for is the distance transform, which you can compute using scipy's ndimage package and its distance_transform_edt function:
>>> import numpy as np
>>> import scipy.ndimage as ndi
>>> img = np.random.randint(2, size=(5, 5))
>>> img
array([[0, 0, 1, 1, 1],
[1, 0, 1, 0, 1],
[0, 1, 1, 1, 1],
[0, 0, 0, 1, 1],
[0, 1, 1, 1, 1]])
>>> ndi.distance_transform_edt(img)
array([[ 0. , 0. , 1. , 1. , 1.41421356],
[ 1. , 0. , 1. , 0. , 1. ],
[ 0. , 1. , 1. , 1. , 1.41421356],
[ 0. , 0. , 0. , 1. , 2. ],
[ 0. , 1. , 1. , 1.41421356, 2.23606798]])
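The same call works on 3D input such as the voxel array in the question. Below is a small sketch on a made-up 3x3x3 volume; the sampling argument gives the voxel spacing along each axis, and the spacing used here is only an assumed example.

```python
import numpy as np
import scipy.ndimage as ndi

# Small made-up volume: all solid (1) except one open voxel (0) in the centre.
vol = np.ones((3, 3, 3), dtype=np.int64)
vol[1, 1, 1] = 0

dist = ndi.distance_transform_edt(vol)
print(dist[1, 1, 0])   # 1.0, a face neighbour of the zero voxel
print(dist[0, 0, 0])   # sqrt(3), a corner of the cube

# Voxels twice as long along the first axis (an assumed spacing).
dist_aniso = ndi.distance_transform_edt(vol, sampling=(2.0, 1.0, 1.0))
print(dist_aniso[0, 1, 1])   # 2.0
```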
If val contains the value (0 or 1) and pos contains the positions of each of these voxels, then you could use scipy.spatial.distance.cdist to compute all pairwise distances:
import numpy as np
from scipy.spatial.distance import cdist
# Find the points corresponding to zeros and ones
zero_indices = (val == 0)
one_indices = (val == 1)
# Compute all pairwise distances between zero-points and one-points
pairwise_distances = cdist(pos[zero_indices, :], pos[one_indices, :])
# Choose the minimum distance
min_dist = np.min(pairwise_distances, axis=0)
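Note that the full pairwise distance matrix has (number of zeros) x (number of ones) entries, which can get large for a big volume. One possible alternative, swapped in here as a different technique from the cdist approach, is a KD-tree built on the zero voxels and queried with the one voxels. The test volume below is made up for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
val = rng.integers(0, 2, size=(6, 6, 6))   # made-up 3D test volume of 0s and 1s
val[0, 0, 0] = 0                            # guarantee at least one zero
val[-1, -1, -1] = 1                         # guarantee at least one one

zero_pos = np.argwhere(val == 0)   # (n_zero, 3) voxel coordinates
one_pos = np.argwhere(val == 1)    # (n_one, 3) voxel coordinates

tree = cKDTree(zero_pos)
# For every "1" voxel: distance to, and index of, its nearest "0" voxel.
min_dist, nearest = tree.query(one_pos)
print(min_dist[:5])
```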
I searched the net for a guide to SciPy sparse matrices without success. I would be happy if anybody could share a source for it, but now to the question:
I have an array of tuples. I want to turn the array of tuples into a sparse matrix in which the tuples appear on the main diagonal and the diagonal just beside it, as the following example shows. What is the fancy (efficient) way of doing it?
import numpy as np
A=np.asarray([[1,2],[3,4],[5,6],[7,8]])
B=np.zeros((A.shape[0],A.shape[0]+1))
for i in range(A.shape[0]):
    B[i,i]=A[i,0]
    B[i,i+1]=A[i,1]
print(B)
Output being:
[[ 1. 2. 0. 0. 0.]
[ 0. 3. 4. 0. 0.]
[ 0. 0. 5. 6. 0.]
[ 0. 0. 0. 7. 8.]]
You can build those really fast as a CSR matrix:
>>> A = np.asarray([[1,2],[3,4],[5,6],[7,8]])
>>> rows = len(A)
>>> cols = rows + 1
>>> data = A.flatten() # we want a copy
>>> indptr = np.arange(0, len(data)+1, 2) # 2 non-zero entries per row
>>> indices = np.repeat(np.arange(cols), [1] + [2] * (cols-2) + [1])
>>> import scipy.sparse as sps
>>> a_sps = sps.csr_matrix((data, indices, indptr), shape=(rows, cols))
>>> a_sps.A
array([[1, 2, 0, 0, 0],
[0, 3, 4, 0, 0],
[0, 0, 5, 6, 0],
[0, 0, 0, 7, 8]])
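To see how the three CSR arrays fit together, here is a small sketch that rebuilds the same matrix and prints, for each row, the slice of indices and data that indptr assigns to it. The way indices is computed here (row i gets columns i and i+1) is an equivalent alternative to the np.repeat expression, chosen for readability.

```python
import numpy as np
import scipy.sparse as sps

A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
rows, cols = len(A), len(A) + 1

data = A.ravel()
indptr = np.arange(0, 2 * rows + 1, 2)                  # row i owns data[2i:2i+2]
indices = (np.arange(rows)[:, None] + [0, 1]).ravel()   # columns i and i+1 for row i

a_sps = sps.csr_matrix((data, indices, indptr), shape=(rows, cols))

for i in range(rows):
    start, stop = indptr[i], indptr[i + 1]
    print(i, indices[start:stop], data[start:stop])
```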
Try diags from scipy
import numpy as np
import scipy.sparse
A = np.asarray([[1,2],[3,4],[5,6],[7,8]])
B = scipy.sparse.diags([A[:,0], A[:,1]], [0, 1], [4, 5])
When I print B.todense(), it gives me
[[ 1. 2. 0. 0. 0.]
[ 0. 3. 4. 0. 0.]
[ 0. 0. 5. 6. 0.]
[ 0. 0. 0. 7. 8.]]