I'm trying to compute the singular values of a matrix with many zeros using SLEPc's Lanczos-type SVD solver, in Python/Cython.
The matrix I use is a PETSc matrix:
[[ 0.00648130+0.32060635j 0 0 0 0 0 ]
[ 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 ]
[ 0 0 0 0 0 -0.00668978-0.31948359j ]]
When I invoke the SVD solver with the code below,
size = Matrix.getSize()
S = SLEPc.SVD()
S.create()
S.setOperator(Matrix)
S.setType(SLEPc.SVD.Type.LANCZOS)
S.setDimensions(min(size))
S.solve()
I get the error:
/usr/local/lib/python2.7/dist-packages/slepc4py/lib/linux-gnu-cxx-complex/SLEPc.so in slepc4py.SLEPc.SVD.solve (src/slepc4py.SLEPc.c:35357)()
Error: error code 76
[0] SVDSolve() line 111 in /home/fremling/slepc-3.7.2/src/svd/interface/svdsolve.c
[0] SVDSolve_Lanczos() line 229 in /home/fremling/slepc-3.7.2/src/svd/impls/lanczos/gklanczos.c
[0] DSSolve() line 543 in /home/fremling/slepc-3.7.2/src/sys/classes/ds/interface/dsops.c
[0] DSSolve_SVD_DC() line 255 in /home/fremling/slepc-3.7.2/src/sys/classes/ds/impls/svd/dssvd.c
[0] Error in external library
[0] Error in Lapack xBDSDC 5
I realize some of the singular values will be zero, but that should not be a reason for the crash, right?
I should mention that most of the time the code runs without problem, but when there are many zeros, these crashes happen.
A complete code example follows; it works with the given matrix for all SLEPc SVD methods except SLEPc.SVD.Type.CROSS. Tests were run with version 3.7.0 of slepc4py and petsc4py.
import numpy as np
import slepc4py.SLEPc as SLEPc
import petsc4py.PETSc as PETSc
# numpy version
A = np.array([[0.00648130+0.32060635j,0,0,0,0,0]
,[0,0,0,0,0,0]
,[0,0,0,0,0,0]
,[0,0,0,0,0,0]
,[0,0,0,0,0,0]
,[0,0,0,0,0,-0.00668978-0.31948359j]])
u,s,d = np.linalg.svd(A)
print('Singular values: ', s)
# SLEPc version
Ap = PETSc.Mat()
Ap.create()
Ap.setSizes(A.shape)
Ap.setUp()
for row in range(A.shape[0]):
    for col in range(A.shape[1]):
        Ap.setValue(row, col, A[row, col])
Ap.assemble()
#for stype in [SLEPc.SVD.Type.CROSS, SLEPc.SVD.Type.CYCLIC, SLEPc.SVD.Type.LANCZOS, SLEPc.SVD.Type.LAPACK, SLEPc.SVD.Type.TRLANCZOS]:
for stype in [SLEPc.SVD.Type.CYCLIC, SLEPc.SVD.Type.LANCZOS, SLEPc.SVD.Type.LAPACK, SLEPc.SVD.Type.TRLANCZOS]:
    S = SLEPc.SVD()
    S.create()
    S.setOperator(Ap)
    S.setType(stype)
    S.setDimensions(A.shape[0])
    S.solve()
    s_slepc = []
    for i in range(S.getConverged()):
        s_slepc.append(S.getValue(i))
    print('Singular values (SLEPc %s): ' % S.getType(), s_slepc)
Produces output:
('Singular values: ', array([ 0.32067186, 0.31955362, 0. , 0. , 0. , 0. ]))
('Singular values (SLEPc cyclic): ', [0.3206718555003113, 0.31955362216025096, 5.558046393682893e-17, 1.5567126663969806e-34, 1.1955235065555233e-34, 8.758810386256485e-36])
('Singular values (SLEPc lanczos): ', [0.32067185550031124, 0.31955362216025107, 7.598620143277e-17, 9.80035376111015e-18, 8.135560423584465e-18, 4.5426042596528355e-18])
('Singular values (SLEPc lapack): ', [0.32067185550031124, 0.31955362216025107, 0.0, 0.0, 0.0, 0.0])
('Singular values (SLEPc trlanczos): ', [0.32067185550031124, 0.31955362216025107, 1.4803092323093608e-09, 9.80035376111015e-18, 8.135560423584465e-18, 4.5426042596528355e-18])
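One workaround worth trying (a sketch only; I have not verified it against this exact LAPACK failure, and nsv=2 is just an illustrative value) is to request only the singular values actually needed instead of all min(size) of them, which keeps the projected problem handed to LAPACK small:

S = SLEPc.SVD()
S.create()
S.setOperator(Ap)
S.setType(SLEPc.SVD.Type.LANCZOS)
S.setDimensions(nsv=2)  # ask only for the dominant singular values
S.solve()

With only the dominant singular values requested, the Lanczos process does not have to resolve the numerically zero part of the spectrum that appears to trigger the xBDSDC error.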
I am currently trying to shuffle an array and am running into some problems.
What I have:
my_array=array([nan, 1, 1, nan, nan, 2, nan, ..., nan, nan, nan])
What I want to do:
I want to shuffle the dataset while keeping the numbers (e.g. the 1,1 in the array) together.
What I did first is convert every nan into a unique negative number.
my_array=array([-1, 1, 1, -2, -3, 2, -4, ..., -2158, -2159, -2160])
Afterward I split everything up with pandas:
df = pd.DataFrame(my_array)
df.rename(columns={0: 'sampleID'}, inplace=True)
groups = [df.iloc[:, 0] for _, df in df.groupby('sampleID')]
If I now shuffle my dataset, every group will have an equal probability of appearing at a given place, but this neglects the number of elements in each group. A group of several elements like [9,9,9,9,9,9] should have a higher chance of appearing earlier than some random nan. Correct me on this one if I'm wrong.
One way to get around this problem is NumPy's choice method.
For this I have to create a probability array
probability_array = np.zeros(len(groups))
for index, item in enumerate(groups):
    # normalize by the total element count so the probabilities sum to 1,
    # which np.random.choice requires
    probability_array[index] = len(item) / len(my_array)
All of this to finally call:
groups=np.array(groups,dtype=object)
rng = np.random.default_rng()
shuffled_indices = rng.choice(len(groups), len(groups), replace=False, p=probability_array)
shuffled_array = np.concatenate(groups[shuffled_indices]).ravel()
shuffled_array[shuffled_array < 1] = np.nan
All of this is quite cumbersome and not very fast. Besides the fact that you can certainly code it better, I feel like I am missing some very simple solution to my problem.
Can somebody point me in the right direction?
One approach:
import numpy as np
from itertools import groupby
# toy data
my_array = np.array([np.nan, 1, 1, np.nan, np.nan, 2, 2, 2, np.nan, 3, 3, 3, np.nan, 4, 4, np.nan, np.nan])
# find groups
groups = np.array([[key, sum(1 for _ in group)] for key, group in groupby(my_array)])
# permute
keys, repetitions = zip(*np.random.permutation(groups))
# recreate new array
res = np.repeat(keys, np.array(repetitions).astype(int))  # counts come back as floats, so cast to int
print(res)
Output (single run)
[ 3. 3. 3. nan nan nan nan 2. 2. 2. 1. 1. nan nan nan 4. 4.]
I have solved your problem under some restrictions:
Instead of NaN, I have used zeros as separators
I assumed that an array of yours ALWAYS starts with a sequence of non-zero integers and ends with another sequence of non-zero integers.
With these provisions, I have essentially shuffled a representation of the sequences of integers, and later I have stitched everything in place again.
import numpy as np
from itertools import groupby

a = np.array([int(_) for _ in '1110022220003044440005500000600777'])
print(a)
n, z = [], []
for i, g in groupby(a):
    if i:
        n.append((i, sum(1 for _ in g)))
    else:
        z.append(sum(1 for _ in g))
np.random.shuffle(n)
nn = n[0]
b = [*[nn[0]]*nn[1]]
for zz, nn in zip(z, n[1:]):
    b += [*[0]*zz, *[nn[0]]*nn[1]]
print(np.array(b))
[1 1 1 0 0 2 2 2 2 0 0 0 3 0 4 4 4 4 0 0 0 5 5 0 0 0 0 0 6 0 0 7 7 7]
[7 7 7 0 0 1 1 1 0 0 0 4 4 4 4 0 6 0 0 0 5 5 0 0 0 0 0 2 2 2 2 0 0 3]
Note
The lengths of the runs of separators in the shuffled array are exactly the same as in the original array, but shuffling the separators as well is easy. A more difficult problem would be to change the lengths arbitrarily while keeping the total array length unchanged.
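For instance, building on the snippet above, shuffling the separator runs too is a one-line addition next to np.random.shuffle(n) (a minimal sketch):

np.random.shuffle(z)  # also randomize the order of the zero-run lengths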
I tested two 3x3 matrices to compute the inverse in Python and Excel, but the results are different. Which should I consider the correct or best result?
These are the matrices I tested:
Matrix 1:
1 0 0
1 2 0
1 2 3
Matrix 2:
1 0 0
4 5 0
7 8 9
The Matrix 1 inverse is the same in Python and Excel, but Matrix 2 inverse is different.
In Excel I use the MINVERSE(matrix) function, and in Python np.linalg.inv(matrix) (from Numpy library)
I can't post images yet, so I can't show the results from Excel :c
This is the code I use in Python:
import numpy as np

# Matrix 1
A = np.array([[1,0,0],
[1,2,0],
[1,2,3]])
Ainv = np.linalg.inv(A)
print(Ainv)
Result:
[[ 1. 0. 0. ]
[-0.5 0.5 0. ]
[ 0. -0.33333333 0.33333333]]
# (This is the same in Excel)
# Matrix 2
B = np.array([[1,0,0],
[4,5,0],
[7,8,9]])
Binv = np.linalg.inv(B)
print(Binv)
Result:
[[ 1.00000000e+00 0.00000000e+00 -6.16790569e-18]
[-8.00000000e-01 2.00000000e-01 1.23358114e-17]
[-6.66666667e-02 -1.77777778e-01 1.11111111e-01]]
# (This is different in Excel)
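A quick check (my addition, not part of the original question) shows both results are correct up to floating-point rounding: entries on the order of 1e-18 are numerical noise, i.e. zero to machine precision.

# Binv reproduces the identity up to rounding error, so it is a valid inverse.
print(np.allclose(Binv @ B, np.eye(3)))  # True
# Suppress the tiny noise terms to see the 'clean' inverse:
print(np.where(np.abs(Binv) < 1e-12, 0.0, Binv))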
I'm beginning to study Python and ran into this:
I have an array (km_media) that has nan values.
km_media = km / (2019 - year)
It happened because the variable year contains some 2019 values.
So, for the sake of learning, I would like to know how to do two things:
how can I use replace() to substitute the nan values with 0 in the variable;
how can I print the variable with the nan values replaced.
What I have until now:
1.
km_media = km_media.replace('nan', 0)
print(f"{km_media.replace('nan', 0)}")
Thanks
Not sure if this will do what you are looking for:
import numpy as np

a = 2 / np.arange(5)
print(a)
array([ inf, 2. , 1. , 0.66666667, 0.5 ])
b = [0 if np.isinf(i) or np.isnan(i) else i for i in a]  # i != np.nan is always True, so use np.isnan/np.isinf
print(b)
Output:
[0, 2.0, 1.0, 0.6666666666666666, 0.5]
Or:
np.where(np.isinf(a) | np.isnan(a), 0, a)  # a == np.nan never matches, so use the isnan/isinf predicates
Or:
a[np.isinf(a)] = 0
Also, for part 2 of your question, I'm not sure what you mean. If you have just replaced the inf's with 0, then you will just be printing zeros. If you want the index position of the inf's you have replaced, you can grab them before replacement:
np.where(a == np.inf)[0][0]
Output:
0 # this is the index position of np.inf in array a
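As a further option (my suggestion, not from the original answer), NumPy has a helper that performs all of these replacements in a single call; the nan/posinf/neginf keyword arguments require NumPy >= 1.17:

# Replace nan and both infinities with 0 in one step.
b = np.nan_to_num(a, nan=0.0, posinf=0.0, neginf=0.0)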
I have used sklearn to fit and predict a model, but I want to have the top 5 predictions (in terms of probabilities) per item.
So I used predict_proba, which gave me a list of lists like:
probabilities = [[0.8,0.15,0.5,0,0],[0.4,0.6,0,0,0],[0,0,0,0,1]]
What I want to do, is loop over this list of lists to give me an overview of each prediction made, along with its position in the list (which represents the classes).
When using [i for i, j in enumerate(predicted_proba[0]) if j > 0] it returns [0], [1], which is what I want for the complete list of lists (and if possible also with the probability next to it).
When trying to use a for-loop over the above code, it returns an IndexError.
Something like this:
probabilities = [[0.8, 0.15, 0.5, 0, 0], [0.4, 0.6, 0, 0, 0], [0, 0, 0, 0, 1]]
for list_index in range(len(probabilities)):
    print("Iteration_number:", list_index)
    for index, prob in enumerate(probabilities[list_index]):
        print("index", index, "=", prob)
Results in:
Iteration_number: 0
index 0 = 0.8
index 1 = 0.15
index 2 = 0.5
index 3 = 0
index 4 = 0
Iteration_number: 1
index 0 = 0.4
index 1 = 0.6
index 2 = 0
index 3 = 0
index 4 = 0
Iteration_number: 2
index 0 = 0
index 1 = 0
index 2 = 0
index 3 = 0
index 4 = 1
for i in predicted_proba:
    for index, value in enumerate(i):
        if value > 0:
            print(index)
Hope this helps.
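Since the question asks specifically for the top 5 predictions per item, here is a minimal sketch of one way to get them (my addition, using np.argsort; not part of the original answer):

import numpy as np

probabilities = [[0.8, 0.15, 0.5, 0, 0], [0.4, 0.6, 0, 0, 0], [0, 0, 0, 0, 1]]
for row in probabilities:
    top5 = np.argsort(row)[::-1][:5]  # indices of the 5 largest probabilities, highest first
    print([(i, row[i]) for i in top5])  # (class index, probability) pairs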
I have written some code to identify the connected components in a binary image. I have used recursive depth first search. However, for some images, the Python Recursion Limit is not enough. Even though I increase the limit to the maximum supported limit on my computer, the program still fails for some images. How can I iteratively implement DFS? Or is there any other better solution?
My code:
import numpy as np

count = 1
height = 4
width = 5
g = np.zeros((height + 2, width + 2))  # image padded with a zero border
w = np.zeros((height + 2, width + 2))  # component labels
dx = [-1, 0, 1, 1, 1, 0, -1, -1]       # 8-neighborhood offsets
dy = [1, 1, 1, 0, -1, -1, -1, 0]

def dfs(x, y, c):
    global w
    w[x][y] = c
    for i in range(8):
        nx = x + dx[i]
        ny = y + dy[i]
        if g[nx][ny] and not w[nx][ny]:
            dfs(nx, ny, c)

def find_connected_components(image):
    global count, g
    g[1:-1, 1:-1] = image
    for i in range(1, height + 1):
        for j in range(1, width + 1):
            if g[i][j] and not w[i][j]:
                dfs(i, j, count)
                count += 1

mask1 = np.array([[0,0,0,0,1],[0,1,1,0,1],[0,0,1,0,0],[1,0,0,0,1]])
find_connected_components(mask1)
print(mask1)
print(w[1:-1, 1:-1])
Input and Output:
[[0 0 0 0 1]
[0 1 1 0 1]
[0 0 1 0 0]
[1 0 0 0 1]]
[[ 0. 0. 0. 0. 1.]
[ 0. 2. 2. 0. 1.]
[ 0. 0. 2. 0. 0.]
[ 3. 0. 0. 0. 4.]]
Have a list of locations to visit
Use a while loop visiting each location, popping it out of the list as you do.
Like so:
def dfs(x, y, c):
    global w
    locs = [(x, y, c)]  # explicit stack replaces the recursive call stack
    while locs:
        x, y, c = locs.pop()
        w[x][y] = c
        for i in range(8):
            nx = x + dx[i]
            ny = y + dy[i]
            if g[nx][ny] and not w[nx][ny]:
                locs.append((nx, ny, c))
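As for "any other better solution": if SciPy is acceptable, scipy.ndimage.label does exactly this without any explicit traversal (my suggestion, not part of the original answer); a 3x3 all-ones structuring element reproduces the 8-connectivity encoded in the dx/dy arrays above:

import numpy as np
from scipy.ndimage import label

mask1 = np.array([[0,0,0,0,1],[0,1,1,0,1],[0,0,1,0,0],[1,0,0,0,1]])
# 3x3 all-ones structure = 8-connectivity (diagonal neighbors count).
labeled, num_components = label(mask1, structure=np.ones((3, 3)))
print(labeled)          # labeled component array
print(num_components)   # 4 components for this mask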