numpy/scipy build adjacency matrix from weighted edgelist - python

I'm reading a weighted edgelist / numpy array like:
0 1 1
0 2 1
1 2 1
1 0 1
2 1 4
where the columns are 'User1', 'User2', 'Weight'. I'd like to run a DFS with scipy.sparse.csgraph.depth_first_tree, which requires an N x N matrix as input. How can I convert the previous list into a square matrix like:
0 1 1
1 0 1
0 4 0
within numpy or scipy?
Thanks for your help.
EDIT:
I've been working with a huge (150 million nodes) network, so I'm looking for a memory efficient way to do that.

You could use a memory-efficient scipy.sparse matrix:
import numpy as np
import scipy.sparse as sparse

arr = np.array([[0, 1, 1],
                [0, 2, 1],
                [1, 2, 1],
                [1, 0, 1],
                [2, 1, 4]])
# the shape is the largest node index + 1 in the first two columns
shape = tuple(arr.max(axis=0)[:2] + 1)
coo = sparse.coo_matrix((arr[:, 2], (arr[:, 0], arr[:, 1])), shape=shape,
                        dtype=arr.dtype)
print(repr(coo))
# <3x3 sparse matrix of type '<class 'numpy.int64'>'
#     with 5 stored elements in COOrdinate format>
To convert the sparse matrix to a dense numpy array, you could use todense:
print(coo.todense())
# [[0 1 1]
# [1 0 1]
# [0 4 0]]
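Since the end goal is a depth-first traversal, the COO matrix can be handed to scipy.sparse.csgraph directly (it is converted to CSR internally). A minimal sketch, continuing from the coo variable above and assuming the graph should be treated as undirected:
from scipy.sparse.csgraph import depth_first_tree

# depth-first tree rooted at node 0; directed=False follows edges both ways
tree = depth_first_tree(coo.tocsr(), 0, directed=False)
print(tree.toarray())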

Try something like the following:
import numpy as np
import scipy.sparse as sps

A = np.array([[0, 1, 1], [0, 2, 1], [1, 2, 1], [1, 0, 1], [2, 1, 4]])
rows, cols, weights = A[:, 0], A[:, 1], A[:, 2]
# the dimension of the square matrix is the largest node index + 1
dim = max(rows.max(), cols.max()) + 1
B = sps.lil_matrix((dim, dim))
for i, j, w in zip(rows, cols, weights):
    B[i, j] = w
print(B.todense())
# [[ 0.  1.  1.]
#  [ 1.  0.  1.]
#  [ 0.  4.  0.]]
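Since the question's end goal is scipy.sparse.csgraph.depth_first_tree, note that LIL is convenient for the incremental assignment above, while CSR is the format the csgraph routines work with; a one-line conversion sketch:
# LIL is efficient for incremental construction; CSR for traversal/arithmetic
B_csr = B.tocsr()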

Related

Efficiently permute array row-wise using Numpy

Given a 2D array, I would like to permute this array row-wise.
Currently, I use a for loop to permute the 2D array row by row, as below:
for i in range(npart):
    pr = np.random.permutation(range(m))
    # arr_rand3 is the same as arr, but with each row permuted
    arr_rand3[i, :] = arr[i, pr]
But I wonder whether there is some setting within Numpy that can perform this in a single line (without the for-loop).
The full code is
import numpy as np

arr = np.array([[0, 0, 0, 0, 0], [0, 4, 1, 1, 1], [0, 1, 1, 2, 2], [0, 3, 2, 2, 2]])
npart = len(arr[:, 0])
m = len(arr[0, :])
# Permuted version of arr
arr_rand3 = np.zeros(shape=np.shape(arr), dtype=int)
# Nodal association matrix for C
X = np.zeros(shape=(m, m), dtype=np.double)
# Random nodal association matrix for C_rand3
X_rand3 = np.zeros(shape=(m, m), dtype=np.double)
for i in range(npart):
    pr = np.random.permutation(range(m))
    # arr_rand3 is the same as arr, but with each row permuted
    arr_rand3[i, :] = arr[i, pr]
In Numpy 1.19+ you should be able to do:
import numpy as np
arr = np.array([[0, 0, 0, 0, 0], [0, 4, 1, 1, 1], [0, 1, 1, 2, 2], [0, 3, 2, 2, 2]])
rng = np.random.default_rng()
arr_rand3 = rng.permutation(arr, axis=1)
print(arr_rand3)
Output
[[0 0 0 0 0]
[4 0 1 1 1]
[1 0 1 2 2]
[3 0 2 2 2]]
According to the documentation, the method random.Generator.permutation receives a new parameter axis:
axis : int, optional
    The axis which x is shuffled along. Default is 0.
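One caveat worth noting: permutation with an axis argument applies a single permutation along that axis, so every row in the output above is shuffled by the same column order. If each row should be permuted independently, as in the original loop, Generator.permuted (NumPy 1.20+) does that; a sketch, with an argsort-based fallback for older versions:
import numpy as np

arr = np.array([[0, 0, 0, 0, 0], [0, 4, 1, 1, 1], [0, 1, 1, 2, 2], [0, 3, 2, 2, 2]])
rng = np.random.default_rng()
# each row is shuffled independently of the others
arr_rand3 = rng.permuted(arr, axis=1)

# fallback for older NumPy: sort each row by independent random keys
idx = np.random.rand(*arr.shape).argsort(axis=1)
arr_rand3_old = np.take_along_axis(arr, idx, axis=1)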

Numpy broadcast array to smaller array with exact position for every row

Consider example matrix array:
[[0 1 2 1 0]
[1 1 2 1 0]
[0 1 0 0 0]
[1 2 1 0 0]
[1 2 2 3 2]]
What I need to do:
find maxima in every row
select smaller surrounding of the maxima from every row (3 values in this case)
paste the surrounding of the maxima into new array (narrower)
For the example above, the result is:
[[ 1. 2. 1.]
[ 1. 2. 1.]
[ 0. 1. 0.]
[ 1. 2. 1.]
[ 2. 3. 2.]]
My current working code:
import numpy as np

A = np.array([
    [0, 1, 2, 1, 0],
    [1, 1, 2, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [1, 2, 2, 3, 2],
])
b = A.argmax(axis=1)
C = np.zeros((len(A), 3))
for idx, loc, row in zip(range(len(A)), b, A):
    print(idx, loc, row)
    C[idx] = row[loc-1:loc+2]
print(C)
My question:
How to get rid of the for loop and replace it with some cheaper numpy operation?
Note:
This algorithm is for straightening broken "lines" in video stream frames with thousands of rows.
Approach #1
We can have a vectorized solution based on setting up sliding windows and then indexing into those with b-offsetted indices to get the desired output. To get the sliding windows, we can leverage scikit-image's view_as_windows, which is built on np.lib.stride_tricks.as_strided.
The implementation would be -
from skimage.util.shape import view_as_windows

L = 3  # window length
# all length-L windows along each row of A; shape (5, 3, L) here
w = view_as_windows(A, (1, L))[..., 0, :]
# for each row, pick the window centred on that row's argmax b
Cout = w[np.arange(len(b)), b - L//2]
Being a view-based method, this has the advantage of being memory-efficient and hence good on performance too.
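If scikit-image is not available, plain NumPy (1.20+) provides an equivalent view-based helper; a sketch under that assumption, reusing A, b and L from above:
from numpy.lib.stride_tricks import sliding_window_view

w = sliding_window_view(A, L, axis=1)   # shape (5, 3, 3), a view into A
Cout = w[np.arange(len(b)), b - L//2]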
Approach #2
Alternatively, a one-liner by creating all those indices with outer-addition would be -
A[np.arange(len(b))[:,None],b[:,None] + np.arange(-(L//2),L//2+1)]
This works by making an array with all the desired indices, but using that index array directly on A yields a 3D array (every row of A gets indexed with the full index matrix), hence the subsequent diagonal indexing. Probably not optimal, but definitely another way of doing it!
import numpy as np

A = np.array([
    [0, 1, 2, 1, 0],
    [1, 1, 2, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [1, 2, 2, 3, 2],
])
b = A.argmax(axis=1).reshape(-1, 1)
index = b + np.arange(-1, 2, 1).reshape(1, -1)
# A[:, index] has shape (5, 5, 3); the diagonal keeps row i's
# window built from row i's own argmax
A[:, index][np.arange(b.size), np.arange(b.size)]

n*n matrix of 0s and 1s in python [duplicate]

This question already has answers here:
How to make a checkerboard in numpy?
(28 answers)
Closed 4 years ago.
How do I create an n*n checkerboard matrix with alternating values of 0 and 1, using the tile function?
For example:
when n has a value of 2, Output should be:
[[0 1]
[1 0]]
I am able to create a matrix of 0s and 1s, but they do not alternate between rows. Below is what I tried:
import numpy as np

n = 4
arr = [0, 1]
print(np.tile(arr, (n, n//2)))
output I got:
[[0 1 0 1]
 [0 1 0 1]
 [0 1 0 1]
 [0 1 0 1]]
output I want:
[[0 1 0 1]
 [1 0 1 0]
 [0 1 0 1]
 [1 0 1 0]]
A simple way using numpy could be to define a vector of 0s and 1s of size n and take advantage of broadcasting to create an n x n checkerboard:
def checkerboard(n):
    a = np.resize([0, 1], n)
    return np.abs(a - np.array([a]).T)
Sample use -
checkerboard(2)
array([[0, 1],
       [1, 0]])

checkerboard(4)
array([[0, 1, 0, 1],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [1, 0, 1, 0]])
Details -
The above works by initially creating a length-n 1D vector of 0s and 1s using np.resize:
import numpy as np
n = 3
np.resize([0,1], n)
array([0, 1, 0])
And then subtracting its transpose (shown here for n = 4), which broadcasts to an array of shape (n, n) with negative and positive ones:
a - np.array([a]).T
array([[ 0,  1,  0,  1],
       [-1,  0, -1,  0],
       [ 0,  1,  0,  1],
       [-1,  0, -1,  0]])
We just need to take the absolute value of it and we have a checkerboard matrix.
You could use numpy slice assignment; no need for np.tile:
import numpy as np

def tiling(n):
    result = np.zeros((n, n))
    result[::2, 1::2] = 1
    result[1::2, ::2] = 1
    return result

print(tiling(2))
print()
print(tiling(4))
Output
[[0. 1.]
 [1. 0.]]

[[0. 1. 0. 1.]
 [1. 0. 1. 0.]
 [0. 1. 0. 1.]
 [1. 0. 1. 0.]]
Here is a one line numpy solution. That said, I think Daniel's response is much more readable and probably more efficient.
If n is odd then np.arange(n*n).reshape(n,n)%2 gives the correct result. However, if n is even, every row will be the same (like your result). We can fix this by subtracting one from every other row.
tile = (np.arange(n*n).reshape(n,n)-np.arange(n).reshape(n,1)*(n%2+1))%2
Equivalently,
tile = (np.arange(n*n).reshape(n,n,order='F')-np.arange(n)*(n+1))%2
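For reference, a common alternative that also skips np.tile builds the board from index parity; a minimal sketch:
import numpy as np

n = 4
# (i + j) % 2 alternates between 0 and 1 across rows and columns
board = np.indices((n, n)).sum(axis=0) % 2
print(board)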

Adding values to non zero elements in a Sparse Matrix

I have a sparse matrix in which I want to increment all the values of non-zero elements by one. However, I cannot figure it out. Is there a way to do it using standard packages in python? Any help will be appreciated.
I cannot comment on its performance, but you can do the following (SciPy 1.1.0):
>>> from scipy.sparse import csr_matrix
>>> a = csr_matrix([[0, 2, 0], [1, 0, 0]])
>>> print(a)
(0, 1) 2
(1, 0) 1
>>> a[a.nonzero()] = a[a.nonzero()] + 1
>>> print(a)
(0, 1) 3
(1, 0) 2
If your matrix has two dimensions, you can do the following:
sparse_matrix = [[element if element == 0 else element + 1 for element in row] for row in sparse_matrix]
It iterates over every element of your matrix, returning the element unchanged if it equals zero and adding 1 to it otherwise.
There is more about conditionals in list comprehensions in the answer to this question.
You can use the package numpy which has efficient functions for dealing with n-dimensional arrays. What you need is:
array[array>0] += 1
where array is the numpy array of your matrix. Example here:
import numpy as np

my_matrix = [[2, 0, 0, 0, 7], [0, 0, 0, 4, 0]]
array = np.array(my_matrix)
print("Matrix before incrementing values: \n", array)
array[array > 0] += 1
print("Matrix after incrementing values: \n", array)
Outputs:
Matrix before incrementing values:
[[2 0 0 0 7]
[0 0 0 4 0]]
Matrix after incrementing values:
[[3 0 0 0 8]
[0 0 0 5 0]]
Hope this helps!
If you have a scipy sparse matrix (scipy.sparse), you can do:
import scipy.sparse as sp

my_matrix = [[2, 0, 0, 0, 7], [0, 0, 0, 4, 0]]
my_matrix = sp.csc_matrix(my_matrix)
# .data holds only the explicitly stored (non-zero) entries,
# so this increments exactly those values
my_matrix.data += 1
my_matrix.todense()
Returns:
[[3, 0, 0, 0, 8], [0, 0, 0, 5, 0]]
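One caveat: .data contains every explicitly stored entry, which on some matrices can include stored zeros. If that is a possibility, calling eliminate_zeros() first keeps the update to true non-zeros; a small sketch:
my_matrix.eliminate_zeros()  # drop any explicitly stored zeros
my_matrix.data += 1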

Sum over rows in scipy.sparse.csr_matrix

I have a big csr_matrix and I want to add over rows and obtain a new csr_matrix with the same number of columns but reduced number of rows. (Context: The matrix is a document-term matrix obtained from sklearn CountVectorizer and I want to be able to quickly combine documents according to codes associated with these documents)
For a minimal example, this is my matrix:
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse import vstack
row = np.array([0, 4, 1, 3, 2])
col = np.array([0, 2, 2, 0, 1])
dat = np.array([1, 2, 3, 4, 5])
A = csr_matrix((dat, (row, col)), shape=(5, 5))
print(A.toarray())
[[1 0 0 0 0]
[0 0 3 0 0]
[0 5 0 0 0]
[4 0 0 0 0]
[0 0 2 0 0]]
Now let's say I want a new matrix B in which rows (1, 4) and (2, 3, 5) are combined by summing them, which would look something like this:
[[5 0 0 0 0]
[0 5 5 0 0]]
And should be again in sparse format (because the real data I'm working with is large). I tried to sum over slices of the matrix and then stack it:
idx1 = [1, 4]
idx2 = [2, 3, 5]
A_sub1 = A[idx1, :].sum(axis=1)
A_sub2 = A[idx2, :].sum(axis=1)
B = vstack((A_sub1, A_sub2))
But this gives me the summed up values just for the non-zero columns in the slice, so I can't combine it with the other slices because the number of columns in the summed slices are different.
I feel like there must be an easy way to do this. But I couldn't find any discussion of this online or in the documentation. What am I missing?
Thank you for your help
Note that you can do this by carefully constructing another matrix. Here's how it would work for a dense matrix:
>>> S = np.array([[1, 0, 0, 1, 0], [0, 1, 1, 0, 1]])
>>> np.dot(S, A.toarray())
array([[5, 0, 0, 0, 0],
       [0, 5, 5, 0, 0]])
The sparse version is only a little more complicated. The information about which rows should be summed together is encoded in row:
col = range(5)
row = [0, 1, 1, 0, 1]
dat = [1, 1, 1, 1, 1]
S = csr_matrix((dat, (row, col)), shape=(2, 5))
result = S * A
# check that the result is another sparse matrix
print(type(result))
# check that the values are the ones we want
print(result.toarray())
Output:
<class 'scipy.sparse.csr.csr_matrix'>
[[5 0 0 0 0]
[0 5 5 0 0]]
You can handle more rows in your output by including higher values in row and extending the shape of S accordingly.
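If the groupings come from per-document codes (as in the motivating context), S can be built directly from them with np.unique; a sketch using hypothetical codes:
import numpy as np
from scipy.sparse import csr_matrix

codes = np.array(['a', 'b', 'b', 'a', 'b'])  # hypothetical per-document codes
groups, row = np.unique(codes, return_inverse=True)  # row[i] = group of doc i
n_docs = len(codes)
S = csr_matrix((np.ones(n_docs), (row, np.arange(n_docs))),
               shape=(len(groups), n_docs))
B = S * A  # one output row per unique code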
The indexing should be:
idx1 = [0, 3] # rows 1 and 4
idx2 = [1, 2, 4] # rows 2,3 and 5
Then you need to keep A_sub1 and A_sub2 in sparse format and use axis=0:
A_sub1 = csr_matrix(A[idx1, :].sum(axis=0))
A_sub2 = csr_matrix(A[idx2, :].sum(axis=0))
B = vstack((A_sub1, A_sub2))
B.toarray()
array([[5, 0, 0, 0, 0],
       [0, 5, 5, 0, 0]])
Note, I think the A[idx, :].sum(axis=0) operations involve conversion from sparse matrices, so @Mr_E's answer is probably better.
Alternatively, it works when you use axis=0 and np.vstack (as opposed to scipy.sparse.vstack):
A_sub1 = A[idx1, :].sum(axis=0)
A_sub2 = A[idx2, :].sum(axis=0)
np.vstack((A_sub1, A_sub2))
Giving:
matrix([[5, 0, 0, 0, 0],
        [0, 5, 5, 0, 0]])
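Note that this np.vstack variant returns a dense matrix, not a sparse one, because .sum(axis=0) on a scipy sparse matrix already produces a dense np.matrix; for large outputs, the S * A approach above stays sparse throughout.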
