sparse matrix to ijv format - python

Is there an efficient way to represent a sparse matrix in ijv form (three arrays: row, column, value)?
Using nested loops seems naive and slow for large matrices.
The code comes from here:
# Python program for Sparse Matrix Representation
# using arrays
# assume a sparse matrix of order 4*5
# let assume another matrix compactMatrix
# now store the value,row,column of arr1 in sparse matrix compactMatrix
sparseMatrix = [[0,0,3,0,4],[0,0,5,7,0],[0,0,0,0,0],[0,2,6,0,0]]
# initialize size as 0
size = 0
for i in range(4):
    for j in range(5):
        if (sparseMatrix[i][j] != 0):
            size += 1
# number of columns in compactMatrix(size) should
# be equal to number of non-zero elements in sparseMatrix
rows, cols = (3, size)
compactMatrix = [[0 for i in range(cols)] for j in range(rows)]
k = 0
for i in range(4):
    for j in range(5):
        if (sparseMatrix[i][j] != 0):
            compactMatrix[0][k] = i
            compactMatrix[1][k] = j
            compactMatrix[2][k] = sparseMatrix[i][j]
            k += 1
for i in compactMatrix:
    print(i)
# This code is contributed by MRINALWALIA
I am going to write the sparse matrix to a file in ijv form and read it in C++.
scipy.sparse.coo_matrix just gives me:
print(coo_matrix([[0,0,3,0,4],[0,0,5,7,0],[0,0,0,0,0],[0,2,6,0,0]]))
(0, 2) 3
(0, 4) 4
(1, 2) 5
(1, 3) 7
(3, 1) 2
(3, 2) 6
With np.where() I can get the indices of the nonzero elements, but how do I get the v array?
Do you know a more efficient method? (I am not going to use SWIG or similar tools to wrap the code.)
Edit
size=np.count_nonzero(sparseMatrix)
rows, cols = np.where(sparseMatrix)
compactMatrix = np.zeros((3, size))
for i in range(size):
    compactMatrix[0][i] = rows[i]
    compactMatrix[1][i] = cols[i]
    compactMatrix[2][i] = sparseMatrix[rows[i]][cols[i]]
print(compactMatrix)
That's what I finally came up with.
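A loop-free sketch of the same idea, assuming the matrix fits in memory as a dense NumPy array: indexing with the arrays returned by np.nonzero yields the v array directly, and a coo_matrix exposes the same three arrays as its row, col and data attributes. The names dense, ijv and lines are illustrative only, and the "i j v" text layout is just one possible format for the C++ reader.

```python
import numpy as np
from scipy.sparse import coo_matrix

dense = np.array([[0, 0, 3, 0, 4],
                  [0, 0, 5, 7, 0],
                  [0, 0, 0, 0, 0],
                  [0, 2, 6, 0, 0]])

# Row and column indices of the nonzero entries, in row-major order.
rows, cols = np.nonzero(dense)
# Fancy indexing picks the matching values in one step (the "v" array).
vals = dense[rows, cols]
ijv = np.vstack([rows, cols, vals])

# A COO matrix stores the same three arrays as attributes.
coo = coo_matrix(dense)
coo_ijv = np.vstack([coo.row, coo.col, coo.data])

# One "i j v" triple per line (assumed layout), ready to be written to a file.
lines = [f"{i} {j} {v}" for i, j, v in zip(rows, cols, vals)]
```

If the matrix is already stored as a SciPy sparse matrix, calling .tocoo() on it should give the same attributes without building the dense array.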

Related

Fancy indexing of a numpy ndarray

Suppose I have an array a shaped as follows:
import numpy as np
n = 10
d = 5
a = np.zeros(shape = np.repeat(n,d))
I want to obtain the values at indices (0,...,:,...,0), with the : placed along each dimension in turn, resulting in an (n,d)-shaped array b with b[i,j] = a[0,...,0,i,0,...,0], where the i is in the jth dimension.
How can I extract b from a?
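Get the flattened indices and index a.flat for a vectorized solution:
n = len(a)
d = a.ndim
# in C order, one step along dimension j moves n**(d-1-j) places in the flat array
idxs = np.multiply.outer(np.arange(n), n**np.arange(d)[::-1])
out = a.flat[idxs]  # shape (n, d), out[i, j] = a[0,...,i,...,0] with i in dimension j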
The easiest way is a for loop:
# get the first slice of `a` along given dimension `j`
def get_slice(a,j):
    idx = [0]*len(a.shape)
    idx[j] = slice(None)
    return a[tuple(idx)]
# stack along axis 1 so that column j holds the slice along dimension j
out = np.stack([get_slice(a,j) for j in range(len(a.shape))], axis=1)
And out.shape is (10,5)
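As a sanity check, the loop and the flat-index approach can be compared on a small array whose entries encode their own flat position. This is only a sketch: the helper name axis_lines is made up here, and it assumes a C-ordered array with the same length n along every dimension.

```python
import numpy as np

n, d = 4, 3
# Each entry equals its flat (C-order) position, which makes results easy to read.
a = np.arange(n**d).reshape((n,) * d)

def axis_lines(a):
    """Column j holds a[0, ..., :, ..., 0] with the full slice in dimension j."""
    cols = []
    for j in range(a.ndim):
        idx = [0] * a.ndim
        idx[j] = slice(None)
        cols.append(a[tuple(idx)])
    return np.stack(cols, axis=1)

b_loop = axis_lines(a)

# In C order, moving one step along dimension j advances n**(d-1-j) flat positions.
steps = n ** np.arange(d)[::-1]
b_flat = a.ravel()[np.multiply.outer(np.arange(n), steps)]
```

Both results have shape (n, d), and b_loop[i, j] equals a indexed with i in dimension j and zeros elsewhere.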

python identity matrix for loop

I am writing a program that creates a square matrix of any size whose diagonal values are 1 and whose remaining values are 0.
matrix = []
dimension = int(input("Enter matrix unit size:"))
for i in range(0, dimension):
    for j in range(0, dimension):
        if i == j:
            matrix.append(1)
        else:
            matrix.append(0)
print(matrix)
I need a nested matrix like [[],[],[]]. How can I get that?
matrix[[i]].append(1) doesn't work.
You need to append a new row to matrix before entering the for j loop, and then add each element to that row rather than to matrix itself.
matrix = []
dimension = int(input("Enter identity matrix size:"))
for i in range(0, dimension):
    row = []
    matrix.append(row)
    for j in range(0, dimension):
        if i == j:
            row.append(1)
        else:
            row.append(0)
print(matrix)
You could let
matrix = [[1 if i == j else 0 for i in range(dimension)] for j in range(dimension)]
Note, though, that any sort of linear algebra will be much more conveniently carried out in NumPy/SciPy. In NumPy, for instance, the identity matrix would be produced with numpy.eye through
import numpy as np
np.eye(dimension)
and in SciPy, using scipy.sparse.identity,
from scipy.sparse import identity
identity(dimension)
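A short sketch comparing the three constructions for a fixed size; dimension = 3 is hard-coded here instead of read from input, and dtype=int and format="csr" are choices made only so that the results compare cleanly.

```python
import numpy as np
from scipy.sparse import identity

dimension = 3  # fixed here for illustration instead of calling input()

# Pure Python: a list of rows, i.e. the [[], [], []] layout asked for.
nested = [[1 if i == j else 0 for i in range(dimension)] for j in range(dimension)]

# NumPy: dense 2-D array with ones on the diagonal.
dense = np.eye(dimension, dtype=int)

# SciPy: sparse identity; toarray() converts it to a dense array when needed.
sparse = identity(dimension, dtype=int, format="csr")
```

The nested list is enough for printing, while the NumPy and SciPy versions are the ones to use for actual linear algebra.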
import numpy as np
matrix = []
dimension = int (input ("Enter matrix unit size:"))
for i in range (0, dimension):
    for j in range (0, dimension):
        if i == j:
            matrix.append (1)
        else:
            matrix.append (0)
npmatrix = np.array(matrix)
npmatrix = npmatrix.reshape(dimension,dimension)
print(npmatrix)

How to concatenate vectors to create an array in a loop?

I have a for loop that generates a new vector of shape (100,) in each iteration. The loop looks like this:
for i in range (10):
    for j in range (4):
        #Create a new vector (100,)
    #Concatenate 4 vectors together to make (400,) #400=4*100
#Append the concatenated (400,) vectors vertically to make a (10,400) array
I expect this to produce a matrix of shape (10,400) built by concatenating the vectors from these loops.
Currently, my solution is
matrix_= np.empty([10,400])
for i in range (10):
    vector_horz=[]
    for j in range (4):
        #Create a new vector (100,)
        vector_rnd=#Random make a vector/list with size of (100,1)
        #Concatenate 4 vectors together to make (400,) #400=4*100
        vector_horz.append(vector_rnd)
    #Append the concatenated (400,) vectors vertically to make a (10,400) array
    matrix_(:,i)=vector_horz
However, I get an error saying that vector_horz cannot be assigned to the matrix because the sizes do not match. Could you suggest another solution?
Option 1
(Recommended) First generate your data and create an array at the end:
data = []
for i in range(10):
    for j in range(4):
        temp_list = ... # random vector of 100 elements
        data.extend(temp_list)
arr = np.reshape(data, (10, 400))
Option 2
Alternatively, initialise an empty array with np.empty and assign one slice at a time:
arr = np.empty((10, 400))
for i in range(10):
    for j in range(4):
        temp_list = ...
        arr[i, j * 100 : (j + 1) * 100] = temp_list
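A runnable sketch of both options. The helper make_vector is a hypothetical stand-in for whatever produces the (100,) vector in each iteration (here just random numbers), and the generator is re-seeded so that the two variants can be compared.

```python
import numpy as np

def make_vector(rng):
    # Hypothetical stand-in for the real per-iteration vector of shape (100,).
    return rng.random(100)

# Variant of option 1: collect the data first, build the array at the end.
rng = np.random.default_rng(0)
rows = []
for i in range(10):
    parts = [make_vector(rng) for j in range(4)]
    rows.append(np.concatenate(parts))   # four (100,) vectors -> one (400,) row
arr_stacked = np.vstack(rows)            # ten (400,) rows -> (10, 400)

# Option 2: preallocate and fill one 100-element slice at a time.
rng = np.random.default_rng(0)
arr_sliced = np.empty((10, 400))
for i in range(10):
    for j in range(4):
        arr_sliced[i, j * 100:(j + 1) * 100] = make_vector(rng)
```

Both loops fill the array row by row, so with the same seed the two results are identical.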

Split a 2d array creating array from row to row values with unitary displacement

I want to split a 2D array this way:
Example:
From this 4x4 2D array:
np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
Create these five 2x2 2D arrays, with unitary displacement (shift):
np.array([[1,2],[3,4]])
np.array([[4,5],[6,7]])
np.array([[7,8],[9,10]])
np.array([[10,11],[12,13]])
np.array([[13,14],[15,16]])
In the general case, from an NxN 2D array (square), create Y 2D arrays of shape MxM, as many as possible.
To be more precise: an output array is not necessarily made of all the values from a row.
Example:
From a 2D 8x8 array with values from 1 to 64, split into 2x2 2D arrays: the first row of the 8x8 array runs from 1 to 8, the first output 2x2 array will be np.array([[1,2],[3,4]]), the second will be np.array([[4,5],[6,7]]), and so on until the last output array, np.array([[61,62],[63,64]]). Note that each 2x2 array is not filled with all the values from a row (this is intended), and that there is a unitary displacement (shift) from each array to the next.
Is there a NumPy method that does this?
MSeifert answered a question here (How to split an 2D array, creating arrays from "row to row" values) that solves about 95% of this one, everything except the unitary displacement (shift) part.
So, from the 4x4 2D array example:
np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
Instead of creating these FOUR 2x2 2D arrays (without unitary shift/displacement):
np.array([[1,2],[3,4]])
np.array([[5,6],[7,8]])
np.array([[9,10],[11,12]])
np.array([[13,14],[15,16]])
Create these FIVE 2x2 2D arrays (with unitary shift/displacement):
np.array([[1,2],[3,4]])
np.array([[4,5],[6,7]])
np.array([[7,8],[9,10]])
np.array([[10,11],[12,13]])
np.array([[13,14],[15,16]])
And, of course, it should work for the general case: given a square NxN 2D array, create Y square MxM 2D arrays.
Example: from a 60x60 square 2D array, create Y square MxM 2D arrays (10x10, for example).
Plus: I need to know the rule that relates the number of points in the original square 2D array (4x4 in the example) to the number of points in the small square 2D arrays (2x2 in the example). In this case, 16 points (a 4x4 2D array) allow five 2x2 2D arrays (each with 4 points).
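Each MxM subarray covers M*M consecutive elements of the flattened array and shares its last element with the next one, so consecutive subarrays start M*M-1 elements apart. The condition for the subarrays to fit exactly is therefore that (M+1)*(M-1) divides (N+1)*(N-1); the number of subarrays that fit is the quotient of these two numbers. Note that these numbers are equal to M*M-1 and N*N-1. In this form the rule also applies to non-square matrices.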
Examples
M    N    M*M-1    N*N-1    Y
------------------------------
3    5        8       24    3
3    7        8       48    6
5    7       24       48    2
4   11       15      120    8
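The rule can be checked with a few lines of arithmetic. This is a minimal sketch for the square case only; the function name count_subarrays is made up for illustration.

```python
def count_subarrays(n, m):
    """Return (Y, exact): how many MxM subarrays fit into an NxN array when
    consecutive subarrays overlap by one element, and whether they fit
    without leftover elements."""
    y, leftover = divmod(n * n - 1, m * m - 1)
    return y, leftover == 0

# Rows of the table above, plus the 4x4 -> 2x2 case from the question.
for m, n in [(3, 5), (3, 7), (5, 7), (4, 11), (2, 4)]:
    y, exact = count_subarrays(n, m)
    print(f"M={m} N={n} Y={y} exact={exact}")
```

For the question's example, (4*4-1) // (2*2-1) gives the five 2x2 arrays that were asked for.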
Implementation: Please note that this returns overlapping views into the original array. If you want to modify them, you may want to make a copy. Also note that this implementation fits as many subsquares as possible; any leftover elements in the larger matrix are dropped.
Update: I've added two functions which, given N, calculate all possible M and vice versa.
Output:
# Testing predictions ...
# ok
# M = 105
# solutions: [105, 1273, 1377, 4135, 4239, 5407, 5511, 5513]
# this pattern repeats at offsets 5512, 11024, 16536, ...
# N = 1000001
# solutions: [2, 3, 4, 5, 7, 9, 11, 31, 49, 1000001]
# example N, M = (5, 3)
# [[[ 0  1  2]
#   [ 3  4  5]
#   [ 6  7  8]]
#  [[ 8  9 10]
#   [11 12 13]
#   [14 15 16]]
#  [[16 17 18]
#   [19 20 21]
#   [22 23 24]]]
Code:
import numpy as np
import sympy
import itertools as it
import functools as ft
import operator as op
def get_subsquares(SqNN, M0, M1=None):
    M1 = M0 if M1 is None else M1
    N0, N1 = SqNN.shape
    K = (N0*N1-1) // (M0*M1-1)
    SqNN = SqNN.ravel()
    s, = SqNN.strides
    return np.lib.stride_tricks.as_strided(SqNN, (K, M0, M1),
                                           (s*(M0*M1-1), s*M1, s))
def get_M_for_N(N):
    """Given N return all possible M
    """
    assert N >= 2
    f = 1 + (N & 1)
    factors = sympy.factorint((N+1)//f)
    factors.update(sympy.factorint((N-1)//f))
    if f == 2:
        factors[2] += 2
    factors = [ft.reduce(op.mul, fs) for fs in it.product(
        *([a**k for k in range(n+1)] for a, n in factors.items()))]
    return [fs + 1 for fs in sorted(set(factors) & set(fs - 2 for fs in factors)) if (N*N-1) % (fs * (fs+2)) == 0]
def get_N_for_M(M):
    """Given M return all possible N in the form period, smallest
    smallest is a list of all solutions between M and M*M if M is even
    and between M and (M*M+1) / 2 if M is odd, all other solutions can be
    obtained by adding multiples of period
    """
    assert M >= 2
    f = 1 + (M & 1)
    factors = sympy.factorint((M+1)//f)
    factors.update(sympy.factorint((M-1)//f))
    factors = [k**v for k, v in factors.items()]
    rep = (M+1)*(M-1) // f
    f0 = [ft.reduce(op.mul, fs) for fs in it.product(*zip(it.repeat(1), factors))]
    f1 = [rep // (f*a) for a in f0]
    inv = [f if b==1 else f*b + 2 if a==1 else 2 * sympy.mod_inverse(a, b)
           for a, b in zip(f1, f0)]
    if f==1:
        inv[1:-1] = [a%b for a, b in zip(inv[1:-1], f0[1:-1])]
    return rep, sorted(a*b - 1 for a, b in zip(f1, inv))
def test_predict(N):
    def brute_force(a, b):
        return [i for i in range(a, b) if (i*i-1) % (a*a-1) == 0]
    for x in range(2, N+1):
        period, pred = get_N_for_M(x)
        assert brute_force(x, period*4+2) \
            == [a + b for b in range(0, 4*period, period) for a in pred]
    def brute_force(b):
        return [i for i in range(2, b+1) if (b*b-1) % (i*i-1) == 0]
    for x in range(2, N+1):
        assert brute_force(x) == get_M_for_N(x)
    print('ok')
# test
print("Testing predictions ...")
test_predict(200)
print()
# examples
M = 105
period, pred = get_N_for_M(M)
print(f"M = {M}")
print(f"solutions: {pred}")
print(f"this pattern repeats at offsets {period}, {2*period}, {3*period}, ...")
print()
N = 1000001
pred = get_M_for_N(N)
print(f"N = {N}")
print(f"solutions: {pred}")
print()
N, M = 5, 3
SqNN = np.arange(N*N).reshape(N, N)
print(f"example N, M = ({N}, {M})")
print(get_subsquares(SqNN, M))
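If independent copies are preferred over overlapping views, the same blocks can be gathered with plain integer indexing instead of as_strided. This is an alternative sketch, not the implementation above; the name get_subsquares_copy is made up, and it handles square MxM blocks only.

```python
import numpy as np

def get_subsquares_copy(arr, m):
    """Return a (K, m, m) array of copies; consecutive blocks start
    m*m - 1 elements apart in the flattened input."""
    flat = arr.ravel()
    step = m * m - 1
    k = (flat.size - 1) // step
    # Row r of idx lists the m*m flat positions covered by block r.
    idx = step * np.arange(k)[:, None] + np.arange(m * m)
    return flat[idx].reshape(k, m, m)

grid = np.arange(1, 17).reshape(4, 4)
blocks = get_subsquares_copy(grid, 2)
```

On the 4x4 example from the question this yields the five 2x2 arrays listed there, and writing into blocks leaves grid unchanged.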

Filter array, store adjacency information

Let's say I have a 2D array of shape (N, N):
import numpy as np
my_array = np.random.random((N, N))
Now I want to do some computations only on some "cells" of this array, for instance the ones inside the central part of the array. To avoid doing computations on cells I'm not interested in, what I usually do here is create a Boolean mask, in this spirit:
my_mask = np.zeros_like(my_array, bool)
my_mask[40:61,40:61] = True
my_array[my_mask] = some_twisted_computations(my_array[my_mask])
But what if some_twisted_computations() involves values of the neighboring cells if they are inside the mask? Performance-wise, would it be a good idea to create an "adjacency array" with a (len(my_mask), 4) shape, storing the index of 4-connected neighbor cells in the flat my_array[mask] array that I will use in some_twisted_computations()? If yes, what are the efficient options for computing such an adjacency array? Should I switch to a lower-level language or other data structures?
My real-world arrays have shapes of around (1000,1000,1000); the mask covers only a small subset (~100000) of these values and has a rather complex geometry. I hope my questions make sense...
EDIT: the very dirty and slow solution I've worked out:
wall = mask
i = 0
top_neighbors = []
down_neighbors = []
left_neighbors = []
right_neighbors = []
indices = []
for index, val in np.ndenumerate(wall):
    if not val:
        continue
    indices += [index]
    if wall[index[0] + 1, index[1]]:
        down_neighbors += [(index[0] + 1, index[1])]
    else:
        down_neighbors += [i]
    if wall[index[0] - 1, index[1]]:
        top_neighbors += [(index[0] - 1, index[1])]
    else:
        top_neighbors += [i]
    if wall[index[0], index[1] - 1]:
        left_neighbors += [(index[0], index[1] - 1)]
    else:
        left_neighbors += [i]
    if wall[index[0], index[1] + 1]:
        right_neighbors += [(index[0], index[1] + 1)]
    else:
        right_neighbors += [i]
    i += 1
top_neighbors = [i if type(i) is int else indices.index(i) for i in top_neighbors]
down_neighbors = [i if type(i) is int else indices.index(i) for i in down_neighbors]
left_neighbors = [i if type(i) is int else indices.index(i) for i in left_neighbors]
right_neighbors = [i if type(i) is int else indices.index(i) for i in right_neighbors]
The best answer will probably depend on the nature of the computations you want to do. For example, if they can be expressed as summations over neighboring pixels, then something like scipy.ndimage.convolve or scipy.signal.fftconvolve can be a really nice solution (np.convolve handles only 1-D input).
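As an illustration of the convolution idea, here is a minimal sketch that sums the four direct neighbors of every cell with scipy.ndimage.convolve and then reads off the masked cells. The kernel, the zero padding at the border (mode="constant") and the small 5x5 test array are assumptions chosen for the example.

```python
import numpy as np
from scipy import ndimage

x = np.arange(25, dtype=float).reshape(5, 5)

# 4-connected neighborhood: up, down, left, right (the center is excluded).
kernel = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]], dtype=float)

# Cells outside the array count as 0 (an assumption made for this sketch).
neighbor_sum = ndimage.convolve(x, kernel, mode="constant", cval=0.0)

# Keep only the cells of interest.
mask = np.zeros_like(x, dtype=bool)
mask[1:4, 1:4] = True
result = neighbor_sum[mask]
```

Note that this computes the sum for the whole array and uses neighbors outside the mask too, so for a small mask inside a very large array an index-based approach may be the better fit.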
For your specific question of efficiently generating arrays of neighbor indices, you might try something like this:
x = np.random.rand(100, 100)
mask = x > 0.9
i, j = np.where(mask)
i_neighbors = i[:, np.newaxis] + [0, 0, -1, 1]
j_neighbors = j[:, np.newaxis] + [-1, 1, 0, 0]
# need to do something with the edge cases
# the best choice will depend on your application
# here we'll change out-of-bounds neighbors to the
# central point itself.
i_neighbors = np.clip(i_neighbors, 0, 99)
j_neighbors = np.clip(j_neighbors, 0, 99)
# compute some vectorized result over the neighbors
# as a concrete example, here we'll do a standard deviation
result = x[i_neighbors, j_neighbors].std(axis=1)
The result is an array of values corresponding to the masked region, containing the standard deviation of neighboring values.
Hopefully that approach will work for whatever specific problem you have in mind!
Edit: given the edited question above, here's how my response can be adapted to generate arrays of indices in a vectorized manner:
x = np.random.rand(100, 100)
mask = x > 0.9
i, j = np.where(mask)
i_neighbors = i[:, np.newaxis] + [0, 0, -1, 1]
j_neighbors = j[:, np.newaxis] + [-1, 1, 0, 0]
i_neighbors = np.clip(i_neighbors, 0, 99)
j_neighbors = np.clip(j_neighbors, 0, 99)
indices = np.zeros(x.shape, dtype=int)
indices[mask] = np.arange(len(i))
neighbor_in_mask = mask[i_neighbors, j_neighbors]
neighbors = np.where(neighbor_in_mask,
                     indices[i_neighbors, j_neighbors],
                     np.arange(len(i))[:, None])
left_indices, right_indices, top_indices, bottom_indices = neighbors.T
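The same steps can be wrapped in a small function and checked on a mask that is easy to verify by hand. This is a sketch for the 2D case; the function name neighbor_indices is made up, and the clipping bounds are taken from mask.shape rather than hard-coded.

```python
import numpy as np

def neighbor_indices(mask):
    """For each True cell, return the positions (within the flattened masked
    array) of its left, right, top and bottom neighbors. A neighbor outside
    the mask or outside the array is replaced by the cell's own position."""
    i, j = np.nonzero(mask)
    i_nb = np.clip(i[:, None] + [0, 0, -1, 1], 0, mask.shape[0] - 1)
    j_nb = np.clip(j[:, None] + [-1, 1, 0, 0], 0, mask.shape[1] - 1)
    # Map each masked cell to its position in x[mask].
    position = np.zeros(mask.shape, dtype=int)
    position[mask] = np.arange(len(i))
    own = np.arange(len(i))[:, None]
    return np.where(mask[i_nb, j_nb], position[i_nb, j_nb], own)

# Plus-shaped mask: masked cells are numbered 0..4 in row-major order.
mask = np.array([[False, True, False],
                 [True,  True, True],
                 [False, True, False]])
neighbors = neighbor_indices(mask)
```

The center cell (position 2) gets its four real neighbors, while the top cell (position 0) only has a bottom neighbor and points to itself in the other three columns.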
