broadcast intersection on a numpy array - python

I am trying to generate a large 2D numpy NxN array (larr) where each cell contains the intersection (c) between lists (a or b) of string elements (director names) belonging both to the unit represented by the row (company i) and the unit represented by the column (company j). The lists (a and b) are taken from another array (marray) where companies are identified by an integer between 1 and N in the column 'nfirm'. I am not interested in the diagonal of the matrix (I substitute NaN).
I came up with the following nested loop, but it is very slow and memory-hungry. I was wondering whether it is possible to do this more efficiently by broadcasting the intersection operation. Any tips to improve it would be much appreciated. Thanks!
larr = np.empty(shape=(N,N), dtype=object)
for i in range(1,N):
    for j in range(1,N):
        a= marray['listdir'][marray['nfirm']==i].tolist()
        b= marray['listdir'][marray['nfirm']==j].tolist()
        c=np.intersect1d(a,b)
        if (len(c)>0 and (i!=j)):
            larr[i,j]=c
        else:
            larr[i,j]='NaN'
        del a, b, c

Minor improvement: a is recomputed for every j but does not depend on j, so it can be computed once per i:
larr = np.empty(shape=(N, N), dtype=object)
for i in range(1, N):
    a = marray['listdir'][marray['nfirm']==i].tolist()
    for j in range(1, N):
        b = marray['listdir'][marray['nfirm']==j].tolist()
        c = np.intersect1d(a,b)
        if len(c) > 0 and i != j:
            larr[i, j] = c
        else:
            larr[i, j] = 'NaN'
        del b
    del a
del c

So if I understand correctly,
marray['listdir'] is a list of strings representing directors, indexed by movie.
marray['nfirm'] is a list of integers representing companies, also indexed by movie.
You want to create a matrix of the directors shared by each pair of companies.
To be more efficient, you can first build a list mapping each company to its movies, then build the matrix:
firm_movies = [[] for _ in range(N)]
for i, m in enumerate(marray['nfirm']):
    # nfirm runs from 1 to N, so shift to a 0-based position
    firm_movies[m - 1].append(i)
larr = np.empty(shape=(N, N), dtype=object)
for i in range(N):
    larr[i, i] = 'NaN'
    # the matrix is symmetric, so only compute the upper triangle
    for j in range(i+1, N):
        a = marray['listdir'][firm_movies[i]]
        b = marray['listdir'][firm_movies[j]]
        c = np.intersect1d(a,b)
        larr[i, j] = larr[j, i] = c if len(c)>0 else 'NaN'
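A variation on the same idea, offered only as a rough sketch: collect one Python set of directors per company up front, so each pairwise intersection becomes a set operation and nothing is re-filtered inside the loops. The function name shared_directors and its arguments are hypothetical; it assumes nfirm and listdir are parallel sequences and that company ids run from 1 to n_firms. It stores a real np.nan rather than the string 'NaN'.

```python
import numpy as np

def shared_directors(nfirm, listdir, n_firms):
    """Return an (n_firms, n_firms) object array of shared-director lists.

    Hypothetical helper: assumes company ids in nfirm run from 1 to n_firms.
    """
    # One set of director names per company, built in a single pass.
    directors = [set() for _ in range(n_firms)]
    for firm, name in zip(nfirm, listdir):
        directors[firm - 1].add(name)

    # Start with NaN everywhere; the diagonal and empty pairs stay NaN.
    out = np.full((n_firms, n_firms), np.nan, dtype=object)
    for i in range(n_firms):
        for j in range(i + 1, n_firms):
            common = directors[i] & directors[j]
            if common:
                # The matrix is symmetric, so fill both halves at once.
                out[i, j] = out[j, i] = sorted(common)
    return out
```

The number of pairs is still quadratic in N, so this only reduces the cost of each cell; it does not remove the double loop.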

Related

Issues with generating the following block matrix using horizontal stacking and vertical stacking

I am trying to generate the following block matrix consisting of submatrices A and B, and N is a positive integer. So far, my code is as follows:
C_lower = B
for j in range(0,N):
    for i in range(0,N-j):
        col = np.linalg.matrix_power(A,i) @ B
        C = np.hstack(np.vstack((C_lower,col)))
However, it seems like my code is not working because the loop continues forever. Any suggestions?
Similarly, I'm also having issues with constructing the following block diagonal matrices:
I tried using block_diag from scipy, but I could not find a way to repeat Q N times (N = 50 in my case). I had to write block_diag(Q,Q,Q,Q,Q,Q,Q.......) by hand in order to get the block diagonal matrix I want.
Here's the answer to your first question. There are a number of issues in your code. This is a better way of achieving what you want:
C = np.zeros((N, N, A.shape[0], B.shape[1]))
for i in range(N):
    for j in range(i + 1):
        C[i, j] = np.linalg.matrix_power(A, i - j) @ B
Similarly for your second question:
Q_ = np.zeros((N, N, *Q.shape))
for i in range(N):
    Q_[i, i] = Q
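The arrays above are four-dimensional (a grid of blocks). If a flat 2D matrix is needed instead, something along these lines might work. It is a sketch under the assumption that the target is the block lower-triangular matrix whose (i, j) block is A^(i-j) B, since the picture from the question is not reproduced here; both function names are made up for illustration.

```python
import numpy as np

def block_lower_triangular(A, B, N):
    """Assemble the 2D matrix whose (i, j) block is A^(i-j) @ B for j <= i, else zeros."""
    # Each distinct block is computed once and reused along its diagonal.
    powers = [np.linalg.matrix_power(A, p) @ B for p in range(N)]
    zero = np.zeros_like(powers[0])
    return np.block([[powers[i - j] if j <= i else zero for j in range(N)]
                     for i in range(N)])

def repeat_block_diag(Q, N):
    """Place N copies of Q along the diagonal via a Kronecker product with the identity."""
    return np.kron(np.eye(N), Q)
```

With SciPy available, scipy.linalg.block_diag(*[Q] * N) should give the same block diagonal matrix without typing Q fifty times, because block_diag accepts any number of positional arrays.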

Python: einsum inside for loop

Suppose A and B are two 4 dimensional numpy arrays with the same dimension.
A = np.random.rand(5,5,2,10)
B = np.random.rand(5,5,2,10)
a, b, c, d = A.shape
dat = []
for k in range(d):
    sum = 0
    for l in range(c):
        sum = sum + np.einsum('ij,ji->', A[:,:,l,k], B[:,:,l,k])
    dat.append(sum)
I was wondering whether I can use einsum to replace the inner for loop, maybe even the outer for loop, or maybe some matrix manipulation to replace all of it, because the data set is large.
Is there any faster way to achieve this?
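One possible way, sketched here rather than benchmarked: both loops can be folded into a single einsum call by keeping the last axis and summing over the other three. The subscript string 'ijlk,jilk->k' multiplies A[i, j, l, k] by B[j, i, l, k], which is the same pairing as 'ij,ji->' on each slice. The function names are placeholders for this example.

```python
import numpy as np

def trace_products(A, B):
    """For each k, sum over l of trace(A[:, :, l, k] @ B[:, :, l, k])."""
    # i, j, l are summed away; only the last axis k survives.
    return np.einsum('ijlk,jilk->k', A, B)

def trace_products_loop(A, B):
    """Reference version that mirrors the nested loops from the question."""
    _, _, c, d = A.shape
    dat = []
    for k in range(d):
        total = 0.0
        for l in range(c):
            total += np.einsum('ij,ji->', A[:, :, l, k], B[:, :, l, k])
        dat.append(total)
    return np.array(dat)
```

The result is a 1D array of length d rather than a Python list. Whether it is actually faster on a large data set would need to be measured.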

Checking triangle inequality in a massive numpy matrix

I have a symmetric NumPy matrix D of non-negative floating point numbers. A number in the ith row and jth column represents the distance between objects i and j, whatever they are. The matrix is large (~10,000 rows/columns). I would like to check if all the distances in the matrix obey the triangle inequality, that is: D[i,j]<=D[i,k]+D[k,j] for all i, j, k.
The problem can be solved, quite inefficiently, by using a triple-nested loop. But is there a faster, vectorized solution?
You can certainly vectorise the innermost loop easily enough with (untested):
for i in range(N):
    for j in range(i):
        assert all(D[i,j] <= D[i,:] + D[:,j])
For double vectorisation you can loop through k (also untested):
for k in range(N):
    row = D[k,:].reshape(1, N)
    col = D[:,k].reshape(N, 1)
    assert (D <= row + col).all()
(row + col generates a square matrix the same size as D)
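The loop over k can be wrapped in a small function that returns a boolean instead of asserting. This is only a sketch: the function name and the tol parameter are additions for illustration, with tol there because floating point distances may violate the inequality by rounding error alone.

```python
import numpy as np

def satisfies_triangle_inequality(D, tol=1e-12):
    """Return True if D[i, j] <= D[i, k] + D[k, j] (within tol) for all i, j, k."""
    D = np.asarray(D, dtype=float)
    for k in range(D.shape[0]):
        # Broadcasting a column against a row gives an N x N matrix
        # whose (i, j) entry is D[i, k] + D[k, j].
        via_k = D[:, k, None] + D[None, k, :]
        if np.any(D > via_k + tol):
            return False
    return True
```

Each iteration allocates one temporary the size of D, so memory stays at a few N x N arrays while the total work remains cubic in N.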

Matrix multiplication explanation

When we multiply two matrices A of size m x k and B of size k x n we use the following code:
#for resultant matrix rows
for i in range(m):
    #for resultant matrix column
    for j in range(n):
        for l in range(k):
            #A's row x B's columns
            c[i][j]=c[i][j]+a[i][l]*b[l][j]
Are the comments in my code the right explanation of the loops? Is there a better explanation of the loops, or a better thought process for coding matrix multiplication?
EDIT1: I am not looking for better code. My question is about the thought process involved in turning the math of matrix multiplication into code.
Your code is correct, but if you want to add more detailed comments like you ask for, you can do so:
#for resultant matrix rows
for i in range(m):
    #for resultant matrix column
    for j in range(n):
        #for each entry in resultant matrix we have k entries to sum
        for l in range(k):
            #where each i, j entry in the result matrix is given by multiplying the
            #entries A[i][l] (across row i of A) by the entries B[l][j] (down
            #column j of B), for l = 1, 2, ..., k, and summing the results over l:
            c[i][j]=c[i][j]+a[i][l]*b[l][j]
EDIT: if you want a better explanation of the loop or thought process, then take out the "#A's row x B's columns" comment and replace it with "where each i, j entry in the result matrix is given by multiplying the entries A[i][l] (across row i of A) by the entries B[l][j] (down column j of B), for l = 1, 2, ..., k, and summing the results over l". Also, don't use l as an iterator name; it looks like a 1.
You can use the numpy.dot function (see its documentation). Example (taken from the documentation):
>>> a = [[1, 0], [0, 1]]
>>> b = [[4, 1], [2, 2]]
>>> np.dot(a, b)
array([[4, 1],
       [2, 2]])
The condition that must always hold in order to multiply two matrices is that the first matrix has as many columns as the second matrix has rows.
So if matrix_1 is m x n, then matrix_2 should be n x p. The result of the two will have dimension m x p.
The pseudocode will be:
multiplyMatrix(matrix1, matrix2)
  -- Multiplies a row and a column element by element and sums the products
  multiplyRowAndColumn(row, column) returns number
  var
    total: number
  begin
    total = 0
    for each rval in row and cval in column
    begin
      total += rval*cval
    end
    return total
  end
begin
  -- If the columns of matrix1 don't match the rows of matrix2 then the function fails
  if matrix1:n != matrix2:m return failure;
  newmat = new matrix(matrix1:m, matrix2:n) -- rows of matrix1 x columns of matrix2
  for each r in matrix1:rows and c in matrix2:columns
  begin
    newmat[index of r][index of c] = multiplyRowAndColumn(r, c)
  end
  return newmat
end
In Python you can either do what you did, or use the ijk-algo, ikj-algo, psyco ikj-algo, NumPy, or SciPy to accomplish this. It appears that NumPy is the fastest and most efficient.
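The names ijk-algo and ikj-algo presumably refer to the order in which the three loops are nested; that reading is an assumption, as are the function names below. A plain-Python sketch of both orderings, which compute the same result and differ only in how they walk through memory:

```python
def matmul_ijk(a, b):
    """Row of a times column of b, one output entry at a time."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c

def matmul_ikj(a, b):
    """Same sums, but the innermost loop runs along a row of b and a row of c."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        row_c = c[i]
        for p in range(k):
            a_ip = a[i][p]
            row_b = b[p]
            for j in range(n):
                row_c[j] += a_ip * row_b[j]
    return c
```

In the ikj ordering the inner loop reads consecutive elements of one row of b instead of jumping down a column, which is usually the reason given for preferring it.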
Your code looks right, and your comments look correct too.

Python 3: Multiply a vector by a matrix without NumPy

I'm fairly new to Python and trying to create a function to multiply a vector by a matrix (of any column size).
e.g.:
multiply([1,0,0,1,0,0], [[0,1],[1,1],[1,0],[1,0],[1,1],[0,1]])
[1, 1]
Here is my code:
def multiply(v, G):
    result = []
    total = 0
    for i in range(len(G)):
        r = G[i]
        for j in range(len(v)):
            total += r[j] * v[j]
        result.append(total)
    return result
The problem is that when I try to select the first row of each column in the matrix (r[j]) the error 'list index out of range' is shown. Is there any other way of completing the multiplication without using NumPy?
The Numpythonic approach: (using numpy.dot in order to get the dot product of two matrices)
In [1]: import numpy as np
In [3]: np.dot([1,0,0,1,0,0], [[0,1],[1,1],[1,0],[1,0],[1,1],[0,1]])
Out[3]: array([1, 1])
The Pythonic approach:
Your inner for loop runs over range(len(v)), but you use j to index r, which is a row of G with only two elements, so you get an IndexError. A more Pythonic way is to use the zip function to get the columns of the matrix, then use starmap and mul within a list comprehension:
In [13]: first,second=[1,0,0,1,0,0], [[0,1],[1,1],[1,0],[1,0],[1,1],[0,1]]
In [14]: from itertools import starmap
In [15]: from operator import mul
In [16]: [sum(starmap(mul, zip(first, col))) for col in zip(*second)]
Out[16]: [1, 1]
I think the problem with your code was that you loop through the rows of the matrix rather than by the columns. Also you don't reset your 'total' variable after each vector*matrix column calculation. This is what you want:
def multiply(v, G):
    result = []
    for i in range(len(G[0])): #this loops through columns of the matrix
        total = 0
        for j in range(len(v)): #this loops through vector coordinates & rows of matrix
            total += v[j] * G[j][i]
        result.append(total)
    return result
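Another way to look at the same product, given as a sketch: the result is the sum of the rows of G, each scaled by the matching entry of v. That lets the code walk over the rows directly with zip, so no index can run past the end of a row. The name multiply_by_rows and the length check are additions for illustration.

```python
def multiply_by_rows(v, G):
    """Vector-matrix product computed as a weighted sum of the rows of G."""
    if len(v) != len(G):
        raise ValueError("len(v) must equal the number of rows of G")
    result = [0] * len(G[0])
    # Pair each vector entry with its matrix row and accumulate the scaled row.
    for coeff, row in zip(v, G):
        for col, entry in enumerate(row):
            result[col] += coeff * entry
    return result
```

Called with the vector and matrix from the question, this returns [1, 1].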
I have attached code for matrix multiplication. Follow the example format for one-dimensional multiplication (lists of lists):
def MM(a,b):
    c = []
    for i in range(0,len(a)):
        temp=[]
        for j in range(0,len(b[0])):
            s = 0
            for k in range(0,len(a[0])):
                s += a[i][k]*b[k][j]
            temp.append(s)
        c.append(temp)
    return c
a=[[1,2]]
b=[[1],[2]]
print(MM(a,b))
The result is [[5]]
r is an element from G so it's a row which only has two elements. That means you can't use index j to get a value from r because j goes from 0 till the length of v, which is 6 in your example.
I needed a solution where the first matrix could be 2-dimensional. This extends the solution from @Kasramvd to accept a two-dimensional first matrix. Posted here for reference:
>>> first,second=[[1,0,0,1,0,0],[0,1,1,1,0,0]], [[0,1],[1,1],[1,0],[1,0],[1,1],[0,1]]
>>> from itertools import starmap
>>> from operator import mul
>>> [[sum(starmap(mul, zip(row, col))) for col in zip(*second)] for row in first]
[[1, 1], [3, 1]]
# check matrices
A = [[1,2],[3,4]]
B = [[1,4],[5,6],[7,8],[9,6]]
def custom_mm(A,B):
    # matrix multiplication is valid only if A is n x m and B is m x y
    if len(A[0]) == len(B):
        result = []  # final matrix
        for i in range(0,len(A)):  # loop through each row of the first matrix
            temp = []  # holds one row of the output; it gets one element per column of the second matrix
            for j in range(0,len(B[0])):  # loop through each column of the second matrix
                total = 0
                l = 0  # dummy index to step through the rows of the second matrix
                for k in range(0,len(A[0])):
                    total += A[i][k]*B[l][j]
                    l = l+1
                temp.append(total)
            result.append(temp)
        return result
    else:
        return (print("not possible"))
print(custom_mm(A,B))
Here is some code that helps you multiply two matrices:
A=[[1,2,3],[4,5,6],[7,8,9]]
B=[[1,2,3],[4,5,6],[7,8,9]]
matrix=[]
def multiplicationLineColumn(line,column):
    try:
        sizeLine=len(line)
        sizeColumn=len(column)
        if(sizeLine!=sizeColumn):
            raise ValueError("Exception")
        res = sum([line[i] * column[i] for i in range(sizeLine)])
        return res
    except ValueError:
        print("should have the same len line & column")
def getColumn(matrix,numColumn):
    size=len(matrix)
    column= [matrix[i][numColumn] for i in range(size)]
    return column
def getLine(matrix,numLine):
    line = matrix[numLine]
    return line
for i in range(len(A)):
    matrix.append([])
    for j in range(len(B[0])):  # one entry per column of B
        matrix[i].append(multiplicationLineColumn(getLine(A,i),getColumn(B,j)))
print(matrix)
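A more compact variant of the same row-times-column idea, given as a sketch only: zip(*b) yields the columns of b, and a shape mismatch raises ValueError to the caller instead of being printed. The function name matmul is chosen here for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    # Every row of a must have as many entries as b has rows.
    if not a or not b or any(len(row) != len(b) for row in a):
        raise ValueError("columns of a must match rows of b")
    columns = list(zip(*b))  # transpose b once so columns can be reused
    return [[sum(x * y for x, y in zip(row, col)) for col in columns]
            for row in a]
```

For the 3 x 3 matrices A and B above, matmul(A, B) gives [[30, 36, 42], [66, 81, 96], [102, 126, 150]].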
