Producing slices of dot output without intermediates

I have a large set of 3 x 3 matrices (n of them, say) and corresponding 3 x 1 vectors, and would like to multiply each vector by its corresponding matrix. If I stack the matrices into an n x 3 x 3 ndarray called R and the vectors into a 3 x n ndarray called v, I can obtain the stack of multiplied vectors via:
import numpy as np
intermediate = np.dot(R, v)
out = np.diagonal(intermediate, axis1=0, axis2=2)
But this is very inefficient: np.dot produces the n x 3 x n intermediate array, out of which I then manually select a 3 x n slice. Other than by looping over n, can I somehow produce the 3 x n array without making the intermediate n x 3 x n array?

Expanding on the hint provided by @hpaulj: the multiplication I described can be carried out by:
out = np.einsum('ijk,ki->ji', R, v)
The speedup over the approach in my question is already 3 orders of magnitude (!) for n = 1000:
%timeit d = np.diagonal(np.dot(R, v), axis1=0, axis2=2)
10 loops, best of 3: 27.8 ms per loop
%timeit o = np.einsum('ijk,ki->ji', R, v)
10000 loops, best of 3: 21.9 µs per loop
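As a sanity check, the sketch below builds random R and v with the shapes from the question and confirms that the einsum call reproduces the diagonal-of-dot result. The seed and n = 200 are arbitrary choices for illustration, and the np.matmul variant is an addition here, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
n = 200                          # arbitrary; the question used n = 1000
R = rng.random((n, 3, 3))        # stack of n 3x3 matrices
v = rng.random((3, n))           # n column vectors, stored as 3 x n

# Original approach: builds the full n x 3 x n intermediate first.
slow = np.diagonal(np.dot(R, v), axis1=0, axis2=2)

# einsum: out[j, i] = sum_k R[i, j, k] * v[k, i], no large intermediate.
fast = np.einsum('ijk,ki->ji', R, v)

# Batched matmul: treat each vector as a 3x1 matrix, then drop the last axis.
alt = np.matmul(R, v.T[:, :, None])[:, :, 0].T

print(np.allclose(slow, fast), np.allclose(fast, alt))
```

All three results have shape (3, n); only the first one allocates the n x 3 x n array.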

Related

How does torch.einsum perform this 4D tensor multiplication?

I have come across a code which uses torch.einsum to compute a tensor multiplication. I am able to understand the workings for lower order tensors, but, not for the 4D tensor as below:
import torch
a = torch.rand((3, 5, 2, 10))
b = torch.rand((3, 4, 2, 10))
c = torch.einsum('nxhd,nyhd->nhxy', [a,b])
print(c.size())
# output: torch.Size([3, 2, 5, 4])
I need help regarding:
What is the operation that has been performed here (explanation for how the matrices were multiplied/transposed etc.)?
Is torch.einsum actually beneficial in this scenario?

(Skip to the tl;dr section if you just want the breakdown of steps involved in an einsum)
I'll try to explain how einsum works step by step for this example, but instead of torch.einsum I'll use numpy.einsum, which does exactly the same thing and which I am generally more comfortable with. The same steps apply to torch as well.
Let's rewrite the above code in NumPy -
import numpy as np
a = np.random.random((3, 5, 2, 10))
b = np.random.random((3, 4, 2, 10))
c = np.einsum('nxhd,nyhd->nhxy', a,b)
c.shape
#(3, 2, 5, 4)
Step by step np.einsum
Einsum is composed of 3 steps: multiply, sum and transpose
Let's look at our dimensions. We have a (3, 5, 2, 10) and a (3, 4, 2, 10) that we need to bring to (3, 2, 5, 4) based on 'nxhd,nyhd->nhxy'
1. Multiply
Let's not worry about the order of the n, x, y, h, d axes for now, only about whether each axis is kept or removed (reduced). Writing them down as a table shows how the dimensions line up -
## Multiply ##
        n   x   y   h   d
       --------------------
a  ->   3   5       2   10
b  ->   3       4   2   10
c1 ->   3   5   4   2   10
To get the broadcasting multiplication between x and y axis to result in (x, y), we will have to add a new axis at the right places and then multiply.
a1 = a[:,:,None,:,:] #(3, 5, 1, 2, 10)
b1 = b[:,None,:,:,:] #(3, 1, 4, 2, 10)
c1 = a1*b1
c1.shape
#(3, 5, 4, 2, 10) #<-- (n, x, y, h, d)
2. Sum / Reduce
Next, we want to reduce the last axis, d (of length 10). This leaves the dimensions (n,x,y,h).
## Reduce ##
        n   x   y   h   d
       --------------------
c1 ->   3   5   4   2   10
c2 ->   3   5   4   2
This is straightforward. Let's just do np.sum over axis=-1
c2 = np.sum(c1, axis=-1)
c2.shape
#(3,5,4,2) #<-- (n, x, y, h)
3. Transpose
The last step is rearranging the axes using a transpose. Calling c2.transpose(0,3,1,2) moves axis 3 to directly after axis 0 and pushes axes 1 and 2 back by one position. So, (n,x,y,h) becomes (n,h,x,y)
c3 = c2.transpose(0,3,1,2)
c3.shape
#(3,2,5,4) #<-- (n, h, x, y)
4. Final check
Let's do a final check and see if c3 is the same as the c which was generated from the np.einsum -
np.allclose(c,c3)
#True
TL;DR.
Thus, we have implemented the 'nxhd , nyhd -> nhxy' as -
input -> nxhd, nyhd
multiply -> nxyhd #broadcasting
sum -> nxyh #reduce
transpose -> nhxy
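Because the only summed index is d, the same contraction can also be written as a batched matrix product over the (n, h) axes. The sketch below is an addition to the answer, not something the original author proposed; it uses the shapes from the question with an arbitrary seed.

```python
import numpy as np

rng = np.random.default_rng(0)    # arbitrary seed
a = rng.random((3, 5, 2, 10))     # (n, x, h, d)
b = rng.random((3, 4, 2, 10))     # (n, y, h, d)

c = np.einsum('nxhd,nyhd->nhxy', a, b)

# Move h next to n so that (n, h) act as batch axes for matmul.
a_t = a.transpose(0, 2, 1, 3)     # (n, h, x, d)
b_t = b.transpose(0, 2, 3, 1)     # (n, h, d, y)
c_mm = a_t @ b_t                  # (n, h, x, y): sums over d

print(c_mm.shape, np.allclose(c, c_mm))
```

In other words, for every pair (n, h) the result is the x-by-y matrix of dot products between the length-d rows of a and b.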
Advantage
The advantage of np.einsum over the multiple steps taken above is that it performs several operations in a single function call and lets you choose the "path" the computation takes. The path is controlled by the optimize parameter, which optimizes the contraction order of an einsum expression.
A non-exhaustive list of these operations, which can be computed by einsum, is shown below along with examples:
Trace of an array, numpy.trace.
Return a diagonal, numpy.diag.
Array axis summations, numpy.sum.
Transpositions and permutations, numpy.transpose.
Matrix multiplication and dot product, numpy.matmul numpy.dot.
Vector inner and outer products, numpy.inner numpy.outer.
Broadcasting, element-wise and scalar multiplication, numpy.multiply.
Tensor contractions, numpy.tensordot.
Chained array operations, in efficient calculation order, numpy.einsum_path.
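A few of the operations in this list, written as einsum subscripts next to the dedicated NumPy call they correspond to. The small arrays A, B, u and w are made up for illustration.

```python
import numpy as np

A = np.arange(9.0).reshape(3, 3)
B = np.arange(9.0, 18.0).reshape(3, 3)
u = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])

trace = np.einsum('ii', A)                # same as np.trace(A)
diag = np.einsum('ii->i', A)              # same as np.diag(A)
row_sums = np.einsum('ij->i', A)          # same as A.sum(axis=1)
transp = np.einsum('ij->ji', A)           # same as A.T
matmul = np.einsum('ij,jk->ik', A, B)     # same as A @ B
inner = np.einsum('i,i', u, w)            # same as np.inner(u, w)
outer = np.einsum('i,j->ij', u, w)        # same as np.outer(u, w)

# Chained product; optimize=True lets einsum choose the contraction order.
chain = np.einsum('ij,jk,kl->il', A, B, A, optimize=True)
```

A repeated index on one operand selects a diagonal, an index missing from the output is summed over, and the order of the output indices determines the transposition.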
Benchmarks
%%timeit
np.einsum('nxhd,nyhd->nhxy', a,b)
#8.03 µs ± 495 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
np.sum(a[:,:,None,:,:]*b[:,None,:,:,:], axis=-1).transpose(0,3,1,2)
#13.7 µs ± 1.42 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
It shows that np.einsum does the operation faster than individual steps.

pandas matrix calculation till the diagonal

I'm doing a matrix calculation using pandas in Python.
My raw data is in the form of a list of strings (one list per row).
id list_of_value
0 ['a','b','c']
1 ['d','b','c']
2 ['a','b','c']
3 ['a','b','c']
I have to calculate a score for each row against all the other rows.
Score calculation algorithm:
Step 1: Take value of id 0: ['a','b','c'],
Step 2: find the intersection between id 0 and id 1 ,
resultant = ['b','c']
Step 3: Score Calculation => resultant.size / id(0).size
Repeat steps 2 and 3 between id 0 and ids 1, 2, 3, and similarly for all the other ids.
Create N * N matrix:
- 0 1 2 3
0 1 0.6 1 1
1 0.6 1 1 1
2 1 1 1 1
3 1 1 1 1
At present I'm using the pandas dummies approach to calculate the score:
s = pd.get_dummies(df.list_of_value.explode()).sum(level=0)
s.dot(s.T).div(s.sum(1))
But the calculation is repeated past the diagonal of the matrix; computing the scores up to the diagonal is sufficient. For example:
the score calculation for ID 0 only needs to go up to (row, column) (0,0); the scores for (0,1), (0,2), (0,3) can be copied from (1,0), (2,0), (3,0).
Detail on the calculation:
I need to calculate only up to the diagonal, that is, up to the yellow box (the diagonal of the matrix). The white values are already calculated in the green shaded area (for reference); I just have to transpose the green shaded area onto the white one.
How can I do this in pandas?

First of all, here is a profile of your code: first each command separately, then the two lines as you posted them.
%timeit df.list_of_value.explode()
%timeit pd.get_dummies(s)
%timeit s.sum(level=0)
%timeit s.dot(s.T)
%timeit s.sum(1)
%timeit s2.div(s3)
The above profiling returned the following results:
Explode : 1000 loops, best of 3: 201 µs per loop
Dummies : 1000 loops, best of 3: 697 µs per loop
Sum : 1000 loops, best of 3: 1.36 ms per loop
Dot : 1000 loops, best of 3: 453 µs per loop
Sum2 : 10000 loops, best of 3: 162 µs per loop
Divide : 100 loops, best of 3: 1.81 ms per loop
Running your two lines together results in:
100 loops, best of 3: 5.35 ms per loop
Using a different approach relying less on the (sometimes expensive) functionality of pandas, the code I created takes just about a third of the time by skipping the calculation for the upper triangular matrix and the diagonal as well.
import numpy as np
import pandas as pd
# create a matrix filled with ones (thus the diagonal is already filled with ones)
df2 = np.ones(shape = (len(df), len(df)))
for i in range(len(df)):
    d0 = set(df.iloc[i].list_of_value)
    d0_len = len(d0)
    # the inner loop starts at i+1 because we don't need to calculate the diagonal
    for j in range(i + 1, len(df)):
        df2[j, i] = len(d0.intersection(df.iloc[j].list_of_value)) / d0_len
# copy the lower triangular matrix to the upper triangular matrix
df2[np.mask_indices(len(df2), np.triu)] = df2.T[np.mask_indices(len(df2), np.triu)]
# create a DataFrame from the numpy array with the column names set to score<id>
df2 = pd.DataFrame(df2, columns = [f"score{i}" for i in range(len(df))])
With df given as
df = pd.DataFrame(
[[['a','b','c']],
[['d','b','c']],
[['a','b','c']],
[['a','b','c']]],
columns = ["list_of_value"])
the profiling for this code results in a running time of only 1.68ms.
1000 loops, best of 3: 1.68 ms per loop
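The mirroring step at the end of that code can be looked at in isolation. The sketch below fills only the strictly lower triangle of a small made-up matrix (the values are the scores for the question's data) and then copies it to the upper triangle in two equivalent ways. Note that mirroring is only valid when the score is symmetric, which holds here because every list has the same number of distinct elements.

```python
import numpy as np

m = np.ones((4, 4))
# Only the strictly lower triangle has been computed so far.
m[1, 0] = 2 / 3
m[2, 1] = 2 / 3
m[3, 1] = 2 / 3

# Option 1: index-based copy (upper triangle <- transposed lower triangle).
sym1 = m.copy()
iu = np.triu_indices(len(sym1), k=1)   # indices strictly above the diagonal
sym1[iu] = sym1.T[iu]

# Option 2: lower triangle including the diagonal, plus the strictly
# lower triangle transposed into the upper half.
sym2 = np.tril(m) + np.tril(m, k=-1).T

print(np.allclose(sym1, sym2))
```

Both variants leave the diagonal untouched, so the ones placed there by np.ones survive.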
UPDATE
Instead of operating on the entire DataFrame, just picking the Series that is needed gives a huge speedup.
Three methods to iterate over the entries in the Series have been tested, and all of them are more or less equal regarding the performance.
%%timeit df = pd.DataFrame([[['a','b','c']], [['d','b','c']], [['a','b','c']], [['a','b','c']]], columns = ["list_of_value"])
# %%timeit df = pd.DataFrame([[random.choices(list("abcdefghijklmnopqrstuvwxyz"), k = 15)] for _ in range(100)], columns = ["list_of_value"])
# create a matrix filled with ones (thus the diagonal is already filled with ones)
df2 = np.ones(shape = (len(df), len(df)))
# get the Series from the DataFrame
dfl = df.list_of_value
for i, d0 in enumerate(dfl.values):
# for i, d0 in dfl.iteritems(): # in terms of performance about equal to the line above
# for i in range(len(dfl)): # slightly less performant than enumerate(dfl.values)
    d0 = set(d0)
    d0_len = len(d0)
    # the inner loop starts at i+1 because we don't need to calculate the diagonal
    for j in range(i + 1, len(dfl)):
        df2[j, i] = len(d0.intersection(dfl.iloc[j])) / d0_len
# copy the lower triangular matrix to the upper triangular matrix
df2[np.mask_indices(len(df2), np.triu)] = df2.T[np.mask_indices(len(df2), np.triu)]
# create a DataFrame from the numpy array with the column names set to score<id>
df2 = pd.DataFrame(df2, columns = [f"score{i}" for i in range(len(dfl))])
There are a lot of pitfalls with pandas. For example, access the elements of a Series by position via s.iloc[0] instead of s[0]. Both work when the index is the default integer range, but s.iloc[0] is much faster. (Note that on a DataFrame, df[0] selects the column labeled 0, not the first row.)
The timings for the first matrix, with 4 elements each holding a list of size 3, show a speedup of about 3x over the first version.
1000 loops, best of 3: 443 µs per loop
And when using a bigger dataset I got far better results, with a speedup of over 11x:
# operating on the DataFrame
10 loops, best of 3: 565 ms per loop
# operating on the Series
10 loops, best of 3: 47.7 ms per loop
UPDATE 2
When not using pandas at all (during the calculation), you get another significant speedup. To do so, simply convert the column you operate on into a list.
%%timeit df = pd.DataFrame([[['a','b','c']], [['d','b','c']], [['a','b','c']], [['a','b','c']]], columns = ["list_of_value"])
# %%timeit df = pd.DataFrame([[random.choices(list("abcdefghijklmnopqrstuvwxyz"), k = 15)] for _ in range(100)], columns = ["list_of_value"])
# convert the column of the DataFrame to a list
dfl = list(df.list_of_value)
# create a matrix filled with ones (thus the diagonal is already filled with ones)
df2 = np.ones(shape = (len(dfl), len(dfl)))
for i, d0 in enumerate(dfl):
    d0 = set(d0)
    d0_len = len(d0)
    # the inner loop starts at i+1 because we don't need to calculate the diagonal
    for j in range(i + 1, len(dfl)):
        df2[j, i] = len(d0.intersection(dfl[j])) / d0_len
# copy the lower triangular matrix to the upper triangular matrix
df2[np.mask_indices(len(df2), np.triu)] = df2.T[np.mask_indices(len(df2), np.triu)]
# create a DataFrame from the numpy array with the column names set to score<id>
df2 = pd.DataFrame(df2, columns = [f"score{i}" for i in range(len(dfl))])
On the data provided in the question we only see a slightly better result compared to the first update.
1000 loops, best of 3: 363 µs per loop
But when using bigger data (100 rows with lists of size 15) the advantage becomes obvious:
100 loops, best of 3: 5.26 ms per loop
Here is a comparison of all the suggested methods:
+----------+-----------------------------------------+
| | Using the Dataset from the question |
+----------+-----------------------------------------+
| Question | 100 loops, best of 3: 4.63 ms per loop |
+----------+-----------------------------------------+
| Answer | 1000 loops, best of 3: 1.59 ms per loop |
+----------+-----------------------------------------+
| Update 1 | 1000 loops, best of 3: 447 µs per loop |
+----------+-----------------------------------------+
| Update 2 | 1000 loops, best of 3: 362 µs per loop |
+----------+-----------------------------------------+

Although this question is well answered I will show a more readable and also very efficient alternative:
from itertools import product
len_df = df.shape[0]
values = tuple(map(lambda comb: np.isin(*comb).sum() / len(comb[0]),
                   product(df['list_of_value'], repeat=2)))
pd.DataFrame(index=df['id'],
             columns=df['id'],
             data=np.array(values).reshape(len_df, len_df))
id 0 1 2 3
id
0 1.000000 0.666667 1.000000 1.000000
1 0.666667 1.000000 0.666667 0.666667
2 1.000000 0.666667 1.000000 1.000000
3 1.000000 0.666667 1.000000 1.000000
%%timeit
len_df = df.shape[0]
values = tuple(map(lambda comb: np.isin(*comb).sum() / len(comb[0]),
                   product(df['list_of_value'], repeat=2)))
pd.DataFrame(index=df['id'],
             columns=df['id'],
             data=np.array(values).reshape(len_df, len_df))
850 µs ± 18.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
# convert the column of the DataFrame to a list
dfl = list(df.list_of_value)
# create a matrix filled with ones (thus the diagonal is already filled with ones)
df2 = np.ones(shape = (len(dfl), len(dfl)))
for i, d0 in enumerate(dfl):
    d0 = set(d0)
    d0_len = len(d0)
    # the inner loop starts at i+1 because we don't need to calculate the diagonal
    for j in range(i + 1, len(dfl)):
        df2[j, i] = len(d0.intersection(dfl[j])) / d0_len
# copy the lower triangular matrix to the upper triangular matrix
df2[np.mask_indices(len(df2), np.triu)] = df2.T[np.mask_indices(len(df2), np.triu)]
# create a DataFrame from the numpy array with the column names set to score<id>
df2 = pd.DataFrame(df2, columns = [f"score{i}" for i in range(len(dfl))])
470 µs ± 79.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

I am not inclined to change your first line, although I'm sure it could be faster, because it's not going to be the bottleneck as your data gets larger. But the second line could be, and is also extremely easy to improve:
Change this:
s.dot(s.T).div(s.sum(1))
To:
arr=s.values
np.dot( arr, arr.T ) / arr[0].sum()
That's just doing it in numpy instead of pandas, but often you'll get a huge speedup. On your small, sample data it will only speed up by 2x, but if you increase your dataframe from 4 rows to 400 rows, then I see a speedup of over 20x.
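Note that arr[0].sum() uses the size of the first list for every entry, which is fine for the sample data because all lists have three elements. A sketch of the more general form divides by the per-list sizes instead; as far as I can tell this mirrors what .div(s.sum(1)) does in the pandas version (column j is divided by the size of list j). The indicator matrix is built by hand here, so the names lists, vocab and col are illustrative only.

```python
import numpy as np

lists = [['a', 'b', 'c'], ['d', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']]

# Indicator matrix: one row per list, one column per distinct value.
vocab = sorted({x for row in lists for x in row})
col = {x: k for k, x in enumerate(vocab)}
arr = np.zeros((len(lists), len(vocab)), dtype=int)
for i, row in enumerate(lists):
    arr[i, [col[x] for x in row]] = 1

overlap = arr @ arr.T                  # pairwise intersection sizes
scores = overlap / arr.sum(axis=1)     # column j divided by the size of list j

print(scores.round(6))
```

When all lists have the same size, this gives the same matrix as dividing by arr[0].sum().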
As an aside, I would be inclined to not worry about the triangular aspect of the problem, at least as far as speed. You have to make the code considerably more complex and you probably aren't even gaining any speed in a situation like this.
Conversely, if conserving storage space is important, then obviously retaining only the upper (or lower) triangle will cut your storage needs by slightly more than half.
(If you really do care about the triangular aspect for dimensionality numpy does have related functions/methods but I don't know them offhand and, again, it's not clear to me if it's worth the extra complexity in this case.)

Python: how to compute the jaccard index between two networks?

I have two dataframes df1 and df2 that contain the edge lists of two networks g1 and g2, which have the same nodes but different connections. For each node I want to compute the Jaccard index between the two networks.
I define the function that computes the Jaccard index:
def compute_jaccard_index(set_1, set_2):
    n = len(set_1.intersection(set_2))
    return n / float(len(set_1) + len(set_2) - n)
df1
i j
0 0 2
1 0 5
2 1 2
3 2 3
4 2 4
5 2 7
df2
i j
0 0 2
1 0 5
2 0 1
3 1 3
4 2 4
5 2 7
What I am doing is the following:
tmp1 = pd.unique(df1['i'])
tmp2 = pd.unique(df2['i'])
JI = []
for i in tmp1:
    tmp11 = df1[df1['i']==i]
    tmp22 = df2[df2['i']==i]
    # sets rather than lists, because compute_jaccard_index calls .intersection
    set_1 = set(tmp11['j'])
    set_2 = set(tmp22['j'])
    JI.append(compute_jaccard_index(set_1, set_2))
I am wondering if there is a more efficient way.

I've always found it faster to take advantage of scipy's sparse matrices and vectorize the operations rather than depending on Python's set functions. Here is a simple function that converts DataFrame edge lists into sparse matrices (both directed and undirected):
import numpy as np
import scipy.sparse as spar
def sparse_adjmat(df, N=None, directed=False, coli='i', colj='j'):
    # figure out size of matrix if not given
    if N is None:
        # largest node id over both columns (.max() alone would return a Series)
        N = df[[coli, colj]].values.max() + 1
    # make a directed sparse adj matrix
    adjmat = spar.csr_matrix((np.ones(df.shape[0],dtype=int), (df[coli].values, df[colj].values)), shape = (N,N))
    # for undirected graphs, force the adj matrix to be symmetric
    if not directed:
        adjmat[df[colj].values, df[coli].values] = 1
    return adjmat
Then it is just simple vector operations on the binary adjacency matrices:
def sparse_jaccard(m1,m2):
    intersection = m1.multiply(m2).sum(axis=1)
    a = m1.sum(axis=1)
    b = m2.sum(axis=1)
    jaccard = intersection/(a+b-intersection)
    # force jaccard to be 0 even when a+b-intersection is 0
    return np.nan_to_num(np.asarray(jaccard)).flatten()
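To see the formula at work without scipy, here is a dense NumPy sketch of the same row-wise computation applied to the two small edge lists from the question. The helper names dense_adjacency and dense_jaccard are hypothetical, and a dense matrix is only reasonable for small graphs; the sparse version above is the one meant for real data.

```python
import numpy as np

def dense_adjacency(edges, n):
    """Directed 0/1 adjacency matrix from a list of (i, j) pairs."""
    adj = np.zeros((n, n), dtype=int)
    rows, cols = np.asarray(edges).T
    adj[rows, cols] = 1
    return adj

def dense_jaccard(m1, m2):
    """Row-wise Jaccard index of two 0/1 matrices; 0 where both rows are empty."""
    intersection = (m1 * m2).sum(axis=1)
    union = m1.sum(axis=1) + m2.sum(axis=1) - intersection
    out = np.zeros(len(union), dtype=float)
    # Divide only where the union is non-empty; other entries stay 0.
    np.divide(intersection, union, out=out, where=union > 0)
    return out

edges1 = [(0, 2), (0, 5), (1, 2), (2, 3), (2, 4), (2, 7)]   # df1
edges2 = [(0, 2), (0, 5), (0, 1), (1, 3), (2, 4), (2, 7)]   # df2
n_nodes = 8                                                  # node ids 0..7

jac = dense_jaccard(dense_adjacency(edges1, n_nodes),
                    dense_adjacency(edges2, n_nodes))
print(jac)
```

Node 0 has neighbours {2, 5} in the first network and {1, 2, 5} in the second, so its index is 2/3; node 1 has no neighbour in common, so its index is 0.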
For comparison, I've made a random pandas edge list function and wrapped your code into the following functions:
def erdos_renyi_df(N=100,m=400):
    df = pd.DataFrame(np.random.randint(0,N, size=(m,2)), columns = ['i','j'])
    df.drop_duplicates(['i','j'], inplace=True)
    df.sort_values(['i','j'], inplace=True)
    df.reset_index(inplace=True, drop=True)
    return df
def compute_jaccard_index(set_1, set_2):
    n = len(set_1.intersection(set_2))
    return n / float(len(set_1) + len(set_2) - n)
def set_based_jaccard(df1,df2):
    tmp1 = pd.unique(df1['i'])
    tmp2 = pd.unique(df2['i'])
    JI = []
    for i in tmp1:
        tmp11 = df1[df1['i']==i]
        tmp22 = df2[df2['i']==i]
        set_1 = set(tmp11['j'])
        set_2 = set(tmp22['j'])
        JI.append(compute_jaccard_index(set_1, set_2))
    return JI
We can then compare the runtime by making two random networks:
N = 10**3
m = 4*N
df1 = erdos_renyi_df(N,m)
df2 = erdos_renyi_df(N,m)
And calculating the Jaccard similarity for each node using your set based method:
%timeit set_based_jaccard(df1,df2)
1.54 s ± 113 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
And the sparse method (including the overhead of converting to sparse matrices):
%timeit sparse_jaccard(sparse_adjmat(df1, N=N, directed=True, coli='i', colj='j'),sparse_adjmat(df2, N=N, directed=True, coli='i', colj='j'))
1.71 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
As you can see, the sparse matrix code is about 1000 times faster.

How can I solve the transpose of a matrix in a single line in Python?

A two dimensional matrix can be represented in Python row-wise, as a list of lists: Each inner list represents one row of the matrix. For instance, the matrix
1 2 3
4 5 6
would be represented as [[1,2,3],[4,5,6]].
The transpose of a matrix makes each row into a column. For instance, the transpose of the matrix above is
1 4
2 5
3 6
Write a Python function transpose(m) that takes as input a two dimensional matrix using this row-wise representation and returns the transpose of the matrix using the same representation.
Here are some examples to show how your function should work. You may assume that the input to the function is always a non-empty matrix.
>>> transpose([[1,4,9]])
[[1], [4], [9]]
>>> transpose([[1,3,5],[2,4,6]])
[[1,2], [3,4], [5,6]]
>>> transpose([[1,1,1],[2,2,2],[3,3,3]])
[[1,2,3], [1,2,3], [1,2,3]]

This is a great question, why does it have no rated answers?
This is an ideal case for using some advanced slicing:
def transpose(M):
    n = len(M[0])
    L = sum(M, [])
    return [L[i::n] for i in range(n)]
If you need speed, you should use itertools.chain instead of sum:
from itertools import chain
def transpose2(M):
    n = len(M[0])
    L = list(chain(*M))
    return [L[i::n] for i in range(n)]
I ran some tests on an old mac (2011) and these were the timings:
(transpose0 is for comparison)
def transpose0(M):
    return [[M[j][i] for j in range(len(M))] for i in range(len(M[0]))]
5x5 matrix:
%timeit transpose(M): 2.67 µs ...
%timeit transpose2(M): 2.94 µs ...
%timeit transpose0(M): 6.96 µs ...
10x10 matrix:
%timeit transpose(M): 6.25 µs ...
%timeit transpose2(M): 5.83 µs ...
%timeit transpose0(M): 19.1 µs ...
100x100 matrix:
%timeit transpose(M): 2.11 ms ...
%timeit transpose2(M): 194 µs ...
%timeit transpose0(M): 1.21 ms ...
For matrices smaller than 7x7, sum is faster. For larger matrices, you want itertools.chain. But honestly, for any serious work, I'd recommend numpy.
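For completeness, a minimal sketch of the NumPy route mentioned above. The function name transpose_np is made up here, and the input is assumed to be rectangular (all rows the same length).

```python
import numpy as np

def transpose_np(m):
    # Convert to an array, transpose, and convert back to a list of lists.
    return np.array(m).T.tolist()

print(transpose_np([[1, 3, 5], [2, 4, 6]]))
```

The round trip through an array has its own overhead, so this mainly pays off when the data already lives in NumPy arrays.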

def transpose(m):
    # row i of the result is column i of m
    result = [[m[j][i] for j in range(len(m))] for i in range(len(m[0]))]
    return result
output:
>>> transpose([[2,4,6],[7,8,9],[3,6,7]])
[[2, 7, 3], [4, 8, 6], [6, 9, 7]]

def transpose(m):
    rez = [[[m[j][i] for j in range(len(m))] for i in range(len(m[0]))]]
    for row in rez:
        print(row)
transpose([[1,4,9]])

I may not understand the problem:
>>> a = [[1,2,3],[4,5,6]]
>>> print(list(zip(*a)))
[(1, 4), (2, 5), (3, 6)]
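zip(*a) yields tuples, while the exercise asks for a list of lists in the same row-wise representation. A small sketch that wraps the zip idea accordingly, checked against the examples from the question:

```python
def transpose(m):
    # zip(*m) groups the i-th element of every row; turn each group into a list.
    return [list(col) for col in zip(*m)]

print(transpose([[1, 4, 9]]))
print(transpose([[1, 3, 5], [2, 4, 6]]))
```

This is a single expression, which also answers the "single line" part of the question.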

How to return array from element-wise calculation on two different arrays?

I have the following two numpy arrays of n elements:
A = np.array([2, 5, 8, 9, 8, 7, 5, 6])
B = np.array([8, 9, 6, 5, 2, 8, 5, 7])
I would like to obtain array C:
C = np.array([sqrt(2^2+8^2) sqrt(5^2+9^2) ... sqrt(6^2+7^2)])
That is, array C would consist of n elements; each element would be equal to the square root of the square of the respective element in A plus the square of the respective element in B.
I tried using np.apply_along_axis but it seems that this function is designed for one array only.

As mentioned in comments you can use:
C = np.sqrt(A**2 + B**2)
Or you can use a list comprehension with zip (this needs from math import sqrt and produces a list rather than an array):
C = [sqrt(a**2 + b**2) for a, b in zip(A,B)]

If your arrays are huge in size, consider using np.square instead of ** operator.
In [16]: np.sqrt(np.square(A) + np.square(B))
Out[16]:
array([ 8.24621125, 10.29563014, 10. , 10.29563014,
8.24621125, 10.63014581, 7.07106781, 9.21954446])
The difference in execution time is minimal, though.
In [13]: ar = np.arange(100000)
In [14]: %timeit np.square(ar)
10000 loops, best of 3: 158 µs per loop
In [15]: %timeit ar**2
10000 loops, best of 3: 179 µs per loop
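NumPy also has a dedicated ufunc for exactly this calculation, np.hypot, which neither answer above mentions. A short sketch using the arrays from the question:

```python
import numpy as np

A = np.array([2, 5, 8, 9, 8, 7, 5, 6])
B = np.array([8, 9, 6, 5, 2, 8, 5, 7])

# Element-wise sqrt(A**2 + B**2) in a single call.
C = np.hypot(A, B)

print(np.allclose(C, np.sqrt(A**2 + B**2)))
```

It avoids the temporary arrays for the squares and the sum, although no timing was done here to say whether that matters in practice.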
