Numpy multiply arrays into matrix (outer product) - python

I have 2 numpy arrays of shape (5,1) say:
a=[1,2,3,4,5]
b=[2,4,2,3,6]
How can I make a matrix whose (i, j) entry is the i-th element of b times the j-th element of a? Like:
     a =  1   2   3   4   5
b
2         2,  4,  6,  8, 10
4         4,  8, 12, 16, 20
2         2,  4,  6,  8, 10
3         3,  6,  9, 12, 15
6         6, 12, 18, 24, 30
Without using for loops? Is there any combination of reshape, reductions or multiplications that I can use?
Right now I create an a*b tiling of each array along rows and along columns and then multiply element-wise, but it seems to me there must be an easier way.
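For reference, the tiling approach described above might look roughly like this (a sketch, not necessarily the exact code used; the variable names are illustrative):
import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4, 2, 3, 6])
# Repeat a along rows and b along columns, then multiply element-wise.
A = np.tile(a, (len(b), 1))                  # shape (5, 5): every row is a
B = np.tile(b.reshape(-1, 1), (1, len(a)))   # shape (5, 5): every column is b
print(A * B)                                 # entry (i, j) is b[i] * a[j]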

With numpy.outer() and numpy.transpose() routines:
import numpy as np
a = [1,2,3,4,5]
b = [2,4,2,3,6]
c = np.outer(a,b).transpose()
print(c)
Or just with swapped array order:
c = np.outer(b, a)
The output:
[[ 2  4  6  8 10]
 [ 4  8 12 16 20]
 [ 2  4  6  8 10]
 [ 3  6  9 12 15]
 [ 6 12 18 24 30]]

For some reason, np.multiply.outer seems to be faster than np.outer for small inputs, and broadcasting is faster still - but for bigger arrays they are all pretty much equal.
%timeit np.outer(a,b)
%timeit np.multiply.outer(a,b)
%timeit a[:, None]*b
100000 loops, best of 3: 5.97 µs per loop
100000 loops, best of 3: 3.27 µs per loop
1000000 loops, best of 3: 1.38 µs per loop
a = np.random.randint(0,10,100)
b = np.random.randint(0,10,100)
%timeit np.outer(a,b)
%timeit np.multiply.outer(a,b)
%timeit a[:, None]*b
100000 loops, best of 3: 15.5 µs per loop
100000 loops, best of 3: 14 µs per loop
100000 loops, best of 3: 13.5 µs per loop
a = np.random.randint(0,10,10000)
b = np.random.randint(0,10,10000)
%timeit np.outer(a,b)
%timeit np.multiply.outer(a,b)
%timeit a[:, None]*b
10 loops, best of 3: 154 ms per loop
10 loops, best of 3: 154 ms per loop
10 loops, best of 3: 152 ms per loop
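For context on why the broadcasting version works: adding a new axis turns one operand into a column, so the shapes broadcast out to a full matrix. A minimal sketch of the shapes involved (note that a[:, None] * b gives np.outer(a, b); for the transposed layout asked for in the question, swap the operands):
import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4, 2, 3, 6])
c = b[:, None] * a        # (5, 1) broadcast against (5,) -> (5, 5)
print(c.shape)            # (5, 5)
print(c[0])               # [ 2  4  6  8 10], i.e. b[0] * a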

Related

Make matrix from consecutive array slices or rolls

I have an array like:
[10 20 30 40]
And I want to build a matrix M1 like this:
10 0 0 0
20 10 0 0
30 20 10 0
40 30 20 10
My approach is to first build the following matrix M2 out of consecutive "rolls" of the array:
10 20 30 40
20 10 40 30
30 20 10 40
40 30 20 10
And then take the lower triangular matrix with np.tril. I would be interested then in efficient methods to build M2 or M1 directly without through M2.
A simple way to build M2 could be:
import numpy as np

def M2_simple(a):
    a = np.asarray(a)
    return np.stack([a[np.arange(-i, len(a) - i)] for i in range(len(a))]).T

print(M2_simple(np.array([10, 20, 30, 40])))
# [[10 40 30 20]
# [20 10 40 30]
# [30 20 10 40]
# [40 30 20 10]]
After some trying I came to the following better solution, based on advanced indexing:
def M2_indexing(a):
    a = np.asarray(a)
    r = np.arange(len(a))[np.newaxis]
    return a[np.newaxis][np.zeros_like(r), r - r.T].T
This is obviously much faster than the previous one, but measuring the performance it still seems not as fast as it could be (for example, it takes an order of magnitude longer than tiling, which is not such a different operation), and it requires building big indexing matrices.
Is there a better way to build these matrices?
EDIT:
Actually, you can build M1 directly using the same method:
import numpy as np
def M1_strided(a):
    a = np.asarray(a)
    n = len(a)
    s, = a.strides
    a0 = np.concatenate([np.zeros(len(a) - 1, a.dtype), a])
    return np.lib.stride_tricks.as_strided(
        a0, (n, n), (s, s), writeable=False)[:, ::-1]
print(M1_strided(np.array([10, 20, 30, 40])))
# [[10 0 0 0]
# [20 10 0 0]
# [30 20 10 0]
# [40 30 20 10]]
In this case the speed benefit is even better, since you are saving the call to np.tril:
N = 100
a = np.square(np.arange(N))
%timeit np.tril(M2_simple(a))
# 792 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.tril(M2_indexing(a))
# 259 µs ± 9.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.tril(M2_strided(a))
# 134 µs ± 1.68 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit M1_strided(a)
# 45.2 µs ± 583 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
You can build the M2 matrix more efficiently with np.lib.stride_tricks.as_strided:
import numpy as np
from numpy.lib.stride_tricks import as_strided
def M2_strided(a):
    a = np.asarray(a)
    n = len(a)
    s, = a.strides
    return np.lib.stride_tricks.as_strided(
        np.tile(a[::-1], 2), (n, n), (s, s), writeable=False)[::-1]
As an extra benefit, you will only use twice as much memory as the original array (as opposed to the squared size). You just need to be careful not to write to the array created like this (which should not be a problem if you are going to call np.tril later on) - I added writeable=False to disallow writing operations.
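As a quick illustration of the writeable=False guard (a sketch; the exact exception message may vary across NumPy versions):
m = M2_strided(np.array([10, 20, 30, 40]))
try:
    m[0, 0] = 99
except ValueError as e:
    print("read-only view:", e)   # e.g. "assignment destination is read-only"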
A quick speed comparison with IPython:
N = 100
a = np.square(np.arange(N))
%timeit M2_simple(a)
# 693 µs ± 17.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit M2_indexing(a)
# 163 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit M2_strided(a)
# 38.3 µs ± 348 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Another one using as_strided, similar to @jdehesa's solution, but with negative strides, which saves us the flipping at the end, like so -
def strided_app2(a):
    n = len(a)
    ae = np.concatenate((np.zeros(n-1, dtype=a.dtype), a))
    s = a.strides[0]
    return np.lib.stride_tricks.as_strided(ae[n-1:], (n, n), (s, -s), writeable=False)
Sample run -
In [66]: a
Out[66]: array([10, 20, 30, 40])
In [67]: strided_app2(a)
Out[67]:
array([[10, 0, 0, 0],
[20, 10, 0, 0],
[30, 20, 10, 0],
[40, 30, 20, 10]])
Digging further
Going deeper into the timings for each step, it turns out that the bottleneck is the concatenation part. So, we can employ array initialization instead, giving us an alternative that seems to be much better for large arrays, like so -
def strided_app3(a):
    n = len(a)
    ae = np.zeros(2*n-1, dtype=a.dtype)
    ae[-n:] = a
    s = a.strides[0]
    return np.lib.stride_tricks.as_strided(ae[n-1:], (n, n), (s, -s), writeable=False)
Timings -
In [55]: a = np.random.rand(100000)
In [56]: %timeit M1_strided(a)  # @jdehesa's soln
...: %timeit strided_app2(a)
...: %timeit strided_app3(a)
10000 loops, best of 3: 107 µs per loop
10000 loops, best of 3: 94.5 µs per loop
10000 loops, best of 3: 84.4 µs per loop
In [61]: a = np.random.rand(1000000)
In [62]: %timeit M1_strided(a)  # @jdehesa's soln
...: %timeit strided_app2(a)
...: %timeit strided_app3(a)
100 loops, best of 3: 2.02 ms per loop
100 loops, best of 3: 2 ms per loop
1000 loops, best of 3: 1.84 ms per loop
In [63]: a = np.random.rand(10000000)
In [64]: %timeit M1_strided(a)  # @jdehesa's soln
...: %timeit strided_app2(a)
...: %timeit strided_app3(a)
10 loops, best of 3: 25.2 ms per loop
10 loops, best of 3: 24.6 ms per loop
100 loops, best of 3: 13.9 ms per loop
Actually, there is a builtin for that:
>>> import scipy.linalg as sl
>>> sl.toeplitz([10,20,30,40], [0,0,0,0])
array([[10, 0, 0, 0],
[20, 10, 0, 0],
[30, 20, 10, 0],
[40, 30, 20, 10]])
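The second argument to toeplitz is the first row of the matrix (its first element is ignored in favour of the first column's first element), so passing zeros gives exactly the zero upper triangle. Alternatively - a small sketch, assuming a is the 1-D input array - the one-argument form combined with np.tril produces the same matrix:
import numpy as np
import scipy.linalg as sl
a = np.array([10, 20, 30, 40])
m1 = sl.toeplitz(a, np.zeros_like(a))   # first column a, first row zeros
m2 = np.tril(sl.toeplitz(a))            # symmetric Toeplitz, keep lower triangle
print(np.array_equal(m1, m2))           # True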

Create index dictionary from integer list

I have a (long) array a of a handful of different integers. I would now like to create a dictionary where the keys are the integers and the values are arrays of the indices in a where the respective integer occurs. This
import numpy
a = numpy.array([1, 1, 5, 5, 1])
u = numpy.unique(a)
d = {val: numpy.where(a==val)[0] for val in u}
print(d)
{1: array([0, 1, 4]), 5: array([2, 3])}
works fine, but it seems rather wasteful to first call unique, followed by a couple of wheres.
np.digitize doesn't seem to be ideal either as you have to specify the bins in advance.
Any ideas of how to improve the above?
Approach #1
One approach based on sorting would be -
def group_into_dict(a):
    # Get argsort indices
    sidx = a.argsort()
    # Use argsort indices to sort input array
    sorted_a = a[sidx]
    # Get indices that define the grouping boundaries based on identical elements
    cut_idx = np.flatnonzero(np.r_[True, sorted_a[1:] != sorted_a[:-1], True])
    # Form the final dict, slicing the argsort indices for the values and
    # using the group starts as the keys
    return {sorted_a[i]: sidx[i:j] for i, j in zip(cut_idx[:-1], cut_idx[1:])}
Sample run -
In [55]: a
Out[55]: array([1, 1, 5, 5, 1])
In [56]: group_into_dict(a)
Out[56]: {1: array([0, 1, 4]), 5: array([2, 3])}
Timings on an array with 1000000 elements and varying proportions of unique numbers, comparing the proposed approach against the original one -
# 1/100 unique numbers
In [75]: a = np.random.randint(0,10000,(1000000))
In [76]: %timeit {val: np.where(a==val)[0] for val in np.unique(a)}
1 loop, best of 3: 6.62 s per loop
In [77]: %timeit group_into_dict(a)
10 loops, best of 3: 121 ms per loop
# 1/1000 unique numbers
In [78]: a = np.random.randint(0,1000,(1000000))
In [79]: %timeit {val: np.where(a==val)[0] for val in np.unique(a)}
1 loop, best of 3: 720 ms per loop
In [80]: %timeit group_into_dict(a)
10 loops, best of 3: 92.1 ms per loop
# 1/10000 unique numbers
In [81]: a = np.random.randint(0,100,(1000000))
In [82]: %timeit {val: np.where(a==val)[0] for val in np.unique(a)}
10 loops, best of 3: 120 ms per loop
In [83]: %timeit group_into_dict(a)
10 loops, best of 3: 75 ms per loop
# 1/50000 unique numbers
In [84]: a = np.random.randint(0,20,(1000000))
In [85]: %timeit {val: np.where(a==val)[0] for val in np.unique(a)}
10 loops, best of 3: 60.8 ms per loop
In [86]: %timeit group_into_dict(a)
10 loops, best of 3: 60.3 ms per loop
So, if you are dealing with just 20 or fewer unique numbers, stick to the original approach or read on; otherwise, the sorting-based one seems to work well.
Approach #2
A pandas-based approach, suited for very few unique numbers -
In [142]: a
Out[142]: array([1, 1, 5, 5, 1])
In [143]: import pandas as pd
In [144]: {u:np.flatnonzero(a==u) for u in pd.Series(a).unique()}
Out[144]: {1: array([0, 1, 4]), 5: array([2, 3])}
Timings on an array with 1000000 elements and 20 unique elements -
In [146]: a = np.random.randint(0,20,(1000000))
In [147]: %timeit {u:np.flatnonzero(a==u) for u in pd.Series(a).unique()}
10 loops, best of 3: 35.6 ms per loop
# Original solution
In [148]: %timeit {val: np.where(a==val)[0] for val in np.unique(a)}
10 loops, best of 3: 58 ms per loop
and for fewer unique elements -
In [149]: a = np.random.randint(0,10,(1000000))
In [150]: %timeit {u:np.flatnonzero(a==u) for u in pd.Series(a).unique()}
10 loops, best of 3: 25.3 ms per loop
In [151]: %timeit {val: np.where(a==val)[0] for val in np.unique(a)}
10 loops, best of 3: 44.9 ms per loop
In [152]: a = np.random.randint(0,5,(1000000))
In [153]: %timeit {u:np.flatnonzero(a==u) for u in pd.Series(a).unique()}
100 loops, best of 3: 17.9 ms per loop
In [154]: %timeit {val: np.where(a==val)[0] for val in np.unique(a)}
10 loops, best of 3: 34.4 ms per loop
How does pandas help here for fewer unique elements?
With the sorting-based approach #1, for the case of 20 unique elements, getting the argsort indices was the bottleneck -
In [164]: a = np.random.randint(0,20,(1000000))
In [165]: %timeit a.argsort()
10 loops, best of 3: 51 ms per loop
Now, the pandas-based function gives us the unique elements (be they negative numbers or anything else), which we simply compare against the elements in the input array, without the need for sorting. Let's see the improvement on that side:
In [166]: %timeit pd.Series(a).unique()
100 loops, best of 3: 3.17 ms per loop
Of course, it then still needs to get the np.flatnonzero indices, but it remains comparatively efficient.
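The difference is easy to see directly: pd.Series.unique is hash-based and returns the unique values in order of first appearance, while np.unique sorts. A small sketch:
import numpy as np
import pandas as pd
a = np.array([5, 1, 5, 3, 1])
print(pd.Series(a).unique())   # [5 1 3]  - order of first appearance, no sorting
print(np.unique(a))            # [1 3 5]  - sorted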
If it's really just a handful of distincts it may be worthwhile using argpartition instead of argsort. The downside is that this requires a compression step:
import numpy as np
def f_pp_1(a):
    ws = np.empty(a.max()+1, int)
    rng = np.arange(a.size)
    ws[a] = rng
    unq = rng[ws[a] == rng]
    idx = np.argsort(a[unq])
    unq = unq[idx]
    ws[a[unq]] = np.arange(len(unq))
    compressed = ws[a]
    counts = np.cumsum(np.bincount(compressed))
    return dict(zip(a[unq], np.split(np.argpartition(a, counts[:-1]), counts[:-1])))
If the unique values are small we can save the compression step:
def f_pp_2(a):
    bc = np.bincount(a)
    keys, = np.where(bc)
    counts = np.cumsum(bc[keys])
    return dict(zip(keys, np.split(np.argpartition(a, counts[:-1]), counts[:-1])))

data = np.random.randint(0, 10, (5,))[np.random.randint(0, 5, (10000000,))]
sol = f_pp_1(data)
for k, v in sol.items():
    assert np.all(k == data[v])
For small numbers of distincts, np.where is competitive if we can avoid unique:
def f_OP_plus(a):
    ws = np.empty(a.max()+1, int)
    rng = np.arange(a.size)
    ws[a] = rng
    unq = rng[ws[a] == rng]
    idx = np.argsort(a[unq])
    unq = unq[idx]
    return {val: np.where(a==val)[0] for val in unq}
Here are my timings (best of 3, 10 per block) using the same test arrays as @Divakar (randint(0, nd, (ns,)) -- nd, ns = number of distincts, number of samples):
nd, ns: 5, 1000000
OP 39.88609421 ms
OP_plus 13.04150990 ms
Divakar_1 44.14700069 ms
Divakar_2 21.64940450 ms
pp_1 33.15216140 ms
pp_2 22.43267260 ms
nd, ns: 10, 1000000
OP 52.33878891 ms
OP_plus 17.14743648 ms
Divakar_1 57.76002519 ms
Divakar_2 30.70066951 ms
pp_1 45.33982391 ms
pp_2 34.71166079 ms
nd, ns: 20, 1000000
OP 67.47841339 ms
OP_plus 26.41335099 ms
Divakar_1 71.37646740 ms
Divakar_2 43.09316459 ms
pp_1 57.16468811 ms
pp_2 45.55416510 ms
nd, ns: 50, 1000000
OP 98.91191521 ms
OP_plus 51.15756912 ms
Divakar_1 72.72288438 ms
Divakar_2 70.31920571 ms
pp_1 63.78925461 ms
pp_2 53.00321991 ms
nd, ns: 100, 1000000
OP 148.17743159 ms
OP_plus 92.62091429 ms
Divakar_1 85.02774101 ms
Divakar_2 116.78823209 ms
pp_1 77.01576019 ms
pp_2 66.70976470 ms
And if we don't use the first nd integers for uniques but instead draw them randomly from between 0 and 10000:
nd, ns: 5, 1000000
OP 40.11689581 ms
OP_plus 12.99256920 ms
Divakar_1 42.13181480 ms
Divakar_2 21.55767360 ms
pp_1 33.21835019 ms
pp_2 23.46851982 ms
nd, ns: 10, 1000000
OP 52.84317869 ms
OP_plus 17.96655210 ms
Divakar_1 57.74175161 ms
Divakar_2 32.31985010 ms
pp_1 44.79893579 ms
pp_2 33.42640731 ms
nd, ns: 20, 1000000
OP 66.46886449 ms
OP_plus 25.78120639 ms
Divakar_1 66.58960858 ms
Divakar_2 42.47685110 ms
pp_1 53.67698781 ms
pp_2 44.53037870 ms
nd, ns: 50, 1000000
OP 98.95576960 ms
OP_plus 50.79147881 ms
Divakar_1 72.44545210 ms
Divakar_2 70.91441818 ms
pp_1 64.19071071 ms
pp_2 53.36350428 ms
nd, ns: 100, 1000000
OP 145.62422500 ms
OP_plus 90.82918381 ms
Divakar_1 76.92769479 ms
Divakar_2 115.24481240 ms
pp_1 70.85122908 ms
pp_2 58.85340699 ms
With ns, nd = number of samples and number of distincts, your solution is O(ns*nd), so inefficient.
A simple O(ns) approach with defaultdict:
from collections import defaultdict
d = defaultdict(list)
for i, x in enumerate(a):
    d[x].append(i)
Unfortunately it is slow because Python loops are slow, but it is still faster than yours if nd/ns > 1%.
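If the values are needed as NumPy arrays (to match the output of the original dict comprehension), a conversion pass can be added; a minimal sketch, wrapping the loop above in a hypothetical helper:
from collections import defaultdict
import numpy as np

def group_with_defaultdict(a):
    d = defaultdict(list)
    for i, x in enumerate(a):
        d[x].append(i)
    # Convert the index lists to arrays to match the original output format.
    return {k: np.array(v) for k, v in d.items()}

group_with_defaultdict(np.array([1, 1, 5, 5, 1]))
# {1: array([0, 1, 4]), 5: array([2, 3])}  (key repr may vary with NumPy version)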
Another O(ns) linear solution is possible if nd/ns << 1 (optimized here with numba):
import numba
import numpy as np

@numba.njit
def filldict_(a):
    v = a.max() + 1
    cnts = np.zeros(v, np.int64)
    for x in a:
        cnts[x] += 1
    g = cnts.max()
    res = np.empty((v, g), np.int64)
    cnts[:] = 0
    i = 0
    for x in a:
        res[x, cnts[x]] = i
        cnts[x] += 1
        i += 1
    return res, cnts, v

def filldict(a):
    res, cnts, v = filldict_(a)
    return {x: res[x, :cnts[x]] for x in range(v) if cnts[x] > 0}
Faster on random numbers with few keys. Example runs:
In [51]: a=numpy.random.randint(0,100,1000000)
In [52]: %timeit d=group_into_dict(a)  # @Divakar
134 ms ± 17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [53]: %timeit c=filldict(a)
11.2 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
A lookup-table mechanism can be added if the keys are big integers, with little overhead.
pandas solution 1: Use groupby and its indices attribute
df = pd.DataFrame(a)
d = df.groupby(0).indices
a = np.random.randint(0,10000,(1000000))
%%timeit
df = pd.DataFrame(a)
d = df.groupby(0).indices
42.6 ms ± 2.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
a = np.random.randint(0,100,(1000000))
%%timeit
df = pd.DataFrame(a)
d = df.groupby(0).indices
22.3 ms ± 5.11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
pandas solution 2: Use groupby only (if you already know the keys or can get keys fast using other methods)
a = np.random.randint(0,100,(1000000))
%%timeit
df = pd.DataFrame(a)
d = df.groupby(0)
206 µs ± 14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
groupby itself is really fast, but it won't give you the keys. If you already know the keys, you can then just use the groupby object as your dictionary. Usage:
d.get_group(key).index # index part is what you need!
Disadvantage: d.get_group(key) itself will cost non-trivial time.
%timeit d.get_group(10).index
496 µs ± 56.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So whether to go with this approach depends on your application and on whether you know the keys.
If all your values are positive, you may use np.nonzero(np.bincount(a))[0] to get the keys at a reasonable speed. (1.57 ms ± 78.2 µs for a = np.random.randint(0,1000,(1000000)))
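Putting the last two points together, a sketch (assuming non-negative integer values) of getting the keys via np.bincount and then pulling the index arrays out of the groupby object:
import numpy as np
import pandas as pd

a = np.random.randint(0, 100, 1000000)
g = pd.DataFrame(a).groupby(0)            # fast: indices are not materialized yet
keys = np.nonzero(np.bincount(a))[0]      # distinct values present in a
d = {k: g.get_group(k).index.values for k in keys}   # pays the get_group cost per key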

Pandas: Selecting rows for which groupby.sum() satisfies condition

In pandas I have a dataframe of the form:
>>> import pandas as pd
>>> df = pd.DataFrame({'ID':[51,51,51,24,24,24,31], 'x':[0,1,0,0,1,1,0]})
>>> df
ID x
51 0
51 1
51 0
24 0
24 1
24 1
31 0
For every 'ID' the value of 'x' is recorded several times, it is either 0 or 1. I want to select those rows from df that contain an 'ID' for which 'x' is 1 at least twice.
For every 'ID' I manage to count the number of times 'x' is 1, by
>>> df.groupby('ID')['x'].sum()
ID
51 1
24 2
31 0
But I don't know how to proceed from here. I would like the following output:
ID x
24 0
24 1
24 1
Use groupby and filter
df.groupby('ID').filter(lambda s: s.x.sum()>=2)
Output:
ID x
3 24 0
4 24 1
5 24 1
df = pd.DataFrame({'ID':[51,51,51,24,24,24,31], 'x':[0,1,0,0,1,1,0]})
df.loc[df.groupby(['ID'])['x'].transform(func=sum)>=2,:]
out:
ID x
3 24 0
4 24 1
5 24 1
Using np.bincount and pd.factorize
An alternative, more advanced technique for better performance
f, u = df.ID.factorize()
df[np.bincount(f, df.x.values)[f] >= 2]
ID x
3 24 0
4 24 1
5 24 1
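To see what the bincount/factorize trick computes on the sample frame, here is a step-by-step sketch with the intermediate values shown as comments:
f, u = df.ID.factorize()
# f = [0, 0, 0, 1, 1, 1, 2]    integer code for each row's ID
# u = [51, 24, 31]             unique IDs in order of appearance
sums = np.bincount(f, df.x.values)
# sums = [1., 2., 0.]          sum of x per ID code
mask = sums[f] >= 2
# mask = [False, False, False, True, True, True, False]
df[mask]                       # rows whose ID has x summing to at least 2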
In obnoxious one-liner form
df[(lambda f, w: np.bincount(f, w)[f] >= 2)(df.ID.factorize()[0], df.x.values)]
ID x
3 24 0
4 24 1
5 24 1
np.bincount and np.unique
I could've used np.unique with the return_inverse parameter to accomplish the exact same thing. But np.unique will sort the array, which changes the time complexity of the solution.
u, f = np.unique(df.ID.values, return_inverse=True)
df[np.bincount(f, df.x.values)[f] >= 2]
One-liner
df[(lambda f, w: np.bincount(f, w)[f] >= 2)(np.unique(df.ID.values, return_inverse=True)[1], df.x.values)]
Timing
%timeit df[(lambda f, w: np.bincount(f, w)[f] >= 2)(df.ID.factorize()[0], df.x.values)]
%timeit df[(lambda f, w: np.bincount(f, w)[f] >= 2)(np.unique(df.ID.values, return_inverse=True)[1], df.x.values)]
%timeit df.groupby('ID').filter(lambda s: s.x.sum()>=2)
%timeit df.loc[df.groupby(['ID'])['x'].transform(func=sum)>=2]
%timeit df.loc[df.groupby(['ID'])['x'].transform('sum')>=2]
small data
1000 loops, best of 3: 302 µs per loop
1000 loops, best of 3: 241 µs per loop
1000 loops, best of 3: 1.52 ms per loop
1000 loops, best of 3: 1.2 ms per loop
1000 loops, best of 3: 1.21 ms per loop
large data
np.random.seed([3,1415])
df = pd.DataFrame(dict(
ID=np.random.randint(100, size=10000),
x=np.random.randint(2, size=10000)
))
1000 loops, best of 3: 528 µs per loop
1000 loops, best of 3: 847 µs per loop
10 loops, best of 3: 20.9 ms per loop
1000 loops, best of 3: 1.47 ms per loop
1000 loops, best of 3: 1.55 ms per loop
larger data
np.random.seed([3,1415])
df = pd.DataFrame(dict(
ID=np.random.randint(100, size=100000),
x=np.random.randint(2, size=100000)
))
1000 loops, best of 3: 2.01 ms per loop
100 loops, best of 3: 6.44 ms per loop
10 loops, best of 3: 29.4 ms per loop
100 loops, best of 3: 3.84 ms per loop
100 loops, best of 3: 3.74 ms per loop

Row Sum of a dot product for huge matrix in python

I have 2 matrices, 100k×200 and 200×100k.
If they were small matrices I would just use the numpy dot product:
sum(a.dot(b), axis = 0)
However the matrices are too big, and I also can't use loops. Is there a smart way of doing this?
A possible optimization is
>>> numpy.sum(a @ b, axis=0)
array([ 1.83633615, 18.71643672, 15.26981078, -46.33670382, 13.30276476])
>>> numpy.sum(a, axis=0) @ b
array([ 1.83633615, 18.71643672, 15.26981078, -46.33670382, 13.30276476])
Computing a @ b requires 10k×200×10k operations, while summing the rows first will reduce the multiplication to 1×200×10k operations, giving a 10k× improvement.
This is mainly due to recognizing
numpy.sum(x, axis=0) == [1, 1, ..., 1] @ x
=> numpy.sum(a @ b, axis=0) == [1, 1, ..., 1] @ (a @ b)
                            == ([1, 1, ..., 1] @ a) @ b
                            == numpy.sum(a, axis=0) @ b
Similar for the other axis.
>>> numpy.sum(a @ b, axis=1)
array([ 2.8794171 , 9.12128399, 14.52009991, -8.70177811, -15.0303783 ])
>>> a @ numpy.sum(b, axis=1)
array([ 2.8794171 , 9.12128399, 14.52009991, -8.70177811, -15.0303783 ])
(Note: x @ y is equivalent to x.dot(y) for 2D matrices and 1D vectors on Python 3.5+ with numpy 1.10.0+)
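A quick numerical check of the identity on small random matrices (a sketch):
import numpy as np
a = np.random.randn(50, 20)
b = np.random.randn(20, 30)
print(np.allclose(np.sum(a @ b, axis=0), np.sum(a, axis=0) @ b))  # True
print(np.allclose(np.sum(a @ b, axis=1), a @ np.sum(b, axis=1)))  # True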
$ INITIALIZATION='import numpy;numpy.random.seed(0);a=numpy.random.randn(1000,200);b=numpy.random.rand(200,1000)'
$ python3 -m timeit -s "$INITIALIZATION" 'numpy.einsum("ij,jk->k", a, b)'
10 loops, best of 3: 87.2 msec per loop
$ python3 -m timeit -s "$INITIALIZATION" 'numpy.sum(a@b, axis=0)'
100 loops, best of 3: 12.8 msec per loop
$ python3 -m timeit -s "$INITIALIZATION" 'numpy.sum(a, axis=0)@b'
1000 loops, best of 3: 300 usec per loop
Illustration:
In [235]: a = np.random.rand(3,3)
array([[ 0.465,  0.758,  0.641],
       [ 0.897,  0.673,  0.742],
       [ 0.763,  0.274,  0.485]])
In [237]: b = np.random.rand(3,2)
array([[ 0.303,  0.378],
       [ 0.039,  0.095],
       [ 0.192,  0.668]])
Now, if we simply do a @ b, we would need 18 multiply and 6 addition ops. On the other hand, if we do np.sum(a, axis=0) @ b we would only need 6 multiply and 2 addition ops. An improvement of 3x because we had 3 rows in a. As for OP's case, this should give 10k times improvement over simple a @ b computation since he has 10k rows in a.
There are two sum-reductions happening - one from the matrix multiplication with np.dot, and then the explicit sum.
We could use np.einsum to do both of those in one go, like so -
np.einsum('ij,jk->k',a,b)
Sample run -
In [27]: a = np.random.rand(3,4)
In [28]: b = np.random.rand(4,3)
In [29]: np.sum(a.dot(b), axis = 0)
Out[29]: array([ 2.70084316, 3.07448582, 3.28690401])
In [30]: np.einsum('ij,jk->k',a,b)
Out[30]: array([ 2.70084316, 3.07448582, 3.28690401])
Runtime test -
In [45]: a = np.random.rand(1000,200)
In [46]: b = np.random.rand(200,1000)
In [47]: %timeit np.sum(a.dot(b), axis = 0)
100 loops, best of 3: 5.5 ms per loop
In [48]: %timeit np.einsum('ij,jk->k',a,b)
10 loops, best of 3: 71.8 ms per loop
Sadly, doesn't look like we are doing any better with np.einsum.
For changing to np.sum(a.dot(b), axis = 1), just swap the output string notation there - np.einsum('ij,jk->i',a,b), like so -
In [42]: np.sum(a.dot(b), axis = 1)
Out[42]: array([ 3.97805141, 3.2249661 , 1.85921549])
In [43]: np.einsum('ij,jk->i',a,b)
Out[43]: array([ 3.97805141, 3.2249661 , 1.85921549])
Some quick time tests using the idea I added to Divakar's answer:
In [162]: a = np.random.rand(1000,200)
In [163]: b = np.random.rand(200,1000)
In [174]: timeit c1=np.sum(a.dot(b), axis=0)
10 loops, best of 3: 27.7 ms per loop
In [175]: timeit c2=np.sum(a,axis=0).dot(b)
1000 loops, best of 3: 432 µs per loop
In [176]: timeit c3=np.einsum('ij,jk->k',a,b)
10 loops, best of 3: 170 ms per loop
In [177]: timeit c4=np.einsum('j,jk->k', np.einsum('ij->j', a), b)
1000 loops, best of 3: 353 µs per loop
In [178]: timeit np.einsum('ij->j', a)@b
1000 loops, best of 3: 304 µs per loop
einsum is actually faster than np.sum!
In [180]: timeit np.einsum('ij->j', a)
1000 loops, best of 3: 173 µs per loop
In [181]: timeit np.sum(a,0)
1000 loops, best of 3: 312 µs per loop
For larger arrays the einsum advantage decreases
In [183]: a = np.random.rand(100000,200)
In [184]: b = np.random.rand(200,100000)
In [185]: timeit np.einsum('ij->j', a)@b
10 loops, best of 3: 51.5 ms per loop
In [186]: timeit c2=np.sum(a,axis=0).dot(b)
10 loops, best of 3: 59.5 ms per loop

python: check if an numpy array contains any element of another array

What is the best way to check if a numpy array contains any element of another array?
example:
array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]
I want to get a True if array1 contains any value of array2, otherwise a False.
Using Pandas, you can use isin:
a1 = np.array([10,5,4,13,10,1,1,22,7,3,15,9])
a2 = np.array([3,4,9,10,13,15,16,18,19,20,21,22,23])
>>> pd.Series(a1).isin(a2).any()
True
And using the in1d numpy function (per the comment from @Norman):
>>> np.any(np.in1d(a1, a2))
True
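In more recent NumPy versions (1.13 and later), np.isin is the element-wise membership test recommended in place of np.in1d and reads the same way:
>>> np.isin(a1, a2).any()
True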
For small arrays such as those in this example, the solution using set is the clear winner. For larger, dissimilar arrays (i.e. no overlap), the Pandas and Numpy solutions are faster. However, np.intersect1d appears to excel for larger arrays.
Small arrays (12-13 elements)
%timeit set(array1) & set(array2)
The slowest run took 4.22 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.69 µs per loop
%timeit any(i in a1 for i in a2)
The slowest run took 12.29 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 1.88 µs per loop
%timeit np.intersect1d(a1, a2)
The slowest run took 10.29 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 15.6 µs per loop
%timeit np.any(np.in1d(a1, a2))
10000 loops, best of 3: 27.1 µs per loop
%timeit pd.Series(a1).isin(a2).any()
10000 loops, best of 3: 135 µs per loop
Using an array with 100k elements (no overlap):
a3 = np.random.randint(0, 100000, 100000)
a4 = a3 + 100000
%timeit np.intersect1d(a3, a4)
100 loops, best of 3: 13.8 ms per loop
%timeit pd.Series(a3).isin(a4).any()
100 loops, best of 3: 18.3 ms per loop
%timeit np.any(np.in1d(a3, a4))
100 loops, best of 3: 18.4 ms per loop
%timeit set(a3) & set(a4)
10 loops, best of 3: 23.6 ms per loop
%timeit any(i in a3 for i in a4)
1 loops, best of 3: 34.5 s per loop
You can try this
>>> array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
>>> array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]
>>> set(array1) & set(array2)
set([3, 4, 9, 10, 13, 15, 22])
If you get a result, it means there are common elements in both arrays.
If the result is empty, it means there are no common elements.
You can use the any built-in function with a generator expression:
>>> array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
>>> array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]
>>> any(i in array2 for i in array1)
True
