I have seen this, but it doesn't quite answer my question.
I have an array:
x = np.array([0, 1, 2])
I want this:
y = np.array([[0,1], [0,2], [1,0], [1,2], [2,0], [2,1]])
That is, I want to take each value (let's call it i) of the array x and create x.shape[0]-1 new arrays with all of the other values of x, excluding i.
Essentially y contains the indices of a 3x3 matrix without any diagonal elements.
I have a feeling there's an easy, pythonic way of doing this that's just not coming to me.
Approach #1 : One approach would be -
x[np.argwhere(~np.eye(len(x),dtype=bool))]
Approach #2 : In two steps -
r = np.arange(len(x))
out = x[np.argwhere(r[:,None]!=r)]
Approach #3 : For performance, it might be better to create those pairwise coordinates and then mask. To get the pairwise coordinates, let's use cartesian_product_transpose (a helper that is not part of NumPy and has to be defined separately), like so -
r = np.arange(len(x))
mask = r[:,None]!=r
out = cartesian_product_transpose(x,x)[mask.ravel()]
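Since cartesian_product_transpose is not shipped with NumPy, here is a minimal sketch of what such a helper could look like. The body below is an assumption made for illustration (every combination of the inputs as one row, first input varying slowest) and is not necessarily the implementation the answer had in mind:

```python
import numpy as np

def cartesian_product_transpose(*arrays):
    # Assumed behaviour: one row per combination, first input varying slowest.
    grids = np.meshgrid(*arrays, indexing="ij")
    return np.stack(grids, axis=-1).reshape(-1, len(arrays))

x = np.array([0, 1, 2])
r = np.arange(len(x))
mask = r[:, None] != r                      # False on the diagonal
out = cartesian_product_transpose(x, x)[mask.ravel()]
print(out.tolist())
```

The row order of the helper matches the row-major order of mask.ravel(), which is what makes the flat boolean mask line up with the pairs.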
Approach #4 : Another with np.broadcast_to that avoids making copies until masking, again meant as a performance measure -
n = len(x)
r = np.arange(n)
mask = r[:,None]!=r
c0 = np.broadcast_to(x[:,None], (n, n))[mask]
c1 = np.broadcast_to(x, (n,n))[mask]
out = np.column_stack((c0,c1))
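As a sanity check, the approaches can be compared against itertools.permutations on a small input. This is only a sketch; the test array and the names out1, out4 and expected are chosen here for illustration:

```python
import itertools
import numpy as np

x = np.array([5, 7, 9, 11])
n = len(x)
r = np.arange(n)
mask = r[:, None] != r

# Approach #1: index x with the off-diagonal coordinates
out1 = x[np.argwhere(~np.eye(n, dtype=bool))]

# Approach #4: broadcast views, copy only when masking
c0 = np.broadcast_to(x[:, None], (n, n))[mask]
c1 = np.broadcast_to(x, (n, n))[mask]
out4 = np.column_stack((c0, c1))

# Reference result from pure Python
expected = np.array(list(itertools.permutations(x, 2)))
print(np.array_equal(out1, expected), np.array_equal(out4, expected))
```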
Runtime test -
In [382]: x = np.random.randint(0,9,(1000))
# @tom10's soln
In [392]: %timeit list(itertools.permutations(x, 2))
10 loops, best of 3: 62 ms per loop
In [383]: %%timeit
...: x[np.argwhere(~np.eye(len(x),dtype=bool))]
100 loops, best of 3: 11.4 ms per loop
In [384]: %%timeit
...: r = np.arange(len(x))
...: out = x[np.argwhere(r[:,None]!=r)]
100 loops, best of 3: 12.9 ms per loop
In [388]: %%timeit
...: r = np.arange(len(x))
...: mask = r[:,None]!=r
...: out = cartesian_product_transpose(x,x)[mask.ravel()]
100 loops, best of 3: 16.5 ms per loop
In [389]: %%timeit
...: n = len(x)
...: r = np.arange(n)
...: mask = r[:,None]!=r
...: c0 = np.broadcast_to(x[:,None], (n, n))[mask]
...: c1 = np.broadcast_to(x, (n,n))[mask]
...: out = np.column_stack((c0,c1))
100 loops, best of 3: 6.72 ms per loop
This is a case where, unless you really need the speed of numpy, pure Python gives a cleaner solution:
import itertools
y = list(itertools.permutations([0, 1, 2], 2))
# [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
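If a NumPy array like y in the question is needed in the end, the permutations can be materialised and converted. A small sketch, assuming x is a 1-D array:

```python
import itertools
import numpy as np

x = np.array([0, 1, 2])
# permutations() is lazy, so build a list before handing it to np.array
y = np.array(list(itertools.permutations(x.tolist(), 2)))
print(y.shape)  # (6, 2)
```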
I want to generate a 3D matrix in numpy. The code is:
mean_value = np.array([1, 2, 3], dtype=np.float32)
h, w = 5, 5
b = np.ones((h, w, 1), dtype=np.float32) * np.reshape(mean_value, [1, 1, 3])
print(b.shape) # (5, 5, 3)
Is there any quicker way for generating b? Thanks.
For efficiency (memory, performance), simply broadcast with np.broadcast_to for a view output -
np.broadcast_to(mean_value,(h,w,)+mean_value.shape)
Being a view, it has no memory overhead and is therefore virtually free at runtime. Note that the result is read-only; make a copy if you need to write to it.
Let's verify the performance part -
In [45]: mean_value = np.array([1, 2, 3], dtype=np.float32)
...: h, w = 5, 5
In [46]: %timeit np.broadcast_to(mean_value,(h,w,)+mean_value.shape)
100000 loops, best of 3: 3.21 µs per loop
In [47]: mean_value = np.random.rand(10000)
...: h, w = 5000, 5000
In [48]: %timeit np.broadcast_to(mean_value,(h,w,)+mean_value.shape)
100000 loops, best of 3: 3.22 µs per loop
And memory part (being a view) -
In [49]: np.shares_memory(mean_value,np.broadcast_to(mean_value,(h,w,)+mean_value.shape))
Out[49]: True
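To see what "being a view" means in practice, the sketch below compares the broadcast result with the array built in the question and inspects its strides and flags. The names b_view, b_full and b_owned are chosen here for illustration:

```python
import numpy as np

mean_value = np.array([1, 2, 3], dtype=np.float32)
h, w = 5, 5

b_view = np.broadcast_to(mean_value, (h, w) + mean_value.shape)
b_full = np.ones((h, w, 1), dtype=np.float32) * np.reshape(mean_value, [1, 1, 3])

same_values = np.array_equal(b_view, b_full)
# Zero strides on the two leading axes: every (i, j) reuses the same 3 values.
strides = b_view.strides
# broadcast_to returns a read-only view.
writeable = b_view.flags.writeable
# Take a copy when an independent, writable array is required.
b_owned = b_view.copy()
print(same_values, strides, writeable, b_owned.flags.writeable)
```

The copy pays the full memory cost again, so it only makes sense when the array really has to be modified.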
How can I set
d[i,j,i,j] = s[i,j]
using "NumPy" and without for loop?
I've tried the following:
l1=range(M)
l2=range(N)
d[l1,l2,l1,l2] = s[l1,l2]
If you think about it, that would be the same as creating a 2D array of shape (m*n, m*n) and assigning the values from s to its diagonal. To get the final output as 4D, we just need a reshape at the end. That's what is implemented below -
m,n = s.shape
d = np.zeros((m*n,m*n),dtype=s.dtype)
d.ravel()[::m*n+1] = s.ravel()
d.shape = (m,n,m,n)
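A small worked check of the diagonal trick, as a sketch: the 2 x 3 input is chosen here for illustration, and np.einsum is used only to read back the d[i, j, i, j] entries.

```python
import numpy as np

s = np.arange(1, 7).reshape(2, 3)
m, n = s.shape

d = np.zeros((m * n, m * n), dtype=s.dtype)
d.ravel()[::m * n + 1] = s.ravel()    # ravel() of a contiguous array is a view
d = d.reshape(m, n, m, n)

diag = np.einsum('ijij->ij', d)       # picks out d[i, j, i, j]
print(np.array_equal(diag, s), np.count_nonzero(d) == s.size)
```

The step m*n+1 works because consecutive diagonal elements of an (m*n, m*n) array are exactly one row plus one column apart in the flattened layout.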
Runtime test
Approaches -
# @MSeifert's solution
def assign_vals_ix(s):
    m, n = s.shape
    d = np.zeros((m, n, m, n), dtype=s.dtype)
    l1 = range(m)
    l2 = range(n)
    d[np.ix_(l1,l2)*2] = s[np.ix_(l1,l2)]
    return d
# Proposed in this post
def assign_vals(s):
    m,n = s.shape
    d = np.zeros((m*n,m*n),dtype=s.dtype)
    d.ravel()[::m*n+1] = s.ravel()
    return d.reshape(m,n,m,n)
# Using a strides based approach
def assign_vals_strides(a):
    m,n = a.shape
    p,q = a.strides
    d = np.zeros((m,n,m,n),dtype=a.dtype)
    out_strides = (q*(n*m*n+n),(m*n+1)*q)
    d_view = np.lib.stride_tricks.as_strided(d, (m,n), out_strides)
    d_view[:] = a
    return d
Timings -
In [285]: m,n = 10,10
...: s = np.random.rand(m,n)
...: d = np.zeros((m,n,m,n))
...:
In [286]: %timeit assign_vals_ix(s)
10000 loops, best of 3: 21.3 µs per loop
In [287]: %timeit assign_vals_strides(s)
100000 loops, best of 3: 9.37 µs per loop
In [288]: %timeit assign_vals(s)
100000 loops, best of 3: 4.13 µs per loop
In [289]: m,n = 20,20
...: s = np.random.rand(m,n)
...: d = np.zeros((m,n,m,n))
In [290]: %timeit assign_vals_ix(s)
10000 loops, best of 3: 60.2 µs per loop
In [291]: %timeit assign_vals_strides(s)
10000 loops, best of 3: 41.8 µs per loop
In [292]: %timeit assign_vals(s)
10000 loops, best of 3: 35.5 µs per loop
You can use integer array indexing (creating the broadcasted indices with np.ix_):
d[np.ix_(l1,l2)*2] = s[np.ix_(l1,l2)]
On the left-hand side the indices have to be duplicated (you want [i, j, i, j] instead of just [i, j]), which is why I multiplied the tuple returned by np.ix_ by 2.
For example:
>>> d = np.zeros((10, 10, 10, 10), dtype=int)
>>> s = np.arange(100).reshape(10, 10)
>>> l1 = range(3)
>>> l2 = range(5)
>>> d[np.ix_(l1,l2)*2] = s[np.ix_(l1,l2)]
And to make sure that the correct values were assigned:
>>> # Assert equality for the given condition
>>> for i in l1:
...     for j in l2:
...         assert d[i, j, i, j] == s[i, j]
>>> # Interactive tests
>>> d[0, 0, 0, 0], s[0, 0]
(0, 0)
>>> d[1, 2, 1, 2], s[1, 2]
(12, 12)
>>> d[2, 0, 2, 0], s[2, 0]
(20, 20)
>>> d[2, 4, 2, 4], s[2, 4]
(24, 24)
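To make the np.ix_ step less opaque, the sketch below looks at what np.ix_ returns and what multiplying the tuple by 2 does. The names ix, ix4 and shapes are chosen here for illustration:

```python
import numpy as np

l1, l2 = range(3), range(5)
ix = np.ix_(l1, l2)
shapes = [a.shape for a in ix]          # [(3, 1), (1, 5)]
ix4 = ix * 2                            # tuple repetition: (I, J, I, J)

d = np.zeros((10, 10, 10, 10), dtype=int)
s = np.arange(100).reshape(10, 10)
# The four index arrays broadcast to shape (3, 5), addressing d[i, j, i, j].
d[ix4] = s[ix]
print(shapes, len(ix4), d[2, 4, 2, 4])
```

Because the same I and J arrays appear twice, the first and third indices always agree, and so do the second and fourth; off-diagonal entries such as d[1, 2, 1, 3] stay untouched.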
I have a large numpy array. Is there a way to subtract from each element the elements below it, and store the result in a new list/array, without using a loop?
A simple example of what I mean:
a = numpy.array([4,3,2,1])
result = [4-3, 4-2, 4-1, 3-2, 3-1, 2-1] = [1, 2, 3, 1, 2, 1]
Note that the 'real' array I am working with doesn't contain numbers in sequence. This is just to make the example simple.
I know the result should have n(n-1)/2 elements, where n is the size of the array.
Is there a way to do this without using a loop, but by repeating the array in a 'smart' way?
Thanks!
temp = a[:, None] - a
result = temp[np.triu_indices(len(a), k=1)]
Perform all pairwise subtractions to produce temp, including subtracting elements from themselves and subtracting earlier elements from later elements, then use triu_indices to select the results we want. (a[:, None] adds an extra length-1 axis to a.)
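For the example array from the question, the intermediate temp and the selected upper triangle look as follows. This is just the two lines above applied to a = [4, 3, 2, 1], with rows and cols introduced here for readability:

```python
import numpy as np

a = np.array([4, 3, 2, 1])
temp = a[:, None] - a          # temp[i, j] == a[i] - a[j]
rows, cols = np.triu_indices(len(a), k=1)
result = temp[rows, cols]      # keep only the entries with i < j
print(temp)
print(result)                  # [1 2 3 1 2 1]
```

The result has n(n-1)/2 entries, one for each pair with i < j.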
Note that almost all of the runtime is spent constructing result from temp (because triu_indices is slow and using indices to select the upper triangle of an array is slow). If you can use temp directly, you can save a lot of time:
In [13]: a = numpy.arange(2000)
In [14]: %%timeit
....: temp = a[:, None] - a
....:
100 loops, best of 3: 6.99 ms per loop
In [15]: %%timeit
....: temp = a[:, None] - a
....: result = temp[numpy.triu_indices(len(a), k=1)]
....:
10 loops, best of 3: 51.7 ms per loop
Here's a masking-based approach for the extraction after the broadcasted subtractions. The mask itself is also created with broadcasting (double broadcasting powered, so to speak) -
r = np.arange(a.size)
out = (a[:, None] - a)[r[:,None] < r]
Runtime test
Vectorized approaches -
# @user2357112's solution
def pairwise_diff_triu_indices_based(a):
    return (a[:, None] - a)[np.triu_indices(len(a), k=1)]
# Proposed in this post
def pairwise_diff_masking_based(a):
    r = np.arange(a.size)
    return (a[:, None] - a)[r[:,None] < r]
Timings -
In [109]: a = np.arange(2000)
In [110]: %timeit pairwise_diff_triu_indices_based(a)
10 loops, best of 3: 36.1 ms per loop
In [111]: %timeit pairwise_diff_masking_based(a)
100 loops, best of 3: 11.8 ms per loop
Closer look at involved performance parameters
Let's dig a bit deeper into the timings on this setup to study how much the mask-based approach helps. There are two parts to compare: mask creation vs. index creation, and mask-based boolean indexing vs. integer-based indexing.
How much mask creation helps?
In [37]: r = np.arange(a.size)
In [38]: %timeit np.arange(a.size)
1000000 loops, best of 3: 1.88 µs per loop
In [39]: %timeit r[:,None] < r
100 loops, best of 3: 3 ms per loop
In [40]: %timeit np.triu_indices(len(a), k=1)
100 loops, best of 3: 14.7 ms per loop
About 5x improvement on mask creation over index setup.
How much boolean indexing helps against integer based indexing?
In [41]: mask = r[:,None] < r
In [42]: idx = np.triu_indices(len(a), k=1)
In [43]: subs = a[:, None] - a
In [44]: %timeit subs[mask]
100 loops, best of 3: 4.15 ms per loop
In [45]: %timeit subs[idx]
100 loops, best of 3: 10.9 ms per loop
About 2.5x improvement here.
a = [4, 3, 2, 1]
differences = ((x - y) for i, x in enumerate(a) for y in a[i+1:])
for diff in differences:
    # do something with difference.
    pass
Check out itertools.combinations:
from itertools import combinations
l = [4, 3, 2, 1]
result = []
for n1, n2 in combinations(l, 2):
    result.append(n1 - n2)
print(result)
Results in:
[1, 2, 3, 1, 2, 1]
combinations returns an iterator that produces the pairs lazily, so this is good for very large lists :)
Let's say I have two large 2-d numpy arrays of the same dimensions (say 2000x2000). I want to sum them element-wise. I was wondering if there is a faster way than np.add().
Edit: I am adding a similar example of what I am using now. Is there a way to speed up this?
# a and b are the two matrices I already have. Dimension is 2000x2000
# shift is also a list that is previously known
for j in range(100000):
    b=np.roll(b, shift[j] , axis=0)
    a=np.add(a,b)
Approach #1 (Vectorized)
We can use modulus to simulate the circulating behavior of roll/circshift and, with broadcasted indices covering all rows, get a fully vectorized approach (assuming shift is a NumPy array, e.g. np.asarray(shift)), like so -
n = b.shape[0]
idx = n-1 - np.mod(shift.cumsum()[:,None]-1 - np.arange(n), n)
a += b[idx].sum(0)
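To see why the index expression reproduces the repeated rolls: after a cumulative shift c, row k of the rolled array is row (k - c) mod n of the original b. The sketch below checks that on small random inputs. The sizes, the seed and the names a_loop, a_vec and simple_idx are chosen here for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
a = rng.integers(0, 10, (n, n))
b = rng.integers(0, 10, (n, n))
shift = rng.integers(0, n, 8)

# Reference: roll and accumulate in a loop
a_loop, b_loop = a.copy(), b.copy()
for s in shift:
    b_loop = np.roll(b_loop, s, axis=0)
    a_loop += b_loop

# Vectorized: one row-index array per iteration, built by broadcasting
csum = shift.cumsum()
idx = n - 1 - np.mod(csum[:, None] - 1 - np.arange(n), n)
simple_idx = np.mod(np.arange(n) - csum[:, None], n)   # same indices, written directly
a_vec = a + b[idx].sum(0)

print(np.array_equal(idx, simple_idx), np.array_equal(a_loop, a_vec))
```

Note that b[idx] materialises one rolled copy of b per shift, so memory use grows with the number of shifts.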
Approach #2 (Loopy one)
n = b.shape[0]
b_ext = np.vstack((b, b[:-1]))
start_idx = n-1 - np.mod(shift.cumsum()-1,n)
for j in range(start_idx.size):
    a += b_ext[start_idx[j]:start_idx[j]+n]
Colon notation vs using indices for slicing
The idea here is to do minimal work once we are inside the loop. We pre-compute the start row index of each iteration before going into the loop. So, all we need to do inside the loop is slice using colon notation, which gives a view into the array, and add. This should be much better than rolling, which has to compute all of those row indices and results in an expensive copy.
Here's a bit more into the view and copy concepts when slicing with colon and indices -
In [11]: a = np.random.randint(0,9,(10))
In [12]: a
Out[12]: array([8, 0, 1, 7, 5, 0, 6, 1, 7, 0])
In [13]: a[3:8]
Out[13]: array([7, 5, 0, 6, 1])
In [14]: a[[3,4,5,6,7]]
Out[14]: array([7, 5, 0, 6, 1])
In [15]: np.may_share_memory(a, a[3:8])
Out[15]: True
In [16]: np.may_share_memory(a, a[[3,4,5,6,7]])
Out[16]: False
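The same idea applied to the extended array from Approach #2: a window of n rows starting at the pre-computed index equals the rolled array, and it is a view. A minimal sketch on a toy b, where c stands in for one cumulative shift and checks is a name chosen here:

```python
import numpy as np

n = 5
b = np.arange(n * 3).reshape(n, 3)
b_ext = np.vstack((b, b[:-1]))           # 2n-1 rows: b followed by its first n-1 rows

checks = []
for c in range(2 * n):                   # c plays the role of a cumulative shift
    start = n - 1 - np.mod(c - 1, n)
    window = b_ext[start:start + n]      # basic slice, so no copy is made
    checks.append(np.array_equal(window, np.roll(b, c, axis=0))
                  and np.shares_memory(window, b_ext))
print(all(checks))
```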
Runtime test
Function definitions -
def original_loopy_app(a,b):
    for j in range(shift.size):
        b=np.roll(b, shift[j] , axis=0)
        a += b
def vectorized_app(a,b):
    n = b.shape[0]
    idx = n-1 - np.mod(shift.cumsum()[:,None]-1 - np.arange(n), n)
    a += b[idx].sum(0)
def modified_loopy_app(a,b):
    n = b.shape[0]
    b_ext = np.vstack((b, b[:-1]))
    start_idx = n-1 - np.mod(shift.cumsum()-1,n)
    for j in range(start_idx.size):
        a += b_ext[start_idx[j]:start_idx[j]+n]
Case #1:
In [5]: # Setup input arrays
...: N = 200
...: M = 1000
...: a = np.random.randint(11,99,(N,N))
...: b = np.random.randint(11,99,(N,N))
...: shift = np.random.randint(0,N,M)
...:
In [6]: original_loopy_app(a1,b1)
...: vectorized_app(a2,b2)
...: modified_loopy_app(a3,b3)
...:
In [7]: np.allclose(a1, a2) # Verify results
Out[7]: True
In [8]: np.allclose(a1, a3) # Verify results
Out[8]: True
In [9]: %timeit original_loopy_app(a1,b1)
...: %timeit vectorized_app(a2,b2)
...: %timeit modified_loopy_app(a3,b3)
...:
10 loops, best of 3: 107 ms per loop
10 loops, best of 3: 137 ms per loop
10 loops, best of 3: 48.2 ms per loop
Case #2:
In [13]: # Setup input arrays (datasets are exactly 1/10th of original sizes)
...: N = 200
...: M = 10000
...: a = np.random.randint(11,99,(N,N))
...: b = np.random.randint(11,99,(N,N))
...: shift = np.random.randint(0,N,M)
...:
In [14]: %timeit original_loopy_app(a1,b1)
...: %timeit modified_loopy_app(a3,b3)
...:
1 loops, best of 3: 1.11 s per loop
1 loops, best of 3: 481 ms per loop
So, we are looking at 2x+ speedup there with the modified loopy approach!
I'm using numpy einsum to calculate the dot products of an array of column vectors pts, of shape (3,N), with itself, resulting in a matrix dotps, of shape (N,N), with all the dot products. This is the code I use:
dotps = np.einsum('ij,ik->jk', pts, pts)
This works, but I only need the values above the main diagonal, i.e. the upper triangular part of the result without the diagonal. Is it possible to compute only these values with einsum, or in any other way that is faster than using einsum to compute the whole matrix?
My pts array can be quite large so if I could calculate only the values I need that would double my computation speed.
You can slice relevant columns and then use np.einsum -
R,C = np.triu_indices(N,1)
out = np.einsum('ij,ij->j',pts[:,R],pts[:,C])
Sample run -
In [109]: N = 5
...: pts = np.random.rand(3,N)
...: dotps = np.einsum('ij,ik->jk', pts, pts)
...:
In [110]: dotps
Out[110]:
array([[ 0.26529103, 0.30626052, 0.18373867, 0.13602931, 0.51162729],
[ 0.30626052, 0.56132272, 0.5938057 , 0.28750708, 0.9876753 ],
[ 0.18373867, 0.5938057 , 0.84699103, 0.35788749, 1.04483158],
[ 0.13602931, 0.28750708, 0.35788749, 0.18274288, 0.4612556 ],
[ 0.51162729, 0.9876753 , 1.04483158, 0.4612556 , 1.82723949]])
In [111]: R,C = np.triu_indices(N,1)
...: out = np.einsum('ij,ij->j',pts[:,R],pts[:,C])
...:
In [112]: out
Out[112]:
array([ 0.30626052, 0.18373867, 0.13602931, 0.51162729, 0.5938057 ,
0.28750708, 0.9876753 , 0.35788749, 1.04483158, 0.4612556 ])
Optimizing further -
Let's time our approach and see if there's any scope for improvement performance-wise.
In [126]: N = 5000
In [127]: pts = np.random.rand(3,N)
In [128]: %timeit np.triu_indices(N,1)
1 loops, best of 3: 413 ms per loop
In [129]: R,C = np.triu_indices(N,1)
In [130]: %timeit np.einsum('ij,ij->j',pts[:,R],pts[:,C])
1 loops, best of 3: 1.47 s per loop
Staying within the memory constraints, it doesn't look like we can do much about optimizing np.einsum. So, let's shift the focus to np.triu_indices.
For N = 4, we have :
In [131]: N = 4
In [132]: np.triu_indices(N,1)
Out[132]: (array([0, 0, 0, 1, 1, 2]), array([1, 2, 3, 2, 3, 3]))
It seems to be creating a regular pattern, sort of like a shifting one. This could be written with a cumulative sum that has shifts at positions 3 and 5. Thinking generically, we would end up coding it something like this -
def triu_indices_cumsum(N):
    # Length of R and C index arrays (integer division keeps L an int)
    L = (N*(N-1))//2
    # Positions along the R and C arrays that indicate
    # shifting to the next row of the full array
    shifts_idx = np.arange(2,N)[::-1].cumsum()
    # Initialize "shift" arrays for finally leading to R and C
    shifts1_arr = np.zeros(L,dtype=int)
    shifts2_arr = np.ones(L,dtype=int)
    # At shift positions along the shifts array set appropriate values,
    # such that when cumulative summed would lead to desired R and C arrays.
    shifts1_arr[shifts_idx] = 1
    shifts2_arr[shifts_idx] = -np.arange(N-2)[::-1]
    # Finally cumsum to give R, C
    R_arr = shifts1_arr.cumsum()
    C_arr = shifts2_arr.cumsum()
    return R_arr, C_arr
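Before relying on a hand-rolled index generator, it is worth comparing it against np.triu_indices over a range of sizes. The sketch below uses a compact restatement of the cumsum idea above; the names row_steps, col_steps and mismatches are chosen here for illustration:

```python
import numpy as np

def triu_indices_cumsum(N):
    # Compact restatement of the cumsum idea (integer division for Python 3).
    L = N * (N - 1) // 2
    shifts_idx = np.arange(2, N)[::-1].cumsum()   # where each new row starts
    row_steps = np.zeros(L, dtype=int)
    col_steps = np.ones(L, dtype=int)
    row_steps[shifts_idx] = 1
    col_steps[shifts_idx] = -np.arange(N - 2)[::-1]
    return row_steps.cumsum(), col_steps.cumsum()

mismatches = []
for N in range(2, 40):
    R, C = triu_indices_cumsum(N)
    R_ref, C_ref = np.triu_indices(N, 1)
    if not (np.array_equal(R, R_ref) and np.array_equal(C, C_ref)):
        mismatches.append(N)
print(mismatches)
```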
Let's time it for various N's!
In [133]: N = 100
In [134]: %timeit np.triu_indices(N,1)
10000 loops, best of 3: 122 µs per loop
In [135]: %timeit triu_indices_cumsum(N)
10000 loops, best of 3: 61.7 µs per loop
In [136]: N = 1000
In [137]: %timeit np.triu_indices(N,1)
100 loops, best of 3: 17 ms per loop
In [138]: %timeit triu_indices_cumsum(N)
100 loops, best of 3: 16.3 ms per loop
Thus, it looks like for decent N's, the customized cumsum based triu_indices might be worth a look!