Difference between ALL 1D points in array with python diff()?

Looking for tips on how to write a function (or for a pointer to an existing one) that calculates the difference between all entries in an array, i.e. an implementation of diff() that covers every combination of entries, not just consecutive pairs.
Here is an example of what I want:
# example array
a = [3, 2, 5, 1]
Now we want to apply a function which returns the difference between all combinations of entries. Given that len(a) == 4, the total number of combinations is N*(N-1)/2 = 6 for N = 4 (if the length of a were 5, the total would be 10, and so on). So the function should return the following for vector a:
result = some_function(a)
print result
array([-1, 2, -2, 3, -1, -4])
So the 'function' would be similar to pdist, but instead of calculating the Euclidean distance it should simply calculate the difference between the Cartesian coordinates along one axis, e.g. the z-axis if we treat the entries of a as coordinates. As noted, I need the sign of each difference to know which side of the axis each point lies on.
Thanks.

Something like this?
>>> import itertools as it
>>> a = [3, 2, 5, 1]
>>> [y - x for x, y in it.combinations(a, 2)]
[-1, 2, -2, 3, -1, -4]
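Since the question mentions pdist: scipy's pdist also accepts an arbitrary callable as the metric, so the signed difference can be computed directly. A minimal sketch (the Python-level callable makes this slow on large inputs, so the vectorized approach below is still preferable):
>>> import numpy as np
>>> from scipy.spatial.distance import pdist
>>> a = np.array([3, 2, 5, 1], dtype=float)
>>> pdist(a[:, None], lambda u, v: v[0] - u[0])  # pairs (i, j) with i < j, so v - u matches y - x above
array([-1.,  2., -2.,  3., -1., -4.])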

So I tried out the methods proposed by wim and Joe (and their combined suggestion), and this is what I came up with:
import itertools as it
import numpy as np
a = np.random.randint(10, size=1000)
def cartesian_distance(x):
    return np.subtract.outer(x,x)[np.tril_indices(x.shape[0],k=-1)]
%timeit cartesian_distance(a)
%timeit [y - x for x, y in it.combinations(a, 2)]
10 loops, best of 3: 97.9 ms per loop
1 loops, best of 3: 333 ms per loop
For smaller inputs:
a = np.random.randint(10, size=10)
def cartesian_distance(x):
    return np.subtract.outer(x,x)[np.tril_indices(x.shape[0],k=-1)]
%timeit cartesian_distance(a)
%timeit [y - x for x, y in it.combinations(a, 2)]
10000 loops, best of 3: 78.6 µs per loop
10000 loops, best of 3: 40.1 µs per loop

Related

Matrix element wise multiplication with shifted columns

Say I have two arrays, A and B, where A is 2x3, with an element-wise multiplication defined between them.
I want to do the element-wise multiplication in a convolution-like manner, i.e. cyclically move every column one step right at a time; for example, after one shift column 1 becomes column 2 and column 3 becomes column 1.
This should yield a (3 by 2 by 3) array: one 2x3 result for each of the 3 possible shifts.
We can concatenate A with a slice of itself and then extract sliding windows from the result. To get those windows, we can leverage scikit-image's view_as_windows, which is built on np.lib.stride_tricks.as_strided. Then multiply those windows with B for the final output.
Hence, we will have one vectorized solution like so -
In [70]: from skimage.util.shape import view_as_windows
In [71]: A1 = np.concatenate((A,A[:,:-1]),axis=1)
In [74]: view_as_windows(A1,A.shape)[0]*B
Out[74]:
array([[[1, 0, 3],
        [0, 0, 6]],

       [[2, 0, 1],
        [0, 0, 4]],

       [[3, 0, 2],
        [0, 0, 5]]])
We can also leverage multiple cores with the numexpr module for the final step of broadcasted multiplication, which should do better on larger arrays. Hence, for the sample case, it would be -
In [53]: import numexpr as ne
In [54]: w = view_as_windows(A1,A.shape)[0]
In [55]: ne.evaluate('w*B')
Out[55]:
array([[[1, 0, 3],
        [0, 0, 6]],

       [[2, 0, 1],
        [0, 0, 4]],

       [[3, 0, 2],
        [0, 0, 5]]])
Timings on large arrays, comparing the two proposed methods -
In [56]: A = np.random.rand(500,500)
...: B = np.random.rand(500,500)
In [57]: A1 = np.concatenate((A,A[:,:-1]),axis=1)
...: w = view_as_windows(A1,A.shape)[0]
In [58]: %timeit w*B
...: %timeit ne.evaluate('w*B')
1 loop, best of 3: 422 ms per loop
1 loop, best of 3: 228 ms per loop
Squeezing the best out of the strided-based method
If you really want to squeeze the best out of the strided-view-based approach, go with the original np.lib.stride_tricks.as_strided version to avoid the function-call overhead of view_as_windows -
def vaw_with_as_strided(A,B):
    A1 = np.concatenate((A,A[:,:-1]),axis=1)
    s0,s1 = A1.strides
    S = (A.shape[1],)+A.shape
    w = np.lib.stride_tricks.as_strided(A1,shape=S,strides=(s1,s0,s1))
    return w*B
Comparing against @Paul Panzer's array-assignment based one (pp, listed in the benchmark below), the crossover seems to be at 19x19 shaped arrays -
In [33]: n = 18
...: A = np.random.rand(n,n)
...: B = np.random.rand(n,n)
In [34]: %timeit vaw_with_as_strided(A,B)
...: %timeit pp(A,B)
10000 loops, best of 3: 22.4 µs per loop
10000 loops, best of 3: 21.4 µs per loop
In [35]: n = 19
...: A = np.random.rand(n,n)
...: B = np.random.rand(n,n)
In [36]: %timeit vaw_with_as_strided(A,B)
...: %timeit pp(A,B)
10000 loops, best of 3: 24.5 µs per loop
10000 loops, best of 3: 24.5 µs per loop
So, for anything smaller than 19x19, array-assignment seems to be better and for larger than those, strided-based one should be the way to go.
Just a note on view_as_windows/as_strided. Neat as these functions are, it is useful to know that they have a rather pronounced constant overhead. Here is a comparison between @Divakar's view_as_windows based solution (vaw) and a copy-reshape based approach by me.
As you can see, vaw is not very fast on small to medium sized operands and only begins to shine above array size 30x30.
Code:
from simple_benchmark import BenchmarkBuilder, MultiArgument
import numpy as np
from skimage.util.shape import view_as_windows
B = BenchmarkBuilder()
@B.add_function()
def vaw(A,B):
    A1 = np.concatenate((A,A[:,:-1]),axis=1)
    w = view_as_windows(A1,A.shape)[0]
    return w*B

@B.add_function()
def pp(A,B):
    m,n = A.shape
    aux = np.empty((n,m,2*n),A.dtype)
    AA = np.concatenate([A,A],1)
    # fill aux with a one-element stagger per plane, so that plane k
    # ends up holding A cyclically shifted by k columns:
    aux.reshape(-1)[:-n].reshape(n,-1)[...] = AA.reshape(-1)[:-1]
    return aux[...,:n]*B

@B.add_arguments('array size')
def argument_provider():
    for exp in range(4, 16):
        dim_size = int(1.4**exp)
        a = np.random.rand(dim_size,dim_size)
        b = np.random.rand(dim_size,dim_size)
        yield dim_size, MultiArgument([a,b])
r = B.run()
r.plot()
import pylab
pylab.savefig('vaw.png')
Run a for loop over the number of columns and use np.roll() along axis=1 to shift your columns, then do the element-wise multiplication.
Refer to the accepted answer in this reference.
Hope this helps.
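A minimal sketch of that loop (using a negative shift so the slices come out in the same order as the view_as_windows output above; flip the sign for the opposite rotation):
import numpy as np

def roll_multiply(A, B):
    # one element-wise product per cyclic column shift of A
    return np.stack([np.roll(A, -shift, axis=1) * B
                     for shift in range(A.shape[1])])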
I can actually pad the array on both sides with 2 columns (to get a 2x5 array) and run a conv2 with 'b' as the kernel; I think that's more efficient.

Numpy: Index 3D array with index of last axis stored in 2D array

I have an ndarray of shape (z, y, x) containing values. I am trying to index this array with another ndarray of shape (y, x) that contains the z-index of the value I am interested in.
import numpy as np
val_arr = np.arange(27).reshape(3,3,3)
z_indices = np.array([[1,0,2],
                      [0,0,1],
                      [2,0,1]])
Since my arrays are rather large I tried to use np.take to avoid unnecessary copies of the array but just can't wrap my head around indexing 3-dimensional arrays with it.
How do I have to index val_arr with z_indices to get the values at the desired z-axis position? The expected outcome would be:
result_arr = np.array([[9,1,20],
                       [3,4,14],
                       [24,7,17]])
You can use choose to make the selection:
>>> z_indices.choose(val_arr)
array([[ 9,  1, 20],
       [ 3,  4, 14],
       [24,  7, 17]])
The function choose is incredibly useful, but can be somewhat tricky to make sense of. Essentially, given an array (val_arr) we can make a series of choices (z_indices) from each n-dimensional slice along the first axis.
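A tiny 1-D illustration of those semantics, with made-up values: element i of the index array selects choices[z[i], i].
>>> choices = np.array([[10, 11, 12],
...                     [20, 21, 22]])
>>> np.array([1, 0, 1]).choose(choices)
array([20, 11, 22])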
Also: any fancy indexing operation will create a new array rather than a view of the original data. It is not possible to index val_arr with z_indices without creating a brand new array.
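A quick toy check of that (not from the original answer): writing into the fancy-indexed result leaves val_arr untouched.
>>> fancy = val_arr[np.array([0, 2])]  # fancy indexing allocates new data
>>> fancy[0, 0, 0] = -1
>>> val_arr[0, 0, 0]
0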
With readability, np.choose definitely looks great.
If performance is of the essence, you can calculate the linear indices and then extract the elements with np.take, or from a flattened view via .ravel(). The implementation would look something like this -
def linidx_take(val_arr,z_indices):
    # Get number of rows and columns in each z-slice
    _,nR,nC = val_arr.shape
    # Get linear indices and thus extract elements with np.take
    idx = nR*nC*z_indices + nC*np.arange(nR)[:,None] + np.arange(nC)
    return np.take(val_arr,idx) # Or val_arr.ravel()[idx]
Runtime tests and verify results -
Ogrid based solution from here is made into a generic version for these tests, like so :
In [182]: def ogrid_based(val_arr,z_indices):
     ...:     v_shp = val_arr.shape
     ...:     y,x = np.ogrid[0:v_shp[1], 0:v_shp[2]]
     ...:     return val_arr[z_indices, y, x]
     ...:
Case #1: Smaller datasize
In [183]: val_arr = np.random.rand(30,30,30)
...: z_indices = np.random.randint(0,30,(30,30))
...:
In [184]: np.allclose(z_indices.choose(val_arr),ogrid_based(val_arr,z_indices))
Out[184]: True
In [185]: np.allclose(z_indices.choose(val_arr),linidx_take(val_arr,z_indices))
Out[185]: True
In [187]: %timeit z_indices.choose(val_arr)
1000 loops, best of 3: 230 µs per loop
In [188]: %timeit ogrid_based(val_arr,z_indices)
10000 loops, best of 3: 54.1 µs per loop
In [189]: %timeit linidx_take(val_arr,z_indices)
10000 loops, best of 3: 30.3 µs per loop
Case #2: Bigger datasize
In [191]: val_arr = np.random.rand(300,300,300)
...: z_indices = np.random.randint(0,300,(300,300))
...:
In [192]: z_indices.choose(val_arr) # Seems like there is some limitation here with bigger arrays.
Traceback (most recent call last):
File "<ipython-input-192-10c3bb600361>", line 1, in <module>
z_indices.choose(val_arr)
ValueError: Need between 2 and (32) array objects (inclusive).
In [194]: np.allclose(linidx_take(val_arr,z_indices),ogrid_based(val_arr,z_indices))
Out[194]: True
In [195]: %timeit ogrid_based(val_arr,z_indices)
100 loops, best of 3: 3.67 ms per loop
In [196]: %timeit linidx_take(val_arr,z_indices)
100 loops, best of 3: 2.04 ms per loop
If you have numpy >= 1.15.0 you can use numpy.take_along_axis. Since z_indices selects along the first axis of val_arr here, give the index array a matching leading axis:
result_array = np.take_along_axis(val_arr, z_indices[np.newaxis, ...], axis=0)[0]
That gives you the result you want in one neat line of code. Note the shape of the indices array: it needs the same number of dimensions as val_arr (and, here, the same size in the last two dimensions).
Inspired by this thread, using np.ogrid:
y,x = np.ogrid[0:3, 0:3]
print [z_indices, y, x]
[array([[1, 0, 2],
        [0, 0, 1],
        [2, 0, 1]]),
 array([[0],
        [1],
        [2]]),
 array([[0, 1, 2]])]
print val_arr[z_indices, y, x]
[[ 9  1 20]
 [ 3  4 14]
 [24  7 17]]
I have to admit that multidimensional fancy indexing can be messy and confusing :)

Order of elements in a numpy array

I have a 2-d array of shape (n, 3), say arr1. Now consider a second array, arr2, with the same shape and the same rows as arr1, except that the rows are in a different order. I want to get, for each row of arr2, its index in arr1. I am looking for the fastest Pythonic way to do this, as n is of the order of 10,000.
For example:
arr1 = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2 = numpy.array([[4, 5, 6], [7, 8, 9], [1, 2, 3]])
ind = [1, 2, 0]
Note that the row elements need not be integers. In fact they are floats.
I have found related answers that use numpy.searchsorted but they work for 1-D arrays only.
If you are sure that arr2 is a permutation of arr1, you can use sorting to get the indices:
import numpy as np
n = 100000
a1 = np.random.randint(0, 100, size=(n, 3))
a2 = a1[np.random.permutation(np.arange(n))]
idx1 = np.lexsort(a1.T)
idx2 = np.lexsort(a2.T)
# idx[i] is the position in a2 of row a1[i]:
idx = idx2[np.argsort(idx1)]
np.all(a1 == a2[idx])
If they don't have exactly the same values, you can use a cKDTree from scipy:
n = 100000
a1 = np.random.uniform(0, 100, size=(n, 3))
a2 = a1[np.random.permutation(np.arange(n))] + np.random.normal(0, 1e-8, size=(n, 3))
from scipy import spatial
tree = spatial.cKDTree(a2)
dist, idx = tree.query(a1)
np.allclose(a1, a2[idx])
Before we begin, you should mention whether duplicates can exist in your list.
That said, the method I would use is numpy's where function within a list comprehension, requiring a full-row match, like so:
[numpy.where((arr1 == x).all(axis=1))[0][0] for x in arr2]
Though this might not be the fastest way. Another method might include building a dictionary from the rows in arr1 somehow and then looking them up with arr2.
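For completeness, a rough sketch of that dictionary idea (it assumes rows match exactly and are unique, which is fragile with floats):
lookup = {tuple(row): i for i, row in enumerate(arr1)}
ind = [lookup[tuple(row)] for row in arr2]
For the example above this returns ind == [1, 2, 0], at the cost of a Python-level pass over both arrays.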
While this is very similar to Find indexes of matching rows in two 2-D arrays, I don't have the reputation to leave a comment there.
However, based on that thread there appear to be two clear possibilities for a large matrix like yours:
def find_rows_searchsorted(a, b):
    dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    a_view = np.ascontiguousarray(a).view(dt).ravel()
    b_view = np.ascontiguousarray(b).view(dt).ravel()
    sort_b = np.argsort(b_view)
    where_in_b = np.searchsorted(b_view, a_view, sorter=sort_b)
    return np.take(sort_b, where_in_b)

def find_rows_iterative(a, b):
    answer = np.empty(a.shape[0], dtype=int)
    for idx, row in enumerate(a):
        answer[idx] = np.where(np.equal(b, row).all(1))[0]
    return answer

def find_rows_list_comprehension(a, b):
    return [np.where((b == x).all(1))[0][0] for x in a]
A little timing with a matrix of 10000 rows shows that the searchsorted based method is significantly faster than the brute-force iterative method:
arr1 = np.random.randn(10000, 3)
new_inds = np.arange(arr1.shape[0])
np.random.shuffle(new_inds)
arr2 = arr1[new_inds, :]
np.array_equal(find_rows_searchsorted(arr2, arr1), new_inds)
>> True
np.array_equal(find_rows_iterative(arr2, arr1), new_inds)
>> True
np.array_equal(find_rows_list_comprehension(arr2, arr1), new_inds)
>> True
%timeit find_rows_iterative(arr2, arr1)
>> 1 loops, best of 3: 2.62 s per loop
%timeit find_rows_list_comprehension(arr2, arr1)
>> 1 loops, best of 3: 1.61 s per loop
%timeit find_rows_searchsorted(arr2, arr1)
>> 100 loops, best of 3: 6.53 ms per loop
Based on HYRY's great response I also added lexsort and cKDTree tests, as well as a test of argsort on record arrays.
def find_rows_lexsort(a, b):
    idx1 = np.lexsort(a.T)
    idx2 = np.lexsort(b.T)
    return idx2[np.argsort(idx1)]

def find_rows_argsort(a, b):
    a_rec = np.core.records.fromarrays(a.transpose())
    b_rec = np.core.records.fromarrays(b.transpose())
    idx1 = a_rec.argsort(order=a_rec.dtype.names).argsort()
    return b_rec.argsort(order=b_rec.dtype.names)[idx1]

def find_rows_kdball(a, b):
    from scipy import spatial
    tree = spatial.cKDTree(b)
    _, idx = tree.query(a)
    return idx
%timeit find_rows_lexsort(arr2, arr1)
>> 100 loops, best of 3: 4.63 ms per loop
%timeit find_rows_argsort(arr2, arr1)
>> 100 loops, best of 3: 7.37 ms per loop
%timeit find_rows_kdball(arr2, arr1)
>> 100 loops, best of 3: 18.5 ms per loop

Better way than T.eye in theano

The problem: given an arbitrary 1-d vector y, expand it into d = len(y) basis vectors of dimension n.
The rule of the expansion: each element of y is the index of a column in the n*n identity matrix.
For example:
y = [3, 0, 1]
n = 4
Since n = 4, we have the 4*4 identity matrix:
[1, 0, 0, 0]
[0, 1, 0, 0]
[0, 0, 1, 0]
[0, 0, 0, 1]
Expanding each element of y using this rule, we get:
[0, 1, 0]
[0, 0, 1]
[0, 0, 0]
[1, 0, 0]
I want to solve this problem using theano, with very large n (>50k) and very long y (>10k), so efficiency is important.
The solution using numpy is trivial, but the numpy.eye call may cost too much, and another method may be faster. Compare the following methods:
import numpy as np
import theano
import theano.tensor as T
n = 25500
y_value = np.asarray([2, 0, 10, 4], dtype='int32')
# method 1
%timeit np.eye(n)[y_value]
# 10 loops, best of 3: 56.9 ms per loop
# method 2
def vec(i):
    e = np.zeros(n)
    e[i] = 1
    return e
%timeit np.vstack([vec(i) for i in y_value])
# 100 loops, best of 3: 16.3 ms per loop
However, the second method may not work in theano, since looping over a symbolic variable is not trivial. Is there a method that can avoid using T.eye?
y_value can be an arbitrary 1-d vector.
You can try another approach. On my computer:
>>> %timeit np.eye(n)[y_value]
1 loops, best of 3: 544 ms per loop
However, you don't need to create the whole array if you know in advance the rows you want. You can do this:
>>> n = 25500
>>> n_rows = y_value.size
>>> r = np.zeros((n_rows, n))
>>> r[range(n_rows), y_value] = 1
You create a much smaller array, only y x n where y is the size of your index vector, and populate one entry per row. The timing on my computer is:
>>> %%timeit
..: r = np.zeros((n_rows, n))
..: r[range(n_rows), y_value] = 1
100 loops, best of 3: 3.8 ms per loop
A 151x speedup on my laptop.
Additionally, if you don't need the all-zero columns at the rear (along the x-axis), you could do:
>>> %%timeit
..: r = np.zeros((n_rows, y_value.max()+1))
..: r[range(n_rows), y_value] = 1
100000 loops, best of 3: 16 µs per loop
Which is even faster, but the resulting array is y x (ymax+1), in this case 4 x 11, which might not be what you want.
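Since the question asks for theano, the same zeros-plus-assignment trick translates directly to symbolic code with T.set_subtensor, avoiding T.eye entirely. A sketch (assuming y is an int32 vector as in the question):
import theano
import theano.tensor as T

n = 25500
y = T.ivector('y')
# allocate the small y-by-n result symbolically and set one 1 per row:
r = T.zeros((y.shape[0], n))
r = T.set_subtensor(r[T.arange(y.shape[0]), y], 1)
expand = theano.function([y], r)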

Efficient way to create an array that is a sequence of variable length ranges in numpy

Suppose I have an array
import numpy as np
x=np.array([5,7,2])
I want to create an array that contains a sequence of ranges stacked together, with the length of each range given by x:
y=np.hstack([np.arange(1,n+1) for n in x])
Is there some way to do this without the speed penalty of a list comprehension or looping? (x could be a very large array.)
The result should be
y == np.array([1,2,3,4,5,1,2,3,4,5,6,7,1,2])
You could use accumulation:
def my_sequences(x):
    x = x[x != 0]  # you can skip this if you do not have 0s in x.
    y = np.cumsum(x, dtype=np.intp)
    # Create result array, filled with ones:
    a = np.ones(y[-1], dtype=np.intp)
    # Set all beginnings to - previous length:
    a[y[:-1]] -= x[:-1]
    # and just add it all up (btw. np.add.accumulate is equivalent):
    return np.cumsum(a, out=a)  # here, in-place should be safe.
(One word of caution: if your result array would be larger than np.iinfo(np.intp).max, this might, with some bad luck, return wrong results instead of erroring out cleanly...)
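To see why this works, here is a trace of the intermediate arrays for the example x = [5, 7, 2]:
>>> x = np.array([5, 7, 2])
>>> y = np.cumsum(x)  # ends of the ranges within the output
>>> y
array([ 5, 12, 14])
>>> a = np.ones(y[-1], dtype=np.intp)
>>> a[y[:-1]] -= x[:-1]  # so the cumsum drops back to 1 at each range start
>>> a
array([ 1,  1,  1,  1,  1, -4,  1,  1,  1,  1,  1,  1, -6,  1])
>>> np.cumsum(a)
array([ 1,  2,  3,  4,  5,  1,  2,  3,  4,  5,  6,  7,  1,  2])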
And because everyone always wants timings (compared to Ophion's method):
In [11]: x = np.random.randint(0, 20, 1000000)
In [12]: %timeit ua,uind=np.unique(x,return_inverse=True);a=[np.arange(1,k+1) for k in ua];np.concatenate(np.take(a,uind))
1 loops, best of 3: 753 ms per loop
In [13]: %timeit my_sequences(x)
1 loops, best of 3: 191 ms per loop
Of course the my_sequences function does not degrade when the values in x get large.
First idea: prevent multiple calls to np.arange, and use concatenate, which should be much faster than hstack:
>>> import numpy as np
>>> x = np.array([5,7,2])
>>> a = np.arange(1,x.max()+1)
>>> np.hstack([a[:k] for k in x])
array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])
>>> np.concatenate([a[:k] for k in x])
array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])
If there are many nonunique values this seems more efficient:
>>> ua,uind = np.unique(x,return_inverse=True)
>>> a = [np.arange(1,k+1) for k in ua]
>>> np.concatenate(np.take(a,uind))
array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])
Some timings for your case:
x=np.random.randint(0,20,1000000)
Original code
#Using hstack
%timeit np.hstack([np.arange(1,n+1) for n in x])
1 loops, best of 3: 7.46 s per loop
#Using concatenate
%timeit np.concatenate([np.arange(1,n+1) for n in x])
1 loops, best of 3: 5.27 s per loop
First code:
#Using hstack
%timeit a=np.arange(1,x.max()+1);np.hstack([a[:k] for k in x])
1 loops, best of 3: 3.03 s per loop
#Using concatenate
%timeit a=np.arange(1,x.max()+1);np.concatenate([a[:k] for k in x])
10 loops, best of 3: 998 ms per loop
Second code:
%timeit ua,uind=np.unique(x,return_inverse=True);a=[np.arange(1,k+1) for k in ua];np.concatenate(np.take(a,uind))
10 loops, best of 3: 522 ms per loop
Looks like we gain a 14x speedup with the final code.
Small sanity check:
>>> ua,uind = np.unique(x,return_inverse=True)
>>> a = [np.arange(1,k+1) for k in ua]
>>> out = np.concatenate(np.take(a,uind))
>>> out.shape
(9498409,)
>>> np.sum(x)
9498409
