pairwise subtraction of arrays in python - python

I have two matrices, A of shape 512*3 and B of shape 1024*3
I want to calculate pairwise subtraction between their rows, so the result would be of shape 512*1024*3
(they are actually arrays of 3D point coordinates : x , y , z and I eventually want to find k nearest points from B to every point in A)
and I can't use for loops. is there any pythonic way to do this?
thank u.

From the reference I linked in my previous comment:
http://scipy.github.io/old-wiki/pages/EricsBroadcastingDoc
You are trying to do this.
Just follow the example, as in:
import numpy as np
np.random.seed(123)
a = np.random.uniform(size=(8,3)) # or (512,3)
b = np.random.uniform(size=(16,3)) # or (1024,3)
diff = a[np.newaxis,:,:]-b[:,np.newaxis,:]
dist = np.sqrt(np.sum(diff**2,axis=-1))

The difference:
diff = A[:, np.newaxis] - B[np.newaxis, :]
The closest k points in B for each point in A:
k = 5
dists = np.sum(np.square(A[:, np.newaxis] - B[np.newaxis, :]), axis=-1)
top_k = np.argpartition(dists, k, axis=1)[:, :k]
That top_k is not sorted by distance, though. You can sort it later or do instead:
top_k = np.argsort(dists, axis=1)[:, :k]
Which is less efficient but simpler.

Related

2D indexing of scipy sparse matrix

import numpy as np
import scipy.sparse
x = np.random.randint(0, 1000, (1000, 100))
# prob better way to do this
d = np.random.random((1000,1000))
d[d < 0.99] = 0
y = scipy.sparse.csr_matrix(d)
What I would like to do is to create a new matrix z containing the values of y at the indices in x.
ie [0, 0] of z should contain the y[0, x[0, 0]]
[0, 1] of z should contain the y[0, x[0, 1]]
%time for i in range(1000): x[i, y[i]].todense()
~247ms
%time for i in range(1000): np.take(x[i].todense(), y[i])
~150ms
both of the above work, but I am looking for a faster method- this is currently the bottleneck on my code.
Please assume that representing the whole scipy.sparse matrix as dense isn't feasible.
edit:
%time z = np.vstack([q.todense()[0, p] for q, p in zip(x, y)])
is ~110ms
The answer seems to be to use an appropriately shaped broadcasting index, as outlined here: How to generate multi-dimensional 2D numpy index using a sub-index for one dimension
(answer deserves more upvotes)!
%time res = y[np.arange(0, 1000).reshape((-1, 1)), x].todense()

Faster way to calculate minimum distance from a list of points with numpy/python [duplicate]

If I have two single-dimensional arrays of length M and N what is the most efficient way to calculate the euclidean distance between all points with the resultant being an NxM array? I'm trying to figure this out with Numpy but am pretty new to it so I'm a little stuck.
Currently I am doing it this way:
def get_distances(x,y):
#compute distances between all points
distances = np.zeros((len(y),len(x)))
for i in range(len(y)):
for j in range(len(x)):
distances[i,j] = (x[j] - y[i])**2
return distances
Suppose you have 1-dimensional positions :
a = np.random.uniform(50,200,5)
b = np.random.uniform(50,200,3)
you can just use broadcasting:
result = np.abs(a[:, None] - b[None, :])
with the result being:
array([[ 44.37361012, 22.20152487, 89.04608885],
[ 42.83825434, 20.66616909, 87.51073307],
[ 0.19806059, 21.97402467, 44.87053932],
[ 8.42276237, 13.74932288, 53.0952411 ],
[ 8.12181467, 30.29389993, 36.55066406]])
So the i, j index is the distance between point i from array 1 and point j of array 2
if you want the result to be NxM in shape you need to exchange a and b:
result = np.abs(b[:, None] - a[None, :])

Aggregating 2 NumPy arrays by confidence

I have 2 np arrays containing values in the interval [0,1].
I want to create the third array, containing the most "confident" values, meaning to take elementwise, the number from the array which is closer to 1 or 0. Consider the following example:
[0.7,0.12,1,0.5]
[0.1,0.99,0.001,0.49]
so my constructed array would be:
[0.1,0.99,1,0.49]
import numpy as np
A = np.array([0.7,0.12,1,0.5])
B = np.array([0.1,0.99,0.001,0.49])
maxi = np.maximum(A,B)
mini = np.minimum(A,B)
# Find where the maximum is closer to 1 than the minimum is to 0
idx = 1-maxi < mini
maxi*idx + mini*~idx
returns
array([ 0.1 , 0.99, 1. , 0.49])
You can try this:
c=np.array([a[i] if min(1-a[i],a[i])<min(1-b[i],b[i]) else b[i] for i in range(len(a))])
The result is:
array([ 0.1 , 0.99, 1. , 0.49])
Another way of stating your "confidence" measure is to ask which of the two numbers are furtest away from 0.5. That is, which of the two numbers x yields the largest abs(0.5 - x). The following solution constructs a 2D array c with the original arrays as columns. Then we construct and apply a boolean mask based on abs(0.5 - c):
import numpy as np
a = np.array([0.7,0.12,1,0.5])
b = np.array([0.1,0.99,0.001,0.49])
# Combine
c = np.concatenate((a, b)).reshape((2, len(a))).T
# Create mask
b_or_a = np.asarray(np.argmax(np.abs((0.5 - c)), axis=1), dtype=bool)
mask = np.zeros(c.shape, dtype=bool)
mask[:, 0] = ~b_or_a
mask[:, 1] = b_or_a
# Applt mask
d = c[mask]
print(d) # [ 0.1 0.99 1. 0.49]

Python numpy array manipulation

i need to manipulate an numpy array:
My Array has the followng format:
x = [1280][720][4]
The array stores image data in the third dimension:
x[0][0] = [Red,Green,Blue,Alpha]
Now i need to manipulate my array to the following form:
x = [1280][720]
x[0][0] = Red + Green + Blue / 3
My current code is extremly slow and i want to use the numpy array manipulation to speed it up:
for a in range(0,719):
for b in range(0,1279):
newx[a][b] = x[a][b][0]+x[a][b][1]+x[a][b][2]
x = newx
Also, if possible i need the code to work for variable array sizes.
Thansk Alot
Use the numpy.mean function:
import numpy as np
n = 1280
m = 720
# Generate a n * m * 4 matrix with random values
x = np.round(np.random.rand(n, m, 4)*10)
# Calculate the mean value over the first 3 values along the 2nd axix (starting from 0)
xnew = np.mean(x[:, :, 0:3], axis=2)
x[:, :, 0:3] gives you the first 3 values in the 3rd dimension, see: numpy indexing
axis=2 specifies, along which axis of the matrix the mean value is calculated.
Slice the alpha channel out of the array, and then sum the array along the RGB axis and divide by 3:
x = x[:,:,:-1]
x_sum = x.sum(axis=2)
x_div = x_sum / float(3)

Efficiently generating a Cauchy matrix from two Numpy arrays

A Cauchy matrix (Wikipedia article) is a matrix determined by two vectors (arrays of numbers). Given two vectors x and y, the Cauchy matrix C generated by them is defined entry-wise as
C[i][j] := 1/(x[i] - y[j])
Given two Numpy arrays x and y, what is an efficient way to generate a Cauchy matrix?
This is the most efficient way I found, using array broadcasting to take advantage of vectorization.
1.0 / (x.reshape((-1,1)) - y)
Edit: #HYRY and #shx2 have suggested that, instead of x.reshape((-1,1)), which makes a copy, you can use x[:,np.newaxis], which returns a view of the same array. #HYRY also suggests 1.0/np.subtract.outer(x,y), which is slightly slower for me but maybe more explicit.
Example:
>>> x = numpy.array([1,2,3,4]) #x
>>> y = numpy.array([5,6,7]) #y
>>>
>>> #transpose x, to nx1
... x = x.reshape((-1,1))
>>> x
array([[1],
[2],
[3],
[4]])
>>>
>>> #array of differences x[i] - y[j]
... #an nx1 array minus a 1xm array is an nxm array
... diff_matrix = x-y
>>> diff_matrix
array([[-4, -5, -6],
[-3, -4, -5],
[-2, -3, -4],
[-1, -2, -3]])
>>>
>>> #apply the multiplicative inverse to each entry
... cauchym = 1.0/diff_matrix
>>> cauchym
array([[-0.25 , -0.2 , -0.16666667],
[-0.33333333, -0.25 , -0.2 ],
[-0.5 , -0.33333333, -0.25 ],
[-1. , -0.5 , -0.33333333]])
I tried a few other methods, all of which were significantly slower.
This is the naive approach, which costs list comprehension:
cauchym = numpy.array([[ 1.0/(x_i-y_j) for y_j in y] for x_i in x])
This one generates the matrix as a 1-dimensional array (saving the cost of nested Python lists) and reshapes it to a matrix afterward. It also moves the division to a single Numpy operation:
cauchym = 1.0/numpy.array([(x_i-y_j) for x_i in x for y_j in y]).reshape([len(x),len(y)])
Using numpy.repeat and numpy.tile (which respectively tile the array horizontally and vertically). This way makes unnecessary copies:
lenx = len(x)
leny = len(y)
xm = numpy.repeat(x,leny) #the i'th row is s_i
ym = numpy.tile(y,lenx)
cauchym = (1.0/(xm-ym)).reshape([lenx,leny]);
I created a function hope it helps u to understand in a better way.
# Creating a function in order to form a cauchy matrix
def cauchy_matrix(arr1,arr2):
"""
Enter two arrays in order to get a cauchy matrix.The input array should be a 1-D array.
arr1 = First 1-D array
arr2 = Second 1-D array
It returns the cauchy matrix having shape equal to m*n, where m is size of arr1 and n is size of arr2.
"""
my_list = []
try:
for i in range(len(arr1)):
for j in range(len(arr2)):
z = 1/(arr1[i]-arr2[j])
my_list.append(z)
return np.array(my_list).reshape(arr1.shape[0],arr2.shape[0])
except ZeroDivisionError:
print("Check if both the arrays has '0' as one of it's element. One array can have a zero but both the arrays having '0' is not acceptable!")

Categories

Resources