iterating over numpy arrays - python

I am having a very difficult time vectorizing this; I can't seem to think about the math that way yet. I have this right now:
#!/usr/bin/env python
import numpy as np
import math
grid = np.zeros((2,2))
aList = np.arange(1,5).reshape(2,2)
i,j = np.indices((2,2))
iArray = (i - aList[:,0:1])
jArray = (j - aList[:,1:2])
print(np.power(np.power(iArray, 2) + np.power(jArray, 2), .5))
My print out looks like this:
[[ 2.23606798 1.41421356]
[ 4.47213595 3.60555128]]
What I am trying to do is take a 2D array of pixel values, grid, and say how far each pixel is from a list of important pixels, aList.
# # #
# # #
* # *
An example: if the *s at (0,2) and (2,2) are important pixels and I am currently at the # pixel (2,0), my value for that # pixel would be:
[(0-2)^2 + (2-0)^2]^.5 + [(2-2)^2 + (0-2)^2]^.5 = sqrt(8) + 2 ≈ 4.83
All grid does is hold pixel values, so I need the index of each pixel to associate a distance with it. My aList array already holds [x,y] coordinates, so that part is easy. I think I have two issues right now:
1. I am not getting the indices correctly
2. I am not looping over the coordinates in aList properly

With a little help from broadcasting, I get this, with data based on your last example:
import numpy as np
grid = np.zeros((3, 3))
aList = np.array([[2, 0], [2, 2]])
important_rows, important_cols = aList.T
rows, cols = np.indices(grid.shape)
dist = np.sqrt((important_rows - rows.ravel()[:, None])**2 +
               (important_cols - cols.ravel()[:, None])**2).sum(axis=-1)
dist = dist.reshape(grid.shape)
>>> dist
array([[ 4.82842712, 4.47213595, 4.82842712],
[ 3.23606798, 2.82842712, 3.23606798],
[ 2. , 2. , 2. ]])
You can make this more memory efficient by doing:
important_rows, important_cols = aList.T
rows, cols = np.meshgrid(np.arange(grid.shape[0]),
                         np.arange(grid.shape[1]),
                         sparse=True, indexing='ij')
dist2 = np.sqrt((rows[..., None] - important_rows)**2 +
                (cols[..., None] - important_cols)**2).sum(axis=-1)
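As a quick sanity check (assuming the arrays defined above are still in scope), the two versions agree; dist2 already has grid.shape thanks to the sparse meshgrid broadcasting:
print(np.allclose(dist, dist2))  # True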

My approach:
import numpy as np
n = 3
aList = np.zeros([n,n])
distance = np.zeros([n,n])
I,J = np.indices([n,n])
aList[2,2] = 1; aList[0,2] = 1  # Important pixels
important = np.where(aList == 1) #Where the important pixels are
for i,j in zip(I[important],J[important]):  # This part could be improved...
    distance += np.sqrt((i-I)**2+(j-J)**2)
print(distance)
The last 'for' could be improved, but if you have only a few important pixels, the performance will be good...
Checking with:
import matplotlib.pyplot as plt
n = 500
...
aList[249+100,349] = 1; aList[249-100,349] = 1 ;aList[249,50] = 1
...
plt.plot(I[important],J[important],'rx',markersize=20)
plt.imshow(distance.T, origin='lower',
           cmap=plt.cm.gray)
plt.show()
The result looks convincing: the plot shows the distance field in grayscale, with the important pixels marked by red crosses.

How to randomly select an element in a 2D numpy array in the same row or column of specified element using vectorization?

I have a 2D NumPy array (say arr1) containing the values 0 or 1 as floats. Let the size of arr1 be h x w. I have another NumPy array (say arr2) of size n x 2, where each row specifies a location (row and column index) in arr1. For every arr1 location (say (x1, y1)) specified by a row of arr2, I need to select another location (say (x2, y2)) in arr1 which is in the same row or column as (x1, y1), such that at least one cell between (x1, y1) and (x2, y2), including these two cells, has the value 1 in arr1.
How can I achieve this efficiently? Typical values of h, w, n are 800, 800, 500000 respectively, so I would like to achieve this without any for loops.
Example:
import numpy
h=4
w=4
n=3
arr1 = numpy.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
])
arr2 = numpy.array([
    [1, 1],
    [2, 2],
    [0, 2],
])
Expected solution:
First row of arr2 is (1,1). Valid solutions in the 2nd column are (0,1), (2,1), (3,1), and valid solutions in the 2nd row are (1,0), (1,2), (1,3). So the code should randomly pick one of these.
Similarly, for the second row of arr2, which is (2,2), valid solutions are (0,2), (1,2), (3,2), (2,0), (2,1), (2,3).
For third row of arr2 which is (0,2), valid solutions are (0,0),(0,1),(1,2),(2,2),(3,2). Note that (0,3) is not a valid solution since there is no cell containing 1 between (0,2) and (0,3).
Note that if a row in arr2 is (0,3), there is no cell in that column with the value 1. Such cases are extremely rare and in such cases, it suffices to pick a location that is sufficiently far away in that column. It is not necessary to detect such cases and pick a location in the same row.
PS: I have a solution that iterates over each row of arr2, but that takes over 1 minute. I am looking for a vectorized solution.
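For reference, a minimal sketch of what such a per-point loop looks like (pick_naive and its signature are illustrative, not the actual code from the question):
import numpy as np

def pick_naive(arr1, x1, y1, rng=np.random.default_rng()):
    # Collect every valid (x2, y2) in the same row or column as (x1, y1);
    # the inclusive segment between the two cells must contain a 1.
    h, w = arr1.shape
    candidates = []
    for x2 in range(h):  # same column, vary the row
        if x2 != x1 and arr1[min(x1, x2):max(x1, x2) + 1, y1].any():
            candidates.append((x2, y1))
    for y2 in range(w):  # same row, vary the column
        if y2 != y1 and arr1[x1, min(y1, y2):max(y1, y2) + 1].any():
            candidates.append((x1, y2))
    return candidates[rng.integers(len(candidates))] if candidates else None

On the example above, pick_naive(arr1, 1, 1) draws uniformly from exactly the six locations listed for (1,1).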
I tried writing a vectorized solution, but it still needs masked reduce operations, which makes things much slower than plain broadcasting.
The basic idea behind what I did is to rotate every row by the column index in arr2 and find the min/max index of a non-zero element. Then you select a random number between those two (all possible rotated solutions) and rotate it back to get the actual solution for that row.
You then do the same on the transposed arr1 and arr2 to get the solution by columns, and finally you randomly select either the row or the column solution.
Still, this takes ~26 s to execute.
PS: If no solution is possible for the initial point (both its row and column are all zeros), a -1 will appear in those coordinates.
import time
import numpy as np
def get_same_row_random(arr, pos, shift):
    width = arr.shape[1]
    app1 = -np.ones_like(arr, dtype=np.int64)
    app2 = np.zeros_like(arr, dtype=bool)
    w1 = np.where(arr == 1)
    app1[w1] = w1[1]   # column index of every 1
    app2[w1] = True    # boolean mask of the 1s
    res = app1[pos]
    mask = app2[pos]
    mask[np.arange(shift.size), shift] = False  # Can't select original coordinates
    res = (res - shift.reshape(-1, 1)) % width  # rotate each row so the start is at 0
    res_min = np.min(res, initial=width, where=mask, axis=1)
    res_max = np.max(res, initial=-1, where=mask, axis=1)
    w = (res_min == width)  # rows with no 1 at all
    res_min[w] = -1
    res_max += 1
    res = (np.random.randint(res_min, res_max) + shift) % width
    res[w] = -1
    return res
h, w, n = 800,800, 500000
arr1 = np.random.randint(2, size=(h,w))
arr2_1 = np.random.randint(h, size=n)
arr2_2 = np.random.randint(w, size=n)
arr2 = np.vstack((arr2_1, arr2_2)).T
start = time.time()
row_indexes = get_same_row_random(arr1, arr2[:,0], arr2[:,1])
col_indexes = get_same_row_random(arr1.T, arr2[:,1], arr2[:,0])
r_or_c = np.random.randint(2, size=n)
exc_row = np.where(row_indexes == -1)[0]
exc_col = np.where(col_indexes == -1)[0]
r_or_c[exc_row] = 1
r_or_c[exc_col] = 0
wr = np.where(r_or_c == 0)[0]
wc = np.where(r_or_c == 1)[0]
res = np.array(arr2)
res[wr,1] = row_indexes[wr]
res[wc,0] = col_indexes[wc]
end = time.time()
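To spot-check the vectorized result (a sketch; with this density of 1s the -1 no-solution case is vanishingly rare, so it is ignored here):

def is_valid(arr1, p1, p2):
    # p2 must share a row or column with p1, with a 1 somewhere in the
    # inclusive segment between the two cells.
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        return bool(arr1[x1, min(y1, y2):max(y1, y2) + 1].any())
    if y1 == y2:
        return bool(arr1[min(x1, x2):max(x1, x2) + 1, y1].any())
    return False

sample = np.random.choice(n, 100, replace=False)
print(all(is_valid(arr1, tuple(arr2[i]), tuple(res[i])) for i in sample))  # expect True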

Is there a way to get the top k values per row of a numpy array (Python)?

Given a numpy array of the form below:
x = [[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]]
is there a way to retain the top-3 values in each row and set the others to zero in Python (without an explicit loop)? The result in the case of the example above would be
x = [[4.,3.,0.,0.,8.],[0.,3.1,0.,9.2,5.5],[0.0,7.0,4.4,0.0,1.3]]
Code for a 1-D example:
import numpy as np
arr = np.array([1.2,3.1,0.,9.2,5.5,3.2])
indexes=arr.argsort()[-3:][::-1]
a = list(range(6))
A=set(indexes); B=set(a)
zero_ind=(B.difference(A))
arr[list(zero_ind)]=0
The output:
array([0. , 0. , 0. , 9.2, 5.5, 3.2])
Above is my sample code (with many lines) for a 1-D numpy array. Looping through each row of a numpy array and performing this same computation repeatedly would be quite expensive. Is there a simpler way?
Here is fully vectorized code that uses nothing outside numpy. It uses numpy's argpartition to efficiently find the k-th values. See for instance this answer for other use cases.
import numpy

def truncate_top_k(x, k, inplace=False):
    m, n = x.shape
    # get (unsorted) indices of top-k values
    topk_indices = numpy.argpartition(x, -k, axis=1)[:, -k:]
    # get k-th value
    rows, _ = numpy.indices((m, k))
    kth_vals = x[rows, topk_indices].min(axis=1)
    # get boolean mask of values smaller than k-th
    is_smaller_than_kth = x < kth_vals[:, None]
    # replace masked values by 0
    if not inplace:
        return numpy.where(is_smaller_than_kth, 0, x)
    x[is_smaller_than_kth] = 0
    return x
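Applied to the array from the question, this reproduces the expected output (a quick check):

x = numpy.array([[4., 3., 2., 1., 8.],
                 [1.2, 3.1, 0., 9.2, 5.5],
                 [0.2, 7.0, 4.4, 0.2, 1.3]])
print(truncate_top_k(x, 3))
# [[4.  3.  0.  0.  8. ]
#  [0.  3.1 0.  9.2 5.5]
#  [0.  7.  4.4 0.  1.3]]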
Use np.apply_along_axis to apply a function to 1-D slices along a given axis
import numpy as np
def top_k_values(array):
    indexes = array.argsort()[-3:][::-1]
    A = set(indexes)
    B = set(list(range(array.shape[0])))
    array[list(B.difference(A))] = 0
    return array
arr = np.array([[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]])
result = np.apply_along_axis(top_k_values, 1, arr)
print(result)
Output
[[4. 3. 0. 0. 8. ]
[0. 3.1 0. 9.2 5.5]
[0. 7. 4.4 0. 1.3]]
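Note that np.apply_along_axis forwards extra positional arguments to the applied function, so the hardcoded 3 could be made a parameter (a small sketch of that variant):

def top_k_values(array, k):
    indexes = array.argsort()[-k:][::-1]
    array[list(set(range(array.shape[0])) - set(indexes))] = 0
    return array

result = np.apply_along_axis(top_k_values, 1, arr, 3)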
def top_k(arr, k, axis=0):
    # indices of the top k values along axis (np.take is used here because
    # take_along_axis needs an index array with the same ndim as arr)
    top_k_idx = np.take(np.argpartition(arr, -k, axis=axis),
                        np.arange(-k, 0), axis=axis)
    out = np.zeros_like(arr)  # create zero array
    np.put_along_axis(out, top_k_idx,  # put the top-k values of arr into out
                      np.take_along_axis(arr, top_k_idx, axis=axis),
                      axis=axis)
    return out
This should work for arbitrary axis and k, but does not work in-place. If you want in-place it's a bit simpler:
def top_k(arr, k, axis=0):
    # indices of everything except the top k values along axis
    remove_idx = np.take(np.argpartition(arr, -k, axis=axis),
                         np.arange(arr.shape[axis] - k),
                         axis=axis)
    np.put_along_axis(arr, remove_idx, 0, axis=axis)  # zero them in place
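Usage is the same for both variants; for example, keeping the top 3 per row of the question's array (a quick sketch; note the in-place version mutates its argument and returns None):

arr = np.array([[4., 3., 2., 1., 8.],
                [1.2, 3.1, 0., 9.2, 5.5],
                [0.2, 7.0, 4.4, 0.2, 1.3]])
top_k(arr, 3, axis=1)
print(arr)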
Here is an alternative that uses a list comprehension to loop through your array, applying the keep_top_3 function:
import numpy as np
import heapq
def keep_top_3(arr):
    smallest = heapq.nlargest(3, arr)[-1]  # find the top 3 and use the smallest as cut-off
    arr[arr < smallest] = 0  # replace anything lower than the cut-off with 0
    return arr
x = [[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]]
result = [keep_top_3(np.array(arr)) for arr in x]
I hope this helps :)

Aggregating 2 NumPy arrays by confidence

I have 2 numpy arrays containing values in the interval [0,1].
I want to create a third array containing the most "confident" values, i.e., take elementwise whichever number is closer to 0 or 1. Consider the following example:
[0.7,0.12,1,0.5]
[0.1,0.99,0.001,0.49]
so my constructed array would be:
[0.1,0.99,1,0.49]
import numpy as np
A = np.array([0.7,0.12,1,0.5])
B = np.array([0.1,0.99,0.001,0.49])
maxi = np.maximum(A,B)
mini = np.minimum(A,B)
# Find where the maximum is closer to 1 than the minimum is to 0
idx = 1-maxi < mini
maxi*idx + mini*~idx
returns
array([ 0.1 , 0.99, 1. , 0.49])
You can try this (with a and b the two input arrays):
c = np.array([a[i] if min(1-a[i], a[i]) < min(1-b[i], b[i]) else b[i] for i in range(len(a))])
The result is:
array([ 0.1 , 0.99, 1. , 0.49])
Another way of stating your "confidence" measure is to ask which of the two numbers is furthest away from 0.5, that is, which of the two numbers x yields the largest abs(0.5 - x). The following solution constructs a 2D array c with the original arrays as columns. Then we construct and apply a boolean mask based on abs(0.5 - c):
import numpy as np
a = np.array([0.7,0.12,1,0.5])
b = np.array([0.1,0.99,0.001,0.49])
# Combine
c = np.concatenate((a, b)).reshape((2, len(a))).T
# Create mask
b_or_a = np.asarray(np.argmax(np.abs((0.5 - c)), axis=1), dtype=bool)
mask = np.zeros(c.shape, dtype=bool)
mask[:, 0] = ~b_or_a
mask[:, 1] = b_or_a
# Apply mask
d = c[mask]
print(d) # [ 0.1 0.99 1. 0.49]
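For what it's worth, the same selection can be written in one line with np.where (a compact sketch, assuming a and b as defined above):

# Pick, elementwise, whichever value lies furthest from 0.5
d = np.where(np.abs(a - 0.5) > np.abs(b - 0.5), a, b)
print(d)  # [0.1  0.99 1.   0.49]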

Standard deviation from center of mass along Numpy array axis

I am trying to find a well-performing way to calculate the standard deviation from the center of mass/gravity along an axis of a Numpy array.
In formula form, per column j, with row positions i weighted by the array values A[i,j], this is:
mu_j = sum_i(i * A[i,j]) / sum_i(A[i,j])
sigma_j = sqrt(sum_i(i^2 * A[i,j]) / sum_i(A[i,j]) - mu_j^2)
The best I could come up with is this:
def weighted_com(A, axis, weights):
    average = np.average(A, axis=axis, weights=weights)
    return average * weights.sum() / A.sum(axis=axis).astype(float)

def weighted_std(A, axis):
    weights = np.arange(A.shape[axis])
    w1com2 = weighted_com(A, axis, weights)**2
    w2com1 = weighted_com(A, axis, weights**2)
    return np.sqrt(w2com1 - w1com2)
In weighted_com, I need to correct the normalization from sum of weights to sum of values (which is an ugly workaround, I guess). weighted_std is probably fine.
To avoid the XY problem, I still ask for what I actually want (a better weighted_std) instead of a better version of my weighted_com.
The .astype(float) is a safety measure as I'll apply this to histograms containing ints, which caused problems due to integer division when not in Python 3 or when from __future__ import division is not active.
You want to take the mean, variance and standard deviation of the vector [1, 2, 3, ..., n], where n is the dimension of the input matrix A along the axis of interest, with weights given by the matrix A itself.
For concreteness, say you want to consider these center-of-mass statistics along the vertical axis (axis=0); this is what corresponds to the formulas you wrote. For a fixed column j, you would do
n = A.shape[0]
r = np.arange(1, n+1)
mu = np.average(r, weights=A[:,j])
var = np.average(r**2, weights=A[:,j]) - mu**2
std = np.sqrt(var)
In order to put all of the computations for the different columns together, you have to stack together a bunch of copies of r (one per column) to form a matrix (that I have called R in the code below). With a bit of care, you can make things work for both axis=0 and axis=1.
import numpy as np
def com_stats(A, axis=0):
    A = A.astype(float)  # if you are worried about int vs. float
    n = A.shape[axis]
    m = A.shape[(axis - 1) % 2]
    r = np.arange(1, n + 1)
    R = np.vstack([r] * m)
    if axis == 0:
        R = R.T
    mu = np.average(R, axis=axis, weights=A)
    var = np.average(R**2, axis=axis, weights=A) - mu**2
    std = np.sqrt(var)
    return mu, var, std
For example,
A = np.array([[1, 1, 0], [1, 2, 1], [1, 1, 1]])
print(A)
# [[1 1 0]
# [1 2 1]
# [1 1 1]]
print(com_stats(A))
# (array([ 2. , 2. , 2.5]), # centre-of-mass mean by column
# array([ 0.66666667, 0.5 , 0.25 ]), # centre-of-mass variance by column
# array([ 0.81649658, 0.70710678, 0.5 ])) # centre-of-mass std by column
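As a quick consistency check (assuming the question's weighted_std is in scope): it weights positions 0..n-1 while com_stats uses 1..n, but the variance is invariant under that shift, so the standard deviations agree:

mu, var, std = com_stats(A)
print(np.allclose(std, weighted_std(A.astype(float), axis=0)))  # True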
EDIT:
One can avoid creating in-memory copies of r to build R by using numpy.lib.stride_tricks: swap the line
R = np.vstack([r] * m)
above with
from numpy.lib.stride_tricks import as_strided
R = as_strided(r, strides=(0, r.itemsize), shape=(m, n))
The resulting R is a (strided) ndarray whose underlying array is the same as r's — absolutely no copying of any values occurs.
from numpy.lib.stride_tricks import as_strided
FMT = '''\
Shape: {}
Strides: {}
Position in memory: {}
Size in memory (bytes): {}
'''
def find_base_nbytes(obj):
    if obj.base is not None:
        return find_base_nbytes(obj.base)
    return obj.nbytes

def stats(obj):
    return FMT.format(obj.shape,
                      obj.strides,
                      obj.__array_interface__['data'][0],
                      find_base_nbytes(obj))
n=10
m=1000
r = np.arange(1, n+1)
R = np.vstack([r] * m)
S = as_strided(r, strides=(0, r.itemsize), shape=(m, n))
print(stats(r))
print(stats(R))
print(stats(S))
Output:
Shape: (10,)
Strides: (8,)
Position in memory: 4299744576
Size in memory (bytes): 80
Shape: (1000, 10)
Strides: (80, 8)
Position in memory: 4304464384
Size in memory (bytes): 80000
Shape: (1000, 10)
Strides: (0, 8)
Position in memory: 4299744576
Size in memory (bytes): 80
Credit to this SO answer and this one for explanations on how to get the memory address and size of the underlying array of a strided ndarray.

Python numpy array manipulation

I need to manipulate a numpy array.
My array has the following format:
x = [1280][720][4]
The array stores image data in the third dimension:
x[0][0] = [Red, Green, Blue, Alpha]
Now I need to manipulate my array into the following form:
x = [1280][720]
x[0][0] = (Red + Green + Blue) / 3
My current code is extremely slow and I want to use numpy array operations to speed it up:
for a in range(720):
    for b in range(1280):
        newx[a][b] = x[a][b][0] + x[a][b][1] + x[a][b][2]
x = newx
Also, if possible, I need the code to work for variable array sizes.
Thanks a lot!
Use the numpy.mean function:
import numpy as np
n = 1280
m = 720
# Generate an n x m x 4 matrix with random values
x = np.round(np.random.rand(n, m, 4) * 10)
# Calculate the mean over the first 3 values along the 2nd axis (counting from 0)
xnew = np.mean(x[:, :, 0:3], axis=2)
x[:, :, 0:3] gives you the first 3 values in the 3rd dimension, see: numpy indexing
axis=2 specifies along which axis of the matrix the mean value is calculated.
Slice the alpha channel out of the array, and then sum the array along the RGB axis and divide by 3:
x = x[:,:,:-1]
x_sum = x.sum(axis=2)
x_div = x_sum / float(3)
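Both answers compute the same thing; a minimal check on a small hypothetical RGBA array:

import numpy as np
x = np.random.rand(4, 4, 4)  # small test image, values in [0, 1]
a = np.mean(x[:, :, 0:3], axis=2)   # first approach
b = x[:, :, :-1].sum(axis=2) / 3.0  # second approach
print(np.allclose(a, b))  # True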
