How to rotate a binary vector to minimum in Python

If I have an arbitrary binary vector (numpy array) in Python, e.g.
import numpy as np
vector = np.zeros((8, 1))
vector[2, 0] = 1
vector[3, 0] = 1
This would give me the binary vector 00110000. I could also have 00000000 or 00010100 etc. How can I make a script that, given such a binary vector as input, returns the minimum right-rotated binary numpy array as output? A few examples:
00010000 --> 00000001
10100000 --> 00000101
11000001 --> 00000111
00000000 --> 00000000
11111111 --> 11111111
10101010 --> 01010101
11110000 --> 00001111
00111000 --> 00000111
10001111 --> 00011111
etc. Any suggestions / good optimized Python implementations in mind? =) Thank you for any assistance. I need this for a Local Binary Pattern implementation =)

The fastest way to do this is to create a lookup table first and then use ndarray indexing to get the result. Here is the code:
You only need to create the table once; the code here is just a demo.
import numpy as np
np.random.seed(0)

# create the table
def rotated(s):
    # yield the integer value of every rotation that ends in "1";
    # the minimal rotation of a nonzero byte always ends in 1, because a
    # trailing 0 means one more right-rotation would give a smaller value
    for i in range(len(s)):
        s2 = s[i:] + s[:i]
        if s2[-1] == "1":
            yield int(s2, 2)

bitmap = []
for i in range(256):
    s = "{:08b}".format(i)
    try:
        r = min(rotated(s))
    except ValueError:
        r = i  # i == 0: the all-zero byte has no rotation ending in 1
    bitmap.append(r)
bitmap = np.array(bitmap, np.uint8)
Then we can use bitmap together with numpy.packbits() and numpy.unpackbits():
a = np.random.randint(0, 2, (10, 8))
a = np.vstack((a, np.array([[1, 1, 0, 0, 0, 0, 0, 1]])))
b = np.unpackbits(bitmap[np.packbits(a, axis=1)], axis=1)
print(a)
print()
print(b)
here is the output:
[[0 1 1 0 1 1 1 1]
[1 1 1 0 0 1 0 0]
[0 0 0 1 0 1 1 0]
[0 1 1 1 1 0 1 0]
[1 0 1 1 0 1 1 0]
[0 1 0 1 1 1 1 1]
[0 1 0 1 1 1 1 0]
[1 0 0 1 1 0 1 0]
[1 0 0 0 0 0 1 1]
[0 0 0 1 1 0 1 0]
[1 1 0 0 0 0 0 1]]
[[0 1 1 0 1 1 1 1]
[0 0 1 0 0 1 1 1]
[0 0 0 0 1 0 1 1]
[0 0 1 1 1 1 0 1]
[0 1 0 1 1 0 1 1]
[0 1 0 1 1 1 1 1]
[0 0 1 0 1 1 1 1]
[0 0 1 1 0 1 0 1]
[0 0 0 0 0 1 1 1]
[0 0 0 0 1 1 0 1]
[0 0 0 0 0 1 1 1]]

Try this:
v = np.array([0, 0, 1, 1, 1, 0, 0, 0])  # test value

count = 0
def f(x):
    # running count of consecutive zeros, reset on every 1
    global count
    if x:
        count = 0
    else:
        count += 1
    return count

# otypes avoids the extra warm-up call np.vectorize would otherwise make, which
# would corrupt the global counter; this also relies on np.vectorize evaluating
# the elements in order
uf = np.vectorize(f, otypes=[int])

v2 = np.concatenate((v, v))  # doubled vector so runs of zeros can wrap around
vs = uf(v2)
i = vs.argmax()      # end of the longest run of zeros
m = vs[i]            # its length
rot = i - m + 1      # index where that run starts
print(np.roll(v2, -rot)[:v.size])  # output: [0 0 0 0 0 1 1 1]

I am not sure if numpy provides this directly, but it is strange that none of the numpy folks have answered so far.
If there is no already builtin way to do this, I would go about it like this:
Convert your array to an int
Then do the rotating over the pure int and test for the minimum
Convert back to array
This way your array rotations are reduced to bit-shifts which should be quite fast.
If your bit arrays are about the size of the samples, I guess this could suffice (I don't have numpy at hand, but the logic should be the same):
#! /usr/bin/python3

def array2int(a):
    i = 0
    for e in a:
        i = (i << 1) + e
    return i

def int2array(i, length):
    return [(i >> p) & 1 for p in range(length - 1, -1, -1)]

def rot(i, length):
    # rotate the length-bit integer i left by one position
    return ((i & ((1 << (length - 1)) - 1)) << 1) | (i >> (length - 1))

def rotMin(a):
    length = len(a)
    minn = i = array2int(a)
    for _ in range(length):
        i = rot(i, length)
        if i < minn:
            minn = i
    return int2array(minn, length)

# test cases
for case in (16, 160, 193, 0, 255, 170, 240, 56, 143):
    case = int2array(case, 8)
    result = rotMin(case)
    print("{} -> {}".format(case, result))
If they are much longer, you might first want to find the longest runs of zeros and then only test the rotations that begin with such a run (a sketch of that idea follows the output below).
Output is:
[0, 0, 0, 1, 0, 0, 0, 0] -> [0, 0, 0, 0, 0, 0, 0, 1]
[1, 0, 1, 0, 0, 0, 0, 0] -> [0, 0, 0, 0, 0, 1, 0, 1]
[1, 1, 0, 0, 0, 0, 0, 1] -> [0, 0, 0, 0, 0, 1, 1, 1]
[0, 0, 0, 0, 0, 0, 0, 0] -> [0, 0, 0, 0, 0, 0, 0, 0]
[1, 1, 1, 1, 1, 1, 1, 1] -> [1, 1, 1, 1, 1, 1, 1, 1]
[1, 0, 1, 0, 1, 0, 1, 0] -> [0, 1, 0, 1, 0, 1, 0, 1]
[1, 1, 1, 1, 0, 0, 0, 0] -> [0, 0, 0, 0, 1, 1, 1, 1]
[0, 0, 1, 1, 1, 0, 0, 0] -> [0, 0, 0, 0, 0, 1, 1, 1]
[1, 0, 0, 0, 1, 1, 1, 1] -> [0, 0, 0, 1, 1, 1, 1, 1]
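A rough sketch of that shortcut, staying with plain Python lists as above (this helper is only an illustration of the idea, not part of the tested code):
def rotMinLong(a):
    # only compare the rotations that start at the beginning of a longest
    # (circular) run of zeros, instead of all len(a) rotations
    a = list(a)
    n = len(a)
    if 1 not in a:
        return a            # all zeros: every rotation is identical
    starts = [i for i in range(n) if a[i] == 0 and a[i - 1] == 1]
    if not starts:
        return a            # all ones: every rotation is identical

    def run_len(i):         # circular zero-run length starting at index i
        k = 0
        while a[(i + k) % n] == 0:
            k += 1
        return k

    longest = max(run_len(i) for i in starts)
    candidates = [i for i in starts if run_len(i) == longest]
    # lexicographic comparison of equal-length 0/1 lists equals numeric comparison
    return min(a[i:] + a[:i] for i in candidates)

print(rotMinLong([0, 0, 1, 1, 1, 0, 0, 0]))  # [0, 0, 0, 0, 0, 1, 1, 1]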

Using numpy I can't completely avoid iteration, but I can limit it to the smallest dimension, the number of possible rotations (8). I've found several alternatives. I suspect the last is fastest, but I haven't done time tests. The core idea is to collect all the possible rotations into an array, and pick the minimum value from those.
x = [[0, 0, 0, 1, 0, 0, 0, 0],
     [1, 0, 1, 0, 0, 0, 0, 0],
     [1, 1, 0, 0, 0, 0, 0, 1],
     ...
     [1, 0, 0, 0, 1, 1, 1, 1]]
x = np.array(x)
M, N = x.shape
j = 2**np.arange(N)[::-1]  # powers of 2 used to convert a bit vector to a number
# use np.dot(xx, j) to produce a base-10 integer
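For example, the dot product with j reads one bit row as the integer it represents (a quick illustration, not part of the original code):
row = np.array([0, 0, 0, 1, 0, 0, 0, 0])
print(np.dot(row, j))  # 16, since j = [128, 64, 32, 16, 8, 4, 2, 1]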
A) In the first version I collect the rotations in a 3D array
xx = np.zeros([M, N, N], dtype=int)
for i in range(N):
    xx[:, i, :] = np.roll(x, i, axis=1)
t = np.argmin(np.dot(xx, j), axis=1)  # find the minimum index
print(t)
print(xx[range(M), t, :])
produces:
[4 5 6 0 0 1 4 3 7]
[[0 0 0 0 0 0 0 1]
[0 0 0 0 0 1 0 1]
[0 0 0 0 0 1 1 1]
...
[0 0 0 1 1 1 1 1]]
B) A variation would be to store the np.dot(xx, j) values in a 2D array, and convert the minimum of each row back to the 8 column array.
xx = x.copy()
for i in range(N):
    y = np.roll(x, i, axis=1)
    xx[:, i] = np.dot(y, j)
y = np.min(xx, axis=1)
print(y)
# [4 5 6 0 0 1 4 3 7]

# convert back to binary
z = x.copy()
for i in range(N):
    z[:, i] = y % 2
    y = y // 2
z = np.fliplr(z)
print(z)
I couldn't find a numpy way of converting a vector of numbers to a binary array. But with N much smaller than M, this iterative approach isn't costly. numpy.base_repr uses this, but only operates on scalars. [int2array and np.unpackbits used in the other answers are faster.]
C) Better yet, I could roll j rather than x:
xx = x.copy()
for i in range(N):
    xx[:, i] = np.dot(x, np.roll(j, i))
y = np.min(xx, axis=1)
print(y)
D) Possible further speed up by constructing an array of rotated j, and doing the dot product just once. It may be possible to construct jj without iteration, but creating an 8x8 array just once isn't expensive.
jj = np.zeros([N, N], dtype=int)
for i in range(N):
    jj[:, i] = np.roll(j, i)
print(jj)
xx = np.dot(x, jj)
# or xx = np.einsum('ij,jk', x, jj)
y = np.min(xx, axis=1)
print(y)
Timing notes:
For a small x, such as the 9 sample rows, the first solution (A) is fastest. Converting the integers back to binary takes up about a third of the time, slowing down the other solutions.
But for a large x, such as 10000 rows, the last (D) is best. With IPython timeit:
A) 10 loops, best of 3: 22.7 ms per loop
B) 100 loops, best of 3: 13.5 ms per loop
C) 100 loops, best of 3: 8.21 ms per loop
D) 100 loops, best of 3: 6.15 ms per loop
# Hyperboreous: rotMin(x1)
# adapted to work with numpy arrays
H) 1 loops, best of 3: 177 ms per loop
At one point I thought I might gain speed by selectively rotating rows only until they reach their minimum value. But these added samples show that I cannot use a local minimum:
[1, 0, 1, 0, 1, 0, 0, 0],
[1, 0, 0, 1, 1, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0, 1],
[0, 1, 0, 1, 0, 1, 0, 0],
[0, 1, 0, 1, 0, 0, 1, 0],
corresponding xx values
[168 81 162 69 138 21 42 84]
[152 49 98 196 137 19 38 76]
[145 35 70 140 25 50 100 200]
[ 84 168 81 162 69 138 21 42]
[ 82 164 73 146 37 74 148 41]
But notice that the minimum for each of these rows is the first 0 of the longest run of 0s. So it might be possible to find the minimum without doing all the rotations and conversions to numeric values.

You can use the rotate method of collections.deque if you convert your array to a list; convert it back to an array when you're done with your rotations.
import numpy as np
from collections import deque

vector = np.array([1, 1, 0])  # example array
d = deque(vector.tolist())    # convert the array "vector" to a deque

d.rotate(1)   # rotate right by one; rotate() works in place and returns None
print(d)
# deque([0, 1, 1])

d.rotate(2)   # rotate right by two more
print(d)
# deque([1, 1, 0])

d.rotate(-1)            # a negative argument rotates back (to the left)
result = np.array(d)    # convert back to a numpy array when done
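To actually get the minimum rotation with this approach, one possibility is to try every right-rotation and keep the smallest (a sketch; min_rotation_deque is a hypothetical helper, not part of the original answer):
import numpy as np
from collections import deque

def min_rotation_deque(vector):
    # try every right-rotation with deque.rotate and keep the smallest,
    # comparing rotations as equal-length 0/1 lists (lexicographic == numeric)
    d = deque(int(b) for b in vector)
    best = list(d)
    for _ in range(len(d) - 1):
        d.rotate(1)
        candidate = list(d)
        if candidate < best:
            best = candidate
    return np.array(best)

print(min_rotation_deque(np.array([1, 1, 0, 0, 0, 0, 0, 1])))  # [0 0 0 0 0 1 1 1]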

Related

Vectorizing a function for finding local minima and maxima in a 2D array with strict comparison

I'm trying to improve the performance of a function that returns the local minima and maxima of an input 2D NumPy array. The function works as expected, but it is too slow for my use case. I'm wondering if it's possible to create a vectorized version of this function to improve its performance.
Here is the formal definition of whether an element is a local minimum (maximum): the element a_(m,n) is a local minimum (maximum) if a_(m,n) < a_(i,j) (respectively a_(m,n) > a_(i,j)) for every (i, j) ≠ (m, n) with |i − m| ≤ ⌊w_h / 2⌋ and |j − n| ≤ ⌊w_w / 2⌋, where A = [a_(m,n)] is the 2D matrix, m and n are the row and column respectively, and w_h and w_w are the height and width of the sliding window, respectively.
I have tried using skimage.morphology.local_minima and skimage.morphology.local_maxima, but they consider an element a minimum (maximum) if its value is less than or equal to (greater than or equal to) all of its neighbors.
In my case, I need the function to consider an element a minimum (maximum) if it is strictly less than (greater than) all of its neighbors.
The current implementation uses a sliding window approach with numpy.lib.stride_tricks.sliding_window_view, but the function does not necessarily have to use this approach.
Here is my current implementation:
import numpy as np

def get_local_extrema(array, window_size=(3, 3)):
    # Check if the window size is valid
    if not all(size % 2 == 1 and size >= 3 for size in window_size):
        raise ValueError("Window size must be odd and >= 3 in both dimensions.")
    # Create a map to store the local minima and maxima
    minima_map = np.zeros_like(array)
    maxima_map = np.zeros_like(array)
    # Save the shape and dtype of the original array for later
    original_size = array.shape
    original_dtype = array.dtype
    # Get the halved window size
    half_window_size = tuple(size // 2 for size in window_size)
    # Pad the array with NaN values to handle the edge cases
    padded_array = np.pad(array.astype(float),
                          tuple((size, size) for size in half_window_size),
                          mode='constant', constant_values=np.nan)
    # Generate all the sliding windows
    windows = np.lib.stride_tricks.sliding_window_view(padded_array, window_size).reshape(
        original_size[0] * original_size[1], *window_size)
    # Create a mask to ignore the central element of the window
    mask = np.ones(window_size, dtype=bool)
    mask[half_window_size] = False
    # Iterate through all the windows
    for i in range(windows.shape[0]):
        window = windows[i]
        # Get the value of the central element
        center_val = window[half_window_size]
        # Apply the mask to ignore the central element
        masked_window = window[mask]
        # Get the row and column indices of the central element
        row = i // original_size[1]
        col = i % original_size[1]
        # Check if the central element is a local minimum or maximum
        if center_val > np.nanmax(masked_window):
            maxima_map[row, col] = center_val
        elif center_val < np.nanmin(masked_window):
            minima_map[row, col] = center_val
    return minima_map.astype(original_dtype), maxima_map.astype(original_dtype)
a = np.array([[8, 8, 4, 1, 5, 2, 6, 3],
[6, 3, 2, 3, 7, 3, 9, 3],
[7, 8, 3, 2, 1, 4, 3, 7],
[4, 1, 2, 4, 3, 5, 7, 8],
[6, 4, 2, 1, 2, 5, 3, 4],
[1, 3, 7, 9, 9, 8, 7, 8],
[9, 2, 6, 7, 6, 8, 7, 7],
[8, 2, 1, 9, 7, 9, 1, 1]])
(minima, maxima) = get_local_extrema(a)
print(minima)
# [[0 0 0 1 0 2 0 0]
# [0 0 0 0 0 0 0 0]
# [0 0 0 0 1 0 0 0]
# [0 1 0 0 0 0 0 0]
# [0 0 0 1 0 0 3 0]
# [1 0 0 0 0 0 0 0]
# [0 0 0 0 6 0 0 0]
# [0 0 1 0 0 0 0 0]]
print(maxima)
# [[0 0 0 0 0 0 0 0]
# [0 0 0 0 7 0 9 0]
# [0 8 0 0 0 0 0 0]
# [0 0 0 4 0 0 0 8]
# [6 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 8]
# [9 0 0 0 0 0 0 0]
# [0 0 0 9 0 9 0 0]]
expected_minima = np.array([[0, 0, 0, 1, 0, 2, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 3, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0]])
expected_maxima = np.array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 7, 0, 9, 0],
[0, 8, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 4, 0, 0, 0, 8],
[6, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 8],
[9, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 9, 0, 9, 0, 0]])
np.testing.assert_array_equal(minima, expected_minima)
np.testing.assert_array_equal(maxima, expected_maxima)
print('All tests passed')
Any suggestions or ideas on how to vectorize this function would be greatly appreciated.
Thanks in advance!
EDIT #1
After playing a bit with NumPy, I managed to get the following code to almost work, in a completely vectorized way if I understood correctly:
def get_local_extrema_2(img):
    minima_map = np.zeros_like(img)
    maxima_map = np.zeros_like(img)
    minima_map[1:-1, 1:-1] = np.where(
        (img[1:-1, 1:-1] < img[:-2, 1:-1]) &
        (img[1:-1, 1:-1] < img[2:, 1:-1]) &
        (img[1:-1, 1:-1] < img[1:-1, :-2]) &
        (img[1:-1, 1:-1] < img[1:-1, 2:]) &
        (img[1:-1, 1:-1] < img[2:, 2:]) &
        (img[1:-1, 1:-1] < img[:-2, :-2]) &
        (img[1:-1, 1:-1] < img[2:, :-2]) &
        (img[1:-1, 1:-1] < img[:-2, 2:]),
        img[1:-1, 1:-1],
        0)
    maxima_map[1:-1, 1:-1] = np.where(
        (img[1:-1, 1:-1] > img[:-2, 1:-1]) &
        (img[1:-1, 1:-1] > img[2:, 1:-1]) &
        (img[1:-1, 1:-1] > img[1:-1, :-2]) &
        (img[1:-1, 1:-1] > img[1:-1, 2:]) &
        (img[1:-1, 1:-1] > img[2:, 2:]) &
        (img[1:-1, 1:-1] > img[:-2, :-2]) &
        (img[1:-1, 1:-1] > img[2:, :-2]) &
        (img[1:-1, 1:-1] > img[:-2, 2:]),
        img[1:-1, 1:-1],
        0)
    return minima_map, maxima_map
Output of get_local_extrema_2 is:
Minima map:
[[0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0]
[0 1 0 0 0 0 0 0]
[0 0 0 1 0 0 3 0]
[0 0 0 0 0 0 0 0]
[0 0 0 0 6 0 0 0]
[0 0 0 0 0 0 0 0]]
Maxima map:
[[0 0 0 0 0 0 0 0]
[0 0 0 0 7 0 9 0]
[0 8 0 0 0 0 0 0]
[0 0 0 4 0 0 0 0]
[0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0]]
The problem with the above is that pixels on the border that are minima or maxima are not detected.
EDIT #2
It would be fine even if in the output arrays there is 1 instead of the value of the local minima (maxima) i.e. a 2d array of 0 and 1 (or False and True).
EDIT #3
Here is a version of the function based on Cris Luengo's answer. Notice the use of mode='mirror' (equivalent to NumPy's 'reflect'), so that a minimum or maximum on the edge is not duplicated outside the border and can still stand out. This way there is no need to pad the image with the minimum or maximum element of the matrix. I think this is the most performant way to accomplish this task:
import numpy as np
import scipy.ndimage

def get_local_extrema_v3(image):
    footprint = np.ones((3, 3), dtype=bool)
    footprint[1, 1] = False
    minima = image * (scipy.ndimage.grey_erosion(image, footprint=footprint, mode='mirror') > image)
    maxima = image * (scipy.ndimage.grey_dilation(image, footprint=footprint, mode='mirror') < image)
    return minima, maxima
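For the relaxed requirement from EDIT #2 (a boolean map instead of the extrema values), the comparison masks from the same calls can be returned directly; a minimal sketch, not part of the original post:
import numpy as np
import scipy.ndimage

def get_local_extrema_bool(image):
    # True where the pixel is strictly smaller/larger than all 8 neighbours
    footprint = np.ones((3, 3), dtype=bool)
    footprint[1, 1] = False
    minima_mask = scipy.ndimage.grey_erosion(image, footprint=footprint, mode='mirror') > image
    maxima_mask = scipy.ndimage.grey_dilation(image, footprint=footprint, mode='mirror') < image
    return minima_mask, maxima_mask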
Your definition of local maximum is flawed. For example, in a 1D array [1,2,3,4,4,3,2,1], there is a local maximum, but your definition ignores it. skimage.morphology.local_maxima will correctly identify this local maximum.
If you really need to implement your definition, I would use a dilation (erosion) with a square structuring element of the window size, but excluding the central pixel. Any pixel that is larger (smaller) in the original image than in the filtered image will satisfy your definition of local maximum (minimum).
I implemented this using scikit-image, but discovered it does a weird thing at the image edge, so it will not detect local maxima or minima near the edge:
import numpy as np
import skimage.morphology

se = np.ones((3, 3))
se[1, 1] = 0
minima = a * (skimage.morphology.erosion(a, footprint=se) > a)
maxima = a * (skimage.morphology.dilation(a, footprint=se) < a)
Using DIPlib (disclosure: I'm an author) this will work correctly also at the image edge:
import diplib as dip
se = np.ones((3, 3), dtype=np.bool_)
se[1, 1] = False
minima = a * (dip.Erosion(a, se) > a)
maxima = a * (dip.Dilation(a, se) < a)
Looking at the source code for skimage.morphology.dilation, it calls scipy.ndimage.grey_dilation with the default boundary extension, which is 'reflect'. This means that every local maximum at the image edge will have a neighbor with the same value, and hence not detected as local maximum in this definition. Instead, it should use the 'constant' extension, with cval set to the minimum possible value for the data type. For example, for an uint8 input array, it should do ndi.grey_dilation(image, footprint=footprint, output=out, mode='constant', cval=0). GitHub issue
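A sketch of that suggested fix applied with scipy.ndimage directly (assumption: a is the integer test array from the question; the padding values are chosen just outside the range of a so edge extrema still stand out):
import numpy as np
import scipy.ndimage as ndi

footprint = np.ones((3, 3), dtype=bool)
footprint[1, 1] = False

# dilation: pad with a value smaller than anything in a, so edge maxima are still strict
dilated = ndi.grey_dilation(a, footprint=footprint, mode='constant', cval=a.min() - 1)
# erosion: pad with a value larger than anything in a, so edge minima are still strict
eroded = ndi.grey_erosion(a, footprint=footprint, mode='constant', cval=a.max() + 1)

maxima = a * (dilated < a)
minima = a * (eroded > a)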

Numpy scalable diagonal matrices

Assuming I have the variables:
A = 3
B = 2
C = 1
How can I transform them into diagonal matrices in the following form:
np.diag([1, 1, 1, 0, 0, 0])
Out[0]:
array([[1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
np.diag([0,0,0,1,1,0])
Out[1]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0]])
np.diag([0,0,0,0,0,1])
Out[2]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1]])
I would like this to be scalable, so for instance with 4 variables a = 500, b = 20, c = 300, d = 200 the size of the matrix will be 500 + 20 + 300 + 200 = 1020.
What is the easiest way to do this?
The obligatory solution with np.einsum, about 2.25x slower than the accepted answer for the [500, 20, 200, 300] arrays on a 2-core Colab instance.
import numpy as np
A = 3
B = 2
C = 1
r = [A,B,C]
m = np.arange(len(r))
np.einsum('ij,kj->ijk', m.repeat(r) == m[:,None], np.eye(np.sum(r), dtype='int'))
Output
array([[[1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1]]])
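For reference, the boolean mask that the einsum combines with the identity matrix looks like this for the small example (illustration only, using the r and m defined above):
print(m.repeat(r) == m[:, None])
# [[ True  True  True False False False]
#  [False False False  True  True False]
#  [False False False False False  True]]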
Here's one approach. The resulting array mats contains the matrices you're looking for.
A = 3
B = 2
C = 1
n_list = [A, B, C]
ab_list = np.cumsum([0] + n_list)
ran = np.arange(ab_list[-1])
mats = [np.diag(((a <= ran) & (ran < b)).astype('int'))
        for a, b in zip(ab_list[:-1], ab_list[1:])]
for mat in mats:
    print(mat, '\n')
Result:
[[1 0 0 0 0 0]
[0 1 0 0 0 0]
[0 0 1 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 1 0 0]
[0 0 0 0 1 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 1]]
Edit: Here's a faster solution that yields the same result
n_list = [A, B, C]
ab_list = np.cumsum([0] + n_list)
total = ab_list[-1]
ran = np.arange(total)
mats = np.zeros((len(n_list), total, total))
for k, p in enumerate(zip(ab_list[:-1], ab_list[1:])):
    idx = np.arange(p[0], p[1])
    mats[k, idx, idx] = 1
for mat in mats:
    print(mat, '\n')
This seems to yield a ~10% speedup over the currently accepted solution
Another with roughly equivalent performance:
n_list = [A, B, C]
m = len(n_list)
ab_list = np.cumsum([0] + n_list)
total = ab_list[-1]
ran = np.arange(total)
mats = np.zeros((m, total, total))
idx = [k for a, b in zip(ab_list[:-1], ab_list[1:]) for k in range(a, b)]
mats[[k for k, n in enumerate(n_list) for _ in range(n)],
     idx, idx] = 1
for mat in mats:
    print(mat, '\n')
You can achieve even better performance by just allocating the array once, then setting the values all at once by specifying the indices. The indices are fortunately easy to obtain.
import numpy as np

a = [3, 2, 1]  # Put your values in a list
s = np.sum(a)
m = np.zeros((len(a), s, s), dtype=int)  # Initialize array once
indices = (np.repeat(range(len(a)), a), *np.diag_indices(s, 2))  # Get indices
m[indices] = 1  # Set the diagonals at once
print(m)
Output:
[[[1 0 0 0 0 0]
[0 1 0 0 0 0]
[0 0 1 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 1 0 0]
[0 0 0 0 1 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 1]]]
Comparing to @Ben Grossmann's answer, with A=3000, B=2000, C=1000 and 100 repeats:
import timeit

def A():
    '''My solution'''
    a = [3000, 2000, 1000]  # Put your values in a list
    s = np.sum(a)
    m = np.zeros((len(a), s, s), dtype=int)  # Initialize array once
    indices = (np.repeat(range(len(a)), a), *np.diag_indices(s, 2))  # Get indices
    m[indices] = 1  # Set the diagonals at once
    return m

def B():
    '''Bens solution'''
    A = 3000
    B = 2000
    C = 1000
    n_list = [A, B, C]
    ab_list = np.cumsum([0] + n_list)
    ran = np.arange(ab_list[-1])
    return [np.diag(((a <= ran) & (ran < b)).astype('int'))
            for a, b in zip(ab_list[:-1], ab_list[1:])]

print('Timings:')
timeA = timeit.timeit(A, number=100)
timeB = timeit.timeit(B, number=100)
ratio = timeA / timeB
print(f'This solution: {timeA} seconds')
print(f'Current accepted answer: {timeB} seconds')
if ratio < 1:
    print(f'This solution is {1 / ratio} times faster than Bens solution')
else:
    print(f'Bens solution is {ratio} times faster than this solution')
Output:
Timings:
This solution: 1.6834218999993027 seconds
Current accepted answer: 5.096610300000066 seconds
This solution is 3.027529997086397 times faster than Bens solution
EDIT: Changed the "indices" algorithm to use np.repeat instead of np.concatenate.
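For reference, the two pieces of the index construction look like this for the small [3, 2, 1] example (illustration only):
import numpy as np

a = [3, 2, 1]
print(np.repeat(range(len(a)), a))
# [0 0 0 1 1 2]  -> which of the three matrices each diagonal entry belongs to
print(np.diag_indices(sum(a), 2))
# (array([0, 1, 2, 3, 4, 5]), array([0, 1, 2, 3, 4, 5]))  -> row/column positions on the diagonal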
One possible method (I don't think it's optimal, but it works):
import numpy as np

a = 3
b = 2
c = 1
values = [a, b, c]  # create a list with the values
n = sum(values)     # total length of the diagonal

# create an array with cumulative sums, but starting from 0, to use as indices
idx_vals = np.zeros(len(values) + 1, dtype=int)
np.cumsum(values, out=idx_vals[1:])

# create every diagonal from the values, then create the diagonal matrices and
# save them in the `matrices` list
matrices = []
for idx, v in enumerate(values):
    diag = np.zeros(n)
    diag[idx_vals[idx]:idx_vals[idx] + v] = np.ones(v)
    print(diag)
    matrices.append(np.diag(diag))
Yet another possibility:
import numpy as np

# your constants here
constants = [3, 2, 1]  # [A, B, C]

size = sum(constants)
cumsum = np.cumsum([0] + constants)
for i in range(len(cumsum) - 1):
    inputVector = np.zeros(size, dtype=int)
    inputVector[cumsum[i]:cumsum[i + 1]] = 1
    matrix = np.diag(inputVector)
    print(matrix, '\n')
Output:
[[1 0 0 0 0 0]
[0 1 0 0 0 0]
[0 0 1 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 1 0 0]
[0 0 0 0 1 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 1]]

Efficient and Pythonic way to calculate Euclidean distance to the nearest nonzero element, for each nonzero element in NumPy 2D array

I have a bi-dimensional NumPy array of shape M × N with many values set to 0 and others with value ≠ 0.
The following is an example of the aforesaid matrix:
A = np.array([[0, 0, 0, 1, 0, 2, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 3, 0], [1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 6, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0]])
And that's it nicely formatted:
A = [[0 0 0 1 0 2 0 0]
[0 0 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0]
[0 1 0 0 0 0 0 0]
[0 0 0 1 0 0 3 0]
[1 0 0 0 0 0 0 0]
[0 0 0 0 6 0 0 0]
[0 0 1 0 0 0 0 0]]
My task is to find, for each nonzero element (e.g. 1, 2, 1, 1, 1, 3, 1, 6 and 1) in the 2D array (A), the distance to the nearest nonzero element (except itself) by means of Euclidean distance, and then create a list (L) with the calculated distances.
The following invariant must hold:
if np.count_nonzero(A) < 2:
    assert len(L) == 0
else:
    assert np.count_nonzero(A) == len(L)
Calculations for array A is the following:
Nearest nonzero element for A[0, 3] = 1 is A[0, 5] = 2 at distance = 2
Nearest nonzero element for A[0, 5] = 2 is A[0, 3] = 1 at distance = 2
Nearest nonzero element for A[2, 4] = 1 is A[0, 3] = 1 at distance = 2.24
Nearest nonzero element for A[3, 1] = 1 is A[4, 3] = 1 at distance = 2.24
Nearest nonzero element for A[4, 3] = 1 is A[2, 4] = 1 at distance = 2.24
Nearest nonzero element for A[4, 6] = 3 is A[2, 4] = 1 at distance = 2.83
Nearest nonzero element for A[5, 0] = 1 is A[3, 1] = 1 at distance = 2.24
Nearest nonzero element for A[6, 4] = 6 is A[4, 3] = 1 at distance = 2.24
Nearest nonzero element for A[7, 2] = 1 is A[6, 4] = 6 at distance = 2.24
The list L is then L = [2, 2, 2.24, 2.24, 2.24, 2.83, 2.24, 2.24, 2.24].
I wrote the following code to solve the problem, and I think it works correctly, but it has two problems: it's the naive, brute-force, non-vectorized solution with 𝒪(M² × N²) time complexity, and it's not very clear, concise or succinct; that is, it's not Pythonic.
import math
import numpy as np
import scipy.spatial.distance

def get_distance_list(A):
    L = []
    for (m, n), a_mn in np.ndenumerate(A):
        # skip this element if its value is 0
        if a_mn == 0:
            continue
        d_min = math.inf
        for (k, l), a_kl in np.ndenumerate(A):
            # skip this element if its value is 0 or if it's me
            if a_kl == 0 or (m, n) == (k, l):
                continue
            d = scipy.spatial.distance.euclidean((m, n), (k, l))
            d_min = min(d_min, d)
        # in case there are less than two nonzero values in the matrix,
        # the returned list must be empty, so only add the distance
        # if it's different than the default value of +inf
        if d_min != math.inf:
            L.append(d_min)
    return L
Do you know if there is a built-in function (maybe in NumPy, SciPy, SciKit, etc.) which can replace the one I wrote, or if there is a faster/vectorized and more Pythonic way to solve the problem?
I think using scipy.spatial.KDTree is perfect for this.
import numpy as np
from scipy.spatial import KDTree

nonzeros = np.transpose(np.nonzero(A))
t = KDTree(nonzeros)
dists, nns = t.query(nonzeros, 2)
for (i, j), d in zip(nns, dists[:, 1]):
    print(nonzeros[i], "is closest to", nonzeros[j], "with distance", d)
Result:
[0 3] is closest to [0 5] with distance 2.0
[0 5] is closest to [0 3] with distance 2.0
[2 4] is closest to [0 5] with distance 2.23606797749979
[3 1] is closest to [4 3] with distance 2.23606797749979
[4 3] is closest to [3 1] with distance 2.23606797749979
[4 6] is closest to [2 4] with distance 2.8284271247461903
[5 0] is closest to [3 1] with distance 2.23606797749979
[6 4] is closest to [4 3] with distance 2.23606797749979
[7 2] is closest to [6 4] with distance 2.23606797749979
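To get exactly the list L from the question, take the second column of dists (the first column is each point's zero distance to itself):
L = dists[:, 1].tolist()
# [2.0, 2.0, 2.236..., 2.236..., 2.236..., 2.828..., 2.236..., 2.236..., 2.236...]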
This uses numpy; there could well be other functions that can streamline this.
import numpy as np
A = np.array([[0, 0, 0, 1, 0, 2, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 3, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0]])
x, y = np.where( A != 0 )
print( x,'\n', y)
# [0 0 2 3 4 4 5 6 7] # x coords
# [3 5 4 1 3 6 0 4 2] # y coords
diff_x = np.subtract.outer(x, x)  # differences of all x from each x
diff_y = np.subtract.outer(y, y)  # all y from each y
distance = np.sqrt(diff_x * diff_x + diff_y * diff_y)

# Or using complex numbers
# point = x + y * 1j                                # x and y in the complex plane
# distance = abs(np.subtract.outer(point, point))   # Euclidean distances from each point to all points

distance[distance == 0] = distance.max()  # or use np.diagonal to remove zeroes.
ind = np.argmin(distance, axis=1)  # indices of minimums
for i, ix in enumerate(ind):
    print(x[i], y[i], 'close to', x[ix], y[ix], 'distance = ', distance[i, ix])
This produces:
0 3 close to 0 5 distance = 2.0
0 5 close to 0 3 distance = 2.0
2 4 close to 0 3 distance = 2.23606797749979
3 1 close to 4 3 distance = 2.23606797749979
4 3 close to 2 4 distance = 2.23606797749979
4 6 close to 2 4 distance = 2.8284271247461903
5 0 close to 3 1 distance = 2.23606797749979
6 4 close to 4 3 distance = 2.23606797749979
7 2 close to 6 4 distance = 2.23606797749979
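As the comment above hints, the self-distances can also be masked out on the diagonal instead of overwriting the zeros with the maximum (a small variation on the same distance matrix):
np.fill_diagonal(distance, np.inf)  # ignore each point's distance to itself
ind = np.argmin(distance, axis=1)   # index of the nearest other point
L = distance[np.arange(len(ind)), ind].tolist()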

Do you have some advices about signal processing on binary time series?

I have a binary time series with some ASK modulated signals in different frequencies inside of it.
Let's say it's something like this: x = [0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0, ...]
What matters to me is having all the '1' and '0' in an interval of 4 samples or more, but sometimes the '0' and '1' change places, like this: x1 = [0,0,0,1,1,1,1,1] when it should have been x2 = [0,0,0,0,1,1,1,1]
And there's also some noise in the form of spikes, as seen in n1 = [0,0,0,0,0,0,1,1,0,0,0,0,0], which should be only zeros.
I've already tried a moving average, but it introduced a lag to the signal that wasn't good for my application.
Do you have some advices about signal processing on binary time series?
The following code finds the indices of all continuous runs shorter than 4 samples (min_cont_length). It also gives you the lengths of the problematic sections, so you can decide how to handle them.
import numpy as np

def find_index_of_err(signal, min_cont_length=4):
    # pad both sides to also detect problems at the edges
    signal = np.concatenate(([1 - signal[0]], signal, [1 - signal[-1]]))
    # calculate differences from one element to the next
    delta = np.concatenate(([0], np.diff(signal, 1)))
    # detect discontinuities
    discontinuity = np.where(delta != 0)[0]
    # select discontinuities with matching length (< min_cont_length)
    err_idx = discontinuity[:-1][np.diff(discontinuity) < min_cont_length] - 1
    # get also the size of the gap
    err_val = np.diff(discontinuity)[np.argwhere(np.diff(discontinuity) < min_cont_length).flatten()]
    return err_idx, err_val
# some test signals
signals = np.array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
                    [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]])

for sig in signals:
    index, value = find_index_of_err(sig)
    print(sig, index, value)
# Output:
# [1 0 0 0 0 0 0 0 0 0 0] [0] [1]
# [0 0 1 0 0 0 0 0 0 0 0] [0 2] [2 1]
# [0 0 0 0 1 0 0 0 0 0 0] [4] [1]
# [0 0 0 0 0 0 1 1 0 0 0] [6 8] [2 3]
# [0 0 0 0 0 0 1 1 1 1 1] [] []
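One way to use those indices (a sketch, not part of the original answer; repair_short_runs is a hypothetical helper and flipping the flagged runs is just one possible policy):
def repair_short_runs(signal, min_cont_length=4):
    # invert every run shorter than min_cont_length so it merges with its
    # neighbours; builds on find_index_of_err defined above
    signal = np.asarray(signal).copy()
    err_idx, err_val = find_index_of_err(signal, min_cont_length)
    for start, length in zip(err_idx, err_val):
        signal[start:start + length] = 1 - signal[start:start + length]
    return signal

print(repair_short_runs(np.array([0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0])))
# [0 0 0 0 0 0 0 0 0 0 0 0 0]   (the two-sample spike is removed)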

Generate Parity-check matrix from Generator matrix

Is there a function in numpy, or another well-tested function, to calculate the parity-check matrix (https://en.wikipedia.org/wiki/Parity-check_matrix)
from a generator matrix?
P.S.
I did not find a solution on this site.
If my understanding is correct, the parity-check matrix is the null space of the generator matrix modulo 2. There is a null-space routine in SciPy, but it gives a non-integer (floating-point) null space. You can use SymPy instead, but it can be slow for big matrices.
"""
>>> np.set_string_function(str)
>>> h
[[0 1 1 1 1 0 0]
[1 0 1 1 0 1 0]
[1 1 0 1 0 0 1]]
>>> (g @ h.T) % 2
[[0 0 0]
[0 0 0]
[0 0 0]
[0 0 0]]
"""
import sympy
import numpy as np
g = np.array([[1, 1, 1, 0, 0, 0, 0],
[1, 0, 0, 1, 1, 0, 0],
[0, 1, 0, 1, 0, 1, 0],
[1, 1, 0, 1, 0, 0, 1]])
h = np.array(sympy.Matrix(g).nullspace()) % 2
where h is the parity-check matrix.
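If SymPy is too slow for large matrices, the null space can also be computed directly over GF(2) with plain numpy row reduction. The gf2_nullspace helper below is a sketch of that idea (not from the original answer); for the g above it reproduces the same h:
import numpy as np

def gf2_nullspace(g):
    # row-reduce g over GF(2) (XOR is addition mod 2) and read off a basis
    # of the null space from the free columns
    r = np.array(g, dtype=np.uint8) % 2
    rows, cols = r.shape
    pivots = []
    row = 0
    for col in range(cols):
        if row >= rows:
            break
        nz = np.nonzero(r[row:, col])[0]
        if nz.size == 0:
            continue                   # no pivot in this column
        p = row + nz[0]
        r[[row, p]] = r[[p, row]]      # swap the pivot row into place
        elim = np.nonzero(r[:, col])[0]
        elim = elim[elim != row]
        r[elim] ^= r[row]              # clear the column everywhere else
        pivots.append(col)
        row += 1
    free = [c for c in range(cols) if c not in pivots]
    basis = []
    for f in free:
        v = np.zeros(cols, dtype=np.uint8)
        v[f] = 1
        for i, c in enumerate(pivots):
            v[c] = r[i, f]             # back-substitute the pivot variables
        basis.append(v)
    return np.array(basis)

h2 = gf2_nullspace(g)
print((g @ h2.T) % 2)  # all zeros again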
