I have an array:
arr = np.array([1,2,3,2,3,4,3,2,1,2,3,1,2,3,2,2,3,4,2,1])
print (arr)
[1 2 3 2 3 4 3 2 1 2 3 1 2 3 2 2 3 4 2 1]
I would like to find this pattern and return a boolean mask:
pat = [1,2,3]
N = len(pat)
I use strides:
#https://stackoverflow.com/q/7100242/2901002
def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    c = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
    return c
print (rolling_window(arr, N))
[[1 2 3]
[2 3 2]
[3 2 3]
[2 3 4]
[3 4 3]
[4 3 2]
[3 2 1]
[2 1 2]
[1 2 3]
[2 3 1]
[3 1 2]
[1 2 3]
[2 3 2]
[3 2 2]
[2 2 3]
[2 3 4]
[3 4 2]
[4 2 1]]
I find the positions of the first values only:
b = np.all(rolling_window(arr, N) == pat, axis=1)
c = np.mgrid[0:len(b)][b]
print (c)
[ 0 8 11]
And the positions of the other values:
d = [i for x in c for i in range(x, x+N)]
print (d)
[0, 1, 2, 8, 9, 10, 11, 12, 13]
Finally, I build the mask with in1d:
e = np.in1d(np.arange(len(arr)), d)
print (e)
[ True True True False False False False False True True
True True True True False False False False False False]
Verify mask:
print (np.vstack((arr, e)))
[[1 2 3 2 3 4 3 2 1 2 3 1 2 3 2 2 3 4 2 1]
[1 1 1 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0]]
1 2 3 1 2 3 1 2 3
I think my solution is a bit over-complicated. Is there a better, more Pythonic solution?
We can simplify things at the end with SciPy's binary dilation -
from scipy.ndimage import binary_dilation
m = (rolling_window(arr, len(pat)) == pat).all(1)
m_ext = np.r_[m,np.zeros(len(arr) - len(m), dtype=bool)]
out = binary_dilation(m_ext, structure=[1]*N, origin=-(N//2))
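As a NumPy-only variant of the same idea, the window starts can be expanded into a mask by broadcasting them against the offsets 0..N-1. This is a minimal sketch, not the approach above: `pattern_mask` is a hypothetical helper name, and it assumes NumPy 1.20 or newer for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def pattern_mask(arr, pat):
    """Boolean mask marking every element that belongs to an occurrence of pat."""
    arr = np.asarray(arr)
    pat = np.asarray(pat)
    n = len(pat)
    mask = np.zeros(len(arr), dtype=bool)
    if n == 0 or n > len(arr):
        return mask
    # Start index of every window that equals the pattern
    starts = np.flatnonzero((sliding_window_view(arr, n) == pat).all(axis=1))
    # Expand each start into the n positions it covers and mark them
    mask[(starts[:, None] + np.arange(n)).ravel()] = True
    return mask

arr = np.array([1,2,3,2,3,4,3,2,1,2,3,1,2,3,2,2,3,4,2,1])
mask = pattern_mask(arr, [1, 2, 3])
print(np.flatnonzero(mask))
```

Overlapping matches are handled naturally, since marking a position twice has no extra effect.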
For performance, we can bring in OpenCV with its template matching capability, as we are basically doing the same here, like so -
import cv2
tol = 1e-5
pat_arr = np.asarray(pat, dtype='uint8')
m = (cv2.matchTemplate(arr.astype('uint8'),pat_arr,cv2.TM_SQDIFF) < tol).ravel()
Not sure how safe this is, but another method would be to write back through an as_strided view of the boolean output. As long as you only have one pat at a time it shouldn't be a problem, I think. It may work with more, but I can't guarantee it, because writing through an as_strided view can be a bit unpredictable:
def vview(a):  # based on @jaime's answer: https://stackoverflow.com/a/16973510/4427777
    return np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
def roll_mask(arr, pat):
    pat = np.atleast_2d(pat)
    out = np.zeros_like(arr).astype(bool)
    vout = rolling_window(out, pat.shape[-1])
    vout[np.in1d(vview(rolling_window(arr, pat.shape[-1])), vview(pat))] = True
    return out
np.where(roll_mask(arr, pat))
(array([ 0, 1, 2, 8, 9, 10, 11, 12, 13], dtype=int32),)
pat = np.array([[1, 2, 3], [3, 2, 3]])
print([i for i in arr[roll_mask(arr, pat)]])
[1, 2, 3, 2, 3, 1, 2, 3, 1, 2, 3]
It seems to work, but I wouldn't give this answer to a beginner!
I want to create a whirlpool pattern using numpy, but I'm not sure about the approach.
The whirlpool starts from 0 at the center of an array. Every layer of the whirlpool is incremented by 1. The last layer of the whirlpool can have any number, but only between 1 and 10.
The image below might help with understanding:
I want to create a function that generates such whirlpool patterns given the digit to be used in the last layer. The last layer of the whirlpool should only allow numbers between 1 and 10 (inclusive). This should not be hardcoded.
Very short and concise:
def whirlpool(n):
    center = np.abs(np.arange(-n, n + 1))
    return np.maximum.outer(center, center)
I'll take the liberty of calling this an onion instead of a whirlpool since this matrix has concentric layers instead of a spiral structure.
import numpy as np
def makeOnion(final_layer_num):
    dim = 2 * final_layer_num + 1
    matrix = []
    for row_num in range(dim):
        row = []
        for col_num in range(dim):
            row_centrality = abs(row_num - final_layer_num)
            col_centrality = abs(col_num - final_layer_num)
            row.append(max(row_centrality, col_centrality))
        matrix.append(row)
    return np.array(matrix)
If you calculate the "distance" of a row (or column) from the center, it will help with this problem, hence row_centrality and col_centrality. I just made those terms up though, maybe there are better ones. Whatever the case, the max between row & column centrality for a given entry in the matrix is equal to the layer that it is in.
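The centrality idea can also be written with broadcasting instead of nested loops. The sketch below is one possible version; `onion_checked` is a made-up name, and the 1 to 10 range check is taken from the question's requirement rather than from the code above.

```python
import numpy as np

def onion_checked(final_layer_num):
    """Concentric layers 0..final_layer_num, rejecting values outside 1..10."""
    if not 1 <= final_layer_num <= 10:
        raise ValueError("final layer must be between 1 and 10 (inclusive)")
    # Distance of every row/column index from the center index
    centrality = np.abs(np.arange(2 * final_layer_num + 1) - final_layer_num)
    # The layer of a cell is the larger of its row and column centrality
    return np.maximum(centrality[:, None], centrality[None, :])

print(onion_checked(3))
```

The maximum of the two distances is the Chebyshev distance from the center, which is why the layers come out as square rings.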
I came up with this function, where n is the digit in the outer layer:
import numpy as np
def whirlpool(n):
    m = n*2+1
    arr = np.full((m, m), n)
    for l in range(1, n+1):
        arr[l:m-l, l:m-l] = np.full((m-l*2, m-l*2), n-l)
    return arr
whirlpool(3)
Out:
array([[3, 3, 3, 3, 3, 3, 3],
[3, 2, 2, 2, 2, 2, 3],
[3, 2, 1, 1, 1, 2, 3],
[3, 2, 1, 0, 1, 2, 3],
[3, 2, 1, 1, 1, 2, 3],
[3, 2, 2, 2, 2, 2, 3],
[3, 3, 3, 3, 3, 3, 3]])
I simply create a full 2D array initialized with the maximum value (the outermost layer is "done" from the beginning), then overwrite each inner layer with the next lower value in a loop. This way the result array itself is allocated only once.
I made two functions, where one takes the number of layers as input and the other takes the maximum value as input. As you can see, converting between them is quite simple:
import numpy as np
def createOnionFromNumberOfLayers(layers: int):
    dim = layers * 2 - 1
    onion = np.full((dim, dim), layers - 1)
    for i in range(1, layers):
        slice_ = slice(i, dim - i)
        onion[slice_, slice_] -= 1
    return onion
def createOnionFromMaxValue(maxval: int):
    return createOnionFromNumberOfLayers(maxval+1)
if __name__ == '__main__':
    onion = createOnionFromNumberOfLayers(3)
    print('Given number of layers:\n', onion, '\n')
    onion = createOnionFromMaxValue(4)
    print('Given max value:\n',onion)
Output:
Given number of layers:
[[2 2 2 2 2]
[2 1 1 1 2]
[2 1 0 1 2]
[2 1 1 1 2]
[2 2 2 2 2]]
Given max value:
[[4 4 4 4 4 4 4 4 4]
[4 3 3 3 3 3 3 3 4]
[4 3 2 2 2 2 2 3 4]
[4 3 2 1 1 1 2 3 4]
[4 3 2 1 0 1 2 3 4]
[4 3 2 1 1 1 2 3 4]
[4 3 2 2 2 2 2 3 4]
[4 3 3 3 3 3 3 3 4]
[4 4 4 4 4 4 4 4 4]]
Assume I have specific slot proportions, proportion = [30,30,20,10,10], and I want to feed them 1 element at a time and have the elements allocated one by one. For example, we start with [0,0,0,0,0] and add 1 to get [1,0,0,0,0]. What I have so far is this (based on this post answer):
def distribute_elements_in_slots(total, slots, pct):
    distr = [total * pct[i] / 100 for i in range(slots)]
    solid = [int(elem) for elem in distr]
    short = [distr[i] - solid[i] for i in range(slots)]
    leftover = int(round(sum(short)))
    for i in range(leftover):
        shortest = short.index(max(short))
        solid[shortest] += 1
        short[shortest] = 0
    return solid
To feed 1 element at a time, I've generated a list of 1's:
import random
randomlist = []
for i in range(0,30):
    n = random.randint(1,1)
    randomlist.append(n)
print(randomlist)
And an additional loop over that list:
x = 5
flexibility = [30, 30, 20, 10 ,10]
total = 0
cars = 0
for n in randomlist:
    cars += 1
    total += n
    distributed = distribute_elements_in_slots(total, x, flexibility)
    print(distributed)
But the problem is that this function does not remember the previous step.
1-[1, 0, 0, 0, 0]
2-[1, 1, 0, 0, 0]
3-[1, 1, 1, 0, 0]
4-[1, 1, 1, 1, 0] - on this step we have 4 elements in 4 slots.
5-[2, 2, 1, 0, 0] - on this step we took 1 from the fourth element and "gave" it to second.
But I want it to be like this:
1-[1, 0, 0, 0, 0]
2-[1, 1, 0, 0, 0]
3-[1, 1, 1, 0, 0]
4-[1, 1, 1, 1, 0]
5-[2, 1, 1, 1, 0]
This simple code gives a slot filling sequence without reallocations:
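import numpy as np
slots = 5
dist = np.array([0]*slots)
proportion = np.array([30,30,20,10,10])
for i in range(0,30):
    total = max(dist.sum(),1)      # avoid dividing by zero on the first step
    prop = dist/total*100          # current share of each slot, in percent
    error = proportion - prop      # how far each slot is below its target
    idx = np.argmax(error)         # slot with the largest shortfall
    dist[idx] += 1
    print(dist)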
[1 0 0 0 0]
[1 1 0 0 0]
[1 1 1 0 0]
[1 1 1 1 0]
[1 1 1 1 1]
[2 1 1 1 1]
[2 2 1 1 1]
[2 2 2 1 1]
[3 2 2 1 1]
[3 3 2 1 1]
[4 3 2 1 1]
[4 4 2 1 1]
[4 4 3 1 1]
[4 4 3 2 1]
[4 4 3 2 2]
[5 4 3 2 2]
[5 5 3 2 2]
[5 5 4 2 2]
[6 5 4 2 2]
[6 6 4 2 2]
[7 6 4 2 2]
[7 7 4 2 2]
[7 7 5 2 2]
[7 7 5 3 2]
[7 7 5 3 3]
[8 7 5 3 3]
[8 8 5 3 3]
[8 8 6 3 3]
[9 8 6 3 3]
[9 9 6 3 3]
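The loop above can also be wrapped in a generator so that the state is carried from one step to the next and each intermediate allocation can be consumed on demand. This is only a sketch of the same greedy rule; `fill_slots` is a hypothetical name, not something from the original answer.

```python
import numpy as np

def fill_slots(proportion, steps):
    """Yield the slot counts after each added element (greedy, no reallocation)."""
    proportion = np.asarray(proportion, dtype=float)
    dist = np.zeros(len(proportion), dtype=int)
    for _ in range(steps):
        total = max(dist.sum(), 1)               # avoid dividing by zero at the start
        error = proportion - dist / total * 100  # shortfall of each slot in percent
        dist[np.argmax(error)] += 1              # give the element to the neediest slot
        yield dist.copy()                        # copy so callers keep a snapshot

history = list(fill_slots([30, 30, 20, 10, 10], 30))
for step, counts in enumerate(history[:5], start=1):
    print(step, counts)
```

Because an element is only ever added, counts never decrease between steps, which is the property the question asks for.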
I'm looking to speed up my code that takes ~80 milliseconds for 300 sets to generate multiset_permutations from sympy. Ideally this would take only a few milliseconds; also the more items, the slower it gets.
What can I do to make my code faster? Multi-threading? Or convert to C? Any help here on speeding this up would be greatly appreciated.
import numpy as np
from time import monotonic
from sympy.utilities.iterables import multiset_permutations
milli_time = lambda: int(round(monotonic() * 1000))
start_time = milli_time()
num_indices = 5
num_items = 300
indices = np.array([list(multiset_permutations(list(range(num_indices)))) for _ in range(num_items)])
print(indices)
[[[0 1 2 3 4]
[0 1 2 4 3]
[0 1 3 2 4]
...
[4 3 1 2 0]
[4 3 2 0 1]
[4 3 2 1 0]]
[[0 1 2 3 4]
[0 1 2 4 3]
[0 1 3 2 4]
...
[4 3 1 2 0]
[4 3 2 0 1]
[4 3 2 1 0]]
[[0 1 2 3 4]
[0 1 2 4 3]
[0 1 3 2 4]
...
[4 3 1 2 0]
[4 3 2 0 1]
[4 3 2 1 0]]
...
[[0 1 2 3 4]
[0 1 2 4 3]
[0 1 3 2 4]
...
[4 3 1 2 0]
[4 3 2 0 1]
[4 3 2 1 0]]
[[0 1 2 3 4]
[0 1 2 4 3]
[0 1 3 2 4]
...
[4 3 1 2 0]
[4 3 2 0 1]
[4 3 2 1 0]]
[[0 1 2 3 4]
[0 1 2 4 3]
[0 1 3 2 4]
...
[4 3 1 2 0]
[4 3 2 0 1]
[4 3 2 1 0]]]
print('Multiset Perms:', milli_time() - start_time, 'milliseconds')
Multiset Perms: 88 milliseconds
** Code Update to Reduce extra computations by 2/3 **
import itertools
import numpy as np
from time import time, monotonic
from sympy.utilities.iterables import multiset_permutations
milli_time = lambda: int(round(monotonic() * 1000))
start_time = milli_time()
num_colors = 5
color_range = list(range(num_colors))
total_media = 300
def all_perms(elements):
    if len(elements) <= 1:
        yield elements  # Only permutation possible = no permutation
    else:
        # Iteration over the first element in the result permutation:
        for (index, first_elmt) in enumerate(elements):
            other_elmts = elements[:index]+elements[index+1:]
            for permutation in all_perms(other_elmts):
                yield [first_elmt] + permutation
multiset = list(multiset_permutations(color_range))
# multiset = list(itertools.permutations(color_range))
# multiset = list(all_perms(color_range))
_range = range(total_media)
perm_indices = np.array([multiset for _ in _range])
print('Multiset Perms:', milli_time() - start_time)
Multiset Perms: 34 milliseconds
First of all, you do not need to recompute the permutations.
Moreover, np.array([multiset for _ in _range]) is expensive because NumPy has to convert multiset total_media times. You can avoid that with np.array([multiset]).repeat(total_media, axis=0).
Finally, sympy is not the fastest implementation for such a computation. A faster option is to use itertools instead:
import itertools
import numpy as np
num_colors = 5
total_media = 300
color_range = list(range(num_colors))
multiset = list(set(itertools.permutations(color_range)))
perm_indices = np.array([multiset], dtype=np.int32).repeat(total_media, axis=0)
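However, because of the set, this itertools-based implementation does not preserve the order of the permutations. If this is important, you can sort the list before converting it to a NumPy array (for example with sorted(multiset)) and before applying repeat. Note that np.sort along a single axis would reorder the values independently and break the permutations.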
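If the items are all distinct, as they are for range(num_colors), the set is not needed at all: itertools.permutations then yields each permutation exactly once, in lexicographic order for sorted input. A minimal sketch under that assumption (variable names follow the snippet above; `perms` is introduced here for illustration):

```python
import itertools
import numpy as np

num_colors = 5
total_media = 300

# Distinct, sorted input: 120 unique permutations in lexicographic order
perms = np.array(list(itertools.permutations(range(num_colors))), dtype=np.int32)

# Convert once, then repeat along a new leading axis
perm_indices = np.repeat(perms[None, :, :], total_media, axis=0)
print(perm_indices.shape)
```

With repeated items this would produce duplicate rows, so the multiset case still needs the set (or sympy's multiset_permutations).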
On my machine, this takes about 0.15 ms.
I'm trying to change values in matrix a using the index matrices d and e.
The matrix should always stay symmetrical.
What I came up with is to overwrite the primal matrix at the given indices, make it symmetrical, then do the next overwrite, until all the index matrices have been processed. It's not efficient.
But I'm stuck on how to make it symmetrical.
For example:
a = np.ones([4,4],dtype=object) #the primal matrix
d = np.array([[1],
[2],
[0],
[0]]) #the first index matrix
a[np.arange(a.shape[0])[:,None],d] =2 #the element change to 2 with the indexes shown in d matrix
Now the result is:
a = np.array([[1 2 1 1]
[1 1 2 1]
[2 1 1 1]
[2 1 1 1]])
After making it symmetrical (if a[ i ][ j ] was selected in the d matrix, a[ j ][ i ] should also be changed to 2; this is the part I don't know how to do),
the expected output should be:
a = np.array([[1 2 2 2]
[2 1 2 1]
[2 2 1 1]
[2 1 1 1]])
Then, for another overwrite again:
e = np.array([[0],[2],[1],[1]])
a[np.arange(a.shape[0])[:,None],e] =3
Now the result is:
a = np.array([[3 2 2 2]
[2 1 3 1]
[2 3 1 1]
[2 3 1 1]])
After making it symmetrical (again, I don't know how to do this part), the final output should be as follows (values that were 1 or 2 before are overwritten):
a = np.array([[3 2 2 2]
[2 1 3 3]
[2 3 1 1]
[2 3 1 1]])
What should I do to get a symmetrical matrix?
And is there any way to change the primal matrix a directly to get the final result, in a more efficient way?
Thanks in advance !!
You can simply swap the first and second indices and apply the same change; the result will be symmetrical:
a[np.arange(a.shape[0])[:,None], d] = 2
a[d, np.arange(a.shape[0])[:,None]] = 2
output:
[[1 2 2 2]
[2 1 2 1]
[2 2 1 1]
[2 1 1 1]]
Same with any number of other changes:
a[np.arange(a.shape[0])[:,None], e] = 3
a[e, np.arange(a.shape[0])[:,None]] = 3
output:
[[3 2 2 2]
[2 1 3 3]
[2 3 1 1]
[2 3 1 1]]
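The two assignments can be bundled into a small helper so that every overwrite keeps the matrix symmetrical. This is a sketch only; `set_symmetric` is a hypothetical name, and an integer matrix is used here instead of the object dtype from the question.

```python
import numpy as np

def set_symmetric(a, idx, value):
    """Set a[i, idx[i]] and the mirrored a[idx[i], i] to value, in place."""
    rows = np.arange(a.shape[0])[:, None]
    a[rows, idx] = value   # the requested positions
    a[idx, rows] = value   # their mirror images across the diagonal
    return a

a = np.ones((4, 4), dtype=int)
d = np.array([[1], [2], [0], [0]])
e = np.array([[0], [2], [1], [1]])
set_symmetric(a, d, 2)
set_symmetric(a, e, 3)
print(a)
```

Later calls overwrite earlier ones, which matches the expected output in the question.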
I have a 2D coefficient array COEFF with size row x col and a position array POS with size n x 2.
The goal is to create a batched array BAT with size n x (2*l) x (2*l), where l is the half length of each subarray.
It looks like this:
BAT[i, :, :] = COEFF[POS[i, 1] - l:POS[i, 1] + l, POS[i, 0] - l:POS[i, 0] + l]
It is possible to generate BAT with the sequential code above. However, I'm wondering whether there is an efficient way to construct the BAT array in parallel.
Thanks!
I'm not aware of a perfectly satisfactory solution to mixing advanced indexing and slicing in that way. But the following may be acceptable (assuming that by "parallel" you mean "vectorised"):
import numpy as np
nrow, ncol = 7, 7
n, l = 3, 2
coeff = np.random.randint(0,10, (nrow,ncol))
pos = np.c_[np.random.randint(l, nrow-l+1, (n,)),np.random.randint(l, ncol-l+1, (n,))]
i = (pos[:, :1] + np.arange(-l, l))[:, :, None]
j = (pos[:, 1:] + np.arange(-l, l))[:, None, :]
print(coeff, '\n')
print(pos, '\n')
print(coeff[i, j])
Prints:
# [[7 6 7 6 3 9 9]
# [3 6 8 3 4 8 6]
# [3 7 4 7 4 6 8]
# [0 7 2 3 7 0 4]
# [8 5 2 0 0 1 7]
# [4 6 1 9 4 5 4]
# [1 6 8 3 4 5 0]]
# [[2 2]
# [3 2]
# [2 4]]
# [[[7 6 7 6]
# [3 6 8 3]
# [3 7 4 7]
# [0 7 2 3]]
# [[3 6 8 3]
# [3 7 4 7]
# [0 7 2 3]
# [8 5 2 0]]
# [[7 6 3 9]
# [8 3 4 8]
# [4 7 4 6]
# [2 3 7 0]]]
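The same indexing trick can be wrapped in a function and checked against the sequential loop. Note the assumption made here: this sketch follows the question's convention, where POS[i, 1] is the row and POS[i, 0] is the column, which is the reverse of the demo above. `extract_patches` is a hypothetical name.

```python
import numpy as np

def extract_patches(coeff, pos, l):
    """Return BAT with BAT[i] = coeff[pos[i,1]-l:pos[i,1]+l, pos[i,0]-l:pos[i,0]+l]."""
    offsets = np.arange(-l, l)
    rows = (pos[:, 1, None] + offsets)[:, :, None]   # shape (n, 2l, 1)
    cols = (pos[:, 0, None] + offsets)[:, None, :]   # shape (n, 1, 2l)
    return coeff[rows, cols]                          # broadcasts to (n, 2l, 2l)

coeff = np.arange(49).reshape(7, 7)
pos = np.array([[2, 3], [4, 2], [5, 5]])
l = 2
bat = extract_patches(coeff, pos, l)

# Reference: the sequential version from the question
ref = np.stack([coeff[y - l:y + l, x - l:x + l] for x, y in pos])
print(np.array_equal(bat, ref))
```

Positions are assumed to lie at least l away from the lower edges and at most size - l from the upper edges; out-of-range positions would wrap around or raise instead of being clipped.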