array[5, :, 10, 1] gives an output like this:
"0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0"
array[2, :, 4, 0]:
"22892. 22892. 22893. 22983. 22992. 22902. 22802. 22907. 21992.
21892. 23992. 24212. 22212. 22322. 23292. 27989. 22989. "
When array[5, :, 10, 1] is 0, I'd like to log the lowest value of array[2, :, 4, 0], but only until it switches to 1. When array[5, :, 10, 1] is 1, I'd like to log the highest value of array[2, :, 4, 0], but only until it switches back to 0. When the switch happens in array[5, :, 10, 1], I do not want to log the same values again. This will be logged to array[5, :, 12, 1].
The resulting array[5, :, 12, 1] column should look like this:
"22892, 22892, 22992, 22992, 22992, 22802, 22802, 23992, 23992, 23992, 23992, 22212, 22212, 22212, 22212, 22212, 22212"
This is the block of code that I tried, but the problem I run into is that it seems a lot of values are missed. A good portion of the sub-array will be filled with the proper values, but sometimes values that aren't even in the sample data are filled in, or it prints inf/-inf. The way the sample data is set up should ensure that every single row is filled.
# Define the target index
target_index = (5, slice(None), 10, 1)
# Initialize the output array
output_array = np.empty(array[target_index].shape)
# Initialize the current_min and current_max
current_min = float('inf')
current_max = float('-inf')
# Initialize the switch_flag
switch_flag = False
# Iterate through the target array
for i, value in enumerate(array[target_index]):
    if switch_flag:
        if value == 0:
            # Switch from 1 to 0
            output_array[i] = current_max
            current_max = float('-inf')
            switch_flag = False
        else:
            output_array[i] = output_array[i-1]
    else:
        if value == 1:
            # Switch from 0 to 1
            output_array[i] = current_min
            current_min = float('inf')
            switch_flag = True
        else:
            output_array[i] = output_array[i-1]
    # Update the current_min and current_max
    current_min = min(current_min, array[2, i, 4, 0])
    current_max = max(current_max, array[2, i, 4, 0])
# Assign the output_array to the target index
array[5, :, 12, 1] = output_array
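For reference, here is a minimal sketch of one way to get the run-based result described above: split the flag column into runs of equal values and fill each run with that run's minimum (flag 0) or maximum (flag 1). It assumes the arrays and index tuples from the question and is not the original code:

flags = array[5, :, 10, 1]    # the 0/1 switch column
values = array[2, :, 4, 0]    # the column whose min/max should be logged
out = np.empty_like(values)

start = 0
for end in range(1, len(flags) + 1):
    # a run ends when the flag changes or the data runs out
    if end == len(flags) or flags[end] != flags[start]:
        run = values[start:end]
        out[start:end] = run.max() if flags[start] == 1 else run.min()
        start = end

array[5, :, 12, 1] = out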
def create_matrix(xy):
    matrix = []
    matrix_y = []
    x = xy[0]
    y = xy[1]
    for z in range(y):
        matrix_y.append(0)
    for n in range(x):
        matrix.append(matrix_y)
    return matrix

def set_matrix(matrix, xy, set):
    x = xy[0]
    y = xy[1]
    matrix[x][y] = set
    return matrix
index = [4,5]
index_2 = [3,4]
z = create_matrix(index)
z = set_matrix(z,index_2, 12)
print(z)
output:
[[0, 0, 0, 0, 12], [0, 0, 0, 0, 12], [0, 0, 0, 0, 12], [0, 0, 0, 0, 12]]
This code should change only the last inner array.
In your for n in range(x): loop you are appending the same y list multiple times. Python under the hood does not copy that list, but uses a pointer, so you end up with a row of pointers to one and the same list.
Move the matrix_y = [] setup inside the n loop and you get unique y lists.
Comment: Python does not actually expose a pointer concept, but it does use pointers internally. It hides from you when it copies data and when it only copies a pointer to that data. That's arguably bad language design, and it tripped you up here. So now you know that pointers exist, and that most of the time when you "assign arrays" you are actually only setting a pointer.
Another comment: if you are going to be doing anything serious with matrices, you should really look into numpy. It will be many times faster for numerical computations.
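For illustration, a minimal sketch of create_matrix with the fix described above (a fresh inner list is built on every iteration of the n loop):

def create_matrix(xy):
    x, y = xy
    matrix = []
    for n in range(x):
        matrix_y = []              # new list for every row, so rows are independent
        for z in range(y):
            matrix_y.append(0)
        matrix.append(matrix_y)
    return matrix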
You don't need the first loop in create_matrix; comment it out:

# for z in range(y):
#     matrix_y.append(0)

Change the second one like this; it appends a list of zeros of length y:

for n in range(x):
    matrix.append([0] * y)

Result (only the last cell of the matrix was changed):
z = set_matrix(z,index_2, 12)
print(z)
# [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 12]]
Given a boundary value k, is there a vectorized way to replace each number n with consecutive descending numbers from n-1 down to k? For example, if k is 0 then I'd like to replace np.array([3,4,2,2,1,3,1]) with np.array([2,1,0,3,2,1,0,1,0,1,0,0,2,1,0,0]). Every item of the input array is greater than k.
I have tried a combination of np.repeat and np.cumsum, but it feels like an evasive solution:
x = np.array([3,4,2,2,1,3,1])
y = np.repeat(x, x)
t = -np.ones(y.shape[0])
t[np.r_[0, np.cumsum(x)[:-1]]] = x-1
np.cumsum(t)
Is there any other way? I expect something like an inverse of np.add.reduceat that broadcasts integers to decreasing sequences instead of reducing them.
Here's another way with array-assignment to skip the repeat part -
def func1(a):
    l = a.sum()
    out = np.full(l, -1, dtype=int)
    out[0] = a[0]-1
    idx = a.cumsum()[:-1]
    out[idx] = a[1:]-1
    return out.cumsum()
Benchmarking
# OP's soln
def OP(x):
    y = np.repeat(x, x)
    t = -np.ones(y.shape[0], dtype=int)
    t[np.r_[0, np.cumsum(x)[:-1]]] = x-1
    return np.cumsum(t)
Using the benchit package (a few benchmarking tools packaged together; disclaimer: I am its author) to benchmark the proposed solutions.
import benchit
a = np.array([3,4,2,2,1,3,1])
in_ = [np.resize(a,n) for n in [10, 100, 1000, 10000]]
funcs = [OP, func1]
t = benchit.timings(funcs, in_)
t.plot(logx=True, save='timings.png')
Extend to take k as arg
def func1(a, k):
    l = a.sum() + len(a)*(-k)
    out = np.full(l, -1, dtype=int)
    out[0] = a[0]-1
    idx = (a-k).cumsum()[:-1]
    out[idx] = a[1:]-1-k
    return out.cumsum()
Sample run -
In [120]: a
Out[120]: array([3, 4, 2, 2, 1, 3, 1])
In [121]: func1(a, k=-1)
Out[121]:
array([ 2, 1, 0, -1, 3, 2, 1, 0, -1, 1, 0, -1, 1, 0, -1, 0, -1,
2, 1, 0, -1, 0, -1])
This is concise and probably OK for efficiency; I don't think apply is vectorized here, so you will be limited mostly by the number of elements in the original array (less so by their values, is my guess):
import pandas as pd
x = np.array([3,4,2,2,1,3,1])
values = pd.Series(x).apply(lambda val: np.arange(val-1,-1,-1)).values
output = np.concatenate(values)
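For the sample x above, output matches the expected array from the question:

output
# array([2, 1, 0, 3, 2, 1, 0, 1, 0, 1, 0, 0, 2, 1, 0, 0])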
I have a square 2D numpy array, A, and an array of zeros, B, with the same shape.
For every index (i, j) in A, other than the first and last rows and columns, I want to assign to B[i, j] the value of np.sum(A[i - 1:i + 2, j - 1:j + 2]).
Example:
A =
array([[0, 0, 0, 0, 0],
       [0, 1, 0, 1, 0],
       [0, 1, 1, 0, 0],
       [0, 1, 0, 1, 0],
       [0, 0, 0, 0, 0]])
B =
array([[0, 0, 0, 0, 0],
       [0, 3, 4, 2, 0],
       [0, 4, 6, 3, 0],
       [0, 3, 4, 2, 0],
       [0, 0, 0, 0, 0]])
Is there an efficient way to do this? Or should I simply use a for loop?
There is a clever (read "borderline smartass") way to do this with np.lib.stride_tricks.as_strided. as_strided allows you to create views into your buffer that simulate windows by adding another dimension to the view. For example, if you had a 1D array like
>>> x = np.arange(10)
>>> np.lib.stride_tricks.as_strided(x, shape=(3, x.shape[0] - 2), strides=x.strides * 2)
array([[0, 1, 2, 3, 4, 5, 6, 7],
       [1, 2, 3, 4, 5, 6, 7, 8],
       [2, 3, 4, 5, 6, 7, 8, 9]])
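Continuing that example, a quick check of the windowed sums (w is just a name introduced here for the same strided view; each length-3 column is one window):

>>> w = np.lib.stride_tricks.as_strided(x, shape=(3, x.shape[0] - 2), strides=x.strides * 2)
>>> w.sum(axis=0)
array([ 3,  6,  9, 12, 15, 18, 21, 24])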
Hopefully it is clear that you can just sum along axis=0 to get the sum of each size-3 window. There is no reason you couldn't extend that to two or more dimensions. I've written the shape and index of the previous example in a way that suggests a solution:
A = np.array([[0, 0, 0, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 1, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 0, 0, 0]])
view = np.lib.stride_tricks.as_strided(A,
    shape=(3, 3, A.shape[0] - 2, A.shape[1] - 2),
    strides=A.strides * 2
)
B[1:-1, 1:-1] = view.sum(axis=(0, 1))
Summing along multiple axes simultaneously has been supported in np.sum since v1.7.0. For older versions of numpy, just sum repeatedly (twice) along axis=0.
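For instance, the equivalent two-step reduction on the view built above would be:

B[1:-1, 1:-1] = view.sum(axis=0).sum(axis=0)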
Filling in the edges of B is left as an exercise for the reader (since it's not really part of the question).
As an aside, the solution here is a one-liner if you want it to be. Personally, I think anything with as_strided is already illegible enough and doesn't need any further obfuscation. In fact, I'm not sure a plain for loop would be bad enough performance-wise to justify this method.
For future reference, here is a generic window-making function that can be used to solve this sort of problem:
def window_view(a, window=3):
    """
    Create a (read-only) view into `a` that defines window dimensions.

    The first ``a.ndim`` dimensions of the returned view will be sized according to `window`.
    The remaining ``a.ndim`` dimensions will be the original dimensions of `a`, truncated by `window - 1`.

    The result can be post-processed by reducing the leading dimensions. For example,
    a multi-dimensional moving average could look something like ::

        window_view(a, window).sum(axis=tuple(range(a.ndim))) / window**a.ndim

    If the window size were different for each dimension (`window` were a sequence
    rather than a scalar), the normalization would be ``np.prod(window)`` instead of
    ``window**a.ndim``.

    Parameters
    ----------
    a : array-like
        The array to window into. Due to numpy dimension constraints, can not have > 16 dims.
    window : int or sequence of int
        Either a scalar indicating the window size for all dimensions, or a sequence
        of length `a.ndim` providing one size for each dimension.

    Returns
    -------
    view : numpy.ndarray
        A read-only view into `a` whose leading dimensions represent the requested
        windows into `a`. ``view.ndim == 2 * a.ndim``.
    """
    a = np.array(a, copy=False, subok=True)
    window = np.array(window, copy=False, subok=False, dtype=int)
    if window.size == 1:
        window = np.full(a.ndim, window)
    elif window.size == a.ndim:
        window = window.ravel()
    else:
        raise ValueError('Number of window sizes must match number of array dimensions')
    shape = np.concatenate((window, a.shape))
    shape[a.ndim:] -= window - 1
    strides = a.strides * 2
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
I have found no 'simple' ways of doing this. But here are two ways:
The first still involves a for loop:
# Basically, get the sum for each location and then pad the result with 0's
B = [[np.sum(A[j-1:j+2,i-1:i+2]) for i in range(1,len(A)-1)] for j in range(1,len(A[0])-1)]
B = np.pad(B, ((1,1)), "constant", constant_values=(0))
The second is longer, but has no for loops (this will be a lot more efficient on big arrays):
# Roll basically slides the array in the desired direction
A_right = np.roll(A, -1, 1)
A_left = np.roll(A, 1, 1)
A_top = np.roll(A, 1, 0)
A_bottom = np.roll(A, -1, 0)
A_bot_right = np.roll(A_bottom, -1, 1)
A_bot_left = np.roll(A_bottom, 1, 1)
A_top_right = np.roll(A_top, -1, 1)
A_top_left = np.roll(A_top, 1, 1)
# After doing that, you can just add all those arrays and these operations
# are handled better directly by numpy compared to when you use for loops
B = A_right + A_left + A_top + A_bottom + A_top_left + A_top_right + A_bot_left + A_bot_right + A
# You can then return the edges to 0 or whatever you like
B[0:len(B),0] = 0
B[0:len(B),len(B[0])-1] = 0
B[0,0:len(B)] = 0
B[len(B[0])-1,0:len(B)] = 0
You can just sum the 9 arrays that make up a block, each one being shifted by 1 w.r.t. the previous in either dimension. Using slice notation this can be done for the whole array A at once:
B = np.zeros_like(A)
B[1:-1, 1:-1] = sum(A[i:A.shape[0]-2+i, j:A.shape[1]-2+j]
                    for i in range(0, 3) for j in range(0, 3))
General version for arbitrary rectangular windows
def sliding_window_sum(a, size):
    """Compute the sum of elements of a rectangular sliding window over the input array.

    Parameters
    ----------
    a : array_like
        Two-dimensional input array.
    size : int or tuple of int
        The size of the window in row and column dimension; if int then a quadratic window is used.

    Returns
    -------
    array
        Shape is ``(a.shape[0] - size[0] + 1, a.shape[1] - size[1] + 1)``.
    """
    if isinstance(size, int):
        size = (size, size)
    m = a.shape[0] - size[0] + 1
    n = a.shape[1] - size[1] + 1
    return sum(a[i:m+i, j:n+j] for i in range(0, size[0]) for j in range(0, size[1]))
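For instance, with the A and B from the question, the 3x3 window sums reproduce the interior of B:

sliding_window_sum(A, 3)
# array([[3, 4, 2],
#        [4, 6, 3],
#        [3, 4, 2]])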
I have this ugly, un-pythonic beast:
def crop(dat, clp=True):
    '''Crops zero-edges of an array and (optionally) clips it to [0,1].

    Example:
    >>> crop( np.array(
    ...     [[0,0,0,0,0,0],
    ...      [0,0,0,0,0,0],
    ...      [0,1,0,2,9,0],
    ...      [0,0,0,0,0,0],
    ...      [0,7,4,1,0,0],
    ...      [0,0,0,0,0,0]]
    ... ))
    array([[1, 0, 1, 1],
           [0, 0, 0, 0],
           [1, 1, 1, 0]])
    '''
    if clp: np.clip( dat, 0, 1, out=dat )
    while np.all( dat[0,:]==0 ):
        dat = dat[1:,:]
    while np.all( dat[:,0]==0 ):
        dat = dat[:,1:]
    while np.all( dat[-1,:]==0 ):
        dat = dat[:-1,:]
    while np.all( dat[:,-1]==0 ):
        dat = dat[:,:-1]
    return dat

    # Below gets rid of zero-lines/columns in the middle, so not usable:
    #dat = dat[~np.all(dat==0, axis=1)]
    #dat = dat[:, ~np.all(dat == 0, axis=0)]
How do I tame it, and make it beautiful?
Try incorporating something like this:
# argwhere will give you the coordinates of every non-zero point
true_points = np.argwhere(dat)
# take the smallest points and use them as the top left of your crop
top_left = true_points.min(axis=0)
# take the largest points and use them as the bottom right of your crop
bottom_right = true_points.max(axis=0)
out = dat[top_left[0]:bottom_right[0]+1,  # plus 1 because slice isn't
          top_left[1]:bottom_right[1]+1]  # inclusive
This could be expanded to the general n-d case without much difficulty.
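For example, a minimal n-d sketch of the same idea (crop_nd is a name introduced here, not from the original answer):

def crop_nd(dat):
    # coordinates of every non-zero point, one row per point
    true_points = np.argwhere(dat)
    top_left = true_points.min(axis=0)
    bottom_right = true_points.max(axis=0)
    # one slice per axis; +1 because slices are not inclusive
    return dat[tuple(slice(lo, hi + 1) for lo, hi in zip(top_left, bottom_right))]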
This should work in any number of dimensions. I believe it is also quite efficient, because swapping axes and slicing create only views on the array, not copies or temporaries (which rules out functions such as take() or compress(), which one might be tempted to use). However, it is not significantly 'nicer' than your own solution.
def crop2(dat, clp=True):
    if clp: np.clip( dat, 0, 1, out=dat )
    for i in range(dat.ndim):
        dat = np.swapaxes(dat, 0, i)  # send i-th axis to front
        while np.all( dat[0]==0 ):
            dat = dat[1:]
        while np.all( dat[-1]==0 ):
            dat = dat[:-1]
        dat = np.swapaxes(dat, 0, i)  # send i-th axis to its original position
    return dat
Definitely not the prettiest approach, but I wanted to try something else.
def _fill_gap(a):
    """
    a = 1D array of `True`s and `False`s.
    Fill the gap between first and last `True` with `True`s.
    Doesn't do a copy of `a` but in this case it isn't really needed.
    """
    a[slice(*a.nonzero()[0].take([0, -1]))] = True
    return a

def crop3(d, clip=True):
    dat = np.array(d)
    if clip: np.clip(dat, 0, 1, out=dat)
    dat = np.compress(_fill_gap(dat.any(axis=0)), dat, axis=1)
    dat = np.compress(_fill_gap(dat.any(axis=1)), dat, axis=0)
    return dat
But it works.
In [639]: crop3(np.array(
...: [[0,0,0,0,0,0],
...: [0,0,0,0,0,0],
...: [0,1,0,2,9,0],
...: [0,0,0,0,0,0],
...: [0,7,4,1,0,0],
...: [0,0,0,0,0,0]]))
Out[639]:
array([[1, 0, 1, 1],
       [0, 0, 0, 0],
       [1, 1, 1, 0]])
Another way of implementing this, which is faster for dense arrays, makes use of the argmax property:
def get_last_nz(vec):
    """Get last nonzero element position of a vector

    :param vec: the vector
    :type vec: iterable
    """
    if not isinstance(vec, np.ndarray) or vec.dtype != 'bool':
        vec = np.array(vec) > 0
    return vec.size - 1 - np.argmax(vec[::-1])

def get_first_nz(vec):
    """Get the first nonzero element position of a vector

    :param vec: the vector
    :type vec: iterable
    """
    if not isinstance(vec, np.ndarray) or vec.dtype != 'bool':
        vec = np.array(vec) > 0
    return np.argmax(vec)

def crop(array):
    y_sum = array.sum(axis=1) > 0
    x_sum = array.sum(axis=0) > 0
    x_min = get_first_nz(x_sum)
    x_max = get_last_nz(x_sum)
    y_min = get_first_nz(y_sum)
    y_max = get_last_nz(y_sum)
    return array[y_min: y_max + 1, x_min: x_max + 1]