numpy vectorized resampling like pandas DataFrame resample - python

I have a (4, 2000) numpy array and want to resample each row (N=4) in blocks of 5 elements with aggregations such as max, min, left, right, which would make its shape (4, 400).
I can do this with a pandas DataFrame using .resample('5Min').agg(...), or with a numpy array and a for loop like result = [max(input[i:i+5]) for i in range(0, len(input), 5)]. However, this takes a long time with a large input array since it's not vectorized. Is there any way to do this with vectorized numpy computation?

Here is another way that uses numpy strides under the hood (a is your array):
from skimage.util import view_as_blocks
v = view_as_blocks(a, (4, 5))  # shape (1, 400, 4, 5) for the (4, 2000) input; renamed to avoid shadowing a
Now, you can use methods/slicing for the aggregations you want:
# max
v.max(-1)[0].T
# min
v.min(-1)[0].T
# left
v[..., 0][0].T
# right
v[..., -1][0].T
example:
a
#[[ 0 1 2 3 4 5 6 7 8 9]
# [10 11 12 13 14 15 16 17 18 19]
# [20 21 22 23 24 25 26 27 28 29]
# [30 31 32 33 34 35 36 37 38 39]]
output for max
#[[ 4 9]
# [14 19]
# [24 29]
# [34 39]]
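If adding scikit-image as a dependency is undesirable, the same blocking can be done with a plain reshape, a sketch assuming the row length is an exact multiple of the block size:

```python
import numpy as np

a = np.arange(40).reshape(4, 10)   # small stand-in for the (4, 2000) array

blocks = a.reshape(4, -1, 5)       # group every 5 columns: shape (4, 2, 5)

agg_max = blocks.max(-1)           # per-block max, shape (4, 2)
agg_min = blocks.min(-1)           # per-block min
agg_left = blocks[..., 0]          # first element of each block
agg_right = blocks[..., -1]        # last element of each block
```

With the (4, 2000) input this yields (4, 400) arrays directly, with no transpose needed.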

Related

Quick way to calculate mean around an element in 3d array

I want to calculate the mean around an element in a 3D array: for example, the mean of the neighboring elements that lie within 5 units (in any x, y, z direction). I wrote a loop to do this. The function below calculates the mean of a block in the 3D array, whose shape is (159, 191, 159). It works, but because it will be called inside another loop, I want to make it run at least an order of magnitude faster.
How can I use NumPy (or any other way) to make this more efficient? A conditional np.sum(), perhaps? Can anyone give me a simple, efficient example of calculating the mean?
def patch_mean(coordinate_x, coordinate_y, coordinate_z, image, patch_radius):
    total = 0
    count = 0
    for a in range(coordinate_x - patch_radius, coordinate_x + patch_radius):
        for b in range(coordinate_y - patch_radius, coordinate_y + patch_radius):
            for c in range(coordinate_z - patch_radius, coordinate_z + patch_radius):
                if 0 < a < 159 and 0 < b < 191 and 0 < c < 159:
                    if image[a][b][c] != 0:
                        total = total + image[a][b][c]
                        count = count + 1
    if count == 0:
        mean = 0
    else:
        mean = total / count
    return mean
You can use a convolution approach.
(However, I am not sure about its performance.)
Here is a simple example for a 2-D array. This example is referenced from the following two articles:
In numpy, how to efficiently list all fixed-size submatrices?
Convolve2d just by using Numpy
import numpy as np
from numpy.lib.stride_tricks import as_strided
data = np.arange(48).reshape(6, 8)
data =
[[ 0 1 2 3 4 5 6 7]
[ 8 9 10 11 12 13 14 15]
[16 17 18 19 20 21 22 23]
[24 25 26 27 28 29 30 31]
[32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47]]
mean_filter_shape = (3, 4)
data_new_shape = tuple(np.subtract(data.shape, mean_filter_shape) + 1) + mean_filter_shape
data_new = as_strided(data, data_new_shape, data.strides * 2)
data_new =
[[[[ 0 1 2 3]
[ 8 9 10 11]
[16 17 18 19]]
...
[[28 29 30 31]
[36 37 38 39]
[44 45 46 47]]]]
mean_filter = np.ones(mean_filter_shape)
data_mean = np.einsum('ij,klij->kl', mean_filter, data_new) / np.prod(mean_filter_shape)
data_mean =
[[ 9.5 10.5 11.5 12.5 13.5]
[17.5 18.5 19.5 20.5 21.5]
[25.5 26.5 27.5 28.5 29.5]
[33.5 34.5 35.5 36.5 37.5]]
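On NumPy 1.20+, the same windowing can be done without manual stride arithmetic via sliding_window_view, which is safer than as_strided; a sketch reproducing the mean above:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(48).reshape(6, 8)
windows = sliding_window_view(data, (3, 4))   # shape (4, 5, 3, 4)
data_mean = windows.mean(axis=(-2, -1))       # shape (4, 5), same as the einsum result
```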
You can use scipy.signal.convolve with a numpy.ones kernel. Note that the convolution itself produces the windowed sum, so divide by the kernel size to get the mean.
Documentation:
scipy.signal.convolve;
numpy.ones.
import numpy as np
from scipy.signal import convolve
data = np.random.random((159, 191, 159))
patch_radius = 5
kernel = np.ones((2*patch_radius + 1,) * 3)
data_mean = convolve(data, kernel, mode='same') / kernel.sum()
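The question's loop also skips zero-valued entries when averaging. That behaviour can be reproduced with a second convolution that counts the nonzero neighbours; a sketch on a smaller hypothetical array (mode='same' zero-pads at the borders, which roughly matches the loop's bounds check):

```python
import numpy as np
from scipy.signal import convolve

rng = np.random.default_rng(0)
data = rng.random((20, 20, 20))
data[data < 0.5] = 0                     # introduce zeros, as in the question

patch_radius = 2
kernel = np.ones((2*patch_radius + 1,) * 3)

# method='direct' keeps the counts exact (no FFT rounding)
sums = convolve(data, kernel, mode='same', method='direct')
counts = convolve((data != 0).astype(float), kernel, mode='same', method='direct')

# mean over nonzero neighbours; 0 where the window contains none
means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0.5)
```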

Sum function using slices with a numpy array Python

Is there a way I can index through a numpy array just like I would with a normal Python list? I want to go through 3 elements at a time, moving the window up by one position each time, and sum each slice. So it would sum 1, 2, 3 first, then 2, 3, 4 for the second sum, and so on. The code below gives me a scalar error; is there a way to perform this without a for loop?
import numpy as np
n = 3
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25])
start = np.arange(0, len(arr)-n, 1)
stop = np.arange(n-1, len(arr), 1)
sum_arr = np.sum(arr[start:stop])
I think this should work:
sum_arr = arr[1:-1] + arr[2:] + arr[:-2]
This creates an array that's two values shorter than arr, because the last two elements of arr don't have enough following elements to complete a window of three.
If you wanted the array to be of the same length as the original arr, you could append two extra zeros to the arr array like so:
arr = np.append(arr, [0, 0])
sum_arr = arr[1:-1] + arr[2:] + arr[:-2]
To sum a sliding range of n elements you can use convolve1d with all weights set to 1. Use 'constant' boundary mode with the default fill value of 0. As the filter window is centered by default you need to adjust the length of the result at both ends.
import numpy as np
from scipy.ndimage import convolve1d
arr = np.arange(1,26)
for n in range(2, 6):
    k, r = divmod(n, 2)
    print(n, convolve1d(arr, np.ones(n), mode='constant')[k+r-1:-k])
Result:
2 [ 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49]
3 [ 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72]
4 [ 10 14 18 22 26 30 34 38 42 46 50 54 58 62 66 70 74 78 82 86 90 94]
5 [ 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115]
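On NumPy 1.20+ the windowed sum can also be written without scipy, using sliding_window_view; a sketch for the n = 3 case:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(1, 26)
n = 3
# 23 overlapping windows of length 3, summed along the window axis
sums = sliding_window_view(arr, n).sum(axis=-1)
```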

Deleting certain elements from a matrix

I have the following problem:
I have a matrix. Now, I want to delete one entry in each row of the matrix: In rows that contain a certain number (say 4) I want to delete the entry with that number, and in other rows I simply want to delete the last element.
E.g. if I have the matrix
matrix=np.zeros((2,2))
matrix[0,0]=2
matrix[1,0]=4
matrix
which gives
2 0
4 0
after the deletion it should simply be
2
0
thanks for your help!
So, assuming there's at most one 4 per row, what you want to do is:
iterate over all rows, and if there's a 4, use roll so that it becomes the last element;
then delete the last column.
In rows that contain a 4, this removes the 4 and shifts the remaining values after it left by one; in rows that don't, it simply deletes the last element.
(I took the liberty of trying with a little bigger matrix just to make sure output is as expected)
try this:
import numpy as np
# Actual solution
def remove_in_rows(mat, num):
    for i, row in enumerate(mat):
        row_list = row.tolist()
        if num in row_list:
            index = row_list.index(num)
            mat[i, index:] = np.roll(row[index:], -1)
    return np.delete(mat, -1, 1)
# Just some example to demonstrate it works
matrix = np.array([[10 * y + x for x in range(6)] for y in range(6)])
matrix[1, 2] = 4
matrix[3, 3] = 4
matrix[4, 0] = 4
print("BEFORE:")
print(matrix)
matrix = remove_in_rows(matrix, 4)
print("AFTER:")
print(matrix)
Output:
BEFORE:
[[ 0 1 2 3 4 5]
[10 11 4 13 14 15]
[20 21 22 23 24 25]
[30 31 32 4 34 35]
[ 4 41 42 43 44 45]
[50 51 52 53 54 55]]
AFTER:
[[ 0 1 2 3 5]
[10 11 13 14 15]
[20 21 22 23 24]
[30 31 32 34 35]
[41 42 43 44 45]
[50 51 52 53 54]]
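The Python-level loop can also be eliminated entirely with boolean masking: find the column to drop in each row (the first occurrence of num if present, otherwise the last column) and keep everything else. A sketch; remove_in_rows_vec is a hypothetical name, and it produces the same result as the roll-based version above:

```python
import numpy as np

def remove_in_rows_vec(mat, num):
    # column to drop per row: first occurrence of num, else the last column
    hits = (mat == num)
    drop = np.where(hits.any(axis=1), hits.argmax(axis=1), mat.shape[1] - 1)
    keep = np.arange(mat.shape[1]) != drop[:, None]   # one False per row
    return mat[keep].reshape(mat.shape[0], -1)
```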

reshaping 3D matrix into 2D matrix using tensorflow

I have a 3D matrix of dimensions, 549x19x50 I need to create a 2D matrix which gets me a 549x950 matrix.
What I did so far is use tensorflow:
#data_3d is the 3D matrix
data_2d = tf.reshape(data_3d,[549,-1])
This prints out all the values of data_3d at the prompt, and when I try to access data_2d it gives me a NameError.
data_3d is a list of lists of lists, not a tensor or an ndarray. If we can't do this for lists, is there any way to easily convert lists to ndarrays?
Thanks in advance,
Bhashithe
There is a simple way to do so using numpy:
import numpy as np
data_3d = np.arange(27).reshape((3,3,3))
data_2d = data_3d.swapaxes(1,2).reshape(3,-1)
Output:
data_2d
[[ 0 3 6 1 4 7 2 5 8]
[ 9 12 15 10 13 16 11 14 17]
[18 21 24 19 22 25 20 23 26]]
print(data_3d)
[[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]]
[[ 9 10 11]
[12 13 14]
[15 16 17]]
[[18 19 20]
[21 22 23]
[24 25 26]]]
Note: swapaxes(1, 2) is the main thing here; you need to decide which axes you want to swap.
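Since data_3d in the question is a plain list of lists of lists, it can be converted with np.asarray first and then reshaped; a sketch with the question's dimensions (the values themselves are hypothetical placeholders):

```python
import numpy as np

# nested Python lists standing in for the question's data
data_3d = [[[0.0] * 50 for _ in range(19)] for _ in range(549)]

arr = np.asarray(data_3d)          # shape (549, 19, 50)
data_2d = arr.reshape(549, -1)     # shape (549, 950)
```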

In Python how do you split a list into evenly sized chunks starting with the last element from the previous chunk?

What would be the most pythonic way to convert a list like:
mylist = [0,1,2,3,4,5,6,7,8]
into chunks of n elements that always start with the last element of the previous chunk.
The last element of the last chunk should be identical to the first element of the first chunk to make the data structure circular.
Like:
[
[0,1,2,3],
[3,4,5,6],
[6,7,8,0],
]
under the assumption that len(mylist) % (n-1) == 0, so that it always works out evenly.
What about the straightforward solution?
splitlists = [mylist[i:i+n] for i in range(0, len(mylist), n-1)]
splitlists[-1].append(splitlists[0][0])
A much less straightforward solution involving numpy (for the sake of overkill):
from numpy import arange, roll, column_stack
n = 4
values = arange(10, 26)
# values -> [10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25]
idx = arange(0, values.size, n) # [ 0 4 8 12]
idx = roll(idx, -1) # [ 4 8 12 0]
col = values[idx] # [14 18 22 10]
values = column_stack( (values.reshape(n, -1), col) )
[[10 11 12 13 14]
[14 15 16 17 18]
[18 19 20 21 22]
[22 23 24 25 10]]
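With sliding_window_view (NumPy 1.20+), the circular chunking can also be expressed as overlapping windows taken with a stride of n - 1, after appending the first element to close the circle; a sketch:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

mylist = [0, 1, 2, 3, 4, 5, 6, 7, 8]
n = 4

arr = np.append(mylist, mylist[0])             # wrap around: [..., 8, 0]
chunks = sliding_window_view(arr, n)[::n - 1]  # one window every n-1 elements
```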
