Is there a way to index through a NumPy array just as I would with a normal Python list? I want to move a window of 3 elements along the array, advancing one position at a time, and sum each slice: the first sum would cover elements 1, 2, 3, the second would cover 2, 3, 4, and so on. The code below gives me a scalar error; is there a way to perform this without using a for loop?
import numpy as np
n = 3
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25])
start = np.arange(0, len(arr)-n, 1)
stop = np.arange(n-1, len(arr), 1)
sum_arr = np.sum(arr[start:stop])
I think this should work:
sum_arr = arr[1:-1] + arr[2:] + arr[:-2]
This creates an array that is two values shorter than arr, because the last two elements of arr don't have two further elements after them to complete a sum.
If you wanted the result to be the same length as the original arr, you could append two extra zeros to the arr array like so:
arr = np.append(arr, [0, 0])
sum_arr = arr[1:-1] + arr[2:] + arr[:-2]
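If your NumPy is 1.20 or newer, numpy.lib.stride_tricks.sliding_window_view expresses the same idea for any window length n without manual slicing; a minimal sketch:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(1, 26)
n = 3
# Each row of the view is one window of n consecutive elements.
windows = sliding_window_view(arr, n)
sum_arr = windows.sum(axis=1)  # [6, 9, 12, ..., 72], length len(arr) - n + 1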
To sum a sliding range of n elements you can use scipy.ndimage.convolve1d with all weights set to 1. Use the 'constant' boundary mode with its default fill value of 0. Because the filter window is centered by default, you need to trim the result at both ends.
import numpy as np
from scipy.ndimage import convolve1d

arr = np.arange(1, 26)
for n in range(2, 6):
    k, r = divmod(n, 2)
    # Trim the zero-padded ends so that only complete windows remain.
    print(n, convolve1d(arr, np.ones(n), mode='constant')[k+r-1:-k])
Result:
2 [ 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49]
3 [ 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72]
4 [ 10 14 18 22 26 30 34 38 42 46 50 54 58 62 66 70 74 78 82 86 90 94]
5 [ 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115]
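Plain np.convolve gives the same sliding sums in one call; with mode='valid' only positions where the window fully overlaps the array are kept, so no end-trimming is needed. A minimal sketch:
import numpy as np

arr = np.arange(1, 26)
n = 3
# An all-ones kernel turns convolution into a sliding sum.
sum_arr = np.convolve(arr, np.ones(n, dtype=int), mode='valid')
# -> [ 6  9 12 ... 72]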
I want to calculate the sum around an element in an array: for example, the sum of the neighboring elements that lie within 5 units of it (in any x, y, z direction). I wrote a loop to do this. The function below calculates the mean of a block in a 3D array of shape (159, 191, 159). It works, but because it will be called inside another loop, I want it to run at least an order of magnitude faster.
How can I use NumPy (or anything else) to make this run more efficiently? A conditional np.sum(), perhaps? Can anyone give me a simple, efficient example to calculate the mean?
def patch_mean(coordinate_x, coordinate_y, coordinate_z, image, patch_radius):
    total = 0
    count = 0
    for a in range(coordinate_x - patch_radius, coordinate_x + patch_radius):
        for b in range(coordinate_y - patch_radius, coordinate_y + patch_radius):
            for c in range(coordinate_z - patch_radius, coordinate_z + patch_radius):
                if 0 < a < 159 and 0 < b < 191 and 0 < c < 159:
                    if image[a][b][c] != 0:
                        total = total + image[a][b][c]
                        count = count + 1
    if count == 0:
        mean = 0
    else:
        mean = total / count
    return mean
You can use a convolution approach (though I am not sure about its performance). Here is a simple example for a 2-D array, adapted from the following two articles:
In numpy, how to efficiently list all fixed-size submatrices?
Convolve2d just by using Numpy
import numpy as np
from numpy.lib.stride_tricks import as_strided
data = np.arange(48).reshape(6, 8)
data =
[[ 0 1 2 3 4 5 6 7]
[ 8 9 10 11 12 13 14 15]
[16 17 18 19 20 21 22 23]
[24 25 26 27 28 29 30 31]
[32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47]]
mean_filter_shape = (3, 4)
# One window per valid position, each window of shape mean_filter_shape.
data_new_shape = tuple(np.subtract(data.shape, mean_filter_shape) + 1) + mean_filter_shape
data_new = as_strided(data, data_new_shape, data.strides * 2)
data_new =
[[[[ 0 1 2 3]
[ 8 9 10 11]
[16 17 18 19]]
...
[[28 29 30 31]
[36 37 38 39]
[44 45 46 47]]]]
mean_filter = np.ones(mean_filter_shape)
# Multiply every window by the filter, sum the window axes away, then normalize.
data_mean = np.einsum('ij,klij->kl', mean_filter, data_new) / np.prod(mean_filter_shape)
data_mean =
[[ 9.5 10.5 11.5 12.5 13.5]
[17.5 18.5 19.5 20.5 21.5]
[25.5 26.5 27.5 28.5 29.5]
[33.5 34.5 35.5 36.5 37.5]]
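The same trick should carry over to the asker's 3-D case by doubling the 3-D shape and strides; a hedged sketch (not benchmarked, and the windowed mean still reads every voxel of every patch, so an 11x11x11 patch remains expensive):
import numpy as np
from numpy.lib.stride_tricks import as_strided

image = np.random.random((159, 191, 159))
patch_shape = (11, 11, 11)  # 2 * patch_radius + 1 for patch_radius = 5
new_shape = tuple(np.subtract(image.shape, patch_shape) + 1) + patch_shape
patches = as_strided(image, new_shape, image.strides * 2)
# Averaging over the three window axes gives the patch mean at every valid center.
patch_means = patches.mean(axis=(-3, -2, -1))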
You can use scipy.signal.convolve with a numpy.ones kernel.
Documentation:
scipy.signal.convolve;
numpy.ones.
import numpy as np
from scipy.signal import convolve

data = np.random.random((159, 191, 159))
patch_radius = 5
kernel = np.ones((2 * patch_radius + 1,) * 3)
# Convolving with an all-ones kernel gives the patch sum at every voxel;
# divide by the kernel size to turn it into a mean.
data_mean = convolve(data, kernel, mode='same') / kernel.sum()
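To also reproduce the != 0 check from the original loop (averaging only over non-zero voxels), you could, as a sketch, convolve a 0/1 mask alongside the data and divide patch sums by patch counts:
import numpy as np
from scipy.signal import convolve

data = np.random.random((159, 191, 159))
patch_radius = 5
kernel = np.ones((2 * patch_radius + 1,) * 3)

patch_sum = convolve(data, kernel, mode='same')
# Number of non-zero voxels inside each patch.
patch_count = convolve((data != 0).astype(float), kernel, mode='same')
# The loop returns 0 when a patch has no non-zero voxels; mirror that here.
# The 0.5 threshold guards against floating-point round-off in the counts.
valid = patch_count > 0.5
patch_mean = np.where(valid, patch_sum / np.maximum(patch_count, 1), 0)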
I have a (4, 2000) NumPy array and want to resample each of its N=4 rows in blocks of 5 elements, taking for each block the max, min, first ('left'), or last ('right') value, which makes its shape (4, 400).
I can do this with a pandas DataFrame using .resample('5Min').agg(~), or with a NumPy array and a for loop like result = [max(input[i:i+5]) for i in range(0, len(input), 5)]. However, that takes a lot of time with a large input array since it's not vectorized. Is there a way to do this with vectorized NumPy computation?
Here is another way that uses numpy strides under the hood (a is your array):
from skimage.util import view_as_blocks
a = view_as_blocks(a, (4,5))
Now you can use methods/slicing to get each statistic you want (the leading [0] drops the singleton block-row axis that view_as_blocks adds, and .T restores the (4, 400) orientation):
# max
a.max(-1)[0].T
# min
a.min(-1)[0].T
# left
a[..., 0][0].T
# right
a[..., -1][0].T
Example:
a
#[[ 0 1 2 3 4 5 6 7 8 9]
# [10 11 12 13 14 15 16 17 18 19]
# [20 21 22 23 24 25 26 27 28 29]
# [30 31 32 33 34 35 36 37 38 39]]
Output for max:
#[[ 4 9]
# [14 19]
# [24 29]
# [34 39]]
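Since 2000 is an exact multiple of 5, a plain reshape does the same job without the skimage dependency; a minimal sketch on a small stand-in array:
import numpy as np

a = np.arange(40).reshape(4, 10)       # stand-in for the (4, 2000) array
blocks = a.reshape(a.shape[0], -1, 5)  # (4, n_blocks, 5)

block_max = blocks.max(-1)     # [[ 4  9] [14 19] [24 29] [34 39]]
block_min = blocks.min(-1)
block_left = blocks[..., 0]    # first element of each block
block_right = blocks[..., -1]  # last element of each block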
I'm trying to write an ordinary loop under specific conditions: I want to iterate over the rows, check a condition, and then iterate over the columns, counting how many times the condition was met. That count should go into a new column of my dataframe holding the total for each row.
I tried to use apply and applymap with no success.
The following code reaches my goal, but I bet there are more efficient ways, or even built-in pandas functions, to do it. Does anyone know?
Sample code:
import pandas as pd

df = pd.DataFrame({'1column': [11, 22, 33, 44],
                   '2column': [32, 42, 15, 35],
                   '3column': [33, 77, 26, 64],
                   '4column': [99, 11, 110, 22],
                   '5column': [20, 64, 55, 33],
                   '6column': [10, 77, 77, 10]})

check_columns = ['3column', '5column', '6column']

df1 = df.copy()
df1['bignum_count'] = 0
for column in check_columns:
    inner_loop_count = []
    bigseries = df[column] >= 50
    for big in bigseries:
        if big:
            inner_loop_count.append(1)
        else:
            inner_loop_count.append(0)
    df1['bignum_count'] += inner_loop_count

# View the dataframe
df1
results:
1column 2column 3column 4column 5column 6column bignum_count
0 11 32 33 99 20 10 0
1 22 42 77 11 64 77 3
2 33 15 26 110 55 77 2
3 44 35 64 22 33 10 1
Index into the columns of interest and check which values are greater than or equal (ge) to the threshold:
df['bignum_count'] = df[check_columns].ge(50).sum(1)
print(df)
1column 2column 3column 4column 5column 6column bignum_count
0 11 32 33 99 20 10 0
1 22 42 77 11 64 77 3
2 33 15 26 110 55 77 2
3 44 35 64 22 33 10 1
Use DataFrame.ge for >= and count the True values with sum:
df['bignum_count'] = df[check_columns].ge(50).sum(axis=1)
#alternative
#df['bignum_count'] = (df[check_columns]>=50).sum(axis=1)
print(df)
1column 2column 3column 4column 5column 6column bignum_count
0 11 32 33 99 20 10 0
1 22 42 77 11 64 77 3
2 33 15 26 110 55 77 2
3 44 35 64 22 33 10 1
I have the following problem:
I have a matrix. Now, I want to delete one entry in each row of the matrix: In rows that contain a certain number (say 4) I want to delete the entry with that number, and in other rows I simply want to delete the last element.
E.g. if I have the matrix
matrix=np.zeros((2,2))
matrix[0,0]=2
matrix[1,0]=4
matrix
which gives
2 0
4 0
after the deletion it should simply be
2
0
Thanks for your help!
So, assuming there is at most one 4 per row, what you want to do is:
iterate over all rows, and if a row contains a 4, use roll so that it becomes the last element;
then delete the last column.
In rows that have a 4, this deletes the 4 and shifts up the remaining values that come after it;
in rows that don't have a 4, it deletes the last element.
(I took the liberty of trying a slightly bigger matrix just to make sure the output is as expected.)
Try this:
import numpy as np

# Actual solution
def remove_in_rows(mat, num):
    for i, row in enumerate(mat):
        if num in row.tolist():
            # Roll the tail one step left so `num` ends up in the last column.
            index = row.tolist().index(num)
            mat[i][index:] = np.roll(row[index:], -1)
    # Dropping the last column removes `num` (or the last element).
    return np.delete(mat, -1, 1)
# Just some example to demonstrate it works
matrix = np.array([[10 * y + x for x in range(6)] for y in range(6)])
matrix[1, 2] = 4
matrix[3, 3] = 4
matrix[4, 0] = 4
print("BEFORE:")
print(matrix)
matrix = remove_in_rows(matrix, 4)
print("AFTER:")
print(matrix)
Output:
BEFORE:
[[ 0 1 2 3 4 5]
[10 11 4 13 14 15]
[20 21 22 23 24 25]
[30 31 32 4 34 35]
[ 4 41 42 43 44 45]
[50 51 52 53 54 55]]
AFTER:
[[ 0 1 2 3 5]
[10 11 13 14 15]
[20 21 22 23 24]
[30 31 32 34 35]
[41 42 43 44 45]
[50 51 52 53 54]]
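If the loop ever becomes a bottleneck, the same row-wise deletion can be written loop-free with a boolean mask; a hedged sketch (remove_in_rows_vec is a hypothetical name, and like the loop it only handles the first 4 in each row):
import numpy as np

def remove_in_rows_vec(mat, num):
    is_num = mat == num
    has_num = is_num.any(axis=1)
    first_idx = is_num.argmax(axis=1)  # 0 for rows without num; guarded by has_num
    # Drop the first occurrence of `num` where present, otherwise the last column.
    drop_col = np.where(has_num, first_idx, mat.shape[1] - 1)
    mask = np.ones(mat.shape, dtype=bool)
    mask[np.arange(len(mat)), drop_col] = False
    return mat[mask].reshape(len(mat), -1)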
How do I put the indices printed by the loop below into an array? I then want to subtract each element's preceding element from it, for example 6-0, 7-6, 10-7, and so on, to get the run lengths.
for index, item in enumerate(binary_sequence):
    if item == 1:
        print(index)
Output:
6
7
10
11
15
16
19
30
35
44
48
49
51
54
55
56
57
60
74
76
78
80
85
90
97
98
Python has ways to avoid explicit indexing for common manipulations.
zip lets you pair lists and iterate over the paired values; pairing a list with an offset version of itself is a common pattern.
seq = [*range(7, 30, 4)]

seq
Out[28]: [7, 11, 15, 19, 23, 27]

out = []
for a, b in zip(seq, [0] + seq):  # '[0] + seq' puts a 0 in front and shifts the rest over
    out.append(a - b)

print(out)
[7, 4, 4, 4, 4, 4]
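In NumPy the same two steps collapse into flatnonzero plus diff; a minimal sketch (the binary_sequence here is stand-in data, not the asker's):
import numpy as np

binary_sequence = np.array([0, 1, 0, 0, 1, 1, 0, 1])  # stand-in data
indices = np.flatnonzero(binary_sequence)  # positions of the 1s, as an array
gaps = np.diff(indices, prepend=0)         # 1-0, 4-1, 5-4, 7-5
print(gaps)  # [1 3 1 2]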