How to sum across n elements of numpy array - python

I hope that someone can help me with my problem since I'm not used to python and numpy yet. I have the following array with 24 elements:
load = np.array([10, 12, 9, 13, 17, 23, 25, 28, 26, 24, 22, 20, 18, 20, 22, 24, 26, 28, 23, 24, 21, 18, 16, 13])
I want to create a new array with the same length as "load" and calculate for each element in the array the sum of the current and the next two numbers, so that my objective array would look like this:
[31, 34, 39, 53, 65, 76, 79, 78, 72, 66, 60, 58, 60, 66, 72, 78, 77, 75, 68, 63, 55, 47, 29, 13]
I tried to solve this with the following code:
output = np.empty(len(load))
for i in range((len(output))-2):
    output[i] = load[i]+load[i+1]+load[i+2]
print(output)
The output array looks like this:
array([31. , 34. , 39. , 53. , 65. , 76. , 79. , 78. , 72. , 66. , 60. ,
58. , 60. , 66. , 72. , 78. , 77. , 75. , 68. , 63. , 55. , 47. ,
6. , 4.5])
The last two numbers are not right. For the 23rd element I want the sum of just 16 and 13, and the last number should stay 13 since the array ends there. I don't understand how Python calculated these numbers. I would also prefer the numbers to be integers, without the dot.
Does anyone have a better solution in mind? I know that this probably is easy to solve, I just don't know all the functionalities of numpy.
Thank you very much!

np.empty creates an array containing uninitialized data. In your code, you initialize an array output of length 24 but assign only 22 values to it. The last 2 values contain arbitrary values (i.e. garbage). Unless performance is of importance, np.zeros is usually the better choice for initializing arrays since all values will have a consistent value of 0.
You can solve this without a for loop by padding the input array with zeros, then computing a vectorized sum.
import numpy as np
load = np.array([10, 12, 9, 13, 17, 23, 25, 28, 26, 24, 22, 20, 18, 20, 22, 24, 26, 28, 23, 24, 21, 18, 16, 13])
tmp = np.pad(load, [0, 2])
output = load + tmp[1:-1] + tmp[2:]
print(output)
Output
[31 34 39 53 65 76 79 78 72 66 60 58 60 66 72 78 77 75 68 63 55 47 29 13]
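If you would rather keep the explicit loop, a minimal sketch of a repaired version could look like the following. It assumes it is acceptable to rely on NumPy truncating a slice that runs past the end of the array, and it takes the dtype from load so the result stays integer.

```python
import numpy as np

load = np.array([10, 12, 9, 13, 17, 23, 25, 28, 26, 24, 22, 20,
                 18, 20, 22, 24, 26, 28, 23, 24, 21, 18, 16, 13])

# Start from zeros with the input's integer dtype instead of the
# uninitialized float64 buffer that np.empty(len(load)) returns.
output = np.zeros(len(load), dtype=load.dtype)
for i in range(len(load)):
    # Near the end, load[i:i + 3] holds fewer than 3 elements, so the
    # tail sums only the values that actually exist.
    output[i] = load[i:i + 3].sum()

print(output)
```

This is slower than the vectorized version for long arrays, but it stays close to the original code.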

If the array is not super long, and you don't care too much about memory utilization you could use:
from itertools import zip_longest
output = [sum([x, y, z]) for x, y, z in zip_longest(load, load[1:], load[2:], fillvalue=0)]
Output is:
[31, 34, 39, 53, 65, 76, 79, 78, 72, 66, 60, 58, 60, 66, 72, 78, 77, 75, 68, 63, 55, 47, 29, 13]

I'll address the question "How Python calculated those two numbers" in your source: they were not calculated by your program.
Notice that your main loop runs over every element except the last two, so those two values are never written. They therefore hold whatever data happened to be in memory at the location allocated by np.empty(). In fact, np.empty() only acquires ownership of the memory without initializing it (i.e. without changing its content).
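The "numbers with a dot" have a related cause: np.empty() returns float64 unless told otherwise. A small sketch of the difference, using a shortened load array purely for illustration:

```python
import numpy as np

load = np.array([10, 12, 9])

# Default dtype is float64; the contents are whatever was in memory.
raw = np.empty(len(load))

# empty_like copies shape and dtype from load; contents are still undefined.
same_type = np.empty_like(load)

print(raw.dtype)                      # float64
print(same_type.dtype == load.dtype)  # True
```

Either way, every element still has to be assigned before the array is read.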
A simple approach is to loop through and sum different views of the original array:
def running_sum_loop(arr, k):
    # start from the input, then add each shifted view onto it
    result = arr.copy()
    for i in range(1, k):
        result[:-i] += arr[i:]
    return result
This is quite fast for relatively small values of k, but as k gets larger one may want to avoid the relatively slow explicit looping.
One way to do this is to use strides to create a view of the array that can be used to sum along an extra dimension.
This approach leaves behind the partial sums at the end of the input.
One could either start with a zero-padded input:
import numpy as np
import numpy.lib.stride_tricks
def running_sum_strides(arr, k):
    n = arr.size
    # zero-pad the input so that the tail windows are complete
    result = np.zeros(arr.size + k - 1, dtype=arr.dtype)
    result[:n] = arr
    window = (k,) * result.ndim
    window_size = k ** result.ndim
    reduced_shape = tuple(dim - k + 1 for dim, k in zip(result.shape, window))
    view = np.lib.stride_tricks.as_strided(
        result, shape=reduced_shape + window, strides=arr.strides * 2, writeable=False)
    result = np.sum(view, axis=-1)
    return result
or, more memory efficiently, construct the tail afterwards with np.cumsum():
import numpy as np
import numpy.lib.stride_tricks
def running_sum_strides_cs(arr, k):
    n = arr.size
    window = (k,) * arr.ndim
    window_size = k ** arr.ndim
    reduced_shape = tuple(dim - k + 1 for dim, k in zip(arr.shape, window))
    view = np.lib.stride_tricks.as_strided(
        arr, shape=reduced_shape + window, strides=arr.strides * 2, writeable=False)
    result = np.empty_like(arr)
    # bulk: full windows of size k
    result[:n - k + 1] = np.sum(view, axis=-1)
    # tail: partial sums of the last k elements
    result[n - k:] = np.cumsum(arr[-1:-(k + 1):-1])[::-1]
    return result
Note that looping through the input size instead of k is not going to be fast, no matter the inputs, because k is limited by the size of the input.
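As a possible alternative to calling as_strided directly, NumPy 1.20 and later provide sliding_window_view, which builds the same kind of read-only windowed view with bounds checking. The sketch below assumes a 1D input and zero-pads it as in the first strides variant; the function name running_sum_swv is made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def running_sum_swv(arr, k):
    # Hypothetical helper: pad k - 1 zeros so every position has a full window.
    padded = np.concatenate([arr, np.zeros(k - 1, dtype=arr.dtype)])
    # View of shape (arr.size, k); summing the last axis gives the result.
    return sliding_window_view(padded, k).sum(axis=-1)


load = np.array([10, 12, 9, 13, 17, 23, 25, 28, 26, 24, 22, 20,
                 18, 20, 22, 24, 26, 28, 23, 24, 21, 18, 16, 13])
print(running_sum_swv(load, 3))
```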
Alternatively, one could use np.convolve(), which computes exactly what you are after but with both tails, so that you just need to slice out the starting tail:
def running_sum_conv(arr, k):
    # full convolution with k ones, minus the first k - 1 (leading tail) values
    return np.convolve(arr, (1,) * k)[(k - 1):]
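To see why the slice is needed, here is a small worked example on a four-element array (values chosen only for illustration). The full convolution has n + k - 1 entries, and the first k - 1 of them are the leading partial sums that should be dropped.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
k = 3

full = np.convolve(arr, (1,) * k)
print(full)            # [1 3 6 9 7 4]

# Drop the k - 1 leading partial sums; what remains is
# "current element plus the next two" with a shrinking tail.
trimmed = full[k - 1:]
print(trimmed)         # [6 9 7 4]
```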
Finally, one could write a fully explicit looping solution accelerated with Numba:
import numpy as np
import numba as nb
@nb.njit
def running_sum_nb(arr, k):
    n = arr.size
    m = n - k + 1
    o = k - 1
    result = np.zeros(n, dtype=arr.dtype)
    # : fill bulk
    for j in range(m):
        tot = arr[j]
        for i in range(1, k):
            tot += arr[j + i]
        result[0 + j] = tot
    # : fill tail
    for j in range(o):
        tot = 0
        for i in range(j, o):
            tot += arr[m + i]
        result[m + j] = tot
    return result
To check that all the solutions give the same result as the expected output:
funcs = running_sum_loop, running_sum_strides, running_sum_strides_cs, running_sum_conv, running_sum_nb
load = np.array([10, 12, 9, 13, 17, 23, 25, 28, 26, 24, 22, 20, 18, 20, 22, 24, 26, 28, 23, 24, 21, 18, 16, 13])
tgt = np.array([31, 34, 39, 53, 65, 76, 79, 78, 72, 66, 60, 58, 60, 66, 72, 78, 77, 75, 68, 63, 55, 47, 29, 13])
print(f"{'Input':>24} {load}")
print(f"{'Target':>24} {tgt}")
for func in funcs:
    print(f"{func.__name__:>24} {func(load, 3)}")
Input [10 12 9 13 17 23 25 28 26 24 22 20 18 20 22 24 26 28 23 24 21 18 16 13]
Target [31 34 39 53 65 76 79 78 72 66 60 58 60 66 72 78 77 75 68 63 55 47 29 13]
running_sum_loop [31 34 39 53 65 76 79 78 72 66 60 58 60 66 72 78 77 75 68 63 55 47 29 13]
running_sum_strides_cs [31 34 39 53 65 76 79 78 72 66 60 58 60 66 72 78 77 75 68 63 55 47 29 13]
running_sum_strides [31 34 39 53 65 76 79 78 72 66 60 58 60 66 72 78 77 75 68 63 55 47 29 13]
running_sum_conv [31 34 39 53 65 76 79 78 72 66 60 58 60 66 72 78 77 75 68 63 55 47 29 13]
running_sum_nb [31 34 39 53 65 76 79 78 72 66 60 58 60 66 72 78 77 75 68 63 55 47 29 13]
Benchmarking all these for varying input size:
import pandas as pd
timeds_n = {}
for p in range(6):
    n = 10 ** p
    k = 3
    arr = np.array(load.tolist() * n)
    print(f"N = {n * len(load)}")
    base = funcs[0](arr, k)
    timeds_n[n] = []
    for func in funcs:
        res = func(arr, k)
        timed = %timeit -r 8 -n 8 -q -o func(arr, k)
        timeds_n[n].append(timed.best)
        print(f"{func.__name__:>24} {np.allclose(base, res)} {timed.best:.9f}")
pd.DataFrame(data=timeds_n, index=[func.__name__ for func in funcs]).transpose().plot()
and varying k:
timeds_k = {}
for p in range(1, 10):
    n = 10 ** 5
    k = 2 ** p
    arr = np.array(load.tolist() * n)
    print(f"k = {k}")
    timeds_k[k] = []
    base = funcs[0](arr, k)
    for func in funcs:
        res = func(arr, k)
        timed = %timeit -q -o func(arr, k)
        timeds_k[k].append(timed.best)
        print(f"{func.__name__:>24} {np.allclose(base, res)} {timed.best:.9f}")
pd.DataFrame(data=timeds_k, index=[func.__name__ for func in funcs]).transpose().plot()
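The %timeit lines above only work inside IPython or Jupyter. Outside of those, a rough equivalent can be sketched with the standard timeit module. The snippet below restates the convolution variant so that it is self-contained; the array size and repeat counts are arbitrary choices, not the ones used for the benchmarks above.

```python
import timeit

import numpy as np


def running_sum_conv(arr, k):
    # Same convolution-based approach as above, repeated for self-containment.
    return np.convolve(arr, (1,) * k)[(k - 1):]


arr = np.arange(10_000)
number = 20  # calls per timing run (arbitrary)

# Five timing runs of `number` calls each; keep the fastest run.
times = timeit.repeat(lambda: running_sum_conv(arr, 3), repeat=5, number=number)
best = min(times) / number
print(f"{running_sum_conv.__name__:>24} {best:.9f}")
```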

Related

How to filter elements of matrices?

I want to get the pixel values from an image that are not equal to a specified value, but I want them back in RGB format and not as one long vector. How can I do it?
import cv2
import numpy as np
image = cv2.imread('image.jpg')
sought = [36,255,12]
result = image[image!=sought]
import sys
np.set_printoptions(threshold=sys.maxsize)
print(result)
And I've got:
[20 103 75 21 98 70 16 100 72 18 101 73 19 97 69 15 95 66
15 95 67 13 101 73 19 104 77 21 96 69 13 94 65 8 99 69
14 98 68 13 94 63 10 88 66 24 92 69 24 92 67 23 93 67
13 93 67 13 93 67 13 97 72 16 96 70 16 93 66 15 96 68
.....
99 69 14 96 66 11 91 67 25 88 65 20 92 68 14 96 69 18
96 70 16 91 64 13 95 67 13 92 64 10 90 63]
But I want something like this:
[[[R,G,B], [R,G,B], [R,G,B], [R,G,B]],
......
[[R,G,B], [R,G,B], [R,G,B], [R,G,B]]]
What did I miss here?
If the wanted output is a list of pixels, then after the component-wise comparison you must check which pixels differ in any of the R, G or B components with .any(axis=2):
image[(image != sought).any(axis=2)]
output of the form:
array([[ 22, 136, 161],
[197, 141, 153],
[173, 122, 65],
[137, 189, 67],
...
[166, 205, 238],
[207, 99, 129],
[ 44, 76, 97]])
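A tiny self-contained illustration of the same indexing, using a made-up 2x2 image instead of a file read with cv2:

```python
import numpy as np

sought = [36, 255, 12]

# Synthetic 2x2 "image" with 3 channels; two pixels equal `sought`.
image = np.array([[[36, 255, 12], [1, 2, 3]],
                  [[36, 255, 13], [36, 255, 12]]], dtype=np.uint8)

# Per-pixel mask of shape (2, 2): True where any channel differs.
mask = (image != sought).any(axis=2)

# Boolean indexing with a 2D mask keeps the channel axis: shape (num_pixels, 3).
pixels = image[mask]
print(pixels)
```

Note that the third pixel differs from sought in only one channel and is still kept, which is the point of using .any() rather than comparing element by element.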
With result = image[image != sought] you lost the shape of image.
The solution is to get a mask (image != sought) and work on the image with that mask (e.g. using np.where)
Generate some data:
import numpy as np
H, W, C = 8, 8, 3
sought = [255, 0, 0]
colors = np.array(
[sought, [0, 0, 255], [0, 255, 0], [0, 255, 255], [255, 0, 255], [255, 255, 0]]
)
colors_idxs = np.random.choice(np.arange(len(colors)), size=(H, W))
image = colors[colors_idxs]
Compute mask (note the keepdims=True for np.where to work easier):
mask = np.any(image != sought, axis=-1, keepdims=True)
# Get color back into the mask
mask_inpaint_pos = np.where(mask, image, 0)
mask_inpaint_neg = np.where(mask, 0, image)
Plot:
import matplotlib.pyplot as plt
fig, (ax_im, ax_mask, ax_mask_pos, ax_mask_neg) = plt.subplots(ncols=4, sharey=True)
ax_im.set_title("Original")
ax_im.imshow(image)
ax_mask.set_title("Mask binary")
ax_mask.imshow(mask.astype(int), cmap="gray")
ax_mask_pos.set_title("Mask True RGB")
ax_mask_pos.imshow(mask_inpaint_pos)
ax_mask_neg.set_title("Mask False RGB")
ax_mask_neg.imshow(mask_inpaint_neg)
plt.show()

How to convert elements of `str.split()` in Python?

I have the following:
[line.split(' ') for line in [
    line.rstrip() for line in file.readlines()]]
which returns a list of lists of strings. I know I could do the following to convert it to a list of lists of integers:
for row in tree:
    row[:] = map(int, row[:])
Can that be done inline, as the lines are being processed?
Some sample data:
59
73 41
52 40 09
26 53 06 34
10 51 87 86 81
You could use
data = """
59
73 41
52 40 09
26 53 06 34
10 51 87 86 81
"""
result = [[int(x) for x in line.split()] for line in data.split("\n") if line]
print(result)
Which yields
[[59], [73, 41], [52, 40, 9], [26, 53, 6, 34], [10, 51, 87, 86, 81]]
Note that this only works if you only have integers.
To have some error management, you could use:
data = """
59 some junk here
73 41
52 40 09
26 53 06 34
10 51 87 86 81
"""
def makeint(line):
    for x in line.split():
        try:
            yield int(x)
        except ValueError:
            pass
result = [[x for x in makeint(line)] for line in data.split("\n") if line]
print(result)
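Since the question reads from a file object rather than a string, here is a sketch of the same idea applied while iterating over the file. io.StringIO stands in for the real file, and the three sample lines are taken from the question.

```python
import io

# Stand-in for an open text file; a real file object iterates the same way.
file = io.StringIO("59\n73 41\n52 40 09\n")

# split() with no argument also discards the trailing newline,
# so no separate rstrip() pass is needed.
tree = [[int(x) for x in line.split()] for line in file]
print(tree)
```

Iterating over the file directly also avoids building the intermediate list that readlines() creates.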

Getting a list of list of Nearest Neighbour within boundaries

I'm trying to return a list of lists of the vertical, horizontal and diagonal nearest neighbors of every item of a 2D numpy array:
import numpy as np
import copy
tilemap = np.arange(99).reshape(11, 9)
print(tilemap)
def get_neighbor(pos, array):
    x = copy.deepcopy(pos[0])
    y = copy.deepcopy(pos[1])
    grid = copy.deepcopy(array)
    split = []
    split.append([grid[y-1][x-1]])
    split.append([grid[y-1][x]])
    split.append([grid[y-1][x+1]])
    split.append([grid[y][x - 1]])
    split.append([grid[y][x+1]])
    split.append([grid[y+1][x-1]])
    split.append([grid[y+1][x]])
    split.append([grid[y+1][x+1]])
    print("\n Neighbors of ITEM[{}]\n {}".format(grid[y][x],split))
cordinates = [5, 6]
get_neighbor(pos=cordinates, array=tilemap)
I would like a list like this:
first item = 0
[[1],[12],[13],
[1,2], [12,24],[13,26],
[1,2,3], [12,24,36], [13,26,39]....
until it reaches the boundaries completely, and then it proceeds to the second item = 1
and keeps adding to the list. If there is a neighbor above, it should be added too.
MY RESULT
[[ 0 1 2 3 4 5 6 7 8]
[ 9 10 11 12 13 14 15 16 17]
[18 19 20 21 22 23 24 25 26]
[27 28 29 30 31 32 33 34 35]
[36 37 38 39 40 41 42 43 44]
[45 46 47 48 49 50 51 52 53]
[54 55 56 57 58 59 60 61 62]
[63 64 65 66 67 68 69 70 71]
[72 73 74 75 76 77 78 79 80]
[81 82 83 84 85 86 87 88 89]
[90 91 92 93 94 95 96 97 98]]
Neighbors of ITEM[59]
[[49], [50], [51], [58], [60], [67], [68], [69]]
Alright, what about using a function like this? It takes the array, your target index as (row, column), and the "radius" of the elements to be included.
def get_idx_adj(arr, idx, radius):
    num_rows, num_cols = arr.shape
    idx_row, idx_col = idx
    # clamp both slices to the array bounds
    slice_1 = np.s_[max(0, idx_row - radius):min(num_rows, idx_row + radius + 1)]
    slice_2 = np.s_[max(0, idx_col - radius):min(num_cols, idx_col + radius + 1)]
    return arr[slice_1, slice_2]
I'm currently trying to find the best way to transform the index of the element, so that the function can be used on its own output successively to get all the subarrays of various sizes.
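For reference, a short usage sketch on the tilemap from the question. The function is restated so the snippet runs on its own. Note that the question passes the position as [x, y] = [5, 6], which corresponds to (row, column) = (6, 5) here.

```python
import numpy as np


def get_idx_adj(arr, idx, radius):
    # Same function as above: a window around idx, clamped to the array bounds.
    num_rows, num_cols = arr.shape
    idx_row, idx_col = idx
    rows = np.s_[max(0, idx_row - radius):min(num_rows, idx_row + radius + 1)]
    cols = np.s_[max(0, idx_col - radius):min(num_cols, idx_col + radius + 1)]
    return arr[rows, cols]


tilemap = np.arange(99).reshape(11, 9)

# Item 59 sits at row 6, column 5; radius 1 gives the 3x3 block around it.
inner = get_idx_adj(tilemap, (6, 5), 1)
print(inner)

# At a corner the window is simply cut off at the boundary.
corner = get_idx_adj(tilemap, (0, 0), 2)
print(corner)
```

The returned block includes the centre element itself, so it would still have to be removed if only the neighbors are wanted.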

Inconsistent python print output

(Python 2.7.12) - I have created an NxN array, when I print it I get the exact following output:
Sample a:
SampleArray=np.random.randint(1,100, size=(5,5))
[[49 72 88 56 41]
[30 73 6 43 53]
[83 54 65 16 34]
[25 17 73 10 46]
[75 77 82 12 91]]
Nice and clean.
However, when I sort this array by the elements in the last column (index 4) using the code:
SampleArray=sorted(SampleArray, key=lambda x: x[4])
I get the following output:
Sample b:
[array([90, 9, 77, 63, 48]), array([43, 97, 47, 74, 53]), array([60, 64, 97, 2, 73]), array([34, 20, 42, 80, 76]), array([86, 61, 95, 21, 82])]
How can I get my output to stay in the format of 'Sample a'? It would make debugging much easier if I could see the numbers in straight columns.
Simply with numpy.argsort() routine:
import numpy as np
a = np.random.randint(1,100, size=(5,5))
print(a) # initial array
print(a[np.argsort(a[:, -1])]) # sorted array
The output for # initial array:
[[21 99 34 33 55]
[14 81 92 44 97]
[68 53 35 46 22]
[64 33 52 40 75]
[65 35 35 78 43]]
The output for # sorted array:
[[68 53 35 46 22]
[65 35 35 78 43]
[21 99 34 33 55]
[64 33 52 40 75]
[14 81 92 44 97]]
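Because the example above uses random data, here is the same call on the fixed 'Sample a' array from the question, so the result can be checked by hand:

```python
import numpy as np

a = np.array([[49, 72, 88, 56, 41],
              [30, 73,  6, 43, 53],
              [83, 54, 65, 16, 34],
              [25, 17, 73, 10, 46],
              [75, 77, 82, 12, 91]])

# Row order that sorts the last column ascending: 34, 41, 46, 53, 91.
order = np.argsort(a[:, -1])
print(order)

# Fancy indexing with that order returns a regular 2D array,
# so it still prints in the aligned 'Sample a' format.
sorted_a = a[order]
print(sorted_a)
```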
You just need to convert the sorted list back to a numpy array by using
SampleArray = np.array(SampleArray)
sample code:-
import numpy as np
SampleArray=np.random.randint(1,100, size=(5,5))
print (SampleArray)
SampleArray=sorted(SampleArray, key=lambda x: x[4])
print (SampleArray)
SampleArray = np.array(SampleArray)
print (SampleArray)
output:-
[[28 25 33 56 54]
[77 88 10 68 61]
[30 83 77 87 82]
[83 93 70 1 2]
[27 70 76 28 80]]
[array([83, 93, 70, 1, 2]), array([28, 25, 33, 56, 54]), array([77, 88, 10, 68, 61]), array([27, 70, 76, 28, 80]), array([30, 83, 77, 87, 82])]
[[83 93 70 1 2]
[28 25 33 56 54]
[77 88 10 68 61]
[27 70 76 28 80]
[30 83 77 87 82]]
This can help:
from pprint import pprint
pprint(SampleArray)
The output is a little bit different from the one for Sample A but it still looks neat and debugging will be easier.
Edit: here's my output
[[92 8 41 64 61]
[18 67 91 80 35]
[68 37 4 6 43]
[26 81 57 26 52]
[ 6 82 95 15 69]]
[array([18, 67, 91, 80, 35]),
array([68, 37, 4, 6, 43]),
array([26, 81, 57, 26, 52]),
array([92, 8, 41, 64, 61]),
array([ 6, 82, 95, 15, 69])]

python: database formatted in txt and skip first line and first two columns

I have a database stored as a plain txt file named DB.TXT (a tab delimiter is applied only between the numbers), like this:
Date Id I II III IV V
17-jan-13 aa 47 56 7 74 58
18-jan-13 ab 86 2 30 40 75
19-jan-13 ac 72 64 41 81 80
20-jan-13 ad 51 26 43 61 32
21-jan-13 ae 31 62 32 25 75
22-jan-13 af 60 83 18 35 5
23-jan-13 ag 29 8 47 12 69
I would like to know the Python code for skipping the first line (Date, Id, I, II, III, IV, V) and the first two columns (Date and Id) while reading a text file. (The remaining numbers will then be used for sums, multiplications, etc.)
After reading the txt file, it will appear like this:
47 56 7 74 58
86 2 30 40 75
72 64 41 81 80
51 26 43 61 32
31 62 32 25 75
60 83 18 35 5
29 8 47 12 69
The file is in txt format, not CSV.
If you are only going to do calculations on the rows, you can simply do:
with open("data.txt") as fh:
    next(fh) # skip the header line
    for line in fh:
        line = line.split() # This split works equally well for tabs and other spaces
        do_something(line[2:])
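Since do_something is left undefined above, here is one possible sketch of what it could be, computing row sums and row products. io.StringIO stands in for the real file, only the first two data rows from the question are used, and math.prod needs Python 3.8 or later.

```python
import io
import math

# Stand-in for open("data.txt"); the rows are copied from the question.
fh = io.StringIO(
    "Date Id I II III IV V\n"
    "17-jan-13 aa 47 56 7 74 58\n"
    "18-jan-13 ab 86 2 30 40 75\n"
)

next(fh)  # skip the header line
rows = [[int(x) for x in line.split()[2:]]  # drop Date and Id, convert the rest
        for line in fh if line.strip()]     # ignore blank lines

row_sums = [sum(r) for r in rows]
row_products = [math.prod(r) for r in rows]
print(rows)
print(row_sums)
```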
If your needs are more complex, you're better off using a library like Pandas, which can take care of headers and label columns, as well as regex delimiters, and gives you easy access to columns:
import pandas
data = pandas.read_csv("blah.txt", sep=r"\s+", index_col=[0,1])
data.values # array of values as requested
data.sum() # sum of each column
data.product(axis=1) # product of each row
etc...
sep is a regular expression because the delimiter is not always a tab, and index_col turns the first two columns into the row index (row labels).
"the code in python" is pretty broad. Using numpy, it's:
In [21]: np.genfromtxt('db.txt',dtype=None,skip_header=1,usecols=range(2,7))
Out[21]:
array([[47, 56,  7, 74, 58],
       [86,  2, 30, 40, 75],
       [72, 64, 41, 81, 80],
       [51, 26, 43, 61, 32],
       [31, 62, 32, 25, 75],
       [60, 83, 18, 35,  5],
       [29,  8, 47, 12, 69]])
Using the csv module, to skip the first line, just advance the file iterator by calling next(f). To skip the first two columns you could use row = row[2:]:
import csv
with open(filename, newline='') as f:
    next(f) # skip the first line
    for row in csv.reader(f, delimiter='\t'):
        row = row[2:] # skip the first two columns
        row = list(map(int, row)) # map the strings to ints
