Calculate the sum of every 5 elements in a python array - python

I have a python array in which I want to calculate the sum of every 5 elements. In my case I have the array c with ten elements. (In reality it has a lot more elements.)
c = [1, 0, 0, 0, 0, 2, 0, 0, 0, 0]
So finally I would like to have a new array (c_new) which should show the sum of the first 5 elements and the second 5 elements.
So the result should be this:
1+0+0+0+0 = 1
2+0+0+0+0 = 2
c_new = [1, 2]
Thank you for your help
Markus

You can use np.add.reduceat by passing indices where you want to split and sum:
import numpy as np
c = [1, 0, 0, 0, 0, 2, 0, 0, 0, 0]
np.add.reduceat(c, np.arange(0, len(c), 5))
# array([1, 2])
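A nice side effect, for what it's worth: reduceat also copes with a length that is not a multiple of 5; the last group simply sums whatever is left. A minimal sketch with a made-up 12-element list:
import numpy as np

c = [1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 4]     # hypothetical, length 12
np.add.reduceat(c, np.arange(0, len(c), 5))
# array([1, 2, 7])   # the last group is just c[10:12]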

Here's one way of doing it -
c = [1, 0, 0, 0, 0, 2, 0, 0, 0, 0]
print([sum(c[i:i+5]) for i in range(0, len(c), 5)])
Result -
[1, 2]

If five divides the length of your vector and it is contiguous, then
np.reshape(c, (-1, 5)).sum(axis=-1)
It also works if it is non-contiguous, but then it is typically less efficient.
Benchmark:
from timeit import timeit

def aredat():
    return np.add.reduceat(c, np.arange(0, len(c), 5))

def reshp():
    return np.reshape(c, (-1, 5)).sum(axis=-1)

c = np.random.random(10_000_000)

timeit(aredat, number=100)
# 3.8516048429883085
timeit(reshp, number=100)
# 3.09542763303034
So where possible, reshaping seems a bit faster; reduceat has the advantage of gracefully handling vectors whose length is not a multiple of five.
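If you want to stay with reshaping when the length is not a multiple of five, one option (a sketch, assuming zero-padding is acceptable because we are summing) is to pad first:
import numpy as np

c = np.array([1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 4])  # hypothetical, length 12
pad = (-len(c)) % 5                                  # zeros needed to reach a multiple of 5
np.pad(c, (0, pad)).reshape(-1, 5).sum(axis=1)       # np.pad fills with zeros by default
# array([1, 2, 7])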

Why don't you use this?
c = np.asarray(c)
np.array([np.sum(chunk) for chunk in c.reshape(c.shape[0] // 5, 5)])

There are various ways to achieve that. Below are two options using NumPy built-in methods.
Option 1
numpy.sum and numpy.ndarray.reshape as follows
c_sum = np.sum(np.array(c).reshape(-1, 5), axis=1)
[Out]: array([1, 2])
Option 2
Using numpy.vectorize, a custom lambda function, and numpy.arange as follows
c_sum = np.vectorize(lambda x: sum(c[x:x+5]))(np.arange(0, len(c), 5))
[Out]: array([1, 2])
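A follow-up note on Option 2: numpy.vectorize is provided for convenience rather than performance (it is essentially a Python-level loop), so for large inputs the reshape-based Option 1 is usually preferable. A rough sanity-check sketch (the array size here is arbitrary):
import numpy as np

c_big = np.random.randint(0, 10, size=1_000_000)

sums_reshape = c_big.reshape(-1, 5).sum(axis=1)                                       # vectorized in C
sums_vec = np.vectorize(lambda i: c_big[i:i+5].sum())(np.arange(0, len(c_big), 5))    # Python-level loop

print(np.array_equal(sums_reshape, sums_vec))  # True, but the reshape version is much faster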

Related

Python/NumPy: find the first index of zero, then replace all elements with zero after that for each row

I have a numpy array like this:
a = np.array([[1, 0, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1]])
Question 1:
As shown in the title, I want to replace all elements with zero after the first zero appears in each row. The result should look like this:
a = np.array([[1, 0, 0, 0, 0],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0]])
Question 2: how can I slice different columns for each row, as in this example?
I am dealing with a large array, so an efficient solution would be much appreciated. Thank you very much.
One way to accomplish question 1 is to use numpy.cumprod
>>> np.cumprod(a, axis=1)
array([[1, 0, 0, 0, 0],
       [1, 1, 1, 1, 0],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0]])
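One caveat, hedged: cumprod gives the right answer here because the entries are only 0s and 1s. If the rows could contain arbitrary nonzero values that you want to preserve, a mask built with a cumulative logical AND does the same job:
import numpy as np

b = np.array([[3, 0, 5, 7, 1],
              [2, 4, 6, 8, 0]])                   # hypothetical array with non-binary entries

mask = np.logical_and.accumulate(b != 0, axis=1)  # True until the first zero in each row
b * mask
# array([[3, 0, 0, 0, 0],
#        [2, 4, 6, 8, 0]])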
Question 1:
You could iterate over the array like so:
for i in range(a.shape[0]):
    j = 0
    row = a[i]
    while j < row.size and row[j] > 0:  # stop at the first zero (or the end of the row)
        j += 1
    row[j+1:] = 0
This will change the array in-place. If you are interested in very high performance, the answers to this question could be of use to find the first zero faster. np.where scans the entire array for this and therefore is not optimal for the task.
Actually, the fastest solution will depend a bit on the distribution of your array entries: if there are many floats in there and a zero rarely occurs, the while loops in the code above will terminate late on average, so only "a few" zeros need to be written. If, however, there are only two possible entries, as in your sample array, and these occur with similar probability (i.e. ~50%), there will be a lot of zeros to write to a, and the following will be faster:
b = np.zeros(a.shape)
for i in range(a.shape[0]):
    j = 0
    a_row = a[i]
    b_row = b[i]
    while j < a_row.size and a_row[j] > 0:  # copy values until the first zero
        b_row[j] = a_row[j]
        j += 1
Question 2:
If you mean to slice each row individually on a similar criterion dealing with a first occurrence of some kind, you could simply adapt this iteration pattern. If the criterion is more global (like finding the maximum of the row, for example), built-in methods like np.where exist that will be more efficient, but which choice is best probably depends a bit on the criterion itself.
Question 1: An efficient way to do this would be the following.
import numpy as np

a = np.array([[1, 0, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1]])

for row in a:
    zeros = np.where(row == 0)[0]
    if len(zeros):  # check if a zero exists in this row
        row[zeros[0]:] = 0
print(a)
Output:
[[1 0 0 0 0]
 [1 1 1 1 0]
 [1 0 0 0 0]
 [1 0 0 0 0]]
Question 2: Using the same array, for each row rowIdx you can have an array of columns colIdxs that you want to extract.
rowIdx = 2
colIdxs = [1, 3, 4]
print(a[rowIdx, colIdxs])
Output:
[0 1 1]
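If you need different columns for several rows at once, one option (a sketch, assuming every row takes the same number of columns) is to pass matching row and column index arrays and let broadcasting pair them up:
import numpy as np

a = np.array([[1, 0, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1]])   # the original array again

rows = np.array([[0], [2]])        # shape (2, 1), broadcasts against the columns
cols = np.array([[1, 3, 4],
                 [0, 2, 4]])       # shape (2, 3): columns wanted for each of the two rows
print(a[rows, cols])
# [[0 1 1]
#  [1 0 1]]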
I prefer Ayrat's creative answer for the first question, but if you need to slice different columns for different rows in a large array, this could help you:
indexer = tuple(np.s_[i:a.shape[1]] for i in (a == 0).argmax(axis=1))
for i, j in enumerate(indexer):
    a[i, j] = 0
indexer:
(slice(1, 5, None), slice(4, 5, None), slice(1, 5, None), slice(1, 5, None))
or:
indexer = (a == 0).argmax(axis=1)
for i in range(a.shape[0]):
    a[i, indexer[i]:] = 0
indexer:
[1 4 1 1]
output:
[[1 0 0 0 0]
 [1 1 1 1 0]
 [1 0 0 0 0]
 [1 0 0 0 0]]

numpy bincount sequential slices of array

Given a numpy row containing numbers from range(n),
I want to apply the following transformation:
[1 0 1 2] --> [[0 1 0] [1 1 0] [1 2 0] [1 2 1]]
We just go through the input list and bincount all elements to the left of the current one (inclusive).
import numpy as np

n = 3
a = np.array([1, 0, 1, 2])

out = []
for i in range(a.shape[0]):
    out.append(np.bincount(a[:i+1], minlength=n))
out = np.array(out)
Is there any way to speed this up? I'm wondering if it's possible to get rid of that loop completely and use matrix magic only.
EDIT:
Thanks, lbragile, for mentioning list comprehensions. That's not what I meant (I'm not sure it even matters asymptotically). I was thinking about something more involved, such as rewriting this based on how the bincount operation works under the hood.
You can use cumsum like so:
idx = [1, 0, 1, 2]
np.identity(np.max(idx) + 1, int)[idx].cumsum(0)
# array([[0, 1, 0],
#        [1, 1, 0],
#        [1, 2, 0],
#        [1, 2, 1]])
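The trick is that np.identity(n, int)[idx] turns each element of idx into a one-hot row, so a cumulative sum down the rows is exactly a running bincount. A quick look at the intermediate array (same idx as above):
np.identity(np.max(idx) + 1, int)[idx]
# array([[0, 1, 0],
#        [1, 0, 0],
#        [0, 1, 0],
#        [0, 0, 1]])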
Using list comprehension:
fast_out = [np.bincount(a[:i+1], minlength=n) for i in range(a.shape[0])]
print(fast_out)
Output:
[array([0, 1, 0]), array([1, 1, 0]), array([1, 2, 0]), array([1, 2, 1])]
To time the code use the following:
import timeit

def timer(code_to_test):
    elapsed_time = timeit.timeit(code_to_test, number=100) / 100
    print(elapsed_time)

your_code = """
import numpy as np
n = 3
a = np.array([1, 0, 1, 2])
out = []
for i in range(a.shape[0]):
    out.append(np.bincount(a[:i+1], minlength=n))
out = np.array(out)
"""

list_comp_code = """
import numpy as np
n = 3
a = np.array([1, 0, 1, 2])
fast_out = [np.bincount(a[:i+1], minlength=n) for i in range(a.shape[0])]
"""

timer(your_code)       # 0.001330663086846471
timer(list_comp_code)  # 1.4601880684494972e-05
So the list comprehension method is over 91 times faster when averaged over 100 trials.

Effective way to find and delete certain elements from a numpy array

I have a numpy array with some positive numbers and some -1 elements. I want to find the elements with value -1, delete them, and store their indices.
One way of doing it is iterating through the array and checking whether each value is -1. Is this the only way? If not, how efficient is it? Isn't there a more efficient Python tool for this?
With numpy.argwhere() and numpy.delete() routines:
import numpy as np

arr = np.array([1, 2, 3, -1, 4, -1, 5, 10, -1, 14])
indices = np.argwhere(arr == -1).flatten()
new_arr = np.delete(arr, indices)

print(new_arr)           # [ 1  2  3  4  5 10 14]
print(indices.tolist())  # [3, 5, 8]
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.argwhere.html
https://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html
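A small usage note: numpy.delete returns a new array and leaves the input untouched, so keep the returned value rather than expecting arr to change in place:
print(arr)  # original is unchanged: [ 1  2  3 -1  4 -1  5 10 -1 14]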
import numpy as np

yourarray = np.array([4, 5, 6, 7, -1, 2, 3, -1, 9, -1])  # say
rangenumpyarray = np.arange(len(yourarray))  # a range column to sit alongside your array
arra = np.hstack((rangenumpyarray.reshape(-1, 1), yourarray.reshape(-1, 1)))  # combine both arrays as two columns
arra[arra[:, 1] == -1][:, 0]  # boolean indexing: pick the index column where the value column is -1
Use a combination of np.flatnonzero and simple boolean indexing.
import numpy as np

x = np.array([0, 0, -1, 0, 0, -1, 0, -2, 0, 0])
m = x != -1               # generate a mask of the elements to keep
idx = np.flatnonzero(~m)  # indices of the -1 entries
x = x[m]

idx
# array([2, 5])
x
# array([ 0,  0,  0,  0,  0, -2,  0,  0])

Count occurrences of unique arrays in array

I have a numpy array of various one-hot encoded numpy arrays, e.g.:
x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
I would like to count the occurrences of each unique one-hot vector:
{[1, 0, 0]: 2, [0, 0, 1]: 1}
Approach #1
Seems like a perfect setup to use the new functionality of numpy.unique (v1.13 and newer) that lets us work along an axis of a NumPy array -
unq_rows, count = np.unique(x,axis=0, return_counts=1)
out = {tuple(i):j for i,j in zip(unq_rows,count)}
Sample outputs -
In [289]: unq_rows
Out[289]:
array([[0, 0, 1],
       [1, 0, 0]])
In [290]: count
Out[290]: array([1, 2])
In [291]: {tuple(i):j for i,j in zip(unq_rows,count)}
Out[291]: {(0, 0, 1): 1, (1, 0, 0): 2}
Approach #2
For NumPy versions older than v1.13, we can make use of the fact that the input array is a one-hot encoded array, like so -
_, idx, count = np.unique(x.argmax(1), return_counts=1, return_index=1)
out = {tuple(i):j for i,j in zip(x[idx],count)} # x[idx] is unq_rows
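A quick illustration of why the argmax trick works (hedged: it assumes every row is a valid one-hot vector) - each row is fully described by the position of its single 1, so counting those positions is the same as counting rows:
x.argmax(1)
# array([0, 2, 0])   # the hot position of each row; np.unique on this gives the counts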
You could convert your arrays to tuples and use a Counter:
import numpy as np
from collections import Counter
x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
Counter([tuple(a) for a in x])
# Counter({(1, 0, 0): 2, (0, 0, 1): 1})
The fastest way given your data format is:
x.sum(axis=0)
which gives:
array([2, 0, 1])
where the 1st entry is the count of rows whose 1st position is hot:
[1, 0, 0]  ->  2
[0, 1, 0]  ->  0
[0, 0, 1]  ->  1
This exploits the fact that only one position can be hot at a time, so the counts decompose into a direct column-wise sum.
If you absolutely need it expanded to the same format, it can be converted via:
sums = x.sum(axis=0)
{tuple(int(k == i) for k in range(len(sums))): e for i, e in enumerate(sums)}
or, similarly to tarashypka:
{tuple(row): count for row, count in zip(np.eye(len(sums), dtype=np.int64), sums)}
yields:
{(1, 0, 0): 2, (0, 1, 0): 0, (0, 0, 1): 1}
Here is another interesting solution with sum
>>> {tuple(v): n for v, n in zip(np.eye(x.shape[1], dtype=int), np.sum(x, axis=0))
...  if n > 0}
{(0, 0, 1): 1, (1, 0, 0): 2}
Lists (and numpy arrays) are unhashable, i.e. they can't be keys of a dictionary. So your precise desired output, a dictionary with keys that look like [1, 0, 0], is never possible in Python. To deal with this you need to map your vectors to tuples.
from collections import Counter
import numpy as np
x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
counts = Counter(map(tuple, x))
That will get you:
In [12]: counts
Out[12]: Counter({(0, 0, 1): 1, (1, 0, 0): 2})

building a specific sequence with python numpy

Hi, I have two arrays as below.
1) An array made up of 1s and 0s: 1 signifies an active day and 0 a holiday.
2) An arithmetic sequence which is shorter than array 1.
The result array needs to be a combination of 1) & 2), where the arithmetic sequence follows the positions of the 1s. In other words, array 2 needs to be expanded to the length of array 1, with 0s inserted in the same positions as in array 1.
One way I could solve this is by using numpy.insert with a slice. However, since the lengths are different and array 1 is dynamic, I need an efficient way to achieve this.
Thanks
An alternative one-liner solution
Setup
binary = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])
arithmetic = np.arange(1, 7)
Solution
#find 1s from binary and set values from arithmetic
binary[np.where(binary>0)] = arithmetic
Out[400]: array([1, 2, 0, 3, 0, 4, 5, 0, 6, 0])
Create the result array of the right length (len(binary)) filled with 0s, then use the binary array as a mask to assign into the result array. Make sure the binary mask is of the bool dtype.
>>> binary = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0], dtype=bool)
>>> arithmetic = np.arange(1, 7)
>>> result = np.zeros(len(binary), dtype=arithmetic.dtype)
>>> result[binary] = arithmetic
>>> result
array([1, 2, 0, 3, 0, 4, 5, 0, 6, 0])
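One thing to watch for (a hedged note): the masked assignment requires the number of True positions in binary to equal len(arithmetic), otherwise NumPy raises a shape-mismatch error. A quick sanity check could look like:
assert binary.sum() == len(arithmetic), "need exactly one arithmetic value per active day"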
Another way would be to create a copy of the binary array and replace all non-zero values with the arithmetic sequence:
bi_arr = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])  # the binary array from the question

seq = np.arange(1, len(bi_arr[bi_arr != 0]) + 1)
final_result = bi_arr.copy()
final_result[final_result != 0] = seq

final_result
# array([1, 2, 0, 3, 0, 4, 5, 0, 6, 0])
The original bi_arr remains unchanged.
