Python/NumPy: implementing a running sum (but not quite)

Given are two arrays of equal length, one holding data, one holding the results but initially set to zero, e.g.:
a = numpy.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 1])
b = numpy.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
I'd like to compute the sum of all possible subsets of three adjacent elements in a. If the sum is 0 or 1, the three corresponding elements in b are left unchanged; only if the sum exceeds 1 are the three corresponding elements in b set to 1, so that after the computation b becomes
array([0, 0, 0, 1, 1, 1, 0, 1, 1, 1])
A simple loop will accomplish this:
for x in range(len(a)-2):
    if a[x:x+3].sum() > 1:
        b[x:x+3] = 1
After this, b has the desired form.
I have to do this for a large amount of data, so speed is an issue. Is there a faster way in NumPy to carry out the operation above?
(I understand this is similar to a convolution, but not quite the same).

You can start with a convolution, choose the values that exceed 1, and finally use a "dilation":
b = numpy.convolve(a, [1, 1, 1], mode="same") > 1
b = b | numpy.r_[0, b[:-1]] | numpy.r_[b[1:], 0]
Since this avoids the Python loop, it should be faster than your approach, but I didn't do timings.
An alternative is to use a second convolution to dilate:
kernel = [1, 1, 1]
b = numpy.convolve(a, kernel, mode="same") > 1
b = numpy.convolve(b, kernel, mode="same") > 0
If you have SciPy available, yet another option for the dilation is
b = numpy.convolve(a, [1, 1, 1], mode="same") > 1
b = scipy.ndimage.morphology.binary_dilation(b)
Edit: By doing some timings, I found that this solution seems to be fastest for large arrays:
b = numpy.convolve(a, kernel) > 1
b[:-1] |= b[1:] # Shift and "smearing" to the *left* (smearing with b[1:] |= b[:-1] does not work)
b[:-1] |= b[1:] # … and again!
b = b[:-2]
For an array of one million entries, it was more than 200 times faster than your original approach on my machine. As pointed out by EOL in the comments, this solution might be considered a bit fragile, though, since it depends on implementation details of NumPy.
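The original timing setup isn't shown; a minimal sketch of how such a comparison could be run (array contents and size are assumptions) is:
import timeit
import numpy

a = numpy.random.randint(0, 2, size=1_000_000)
kernel = [1, 1, 1]

def loop_version(a):
    b = numpy.zeros_like(a)
    for x in range(len(a) - 2):
        if a[x:x+3].sum() > 1:
            b[x:x+3] = 1
    return b

def convolve_version(a):
    b = numpy.convolve(a, kernel) > 1
    b[:-1] |= b[1:]   # smear to the left, twice
    b[:-1] |= b[1:]
    return b[:-2]

print("loop:    ", timeit.timeit(lambda: loop_version(a), number=1))
print("convolve:", timeit.timeit(lambda: convolve_version(a), number=1))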

You can calculate the "convolution" sums in an efficient way with:
>>> a0 = a[:-2]
>>> a1 = a[1:-1]
>>> a2 = a[2:]
>>> a_large_sum = a0 + a1 + a2 > 1
Updating b can then be done efficiently by writing something that means "at least one of the three neighboring a_large_sum values is True": you first extend your a_large_sum array back to the same number of elements as a (first padding on the right, then on both sides, then on the left):
>>> a_large_sum_0 = np.hstack([a_large_sum, [False, False]])
>>> a_large_sum_1 = np.hstack([[False], a_large_sum, [False]])
>>> a_large_sum_2 = np.hstack([[False, False], a_large_sum])
You then obtain b in an efficient way:
>>> b = a_large_sum_0 | a_large_sum_1 | a_large_sum_2
This gives the result that you obtain, but in a very efficient way, by leveraging NumPy's fast internal loops.
PS: This approach is essentially the same as Sven's first solution, but is way more pedestrian than Sven's elegant code; it is as fast, however. Sven's second solution (double convolve()) is even more elegant, and it is twice as fast.

You might also like to have a look at NumPy's stride_tricks. Using Sven's timing setup (see link in Sven's answer), I found that for (very) large arrays, this is also a fast way to do what you want (i.e. with your definition of a):
shape = (len(a) - 2, 3)
strides = a.strides + a.strides
a_strided = numpy.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
b = numpy.r_[numpy.sum(a_strided, axis=-1) > 1, False, False]
b[2:] |= b[1:-1] | b[:-2]
After edit (see comments below) it is no longer the fastest way.
This creates a specially strided view on your original array. The data in a is not copied, but is simply viewed in a new way. We want to basically make a new array in which the last index contains the sub-arrays that we want to sum (i.e. the three elements that you want to sum). This way, we can easily sum in the end with the last command.
The last element of this new shape therefore has to be 3, and the first element will be the length of the old a minus 2 (because we can only sum up to the -2nd element).
The strides list contains the strides, in bytes, that the new array a_strided needs to make to get to the next element in each of the dimensions of the shape. If you set these equal, it means that a_strided[0,1] and a_strided[1,0] will both be a[1], which is exactly what we want. In a normal (contiguous) array this would not be the case: the first stride would be the item size times the length of the last dimension (here shape[-1] = 3), not just the item size. But in this case we can make good use of it.
Not sure if I explained this all really well, but just print out a_strided and you'll see what the result is and how easy this makes the operation.
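For the question's a, printing the strided view shows the eight overlapping windows and their sums:
import numpy

a = numpy.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 1])
shape = (len(a) - 2, 3)
strides = a.strides + a.strides
a_strided = numpy.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
print(a_strided)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [1 0 1]
#  [0 1 0]
#  [1 0 0]
#  [0 0 1]
#  [0 1 1]]
print(numpy.sum(a_strided, axis=-1))
# [1 1 1 2 1 1 1 2]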


Function Failing at Large List Sizes

I have a question: Starting with a 1-indexed array of zeros and a list of operations, for each operation add a value to each array element between two given indices, inclusive. Once all operations have been performed, return the maximum value in the array.
Example: n = 10, Queries = [[1,5,3],[4,8,7],[6,9,1]]
The following shows the array after each operation is applied; indices 1-5 have 3 added to them, and so on:
[0,0,0, 0, 0,0,0,0,0, 0]
[3,3,3, 3, 3,0,0,0,0, 0]
[3,3,3,10,10,7,7,7,0, 0]
[3,3,3,10,10,8,8,8,1, 0]
Finally you output the max value in the final list:
[3,3,3,10,10,8,8,8,1, 0]
My current solution:
def Operations(size, Array):
    ResultArray = [0]*size
    Values = [[i.pop(2)] for i in Array]
    for index, i in enumerate(Array):
        # new slice = elementwise sum of the current slice of the results array
        # and the operation value repeated to the same length
        ResultArray[i[0]-1:i[1]] = list(map(sum, zip(ResultArray[i[0]-1:i[1]], Values[index]*len(ResultArray[i[0]-1:i[1]]))))
    Result = max(ResultArray)
    return Result
def main():
    nm = input().split()
    n = int(nm[0])
    m = int(nm[1])
    queries = []
    for _ in range(m):
        queries.append(list(map(int, input().rstrip().split())))
    result = Operations(n, queries)

if __name__ == "__main__":
    main()
Example input: The first line contains two space-separated integers n and m, the size of the array and the number of operations.
Each of the next m lines contains three space-separated integers a,b and k, the left index, right index and summand.
5 3
1 2 100
2 5 100
3 4 100
Error at Large Sizes:
Runtime Error
Currently this solution works for smaller final lists of length 4000, but for test cases where the length is 10,000,000 it fails. I do not know why this is the case, and I cannot provide the example input since it is so massive. Is there anything clear as to why it would fail for larger cases?
I think the problem is that you create too many intermediate throwaway lists here:
ResultArray[i[0]-1:i[1]] = list(map(sum, zip(ResultArray[i[0]-1:i[1]], Values[index]*len(ResultArray[i[0]-1:i[1]]))))
The expression ResultArray[i[0]-1:i[1]] produces a list, and you evaluate it twice, one of those times just to get its size, which is a complete waste of resources. Then you make another list with Values[index]*len(...), and finally compile all of that into yet another list that is thrown away as soon as it is assigned into the original. That is four throwaway lists. For example, if the slice size is 5,000,000, you are making four of those, or 20,000,000 elements of extra space, 15,000,000 of which you don't really need. And if your original list has 10,000,000 elements, well, just do the math...
You can get the same result as your list(map(...)) with a list comprehension like
[v + Values[index][0] for v in ResultArray[i[0]-1:i[1]]]
Now we use two fewer lists, and we can drop one more by making it a generator expression, since slice assignment doesn't require a list specifically, just something iterable:
(v + Values[index][0] for v in ResultArray[i[0]-1:i[1]])
I don't know whether slice assignment internally builds a list first or not, but hopefully it doesn't, and with that we are down to just one extra list.
Here is an example:
>>> a=[0]*10
>>> a
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> a[1:5] = (3+v for v in a[1:5])
>>> a
[0, 3, 3, 3, 3, 0, 0, 0, 0, 0]
>>>
We can get down to zero extra lists (assuming slice assignment doesn't build one internally) by using itertools.islice:
>>> import itertools
>>> a[3:7] = (1+v for v in itertools.islice(a,3,7))
>>> a
[0, 3, 3, 4, 4, 1, 1, 0, 0, 0]
>>>
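Putting that advice together, a rewritten Operations might look roughly like the sketch below (untested against the original judge; it unpacks each query instead of popping the value):
from itertools import islice

def Operations(size, Array):
    ResultArray = [0] * size
    for a, b, k in Array:
        # add k in place over the slice, without building intermediate lists
        ResultArray[a-1:b] = (v + k for v in islice(ResultArray, a-1, b))
    return max(ResultArray)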

Efficient Logical AND of Every Combination of Two Mask Elements

I am looking to take a numpy array which is a 1D boolean mask of size N, and transform it into a new mask where each element represents a boolean AND over two mask elements (I don't want to repeat the same combinations twice since the order has no importance for the logical 'AND').
Example input:
mask = [1, 0, 1] = [a, b, c]
Expected output:
newmask = [1*0, 1*1, 0*1] = [0, 1, 0] = [a*b, a*c, b*c]
From a list of elements you can create all possible combinations of them where their order doesn't matter, without wasting time on repeated combinations:
from itertools import combinations_with_replacement
import numpy as np
n = 3
elements_to_combine = [0, 1]
for c in combinations_with_replacement(elements_to_combine, n):
    x = np.array(list(c))
    print(x)
and the output is:
[0 0 0]
[0 0 1]
[0 1 1]
[1 1 1]
Now you have a straightforward method to compute only the combinations you need. You may also add elements to the list "elements_to_combine", and you may increase n according to your needs. Since you didn't specify precisely the kind of elements to be used and how you intend to mask your elements using the logical AND operations, I will leave the rest to you. Hope this solves your performance issues.
Cheers!
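For the exact output shown in the question (the AND of every unordered pair of mask elements), a small sketch using itertools.combinations, which generates each index pair exactly once, could look like this:
from itertools import combinations
import numpy as np

mask = np.array([1, 0, 1], dtype=bool)            # [a, b, c]
newmask = np.array([x & y for x, y in combinations(mask, 2)])
print(newmask.astype(int))                        # [0 1 0]  ->  [a&b, a&c, b&c]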

Assigning to all entries whose indices sum to some value

I have an array X of binary numbers and shape (2, 2, ..., 2), and would like to assign the value 1 to all entries whose indices sum to 0 modulo 2 and the value 0 to the rest.
For example, if we had X.shape = (2, 2, 2) then I would like to assign 1 to X[0, 0, 0], X[0, 1, 1], X[1, 0, 1], X[1, 1, 0] and 0 to the other 4 entries.
What is the most efficient way of doing this? I assume I should create this array with the np.bool datatype, so the solution should work with that in mind.
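For reference, a naive baseline (not part of the answer below) that builds such an array directly from the index sums could be:
import numpy as np

n = 3
idx_sum = np.indices(n * (2,)).sum(axis=0)   # sum of the indices at every position
X = (idx_sum % 2 == 0)                       # True where the indices sum to 0 mod 2
print(X.astype(int))
# [[[1 0]
#   [0 1]]
#
#  [[0 1]
#   [1 0]]]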
Here are a direct method and a tricksy one. The tricksy one uses bit packing and exploits certain repetitive patterns. For large n this gives a considerable speedup (more than 50x at n=19).
import functools as ft
import numpy as np

def direct(n):
    I = np.arange(2, dtype='u1')
    return ft.reduce(np.bitwise_xor, np.ix_(I[::-1], *(n-1)*(I,)))

def smartish(n):
    assert n >= 6
    b = np.empty(1<<(n-3), 'u1')
    b[[0, 3, 5, 6]] = 0b10010110
    b[[1, 2, 4, 7]] = 0b01101001
    i = b.view('u8')
    jp = 1
    for j in range(0, n-7, 2):
        i[3*jp:4*jp] = i[:jp]
        i[jp:3*jp].reshape(2, -1)[...] = 0xffff_ffff_ffff_ffff ^ i[:jp]
        jp *= 4
    if n & 1:
        i[jp:] = 0xffff_ffff_ffff_ffff ^ i[:jp]
    return np.unpackbits(b).reshape(n*(2,))
from timeit import timeit
assert np.all(smartish(19) == direct(19))
print(f"direct {timeit(lambda: direct(19), number=100)*10:.3f} ms")
print(f"smartish {timeit(lambda: smartish(19), number=100)*10:.3f} ms")
Sample run on a 2^19 box:
direct 5.408 ms
smartish 0.079 ms
Please note that these return uint8 arrays, for example:
>>> direct(3)
array([[[1, 0],
        [0, 1]],

       [[0, 1],
        [1, 0]]], dtype=uint8)
But these can be view-cast to bool at virtually zero cost:
>>> direct(3).view('?')
array([[[ True, False],
        [False,  True]],

       [[False,  True],
        [ True, False]]])
Explainer:
direct method: One straight-forward way of checking bit parity is to xor the bits together. We need to do this in a "reducing" way, i.e. we have to apply the binary operation xor to the first two operands, then to the result and the third operand, then to that result and the fourth operand and so forth. This is what functools.reduce does.
Also, we don't want to do this just once but at each point of a 2^n grid. The NumPy way of doing this is open grids. These can be generated from 1D axes using np.ix_ or, in simple cases, using np.ogrid. Note that we flip the very first axis to account for the fact that we want inverted parity.
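As a small illustration of such open grids (here with n = 3):
import numpy as np

I = np.arange(2, dtype='u1')
grids = np.ix_(I[::-1], I, I)           # three mutually broadcastable axes
print([g.shape for g in grids])         # [(2, 1, 1), (1, 2, 1), (1, 1, 2)]
# xor-reducing over these broadcasts out to the full 2x2x2 parity array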
smartish method: We make two main optimizations. 1) xor is a bitwise operation, meaning it does "64-way parallel computation" for free if we pack our bits into a 64-bit uint. 2) If we flatten the 2^n hypercube, then position m in the linear arrangement corresponds to cell (bit1, bit2, bit3, ...) in the hypercube, where bit1, bit2, etc. is the binary representation (with leading zeros) of m. Now note that if we have computed the parities of positions 0 .. 0b11..11 = 2^k - 1, then we can get the parities of 2^k .. 2^(k+1) - 1 by simply copying and inverting the already computed parities. For example, k = 2:
0b000, 0b001, 0b010, 0b011 is what we have, and
0b100, 0b101, 0b110, 0b111 is what we need to compute.
Since these two sequences differ only in the leading bit, their digit sums differ by exactly one and the parities are inverted.
As an exercise work out what can be said in a similar vein about the next 2^k entries and the 2^k entries after those.

Loop over clump_masked indices

I have an array y_filtered that contains some masked values. I want to replace these values by some value I calculate based on their neighbouring values. I can get the indices of the masked values by using masked_slices = ma.clump_masked(y_filtered). This returns a list of slices, e.g. [slice(194, 196, None)].
I can easily get the values from my masked array by using y_filtered[masked_slices], and even loop over them. However, I need to access the index of each value as well, so I can calculate its new value based on its neighbours. enumerate (logically) returns 0, 1, etc. instead of the indices I need.
Here's the solution I came up with.
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
y_enum = [(i, y_i) for i, y_i in zip(range(len(y_filtered)), y_filtered)]
for sl in masked_slices:
    for i, y_i in y_enum[sl]:
        # simplified example calculation
        y_filtered[i] = np.average(y_filtered[i-2:i+2])
It is a very ugly method IMO, and I think there has to be a better way to do this. Any suggestions?
Thanks!
EDIT:
I figured out a better way to achieve what I think you want to do. This code picks every window of 5 elements and computes its (masked) average, then uses those values to fill the gaps in the original array. If some index does not have any unmasked value close enough, it is simply left masked:
import numpy as np
from numpy.lib.stride_tricks import as_strided
SMOOTH_MARGIN = 2
x = np.ma.array(data=[1, 2, 3, 4, 5, 6, 8, 9, 10],
                mask=[0, 1, 0, 0, 1, 1, 1, 1, 0])
print(x)
# [1 -- 3 4 -- -- -- -- 10]
pad_data = np.pad(x.data, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant')
pad_mask = np.pad(x.mask, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant',
                  constant_values=True)
k = 2 * SMOOTH_MARGIN + 1
isize = x.dtype.itemsize
msize = x.mask.dtype.itemsize
x_pad = np.ma.array(
    data=as_strided(pad_data, (len(x), k), (isize, isize), writeable=False),
    mask=as_strided(pad_mask, (len(x), k), (msize, msize), writeable=False))
x_avg = np.ma.average(x_pad, axis=1).astype(x_pad.dtype)
fill_mask = ~x_avg.mask & x.mask
result = x.copy()
result[fill_mask] = x_avg[fill_mask]
print(result)
# [1 2 3 4 3 4 10 10 10]
(note all the values are integers here because x was originally of integer type)
The originally posted code has a couple of errors. First, it both reads and writes values of y_filtered in the loop, so the results at later indices are affected by the previous iterations; this could be fixed by reading from a copy of the original y_filtered. Second, [i-2:i+2] should probably be [max(i-2, 0):i+3], so that the window is symmetric and always starts at zero or later.
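A minimal sketch of the loop with those two fixes applied (made-up example data, not the code from the edit above):
import numpy as np
import numpy.ma as ma

y_filtered = ma.array(data=[1., 2., 3., 4., 5., 6., 7., 8.],
                      mask=[0, 0, 1, 1, 0, 0, 1, 0])

y_src = y_filtered.copy()             # read from a copy so later fills don't use earlier fills
for sl in ma.clump_masked(y_filtered):
    for i in range(sl.start, sl.stop):
        # symmetric window of up to 5 values, clipped at the left edge
        y_filtered[i] = ma.average(y_src[max(i - 2, 0):i + 3])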
You could do this:
from itertools import chain
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
for idx in chain.from_iterable(range(s.start, s.stop) for s in masked_slices):
    y_filtered[idx] = np.average(y_filtered[max(idx - 2, 0):idx + 3])

Identify if list has consecutive elements that are equal

I'm trying to identify if a large list has consecutive elements that are the same.
So let's say:
lst = [1, 2, 3, 4, 5, 5, 6]
And in this case, I would return true, since two consecutive elements, lst[4] and lst[5], have the same value.
I know this could probably be done with some sort of combination of loops, but I was wondering if there were a more efficient way to do this?
You can use itertools.groupby() and a generator expression within any()*:
>>> from itertools import groupby
>>> any(sum(1 for _ in g) > 1 for _, g in groupby(lst))
True
Or, as a more Pythonic way, you can use zip() to check whether there are at least two equal consecutive items in your list:
>>> any(i == j for i, j in zip(lst, lst[1:]))  # In Python 2.x, use itertools.izip() to avoid creating a list of all pairs instead of an iterator
True
Note: The first approach is good when you want to check whether there are more than 2 consecutive equal items; otherwise, in this case, the second one takes the cake!
* Using sum(1 for _ in g) instead of len(list(g)) is more memory-efficient (it does not materialize the whole group in memory at once), but the latter is slightly faster.
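For instance, with a run-length threshold other than 2 (n here is a made-up parameter), the first approach generalizes directly:
from itertools import groupby

lst = [1, 2, 3, 3, 3, 4]
n = 3  # look for at least n equal consecutive items
print(any(sum(1 for _ in g) >= n for _, g in groupby(lst)))  # True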
You can use a simple any condition:
lst = [1, 2, 3, 4, 5, 5, 6]
any(lst[i]==lst[i+1] for i in range(len(lst)-1))
#outputs:
True
any() returns True if any element of the iterable is True.
If you're looking for an efficient way of doing this and the lists are numerical, you would probably want to use numpy and apply the diff (difference) function:
>>> numpy.diff([1,2,3,4,5,5,6])
array([1, 1, 1, 1, 0, 1])
Then to get a single result regarding whether there are any consecutive elements:
>>> numpy.any(~numpy.diff([1,2,3,4,5,5,6]).astype(bool))
This first performs the diff, converts it to boolean, inverts it, and then checks whether any of the resulting elements are True (i.e. whether any difference was zero).
Similarly,
>>> 0 in numpy.diff([1, 2, 3, 4, 5, 5, 6])
also works well and is similar in speed to the np.any approach (credit for this last version to heracho).
Here is a more general NumPy one-liner:
number = 7
n_consecutive = 3
arr = np.array([3, 3, 6, 5, 8, 7, 7, 7, 4, 5])
#                              ^  ^  ^
np.any(np.convolve(arr == number, v=np.ones(n_consecutive), mode='valid')
       == n_consecutive)
This method always searches the whole array, while the approach from #Kasramvd ends as soon as the condition is first met. So which method is faster depends on how sparse the runs of consecutive numbers are.
If you are interested in the positions of the consecutive numbers, and have to look at all elements of the array anyway, this approach should be faster (for larger arrays and/or longer sequences).
idx = np.nonzero(np.convolve(arr==number, v=np.ones(n_consecutive), mode='valid')
                 == n_consecutive)
# idx = i: all(arr[i:i+n_consecutive] == number)
If you are not interested in a specific value but in any consecutive equal numbers in general, use a slight variation of #jmetz's answer:
np.any(np.convolve(np.abs(np.diff(arr)), v=np.ones(n_consecutive-1), mode='valid') == 0)
# ^^^^^^
# EDIT see djvg's comment
Starting in Python 3.10, the new pairwise function provides a way to slide through pairs of consecutive elements, so that we can test equality between consecutive elements:
from itertools import pairwise
any(x == y for (x, y) in pairwise([1, 2, 3, 4, 5, 5, 6]))
# True
The intermediate result of pairwise:
pairwise([1, 2, 3, 4, 5, 5, 6])
# [(1, 2), (2, 3), (3, 4), (4, 5), (5, 5), (5, 6)]
A simple for loop should do it:
def check(lst):
    last = lst[0]
    for num in lst[1:]:
        if num == last:
            return True
        last = num
    return False

lst = [1, 2, 3, 4, 5, 5, 6]
print(check(lst))  # Prints True
Here, in each loop, I check if the current element is equal to the previous element.
The convolution approach suggested in scleronomic's answer is very promising, especially if you're looking for more than two consecutive elements.
However, the implementation presented in that answer might not be the most efficient, because it consists of two steps: diff() followed by convolve().
Alternative implementation
If we consider that the diff() can also be calculated using convolution, we can combine the two steps into a single convolution.
The following alternative implementation only requires a single convolution of the full signal, which is advantageous if the signal has many elements.
Note that we cannot take the absolute values of the diff (to prevent false positives, as mentioned in this comment), so we add some random noise to the unit kernel instead.
# example signal
signal = numpy.array([1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0])
# minimum number of consecutive elements
n_consecutive = 3
# convolution kernel for weighted moving sum (with small random component)
rng = numpy.random.default_rng()
random_kernel = 1 + 0.01 * rng.random(n_consecutive - 1)
# convolution kernel for first-order difference (similar to numpy.diff)
diff_kernel = [1, -1]
# combine the kernels so we only need to do one convolution with the signal
combined_kernel = numpy.convolve(diff_kernel, random_kernel, mode='full')
# convolve the signal to get the moving weighted sum of differences
moving_sum_of_diffs = numpy.convolve(signal, combined_kernel, mode='valid')
# check if moving sum is zero anywhere
result = numpy.any(moving_sum_of_diffs == 0)
See the DSP guide for a detailed discussion of convolution.
Timing
The difference between the two implementations boils down to this:
def original(signal, unit_kernel):
    return numpy.convolve(numpy.abs(numpy.diff(signal)), unit_kernel, mode='valid')

def alternative(signal, combined_kernel):
    return numpy.convolve(signal, combined_kernel, mode='valid')
where unit_kernel = numpy.ones(n_consecutive - 1) and combined_kernel is defined above.
Comparison of these two functions, using timeit, shows that alternative() can be several times faster, for small kernel sizes (i.e. small value of n_consecutive). However, for large kernel sizes the advantage becomes negligible, because the convolution becomes dominant (compared to the diff).
Notes:
For large kernel sizes I would prefer the original two-step approach, as I think it is easier to understand.
Due to numerical issues it may be necessary to replace numpy.any(moving_sum_of_diffs == 0) by numpy.any(numpy.abs(moving_sum_of_diffs) < very_small_number), see e.g. here.
My solution, if you want to find out whether 3 consecutive values are equal to 7. For example, given a tuple intList = (7, 7, 7, 8, 9, 1):
def three_consecutive_sevens(intList):
    for i in range(len(intList) - 2):
        if intList[i] == 7 and intList[i + 1] == 7 and intList[i + 2] == 7:
            return True
    return False
