Numpy: Finding minimum and maximum values from associations through binning - python

Prerequisite
This is a question derived from this post. So, some of the introduction of the problem will be similar to that post.
Problem
Let's say result is a 2D array and values is a 1D array. values holds some values associated with each element in result. The mapping of an element in values to result is stored in x_mapping and y_mapping. A position in result can be associated with different values. Now, I have to find the minimum and maximum of the values grouped by associations.
An example for better clarification.
min_result array:
[[0, 0],
[0, 0],
[0, 0],
[0, 0]]
max_result array:
[[0, 0],
[0, 0],
[0, 0],
[0, 0]]
values array:
[ 1., 2., 3., 4., 5., 6., 7., 8.]
Note: here the result arrays and values happen to have the same number of elements, but that need not be the case; there is no relation between their sizes at all.
x_mapping and y_mapping have mappings from 1D values to 2D result(both min and max). The sizes of x_mapping, y_mapping and values will be the same.
x_mapping - [0, 1, 0, 0, 0, 0, 0, 0]
y_mapping - [0, 3, 2, 2, 0, 3, 2, 1]
Here, the 1st value (values[0]) and the 5th value (values[4]) have x as 0 and y as 0 (x_mapping[0] and y_mapping[0]) and hence are associated with result[0, 0]. If we compute the minimum and maximum of this group, we get 1 and 5 respectively. So, min_result[0, 0] will have 1 and max_result[0, 0] will have 5.
Note that if there is no association at all then the default value for result will be zero.
Current working solution
x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 1])
values = np.array([ 1., 2., 3., 4., 5., 6., 7., 8.], dtype=np.float32)
max_result = np.zeros([4, 2], dtype=np.float32)
min_result = np.zeros([4, 2], dtype=np.float32)
min_result[-y_mapping, x_mapping] = values # randomly initialising from values
for i in range(values.size):
    x = x_mapping[i]
    y = y_mapping[i]
    # maximum
    if values[i] > max_result[-y, x]:
        max_result[-y, x] = values[i]
    # minimum
    if values[i] < min_result[-y, x]:
        min_result[-y, x] = values[i]
min_result:
[[1., 0.],
 [6., 2.],
 [3., 0.],
 [8., 0.]]
max_result:
[[5., 0.],
 [6., 2.],
 [7., 0.],
 [8., 0.]]
Failed solutions
#1
min_result = np.zeros([4, 2], dtype=np.float32)
np.minimum.reduceat(values, [-y_mapping, x_mapping], out=min_result)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-17-126de899a90e> in <module>()
1 min_result = np.zeros([4, 2], dtype=np.float32)
----> 2 np.minimum.reduceat(values, [-y_mapping, x_mapping], out=min_result)
ValueError: object too deep for desired array
#2
min_result = np.zeros([4, 2], dtype=np.float32)
np.minimum.reduceat(values, lidx, out= min_result)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-24-07e8c75ccaa5> in <module>()
1 min_result = np.zeros([4, 2], dtype=np.float32)
----> 2 np.minimum.reduceat(values, lidx, out= min_result)
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (4,2)->(4,) (8,)->() (8,)->(8,)
#3
lidx = ((-y_mapping) % 4) * 2 + x_mapping #from mentioned post
min_result = np.zeros([8], dtype=np.float32)
np.minimum.reduceat(values, lidx, out= min_result).reshape(4,2)
[[1., 4.],
[5., 5.],
[1., 3.],
[5., 7.]]
Question
How to use np.minimum.reduceat and np.maximum.reduceat for solving this problem? I'm looking for a solution that is optimised for runtime.
Side note
I'm using Numpy version 1.14.3 with Python 3.5.2

Approach #1
Again, the most intuitive approach would be with numpy.ufunc.at.
Since these reductions are performed against the existing array values, we need to initialize the mapped output positions with the maximum of values for the minimum reductions and with the minimum for the maximum ones. Hence, the implementation would be -
min_result[-y_mapping, x_mapping] = values.max()
max_result[-y_mapping, x_mapping] = values.min()
np.minimum.at(min_result, [-y_mapping, x_mapping], values)
np.maximum.at(max_result, [-y_mapping, x_mapping], values)
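For reference, a minimal self-contained sketch of this approach, using the arrays from the question (the final check against the loop output is my addition, not part of the original answer):
import numpy as np

x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 1])
values = np.array([1., 2., 3., 4., 5., 6., 7., 8.], dtype=np.float32)

min_result = np.zeros([4, 2], dtype=np.float32)
max_result = np.zeros([4, 2], dtype=np.float32)

# Seed only the mapped cells, so unmapped cells keep the default 0
min_result[-y_mapping, x_mapping] = values.max()
max_result[-y_mapping, x_mapping] = values.min()

# Unbuffered reductions handle repeated (y, x) pairs correctly
np.minimum.at(min_result, (-y_mapping, x_mapping), values)
np.maximum.at(max_result, (-y_mapping, x_mapping), values)

print(min_result)
# [[1. 0.]
#  [6. 2.]
#  [3. 0.]
#  [8. 0.]]
print(max_result)
# [[5. 0.]
#  [6. 2.]
#  [7. 0.]
#  [8. 0.]]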
Approach #2
To leverage np.ufunc.reduceat, we need to sort data -
m,n = max_result.shape
out_dtype = max_result.dtype
lidx = ((-y_mapping)%m)*n + x_mapping
sidx = lidx.argsort()
idx = lidx[sidx]
val = values[sidx]
m_idx = np.flatnonzero(np.r_[True,idx[:-1] != idx[1:]])
unq_ids = idx[m_idx]
max_result_out.flat[unq_ids] = np.maximum.reduceat(val, m_idx)
min_result_out.flat[unq_ids] = np.minimum.reduceat(val, m_idx)
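A usage note (my reading, not stated explicitly above): min_result_out and max_result_out are assumed to be zero-initialized (m, n) arrays, so that only the flat positions in unq_ids are written and unmapped cells keep the default 0. A compact end-to-end sketch under that assumption, reproducing the min_result/max_result shown in the question:
import numpy as np

x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 1])
values = np.array([1., 2., 3., 4., 5., 6., 7., 8.], dtype=np.float32)

m, n = 4, 2
min_result_out = np.zeros((m, n), dtype=values.dtype)
max_result_out = np.zeros((m, n), dtype=values.dtype)

# Flatten the (y, x) associations into linear indices and sort by them
lidx = ((-y_mapping) % m) * n + x_mapping
sidx = lidx.argsort()
idx, val = lidx[sidx], values[sidx]

# Start position of each group of equal linear indices
m_idx = np.flatnonzero(np.r_[True, idx[:-1] != idx[1:]])
unq_ids = idx[m_idx]

max_result_out.flat[unq_ids] = np.maximum.reduceat(val, m_idx)
min_result_out.flat[unq_ids] = np.minimum.reduceat(val, m_idx)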

Related

Fastest method for mapping an array of Boolean True counts to a Boolean Array

I have a 1D array of Boolean "True" counts that I want to map to a 2D array.
#Array of boolean True counts
b = [1,3,2,5]
#want this 2D array:
[1,1,1,1]
[0,1,1,1]
[0,1,0,1]
[0,0,0,1]
[0,0,0,1]
The faster the implementation (NumPy/SciPy) the better.
Thank you
Pure numpy method, using np.tri and advanced indexing:
b = np.array([1,3,2,5])
k = b.max()
np.tri(k+1,k,-1,dtype=int)[b].T
# array([[1, 1, 1, 1],
# [0, 1, 1, 1],
# [0, 1, 0, 1],
# [0, 0, 0, 1],
# [0, 0, 0, 1]])
UPDATE:
Two solutions that should work better if k >> len(b); they appear as m5 and m6 in the benchmarks.
Benchmark code borrowed and extended from @Ehsan (2nd condition). Changes: added m5 and m6, reduced the highest test size from 1000 to 200, and changed the output dtype from int to int8.
Interesting observation: my original solution m2 performs significantly worse on my (low-RAM) computer than on @Ehsan's.
Code (new functions only):
##Paul's solution 2
def m5(b):
    k = b.max()
    n = b.size
    return (np.arange(1, 2*n+1, dtype=np.int8) & 1).repeat(
        np.ravel([b, k-b], order="F")).reshape(k, n, order="F")

##Paul's solution 3
def m6(b):
    k = b.max()
    mytri = np.array([1, 0], dtype=np.int8).repeat(k)
    mytri = np.lib.stride_tricks.as_strided(mytri[k:], (k, k+1),
                                            (mytri.strides[0], -mytri.strides[0]))
    return mytri[:, b]
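A quick sanity check (my addition, not from the post) that m5 and m6 agree with the np.tri-based solution on the example input:
import numpy as np

b = np.array([1, 3, 2, 5])

def m2(b):
    # the np.tri-based solution from above
    k = b.max()
    return np.tri(k + 1, k, -1, dtype=int)[b].T

# assumes m5 and m6 are defined as shown above
print(np.array_equal(m2(b), m5(b)))  # expected: True
print(np.array_equal(m2(b), m6(b)))  # expected: True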
Try:
pd.DataFrame([[1]*x for x in [1,3,2,5]]).T.fillna(0).values
output:
array([[1., 1., 1., 1.],
[0., 1., 1., 1.],
[0., 1., 0., 1.],
[0., 0., 0., 1.],
[0., 0., 0., 1.]])
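A small follow-up note (mine, not from the answer): fillna promotes everything to float, so if an integer mask is wanted you can cast at the end:
import pandas as pd

b = [1, 3, 2, 5]
# Same one-liner as above, with an integer cast after the float-producing fillna
out = pd.DataFrame([[1] * x for x in b]).T.fillna(0).values.astype(int)
print(out)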
You can create an array of zeros of the required shape:
arr = np.zeros((np.max(b), len(b)))
Then you can create a temporary array x = np.indices(arr.shape)[0] which is:
array([[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4]])
And pad arr with ones like so:
arr[np.where(x<b)] = 1
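Put together, a minimal runnable version of this approach (b taken from the question's example):
import numpy as np

b = np.array([1, 3, 2, 5])

# Zero array of the required shape: one column per entry of b
arr = np.zeros((np.max(b), len(b)), dtype=int)

# Row index of every cell, broadcast against b to decide where the ones go
x = np.indices(arr.shape)[0]
arr[np.where(x < b)] = 1
print(arr)
# [[1 1 1 1]
#  [0 1 1 1]
#  [0 1 0 1]
#  [0 0 0 1]
#  [0 0 0 1]]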
A NumPy approach that avoids creating tri, in case b.max() is large:
b = np.array([1,3,2,5])
r, c = b.size, b.max()
a = np.zeros((c,r), dtype=int)
a[np.arange(c)[:,None]<b] = 1
output:
[[1 1 1 1]
[0 1 1 1]
[0 1 0 1]
[0 0 0 1]
[0 0 0 1]]
Comparison using benchit:
##Ehsan's solution
def m1(b):
    r, c = b.size, b.max()
    a = np.zeros((c, r), dtype=int)
    a[np.arange(c)[:, None] < b] = 1
    return a

##Paul's solution
def m2(b):
    k = b.max()
    return np.tri(k+1, k, -1, dtype=int)[b].T

##Binyamin's solution
def m3(b):
    return pd.DataFrame([[1]*x for x in b]).T.fillna(0).values

##mathfux's solution
def m4(b):
    arr = np.zeros((np.max(b), len(b)), dtype=int)
    x = np.indices(arr.shape)[0]
    arr[np.where(x < b)] = 1
    return arr
For different inputs:
in_ = [np.random.randint(100, size=n) for n in [10,100,1000,10000]]
in_ = [np.random.randint(n, size=n) for n in [10,100,1000,10000]]
So what you pick depends on your b.max() value vs. b.size. For larger b.max() values (compared to b.size), m1 is faster and for smaller b.max() (compared to b.size), m2 seems to be faster.
UPDATE: Adding a new solution and a comparison with @Paul's new solutions:
##Ehsan's solution 2
def m7(b):
    return np.less.outer(np.arange(b.max()), b) + 0
Or, almost equivalently:
def m8(b):
    return (np.arange(b.max()) < b[:, None]).T + 0
comparison:
in_ = [np.random.randint(10, size=n) for n in [10,100,1000]]
in_ = [np.random.randint(10000, size=n) for n in [10,100,1000,10000]]
including m8:
in_ = [np.random.randint(10000, size=n) for n in [10,100,1000]]

How to efficiently filter maximum elements of a matrix per row

Given a 2D array, I'm looking for a pythonic way to get an array of the same shape, but with only the maximum element of each row kept.
See the max_row_filter function below:
def max_row_filter(mat2d):
    m = np.zeros(mat2d.shape)
    for r in range(mat2d.shape[0]):
        c = np.argmax(mat2d[r])
        m[r, c] = mat2d[r, c]
    return m
p = np.array([[1,2,3],[5,4,3,],[9,10,3]])
max_row_filter(p)
Out: array([[ 0., 0., 3.],
[ 5., 0., 0.],
[ 0., 10., 0.]])
I'm looking for an efficient way to do this, suitable to be done on big arrays.
Alternative answer (this will keep duplicates):
p * (p==p.max(axis=1, keepdims=True))
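A quick illustration (my example, not from the answer) showing that ties within a row are all kept:
import numpy as np

p = np.array([[1, 3, 3],
              [5, 4, 3],
              [9, 10, 3]])

# Boolean mask of per-row maxima; keepdims makes the comparison broadcast row-wise
print(p * (p == p.max(axis=1, keepdims=True)))
# [[ 0  3  3]
#  [ 5  0  0]
#  [ 0 10  0]]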
If there are no duplicates, you could use numpy.argmax:
import numpy as np
p = np.array([[1, 2, 3],
              [5, 4, 3],
              [9, 10, 3]])
result = np.zeros_like(p)
rows, cols = zip(*enumerate(np.argmax(p, axis=1)))
result[rows, cols] = p[rows, cols]
print(result)
Output
[[ 0 0 3]
[ 5 0 0]
[ 0 10 0]]
Note that, for multiple occurrences, argmax returns the first occurrence.
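A small demonstration of that caveat (my example); with a tied row only the first maximum is kept, unlike the keepdims-mask approach above:
import numpy as np

p = np.array([[3, 3, 1],
              [5, 4, 3]])
result = np.zeros_like(p)
rows, cols = zip(*enumerate(np.argmax(p, axis=1)))
result[rows, cols] = p[rows, cols]
print(result)
# [[3 0 0]   <- only the first of the two 3s is kept
#  [5 0 0]]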

Numpy: Finding count of distinct values from associations through binning

Prerequisite
This question is an extension of this post. So, some of the introduction of the problem will be similar to that post.
Problem
Let's say result is a 2D array and values is a 1D array. values holds some values associated with each element in result. The mapping of an element in values to result is stored in x_mapping and y_mapping. A position in result can be associated with different values. (x,y) pair from x_mapping and y_mapping is associated with results[-y,x]. I have to find the unique count of the values grouped by associations.
An example for better clarification.
result array:
[[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.]]
values array:
[ 1., 2., 1., 1., 5., 6., 7., 1.]
Note: here the result array and values happen to have the same number of elements, but that need not be the case; there is no relation between their sizes at all.
x_mapping and y_mapping have mappings from 1D values to 2D result. The sizes of x_mapping, y_mapping and values will be the same.
x_mapping - [0, 1, 0, 0, 0, 0, 0, 0]
y_mapping - [0, 3, 2, 2, 0, 3, 2, 0]
Here, the 1st value (values[0]), the 5th value (values[4]) and the 8th value (values[7]) have x as 0 and y as 0 (x_mapping[i] and y_mapping[i] are 0 for those indices) and hence are associated with result[0, 0]. If we compute the count of distinct values in this group, (1, 5, 1), we get 2.
@WarrenWeckesser: let's see how the (x, y) pair [1, 3] from x_mapping and y_mapping contributes to results. Since there is only one value, i.e. 2, associated with this particular group, results[-3, 1] will be 1, as the number of distinct values associated with that cell is one.
Another example. Let's compute the value of results[-1,1]. From mappings, since there is no value associated with the cell, the value of results[-1,1] will be zero.
Similarly, the position [-2, 0] in results will have value 2.
Note that if there is no association at all then the default value for result will be zero.
The result after computation,
[[ 2., 0.],
[ 1., 1.],
[ 2., 0.],
[ 0., 0.]]
Current working solution
Using the answer from @Divakar, I was able to find a working solution.
x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 0])
values = np.array([ 1., 2., 1., 1., 5., 6., 7., 1.], dtype=np.float32)
result = np.zeros([4, 2], dtype=np.float32)
m,n = result.shape
out_dtype = result.dtype
lidx = ((-y_mapping)%m)*n + x_mapping
sidx = lidx.argsort()
idx = lidx[sidx]
val = values[sidx]
m_idx = np.flatnonzero(np.r_[True,idx[:-1] != idx[1:]])
unq_ids = idx[m_idx]
r_res = np.zeros(m_idx.size, dtype=np.float32)
for i in range(0, m_idx.shape[0]):
    _next = None
    arr = None
    if i == m_idx.shape[0]-1:
        _next = val.shape[0]
    else:
        _next = m_idx[i+1]
    _start = m_idx[i]
    if _start >= _next:
        arr = val[_start]
    else:
        arr = val[_start:_next]
    r_res[i] = np.unique(arr).size
result.flat[unq_ids] = r_res
Question
Now, the above solution takes 15 ms to operate on 19943 values.
I'm looking for a way to compute the result faster. Is there any more performant way to do this?
Side note
I'm using Numpy version 1.14.3 with Python 3.5.2
Edits
Thanks to @WarrenWeckesser for pointing out that I hadn't explained how an element in results is associated with (x, y) from the mappings. I have updated the post and added examples for clarity.
Here is one solution
import numpy as np
x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 0])
values = np.array([ 1., 2., 1., 1., 5., 6., 7., 1.], dtype=np.float32)
result = np.zeros([4, 2], dtype=np.float32)
# Get flat indices
idx_mapping = np.ravel_multi_index((-y_mapping, x_mapping), result.shape, mode='wrap')
# Sort flat indices and reorders values accordingly
reorder = np.argsort(idx_mapping)
idx_mapping = idx_mapping[reorder]
values = values[reorder]
# Get unique values
val_uniq = np.unique(values)
# Find where each unique value appears
val_uniq_hit = values[:, np.newaxis] == val_uniq
# Find reduction indices (slices with the same flat index)
reduce_idx = np.concatenate([[0], np.nonzero(np.diff(idx_mapping))[0] + 1])
# Reduce slices
reduced = np.logical_or.reduceat(val_uniq_hit, reduce_idx)
# Count distinct values on each slice
counts = np.count_nonzero(reduced, axis=1)
# Put counts in result
result.flat[idx_mapping[reduce_idx]] = counts
print(result)
# [[2. 0.]
# [1. 1.]
# [2. 0.]
# [0. 0.]]
This method takes more memory (O(len(values) * len(np.unique(values)))), but a small benchmark comparing with your original solution shows a significant speedup (although that depends on the actual size of the problem):
import numpy as np
np.random.seed(100)
result = np.zeros([400, 200], dtype=np.float32)
values = np.random.randint(100, size=(20000,)).astype(np.float32)
x_mapping = np.random.randint(result.shape[1], size=values.shape)
y_mapping = np.random.randint(result.shape[0], size=values.shape)
res1 = solution_orig(x_mapping, y_mapping, values, result)
res2 = solution(x_mapping, y_mapping, values, result)
print(np.allclose(res1, res2))
# True
# Original solution
%timeit solution_orig(x_mapping, y_mapping, values, result)
# 76.2 ms ± 623 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# This solution
%timeit solution(x_mapping, y_mapping, values, result)
# 13.8 ms ± 51.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Full code of benchmark functions:
import numpy as np
def solution(x_mapping, y_mapping, values, result):
    result = np.array(result)
    idx_mapping = np.ravel_multi_index((-y_mapping, x_mapping), result.shape, mode='wrap')
    reorder = np.argsort(idx_mapping)
    idx_mapping = idx_mapping[reorder]
    values = values[reorder]
    val_uniq = np.unique(values)
    val_uniq_hit = values[:, np.newaxis] == val_uniq
    reduce_idx = np.concatenate([[0], np.nonzero(np.diff(idx_mapping))[0] + 1])
    reduced = np.logical_or.reduceat(val_uniq_hit, reduce_idx)
    counts = np.count_nonzero(reduced, axis=1)
    result.flat[idx_mapping[reduce_idx]] = counts
    return result
def solution_orig(x_mapping, y_mapping, values, result):
    result = np.array(result)
    m, n = result.shape
    out_dtype = result.dtype
    lidx = ((-y_mapping) % m)*n + x_mapping
    sidx = lidx.argsort()
    idx = lidx[sidx]
    val = values[sidx]
    m_idx = np.flatnonzero(np.r_[True, idx[:-1] != idx[1:]])
    unq_ids = idx[m_idx]
    r_res = np.zeros(m_idx.size, dtype=np.float32)
    for i in range(0, m_idx.shape[0]):
        _next = None
        arr = None
        if i == m_idx.shape[0]-1:
            _next = val.shape[0]
        else:
            _next = m_idx[i+1]
        _start = m_idx[i]
        if _start >= _next:
            arr = val[_start]
        else:
            arr = val[_start:_next]
        r_res[i] = np.unique(arr).size
    result.flat[unq_ids] = r_res
    return result

Does KNeighborsClassifier compare lists with different sizes?

I have to use scikit-learn's KNeighborsClassifier to compare time series using a user-defined function in Python.
knn = KNeighborsClassifier(n_neighbors=1,weights='distance',metric='pyfunc',func=dtw_dist)
The problem is that KNeighborsClassifier doesn't seem to support my training data. They are time series, so they are lists with different sizes. KNeighborsClassifier gives me this error message when I try to use the fit method (knn.fit(X,Y)):
ValueError: data type not understood
It seems KNeighborsClassifier only supports same-size training sets (only time series with the same length would be accepted, but that is not my case), but my teacher told me to use KNeighborsClassifier. So I don't know what to do...
Any ideas?
Two (or one...) options as far as I can tell:
1. Precompute the distances (not directly supported by KNeighborsClassifier it seems; other clustering algorithms do, e.g. Spectral Clustering).
2. Convert your data to be square using NaNs, and handle these accordingly in your custom distance function.
'Square' your data using NaNs
So, option 2 it is.
Say we have the following data, where every row represents a time series:
import numpy as np
series = [
    [1, 2, 3, 4],
    [1, 2, 3],
    [1],
    [1, 2, 3, 4, 5, 6, 7, 8]
]
We simply make the data square by adding nans:
def make_square(jagged):
    # Careful: this mutates the series list of list
    max_cols = max(map(len, jagged))
    for row in jagged:
        row.extend([None] * (max_cols - len(row)))
    return np.array(jagged, dtype=np.float)
make_square(series)
array([[ 1., 2., 3., 4., nan, nan, nan, nan],
[ 1., 2., 3., nan, nan, nan, nan, nan],
[ 1., nan, nan, nan, nan, nan, nan, nan],
[ 1., 2., 3., 4., 5., 6., 7., 8.]])
Now the data 'fits' into the algorithm. You just have to adapt your distance function to account for the NaNs.
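What "account for the NaNs" means is left open; one possible sketch (my assumption: strip the NaN padding inside the metric and fall back to the original variable-length comparison, with dtw_dist standing for the user-defined distance from the question):
import numpy as np

def nan_aware_dist(row1, row2):
    # Remove the NaN padding that make_square added, then compare the
    # original variable-length series with the user's own metric.
    s1 = row1[~np.isnan(row1)]
    s2 = row2[~np.isnan(row2)]
    return dtw_dist(s1, s2)  # dtw_dist: the user-defined metric from the question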
Precompute and use a cache function
Oh we can probably do option 1 too (assuming you have N time series):
1. Precompute the distances into an (N, N) distance matrix D
2. Create an (N, 1) matrix that is just the range [0, N) (i.e., the index of each series in the distance matrix)
3. Create a distance function wrapper
4. Use this wrapper as the distance function
The wrapper function:
def wrapper(row1, row2):
    # might have to fiddle a bit here, but I think this retrieves the indices.
    i1, i2 = row1[0], row2[0]
    return D[i1, i2]
OK, I hope it's clear.
Complete example
#!/usr/bin/env python2.7
# encoding: utf-8
'''
'''
from mlpy import dtw_std # I dont know if you are using this one: it doesnt matter.
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
# Example data
series = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 2, 3],
    [1],
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1, 2, 5, 6, 7, 8],
    [1, 2, 4, 5, 6, 7, 8],
]
# I dont know.. these seemed to make sense to me!
y = np.array([
    0,
    0,
    0,
    0,
    1,
    2,
    2,
    2
])
# Compute the distance matrix
N = len(series)
D = np.zeros((N, N))
for i in range(N):
    for j in range(i+1, N):
        D[i, j] = dtw_std(series[i], series[j])
        D[j, i] = D[i, j]
print D
# Create the fake data matrix: just the indices of the timeseries
X = np.arange(N).reshape((N, 1))
# Create the wrapper function that returns the correct distance
def wrapper(row1, row2):
    # cast to int to prevent warnings: sklearn converts our integer indices to floats.
    i1, i2 = int(row1[0]), int(row2[0])
    return D[i1, i2]
# Only the ball_tree algorithm seems to accept a custom function
knn = KNeighborsClassifier(weights='distance', algorithm='ball_tree', metric='pyfunc', func=wrapper)
knn.fit(X, y)
print knn.kneighbors(X[0])
# (array([[ 0., 0., 0., 1., 6.]]), array([[1, 2, 0, 3, 4]]))
print knn.predict(X)
# [0 0 0 0 1 2 2 2]

Python: Counting identical rows in an array (without any imports)

For example, given:
import numpy as np
data = np.array(
    [[0, 0, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 1],
     [0, 1, 1],
     [0, 0, 0]])
I want to get a 3-dimensional array, looking like:
result = array([[[ 2., 0.],
[ 0., 2.]],
[[ 0., 2.],
[ 0., 0.]]])
One way is:
for row in data:
    newArray[row[0]][row[1]][row[2]] += 1
What I'm trying to do is the following:
for i in dimension1:
    for j in dimension2:
        for k in dimension3:
            result[i,j,k] = (data[data[data[:,0]==i, 1]==j, 2]==k).sum()
This doesn't seem to work, and I would like to achieve the desired result by sticking to my implementation rather than the one mentioned at the beginning (or using any extra imports, e.g. Counter).
Thanks.
You can also use numpy.histogramdd for this:
>>> np.histogramdd(data, bins=(2, 2, 2))[0]
array([[[ 2., 0.],
[ 0., 2.]],
[[ 0., 2.],
[ 0., 0.]]])
The problem is that data[data[data[:,0]==i, 1]==j, 2]==k is not what you expect it to be.
Let's take this apart for the case (i,j,k) == (0,0,0)
data[:,0]==0 is [True, True, False, False, True, True], and data[data[:,0]==0] correctly gives us the lines where the first number is 0.
Now from those lines we get the lines where the second number is 0: data[data[:,0]==0, 1]==0, which gives us [True, False, False, True]. And this is the problem. Because if we take those indices from data, i.e., data[data[data[:,0]==0, 1]==0] we do not get the rows where the first and second number are 0, but the 0th and 3rd row instead:
In [51]: data[data[data[:,0]==0, 1]==0]
Out[51]: array([[0, 0, 0],
[1, 0, 1]])
And if we now filter for the rows where the third number is 0, we get the wrong result w.r.t. the original data.
And that's why your approach does not work. For better methods, see the other answers.
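If you still want to stay with the triple loop from the question, one possible fix (my sketch, not from the answers) is to combine all three conditions on the same rows instead of chaining the masks:
import numpy as np

data = np.array([[0, 0, 0],
                 [0, 1, 1],
                 [1, 0, 1],
                 [1, 0, 1],
                 [0, 1, 1],
                 [0, 0, 0]])

result = np.zeros((2, 2, 2))
for i in range(2):
    for j in range(2):
        for k in range(2):
            # All three conditions are evaluated on the same rows of data
            result[i, j, k] = ((data[:, 0] == i) &
                               (data[:, 1] == j) &
                               (data[:, 2] == k)).sum()
print(result)  # matches the expected result in the question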
You can do something like the following
#Get output dimension and construct output array.
>>> dshape = tuple(data.max(axis=0)+1)
>>> dshape
(2, 2, 2)
>>> out = np.zeros(dshape)
If you have numpy 1.8+:
np.add.at(out, tuple(data.T), 1)  # unbuffered add, so duplicate rows are counted correctly
Else:
#Get indices and unique the resulting array
>>> inds = np.ravel_multi_index(data.T, dshape)
>>> inds, inverse = np.unique(inds, return_inverse=True)
>>> values = np.bincount(inverse)
>>> values
array([2, 2, 2])
>>> out.flat[inds] = values
>>> out
array([[[ 2., 0.],
[ 0., 2.]],
[[ 0., 2.],
[ 0., 0.]]])
NumPy versions before 1.8 do not have an add.at attribute, and the code above will not work without it. As ravel_multi_index may not be the fastest algorithm ever, you can look into taking the unique rows of a NumPy array; in effect, the two operations should be equivalent.
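A sketch of that unique-rows alternative (my addition; np.unique with axis=0 requires NumPy 1.13+):
import numpy as np

data = np.array([[0, 0, 0],
                 [0, 1, 1],
                 [1, 0, 1],
                 [1, 0, 1],
                 [0, 1, 1],
                 [0, 0, 0]])

dshape = tuple(data.max(axis=0) + 1)
out = np.zeros(dshape)

# Unique rows and how often each occurs, written directly into the output
rows, counts = np.unique(data, axis=0, return_counts=True)
out[tuple(rows.T)] = counts
print(out)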
Don't fear the imports. They're what make Python awesome.
This assumes that you already have the result matrix.
import numpy as np
data = np.array(
    [[0, 0, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 1],
     [0, 1, 1],
     [0, 0, 0]]
)
result = np.zeros((2,2,2))
# range of each dim, aka allowable values for each dim
dim_ranges = list(zip(np.zeros(result.ndim), np.array(result.shape)-1))
dim_ranges
# Out[]:
# [(0.0, 2), (0.0, 2), (0.0, 2)]
# Multidimentional histogram will effectively "count" along each dim
sums,_ = np.histogramdd(data,bins=result.shape,range=dim_ranges)
result += sums
result
# Out[]:
# array([[[ 2., 0.],
# [ 0., 2.]],
#
# [[ 0., 2.],
# [ 0., 0.]]])
This solution solves for any "result" ndarray, no matter what the shape. Additionally, it works fine even if your "data" ndarray has indices which are out-of-bounds for your result matrix.
