Applying multiple functions on each row using Numba - python

I have a big 2D NumPy array, let's say 5M rows and 10 columns. I want to build a few more columns according to some stateful logic implemented using Numba @jitclass. Let's say there are 50 such new columns to create. The idea is to iterate over all the rows of 10 columns in a Numba @jit function, and for each row, apply each of my 50 "filters" to generate one new cell each. So:
Source1..Source10 Derived1..Derived50
[array of 10 inputs] [array of 50 outputs]
... 5 million rows like this ...
The problem is, I can't pass a list or tuple of my "filters" to an @jit(nopython=True) function, because they are not homogeneous:
@numba.jit(nopython=True)
def calc_derived(source, derived, filters):
    for srcidx, src in enumerate(source):
        for filtidx, filt in enumerate(filters):  # doesn't work
            derived[srcidx, filtidx] = filt.transform(src)
The above doesn't work because filters are a bunch of different classes. As far as I can tell, even making them derive from a common base class is not good enough.
I am left with the possibility of swapping the order of the loops, and having the loop over the 50 filters outside of the @jit function, but this would mean the entire source dataset would be loaded 50 times instead of once, which is very wasteful.
Do you have a technique to work around the "homogeneous lists only" requirement of Numba?

You originally asked about doing this with a single function that loops over rows, and applies a list of filters to each row. A challenge with this approach is that numba needs to know or be able to infer the input/output types of each function. I'm not aware of a way to satisfy numba's requirement in this situation (which is not to say that none exists). If there were a way to do this, it could be a better solution (and I'd like to know what it is).
An alternative is to move the code that loops over rows into the filters themselves. Because the filters are numba functions, this should maintain speed. The function that applies the filters would no longer use numba; it would simply loop over the list of filters. But, because the number of filters is small relative to the size of the data matrix, hopefully this won't impact speed too severely. Because this function no longer uses numba, the 'heterogeneous list' issue would no longer be a problem.
This approach worked when I tested it (nopython mode is fine). In test cases, filters implemented as numba functions were 10-18x faster than filters implemented as class methods (even though classes were implemented as numba jitclasses; not sure what's going on there). To gain a bit of modularity, filters can be constructed as closures, so that similar filters can be defined using different parameters.
For example, here are filters that compute sums of powers. Given a matrix x, the filter operates over the columns of x, giving an output for each row. It returns a vector v, where v[i] = sum(x[i, :] ** power)
import numba
import numpy as np

# filter constructor
def sumpow(power):
    @numba.jit(nopython=True)
    def run_filter(x):
        (nrows, ncols) = x.shape
        result = np.zeros(nrows)
        for i in range(nrows):
            for j in range(ncols):
                result[i] += x[i, j] ** power
        return result
    return run_filter
# define filters
sum1 = sumpow(1) # sum of elements
sum2 = sumpow(2) # sum of elements squared
# apply a single filter
v = sum2(x)
The function to apply multiple filters looks like this. The output of each filter is stacked into a column of the output.
def apply_filters(x, filters):
    result = np.empty((x.shape[0], len(filters)))
    for (i, f) in enumerate(filters):
        result[:, i] = f(x)
    return result
y = apply_filters(x, [sum1, sum2])
Timing results
Data matrix: random entries drawn from standard normal distribution, float64, 5 million rows x 10 columns. All methods tested using the same matrix.
Filters: sum2 filter above, repeated 20x in a list: [sum2, sum2, ...]
Timed using IPython's %timeit function, best of 3 runs
Numerical outputs of all methods agree
Numba function filters (as shown above): 2.25s
Numba jitclass filters: 28.3s
Pure NumPy (using vectorized ops, no loops): 8.64s
I imagine Numba might gain relative to NumPy for more complex filters.
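For reference, a plausible shape for the pure-NumPy comparison point in the timings above (my reconstruction, not the exact benchmark code):
import numpy as np

# vectorized counterpart of the sumpow filter: sum of element-wise powers per row
def sumpow_numpy(power):
    def run_filter(x):
        return (x ** power).sum(axis=1)
    return run_filter

sum2_np = sumpow_numpy(2)
# v_np = sum2_np(x)   # should match the output of the Numba sum2 filter above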

To get a homogeneous list, you could construct a list of the transform functions of all filters. In this case, all list elements would have type method.
# filters = list of filters
transforms = [x.transform for x in filters]
Then pass transforms to calc_derived() instead of filters.
Edit:
On my system, it looks like numba will accept this, but only if nopython=False.
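A minimal sketch of how that might look, reusing calc_derived from the question; since it runs in object mode (nopython=False), it is only an assumption that the speed will be acceptable for your case:
import numba

@numba.jit(nopython=False)  # object mode; newer numba releases may want forceobj=True instead
def calc_derived(source, derived, transforms):
    for srcidx, src in enumerate(source):
        for filtidx, transform in enumerate(transforms):
            derived[srcidx, filtidx] = transform(src)

# filters = [...]  # heterogeneous jitclass instances
# transforms = [f.transform for f in filters]
# calc_derived(source, derived, transforms)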

Related

How to Efficiently Find the Indices of Max Values in a Multidimensional Array of Matrices using Pytorch and/or Numpy

Background
It is common in machine learning to deal with data of a high dimensionality. For example, in a Convolutional Neural Network (CNN) the dimensions of each input image may be 256x256, and each image may have 3 color channels (Red, Green, and Blue). If we assume that the model takes in a batch of 16 images at a time, the dimensionality of the input going into our CNN is [16,3,256,256]. Each individual convolutional layer expects data in the form [batch_size, in_channels, in_y, in_x], and all of these quantities often change layer-to-layer (except batch_size). The term we use for the matrix made up of the [in_y, in_x] values is feature map, and this question is concerned with finding the maximum value, and its index, in every feature map at a given layer.
Why do I want to do this? I want to apply a mask to every feature map, and I want to apply that mask centered at the max value in each feature map, and to do that I need to know where each max value is located. This mask application is done during both training and testing of the model, so efficiency is vitally important to keep computational times down. There are many Pytorch and Numpy solutions for finding singleton max values and indices, and for finding the maximum values or indices along a single dimension, but no (that I could find) dedicated and efficient built-in functions for finding the indices of maximum values along 2 or more dimensions at a time. Yes, we can nest functions that operate on a single dimension, but these are some of the least efficient approaches.
What I've Tried
I've looked at this Stackoverflow question, but the author is dealing with a special-case 4D array which is trivially squeezed to a 3D array. The accepted answer is specialized for this case, and the answer pointing to TopK is misguided because it not only operates on a single dimension, but would necessitate that k=1 given the question asked, thus devolving into a regular torch.max call.
I've looked at this Stackoverflow question, but this question, and its answer, focus on looking through a single dimension.
I have looked at this Stackoverflow question, but I already know of the answer's approach as I independently formulated it in my own answer here (where I amended that the approach is very inefficient).
I have looked at this Stackoverflow question, but it does not satisfy the key part of this question, which is concerned with efficiency.
I have read many other Stackoverflow questions and answers, as well as the Numpy documentation, Pytorch documentation, and posts on the Pytorch forums.
I've tried implementing a LOT of varying approaches to this problem, enough that I have created this question so that I can answer it and give back to the community, and anyone who goes looking for a solution to this problem in the future.
Standard of Performance
If I am asking a question about efficiency I need to detail expectations clearly. I am trying to find a time-efficient solution (space is secondary) for the problem above without writing C code/extensions, and which is reasonably flexible (hyper specialized approaches aren't what I'm after). The approach must accept an [a,b,c,d] Torch tensor of datatype float32 or float64 as input, and output an array or tensor of the form [a,b,2] of datatype int32 or int64 (because we are using the output as indices).
Solutions should be benchmarked against the following typical solution:
max_indices = torch.stack([torch.stack([(x[k][j]==torch.max(x[k][j])).nonzero()[0] for j in range(x.size()[1])]) for k in range(x.size()[0])])
The Approach
We are going to take advantage of the Numpy community and libraries, as well as the fact that Pytorch tensors and Numpy arrays can be converted to/from one another without copying or moving the underlying arrays in memory (so conversions are low cost). From the Pytorch documentation:
Converting a torch Tensor to a Numpy array and vice versa is a breeze. The torch Tensor and Numpy array will share their underlying memory locations, and changing one will change the other.
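A minimal illustration of that shared-memory behaviour (toy arrays made up for this example):
import numpy as np
import torch

a = np.ones((2, 3), dtype=np.float32)
t = torch.from_numpy(a)   # no copy: t and a view the same buffer
a[0, 0] = 5.0
# t[0, 0] is now 5.0 as well; t.numpy() likewise returns a view, not a copy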
Solution One
We are first going to use the Numba library to write a function that will be just-in-time (JIT) compiled upon its first usage, meaning we can get C speeds without having to write C code ourselves. Of course, there are caveats to what can get JIT-ed, and one of those caveats is that we work with Numpy functions. But this isn't too bad because, remember, converting from our torch tensor to Numpy is low cost. The function we create is:
import numpy as np
from numba import njit

@njit(cache=True)
def indexFunc(array, item):
    for idx, val in np.ndenumerate(array):
        if val == item:
            return idx
This function is from another Stackoverflow answer located here (this was the answer which introduced me to Numba). The function takes an N-dimensional Numpy array and looks for the first occurrence of a given item. It immediately returns the index of the found item on a successful match. The @njit decorator is short for @jit(nopython=True), and tells the compiler that we want it to compile the function using no Python objects, and to throw an error if it is not able to do so (Numba is the fastest when no Python objects are used, and speed is what we are after).
With this speedy function backing us, we can get the indices of the max values in a tensor as follows:
import numpy as np
import torch

x = x.numpy()
maxVals = np.amax(x, axis=(2, 3))
max_indices = np.zeros((x.shape[0], x.shape[1], 2), dtype=np.int64)
for index in np.ndindex(x.shape[0], x.shape[1]):
    max_indices[index] = np.asarray(indexFunc(x[index], maxVals[index]), dtype=np.int64)
max_indices = torch.from_numpy(max_indices)
We use np.amax because it can accept a tuple for its axis argument, allowing it to return the max values of each 2D feature map in the 4D input. We initialize max_indices with np.zeros ahead of time because appending to numpy arrays is expensive, so we allocate the space we need ahead of time. This approach is much faster than the Typical Solution in the question (by an order of magnitude), but it also uses a for loop outside the JIT-ed function, so we can improve...
Solution Two
We will use the following solution:
import numpy as np
from numba import njit, prange

@njit(cache=True)
def indexFunc(array, item):
    for idx, val in np.ndenumerate(array):
        if val == item:
            return idx
    raise RuntimeError

@njit(cache=True, parallel=True)
def indexFunc2(x, maxVals):
    max_indices = np.zeros((x.shape[0], x.shape[1], 2), dtype=np.int64)
    for i in prange(x.shape[0]):
        for j in prange(x.shape[1]):
            max_indices[i, j] = np.asarray(indexFunc(x[i, j], maxVals[i, j]), dtype=np.int64)
    return max_indices
x = x.numpy()
maxVals = np.amax(x, axis=(2,3))
max_indices = torch.from_numpy(indexFunc2(x,maxVals))
Instead of iterating through our feature maps one at a time with a for loop, we can take advantage of parallelization using Numba's prange function (which behaves exactly like range but tells the compiler we want the loop to be parallelized) and the parallel=True decorator argument. Numba also parallelizes the np.zeros function. Because our function is compiled just-in-time and uses no Python objects, Numba can take advantage of all the threads available in our system! It is worth noting that there is now a raise RuntimeError in indexFunc. We need to include this, otherwise the Numba compiler will try to infer the return type of the function and infer that it will be either an array or None. This doesn't square with our usage in indexFunc2, so the compiler would throw an error. Of course, from our setup we know that indexFunc will always return an array, so we can simply raise an error on the other logical branch.
This approach is functionally identical to Solution One, but changes the iteration using np.ndindex into two for loops using prange. This approach is about 4x faster than Solution One.
Solution Three
Solution Two is fast, but it is still finding the max values using regular Python. Can we speed this up using a more comprehensive JIT-ed function?
@njit(cache=True)
def indexFunc(array, item):
    for idx, val in np.ndenumerate(array):
        if val == item:
            return idx
    raise RuntimeError

@njit(cache=True, parallel=True)
def indexFunc3(x):
    maxVals = np.zeros((x.shape[0], x.shape[1]), dtype=np.float32)
    for i in prange(x.shape[0]):
        for j in prange(x.shape[1]):
            maxVals[i][j] = np.max(x[i][j])
    max_indices = np.zeros((x.shape[0], x.shape[1], 2), dtype=np.int64)
    for i in prange(x.shape[0]):
        for j in prange(x.shape[1]):
            max_indices[i, j] = np.asarray(indexFunc(x[i, j], maxVals[i, j]), dtype=np.int64)
    return max_indices
max_indices = torch.from_numpy(indexFunc3(x))
It might look like there is a lot more going on in this solution, but the only change is that instead of calculating the maximum values of each feature map using np.amax, we have now parallelized the operation. This approach is marginally faster than Solution Two.
Solution Four
This solution is the best I've been able to come up with:
@njit(cache=True, parallel=True)
def indexFunc4(x):
    max_indices = np.zeros((x.shape[0], x.shape[1], 2), dtype=np.int64)
    for i in prange(x.shape[0]):
        for j in prange(x.shape[1]):
            maxTemp = np.argmax(x[i][j])  # flat index into the (in_y, in_x) feature map
            max_indices[i][j] = [maxTemp // x.shape[2], maxTemp % x.shape[2]]
    return max_indices
max_indices = torch.from_numpy(indexFunc4(x))
This approach is more condensed and also the fastest at 33% faster than Solution Three and 50x faster than the Typical Solution. We use np.argmax to get the index of the max value of each feature map, but np.argmax only returns the index as if each feature map were flattened. That is, we get a single integer telling us which number the element is in our feature map, not the indices we need to be able to access that element. The math [maxTemp // x.shape[2], maxTemp % x.shape[2]] is to turn that singular int into the [row,column] that we need.
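A tiny standalone check of that flattened-index arithmetic (toy shapes invented for illustration). Note that the general formula divides and takes the remainder by the number of columns; the solutions above use x.shape[2], which is equivalent here because the benchmark's feature maps are square.
import numpy as np

fmap = np.array([[0.1, 0.9, 0.2],
                 [0.3, 0.0, 0.4]])
flat = np.argmax(fmap)                                  # 1, an index into the flattened map
row, col = flat // fmap.shape[1], flat % fmap.shape[1]  # (0, 1)
assert (row, col) == np.unravel_index(flat, fmap.shape)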
Benchmarking
All approaches were benchmarked together against a random input of shape [32,d,64,64], where d was incremented from 5 to 245. For each d, 15 samples were gathered and the times were averaged. An equality test ensured that all solutions provided identical values.
[Example benchmark output omitted.]
[Plot of benchmark times as d increases, leaving out the Typical Solution so the graph isn't squashed.]
Woah! What is going on at the start with those spikes?
Solution Five
Numba allows us to produce just-in-time compiled functions, but it doesn't compile them until the first time we use them; it then caches the result for when we call the function again. This means the very first time we call our JIT-ed functions we get a spike in compute time as the function is compiled. Luckily, there is a way around this: if we specify ahead of time what our function's return type and argument types will be, the function will be eagerly compiled instead of compiled just-in-time. Applying this knowledge to Solution Four we get:
@njit('i8[:,:,:](f4[:,:,:,:])', cache=True, parallel=True)
def indexFunc4(x):
    max_indices = np.zeros((x.shape[0], x.shape[1], 2), dtype=np.int64)
    for i in prange(x.shape[0]):
        for j in prange(x.shape[1]):
            maxTemp = np.argmax(x[i][j])
            max_indices[i][j] = [maxTemp // x.shape[2], maxTemp % x.shape[2]]
    return max_indices
max_indices6 = torch.from_numpy(indexFunc4(x))
And if we restart our kernel and rerun our benchmark, we can look at the first result where d==5 and the second result where d==10 and note that all of the JIT-ed solutions were slower when d==5 because they had to be compiled, except for Solution Four, because we explicitly provided the function signature ahead of time.
There we go! That's the best solution I have so far for this problem.
EDIT #1
Solution Six
An improved solution has been developed which is 33% faster than the previously posted best solution. This solution only works if the input array is C-contiguous, but this isn't a big restriction since numpy arrays or torch tensors will be contiguous unless they are reshaped, and both have functions to make the array/tensor contiguous if needed.
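As an aside, these are the sort of contiguity helpers being referred to (a quick sketch with a made-up tensor shape):
import numpy as np
import torch

x_t = torch.randn(32, 5, 64, 64)       # hypothetical input tensor
x_t = x_t.contiguous()                 # no-op if the tensor is already C-contiguous
x = np.ascontiguousarray(x_t.numpy())  # same idea on the NumPy side
assert x.flags['C_CONTIGUOUS']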
This solution is the same as the previous best, but the function decorator which specifies the input and return types is changed from
@njit('i8[:,:,:](f4[:,:,:,:])', cache=True, parallel=True)
to
@njit('i8[:,:,::1](f4[:,:,:,::1])', cache=True, parallel=True)
The only difference is that the last : in each array typing becomes ::1, which signals to the numba njit compiler that the input arrays are C-contiguous, allowing it to better optimize.
The full solution six is then:
@njit('i8[:,:,::1](f4[:,:,:,::1])', cache=True, parallel=True)
def indexFunc5(x):
    max_indices = np.zeros((x.shape[0], x.shape[1], 2), dtype=np.int64)
    for i in prange(x.shape[0]):
        for j in prange(x.shape[1]):
            maxTemp = np.argmax(x[i][j])
            max_indices[i][j] = [maxTemp // x.shape[2], maxTemp % x.shape[2]]
    return max_indices
max_indices7 = torch.from_numpy(indexFunc5(x))
The benchmark including this new solution confirms the speedup.

Array operations using multiple indices of same array

I am very new to Python, and I am trying to get used to performing Python's array operations rather than looping through arrays. Below is an example of the kind of looping operation I am doing, but am unable to work out a suitable pure array operation that does not rely on loops:
import numpy as np

def f(arg1, arg2):
    # an arbitrary function
    pass

def myFunction(a1DNumpyArray):
    A = a1DNumpyArray
    # Create a square array with each dimension the size of the argument array.
    B = np.zeros((A.size, A.size))
    # Function f is a function of two elements of the 1D array. For each
    # element, i, I want to perform the function on it and every element
    # before it, and store the result in the square array, multiplied by
    # the difference between the ith and (i-1)th element.
    for i in range(A.size):
        B[i, :i] = f(A[i], A[:i]) * (A[i] - A[i-1])
    # Sum through j and return full sums as 1D array.
    return np.sum(B, axis=0)
In short, I am integrating a function which takes two elements of the same array as arguments, returning an array of results of the integral.
Is there a more compact way to do this, without using loops?
The use of an arbitrary f function, and this [i, :i] business, complicates bypassing the loop.
Most of the fast compiled numpy operations work on the whole array, or whole rows and/or columns, and effectively do so in parallel. Loops that are inherently sequential (value from one loop depends on the previous) don't fit well. And different size lists or arrays in each loop are also a good indicator that 'vectorizing' will be difficult.
for i in range(A.size):
    B[i, :i] = f(A[i], A[:i]) * (A[i] - A[i-1])
With a sample A and known f (as simple as arg1*arg2), I'd generate a B array, and look for patterns that treat B as a whole. At first glance it looks like your B is a lower triangle. There are functions to help index those. But that final sum might change the picture.
Sometimes I tackle these problems with a bottom up approach, trying to remove inner loops first. But in this case, I think some sort of big-picture approach is needed.
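To make the lower-triangle idea concrete, here is a sketch under the assumption that f is the simple product f(arg1, arg2) = arg1 * arg2; for an arbitrary f this kind of collapse is generally not available:
import numpy as np

def myFunction_vectorized(A):
    diff = A - np.roll(A, 1)        # A[i] - A[i-1]; wraps at i = 0, exactly like the loop
    tril = np.tri(A.size, k=-1)     # 1 where j < i, matching the B[i, :i] assignment
    B = np.outer(A, A) * diff[:, None] * tril
    return B.sum(axis=0)

Here np.outer evaluates the product on every pair at once, and np.tri masks out the upper triangle that the [i, :i] slice never touched.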

Python - Difficulty Calculating Partition Function using Large NumPy arrays

I had a pretty compact way of computing the partition function of an Ising-like model using itertools, lambda functions, and large NumPy arrays. Given a network consisting of N nodes and Q "states"/node, I have two arrays, h-fields and J-couplings, of sizes (N,Q) and (N,N,Q,Q) respectively. J is upper-triangular, however. Using these arrays, I have been computing the partition function Z using the following method:
import itertools
import numpy as np

# Set up lambda functions and iteration tuples of the form (A_1, A_2, ..., A_n)
iters = itertools.product(range(Q), repeat=N)
hf = lambda s: h[range(N), s]
jf = lambda s: np.array([J[fi, fj, s[fi], s[fj]]
                         for fi, fj in itertools.combinations(range(N), 2)]).flatten()
# Initialize and populate partition function array
pf = np.zeros(tuple([Q for i in range(N)]))
for it in iters:
    hterms = np.exp(hf(it)).prod()
    jterms = np.exp(-jf(it)).prod()
    pf[it] = jterms * hterms
# Calculates partition function
Z = pf.sum()
This method works quickly for small N and Q, say (N,Q) = (5,2). However, for larger systems (N,Q) = (18,3), this method cannot even create the pf array due to memory issues because it has Q^N nontrivial elements. Any ideas on how to either overcome this memory issue or how to alter the code to work on subarrays?
Edit: Made a small mistake in the definition of jf. It has been corrected.
You can avoid the large array just by initializing Z to 0, and incrementing it by jterms * hterms in each iteration. This still won't get you out of calculating and summing Q^N numbers, however. To do that, you probably need to figure out a way to simplify the partition function algebraically.
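A minimal sketch of that running-sum variant, reusing hf, jf, N and Q from the question's code:
import itertools
import numpy as np

Z = 0.0
for it in itertools.product(range(Q), repeat=N):
    hterms = np.exp(hf(it)).prod()
    jterms = np.exp(-jf(it)).prod()
    Z += jterms * hterms   # accumulate instead of storing pf[it]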
Not sure what you are trying to compute, but I tested your code with ChrisB's suggestion and jf will not work for Q=3.
Perhaps you shouldn't use a dense numpy array to encode your function? You could try sparse arrays or just straight Python with Numba compilation. This blogpost shows using Numba on the simple Ising model with good performance.

Python: nearest neighbour (or closest match) filtering on data records (list of tuples)

I am trying to write a function that will filter a list of tuples (mimicking an in-memory database), using a "nearest neighbour" or "nearest match" type algorithm.
I want to know the best (i.e. most Pythonic) way to go about doing this. The sample code below hopefully illustrates what I am trying to do.
datarows = [(10, 2.0, 3.4, 100),
            (11, 2.0, 5.4, 120),
            (17, 12.9, 42, 123)]
filter_record = (9,1.9,2.9,99) # record that we are seeking to retrieve from 'database' (or nearest match)
weights = (1,1,1,1) # weights to approportion to each field in the filter
def get_nearest_neighbour(data, criteria, weights):
    for row in data:
        # calculate 'distance metric' (e.g. simple differencing) and multiply by relevant weight
        # determine the row which was either an exact match or was 'least dissimilar'
        # return the match (or nearest match)
        pass

if __name__ == '__main__':
    result = get_nearest_neighbour(datarows, filter_record, weights)
    print(result)
For the snippet above, the output should be:
(10,2.0,3.4,100)
since it is the 'nearest' to the sample data passed to the function get_nearest_neighbour().
My question then is, what is the best way to implement get_nearest_neighbour()? For the purpose of brevity etc., assume that we are only dealing with numeric values, and that the 'distance metric' we use is simply an arithmetic subtraction of the input data from the current row.
Simple out-of-the-box solution:
import math
def distance(row_a, row_b, weights):
    diffs = [math.fabs(a - b) for a, b in zip(row_a, row_b)]
    return sum([v * w for v, w in zip(diffs, weights)])

def get_nearest_neighbour(data, criteria, weights):
    def sort_func(row):
        return distance(row, criteria, weights)
    return min(data, key=sort_func)
If you need to work with huge datasets, you should consider switching to Numpy and using Numpy's KDTree to find nearest neighbours. The advantage of using Numpy is that not only does it use a more advanced algorithm, it's also implemented on top of highly optimized LAPACK (Linear Algebra PACKage).
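For what it's worth, the KD-tree usually paired with NumPy actually lives in SciPy (scipy.spatial.cKDTree). A sketch for the sample data above; scaling both the rows and the query by the weights and querying with p=1 reproduces the weighted absolute-difference metric used earlier:
import numpy as np
from scipy.spatial import cKDTree

weights_arr = np.asarray(weights, dtype=float)
tree = cKDTree(np.asarray(datarows, dtype=float) * weights_arr)
dist, idx = tree.query(np.asarray(filter_record, dtype=float) * weights_arr, p=1)
print(datarows[idx])   # (10, 2.0, 3.4, 100)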
About naive-NN:
Many of these other answers propose "naive nearest-neighbor", which is an O(N*d)-per-query algorithm (d is the dimensionality, which in this case seems constant, so it's O(N)-per-query).
While an O(N)-per-query algorithm is pretty bad, you might be able to get away with it, if you have less than any of (for example):
10 queries and 100000 points
100 queries and 10000 points
1000 queries and 1000 points
10000 queries and 100 points
100000 queries and 10 points
Doing better than naive-NN:
Otherwise you will want to use one of the techniques (especially a nearest-neighbor data structure) listed in:
http://en.wikipedia.org/wiki/Nearest_neighbor_search (most likely linked off from that page), some examples linked:
http://en.wikipedia.org/wiki/K-d_tree
http://en.wikipedia.org/wiki/Locality_sensitive_hashing
http://en.wikipedia.org/wiki/Cover_tree
especially if you plan to run your program more than once. There are most likely libraries available. Not using an NN data structure would otherwise take too much time if you have a large product of #queries * #points. As user 'dsign' points out in the comments, you can probably squeeze out a large additional constant factor of speed by using the numpy library.
However, if you can get away with using the simple-to-implement naive-NN, you should use it.
use heapq.nsmallest on a generator calculating the distance*weight for each record (nsmallest, since the nearest match is the one with the smallest distance).
something like:
heapq.nsmallest(N, ((row, dist_function(row, criteria, weight)) for row in data), key=operator.itemgetter(1))
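A fleshed-out version of that one-liner, reusing the distance() helper from the earlier answer; the function name and the n parameter are placeholders for illustration:
import heapq
import operator

def get_nearest_neighbours(data, criteria, weights, n=1):
    scored = ((row, distance(row, criteria, weights)) for row in data)
    return heapq.nsmallest(n, scored, key=operator.itemgetter(1))

# get_nearest_neighbours(datarows, filter_record, weights)[0][0]
# -> (10, 2.0, 3.4, 100)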

Best way to create a NumPy array from a dictionary?

I'm just starting with NumPy so I may be missing some core concepts...
What's the best way to create a NumPy array from a dictionary whose values are lists?
Something like this:
d = { 1: [10,20,30] , 2: [50,60], 3: [100,200,300,400,500] }
Should turn into something like:
data = [
    [10, 20, 30, ?, ?],
    [50, 60, ?, ?, ?],
    [100, 200, 300, 400, 500]
]
I'm going to do some basic statistics on each row, eg:
deviations = numpy.std(data, axis=1)
Questions:
What's the best / most efficient way to create the numpy.array from the dictionary? The dictionary is large; a couple of million keys, each with ~20 items.
The number of values for each 'row' is different. If I understand correctly, numpy wants uniform sizes, so what do I fill in for the missing items to make std() happy?
Update: One thing I forgot to mention - while the python techniques are reasonable (e.g. looping over a few million items is fast), they're constrained to a single CPU. Numpy operations scale nicely to the hardware and hit all the CPUs, so they're attractive.
You don't need to create numpy arrays to call numpy.std().
You can call numpy.std() in a loop over all the values of your dictionary. The list will be converted to a numpy array on the fly to compute the standard deviation.
The downside of this method is that the main loop will be in python and not in C. But I guess this should be fast enough: you will still compute std at C speed, and you will save a lot of memory as you won't have to store 0 values where you have variable size arrays.
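A minimal sketch of that loop, using the dict d from the question:
import numpy as np

# each value list is converted to an array on the fly by np.std
deviations = {key: np.std(values) for key, values in d.items()}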
If you want to further optimize this, you can store your values into a list of numpy arrays, so that you do the python list -> numpy array conversion only once.
If you find that this is still too slow, try to use Psyco to optimize the python loop.
If this is still too slow, try using Cython together with the numpy module. This tutorial claims impressive speed improvements for image processing. Or simply program the whole std function in Cython (see this for benchmarks and examples with the sum function).
An alternative to Cython would be to use SWIG with numpy.i.
If you want to use only numpy and have everything computed at the C level, try grouping all the records of the same size together into different arrays and calling numpy.std() on each of them. It should look like the following example.
example with O(N) complexity:
import numpy
list_size_1 = []
list_size_2 = []
for row in data.itervalues():
    if len(row) == 1:
        list_size_1.append(row)
    elif len(row) == 2:
        list_size_2.append(row)
list_size_1 = numpy.array(list_size_1)
list_size_2 = numpy.array(list_size_2)
std_1 = numpy.std(list_size_1, axis=1)
std_2 = numpy.std(list_size_2, axis=1)
While there are already some pretty reasonable ideas present here, I believe following is worth mentioning.
Filling missing data with any default value would spoil the statistical characteristics (std, etc). Evidently that's why Mapad proposed the nice trick with grouping same sized records.
The problem with it (assuming no a priori data on record lengths is at hand) is that it involves even more computations than the straightforward solution:
at least O(N*logN) 'len' calls and comparisons for sorting with an efficient algorithm
O(N) checks on the second pass through the list to obtain groups (their beginning and end indexes on the 'vertical' axis)
Using Psyco is a good idea (it's strikingly easy to use, so be sure to give it a try).
It seems that the optimal way is to take the strategy described by Mapad in bullet #1, but with a modification: not to generate the whole list, but to iterate through the dictionary, converting each row into a numpy.array and performing the required computations. Like this:
for row in data.itervalues():
    np_row = numpy.array(row)
    this_row_std = numpy.std(np_row)
    # compute any other statistic descriptors needed and then save to some list
In any case a few million loops in python won't take as long as one might expect. Besides this doesn't look like a routine computation, so who cares if it takes extra second/minute if it is run once in a while or even just once.
A generalized variant of what was suggested by Mapad:
from numpy import array, mean, std, shape

def get_statistical_descriptors(a):
    ax = len(shape(a)) - 1
    functions = [mean, std]
    return [f(a, axis=ax) for f in functions]

def process_long_list_stats(data):
    import numpy
    groups = {}
    for key, row in data.iteritems():
        size = len(row)
        try:
            groups[size].append(key)
        except KeyError:
            groups[size] = [key]
    results = []
    for gr_keys in groups.itervalues():
        gr_rows = numpy.array([data[k] for k in gr_keys])
        stats = get_statistical_descriptors(gr_rows)
        results.extend(zip(gr_keys, zip(*stats)))
    return dict(results)
numpy dictionary
You can use a structured array to preserve the ability to address a numpy object by a key, like a dictionary.
import numpy as np
dd = {'a':1,'b':2,'c':3}
dtype = eval('[' + ','.join(["('%s', float)" % key for key in dd.keys()]) + ']')
values = [tuple(dd.values())]
numpy_dict = np.array(values, dtype=dtype)
numpy_dict['c']
will now output
array([ 3.])
