Improving np.fromfuction performance in terms - python

I am trying to create a big array for a high dim in y_shift = np.fromfunction(lambda i,j: (i)>>j, ((2**dim), dim), dtype=np.uint32). For example dim=32. I have two questions
1.- How to improve the speed in term of time
2.- How to avoid the message for dim=32 zsh: killed python3
EDIT::
Alternative you can consider to use uint8 instead of uint32
y_shift = np.fromfunction(lambda i,j: (1&(i)>>j), ((2**dim), dim), dtype=np.uint8)

To answer your question:
You get the error zsh: killed python3 because you run out of memory.
If you want to run the code you initially proposed:
dim =32
y_shift = np.fromfunction(lambda i,j: (i)>>j, ((2**dim), dim), dtype=np.uint32)
You would need more than 500GB of memory, see here.
I would recommend thinking of alternatives and avoid trying to save the entire array to memory.

fromfunction just does 2 things:
args = indices(shape, dtype=dtype)
return function(*args, **kwargs)
It makes the indices array
In [247]: args = np.indices(((2**4),4))
In [248]: args.shape
Out[248]: (2, 16, 4)
and it passes that array to your function
In [249]: args[0]>>args[1]
Out[249]:
array([[ 0, 0, 0, 0],
[ 1, 0, 0, 0],
[ 2, 1, 0, 0],
[ 3, 1, 0, 0],
...
[14, 7, 3, 1],
[15, 7, 3, 1]])
With dim=32:
In [250]: ((2**32),32)
Out[250]: (4294967296, 32)
the resulting args array will be (2, 4294967296, 32). There's no way around that in terms of speed or memory use.

Related

Extract sub arrays based on kernel in numpy

I would like to know if there is an efficient method to get sub-arrays from a larger numpy array.
What I have is an application of np.where. I iterate 'manually' over x and y as offsets and apply where with a kernel to each rectangle extracted from the larger array with proper dimensions.
But is there a more direct approach in numpy's collection of methods?
import numpy as np
example = np.arange(20).reshape((5, 4))
# e.g. a cross kernel
a_kernel = np.asarray([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
np.where(a_kernel, example[1:4, 1:4], 0)
# returns
# array([[ 0, 6, 0],
# [ 9, 10, 11],
# [ 0, 14, 0]])
def arrays_from_kernel(a, a_kernel):
width, height = a_kernel.shape
y_max, x_max = a.shape
return [np.where(a_kernel, a[y:(y + height), x:(x + width)], 0)
for y in range(y_max - height + 1)
for x in range(x_max - width + 1)]
sub_arrays = arrays_from_kernel(example, a_kernel)
This returns the arrays I need for further processing.
# [array([[0, 1, 0],
# [4, 5, 6],
# [0, 9, 0]]),
# array([[ 0, 2, 0],
# [ 5, 6, 7],
# [ 0, 10, 0]]),
# ...
# array([[ 0, 9, 0],
# [12, 13, 14],
# [ 0, 17, 0]]),
# array([[ 0, 10, 0],
# [13, 14, 15],
# [ 0, 18, 0]])]
The context: similar to 2D convolution I would like to apply a custom function on each of the subarrays (e.g. product of squared numbers).
At the moment, you're manually advancing a sliding window over the data - stride tricks to the rescue! (And no, I didn't just make that up - there's actually a submodule called stride_tricks in numpy!) Instead of manually building windows into the data, and calling np.where() on them, if you had the windows in an array, you could call np.where() just once. Stride tricks allow you to create such an array without even having to copy the data.
Let me explain. Normal slices in numpy create views into the original data instead of copies. This is done by referring to the original data, but changing the strides used to access the data (ie. how much to jump between two elements or two rows, and so on). Stride tricks allow you to modify those strides more freely than just slicing and reshaping does, so you can eg. iterate over the same data more than once, which is useful here.
Let me demonstrate:
import numpy as np
example = np.arange(20).reshape((5, 4))
a_kernel = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
def sliding_window(data, win_shape, **kwargs):
assert data.ndim == len(win_shape)
shape = tuple(dn - wn + 1 for dn, wn in zip(data.shape, win_shape)) + win_shape
strides = data.strides * 2
return np.lib.stride_tricks.as_strided(data, shape=shape, strides=strides, **kwargs)
def arrays_from_kernel(a, a_kernel):
windows = sliding_window(a, a_kernel.shape)
return np.where(a_kernel, windows, 0)
sub_arrays = arrays_from_kernel(example, a_kernel)
The scipy.ndimage module offers a number of filters -- one of which might meet your needs. If none of those filters do what you want, you could use ndimage.generic_filter
to call a custom function on each subarray. ndimage.generic_filter is not as fast as the other ndimage filters, however.
For example,
import numpy as np
example = np.arange(20).reshape((5, 4))
a_kernel = np.asarray([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
# def arrays_from_kernel(a, a_kernel):
# width, height = a_kernel.shape
# y_max, x_max = a.shape
# return [np.where(a_kernel, a[y:(y + height), x:(x + width)], 0)
# for y in range(y_max - height + 1)
# for x in range(x_max - width + 1)]
# sub_arrays = arrays_from_kernel(example, a_kernel)
# for arr in sub_arrays:
# print(arr)
# print('-'*80)
import scipy.ndimage as ndimage
def func(x):
# reject subarrays that extend beyond the border of the `example` array
if not np.isnan(x).any():
y = np.zeros_like(a_kernel, dtype=example.dtype)
np.put(y, np.flatnonzero(a_kernel), x)
print(y)
# Instead or returning 0, you can perform your desired computation on the subarray here.
# Note that you may not need the 2D array y; often, you only need the values in the 1D array x
return 0
result = ndimage.generic_filter(example, func, footprint=a_kernel, mode='constant', cval=np.nan)
For the particular problem of computing the product of squares for each subarray, you
could convert the product into a sum by taking advantage of the fact that A * B = exp(log(A)+log(B)). This would allow you to express the computation as a normal convolution. Now using ndimage.convolve can improve performance a lot. The amount of the improvement depends on the size of example:
import numpy as np
import scipy.ndimage as ndimage
import perfplot
a_kernel = np.asarray([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
def orig(example, a_kernel=a_kernel):
def arrays_from_kernel(a, a_kernel):
width, height = a_kernel.shape
y_max, x_max = a.shape
return [
np.where(a_kernel, a[y : (y + height), x : (x + width)], 1)
for y in range(y_max - height + 1)
for x in range(x_max - width + 1)
]
return [np.prod(x) ** 2 for x in arrays_from_kernel(example, a_kernel)]
def alt(example, a_kernel=a_kernel):
logged = np.log(example)
result = ndimage.convolve(logged, a_kernel, mode="constant", cval=0)[1:-1, 1:-1]
return (np.exp(result) ** 2).ravel()
def make_example(N):
return np.random.random(size=(N, N))
def check(A, B):
return np.allclose(A, B)
perfplot.show(
setup=make_example,
kernels=[orig, alt],
n_range=[2 ** k for k in range(2, 11)],
logx=True,
logy=True,
xlabel="len(example)",
equality_check=check,
)

2d numpy array, making each value the sum of the 3x3 square it is centered at

I have a square 2D numpy array, A, and an array of zeros, B, with the same shape.
For every index (i, j) in A, other than the first and last rows and columns, I want to assign to B[i, j] the value of np.sum(A[i - 1:i + 2, j - 1:j + 2].
Example:
A =
array([[0, 0, 0, 0, 0],
[0, 1, 0, 1, 0],
[0, 1, 1, 0, 0],
[0, 1, 0, 1, 0],
[0, 0, 0, 0, 0])
B =
array([[0, 0, 0, 0, 0],
[0, 3, 4, 2, 0],
[0, 4, 6, 3, 0],
[0, 3, 4, 2, 0],
[0, 0, 0, 0, 0])
Is there an efficient way to do this? Or should I simply use a for loop?
There is a clever (read "borderline smartass") way to do this with np.lib.stride_tricks.as_strided. as_strided allows you to create views into your buffer that simulate windows by adding another dimension to the view. For example, if you had a 1D array like
>>> x = np.arange(10)
>>> np.lib.stride_tricks.as_strided(x, shape=(3, x.shape[0] - 2), strides=x.strides * 2)
array([[0, 1, 2, 3, 4, 5, 6, 7],
[1, 2, 3, 4, 5, 6, 7, 8],
[2, 3, 4, 5, 6, 7, 8, 9]])
Hopefully it is clear that you can just sum along axis=0 to get the sum of each size 3 window. There is no reason you couldn't extrend that to two or more dimensions. I've written the shape and index of the previous example in a way that suggests a solution:
A = np.array([[0, 0, 0, 0, 0],
[0, 1, 0, 1, 0],
[0, 1, 1, 0, 0],
[0, 1, 0, 1, 0],
[0, 0, 0, 0, 0]])
view = np.lib.stride_tricks.as_strided(A,
shape=(3, 3, A.shape[0] - 2, A.shape[1] - 2),
strides=A.strides * 2
)
B[1:-1, 1:-1] = view.sum(axis=(0, 1))
Summing along multiple axes simultaneously has been supported in np.sum since v1.7.0. For older versions of numpy, just sum repeatedly (twice) along axis=0.
Filling in the edges of B is left as an exercise for the reader (since it's not really part of the question).
As an aside, the solution here is a one-liner if you want it to be. Personally, I think anything with as_strided is already illegible enough, and doesn't need any further obfuscation. I'm not sure if a for loop is going to be bad enough performance-wise to justify this method in fact.
For future reference, here is a generic window-making function that can be used to solve this sort of problem:
def window_view(a, window=3):
"""
Create a (read-only) view into `a` that defines window dimensions.
The first ``a.ndim`` dimensions of the returned view will be sized according to `window`.
The remaining ``a.ndim`` dimensions will be the original dimensions of `a`, truncated by `window - 1`.
The result can be post-precessed by reducing the leading dimensions. For example, a multi-dimensional moving average could look something like ::
window_view(a, window).sum(axis=tuple(range(a.ndim))) / window**a.ndim
If the window size were different for each dimension (`window` were a sequence rather than a scalar), the normalization would be ``np.prod(window)`` instead of ``window**a.ndim``.
Parameters
-----------
a : array-like
The array to window into. Due to numpy dimension constraints, can not have > 16 dims.
window :
Either a scalar indicating the window size for all dimensions, or a sequence of length `a.ndim` providing one size for each dimension.
Return
------
view : numpy.ndarray
A read-only view into `a` whose leading dimensions represent the requested windows into `a`.
``view.ndim == 2 * a.ndim``.
"""
a = np.array(a, copy=False, subok=True)
window = np.array(window, copy=False, subok=False, dtype=np.int)
if window.size == 1:
window = np.full(a.ndim, window)
elif window.size == a.ndim:
window = window.ravel()
else:
raise ValueError('Number of window sizes must match number of array dimensions')
shape = np.concatenate((window, a.shape))
shape[a.ndim:] -= window - 1
strides = a.strides * 2
return np.lib.stride_tricks.as_strided(a, shake=shape, strides=strides)
I have found no 'simple' ways of doing this. But here are two ways:
Still involves a for loop
# Basically, get the sum for each location and then pad the result with 0's
B = [[np.sum(A[j-1:j+2,i-1:i+2]) for i in range(1,len(A)-1)] for j in range(1,len(A[0])-1)]
B = np.pad(B, ((1,1)), "constant", constant_values=(0))
Is longer but no for loops (this will be a lot more efficient on big arrays):
# Roll basically slides the array in the desired direction
A_right = np.roll(A, -1, 1)
A_left = np.roll(A, 1, 1)
A_top = np.roll(A, 1, 0)
A_bottom = np.roll(A, -1, 0)
A_bot_right = np.roll(A_bottom, -1, 1)
A_bot_left = np.roll(A_bottom, 1, 1)
A_top_right = np.roll(A_top, -1, 1)
A_top_left = np.roll(A_top, 1, 1)
# After doing that, you can just add all those arrays and these operations
# are handled better directly by numpy compared to when you use for loops
B = A_right + A_left + A_top + A_bottom + A_top_left + A_top_right + A_bot_left + A_bot_right + A
# You can then return the edges to 0 or whatever you like
B[0:len(B),0] = 0
B[0:len(B),len(B[0])-1] = 0
B[0,0:len(B)] = 0
B[len(B[0])-1,0:len(B)] = 0
You can just sum the 9 arrays that make up a block, each one being shifted by 1 w.r.t. the previous in either dimension. Using slice notation this can be done for the whole array A at once:
B = np.zeros_like(A)
B[1:-1, 1:-1] = sum(A[i:A.shape[0]-2+i, j:A.shape[1]-2+j]
for i in range(0, 3) for j in range(0, 3))
General version for arbitrary rectangular windows
def sliding_window_sum(a, size):
"""Compute the sum of elements of a rectangular sliding window over the input array.
Parameters
----------
a : array_like
Two-dimensional input array.
size : int or tuple of int
The size of the window in row and column dimension; if int then a quadratic window is used.
Returns
-------
array
Shape is ``(a.shape[0] - size[0] + 1, a.shape[1] - size[1] + 1)``.
"""
if isinstance(size, int):
size = (size, size)
m = a.shape[0] - size[0] + 1
n = a.shape[1] - size[1] + 1
return sum(A[i:m+i, j:n+j] for i in range(0, size[0]) for j in range(0, size[1]))

Computing average for numpy array

I have a 2d numpy array (6 x 6) elements. I want to create another 2D array out of it, where each block is the average of all elements within a blocksize window. Currently, I have the foll. code:
import os, numpy
def avg_func(data, blocksize = 2):
# Takes data, and averages all positive (only numerical) numbers in blocks
dimensions = data.shape
height = int(numpy.floor(dimensions[0]/blocksize))
width = int(numpy.floor(dimensions[1]/blocksize))
averaged = numpy.zeros((height, width))
for i in range(0, height):
print i*1.0/height
for j in range(0, width):
block = data[i*blocksize:(i+1)*blocksize,j*blocksize:(j+1)*blocksize]
if block.any():
averaged[i][j] = numpy.average(block[block>0])
return averaged
arr = numpy.random.random((6,6))
avgd = avg_func(arr, 3)
Is there any way I can make it more pythonic? Perhaps numpy has something which does it already?
UPDATE
Based on M. Massias's soln below, here is an update with fixed values replaced by variables. Not sure if it is coded right. it does seem to work though:
dimensions = data.shape
height = int(numpy.floor(dimensions[0]/block_size))
width = int(numpy.floor(dimensions[1]/block_size))
t = data.reshape([height, block_size, width, block_size])
avrgd = numpy.mean(t, axis=(1, 3))
To compute some operation slice by slice in numpy, it is very often useful to reshape your array and use extra axes.
To explain the process we'll use here: you can reshape your array, take the mean, reshape it again and take the mean again.
Here I assume blocksize is 2
t = np.array([[0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5],[0, 1, 2, 3, 4, 5],[0, 1, 2, 3, 4, 5],[0, 1, 2, 3, 4, 5],[0, 1, 2, 3, 4, 5],])
t = t.reshape([6, 3, 2])
t = np.mean(t, axis=2)
t = t.reshape([3, 2, 3])
np.mean(t, axis=1)
outputs
array([[ 0.5, 2.5, 4.5],
[ 0.5, 2.5, 4.5],
[ 0.5, 2.5, 4.5]])
Now that it's clear how this works, you can do it in one pass only:
t = t.reshape([3, 2, 3, 2])
np.mean(t, axis=(1, 3))
works too (and should be quicker since means are computed only once - I guess). I'll let you substitute height/blocksize, width/blocksize and blocksize accordingly.
See #askewcan nice remark on how to generalize this to any dimension.

How to fold/accumulate a numpy matrix product (dot)?

With using python library numpy, it is possible to use the function cumprod to evaluate cumulative products, e.g.
a = np.array([1,2,3,4,2])
np.cumprod(a)
gives
array([ 1, 2, 6, 24, 48])
It is indeed possible to apply this function only along one axis.
I would like to do the same with matrices (represented as numpy arrays), e.g. if I have
S0 = np.array([[1, 0], [0, 1]])
Sx = np.array([[0, 1], [1, 0]])
Sy = np.array([[0, -1j], [1j, 0]])
Sz = np.array([[1, 0], [0, -1]])
and
b = np.array([S0, Sx, Sy, Sz])
then I would like to have a cumprod-like function which gives
np.array([S0, S0.dot(Sx), S0.dot(Sx).dot(Sy), S0.dot(Sx).dot(Sy).dot(Sz)])
(This is a simple example, in reality I have potentially large matrices evaluated over n-dimensional meshgrids, so I seek for the most simple and efficient way to evaluate this thing.)
In e.g. Mathematica I would use
FoldList[Dot, IdentityMatrix[2], {S0, Sx, Sy, Sz}]
so I searched for a fold function, and all I found is an accumulate method on numpy.ufuncs. To be honest, I know that I am probably doomed because an attempt at
np.core.umath_tests.matrix_multiply.accumulate(np.array([pauli_0, pauli_x, pauli_y, pauli_z]))
as mentioned in a numpy mailing list yields the error
Reduction not defined on ufunc with signature
Do you have an idea how to (efficiently) do this kind of calculation ?
Thanks in advance.
As food for thought, here are 3 ways of evaluating the 3 sequential dot products:
With the normal Python reduce (which could also be written as a loop)
In [118]: reduce(np.dot,[S0,Sx,Sy,Sz])
array([[ 0.+1.j, 0.+0.j],
[ 0.+0.j, 0.+1.j]])
The einsum equivalent
In [119]: np.einsum('ij,jk,kl,lm',S0,Sx,Sy,Sz)
The einsum index expression looks like a sequence of operations, but it is actually evaluated as a 5d product with summation on 3 axes. In the C code this is done with an nditer and strides, but the effect is as follows:
In [120]: np.sum(S0[:,:,None,None,None] * Sx[None,:,:,None,None] *
Sy[None,None,:,:,None] * Sz[None,None,None,:,:],(1,2,3))
In [127]: np.prod([S0[:,:,None,None,None], Sx[None,:,:,None,None],
Sy[None,None,:,:,None], Sz[None,None,None,:,:]]).sum((1,2,3))
A while back while creating a patch from np.einsum I translated that C code to Python, and also wrote a Cython sum-of-products function(s). This code is on github at
https://github.com/hpaulj/numpy-einsum
einsum_py.py is the Python einsum, with some useful debugging output
sop.pyx is the Cython code, which is compiled to sop.so.
Here's how it could be used for part of your problem. I'm skipping the Sy array since my sop is not coded for complex numbers (but that could be changed).
import numpy as np
import sop
import einsum_py
S0 = np.array([[1., 0], [0, 1]])
Sx = np.array([[0., 1], [1, 0]])
Sz = np.array([[1., 0], [0, -1]])
print np.einsum('ij,jk,kl', S0, Sx, Sz)
# [[ 0. -1.] [ 1. 0.]]
# same thing, but with parsing information
einsum_py.myeinsum('ij,jk,kl', S0, Sx, Sz, debug=True)
"""
{'max_label': 108, 'min_label': 105, 'nop': 3,
'shapes': [(2, 2), (2, 2), (2, 2)],
'strides': [(16, 8), (16, 8), (16, 8)],
'ndim_broadcast': 0, 'ndims': [2, 2, 2], 'num_labels': 4,
....
op_axes [[0, -1, 1, -1], [-1, -1, 0, 1], [-1, 1, -1, 0], [0, 1, -1, -1]]
"""
# take op_axes (for np.nditer) from this debug output
op_axes = [[0, -1, 1, -1], [-1, -1, 0, 1], [-1, 1, -1, 0], [0, 1, -1, -1]]
w = sop.sum_product_cy3([S0,Sx,Sz], op_axes)
print w
As written sum_product_cy3 cannot take an arbitrary number of ops. Plus the iteration space increases with each op and index. But I can imagine calling it repeatedly, either at the Cython level, or from Python. I think it has potential for being faster than repeat(dot...) for lots of small arrays.
A condensed version of the Cython code is:
def sum_product_cy3(ops, op_axes, order='K'):
#(arr, axis=None, out=None):
cdef np.ndarray[double] x, y, z, w
cdef int size, nop
nop = len(ops)
ops.append(None)
flags = ['reduce_ok','buffered', 'external_loop'...]
op_flags = [['readonly']]*nop + [['allocate','readwrite']]
it = np.nditer(ops, flags, op_flags, op_axes=op_axes, order=order)
it.operands[nop][...] = 0
it.reset()
for x, y, z, w in it:
for i in range(x.shape[0]):
w[i] = w[i] + x[i] * y[i] * z[i]
return it.operands[nop]

How to make a multidimension numpy array with a varying row size?

I would like to create a two dimensional numpy array of arrays that has a different number of elements on each row.
Trying
cells = numpy.array([[0,1,2,3], [2,3,4]])
gives an error
ValueError: setting an array element with a sequence.
We are now almost 7 years after the question was asked, and your code
cells = numpy.array([[0,1,2,3], [2,3,4]])
executed in numpy 1.12.0, python 3.5, doesn't produce any error and
cells contains:
array([[0, 1, 2, 3], [2, 3, 4]], dtype=object)
You access your cells elements as cells[0][2] # (=2) .
An alternative to tom10's solution if you want to build your list of numpy arrays on the fly as new elements (i.e. arrays) become available is to use append:
d = [] # initialize an empty list
a = np.arange(3) # array([0, 1, 2])
d.append(a) # [array([0, 1, 2])]
b = np.arange(3,-1,-1) #array([3, 2, 1, 0])
d.append(b) #[array([0, 1, 2]), array([3, 2, 1, 0])]
While Numpy knows about arrays of arbitrary objects, it's optimized for homogeneous arrays of numbers with fixed dimensions. If you really need arrays of arrays, better use a nested list. But depending on the intended use of your data, different data structures might be even better, e.g. a masked array if you have some invalid data points.
If you really want flexible Numpy arrays, use something like this:
numpy.array([[0,1,2,3], [2,3,4]], dtype=object)
However this will create a one-dimensional array that stores references to lists, which means that you will lose most of the benefits of Numpy (vector processing, locality, slicing, etc.).
This isn't well supported in Numpy (by definition, almost everywhere, a "two dimensional array" has all rows of equal length). A Python list of Numpy arrays may be a good solution for you, as this way you'll get the advantages of Numpy where you can use them:
cells = [numpy.array(a) for a in [[0,1,2,3], [2,3,4]]]
Another option would be to store your arrays as one contiguous array and also store their sizes or offsets. This takes a little more conceptual thought around how to operate on your arrays, but a surprisingly large number of operations can be made to work as if you had a two dimensional array with different sizes. In the cases where they can't, then np.split can be used to create the list that calocedrus recommends. The easiest operations are ufuncs, because they require almost no modification. Here are some examples:
cells_flat = numpy.array([0, 1, 2, 3, 2, 3, 4])
# One of these is required, it's pretty easy to convert between them,
# but having both makes the examples easy
cell_lengths = numpy.array([4, 3])
cell_starts = numpy.insert(cell_lengths[:-1].cumsum(), 0, 0)
cell_lengths2 = numpy.diff(numpy.append(cell_starts, cells_flat.size))
assert np.all(cell_lengths == cell_lengths2)
# Copy prevents shared memory
cells = numpy.split(cells_flat.copy(), cell_starts[1:])
# [array([0, 1, 2, 3]), array([2, 3, 4])]
numpy.array([x.sum() for x in cells])
# array([6, 9])
numpy.add.reduceat(cells_flat, cell_starts)
# array([6, 9])
[a + v for a, v in zip(cells, [1, 3])]
# [array([1, 2, 3, 4]), array([5, 6, 7])]
cells_flat + numpy.repeat([1, 3], cell_lengths)
# array([1, 2, 3, 4, 5, 6, 7])
[a.astype(float) / a.sum() for a in cells]
# [array([ 0. , 0.16666667, 0.33333333, 0.5 ]),
# array([ 0.22222222, 0.33333333, 0.44444444])]
cells_flat.astype(float) / np.add.reduceat(cells_flat, cell_starts).repeat(cell_lengths)
# array([ 0. , 0.16666667, 0.33333333, 0.5 , 0.22222222,
# 0.33333333, 0.44444444])
def complex_modify(array):
"""Some complicated function that modifies array
pretend this is more complex than it is"""
array *= 3
for arr in cells:
complex_modify(arr)
cells
# [array([0, 3, 6, 9]), array([ 6, 9, 12])]
for arr in numpy.split(cells_flat, cell_starts[1:]):
complex_modify(arr)
cells_flat
# array([ 0, 3, 6, 9, 6, 9, 12])
In numpy 1.14.3, using append:
d = [] # initialize an empty list
a = np.arange(3) # array([0, 1, 2])
d.append(a) # [array([0, 1, 2])]
b = np.arange(3,-1,-1) #array([3, 2, 1, 0])
d.append(b) #[array([0, 1, 2]), array([3, 2, 1, 0])]
what you get an list of arrays (that can be of different lengths) and you can do operations like d[0].mean(). On the other hand,
cells = numpy.array([[0,1,2,3], [2,3,4]])
results in an array of lists.
You may want to do this:
a1 = np.array([1,2,3])
a2 = np.array([3,4])
a3 = np.array([a1,a2])
a3 # array([array([1, 2, 3]), array([3, 4])], dtype=object)
type(a3) # numpy.ndarray
type(a2) # numpy.ndarray
Slightly off-topic, but not as much as one would think because of eager mode which is now the default:
If you are using Tensorflow, you can do:
a = tf.ragged.constant([[0, 1, 2, 3]])
b = tf.ragged.constant([[2, 3, 4]])
c = tf.concat([a, b], axis=0)
And you can then do all the mathematical operations still, like tf.math.reduce_mean, etc.
np.array([[0,1,2,3], [2,3,4]], dtype=object) returns an "array" of lists.
a = np.array([np.array([0,1,2,3]), np.array([2,3,4])], dtype=object) returns an array of arrays. It allows already for operations such as a+1.
Building up on this, the functionality can be enhanced by subclassing.
import numpy as np
class Arrays(np.ndarray):
def __new__(cls, input_array, dims=None):
obj = np.array(list(map(np.array, input_array))).view(cls)
return obj
def __getitem__(self, ij):
if isinstance(ij, tuple) and len(ij) > 1:
# handle twodimensional slicing
if isinstance(ij[0],slice) or hasattr(ij[0], '__iter__'):
# [1:4,:] or [[1,2,3],[1,2]]
return Arrays(arr[ij[1]] for arr in self[ij[0]])
return self[ij[0]][ij[1]] # [1,:] np.array
return super(Arrays, self).__getitem__(ij)
def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
axis = kwargs.pop('axis', None)
dimk = [len(arg) if hasattr(arg, '__iter__') else 1 for arg in inputs]
dim = max(dimk)
pad_inputs = [([i]*dim if (d<dim) else i) for d,i in zip(dimk, inputs)]
result = [np.ndarray.__array_ufunc__(self, ufunc, method, *x, **kwargs) for x in zip(*pad_inputs)]
if method == 'reduce':
# handle sum, min, max, etc.
if axis == 1:
return np.array(result)
else:
# repeat over remaining axis
return np.ndarray.__array_ufunc__(self, ufunc, method, result, **kwargs)
return Arrays(result)
Now this works:
a = Arrays([[0,1,2,3], [2,3,4]])
a[0:1,0:-1]
# Arrays([[0, 1, 2]])
np.sin(a)
# Arrays([array([0. , 0.84147098, 0.90929743, 0.14112001]),
# array([ 0.90929743, 0.14112001, -0.7568025 ])], dtype=object)
a + 2*a
# Arrays([array([0, 3, 6, 9]), array([ 6, 9, 12])], dtype=object)
To get nanfunctions working, this can be done
# patch for nanfunction that cannot handle the object-ndarrays along with second axis=-1
def nanpatch(func):
def wrapper(a, axis=None, **kwargs):
if isinstance(a, Arrays):
rowresult = [func(x, **kwargs) for x in a]
if axis == 1:
return np.array(rowresult)
else:
# repeat over remaining axis
return func(rowresult)
# otherwise keep the original version
return func(a, axis=axis, **kwargs)
return wrapper
np.nanmean = nanpatch(np.nanmean)
np.nansum = nanpatch(np.nansum)
np.nanmin = nanpatch(np.nanmin)
np.nanmax = nanpatch(np.nanmax)
np.nansum(a)
# 15
np.nansum(a, axis=1)
# array([6, 9])

Categories

Resources