When I write a function that accepts ndarray or scalar inputs:
def foo(a):
    # does something to `a`
    #
    # a: `x`-dimensional array or scalar
    # . . .
    cast(a, x)
    # deal with `a` as if it is an `x`-d array after this
Is there an efficient way to write that cast function? Basically what I'd want is a function that would cast:
a, a scalar to an ndarray with shape ((1,) * x)
b, an ndarray with y < x dims explicitly to shape ((1,) * (x - y) + b.shape) (same as broadcasting)
c, an ndarray with x dims is unaffected
d, an ndarray with y > x dims throws an error
and do it all in place (at least when starting with an array), to avoid doubling memory use.
It seems like this functionality is repeated so often in built-in functions that there should be some shortcut for it, but I'm not finding it.
I can do a_ = np.array(a, ndmin=x, copy=False) and then assert len(a_.shape) == x, but that still makes a copy of arrays (i.e. a_.base is a is False). Is there any way around this?
asarray returns the array itself (if starting with an array):
In [271]: x=np.arange(10)
In [272]: y = np.asarray(x)
In [273]: id(x)
Out[273]: 2812424128
In [274]: id(y)
Out[274]: 2812424128 # same id
ndmin produces a view:
In [276]: y = np.array(x, ndmin=2, copy=False)
In [277]: y
Out[277]: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [278]: id(x)
Out[278]: 2812424128
In [279]: id(y)
Out[279]: 2811135704 # different id
In [281]: x.__array_interface__['data']
Out[281]: (188551320, False)
In [282]: y.__array_interface__['data'] # same databuffer
Out[282]: (188551320, False)
ndmin on an array that already has the right number of dims:
In [286]: x = np.arange(9).reshape(3,3)
In [287]: y = np.array(x, ndmin=2, copy=False)
In [288]: id(x)
Out[288]: 2810813120
In [289]: id(y)
Out[289]: 2810813120 # same id
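Putting that together, here's a minimal sketch of the requested cast helper (the name cast and the error-on-too-many-dims behavior come from the question; prepending length-1 axes with reshape always returns a view, so nothing is copied):
import numpy as np

def cast(a, x):
    # asarray returns `a` itself when it is already an ndarray (no copy)
    a_ = np.asarray(a)
    if a_.ndim > x:
        raise ValueError(f"expected at most {x} dims, got {a_.ndim}")
    # prepending length-1 axes is always possible as a view, so no copy here
    return a_.reshape((1,) * (x - a_.ndim) + a_.shape)

b = np.arange(10)
b2 = cast(b, 2)        # shape (1, 10)
print(b2.base is b)    # True - shares b's data buffer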
Similar discussion with astype: confused about the `copy` attribution of `numpy.astype`.
I have a numpy piecewise function defined as
def function(x):
    return np.piecewise(x, [x <= 1, x > 1], [lambda x: 1/2*np.sin((x-1)**2), lambda x: -1/2*np.sin((x-1)**2)])
I have no idea why this function is returning incorrect values for various x-values. In particular, running the following
X = np.array([0,2.1])
Y = np.array([0,2])
A = function(X)
B = function(Y)
will give A = array([ 0.42073549, -0.467808 ]), but B = array([0, 0]). Why is this happening?
I am expecting B = array([0.42073549, -0.468ish]).
Look at the types of your data.
X is an array of floats. But Y is an array of int.
And, quoting documentation of piecewise
The output is the same shape and type as x
So the output of piecewise, when called with Y (an array of shape (2,) and dtype int64), is forced to be an array of shape (2,) and dtype int64. And the closest int64 values to 0.42073549 and -0.468ish are 0 and 0.
Just replace Y with np.array([0, 2.0]) (to force float type), or np.array([0, 2], dtype=np.float64).
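As a quick check of the fix (same function, Y recreated with a float dtype):
import numpy as np

def function(x):
    return np.piecewise(x, [x <= 1, x > 1],
                        [lambda x: 1/2*np.sin((x-1)**2),
                         lambda x: -1/2*np.sin((x-1)**2)])

Y = np.array([0, 2.0])    # float dtype, so the output stays float
print(function(Y))        # [ 0.42073549 -0.42073549]
Note that the second value is -0.4207..., not -0.468...: the -0.468 figure comes from x = 2.1, not x = 2.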
I have to translate a Matlab script to Python. It transforms some complicated data into an array, and I don't know how to translate this part of the code:
accumarray([j2,i2],iq,[],[],NaN)
That is Matlab; the shapes of j2, i2 and iq are (1362730 x 1), but the shape of [j2, i2] would be (1362730 x 2).
I found this Python function that mimics accumarray:
import numpy as np
from itertools import product

def accum(accmap, a, func=None, size=None, fill_value=0, dtype=None):
    """
    An accumulation function similar to Matlab's `accumarray` function.

    Parameters
    ----------
    accmap : ndarray
        This is the "accumulation map". It maps input (i.e. indices into
        `a`) to their destination in the output array. The first `a.ndim`
        dimensions of `accmap` must be the same as `a.shape`. That is,
        `accmap.shape[:a.ndim]` must equal `a.shape`. For example, if `a`
        has shape (15,4), then `accmap.shape[:2]` must equal (15,4). In this
        case `accmap[i,j]` gives the index into the output array where
        element (i,j) of `a` is to be accumulated. If the output is, say,
        2D, then `accmap` must have shape (15,4,2). The values in the
        last dimension give indices into the output array. If the output is
        1D, then the shape of `accmap` can be either (15,4) or (15,4,1).
    a : ndarray
        The input data to be accumulated.
    func : callable or None
        The accumulation function. The function will be passed a list
        of values from `a` to be accumulated.
        If None, numpy.sum is assumed.
    size : ndarray or None
        The size of the output array. If None, the size will be determined
        from `accmap`.
    fill_value : scalar
        The default value for elements of the output array.
    dtype : numpy data type, or None
        The data type of the output array. If None, the data type of
        `a` is used.

    Returns
    -------
    out : ndarray
        The accumulated results.
        The shape of `out` is `size` if `size` is given. Otherwise the
        shape is determined by the (lexicographically) largest indices of
        the output found in `accmap`.

    Examples
    --------
    >>> from numpy import array, prod
    >>> a = array([[1,2,3],[4,-1,6],[-1,8,9]])
    >>> a
    array([[ 1,  2,  3],
           [ 4, -1,  6],
           [-1,  8,  9]])
    >>> # Sum the diagonals.
    >>> accmap = array([[0,1,2],[2,0,1],[1,2,0]])
    >>> accum(accmap, a)
    array([ 9,  7, 15])
    >>> # A 2D output, from sub-arrays with shapes and positions like this:
    >>> # [ (2,2) (2,1)]
    >>> # [ (1,2) (1,1)]
    >>> accmap = array([
    ...     [[0,0],[0,0],[0,1]],
    ...     [[0,0],[0,0],[0,1]],
    ...     [[1,0],[1,0],[1,1]],
    ... ])
    >>> # Accumulate using a product.
    >>> accum(accmap, a, func=prod, dtype=float)
    array([[ -8.,  18.],
           [ -8.,   9.]])
    >>> # Same accmap, but create an array of lists of values.
    >>> accum(accmap, a, func=lambda x: x, dtype='O')
    array([[[1, 2, 4, -1], [3, 6]],
           [[-1, 8], [9]]], dtype=object)
    """
    # Check for bad arguments and handle the defaults.
    if accmap.shape[:a.ndim] != a.shape:
        raise ValueError("The initial dimensions of accmap must be the same as a.shape")
    if func is None:
        func = np.sum
    if dtype is None:
        dtype = a.dtype
    if accmap.shape == a.shape:
        accmap = np.expand_dims(accmap, -1)
    adims = tuple(range(a.ndim))
    if size is None:
        size = 1 + np.squeeze(np.apply_over_axes(np.max, accmap, axes=adims))
    size = np.atleast_1d(size)

    # Create an array of python lists of values.
    vals = np.empty(size, dtype='O')
    for s in product(*[range(k) for k in size]):
        vals[s] = []
    for s in product(*[range(k) for k in a.shape]):
        indx = tuple(accmap[s])
        val = a[s]
        vals[indx].append(val)

    # Create the output array.
    out = np.empty(size, dtype=dtype)
    for s in product(*[range(k) for k in size]):
        if vals[s] == []:
            out[s] = fill_value
        else:
            out[s] = func(vals[s])

    return out
But it doesn't work when the shapes of accmap and a are different, which is the case here: my accmap would be [j2, i2] with shape (1362730 x 2) and a would be iq with shape (1362730 x 1). I don't quite understand what Matlab does when the inputs are of different sizes. Is there a way to modify the Python function to handle that, or another way to translate that line to Python?
I had a project in Matlab where I used accumarray(). I recently ported it to Python using numpy.histogramdd() as its closest replacement.
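For the accumarray([j2,i2],iq,[],[],NaN) line above, a rough sketch of that histogramdd route (the small j2/i2/iq stand-ins are hypothetical; MATLAB indices are 1-based, and NaN is accumarray's fill value for empty cells):
import numpy as np

# hypothetical stand-ins for j2, i2, iq (1-based, MATLAB-style indices)
j2 = np.array([1, 1, 2, 3])
i2 = np.array([1, 2, 2, 3])
iq = np.array([10.0, 20.0, 30.0, 40.0])

# one bin per integer index value
edges = [np.arange(1, j2.max() + 2), np.arange(1, i2.max() + 2)]
summed, _ = np.histogramdd((j2, i2), bins=edges, weights=iq)
counts, _ = np.histogramdd((j2, i2), bins=edges)
summed[counts == 0] = np.nan    # apply the fill value where nothing accumulated
Alternatively, the accum function posted above can be used as-is by flattening the data and stacking the index columns: a = iq.ravel() (shape (N,)) and accmap = np.column_stack([j2, i2]) - 1 (shape (N, 2)) satisfy accmap.shape[:a.ndim] == a.shape and produce a 2D output.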
I have a function, remrow, which takes as input an arbitrary numpy nd array, arr, and an integer, n. My function should remove the last row from arr in the nth dimension. For example, if I call my function like so:
remrow(arr,2)
with arr as a 3d array, then my function should return:
arr[:,:,:-1]
Similarly, if I call:
remrow(arr,1)
and arr is a 5d array, then my function should return:
arr[:,:-1,:,:,:]
My problem is this: my function must work for all shapes and sizes of arr and all compatible n. How can I do this with numpy array indexing?
Construct an indexing tuple, consisting of the desired combination of slice(None) and slice(None,-1) objects.
In [75]: arr = np.arange(24).reshape(2,3,4)
In [76]: idx = [slice(None) for _ in arr.shape]
In [77]: idx
Out[77]: [slice(None, None, None), slice(None, None, None), slice(None, None, None)]
In [78]: idx[1]=slice(None,-1)
In [79]: arr[tuple(idx)].shape
Out[79]: (2, 2, 4)
In [80]: idx = [slice(None) for _ in arr.shape]
In [81]: idx[2]=slice(None,-1)
In [82]: arr[tuple(idx)].shape
Out[82]: (2, 3, 3)
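Wrapped up as the requested function, a minimal sketch of remrow along these lines:
import numpy as np

def remrow(arr, n):
    # slice(None) on every axis, slice(None, -1) on axis n
    idx = [slice(None)] * arr.ndim
    idx[n] = slice(None, -1)
    return arr[tuple(idx)]

arr = np.arange(24).reshape(2, 3, 4)
print(remrow(arr, 1).shape)    # (2, 2, 4)
print(remrow(arr, 2).shape)    # (2, 3, 3)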
I have a 4d array, and I would like to apply a function to each 2d slice taken by iterating over the last two dimensions. That is, apply f to the 2d slice at (x,y,0,0), then to the slice at (x,y,0,1), etc. My function operates on the array in place, so the dimensions would stay the same, but a general solution would return an array of shape (x',y',w,z), where w and z are the last two dimensions of the original array.
This could obviously be generalized to mD slices over an nD array.
Is there any built-in functionality that does this thing?
The 'basic' apply-along-axis model is to iterate on one axis, and pass the other to your function:
In [197]: def foo(x):  # return same size
     ...:     return x*2
     ...: np.array([foo(x) for x in np.arange(12).reshape(3,4)])
     ...:
Out[197]:
array([[ 0, 2, 4, 6],
[ 8, 10, 12, 14],
[16, 18, 20, 22]])
In [198]: def foo(x):
     ...:     return x.sum()  # return one less dim
     ...: np.array([foo(x) for x in np.arange(12).reshape(3,4)])
     ...:
Out[198]: array([ 6, 22, 38])
In [199]: def foo(x):
     ...:     return x.sum(keepdims=True)  # condense the dim
     ...: np.array([foo(x) for x in np.arange(12).reshape(3,4)])
     ...:
Out[199]:
array([[ 6],
[22],
[38]])
Your 4d problem can be massaged to fit this.
In [200]: arr_4d = np.arange(24).reshape(2,3,2,2)
In [201]: arr_2d = arr_4d.reshape(6,4).T
In [202]: res = np.array([foo(x) for x in arr_2d])
In [203]: res
Out[203]:
array([[60],
[66],
[72],
[78]])
In [204]: res.reshape(2,2)
Out[204]:
array([[60, 66],
[72, 78]])
which is the equivalent of doing:
In [205]: arr_4d[:,:,0,0].sum()
Out[205]: 60
In [206]: foo(arr_4d[:,:,0,0].ravel())
Out[206]: array([60])
apply_along_axis requires a function that takes a 1d array, but can be applied thus:
In [209]: np.apply_along_axis(foo,0,arr_4d.reshape(6,2,2))
Out[209]:
array([[[60, 66],
[72, 78]]])
foo could reshape its input to 2d and pass it to a function that takes 2d. apply_along_axis uses np.ndindex to generate the indices for the iteration axes.
In [212]: list(np.ndindex(2,2))
Out[212]: [(0, 0), (0, 1), (1, 0), (1, 1)]
np.vectorize normally works with a function that takes a scalar. But recent versions have a signature parameter, which I believe could be used to work with your case. It may require transposing the input so it iterates on the first two axes, passing the last two to function. See my answer at https://stackoverflow.com/a/46004266/901925.
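For what it's worth, a sketch of that signature route (assuming a function that maps a 2d (m,n) slice to (m,1); vectorize loops over the leading axes, so the slice axes are moved to the end first):
import numpy as np

def foo2(x):
    return x.sum(axis=1, keepdims=True)    # 2d in, 2d out

vf = np.vectorize(foo2, signature='(m,n)->(m,p)')

arr_4d = np.arange(24).reshape(2, 3, 2, 2)
moved = np.moveaxis(arr_4d, (0, 1), (-2, -1))    # shape (2, 2, 2, 3)
res = vf(moved)                                  # shape (2, 2, 2, 1)
res = np.moveaxis(res, (-2, -1), (0, 1))         # shape (2, 1, 2, 2)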
None of these approaches offers a speed advantage.
Without reshaping or swapping, I can iterate with the help of ndindex.
Define a function that expects a 2d input:
def foo2(x):
    return x.sum(axis=1, keepdims=True)    # 2d
An index iterator for the last 2 dims of arr_4d:
In [260]: idx = np.ndindex(arr_4d.shape[-2:])
Do a test calculation to determine the shape of the result; vectorize and apply_along_axis do this sort of test.
In [261]: r1 = foo2(arr_4d[:,:,0,0]).shape
In [262]: r1
Out[262]: (2, 1)
The result array:
In [263]: res = np.zeros(r1+arr_4d.shape[-2:])
In [264]: res.shape
Out[264]: (2, 1, 2, 2)
Now iterate:
In [265]: for i,j in idx:
     ...:     res[...,i,j] = foo2(arr_4d[...,i,j])
     ...:
In [266]: res
Out[266]:
array([[[[ 12., 15.],
[ 18., 21.]]],
[[[ 48., 51.],
[ 54., 57.]]]])
I guess you're looking for something like numpy.apply_over_axes, coupled with a for loop to iterate over the varying axes.
I rolled my own. I'd be interested to know if there are any performance differences between this and @hpaulj's method, and whether there is reason to believe that writing a custom C module would offer a significant improvement. Of course @hpaulj's method is more general, since this is specific to my needing to just perform an operation on the array in place.
import itertools

def apply_along_space(f, np_array, axes):
    # apply the function f on each subspace given by iterating over the
    # axes listed in axes, e.g. axes=(0,2)
    for slic in itertools.product(*map(
            lambda ax: range(np_array.shape[ax]) if ax in axes else [slice(None, None, None)],
            range(len(np_array.shape)))):
        f(np_array[slic])
    return np_array
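A quick usage sketch (in-place doubling of each 2d slice, iterating over the last two axes as in the question):
import numpy as np

def double_inplace(view):
    view *= 2    # mutates the original array through the view

arr_4d = np.arange(24, dtype=float).reshape(2, 3, 2, 2)
apply_along_space(double_inplace, arr_4d, axes=(2, 3))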
I have a buffer, dtype, shape and strides. I want to create a Numpy ndarray which reuses the memory of the buffer.
There is numpy.frombuffer which creates a 1D array from a buffer and reuses the memory. However, I'm not sure if I can easily and safely reshape it and set the strides.
There is the numpy.ndarray constructor which can refer to a buffer but I'm not sure if it will reuse the memory or if it will copy it (it's not clear from the documentation).
So, will the numpy.ndarray constructor do what I want? Or what can I use instead?
OK, so I'm now trying to figure out for myself what the numpy.ndarray constructor is really doing. It uses PyArray_BufferConverter to convert the buffer argument, and then calls PyArray_NewFromDescr_int. If data is passed in there, it does fa->flags &= ~NPY_ARRAY_OWNDATA;, i.e. the new array does not own (and therefore does not copy) the buffer's memory.
As mentioned in the comment by @hpaulj, you can accomplish this using the stride_tricks module. You need both np.frombuffer and np.lib.stride_tricks.as_strided:
Gather data from NumPy array
In [1]: import numpy as np
In [2]: x = np.random.random((3, 4)).astype(dtype='f4')
In [3]: buffer = x.data
In [4]: dtype = x.dtype
In [5]: shape = x.shape
In [6]: strides = x.strides
Recreate NumPy array
In [7]: xx = np.frombuffer(buffer, dtype)
In [8]: xx = np.lib.stride_tricks.as_strided(xx, shape, strides)
Verify results
In [9]: x
Out[9]:
array([[ 0.75343359, 0.20676662, 0.83675659, 0.99904215],
[ 0.37182721, 0.83846378, 0.6888299 , 0.57195812],
[ 0.39905572, 0.7258808 , 0.88316005, 0.2187883 ]], dtype=float32)
In [10]: xx
Out[10]:
array([[ 0.75343359, 0.20676662, 0.83675659, 0.99904215],
[ 0.37182721, 0.83846378, 0.6888299 , 0.57195812],
[ 0.39905572, 0.7258808 , 0.88316005, 0.2187883 ]], dtype=float32)
In [11]: x.strides
Out[11]: (16, 4)
In [12]: xx.strides
Out[12]: (16, 4)
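As an extra sanity check, np.shares_memory (a standard NumPy helper, available since 1.11) confirms the two arrays use the same buffer:
In [13]: np.shares_memory(x, xx)
Out[13]: True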
I'd stick with frombuffer because it's intended directly for this purpose, and makes it clear what you're doing. Here's an example:
In [58]: s0 = 'aaaa' # a single int32
In [59]: s1 = 'aaabaaacaaadaaae' # 4 int32s, each increasing by 1
In [60]: a0 = np.frombuffer(s0, dtype='>i4', count=1) # dtype sets the stride
In [61]: print a0
[1633771873]
In [62]: a1 = np.frombuffer(s1, dtype='>i4', count=4)
In [63]: print a1
[1633771874 1633771875 1633771876 1633771877]
In [64]: a2 = a1.reshape((2,2)) # do a reshape, which also sets the strides
In [65]: print a2
[[1633771874 1633771875]
[1633771876 1633771877]]
In [66]: a2 - a0 # do some calculation with the reshape
Out[66]:
array([[1, 2],
[3, 4]], dtype=int32)
Is there something you need that this doesn't do?
You could use either method - neither of them will generate a copy:
s = b'aaabaaacaaadaaae'
a1 = np.frombuffer(s, np.int32, 4).reshape(2, 2)
a2 = np.ndarray((2, 2), np.int32, buffer=s)
print(a1.flags.owndata, a1.base.tostring())
# (False, b'aaabaaacaaadaaae')
print(a2.flags.owndata, a2.base)
# (False, b'aaabaaacaaadaaae')
Note that neither array can be modified in place, since they are backed by read-only memory:
a1[:] = 0 # ValueError: assignment destination is read-only
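If writable access is needed, one known workaround (not specific to either answer above) is to hand frombuffer a mutable buffer such as a bytearray:
import numpy as np

s = bytearray(b'aaabaaacaaadaaae')    # mutable buffer
a = np.frombuffer(s, np.int32, 4).reshape(2, 2)
a[:] = 0    # fine: the underlying buffer is writable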