My goal is to turn a row vector into a column vector and vice versa. The documentation for numpy.ndarray.transpose says:
For a 1-D array, this has no effect. (To change between column and row vectors, first cast the 1-D array into a matrix object.)
However, when I try this:
my_array = np.array([1,2,3])
my_array_T = np.transpose(np.matrix(my_array))
I do get the wanted result, albeit in matrix form (matrix([[1],[2],[3]])), but I also get this warning:
PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
my_array_T = np.transpose(np.matrix(my_array))
How can I properly transpose an ndarray then?
Transposing a 1D array gives back the same 1D array, unlike in Matlab, where 1D arrays don't exist and everything is at least 2D.
What you want is to reshape it:
my_array.reshape(-1, 1)
Or:
my_array.reshape(1, -1)
Depending on what kind of vector you want (column or row vector).
The -1 tells reshape to infer that dimension's size from the remaining number of elements, and the 1 creates the second required dimension.
If your array is my_array and you want to convert it to a column vector you can do:
my_array.reshape(-1, 1)
For a row vector you can use
my_array.reshape(1, -1)
Both of these can also be transposed and that would work as expected.
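A minimal sketch of the shapes involved, using the my_array from the question:
import numpy as np

my_array = np.array([1, 2, 3])    # shape (3,)
col = my_array.reshape(-1, 1)     # shape (3, 1), a column vector
row = my_array.reshape(1, -1)     # shape (1, 3), a row vector
print(col.shape, row.shape)       # (3, 1) (1, 3)
print(col.T.shape)                # (1, 3): transposing the 2-D result works as expected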
IIUC, use reshape
my_array.reshape(my_array.size, -1)
I think this is straightforward but I can't quite get it. I have a large 3d array and I want to reduce the 3rd dim by some factor and then sum the values to get to that reduced size. An example that works to get what I want is:
import numpy as np
arr=np.ones((10,10,16))
processed_data=np.zeros((arr.shape[0], arr.shape[1]), dtype='object')
factor=2
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        processed_data[i][j] = arr[i][j].reshape(int(arr.shape[2]/factor), -1).sum(axis=1)
So we take the last dimension, reshape it to an extra dimension and then sum along that dimension. In the example above the data is a 10x10x16 array of all 1s so with a factor=2 we get a 10x10x8 array out with the data all being 2s. I hope this illustrates what I am trying to achieve. If the factor would change to 4 we would get a 10x10x4 array out.
This method is not ideal as it involves creating a separate processed_data 'object' array, where I would rather keep it as a 3D array, just with a reduced third dimension. It also involves iterating over every element in the 2D array, which I don't think is necessary. And it's really slow.
Any help appreciated - I suspect it is a combination of reshaping and transposing but cannot get my head around it.
Thanks.
I think you can reshape on the whole data and sum:
arr.reshape(*arr.shape[:2], -1, 2).sum(axis=-1)
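As a quick sanity check, here is a sketch reproducing the example from the question (all-ones 10x10x16 array, factor of 2):
import numpy as np

arr = np.ones((10, 10, 16))
factor = 2

# Split the last axis into (16 // factor) groups of `factor` and sum each group.
reduced = arr.reshape(*arr.shape[:2], -1, factor).sum(axis=-1)
print(reduced.shape)    # (10, 10, 8)
print(reduced[0, 0])    # [2. 2. 2. 2. 2. 2. 2. 2.]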
I'm trying to do a grid search over a model I've trained. So, producing a mesh, then predicting that mesh with the model to find a maximum.
I'm producing the mesh with:
def generate_random_grid(n_scanning_parameters, n_points_each_dimension):
    points = np.linspace(1, 0, num=n_points_each_dimension, endpoint=False)
    x_points = [points for dimension in range(n_scanning_parameters)]
    mesh = np.array(np.meshgrid(*x_points))
    return mesh
As you can see, I don't know the dimensions in advance. So later when I want to index the mesh to predict different points, I don't know how to index.
E.g, if I have 4 dimensions and 10 points along each dimension, the mesh has the shape (4,10,10,10,10). And I need to access points like e.g. [:,0,0,0,0] or [:,1,2,3,4]. Which would give me a 1-D vector with 4 elements.
Now I can produce the last 4 indices using:
for index in np.ndindex(*mesh.shape[1:]):
but then indexing my mesh like mesh[:,index] doesn't result in a 1-D vector with 4 elements as I expect it to.
How can I index the mesh?
Since you're working with tuples, and numpy supports tuple indexing, let's start with that.
Effectively, you want to do your slicing like a[:, 0, 0, 0, 0]. But your index is a tuple, and you're attempting something like a[:, (0,0,0,0)] - this gives you four hyperplanes along the second dimension instead. Your indexing should be more like a[(:,0,0,0,0)] - but this gives a syntax error.
So the solution would be to use the slice built-in.
a[(slice(None),0,0,0,0)]
This would give you your one dimensional vector.
In terms of your code, you can simply add the tuples to make this work.
for index in np.ndindex(*mesh.shape[1:]):
    vector = mesh[(slice(None), ) + index]
An alternative approach would be to simply use a transposed array and reversed indices. The first dimension is at the end, removing the need for :.
for index in np.ndindex(*mesh.shape[1:]):
    vector = mesh.T[index[::-1]]
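A small sketch, using the (4,10,10,10,10) shape from the question, checking that both variants pick out the same 4-element vectors:
import numpy as np

mesh = np.arange(4 * 10**4).reshape(4, 10, 10, 10, 10)

for index in np.ndindex(*mesh.shape[1:]):
    v1 = mesh[(slice(None),) + index]   # tuple indexing with an explicit slice
    v2 = mesh.T[index[::-1]]            # transposed array with reversed indices
    assert v1.shape == (4,)
    assert np.array_equal(v1, v2)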
I'm still getting the hang of working with numpy and array-wise operations.
I'm looking for the way of getting the row-wise average of a list of 2D arrays.
E.g. I have a 4x3x25 array and I'm looking to get a 3x25 array of the row-wise averages.
If everything’s in one 3D array already, you can just do:
A.mean(axis=0)
…which will operate along the first dimension.
If it’s actually just a list of 2D arrays, you’ll have to convert it to a 3D array first. I would do:
A = np.dstack(list_of_arrays) # Combine the 2D arrays along a new 3rd dimension
A.mean(axis=2) # Calculate the means along that new dimension
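A minimal sketch with the 4x3x25 shapes from the question:
import numpy as np

list_of_arrays = [np.random.rand(3, 25) for _ in range(4)]

# If the data is already one 3D array of shape (4, 3, 25):
A = np.array(list_of_arrays)
print(A.mean(axis=0).shape)      # (3, 25)

# If it is still a list of 2D arrays, stack along a new third axis first:
B = np.dstack(list_of_arrays)    # shape (3, 25, 4)
print(B.mean(axis=2).shape)      # (3, 25)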
My understanding is that 1-D arrays in numpy can be interpreted as either a column-oriented vector or a row-oriented vector. For instance, a 1-D array with shape (8,) can be viewed as a 2-D array of shape (1,8) or shape (8,1) depending on context.
The problem I'm having is that the functions I write to manipulate arrays tend to generalize well in the 2-D case to handle both vectors and matrices, but not so well in the 1-D case.
As such, my functions end up doing something like this:
if arr.ndim == 1:
    # Do it this way
else:
    # Do it that way
Or even this:
# Reshape the 1-D array to a 2-D array
if arr.ndim == 1:
    arr = arr.reshape((1, arr.shape[0]))
# ... Do it the 2-D way ...
That is, I find I can generalize code to handle 2-D cases (r,1), (1,c), (r,c), but not the 1-D cases without branching or reshaping.
It gets even uglier when the function operates on multiple arrays as I would check and convert each argument.
So my question is: am I missing some better idiom? Is the pattern I've described above common to numpy code?
Also, as a related matter of API design principles, if the caller passes a 1-D array to some function that returns a new array, and the return value is also a vector, is it common practice to reshape a 2-D vector (r,1) or (1,c) back to a 1-D array or simply document that the function returns a 2-D array regardless?
Thanks
I think in general NumPy functions that require an array of shape (r,c) make no special allowance for 1-D arrays. Instead, they expect the user to either pass an array of shape (r,c) exactly, or for the user to pass a 1-D array that broadcasts up to shape (r,c).
If you pass such a function a 1-D array of shape (c,) it will broadcast to shape (1,c), since broadcasting adds new axes on the left. It can also broadcast to shape (r,c) for an arbitrary r (depending on what other array it is being combined with).
On the other hand, if you have a 1-D array, x, of shape (r,) and you need it to broadcast up to shape (r,c), then NumPy expects the user to pass an array of shape (r,1) since broadcasting will not add the new axes on the right for you.
To do that, the user must pass x[:,np.newaxis] instead of just x.
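A short sketch of the two broadcasting directions described above:
import numpy as np

A = np.zeros((4, 3))                   # shape (r, c) = (4, 3)

x = np.arange(3)                       # shape (c,)
print((A + x).shape)                   # (4, 3): x broadcasts to (1, 3), then (4, 3)

y = np.arange(4)                       # shape (r,)
# A + y raises a ValueError: shape (4,) does not broadcast against (4, 3).
print((A + y[:, np.newaxis]).shape)    # (4, 3): y[:, np.newaxis] has shape (4, 1)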
Regarding return values: I think it better to always return a 2-D array. If the user knows the output will be of shape (1,c), and wants a 1-D array, let her slice off the 1-D array x[0] herself.
By making the return value always the same shape, it will be easier to understand code that uses this function, since it is not always immediately apparent what the shape of the inputs are.
Also, broadcasting blurs the distinction between a 1-D array of shape (c,) and a 2-D array of shape (r,c). If your function returns a 1-D array when fed 1-D input, and a 2-D array when fed 2-D input, then your function makes the distinction strict instead of blurred. Stylistically, this reminds me of checking if isinstance(obj,type), which goes against the grain of duck-typing. Don't do it if you don't have to.
unutbu's explanation is good, but I disagree on the return dimension.
The function internal pattern depends on the type of function.
Reduce operations with an axis argument can often be written so that the number of dimensions doesn't matter.
Numpy also has atleast_2d (and atleast_1d) functions that are commonly used if you need an explicit 2d array. In statistics, I sometimes use a function like atleast_2d_cols, which reshapes 1d (r,) to 2d (r,1), for code that expects 2d arrays or where a 1d input should be interpreted as a column vector for the linear algebra. (Reshaping is cheap, so this is not a problem.)
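A sketch of what such an atleast_2d_cols helper could look like; the behaviour shown here is my reading of the description above, not code from any library:
import numpy as np

def atleast_2d_cols(x):
    # 1d (r,) becomes a column (r, 1); 2d (or higher) input is passed through.
    x = np.asarray(x)
    if x.ndim == 1:
        x = x[:, np.newaxis]
    return x

print(atleast_2d_cols([1, 2, 3]).shape)         # (3, 1)
print(atleast_2d_cols(np.ones((4, 2))).shape)   # (4, 2)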
In a third case, I might have different code paths if the lower dimensional case can be done cheaper or simpler than the higher dimensional case. (example: if 2d requires several dot products.)
return dimension
I think not following the numpy convention with the return dimension can be very confusing to users for general functions. (topic specific functions can be different.)
For example, reduce operations lose one dimension.
For many other functions the output dimension matches the input dimension. I think a 1d input should have a 1d output and not an extra redundant dimension. Except for functions in linalg, I don't remember any functions that would return a redundant extra dimension. (The scalar versus 1-element array case is not always consistent.)
Stylistically this reminds me of an isinstance check:
Try doing without it if you allow, for example, numpy matrices and masked arrays. You will get funny results that are not easy to debug. Although, for most numpy and scipy functions the user has to know whether the array type will work with them, since there are few isinstance checks and asarray might not always do the right thing.
As a user, I always know what kind of "array_like" I have, a list, tuple or which array subclass, especially when I use multiplication.
np.array(np.eye(3).tolist()*3)    # list repetition: a (9, 3) ndarray
np.matrix(range(3)) * np.eye(3)   # matrix product: matrix([[0., 1., 2.]])
np.arange(3) * np.eye(3)          # elementwise broadcasting: a (3, 3) ndarray
another example: What does this do?
>>> x = np.array(tuple(range(3)), [('',int)]*3)
>>> x
array((0, 1, 2),
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
>>> x * np.eye(3)
This question has already very good answers. Here I just want to add what I usually do (which somehow summarizes responses by others) when I want to write functions that accept a wide range of inputs while the operations I do on them require a 2d row or column vector.
If I know the input is always 1d (array or list):
a. if I need a row: x = np.asarray(x)[None,:]
b. if I need a column: x = np.asarray(x)[:,None]
If the input can be either 2d (array or list) with the right shape or 1d (which needs to be converted to 2d row/column):
a. if I need a row: x = np.atleast_2d(x)
b. if I need a column: x = np.atleast_2d(np.asarray(x).T).T or x = np.reshape(x, (len(x),-1)) (the latter seems faster)
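A short sketch walking through the cases above (x1 and x2 are just illustrative inputs):
import numpy as np

x1 = [1, 2, 3]                                   # 1d input
print(np.asarray(x1)[None, :].shape)             # (1, 3) -- case 1a, a row
print(np.asarray(x1)[:, None].shape)             # (3, 1) -- case 1b, a column

x2 = [[1, 2, 3], [4, 5, 6]]                      # 2d input with the right shape
print(np.atleast_2d(x2).shape)                   # (2, 3) -- case 2a, unchanged
print(np.atleast_2d(np.asarray(x1).T).T.shape)   # (3, 1) -- case 2b on 1d input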
This is a good use for decorators
def atmost_2d(func):
    def wrapr(x):
        return func(np.atleast_2d(x)).squeeze()
    return wrapr
For example, this function will pick out the last column of its input.
@atmost_2d
def g(x):
    return x[:,-1]
But it works for:
1d:
In [46]: b
Out[46]: array([0, 1, 2, 3, 4, 5])
In [47]: g(b)
Out[47]: array(5)
2d:
In [49]: A
Out[49]:
array([[0, 1],
[2, 3],
[4, 5]])
In [50]: g(A)
Out[50]: array([1, 3, 5])
0d:
In [51]: g(99)
Out[51]: array(99)
This answer builds on the previous two.