When the dimension is known, the task is trivial. Take a 2D array:
a = np.random.randint(10, size=(5, 2))
a[np.random.choice(a.shape[0]), :]
However, in my function, the dimension of the array is arbitrary. How to handle this situation?
Use the size of the first dimension to set the random range; e.g. to pick 10 random rows:
a[np.random.randint(0, a.shape[0], 10)]
Or, if you prefer, include an Ellipsis:
a[np.random.randint(0, a.shape[0], 10), ...]
A single indexing array selects along the first axis (rows) by default.
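The same pattern works whatever the number of dimensions, since a single index array only consumes the first axis. A quick sketch (the shapes here are just illustrative):

```python
import numpy as np

# An integer index array along axis 0 selects whole sub-arrays,
# regardless of how many trailing dimensions the array has.
a2 = np.random.randint(10, size=(5, 2))
a3 = np.random.randint(10, size=(5, 2, 4))

rows2 = a2[np.random.randint(0, a2.shape[0], 10)]  # shape (10, 2)
rows3 = a3[np.random.randint(0, a3.shape[0], 10)]  # shape (10, 2, 4)
```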
I have a (square) 2 dimensional numpy array where I would like to compare (subtract) all of the values within each row to each other but not to other rows so the output should be a 3D array.
matrix = np.array([[10,1,32],[32,4,15],[6,3,1]])
Output should be a 3x3x3 array which looks like:
output = [[[0,-9,22],[0,-28,-17],[0,-3,-5]], [[9,0,31],[28,0,11],[3,0,-2]], [[-22,-31,0],[17,-11,0],[5,2,0]]]
I.e. for output[0], for each of the 3 rows of matrix, subtract that row's zeroth element from every other, for output[1] subtract each row's first element etc.
This seems to me like a reduced version of numpy's ufunc.outer functionality, which should be possible with
tryouter = np.subtract.outer(matrix, matrix)
and then taking some clever slice and/or transposition.
Indeed, if you do this, one finds that output[i, j] = tryouter[j, :, j, i].
This looks like it should be solvable by using np.transpose to switch the 1 and 2 axes and then taking the arrays on the new 0,1 diagonal but I can't work out how to do this with numpy diagonal or any slicing method.
Is there a way to do this or is there a simpler approach to this whole problem built into numpy?
Thanks :)
You're close, you can do it with broadcasting:
out = matrix[None, :, :] - matrix.T[:, :, None]
Here .T is the same as np.transpose, and using None as an index introduces a new dummy dimension of size 1.
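Checking the broadcast expression against the sample matrix from the question:

```python
import numpy as np

matrix = np.array([[10, 1, 32], [32, 4, 15], [6, 3, 1]])

# out[k, r, c] = matrix[r, c] - matrix[r, k]:
# within each row r, subtract the row's k-th element from every element.
out = matrix[None, :, :] - matrix.T[:, :, None]
```

This reproduces the desired 3x3x3 output array exactly, with no Python-level loops.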
My goal is to turn a row vector into a column vector and vice versa. The documentation for numpy.ndarray.transpose says:
For a 1-D array, this has no effect. (To change between column and row vectors, first cast the 1-D array into a matrix object.)
However, when I try this:
my_array = np.array([1, 2, 3])
my_array_T = np.transpose(np.matrix(my_array))
I do get the wanted result, albeit in matrix form (matrix([[1], [2], [3]])), but I also get this warning:
PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
my_array_T = np.transpose(np.matrix(my_array))
How can I properly transpose an ndarray then?
A 1D array is its own transpose: unlike Matlab, where 1D arrays don't exist and everything is at least 2D, NumPy has genuinely one-dimensional arrays.
What you want is to reshape it:
my_array.reshape(-1, 1)
Or:
my_array.reshape(1, -1)
Depending on what kind of vector you want (column or row vector).
Here -1 tells reshape to infer that dimension's size from the total number of elements, and the 1 creates the second required dimension.
If your array is my_array and you want to convert it to a column vector you can do:
my_array.reshape(-1, 1)
For a row vector you can use
my_array.reshape(1, -1)
Both of these can also be transposed and that would work as expected.
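A minimal sketch of both reshapes:

```python
import numpy as np

my_array = np.array([1, 2, 3])        # shape (3,)

col = my_array.reshape(-1, 1)         # shape (3, 1): column vector
row = my_array.reshape(1, -1)         # shape (1, 3): row vector

# Once a second dimension exists, .T behaves as expected:
# col.T has shape (1, 3) and equals row.
```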
If I understand correctly, use reshape:
my_array.reshape(my_array.size, -1)
This is for a Machine Learning problem (in Python of course).
I have a 2-dimensional array; the rows are sets of points, and the values are indices into another, 1-dimensional array of values for those points.
data = [[1,3,2], [3,3,1], [5,1,2]]
# yes there are duplicates in the labels
labels = [2,8,9,8,8,9]
What I need is to create a 2D array that is the original data array, but where the values in it are now the value from labels that the index represented.
new_data = [[8,8,9], [8,8,8], [9,8,9]]
I can do this with for loops obviously. I'm asking here in case numpy or something has a call that does this.
Use the indices as indices:
np.array(labels)[np.array(data)]
The output of an advanced (integer) index has the shape of the index array (here, data).
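With the question's data:

```python
import numpy as np

data = [[1, 3, 2], [3, 3, 1], [5, 1, 2]]
labels = [2, 8, 9, 8, 8, 9]

# Each entry of data is replaced by labels[entry];
# the result has the same shape as data.
new_data = np.array(labels)[np.array(data)]
```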
I'm trying to do a grid search over a model I've trained. So, producing a mesh, then predicting that mesh with the model to find a maximum.
I'm producing the mesh with:
def generate_random_grid(n_scanning_parameters, n_points_each_dimension):
    points = np.linspace(1, 0, num=n_points_each_dimension, endpoint=False)
    x_points = [points for dimension in range(n_scanning_parameters)]
    mesh = np.array(np.meshgrid(*x_points))
    return mesh
As you can see, I don't know the dimensions in advance. So later when I want to index the mesh to predict different points, I don't know how to index.
E.g, if I have 4 dimensions and 10 points along each dimension, the mesh has the shape (4,10,10,10,10). And I need to access points like e.g. [:,0,0,0,0] or [:,1,2,3,4]. Which would give me a 1-D vector with 4 elements.
Now I can produce the 4 last indices using
for index in np.ndindex(*mesh.shape[1:]):
but then indexing my mesh like mesh[:, index] doesn't result in a 1-D vector with 4 elements as I expect it to.
How can I index the mesh?
Since you're working with tuples, and numpy supports tuple indexing, let's start with that.
Effectively, you want to do your slicing like a[:, 0, 0, 0, 0]. But your index is a tuple, and you're attempting something like a[:, (0,0,0,0)] - this gives you four hyperplanes along the second dimension instead. Your indexing should be more like a[(:,0,0,0,0)] - but this gives a syntax error.
So the solution would be to use the slice built-in.
a[(slice(None),0,0,0,0)]
This would give you your one dimensional vector.
In terms of your code, you can simply add the tuples to make this work.
for index in np.ndindex(*mesh.shape[1:]):
    vector = mesh[(slice(None),) + index]
An alternative approach would be to simply use a transposed array and reversed indices. The first dimension is at the end, removing the need for :.
for index in np.ndindex(*mesh.shape[1:]):
    vector = mesh.T[index[::-1]]
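Both approaches can be checked on a smaller mesh (3 parameters, 4 points each; the sizes are chosen here just for illustration):

```python
import numpy as np

points = np.linspace(1, 0, num=4, endpoint=False)
mesh = np.array(np.meshgrid(*[points] * 3))   # shape (3, 4, 4, 4)

for index in np.ndindex(*mesh.shape[1:]):
    v1 = mesh[(slice(None),) + index]   # ':' spelled as slice(None)
    v2 = mesh.T[index[::-1]]            # same vector via the transpose
# each v1 / v2 is a 1-D vector with 3 elements
```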
I wanted to define my own addition operator that takes an Nx1 vector (call it A) and a 1xN vector (B) such that the element in the i^th row and j^th column is the sum of the i^th element in A and the j^th element in B. An example is illustrated here.
I was able to write the following code for the function (and it is correct as far as I know).
def test_fn(a, b):
    a_len = a.shape[0]
    b_len = b.shape[1]
    prod = np.array([[0] * b_len] * a_len)  # shape (a_len, b_len)
    for i in range(a_len):
        for j in range(b_len):
            prod[i, j] = a[i, 0] + b[0, j]
    return prod
However, the vectors I am working with contain thousands of elements, and the function above is quite slow. I was wondering if there was a better way to approach this problem, or if there was a numpy function that could be of use. Any help would be appreciated.
According to numpy's broadcasting rules, you can simply use a + b to implement your operator.
The first rule of broadcasting is that if all input arrays do not have the same number of dimensions, a “1” will be repeatedly prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions.
The second rule of broadcasting ensures that arrays with a size of 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along that dimension for the “broadcast” array.
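Applied to the question's shapes, broadcasting an (N, 1) array against a (1, N) array yields the full NxN sum table; a small sketch with made-up values:

```python
import numpy as np

a = np.array([[1], [2], [3]])     # shape (3, 1), the Nx1 column vector
b = np.array([[10, 20, 30]])      # shape (1, 3), the 1xN row vector

# Both size-1 axes are stretched, so (a + b)[i, j] == a[i, 0] + b[0, j]
prod = a + b                      # shape (3, 3)
```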