use `numpy.take` to randomly select 2d points - python

Problem Setup:
points - 2D numpy.array of length N.
centroids - 2D numpy.array that I get as output from the K-Means algorithm, of length k < N.
As a centroid initialization routine for an MLE algorithm, I want to assign each point in points a random centroid from centroids.
Required Output:
A numpy.array of shape (N, 2), of randomly chosen 2D points from centroids
My Efforts:
I've tried using numpy.take with numpy.random.choice as shown in Code 1, but it doesn't return the desired output.
Code 1:
import numpy as np
a = np.random.randint(1, 10, 10).reshape((5, 2))
idx = np.random.choice(5, 20)
np.take(a, idx)
Out: array([6, 2, 3, 3, 8, 2, 5, 2, 6, 3, 3, 8, 6, 6, 6, 6, 8, 2, 6, 5])
From the numpy.take documentation page I've learned that, when no axis is given, it selects items from the flattened array, which is not what I need.
I'd appreciate any ideas on how to accomplish this task. Thanks in advance for any help.
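A quick check of the behavior described in the question: without `axis`, `np.take` indexes the flattened array, while passing `axis=0` selects whole rows, which is what is needed here. The array sizes and the seed are arbitrary, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed for reproducibility
a = rng.integers(1, 10, size=(5, 2))      # 5 two-dimensional points
idx = rng.integers(0, len(a), size=20)    # 20 row indices in [0, 5)

flat = np.take(a, idx)          # axis=None: indexes a.ravel(), shape (20,)
rows = np.take(a, idx, axis=0)  # indexes along the first axis, shape (20, 2)

# The flattened version is just a.ravel()[idx]; the axis=0 version equals a[idx]
assert np.array_equal(flat, a.ravel()[idx])
assert np.array_equal(rows, a[idx])
```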

One way is to sample the indices, and then use them to index the first dimension of centroids:
idx = np.random.choice(np.arange(len(centroids)), size=len(points))
out = centroids[idx]
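With NumPy's newer Generator API, `choice` can also sample rows of a 2D array directly (it samples along axis 0 by default), so no separate index array is needed. The `points` and `centroids` arrays below are made-up stand-ins for the question's data.

```python
import numpy as np

rng = np.random.default_rng(42)
points = rng.normal(size=(100, 2))     # hypothetical N = 100 points
centroids = rng.normal(size=(4, 2))    # hypothetical k = 4 centroids

# Draw len(points) rows from centroids, with replacement (the default)
out = rng.choice(centroids, size=len(points))
```

The result has shape (N, 2), and every row is one of the centroids.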

Similar to @Quang Hoang's answer, but a bit more intuitive in my opinion:
a = np.random.randint(1, 10, 10).reshape((5, 2))
n_sampled_points = 20
a[np.random.randint(0, a.shape[0], n_sampled_points)]
Cheers.

Related

Reducing the dimensions of an array in a list in Python

I am reshaping the array in a list Test from (1, 3, 3) to (3, 3). How do I reshape it in a more general form, say for any numpy array from (1, n, n) to (n, n)?
import numpy as np
Test = [np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]])]
Test = Test[0].reshape(3, 3)
The list is not relevant.
The simplest way to reshape to the smallest valid shape is squeeze:
Test = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]])
assert Test.shape == (1, 3, 3)
Test = Test.squeeze()
assert Test.shape == (3, 3)
By smallest valid shape, I mean one with all dimensions of length 1 removed. You can pass axis to remove only specific length-1 axes, but in practice I find the default behavior most useful. A super-useful feature of squeeze is that it's idempotent: you can keep "squeezing" an array as many times as you want.
Bonus: the same method exists in pandas as pd.DataFrame.squeeze, where it gives you a pd.Series from a single-column pd.DataFrame.
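A small sketch of the behaviors mentioned above, using an arbitrary (1, 3, 1, 3) array: the default removes every length-1 axis, axis restricts it to one axis, repeated calls change nothing, and naming an axis whose length is not 1 raises ValueError.

```python
import numpy as np

x = np.zeros((1, 3, 1, 3))

all_squeezed = x.squeeze()        # drops every length-1 axis -> (3, 3)
first_only = x.squeeze(axis=0)    # drops only axis 0 -> (3, 1, 3)
twice = x.squeeze().squeeze()     # idempotent: still (3, 3)

# Squeezing an axis whose length is not 1 is an error
try:
    x.squeeze(axis=1)
    raised = False
except ValueError:
    raised = True
```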

Python: Create a function that takes dimensions & scaling factor. Returns a two-dimensional array multiplication table scaled by the scaling factor

I am trying to create a function that does what the title asks for, without using any functions besides range, len, or append. The function should take the dimensions of the 2D array and the scaling factor, and return a two-dimensional multiplication table scaled by the scaling factor.
I have tried various different code but have left them out because they return 0 progress on test cases.
If you want the output as a 2d array, you can use this:
def MatrixTable(width, height, scaling_factor):
    # one row per h, one column per w
    return [[w*h*scaling_factor for w in range(1, width+1)] for h in range(1, height+1)]
MatrixTable(5, 3, 1)
Outputs:
[[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [3, 6, 9, 12, 15]]
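Since the question restricts the allowed functions to range, len, and append, here is a sketch of the same table built with explicit loops and append instead of list comprehensions. The name matrix_table is just an illustrative choice.

```python
def matrix_table(width, height, scaling_factor):
    """Return a height x width multiplication table, scaled by scaling_factor."""
    table = []
    for h in range(1, height + 1):
        row = []
        for w in range(1, width + 1):
            row.append(w * h * scaling_factor)
        table.append(row)
    return table


result = matrix_table(5, 3, 1)
```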

Fill numpy array with other numpy array

I have following numpy arrays:
whole = np.array(
[1, 0, 3, 0, 6]
)
sparse = np.array(
[9, 8]
)
Now I want to replace every zero in the whole array, in order, with the items of the sparse array. In the example, my desired array would look like:
merged = np.array(
[1, 9, 3, 8, 6]
)
I could write a small algorithm myself to do this, but if someone knows a time-efficient way to solve it I would be very grateful for your help!
Assuming sparse has exactly as many elements as there are zeros in whole, you can do:
import numpy as np
whole = np.array([1, 0, 3, 0, 6])
sparse = np.array([9, 8])
merge = whole.copy()  # work on a copy so whole stays unchanged
merge[whole == 0] = sparse  # the boolean mask selects the zeros left to right
If the lengths differ, you have to restrict both sides to a common length using len(...) and slicing.
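A sketch of the mismatched-length case, assuming that extra values in sparse should be ignored and that, if sparse is too short, the remaining zeros stay in place. np.flatnonzero gives the positions of the zeros, which are truncated to a common length. The name fill_zeros is hypothetical.

```python
import numpy as np

def fill_zeros(whole, sparse):
    """Replace zeros in whole, left to right, with values from sparse."""
    merged = whole.copy()
    zero_pos = np.flatnonzero(whole == 0)   # indices of the zeros, in order
    n = min(len(zero_pos), len(sparse))     # common length
    merged[zero_pos[:n]] = sparse[:n]
    return merged


longer = fill_zeros(np.array([1, 0, 3, 0, 6]), np.array([9, 8, 7]))
shorter = fill_zeros(np.array([1, 0, 3, 0, 6]), np.array([9]))
```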

Programmatically cropping an array along all its axes in Numpy

I want to (uniformly) reduce the dimensions of a numpy array (matrix) in each direction. The code below works.
array = np.array([3, 2323, 212, 2321, 54])
padding = 1
array[padding:-padding]
Output:
[2323, 212, 2321]
But I want this to be done another way. My array will be 50-dimensional and I want to apply the last line to each dimension of the array, but I don't want to write much code.
Maybe something like
array[padding: -padding for i in range(50)]
But it doesn't work.
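You can produce the relevant slices directly (recent numpy versions require a tuple of slices; indexing with a list of slices no longer works):
array[tuple(array.ndim * [slice(padding, -padding)])]
For instance,
In [31]: array = np.zeros((3, 4, 5, 6))
In [32]: array[tuple(array.ndim * [slice(1, -1)])].shape
Out[32]: (1, 2, 3, 4)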
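A small helper wrapping the same idea, with the padding as a parameter. One detail worth noting: slice(padding, -padding) is empty when padding is 0 (because -0 == 0), so this sketch computes each stop index from the axis length instead. The name crop is illustrative.

```python
import numpy as np

def crop(array, padding):
    """Remove `padding` elements from both ends of every axis."""
    # One slice per axis; stop = length - padding avoids the -0 pitfall
    index = tuple(slice(padding, dim - padding) for dim in array.shape)
    return array[index]


small = crop(np.zeros((3, 4, 5, 6)), 1)
vec = crop(np.array([3, 2323, 212, 2321, 54]), 1)
same = crop(np.zeros((3, 4)), 0)
```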

Need help converting Matlab's bsxfun to numpy

I'm trying to convert a piece of MATLAB code, and this is a line I'm struggling with:
f = 0
wlab = reshape(bsxfun(@times,cat(3,1-f,f/2,f/2),lab),[],3)
I've come up with
wlab = lab*(np.concatenate((3,1-f,f/2,f/2)))
How do I reshape it now?
I won't do it for your code, but as general knowledge:
bsxfun is a function that fills a gap in MATLAB that numpy doesn't need to fill: broadcasting.
Broadcasting means that when two arrays of different shapes are combined elementwise (multiplied, added, and so on), the smaller one is repeated along the missing or length-1 dimensions so that the shapes match.
So in numpy, if you have a 3D array A of shape (k, m, n) and you want to multiply every 2D slice A[i] by a 2D array B of shape (m, n), you don't need anything else: numpy broadcasts B for you, repeating it for each slice, so A*B suffices. (numpy aligns shapes from the trailing end, so to repeat B along the last axis instead, for A of shape (m, n, k), write A*B[:, :, None].) In MATLAB versions before R2016b, A.*B with such mismatched sizes raises a dimension-mismatch error. To overcome that, you'd use bsxfun(@times,A,B), which broadcasts (repeats) B over the 3rd dimension of A.
This means that converting bsxfun to numpy generally requires no extra function call; at most you insert new axes with None so the dimensions line up.
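A short illustration of the broadcasting point, with arbitrary small arrays: a (3, 4) array broadcasts directly against a (2, 3, 4) one, while repeating it along the last axis (the MATLAB layout that bsxfun handles over dimension 3) needs an explicit new axis.

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)   # two 3x4 slices along the first axis
B = np.arange(12).reshape(3, 4)

C = A * B   # B is broadcast over axis 0 of A
# Each slice A[i] was multiplied elementwise by B
assert all(np.array_equal(C[i], A[i] * B) for i in range(A.shape[0]))

# MATLAB-style layout: slices along the LAST axis
A3 = np.arange(24).reshape(3, 4, 2)
C3 = A3 * B[:, :, None]   # B gets a trailing length-1 axis: shape (3, 4, 1)
assert all(np.array_equal(C3[:, :, k], A3[:, :, k] * B) for k in range(A3.shape[2]))
```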
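MATLAB
reshape(x,[],3)
is the equivalent of numpy
np.reshape(x,(-1,3),order='F')
The [] and -1 are placeholders for 'fill in the correct size here'. MATLAB stores arrays in column-major order; with numpy's default order='C', each column still holds the same values, but the rows come out in a different order.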
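A small example of the ordering difference on an arbitrary (2, 2, 3) array: with order='F', column k of the result is the slice x[:, :, k] read column by column, which is what MATLAB's reshape(x,[],3) produces; numpy's default C order groups the same column values but lists the rows differently.

```python
import numpy as np

x = np.arange(12).reshape(2, 2, 3)

c_rows = np.reshape(x, (-1, 3))              # numpy default, row-major (C)
f_rows = np.reshape(x, (-1, 3), order='F')   # column-major, like MATLAB

# Column k of the F-order result is x[:, :, k] flattened column by column
for k in range(3):
    assert np.array_equal(f_rows[:, k], x[:, :, k].ravel(order='F'))
```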
===============
I just tried the MATLAB expression in Octave. It's on a different machine, so I'll just summarize the action.
For lab=1:6 (6 elements) the bsxfun produces a (1,6,3) matrix; the reshape turns it into (6,3), i.e. just removes the first dimension. The cat produces a (1,1,3) matrix.
np.reshape(np.array([1-f,f/2,f/2])[None,None,:]*lab[None,:,None],(-1,3))
For lab with shape (n,m), the bsxfun produces a (n,m,3) matrix; the reshape would make it (n*m,3)
So for a 2d lab, the numpy needs to be
np.array([1-f,f/2,f/2])[None,None,:]*lab[:,:,None]
(In MATLAB, lab will always be 2D (or larger), so this second case is closer to its action even if n is 1.)
=======================
np.array([1-f,f/2,f/2])*lab[...,None]
would handle any shaped lab
If I make the Octave lab (4,2,3), the bsxfun result is also (4,2,3).
The matching numpy expression would be
In [94]: (np.array([1-f,f/2,f/2])*lab).shape
Out[94]: (4, 2, 3)
numpy adds dimensions to the start of the (3,) array to match the dimensions of lab, effectively
(np.array([1-f,f/2,f/2])[None,None,:]*lab) # for 3d lab
If f=0, then the array is [1,0,0], so this keeps the first entry along the last dimension of lab and zeros the other two. In effect, it changes the 'color'.
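A sketch of the f=0 case on made-up data of shape (4, 2, 3): the weights broadcast over the last axis, the first channel passes through unchanged, and the other two become zero. The final reshape uses order='F' to mirror MATLAB's column-major reshape(...,[],3).

```python
import numpy as np

f = 0
weights = np.array([1 - f, f / 2, f / 2])              # [1.0, 0.0, 0.0] when f = 0
lab = np.arange(1, 25, dtype=float).reshape(4, 2, 3)   # made-up (4, 2, 3) data

wlab = weights * lab                    # broadcasts over the last axis
flat = wlab.reshape(-1, 3, order='F')   # (8, 3), mirroring MATLAB's reshape
```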
It is equivalent to
import numpy as np
wlab = np.kron([1-f,f/2,f/2],lab.reshape(-1,1))
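A quick check of the kron equivalence for a 2D lab, using an arbitrary f and made-up data: np.kron of the length-3 weights with an (N, 1) column gives an (N, 3) array whose row r is lab.ravel()[r] times the weights, the same as broadcasting and then reshaping.

```python
import numpy as np

f = 0.4                                          # arbitrary value for illustration
w = np.array([1 - f, f / 2, f / 2])
lab = np.arange(6, dtype=float).reshape(2, 3)    # made-up 2D lab

via_kron = np.kron(w, lab.reshape(-1, 1))              # shape (6, 3)
via_broadcast = (w * lab[..., None]).reshape(-1, 3)    # shape (6, 3)
```

Both versions flatten lab in numpy's default C order; to match MATLAB's row order exactly, pass order='F' to both reshapes.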
In Python, if you use numpy you do not need to broadcast explicitly; it is done automatically for you.
For instance, looking at the following code should make it clearer:
>>> import numpy as np
>>> a = np.array([[1, 2, 3], [3, 4, 5], [6, 7, 8], [9, 10, 100]])
>>> b = np.array([1, 2, 3])
>>>
>>> a
array([[ 1, 2, 3],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 100]])
>>> b
array([1, 2, 3])
>>>
>>> a - b
array([[ 0, 0, 0],
[ 2, 2, 2],
[ 5, 5, 5],
[ 8, 8, 97]])
>>>
