Create a Numpy matrix storing shuffled versions of an input ndarray - python

I have a 2d ndarray called weights of shape (npts, nweights). For every column of weights, I wish to randomly shuffle the rows. I want to repeat this process num_shuffles times, and store the collection of shufflings into a 3d ndarray called weights_matrix. Importantly, for each shuffling iteration, the shuffling indices of each column of weights should be the same.
Below appears an explicit naive double-for-loop implementation of this algorithm. Is it possible to avoid the python loops and generate weights_matrix in pure Numpy?
import numpy as np
npts, nweights = 5, 2
weights = np.random.rand(npts*nweights).reshape((npts, nweights))
num_shuffles = 3
weights_matrix = np.zeros((num_shuffles, npts, nweights))
for i in range(num_shuffles):
    indx = np.random.choice(np.arange(npts), npts, replace=False)
    for j in range(nweights):
        weights_matrix[i, :, j] = weights[indx, j]

You can start by filling your 3-D array with copies of the original weights, then perform a simple iteration over slices of that 3-D array, using numpy.random.shuffle to shuffle each 2-D slice in-place.
For every column of weights, I wish to randomly shuffle the rows...the shuffling indices of each column of weights should be the same
is just another way of saying "I want to randomly reorder the rows of a 2D array". numpy.random.shuffle is a numpy-array-capable version of random.shuffle: it will reorder the elements of a container in-place. And that's all you need, since the "elements" of a 2-D numpy array, in that sense, are its rows.
import numpy
weights = numpy.array( [ [ 1, 2, 3 ], [ 4, 5, 6], [ 7, 8, 9 ] ] )
weights_3d = weights[ numpy.newaxis, :, : ].repeat( 10, axis=0 )
for w in weights_3d:
    numpy.random.shuffle( w ) # in-place shuffle of the rows of each slice
print( weights_3d[0, :, :] )
print( weights_3d[1, :, :] )
print( weights_3d[2, :, :] )
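As a cross-check on the point that the "elements" of a 2-D array are its rows, here is a minimal sketch that builds the same kind of 3-D stack from shuffled copies rather than shuffling in place. The seeded generator `rng` and the value of `num_shuffles` are assumptions made for illustration; they are not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen only for reproducibility (assumption)
weights = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
num_shuffles = 10

# rng.permutation returns a shuffled copy along axis 0, so each row stays intact
# and the original `weights` array is left unchanged.
weights_3d = np.stack([rng.permutation(weights) for _ in range(num_shuffles)])

print(weights_3d.shape)  # (num_shuffles, 3, 3)
```

Every 2-D slice contains the same three rows as `weights`, only in a different order.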

Here's a vectorized solution with the idea being borrowed from this post -
weights[np.random.rand(num_shuffles,weights.shape[0]).argsort(1)]
Sample run -
In [28]: weights
Out[28]:
array([[ 0.22508764, 0.8527072 ],
[ 0.31504052, 0.73272155],
[ 0.73370203, 0.54889059],
[ 0.87470619, 0.12394942],
[ 0.20587307, 0.11385946]])
In [29]: num_shuffles = 3
In [30]: weights[np.random.rand(num_shuffles,weights.shape[0]).argsort(1)]
Out[30]:
array([[[ 0.87470619, 0.12394942],
[ 0.20587307, 0.11385946],
[ 0.22508764, 0.8527072 ],
[ 0.31504052, 0.73272155],
[ 0.73370203, 0.54889059]],
[[ 0.87470619, 0.12394942],
[ 0.22508764, 0.8527072 ],
[ 0.73370203, 0.54889059],
[ 0.20587307, 0.11385946],
[ 0.31504052, 0.73272155]],
[[ 0.73370203, 0.54889059],
[ 0.31504052, 0.73272155],
[ 0.22508764, 0.8527072 ],
[ 0.20587307, 0.11385946],
[ 0.87470619, 0.12394942]]])
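To see why the argsort trick satisfies the requirement that every column uses the same shuffling indices, the one-liner can be split into steps. This is only a sketch: the seeded `rng` and the intermediate name `idx` are assumptions added for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, an assumption for reproducibility
npts, nweights, num_shuffles = 5, 2, 3
weights = rng.random((npts, nweights))

# Sorting each row of random numbers yields one random permutation of
# 0..npts-1 per shuffle: idx has shape (num_shuffles, npts).
idx = rng.random((num_shuffles, npts)).argsort(axis=1)

# Integer-array indexing on the first axis picks whole rows, so both columns
# are reordered with the same indices: shape (num_shuffles, npts, nweights).
weights_matrix = weights[idx]

print(weights_matrix.shape)
```

Slice `i` of the result equals `weights[idx[i]]`, which is what the inner loop of the original double-for-loop version computes.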

Related

Broadcasting a function to a 3D array Python

I tried understanding numpy broadcasting with 3d arrays but I think the OP there is asking something slightly different.
I have a 3D numpy array like so -
IQ = np.array([
[[1,2],
[3,4]],
[[5,6],
[7,8]]
], dtype = 'float64')
The shape of this array is (2,2,2). I want to apply a function to each 1x2 array in this 3D matrix like so -
def func(IQ):
    I = IQ[0]
    Q = IQ[1]
    amp = np.power((np.power(I,2) + np.power(Q, 2)),1/2)
    phase = math.atan(Q/I)
    return [amp, phase]
As you can see, I want to apply my function to each 1x2 array and replace it with the return value of my function. The output is a 3D array with the same dimensions. Is there a way to broadcast this function to each 1x2 array in my original 3D array? Currently I am using loops which becomes very slow as the 3D array increases in dimensions.
Currently I am doing this -
#IQ is defined from above
for i in range(IQ.shape[0]):
    for j in range(IQ.shape[1]):
        I = IQ[i,j,0]
        Q = IQ[i,j,1]
        amp = np.power((np.power(I,2) + np.power(Q, 2)),1/2)
        phase = math.atan(Q/I)
        IQ[i,j,0] = amp
        IQ[i,j,1] = phase
And the returned 3D array is -
[[[ 2.23606798 1.10714872]
[ 5. 0.92729522]]
[[ 7.81024968 0.87605805]
[10.63014581 0.85196633]]]
One way is to slice the arrays to extract the I and Q values, perform the computations using normal broadcasting, and then stick the values back together:
>>> Is, Qs = IQ[...,0], IQ[...,1]
>>> np.stack(((Is**2 + Qs**2) ** 0.5, np.arctan2(Qs, Is)), axis=-1)
array([[[ 2.23606798, 1.10714872],
[ 5. , 0.92729522]],
[[ 7.81024968, 0.87605805],
[10.63014581, 0.85196633]]])
It can be done using arrays:
# sort of sum of squares along axis 2, ie (IQ[..., 0]**2 + IQ[..., 1]**2 + ...)**0.5
amp = np.sqrt(np.square(IQ).sum(axis=2))
amp
>>> array([[ 2.23606798, 5. ],
[ 7.81024968, 10.63014581]])
# and phase is arctan for each component in each matrix
phase = np.arctan2(IQ[..., 1], IQ[..., 0])
phase
>>> array([[1.10714872, 0.92729522],
[0.87605805, 0.85196633]])
# then combine the arrays to 3d
np.stack([amp, phase], axis=2)
>>> array([[[ 2.23606798, 1.10714872],
[ 5. , 0.92729522]],
[[ 7.81024968, 0.87605805],
[10.63014581, 0.85196633]]])
I = IQ[..., 0]
Q = IQ[..., 1]
amp = np.linalg.norm(IQ, axis= 2)
phase = np.arctan(Q/I)
IQ[..., 0] = amp
IQ[..., 1] = phase
IQ
>> [[[ 2.23606798, 1.10714872],
[ 5. , 0.92729522]],
[[ 7.81024968, 0.87605805],
[10.63014581, 0.85196633]]]
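The answers above mix `np.arctan(Q/I)` and `np.arctan2(Qs, Is)`. Both agree on the sample data because every I value is positive, but they differ when I is negative. A minimal sketch of the vectorized computation plus that comparison follows; the names `polar`, `phase_atan` and `phase_atan2` are assumptions for illustration.

```python
import numpy as np

IQ = np.array([[[1, 2], [3, 4]],
               [[5, 6], [7, 8]]], dtype="float64")

I, Q = IQ[..., 0], IQ[..., 1]            # each has shape (2, 2)
amp = np.hypot(I, Q)                     # element-wise sqrt(I**2 + Q**2)
phase = np.arctan2(Q, I)                 # quadrant-aware angle
polar = np.stack([amp, phase], axis=-1)  # back to shape (2, 2, 2)

# With I = -1 and Q = 1 the two formulas disagree:
phase_atan = np.arctan(1.0 / -1.0)       # about -pi/4
phase_atan2 = np.arctan2(1.0, -1.0)      # about 3*pi/4
```

If the data can contain negative or zero I values, `arctan2` is probably the safer choice, since it also avoids the division.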

How to get median of column positive elements in a tensor/matrix?

Specifically given a 2-D matrix, how to find median for every column's positive elements?
Mathematically speaking: return B, where B[i] = median({A[j, i] | A[j, i] > 0})
I know that the median can be computed by tf.contrib.distributions.percentile
tf.boolean_mask(A, tf.greater(A, 0)) outputs a 1-D list instead of a matrix.
tf.boolean_mask() indeed returns a 1-D tensor, as otherwise the resulting tensor with dimensions kept would be sparse (c.f. columns having a different number of positive elements).
As I do not know of any median function for sparse matrices, the only alternative coming to mind is to loop over the columns, e.g. using tf.map_fn():
import tensorflow as tf
A = tf.convert_to_tensor([[ 1, 0, 20, 5],
[-1, 1, 10, 0],
[-2, 1, -10, 2],
[ 0, 2, 20, 1]])
positive_median_fn = lambda x: tf.contrib.distributions.percentile(tf.boolean_mask(x, tf.greater(x, 0)), q=50)
A_t = tf.matrix_transpose(A) # tf.map_fn is applied along 1st dim, so we need to transpose A
res = tf.map_fn(fn=positive_median_fn, elems=A_t)
with tf.Session() as sess:
    print(sess.run(res))
# [ 1 1 20 2]
Note: this snippet doesn't cover the case when a column contains no positive elements. tf.contrib.distributions.percentile() would return an error if its input tensor is empty. A condition on the shape of tf.boolean_mask(x, tf.greater(x, 0)) could for instance be used (e.g. with tf.where())
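For checking the expected result outside TensorFlow, the same per-column computation can be sketched in NumPy. This is a NumPy analogue, not the TensorFlow API: it replaces non-positive entries with NaN and takes a NaN-ignoring median. The names `masked` and `res` are assumptions.

```python
import numpy as np

A = np.array([[ 1, 0,  20, 5],
              [-1, 1,  10, 0],
              [-2, 1, -10, 2],
              [ 0, 2,  20, 1]], dtype=float)

# Keep positive entries, turn everything else into NaN so it is ignored.
masked = np.where(A > 0, A, np.nan)

# Median of the remaining (positive) values in each column.
res = np.nanmedian(masked, axis=0)
print(res)
```

A column with no positive elements becomes all-NaN here, and `np.nanmedian` then returns NaN for it (with a RuntimeWarning) instead of raising an error.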
You could loop over the column slices and filter like this.
inputlist = [[5 , -10 ] ,
[10 , 3 ] ,
[15 , -5 ]]
x = tf.Variable(initial_value=inputlist)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(x.get_shape().as_list()[1]):  # loop over columns
    print(sess.run(tf.contrib.distributions.percentile(
        tf.gather(x[:, i], tf.where(tf.greater(x[:, i], 0))),
        50.0)))

Making a multidimensional list of vectors

I am quite new to Python so bear with me. I am writing a program to calculate some physical quantity, let's call it A. A is a function of several variables, let's call them x, y, z. So I have three nested loops to calculate A for the values of x, y, z that I am interested in.
for x in xs:
    for y in ys:
        for z in zs:
            A[x, y, z] = function_calculating_value(x,y,z)
Now, the problem is that A[x,y,z] is a two-element array containing both the mean value and the variance, so that A[x,y,z] = [mean, variance]. In other languages I am used to initializing A using a function similar to np.zeros(). How do I do that here? What is the easiest way to achieve what I want, and how do I access the mean and variance easily for a given (x,y,z)?
(the end goal is to be able to plot the mean with the variance as error bars, so if there is an even more elegant way of doing this, I appreciate that as well)
thanks in advance!
You can create and manipulate your multi-dimensional array with numpy
# Generate a random 4d array that has nx = 3, ny = 3, and nz = 3, with each 3D point having 2 values
mdarray = np.random.random( size = (3,3,3,2) )
# Display the full 4d array
mdarray
Out[66]:
array([[[[ 0.80091246, 0.28476668],
[ 0.94264747, 0.27247111],
[ 0.64503087, 0.13722768]],
[[ 0.21371798, 0.41006764],
[ 0.79783723, 0.02537987],
[ 0.80658387, 0.43464532]],
[[ 0.04566927, 0.74836831],
[ 0.8280196 , 0.90288647],
[ 0.59271082, 0.65910184]]],
[[[ 0.82533798, 0.29075978],
[ 0.76496127, 0.1308289 ],
[ 0.22767752, 0.01865939]],
[[ 0.76849458, 0.7934015 ],
[ 0.93313128, 0.88436557],
[ 0.06897508, 0.00307739]],
[[ 0.15975812, 0.00792386],
[ 0.40292818, 0.21209199],
[ 0.48805502, 0.71974702]]],
[[[ 0.66522525, 0.49797465],
[ 0.29369336, 0.68743839],
[ 0.46411967, 0.69547356]],
[[ 0.50339875, 0.66423777],
[ 0.80520751, 0.88115054],
[ 0.08296022, 0.69467829]],
[[ 0.76572574, 0.45332754],
[ 0.87982243, 0.15773385],
[ 0.5762041 , 0.91268172]]]])
# Both values for this specific sample at x = 0, y = 1 and z = 2
mdarray[0,1,2]
Out[67]: array([ 0.80658387, 0.43464532])
mdarray[0,1,2,0] # mean only at the same point
Out[68]: 0.8065838666297338
mdarray[0,1,2,1] # variance only at the same point
Out[69]: 0.43464532443865489
You can also get only the means or the variance values separately by slicing the array:
mean = mdarray[:,:,:,0]
variance = mdarray[:,:,:,1]
mean
Out[74]:
array([[[ 0.80091246, 0.94264747, 0.64503087],
[ 0.21371798, 0.79783723, 0.80658387],
[ 0.04566927, 0.8280196 , 0.59271082]],
[[ 0.82533798, 0.76496127, 0.22767752],
[ 0.76849458, 0.93313128, 0.06897508],
[ 0.15975812, 0.40292818, 0.48805502]],
[[ 0.66522525, 0.29369336, 0.46411967],
[ 0.50339875, 0.80520751, 0.08296022],
[ 0.76572574, 0.87982243, 0.5762041 ]]])
I'm still unsure how I would have preferred to plot this data, will think about this a bit and update this answer.
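To connect this back to the nested loops in the question, here is a minimal sketch of preallocating with `np.zeros` and filling by position. The body of `function_calculating_value` and the values in `xs`, `ys`, `zs` are hypothetical stand-ins. Note that the array is indexed by integer positions (`i, j, k`), not by the x, y, z values themselves.

```python
import numpy as np

def function_calculating_value(x, y, z):
    # Hypothetical stand-in: returns (mean, variance) of some made-up samples.
    samples = np.array([x + y + z, x * y * z, x - y - z], dtype=float)
    return samples.mean(), samples.var()

xs = [0.0, 1.0, 2.0]
ys = [10.0, 20.0]
zs = [0.5, 1.5, 2.5, 3.5]

# The last axis has length 2: index 0 holds the mean, index 1 the variance.
A = np.zeros((len(xs), len(ys), len(zs), 2))

for i, x in enumerate(xs):
    for j, y in enumerate(ys):
        for k, z in enumerate(zs):
            A[i, j, k] = function_calculating_value(x, y, z)

mean = A[..., 0]      # shape (len(xs), len(ys), len(zs))
variance = A[..., 1]  # same shape
```

For error bars, the standard deviation `np.sqrt(variance)` is usually what gets plotted, though that depends on what the variance represents here.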

Python: matrix-vector multiplication with broadcasting

I have a numpy 2x2 matrix defined as follows:
a = np.pi/2
g = np.asarray([[-np.sin(a), -np.cos(a)],
[ np.cos(a), -np.sin(a)]])
Now, I have numpy array of 2D points that I would like to transform using this matrix. So we can simulate a bunch (25) of 2D points as follows:
p = np.random.rand(25, 2)
How can I do this matrix-vector multiplication for all these 25 points with broadcasting rather than do a for loop?
At the moment, I can do something like:
for i in range(25):
    print(np.dot(g, p[i]))
This should give me another 2D array with the shape (25, 2).
Is there a more elegant way to do this without the for loop?
I think what you want is -
np.dot(p,g.T)
.T is to transpose an array
Example/Demo -
In [1]: import numpy as np
In [2]: a = np.pi/2
In [3]: g = np.asarray([[-np.sin(a), -np.cos(a)],
...: [ np.cos(a), -np.sin(a)]])
In [4]: p = np.random.rand(25, 2)
In [8]: for i in range(25):
...: print(np.dot(g, p[i]))
...:
[-0.56997282 -0.70151323]
[-0.65807814 -0.21773391]
[-0.533987 -0.53936287]
[-0.91982277 -0.01423868]
[-0.96648577 -0.42122831]
[-0.67169383 -0.94959473]
[-0.09013282 -0.57637376]
[-0.03937037 -0.94635173]
[ -2.59523258e-01 -4.04297667e-05]
[-0.77029438 -0.67325988]
[-0.24862373 -0.89806226]
[-0.91866799 -0.07927881]
[-0.83540497 -0.33473515]
[-0.38738641 -0.75406194]
[-0.07569734 -0.66859275]
[-0.72707983 -0.21314985]
[-0.67738699 -0.90763549]
[-0.96172981 -0.68684667]
[-0.40152064 -0.14629421]
[-0.46495457 -0.37456133]
[-0.97915149 -0.0470546 ]
[-0.76488223 -0.70756525]
[-0.21534494 -0.91354898]
[-0.25035908 -0.37841355]
[-0.17990176 -0.18436497]
In [10]: np.dot(p,g.T)
Out[10]:
array([[ -5.69972820e-01, -7.01513225e-01],
[ -6.58078138e-01, -2.17733909e-01],
[ -5.33987004e-01, -5.39362872e-01],
[ -9.19822767e-01, -1.42386768e-02],
[ -9.66485769e-01, -4.21228314e-01],
[ -6.71693832e-01, -9.49594730e-01],
[ -9.01328234e-02, -5.76373760e-01],
[ -3.93703749e-02, -9.46351732e-01],
[ -2.59523258e-01, -4.04297667e-05],
[ -7.70294378e-01, -6.73259882e-01],
[ -2.48623728e-01, -8.98062260e-01],
[ -9.18667987e-01, -7.92788080e-02],
[ -8.35404971e-01, -3.34735152e-01],
[ -3.87386412e-01, -7.54061939e-01],
[ -7.56973425e-02, -6.68592746e-01],
[ -7.27079833e-01, -2.13149846e-01],
[ -6.77386988e-01, -9.07635490e-01],
[ -9.61729810e-01, -6.86846673e-01],
[ -4.01520636e-01, -1.46294211e-01],
[ -4.64954574e-01, -3.74561327e-01],
[ -9.79151491e-01, -4.70545953e-02],
[ -7.64882230e-01, -7.07565246e-01],
[ -2.15344940e-01, -9.13548984e-01],
[ -2.50359076e-01, -3.78413552e-01],
[ -1.79901758e-01, -1.84364974e-01]])
Try:
np.dot(p, g.T)
which multiplies the points by the transpose of the rotation matrix.
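The reason `np.dot(p, g.T)` matches the loop is that row `i` of `p @ g.T` is `g @ p[i]`. A small sketch that checks this numerically follows; the seeded `rng` and the `np.einsum` variant are additions for illustration, not part of the answers above.

```python
import numpy as np

a = np.pi / 2
g = np.asarray([[-np.sin(a), -np.cos(a)],
                [ np.cos(a), -np.sin(a)]])

rng = np.random.default_rng(0)  # seeded generator, an assumption for reproducibility
p = rng.random((25, 2))

# Reference: one matrix-vector product per point.
looped = np.array([g @ p[i] for i in range(len(p))])

# Vectorized: each row of p is multiplied by g in a single call.
vectorized = p @ g.T

# The same contraction written out with explicit indices.
via_einsum = np.einsum("ij,nj->ni", g, p)

print(np.allclose(looped, vectorized), np.allclose(looped, via_einsum))
```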

NumPy: Execute function over each ndarray element

I have a three dimensional ndarray of 2D coordinates, for example:
[[[1704 1240]
[1745 1244]
[1972 1290]
[2129 1395]
[1989 1332]]
[[1712 1246]
[1750 1246]
[1964 1286]
[2138 1399]
[1989 1333]]
[[1721 1249]
[1756 1249]
[1955 1283]
[2145 1399]
[1990 1333]]]
The ultimate goal is to remove the point closest to a given point ([1989 1332]) from each "group" of 5 coordinates. My thought was to produce a similarly shaped array of distances, and then using argmin to determine the indices of the values to be removed. However, I am not certain how to go about applying a function, like one to calculate a distance to a given point, to every element in an ndarray, at least in a NumPythonic way.
List comprehensions are a very inefficient way to deal with numpy arrays. They're an especially poor choice for the distance calculation.
To find the difference between your data and a point, you'd just do data - point. You can then calculate the distance using np.hypot, or if you'd prefer, square it, sum it, and take the square root.
It's a bit easier if you make it an Nx2 array for the purposes of the calculation though.
Basically, you want something like this:
import numpy as np
data = np.array([[[1704, 1240],
[1745, 1244],
[1972, 1290],
[2129, 1395],
[1989, 1332]],
[[1712, 1246],
[1750, 1246],
[1964, 1286],
[2138, 1399],
[1989, 1333]],
[[1721, 1249],
[1756, 1249],
[1955, 1283],
[2145, 1399],
[1990, 1333]]])
point = [1989, 1332]
#-- Calculate distance ------------
# The reshape is to make it a single, Nx2 array to make calling `hypot` easier
dist = data.reshape((-1,2)) - point
dist = np.hypot(*dist.T)
# We can then reshape it back to AxBx1 array, similar to the original shape
dist = dist.reshape(data.shape[0], data.shape[1], 1)
print(dist)
This yields:
array([[[ 299.48121811],
[ 259.38388539],
[ 45.31004304],
[ 153.5219854 ],
[ 0. ]],
[[ 290.04310025],
[ 254.0019685 ],
[ 52.35456045],
[ 163.37074401],
[ 1. ]],
[[ 280.55837182],
[ 247.34186868],
[ 59.6405902 ],
[ 169.77926846],
[ 1.41421356]]])
Now, removing the closest element is a bit harder than simply getting the closest element.
With numpy, you can use boolean indexing to do this fairly easily.
However, you'll need to worry a bit about the alignment of your axes.
The key is to understand that numpy "broadcasts" by aligning array shapes starting from the last axis. In this case, we want to broadcast along the middle axis.
Also, -1 can be used as a placeholder for the size of an axis. Numpy will calculate the permissible size when -1 is put in as the size of an axis.
What we'd need to do would look a bit like this:
#-- Remove closest point ---------------------
mask = np.squeeze(dist) != dist.min(axis=1)
filtered = data[mask]
# Once again, let's reshape things back to the original shape...
filtered = filtered.reshape(data.shape[0], -1, data.shape[2])
You could make that a single line, I'm just breaking it down for readability. The key is that dist != something yields a boolean array which you can then use to index the original array.
So, putting it all together:
import numpy as np
data = np.array([[[1704, 1240],
[1745, 1244],
[1972, 1290],
[2129, 1395],
[1989, 1332]],
[[1712, 1246],
[1750, 1246],
[1964, 1286],
[2138, 1399],
[1989, 1333]],
[[1721, 1249],
[1756, 1249],
[1955, 1283],
[2145, 1399],
[1990, 1333]]])
point = [1989, 1332]
#-- Calculate distance ------------
# The reshape is to make it a single, Nx2 array to make calling `hypot` easier
dist = data.reshape((-1,2)) - point
dist = np.hypot(*dist.T)
# We can then reshape it back to AxBx1 array, similar to the original shape
dist = dist.reshape(data.shape[0], data.shape[1], 1)
#-- Remove closest point ---------------------
mask = np.squeeze(dist) != dist.min(axis=1)
filtered = data[mask]
# Once again, let's reshape things back to the original shape...
filtered = filtered.reshape(data.shape[0], -1, data.shape[2])
print(filtered)
Yields:
array([[[1704, 1240],
[1745, 1244],
[1972, 1290],
[2129, 1395]],
[[1712, 1246],
[1750, 1246],
[1964, 1286],
[2138, 1399]],
[[1721, 1249],
[1756, 1249],
[1955, 1283],
[2145, 1399]]])
On a side note, if more than one point is equally close, this won't work. Numpy arrays have to have the same number of elements along each dimension, so you'll need to re-do your grouping in that case.
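One way to sidestep the tie problem is to remove exactly one point per group by index rather than by comparing distances. The sketch below swaps in `argmin` for the `!=` comparison; it is an alternative technique, not the method from the answer above. `argmin` returns the first minimum, so ties are resolved by position.

```python
import numpy as np

data = np.array([[[1704, 1240], [1745, 1244], [1972, 1290], [2129, 1395], [1989, 1332]],
                 [[1712, 1246], [1750, 1246], [1964, 1286], [2138, 1399], [1989, 1333]],
                 [[1721, 1249], [1756, 1249], [1955, 1283], [2145, 1399], [1990, 1333]]])
point = np.array([1989, 1332])

# Distance of every coordinate pair to `point`: shape (3, 5).
dist = np.linalg.norm(data - point, axis=-1)

# Index of the closest point in each group: shape (3,).
closest = dist.argmin(axis=1)

# Boolean mask that drops exactly one entry per group.
mask = np.ones(dist.shape, dtype=bool)
mask[np.arange(data.shape[0]), closest] = False

filtered = data[mask].reshape(data.shape[0], -1, data.shape[2])
print(filtered.shape)
```

Because exactly one entry is dropped from each group, the final reshape always succeeds, even when two points in a group are equally close.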
If I understand your question correctly, I think you're looking for apply_along_axis. Using numpy's built-in broadcasting, we can simply subtract the point from the array:
>>> a - numpy.array([1989, 1332])
array([[[-285, -92],
[-244, -88],
[ -17, -42],
[ 140, 63],
[ 0, 0]],
[[-277, -86],
[-239, -86],
[ -25, -46],
[ 149, 67],
[ 0, 1]],
[[-268, -83],
[-233, -83],
[ -34, -49],
[ 156, 67],
[ 1, 1]]])
Then we can apply numpy.linalg.norm to it:
>>> dist = a - numpy.array([1989, 1332])
>>> normed = numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
>>> normed
array([[ 299.48121811, 259.38388539, 45.31004304,
153.5219854 , 0. ],
[ 290.04310025, 254.0019685 , 52.35456045,
163.37074401, 1. ],
[ 280.55837182, 247.34186868, 59.6405902 ,
169.77926846, 1.41421356]])
Finally, some boolean mask trickery, along with a couple of reshape calls:
>>> a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2))
array([[[1704, 1240],
[1745, 1244],
[1972, 1290],
[2129, 1395]],
[[1712, 1246],
[1750, 1246],
[1964, 1286],
[2138, 1399]],
[[1721, 1249],
[1756, 1249],
[1955, 1283],
[2145, 1399]]])
Joe Kington's answer is faster though. Oh well. I'll leave this for posterity.
def joes(data, point):
    dist = data.reshape((-1,2)) - point
    dist = np.hypot(*dist.T)
    dist = dist.reshape(data.shape[0], data.shape[1], 1)
    mask = np.squeeze(dist) != dist.min(axis=1)
    return data[mask].reshape((3, 4, 2))
def mine(a, point):
    dist = a - point
    normed = numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
    return a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2))
>>> %timeit mine(data, point)
1000 loops, best of 3: 586 us per loop
>>> %timeit joes(data, point)
10000 loops, best of 3: 48.9 us per loop
There are multiple ways to do this, but here is one using list comprehensions:
Distance function:
In [35]: from numpy.linalg import norm
In [36]: dist = lambda x,y:norm(x-y)
Input data:
In [39]: GivenMatrix = scipy.rand(3, 5, 2)
In [40]: GivenMatrix
Out[40]:
array([[[ 0.83798666, 0.90294439],
[ 0.8706959 , 0.88397176],
[ 0.91879085, 0.93512921],
[ 0.15989245, 0.57311869],
[ 0.82896003, 0.53589968]],
[[ 0.0207089 , 0.9521768 ],
[ 0.94523963, 0.31079109],
[ 0.41929482, 0.88559614],
[ 0.87885236, 0.45227422],
[ 0.58365369, 0.62095507]],
[[ 0.14757177, 0.86101539],
[ 0.58081214, 0.12632764],
[ 0.89958321, 0.73660852],
[ 0.3408943 , 0.45420989],
[ 0.42656333, 0.42770216]]])
In [41]: q = scipy.rand(2)
In [42]: q
Out[42]: array([ 0.03280889, 0.71057403])
Compute output distances:
In [44]: distances = [[dist(x, q) for x in SubMatrix]
for SubMatrix in GivenMatrix]
In [45]: distances
Out[45]:
[[0.82783910695733931,
0.85564093542511577,
0.91399620574915652,
0.18720096539588818,
0.81508758596405939],
[0.24190557184498068,
0.99617079746515047,
0.42426891258164884,
0.88459501973012633,
0.55808740166908177],
[0.18921712490174292,
0.80103146210692744,
0.86716521557255788,
0.40079819635686459,
0.48482888965287363]]
To rank the results for each submatrix:
In [46]: scipy.argsort(distances)
Out[46]:
array([[3, 4, 0, 1, 2],
[0, 2, 4, 3, 1],
[0, 3, 4, 1, 2]])
As for the deletion, I personally think that's easiest by converting GivenMatrix to a list, then using del:
>>> GivenList = GivenMatrix.tolist()
>>> del GivenList[1][2] # delete third row from the second 5-by-2 submatrix
