I am quite new to Python so bear with me. I am writing a program to calculate some physical quantity, let's call it A. A is a function of several variables, let's call them x, y, z. So I have three nested loops to calculate A for the values of x, y, z that I am interested in.
for x in xs:
    for y in ys:
        for z in zs:
            A[x, y, z] = function_calculating_value(x, y, z)
Now, the problem is that each A[x, y, z] needs to hold two values, the mean and the variance, so that A[x, y, z] = [mean, variance]. From other languages I am used to initializing A with a function similar to np.zeros(). How do I do that here? What is the easiest way to achieve what I want, and how do I access the mean and variance easily for a given (x, y, z)?
(The end goal is to be able to plot the mean with the variance as error bars, so if there is an even more elegant way of doing this, I would appreciate that as well.)
Thanks in advance!
You can create and manipulate your multi-dimensional array with NumPy:
import numpy as np

# Generate a random 4D array with nx = 3, ny = 3, and nz = 3, where each (x, y, z) point holds 2 values
mdarray = np.random.random(size=(3, 3, 3, 2))

# The full 4D array
mdarray
Out[66]:
array([[[[ 0.80091246, 0.28476668],
[ 0.94264747, 0.27247111],
[ 0.64503087, 0.13722768]],
[[ 0.21371798, 0.41006764],
[ 0.79783723, 0.02537987],
[ 0.80658387, 0.43464532]],
[[ 0.04566927, 0.74836831],
[ 0.8280196 , 0.90288647],
[ 0.59271082, 0.65910184]]],
[[[ 0.82533798, 0.29075978],
[ 0.76496127, 0.1308289 ],
[ 0.22767752, 0.01865939]],
[[ 0.76849458, 0.7934015 ],
[ 0.93313128, 0.88436557],
[ 0.06897508, 0.00307739]],
[[ 0.15975812, 0.00792386],
[ 0.40292818, 0.21209199],
[ 0.48805502, 0.71974702]]],
[[[ 0.66522525, 0.49797465],
[ 0.29369336, 0.68743839],
[ 0.46411967, 0.69547356]],
[[ 0.50339875, 0.66423777],
[ 0.80520751, 0.88115054],
[ 0.08296022, 0.69467829]],
[[ 0.76572574, 0.45332754],
[ 0.87982243, 0.15773385],
[ 0.5762041 , 0.91268172]]]])
# Both values for this specific sample at x = 0, y = 1 and z = 2
mdarray[0,1,2]
Out[67]: array([ 0.80658387, 0.43464532])
mdarray[0,1,2,0] # mean only at the same point
Out[68]: 0.8065838666297338
mdarray[0,1,2,1] # variance only at the same point
Out[69]: 0.43464532443865489
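To initialize A with zeros, as asked in the question, something like the following should work (a minimal sketch; function_calculating_value and the grids xs, ys, zs come from the question, and the grid values used here are hypothetical placeholders). Note that the loops index by position with enumerate, since array indices must be integers even when the grid values are not:

import numpy as np

# hypothetical parameter grids; replace with your own xs, ys, zs
xs = np.linspace(0.0, 1.0, 3)
ys = np.linspace(0.0, 1.0, 3)
zs = np.linspace(0.0, 1.0, 3)

# the last axis of length 2 holds [mean, variance]
A = np.zeros((len(xs), len(ys), len(zs), 2))

for i, x in enumerate(xs):
    for j, y in enumerate(ys):
        for k, z in enumerate(zs):
            A[i, j, k] = function_calculating_value(x, y, z)  # assumed to return (mean, variance)

means = A[:, :, :, 0]
variances = A[:, :, :, 1]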
You can also get only the means or the variance values separately by slicing the array:
mean = mdarray[:,:,:,0]
variance = mdarray[:,:,:,1]
mean
Out[74]:
array([[[ 0.80091246, 0.94264747, 0.64503087],
[ 0.21371798, 0.79783723, 0.80658387],
[ 0.04566927, 0.8280196 , 0.59271082]],
[[ 0.82533798, 0.76496127, 0.22767752],
[ 0.76849458, 0.93313128, 0.06897508],
[ 0.15975812, 0.40292818, 0.48805502]],
[[ 0.66522525, 0.29369336, 0.46411967],
[ 0.50339875, 0.80520751, 0.08296022],
[ 0.76572574, 0.87982243, 0.5762041 ]]])
I'm still unsure how I would prefer to plot this data; I will think about this a bit and update this answer.
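One possible way to produce the mean-with-error-bars plot mentioned in the question (a sketch, assuming matplotlib is available; the error bars are taken as the standard deviation, i.e. the square root of the variance, and xs is a hypothetical grid of x values):

import numpy as np
import matplotlib.pyplot as plt

xs = np.arange(mdarray.shape[0])        # hypothetical x values; replace with your own grid
j, k = 1, 2                             # fixed y and z indices
means = mdarray[:, j, k, 0]
errors = np.sqrt(mdarray[:, j, k, 1])   # standard deviation as error bar

plt.errorbar(xs, means, yerr=errors, fmt='o-')
plt.xlabel('x')
plt.ylabel('A')
plt.show()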
I have a numpy array containing the XYZ coordinates of the k nearest neighbors (k = 10) of the points from a point cloud:
k_neighboors
Out[53]:
array([[[ 2.51508147e-01, 5.60274944e-02, 1.98303187e+00],
[ 2.48552352e-01, 5.95569573e-02, 1.98319519e+00],
[ 2.56611764e-01, 5.36767729e-02, 1.98236740e+00],
...,
[ 2.54520357e-01, 6.23480231e-02, 1.98255634e+00],
[ 2.57603496e-01, 5.19787706e-02, 1.98221457e+00],
[ 2.43914440e-01, 5.68424985e-02, 1.98352253e+00]],
[[ 9.72352773e-02, 2.06699912e-02, 1.99344850e+00],
[ 9.91205871e-02, 2.36056261e-02, 1.99329960e+00],
[ 9.59625840e-02, 1.71508361e-02, 1.99356234e+00],
...,
[ 1.03216261e-01, 2.19752081e-02, 1.99304521e+00],
[ 9.65025574e-02, 1.44127617e-02, 1.99355054e+00],
[ 9.59930867e-02, 2.72080526e-02, 1.99344873e+00]],
[[ 1.76408485e-01, 2.81930678e-02, 1.98819435e+00],
[ 1.78670138e-01, 2.81904750e-02, 1.98804617e+00],
[ 1.80372953e-01, 3.05109434e-02, 1.98791444e+00],
...,
[ 1.81960404e-01, 2.47725621e-02, 1.98785996e+00],
[ 1.74499243e-01, 3.50728296e-02, 1.98826015e+00],
[ 1.83470801e-01, 2.70808022e-02, 1.98774099e+00]],
...,
[[ 1.78178743e-01, -4.60980982e-02, -1.98792374e+00],
[ 1.77953839e-01, -4.73701134e-02, -1.98792756e+00],
[ 1.77889392e-01, -4.75468598e-02, -1.98793030e+00],
...,
[ 1.79924294e-01, -5.08776568e-02, -1.98772371e+00],
[ 1.76720902e-01, -5.11409082e-02, -1.98791265e+00],
[ 1.83644593e-01, -4.64747548e-02, -1.98756230e+00]],
[[ 2.00245917e-01, -2.33091787e-03, -1.98685515e+00],
[ 2.02384919e-01, -5.60011715e-04, -1.98673022e+00],
[ 1.97325528e-01, -1.03301927e-03, -1.98705769e+00],
...,
[ 1.95464164e-01, -6.23105839e-03, -1.98713481e+00],
[ 1.98985338e-01, -8.39920342e-03, -1.98688531e+00],
[ 1.95959195e-01, 2.68006674e-03, -1.98713303e+00]],
[[ 1.28851235e-01, -3.24527062e-02, -1.99127460e+00],
[ 1.26415789e-01, -3.27731185e-02, -1.99143147e+00],
[ 1.25985757e-01, -3.24910432e-02, -1.99146211e+00],
...,
[ 1.28296465e-01, -3.92388329e-02, -1.99117136e+00],
[ 1.34895295e-01, -3.64872888e-02, -1.99083793e+00],
[ 1.29047096e-01, -3.97952795e-02, -1.99111152e+00]]])
With this shape:
k_neighboors.shape
Out[54]: (2999986, 10, 3)
And I have this function which applies a Principal Component Analysis to some data provided as a 2-dimensional array:
def PCA(data, correlation=False, sort=True):
    """Applies Principal Component Analysis to the data.

    Parameters
    ----------
    data : array
        The array containing the data. The array must have NxM dimensions, where each
        of the N rows represents a different individual record and each of the M columns
        represents a different variable recorded for that individual record.

        array([
            [V11, ... , V1m],
            ...,
            [Vn1, ... , Vnm]])

    correlation (optional) : bool
        Set the type of matrix to be computed (see Notes):
        If True, compute the correlation matrix.
        If False (default), compute the covariance matrix.

    sort (optional) : bool
        Set the order that the eigenvalues/vectors will have.
        If True (default), they will be sorted from highest value to lowest.
        If False, they won't.

    Returns
    -------
    eigenvalues : (1, M) array
        The eigenvalues of the corresponding matrix.
    eigenvectors : (M, M) array
        The eigenvectors of the corresponding matrix.

    Notes
    -----
    The correlation matrix is a better choice when the M variables have different
    magnitudes. Use the covariance matrix in any other case.
    """
    #: get the mean of all variables
    mean = np.mean(data, axis=0, dtype=np.float64)

    #: adjust the data by subtracting the mean from each variable
    data_adjust = data - mean

    #: compute the covariance/correlation matrix
    #: the data is transposed due to the np.cov/np.corrcoef syntax
    if correlation:
        matrix = np.corrcoef(data_adjust.T)
    else:
        matrix = np.cov(data_adjust.T)

    #: get the eigenvalues and eigenvectors
    eigenvalues, eigenvectors = np.linalg.eig(matrix)

    if sort:
        #: sort eigenvalues and eigenvectors from largest to smallest eigenvalue
        sort_idx = eigenvalues.argsort()[::-1]
        eigenvalues = eigenvalues[sort_idx]
        eigenvectors = eigenvectors[:, sort_idx]

    return eigenvalues, eigenvectors
So the question is: how can I apply the PCA function above to each of the 2999986 arrays of shape 10x3 in a way that doesn't take forever, unlike this one:
data = np.empty((2999986, 3))
for i in range(len(k_neighboors)):
    w, v = PCA(k_neighboors[i])
    data[i] = v[:, 2]
    break  #: break the loop so I don't have to wait forever
data
Out[64]:
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
...,
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]])
Thanks to Divakar's and Eelco's comments.
Using the function that Divakar posted in this answer:
def vectorized_app(data):
    diffs = data - data.mean(1, keepdims=True)
    return np.einsum('ijk,ijl->ikl', diffs, diffs) / data.shape[1]
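As a quick sanity check (a sketch on a hypothetical small batch): vectorized_app divides by N = data.shape[1], so each slice matches np.cov with bias=True; the 1/N versus 1/(N-1) factor only rescales the eigenvalues and does not change the eigenvectors:

sample = np.random.rand(5, 10, 3)  # hypothetical small batch of 5 neighborhoods
covs = vectorized_app(sample)
assert np.allclose(covs[0], np.cov(sample[0].T, bias=True))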
And using what Eelco pointed out in his comment, I ended up with this:
k_neighboors.shape
Out[48]: (2999986, 10, 3)
#: the (assumed) vectorized answer
data = np.linalg.eig(vectorized_app(k_neighboors))[1][:,:,2]
data
Out[50]:
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0.06462 , 0.00944352, 0.99786526],
[ 0.0654035 , 0.00860751, 0.99782177],
...,
[-0.0632175 , 0.01613551, 0.99786933],
[-0.06449399, 0.00552943, 0.99790278],
[-0.06081954, 0.01802078, 0.99798609]])
Which gives the same results as the for loop, without taking forever (although it still takes a while):
data2 = np.empty((2999986, 3))
for i in range(len(k_neighboors)):
    if i > 10:
        break  #: break the loop so I don't have to wait forever
    w, v = PCA(k_neighboors[i])
    data2[i] = v[:, 2]
data2
Out[52]:
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0.06462 , 0.00944352, 0.99786526],
[ 0.0654035 , 0.00860751, 0.99782177],
...,
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]])
I don't know if there could be a better way to do this, so I'm going to keep the question open.
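One further option (a sketch, not benchmarked here): since the covariance matrices produced by vectorized_app are symmetric, np.linalg.eigh also works on stacked matrices and returns the eigenvalues in ascending order, so the eigenvector belonging to the smallest eigenvalue (which is what v[:, 2] picks out after the descending sort in PCA) sits in column 0:

covs = vectorized_app(k_neighboors)
w, v = np.linalg.eigh(covs)  # batched; eigenvalues returned in ascending order
data3 = v[:, :, 0]           # eigenvector of the smallest eigenvalue per neighborhood
                             # (eigenvectors are defined only up to sign, so individual
                             #  rows may differ by a factor of -1 from the eig-based result)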
How can I compute an xi - xj matrix in numpy without loops (using API calls)?
Here's what to start with:
import numpy as np
x = np.random.rand(4)
xij = np.matrix([xi-xj for xj in x for xi in x]).reshape(4,4)
You can take advantage of broadcasting to subtract x as a column vector from x as a flat array and produce the matrix.
>>> x = np.random.rand(4)
Then:
>>> x - x[:,np.newaxis]
array([[ 0. , 0.89175647, 0.80930233, 0.37955823],
[-0.89175647, 0. , -0.08245415, -0.51219825],
[-0.80930233, 0.08245415, 0. , -0.4297441 ],
[-0.37955823, 0.51219825, 0.4297441 , 0. ]])
If you want a matrix object (and not the default array object) you could write:
np.matrix(x - x[:,np.newaxis])
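A quick check (a sketch) that the broadcast expression reproduces the original list-comprehension construction element for element:

import numpy as np

x = np.random.rand(4)
loops = np.array([xi - xj for xj in x for xi in x]).reshape(4, 4)
assert np.allclose(loops, x - x[:, np.newaxis])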
By reshaping the array into a column vector, you can use the minus operator to calculate what you want:
import numpy as np
x = np.random.rand(4)
x = x.reshape(-1,1)
xij = np.matrix(x.T - x)
Another alternative is to use np.subtract.outer:
In [35]: x = np.random.rand(4)
In [36]: np.matrix([xi-xj for xj in x for xi in x]).reshape(4,4)
Out[36]:
matrix([[ 0. , 0.45365177, 0.07227472, -0.05824887],
[-0.45365177, 0. , -0.38137705, -0.51190064],
[-0.07227472, 0.38137705, 0. , -0.13052359],
[ 0.05824887, 0.51190064, 0.13052359, 0. ]])
In [37]: -np.subtract.outer(x, x)
Out[37]:
array([[-0. , 0.45365177, 0.07227472, -0.05824887],
[-0.45365177, -0. , -0.38137705, -0.51190064],
[-0.07227472, 0.38137705, -0. , -0.13052359],
[ 0.05824887, 0.51190064, 0.13052359, -0. ]])
(Note that the result is a numpy array, not a matrix.)
In general we could have matrices of arbitrary sizes. For my application it is necessary to have a square matrix. Also, the dummy entries should have a specified value. I am wondering if there is anything built into numpy for this, or what the easiest way of doing it is.
EDIT:
The matrix X is already there and it is not square. We want to pad it with a given dummy value to make it square. All the original values will stay the same.
Thanks a lot.
Building upon the answer by LucasB, here is a function which will pad an arbitrary matrix M with a given value val so that it becomes square:
import numpy

def squarify(M, val):
    (a, b) = M.shape
    if a > b:
        padding = ((0, 0), (0, a - b))
    else:
        padding = ((0, b - a), (0, 0))
    return numpy.pad(M, padding, mode='constant', constant_values=val)
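For example (with a hypothetical 3x5 input):

M = numpy.random.rand(3, 5)
squarify(M, 42).shape  # -> (5, 5); two rows filled with 42 are appended at the bottom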
Since Numpy 1.7, there's the numpy.pad function. Here's an example:
>>> x = np.random.rand(2,3)
>>> np.pad(x, ((0,1), (0,0)), mode='constant', constant_values=42)
array([[ 0.20687158, 0.21241617, 0.91913572],
[ 0.35815412, 0.08503839, 0.51852029],
[ 42. , 42. , 42. ]])
For a 2D numpy array m, it's straightforward to do this by creating a max(m.shape) x max(m.shape) array p filled with the desired padding value, and then setting the slice of p corresponding to m (i.e. p[0:m.shape[0], 0:m.shape[1]]) equal to m.
This leads to the following function, where the first line deals with the possibility that the input has only one dimension (i.e. is an array rather than a matrix):
import numpy as np
def pad_to_square(a, pad_value=0):
    m = a.reshape((a.shape[0], -1))
    padded = pad_value * np.ones(2 * [max(m.shape)], dtype=m.dtype)
    padded[0:m.shape[0], 0:m.shape[1]] = m
    return padded
So, for example:
>>> r1 = np.random.rand(3, 5)
>>> r1
array([[ 0.85950957, 0.92468279, 0.93643261, 0.82723889, 0.54501699],
[ 0.05921614, 0.94946809, 0.26500925, 0.02287463, 0.04511802],
[ 0.99647148, 0.6926722 , 0.70148198, 0.39861487, 0.86772468]])
>>> pad_to_square(r1, 3)
array([[ 0.85950957, 0.92468279, 0.93643261, 0.82723889, 0.54501699],
[ 0.05921614, 0.94946809, 0.26500925, 0.02287463, 0.04511802],
[ 0.99647148, 0.6926722 , 0.70148198, 0.39861487, 0.86772468],
[ 3. , 3. , 3. , 3. , 3. ],
[ 3. , 3. , 3. , 3. , 3. ]])
or
>>> r2=np.random.rand(4)
>>> r2
array([ 0.10307689, 0.83912888, 0.13105124, 0.09897586])
>>> pad_to_square(r2, 0)
array([[ 0.10307689, 0. , 0. , 0. ],
[ 0.83912888, 0. , 0. , 0. ],
[ 0.13105124, 0. , 0. , 0. ],
[ 0.09897586, 0. , 0. , 0. ]])
etc.
I am a beginner at Python and numpy, and I need to compute the matrix logarithm for each "pixel" (i.e. x, y position) of a matrix-valued image of dimension NxMx3x3, where 3x3 are the dimensions of the matrix at each pixel.
The function I have written so far is the following:
import numpy as np
from scipy import linalg

def logm_img(im):
    dimx = im.shape[0]
    dimy = im.shape[1]
    res = np.zeros_like(im)
    for x in range(dimx):
        for y in range(dimy):
            res[x, y, :, :] = linalg.logm(np.asmatrix(im[x, y, :, :]))
    return res
Is it OK? Is there a way to avoid the two nested loops?
NumPy's numpy.log computes the element-wise natural logarithm and works directly on arrays of any shape, so you can apply it to the whole image at once without loops:
>>> import numpy
>>> a = numpy.array(range(100)).reshape(10, 10)
>>> b = numpy.log(a)
__main__:1: RuntimeWarning: divide by zero encountered in log
>>> b
array([[ -inf, 0. , 0.69314718, 1.09861229, 1.38629436,
1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458],
[ 2.30258509, 2.39789527, 2.48490665, 2.56494936, 2.63905733,
2.7080502 , 2.77258872, 2.83321334, 2.89037176, 2.94443898],
[ 2.99573227, 3.04452244, 3.09104245, 3.13549422, 3.17805383,
3.21887582, 3.25809654, 3.29583687, 3.33220451, 3.36729583],
[ 3.40119738, 3.4339872 , 3.4657359 , 3.49650756, 3.52636052,
3.55534806, 3.58351894, 3.61091791, 3.63758616, 3.66356165],
[ 3.68887945, 3.71357207, 3.73766962, 3.76120012, 3.78418963,
3.80666249, 3.8286414 , 3.8501476 , 3.87120101, 3.8918203 ],
[ 3.91202301, 3.93182563, 3.95124372, 3.97029191, 3.98898405,
4.00733319, 4.02535169, 4.04305127, 4.06044301, 4.07753744],
[ 4.09434456, 4.11087386, 4.12713439, 4.14313473, 4.15888308,
4.17438727, 4.18965474, 4.20469262, 4.21950771, 4.2341065 ],
[ 4.24849524, 4.26267988, 4.27666612, 4.29045944, 4.30406509,
4.31748811, 4.33073334, 4.34380542, 4.35670883, 4.36944785],
[ 4.38202663, 4.39444915, 4.40671925, 4.41884061, 4.4308168 ,
4.44265126, 4.4543473 , 4.46590812, 4.47733681, 4.48863637],
[ 4.49980967, 4.51085951, 4.52178858, 4.53259949, 4.54329478,
4.55387689, 4.56434819, 4.57471098, 4.58496748, 4.59511985]])
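For the matrix logarithm specifically (as opposed to the element-wise logarithm), scipy.linalg.logm handles one matrix at a time, so a loop is still needed; one possible simplification (a sketch, not a vectorized solution) is to run it over a single flattened axis instead of two nested ones:

import numpy as np
from scipy import linalg

def logm_img_flat(im):
    # collapse the N x M pixel grid into one axis of 3x3 matrices
    flat = im.reshape(-1, 3, 3)
    out = np.empty_like(flat)
    for i, m in enumerate(flat):
        out[i] = linalg.logm(m)
    return out.reshape(im.shape)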