I have the code below, which is meant to replace any value in the data that is less than 0.5 with -1. However, I want the check applied only at a specific position, say the 10th value. How can I do that using NumPy's where function?
import numpy as np
x = np.random.random((10,10))
x2 = np.where( x<0.5, x, -1)
This is what I want to do:
import numpy as np
x = np.random.random((10,10))
x2 = np.where( x<0.5 and (index of x is 9), x, -1)
One way is to slice out the 10th column and apply the mask to it, i.e.:
import numpy as np
x = np.random.random((10,10))
Option 1:
mask = x[:, 9] < 0.5
x[:, 9][mask] = -1
Option 2:
x[:, 9] = np.where(x[:, 9] < 0.5, -1, x[:, 9])
Output :
array([[ 0.13291679, 0.36437627, 0.61680761, 0.47180988, 0.40779945,
0.21448173, 0.70938531, 0.88205403, 0.9007378 , -1. ],
[ 0.18517135, 0.591143 , 0.20951978, 0.09811755, 0.53492105,
0.70484089, 0.87912825, 0.94987278, 0.98151354, -1. ],
[ 0.55545461, 0.50936625, 0.26460411, 0.81739966, 0.07142206,
0.97005035, 0.08655628, 0.62414457, 0.42844278, 0.67848139],
[ 0.97279637, 0.32032396, 0.87051124, 0.01823881, 0.58417096,
0.39085964, 0.39753232, 0.49915164, 0.44284544, -1. ],
[ 0.95868029, 0.39688236, 0.82069431, 0.30433585, 0.52959998,
0.88929817, 0.90156477, 0.09418035, 0.68805644, 0.97685649],
[ 0.11680575, 0.97914842, 0.34087048, 0.16332758, 0.0531713 ,
0.18936729, 0.02451479, 0.25073047, 0.72354052, -1. ],
[ 0.65997478, 0.60118864, 0.42100758, 0.16616609, 0.16181439,
0.83024903, 0.99521926, 0.45748708, 0.26720405, 0.92070836],
[ 0.99248054, 0.68889428, 0.30094476, 0.00427059, 0.27930388,
0.44895715, 0.3866733 , 0.40558292, 0.4394462 , -1. ],
[ 0.98661531, 0.57641035, 0.17323863, 0.17630214, 0.27312168,
0.14315776, 0.10212816, 0.15961012, 0.55773218, -1. ],
[ 0.68539788, 0.58486093, 0.12482709, 0.89666695, 0.83484223,
0.39818926, 0.66773542, 0.59832267, 0.28018467, -1. ]])
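The pseudocode in the question (`x<0.5 and (index of x is 9)`) can also be written fairly literally with a single np.where call. The following is only a sketch: the name `in_col_9` and the seeded generator are illustrative choices, and it assumes "10th value" means the 10th column, as in the options above. Note that arrays need the element-wise `&` rather than `and`.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
x = rng.random((10, 10))

# True only for column index 9 (the 10th column); shape (10,) broadcasts over rows
in_col_9 = np.arange(x.shape[1]) == 9

# Replace with -1 where both conditions hold, keep the original value elsewhere
x2 = np.where((x < 0.5) & in_col_9, -1, x)
```

Unlike the two options above, this leaves `x` untouched and returns a new array.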
I am quite new to Python so bear with me. I am writing a program to calculate some physical quantity, let's call it A. A is a function of several variables, let's call them x, y, z. So I have three nested loops to calculate A for the values of x, y, z that I am interested in.
for x in xs:
    for y in ys:
        for z in zs:
            A[x, y, z] = function_calculating_value(x,y,z)
Now, the problem is that A[x,y,z] is a two-element array containing both the mean value and the variance, so that A[x,y,z] = [mean, variance]. From other languages I am used to initializing A using a function similar to np.zeros(). How do I do that here? What is the easiest way to achieve what I want, and how do I access the mean and variance easily for a given (x,y,z)?
(the end goal is to be able to plot the mean with the variance as error bars, so if there is an even more elegant way of doing this, I appreciate that as well)
thanks in advance!
You can create and manipulate your multi-dimensional array with numpy:
import numpy as np
# Generate a random 4d array that has nx = 3, ny = 3, and nz = 3, with each 3D point having 2 values
mdarray = np.random.random( size = (3,3,3,2) )
# The full 4d array, of shape (3, 3, 3, 2)
mdarray
array([[[[ 0.80091246, 0.28476668],
[ 0.94264747, 0.27247111],
[ 0.64503087, 0.13722768]],
[[ 0.21371798, 0.41006764],
[ 0.79783723, 0.02537987],
[ 0.80658387, 0.43464532]],
[[ 0.04566927, 0.74836831],
[ 0.8280196 , 0.90288647],
[ 0.59271082, 0.65910184]]],
[[[ 0.82533798, 0.29075978],
[ 0.76496127, 0.1308289 ],
[ 0.22767752, 0.01865939]],
[[ 0.76849458, 0.7934015 ],
[ 0.93313128, 0.88436557],
[ 0.06897508, 0.00307739]],
[[ 0.15975812, 0.00792386],
[ 0.40292818, 0.21209199],
[ 0.48805502, 0.71974702]]],
[[[ 0.66522525, 0.49797465],
[ 0.29369336, 0.68743839],
[ 0.46411967, 0.69547356]],
[[ 0.50339875, 0.66423777],
[ 0.80520751, 0.88115054],
[ 0.08296022, 0.69467829]],
[[ 0.76572574, 0.45332754],
[ 0.87982243, 0.15773385],
[ 0.5762041 , 0.91268172]]]])
mdarray[0,1,2] # both values for this specific sample at x = 0, y = 1 and z = 2
Out[67]: array([ 0.80658387, 0.43464532])
mdarray[0,1,2,0] # mean only at the same point
Out[68]: 0.8065838666297338
mdarray[0,1,2,1] # variance only at the same point
Out[69]: 0.43464532443865489
You can also get only the mean or only the variance values by slicing the array:
mean = mdarray[:,:,:,0]
variance = mdarray[:,:,:,1]
mean # the means at every (x, y, z) point
array([[[ 0.80091246, 0.94264747, 0.64503087],
[ 0.21371798, 0.79783723, 0.80658387],
[ 0.04566927, 0.8280196 , 0.59271082]],
[[ 0.82533798, 0.76496127, 0.22767752],
[ 0.76849458, 0.93313128, 0.06897508],
[ 0.15975812, 0.40292818, 0.48805502]],
[[ 0.66522525, 0.29369336, 0.46411967],
[ 0.50339875, 0.80520751, 0.08296022],
[ 0.76572574, 0.87982243, 0.5762041 ]]])
I'm still unsure how I would prefer to plot this data. I will think about it a bit and update this answer.
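To connect this back to the np.zeros() part of the question, here is a minimal sketch of pre-allocating the array and filling it inside the nested loops. The body of `function_calculating_value` and the grids `xs`, `ys`, `zs` are made-up stand-ins, since the question does not show them. The loops index the array through enumerate, which assumes the values in xs, ys, zs are not themselves valid integer indices.

```python
import numpy as np

def function_calculating_value(x, y, z):
    """Hypothetical stand-in: returns (mean, variance) for one (x, y, z)."""
    samples = np.array([x + y, y + z, x + z], dtype=float)
    return samples.mean(), samples.var()

# Illustrative grids; the real ones come from the asker's problem
xs = np.linspace(0.0, 1.0, 3)
ys = np.linspace(0.0, 2.0, 4)
zs = np.linspace(0.0, 3.0, 5)

# Last axis holds [mean, variance] for each (x, y, z) grid point
A = np.zeros((len(xs), len(ys), len(zs), 2))

# enumerate gives integer indices for A and the physical values for the function
for i, x in enumerate(xs):
    for j, y in enumerate(ys):
        for k, z in enumerate(zs):
            A[i, j, k] = function_calculating_value(x, y, z)

mean = A[..., 0]
variance = A[..., 1]
std = np.sqrt(variance)  # standard deviation, a common choice for error-bar lengths
```

For a plot along z at fixed x and y, `mean[i, j, :]` and `std[i, j, :]` could then serve as the y and yerr arguments of matplotlib's errorbar.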
I have a numpy array containing the XYZ coordinates of the k nearest neighbours (k=10) of each point in a point cloud:
array([[[ 2.51508147e-01, 5.60274944e-02, 1.98303187e+00],
[ 2.48552352e-01, 5.95569573e-02, 1.98319519e+00],
[ 2.56611764e-01, 5.36767729e-02, 1.98236740e+00],
[ 2.54520357e-01, 6.23480231e-02, 1.98255634e+00],
[ 2.57603496e-01, 5.19787706e-02, 1.98221457e+00],
[ 2.43914440e-01, 5.68424985e-02, 1.98352253e+00]],
[[ 9.72352773e-02, 2.06699912e-02, 1.99344850e+00],
[ 9.91205871e-02, 2.36056261e-02, 1.99329960e+00],
[ 9.59625840e-02, 1.71508361e-02, 1.99356234e+00],
[ 1.03216261e-01, 2.19752081e-02, 1.99304521e+00],
[ 9.65025574e-02, 1.44127617e-02, 1.99355054e+00],
[ 9.59930867e-02, 2.72080526e-02, 1.99344873e+00]],
[[ 1.76408485e-01, 2.81930678e-02, 1.98819435e+00],
[ 1.78670138e-01, 2.81904750e-02, 1.98804617e+00],
[ 1.80372953e-01, 3.05109434e-02, 1.98791444e+00],
[ 1.81960404e-01, 2.47725621e-02, 1.98785996e+00],
[ 1.74499243e-01, 3.50728296e-02, 1.98826015e+00],
[ 1.83470801e-01, 2.70808022e-02, 1.98774099e+00]],
[[ 1.78178743e-01, -4.60980982e-02, -1.98792374e+00],
[ 1.77953839e-01, -4.73701134e-02, -1.98792756e+00],
[ 1.77889392e-01, -4.75468598e-02, -1.98793030e+00],
[ 1.79924294e-01, -5.08776568e-02, -1.98772371e+00],
[ 1.76720902e-01, -5.11409082e-02, -1.98791265e+00],
[ 1.83644593e-01, -4.64747548e-02, -1.98756230e+00]],
[[ 2.00245917e-01, -2.33091787e-03, -1.98685515e+00],
[ 2.02384919e-01, -5.60011715e-04, -1.98673022e+00],
[ 1.97325528e-01, -1.03301927e-03, -1.98705769e+00],
[ 1.95464164e-01, -6.23105839e-03, -1.98713481e+00],
[ 1.98985338e-01, -8.39920342e-03, -1.98688531e+00],
[ 1.95959195e-01, 2.68006674e-03, -1.98713303e+00]],
[[ 1.28851235e-01, -3.24527062e-02, -1.99127460e+00],
[ 1.26415789e-01, -3.27731185e-02, -1.99143147e+00],
[ 1.25985757e-01, -3.24910432e-02, -1.99146211e+00],
[ 1.28296465e-01, -3.92388329e-02, -1.99117136e+00],
[ 1.34895295e-01, -3.64872888e-02, -1.99083793e+00],
[ 1.29047096e-01, -3.97952795e-02, -1.99111152e+00]]])
With this shape:
Out[54]: (2999986, 10, 3)
And I have this function which applies a Principal Component Analysis to some data provided as 2-Dimensional array:
def PCA(data, correlation=False, sort=True):
    """ Applies Principal Component Analysis to the data

    Parameters
    ----------
    data: array
        The array containing the data. The array must have NxM dimensions, where each
        of the N rows represents a different individual record and each of the M columns
        represents a different variable recorded for that individual record.
            array([
            [V11, ... , V1m],
            ...,
            [Vn1, ... , Vnm]])
    correlation(Optional) : bool
        Set the type of matrix to be computed (see Notes):
            If True compute the correlation matrix.
            If False(Default) compute the covariance matrix.
    sort(Optional) : bool
        Set the order that the eigenvalues/vectors will have
            If True(Default) they will be sorted (from higher value to less).
            If False they won't.

    Returns
    -------
    eigenvalues: (1,M) array
        The eigenvalues of the corresponding matrix.
    eigenvector: (M,M) array
        The eigenvectors of the corresponding matrix.

    Notes
    -----
    The correlation matrix is a better choice when there are different magnitudes
    representing the M variables. Use covariance matrix in any other case.
    """
    #: get the mean of all variables
    mean = np.mean(data, axis=0, dtype=np.float64)
    #: adjust the data by subtracting the mean from each variable
    data_adjust = data - mean
    #: compute the covariance/correlation matrix
    #: the data is transposed due to np.cov/corrcoef syntax
    if correlation:
        matrix = np.corrcoef(data_adjust.T)
    else:
        matrix = np.cov(data_adjust.T)
    #: get the eigenvalues and eigenvectors
    eigenvalues, eigenvectors = np.linalg.eig(matrix)
    if sort:
        #: sort eigenvalues and eigenvectors
        sort = eigenvalues.argsort()[::-1]
        eigenvalues = eigenvalues[sort]
        eigenvectors = eigenvectors[:,sort]
    return eigenvalues, eigenvectors
So the question is: how can I apply the PCA function above to each of the 2999986 10x3 arrays in a way that doesn't take forever, unlike this loop:
data = np.empty((2999986, 3))
for i in range(len(k_neighboors)):
    w, v = PCA(k_neighboors[i])
    data[i] = v[:,2]
    break #: I break the loop so I don't have to wait forever.
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]])
Thanks to @Divakar's and @Eelco's comments.
Using the function that Divakar posted in this answer:
def vectorized_app(data):
    diffs = data - data.mean(1,keepdims=True)
    return np.einsum('ijk,ijl->ikl',diffs,diffs)/data.shape[1]
And using what Eelco pointed out in his comment, I ended up with this:
k_neighboors.shape
Out[48]: (2999986, 10, 3)
data = np.linalg.eig(vectorized_app(k_neighboors))[1][:,:,2]
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0.06462 , 0.00944352, 0.99786526],
[ 0.0654035 , 0.00860751, 0.99782177],
[-0.0632175 , 0.01613551, 0.99786933],
[-0.06449399, 0.00552943, 0.99790278],
[-0.06081954, 0.01802078, 0.99798609]])
Which gives the same results as the for loop, without taking forever (although it still takes a while):
data2 = np.empty((2999986, 3))
for i in range(len(k_neighboors)):
    if i > 10:
        break #: I break the loop so I don't have to wait forever.
    w, v = PCA(k_neighboors[i])
    data2[i] = v[:,2]
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0.06462 , 0.00944352, 0.99786526],
[ 0.0654035 , 0.00860751, 0.99782177],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]])
I don't know if there could be a better way to do this, so I'm going to keep the question open.
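One possible refinement, offered only as a sketch: the PCA function sorts eigenvalues in descending order, so `v[:,2]` is the eigenvector of the smallest eigenvalue, whereas np.linalg.eig makes no promise about ordering. Because covariance matrices are symmetric, np.linalg.eigh can be used instead; it returns eigenvalues in ascending order, so the wanted vector sits at index 0. The helper names and the small random demo array below are illustrative, not part of the original code, and the sign of each eigenvector remains arbitrary.

```python
import numpy as np

def batched_covariance(data):
    """Covariance of each (k, 3) neighbourhood in an (N, k, 3) array, as in vectorized_app."""
    diffs = data - data.mean(axis=1, keepdims=True)
    return np.einsum('ijk,ijl->ikl', diffs, diffs) / data.shape[1]

def smallest_eigenvectors(data):
    """Eigenvector of the smallest eigenvalue for each neighbourhood, shape (N, 3)."""
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order,
    # so column 0 of each eigenvector matrix belongs to the smallest eigenvalue.
    _, eigenvectors = np.linalg.eigh(batched_covariance(data))
    return eigenvectors[:, :, 0]

# Small random stand-in for the real (2999986, 10, 3) array
rng = np.random.default_rng(0)
k_neighbours_demo = rng.random((50, 10, 3))
normals = smallest_eigenvectors(k_neighbours_demo)
```

Dividing by k instead of k - 1 (as np.cov does) only rescales the eigenvalues, so the eigenvectors are unaffected.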
I'm working on a Computer Vision system and this is giving me a serious headache. I'm having trouble re-implementing an old gradient operator more efficiently. I'm working with numpy and openCV2.
This is what I had:
def gradientX(img):
    rows, cols = img.shape
    out = np.zeros((rows,cols))
    for y in range(rows-1):
        Mr = img[y]
        Or = out[y]
        Or[0] = Mr[1] - Mr[0]
        for x in range(1, cols - 2):
            Or[x] = (Mr[x+1] - Mr[x-1])/2.0
        Or[cols-1] = Mr[cols-1] - Mr[cols-2]
    return out
def gradient(img):
    return [gradientX(img), (gradientX(img.T).T)]
I've tried using numpy's gradient operator, but the result is not the same.
For this input:
array([[ 3, 4, 5],
[255, 0, 12],
[ 25, 15, 200]])
Using my gradient returns
[array([[ 1., 0., 1.],
[-255., 0., 12.],
[ 0., 0., 0.]]),
array([[ 252., -4., 0.],
[ 0., 0., 0.],
[-230., 15., 0.]])]
While using numpy's np.gradient returns
[array([[ 252. , -4. , 7. ],
[ 11. , 5.5, 97.5],
[-230. , 15. , 188. ]]),
array([[ 1. , 1. , 1. ],
[-255. , -121.5, 12. ],
[ -10. , 87.5, 185. ]])]
There are clearly some similarities between the results, but they're definitely not the same. So either I'm missing something here or the two operators aren't meant to produce the same results. In that case, I would like to know how to re-implement my gradientX function so that it doesn't use that awful-looking double loop to traverse the 2-d array and instead relies mostly on numpy.
I've been working a bit more on this, only to find my mistake.
I was skipping the last row and the last column when iterating. As @wflynny noted, the result was identical except for a row and a column of zeros.
Given this, the result could not be the same as np.gradient's, but with that change the results are identical, so there's no need to look for any other numpy implementation.
Answering my own question, a good numpy implementation of my gradient algorithm would be:
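import numpy as np
def gradient(img):
    # np.gradient returns the derivatives along rows first, then along columns;
    # reversing gives [x-gradient, y-gradient], like the original gradient()
    return np.gradient(img)[::-1]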
I'm also posting the working code, just because it shows how numpy's gradient operator works:
def computeMatXGradient(img):
    rows, cols = img.shape
    out = np.zeros((rows,cols))
    for y in range(rows):
        Mr = img[y]
        Or = out[y]
        Or[0] = float(Mr[1]) - float(Mr[0])
        for x in range(1, cols - 1):
            Or[x] = (float(Mr[x+1]) - float(Mr[x-1]))/2.0
        Or[cols-1] = float(Mr[cols-1]) - float(Mr[cols-2])
    return out
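For completeness, the same x-gradient can be written without explicit loops using array slicing. This is a sketch rather than the author's code: `gradient_x_vectorized` is a hypothetical name, and the conversion to float is an assumption made to avoid wrap-around on unsigned integer images. The sample is the 3x3 input from the question.

```python
import numpy as np

def gradient_x_vectorized(img):
    """Central differences along each row, one-sided differences at the two edges."""
    img = np.asarray(img, dtype=float)  # avoid unsigned integer wrap-around
    out = np.empty_like(img)
    out[:, 0] = img[:, 1] - img[:, 0]                 # left edge: forward difference
    out[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0   # interior: central difference
    out[:, -1] = img[:, -1] - img[:, -2]              # right edge: backward difference
    return out

sample = np.array([[3, 4, 5],
                   [255, 0, 12],
                   [25, 15, 200]])
gx = gradient_x_vectorized(sample)
```

With default settings, `np.gradient(sample, axis=1)` should produce the same array, which is why the loop version and np.gradient agree once the skipped row and column are fixed.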
I have a list raws of arrays that I would like to plot in ipython notebook. Here is the code I am trying to get working:
fig, axes = subplots(len(raws),1, sharex=True, tight_layout=True, figsize=(12, 6), dpi=72)
for r in range(len(raws)):
I've been lost for hours, if not days, trying to figure out how to index the list raws so that I can plot each m x n array on its own axis, where n is the number of time points (i.e., the x-axis) and m is the number of time-series functions sampled at each point.
When I code:
for r in range(len(raws)):
I get a ValueError: setting an array element with a sequence.
For your information:
len(raws) = 2
type(raws) = 'list'
np.shape(raws[0][0]) = (306, 10001)
raws =
[(array([[ -4.13211217e-12, -4.13287303e-12, -4.01705259e-12, ...,
1.36386023e-12, 1.65182851e-12, 2.00368966e-12],
[ 1.08914129e-12, 1.47828466e-12, 1.82257607e-12, ...,
-2.70151520e-12, -2.48631967e-12, -2.28625548e-12],
[ -7.80962369e-14, -1.27119591e-13, -1.73610315e-13, ...,
-1.13219629e-13, -1.15031720e-13, -1.12106621e-13],
[ 2.52774254e-12, 2.32293195e-12, 2.02644002e-12, ...,
4.20064191e-12, 3.94858906e-12, 3.69495394e-12],
[ -4.38122146e-12, -4.96229676e-12, -5.47782145e-12, ...,
3.93820033e-12, 4.18850823e-12, 4.34950629e-12],
[ -1.07284424e-13, -9.23447993e-14, -7.89852400e-14, ...,
7.92079631e-14, 5.60172215e-14, 3.04448868e-14]]), array([ 60. , 60.001, 60.002, ..., 69.998, 69.999, 70. ])), (array([[ -6.71363108e-12, -5.80501003e-12, -4.95944514e-12, ...,
-3.25087343e-12, -2.68982494e-12, -2.13637448e-12],
[ -5.04818633e-12, -4.65757005e-12, -4.16084140e-12, ...,
-4.26120531e-13, 2.20744290e-13, 7.81245614e-13],
[ 1.97329506e-13, 1.64543867e-13, 1.32679812e-13, ...,
2.11645494e-13, 1.94795729e-13, 1.75781773e-13],
[ 3.04245661e-12, 2.28376461e-12, 1.54118900e-12, ...,
-1.14020908e-14, -8.04647589e-13, -1.52676489e-12],
[ -1.83485962e-13, -5.22949893e-13, -8.60038852e-13, ...,
7.70312553e-12, 7.20825156e-12, 6.58362857e-12],
[ -7.26357906e-14, -7.11700989e-14, -6.88759767e-14, ...,
-1.04171843e-13, -1.03084861e-13, -9.68462427e-14]]), array([ 60. , 60.001, 60.002, ..., 69.998, 69.999, 70. ]))]
Just so I can post code, I am responding here.
Looks like your data is nested in the form
[ ( array1, array2, ..., arrayN ) ]
This could be handled in one of two ways:
In [2]: raws = [(np.random.rand(20, 100), np.random.rand(20, 100))]
In [3]: raws = raws[0]
In [4]: f, axes = plt.subplots(len(raws), 1)
In [5]: for i in range(len(raws)):
   ...:     axes[i].plot(raws[i])
In [3]: raws = [(np.random.rand(20, 100), np.random.rand(20, 100))]
In [4]: f, axes = plt.subplots(len(raws[0]), 1)
In [5]: for i in range(len(raws[0])):
   ...:     axes[i].plot(raws[0][i])
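Going by the raws printed in the question, each list entry looks like a (data, times) tuple, with data of shape (m, n) and times of shape (n,). If that reading is right, unpacking each tuple inside the loop may be the simplest route. The sketch below uses small synthetic arrays in place of the real recordings; the shapes, the seed, and the Agg backend line are assumptions for a self-contained run and can be dropped in a notebook.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic stand-in for raws: a list of (data, times) tuples
rng = np.random.default_rng(0)
times = np.linspace(60.0, 70.0, 101)
raws = [(rng.standard_normal((5, times.size)), times) for _ in range(2)]

# squeeze=False keeps axes two-dimensional even when len(raws) == 1
fig, axes = plt.subplots(len(raws), 1, sharex=True, squeeze=False, figsize=(12, 6))

for ax, (data, t) in zip(axes[:, 0], raws):
    # plot draws one line per column, so transpose the (m, n) data to (n, m)
    ax.plot(t, data.T)
```

Passing the whole tuple to plot is a likely source of the "setting an array element with a sequence" error, since the two arrays in it have different shapes.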
If you have a list of arrays such as abac below, you can plot it as follows:
import numpy as np
a = np.array(range(20))
b = a * 2
c = a * 3
abac = a,b,a,c