Why does matplotlib extrapolate/plot missing values?

I have a situation where sometimes a whole series of data is not available. I'm plotting values from sensors in real time, and the sensors can be turned on and off via user interaction, so I cannot be sure the values always form a continuous series. A user can start a sensor, turn it off, and later turn it on again. In that case, matplotlib draws a line from the last point before the gap to the first point after it.
The data I plotted was as follows:
[[ 5. 22.57011604]
[ 6. 22.57408142]
[ 7. 22.56350136]
[ 8. 22.56394005]
[ 9. 22.56790352]
[ 10. 22.56451225]
[ 11. 22.56481743]
[ 12. 22.55789757]
#Missing x vals. Still plots straight line..
[ 29. 22.55654716]
[ 29. 22.56066513]
[ 30. 22.56110382]
[ 31. 22.55050468]
[ 32. 22.56550789]
[ 33. 22.56213379]
[ 34. 22.5588932 ]
[ 35. 22.54829407]
[ 35. 22.56697655]
[ 36. 22.56005478]
[ 37. 22.5568161 ]
[ 38. 22.54621696]
[ 39. 22.55033493]
[ 40. 22.55079269]
[ 41. 22.55475616]
[ 41. 22.54783821]
[ 42. 22.55195618]]
My plot function, heavily simplified, looks like this:
def plot(self, data):
    # data maps a name to a dict holding the x/y values and the index of the line to update
    for name, xy_dict in data.items():
        x_vals = xy_dict['x_values']
        y_vals = xy_dict['y_values']
        line_to_plot = xy_dict['line_number']
        self.lines[line_to_plot].set_xdata(x_vals)
        self.lines[line_to_plot].set_ydata(y_vals)
Does anyone know why it behaves like that? Do I have to handle gaps in the x and y values myself when plotting? It seems matplotlib should take care of this on its own. Otherwise, do I have to split the lists into smaller lists and plot each of them?

One option would be to add dummy items wherever data is missing (in your case apparently when x changes by more than 1), and set them as masked elements. That way matplotlib skips the line segments. For example:
import numpy as np
import matplotlib.pylab as pl
# Your data, with some additional elements deleted...
data = np.array(
[[ 5., 22.57011604],
[ 6., 22.57408142],
[ 9., 22.56790352],
[ 10., 22.56451225],
[ 11., 22.56481743],
[ 12., 22.55789757],
[ 29., 22.55654716],
[ 33., 22.56213379],
[ 34., 22.5588932 ],
[ 35., 22.54829407],
[ 40., 22.55079269],
[ 41., 22.55475616],
[ 41., 22.54783821],
[ 42., 22.55195618]])
x = data[:,0]
y = data[:,1]
# Difference from element to element in x
dx = x[1:]-x[:-1]
# Wherever dx > 1, insert a dummy item equal to -1
x2 = np.insert(x, np.where(dx>1)[0]+1, -1)
y2 = np.insert(y, np.where(dx>1)[0]+1, -1)
# As discussed in the comments, another option is to use e.g.:
#x2 = np.insert(x, np.where(dx>1)[0]+1, np.nan)
#y2 = np.insert(y, np.where(dx>1)[0]+1, np.nan)
# and skip the masking step below.
# Mask elements which are -1
x2 = np.ma.masked_where(x2 == -1, x2)
y2 = np.ma.masked_where(y2 == -1, y2)
pl.figure()
pl.subplot(121)
pl.plot(x,y)
pl.subplot(122)
pl.plot(x2,y2)

Another option is to include None or numpy.nan as values for y.
This, for example, shows a disconnected line:
import matplotlib.pyplot as plt
plt.plot([1,2,3,4,5],[5,6,None,7,8])

Matplotlib connects all consecutive data points with lines.
To avoid this, you can split your data at the missing x values and plot the resulting lists separately.
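A minimal sketch of the splitting approach, assuming a gap is any step in x larger than 1 (as in the question's data). The helper names split_at_gaps and plot_segments and the max_step parameter are made up for illustration, and ax is assumed to be a matplotlib Axes:

```python
import numpy as np

def split_at_gaps(x, y, max_step=1.0):
    """Split x and y into runs wherever consecutive x values differ by more than max_step."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Index of the first element after each gap
    breaks = np.where(np.diff(x) > max_step)[0] + 1
    return list(zip(np.split(x, breaks), np.split(y, breaks)))

def plot_segments(ax, segments, **kwargs):
    """Draw each run as its own line on a matplotlib Axes, so the gaps stay empty."""
    return [ax.plot(xs, ys, **kwargs)[0] for xs, ys in segments]

# A shortened version of the question's data, with two gaps in x
x = [5, 6, 7, 8, 29, 29, 30, 31, 41, 42]
y = [22.57, 22.57, 22.56, 22.56, 22.56, 22.56, 22.56, 22.55, 22.55, 22.55]
segments = split_at_gaps(x, y)
```

With fig, ax = plt.subplots(), calling plot_segments(ax, segments, color='b') draws one line per run. For a live plot that updates a single existing line through set_xdata/set_ydata, inserting NaN at the gaps is probably simpler, because the number of line objects then stays fixed.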

Related

numpy.where with data position as well in the condition

I have the code below, which is meant to replace any value in the data that is less than 0.5 with -1, but I want only a specific position, say the 10th value, to be checked. How can I do that using numpy's where function?
import numpy as np
x = np.random.random((10,10))
x2 = np.where( x<0.5, x, -1)
print(x2)
This is what I want:
import numpy as np
x = np.random.random((10,10))
x2 = np.where( x<0.5 and (index of x is 9), x, -1)
print(x2)

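One way is to slice out the 10th column (index 9) and apply the check only there:
import numpy as np
x = np.random.random((10,10))
# Option 1: boolean mask on the 10th column
mask = x[:, 9] < 0.5
x[:, 9][mask] = -1
# Option 2: np.where(condition, value_if_true, value_if_false) on the 10th column
x[:, 9] = np.where(x[:, 9] < 0.5, -1, x[:, 9])
Output (with either option):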
array([[ 0.13291679, 0.36437627, 0.61680761, 0.47180988, 0.40779945,
0.21448173, 0.70938531, 0.88205403, 0.9007378 , -1. ],
[ 0.18517135, 0.591143 , 0.20951978, 0.09811755, 0.53492105,
0.70484089, 0.87912825, 0.94987278, 0.98151354, -1. ],
[ 0.55545461, 0.50936625, 0.26460411, 0.81739966, 0.07142206,
0.97005035, 0.08655628, 0.62414457, 0.42844278, 0.67848139],
[ 0.97279637, 0.32032396, 0.87051124, 0.01823881, 0.58417096,
0.39085964, 0.39753232, 0.49915164, 0.44284544, -1. ],
[ 0.95868029, 0.39688236, 0.82069431, 0.30433585, 0.52959998,
0.88929817, 0.90156477, 0.09418035, 0.68805644, 0.97685649],
[ 0.11680575, 0.97914842, 0.34087048, 0.16332758, 0.0531713 ,
0.18936729, 0.02451479, 0.25073047, 0.72354052, -1. ],
[ 0.65997478, 0.60118864, 0.42100758, 0.16616609, 0.16181439,
0.83024903, 0.99521926, 0.45748708, 0.26720405, 0.92070836],
[ 0.99248054, 0.68889428, 0.30094476, 0.00427059, 0.27930388,
0.44895715, 0.3866733 , 0.40558292, 0.4394462 , -1. ],
[ 0.98661531, 0.57641035, 0.17323863, 0.17630214, 0.27312168,
0.14315776, 0.10212816, 0.15961012, 0.55773218, -1. ],
[ 0.68539788, 0.58486093, 0.12482709, 0.89666695, 0.83484223,
0.39818926, 0.66773542, 0.59832267, 0.28018467, -1. ]])
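The condition from the question can also be written as a single np.where call over the whole array. Python's `and` does not work elementwise on arrays, so the sketch below combines the two conditions with `&` and relies on broadcasting. The name in_target_column is made up for illustration, and this version returns a new array instead of modifying x in place:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((10, 10))

# Boolean vector of length 10 that is True only for column index 9 (the 10th column)
in_target_column = np.arange(x.shape[1]) == 9

# The (10,) column selector broadcasts against the (10, 10) comparison result
x2 = np.where((x < 0.5) & in_target_column, -1, x)
```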

Making a multidimensional list of vectors

I am quite new to Python so bear with me. I am writing a program to calculate some physical quantity, let's call it A. A is a function of several variables, let's call them x, y, z. So I have three nested loops to calculate A for the values of x, y, z that I am interested in.
for x in xs:
    for y in ys:
        for z in zs:
            A[x, y, z] = function_calculating_value(x,y,z)
Now, the problem is that each A[x,y,z] is itself a two-element array containing both the mean value and the variance, so that A[x,y,z] = [mean, variance]. From other languages I am used to initializing A using a function similar to np.zeros(). How do I do that here? What is the easiest way to achieve what I want, and how do I access the mean and variance easily for a given (x,y,z)?
(The end goal is to be able to plot the mean with the variance as error bars, so if there is an even more elegant way of doing this, I would appreciate that as well.)
Thanks in advance!

You can create and manipulate your multi-dimensional array with numpy:
import numpy as np
# Generate a random 4D array with nx = 3, ny = 3, and nz = 3, where each 3D point holds 2 values
mdarray = np.random.random( size = (3,3,3,2) )
# Display the whole 4D array
mdarray
Out[66]:
array([[[[ 0.80091246, 0.28476668],
[ 0.94264747, 0.27247111],
[ 0.64503087, 0.13722768]],
[[ 0.21371798, 0.41006764],
[ 0.79783723, 0.02537987],
[ 0.80658387, 0.43464532]],
[[ 0.04566927, 0.74836831],
[ 0.8280196 , 0.90288647],
[ 0.59271082, 0.65910184]]],
[[[ 0.82533798, 0.29075978],
[ 0.76496127, 0.1308289 ],
[ 0.22767752, 0.01865939]],
[[ 0.76849458, 0.7934015 ],
[ 0.93313128, 0.88436557],
[ 0.06897508, 0.00307739]],
[[ 0.15975812, 0.00792386],
[ 0.40292818, 0.21209199],
[ 0.48805502, 0.71974702]]],
[[[ 0.66522525, 0.49797465],
[ 0.29369336, 0.68743839],
[ 0.46411967, 0.69547356]],
[[ 0.50339875, 0.66423777],
[ 0.80520751, 0.88115054],
[ 0.08296022, 0.69467829]],
[[ 0.76572574, 0.45332754],
[ 0.87982243, 0.15773385],
[ 0.5762041 , 0.91268172]]]])
# Both values for this specific sample at x = 0, y = 1 and z = 2
mdarray[0,1,2]
Out[67]: array([ 0.80658387, 0.43464532])
mdarray[0,1,2,0] # mean only at the same point
Out[68]: 0.8065838666297338
mdarray[0,1,2,1] # variance only at the same point
Out[69]: 0.43464532443865489
You can also get only the means or the variance values separately by slicing the array:
mean = mdarray[:,:,:,0]
variance = mdarray[:,:,:,1]
mean
Out[74]:
array([[[ 0.80091246, 0.94264747, 0.64503087],
[ 0.21371798, 0.79783723, 0.80658387],
[ 0.04566927, 0.8280196 , 0.59271082]],
[[ 0.82533798, 0.76496127, 0.22767752],
[ 0.76849458, 0.93313128, 0.06897508],
[ 0.15975812, 0.40292818, 0.48805502]],
[[ 0.66522525, 0.29369336, 0.46411967],
[ 0.50339875, 0.80520751, 0.08296022],
[ 0.76572574, 0.87982243, 0.5762041 ]]])
I'm still unsure how I would have preferred to plot this data, will think about this a bit and update this answer.
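To tie this back to the loop in the question, here is a minimal sketch that initializes the array with np.zeros and fills it by position. The body of function_calculating_value and the values in xs, ys, and zs are hypothetical stand-ins, since the real calculation is not shown in the question:

```python
import numpy as np

def function_calculating_value(x, y, z):
    """Hypothetical stand-in for the real calculation; returns (mean, variance)."""
    samples = np.array([x + y + z, x * y * z, x - y + z], dtype=float)
    return samples.mean(), samples.var()

xs = [0.0, 1.0, 2.0]
ys = [0.0, 0.5]
zs = [1.0, 2.0, 3.0, 4.0]

# The last axis has length 2: index 0 is the mean, index 1 is the variance
A = np.zeros((len(xs), len(ys), len(zs), 2))

# Index the array by position (i, j, k), not by the physical values x, y, z
for i, x in enumerate(xs):
    for j, y in enumerate(ys):
        for k, z in enumerate(zs):
            A[i, j, k] = function_calculating_value(x, y, z)

mean = A[..., 0]
std = np.sqrt(A[..., 1])  # error bars are commonly drawn as standard deviations
```

For the error-bar plot, one curve along z at the first x and y could then be drawn with plt.errorbar(zs, mean[0, 0], yerr=std[0, 0]), assuming matplotlib.pyplot is imported as plt.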

How to vectorize a 'for' loop which calls a function (that takes a 2-Dimensional array as argument) over a 3-Dimensional numpy array

I have a numpy array containing the XYZ coordinates of the k nearest neighbours (k=10) of each point in a point cloud:
k_neighboors
Out[53]:
array([[[ 2.51508147e-01, 5.60274944e-02, 1.98303187e+00],
[ 2.48552352e-01, 5.95569573e-02, 1.98319519e+00],
[ 2.56611764e-01, 5.36767729e-02, 1.98236740e+00],
...,
[ 2.54520357e-01, 6.23480231e-02, 1.98255634e+00],
[ 2.57603496e-01, 5.19787706e-02, 1.98221457e+00],
[ 2.43914440e-01, 5.68424985e-02, 1.98352253e+00]],
[[ 9.72352773e-02, 2.06699912e-02, 1.99344850e+00],
[ 9.91205871e-02, 2.36056261e-02, 1.99329960e+00],
[ 9.59625840e-02, 1.71508361e-02, 1.99356234e+00],
...,
[ 1.03216261e-01, 2.19752081e-02, 1.99304521e+00],
[ 9.65025574e-02, 1.44127617e-02, 1.99355054e+00],
[ 9.59930867e-02, 2.72080526e-02, 1.99344873e+00]],
[[ 1.76408485e-01, 2.81930678e-02, 1.98819435e+00],
[ 1.78670138e-01, 2.81904750e-02, 1.98804617e+00],
[ 1.80372953e-01, 3.05109434e-02, 1.98791444e+00],
...,
[ 1.81960404e-01, 2.47725621e-02, 1.98785996e+00],
[ 1.74499243e-01, 3.50728296e-02, 1.98826015e+00],
[ 1.83470801e-01, 2.70808022e-02, 1.98774099e+00]],
...,
[[ 1.78178743e-01, -4.60980982e-02, -1.98792374e+00],
[ 1.77953839e-01, -4.73701134e-02, -1.98792756e+00],
[ 1.77889392e-01, -4.75468598e-02, -1.98793030e+00],
...,
[ 1.79924294e-01, -5.08776568e-02, -1.98772371e+00],
[ 1.76720902e-01, -5.11409082e-02, -1.98791265e+00],
[ 1.83644593e-01, -4.64747548e-02, -1.98756230e+00]],
[[ 2.00245917e-01, -2.33091787e-03, -1.98685515e+00],
[ 2.02384919e-01, -5.60011715e-04, -1.98673022e+00],
[ 1.97325528e-01, -1.03301927e-03, -1.98705769e+00],
...,
[ 1.95464164e-01, -6.23105839e-03, -1.98713481e+00],
[ 1.98985338e-01, -8.39920342e-03, -1.98688531e+00],
[ 1.95959195e-01, 2.68006674e-03, -1.98713303e+00]],
[[ 1.28851235e-01, -3.24527062e-02, -1.99127460e+00],
[ 1.26415789e-01, -3.27731185e-02, -1.99143147e+00],
[ 1.25985757e-01, -3.24910432e-02, -1.99146211e+00],
...,
[ 1.28296465e-01, -3.92388329e-02, -1.99117136e+00],
[ 1.34895295e-01, -3.64872888e-02, -1.99083793e+00],
[ 1.29047096e-01, -3.97952795e-02, -1.99111152e+00]]])
With this shape:
k_neighboors.shape
Out[54]: (2999986, 10, 3)
And I have this function which applies a Principal Component Analysis to some data provided as 2-Dimensional array:
def PCA(data, correlation=False, sort=True):
    """ Applies Principal Component Analysis to the data

    Parameters
    ----------
    data: array
        The array containing the data. The array must have NxM dimensions, where each
        of the N rows represents a different individual record and each of the M columns
        represents a different variable recorded for that individual record.
            array([
            [V11, ... , V1m],
            ...,
            [Vn1, ... , Vnm]])
    correlation(Optional) : bool
        Set the type of matrix to be computed (see Notes):
            If True compute the correlation matrix.
            If False(Default) compute the covariance matrix.
    sort(Optional) : bool
        Set the order that the eigenvalues/vectors will have
            If True(Default) they will be sorted (from highest value to lowest).
            If False they won't.

    Returns
    -------
    eigenvalues: (1,M) array
        The eigenvalues of the corresponding matrix.
    eigenvector: (M,M) array
        The eigenvectors of the corresponding matrix.

    Notes
    -----
    The correlation matrix is a better choice when there are different magnitudes
    representing the M variables. Use covariance matrix in any other case.
    """
    #: get the mean of all variables
    mean = np.mean(data, axis=0, dtype=np.float64)
    #: adjust the data by subtracting the mean from each variable
    data_adjust = data - mean
    #: compute the covariance/correlation matrix
    #: the data is transposed due to the np.cov/corrcoef syntax
    if correlation:
        matrix = np.corrcoef(data_adjust.T)
    else:
        matrix = np.cov(data_adjust.T)
    #: get the eigenvalues and eigenvectors
    eigenvalues, eigenvectors = np.linalg.eig(matrix)
    if sort:
        #: sort eigenvalues and eigenvectors in descending order of eigenvalue
        order = eigenvalues.argsort()[::-1]
        eigenvalues = eigenvalues[order]
        eigenvectors = eigenvectors[:,order]
    return eigenvalues, eigenvectors
So the question is: how can I apply the PCA function above to each of the 2999986 10x3 arrays in a way that doesn't take forever, unlike this loop:
data = np.empty((2999986, 3))
for i in range(len(k_neighboors)):
    w, v = PCA(k_neighboors[i])
    data[i] = v[:,2]
    break #: I break out of the loop so that I don't have to wait forever.
data
Out[64]:
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
...,
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]])

Thanks to the comments from @Divakar and @Eelco.
Using the function that Divakar posted in this answer:
def vectorized_app(data):
    diffs = data - data.mean(1,keepdims=True)
    return np.einsum('ijk,ijl->ikl',diffs,diffs)/data.shape[1]
and what Eelco pointed out in his comment, I ended up with this:
k_neighboors.shape
Out[48]: (2999986, 10, 3)
#: THE (ASSUMED)VECTORIZED ANSWER
data = np.linalg.eig(vectorized_app(k_neighboors))[1][:,:,2]
data
Out[50]:
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0.06462 , 0.00944352, 0.99786526],
[ 0.0654035 , 0.00860751, 0.99782177],
...,
[-0.0632175 , 0.01613551, 0.99786933],
[-0.06449399, 0.00552943, 0.99790278],
[-0.06081954, 0.01802078, 0.99798609]])
Which gives the same results as the for loop, without taking forever (although it still takes a while):
data2 = np.empty((2999986, 3))
for i in range(len(k_neighboors)):
    if i > 10:
        break #: I break out of the loop so that I don't have to wait forever.
    w, v = PCA(k_neighboors[i])
    data2[i] = v[:,2]
data2
Out[52]:
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0.06462 , 0.00944352, 0.99786526],
[ 0.0654035 , 0.00860751, 0.99782177],
...,
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]])
I don't know if there could be a better way to do this, so I'm going to keep the question open.
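One caveat with the vectorized version above: np.linalg.eig does not return eigenvalues in any guaranteed order, so taking column 2 only matches the sorted loop when the eigenvalues happen to come out in descending order. A hedged alternative, sketched below, uses np.linalg.eigh, which is intended for symmetric matrices such as covariance matrices and returns eigenvalues in ascending order, so column 0 is the eigenvector of the smallest eigenvalue. The function name smallest_eigenvectors and the small random test array are made up for illustration:

```python
import numpy as np

def smallest_eigenvectors(neighbours):
    """For a (P, k, 3) array, return a (P, 3) array holding, for each point,
    the eigenvector of the covariance matrix with the smallest eigenvalue."""
    diffs = neighbours - neighbours.mean(axis=1, keepdims=True)
    # Stack of P covariance matrices, shape (P, 3, 3); dividing by k - 1 matches np.cov,
    # although the scale does not change the eigenvectors
    cov = np.einsum('ijk,ijl->ikl', diffs, diffs) / (neighbours.shape[1] - 1)
    # eigh works on stacked symmetric matrices and sorts eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    return eigenvectors[:, :, 0]

# Small synthetic stand-in for the (2999986, 10, 3) array in the question
rng = np.random.default_rng(0)
pts = rng.random((5, 10, 3))
normals = smallest_eigenvectors(pts)
```

Eigenvectors are only defined up to sign, so individual rows may come out negated relative to the loop version.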

Implementing gradient operator in Python

I'm working on a computer vision system and this is giving me a serious headache. I'm having trouble re-implementing an old gradient operator more efficiently. I'm working with numpy and OpenCV 2.
This is what I had:
def gradientX(img):
    rows, cols = img.shape
    out = np.zeros((rows,cols))
    for y in range(rows-1):
        Mr = img[y]
        Or = out[y]
        Or[0] = Mr[1] - Mr[0]
        for x in range(1, cols - 2):
            Or[x] = (Mr[x+1] - Mr[x-1])/2.0
        Or[cols-1] = Mr[cols-1] - Mr[cols-2]
    return out
def gradient(img):
    return [gradientX(img), (gradientX(img.T).T)]
I've tried using numpy's gradient operator, but the result is not the same.
For this input:
array([[ 3, 4, 5],
[255, 0, 12],
[ 25, 15, 200]])
Using my gradient returns
[array([[ 1., 0., 1.],
[-255., 0., 12.],
[ 0., 0., 0.]]),
array([[ 252., -4., 0.],
[ 0., 0., 0.],
[-230., 15., 0.]])]
While using numpy's np.gradient returns
[array([[ 252. , -4. , 7. ],
[ 11. , 5.5, 97.5],
[-230. , 15. , 188. ]]),
array([[ 1. , 1. , 1. ],
[-255. , -121.5, 12. ],
[ -10. , 87.5, 185. ]])]
There are clearly some similarities between the results, but they're definitely not the same. So either I'm missing something here or the two operators aren't meant to produce the same results. In that case, I'd like to know how to re-implement my gradientX function so that it relies on numpy instead of that awful-looking double loop over the 2-d array.

I've worked on this a bit more and found my mistake.
I was skipping the last row and the last column when iterating. As @wflynny noted, the result was identical except for a row and a column of zeros.
Because of this, the result could not match np.gradient, but with that fix the results are identical, so there's no need to look for any other numpy implementation.
Answering my own question, a good numpy implementation of my gradient algorithm would be:
import numpy as np
def gradient(img):
    # np.gradient returns the derivatives in axis order (rows first); reverse to get [d/dx, d/dy]
    return np.gradient(img)[::-1]
I'm also posting the working loop code, just because it shows how numpy's gradient operator works:
def computeMatXGradient(img):
    rows, cols = img.shape
    out = np.zeros((rows,cols))
    for y in range(rows):
        Mr = img[y]
        Or = out[y]
        Or[0] = float(Mr[1]) - float(Mr[0])
        for x in range(1, cols - 1):
            Or[x] = (float(Mr[x+1]) - float(Mr[x-1]))/2.0
        Or[cols-1] = float(Mr[cols-1]) - float(Mr[cols-2])
    return out
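If only one direction is needed, np.gradient also accepts an axis argument. The sketch below applies it to the 3x3 input from the question. The explicit cast to float mirrors the float() calls in the loop version and guards against unsigned-integer wraparound with uint8 images; the names grad_x and grad_y are chosen here for illustration:

```python
import numpy as np

img = np.array([[3, 4, 5],
                [255, 0, 12],
                [25, 15, 200]])

# Horizontal derivative: differentiate along axis 1 (across columns)
grad_x = np.gradient(img.astype(float), axis=1)
# Vertical derivative: differentiate along axis 0 (across rows)
grad_y = np.gradient(img.astype(float), axis=0)
```

Here grad_x corresponds to what computeMatXGradient(img) computes, and [grad_x, grad_y] has the same ordering as the original gradient(img).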

plotting a list of arrays with matplotlib

I have a list raws of arrays that I would like to plot in ipython notebook. Here is the code I am trying to get working:
fig, axes = subplots(len(raws),1, sharex=True, tight_layout=True, figsize=(12, 6), dpi=72)
for r in range(len(raws)):
    axes[r].plot(raws)
I've been lost for hours, if not days, trying to figure out how to index the list raws so that I can plot each m x n array on its own axis, where n is the number of time points (the x-axis) and m is the number of time-series functions sampled at each point.
When I code:
for r in range(len(raws)):
    axes[r].plot(raws[r])
I get a ValueError: setting an array element with a sequence.
For your information:
len(raws) = 2
type(raws) = 'list'
np.shape(raws[0][0]) = (306, 10001)
raws =
[(array([[ -4.13211217e-12, -4.13287303e-12, -4.01705259e-12, ...,
1.36386023e-12, 1.65182851e-12, 2.00368966e-12],
[ 1.08914129e-12, 1.47828466e-12, 1.82257607e-12, ...,
-2.70151520e-12, -2.48631967e-12, -2.28625548e-12],
[ -7.80962369e-14, -1.27119591e-13, -1.73610315e-13, ...,
-1.13219629e-13, -1.15031720e-13, -1.12106621e-13],
...,
[ 2.52774254e-12, 2.32293195e-12, 2.02644002e-12, ...,
4.20064191e-12, 3.94858906e-12, 3.69495394e-12],
[ -4.38122146e-12, -4.96229676e-12, -5.47782145e-12, ...,
3.93820033e-12, 4.18850823e-12, 4.34950629e-12],
[ -1.07284424e-13, -9.23447993e-14, -7.89852400e-14, ...,
7.92079631e-14, 5.60172215e-14, 3.04448868e-14]]), array([ 60. , 60.001, 60.002, ..., 69.998, 69.999, 70. ])), (array([[ -6.71363108e-12, -5.80501003e-12, -4.95944514e-12, ...,
-3.25087343e-12, -2.68982494e-12, -2.13637448e-12],
[ -5.04818633e-12, -4.65757005e-12, -4.16084140e-12, ...,
-4.26120531e-13, 2.20744290e-13, 7.81245614e-13],
[ 1.97329506e-13, 1.64543867e-13, 1.32679812e-13, ...,
2.11645494e-13, 1.94795729e-13, 1.75781773e-13],
...,
[ 3.04245661e-12, 2.28376461e-12, 1.54118900e-12, ...,
-1.14020908e-14, -8.04647589e-13, -1.52676489e-12],
[ -1.83485962e-13, -5.22949893e-13, -8.60038852e-13, ...,
7.70312553e-12, 7.20825156e-12, 6.58362857e-12],
[ -7.26357906e-14, -7.11700989e-14, -6.88759767e-14, ...,
-1.04171843e-13, -1.03084861e-13, -9.68462427e-14]]), array([ 60. , 60.001, 60.002, ..., 69.998, 69.999, 70. ]))]

Just so I can post code, I am responding here.
Looks like your data is nested in the form
[ ( array1, array2, ..., arrayN ) ]
This could be handled in one of two ways:
In [1]: import numpy as np; import matplotlib.pyplot as plt
In [2]: raws = [np.random.rand(20, 100), np.random.rand(20, 100)]
In [3]: raws = raws[0]
In [4]: f, axes = plt.subplots(len(raws), 1)
In [5]: for i in range(len(raws)):
   ...:     axes[i].plot(raws[i])
Or:
In [3]: raws = [(np.random.rand(20, 100), np.random.rand(20, 100))]
In [4]: f, axes = plt.subplots(len(raws[0]), 1)
In [5]: for i in range(len(raws[0])):
   ...:     axes[i].plot(raws[0][i])
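Going by the data printed in the question, each element of raws looks like a (data, times) tuple, with data of shape (m, n) and times of shape (n,). Passing the whole tuple to plot() is likely what triggers the ValueError, since the two arrays have different shapes. A minimal sketch under that assumption, with a made-up helper name plot_raws and small synthetic data standing in for the real recordings:

```python
import numpy as np

def plot_raws(axes, raws):
    """Plot each (data, times) pair from raws on its own matplotlib Axes.

    data has shape (m, n): m time series sampled at n time points.
    times has shape (n,).
    """
    for ax, (data, times) in zip(axes, raws):
        # plot() draws one line per column of a 2D y array, so transpose to (n, m)
        ax.plot(times, data.T)

# Small synthetic stand-in for the real data (the question has m=306, n=10001)
times = np.linspace(60.0, 70.0, 101)
raws = [(np.random.rand(4, times.size), times),
        (np.random.rand(4, times.size), times)]

# Usage (assumes matplotlib is installed):
#   import matplotlib.pyplot as plt
#   fig, axes = plt.subplots(len(raws), 1, sharex=True)
#   plot_raws(axes, raws)
```

Note that plt.subplots returns a single Axes rather than an array when only one subplot is requested, so for a one-element list the axes would need wrapping, for example with np.atleast_1d(axes).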

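If you have a list of arrays such as abac below, you can plot them as follows:
import numpy as np
import matplotlib.pyplot as plt
a = np.array(range(20))
b = a * 2
c = a * 3
abac = a,b,a,c
# Unpacks to plt.plot(a, b, a, c): plots b against a and c against a
plt.plot(*abac)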
