I am a manufacturing engineer, very new to Python and Matplotlib. Currently, I am trying to plot a scatter time graph, where for every single record, I have the data (read from a sensor) and upper and lower limits for that data that will stop the tool if data is not between them.
So for a simple set of data like this:
time = [1, 2, 3, 7, 8, 9, 10]
data = [5, 6, 5, 5, 6, 7, 8]
lower_limit = [4, 4, 5, 5, 5, 5, 5]
upper_limit = [6, 6, 6, 7, 7, 7, 7]
When the tool is not working, nothing will be recorded, hence a gap b/w 3 & 7 in time records.
The desired graph would look like this:
A few rules that I am trying to stick to:
All three graphs (data, upper_limit, and lower_limit) are required to be scattered points and not lines, with the x-axis (time) being shared among them. - required.
A green highlight that fills between upper and lower limits, considering only the two points with the same time for each highlight. - highly recommended.
(I tried matplotlib's fill_between, but it creates a polygon between the trend lines rather than straight vertical lines between matching pairs of lower-limit and upper-limit dots. It is therefore not accurate, and it fills the gap between times 3 and 7, which is not desired. I also tried matplotlib's bar for the limits alongside a scatter plot for the data, but I was not able to set the minimum of each bar to lower_limit.)
When the value of data is not equal to or between the limits, the representing dot should appear in red, rather than the original color. -highly recommended.
So, with all of that in mind, and thousands of records per day, a regular graph, for a 24hr time span, should look like the following: (notice the gap due to possible lack of records in a time span, as well as vertical green lines, for the limits.)
Thanks for your time and help!
This version uses NumPy's boolean masking and Matplotlib's errorbar:
import matplotlib.pyplot as plt
import numpy as np
time = np.array( [0, 1, 2, 3, 7, 8, 9, 10] )
data = np.array([2, 5, 6, 5, 5, 6, 7, 8] )
lower = np.array([4, 4, 4, 5, 5, 5, 5, 5] )
upper = np.array([6, 6, 6, 6, 7, 7, 7, 7] )
nn = len( lower )
delta = upper - lower
### creating masks
inside = ( ( upper - data ) >= 0 ) & ( ( data - lower ) >= 0 )
outside = np.logical_not( inside )
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.errorbar( time, lower, yerr=( nn*[0], delta), ls='', ecolor="#00C023" )
ax.scatter( time[ inside ], data[ inside ], c='k' )
ax.scatter( time[ outside ], data[ outside ], c='r' )
plt.show()
Something like this should work, plotting each component separately:
import matplotlib.pyplot as plt
import pandas as pd
time = [1, 2, 3, 7, 8, 9, 10]
data = [5, 6, 5, 5, 6, 7, 8]
lower_limit = [4, 4, 5, 5, 5, 5, 5]
upper_limit = [6, 6, 6, 7, 7, 7, 7]
# put data into dataframe and identify which points are out of range (not between the lower and upper limit)
df = pd.DataFrame({'time': time, 'data': data, 'll': lower_limit, 'ul': upper_limit})
df.loc[:, 'in_range'] = 0
df.loc[((df['data'] >= df['ll']) & (df['data'] <= df['ul'])), 'in_range'] = 1
# make the plot
fig, ax = plt.subplots()
# plot lower-limit and upper-limit points
plt.scatter(df['time'], df['ll'], c='green')
plt.scatter(df['time'], df['ul'], c='green')
# plot data points in range
plt.scatter(df.loc[df['in_range']==1, :]['time'], df.loc[df['in_range']==1, :]['data'], c='black')
# plot data points out of range (in red)
plt.scatter(df.loc[df['in_range']==0, :]['time'], df.loc[df['in_range']==0, :]['data'], c='red')
# plot lines between lower limit and upper limit
plt.plot((df['time'],df['time']),([i for i in df['ll']], [j for j in df['ul']]), c='lightgreen')
plt.show()
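As a variation on the two answers above, here is a minimal sketch that draws the vertical green segments with ax.vlines and colors the data points from a single mask, without pandas. The variable names in_range, colors, and segments are my own, and the Agg backend line is only an assumption so the sketch runs without a display; drop it and call plt.show() for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

time = np.array([1, 2, 3, 7, 8, 9, 10])
data = np.array([5, 6, 5, 5, 6, 7, 8])
lower_limit = np.array([4, 4, 5, 5, 5, 5, 5])
upper_limit = np.array([6, 6, 6, 7, 7, 7, 7])

# True where the reading lies within [lower_limit, upper_limit]
in_range = (data >= lower_limit) & (data <= upper_limit)
# One color per data point: black inside the limits, red outside
colors = ["black" if ok else "red" for ok in in_range]

fig, ax = plt.subplots()
# One vertical segment per timestamp, so the gap between 3 and 7 stays empty
segments = ax.vlines(time, lower_limit, upper_limit,
                     colors="lightgreen", linewidth=3, zorder=1)
ax.scatter(time, lower_limit, c="green", zorder=2)
ax.scatter(time, upper_limit, c="green", zorder=2)
ax.scatter(time, data, c=colors, zorder=3)
ax.set_xlabel("time")
```

Because each segment is tied to one timestamp, nothing is drawn where no records exist, which is the behavior fill_between could not give.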
I need to try to plot 3 bars on the same graph. I have 2 dataframes set up right now. My first dataframe was created off a JSON file seen here.
My second dataframe was created in the code below:
import json
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def make_bar_graph():
    with open('filelocation.json') as json_file:
        data = json.load(json_file)
    df = pd.DataFrame([])
    for item in data["Results"]["Result"]:
        # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
        df = pd.concat([df, pd.DataFrame.from_dict(kpi for kpi in item["KPI"])])
    df.reset_index(level=0, inplace=True)
    df.rename(columns={0: 'id', 1: 'average', 2: 'std. dev', 3: 'min', 4: 'median', 5: 'max'}, inplace=True)
    wanted_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
    wanted_y = [5, 5, .500, .500, .500, 1, 1, 5, 5, .500, .500, .500, 1, 1]
    kpi = ['kpi1', 'kpi2', 'kpi3', 'kpi4', 'kpi5', 'kpi6', 'kpi7', 'kpi8', 'kpi9', 'kpi10', 'kpi11', 'kpi12',
           'kpi13', 'kpi14']
    df2 = pd.DataFrame(dict(x=wanted_x, y=wanted_y, kpi=kpi))
    sns.set()
    sns.set_context("talk")
    sns.axes_style("darkgrid")
    # .loc replaces the removed .ix indexer; label slices include both endpoints
    h = sns.barplot(x='id', y='average', data=df.loc[0:13], label='Test on 4/30/2018', color='b')
    g = sns.barplot(x='id', y='average', data=df.loc[14:27], label='Test on 6/4/2018', color='r')
    k = sns.barplot(x="x", y="y", data=df2, label='Desired Results', color='y')
    plt.legend()
    plt.xlabel('KPI number')
    plt.ylabel('Time(s)')
    plt.show()
This is the graph I get from that:
Graph1
I need the bars to be next to each other, separated by id (or KPI, id number and KPI number are the same things). I'm not sure how to rework my dataframe to do this
When plotting 2 columns from a dataframe into a line plot, is it possible to, instead of a consistently increasing scale, have fixed values on your y axis (and keep the distances between the numbers on the axis constant)? For example, instead of 0, 100, 200, 300, ... to have 0, 21, 53, 124, 287, depending on the values from your dataset? So basically to have on the axis all your possible values fixed instead of an increasing scale?
Yes, you can use: ax.set_yticks()
Example:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame([[13, 1], [14, 1.5], [15, 1.8], [16, 2], [17, 2], [18, 3 ], [19, 3.6]], columns = ['A','B'])
fig, ax = plt.subplots()
x = df['A']
y = df['B']
ax.plot(x, y, 'g-')
ax.set_yticks(y)
plt.show()
Or, if the values are very far apart, you can use ax.set_yscale('log').
Example:
df = pd.DataFrame([[13, 1], [14, 1.5], [15, 1.8], [16, 2], [17, 2], [18, 3 ], [19, 3.6], [20, 300]], columns = ['A','B'])
fig, ax = plt.subplots()
x = df['A']
y = df['B']
ax.plot(x, y, 'g-')
ax.set_yscale('log', base=2)  # 'base' replaces basex/basey in Matplotlib 3.3+
ax.yaxis.set_ticks(y)
ax.yaxis.set_ticklabels(y)
plt.show()
What you need to do is:
get all distinct y values and sort them
set their y position on the plot according to their place on the ordered list
set the y labels according to distinct ordered values
The code below does this:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame([[13, 1], [14, 1.8], [16, 2], [15, 1.5], [17, 2], [18, 3 ],
[19, 200],[20, 3.6], ], columns = ['A','B'])
x = df['A']
y = df['B']
y_keys = np.sort(y.unique())
y_values = range(len(y_keys))
y_dict = dict(zip(y_keys,y_values))
fig, ax = plt.subplots()
ax.plot(x,[y_dict[k] for k in y],'o-')
ax.set_yticks(y_values)
ax.set_yticklabels(y_keys)
plt.show()
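The value-to-position mapping in the steps above can also be sketched with np.unique and its return_inverse option, which returns the sorted distinct values and, for every original value, its index in that sorted list. This is an alternative to the dictionary lookup, not the answerer's code; the names labels and positions are my own.

```python
import numpy as np

# Same B column as in the answer above
y = np.array([1, 1.8, 2, 1.5, 2, 3, 200, 3.6])

# labels: sorted distinct values
# positions: for each entry of y, its rank among the distinct values
labels, positions = np.unique(y, return_inverse=True)

print(labels)
print(positions)
```

To plot, pass positions as the y data, then call ax.set_yticks(range(len(labels))) and ax.set_yticklabels(labels), so every distinct value sits at an equally spaced tick.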
import numpy as np
from sklearn.decomposition import PCA
X = np.array([[24,13,38],[8,3,17],[21,6,40],[1,14,-9],[9,3,21],[7,1,14],[8,7,11],[10,16,3],[1,3,2],
              [15,2,30],[4,6,1],[12,10,18],[1,9,-4],[7,3,19],[5,1,13],[1,12,-6],[21,9,34],[8,8,7],
              [1,18,-18],[15,8,25],[16,10,29],[7,0,17],[14,2,31],[3,7,0],[5,6,7]])
pca = PCA(n_components=1)
pca.fit(X)
a = pca.components_[0][0] # a
b = pca.components_[0][1] # b
c = pca.components_[0][2] # c
x, y, z = X.T # coordinate columns of the data
def average(values):
    if len(values) == 0:
        return None
    return sum(values, 0.0) / len(values)
x_mean = average(x) # For an approximation
y_mean = average(y)
z_mean = average(z)
d = -(a * x_mean + b * y_mean + c * z_mean)
so -0.375978766054x + 0.10612154283y -0.920531469111z + 15.1366572005 = 0
Actually, I'm not sure it is right.
I want to draw a plane in this situation using matplotlib library.
How can I code this?
Each principal component defines a vector in the feature space. PCA orders those vectors based on the variance of the data in each direction. So the first vector will represent the maximum variance of the data and the last vector minimum variance. Assuming the data are distributed around a plane the third vector should be perpendicular to the plane. Here's the code:
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
X = np.array([[24,13,38],[8,3,17],[21,6,40],[1,14,-9],[9,3,21],[7,1,14],[8,7,11],[10,16,3],[1,3,2],
[15,2,30],[4,6,1],[12,10,18],[1,9,-4],[7,3,19],[5,1,13],[1,12,-6],[21,9,34],[8,8,7],
[1,18,-18],[15,8,25],[16,10,29],[7,0,17],[14,2,31],[3,7,0],[5,6,7]])
pca = PCA(n_components=3)
pca.fit(X)
eig_vec = pca.components_
print(pca.explained_variance_ratio_)
# [0.90946569 0.08816839 0.00236591]
# The last vector explains only about 0.24% of the variance
# This is the normal vector of minimum variance
normal = eig_vec[2, :] # (a, b, c)
centroid = np.mean(X, axis=0)
# Every point (x, y, z) on the plane should satisfy a*x + b*y + c*z + d = 0
# Taking centroid as a point on the plane
d = -centroid.dot(normal)
# Draw plane
xx, yy = np.meshgrid(np.arange(np.min(X[:, 0]), np.max(X[:, 0])), np.arange(np.min(X[:, 1]), np.max(X[:, 1])))
z = (-normal[0] * xx - normal[1] * yy - d) * 1. / normal[2]
# plot the surface
plt3d = plt.figure().add_subplot(projection='3d')  # gca(projection=...) was removed in Matplotlib 3.6
plt3d.plot_surface(xx, yy, z)
plt3d.scatter(*(X.T))
plt.show()
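To check the plane numerically rather than by eye, here is a minimal sketch that uses np.linalg.svd on the centered data in place of sklearn's PCA (the two give the same directions up to sign). The names distances and explained are my own. It computes the signed distance of every point from the plane a*x + b*y + c*z + d = 0.

```python
import numpy as np

X = np.array([[24,13,38],[8,3,17],[21,6,40],[1,14,-9],[9,3,21],[7,1,14],[8,7,11],[10,16,3],[1,3,2],
              [15,2,30],[4,6,1],[12,10,18],[1,9,-4],[7,3,19],[5,1,13],[1,12,-6],[21,9,34],[8,8,7],
              [1,18,-18],[15,8,25],[16,10,29],[7,0,17],[14,2,31],[3,7,0],[5,6,7]])

centroid = X.mean(axis=0)
# Rows of vt are unit-length principal directions, ordered by decreasing singular value
_, singular_values, vt = np.linalg.svd(X - centroid, full_matrices=False)

normal = vt[-1]              # direction of least variance, i.e. (a, b, c)
d = -centroid.dot(normal)    # makes the plane pass through the centroid

# Signed distance of each point from the plane (normal has unit length)
distances = X.dot(normal) + d
# Fraction of total variance along each principal direction
explained = singular_values**2 / np.sum(singular_values**2)

print(normal, d)
print(np.abs(distances).max(), explained)
```

If the points really lie near a plane, the distances stay small compared with the spread of the data and the last entry of explained is close to zero.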
The first principal component doesn't define a plane; it defines a vector in three dimensions. Here's how to visualize it in 3D. The code starts out with yours and then adds the plotting steps:
import numpy as np
from sklearn.decomposition import PCA
X = np.array([[24, 13, 38], [8, 3, 17], [21, 6, 40], [1, 14, -9], [9, 3, 21], [7, 1, 14],
[8, 7, 11], [10, 16, 3], [1, 3, 2], [15, 2, 30], [4, 6, 1], [12, 10, 18], [1, 9, -4],
[7, 3, 19], [5, 1, 13], [1, 12, -6], [21, 9, 34], [8, 8, 7], [1, 18, -18],
[15, 8, 25], [16, 10, 29], [7, 0, 17], [14, 2, 31], [3, 7, 0], [5, 6, 7]])
pca = PCA(n_components=1)
pca.fit(X)
## New code below
p = pca.components_
centroid = np.mean(X, 0)
segments = np.arange(-40, 40)[:, np.newaxis] * p
import matplotlib
matplotlib.use('TkAgg') # might not be necessary for you
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
plt.ion()
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
scatterplot = ax.scatter(*(X.T))
lineplot = ax.plot(*(centroid + segments).T, color="red")
plt.xlabel('x')
plt.ylabel('y')
plt.savefig('result.png', dpi=150)
(Note the above code was auto-formatted with yapf, which I highly recommend.) Resulting figure:
I have to plot multiple data sets and their fitted lines on a single plot, all drawn in a for loop. Because everything is plotted in one loop, the fitted line of each later iteration is drawn on top of the markers from the previous one, as shown in the figure.
The reproducible code:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
y = np.array([[4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
[6, 5.2, 8.5, 9.1, 13.4, 15.1, 16.1, 18.3, 20.4, 22.1, 23.7]])
m, n = x.shape
figure = plt.figure(figsize=(5.15, 5.15))
figure.clf()
plot = plt.subplot(111)
for i in range(m):
    poly = np.polyfit(x[i, :], y[i, :], deg =1)
    plt.plot(poly[0] * x[i, :] + poly[1], linestyle = '-')
    plt.plot(x[i, :], y[i, :], linestyle = '', marker = 'o', markersize = 20)
plot.set_ylabel('Y', labelpad = 6)
plot.set_xlabel('X', labelpad = 6)
plt.show()
I can fix this using another loop as:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
y = np.array([[4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
[6, 5.2, 8.5, 9.1, 13.4, 15.1, 16.1, 18.3, 20.4, 22.1, 23.7]])
m, n = x.shape
figure = plt.figure(figsize=(5.15, 5.15))
figure.clf()
plot = plt.subplot(111)
for i in range(m):
    poly = np.polyfit(x[i, :], y[i, :], deg =1)
    plt.plot(poly[0] * x[i, :] + poly[1], linestyle = '-')
for i in range(m):
    plt.plot(x[i, :], y[i, :], linestyle = '', marker = 'o', markersize = 20)
plot.set_ylabel('Y', labelpad = 6)
plot.set_xlabel('X', labelpad = 6)
plt.show()
which gives me all the fit lines below the markers.
But is there any built-in function in Python/matplotlib to do this without using two loops?
Update
I used only two data sets (m = 2) as an example. There can be more than two, i.e. the loop would run more times.
Update 2 after answer
Can I do this for the same line also? As an example:
plt.plot(x[i, :], y[i, :], linestyle = ':', marker = 'o', markersize = 20)
Can I give the linestyle a zorder = 1 and the markers a zorder = 3?
Editing just your plotting lines:
plt.plot(poly[0] * x[i, :] + poly[1], linestyle = '-',
zorder=-1)
plt.plot(x[i, :], y[i, :], linestyle = '', marker = 'o', markersize = 20,
zorder=3)
Now the markers are all in front of the lines, though within the marker group and within the line group the stacking still follows the order of plotting.
Update answer
No. One call to plot, one zorder argument.
If you want to match the color and style of markers and line in each pass through the loop, set up an iterator or generator for colors and get current_color on each pass, then use that as an argument for plot calls.
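A minimal sketch of that suggestion, assuming the default color cycle from plt.rcParams is acceptable: itertools.cycle hands out one color per pass, and the two plot calls share it but use different zorder values. The name current_color follows the wording above; the other names are my own, and the Agg backend line is only there so the sketch runs without a display.

```python
import itertools
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
y = np.array([[4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
              [6, 5.2, 8.5, 9.1, 13.4, 15.1, 16.1, 18.3, 20.4, 22.1, 23.7]])

# Endless iterator over the default colors, so any number of data sets works
colors = itertools.cycle(plt.rcParams["axes.prop_cycle"].by_key()["color"])

fig, ax = plt.subplots(figsize=(5.15, 5.15))
for xi, yi in zip(x, y):
    current_color = next(colors)
    slope, intercept = np.polyfit(xi, yi, deg=1)
    # Fitted line: low zorder keeps it underneath every marker
    ax.plot(xi, slope * xi + intercept, linestyle="-",
            color=current_color, zorder=1)
    # Markers: same color, higher zorder so they are drawn on top
    ax.plot(xi, yi, linestyle="", marker="o", markersize=20,
            color=current_color, zorder=3)
ax.set_xlabel("X", labelpad=6)
ax.set_ylabel("Y", labelpad=6)
```

This keeps a single loop: each pass still makes two plot calls, which is what allows the line and its markers to carry different zorder values.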