How to plot errorbar in line chart from dataframes - python

I have two dataframes, df_avg and df_sem, which contain mean values and standard errors of the means, respectively. For example:
KPCmb1 KPCmb1IA KPCmb2 KPCmb3 KPCmb4 KPCmb5 KPCmb6
temp
19.99 15.185905 24.954296 22.610052 29.249107 26.151815 34.374257 36.589218
20.08 15.198452 24.998227 22.615342 29.229325 26.187794 34.343738 36.596730
20.23 15.208917 25.055061 22.647499 29.234424 26.193382 34.363549 36.580033
20.47 15.244485 25.092773 22.691421 29.206816 26.202425 34.337385 36.640839
20.62 15.270921 25.145798 22.720752 29.217821 26.235101 34.364162 36.600030
and
KPCmb1 KPCmb1IA KPCmb2 KPCmb3 KPCmb4 KPCmb5 KPCmb6
temp
19.99 0.342735 0.983424 0.131502 0.893494 1.223318 0.536450 0.988185
20.08 0.347366 0.983732 0.136239 0.898661 1.230763 0.534779 0.993970
20.23 0.348641 0.981614 0.134729 0.898790 1.227567 0.529240 1.005609
20.47 0.350937 0.993973 0.138411 0.881142 1.237749 0.526841 0.991591
20.62 0.345863 0.983064 0.132934 0.883863 1.234746 0.533048 0.987520
I want to plot a line chart using temp as the x-axis and the dataframe columns as the y-axes. I also want to use the df_sem dataframe to provide error bars for each line (note the column names are the same between the two dataframes).
I can achieve this with the following code:
df_avg.plot(yerr=df_sem), but this does not allow me to change many aspects of the plot, like DPI, labels, and things like that.
So I've tried to make the plot using the following code as an alternative:
plt.figure()
x = df_avg.index
y = df_avg
plt.errorbar(x,y,yerr=df_sem)
plt.show()
But this gives me the error: ValueError: shape mismatch: objects cannot be broadcast to a single shape
How can I make the same chart with matplotlib plotting that I am able to make with pandas plotting?
Thanks!

You can do this with a simple for loop, calling plt.errorbar once per column:
for col in df_avg.columns:
    plt.errorbar(df_avg.index, df_avg[col], yerr=df_sem[col], label=col)
plt.legend()
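Since the question asks for control over DPI and labels, the same loop can be run against an explicit Figure and Axes. This is only a sketch, not the answerer's code: the figure size, DPI value, axis labels, and capsize are illustrative assumptions, and only the first three rows and two columns of the question's data are reproduced.

```python
import matplotlib.pyplot as plt
import pandas as pd

# First three rows and two columns of the question's df_avg / df_sem.
temp = pd.Index([19.99, 20.08, 20.23], name="temp")
df_avg = pd.DataFrame(
    {"KPCmb1": [15.185905, 15.198452, 15.208917],
     "KPCmb1IA": [24.954296, 24.998227, 25.055061]},
    index=temp,
)
df_sem = pd.DataFrame(
    {"KPCmb1": [0.342735, 0.347366, 0.348641],
     "KPCmb1IA": [0.983424, 0.983732, 0.981614]},
    index=temp,
)

# figsize, dpi, labels and capsize are assumed values for illustration.
fig, ax = plt.subplots(figsize=(6, 4), dpi=150)
for col in df_avg.columns:
    # One errorbar call per column: x, y and yerr are all 1-D here.
    ax.errorbar(df_avg.index, df_avg[col], yerr=df_sem[col],
                label=col, capsize=3)
ax.set_xlabel("temp")
ax.set_ylabel("mean value")
ax.legend()
```

Calling plt.show() or fig.savefig(...) afterwards displays or writes the figure. Each call inside the loop receives one column, which is why it avoids the shape mismatch raised when the whole DataFrame is passed as y.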

Related

Create a scatter plot from an ndarray using the position in the row as the x-axis value and the value in the array for the y-axis

Much like the title says, I am trying to create a graph that shows 1-6 on the x-axis (each value's position in its row of the array) and the value itself on the y-axis. A snippet from the array is shown below, with each column representing a coefficient number from 1-6.
[0.99105 0.96213 0.96864 0.96833 0.96698 0.97381]
[0.99957 0.99709 0.9957 0.9927 0.98492 0.98864]
[0.9967 0.98796 0.9887 0.98613 0.98592 0.99125]
[0.9982 0.99347 0.98943 0.96873 0.91424 0.83831]
[0.9985 0.99585 0.99209 0.98399 0.97253 0.97942]
It's already set up as a numpy array. I think it's relatively straightforward, just drawing a complete mental blank.
Any ideas?
Do you want something like this?
import numpy as np
import matplotlib.pyplot as plt
a = np.array([[0.99105, 0.96213, 0.96864, 0.96833, 0.96698, 0.97381],
              [0.99957, 0.99709, 0.9957, 0.9927, 0.98492, 0.98864],
              [0.9967, 0.98796, 0.9887, 0.98613, 0.98592, 0.99125],
              [0.9982, 0.99347, 0.98943, 0.96873, 0.91424, 0.83831],
              [0.9985, 0.99585, 0.99209, 0.98399, 0.97253, 0.97942]])
plt.scatter(x=np.tile(np.arange(a.shape[1]), a.shape[0])+1, y=a)
Note that you can get the same picture with one color per row of the array using:
plt.plot(a.T, marker='o', ls='')
x = np.arange(a.shape[1])
plt.xticks(x, x+1)
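To make the x argument of the scatter call less opaque, the sketch below spells out what it evaluates to, using only the first two rows of the question's array. The names n_rows, n_cols, x and y are introduced here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# First two rows of the array from the question.
a = np.array([[0.99105, 0.96213, 0.96864, 0.96833, 0.96698, 0.97381],
              [0.99957, 0.99709, 0.9957, 0.9927, 0.98492, 0.98864]])

n_rows, n_cols = a.shape
# Column positions 1..n_cols, repeated once for every row.
x = np.tile(np.arange(n_cols), n_rows) + 1
# Flatten row by row so each value lines up with its column position.
y = a.ravel()

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xticks(np.arange(1, n_cols + 1))
```

Flattening y explicitly keeps x and y the same length and shape, which makes the pairing between a value and its position easy to check.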

Seaborn catplot results in error by changing hue

I have a dataset that looks like this:
feature_1 feature_2 feature_3 feature_4 feature_5 feature_6 feature_7 feature_8
0 -0.0020185900105266514 -0.004525512052716703 0.004290147446159787 0.008121342033951665 0.019995812082180105 0.02034942055088337 -0.02236798581774497 -0.018665971326321824
1 0.008327938744324304 0.0057161731520134415 0.015149000101932132 0.014244686228342962 0.031266799783999905 0.02556201262830425 0.00491191281881069 0.002627771331087464
2 0.0056570911367399175 0.006780099460379361 -0.0038521559525533412 -0.0042372049750104175 0.025755417055772233 0.029050369619095566 -0.0016924684746490136 0.001915807620861465
3 -0.0066361424845156666 -0.006829267976941566 0.008195242107994306 0.00993842145208005 0.02794638215808405 0.025168342480038512 -0.013222987355723491 -0.011178407242310215
4 0.005111817323414786 0.002367954071875622 -0.0013140356150100757 -0.0027816139194379794 0.025028881734832177 0.029704777330334546 0.0073461329985677545 0.008414726948742138
I have been able to create a catplot that is almost perfect, like this:
sns.catplot(data=test_df, palette="dark", orient="h")
However, I want the colors to change depending on the results of a list (which I could append to test_df). The list is as follows:
classifications = ["class_1", "class_2", "class_1", "class_1", "class_2"]. Ideally, I'd like for the colors of the points to be different depending on the class.
Trying to add the hue parameter errors out, resulting in ValueError: Cannot use 'hue' without 'x' and 'y'
How can I change the colors of the points based on the values of the classifications list?
You can add the class column and melt() into seaborn's preferred long form:
test_df["class"] = classifications
melted = test_df.melt("class", value_name="value", var_name="feature")
sns.catplot(data=melted, x="value", y="feature", hue="class", palette="dark", orient="h")
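To see what the long form looks like before it reaches seaborn, here is a small sketch using pandas only. The values are rounded from the first three rows and two columns of the question's table, and the name labeled is introduced here so the original frame is not modified.

```python
import pandas as pd

# Values rounded from the question's table (first three rows, two features).
test_df = pd.DataFrame({
    "feature_1": [-0.0020, 0.0083, 0.0057],
    "feature_2": [-0.0045, 0.0057, 0.0068],
})
classifications = ["class_1", "class_2", "class_1"]

# Work on a copy so test_df itself keeps only the feature columns.
labeled = test_df.copy()
labeled["class"] = classifications

# One row per (original row, feature) pair; "class" is carried along.
melted = labeled.melt(id_vars="class", var_name="feature", value_name="value")
print(melted)

# The long frame can then be passed to seaborn as in the answer:
# sns.catplot(data=melted, x="value", y="feature", hue="class", orient="h")
```

With the data in this shape, x, y and hue each name a single column, which is what the error message in the question is asking for.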

Python sort_values plot is inverted

new Python learner here. This seems like a very simple task but I can't do it to save my life.
All I want to do is to grab 1 column from my DataFrame, sort it, and then plot it. THAT'S IT. But when I plot it, the graph is inverted. Upon examination, I find that the values are sorted, but the index is not...
Here is my simple 3 liner code:
testData = pd.DataFrame([5,2,4,2,5,7,9,7,8,5,4,6],[9,4,3,1,5,6,7,5,4,3,7,8])
x = testData[0].sort_values()
plt.plot(x)
edit:
Using matplotlib
If you want the sorted values laid out sequentially on the x-axis (0, 1, 2, 3, 4, ...), you need to re-index them, because sort_values() keeps each value's original index label:
x = testData[0].sort_values()
x.index = range(len(x))
plt.plot(x)
Otherwise, if you want the values sorted in the DataFrame but displayed by order of index, you want a scatter plot, not a line plot:
plt.scatter(x.index, x.values)
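The re-indexing step can be checked without plotting anything. This sketch uses the question's data and reset_index(drop=True), which has the same effect as assigning range(len(x)) to the index; the name x_by_rank is introduced here for illustration.

```python
import pandas as pd

testData = pd.DataFrame([5, 2, 4, 2, 5, 7, 9, 7, 8, 5, 4, 6],
                        [9, 4, 3, 1, 5, 6, 7, 5, 4, 3, 7, 8])

# Sorting reorders the rows but each value keeps its original index label.
x = testData[0].sort_values()

# Replace the labels with 0..n-1 so the x-axis follows the sorted order.
x_by_rank = x.reset_index(drop=True)

print(x.index.tolist())          # original labels, no longer in order
print(x_by_rank.index.tolist())  # 0, 1, 2, ...
```

Because plt.plot(series) takes the x positions from the Series index, plotting x_by_rank gives a line that rises from left to right.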

Matplotlib Bar Plot Grouping Subplots

I'm currently generating this plot:
But as you can see, it is taking the items from the arrays below and spreading them across the xticks.
[array([ 1.77009257, 1.57980963, 0.31896943, 0.01874767]), array([ 1.02788175, 0.99604306])]
[array([ 0.20091287, 0.14682076, 0.03212798, 0.00187477]), array([ 0.09545977, 0.11318596])]
What I want is to create a cluster of all four items from the first array over the xtick -2wks and a cluster of the two items of the next array over xtick -1wk.
Bonus points if you can then give each bar in a given cluster the corresponding label from these arrays.
[Index([u'AL GAINESVILLE LOCK', u'AL GREENSBORO', u'AL HIGHLAND HOME', u'AL BREWTON 3 SSE'],dtype='object', name=u'StateCity'), Index([u'AL GREENSBORO', u'AL GAINESVILLE LOCK'], dtype='object', name=u'StateCity')]
You might be best off using pandas plot for this. The second example here looks very similar to what you would like to achieve if I understand you correctly.
If you transpose your data so that the index you show below makes up the columns and your xticks make up the new index, you should get what you are looking for.
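One way to read the transposition advice, sketched with the first list of arrays and the station labels from the question. The names two_weeks, one_week and grouped are introduced here, and matplotlib is assumed to be installed because pandas plotting relies on it.

```python
import pandas as pd

# Values and station labels taken from the question.
two_weeks = pd.Series(
    [1.77009257, 1.57980963, 0.31896943, 0.01874767],
    index=["AL GAINESVILLE LOCK", "AL GREENSBORO",
           "AL HIGHLAND HOME", "AL BREWTON 3 SSE"],
)
one_week = pd.Series(
    [1.02788175, 0.99604306],
    index=["AL GREENSBORO", "AL GAINESVILLE LOCK"],
)

# Stations become columns, the xtick names become the index.
# Stations missing from a given week are filled with NaN.
grouped = pd.DataFrame({"-2wks": two_weeks, "-1wk": one_week}).T

# One cluster of bars per index entry, one legend entry per station.
ax = grouped.plot.bar(rot=0)
```

The legend then carries the station names, so each bar in a cluster can be matched to its label by color.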

Plot rolling mean together with data

I have a DataFrame that looks something like this:
####delays:
Worst case Avg case
2014-10-27 2.861433 0.953108
2014-10-28 2.899174 0.981917
2014-10-29 3.080738 1.030154
2014-10-30 2.298898 0.711107
2014-10-31 2.856278 0.998959
2014-11-01 3.118587 1.147104
...
I would like to plot the data of this DataFrame together with the rolling mean of the data. The data itself should be a dotted line and the rolling mean a solid line. The worst case column should be in red, while the average case column should be in blue.
I've tried the following code:
import pandas as pd
import matplotlib.pyplot as plt
rolling = pd.rolling_mean(delays, 7)
delays.plot(x_compat=True, style='r--')
rolling.plot(style='r')
plt.title('Delays per day on entire network')
plt.xlabel('Date')
plt.ylabel('Minutes')
plt.show()
Unfortunately, this gives me 2 different plots. One with the data and one with the rolling mean. Also, the worst case column and average case column are both in red.
How can I get this to work?
You need to tell pandas which axes to plot on. By default, each call to plot() creates a new figure.
Just replace these 2 lines:
delays.plot(x_compat=True, style='r--')
rolling.plot(style='r')
with:
ax_delays = delays.plot(x_compat=True, style='--', color=["r","b"])
rolling.plot(color=["r","b"], ax=ax_delays, legend=0)
The 2nd line now tells pandas to plot on ax_delays and not to show the legend again.
To get 2 different colors for the 2 columns, pass one color per column through the color argument (see above).
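For readers on a recent pandas, where pd.rolling_mean is no longer available, the same idea can be sketched with the DataFrame.rolling method. The data is the six rows shown in the question; the window of 3 is an assumption made only because so few rows are shown (the question uses 7).

```python
import pandas as pd

delays = pd.DataFrame(
    {"Worst case": [2.861433, 2.899174, 3.080738, 2.298898, 2.856278, 3.118587],
     "Avg case": [0.953108, 0.981917, 1.030154, 0.711107, 0.998959, 1.147104]},
    index=pd.to_datetime(["2014-10-27", "2014-10-28", "2014-10-29",
                          "2014-10-30", "2014-10-31", "2014-11-01"]),
)

window = 3  # assumed for this short sample; the question uses 7
rolling = delays.rolling(window).mean()  # first window-1 rows are NaN

# Raw data as dashed lines, one color per column.
ax = delays.plot(x_compat=True, style="--", color=["r", "b"])
# Rolling mean as solid lines on the same axes, without a second legend.
rolling.plot(ax=ax, x_compat=True, color=["r", "b"], legend=False)

ax.set_title("Delays per day on entire network")
ax.set_xlabel("Date")
ax.set_ylabel("Minutes")
```

Passing the Axes returned by the first plot call into the second is what keeps both sets of lines in one figure.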
