pandas boxplot contains content of plot saved before - python

I'm plotting some columns of a DataFrame as a boxplot. So far, no problem: as seen below, I wrote some code and it works. BUT the second plot also contains the content of the first plot. I tried "= None" and "del value", but neither works. Moving the plot function outside doesn't solve the problem either.
What's wrong with my code?
Here is an executable example:
import pandas as pd
d1 = {'ff_opt_time': [10, 20, 11, 5, 15 , 13, 19, 25 ], 'ff_count_opt': [30, 40, 45, 29, 35,38,32,41]}
df1 = pd.DataFrame(data=d1)
d2 = {'ff_opt_time': [1, 2, 1, 5, 1 , 1, 4, 5 ], 'ff_count_opt': [3, 4, 4, 9, 5,3, 2,4]}
df2 = pd.DataFrame(data=d2)
def evaluate2(df1, df2):
    def plot(df, output):
        boxplot = df.boxplot(rot=45, fontsize=5)
        fig = boxplot.get_figure()
        fig.savefig(output + ".pdf")

    df_ot = pd.DataFrame(columns=['opt_time1', 'opt_time2'])
    df_ot['opt_time1'] = df1['ff_opt_time']
    df_ot['opt_time2'] = df2['ff_opt_time']
    plot(df_ot, "bp_opt_time")

    df_op = pd.DataFrame(columns=['count_opt1', 'count_opt2'])
    df_op['count_opt1'] = df1['ff_count_opt']
    df_op['count_opt2'] = df2['ff_count_opt']
    plot(df_op, "bp_count_opt_perm")

evaluate2(df1, df2)
Here is another executable example. I even used other variable names.
import pandas as pd
d1 = {'ff_opt_time': [10, 20, 11, 5, 15 , 13, 19, 25 ], 'ff_count_opt': [30, 40, 45, 29, 35,38,32,41]}
df1 = pd.DataFrame(data=d1)
d2 = {'ff_opt_time': [1, 2, 1, 5, 1 , 1, 4, 5 ], 'ff_count_opt': [3, 4, 4, 9, 5,3, 2,4]}
df2 = pd.DataFrame(data=d2)
def evaluate2(df1, df2):
    df_ot = pd.DataFrame(columns=['opt_time1', 'opt_time2'])
    df_ot['opt_time1'] = df1['ff_opt_time']
    df_ot['opt_time2'] = df2['ff_opt_time']
    boxplot1 = df_ot.boxplot(rot=45, fontsize=5)
    fig1 = boxplot1.get_figure()
    fig1.savefig("bp_opt_time.pdf")

    df_op = pd.DataFrame(columns=['count_opt1', 'count_opt2'])
    df_op['count_opt1'] = df1['ff_count_opt']
    df_op['count_opt2'] = df2['ff_count_opt']
    boxplot2 = df_op.boxplot(rot=45, fontsize=5)
    fig2 = boxplot2.get_figure()
    fig2.savefig("bp_count_opt_perm.pdf")

evaluate2(df1, df2)

From your code, boxplot1 and boxplot2 end up on the same figure: when no ax is given, df.boxplot() draws on the current matplotlib axes, so the second call adds to the first plot. You need to tell matplotlib that there are going to be two plots.
This can be achieved either by
Creating two separate figures with pyplot: fig1, ax1 = plt.subplots() creates a new figure fig1 with axes ax1 for the first boxplot, and fig2, ax2 = plt.subplots() does the same for the second
Dissolving the evaluate2 function and executing each boxplot in a separate cell of the Jupyter notebook
Solution 1 : Two subplots using pyplot
import pandas as pd
import matplotlib.pyplot as plt
d1 = {'ff_opt_time': [10, 20, 11, 5, 15 , 13, 19, 25 ], 'ff_count_opt': [30, 40, 45, 29, 35,38,32,41]}
df1 = pd.DataFrame(data=d1)
d2 = {'ff_opt_time': [1, 2, 1, 5, 1 , 1, 4, 5 ], 'ff_count_opt': [3, 4, 4, 9, 5,3, 2,4]}
df2 = pd.DataFrame(data=d2)
def evaluate2(df1, df2):
    df_ot = pd.DataFrame(columns=['opt_time1', 'opt_time2'])
    df_ot['opt_time1'] = df1['ff_opt_time']
    df_ot['opt_time2'] = df2['ff_opt_time']
    # New figure and axes, so the first boxplot gets a canvas of its own
    fig1, ax1 = plt.subplots()
    df_ot.boxplot(ax=ax1, rot=45, fontsize=5)
    fig1.savefig("bp_opt_time.pdf")

    df_op = pd.DataFrame(columns=['count_opt1', 'count_opt2'])
    df_op['count_opt1'] = df1['ff_count_opt']
    df_op['count_opt2'] = df2['ff_count_opt']
    # A second, independent figure for the second boxplot
    fig2, ax2 = plt.subplots()
    df_op.boxplot(ax=ax2, rot=45, fontsize=5)
    fig2.savefig("bp_count_opt_perm.pdf")
    plt.show()

evaluate2(df1, df2)
Solution 2: Executing boxplot in different cell
Update based on comments: clearing plots
There are two ways to clear a plot:
Clear the figure using clf()
matplotlib.pyplot.clf() clears the current Figure's state without closing it.
Clear the axes using cla()
matplotlib.pyplot.cla() clears the current Axes' state without closing the Axes.
Simply call plt.clf() after calling fig.savefig().
Read this documentation on how to clear a plot in Python using matplotlib
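As a minimal sketch of the clearing approach, the helper below saves a boxplot and then calls plt.clf(), so the next call starts from an empty figure. The helper name save_boxplot, the temporary output directory, and the Agg backend (chosen only so the sketch runs without a display) are assumptions for illustration, not part of the original code. Counting the line artists on the axes is just one way to check that nothing carried over from the previous plot.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd


def save_boxplot(df, output):
    """Draw df as a boxplot, save it to output + '.pdf', then clear the figure."""
    ax = df.boxplot(rot=45, fontsize=5)
    n_lines = len(ax.lines)  # line artists currently drawn on the axes
    ax.get_figure().savefig(output + ".pdf")
    plt.clf()  # clear the current figure so the next plot starts empty
    return n_lines


df_ot = pd.DataFrame({"opt_time1": [10, 20, 11, 5, 15, 13, 19, 25],
                      "opt_time2": [1, 2, 1, 5, 1, 1, 4, 5]})
df_op = pd.DataFrame({"count_opt1": [30, 40, 45, 29, 35, 38, 32, 41],
                      "count_opt2": [3, 4, 4, 9, 5, 3, 2, 4]})

out_dir = tempfile.mkdtemp()  # hypothetical output location
first = save_boxplot(df_ot, os.path.join(out_dir, "bp_opt_time"))
second = save_boxplot(df_op, os.path.join(out_dir, "bp_count_opt_perm"))

# Both plots have two boxes; equal counts mean the first plot did not leak
# into the second one.
print(first, second)
```

If the plt.clf() line is removed, the second count should come out larger than the first, because the second boxplot is drawn on top of the first.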

Just take the code from Archana David's answer and put it in your plot function: the key is to call "fig, ax = plt.subplots()" so that each call creates a new figure.
import pandas as pd
import matplotlib.pyplot as plt
d1 = {'ff_opt_time': [10, 20, 11, 5, 15, 13, 19, 25],
'ff_count_opt': [30, 40, 45, 29, 35, 38, 32, 41]}
df1 = pd.DataFrame(data=d1)
d2 = {'ff_opt_time': [1, 2, 1, 5, 1, 1, 4, 5],
'ff_count_opt': [3, 4, 4, 9, 5, 3, 2, 4]}
df2 = pd.DataFrame(data=d2)
def evaluate2(df1, df2):
    def plot(df, output):
        # A fresh figure and axes on every call, so plots never overlap
        fig, ax = plt.subplots()
        df.boxplot(ax=ax, rot=45, fontsize=5)
        fig.savefig(output + ".pdf")

    df_ot = pd.DataFrame(columns=['opt_time1', 'opt_time2'])
    df_ot['opt_time1'] = df1['ff_opt_time']
    df_ot['opt_time2'] = df2['ff_opt_time']
    plot(df_ot, "bp_opt_time")

    df_op = pd.DataFrame(columns=['count_opt1', 'count_opt2'])
    df_op['count_opt1'] = df1['ff_count_opt']
    df_op['count_opt2'] = df2['ff_count_opt']
    plot(df_op, "bp_count_opt_perm")

evaluate2(df1, df2)

Related

Bar graph df.plot() vs ax.bar() structure matplotlib

I am trying to graph a table as a bar graph.
I get my desired outcome using df.plot(kind='bar') structure. But for certain reasons, I now need to graph it using the ax.bar() structure.
Please refer to the example screenshot. I would like to graph the x axis as categorical labels like the df.plot(kind='bar') structure rather than continuous scale, but need to learn to use ax.bar() structure to do the same.
Make the index categorical by setting the type to 'str'
import pandas as pd
import matplotlib.pyplot as plt
data = {'SA': [11, 12, 13, 16, 17, 159, 209, 216],
'ET': [36, 45, 11, 15, 16, 4, 11, 10],
'UT': [11, 26, 10, 11, 16, 7, 2, 2],
'CT': [5, 0.3, 9, 5, 0.2, 0.2, 3, 4]}
df = pd.DataFrame(data)
df['SA'] = df['SA'].astype('str')
df.set_index('SA', inplace=True)
width = 3
fig, ax = plt.subplots(figsize=(12, 8))
p1 = ax.bar(df.index, df.ET, color='b', label='ET')
p2 = ax.bar(df.index, df.UT, bottom=df.ET, color='g', label='UT')
p3 = ax.bar(df.index, df.CT, bottom=df.ET+df.UT, color='r', label='CT')
plt.legend()
plt.show()

Why does setting hue in seaborn plot change the size of a point?

The plot I am trying to make needs to achieve 3 things.
If a quiz is taken on the same day with the same score, that point needs to be bigger.
If two quiz scores overlap there needs to be some jitter so we can see all points.
Each quiz needs to have its own color
Here is how I am going about it.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
data = {'Quiz': [1, 1, 2, 1, 2, 1],
        'Score': [7.5, 5.0, 10, 10, 10, 10],
        'Day': [2, 5, 5, 5, 11, 11],
        'Size': [115, 115, 115, 115, 115, 355]}
df = pd.DataFrame.from_dict(data)
sns.lmplot(x='Day', y='Score', data=df, fit_reg=False, x_jitter=True, scatter_kws={'s': df.Size})
plt.show()
Setting the hue, which almost does everything I need, results in this.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
data = {'Quiz': [1, 1, 2, 1, 2, 1],
        'Score': [7.5, 5.0, 10, 10, 10, 10],
        'Day': [2, 5, 5, 5, 11, 11],
        'Size': [115, 115, 115, 115, 115, 355]}
df = pd.DataFrame.from_dict(data)
sns.lmplot(x='Day', y='Score', data=df, fit_reg=False, hue='Quiz', x_jitter=True, scatter_kws={'s': df.Size})
plt.show()
Is there a way I can have hue while keeping the size of my points?
It doesn't work because when you are using hue, seaborn does two separate scatterplots and therefore the size argument you are passing using scatter_kws= no longer aligns with the content of the dataframe.
You can recreate the same effect by hand however:
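import numpy as np
import matplotlib.pyplot as plt

x_col = 'Day'
y_col = 'Score'
hue_col = 'Quiz'
size_col = 'Size'
jitter = 0.2
fig, ax = plt.subplots()
# One scatter call per hue group, so each group keeps its own sizes
for q, temp in df.groupby(hue_col):
    n = len(temp[x_col])
    # Gaussian jitter on x so that overlapping points stay visible
    x = temp[x_col] + np.random.normal(scale=jitter, size=(n,))
    ax.scatter(x, temp[y_col], s=temp[size_col], label=q)
ax.set_xlabel(x_col)
ax.set_ylabel(y_col)
ax.legend(title=hue_col)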

How to keep the index when using pd.melt and merge to create a DataFrame for Seaborn and matplotlib

I am trying to draw subplots using two DataFrames (predicted and observed) with exactly the same structure; the first column is the index.
The code below creates a new index when the two are melted with pd.melt and concatenated.
As you can see in the figure, the index of the orange line changes from 1-5 to 6-10.
I was wondering if someone could fix the code below to keep the same index for the orange line:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
actual = pd.DataFrame({'a': [5, 8, 9, 6, 7, 2],
'b': [89, 22, 44, 6, 44, 1]})
predicted = pd.DataFrame({'a': [7, 2, 13, 18, 20, 2],
'b': [9, 20, 4, 16, 40, 11]})
# Creating a tidy-dataframe to input under seaborn
merged = pd.concat([pd.melt(actual), pd.melt(predicted)]).reset_index()
merged['category'] = ''
merged.loc[:len(actual)*2,'category'] = 'actual'
merged.loc[len(actual)*2:,'category'] = 'predicted'
g = sns.FacetGrid(merged, col="category", hue="variable")
g.map(plt.plot, "index", "value", alpha=.7)
g.add_legend();
The orange line ('variable' == 'b') doesn't have an index of 0-5 because of how you used melt. If you look at pd.melt(actual), the index doesn't match what you are expecting, IIUC.
Here is how I would rearrange the dataframe:
merged = pd.concat([actual, predicted], keys=['actual', 'predicted'])
merged.index.names = ['category', 'index']
merged = merged.reset_index()
merged = pd.melt(merged, id_vars=['category', 'index'], value_vars=['a', 'b'])
Alternatively, set the ignore_index parameter to False to preserve the index, e.g.
df = df.melt(var_name='species', value_name='height', ignore_index=False)
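To illustrate what ignore_index=False changes, here is a small sketch that melts the actual frame from the question. It assumes pandas 1.1 or later, where the ignore_index parameter of melt is available; the variable name melted is only illustrative.

```python
import pandas as pd

actual = pd.DataFrame({'a': [5, 8, 9, 6, 7, 2],
                       'b': [89, 22, 44, 6, 44, 1]})

# With ignore_index=False the original row labels are kept, so they repeat
# once per melted column instead of being renumbered from 0 to 11.
melted = actual.melt(ignore_index=False)
print(melted.index.tolist())
# → [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]
```

Calling reset_index() on the result then yields an "index" column that runs 0-5 for both 'a' and 'b', which is what the FacetGrid call in the question plots on the x axis.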

python - Plotting bar graph side by side on the same graph with seaborn

I need to try to plot 3 bars on the same graph. I have 2 dataframes set up right now. My first dataframe was created off a JSON file seen here.
My second dataframe was created in the code below:
def make_bar_graph():
    with open('filelocation.json') as json_file:
        data = json.load(json_file)
    df = pd.DataFrame([])
    for item in data["Results"]["Result"]:
        df = df.append(pd.DataFrame.from_dict(kpi for kpi in item["KPI"]))
    df.reset_index(level=0, inplace=True)
    df.rename(columns={0: 'id', 1: 'average', 2: 'std. dev', 3: 'min',
                       4: 'median', 5: 'max'}, inplace=True)
    wanted_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
    wanted_y = [5, 5, .500, .500, .500, 1, 1, 5, 5, .500, .500, .500, 1, 1]
    kpi = ['kpi1', 'kpi2', 'kpi3', 'kpi4', 'kpi5', 'kpi6', 'kpi7', 'kpi8',
           'kpi9', 'kpi10', 'kpi11', 'kpi12', 'kpi13', 'kpi14']
    df2 = pd.DataFrame(dict(x=wanted_x, y=wanted_y, kpi=kpi))
    sns.set()
    sns.set_context("talk")
    sns.axes_style("darkgrid")
    h = sns.barplot(x='id', y='average', data=df.ix[0:13],
                    label='Test on 4/30/2018', color='b')
    g = sns.barplot(x='id', y='average', data=df.ix[14:27],
                    label='Test on 6/4/2018', color='r')
    k = sns.barplot("x", "y", data=df2, label='Desired Results', color='y')
    plt.legend()
    plt.xlabel('KPI number')
    plt.ylabel('Time(s)')
    plt.show()
This is the graph I get from that:
Graph1
I need the bars to be next to each other, separated by id (or KPI, id number and KPI number are the same things). I'm not sure how to rework my dataframe to do this

Joining two Pandas dataframes and producing side-by-side barplot?

Suppose I have two Pandas dataframes, df1 and df2, each with two columns, hour and value. Some of the hours are missing in the two dataframes.
import pandas as pd
import matplotlib.pyplot as plt
data1 = [
('hour', [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]),
('value', [12.044324085714285, 8.284134466666668, 9.663580800000002,
18.64010145714286, 15.817029916666664, 13.242198508695651,
10.157177889201877, 9.107153674476985, 10.01193336545455,
16.03340384878049, 16.037368506666674, 16.036160044827593,
15.061596637500001, 15.62831551764706, 16.146087032608694,
16.696574719512192, 16.02603831463415, 17.07469460470588,
14.69635686969697, 16.528905725581396, 12.910250661111112,
13.875522341935481, 12.402971938461539])
]
df1 = pd.DataFrame(dict(data1))  # DataFrame.from_items was removed in pandas 1.0
df1.head()
# hour value
# 0 0 12.044324
# 1 1 8.284134
# 2 2 9.663581
# 3 4 18.640101
# 4 5 15.817030
data2 = [
('hour', [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23]),
('value', [27.2011904, 31.145661266666668, 27.735570511111113,
18.824297487999996, 17.861847334275623, 25.3033003254902,
22.855934450000003, 31.160574200000003, 29.080220000000004,
30.987719745454548, 26.431310216666663, 30.292641480000004,
27.852885586666666, 30.682682472727276, 29.43023531764706,
24.621718962500005, 33.92878745, 26.873105866666666,
34.06412232, 32.696606333333335])
]
df2 = pd.DataFrame(dict(data2))  # DataFrame.from_items was removed in pandas 1.0
df2.head()
# hour value
# 0 0 27.201190
# 1 5 31.145661
# 2 6 27.735571
# 3 7 18.824297
# 4 8 17.861847
I would like to join them together using the key of hour and then produce a side-by-side barplot of the data. The x-axis would be hour, and the y-axis would be value.
I can create a bar plot of one dataframe at a time.
_ = plt.bar(df1.hour.tolist(), df1.value.tolist())
_ = plt.xticks(df1.hour, rotation=0)
_ = plt.grid()
_ = plt.show()
_ = plt.bar(df2.hour.tolist(), df2.value.tolist())
_ = plt.xticks(df2.hour, rotation=0)
_ = plt.grid()
_ = plt.show()
However, what I want is to create a barchart of them side by side, like this:
Thank you for any help.
You can do it all in one line, if you wish, by making use of the pandas plotting wrapper and the fact that plotting a DataFrame with several columns groups the bars. Given the definitions of df1 and df2 from the question, you can call
pd.merge(df1,df2, how='outer', on=['hour']).set_index("hour").plot.bar()
plt.show()
resulting in
Note that this leaves out hour 3, as it does not appear in the hour column of either DataFrame. To include it, use reindex:
pd.merge(df1,df2, how='outer', on=['hour']).set_index("hour").reindex(range(24)).plot.bar()
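To see what the merged frame looks like before it is plotted, here is a small sketch with shortened, made-up data (the values and the suffixes _df1 and _df2 are illustrative assumptions, not taken from the question). Hours missing from one frame become NaN in that frame's column, and reindex adds a row for any hour missing from both.

```python
import pandas as pd

# Shortened stand-ins for the question's frames: hour 3 is missing from both
df1 = pd.DataFrame({"hour": [0, 1, 2, 4], "value": [12.0, 8.3, 9.7, 18.6]})
df2 = pd.DataFrame({"hour": [0, 4, 5], "value": [27.2, 31.1, 27.7]})

merged = (
    pd.merge(df1, df2, how="outer", on="hour", suffixes=("_df1", "_df2"))
    .set_index("hour")
    .reindex(range(6))  # guarantees one row per hour 0..5, NaN where absent
)
print(merged)
```

Calling merged.plot.bar() on this frame draws the two value columns side by side for each hour and leaves a gap wherever a value is NaN.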
First reindex the dataframes and then create two barplots using the data. The positioning of the rectangles is given by (x - width/2, x + width/2, bottom, bottom + height).
import numpy as np
index = np.arange(0, 24)
bar_width = 0.3
df1 = df1.set_index('hour').reindex(index)
df2 = df2.set_index('hour').reindex(index)
plt.figure(figsize=(10, 5))
plt.bar(index - bar_width / 2, df1.value, bar_width, label='df1')
plt.bar(index + bar_width / 2, df2.value, bar_width, label='df2')
plt.xticks(index)
plt.legend()
plt.tight_layout()
plt.show()
