Box plot of many pandas DataFrames - python

I have three dataframes, each containing 17 sets of data for groups A, B, and C, as shown in the following code snippet:
import pandas as pd
import numpy as np
data1 = pd.DataFrame(np.random.rand(17,3), columns=['A','B','C'])
data2 = pd.DataFrame(np.random.rand(17,3)+0.2, columns=['A','B','C'])
data3 = pd.DataFrame(np.random.rand(17,3)+0.4, columns=['A','B','C'])
I would like to draw a box plot to compare the three groups, as shown in the figure below.
I am trying to make the plot using seaborn's box plot as follows:
import seaborn as sns
sns.boxplot(data1, groupby='A','B','C')
but obviously this does not work. Can someone please help?

Consider assigning an indicator like Location to distinguish your three sets of data. Then concatenate all three and melt the data to retrieve one value column, one Letter categorical column, and one Location column, all inputs into sns.boxplot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data1 = pd.DataFrame(np.random.rand(17,3), columns=['A','B','C']).assign(Location=1)
data2 = pd.DataFrame(np.random.rand(17,3)+0.2, columns=['A','B','C']).assign(Location=2)
data3 = pd.DataFrame(np.random.rand(17,3)+0.4, columns=['A','B','C']).assign(Location=3)
cdf = pd.concat([data1, data2, data3])
mdf = pd.melt(cdf, id_vars=['Location'], var_name='Letter')
print(mdf.head())
#    Location Letter     value
# 0         1      A  0.223565
# 1         1      A  0.515797
# 2         1      A  0.377588
# 3         1      A  0.687614
# 4         1      A  0.094116
ax = sns.boxplot(x="Location", y="value", hue="Letter", data=mdf)
plt.show()
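The reshaping step can be checked on its own before plotting. The following is a minimal sketch, not the answer's exact code: it assumes a seeded generator for reproducibility and lets pd.concat build the Location label from dictionary keys instead of calling assign on each frame. The index name "row" is a placeholder introduced here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
frames = {
    1: pd.DataFrame(rng.random((17, 3)), columns=["A", "B", "C"]),
    2: pd.DataFrame(rng.random((17, 3)) + 0.2, columns=["A", "B", "C"]),
    3: pd.DataFrame(rng.random((17, 3)) + 0.4, columns=["A", "B", "C"]),
}

# The dict keys become an outer index level, named "Location" here;
# reset_index moves that level into an ordinary column.
cdf = pd.concat(frames, names=["Location", "row"]).reset_index(level="Location")

# Long format: one row per (Location, Letter) observation
mdf = cdf.melt(id_vars="Location", var_name="Letter")

# Each Location/Letter pair should hold the 17 original observations
counts = mdf.groupby(["Location", "Letter"]).size()
```

The resulting mdf has the same three columns as in the answer, so it can be passed to sns.boxplot with x="Location", y="value", hue="Letter" in the same way.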

Related

Rearranging the columns of my heatmap using python's seaborn

I'm trying to visualize the following .csv data:
Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8,Q9,Q10,Q11,Q12,Q13,Q14,Q15,Q16,Q17,Q18,Q19,Q20
4,4,2,2,4,2,3,5,3,4,2,5,2,1,4,4,2,1,5,2
2,2,4,4,4,2,2,2,4,4,2,4,2,2,3,2,2,4,5,2
4,5,4,1,4,2,2,4,4,3,2,2,2,1,2,4,4,2,5,4
3,4,2,4,4,2,2,2,4,3,2,4,4,3,3,4,2,4,5,1
4,4,3,2,4,3,4,5,4,3,1,5,3,2,4,2,2,3,4,2
4,5,2,3,5,1,3,4,3,3,1,2,4,4,5,4,1,4,5,4
5,5,5,2,4,3,2,4,4,2,2,4,4,2,4,2,2,4,4,5
4,4,3,1,5,3,2,4,2,2,1,4,4,2,4,1,2,5,5,3
1,3,5,2,4,4,3,1,4,4,2,3,1,4,3,4,3,3,4,1
3,3,5,2,4,2,4,4,3,4,1,5,4,2,1,2,2,4,5,2
Here's my code:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
map = sns.clustermap(df, annot=True, linewidths=2, linecolor='yellow', metric="correlation", method="single")
plt.show()
Which returns:
I want to rearrange my heatmap and order it column-wise by the frequency of the most common response in each column. For example, column Q5 has the value 4 repeated 8 times (more than any value in any other column), so it should be the first column. Columns Q17 and Q19 each have a value that is repeated 7 times, so they should come second and third (their exact order doesn't matter). How can I do this?
You can compute the order and reindex before using the data in clustermap:
order = (df.apply(pd.Series.value_counts)
.max()
.sort_values(ascending=False)
.index
)
import seaborn as sns
cm = sns.clustermap(df[order], col_cluster=False, annot=True, linewidths=2, linecolor='yellow', metric="correlation", method="single")
Output:
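To see what the ordering expression computes, here is a small sketch on a made-up three-column frame (the data is hypothetical and chosen so there are no ties). value_counts is applied per column, max picks the count of the most frequent value, and the sorted index gives the new column order.

```python
import pandas as pd

# Hypothetical toy data: Q2 repeats a value 4 times, Q3 three times, Q1 never
toy = pd.DataFrame({
    "Q1": [1, 2, 3, 4],
    "Q2": [4, 4, 4, 4],
    "Q3": [2, 2, 2, 5],
})

# Rows of `per_value` are the distinct answers, columns are the questions
per_value = toy.apply(pd.Series.value_counts)

# Highest count per column, most repetitive column first
top_counts = per_value.max().sort_values(ascending=False)
order = top_counts.index

reordered = toy[order]
```

When several columns share the same top count, their relative order depends on the sort and should not be relied on.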

Bar plot and coloured categorical variable

I have a dataframe with 3 variables:
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Approved"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
I want a bar plot grouped by the Period column, showing all the values contained in the Observations column and colored by the Result column.
How can I do this?
I tried sns.barplot, but it merged the values in the Observations column into a single bar (the mean of the values):
sns.barplot(x='Period',y='Observations',hue='Result',data=df,ci=None)
Plot output
Assuming that you want one bar for each row, you can do as follows:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
result_cat = df["Result"].astype("category")
result_codes = result_cat.cat.codes.values
cmap = plt.cm.Dark2(range(df["Result"].unique().shape[0]))
patches = []
for code in result_cat.cat.codes.unique():
    cat = result_cat.cat.categories[code]
    patches.append(mpatches.Patch(color=cmap[code], label=cat))
df.plot.bar(x='Period',
y='Observations',
color=cmap[result_codes],
legend=False)
plt.ylabel("Observations")
plt.legend(handles=patches)
If you would like the bars grouped by month and then stacked, use the following. Note that I changed your data so that one month has more than one status. I am not sure I understood your question completely.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Under evaluation"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
df.groupby(['Period', 'Result'])['Observations'].sum().unstack('Result').plot(kind='bar', stacked=True)
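It can help to look at the table that the groupby/unstack chain builds before it is plotted. This sketch reuses the data from the answer above and stops short of plotting. The chronological reindex at the end is an optional extra step added here, not part of the original answer.

```python
import pandas as pd

data = [["2019/oct", 10, "Approved"], ["2019/oct", 20, "Approved"],
        ["2019/oct", 30, "Approved"], ["2019/oct", 40, "Under evaluation"],
        ["2019/nov", 20, "Under evaluation"], ["2019/dec", 30, "Aproved"]]
df = pd.DataFrame(data, columns=["Period", "Observations", "Result"])

# One row per Period, one column per Result, cells hold summed Observations;
# combinations that never occur become NaN and are simply not drawn.
wide = df.groupby(["Period", "Result"])["Observations"].sum().unstack("Result")

# groupby sorts Period as text (dec, nov, oct); reindex to restore calendar order
wide_chrono = wide.reindex(["2019/oct", "2019/nov", "2019/dec"])
```

Because the source data spells one label "Aproved", that row ends up in its own column, separate from "Approved". Calling wide_chrono.plot(kind='bar', stacked=True) would then draw the bars in calendar order.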

Pandas 2 dataframes into one graph

I have 2 separate dataframes that have the same structure but contain different numbers:
df = pd.DataFrame({'clip emotes':[79,223,435,291,188,99,153,50,55,78,83,48,43,73]}, index=['roohappy','rooblank','lul','omegalul','pog','pogchamp','roovv','roowut','roopog','pepehands','biblethumb','roocry','rooree','rooblind'])
df
and
df = pd.DataFrame({'vod emotes':[3963,7286,5560,4390,3386,3111,2639,2612,2422,1999,1948,1691,1654,1573,1308,1090,1024,1019,1019,974,945,912,893,856,790,771,731,677,658,652]}, index=['rood','roovv','pepega','lul','clap','rookek','roocult','rooblank','pog','rooree','rooaww','roohappy','omegaroll','rooduck','rooh','rareroo','roocry','pepehand','lulw','rooderp','roopog','hyperclap','roospy','rooayaya','omegalul','roolove','roowut','roonya','monkas','roo4'])
df
and then I call df.plot(kind='bar') on each of them separately. I can't figure out how to put both datasets into one graph, one over the other, so that bars with the same name are stacked on top of each other in different colours.
You can do it by joining them:
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.DataFrame({'clip emotes':[79,223,435,291,188,99,153,50,55,78,83,48,43,73]}, index=['roohappy','rooblank','lul','omegalul','pog','pogchamp','roovv','roowut','roopog','pepehands','biblethumb','roocry','rooree','rooblind'])
df2 = pd.DataFrame({'vod emotes':[3963,7286,5560,4390,3386,3111,2639,2612,2422,1999,1948,1691,1654,1573,1308,1090,1024,1019,1019,974,945,912,893,856,790,771,731,677,658,652]}, index=['rood','roovv','pepega','lul','clap','rookek','roocult','rooblank','pog','rooree','rooaww','roohappy','omegaroll','rooduck','rooh','rareroo','roocry','pepehand','lulw','rooderp','roopog','hyperclap','roospy','rooayaya','omegalul','roolove','roowut','roonya','monkas','roo4'])
df3 = df2.join(df1)
df3.plot(kind='bar', stacked=True)
plt.tight_layout()
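One thing to be aware of: DataFrame.join performs a left join by default, so names that appear only in df1 are dropped from the result. The sketch below uses a small subset of the data above to show the difference. Whether to keep those names with how="outer" is a choice for the reader, not something the original answer prescribes.

```python
import pandas as pd

# Small subset of the question's data, for illustration only
df1 = pd.DataFrame({"clip emotes": [79, 435, 99]},
                   index=["roohappy", "lul", "pogchamp"])
df2 = pd.DataFrame({"vod emotes": [3963, 4390, 1691]},
                   index=["rood", "lul", "roohappy"])

# Default left join: keeps only the index labels of df2
left = df2.join(df1)

# Outer join: keeps every label from either frame, filling gaps with NaN
outer = df2.join(df1, how="outer")
```

Either result can be plotted with plot(kind='bar', stacked=True) as in the answer. Missing values are left out of the stack.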

Python matplotlib - plot box plots from 2 different data frames

Hello,
I'm trying to plot a box plot combining columns from two different data frames. Help please :)
This is the code:
import pandas as pd
from numpy import random
#Generating the data frame
df1 = pd.DataFrame(data = random.randn(5,2), columns = ['W','Y'])
df2 = pd.DataFrame(data = random.randn(5,2), columns = ['X','Y'])
print(df1.head())
print('\n')
print(df2.head())
This is the output:
This is what I want to get:
The following will give you what you desire:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
ax.boxplot([df1['Y'], df2['Y']], positions=[1, 2])
ax.set_xticklabels(['W', 'X'])
ax.set_ylabel('Y')
This gave me the plot below (which I think is what you were aiming for):
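An alternative is to first gather the two Y columns into a single frame, so that the box labels come from the column names. This is only a sketch: the labels "df1" and "df2" are placeholders chosen here, and the seeded generator stands in for the random data in the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
df1 = pd.DataFrame(rng.standard_normal((5, 2)), columns=["W", "Y"])
df2 = pd.DataFrame(rng.standard_normal((5, 2)), columns=["X", "Y"])

# Dict keys become the column names of the combined frame
combined = pd.concat({"df1": df1["Y"], "df2": df2["Y"]}, axis=1)
```

Calling combined.plot.box() would then draw one box per column with those names on the x axis.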

Pandas.plot(subplots=True) with 3 columns in each subplot

I have a DataFrame with 700 rows and 6 columns:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(700,6))
I can plot all columns in a single plot by calling:
df.plot()
And I can plot each column in a single plot by calling:
df.plot(subplots=True)
How can I have two subplots with three columns each from my DataFrame?!
Here's a general approach to plot a dataframe with n columns in each subplot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(700,6))
col_per_plot = 3
cols = df.columns.tolist()
# Create groups of 3 columns
cols_splits = [cols[i:i+col_per_plot] for i in range(0, len(cols), col_per_plot)]
# Define plot grid.
# Here I assume it is always one row and many columns. You could get fancier...
fig, axarr = plt.subplots(1, len(cols_splits))
# Plot each "slice" of the dataframe in a different subplot
for cc, ax in zip(cols_splits, axarr):
    df.loc[:, cc].plot(ax = ax)
This gives the following picture:
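The column-splitting step is plain list slicing and can be tried without any plotting. Below is a minimal sketch; chunk_columns is a hypothetical helper name introduced here, not part of pandas or matplotlib.

```python
def chunk_columns(columns, per_plot):
    """Split a list of column labels into consecutive groups of at most per_plot."""
    return [columns[i:i + per_plot] for i in range(0, len(columns), per_plot)]


# Six columns split evenly into two groups of three
six = chunk_columns(list(range(6)), 3)

# Seven columns leave a final, shorter group
seven = chunk_columns(list(range(7)), 3)
```

If the split yields a single group, plt.subplots(1, 1) returns one Axes object instead of an array, so the zip in the loop above would fail. Passing squeeze=False to plt.subplots and flattening the returned array is one way to handle that case.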
