I would like to use seaborn (matplotlib would be good, too) to create a barplot from my DataFrame.
But judging from the docs, the barplot function expects a list of values that looks like this:
Then you can plot it with:
tips = sns.load_dataset("tips")
sns.barplot(x="day", y="total_bill", hue="sex", data=tips)
My data looks different: I have created a MultiIndex in the columns. Since I can't publish my original data, here is a mockup of how it would look for the tips dataset:
And here is the code that creates the above dataframe:
index_tuples=[]
for sex in ["Male", "Female"]:
    for day in ["Sun", "Mon"]:
        index_tuples.append([sex, day])
index = pd.MultiIndex.from_tuples(index_tuples, names=["sex", "day"])
dataframe = pd.DataFrame(columns = index)
total_bill = {"Male":{"Sun":5, "Mon":3},"Female":{"Sun":10, "Mon":5}}
dataframe = dataframe.append(pd.DataFrame.from_dict(total_bill).unstack().rename('total_bill'))
Now, my question is: How can I create a barplot from this MultiIndex?
The solution should group the bars correctly, as does the hue argument of seaborn. Simply getting the data as an array and passing it to matplotlib doesn't work.
My solution so far is to convert the MultiIndex into columns by repeatedly stacking the DataFrame, like this:
stacked_frame = dataframe.stack().stack().to_frame().reset_index()
It results in the data layout expected by seaborn:
And you can plot it with
sns.barplot(x="day", y=0, hue="sex", data=stacked_frame)
plt.show()
Can I create a barplot directly from the MultiIndex?
Is this what you are looking for?
idx = pd.MultiIndex.from_product([['M', 'F'], ['Mo', 'Tu', 'We']], names=['sex', 'day'])
df = pd.DataFrame(np.random.randint(2, 10, size=(idx.size, 1)), index=idx, columns=['total bill'])
df.unstack(level=0)['total bill'].plot(
kind='bar'
)
plt.ylabel('total bill');
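For the column MultiIndex from the question, a similar idea might work without stacking twice. The following is a minimal sketch, assuming the frame has a single row named total_bill and column levels named sex and day as in the mockup. The mockup frame is rebuilt here with from_product for brevity, and the name wide is only illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the mockup: one row, MultiIndex (sex, day) in the columns
columns = pd.MultiIndex.from_product(
    [["Male", "Female"], ["Sun", "Mon"]], names=["sex", "day"]
)
dataframe = pd.DataFrame([[5, 3, 10, 5]], index=["total_bill"], columns=columns)

# Selecting the row gives a Series indexed by (sex, day).
# Unstacking 'sex' leaves the days as rows and one column per sex.
wide = dataframe.loc["total_bill"].unstack(level="sex")

# pandas draws one group of bars per row and one bar per column
ax = wide.plot(kind="bar")
ax.set_ylabel("total_bill")
```

Calling plt.show() afterwards displays the figure. The bars are grouped by day with one color per sex, which is roughly what the hue argument does in seaborn.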
I have a df that looks like this:
image of the dataframe
My goal is to make a line chart that sums up the codes for each month and, after this, add a dropdown to be able to filter by 'type', 'group' and 'Spec.'.
If I didn't want the dropdown filter, I could achieve this with
`df.groupby('month')['code'].count().reset_index()`
Since I need the filters, ideally the sum would be done in the Plotly graph code itself, so I don't lose the 'type', 'group' and 'Spec.' columns.
I tried this code:
`line_fig1 = px.line(data_frame = df,
x= 'month',
y='code',
labels={'month':'','code':''},
title='',
width=450,
height=250,
template='plotly_white',
color_discrete_sequence= ["rgb(1, 27, 105)"],
markers=True,
text='code'
)`
and this was the result:
image of the chart
I also tried something like
`line_fig1 = px.line(data_frame = df,
x= 'month',
y='code'.count()`
or even tried to add a column holding the number one, so the chart could aggregate
`df['assign_value'] = 1
line_fig1 = px.line(data_frame = df,
x= 'month',
y='assign_value'`
But this doesn't work either.
Any help here?
I think you should group by month and code and then use the new DataFrame to make the line graph. Something like this:
df2 = df.groupby(['month', 'code'])['code'].count().reset_index(name='counts')
fig = px.line(df2,x='month',y='counts', color='code')
fig.show()
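If the filter columns need to survive the aggregation, one option might be to include them in the groupby keys. Below is a minimal sketch in pandas only, assuming a small hypothetical frame with month, type and code columns standing in for the real data.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame in the question
df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "type": ["A", "B", "A", "A", "B"],
    "code": [101, 102, 103, 104, 105],
})

# Count codes per month AND per filter column, so 'type' is still
# available for a dropdown or a color argument afterwards
counts = (
    df.groupby(["month", "type"], sort=False)["code"]
    .count()
    .reset_index(name="counts")
)
```

The resulting counts frame can then be passed to px.line with x='month', y='counts' and color='type', in the same way as df2 above.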
I have two different DataFrames in Python: one holds the actual revenue values and the second one holds the predicted values, cumulative per day (the index of the rows). Both DataFrames have the same length.
I want to compare them on the same plot, row by row. If I want to plot only one row from each DataFrame, I use this code:
df_actual.loc[71].T.plot(figsize=(14,10), kind='line')
df_preds.loc[71].T.plot(figsize=(14,10), kind='line')
The output is this:
However, the ideal output is to have all the rows for each DataFrame in a grid so I can compare all the results:
I have tried to create a for loop to iterate over each row, but it is not working:
for i in range(20):
    df_actual.loc[i].T.plot(figsize=(14,10), kind='line')
    df_preds.loc[i].T.plot(figsize=(14,10), kind='line')
Is there any way to do this that is not manual? Thanks!
it would be helpful if you provided a sample of your dfs.
assuming both dfs have the same length & assuming you want 2 columns, try this:
fig, ax = plt.subplots(round(len(df_actual)/2),2)
# ravel() returns a flattened array, so the result has to be assigned
ax = ax.ravel()
for i in range(len(ax)):
    sns.lineplot(data=df_actual.loc[i].T, ax=ax[i], color="navy")
    sns.lineplot(data=df_preds.loc[i].T, ax=ax[i], color="orange")
edit:
this works for me (you just have to add your .T):
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df_actual = pd.DataFrame(data=[[1,2,3,4,5], [6,7,8,9,10]], columns = ["col1","col2", "col3", "col4", "col5"])
df_pred = pd.DataFrame(data=[[3,4,5,6,7], [8,9,10,11,12]], columns = ["col1", "col2", "col3", "col4", "col5"])
fig, ax = plt.subplots(round(len(df_actual)/2),2)
# assign the flattened array so that ax[i] is a single Axes
ax = ax.ravel()
for i in range(len(ax)):
    ax[i].plot(df_actual.loc[i], color="navy")
    ax[i].plot(df_pred.loc[i], color="orange")
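When the number of rows is odd, the grid has one cell more than there are rows. Here is a rough sketch of one way to handle that, assuming two small hypothetical frames and a two-column layout. The names ncols, nrows and axes are illustrative only.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: three rows, so a 2-column grid has one spare cell
df_actual = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["d1", "d2", "d3"])
df_preds = df_actual + 1

ncols = 2
nrows = math.ceil(len(df_actual) / ncols)  # round up so every row gets a cell

# squeeze=False always returns a 2D array, which ravel() then flattens
fig, axes = plt.subplots(nrows, ncols, squeeze=False, figsize=(8, 6))
axes = axes.ravel()

for i, ax in enumerate(axes):
    if i >= len(df_actual):
        ax.set_visible(False)  # hide grid cells that have no data row
        continue
    ax.plot(df_actual.columns, df_actual.iloc[i], color="navy", label="actual")
    ax.plot(df_preds.columns, df_preds.iloc[i], color="orange", label="prediction")
    ax.set_title(f"row {df_actual.index[i]}")
```

Using iloc here means the loop goes by position, so it should not depend on the index being 0, 1, 2 and so on.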
I am trying to improve my visualizations in Python. Assume I have this data:
data = {'animal':['cat', 'tiger', 'leopard', 'dog'], 'family':['mustelids ','felidae','felidae', 'canidae'], 'family_pct':[6.06,33.33,9.09,12.12]}
df = pd.DataFrame(data)
I want to create a barplot as follows:
fig, ax = plt.subplots()
sns.barplot(x = 'family', y = 'family_pct', hue='animal', data = df)
However, I would like each "family" to be plotted separately (one plot for mustelids, one for felidae, and one for canidae) and not on the same plot. I effectively would like to loop the graph over every value of the family column. However, I am not sure how to go about this.
Thanks!
Use catplot() to combine a barplot() and a FacetGrid. This allows grouping within additional categorical variables. Using catplot() is safer than using FacetGrid directly, as it ensures synchronization of variable order across facets:
sns.catplot(x = 'family', y = 'family_pct', hue='animal', col='family', data = df, kind='bar', height=4, aspect=.7)
See more details here.
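If an explicit loop over the family values is preferred, as the question describes, something along these lines could work with plain matplotlib. This is only a sketch, assuming the data from the question (with the trailing space in 'mustelids ' removed) and one Axes per family.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Data from the question, with the trailing space in 'mustelids ' removed
data = {
    "animal": ["cat", "tiger", "leopard", "dog"],
    "family": ["mustelids", "felidae", "felidae", "canidae"],
    "family_pct": [6.06, 33.33, 9.09, 12.12],
}
df = pd.DataFrame(data)

# One subplot per distinct family, sharing the y axis for comparability
n_families = df["family"].nunique()
fig, axes = plt.subplots(1, n_families, squeeze=False, sharey=True, figsize=(9, 3))
axes = axes.ravel()

# groupby yields (family name, sub-frame) pairs in sorted key order
for ax, (family, group) in zip(axes, df.groupby("family")):
    ax.bar(group["animal"], group["family_pct"])
    ax.set_title(family)
```

To get fully separate figures instead of panels, plt.subplots() could be called inside the loop.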
I have a pandas dataframe:
import pandas as pd
data1 = {'Date':['03-19-2019'],
'Total':[35],
'Solved':[19],
'Arrived':[23],
}
df1 = pd.DataFrame(data1)
and I want to plot a bar plot like this:
with
df1.plot(kind='barh',x='Date',y='Total', ax=ax0, color='#C0C0C0',
width=0.5)
df1.plot(kind='barh',x='Date',y='Arrived', ax=ax0, color='#C0FFFF',
width=0.5)
df1.plot(kind='barh',x='Date',y='Solved', ax=ax0, color='#C0C0FF',
width=0.5)
However, to avoid overlapping, I have to draw the columns in order of their values (Total is greater than Arrived, which is greater than Solved).
How can I avoid doing this by hand and automate the process easily?
There must be a more straightforward approach in pandas, but I just came up with this quick workaround. The idea is as follows:
Leave out the first column Date and sort the remaining columns.
Use the sorted indices to plot the columns in descending order of value, so the largest bar is drawn first.
To keep the colors consistent, you can use a dictionary so that the plotting order doesn't affect your colors.
import numpy as np
import matplotlib.pyplot as plt
fig, ax0 = plt.subplots()
# indices of the value columns, sorted from largest to smallest
ids = np.argsort(df1.values[0][1:])[::-1]
colors = {'Total': '#C0C0C0', 'Arrived': '#C0FFFF', 'Solved':'#C0C0FF'}
for col in np.array(df1.columns[1:].tolist())[ids]:
    df1.plot(kind='barh',x='Date',y=col, ax=ax0, color=colors[col], width=0.1)
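The same ordering can probably be expressed with pandas alone. The following sketch assumes the single-row df1 from the question and uses sort_values on that row. The name order is illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

df1 = pd.DataFrame(
    {"Date": ["03-19-2019"], "Total": [35], "Solved": [19], "Arrived": [23]}
)
colors = {"Total": "#C0C0C0", "Arrived": "#C0FFFF", "Solved": "#C0C0FF"}

# Take the first row without 'Date' and sort it from largest to smallest;
# the resulting index gives the column names in drawing order
order = df1.drop(columns="Date").iloc[0].sort_values(ascending=False).index

fig, ax0 = plt.subplots()
for col in order:
    # Larger bars are drawn first so smaller ones stay visible on top
    df1.plot(kind="barh", x="Date", y=col, ax=ax0, color=colors[col], width=0.5)
```

With more than one row this per-row ordering would need to be reconsidered, since different rows may rank the columns differently.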
A stacked bar graph can be produced in pandas via the stacked=True option. To use this you need to make the "Date" the index first.
import matplotlib.pyplot as plt
import pandas as pd
data1 = {'Date':['03-19-2019'],
'Total':[35],
'Solved':[19],
'Arrived':[23],
}
df = pd.DataFrame(data1)
df.set_index("Date").plot(kind="barh", stacked=True)
plt.show()
Given the following:
import numpy as np
import pandas as pd
import seaborn as sns
np.random.seed(365)
x1 = np.random.randn(50)
y1 = np.random.randn(50) * 100
x2 = np.random.randn(50)
y2 = np.random.randn(50) * 100
df1 = pd.DataFrame({'x1':x1, 'y1': y1})
df2 = pd.DataFrame({'x2':x2, 'y2': y2})
sns.lmplot(x='x1', y='y1', data=df1, fit_reg=True, ci=None)
sns.lmplot(x='x2', y='y2', data=df2, fit_reg=True, ci=None)
This will create 2 separate plots. How can I add the data from df2 onto the SAME graph? All the seaborn examples I have found online seem to focus on how you can create adjacent graphs (say, via the 'hue' and 'col_wrap' options). Also, I prefer not to use the dataset examples where an additional column might be present as this does not have a natural meaning in the project I am working on.
If there is a mixture of matplotlib/seaborn functions that are required to achieve this, I would be grateful if someone could help illustrate.
You could use seaborn's FacetGrid class to get desired result.
You would need to replace your plotting calls with these lines:
# sns.lmplot('x1', 'y1', df1, fit_reg=True, ci = None)
# sns.lmplot('x2', 'y2', df2, fit_reg=True, ci = None)
import matplotlib.pyplot as plt
# rename the columns to common names and tag each row with its source DataFrame
df = pd.concat([df1.rename(columns={'x1':'x','y1':'y'})
                .join(pd.Series(['df1']*len(df1), name='df')),
                df2.rename(columns={'x2':'x','y2':'y'})
                .join(pd.Series(['df2']*len(df2), name='df'))],
               ignore_index=True)
pal = dict(df1="red", df2="blue")
# the 'size' parameter was renamed to 'height' in seaborn 0.9
g = sns.FacetGrid(df, hue='df', palette=pal, height=5);
g.map(plt.scatter, "x", "y", s=50, alpha=.7, linewidth=.5, edgecolor="white")
# robust=1 requires statsmodels to be installed
g.map(sns.regplot, "x", "y", ci=None, robust=1)
g.add_legend();
This will yield this plot:
Which, if I understand correctly, is what you need.
Note that you will need to pay attention to the .regplot parameters and may want to change the values I have used as an example.
The ; at the end of a line suppresses the output of the command (I use the IPython notebook, where it is visible).
The docs give some explanation of the .map() method. In essence, it does just that: it maps a plotting command to the data. However, it only works with 'low-level' plotting commands like regplot, and not with lmplot, which actually calls regplot behind the scenes.
Normally plt.scatter would take the parameters c='none', edgecolor='r' to make non-filled markers. But seaborn interferes with the process and enforces a color on the markers, so I don't see an easy or straightforward way to fix this other than manipulating the ax elements after seaborn has produced the plot, which is best addressed as part of a different question.
Option 1: sns.regplot
In this case, the easiest to implement solution is to use sns.regplot, which is an axes-level function, because this will not require combining df1 and df2.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# create the figure and axes
fig, ax = plt.subplots(figsize=(6, 6))
# add the plots for each dataframe
sns.regplot(x='x1', y='y1', data=df1, fit_reg=True, ci=None, ax=ax, label='df1')
sns.regplot(x='x2', y='y2', data=df2, fit_reg=True, ci=None, ax=ax, label='df2')
ax.set(ylabel='y', xlabel='x')
ax.legend()
plt.show()
Option 2: sns.lmplot
As per sns.FacetGrid, it is better to use figure-level functions than to use FacetGrid directly.
Combine df1 and df2 into a long format, and then use sns.lmplot with the hue parameter.
When working with seaborn, it is almost always necessary for the data to be in a long format.
It's customary to use pandas.DataFrame.stack or pandas.melt to convert DataFrames from wide to long.
For this reason, df1 and df2 must have the columns renamed, and have an additional identifying column. This allows them to be concatenated on axis=0 (the default long format), instead of axis=1 (a wide format).
There are a number of ways to combine the DataFrames:
The combination method in the answer from Primer is fine if combining a few DataFrames.
However, a function, as shown below, is better for combining many DataFrames.
def fix_df(data: pd.DataFrame, name: str) -> pd.DataFrame:
    """rename columns and add a column"""
    # rename columns to a common name
    data.columns = ['x', 'y']
    # add an identifying value to use with hue
    data['df'] = name
    return data
# create a list of the dataframes
df_list = [df1, df2]
# update the dataframes by calling the function in a list comprehension
df_update_list = [fix_df(v, f'df{i}') for i, v in enumerate(df_list, 1)]
# combine the dataframes
df = pd.concat(df_update_list).reset_index(drop=True)
# plot the dataframe
sns.lmplot(data=df, x='x', y='y', hue='df', ci=None)
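Note that fix_df above modifies df1 and df2 in place. If the original frames should stay untouched, a non-mutating variant might look like the sketch below. The helper name to_long and the frames dictionary are hypothetical, and the random data only mimics the shape of the data in the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(365)
df1 = pd.DataFrame({"x1": rng.normal(size=50), "y1": rng.normal(size=50) * 100})
df2 = pd.DataFrame({"x2": rng.normal(size=50), "y2": rng.normal(size=50) * 100})


def to_long(data: pd.DataFrame, name: str) -> pd.DataFrame:
    """Return a renamed copy with an identifying column; the input is not modified."""
    # rename() returns a new DataFrame, so the caller's frame keeps its columns
    renamed = data.rename(columns=dict(zip(data.columns, ["x", "y"])))
    # assign() also returns a new DataFrame with the extra column
    return renamed.assign(df=name)


# Hypothetical mapping of label -> DataFrame
frames = {"df1": df1, "df2": df2}
long_df = pd.concat(
    [to_long(frame, label) for label, frame in frames.items()],
    ignore_index=True,
)
```

long_df has the same layout as df in the answer above, so it can be passed to sns.lmplot with x='x', y='y' and hue='df'.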
Notes
Package versions used for this answer:
pandas v1.2.4
seaborn v0.11.1
matplotlib v3.3.4