I'm trying to make a plot with bars or areas rescaled to 100% with the new seaborn.objects interface and I can't seem to get so.Norm() to work, with or without by...
Here's what I've got so far:
import seaborn as sns
import seaborn.objects as so
tips = sns.load_dataset("tips")
# bars
(
so.Plot(tips, x="day", y="total_bill", color="time")
.add(so.Bar(), so.Agg("sum"), so.Norm(func="sum"), so.Stack())
)
#areas
(
so.Plot(tips, x="size", y="total_bill", color="time")
.add(so.Area(), so.Agg("sum"), so.Norm(func="sum"), so.Stack())
)
I think you intend for the height of each (stacked) bar to equal 1, so you'd want to group by the x values when normalizing:
(
so.Plot(tips, x="day", y="total_bill", color="time")
.add(so.Bar(), so.Agg("sum"), so.Norm(func="sum", by=["x"]), so.Stack())
)
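To make the grouping explicit, here is a minimal pandas sketch of what normalizing within each x value amounts to. It assumes a small made-up table in place of the tips dataset, and it only illustrates the arithmetic, not how seaborn implements so.Norm internally.

```python
import pandas as pd

# Made-up stand-in for the tips data (values are illustrative only).
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Fri"],
    "time": ["Lunch", "Dinner", "Lunch", "Dinner", "Dinner"],
    "total_bill": [10.0, 30.0, 5.0, 10.0, 5.0],
})

# Step 1: aggregate, like so.Agg("sum") with x="day" and color="time".
agg = df.groupby(["day", "time"], as_index=False)["total_bill"].sum()

# Step 2: divide each value by the total of its x group, which is the
# effect that so.Norm(func="sum", by=["x"]) is meant to have.
agg["share"] = agg["total_bill"] / agg.groupby("day")["total_bill"].transform("sum")

# Every day now sums to 1, so each stacked bar would have height 1.
totals = agg.groupby("day")["share"].sum()
print(agg)
print(totals)
```

Without the grouping by day, the division would use a single grand total, so the bars would sum to 1 across the whole plot rather than individually.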
How can I disable the axes' tick labels in an image generated with seaborn.objects?
import seaborn as sns
from seaborn import objects as so
tips = sns.load_dataset("tips")
(
so.Plot(tips, "total_bill", "tip")
.add(so.Dots())
.layout(size=(4, 5))
.label(x=None, y=None)
.save("data/dot_01.png")
.show()
)
The image generated by the code above still includes the tick labels (in red boxes):
How can I disable the tick labels so that the image looks like the one below, with no blank gaps around the four edges, i.e., the edges of the grey background are the edges of the image?
import seaborn as sns
import seaborn.objects as so
from matplotlib.ticker import FixedFormatter
tips = sns.load_dataset("tips")
empty_formatter = FixedFormatter([])
(
so.Plot(tips, "total_bill", "tip")
.add(so.Dots())
.layout(size=(4, 5))
.label(x=None, y=None)
.scale(
x=so.Continuous().tick().label(formatter = empty_formatter),
y=so.Continuous().tick().label(formatter = empty_formatter)
)
.save("dot_01.png")
.show()
)
This generates the plot that looks closest to your example.
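If dropping down to matplotlib is acceptable, the tick labels and the surrounding margins can also be removed on the Axes itself. The sketch below is an alternative under stated assumptions: it uses random stand-in data rather than the tips dataset, plain matplotlib rather than seaborn.objects, and writes to an in-memory buffer that you would swap for a file path.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.random(50), rng.random(50)  # stand-in for total_bill and tip

fig, ax = plt.subplots(figsize=(4, 5))
ax.scatter(x, y)

# Hide tick marks and tick labels on both axes.
ax.tick_params(bottom=False, left=False, labelbottom=False, labelleft=False)

# Let the Axes fill the whole figure so no margin is left around it.
fig.subplots_adjust(left=0, right=1, bottom=0, top=1)

# Save to a buffer here; pass a file name instead to write an image file.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```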
Below you can see my data and facet plot in matplotlib.
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
import matplotlib.pyplot as plt
import matplotlib as mpl
# Data
data = {
'type_sale': ['g_1','g_2','g_3','g_4','g_5','g_6','g_7','g_8','g_9','g_10'],
'open':[70,20,24,150,80,90,60,90,20,20],
'closed':[30,14,20,10,20,40,10,10,10,10],
}
df = pd.DataFrame(data, columns = ['type_sale',
'open',
'closed',
])
data1 = {
'type_sale': [ 'open','closed'],
'structure':[70,30],
}
df1 = pd.DataFrame(data1, columns = ['type_sale',
'structure',
])
# Plotting
labels = ['open','closed']
fig, axs = plt.subplots(2,2, figsize=(10,8))
plt.subplots_adjust(wspace=0.2, hspace=0.6)
df1.plot(x='type_sale', y='structure',labels=labels,autopct='%1.1f%%',kind='pie', title='Stacked Bar Graph by dataframe',ax=axs[0,0])
df.plot(x='type_sale', kind='bar', stacked=True, title='Stacked Bar Graph by dataframe', ax=axs[0,1])
df.plot(x='type_sale', kind='bar', stacked=True, title='Stacked Bar Graph by dataframe',ax=axs[1,0])
df.plot(x='type_sale', kind='bar', stacked=True,title='Stacked Bar Graph by dataframe', ax=axs[1,1])
plt.suptitle(t='Stacked Bar Graph by dataframe', fontsize=16)
plt.show()
If you compare the pie plot with the other plots, you can spot a big difference: the pie plot is not enclosed by a black rectangle, while the others are.
Can anybody help me solve this problem?
After some experimenting, this seems to work, but I think the pie gets stretched, which doesn't look good.
EDIT
I found a better solution using set_adjustable.
There are also two ways to create the pie chart; the frame and ticks differ slightly between them.
# 1
axs[0,0].pie(df1['structure'],labels=labels,autopct='%1.1f%%',frame=True,radius=10)
axs[0,0].set_title('Stacked Bar Graph by dataframe')
# 2
df1.plot(x='type_sale', y='structure',labels=labels,autopct='%1.1f%%',kind='pie', title='Stacked Bar Graph by dataframe',ax=axs[0,0])
axs[0,0].set_frame_on(True)
axs[0,0].set_adjustable('datalim')
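For comparison, here is a self-contained sketch that draws both variants side by side. It reuses the 70/30 values from the question; the two-panel layout and the panel titles are assumptions made only for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

labels = ["open", "closed"]
structure = [70, 30]

fig, axs = plt.subplots(1, 2, figsize=(8, 4))

# Left: pie drawn with frame=True, which keeps the Axes frame and ticks.
axs[0].pie(structure, labels=labels, autopct="%1.1f%%", frame=True)
axs[0].set_title("frame=True")

# Right: default pie, with the frame switched back on afterwards.
axs[1].pie(structure, labels=labels, autopct="%1.1f%%")
axs[1].set_frame_on(True)
# 'datalim' keeps the pie round by changing the data limits instead of
# shrinking the Axes box, so the frame keeps the size of its neighbours.
axs[1].set_adjustable("datalim")
axs[1].set_title("set_frame_on(True)")

fig.canvas.draw()  # force a render so the aspect adjustment is applied
```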
How can I reproduce the following seaborn graph in Altair?
import seaborn as sns
tips = sns.load_dataset("tips")
ax = sns.boxplot(x="day", y="total_bill", hue="smoker",
data=tips, palette="Set3")
This was my attempt:
import altair as alt
chart = (
alt.Chart(tips)
.mark_boxplot()
.encode(x=alt.X("day"), y=alt.Y("total_bill"), color="smoker")
.interactive()
.properties(width=300))
chart.show()
which gives me this unwanted graph:
Put smoker on the x-axis, facet by day using the column encoding, and adjust the padding and spacing:
chart = alt.Chart(tips).mark_boxplot(ticks=True).encode(
x=alt.X("smoker:O", title=None, axis=alt.Axis(labels=False, ticks=False), scale=alt.Scale(padding=1)),
y=alt.Y("total_bill:Q"),
color="smoker:N",
column=alt.Column('day:N', sort=['Thur','Fri','Sat','Sun'], header=alt.Header(orient='bottom'))
).properties(
width=100
).configure_facet(
spacing=0
).configure_view(
stroke=None
)
chart
I would like to annotate my violin plot with the number of observations in each group. So the question is essentially the same as this one, except:
python instead of R,
seaborn instead of ggplot, and
violin plots instead of boxplots
Let's take this example from the seaborn API documentation:
import seaborn as sns
sns.set_style("whitegrid")
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips)
I'd like to have n=62, n=19, n=87, and n=76 on top of the violins. Is this doable?
In this situation, I like to precompute the annotated values and incorporate them into the categorical axis. In other words, precompute labels such as "Thur, N = xxx".
That looks like this:
import seaborn as sns
sns.set_style("whitegrid")
ax = (
sns.load_dataset("tips")
.assign(count=lambda df: df['day'].map(df.groupby(by=['day'])['total_bill'].count()))
.assign(grouper=lambda df: df['day'].astype(str) + '\nN = ' + df['count'].astype(str))
.sort_values(by='day')
.pipe((sns.violinplot, 'data'), x="grouper", y="total_bill")
.set(xlabel='Day of the Week', ylabel='Total Bill (USD)')
)
You first need to store the x and y positions of the labels (computed from your dataset) so that you can use ax.text; a simple for loop can then write each label at the desired position:
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips)
yposlist = tips.groupby(['day'])['total_bill'].median().tolist()
xposlist = range(len(yposlist))
stringlist = ['n = 62','n = 19','n = 87','n = 76']
for i in range(len(stringlist)):
ax.text(xposlist[i], yposlist[i], stringlist[i])
plt.show()
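The counts in the snippet above are typed in by hand. A minimal sketch of computing them from the data instead is shown below; annotate_counts is a hypothetical helper name, the demo table is made up, and the helper assumes the groups are drawn at x = 0, 1, 2, ... in the same order that groupby returns them.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def annotate_counts(ax, data, group_col, value_col):
    """Write 'n = <count>' at the median of each group on a categorical axis.

    Hypothetical helper: assumes group i is drawn at x position i, in the
    order that groupby returns the groups.
    """
    grouped = data.groupby(group_col, observed=True)[value_col]
    counts = grouped.size()
    medians = grouped.median()
    for xpos, (name, n) in enumerate(counts.items()):
        ax.text(xpos, medians[name], f"n = {n}", ha="center")
    return counts


# Made-up stand-in for the tips data.
demo = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat", "Sat", "Sat"],
    "total_bill": [10.0, 20.0, 5.0, 15.0, 25.0],
})
fig, ax = plt.subplots()
counts = annotate_counts(ax, demo, "day", "total_bill")
```

With the tips data, calling annotate_counts(ax, tips, "day", "total_bill") after sns.violinplot should take the place of the hard-coded list, provided the plotted category order matches the groupby order.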
I'm trying to create an interactive plotly graph from pandas dataframes.
However, I can't get the legends displayed correctly.
Here is a working example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.plotly as py
# sign into the plotly api
py.sign_in("***********", "***********")
# create some random dataframes
dates = pd.date_range('1/1/2000', periods=8)
df1 = pd.DataFrame(np.random.randn(8, 1), index=dates, columns=['A'])
df2 = pd.DataFrame(np.random.randn(8, 1), index=dates, columns=['B'])
df1.index.name = 'date'
df2.index.name = 'date'
Now I attempt to plot the dataframes using plotly.
fig, ax = plt.subplots(1,1)
df1.plot(y='A', ax=ax)
df2.plot(y='B', ax=ax)
py.iplot_mpl(fig, filename='random')
Notice that there is no legend.
Edit:
Based on suggestions below I have added an update dict. Although this does display the legend, it messes up the plot itself:
fig, ax = plt.subplots(1,1)
df1.plot(y='A', ax=ax)
df2.plot(y='B', ax=ax)
update = dict(
layout=dict(
annotations=[dict(text=' ')], # rm erroneous 'A', 'B', ... annotations
showlegend=True # show legend
)
)
py.iplot_mpl(fig, update=update, filename='random')
Edit 2:
Removing the annotations entry from the layout dict results in the plot being displayed correctly, but the legend then shows the x column name (the index name of the dataframe) rather than the y column name.
fig, ax = plt.subplots(1,1)
df1.plot(y='A', ax=ax)
df2.plot(y='B', ax=ax)
update = dict(
layout=dict(
showlegend=True # show legend
)
)
py.iplot_mpl(fig, update=update, filename='random')
This results in the following plot:
Edit 3:
I have found a way to override the legend text, but it seems a bit clunky. Given that I've specified the dataframe column I want to plot:
df1.plot(y='A', ax=ax)
I would have expected that y='A' would result in 'A' being used as the legend label.
It seems this is not the case, and while it is possible to override using the index label, as seen below, it just feels wrong.
Is there a better way to achieve this result?
update = dict(
layout=dict(
showlegend=True,
),
data=[
dict(name='A'),
dict(name='B'),
]
)
py.iplot_mpl(fig, update=update, filename='random')
Legends don't convert well from matplotlib to plotly.
Fortunately, adding a plotly legend to a matplotlib plot is straightforward:
update = dict(
layout=dict(
showlegend=True # show legend
)
)
py.iplot_mpl(fig, update=update)
See the full working ipython notebook here.
For more information, refer to the plotly user guide.