Groupby Plot - Include Subgroup Name - python

I'm looking to plot two columns of a time series based on a groupby of a third column. It works as intended more or less, but I can't tell which subgroup is being plotted in the output as it is not included in the legend or anywhere else in the graphs outputted.
Is there a way to include the subgroup name in the graphs outputted?
This is what I've attempted on the dataframe:
awareness.groupby('campaign_name')[['sum_purchases_value', 'sum_ad_spend']].plot(figsize=(20,8), legend=True);

Try this:
grouped = awareness.groupby('campaign_name')
# Group names, in the same order the groupby iterates over them
titles = [name for name, data in grouped]
# Select both columns with a list; one Axes object is returned per group
plots = grouped[['sum_purchases_value', 'sum_ad_spend']].plot(figsize=(20, 8), legend=True)
for plot, label in zip(plots, titles):
    plot.set(title=label)
Calling plot on a groupby returns a pandas Series of matplotlib Axes objects, one per group, so inside the for loop you can customize whatever you like (x labels, y labels, font size, etc.).
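As a variation on the same idea, you can loop over the groups yourself and pass the group key as the title when each plot is created. This is only a sketch: the data below is made up to stand in for the awareness dataframe, the titled_axes dict is a name introduced here for illustration, and the Agg backend is selected only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd

# Made-up data standing in for the `awareness` dataframe
awareness = pd.DataFrame({
    "campaign_name": ["brand", "brand", "promo", "promo"],
    "sum_purchases_value": [10.0, 12.0, 7.0, 9.0],
    "sum_ad_spend": [4.0, 5.0, 3.0, 2.0],
})

titled_axes = {}  # hypothetical helper dict: group name -> Axes
for name, group in awareness.groupby("campaign_name"):
    # One figure per campaign, titled with the group key
    ax = group[["sum_purchases_value", "sum_ad_spend"]].plot(
        figsize=(20, 8), legend=True, title=name
    )
    titled_axes[name] = ax
```

Keeping the Axes in a dict keyed by group name makes it easy to go back and tweak one specific campaign's plot afterwards.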

Related

Creating a single tidy seaborn plot in a 'for' loop

I'm trying to generate a plot in seaborn using a for loop to plot the contents of each dataframe column on its own row.
The number of columns that need plotting can vary between 1 and 30. However, the loop creates multiple individual plots, each with their own x-axis, which are not aligned and with a lot of wasted space between the plots. I'd like to have all the plots together with a shared x-axis without any vertical spacing between each plot that I can then save as a single image.
The code I have been using so far is below.
comp_relflux = measurements.filter(like='rel_flux_C', axis=1)  # Extracts relevant columns from larger dataframe
comp_relflux = comp_relflux.reindex(comp_relflux.mean().sort_values().index, axis=1)  # Sorts into order based on column mean.
plt.rcParams["figure.figsize"] = [12.00, 1.00]
for column in comp_relflux.columns:
    plt.figure()
    sns.scatterplot(x=bjd % 1, y=comp_relflux[column], color='b', marker='.')
This is a screenshot of the resultant plots.
I have also tried using FacetGrid, but this just seems to plot the last column's data.
p = sns.FacetGrid(comp_relflux, height=2, aspect=6, despine=False)
p.map(sns.scatterplot, x=(bjd)%1, y=comp_relflux[column])
To have a single x-axis label instead of one per row, you can use sharex. By calling plt.subplots() with as many rows as you have columns, you also get a single figure with all the subplots inside it. As there is no data available, I used random numbers below to demonstrate. My df has 4 columns of data, but I have kept as much of your code and naming convention as possible. Hope this is what you are looking for.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
comp_relflux = pd.DataFrame(np.random.rand(100, 4))  # Random data - 4 columns
bjd = np.linspace(0, 1, 100)  # Series of 100 points - 0 to 1
rows = len(comp_relflux.columns)  # Number of columns = number of subplots
fig, ax = plt.subplots(rows, 1, sharex=True, figsize=(12, 6))  # sharex is set here; the figure size moves here from your rcParams
for i, column in enumerate(comp_relflux.columns):
    sns.scatterplot(x=bjd % 1, y=comp_relflux[column], color='b', marker='.', ax=ax[i])
1 output plot with 4 subplots
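The question also asked for no vertical spacing between the rows and for a single saved image. A rough sketch of that part, using plain matplotlib scatter calls rather than seaborn, could look like the following. The random data, the 1.5-inch row height, and the output filename are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
comp_relflux = pd.DataFrame(rng.random((100, 4)))  # random stand-in data, 4 columns
bjd = np.linspace(0, 1, 100)

rows = len(comp_relflux.columns)
# squeeze=False always returns a 2-D array of Axes, so a single column also works
fig, axes = plt.subplots(rows, 1, sharex=True, squeeze=False,
                         figsize=(12, 1.5 * rows))
for ax, column in zip(axes[:, 0], comp_relflux.columns):
    ax.scatter(bjd % 1, comp_relflux[column], color="b", marker=".")

fig.subplots_adjust(hspace=0)    # remove the vertical gap between rows
fig.savefig("comp_relflux.png")  # hypothetical output filename
```

Scaling the figure height with the number of rows keeps each panel readable whether there is 1 column or 30.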

How to use column values for x axis labels in matplotlib

I have a basic DataFrame in pandas and am using matplotlib to create a chart.
I have followed advice found on SO and in the docs for labelling the values on the x axis, but they won't change from the indices.
I have this,
Presc_df_asc = Presc_df.sort_values('Total Items',ascending=True)
Presc_df_asc['Total Items'].plot.bar(x="Practice", ylim=[Presc_df_asc['Total Items'].min(), Presc_df_asc['Total Items'].max()])
plt.xlabel('Practice')
plt.ylabel('Total Items')
plt.title('practice total items')
plt.legend(('Items',),loc='upper center')
From what I have found, plot.bar(x="Practice") should set the x-axis to show the values in the Practice column under each bar.
But no matter what I try, I get the x-axis labelled with the indices, with just the main label saying Practice.
In order for the plotting command to be able to access the "Practice" column, you need to apply the plot function to the entire dataframe (or a sub-dataframe that contains at least these two columns). The code below puts the corresponding Practice value below each bar. The rot=0 argument prevents the labels from being rotated by 90°.
Presc_df_asc.plot.bar(x="Practice", y="Total Items",
                      ylim=[Presc_df_asc['Total Items'].min(),
                            Presc_df_asc['Total Items'].max()], rot=0)
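If you would rather keep plotting from the single column, another option is to move "Practice" into the index first, since a Series only carries its index and values and has no "Practice" column for x to refer to. A minimal sketch with made-up data (the practice names and counts are assumptions, and the Agg backend is only there so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd

# Made-up data standing in for Presc_df
Presc_df = pd.DataFrame({"Practice": ["A", "B", "C"],
                         "Total Items": [30, 10, 20]})
Presc_df_asc = Presc_df.sort_values("Total Items", ascending=True)

# With "Practice" as the index, the bar labels come from the practice names
ax = Presc_df_asc.set_index("Practice")["Total Items"].plot.bar(rot=0)
ax.set_ylabel("Total Items")

# Collect the tick labels that ended up under the bars
labels = [tick.get_text() for tick in ax.get_xticklabels()]
```

Here labels should hold the practice names in sorted order rather than the original integer index.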

Seaborn showing values in legend not present in Pandas column

I'm generating a scatterplot for a Pandas DataFrame data, containing amongst others the numeric column 'year' with the unique values
array([2010., 2011., 2012., 2013., 2014., 2015., 2016., 2017., 2018.])
as shown with data.year.unique().
Displaying the plot like this:
ax = sns.scatterplot(x='x', y='y', hue='name', size='year', data=data, palette=sns.color_palette('deep', 7))
generates a legend with the groupings for year listed as
This is misleading, as the plot only contains data from 2008 to 2020.
I've tried passing a tuple (min, max) to the sns.scatterplot function as described in the documentation, to no avail.
Changing the data type of the column 'year' to categorical does print the range of the years correctly in the legend, but yields a legend entry for every single year, which is unnecessary and takes up a lot of space.
I've also tried the solution from this related thread, but it doesn't change the range of the legend entries.
How can I force seaborn to show the actual range of values in the legend? Alternatively, if it only works by using a categorical column, how can I only show every second entry in the legend?

seaborn.FacetGrid col parameter not behaving as expected

The following is my code.
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.set_style('whitegrid')
titanic = sns.load_dataset('titanic')
g = sns.FacetGrid(titanic, col="sex")
g = g.map(plt.hist, "age")
The histogram looks as shown.
Now I have a question about the col parameter. I see two histograms arranged in a row, but I specified col="sex". So what is the purpose of the col parameter, and why are the histograms arranged row-wise?
Specifying the col parameter splits the data frame into groups according to the levels of the named variable. Each group is plotted in a separate column of the resulting grid. In your case, the variable sex has two groups: male and female. Each of these groups has been plotted in its own column, which is why your plot has two columns and one row.
From the FacetGrid docstring:
row, col, hue : strings
Variables that define subsets of the data, which will be drawn on separate facets in the grid. See the *_order parameters to control the order of levels of this variable.
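One way to see the difference is to compare the shape of the Axes array that FacetGrid builds for col and for row. This is just a sketch: it uses a tiny made-up frame instead of the titanic dataset so it runs without a download, and the Agg backend so it needs no display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd
import seaborn as sns

# Small made-up stand-in for the titanic dataset
df = pd.DataFrame({
    "sex": ["male", "female", "male", "female"],
    "age": [22, 38, 35, 27],
})

by_col = sns.FacetGrid(df, col="sex")  # one grid column per level of sex
by_row = sns.FacetGrid(df, row="sex")  # one grid row per level of sex

col_shape = by_col.axes.shape  # (rows, columns) of the grid built with col
row_shape = by_row.axes.shape  # (rows, columns) of the grid built with row
```

With col="sex" the grid has one row and two columns, so the facets sit side by side; with row="sex" they are stacked instead.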

seaborn FacetGrid , ranked barplot separated on row and col

Given some data:
pt = pd.DataFrame({'alrmV':[000,000,000,101,101,111,111],
                   'he':[e,e,e,e,h,e,e],
                   'inc':[0,0,0,0,0,1,1]})
I would like to create a bar plot separated on row and col.
g = sns.FacetGrid(pt, row='inc', col='he', margin_titles=True)
g.map( sns.barplot(pt['alrmV']), color='steelblue')
This works, but how do I also add:
an ordered x-axis
only display the top-two-by-count alrmV types
To get an ordered x-axis that displays the top 2 count types, I played around with this grouping, but was unable to get it into a FacetGrid:
grouped = pt.groupby( ['he','inc'] )
grw= grouped['alrmV'].value_counts().fillna(0.) #.unstack().fillna(0.)
grw[:2].plot(kind='bar')
Using FacetGrid, slicing limits the total count displayed:
g.map(sns.barplot(pt['alrmV'][:10]), color='steelblue')
So how can I get a bar graph that is separated on row and col, is ordered, and displays only the top 2 counts?
I couldn't get the example to work with the data you provided, so I'll use one of the example datasets to demonstrate:
import seaborn as sns
tips = sns.load_dataset("tips")
We'll make a plot with sex in the columns, smoker in the rows, using day as the x variable for the barplot. To get the top two days in order, we could do
top_two_ordered = tips["day"].value_counts().sort_values().index[-2:].tolist()
Then you can pass this list to the order argument of the plotting function.
Although you can use FacetGrid directly here, it's probably easier to use the catplot function (called factorplot in older seaborn releases) with kind="count":
g = sns.catplot(x="day", col="sex", row="smoker",
                data=tips, kind="count", margin_titles=True, height=3,
                order=top_two_ordered)
Which draws:
While I wouldn't recommend doing exactly what you proposed (plotting bars for different x values in each facet), it could be accomplished by doing something like
g = sns.FacetGrid(tips, col="sex", row="smoker", sharex=False)
def ordered_countplot(data, **kws):
    # Top two days within this facet's subset, most frequent last
    order = data["day"].value_counts().sort_values().index[-2:].tolist()
    sns.countplot(x="day", data=data, order=order, **kws)
g.map_dataframe(ordered_countplot)
