Given some data:
pt = pd.DataFrame({'alrmV': ['000', '000', '000', '101', '101', '111', '111'],
                   'he': ['e', 'e', 'e', 'e', 'h', 'e', 'e'],
                   'inc': [0, 0, 0, 0, 0, 1, 1]})
I would like to create a bar plot separated on row and col.
g = sns.FacetGrid(pt, row='inc', col='he', margin_titles=True)
g.map( sns.barplot(pt['alrmV']), color='steelblue')
This works, but how do I also add:
an ordered x-axis
only display the top-two-by-count alrmV types
To get an ordered x-axis that displays the top two count types, I played around with this grouping, but I was unable to get it into a FacetGrid:
grouped = pt.groupby( ['he','inc'] )
grw= grouped['alrmV'].value_counts().fillna(0.) #.unstack().fillna(0.)
grw[:2].plot(kind='bar')
Using FacetGrid, slicing limits the total count displayed
g.map(sns.barplot(pt['alrmV'][:10]), color='steelblue')
So how can I get a bar graph that is separated on row and col, is ordered, and displays only the top two counts?
I couldn't get the example to work with the data you provided, so I'll use one of the example datasets to demonstrate:
import seaborn as sns
tips = sns.load_dataset("tips")
We'll make a plot with sex in the columns, smoker in the rows, using day as the x variable for the barplot. To get the top two days in order, we could do
top_two_ordered = tips.day.value_counts().sort_values().index[-2:].tolist()
Then you can pass this list to the order argument of the categorical plotting functions (it was called x_order in very old seaborn releases).
Although you can use FacetGrid directly here, it's probably easier to use the catplot function (named factorplot in older seaborn releases):
g = sns.catplot(x="day", kind="count", col="sex", row="smoker",
                data=tips, margin_titles=True, height=3,
                order=top_two_ordered)
Which draws:
While I wouldn't recommend doing exactly what you proposed (plotting bars for different x values in each facet), it could be accomplished by doing something like
g = sns.FacetGrid(tips, col="sex", row="smoker", sharex=False)
def ordered_barplot(data, **kws):
    # Top two days by count within this facet's subset of the data
    order = data["day"].value_counts().sort_values().index[-2:].tolist()
    sns.countplot(x="day", data=data, order=order, **kws)
g.map_dataframe(ordered_barplot)
so that each facet shows its own top two days.
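For the data in the question, a rough sketch of the same idea might look like the following. It assumes a recent seaborn release (where catplot, order, and height are the current names) and that alrmV and he are stored as strings. Note that '101' and '111' tie on count, so which of them ends up in the top two should not be relied on.

```python
import pandas as pd
import seaborn as sns

# Data from the question, with the alarm codes and 'he' values as strings (assumption)
pt = pd.DataFrame({
    "alrmV": ["000", "000", "000", "101", "101", "111", "111"],
    "he": ["e", "e", "e", "e", "h", "e", "e"],
    "inc": [0, 0, 0, 0, 0, 1, 1],
})

# The two most frequent alarm types over the whole frame, most frequent first
top_two = pt["alrmV"].value_counts().nlargest(2).index.tolist()

# One count plot per (inc, he) combination, restricted to those two categories
g = sns.catplot(
    data=pt, x="alrmV", kind="count",
    row="inc", col="he", order=top_two,
    margin_titles=True, height=3, color="steelblue",
)
```

Passing the same order to every facet keeps the x-axis identical across the grid, which is what makes the facets comparable.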
Related
I'm trying to generate a plot in seaborn using a for loop to plot the contents of each dataframe column on its own row.
The number of columns that need plotting can vary between 1 and 30. However, the loop creates multiple individual plots, each with their own x-axis, which are not aligned and with a lot of wasted space between the plots. I'd like to have all the plots together with a shared x-axis without any vertical spacing between each plot that I can then save as a single image.
The code I have been using so far is below.
comp_relflux = measurements.filter(like='rel_flux_C', axis=1)  # Extracts relevant columns from larger dataframe
comp_relflux = comp_relflux.reindex(comp_relflux.mean().sort_values().index, axis=1)  # Sorts into order based on column mean.
plt.rcParams["figure.figsize"] = [12.00, 1.00]
for column in comp_relflux.columns:
    plt.figure()
    sns.scatterplot(x=(bjd) % 1, y=comp_relflux[column], color='b', marker='.')
This is a screenshot of the resultant plots.
I have also tried using FacetGrid, but this just seems to plot the last column's data.
p = sns.FacetGrid(comp_relflux, height=2, aspect=6, despine=False)
p.map(sns.scatterplot, x=(bjd)%1, y=comp_relflux[column])
To have a single set of x-axis labels instead of one per row, you can use sharex. Also, by calling plt.subplots() with one row per column of your data, you get a single figure with all the subplots in it. As there is no data available, I used random numbers below to demonstrate. There are 4 columns of data in my df, but I have kept as much of your code and naming convention as possible. Hope this is what you are looking for...
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
comp_relflux = pd.DataFrame(np.random.rand(100, 4))  # Random data - 4 columns
bjd = np.linspace(0, 1, 100)  # Series of 100 points - 0 to 1
rows = len(comp_relflux.columns)  # Number of columns = number of subplots
fig, ax = plt.subplots(rows, 1, sharex=True, figsize=(12, 6))  # sharex is set here; the figure size moves here from your rcParams
for i, column in enumerate(comp_relflux.columns):
    sns.scatterplot(x=(bjd) % 1, y=comp_relflux[column], color='b', marker='.', ax=ax[i])
1 output plot with 4 subplots
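If the vertical gaps also need to go, one possible extension is sketched below. The hspace=0 adjustment, squeeze=False (so that indexing still works when there is only one column), the per-row figure height, and the output filename are additions to the answer above rather than part of it, and the data is again random.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)
comp_relflux = pd.DataFrame(rng.random((100, 4)))  # stand-in data, 4 columns
bjd = np.linspace(0, 1, 100)

rows = len(comp_relflux.columns)
# squeeze=False always returns a 2-D array of Axes, even when rows == 1
fig, axes = plt.subplots(rows, 1, sharex=True, squeeze=False,
                         figsize=(12, 1.5 * rows))
for i, column in enumerate(comp_relflux.columns):
    sns.scatterplot(x=bjd % 1, y=comp_relflux[column],
                    color="b", marker=".", ax=axes[i, 0])
    axes[i, 0].set_ylabel(str(column))

fig.subplots_adjust(hspace=0)  # remove the vertical space between rows
fig.savefig("comp_relflux.png", bbox_inches="tight")  # hypothetical filename
```

Scaling the figure height with the number of rows keeps each panel readable whether there is 1 column or 30.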
I'm aiming to plot a stacked chart that displays normalised values from a pandas df. Using the code below, each unique value in Item has its own row. I then aim to plot a stacked chart containing normalised values from Label, with Num along the x-axis.
However, hue seems to pass a different set of colours for each individual Item. They aren't consistent; for example, A in Up is blue, while A in Right is green.
I'm also hoping to share the x-axis so that Num is consistent for each Item. At the moment the values aren't aligned with the respective x-axis.
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.DataFrame({
'Num' : [1,2,1,2,3,2,1,3,2,2,1,2,3,3,1,3],
'Label' : ['A','B','C','B','B','C','C','B','B','A','C','A','B','A','C','A'],
'Item' : ['Up','Left','Up','Left','Down','Right','Up','Down','Right','Down','Right','Up','Up','Right','Down','Left'],
})
g = sns.FacetGrid(df,
                  row = 'Item',
                  row_order = ['Up','Right','Down','Left'],
                  aspect = 2,
                  height = 4,
                  sharex = True,
                  legend_out = True
                  )
g.map_dataframe(sns.histplot, x = 'Num', hue = 'Label', multiple = 'fill', shrink = 0.8, binwidth = 1)
g.add_legend()
Using FacetGrid directly can be tricky; it is basically doing a group-by and a for loop over the axes, and it does not track any function-specific state that would be needed to make sure that the answer to questions like "what order should be used for each hue level" is the same in each facet. So you would need to supply that information somehow (i.e. hue_order or passing a palette dictionary). In fact, there is a warning in the documentation to this effect.
But you generally don't need to use FacetGrid directly; you can use one of the figure-level functions, which do all of the bookkeeping for you to make sure that information is aligned across facets. Here you would use displot:
sns.displot(
    data=df, x="Num", hue="Label",
    row='Item', row_order=['Up','Right','Down','Left'],
    multiple="fill", shrink=.8, discrete=True,
    aspect=4, height=2,
)
Note that I've made one other change to your code here, which is to use discrete=True instead of binwidth=1, which is what I think you want.
I am trying to add multiple plots and create a matrix plot with seaborn. Unfortunately, Python gives me the following warning:
"relplot is a figure-level function and does not accept target axes. You may wish to try scatterplot"
fig, axes = plt.subplots(nrows=5, ncols=5, figsize=(20, 20), sharex=True, sharey=True)
for i in range(5):
    for j in range(5):
        axes[i][j] = seaborn.relplot(x=col[i+2], y=col[j+2], data=df, ax=axes[i][j])
I would like to know if there's any method with which I can combine all the plots plotted with relplot.
Hi Kinto, welcome to Stack Overflow!
relplot works differently from, for example, scatterplot. It is a figure-level function that creates its own figure, so you don't need to define subplots and loop over them. Instead you say what you would like to vary on each row or column of the graph.
For an example from the documentation:
import seaborn as sns
sns.set(style="ticks")
tips = sns.load_dataset("tips")
g = sns.relplot(
    x="total_bill", y="tip", hue="day",
    col="time", row="sex", data=tips
)
Which says: on each subplot, plot the total bill on the x-axis and the tip on the y-axis, and vary the hue within a subplot by day. Then for each column, plot the data for one unique value of the "time" column of the tips dataset. In this case there are two unique times: "Lunch" and "Dinner". And finally, vary "sex" across the subplot rows. In this case there are two values of "sex", "Male" and "Female", so one row shows the male tipping behavior and the other the female tipping behavior.
I'm not sure what your data looks like, but hopefully this explanation helps you.
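If what you are after really is a matrix of pairwise scatter plots, the warning itself points at one way to get it: the axes-level scatterplot does accept ax. A minimal sketch, using made-up column names and random data in place of your df:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])  # hypothetical data
cols = list(df.columns)

n = len(cols)
fig, axes = plt.subplots(nrows=n, ncols=n, figsize=(3 * n, 3 * n),
                         sharex="col", sharey="row")
for i, ycol in enumerate(cols):
    for j, xcol in enumerate(cols):
        # Axes-level functions draw onto the Axes they are given
        sns.scatterplot(data=df, x=xcol, y=ycol, ax=axes[i, j])
```

seaborn's pairplot builds a similar grid in a single call, if its default layout suits you.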
I'm looking to plot two columns of a time series based on a groupby of a third column. It works as intended more or less, but I can't tell which subgroup is being plotted in the output as it is not included in the legend or anywhere else in the graphs outputted.
Is there a way to include the subgroup name in the graphs outputted?
This is what I've attempted on the dataframe as follows:
awareness.groupby('campaign_name')[['sum_purchases_value', 'sum_ad_spend']].plot(figsize=(20,8), legend=True);
Try this:
grouped = awareness.groupby('campaign_name')
titles = [name for name, data in grouped]
plots = grouped[['sum_purchases_value',
                 'sum_ad_spend']].plot(figsize=(20,8), legend=True)
for plot, label in zip(plots, titles):
    plot.set(title=label)
Here the pandas plot function returns a Series of matplotlib Axes objects, one per group, so using the for loop you can customize whatever you like (x labels, y labels, font size, etc.)
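An equivalent, more explicit version loops over the groups and passes each group's name as the title. The awareness frame below is a small made-up stand-in with the same column names, and axes_by_campaign is just an illustrative name:

```python
import pandas as pd

# Hypothetical stand-in for the real awareness dataframe
awareness = pd.DataFrame({
    "campaign_name": ["spring", "spring", "summer", "summer"],
    "sum_purchases_value": [10.0, 12.0, 7.0, 9.0],
    "sum_ad_spend": [3.0, 4.0, 2.0, 2.5],
})

axes_by_campaign = {}
for name, group in awareness.groupby("campaign_name"):
    # One figure per campaign, titled with the group key
    ax = group.plot(y=["sum_purchases_value", "sum_ad_spend"],
                    figsize=(20, 8), legend=True, title=name)
    axes_by_campaign[name] = ax
```

Keeping the Axes in a dictionary keyed by campaign makes it easy to go back and adjust one particular plot later.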
The following is my code.
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.set_style('whitegrid')
titanic = sns.load_dataset('titanic')
g = sns.FacetGrid(titanic, col="sex")
g = g.map(plt.hist, "age")
The histogram looks as shown.
Now I have a question about the parameter col. I see two histograms arranged in a row, but I specified col="sex". So what is the purpose of the col parameter, and why are the histograms arranged in a row?
Specifying the col parameter splits the data frame into groups based on the variable named in the argument. Each group is plotted in a separate column of the resulting plot. In your case, the variable sex has two groups: male and female. Each of these groups has been plotted in a separate column, which is why there are two columns and one row in your plot.
From the FacetGrid docstring:
row, col, hue : strings
Variables that define subsets of the data, which will be drawn on separate facets in the grid. See the *_order parameters to control the order of levels of this variable.