python seaborn: customize line plot and scatterplot together (also legend)

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'id': {0: -3, 1: 2, 2: -3, 3: 1},
    'val': {0: 0.4, 1: 0.03, 2: 0.88, 3: 1.3},
    'indicator': {0: 'A', 1: 'A', 2: 'B', 3: 'B'},
    'count': {0: 40000, 1: 5779, 2: 3000, 3: 31090}
})
df
I hope to get a plot like the following:
With the code below I can get a close plot, but I also want the line width to vary with the "count" variable. When I tried adding size = 'count' to the line plot, I did not get a meaningful plot. As for the legend, I want only one legend, for "indicator", rather than two:
plt.figure()
sns.lineplot(x = 'id', y = 'val', hue = 'indicator', data = df)
sns.scatterplot(x = 'id', y = 'val', hue = 'indicator', size = 'count', data = df)

To answer the second part of your question - you can disable the lineplot legend like so:
sns.lineplot(x = 'id', y = 'val', hue = 'indicator', data = df, legend=False)
This will leave you with two legend groups - one for colours and one for sizes. This is the easiest way, but you can also tinker with plt.legend() and build your own from scratch.
As for making the line thickness vary from one point to the next, I don't think you can do it using seaborn. For something like that you'd need a lower-level library such as bokeh, or you could use matplotlib directly to draw the connecting lines between the markers, adjusting for their varying size.
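A minimal sketch of the matplotlib route, not a definitive implementation: each pair of consecutive points is drawn as its own segment, and the segment width is taken from the mean "count" of its two endpoints. The MAX_WIDTH constant, the averaging rule, the marker-size factor, the colour mapping and the Agg backend are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'id': [-3, 2, -3, 1],
    'val': [0.4, 0.03, 0.88, 1.3],
    'indicator': ['A', 'A', 'B', 'B'],
    'count': [40000, 5779, 3000, 31090],
})

# Hypothetical scaling: the largest count maps to a 6-point line width.
MAX_WIDTH = 6.0
scale = MAX_WIDTH / df['count'].max()
colors = {'A': 'C0', 'B': 'C1'}  # assumed colour per indicator

fig, ax = plt.subplots()
for name, group in df.groupby('indicator'):
    group = group.sort_values('id')
    xs = group['id'].to_numpy()
    ys = group['val'].to_numpy()
    counts = group['count'].to_numpy()
    # Draw every segment separately so each can have its own width.
    for k in range(len(group) - 1):
        width = scale * (counts[k] + counts[k + 1]) / 2
        ax.plot(xs[k:k + 2], ys[k:k + 2], color=colors[name],
                linewidth=width, solid_capstyle='round',
                label=name if k == 0 else None)  # one legend entry per indicator
    # Marker areas proportional to count.
    ax.scatter(xs, ys, s=20 * scale * counts, color=colors[name])

ax.legend(title='indicator')
```

Because only the first segment of each group carries a label, the legend ends up with a single entry per indicator. Call fig.savefig(...) or plt.show() (with an interactive backend) to view the result.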

Related

A facet-specific legend in each facet of a FacetGrid Seaborn

I am trying to make a facet-specific legend in each facet of a FacetGrid Seaborn object, such as that produced by a catplot.
Consider the following DataFrame where measurement is the variable to plot, against the categorical Condition, faceted across rows and columns according to variables Lab and (instrument) model. The hue is set to the serial number of the particular instrument on which the measurement was made.
Here is the DataFrame:
df = pd.DataFrame({'Condition': ['C1','C2','C1','C2','C1','C1','C2','C1',
'C1','C1', 'C1', 'C2', 'C1', 'C2', 'C1', 'C2', 'C2'],
'model': ['Pluto','Pluto','Jupy','Jupy','Jupy','Jupy','Jupy','Jupy',
'Jupy', 'Pluto', 'Pluto', 'Pluto', 'Pluto', 'Pluto', 'Jupy', 'Jupy',
'Pluto'],
'serial': [2520,2520,3568,3568,3568,3580,3580,356,
456, 2580, 2580, 2580, 2599, 2599, 2700, 2700,
2560],
'measurement': [1.02766,1.0287,1.0099,1.0198,1.0034,1.0036,1.0054,1.0024,
1.0035,1.00245,1.00456, 1.01, 1.0023, 1.0024, 1.00238, 1.0115,
1.020],
'Lab': ['John','John','John','John','Jack','Jack','Jack','John',
'Jack','John', 'Jack', 'Jack', 'Jack', 'Jack', 'John', 'John',
'John']}
)
Some facets contain only a subset of the hue levels, and as the levels grow in number the FacetGrid legend gets rather long. Inspired by the answer to another post, I opted to iterate through the FacetGrid axes using g.axes.ravel() to get a legend in each facet:
sns.set_style("ticks")
g = sns.catplot(x='Condition',  # returns a FacetGrid object for further editing
                y='measurement',
                data=df,
                hue='serial',
                row='Lab',
                col='model',
                s=10,
                kind='swarm',
                dodge=False,
                aspect=1,
                sharey=True,
                legend_out=True,
                ).despine(left=True)
for axes in g.axes.ravel():
    axes.legend()
g.savefig('/Users/massimopinto/Desktop/legend_in_facets.png',
          bbox_inches='tight')
This leads to a rather crowded plot, because every facet repeats the legend of the entire FacetGrid object. What I would prefer is for the legend of each facet to show only the hue levels that appear in that specific facet.
How do I get to that?
versions: pandas: 1.0.3; seaborn: 0.10.0; python: 3.7.2
Consider iterating elementwise over the axes together with a groupby() object, using zip, and rebuilding each legend from the values of the hue column present in that group. Importantly, you must sort the data frame before plotting.
df = df.sort_values(['Lab', 'model', 'serial']).reset_index(drop=True)
sns.set_style("ticks")
g = sns.catplot(x = 'Condition',
y = 'measurement',
data = df,
hue = 'serial',
row = 'Lab',
col = 'model',
s=10,
kind='swarm',
dodge=False,
aspect = 1,
sharey = True,
legend_out = False, # REMOVE MASTER LEGEND
).despine(left=True)
# MASTER SERIES OF serial
ser_vals = pd.Series(df['serial'].sort_values().unique())
for axes, (i, d) in zip(g.axes.ravel(), df.groupby(['Lab', 'model'])):
    handles, labels = axes.get_legend_handles_labels()
    # SUBSET MASTER SERIES OF serial
    vals = ser_vals[ser_vals.isin(d['serial'].unique())]
    idx = vals.index.tolist()
    if len(idx) > 0:
        axes.legend(handles=[handles[i] for i in idx],
                    labels=vals.tolist())

Is there any way to implement Stacked or Grouped Bar charts in plotly express

I am trying to implement a grouped or stacked bar chart in plotly express.
I have implemented it using plotly graph objects (which is pretty straightforward), and the code is below. There are six columns altogether in the dataframe: ['Rank', 'NOC', 'Gold', 'Silver', 'Bronze', 'Total']
import plotly.graph_objects as go

# olympics_data is the DataFrame described above (not shown in the question)
trace1 = go.Bar(x=olympics_data['NOC'], y=olympics_data['Gold'],
                marker=dict(color='green', opacity=0.5), name="Gold")
trace2 = go.Bar(x=olympics_data['NOC'], y=olympics_data['Silver'],
                marker=dict(color='red', opacity=0.5), name="Silver")
trace3 = go.Bar(x=olympics_data['NOC'], y=olympics_data['Bronze'],
                marker=dict(color='blue', opacity=0.5), name="Bronze")
data = [trace1, trace2, trace3]
layout = go.Layout(title="number of medals in each category for various countries",
                   xaxis=dict(title="countries"), yaxis=dict(title="number of medals"),
                   barmode="stack")
fig = go.Figure(data, layout)
fig.show()
Output:
I am expecting a similar output using plotly-express.
You can arrange your data to use px.bar() as in this link.
Alternatively, consider setting the barmode argument to 'relative'. From the documentation:
barmode (str (default 'relative')) – One of 'group', 'overlay' or 'relative'. In 'relative' mode, bars are stacked above zero for positive values and below zero for negative values. In 'overlay' mode, bars are drawn on top of one another. In 'group' mode, bars are placed beside each other.
Using overlay:
import plotly.express as px
iris = px.data.iris()
display(iris)
fig = px.histogram(iris, x='sepal_length', color='species',
nbins=19, range_x=[4,8], width=600, height=350,
opacity=0.4, marginal='box')
fig.update_layout(barmode='overlay')
fig.update_yaxes(range=[0,20],row=1, col=1)
fig.show()
Using relative:
fig.update_layout(barmode='relative')
fig.update_yaxes(range=[0,20],row=1, col=1)
fig.show()
Using group:
fig.update_layout(barmode='group')
fig.show()
Yes, Plotly Express supports both stacked and grouped bars with px.bar(). Full documentation with examples is here: https://plot.ly/python/bar-charts/
Here is a reusable function to do this.
def px_stacked_bar(df, color_name='category', y_name='y', **pxargs):
    '''Row-wise stacked bar using plotly express.
    Equivalent of `df.T.plot(kind='bar', stacked=True)`
    `df` must be single-indexed'''
    idx_col = df.index.name
    # Melt to long form: one row per (index value, original column) pair
    m = pd.melt(df.reset_index(), id_vars=idx_col, var_name=color_name, value_name=y_name)
    return px.bar(m, x=idx_col, y=y_name, color=color_name, **pxargs)
Example use
df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
'B': {0: 1, 1: 3, 2: 5},
'C': {0: 2, 1: 4, 2: 6}})
px_stacked_bar(df.set_index('A'))

How to plot a pie chart in matplotlib with 3 columns?

I need to plot a pie chart using matplotlib but my DataFrame has 3 columns namely gender, segment and total_amount.
I have tried playing with plt.pie() arguments but it only takes x and labels for data. I tried setting gender as a legend but then it doesn't look right.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'gender': {0: 'Female',
1: 'Female',
2: 'Female',
3: 'Male',
4: 'Male',
5: 'Male'},
'Segment': {0: 'Gold',
1: 'Platinum',
2: 'Silver',
3: 'Gold',
4: 'Platinum',
5: 'Silver'},
'total_amount': {0: 2110045.0,
1: 2369722.0,
2: 1897545.0,
3: 2655970.0,
4: 2096445.0,
5: 2347134.0}})
plt.pie(data=df, x="total_amount", labels="Segment")
plt.legend(df.gender)
plt.show()
The result I want is a pie chart of total_amount and its labels as gender and segment. If I can get the percentage, it will be a bonus.
I suggest the following:
# Data to plot
# Join the Segment and gender columns into one label string per wedge
labels = df["Segment"] + " " + df["gender"].map(str)
# Extract the sizes of the segments
sizes = df["total_amount"]
# Plot with labels and percentage
plt.pie(sizes, labels=labels,autopct='%1.1f%%')
plt.show()
You should get a pie chart with one wedge per segment and gender combination, each annotated with its percentage.

How do I plot two pandas DataFrames in one graph with the same colors but different line styles?

Suppose I have the following two dataframes:
df1 = pd.DataFrame(np.random.randn(100, 3),columns=['A','B','C']).cumsum()
df2 = pd.DataFrame(np.random.randn(100, 3),columns=['A','B','C']).cumsum()
My question is: how can I plot them in one graph such that:
1. The three series of df1 and df2 keep the same blue, orange and green line colors as above.
2. The three series of df1 are drawn as solid lines.
3. The three series of df2 are drawn as dashed lines.
Currently the closest thing I can get is the following:
ax = df1.plot(style=['b','y','g'])
df2.plot(ax=ax, style=['b','y','g'], linestyle='--')
Is there any way to get the color codes used by default by DataFrame.plot()? Or is there any other better approach to achieve what I want? Ideally I don't want to specify any color codes with the style parameter but always use the default colors.
Without messing with the colors themselves or transferring them from one plot to the other, you can simply reset the color cycle between your plot commands:
ax = df1.plot()
ax.set_prop_cycle(None)
df2.plot(ax=ax, linestyle="--")
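The same reset can be seen with matplotlib alone. The following is a small sketch, assuming plain NumPy arrays in place of the two DataFrames and a non-interactive Agg backend; it checks that the dashed lines pick up the same colors as the solid ones once the cycle is reset.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
a1 = rng.standard_normal((100, 3)).cumsum(axis=0)  # stands in for df1
a2 = rng.standard_normal((100, 3)).cumsum(axis=0)  # stands in for df2

fig, ax = plt.subplots()
solid = ax.plot(a1)                    # three solid lines in the default colors
ax.set_prop_cycle(None)                # restart the color cycle from its first entry
dashed = ax.plot(a2, linestyle='--')   # three dashed lines, same colors again

solid_colors = [line.get_color() for line in solid]
dashed_colors = [line.get_color() for line in dashed]
print(solid_colors == dashed_colors)
```

Without the set_prop_cycle(None) call, the second ax.plot would continue with the fourth, fifth and sixth colors of the cycle.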
You could use get_color from the lines:
df1 = pd.DataFrame(np.random.randn(100, 3),columns=['A','B','C']).cumsum()
df2 = pd.DataFrame(np.random.randn(100, 3),columns=['A','B','C']).cumsum()
ax = df1.plot()
l = ax.get_lines()
# Pass the colors as a list so they can be indexed per column
df2.plot(ax=ax, linestyle='--', color=[i.get_color() for i in l])
Output:
You can get the default color parameters that are currently being used from matplotlib.
import matplotlib.pyplot as plt
colors = list(plt.rcParams.get('axes.prop_cycle'))
[{'color': '#1f77b4'},
{'color': '#ff7f0e'},
{'color': '#2ca02c'},
{'color': '#d62728'},
{'color': '#9467bd'},
{'color': '#8c564b'},
{'color': '#e377c2'},
{'color': '#7f7f7f'},
{'color': '#bcbd22'},
{'color': '#17becf'}]
So just pass style=['#1f77b4', '#ff7f0e', '#2ca02c'] and the colors should work.
If you want to set another color cycle, say the one from older versions, then:
plt.rcParams['axes.prop_cycle'] = ("cycler('color', 'bgrcmyk')")
list(plt.rcParams['axes.prop_cycle'])
#[{'color': 'b'},
# {'color': 'g'},
# {'color': 'r'},
# {'color': 'c'},
# {'color': 'm'},
# {'color': 'y'},
# {'color': 'k'}]

Matplotlib vs PivotChart: Grouped Axis Labels

How can I format Matplotlib plots on multi-indexed data to resemble Excel's PivotChart axis layout? Excel's PivotChart feature groups similar axis labels together, whereas MPL labels each tick individually as (Index1,Index2). Using the Sample Data, I've provided the outputs for both Excel and MPL; notice how Index1 is grouped in the Excel chart, but not in the MPL plot.
data = {
'Index1': {0: 'A', 1: 'A', 2: 'B', 3: 'B'},
'Index2': {0: 1, 1: 2, 2: 1, 3: 2},
'Value': {0: 50, 1: 100, 2: 50, 3: 100}
}
Matplotlib Chart
Excel Chart
Does anyone have a solution? Ideally, the number of multi-index levels will not matter. Thanks for the help!
