I am trying to write a function with plotly 5.9.0 that reproduces a specific type of plot. I am having trouble aligning legend entries with their subplots, especially when the figure is resizable.
This is what I currently have:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.subplots as sp
from plotly.offline import plot
def get_df(len_df):
    x = np.linspace(-1, 1, len_df)
    # Create a dictionary with the functions to use for each column
    funcs = {
        "column1": np.sin,
        "column2": np.cos,
        "column3": np.tan,
        "column4": np.arcsin,
        "column5": np.arccos,
        "column6": np.arctan
    }
    # Create an empty dataframe with the same index as x
    df = pd.DataFrame(index=pd.date_range('2022-01-01', periods=len(x), freq='H'))
    # Populate the dataframe with the functions
    for column, func in funcs.items():
        df[column] = func(x)
    return df
def plot_subplots(df, column_groups, fig_height=1000):
    # Create a figure with a grid of subplots
    fig = sp.make_subplots(rows=len(column_groups), shared_xaxes=True, shared_yaxes=True, vertical_spacing=.1)
    # Iterate over the list of column groups
    for i, group in enumerate(column_groups):
        # Iterate over the columns in the current group
        for column in group:
            # Add a scatter plot for the current column to the figure, specifying the row number
            fig.add_trace(go.Scatter(x=df.index, y=df[column], mode="lines", name=column, legendgroup=str(i)), row=i + 1, col=1)
    fig.update_layout(legend_tracegroupgap=fig_height/len(column_groups), height=fig_height)
    return fig
df = get_df(1000)
column_groups = [
['column1', 'column3'],
['column2', 'column4'],
['column5', 'column6']
]
fig = plot_subplots(df, column_groups)
plot(fig)
This produces a plot that looks like this:
How do I align my legend subgroups with the top of each corresponding plotly subplot?
If we can somehow relate the legend_tracegroupgap to the height of the figure that would be a great first step. This feels like such a logical thing to want that I feel like I'm missing something.
In reply to r-beginners:
I tried this:
tracegroupgap=(fig.layout.yaxis.domain[1] - fig.layout.yaxis.domain[0])*fig_height
This works perfectly for a figure height of 1000 pixels, but not for a height of 500 pixels. My guess is that I still have to subtract some value related to the vertical spacing.
There are few functions in plotly that allow strict size definitions other than figure size. The position of the legend in a subplot can also only be set by setting the spacing between legend groups as a pixel value (the default is 10px). So I used a function provided for development to check the area of the subplot.
dev_fig = fig.full_figure_for_development()
'yaxis': {
    ...
    'domain': [0.7333333333333334, 1],
    ...
}
'yaxis2': {
    ...
    'domain': [0.3666666666666667, 0.6333333333333333],
    ...
}
'yaxis3': {
    ...
    'domain': [0, 0.26666666666666666],
    ...
}
fig.update_layout(legend_tracegroupgap=266, height=fig_height)
Each subplot's domain spans about 0.266 of the vertical range, so with a figure height of 1000px I set the gap to 266. However, this does not mean that we have derived a perfect value. I am sure other factors still affect this, and I hope to get answers from actual developers and others.
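As a rough check on where those domain values come from, here is a minimal sketch in plain Python. The formula is an assumption inferred from the numbers reported above (equal-height rows separated by vertical_spacing, top row first), not a quotation of plotly's source, and the helper name subplot_domains is made up for illustration.

```python
def subplot_domains(rows, vertical_spacing):
    """Return [bottom, top] domains as fractions of the plot area, top row first."""
    # Height of a single subplot once the gaps between rows are taken out
    height = (1.0 - vertical_spacing * (rows - 1)) / rows
    domains = []
    for r in range(rows):
        # Row 0 is the top subplot, so positions are counted from the bottom up
        bottom = (rows - 1 - r) * (height + vertical_spacing)
        domains.append([bottom, bottom + height])
    return domains


domains = subplot_domains(rows=3, vertical_spacing=0.1)
for d in domains:
    print([round(v, 4) for v in d])
```

With rows=3 and vertical_spacing=0.1 this should reproduce the three domains listed above, up to floating-point rounding.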
The question has been updated to cover a figure height of 500px.
The default margins are 100px at the top and 80px at the bottom, so set them to 0.
def plot_subplots(df, column_groups, fig_height=500):
    # Create a figure with a grid of subplots
    fig = sp.make_subplots(rows=len(column_groups), shared_xaxes=True, shared_yaxes=True, vertical_spacing=.1)
    # Iterate over the list of column groups
    for i, group in enumerate(column_groups):
        # Iterate over the columns in the current group
        for column in group:
            # Add a scatter plot for the current column to the figure, specifying the row number
            fig.add_trace(go.Scatter(x=df.index, y=df[column], mode="lines", name=column, legendgroup=str(i)), row=i + 1, col=1)
    # Pixel height of one subplot: its domain span times the figure height
    tracegroupgap = (fig.layout.yaxis.domain[1] - fig.layout.yaxis.domain[0])*fig_height
    print(fig.layout.yaxis.domain[0], fig.layout.yaxis.domain[1])
    print(tracegroupgap)
    # Remove the default margins so the domain maps onto the full figure height
    fig.update_layout(margin=dict(t=0,b=0,l=0,r=0))
    fig.update_layout(legend_tracegroupgap=tracegroupgap, height=fig_height)#fig_height/len(column_groups)
    return fig
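The margin point can be made explicit with a little arithmetic. The sketch below is only an estimate under stated assumptions: axis domains are treated as fractions of the plot area (figure height minus top and bottom margins), and the gap between legend groups is taken as the distance between the tops of two adjacent subplots minus the height of one legend group. The function name and the entry_height_px value are hypothetical; the real entry height depends on font and theme and would need to be measured.

```python
def estimate_tracegroupgap(domain_upper, domain_lower, fig_height,
                           margin_top=100, margin_bottom=80,
                           entries_per_group=2, entry_height_px=19):
    """Estimate a legend_tracegroupgap value in pixels (illustrative only)."""
    # Domains are fractions of the plot area, not of the whole figure
    plot_area = fig_height - margin_top - margin_bottom
    # Distance between the tops of two vertically adjacent subplots
    pitch = (domain_upper[1] - domain_lower[1]) * plot_area
    # A legend group already occupies some height, so only the remainder is gap
    return pitch - entries_per_group * entry_height_px


top = [0.7333333333333334, 1]
middle = [0.3666666666666667, 0.6333333333333333]

# Figure height of 500px with all margins set to 0, as in the code above
gap = estimate_tracegroupgap(top, middle, fig_height=500,
                             margin_top=0, margin_bottom=0)
print(round(gap, 1))
```

The result could be passed as legend_tracegroupgap, then adjusted by eye, since the assumed entry height is a guess rather than a plotly constant.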
I am a bit new to Python, and I am playing with a dummy dataset to get some Python data manipulation practice. Below is the code for generating the dummy data:
import pandas as pd
d = {
'SeniorCitizen': [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0] ,
'CollegeDegree': [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1] ,
'Married': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1] ,
'FulltimeJob': [1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1] ,
'DistancefromBranch': [7,9,14,20,21,12,22,25,9,9,9,12,13,14,16,25,27,4,14,14,20,19,15,23,2] ,
'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0] }
CarWash = pd.DataFrame(data = d)
categoricals = ['SeniorCitizen','CollegeDegree','Married','FulltimeJob','ReversedPayment']
numerical = ['DistancefromBranch']
CarWash[categoricals] = CarWash[categoricals].astype('category')
I am basically struggling with a couple of things:
#1. A stacked barplot with absolute values (like the excel example below)
#2. A stacked barplot with percentage values (like the excel example below)
Below are my target visualizations for # 1 and # 2 using countplot().
#1
#2
For # 1, countplot() gives me a clustered barplot (like below) rather than a stacked one, and the annotation snippet feels more like a workaround than idiomatic Python.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Looping through each categorical column and viewing target variable distribution (ReversedPayment) by value
figure, axes = plt.subplots(2,2,figsize = (10,10))
for i,ax in zip(categoricals[:-1],axes.flatten()):
    sns.countplot(x= i, hue = 'ReversedPayment', data = CarWash, ax = ax)
    for p in ax.patches:
        height = np.nan_to_num(p.get_height()) # gets the height of each patch/bar
        adjust = np.nan_to_num(p.get_width())/2 # a calculation for adjusting the data label later
        label_xy = (np.nan_to_num(p.get_x()) + adjust,np.nan_to_num(p.get_height()) + adjust) #x,y coordinates where we want to put our data label
        ax.annotate(height,label_xy) # final annotation
For # 2, I tried creating a new data frame housing % values but that felt tedious and error-prone.
I feel an option like stacked = True, proportion = True, axis = 1, annotate = True could have been so useful for countplot() to have.
Are there any other libraries that would be more straightforward and less code-intensive? Any comments or suggestions are welcome.
In this case, I think plotly.express may be more intuitive for you.
import plotly.express as px
df_temp = CarWash.groupby(['SeniorCitizen', 'ReversedPayment'])['DistancefromBranch'].count().reset_index().rename({'DistancefromBranch':'count'}, axis=1)
fig = px.bar(df_temp, x="SeniorCitizen", y="count", color="ReversedPayment", title="SeniorCitizen", text='count')
fig.update_traces(textposition='inside')
fig.show()
Basically, if you want more flexibility to adjust your charts, it is hard to avoid writing a lot of code.
I also tried using matplotlib and pandas to create a stacked bar chart for percentages. If you are interested, you can try it.
sns.set()
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=[12,8], dpi=100)
# Convert the axes matrix to a 1-d array
axes = ax.flatten()
for i, col in enumerate(['SeniorCitizen', 'CollegeDegree', 'Married', 'FulltimeJob']):
    # Count ReversedPayment values within each category of the current column
    df_temp = (CarWash.groupby(col)['ReversedPayment']
               .value_counts()
               .unstack(1).fillna(0)
               .rename({0:'No', 1:'Yes'})
               .rename({0:'No', 1:'Yes'}, axis=1))
    # Normalize each row so that every stacked bar sums to 100%
    df_temp = df_temp.div(df_temp.sum(axis=1), axis=0)
    df_temp.plot.bar(stacked=True, ax=axes[i])
    axes[i].set_title(col, y=1.03, fontsize=16)
    rects = axes[i].patches
    # Patches are drawn column by column, so flatten the values in column-major order
    labels = df_temp.values.flatten(order='F')
    for rect, label in zip(rects, labels):
        if label == 0: continue
        axes[i].text(rect.get_x() + rect.get_width() / 2, rect.get_y() + rect.get_height() / 3, '{:.2%}'.format(label),
                     ha='center', va='bottom', color='white', fontsize=12)
    axes[i].legend(title='Reversed\nPayment', bbox_to_anchor=(1.05, 1), loc='upper left', title_fontsize = 10, fontsize=10)
    axes[i].tick_params(rotation=0)
plt.tight_layout()
plt.show()
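As a side note on the "tedious and error-prone" percentage table from the question, pandas can build both the count table and the row-percentage table in one call each with pd.crosstab. This is a minimal sketch on a small made-up frame (the values are illustrative, not the CarWash data); it only shows the table step, not the plotting.

```python
import pandas as pd

# Small made-up sample with the same column names as the question
df = pd.DataFrame({
    "SeniorCitizen":   [0, 1, 0, 0, 1, 1, 0, 0],
    "ReversedPayment": [0, 0, 1, 0, 1, 0, 0, 1],
})

# Absolute counts: one row per SeniorCitizen value, one column per ReversedPayment value
counts = pd.crosstab(df["SeniorCitizen"], df["ReversedPayment"])

# Row-wise shares: normalize="index" makes every row sum to 1
shares = pd.crosstab(df["SeniorCitizen"], df["ReversedPayment"], normalize="index")

print(counts)
print(shares.round(3))
```

Either table can then be drawn as a stacked chart with counts.plot.bar(stacked=True) or shares.plot.bar(stacked=True).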
I would like to write a scout report on some football players, and for that I need visualizations, one type of which is pie charts. I need pie charts that look like the one below, with slices of different sizes (proportional to the number each slice represents). Can anyone suggest how to do this, or link to websites where I can learn it?
What you are looking for is called a "Radar Pie Chart". It's analogous to the more commonly used "Radar Chart", but I think it looks better as it highlights the values, rather than focus on meaningless shapes.
The challenge you face with your football dataset is that each category is on a different scale, so you want to plot each value as a percentage of some max. My code will accomplish that, but you'll want to annotate the original values to finish off these charts.
The plot itself can be done with just the standard matplotlib library using polar axes. I borrowed code from here (https://raphaelletseng.medium.com/getting-to-know-matplotlib-and-python-docx-5ee67bad38d2).
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from math import pi
from random import random, seed
seed(12345)
# Generate dataset with 10 rows, different maxes
maxes = [5, 5, 5, 2, 2, 10, 10, 10, 10, 10]
df = pd.DataFrame(
data = {
'categories': ['category_{}'.format(x) for x, _ in enumerate(maxes)],
'scores': [random()*max for max in maxes],
'max_values': maxes,
},
)
df['pct'] = df['scores'] / df['max_values']
df = df.set_index('categories')
# Plot pie radar chart
N = df.shape[0]
theta = np.linspace(0.0, 2*np.pi, N, endpoint=False)
categories = df.index
df['radar_angles'] = theta
ax = plt.subplot(polar=True)
ax.bar(df['radar_angles'], df['pct'], width=2*pi/N, linewidth=2, edgecolor='k', alpha=0.5)
ax.set_xticks(theta)
ax.set_xticklabels(categories)
_ = ax.set_yticklabels([])
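The annotation step mentioned above (writing the original values onto the slices) could look roughly like this. It is a sketch under assumptions: the category names and numbers are invented for illustration, and placing each label halfway up its slice is just one possible layout.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical scores and per-category maxima, for illustration only
categories = ["passes", "shots", "tackles", "dribbles"]
scores = np.array([42.0, 3.0, 7.0, 5.0])
max_values = np.array([60.0, 5.0, 10.0, 10.0])
pct = scores / max_values  # each value as a fraction of its own maximum

n = len(categories)
theta = np.linspace(0.0, 2 * np.pi, n, endpoint=False)

fig = plt.figure()
ax = fig.add_subplot(polar=True)
ax.bar(theta, pct, width=2 * np.pi / n, linewidth=2, edgecolor="k", alpha=0.5)
ax.set_xticks(theta)
ax.set_xticklabels(categories)
ax.set_yticklabels([])
ax.set_ylim(0, 1)

# Write the original (unscaled) value halfway up each slice
for angle, height, score in zip(theta, pct, scores):
    ax.text(angle, height / 2, f"{score:g}", ha="center", va="center")
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() to see the figure.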
I have previously worked with rose (polar bar) charts. Here is an example.
import plotly.express as px
df = px.data.wind()
fig = px.bar_polar(df, r="frequency", theta="direction",
color="strength", template="plotly_dark",
color_discrete_sequence= px.colors.sequential.Plasma_r)
fig.show()
This is my first foray into Plotly. I love the ease of use compared to matplotlib and bokeh. However, I'm stuck on some basic questions about how to beautify my plot. First, this is the code below (it's fully functional, just copy and paste!):
import plotly.express as px
from plotly.subplots import make_subplots
import plotly as py
import pandas as pd
from plotly import tools
d = {'Mkt_cd': ['Mkt1','Mkt2','Mkt3','Mkt4','Mkt5','Mkt1','Mkt2','Mkt3','Mkt4','Mkt5'],
'Category': ['Apple','Orange','Grape','Mango','Orange','Mango','Apple','Grape','Apple','Orange'],
'CategoryKey': ['Mkt1Apple','Mkt2Orange','Mkt3Grape','Mkt4Mango','Mkt5Orange','Mkt1Mango','Mkt2Apple','Mkt3Grape','Mkt4Apple','Mkt5Orange'],
'Current': [15,9,20,10,20,8,10,21,18,14],
'Goal': [50,35,21,44,20,24,14,29,28,19]
}
dataset = pd.DataFrame(d)
grouped = dataset.groupby('Category', as_index=False).sum()
data = grouped.to_dict(orient='list')
v_cat = grouped['Category'].tolist()
v_current = grouped['Current']
v_goal = grouped['Goal']
fig1 = px.bar(dataset, x = v_current, y = v_cat, orientation = 'h',
color_discrete_sequence = ["#ff0000"],height=10)
fig2 = px.bar(dataset, x = v_goal, y = v_cat, orientation = 'h',height=15)
trace1 = fig1['data'][0]
trace2 = fig2['data'][0]
fig = make_subplots(rows = 1, cols = 1, shared_xaxes=True, shared_yaxes=True)
fig.add_trace(trace2, 1, 1)
fig.add_trace(trace1, 1, 1)
fig.update_layout(barmode = 'overlay')
fig.show()
Here is the Output:
Question1: how do I make the width of v_current (shown in red bar) smaller? As in, it should be smaller in height since this is a horizontal bar. I added the height as 10 for trace1 and 15 for trace2, but they are still showing at the same heights.
Question2: Is there a way to make v_goal (shown as the blue bar) show only its right edge, instead of a filled bar? Something like this:
If you noticed, I also added a line under each of the category. Is there a quick way to add this as well? Not a deal breaker, just a bonus. Other things I'm trying to do is add animation, etc but that's for some other time!
Thanks in advance for answering!
Running plotly.express will return a plotly.graph_objs._figure.Figure object. The same goes for plotly.graph_objects when you run go.Figure() together with, for example, go.Bar(). So after building a figure using plotly express, you can add lines or traces through references directly to the figure, like:
fig['data'][0].width = 0.4
Which is exactly what you need to set the width of your bars. And you can easily use this in combination with plotly express:
Code 1
fig = px.bar(grouped, y='Category', x = ['Current'],
orientation = 'h', barmode='overlay', opacity = 1,
color_discrete_sequence = px.colors.qualitative.Plotly[1:])
fig['data'][0].width = 0.4
Plot 1
In order to get the bars or shapes to indicate the goal levels, you can use the approach described by DerekO, or you can use:
for i, g in enumerate(grouped.Goal):
    fig.add_shape(type="rect",
                  x0=g+1, y0=grouped.Category[i], x1=g, y1=grouped.Category[i],
                  line=dict(color='#636EFA', width = 28))
Complete code:
import plotly.express as px
from plotly.subplots import make_subplots
import plotly as py
import pandas as pd
from plotly import tools
d = {'Mkt_cd': ['Mkt1','Mkt2','Mkt3','Mkt4','Mkt5','Mkt1','Mkt2','Mkt3','Mkt4','Mkt5'],
'Category': ['Apple','Orange','Grape','Mango','Orange','Mango','Apple','Grape','Apple','Orange'],
'CategoryKey': ['Mkt1Apple','Mkt2Orange','Mkt3Grape','Mkt4Mango','Mkt5Orange','Mkt1Mango','Mkt2Apple','Mkt3Grape','Mkt4Apple','Mkt5Orange'],
'Current': [15,9,20,10,20,8,10,21,18,14],
'Goal': [50,35,21,44,20,24,14,29,28,19]
}
dataset = pd.DataFrame(d)
grouped = dataset.groupby('Category', as_index=False).sum()
fig = px.bar(grouped, y='Category', x = ['Current'],
orientation = 'h', barmode='overlay', opacity = 1,
color_discrete_sequence = px.colors.qualitative.Plotly[1:])
fig['data'][0].width = 0.4
fig['data'][0].marker.line.width = 0
for i, g in enumerate(grouped.Goal):
    fig.add_shape(type="rect",
                  x0=g+1, y0=grouped.Category[i], x1=g, y1=grouped.Category[i],
                  line=dict(color='#636EFA', width = 28))
f = fig.full_figure_for_development(warn=False)
fig.show()
You can use Plotly Express and then directly access the figure object as @vestland described, but personally I prefer to use graph_objects to make all of the changes in one place.
I'll also point out that since you are stacking bars in one chart, you don't need subplots. You can create a graph_object with fig = go.Figure() and add traces to get stacked bars, similar to what you already did.
For question 1, if you are using go.Bar(), you can pass a width parameter. However, this is in units of the position axis, and since your y-axis is categorical, width=1 will fill the entire category, so I have chosen width=0.25 for the red bar, and width=0.3 (slightly larger) for the blue bar since that seems like it was your intention.
For question 2, the only thing that comes to mind is a hack. Split the bars into two sections (one with height = original height - 1), and set its opacity to 0 so that it is transparent. Then place down bars of height 1 on top of the transparent bars.
If you don't want the traces to show up in the legend, you can set this individually for each bar by passing showlegend=False to fig.add_trace, or hide the legend entirely by passing showlegend=False to the fig.update_layout method.
import plotly.express as px
import plotly.graph_objects as go
# from plotly.subplots import make_subplots
import plotly as py
import pandas as pd
from plotly import tools
d = {'Mkt_cd': ['Mkt1','Mkt2','Mkt3','Mkt4','Mkt5','Mkt1','Mkt2','Mkt3','Mkt4','Mkt5'],
'Category': ['Apple','Orange','Grape','Mango','Orange','Mango','Apple','Grape','Apple','Orange'],
'CategoryKey': ['Mkt1Apple','Mkt2Orange','Mkt3Grape','Mkt4Mango','Mkt5Orange','Mkt1Mango','Mkt2Apple','Mkt3Grape','Mkt4Apple','Mkt5Orange'],
'Current': [15,9,20,10,20,8,10,21,18,14],
'Goal': [50,35,21,44,20,24,14,29,28,19]
}
dataset = pd.DataFrame(d)
grouped = dataset.groupby('Category', as_index=False).sum()
data = grouped.to_dict(orient='list')
v_cat = grouped['Category'].tolist()
v_current = grouped['Current']
v_goal = grouped['Goal']
fig = go.Figure()
## you have a categorical plot and the units for width are in position axis units
## therefore width = 1 will take up the entire allotted space
## a width value of less than 1 will be the fraction of the allotted space
fig.add_trace(go.Bar(
x=v_current,
y=v_cat,
marker_color="#ff0000",
orientation='h',
width=0.25
))
## you can show the right edge of the bar by splitting it into two bars
## with the majority of the bar being transparent (opacity set to 0)
fig.add_trace(go.Bar(
x=v_goal-1,
y=v_cat,
marker_color="#ffffff",
opacity=0,
orientation='h',
width=0.30,
))
fig.add_trace(go.Bar(
x=[1]*len(v_cat),
y=v_cat,
marker_color="#1f77b4",
orientation='h',
width=0.30,
))
fig.update_layout(barmode='relative')
fig.show()
Is there a way to iteratively plot data using seaborn's sns.boxplot() without having the boxplots overlap? (without combining datasets into a single pd.DataFrame())
Background
When comparing datasets that differ (e.g. in size or shape), it is often useful to bin them by a different shared variable (via pd.cut() and df.groupby(), as shown below).
Previously, I have iteratively plotted these "binned" data as boxplots on the same axis by looping over separate DataFrames with matplotlib's ax.boxplot() (providing y axis location values as the positions argument to ensure the boxplots don't overlap).
Example
Below is a simplified example that shows the overlapping plots when using sns.boxplot():
import seaborn as sns
import random
import pandas as pd
import matplotlib.pyplot as plt
# Get the tips dataset and select a subset as an example
tips = sns.load_dataset("tips")
variable_to_bin_by = 'tip'
binned_variable = 'total_bill'
df = tips[[binned_variable, variable_to_bin_by] ]
# Create a second dataframe with different values and shape
df2 = pd.concat( [ df.copy() ] *5 )
# Use pseudo-random numbers to convey that df2 is different to df
scale = [ random.uniform(0,2) for i in range(len(df2[binned_variable])) ]
df2[ binned_variable ] = df2[binned_variable].values * scale * 5
dfs = [ df, df2 ]
# Group the data by a list of bins
bins = [0, 1, 2, 3, 4]
for n, df in enumerate( dfs ):
    gdf = df.groupby( pd.cut(df[variable_to_bin_by].values, bins ) )
    data = [ i[1][binned_variable].values for i in gdf]
    dfs[n] = pd.DataFrame( data, index = bins[:-1])
# Create an axis for both DataFrames to be plotted on
fig, ax = plt.subplots()
# Loop the DataFrames and plot
colors = ['red', 'black']
for n in range(2):
    ax = sns.boxplot( data=dfs[n].T, ax=ax, width=0.2, orient='h',
                      color=colors[n] )
plt.ylabel( variable_to_bin_by )
plt.xlabel( binned_variable )
plt.show()
More detail
I realise the simplified example above could be resolved by combining the DataFrames and providing the hue argument to sns.boxplot().
Updating the index of the DataFrames provided also doesn't help, as the y values from the last DataFrame provided are then used.
Providing the kwargs argument (e.g. kwargs={'positions': dfs[n].T.index}) won't work, as this raises a TypeError:
TypeError: boxplot() got multiple values for keyword argument 'positions'
Setting sns.boxplot()'s dodge argument to True doesn't solve this either.
Funnily enough, the "hack" that I proposed earlier today in this answer could be applied here.
It complicates the code a bit because seaborn expects a long-form dataframe instead of a wide-form to use hue-nesting.
# Get the tips dataset and select a subset as an example
tips = sns.load_dataset("tips")
df = tips[['total_bill', 'tip'] ]
# Group the data by
bins = [0, 1, 2, 3, 4]
gdf = df.groupby( pd.cut(df['tip'].values, bins ) )
data = [ i[1]['total_bill'].values for i in gdf]
df = pd.DataFrame( data , index = bins[:-1]).T
dfm = df.melt() # create a long-form dataframe
dfm.loc[:,'dummy'] = 'dummy'
# Create a second, slightly different, DataFrame
dfm2 = dfm.copy()
dfm2.value = dfm.value*2
dfs = [ dfm, dfm2 ]
colors = ['red', 'black']
hue_orders = [['dummy','other'], ['other','dummy']]
# Create an axis for both DataFrames to be plotted on
fig, ax = plt.subplots()
# Loop the DataFrames and plot
for n in range(2):
    ax = sns.boxplot( data=dfs[n], x='value', y='variable', hue='dummy', hue_order=hue_orders[n], ax=ax, width=0.2, orient='h',
                      color=colors[n] )
ax.legend_.remove()
plt.show()
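For comparison, the plain-matplotlib approach mentioned in the question (passing positions to ax.boxplot() so the boxes sit side by side) can be sketched as follows. The data here is randomly generated for illustration, the boxes are vertical rather than horizontal to keep the call minimal, and the 0.15 offset is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
bins = [0, 1, 2, 3]

# Two made-up datasets of different size: one array of samples per bin
groups_a = [rng.normal(10 + b, 2, size=30) for b in bins]
groups_b = [rng.normal(20 + b, 5, size=150) for b in bins]

fig, ax = plt.subplots()
offset = 0.15  # shift each dataset to one side of the bin position
results = []
for groups, shift, color in [(groups_a, -offset, "red"), (groups_b, offset, "black")]:
    # manage_ticks=False stops each call from resetting the axis ticks and limits
    bp = ax.boxplot(groups, positions=[b + shift for b in bins],
                    widths=0.2, manage_ticks=False)
    for line in bp["boxes"] + bp["medians"] + bp["whiskers"] + bp["caps"]:
        line.set_color(color)
    results.append(bp)

# Label the bins once, at the un-shifted positions
ax.set_xticks(bins)
ax.set_xlabel("bin")
ax.set_ylabel("value")
```

This sidesteps the hue trick entirely, at the cost of losing seaborn's styling.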