Altair bar chart with bars of variable width? - python

I'm trying to use Altair in Python to make a bar chart where the bars have varying width depending on the data in a column of the source dataframe. The ultimate goal is to get a chart like this one:
The height of each bar corresponds to the marginal cost of an energy technology (given as a column in the source dataframe). The bar width corresponds to the capacity of that energy technology (also given as a column in the source dataframe). Colors are ordinal data, also from the source dataframe. The bars are sorted in increasing order of marginal cost. (A plot like this is called a "generation stack" in the energy industry.) This is easy to achieve in matplotlib, as shown in the code below:
import matplotlib.pyplot as plt
# Make fake dataset
height = [3, 12, 5, 18, 45]
bars = ('A', 'B', 'C', 'D', 'E')
# Choose the width of each bar and their positions
width = [0.1,0.2,3,1.5,0.3]
y_pos = [0,0.3,2,4.5,5.5]
# Make the plot
plt.bar(y_pos, height, width=width)
plt.xticks(y_pos, bars)
plt.show()
(code from https://python-graph-gallery.com/5-control-width-and-space-in-barplots/)
But is there a way to do this with Altair? I would want to do this with Altair so I can still get the other great features of Altair like a tooltip, selectors/bindings as I have lots of other data I want to show alongside the bar-chart.
The first 20 rows of my source data look like this (they do not exactly match the chart shown above):

In Altair, the way to do this would be to use the rect mark and construct your bars explicitly. Here is an example that mimics your data:
import altair as alt
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame({
    'MarginalCost': 100 * np.random.rand(30),
    'Capacity': 10 * np.random.rand(30),
    'Technology': np.random.choice(['SOLAR', 'THERMAL', 'WIND', 'GAS'], 30)
})
# Sort by cost, then accumulate capacity to get each bar's right edge (x1)
df = df.sort_values('MarginalCost')
df['x1'] = df['Capacity'].cumsum()
# Each bar starts where the previous one ended
df['x0'] = df['x1'].shift(fill_value=0)
alt.Chart(df).mark_rect().encode(
    x=alt.X('x0:Q', title='Capacity'),
    x2='x1',
    y=alt.Y('MarginalCost:Q', title='Marginal Cost'),
    color='Technology:N',
    tooltip=["Technology", "Capacity", "MarginalCost"]
)
To get the same result without preprocessing of the data, you can use Altair's transform syntax:
df = pd.DataFrame({
    'MarginalCost': 100 * np.random.rand(30),
    'Capacity': 10 * np.random.rand(30),
    'Technology': np.random.choice(['SOLAR', 'THERMAL', 'WIND', 'GAS'], 30)
})
alt.Chart(df).transform_window(
    x1='sum(Capacity)',
    sort=[alt.SortField('MarginalCost')]
).transform_calculate(
    x0='datum.x1 - datum.Capacity'
).mark_rect().encode(
    x=alt.X('x0:Q', title='Capacity'),
    x2='x1',
    y=alt.Y('MarginalCost:Q', title='Marginal Cost'),
    color='Technology:N',
    tooltip=["Technology", "Capacity", "MarginalCost"]
)
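To check that the two versions place the bars identically, the edge arithmetic can be reproduced by hand on a tiny table. This is only a minimal sketch: the three rows are made up for illustration, and the column names x0_shift and x0_calc are hypothetical. It computes the left edge both ways, with shift as in the first example and with x1 - Capacity as in the transform_calculate step.

```python
import pandas as pd

# Tiny illustrative dataset (values are made up for this sketch)
toy = pd.DataFrame({
    'MarginalCost': [30.0, 10.0, 20.0],
    'Capacity': [5.0, 2.0, 3.0],
    'Technology': ['GAS', 'WIND', 'SOLAR'],
})

# Sort by marginal cost, then accumulate capacity to get each bar's right edge
toy = toy.sort_values('MarginalCost').reset_index(drop=True)
toy['x1'] = toy['Capacity'].cumsum()

# Left edge, computed two ways
toy['x0_shift'] = toy['x1'].shift(fill_value=0)  # previous bar's right edge
toy['x0_calc'] = toy['x1'] - toy['Capacity']     # right edge minus own width

print(toy[['Technology', 'x0_shift', 'x0_calc', 'x1']])
```

Both columns hold the same left edges, so either approach should give rect marks that sit side by side without gaps.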

Related

How to create a wind rose or polar bar plot

I would like to write a scouting report on some football players, and for that I need visualizations, one type of which is pie charts. I need pie charts that look like the one below, with slices of different sizes (proportional to the quantity each slice represents). Can anyone suggest how to do this, or point me to a website where I can learn it?
What you are looking for is called a "Radar Pie Chart". It's analogous to the more commonly used "Radar Chart", but I think it looks better because it highlights the values rather than focusing on meaningless shapes.
The challenge you face with your football dataset is that each category is on a different scale, so you want to plot each value as a percentage of some max. My code will accomplish that, but you'll want to annotate the original values to finish off these charts.
The plot itself can be done with just the standard matplotlib library using polar axes. I borrowed code from here (https://raphaelletseng.medium.com/getting-to-know-matplotlib-and-python-docx-5ee67bad38d2).
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from math import pi
from random import random, seed
seed(12345)
# Generate dataset with 10 rows, different maxes
maxes = [5, 5, 5, 2, 2, 10, 10, 10, 10, 10]
df = pd.DataFrame(
    data={
        'categories': ['category_{}'.format(x) for x, _ in enumerate(maxes)],
        # m (not max) avoids shadowing the built-in max()
        'scores': [random() * m for m in maxes],
        'max_values': maxes,
    },
)
df['pct'] = df['scores'] / df['max_values']
df = df.set_index('categories')
# Plot pie radar chart
N = df.shape[0]
theta = np.linspace(0.0, 2*np.pi, N, endpoint=False)
categories = df.index
df['radar_angles'] = theta
ax = plt.subplot(polar=True)
ax.bar(df['radar_angles'], df['pct'], width=2*pi/N, linewidth=2, edgecolor='k', alpha=0.5)
ax.set_xticks(theta)
ax.set_xticklabels(categories)
_ = ax.set_yticklabels([])
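The answer above notes that the original values should be annotated to finish the chart. One possible way to do that is sketched below, assuming the same percentage-of-max scaling. The category names and numbers are made up for illustration, and the label format ("score / max") and the 0.05 offset are arbitrary choices, not part of the original answer.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data (made up for this sketch): raw scores and per-category maxima
categories = ['passing', 'shooting', 'tackling', 'dribbling']
scores = np.array([4.0, 1.5, 7.0, 3.0])
max_values = np.array([5.0, 2.0, 10.0, 10.0])
pct = scores / max_values  # bar length as a fraction of each category's maximum

N = len(categories)
theta = np.linspace(0.0, 2 * np.pi, N, endpoint=False)

fig = plt.figure()
ax = fig.add_subplot(polar=True)
ax.bar(theta, pct, width=2 * np.pi / N, linewidth=2, edgecolor='k', alpha=0.5)
ax.set_xticks(theta)
ax.set_xticklabels(categories)
ax.set_yticklabels([])
ax.set_ylim(0, 1.15)  # leave room for labels beyond the longest possible bar

# Write the original (unscaled) value just past the tip of each bar
for angle, frac, score, top in zip(theta, pct, scores, max_values):
    ax.text(angle, frac + 0.05, f'{score:g} / {top:g}', ha='center', va='center')
```

Call plt.show() or fig.savefig(...) afterwards to view the result.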
I have previously worked with rose (polar bar) charts. Here is an example using Plotly Express:
import plotly.express as px
df = px.data.wind()
fig = px.bar_polar(df, r="frequency", theta="direction",
color="strength", template="plotly_dark",
color_discrete_sequence= px.colors.sequential.Plasma_r)
fig.show()

How do I resize my Plotly bar height and show only bar’s edge (in subplot)?

This is my first foray into Plotly. I love the ease of use compared to matplotlib and bokeh. However, I'm stuck on some basic questions about how to beautify my plot. First, here is the code (it's fully functional, just copy and paste!):
import plotly.express as px
from plotly.subplots import make_subplots
import plotly as py
import pandas as pd
from plotly import tools
d = {'Mkt_cd': ['Mkt1','Mkt2','Mkt3','Mkt4','Mkt5','Mkt1','Mkt2','Mkt3','Mkt4','Mkt5'],
'Category': ['Apple','Orange','Grape','Mango','Orange','Mango','Apple','Grape','Apple','Orange'],
'CategoryKey': ['Mkt1Apple','Mkt2Orange','Mkt3Grape','Mkt4Mango','Mkt5Orange','Mkt1Mango','Mkt2Apple','Mkt3Grape','Mkt4Apple','Mkt5Orange'],
'Current': [15,9,20,10,20,8,10,21,18,14],
'Goal': [50,35,21,44,20,24,14,29,28,19]
}
dataset = pd.DataFrame(d)
grouped = dataset.groupby('Category', as_index=False).sum()
data = grouped.to_dict(orient='list')
v_cat = grouped['Category'].tolist()
v_current = grouped['Current']
v_goal = grouped['Goal']
fig1 = px.bar(dataset, x = v_current, y = v_cat, orientation = 'h',
color_discrete_sequence = ["#ff0000"],height=10)
fig2 = px.bar(dataset, x = v_goal, y = v_cat, orientation = 'h',height=15)
trace1 = fig1['data'][0]
trace2 = fig2['data'][0]
fig = make_subplots(rows = 1, cols = 1, shared_xaxes=True, shared_yaxes=True)
fig.add_trace(trace2, 1, 1)
fig.add_trace(trace1, 1, 1)
fig.update_layout(barmode = 'overlay')
fig.show()
Here is the Output:
Question 1: How do I make the v_current bars (shown in red) thinner? That is, they should be smaller in height, since this is a horizontal bar chart. I set the height to 10 for trace1 and 15 for trace2, but both still show at the same height.
Question 2: Is there a way to make the v_goal bars (shown in blue) show only their right edge, instead of a filled bar? Something like this:
As you may have noticed, I also added a line under each category. Is there a quick way to add this as well? Not a deal breaker, just a bonus. I'm also trying to add animation and so on, but that's for some other time!
Thanks in advance for answering!
Calling a plotly.express function returns a plotly.graph_objs._figure.Figure object. The same goes for plotly.graph_objects when you call go.Figure() together with, for example, go.Bar(). So after building a figure with Plotly Express, you can add or adjust lines and traces by referencing the figure directly, like:
fig['data'][0].width = 0.4
Which is exactly what you need to set the width of your bars. And you can easily use this in combination with plotly express:
Code 1
fig = px.bar(grouped, y='Category', x = ['Current'],
orientation = 'h', barmode='overlay', opacity = 1,
color_discrete_sequence = px.colors.qualitative.Plotly[1:])
fig['data'][0].width = 0.4
Plot 1
In order to get the bars or shapes to indicate the goal levels, you can use the approach described by DerekO, or you can use:
for i, g in enumerate(grouped.Goal):
    fig.add_shape(type="rect",
                  x0=g+1, y0=grouped.Category[i], x1=g, y1=grouped.Category[i],
                  line=dict(color='#636EFA', width=28))
Complete code:
import plotly.express as px
from plotly.subplots import make_subplots
import plotly as py
import pandas as pd
from plotly import tools
d = {'Mkt_cd': ['Mkt1','Mkt2','Mkt3','Mkt4','Mkt5','Mkt1','Mkt2','Mkt3','Mkt4','Mkt5'],
'Category': ['Apple','Orange','Grape','Mango','Orange','Mango','Apple','Grape','Apple','Orange'],
'CategoryKey': ['Mkt1Apple','Mkt2Orange','Mkt3Grape','Mkt4Mango','Mkt5Orange','Mkt1Mango','Mkt2Apple','Mkt3Grape','Mkt4Apple','Mkt5Orange'],
'Current': [15,9,20,10,20,8,10,21,18,14],
'Goal': [50,35,21,44,20,24,14,29,28,19]
}
dataset = pd.DataFrame(d)
grouped = dataset.groupby('Category', as_index=False).sum()
fig = px.bar(grouped, y='Category', x = ['Current'],
orientation = 'h', barmode='overlay', opacity = 1,
color_discrete_sequence = px.colors.qualitative.Plotly[1:])
fig['data'][0].width = 0.4
fig['data'][0].marker.line.width = 0
for i, g in enumerate(grouped.Goal):
    fig.add_shape(type="rect",
                  x0=g+1, y0=grouped.Category[i], x1=g, y1=grouped.Category[i],
                  line=dict(color='#636EFA', width=28))
f = fig.full_figure_for_development(warn=False)
fig.show()
You can use Plotly Express and then directly access the figure object as @vestland described, but personally I prefer to use graph_objects to make all of the changes in one place.
I'll also point out that since you are stacking bars in one chart, you don't need subplots. You can create a graph_object with fig = go.Figure() and add traces to get stacked bars, similar to what you already did.
For question 1, if you are using go.Bar(), you can pass a width parameter. However, this is in units of the position axis, and since your y-axis is categorical, width=1 will fill the entire category, so I have chosen width=0.25 for the red bar, and width=0.3 (slightly larger) for the blue bar since that seems like it was your intention.
For question 2, the only thing that comes to mind is a hack. Split the bars into two sections (one with height = original height - 1), and set its opacity to 0 so that it is transparent. Then place down bars of height 1 on top of the transparent bars.
If you don't want the traces to show up in the legend, you can set this individually for each bar by passing showlegend=False to go.Bar inside fig.add_trace, or hide the legend entirely by passing showlegend=False to the fig.update_layout method.
import plotly.express as px
import plotly.graph_objects as go
# from plotly.subplots import make_subplots
import plotly as py
import pandas as pd
from plotly import tools
d = {'Mkt_cd': ['Mkt1','Mkt2','Mkt3','Mkt4','Mkt5','Mkt1','Mkt2','Mkt3','Mkt4','Mkt5'],
'Category': ['Apple','Orange','Grape','Mango','Orange','Mango','Apple','Grape','Apple','Orange'],
'CategoryKey': ['Mkt1Apple','Mkt2Orange','Mkt3Grape','Mkt4Mango','Mkt5Orange','Mkt1Mango','Mkt2Apple','Mkt3Grape','Mkt4Apple','Mkt5Orange'],
'Current': [15,9,20,10,20,8,10,21,18,14],
'Goal': [50,35,21,44,20,24,14,29,28,19]
}
dataset = pd.DataFrame(d)
grouped = dataset.groupby('Category', as_index=False).sum()
data = grouped.to_dict(orient='list')
v_cat = grouped['Category'].tolist()
v_current = grouped['Current']
v_goal = grouped['Goal']
fig = go.Figure()
## you have a categorical plot and the units for width are in position axis units
## therefore width = 1 will take up the entire allotted space
## a width value of less than 1 will be the fraction of the allotted space
fig.add_trace(go.Bar(
x=v_current,
y=v_cat,
marker_color="#ff0000",
orientation='h',
width=0.25
))
## you can show the right edge of the bar by splitting it into two bars
## with the majority of the bar being transparent (opacity set to 0)
fig.add_trace(go.Bar(
x=v_goal-1,
y=v_cat,
marker_color="#ffffff",
opacity=0,
orientation='h',
width=0.30,
))
fig.add_trace(go.Bar(
x=[1]*len(v_cat),
y=v_cat,
marker_color="#1f77b4",
orientation='h',
width=0.30,
))
fig.update_layout(barmode='relative')
fig.show()

How to make aggregated point sizes bigger in Altair, Python?

I am working on a dashboard using Altair. I am creating 4 different plots using the same data. I am creating scatterplots using mark_circle.
How do I change the size to be size*2, or anything else?
Here is a sample:
bar = alt.Chart(df).mark_point(filled=True).encode(
x='AGE_GROUP:N',
y=alt.Y( 'PERP:N', axis=alt.Axis( values= df['PERP'].unique().tolist() )),
size = 'count()')
You can do this by adjusting the scale range for the size encoding. For example, this sets the range such that the smallest points have an area of 100 square pixels, and the largest have an area of 500 square pixels:
import altair as alt
import pandas as pd
import numpy as np
df = pd.DataFrame({
'x': np.random.randn(30),
'y': np.random.randn(30),
'count': np.random.randint(1, 5, 30)
})
alt.Chart(df).mark_point().encode(
x='x',
y='y',
size=alt.Size('count', scale=alt.Scale(range=[100, 500]))
)

Python - Dynamic Line Chart Marker Colours based on an another column

I'm trying to create a simple line chart in Python. For each customer, I want to show a single line showing the trend based on the value (y-axis) and date (x-axis). The marker colours should be based on the colour field.
Please find the sample data and the expected output below (sorry for my scribbling).
Sample Data Here
Expected Output
A simple way is to first use the plot function to plot the lines, for example in black, and then use scatter to plot the colored markers. Here I show an example:
I generate data similar to yours as follows (I suppose you are storing the data in a pandas dataframe):
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
columns = ['CustID', 'Date', 'Value', 'Color']
data = []
np.random.seed(42)
for cus_id in np.arange(1, 5):
    for date in ['jan', 'feb', 'mar', 'apr']:
        data.append([cus_id, date, np.random.randint(20, 35),
                     np.random.choice(['red', 'blue', 'green'])])
df = pd.DataFrame(data=data, columns=columns)
At this point, df contains this table.
Finally, to plot your line chart:
fig, ax = plt.subplots(1)
for _, df_cus in df.groupby('CustID'):
    # Black line first, then colored markers drawn on top of it
    ax.plot(df_cus.Date, df_cus.Value, color='black', zorder=0)
    ax.scatter(df_cus.Date, df_cus.Value, color=df_cus.Color, zorder=1)
Here's the output
I use zorder to make sure the scatter plots are always on top of the black lines.

Plotting multiple bars with matplotlib using ax.bar()

Following up my previous question: Sorting datetime objects by hour to a pandas dataframe then visualize to histogram
I need to plot 3 bars for one X-axis value representing viewer counts. Now they show those under one minute and above. I need one showing the overall viewers. I have the Dataframe but I can't seem to make them look right. With just 2 bars I have no problem, it looks just like I would want it with two bars:
The relevant part of the code for this:
# Time and date stamp variables
allviews = int(df['time'].dt.hour.count())
date = str(df['date'][0].date())
hours = df_hist_short.index.tolist()
hours[:] = [str(x) + ':00' for x in hours]
The hours variable that I use to represent the X-axis may be problematic, since I convert it to string so I can make the hours look like 23:00 instead of just the pandas index output 23 etc. I have seen examples where people add or subtract values from the X to change the bars position.
fig, ax = plt.subplots(figsize=(20, 5))
short_viewers = ax.bar(hours, df_hist_short['time'], width=-0.35, align='edge')
long_viewers = ax.bar(hours, df_hist_long['time'], width=0.35, align='edge')
Now I set align='edge', and the two width values have the same magnitude but opposite signs. But I have no idea how to make it look right with 3 bars. I didn't find any positioning arguments for the bars. I have also tried plt.hist(), but I couldn't get the same output as with plt.bar().
So as a result I wish to have a 3rd bar on the graph shown above on the left side, a bit wider than the other two.
pandas will do this alignment for you, if you make the bar plot in one step rather than two (or three). Consider this example (adapted from the docs to add a third bar for each animal).
import pandas as pd
import matplotlib.pyplot as plt
speed = [0.1, 17.5, 40, 48, 52, 69, 88]
lifespan = [2, 8, 70, 1.5, 25, 12, 28]
height = [1, 5, 20, 3, 30, 6, 10]
index = ['snail', 'pig', 'elephant',
'rabbit', 'giraffe', 'coyote', 'horse']
df = pd.DataFrame({'speed': speed,
'lifespan': lifespan,
'height': height}, index=index)
ax = df.plot.bar(rot=0)
plt.show()
In pure matplotlib, instead of using the width parameter to position the bars as you've done, you can adjust the x-values for your plot:
import numpy as np
import matplotlib.pyplot as plt
# Make some fake data:
n_series = 3
n_observations = 5
x = np.arange(n_observations)
data = np.random.random((n_observations,n_series))
# Plotting:
fig, ax = plt.subplots(figsize=(20,5))
# Determine bar widths
width_cluster = 0.7
width_bar = width_cluster/n_series
for n in range(n_series):
    # Shift each series by one bar width; the cluster stays centered on x
    x_positions = x+(width_bar*n)-width_cluster/2
    ax.bar(x_positions, data[:,n], width_bar, align='edge')
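Applied to the hour labels from the question, the same offset idea might look like the sketch below. It assumes numeric x positions with the hour strings restored as tick labels afterwards. The viewer counts and the series names ('all', 'short', 'long') are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative viewer counts (made up for this sketch)
hours = ['20:00', '21:00', '22:00', '23:00']
series = {
    'all': [50, 80, 65, 30],
    'short': [20, 35, 25, 10],
    'long': [30, 45, 40, 20],
}

x = np.arange(len(hours))  # numeric positions instead of string labels
width_cluster = 0.7
width_bar = width_cluster / len(series)

fig, ax = plt.subplots(figsize=(10, 4))
for n, (label, values) in enumerate(series.items()):
    # Shift each series by one bar width; the cluster stays centered on x
    x_positions = x + width_bar * n - width_cluster / 2
    ax.bar(x_positions, values, width_bar, align='edge', label=label)

# Put the readable hour strings back on the axis
ax.set_xticks(x)
ax.set_xticklabels(hours)
ax.legend()
```

Keeping the positions numeric is what makes the offsets possible; the strings are only used for display.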
In your particular case, seaborn is probably a good option. You should (almost always) try to keep your data in long form: instead of three separate data frames for short, medium and long, it is much better practice to keep a single data frame and add a column that labels each row as short, medium or long. Use this new column as the hue parameter in seaborn's barplot.
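A rough sketch of the long-form layout follows. The counts, the labels ('all', 'short', 'long'), and the helper name plot_viewers are assumptions made for illustration; in the question the per-hour counts live in separate frames such as df_hist_short and df_hist_long.

```python
import pandas as pd

# Illustrative per-hour counts (made up for this sketch), indexed by hour
short_counts = pd.Series([20, 35, 25], index=[21, 22, 23])
long_counts = pd.Series([30, 45, 40], index=[21, 22, 23])
all_counts = short_counts + long_counts

# One row per (hour, duration) pair, with a label column instead of separate frames
frames = {'all': all_counts, 'short': short_counts, 'long': long_counts}
long_df = pd.concat(
    [pd.DataFrame({'hour': s.index, 'viewers': s.values, 'duration': label})
     for label, s in frames.items()],
    ignore_index=True,
)


def plot_viewers(data):
    """Grouped bar chart: one cluster per hour, one bar per duration label."""
    import seaborn as sns  # imported here so the reshaping above runs without seaborn
    return sns.barplot(data=data, x='hour', y='viewers', hue='duration')
```

Calling plot_viewers(long_df) should then draw the three bars per hour side by side, with seaborn handling the positioning.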
