I have made two interactive linked histograms with crossfiltering using Altair. I have created labels over each bar with the sum of the filtered items by using mark_text(). Below is an image of the linked charts:
I would like to learn how to concatenate a string into the label so that it looks something like 'error: 481.1'. This is the relevant code snippet:
crossfilter = alt.selection(type='interval', encodings=['x'])
tick_up = alt.Chart().transform_filter(crossfilter).mark_tick(color='black').encode(
x = x,
y = alt.Y(plus_error+':Q', aggregate='sum'))
text = tick_up.mark_text(
align='left',
baseline='middle',
dx=12
).encode(text='sum(error):Q')
When I try to change .encode(text='sum(error):Q') to something like .encode(text='"error": sum(error):Q'), it throws an error. Is there a simple way to do this? I have also tried using transform_calculate, but I need something that adapts to what is being selected by crossfilter.
I would also like to learn how to create a label with multiple lines, for example:
curr_error: 123.1
previous_error: 110
You can use + in transform_calculate to concatenate a string and a number. I believe you will also need transform_joinaggregate to have your sum accessible in the calculation transformation. Something like this:
import altair as alt
from vega_datasets import data
source = data.cars()
chart = alt.Chart(source).mark_text(align='left', dx=2).encode(
x='sum(Horsepower)',
y='Origin',
text='label:N'
).transform_joinaggregate(
sum_hp = 'sum(Horsepower)',
groupby=['Origin']
).transform_calculate(
label = "'error: ' + datum.sum_hp"
)
chart.mark_bar() + chart
This also works with a selection, but make sure the transform_filter is before the other transforms so that only the selected data points are used for the calculations:
import altair as alt
from vega_datasets import data
source = data.cars()
brush = alt.selection_interval()
scatter = alt.Chart(source).mark_point().encode(
x='Horsepower',
y='Weight_in_lbs',
color='Origin'
).add_selection(
brush
)
bars = alt.Chart(source).mark_text(align='left', dx=2).encode(
x='sum(Horsepower)',
y='Origin',
text='label:N'
).transform_filter(
brush
).transform_joinaggregate(
sum_hp = 'sum(Horsepower)',
groupby=['Origin']
).transform_calculate(
label = "'error: ' + datum.sum_hp"
)
scatter & bars.mark_bar() + bars
Is there a way to access the label/title of a datum when using it in a Repeat?
I would like to change the Y-axis title by applying a custom function to the default label (simple example below).
I know I can change the columns in the dataframe or do a manual loop to get the result, but I wonder if there is a direct way using repeat.
import altair as alt
from vega_datasets import data
def replace(label):
    label = str(label)
    return label.replace('_', ' ')
source = data.cars()
alt.Chart(source).mark_circle(size=60).encode(
x='Horsepower',
y=alt.Y(alt.repeat('repeat'), type='quantitative', title=replace(alt.datum.label)),
color='Origin',
).repeat(
repeat=['Miles_per_Gallon', 'Weight_in_lbs'],
)
becomes:
I would like to make the mark_rule (the significance level) adjustable. I tried to do this with user-input code, changing the value in the rule from 0.05 to 'user input', but the chart turned out weird.
There are two things that I would like to ask for help with:
Make the mark_rule change through user input (top priority)
Make the color of the bars (factors) below the mark_rule change (optional)
I have tried many approaches. So far, I can only make the mark_rule move on mouseover, which is not exactly what I want.
Any help would be very much appreciated.
import pandas as pd
import altair as alt
Sheet2 = 'P-value'
df = pd.read_excel('Life Expectancy Data- Clean.xlsx', sheet_name=Sheet2)
highlight = alt.selection(type='single', on='mouseover',
fields=['Factor'], nearest=True, empty="none")
bar = alt.Chart(df).mark_bar(strokeWidth=5, stroke="steelblue", strokeOpacity=0.1).encode(
x = alt.X('Factor:O', sort='y'),
y = alt.Y('P-value:Q'),
tooltip = [alt.Tooltip('Factor:O'),alt.Tooltip('P-value:Q',format='.4f')],
color= alt.condition(
highlight,
alt.value("orange"),
alt.value("steelblue"))
).add_selection(
highlight
)
rule = alt.Chart(pd.DataFrame({'y': [0.05]})).mark_rule(color='red').encode(y='y')
alt.layer(
bar, rule
).properties(
title='Factors that Contribute to Life Expectancy in Malaysia',
width=500, height=300
)
Current graph
Building upon the example in the Altair docs, you could do something like this, which gives you a slider that controls the position of the rule and colors the bars differently depending on whether they are above or below the slider value:
import altair as alt
import pandas as pd
import numpy as np
rand = np.random.RandomState(42)
df = pd.DataFrame({
'xval': range(10),
'yval': rand.randn(10).cumsum()
})
slider = alt.binding_range(min=0, max=5, step=0.5, name='cutoff:')
selector = alt.selection_single(name="SelectorName", bind=slider, init={'cutoff': 2.5})
rule = alt.Chart().mark_rule().transform_calculate(
rule='SelectorName.cutoff'
).encode(
# Take the mean to avoid creating multiple lines on top of each other
y='mean(rule):Q',
)
bars = alt.Chart(df).mark_bar().encode(
x='xval:O',
y='yval',
color=alt.condition(
alt.datum.yval < selector.cutoff,
alt.value('coral'), alt.value('steelblue')
)
).add_selection(
selector
)
bars + rule
I have two concatenated charts built on the same DF. The left one shows a density transform of one data column, the right one shows a scatter plot of aggregates of other data columns.
I would like to do an interval selection on the left side and filter transform the right side accordingly. No matter what I select, however, the right side loses all data points.
Can anyone see what I am doing wrong here?
import altair as alt
from vega_datasets import data
source = data.iris()
brush = alt.selection(type='interval', encodings=['x'])
PDFs = alt.Chart(source
).transform_density(
'sepalWidth',
as_=['size','density'],
groupby=['species']
).mark_line().encode(
x='size:Q',
y='density:Q',
color='species'
).add_selection(
brush
)
Scatter = alt.Chart(source
).transform_aggregate(
Frequency = 'count()',
petalL_mean = 'mean(petalLength)',
petalW_mean = 'mean(petalWidth)',
sepalL_mean = 'mean(sepalLength)',
groupby = ['species']
).transform_calculate(
Value = 'datum.Frequency / (datum.petalL_mean * datum.petalW_mean)'
).mark_point().encode(
x = 'sepalL_mean:Q',
y = 'Value:Q',
color='species'
).transform_filter(
brush
)
PDFs | Scatter
Interval selection cannot be used for aggregate charts yet in Vega-Lite. The error behavior has been updated in a recent PR to Vega-Lite to show a helpful message.
Is it possible to format the values within a tooltip for a boxplot? From this Vega documentation, it appears so, but I can't quite figure out how to do it with Altair for Python.
from vega_datasets import data
import altair as alt
source = data.population.url
alt.Chart(source).mark_boxplot().encode(
alt.X("age:O"),
alt.Y("people:Q"),
tooltip=[
alt.Tooltip("people:Q", format=",.2f"),
],
)
I believe you need to provide an aggregation for composite marks like mark_boxplot. This works:
from vega_datasets import data
import altair as alt
source = data.population.url
alt.Chart(source).mark_boxplot().encode(
alt.X("age:O"),
alt.Y("people:Q"),
tooltip=alt.Tooltip("mean(people):Q", format=",.2f"),)
Update: As it is currently impossible to add multiple aggregated tooltips to a boxplot, I combined my answer with How to change Altair boxplot infobox to display mean rather than median? to put a transparent box with a custom tooltip on top of the boxplot. I still kept the boxplot underneath in order to have the outliers and whiskers plotted as a Tukey boxplot instead of min-max. I also added a point for the mean, since this is what I wanted to see in the tooltip:
alt.Chart(source).mark_boxplot(median={'color': '#353535'}).encode(
alt.X("age:O"),
alt.Y("people:Q"),
tooltip=[
alt.Tooltip("people:Q", format=",.2f"),
],
) + alt.Chart(source).mark_circle(color='#353535', size=15).encode(
x='age:O',
y='mean(people):Q'
) + alt.Chart(source).transform_aggregate(
min="min(people)",
max="max(people)",
mean="mean(people)",
median="median(people)",
q1="q1(people)",
q3="q3(people)",
count="count()",
groupby=['age']
).mark_bar(opacity=0).encode(
x='age:O',
y='q1:Q',
y2='q3:Q',
tooltip=alt.Tooltip(['min:Q', 'q1:Q', 'mean:Q', 'median:Q', 'q3:Q', 'max:Q', 'count:Q'], format='.1f')
)
There is a way to add multiple columns to the tooltip. You can pass in multiple columns in square brackets as a list.
import altair as alt
from vega_datasets import data
stocks = data.stocks()
alt.Chart(stocks).mark_point().transform_window(
previous_price = 'lag(price)'
).transform_calculate(
pct_change = '(datum.price - datum.previous_price) / datum.previous_price'
).encode(
x='date',
y='price',
color='symbol',
tooltip=[ 'price', 'symbol', alt.Tooltip('pct_change:Q', format='.1%')]
)
Can Altair plot bands on the y axis, similar to this Highcharts example?
The docs have an example showing how to draw a line on the y axis, but adapting the example to use mark_rect to draw a band instead doesn't quite work:
import altair as alt
from vega_datasets import data
weather = data.seattle_weather.url
chart = alt.Chart(weather).encode(
alt.X("date:T")
)
bars = chart.mark_bar().encode(
y='precipitation:Q'
)
band = chart.mark_rect().encode(
y=alt.value(20),
y2=alt.value(50),
color=alt.value('firebrick')
)
alt.layer(bars, band)
I think the problem with alt.value is that the value you give is specified in pixels, measured from the top of the graph: it is not mapped to the data.
In the initial answer, mark_rule didn't create a clean band but a lot of vertical stripes, so here is a way to plot a band correctly.
The first solution is to create a brand new data frame for the band and layer it on top of the bars:
import altair as alt
import pandas as pd
from vega_datasets import data
weather = data('seattle_weather')
band_df = pd.DataFrame([{'x_min': weather.date.min(),
'x_max': weather.date.max(),
'y_min': 20,
'y_max': 50}])
bars = alt.Chart(weather).mark_bar().encode(
x=alt.X('date:T'),
y=alt.Y('precipitation:Q', title="Precipitation")
)
band_2 = alt.Chart(band_df).mark_rect(color='firebrick', opacity=0.3).encode(
x='x_min:T',
x2='x_max:T',
y='y_min:Q',
y2='y_max:Q'
)
alt.layer(bars, band_2)
The second option, if you do not want to or cannot create a dataframe, is to use transform_calculate for the y limits and to specify x and x2 in the band chart as the minimum and maximum date:
bars = alt.Chart().mark_bar().encode(
x=alt.X('date:T', title='Date'),
y=alt.Y('precipitation:Q', title="Precipitation")
)
band_3 = alt.Chart().mark_rect(color='firebrick', opacity=0.3).encode(
x='min(date):T',
x2='max(date):T',
y='y_min:Q',
y2='y_max:Q'
).transform_calculate(y_min='20', y_max='50')
alt.layer(bars, band_3, data=data.seattle_weather.url)
Initial answer
I would do two things to mimic the Highcharts example you gave. First, use a transform_calculate to set y_min and y_max values. Second, use mark_rule so that the band spans the X axis wherever there are values. (I also added some opacity and changed the order of the layers so that the band is behind the bars.)
import altair as alt
from vega_datasets import data
weather = data.seattle_weather.url
chart = alt.Chart().encode(
alt.X("date:T")
)
bars = chart.mark_bar().encode(
y='precipitation:Q'
)
band = chart.mark_rule(color='firebrick',
opacity=0.3).encode(
y=alt.Y('y_min:Q'),
y2=alt.Y2('y_max:Q')
).transform_calculate(y_min="20",
y_max="50")
alt.layer(band, bars, data=weather)