I have the code below, but the selection does not work as expected. When a point is selected, the other lines are deselected, but the points of the selected line also disappear.
I must be doing something wrong. Is this the proper way to add a selection to a layered chart?
import altair as alt
from vega_datasets import data
source = data.stocks()
selection1 = alt.selection_single()
line = alt.Chart(source).mark_line().encode(
    x='date',
    y='price',
    #color='symbol',
    color=alt.condition(selection1, 'symbol', alt.value('grey')),
    opacity=alt.condition(selection1, alt.value(0.8), alt.value(0.1)),
)
point = line.mark_point(size=40, fill='white')
alt.layer(line, point).add_selection(selection1)
By default, the selection selects only the data directly associated with the mark you click on. If you want it to apply to a larger set of data, you can specify fields or encodings. In your case, it sounds like you want it to apply to all data with the same symbol, so you can do this:
selection1 = alt.selection_single(fields=['symbol'])
or, since your symbol maps to color in all cases, equivalently you can do this:
selection1 = alt.selection_single(encodings=['color'])
Is there a way to access the label/title of a datum when using it in a Repeat?
I would like to change the Y-axis title by applying a custom function to the default label (simple example below).
I know I can change the columns in the dataframe or do a manual loop to get the result, but I wonder if there is a direct way using repeat.
import altair as alt
from vega_datasets import data
def replace(label):
    label = str(label)
    return label.replace('_', ' ')
source = data.cars()
alt.Chart(source).mark_circle(size=60).encode(
    x='Horsepower',
    y=alt.Y(alt.repeat('repeat'), type='quantitative', title=replace(alt.datum.label)),
    color='Origin',
).repeat(
    repeat=['Miles_per_Gallon', 'Weight_in_lbs'],
)
This produces: [image of the resulting chart]
I would like to make the mark_rule (significance level) adjustable. I tried using code that reads user input and replacing the value 0.05 in the rule with that input, but the chart turned out weird.
There are two things I would like help with:
1. Make the mark_rule change through user input (top priority).
2. Make the color of the bars (factors) below the mark_rule change (optional).
I have tried many approaches; so far, I can only make the mark_rule move on mouseover, which is not exactly what I want.
Any help would be very much appreciated.
import pandas as pd
import altair as alt
Sheet2 = 'P-value'
df = pd.read_excel('Life Expectancy Data- Clean.xlsx', sheet_name=Sheet2)
highlight = alt.selection(type='single', on='mouseover',
                          fields=['Factor'], nearest=True, empty="none")
bar = alt.Chart(df).mark_bar(strokeWidth=5, stroke="steelblue", strokeOpacity=0.1).encode(
    x=alt.X('Factor:O', sort='y'),
    y=alt.Y('P-value:Q'),
    tooltip=[alt.Tooltip('Factor:O'), alt.Tooltip('P-value:Q', format='.4f')],
    color=alt.condition(
        highlight,
        alt.value("orange"),
        alt.value("steelblue"))
).add_selection(
    highlight
)
rule = alt.Chart(pd.DataFrame({'y': [0.05]})).mark_rule(color='red').encode(y='y')
alt.layer(
    bar, rule
).properties(
    title='Factors that Contribute to Life Expectancy in Malaysia',
    width=500, height=300
)
[Image: current graph]
Building upon the example in the Altair docs, you could do something like this, which gives you a slider that controls the position of the rule and colors the bars differently depending on whether they are below or above the slider value:
import altair as alt
import pandas as pd
import numpy as np
rand = np.random.RandomState(42)
df = pd.DataFrame({
    'xval': range(10),
    'yval': rand.randn(10).cumsum()
})
slider = alt.binding_range(min=0, max=5, step=0.5, name='cutoff:')
selector = alt.selection_single(name="SelectorName", bind=slider, init={'cutoff': 2.5})
rule = alt.Chart().mark_rule().transform_calculate(
    rule='SelectorName.cutoff'
).encode(
    # Take the mean to avoid creating multiple lines on top of each other
    y='mean(rule):Q',
)
bars = alt.Chart(df).mark_bar().encode(
    x='xval:O',
    y='yval',
    color=alt.condition(
        alt.datum.yval < selector.cutoff,
        alt.value('coral'), alt.value('steelblue')
    )
).add_selection(
    selector
)
bars + rule
I have made two interactive linked histograms with crossfiltering using Altair, and I have added labels over each bar showing the sum of the filtered items using mark_text(). [Image: the two linked charts]
I would like to learn how to concatenate a string in the label so that it looks something like 'error: 481.1'. This is the relevant code snippet:
crossfilter = alt.selection(type='interval', encodings=['x'])
tick_up = alt.Chart().transform_filter(crossfilter).mark_tick(color='black').encode(
    x=x,
    y=alt.Y(plus_error + ':Q', aggregate='sum'))
text = tick_up.mark_text(
    align='left',
    baseline='middle',
    dx=12
).encode(text='sum(error):Q')
When I try to change .encode(text='sum(error):Q') to something like .encode(text='"error": sum(error):Q'), it throws an error. Is there a simple way to do this? I have also tried using transform_calculate, but I need something that adapts to what is selected by the crossfilter.
I would also like to learn how to create a label with multiple lines, for example:
curr_error: 123.1
previous_error: 110
You can use + in transform_calculate to concatenate a string and a number. I believe you will also need transform_joinaggregate to have your sum accessible in the calculation transformation. Something like this:
import altair as alt
from vega_datasets import data
source = data.cars()
chart = alt.Chart(source).mark_text(align='left', dx=2).encode(
    x='sum(Horsepower)',
    y='Origin',
    text='label:N'
).transform_joinaggregate(
    sum_hp='sum(Horsepower)',
    groupby=['Origin']
).transform_calculate(
    label="'error: ' + datum.sum_hp"
)
chart.mark_bar() + chart
This also works with a selection, but make sure the transform_filter is before the other transforms so that only the selected data points are used for the calculations:
import altair as alt
from vega_datasets import data
source = data.cars()
brush = alt.selection_interval()
scatter = alt.Chart(source).mark_point().encode(
    x='Horsepower',
    y='Weight_in_lbs',
    color='Origin'
).add_selection(
    brush
)
bars = alt.Chart(source).mark_text(align='left', dx=2).encode(
    x='sum(Horsepower)',
    y='Origin',
    text='label:N'
).transform_filter(
    brush
).transform_joinaggregate(
    sum_hp='sum(Horsepower)',
    groupby=['Origin']
).transform_calculate(
    label="'error: ' + datum.sum_hp"
)
scatter & (bars.mark_bar() + bars)
I have two concatenated charts built on the same DF. The left one shows a density transform of one data column, the right one shows a scatter plot of aggregates of other data columns.
I would like to do an interval selection on the left side and filter transform the right side accordingly. No matter what I select, however, the right side loses all data points.
Can anyone see what I am doing wrong here?
import altair as alt
from vega_datasets import data
source = data.iris()
brush = alt.selection(type='interval', encodings=['x'])
PDFs = alt.Chart(source
).transform_density(
    'sepalWidth',
    as_=['size', 'density'],
    groupby=['species']
).mark_line().encode(
    x='size:Q',
    y='density:Q',
    color='species'
).add_selection(
    brush
)
Scatter = alt.Chart(source
).transform_aggregate(
    Frequency='count()',
    petalL_mean='mean(petalLength)',
    petalW_mean='mean(petalWidth)',
    sepalL_mean='mean(sepalLength)',
    groupby=['species']
).transform_calculate(
    Value='datum.Frequency / (datum.petalL_mean * datum.petalW_mean)'
).mark_point().encode(
    x='sepalL_mean:Q',
    y='Value:Q',
    color='species'
).transform_filter(
    brush
)
PDFs | Scatter
Interval selections cannot yet be used on aggregate charts in Vega-Lite. The error behavior has been updated in a recent PR to Vega-Lite to show a helpful message.
Is it possible to format the values within a tooltip for a boxplot? From this Vega documentation, it appears so, but I can't quite figure out how to do it with Altair for Python:
from vega_datasets import data
import altair as alt
source = data.population.url
alt.Chart(source).mark_boxplot().encode(
    alt.X("age:O"),
    alt.Y("people:Q"),
    tooltip=[
        alt.Tooltip("people:Q", format=",.2f"),
    ],
)
I believe you need to provide an aggregation for composite marks like mark_boxplot. This works:
from vega_datasets import data
import altair as alt
source = data.population.url
alt.Chart(source).mark_boxplot().encode(
    alt.X("age:O"),
    alt.Y("people:Q"),
    tooltip=alt.Tooltip("mean(people):Q", format=",.2f"),
)
Update: since it is currently impossible to add multiple aggregated tooltips to a boxplot, I combined my answer with the one to "How to change Altair boxplot infobox to display mean rather than median?" to put a transparent box with a custom tooltip on top of the boxplot. I still kept the boxplot underneath so that the outliers and whiskers are plotted as a Tukey boxplot instead of min-max. I also added a point for the mean, since that is what I wanted to see in the tooltip:
alt.Chart(source).mark_boxplot(median={'color': '#353535'}).encode(
    alt.X("age:O"),
    alt.Y("people:Q"),
    tooltip=[
        alt.Tooltip("people:Q", format=",.2f"),
    ],
) + alt.Chart(source).mark_circle(color='#353535', size=15).encode(
    x='age:O',
    y='mean(people):Q'
) + alt.Chart(source).transform_aggregate(
    min="min(people)",
    max="max(people)",
    mean="mean(people)",
    median="median(people)",
    q1="q1(people)",
    q3="q3(people)",
    count="count()",
    groupby=['age']
).mark_bar(opacity=0).encode(
    x='age:O',
    y='q1:Q',
    y2='q3:Q',
    tooltip=alt.Tooltip(['min:Q', 'q1:Q', 'mean:Q', 'median:Q', 'q3:Q', 'max:Q', 'count:Q'], format='.1f')
)
You can add multiple columns to the tooltip by passing them as a list in square brackets:
import altair as alt
from vega_datasets import data
stocks = data.stocks()
alt.Chart(stocks).mark_point().transform_window(
    # Previous price of the same symbol, ordered by date
    previous_price='lag(price)',
    groupby=['symbol'],
    sort=[alt.SortField('date')]
).transform_calculate(
    pct_change='(datum.price - datum.previous_price) / datum.previous_price'
).encode(
    x='date',
    y='price',
    color='symbol',
    tooltip=['price', 'symbol', alt.Tooltip('pct_change:Q', format='.1%')]
)