Rolling average on a layered faceted chart in Altair - python

I have successfully gotten layers to work in faceted charts and a rolling average to work in layered charts. I now want to combine the two, i.e. have a rolling average in a layered faceted chart.
Combining the two in the obvious way gives me an error:
Javascript Error: Cannot read property 'concat' of undefined
This usually means there's a typo in your chart specification. See the javascript console for the full traceback.
Code (gives the above error):
import altair as alt
import pandas as pd
# Data Preparation
df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
idf = df[df['Country/Region'] == 'India']
idf = idf[df.columns[4:]]
idf = idf.T
idf = idf.reset_index()
idf.columns = ['day', 'case']
idf['country'] = 'india'
gdf = df[df['Country/Region'] == 'Germany']
gdf = gdf[df.columns[4:]]
gdf = gdf.T
gdf = gdf.reset_index()
gdf.columns = ['day', 'case']
gdf['country'] = 'germany'
fdf = pd.concat([idf,gdf])
# Charting
a = alt.Chart().mark_bar(opacity=0.5).encode(
x='day:T',
y='case:Q'
)
c = alt.Chart().mark_line().transform_window(
rolling_mean='mean(case:Q)',
frame=[-7, 0]
).encode(
x='day:T',
y='rolling_mean:Q'
)
alt.layer(a, c, data=fdf).facet(alt.Column('country', sort=alt.EncodingSortField('case', op='max', order='descending')))
If you remove the transform_window and replace y='rolling_mean:Q' with y='case:Q', you get a layered faceted chart. It is this chart on which I want a 7-day rolling average.

You should replace your window transform with this:
.transform_window(
rolling_mean='mean(case)',
frame=[-7, 0],
groupby=['country']
)
There were two issues with your original transform:
Type shorthands are only used in encodings, never in transforms. When you wrote mean(case:Q), you were specifying a rolling mean of the field named "case:Q", which does not exist.
Since you are faceting by country, you need to group by country when computing the rolling mean.
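To make the grouping point concrete, here is a minimal pure-Python sketch of what a window mean with frame=[-7, 0] and groupby=['country'] computes. It is not Altair's or Vega-Lite's implementation; the helper name rolling_mean_by_group and the toy rows are made up for illustration, and the rows are assumed to be in chronological order within each country. Note that a frame of [-7, 0] covers the current row plus the seven rows before it, so eight rows in total; if you want exactly seven days, frame=[-6, 0] may be closer to what you are after.

```python
from collections import defaultdict


def rolling_mean_by_group(rows, field, group, frame=(-7, 0)):
    """Hypothetical helper: mean of `field` over a row-offset frame, per group.

    Rows are assumed to be in chronological order within each group.
    """
    lo, hi = frame
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group]].append(row)

    out = []
    for members in by_group.values():
        for i, row in enumerate(members):
            # Clamp the frame to rows that exist inside this group only.
            window = members[max(0, i + lo): i + hi + 1]
            mean = sum(r[field] for r in window) / len(window)
            out.append({**row, "rolling_mean": mean})
    return out


# Toy data (assumption): ten days per country, stacked like pd.concat([idf, gdf]).
rows = (
    [{"country": "india", "day": d, "case": 10 * d} for d in range(1, 11)]
    + [{"country": "germany", "day": d, "case": 1000} for d in range(1, 11)]
)
result = rolling_mean_by_group(rows, "case", "country")
```

Without the per-country split, the first Germany rows would average in the last India rows, because the two countries are simply stacked one after the other in the concatenated data.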

Try passing a sort to transform_window, for example sort=[{'field': 'date'}] (use the name of your own date field; in the question it is 'day'):
https://vega.github.io/vega-lite/docs/window.html#cumulative-frequency-distribution
Or:
https://altair-viz.github.io/gallery/scatter_marginal_hist.html
https://altair-viz.github.io/gallery/layered_chart_with_dual_axis.html#layered-chart-with-dual-axis
https://altair-viz.github.io/gallery/parallel_coordinates.html#parallel-coordinates-example
import altair as alt
from vega_datasets import data
source = data.iris()
alt.Chart(source).transform_window(
index='count()'
).transform_fold(
['petalLength', 'petalWidth', 'sepalLength', 'sepalWidth']
).mark_line().encode(
x='key:N',
y='value:Q',
color='species:N',
detail='index:N',
opacity=alt.value(0.5)
).properties(width=500)
https://altair-viz.github.io/user_guide/compound_charts.html?highlight=repeat#horizontal-concatenation
import altair as alt
from vega_datasets import data
iris = data.iris.url
chart1 = alt.Chart(iris).mark_point().encode(
x='petalLength:Q',
y='petalWidth:Q',
color='species:N'
).properties(
height=300,
width=300
)
chart2 = alt.Chart(iris).mark_bar().encode(
x='count()',
y=alt.Y('petalWidth:Q', bin=alt.Bin(maxbins=30)),
color='species:N'
).properties(
height=300,
width=100
)
# Place the two charts side by side
chart1 | chart2

Related

Combining string and aggregate function within bar chart label in Altair

I have made two interactive linked histograms with crossfiltering using Altair. I have created labels over each bar showing the sum of the filtered items by using mark_text().
I would like to learn how to concatenate a string into the label so that it looks something like 'error: 481.1'. This is the relevant code snippet:
crossfilter = alt.selection(type='interval', encodings=['x'])
tick_up = alt.Chart().transform_filter(crossfilter).mark_tick(color='black').encode(
x = x,
y = alt.Y(plus_error+':Q', aggregate='sum'))
text = tick_up.mark_text(
align='left',
baseline='middle',
dx=12
).encode(text='sum(error):Q')
When I try to change .encode(text='sum(error):Q') to something like .encode(text='"error": sum(error):Q'), it throws an error. Is there a simple way to do this? I have also tried using transform_calculate, but I need something that adapts to what is being selected by crossfilter.
I would also like to learn how to create a label with multiple lines, for example:
curr_error: 123.1
previous_error: 110
You can use + in transform_calculate to concatenate a string and a number. I believe you will also need transform_joinaggregate to have your sum accessible in the calculation transformation. Something like this:
import altair as alt
from vega_datasets import data
source = data.cars()
chart = alt.Chart(source).mark_text(align='left', dx=2).encode(
x='sum(Horsepower)',
y='Origin',
text='label:N'
).transform_joinaggregate(
sum_hp = 'sum(Horsepower)',
groupby=['Origin']
).transform_calculate(
label = "'error: ' + datum.sum_hp"
)
chart.mark_bar() + chart
This also works with a selection, but make sure the transform_filter is before the other transforms so that only the selected data points are used for the calculations:
import altair as alt
from vega_datasets import data
source = data.cars()
brush = alt.selection_interval()
scatter = alt.Chart(source).mark_point().encode(
x='Horsepower',
y='Weight_in_lbs',
color='Origin'
).add_selection(
brush
)
bars = alt.Chart(source).mark_text(align='left', dx=2).encode(
x='sum(Horsepower)',
y='Origin',
text='label:N'
).transform_filter(
brush
).transform_joinaggregate(
sum_hp = 'sum(Horsepower)',
groupby=['Origin']
).transform_calculate(
label = "'error: ' + datum.sum_hp"
)
scatter & bars.mark_bar() + bars

Altair interval selection in concatenated charts with 'density', 'aggregate' and 'calculate' transforms

I have two concatenated charts built on the same DF. The left one shows a density transform of one data column, the right one shows a scatter plot of aggregates of other data columns.
I would like to do an interval selection on the left side and filter transform the right side accordingly. No matter what I select, however, the right side loses all data points.
Can anyone see what I am doing wrong here?
import altair as alt
from vega_datasets import data
source = data.iris()
brush = alt.selection(type='interval', encodings=['x'])
PDFs = alt.Chart(source
).transform_density(
'sepalWidth',
as_=['size','density'],
groupby=['species']
).mark_line().encode(
x='size:Q',
y='density:Q',
color='species'
).add_selection(
brush
)
Scatter = alt.Chart(source
).transform_aggregate(
Frequency = 'count()',
petalL_mean = 'mean(petalLength)',
petalW_mean = 'mean(petalWidth)',
sepalL_mean = 'mean(sepalLength)',
groupby = ['species']
).transform_calculate(
Value = 'datum.Frequency / (datum.petalL_mean * datum.petalW_mean)'
).mark_point().encode(
x = 'sepalL_mean:Q',
y = 'Value:Q',
color='species'
).transform_filter(
brush
)
PDFs | Scatter
Interval selection cannot be used for aggregate charts yet in Vega-Lite. The error behavior has been updated in a recent PR to Vega-Lite to show a helpful message.

Merge two legends in altair

I have a scatter plot in altair where I am representing a column using both shape and color. I would like to have a single legend with both pieces of information, but instead I am getting two legends, one for shape and another for color.
The code is as follows. See this notebook for a reproducible example (you will need to enter your Google credentials to load the data).
import altair as alt
alt.themes.enable('fivethirtyeight')
selection = alt.selection_multi(fields=['Domain'], bind='legend')
chart = alt.Chart(df, width=1100, height=600,
title="Parameter count of ML systems through time")\
.mark_point(size=120, filled=False).encode(
x=alt.X('Publication date:T'),
y=alt.Y('Parameters:Q',
scale=alt.Scale(type='log', domain=(1, 3e13)),
axis=alt.Axis(format=".1e")),
color=alt.Color('Domain',
sort=['Vision', 'Language', 'Games', 'Other'],
legend=alt.Legend(
values = ['Vision', 'Language', 'Games', 'Other'],),),
shape = alt.Shape('Domain'),#, legend=None),
tooltip=['System',
'Reference',
'Publication date',
alt.Tooltip('Parameters', format=".1e"),
'Domain'],
opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
)
regression = chart.transform_regression(
on="Publication date",
regression="Parameters",
method = 'exp',
groupby=["Domain"],
).mark_line(point=False, strokeDash=[10,5], clip=True)
alt.layer(chart.add_selection(selection), regression).configure_axis(
labelFontSize=20,titleFontSize=30).configure_legend(
titleFontSize=20,
labelFontSize =18,
gradientLength=400,
gradientThickness=30,
symbolSize = 130,
)
How can I merge both legends into a single one?
You can set the legend to None in the line chart for shape and color and then use resolve_scale as per the comments on the question:
import altair as alt
from vega_datasets import data
df = data.cars()
selection = alt.selection_multi(fields=['Origin'], bind='legend')
chart = alt.Chart(df).mark_point(filled=False).encode(
x=alt.X('Acceleration'),
y=alt.Y('Horsepower',scale=alt.Scale(type='log'), axis=alt.Axis(format=".1e")),
color='Origin',
shape='Origin',
opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
)
regression = chart.transform_regression(
on="Acceleration", regression="Horsepower", groupby=["Origin"]
).mark_line(
).encode(color=alt.Color('Origin', legend=None), shape=alt.Shape('Origin', legend=None))
(alt.layer(chart, regression)
.resolve_scale(shape='independent', color='independent')
.add_selection(selection))

Altair mark_text labels where domain in x-axis is linked

I was trying to add text labels to an Altair chart whose x-axis domain is linked to an interval selected in another chart. I noticed that the text produced by mark_text() is not shown completely at the last points of the chart when the x-axis domain is set to the selected interval. I also don't know how to specify the format so that dates are shown as yyyy-mm or month-year (I don't want to display the day).
Another thing I noticed is that the tooltip doesn't show at all when the x-axis domain of the chart is linked to an interval selected in another chart, which is why I used mark_text().
The code I'm using is the following:
import altair as alt
from vega_datasets import data
nearest = alt.selection_single(nearest=True, on='mouseover',
encodings=['x','y'], empty='none')
interval = alt.selection_interval(encodings=['x'])
weather = data.seattle_weather()
base = alt.Chart(weather).mark_rule(size=2).encode(
x='date:T')
chart = base.mark_line().encode(
x=alt.X('date:T', scale=alt.Scale(domain=interval.ref())),
y='temp_max:Q',).properties(
width=800,
height=300)
text=base.mark_text(align='left', dx=5, dy=5).encode(
y='temp_max:Q',
text=alt.condition(nearest, 'label:N', alt.value(' '))
).transform_calculate(label='"Date: " + format(datum.date, "") '
).properties(selection=nearest,width=800,
height=300)
point=base.mark_point().encode(y='temp_max:Q',opacity=alt.condition(nearest, alt.value(1), alt.value(0)))
view = base.mark_line().add_selection(
interval).properties(width=800, height=20)
(point+text+chart) &view
It looks like you're trying to create a tooltip using a layer, and this is the cause of many of the problems you're having. Have you considered using the tooltip encoding?
import altair as alt
from vega_datasets import data
nearest = alt.selection_single(nearest=True, on='mouseover',
encodings=['x','y'], empty='none')
interval = alt.selection_interval(encodings=['x'])
weather = data.seattle_weather()
line = alt.Chart(weather).mark_line().encode(
x=alt.X('date:T', scale=alt.Scale(domain=interval)),
y='temp_max:Q'
).properties(
width=800,
height=200
)
point = line.mark_point().encode(
tooltip='yearmonth(date):N',
opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_selection(nearest)
view = alt.Chart(weather).mark_line().encode(
x='date:T',
).properties(
width=800,
height=20
).add_selection(interval)
(point + line) & view

Using color on bar chart with Altair seems to prevent zero=False on scale from having anticipated effect

The first chart from the code below (based on this: https://altair-viz.github.io/gallery/us_population_over_time_facet.html) does not force the Y-axis to begin at zero, as anticipated. But in the second chart, which includes a color in the encoding, the zero=False in alt.Scale no longer seems to be respected.
Edit: forgot to mention using Altair 4.1.0
import altair as alt
from vega_datasets import data
import pandas as pd
source = data.population.url
df = pd.read_json(source)
df = df[df["age"] <= 40]
alt.Chart(df).mark_bar().encode(
x="age:O",
y=alt.Y(
"sum(people):Q",
title="Population",
axis=alt.Axis(format="~s"),
scale=alt.Scale(zero=False),
),
facet=alt.Facet("year:O", columns=5),
).resolve_scale(y="independent").properties(
title="US Age Distribution By Year", width=90, height=80
)
alt.Chart(df).mark_bar().encode(
x="age:O",
y=alt.Y(
"sum(people):Q",
title="Population",
axis=alt.Axis(format="~s"),
scale=alt.Scale(zero=False),
),
facet=alt.Facet("year:O", columns=5),
color=alt.Color("year"),
).resolve_scale(y="independent").properties(
title="US Age Distribution By Year", width=90, height=80
)
This happens because the scales are automatically adjusted to show all the groups in the variable you are coloring by. It is easier to understand if we look at a single barplot with stacked colors:
import altair as alt
from vega_datasets import data
import pandas as pd
source = data.population.url
df = pd.read_json(source)
df = df[df["age"] <= 40]
alt.Chart(df.query('year < 1880')).mark_bar().encode(
x="age:O",
y=alt.Y(
"sum(people):Q",
axis=alt.Axis(format="~s"),
scale=alt.Scale(zero=False)),
color=alt.Color("year"))
You are calculating the sum, which means that all the years are going to be somewhere in that bar, stacked on top of each other. Altair / Vega-Lite expands the axis so that it includes all groups in your colored variable.
If you instead color by age, the axis again expands to include all the colored groups, but because they are now not at the bottom of each bar, the axis is cut above zero.
import altair as alt
from vega_datasets import data
import pandas as pd
source = data.population.url
df = pd.read_json(source)
df = df[df["age"] <= 40]
alt.Chart(df.query('year < 1880')).mark_bar().encode(
x="age:O",
y=alt.Y(
"sum(people):Q",
axis=alt.Axis(format="~s"),
scale=alt.Scale(zero=False)),
color=alt.Color("age"))
The only discrepancy is why it doesn't just show the tip of the darkest color in the first plot and cut around 2M. I am not sure about that off the top of my head.
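As a rough numerical model of the stacking explanation, the sketch below compares the y range needed to show every stacked segment with the range needed to show only the bar tops. It is a simplified pure-Python illustration, not Vega-Lite's actual scale logic; the helper stacked_edges and the population figures are hypothetical. In this model the first segment of every stack starts at zero, so a domain that covers all segments necessarily contains zero, which might also be why the axis is not cut at around 2M.

```python
def stacked_edges(values):
    """Return the start/end positions of segments stacked from zero.

    Simplified model for illustration only.
    """
    edges = [0]
    for value in values:
        edges.append(edges[-1] + value)
    return edges


# Hypothetical counts: {age: {year: people}}.
people = {
    0: {1850: 1_480_000, 1860: 2_120_000, 1870: 2_800_000},
    5: {1850: 1_410_000, 1860: 1_920_000, 1870: 2_400_000},
}

# Bars without a colour split: only the bar tops (the sums) have to be visible.
totals = {age: sum(by_year.values()) for age, by_year in people.items()}
unstacked_domain = (min(totals.values()), max(totals.values()))

# color="year": every segment of every bar has to be visible.
all_edges = [
    edge
    for by_year in people.values()
    for edge in stacked_edges(by_year.values())
]
stacked_domain = (min(all_edges), max(all_edges))
```

Under these assumptions, unstacked_domain starts well above zero while stacked_domain starts at zero, matching the difference between the two charts in the question.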
