altair dynamic combination of selection conditions - python

I'm trying to create a chart where it is possible to select combinations of different columns of the data by toggling checkboxes. However I don't always want to display the checkboxes for all the columns. So I want to add the selections to the chart in a 'dynamic' way.
What I want to accomplish is to make a pre-selection of the categories I want to visualize (this is done before the altair chart is created). These categories are then added as checkboxes in altair. However, the only way I could find to do this is by hardcoding them, like the "sel1[0] & sel1[1] & sel1[2] & sel1[3] & sel1[4]" in the code below:
sel1 = [
    alt.selection_single(
        bind=alt.binding_checkbox(name=field),
        fields=[field],
        init={field: False}
    )
    for field in category_selection
]
transform_args = {str(col): f'toBoolean(datum.{col})' for col in category_selection}
alt.Chart(df1).transform_calculate(**transform_args).mark_point(filled=True).encode(
    x='dim1',
    y='dim2',
    opacity=alt.condition(
        sel1[0] & sel1[1] & sel1[2] & sel1[3] & sel1[4],
        alt.value(1), alt.value(0)
    )
).add_selection(
    *sel1
)
I have tried doing it like this:
alt.Chart(df1).transform_calculate(**transform_args).mark_point(filled=True).encode(
    x='dim1',
    y='dim2',
    opacity=alt.condition(
        {'and': sel[:2]},
        alt.value(1), alt.value(0)
    )
).add_selection(
    *sel1[:2]
)
But that does not work.
I can't seem to figure out how to achieve something like this with altair. Could someone provide an example on how to do this with checkboxes or help me find another method to achieve the same thing?
TLDR: I basically want to support a variable amount of categories that also supports the ability to create combinations of the categories.
EDIT: Tried to make it clearer what I'm trying to achieve with the code.

It sounds like you want to write the equivalent of this without knowing the length of sel:
sel = [alt.selection_single() for i in range(3)]
combined = sel[0] & sel[1] & sel[2]
For Python operators in general, you can do so like this:
import operator
import functools
combined = functools.reduce(operator.and_, sel)
In Altair, you can alternatively construct the resulting Vega-Lite specification directly:
combined = {"selection": {"and": [s.name for s in sel]}}
Any of these three approaches should lead to identical results when used within an Altair chart.
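As a rough illustration of the functools.reduce approach without needing Altair installed, here is a minimal sketch. The Sel class and the combine_all helper are hypothetical stand-ins invented for this example (they are not part of Altair); Sel only mimics an object that overloads the & operator, which is all that reduce relies on.

```python
import functools
import operator


class Sel:
    """Hypothetical stand-in for a selection object; NOT Altair's class."""

    def __init__(self, expr):
        self.expr = expr

    def __and__(self, other):
        # Mimic how `a & b` builds a composed predicate from two operands.
        return Sel(f"({self.expr} & {other.expr})")


def combine_all(selections):
    """Fold a non-empty list of selections with `&`, left to right."""
    return functools.reduce(operator.and_, selections)


sel = [Sel(name) for name in ["s0", "s1", "s2"]]
combined = combine_all(sel)
print(combined.expr)
```

Note that functools.reduce raises a TypeError on an empty list when no initial value is given, so the list of selections should contain at least one entry.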

Related

Altair interactive filter on multiple values of one column

Here is the code of my plot.
input_dropdown = alt.binding_select(options=[None]+all_ids, name='Series ID', labels=["All"]+all_ids)
selection = alt.selection_single(fields=['ID'], bind=input_dropdown)
chart = alt.Chart(source_df).mark_line().encode(
    x=alt.X('Date:T', title='Date'),
    y=alt.Y('Value:Q', title='Value'),
    color=alt.Color('ID:N', title='Series ID'),
    strokeDash='Type:N'
).properties(
    width=700
).add_selection(
    selection
).transform_filter(
    selection
)
st.altair_chart(chart)
Currently I can filter data displayed by choosing one value of ID column.
What should I do to filter by multiple ID values?
Something like: show me the data for both ids '1' and '2'.
This is currently not possible via a widget like a dropdown because it is not implemented in the underlying Vega and Vega-Lite libraries. You could use another chart as a selection element, or maybe use a streamlit component, since it looks like your code is using that library already.

Display additional values in holoviews sankey labels or hover information

I would like to find a way to modify the labels on holoviews sankey diagrams so that they show the percentage values in addition to the numerical values.
For example:
import holoviews as hv
import pandas as pd
hv.extension('bokeh')
data = {'A':['XX','XY','YY','XY','XX','XX'],
        'B':['RR','KK','KK','RR','RK','KK'],
        'values':[10,5,8,15,19,1]}
df = pd.DataFrame(data, columns=['A','B','values'])
sankey = hv.Sankey(df)
For 'From' label 'YY' which is 'YY - 8' change this to 'YY - 8 (13.7%)' - add the additional percentage in there.
I have found ways to change from the absolute value to percentage by using something along the lines of:
value_dim = hv.Dimension('Percentage', unit='%')
But can't find a way to have both values in the label.
Additionally, I tried to modify the hover tag. In my search to find ways to modify this I found ways to reference and display various attributes in the hover information (through the bokeh tooltips) but it does not seem like you can manipulate this information.
This post explains two possible ways to achieve the wanted result. Let's start with the example DataFrame and the necessary imports.
import holoviews as hv
from holoviews import opts, dim  # only needed for option 2
import pandas as pd
data = {'A':['XX','XY','YY','XY','XX','XX'],
        'B':['RR','KK','KK','RR','RK','KK'],
        'values':[10,5,8,15,19,1],
        }
df = pd.DataFrame(data)
1. Option
Use hv.Dimension(spec, **params), which lets you attach a formatter to a column name through the value_format keyword. This formatter simply combines the value with its share of the total in percent.
total = df['values'].sum()
def fmt(x):
    # multiply by 100 so the share is shown as a percentage, not a fraction
    return f'{x} ({round(100 * x / total, 1)}%)'
hv.Sankey(df, vdims=hv.Dimension('values', value_format=fmt))
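To check the formatter on its own, independent of holoviews, a minimal sketch could look like the following. It reuses the 'values' column of the example as a plain list; the helper name make_formatter is an assumption made for this illustration and is not a holoviews function.

```python
# Values copied from the 'values' column of the example DataFrame.
values = [10, 5, 8, 15, 19, 1]


def make_formatter(total):
    """Return a function that renders a value plus its share of `total`."""
    def fmt(x):
        # 100 * x / total converts the fraction into a percentage.
        return f"{x} ({100 * x / total:.1f}%)"
    return fmt


fmt = make_formatter(sum(values))
for v in values:
    print(fmt(v))
```

A function built this way takes a single value and returns a string, which is the shape the value_format keyword in the answer expects.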
2. Option
Extend the DataFrame df by one column which stores the labels you want to use. This column can later be reused inside the Sankey with opts(labels=dim('labels')). To check whether the calculations are correct, you can turn show_values on, but this duplicates the values inside the labels. Therefore show_values is set to False in the final solution. Getting the labels into the correct order can sometimes be tricky.
labels = []
for item in ['A', 'B']:
    grouper = df.groupby(item, sort=False)['values']
    total_sum = grouper.sum().sum()
    for name, group in grouper:
        _sum = group.sum()
        # multiply by 100 so the share is shown as a percentage, not a fraction
        _percent = round(100 * _sum / total_sum, 1)
        labels.append(f'{name} - {_sum} ({_percent}%)')
# this assignment only works because the example has 6 node labels and 6 rows
df['labels'] = labels
hv.Sankey(df).opts(show_values=False, labels=dim('labels'))
The downside of this solution is that we apply a groupby to both columns 'A' and 'B'. This is something holoviews will do, too, so the work is duplicated and not very efficient.
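The per-node label computation can also be sketched without pandas, which makes the numbers easy to verify by hand. This is only an illustration under the assumption that source and target node names never overlap (true for the example data); node_labels is a hypothetical helper name, not part of holoviews.

```python
from collections import defaultdict

# Rows copied from the example DataFrame: (A, B, values).
rows = [
    ("XX", "RR", 10), ("XY", "KK", 5), ("YY", "KK", 8),
    ("XY", "RR", 15), ("XX", "RK", 19), ("XX", "KK", 1),
]


def node_labels(rows):
    """Return {node: 'name - sum (percent%)'} for source and target nodes."""
    total = sum(value for _, _, value in rows)
    sums = defaultdict(int)
    for source, target, value in rows:
        # Each flow contributes to its source node and to its target node.
        sums[source] += value
        sums[target] += value
    return {
        name: f"{name} - {s} ({100 * s / total:.1f}%)"
        for name, s in sums.items()
    }


labels = node_labels(rows)
print(labels["YY"])
```

Keeping the labels in a dictionary keyed by node name avoids relying on the order in which the groups happen to be produced.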
Output
Comment
Both solutions create nearly the same figure, except that the HoverTool is not equal.

Using Pandas Styling (DataFrame.style property) to iterate through product prices

I have a Pandas dataframe that contains data of prices of various products as taken on different dates, the columns are ‘date’, ‘product’, ‘price’.
My goal is to highlight the price cell where there has been a price reduction for that particular product. Much like this example .csv seen below…
An example .csv showing what I want to achieve using Pandas Styling
I understand that each product will need to be separated and then the prices of that product evaluated in pairs. I have used the following code in another part of the script to successfully achieve this:
integer = 0
for iteration in range(iterations):
    first_price_pair = one_product.iloc[integer, 2]
    integer = integer + 1
    second_price_pair = one_product.iloc[integer, 2]
    # one_product is selected by using .drop_duplicates() on 'product'
    price_dif = first_price_pair - second_price_pair
    if second_price_pair < first_price_pair:
        # highlight cell green - INDICATES PRICE REDUCTION FROM PREV PRICE
        pass
    elif second_price_pair == first_price_pair:
        # no change to cell colour
        pass
    elif second_price_pair > first_price_pair:
        # highlight cell RED - INDICATES PRICE INCREASE FROM PREV PRICE
        pass
My problem is when I attempt to use - DataFrame.style - for applying the highlighting. It appears that once ‘styling’ has been applied to the DF, the DF is then converted to type: pandas.io.formats.style.Styler - and that this can then not be modified.
I’d appreciate it if someone can confirm it is possible to achieve what I’m trying to do and if so, give me some guidance on how to achieve it.
Thank you!
To apply highlights you might want to use either:
Styler.applymap()
Styler.apply()
The difference between the two lies in how the elements are selected: applymap() works elementwise, while apply() works column-wise, row-wise, or table-wise.
Both methods require a function to generate the CSS attributes you want to change.
In your case if you put it in an if statement it might be something like this:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(-4, 4, size=(5, 5)))
def background_cell(x, row_idx, col_idx, color):
    # build a same-shaped DataFrame of CSS strings; '' leaves a cell unstyled
    b_color = f'background-color: {color}'
    df_styler = pd.DataFrame('', index=x.index, columns=x.columns)
    df_styler.iloc[row_idx, col_idx] = b_color
    return df_styler
df.style.apply(background_cell, row_idx=1, col_idx=1, color='green', axis=None)
This is going to change the background of the cell [1,1]. You can call df.style.apply() with a different colour and the index of the cell you want to change.
I think you overwrote the DataFrame variable with the Styler by typing
df = df.style.apply(...)
that's why you lost it and couldn't modify it anymore.
Styling is something you apply when displaying the DataFrame, so apply it whenever you render it; it is not stored as an attribute of the DataFrame itself.
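For the price comparison itself, the pairwise logic from the question can be written as a small pure-Python function that returns one CSS string per price. This is a sketch under assumptions: price_change_styles is a hypothetical helper name, the colours are placeholders, and the prices are assumed to be a plain list for a single product, already sorted by date.

```python
def price_change_styles(prices, down="background-color: green",
                        up="background-color: red"):
    """Return one CSS string per price, comparing each to the previous one."""
    styles = [""]  # the first price has nothing to compare against
    for previous, current in zip(prices, prices[1:]):
        if current < previous:
            styles.append(down)   # price reduction
        elif current > previous:
            styles.append(up)     # price increase
        else:
            styles.append("")     # unchanged
    # Trim so that an empty input yields an empty result.
    return styles[:len(prices)]


print(price_change_styles([10, 8, 8, 12]))
```

The resulting strings could then be placed into a same-shaped DataFrame of styles and returned from the function given to Styler.apply(..., axis=None), in the same way background_cell does above.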

Altair: Sorting faceted "text" chart not reflecting expectation

This is a direct follow up to "Sorting based on the alt.Color field in Altair", using the same dataframe (that is included for ease of reference). I asked a follow up in the comments section, but after giving it a whirl on my own and getting close, I am creating a new question.
Unnamed: 0,Species,Unknown,group,LDA Score,p value
11,a,3.474929757,bad,3.07502591,5.67e-05
16,b,3.109308852,bad,2.739744898,0.000651725
31,c,3.16979865,bad,2.697247855,0.03310557
38,d,0.06730106400000001,bad,2.347746497,0.013009626000000002
56,e,2.788383183,good,2.223874347,0.0027407140000000004
65,f,2.644346144,bad,2.311106698,0.00541244
67,g,3.626001112,good,2.980960068,0.038597163
74,h,3.132399759,good,2.849798377,0.007021518000000001
117,i,3.192113412,good,2.861299028,8.19e-06
124,j,0.6140430960000001,bad,2.221483531,0.0022149739999999998
147,k,2.873671544,bad,2.390164757,0.002270102
184,l,3.003479213,bad,2.667274876,0.008129727
188,m,2.46344998,good,2.182085465,0.001657861
256,n,0.048663767,bad,2.952260299,0.013009626000000002
285,o,2.783848855,good,2.387345098,0.00092491
286,p,3.636218971,good,3.094046639,0.001584756
The follow-up question was: after grouping by "color", how can I do a subsequent ordering within the groups by "LDA Score" (essentially by bar length) and have the text column sort by LDA as well? I didn't know how to incorporate a second level of ordering in the code I was using, so I opted to turn the groups into facets and then try sorting by LDA Score for both the bar charts and the text column. I am getting the proper sorting by LDA score on the charts, but I can't seem to make it work for the text column. I am pasting the code and the image. As you can see, I am telling it to use LDA Score as the sorting field for the "text" chart (which is the p value), but it is still sorting alphabetically by species. Any thoughts? To be honest, I feel like I'm heading down the rabbit hole where I'm forcing a solution to work in the current code, so if you think a different strategy altogether is the better way to go, let me know.
FYI, there are some formatting issues (like redundant labels on axes) that you can ignore for now.
bars = alt.Chart(df).mark_bar().encode(
    alt.X('LDA Score'),
    alt.Y("Species:N", sort='-x'),
    color='group:N',
    row='group:N'
).resolve_scale(
    y='independent'
)
text = alt.Chart(df).mark_text().encode(
    alt.Text('p value:Q', format='.2e'),
    alt.Y('Species:N', sort=alt.EncodingSortField(field='LDA Score', op='count', order='descending')),
    row='group:N'
).resolve_scale(
    y='independent'
).properties(width=50)
#bars | text
alt.hconcat(bars, text, spacing=0)
Drop op="count". The count in each row is exactly 1 (there is one data point in each row). It sounds like you want to instead sort by the data value.
It also would make sense in this context to use this same sort expression for both y encodings, since they're meant to match:
y_sort = alt.EncodingSortField(field='LDA Score', order='descending')
bars = alt.Chart(df).mark_bar().encode(
    alt.X('LDA Score'),
    alt.Y("Species:N", sort=y_sort),
    color='group:N',
    row='group:N'
).resolve_scale(
    y='independent'
)
text = alt.Chart(df).mark_text().encode(
    alt.Text('p value:Q', format='.2e'),
    alt.Y("Species:N", sort=y_sort, axis=None),
    alt.Row('group:N', header=alt.Header(title=None, labelFontSize=0))
).resolve_scale(
    y='independent'
).properties(width=50)
alt.hconcat(bars, text, spacing=0)
(labelFontSize is a workaround because there is a bug with labels=False)
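To see why op='count' carries no ordering information here, the aggregation can be imitated in plain Python. This is only an analogy of what the sort field computes, not Vega-Lite's implementation; the rows are the "good" group copied from the question's data.

```python
# (Species, group, LDA Score) for the "good" rows of the question's data.
rows = [
    ("e", "good", 2.223874347), ("g", "good", 2.980960068),
    ("h", "good", 2.849798377), ("i", "good", 2.861299028),
    ("m", "good", 2.182085465), ("o", "good", 2.387345098),
    ("p", "good", 3.094046639),
]

# op='count' asks how many rows each species has: always 1 in this data,
# so every species gets the same sort key.
counts = {}
for species, _, _ in rows:
    counts[species] = counts.get(species, 0) + 1

# Sorting on the value itself, descending, gives the intended order.
by_score = [s for s, _, _ in sorted(rows, key=lambda r: r[2], reverse=True)]

print(counts)
print(by_score)
```

Because every count is identical, a sort on it cannot distinguish the species, whereas the LDA Score values are all different.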

Sorting based on the alt.Color field in Altair

I am attempting to sort a horizontal barchart based on the group to which it belongs. I have included the dataframe, code that I thought would get me to group-wise sorting, and image. The chart is currently sorted according to the species column in alphabetical order, but I would like it sorted by the group so that all "bads" are together, similarly, all "goods" are together. Ideally, I would like to take it one step further so that the goods and bads are subsequently sorted by value of 'LDA Score', but that was the next step.
Dataframe:
Unnamed: 0,Species,Unknown,group,LDA Score,p value
11,a,3.474929757,bad,3.07502591,5.67e-05
16,b,3.109308852,bad,2.739744898,0.000651725
31,c,3.16979865,bad,2.697247855,0.03310557
38,d,0.06730106400000001,bad,2.347746497,0.013009626000000002
56,e,2.788383183,good,2.223874347,0.0027407140000000004
65,f,2.644346144,bad,2.311106698,0.00541244
67,g,3.626001112,good,2.980960068,0.038597163
74,h,3.132399759,good,2.849798377,0.007021518000000001
117,i,3.192113412,good,2.861299028,8.19e-06
124,j,0.6140430960000001,bad,2.221483531,0.0022149739999999998
147,k,2.873671544,bad,2.390164757,0.002270102
184,l,3.003479213,bad,2.667274876,0.008129727
188,m,2.46344998,good,2.182085465,0.001657861
256,n,0.048663767,bad,2.952260299,0.013009626000000002
285,o,2.783848855,good,2.387345098,0.00092491
286,p,3.636219,good,3.094047,0.001584756
The code:
bars = alt.Chart(df).mark_bar().encode(
    alt.X('LDA Score:Q'),
    alt.Y("Species:N"),
    alt.Color('group:N', sort=alt.EncodingSortField(field="Clinical group", op='distinct', order='ascending'))
)
bars
The resulting figure:
Two things:
If you want to sort the y-axis, you should put the sort expression in the y encoding. Above, you are sorting the color labels in the legend.
Sorting by field in Vega-Lite only works for numeric data (Edit: this is incorrect; see below), so you can use a calculate transform to map the entries to numbers by which to sort.
The result might look something like this:
alt.Chart(df).transform_calculate(
    order='datum.group == "bad" ? 0 : 1'
).mark_bar().encode(
    alt.X('LDA Score:Q'),
    alt.Y("Species:N", sort=alt.SortField('order')),
    alt.Color('group:N')
)
Edit: it turns out the reason sorting by group fails is that the default operation for sort fields is sum, which only works well on quantitative data. If you choose a different operation, you can sort on nominal data directly. For example, this shows the correct output:
alt.Chart(df).mark_bar().encode(
    alt.X('LDA Score:Q'),
    alt.Y("Species:N", sort=alt.EncodingSortField('group', op='min')),
    alt.Color('group:N')
)
See vega/vega-lite#6064 for more information.
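The effect of sorting on a nominal field with op='min' can be imitated in plain Python. This is an analogy under assumptions, not a description of Vega-Lite's internals: each species is given the minimum of its group strings as a sort key, and the species are then sorted by that key.

```python
# (Species, group) pairs from the question's data, in file order.
rows = [
    ("a", "bad"), ("b", "bad"), ("c", "bad"), ("d", "bad"),
    ("e", "good"), ("f", "bad"), ("g", "good"), ("h", "good"),
    ("i", "good"), ("j", "bad"), ("k", "bad"), ("l", "bad"),
    ("m", "good"), ("n", "bad"), ("o", "good"), ("p", "good"),
]

# Aggregate the group per species with min; string comparison works here,
# so no numeric encoding of the groups is needed.
group_min = {}
for species, group in rows:
    group_min[species] = min(group, group_min.get(species, group))

# sorted() is stable, so species keep their file order within each group.
order = sorted(group_min, key=group_min.get)
print(order)
```

Since "bad" compares less than "good" as a string, all "bad" species come first, which matches the grouping the question asks for.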
