Align rotated labels of Altair boxplot - python

I'm plotting a boxplot with Altair and dividing the data into groups (similar to what was done here: Altair boxplot with nested grouping by two categorical variables). The code I have is below:
import seaborn as sns
import altair as alt
tips = sns.load_dataset("tips")
tips = tips.replace('Sun', 'Sunday morning')
chart = alt.Chart(tips).mark_boxplot(ticks=True).encode(
x=alt.X("smoker:O", title=None, axis=alt.Axis(labels=False, ticks=False), scale=alt.Scale(padding=1)),
y=alt.Y("total_bill:Q"),
color="smoker:N",
column=alt.Column('day:N', sort=['Thur','Fri','Sat','Sunday morning'],header=alt.Header(labelPadding=-530,labelAngle=-90))
).properties(
width=80,
height = 350
).configure_view(
stroke=None
)
However, I cannot manage to position the column labels properly after rotating them 90 degrees. In the image below, you can see the graph I'm getting in 'a', and the graph I would like to get in 'b'.
I looked for configurations on https://altair-viz.github.io/user_guide/configuration.html, but could not find a solution. Does anyone have an idea? (I'm using Altair 4.2.2 - the latest version).

I tried various parameters, referring to this answer. I got the intended result by setting both orient and labelOrient to 'bottom', right-aligning the labels with labelAlign, and adjusting labelPadding. Note that I don't have much experience with Altair.
import seaborn as sns
import altair as alt
tips = sns.load_dataset("tips")
tips = tips.replace('Sun', 'Sunday morning')
chart = alt.Chart(tips).mark_boxplot(ticks=True).encode(
x=alt.X("smoker:O", title=None, axis=alt.Axis(labels=False, ticks=False), scale=alt.Scale(padding=1)),
y=alt.Y("total_bill:Q"),
color="smoker:N",
column=alt.Column('day:N',
sort=['Thur','Fri','Sat','Sunday morning'],
header=alt.Header(
labelPadding=360,
labelAngle=-90,
orient='bottom',
labelOrient='bottom',
labelAlign='right'
))
).properties(
width=80,
height = 350
).configure_view(
stroke=None
)
chart

Related

How do you put the x axis labels on the top of the heatmap created with seaborn? [duplicate]

I have created a heatmap using the seaborn and matplotlib package in python, and while it is perfectly suited for my current needs, I really would prefer to have the labels on the x-axis of the heatmap to be placed at the top of the plot, rather than at the bottom (which seems to be its default).
So an abridged form of my data looks like this:
NP NP1 NP2 NP3 NP4 NP5
identifier
A1BG~P04217 -0.094045 0.012229 0.102279 1.319618 0.002383
A2M~P01023 -0.805089 -0.477339 -0.351341 0.089735 -0.473815
AARS1~P49588 0.081827 -0.099849 -0.287426 0.101588 0.136366
ABCB6~Q9NP58 0.109911 0.458039 -0.039325 -0.484872 1.905586
ABCC1~I3L4X2 -0.560155 0.580285 0.012868 0.291303 -0.407900
ABCC4~O15439 0.055264 0.138630 -0.204665 0.191241 0.304999
ABCE1~P61221 -0.510108 -0.059724 -0.233365 0.078956 -0.651327
ABCF1~Q8NE71 -0.348526 -0.135414 -0.390021 -0.190644 -0.276303
ABHD10~Q9NUJ1 0.237959 -2.060834 0.325901 -0.778036 -4.046345
ABHD11~Q8NFV4 0.294587 1.193258 -0.797294 -0.148064 -1.153391
And when I use the following code:
import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10,30))
ax = sns.heatmap(df_example, annot=True, xticklabels=True)
I get this kind of plot:
https://imgpile.com/i/T3zPH1
I should note that this plot was made from the abridged dataframe above; the actual dataframe has thousands of identifiers, making it very long.
But as you can see, the labels on the x axis only appear at the bottom. I have been trying to get them to appear on the top, but seaborn doesn't seem to allow this kind of formatting.
So I have also tried using plotly express, but while that solves the issue of placing my x-axis labels on top, I have been completely unable to format the heatmap as I had before using seaborn. The following code:
import plotly.express as px
fig = px.imshow(df_example, width= 500, height=6000)
fig.update_xaxes(side="top")
fig.show()
yields this kind of plot: https://imgpile.com/i/T3zF42.
I have tried many times to reformat it using the documentation from plotly (https://plotly.com/python/heatmaps/), but I can't seem to get it to work. When one thing is fixed, another problem arises. I really just want to keep using the seaborn-based code above and just fix the x-axis labels. I'm also happy to have the x-axis labels at both the top and bottom of the plot, but I can't get that to work at present. Can someone advise me on what to do here?
Ok, so I did a bit more research, and it turns out you can add the following code to the seaborn approach:
plt.tick_params(axis='both', which='major', labelsize=10, labelbottom = False, bottom=False, top = False, labeltop=True)
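For reference, here is a minimal sketch that puts the tick_params call together with the heatmap call from the question. It uses only the first three rows of the abridged data shown above, applies the settings to the x axis of an explicit ax rather than through plt, and selects the Agg backend purely as an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# First three rows of the abridged data shown in the question
df_example = pd.DataFrame(
    {
        "NP1": [-0.094045, -0.805089, 0.081827],
        "NP2": [0.012229, -0.477339, -0.099849],
        "NP3": [0.102279, -0.351341, -0.287426],
        "NP4": [1.319618, 0.089735, 0.101588],
        "NP5": [0.002383, -0.473815, 0.136366],
    },
    index=pd.Index(["A1BG~P04217", "A2M~P01023", "AARS1~P49588"], name="identifier"),
)

fig, ax = plt.subplots(figsize=(6, 3))
sns.heatmap(df_example, annot=True, xticklabels=True, ax=ax)

# Draw the x tick labels at the top only, without tick marks on either side
ax.tick_params(axis="x", which="major", labelsize=10,
               labelbottom=False, bottom=False, top=False, labeltop=True)

# Call plt.show() or fig.savefig(...) here to view the result
```

Setting labelbottom=True as well should keep the labels at both the top and the bottom.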
If your data are stored in a CSV file, you can use this code:
import pandas as pd
import plotly.express as px
df = pd.read_csv("file.csv").round(2)
fig = px.imshow(df.iloc[:,1:],
y = df['identifier'],
text_auto=True, aspect="auto")
fig.show()
The data in the CSV file are in the following format:
identifier NP1 NP2 NP3 NP4 NP5
A1BG~P04217 -0.094045 0.012229 0.102279 1.319618 0.002383
A2M~P01023 -0.805089 -0.477339 -0.351341 0.089735 -0.473815
AARS1~P49588 0.081827 -0.099849 -0.287426 0.101588 0.136366
ABCB6~Q9NP58 0.109911 0.458039 -0.039325 -0.484872 1.905586
ABCC1~I3L4X2 -0.560155 0.580285 0.012868 0.291303 -0.407900
ABCC4~O15439 0.055264 0.138630 -0.204665 0.191241 0.304999
ABCE1~P61221 -0.510108 -0.059724 -0.233365 0.078956 -0.651327
ABCF1~Q8NE71 -0.348526 -0.135414 -0.390021 -0.190644 -0.276303
ABHD10~Q9NUJ1 0.237959 -2.060834 0.325901 -0.778036 -4.046345
ABHD11~Q8NFV4 0.294587 1.193258 -0.797294 -0.148064 -1.153391
Now display the x-axis at the top of the heatmap by adding:
fig.update_layout(xaxis = dict(side ="top"))
Alternative solution if you have an older version of Plotly:
import plotly.graph_objects as go
fig = go.Figure(data=go.Heatmap(
x=df.columns[1:],
y=df.identifier,
z=df.iloc[:,1:],
text=df.iloc[:,1:],
texttemplate="%{text}"))
fig.update_layout(xaxis = dict(side ="top"))
fig.show()

Is there a way to format tooltip values in Altair boxplot

Is it possible to format the values within a tooltip for a boxplot? From this Vega documentation, it appears so, but I can't quite figure out how to do it with Altair for Python.
from vega_datasets import data
import altair as alt
source = data.population.url
alt.Chart(source).mark_boxplot().encode(
alt.X("age:O"),
alt.Y("people:Q"),
tooltip=[
alt.Tooltip("people:Q", format=",.2f"),
],
)
I believe you need to provide an aggregation for composite marks like mark_boxplot. This works:
from vega_datasets import data
import altair as alt
source = data.population.url
alt.Chart(source).mark_boxplot().encode(
alt.X("age:O"),
alt.Y("people:Q"),
tooltip=alt.Tooltip("mean(people):Q", format=",.2f"),)
Update: As it is currently impossible to add multiple aggregated tooltips to a boxplot, I combined my answer with How to change Altair boxplot infobox to display mean rather than median? to put a transparent box with a custom tooltip on top of the boxplot. I still kept the boxplot underneath in order to have the outliers and whiskers plotted as a Tukey boxplot instead of min-max. I also added a point for the mean, since this is what I wanted to see in the tooltip:
alt.Chart(source).mark_boxplot(median={'color': '#353535'}).encode(
alt.X("age:O"),
alt.Y("people:Q"),
tooltip=[
alt.Tooltip("people:Q", format=",.2f"),
],
) + alt.Chart(source).mark_circle(color='#353535', size=15).encode(
x='age:O',
y='mean(people):Q'
) + alt.Chart(source).transform_aggregate(
min="min(people)",
max="max(people)",
mean="mean(people)",
median="median(people)",
q1="q1(people)",
q3="q3(people)",
count="count()",
groupby=['age']
).mark_bar(opacity=0).encode(
x='age:O',
y='q1:Q',
y2='q3:Q',
tooltip=alt.Tooltip(['min:Q', 'q1:Q', 'mean:Q', 'median:Q', 'q3:Q', 'max:Q', 'count:Q'], format='.1f')
)
There is a way to add multiple columns to the tooltip. You can pass in multiple columns in square brackets as a list.
import altair as alt
from vega_datasets import data
stocks = data.stocks()
alt.Chart(stocks).mark_point().transform_window(
previous_price = 'lag(price)'
).transform_calculate(
pct_change = '(datum.price - datum.previous_price) / datum.previous_price'
).encode(
x='date',
y='price',
color='symbol',
tooltip=[ 'price', 'symbol', alt.Tooltip('pct_change:Q', format='.1%')]
)
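The format strings used in the tooltips above (",.2f", ".1f" and ".1%") are d3-format specifiers, which are modeled on Python's format mini-language. As a rough preview of what such a tooltip would show, the same strings can be tried with Python's built-in format(). This is only a sketch with made-up sample values; some d3 specifiers (for example "~s") have no Python equivalent, so it does not carry over to every format.

```python
# Made-up sample values paired with the format strings used in the tooltips
samples = [
    (1234567.891, ",.2f"),  # thousands separator, two decimals
    (1234567.891, ".1f"),   # one decimal, no separator
    (0.1234, ".1%"),        # fraction shown as a percentage with one decimal
]

formatted = [format(value, spec) for value, spec in samples]

for (value, spec), text in zip(samples, formatted):
    print(f"{spec!r:>8} applied to {value} gives {text}")
```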

Generate two legends via Altair

I would like to have two legends via Altair just like the picture below.
I have created the legend of "Count of actors", but I don't know how to generate the other one. My code is below:
plot = base.mark_circle(
opacity=0.8,
stroke='black',
strokeWidth=1
).encode(
alt.X('TYPE:O'),
alt.Y('index:N',
sort= movies_order
),
alt.Size('count(index):Q',
scale=alt.Scale(range=[0,4500]),
legend=alt.Legend(title='Count of actors', symbolFillColor='white')),
alt.Color('GENDER', legend=None)
#complete this
).properties(
width=350,
height=880
)
And the chart I created is like this:
This is the default behavior in Altair, but you have disabled the color legend. Change alt.Color('GENDER', legend=None) to alt.Color('GENDER').
Here is a modified example from the Altair gallery with two legends:
import altair as alt
from vega_datasets import data
source = data.cars()
alt.Chart(source).mark_circle().encode(
x='Horsepower',
y='Miles_per_Gallon',
color='Origin',
size='Cylinders')

Using color on bar chart with Altair seems to prevent zero=False on scale from having anticipated effect

The first chart from the code below (based on this: https://altair-viz.github.io/gallery/us_population_over_time_facet.html) lets the y-axis start above zero, as anticipated. But in the second chart, which adds a color to the encoding, the zero=False in alt.Scale no longer seems to be respected.
Edit: forgot to mention using Altair 4.1.0
import altair as alt
from vega_datasets import data
import pandas as pd
source = data.population.url
df = pd.read_json(source)
df = df[df["age"] <= 40]
alt.Chart(df).mark_bar().encode(
x="age:O",
y=alt.Y(
"sum(people):Q",
title="Population",
axis=alt.Axis(format="~s"),
scale=alt.Scale(zero=False),
),
facet=alt.Facet("year:O", columns=5),
).resolve_scale(y="independent").properties(
title="US Age Distribution By Year", width=90, height=80
)
alt.Chart(df).mark_bar().encode(
x="age:O",
y=alt.Y(
"sum(people):Q",
title="Population",
axis=alt.Axis(format="~s"),
scale=alt.Scale(zero=False),
),
facet=alt.Facet("year:O", columns=5),
color=alt.Color("year"),
).resolve_scale(y="independent").properties(
title="US Age Distribution By Year", width=90, height=80
)
This happens because the scales are automatically adjusted to show all the groups in the variable you are coloring by. It is easier to understand if we look at a single barplot with stacked colors:
import altair as alt
from vega_datasets import data
import pandas as pd
source = data.population.url
df = pd.read_json(source)
df = df[df["age"] <= 40]
alt.Chart(df.query('year < 1880')).mark_bar().encode(
x="age:O",
y=alt.Y(
"sum(people):Q",
axis=alt.Axis(format="~s"),
scale=alt.Scale(zero=False)),
color=alt.Color("year"))
You are calculating the sum, which means that all the years are stacked on top of each other somewhere in each bar. Altair / Vega-Lite expands the axis so that it includes all groups in your colored variable.
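One way to picture this is to work through the arithmetic of a single stacked bar. The sketch below is plain Python with made-up numbers, not Vega-Lite's actual implementation, and it assumes that the y domain is chosen to cover the start and end of every colored segment. Because the first segment starts at zero, the domain ends up including zero.

```python
# Made-up yearly sums for a single age group (hypothetical numbers)
people_by_year = {1850: 1_500_000, 1860: 2_000_000, 1870: 2_400_000}


def stack_segments(values):
    """Return the (start, end) of each segment when values are stacked from zero."""
    segments = []
    running_total = 0
    for value in values:
        segments.append((running_total, running_total + value))
        running_total += value
    return segments


segments = stack_segments(people_by_year.values())

# Assumption: the axis domain has to cover every segment's start and end
domain_min = min(start for start, _ in segments)
domain_max = max(end for _, end in segments)

print(segments)
print(domain_min, domain_max)
```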
If you instead color by age, the axis again expands to include all the colored groups, but because they are now not at the bottom of each bar, the axis is cut above zero.
import altair as alt
from vega_datasets import data
import pandas as pd
source = data.population.url
df = pd.read_json(source)
df = df[df["age"] <= 40]
alt.Chart(df.query('year < 1880')).mark_bar().encode(
x="age:O",
y=alt.Y(
"sum(people):Q",
axis=alt.Axis(format="~s"),
scale=alt.Scale(zero=False)),
color=alt.Color("age"))
The only remaining discrepancy is why the first plot doesn't just show the tip of the darkest color and cut the axis at around 2M. I am not sure about that off the top of my head.

How to do annotations with Altair?

I am trying to write some text inside the figure to highlight something in my plot (equivalent to 'annotate' in matplotlib). Any idea? Thanks
You can get annotations into your Altair plots in two steps:
Use mark_text() to specify the annotation's position, fontsize etc.
Use transform_filter() with datum expressions to select the points (the data subset) that need the annotation. Note the line from altair import datum.
Code:
import altair as alt
from vega_datasets import data
alt.renderers.enable('notebook')
from altair import datum #Needed for subsetting (transforming data)
iris = data.iris()
points = alt.Chart(iris).mark_point().encode(
x='petalLength',
y='petalWidth',
color='species')
annotation = alt.Chart(iris).mark_text(
align='left',
baseline='middle',
fontSize = 20,
dx = 7
).encode(
x='petalLength',
y='petalWidth',
text='petalLength'
).transform_filter(
(datum.petalLength >= 5.1) & (datum.petalWidth < 1.6)
)
points + annotation
which produces:
These are static annotations. You can also get interactive annotations by binding selections to the plots.
