Altair: Controlling tick counts for binned axis

I'm trying to generate a histogram in Altair, but I'm having trouble controlling the tick count for the axis corresponding to the binned variable (x-axis). I'm new to Altair, so apologies if I'm missing something obvious here. I looked for others who had faced this kind of issue but didn't find an exact match.
The code to generate the histogram is
alt.Chart(df_test).mark_bar().encode(
x=alt.X('x:Q', bin=alt.Bin(step=0.1), scale=alt.Scale(domain=[8.9, 11.6])),
y=alt.Y('count(y):Q', title='Count(Y)')
).configure_axis(labelLimit=0, tickCount=3)
df_test is a Pandas dataframe - the data for which is available here.
The above code generates the following histogram. Changing tickCount changes the y-axis tick counts, but not the x-axis.
Any guidance is appreciated.

There might be a more convenient way to do this using bin=, but one approach is to use transform_bin with mark_rect, since this does not turn the axis into a binned axis (binned axes are more difficult to customize):
import altair as alt
from vega_datasets import data
source = data.movies.url
alt.Chart(source).mark_rect(stroke='white').encode(
x=alt.X('x1:Q', title='IMDB Rating', axis=alt.Axis(tickCount=3)),
x2='x2:Q',
y='count()',
).transform_bin(
['x1', 'x2'], field='IMDB_Rating'
)
You might notice that you don't get the exact number of ticks. This is because tick positions are rounded to "nice" values, such as multiples of 5. I couldn't turn this off even when setting nice=False on the scale, so another approach in those cases is to pass the exact tick values with values=.
alt.Chart(source).mark_rect(stroke='white').encode(
x=alt.X('x1:Q', title='IMDB Rating', axis=alt.Axis(values=[0, 3, 6, 9])),
x2='x2:Q',
y='count()',
).transform_bin(
['x1', 'x2'], field='IMDB_Rating'
)
Be careful with decimal values: these are automatically displayed as integers (even with tickRound=False), but in the wrong position. This seems like a bug to me, so if you investigate it further you might want to report it on the Vega-Lite issue tracker.

Related

Choosing how many x axis labels display on an altair chart in python

I have an Altair chart where I am using mark_rect. I want to choose how many x-axis labels are displayed so that the marks form squares in the visualization, or perhaps choose the range of the x-axis labels. Right now far too many labels are displayed. Below is an example of what I am trying to achieve, followed by my current output chart. I apologize if the issue is due to something else; I am still learning Altair.
My code currently:
alt.Chart(temdf).mark_rect().encode(
x=alt.X('norm:O', title='', axis=alt.Axis(grid=False, labelAngle=360)),
y=alt.Y('term:N', title='', axis=alt.Axis(grid=False)),
color=alt.Color('norm:O', title='', scale=alt.Scale(scheme='blues'), legend=None),
facet=alt.Facet('title:N', title='', columns=3, header=alt.Header(labelOrient='bottom', labelPadding=15, labelAngle=360),
sort=alt.EncodingSortField(field='title', order='ascending'))
)
What I am trying to achieve:
My current output:
You have declared that your x data is type O, meaning ordinal, i.e. ordered categories. This says that you want one distinct x bin for each unique value in your dataset. If you want fewer ordinal x bins, you should use a dataset with fewer unique values.
Alternatively, if you don't want each unique x value to have its own label, you can use the quantitative data type (i.e. x=alt.X('norm:Q')), or perhaps bin your data with x=alt.X('norm:O', bin=True). Be sure to bin your color encoding as well if you use the latter.

Adding upper / lower limits and changing x-axes value format in Dash Plotly

I’ve been working on a mini project using Dash Plotly to visualize some factory data I found online and I have a couple of questions that I could not find answers to.
1. How do I change the format of the x-axis values? My values are in the thousands and Plotly defaults to 20k, 22.5k, 25k etc. I actually want 20000, 22500, 25000.
2. While I am able to plot x and y values from the data frame easily, my data has upper and lower limits (same scale as the y values) that determine pass/fail criteria. These limits are in separate columns in the data frame. How do I plot these limits for every corresponding x value?
Thanks for your help!
Question 1 - How to format axis labels
From the official documentation:
https://plotly.com/python/tick-formatting/#using-tickformat-attribute
Using Tickformat Attribute
For more formatting types, see: https://github.com/d3/d3-format/blob/master/README.md#locale_format
fig.update_layout(yaxis_tickformat = '%')
Question 2
I would recommend reading the upper and lower limits directly from the dataframe and then using the range property in Plotly:
https://plotly.com/python/axes/#setting-the-range-of-axes-manually
fig.update_xaxes(range=[1.5, 4.5])
fig.update_yaxes(range=[3, 9])

Hours and minutes as labels in Altair plot spanning more than one day

I'm trying to create in Altair a Vega-Lite specification of a plot of a time series whose time range spans a few days. Since in my case, it will be clear which day is which, I want to reduce noise in my axis labels by letting labels be of the form '%H:%M', even if this causes labels to be non-distinct.
Here's some example data; my actual data has a five minute resolution, but I imagine that won't matter too much here:
import altair as alt
import numpy as np
import pandas as pd
# Create data spanning 30 hours, or just over one full day
df = pd.DataFrame({'time': pd.date_range('2018-01-01', periods=30, freq='H'),
'data': np.arange(30)**.5})
By using the otherwise trivial yearmonthdatehoursminutes transform, I get the following:
alt.Chart(df).mark_line().encode(x='yearmonthdatehoursminutes(time):T',
y='data:Q')
Now, my goal is to get rid of the dates in the labels on the horizontal axis, so they become something like ['00:00', '03:00', ..., '21:00', '00:00', '03:00'], or whatever spacing works best.
The naive approach of just using hoursminutes as a transform won't work, as that bins the actual data:
alt.Chart(df).mark_line().encode(x='hoursminutes(time):T', y='data:Q')
So, is there a declarative way of doing this? Ultimately, the visualization will be making use of selections to define the horizontal axis limits, so specifying the labels explicitly using Axis does not seem appealing.
To expand on @fuglede's answer, there are two distinct concepts at play with dates and times in Altair.
Time formats let you specify how times are displayed on an axis; they look like this:
chart.encode(
x=alt.X('time:T', axis=alt.Axis(format='%H:%M'))
)
Altair uses format codes from d3-time-format.
Time units let you specify how data will be grouped, and they also adjust the default time format to match. They look something like this:
chart.encode(
x=alt.X('time:T', timeUnit='hoursminutes')
)
or via the shorthand:
chart.encode(
x='hoursminutes(time):T'
)
Available time units are listed here.
If you want to adjust axis formats only, use time formats. If you want to group based on timespans (i.e. group data by year, by month, by hour, etc.) then use a time unit. Examples of this appear in the Altair documentation, e.g. the Seattle Weather Heatmap in Altair's example gallery.
This can actually easily be achieved by specifying format in Axis:
alt.Chart(df).mark_line().encode(x=alt.X('time:T', axis=alt.Axis(format='%H:%M')), y='data:Q')

Matplotlib: plotting string values give strange behaviour

I'm trying to plot two data series:
day of the year (X-axis): ['2019-01-01', '2019-01-02', ...]
hour of sunrise (Y-axis): ['07:04', '07:03', ...]
But matplotlib is driving me crazy… here's the plot of a subset (ax.plot(datelist[130:230], hourlist[130:230], label='sunrise')):
As you can see, the Y-axis decreases from '03:57' to '03:33' and then suddenly starts to increase up to '04:26'. That makes no sense to me.
Can you help me fix that?
Bonus points if you tell me how to show a decent scale on both axes (i.e. 00:00 – 24:00 equally spaced by 1 hour with minor ticks; and a list of chosen dates for the X-axis).
Thank you in advance!
So, thanks to @ImportanceOfBeingErnest's insight, I managed to make it work by converting both data series to Python's datetime.datetime objects, but that wasn't enough.
In order to be plotted properly, the Y-values also needed to share the same date (a fixed reference date used just for plotting purposes).
For the chart's scale I found the matplotlib.dates module, which contains useful Formatters and Locators for the axes.
In order to get a full 24 hours range for the Y-axis I've used:
import datetime
from matplotlib.dates import DateFormatter, HourLocator
ax.set_ylim([datetime.datetime(2019, 1, 1), datetime.datetime(2019, 1, 2)])
ax.yaxis.set_major_formatter(DateFormatter('%H:%M'))
ax.yaxis.set_major_locator(HourLocator())
The overall result (with some additions) seems good enough for now (even if I still have to fix the UTC offsets):
Thank you again!
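The conversion step described above might look like the following sketch. The three sample values and the name REFERENCE_DATE are made up for illustration; the original conversion code is not shown in the answer.

```python
import datetime

# Made-up sample of the two string series from the question.
datelist = ['2019-01-01', '2019-01-02', '2019-01-03']
hourlist = ['07:04', '07:03', '07:02']

# Arbitrary fixed date so that all sunrise times share one 24-hour Y range.
REFERENCE_DATE = datetime.date(2019, 1, 1)

# X values: real dates instead of strings.
dates = [datetime.datetime.strptime(d, '%Y-%m-%d') for d in datelist]

# Y values: the time of day attached to the fixed reference date.
sunrise = [
    datetime.datetime.combine(
        REFERENCE_DATE, datetime.datetime.strptime(h, '%H:%M').time()
    )
    for h in hourlist
]

# ax.plot(dates, sunrise, label='sunrise') would then use true time scales.
```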
What type of variables did you use? You probably used strings in datelist and hourlist. Matplotlib treats strings as categories and places them on the axis in the order they first appear, without sorting them or interpreting them as times.
You need to convert your values to the correct object type (for example datetime objects); then the plot will come out correctly.
For example:
If I plot the list ['c', 'a', 'b'] as the y values, then my y-axis reads c, then a, then b.

Seaborn pairplot: how to change legend label text

I'm making a simple pairplot with Seaborn in Python that shows different levels of a categorical variable by the color of plot elements across variables in a Pandas DataFrame. Although the plot comes out exactly as I want it, the categorical variable is binary, which makes the legend quite meaningless to an audience not familiar with the data (categories are naturally labeled as 0 & 1).
An example of my code:
g = sns.pairplot(df, hue='categorical_var', palette='Set3')
Is there a way to change legend label text with pairplot? Or should I use PairGrid, and if so how would I approach this?
Found it! It was answered here: Edit seaborn legend
g = sns.pairplot(df, hue='categorical_var', palette='Set3')
g._legend.set_title(new_title)
Since you don't provide a full code example or mock data, I will use my own code to answer.
First solution
The easiest approach is probably to keep your binary labels for analysis and to create a separate column with proper names for plotting. Here is some sample code of mine that should give you the idea:
def transconum(morph):
    if morph == 'S':
        return 1.0
    else:
        return 0.0
CompactGroups['MorphNum'] = CompactGroups['MorphGal'].apply(transconum)
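For the pairplot in the question, the same idea runs in the other direction: map the 0/1 codes to readable names and use that column as the hue. This is only a sketch; the small DataFrame, the column name category_label and the label names are made up for illustration.

```python
import pandas as pd

# Made-up stand-in for the question's df with a binary categorical column.
df = pd.DataFrame({
    'categorical_var': [0, 1, 1, 0],
    'a': [1.0, 2.0, 3.0, 4.0],
    'b': [4.0, 3.0, 2.0, 1.0],
})

# Hypothetical readable names for the two codes.
label_names = {0: 'control', 1: 'treated'}
df['category_label'] = df['categorical_var'].map(label_names)

# The plot would then use the readable column for the legend, e.g.:
# sns.pairplot(df, vars=['a', 'b'], hue='category_label', palette='Set3')
```

Passing vars explicitly keeps the numeric categorical_var column out of the grid of plotted variables.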
Second solution
Another way would be to overwrite the labels on the fly. Here is some sample code of mine that works well:
grid = sns.jointplot(x="MorphNum", y="PropS", data=CompactGroups, kind="reg")
grid.set_axis_labels("Central type", "Spiral proportion among satellites")
grid.ax_joint.set_xticks([0, 1])
plt.xticks(range(2), ('$Red$', '$S$'))
