How to modify time interval in altair line graph - python

I have a simple line graph that looks like this: [image: line graph of stock returns]
I have been trying to format the x axis so that the time interval is in years instead of months, as it is now. But when I use the timeUnit attribute, it produces a stunted graph like this: [image: line graph of stock returns in years]
Code:
alt.Chart(data).mark_line().encode(
    x = alt.X('Date', timeUnit = 'year'),
    y = alt.Y('Cumul_R', axis = alt.Axis(format='%', orient='right')),
    color = 'Stock')
What I'm trying to produce is a graph that looks like the first graph, but with intervals expressed in years, like 06-2010, 06-2011, etc., without compressing the graph as in the second picture. In other words, how do I show only some tick labels and not all of them?
I've seen answers to similar questions, but they deal with absolute values using tickCount or tickMinStep, not with datetime values. There is apparently an Altair attribute called timeinterval in https://altair-viz.github.io/user_guide/generated/core/altair.TimeInterval.html#altair.TimeInterval.init that may solve the problem, but I'm not sure how to use it.
Appreciate all help on the matter. Thank you!

It appears that you are plotting your dates as nominal typed values, when you should probably be plotting them as temporal.
You should change x = alt.X('Date') to x = alt.X('Date:T') to specify that the x channel is temporal. When you do that, the renderer will use a temporal axis label that is probably closer to what you had in mind.
See Encoding Data Types in the documentation for more information.

Related

Altair: Controlling tick counts for binned axis

I'm trying to generate a histogram in Altair, but I'm having trouble controlling the tick count for the axis corresponding to the binned variable (x-axis). I'm new to Altair, so apologies if I'm missing something obvious here. I looked for whether others had faced this kind of issue but didn't find an exact match.
The code to generate the histogram is
alt.Chart(df_test).mark_bar().encode(
    x=alt.X('x:Q', bin=alt.Bin(step=0.1), scale=alt.Scale(domain=[8.9, 11.6])),
    y=alt.Y('count(y):Q', title='Count(Y)')
).configure_axis(labelLimit=0, tickCount=3)
df_test is a Pandas dataframe - the data for which is available here.
The above code generates the following histogram. Changing tickCount changes the y-axis tick counts, but not the x-axis.
Any guidance is appreciated.
There might be a more convenient way to do this using bin=, but one approach is to use transform_bin with mark_rect, since this does not change the axis into a binned axis (which are more difficult to customize):
import altair as alt
from vega_datasets import data
source = data.movies.url
alt.Chart(source).mark_rect(stroke='white').encode(
    x=alt.X('x1:Q', title='IMDB Rating', axis=alt.Axis(tickCount=3)),
    x2='x2:Q',
    y='count()',
).transform_bin(
    ['x1', 'x2'], field='IMDB_Rating'
)
You might notice that you don't get the exact number of ticks. This is because there is rounding to "nice" values, such as multiples of 5. I couldn't turn this off even when setting nice=False on the scale, so another approach in those cases is to pass the exact tick values via values=:
alt.Chart(source).mark_rect(stroke='white').encode(
    x=alt.X('x1:Q', title='IMDB Rating', axis=alt.Axis(values=[0, 3, 6, 9])),
    x2='x2:Q',
    y='count()',
).transform_bin(
    ['x1', 'x2'], field='IMDB_Rating'
)
Be careful with decimal values: these are automatically displayed as integers (even with tickRound=False), but in the wrong position. This seems like a bug to me, so if you investigate it further you might want to report it on the Vega-Lite issue tracker.

Choosing how many x axis labels display on an altair chart in python

I have an altair chart where I am using mark_rectangle. I want to choose how many x axis labels are displayed to have the marks form squares in the visualization. Or perhaps I want to choose the range of the x axis labels. Right now there are far too many labels being displayed. Below I have an example of what I am trying to achieve and what my current output chart is. I apologize if the issue is due to something else, I am currently figuring out altair.
My code currently:
alt.Chart(temdf).mark_rect().encode(
    x=alt.X('norm:O', title='', axis=alt.Axis(grid=False, labelAngle=360)),
    y=alt.Y('term:N', title='', axis=alt.Axis(grid=False)),
    color=alt.Color('norm:O', title='', scale=alt.Scale(scheme='blues'), legend=None),
    facet=alt.Facet('title:N', title='', columns=3, header=alt.Header(labelOrient='bottom', labelPadding=15, labelAngle=360),
                    sort=alt.EncodingSortField(field='title', order='ascending'))
)
What I am trying to achieve:
My current output:
You have declared that your x data is type O, meaning ordinal, i.e. ordered categories. This says that you want one distinct x bin for each unique value in your dataset. If you want fewer ordinal x bins, you should use a dataset with fewer unique values.
Alternatively, if you don't want each unique x value to have its own label, you can use the quantitative data type (i.e. x=alt.X('norm:Q')), or perhaps bin your data with x=alt.X('norm:O', bin=True). Be sure to bin your color encoding as well if you use the latter.

Hours and minutes as labels in Altair plot spanning more than one day

I'm trying to create in Altair a Vega-Lite specification of a plot of a time series whose time range spans a few days. Since in my case, it will be clear which day is which, I want to reduce noise in my axis labels by letting labels be of the form '%H:%M', even if this causes labels to be non-distinct.
Here's some example data; my actual data has a five minute resolution, but I imagine that won't matter too much here:
import altair as alt
import numpy as np
import pandas as pd
# Create data spanning 30 hours, or just over one full day
df = pd.DataFrame({'time': pd.date_range('2018-01-01', periods=30, freq='H'),
'data': np.arange(30)**.5})
By using the otherwise trivial yearmonthdatehoursminutes transform, I get the following:
alt.Chart(df).mark_line().encode(x='yearmonthdatehoursminutes(time):T',
y='data:Q')
Now, my goal is to get rid of the dates in the labels on the horizontal axis, so they become something like ['00:00', '03:00', ..., '21:00', '00:00', '03:00'], or whatever spacing works best.
The naive approach of just using hoursminutes as a transform won't work, as that bins the actual data:
alt.Chart(df).mark_line().encode(x='hoursminutes(time):T', y='data:Q')
So, is there a declarative way of doing this? Ultimately, the visualization will be making use of selections to define the horizontal axis limits, so specifying the labels explicitly using Axis does not seem appealing.
To expand on @fuglede's answer, there are two distinct concepts at play with dates and times in Altair.
Time formats let you specify how times are displayed on an axis; they look like this:
chart.encode(
x=alt.X('time:T', axis=alt.Axis(format='%H:%M'))
)
Altair uses format codes from d3-time-format.
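To preview what a format string such as '%H:%M' produces, Python's own strftime can serve as a rough stand-in, since d3-time-format follows the same strftime conventions for these two directives. This is only an approximation for checking label text: the two libraries do not share every directive, and the three-hour tick spacing below is an assumption.

```python
from datetime import datetime, timedelta

# Ten ticks, three hours apart, starting at midnight and running into the next day.
start = datetime(2018, 1, 1)
ticks = [start + timedelta(hours=3 * i) for i in range(10)]

# Format each tick as hours and minutes only, dropping the date.
labels = [t.strftime("%H:%M") for t in ticks]
print(labels)
```

The labels repeat once the ticks cross midnight, which is the non-distinct labelling the question is asking for.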
Time units let you specify how data will be grouped, and they also adjust the default time format to match. They look something like this:
chart.encode(
x=alt.X('time:T', timeUnit='hoursminutes')
)
or via the shorthand:
chart.encode(
x='hoursminutes(time):T'
)
Available time units are listed here.
If you want to adjust axis formats only, use time formats. If you want to group based on timespans (i.e. group data by year, by month, by hour, etc.) then use a time unit. Examples of this appear in the Altair documentation, e.g. the Seattle Weather Heatmap in Altair's example gallery.
This can actually easily be achieved by specifying format in Axis:
alt.Chart(df).mark_line().encode(x=alt.X('time:T', axis=alt.Axis(format='%H:%M')), y='data:Q')

Seaborn pairplot: how to change legend label text

I'm making a simple pairplot with Seaborn in Python that shows different levels of a categorical variable by the color of plot elements across variables in a Pandas DataFrame. Although the plot comes out exactly as I want it, the categorical variable is binary, which makes the legend quite meaningless to an audience not familiar with the data (categories are naturally labeled as 0 & 1).
An example of my code:
g = sns.pairplot(df, hue='categorical_var', palette='Set3')
Is there a way to change legend label text with pairplot? Or should I use PairGrid, and if so how would I approach this?
Found it! It was answered in the question "Edit seaborn legend":
g = sns.pairplot(df, hue='categorical_var', palette='Set3')
g._legend.set_title(new_title)  # new_title is the legend title string you want
Since you don't provide a full code example or mock data, I will use my own code to answer.
First solution
The easiest way is probably to keep your binary labels for analysis and to create a column with proper names for plotting. Here is a sample of my code; you should get the idea:
def transconum(morph):
    if (morph == 'S'):
        return 1.0
    else:
        return 0.0
CompactGroups['MorphNum'] = CompactGroups['MorphGal'].apply(transconum)
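Applied to the question's case, where the binary codes need readable names, the same idea can be sketched with pandas' Series.map. The column name category_label and the names in the mapping are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a binary column, as in the question.
df = pd.DataFrame({"categorical_var": [0, 1, 1, 0]})

# Keep the 0/1 column for analysis and add a readable copy for plotting.
label_names = {0: "Control", 1: "Treated"}
df["category_label"] = df["categorical_var"].map(label_names)

print(df)
```

Passing hue='category_label' to sns.pairplot would then put the readable names in the legend without touching the original column.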
Second solution
Another way would be to overwrite the labels on the fly. Here is a sample of my code, which works perfectly:
import matplotlib.pyplot as plt
import seaborn as sns
grid = sns.jointplot(x="MorphNum", y="PropS", data=CompactGroups, kind="reg")
grid.set_axis_labels("Central type", "Spiral proportion among satellites")
grid.ax_joint.set_xticks([0, 1])
plt.xticks(range(2), ('$Red$', '$S$'))

Highlighting last data point in pandas plot

I have a number of graphs similar to this:
import numpy as np
import pandas as pd
dates = pd.date_range('2012-01-01','2013-02-22')
y = np.random.randn(len(dates))/365
Y = pd.Series(y, index=dates)
Y.plot()
The graph is great for showing the shape of the data, but I would like the latest value to stand out as well. I would like to highlight the last data point with a marker 'x' and with a different color. Any idea how I can do this?
Have added Dan Allan's suggestion. Works but I need something a bit more visible. As seen below the x is hardly visible. Any ideas?
Have added return of final answer to complete this. Changed the x to a D for a diamond for better visibility and increased the size of the marker.
Y.tail(1).plot(style='rD',markersize=10)
Add this line to your example to plot the last data point as a red X.
Y.tail(1).plot(style='rx')
