Matplotlib: plotting string values gives strange behaviour - python

I'm trying to plot two data series:
day of the year (X-axis): ['2019-01-01', '2019-01-02', ...]
hour of sunrise (Y-axis): ['07:04', '07:03', ...]
But matplotlib is driving me crazy. Here's the plot of a subset (ax.plot(datelist[130:230], hourlist[130:230], label='sunrise')):
As you can see, the Y-axis decreases from '03:57' to '03:33' and then suddenly starts to increase up to '04:26'. That makes no sense to me.
Can you help me fix that?
Bonus points if you tell me how to show a decent scale on both axes (i.e. 00:00 – 24:00 equally spaced by 1 hour with minor ticks; and a list of chosen dates for the X-axis).
Thank you in advance!

So, thanks to @ImportanceOfBeingErnest's insight, I managed to make it work by converting both data series to Python's datetime.datetime objects, but that wasn't enough.
In order to be plotted properly, the Y-values also needed to share the same date (a fixed reference date used just for plotting purposes).
For the chart's scale I found the matplotlib.dates module, which contains useful Formatters and Locators for the axes.
In order to get a full 24 hours range for the Y-axis I've used:
import datetime
from matplotlib.dates import DateFormatter, HourLocator
ax.set_ylim([datetime.datetime(2019, 1, 1), datetime.datetime(2019, 1, 2)])
ax.yaxis.set_major_formatter(DateFormatter('%H:%M'))
ax.yaxis.set_major_locator(HourLocator())
The overall result (with some additions) seems good enough for now (even if I still have to fix the UTC offsets):
Thank you again!
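The conversion described above could look roughly like the sketch below. The sample values, the REF_DATE constant and the helper names to_datetimes and plot_sunrise are assumptions made for illustration; they are not taken from the original post. The half-hour minor ticks and the explicit X ticks are just one way to answer the "bonus" part of the question.

```python
import datetime

# Hypothetical sample data in the same string formats as the question.
datelist = ['2019-05-11', '2019-05-12', '2019-05-13']
hourlist = ['03:57', '03:55', '03:54']

# Fixed reference date (an arbitrary choice) so that all times share one day.
REF_DATE = datetime.date(2019, 1, 1)


def to_datetimes(dates, hours, ref_date=REF_DATE):
    """Parse date strings for X and pin every time of day to ref_date for Y."""
    x = [datetime.datetime.strptime(d, '%Y-%m-%d') for d in dates]
    y = [
        datetime.datetime.combine(
            ref_date, datetime.datetime.strptime(h, '%H:%M').time()
        )
        for h in hours
    ]
    return x, y


def plot_sunrise(dates, hours):
    """Plot sunrise times on a 00:00-24:00 Y-axis with hourly major ticks."""
    # Imported here so the conversion above works without matplotlib.
    import matplotlib.pyplot as plt
    from matplotlib.dates import (
        DateFormatter, HourLocator, MinuteLocator, date2num,
    )

    x, y = to_datetimes(dates, hours)
    fig, ax = plt.subplots()
    ax.plot(x, y, label='sunrise')

    # Full 24-hour range on the reference date.
    start = datetime.datetime.combine(REF_DATE, datetime.time(0, 0))
    ax.set_ylim([start, start + datetime.timedelta(days=1)])
    ax.yaxis.set_major_locator(HourLocator())
    ax.yaxis.set_major_formatter(DateFormatter('%H:%M'))
    ax.yaxis.set_minor_locator(MinuteLocator(byminute=[30]))

    # A chosen list of dates for the X-axis (here: every plotted date).
    ax.set_xticks(date2num(x))
    ax.xaxis.set_major_formatter(DateFormatter('%Y-%m-%d'))
    ax.legend()
    return fig, ax


x, y = to_datetimes(datelist, hourlist)
```

Calling plot_sunrise(datelist, hourlist) and then plt.show() (or fig.savefig) should draw the chart; because the Y values are real datetimes, they are ordered by time rather than by order of appearance.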

What type of variables did you use? You probably have strings in datelist and hourlist. Matplotlib treats strings as categories and places them on the axis in order of first appearance; it does not sort them by value.
You need to convert your values to the correct object type (for example datetime objects), and then they will plot correctly.
For example:
If I plot the list ['c','a','b'] as the y values, then my y axis would read: c, then a, then b.

Related

Altair: Controlling tick counts for binned axis

I'm trying to generate a histogram in Altair, but I'm having trouble controlling the tick count for the axis corresponding to the binned variable (x-axis). I'm new to Altair, so apologies if I'm missing something obvious here. I looked for whether others had faced this kind of issue but didn't find an exact match.
The code to generate the histogram is
alt.Chart(df_test).mark_bar().encode(
x=alt.X('x:Q', bin=alt.Bin(step=0.1), scale=alt.Scale(domain=[8.9, 11.6])),
y=alt.Y('count(y):Q', title='Count(Y)')
).configure_axis(labelLimit=0, tickCount=3)
df_test is a Pandas dataframe - the data for which is available here.
The above code generates the following histogram. Changing tickCount changes the y-axis tick counts, but not the x-axis.
Any guidance is appreciated.
There might be a more convenient way to do this using bin=, but one approach is to use transform_bin with mark_rect, since this does not change the axis into a binned axis (which are more difficult to customize):
import altair as alt
from vega_datasets import data
source = data.movies.url
alt.Chart(source).mark_rect(stroke='white').encode(
x=alt.X('x1:Q', title='IMDB Rating', axis=alt.Axis(tickCount=3)),
x2='x2:Q',
y='count()',
).transform_bin(
['x1', 'x2'], field='IMDB_Rating'
)
You might notice that you don't get the exact number of ticks. This is because the ticks are rounded to "nice" values, such as multiples of 5. I couldn't turn this off even when setting nice=False on the scale, so another approach in those cases is to pass the exact tick values via values=:
alt.Chart(source).mark_rect(stroke='white').encode(
x=alt.X('x1:Q', title='IMDB Rating', axis=alt.Axis(values=[0, 3, 6, 9])),
x2='x2:Q',
y='count()',
).transform_bin(
['x1', 'x2'], field='IMDB_Rating'
)
Be careful with decimal values: these are automatically displayed as integers (even with tickRound=False), but in the wrong position. This seems like a bug to me, so if you investigate it further you might want to report it on the Vega-Lite issue tracker.

How to compress a time series plot, after taking out specific month from a dataset in python?

I have a dataset of OLR from 1986-2013 (daily data), and I am interested in plotting a time series that contains only the boreal winter months, i.e. November to April.
(i) I am able to select the Nov-Apr months from my dataset using:
OLRNA = OLR.sel(TIME = OLR.TIME.dt.month.isin([11,12,1,2,3,4]))
and this is working.
(ii) The problem is that when I plot the time series, it is not continuous: the Nov-Apr segments of each year are not joined, and gaps appear for the remaining months. I understand this happens because I selected only the Nov-Apr months. How can I join or compress the time axis?
How can I plot this time series properly?
Instead, first mask the months you do not want to plot, and then remove the masked rows by applying dropna:
OLRNA = OLR.mask(OLR.Time.dt.month.isin([5, 6, 7, 8, 9, 10]))
OLRNA = OLRNA.dropna()
I tried to solve this issue and got a proper plot, so I am answering my own question.
After selecting the specific months from the time series, plot the series without using 'time' on the x-axis: plot only the y-axis variable and let the x-axis denote serial numbers. Then, with matplotlib, set the xticks and xticklabels manually wherever you want.
Thank you, especially to Bruno Vermeulen sir for the cooperation.
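The self-answer above can be sketched as follows. This is only an illustration under assumptions: it uses a synthetic pandas Series as a stand-in for the OLR data (the original appears to be an xarray object), and the names olr, winter, positions and plot_compressed are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the OLR data: one synthetic value per day.
time = pd.date_range('1986-01-01', '1988-12-31', freq='D')
olr = pd.Series(np.random.default_rng(0).normal(size=len(time)), index=time)

# Keep only November to April, as in the question.
winter = olr[olr.index.month.isin([11, 12, 1, 2, 3, 4])]

# X positions are plain serial numbers, so the removed months leave no gap.
positions = np.arange(len(winter))

# Put one tick at each 1 November and label it with the real date.
is_season_start = (winter.index.month == 11) & (winter.index.day == 1)
tick_positions = positions[is_season_start]
tick_labels = winter.index[is_season_start].strftime('%Y-%m-%d')


def plot_compressed():
    """Plot the winter-only series against serial numbers with date labels."""
    # Imported here so the tick computation above works without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(positions, winter.to_numpy())
    ax.set_xticks(tick_positions)
    ax.set_xticklabels(tick_labels)
    return fig, ax
```

One consequence of this design is that the x-axis is no longer a true time axis: the last day of April sits directly next to the following 1 November, so the tick labels are the only indication of the real dates.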

How to modify time interval in altair line graph

I have a simple line graph that looks like this: line graph of stock returns
I have been trying to format the x axis so that the time interval is in years instead of months, as it is now. But when I use the timeUnit attribute, it produces a stunted graph like this: line graph of stock returns in years
Code:
alt.Chart(data).mark_line().encode(
x = alt.X('Date', timeUnit = 'year'),
y = alt.Y('Cumul_R', axis = alt.Axis(format='%', orient='right')),
color = 'Stock')
What I'm trying to produce is a graph that looks like the first graph, but with intervals expressed in years like 06-2010, 06-2011, ... etc., without compressing the graph like in the second picture. In other words, how do I show only some tick labels and not all of them?
I've seen answers to my question but they deal with absolute values using tickCount or tickMinStep, not for datetime values. There is apparently an altair attribute called timeinterval in https://altair-viz.github.io/user_guide/generated/core/altair.TimeInterval.html#altair.TimeInterval.init
that may solve the problem, but I'm not sure how to use it.
Appreciate all help on the matter. Thank you!
It appears that you are plotting your dates as nominal typed values, when you should probably be plotting them as temporal.
You should change x = alt.X('Date') to x = alt.X('Date:T') to specify that the x channel is temporal. When you do that, the renderer will use a temporal axis label that is probably closer to what you had in mind.
See Encoding Data Types in the documentation for more information.

Hours and minutes as labels in Altair plot spanning more than one day

I'm trying to create in Altair a Vega-Lite specification of a plot of a time series whose time range spans a few days. Since in my case, it will be clear which day is which, I want to reduce noise in my axis labels by letting labels be of the form '%H:%M', even if this causes labels to be non-distinct.
Here's some example data; my actual data has a five minute resolution, but I imagine that won't matter too much here:
import altair as alt
import numpy as np
import pandas as pd
# Create data spanning 30 hours, or just over one full day
df = pd.DataFrame({'time': pd.date_range('2018-01-01', periods=30, freq='H'),
'data': np.arange(30)**.5})
By using the otherwise trivial yearmonthdatehoursminutes transform, I get the following:
alt.Chart(df).mark_line().encode(x='yearmonthdatehoursminutes(time):T',
y='data:Q')
Now, my goal is to get rid of the dates in the labels on the horizontal axis, so they become something like ['00:00', '03:00', ..., '21:00', '00:00', '03:00'], or whatever spacing works best.
The naive approach of just using hoursminutes as a transform won't work, as that bins the actual data:
alt.Chart(df).mark_line().encode(x='hoursminutes(time):T', y='data:Q')
So, is there a declarative way of doing this? Ultimately, the visualization will be making use of selections to define the horizontal axis limits, so specifying the labels explicitly using Axis does not seem appealing.
To expand on @fuglede's answer, there are two distinct concepts at play with dates and times in Altair.
Time formats let you specify how times are displayed on an axis; they look like this:
chart.encode(
x=alt.X('time:T', axis=alt.Axis(format='%H:%M'))
)
Altair uses format codes from d3-time-format.
Time units let you specify how data will be grouped, and they also adjust the default time format to match. They look something like this:
chart.encode(
x=alt.X('time:T', timeUnit='hoursminutes')
)
or via the shorthand:
chart.encode(
x='hoursminutes(time):T'
)
Available time units are listed here.
If you want to adjust axis formats only, use time formats. If you want to group based on timespans (i.e. group data by year, by month, by hour, etc.) then use a time unit. Examples of this appear in the Altair documentation, e.g. the Seattle Weather Heatmap in Altair's example gallery.
This can actually easily be achieved by specifying format in Axis:
alt.Chart(df).mark_line().encode(x=alt.X('time:T', axis=alt.Axis(format='%H:%M')), y='data:Q')

Pandas histogram of dates with empty bins

My use case is very similar to this post, but my data is not continuous through each bin. I'm attempting to create multiple figures over the same time span to show activity (or lack thereof) over 18 months. I thought I hit the jackpot with the df.groupby(df.date.month).count() approach, but since my data is irregular I get different bins per dataset.
My question, then, is how would I go about creating some kind of master x-axis with fixed bins (month,year) and plot each dataset against these bins. I think I'm missing some fundamental understanding of either Pandas or MPL, and I apologize for what I'm sure is a silly question. First post, go easy...
Since I can't comment yet, I'll edit here:
I have 18 months generated with pd.period_range. I also have a DataFrame full of observations with timestamps within those months. Some of the months have zero observations. How do I effectively count and chart the observations by month?
Have you tried the suggestions here?
You can also try this sort of approach to manually define the bin boundaries:
bins = [0, 30, 60, 90, 120]
labels = [1, 2, 3, 4]
df['new_bin'] = pd.cut(df['existing_value'], bins=bins, labels=labels)
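For the monthly case described in the question's edit, a different technique from pd.cut may be simpler: count per month and then reindex against the fixed pd.period_range, filling missing months with zero. The sketch below assumes a timestamp column named date and uses made-up observations; it is not the answerer's method.

```python
import pandas as pd

# Hypothetical observations; February and April 2020 deliberately have none.
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-03', '2020-01-20', '2020-03-15']),
})

# The fixed "master" axis: every month in the span, observed or not.
months = pd.period_range('2020-01', '2020-04', freq='M')

# Count observations per month, then align the counts to the master axis.
counts = (
    df.groupby(df['date'].dt.to_period('M'))
      .size()
      .reindex(months, fill_value=0)
)
```

Because every dataset is reindexed against the same months, each resulting Series has identical bins, and something like counts.plot.bar() should give comparable figures across datasets.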
