I browsed Stack Overflow and found some similar questions, but I could not gather much from them.
I have a dataset as shown in the image above.
The dataset consists of Sales figures from Jan 1949 to Dec 1960. I want to plot the Date vs Sales graph. My code returned the following graph.
The x-axis shows ticks only at intervals of 2 years. I want the x-axis to display ticks at uniform intervals throughout the date range of Jan 1949 to Dec 1960 (as shown in the dataset above), with the labels rotated vertically to accommodate more xticks. How can I do that?
You can use MonthLocator from matplotlib.dates (see https://matplotlib.org/3.1.0/api/dates_api.html#matplotlib.dates.MonthLocator), i.e. something like:
import matplotlib.dates as mdates
...
ax.xaxis.set_major_locator(mdates.MonthLocator())
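To make the suggestion above concrete, here is a minimal sketch. The DataFrame is a hypothetical stand-in for the Sales dataset (the column names Date and Sales and the placeholder values are assumptions), and the choice of a tick every January and July is only one possible spacing.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical stand-in for the dataset: monthly sales, Jan 1949 to Dec 1960
dates = pd.date_range("1949-01-01", "1960-12-01", freq="MS")
sales = np.linspace(100, 600, len(dates))  # placeholder values, not real data
df = pd.DataFrame({"Date": dates, "Sales": sales})

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(df["Date"], df["Sales"])

# Major ticks on the first of every January and July, labelled like "Jan 1949"
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 7)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))

# Rotate the labels vertically so that more of them fit
ax.tick_params(axis="x", labelrotation=90)
ax.set_xlabel("Date")
ax.set_ylabel("Sales")
fig.tight_layout()
```

Call plt.show() afterwards to display the figure. Using MonthLocator() without arguments would give one tick per month, which is 144 labels over this range, so a coarser spacing is probably easier to read.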
I have a Plotly express animated scatter plot, which plots variable 1 (INFLATION) against variable 2 (GROWTH) for every year (TIME) and every country (LOCATION) and then iterates through the years (that's the animation).
def create_plot(self):
    fig = px.scatter(self.yoy2,
                     x="GROWTH",
                     y="INFLATION",
                     color="LOCATION",
                     animation_frame="TIME",
                     #animation_group="TIME",
                     size='GDP',
                     text="LOCATION",
                     hover_name="LOCATION",
                     range_y=[-2,12],
                     range_x=[-15,15],
                     labels={
                         "GROWTH": "Growth measured in QGDP PC_CHGPP",
                         "INFLATION": "INFLATION measured in AGRWTH",
                         "TIME": "Year"},
                     title=f"Growth versus Inflation plotter for OECD countries.",
                     template='none')
I would like to add historic data as shadows to the scatter. So for example there is the scatter of 2021 showing the variables growth and inflation for every country, then I would like to also show the scatter of 2020 with the same colours, and every country point of 2021 connected to their country point in 2020 with a line (so that you see the year over year evolution).
I tried to draw another scatter on top with a shifted dataframe but that did not work.
And sadly there is no function in plotly itself for this.
How could I achieve this?
I have a code in which I create a data frame from the data of a bunch of sensors, select which type of sensor I wish to observe, and the period over which I would like to observe this sensor. The data is then plotted and saved to an Excel file. The following is a snippet of what my data frame looks like.
     Site Name     Sensor Display Name               Readings Datetime
342  100 Adelaide  EY_VAV_31_05 - Space Temperature  2021-01-01 00:06:00
343  100 Adelaide  EY_VAV_31_05 - Space Temperature  2021-01-01 00:21:00
344  100 Adelaide  EY_VAV_31_05 - Space Temperature  2021-01-01 00:36:00
345  100 Adelaide  EY_VAV_31_05 - Space Temperature  2021-01-01 00:52:00
346  100 Adelaide  EY_VAV_31_05 - Space Temperature  2021-01-01 01:06:00
My issue is that when I plot the data, I would like to get the dates of each day I observe to be the labels for my X-axis. Instead, I get the number of hours since the beginning of the data. I'm a little new to matplotlib, and don't fully understand how I can fix this. The following is the code used to plot everything.
chosen['Readings Value'].plot()
plt.xlabel('Hour')
plt.ylabel('Sensor Reading')
plt.show()
I don't want to use pd.groupby to reorganize the data by day, as I need to keep all of the data for every hour. Any help will be appreciated, thank you.
I am going to assume that "Readings Datetime" in your dataframe is a single column (that is, the column contains values like 2021-01-01 00:06:00).
The column should also be of pandas Timestamp (datetime64) type. If it is not, you can convert it as follows:
df['Readings Datetime'] = pd.to_datetime(df['Readings Datetime'])
Matplotlib represents dates internally as floating-point numbers (days since an epoch). You may already know this and have converted your data, but in case not, you can convert explicitly as follows:
import matplotlib.dates as mdates
xdates = mdates.date2num([dtm.to_pydatetime() for dtm in df['Readings Datetime']])
where xdates is the variable you will use to populate your x-axis data.
Then, to format the x-axis with only the dates:
formatter = mdates.DateFormatter('%Y-%b-%d') # or similar format string
ax.xaxis.set_major_formatter(formatter) # `ax` is your Axes object
I just now looked more closely at your code:
chosen['Readings Value'].plot()
plt.xlabel('Hour')
plt.ylabel('Sensor Reading')
plt.show()
Assuming chosen is your dataframe, you appear to be using a combination of Pandas (which uses matplotlib under the hood) and matplotlib directly (plt) to plot your data. This can result in some confusing situations, because Pandas makes certain assumptions when plotting, and takes some of the control away from you.
You can avoid this by either doing all of your plotting directly in matplotlib,
or continue using Pandas to plot, but before calling chosen[...].plot() call
fig, ax = plt.subplots()
to gain access to the Figure and Axes objects.
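to gain access to the Figure and Axes objects, and pass ax=ax to chosen[...].plot() so that pandas draws on that Axes.
Then you can call ax.xaxis.set_major_formatter() as shown above. There is also an example here that shows how to reformat the x-axis when using Pandas to plot with datetimes on the x-axis.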
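Putting the pieces of this answer together, a minimal sketch of the all-matplotlib route could look like the following. The data are made up: the 15-minute spacing, the three-day span and the sine-shaped values are assumptions standing in for the real chosen dataframe.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical sensor readings every 15 minutes over three days
times = pd.date_range("2021-01-01 00:06", periods=3 * 96, freq="15min")
chosen = pd.DataFrame({
    "Readings Datetime": times,
    "Readings Value": 21 + np.sin(np.arange(len(times)) / 15),
})

fig, ax = plt.subplots(figsize=(10, 4))
# Plot with matplotlib directly so that the x values are real datetimes
ax.plot(chosen["Readings Datetime"], chosen["Readings Value"])

# One major tick per day, labelled with the date only
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%b-%d"))
ax.set_xlabel("Date")
ax.set_ylabel("Sensor Reading")
fig.autofmt_xdate()  # slant the date labels so they do not overlap
```

Every reading is still plotted; only the tick positions and labels change, so no grouping by day is needed. Call plt.show() to display the figure.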
I'm trying to plot a graph with time data on the x-axis. My data has daily information, but I want to create something that has two different date scales on the x-axis.
I want the axis to run by year from 2005 to 2014 and then, after 2014, to continue by month through 2015. Is this possible? If so, how can I create this kind of plot?
Thanks.
I provided an image below:
Yes, you can. Use the following pattern: since both series have dates as their x-axis values, the second one is simply plotted to the right of the first.
For a dataframe:
import numpy
import pandas as pd
import matplotlib.pyplot
data = numpy.array([45,63,83,91,101])
df1 = pd.DataFrame(data, index=pd.date_range('2005-10-09', periods=5, freq='W'), columns=['events'])
df2 = pd.DataFrame(numpy.arange(10,21,2), index=pd.date_range('2015-01-09', periods=6, freq='M'), columns=['events'])
matplotlib.pyplot.plot(df1.index, df1.events)
matplotlib.pyplot.plot(df2.index, df2.events)
matplotlib.pyplot.show()
You can change the parameters according to your convenience.
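If the two periods should really use different scales (years up to 2014, months for 2015), one possible alternative is a "broken axis" made of two side-by-side Axes that share the y-axis. This is only a sketch under assumptions: the daily series is made up, and the panel layout and spacing are arbitrary choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical daily series covering 2005 through 2015
idx = pd.date_range("2005-01-01", "2015-12-31", freq="D")
series = pd.Series(np.arange(len(idx), dtype=float), index=idx)

early = series.loc[:"2014-12-31"]   # shown on a yearly scale
late = series.loc["2015-01-01":]    # shown on a monthly scale

fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(12, 4),
                               gridspec_kw={"wspace": 0.03})
ax1.plot(early.index, early.values)
ax2.plot(late.index, late.values)

# Left panel: one tick per year; right panel: one tick per month
ax1.xaxis.set_major_locator(mdates.YearLocator())
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
ax2.xaxis.set_major_locator(mdates.MonthLocator())
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%b"))

# Remove the margins so the two panels meet at the 2014/2015 boundary
ax1.set_xlim(early.index[0], early.index[-1])
ax2.set_xlim(late.index[0], late.index[-1])
ax2.tick_params(axis="y", left=False)
```

Each panel has its own linear date scale, so one year of 2015 takes as much horizontal space as the ten years before it. Call plt.show() to display the figure.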
I'm currently working with some temperature data from a sensor that was active for about 4 months (from December 2018 to March 2019). I'm trying to plot the data; however, my time series currently goes from 350 to 430. How do I make the x-axis ticks start over at 0 once it reaches 365? Or, how can I add ticks that represent months starting at December and going to March?
Current graph:
Let's say you have your matplotlib.pyplot object, e.g. plt. We can use this to change the labels of the x-axis ticks:
xticks = plt.xticks()[0]
plt.xticks(xticks, (xticks % 365))
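For the second part of the question (month ticks from December to March), one option is to turn the day counts into real dates and let matplotlib.dates place the ticks. The sketch below assumes the x values count days starting from 1 January 2018 as day 1; that assumption, and the placeholder temperatures, are not taken from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Assumption: x values count days from 1 January 2018 (day 1), running past 365
days = np.arange(350, 431)
temps = 5 + 3 * np.sin(days / 10)  # placeholder temperatures

# Turn the day counts into real dates (day 350 -> 2018-12-16)
dates = pd.Timestamp("2018-01-01") + pd.to_timedelta(days - 1, unit="D")

fig, ax = plt.subplots()
ax.plot(dates, temps)

# One tick at the start of each month, labelled like "Jan 2019"
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
```

Call plt.show() to display the figure. With real dates on the axis, the year boundary no longer needs any special handling.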
The plot below shows simulated data, with the xticks that I want to modify. By default, DataFrame.plot chooses dates that are approximately 3 months apart as ticks, but what I want is a tick for each month. What is the best way to do this? What about seasonal ticks? Thank you in advance.
First of all, you have to convert the pandas date objects to Python date objects. This conversion is needed because of matplotlib's internal date conversion functions. Then use functions from matplotlib.dates to set the desired formatter and tick positions, as follows:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# convert date objects from pandas format to python datetime
index = pd.date_range(start = "2015-07-01", end = "2017-01-01", freq = "D")
index = [pd.to_datetime(date, format='%Y-%m-%d').date() for date in index]
data = np.random.randint(1,100, size=len(index))
df = pd.DataFrame(data=data,index=index, columns=['data'])
print (df.head())
ax = df.plot()
# set monthly locator
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
# set formatter
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
# set font and rotation for date tick labels
plt.gcf().autofmt_xdate()
plt.show()
For season labels, you have to construct the labels yourself and then set them with the plt.setp function (for month 02 set the label to winter, for 04 to spring, etc.):
plt.setp(new_labels, rotation=90, fontsize=9)
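As an alternative to building the label list by hand, the season names can be generated by a formatter function. The sketch below is an assumption-laden illustration: the month-to-season mapping (northern hemisphere, winter from December to February), the helper name season_label and the placeholder data are all hypothetical.

```python
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import FuncFormatter

# Assumed mapping from month number to (northern-hemisphere) season
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}

def season_label(x, pos=None):
    """Format a matplotlib date number as '<season> <year>'."""
    date = mdates.num2date(x)
    return f"{SEASONS[date.month]} {date.year}"

start = dt.date(2015, 7, 1)
dates = [start + dt.timedelta(days=i) for i in range(550)]
values = list(range(len(dates)))  # placeholder data

fig, ax = plt.subplots()
ax.plot(dates, values)

# One tick in the middle month of each season (Jan, Apr, Jul, Oct)
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 4, 7, 10)))
ax.xaxis.set_major_formatter(FuncFormatter(season_label))

# Rotate and shrink the labels, much like plt.setp(labels, rotation=90, fontsize=9)
ax.tick_params(axis="x", labelrotation=90, labelsize=9)
```

Call plt.show() to display the figure. The formatter approach keeps the labels correct when the view is zoomed or panned, which a fixed list of label strings would not.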
head of df:
            data
2015-07-01    26
2015-07-02    33
2015-07-03    46
2015-07-04    69
2015-07-05    17
This answer is based on the one by Serenity as well as on this one by ImportanceOfBeingErnest.
The best way to customize time series tick labels is to use the tick locators and formatters from the matplotlib.dates module (mdates). Though it is worth noting that if you want a tick frequency based on the same unit as the time series you are plotting, it may be more convenient to create and format the tick labels using the dates as strings like in the answers to this question concerning pandas bar plots.
As described in the documentation, pandas uses matplotlib to create plots with its own custom tick formatters for time series:
pandas provides custom formatters for timeseries plots. These change the formatting of the axis labels for dates and times. By default, the custom formatters are applied only to plots created by pandas with DataFrame.plot() or Series.plot().
The ticks and labels of pandas time series plots are currently formatted like this by default:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.dates as mdates # v 3.3.2
# Create random dataset stored as a pandas DataFrame with a DatetimeIndex
rng = np.random.default_rng(seed=1) # random number generator
date_day = pd.date_range(start='2015-07-01', end='2016-12-31', freq='D')
traffic = rng.lognormal(sigma=2, size=date_day.size)
df_day = pd.DataFrame(dict(traffic=traffic), index=date_day)
# Create pandas plot with default settings except for figure size
df_day.plot(figsize=(10,5));
To be able to use the mdates tick locators and formatters and override the default tick formatting, the pandas dates must be correctly recognized by matplotlib. The problem is that pandas and matplotlib have different approaches to computing the date numbers that are used to locate the ticks on the time axis (the x-axis by default).
In pandas, time is measured in nanoseconds starting at zero on 1970-01-01 00:00:00 (the origin of the Unix epoch) and individual time points are stored as pandas timestamp objects. But when it comes to creating time scales for plots, pandas uses another numbering system which starts at the same origin but then increases by 1 for each period of the chosen frequency (in this example the frequency is in days).
Matplotlib uses the same default origin as pandas since version 3.3.0 released in July 2020 but the dates are always numbered in terms of days:
Matplotlib represents dates using floating point numbers specifying the number of days since a default epoch of 1970-01-01 UTC; for example, 1970-01-01, 06:00 is the floating point number 0.25.
You can check what numbers are being used for the scale by running ax.get_xticks(), with ax = df.plot() when using pandas.
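The two numbering systems can also be compared without drawing anything. The sketch below assumes that the pandas plotting scale described above corresponds to period ordinals, and that matplotlib is version 3.3.0 or later with its default epoch left unchanged.

```python
import datetime as dt
import pandas as pd
import matplotlib.dates as mdates

day = dt.datetime(2015, 7, 1)

# Matplotlib: days since the epoch (1970-01-01 by default from version 3.3.0)
mpl_num = mdates.date2num(day)

# pandas period ordinals: number of periods of the given frequency since 1970
daily_ordinal = pd.Period(day, freq="D").ordinal     # counts days
monthly_ordinal = pd.Period(day, freq="M").ordinal   # counts months

print(mpl_num, daily_ordinal, monthly_ordinal)
```

Under those assumptions the daily ordinal and the matplotlib date number coincide (both 16617 for 1 July 2015), whereas the monthly ordinal is 546, a number that matplotlib would interpret as a date in 1971.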
As you might have guessed, this means no date conversion is needed when the time series has a frequency in days, as illustrated here with a simple custom tick locator and formatter:
ax = df_day.plot(figsize=(10,5))
# Create custom ticks using matplotlib date tick locator and formatter
loc = mdates.MonthLocator(interval=2)
ax.xaxis.set_major_locator(loc)
fmt = mdates.DateFormatter('%b\n%Y')
ax.xaxis.set_major_formatter(fmt)
This particular case makes it convenient for keeping other pandas default settings for the x-axis limits and minor x ticks. But this is an exception to the general rule.
To be able to use mdates tick locators and formatters with a pandas plot of a time series of any type of frequency, you need to use the (long-existing yet absent-from-the-docstring and barely-documented) x_compat=True argument. The following example illustrates its use with the same dataset resampled to a monthly frequency. It may often be the case that you just want to slightly tweak the default pandas format, so in the following example, the default format is recreated from scratch to show what methods can be used to adjust it:
# Resample time series to monthly frequency and plot it using date
# numbers that are compatible with mdates
df_month = df_day.resample('MS').sum()
ax = df_month.plot(figsize=(10,5), x_compat=True)
# Set major and minor date tick locators
maj_loc = mdates.MonthLocator(bymonth=np.arange(1,12,2))
ax.xaxis.set_major_locator(maj_loc)
min_loc = mdates.MonthLocator()
ax.xaxis.set_minor_locator(min_loc)
# Set major date tick formatter
zfmts = ['', '%b\n%Y', '%b', '%b-%d', '%H:%M', '%H:%M']
maj_fmt = mdates.ConciseDateFormatter(maj_loc, zero_formats=zfmts, show_offset=False)
ax.xaxis.set_major_formatter(maj_fmt)
ax.figure.autofmt_xdate(rotation=0, ha='center')
ax.set_xlim(df_month.index.min(), df_month.index.max());
Documentation: pd.date_range, date format codes, mdates.ConciseDateFormatter, fig.autofmt_xdate
I had a hard time trying to get @Serenity's answer to work because I'm working directly with Matplotlib instead of plotting the Pandas dataset. If you are in the same situation, my answer might help.
Plotting with Matplotlib.plot()
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Process dataset
bitcoin['Date'] = pd.to_datetime(bitcoin['Date'])
bitcoin['Open'] = pd.to_numeric(bitcoin['Open'])
# Plot
plt.figure()
plt.plot(bitcoin['Date'], bitcoin['Open'])
ax = plt.gca()
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=4))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
plt.gcf().autofmt_xdate() # Rotation
plt.show()
bitcoin[['Date', 'Open']].head()
        Date     Open
0 2017-09-05  4228.29
1 2017-09-04  4591.63
2 2017-09-03  4585.27
3 2017-09-02  4901.42
4 2017-09-01  4701.76