I want to plot electricity consumption measured every 30 minutes over two months.
My code works; my problem is the x-axis labels. I don't want the ticks to show range(1, 2, ..., 48*58).
Instead, I want something like this: label the first 48*30 points "January", label the next 48*28 points "February", and so on.
plt.xticks(rotation=70)
mask3 = (train['date'] >= '2008-01-01') & (train['date'] <= '2008-02-27')
week = train.loc[mask3]
plt.plot(range(48*58),week.LoadNette)
plt.ylabel("Electricity consumption")
plt.xlabel("Month")
plt.title('Electricity consumption / week')
plt.show()
By searching «python matplotlib use dates as xlabel» on a search engine, you can find an example of what you want in the Matplotlib documentation : https://matplotlib.org/examples/api/date_demo.html.
This example supposes your xdata is dates however, which is not the case right now. You would need to create a list of dates and use that instead of your range(48*58) list, like this :
import pandas
xdata = pandas.date_range(
    pandas.to_datetime("2008-01-01"),
    pandas.to_datetime("2008-02-27 23:30:00"),
    freq=pandas.to_timedelta(30, unit="m")).tolist()
This creates a list of datetimes from your start time to your end time at a frequency of 30 minutes.
After that, you'll need to use the example in the link above. Here it is reproduced and tweaked a bit to your needs, but you'll need to play around with it to set it properly. You can find many more examples of using dates in matplotlib now that you'll be using a date list as input for your plot.
import datetime
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.cbook as cbook
# define locators for every month and every day
months = mdates.MonthLocator() # every month
days = mdates.DayLocator() # every day
monthsFmt = mdates.DateFormatter('%m')
# create the plot and plot your data
fig, ax = plt.subplots()
ax.plot(xdata, week.LoadNette)
# format the x ticks to have a major tick every month and a minor every day
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(monthsFmt)
ax.xaxis.set_minor_locator(days)
# format the x value shown in the interactive cursor readout as the month only
ax.format_xdata = mdates.DateFormatter('%m')
# rotates and right aligns the x labels, and moves the bottom of the
# axes up to make room for them
fig.autofmt_xdate()
plt.show()
Using dates in Matplotlib can be intimidating, but it's better in the long run than just hacking the labels you want this specific time.
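If you want month names rather than month numbers on the axis, here is a minimal sketch of the same idea. It assumes synthetic half-hourly data in place of week.LoadNette (the `load` array is made up for illustration) and uses the '%B' format code to print the full month name:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# One timestamp every 30 minutes from 2008-01-01 00:00 to 2008-02-27 23:30
xdata = pd.date_range("2008-01-01", "2008-02-27 23:30:00", freq="30min")

# Assumption: synthetic load values standing in for week.LoadNette
rng = np.random.default_rng(0)
load = 50 + 10 * np.sin(np.arange(len(xdata)) * 2 * np.pi / 48) + rng.normal(size=len(xdata))

fig, ax = plt.subplots()
ax.plot(xdata, load)

# One labelled major tick at the start of each month, showing the month name
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%B"))
# Unlabelled minor tick every Monday
ax.xaxis.set_minor_locator(mdates.WeekdayLocator(byweekday=mdates.MO))

ax.set_xlabel("Month")
ax.set_ylabel("Electricity consumption")

# Render once so the tick labels are populated, then collect them
fig.canvas.draw()
month_labels = [tick.get_text() for tick in ax.get_xticklabels()]
print(month_labels)
```

The month names come from the current locale, so they may not be in English on every system.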
I'm plotting a DataFrame whose index is of type datetime (like 2018-05-29 08:20:00).
I slice the data into the last hour, last day, last week and last month, and then I plot each slice.
The data is collected every minute, so consecutive rows of the index differ by one minute.
When I plot the data for last hour, the x axis is plotted like:
Or, for the last month it is like:
which is clean and readable. But, when I plot the last day data the x-axis index is like:
Why do the labels overlap? How can I fix it?
The code to plot these time frames is the same; only the DataFrame passed in changes:
self.canvas.axes.plot(df_day.index, df_day.loc[:, item], linestyle="None", marker='.')
# or df_month or df_week or df_hour
How can I format the x-axis labels the way I want?
I want them printed as hour:minute for the last hour, or day hour:minute for the last day.
I tried the following links, but none of them helped:
Customizing Ticks
matplotlib: how to prevent x-axis labels from overlapping each other
I tried
self.canvas.axes.xaxis.set_major_formatter(self.major_formatter, self.canvas.axes.get_xticklabels())
#ticker.FuncFormatter
def major_formatter(x, pos):
    return datetime.datetime.fromtimestamp(x.day / 1e3)
but x arrived as an int64, so this didn't help.
From the first answer to "How to plot day and month" (which was posted by the question's owner), I found the solution:
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(date, price , label="Price")
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))
or in my case:
self.canvas.axes.xaxis.set_major_formatter(mdates.DateFormatter('%d-%b'))
The "strftime() and strptime() Format Codes" section of the Python documentation describes the available date and time format codes.
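To get hour:minute for the last hour and day hour:minute for the last day, one option is to pick the format string from the time span of the slice being plotted. This is only a sketch: `set_time_ticks` is a hypothetical helper, the data is synthetic, and the span thresholds and `maxticks=8` are arbitrary choices meant to keep labels from overlapping.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def set_time_ticks(ax, index):
    """Hypothetical helper: choose a tick label format from the span of `index`."""
    span = index.max() - index.min()
    if span <= pd.Timedelta(hours=1):
        fmt = "%H:%M"        # last hour: hour:minute
    elif span <= pd.Timedelta(days=1):
        fmt = "%d %H:%M"     # last day: day hour:minute
    else:
        fmt = "%d-%b"        # longer spans: day-month
    # Cap the number of major ticks so that labels do not pile up
    ax.xaxis.set_major_locator(mdates.AutoDateLocator(maxticks=8))
    ax.xaxis.set_major_formatter(mdates.DateFormatter(fmt))
    return fmt


# Assumption: one synthetic reading per minute over the last day
index = pd.date_range("2018-05-28 08:20", "2018-05-29 08:20", freq="min")
df_day = pd.DataFrame({"value": np.random.default_rng(0).normal(size=len(index))},
                      index=index)

fig, ax = plt.subplots()
ax.plot(df_day.index, df_day["value"], linestyle="None", marker=".")
chosen = set_time_ticks(ax, df_day.index)
fig.autofmt_xdate()
```

The same helper can be called with df_hour.index, df_week.index or df_month.index after plotting the corresponding slice.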
I have two datasets that contain temperature and light sensor readings. The measurements were done from 22:35:41 - 04:49:41.
The problem with this datasets is to plot the measurements with respect to the datetime.date format when the measurements are taken from one day to another (22:35:41 - 04:49:41). The plot-function automatically starts from 00:00 and puts the data that was measured before 00:00 to the end of the plot.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Temperature = pd.read_excel("/kaggle/input/Temperature_measurement.xlsx")
Light = pd.read_excel("/kaggle/input/Light_measurement.xlsx")
sns.lineplot(x="Time",y="Light", data = Light)
sns.lineplot(y="Temperature", x="Time", data = Temperature)
plt.show()
This is a link to the dataset
Here is a link to the Jupyter Notebook
First you need to convert your times to a Pandas Timestamp. Pandas Timestamps don't really support a time on its own, they will attach a date to them, but that's fine since we'll hide that part later.
We also need to detect day changes, which we can do by looking at where the time wraps, which we can find by looking at a time that's smaller than its predecessor.
We can count the cumulative wraps and add that number of dates to our timestamps.
Let's define a function to take the datetime.time objects, convert them to native Pandas Timestamps (using an arbitrary date of 1900-01-01, which is the default for Pandas) and adjusting the day according to the wraps (so we end up with our final times on 1900-01-02):
def normalize_time(series):
    series = pd.to_datetime(series, format="%H:%M:%S")
    series += pd.to_timedelta(series.lt(series.shift()).cumsum(), unit="D")
    return series
Let's now apply it to our DataFrames:
Light["Time"] = normalize_time(Light["Time"])
Temperature["Time"] = normalize_time(Temperature["Time"])
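To see what the wrap detection does, here is a small self-contained check on four made-up readings that cross midnight. It assumes the times are given as "HH:MM:SS" strings (the real data may hold datetime.time objects instead) and splits the function into two steps purely for readability:

```python
import pandas as pd


def normalize_time(series):
    # Parse times only; pandas attaches the default date 1900-01-01
    series = pd.to_datetime(series, format="%H:%M:%S")
    # A time smaller than its predecessor marks a day wrap; count the wraps so far
    wraps = series.lt(series.shift()).cumsum()
    # Shift each timestamp forward by the number of wraps seen so far
    return series + pd.to_timedelta(wraps, unit="D")


# Assumption: four illustrative readings spanning midnight
times = pd.Series(["22:35:41", "23:35:41", "00:35:41", "04:49:41"])
normalized = normalize_time(times)
print(normalized)
```

The first two readings stay on 1900-01-01 and the last two move to 1900-01-02, so the series is increasing and plots in the right order.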
Plotting the data now will look correct, with the times being continuous. Except that the labels of the X ticks will try to display the dates, which are not really what we care about, so let's fix that part now.
We can use Matplotlib's set_major_formatter together with a DateFormatter to include times only:
import matplotlib.dates
ax = plt.subplot()
sns.lineplot(x="Time", y="Light", data=Light)
sns.lineplot(x="Time", y="Temperature", data=Temperature)
ax.xaxis.set_major_formatter(
    matplotlib.dates.DateFormatter("%H:%M")
)
plt.show()
This produces X ticks every hour, which seem to be a great fit for this data set.
I have a CSV with the following columns: recorded, humidity and temperature. I want to display the recorded values (date and time) on the x-axis and the humidity on the y-axis. How can I display the dates properly? It is quite a big CSV, and my current plot shows a solid black band instead of readable dates. My date format is like this: 2019-09-12T07:26:55, so the CSV holds both the date and the time.
I have displayed the plot using this code:
from matplotlib import pyplot as plt
import pandas as pd
data = pd.read_csv('home_data.csv')
plt.plot(data.recorded, data.humidity)
plt.xlabel('date')
plt.ylabel('humidity')
plt.title('Visualizing date and humidity')
plt.show()
This is a print screen of the plot:
https://snipboard.io/d4hfS7.jpg
Actually, the plot is displaying every date in your dataset. They are so many that they seem just like a black blob. You can downsample the xticks in order to increase the readability. Do something like this:
fig, ax = plt.subplots()
ax.plot(data.recorded, data.humidity)
# some axes labelling
# Reduce now the number of the ticks printed in the figure
ax.set_xticks(ax.get_xticks()[::4])
ax.tick_params(axis='x', labelrotation=45)
The line ax.set_xticks(ax.get_xticks()[::4]) keeps one tick out of every four by slicing the array of tick positions, which reduces the number of dates printed. You can increase the step as much as you want.
To improve readability further, you can rotate the tick labels, as in the line
ax.tick_params(axis='x', labelrotation=45).
Hope this helps.
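An alternative worth trying is to parse the recorded column as datetimes when reading the CSV, so that Matplotlib treats the x-axis as a time axis and picks a small number of ticks on its own. The sketch below assumes a synthetic stand-in for home_data.csv built in memory; the column names come from the question, everything else is made up.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Assumption: synthetic stand-in for home_data.csv, one row every 10 minutes
timestamps = pd.date_range("2019-09-12T07:26:55", periods=500, freq="10min")
csv_text = "recorded,humidity,temperature\n" + "\n".join(
    f"{ts.isoformat()},{40 + i % 20},{20 + i % 5}" for i, ts in enumerate(timestamps)
)

# parse_dates turns the strings into datetime64 values instead of plain text
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["recorded"])

fig, ax = plt.subplots()
ax.plot(data["recorded"], data["humidity"])

# Let Matplotlib choose a readable number of date ticks and compact labels
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))

ax.set_xlabel("date")
ax.set_ylabel("humidity")
ax.set_title("Visualizing date and humidity")
fig.autofmt_xdate()
```

With string dates every row becomes its own tick, whereas with parsed dates the locator decides how many ticks fit.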
Below shows a plot of simulated data, which contains the xticks that I want to modify. By default, the pd.df.plot chooses dates that are approximately 3 months apart as ticks. But what I want is each month being a tick. What is the best way to do this? What about seasonal ticks? Thank you in advance.
First of all you have to convert pandas date objects to python date objects. This conversion is needed because of matplotlib internal date conversion functions. Then use functions from matplotlib.dates to set desired formatter and tick positions like here:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# convert date objects from pandas format to python datetime
index = pd.date_range(start = "2015-07-01", end = "2017-01-01", freq = "D")
index = [pd.to_datetime(date, format='%Y-%m-%d').date() for date in index]
data = np.random.randint(1,100, size=len(index))
df = pd.DataFrame(data=data,index=index, columns=['data'])
print (df.head())
ax = df.plot()
# set monthly locator
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
# set formatter
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
# set font and rotation for date tick labels
plt.gcf().autofmt_xdate()
plt.show()
For season labels, you have to construct the labels yourself and then apply them with the plt.setp function (for example, label month 02 as winter, 04 as spring, etc.):
plt.setp(new_labels, rotation=90, fontsize=9).
head of df:
data
2015-07-01 26
2015-07-02 33
2015-07-03 46
2015-07-04 69
2015-07-05 17
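As a sketch of the seasonal ticks mentioned above, one way is to place a tick at one month per season and map the month number to a season name with a FuncFormatter. The month-to-season mapping below is an assumption (Northern Hemisphere, one mid-season month each), `season_label` is a hypothetical helper, and the data is plotted with plain Matplotlib rather than df.plot().

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter

# Assumption: one representative month per season (Northern Hemisphere)
SEASONS = {1: "Winter", 4: "Spring", 7: "Summer", 10: "Autumn"}


def season_label(x, pos=None):
    """Hypothetical helper: turn a Matplotlib date number into 'Season' + year."""
    date = mdates.num2date(x)
    return f"{SEASONS.get(date.month, '')}\n{date.year}"


# Synthetic daily data starting on 2015-07-01
dates = [dt.date(2015, 7, 1) + dt.timedelta(days=i) for i in range(550)]
values = np.random.default_rng(0).integers(1, 100, size=len(dates))

fig, ax = plt.subplots()
ax.plot(dates, values)

# One tick on the first day of January, April, July and October
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=list(SEASONS)))
ax.xaxis.set_major_formatter(FuncFormatter(season_label))
```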
This answer is based on the one by Serenity as well as on this one by ImportanceOfBeingErnest.
The best way to customize time series tick labels is to use the tick locators and formatters from the matplotlib.dates module (mdates). Though it is worth noting that if you want a tick frequency based on the same unit as the time series you are plotting, it may be more convenient to create and format the tick labels using the dates as strings like in the answers to this question concerning pandas bar plots.
As described in the documentation, pandas uses matplotlib to create plots with its own custom tick formatters for time series:
pandas provides custom formatters for timeseries plots. These change the formatting of the axis labels for dates and times. By default, the custom formatters are applied only to plots created by pandas with DataFrame.plot() or Series.plot().
The ticks and labels of pandas time series plots are currently formatted like this by default:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.dates as mdates # v 3.3.2
# Create random dataset stored as a pandas DataFrame with a DatetimeIndex
rng = np.random.default_rng(seed=1) # random number generator
date_day = pd.date_range(start='2015-07-01', end='2016-12-31', freq='D')
traffic = rng.lognormal(sigma=2, size=date_day.size)
df_day = pd.DataFrame(dict(traffic=traffic), index=date_day)
# Create pandas plot with default settings except for figure size
df_day.plot(figsize=(10,5));
To be able to use the mdates tick locators and formatters and override the default tick formatting, the pandas dates must be correctly recognized by matplotlib. The problem is that pandas and matplotlib have different approaches to computing the date numbers that are used to locate the ticks on the time axis (the x-axis by default).
In pandas, time is measured in nanoseconds starting at zero on 1970-01-01 00:00:00 (the origin of the Unix epoch) and individual time points are stored as pandas timestamp objects. But when it comes to creating time scales for plots, pandas uses another numbering system which starts at the same origin but then increases by 1 for each period of the chosen frequency (in this example the frequency is in days).
Matplotlib uses the same default origin as pandas since version 3.3.0 released in July 2020 but the dates are always numbered in terms of days:
Matplotlib represents dates using floating point numbers specifying the number of days since a default epoch of 1970-01-01 UTC; for example, 1970-01-01, 06:00 is the floating point number 0.25.
You can check what numbers are being used for the scale by running ax.get_xticks(), with ax = df.plot() when using pandas.
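The two numbering systems can be compared directly. The short check below assumes Matplotlib 3.3 or later with the default epoch left unchanged; it uses pandas Period ordinals as a stand-in for the per-period numbering described above.

```python
import datetime as dt

import matplotlib.dates as mdates
import pandas as pd

# Matplotlib: days (as a float) since the epoch, whatever the data frequency
mpl_six_am = mdates.date2num(dt.datetime(1970, 1, 1, 6, 0))
mpl_march = mdates.date2num(dt.datetime(1970, 3, 1))

# pandas periods: number of periods of the given frequency since 1970-01-01
day_ordinal = pd.Period("1970-01-03", freq="D").ordinal
month_ordinal = pd.Period("1970-03", freq="M").ordinal

print("epoch:", mdates.get_epoch())
print("matplotlib 1970-01-01 06:00 ->", mpl_six_am)
print("matplotlib 1970-03-01       ->", mpl_march)
print("pandas daily ordinal 1970-01-03   ->", day_ordinal)
print("pandas monthly ordinal 1970-03    ->", month_ordinal)
```

For daily data the two scales agree, but for March 1970 the monthly ordinal counts months while Matplotlib counts days, which is why the mdates locators misplace ticks on a monthly pandas plot unless the dates are converted.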
As you might have guessed, this means no date conversion is needed when the time series has a frequency in days, as illustrated here with a simple custom tick locator and formatter:
ax = df_day.plot(figsize=(10,5))
# Create custom ticks using matplotlib date tick locator and formatter
loc = mdates.MonthLocator(interval=2)
ax.xaxis.set_major_locator(loc)
fmt = mdates.DateFormatter('%b\n%Y')
ax.xaxis.set_major_formatter(fmt)
This particular case makes it convenient for keeping other pandas default settings for the x-axis limits and minor x ticks. But this is an exception to the general rule.
To be able to use mdates tick locators and formatters with a pandas plot of a time series of any type of frequency, you need to use the (long-existing yet absent-from-the-docstring and barely-documented) x_compat=True argument. The following example illustrates its use with the same dataset resampled to a monthly frequency. It may often be the case that you just want to slightly tweak the default pandas format, so in the following example, the default format is recreated from scratch to show what methods can be used to adjust it:
# Resample time series to monthly frequency and plot it using date
# numbers that are compatible with mdates
df_month = df_day.resample('MS').sum()
ax = df_month.plot(figsize=(10,5), x_compat=True)
# Set major and minor date tick locators
maj_loc = mdates.MonthLocator(bymonth=np.arange(1,12,2))
ax.xaxis.set_major_locator(maj_loc)
min_loc = mdates.MonthLocator()
ax.xaxis.set_minor_locator(min_loc)
# Set major date tick formatter
zfmts = ['', '%b\n%Y', '%b', '%b-%d', '%H:%M', '%H:%M']
maj_fmt = mdates.ConciseDateFormatter(maj_loc, zero_formats=zfmts, show_offset=False)
ax.xaxis.set_major_formatter(maj_fmt)
ax.figure.autofmt_xdate(rotation=0, ha='center')
ax.set_xlim(df_month.index.min(), df_month.index.max());
Documentation: pd.date_range, date format codes, mdates.ConciseDateFormatter, fig.autofmt_xdate
I had a hard time getting @Serenity's answer to work because I'm plotting directly with Matplotlib instead of plotting from the pandas DataFrame. If you are in the same situation, my answer might help.
Plotting with Matplotlib.plot()
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Process dataset
bitcoin['Date'] = pd.to_datetime(bitcoin['Date'])
bitcoin['Open'] = pd.to_numeric(bitcoin['Open'])
# Plot
plt.figure()
plt.plot(bitcoin['Date'], bitcoin['Open'])
ax = plt.gca()
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=4))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
plt.gcf().autofmt_xdate() # Rotation
plt.show()
bitcoin[['Date', 'Open']].head()
Date Open
0 2017-09-05 4228.29
1 2017-09-04 4591.63
2 2017-09-03 4585.27
3 2017-09-02 4901.42
4 2017-09-01 4701.76
I'm making a visualization of historical stock data for a project, and I'd like to highlight regions of drops. For instance, when the stock is experiencing significant drawdown, I would like to highlight it with a red region.
Can I do this automatically, or will I have to draw a rectangle or something?
Have a look at axvspan (and axhspan for highlighting a region of the y-axis).
import matplotlib.pyplot as plt
plt.plot(range(10))
plt.axvspan(3, 6, color='red', alpha=0.5)
plt.show()
If you're using dates, then you'll need to convert your min and max x values to matplotlib dates. Use matplotlib.dates.date2num for datetime objects or matplotlib.dates.datestr2num for various string timestamps.
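import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
# t holds Matplotlib date numbers (floats), one every two hours
t = mdates.drange(dt.datetime(2011, 10, 15), dt.datetime(2011, 11, 27),
                  dt.timedelta(hours=2))
y = np.sin(t)
fig, ax = plt.subplots()
# ax.plot followed by ax.xaxis_date() stands in for plot_date,
# which is deprecated in recent Matplotlib releases
ax.plot(t, y, 'b-')
ax.xaxis_date()
ax.axvspan(*mdates.datestr2num(['10/27/2011', '11/2/2011']), color='red', alpha=0.5)
fig.autofmt_xdate()
plt.show()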
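To find the regions automatically rather than typing in the limits, you can compute them from the prices and call axvspan once per region. The sketch below assumes one simple definition of a drawdown (price at least 10% below its running maximum); `drawdown_spans` is a hypothetical helper, the threshold is arbitrary, and the prices are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def drawdown_spans(prices, threshold=0.10):
    """Hypothetical helper: (start, end) index pairs where the price sits at
    least `threshold` below its running maximum."""
    prices = np.asarray(prices, dtype=float)
    running_max = np.maximum.accumulate(prices)
    in_drawdown = (running_max - prices) / running_max >= threshold
    # Pad with False so that every run has a detectable start and end
    padded = np.concatenate(([False], in_drawdown, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = changes[::2], changes[1::2] - 1
    return list(zip(starts.tolist(), ends.tolist()))


# Assumption: a tiny illustrative price series
prices = np.array([100, 95, 85, 80, 101, 102, 90, 103], dtype=float)
spans = drawdown_spans(prices)

fig, ax = plt.subplots()
ax.plot(prices)
for start, end in spans:
    # Widen by half a step on each side so one-point drawdowns stay visible
    ax.axvspan(start - 0.5, end + 0.5, color="red", alpha=0.5)
```

With a date axis, the same loop works after mapping the start and end positions to the corresponding dates.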
Here is a solution that uses axvspan to draw multiple highlights where the limits of each highlight are set by using the indices of the stock data corresponding to the peaks and troughs.
Stock data usually contain a discontinuous time variable where weekends and holidays are not included. Plotting them in matplotlib or pandas will produce gaps along the x-axis for weekends and holidays when dealing with daily stock prices. This may not be noticeable with long date ranges and/or small figures (like in this example), but it will become apparent if you zoom in and it may be something that you want to avoid.
This is why I share here a complete example that features:
A realistic sample dataset that includes a discontinuous DatetimeIndex based on the New York Stock Exchange trading calendar imported with the pandas_market_calendars as well as fake stock data that looks like the real thing.
A pandas plot created with use_index=False which removes the gaps for weekends and holidays by using instead a range of integers for the x-axis. The returned ax object is used in a way that avoids the need to import matplotlib.pyplot (unless you need plt.show).
An automatic detection of drawdowns over the entire date range by using the scipy.signal find_peaks function which returns the indices needed to plot the highlights with axvspan. Computing drawdowns in a more correct way would require a clear definition of what would count as a drawdown and would lead to more complicated code which is a topic for another question.
Properly formatted ticks created by looping through the timestamps of the DatetimeIndex seeing as all the convenient matplotlib.dates tick locators and formatters as well as DatetimeIndex properties like .is_month_start cannot be used in this case.
Create sample dataset
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import pandas_market_calendars as mcal # v 1.6.1
from scipy.signal import find_peaks # v 1.5.2
# Create datetime index with a 'trading day end' frequency based on the New York Stock
# Exchange trading hours (end date is inclusive)
nyse = mcal.get_calendar('NYSE')
nyse_schedule = nyse.schedule(start_date='2019-10-01', end_date='2021-02-01')
nyse_dti = mcal.date_range(nyse_schedule, frequency='1D').tz_convert(nyse.tz.zone)
# Create sample of random data for daily stock closing price
rng = np.random.default_rng(seed=1234) # random number generator
price = 100 + rng.normal(size=nyse_dti.size).cumsum()
df = pd.DataFrame(data=dict(price=price), index=nyse_dti)
df.head()
# price
# 2019-10-01 16:00:00-04:00 98.396163
# 2019-10-02 16:00:00-04:00 98.460263
# 2019-10-03 16:00:00-04:00 99.201154
# 2019-10-04 16:00:00-04:00 99.353774
# 2019-10-07 16:00:00-04:00 100.217517
Plot highlights for drawdowns with properly formatted ticks
# Plot stock price
ax = df['price'].plot(figsize=(10, 5), use_index=False, ylabel='Price')
ax.set_xlim(0, df.index.size-1)
ax.grid(axis='x', alpha=0.3)
# Highlight drawdowns using the indices of stock peaks and troughs: find peaks and
# troughs based on signal analysis rather than an algorithm for drawdowns to keep
# example simple. Width and prominence have been handpicked for this example to work.
peaks, _ = find_peaks(df['price'], width=7, prominence=4)
troughs, _ = find_peaks(-df['price'], width=7, prominence=4)
for peak, trough in zip(peaks, troughs):
    ax.axvspan(peak, trough, facecolor='red', alpha=.2)
# Create and format monthly ticks
ticks = [idx for idx, timestamp in enumerate(df.index)
         if (timestamp.month != df.index[idx-1].month) | (idx == 0)]
ax.set_xticks(ticks)
labels = [tick.strftime('%b\n%Y') if df.index[ticks[idx]].year
          != df.index[ticks[idx-1]].year else tick.strftime('%b')
          for idx, tick in enumerate(df.index[ticks])]
ax.set_xticklabels(labels)
ax.figure.autofmt_xdate(rotation=0, ha='center')
ax.set_title('Drawdowns are highlighted in red', pad=15, size=14);
For the sake of completeness, it is worth noting that you can achieve exactly the same result using the fill_between plotting function, though it takes a few more lines of code:
ax.set_ylim(*ax.get_ylim()) # remove top and bottom gaps with plot frame
drawdowns = np.repeat(False, df['price'].size)
for peak, trough in zip(peaks, troughs):
    drawdowns[np.arange(peak, trough+1)] = True
ax.fill_between(np.arange(df.index.size), *ax.get_ylim(), where=drawdowns,
                facecolor='red', alpha=.2)
You are using matplotlib's interactive interface and want to have dynamic ticks when you zoom in? Then you will need to use locators and formatters from the matplotlib.ticker module. You could for example keep the major ticks fixed like in this example and add dynamic minor ticks to show days or weeks of the year when zooming in. You can find an example of how to do this at the end of this answer.
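As a rough sketch of the matplotlib.ticker approach on an integer x-axis: a FuncFormatter can translate each integer position back into its date, and a MaxNLocator recomputes the tick positions whenever the view changes. For brevity this applies the idea to the major ticks instead of adding minor ticks; the business-day index, the prices and the helper name `position_to_date` are all made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter, MaxNLocator

# Assumption: business days as a simple stand-in for a trading calendar
index = pd.bdate_range("2021-01-04", periods=250)
price = 100 + np.random.default_rng(0).normal(size=index.size).cumsum()


def position_to_date(x, pos=None):
    """Hypothetical helper: label an integer x position with its date."""
    i = int(round(x))
    if 0 <= i < len(index):
        return index[i].strftime("%d %b\n%Y")
    return ""  # positions outside the data get no label


fig, ax = plt.subplots()
ax.plot(np.arange(index.size), price)

# Integer tick positions, recomputed by Matplotlib on every zoom or pan
ax.xaxis.set_major_locator(MaxNLocator(nbins=8, integer=True))
ax.xaxis.set_major_formatter(FuncFormatter(position_to_date))
```

Because the locator only ever returns integer positions, every visible tick maps to a real trading day.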