I'm making a visualization of historical stock data for a project, and I'd like to highlight regions of drops. For instance, when the stock is experiencing significant drawdown, I would like to highlight it with a red region.
Can I do this automatically, or will I have to draw a rectangle or something?
Have a look at axvspan (and axhspan for highlighting a region of the y-axis).
import matplotlib.pyplot as plt
plt.plot(range(10))
plt.axvspan(3, 6, color='red', alpha=0.5)
plt.show()
If you're using dates, then you'll need to convert your min and max x values to matplotlib dates. Use matplotlib.dates.date2num for datetime objects or matplotlib.dates.datestr2num for various string timestamps.
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Evenly spaced matplotlib date numbers, one every 2 hours
t = mdates.drange(dt.datetime(2011, 10, 15), dt.datetime(2011, 11, 27),
                  dt.timedelta(hours=2))
y = np.sin(t)
fig, ax = plt.subplots()
ax.plot(t, y, 'b-')
ax.xaxis_date()  # interpret the x values as dates (plot_date is deprecated in recent Matplotlib)
ax.axvspan(*mdates.datestr2num(['10/27/2011', '11/2/2011']), color='red', alpha=0.5)
fig.autofmt_xdate()
plt.show()
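For datetime objects rather than strings, the date2num route mentioned above might look like the following minimal sketch. The sample dates and y values are made up for illustration, and the Agg backend is selected only so that the sketch runs without a display.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Illustrative data: one point per day (made up for this sketch)
days = [dt.datetime(2011, 10, 15) + dt.timedelta(days=i) for i in range(43)]
values = list(range(43))

# Convert the span limits from datetime objects to matplotlib date numbers
xmin, xmax = mdates.date2num([dt.datetime(2011, 10, 27),
                              dt.datetime(2011, 11, 2)])

fig, ax = plt.subplots()
ax.plot(days, values, 'b-')
ax.axvspan(xmin, xmax, color='red', alpha=0.5)
fig.autofmt_xdate()
```

Recent Matplotlib versions generally also accept datetime objects directly as the axvspan limits on a date axis, so the explicit conversion is mostly useful when you need the numeric values themselves.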
Here is a solution that uses axvspan to draw multiple highlights where the limits of each highlight are set by using the indices of the stock data corresponding to the peaks and troughs.
Stock data usually contain a discontinuous time variable where weekends and holidays are not included. Plotting them in matplotlib or pandas will produce gaps along the x-axis for weekends and holidays when dealing with daily stock prices. This may not be noticeable with long date ranges and/or small figures (like in this example), but it will become apparent if you zoom in and it may be something that you want to avoid.
This is why I share here a complete example that features:
A realistic sample dataset that includes a discontinuous DatetimeIndex based on the New York Stock Exchange trading calendar imported with the pandas_market_calendars as well as fake stock data that looks like the real thing.
A pandas plot created with use_index=False which removes the gaps for weekends and holidays by using instead a range of integers for the x-axis. The returned ax object is used in a way that avoids the need to import matplotlib.pyplot (unless you need plt.show).
An automatic detection of drawdowns over the entire date range by using the scipy.signal find_peaks function which returns the indices needed to plot the highlights with axvspan. Computing drawdowns in a more correct way would require a clear definition of what would count as a drawdown and would lead to more complicated code which is a topic for another question.
Properly formatted ticks created by looping through the timestamps of the DatetimeIndex seeing as all the convenient matplotlib.dates tick locators and formatters as well as DatetimeIndex properties like .is_month_start cannot be used in this case.
Create sample dataset
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import pandas_market_calendars as mcal # v 1.6.1
from scipy.signal import find_peaks # v 1.5.2
# Create datetime index with a 'trading day end' frequency based on the New York Stock
# Exchange trading hours (end date is inclusive)
nyse = mcal.get_calendar('NYSE')
nyse_schedule = nyse.schedule(start_date='2019-10-01', end_date='2021-02-01')
nyse_dti = mcal.date_range(nyse_schedule, frequency='1D').tz_convert(nyse.tz.zone)
# Create sample of random data for daily stock closing price
rng = np.random.default_rng(seed=1234) # random number generator
price = 100 + rng.normal(size=nyse_dti.size).cumsum()
df = pd.DataFrame(data=dict(price=price), index=nyse_dti)
df.head()
# price
# 2019-10-01 16:00:00-04:00 98.396163
# 2019-10-02 16:00:00-04:00 98.460263
# 2019-10-03 16:00:00-04:00 99.201154
# 2019-10-04 16:00:00-04:00 99.353774
# 2019-10-07 16:00:00-04:00 100.217517
Plot highlights for drawdowns with properly formatted ticks
# Plot stock price
ax = df['price'].plot(figsize=(10, 5), use_index=False, ylabel='Price')
ax.set_xlim(0, df.index.size-1)
ax.grid(axis='x', alpha=0.3)
# Highlight drawdowns using the indices of stock peaks and troughs: find peaks and
# troughs based on signal analysis rather than an algorithm for drawdowns to keep
# example simple. Width and prominence have been handpicked for this example to work.
peaks, _ = find_peaks(df['price'], width=7, prominence=4)
troughs, _ = find_peaks(-df['price'], width=7, prominence=4)
for peak, trough in zip(peaks, troughs):
    ax.axvspan(peak, trough, facecolor='red', alpha=.2)
# Create and format monthly ticks
ticks = [idx for idx, timestamp in enumerate(df.index)
         if (timestamp.month != df.index[idx-1].month) | (idx == 0)]
ax.set_xticks(ticks)
labels = [tick.strftime('%b\n%Y') if df.index[ticks[idx]].year
          != df.index[ticks[idx-1]].year else tick.strftime('%b')
          for idx, tick in enumerate(df.index[ticks])]
ax.set_xticklabels(labels)
ax.figure.autofmt_xdate(rotation=0, ha='center')
ax.set_title('Drawdowns are highlighted in red', pad=15, size=14);
For the sake of completeness, it is worth noting that you can achieve exactly the same result using the fill_between plotting function, though it takes a few more lines of code:
ax.set_ylim(*ax.get_ylim()) # remove top and bottom gaps with plot frame
drawdowns = np.repeat(False, df['price'].size)
for peak, trough in zip(peaks, troughs):
    drawdowns[np.arange(peak, trough+1)] = True
ax.fill_between(np.arange(df.index.size), *ax.get_ylim(), where=drawdowns,
                facecolor='red', alpha=.2)
You are using matplotlib's interactive interface and want to have dynamic ticks when you zoom in? Then you will need to use locators and formatters from the matplotlib.ticker module. You could for example keep the major ticks fixed like in this example and add dynamic minor ticks to show days or weeks of the year when zooming in. You can find an example of how to do this at the end of this answer.
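Coming back to the remark that a proper drawdown computation needs a clear definition: one common choice, which is an assumption here and not what the find_peaks example does, is to count a point as being in drawdown when the price sits at least some fraction below its running maximum. The function name drawdown_spans and the 5% threshold below are hypothetical, chosen only for this sketch.

```python
import numpy as np


def drawdown_spans(price, threshold=0.05):
    """Return (start, end) integer positions of each run where the price is
    at least `threshold` (as a fraction) below its running maximum."""
    values = np.asarray(price, dtype=float)
    running_max = np.maximum.accumulate(values)
    in_drawdown = (running_max - values) / running_max >= threshold
    # Pad with False so that runs touching either end still have both edges
    padded = np.concatenate(([False], in_drawdown, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[::2], edges[1::2] - 1
    return list(zip(starts.tolist(), ends.tolist()))


spans = drawdown_spans([100, 102, 95, 96, 103, 104, 90, 104])
```

Because the returned positions are plain integers, each pair could be passed to ax.axvspan(start, end, facecolor='red', alpha=.2) on a plot created with use_index=False, in the same way as the peak and trough indices above.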
Related
I want to plot 5 years of daily rainfall data as a bar chart. When the width of the bars is 1, they become lines without any visible width, and when I change the width, the bars overlap each other, as in the image below. I want discrete bars with a good-looking width. This is my code.
import pandas as pd
from datetime import datetime, timedelta
from matplotlib import pyplot as plt
data=pd.read_excel('final.xlsx')
data['Date']=pd.to_datetime(data['Date'])
date = data['Date']
amount = data['Amount']
plt.bar (date, amount, color='gold', edgecolor='blue', align='center', width=5)
plt.ylabel('rainfall amount (mm)')
plt.show()
Just to note, you can also pass a Timedelta to the width parameter; I find this helpful to be explicit about how many units in x (e.g. days here) the bars will take up. Additionally for some time series the int widths are less intuitive:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#fake data with minute frequency for an hour
dr = pd.date_range('01-01-2016 9:00:00', '01-01-2016 10:00:00', freq='1min')  # the 'T' alias is deprecated in recent pandas
df = pd.DataFrame(np.random.rand(len(dr)), index=dr)
#graph 1, using int width
plt.figure(figsize=(10,2))
plt.bar (df.index, df[0], color='gold', edgecolor='blue', align='center',
width=1)
#graph 2, using Timedelta width
plt.figure(figsize=(10,2))
plt.bar (df.index, df[0], color='gold', edgecolor='blue', align='center',
width=pd.Timedelta(minutes=1))
Graph 1:
Graph 2:
This was what came to mind when I saw your issue, but I think the real problem is the number of data points (as @JohanC pointed out). Even when you plot 365 days, you can barely see the yellow anymore (and by 3 or 4 years it's definitely gone):
You can also see in the above that different bars get rendered with different apparent widths, but that is just because there are too few pixels in the space provided to accurately show the bar fill and bar widths the same for each point.
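A back-of-the-envelope calculation makes the pixel argument concrete. The sketch below is only an estimate: it assumes Matplotlib's default figure dpi of 100 and default subplot margins (the axes span about 77.5% of the figure width), and it ignores the small x-axis padding. The helper name pixels_per_bar is made up for illustration.

```python
def pixels_per_bar(n_bars, fig_width_in=10.0, dpi=100.0, axes_fraction=0.775):
    """Rough number of horizontal pixels available to each bar slot."""
    return fig_width_in * dpi * axes_fraction / n_bars


# Compare two months, one year and five years of daily bars
for n_days in (60, 365, 5 * 365):
    print(f"{n_days:>5} daily bars: about {pixels_per_bar(n_days):.2f} px each")
```

Once the estimate falls below roughly two pixels per bar, the edge colour takes up all the available space and the fill can no longer be drawn consistently, whatever width is passed to plt.bar.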
I have a csv with the following columns: recorded, humidity and temperature. I want to display the recorded values (date and time) on the x-axis and the humidity on the y-axis. How can I properly display the dates (it is quite a big csv)? My current plot shows a black band instead of readable date labels. My date format looks like this: 2019-09-12T07:26:55, so both the date and the time are in the csv.
I have displayed the plot using this code:
from matplotlib import pyplot as plt
import pandas as pd
data = pd.read_csv('home_data.csv')
plt.plot(data.recorded, data.humidity)
plt.xlabel('date')
plt.ylabel('humidity')
plt.title('Visualizing date and humidity')
plt.show()
This is a print screen of the plot:
https://snipboard.io/d4hfS7.jpg
Actually, the plot is displaying every date in your dataset. There are so many of them that they look like a black blob. You can downsample the xticks to improve readability. Do something like this:
fig, ax = plt.subplots()
ax.plot(data.recorded, data.humidity)
# some axes labelling
# Reduce now the number of the ticks printed in the figure
ax.set_xticks(ax.get_xticks()[::4])
ax.tick_params(axis='x', labelrotation=45)
The line ax.set_xticks(ax.get_xticks()[::4]) sets the x-axis ticks by slicing the current tick positions, keeping one date out of every four. This reduces the number of dates printed; increase the step to thin them out further.
To improve readability further, you can rotate the tick labels, as in the line
ax.tick_params(axis='x', labelrotation=45).
Hope this helps.
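An alternative that is not part of the answer above: if the recorded column is parsed as datetimes instead of being left as strings, Matplotlib can pick a sparse set of date ticks by itself. The sketch below is a minimal illustration with a few made-up rows standing in for home_data.csv; only the column names come from the question.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Made-up sample standing in for the real csv file
csv_text = """recorded,humidity,temperature
2019-09-12T07:26:55,41.0,21.5
2019-09-12T19:26:55,44.5,20.1
2019-09-13T07:26:55,47.2,19.8
2019-09-13T19:26:55,43.9,20.7
"""

# parse_dates turns the timestamp strings into datetime values
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["recorded"])

fig, ax = plt.subplots()
ax.plot(data["recorded"], data["humidity"])

# Let matplotlib choose tick positions and compact labels for the date axis
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
ax.set_xlabel("date")
ax.set_ylabel("humidity")
```

With a real file, the same idea would be pd.read_csv('home_data.csv', parse_dates=['recorded']), assuming every value in that column follows the ISO-like format shown in the question.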
The plot below shows simulated data with the xticks that I want to modify. By default, DataFrame.plot chooses dates approximately 3 months apart as ticks, but I want a tick for each month. What is the best way to do this? What about seasonal ticks? Thank you in advance.
First of all you have to convert pandas date objects to python date objects. This conversion is needed because of matplotlib internal date conversion functions. Then use functions from matplotlib.dates to set desired formatter and tick positions like here:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# convert date objects from pandas format to python datetime
index = pd.date_range(start = "2015-07-01", end = "2017-01-01", freq = "D")
index = [pd.to_datetime(date, format='%Y-%m-%d').date() for date in index]
data = np.random.randint(1,100, size=len(index))
df = pd.DataFrame(data=data,index=index, columns=['data'])
print (df.head())
ax = df.plot()
# set monthly locator
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
# set formatter
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
# set font and rotation for date tick labels
plt.gcf().autofmt_xdate()
plt.show()
For season labels, you have to construct them yourself and then set them with the plt.setp function (for month 02 set the label winter, for 04 spring, etc.):
plt.setp(new_labels, rotation=90, fontsize=9).
head of df:
data
2015-07-01 26
2015-07-02 33
2015-07-03 46
2015-07-04 69
2015-07-05 17
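The season labels mentioned above are left for the reader to construct. One possible way, sketched here under assumptions of my own, is to place a tick at the first month of each meteorological season (northern hemisphere) and label it with a custom formatter. The season mapping, the season_label helper and the random data are all illustrative, and the convention differs slightly from the "month 02 is winter" labelling suggested in the answer.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
from matplotlib.ticker import FuncFormatter

# Assumed convention: label the month in which each season starts
SEASON_BY_START_MONTH = {3: "Spring", 6: "Summer", 9: "Autumn", 12: "Winter"}


def season_label(x, pos=None):
    """Turn a matplotlib date number into a 'Season / year' label."""
    date = mdates.num2date(x)
    name = SEASON_BY_START_MONTH.get(date.month, "")
    return f"{name}\n{date.year}" if name else ""


# Plain python dates, as in the answer above (values are random)
days = [dt.date(2015, 7, 1) + dt.timedelta(days=i) for i in range(550)]
values = np.random.default_rng(0).integers(1, 100, size=len(days))

fig, ax = plt.subplots()
ax.plot(days, values)
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=[3, 6, 9, 12]))
ax.xaxis.set_major_formatter(FuncFormatter(season_label))
```

Plotting with ax.plot rather than df.plot keeps the x values as ordinary matplotlib date numbers, which is what the locator and num2date expect.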
This answer is based on the one by Serenity as well as on this one by ImportanceOfBeingErnest.
The best way to customize time series tick labels is to use the tick locators and formatters from the matplotlib.dates module (mdates). Though it is worth noting that if you want a tick frequency based on the same unit as the time series you are plotting, it may be more convenient to create and format the tick labels using the dates as strings like in the answers to this question concerning pandas bar plots.
As described in the documentation, pandas uses matplotlib to create plots with its own custom tick formatters for time series:
pandas provides custom formatters for timeseries plots. These change the formatting of the axis labels for dates and times. By default, the custom formatters are applied only to plots created by pandas with DataFrame.plot() or Series.plot().
The ticks and labels of pandas time series plots are currently formatted like this by default:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.dates as mdates # v 3.3.2
# Create random dataset stored as a pandas DataFrame with a DatetimeIndex
rng = np.random.default_rng(seed=1) # random number generator
date_day = pd.date_range(start='2015-07-01', end='2016-12-31', freq='D')
traffic = rng.lognormal(sigma=2, size=date_day.size)
df_day = pd.DataFrame(dict(traffic=traffic), index=date_day)
# Create pandas plot with default settings except for figure size
df_day.plot(figsize=(10,5));
To be able to use the mdates tick locators and formatters and override the default tick formatting, the pandas dates must be correctly recognized by matplotlib. The problem is that pandas and matplotlib have different approaches to computing the date numbers that are used to locate the ticks on the time axis (the x-axis by default).
In pandas, time is measured in nanoseconds starting at zero on 1970-01-01 00:00:00 (the origin of the Unix epoch) and individual time points are stored as pandas timestamp objects. But when it comes to creating time scales for plots, pandas uses another numbering system which starts at the same origin but then increases by 1 for each period of the chosen frequency (in this example the frequency is in days).
Matplotlib uses the same default origin as pandas since version 3.3.0 released in July 2020 but the dates are always numbered in terms of days:
Matplotlib represents dates using floating point numbers specifying the number of days since a default epoch of 1970-01-01 UTC; for example, 1970-01-01, 06:00 is the floating point number 0.25.
You can check what numbers are being used for the scale by running ax.get_xticks(), with ax = df.plot() when using pandas.
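The two numbering systems can also be compared without drawing anything. This small sketch assumes Matplotlib 3.3 or later (default epoch of 1970-01-01) and uses pandas Period ordinals as a stand-in for the period-based numbering described above.

```python
import datetime as dt

import matplotlib.dates as mdates
import pandas as pd

# Matplotlib: floating point days since the epoch
mpl_days = mdates.date2num(dt.datetime(2015, 7, 1))
quarter_day = mdates.date2num(dt.datetime(1970, 1, 1, 6))

# pandas: one unit per period of the chosen frequency since the epoch
pd_daily = pd.Period("2015-07-01", freq="D").ordinal
pd_monthly = pd.Period("2015-07", freq="M").ordinal

print("matplotlib days:", mpl_days)
print("pandas daily ordinal:", pd_daily)
print("pandas monthly ordinal:", pd_monthly)
print("06:00 on the epoch day:", quarter_day)
```

For a daily frequency the two numbers coincide, while the monthly ordinal counts months rather than days, so a matplotlib date locator would misread it.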
As you might have guessed, this means no date conversion is needed when the time series has a frequency in days, as illustrated here with a simple custom tick locator and formatter:
ax = df_day.plot(figsize=(10,5))
# Create custom ticks using matplotlib date tick locator and formatter
loc = mdates.MonthLocator(interval=2)
ax.xaxis.set_major_locator(loc)
fmt = mdates.DateFormatter('%b\n%Y')
ax.xaxis.set_major_formatter(fmt)
This particular case makes it convenient for keeping other pandas default settings for the x-axis limits and minor x ticks. But this is an exception to the general rule.
To be able to use mdates tick locators and formatters with a pandas plot of a time series of any type of frequency, you need to use the (long-existing yet absent-from-the-docstring and barely-documented) x_compat=True argument. The following example illustrates its use with the same dataset resampled to a monthly frequency. It may often be the case that you just want to slightly tweak the default pandas format, so in the following example, the default format is recreated from scratch to show what methods can be used to adjust it:
# Resample time series to monthly frequency and plot it using date
# numbers that are compatible with mdates
df_month = df_day.resample('MS').sum()
ax = df_month.plot(figsize=(10,5), x_compat=True)
# Set major and minor date tick locators
maj_loc = mdates.MonthLocator(bymonth=np.arange(1,12,2))
ax.xaxis.set_major_locator(maj_loc)
min_loc = mdates.MonthLocator()
ax.xaxis.set_minor_locator(min_loc)
# Set major date tick formatter
zfmts = ['', '%b\n%Y', '%b', '%b-%d', '%H:%M', '%H:%M']
maj_fmt = mdates.ConciseDateFormatter(maj_loc, zero_formats=zfmts, show_offset=False)
ax.xaxis.set_major_formatter(maj_fmt)
ax.figure.autofmt_xdate(rotation=0, ha='center')
ax.set_xlim(df_month.index.min(), df_month.index.max());
Documentation: pd.date_range, date format codes, mdates.ConciseDateFormatter, fig.autofmt_xdate
I had a hard time trying to get @Serenity's answer to work because I'm working directly with Matplotlib instead of plotting the pandas dataset. So if you are in the same situation, my answer might help.
Plotting with Matplotlib.plot()
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Process dataset
bitcoin['Date'] = pd.to_datetime(bitcoin['Date'])
bitcoin['Open'] = pd.to_numeric(bitcoin['Open'])
# Plot
plt.figure()
plt.plot(bitcoin['Date'], bitcoin['Open'])
ax = plt.gca()
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=4))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
plt.gcf().autofmt_xdate() # Rotation
plt.show()
bitcoin[['Date', 'Open']].head()
Date Open
0 2017-09-05 4228.29
1 2017-09-04 4591.63
2 2017-09-03 4585.27
3 2017-09-02 4901.42
4 2017-09-01 4701.76
I want to plot electricity consumption recorded every 30 minutes over 2 months.
My code works; my problem is the x-axis labels. I don't want to have range(1, 2, ..., 48*58);
instead I want something like this: the points between 1 and 48*30 labelled January, the next 48*28 labelled February, etc.
plt.xticks(rotation=70)
mask3 = (train['date'] >= '2008-01-01') & (train['date'] <= '2008-02-27')
week = train.loc[mask3]
plt.plot(range(48*58),week.LoadNette)
plt.ylabel("Electricity consumption")
plt.xlabel("Month")
plt.title('Electricity consumption / week')
plt.show()
By searching for «python matplotlib use dates as xlabel» in a search engine, you can find an example of what you want in the Matplotlib documentation: https://matplotlib.org/examples/api/date_demo.html.
This example assumes your x data are dates, however, which is not the case right now. You would need to create a list of dates and use that instead of your range(48*58) list, like this:
import pandas
xdata = pandas.date_range(
    pandas.to_datetime("2008-01-01"),
    pandas.to_datetime("2008-02-27 23:30:00"),
    freq=pandas.to_timedelta(30, unit="m")).tolist()
This creates a list of datetimes from your start time to your end time at a frequency of 30 minutes.
After that, you'll need to use the example in the link above. Here it is reproduced and tweaked a bit to your needs, but you'll need to play around with it to set it properly. You can find many more examples of using dates in matplotlib now that you'll be using a date list as input for your plot.
import datetime
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.cbook as cbook
# define locators for every month and every day
months = mdates.MonthLocator() # every month
days = mdates.DayLocator() # every day
monthsFmt = mdates.DateFormatter('%m')
# create the plot and plot your data
fig, ax = plt.subplots()
ax.plot(xdata, week.LoadNette)
# format the x ticks to have a major tick every month and a minor every day
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(monthsFmt)
ax.xaxis.set_minor_locator(days)
# show only the month in the x-coordinate readout of the interactive cursor
ax.format_xdata = mdates.DateFormatter('%m')
# rotates and right aligns the x labels, and moves the bottom of the
# axes up to make room for them
fig.autofmt_xdate()
plt.show()
Using dates in Matplotlib can be intimidating, but it's better in the long run than just hacking the labels you want this specific time.
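Since the question asks for month names rather than month numbers, here is a self-contained variant of the approach above that can be run without the original data. The random walk standing in for week.LoadNette is made up, and the '%B' format (full month name, which depends on the locale) and the weekly minor ticks are choices made for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# One timestamp every 30 minutes over the period used in the question
xdata = pd.date_range("2008-01-01", "2008-02-27 23:30:00", freq="30min")

# Made-up values standing in for week.LoadNette
load = np.random.default_rng(0).normal(size=len(xdata)).cumsum()

fig, ax = plt.subplots()
ax.plot(xdata, load)

# Major tick at the start of each month, labelled with the month name
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%B'))
# Minor tick every 7 days to keep the axis readable
ax.xaxis.set_minor_locator(mdates.DayLocator(interval=7))

ax.set_ylabel("Electricity consumption")
ax.set_xlabel("Month")
```

The date range contains 48*58 timestamps, matching the length of the range(48*58) list used in the question, so the real consumption values can be substituted for the random walk directly.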
I want to change the x-axis to years. The years are saved in the variable years.
I want to make a plot of my data that looks like this:
[image: the desired plot]
However, I am not able to create an x-axis with years. My plot looks like this:
[image: the plot produced by my code]
My code looks as follows:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("data1.csv")
demand = data["demand"]
years = data["year"]
plt.plot( demand, color='black')
plt.xlabel("Year")
plt.ylabel("Demand (GW)")
plt.show()
I am thankful for any advice.
The plot method in your example does not know the scaling of your data. So, for simplicity it treats the values of demand as being one unit apart from each other. If you want your x-axis to represent years, you have to tell matplotlib how many values of demand it should treat as "one year". If your data is a monthly demand, it is obviously 12 values per year. And here we go:
import numpy as np
import matplotlib.pyplot as plt
# setup a figure
fig, (ax1, ax2) = plt.subplots(2)
# generate some random data
data = np.random.rand(100)
# plot undesired way
ax1.plot(data)
# change the tick positions and labels ...
ax2.plot(data)
# ... to one label every 12th value
xticks = np.arange(0,100,12)
# ... start counting in the year 2000
xlabels = range(2000, 2000+len(xticks))
ax2.set_xticks(xticks)
ax2.set_xticklabels(xlabels)
plt.show()
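If the year column of the csv holds a year for every row, the tick positions do not have to be hard-coded to every 12th value. The sketch below derives them from the first row of each year. It uses a made-up DataFrame in place of data1.csv and assumes the default integer row index that read_csv produces.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for data1.csv: 5 years of monthly demand values
data = pd.DataFrame({
    "year": np.repeat(np.arange(2000, 2005), 12),
    "demand": np.random.default_rng(0).random(60),
})

# Keep only the first row of each year; its index is the tick position
first_rows = data["year"].drop_duplicates()
tick_positions = first_rows.index.to_list()
tick_labels = [str(year) for year in first_rows]

fig, ax = plt.subplots()
ax.plot(data["demand"].to_numpy(), color="black")
ax.set_xticks(tick_positions)
ax.set_xticklabels(tick_labels)
ax.set_xlabel("Year")
ax.set_ylabel("Demand (GW)")
```

This keeps working when years have different numbers of rows, which the fixed step of 12 would not handle.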