I am new to Python and visualization and am trying to get my plot to only display the xticks in the middle of each year (see my plot below). I have tried a couple of things with date_range but now my plot is displaying two xticks for each year, one for the beginning of the year and one for the middle of the year. How can I get rid of the xticks that are at the beginning of each year and only keep the ones at the middle of each year?
Here's my code and plot:
texasdrought['ValidStart']=pd.to_datetime(texasdrought['ValidStart'])
droughtMask = texasdrought[texasdrought['ValidStart'].dt.year.between(2005,2015)]
# Set the figure size
plt.figure(figsize = (30,16))
# Create a mask with the dates
dates = droughtMask["ValidStart"]
# Categorize droughts
droughtcat = {
'D4 - Exceptional Drought': droughtMask["D4"],
'D3 - Extreme Drought': droughtMask["D3"],
'D2 - Severe Drought': droughtMask["D2"],
'D1 - Moderate Drought': droughtMask["D1"],
'D0 - Abnormally Dry': droughtMask["D0"]
}
fig, ax = plt.subplots()
ax.stackplot(dates, droughtcat.values(), labels=droughtcat.keys(),colors=['#660000','#FF0000','#FF6600','#FFCC99','#FFFF00'])
# Format x-axis tick labels as two-digit years
yearsFmt = mdates.DateFormatter("'%y")
ax.xaxis.set_major_formatter(yearsFmt)
#ax.yaxis.set_major_formatter(mtick.PercentFormatter())
# Add legend location
ax.legend(loc='upper left')
# Add title to the stackplot
ax.set_title('Drought in Texas (2005-2015)')
ticks = pd.date_range('2005-01-01', '2015-12-31', freq='6M')
plt.xticks(ticks)
# Add axis labels
ax.set_xlabel('Year')
ax.set_ylabel('Drought Intensity')
# Save figure as DroughtPlot.jpg
fig.savefig('DroughtPlot.jpg')
plt.show()
Thank you.
Use this instead:
ticks = pd.date_range('2005-01-01', '2015-12-31', freq='A-JUN')
This generates one date per year, each on the last day of June, i.e. the middle of the year (pandas 2.2 and later spell this alias 'YE-JUN'):
DatetimeIndex(['2005-06-30', '2006-06-30', '2007-06-30', '2008-06-30',
'2009-06-30', '2010-06-30', '2011-06-30', '2012-06-30',
'2013-06-30', '2014-06-30', '2015-06-30'],
dtype='datetime64[ns]', freq='A-JUN')
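A minimal sketch of the same idea that does not depend on the frequency alias, since that alias differs between pandas versions. It assumes a tick on 1 July is an acceptable "middle of the year"; the choice of 1 July is an assumption for illustration, not part of the answer above.

```python
import pandas as pd

# One tick per year on 1 July; DateOffset(years=1) steps a whole calendar year
ticks = pd.date_range("2005-07-01", "2015-07-01", freq=pd.DateOffset(years=1))

print(ticks)
```

The resulting index can be passed to plt.xticks(ticks), as in the question's code, or to ax.set_xticks(ticks).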
Data - we import historical yields of the ten and thirty year Treasury and calculate the spread (difference) between the two (this block of code is good; feel free to skip):
#Import statements
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
#Constants
start_date = "2018-01-01"
end_date = "2023-01-01"
#Pull in data
tenYear_master = yf.download('^TNX', start_date, end_date)
thirtyYear_master = yf.download('^TYX', start_date, end_date)
#Trim DataFrames to only include 'Adj Close' columns
tenYear = tenYear_master['Adj Close'].to_frame()
thirtyYear = thirtyYear_master['Adj Close'].to_frame()
#Rename columns
tenYear.rename(columns = {'Adj Close' : 'Adj Close - Ten Year'}, inplace= True)
thirtyYear.rename(columns = {'Adj Close' : 'Adj Close - Thirty Year'}, inplace= True)
#Join DataFrames
data = tenYear.join(thirtyYear)
#Add column for difference (spread)
data['Spread'] = data['Adj Close - Thirty Year'] - data['Adj Close - Ten Year']
data
This block is also good.
'''Plot data'''
#Delete top, left, and right borders from figure
plt.rcParams['axes.spines.top'] = False
plt.rcParams['axes.spines.left'] = False
plt.rcParams['axes.spines.right'] = False
fig, ax = plt.subplots(figsize = (15,10))
data.plot(ax = ax, secondary_y = ['Spread'], ylabel = 'Yield', legend = False);
'''Change left y-axis tick labels to percentage'''
left_yticks = ax.get_yticks().tolist()
ax.yaxis.set_major_locator(mticker.FixedLocator(left_yticks))
ax.set_yticklabels((("%.1f" % tick) + '%') for tick in left_yticks);
#Add legend
fig.legend(loc="upper center", ncol = 3, frameon = False)
fig.tight_layout()
plt.show()
I have questions concerning two features of the graph that I want to customize:
The x-axis currently has a tick and tick label for every year. How can I change this so that there is a tick and tick label for every 3 months in the form MMM-YY? (see picture below)
The spread was calculated as thirty year yield - ten year yield. Say I want to change the RIGHT y-axis tick labels so that their sign is flipped, but I want to leave both the original data and curves alone (for the sake of argument; bear with me, there is logic underlying this). In other words, the right y-axis tick labels currently go from -0.2 at the bottom to 0.8 at the top. How can I change them so that they go from 0.2 at the bottom to -0.8 at the top without changing anything about the data or curves? This is purely a cosmetic change of the right y-axis tick labels.
I tried doing the following:
'''Change right y-axis tick labels'''
right_yticks = (ax.right_ax).get_yticks().tolist()
#Loop through and multiply each right y-axis tick label by -1
for index, value in enumerate(right_yticks):
    right_yticks[index] = value*(-1)
(ax.right_ax).yaxis.set_major_locator(mticker.FixedLocator(right_yticks))
(ax.right_ax).set_yticklabels(right_yticks)
But I got this:
Note how the right y-axis is incomplete.
I'd appreciate any help. Thank you!
Let's create some data:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
days = np.array(["2022-01-01", "2022-07-01", "2023-02-15", "2023-11-15", "2024-03-03"],
dtype = "datetime64")
val = np.array([20, 20, -10, -10, 10])
For the dates on the x-axis, we import matplotlib.dates, which provides the month locator and the date formatter. The locator places a tick every 3 months, and the formatter controls how the labels are displayed (abbreviated month and two-digit year, e.g. Jan-22).
For the y-axis, you need to change the sign of the plotted data (hence the negative sign in ax2.plot()), but you want the curve to stay in the same position, so afterwards you invert the axis. The curves in both plots are then identical, but the y-axis values have opposite signs and directions.
fig, (ax1, ax2) = plt.subplots(figsize = (10,5), nrows = 2)
ax1.plot(days, val, marker = "x")
# set the locator to Jan, Apr, Jul, Oct
ax1.xaxis.set_major_locator(mdates.MonthLocator( bymonth = (1, 4, 7, 10) ))
# set the formatter to month-year; lowercase %y shows only two digits of the year
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%b-%y"))
# change the sign of the y data plotted
ax2.plot(days, -val, marker = "x")
#invert the y axis
ax2.invert_yaxis()
# set the locator to Jan, Apr, Jul, Oct
ax2.xaxis.set_major_locator(mdates.MonthLocator( bymonth = (1, 4, 7, 10) ))
# set the formatter to month-year; lowercase %y shows only two digits of the year
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%b-%y"))
plt.show()
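If the data and the curve really must stay untouched, another option is to change only how the tick labels are printed. The following is a minimal sketch using matplotlib.ticker.FuncFormatter; the sample values and the helper name flip_sign are made up for illustration and are not from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np

x = np.arange(5)
spread = np.array([-0.2, 0.1, 0.4, 0.6, 0.8])  # made-up values

fig, ax = plt.subplots()
ax.plot(x, spread)  # the plotted data is left unchanged

def flip_sign(value, pos):
    # Print each tick with the opposite sign; "+ 0.0" turns -0.0 into 0.0
    return f"{-value + 0.0:.1f}"

# Only the label text changes; tick positions and the curve stay where they are
ax.yaxis.set_major_formatter(mticker.FuncFormatter(flip_sign))
fig.savefig("flipped_labels.png")
```

In the question's figure the same formatter would presumably be set on ax.right_ax.yaxis instead of ax.yaxis, because the spread is drawn on the secondary axis.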
I have one year of data and I want to plot its seasonal patterns, so I created a subset for each season. But my winter plot has a gap: it cannot plot the three months in sequence.
Here is my data:
winter = pd.concat([countData19_gdf.loc['2019-12-01':'2019-12-31'], countData19_gdf.loc['2019-01-01':'2019-02-28']])
winter= winter.sort_index()
min_count = countData19_gdf['volume'].min()
max_count = countData19_gdf['volume'].max() + 20
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(16,10))
line_width = 2
ax[0,0].plot(winter.resample('d').mean()['volume'].index, winter.resample('d').mean()['volume'], c='blue', lw=line_width);
ax[0,1].plot(countData19_gdf.loc['2019-03-01': '2019-05-31'].resample('d').mean()['volume'].index, countData19_gdf.loc['2019-03-01': '2019-05-31'].resample('d').mean()['volume'] ,c='orange',lw=line_width);
ax[1,0].plot(countData19_gdf.loc['2019-06-01': '2019-08-31'].resample('d').mean()['volume'].index, countData19_gdf.loc['2019-06-01': '2019-08-31'].resample('d').mean()['volume'], c='green', lw=line_width);
ax[1,1].plot(countData19_gdf.loc['2019-09-01': '2019-11-30'].resample('d').mean()['volume'].index, countData19_gdf.loc['2019-09-01': '2019-11-30'].resample('d').mean()['volume'], c='brown', lw=line_width);
ax[0,0].title.set_text('Winter')
ax[0,1].title.set_text('Spring')
ax[1,0].title.set_text('Summer')
ax[1,1].title.set_text('Fall')
for ax in [ax[0,1], ax[1,0], ax[1,1]]:
    # Set minor ticks with day numbers
    ax.xaxis.set_minor_locator(dates.DayLocator(interval=10))
    ax.xaxis.set_minor_formatter(dates.DateFormatter('%d'))
    # Set major ticks with month names
    ax.xaxis.set_major_locator(dates.MonthLocator())
    ax.xaxis.set_major_formatter(dates.DateFormatter('\n%b'))
plt.savefig('seasonal_global.png')
plt.show()
The gap in your plot occurs because you are displaying the winter months of two different winters, one that started in 2018 and ended in 2019, and another that started in 2019 and ended in 2020.
You need to subset your data so that it gathers the appropriate months, as the following code does:
import numpy as np
import pandas as pd
np.random.seed(42)
datetime_index = pd.date_range(start='2018-01-01', end='2020-12-31')
volume = np.random.randint(low=30, high=60, size=datetime_index.shape[0])
data = pd.DataFrame({'volume': volume},
index=datetime_index)
winter = data['2019-12':'2020-02']
winter.plot()
It plots this:
If you don't have more than one year's worth of data, then you can fill the gap with the other seasons in light grey, such as the graph below:
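import matplotlib.pyplot as plt
# Keep a single year, as in the question, so the winter slices below cover only Jan-Feb and Dec 2019
data = data.loc['2019']
fig, ax = plt.subplots(1, 1, figsize=(16,10))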
line_width = 2
ax.plot(data['volume'], c='grey', lw=line_width, label='All year')
ax.plot(data[:'2019-02'], c='blue', lw=line_width, label='Winter')
ax.plot(data['2019-12':], c='blue', lw=line_width)
plt.legend()
plt.title('Volume across 2019')
plt.xlabel('Month')
plt.ylabel('Volume')
plt.show()
The synthetic data that I've used is more volatile than the real data. You could smooth the time series with a moving average using rolling(), to improve the readability of the changes over time.
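A minimal sketch of the smoothing mentioned above, on synthetic data similar to the example. The 7-day centered window is an assumption chosen for illustration; the right width depends on the real data.

```python
import numpy as np
import pandas as pd

# Synthetic daily volumes for one year (made-up data, as in the example above)
rng = np.random.default_rng(42)
idx = pd.date_range("2019-01-01", "2019-12-31", freq="D")
data = pd.DataFrame({"volume": rng.integers(30, 60, size=len(idx))}, index=idx)

# 7-day centered moving average; the first and last 3 days have no full window
smoothed = data["volume"].rolling(window=7, center=True).mean()

print(smoothed.dropna().head())
```

Plotting smoothed instead of data['volume'] should give a calmer line, at the cost of a few missing values at both ends of the series.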
I have a CSV file with time data as follows:
Time,Download,Upload
17:00,7.51,0.9
17:15,6.95,0.6
17:31,5.2,0.46
I import the csv into a pandas dataframe: df = pd.read_csv('speeds.csv', parse_dates=['Time'])
And then plot the graph like so:
fig, ax = plt.subplots(figsize=(20, 7))
df.plot(ax=ax)
majorFmt = mdates.DateFormatter('%H:%M:')
minorFmt = mdates.DateFormatter('%H:%M:')
hour_locator = mdates.HourLocator()
min_locator = mdates.MinuteLocator(byminute=[15, 30, 45])
ax.xaxis.set_major_locator(hour_locator)
ax.xaxis.set_major_formatter(majorFmt)
plt.setp(ax.xaxis.get_majorticklabels(), rotation=90, fontsize=10)
ax.xaxis.set_minor_locator(min_locator)
ax.xaxis.set_minor_formatter(minorFmt)
plt.setp(ax.xaxis.get_minorticklabels(), rotation=90, fontsize=8)
However the final graph starts from 00:00 like so, although the CSV file starts at 17:00:
How come the graph doesn't start at 17:00 as well?
Another problem (while I'm here) is that the major labels don't line up with the major tick marks; they are shifted slightly to the left. How would I fix that?
First question - graph doesn't start at 17:00:
Your csv only gives times (no dates) and it rolls over midnight. Pandas implicitly adds the current date to all times, so times after midnight, which belong to the next day, get the same date as times before midnight. Therefore you'll have to adjust the date part:
days = 0
df['Datetime'] = df['Time']
for i in df.index:
    # a time earlier than the previous row means we have passed midnight
    if i > 0 and df.at[i,'Time'] < df.at[i-1,'Time']:
        days += 1
    df.at[i,'Datetime'] = df.at[i,'Time'] + pd.DateOffset(days=days)
and then use the Datetime column on your x axis.
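The same rollover correction can be sketched without an explicit loop. This is a minimal illustration on made-up times. It parses them with an explicit format, so they land on 1900-01-01 rather than on the current date, and it assumes consecutive rows are never more than 24 hours apart.

```python
import pandas as pd

# Made-up times that roll over midnight
times = pd.to_datetime(pd.Series(["17:00", "23:45", "00:15", "01:00"]), format="%H:%M")

# A negative difference to the previous row marks a pass over midnight;
# the running count of such passes is the number of days to add
rollovers = (times.diff() < pd.Timedelta(0)).cumsum()
fixed = times + pd.to_timedelta(rollovers, unit="D")

print(fixed)
```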
Second question - shifted major markers:
Set the horizontal alignment:
plt.setp(ax.xaxis.get_majorticklabels(), rotation=90, fontsize=10, horizontalalignment='center')
I have a data frame containing several columns for which I have continuous (annual) data from 1971 to 2012. After that I have some, say, "predicted" values for 2020, 2025, 2030 and 2035. The index of the data frame is in integer format (one year per row), and I've tried converting it to a datetime format using the appropriate module, but this still doesn't space out the dates correctly on the x-axis (to show the actual time gaps). Here's the code I've been experimenting with:
fig, ax = plt.subplots()
# Set title
ttl = "India's fuel mix (1971-2012)"
# Set color transparency (0: transparent; 1: solid)
a = 0.7
# Convert the index integer dates into actual date objects
new_fmt.index = [datetime.datetime(year=date, month=1, day=1) for date in new_fmt.index]
new_fmt.loc[:,['Coal', 'Oil', 'Gas', 'Biofuels', 'Nuclear', 'Hydro','Wind']].plot(ax=ax,kind='bar', stacked=True, title = ttl)
ax.grid(False)
xlab = 'Date (Fiscal Year)'
ylab = 'Electricity Generation (GWh)'
ax.set_title(ax.get_title(), fontsize=20, alpha=a)
ax.set_xlabel(xlab, fontsize=16, alpha=a)
ax.set_ylabel(ylab, fontsize=16, alpha=a)
# Tell matplotlib to interpret the x-axis values as dates
ax.xaxis_date()
# Make space for and rotate the x-axis tick labels
fig.autofmt_xdate()
I tried to figure it out:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import datetime
# create data frame with random data (3 rows, 2 columns)
df = pd.DataFrame(np.random.randn(3,2))
# time index with missing years
t = [datetime.date(year=1971, month=12, day=31), datetime.date(year=1972, month=12, day=31), datetime.date(year=1980, month=12, day=31)]
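# convert to a DatetimeIndex so the labels match the date_range used for reindexing
df.index = pd.to_datetime(t)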
# time index with all the years:
tnew = pd.date_range(datetime.date(year=1971, month=1, day=1),datetime.date(year=1981, month=1, day=1),freq="A")
# reindex data frame (missing years will be filled with NaN)
df2 = df.reindex(tnew)
# replace NaN with 0
df2_zeros = df2.fillna(0)
# or interpolate
df2_interp = df2.interpolate()
# and plot
df2_interp.columns = ["coal","wind"]
df2_interp.plot(kind='bar', stacked=True)
plt.show()
Hope this helps.
I have plotted a time series of carbon fluxes over 16 years at a particular site. I would like the x-axis to show years (1992-2007) instead of year numbers (1-16). When I set the x-axis to have a min value of 1992 and a max value of 2007, the graph doesn't appear on the plot, but when I don't set the min/max years, it appears. I'm not sure what I am doing wrong. I plotted another time series over one year and was able to label the x-axis with the months using MonthLocator, but am having no luck with YearLocator. Here is the code that I have written:
fig=pyplot.figure()
ax=fig.gca()
ax.plot_date(days,nee,'r-',label='model daily nee')
ax.plot_date(days,nee_obs,'b-',label='obs daily nee')
# locate the ticks
ax.xaxis.set_major_locator(YearLocator())
# format the ticks
ax.xaxis.set_major_formatter(DateFormatter('%Y'))
# set years 1992-2007
datemin = datetime.date(1992, 1, 1)
datemax = datetime.date(2007, 12, 31)
ax.set_xlim(datemin, datemax)
labels=ax.get_xticklabels()
setp(labels,'rotation',45,fontsize=10)
legend(loc="upper right", bbox_to_anchor=[0.98, 0.98],
ncol=1, shadow=True)
pyplot.ylabel('NEE($gC m^{-2} day^{-1}$)')
pyplot.title('Net Ecosystem Exchange')
pyplot.savefig('nee_obs_model_HF_daily.pdf')
# rotates and right aligns the x labels, and moves the bottom of the
# axes up to make room for them
#fig.autofmt_xdate()
pyplot.show()
pyplot.close()
I think Andrey Sobolev is right. When I run your script, with minor adjustments, on some data that I have in which the date field is an actual date, the years show up with no problems. It's virtually your code, with the exception of:
fh = open(thisFileName)
# a numpy record array with fields: date, nee, nee_obs
# from a csv, thisFileName with format:
# Date,nee,nee_obs
# 2012-02-28,137.20,137.72
r = matplotlib.mlab.csv2rec(fh)
fh.close()
r.sort()
days = r.date
nee = r.nee
nee_obs = r.nee_obs
...
...
and then I get:
Much of this solution was borrowed from here. Let me know if I misinterpreted what you need.
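The question does not show how days is built. If it holds plain day counts rather than dates (an assumption), the date limits passed to set_xlim would fall far outside the plotted range, which would explain the empty plot. A minimal sketch of converting such counts into real dates, assuming day 1 corresponds to 1992-01-01 (also an assumption):

```python
import numpy as np
import pandas as pd

# Hypothetical day counter: day 1 = 1992-01-01, one entry per day for 16 years
day_numbers = np.arange(1, 5845)

# Shift by one so that day 1 maps to the start date itself
days = pd.Timestamp("1992-01-01") + pd.to_timedelta(day_numbers - 1, unit="D")

print(days[0], days[-1])
```

With days as real dates, YearLocator and the datetime.date limits in the question operate on the same scale as the data.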