gap in timeseries plot

I have one year of data and I want to plot its seasonal patterns, so I created a subset for each season. However, my winter plot has a gap: it cannot plot the three months in sequence.
Here is my code:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates

winter = pd.concat([countData19_gdf.loc['2019-12-01':'2019-12-31'], countData19_gdf.loc['2019-01-01':'2019-02-28']])
winter = winter.sort_index()
min_count = countData19_gdf['volume'].min()
max_count = countData19_gdf['volume'].max() + 20
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(16,10))
line_width = 2
ax[0,0].plot(winter.resample('d').mean()['volume'].index, winter.resample('d').mean()['volume'], c='blue', lw=line_width)
ax[0,1].plot(countData19_gdf.loc['2019-03-01':'2019-05-31'].resample('d').mean()['volume'].index, countData19_gdf.loc['2019-03-01':'2019-05-31'].resample('d').mean()['volume'], c='orange', lw=line_width)
ax[1,0].plot(countData19_gdf.loc['2019-06-01':'2019-08-31'].resample('d').mean()['volume'].index, countData19_gdf.loc['2019-06-01':'2019-08-31'].resample('d').mean()['volume'], c='green', lw=line_width)
ax[1,1].plot(countData19_gdf.loc['2019-09-01':'2019-11-30'].resample('d').mean()['volume'].index, countData19_gdf.loc['2019-09-01':'2019-11-30'].resample('d').mean()['volume'], c='brown', lw=line_width)
ax[0,0].title.set_text('Winter')
ax[0,1].title.set_text('Spring')
ax[1,0].title.set_text('Summer')
ax[1,1].title.set_text('Fall')
# Use a separate loop variable so the `ax` array is not overwritten
for a in [ax[0,1], ax[1,0], ax[1,1]]:
    # Set minor ticks with day numbers
    a.xaxis.set_minor_locator(dates.DayLocator(interval=10))
    a.xaxis.set_minor_formatter(dates.DateFormatter('%d'))
    # Set major ticks with month names
    a.xaxis.set_major_locator(dates.MonthLocator())
    a.xaxis.set_major_formatter(dates.DateFormatter('\n%b'))
plt.savefig('seasonal_global.png')
plt.show()

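The gap in your plot occurs because you are displaying the winter months of two different winters: January and February 2019 belong to the winter that started in December 2018, while December 2019 starts the winter that ends in 2020. Because resample('d') creates a bin for every day between January and December, the days from March to November are empty (NaN), and matplotlib leaves a gap there.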
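A minimal reproduction of the gap, assuming a hypothetical daily 2019 series in place of countData19_gdf (only the 'volume' column name comes from the question; the values are made up). It builds winter the same way the question does and counts the empty daily bins that resampling creates:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for countData19_gdf: one made-up value per day of 2019
idx = pd.date_range('2019-01-01', '2019-12-31', freq='D')
counts = pd.DataFrame({'volume': np.arange(len(idx), dtype=float)}, index=idx)

# Same construction as in the question: December 2019 plus January-February 2019
winter = pd.concat([counts.loc['2019-12-01':'2019-12-31'],
                    counts.loc['2019-01-01':'2019-02-28']]).sort_index()

# Daily resampling creates one bin per calendar day between the first and last
# timestamp, so every day from March to November becomes an empty (NaN) bin
daily = winter.resample('D').mean()['volume']
print(daily.index[0].date(), daily.index[-1].date())  # 2019-01-01 2019-12-31
print(daily.isna().sum())  # 275
```

Matplotlib breaks a line wherever the values are NaN, which is exactly the gap in the winter subplot.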
To plot one continuous winter, subset your data so that it gathers consecutive months (December 2019 through February 2020), as the following code does:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(42)
datetime_index = pd.date_range(start='2018-01-01', end='2020-12-31')
volume = np.random.randint(low=30, high=60, size=datetime_index.shape[0])
data = pd.DataFrame({'volume': volume},
                    index=datetime_index)
# Winter 2019-2020: December 2019 through February 2020
winter = data.loc['2019-12':'2020-02']
winter.plot()
plt.show()
It plots this:
[Figure: the winter of 2019-2020, December 2019 to February 2020, as one continuous line]
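To pick out any winter in general (not only 2019-2020), one option is to label each row with its season and a "season year" in which December counts toward the following year. A sketch on the same kind of synthetic data; the column names season and season_year and the month-to-season mapping are hypothetical, not part of the original answer:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
idx = pd.date_range('2018-01-01', '2020-12-31', freq='D')
data = pd.DataFrame({'volume': np.random.randint(30, 60, size=len(idx))}, index=idx)

# Hypothetical meteorological seasons: Dec-Feb winter, Mar-May spring, etc.
season_of_month = {12: 'Winter', 1: 'Winter', 2: 'Winter',
                   3: 'Spring', 4: 'Spring', 5: 'Spring',
                   6: 'Summer', 7: 'Summer', 8: 'Summer',
                   9: 'Fall', 10: 'Fall', 11: 'Fall'}
data['season'] = data.index.month.map(season_of_month)
# December belongs to the winter of the following year
data['season_year'] = data.index.year + (data.index.month == 12)

# Winter 2020 = December 2019 + January 2020 + February 2020, with no gap
winter_2020 = data[(data['season'] == 'Winter') & (data['season_year'] == 2020)]
print(winter_2020.index[0].date(), winter_2020.index[-1].date(), len(winter_2020))
```

The same two columns work with groupby, for example data.groupby(['season', 'season_year']), if you want to loop over every season of every year.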
If you don't have more than one year of data, you can instead fill the gap with the other seasons drawn in grey, as in the following graph:
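import matplotlib.pyplot as plt
# Keep only 2019 (the synthetic data above spans 2018-2020)
data_2019 = data.loc['2019']
fig, ax = plt.subplots(1, 1, figsize=(16,10))
line_width = 2
ax.plot(data_2019['volume'], c='grey', lw=line_width, label='All year')
ax.plot(data_2019.loc[:'2019-02', 'volume'], c='blue', lw=line_width, label='Winter')
ax.plot(data_2019.loc['2019-12':, 'volume'], c='blue', lw=line_width)
ax.legend()
ax.set_title('Volume across 2019')
ax.set_xlabel('Month')
ax.set_ylabel('Volume')
plt.show()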
The synthetic data I've used is more volatile than your real data. You could smooth the time series with a moving average, using rolling(), to make the changes over time easier to read.
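A minimal sketch of that smoothing, assuming synthetic daily data like the answer's restricted to 2019; the 7-day window and center=True are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic daily data like the answer's, restricted to 2019
np.random.seed(42)
idx = pd.date_range('2019-01-01', '2019-12-31', freq='D')
data = pd.DataFrame({'volume': np.random.randint(30, 60, size=len(idx))}, index=idx)

# 7-day centred moving average; the window length is an arbitrary choice
smoothed = data['volume'].rolling(window=7, center=True).mean()

# The first and last three days lack a full window, so they are NaN
print(smoothed.isna().sum())  # 6
```

You would then plot smoothed instead of (or on top of) the raw volume, for example ax.plot(smoothed, c='blue').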

Related

Setting xticks to the middle of a given year with Matplotlib

I am new to Python and visualization, and I am trying to get my plot to display xticks only in the middle of each year. I have tried a couple of things with date_range, but now my plot displays two xticks for each year: one at the beginning of the year and one in the middle. How can I get rid of the xticks at the beginning of each year and keep only the ones in the middle?
Here's my code:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

texasdrought['ValidStart'] = pd.to_datetime(texasdrought['ValidStart'])
droughtMask = texasdrought[texasdrought['ValidStart'].dt.year.between(2005, 2015)]
# Dates for the x axis
dates = droughtMask["ValidStart"]
# Categorize droughts
droughtcat = {
    'D4 - Exceptional Drought': droughtMask["D4"],
    'D3 - Extreme Drought': droughtMask["D3"],
    'D2 - Severe Drought': droughtMask["D2"],
    'D1 - Moderate Drought': droughtMask["D1"],
    'D0 - Abnormally Dry': droughtMask["D0"]
}
# Set the figure size here: a separate plt.figure() call before
# plt.subplots() would open a second, unused figure
fig, ax = plt.subplots(figsize=(30, 16))
ax.stackplot(dates, droughtcat.values(), labels=droughtcat.keys(), colors=['#660000','#FF0000','#FF6600','#FFCC99','#FFFF00'])
# Format x-axis tick labels as two-digit years
yearsFmt = mdates.DateFormatter("'%y")
ax.xaxis.set_major_formatter(yearsFmt)
#ax.yaxis.set_major_formatter(mtick.PercentFormatter())
# Add legend location
ax.legend(loc='upper left')
# Add title to the stackplot
ax.set_title('Drought in Texas (2005-2015)')
ticks = pd.date_range('2005-01-01', '2015-12-31', freq='6M')
plt.xticks(ticks)
# Add axis labels
ax.set_xlabel('Year')
ax.set_ylabel('Drought Intensity')
# Save figure as DroughtPlot.jpg
fig.savefig('DroughtPlot.jpg')
plt.show()
Thank you.

Use this instead:
ticks = pd.date_range('2005-01-01', '2015-12-31', freq='A-JUN')
'A-JUN' is an annual frequency whose years end in June, so it generates one date per year, on the last day of June (the middle of the year):
DatetimeIndex(['2005-06-30', '2006-06-30', '2007-06-30', '2008-06-30',
               '2009-06-30', '2010-06-30', '2011-06-30', '2012-06-30',
               '2013-06-30', '2014-06-30', '2015-06-30'],
              dtype='datetime64[ns]', freq='A-JUN')
pandas 2.2 deprecated the 'A' aliases; there the equivalent is freq='YE-JUN'.
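Instead of building the tick positions with pd.date_range, you can also let a Matplotlib locator place them. A minimal sketch, assuming made-up weekly values in place of texasdrought: mdates.YearLocator(base=1, month=7, day=1) puts one major tick on 1 July of every year (note: 1 July rather than 30 June), and setting it on ax avoids mixing plt.xticks with the object-oriented API.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Made-up weekly values standing in for three drought columns of texasdrought
dates = pd.date_range('2005-01-04', '2015-12-29', freq='7D')
rng = np.random.default_rng(0)
layers = rng.uniform(0, 20, size=(3, len(dates)))

fig, ax = plt.subplots(figsize=(12, 4))
ax.stackplot(dates, layers, labels=['D2', 'D1', 'D0'])

# One major tick per year (base=1), placed on 1 July
ax.xaxis.set_major_locator(mdates.YearLocator(base=1, month=7, day=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("'%y"))
ax.legend(loc='upper left')

# Convert the tick positions back to datetimes to see where they landed
tick_dates = [mdates.num2date(t) for t in ax.get_xticks()]
print([d.strftime('%Y-%m-%d') for d in tick_dates])
```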

How to display negative values in matplotlib's stackplot?

I'd like to create a stacked bar plot of asset weights representing a financial portfolio over time. I tried several approaches, but got the most pleasing results with matplotlib's stackplot function. However, I am not able to display negative asset weights in my stackplot, so the resulting figures are wrong. I am using Python (3.8.3) and Matplotlib (3.3.2).
The following shows the head of the asset-weights DataFrame I want to plot:
w_minvar1nc.head()
              SMACAP    GROWTH    MOMTUM    MINVOL   QUALITY
Date
2015-02-20  0.012942  0.584273 -0.114441  0.387773  0.129454
2015-02-23  0.013129  0.584528 -0.115836  0.386448  0.131732
2015-02-24  0.013487  0.584404 -0.116585  0.386364  0.132330
2015-02-25  0.015145  0.572256 -0.117796  0.387583  0.142811
2015-02-26  0.015113  0.567198 -0.114580  0.387807  0.144462
Here is a simplified snippet of my current approach to the stackplot:
import matplotlib.pyplot as plt
# initialize stackplot
fig, ax = plt.subplots(nrows=1, ncols=1, facecolor="#F0F0F0")
# create and format stackplot
ax.stackplot(w_minvar1nc.index, w_minvar1nc.SMACAP, w_minvar1nc.GROWTH, w_minvar1nc.MOMTUM, w_minvar1nc.MINVOL, w_minvar1nc.QUALITY)
ax.set_xlabel("Time")
ax.set_ylabel("Weight")
ax.set_ylim(bottom=-0.5, top=1.5)
ax.grid(which="major", color="grey", linestyle="--", linewidth=0.5)
# save stackplot
fig.savefig(fname=(plotpath + "test.png"))
plt.clf()
plt.close()
In the resulting stackplot, the negative asset weights don't show up.
Does anyone know how to deal with that problem? Any ideas would be much appreciated.
PS: Of course I've already tried other approaches, such as stacking the data manually and then creating a regular bar plot. In that case the positive and negative asset weights are displayed correctly, but because the data is daily, this approach leads to even bigger problems with formatting the x-axis.

If the columns are separated into positive and negative weights, you can plot them separately:
from matplotlib import pyplot as plt
import pandas as pd
#fake data
import numpy as np
np.random.seed(123)
n = 100
df = pd.DataFrame({"Dates": pd.date_range("20180101", periods=n, freq="10d"),
                   "A": 0.2 + np.random.random(n)/10,
                   "B": -np.random.random(n)/10,
                   "C": -0.1 - np.random.random(n)/10,
                   "D": 0.3 + np.random.random(n)/10})
df.set_index("Dates", inplace=True)
df["E"] = 1 - df.A - df.D - df.B - df.C
fig, ax = plt.subplots(nrows=1, ncols=1, facecolor="#F0F0F0")
ax.stackplot(df.index, df.A, df.D, df.E)
ax.stackplot(df.index, df.B, df.C)
ax.set_xlabel("Time")
ax.set_ylabel("Weight")
ax.set_ylim(bottom=-0.5, top=1.5)
ax.grid(which="major", color="grey", linestyle="--", linewidth=0.5)
plt.show()
Sample output:
[Figure: positive weights A, D, E stacked above zero and negative weights B, C stacked below zero]
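Why the negative weights vanish in the original plot: with the default baseline, stackplot fills each layer between consecutive cumulative sums of the series. The sketch below does that arithmetic for the first row of w_minvar1nc from the question; the negative MOMTUM band (about 0.483 to 0.597) lies entirely inside the band that MINVOL draws afterwards (about 0.483 to 0.871), so MINVOL paints over it.

```python
import numpy as np

# First row of w_minvar1nc from the question
names = ['SMACAP', 'GROWTH', 'MOMTUM', 'MINVOL', 'QUALITY']
w = np.array([0.012942, 0.584273, -0.114441, 0.387773, 0.129454])

# stackplot (baseline='zero') fills layer i between these two curves
tops = np.cumsum(w)
bottoms = np.concatenate([[0.0], tops[:-1]])

bands = {}
for name, lo, hi in zip(names, bottoms, tops):
    # A negative weight makes the band run downwards, so sort its edges
    bands[name] = (min(lo, hi), max(lo, hi))
    print(f'{name:8s} {bands[name][0]:.6f} .. {bands[name][1]:.6f}')
```

Splitting the frame into non-negative and non-positive parts gives each sign its own stack starting at zero, so nothing overlaps.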

Here is the solution to the problem, with huge credit to @Mr. T:
# split data into negative and positive values
w_minvar1nc_pos = w_minvar1nc[w_minvar1nc >= 0].fillna(0)
w_minvar1nc_neg = w_minvar1nc[w_minvar1nc < 0].fillna(0)
# initialize stackplot
fig, ax = plt.subplots(nrows=1, ncols=1, facecolor="#F0F0F0")
# create and format stackplot
ax.stackplot(w_minvar1nc_pos.index, w_minvar1nc_pos.SMACAP, w_minvar1nc_pos.GROWTH, w_minvar1nc_pos.MOMTUM, w_minvar1nc_pos.MINVOL, w_minvar1nc_pos.QUALITY)
ax.stackplot(w_minvar1nc_neg.index, w_minvar1nc_neg.SMACAP, w_minvar1nc_neg.GROWTH, w_minvar1nc_neg.MOMTUM, w_minvar1nc_neg.MINVOL, w_minvar1nc_neg.QUALITY)
ax.set_xlabel("Time")
ax.set_ylabel("Weight")
ax.set_ylim(bottom=-0.5, top=1.5)
ax.grid(which="major", color="grey", linestyle="--", linewidth=0.5)
# save stackplot
fig.savefig(fname=(plotpath + "test.png"))
plt.clf()
plt.close()
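A variant of the same idea, sketched on made-up weights that reuse the question's column names: DataFrame.clip(lower=0) and clip(upper=0) give the same split as masking plus fillna(0). Because each stackplot call otherwise keeps advancing the axes colour cycle, the negative part of an asset can get a different colour from its positive part; passing one shared colors list (taken here from the default property cycle) to both calls keeps one colour per asset, and labelling only one call avoids duplicate legend entries.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up weights; only the column names come from the question
rng = np.random.default_rng(1)
idx = pd.date_range('2015-02-20', periods=60, freq='B')
weights = pd.DataFrame(rng.normal(0.2, 0.2, size=(60, 5)), index=idx,
                       columns=['SMACAP', 'GROWTH', 'MOMTUM', 'MINVOL', 'QUALITY'])

# Keep one sign and set the other to zero
pos = weights.clip(lower=0)
neg = weights.clip(upper=0)

# One colour per asset, reused for both stacks
colors = plt.rcParams['axes.prop_cycle'].by_key()['color'][:weights.shape[1]]

fig, ax = plt.subplots(facecolor="#F0F0F0")
ax.stackplot(pos.index, pos.T.values, colors=colors, labels=weights.columns)
ax.stackplot(neg.index, neg.T.values, colors=colors)
ax.axhline(0, color='black', linewidth=0.8)
ax.legend(loc='upper left')
ax.set_xlabel("Time")
ax.set_ylabel("Weight")
```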

date and graph alignment - Economic analysis

I'm running a fundamental economic analysis, but when I get to visualising and charting, I am not able to align the dates with the graph.
I want the most recent date to show at the right end of the x axis and the rest of the dates to show every two years.
I have tried literally everything and can't find the solution.
Here is my code:
%matplotlib inline
import pandas as pd
from matplotlib import pyplot
import matplotlib.dates as mdates
df = pd.read_csv('https://fred.stlouisfed.org/graph/fredgraph.csvbgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=off&txtcolor=%23444444&ts=12&tts=12&width=1168&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=NAEXKP01EZQ657S&scale=left&cosd=1995-04-01&coed=2020-04-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Quarterly&fam=avg&fgst=lin&fgsnd=2020-02-01&line_index=1&transformation=lin&vintage_date=2020-09-21&revision_date=2020-09-21&nd=1995-04-01')
df = df.set_index('DATE')
df['12MonthAvg'] = df.rolling(window=12).mean().dropna(how='all')
df['9MonthAvg'] = df['12MonthAvg'].rolling(window=12).mean().dropna(how='all')
df['Spread'] = df['12MonthAvg'] - df['9MonthAvg']
pyplot.style.use("seaborn")
pyplot.subplots(figsize=(10, 5), dpi=85)
df['Spread'].plot().set_title('EUROPE: GDP Q Growth Rate (12M/12M Avg Spread)', fontsize=16)
df['Spread'].plot().axhline(0, linestyle='-', color='r',alpha=1, linewidth=2, marker='')
df['Spread'].plot().spines['left'].set_position(('outward', 10))
df['Spread'].plot().spines['bottom'].set_position(('outward', 10))
df['Spread'].plot().spines['right'].set_visible(False)
df['Spread'].plot().spines['top'].set_visible(False)
df['Spread'].plot().yaxis.set_ticks_position('left')
df['Spread'].plot().xaxis.set_ticks_position('bottom')
df['Spread'].plot().text(0.50, 0.02, "Crossing red line downwards / Crossing red line Upwards",
transform=pyplot.gca().transAxes, fontsize=14, ha='center', color='blue')
df['Spread'].plot().fmt_xdata = mdates.DateFormatter('%Y-%m-%d')
print(df['Spread'].tail(3))
pyplot.autoscale()
pyplot.show()
[Figures: the resulting plot and the raw data]

There are a couple of corrections to make to your code.
In your URL, insert "?" after fredgraph.csv. It starts the so-called query string, in which bgcolor is the first parameter.
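Building the query string from a dict avoids hand-placing "?" and "&". A minimal sketch with urllib.parse.urlencode; the three parameters are copied from the question's URL, and whether FRED needs any of the other parameters is an assumption not checked here:

```python
from urllib.parse import urlencode

base = 'https://fred.stlouisfed.org/graph/fredgraph.csv'
# Subset of the question's parameters (assumed sufficient for the CSV download)
params = {'id': 'NAEXKP01EZQ657S', 'cosd': '1995-04-01', 'coed': '2020-04-01'}

# urlencode joins key=value pairs with "&"; the "?" separates path and query
url = base + '?' + urlencode(params)
print(url)
```

The resulting url can then be passed to pd.read_csv.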
Read your DataFrame with additional parameters:
df = pd.read_csv('...', parse_dates=[0], index_col=[0])
The aim is to read the DATE column as datetime and set it as the index.
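To check what those two parameters do without downloading anything, here is a sketch on a made-up three-row CSV with the same column layout as the FRED file (the values are invented):

```python
import io
import pandas as pd

# Made-up extract with the same columns as the FRED CSV
csv_text = """DATE,NAEXKP01EZQ657S
1995-04-01,0.5
1995-07-01,0.3
1995-10-01,0.4
"""

# parse_dates=[0] parses the first column; index_col=[0] makes it the index
df = pd.read_csv(io.StringIO(csv_text), parse_dates=[0], index_col=[0])
print(df.index)
```

With a DatetimeIndex, the later rolling and plotting steps work on real dates instead of strings.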
Create additional columns as:
df['12MonthAvg'] = df.NAEXKP01EZQ657S.rolling(window=12).mean()
df['9MonthAvg'] = df.NAEXKP01EZQ657S.rolling(window=9).mean()
df['Spread'] = df['12MonthAvg'] - df['9MonthAvg']
Corrections: 9MonthAvg should (I think) be computed from the source column, not from 12MonthAvg, and over 9 rows rather than 12. dropna is not needed here, because you create the whole column anyway.
Now is the place to call dropna() on the Spread column and save the result in a dedicated variable:
spread = df['Spread'].dropna()
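One more thing worth checking, as an assumption about intent: the URL requests fq=Quarterly, so each row is a quarter and rolling(window=12) spans 12 quarters (three years), not 12 months. If 12-month and 9-month averages are what you want, the windows on quarterly rows would be 4 and 3. A sketch on a made-up quarterly series:

```python
import numpy as np
import pandas as pd

# Made-up quarterly series: 20 quarters starting at the question's start date
idx = pd.date_range('1995-04-01', periods=20, freq='QS')
gdp = pd.Series(np.arange(20, dtype=float), index=idx, name='NAEXKP01EZQ657S')

# On quarterly rows, 12 months = 4 rows and 9 months = 3 rows
avg_12m = gdp.rolling(window=4).mean()
avg_9m = gdp.rolling(window=3).mean()
spread = (avg_12m - avg_9m).dropna()
print(spread.head(3))
```

For this linear toy series the spread is a constant -0.5; with real data it is what you would plot.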
Draw your figure the following way:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Matplotlib 3.6+ names this style "seaborn-v0_8"; older versions use "seaborn"
plt.style.use("seaborn-v0_8")
fig, ax = plt.subplots(figsize=(10, 5), dpi=85)
ax.plot(spread.index, spread, '-')
ax.set_title('EUROPE: GDP Q Growth Rate (12M/12M Avg Spread)', fontsize=16)
ax.axhline(0, linestyle='-', color='r', alpha=1, linewidth=2, marker='')
ax.spines['left'].set_position(('outward', 10))
ax.spines['bottom'].set_position(('outward', 10))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.text(0.50, 0.02, "Crossing red line downwards / Crossing red line Upwards",
        transform=ax.transAxes, fontsize=14, ha='center', color='blue')
ax.xaxis.set_major_formatter(mdates.DateFormatter(fmt='%Y-%m-%d'))
plt.show()
Corrections:
plt.subplots returns fig and ax, so I saved them (actually, only ax is needed) and send every call to that single ax, instead of calling df['Spread'].plot() again for each setting.
Because the index now holds real datetimes, ax.plot draws a proper date axis. Older answers often use plot_date for this, but current Matplotlib discourages it in favour of plot.
I changed the way DateFormatter is set.
Using the above code I got the following picture:
[Figure: the Spread series with a red line at zero and dates on the x axis]
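The question also asked for ticks every two years with the most recent date at the right edge. One way, sketched here on a made-up quarterly series standing in for spread, is to build the tick list with pd.date_range at a two-year-start frequency ('2YS'), append the last observation, and clip the x limits to the data:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Made-up quarterly series standing in for `spread`
idx = pd.date_range('1998-01-01', '2020-04-01', freq='QS')
spread = pd.Series(np.sin(np.arange(len(idx)) / 4), index=idx)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(spread.index, spread, '-')
ax.axhline(0, color='r', linewidth=2)

# A tick at the start of every second year, plus one at the latest observation
tick_dates = list(pd.date_range(spread.index[0], spread.index[-1], freq='2YS'))
if tick_dates[-1] != spread.index[-1]:
    tick_dates.append(spread.index[-1])
ax.set_xticks(tick_dates)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
# End the x axis exactly at the first and last observations
ax.set_xlim(spread.index[0], spread.index[-1])
```

If the last regular tick sits too close to the final date and the labels collide, drop that regular tick from the list before calling set_xticks.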

Increase the size of plots in pandas python

result
year  Month  Min_days  Avg_days  Median_days  Count  MonthName-Year
2015      1         9     12.56           10      4        2015-Jan
2015      2        10     13.67            9      3        2015-Feb
...
2016     12        12    15.788           19      2        2016-Dec
and so on...
I want to create a line plot of Min_days, Avg_days, Median_days, and Count by month and year. The code I used for that (which works perfectly):
import matplotlib.pyplot as plt
result=freq_start_month_year_to_date_1(df,'Jan','2015','Dec','2019')
#Visualisations
fig, ax = plt.subplots()
for col in ["Min_days", "Median_days", "Count", 'Target_days_before_customer_dead']:
    ax.plot(result["Month Name-Year"], result[col], label=col)
ax.legend(loc="best")
ax.tick_params(axis="x", rotation=30)
I get a plot. The only issue is that the x axis is too crowded: all the labels (2015-Jan, 2015-Feb, etc.) overlap, so nothing on the x axis is readable; it looks like black scribbling. I am unable to increase the size of the plot.
I tried the code below, but that did not work either:
fig, ax = plt.subplots(2,2, figsize=(20,20))
With that code I got four empty subplots.

The problem is that you preformatted your x axis as strings, which robs matplotlib of the chance to apply its own date locator and formatter. matplotlib treats each string as a separate category and tries to cram a label for every one of them into the axis, so you can never make it wide enough to hold all the labels.
Create a new date column and use it to form your x axis:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates as mdates
# The new column to be used as x axis
result['Date'] = pd.to_datetime(result[['Year', 'Month']].assign(Day=1))
# Plot the data
fig, ax = plt.subplots(figsize=(10, 2))
for col in ['Min_days', 'Median_days', 'Count', 'Target_days_before_customer_dead']:
    ax.plot(result['Date'], result[col], label=col)
years = mdates.YearLocator()  # one labelled major tick per year
months = mdates.MonthLocator()  # an unlabelled minor tick for every month
years_fmt = mdates.DateFormatter('%Y')
ax.xaxis.set_major_locator(years)
ax.xaxis.set_minor_locator(months)
ax.xaxis.set_major_formatter(years_fmt)
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
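Result (with random data):
[Figure: the series plotted against real dates, with a labelled tick per year and minor ticks per month]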
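A self-contained version of the approach, with a hypothetical random table shaped like result (Year and Month columns plus a few value columns; all values are made up), so you can run it without freq_start_month_year_to_date_1:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates as mdates

# Hypothetical monthly table shaped like `result`, January 2015 to December 2019
rng = np.random.default_rng(0)
months = pd.period_range('2015-01', '2019-12', freq='M')
result = pd.DataFrame({'Year': months.year, 'Month': months.month,
                       'Min_days': rng.integers(5, 15, len(months)),
                       'Median_days': rng.integers(8, 20, len(months)),
                       'Count': rng.integers(1, 6, len(months))})

# Build a real datetime column from Year and Month (day fixed at 1)
result['Date'] = pd.to_datetime(result[['Year', 'Month']].assign(Day=1))

fig, ax = plt.subplots(figsize=(10, 3))
for col in ['Min_days', 'Median_days', 'Count']:
    ax.plot(result['Date'], result[col], label=col)
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
fig.tight_layout()
```

Sixty monthly points now produce only a handful of year labels, because the locator, not the number of rows, decides where labels go.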

How do I index or plot datetimes after resampling so they display on a bar plot axis correctly?

I want the x axis of my third plot to show datetimes like my other two plots (see the linked figure). I used similar approaches for each graph, but I resampled the third dataset to plot precipitation as a bar graph for every hour in my time period. When I first tried to format the dates for the third plot as I did for the previous two, the x-axis labels either disappeared or the data didn't plot correctly. In the figure linked below, the data is displayed the way I intended.
[Figure: Three subplots of rainfall]
My time-series data looks like this; I'm only concerned with 'Reading' and 'Value':
Reading,Receive,Value,Unit,Quality
2018-04-07 13:09:28,2018-04-07 13:09:35,0.00,in,A
2018-04-07 06:01:25,2018-04-07 06:01:35,0.04,in,A
2018-04-07 04:38:15,2018-04-07 04:38:35,0.04,in,A
Here is how I achieved the correct tick scheme in the second plot:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.patches as patches
import datetime as dt
#read data from csv
data2 = pd.read_csv('Arroyo_Corte_Madera_del_Presidio_38021_Precipitation_Accumulation_0.txt', usecols=['Reading','Value'], parse_dates=['Reading'])
#set date as index
data2.set_index('Reading',inplace=True)
#plot data
ax2 = plt.subplot(3, 1, 2)
data2.plot(ax=ax2)
#set ticks every 12 hours
ax2.xaxis.set_major_locator(mdates.HourLocator(byhour=range(0,24,12)))
plt.xticks(rotation=0, ha='center')
#format date
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%b %d\n%H:%M:%S'))
ax2.legend().set_visible(False)
ax2.set_title('Accumulated Rainfall\nApril 5-7, 2018')
ax2.set_xlabel('')
ax2.set_ylabel('Inches Since Oct 1 2017')
ax2.set_ylim(17.5, 22)
arrow_date2 = mdates.datestr2num('04/07/2018 04:30:00')
start_date2 = mdates.datestr2num('04/07/2018 03:00:00')
end_date2 = mdates.datestr2num('04/07/2018 06:00:00')
text_date2 = mdates.datestr2num('04/07/2018 03:00:00')
ax2.axvspan(start_date2, end_date2, 0.86, 0.97, color='green', alpha=0.35)
ax2.annotate("Approximate time of\nSlope Failure", xy=(arrow_date2, 21.5), xycoords='data', xytext=(text_date2, 19), textcoords='data', arrowprops=dict(arrowstyle="->", connectionstyle="arc3"))
My code so far for the third subplot:
#read data from csv
data =pd.read_csv('Arroyo_Corte_Madera_del_Presidio_38021_Precipitation_Increment_0.txt', usecols=['Reading','Value'], parse_dates=['Reading'])
#set date as index
data.set_index('Reading',inplace=True)
resamp = data.resample('1H').sum().reset_index()
#plot data
ax3 = plt.subplot(3, 1, 3)
resamp.plot(kind='bar',ax=ax3, x='Reading', y='Value', width=0.9)
#set ticks every other hour
plt.xticks(ha='center')
for label in ax3.xaxis.get_ticklabels()[::2]:
    label.set_visible(False)
ax3.legend().set_visible(False)
ax3.set_title('Rainfall in Hours\nApril 6-7, 2018')
ax3.set_xlabel('')
ax3.set_ylabel('Precipitation Increment (in)')
plt.show()
How do I fix my code to make the axis labels plot in the way I want them to plot?

My code was wrong, obviously. When I resampled the data, I reset the index, which moved 'Reading' back into an ordinary column and replaced the index with integers; that interfered with my desired x values. Additionally, I shouldn't have passed x='Reading' to resamp.plot. This solution helped: Plotting with Pandas. Here is the corrected code:
#read data from csv
data = pd.read_csv('Arroyo_Corte_Madera_del_Presidio_38021_Precipitation_Increment_0.txt', usecols=['Reading','Value'], parse_dates=['Reading'])
#set date as index
data.set_index('Reading',inplace=True)
resamp = data.resample('1H').sum() # changed here
#plot data
ax3 = plt.subplot(3, 1, 3)
resamp.plot(ax=ax3, y='Value', kind='bar', width=0.9) # changed here
ax3.set_xticklabels([ts.strftime('%b %d\n%H:%M:%S') for ts in resamp.index])
plt.xticks(rotation=0, ha='center')
# Show only every 4th tick (one every 4 hours)
for i, tick in enumerate(ax3.xaxis.get_major_ticks()):
    if i % 4 != 0:
        tick.set_visible(False)
ax3.legend().set_visible(False)
ax3.set_title('Rainfall in Hours\nApril 6-7, 2018')
ax3.set_xlabel('')
ax3.set_ylabel('Precipitation Increment (in)')
ax3.set_ylim(0.00, 0.40)
plt.show()
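An alternative that keeps a real date axis, so the third subplot could reuse the same HourLocator and DateFormatter as the second: draw the bars with Matplotlib's ax.bar instead of pandas' kind='bar' (which places bars at integer positions). A sketch with made-up readings standing in for the Precipitation_Increment file; on a date axis a numeric bar width is measured in days, so 0.9 / 24 is 90% of an hour, and align='edge' makes each bar start at the beginning of its hourly bin.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Made-up irregular readings (inches) over April 6-7, 2018
rng = np.random.default_rng(0)
stamps = pd.Timestamp('2018-04-06') + pd.to_timedelta(np.sort(rng.uniform(0, 48, 200)), unit='h')
data = pd.DataFrame({'Value': rng.uniform(0, 0.02, 200)},
                    index=pd.DatetimeIndex(stamps, name='Reading'))

# Hourly totals, keeping the DatetimeIndex (no reset_index)
resamp = data.resample('1h').sum()

fig, ax3 = plt.subplots(figsize=(10, 3))
ax3.bar(resamp.index, resamp['Value'], width=0.9 / 24, align='edge')
# Same tick scheme as the second subplot
ax3.xaxis.set_major_locator(mdates.HourLocator(byhour=range(0, 24, 12)))
ax3.xaxis.set_major_formatter(mdates.DateFormatter('%b %d\n%H:%M:%S'))
ax3.set_ylabel('Precipitation Increment (in)')
```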
