Plotting time series dataframe in python

I am having a really hard time plotting a time series from a data frame in Python. The column data types are below.
Time_split datetime64[ns]
Total_S4_Sig1 float64
The Time_split column is the X axis (the time variable). Total_S4_Sig1 is the Y variable and is a float.
0 15:21:00
1 15:22:00
2 15:23:00
3 15:24:00
4 15:25:00
5 19:29:00
6 19:30:00
7 19:31:00
8 19:32:00
9 19:33:00
Please note that the time series will never have a fractional part in the seconds, i.e. the seconds will always be 00, and the data will be continuous, i.e. minute-by-minute data.
The data will NOT NECESSARILY start at a whole hour. It could start at any time, for example 15:35. I want to create a graph where the major ticks on the X axis fall on full hours like 19:00, 21:00, 22:00 and the minor ticks fall on the half hours, i.e. 19:30, 21:30. I do not want the seconds part of the time to be shown, as it is useless.
All I want is to show hour and minute in HH:MM format, with major ticks at whole hours and minor ticks at half hours.
keydata["Time_split"] = keydata["Time_split"].dt.time
keydata.plot(x='Time_split', y='Total_S4_Sig1')
plt.show()
This code leads to such a plot.
I do not want the seconds to be shown and I want the marking at full hours and minor markings at half hours.
keydata["Time_split"] = keydata["Time_split"].dt.time
time_form = mdates.DateFormatter("%H:%M")
ax = keydata.plot(x='Time_split', y='Total_S4_Sig1')
ax.xaxis.set_major_formatter(time_form)
plt.show()
This code leads to such a plot.
Please be advised the seconds will always be 00

Try using matplotlib date formatting
import matplotlib.dates as mdates
date_fmt = mdates.DateFormatter('%H:%M')  # hours and minutes only, no seconds
# plot your data
ax = df.plot.line(x='time', y='values')
# add the date formatter as the x axis tick formatter
ax.xaxis.set_major_formatter(date_fmt)

The following should address the problems you are facing:
import pandas as pd
from datetime import date, datetime, timedelta
import matplotlib.pyplot as plt
import matplotlib.dates as md
import numpy as np
#testing data
#keydata = pd.read_csv('test.txt',sep='\t',header=None,names=['Time_split','Total_S4_Sig1'])
x = pd.to_datetime(keydata['Time_split'])
y = keydata['Total_S4_Sig1']
# plot
fig, ax = plt.subplots(1, 1)
ax.plot(x, y,'ok')
# Format xtick labels as hour : minutes
xformatter = md.DateFormatter('%H:%M')
## Set xtick labels to appear every 1 hours
ax.xaxis.set_major_locator(md.HourLocator(interval=1))
#set minor ticks every 1/2 hour
ax.xaxis.set_minor_locator(md.MinuteLocator(byminute=[0,30],interval=1))
ax.xaxis.set_major_formatter(xformatter)
plt.show()
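As a self-contained sketch of the same idea: the synthetic keydata frame and the headless Agg backend below are assumptions for illustration, not the asker's data. The x values stay as full datetimes instead of being converted with .dt.time, so matplotlib's date locators and formatters have real dates to work with.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Hypothetical stand-in for keydata: minute-wise data that starts at 15:35
times = pd.date_range("2021-01-01 15:35", periods=300, freq="min")
keydata = pd.DataFrame({"Time_split": times,
                        "Total_S4_Sig1": np.linspace(0.0, 1.0, len(times))})

fig, ax = plt.subplots()
ax.plot(keydata["Time_split"], keydata["Total_S4_Sig1"])

# Major ticks on whole hours, minor ticks on the half hours
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
ax.xaxis.set_minor_locator(mdates.MinuteLocator(byminute=[30]))
# Label the major ticks as HH:MM, without seconds
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

fig.canvas.draw()  # use plt.show() instead for an interactive window
```

Using byminute=[30] for the minor locator, rather than [0, 30], keeps the minor ticks from duplicating the major ticks on the whole hours.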

Related

mplfinance xaxis ticks every n minutes, but not if datapoint doesn't exist

Checking I am not re-inventing the wheel here, with mplfinance I would like to have ticks on the x-axis every 15 minutes, but only if data exists.
Plotting directly with mplfinance (without returnfig=True), the plot is fine except for the x-axis: the tick values are not aligned to round times, because the times used depend on the first element of the dataframe.
To try to place a grid line/tick at fixed intervals, I have the code below, which works when the expected index values are all present in my dataset:
start_date = pd.to_datetime('2021-12-21 04:00').tz_localize('America/New_York')
end_date = pd.to_datetime('2021-12-21 19:55').tz_localize('America/New_York')
df5trimmed = df5.truncate(before=start_date, after=end_date)
ticks = pd.date_range(start_date, end_date, freq='15T')
ticklocations = [ df5trimmed.index.get_loc(tick) for tick in ticks ]
ticklabels = [ tick.time().strftime('%H:%M') for tick in ticks ]
fig, axlist = mpf.plot(df5trimmed,style='yahoo', addplot=plotsToAdd, figsize=(48,24),
type='candlestick', volume=True, xrotation=0,
tight_layout=True, returnfig=True)
axlist[-2].xaxis.set_ticks(ticklocations)
axlist[-2].set_xticklabels(ticklabels)
However, it blows up, as expected, with an "index not found" error inside the list comprehension [df5trimmed.index.get_loc(tick) for tick in ticks] when no data exists in the dataframe for that tick.
Notice the discontinuities in the data below; it blows up on the attempt to access the 17:00 key, which doesn't exist in my data:
Essentially I am looking to plot the lines aligned to n minutes (in the example below 15 minutes), but only if the data point exists and not otherwise (if it doesn't exist, I am OK with the bars being right next to one another). In summary, during regular trading hours with liquidity (where there would be data points) there would be ticks at 08:15, 08:30, and so on.
Is there an argument in mplfinance that can do this?
What I am looking to achieve
The below is from tradingview, note the aligned time ticks every 15 minutes during regular trading hours and pretty much the entire plot.
Additional Info - source data and what is plotted
The below uses this csv data, and plots directly to mplfinance, you can see the time ticks are not aligned to the hour I get 04:00, 06:25, 08:10, 09:50 etc:
import mplfinance as mpf
import pandas as pd
import datetime
df5trimmed = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
fig, axlist = mpf.plot(df5trimmed,style='yahoo', figsize=(48,24), type='candlestick', volume=True, xrotation=0, tight_layout=True, returnfig=True)
# Display:
mpf.show()

I used the following code to add the ticks, but only if they exist. I suspect there are much nicer ways of writing this, so I am open to alternative answers.
My next iteration will be to draw the labels with 0-degree rotation, without clutter when one label would overlap another.
import mplfinance as mpf
import pandas as pd
import datetime
def getTimestampTickFrequency(df):
    # get most common interval between consecutive rows, in minutes
    mode = int(df.index.to_series().diff().dt.total_seconds().div(60).round().mode()[0])
    if mode==5:
        return 15, 3 # for 5 minutes, tick every 15 mins
    elif mode==15:
        return 60, 12 # for 15 minute data, tick every hour
    elif mode==120:
        return 240, 48 # for hourly data, tick every 2 hours
    return mode, 1 # fallback: tick on every sample (the caller unpacks two values)
def getTickLocationsAndLabels(df):
    tickLocations = []
    tickLabels = []
    tickFrequencyMinutes, samplesBetweenTicks = getTimestampTickFrequency(df)
    entireTickRange = pd.date_range(start=df.index[0], end=df.index[-1], freq=f'{tickFrequencyMinutes}T')
    prevTimestamp = df.index[0]
    # get indexes of data frame that match the ticks, if they exist
    for tick in entireTickRange:
        print(tick)
        try:
            found = df.index.get_loc(tick)
            currentTimestamp = df.index[found]
            timestampDifference = (currentTimestamp - prevTimestamp).total_seconds() / 60
            print(f'Delta last time stamp = {timestampDifference}')
            #if timestampDifference <= tickFrequencyMinutes:
            tickLocations.append(found)
            tickLabels.append(tick.time().strftime('%H:%M'))
            prevTimestamp = currentTimestamp
        except KeyError:
            pass # ignore ticks that have no matching bar
    return tickLocations, tickLabels
df5trimmed = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
tickLocations, tickLabels = getTickLocationsAndLabels(df5trimmed)
fig, axlist = mpf.plot(df5trimmed,style='yahoo', figsize=(48,24), type='candlestick', volume=True, tight_layout=True, returnfig=True)
axlist[-2].xaxis.set_ticks(tickLocations)
axlist[-2].set_xticklabels(tickLabels)
# Display:
mpf.show()
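One possibly tidier way to do the "only if it exists" lookup is pandas' Index.get_indexer, which returns -1 for labels that are missing instead of raising. This is only a sketch on a made-up 5-minute index with a gap (the frame, its Close column, and the variable names are hypothetical); the resulting positions and labels would be passed to set_ticks and set_xticklabels exactly as in the code above.

```python
import pandas as pd

# Hypothetical 5-minute bars with a gap: nothing between 16:55 and 17:15
idx = pd.date_range("2021-12-21 16:30", "2021-12-21 17:30", freq="5min")
gap = (idx >= pd.Timestamp("2021-12-21 17:00")) & (idx <= pd.Timestamp("2021-12-21 17:10"))
df = pd.DataFrame({"Close": range(int((~gap).sum()))}, index=idx[~gap])

# Candidate ticks every 15 minutes across the data range
ticks = pd.date_range(df.index[0].floor("15min"), df.index[-1], freq="15min")

# get_indexer gives the integer row position of each tick, or -1 if it is absent
positions = df.index.get_indexer(ticks)
present = positions >= 0

tick_locations = positions[present].tolist()
tick_labels = ticks[present].strftime("%H:%M").tolist()
print(tick_locations, tick_labels)
```

Because mplfinance plots against integer row positions when show_nontrading=False, the positions can be used as tick locations directly. Note that get_indexer requires a unique index.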

So, what you need is to remove the gaps in the plot caused by missing bars:
"Essentially I am looking to plot the lines aligned to n minutes (in the example below 15 minutes), but only if it exists and not otherwise"
From the code you posted to try, but zooming in the first 30 bars and adding the option show_nontrading=True:
import mplfinance as mpf
import pandas as pd
import datetime
df5trimmed = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
fig, axlist = mpf.plot(df5trimmed[0:30],
style='yahoo',
figsize=(48,24),
type='candle',
volume=True,
xrotation=0,
tight_layout=True,
returnfig=True,
show_nontrading=True)
# Display:
mpf.show()
I get this, which is showing the gaps you mentioned.
But if I set the option show_nontrading=False, this changes to the plot below, which removes the gaps corresponding to the missing bars.
Isn't this what you needed?

Please check if this solves your issue. I think it does.
Check the range between 6:00 AM and 7:00 AM. Only a few bars are plotted between 5:00 and 6:00, and 6:30 is missing.
import mplfinance as mpf
import pandas as pd
import datetime
import matplotlib.pyplot as plt
%matplotlib inline
# DATA PREPARATION
df5 = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
start_date = pd.to_datetime('2021-12-21 1:00').tz_localize('America/New_York')
end_date = pd.to_datetime('2021-12-21 7:00').tz_localize('America/New_York')
df5trimmed = df5.truncate(before=start_date, after=end_date)
# PLOTTING
fig, axlist= mpf.plot(df5trimmed,
style='yahoo',
type='candlestick',
volume=True,
xrotation=0,
figsize=(20,10),
returnfig=True,
show_nontrading=False)
# x-tick labeling preparation
# the frequency can be adjusted but needs to be lower than that of the original data
idx = pd.date_range(start_date, end_date, freq='30T', tz='America/New_York')
df_label_idx = pd.DataFrame(index=idx)
# this merge does the trick: the output is the intersection between the lower frequency and the
# higher freq time series. The inner option leaves in the output only those rows present in both TS
# dropping from the lower freq TS those missing periods in the higher freq TS.
df_label = pd.merge(df_label_idx, df5trimmed, how='inner', left_index=True, right_index=True ).tz_convert('America/New_York')
# Tick labels are generated based on df_label
tick_labels = list(df_label.index.strftime('%H:%M'))
ticklocations = [df5trimmed.index.get_loc(tick) for tick in df_label.index ]
axlist[-2].xaxis.set_ticks(ticklocations)
axlist[-2].set_xticklabels(tick_labels)
mpf.show()
print(tick_labels)
df_label
df5trimmed['2021-12-21 05:00:00-05:00':'2021-12-21 07:00:00-05:00']

How do I plot a graph with x_lims between time h1:m1 and h2:m2

I'm working on a project with loads of temperature data and I'm currently processing and plotting all of my data. However, I keep falling foul when I try to set x_lims on my plots between a time1 (9:00) and time2 (21:00)
Data background:
The sensor has collected data every second for two weeks and I've split the main data file into smaller daily files (e.g. dayX). Each day contains a timestamp (column = 'timeStamp') and a mean temperature (column = 'meanT').
The data for each day has been presliced just slightly over the window I want to plot (i.e. dayX contains data from 8:55:00 - 21:05:00). The dataset contains NaN values at some points as the sensors were not worn and data needed to be discarded.
Goal:
What I want to do is to be able to plot the dayX data between a set time interval (x_lim = 9:00 - 21:00). As I have many days of data, I eventually want to plot each day using the same x axis (I want them as separate figures however, not subplots), but each day has different gaps in the main data set, so I want to set constant x lims. As I have many different days of data, I'd rather not have to specify the date as well as the time.
Example data:
dayX =
timeStamp meanT
2018-05-10 08:55:00 NaN
. .
. .
. .
2018-05-10 18:20:00 32.4
. .
. .
. .
2018-05-10 21:05:00 32.0
What I've tried:
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib.dates import date2num, DateFormatter
dayX = pd.read_csv('path/to/file/dayX.csv')
dayX['timeStamp'] = pd.to_datetime(dayX['timeStamp'], format='%Y-%m-%d %H:%M:%S')
fig, ax1 = plt.subplots(1,1)
ax1.plot(dayX['timeStamp'], dayX['meanT'])
ax1.xaxis.set_major_formatter(DateFormatter('%H:%M'))
ax1.set_xlim(pd.Timestamp('9:00'), pd.Timestamp('21:00'))
fig.autofmt_xdate()
plt.show()
Which gives:
If I remove the limit line however, the data plots okay, but the limits are automatically selected
# Get rid of this line:
ax1.set_xlim(pd.Timestamp('9:00'), pd.Timestamp('21:00'))
# Get this:
I'm really not sure why this is going wrong or what else I should be trying.

Your timeStamp is a datetime object. All you have to do is pass datetime objects as the limits.
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib.dates import date2num, DateFormatter
dayX = df
dayX['timeStamp'] = pd.to_datetime(dayX['timeStamp'], format='%Y-%m-%d %H:%M:%S')
fig, ax1 = plt.subplots(1,1)
ax1.plot(dayX['timeStamp'], dayX['meanT'])
ax1.xaxis.set_major_formatter(DateFormatter('%H:%M'))
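# Build the limits on the same calendar day as the data. normalize() resets the
# time to midnight, so the limits are exactly 09:00 and 21:00 (not 09:55 and 21:55).
day_start = dayX['timeStamp'].min().normalize()
ax1.set_xlim(day_start + pd.Timedelta(hours=9), day_start + pd.Timedelta(hours=21))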
fig.autofmt_xdate()
plt.show()
Output:

You probably need to construct a full timestamp object, since pd.Timestamp('9:00') defaults to today's date, which has no data in your case. The following snippet should replace the ax1.set_xlim line in your code; it should also work for multi-day time ranges that start and end on specific hours of your choosing.
min_h = 9 # hours
max_h = 21 # hours
start = dayX['timeStamp'].min()
end = dayX['timeStamp'].max()
xmin = pd.Timestamp(year=start.year, month=start.month, day=start.day, hour=min_h)
xmax = pd.Timestamp(year=end.year, month=end.month, day=end.day, hour=max_h)
ax1.set_xlim(xmin, xmax)
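Since the goal is to reuse the same 9:00 to 21:00 window for many daily files without typing the date, the same logic could be wrapped in a small helper. This is a minimal sketch: day_window and its parameters are hypothetical names, and it assumes each file covers a single calendar day.

```python
import pandas as pd

def day_window(timestamps, start_hour=9, end_hour=21):
    """Return (xmin, xmax) on the calendar day of the earliest timestamp."""
    # normalize() drops the time of day, leaving midnight of that date
    day = pd.Series(pd.to_datetime(timestamps)).min().normalize()
    return day + pd.Timedelta(hours=start_hour), day + pd.Timedelta(hours=end_hour)

# Example with two timestamps shaped like the dayX data
xmin, xmax = day_window(["2018-05-10 08:55:00", "2018-05-10 21:05:00"])
print(xmin, xmax)
```

The result can be passed straight to ax1.set_xlim(xmin, xmax) for each day's figure.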

Using matplotlib to plot a distribution of time occurrences. I would like the x axis to have hours (12:00 PM) rather than integers (12)

Here's my plot, which is generated using the following code:
bins = np.linspace(0,24,25)
plt.hist(hours,bins, edgecolor='black', linewidth = 1.2, color = 'red')
I would like the x axis to show 24 entries, from 12:00AM to 11:00 PM ideally rotated left 90 degrees.
I see two paths: convert the actual data to time values so the histogram reads in time values or simply add a custom x axis with 12:00AM, 1:00 AM, etc. What's the easiest / cleanest approach here? I'm not familiar with how to do either. For reference, "hours" is a int64 array.

Here's a working example:
import numpy as np
import matplotlib.pyplot as plt
bins = np.arange(0,25)
hours = np.random.rand(50)*24  # random sample hours in [0, 24)
fig, ax = plt.subplots()
labels = []
for i in bins:
    if i % 24 == 0:
        labels.append("12:00AM")  # midnight (hour 0 and hour 24)
    elif i < 12:
        labels.append("{}:00AM".format(i))
    elif i == 12:
        labels.append("12:00PM")
    else:
        labels.append("{}:00PM".format(i-12))
ax.hist(hours, bins)
ax.set_xticks(bins + 0.5) # 0.5 is half of the "1" auto width
ax.set_xticklabels(labels, rotation='vertical')
fig.subplots_adjust(bottom = 0.2) # makes space for the vertical labels
plt.show()
which gives:
I've changed the linspace to arange as it returns integers

To get a nice time format on the xaxis, the idea could be to calculate the histogram in terms of numbers which can be interpreted as datetimes.
If you only have times, the actual date does not matter much, so dividing the data by 24 gives fractions of a day. Matplotlib interprets numbers as days since an epoch: 0001-01-01 UTC, plus 1, in versions before 3.3, and 1970-01-01 by default from 3.3 on. With the old epoch one needs to add some whole number >= 2 so as not to run into trouble with out-of-range dates; with the newer epoch the offset is harmless.
Then the usual matplotlib.dates locators and formatters can be used to get nice tick labels. "%I:%M %p" gives the time on a 12-hour clock with an AM/PM suffix.
import numpy as np; np.random.seed(3)
import matplotlib.pyplot as plt
import matplotlib.dates
data = np.random.normal(12,7, size=200)
data = data[(data >=0) & (data <24)]
f = lambda x: 2+x/24.
bins=np.arange(25)
plt.hist(f(data), bins=f(bins))
plt.gca().xaxis.set_major_locator(matplotlib.dates.HourLocator())
plt.gca().xaxis.set_major_formatter(matplotlib.dates.DateFormatter("%I:%M %p"))
plt.setp(plt.gca().get_xticklabels(),rotation=90)
plt.tight_layout()
plt.show()
(This is hence a histogram of datetimes on a single day: the 2nd of January 0001 in matplotlib versions before 3.3, where that epoch applies.)
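To see what the mapping does to individual values, here is a small sketch. The helper name hour_to_daynum is hypothetical; num2date is matplotlib's own converter. The date part of the result depends on the matplotlib version's epoch, but the time of day does not, which is all the formatter displays here.

```python
import matplotlib.dates as mdates

def hour_to_daynum(hour, day_offset=2):
    """Map an hour of day (may be fractional) to a matplotlib date number."""
    return day_offset + hour / 24.0

# Convert a few hours back to datetimes and keep only the time of day
labels = [mdates.num2date(hour_to_daynum(h)).strftime("%H:%M")
          for h in (0, 6.5, 13, 23)]
print(labels)
```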

matplotlib only business days without weekends on x-axis with plot_date

I have the following persistent problem:
The following code should draw a straight line:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
d = pd.date_range(start="1/1/2012", end="2/1/2012", freq="B")
v = np.linspace(1,10,len(d))
plt.plot_date(d,v,"-")
But all I get is a jagged line because "plot_date" somehow fills up the dates in "d" with the weekends.
Is there a way to force matplotlib to take my dates (only business days) as is, without filling them up with weekend dates?
>>>d
DatetimeIndex(['2012-01-02', '2012-01-03', '2012-01-04', '2012-01-05',
'2012-01-06', '2012-01-09', '2012-01-10', '2012-01-11',
'2012-01-12', '2012-01-13', '2012-01-16', '2012-01-17',
'2012-01-18', '2012-01-19', '2012-01-20', '2012-01-23',
'2012-01-24', '2012-01-25', '2012-01-26', '2012-01-27',
'2012-01-30', '2012-01-31', '2012-02-01'],
dtype='datetime64[ns]', freq='B')

plot_date does a trick: it converts dates to day numbers (days since 1-1-1 in older matplotlib versions) and plots against those numbers, then converts the ticks back to dates in order to draw nice tick labels. So with plot_date each calendar day counts as 1, business day or not.
You can plot your data against a uniform range of numbers but if you want dates as tick labels you need to do it yourself.
d = pd.date_range(start="1/1/2012", end="2/1/2012", freq="B")
v = np.linspace(1,10,len(d))
plt.plot(range(d.size), v)
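# keep only the automatic ticks that fall on an existing integer position in d
xticks = [int(x) for x in plt.xticks()[0] if x == int(x) and 0 <= x < d.size]
xticklabels = [d[x].strftime('%Y-%m-%d') for x in xticks]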
plt.xticks(xticks, xticklabels)
plt.autoscale(True, axis='x', tight=True)
But be aware that the labels can be misleading. The segment between 2012-01-02 and 2012-01-09 represents five days, not seven.
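A variation on the same idea, sketched here under the assumption of a headless Agg backend, is to leave the tick positions to matplotlib and only translate them with a FuncFormatter, so the labels stay correct when the view changes. The function name format_position is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import numpy as np
import pandas as pd

d = pd.date_range(start="2012-01-01", end="2012-02-01", freq="B")
v = np.linspace(1, 10, len(d))

def format_position(x, pos=None):
    """Translate an integer x position into the business date stored at that index."""
    i = int(round(x))
    if abs(x - i) > 1e-9 or not 0 <= i < len(d):
        return ""  # no label for fractional or out-of-range positions
    return d[i].strftime("%Y-%m-%d")

fig, ax = plt.subplots()
ax.plot(np.arange(len(d)), v)  # evenly spaced x values, so the line is straight
ax.xaxis.set_major_formatter(FuncFormatter(format_position))
fig.autofmt_xdate()
```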

Plotting Pandas Datetime Timeseries in AM/PM format

I have a pandas series with Timestamp indices that I'd like to plot.
print(example.head())
2015-08-11 20:07:00-05:00 26
2015-08-11 20:08:00-05:00 66
2015-08-11 20:09:00-05:00 71
2015-08-11 20:10:00-05:00 63
2015-08-11 20:11:00-05:00 73
But when I plot it in pandas with:
plt.figure(figsize = (15,8))
cubs1m.plot(kind='area')
I'd like the values on the y-axis to show up in AM/PM format (8:08PM), not military time(20:08). Is there an easy way to do this?
And also, how would I control # of ticks and # of labels plotting with pandas?
Thanks in advance.

Your question has two elements:
How to control # of ticks/labels on a plot
How to change 24-hour time to 12-hour time
Axes methods set_xticks, set_yticks, set_xticklabels, and set_yticklabels control the ticks and the labels:
import matplotlib.pyplot as plt
plt.plot(range(10), range(10))
plt.gca().set_xticks(range(0,10,2))
plt.gca().set_xticklabels(['a{}'.format(ii) for ii in range(0,10,2)])
To change the time format, use strftime with the 12-hour directives %I and %p (see: How can I convert 24 hour time to 12 hour time?)
import pandas as pd
data = pd.Series(range(12), index=pd.date_range('2016-2-3 9:00','2016-2-3 20:00', freq='H'))
ax = data.plot(xticks=data.index[::2])
ax.set_xticklabels(data.index[::2].strftime('%I %p'))
This question covers an alternate approach to plotting with dates: Pandas timeseries plot setting x-axis major and minor ticks and labels
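As a rough sketch of that locator/formatter approach: this plots with matplotlib directly rather than through Series.plot, and assumes a headless Agg backend and made-up hourly data. The locator controls how many ticks appear and the formatter controls the 12-hour labels.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Hypothetical hourly series from 09:00 to 20:00
index = pd.date_range("2016-02-03 09:00", periods=12, freq="60min")
data = pd.Series(range(12), index=index)

fig, ax = plt.subplots()
ax.plot(data.index, data.values)

# One tick on every even hour; change the step to show more or fewer ticks
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=range(0, 24, 2)))
# 12-hour clock with AM/PM instead of 24-hour time
ax.xaxis.set_major_formatter(mdates.DateFormatter("%I:%M %p"))

fig.canvas.draw()  # use plt.show() instead for an interactive window
```

For a timezone-aware index such as the one in the question, DateFormatter also accepts a tz argument so the labels are rendered in local time.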
