I am trying to figure out how to calculate the number of datetime steps between two dates for a given frequency (1D, 3D, 3H, 15T).
For example:
freq = '3H'
start = datetime.datetime(2018, 8, 14, 9, 0, 0)
end = datetime.datetime(2018, 8, 15)
total = func(start, end, freq)
If freq = '3H', total would be 5.
If freq = '30T', total would be 30.
What would func look like?
EDIT
I'm leaving my original question above and adding details that I originally left out to keep things as simple as possible.
In the code I am working on, I have a pandas DataFrame with a DatetimeIndex. I needed to calculate the number of rows since a specific time (the start above). I thought about creating a DataFrame starting from that time, filling in the gaps, and counting the rows, but that seems silly now.
The function I ended up using (with the parsing) is this:
import datetime
import re

def periods(row, time_frame):
    start = datetime.datetime(2018, 8, 14, 9, 0, 0)
    end = row.name  # the row's index label, i.e. its timestamp
    t = time_frame[-1:]  # unit letter: 'H', 'T' or 'D'
    n = int(re.findall(r'\d+', time_frame)[0])  # numeric multiplier
    if t == 'H':  # compare strings with ==, not `is`
        freq = datetime.timedelta(hours=n)
    elif t == 'T':
        freq = datetime.timedelta(minutes=n)
    else:
        freq = datetime.timedelta(days=n)
    count = 0
    while start < end:
        start += freq
        count += 1
    return count
I call it on my DataFrame (candlesticks) like this:
candlesticks['n'] = candlesticks.apply(lambda x: periods(x, candlesticks.index.freqstr), axis=1)
Use the timedelta class from the datetime module. From there it is essentially the same as comparing numbers.
from datetime import timedelta
freq = timedelta(hours=3)
def periods(frequency, start, end):
    count = 0
    while start < end:
        start += frequency
        count += 1
    return count
p = periods(freq, start, end)
print(p)
>> 5
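The loop can also be replaced by a single division, since dividing one timedelta by another with `//` yields an integer. The following is a minimal sketch, not the poster's method: `parse_freq` and `count_periods` are hypothetical names, and the parser only understands the D, H and T unit letters used in the question.

```python
import re
from datetime import datetime, timedelta

# Hypothetical mapping from the question's unit letters to timedelta keywords.
_UNITS = {"D": "days", "H": "hours", "T": "minutes"}

def parse_freq(freq):
    """Turn a string such as '3H' or '30T' into a timedelta."""
    match = re.fullmatch(r"(\d*)([DHT])", freq)
    if match is None:
        raise ValueError("unsupported frequency: {!r}".format(freq))
    count = int(match.group(1) or 1)  # a bare 'H' means one hour
    return timedelta(**{_UNITS[match.group(2)]: count})

def count_periods(start, end, freq):
    """Number of steps of size freq needed to reach or pass end from start."""
    step = parse_freq(freq)
    if step <= timedelta(0):
        raise ValueError("frequency must be positive")
    if end <= start:
        return 0
    # Ceiling division: floor-divide the negated span, then negate again.
    return -((start - end) // step)

start = datetime(2018, 8, 14, 9, 0, 0)
end = datetime(2018, 8, 15)
print(count_periods(start, end, "3H"))   # 5
print(count_periods(start, end, "30T"))  # 30
```

The ceiling division matches the loop's behaviour when the span is not an exact multiple of the step: a partial final step still counts.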
I am working on recurring events in a calendar.
I have a base recurring event with start datetime = S and a recurring start datetime of S + N×Δt, where N is the Nth occurrence and Δt is the separation between each occurrence.
What is the most efficient way of finding all occurrences of this event over a specific datetime interval?
Example:
Event has an initial datetime (2021, 10, 29, 10, 0) and occurs at an interval of 10 days.
I want to solve for N such that the occurrence falls between 2022-05-06 00:00 and 2022-06-05 00:00, that is, datetime(2022, 5, 6, 0, 0) <= S + N×Δt < datetime(2022, 6, 5, 0, 0).
Can this be done in a more elegant way than iterating over each minute within that range and performing a Euclidean division?
I am using Python.
I figured it out, for anyone looking. This will give you the values of N over which you have to iterate to find the occurrences within the given range.
occurrence_start = int(input('Occurrence start: '))
occurrence_end = int(input('Occurrence end: '))
delta = occurrence_end - occurrence_start
i = int(input('Separation: '))
start = int(input('Interval start: '))
end = int(input('Interval end: '))
print('Range is between N={} and N={}'.format((start - occurrence_start + delta)//i, (end - occurrence_end - delta)//i))
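For the datetime version of the problem in the question, the bounds on N can be computed directly with ceiling division on timedeltas. This is a sketch under my own assumptions rather than a restatement of the snippet above: `occurrence_range` is a hypothetical name, the interval is treated as half-open (start included, end excluded) as in the question, and N is assumed to start at 0.

```python
from datetime import datetime, timedelta

def occurrence_range(first, step, lo, hi):
    """Return (n_min, n_max) with lo <= first + n * step < hi, or None if empty."""
    def ceil_div(a, b):
        # timedelta // timedelta floors, so negate twice to round up.
        return -((-a) // b)

    n_min = max(0, ceil_div(lo - first, step))
    n_max = ceil_div(hi - first, step) - 1
    if n_max < n_min:
        return None
    return n_min, n_max

first = datetime(2021, 10, 29, 10, 0)
step = timedelta(days=10)
bounds = occurrence_range(first, step, datetime(2022, 5, 6), datetime(2022, 6, 5))
print(bounds)  # (19, 21)
n_min, n_max = bounds
for n in range(n_min, n_max + 1):
    print(n, first + n * step)
```

Only the occurrences inside the interval are visited, so the cost depends on the number of hits and not on the length of the interval.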
I have a 10-year climatological dataset as follows.
dt T P
01-01-2010 3 0
02-01-2010 5 11
03-01-2010 10 50
....
31-12-2020 -1 0
I want to count, for each month, the total number of days on which T and P both stayed greater than 0 for three or more consecutive days.
I would like these columns as output:
month Number of days/DurationT&P>0 T P
I have never used loops in Python. I can write a simple loop, but nothing beyond that once the data first has to be grouped by month and year before the condition is applied. I would really appreciate any hints on how to construct the loop.
A = dataset
A['dt'] = pd.to_datetime(A['dt'], format='%d-%m-%Y')  # the dates above are day-month-year
for column in A[['P', 'T']]:
    for i in range(len('P')):
        if i > 0:
            P.value_counts()
            print(i)
    for j in range(len('T')):
        if i > 0:
            T.value_counts()
            print(j)
Here is a naive way to set it up by simply iterating over the rows:
df['valid'] = (df['T'] > 0) & (df['P'] > 0)
def count_total_days(df):
    i = 0  # length of the current run of valid days
    total = 0
    for idx, row in df.iterrows():
        if row.valid:
            i += 1
        else:
            if i >= 3:
                total += i
            i = 0
    # count a run that is still open when the data ends
    if i >= 3:
        total += i
    return total
Since you want it per month, first create month and year columns to group by:
df['month'] = df['dt'].dt.month
df['year'] = df['dt'].dt.year
for (month, year), df_subset in df.groupby(['month', 'year']):
    print(year, month, count_total_days(df_subset))
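The same run counting can be sketched without pandas using itertools.groupby, which may make the logic easier to follow. This is an illustration only: `days_in_long_runs` is a hypothetical name, the records are assumed to be sorted with one row per day, the sample values beyond the first three rows are made up, and runs are cut at month boundaries just as in the per-month loop above.

```python
from collections import defaultdict
from datetime import date
from itertools import groupby

def days_in_long_runs(records, min_run=3):
    """records: (date, T, P) tuples sorted by date, one per day.

    Returns {(year, month): days} counting only days that belong to a run of
    at least min_run consecutive rows with T > 0 and P > 0 inside that month.
    """
    totals = defaultdict(int)
    by_month = groupby(records, key=lambda r: (r[0].year, r[0].month))
    for year_month, month_rows in by_month:
        runs = groupby(month_rows, key=lambda r: r[1] > 0 and r[2] > 0)
        for valid, run in runs:
            length = sum(1 for _ in run)
            if valid and length >= min_run:
                totals[year_month] += length
    return dict(totals)

# Illustrative sample, not real measurements.
sample = [
    (date(2010, 1, 1), 3, 0),
    (date(2010, 1, 2), 5, 11),
    (date(2010, 1, 3), 10, 50),
    (date(2010, 1, 4), 2, 1),
    (date(2010, 1, 5), -1, 0),
    (date(2010, 2, 1), 4, 2),
    (date(2010, 2, 2), 6, 3),
]
print(days_in_long_runs(sample))  # {(2010, 1): 3}
```

Months without a qualifying run do not appear in the result; February above has only a two-day run.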
You can use resample and sum to count, for each month, the days on which the condition is true.
import pandas as pd
dt = ["01-01-2010", "01-02-2010","01-03-2010","01-04-2010", "03-01-2010",'12-31-2020']
t=[3,66,100,5,10,-1]
P=[0,77,200,11,50,0]
A=pd.DataFrame(list(zip(dt, t,P)),
columns =['dtx', 'T','P'])
A['dtx'] = pd.to_datetime(A['dtx'], format='%m-%d-%Y')
A['Mask']=A.dtx.diff().dt.days.ne(1).cumsum()
dict_freq=A['Mask'].value_counts().to_dict()
newdict = dict((k, v) for k, v in dict_freq.items() if v >= 3)
A=A[A['Mask'].isin(list(newdict.keys()))]
A['Mask']=(A['T'] >= 1) & (A['P'] >= 1)
df_summary=A.query('Mask').resample(rule='M',on='dtx')['Mask'].sum()
which produces:
2010-01-31 3
I'm trying to write a program that groups timelapse photos together based on their timestamps. The timelapse photos and random photos are in one folder.
For example, suppose the timestamp differences in seconds between each photo and the previous one are: 346, 850, 13, 14, 13, 14, 15, 12, 12, 13, 16, 11, 438.
You can make a reasonable guess that the timelapse began at 13 and ended at 11.
Right now I'm trying a hacky solution that compares the percentage difference with the previous one.
But there has to be a formula or algorithm to group timestamps by time difference, such as a rolling mean.
Am I overlooking a simple solution?
Thank you!
def cat_algo(folder):
    # Get a list with all the CR2 files in the folder we are processing
    file_list = folder_to_file_list(folder)
    # Extract the timestamp out of the CR2 file into a sorted dictionary
    cr2_timestamp = collections.OrderedDict()
    for file in file_list:
        cr2_timestamp[file] = return_date_from_raw(file)
        print str(file) + " - METADATA TIMESTAMP: " + \
            str(return_date_from_raw(file))
    # Loop over the dictionary to compare the timestamps and create a new dictionary with a suspected group number per shot
    # Make sure we know that there is no first file yet using this (can be refactored)
    item_count = 1
    group_count = 0
    cr2_category = collections.OrderedDict()
    # get item and the next item out of the sorted dictionary
    for item, nextitem in zip(cr2_timestamp.items(), cr2_timestamp.items()[1::]):
        # if not the first CR2 file
        if item_count >= 2:
            current_date_stamp = item[1]
            next_date_stamp = nextitem[1]
            delta_previous = current_date_stamp - previous_date_stamp
            delta_next = next_date_stamp - current_date_stamp
            try:
                difference_score = int(delta_next.total_seconds() /
                                       delta_previous.total_seconds() * 100)
                print "diffscore: " + str(difference_score)
            except ZeroDivisionError:
                print "zde"
            if delta_previous > datetime.timedelta(minutes=5):
                # if difference_score < 20:
                print item[0] + " - hit - " + str(delta_previous)
                group_count += 1
                cr2_category[item[0]] = group_count
            else:
                cr2_category[item[0]] = group_count
            # create an algo to come up with percentage difference and use this to label timelapses.
            print int(delta_previous.total_seconds())
            print int(delta_next.total_seconds())
            # Calculations done, make the current date stamp the previous datestamp for the next iteration
            previous_date_stamp = current_date_stamp
            # If time difference with previous over X make a dict with name:number, in the end everything which has the
            # same number 5+ times in a row can be assumed as a timelapse.
        else:
            # If it is the first date stamp, assign it the current one to be used in the next loop
            previous_date_stamp = item[1]
        # To help make sure this is not the first image in the sequence.
        item_count += 1
    print cr2_category
If you use itertools.groupby, using a function that returns True if the delay meets your criteria for timelapse photo regions, based on the list of delays, you can get the index of each such region. Basically, we're grouping on the True/False output of that function.
from itertools import groupby
# time differences given in original post
data = [346, 850, 13, 14, 13, 14, 15, 12, 12, 13, 16, 11, 438]
MAX_DELAY = 25 # timelapse regions will have a delay no larger than this
MIN_LENGTH = 3 # timelapse regions will have at least this many delays in a row
index = 0
for timelapse, g in groupby(data, lambda x: x <= MAX_DELAY):
    length = len(list(g))
    if timelapse and length >= MIN_LENGTH:
        print('timelapse index {}, length {}'.format(index, length))
    index += length
output:
timelapse index 2, length 10
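To apply this to actual photos rather than a ready-made list of delays, the delays have to be derived from the sorted timestamps and the result mapped back to photo positions. A possible sketch follows; `find_timelapses` is a hypothetical name, timestamps are assumed to be plain seconds, and the thresholds are the same guesses as above.

```python
from itertools import accumulate, groupby

def find_timelapses(timestamps, max_delay=25, min_photos=3):
    """Return (first_index, last_index) pairs of suspected timelapse runs.

    timestamps must be sorted; indices refer to positions in that list and
    both ends of each pair are inclusive.
    """
    # delays[i] is the gap between photo i and photo i + 1
    delays = [b - a for a, b in zip(timestamps, timestamps[1:])]
    groups = []
    index = 0
    for close, run in groupby(delays, key=lambda d: d <= max_delay):
        length = len(list(run))
        # a run of `length` short delays links `length + 1` photos
        if close and length + 1 >= min_photos:
            groups.append((index, index + length))
        index += length
    return groups

# Rebuild timestamps from the delays in the question, starting at 0 seconds.
data = [346, 850, 13, 14, 13, 14, 15, 12, 12, 13, 16, 11, 438]
timestamps = list(accumulate([0] + data))
print(find_timelapses(timestamps))  # [(2, 12)]
```

Note the off-by-one between delays and photos: ten short delays in a row correspond to eleven photos.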
I'm trying to build a function that receives a date and adds a number of days to it, rolling over the month and year when needed. So far I've come up with this:
def addnewDate(date, numberOfDays):
    date = date.split(":")
    day = int(date[0])
    month = int(date[1])
    year = int(date[2])
    new_days = 0
    l = 0
    l1 = 28
    l2 = 30
    l3 = 31
    #l's are the accordingly days of the month
    while numberOfDays > l:
        numberOfDays = numberOfDays - l
        if month != 12:
            month += 1
        else:
            month = 1
            year += 1
        if month in [1, 3, 5, 7, 8, 10, 12]:
            l = l3
        elif month in [4, 6, 9, 11]:
            l = l2
        else:
            l = l1
    return str(day) + ':' + str(month) + ':' + str(year)
    #i'll deal with fact that it doesn't put the 0's in the < 10 digits later
Desired output:
addnewDate('29:12:2016', 5):
'03:01:2017'
I think the problem is either with the variables or with where I'm using them, but I'm kind of lost.
Thanks in advance!
P.S. I can't use Python's built-in date functions :)
Since you cannot use the standard library, here's my attempt. I hope I did not forget anything.
- define a table of month lengths
- tweak it when a leap year is detected (every 4th year, except century years not divisible by 400)
- work on zero-indexed days and months, which is much easier
- add the number of days. If the result is less than the number of days in the current month, stop; otherwise subtract the current month's number of days and retry (while loop)
- when the last month is passed, increase the year
- add 1 to the day and the month at the end
code:
def is_leap(year):
    # Gregorian rule: every 4th year, except century years not divisible by 400
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def addnewDate(date, numberOfDays):
    month_days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    date = date.split(":")
    # work on zero-indexed days and months
    day = int(date[0]) - 1
    month = int(date[1]) - 1
    year = int(date[2])
    day += numberOfDays
    done = False  # since you don't want to use break, let's create a flag
    while not done:
        nb_days_month = month_days[month]
        if month == 1 and is_leap(year):
            nb_days_month += 1  # February of a leap year
        if day < nb_days_month:
            done = True
        else:
            day -= nb_days_month
            month += 1
            if month == 12:
                year += 1
                month = 0
    return "{:02}:{:02}:{:04}".format(day + 1, month + 1, year)
test (may not be exhaustive):
for i in ("28:02:2000", "28:02:2004", "28:02:2005", "31:12:2012", "03:02:2015"):
    print(addnewDate(i, 2))
    print(addnewDate(i, 31))
result:
01:03:2000
30:03:2000
01:03:2004
30:03:2004
02:03:2005
31:03:2005
02:01:2013
31:01:2013
05:02:2015
06:03:2015
Of course, this is just for fun. Otherwise, use the time or datetime modules!
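For comparison, when the standard library is allowed, the whole task reduces to parsing, adding a timedelta and formatting. A minimal sketch, where `add_days` is a hypothetical name and the day:month:year format is taken from the question:

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%d:%m:%Y"  # day:month:year, as used in the question

def add_days(date_text, number_of_days):
    """Parse 'dd:mm:yyyy', add the given number of days, and format it back."""
    parsed = datetime.strptime(date_text, DATE_FORMAT)
    return (parsed + timedelta(days=number_of_days)).strftime(DATE_FORMAT)

print(add_days("29:12:2016", 5))  # 03:01:2017
```

This can also serve as a reference implementation for checking a hand-written version against many inputs.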
This is the same problem as "Find day difference between two dates (excluding weekend days)", but that question is for JavaScript. How can I do this in Python?
Try it with scikits.timeseries:
import scikits.timeseries as ts
import datetime
a = datetime.datetime(2011,8,1)
b = datetime.datetime(2011,8,29)
diff_business_days = ts.Date('B', b) - ts.Date('B', a)
# returns 20
or with dateutil:
import datetime
from dateutil import rrule
a = datetime.datetime(2011,8,1)
b = datetime.datetime(2011,8,29)
diff_business_days = len(list(rrule.rrule(rrule.DAILY,
                                          dtstart=a,
                                          until=b - datetime.timedelta(days=1),
                                          byweekday=(rrule.MO, rrule.TU, rrule.WE, rrule.TH, rrule.FR))))
scikits.timeseries looks deprecated: http://pytseries.sourceforge.net/
With pandas you can instead do:
import datetime
import pandas as pd
a = datetime.datetime(2015, 10, 1)
b = datetime.datetime(2015, 10, 29)
diff_calendar_days = pd.date_range(a, b).size  # both endpoints are counted
diff_business_days = pd.bdate_range(a, b).size  # both endpoints are counted
I'm not sure this is the best solution, but it works for me:
from datetime import datetime, timedelta
startDate = datetime(2011, 7, 7)
endDate = datetime(2011, 10, 7)
dayDelta = timedelta(days=1)
diff = 0
while startDate < endDate:  # `<` instead of `!=` so the loop ends even if the dates are swapped
    if startDate.weekday() not in [5, 6]:  # 5 = Saturday, 6 = Sunday
        diff += 1
    startDate += dayDelta
Here's an O(1) solution that uses only the Python standard library.
Its running time is constant regardless of the length of the interval, and it doesn't care about argument order.
import math
from datetime import date, timedelta

#
# by default, the last date is not inclusive
#
def workdaycount(first, second, inc=0):
    if first == second:
        return 0
    if first > second:
        first, second = second, first
    if inc:
        second += timedelta(days=1)
    interval = (second - first).days
    weekspan = int(math.ceil(interval / 7.0))
    if interval % 7 == 0:
        return interval - weekspan * 2
    else:
        wdf = first.weekday()
        if (wdf < 6) and ((interval + wdf) // 7 == weekspan):
            modifier = 0
        elif (wdf == 6) or ((interval + wdf + 1) // 7 == weekspan):
            modifier = 1
        else:
            modifier = 2
        return interval - (2 * weekspan - modifier)
#
# sample usage
#
print(workdaycount(date(2011, 8, 15), date(2011, 8, 22)))  # returns 5
print(workdaycount(date(2011, 8, 15), date(2011, 8, 22), 1))  # last date inclusive, returns 6
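Another way to get the same constant-time behaviour, sketched here as an alternative and not as a rewrite of the function above: every full week contributes five weekdays, and only the remaining zero to six days need to be inspected. `business_days` is a hypothetical name and the end date is excluded.

```python
from datetime import date

def business_days(start, end):
    """Count Monday-to-Friday days in the half-open range [start, end)."""
    if start > end:
        start, end = end, start
    full_weeks, extra = divmod((end - start).days, 7)
    count = full_weeks * 5  # every full week has exactly five weekdays
    first_weekday = start.weekday()  # Monday is 0, Sunday is 6
    # At most six leftover days, so this loop is bounded by a constant.
    for offset in range(extra):
        if (first_weekday + offset) % 7 < 5:
            count += 1
    return count

print(business_days(date(2011, 8, 15), date(2011, 8, 22)))  # 5
print(business_days(date(2011, 8, 1), date(2011, 8, 29)))   # 20
```

To include the end date, pass the following day as `end`.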