I have a start date, an end date, and a dataframe with daily observations. The problem is that I can't find a way to select dates at a 3-month interval.
For example:
2003-01-03 + 3 months = 2003-04-03, and so on.
The output should consist of 20 rows, because it covers 5 years at a 3-month interval, including the start and end dates.
EDIT: Old solution didn't work for all cases. Therefore a new one:
start, end = returns.index[0], returns.index[-1]
# number of whole months between the first and the last observation
length = (end.year - start.year) * 12 + (end.month - start.month)
if length % 3 == 0 and end.day >= start.day:
    length += 3
new_index = []
for m in range(3, length, 3):
    # use a zero-based month inside divmod so that December does not come out as month 0
    ydelta, month = divmod(start.month + m - 1, 12)
    day = pd.Timestamp(year=start.year + ydelta, month=month + 1, day=1)
    # clamp the day to the length of the target month
    day += pd.Timedelta(f'{min(start.day, day.days_in_month) - 1}d')
    new_index.append(day)
new_index = pd.DatetimeIndex(new_index)
returns = returns.loc[new_index]
Another version which has some slight inaccuracies around the month ends but is more compact:
add_3_months = pd.tseries.offsets.DateOffset(months=3)
new_index = pd.date_range(returns.index[0] + add_3_months,
                          returns.index[-1],
                          freq=add_3_months)
returns = returns.loc[new_index]
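One possible way to sidestep month-end trouble is to compute every step from the original start date rather than from the previous step, since adding a `pd.DateOffset(months=n)` to a timestamp clamps the day to the end of a shorter month. The following is a minimal sketch under that assumption; the synthetic `returns` frame and the names `steps` and `quarterly` are made up for illustration, and it assumes every computed date actually exists in the index.

```python
import pandas as pd

# Hypothetical daily data standing in for `returns`
idx = pd.date_range("2003-01-31", "2004-01-31", freq="D")
returns = pd.DataFrame({"r": range(len(idx))}, index=idx)

start, end = returns.index[0], returns.index[-1]

# Always offset from `start`, so a clamped day (e.g. April 30)
# does not carry over into the following steps.
steps = [start]
k = 1
while start + pd.DateOffset(months=3 * k) <= end:
    steps.append(start + pd.DateOffset(months=3 * k))
    k += 1

new_index = pd.DatetimeIndex(steps)
quarterly = returns.loc[new_index]
print(quarterly)
```

Unlike the snippets above, this sketch keeps the start date itself in the result.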
I understand that relativedelta can be used for exact month calculations that take the number of days in each month into account.
Are there any streamlined libraries for simply adding month numbers together?
I.e. December is 12. Adding 2 months gives 14, which should wrap around to 2 == February.
I am hoping to cover all the edge cases of this problem with a tested library.
I have thought of doing a modulo calculation along the following lines:
curr_month = 12 - 1 (-1 for 0 indexing) = 11
if we do divmod(curr_month, 11) , we get (0,0) but the result in reality should be 11. If I just handled that with an if result[0] == 0: curr_month = 11, then whenever result[1] we will get the wrong answer
This formula should work for you
>>> current_month = 12
>>> delta = 2
>>> (((current_month - 1) + delta) % 12) + 1
2
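If the year rollover is also needed, the same zero-based trick can be combined with `divmod`. This is only a sketch; the function name `add_months` and its signature are made up for illustration and do not come from any library.

```python
def add_months(year, month, delta):
    """Return (year, month) after adding `delta` months; `month` is 1..12."""
    # Shift to a zero-based month so the remainder lands in 0..11
    year_carry, month_index = divmod((month - 1) + delta, 12)
    return year + year_carry, month_index + 1


print(add_months(2020, 12, 2))   # December 2020 + 2 months
print(add_months(2020, 1, -1))   # a negative delta rolls the year back
```

Because Python's `divmod` floors towards negative infinity, negative deltas work without a special case.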
I have a 10-year climatological dataset as follows.
dt          T   P
01-01-2010  3   0
02-01-2010  5   11
03-01-2010  10  50
....
31-12-2020  -1  0
I want to find, for each month, the total number of days on which T and P both stayed greater than 0 continuously for three days or more.
I would like these columns in the output:
month  Number of days/Duration T&P>0  T  P
I have never used loops in Python. I can write a simple loop, but nothing beyond that when the data first has to be grouped by month and year before the condition is applied. I would really appreciate any hints on how to construct the loop.
A = dataset
A['dt'] = pd.to_datetime(A['dt'], format='%Y-%m-%d')
for column in A[['P', 'T']]:
    for i in range(len('P')):
        if i > 0:
            P.value_counts()
            print(i)
    for j in range(len('T')):
        if i > 0:
            T.value_counts()
            print(j)
Here is a really naive way you could set it up by simply iterating over the rows:
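df['valid'] = (df['T'] > 0) & (df['P'] > 0)
def count_total_days(df):
    i = 0  # length of the current run of valid days
    total = 0
    for idx, row in df.iterrows():
        if row.valid:
            i += 1
        else:
            # the run just ended; count it only if it lasted 3 days or more
            if i >= 3:
                total += i
            i = 0
    # count a run that is still open on the last row
    if i >= 3:
        total += i
    return total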
Since you want it per month, you would first have to create new month and year columns to group by:
df['month'] = df['dt'].dt.month
df['year'] = df['dt'].dt.year
for (month, year), df_subset in df.groupby(['month', 'year']):
    print(year, month, count_total_days(df_subset))
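The same idea can also be sketched without an explicit row loop by labelling runs of consecutive equal values. This is a rough sketch that assumes the rows are sorted and contain one row per calendar day; the function name `days_in_runs` and the sample data are made up for illustration. Note one difference from the loop above: here a run may continue across a month boundary, and each qualifying day is counted in the month it falls in.

```python
import pandas as pd


def days_in_runs(df, min_run=3):
    """Count, per month, the days belonging to runs of >= min_run valid days."""
    valid = (df["T"] > 0) & (df["P"] > 0)
    # A new run starts whenever `valid` changes from one row to the next
    run_id = (valid != valid.shift()).cumsum()
    # Length of the run each row belongs to
    run_len = valid.groupby(run_id).transform("size")
    qualifies = valid & (run_len >= min_run)
    # Sum the qualifying days per calendar month
    return qualifies.groupby(df["dt"].dt.to_period("M")).sum()


# Hypothetical sample: days 2-4 form a run of three valid days,
# days 6-7 form a run of only two.
sample = pd.DataFrame({
    "dt": pd.date_range("2010-01-01", periods=8, freq="D"),
    "T": [3, 5, 10, 1, -1, 2, 2, 0],
    "P": [0, 11, 50, 2, 5, 1, 1, 1],
})
result = days_in_runs(sample)
print(result)
```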
You can use resample and sum to get the number of days in each month where the condition is true.
import pandas as pd
dt = ["01-01-2010", "01-02-2010","01-03-2010","01-04-2010", "03-01-2010",'12-31-2020']
t=[3,66,100,5,10,-1]
P=[0,77,200,11,50,0]
A=pd.DataFrame(list(zip(dt, t,P)),
               columns =['dtx', 'T','P'])
A['dtx'] = pd.to_datetime(A['dtx'], format='%m-%d-%Y')
A['Mask']=A.dtx.diff().dt.days.ne(1).cumsum()
dict_freq=A['Mask'].value_counts().to_dict()
newdict = dict((k, v) for k, v in dict_freq.items() if v >= 3)
A=A[A['Mask'].isin(list(newdict.keys()))]
A['Mask']=(A['T'] >= 1) & (A['P'] >= 1)
df_summary=A.query('Mask').resample(rule='M',on='dtx')['Mask'].sum()
Which produces:
2010-01-31 3
I am very new to Python, and we were told to write the weekday function without any modules such as datetime.
But it doesn't work, and I am not sure where the problem is.
def weekDay (day,month,year):
    global y0
    global m0
    global x
    y0 = 0
    m0 = 0
    day,month,year = int(input("Enter a day: "))
    month = int(input("Enter a month: "))
    year = int(input("Enter a year:"))
    a = (day + x + (31 * m0) // 12) % 7
    for m0 in a:
        m0 = month + 12 * ((14 - month) // 12) - 2
    for x in a:
        x = y0 + y0 // 4 - y0 // 100 + y0 // 400
    for y0 in x:
        y0 = year - ((14 - month) // 12)
    if a == 0:
        print("Sunday")
    elif a == 1:
        print("Monday")
    elif a == 2:
        print("Tuesday")
    elif a == 3:
        print("Wednesday")
    elif a == 4:
        print("Thursday")
    elif a == 5:
        print("Friday")
    else:
        print("Error")
    return weekDay(a)
Here is the formula we were given:
[![formula][1]][1]
[1]: https://i.stack.imgur.com/iBv30.png
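As a rough sketch, the four expressions that appear in the code above can be evaluated in dependency order (y0 first, then x and m0, then the final remainder) with no loops and no globals. This assumes the expressions in the question match the formula in the image, which is not reproduced here, and that 0 stands for Sunday as in the question's if/elif chain; the names `week_day` and `DAY_NAMES` are made up for illustration.

```python
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def week_day(day, month, year):
    # January and February are treated as months of the previous year
    y0 = year - (14 - month) // 12
    # Leap-year corrections for the shifted year
    x = y0 + y0 // 4 - y0 // 100 + y0 // 400
    # Shifted month number, so that March becomes 1 and February becomes 12
    m0 = month + 12 * ((14 - month) // 12) - 2
    d0 = (day + x + (31 * m0) // 12) % 7
    return DAY_NAMES[d0]


print(week_day(15, 5, 2020))
```

The function takes its values as arguments and returns the name, so reading input and printing can stay outside it.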
This should help:
>>> import datetime
>>> now = datetime.datetime.now()
>>> now.strftime('%A')
'Friday'
>>>
The global variables are not defined anywhere, and I am not able to follow the logic you are trying to write, so I have written a function based on an aptitude trick.
def weekday(day,month,year):
    """
    This function is written based upon an aptitude trick
    to obtain the day from a given date.
    Input date example : 15-5-2020
    Link for logic : https://www.youtube.com/watch?v=rJ0_GWDTdD4
    """
    # for different months we are assigning specific number to that month
    months = {1:1, 2:4, 3:4, 4:0, 5:2, 6:5, 7:0, 8:3, 9:6, 10:1, 11:4, 12:6}
    # assigning days to a number
    day_name = {1:'Sunday', 2:'Monday', 3:'Tuesday', 4:'Wednesday', 5:'Thursday',
                6:'Friday', 0:'Saturday'}
    # January and February of a leap year need a correction of one day
    is_leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    # to get the year in between 1600 and 2000, since we are assigning values
    # for the years also (>= so that the year 2000 itself is shifted too)
    while year not in range(1600,2000):
        if year >= 2000:
            year -= 400
        if year < 1600:
            year += 400
    # assigning values to years
    if year in range(1600,1700):
        yr = 6
    elif year in range(1700,1800):
        yr = 4
    elif year in range(1800,1900):
        yr = 2
    else:
        yr = 0
    # assigning last two numbers of year to last
    first = year//100
    last = year - (first * 100)
    # obtaining remainder
    res = (day + months[month] + yr + last + (last//4))%7
    if is_leap and month <= 2:
        res = (res - 1) % 7
    # returning the day_name
    return day_name[res]
day,month,year = list(map(int,input("Enter date in format dd-mm-yyyy : ").split('-')))
print(weekday(day,month,year))
I hope you are satisfied with the logic.
I want to adapt the code from "Given a date range how can we break it up into N contiguous sub-intervals?" to split a range of dates as follows:
The start date is '2015-03-20' and the end date is '2017-03-12'. I want to split the range into 3 parts, one for each of the 3 years, so that I get a list like this:
[['2015-03-20', '2015-12-31'], ['2016-01-01', '2016-12-31'], ['2017-01-01', '2017-03-12']]
Is there a Pythonic way to do this?
If I understand you correctly, you can just take the years in between and append suffixes such as -12-31 or -01-01 to them.
start_date = '2015-03-20'
end_date = '2018-03-12'
def split_date(s,e):
    return [[s,s[:4]+"-12-31"]]+ [['%s-01-01'%(str(i)), '%s-12-31'%(str(i))] for i in range(int(s[:4])+1,int(e[:4]))]+[[e[:4] + "-01-01", e]]
print(split_date(start_date,end_date))
Result:
[['2015-03-20', '2015-12-31'], ['2016-01-01', '2016-12-31'], ['2017-01-01', '2017-12-31'], ['2018-01-01', '2018-03-12']]
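A variant that works on date objects instead of string slices might look like the sketch below. The name `split_by_year` is made up for illustration, and the sketch assumes ISO-formatted input (YYYY-MM-DD) and Python 3.7 or newer for `date.fromisoformat`. Clamping each year to the overall start and end also covers the case where both dates fall in the same year.

```python
from datetime import date


def split_by_year(start, end):
    """Split [start, end] into one [first_day, last_day] pair per calendar year."""
    s = date.fromisoformat(start)
    e = date.fromisoformat(end)
    parts = []
    for year in range(s.year, e.year + 1):
        # Clamp the calendar year to the requested range
        first = max(s, date(year, 1, 1))
        last = min(e, date(year, 12, 31))
        parts.append([first.isoformat(), last.isoformat()])
    return parts


print(split_by_year('2015-03-20', '2017-03-12'))
```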
Modifying the original code that you linked:
from datetime import datetime, timedelta
def date_range(start, end, interval):
    start = datetime.strptime(start, "%Y%m%d")
    end = datetime.strptime(end, "%Y%m%d")
    diff = (end - start) / interval
    for i in range(interval):
        # every chunk after the first starts one day after the previous chunk ends
        if i == 0:
            date_additive = 0
        else:
            date_additive = 1
        chunk_start = (start + diff * i) + timedelta(days=date_additive)
        chunk_end = start + diff * (i + 1)
        yield [chunk_start.strftime("%Y-%m-%d"), chunk_end.strftime("%Y-%m-%d")]
Input example:
def main():
    begin = "20150320"
    end = "20170312"
    interval = 3
    print(list(date_range(begin, end, interval)))
main()
Results:
[['2015-03-20', '2015-11-16'], ['2015-11-17', '2016-07-14'], ['2016-07-15', '2017-03-12']]
I'm trying to build a function that receives a date and adds days to it, rolling over the month and year when needed. So far I've come up with this:
def addnewDate(date, numberOfDays):
    date = date.split(":")
    day = int(date[0])
    month = int(date[1])
    year = int(date[2])
    new_days = 0
    l = 0
    l1 = 28
    l2 = 30
    l3 = 31
    #l's are the accordingly days of the month
    while numberOfDays > l:
        numberOfDays = numberOfDays - l
        if month != 12:
            month += 1
        else:
            month = 1
            year += 1
        if month in [1, 3, 5, 7, 8, 10, 12]:
            l = l3
        elif month in [4, 6, 9, 11]:
            l = l2
        else:
            l = l1
    return str(day) + ':' + str(month) + ':' + str(year) #i'll deal
    #with fact that it doesn't put the 0's in the < 10 digits later
Desired output:
addnewDate('29:12:2016', 5):
'03:01:2017'
I think the problem is with either the variables or where I'm using them, but I'm kind of lost.
Thanks in advance!
P.S. I can't use Python's built-in functions :)
Since you cannot use the standard library, here's my attempt. I hope I did not forget anything.
- define a table of month lengths
- tweak it if a leap year is detected (every 4 years, with special cases for century years)
- work on zero-indexed days and months, which is much easier
- add the number of days. If the result is less than the number of days in the current month, stop; otherwise subtract the number of days in the current month and retry (while loop)
- when the last month is passed, increase the year
- add 1 to the day and month at the end
code:
def addnewDate(date, numberOfDays):
    month_days = [31,28,31,30,31,30,31,31,30,31,30,31]
    date = date.split(":")
    # work on zero-indexed days and months
    day = int(date[0])-1
    month = int(date[1])-1
    year = int(date[2])
    day += numberOfDays
    done = False # since you don't want to use break, let's create a flag
    while not done:
        nb_days_month = month_days[month]
        # February has 29 days in a leap year: every 4 years, except
        # century years that are not divisible by 400. Checked inside the
        # loop so it stays correct when the year rolls over.
        if month == 1 and year%4 == 0 and (year%100 != 0 or year%400 == 0):
            nb_days_month += 1
        if day < nb_days_month:
            done = True
        else:
            day -= nb_days_month
            month += 1
            if month == 12:
                year += 1
                month = 0
    return "{:02}:{:02}:{:04}".format(day+1,month+1,year)
test (may not be exhaustive):
for i in ("28:02:2000","28:02:2004","28:02:2005","31:12:2012","03:02:2015"):
    print(addnewDate(i,2))
    print(addnewDate(i,31))
result:
01:03:2000
30:03:2000
01:03:2004
30:03:2004
02:03:2005
31:03:2005
02:01:2013
31:01:2013
05:02:2015
06:03:2015
Of course, this is just for fun. Otherwise, use the time or datetime modules!
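For comparison, when the standard library is allowed, the same task can be sketched with datetime and timedelta. The function name `add_days` is made up for illustration, and the sketch assumes the same "dd:mm:yyyy" string format as above.

```python
from datetime import datetime, timedelta


def add_days(date_string, number_of_days):
    """Add days to a 'dd:mm:yyyy' string and return the same format."""
    parsed = datetime.strptime(date_string, "%d:%m:%Y")
    shifted = parsed + timedelta(days=number_of_days)
    return shifted.strftime("%d:%m:%Y")


print(add_days("29:12:2016", 5))
```

Month lengths, leap years and year rollover are all handled by the library here, which also makes this a handy cross-check for a hand-written version.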