Python error: cannot add integral value to Timestamp without freq

I am trying to calculate the difference between two dates to get an integer number of days between them, but I get the following error: "Cannot add integral value to Timestamp without freq". Here is the code:
from __future__ import print_function
try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None
import os
import datetime
import pandas_datareader.data as web
import numpy as np
import pandas as pd
def main():
    count = 0
    df = pd.DataFrame([])
    start = datetime.datetime(2017, 10, 11)
    end = datetime.datetime(2017, 10, 27)
    index_date = datetime.datetime(2017, 10, 11)
    symbols_list = ['ORCL', 'TSLA', 'IBM','YELP', 'MSFT']
    length = len(symbols_list)
    for num, ticker in enumerate(symbols_list, start=1):
        f = web.DataReader(ticker, 'yahoo', start, end)['Adj Close']
        f.ix[index_date]
        if count == 0:
            f = f.to_frame().reset_index()
            df = f
            df.columns = ['Date', ticker]
            length_df = len(df)
            sDate = df.iloc[:,-2] # Date data list
            print ('sDate[0] is: ', (sDate[0]))
            j = 0
            while j < len(sDate[j] - 1):
                date_delta = timedelta(sDate[j] - index_date)
                j += 1
It crashes at the last line:
date_delta = timedelta(sDate[j] - index_reference_date)
The error message is: "Cannot add integral value to Timestamp without freq".
I cannot understand what the problem is. The data types are:
sDate[0] is: 2017-10-06 00:00:00, and
index_date is: 2017-10-11 00:00:00
index_date type is: <type 'datetime.datetime'>
But note that:
sDate[0] type is: <class 'pandas._libs.tslib.Timestamp'>
So: Maybe the problem is here? Thanks for any help!

There is a typing error on this line:
while j < len(sDate[j] - 1):
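sDate is a date data list, thus sDate[j] is a date (probably of type pandas.tslib.Timestamp) and its length does not make sense. Worse, sDate[j] - 1 subtracts an integer from a Timestamp, which is exactly the operation that raises "Cannot add integral value to Timestamp without freq". So you probably want something like: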
while j < len(sDate) - 1:
Maybe it's more appropriate to use a for loop, something like:
for dat in sDate[:-1]:
Edit: after that, you also need the change I described in my first answer.
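A minimal sketch of both points, using a small hypothetical Series in place of the Yahoo download (the four dates are made up for illustration). Subtracting an integer from a Timestamp fails, while subtracting a datetime gives a Timedelta whose .days attribute is the integer day difference the question is after. The exact exception type and message vary between pandas versions, so the sketch catches both TypeError and ValueError. It iterates over every date; use sDate[:-1] as in the answer if the last one should be skipped.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for df.iloc[:, -2] (the 'Date' column)
sDate = pd.Series(pd.to_datetime(["2017-10-06", "2017-10-09", "2017-10-11", "2017-10-12"]))
index_date = datetime.datetime(2017, 10, 11)

# len(sDate[j] - 1) first evaluates Timestamp - int, which pandas rejects
try:
    sDate[0] - 1
    int_subtraction_failed = False
except (TypeError, ValueError) as exc:  # exception type depends on the pandas version
    int_subtraction_failed = True
    print("Timestamp - int fails:", exc)

# Loop over the dates themselves instead of indexing with j
day_diffs = []
for dat in sDate:
    delta = dat - index_date       # Timestamp - datetime -> pandas Timedelta
    day_diffs.append(delta.days)   # whole days, negative before index_date
print(day_diffs)  # [-5, -2, 0, 1]

# Same result without a Python-level loop
vectorised = (sDate - index_date).dt.days.tolist()
print(vectorised)  # [-5, -2, 0, 1]
```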

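The important thing may be the type of the difference sDate[j] - index_reference_date and how to pass it to the timedelta constructor. Subtracting a datetime from a pandas Timestamp returns a pandas Timedelta, which is already a subclass of datetime.timedelta, so it does not need to go through the constructor at all.
I believe this could be the solution:
date_delta = (sDate[j] - index_reference_date).to_pytimedelta()
Then date_delta.days gives the integer number of days. (Passing .delta as microseconds, i.e. timedelta(microseconds=(sDate[j] - index_reference_date).delta), would be 1000 times too large, because .delta is expressed in nanoseconds.)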
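A quick check of the types involved, with one hypothetical Timestamp standing in for sDate[j]. It shows that the difference is already usable as a datetime.timedelta, and what happens when the nanosecond count is read as microseconds. The sketch reads the nanosecond count from .value rather than .delta, since .delta is deprecated in recent pandas versions; both hold the same number.

```python
import datetime

import pandas as pd

sDate_j = pd.Timestamp("2017-10-06")                     # one element of sDate
index_reference_date = datetime.datetime(2017, 10, 11)

diff = sDate_j - index_reference_date
print(type(diff).__name__)                   # Timedelta
print(isinstance(diff, datetime.timedelta))  # True: already a timedelta subclass

date_delta = diff.to_pytimedelta()           # plain datetime.timedelta
print(date_delta.days)                       # -5

# Nanoseconds misread as microseconds: 1000x too large
wrong = datetime.timedelta(microseconds=diff.value)
print(wrong.days)                            # -5000
```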

Related

Python Network Hours

Really struggling with this one so any help would be much appreciated.
GOAL - work out the hours between two datetime columns, excluding weekends and only counting the hours between the working times of 9 and 17.
Now, I have reused a function that I use for network days, but the output is wrong and I can't figure out how to get it working.
As an example, I have in my data a start date and end date as follows:
Start_Date = 2017-07-11 19:33:00
End_Date = 2017-07-12 12:01:00
and the output I'm after is
3.02
However, the function I have is returning 16!
Function below -
import numpy as np
import pandas as pd
def network_hours(start, end):  # hypothetical function name
    start = pd.Series(start)
    end = pd.Series(end)
    mask = (pd.notnull(start) & pd.notnull(end)) & (start.dt.hour >= 9) & (end.dt.hour <= 17) & (start.dt.weekday < 5) & (end.dt.weekday < 5)
    result = np.empty(len(start), dtype=float)
    result.fill(np.nan)
    result[mask] = np.where((start[mask].dt.hour >= 9) & (end[mask].dt.hour <= 17), (end[mask] - start[mask]).astype('timedelta64[h]').astype(float), 0)
    return result
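The 16 can be reproduced by hand with the example values from the question. Both endpoints pass the hour checks in the mask (19 >= 9 and 12 <= 17), so the function falls through to the raw elapsed time, overnight gap included, truncated to whole hours. A minimal sketch; the floor division stands in for the astype('timedelta64[h]') step, which truncated to whole hours in the pandas versions of the time.

```python
import pandas as pd

start = pd.Series(pd.to_datetime(["2017-07-11 19:33:00"]))
end = pd.Series(pd.to_datetime(["2017-07-12 12:01:00"]))

# The mask only looks at the hour of each endpoint; both checks pass here
passes_mask = bool(((start.dt.hour >= 9) & (end.dt.hour <= 17)).iloc[0])
print(passes_mask)  # True

# So the full elapsed time is used, including the overnight gap
raw_td = (end - start).iloc[0]
print(raw_td)       # 0 days 16:28:00

# Truncating to whole hours gives the 16 from the question
whole_hours = raw_td // pd.Timedelta(hours=1)
print(whole_hours)  # 16
```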
It looks like what you need is businesstimedelta
import datetime
import businesstimedelta
start = datetime.datetime.strptime("2017-07-11 19:33:00", "%Y-%m-%d %H:%M:%S")
end = datetime.datetime.strptime("2017-07-12 12:01:00", "%Y-%m-%d %H:%M:%S")
# Define a working day rule
workday = businesstimedelta.WorkDayRule(
start_time=datetime.time(9),
end_time=datetime.time(17),
working_days=[0, 1, 2, 3, 4])
businesshours = businesstimedelta.Rules([workday])
# Calculate the difference
diff = businesshours.difference(start, end)
print(diff)
Output:
<BusinessTimeDelta 3 hours 60 seconds>
https://pypi.org/project/businesstimedelta/
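If installing businesstimedelta is not an option, the same overlap calculation can be done by hand: for each weekday between the two timestamps, intersect [start, end] with that day's 9:00 to 17:00 window and add up the overlaps. A minimal sketch, assuming fixed hours, Monday to Friday, and no public holidays; business_hours is a hypothetical helper, not part of any library.

```python
import datetime


def business_hours(start, end, day_start=9, day_end=17):
    """Hours between start and end that fall Mon-Fri within [day_start, day_end)."""
    if end <= start:
        return 0.0
    total = datetime.timedelta(0)
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # 0 = Monday ... 4 = Friday
            window_open = datetime.datetime.combine(day, datetime.time(day_start))
            window_close = datetime.datetime.combine(day, datetime.time(day_end))
            # Overlap of [start, end] with this day's working window
            lo = max(start, window_open)
            hi = min(end, window_close)
            if hi > lo:
                total += hi - lo
        day += datetime.timedelta(days=1)
    return total.total_seconds() / 3600


start = datetime.datetime(2017, 7, 11, 19, 33)
end = datetime.datetime(2017, 7, 12, 12, 1)
print(round(business_hours(start, end), 2))  # 3.02
```

With a DataFrame, the helper can be applied row by row, for example df.apply(lambda r: business_hours(r['Start_Date'], r['End_Date']), axis=1).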
So, I really struggled to work out how to turn the above into a function, but after much banging of the head I came up with the version below. Sharing it for the next person in my situation. I wanted the result in minutes, so if you want hours instead just remove the *60 at the return.
import datetime
import businesstimedelta
import numpy as np
import pandas as pd
# Define a working day
workday = businesstimedelta.WorkDayRule(
    start_time=datetime.time(9),
    end_time=datetime.time(17),
    working_days=[0, 1, 2, 3, 4])
# Build the rule set (more rules, such as a lunch break, can be added to this list)
businesshrs = businesstimedelta.Rules([workday])
def _business_hours(row, start, end):
    # BusinessTimeDelta splits the result into whole .hours plus leftover .seconds;
    # using .hours alone would drop the minutes (180 instead of 181 in the example above)
    delta = businesshrs.difference(row[start], row[end])
    return delta.hours + delta.seconds / 3600
def business_Mins(df, start, end):
    try:
        mask = pd.notnull(df[start]) & pd.notnull(df[end])
        result = np.full(len(df), np.nan)  # NaN wherever a timestamp is missing
        if mask.any():
            hours = df.loc[mask].apply(_business_hours, axis=1, args=(start, end))
            result[mask.to_numpy()] = hours.to_numpy()
        return result * 60  # hours -> minutes
    except KeyError as e:
        print(f"Error: One or more columns not found in the dataframe - {e}")
        return None
df['Contact_SLA'] = business_Mins(df, 'Date and Time of Instruction', 'Date and Time of Attempted Contact')

Python converting time from string to timedelta to find time difference

How do I convert a string to a timedelta in order to create a new column in my dataframe?
import pandas as pd
import numpy as np
from datetime import timedelta
pricetime = pd.DataFrame({'price1':[22.34, 44.68, 52.98], 'time1':['9:48:14', '15:54:33', '13:13:22'],'price2':[28.88, 47.68, 22.32], 'time2':['10:52:44', '15:59:59', '10:12:22']})
pricetime['price_change'] = np.where(pricetime['time1'] < pricetime['time2'], (pricetime['price1'] - pricetime['price2'])/pricetime['price2'], np.nan)
pricetime['time_diff'] = np.where(pricetime['time1'] < pricetime['time2'], pricetime['time2'] - pricetime['time1'], np.nan)
When I do this, I get an error on the line where I subtract the two times.
I tried to do this but it gave me an error:
pricetime['price_change'] = np.where((datetime.strptime(pricetime['time1'], '%H:%M:%S') < datetime.strptime(pricetime['time2'], '%H:%M:%S')), (pricetime['price1'] - pricetime['price2'])/pricetime['price2'], np.nan)
pricetime['time_diff'] = np.where((datetime.strptime(pricetime['time1'], '%H:%M:%S') < datetime.strptime(pricetime['time2'], '%H:%M:%S'), datetime.strptime(pricetime['time2'], '%H:%M:%S') - datetime.strptime(pricetime['time1'], '%H:%M:%S'), np.nan)
The error it gave is:
TypeError: strptime() argument 1 must be str, not Series
After a discussion with #Marc_Law, the answer he was looking for is:
pricetime['time_diff'] = pd.to_datetime(pricetime['time2']) - pd.to_datetime(pricetime['time1'])
pricetime.loc[pd.to_datetime(pricetime['time1']) >= pd.to_datetime(pricetime['time2']),'time_diff'] = np.nan
pricetime['time_diff'] = pricetime['time_diff'].apply(lambda x: str(x).split(' ')[-1:][0])
What he needed was the difference only where the value in the time1 column is smaller than the value in the time2 column (otherwise np.nan), and then the result converted back to a string without the "X days" part.
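Since the time columns hold times of day without dates, pd.to_timedelta can also parse them directly, which avoids pd.to_datetime attaching an arbitrary date. A sketch using the question's data; keeping time_diff as a Timedelta column rather than a string is a design choice here, and Series.where fills NaT wherever time1 is not earlier than time2.

```python
import numpy as np
import pandas as pd

pricetime = pd.DataFrame({
    'price1': [22.34, 44.68, 52.98], 'time1': ['9:48:14', '15:54:33', '13:13:22'],
    'price2': [28.88, 47.68, 22.32], 'time2': ['10:52:44', '15:59:59', '10:12:22'],
})

# 'H:MM:SS' strings -> Timedelta (duration since midnight)
t1 = pd.to_timedelta(pricetime['time1'])
t2 = pd.to_timedelta(pricetime['time2'])

later = t1 < t2  # compare durations, not strings
pricetime['price_change'] = np.where(
    later, (pricetime['price1'] - pricetime['price2']) / pricetime['price2'], np.nan)
# Keep the difference where time1 < time2, NaT elsewhere
pricetime['time_diff'] = (t2 - t1).where(later)
print(pricetime[['time1', 'time2', 'time_diff']])
```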
If you only want to find the difference in time, then you can follow this sample
from datetime import datetime
foo = '9:48:14'
bar = '15:54:33'
foo = datetime.strptime(foo, '%H:%M:%S')
bar = datetime.strptime(bar, '%H:%M:%S')
print(bar - foo)
Output
6:06:19

Error while subtracting 2 date columns in pandas

I have a dataframe and a function to get random dates:
from datetime import date, timedelta
import pandas as pd
import random
def dates(start_date, end_date):
    start_date = date(start_date[0], start_date[1], start_date[2])
    end_date = date(end_date[0], end_date[1], end_date[2])
    days_delta = (end_date - start_date).days
    return start_date + timedelta(days=random.randrange(days_delta))
df = pd.DataFrame(index=range(100))
df['MOVE_OUT_DATE'] = date(9999, 12, 31)
df['MOVE_IN_DATE'] = [dates((2021, 1, 1), (2021, 6, 30)) for _ in range(df.shape[0])]
To get the difference in days I do this,
df['days_diff'] = df['MOVE_OUT_DATE'] - df['MOVE_IN_DATE']
This works fine in VS Code, but in Databricks it throws "Python int too large to convert to C long".
Any help or suggestion is appreciated. Thank you.
I was able to get everything to work and I believe it is what you are trying to accomplish with your code
df = pd.DataFrame(pd.date_range('2021-01-01', '2021-06-01', freq = 'D'), columns = ['START_DATE'])
df['MOVE_OUT_DATE'] = '2260-12-31'
df['START_DATE'] = pd.to_datetime(df['START_DATE'])
df['MOVE_OUT_DATE'] = pd.to_datetime(df['MOVE_OUT_DATE'])
df['DAYS_DIFF'] = df['MOVE_OUT_DATE'] - df['START_DATE']
df
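However, notice that 'MOVE_OUT_DATE' is only set to 2260: anything later than that produced an out-of-bounds error. pandas stores datetime64[ns] values as 64-bit nanosecond counts, so the latest date it can represent is 2262-04-11 (pd.Timestamp.max). Could you take this and generate the results you want (if you converted it into a function)?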
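If the 9999-12-31 placeholder has to stay, one option is to keep both columns as plain Python date objects and subtract them element by element: pandas' datetime64[ns] type stops at 2262-04-11, but datetime.date has no such limit. A sketch under that assumption, with two hypothetical move-in dates; the trade-off is that vectorised datetime64 operations are not available on these object columns.

```python
from datetime import date

import pandas as pd

print(pd.Timestamp.max.date())  # 2262-04-11: the datetime64[ns] upper bound

df = pd.DataFrame({'MOVE_IN_DATE': [date(2021, 1, 15), date(2021, 6, 1)]})
df['MOVE_OUT_DATE'] = date(9999, 12, 31)  # stays an object column of date objects

# Element-wise subtraction of Python dates gives datetime.timedelta objects,
# which are not limited to the datetime64[ns] range
df['days_diff'] = [
    (out - move_in).days
    for out, move_in in zip(df['MOVE_OUT_DATE'], df['MOVE_IN_DATE'])
]
print(df)
```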

Create new column based on multiple conditions of existing column while manipulating existing column

I am new to Python/pandas, coming from an R background. I am having trouble understanding how to manipulate an existing column to create a new column based on multiple conditions on that existing column. There are 10 different conditions that need to be met, but for simplicity I will use a two-case scenario.
In R:
install.packages("lubridate")
library(lubridate)
df <- data.frame("Date" = c("2020-07-01", "2020-07-15"))
df$Date <- as.Date(df$Date, format = "%Y-%m-%d")
df$Fiscal <- ifelse(day(df$Date) > 14,
paste0(year(df$Date),"-",month(df$Date) + 1,"-01"),
paste0(year(df$Date),"-",month(df$Date),"-01")
)
df$Fiscal <- as.Date(df$Fiscal, format = "%Y-%m-%d")
In Python I have:
import pandas as pd
import datetime as dt
df = {'Date': ['2020-07-01', '2020-07-15']}
df = pd.DataFrame(df)
df['Date'] = pd.to_datetime(df['Date'], yearfirst = True, format = "%Y-%m-%d")
df.loc[df['Date'].dt.day > 14,
'Fiscal'] = "-".join([dt.datetime.strftime(df['Date'].dt.year), dt.datetime.strftime(df['Date'].dt.month + 1),"01"])
df.loc[df['Date'].dt.day <= 14,
'Fiscal'] = "-".join([dt.datetime.strftime(df['Date'].dt.year), dt.datetime.strftime(df['Date'].dt.month),"01"])
If I don't convert the 'Date' field it says that it expects a string, however if I do convert the date field, I still get an error as it seems it is applying to a 'Series' object.
TypeError: descriptor 'strftime' for 'datetime.date' objects doesn't apply to a 'Series' object
I understand I may have some terminology or concepts incorrect and apologize, however the answers I have seen dealing with creating a new column with multiple conditions do not seem to be manipulating the existing column they are checking the condition on, and simply taking on an assigned value. I can only imagine there is a more efficient way of doing this that is less 'R-ey' but I am not sure where to start.
This isn't intended as a full answer, just as an illustration how strftime works: strftime is a method of a date(time) object that takes a format-string as argument:
import pandas as pd
import datetime as dt
df = {'Date': ['2020-07-01', '2020-07-15']}
df = pd.DataFrame(df)
df['Date'] = pd.to_datetime(df['Date'], yearfirst = True, format = "%Y-%m-%d")
s = [dt.date(df['Date'][i].year, df['Date'][i].month + 1, 1).strftime('%Y-%m-%d')
for i in df['Date'].index]
print(s)
Result:
['2020-08-01', '2020-08-01']
Again: No full answer, just a hint.
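EDIT: You can write this more compactly with Series.apply, for example:
import pandas as pd
import datetime as dt
df = {'Date': ['2020-07-01', '2020-07-15']}
df = pd.DataFrame(df)
df['Date'] = pd.to_datetime(df['Date'], yearfirst=True, format='%Y-%m-%d')
# d.year + d.month // 12 and d.month % 12 + 1 roll December over to January of the next year
df['Fiscal'] = df['Date'].apply(lambda d: dt.date(d.year, d.month, 1)
                                if d.day < 15 else
                                dt.date(d.year + d.month // 12, d.month % 12 + 1, 1))
print(df)
Result:
Date Fiscal
0 2020-07-01 2020-07-01
1 2020-07-15 2020-08-01
Here I'm using an on-the-fly lambda function. You could also do it with an externally defined function:
def to_fiscal(date):
    if date.day < 15:
        return dt.date(date.year, date.month, 1)
    # First day of the next month; December rolls over to January of the next year
    return dt.date(date.year + date.month // 12, date.month % 12 + 1, 1)
df['Fiscal'] = df['Date'].apply(to_fiscal)
Strictly speaking, .apply is not vectorisation: it still calls the Python function once per row, so it is tidier than an explicit loop but not much faster. Truly vectorised operations (such as the .dt accessor methods) are generally much faster than looping over rows, because the looping is done at a "lower" level, in compiled code.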
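For comparison, a sketch that stays within vectorised pandas operations (the .dt accessor plus a MonthBegin offset) instead of calling a Python function per row. It assumes, as in the question, that dates after the 14th map to the first day of the next month; a hypothetical December date is included to show the year rollover. With more conditions, further .where calls can be chained in the same way.

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2020-07-01', '2020-07-15', '2020-12-20'])})

# First day of each date's own month, computed for the whole column at once
month_start = df['Date'].dt.to_period('M').dt.to_timestamp()

# Keep month_start up to the 14th; otherwise roll forward to the next month's first day
next_month_start = month_start + pd.offsets.MonthBegin(1)
df['Fiscal'] = month_start.where(df['Date'].dt.day <= 14, next_month_start)
print(df)
```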
Until someone tells me otherwise I will do it this way. If there's a way to do it vectorized (or just a better way in general) I would greatly appreciate it
import pandas as pd
import datetime as dt
df = {'Date': ['2020-07-01', '2020-07-15']}
df = pd.DataFrame(df)
df['Date'] = pd.to_datetime(df['Date'], yearfirst=True, format='%Y-%m-%d')
test_list = list()
for i in df['Date'].index:
    mth = df['Date'][i].month
    yr = df['Date'][i].year
    dy = df['Date'][i].day
    if dy > 14:
        new_date = dt.date(yr, mth + 1, 1)
    else:
        new_date = dt.date(yr, mth, 1)
    test_list.append(new_date)
df['New_Date'] = test_list

Datetime output from Nonetype to Dataframe

I'm trying to write a web scraper in Python and first need to create a column of dates. I've got the list I need, but it keeps coming out as NoneType. Any ideas on how to get this into a dataframe?
Relevant part of code:
import datetime
from datetime import date
date1 = '2019-01-01'
date2 = '2019-01-30'
start = datetime.datetime.strptime(date1,'%Y-%m-%d')
end = datetime.datetime.strptime(date2,'%Y-%m-%d')
step = datetime.timedelta(days=1)
while start <= end:
    daterange = print(start.strftime('%Y%m%d'))
    start += step
type(daterange)
Thanks in advance!
Here
daterange = print(start.strftime('%Y%m%d'))
should be
daterange = start.strftime('%Y%m%d')
EXTRA:
If you want to keep every date in the range, collect them in a list:
import datetime
from datetime import date
date1 = '2019-01-01'
date2 = '2019-01-30'
daterange_list = []
start = datetime.datetime.strptime(date1,'%Y-%m-%d')
end = datetime.datetime.strptime(date2,'%Y-%m-%d')
step = datetime.timedelta(days=1)
while start <= end:
    daterange = start.strftime('%Y%m%d')
    daterange_list.append(daterange)
    start += step
type(daterange)  # str
type(daterange_list)  # list
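Since the end goal is a DataFrame, pandas can build the whole range in one call. A sketch using pd.date_range with the same start and end dates (both ends inclusive); the column name 'date' is an arbitrary choice. An existing list such as daterange_list can be wrapped the same way with pd.DataFrame({'date': daterange_list}).

```python
import pandas as pd

# One row per day from 2019-01-01 to 2019-01-30 inclusive
dates = pd.date_range('2019-01-01', '2019-01-30', freq='D')

# Format as YYYYMMDD strings, matching strftime('%Y%m%d') in the loop
df = pd.DataFrame({'date': dates.strftime('%Y%m%d')})
print(df.head(3))
print(len(df))  # 30
```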
