Cannot compare dates between date variable and pandas dataframe - python

I have a frustrating issue comparing a date variable with a pandas dataset of dates. No matter what formatting options I try, I cannot get the two types to line up.
Could you please help? I basically only need to compare the dates in the pandas dataset with today's date + 6 months.
My code:
SourceData_Workbook = R"G:\AR\REPORTS\Automation Files\Credit Risk\test1.xlsx"
SourceInPandas = pd.read_excel(SourceData_Workbook, skiprows=33,header=0,index=False)
# Creating date variable + 6 months
six_months = date.today() + relativedelta(months=+6)
# Formatting sourced data to date format
SourceInPandas['Req.dlv.dt']=SourceInPandas['Req.dlv.dt'].apply(lambda x:datetime.strptime(x,'%d.%m.%Y'))
# Fails on this line
SourceInPandas.loc[(SourceInPandas['Req.dlv.dt']<= six_months) & (SourceInPandas['OpIt'] != "15 Overdue account")& (SourceInPandas['OpIt'] != "16 Prepayment required")& (SourceInPandas['OpIt'] != "17 Approval required"),"OpIt"]="Future delivery"
Stack trace:
TypeError: Invalid comparison between dtype=datetime64[ns] and date

You can use Timestamp with Timestamp.floor and add 6 months with DateOffset, then convert the column with to_datetime:
six_months = pd.Timestamp('today').floor('d') + pd.DateOffset(months=6)
print (six_months)
2021-06-10 00:00:00
SourceInPandas['Req.dlv.dt']=pd.to_datetime(SourceInPandas['Req.dlv.dt'], dayfirst=True)
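The TypeError comes from comparing a datetime64[ns] column with a plain datetime.date; a pd.Timestamp compares without trouble. Here is a minimal end-to-end sketch of that idea. The sample rows, the "10 Open" value and the fixed reference date are assumptions made up for illustration, not data from the question.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Excel sheet
df = pd.DataFrame({
    "Req.dlv.dt": ["01.01.2021", "15.03.2030", "20.02.2021"],
    "OpIt": ["10 Open", "10 Open", "15 Overdue account"],
})

# Parse the day.month.year strings into datetime64[ns]
df["Req.dlv.dt"] = pd.to_datetime(df["Req.dlv.dt"], format="%d.%m.%Y")

# Fixed reference date so the result is reproducible;
# in real code use pd.Timestamp("today").floor("D")
today = pd.Timestamp("2020-12-10")
six_months = today + pd.DateOffset(months=6)

# Rows due within six months whose status is not one of the excluded ones
excluded = ["15 Overdue account", "16 Prepayment required", "17 Approval required"]
mask = (df["Req.dlv.dt"] <= six_months) & ~df["OpIt"].isin(excluded)
df.loc[mask, "OpIt"] = "Future delivery"

print(df)
```

Using isin for the three excluded statuses is only a shorter way to write the chained != comparisons from the question.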

Related

How would I do date time math on a DF column using today's date?

Essentially, I want to create a new column that holds the number of days remaining until maturity from today. The code below doesn't work, and I'm stuck on what to do next, as nearly all examples show math on two DataFrame columns.
today = date.today()
today = today.strftime("%m/%d/%y")
df['Maturity Date'] = df['Maturity Date'].apply(pd.to_datetime)
df['Remaining Days til Maturity'] = (df['Maturity Date'] - today)
You're mixing types; it's like subtracting apples from pears. In your example, today is a string that represents a date to us humans (in the month/day/year format used in the USA), while your pandas Series (the column of interest in your DataFrame) has the datetime64[ns] dtype after apply(pd.to_datetime). You can also do that conversion more efficiently without apply, which runs the operation element by element instead of in a vectorized way; below, I convert the strings to datetime64[ns] in a vectorized way.
The main idea is that whenever you do operations with multiple objects, they should be of the same type. Sometimes frameworks will automatically convert types for you, but don't rely on it.
import pandas as pd
df = pd.DataFrame({"date": ["2000-01-01"]})
df["date"] = pd.to_datetime(df["date"])
today = pd.Timestamp.today().floor("D") # That's one way to do it
today
# Timestamp('2021-11-02 00:00:00')
today - df["date"]
# 0 7976 days
# Name: date, dtype: timedelta64[ns]
Parse Maturity Date as a datetime using the month/day/year format, then subtract today's date from it as a date type and store the difference in days as Remaining Days til Maturity:
from datetime import date
today = date.today()
df=pd.DataFrame({'Maturity Date':'11/04/2021'},index=[0])
df['Maturity Date'] = pd.to_datetime(df['Maturity Date'], format='%m/%d/%Y')
df['Remaining Days til Maturity'] = (df['Maturity Date'].dt.date - today).dt.days
print(df)
output:
Maturity Date Remaining Days til Maturity
0 2021-11-04 2
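A variant that keeps everything in datetime64[ns] and avoids the round trip through .dt.date might look like this. It is only a sketch: the second row and the fixed reference date are assumptions added so the result is reproducible.

```python
import pandas as pd

# Hypothetical sample data using the column name from the question
df = pd.DataFrame({"Maturity Date": ["11/04/2021", "12/31/2021"]})

# Vectorized conversion of month/day/year strings, no apply needed
df["Maturity Date"] = pd.to_datetime(df["Maturity Date"], format="%m/%d/%Y")

# Fixed reference date for reproducibility;
# in real code use pd.Timestamp.today().floor("D")
today = pd.Timestamp("2021-11-02")

# Timestamp minus datetime64 column gives timedelta64; .dt.days extracts whole days
df["Remaining Days til Maturity"] = (df["Maturity Date"] - today).dt.days

print(df)
```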

In Python,Excel,Pandas, I Need to move row based on Date to new sheet, but getting date error

I have an Excel report that is run weekly and that we modify manually. I have created a Python script to remove unwanted data and format it. Now I have a request to copy data that is 1 year or older from today, 2 years, 3 years and so on. I'm not a programmer, so bear with me.
I import the Excel
excel_workbook = 'Excel.xlsx'
sheet1 = pd.read_excel(excel_workbook, sheet_name='Sheet1', keep_default_na= False, index_col=0,
parse_dates=['DT RECD'])
Then, when I try to set the date format, I get this:
today = pd.datetime.now().date()
print(today)
oneYear = today - pd.Timedelta.days(365) #timedelta(days=365)
print(oneYear)
twoYear = today - pd.Timedelta.days(730) #timedelta(days=730)
print(twoYear)
errors:
pandas.core.arrays.datetimelike.InvalidComparison: 730 days, 0:00:00
raise TypeError(f"Invalid comparison between dtype={left.dtype} and {typ}")
TypeError: Invalid comparison between dtype=datetime64[ns] and timedelta
How do I match the dates so I can state something like OverOneYear = <= oneYear and >= twoYear on column 'DT RECD'?
So I have made these changes:
today = datetime.date.today()
oneYear = today - pd.DateOffset(years=1)
twoYear = today - pd.DateOffset(years=2)
Output:
2021-03-01
2020-03-01 00:00:00
2019-03-01 00:00:00
Original time stamp:
MM/DD/YYYY
After changes:
2020-03-01 00:00:00
Sample data set:
[image: Dataset]
Trying to filter
YearOne[YearOne['DT RECD'].between(oneYear, twoYear)]
print(YearOne)
No error, but the date 2021-03-01 (was this after importing with parse_dates=['DT RECD']?) does not match 2020-03-01 00:00:00. I do not need the 00:00:00; if I can drop it, I think it will filter just fine.
The column is in datetime64[ns]:
sheet1.dtypes
FINISH object
LENGTH object
Qty on Hand int64
Unit Cost float64
DT RECD datetime64[ns]
PRODUCED object
Status object
You can replace your setting of oneYear and twoYear as follows:
oneYear = today - pd.offsets.DateOffset(years=1)
twoYear = today - pd.offsets.DateOffset(years=2)
If you want to select the dates in column 'DT RECD' that fall between one and two years ago, you can use the following (assuming df is the name of the DataFrame). Note that between expects the lower bound first, so the earlier date, twoYear, comes first:
df_selected = df[df['DT RECD'].between(twoYear, oneYear)] # extract into new variable
print(df_selected) # view the selected result
Edit:
If you want to keep the dates in datetime.date format instead of Timestamp format, add the following 2 lines in addition to the above:
oneYear = oneYear.to_pydatetime().date()
twoYear = twoYear.to_pydatetime().date()
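As a rough sketch of the whole filter, the window bounds can be kept as Timestamps so they compare directly with the datetime64[ns] column. The sample dates and the fixed "today" below are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data using the column name from the question
df = pd.DataFrame({
    "DT RECD": pd.to_datetime(
        ["2021-02-15", "2020-06-30", "2019-08-01", "2018-12-25"]
    )
})

# Fixed reference date for reproducibility;
# in real code use pd.Timestamp.today().normalize()
today = pd.Timestamp("2021-03-01")
one_year = today - pd.DateOffset(years=1)   # 2020-03-01
two_year = today - pd.DateOffset(years=2)   # 2019-03-01

# between(lower, upper) is inclusive on both ends; the older date goes first
over_one_year = df[df["DT RECD"].between(two_year, one_year)]

print(over_one_year)
```

The 00:00:00 shown when printing a Timestamp is only the midnight time component; it does not stop the comparison from matching dates in the column.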

Converting a string containing year and week number to datetime in Pandas

I have a column in a Pandas dataframe that contains the year and the week number (1 up to 52) in one string in this format: '2017_03' (meaning the 3rd week of year 2017).
I want to convert the column to datetime and I am using the pd.to_datetime() function. However I get an exception:
pd.to_datetime('2017_01',format = '%Y_%W')
ValueError: Cannot use '%W' or '%U' without day and year
On the other hand the strftime documentation mentions that:
I am not sure what I am doing wrong.
You also need to specify the day of the week (%w, where 0 is Sunday and 1 is Monday):
a = pd.to_datetime('2017_01_0',format = '%Y_%W_%w')
print (a)
2017-01-08 00:00:00
a = pd.to_datetime('2017_01_1',format = '%Y_%W_%w')
print (a)
2017-01-02 00:00:00
a = pd.to_datetime('2017_01_2',format = '%Y_%W_%w')
print (a)
2017-01-03 00:00:00
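For a whole column, one possible approach is to append a weekday to every string before parsing. The sketch below assumes you want the Monday of each week (hence the "_1" suffix) and uses made-up sample values.

```python
import pandas as pd

# Hypothetical column of year_week strings
s = pd.Series(["2017_01", "2017_03", "2017_52"])

# Append "_1" (Monday) so that %W has a weekday to anchor on,
# then parse the whole column in one vectorized call
week_start = pd.to_datetime(s + "_1", format="%Y_%W_%w")

print(week_start)
```

With %W, weeks start on Monday and the days before the first Monday of the year belong to week 0, so week 1 of 2017 starts on 2017-01-02.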

Python Date Index: finding the closest date a year ago from today

I have a pandas DataFrame (stock prices) with an index in a date format. It is daily, but only for working days.
I basically try to compute some price performance YTD and from a year ago.
To get the first date of the actual year in my dataframe I used the following method:
today = str(datetime.date.today())
curr_year = int(today[:4])
curr_month = int(today[5:7])
first_date_year = (df[str(curr_year)].first_valid_index())
Now I am trying to get the closest date a year ago (exactly one year before the last_valid_index()). I could extract the month and the year, but then it wouldn't be as precise. Any suggestions?
Thanks
Since you didn't provide any data, I am assuming that you have a list of dates (string types) like the following:
dates = ['11/01/2016', '12/01/2016', '02/01/2017', '03/01/2017']
You then need to transform that into datetime format, I would suggest using pandas:
pd_dates = pd.to_datetime(dates)
Then you have to define today and one year ago. I would suggest using datetime for that:
from datetime import datetime
today = datetime.today()
# note: this constructor raises ValueError when today is 29 February
date_1yr_ago = datetime(today.year-1, today.month, today.day)
Lastly, you slice the date list for dates larger than the date_1yr_ago value and get the first value of that slice:
pd_dates[pd_dates > date_1yr_ago][0]
This will return the first date that is larger than the 1 year ago date.
output:
Timestamp('2017-02-01 00:00:00')
You can convert that datetime value to string with the following code:
datetime.strftime(pd_dates[pd_dates > date_1yr_ago][0], '%Y/%m/%d')
output:
'2017/02/01'
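If the dates are already a sorted DatetimeIndex, as in the question, another option is to ask the index for the position nearest to the target date. This is a sketch under assumptions: the business-day index and the dummy prices are invented for illustration, and get_indexer with method="nearest" needs a monotonic index.

```python
import pandas as pd

# Hypothetical business-day price series (Monday to Friday only)
idx = pd.bdate_range("2016-01-01", "2017-10-02")
prices = pd.Series(range(len(idx)), index=idx, dtype="float64")

# Exactly one calendar year before the last available date
last_date = prices.index[-1]                   # Monday 2017-10-02
target = last_date - pd.DateOffset(years=1)    # Sunday 2016-10-02, not in the index

# Position of the index entry closest to the target date
pos = prices.index.get_indexer([target], method="nearest")[0]
closest = prices.index[pos]

print(closest, prices.iloc[pos])
```

Unlike taking the first date after the target, this picks whichever trading day is nearest, before or after. DateOffset(years=1) also copes with 29 February by rolling back to 28 February.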

Stripping and testing against Month component of a date

I have a dataset that looks like this:
import numpy as np
import pandas as pd
raw_data = {'Series_Date':['2017-03-10','2017-04-13','2017-05-14','2017-05-15','2017-06-01']}
df = pd.DataFrame(raw_data,columns=['Series_Date'])
print(df)
I would like to pass in a date parameter as a string as follows:
date = '2017-03-22'
I would now like to know if there are any dates in my DataFrame 'df' for which the month is 3 months after the month in the date parameter.
That is if the month in the date parameter is March, then it should check if there are any dates in df from June. If there are any, I would like to see those dates. If not, it should just output 'No date found'.
In this example, the output should be '2017-06-01' as it is a date from June as my date parameter is from March.
Could anyone help how may I get started with this?
Convert your column and the date parameter to Timestamp:
df.Series_Date = pd.to_datetime(df.Series_Date)
date = pd.to_datetime('2017-03-01')
Then
df[
(df.Series_Date.dt.year - date.year) * 12 +
df.Series_Date.dt.month - date.month == 3
]
Series_Date
4 2017-06-01
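To also cover the 'No date found' case from the question, the same month arithmetic can be wrapped as in this sketch. The variable names month_diff, matches and result are illustrative choices, not part of the original answer.

```python
import pandas as pd

raw_data = {"Series_Date": ["2017-03-10", "2017-04-13", "2017-05-14",
                            "2017-05-15", "2017-06-01"]}
df = pd.DataFrame(raw_data, columns=["Series_Date"])
df["Series_Date"] = pd.to_datetime(df["Series_Date"])

date = pd.Timestamp("2017-03-22")

# Whole-month difference between each row and the date parameter
month_diff = ((df["Series_Date"].dt.year - date.year) * 12
              + (df["Series_Date"].dt.month - date.month))

# Keep only the rows that fall exactly three calendar months later
matches = df.loc[month_diff == 3, "Series_Date"]

if matches.empty:
    result = "No date found"
else:
    result = matches.dt.strftime("%Y-%m-%d").tolist()

print(result)
```

Counting months as year * 12 + month keeps the check correct across a year boundary, for example from November to February.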
