Converting a string containing year and week number to datetime in Pandas - python

I have a column in a Pandas dataframe that contains the year and the week number (1 to 52) in one string in this format: '2017_03' (meaning the 3rd week of 2017).
I want to convert the column to datetime, and I am using the pd.to_datetime() function. However, I get an exception:
pd.to_datetime('2017_01',format = '%Y_%W')
ValueError: Cannot use '%W' or '%U' without day and year
On the other hand the strftime documentation mentions that:
I am not sure what I am doing wrong.

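You also need to specify the day of the week, because a year and a week number alone do not identify a single date. With %w, 0 is Sunday and 1 is Monday; %W counts weeks starting on Monday: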
a = pd.to_datetime('2017_01_0',format = '%Y_%W_%w')
print (a)
2017-01-08 00:00:00
a = pd.to_datetime('2017_01_1',format = '%Y_%W_%w')
print (a)
2017-01-02 00:00:00
a = pd.to_datetime('2017_01_2',format = '%Y_%W_%w')
print (a)
2017-01-03 00:00:00
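To apply the same idea to a whole column, one option is to append a fixed weekday to every string before parsing. This is a minimal sketch; the column names year_week and week_start are made up for illustration, and choosing Monday ('_1') as the representative day is an assumption.

```python
import pandas as pd

# Hypothetical column holding 'YYYY_WW' strings
df = pd.DataFrame({"year_week": ["2017_01", "2017_02", "2017_03"]})

# Append '_1' (Monday) so that every string identifies exactly one date
df["week_start"] = pd.to_datetime(df["year_week"] + "_1", format="%Y_%W_%w")
print(df)
```

Each value in week_start is then the Monday of the given week, following the %W convention shown above.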

Related

Cannot compare dates between date variable and pandas dataframe

I have a frustrating issue while comparing variable date with pandas dataset of dates. No matter what formatting options I try, I just cannot get these in line.
Could you please help? I only need to compare the dates in the pandas dataset with today's date + 6 months.
My code:
SourceData_Workbook = R"G:\AR\REPORTS\Automation Files\Credit Risk\test1.xlsx"
SourceInPandas = pd.read_excel(SourceData_Workbook, skiprows=33,header=0,index=False)
# Creating date variable + 6 months
six_months = date.today() + relativedelta(months=+6)
# Formatting sourced data to date format
SourceInPandas['Req.dlv.dt']=SourceInPandas['Req.dlv.dt'].apply(lambda x:datetime.strptime(x,'%d.%m.%Y'))
# Fails on this line
SourceInPandas.loc[(SourceInPandas['Req.dlv.dt']<= six_months) & (SourceInPandas['OpIt'] != "15 Overdue account")& (SourceInPandas['OpIt'] != "16 Prepayment required")& (SourceInPandas['OpIt'] != "17 Approval required"),"OpIt"]="Future delivery"
Stack trace:
TypeError: Invalid comparison between dtype=datetime64[ns] and date
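The column has dtype datetime64[ns], which cannot be compared with a datetime.date object. Build the cutoff as a pandas Timestamp instead, using Timestamp.floor to drop the time part and DateOffset to add 6 months: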
six_months = pd.Timestamp('today').floor('d') + pd.DateOffset(months=6)
print (six_months)
2021-06-10 00:00:00
SourceInPandas['Req.dlv.dt']=pd.to_datetime(SourceInPandas['Req.dlv.dt'], dayfirst=True)
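Putting the pieces together, the comparison from the question might look roughly like this. The sample rows are invented for illustration, and isin is used here in place of the chained != tests from the question; it is a substitution for brevity, not part of the original answer.

```python
import pandas as pd

# Invented sample data using the column names from the question
df = pd.DataFrame({
    "Req.dlv.dt": ["01.02.2000", "01.02.2000", "15.11.2200"],
    "OpIt": ["10 Open", "15 Overdue account", "10 Open"],
})

# Parse the day.month.year strings into datetime64 values
df["Req.dlv.dt"] = pd.to_datetime(df["Req.dlv.dt"], format="%d.%m.%Y")

# Cutoff as a Timestamp, so it is comparable with the datetime64 column
six_months = pd.Timestamp("today").floor("d") + pd.DateOffset(months=6)

# Statuses that must be left untouched
excluded = ["15 Overdue account", "16 Prepayment required", "17 Approval required"]
mask = (df["Req.dlv.dt"] <= six_months) & ~df["OpIt"].isin(excluded)
df.loc[mask, "OpIt"] = "Future delivery"
print(df)
```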

Converting different date time formats to MM/DD/YYYY format in pandas dataframe

I have a date column in a pandas.DataFrame in various date time formats and stored as list object, like the following:
date
1 [May 23rd, 2011]
2 [January 1st, 2010]
...
99 [Apr. 15, 2008]
100 [07-11-2013]
...
256 [9/01/1995]
257 [04/15/2000]
258 [11/22/68]
...
360 [12/1997]
361 [08/2002]
...
463 [2014]
464 [2016]
For the sake of convenience, I want to convert them all to MM/DD/YYYY format. It doesn't seem possible to use regex replace() function to do this, since one cannot execute this operation over list objects. Also, to use strptime() for each cell will be too time-consuming.
What will be the easier way to convert them all to the desired MM/DD/YYYY format? I found it very hard to do this on list objects within a dataframe.
Note: for cell values of the form [YYYY] (e.g., [2014] and [2016]), I will assume they are the first day of that year (e.g., January 1, 2014), and for cell values such as [08/2002] (or [8/2002]), I will assume they are the first day of that month (i.e., August 1, 2002).
Given your sample data, with the addition of a NaT, this works:
Code:
df.date.apply(lambda x: pd.to_datetime(x).strftime('%m/%d/%Y')[0])
Test Code:
import pandas as pd
df = pd.DataFrame([
    [['']],
    [['May 23rd, 2011']],
    [['January 1st, 2010']],
    [['Apr. 15, 2008']],
    [['07-11-2013']],
    [['9/01/1995']],
    [['04/15/2000']],
    [['11/22/68']],
    [['12/1997']],
    [['08/2002']],
    [['2014']],
    [['2016']],
], columns=['date'])
df['clean_date'] = df.date.apply(
    lambda x: pd.to_datetime(x).strftime('%m/%d/%Y')[0])
print(df)
Results:
date clean_date
0 [] NaT
1 [May 23rd, 2011] 05/23/2011
2 [January 1st, 2010] 01/01/2010
3 [Apr. 15, 2008] 04/15/2008
4 [07-11-2013] 07/11/2013
5 [9/01/1995] 09/01/1995
6 [04/15/2000] 04/15/2000
7 [11/22/68] 11/22/1968
8 [12/1997] 12/01/1997
9 [08/2002] 08/01/2002
10 [2014] 01/01/2014
11 [2016] 01/01/2016
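A variant of the same approach first unwraps each one-element list and then parses every string on its own, so that one unparseable value becomes NaT instead of raising. This is only a sketch: parsing of free-form strings such as 'May 23rd, 2011' relies on pandas' fallback date parser and may emit a warning on recent versions.

```python
import pandas as pd

# Small sample in the same shape as the question: one-element lists
df = pd.DataFrame({"date": [["May 23rd, 2011"], ["04/15/2000"], ["2014"], [""]]})

# Take the single string out of each list
first = df["date"].apply(lambda cell: cell[0])

# Parse each string individually; values that cannot be parsed become NaT
parsed = pd.to_datetime(first.apply(lambda s: pd.to_datetime(s, errors="coerce")))

# Format as MM/DD/YYYY; NaT rows stay missing
df["clean_date"] = parsed.dt.strftime("%m/%d/%Y")
print(df)
```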
It may be better to convert the column with pd.to_datetime first, which gives you a datetime column; you can then apply strftime to get the MM/DD/YYYY format:
df['Date_ColumnName'] = pd.to_datetime(df['Date_ColumnName'], dayfirst = False, yearfirst = False)
The code below handles the following scenarios:
Change the date format from M/D/YYYY to MM/DD/YYYY (5/2/2009 to 05/02/2009)
Change from any other format to MM/DD/YYYY
import pandas as pd
'''
* checking provided input file date format correct or not
* if format is correct change date format from M/D/YY to MM/DD/YY
* else date format is not correct in input file
Date format change from ANY FORMAT to MM/DD/YY
'''
input_file_name = 'C:/Users/Admin/Desktop/SarenderReddy/predictions.csv'
dest_file_name = 'C:/Users/Admin/Desktop/SarenderReddy/Enrich.csv'
#input_file_name = 'C:/Users/Admin/Desktop/SarenderReddy/enrichment.csv'
read_data = pd.read_csv(input_file_name)
print(pd.to_datetime(read_data['Date'], format='%m/%d/%Y', errors='coerce').notnull().all())
if pd.to_datetime(read_data['Date'], format='%m/%d/%Y', errors='coerce').notnull().all():
    print("Provided correct input date format in input file....!")
    read_data['Date'] = pd.to_datetime(read_data['Date'],format='%m/%d/%Y')
    read_data['Date'] = read_data['Date'].dt.strftime('%m/%d/%Y')
    read_data.to_csv(dest_file_name,index=False)
    print(read_data['Date'])
else:
    print("NOT... Provided correct input date format in input file....!")
    data_format = pd.read_csv(input_file_name,parse_dates=['Date'], dayfirst=True)
    #print(df['Date'])
    data_format['Date'] = pd.to_datetime(data_format['Date'],format='%m/%d/%Y')
    data_format['Date'] = data_format['Date'].dt.strftime('%m/%d/%Y')
    data_format.to_csv(dest_file_name,index=False)
    print(data_format['Date'])

How can I get all the dates within a week of a certain day using datetime?

I have some measurements that happened on specific days in a dictionary. It looks like
date_dictionary['YYYY-MM-DD'] = measurement.
I want to calculate the variance between the measurements within 7 days from a given date. When I convert the date strings to a datetime.datetime, the result looks like a tuple or an array, but doesn't behave like one.
Is there an easy way to generate all the dates one week from a given date? If so, how can I do that efficiently?
You can do this using timedelta. Example:
>>> from datetime import datetime,timedelta
>>> d = datetime.strptime('2015-07-22','%Y-%m-%d')
>>> for i in range(1,8):
...     print(d + timedelta(days=i))
...
2015-07-23 00:00:00
2015-07-24 00:00:00
2015-07-25 00:00:00
2015-07-26 00:00:00
2015-07-27 00:00:00
2015-07-28 00:00:00
2015-07-29 00:00:00
You do not actually need to print it, datetime object + timedelta object returns a datetime object. You can use that returned datetime object directly in your calculation.
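For the dictionary described in the question, the generated dates can be turned back into 'YYYY-MM-DD' keys and used for lookups. The sketch below makes several assumptions: the sample measurements are invented, the helper name measurements_within_week is hypothetical, the window is the given day plus the following six days, and population variance is used.

```python
from datetime import datetime, timedelta
from statistics import pvariance

# Invented sample data in the shape described in the question
date_dictionary = {
    "2015-07-22": 1.0,
    "2015-07-23": 3.0,
    "2015-07-25": 5.0,
    "2015-08-01": 100.0,  # outside the 7-day window
}

def measurements_within_week(start, data):
    """Return the measurements for `start` and the six days after it."""
    d = datetime.strptime(start, "%Y-%m-%d")
    keys = [(d + timedelta(days=i)).strftime("%Y-%m-%d") for i in range(7)]
    # Days without a measurement are skipped
    return [data[k] for k in keys if k in data]

values = measurements_within_week("2015-07-22", date_dictionary)
print(values, pvariance(values))
```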
Using datetime, to generate a list of 7 dates starting with the given date (the given date plus the 6 that follow it), you can do:
import datetime
dt = datetime.datetime(...)
week_dates = [ dt + datetime.timedelta(days=i) for i in range(7) ]
There are libraries providing nicer APIs for performing datetime/date operations, most notably pandas (though it includes much much more). See pandas.date_range.
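As a rough illustration of the pandas route, date_range can produce the same seven dates in one call. The start date is just the example date used earlier on this page.

```python
import pandas as pd

# Seven consecutive days starting at the given date
week_dates = pd.date_range("2015-07-22", periods=7, freq="D")
print(week_dates)
```

The result is a DatetimeIndex, so it can be iterated over or used to select rows from a date-indexed Series.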

How to find the number of the day in a year based on the actual dates using Pandas?

My data frame data has a date variable dateOpen with the following format date_format = "%Y-%m-%d %H:%M:%S.%f" and I would like to have a new column called openDay which is the day number based on 365 days a year. I tried applying the following
data['dateOpen'] = [datetime.strptime(dt, date_format) for dt in data['dateOpen']]
data['openDay'] = [dt.day for dt in data['dateOpen']]
however, I get the day in the month. For example if the date was 2013-02-21 10:12:14.3 then the above formula would return 21. However, I want it to return 52 which is 31 days from January plus the 21 days from February.
Is there a simple way to do this in Pandas?
On recent pandas versions you can use the .dt datetime properties:
>>> ts = pd.Series(pd.to_datetime(['2013-02-21 10:12:14.3']))
>>> ts
0 2013-02-21 10:12:14.300000
dtype: datetime64[ns]
>>> ts.dt.dayofyear
0 52
dtype: int64
On older versions, you may be able to convert to a DatetimeIndex and then use .dayofyear property:
>>> pd.Index(ts).dayofyear # may work
array([52], dtype=int32)
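Applied to the data frame from the question, this could look like the sketch below. The two sample rows are invented; the column names and the format string are taken from the question.

```python
import pandas as pd

# Invented sample rows; column names follow the question
data = pd.DataFrame({"dateOpen": ["2013-02-21 10:12:14.3", "2013-12-31 23:59:59.0"]})

# Convert the strings to datetime64 in one vectorized call
data["dateOpen"] = pd.to_datetime(data["dateOpen"], format="%Y-%m-%d %H:%M:%S.%f")

# Day number within the year (1 to 365, or 366 in leap years)
data["openDay"] = data["dateOpen"].dt.dayofyear
print(data)
```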
Not sure if there's a pandas builtin, but in Python, you can get the "Julian" day, eg:
data['openDay'] = [int(format(dt, '%j')) for dt in data['dateOpen']]
Example:
>>> from datetime import datetime
>>> int(format(datetime(2013,2,21), '%j'))
52
# To find the day number within the current year so far
from datetime import datetime
from datetime import date
today = date.today()
print("Today's date:", today)
print(int(format(today, '%j')))
Today's date: 2020-03-26
86

Reading .dat file date string

In Python:
I have a .dat file in which one column is a date string 'yyyy-mm-dd'. The years in that column range from 2000 to 2010, and I only want to use 2005.
How can I read it using np.loadtxt while keeping the same format?
I am then going to use:
time_string = yyyy-mm-dd
doy = int (time.strftime ("%j", time.strptime ( time_string, "%Y, %m, %d")))
to convert yyyy-mm-dd to day of year (1-365)
The question gives no reason for using numpy's loadtxt specifically, so presumably how the data is loaded does not matter much. That said, in this case you can simply pass dtype=object when loading it.
Suppose this is your .dat file, let us call it d1.dat:
1 2000-01-01 blah
2 2005-01-01 bleh
3 2006-02-03 blih
4 2008-03-04 bloh
5 2010-04-05 bluh
6 2005-03-12 blahr
Then (for example) to load it using numpy:
import numpy
data = numpy.loadtxt('d1.dat', usecols=[1,2], dtype=object)
Now you can apply your function to extract the day of the year from the first column in data:
import time
for date, _ in data:
    # %j formats the day of the year as a zero-padded string, e.g. '001'
    print(time.strftime("%j", time.strptime(date, "%Y-%m-%d")))
from datetime import datetime
time_string = '2012-12-31'
dt = datetime.strptime(time_string, '%Y-%m-%d')
print(dt.timetuple().tm_yday)
366
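Neither snippet above restricts the data to 2005, which the question asks for. One way to do that, sketched with plain Python instead of numpy, is to check the parsed year before computing the day of the year. The rows list is an in-memory stand-in for a few lines of d1.dat, and the name doy_2005 is made up for this example.

```python
from datetime import datetime

# In-memory stand-in for some lines of d1.dat
rows = [
    "1 2000-01-01 blah",
    "2 2005-01-01 bleh",
    "3 2006-02-03 blih",
    "6 2005-03-12 blahr",
]

doy_2005 = []
for row in rows:
    _, date_str, label = row.split()
    dt = datetime.strptime(date_str, "%Y-%m-%d")
    # Keep only the rows whose date falls in 2005
    if dt.year == 2005:
        doy_2005.append((date_str, dt.timetuple().tm_yday, label))

print(doy_2005)
```

The date string is kept unchanged in each tuple, so the original 'yyyy-mm-dd' format is preserved alongside the day of the year.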
