Dealing with different date formats in Python

So I have an issue with dates coming from an Excel sheet, which I'm transforming into a CSV and then loading into a DataFrame. The data I'm dealing with each day can come in two different formats. The two date columns are called Appointment Date and Attended Date.
I'm dealing with (DD/MM/YYYY HH:MM) and (YYYY/MM/DD HH:MM), and it's coming from a third party, so I can't set the date format. What I need to do is parse the data, remove the HH:MM, and output only DD/MM/YYYY.
My current code is the following:
df['Appointment Date'] = df['Appointment Date'].str.replace(' ', '/', regex=True)
df['Attended Date'] = df['Attended Date'].str.replace(' ', '/', regex=True)
df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M").dt.strftime("%d/%m/%Y")
df['Attended Date'] = pd.to_datetime(df['Attended Date'], format="%d/%m/%Y/%H:%M").dt.strftime("%d/%m/%Y")
But I'm not able to parse the data when it comes through as YYYY/MM/DD HH:MM.
Exception error:
time data '2021-10-08/00:00:00' does not match format '%d/%m/%Y/%H:%M' (match)
Any ideas on how I can get around this?

Try it one way, and if it doesn't work, try it the other way.
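try:
    df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M:%S").dt.strftime("%d/%m/%Y")
except ValueError:
    df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%Y/%m/%d/%H:%M:%S").dt.strftime("%d/%m/%Y")
pd.to_datetime raises a ValueError when the strings don't match the format, so that is the exception to catch. Note that each format is tried on the whole column at once, so this only works if every value in a column shares one format. If the year-first values use dashes, as in the exception above ('2021-10-08/00:00:00'), the second format needs to be "%Y-%m-%d/%H:%M:%S".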
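If both layouts can appear within the same column, trying one format for the whole column will fail. A minimal sketch, assuming the columns hold the raw strings (without the ' ' to '/' replacement from the question) in the layouts named in the question plus the dash-and-seconds layout visible in the exception: parse with each format using errors="coerce", so non-matching rows become NaT, and fill the gaps from the next format. The helper name normalise_dates and the sample data are hypothetical.

```python
import pandas as pd

# Candidate layouts, tried in order; the third matches strings like
# '2021-10-08 00:00:00' seen in the exception (an assumption about the raw data).
FORMATS = ["%d/%m/%Y %H:%M", "%Y/%m/%d %H:%M", "%Y-%m-%d %H:%M:%S"]


def normalise_dates(series, formats=FORMATS):
    """Hypothetical helper: parse mixed-format strings, return DD/MM/YYYY strings."""
    # errors="coerce" turns rows that don't match this format into NaT
    result = pd.to_datetime(series, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        # Only rows that are still NaT get filled from the next format
        result = result.fillna(pd.to_datetime(series, format=fmt, errors="coerce"))
    # Rows matching no format stay NaT and come out as NaN after strftime
    return result.dt.strftime("%d/%m/%Y")


df = pd.DataFrame({
    "Appointment Date": ["31/12/2020 23:59", "2021/10/08 00:00"],
    "Attended Date": ["2021-10-08 00:00:00", "08/10/2021 14:15"],
})
for col in ["Appointment Date", "Attended Date"]:
    df[col] = normalise_dates(df[col])

print(df)
```

Because each row is matched independently, the order of FORMATS only matters if a string could match more than one layout.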

I would use regular expressions for that as follows:
import pandas as pd
df = pd.DataFrame({"daytime": ["31/12/2020 23:59", "2020/12/31 23:59"]})
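# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
df["daypart"] = df["daytime"].str.replace(r" \d\d:\d\d", "", regex=True)  # drop HH:MM part
df["day"] = df["daypart"].str.replace(r"(\d\d\d\d)/(\d\d)/(\d\d)", r"\3/\2/\1", regex=True)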
print(df)
Output:
            daytime     daypart         day
0  31/12/2020 23:59  31/12/2020  31/12/2020
1  2020/12/31 23:59  2020/12/31  31/12/2020
Explanation: the second .replace uses so-called capturing groups. If there is a (4 digits)/(2 digits)/(2 digits) pattern, the groups are rearranged so that the 3rd becomes 1st, the 2nd stays 2nd and the 1st becomes 3rd (note that groups are 1-based, not 0-based as in general Python indexing). As the day format is now consistent, you can parse it easily.
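Once every string is in one layout, a single explicit format is enough to parse the column. A sketch on the same sample data; the ^ and $ anchors are a small tightening of the patterns above so only whole values are rewritten:

```python
import pandas as pd

df = pd.DataFrame({"daytime": ["31/12/2020 23:59", "2020/12/31 23:59"]})

day = (
    df["daytime"]
    # drop the trailing " HH:MM"
    .str.replace(r" \d\d:\d\d$", "", regex=True)
    # reorder YYYY/MM/DD into DD/MM/YYYY via capturing groups
    .str.replace(r"^(\d{4})/(\d\d)/(\d\d)$", r"\3/\2/\1", regex=True)
)
# Every value is now DD/MM/YYYY, so one explicit format parses the column
df["date"] = pd.to_datetime(day, format="%d/%m/%Y")
print(df)
```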

As mentioned by @C14L, that method can be followed, but judging by your exception, you need to add seconds (%S) to your time format, so the updated code would be:
try:
    df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M:%S").dt.strftime("%d/%m/%Y")
except ValueError:  # raised when the strings don't match the format
    df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%Y/%m/%d/%H:%M:%S").dt.strftime("%d/%m/%Y")

The format %d/%m/%Y/%H:%M does not match the date-time string 2021-10-08/00:00:00. You need to use %Y-%m-%d/%H:%M:%S for this string.
Demo:
from datetime import datetime
date_time_str = '2021-10-08/00:00:00'
date_str = datetime.strptime(date_time_str, '%Y-%m-%d/%H:%M:%S').strftime('%d/%m/%Y')
print(date_str)
Output:
08/10/2021

Related

Python Pandas Convert 10 digit datetime to a proper date format

I have an Excel file which contains dates as 10-digit numbers.
For example,
Order Date as 1806825282.731065,
Purchase Date as 1806765295
Does anyone know how to convert them to a proper date format such as dd/mm/yyyy hh:mm or dd/mm/yyyy? Any date format will be fine.
I tried pd.to_datetime, but it does not work.
Thanks!
You can do this
(pd.to_timedelta(1806825282, unit='s') + pd.to_datetime('1960-1-1'))
or
(pd.to_timedelta(df['Order Date'], unit='s') + pd.to_datetime('1960-1-1'))
SAS timestamps are stored as seconds since 1960-01-01:
import pandas as pd
origin = pd.Timestamp('1960-1-1')
df = pd.DataFrame({'Order Date': [1806825282.731065],
                   'Purchase Date': [1806765295]})
df['Order Date'] = origin + pd.to_timedelta(df['Order Date'], unit='s')
df['Purchase Date'] = origin + pd.to_timedelta(df['Purchase Date'], unit='s')
Output:
>>> df
Order Date Purchase Date
0 2017-04-03 07:54:42.731065035 2017-04-02 15:14:55
From The Essential Guide to SAS Dates and Times:
SAS has three separate counters that keep track of dates and times. The date counter started at zero on January 1, 1960. Any day before 1/1/1960 is a negative number, and any day after that is a positive number. Every day at midnight, the date counter is increased by one. The time counter runs from zero (at midnight) to 86,399.9999, when it resets to zero. The last counter is the datetime counter. This is the number of seconds since midnight, January 1, 1960. Why January 1, 1960? One story has it that the founders of SAS wanted to use the approximate birth date of the IBM 370 system, and they chose January 1, 1960 as an easy-to-remember approximation.
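Instead of adding a timedelta by hand, pd.to_datetime also accepts an origin argument that sets the epoch, so the 1960 SAS origin can be passed directly together with unit='s'. A sketch with the question's two values:

```python
import pandas as pd

df = pd.DataFrame({'Order Date': [1806825282.731065],
                   'Purchase Date': [1806765295]})

# SAS datetimes count seconds from 1960-01-01, so use that as the origin
sas_epoch = pd.Timestamp('1960-01-01')
for col in ['Order Date', 'Purchase Date']:
    df[col] = pd.to_datetime(df[col], unit='s', origin=sas_epoch)

# Optional: render as dd/mm/yyyy hh:mm strings
formatted = df['Purchase Date'].dt.strftime('%d/%m/%Y %H:%M')
print(df)
print(formatted)
```

Leaving origin at its default ('unix') would count from 1970 instead and shift every date by ten years.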
According to the pandas documentation:
https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
Code
>>> pd.to_datetime(1674518400, unit='s')
Timestamp('2023-01-24 00:00:00')
>>> pd.to_datetime(1674518400433502912, unit='ns')
Timestamp('2023-01-24 00:00:00.433502912')
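# template: DATE_FIELD is your column name; use unit='s' for 10-digit (seconds) values, unit='ms' for 13-digit (milliseconds) values
df[DATE_FIELD] = pd.to_datetime(df[DATE_FIELD], unit='s')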
If the values are Unix timestamps (seconds since 1970-01-01), you can use something like this:
# Convert the 10-digit datetime to a datetime object
df['date_column'] = pd.to_datetime(df['date_column'], unit='s')
# Format the datetime object to the desired format
df['date_column'] = df['date_column'].dt.strftime('%d/%m/%Y %H:%M')
Or if you want a one-liner:
df['date_column'] = pd.to_datetime(df['date_column'], unit='s').dt.strftime('%d/%m/%Y %H:%M')

group by with year of the date

I have a date column in Excel in year/month/day (YYYY/MM/DD) format. I want to extract only the year of each date and group the column by year, but I got an error:
df.index = pd.to_datetime(df[18], format='%y/%m/%d %I:%M%p')
df.groupby(by=[df.index.year])
18 is the index of my date column.
Error: ValueError: time data '2022/04/23' does not match format '%y/%m/%d %I:%M%p' (match)
I don't know how to fix it.
By the looks of it, the error message indicates that the format string you are using, %y/%m/%d %I:%M%p, doesn't match the format of the dates in your column.
It appears that your dates are in YYYY/MM/DD format, but the format string you're using expects a two-digit year (%y) followed by a 12-hour time with AM/PM (%I:%M%p).
I think you should change the format string to %Y/%m/%d.
df.index = pd.to_datetime(df[18], format='%Y/%m/%d')
Then you can extract the year using the year attribute of the datetime object, and group by the year as you are doing.
Make sure your date column is parsed with the correct format. Here is code that does that:
import pandas as pd
df = pd.DataFrame({'date': ['2022/04/23', '2022/04/24', '2022/04/25']})
df['date'] = pd.to_datetime(df['date'], format='%Y/%m/%d')
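To finish the original goal, a sketch of grouping by year after parsing. groupby on its own only builds the groups, so an aggregation is needed to get a result; the amount column is a hypothetical value column, since the question doesn't say what is being summarised:

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2022/04/23', '2022/04/24', '2023/01/05'],
    'amount': [10, 20, 5],  # hypothetical column to aggregate
})
df['date'] = pd.to_datetime(df['date'], format='%Y/%m/%d')

# Group on the year component and sum the hypothetical amount column
per_year = df.groupby(df['date'].dt.year)['amount'].sum()
print(per_year)
```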

change YYYYDDMM to YYYYMMDD in python

I have a df with dates in a column converted to a datetime. The current format is YYYYDDMM and I need it converted to YYYYMMDD. I tried the code below, but it does not change the format and still gives me YYYYDDMM. The end goal is to subtract 1 business day from the effective date, but the format needs to be YYYYMMDD to do this; otherwise it subtracts 1 day from the month and not the day. Can someone help?
filtered_df['Effective Date'] = pd.to_datetime(filtered_df['Effective Date'])
# Effective Date = 20220408 (4th Aug 2022 for clarity)
filtered_df['Effective Date new'] = filtered_df['Effective Date'].dt.strftime("%Y%m%d")
# Effective Date new = 20220408
Desired output: Effective Date new = 20220804
By default, .to_datetime will interpret the input YYYYDDMM as YYYYMMDD, and therefore print the same thing with %Y%m%d as the format. You can fix this and make it properly parse days in the month greater than 12 by adding the dayfirst keyword argument.
filtered_df['Effective Date'] = pd.to_datetime(filtered_df['Effective Date'], dayfirst=True)
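An alternative that does not depend on how dayfirst is applied to compact 8-digit strings is to pass an explicit format, after which the business-day step from the question can be done with a pandas offset. A sketch, assuming 'Effective Date' still holds the raw YYYYDDMM values (strings or integers) before any conversion; the second sample value and the new column name 'Previous Business Day' are hypothetical:

```python
import pandas as pd

filtered_df = pd.DataFrame({'Effective Date': ['20220408', '20221508']})

# %Y%d%m tells pandas the day comes before the month
filtered_df['Effective Date'] = pd.to_datetime(
    filtered_df['Effective Date'].astype(str), format='%Y%d%m'
)
filtered_df['Effective Date new'] = filtered_df['Effective Date'].dt.strftime('%Y%m%d')

# BDay(1) steps back one business day (Mon-Fri; public holidays are not considered)
previous_bday = filtered_df['Effective Date'] - pd.offsets.BDay(1)
filtered_df['Previous Business Day'] = previous_bday.dt.strftime('%Y%m%d')
print(filtered_df)
```

The second value falls on a Monday, so the offset skips the weekend and lands on the previous Friday.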
I like to use the datetime library for this purpose. You can use strptime to convert a string into the datetime object and strftime to convert your datetime object to the new string.
from datetime import datetime
def change_date(row):
    # assumes 'Effective Date' still holds the original strings, e.g. '20220408'
    row["Effective Date new"] = datetime.strptime(row["Effective Date"], "%Y%d%m").strftime("%Y%m%d")
    return row
df2 = df.apply(change_date, axis=1)
The output df2 will have Effective Date new as your new column.

Making both day-first and month-first dates in a csv file day-first

I have a csv file that has a column of dates. The dates are in order of month - so January comes first, then Feb, and so on. The problem is some of the dates are in mm/dd/yyyy format and others in dd/mm/yyyy format. Here's what it looks like.
Date
01/08/2005
01/12/2005
15/01/2005
19/01/2005
22/01/2005
26/01/2005
29/01/2005
03/02/2005
05/02/2005
...
I would like to bring all of them to the same format (dd/mm/yyyy).
I am using Python and pandas to read and edit the csv file. I tried using Excel to manually change the date formats using the built-in formatting tools but it seems impossible with the large number of rows. I'm thinking of using regex but I'm not quite sure how to distinguish between month-first and day-first.
# here's what I have so far
import re

for i in df.index:
    date = df.loc[i, 'Date']
    pattern = r'\d\d/\d\d/\d\d'
    match = re.search(pattern, date)
    if match:
        date_items = date.split('/')
        day = date_items[1]
        month = date_items[0]
        year = date_items[2]
        new_date = f'{day}/{month}/{year}'
        df.loc[i, 'Date'] = new_date
I want the csv to have a uniform date format in the end.
In short: you can't!
There's no way for you to know if 01/02/2019 is Jan 2nd or Feb 1st!
Same goes for other dates in your examples such as:
01/08/2005
01/12/2005
03/02/2005
05/02/2005
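What can be done is to fix the dates whose order is unambiguous and flag the rest for manual review. A minimal sketch: if the first field is greater than 12 it must be the day; if the second field is greater than 12 the date must be mm/dd and is swapped; if both fields are equal the order doesn't matter; anything else is returned as None. The function name to_day_first and the third sample value are hypothetical.

```python
import pandas as pd


def to_day_first(date_str):
    """Return date_str as dd/mm/yyyy, or None when the order can't be determined."""
    first, second, year = date_str.split('/')
    a, b = int(first), int(second)
    if a > 12 or a == b:
        # first field can only be a day (or both orders give the same date)
        return date_str
    if b > 12:
        # second field can only be a day: mm/dd/yyyy, so swap
        return f'{second}/{first}/{year}'
    # both fields <= 12: genuinely ambiguous
    return None


df = pd.DataFrame({'Date': ['01/08/2005', '15/01/2005', '01/25/2005']})
df['Date (dd/mm/yyyy)'] = df['Date'].map(to_day_first)
print(df)
```

The rows left as None are exactly the ones described above, such as 01/08/2005, where only outside information can settle the order.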

How to format date to 1900's?

I'm preprocessing data, and one column represents dates such as '6/1/51'.
I'm trying to convert the string to a date object and so far what I have is:
date = row[2].strip()
format = "%m/%d/%y"
datetime_object = datetime.strptime(date, format)
date_object = datetime_object.date()
print(date_object)
print(type(date_object))
The problem I'm facing is changing 2051 to 1951.
I tried writing
format = "%m/%d/19%y"
But it gives me a ValueError.
ValueError: time data '6/1/51' does not match format '%m/%d/19%y'
I couldn't easily find the answer online so I'm asking here. Can anyone please help me with this?
Thanks.
Parse the date without the century using '%m/%d/%y' (Python maps %y values 00-68 to 2000-2068 and 69-99 to 1969-1999, which is why 51 becomes 2051), then:
year_1900 = datetime_object.year - 100
datetime_object = datetime_object.replace(year=year_1900)
You should put conditionals around that so you only do it on dates that are actually in the 1900s, for example any date that parses as later than today.
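A sketch of that conditional, treating any parsed date later than a reference day as belonging to the previous century. The cut-off rule and the function name parse_two_digit_year are assumptions; adjust them to what your data allows (for example, a birth-date column could never be in the future).

```python
from datetime import date, datetime


def parse_two_digit_year(text, today=None):
    """Parse 'm/d/yy'; push dates that land in the future back 100 years."""
    today = today or date.today()
    parsed = datetime.strptime(text.strip(), "%m/%d/%y").date()
    # '51' parses as 2051 here, which is after today, so move it to 1951
    if parsed > today:
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed


print(parse_two_digit_year('6/1/51', today=date(2024, 1, 1)))
```

Passing today explicitly keeps the result reproducible; leaving it out uses the current date.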
