Change YYYYDDMM to YYYYMMDD in Python

I have a df with dates in a column converted to datetime. The current format is YYYYDDMM and I need it converted to YYYYMMDD. I tried the code below, but it does not change the format and still gives me YYYYDDMM. The end goal is to subtract 1 business day from the effective date, but the format needs to be YYYYMMDD to do this; otherwise it subtracts 1 day from the month and not the day. Can someone help?
filtered_df['Effective Date'] = pd.to_datetime(filtered_df['Effective Date'])
# Effective Date = 20220408 (4th Aug 2022 for clarity)
filtered_df['Effective Date new'] = filtered_df['Effective Date'].dt.strftime("%Y%m%d")
# Effective Date new = 20220408
desired output -- > Effective Date new = 20220804

By default, pd.to_datetime interprets an input written as YYYYDDMM as YYYYMMDD, so formatting the result with %Y%m%d prints the same digits back. One option is the dayfirst keyword argument, which tells the parser to prefer day-before-month when a date is ambiguous.
filtered_df['Effective Date'] = pd.to_datetime(filtered_df['Effective Date'], dayfirst=True)
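Since dayfirst is a preference rather than a strict rule, spelling out the layout with format removes the guesswork. The following is only a sketch: it assumes the column holds eight-digit strings laid out as YYYYDDMM (the sample values are made up), and it uses pd.offsets.BDay for the business-day step, which skips weekends but knows nothing about holidays.

```python
import pandas as pd

# Hypothetical sample data: eight-digit strings laid out as YYYYDDMM.
filtered_df = pd.DataFrame({"Effective Date": ["20220408", "20221304"]})

# An explicit format states which digits are the day and which are the month.
filtered_df["Effective Date"] = pd.to_datetime(
    filtered_df["Effective Date"], format="%Y%d%m"
)

# Subtract one business day (weekends are skipped, holidays are not considered).
filtered_df["Previous Business Day"] = (
    filtered_df["Effective Date"] - pd.offsets.BDay(1)
)

# Render as YYYYMMDD text only at the end, when a string is needed for output.
filtered_df["Effective Date new"] = filtered_df["Effective Date"].dt.strftime("%Y%m%d")

print(filtered_df)
```

Doing the date arithmetic on the datetime column and formatting afterwards avoids subtracting from the wrong field. If the column holds integers rather than strings, converting with .astype(str) first would be needed.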

I like to use the datetime library for this purpose. You can use strptime to convert a string into the datetime object and strftime to convert your datetime object to the new string.
from datetime import datetime
def change_date(row):
    # Parse the YYYYDDMM string, then write it back out as YYYYMMDD
    row["Effective Date new"] = datetime.strptime(row["Effective Date"], "%Y%d%m").strftime("%Y%m%d")
    return row
df2 = df.apply(change_date, axis=1)
The output df2 will have Effective Date new as your new column.
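To check the strptime/strftime round trip on its own before applying it to a whole frame, a small standalone version can help. This is a sketch; swap_day_month is a hypothetical helper name, and it assumes the input is a string, not an already-parsed datetime.

```python
from datetime import datetime


def swap_day_month(text: str) -> str:
    """Read a YYYYDDMM string and return the same date as YYYYMMDD."""
    parsed = datetime.strptime(text, "%Y%d%m")
    return parsed.strftime("%Y%m%d")


print(swap_day_month("20220408"))
```

Because strptime validates the fields, a value such as "20223204" (day 32) raises ValueError instead of silently producing a wrong date.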

Related

Python Pandas Convert 10 digit datetime to a proper date format

I have an excel file which contains date format in 10 digit.
For example,
Order Date as 1806825282.731065,
Purchase Date as 1806765295
Does anyone know how to convert them to a proper date format such as dd/mm/yyyy hh:mm or dd/mm/yyyy? Any date format will be fine.
I tried pd.to_datetime but does not work.
Thanks!
You can do this
(pd.to_timedelta(1806825282, unit='s') + pd.to_datetime('1960-1-1'))
or
(pd.to_timedelta(df['Order Date'], unit='s') + pd.to_datetime('1960-1-1'))
SAS timestamps are stored as seconds since 1960-01-01:
import pandas as pd
origin = pd.Timestamp('1960-1-1')
df = pd.DataFrame({'Order Date': [1806825282.731065],
'Purchase Date': [1806765295]})
df['Order Date'] = origin + pd.to_timedelta(df['Order Date'], unit='s')
df['Purchase Date'] = origin + pd.to_timedelta(df['Purchase Date'], unit='s')
Output:
>>> df
                     Order Date       Purchase Date
0 2017-04-03 07:54:42.731065035 2017-04-02 15:14:55
From The Essential Guide to SAS Dates and Times
SAS has three separate counters that keep track of dates and times. The date counter started
at zero on January 1, 1960. Any day before 1/1/1960 is a negative number, and any day
after that is a positive number. Every day at midnight, the date counter is increased by one.
The time counter runs from zero (at midnight) to 86,399.9999, when it resets to zero. The last
counter is the datetime counter. This is the number of seconds since midnight, January 1, 1960. Why January 1, 1960? One story has it that the founders of SAS wanted to use the
approximate birth date of the IBM 370 system, and they chose January 1, 1960 as an easy-
to-remember approximation.
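The date and datetime counters described in that passage can be sketched with the standard library alone. This assumes the values really are SAS counters (days or seconds since 1960-01-01) and ignores time zones; the function names are made up for illustration.

```python
from datetime import date, datetime, timedelta

# Both SAS counters start at midnight, January 1, 1960.
SAS_EPOCH = datetime(1960, 1, 1)


def sas_datetime_to_datetime(seconds: float) -> datetime:
    """SAS datetime counter: seconds since 1960-01-01 00:00:00."""
    return SAS_EPOCH + timedelta(seconds=seconds)


def sas_date_to_date(days: int) -> date:
    """SAS date counter: days since 1960-01-01 (negative means earlier)."""
    return SAS_EPOCH.date() + timedelta(days=days)


print(sas_datetime_to_datetime(1806765295))  # the Purchase Date from the question
print(sas_date_to_date(0))
```

The Purchase Date value comes out as 2017-04-02 15:14:55, matching the pandas result above.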
According to The Pandas Documentation Link:
https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
Code
>>> pd.to_datetime(1674518400, unit='s')
Timestamp('2023-01-24 00:00:00')
>>> pd.to_datetime(1674518400433502912, unit='ns')
Timestamp('2023-01-24 00:00:00.433502912')
# you can use template
df[DATE_FIELD]=(pd.to_datetime(df[DATE_FIELD],unit='ms'))
You can use something like this:
# Convert the 10-digit datetime to a datetime object
df['date_column'] = pd.to_datetime(df['date_column'], unit='s')
# Format the datetime object to the desired format
df['date_column'] = df['date_column'].dt.strftime('%d/%m/%Y %H:%M')
Or if you want a one-liner:
df['date_column'] = pd.to_datetime(df['date_column'], unit='s').dt.strftime('%d/%m/%Y %H:%M')

group by with year of the date

I have a date column in Excel in year/month/day format. I want to extract only the year from each date and group the column by year, but I get an error.
df.index = pd.to_datetime(df[18], format='%y/%m/%d %I:%M%p')
df.groupby(by=[df.index.year])
18 is index of my date column
error=ValueError: time data '2022/04/23' does not match format '%y/%m/%d %I:%M%p' (match)
I don't know how can I fix it.
By the looks of it, the error message indicates that the format string you are using, %y/%m/%d %I:%M%p, doesn't match the format of the dates in your column.
It appears that your date format is YYYY/MM/DD, but the format string you're using is trying to parse it as YY/MM/DD %I:%M%p.
I think you should change the format string to %Y/%m/%d.
df.index = pd.to_datetime(df[18], format='%Y/%m/%d')
Then you can extract the year using the year attribute of the datetime object, and group by the year as you are doing.
Make sure your date column is formatted correctly. Here is some code you can use to adjust the format of the dates.
import pandas as pd
df = pd.DataFrame({'date': ['2022/04/23', '2022/04/24', '2022/04/25']})
df['date'] = pd.to_datetime(df['date'], format='%Y/%m/%d')
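Once the column is parsed, grouping by year can go through the .dt accessor without moving the dates into the index. A minimal sketch, assuming a made-up amount column to aggregate:

```python
import pandas as pd

# Hypothetical data: the "amount" column exists only to have something to sum.
df = pd.DataFrame({
    "date": ["2021/12/31", "2022/04/23", "2022/04/24"],
    "amount": [5, 10, 20],
})

# %Y is the four-digit year; %y would expect only two digits.
df["date"] = pd.to_datetime(df["date"], format="%Y/%m/%d")

# Group on the year extracted from the datetime column.
totals_by_year = df.groupby(df["date"].dt.year)["amount"].sum()
print(totals_by_year)
```

Note that groupby on its own only builds the groups; an aggregation such as sum(), count() or mean() is what produces a result per year.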

I have a list of dates and I want to subtract each of them from the current date to know how many days have passed. Is there a fast way to do this?

I know I should import datetime to get the current date, but the rest is black magic to me right now.
For example:
dates = ['2019-10-11', '2013-05-16', '2011-06-16', '2000-04-22']
actual_date = datetime.datetime.now()
How can I subtract these and get a new list with the number of days that have passed between each date and actual_date?
If I'm understanding correctly, you need to find the current date, and then find the number of days between the current date and the dates in your list?
If so, you could try this:
from datetime import datetime, date
dates = ['2019-10-11', '2013-05-16', '2011-06-16', '2000-04-22']
actual_date = date.today()
days = []
# The loop variable is named date_string so it does not shadow the imported date class
for date_string in dates:
    date_object = datetime.strptime(date_string, '%Y-%m-%d').date()
    days_difference = (actual_date - date_object).days
    days.append(days_difference)
print(days)
What I am doing here is:
Converting the individual date strings to a "date" object
Subtracting this date from the actual date. The result is a timedelta, so we add .days to keep only the whole number of days.
Save the outcome to a list, although of course you could do whatever you wanted with the output.
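The same three steps also fit in a list comprehension. This is a sketch; days_since is a hypothetical helper, and the reference date is fixed here only so the result is repeatable (date.today() would be passed in real use).

```python
from datetime import date, datetime


def days_since(date_strings, today):
    """Return the number of days from each YYYY-MM-DD string up to today."""
    return [
        (today - datetime.strptime(text, "%Y-%m-%d").date()).days
        for text in date_strings
    ]


dates = ["2019-10-11", "2019-12-31"]
reference = date(2020, 1, 1)  # fixed for the example; use date.today() in practice
print(days_since(dates, reference))
```

Passing the reference date as an argument keeps the function easy to test, since the output no longer depends on when it is run.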

Problem converting time into pandas datetime

I am trying to convert a column containing only hours, minutes and seconds into datetime form using pandas.to_datetime(). However, it adds a year and date automatically. I also tried
pandas.to_datetime(df["time"], format="%H:%M:%S").dt.time, but the data type remains object.
Is there any method that can convert it to a datetime format without the year and date?
Something like this?
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='ignore')
put .dt.time on the end
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='ignore').dt.time
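The dtype stays object after .dt.time because the column then holds plain datetime.time objects, for which pandas has no dedicated dtype. If a native dtype matters (for arithmetic or sorting, say), one alternative is to store the values as timedeltas. A sketch with made-up data:

```python
import pandas as pd

# Hypothetical sample data in HH:MM:SS form.
df = pd.DataFrame({"Time": ["08:15:30", "23:59:59"]})

# Option 1: datetime.time objects (column dtype remains object).
df["as_time"] = pd.to_datetime(df["Time"], format="%H:%M:%S").dt.time

# Option 2: elapsed time since midnight (a native timedelta64 dtype).
df["as_timedelta"] = pd.to_timedelta(df["Time"])

print(df.dtypes)
```

Which one fits depends on the use: time objects read naturally as clock times, while timedeltas can be added to a date column to build full timestamps.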

pandas.to_datetime with different length date strings

I have a column of timestamps that I would like to convert to datetime in my pandas dataframe. The format of the dates is %Y-%m-%d-%H-%M-%S which pd.to_datetime does not recognize. I have manually entered the format as below:
df['TIME'] = pd.to_datetime(df['TIME'], format = '%Y-%m-%d-%H-%M-%S')
My problem is some of the times do not have seconds so they are shorter
(format = %Y-%m-%d-%H-%M).
How can I get all of these strings to datetimes?
I was thinking I could add zero seconds (-0) to the end of my shorter dates but I don't know how to do that.
Try strftime to get the strings into a consistent format. If pandas can't recognize your custom datetime format, you should provide it explicitly.
from functools import partial
df1 = pd.DataFrame({'Date': ['2018-07-02-06-05-23','2018-07-02-06-05']})
newdatetime_fmt = partial(pd.to_datetime, format='%Y-%m-%d-%H-%M-%S')
df1['Clean_Date'] = (df1.Date.str.replace('-','').apply(lambda x: pd.to_datetime(x).strftime('%Y-%m-%d-%H-%M-%S'))
.apply(newdatetime_fmt))
print(df1,df1.dtypes)
output:
                  Date          Clean_Date
0  2018-07-02-06-05-23 2018-07-02 06:05:23
1     2018-07-02-06-05 2018-07-02 06:05:00
Date                  object
Clean_Date    datetime64[ns]
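The idea from the question, adding zero seconds to the shorter strings, can also be done directly. This is a sketch under the assumption that every value is either the full %Y-%m-%d-%H-%M-%S layout or the same layout without the seconds field; the TIME_parsed column name is made up.

```python
import pandas as pd

df = pd.DataFrame({"TIME": ["2018-07-02-06-05-23", "2018-07-02-06-05"]})

# A full timestamp has six fields, so five hyphens; shorter ones lack seconds.
has_seconds = df["TIME"].str.count("-") == 5

# Keep complete strings as they are and append "-00" to the rest.
padded = df["TIME"].where(has_seconds, df["TIME"] + "-00")

# Every string now matches a single explicit format.
df["TIME_parsed"] = pd.to_datetime(padded, format="%Y-%m-%d-%H-%M-%S")
print(df)
```

Padding first means only one format string is needed, and any value that still does not match raises an error instead of being guessed at.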
