I've looked through every thread I can find, and the only one relevant to this type of formatting issue is the one below, but it's for Java:
How parse 2013-03-13T20:59:31+0000 date string to Date
I've got a column with values like 201604 and 201605 that I need to turn into date values like 2016-04-01 and 2016-05-01. To accomplish this, I've done what is below.
#Create Number to build full date
df['DAY_NBR'] = '01'
#Convert Max and Min date to string to do date transformation
df['MAXDT'] = df['MAXDT'].astype(str)
df['MINDT'] = df['MINDT'].astype(str)
#Add the day number to the max date month and year
df['MAXDT'] = df['MAXDT'] + df['DAY_NBR']
#Add the day number to the min date month and year
df['MINDT'] = df['MINDT'] + df['DAY_NBR']
#Convert Max and Min date to integer values
df['MAXDT'] = df['MAXDT'].astype(int)
df['MINDT'] = df['MINDT'].astype(int)
#Convert Max date to datetime
df['MAXDT'] = pd.to_datetime(df['MAXDT'], format='%Y%m%d')
#Convert Min date to datetime
df['MINDT'] = pd.to_datetime(df['MINDT'], format='%Y%m%d')
To be honest, I can work with this output, but it's a little messy because the unique values for the two columns are...
MAXDT Values
['2016-07-01T00:00:00.000000000' '2017-09-01T00:00:00.000000000'
'2018-06-01T00:00:00.000000000' '2017-07-01T00:00:00.000000000'
'2017-03-01T00:00:00.000000000' '2018-12-01T00:00:00.000000000'
'2017-12-01T00:00:00.000000000' '2019-01-01T00:00:00.000000000'
'2018-09-01T00:00:00.000000000' '2018-10-01T00:00:00.000000000'
'2016-04-01T00:00:00.000000000' '2018-03-01T00:00:00.000000000'
'2017-05-01T00:00:00.000000000' '2018-08-01T00:00:00.000000000'
'2017-02-01T00:00:00.000000000' '2016-12-01T00:00:00.000000000'
'2018-01-01T00:00:00.000000000' '2018-02-01T00:00:00.000000000'
'2017-06-01T00:00:00.000000000' '2018-11-01T00:00:00.000000000'
'2018-05-01T00:00:00.000000000' '2019-11-01T00:00:00.000000000'
'2016-06-01T00:00:00.000000000' '2017-10-01T00:00:00.000000000'
'2016-08-01T00:00:00.000000000' '2018-04-01T00:00:00.000000000'
'2016-03-01T00:00:00.000000000' '2016-10-01T00:00:00.000000000'
'2016-11-01T00:00:00.000000000' '2019-12-01T00:00:00.000000000'
'2016-09-01T00:00:00.000000000' '2017-08-01T00:00:00.000000000'
'2016-05-01T00:00:00.000000000' '2017-01-01T00:00:00.000000000'
'2017-11-01T00:00:00.000000000' '2018-07-01T00:00:00.000000000'
'2017-04-01T00:00:00.000000000' '2016-01-01T00:00:00.000000000'
'2016-02-01T00:00:00.000000000' '2019-02-01T00:00:00.000000000'
'2019-07-01T00:00:00.000000000' '2019-10-01T00:00:00.000000000'
'2019-09-01T00:00:00.000000000' '2019-03-01T00:00:00.000000000'
'2019-05-01T00:00:00.000000000' '2019-04-01T00:00:00.000000000'
'2019-08-01T00:00:00.000000000' '2019-06-01T00:00:00.000000000'
'2020-02-01T00:00:00.000000000' '2020-01-01T00:00:00.000000000']
MINDT Values
['2016-04-01T00:00:00.000000000' '2017-07-01T00:00:00.000000000'
'2016-02-01T00:00:00.000000000' '2017-01-01T00:00:00.000000000'
'2017-02-01T00:00:00.000000000' '2018-12-01T00:00:00.000000000'
'2017-08-01T00:00:00.000000000' '2018-04-01T00:00:00.000000000'
'2017-10-01T00:00:00.000000000' '2019-01-01T00:00:00.000000000'
'2018-05-01T00:00:00.000000000' '2018-09-01T00:00:00.000000000'
'2018-10-01T00:00:00.000000000' '2016-01-01T00:00:00.000000000'
'2016-03-01T00:00:00.000000000' '2017-11-01T00:00:00.000000000'
'2017-05-01T00:00:00.000000000' '2018-07-01T00:00:00.000000000'
'2018-06-01T00:00:00.000000000' '2017-12-01T00:00:00.000000000'
'2016-10-01T00:00:00.000000000' '2018-02-01T00:00:00.000000000'
'2017-06-01T00:00:00.000000000' '2018-08-01T00:00:00.000000000'
'2018-03-01T00:00:00.000000000' '2018-11-01T00:00:00.000000000'
'2016-08-01T00:00:00.000000000' '2016-06-01T00:00:00.000000000'
'2018-01-01T00:00:00.000000000' '2016-07-01T00:00:00.000000000'
'2016-11-01T00:00:00.000000000' '2016-09-01T00:00:00.000000000'
'2017-04-01T00:00:00.000000000' '2016-05-01T00:00:00.000000000'
'2017-09-01T00:00:00.000000000' '2016-12-01T00:00:00.000000000'
'2017-03-01T00:00:00.000000000']
I'm trying to build a loop that runs through these dates, and it works, but I don't want to have an index with all of these irrelevant zeros and a T in it. How can I convert these empty timestamp values to just the date that is in yyyy-mm-dd format?
Thank you!
Unfortunately, I believe pandas stores datetime columns as datetime64[ns] by default (newer releases also accept second, millisecond, and microsecond resolution), so a time component is always present. A day-only dtype such as datetime64[D] is not kept on a column, even if you try to cast to it.
It's possible to store these values as strings instead, but the simplest solution is likely to drop the time component at the point where you loop over them (e.g., by calling df['MAXDT'].to_numpy().astype('datetime64[D]') and looping over the resulting NumPy array), or to reformat each value with strftime.
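As a rough sketch of both options, assuming a small made-up frame with the same MAXDT column name: the year-month integers are parsed directly with format='%Y%m' (which avoids appending a day number), and the loop then runs over either date-only strings or a day-resolution NumPy array.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's column.
df = pd.DataFrame({"MAXDT": [201604, 201605, 201604]})

# Parse year-month directly; the day defaults to the first of the month.
df["MAXDT"] = pd.to_datetime(df["MAXDT"].astype(str), format="%Y%m")

# Option 1: date-only strings, handy as loop keys or labels.
labels = list(df["MAXDT"].dt.strftime("%Y-%m-%d").unique())

# Option 2: a NumPy array at day resolution (no time component when printed).
days = df["MAXDT"].to_numpy().astype("datetime64[D]")

for label in labels:
    print(label)
```

The column itself stays a regular datetime column, so date arithmetic and filtering keep working; only the values used in the loop are reformatted.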
Related
I have a df with dates in a column that I converted to datetime. The current format is YYYYDDMM, and I need it converted to YYYYMMDD. I tried the code below, but it does not change the format and still gives me YYYYDDMM. The end goal is to subtract 1 business day from the effective date, but the values need to be read as YYYYMMDD for that to work; otherwise the day is subtracted from the month instead of the day. Can someone help?
filtered_df['Effective Date'] = pd.to_datetime(filtered_df['Effective Date'])
# Effective Date = 20220408 (4th Aug 2022 for clarity)
filtered_df['Effective Date new'] = filtered_df['Effective Date'].dt.strftime("%Y%m%d")
# Effective Date new = 20220408
desired output -- > Effective Date new = 20220804
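By default, pd.to_datetime will interpret the input YYYYDDMM as YYYYMMDD, and therefore print the same thing with %Y%m%d as the format. The dayfirst keyword argument is meant for delimited strings such as 04/08/2022 and may not affect a compact eight-digit value, so the more dependable fix is to state the input layout explicitly with the format argument. That also parses days of the month greater than 12 correctly.
filtered_df['Effective Date'] = pd.to_datetime(filtered_df['Effective Date'].astype(str), format='%Y%d%m')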
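A small sketch of the whole round trip, assuming the column holds YYYYDDMM strings (the sample values and the "Previous Business Day" column name are made up for illustration). It parses with an explicit format, subtracts one business day with pandas' BDay offset, and writes the result back as YYYYMMDD.

```python
import pandas as pd
from pandas.tseries.offsets import BDay

# Hypothetical sample data in YYYYDDMM layout.
filtered_df = pd.DataFrame({"Effective Date": ["20220408", "20221305"]})

# Tell pandas exactly where the day and month sit in the string.
parsed = pd.to_datetime(filtered_df["Effective Date"], format="%Y%d%m")

# Subtract one business day, then format as YYYYMMDD.
filtered_df["Previous Business Day"] = (parsed - BDay(1)).dt.strftime("%Y%m%d")

print(filtered_df)
```

Doing the arithmetic on the parsed datetime values, and formatting only at the end, avoids the day and month being confused during the subtraction.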
I like to use the datetime library for this purpose. You can use strptime to convert a string into the datetime object and strftime to convert your datetime object to the new string.
from datetime import datetime
def change_date(row):
    row["Effective Date new"] = datetime.strptime(row["Effective Date"], "%Y%d%m").strftime("%Y%m%d")
    return row
df2 = df.apply(change_date, axis=1)
The output df2 will have Effective Date new as your new column.
I know I should import datetime to have actual date. But the rest is black magic for me right now.
ex.
dates = ['2019-10-11', '2013-05-16', '2011-06-16', '2000-04-22']
actual_date = datetime.datetime.now()
How can I subtract each of these from actual_date and get a new list with the number of days that have passed between each date and actual_date?
If I'm understanding correctly, you need to find the current date, and then find the number of days between the current date and the dates in your list?
If so, you could try this:
from datetime import datetime, date
dates = ['2019-10-11', '2013-05-16', '2011-06-16', '2000-04-22']
actual_date = date.today()
days = []
# Use a loop variable that does not shadow the imported date class
for date_string in dates:
    date_object = datetime.strptime(date_string, '%Y-%m-%d').date()
    days_difference = (actual_date - date_object).days
    days.append(days_difference)
print(days)
What I am doing here is:
Converting each individual date string to a "date" object.
Subtracting this date from the actual date. The result is a timedelta object, so we add .days to get the number of days as an integer.
Saving the outcome to a list, although of course you could do whatever you wanted with the output.
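The same steps can also be sketched as a list comprehension. This version assumes the strings are in ISO YYYY-MM-DD form, so date.fromisoformat can parse them; days_since is a hypothetical helper name, and the reference date is passed in so the result is reproducible.

```python
from datetime import date


def days_since(date_strings, reference):
    """Return the number of days from each ISO date string to the reference date."""
    return [(reference - date.fromisoformat(s)).days for s in date_strings]


dates = ['2019-10-11', '2013-05-16', '2011-06-16', '2000-04-22']

# Use today's date for the real calculation.
print(days_since(dates, date.today()))
```

Passing the reference date as an argument also makes the function easy to check against a fixed date instead of date.today().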
I have a dataframe with column formatted as a string with value such as '20151' to identify year and quarter, just like '2015Q1'.
I need it to be a proper date-like type, specifically the period[Q-DEC] datatype. A value like '2015Q1' already looks like the right format, but it is actually just a string.
The additional problem is I don't have information of the date, only the period as a string.
How can I convert that string into the period[Q-DEC] datatype in Python?
Convert to the Pandas Period type like this:
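import pandas as pd
d = "20151"
# Insert a "Q" before the last digit: "20151" -> "2015Q1"
t = d[:-1] + "Q" + d[-1:]
# freq="Q" gives a quarterly period (Q-DEC), which is what the question asks for
quarter = pd.Period(t, freq="Q")
print(quarter)
returns:
2015Q1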
Alternatively, if you need a PeriodIndex:
values = ["2015Q1", "2015Q2", "2015Q3", "2015Q4"]
index = pd.PeriodIndex(values, freq="Q")
print(index)
Will return:
PeriodIndex(['2015Q1', '2015Q2', '2015Q3', '2015Q4'], dtype='period[Q-DEC]', freq='Q-DEC')
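For a whole DataFrame column, the same idea can be applied without a Python loop. This is a minimal sketch, assuming every value is a four-digit year followed by a one-digit quarter; the column names period_code and quarter are made up for illustration.

```python
import pandas as pd

# Hypothetical column of year + quarter codes stored as strings.
df = pd.DataFrame({"period_code": ["20151", "20152", "20163"]})

# Build "2015Q1"-style labels with vectorised string slicing.
labels = df["period_code"].str[:4] + "Q" + df["period_code"].str[4]

# Convert the labels to quarterly periods and store them as a column.
df["quarter"] = pd.PeriodIndex(labels, freq="Q")

print(df["quarter"].dtype)
```

Once the column has a period dtype, attributes such as .dt.year and .dt.quarter are available on it.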
I want to add a blank column of date of format "%Y-%m-%d" to a dataframe. I tried datetime.datetime.strptime('0000-00-00',"%Y-%m-%d")
But I get an error ValueError: time data '0000-00-00' does not match format '%Y-%m-%d'
How can I create a column of blank date of format "%Y-%m-%d"?
In R following works.
df$date =""
class(df$date) = "Date"
How can I achieve this in Python?
Thank you.
I don't think that's possible with the datetime module, because year 0, month 0, and day 0 are outside its valid range. The oldest value you can go to is answered here:
What is the oldest time that can be represented in Python?
datetime.MINYEAR
The smallest year number allowed in a date or datetime object. MINYEAR is 1.
datetime.MAXYEAR
The largest year number allowed in a date or datetime object. MAXYEAR is 9999.
source: datetime documentation
initial_date = request.GET.get('data') or datetime.min # datetime.min is 0001-01-01 00:00:00
end_date = request.GET.get('data_f') or datetime.max # datetime.max is 9999-12-31 23:59:59.999999
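If the DataFrame is a pandas one, a common way to get an "empty" date column is to fill it with pd.NaT (pandas' missing-value marker for datetimes) rather than a placeholder date. The sketch below assumes that setup; the frame and its name column are invented for illustration. Note that "%Y-%m-%d" is only a display format, applied when the values are turned into strings.

```python
import pandas as pd

# Hypothetical frame that needs an empty date column.
df = pd.DataFrame({"name": ["a", "b", "c"]})

# A datetime-typed column where every value is missing (NaT).
df["date"] = pd.Series(pd.NaT, index=df.index, dtype="datetime64[ns]")

# Values can be filled in later without changing the column type.
df.loc[0, "date"] = pd.Timestamp("2020-01-31")

# Apply the "%Y-%m-%d" format only when strings are needed.
formatted = df["date"].dt.strftime("%Y-%m-%d")
print(formatted)
```

This is roughly the counterpart of the R snippet in the question: the column has a date type from the start, and the missing entries stay missing instead of holding an invalid date.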
I have data with an array containing dates (YYYY-MM-DD) running from 2005-12-01 to 2012-12-30. The dates are irregular and some are missing in between. I want to take 2005-11-30 as the reference date and express every date in the array as an integer number of days since that reference.
How can I convert my date array into an integer number from the reference date in Python?
If I understood your question correctly, you have a list of dates and want the difference between each of them and a fixed date.
You can use a list comprehension:
from datetime import date
start_date = date(2005, 11, 30)
# assuming your list is named my_date_list
differences = [(d - start_date).days for d in my_date_list]
If your list members are formatted strings rather than date objects, you can convert them to dates on the fly. Note the .date() call: strptime returns a datetime, which cannot be subtracted from a date directly.
from datetime import datetime
date_format = "%Y-%m-%d"
differences = [(datetime.strptime(d, date_format).date() - start_date).days for d in my_date_list]
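Since the question mentions an array, here is a hedged NumPy variant of the same calculation. It assumes the dates are ISO-formatted strings; the sample values are made up. NumPy's datetime64[D] type subtracts directly, and the resulting day counts can be cast to plain integers.

```python
import numpy as np

# Hypothetical irregular dates as ISO strings.
date_strings = ["2005-12-01", "2005-12-05", "2006-01-02"]

reference = np.datetime64("2005-11-30")

# Parse at day resolution, subtract the reference, and cast to integers.
dates = np.array(date_strings, dtype="datetime64[D]")
day_numbers = (dates - reference).astype(int)

print(day_numbers)
```

This keeps everything vectorised, which matters if the array is large.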