pandas.to_datetime with different length date strings - python

I have a column of timestamps that I would like to convert to datetime in my pandas dataframe. The format of the dates is %Y-%m-%d-%H-%M-%S which pd.to_datetime does not recognize. I have manually entered the format as below:
df['TIME'] = pd.to_datetime(df['TIME'], format = '%Y-%m-%d-%H-%M-%S')
My problem is some of the times do not have seconds so they are shorter
(format = %Y-%m-%d-%H-%M).
How can I get all of these strings to datetimes?
I was thinking I could add zero seconds (-0) to the end of my shorter dates but I don't know how to do that.

Try strftime to get every value into the same format. If pandas can't recognize your custom datetime format, you should provide it explicitly:
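import pandas as pd
from functools import partial
df1 = pd.DataFrame({'Date': ['2018-07-02-06-05-23','2018-07-02-06-05']})
# parser that expects the full format, including seconds
newdatetime_fmt = partial(pd.to_datetime, format='%Y-%m-%d-%H-%M-%S')
# strip the hyphens, let pandas infer the compact form, then re-emit every value in the full format
df1['Clean_Date'] = (df1.Date.str.replace('-','').apply(lambda x: pd.to_datetime(x).strftime('%Y-%m-%d-%H-%M-%S'))
                     .apply(newdatetime_fmt))
print(df1,df1.dtypes)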
output:
Date Clean_Date
0 2018-07-02-06-05-23 2018-07-02 06:05:23
1 2018-07-02-06-05 2018-07-02 06:05:00
Date object
Clean_Date datetime64[ns]
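The idea from the question, appending zero seconds to the shorter strings, can also be done directly. The following is only a sketch under the assumption that every value has either four or five hyphens (that is, it matches one of the two formats from the question); the names short and padded are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'TIME': ['2018-07-02-06-05-23', '2018-07-02-06-05']})

# Values with only four hyphens have no seconds field.
short = df['TIME'].str.count('-') == 4

# Keep complete values as they are and append "-00" to the short ones.
padded = df['TIME'].where(~short, df['TIME'] + '-00')

# Every string now matches the full format, so one explicit format is enough.
df['TIME'] = pd.to_datetime(padded, format='%Y-%m-%d-%H-%M-%S')
print(df)
```

This avoids relying on pandas to guess the compact digit-only form, at the cost of assuming the hyphen count cleanly separates the two layouts.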

Related

change YYYYDDMM to YYYYMMDD in python

I have a df with dates in a column converted to a datetime. The current format is YYYYDDMM, and I need it converted to YYYYMMDD. I tried the code below, but it does not change the format and still gives me YYYYDDMM. The end goal is to subtract 1 business day from the effective date, but the format needs to be YYYYMMDD to do this; otherwise it subtracts 1 day from the month and not the day. Can someone help?
filtered_df['Effective Date'] = pd.to_datetime(filtered_df['Effective Date'])
# Effective Date = 20220408 (4th Aug 2022 for clarity)
filtered_df['Effective Date new'] = filtered_df['Effective Date'].dt.strftime("%Y%m%d")
# Effective Date new = 20220408
Desired output: Effective Date new = 20220804
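By default, pd.to_datetime interprets an undelimited eight-digit input as YYYYMMDD, so a YYYYDDMM value is parsed with day and month swapped and prints unchanged with %Y%m%d as the format. The dayfirst keyword argument is aimed at delimited strings such as 10/11/12 and may not change how a compact value like 20220408 is read, so the more reliable fix is to state the input format explicitly (this assumes the column holds strings):
filtered_df['Effective Date'] = pd.to_datetime(filtered_df['Effective Date'], format='%Y%d%m')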
I like to use the datetime library for this purpose. You can use strptime to convert a string into the datetime object and strftime to convert your datetime object to the new string.
from datetime import datetime
def change_date(row):
    row["Effective Date new"] = datetime.strptime(row["Effective Date"], "%Y%d%m").strftime("%Y%m%d")
    return row
df2 = df.apply(change_date, axis=1)
The output df2 will have Effective Date new as your new column.
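Since the stated end goal is to subtract one business day, here is a minimal vectorized sketch of the whole round trip. It assumes the column holds YYYYDDMM strings; the sample values and the "Previous business day" column name are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Effective Date": ["20220408", "20221508"]})

# Parse with an explicit format: year, then day, then month.
parsed = pd.to_datetime(df["Effective Date"], format="%Y%d%m")

# Re-emit as YYYYMMDD strings.
df["Effective Date new"] = parsed.dt.strftime("%Y%m%d")

# Do the date arithmetic on the datetime values, not on the strings.
previous = parsed - pd.offsets.BDay(1)
df["Previous business day"] = previous.dt.strftime("%Y%m%d")
print(df)
```

The subtraction operates on real datetimes, so the string layout only matters when parsing and when printing. Here 4 Aug 2022 (a Thursday) becomes 3 Aug, and 15 Aug 2022 (a Monday) rolls back to Friday 12 Aug.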

convert yyyy-mm-dd to mmm-yy in dataframe python

I am trying to convert the way month and year is presented.
I have dataframe as below
Date
2020-01-31
2020-04-30
2021-05-05
and I want to convert it in the way like month and year.
The output that I am expecting is
Date
Jan-20
Apr-20
May-21
I tried to do it with datetime but it doesn't work.
pd.to_datetime(pd.Series(df['Date'),format='%mmm-%yy')
Use .dt.strftime() to change the display format. %b-%y is the format string for Mmm-YY:
df.Date = pd.to_datetime(df.Date).dt.strftime('%b-%y')
# Date
# 0 Jan-20
# 1 Apr-20
# 2 May-21
Or if Date is the index (a DatetimeIndex exposes strftime directly, without the .dt accessor):
df.index = pd.to_datetime(df.index).strftime('%b-%y')
import pandas as pd
date_sr = pd.to_datetime(pd.Series("2020-12-08"))
change_format = date_sr.dt.strftime('%b-%y')
print(change_format)
Reference: https://docs.python.org/3/library/datetime.html
The format %Y-%m-%d is changed to %b-%y.
import datetime
df['Date'] = df['Date'].apply(lambda x: datetime.datetime.strptime(x,'%Y-%m-%d').strftime('%b-%y'))
# reference https://docs.python.org/3/library/datetime.html
# %Y-%m-%d changed to ('%b-%y')
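One thing worth keeping in mind with all of these answers: strftime returns strings, so the column stops being a datetime and sorts alphabetically. A possible sketch that keeps the real dates and stores the Mmm-YY text in a separate column follows; the Label column name is made up for illustration, and %b produces English month abbreviations only under an English or C locale.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2020-01-31', '2020-04-30', '2021-05-05']})

# Keep a real datetime column for sorting and arithmetic.
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')

# Store the display text separately; this column holds plain strings.
df['Label'] = df['Date'].dt.strftime('%b-%y')
print(df)
```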

python convert timestamp without year-month-day

I have a dataframe with a time column stored as strings, and I need to convert it to a timestamp containing only h:m:sec.ms. Here is an example:
import pandas as pd
df=pd.DataFrame({'time': ['02:21:18.110']})
df.time= pd.to_datetime(df.time , format="%H:%M:%S.%f")
df # I get 1900-01-01 02:21:18.110
Without format flag, I get current day 2020-12-16. How can I get the stamp without year-month-day which seemingly always is included. Thanks!
If you need to process the values later with datetime-like methods, it is better to convert them to timedeltas with to_timedelta instead of times:
df['time'] = pd.to_timedelta(df['time'])
print (df)
time
0 0 days 02:21:18.110000
You need this:
df=pd.DataFrame({'time': ['02:21:18.110']})
df['time'] = pd.to_datetime(df['time']).dt.time
In [1023]: df
Out[1023]:
time
0 02:21:18.110000
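The practical difference between the two answers can be sketched as follows: timedeltas support arithmetic, while .dt.time yields plain datetime.time objects in an object column. The second sample value and the variable names are made up for illustration.

```python
import datetime
import pandas as pd

df = pd.DataFrame({'time': ['02:21:18.110', '00:00:01.500']})

# Timedeltas: durations that can be summed, compared, and resampled.
as_delta = pd.to_timedelta(df['time'])
total = as_delta.sum()

# time objects: clock times without a date, stored with object dtype.
as_time = pd.to_datetime(df['time'], format='%H:%M:%S.%f').dt.time

print(total)
print(as_time.iloc[0], type(as_time.iloc[0]))
```

Which one fits depends on whether the values are meant as durations or as times of day.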

How to change format of data to '%Y%m%d' in Pandas?

I have a DF with first column showing as e.g. 2018-01-31 00:00:00.
I want to convert whole column (or during printing / saving to other variable) that date to 20180131 format.
NOT looking to do that during saving to a CSV file.
Tried this but it did not work:
df['mydate'] = pd.to_datetime(df['mydate'], format='%Y%m%d')
pd.to_datetime is used to convert your series to datetime:
s = pd.Series(['2018-01-31 00:00:00'])
s = pd.to_datetime(s)
print(s)
0 2018-01-31
dtype: datetime64[ns]
pd.Series.dt.strftime converts your datetime series to a string in your desired format:
s = s.dt.strftime('%Y%m%d')
print(s)
0 20180131
dtype: object
pd.to_datetime will convert a string to a date. You want to convert a date to a string:
df['mydate'].dt.strftime('%Y%m%d')
Note that it's possible your date is already a string, but in the wrong format in which case you might have to convert it to a date first:
pd.to_datetime(df['mydate'], format='%Y-%m-%d %H:%M:%S').dt.strftime('%Y%m%d')
Convert the string column holding values such as 2018-01-31 00:00:00 to a datetime:
df['mydate'] = pd.to_datetime(df['mydate'])
#Get your preferred strings based on format:
df['mydate'].dt.strftime('%Y-%m-%d')
#Output: '2018-01-31'
df['mydate'].dt.strftime('%Y%m%d')
#output:'20180131'
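The attempt in the question fails because the format argument of pd.to_datetime describes the input strings, not the desired output. A minimal sketch of the full round trip, assuming the column holds strings like the one in the question (the second sample value and the mydate_str and mydate_int column names are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'mydate': ['2018-01-31 00:00:00', '2018-02-28 00:00:00']})

# format= must match how the input looks.
parsed = pd.to_datetime(df['mydate'], format='%Y-%m-%d %H:%M:%S')

# strftime controls how the output looks; the result is a string column.
df['mydate_str'] = parsed.dt.strftime('%Y%m%d')

# Optional: an integer version, if a numeric key such as 20180131 is wanted.
df['mydate_int'] = df['mydate_str'].astype(int)
print(df)
```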

pandas save date in ISO format?

I'm trying to generate a Pandas DataFrame where date_range is an index. Then save it to a CSV file so that the dates are written in ISO-8601 format.
import pandas as pd
import numpy as np
from pandas import DataFrame, Series
NumberOfSamples = 10
dates = pd.date_range('20130101',periods=NumberOfSamples,freq='90S')
df3 = DataFrame(index=dates)
df3.to_csv('dates.txt', header=False)
The current output to dates.txt is:
2013-01-01 00:00:00
2013-01-01 00:01:30
2013-01-01 00:03:00
2013-01-01 00:04:30
...................
I'm trying to get it to look like:
2013-01-01T00:00:00Z
2013-01-01T00:01:30Z
2013-01-01T00:03:00Z
2013-01-01T00:04:30Z
....................
Use datetime.strftime and call map on the index:
In [72]:
NumberOfSamples = 10
import datetime as dt
dates = pd.date_range('20130101',periods=NumberOfSamples,freq='90S')
df3 = pd.DataFrame(index=dates)
df3.index = df3.index.map(lambda x: dt.datetime.strftime(x, '%Y-%m-%dT%H:%M:%SZ'))
df3
Out[72]:
Empty DataFrame
Columns: []
Index: [2013-01-01T00:00:00Z, 2013-01-01T00:01:30Z, 2013-01-01T00:03:00Z, 2013-01-01T00:04:30Z, 2013-01-01T00:06:00Z, 2013-01-01T00:07:30Z, 2013-01-01T00:09:00Z, 2013-01-01T00:10:30Z, 2013-01-01T00:12:00Z, 2013-01-01T00:13:30Z]
Alternatively, and better in my view (thanks to @unutbu), you can pass a format specifier to to_csv:
df3.to_csv('dates.txt', header=False, date_format='%Y-%m-%dT%H:%M:%SZ')
With pd.Index.strftime:
If you're sure that all your dates are UTC, you can hardcode the format:
df3.index = df3.index.strftime('%Y-%m-%dT%H:%M:%SZ')
which gives you 2013-01-01T00:00:00Z and so on. Note that the "Z" denotes UTC!
With pd.Timestamp.isoformat and pd.Index.map:
df3.index = df3.index.map(lambda timestamp: timestamp.isoformat())
This gives you 2013-01-01T00:00:00. If you attach a timezone to your dates first (e.g. by passing tz="UTC" to date_range), you'll get: 2013-01-01T00:00:00+00:00 which also conforms to ISO-8601 but is a different notation. This should work for any dateutil or pytz timezone, leaving no room for ambiguity when clocks switch from daylight saving to standard time.
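To compare the two notations side by side, here is a small sketch that writes to an in-memory buffer instead of dates.txt. It assumes a timezone-aware UTC index, so the hardcoded "Z" is accurate; the value column is made up so that each row has something after the timestamp.

```python
import io
import pandas as pd

dates = pd.date_range('2013-01-01', periods=4, freq='90s', tz='UTC')
df = pd.DataFrame({'value': range(4)}, index=dates)

# Notation 1: hardcoded "Z" suffix through to_csv's date_format.
buf = io.StringIO()
df.to_csv(buf, header=False, date_format='%Y-%m-%dT%H:%M:%SZ')
lines = buf.getvalue().splitlines()

# Notation 2: isoformat() on each Timestamp, which emits the UTC offset.
iso = [ts.isoformat() for ts in df.index]

print(lines[:2])
print(iso[:2])
```

Both forms are valid ISO 8601; the first is only correct if the data really is UTC, while the second carries whatever offset the index has.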
