convert alphanumeric value to date - python

I receive values such as a1523245800 in the date field of my incoming data feed. How can I convert such a value to a date dtype? I have tried pandas.to_datetime, but it does not seem to work. Thank you.
Here is my code:
pd.to_datetime(['a1523245800'], errors='coerce')
And here is its output:
DatetimeIndex(['NaT'], dtype='datetime64[ns]', freq=None)

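Strip the leading a, either with str[1:] to drop the first character or with str.extract to pull out the numeric part, then pass the result to to_datetime with the parameter unit='s' (the digits are seconds since the Unix epoch). Casting to an integer first avoids the deprecation warning that newer pandas versions emit when strings are combined with unit:
df = pd.DataFrame({'date':['a1523245800','a1523245800']})
df['date1'] = pd.to_datetime(df['date'].str[1:].astype('int64'), unit='s')
Or:
df['date1'] = pd.to_datetime(df['date'].str.extract(r'(\d+)', expand=False).astype('int64'), unit='s')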
print (df)
          date               date1
0  a1523245800 2018-04-09 03:50:00
1  a1523245800 2018-04-09 03:50:00
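If the feed can also contain malformed values, a more defensive variant of the same idea might look like the sketch below. The helper name epoch_with_prefix_to_datetime and the sample value 'bad' are made up for illustration; the sketch assumes the digits are always Unix seconds.

```python
import pandas as pd

def epoch_with_prefix_to_datetime(values: pd.Series) -> pd.Series:
    """Hypothetical helper: pull out the digits and read them as Unix seconds."""
    # Keep only the first run of digits; rows without digits become NaN.
    digits = values.astype(str).str.extract(r'(\d+)', expand=False)
    # Convert to numbers explicitly so to_datetime receives numeric input.
    seconds = pd.to_numeric(digits, errors='coerce')
    # Rows that could not be converted end up as NaT.
    return pd.to_datetime(seconds, unit='s', errors='coerce')

feed = pd.Series(['a1523245800', 'a1523245800', 'bad'])
converted = epoch_with_prefix_to_datetime(feed)
print(converted)
```

The first two rows come out as 2018-04-09 03:50:00, and the row without digits becomes NaT instead of raising.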

Related

Dates go crazy when applying pd.to_datetime

I have this situation in which I have a DataFrame with a string column with some values with this format:
DD/MM/YYYY
and some with this other one:
DD/MM/YYYY HH:Mi:SS
If I try to convert everything to datetime like this
df['COLUMN'] = pd.to_datetime(df['COLUMN'])
The rows without the HH:Mi:SS part get scrambled: the months are interpreted as days (and vice versa).
How can I avoid this and end up with a column in a plain date format?
Example of column which goes crazy:
Before conversion:
DateTime
--------
02/07/2021
15/07/2021 18:16:00
After conversion:
DateTime
2021-02-07 (This is February!!)
2021-07-15 18:16:00
pandas to_datetime has a built-in parameter, dayfirst, that tells the parser to prefer reading the day first.
You can use it as:
df['COLUMN'] = pd.to_datetime(df['COLUMN'], dayfirst=True)
Note that dayfirst is a parsing preference rather than a strict rule. Check out the documentation for more info.
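To see what dayfirst changes, here is a small sketch on a single ambiguous string taken from the question. The variable names are only illustrative.

```python
import pandas as pd

# Without a hint, an ambiguous string is read month-first.
default_parse = pd.to_datetime('02/07/2021')
# With dayfirst=True the same string is read day-first.
dayfirst_parse = pd.to_datetime('02/07/2021', dayfirst=True)
# An explicit format removes the ambiguity altogether.
explicit_parse = pd.to_datetime('02/07/2021', format='%d/%m/%Y')

print(default_parse.date())   # 2021-02-07
print(dayfirst_parse.date())  # 2021-07-02
print(explicit_parse.date())  # 2021-07-02
```

When the layout is known in advance, an explicit format is usually the more predictable choice, because it does not depend on how the parser guesses.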
I believe the following achieves the desired output (may not be the fastest way)
import pandas as pd
df = pd.DataFrame({'date': ['15/07/2021 18:16:00', '02/07/2021']})
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce').fillna(pd.to_datetime(df['date'], format="%d/%m/%Y %H:%M:%S", errors="coerce"))
print(df.head())
for date in df['date']:
    print(type(date))
Output:
date
0 2021-07-15 18:16:00
1 2021-07-02 00:00:00
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
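Since the question asks for a column with just the date, another option is to drop the time part before parsing, so that a single explicit format covers every row. This is only a sketch under the assumption that the date and the time are always separated by whitespace; the column name DATE is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'COLUMN': ['02/07/2021', '15/07/2021 18:16:00']})

# Split on whitespace and keep the first token, i.e. the DD/MM/YYYY part.
date_part = df['COLUMN'].str.split().str[0]

# One explicit format now matches every row, so nothing is guessed.
df['DATE'] = pd.to_datetime(date_part, format='%d/%m/%Y')
print(df)
```

Both rows are parsed day-first, giving 2021-07-02 and 2021-07-15.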

Cannot remove timestamp in datetime

I have a date column with dtype object, in the format 31-Mar-20. I tried to convert it with datetime.strptime into datetime64[D] with the format 2020-03-31, but nothing I have tried works, including some methods from this and this. My column does become datetime64, but it carries a timestamp that I don't want. I need a date without the time part, in the format 2020-03-31. This is my code:
dates = [datetime.datetime.strptime(ts, '%d-%b-%y').strftime('%Y-%m-%d')
         for ts in df['date']]
df['date']= pd.DataFrame({'date': dates})
df = df.sort_values(by=['date'])
This approach might work -
import pandas as pd
df = pd.DataFrame({'dates': ['20-Mar-2020', '21-Mar-2020', '22-Mar-2020']})
df
dates
0 20-Mar-2020
1 21-Mar-2020
2 22-Mar-2020
df['dates'] = pd.to_datetime(df['dates'], format='%d-%b-%Y').dt.date
df
dates
0 2020-03-20
1 2020-03-21
2 2020-03-22
df['date'] = pd.to_datetime(df['date'], format="%d-%b-%y")
This converts the column to datetime. When you look at df, it displays values as 2020-03-31 like you want; however, these are all datetime objects, so if you extract one value with df['date'][0] you see Timestamp('2020-03-31 00:00:00').
If you want to convert them into dates, you can do:
df['date'] = [df_datetime.date() for df_datetime in df['date'] ]
There is probably a better way of doing this step.
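One vectorized alternative to the list comprehension is the .dt.date accessor. The sketch below uses made-up sample values in the question's 31-Mar-20 layout. Keep in mind that pandas columns cannot hold datetime64[D]; .dt.date yields an object column of datetime.date values, while .dt.normalize() would keep datetime64 with the time set to midnight.

```python
import datetime
import pandas as pd

# Sample values invented for illustration, in the question's dd-Mon-yy layout.
df = pd.DataFrame({'date': ['31-Mar-20', '01-Feb-20', '15-Jan-21']})

# Parse with an explicit format; %b expects English month abbreviations
# in the default locale.
parsed = pd.to_datetime(df['date'], format='%d-%b-%y')

# .dt.date drops the time part; the column now holds datetime.date objects.
df['date'] = parsed.dt.date

# datetime.date objects compare chronologically, so sorting still works.
df = df.sort_values(by='date').reset_index(drop=True)
print(df)
```

If the dates are needed as text, parsed.dt.strftime('%Y-%m-%d') gives strings in the 2020-03-31 layout instead.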

TypeError: Passing PeriodDtype data is invalid. Use `data.to_timestamp()` instead

How can I convert a date column with format of 2014-09 to format of 2014-09-01 00:00:00.000? The previous format is converted from df['date'] = pd.to_datetime(df['date']).dt.to_period('M').
I used df['date'] = pd.to_datetime(df['date']).dt.strftime('%Y-%m-%d %H:%M:%S.000'), but it raises an error: TypeError: Passing PeriodDtype data is invalid. Use data.to_timestamp() instead. I also tried pd.to_datetime(df['date']).dt.strftime('%Y-%m'), which raises the same error.
The first idea is to convert the periods to timestamps with Series.dt.to_timestamp and then use Series.dt.strftime:
print (df)
date
0 2014-09
print (df.dtypes)
date period[M]
dtype: object
df['date'] = df['date'].dt.to_timestamp('s').dt.strftime('%Y-%m-%d %H:%M:%S.000')
print (df)
date
0 2014-09-01 00:00:00.000
Or simply append the suffix, which is the same for every value:
df['date'] = df['date'].dt.to_timestamp('s').dt.strftime('%Y-%m-%d %H:%M:%S').add('.000')
print (df)
date
0 2014-09-01 00:00:00.000
Or:
df['date'] = df['date'].dt.strftime('%Y-%m').add('-01 00:00:00.000')
print (df)
date
0 2014-09-01 00:00:00.000
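For reference, the whole round trip can be reproduced end to end with the sketch below. The sample dates are made up, and the milliseconds are produced by formatting with %f and trimming the last three digits instead of hard-coding .000.

```python
import pandas as pd

# Start from ordinary datetimes and convert to monthly periods, as in the question.
df = pd.DataFrame({'date': pd.to_datetime(['2014-09-05', '2015-01-20'])})
df['date'] = df['date'].dt.to_period('M')

# Periods must become timestamps before datetime formatting is applied;
# by default the start of each period is used.
timestamps = df['date'].dt.to_timestamp()

# %f prints six microsecond digits; dropping the last three leaves milliseconds.
formatted = timestamps.dt.strftime('%Y-%m-%d %H:%M:%S.%f').str[:-3]
print(formatted)
```

Each value comes out as the first instant of its month, for example 2014-09-01 00:00:00.000.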
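Use %f to match the fractional seconds (%f is the microsecond directive, and when parsing it also accepts a shorter fraction such as .000):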
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d %H:%M:%S.%f')
sample code is
df = pd.DataFrame({
'Date': ['2014-09-01 00:00:00.000']
})
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S.%f')
df
which gives you the following output
Date
0 2014-09-01
To convert a Period value such as 2014-09 to 2014-09-01 00:00:00.000, the fixed suffix can be written directly into the strftime format:
df = pd.DataFrame({
'date': ['2014-09-05']
})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['date'] = pd.to_datetime(df['date']).dt.to_period("M")
df['date'] = df['date'].dt.strftime('%Y-%m-01 00:00:00.000')
df
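Try formatting with %f and stripping the last 3 digits (this assumes the column can be passed to pd.to_datetime, i.e. it is not still a Period dtype):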
print(pd.to_datetime(df['date']).dt.strftime('%Y-%m-%d %H:%M:%S.%f')[0][:-3])
Output:
2014-09-01 00:00:00.000
In the event the other answers don't work, you could try
df.index = pd.DatetimeIndex(df.date).to_period('s')
df.index
This should show a PeriodIndex with the frequency set to seconds.

Convert one column to standard date format in Python

For a date column I have data like this: 19.01.01, which means 2019-01-01. Is there a method to change the format from the former to the latter?
My idea is to add 20 to the start of date and replace . with -. Are there better ways to do that?
Thanks.
If format is YY.DD.MM use %y.%d.%m, if format is YY.MM.DD use %y.%m.%d in to_datetime:
df = pd.DataFrame({'date':['19.01.01','19.01.02']})
#YY.DD.MM
df['date'] = pd.to_datetime(df['date'], format='%y.%d.%m')
print (df)
date
0 2019-01-01
1 2019-02-01
#YY.MM.DD
df['date'] = pd.to_datetime(df['date'], format='%y.%m.%d')
print (df)
date
0 2019-01-01
1 2019-01-02
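If the goal is the text form 2019-01-01 rather than a datetime column, the parsed values can be formatted back with strftime. The sketch below assumes the YY.MM.DD layout; the value '19.13.01' is a made-up invalid row added to show what errors='coerce' does.

```python
import pandas as pd

raw = pd.Series(['19.01.01', '19.01.02', '19.13.01'])

# Parse as YY.MM.DD; the row with month 13 cannot match and becomes NaT.
parsed = pd.to_datetime(raw, format='%y.%m.%d', errors='coerce')

# Format back to text in the YYYY-MM-DD layout; NaT rows stay missing.
as_text = parsed.dt.strftime('%Y-%m-%d')
print(as_text)
```

Compared with prepending 20 and replacing the dots by hand, this also validates each value, so impossible dates surface as missing values instead of passing through silently.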

Pandas - Different time formats in the same column

I have a Dataframe that has dates stored in different formats in the same column as shown below:
date
1-10-2018
2-10-2018
3-Oct-2018
4-10-2018
Is there any way I could make all of them have the same format?
Use to_datetime with an explicit format and errors='coerce', so that values that do not match the format become NaT. Then use combine_first to fill those missing values from the date2 Series.
date1 = pd.to_datetime(df['date'], format='%d-%m-%Y', errors='coerce')
date2 = pd.to_datetime(df['date'], format='%d-%b-%Y', errors='coerce')
df['date'] = date1.combine_first(date2)
print (df)
date
0 2018-10-01
1 2018-10-02
2 2018-10-03
3 2018-10-04
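The same pattern extends to any number of candidate formats. The helper below is a hypothetical generalization, not part of pandas: the name parse_with_formats and its signature are assumptions made for illustration.

```python
import pandas as pd

def parse_with_formats(values: pd.Series, formats: list) -> pd.Series:
    """Hypothetical helper: try each format in order, keep the first match per row."""
    result = None
    for fmt in formats:
        # Rows that do not match this format become NaT.
        parsed = pd.to_datetime(values, format=fmt, errors='coerce')
        # Fill only the rows that are still missing from earlier formats.
        result = parsed if result is None else result.combine_first(parsed)
    return result

df = pd.DataFrame({'date': ['1-10-2018', '2-10-2018', '3-Oct-2018', '4-10-2018']})
df['date'] = parse_with_formats(df['date'], ['%d-%m-%Y', '%d-%b-%Y'])
print(df)
```

The order of the formats matters when a value could match more than one of them, because the earliest match wins. Rows that match none of the formats remain NaT.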
