python pandas reading a timestamp

I am trying to read a date-time in microsecond format.
1500909283.955000
The expected output should be something like
July 24, 2017 3:14:43.955 PM
But when I use the pandas to_datetime function I get:
1970-01-01 00:00:01.500909283
I have tried every possible format with no success. Any hints?

You need the unit='s' parameter for to_datetime:
In[4]:
pd.to_datetime(1500909283.955000, unit='s')
Out[4]: Timestamp('2017-07-24 15:14:43.955000')
The timestamp is seconds since the epoch. The default unit is nanoseconds:
In[5]:
pd.to_datetime(1500909283.955000, unit='ns')
Out[5]: Timestamp('1970-01-01 00:00:01.500909283')
which is what you observed
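If you also want the display format from the question, one option (just a sketch, assuming import pandas as pd) is to build the string from strftime plus the millisecond part:
ts = pd.to_datetime(1500909283.955000, unit='s')
ms = f'{ts.microsecond // 1000:03d}'                      # milliseconds, zero-padded
print(ts.strftime('%B %d, %Y %I:%M:%S.') + ms + ts.strftime(' %p'))
# July 24, 2017 03:14:43.955 PM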

You need to_datetime with the unit parameter; the data are in seconds, not milliseconds:
df = pd.DataFrame({'col':['1500909283.955000','1500909283.955000']})
df['col'] = pd.to_datetime(df['col'], unit='s')
print (df)
col
0 2017-07-24 15:14:43.955
1 2017-07-24 15:14:43.955
For comparison, with unit='ms' (milliseconds) you would get:
df['col'] = pd.to_datetime(df['col'], unit='ms')
print (df)
col
0 1970-01-18 08:55:09.283955
1 1970-01-18 08:55:09.283955
If you need another format, use Series.dt.strftime:
df['col1'] = df['col'].dt.strftime('%b %d, %Y %I:%M:%S.%f %p')
print (df)
col col1
0 2017-07-24 15:14:43.955 Jul 24, 2017 03:14:43.955000 PM
1 2017-07-24 15:14:43.955 Jul 24, 2017 03:14:43.955000 PM

Related

"Bad directive" value error when converting to Pandas datetime

I have a Pandas dataframe df that looks as follows:
df = pd.DataFrame({'timestamp': ['Wednesday, Apr 4/04/22 at 17:02',
                                 'Saturday, Apr 4/23/22 at 15:45'],
                   'foo': [1, 2]})
df
timestamp foo
0 Wednesday, Apr 4/04/22 at 17:02 1
1 Saturday, Apr 4/23/22 at 15:45 2
I'm trying to convert the timestamp column to a datetime object so that I can add a day_of_week column.
My attempt:
df['timestamp'] = pd.to_datetime(df['timestamp'],
                                 format='%A, %b %-m/%-d/%y at %H:%M')
df['day_of_week'] = df['timestamp'].dt.day_name()
The error is:
ValueError: '-' is a bad directive in format '%A, %b %-m/%-d/%y at %H:%M'
Any assistance would be greatly appreciated. Thanks!
Just use the format without the -:
df['timestamp'] = pd.to_datetime(df['timestamp'],
                                 format='%A, %b %m/%d/%y at %H:%M')
df['day_of_week'] = df['timestamp'].dt.day_name()
NB: to_datetime is quite lenient with the provided data; note how the incorrect day of week was simply ignored.
output:
timestamp foo day_of_week
0 2022-04-04 17:02:00 1 Monday
1 2022-04-23 15:45:00 2 Saturday
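If you want to check whether the weekday written in the string actually matches the parsed date, a small sanity check could look like this (a sketch; the data is taken from the example above):
import pandas as pd

raw = pd.Series(['Wednesday, Apr 4/04/22 at 17:02',
                 'Saturday, Apr 4/23/22 at 15:45'])
parsed = pd.to_datetime(raw, format='%A, %b %m/%d/%y at %H:%M')

claimed = raw.str.split(',').str[0]     # weekday text as written in the string
actual = parsed.dt.day_name()           # weekday implied by the parsed date
print(claimed != actual)                # True where the string is inconsistent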

Pandas mixed date column with Excel dates, floats, int, string dates - convert to datetime

I am trying to convert a column with a real mix of date formats. I have tried a few things on SO but still have not got a working solution. I have tried changing the column to 'string' and converting the floats to int.
data
date
1 43076.0
2 43077
3 07 Dec 2017
4 2021-12-22 00:00:00
Code to try to fix the Excel dates and the '07 Dec 2017' style:
d = ['43076.0', '43077', '07 Dec 2017', '2021-12-22 00:00:00']
df = pd.DataFrame(d, columns=['date'])
date1 = pd.to_datetime(df['date'], errors='coerce', format='%d %a %Y')
date2 = pd.to_datetime(df['date'], errors='coerce', unit='D', origin='1899-12-30')
frame_clean[col] = date2.fillna(date1)
The error:
Name: StartDate, Length: 16189, dtype: object' is not compatible with origin='1899-12-30'; it must be numeric with a unit specified
I like this solution better than using apply, which is too slow, but I am struggling to get it working.
Edit
Breaking down #FObersteiner's solution for better understanding.
convert the simple dates
df['datetime'] = pd.to_datetime(df['date'], errors='coerce')
0 NaT
1 NaT
2 2018-12-07
3 2021-12-22
isolate the numeric rows
m = pd.to_numeric(df['date'], errors='coerce').notna()
m
0 True
1 True
2 False
3 False
convert numeric rows to floats
df['date'][m].astype(float)
0 43080.0
1 43077.0
convert numeric rows to floats and then dt objects
pd.to_datetime(df['date'][m].astype(float), errors='coerce', unit='D', origin='1899-12-30')
0 2017-12-11
1 2017-12-08
pull it all together and bring back the simple date rows
df.loc[m, 'datetime'] = pd.to_datetime(df['date'][m].astype(float), errors='coerce', unit='D', origin='1899-12-30')
print(df)
For the given example, use a mask to convert numeric and non-numeric data separately:
import pandas as pd
df = pd.DataFrame({'date':['43076.0', '43077', '07 Dec 2017', '2021-12-22 00:00:00']})
df['datetime'] = pd.to_datetime(df['date'], errors='coerce')
m = pd.to_numeric(df['date'], errors='coerce').notna()
df.loc[m, 'datetime'] = pd.to_datetime(df['date'][m].astype(float), errors='coerce', unit='D', origin='1899-12-30')
print(df)
date datetime
0 43076.0 2017-12-07
1 43077 2017-12-08
2 07 Dec 2017 2017-12-07
3 2021-12-22 00:00:00 2021-12-22
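The same mask idea can be wrapped in a small helper for reuse. This is only a sketch (the function name is made up), and on pandas >= 2.0 the string pass may need format='mixed':
import pandas as pd

def parse_mixed_dates(s, excel_origin='1899-12-30'):
    # Parse string dates first (NaT where the value is an Excel serial number),
    # then fill the numeric rows by treating them as days since the Excel origin.
    out = pd.to_datetime(s, errors='coerce')          # on pandas >= 2.0 consider format='mixed'
    m = pd.to_numeric(s, errors='coerce').notna()
    out.loc[m] = pd.to_datetime(s[m].astype(float), unit='D', origin=excel_origin)
    return out

df = pd.DataFrame({'date': ['43076.0', '43077', '07 Dec 2017', '2021-12-22 00:00:00']})
print(parse_mixed_dates(df['date']))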

TypeError: Passing PeriodDtype data is invalid. Use `data.to_timestamp()` instead

How can I convert a date column in the format 2014-09 to the format 2014-09-01 00:00:00.000? The column was produced by df['date'] = pd.to_datetime(df['date']).dt.to_period('M').
I use df['date'] = pd.to_datetime(df['date']).dt.strftime('%Y-%m-%d %H:%M:%S.000'), but it generates an error: TypeError: Passing PeriodDtype data is invalid. Use data.to_timestamp() instead. I also tried pd.to_datetime(df['date']).dt.strftime('%Y-%m'), which generates the same error.
The first idea is to convert the periods to timestamps with Series.dt.to_timestamp and then use Series.dt.strftime:
print (df)
date
0 2014-09
print (df.dtypes)
date period[M]
dtype: object
df['date'] = df['date'].dt.to_timestamp('s').dt.strftime('%Y-%m-%d %H:%M:%S.000')
print (df)
date
0 2014-09-01 00:00:00.000
Or simply append the trailing part, which is the same for every value:
df['date'] = df['date'].dt.to_timestamp('s').dt.strftime('%Y-%m-%d %H:%M:%S').add('.000')
print (df)
date
0 2014-09-01 00:00:00.000
Or:
df['date'] = df['date'].dt.strftime('%Y-%m').add('-01 00:00:00.000')
print (df)
date
0 2014-09-01 00:00:00.000
Use %f for the fractional seconds (microseconds):
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d %H:%M:%S.%f')
Sample code:
df = pd.DataFrame({
'Date': ['2014-09-01 00:00:00.000']
})
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S.%f')
df
which gives you the following output
Date
0 2014-09-01
To convert 2014-09 as a Period to 2014-09-01 00:00:00.000, we can do the following:
df = pd.DataFrame({
'date': ['2014-09-05']
})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['date'] = pd.to_datetime(df['date']).dt.to_period("M")
df['date'] = df['date'].dt.strftime('%Y-%m-01 00:00:00.000')
df
Try stripping the last 3 digits
print(pd.to_datetime(df['date']).dt.strftime('%Y-%m-%d %H:%M:%S.%f')[0][:-3])
Output:
2014-09-01 00:00:00.000
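A vectorized variant of the same idea (a sketch; it assumes the column is period[M], converts it to timestamps first, and then slices the whole formatted column with .str):
import pandas as pd

df = pd.DataFrame({'date': pd.period_range('2014-09', periods=1, freq='M')})
out = (df['date'].dt.to_timestamp()
                 .dt.strftime('%Y-%m-%d %H:%M:%S.%f')
                 .str[:-3])          # drop the last three digits of the microseconds
print(out)
# 0    2014-09-01 00:00:00.000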
In the event the other answers don't work, you could try
df.index = pd.DatetimeIndex(df.date).to_period('s')
df.index
Which should show a PeriodIndex with the frequency set to 's'.

Convert string to date in python if date string has different format

My data has a date variable with two different date formats:
Date
01 Jan 2019
02 Feb 2019
01-12-2019
23-01-2019
11-04-2019
22-05-2019
I want to convert these strings into dates (YYYY-mm-dd):
Date
2019-01-01
2019-02-01
2019-12-01
2019-01-23
2019-04-11
2019-05-22
I have tried the following, but I am looking for a better approach:
df['Date'] = np.where(df['Date'].str.contains('-'), pd.to_datetime(df['Date'], format='%d-%m-%Y'), pd.to_datetime(df['Date'], format='%d %b %Y'))
Working solution for me
df['Date_1']= np.where(df['Date'].str.contains('-'),df['Date'],np.nan)
df['Date_2']= np.where(df['Date'].str.contains('-'),np.nan,df['Date'])
df['Date_new'] = np.where(df['Date'].str.contains('-'),pd.to_datetime(df['Date_1'], format = '%d-%m-%Y'),pd.to_datetime(df['Date_2'], format = '%d %b %Y'))
Just use the option dayfirst=True
pd.to_datetime(df.Date, dayfirst=True)
Out[353]:
0 2019-01-01
1 2019-02-02
2 2019-12-01
3 2019-01-23
4 2019-04-11
5 2019-05-22
Name: Date, dtype: datetime64[ns]
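A vectorized alternative without apply (a sketch, assuming only these two formats occur) is to parse with each format separately using errors='coerce' and then combine the results:
import pandas as pd

s = pd.Series(['01 Jan 2019', '02 Feb 2019', '01-12-2019',
               '23-01-2019', '11-04-2019', '22-05-2019'])

d1 = pd.to_datetime(s, format='%d %b %Y', errors='coerce')   # matches '01 Jan 2019'
d2 = pd.to_datetime(s, format='%d-%m-%Y', errors='coerce')   # matches '01-12-2019'
print(d1.fillna(d2))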
My suggestion:
Define a conversion function as follows:
import datetime as dt

def conv_date(x):
    try:
        res = pd.to_datetime(dt.datetime.strptime(x, "%d %b %Y"))
    except ValueError:
        res = pd.to_datetime(dt.datetime.strptime(x, "%d-%m-%Y"))
    return res
Now get the new date column as follows:
df['Date_new'] = df['Date'].apply(lambda x: conv_date(x))
You can get your desired result with the help of pandas' apply and to_datetime methods, as given below:
import pandas as pd

def change(value):
    return pd.to_datetime(value)

df = pd.DataFrame(data={'date': ['01 jan 2019']})
df['date'] = df['date'].apply(change)
df
I hope it may help you.
This works as expected:
import pandas as pd
a = pd.DataFrame({
    'Date': ['01 Jan 2019',
             '02 Feb 2019',
             '01-12-2019',
             '23-01-2019',
             '11-04-2019',
             '22-05-2019']
})
a['Date'] = a['Date'].apply(lambda date: pd.to_datetime(date, dayfirst=True))
print(a)

How to get rid of bad dates in a string before parsing to datetime type? [duplicate]

I want to convert a string from a dataframe to datetime.
dfx = df.ix[:,'a']
dfx = pd.to_datetime(dfx)
But it gives the following error:
ValueError: day is out of range for month
Can anyone help?
Maybe it helps to add the parameter dayfirst=True to to_datetime, if the datetime format is like 30-01-2016:
dfx = df.ix[:,'a']
dfx = pd.to_datetime(dfx, dayfirst=True)
More universal is to use the format parameter with errors='coerce', which replaces values in another format with NaT:
dfx = '30-01-2016'
dfx = pd.to_datetime(dfx, format='%d-%m-%Y', errors='coerce')
print (dfx)
2016-01-30 00:00:00
Sample:
dfx = pd.Series(['30-01-2016', '15-09-2015', '40-09-2016'])
print (dfx)
0 30-01-2016
1 15-09-2015
2 40-09-2016
dtype: object
dfx = pd.to_datetime(dfx, format='%d-%m-%Y', errors='coerce')
print (dfx)
0 2016-01-30
1 2015-09-15
2 NaT
dtype: datetime64[ns]
If the format is standard (e.g. 01-30-2016), add only errors='coerce':
dfx = pd.Series(['01-30-2016', '09-15-2015', '09-40-2016'])
print (dfx)
0 01-30-2016
1 09-15-2015
2 09-40-2016
dtype: object
dfx = pd.to_datetime(dfx, errors='coerce')
print (dfx)
0 2016-01-30
1 2015-09-15
2 NaT
dtype: datetime64[ns]
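Once the unparseable values are NaT (using errors='coerce' as above), getting rid of them is straightforward; a sketch using the day-first sample from above:
dfx = pd.Series(['30-01-2016', '15-09-2015', '40-09-2016'])
parsed = pd.to_datetime(dfx, format='%d-%m-%Y', errors='coerce')
print(dfx[parsed.isna()])       # inspect the original values that failed to parse
dfx_clean = parsed.dropna()     # or keep only the rows that parsed cleanly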
Well in my case
year = 2023
month = 2
date = datetime.date(year, month, 30)
got me this error because February has only 28 or 29 days. Maybe that point helps someone.
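One way to guard against this (a sketch) is to clamp the day to the length of the month with calendar.monthrange:
import calendar
import datetime

year, month = 2023, 2
last_day = calendar.monthrange(year, month)[1]       # 28 for February 2023
date = datetime.date(year, month, min(30, last_day))
print(date)                                          # 2023-02-28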
