pandas - convert a column with multiple date formats to datetime - python

I have a column in a dataframe with multiple date formats that need to be converted to datetime.
date amount
September 2018 15
Sep-18 20
The output should look like
date amount
2018-09-01 15
2018-09-01 20
Using pd.to_datetime(df['date']) raises the error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-09-18 00:00:00
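The error message suggests that "Sep-18" is being read as 18 September of year 1, which falls outside the nanosecond timestamp range. One way around this, sketched below under the assumption that only the two formats shown in the question occur, is to parse the column once per explicit format and combine the results. The variable names full and abbr are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date": ["September 2018", "Sep-18"], "amount": [15, 20]})

# Parse once per known format; rows that do not match become NaT.
full = pd.to_datetime(df["date"], format="%B %Y", errors="coerce")  # "September 2018"
abbr = pd.to_datetime(df["date"], format="%b-%y", errors="coerce")  # "Sep-18"

# Take the first parse that succeeded for each row.
df["date"] = full.fillna(abbr)
print(df)
```

Any value matching neither format stays NaT, so it can be inspected afterwards instead of being silently misparsed.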

Related

Is there a way to convert dates (with different formats) into a standardized format in Python?

I have a column called "date" with dtype object that holds many different date formats, such as dd.m.yy, dd.mm.yyyy, dd/mm/yyyy, dd/mm, m/d/yyyy, etc., as shown below. Simply using df['date'] = pd.to_datetime(df['date']) does not work. For messy date values like these, is there any way to standardize and convert them into one single format?
date
17.2.22 # means Feb 17 2022
23.02.22 # means Feb 23 2022
17/02/2022 # means Feb 17 2022
18.2.22 # means Feb 18 2022
2/22/2022 # means Feb 22 2022
3/1/2022 # means March 1 2022
<more messy different format>
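Coerce the dates to datetime so that invalid entries are turned into nulls (NaT), and let pandas infer the format. Note that infer_datetime_format is deprecated as of pandas 2.0, where a strict version of this inference is the default. Code below:
df['date'] = pd.to_datetime(df['date'], errors='coerce', infer_datetime_format=True)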
date
0 2022-02-17
1 2022-02-23
2 2022-02-17
3 2022-02-18
4 2022-02-22
5 2022-03-01
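Based on wwnde's solution, the following works on my real dataset:
# replace missing values with empty strings and make sure everything is a string
df['date'] = df['date'].fillna('').astype(str)
# regex=False so that '.' is treated as a literal dot, not "any character"
df['date new'] = df['date'].str.replace('.', '/', regex=False)
df['date new'] = pd.to_datetime(df['date new'], errors='coerce', infer_datetime_format=True)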
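Because format inference behaves differently across pandas versions, a more explicit alternative is to try a list of known formats in priority order. The following is only a sketch: parse_with_formats is a hypothetical helper, not a pandas function, and the format list and its order are assumptions chosen to match the sample values in the question (m/d/yyyy is tried before dd/mm/yyyy, so an ambiguous value like 3/1/2022 is read as March 1).

```python
import pandas as pd

def parse_with_formats(values, formats):
    """Try each format in order; keep the first successful parse per row."""
    values = pd.Series(values, dtype="object")
    result = pd.to_datetime(values, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        # Only rows that are still NaT get filled by later formats.
        result = result.fillna(pd.to_datetime(values, format=fmt, errors="coerce"))
    return result

raw = ["17.2.22", "23.02.22", "17/02/2022", "18.2.22", "2/22/2022", "3/1/2022"]
# Order matters for ambiguous values such as 3/1/2022.
formats = ["%d.%m.%y", "%m/%d/%Y", "%d/%m/%Y"]
parsed = parse_with_formats(raw, formats)
print(parsed)
```

Rows that match none of the formats remain NaT, which makes it easy to list the leftovers and extend the format list.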

Converting Python dataframe column to date format

I have a column in a dataframe that I want to convert to a date. The values of the column are either DDMONYYYY or DD Month YYYY 00:00:00.000 GMT. For example, one row in the dataframe could have the value 31DEC2002 and the next row could have 31 December 2015 00:00:00.000 GMT. I think this is why I get an error when trying to convert the column to a date using pd.to_datetime or datetime.strptime.
Anyone got any ideas? I'd be very grateful for any help/pointers.
For me, to_datetime works with utc=True, which converts all values to UTC, and errors='coerce', which converts unparseable values to NaT (missing datetime):
df = pd.DataFrame({'date':['31DEC2002','31 December 2015 00:00:00.000 GMT','.']})
df['date'] = pd.to_datetime(df['date'], utc=True, errors='coerce')
print (df)
date
0 2002-12-31 00:00:00+00:00
1 2015-12-31 00:00:00+00:00
2 NaT

Need to convert epoch time to EST using pandas

I have a dataframe df like this:
timestamp values
0 1574288141 34
1 1574288241 23
2 1574288341 22
3 1574288441 10
Here timestamp holds the epoch time. I want to convert this into a datetime in the format 2019-11-20 04:03:01, expressed in EST.
When I do
pd.to_datetime(df['timestamp'], unit='s')
I get the conversion and the required format but the time doesn't seem to be in EST. It is 4 hours ahead of EST.
I have tried to convert utc to Eastern using the code
pd.to_datetime(df['timestamp'], unit='s').tz_localize('utc').dt.tz_convert('US/Eastern')
But I am getting an error
TypeError: index is not a valid DatetimeIndex or PeriodIndex
You need to add .dt, since your input is a Series, not an index:
pd.to_datetime(df.timestamp,unit='s').dt.tz_localize('utc').dt.tz_convert('US/Eastern')
Out[8]:
0 2019-11-20 17:15:41-05:00
1 2019-11-20 17:17:21-05:00
2 2019-11-20 17:19:01-05:00
3 2019-11-20 17:20:41-05:00
Name: timestamp, dtype: datetime64[ns, US/Eastern]
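As a variation on the same idea, to_datetime can mark the values as UTC directly with utc=True, which removes the separate tz_localize step. The sketch below uses the zone name "America/New_York" (which "US/Eastern" is an alias for) and adds a hypothetical column named local that drops the offset, assuming the plain 2019-11-20 17:15:41 style of display is what is wanted.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": [1574288141, 1574288241, 1574288341, 1574288441],
    "values": [34, 23, 22, 10],
})

# Epoch seconds are UTC by definition, so parse them as tz-aware UTC values
# and then convert to Eastern time.
eastern = pd.to_datetime(df["timestamp"], unit="s", utc=True).dt.tz_convert("America/New_York")

# Optional: drop the UTC offset to keep naive local wall-clock times.
df["local"] = eastern.dt.tz_localize(None)
print(df)
```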

How do I create a rolling monthly datetime index for pandas?

I have a DatetimeIndex in pandas and I want to convert it to a rolling DatetimeIndex using the last date in the series.
So if I create a sample datetime index:
from datetime import datetime as dt
dates = pd.DatetimeIndex(pd.date_range(dt(2017, 10, 1), dt(2018, 2, 2)))
An example
Input: DatetimeIndex with all dates in the above range:
dates
2017-10-01
2017-10-02
.
.
2018-02-01
2018-02-02
Desired Output: DatetimeIndex with only the 2nd of every month (as that is the last date in the input):
dates
2017-10-02
2017-11-02
2017-12-02
2018-01-02
2018-02-02
Attempts
I've tried
dates[::-1][::30]
and also
dates[dates.apply(lambda x: x.date().day==2)]
Unfortunately months can differ by 30 or 31 days so the first way doesn't work and while the second method works for days in range 1-30, for the 31st it skips every other month. So, for example, if I had:
dates
2017-10-01
2017-10-02
.
.
2018-01-31
I would want:
dates
2017-10-31
2017-11-30
2017-12-31
2018-01-31
while the second method skips November as it doesn't have a 31st.
Is there any way to use RelativeDelta to do this?
You can use the .is_month_end attribute in pandas. It gives an array of boolean values: True if the date is a month end, False otherwise.
import pandas as pd
# keep the DatetimeIndex as-is; on a Series you would need dates.dt.is_month_end instead
dates = pd.date_range('2017-10-01', '2017-12-31')
print(dates[dates.is_month_end])
Output
DatetimeIndex(['2017-10-31', '2017-11-30', '2017-12-31'], dtype='datetime64[ns]', freq=None)
You can use this boolean mask to filter the index.
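The month-end mask handles the case where the last date is a month end, but not the first example in the question (the 2nd of every month). A more general sketch is to step back from the last date in whole months with pd.DateOffset, which clips to the end of shorter months. The function name rolling_monthly is hypothetical, and each step is computed from the last date rather than from the previous result so that a clipped day (for example the 30th of November) does not carry over.

```python
import pandas as pd

def rolling_monthly(dates: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Return one date per month, anchored to the day of the last date."""
    first, last = dates.min(), dates.max()
    out = [last]
    k = 1
    while True:
        # Subtract k months from the last date; DateOffset clips to month end.
        candidate = last - pd.DateOffset(months=k)
        if candidate < first:
            break
        out.append(candidate)
        k += 1
    return pd.DatetimeIndex(sorted(out))

print(rolling_monthly(pd.date_range("2017-10-01", "2018-02-02")))
print(rolling_monthly(pd.date_range("2017-10-01", "2018-01-31")))
```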

Converting numeric SAS dates to datetimes Pandas

I am currently trying to reproduce this: "convert numeric sas date to datetime in Pandas", but get the following error:
"Python int too large to convert to C long"
Here is an example of my dates:
0 1.416096e+09
1 1.427069e+09
2 1.433635e+09
3 1.428624e+09
4 1.433117e+09
Name: dates, dtype: float64
Any ideas?
Here is a little hacky solution. If the date column is called 'date', try
df['date'] = pd.to_datetime(df['date'] - 315619200, unit = 's')
Here 315619200 is the number of seconds between Jan 1 1960 and Jan 1 1970.
You get
0 2004-11-15 00:00:00
1 2005-03-22 00:03:20
2 2005-06-05 23:56:40
3 2005-04-09 00:00:00
4 2005-05-31 00:03:20
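The same shift can also be expressed with the origin parameter of pd.to_datetime, which avoids the hard-coded constant. This is a sketch that assumes, as in the answer above, that the values are SAS datetimes counted in seconds from 1 January 1960; SAS date values, which count days, would need unit='D' instead. The variable names are only illustrative.

```python
import pandas as pd

sas_seconds = pd.Series([1.416096e09, 1.427069e09], name="dates")

# Count seconds from the SAS epoch (1960-01-01) instead of the Unix epoch.
converted = pd.to_datetime(sas_seconds, unit="s", origin="1960-01-01")

# Equivalent to subtracting the 1960-to-1970 offset by hand.
manual = pd.to_datetime(sas_seconds - 315619200, unit="s")
print(converted)
```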
