Format a pandas DataFrame column to datetime format

I have a pandas column like the following:
January 2014
February 2014
I want to convert it to the following format:
201401
201402
I am doing the following:
df.date = pd.to_datetime(df.date, format='%Y%B')
But it gives me an error.

You shouldn't need the format string, it just works:
In [207]:
pd.to_datetime('January 2014')
Out[207]:
Timestamp('2014-01-01 00:00:00')
Besides, your format string is incorrect; it should be '%B %Y':
In [209]:
pd.to_datetime('January 2014', format='%B %Y')
Out[209]:
Timestamp('2014-01-01 00:00:00')
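To get the 201401 style asked for in the question, the parsed column can be rendered back to text with the .dt accessor. This is a minimal sketch: the sample DataFrame and the yyyymm column name are made up for illustration, and it assumes English month names in every row.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's column.
df = pd.DataFrame({"date": ["January 2014", "February 2014"]})

# Parse "Month YYYY" strings into timestamps (first day of each month).
parsed = pd.to_datetime(df["date"], format="%B %Y")

# Render each timestamp as a YYYYMM string.
df["yyyymm"] = parsed.dt.strftime("%Y%m")

print(df["yyyymm"].tolist())
```

Note that the result is a column of strings, not datetimes; keep the parsed column as well if you still need date arithmetic.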

Related

Pandas int type to date type

I am new to pandas and am trying to convert an int column to a date column.
The int in the df is something like: 10712 (first day, then month, then year).
I tried solving this with:
df_date = pd.to_datetime(df['Date'], format='%d%m%Y')
but I always get the following value error:
time data '10712' does not match format '%d%m%Y' (match)
Thank you for your help :)
You should use %y (2-digit year) instead of %Y (4-digit year). But that is not enough.
The format %d%m%y converts 10712 to 10-07-2012, not to 1-07-2012 as you expect.
That's because of the following feature of the underlying strptime:
"When used with the strptime() method, the leading zero is optional for %m"
A workaround could be to convert to a format properly understandable by strptime (and to_datetime):
>>> df = pd.DataFrame({'date': [10712, 20813, 30914]})
>>> df
date
0 10712
1 20813
2 30914
>>> df1 = df.date.astype(str).str.replace(r'(\d+)(\d\d)(\d\d)',
...                                       r'\2/\1/\3', regex=True)
>>> df1
0 07/1/12
1 08/2/13
2 09/3/14
>>> pd.to_datetime(df1)
0 2012-07-01
1 2013-08-02
2 2014-09-03
Use the %y year specifier to parse a year without century digits (note that this reads 10712 as 10 July 2012):
In [654]: pd.to_datetime(10712, format='%d%m%y')
Out[654]: Timestamp('2012-07-10 00:00:00')
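If the intended reading is 1 July 2012, one option is to restore the dropped leading zero before parsing. The sketch below assumes every value is day, two-digit month, two-digit year, with only the day's leading zero lost when the value was stored as an int; the sample values and the parsed column name are illustrative.

```python
import pandas as pd

# Hypothetical sample data: DDMMYY stored as int, so a leading zero is lost.
df = pd.DataFrame({"Date": [10712, 20813, 30914]})

# Left-pad to six characters so every value is exactly DDMMYY again.
padded = df["Date"].astype(str).str.zfill(6)

# With fixed-width fields, the format is unambiguous.
df["parsed"] = pd.to_datetime(padded, format="%d%m%y")

print(df)
```

Values that already have six digits (for example 101112) pass through zfill unchanged, so the same line handles both cases.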

How do I convert date from alphabetical to numeric format?

I want to convert date from 'Sep 17, 2021' format to '17.09.2021'. I made a function, but I can't apply it to the series. What am I doing wrong?
def to_normal_date(bad_date):
    datetime.strptime(bad_date, '%b %d, %Y')
    return s.strftime('%Y-%m-%d')
df['normal_date'] = df['date'].apply(to_normal_date)
I receive a ValueError when I try to apply it to the series. But it works fine with this:
to_normal_date('Sep 16, 2021')
Use pd.to_datetime to convert the "date" column to datetime format. Specifying errors="coerce" will convert dates that are not in the correct format to NaT (missing) values instead of raising errors.
Convert to the required format using .strftime with the .dt accessor.
df["normal_date"] = pd.to_datetime(df["date"], format="%b %d, %Y", errors="coerce").dt.strftime("%d.%m.%Y")
>>> df
date normal_date
0 Sep 17, 2021 17.09.2021
1 Oct 31, 2021 31.10.2021
2 Nov 19, 2021 19.11.2021
3 Dec 25, 2021 25.12.2021
Try:
pd.to_datetime(df['date'], format='%b %d, %Y').dt.strftime('%Y-%m-%d')
It should work provided all df['date'] entries match the date pattern of 'Sep 17, 2021'. For the 17.09.2021 layout asked for in the question, pass '%d.%m.%Y' to strftime instead.
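As for the function in the question, as posted it throws away the result of strptime and then reads a name s that is never assigned inside the function. A corrected sketch of the same apply-based approach might look like this (the sample rows are made up, and it will still raise ValueError on any row that does not match the pattern):

```python
from datetime import datetime

import pandas as pd


def to_normal_date(bad_date):
    # Keep the parsed value instead of discarding it.
    parsed = datetime.strptime(bad_date, "%b %d, %Y")
    # Render in the DD.MM.YYYY layout requested in the question.
    return parsed.strftime("%d.%m.%Y")


# Hypothetical sample data in the question's format.
df = pd.DataFrame({"date": ["Sep 17, 2021", "Oct 31, 2021"]})
df["normal_date"] = df["date"].apply(to_normal_date)

print(df)
```

The vectorised pd.to_datetime version is usually preferable on large columns, and errors="coerce" gives you a way to tolerate malformed rows.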

How to change time format that is already stored in json using python?

I have data stored in a JSON file and am reading it in with pandas. The format of the time is 'Jun 10, 2021, 01:05:30:565'. I would like to parse the time column accordingly. However, Python gives this error: Unknown string format: 'Jun 10, 2021, 01:05:30:565 AM'.
I used: DS[pd.to_datetime(day + ' ' + time)] = value
That line worked with other columns whose time is in HH:MM:SS format, but with milliseconds I'm unable to get what I want.
From your example :
>>> import pandas as pd
>>> df = pd.DataFrame({'date': ['Jun 10, 2021, 01:05:30:565 AM']},
... index = [0])
>>> df
date
0 Jun 10, 2021, 01:05:30:565 AM
We can convert the date column to datetime like so (using %I, the 12-hour clock, so that the AM/PM marker matched by %p is honoured):
>>> df['date'] = pd.to_datetime(df['date'], format="%b %d, %Y, %I:%M:%S:%f %p")
>>> df
date
0 2021-06-10 01:05:30.565
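One detail worth checking with your own data: %p only changes the parsed hour when it is paired with %I (12-hour clock); combined with %H the AM/PM marker is matched but ignored. A small sketch with a made-up PM row shows the 12-hour form shifting the hour as expected:

```python
import pandas as pd

# Hypothetical sample data: same timestamp once as AM, once as PM.
times = pd.Series([
    "Jun 10, 2021, 01:05:30:565 AM",
    "Jun 10, 2021, 01:05:30:565 PM",
])

# %I reads the hour on a 12-hour clock, %p applies the AM/PM offset,
# and %f reads the fractional seconds after the last colon.
parsed = pd.to_datetime(times, format="%b %d, %Y, %I:%M:%S:%f %p")

print(parsed)
```

The PM row comes out with hour 13, and the trailing 565 is read as 565 milliseconds.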

Converting a string that has day of the year to datetime

I have a string column that looks like below:
2018-24 7:10:0
2018-8 12:1:20
2018-44 13:55:19
The 24, 8 and 44 that you see are the day of the year, not the day of the month.
How can I convert this to a datetime column in the format below?
2018-01-24 07:10:00
2018-01-08 12:01:20
2018-02-13 13:55:19
I am unable to find anything about converting the day of the year.
You need format string '%Y-%j %H:%M:%S'
In[53]:
import datetime as dt
dt.datetime.strptime('2018-44 13:55:19', '%Y-%j %H:%M:%S')
Out[53]: datetime.datetime(2018, 2, 13, 13, 55, 19)
%j is day of year
For pandas:
In[59]:
import pandas as pd
import io
t="""2018-24 7:10:0
2018-8 12:1:20
2018-44 13:55:19"""
df = pd.read_csv(io.StringIO(t), header=None, names=['datetime'])
df
Out[59]:
datetime
0 2018-24 7:10:0
1 2018-8 12:1:20
2 2018-44 13:55:19
Use pd.to_datetime and pass format param:
In[60]:
df['new_datetime'] = pd.to_datetime(df['datetime'], format='%Y-%j %H:%M:%S')
df
Out[60]:
datetime new_datetime
0 2018-24 7:10:0 2018-01-24 07:10:00
1 2018-8 12:1:20 2018-01-08 12:01:20
2 2018-44 13:55:19 2018-02-13 13:55:19
You can use dateutil.relativedelta to add the day count to the first day of the year.
Example:
from datetime import datetime
from dateutil.relativedelta import relativedelta
datetime.now()+ relativedelta(days=5)
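Applied to the question, that idea could be sketched as follows. The helper name from_year_and_day is hypothetical, and it assumes day 1 means January 1, so the offset is the day of the year minus one.

```python
from datetime import datetime

from dateutil.relativedelta import relativedelta


def from_year_and_day(year, day_of_year):
    # Hypothetical helper: start at January 1 and step forward.
    # Day 1 is January 1 itself, hence the "- 1".
    return datetime(year, 1, 1) + relativedelta(days=day_of_year - 1)


result = from_year_and_day(2018, 44)
print(result)
```

This only covers the date part; the time of day would still have to be parsed separately, which is why the '%Y-%j %H:%M:%S' format string is the shorter route here.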
The documentation at strftime.org identifies the %j format specifier as handling day of the year. I don't know whether it's available on all platforms, but my Mac certainly has it.
Use time.strptime to convert the string to a struct_time. The output below has a newline inserted for reading convenience:
>>> import time
>>> time.strptime('2018-24 7:10:0', '%Y-%j %H:%M:%S')
time.struct_time(tm_year=2018, tm_mon=1, tm_mday=24, tm_hour=7,
tm_min=10, tm_sec=0, tm_wday=2, tm_yday=24, tm_isdst=-1)
time.strftime formats a struct_time, so you can get what you need by applying it to the output of strptime:
>>> time.strftime('%Y-%m-%d %H:%M:%S',
... time.strptime('2018-24 7:10:0', '%Y-%j %H:%M:%S'))
'2018-01-24 07:10:00'

how to convert a string type to date format

My source data has a column including the date information but it is a string type.
Typical lines are like this:
04 13, 2013
07 1, 2012
I am trying to convert it to a date format, so I used pandas' to_datetime function:
df['ReviewDate_formated'] = pd.to_datetime(df['ReviewDate'],format='%mm%d, %yyyy')
But I got this error message:
ValueError: time data '04 13, 2013' does not match format '%mm%d, %yyyy' (match)
My questions are:
How do I convert to a date format?
I also want to extract Month, Year and Day columns because I need to do some month-over-month comparison. But the problem here is that the length of the string varies.
Your format string is incorrect; you want '%m %d, %Y'. The Python strftime/strptime documentation lists the valid format codes:
In [30]:
import io
import pandas as pd
t="""ReviewDate
04 13, 2013
07 1, 2012"""
df = pd.read_csv(io.StringIO(t), sep=';')
df
Out[30]:
ReviewDate
0 04 13, 2013
1 07 1, 2012
In [31]:
pd.to_datetime(df['ReviewDate'], format='%m %d, %Y')
Out[31]:
0 2013-04-13
1 2012-07-01
Name: ReviewDate, dtype: datetime64[ns]
To answer the second part, once the dtype is a datetime64 then you can call the vectorised dt accessor methods to get just the day, month, and year portions:
In [33]:
df['Date'] = pd.to_datetime(df['ReviewDate'], format='%m %d, %Y')
df['day'],df['month'],df['year'] = df['Date'].dt.day, df['Date'].dt.month, df['Date'].dt.year
df
Out[33]:
ReviewDate Date day month year
0 04 13, 2013 2013-04-13 13 4 2013
1 07 1, 2012 2012-07-01 1 7 2012
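For the month-over-month comparison mentioned in the question, it may be more convenient to group on a single year-month key than on separate month and year columns. The following is only a sketch: the extra sample row and the period column name are made up, and counting rows stands in for whatever metric you actually compare.

```python
import pandas as pd

# Hypothetical sample data in the question's format.
df = pd.DataFrame({"ReviewDate": ["04 13, 2013", "07 1, 2012", "04 2, 2013"]})
df["Date"] = pd.to_datetime(df["ReviewDate"], format="%m %d, %Y")

# Collapse each date to its calendar month (a monthly Period).
df["period"] = df["Date"].dt.to_period("M")

# Count reviews per month; groupby sorts the months chronologically.
counts = df.groupby("period").size()

print(counts)
```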
