How to reformat date data in Pandas dataframe - python

My input dataframe is
df = pd.DataFrame({'Source':['Pre-Nov 2017', 'Pre-Nov 2017', 'Oct 19', '2019-04-01 00:00:00', '2019-06-01 00:00:00', 'Nov 17-Nov 18', 'Nov 17-Nov 18']})
I would need Target column as below
If I use the code below, it doesn't work: I get the same values as Source in the Target column.
df['Target'] = pd.to_datetime(df['Source'], format= '%b %Y',errors='ignore')
Looks like pandas is considering values like '2019-04-01 00:00:00', '2019-06-01 00:00:00' as NaN

One idea is to use errors='coerce', which turns values that do not parse as datetimes into NaT, then convert to custom strings with Series.dt.strftime. The NaT values come out as the string 'NaT', so use Series.mask to replace them with the original values:
df['Target'] = (pd.to_datetime(df['Source'], errors='coerce')
.dt.strftime('%b %y')
.mask(lambda x: x == 'NaT', df['Source']))
print (df)
                Source         Target
0         Pre-Nov 2017   Pre-Nov 2017
1         Pre-Nov 2017   Pre-Nov 2017
2               Oct 19         Oct 19
3  2019-04-01 00:00:00         Apr 19
4  2019-06-01 00:00:00         Jun 19
5        Nov 17-Nov 18  Nov 17-Nov 18
6        Nov 17-Nov 18  Nov 17-Nov 18
An alternative is to use numpy.where:
d = pd.to_datetime(df['Source'], errors='coerce')
df['Target'] = np.where(d.isna(), df['Source'], d.dt.strftime('%b %y'))
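A variation on the same idea, offered as a minimal sketch rather than the answerer's method: pass an explicit format so that only the full timestamps are parsed, and test for missing values with notna() instead of comparing against the string 'NaT'. Depending on the pandas version, strftime may return NaN rather than 'NaT' for missing values, so the string comparison may not be reliable. The sample frame here is a shortened copy of the one in the question, and the variable name parsed is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Source': ['Pre-Nov 2017', 'Oct 19',
                              '2019-04-01 00:00:00', 'Nov 17-Nov 18']})

# Only strings that match the full timestamp format are parsed;
# everything else becomes NaT because of errors='coerce'.
parsed = pd.to_datetime(df['Source'], format='%Y-%m-%d %H:%M:%S', errors='coerce')

# Keep the formatted value where parsing succeeded, else the original text.
df['Target'] = parsed.dt.strftime('%b %y').where(parsed.notna(), df['Source'])
print(df)
```

With an explicit format, a value such as 'Oct 19' is left untouched instead of being guessed at by the parser.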
EDIT:
But why did this not work?
df['Target'] = pd.to_datetime(df['Source'], format= '%b %Y',errors='ignore')
According to the to_datetime documentation, errors='ignore' returns the original column values unchanged if the conversion fails:
If 'ignore', then invalid parsing will return the input
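To see the failure that errors='ignore' was hiding, here is a small sketch (the two-value series is made up for illustration). With the default error handling the mismatch is raised as a ValueError, and with errors='coerce' only the value that matches '%b %Y' survives:

```python
import pandas as pd

source = pd.Series(['Pre-Nov 2017', 'Nov 2017'])

# Default behaviour (errors='raise'): the first value does not match '%b %Y'.
try:
    pd.to_datetime(source, format='%b %Y')
    parse_failed = False
except ValueError as exc:
    parse_failed = True
    print('parsing failed:', exc)

# errors='coerce' keeps going and marks the non-matching value as NaT.
coerced = pd.to_datetime(source, format='%b %Y', errors='coerce')
print(coerced)
```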

Related

How to convert dataframe string into date time

df['Year,date']
0 Sep 10
1 Sep 16
2 Aug 01
3 Sep 30
4 Sep 28
...
2230 Jul 20
2231 Oct 26
2232 Oct 13
2233 Dec 31
2234 Jul 08
Name: Year,date, Length: 2235, dtype: object
This is my dataframe, and I want to convert each row to datetime in month-and-day format. I have tried some code, but it does not work for me.
Welcome to Stack Overflow. To convert the column you mentioned from string to datetime, you can use the code below.
Initial data
import pandas as pd
from datetime import datetime
data = {'date': ['Sep 16', 'Aug 01', 'Sep 30', 'Sep 16']}
df=pd.DataFrame(data)
df.info()
>> # Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 4 non-null object
print(df)
>> date
0 Sep 16
1 Aug 01
2 Sep 30
3 Sep 16
To convert to datetime....
df['date'] = pd.to_datetime(df['date'], format='%b %d')
df.info()
>> # Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 4 non-null datetime64[ns]
dtypes: datetime64[ns](1)
print(df)
>> date
0 1900-09-16
1 1900-08-01
2 1900-09-30
3 1900-09-16
You might have noticed that the year is taken as 1900 as this is the default. So, in case you need it as this year, you would do this...
from datetime import datetime
data = {'date': ['Sep 16', 'Aug 01', 'Sep 30', 'Sep 16']}
df=pd.DataFrame(data)
df.date = datetime.now().strftime("%Y") + " " + df.date
df.date = pd.to_datetime(df.date, format='%Y %b %d')
print(df)
>> date
0 2022-09-16
1 2022-08-01
2 2022-09-30
3 2022-09-16
Now that the date is stored in the dataframe as a datetime, if you want to display it in the mon dd format, you would do this...
print(df.date.dt.strftime("%b %d"))
>> 0 Sep 16
1 Aug 01
2 Sep 30
3 Sep 16
Note that the date in df is still in datetime format.
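If the year is known in advance, it can be supplied explicitly instead of taken from datetime.now(), which keeps the result reproducible. A minimal sketch, assuming the year 2022 purely as an example value:

```python
import pandas as pd

df = pd.DataFrame({'date': ['Sep 16', 'Aug 01', 'Sep 30']})

year = 2022  # assumed example year; replace with the year your data belongs to

# Prepend the year to every string, then parse with a matching format.
df['date'] = pd.to_datetime(str(year) + ' ' + df['date'], format='%Y %b %d')
print(df)
```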

Convert date strings with Italian month names to %Y-%m-%d

I would like to convert the dates in one column (Before) into date format in another column (After):
Before After
23 Ottobre 2020 2020-10-23
24 Ottobre 2020 2020-10-24
27 Ottobre 2020 2020-10-27
30 Ottobre 2020 2020-10-30
22 Luglio 2020 2020-07-22
I tried as follows:
from datetime import datetime
date = df.Before.tolist()
dtObject = datetime.strptime(date,"%d %m, %y")
dtConverted = dtObject.strftime("%y-%m-%d")
But it does not work.
Can you explain to me how to do it?
Similar to this question, you can set the locale to Italian before parsing:
import pandas as pd
import locale
locale.setlocale(locale.LC_ALL, 'it_IT')
df = pd.DataFrame({'Before': ['30 Ottobre 2020', '22 Luglio 2020']})
df['After'] = pd.to_datetime(df['Before'], format='%d %B %Y')
# df
# Before After
# 0 30 Ottobre 2020 2020-10-30
# 1 22 Luglio 2020 2020-07-22
If you want the "After" column as dtype string, use df['After'].dt.strftime('%Y-%m-%d').
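The locale approach depends on an Italian locale being installed, and the locale name can differ by platform (for example 'it_IT.UTF-8' on many Linux systems). As a locale-independent sketch, not the answerer's method, the month names can be mapped to numbers by hand; the ITALIAN_MONTHS dictionary and the parts variable are names introduced here for illustration.

```python
import pandas as pd

# Hand-written lookup table: lowercase Italian month name -> month number.
ITALIAN_MONTHS = {
    'gennaio': 1, 'febbraio': 2, 'marzo': 3, 'aprile': 4,
    'maggio': 5, 'giugno': 6, 'luglio': 7, 'agosto': 8,
    'settembre': 9, 'ottobre': 10, 'novembre': 11, 'dicembre': 12,
}

df = pd.DataFrame({'Before': ['23 Ottobre 2020', '22 Luglio 2020']})

# Split "23 Ottobre 2020" into day, month name and year.
parts = df['Before'].str.split(expand=True)
parts.columns = ['day', 'month', 'year']

# to_datetime can assemble dates from year/month/day columns.
df['After'] = pd.to_datetime(pd.DataFrame({
    'year': parts['year'].astype(int),
    'month': parts['month'].str.lower().map(ITALIAN_MONTHS),
    'day': parts['day'].astype(int),
}))
print(df)
```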

Convert string to date in python if date string has different format

My data has date variable with two different date formats
Date
01 Jan 2019
02 Feb 2019
01-12-2019
23-01-2019
11-04-2019
22-05-2019
I want to convert this string into date(YYYY-mm-dd)
Date
2019-01-01
2019-02-02
2019-12-01
2019-01-23
2019-04-11
2019-05-22
I have tried following things, but I am looking for better approach
df['Date'] = np.where(df['Date'].str.contains('-'), pd.to_datetime(df['Date'], format='%d-%m-%Y'), pd.to_datetime(df['Date'], format='%d %b %Y'))
Working solution for me
df['Date_1']= np.where(df['Date'].str.contains('-'),df['Date'],np.nan)
df['Date_2']= np.where(df['Date'].str.contains('-'),np.nan,df['Date'])
df['Date_new'] = np.where(df['Date'].str.contains('-'),pd.to_datetime(df['Date_1'], format = '%d-%m-%Y'),pd.to_datetime(df['Date_2'], format = '%d %b %Y'))
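Just use the option dayfirst=True. Note that pandas 2.0 and later infer a single format from the first value, so for a column that mixes formats like this one you may also need to pass format='mixed'.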
pd.to_datetime(df.Date, dayfirst=True)
Out[353]:
0 2019-01-01
1 2019-02-02
2 2019-12-01
3 2019-01-23
4 2019-04-11
5 2019-05-22
Name: Date, dtype: datetime64[ns]
My suggestion:
Define a conversion function as follows:
import datetime as dt
import pandas as pd

def conv_date(x):
    # Try the "01 Jan 2019" layout first, then fall back to "01-12-2019".
    try:
        res = pd.to_datetime(dt.datetime.strptime(x, "%d %b %Y"))
    except ValueError:
        res = pd.to_datetime(dt.datetime.strptime(x, "%d-%m-%Y"))
    return res
Now get the new date column as follows:
df['Date_new'] = df['Date'].apply(conv_date)
You can get the desired result with the apply and to_datetime methods of pandas, as shown below:
import pandas as pd
def change(value):
    return pd.to_datetime(value)
df = pd.DataFrame(data = {'date':['01 jan 2019']})
df['date'] = df['date'].apply(change)
df
I hope it may help you.
This works simply as expected -
import pandas as pd
a = pd.DataFrame({
'Date' : ['01 Jan 2019',
'02 Feb 2019',
'01-12-2019',
'23-01-2019',
'11-04-2019',
'22-05-2019']
})
a['Date'] = a['Date'].apply(lambda date: pd.to_datetime(date, dayfirst=True))
print(a)
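Another way to handle the two layouts without a row-by-row apply, sketched here under the assumption that the column contains only these two formats: parse the whole column once per format with errors='coerce', then fill the gaps of one result with the other. The names named and numeric are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['01 Jan 2019', '02 Feb 2019',
                            '01-12-2019', '23-01-2019']})

# Each pass parses one layout; rows in the other layout become NaT.
named = pd.to_datetime(df['Date'], format='%d %b %Y', errors='coerce')
numeric = pd.to_datetime(df['Date'], format='%d-%m-%Y', errors='coerce')

# Take the first result where it parsed, otherwise the second.
df['Date_new'] = named.fillna(numeric)
print(df)
```

Any row that matches neither format stays NaT, which makes unexpected values easy to spot with df['Date_new'].isna().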

Return dataframe with range of dates

I need a Python function to return a Pandas DataFrame with range of dates, only year and month, for example, from November 2016 to March 2017 and have this as result:
year month
2016 11
2016 12
2017 01
2017 02
2017 03
My dates are in string format Y-m (from = '2016-11', to = '2017-03'). I'm not sure on turning them to datetime type or to separate them into two different integer values.
Any ideas on how to achieve it properly?
Are you looking at something like this?
pd.date_range('November 2016', 'April 2017', freq = 'M')
You get
DatetimeIndex(['2016-11-30', '2016-12-31', '2017-01-31', '2017-02-28',
'2017-03-31'],
dtype='datetime64[ns]', freq='M')
To get dataframe
index = pd.date_range('November 2016', 'April 2017', freq = 'M')
df = pd.DataFrame(index = index)
pd.Series(pd.date_range('2016-11', '2017-4', freq='M').strftime('%Y-%m')) \
.str.split('-', expand=True) \
.rename(columns={0: 'year', 1: 'month'})
year month
0 2016 11
1 2016 12
2 2017 01
3 2017 02
4 2017 03
You can use a combination of pd.to_datetime and pd.date_range.
import pandas as pd
start = 'November 2016'
end = 'March 2017'
s = pd.Series(pd.date_range(*(pd.to_datetime([start, end]) \
+ pd.offsets.MonthEnd()), freq='1M'))
Construct a dataframe using the .dt accessor attributes.
df = pd.DataFrame({'year' : s.dt.year, 'month' : s.dt.month})
df
   year  month
0  2016     11
1  2016     12
2  2017      1
3  2017      2
4  2017      3
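Since the inputs are already 'Y-m' strings, a month-based period range is another option. This is a minimal sketch (the function name month_range is made up here); it avoids month-end timestamps entirely, and newer pandas versions may warn about the 'M' alias in date_range while still accepting it for periods.

```python
import pandas as pd

def month_range(start, end):
    """Return a DataFrame with one row per month from start to end, inclusive."""
    periods = pd.period_range(start=start, end=end, freq='M')
    return pd.DataFrame({'year': periods.year, 'month': periods.month})

df = month_range('2016-11', '2017-03')
print(df)
```

If zero-padded month strings such as '01' are needed, periods.strftime('%m') can be used in place of periods.month.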

Making a list of months and years from DatetimeIndex in Pandas

I have a dataframe of information. I set the index to be the received date and time. Now I want a list
I set the df index doing this:
df.index = pd.to_datetime(df.index, format='%m/%d/%Y %H:%M')
which gives me this:
print df.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-07-28 09:42:08, ..., 2015-07-28 09:06:12]
Length: 15177, Freq: None, Timezone: None
I want a list of the month and years in order to use them to plot, like so: ["Jan 2015", "Feb 2015", "Mar 2015", "Apr 2015", "May 2015", "June 2015", "Jul 2015", "Aug 2014", "Sep 2014", "Oct 2014", "Nov 2014", "Dec 2014"]
How can I do this? I've looked into something like this:
df = [datetime.datetime.strftime(n,'%b-%Y') for n in pd.DataFrame(df).resample('M').index]
But this gives me the error DataError: No numeric types to aggregate.
Original answer
The following should work: convert your datetimeindex to a series, so you can call apply and use strftime to return an array of strings:
In [27]:
import datetime as dt
import pandas as pd
df = pd.DataFrame(index=pd.date_range(start = dt.datetime(2014,1,1), end = dt.datetime.now(), freq='M'))
df.index.to_series().apply(lambda x: dt.datetime.strftime(x, '%b %Y'))
Out[27]:
2014-01-31 Jan 2014
2014-02-28 Feb 2014
2014-03-31 Mar 2014
2014-04-30 Apr 2014
2014-05-31 May 2014
2014-06-30 Jun 2014
2014-07-31 Jul 2014
2014-08-31 Aug 2014
2014-09-30 Sep 2014
2014-10-31 Oct 2014
2014-11-30 Nov 2014
2014-12-31 Dec 2014
2015-01-31 Jan 2015
2015-02-28 Feb 2015
2015-03-31 Mar 2015
2015-04-30 Apr 2015
2015-05-31 May 2015
2015-06-30 Jun 2015
Freq: M, dtype: object
If you want a list then just call tolist():
df.index.to_series().apply(lambda x: dt.datetime.strftime(x, '%b %Y')).tolist()
Updated answer
Actually, looking at this question 2 years later, I realise the above is completely unnecessary. You can just do:
In [10]:
df.index.strftime('%Y-%b')
Out[10]:
array(['2014-Jan', '2014-Feb', '2014-Mar', '2014-Apr', '2014-May',
'2014-Jun', '2014-Jul', '2014-Aug', '2014-Sep', '2014-Oct',
'2014-Nov', '2014-Dec', '2015-Jan', '2015-Feb', '2015-Mar',
'2015-Apr', '2015-May', '2015-Jun', '2015-Jul', '2015-Aug',
'2015-Sep', '2015-Oct', '2015-Nov', '2015-Dec', '2016-Jan',
'2016-Feb', '2016-Mar', '2016-Apr', '2016-May', '2016-Jun',
'2016-Jul', '2016-Aug', '2016-Sep', '2016-Oct', '2016-Nov',
'2016-Dec', '2017-Jan', '2017-Feb', '2017-Mar', '2017-Apr',
'2017-May', '2017-Jun', '2017-Jul'],
dtype='<U8')
A DatetimeIndex supports datetime methods such as strftime directly, without converting to a Series; the .dt accessor is only needed on a Series.
You can do this directly as of pandas 1.0.x (2020): generate a pd.date_range with any frequency, then call strftime() on it with any format, all in one line:
>>> pd.date_range(start='7/2019', end='6/2020', freq='M').strftime('%Y-%b')
Index(['2019-Jul', '2019-Aug', '2019-Sep', '2019-Oct', '2019-Nov', '2019-Dec',
'2020-Jan', '2020-Feb', '2020-Mar', '2020-Apr', '2020-May'],
dtype='object')
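The question's index holds many timestamps per month, so the formatted labels would repeat. A small sketch of one way to get each month once, using a made-up four-row index in place of the real data:

```python
import pandas as pd

# Stand-in for the question's index: several timestamps, some in the same month.
idx = pd.to_datetime(['2014-07-28 09:42:08', '2014-07-30 11:00:00',
                      '2014-08-02 08:15:00', '2015-07-28 09:06:12'])

# Format every timestamp, then drop repeats while keeping first-seen order.
labels = idx.strftime('%b %Y').unique().tolist()
print(labels)
```

The labels follow the order of the index, so sort the index first if it is not already chronological.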
