Convert date strings with Italian month names to %Y-%m-%d - python

I would like to convert the date strings in the Before column into the date format shown in the After column:
Before            After
23 Ottobre 2020   2020-10-23
24 Ottobre 2020   2020-10-24
27 Ottobre 2020   2020-10-27
30 Ottobre 2020   2020-10-30
22 Luglio 2020    2020-07-22
I tried as follows:
from datetime import datetime
date = df.Before.tolist()
dtObject = datetime.strptime(date,"%d %m, %y")
dtConverted = dtObject.strftime("%y-%m-%d")
But it does not work.
Can you explain how to do it?

Similar to this question, you can set the locale to Italian before parsing:
import pandas as pd
import locale
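# The locale name is platform-dependent (e.g. 'it_IT.UTF-8' on many Linux systems)
# and the locale must be installed on the machine.
locale.setlocale(locale.LC_ALL, 'it_IT')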
df = pd.DataFrame({'Before': ['30 Ottobre 2020', '22 Luglio 2020']})
df['After'] = pd.to_datetime(df['Before'], format='%d %B %Y')
# df
# Before After
# 0 30 Ottobre 2020 2020-10-30
# 1 22 Luglio 2020 2020-07-22
If you want the "After" column as dtype string, use df['After'].dt.strftime('%Y-%m-%d').
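If the Italian locale is not installed, or you would rather not change the process-wide locale, one possible alternative is to map the month names yourself. The sketch below is only an illustration: the ITALIAN_MONTHS dictionary and the parse_italian_dates helper are hypothetical names, not part of pandas, and the code assumes every value has the form "day month year".

```python
import pandas as pd

# Italian month names mapped to month numbers (keys are lower-case).
ITALIAN_MONTHS = {
    'gennaio': 1, 'febbraio': 2, 'marzo': 3, 'aprile': 4,
    'maggio': 5, 'giugno': 6, 'luglio': 7, 'agosto': 8,
    'settembre': 9, 'ottobre': 10, 'novembre': 11, 'dicembre': 12,
}

def parse_italian_dates(series):
    """Parse 'DD <Italian month> YYYY' strings without touching the locale."""
    # Split on whitespace into three columns: day, month name, year.
    parts = series.str.strip().str.split(expand=True)
    parts.columns = ['day', 'month', 'year']
    numeric = pd.DataFrame({
        'year': parts['year'].astype(int),
        'month': parts['month'].str.lower().map(ITALIAN_MONTHS),
        'day': parts['day'].astype(int),
    })
    # to_datetime can assemble dates from year/month/day columns.
    return pd.to_datetime(numeric)

df = pd.DataFrame({'Before': ['23 Ottobre 2020', '22 Luglio 2020']})
df['After'] = parse_italian_dates(df['Before'])
after_as_text = df['After'].dt.strftime('%Y-%m-%d').tolist()
```

Because the mapping is explicit, the result does not depend on which locales the machine provides.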

Related

cleaning date columns in python

Kindly assist me in cleaning my date types in python.
My sample data is as follows:
INITIATION DATE   DATE CUT       DATE GIVEN
1/July/2022       21 July 2022   11-July-2022
17-July-2022      16/July/2022   21/July/2022
16-July-2022      01-July-2022   09/July/2022
19-July-2022      31 July 2022   27 July 2022
How do I remove all dashes/slashes/hyphens from dates in the different columns? I have 8 columns and 300 rows.
What I tried:
df[['INITIATION DATE', 'DATE CUT', 'DATE GIVEN']]= df[['INITIATION DATE', 'DATE CUT', 'DATE GIVEN']].apply(pd.to_datetime, format = '%d%b%Y')
Desired output format for all: 1 July 2022
ValueError I'm getting:
time data '18 July 2022' does not match format '%d-%b-%Y' (match)
To remove all dashes, slashes, and hyphens from the strings, you can use the str.replace method:
df.apply(lambda x: x.str.replace('[/-]',' ',regex=True))
Output:
  INITIATION DATE      DATE CUT    DATE GIVEN
0     1 July 2022  21 July 2022  11 July 2022
1    17 July 2022  16 July 2022  21 July 2022
2    16 July 2022  01 July 2022  09 July 2022
3    19 July 2022  31 July 2022  27 July 2022
If you also need to convert the strings to datetime, try this:
df.apply(lambda x: pd.to_datetime(x.str.replace('[/-]',' ',regex=True)))
Output:
  INITIATION DATE    DATE CUT  DATE GIVEN
0      2022-07-01  2022-07-21  2022-07-11
1      2022-07-17  2022-07-16  2022-07-21
2      2022-07-16  2022-07-01  2022-07-09
3      2022-07-19  2022-07-31  2022-07-27
You can use pd.to_datetime to convert strings to datetime objects. The function takes a format argument that specifies the layout of the datetime string, using the usual format codes. Note that a single format only works if every value in the column uses the same separator:
df['INITIATION DATE'] = pd.to_datetime(df['INITIATION DATE'], format='%d-%B-%Y').dt.strftime('%d %B %Y')
df['DATE CUT'] = pd.to_datetime(df['DATE CUT'], format='%d %B %Y').dt.strftime('%d %B %Y')
df['DATE GIVEN'] = pd.to_datetime(df['DATE GIVEN'], format='%d/%B/%Y').dt.strftime('%d %B %Y')
Output:
  INITIATION DATE      DATE CUT    DATE GIVEN
0    01 July 2022  21 July 2022  11 July 2022
1    17 July 2022  16 July 2022  21 July 2022
2    16 July 2022  01 July 2022  09 July 2022
3    19 July 2022  31 July 2022  27 July 2022
You get that error because your datetime strings (e.g. '18 July 2022') do not match your format ('%d-%b-%Y'): the format expects hyphens where the string has spaces. In addition, %b stands for the abbreviated month name (Jul), whereas the full name (July) needs %B.
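Putting the answers above together, one way to reach the desired "1 July 2022" style is to normalize the separators first, parse with a single explicit format, and then build the output string. This is a sketch under the assumption that every value is day, full English month name, year; clean_date_column is a hypothetical helper name, and the sample frame is a shortened copy of the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    'INITIATION DATE': ['1/July/2022', '17-July-2022'],
    'DATE CUT': ['21 July 2022', '16/July/2022'],
})

def clean_date_column(col):
    """Return the column as 'D Month YYYY' strings, e.g. '1 July 2022'."""
    # Replace slashes and hyphens with spaces so one format fits every value.
    normalized = col.str.replace(r'[/-]', ' ', regex=True)
    parsed = pd.to_datetime(normalized, format='%d %B %Y')
    # Build the day part from the integer day to avoid a leading zero.
    return parsed.dt.day.astype(str) + ' ' + parsed.dt.strftime('%B %Y')

cleaned = df.apply(clean_date_column)
print(cleaned)
```

The day is taken from dt.day rather than from %d so that the result has no leading zero, which matches the requested output.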

How to convert dataframe string into date time

df['Year,date']
0 Sep 10
1 Sep 16
2 Aug 01
3 Sep 30
4 Sep 28
...
2230 Jul 20
2231 Oct 26
2232 Oct 13
2233 Dec 31
2234 Jul 08
Name: Year,date, Length: 2235, dtype: object
This is my dataframe, and I want to convert each row to datetime in month-and-day format. I have tried some code, but it does not work on my data.
Welcome to Stack Overflow. To convert the column you mentioned from string to datetime, you can use the code below.
Initial data:
import pandas as pd
from datetime import datetime
data = {'date': ['Sep 16', 'Aug 01', 'Sep 30', 'Sep 16']}
df=pd.DataFrame(data)
df.info()
>> # Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 4 non-null object
print(df)
>> date
0 Sep 16
1 Aug 01
2 Sep 30
3 Sep 16
To convert to datetime, assign the parsed result back to the column:
df['date'] = pd.to_datetime(df['date'], format='%b %d')
df.info()
>> # Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 4 non-null datetime64[ns]
dtypes: datetime64[ns](1)
print(df)
>> date
0 1900-09-16
1 1900-08-01
2 1900-09-30
3 1900-09-16
You might have noticed that the year is taken as 1900 as this is the default. So, in case you need it as this year, you would do this...
import pandas as pd
from datetime import datetime
data = {'date': ['Sep 16', 'Aug 01', 'Sep 30', 'Sep 16']}
df=pd.DataFrame(data)
df.date = datetime.now().strftime("%Y") + " " + df.date
df.date = pd.to_datetime(df.date, format='%Y %b %d')
print(df)
>> date
0 2022-09-16
1 2022-08-01
2 2022-09-30
3 2022-09-16
Now that the date is stored in the dataframe as datetime, if you want to display it in "Mon dd" format, you can do this:
print(df.date.dt.strftime("%b %d"))
>> 0 Sep 16
1 Aug 01
2 Sep 30
3 Sep 16
Note that the date in df is still in datetime format.
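A related caveat: when no year is given, the values are parsed into 1900, which is not a leap year, so a value such as 'Feb 29' would likely fail to parse. A small sketch that makes the year an explicit argument is shown below; parse_month_day is a hypothetical helper name and 2024 is just an example year.

```python
import pandas as pd

def parse_month_day(series, year):
    """Parse 'Mon dd' strings as dates in the given year."""
    # Prefix every value with the year, then parse with a matching format.
    return pd.to_datetime(str(year) + ' ' + series, format='%Y %b %d')

s = pd.Series(['Sep 16', 'Feb 29'])
parsed = parse_month_day(s, 2024)  # 2024 is a leap year, so 'Feb 29' is valid
print(parsed)
```

Passing the year in explicitly also keeps the result reproducible, since it no longer depends on the date the code is run.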

convert mixed date with int and string to date

I have a dataframe with a column of dates. The date format is "mixed", with integers and a string, like " 15 January 2000". I would like to have a column with dates like "2000-01-15".
list_dates = ['15 January 2000', '16 January 2000', '17 January 2000']
df_dates = pd.DataFrame(list_dates)
df_dates['expect'] = ['2000-01-15', '2000-01-16', '2000-01-17']
I expect a column like "df_dates['expect']". Thank you for help!
Here's one way:
df_dates['expect'] = pd.to_datetime(df_dates[0])  # the unnamed column is labelled 0
Here you go:
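from datetime import datetime
new_values = []
for d in df_dates[0].values:
    # strip() guards against leading/trailing spaces in the input
    dt = datetime.strptime(d.strip(), '%d %B %Y')
    # strftime zero-pads month and day, e.g. '2000-01-15'
    new_values.append(dt.strftime('%Y-%m-%d'))
df_dates[0] = new_values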
A simple solution is to use the pandas.to_datetime function:
df_dates["expect"] = pd.to_datetime(df_dates["column_name"])
A complete code snippet is shown below:
import pandas as pd
list_dates = ['15 January 2000', '16 January 2000', '17 January 2000']
df_dates = pd.DataFrame(list_dates)
df_dates['expect'] = pd.to_datetime(df_dates[0])
print(df_dates)
Output:
0 expect
0 15 January 2000 2000-01-15
1 16 January 2000 2000-01-16
2 17 January 2000 2000-01-17
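If the expect column should hold strings such as "2000-01-15" rather than datetime values, the parsed column can be formatted with dt.strftime. The sketch below also passes an explicit format and strips surrounding spaces, on the assumption that every value is day, full month name, year.

```python
import pandas as pd

list_dates = [' 15 January 2000', '16 January 2000', '17 January 2000']
df_dates = pd.DataFrame(list_dates)

# Strip stray spaces, parse with an explicit format, then format as text.
parsed = pd.to_datetime(df_dates[0].str.strip(), format='%d %B %Y')
df_dates['expect'] = parsed.dt.strftime('%Y-%m-%d')
print(df_dates)
```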

How to reformat date data in Pandas dataframe

My input dataframe is
df = pd.DataFrame({'Source':['Pre-Nov 2017', 'Pre-Nov 2017', 'Oct 19', '2019-04-01 00:00:00', '2019-06-01 00:00:00', 'Nov 17-Nov 18', 'Nov 17-Nov 18']})
I would need Target column as below
If I use the code below, it does not work: I get the same values as Source in the Target column.
df['Target'] = pd.to_datetime(df['Source'], format= '%b %Y',errors='ignore')
Looks like pandas is considering values like '2019-04-01 00:00:00', '2019-06-01 00:00:00' as NaN
One idea is to use errors='coerce', so that values that cannot be parsed as datetimes become NaT, and then convert to custom strings with Series.dt.strftime. After strftime the missing values are strings too (this answer assumes they come out as 'NaT'), so Series.mask is used to put the original values back:
df['Target'] = (pd.to_datetime(df['Source'], errors='coerce')
                  .dt.strftime('%b %y')
                  .mask(lambda x: x == 'NaT', df['Source']))
print (df)
Source Target
0 Pre-Nov 2017 Pre-Nov 2017
1 Pre-Nov 2017 Pre-Nov 2017
2 Oct 19 Oct 19
3 2019-04-01 00:00:00 Apr 19
4 2019-06-01 00:00:00 Jun 19
5 Nov 17-Nov 18 Nov 17-Nov 18
6 Nov 17-Nov 18 Nov 17-Nov 18
An alternative is to use numpy.where:
import numpy as np
d = pd.to_datetime(df['Source'], errors='coerce')
df['Target'] = np.where(d.isna(), df['Source'], d.dt.strftime('%b %y'))
EDIT:
but why did this not work?
df['Target'] = pd.to_datetime(df['Source'], format= '%b %Y',errors='ignore')
According to the to_datetime documentation, errors='ignore' returns the original column values when the conversion fails:
If 'ignore', then invalid parsing will return the input
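A variant of the same idea is sketched below. It differs from the answer above in two assumed choices: it passes an explicit format, so only the full timestamps are converted and nothing is guessed for values like 'Oct 19', and it restores the untouched values with fillna instead of comparing against the string 'NaT'.

```python
import pandas as pd

df = pd.DataFrame({'Source': ['Pre-Nov 2017', 'Oct 19',
                              '2019-04-01 00:00:00', 'Nov 17-Nov 18']})

# Only values matching the explicit format are parsed; the rest become NaT.
parsed = pd.to_datetime(df['Source'], format='%Y-%m-%d %H:%M:%S',
                        errors='coerce')

# Format the parsed values and fall back to the original text elsewhere.
df['Target'] = parsed.dt.strftime('%b %y').fillna(df['Source'])
print(df)
```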

Return dataframe with range of dates

I need a Python function to return a Pandas DataFrame with range of dates, only year and month, for example, from November 2016 to March 2017 and have this as result:
year month
2016 11
2016 12
2017 01
2017 02
2017 03
My dates are in string format Y-m (from = '2016-11', to = '2017-03'). I'm not sure on turning them to datetime type or to separate them into two different integer values.
Any ideas on how to achieve it properly?
Are you looking at something like this?
pd.date_range('November 2016', 'April 2017', freq = 'M')
You get
DatetimeIndex(['2016-11-30', '2016-12-31', '2017-01-31', '2017-02-28',
'2017-03-31'],
dtype='datetime64[ns]', freq='M')
To get a DataFrame:
index = pd.date_range('November 2016', 'April 2017', freq = 'M')
df = pd.DataFrame(index = index)
pd.Series(pd.date_range('2016-11', '2017-4', freq='M').strftime('%Y-%m')) \
.str.split('-', expand=True) \
.rename(columns={0: 'year', 1: 'month'})
year month
0 2016 11
1 2016 12
2 2017 01
3 2017 02
4 2017 03
You can use a combination of pd.to_datetime and pd.date_range.
import pandas as pd
start = 'November 2016'
end = 'March 2017'
s = pd.Series(pd.date_range(*(pd.to_datetime([start, end]) \
+ pd.offsets.MonthEnd()), freq='1M'))
Construct a dataframe using the .dt accessor attributes.
df = pd.DataFrame({'year' : s.dt.year, 'month' : s.dt.month})
df
month year
0 11 2016
1 12 2016
2 1 2017
3 2 2017
4 3 2017
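Since the question starts from 'Y-m' strings, another option is pd.period_range, which works in whole months and so needs no month-end arithmetic. The function below is a sketch; year_month_range is a hypothetical name, and it returns year and month as integers (use an f-string such as f"{m:02d}" if zero-padded text is needed).

```python
import pandas as pd

def year_month_range(start, end):
    """Return a DataFrame with one row per month from start to end inclusive.

    start and end are 'YYYY-MM' strings, e.g. '2016-11'.
    """
    periods = pd.period_range(start=start, end=end, freq='M')
    return pd.DataFrame({'year': periods.year, 'month': periods.month})

result = year_month_range('2016-11', '2017-03')
print(result)
```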
