How to convert dataframe string into date time - python

df['Year,date']
0 Sep 10
1 Sep 16
2 Aug 01
3 Sep 30
4 Sep 28
...
2230 Jul 20
2231 Oct 26
2232 Oct 13
2233 Dec 31
2234 Jul 08
Name: Year,date, Length: 2235, dtype: object
This is my dataframe column, and I want to convert each row into a datetime in month-and-day format. I have tried some code, but it does not work on my data.

Welcome to Stack Overflow. To convert the column you mentioned from string to datetime, you can use the code below.
Initial data:
import pandas as pd
data = {'date': ['Sep 16', 'Aug 01', 'Sep 30', 'Sep 16']}
df = pd.DataFrame(data)
df.info()
>> # Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 4 non-null object
print(df)
>> date
0 Sep 16
1 Aug 01
2 Sep 30
3 Sep 16
To convert to datetime, assign the parsed result back to the column:
df['date'] = pd.to_datetime(df['date'], format='%b %d')
df.info()
>> # Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 4 non-null datetime64[ns]
dtypes: datetime64[ns](1)
print(df)
>> date
0 1900-09-16
1 1900-08-01
2 1900-09-30
3 1900-09-16
You might have noticed that the year is 1900. That is the default when the format contains no year. If you need the current year instead, prepend it to the strings before parsing:
import pandas as pd
from datetime import datetime
data = {'date': ['Sep 16', 'Aug 01', 'Sep 30', 'Sep 16']}
df = pd.DataFrame(data)
df.date = datetime.now().strftime("%Y") + " " + df.date
df.date = pd.to_datetime(df.date, format='%Y %b %d')
print(df)
>> date
0 2022-09-16
1 2022-08-01
2 2022-09-30
3 2022-09-16
Now that the date is stored in the dataframe as a datetime, you can display it in the "Mon dd" format like this:
print(df.date.dt.strftime("%b %d"))
>> 0 Sep 16
1 Aug 01
2 Sep 30
3 Sep 16
Note that the date in df is still in datetime format.
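If the dates belong to a known year rather than the current one, the same idea works with an explicit year. Here is a minimal sketch, assuming the year 2024 (my assumption, it is not stated in the question); the column names parsed, month and day are only for illustration. Note that a leap day such as 'Feb 29' can only be parsed when it is paired with a leap year, so it cannot be parsed against the 1900 default.

```python
import pandas as pd

df = pd.DataFrame({"date": ["Sep 16", "Aug 01", "Feb 29"]})

# Assumed year for illustration; replace with the year your data refers to.
year = 2024

# Prepend the year so the parser does not fall back to 1900.
df["parsed"] = pd.to_datetime(str(year) + " " + df["date"], format="%Y %b %d")

# Month and day are then available as plain integers.
df["month"] = df["parsed"].dt.month
df["day"] = df["parsed"].dt.day
print(df)
```

The original string column is left untouched, so you can compare it with the parsed values.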

Related

cleaning date columns in python

Kindly assist me in cleaning my date types in python.
My sample data is as follows:
INITIATION DATE   DATE CUT       DATE GIVEN
1/July/2022       21 July 2022   11-July-2022
17-July-2022      16/July/2022   21/July/2022
16-July-2022      01-July-2022   09/July/2022
19-July-2022      31 July 2022   27 July 2022
How do I remove all dashes/slashes/hyphens from dates in the different columns? I have 8 columns and 300 rows.
What I tried:
df[['INITIATION DATE', 'DATE CUT', 'DATE GIVEN']]= df[['INITIATION DATE', 'DATE CUT', 'DATE GIVEN']].apply(pd.to_datetime, format = '%d%b%Y')
Desired output format for all: 1 July 2022
ValueError I'm getting:
time data '18 July 2022' does not match format '%d-%b-%Y' (match)
To remove all dashes and slashes from the strings, you can use the str.replace method:
df.apply(lambda x: x.str.replace('[/-]',' ',regex=True))
Output:
INITIATION DATE DATE CUT DATE GIVEN
0 1 July 2022 21 July 2022 11 July 2022
1 17 July 2022 16 July 2022 21 July 2022
2 16 July 2022 01 July 2022 09 July 2022
3 19 July 2022 31 July 2022 27 July 2022
If you also need to convert the strings to datetime, try this:
df.apply(lambda x: pd.to_datetime(x.str.replace('[/-]',' ',regex=True)))
Output:
INITIATION DATE DATE CUT DATE GIVEN
0 2022-07-01 2022-07-21 2022-07-11
1 2022-07-17 2022-07-16 2022-07-21
2 2022-07-16 2022-07-01 2022-07-09
3 2022-07-19 2022-07-31 2022-07-27
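You can use pd.to_datetime to convert strings to datetime objects. The function takes a format argument that specifies the layout of the date string, using the usual strftime format codes. The code below assumes each column uses a single separator throughout; if separators are mixed within a column, as in the sample data, normalize them first.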
df['INITIATION DATE'] = pd.to_datetime(df['INITIATION DATE'], format='%d-%B-%Y').dt.strftime('%d %B %Y')
df['DATE CUT'] = pd.to_datetime(df['DATE CUT'], format='%d %B %Y').dt.strftime('%d %B %Y')
df['DATE GIVEN'] = pd.to_datetime(df['DATE GIVEN'], format='%d/%B/%Y').dt.strftime('%d %B %Y')
Output:
INITIATION DATE DATE CUT DATE GIVEN
0 01 July 2022 21 July 2022 11 July 2022
1 17 July 2022 16 July 2022 21 July 2022
2 16 July 2022 01 July 2022 09 July 2022
3 19 July 2022 31 July 2022 27 July 2022
You get that error because your datetime strings (e.g. '18 July 2022') do not match your format specifiers ('%d-%b-%Y') because of the extra hyphens in the format specifier.
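Putting the two answers above together, here is a hedged sketch that normalizes the separators, parses with one explicit format, and writes the dates back as "1 July 2022" (no leading zero on the day). The helper name clean_dates is my own, and the sample frame only contains a few of the rows from the question. It assumes English month names and that every value is day, month name, year.

```python
import pandas as pd

df = pd.DataFrame({
    "INITIATION DATE": ["1/July/2022", "17-July-2022"],
    "DATE CUT": ["21 July 2022", "16/July/2022"],
    "DATE GIVEN": ["11-July-2022", "21/July/2022"],
})
date_cols = ["INITIATION DATE", "DATE CUT", "DATE GIVEN"]


def clean_dates(col):
    """Hypothetical helper: normalize separators, parse, and reformat one column."""
    # Replace slashes and hyphens with spaces so every value has the same layout.
    normalized = col.str.replace(r"[/-]", " ", regex=True)
    parsed = pd.to_datetime(normalized, format="%d %B %Y")
    # Build the day from the integer value to avoid a leading zero.
    return parsed.dt.day.astype(str) + " " + parsed.dt.strftime("%B %Y")


df[date_cols] = df[date_cols].apply(clean_dates)
print(df)
```

The result is a string column again. If you need to sort or filter by date, keep the parsed datetime column and only format it for display.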

Convert date strings with Italian month names to %Y-%m-%d

I would like to convert the dates in one column (Before) into date format in another column (After):
Before After
23 Ottobre 2020 2020-10-23
24 Ottobre 2020 2020-10-24
27 Ottobre 2020 2020-10-27
30 Ottobre 2020 2020-10-30
22 Luglio 2020 2020-07-22
I tried as follows:
from datetime import datetime
date = df.Before.tolist()
dtObject = datetime.strptime(date,"%d %m, %y")
dtConverted = dtObject.strftime("%y-%m-%d")
But it does not work.
Can you explain how to do it?
Similar to this question, you can set the locale to Italian before parsing:
import pandas as pd
import locale
locale.setlocale(locale.LC_ALL, 'it_IT')
df = pd.DataFrame({'Before': ['30 Ottobre 2020', '22 Luglio 2020']})
df['After'] = pd.to_datetime(df['Before'], format='%d %B %Y')
# df
# Before After
# 0 30 Ottobre 2020 2020-10-30
# 1 22 Luglio 2020 2020-07-22
If you want the "After" column as dtype string, use df['After'].dt.strftime('%Y-%m-%d').
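If the Italian locale is not installed on your machine, setlocale will raise an error. A possible workaround that does not depend on the locale is to translate the month names yourself. This is only a sketch: the ITALIAN_MONTHS dictionary is something I wrote for the example, and it assumes every value has the form "day month year" separated by spaces.

```python
import pandas as pd

# Hand-written lookup table (an assumption of this sketch, not part of pandas).
ITALIAN_MONTHS = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4,
    "maggio": 5, "giugno": 6, "luglio": 7, "agosto": 8,
    "settembre": 9, "ottobre": 10, "novembre": 11, "dicembre": 12,
}

df = pd.DataFrame({"Before": ["30 Ottobre 2020", "22 Luglio 2020"]})

# Split "30 Ottobre 2020" into three columns: day, month, year.
parts = df["Before"].str.split(expand=True)
parts.columns = ["day", "month", "year"]

# Translate the month name to its number, ignoring case.
parts["month"] = parts["month"].str.lower().map(ITALIAN_MONTHS)

# pd.to_datetime can assemble dates from year/month/day columns.
df["After"] = pd.to_datetime(parts[["year", "month", "day"]].astype(int))
print(df)
```

A month name that is missing from the dictionary becomes NaN and the final astype(int) fails, so a misspelled value is noticed rather than silently skipped.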

Pandas "to_datetime" not accepting series

I am new to pandas and am trying to convert a column of strings with dates in the format '%d %B' (01 January, 02 January, ...) to datetime objects. The type of the column is <class 'pandas.core.series.Series'>.
If I pass this series to the to_datetime method, like
print(pd.to_datetime(data_file['Date'], format='%d %B', errors="coerce"))
it returns NaT for all the entries, whereas it should return datetime objects.
I checked the documentation and it says that it accepts a Series object.
Any way to fix this?
Edit 1:
Here is the head of the data I am using:
Date Daily Confirmed
0 30 January 1
1 31 January 0
2 01 February 0
3 02 February 1
4 03 February 1
edit 2: here is the information of the data
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 179 entries, 0 to 178
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 179 non-null object
1 Daily Confirmed 179 non-null int64
dtypes: int64(1), object(1)
memory usage: 2.2+ KB
If I understand correctly, you may be facing this issue because there are spaces around the dates in this column. To solve it, use strip before to_datetime. Here's a piece of code that does that:
import pandas as pd
df = pd.DataFrame({
    'Date': ['30 January ', '31 January ', ' 01 February ', '02 February', '03 February'],
    'Daily Confirmed': [1, 0, 0, 1, 1],
})
# Strip surrounding whitespace before parsing
pd.to_datetime(df.Date.str.strip(), format="%d %B")
The output is:
0 1900-01-30
1 1900-01-31
2 1900-02-01
...
import pandas as pd
dic = {"Date": ["30 January", "31 January", "01 February", ] , "Daily Confirmed":[0,1,0]}
df =pd.DataFrame(dic)
df['date1'] = pd.to_datetime(df['Date'].astype(str), format='%d %B')
df
The year defaults to 1900 because the strings in your dataframe do not include a year.
Output:
Date Daily Confirmed date1
0 30 January 0 1900-01-30
1 31 January 1 1900-01-31
2 01 February 0 1900-02-01
If you don't want the year in the date, add the code below:
df['date2']=df['date1'].dt.strftime('%d-%m')
df
Date Daily Confirmed date1 date2
0 30 January 0 1900-01-30 30-01
1 31 January 1 1900-01-31 31-01
2 01 February 0 1900-02-01 01-02
Thanks
You may try this:
from datetime import datetime
df['datetime'] = df['date'].apply(lambda x: datetime.strptime(x, "%d %B"))
apply() lets you run a Python function on each element of a series. Here you may want to include the year in the strings, otherwise the default year (1900) is used.
Good luck
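When every row comes back as NaT, it helps to find out which raw values fail to parse. A small sketch of how that could be checked, using made-up sample values (the "bad value" row is mine, added to show a failure):

```python
import pandas as pd

dates = pd.Series(["30 January ", "31 January", " 01 February", "bad value"])

# Strip surrounding whitespace, then parse; unparseable rows become NaT.
parsed = pd.to_datetime(dates.str.strip(), format="%d %B", errors="coerce")

# Select the original strings that could not be parsed.
failed = dates[parsed.isna()]

# repr() makes hidden leading or trailing spaces visible.
for value in failed:
    print(repr(value))
```

Running the same check without the str.strip() call shows whether whitespace is the cause in your own data.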

Python - Extract year and month from a single column of different year and month arrangements

I would like to create two columns "Year" and "Month" from a Date column that contains different year and month arrangements. Some are YY-Mmm and the others are Mmm-YY.
import pandas as pd
dataSet = {
"Date": ["18-Jan", "18-Jan", "18-Feb", "18-Feb", "Oct-17", "Oct-17"],
"Quantity": [3476, 20, 789, 409, 81, 640],
}
df = pd.DataFrame(dataSet, columns=["Date", "Quantity"])
My attempt is as follows:
Date1 = []
Date2 = []
for dt in df.Date:
    Date1.append(dt.split("-")[0])
    Date2.append(dt.split("-")[1])
Year = []
try:
    for yr in Date1:
        Year.append(int(yr.Date1))
except:
    for yr in Date2:
        Year.append(int(yr.Date2))
You can use the Series.str.extract method to split the date strings up. Since the year can precede or follow the month, we can get a bit creative and capture it in a Year1 or a Year2 column, one for each position. Then use np.where to create a single Year column that pulls from whichever of the two is populated.
For example:
import numpy as np
split_dates = df["Date"].str.extract(r"(?P<Year1>\d+)?-?(?P<Month>\w+)-?(?P<Year2>\d+)?")
split_dates["Year"] = np.where(
    split_dates["Year1"].notna(),
    split_dates["Year1"],
    split_dates["Year2"],
)
split_dates = split_dates[["Year", "Month"]]
With result for split_dates:
Year Month
0 18 Jan
1 18 Jan
2 18 Feb
3 18 Feb
4 17 Oct
5 17 Oct
Then you can merge back with your original dataframe with pd.merge, like so:
pd.merge(df, split_dates, how="inner", left_index=True, right_index=True)
Which yields:
Date Quantity Year Month
0 18-Jan 3476 18 Jan
1 18-Jan 20 18 Jan
2 18-Feb 789 18 Feb
3 18-Feb 409 18 Feb
4 Oct-17 81 17 Oct
5 Oct-17 640 17 Oct
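A slightly stricter variant of the same regex idea is sketched below. It spells out both layouts explicitly and turns the two-digit year into an integer. The group names y1, m1, m2, y2 are my own, and adding 2000 assumes all years fall in the 2000s, which the question does not confirm.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["18-Jan", "Oct-17"], "Quantity": [3476, 81]})

# Either "YY-Mmm" (groups y1, m1) or "Mmm-YY" (groups m2, y2).
pattern = r"^(?:(?P<y1>\d{2})-(?P<m1>[A-Za-z]{3})|(?P<m2>[A-Za-z]{3})-(?P<y2>\d{2}))$"
parts = df["Date"].str.extract(pattern)

# Take whichever group matched for each row.
year = parts["y1"].fillna(parts["y2"])
month = parts["m1"].fillna(parts["m2"])

# Assumption: two-digit years belong to the 2000s.
result = df.assign(Year=2000 + year.astype(int), Month=month)
print(result)
```

Because the new columns share the index of df, assign is enough here and no merge is needed.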
Thank you for your help. I managed to get it working with what I've learned so far, i.e. for loop, if-else and split() and with the help of another expert.
# Split the Date column and store it in an array
dA = []
for dP in df.Date:
    dA.append(dP.split("-"))
# Append month and year to respective lists based on if conditions
Month = []
Year = []
for moYr in dA:
    if len(moYr[0]) == 2:
        Month.append(moYr[1])
        Year.append(moYr[0])
    else:
        Month.append(moYr[0])
        Year.append(moYr[1])
This took me hours!
Try using Python datetime strptime(<date>, "%y-%b") on the date column to convert it to a Python datetime.
from datetime import datetime
def parse_dt(x):
    # Try the YY-Mmm layout first, then fall back to Mmm-YY
    try:
        return datetime.strptime(x, "%y-%b")
    except ValueError:
        return datetime.strptime(x, "%b-%y")
df['timestamp'] = df['Date'].apply(parse_dt)
df
Date Quantity timestamp
0 18-Jan 3476 2018-01-01
1 18-Jan 20 2018-01-01
2 18-Feb 789 2018-02-01
3 18-Feb 409 2018-02-01
4 Oct-17 81 2017-10-01
5 Oct-17 640 2017-10-01
Then you can just use .month and .year attributes, or if you prefer the month as its abbreviated form, use Python datetime.strftime('%b').
df['year'] = df.timestamp.apply(lambda x: x.year)
df['month'] = df.timestamp.apply(lambda x: x.strftime('%b'))
df
Date Quantity timestamp year month
0 18-Jan 3476 2018-01-01 2018 Jan
1 18-Jan 20 2018-01-01 2018 Jan
2 18-Feb 789 2018-02-01 2018 Feb
3 18-Feb 409 2018-02-01 2018 Feb
4 Oct-17 81 2017-10-01 2017 Oct
5 Oct-17 640 2017-10-01 2017 Oct
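The same try-one-format-then-the-other logic can also be written without apply, by parsing the column twice and filling the gaps. A minimal sketch, assuming the two layouts from the question are the only ones present (the variable names yy_first and mon_first are mine):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["18-Jan", "18-Feb", "Oct-17"],
    "Quantity": [3476, 789, 81],
})

# Parse with each layout; rows that do not match become NaT.
yy_first = pd.to_datetime(df["Date"], format="%y-%b", errors="coerce")
mon_first = pd.to_datetime(df["Date"], format="%b-%y", errors="coerce")

# Use the first parse where it worked, otherwise the second.
df["timestamp"] = yy_first.fillna(mon_first)

df["Year"] = df["timestamp"].dt.year
df["Month"] = df["timestamp"].dt.strftime("%b")
print(df)
```

Any row that matches neither layout stays NaT, which makes it easy to spot with df["timestamp"].isna().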

How to reformat date data in Pandas dataframe

My input dataframe is
df = pd.DataFrame({'Source':['Pre-Nov 2017', 'Pre-Nov 2017', 'Oct 19', '2019-04-01 00:00:00', '2019-06-01 00:00:00', 'Nov 17-Nov 18', 'Nov 17-Nov 18']})
I would need Target column as below
If I use the code below, it does not work. I get the same values as Source in the Target column.
df['Target'] = pd.to_datetime(df['Source'], format= '%b %Y',errors='ignore')
Looks like pandas is considering values like '2019-04-01 00:00:00', '2019-06-01 00:00:00' as NaN
One idea is to use errors='coerce' so that values that do not parse as datetimes become NaT, then convert to custom strings with Series.dt.strftime. The missing values come out as the string 'NaT', so use Series.mask to replace them with the original values:
df['Target'] = (pd.to_datetime(df['Source'], errors='coerce')
.dt.strftime('%b %y')
.mask(lambda x: x == 'NaT', df['Source']))
print (df)
Source Target
0 Pre-Nov 2017 Pre-Nov 2017
1 Pre-Nov 2017 Pre-Nov 2017
2 Oct 19 Oct 19
3 2019-04-01 00:00:00 Apr 19
4 2019-06-01 00:00:00 Jun 19
5 Nov 17-Nov 18 Nov 17-Nov 18
6 Nov 17-Nov 18 Nov 17-Nov 18
An alternative is to use numpy.where (this needs import numpy as np):
d = pd.to_datetime(df['Source'], errors='coerce')
df['Target'] = np.where(d.isna(), df['Source'], d.dt.strftime('%b %y'))
EDIT:
but why did this not work?
df['Target'] = pd.to_datetime(df['Source'], format= '%b %Y',errors='ignore')
If you check the to_datetime documentation, errors='ignore' returns the original values of the column when the conversion fails:
If 'ignore', then invalid parsing will return the input
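As far as I know, errors='ignore' has been deprecated in recent pandas releases, so it may be safer not to rely on it. Here is a sketch that gives the same Target column using an explicit format and Series.where. It assumes the only values you want to convert are full timestamps like '2019-04-01 00:00:00'; everything else is kept as it is.

```python
import pandas as pd

df = pd.DataFrame({"Source": [
    "Pre-Nov 2017", "Oct 19", "2019-04-01 00:00:00",
    "2019-06-01 00:00:00", "Nov 17-Nov 18",
]})

# Only strings in this exact layout are parsed; the rest become NaT.
d = pd.to_datetime(df["Source"], format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Keep the formatted value where parsing worked, else the original string.
df["Target"] = d.dt.strftime("%b %y").where(d.notna(), df["Source"])
print(df)
```

Checking d.notna() instead of comparing against the string 'NaT' means the code does not depend on how strftime represents missing values.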
