Pandas date-time conversion - python

I am reading data from a file in which the date is stored as
20 March
Using pandas, I want to convert it to 20/03/2020.
I tried strftime and to_datetime with the errors argument, but I still cannot convert it.
Moreover, when I group by date, the date column is ordered by day number instead of chronologically:
1 January, 1 February, 1 March, then 2 January, 2 February, 2 March
How do I resolve this?

import pandas as pd
def to_datetime_(dt):
    return pd.to_datetime(dt + " 2020")
This returns a pandas Timestamp whose year is always 2020.

If year is always 2020 then use the following code:
df = pd.DataFrame({'date':['20 March','22 March']})
df['date_new'] = pd.to_datetime(df['date'], format='%d %B')
Because the format contains no year, the parsed dates default to the year 1900. Replace it with 2020:
df['date_new'] = df['date_new'].mask(df['date_new'].dt.year == 1900, df['date_new'] + pd.offsets.DateOffset(year = 2020))
print(df)
date date_new
0 20 March 2020-03-20
1 22 March 2020-03-22
Further you can convert the date format as required.
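As a minimal sketch of that last step, assuming the year is always 2020 and the month names are in English: appending the year before parsing avoids the 1900 default, and dt.strftime then produces the 20/03/2020 style asked for in the question. The column names date_new and date_str are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date": ["20 March", "22 March"]})

# Append the assumed year so the parser never falls back to 1900.
parsed = pd.to_datetime(df["date"] + " 2020", format="%d %B %Y")

# Keep the datetime column for sorting and grouping.
df["date_new"] = parsed
# Build a display string only where a dd/mm/YYYY text value is needed.
df["date_str"] = parsed.dt.strftime("%d/%m/%Y")

print(df)
```

Note that date_str is plain text again, so it should be used for display only, not for sorting.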

You can do the following:
import pandas as pd
import datetime
df = pd.DataFrame({
    'dates': ['1 January', '2 January', '10 March', '1 April']
})
df['dates'] = df['dates'].map(lambda x: datetime.datetime.strptime(x, "%d %B").replace(year=2020))
# Output
dates
0 2020-01-01
1 2020-01-02
2 2020-03-10
3 2020-04-01
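The grouping problem from the question (1 January, 1 February, 1 March, ...) comes from grouping on strings. A small sketch, assuming a hypothetical value column and the year 2020, of how grouping on the parsed column gives chronological order:

```python
import pandas as pd

df = pd.DataFrame({
    "dates": ["1 January", "1 February", "1 March", "2 January", "2 February"],
    "value": [1, 2, 3, 4, 5],  # hypothetical data to aggregate
})

# Parse to real datetimes first; the year 2020 is an assumption.
df["dates"] = pd.to_datetime(df["dates"] + " 2020", format="%d %B %Y")

# groupby sorts its keys, and datetime keys sort chronologically.
totals = df.groupby("dates")["value"].sum()
print(totals)
```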

Related

Pandas groupby month output is incorrect [duplicate]

My dataset has dates in the European format (dd/mm/yyyy), and I'm struggling to get pd.to_datetime to parse them correctly: for every date whose day is 12 or lower, the month and day get swapped.
Is there an easy solution to this?
import pandas as pd
import datetime as dt
df = pd.read_csv(loc,dayfirst=True)
df['Date']=pd.to_datetime(df['Date'])
Is there a way to force to_datetime to treat the input as dd/mm/yyyy?
Thanks for the help!
Edit, a sample from my dates:
renewal["Date"].head()
Out[235]:
0 31/03/2018
2 30/04/2018
3 28/02/2018
4 30/04/2018
5 31/03/2018
Name: Earliest renewal date, dtype: object
After running the following:
renewal['Date']=pd.to_datetime(renewal['Date'],dayfirst=True)
I get:
Out[241]:
0 2018-03-31 #Correct
2 2018-04-01 #<-- this number is wrong and should be 01-04 instead
3 2018-02-28 #Correct
Pass an explicit format:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
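A short illustration of why the explicit format helps, using made-up sample values: with format='%d/%m/%Y' the first field is always read as the day, so an ambiguous value such as 01/04/2018 becomes 1 April.

```python
import pandas as pd

raw = pd.Series(["31/03/2018", "01/04/2018", "28/02/2018"])

# With an explicit format there is no guessing about day/month order.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")
print(parsed)
```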
You can control the date construction directly if you define separate columns for 'year', 'month' and 'day', like this:
import pandas as pd
df = pd.DataFrame(
    {'Date': ['01/03/2018', '06/08/2018', '31/03/2018', '30/04/2018']}
)
date_parts = df['Date'].apply(lambda d: pd.Series(int(n) for n in d.split('/')))
date_parts.columns = ['day', 'month', 'year']
df['Date'] = pd.to_datetime(date_parts)
date_parts
# day month year
# 0 1 3 2018
# 1 6 8 2018
# 2 31 3 2018
# 3 30 4 2018
df
# Date
# 0 2018-03-01
# 1 2018-08-06
# 2 2018-03-31
# 3 2018-04-30
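The same idea can probably be written without apply. This sketch swaps in str.split with expand=True, which returns the parts as separate columns in one vectorized call; it assumes every value has exactly three slash-separated numeric fields.

```python
import pandas as pd

df = pd.DataFrame(
    {"Date": ["01/03/2018", "06/08/2018", "31/03/2018", "30/04/2018"]}
)

# Split every string into three columns and convert them to integers.
date_parts = df["Date"].str.split("/", expand=True).astype(int)
date_parts.columns = ["day", "month", "year"]

# to_datetime assembles dates from columns named year, month and day.
df["Date"] = pd.to_datetime(date_parts)
print(df)
```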

convert mixed date with int and string to date

I have a dataframe with a column of dates. The date format is "mixed", combining integers and a string, like " 15 January 2000". I would like to have a column with dates like "2000-01-15".
list_dates = ['15 January 2000', '16 January 2000', '17 January 2000']
df_dates = pd.DataFrame(list_dates)
df_dates['expect'] = ['2000-01-15', '2000-01-16', '2000-01-17']
I expect a column like "df_dates['expect']". Thank you for help!
Here's one way:
df_dates['expect'] = pd.to_datetime(df_dates[0])  # the unnamed column is labelled 0
Here you go:
from datetime import datetime
new_values = []
for d in df_dates[0].values:
    dt = datetime.strptime(d, '%d %B %Y')
    # strftime zero-pads month and day, giving e.g. '2000-01-15'
    new_values.append(dt.strftime('%Y-%m-%d'))
df_dates[0] = new_values
A simple solution is to use the pandas.to_datetime function:
df_dates["expect"] = pd.to_datetime(df_dates["column_name"])
A code snippet is shown below:
import pandas as pd
list_dates = ['15 January 2000', '16 January 2000', '17 January 2000']
df_dates = pd.DataFrame(list_dates)
df_dates['expect'] = pd.to_datetime(df_dates[0])
print(df_dates)
Output:
0 expect
0 15 January 2000 2000-01-15
1 16 January 2000 2000-01-16
2 17 January 2000 2000-01-17
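If the expected column should hold text such as "2000-01-15" rather than datetime values, one possible follow-up is dt.strftime. A minimal sketch, where the column name raw is hypothetical and English month names are assumed:

```python
import pandas as pd

df_dates = pd.DataFrame(
    {"raw": ["15 January 2000", "16 January 2000", "17 January 2000"]}
)

# Parse with an explicit format, then render as YYYY-MM-DD strings.
parsed = pd.to_datetime(df_dates["raw"], format="%d %B %Y")
df_dates["expect"] = parsed.dt.strftime("%Y-%m-%d")
print(df_dates)
```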

Pandas extract week of year and year from date

I ran into this scenario and don't know how to solve it.
I have a data frame to which I am adding "week_of_year" and "year" columns based on the "date" column, which works fine.
import pandas as pd
df = pd.DataFrame({'date': ['2018-12-31', '2019-01-01', '2019-12-31', '2020-01-01']})
df['date'] = pd.to_datetime(df['date'])
df['week_of_year'] = df['date'].apply(lambda x: x.weekofyear)
df['year'] = df['date'].apply(lambda x: x.year)
print(df)
Current Output
date week_of_year year
0 2018-12-31 1 2018
1 2019-01-01 1 2019
2 2019-12-31 1 2019
3 2020-01-01 1 2020
Expected Output
For 2018 and 2019, the last day of the year falls in week 1 of the following year (2019 and 2020 respectively). When the week is 1 but the date still belongs to the previous calendar year, I want the year column to show the following year, as in this expected output:
date week_of_year year
0 2018-12-31 1 2019
1 2019-01-01 1 2019
2 2019-12-31 1 2020
3 2020-01-01 1 2020
Try:
df['date'] = pd.to_datetime(df['date'])
df['week_of_year'] = df['date'].dt.isocalendar().week  # Series.dt.weekofyear was removed in pandas 2.0
df['year']=(df['date']+pd.to_timedelta(6-df['date'].dt.weekday, unit='d')).dt.year
Outputs:
date week_of_year year
0 2018-12-31 1 2019
1 2019-01-01 1 2019
2 2019-12-31 1 2020
3 2020-01-01 1 2020
A few things: generally avoid .apply(..).
For datetime columns you can work with the dates directly through the df[col].dt accessor.
Then, to get the last day of each date's week, add 6 - weekday days to the date, where weekday runs from 0 (Monday) to 6 (Sunday).
TLDR CODE
To get the week number as a series
df['DATE'].dt.isocalendar().week
To set a new column to the week use same function and set series returned to a column:
df['WEEK'] = df['DATE'].dt.isocalendar().week
TLDR EXPLANATION
Use pd.Series.dt.isocalendar().week to get the week number for a given Series.
Note:
column "DATE" must be stored as a datetime column
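For the original question about the year, isocalendar() may already be enough: it returns a frame with year, week and day columns, and its year is the ISO year, which rolls over together with week 1. A sketch using the dates from the question (requires pandas 1.1 or later):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2018-12-31", "2019-01-01", "2019-12-31", "2020-01-01"])
})

# isocalendar() yields the ISO year, ISO week and ISO weekday per row.
iso = df["date"].dt.isocalendar()
df["week_of_year"] = iso["week"]
df["year"] = iso["year"]  # ISO year, not the calendar year
print(df)
```

This matches the expected output only if the ISO week definition (weeks start on Monday, week 1 contains the first Thursday) is the one wanted.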

Convert string to date in python if date string has different format

My data has a date variable with two different date formats:
Date
01 Jan 2019
02 Feb 2019
01-12-2019
23-01-2019
11-04-2019
22-05-2019
I want to convert these strings into dates (YYYY-mm-dd):
Date
2019-01-01
2019-02-02
2019-12-01
2019-01-23
2019-04-11
2019-05-22
I have tried the following, but I am looking for a better approach:
df['Date'] = np.where(df['Date'].str.contains('-'), pd.to_datetime(df['Date'], format='%d-%m-%Y'), pd.to_datetime(df['Date'], format='%d %b %Y'))
Working solution for me
df['Date_1']= np.where(df['Date'].str.contains('-'),df['Date'],np.nan)
df['Date_2']= np.where(df['Date'].str.contains('-'),np.nan,df['Date'])
df['Date_new'] = np.where(df['Date'].str.contains('-'),pd.to_datetime(df['Date_1'], format = '%d-%m-%Y'),pd.to_datetime(df['Date_2'], format = '%d %b %Y'))
Just use the option dayfirst=True (on pandas 2.0 or later, also pass format='mixed', since the column mixes two formats):
pd.to_datetime(df.Date, dayfirst=True)
Out[353]:
0 2019-01-01
1 2019-02-02
2 2019-12-01
3 2019-01-23
4 2019-04-11
5 2019-05-22
Name: Date, dtype: datetime64[ns]
My suggestion:
Define a conversion function as follows:
import datetime as dt
import pandas as pd
def conv_date(x):
    # Try the "01 Jan 2019" style first, then fall back to "01-12-2019".
    try:
        res = pd.to_datetime(dt.datetime.strptime(x, "%d %b %Y"))
    except ValueError:
        res = pd.to_datetime(dt.datetime.strptime(x, "%d-%m-%Y"))
    return res
Now get the new date column as follows:
df['Date_new'] = df['Date'].apply(conv_date)
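A vectorized variant of the same try-one-format-then-the-other idea, offered as a sketch rather than a drop-in replacement: parse the whole column once per format with errors='coerce', so non-matching rows become NaT, then fill the gaps from the second pass. It assumes only these two formats occur.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01 Jan 2019", "02 Feb 2019", "01-12-2019",
             "23-01-2019", "11-04-2019", "22-05-2019"]
})

# Rows that do not match the given format become NaT instead of raising.
named = pd.to_datetime(df["Date"], format="%d %b %Y", errors="coerce")
numeric = pd.to_datetime(df["Date"], format="%d-%m-%Y", errors="coerce")

# Take the first parse where it worked, otherwise the second one.
df["Date_new"] = named.fillna(numeric)
print(df)
```

Any value matching neither format stays NaT, which makes bad rows easy to find with isna().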
You can get the desired result with the apply and to_datetime methods of pandas, as shown below:
import pandas as pd
def change(value):
    return pd.to_datetime(value)
df = pd.DataFrame(data = {'date':['01 jan 2019']})
df['date'] = df['date'].apply(change)
df
I hope it may help you.
This works as expected:
import pandas as pd
a = pd.DataFrame({
    'Date': ['01 Jan 2019',
             '02 Feb 2019',
             '01-12-2019',
             '23-01-2019',
             '11-04-2019',
             '22-05-2019']
})
a['Date'] = a['Date'].apply(lambda date: pd.to_datetime(date, dayfirst=True))
print(a)

Sorting Python data frame according to dates

How can I sort a pandas data frame by dates in the format shown in the image? The output I want is the same data frame, but with January 2013 and its corresponding amount at index 0, February 2013 at index 1, and so on.
import pandas as pd
df = pd.DataFrame({'Amount': ['54241.25', '54008.83', '54008.82'],
                   'Date': ['05/01/2015', '05/01/2017', '06/01/2017']})
df['Date'] = pd.to_datetime(df.Date)
df.sort_values('Date', inplace=True)
You just need to convert your Date column to a datetime, then you can sort the dataframe by that column
import pandas as pd
df = pd.DataFrame({'Date': ['05-2016', '05-2017', '06-2017', '01-2017', '02-2017'],
                   'Amount': [2, 5, 6, 3, 2]})
df['Date'] = pd.to_datetime(df['Date'], format='%m-%Y')
df = df.sort_values('Date').reset_index(drop=True)
Which gives:
Date Amount
0 2016-05-01 2
1 2017-01-01 3
2 2017-02-01 2
3 2017-05-01 5
4 2017-06-01 6
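The image from the question is not available, so the exact date format is unknown. If the values really look like "January 2013", as the question text suggests, a sketch along the same lines would use the format '%B %Y'. The sample values below are made up, and English month names are assumed.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["March 2013", "January 2013", "February 2013"],  # assumed format
    "Amount": [3.0, 1.0, 2.0],                                # made-up amounts
})

# Month-and-year strings parse to the first day of that month.
df["Date"] = pd.to_datetime(df["Date"], format="%B %Y")
df = df.sort_values("Date").reset_index(drop=True)
print(df)
```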
