I am reading data from a file in which dates are stored as text like
20 March
Using pandas, I want to convert this to 20/03/2020.
I tried strftime and to_datetime (including the errors argument), but I still cannot convert it.
Moreover, when I group by date, the date column is ordered by day number rather than chronologically, like:
1 January, 1 February, 1 March, then 2 January, 2 February, 2 March
How do I resolve this?
import pandas as pd
def to_datetime_(dt):
    # Append the year so the string parses as a full date
    return pd.to_datetime(dt + " 2020")
This returns a pandas Timestamp whose year is always 2020.
If year is always 2020 then use the following code:
df = pd.DataFrame({'date':['20 March','22 March']})
df['date_new'] = pd.to_datetime(df['date'], format='%d %B')
The format contains no year, so the parsed dates get the default year 1900. Replace it with 2020:
df['date_new'] = df['date_new'].mask(df['date_new'].dt.year == 1900, df['date_new'] + pd.offsets.DateOffset(year = 2020))
print(df)
       date   date_new
0  20 March 2020-03-20
1  22 March 2020-03-22
Further you can convert the date format as required.
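To get the 20/03/2020 style asked for in the question, one option is to format the parsed column with dt.strftime. The following is a minimal sketch, assuming the year is always 2020 and the month names are in English; the names date_str and df_sorted are only illustrative. Sorting on the datetime column rather than on the text also gives the chronological order the question asks about.

```python
import pandas as pd

df = pd.DataFrame({'date': ['20 March', '1 January', '2 February']})

# Assumption: every row belongs to 2020, so the year is appended before parsing.
parsed = pd.to_datetime(df['date'] + ' 2020', format='%d %B %Y')

df['date_new'] = parsed                          # real datetimes, sort chronologically
df['date_str'] = parsed.dt.strftime('%d/%m/%Y')  # display strings such as 20/03/2020

# Sort (or group) on the datetime column, not on the original text.
df_sorted = df.sort_values('date_new').reset_index(drop=True)
print(df_sorted)
```

Note that date_str holds plain strings again, so keep date_new around for any further sorting or grouping.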
Try this:
import pandas as pd
import datetime
df = pd.DataFrame({
'dates': ['1 January', '2 January', '10 March', '1 April']
})
df['dates'] = df['dates'].map(lambda x: datetime.datetime.strptime(x, "%d %B").replace(year=2020))
# Output
dates
0 2020-01-01
1 2020-01-02
2 2020-03-10
3 2020-04-01
My dataset has dates in the European format (dd/mm/yyyy), and I'm struggling to get them parsed correctly by pd.to_datetime: for every date whose day is 12 or less, the month and day are swapped.
Is there an easy solution to this?
import pandas as pd
import datetime as dt
df = pd.read_csv(loc,dayfirst=True)
df['Date']=pd.to_datetime(df['Date'])
Is there a way to force to_datetime to treat the input as dd/mm/yyyy?
Thanks for the help!
Edit, a sample from my dates:
renewal["Date"].head()
Out[235]:
0 31/03/2018
2 30/04/2018
3 28/02/2018
4 30/04/2018
5 31/03/2018
Name: Earliest renewal date, dtype: object
After running the following:
renewal['Date']=pd.to_datetime(renewal['Date'],dayfirst=True)
I get:
Out[241]:
0 2018-03-31 #Correct
2 2018-04-01 #<-- this one is wrong and should be 01-04 instead
3 2018-02-28 #Correct
Pass an explicit format:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
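As a small illustration of why the explicit format helps, the sketch below parses a few dd/mm/yyyy strings, including one whose day is 12 or less. The sample values and the names parsed, messy and checked are made up for this example. With errors='coerce', values that do not match the format become NaT instead of being silently reinterpreted.

```python
import pandas as pd

dates = pd.Series(['31/03/2018', '01/04/2018', '28/02/2018'])

# With an explicit format, day and month are never swapped.
parsed = pd.to_datetime(dates, format='%d/%m/%Y')
print(parsed.dt.strftime('%Y-%m-%d').tolist())

# Values that do not match the format turn into NaT when errors='coerce' is set.
messy = pd.Series(['31/03/2018', 'not a date'])
checked = pd.to_datetime(messy, format='%d/%m/%Y', errors='coerce')
print(checked.isna().tolist())
```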
You can control the date construction directly if you define separate columns for 'year', 'month' and 'day', like this:
import pandas as pd
df = pd.DataFrame(
{'Date': ['01/03/2018', '06/08/2018', '31/03/2018', '30/04/2018']}
)
date_parts = df['Date'].apply(lambda d: pd.Series(int(n) for n in d.split('/')))
date_parts.columns = ['day', 'month', 'year']
df['Date'] = pd.to_datetime(date_parts)
date_parts
# day month year
# 0 1 3 2018
# 1 6 8 2018
# 2 31 3 2018
# 3 30 4 2018
df
# Date
# 0 2018-03-01
# 1 2018-08-06
# 2 2018-03-31
# 3 2018-04-30
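The same idea can probably be written without apply by using the vectorised string method str.split with expand=True. This is only a sketch of an alternative, assuming every value has exactly three '/'-separated numeric parts:

```python
import pandas as pd

df = pd.DataFrame(
    {'Date': ['01/03/2018', '06/08/2018', '31/03/2018', '30/04/2018']}
)

# Split each string into three columns and convert the pieces to integers.
date_parts = df['Date'].str.split('/', expand=True).astype(int)
date_parts.columns = ['day', 'month', 'year']

# to_datetime assembles dates from a DataFrame with year/month/day columns.
df['Date'] = pd.to_datetime(date_parts)
print(df)
```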
I have a DataFrame with a column of dates. The dates are strings that mix numbers and month names, like "15 January 2000". I would like to have a column with dates like "2000-01-15".
list_dates = ['15 January 2000', '16 January 2000', '17 January 2000']
df_dates = pd.DataFrame(list_dates)
df_dates['expect'] = ['2000-01-15', '2000-01-16', '2000-01-17']
I expect a column like "df_dates['expect']". Thank you for help!
Here's one way:
df_dates['expect'] = pd.to_datetime(df_dates[0])  # the unnamed column created by pd.DataFrame(list_dates) is labelled 0
Here you go:
from datetime import datetime
new_values = []
for d in df_dates[0].values:
    dt = datetime.strptime(d, '%d %B %Y')
    # strftime zero-pads the month and day, giving '2000-01-15'
    new_values.append(dt.strftime('%Y-%m-%d'))
df_dates[0] = new_values
A simple solution is to use the pandas.to_datetime function:
df_dates["expect"] = pd.to_datetime(df_dates["column_name"])
A code snippet is shown below:
import pandas as pd
list_dates = ['15 January 2000', '16 January 2000', '17 January 2000']
df_dates = pd.DataFrame(list_dates)
df_dates['expect'] = pd.to_datetime(df_dates[0])
print(df_dates)
Output:
0 expect
0 15 January 2000 2000-01-15
1 16 January 2000 2000-01-16
2 17 January 2000 2000-01-17
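If the goal is a column of strings that look exactly like "2000-01-15" (rather than a datetime column that merely prints that way), one option is to format the parsed dates with dt.strftime. A minimal sketch, assuming English month names and the explicit format '%d %B %Y':

```python
import pandas as pd

list_dates = ['15 January 2000', '16 January 2000', '17 January 2000']
df_dates = pd.DataFrame(list_dates)

# Parse with an explicit format, then render as zero-padded ISO strings.
parsed = pd.to_datetime(df_dates[0], format='%d %B %Y')
df_dates['expect'] = parsed.dt.strftime('%Y-%m-%d')
print(df_dates)
```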
I ran into this scenario and don't know how to solve it.
I have a DataFrame to which I add "week_of_year" and "year" columns based on its "date" column, and that part works fine.
import pandas as pd
df = pd.DataFrame({'date': ['2018-12-31', '2019-01-01', '2019-12-31', '2020-01-01']})
df['date'] = pd.to_datetime(df['date'])
df['week_of_year'] = df['date'].apply(lambda x: x.weekofyear)
df['year'] = df['date'].apply(lambda x: x.year)
print(df)
Current Output
date week_of_year year
0 2018-12-31 1 2018
1 2019-01-01 1 2019
2 2019-12-31 1 2019
3 2020-01-01 1 2020
Expected Output
What I expect is this: the last day of 2018 and the last day of 2019 each fall in the first week of the following year (2019 and 2020 respectively). So where the week is 1 but the date belongs to the previous year, the year column should show the following year, as in the output below.
date week_of_year year
0 2018-12-31 1 2019
1 2019-01-01 1 2019
2 2019-12-31 1 2020
3 2020-01-01 1 2020
Try:
df['date'] = pd.to_datetime(df['date'])
df['week_of_year'] = df['date'].dt.isocalendar().week  # Series.dt.weekofyear was removed in pandas 2.0
df['year']=(df['date']+pd.to_timedelta(6-df['date'].dt.weekday, unit='d')).dt.year
Outputs:
date week_of_year year
0 2018-12-31 1 2019
1 2019-01-01 1 2019
2 2019-12-31 1 2020
3 2020-01-01 1 2020
A few things: generally avoid .apply(..).
For datetime columns you can work with the dates directly through the df[col].dt accessor.
Then, to get the last day of the week, add 6 - weekday days to the date, where weekday runs from 0 (Monday) to 6 (Sunday).
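Another way to get a year that matches the week number is the ISO calendar year, which dt.isocalendar() returns alongside the week (available in pandas 1.1 and later). The sketch below is an alternative, not the approach of the answer above. The extra 2021-01-01 row is added here only to show a case where the two can differ: that Friday belongs to ISO week 53 of 2020, whereas the Sunday-based calculation would report 2021.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(
    ['2018-12-31', '2019-01-01', '2019-12-31', '2020-01-01', '2021-01-01'])})

# isocalendar() returns a DataFrame with 'year', 'week' and 'day' columns.
iso = df['date'].dt.isocalendar()
df['week_of_year'] = iso['week']
df['year'] = iso['year']   # ISO year, always consistent with the ISO week
print(df)
```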
TLDR CODE
To get the week number as a series
df['DATE'].dt.isocalendar().week
To store the week in a new column, assign the returned Series to it:
df['WEEK'] = df['DATE'].dt.isocalendar().week
TLDR EXPLANATION
Use pd.Series.dt.isocalendar().week to get the week number for a given Series.
Note:
The "DATE" column must be stored as a datetime column.
My data has a date variable with two different date formats:
Date
01 Jan 2019
02 Feb 2019
01-12-2019
23-01-2019
11-04-2019
22-05-2019
I want to convert these strings into dates (YYYY-mm-dd):
Date
2019-01-01
2019-02-02
2019-12-01
2019-01-23
2019-04-11
2019-05-22
I have tried the following, but I am looking for a better approach:
df['Date'] = np.where(df['Date'].str.contains('-'), pd.to_datetime(df['Date'], format='%d-%m-%Y'), pd.to_datetime(df['Date'], format='%d %b %Y'))
Working solution for me
df['Date_1']= np.where(df['Date'].str.contains('-'),df['Date'],np.nan)
df['Date_2']= np.where(df['Date'].str.contains('-'),np.nan,df['Date'])
df['Date_new'] = np.where(df['Date'].str.contains('-'),pd.to_datetime(df['Date_1'], format = '%d-%m-%Y'),pd.to_datetime(df['Date_2'], format = '%d %b %Y'))
Just use the option dayfirst=True
pd.to_datetime(df.Date, dayfirst=True)
Out[353]:
0 2019-01-01
1 2019-02-02
2 2019-12-01
3 2019-01-23
4 2019-04-11
5 2019-05-22
Name: Date, dtype: datetime64[ns]
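If a single call does not cope with both formats (newer pandas versions infer one format from the first value and may raise on the rest), a version-independent alternative is to parse the column once per format with errors='coerce' and combine the results. This is a sketch under the assumption that only these two formats occur; the names first_try, second_try and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['01 Jan 2019', '02 Feb 2019', '01-12-2019',
                            '23-01-2019', '11-04-2019', '22-05-2019']})

# Each pass parses the rows that match its format and leaves NaT elsewhere.
first_try = pd.to_datetime(df['Date'], format='%d %b %Y', errors='coerce')
second_try = pd.to_datetime(df['Date'], format='%d-%m-%Y', errors='coerce')

# Fill the gaps of the first pass with the results of the second.
result = first_try.fillna(second_try)
df['Date_new'] = result
print(df)
```

Rows that match neither format stay NaT, which makes unexpected values easy to spot with df['Date_new'].isna().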
My suggestion:
Define a conversion function as follows:
import datetime as dt
import pandas as pd
def conv_date(x):
    # Try the "01 Jan 2019" form first, then fall back to "01-12-2019"
    try:
        res = pd.to_datetime(dt.datetime.strptime(x, "%d %b %Y"))
    except ValueError:
        res = pd.to_datetime(dt.datetime.strptime(x, "%d-%m-%Y"))
    return res
Now get the new date column as follows:
df['Date_new'] = df['Date'].apply(conv_date)
You can get the desired result with apply and the pandas to_datetime function, as shown below:
import pandas as pd
def change(value):
    return pd.to_datetime(value)
df = pd.DataFrame(data = {'date':['01 jan 2019']})
df['date'] = df['date'].apply(change)
df
I hope this helps.
This simply works as expected:
import pandas as pd
a = pd.DataFrame({
    'Date': ['01 Jan 2019',
             '02 Feb 2019',
             '01-12-2019',
             '23-01-2019',
             '11-04-2019',
             '22-05-2019']
})
a['Date'] = a['Date'].apply(lambda date: pd.to_datetime(date, dayfirst=True))
print(a)
How do I sort a pandas DataFrame by dates in the format shown in the image? The output I want is the same DataFrame, but with January 2013 and its corresponding amount at index 0, February 2013 at index 1, and so on.
import pandas as pd
df = pd.DataFrame({'Amount': ['54241.25', '54008.83', '54008.82'],
                   'Date': ['05/01/2015', '05/01/2017', '06/01/2017']})
df['Date'] = pd.to_datetime(df.Date)
df.sort_values('Date', inplace=True)
You just need to convert your Date column to a datetime; then you can sort the DataFrame by that column:
import pandas as pd
df = pd.DataFrame({'Date': ['05-2016', '05-2017', '06-2017', '01-2017', '02-2017'],
'Amount': [2,5,6,3,2]})
df['Date'] = pd.to_datetime(df['Date'], format='%m-%Y')
df = df.sort_values('Date').reset_index(drop=True)
Which gives:
Date Amount
0 2016-05-01 2
1 2017-01-01 3
2 2017-02-01 2
3 2017-05-01 5
4 2017-06-01 6
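If the original text in the Date column should be kept as it is, sort_values also accepts a key function (pandas 1.1 and later), so the conversion can be applied only for ordering. A minimal sketch using the same sample data; df_sorted is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['05-2016', '05-2017', '06-2017', '01-2017', '02-2017'],
                   'Amount': [2, 5, 6, 3, 2]})

# The key converts the strings to datetimes for comparison only;
# the Date column itself keeps its original 'MM-YYYY' text.
df_sorted = df.sort_values(
    'Date', key=lambda s: pd.to_datetime(s, format='%m-%Y')
).reset_index(drop=True)
print(df_sorted)
```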