I have a DataFrame with a column of dates. The format mixes integers and a month name, like " 15 January 2000". I would like a column with dates like "2000-01-15".
list_dates = ['15 January 2000', '16 January 2000', '17 January 2000']
df_dates = pd.DataFrame(list_dates)
df_dates['expect'] = ['2000-01-15', '2000-01-16', '2000-01-17']
I expect a column like df_dates['expect']. Thank you for your help!
Here's one way:
df_dates['expect'] = pd.to_datetime(df_dates[0])
Here you go:
from datetime import datetime
new_values = []
for d in df_dates[0].values:
    dt = datetime.strptime(d, '%d %B %Y')
    # strftime zero-pads month and day, giving '2000-01-15' rather than '2000-1-15'
    new_values.append(dt.strftime('%Y-%m-%d'))
df_dates[0] = new_values
A simple solution would be to use the pandas.to_datetime function.
You are looking for the function:
df_dates["expect"] = pd.to_datetime(df_dates["column_name"])
A code snippet is shown below:
import pandas as pd
list_dates = ['15 January 2000', '16 January 2000', '17 January 2000']
df_dates = pd.DataFrame(list_dates)
df_dates['expect'] = pd.to_datetime(df_dates[0])
print(df_dates)
Output:
                 0     expect
0  15 January 2000 2000-01-15
1  16 January 2000 2000-01-16
2  17 January 2000 2000-01-17
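Note that pd.to_datetime returns a datetime64 column, which only prints as 2000-01-15. If you need actual strings in that layout, something like the sketch below should work. It assumes the column is named "raw" (a made-up name for this example) and that month names are in English.

```python
import pandas as pd

# "raw" is a hypothetical column name; the question's frame uses the default column 0.
df_dates = pd.DataFrame({"raw": [" 15 January 2000", "16 January 2000", "17 January 2000"]})

# Strip stray whitespace, then parse with an explicit format so pandas does not have to guess.
parsed = pd.to_datetime(df_dates["raw"].str.strip(), format="%d %B %Y")

# Keep both: a real datetime column and a plain-string version.
df_dates["expect"] = parsed
df_dates["expect_str"] = parsed.dt.strftime("%Y-%m-%d")

print(df_dates)
```

Keeping the datetime column around is usually worthwhile, since sorting and date arithmetic work on it directly and the string column is only needed for display or export.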
Related
I need to convert a string which contains date information (e.g., November 3, 2020) into date format (i.e., 11/03/2020).
I wrote
df['Date']=pd.to_datetime(df['Date']).map(lambda x: x.strftime('%m/%d/%y'))
where Date is
November 3, 2020
June 26, 2002
July 02, 2010
and many other dates, but I got the error ValueError: NaTType does not support strftime.
You can use pandas.Series.dt.strftime, which handles the NaT:
import pandas as pd
dates = ['November 3, 2020',
'June 26, 2002',
'July 02, 2010',
'NaT']
dates = pd.to_datetime(dates)
df = pd.DataFrame(dates, columns=['Date'])
# %Y gives the four-digit year asked for in the question (11/03/2020)
df['Date'] = df['Date'].dt.strftime('%m/%d/%Y')
Output:
         Date
0  11/03/2020
1  06/26/2002
2  07/02/2010
3         NaN
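If the NaT values come from entries that fail to parse, a variation on the above is to coerce them explicitly. This is only a sketch: the sample values (including "not a date") are invented for illustration, and it assumes every valid entry follows the "Month day, year" layout.

```python
import pandas as pd

# Invented sample data: two valid dates, one unparseable string, one missing value.
df = pd.DataFrame({"Date": ["November 3, 2020", "June 26, 2002", "not a date", None]})

# errors="coerce" turns anything that does not match the format into NaT instead of raising.
parsed = pd.to_datetime(df["Date"], format="%B %d, %Y", errors="coerce")

# The .dt accessor skips NaT and leaves a missing value in its place.
df["Date"] = parsed.dt.strftime("%m/%d/%Y")

print(df)
```

It may be worth checking parsed.isna() before formatting, so that you know which rows were dropped to NaT and why.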
I would like to convert dates (Before) within a column (After) in date format:
Before            After
23 Ottobre 2020   2020-10-23
24 Ottobre 2020   2020-10-24
27 Ottobre 2020   2020-10-27
30 Ottobre 2020   2020-10-30
22 Luglio 2020    2020-07-22
I tried as follows:
from datetime import datetime
date = df.Before.tolist()
dtObject = datetime.strptime(date,"%d %m, %y")
dtConverted = dtObject.strftime("%y-%m-%d")
But it does not work.
Can you explain how to do it?
Similar to this question, you can set the locale to Italian before parsing:
import pandas as pd
import locale
locale.setlocale(locale.LC_ALL, 'it_IT')
df = pd.DataFrame({'Before': ['30 Ottobre 2020', '22 Luglio 2020']})
df['After'] = pd.to_datetime(df['Before'], format='%d %B %Y')
# df
# Before After
# 0 30 Ottobre 2020 2020-10-30
# 1 22 Luglio 2020 2020-07-22
If you want the "After" column as dtype string, use df['After'].dt.strftime('%Y-%m-%d').
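If the it_IT locale is not installed on your machine, setlocale will raise an error. One possible workaround, sketched here, is to translate the month names yourself before parsing. The IT_MONTHS dictionary is my own helper, not something pandas provides.

```python
import pandas as pd

# Hypothetical lookup table: lowercase Italian month name -> two-digit month number.
IT_MONTHS = {
    "gennaio": "01", "febbraio": "02", "marzo": "03", "aprile": "04",
    "maggio": "05", "giugno": "06", "luglio": "07", "agosto": "08",
    "settembre": "09", "ottobre": "10", "novembre": "11", "dicembre": "12",
}

df = pd.DataFrame({"Before": ["30 Ottobre 2020", "22 Luglio 2020"]})

# Split "30 Ottobre 2020" into day, month name, and year columns (0, 1, 2).
parts = df["Before"].str.split(expand=True)
month_num = parts[1].str.lower().map(IT_MONTHS)

# Rebuild a purely numeric date string and parse it with an explicit format.
numeric = parts[0] + " " + month_num + " " + parts[2]
df["After"] = pd.to_datetime(numeric, format="%d %m %Y")

print(df)
```

This avoids changing the process-wide locale, at the cost of maintaining the mapping by hand.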
I am reading data from a file in which the date is stored as
20 March
Using pandas, I want to convert this to 20/03/2020.
I tried strftime and to_datetime with the errors argument, but I still cannot convert it.
Moreover, when I group by date, the date column is ordered by day number rather than chronologically, like:
1 January, 1 February, 1 March, then 2 January, 2 February, 2 March
How do I resolve this?
import pandas as pd
def to_datetime_(dt):
    return pd.to_datetime(dt + " 2020")
This returns a pandas Timestamp whose year is always 2020.
If the year is always 2020, use the following code:
df = pd.DataFrame({'date':['20 March','22 March']})
df['date_new'] = pd.to_datetime(df['date'], format='%d %B')
Without a year in the format, the parsed dates default to the year 1900, so replace it:
df['date_new'] = df['date_new'].mask(df['date_new'].dt.year == 1900, df['date_new'] + pd.offsets.DateOffset(year=2020))
print(df)
       date   date_new
0  20 March 2020-03-20
1  22 March 2020-03-22
You can then convert the date format as required.
Try this:
import pandas as pd
import datetime
df = pd.DataFrame({
    'dates': ['1 January', '2 January', '10 March', '1 April']
})
df['dates'] = df['dates'].map(lambda x: datetime.datetime.strptime(x, "%d %B").replace(year=2020))
# Output
dates
0 2020-01-01
1 2020-01-02
2 2020-03-10
3 2020-04-01
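On the grouping part of the question: the odd order comes from sorting text, since strings compare character by character. A rough sketch of one way around it follows, assuming every row belongs to 2020. The column names "parsed" and "display" are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"date": ["1 March", "20 March", "2 January", "1 February"]})

# Assumption: every row belongs to 2020, so the year is appended before parsing.
df["parsed"] = pd.to_datetime(df["date"] + " 2020", format="%d %B %Y")

# Sort (or group) on the datetime column, not on the text column.
df = df.sort_values("parsed").reset_index(drop=True)

# Format as dd/mm/yyyy only at the end, for display.
df["display"] = df["parsed"].dt.strftime("%d/%m/%Y")

print(df)
```

Appending the year before parsing also sidesteps the 1900 default, which matters for "29 February" because 1900 was not a leap year.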
My data has a date variable with two different date formats:
Date
01 Jan 2019
02 Feb 2019
01-12-2019
23-01-2019
11-04-2019
22-05-2019
I want to convert these strings into dates (YYYY-mm-dd):
Date
2019-01-01
2019-02-02
2019-12-01
2019-01-23
2019-04-11
2019-05-22
I have tried the following, but I am looking for a better approach:
df['Date'] = np.where(df['Date'].str.contains('-'), pd.to_datetime(df['Date'], format='%d-%m-%Y'), pd.to_datetime(df['Date'], format='%d %b %Y'))
This solution works for me:
df['Date_1'] = np.where(df['Date'].str.contains('-'), df['Date'], np.nan)
df['Date_2'] = np.where(df['Date'].str.contains('-'), np.nan, df['Date'])
df['Date_new'] = np.where(df['Date'].str.contains('-'), pd.to_datetime(df['Date_1'], format='%d-%m-%Y'), pd.to_datetime(df['Date_2'], format='%d %b %Y'))
Just use the option dayfirst=True (on pandas 2.0 or later, a column that mixes formats may also need format='mixed'):
pd.to_datetime(df.Date, dayfirst=True)
Out[353]:
0 2019-01-01
1 2019-02-02
2 2019-12-01
3 2019-01-23
4 2019-04-11
5 2019-05-22
Name: Date, dtype: datetime64[ns]
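Depending on your pandas version, a single to_datetime call on a column with two layouts may raise instead of guessing per element. A version-independent alternative, sketched below, is to parse once per known format and combine the results. It assumes the column only ever contains these two layouts.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["01 Jan 2019", "02 Feb 2019", "01-12-2019",
                            "23-01-2019", "11-04-2019", "22-05-2019"]})

# Try each known format; rows that do not match that format become NaT.
named = pd.to_datetime(df["Date"], format="%d %b %Y", errors="coerce")
dashed = pd.to_datetime(df["Date"], format="%d-%m-%Y", errors="coerce")

# Use the first format where it parsed, otherwise fall back to the second.
df["Date_new"] = named.fillna(dashed)

print(df)
```

Any row still NaT after the fillna matched neither format, which makes bad input easy to spot.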
My suggestion:
Define a conversion function as follows:
import datetime as dt
import pandas as pd
def conv_date(x):
    # Try the "01 Jan 2019" layout first, then fall back to "01-12-2019".
    try:
        res = pd.to_datetime(dt.datetime.strptime(x, "%d %b %Y"))
    except ValueError:
        res = pd.to_datetime(dt.datetime.strptime(x, "%d-%m-%Y"))
    return res
Now get the new date column as follows:
df['Date_new'] = df['Date'].apply(conv_date)
You can get the desired result with the apply and to_datetime methods of pandas, as shown below:
import pandas as pd
def change(value):
    return pd.to_datetime(value)
df = pd.DataFrame(data = {'date':['01 jan 2019']})
df['date'] = df['date'].apply(change)
df
I hope this helps.
This works as expected:
import pandas as pd
a = pd.DataFrame({
    'Date': ['01 Jan 2019',
             '02 Feb 2019',
             '01-12-2019',
             '23-01-2019',
             '11-04-2019',
             '22-05-2019']
})
a['Date'] = a['Date'].apply(lambda date: pd.to_datetime(date, dayfirst=True))
print(a)
I need a Python function that returns a pandas DataFrame with a range of dates, year and month only. For example, from November 2016 to March 2017 it should give this result:
year month
2016 11
2016 12
2017 01
2017 02
2017 03
My dates are strings in Y-m format (from = '2016-11', to = '2017-03'). I'm not sure whether to convert them to datetime or to split them into two separate integer values.
Any ideas on how to achieve it properly?
Are you looking at something like this?
pd.date_range('November 2016', 'April 2017', freq = 'M')
You get
DatetimeIndex(['2016-11-30', '2016-12-31', '2017-01-31', '2017-02-28',
'2017-03-31'],
dtype='datetime64[ns]', freq='M')
To get a DataFrame:
index = pd.date_range('November 2016', 'April 2017', freq = 'M')
df = pd.DataFrame(index = index)
pd.Series(pd.date_range('2016-11', '2017-4', freq='M').strftime('%Y-%m')) \
.str.split('-', expand=True) \
.rename(columns={0: 'year', 1: 'month'})
year month
0 2016 11
1 2016 12
2 2017 01
3 2017 02
4 2017 03
You can use a combination of pd.to_datetime and pd.date_range.
import pandas as pd
start = 'November 2016'
end = 'March 2017'
s = pd.Series(pd.date_range(*(pd.to_datetime([start, end])
                              + pd.offsets.MonthEnd()), freq='1M'))
Construct a dataframe using the .dt accessor attributes.
df = pd.DataFrame({'year' : s.dt.year, 'month' : s.dt.month})
df
   year  month
0  2016     11
1  2016     12
2  2017      1
3  2017      2
4  2017      3
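Since the question asks for a function taking 'YYYY-MM' strings, another option is pd.period_range, which includes both endpoints, so there is no need to pass April to get March. As far as I know, recent pandas releases also deprecate the 'M' alias for date_range in favor of 'ME', while 'M' remains valid for periods. A minimal sketch, where month_range is a name I made up:

```python
import pandas as pd

def month_range(start: str, end: str) -> pd.DataFrame:
    """Return one row per month from start to end, both inclusive ('YYYY-MM' strings)."""
    periods = pd.period_range(start=start, end=end, freq="M")
    # year and month come back as integers; format later if zero-padding is needed.
    return pd.DataFrame({"year": periods.year, "month": periods.month})

df = month_range("2016-11", "2017-03")
print(df)
```

If you need the months zero-padded as in the question ('01', '02'), periods.strftime('%m') should give strings instead of integers.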