I have a column in the following format
Date
June 22
June 23
June 24
June 25
I am trying to convert this column to datetime within a pandas df with the format YYYY-mm-dd.
How can I accomplish this? I was able to parse the month and day, but I'm not sure how to add the current year, since it's not present in my Date column.
df['Date'] = pd.to_datetime(df['Date'], format='%B %d')
Results:
Date
1900-07-22
1900-07-21
1900-07-20
1900-07-19
Desired results:
Date
2021-07-22
2021-07-21
2021-07-20
2021-07-19
Try:
>>> pd.to_datetime(df['Date'].add(' 2021'), format="%B %d %Y")
0 2021-06-22
1 2021-06-23
2 2021-06-24
3 2021-06-25
Name: Date, dtype: datetime64[ns]
As suggested by @HenryEcker, to add the current year instead of hard-coding 2021:
pd.to_datetime(df['Date'].add(f' {pd.Timestamp.now().year}'), format="%B %d %Y")
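For completeness, here is a minimal self-contained sketch of the same idea, assuming the column holds strings like "June 22". The `Date_str` column and the `year` variable are hypothetical names added for illustration. It also shows how to get a YYYY-mm-dd string, if you need text rather than a datetime.

```python
import pandas as pd

# Sample data shaped like the question's column
df = pd.DataFrame({'Date': ['June 22', 'June 23', 'June 24', 'June 25']})

# Append the current year to each string, then parse with an explicit format
year = pd.Timestamp.now().year
df['Date'] = pd.to_datetime(df['Date'] + f' {year}', format='%B %d %Y')

# Optional: a plain-string version in YYYY-mm-dd form (hypothetical column name)
df['Date_str'] = df['Date'].dt.strftime('%Y-%m-%d')

print(df)
```

One caveat: a value such as "February 29" will fail to parse when the year you append is not a leap year, so you may want errors='coerce' if your data could contain it.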
I have a Pandas dataframe df that looks as follows:
df = pd.DataFrame({'timestamp' : ['Wednesday, Apr 4/04/22 at 17:02',
'Saturday, Apr 4/23/22 at 15:45'],
'foo' : [1, 2]
})
df
timestamp foo
0 Wednesday, Apr 4/04/22 at 17:02 1
1 Saturday, Apr 4/23/22 at 15:45 2
I'm trying to convert the timestamp column to a datetime object so that I can add a day_of_week column.
My attempt:
df['timestamp'] = pd.to_datetime(df['timestamp'],
format='%A, %b %-m/%-d/%y at %H:%M')
df['day_of_week'] = df['timestamp'].dt.day_name()
The error is:
ValueError: '-' is a bad directive in format '%A, %b %-m/%-d/%y at %H:%M'
Any assistance would be greatly appreciated. Thanks!
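Just use the format without the -. The %-m and %-d forms are platform-specific formatting extensions and are not valid for parsing; %m and %d already accept values without a leading zero: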
df['timestamp'] = pd.to_datetime(df['timestamp'],
format='%A, %b %m/%d/%y at %H:%M')
df['day_of_week'] = df['timestamp'].dt.day_name()
NB. to_datetime is quite lenient with the data it is given: note how the incorrect day of the week in the first row (April 4, 2022 was a Monday, not a Wednesday) was simply ignored.
output:
timestamp foo day_of_week
0 2022-04-04 17:02:00 1 Monday
1 2022-04-23 15:45:00 2 Saturday
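Because the weekday text is ignored during parsing, it may be worth flagging rows where the stated weekday disagrees with the parsed date. This is only a sketch. The `stated` variable and the `weekday_matches` column are hypothetical names, and it assumes the weekday is always the text before the first comma.

```python
import pandas as pd

df = pd.DataFrame({'timestamp': ['Wednesday, Apr 4/04/22 at 17:02',
                                 'Saturday, Apr 4/23/22 at 15:45'],
                   'foo': [1, 2]})

# Keep the weekday as written in the raw string (text before the first comma)
stated = df['timestamp'].str.split(',').str[0].str.strip()

# Parse with the same format as above; the weekday text does not affect the result
df['timestamp'] = pd.to_datetime(df['timestamp'],
                                 format='%A, %b %m/%d/%y at %H:%M')
df['day_of_week'] = df['timestamp'].dt.day_name()

# True where the written weekday agrees with the actual date
df['weekday_matches'] = stated.eq(df['day_of_week'])
print(df)
```

With the sample data, the first row is flagged as a mismatch and the second row agrees.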
I'm trying to create a new column in this DataFrame with the name of the day of the week, but I can't. Does anyone have any tips?
from datetime import date
days = [
'Segunda-feira',
'Terça-feira',
'Quarta-feira',
'Quinta-feira',
'Sexta-feira',
'Sábado',
'Domingo'
]
import pandas as pd
vacinacao = pd.read_excel("vacinacao_br.xlsx")
vacinacao.head()
UF Data Vacinacao quantidade
0 AC 2021-01-18 00:00:00 1
1 AC 2021-01-19 00:00:00 46
2 AC 2021-01-20 00:00:00 1021
3 AC 2021-01-21 00:00:00 1609
4 AC 2021-01-22 00:00:00 1105
vacinacao['dia_semana'] = vacinacao['Data Vacinacao'].days.weekday.()
AttributeError: 'Series' object has no attribute 'days'
Assuming the dtype of your "Data Vacinacao" column is some sort of datetime, you can use dt.day_name.
vacinacao['dia_semana'] = vacinacao['Data Vacinacao'].dt.day_name()
NB: If your column holds strings (dtype object), you'll have to convert your dates to datetime objects first:
vacinacao['Data Vacinacao'] = pd.to_datetime(vacinacao['Data Vacinacao'], format="%Y-%m-%d")
You want dt and not days
Try:
#for day number (0, 1, etc.)
vacinacao['dia_semana'] = vacinacao['Data Vacinacao'].dt.weekday
#for day full name (Monday, Tuesday, etc.)
vacinacao['dia_semana'] = vacinacao['Data Vacinacao'].dt.strftime("%A")
#for day short name (Mon, Tue, etc.)
vacinacao['dia_semana'] = vacinacao['Data Vacinacao'].dt.strftime("%a")
Look up the name in your days list using the integer returned by .weekday(). Note that "AC" belongs to the UF column, not to the date, so it should not appear in the format:
vacinacao['Data Vacinacao'] = pd.to_datetime(vacinacao['Data Vacinacao'],
format="%Y-%m-%d %H:%M:%S")
vacinacao['dia_semana'] = [days[i.weekday()] for i in vacinacao['Data Vacinacao']]
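If you prefer to avoid the Python-level loop, the same lookup can be sketched with dt.weekday and map. The small frame below is an assumption that stands in for the Excel file, which I don't have. Only the column names and the first five rows come from the question.

```python
import pandas as pd

days = ['Segunda-feira', 'Terça-feira', 'Quarta-feira', 'Quinta-feira',
        'Sexta-feira', 'Sábado', 'Domingo']

# Stand-in for pd.read_excel("vacinacao_br.xlsx"), using the rows shown in the question
vacinacao = pd.DataFrame({
    'UF': ['AC'] * 5,
    'Data Vacinacao': pd.to_datetime(['2021-01-18', '2021-01-19', '2021-01-20',
                                      '2021-01-21', '2021-01-22']),
    'quantidade': [1, 46, 1021, 1609, 1105],
})

# dt.weekday gives 0 for Monday through 6 for Sunday; map each integer to its name
vacinacao['dia_semana'] = vacinacao['Data Vacinacao'].dt.weekday.map(dict(enumerate(days)))
print(vacinacao)
```

This keeps the Portuguese names independent of the system locale, since they come from your own list.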
I have a column in a dataframe that I want to convert to a date. The values of the column are either DDMONYYYY or DD Month YYYY 00:00:00.000 GMT. For example, one row in the dataframe could have the value 31DEC2002 and the next row could have 31 December 2015 00:00:00.000 GMT. I think this is why I get an error when trying to convert the column to a date using pd.to_datetime or datetime.strptime.
Anyone got any ideas? I'd be very grateful for any help/pointers.
For me, to_datetime works with utc=True, which converts all values to UTC, and errors='coerce', which converts unparseable values to NaT (missing datetime):
df = pd.DataFrame({'date':['31DEC2002','31 December 2015 00:00:00.000 GMT','.']})
df['date'] = pd.to_datetime(df['date'], utc=True, errors='coerce')
print(df)
date
0 2002-12-31 00:00:00+00:00
1 2015-12-31 00:00:00+00:00
2 NaT
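As far as I know, newer pandas releases (2.0 and later) are stricter about mixed formats in a single column, so the one-liner above may coerce more values to NaT than you expect there. A version-independent alternative is to parse each known format explicitly and combine the results. This is a sketch under the assumption that only these two formats occur. The names `short_form`, `long_form` and `parsed` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'date': ['31DEC2002', '31 December 2015 00:00:00.000 GMT', '.']})

# Pass 1: values like 31DEC2002 (month abbreviations are matched case-insensitively)
short_form = pd.to_datetime(df['date'], format='%d%b%Y',
                            errors='coerce', utc=True)

# Pass 2: values like "31 December 2015 00:00:00.000 GMT" ("GMT" is matched as literal text)
long_form = pd.to_datetime(df['date'], format='%d %B %Y %H:%M:%S.%f GMT',
                           errors='coerce', utc=True)

# Take the first pass where it succeeded, otherwise the second; the rest stays NaT
df['parsed'] = short_form.fillna(long_form)
print(df)
```

Anything that matches neither format, such as the lone ".", ends up as NaT, which makes it easy to inspect the leftovers with df[df['parsed'].isna()].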
I have a CSV with some data that looks like this:
I have many of these files, and I want to read them into DataFrame:
df = pd.read_csv(filepath, engine='c')
df['closingDate'] = pd.to_datetime(df['closingDate'], format='%dd-%mmm-%yy')
df['Fut Expiration Date'] = pd.to_datetime(df['Fut Expiration Date'], format='%d-%m-%yy')
I've tried a multitude of formats, but none seem to work. Is there an alternative?
Actually, you do not need to specify the format here, because the format is unambiguous. If we convert the column without specifying a format, we get:
>>> df
Date
0 1-Dec-99
1 1-Jul-99
2 1-Jun-99
3 1-Nov-99
4 1-Oct-99
5 1-Sep-99
6 2-Aug-99
7 2-Dec-99
>>> pd.to_datetime(df['Date'])
0 1999-12-01
1 1999-07-01
2 1999-06-01
3 1999-11-01
4 1999-10-01
5 1999-09-01
6 1999-08-02
7 1999-12-02
Name: Date, dtype: datetime64[ns]
Alternatively, we can look up the format in the documentation of the datetime module [Python-doc]. There we see that:
Directive  Meaning                                                Example
%d         Day of the month as a zero-padded decimal number.      01, 02, …, 31
%b         Month as locale’s abbreviated name.                    Jan, Feb, …, Dec (en_US); Jan, Feb, …, Dez (de_DE)
%y         Year without century as a zero-padded decimal number.  00, 01, …, 99
So we can specify the format as:
>>> pd.to_datetime(df['Date'], format='%d-%b-%y')
0 1999-12-01
1 1999-07-01
2 1999-06-01
3 1999-11-01
4 1999-10-01
5 1999-09-01
6 1999-08-02
7 1999-12-02
Name: Date, dtype: datetime64[ns]
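One thing to keep in mind with %y: a two-digit year has to be mapped to a century. To my knowledge, the parser follows the POSIX convention, where 69 to 99 become 1969 to 1999 and 00 to 68 become 2000 to 2068. The sketch below illustrates this with made-up sample values. Only the '1-Dec-99' style comes from the data above; '15-Mar-05' is an assumption added for illustration.

```python
import pandas as pd

# '15-Mar-05' is a made-up value to show the century pivot for two-digit years
dates = pd.Series(['1-Dec-99', '2-Aug-99', '15-Mar-05'], name='Date')

# %d accepts days with or without a leading zero when parsing
parsed = pd.to_datetime(dates, format='%d-%b-%y')
print(parsed)
```

If your expiration dates can fall outside that window, the two-digit year is ambiguous and you would need to fix the century yourself after parsing.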
Check out the directives for datetimes here. The following should work, using 3 letter months and 2 digit years:
df['Fut Expiration Date'] = pd.to_datetime(df['Fut Expiration Date'], format='%d-%b-%y')
Use %b for a three letter month. Please see the Python strftime reference: http://strftime.org/
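I think you want: %d for the day, %b for the month, and %y for the two-digit year.
When parsing, %d accepts days with or without zero padding, so there is no need for a different directive if the days aren't zero padded (%w is the weekday number, not the day of the month).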