I have a date column in my CSV file.
This is the data in my Date column:
14/3/18
28/3/18
9/4/2018
How can I make every year parse as 2018?
I have tried this:
df['DateTime'] = pd.to_datetime(df['Date'])
print (df['DateTime'])
but it returns
1 2018-03-14
2 2018-03-28
3 2018-09-04
In the last row, 09 became the month, but 04 is supposed to be the month.
Add the parameter dayfirst=True:
df['DateTime'] = pd.to_datetime(df['Date'], dayfirst=True)
print (df)
Date DateTime
0 14/3/18 2018-03-14
1 28/3/18 2018-03-28
2 9/4/2018 2018-04-09
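The column in the question mixes two-digit and four-digit years, and depending on the pandas version a single to_datetime call may not accept mixed formats in one column. A minimal sketch that avoids format inference altogether, assuming every value is day/month/year and that a two-digit year means 20xx (both are assumptions about the data, not something the question states):

```python
import pandas as pd

df = pd.DataFrame({"Date": ["14/3/18", "28/3/18", "9/4/2018"]})

# Split "day/month/year" into three integer columns.
parts = df["Date"].str.split("/", expand=True).astype(int)
parts.columns = ["day", "month", "year"]

# Assumption: a two-digit year such as 18 means 2018.
parts.loc[parts["year"] < 100, "year"] += 2000

# to_datetime can assemble dates from year/month/day columns.
df["DateTime"] = pd.to_datetime(parts[["year", "month", "day"]])
print(df)
```

Because the day and month are assigned by position, there is no ambiguity left for the parser to resolve.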
You can use .dt.strftime to format the output as year-day-month (note that this returns strings, not datetimes):
df['DateTime'] = pd.to_datetime(df['DateTime']).dt.strftime("%Y-%d-%m")
Output:
0 2018-14-03
1 2018-28-03
2 2018-04-09
Name: A, dtype: object
My dataset has dates in the European format, and I'm struggling to get pd.to_datetime to parse them correctly: whenever the day is 12 or less, the month and day are switched.
Is there an easy solution to this?
import pandas as pd
import datetime as dt
df = pd.read_csv(loc,dayfirst=True)
df['Date']=pd.to_datetime(df['Date'])
Is there a way to force to_datetime to acknowledge that the input is formatted as dd/mm/yy?
Thanks for the help!
Edit, a sample from my dates:
renewal["Date"].head()
Out[235]:
0 31/03/2018
2 30/04/2018
3 28/02/2018
4 30/04/2018
5 31/03/2018
Name: Earliest renewal date, dtype: object
After running the following:
renewal['Date']=pd.to_datetime(renewal['Date'],dayfirst=True)
I get:
Out[241]:
0 2018-03-31 #Correct
2 2018-04-01 #<-- this number is wrong and should be 01-04 instead
3 2018-02-28 #Correct
Add the format parameter:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
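A short sketch of what the explicit format buys you, using made-up sample values: every string is read strictly as day/month/year, and a value that cannot fit that layout raises an error instead of being silently reinterpreted.

```python
import pandas as pd

# Illustrative values; the second one is ambiguous without a format.
dates = pd.Series(["31/03/2018", "01/04/2018", "28/02/2018"])
parsed = pd.to_datetime(dates, format="%d/%m/%Y")
print(parsed)

# A month/day/year string does not match the format, so parsing fails loudly.
try:
    pd.to_datetime(pd.Series(["04/30/2018"]), format="%d/%m/%Y")
    strict = False
except ValueError:
    strict = True
print(strict)
```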
You can control the date construction directly if you define separate columns for 'year', 'month' and 'day', like this:
import pandas as pd
df = pd.DataFrame(
{'Date': ['01/03/2018', '06/08/2018', '31/03/2018', '30/04/2018']}
)
date_parts = df['Date'].apply(lambda d: pd.Series(int(n) for n in d.split('/')))
date_parts.columns = ['day', 'month', 'year']
df['Date'] = pd.to_datetime(date_parts)
date_parts
# day month year
# 0 1 3 2018
# 1 6 8 2018
# 2 31 3 2018
# 3 30 4 2018
df
# Date
# 0 2018-03-01
# 1 2018-08-06
# 2 2018-03-31
# 3 2018-04-30
I ran into this scenario and don't know how to solve it.
I have a data frame where I add "week_of_year" and "year" columns based on the "date" column, which works fine.
import pandas as pd
df = pd.DataFrame({'date': ['2018-12-31', '2019-01-01', '2019-12-31', '2020-01-01']})
df['date'] = pd.to_datetime(df['date'])
df['week_of_year'] = df['date'].apply(lambda x: x.weekofyear)
df['year'] = df['date'].apply(lambda x: x.year)
print(df)
Current Output
date week_of_year year
0 2018-12-31 1 2018
1 2019-01-01 1 2019
2 2019-12-31 1 2019
3 2020-01-01 1 2020
Expected Output
For 2018 and 2019, the last day of the year falls in the first week of the following year (2019 and 2020 respectively). So when the week is 1 but the date belongs to the previous year, I want the year column to show the following year, as in the expected output below.
date week_of_year year
0 2018-12-31 1 2019
1 2019-01-01 1 2019
2 2019-12-31 1 2020
3 2020-01-01 1 2020
Try:
df['date'] = pd.to_datetime(df['date'])
df['week_of_year'] = df['date'].dt.isocalendar().week  # Series.dt.weekofyear was removed in pandas 2.0
df['year'] = (df['date'] + pd.to_timedelta(6 - df['date'].dt.weekday, unit='d')).dt.year
Outputs:
date week_of_year year
0 2018-12-31 1 2019
1 2019-01-01 1 2019
2 2019-12-31 1 2020
3 2020-01-01 1 2020
A few things: generally avoid .apply(...).
For datetime columns you can work with the date directly through the df[col].dt accessor.
Then, to get the last day of the week, add 6 - weekday days to the date, where weekday runs from 0 (Monday) to 6 (Sunday).
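The one-liner is easier to follow when split into steps. This sketch uses the dates from the question; the intermediate names (days_to_sunday, week_end) are only illustrative.

```python
import pandas as pd

dates = pd.to_datetime(
    pd.Series(["2018-12-31", "2019-01-01", "2019-12-31", "2020-01-01"])
)

# weekday is 0 for Monday through 6 for Sunday.
days_to_sunday = 6 - dates.dt.weekday

# Move each date forward to the Sunday that closes its week.
week_end = dates + pd.to_timedelta(days_to_sunday, unit="D")

# The year of that Sunday is what ends up in the 'year' column.
year_of_week_end = week_end.dt.year
print(pd.DataFrame({"date": dates, "week_end": week_end, "year": year_of_week_end}))
```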
TLDR CODE
To get the week number as a Series:
df['DATE'].dt.isocalendar().week
To store the week number in a new column, assign the returned Series to it:
df['WEEK'] = df['DATE'].dt.isocalendar().week
TLDR EXPLANATION
Use pd.Series.dt.isocalendar().week to get the ISO week number for a given Series.
Note:
The column "DATE" must be stored as a datetime column.
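isocalendar() also returns the ISO year that the week number belongs to, which may be a simpler way to keep the year and week consistent. A sketch under the assumption that ISO 8601 week numbering is what you want; the column name ISO_YEAR and the 2021 sample date are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"DATE": pd.to_datetime(["2018-12-31", "2019-12-31", "2021-01-01"])}
)

# isocalendar() returns a DataFrame with 'year', 'week' and 'day' columns.
iso = df["DATE"].dt.isocalendar()
df["WEEK"] = iso["week"]
df["ISO_YEAR"] = iso["year"]
print(df)
```

Note the last row: 2021-01-01 is a Friday that ISO 8601 assigns to week 53 of 2020, so the ISO year can also be earlier than the calendar year.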
I have a data frame with a column time holding timestamps, and another column period. How can I add a number of days to time based on period?
Current Output:
time period
------------------------------
2020-04-28 10:00:00 1
2020-04-27 12:34:56 3
Expected Output
time
---------------
2020-04-29 10:00:00
2020-04-30 12:34:56
If I try df['time'] = df['time'] + pd.DateOffset(df['period']) I get the error TypeError: `n` argument must be an integer, got <class 'pandas.core.series.Series'>, because it passes the whole column to a function that expects an integer. How can this be accomplished?
Because days can be converted to timedeltas with to_timedelta, you can use:
df['time'] = df['time'] + pd.to_timedelta(df['period'], unit='d')
print (df)
time period
0 2020-04-29 10:00:00 1
1 2020-04-30 12:34:56 3
But if you want to add months, you need to use:
df['time'] = df['time'] + df['period'].apply(lambda x: pd.DateOffset(months=x))
print (df)
time period
0 2020-05-28 10:00:00 1
1 2020-07-27 12:34:56 3
Month timedeltas work with an 'average month' (30.436875 days), so the result also shifts the time of day. Note that unit='M' was deprecated and recent pandas versions reject it:
df['time'] = df['time'] + pd.to_timedelta(df['period'], unit='M')
print (df)
time period
0 2020-05-28 20:29:06 1
1 2020-07-27 20:02:14 3
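To make the difference between the two kinds of month concrete, here is a sketch that avoids unit='M'. It assumes the 'average month' is 365.2425 / 12 = 30.436875 days, which reproduces the output shown above; the names calendar_months and average_months are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime(["2020-04-28 10:00:00", "2020-04-27 12:34:56"]),
    "period": [1, 3],
})

# Calendar months: same day of month and time of day, N months later.
calendar_months = pd.Series(
    [t + pd.DateOffset(months=int(p)) for t, p in zip(df["time"], df["period"])],
    index=df.index,
)

# Average months: a fixed duration of 30.436875 days per month (assumption).
AVG_DAYS_PER_MONTH = 30.436875
average_months = df["time"] + pd.to_timedelta(
    df["period"] * AVG_DAYS_PER_MONTH, unit="D"
)

print(calendar_months)
print(average_months.dt.round("s"))
```

The calendar version is usually what people mean by "add N months"; the fixed-duration version is only an approximation.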
I just want to extract from my df HH:MM. How do I do it?
Here's a description of the column in the df:
count 810
unique 691
top 2018-07-25 11:14:00
freq 5
Name: datetime, dtype: object
The string value includes a full time stamp. The goal is to parse each row's HH:MM into another df, and to loop back over and extract just the %Y-%m-%d into another df.
Assume the df looks like
print(df)
date_col
0 2018-07-25 11:14:00
1 2018-08-26 11:15:00
2 2018-07-29 11:17:00
#convert from string to datetime
df['date_col'] = pd.to_datetime(df['date_col'])
#to get date only
print(df['date_col'].dt.date)
0 2018-07-25
1 2018-08-26
2 2018-07-29
#to get time:
print(df['date_col'].dt.time)
0 11:14:00
1 11:15:00
2 11:17:00
#to get hour and minute
print(df['date_col'].dt.strftime('%H:%M'))
0 11:14
1 11:15
2 11:17
First convert to datetime:
df['datetime'] = pd.to_datetime(df['datetime'])
Then you can do:
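# The .dt accessor provides strftime (datetime -> string), not strptime.
# df2 and df3 are assumed to be existing data frames with the same index as df.
df2['datetime'] = df['datetime'].dt.strftime('%H:%M')
df3['datetime'] = df['datetime'].dt.strftime('%Y-%m-%d')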
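If the two target data frames do not exist yet, one option is to build them directly from the formatted Series. A minimal sketch with the sample rows from the other answer; times_df and dates_df are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": ["2018-07-25 11:14:00", "2018-08-26 11:15:00", "2018-07-29 11:17:00"]
})

# Parse the strings once, then format the pieces that are needed.
parsed = pd.to_datetime(df["datetime"])

times_df = parsed.dt.strftime("%H:%M").to_frame(name="time")
dates_df = parsed.dt.strftime("%Y-%m-%d").to_frame(name="date")

print(times_df)
print(dates_df)
```

Both columns hold strings. Use parsed.dt.time or parsed.dt.date instead if you need time or date objects.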
General solution (not pandas based)
import time
top = '2018-07-25 11:14:00'
time_struct = time.strptime(top, '%Y-%m-%d %H:%M:%S')
short_top = time.strftime('%H:%M', time_struct)
print(short_top)
Output
11:14
I want to convert a string from a dataframe to datetime.
dfx = df.ix[:,'a']
dfx = pd.to_datetime(dfx)
But it gives the following error:
ValueError: day is out of range for month
Can anyone help?
It may help to add the parameter dayfirst=True to to_datetime, if the dates are formatted like 30-01-2016:
dfx = df.loc[:, 'a']  # .ix was removed in pandas 1.0; use .loc
dfx = pd.to_datetime(dfx, dayfirst=True)
A more general approach is to pass the format parameter together with errors='coerce', which replaces values that do not match the format with NaT:
dfx = '30-01-2016'
dfx = pd.to_datetime(dfx, format='%d-%m-%Y', errors='coerce')
print (dfx)
2016-01-30 00:00:00
Sample:
dfx = pd.Series(['30-01-2016', '15-09-2015', '40-09-2016'])
print (dfx)
0 30-01-2016
1 15-09-2015
2 40-09-2016
dtype: object
dfx = pd.to_datetime(dfx, format='%d-%m-%Y', errors='coerce')
print (dfx)
0 2016-01-30
1 2015-09-15
2 NaT
dtype: datetime64[ns]
If the format is standard (e.g. 01-30-2016), add only errors='coerce':
dfx = pd.Series(['01-30-2016', '09-15-2015', '09-40-2016'])
print (dfx)
0 01-30-2016
1 09-15-2015
2 09-40-2016
dtype: object
dfx = pd.to_datetime(dfx, errors='coerce')
print (dfx)
0 2016-01-30
1 2015-09-15
2 NaT
dtype: datetime64[ns]
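Since errors='coerce' hides the bad values behind NaT, it can be useful to look up which original strings failed. A small sketch reusing the sample above; the name bad_values is illustrative.

```python
import pandas as pd

raw = pd.Series(["30-01-2016", "15-09-2015", "40-09-2016"])
parsed = pd.to_datetime(raw, format="%d-%m-%Y", errors="coerce")

# Rows that became NaT are the ones that could not be parsed.
bad_values = raw[parsed.isna()]
print(bad_values)
```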
Well, in my case:
import datetime
year = 2023
month = 2
date = datetime.date(year, month, 30)  # ValueError: day is out of range for month
This raised the error because February only has 28 or 29 days. Maybe that helps someone.
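If the day number comes from elsewhere and may exceed the length of the month, one possible workaround is to clamp it to the last valid day. This is a sketch, and whether clamping is acceptable is an assumption about your use case; safe_date is a hypothetical helper, not part of the standard library.

```python
import calendar
import datetime


def safe_date(year, month, day):
    """Build a date, clamping the day to the last day of the month."""
    # monthrange returns (weekday of the first day, number of days in month).
    last_day = calendar.monthrange(year, month)[1]
    return datetime.date(year, month, min(day, last_day))


print(safe_date(2023, 2, 30))
print(safe_date(2024, 2, 30))
```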