I have a question similar to the one discussed in "How to add a year to a column of dates in pandas";
however, in my case the number of years to add to the date column is stored in another column. This is my non-working code:
import datetime
import pandas as pd
df1 = pd.DataFrame( [ ["Tom",5], ['Jane',3],['Peter',1]], columns = ["Name","Years"])
df1['Date'] = datetime.date.today()
df1['Final_Date'] = df1['Date'] + pd.offsets.DateOffset(years=df1['Years'])
The goal is to add 5 years to the current date in row 1, 3 years to the current date in row 2, etc.
Any suggestions? Thank you
Convert the years to days, build a timedelta from them, and add it to the date column after converting that column to datetime:
df1['Final_Date'] = pd.to_datetime(df1['Date']) \
+ pd.to_timedelta(df1['Years'] * 365, unit='D')
Using to_timedelta with unit='Y' for years is deprecated and raises a ValueError. Note that 365 days per year ignores leap days, so the result can be off by a day or more.
Edit. If you need day-exact changes, you will need to go row by row and update the date objects accordingly. The other answers explain how.
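To see how far the 365-day approximation can drift, here is a minimal sketch that compares it with a calendar-aware offset. The fixed start date 2021-07-13 is borrowed from the sample output shown elsewhere in this thread, and the column names approx, exact and drift_days are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Tom", "Jane", "Peter"], "Years": [5, 3, 1]})
df1["Date"] = pd.Timestamp("2021-07-13")  # fixed date so the result is reproducible

# Approximation: every year counted as exactly 365 days.
df1["approx"] = df1["Date"] + pd.to_timedelta(df1["Years"] * 365, unit="D")

# Calendar-aware: one DateOffset per row (row-wise, so slower on large frames).
df1["exact"] = pd.to_datetime(
    [d + pd.DateOffset(years=int(y)) for d, y in zip(df1["Date"], df1["Years"])]
)

# Difference in whole days between the two results.
df1["drift_days"] = (df1["exact"] - df1["approx"]).dt.days
print(df1[["Name", "approx", "exact", "drift_days"]])
```

In this sample the 5-year and 3-year rows cross 29 February 2024, so the approximation lands one day early for them.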
Assuming the number of distinct values in Years is limited, you can group by Years and apply pd.DateOffset once per group:
df1['new_date'] = (
    # group_keys=False keeps the original row index so the result aligns with df1
    df1.groupby('Years', group_keys=False)
    ['Date'].apply(lambda x: x + pd.DateOffset(years=x.name))
)
print(df1)
Name Years Date new_date
0 Tom 5 2021-07-13 2026-07-13
1 Jane 3 2021-07-13 2024-07-13
2 Peter 1 2021-07-13 2022-07-13
Otherwise, you can extract the year, month and day, add the Years column to the year, and rebuild a datetime column:
df1['Date'] = pd.to_datetime(df1['Date'])
df1['new_date'] = (
df1.assign(year=lambda x: x['Date'].dt.year+x['Years'],
month=lambda x: x['Date'].dt.month,
day=lambda x: x['Date'].dt.day,
new_date=lambda x: pd.to_datetime(x[['year','month','day']]))
['new_date']
)
same result
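One case where the two approaches can differ is a start date of 29 February. The sketch below is only an illustration with a hand-picked date (not data from the question): DateOffset falls back to the last valid day of the month, while rebuilding from year/month/day parts asks for a date that does not exist.

```python
import pandas as pd

leap_day = pd.Timestamp("2020-02-29")

# DateOffset clamps to the last valid day of the target month.
shifted = leap_day + pd.DateOffset(years=1)

# Rebuilding from parts requests 2021-02-29, which is not a real date.
# errors="coerce" turns it into NaT instead of raising an error.
parts = pd.DataFrame({"year": [2021], "month": [2], "day": [29]})
rebuilt = pd.to_datetime(parts, errors="coerce")

print(shifted)
print(rebuilt)
```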
import datetime
import pandas as pd
df1 = pd.DataFrame( [ ["Tom",5], ['Jane',3],['Peter',1]], columns = ["Name","Years"])
df1['Date'] = datetime.date.today()
df1['Final_date'] = datetime.date.today()
df1['Final_date'] = df1.apply(lambda g: g['Date'] + pd.offsets.DateOffset(years = g['Years']), axis=1)
print(df1)
Try this. You were passing the whole column to pd.offsets.DateOffset(years=df1['Years']) instead of a single value from the column.
EDIT: I switched from iterrows to apply with axis=1 because of iterrows's poor performance. Note that apply still works row by row, so it is not truly vectorized.
I have two date formats in one Pandas series (column) that need to be standardized into one format (mmm dd & mm/dd/YY)
Date
Jan 3
Jan 2
Jan 1
12/31/19
12/30/19
12/29/19
Even Excel won't recognize the mmm dd format as a date format. I can change the mmm to a fully-spelled out month using str.replace:
df['Date'] = df['Date'].str.replace('Jan', 'January', regex=True)
But how do I add the current year? How do I then convert January 1, 2020 to 01/01/20?
Have you tried parse() from dateutil?
from dateutil.parser import parse
import pandas as pd
def clean_date(text):
    # parse() returns a datetime; fields missing from the text (here the year)
    # are filled in from today's date
    return parse(text)
df['Date'] = df['Date'].apply(clean_date)
df['Date'] = pd.to_datetime(df['Date'])
If it's in a data frame, use this:
from dateutil.parser import parse
import pandas as pd
# apply parse() to each value rather than assigning through df['Date'][i],
# which is chained indexing and may not update the frame
df['Date'] = df['Date'].apply(parse)
df['Date'] = pd.to_datetime(df['Date']).dt.strftime("%d-%m-%Y")
Found the solution (needed to use apply):
import dateutil.parser
df['date'] = df['date'].apply(dateutil.parser.parse)
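If you would rather state the year explicitly than rely on the parser's default, a possible sketch is to parse each format separately and combine the results. The value assumed_year = 2020 and the column names Parsed and Standard are assumptions for illustration; the sample values are the ones from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {"Date": ["Jan 3", "Jan 2", "Jan 1", "12/31/19", "12/30/19", "12/29/19"]}
)

assumed_year = 2020  # assumption: year to attach to the "mmm dd" values

# Rows that are already mm/dd/yy parse here; the others become NaT.
full = pd.to_datetime(df["Date"], format="%m/%d/%y", errors="coerce")

# Append the assumed year to every value; only the "mmm dd" rows match this format.
short = pd.to_datetime(
    df["Date"] + f" {assumed_year}", format="%b %d %Y", errors="coerce"
)

# Take the mm/dd/yy parse where it worked, otherwise the "mmm dd" parse.
df["Parsed"] = full.fillna(short)
df["Standard"] = df["Parsed"].dt.strftime("%m/%d/%y")
print(df)
```

Keeping the Parsed column as a real datetime makes later sorting and filtering easier; Standard is only the display form.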
I am reading a csv file of the number of employees in the US by year and month (in thousands). It starts out like this:
Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
1961,45119,44969,45051,44997,45119,45289,45400,45535,45591,45716,45931,46035
1962,46040,46309,46375,46679,46668,46644,46720,46775,46888,46927,46910,46901
1963,46912,47000,47077,47316,47328,47356,47461,47542,47661,47805,47771,47863
...
I want my Pandas Dataframe to have the datetime as the index for each month's value. I'm doing this so I can later add values for specific time ranges. I want it to look something like this:
1961-01-01 45119.0
1961-02-01 44969.0
1961-03-01 45051.0
1961-04-01 44997.0
1961-05-01 45119.0
...
I did some research and thought that if I stacked the years and months together, I could combine them into a datetime. Here is what I have done:
import pandas as pd
import numpy as np
df = pd.read_csv("BLS_private.csv", header=5, index_col="Year")
df.columns = range(1, 13) # I transformed months into numbers 1-12 for easier datetime conversion
df = df.stack() # Months are no longer columns
print(df)
Here is my output:
Year
1961 1 45119.0
2 44969.0
3 45051.0
4 44997.0
5 45119.0
...
I do not know how to combine the year and the months in the stacked indices. Does stacking the indices help at all in my case? I am also not very familiar with Pandas datetime, so any explanation of how I could use it would be very helpful. If anyone has alternative solutions to making datetime the index, I welcome ideas.
After stacking, create the DatetimeIndex from the current index:
from datetime import datetime
dt_index = pd.to_datetime([datetime(year=year, month=month, day=1)
for year, month in df.index.values])
df.index = dt_index
df.head(3)
# 1961-01-01 45119
# 1961-02-01 44969
# 1961-03-01 45051
import pandas as pd
df = pd.read_csv("BLS_private.csv", index_col="Year")
# pandas >= 1.4 uses inclusive='left'; older versions spelled this closed='left'
dates = pd.date_range(start=str(df.index[0]), end=str(df.index[-1] + 1), inclusive='left', freq="MS")
df = df.stack()
df.index = dates
df.to_frame()
from io import StringIO
import pandas as pd
s = """Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
1961,45119,44969,45051,44997,45119,45289,45400,45535,45591,45716,45931,46035
1962,46040,46309,46375,46679,46668,46644,46720,46775,46888,46927,46910,46901
1963,46912,47000,47077,47316,47328,47356,47461,47542,47661,47805,47771,47863"""
df = pd.read_csv(StringIO(s))
# set index and stack
stack = df.set_index('Year').stack().reset_index()
# create a new index
stack.index = pd.to_datetime(stack['Year'].astype(str) +'-'+ stack['level_1'])
# remove columns
final = stack[0].to_frame()
1961-01-01 45119
1961-02-01 44969
1961-03-01 45051
1961-04-01 44997
1961-05-01 45119
1961-06-01 45289
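Since the stated goal is to add up values over specific time ranges, here is a small sketch of that step using melt instead of stack. The column names Month and Employees and the variable names are assumptions for illustration; the data is the first two rows from the question.

```python
from io import StringIO

import pandas as pd

csv_text = """Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
1961,45119,44969,45051,44997,45119,45289,45400,45535,45591,45716,45931,46035
1962,46040,46309,46375,46679,46668,46644,46720,46775,46888,46927,46910,46901"""

wide = pd.read_csv(StringIO(csv_text))

# One row per (year, month) pair instead of one column per month.
long = wide.melt(id_vars="Year", var_name="Month", value_name="Employees")

# Build e.g. "1961-Jan" and parse it; the day defaults to the first of the month.
long.index = pd.to_datetime(
    long["Year"].astype(str) + "-" + long["Month"], format="%Y-%b"
)

# Sort chronologically so date-range slicing works.
series = long["Employees"].sort_index()

# Partial-string slicing selects whole months; both ends are included.
q1_1961 = series.loc["1961-01":"1961-03"].sum()
print(q1_1961)
```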
I have a column with dates that look like this: 10-apr-18.
When I transpose my df or do anything else with it, pandas automatically sorts this column by the day (the first number), so it is not chronological.
I've tried to use to_datetime, but because the month is a string it won't work.
How can I convert this to a date OR disable the automatic sorting? (My raw data is already in the right order.)
I suggest converting to datetimes with to_datetime and the format parameter:
df = pd.DataFrame({'dates':['10-may-18','10-apr-18']})
#also working for me
#df['dates'] = pd.to_datetime(df['dates'])
df['dates'] = pd.to_datetime(df['dates'], format='%d-%b-%y')
df = df.sort_values('dates')
df['dates'] = df['dates'].dt.strftime('%d-%B-%y')
print (df)
dates
1 10-April-18
0 10-May-18
df = pd.DataFrame({'dates':['10-may-18','10-apr-18']})
#also working for me
#df['dates'] = pd.to_datetime(df['dates'])
df['datetimes'] = pd.to_datetime(df['dates'], format='%d-%b-%y')
df = df.sort_values('datetimes')
df['full'] = df['datetimes'].dt.strftime('%d-%B-%y')
print (df)
dates datetimes full
1 10-apr-18 2018-04-10 10-April-18
0 10-may-18 2018-05-10 10-May-18
df['dates'] = pd.to_datetime(df['dates'], format='%d-%b-%y').dt.strftime('%d/%B/%y')
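If the original strings should stay untouched, one option is to sort with a key function that parses the dates only for the comparison. This is a sketch assuming pandas 1.1 or later (where sort_values accepts key); the third sample value is added here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"dates": ["10-may-18", "10-apr-18", "02-jan-19"]})

# The key function converts the column to datetimes just for ordering;
# the stored strings are left exactly as they were.
ordered = df.sort_values(
    "dates", key=lambda s: pd.to_datetime(s, format="%d-%b-%y")
)
print(ordered)
```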
I have a pandas dataframe with a column that should indicate the end of a financial quarter. The format is of the type "Q1-2009". Is there a quick way to convert these strings into a timestamp as "2009-03-31"?
I have found only the conversion from the format "YYYY-QQ", but not the opposite.
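Swap the quarter and year parts with replace to build quarterly periods, then convert them to datetimes with PeriodIndex.to_timestamp:
df = pd.DataFrame({'per':['Q1-2009','Q3-2007']})
# regex=True is required in pandas >= 2.0, where str.replace defaults to literal matching
df['date'] = (pd.PeriodIndex(df['per'].str.replace(r'(Q\d)-(\d+)', r'\2-\1', regex=True), freq='Q')
                .to_timestamp(how='e')
                .normalize())  # drop the end-of-day time that recent pandas versions attach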
print (df)
per date
0 Q1-2009 2009-03-31
1 Q3-2007 2007-09-30
Another solution is to use string indexing:
df['date'] = (pd.PeriodIndex(df['per'].str[-4:] + df['per'].str[:2], freq='Q')
                .to_timestamp(how='e')
                .normalize())  # drop the end-of-day time that recent pandas versions attach
One solution using a list comprehension followed by pd.offsets.MonthEnd:
# data from #jezrael
df = pd.DataFrame({'per':['Q1-2009','Q3-2007']})
def get_values(x):
    ''' Returns string with quarter number multiplied by 3 '''
    return f'{int(x[0][1:])*3}-{x[1]}'
values = [get_values(x.split('-')) for x in df['per']]
df['LastDay'] = pd.to_datetime(values, format='%m-%Y') + pd.offsets.MonthEnd(1)
print(df)
per LastDay
0 Q1-2009 2009-03-31
1 Q3-2007 2007-09-30
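A variant of the same idea that avoids the Python-level loop is sketched below. It assumes every value matches the pattern Q<1-4>-<four-digit year>; the group names quarter and year and the column name quarter_end are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"per": ["Q1-2009", "Q3-2007"]})

# Pull the quarter number and the year out as integer columns.
parts = df["per"].str.extract(r"Q(?P<quarter>[1-4])-(?P<year>\d{4})").astype(int)

# First day of the quarter's last month (quarter * 3 gives months 3, 6, 9, 12).
last_month_start = pd.to_datetime(
    pd.DataFrame({"year": parts["year"], "month": parts["quarter"] * 3, "day": 1})
)

# MonthEnd(0) rolls each date forward to the end of its own month.
df["quarter_end"] = last_month_start + pd.offsets.MonthEnd(0)
print(df)
```

A value that does not match the pattern would produce NaN in parts and make astype(int) fail, so malformed input needs to be filtered out first.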