Python: Add Weeks to Date from df


How would I add two df columns together (date + weeks)?
This works for me:
df['Date'] = pd.to_datetime(startDate, format='%Y-%m-%d') + datetime.timedelta(weeks = 3)
But when I try to add weeks from a column, I get a TypeError: unsupported type for timedelta weeks component: Series
df['Date'] = pd.to_datetime(startDate, format='%Y-%m-%d') + datetime.timedelta(weeks = df['Duration (weeks)'])
I would appreciate any help, thank you!

You can use the pandas to_timedelta function to transform the number-of-weeks column into a timedelta, like this:
import pandas as pd
import numpy as np
# create a DataFrame with a `date` column
df = pd.DataFrame(
    pd.date_range(start='1/1/2018', end='1/08/2018'),
    columns=["date"]
)
# add a column `weeks` with a random number of weeks
df['weeks'] = np.random.randint(1, 6, df.shape[0])
# use `pd.to_timedelta` to transform the number of weeks column to a timedelta
# and add it to the `date` column
df["new_date"] = df["date"] + pd.to_timedelta(df["weeks"], unit="W")
df.head()
        date  weeks   new_date
0 2018-01-01      5 2018-02-05
1 2018-01-02      2 2018-01-16
2 2018-01-03      2 2018-01-17
3 2018-01-04      4 2018-02-01
4 2018-01-05      3 2018-01-26
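Applied to the question's own setup, where a single start date is combined with a per-row week count, the same idea can be sketched as follows. The `startDate` value and the sample durations are assumptions made up for illustration; only the `Duration (weeks)` column name comes from the question.

```python
import pandas as pd

# Hypothetical inputs: one scalar start date and a column of week counts.
startDate = "2021-03-01"
df = pd.DataFrame({"Duration (weeks)": [1, 3, 10]})

# datetime.timedelta(weeks=...) only accepts scalars, so it fails on a Series.
# pd.to_timedelta converts the whole column at once.
start = pd.to_datetime(startDate, format="%Y-%m-%d")
df["Date"] = start + pd.to_timedelta(df["Duration (weeks)"], unit="W")

print(df)
```

Adding a scalar Timestamp to a timedelta Series broadcasts the start date across all rows, so no loop or `apply` is needed.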

Related

Pandas groupby month output is incorrect [duplicate]

My dataset has dates in the European format (dd/mm/yyyy), and I'm struggling to convert them correctly with pd.to_datetime: whenever the day is 12 or less, the month and day get switched.
Is there an easy solution to this?
import pandas as pd
import datetime as dt
df = pd.read_csv(loc,dayfirst=True)
df['Date']=pd.to_datetime(df['Date'])
Is there a way to force to_datetime to acknowledge that the input is formatted as dd/mm/yyyy?
Thanks for the help!
Edit, a sample from my dates:
renewal["Date"].head()
Out[235]:
0 31/03/2018
2 30/04/2018
3 28/02/2018
4 30/04/2018
5 31/03/2018
Name: Earliest renewal date, dtype: object
After running the following:
renewal['Date']=pd.to_datetime(renewal['Date'],dayfirst=True)
I get:
Out[241]:
0 2018-03-31 #Correct
2 2018-04-01 #<-- this number is wrong and should be 01-04 instead
3 2018-02-28 #Correct

Pass an explicit format:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
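A small sketch of why the explicit format helps. The sample strings are illustrative (two are taken from the question, `01/04/2018` is an assumed ambiguous value), and the variable names are hypothetical.

```python
import pandas as pd

raw = pd.Series(["31/03/2018", "01/04/2018", "28/02/2018"])

# An explicit format leaves no room for guessing: day first, then month.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")

# For comparison, a lone ambiguous string parsed with the month-first default.
ambiguous = pd.to_datetime("01/04/2018")

print(parsed.dt.strftime("%Y-%m-%d").tolist())
print(ambiguous.strftime("%Y-%m-%d"))
```

With the format given, `01/04/2018` is read as 1 April. Without it, the same string is read as 4 January.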

You can control the date construction directly if you define separate columns for 'year', 'month' and 'day', like this:
import pandas as pd
df = pd.DataFrame(
    {'Date': ['01/03/2018', '06/08/2018', '31/03/2018', '30/04/2018']}
)
date_parts = df['Date'].apply(lambda d: pd.Series(int(n) for n in d.split('/')))
date_parts.columns = ['day', 'month', 'year']
df['Date'] = pd.to_datetime(date_parts)
date_parts
# day month year
# 0 1 3 2018
# 1 6 8 2018
# 2 31 3 2018
# 3 30 4 2018
df
# Date
# 0 2018-03-01
# 1 2018-08-06
# 2 2018-03-31
# 3 2018-04-30
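The same column-based construction can probably be written without a per-row `apply`. The sketch below assumes every value has exactly the dd/mm/yyyy shape and uses `str.split(expand=True)` to build the three columns in one step.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["01/03/2018", "06/08/2018", "31/03/2018", "30/04/2018"]})

# Split "dd/mm/yyyy" into three integer columns in one vectorized step.
date_parts = df["Date"].str.split("/", expand=True).astype(int)
date_parts.columns = ["day", "month", "year"]

# pd.to_datetime assembles dates from columns named year, month and day.
df["Date"] = pd.to_datetime(date_parts)
print(df)
```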

How to get rid of MonthEnds type

I am trying to get the delta in months between a starting date and an ending date within Pandas DataFrame. The result is not totally satisfying...
First, the outcome is some sort of Datetime type in the form of <[value] * MonthEnds>. I can't use this to calculate with. First question is how to convert this to an integer. I tried the .n attribute but then I get the following error:
AttributeError: 'Series' object has no attribute 'n'
Second, the outcome is 'missing' one month. Can this be avoided by using another solution/method? Or should I just add 1 month to the answer?
To support my questions I created some simplified code:
dates = [{'Start':'1-1-2020', 'End':'31-10-2020'}, {'Start':'1-2-2020', 'End':'30-11-2020'}]
df = pd.DataFrame(dates)
df['Start'] = pd.to_datetime(df['Start'], dayfirst=True)
df['End'] = pd.to_datetime(df['End'], dayfirst=True)
df['Duration'] = (df['End'].dt.to_period('M') - df['Start'].dt.to_period('M'))
df
This results in:
       Start        End         Duration
0 2020-01-01 2020-10-31  <9 * MonthEnds>
1 2020-02-01 2020-11-30  <9 * MonthEnds>
The preferred result would be:
       Start        End  Duration
0 2020-01-01 2020-10-31        10
1 2020-02-01 2020-11-30        10

Subtract the start-date from the end-date and convert the time delta to months.
import pandas as pd
dates = [{'Start':'1-1-2020', 'End':'31-10-2020'}, {'Start':'1-2-2020', 'End':'30-11-2020'}]
df = pd.DataFrame(dates)
df['Start'] = pd.to_datetime(df['Start'], dayfirst=True)
df['End'] = pd.to_datetime(df['End'], dayfirst=True)
df['Duration'] = (df['End']-df['Start']).astype('<m8[M]').astype(int)+1
print(df)
Output:
       Start        End  Duration
0 2020-01-01 2020-10-31        10
1 2020-02-01 2020-11-30        10
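On recent pandas releases the `astype('<m8[M]')` cast may be rejected, since timedeltas there support only a few fixed resolutions. A version-independent sketch, under the assumption that "duration" means the number of calendar months touched (both end months included), computes the count from the year and month components. It also shows how the `.n` attribute from the question can be read element by element rather than on the Series itself.

```python
import pandas as pd

df = pd.DataFrame({
    "Start": pd.to_datetime(["2020-01-01", "2020-02-01"]),
    "End": pd.to_datetime(["2020-10-31", "2020-11-30"]),
})

# Whole calendar months between the two dates, plus 1 to count both ends.
df["Duration"] = (
    (df["End"].dt.year - df["Start"].dt.year) * 12
    + (df["End"].dt.month - df["Start"].dt.month)
    + 1
)

# Same result via periods: each element is a MonthEnd offset whose .n is an int.
offsets = df["End"].dt.to_period("M") - df["Start"].dt.to_period("M")
df["Duration_via_n"] = offsets.apply(lambda offset: offset.n) + 1

print(df)
```

The `+ 1` reflects the asker's preferred output (January through October counted as 10); drop it if only the difference between months is wanted.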

Try this, which approximates a month as 30 days:
dates = [{'Start':'1-1-2020', 'End':'31-10-2020'}, {'Start':'1-2-2020', 'End':'30-11-2020'}]
df = pd.DataFrame(dates)
df['Start'] = pd.to_datetime(df['Start'], dayfirst=True)
df['End'] = pd.to_datetime(df['End'], dayfirst=True)
df['Duration'] = (df['End'] - df['Start']).apply(lambda x:x.days//30)
print(df)

How do I condense a pandas data frame where the rows are the months and I'm trying to condense them into years?

I have a DataFrame:
https://docs.google.com/spreadsheets/d/19ssG8bvkZKVDR6V5yU9fZVRJbJNfTTEYmWqLwmDwBa0/edit#gid=0
This is the output that my code gives.
Here is the code:
from yahoofinancials import YahooFinancials
import pandas as pd
import datetime as datetime
df = pd.read_excel('C:/Users/User/Downloads/Div Tickers.xlsx', sheet_name='Sheet1')
tickers_list = df['Ticker'].tolist()
data = pd.DataFrame(columns=tickers_list)
yahoo_financials_ecommerce = YahooFinancials(data)
ecommerce_income_statement_data = yahoo_financials_ecommerce.get_financial_stmts('annual', 'income')
data = ecommerce_income_statement_data['incomeStatementHistory']
df_dict = dict()
for ticker in tickers_list:
    df_dict[ticker] = pd.concat([pd.DataFrame(data[ticker][x]) for x in range(len(data[ticker]))],
                                sort=False, join='outer', axis=1)
df = pd.concat(df_dict, sort=True)
df_l = pd.DataFrame(df.stack())
df_l.reset_index(inplace=True)
df_l.columns = ['ticker', 'financials', 'date', 'value']
df_w = df_l.pivot_table(index=['date.year', 'financials'], columns='ticker', values='value')
export_excel = df_w.to_excel(r'C:/Users/User/Downloads/Income Statement Histories.xlsx', sheet_name="Sheet1", index= True)
How would I go about condensing the months into years so that the data is comparable Year-over-Year?

IIUC, you need to melt, then use groupby on your date column to group by year.
#df['date'] = pd.to_datetime(df['date'])
df = pd.melt(df,id_vars=['date','financials'],var_name='ticker')
df.groupby([df['date'].dt.year,df['financials'],df['ticker']])['value'].sum().unstack()
ticker                                      AEM          AGI           ALB  \
date financials
2016 costOfRevenue                 1.030000e+09  309000000.0  1.710000e+09
     discontinuedOperations        0.000000e+00          0.0  2.020000e+08
     ebit                          3.360000e+08   21300000.0  5.370000e+08
     grossProfit                   1.110000e+09  173000000.0  9.700000e+08
     incomeBeforeTax               2.680000e+08   -7600000.0  5.750000e+08
...                                         ...          ...           ...
2019 researchDevelopment           0.000000e+00          0.0  5.828700e+07
     sellingGeneralAdministrative  1.210000e+08   19800000.0  4.390000e+08
     totalOperatingExpenses        1.650000e+09  557000000.0  2.830000e+09
     totalOtherIncomeExpenseNet   -1.000000e+08    2900000.0 -6.900000e+07
     totalRevenue                  2.490000e+09  683000000.0  3.590000e+09
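Since the original spreadsheet is not reproduced here, the following is only a minimal self-contained sketch of the melt-then-groupby pattern. The tickers `AAA` and `BBB` and all figures are made up for illustration, and the frame is assumed to have one column per ticker plus `date` and `financials`.

```python
import pandas as pd

# Hypothetical wide frame: one row per (date, line item), one column per ticker.
wide = pd.DataFrame({
    "date": pd.to_datetime(["2019-03-31", "2019-12-31", "2020-12-31"]),
    "financials": ["totalRevenue", "totalRevenue", "totalRevenue"],
    "AAA": [100.0, 150.0, 300.0],
    "BBB": [10.0, 20.0, 40.0],
})

# Long format: the ticker columns become rows with a 'ticker' and a 'value' column.
long_df = wide.melt(id_vars=["date", "financials"], var_name="ticker")

# Group by calendar year, line item and ticker, then spread tickers back out.
yearly = (
    long_df.groupby(
        [long_df["date"].dt.year, long_df["financials"], long_df["ticker"]]
    )["value"]
    .sum()
    .unstack()
)
print(yearly)
```

Summing is an assumption about how rows within a year should be combined; for already-annual figures, `first()` or `last()` may be the more appropriate aggregation.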

Not sure, since you didn't give us any data, but you can convert a datetime column to a year with the following code. The first bit just generates some sample data:
import pandas as pd
from datetime import datetime, timedelta
from random import randint
df = pd.DataFrame({
    'dates': [datetime.today() - timedelta(randint(0, 1000)) for _ in range(50)]
})
print(df.head())
dates
0 2019-09-02 21:01:46.702300
1 2019-11-03 21:01:46.702329
2 2019-04-01 21:01:46.702338
3 2019-03-04 21:01:46.702345
4 2019-03-28 21:01:46.702351
The part that matters:
df.dates.dt.to_period('Y')
0 2018
1 2018
2 2019
3 2018
4 2019
5 2020

Pandas Dataframe Calculate Num Business Days

I am working on a project and I am trying to calculate the number of business days within a month. What I currently did was extract all of the unique months from one dataframe into a different dataframe and created a second column with
df2['Signin Date Shifted'] = df2['Signin Date'] + pd.DateOffset(months=1)
Thus the current dataframe looks like:
I know I can do dt.daysinmonth or a timedelta but that gives me all of the days within a month including Sundays/Saturdays (which I don't want).

Use np.busday_count from NumPy.
For example:
import pandas as pd
import numpy as np
df = pd.DataFrame({"Signin Date": ["2018-01-01", "2018-02-01"]})
df["Signin Date"] = pd.to_datetime(df["Signin Date"])
df['Signin Date Shifted'] = pd.DatetimeIndex(df['Signin Date']) + pd.DateOffset(months=1)
df["bussDays"] = np.busday_count( df["Signin Date"].values.astype('datetime64[D]'), df['Signin Date Shifted'].values.astype('datetime64[D]'))
print(df)
Output:
  Signin Date Signin Date Shifted  bussDays
0  2018-01-01          2018-02-01        23
1  2018-02-01          2018-03-01        20
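If the sign-in dates do not always fall on the first of the month, one possible variation is to truncate each date to its month before counting. This is a sketch under that assumption; the sample dates and the `business_days` name are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sign-in dates that fall in the middle of a month.
dates = pd.Series(pd.to_datetime(["2018-01-15", "2018-02-10"]))

# Truncate to month resolution, then step forward one month for the end bound.
months = dates.values.astype("datetime64[M]")
month_start = months.astype("datetime64[D]")
next_month_start = (months + np.timedelta64(1, "M")).astype("datetime64[D]")

# busday_count includes the start date and excludes the end date (Mon-Fri by default).
business_days = np.busday_count(month_start, next_month_start)
print(business_days)
```

`np.busday_count` also accepts `weekmask` and `holidays` arguments if public holidays need to be excluded.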

Add days to date in pandas

I have a data frame that contains 2 columns, one is Date and other is float number.
I would like to add those 2 to get the following:
Index  Date        Days  NewDate
0      20-04-2016  5     25-04-2016
1      16-03-2015  3.7   20-03-2015
As you can see, if the number of days has a decimal part, it is rounded up to the next integer, e.g. 3.1 --> 4 (days).
I have some weird questions so I appreciate any help.
Thank you !

First, ensure that the Date column is a datetime object:
df['Date'] = pd.to_datetime(df['Date'])
Then, we can round the Days column up to whole days and convert it to a pandas Timedelta:
temp = df['Days'].apply(np.ceil).apply(lambda x: pd.Timedelta(x, unit='D'))
Datetime objects and timedeltas can be added:
df['NewDate'] = df['Date'] + temp

You can convert the Days column to timedelta and add it to Date column:
import numpy as np
import pandas as pd
# pd.np is no longer available in current pandas, so call NumPy directly
df['NewDate'] = pd.to_datetime(df.Date) + pd.to_timedelta(np.ceil(df.Days), unit="D")
df

Using combine for the two-column calculation and pd.DateOffset for adding days:
df['NewDate'] = df['Date'].combine(df['Days'], lambda x,y: x + pd.DateOffset(days=int(np.ceil(y))))
Output:
        Date  Days    NewDate
0 2016-04-20   5.0 2016-04-25
1 2016-03-16   3.7 2016-03-20
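For reference, a self-contained sketch that runs on the question's sample data. It assumes the dates are day-first strings, so the format is stated explicitly, and it uses the vectorized `pd.to_timedelta` route.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Date": ["20-04-2016", "16-03-2015"], "Days": [5, 3.7]})

# The sample dates are day-first, so state the format explicitly.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Round fractional days up, then convert the whole column to timedeltas.
df["NewDate"] = df["Date"] + pd.to_timedelta(np.ceil(df["Days"]), unit="D")
print(df)
```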
