I have to calculate the number of days in the last 30 days on which the temperature was more than 32 degrees C. I am trying to use a rolling sum. The issue is that the number of days in a month varies. My first attempt was:
weather_2['highTemp_days'] = weather_2.groupby(['date','station'])['over32'].apply(lambda x: x.rolling(len('month')).sum())
(Note that len('month') here is just the length of the string 'month', i.e. 5, not the number of days in the month.)
weather_2 has 66 stations.
The dates range from 1950 to 2020.
over32 is boolean data: 1 if the temperature on that date is > 32, otherwise 0.
month is taken from the date column: weather_2['month'] = weather_2['date'].dt.month
I ended up using this:
weather_2['highTemp_days'] = weather_2.groupby(['year','station'])['over32'].apply(lambda x: x.rolling(30).sum())
The issue was that I had been grouping by date, which made every group a single row; that is why the answer was wrong.
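For what it's worth, a minimal sketch of a variant that also keeps the 30-day window from resetting at year boundaries: group by station only, sort by date, and roll over 30 rows. The toy data below is hypothetical, and the sketch assumes one row per station per day:
import pandas as pd

# hypothetical toy frame shaped like weather_2: one row per station per day
weather_2 = pd.DataFrame({
    "date": list(pd.date_range("2019-12-20", periods=20)) * 2,
    "station": ["A"] * 20 + ["B"] * 20,
    "over32": ([1, 0, 1] * 7)[:20] * 2,
})

weather_2 = weather_2.sort_values(["station", "date"])
# 30-row rolling sum within each station; min_periods=1 yields partial
# counts for the first 29 days instead of NaN
weather_2["highTemp_days"] = (
    weather_2.groupby("station")["over32"]
             .transform(lambda s: s.rolling(30, min_periods=1).sum())
)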
I have a df whose columns are tickers, whose rows are daily returns, and whose index is a datetime index:
SPY IWM TLT
2016-01-04 0.914939 0.998960 1.014094
2016-01-05 1.014062 1.002650 1.002819
2016-01-06 0.991911 0.999906 1.014441
2016-01-07 0.937087 0.995280 1.014140
2016-01-08 1.005388 0.999147 0.995572
I have initial weights for each ticker on day one
SPY 50
IWM 25
TLT 25
Each weight has a band
SPY = 40, 60
IWM = 20, 30
TLT = 20, 30
The dataframe goes on for 5 years of daily data. For the first day I want to calculate the original weight times the return for that day. For every day after that, I want to calculate that day's value (the previous day's value times that day's return) and check whether the weight of any one of the three is outside its band. The weight for each day is the value for that ticker divided by the sum of that day's values. If one of the ticker weights violates its band for 5 days straight, I want to rebalance all three, so that the next day's row is the original weight divided by the previous day's portfolio value.
Example:
Date       SPY         IWM         TLT         PortValue  SPYW  IWMW   TLTW
XX Date    51.45       27.25       21.54       100.24     51.3  27.18  21.4   No rebal; next day = prev day * return
XX Date    59          29          15          103        57    28     14.5   Rebal; next day below
NEXT DAY   50/103*ret  25/103*ret  25/103*ret
I have tried everything: lambda functions, np.where, for loops, if statements, and nested variations of all of the above. I can't get around the boolean test for the first day's index and make it work with the rest of the days, where each row is contingent on the calculation of the previous row rather than on the datetime index location.
Interesting question. Something along the lines of the following should work, obviously with many modifications to reflect your actual data. Note that this totally disregards the datetime index, since it is informational only and doesn't affect the outcome:
portfolio = [50, 25, 25]        # starting amount of each investment in the portfolio
allocations = [.50, .25, .25]   # base allocations among the portfolio investments
tickers = list(df.columns.values)
bands = [[.40, .60], [.20, .30], [.20, .30]]  # permissible weight band for each investment/ticker
violations = {ticker: 0 for ticker in tickers}  # consecutive-violation counter for each ticker

# start iterating through each day in the dataframe:
for i, row in df.iterrows():
    yields = row.to_list()  # extract the daily yields
    # recalculate the new value of each investment
    portfolio = [investment * yld for investment, yld in zip(portfolio, yields)]
    # recalculate the new relative weight of each investment
    weights = [investment / sum(portfolio) for investment in portfolio]
    for ticker, weight, band in zip(tickers, weights, bands):
        # check if this ticker is outside its permitted band
        if weight < band[0] or weight > band[1]:
            violations[ticker] += 1  # if it is, increment its violations counter
        else:
            violations[ticker] = 0   # if not, reset the counter, so only consecutive violations count
        if violations[ticker] == 5:  # 5 consecutive violations detected...
            # rebalance the portfolio back to the base allocations and restart all streaks
            portfolio = [sum(portfolio) * allocation for allocation in allocations]
            violations = {t: 0 for t in tickers}
            break  # the weights computed above are stale after a rebalance
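To drive the loop, df can be a hypothetical frame built from the sample returns in the question:
import pandas as pd

df = pd.DataFrame(
    {"SPY": [0.914939, 1.014062, 0.991911, 0.937087, 1.005388],
     "IWM": [0.998960, 1.002650, 0.999906, 0.995280, 0.999147],
     "TLT": [1.014094, 1.002819, 1.014441, 1.014140, 0.995572]},
    index=pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06",
                          "2016-01-07", "2016-01-08"]),
)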
I'm working on a machine learning regression problem where I predict a sales value based on input features. The date is one of the features, and I want to extract the month and the week number from it. Month comes out as 1 to 12, which is fine, but for weeks I get values between 1 and 52. That is also correct, but I'm trying to get the week number within the month, in the range 1 to 5; some months have 4 weeks and some have 5.
I have tried the available methods for getting the week number, but they only give values in the range 1 to 52. I cannot simply divide by 4, because then no month would ever have 5 weeks.
This code gives output in the range 1 to 52, and I have also tried several other methods:
df['Week'] = df['Date'].dt.week
It should return, for example, week number 5 if a particular date belongs to the fifth week of its month.
"Week number" refers to the week of the year in most contexts. Week of the month is not a standard notion and is thus not implemented in pandas. You can implement it yourself; see e.g. this question on Stack Overflow.
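For example, a minimal sketch of one common convention, where days 1-7 of a month are week 1, days 8-14 are week 2, and so on, so that days 29-31 fall into week 5:
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(["2019-03-01", "2019-03-15", "2019-03-31"])})
# day of month 1-7 -> week 1, 8-14 -> week 2, ..., 29-31 -> week 5
df["Week"] = (df["Date"].dt.day - 1) // 7 + 1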
I have data spanning a little over a year. I am interested in grouping the data by week and getting the slope of two variables for each week. Here is what the data looks like:
Date | Total_Sales| Products
2015-12-30 07:42:50| 2900 | 24
2015-12-30 09:10:10| 3400 | 20
2016-02-07 07:07:07| 5400 | 25
2016-02-07 07:08:08| 1000 | 64
So ideally I would like to perform a linear regression of Total_Sales against Products for each week of this data and record the slope. This works when every week is represented in the data, but I have problems when some weeks are skipped. I know I could do this by turning the date into a week number, but I feel the result would be skewed because there is over a year's worth of data, so the same week number would occur in two different years.
Here is the code I have so far:
df['Date']=pd.to_datetime(vals['EventDate']) - pd.to_timedelta(7,unit='d')
df.groupby(pd.Grouper(key='Week', freq='W-MON')).apply(lambda v: linregress(v.Total_Sales, v.Products)[0]).reset_index()
However, I get the following error:
ValueError: Inputs must not be empty.
I expect the output to look like this:
Date | Slope
2015-12-28 | -0.008
2016-02-01 | -0.008
I assume this is happening because pandas is unable to group properly and is unable to recognise the datetime as a key, as the Date column has varying timestamps too.
Try the following code. It worked for me:
from datetime import timedelta
from scipy import stats
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])  #### Converts the Date column to pandas datetime
df['daysoffset'] = df['Date'].apply(lambda x: x.weekday())
#### Returns the day of the week as an integer, where Monday is 0 and Sunday is 6.
df['week_start'] = df.apply(lambda x: x['Date'].date() - timedelta(days=x['daysoffset']), axis=1)
#### x['Date'].date() removes the timestamp and keeps only the date;
#### this line assigns the date of the preceding Monday to the 'week_start' column.
df.groupby('week_start').apply(lambda v: stats.linregress(v.Total_Sales, v.Products)[0]).reset_index()
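This avoids the ValueError because groupby('week_start') only creates groups for weeks that actually occur in the data, whereas pd.Grouper(freq='W-MON') also generates the empty weeks in between. If you would rather keep the Grouper approach, here is a sketch of my own (not part of the answer above) that skips the empty weeks, and the single-row weeks linregress cannot handle:
import numpy as np
import pandas as pd
from scipy import stats

slopes = (
    df.groupby(pd.Grouper(key='Date', freq='W-MON'))
      .apply(lambda v: stats.linregress(v.Total_Sales, v.Products)[0] if len(v) > 1 else np.nan)
      .dropna()
      .reset_index(name='Slope')
)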
We have some readily available sales data for certain periods, like 1 week, 1 month, ..., 1 year:
time_pillars = pd.Series(['1W', '1M', '3M', '1Y'])
sales = pd.Series([4.75, 5.00, 5.10, 5.75])
data = {'time_pillar': time_pillars, 'sales': sales}
df = pd.DataFrame(data)
I would like to do two operations.
Firstly, create a new column of date type, df['date'], that corresponds to the actual date 1 week, 1 month, ..., 1 year from now.
Then, I'd like to create another column, df['days_from_now'], holding how many days each of these pillars represents (1 week would be 7 days, 1 month around 30 days, 1 year around 365 days).
The goal is then to use any day as input to a simple linear_interpolation_method() to obtain sales data for any given day (e.g., what are sales for 4 October 2018? We would interpolate between 3 months and 1 year).
Many thanks.
I'm not exactly sure what you mean regarding your interpolation, but here is a way to make your dataframe in pandas (starting from your original df you provided in your post):
from datetime import datetime
from dateutil.relativedelta import relativedelta

def create_dates(df):
    df['date'] = [i.date() for i in
                  [d + delt for d, delt in zip([datetime.now()] * 4,
                                               [relativedelta(weeks=1), relativedelta(months=1),
                                                relativedelta(months=3), relativedelta(years=1)])]]
    df['days_from_now'] = df['date'] - datetime.now().date()
    return df

create_dates(df)
sales time_pillar date days_from_now
0 4.75 1W 2018-04-11 7 days
1 5.00 1M 2018-05-04 30 days
2 5.10 3M 2018-07-04 91 days
3 5.75 1Y 2019-04-04 365 days
I wrapped it in a function, so that you can call it on any given day and get your results for 1 week, 1 month, etc. from that exact day.
Note: if you want your days_from_now to simply be an integer of the number of days, use df['days_from_now'] = [i.days for i in df['date'] - datetime.now().date()] in the function, instead of df['days_from_now'] = df['date'] - datetime.now().date()
Explanation:
df['date'] = [i.date() for i in
              [d + delt for d, delt in zip([datetime.now()] * 4,
                                           [relativedelta(weeks=1), relativedelta(months=1),
                                            relativedelta(months=3), relativedelta(years=1)])]]
This takes today's date (datetime.now()) repeated 4 times, adds a relativedelta (a time difference) of 1 week, 1 month, 3 months, and 1 year respectively, extracts the date part (i.date() for ...), and finally creates a new column from the resulting list.
df['days_from_now'] = df['date'] - datetime.now().date()
is much more straightforward, it simply subtracts those new dates that you got above from the date today. The result is a timedelta object, which pandas conveniently formats as "n days".
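The end goal from the question, interpolating sales for an arbitrary date, could then look something like this hypothetical sketch using numpy's np.interp (assuming days_from_now has been converted to integers as in the note above):
import numpy as np
from datetime import datetime

# estimate sales for 4 October 2018 by linear interpolation between the pillars
target_days = (datetime(2018, 10, 4).date() - datetime.now().date()).days
estimated_sales = np.interp(target_days, df['days_from_now'], df['sales'])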
I have a Pandas dataframe with a daily-timestep index, just as below:
oldman.head()
Value
date
1992-01-01 1080.4
1992-01-02 1080.4
1992-01-03 1080.4
1992-01-04 1080.0
1992-01-05 1079.6
...
starting from 1992-01-01 and running to 2016-12-31. I want to extract weekly mean values for each year. However, my weeks should be defined in a special way: there should be 52 weeks in a 365-day year, but with the last week being 8 days long! The first week should start on January 1st of each year.
I am wondering how I am supposed to extract this kind of week from daily-timestep data.
Thanks,
I modified COLDSPEED's solution a bit and added in the last week as 8 days. It's worth noting that in leap years that last "week" is actually 9 days. The following example will only work when you include a full year, because my function assumes the last row in the groupby is actually the last week of the year.
import pandas as pd

# make some data
df = pd.DataFrame(index=pd.date_range("1992-1-1", "1992-12-31"))
df["value"] = 1
# add a counting variable
df["count"] = 1

# sum value and count over 7-day chunks within each year
df = df.groupby(pd.Grouper(freq='Y'))\
       .resample('7D')\
       .sum()\
       .reset_index(level=0, drop=True)

def chop_last_week(df):
    # fold the short trailing chunk into the previous one, producing an
    # 8-day (or, in leap years, 9-day) final week
    df1 = df.copy()
    df1.iloc[-2] += df1.iloc[-1]
    return df1.iloc[:-1]

df = df.groupby(df.index.year)\
       .apply(chop_last_week)\
       .reset_index(level=0, drop=True)

df["mean"] = df["value"]/df["count"]
df.tail(5)
It's not the cleanest solution but it runs quickly.