Pandas Dataframes: Addition of float to column value based on if condition


I'm a relative newbie with Python and Pandas, finally admitting defeat on figuring this out myself. I have a pandas DataFrame from our energy supplier's API. Each row is a 30-minute interval showing the wholesale energy cost in p/kWh ('value_exc_vat'), the solar output for the house ('export') and a datetime stamp ('datetime').
Columns: index, 'value_exc_vat', 'datetime', 'export', 'hour', 'export_rate', 'export_rate_var'
'hour' is taken from 'datetime' for each row, e.g. 13, 14, 15, 16, etc.
To calculate the price/kWh we are paid, I need to calculate
0.97 x 'value_exc_vat' + peak_rate_uplift
where peak_rate_uplift is only applied during hours 16 to 19 inclusive.
I've tried just about every method I can think of, but I can't get this to work.
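peak_rate = [16,17,18,19]
for hour in df['hour']:
    # Bug 1: hour == peak_rate compares one number with the whole list, so it is never True
    if hour == peak_rate:
        # Bug 2: this assigns the entire column on every iteration, not just the current row
        df['export_rate_var'] = (df['export_rate'] + peak_rate_uplift)
    else:
        df['export_rate_var'] = df['export_rate']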
Printing the output from the if statement, I can see that 'hour' is being selected for the correct values, but the rest of the statement doesn't add the peak_rate_uplift as I would expect.
Any advice or help on how to apply the addition to the selected rows would be appreciated. It feels like it should be something simple, but I've been at this for 3 days now...

You could use:
peak_rate = [16,17,18,19]
df['export_rate_var'] = (df['export_rate'] + df.hour.isin(peak_rate) * peak_rate_uplift)
Here df.hour.isin(peak_rate) returns a boolean Series. Multiplying it by peak_rate_uplift gives a Series that equals peak_rate_uplift where the hour is one of the peak-rate hours and 0 everywhere else, so the uplift is only added to those rows.
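A self-contained sketch of the same idea, using a small made-up DataFrame and an illustrative uplift of 12.0 (the real peak_rate_uplift is not given in the question). It applies the question's 0.97 factor first and then adds the uplift only in peak hours; np.where is shown as an equivalent alternative to multiplying by the boolean mask.

```python
import numpy as np
import pandas as pd

peak_rate = [16, 17, 18, 19]
peak_rate_uplift = 12.0  # illustrative value only; substitute the tariff's real uplift

# Hypothetical half-hourly wholesale prices (p/kWh) between 15:00 and 17:30
df = pd.DataFrame({
    "datetime": pd.date_range("2023-06-01 15:00", periods=6, freq="30min"),
    "value_exc_vat": [20.0, 22.0, 30.0, 31.0, 25.0, 24.0],
})
df["hour"] = df["datetime"].dt.hour
df["export_rate"] = 0.97 * df["value_exc_vat"]

# Boolean mask: True for rows whose hour is 16, 17, 18 or 19
is_peak = df["hour"].isin(peak_rate)

# Option 1: True * uplift -> uplift, False * uplift -> 0
df["export_rate_var"] = df["export_rate"] + is_peak * peak_rate_uplift

# Option 2: np.where chooses between the two values row by row
df["export_rate_var_where"] = np.where(
    is_peak, df["export_rate"] + peak_rate_uplift, df["export_rate"]
)

print(df[["datetime", "hour", "export_rate", "export_rate_var"]])
```

Both options produce the same column; the mask-times-uplift form is compact, while np.where reads more like the original if/else.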

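If you prefer an explicit loop, it has to test and update one row at a time (this is much slower than the vectorized version above):
peak_rate = [16,17,18,19]
for i in df.index:
    # df.at reads/writes a single cell; a scalar has no .isin(), so use Python's `in`
    if df.at[i, 'hour'] in peak_rate:
        df.at[i, 'export_rate_var'] = df.at[i, 'export_rate'] + peak_rate_uplift
    else:
        df.at[i, 'export_rate_var'] = df.at[i, 'export_rate']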

Related

Upsample timeseries with weather data in a correct way

I have a dataset that holds hourly weather data for one year, but only for the 1st to the 20th of each month; the last 10 days of each month (with their hours) have been removed.
The weather data are:
(temperature - humidity - wind_speed - visibility - dew_temperature - solar_radiation - rainfall - snowfall)
I want to upsample the dataset as a time series to fill in the missing days, but I face many issues because the climate changes over the course of the year.
Here is what I have tried so far:
import pandas as pd

# seoul_not_func and seoul_p_holidays_17_18 are collections of 'YYYY-MM-DD'
# date strings defined elsewhere in the original code
def get_hour_month_mean(data, date, hour, max_id):
    return {'ID': max_id,
            'temperature': data['temperature'].mean(),
            'humidity': data['humidity'].mean(),
            'date': date,
            'hour': hour,
            'wind_speed': data['wind_speed'].mean(),
            'visibility': data['visibility'].mean(),
            'dew_temperature': data['dew_temperature'].mean(),
            'solar_radiation': data['solar_radiation'].mean(),
            'rainfall': data['rainfall'].mean(),
            'count': data['count'].mean() if str(date.date()) not in seoul_not_func else 0,
            'snowfall': data['snowfall'].mean(),
            'season': data['season'].mode()[0],
            'is_holiday': 'No Holiday' if str(date.date()) not in seoul_p_holidays_17_18 else 'Holiday',
            'functional_day': 'Yes' if str(date.date()) not in seoul_not_func else 'No',
            }

def upsample_data_with_missing_dates(data):
    data_range = pd.date_range(start="2017-12-20", end="2018-11-30", freq='D')
    missing_range = data_range.difference(data['date'])
    max_id = data['ID'].max()
    data_copy = data.copy()
    new_rows = []
    for date in missing_range:
        # Average over the first two days of the following month;
        # November 2018 has no following month, so December 2017 is used instead
        if date.month == 11:
            year, month = date.year - 1, 12
        elif date.month == 12:
            year, month = date.year + 1, 1
        else:
            year, month = date.year, date.month + 1
        for hour in range(24):
            max_id += 1
            month_mask = ((data_copy['year'] == year) &
                          (data_copy['month'] == month) &
                          (data_copy['hour'] == hour) &
                          (data_copy['day'].isin([1, 2])))
            data_filter = data_copy[month_mask]
            new_rows.append(get_hour_month_mean(data_filter, date, hour, max_id))
    # DataFrame.append was removed in pandas 2.0: collect the rows, then concatenate once
    return pd.concat([data, pd.DataFrame(new_rows)], ignore_index=True)
Any ideas on the best way to estimate the values of the missing days, given that I have the previous 20 days and the next 20 days?

There are many ways to deal with missing values in a time series.
You already tried the traditional approach, imputing the data with mean values. The drawback of this method is the bias it introduces when so many values are filled with the same mean.
You can try a genetic algorithm (GA), support vector regression (SVR), autoregressive (AR) and moving average (MA) models for time series imputation and modeling. These methods are used to forecast and/or impute time series while avoiding the bias of the traditional mean-based method.
(Keep in mind that you have a multivariate time series.)
Here are some resources you can use:
A Survey on Deep Learning Approaches
Missing values in time series in Python
Interpolation in Python to fill Missing Values
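As a simple baseline in the spirit of the interpolation resource, here is a minimal sketch of time-based linear interpolation with pandas. It assumes the data is indexed by hourly timestamps; the function name and the tiny example frame are hypothetical. Categorical columns such as 'season' cannot be interpolated, so they are forward-filled. Note that a straight line across a 10-day gap ignores the daily cycle, so treat this as a reference point for the model-based methods above rather than a final answer.

```python
import pandas as pd

def fill_gaps_by_time(data: pd.DataFrame, numeric_cols: list) -> pd.DataFrame:
    """Reindex to a complete hourly range, interpolate numeric columns in time,
    and forward-fill the remaining (categorical) columns.

    Assumes `data` has an hourly DatetimeIndex.
    """
    full_index = pd.date_range(data.index.min(), data.index.max(), freq="h")
    filled = data.reindex(full_index)  # missing hours appear as NaN rows
    filled[numeric_cols] = filled[numeric_cols].interpolate(method="time")
    other_cols = [c for c in filled.columns if c not in numeric_cols]
    if other_cols:
        filled[other_cols] = filled[other_cols].ffill()
    return filled

# Hypothetical example: the hours 00:00 and 01:00 are missing
idx = pd.to_datetime(["2018-01-20 22:00", "2018-01-20 23:00", "2018-01-21 02:00"])
weather = pd.DataFrame({"temperature": [1.0, 2.0, 5.0], "season": ["Winter"] * 3}, index=idx)

filled = fill_gaps_by_time(weather, ["temperature"])
print(filled)
```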

Pandas- locate a value based on logical statements

I am using this dataset for a project.
I am trying to find the total yield of each inverter over the 34-day duration of the dataset (basically, using the final and initial value available for each inverter). I have been able to get the list of inverters using pd.unique() (there are 22 inverters at each solar power plant).
I am having trouble querying the total_yield data for each inverter.
Here is what I have tried:
import numpy as np
import pandas as pd

def get_yields(arr: np.ndarray, df: pd.DataFrame) -> np.ndarray:
    delta = np.zeros(len(arr))
    index = 0
    for i in arr:
        # TOTAL_YIELD of inverter i at the first timestamp
        initial = df.loc[df["DATE_TIME"] == "15-05-2020 02:00"]
        initial = initial.loc[initial["INVERTER_ID"] == i]
        initial.reset_index(inplace=True, drop=True)
        initial = initial.at[0, "TOTAL_YIELD"]
        # TOTAL_YIELD of inverter i at the last timestamp
        final = df.loc[(df["DATE_TIME"] == "17-06-2020 23:45")]
        final = final.loc[final["INVERTER_ID"] == i]
        final.reset_index(inplace=True, drop=True)
        final = final.at[0, "TOTAL_YIELD"]
        delta[index] = final - initial
        index = index + 1
    return delta
Reference: arr is the array of inverters, listed below. df is the generation dataframe for each plant.
The problem is that not every inverter has a data point for each interval. This makes this function only work for the inverters at the first plant, not the second one.
My second approach was to filter by inverter first, then take the first and last data points. But I get an error: 'Series' objects are mutable, thus they cannot be hashed
Here is the code for that so far:
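def get_yields2(arr: np.ndarray, df: pd.DataFrame) -> np.ndarray:
    delta = np.zeros(len(arr))
    index = 0
    for i in arr:
        # .loc must be indexed with square brackets; calling it as df.loc(...)
        # is what raised "'Series' objects are mutable, thus they cannot be hashed"
        initial = df.loc[df["INVERTER_ID"] == i]
        index += 1
        break
    return delta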
List of inverters at plant 1 for reference (labeled as SOURCE_KEY):
['1BY6WEcLGh8j5v7' '1IF53ai7Xc0U56Y' '3PZuoBAID5Wc2HD' '7JYdWkrLSPkdwr4'
'McdE0feGgRqW7Ca' 'VHMLBKoKgIrUVDU' 'WRmjgnKYAwPKWDb' 'ZnxXDlPa8U1GXgE'
'ZoEaEvLYb1n2sOq' 'adLQvlD726eNBSB' 'bvBOhCH3iADSZry' 'iCRJl6heRkivqQ3'
'ih0vzX44oOqAx2f' 'pkci93gMrogZuBj' 'rGa61gmuvPhdLxV' 'sjndEbLyjtCKgGv'
'uHbuxQJl8lW7ozc' 'wCURE6d3bPkepu2' 'z9Y9gH1T5YWrNuG' 'zBIq5rxdHJRwDNY'
'zVJPv84UY57bAof' 'YxYtjZvoooNbGkE']
List of inverters at plant 2:
['4UPUqMRk7TRMgml' '81aHJ1q11NBPMrL' '9kRcWv60rDACzjR' 'Et9kgGMDl729KT4'
'IQ2d7wF4YD8zU1Q' 'LYwnQax7tkwH5Cb' 'LlT2YUhhzqhg5Sw' 'Mx2yZCDsyf6DPfv'
'NgDl19wMapZy17u' 'PeE6FRyGXUgsRhN' 'Qf4GUc1pJu5T6c6' 'Quc1TzYxW2pYoWX'
'V94E5Ben1TlhnDV' 'WcxssY2VbP4hApt' 'mqwcsP2rE7J0TFp' 'oZ35aAeoifZaQzV'
'oZZkBaNadn6DNKz' 'q49J1IKaHRwDQnt' 'rrq4fwE8jgrTyWY' 'vOuJvMaM2sgwLmb'
'xMbIugepa2P7lBB' 'xoJJ8DcxJEcupym']
Thank you very much.

I can't download the dataset to test this; I get a "Too Many Requests" error.
However, you should be able to do this with a groupby.
import pandas as pd
result = df.groupby('INVERTER_ID')['TOTAL_YIELD'].agg(['max','min'])
result['delta'] = result['max']-result['min']
print(result[['delta']])
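max() - min() equals the change over the period only if TOTAL_YIELD never decreases (for example, if the counter is never reset). A variant that instead uses the chronologically first and last reading of each inverter is sketched below; it assumes DATE_TIME strings in the DD-MM-YYYY HH:MM format shown in the question, and the readings are made up.

```python
import pandas as pd

# Made-up readings for two inverters; inverter "B" has no reading at the first timestamp
df = pd.DataFrame({
    "DATE_TIME": ["15-05-2020 02:00", "15-05-2020 02:15", "17-06-2020 23:45",
                  "15-05-2020 02:15", "17-06-2020 23:45"],
    "INVERTER_ID": ["A", "A", "A", "B", "B"],
    "TOTAL_YIELD": [100.0, 110.0, 900.0, 50.0, 650.0],
})

# Parse the strings so that sorting is chronological rather than alphabetical
df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], format="%d-%m-%Y %H:%M")

# After sorting, "first"/"last" are each inverter's earliest and latest readings
result = (
    df.sort_values("DATE_TIME")
      .groupby("INVERTER_ID")["TOTAL_YIELD"]
      .agg(["first", "last"])
)
result["delta"] = result["last"] - result["first"]
print(result)
```

Because each inverter uses whatever readings it actually has, a missing timestamp (as for inverter "B") no longer breaks the calculation.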

So if I'm understanding this right, what you want is the change in TOTAL_YIELD for each inverter between the start of the period (15-05-2020 02:00) and its end (17-06-2020 23:45). Try this:
import numpy as np
import pandas as pd

fmt = '%d-%m-%Y %H:%M'  # assumes the DD-MM-YYYY HH:MM strings shown in the question
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'], format=fmt)
start = pd.to_datetime('15-05-2020 02:00', format=fmt)
end = pd.to_datetime('17-06-2020 23:45', format=fmt)
# filter the info to between the two dates, but without assuming that
# each inverter's data starts and ends exactly at these dates
period_df = df.loc[(df['DATE_TIME'] >= start) & (df['DATE_TIME'] <= end)]
delta = np.zeros(len(arr))
# enumerate lets you have an index value along with iterating through the array
for i, code in enumerate(arr):
    # this inverter's rows, sorted by date
    inverter_df = period_df.loc[period_df['INVERTER_ID'] == code].sort_values(by='DATE_TIME')
    # grab TOTAL_YIELD at the first available date
    initial = inverter_df['TOTAL_YIELD'].iloc[0]
    # grab TOTAL_YIELD at the last available date
    final = inverter_df['TOTAL_YIELD'].iloc[-1]
    delta[i] = final - initial

Python pandas finding data in between time

I am using crime statistics (in a DataFrame) and I am trying to find out when most crimes occur: between 12AM-8AM, 8AM-4PM, or 4PM-12AM. I have already converted the column to datetime. The code I used is:
#first attempt
df_15['FIRST_OCCURRENCE_DATE']=pd.date_range('01/01/2015',periods=10000,freq='H')
df_15[(df_15['FIRST_OCCURRENCE_DATE'] > '2015-1-1 00:00:00') & (df_15['FIRST_OCCURRENCE_DATE'] <= '2015-12-31 08:00:00')]
#second attempt
df_15 = df_15.set_index(df_15['FIRST_OCCURRENCE_DATE'])
df_15.loc['2015-01-01 00:00:00':'2015-12-31 00:00:00']
#third attempt
date_rng = pd.date_range(start='00:00:00', end='08:00:00',freq='H')
date_rng1 = pd.DataFrame(date_rng)
date_rng1.head(30)
#fourth attempt
df_15.FIRST_OCCURRENCE_DATE.dt.hour
ts = pd.to_datetime('12/31/2015 08:00:00')
df_15.loc[df_15.FIRST_OCCURRENCE_DATE <= ts,:].head()
The results I get are time entries that go outside of 08:00:00.
P.S. All the data is from the same year.

Looks like you can just do a little arithmetic and count:
(df_15['FIRST_OCCURRENCE_DATE'].dt.hour // 8).value_counts()
There are many ways to solve this problem, but this is probably the simplest: extract the hour of day (0-23) from each timestamp and find which time slot it belongs to. Floor-dividing the hour by 8 gives 0 (12AM-8AM), 1 (8AM-4PM) or 2 (4PM-12AM); then just count the occurrences of each.
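The same arithmetic with readable slot labels, as a small sketch on made-up timestamps (the label strings are only illustrative):

```python
import pandas as pd

# Made-up occurrence timestamps
times = pd.Series(pd.to_datetime([
    "2015-03-01 01:30", "2015-03-01 07:59", "2015-03-01 08:00",
    "2015-03-02 15:45", "2015-03-02 16:00", "2015-03-03 23:10",
    "2015-03-04 00:05",
]))

labels = {0: "12AM-8AM", 1: "8AM-4PM", 2: "4PM-12AM"}
# hour // 8 maps hours 0-7 -> 0, 8-15 -> 1 and 16-23 -> 2
slots = (times.dt.hour // 8).map(labels)
counts = slots.value_counts()
print(counts)
print("Busiest slot:", counts.idxmax())
```

Boundary cases fall where you would expect: 07:59 counts toward 12AM-8AM and 08:00 toward 8AM-4PM.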

Python Data manipulation: Duplicate and Average row and column values using dates

Hi I have a dataset in the following format:
Code for replicating the data:
import pandas as pd
d1 = {'Year':
['2008','2008','2008','2008','2008','2008','2008','2008','2008','2008'],
'Month':['1','1','2','6','7','8','8','11','12','12'],
'Day':['6','22','6','18','3','10','14','6','16','24'],
'Subject_A':['','30','','','','35','','','',''],
'Subject_B':['','','','','','','','40','',''],
'Subject_C': ['','','','','','65','','50','','']}
d1 = pd.DataFrame(d1)
I entered the numbers as strings to show the blank cells.
The first three columns denote the date (Year, Month and Day) and the remaining columns represent individuals (my actual data file has about 300 such rows and about 1000 subjects; I've presented a subset of the data here).
Each value is the subject's expenditure on FMCG products.
What I would like to do is the following:
Part 1 (Beginning and end points)
a) For each individual, locate the first observation and duplicate its value for at least the previous six months. For example, Subject_C's first observation is on 10 August 2008. In that case I would want all the rows from June 10, 2008 to be equal to 65 for Subject_C (roughly 2/12/2008 is the cutoff date, so we leave the 3rd cell from the top in Subject_C's column blank).
b) Locate the last observation and repeat it for the following 3 months. For example, for Subject_A we repeat 35 twice (until 6 November 2008).
Please refer to the following diagram for the highlighted cell with the solutions.
Part II - (Rows in between)
Next I would like to do two things (I need to do the following steps separately, not all at once):
For individuals like Subject_A, locate two observations that come one after the other (30 and 35).
i) Use the average of the two observations. In this case we would have 32.5 in the four rows without caring about time.
For example:
ii) Find the total time between the two observations and its midpoint. For the first half of the period assign the first value, and for the second half assign the second value. For example, for Subject_A the total time between 01/22/2008 and 08/10/2008 is 201 days. For the first 201/2 = 100.5 days assign the value 30 to Subject_A, and for the remainder assign 35. In this case the columns for Subject_A and Subject_C will look like:
The final dataset will use (a), (b) & (i) or (a), (b) & (ii)
Final data I [using a,b and i]
Final data II [using a,b and ii]
I would appreciate any help with this. Thanks in advance. Please let me know if the steps are unclear.
Follow up question and Issues
Thanks, Juan, for the initial answer. Here's my follow-up question. Suppose that Subject_A has more than 2 observations (code for the example data below). Could this code be extended to handle more than 2 observations?
import pandas as pd
d1 = {'Year':
['2008','2008','2008','2008','2008','2008','2008','2008','2008','2008'],
'Month':['1','1','2','6','7','8','8','11','12','12'],
'Day':['6','22','6','18','3','10','14','6','16','24'],
'Subject_A':['','30','','45','','35','','','',''],
'Subject_B':['','','','','','','','40','',''],
'Subject_C': ['','','','','','65','','50','','']}
d1 = pd.DataFrame(d1)
Issues
For the current code, I found an issue for part II (ii). This is the output that I get:
This is on the right track, but the two cells above 35 do not seem to get updated. Is there something wrong on my end? Also, the same question as before: can this be extended to the case of more than 2 observations?

Here is a code solution for Subject_A; it should work for the other subjects as well:
import numpy as np
import pandas as pd

d1 = {'Year':
['2008','2008','2008','2008','2008','2008','2008','2008','2008','2008'],
'Month':['1','1','2','6','7','8','8','11','12','12'],
'Day':['6','22','6','18','3','10','14','6','16','24'],
'Subject_A':['','30','','45','','35','','','',''],
'Subject_B':['','','','','','','','40','',''],
'Subject_C': ['','','','','','65','','50','','']}
d1 = pd.DataFrame(d1)
## Create a variable named date
d1['date'] = pd.to_datetime(d1['Year']+'/'+d1['Month']+'/'+d1['Day'])
# convert to float, to calculate mean
d1['Subject_A'] = d1['Subject_A'].replace('', np.nan).astype(float)
# index of the not null rows
subja = d1['Subject_A'].notnull()
### max and min index row with notnull value
max_id_subja = d1.loc[subja,'date'].idxmax()
min_id_subja = d1.loc[subja,'date'].idxmin()
### max and min date for Sub A with notnull value
max_date_subja = d1.loc[subja,'date'].max()
min_date_subja = d1.loc[subja,'date'].min()
### value for max and min date
max_val_subja = d1.loc[max_id_subja,'Subject_A']
min_val_subja = d1.loc[min_id_subja,'Subject_A']
#### Cutoffs: calendar-month offsets (Timedelta works in fixed durations, not months)
min_cutoff = min_date_subja - pd.DateOffset(months=6)
max_cutoff = max_date_subja + pd.DateOffset(months=3)
## PART I.a
d1.loc[(d1['date']<min_date_subja) & (d1['date']>min_cutoff),'Subject_A'] = min_val_subja
## PART I.b
d1.loc[(d1['date']>max_date_subja) & (d1['date']<max_cutoff),'Subject_A'] = max_val_subja
## PART II
d1_2i = d1.copy()
d1_2ii = d1.copy()
lower_date = min_date_subja
lower_val = min_val_subja
# observations after the first one (rows are already in date order)
next_dates_index = d1_2i.loc[(d1['date']>min_date_subja) & subja].index
for N in next_dates_index:
    next_date = d1_2i.loc[N,'date']
    next_val = d1_2i.loc[N,'Subject_A']
    # PART II.i: mean of the two neighbouring observations
    d1_2i.loc[(d1['date']>lower_date) & (d1['date']<next_date),'Subject_A'] = np.mean([lower_val,next_val])
    # PART II.ii: first half of the gap gets lower_val, second half gets next_val
    mean_time_a = pd.Timedelta((next_date-lower_date).days/2, unit='d')
    d1_2ii.loc[(d1['date']>lower_date) & (d1['date']<=lower_date+mean_time_a),'Subject_A'] = lower_val
    d1_2ii.loc[(d1['date']>lower_date+mean_time_a) & (d1['date']<=next_date),'Subject_A'] = next_val
    lower_date = next_date
    lower_val = next_val
print(d1_2i)
print(d1_2ii)
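For the follow-up about more than two observations, Part II(i) can also be written without a loop. This is an alternative technique, not the answer's code: forward-fill and backward-fill give the previous and next observation for every row, and rows that are blank but lie between two observations get their mean. The function name is hypothetical, and Parts I(a), I(b) and II(ii) are not covered by this sketch.

```python
import pandas as pd

def fill_between_with_mean(values: pd.Series) -> pd.Series:
    """Part II(i) for any number of observations: each blank strictly between two
    observations gets the mean of the observation before and after it."""
    prev_obs = values.ffill()  # last observation at or before each row
    next_obs = values.bfill()  # next observation at or after each row
    between = values.isna() & prev_obs.notna() & next_obs.notna()
    filled = values.copy()
    filled[between] = (prev_obs[between] + next_obs[between]) / 2
    return filled

# Subject_A from the follow-up question; empty strings become NaN
subject_a = pd.to_numeric(
    pd.Series(['', '30', '', '45', '', '35', '', '', '', '']), errors="coerce"
)
out = fill_between_with_mean(subject_a)
print(out.tolist())
```

Each gap is handled independently, so the row between 30 and 45 gets 37.5 and the row between 45 and 35 gets 40, while blanks before the first and after the last observation are left for Parts I(a) and I(b).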

How do I avoid a loop with Python/Pandas to build an equity curve?

I am trying to build an equity curve in Python using Pandas. For those not in the know, an equity curve is a cumulative tally of investing profits/losses day by day. The code below works, but it is incredibly slow. I've tried to build an alternative using Pandas .iloc and such, but nothing is working. I'm not sure it is possible to do this outside of a loop, given how I have to reference the prior row(s).
for today in range(len(f1)):  # initiate a loop that runs the length of the "f1" dataframe
    if today == 0:  # if the index value is zero (aka first row in the dataframe) then...
        f1.loc[today,'StartAUM'] = StartAUM  # set initial assets
        f1.loc[today,'Shares'] = 0  # dummy placeholder for shares; no trading on day 1
        f1.loc[today,'PnL'] = 0  # dummy placeholder for P&L; no trading on day 1
        f1.loc[today,'EndAUM'] = StartAUM  # set ending AUM; should be beginning AUM since no trades
        continue  # and on to the second row in the dataframe
    yesterday = today - 1  # used to reference the previous row (see below)
    f1.loc[today,'StartAUM'] = f1.loc[yesterday,'EndAUM']  # today's starting assets are yesterday's ending assets
    f1.loc[today,'Shares'] = f1.loc[yesterday,'EndAUM']//f1.loc[yesterday,'Shareprice']  # today's shares to trade = yesterday's assets // yesterday's share price
    f1.loc[today,'PnL'] = f1.loc[today,'Shares']*f1.loc[today,'Outcome1']  # P&L = shares traded (see prior line) x the outcome for 1 share
    # Note: Outcome1 came from the dataframe before this loop; for the purposes here its value is irrelevant
    f1.loc[today,'EndAUM'] = f1.loc[today,'StartAUM']+f1.loc[today,'PnL']  # ending assets are starting assets + today's P&L

There is a good example here: http://www.pythonforfinance.net/category/basic-data-analysis/ and I know that there is an example in Wes McKinney's book Python for Data Analysis. You might be able to find it here: http://wesmckinney.com/blog/python-for-financial-data-analysis-with-pandas/

Have you tried using iterrows() to construct the for loop?
start_col, shares_col, pnl_col, end_col = [], [], [], []
prev_end_aum = prev_price = None
for index, row in f1.iterrows():
    if prev_end_aum is None:  # first row: no trading on day 1
        start_aum, shares, pnl = StartAUM, 0, 0  # set initial assets
    else:
        start_aum = prev_end_aum  # today's starting assets are yesterday's ending assets
        shares = prev_end_aum // prev_price  # shares to trade = yesterday's assets // yesterday's share price
        pnl = shares * row['Outcome1']  # P&L = shares traded x the outcome for 1 share
    end_aum = start_aum + pnl  # ending assets are starting assets + today's P&L
    start_col.append(start_aum)
    shares_col.append(shares)
    pnl_col.append(pnl)
    end_col.append(end_aum)
    prev_end_aum, prev_price = end_aum, row['Shareprice']
f1['StartAUM'] = start_col
f1['Shares'] = shares_col
f1['PnL'] = pnl_col
f1['EndAUM'] = end_col
iterrows() returns each row as a Series that is generally a copy rather than a view, so assigning to row does not reliably change f1. Keeping the running values in plain variables and assigning whole columns once at the end avoids that problem, as well as the per-call overhead of many individual f1.loc reads and writes, which is the likely reason the original loop is slow.
See more details about iterrows() here.

You need to vectorize the operations (don't iterate with a for loop, but compute each whole column at once):
# fill the initial values
f1['StartAUM'] = StartAUM # set initial assets
f1['Shares'] = 0 # dummy placeholder for shares; no trading on day 1
f1['PnL'] = 0 # dummy placeholder for P&L; no trading on day 1
f1['EndAUM'] = StartAUM # set ending AUM
#do the computations (vectorized)
f1['StartAUM'].iloc[1:] = f1['EndAUM'].iloc[:-1]
f1['Shares'].iloc[1:] = f1['EndAUM'].iloc[:-1] // f1['Shareprice'].iloc[:-1]
f1['PnL'] = f1['Shares'] * f1['Outcome1']
f1['EndAUM'] = f1['StartAUM'] + f1 ['PnL']
EDIT: this will not work correctly, since StartAUM, EndAUM and Shares depend on each other and cannot be computed one without the other. I didn't notice that before.

Can you try the following:
#import relevant modules
import pandas as pd
import numpy as np
from pandas_datareader import data
import matplotlib.pyplot as plt
#download price data into a DataFrame
f1 = data.DataReader('AAPL', 'yahoo', start='1/1/2017')
StartAUM = 1000000
# Assumes fractional shares (no // rounding as in the question) and full reinvestment,
# so each day's EndAUM is the previous day's EndAUM scaled by that day's price change
daily_return = f1['Adj Close'].pct_change().fillna(0)
f1['EndAUM'] = StartAUM * (1 + daily_return).cumprod()
#Set shares held to be the previous day's EndAUM divided by the previous day's closing price
f1['Shares'] = f1['EndAUM'].shift(1) / f1['Adj Close'].shift(1)
#Set the day's PnL to be the number of shares held multiplied by the change in closing price from yesterday to today's close
f1['PnL'] = f1['Shares'] * (f1['Adj Close'] - f1['Adj Close'].shift(1))
# consistent with the above: each day's EndAUM = previous day's EndAUM + that day's PnL
#Plot the equity curve
f1['EndAUM'].plot()
Does the above solve your issue?

The solution was to use the Numba package. It performs the loop task in a fraction of the time.
https://numba.pydata.org/
The DataFrame's columns can be passed to the Numba-compiled function as NumPy arrays (Numba's nopython mode works on arrays, not on DataFrames). I will try to write up a more detailed explanation with code when time permits.
Thanks to all
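The Numba code itself was not posted, so the following is only a sketch of how the question's loop might be moved into a compiled function. It assumes the 'Shareprice' and 'Outcome1' columns are passed in as float NumPy arrays; the function name and the fallback `njit` shim (used when numba is not installed, so the code still runs without the speed-up) are illustrative.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # hypothetical fallback: run as plain Python without numba
    def njit(func):
        return func

@njit
def equity_curve(start_aum, share_price, outcome1):
    """Same recurrence as the question's loop, on NumPy arrays."""
    n = share_price.shape[0]
    start = np.empty(n)
    shares = np.empty(n)
    pnl = np.empty(n)
    end = np.empty(n)
    # day 1: no trading
    start[0] = start_aum
    shares[0] = 0.0
    pnl[0] = 0.0
    end[0] = start_aum
    for t in range(1, n):
        start[t] = end[t - 1]                          # yesterday's ending assets
        shares[t] = end[t - 1] // share_price[t - 1]   # whole shares affordable yesterday
        pnl[t] = shares[t] * outcome1[t]               # shares x outcome for 1 share
        end[t] = start[t] + pnl[t]
    return start, shares, pnl, end

start, shares, pnl, end = equity_curve(
    1000.0, np.array([10.0, 20.0, 25.0]), np.array([0.0, 1.0, 2.0])
)
print(end)
```

With the question's DataFrame, the call would look like `equity_curve(float(StartAUM), f1['Shareprice'].to_numpy(dtype=float), f1['Outcome1'].to_numpy(dtype=float))`, after which the four returned arrays can be assigned back to f1 as columns.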

In case others come across this, you can definitely make an equity curve without loops.
Dummy up some data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (13, 10)
# Some data to work with
np.random.seed(1)
stock = pd.DataFrame(
np.random.randn(100).cumsum() + 10,
index=pd.date_range('1/1/2020', periods=100, freq='D'),
columns=['Close']
)
stock['ma_5'] = stock['Close'].rolling(5).mean()
stock['ma_15'] = stock['Close'].rolling(15).mean()
Holdings: simple long/short based on moving average crossover signals
longs = stock['Close'].where(stock['ma_5'] > stock['ma_15'], np.nan)
shorts = stock['Close'].where(stock['ma_5'] < stock['ma_15'], np.nan)
# Quick plot
stock.plot()
longs.plot(lw=5, c='green')
shorts.plot(lw=5, c='red')
EQUITY CURVE:
Identify which side (long/short) holds the first position (i.e. the first trade; in this case, short). For that side, keep the initial trade price, then cumulatively sum the daily changes (there would normally be more NaNs in the series if you also had exit rules for when you are out of the market), and finally forward-fill over the NaN values. The opposite side (in this case, long) is handled the same way, except that the starting price is not kept and any remaining NaNs are filled with zeros. The other important thing is to invert the short side's daily changes (i.e. price declines should count as positive P&L).
lidx = np.where(longs > 0)[0][0]  # position of the first long holding
sidx = np.where(shorts > 0)[0][0]  # position of the first short holding
startdx = min(lidx, sidx)
# For the first holding side, keep the first trade price, then calc daily changes fwd and ffill NaNs
# For the second holding side, get the cumsum of daily changes, ffill and fillna(0) (make sure short changes are inverted)
if lidx == startdx:
    lcurve = longs.diff()  # get daily changes
    lcurve.iloc[lidx] = longs.iloc[lidx]  # put back initial starting price (positional)
    lcurve = lcurve.cumsum().ffill()  # add daily changes/ffill to build curve
    scurve = -shorts.diff().cumsum().ffill().fillna(0)  # get daily changes (make declines positive changes)
else:
    scurve = -shorts.diff()  # get daily changes (make declines positive changes)
    scurve.iloc[sidx] = shorts.iloc[sidx]  # put back initial starting price (positional)
    scurve = scurve.cumsum().ffill()  # add daily changes/ffill to build curve
    lcurve = longs.diff().cumsum().ffill().fillna(0)  # get daily changes
Add the 2 long/short curves together to get the final equity curve
eq_curve = lcurve + scurve
# quick plot
stock.iloc[:, :3].plot()
longs.plot(lw=5, c='green', label='Long')
shorts.plot(lw=5, c='red', label='Short')
eq_curve.plot(lw=2, ls='dotted', c='orange', label='Equity Curve')
plt.legend()
