Python Pandas cumulative sum of a column plotting - python

I have a daily output for a cropping system data frame with multiple outputs from a crop model. I need to take a cumulative of a column (dThrTime) and add the cumulative values in the data frame as a new column. I was able to that but when I am trying to plot the newly added cumulative column against a daily time series, I get an error : ValueError: Could not interpret value GDU for parameter y
Blockquote
def read_data(file_name, run):
data_df = pd.read_csv(file_name)
data_df['Date'] = data_df['Date'].apply (lambda x: datetime.strptime(x, '%Y-%m-%d'))
data_df['Year'] = data_df['Date'].apply (lambda x: datetime.strftime(x, '%Y'))
data_df['run_name'] = run
return data_df
filenc = "up_goldsboro.csv"
all_df = read_data(filenc, run = "restricted")
#separating data for each crop year
harvest_2011 = datetime.strptime("2012-05-17",'%Y-%m-%d')
planting_2011 = datetime.strptime("2011-10-27",'%Y-%m-%d')
data_2011 = all_df.loc[all_df["Date"]<=harvest_2011]
data_2011 = data_2011.loc[data_2011["Date"]>=planting_2011]
#Taking cumulative of a column named dThrTime
all_df['GDU'] = all_df['dThrTime'].cumsum(axis = 0, skipna = True)
#Plotting error
sns.lineplot(x = "Date", y = "GDU", data=data_2011)
plt.ylabel("Clover Root Biomass in kg/ha")
plt.title("Location:Goldsboro,NC\n Termination Year: 2012\n Clover GDU")
plt.show()
enter image description here

Related

Python plotly how to change X and Y axis in button

I'm working on a graph that ilustrates computer usage each day. I want to have a button that will group dates monthly for last year and set y as AVERAGE (mean) and draw avg line.
My code:
import datetime
import numpy as np
import pandas as pd
import plotly.graph_objects as go
example_data = {"date": ["29/07/2022", "30/07/2022", "31/07/2022", "01/08/2022", "02/08/2022"],
"time_spent" : [15840, 21720, 40020, 1200, 4200]}
df = pd.DataFrame(example_data)
df["date"] = pd.to_datetime(df["date"], dayfirst=True)
df['Time spent'] = df['time_spent'].apply(lambda x:str(datetime.timedelta(seconds=x)))
df['Time spent'] = pd.to_datetime(df['Time spent'])
df = df.drop("time_spent", axis=1)
dfall = df.resample("M", on="date").mean().copy()
dfyearly = dfall.tail(12).copy()
dfweekly = df.tail(7).copy()
dfmonthly = df.tail(30).copy()
del df
dfs = {'Week':dfweekly, 'Month': dfmonthly, 'Year' : dfyearly, "All" : dfall}
for dframe in list(dfs.values()):
dframe['StfTime'] = dframe['Time spent'].apply(lambda x: x.strftime("%H:%M"))
frames = len(dfs) # number of dataframes organized in dict
columns = len(dfs['Week'].columns) - 1 # number of columns i df, minus 1 for Date
scenarios = [list(s) for s in [e==1 for e in np.eye(frames)]]
visibility = [list(np.repeat(e, columns)) for e in scenarios]
lowest_value = datetime.datetime.combine(datetime.date.today(), datetime.datetime.min.time())
highest_value = dfweekly["Time spent"].max().ceil("H")
buttons = []
fig = go.Figure()
for i, (period, df) in enumerate(dfs.items()):
print(i)
for column in df.columns[1:]:
fig.add_bar(
name = column,
x = df['date'],
y = df[column],
customdata=df[['StfTime']],
text=df['StfTime'],
visible=True if period=='Week' else False # 'Week' values are shown from the start
)
#Change display data to more friendly format
fig.update_traces(textfont=dict(size=20), hovertemplate='<b>Time ON</b>: %{customdata[0]}</br>')
#Change range for better scalling
this_value =df["Time spent"].max().ceil("H")
if highest_value <= this_value:
highest_value = this_value
fig.update_yaxes(range=[lowest_value, highest_value])
#Add average value indicator
average_value = df["Time spent"].mean()
fig.add_hline(y=average_value, line_width=3, line_dash="dash",
line_color="green")
# one button per dataframe to trigger the visibility
# of all columns / traces for each dataframe
button = dict(label=period,
method = 'restyle',
args = ['visible',visibility[i]])
buttons.append(button)
fig.update_yaxes(dtick=60*60*1000, tickformat='%H:%M')
fig.update_xaxes(type='date', dtick='D1')
fig.update_layout(updatemenus=[dict(type="dropdown",
direction="down",
buttons = buttons)])
fig.show()
EDIT 1.
Thanks to vestland I managed to get semi-working dropdown.
The problem is that the line added with add_hline affect all bar charts. I want it to display only on the chart that it had been added for. Also after passing in custom data for nicer display, the space between bars is doubled. Any way to fix above issues?

Plot an infinite line between two pandas series points

I want to plot an infinite non ending line between two points that are in the form of a pandas series. I am able to successfully plot a standard line between the points, however I don't want the line to "end" and instead it should continue. Expanding on this I would also like to extract the values of this new infinite line to a new dataframe so that I can see what corresponding line value a given x value in has.
data = yf.download("AAPL", start="2021-01-01", interval = "1d").drop(columns=['Adj Close'])
data = data[30:].rename(columns={"Open": "open", "High": "high", "Low": "low", "Close": "close", "Volume": "volume"})
local_max = argrelextrema(data['high'].values, np.greater)[0]
local_min = argrelextrema(data['low'].values, np.less)[0]
highs = data.iloc[local_max,:]
lows = data.iloc[local_min,:]
highesttwo = highs["high"].nlargest(2)
lowesttwo = lows["low"].nsmallest(2)
fig = plt.figure(figsize=[10,7])
data['high'].plot(marker='o', markevery=local_max)
data['low'].plot(marker='o', markevery=local_min)
highesttwo.plot()
lowesttwo.plot()
plt.show()
Currently my plot looks like this:
How ever I want it to look like this as well as be able to get the values of the line for the corresponding x value.
This can be done in a few steps as shown in the following example where the lines are computed with element-wise operations (i.e. vectorized) using the slope-intercept form of the line equation.
The stock data has a frequency based on the opening dates of the stock exchange. This frequency is not automatically recognized by pandas, therefore the .plot method produces a plot with a continuous date for the x-axis and includes the days with no data. This can be avoided by setting the argument use_index=False so that the x-axis uses integers starting from zero instead.
The challenge is to then create nicely formatted tick labels. The following example attempts to imitate the pandas tick format by using list comprehensions to select the tick locations and format the labels. These will need to be adjusted if the date range is significantly lengthened or shortened.
import numpy as np # v 1.19.2
import pandas as pd # v 1.2.3
import matplotlib.pyplot as plt # v 3.3.4
from scipy.signal import argrelextrema # v 1.6.1
import yfinance as yf # v 0.1.54
# Import data
data = (yf.download('AAPL', start='2021-01-04', end='2021-03-15', interval='1d')
.drop(columns=['Adj Close']))
data = data.rename(columns={'Open': 'open', 'High': 'high', 'Low': 'low',
'Close': 'close', 'Volume': 'volume'})
# Extract points and get appropriate x values for the points by using
# reset_index for highs/lows
local_max = argrelextrema(data['high'].values, np.greater)[0]
local_min = argrelextrema(data['low'].values, np.less)[0]
highs = data.reset_index().iloc[local_max, :]
lows = data.reset_index().iloc[local_min, :]
htwo = highs['high'].nlargest(2).sort_index()
ltwo = lows['low'].nsmallest(2).sort_index()
# Compute slope and y-intercept for each line
slope_high, intercept_high = np.polyfit(htwo.index, htwo, 1)
slope_low, intercept_low = np.polyfit(ltwo.index, ltwo, 1)
# Create dataframe for each line by using reindexed htwo and ltwo so that the
# index extends to the end of the dataset and serves as the x variable then
# compute y values
# High
line_high = htwo.reindex(range(htwo.index[0], len(data))).reset_index()
line_high.columns = ['x', 'y']
line_high['y'] = slope_high*line_high['x'] + intercept_high
# Low
line_low = ltwo.reindex(range(ltwo.index[0], len(data))).reset_index()
line_low.columns = ['x', 'y']
line_low['y'] = slope_low*line_low['x'] + intercept_low
# Plot data using pandas plotting function and add lines with matplotlib function
fig = plt.figure(figsize=[10,6])
ax = data['high'].plot(marker='o', markevery=local_max, use_index=False)
data['low'].plot(marker='o', markevery=local_min, use_index=False)
ax.plot(line_high['x'], line_high['y'])
ax.plot(line_low['x'], line_low['y'])
ax.set_xlim(0, len(data)-1)
# Set major and minor tick locations
tks_maj = [idx for idx, timestamp in enumerate(data.index)
if (timestamp.month != data.index[idx-1].month) | (idx == 0)]
tks_min = range(len(data))
ax.set_xticks(tks_maj)
ax.set_xticks(tks_min, minor=True)
# Format major and minor tick labels
labels_maj = [ts.strftime('\n%b\n%Y') if (data.index[tks_maj[idx]].year
!= data.index[tks_maj[idx-1]].year) | (idx == 0)
else ts.strftime('\n%b') for idx, ts in enumerate(data.index[tks_maj])]
labels_min = [ts.strftime('%d') if (idx+3)%5 == 0 else ''
for idx, ts in enumerate(data.index[tks_min])]
ax.set_xticklabels(labels_maj)
ax.set_xticklabels(labels_min, minor=True)
plt.show()
You can find more examples of tick formatting here and here in Solution 1.
Date string format codes

Fetching date of high and low prices for week based on daily high low prices

First of all I will share objective of running python code.
Getting Daily High and Low Prices for a stock from Yahoo.
Converting the daily high and lows to Weekly High/Lows, monthly High Lows, Yearly High Lows.
Getting exact dates of Weekly or Monthly High Lows from a daily dataframe
Finally after fetching Dates for Weekly(or Monthly)High & lows, I want to arrange the data of what occured first High or Low during the week. for eg. during week ending 12th December, 2020, I get High of the week is 100 and low of week is 97(after completing step 2) and also High date and low date from daily dataframe (from step 3), I want to arrange Prices in order of occurence. so if High happened on 9th December and Low happened on 12th December. The prices will be arranged as 100 in row 1 and then 97 in row 2 and this process repeats for entire data frame.
What I have been able to achieve.
I have completed step 1 and step 2. Struggling in step for 3 as of now.
Have accomplished Step 1 by
import pandas as pd
import yfinance as yf
Ticker = '^NSEI'
f = yf.download(Ticker,period="max")
f = f.drop(['Adj Close'], axis=1)
f = f.drop(['Open'], axis=1)
f = f.drop(['Close'], axis=1)
f = f.drop(['Volume'], axis=1)
f.reset_index(inplace=True)
f.insert(0,'Ticker',Ticker)
Step 2 by
fw = f.groupby(['Ticker', pd.Grouper(key='Date', freq='W')])\
.agg(High=pd.NamedAgg(column='High', aggfunc='max'),
Low=pd.NamedAgg(column='Low', aggfunc='min'))\
.reset_index()
fm = f.groupby(['Ticker', pd.Grouper(key='Date', freq='M')])\
.agg(High=pd.NamedAgg(column='High', aggfunc='max'),
Low=pd.NamedAgg(column='Low', aggfunc='min'))\
.reset_index()
fq = f.groupby(['Ticker', pd.Grouper(key='Date', freq='Q')])\
.agg(High=pd.NamedAgg(column='High', aggfunc='max'),
Low=pd.NamedAgg(column='Low', aggfunc='min'))\
.reset_index()
fy = f.groupby(['Ticker', pd.Grouper(key='Date', freq='Y')])\
.agg(High=pd.NamedAgg(column='High', aggfunc='max'),
Low=pd.NamedAgg(column='Low', aggfunc='min'))\
.reset_index()
Struggling with step 3. used pd.merge, pd.join, pd.concat but unable to combine Weekly dataframe with dataframe on Highs and lows. The no of weekly records increase by performing merge and drop duplcates also didn't work properly when specified keep last.
So if you all can help me in step 3 and 4 would be grateful. Thanks
Solved the query which i posted above. Hope this help others. Thanks
import pandas as pd
import yfinance as yf
import datetime as dt
import numpy as np
Ticker = '^NSEI'
df = yf.download(Ticker, period='max')
df= df.drop(['Open', 'Close', 'Adj Close', 'Volume'], axis = 1).reset_index()
# Daily 3238 columns for reference
#Adding columns for weekly, monthly,6 month,Yearly,
df['WkEnd'] = df.Date.dt.to_period('W').apply(lambda r: r.start_time) + dt.timedelta(days=6)
df['MEnd'] = (df.Date.dt.to_period('M').apply(lambda r: r.end_time)).dt.date
df['6Mend'] = np.where(df.Date.dt.month <= 6,(df.Date.dt.year).astype(str)+'-1H',(df['Date'].dt.year).astype(str)+'-2H')
df['YEnd'] = (df.Date.dt.to_period('Y').apply(lambda r: r.end_time)).dt.date
# key variable for melting
d = {'Date':['Hidate', 'Lodate'], 'Price':['High','Low']}
#creating weekly neoformat
dw = df.groupby(['WkEnd']).agg({'High' : 'max','Low' : 'min' }).reset_index()
dw['Hidate'] = dw[['WkEnd','High']].merge(df,how = 'left').Date
dw['Lodate'] = dw[['WkEnd','Low']].merge(df,how = 'left').Date
dw = pd.lreshape(dw,d)
dw = dw.sort_values(by = ['Date']).reset_index()
dw = dw.drop(['index'], axis = 1)
#creating Monthly neoformat
dm = df.groupby(['MEnd']).agg({'High' : 'max','Low' : 'min' }).reset_index()
dm['Hidate'] = dm[['MEnd','High']].merge(df,how = 'left').Date
dm['Lodate'] = dm[['MEnd','Low']].merge(df,how = 'left').Date
dm = pd.lreshape(dm,d)
dm = dm.sort_values(by = ['Date']).reset_index()
dm = dm.drop(['index'], axis = 1)
#creating 6mth neoformat
d6m = df.groupby(['6Mend']).agg({'High' : 'max','Low' : 'min' }).reset_index()
d6m['Hidate'] = d6m[['6Mend','High']].merge(df,how = 'left').Date
d6m['Lodate'] = d6m[['6Mend','Low']].merge(df,how = 'left').Date
d6m = pd.lreshape(d6m,d)
d6m = d6m.sort_values(by = ['Date']).reset_index()
d6m = d6m.drop(['index'], axis = 1)
#creating Yearly neoformat
dy = df.groupby(['YEnd']).agg({'High' : 'max','Low' : 'min' }).reset_index()
dy['Hidate'] = dy[['YEnd','High']].merge(df,how = 'left').Date
dy['Lodate'] = dy[['YEnd','Low']].merge(df,how = 'left').Date
dy = pd.lreshape(dy,d)
dy = dy.sort_values(by = ['Date']).reset_index()
dy = dy.drop(['index'], axis = 1)

Use a pandas DataFrame created inside a function outside of the function

I am a Python beginner and wrote a function for a simple moving average strategy. I created a portfolio DataFrame inside the function and now I want to use this DataFrame outside of the function for plotting some graphs. My solution is: return portfolio - but this does not work. Can anybody help me?
This is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Import a data source - FSE-Data with Index 'Date'
all_close_prices = pd.read_csv('FSE_daily_close.csv')
all_close_prices = all_close_prices.set_index('Date')
# Fill NaN Values with the last available stock price - except for Zalando
all_close_prices = all_close_prices.fillna(method='ffill')
# Import ticker symbols
ticker_list = list(all_close_prices)
# Zalando 'FSE/ZO1_X' (position row 99) - doesn't begin in 2004
# Drop Zalando
all_close_prices.drop('FSE/ZO1_X', axis=1)
# Also from the ticker list
ticker_list.remove('FSE/ZO1_X')
# Create an empty signal dataframe with datetime index equivalent to the stocks
signals = pd.DataFrame(index=all_close_prices.index)
def ma_strategy(ticker, long_window, short_window):
# Calculate the moving avergaes
moving_avg_long = all_close_prices.rolling(window=long_window, min_periods=1).mean()
moving_avg_short = all_close_prices.rolling(window=short_window, min_periods=1).mean()
moving_avg_short = moving_avg_short
moving_avg_long = moving_avg_long
# Add the two MAs for the stocks in the ticker_list to the signals dataframe
for i in ticker_list:
signals['moving_avg_short_' + i] = moving_avg_short[i]
signals['moving_avg_long_' + i] = moving_avg_long[i]
# Set up the signals
for i in ticker_list:
signals['signal_' + i] = np.where(signals['moving_avg_short_' + i] > signals['moving_avg_long_' + i], 1, 0)
signals['positions_' + i] = signals['signal_' + i].diff(periods=1)
#Backtest
initial_capital = float(100000)
# Create a DataFrame `positions` with index of signals
positions = pd.DataFrame(index=all_close_prices)
# Create a new column in the positions DataFrame
# On the days that the signal is 1 (short moving average crosses the long moving average, you’ll buy a 100 shares.
# The days on which the signal is 0, the final result will be 0 as a result of the operation 100*signals['signal']
positions = 100 * signals[['signal_' + ticker]]
# Store the portfolio value owned with the stock
# DataFrame.multiply(other, axis='columns', fill_value=None) - Multiplication of dataframe and other, element-wise
# Store the difference in shares owned - same like position column in signals
pos_diff = positions.diff()
# Add `holdings` to portfolio
portfolio = pd.DataFrame(index=all_close_prices.index)
portfolio['holdings'] = (positions.multiply(all_close_prices[ticker], axis=0)).sum(axis=1)
# Add `cash` to portfolio
portfolio['cash'] = initial_capital - (pos_diff.multiply(all_close_prices[ticker], axis=0)).sum(
axis=1).cumsum()
# Add `total` to portfolio
portfolio['total'] = portfolio['cash'] + portfolio['holdings']
# Add `returns` to portfolio
portfolio['return'] = portfolio['total'].pct_change()
portfolio['return_cum'] = portfolio['total'].pct_change().cumsum()
return portfolio
ma_strategy('FSE/VOW3_X',20,5)
# Visualize the total value of the portfolio
portfolio_value = plt.figure(figsize=(12, 8))
ax1 = portfolio_value.add_subplot(1, 1, 1, ylabel='Portfolio value in $')
# Plot the equity curve in dollars
portfolio['total'].plot(ax=ax1, lw=2.)
You need to assign your function return value to a variable. The line which says
ma_strategy('FSE/VOW3_X',20,5)
probably needs to change to
portfolio = ma_strategy('FSE/VOW3_X',20,5)

Appending function created column to an existing data frame

I currently have a dataframe as below:
and wish to add a column, E, that is calculated based on the following function.
def geometric_brownian_motion(T = 1, N = 100, mu = 0.1, sigma = 0.01, S0 = 20):
dt = float(T)/N
t = np.linspace(0, T, N)
W = np.random.standard_normal(size = N)
W = np.cumsum(W)*np.sqrt(dt) ### standard brownian motion ###
X = (mu-0.5*sigma**2)*t + sigma*W
S = S0*np.exp(X) ### geometric brownian motion ###
return S
(originating from here)
How to i create a time-series for all of the dates contained within the data-frame and append it?
The function input parameters are as follows:
T = (#days between df row 1 and df last)/365
N = # rows in data frame
S0 = 100
As i understand the essense of question is how to apply some method to every column, taking into account, the fact that to calculate a new value you need an index from dataframe:
I suggest you to extract index as separate column and use apply as usually.
from functools import partial
df['index'] = df.index
T = # precalculate T here
N = df.shape[0]
applying_method = partial(geometric_brownian_motion,T=T,N=N, S0=100)
df['E'] = df.apply(lambda row: applying_method(*row),axis=1)
Or if you rename columns of dataframe accroding to you function arguments:
df['E'] = df.apply(lambda row: applying_method(**row),axis=1)
Hope that helps.

Categories

Resources