Plot horizontal lines between date ranges iterating through pandas dataframe - python

I have two data frames: one with weekly data (df) and a second one (df2) with the daily values of the stock/crypto that I plot. On df, I created a column 'pivot' ((high+low+close)/3) from the weekly data, which holds the weekly pivot values.
Now I want to plot these weekly values as horizontal lines on top of the daily data in df2. For each line, x1 would be the start of the week and x2 the end of the week, with the y value being that week's pivot from the weekly df.
Here is what I would want it to look like:
My Approach & Problem:
First of all, I am a beginner in Python, this is my second month of learning. My apologies if this was asked before.
I know the pivot values can be calculated using a single data frame & pandas group-by but I want to take the issue after this is done, so both ways should be fine if you are approaching this issue. What I would like to have is those final lines with OHLC candlesticks. I would like to plot these results using Plotly OHLC and go Shapes. What I am stuck with is iterating through the pivot weekly data frame and adding the lines as traces on top of the OHLC data daily data.
Here's my code so far:
import yfinance as yf
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime, timedelta
df = yf.download( tickers = 'BTC-USD',
start = '2021-08-30',
end = datetime.today().strftime('%Y-%m-%d'),
interval = '1wk',
group_by = 'ticker',
auto_adjust = True).reset_index()
#daily df for plot
df2 = yf.download( tickers = 'BTC-USD',
start = '2021-08-30',
end = datetime.today().strftime('%Y-%m-%d'),
interval = '1d',
group_by = 'ticker',
auto_adjust = True).reset_index()
#small cap everything
df = df.rename(columns={'Date':'date',
'Open': 'open',
'High': 'high',
'Low' : 'low',
'Close' : 'close'})
df['pivot'] = (df['high']+ df['low'] + df['close'])/3
result = df.copy()
fig = go.Figure(data = [go.Candlestick(x= df['date'],
open = df['open'],
high = df['high'],
low = df['low'],
close = df['close'],
name = 'Price Candle')])
This covers plotting the OHLC candlesticks; iterating over the pivot values to add the lines is what is troubling me. The lines can go on a line chart or on the OHLC chart.
fig = px.line(df, x='time', y='close')
result = df.copy()
for i, pivot in result.iterrows():
    fig.add_shape(type="line",
        x0=pivot.date, y0=pivot, x1=pivot.date, y1=pivot,
        line=dict(
            color="green",
            width=3)
    )
fig
When I display this figure, the pivot lines do not appear the way I want them to. Only the original price line shows.
Thanks in advance for taking the time to read this so far.

There are two ways to draw a line segment: add a shape, or add a scatter trace in line mode. I prefer the scatter trace because it allows more detailed settings. Loop over the weekly data frame row by row and use the index to look up the next row's date, so that each segment runs from one week's date to the next; the y value at both ends is that row's pivot. I wanted more horizontal room for the chart, so I moved the legend above the plot. Also, because each segment is its own scatter trace, every one of them would get a legend entry for the pivot values, so the legend is shown for the first segment only.
import yfinance as yf
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime, timedelta
df = yf.download( tickers = 'BTC-USD',
start = '2021-08-30',
end = datetime.today().strftime('%Y-%m-%d'),
interval = '1wk',
group_by = 'ticker',
auto_adjust = True).reset_index()
#daily df for plot
df2 = yf.download( tickers = 'BTC-USD',
start = '2021-08-30',
end = datetime.today().strftime('%Y-%m-%d'),
interval = '1d',
group_by = 'ticker',
auto_adjust = True).reset_index()
#small cap everything
df = df.rename(columns={'Date':'date',
'Open': 'open',
'High': 'high',
'Low' : 'low',
'Close' : 'close'})
df['pivot'] = (df['high']+ df['low'] + df['close'])/3
fig = go.Figure()
fig.add_trace(go.Candlestick(x= df['date'],
open = df['open'],
high = df['high'],
low = df['low'],
close = df['close'],
name = 'Price Candle',
legendgroup='one'
)
)
#fig.add_trace(go.Scatter(mode='lines', x=df['date'], y=df['pivot'], line=dict(color='green'), name='pivot'))
for idx, row in df.iterrows():
    #print(idx)
    if idx == len(df)-2:
        break
    fig.add_trace(go.Scatter(mode='lines',
                             x=[row['date'], df.loc[idx+1,'date']],
                             y=[row['pivot'], row['pivot']],
                             line=dict(color='blue', width=1),
                             name='pivot',
                             showlegend=True if idx == 0 else False,
                             )
                  )
fig.update_layout(
autosize=False,
height=600,
width=1100,
legend=dict(
orientation="h",
yanchor="bottom",
y=1.02,
xanchor="right",
x=1)
)
fig.update_xaxes(rangeslider_visible=False)
fig.show()

Related

Separate heatmap ranges for each row in Plotly

I'm trying to build a time-series heatmap over the 24 hours of each day of the week, and I want each day to be colored relative to its own values only. Here's what I've done in Plotly so far.
The problem is that the "highest" color is reached only by the maximum in the 2nd row, because the whole heatmap shares one color scale. My desired output, made in Excel, is this one:
Each row clearly reaches its own green, since each row has its own conditional formatting.
My code:
import plotly.express as px
import pandas as pd
df = pd.read_csv('test0.csv', header=None)
fig = px.imshow(df, color_continuous_scale=['red', 'green'])
fig.update_coloraxes(showscale=False)
fig.show()
The csv file:
0,0,1,2,0,5,2,3,3,5,8,4,7,9,9,0,4,5,2,0,7,6,5,7
1,3,4,9,4,3,3,2,12,15,6,9,1,4,3,1,1,2,5,3,4,2,5,8
9,6,7,1,3,4,5,6,9,8,7,8,6,6,5,4,5,3,3,6,4,8,9,10
8,7,8,6,7,5,4,6,6,7,8,5,5,6,5,7,5,6,7,5,8,6,4,4
3,4,2,1,1,2,2,1,2,1,1,1,1,3,4,4,2,2,1,1,1,2,4,3
3,5,4,4,4,6,5,5,5,4,3,7,7,8,7,6,7,6,6,3,4,3,3,3
5,4,4,5,4,3,1,1,1,1,2,2,3,2,1,1,4,3,4,5,4,4,3,4
I've solved it! I had to make the heatmaps by row and combine them.
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import pandas as pd
import calendar
df = pd.read_csv('test0.csv', header=None)
# initialize subplots with vertical_spacing as 0 so the rows are right next to each other
fig = make_subplots(rows=7, cols=1, vertical_spacing=0)
# shift sunday to first position
days = list(calendar.day_name)
days = days[-1:] + days[:-1]
for index, row in df.iterrows():
    row_list = row.tolist()
    sub_fig = go.Heatmap(
        x=list(range(0, 24)), # hours
        y=[days[index]], # days of the week
        z=[row_list], # data
        colorscale=[
            [0, '#FF0000'],
            [1, '#00FF00']
        ],
        showscale=False
    )
    # insert heatmap into its own subplot row (add_trace replaces the deprecated append_trace)
    fig.add_trace(sub_fig, row=index + 1, col=1)
fig.show()
Output:

How to change the x-axis and y-axis labels in plotly?

How can I change the x- and y-axis labels in Plotly? In matplotlib I can simply use plt.xlabel, but I can't find the equivalent in Plotly.
By using this code in a dataframe:
Date = df[df.Country=="India"].Date
New_cases = df[df.Country=="India"]['7day_rolling_avg']
px.line(df,x=Date, y=New_cases, title="India Daily New Covid Cases")
I get this output:
Here the x and y axes are labeled "x" and "y". How can I change the axis names to "Date" and "Cases"?
This is a simple case of setting the axis titles with update_layout:
update_layout(
xaxis_title="Date", yaxis_title="7 day avg"
)
Full code as an MWE:
import pandas as pd
import plotly.express as px
import io, requests
df = pd.read_csv(
io.StringIO(
requests.get(
"https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv"
).text
)
)
df["Date"] = pd.to_datetime(df["date"])
df["Country"] = df["location"]
df["7day_rolling_avg"] = df["daily_people_vaccinated_per_hundred"]
Date = df[df.Country == "India"].Date
New_cases = df[df.Country == "India"]["7day_rolling_avg"]
px.line(df, x=Date, y=New_cases, title="India Daily New Covid Cases").update_layout(
xaxis_title="Date", yaxis_title="7 day avg"
)

How to create an animated line plot with Plotly Express?

I'm trying to create an animated line plot to illustrate the price increase of 3 different asset classes over time (year), but it doesn't work and I don't know why!
What I've done so far:
get closing price data for each asset
start = datetime.datetime(2010, 7, 1)
end = datetime.datetime(2021, 7, 1)
data = pdr.get_data_yahoo(['BTC-USD', 'GC=F', '^GSPC'], start, end)['Adj Close']
transpose columns into rows to avoid a lot of calculations
data['Date'] = data.index
data['Year'] = data.index.year
dataNew =data.melt(['Date', 'Year'], var_name='Asset')
dataNew = dataNew.rename(columns = {'value': 'Price'})
plot
fig = px.line(dataNew,
x = 'Date',
y = 'Price',
range_y=[0,50000],
color = 'Asset',
animation_frame = 'Year')
st.write(fig)
Output:
Short answer:
This is in fact possible. The only addition you need to make to a standard px.line() time-series setup with a date axis is to restructure the data like this:
# input data
dfi = px.data.stocks().head(50)
# new datastructure for animation
# (start and obs are defined in the complete code below)
df = pd.DataFrame() # container for df with new datastructure
for i in np.arange(start,obs):
    dfa = dfi.head(i).copy()
    dfa['ix']=i
    df = pd.concat([df, dfa])
The cool details:
Contrary to what seems to be the most common belief, and contrary to my own comments just a few minutes ago, this is in fact possible to do with px.line and a set of time series like the ones you're describing, as long as you massage the dataset just a little bit. The only real drawback seems to be that it might not work very well for larger datasets, since every animation frame holds its own copy of all rows up to that point, so the figure structure can become huge. But let's get back to the boring details after the cool stuff. The snippet below and the dataset px.data.stocks() will produce the following figure:
Plot 1 - Animation using the play button:
When the animation has come to an end, you can also subset the lines however you'd like.
Plot 2 - Animation using the slider:
The boring details:
I'll get back to this if the OP or anyone else is interested
Complete code:
import pandas as pd
import numpy as np
import plotly.express as px
# input data
dfi = px.data.stocks().head(50)
dfi['date'] = pd.to_datetime(dfi['date'])
start = 12
obs = len(dfi)
# new datastructure for animation
df = pd.DataFrame() # container for df with new datastructure
for i in np.arange(start,obs):
    dfa = dfi.head(i).copy()
    dfa['ix']=i
    df = pd.concat([df, dfa])
# plotly figure
fig = px.line(df, x = 'date', y = ['GOOG', 'AAPL', 'AMZN', 'FB', 'NFLX', 'MSFT'],
animation_frame='ix',
# template = 'plotly_dark',
width=1000, height=600)
# attribute adjustments
fig.layout.updatemenus[0].buttons[0]['args'][1]['frame']['redraw'] = True
fig.show()

Plotly: How to change the time resolution along the x-axis?

I want to plot some time series data in plotly where the historic portion of the data has a daily resolution and the data for the current day has minute resolution. Is there a way to somehow "split" the x axis so that for the historic data it only shows the date and for the current data it shows time as well?
Currently it looks like this which is not really that readable
I think the only viable approach is to put two subplots side by side. With the right setup, the subplots should get you very close to what you're describing. You'll only need to adjust a few details, such as:
fig = make_subplots(rows=1, cols=2,
horizontal_spacing = 0,
shared_yaxes=True,
shared_xaxes=True)
Complete code:
# imports
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from plotly.subplots import make_subplots
import plotly.graph_objects as go
# custom function to set the first
# minute dataset to continue from
# the last day in the day dataset
def next_day(date):
    s = date
    date = datetime.strptime(s, "%Y-%m-%d")
    next_date = date + timedelta(days=1)
    return(datetime.strftime(next_date, "%Y-%m-%d"))
# data
np.random.seed(10)
n_days = 5
n_minutes = (2*24)
# strftime is used instead of DatetimeIndex.format(), which newer pandas versions deprecate
dfd = pd.DataFrame({'time': pd.date_range('2020', freq='D', periods=n_days).strftime('%Y-%m-%d'),
                    'y':np.random.uniform(low=-1, high=1, size=n_days).tolist()})
dfm = pd.DataFrame({'time': pd.date_range(next_day(dfd['time'].iloc[-1]), freq='min', periods=n_minutes).strftime('%Y-%m-%d %H:%M:%S'),
                    'y':np.random.uniform(low=-1, high=1, size=n_minutes).tolist()})
dfm['y'] = dfm['y'].cumsum()
# subplot setup
fig = make_subplots(rows=1, cols=2,
horizontal_spacing = 0,
shared_yaxes=True,
shared_xaxes=True)
# trace for days
fig.add_trace(
go.Scatter(x=dfd['time'], y=dfd['y'], name = 'days'),
row=1, col=1
)
# trace for minutes
fig.add_trace(
go.Scatter(x=dfm['time'], y=dfm['y'], name = 'minutes'),
row=1, col=2
)
# some x-axis aesthetics
fig.update_layout(xaxis1 = dict(tickangle=0))
fig.update_layout(xaxis2 = dict(tickangle=90))
fig.add_shape( dict(type="line",
x0=dfd['time'].iloc[-1],
y0=dfd['y'].iloc[-1],
x1=dfm['time'].iloc[0],
y1=dfm['y'].iloc[0],
xanchor = 'middle',
xref = 'x1',
yref = 'y1',
line=dict(dash = 'dash',
color="rgba(0,0,255,0.9)",
width=1
)))
fig.update_xaxes(showgrid=False)
fig.update_layout(template = 'plotly_dark')
fig.show()

Sorting and conditional color formatting in matplotlib

To skip the context and get straight to the question, go down to "desired changes".
I wrote the helper function below to fetch data, calculate the YTD return, and plot the results in a bar plot.
Here is the function:
def ytd_perf(symb, col_names, source = 'yahoo'):
    import datetime as datetime
    from datetime import date
    import pandas as pd
    import pandas_datareader.data as web
    import matplotlib.pyplot as plt
    import seaborn as sns
    # IPython/Jupyter magic; only valid in a notebook
    %pylab inline
    #establish start and end dates
    start = date(date.today().year, 1, 1)
    end = datetime.date.today()
    #fetch data
    df = web.DataReader(symb, source, start = start, end = end)['Adj Close']
    #make sure column orders don't change (reindex replaces the removed reindex_axis)
    df = df.reindex(columns = symb)
    #rename the columns
    df.columns = col_names
    #calc returns from the first element (.iloc replaces the removed .ix)
    df = (df / df.iloc[0]) - 1
    #Plot the most recent line of data -- this represents the YTD return
    ax = df.iloc[-1].plot(kind = 'bar', title = ('YTD Performance as of '+ str(end)),figsize=(12,9))
    vals = ax.get_yticks()
    ax.set_yticklabels(['{:3.1f}%'.format(x*100) for x in vals])
So, when I run:
tickers = ['SPY', 'TLT']
names = ['Stocks', 'Bonds']
ytd_perf(tickers, names)
I get the following output:
Two desired changes that I can't quite get to work:
1. I would like to change the color of a bar so that it is red if its value is < 0.
2. Sort the bars from highest to lowest (which happens to be the case in this chart because there are only two series, but doesn't work with many series).
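
One possible way to get both changes is sketched below with made-up returns instead of the downloaded data. The series names and values are placeholders, and green for non-negative bars is an assumption, since the question only specifies red for negative values. The idea is to sort the final row of returns first and then build one color per bar from the sorted values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import pandas as pd

# Made-up YTD returns for hypothetical series (stand-in for df.iloc[-1])
ytd = pd.Series({"Stocks": 0.082, "Bonds": -0.031, "Gold": 0.015, "Oil": -0.120})

# Sort from highest to lowest, then pick a color per bar
ytd_sorted = ytd.sort_values(ascending=False)
colors = ["red" if value < 0 else "green" for value in ytd_sorted]

fig, ax = plt.subplots(figsize=(12, 9))
ax.bar(ytd_sorted.index, ytd_sorted.values, color=colors)
ax.set_title("YTD Performance (made-up data)")
# Format the y axis as percentages; the values are fractions, so xmax=1.0
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0, decimals=1))
```

Inside the helper function, the same two lines (sort_values and the color list) could be applied to the last row of returns before plotting.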
