Does anyone know how to add an entry signal to my graph so that I can see it visually? I want an entry signal to appear on the graph when the SPY price hits $411.50.
I've read through everything I can find online regarding this topic including the mplfinance documentation and it seems like there's a way to do it by using addplot() scatter plot but I can't quite figure it out.
Here's what my code looks like so far:
# Libraries:
import pandas as pd
import mplfinance as mpf
# Pulling data:
data = pd.read_csv(r'C:\Users\Viktor\Documents\D2DT\Reformatted_Data\SPY_2021-04-12-2021-04-12.csv')
# Pre-processing data for mpf:
data['Date'] = pd.to_datetime(data['Date'])
data = data.set_index('Date')
# Plotting Data:
mpf.plot(data, type='candle', volume=True, style='yahoo', title="SP500 Data")
# Note: no data.close() call is needed; a DataFrame is not a file handle.
Thank You for your help with this!
@r-beginners, I messed around with the documentation in the link you provided in the last comment and got it to work. Thank you. I'm posting the code below; hopefully this is helpful to somebody in the future!
import pandas as pd
import numpy as np
import mplfinance as mpf
day = '2021-04-12'
data = pd.read_csv(r'C:\Users\Viktor\Documents\D2DT\Reformatted_Data\SPY_' + str(day) + "-" + str(day) + '.csv')
data['Date'] = pd.to_datetime(data['Date'])
data = data.set_index('Date')
signal = []
for i in data['close']:
    if i > 411.50:
        signal.append(i)
    else:
        signal.append(np.nan)
apd = mpf.make_addplot(signal, type='scatter', markersize=200, marker='v', color='r')
mpf.plot(data.loc[day], type='candle', volume=True, style='yahoo', title="SP500 Data", addplot=apd)
This single line replaces the for loop and is more efficient. It adds the required column (called signal here) to the dataframe for plotting; rows that do not meet the condition are left as NaN, so no marker is drawn for them.
target = 411.50
data.loc[data['close'] > target, 'signal'] = data['close']
For reference, use this site:
https://datagy.io/pandas-conditional-column/
Basically: df.loc[df['column'] condition, 'new column name'] = 'value if condition is met'
So the full solution looks something like this:
import pandas as pd
import numpy as np
import mplfinance as mpf
day = '2021-04-12'
data = pd.read_csv(r'C:\Users\Viktor\Documents\D2DT\Reformatted_Data\SPY_' + str(day) + "-" + str(day) + '.csv')
data['Date'] = pd.to_datetime(data['Date'])
data = data.set_index('Date')
target = 411.50
data.loc[data['close'] > target, 'signal'] = data['close']
apd = mpf.make_addplot(data['signal'], type='scatter', markersize=200, marker='v', color='r')
mpf.plot(data.loc[day], type='candle', volume=True, style='yahoo', title="SP500 Data", addplot=apd)
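To check the conditional-column idea without the CSV file or a chart window, here is a minimal sketch on a made-up frame. The prices and timestamps are invented for illustration, and Series.where is swapped in for the .loc assignment; it keeps the price where the condition holds and leaves NaN everywhere else, which is the shape make_addplot expects.

```python
import numpy as np
import pandas as pd

# Made-up one-minute closes standing in for the SPY CSV (assumption).
data = pd.DataFrame(
    {"close": [411.20, 411.55, 411.40, 411.80]},
    index=pd.date_range("2021-04-12 09:30", periods=4, freq="min"),
)

target = 411.50

# Keep the close where it is above the target, NaN elsewhere.
# The result has exactly one value per row of `data`.
signal = data["close"].where(data["close"] > target)

print(signal)
```

The resulting series should be usable in place of the list built by the loop, for example as the first argument to mpf.make_addplot(..., type='scatter').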
I am trying to create a simple line graph using the "X" axis as date/time and "Y" as the number of calls. I have fixed a few syntax errors along the way (I am extremely new to Python), as I inherited a position at work. My predecessor didn't leave anything to help, and the code I am running mostly comes from his notes.
This is the error
13 df.plot(x='Datetime',y='calls',) # figure.gca means "get current axis"
This is the entire code
import pandas as pd
import datetime
import csv
import matplotlib.pyplot as plt
import pylab as pl
pl.xticks(rotation = 90)
headers = ['calls','date','time']
df = pd.read_csv('C:\\Users\\cbordelon\\Documents\\Python\\testcr1.csv',parse_dates= {"Datetime" : [1,2]},names=headers)
#pd.to_datetime(df['Date'] + ' ' + df['Time'])
#df.apply(lambda r : pd.datetime.combine(r['Date'],r['Time']),)
print (df)
#f = plt.figure(figsize=(10, 10))
df.plot(x='Datetime',y='calls') # figure.gca means "get current axis"
df.Datetime=pd.to_datetime(df.Datetime)
df.set_index('Datetime')
df['calls'].plot()
plt.title('calls over year', color='black')
plt.tight_layout()
plt.show()
Any help would be greatly appreciated.
Edit1:
I tried df=df.astype(float) before .plot and nothing changed so I assume that my csv is reading correctly.
Edit2:
Correcting code per Raphael suggestions
Edit 3:
Now I am getting "No numeric data to plot". Damn, this is frustrating.
When you use to_datetime() you can specify which string format you wish to convert to Dates. The one you provided should actually be interpreted correctly in any case though. I think in this case you shouldn't use parse_dates (or at least do not do both).
Here is a simplified example:
import pandas
import matplotlib.pyplot as plt
df = pandas.DataFrame({"Datetime": ["4/23/2021 12:43:54", "7/24/2021 10:43:54", "9/27/2021 08:43:54", "9/30/2021 11:43:54"],
"calls": [76, 12, 1, 53]})
df["Datetime"] = pandas.to_datetime(df["Datetime"], format="%m/%d/%Y %H:%M:%S")
df.plot(x="Datetime", y="calls")
plt.show()
In your case you would replace the df = ... with just df = pd.read_csv('C:\\Users\\cbordelon\\Documents\\Python\\testcr1.csv', names=headers). So you wouldn't do the parse_dates or converting to float because this code expects exactly these string inputs in this format.
Edit:
Modified for your usage:
import pandas
import matplotlib.pyplot as plt
headers = ['calls','date','time']
df = pandas.read_csv('C:\\Users\\cbordelon\\Documents\\Python\\testcr1.csv', names=headers)
df["Datetime"] = df['date'] + ' ' + df['time']
df["Datetime"] = pandas.to_datetime(df["Datetime"], format="%m/%d/%Y %H:%M:%S")
df.plot(x="Datetime", y="calls")
plt.show()
import pandas
import matplotlib.pyplot as plt
headers = ['date', 'time', 'calls']
df = pandas.read_csv('C:\\Users\\cbordelon\\Documents\\Python\\testcr2.csv', names=headers, skiprows=1)
df['date'] = df['date'].astype(str)
df['time'] = df['time'].astype(str)
df["datetime"] = df['date'] + ' ' + df['time']
df["datetime"] = pandas.to_datetime(df["datetime"], format="%m/%d/%Y %H:%M:%S")
df.plot(x="datetime",y="calls", kind="line")
plt.show()
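For anyone who wants to try the date + time combination without the original file, here is a small sketch that reads the same three-column layout from an in-memory string. The rows are invented and io.StringIO stands in for testcr1.csv; both are assumptions for illustration only.

```python
import io

import pandas as pd

# Invented rows in the calls,date,time layout from the question (assumption).
csv_text = (
    "76,4/23/2021,12:43:54\n"
    "12,7/24/2021,10:43:54\n"
    "1,9/27/2021,08:43:54\n"
)

df = pd.read_csv(io.StringIO(csv_text), names=["calls", "date", "time"])

# Join the two string columns, then parse them with an explicit format.
df["Datetime"] = pd.to_datetime(
    df["date"] + " " + df["time"], format="%m/%d/%Y %H:%M:%S"
)

# A datetime index with a numeric column is what a line plot needs.
series = df.set_index("Datetime")["calls"]
print(series)
```

Calling series.plot() afterwards should draw the line, since the y values stay numeric and only the index holds the dates.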
I am trying to add an arrow at a given date and price to an mpf plot. To do this I have the following code:
import pandas as pd
import yfinance as yf
import datetime
from dateutil.relativedelta import relativedelta
import mplfinance as mpf
import matplotlib.pyplot as plt
db = yf.download(tickers='goog', start=datetime.datetime.now()-relativedelta(days=7), end= datetime.datetime.now(), interval="5m")
db = db.dropna()
a = db['Close'][31:32]
test = mpf.make_addplot(a, type='scatter', markersize=200, marker='^')
mpf.plot(db, type='candle', style= 'charles', addplot=test)
But it is producing the following error:
ValueError: x and y must be the same size
Could you please advise how I can resolve this?
The data passed into mpf.make_addplot() must be the same length as the dataframe passed into mpf.plot(). To plot only some points, the remaining points must be filled with nan values (float('nan'), or np.nan).
You can see this clearly in the documentation at cell **In [7]** (and used in the following cells). See there where the signal data is generated as follows:
def percentB_belowzero(percentB,price):
    import numpy as np
    signal   = []
    previous = -1.0
    for date,value in percentB.items():  # items() replaces iteritems(), which was removed in pandas 2.0
        if value < 0 and previous >= 0:
            signal.append(price[date]*0.99)
        else:
            signal.append(np.nan) # <- Make `nan` where no marker needed.
        previous = value
    return signal
Note: alternatively the signal data can be generated by first initializing to all nan values, and then replacing those nans where you want your arrows:
signal = [float('nan')]*len(db)
signal[31] = db['Close'].iloc[31]  # a scalar price, not a one-element Series
test = mpf.make_addplot(signal, type='scatter', markersize=200, marker='^')
...
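The "start with all nan, then fill in the markers" idea can be sketched in plain Python as below. The helper name sparse_signal and the sample prices are made up for illustration; the only point is that the list always ends up the same length as the plotted data.

```python
import math


def sparse_signal(length, points):
    """Return a list of `length` NaNs with values set at the given positions.

    `points` maps a row position to the value to draw there.
    (Hypothetical helper, not part of mplfinance.)
    """
    signal = [float("nan")] * length
    for position, value in points.items():
        signal[position] = value
    return signal


# Made-up closing prices standing in for db['Close'] (assumption).
closes = [100.0, 101.5, 99.8, 102.3, 103.1]

# One marker at position 3, NaN everywhere else.
signal = sparse_signal(len(closes), {3: closes[3]})
print(signal)
```

A list built this way should be acceptable to mpf.make_addplot(signal, type='scatter', ...) because its length matches the dataframe passed to mpf.plot().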
If your ultimate goal is to add an arrow, as in the title of the question, you can do it in the way shown in @Daniel Goldfarb's answer to "How to add value of hlines in y axis using mplfinance python". I used that answer to write code that meets the end goal. As the answer shows, the approach is to get the axis and then add an annotation to it, where 31 is the positional date/time index and the second coordinate is the closing price at that position.
import datetime
import pandas as pd
import yfinance as yf
import mplfinance as mpf
import matplotlib.pyplot as plt
from dateutil.relativedelta import relativedelta
db = yf.download(tickers='goog', start=datetime.datetime.now()-relativedelta(days=7), end= datetime.datetime.now(), interval="5m")
db = db.dropna()
a = db['Close'][31:32]
#test = mpf.make_addplot(a, type='scatter', markersize=200, marker='^')
fig, axlist = mpf.plot(db, type='candle', style= 'charles', returnfig=True)#addplot=test
# a.iloc[0] reads the price by position; a[0] on a date-indexed Series is deprecated.
axlist[0].annotate('X', (31, a.iloc[0]), fontsize=20, xytext=(34, a.iloc[0]+20),
                   color='r',
                   arrowprops=dict(
                       arrowstyle='->',
                       facecolor='r',
                       edgecolor='r'))
mpf.show()
I am trying to generally recreate this graph and struggling with adding a column to the hovertemplate of a plotly Scatter. Here is a working example:
import pandas as pd
import chart_studio.plotly as py
import plotly.graph_objects as go
dfs = pd.read_html('https://coronavirus.jhu.edu/data/mortality', header=0)
df = dfs[0]
percent = df['Case-Fatality'] # This is my closest guess, but isn't working
fig = go.Figure(data=go.Scatter(x=df['Confirmed'],
                                y = df['Deaths'],
                                mode='markers',
                                hovertext=df['Country'],
                                hoverlabel=dict(namelength=0),
                                hovertemplate = '%{hovertext}<br>Confirmed: %{x}<br>Fatalities: %{y}<br>%{percent}',
                                ))
fig.show()
I'd like to get the column Case-Fatality to show under {percent}.
I've also tried putting in the Scatter() call a line for text = [df['Case-Fatality']], and switching {percent} to {text} as shown in this example, but this doesn't pull from the dataframe as hoped.
I've tried replotting it as a px, following this example but it throws the error dictionary changed size during iteration and I think using go may be simpler than px but I'm new to plotly.
Thanks in advance for any insight for how to add a column to the hover.
As the question asks for a solution with graph_objects, here are two that work-
Method (i)
Adding %{text} where you want the variable value to be and passing another variable called text that is a list of values needed in the go.Scatter() call. Like this-
percent = df['Case-Fatality']
hovertemplate = '%{hovertext}<br>Confirmed: %{x}<br>Fatalities: %{y}<br>%{text}',text = percent
Here is the complete code-
import pandas as pd
import plotly.graph_objects as go
dfs = pd.read_html('https://coronavirus.jhu.edu/data/mortality', header=0)
df = dfs[0]
percent = df['Case-Fatality'] # Column to show in the hover label
fig = go.Figure(data=go.Scatter(x=df['Confirmed'],
                                y = df['Deaths'],
                                mode='markers',
                                hovertext=df['Country'],
                                hoverlabel=dict(namelength=0),
                                hovertemplate = '%{hovertext}<br>Confirmed: %{x}<br>Fatalities: %{y}<br>%{text}',
                                text = percent))
fig.show()
Method (ii)
This solution relies on the unified hover label you get when you pass 'x unified' to hovermode. All you need to do then is add an invisible trace with the same x-axis values and the desired y-axis values. Passing mode='none' makes it invisible. Here is the complete code-
import pandas as pd
import plotly.graph_objects as go
dfs = pd.read_html('https://coronavirus.jhu.edu/data/mortality', header=0)
df = dfs[0]
percent = df['Case-Fatality'] # Column to show in the hover label
fig = go.Figure(data=go.Scatter(x=df['Confirmed'],
                                y = df['Deaths'],
                                mode='markers',
                                hovertext=df['Country'],
                                hoverlabel=dict(namelength=0)))
fig.add_scatter(x=df.Confirmed, y=percent, mode='none')
fig.update_layout(hovermode='x unified')
fig.show()
The link you shared is broken. Are you looking for something like this?
import pandas as pd
import plotly.express as px
px.scatter(df,
           x="Confirmed",
           y="Deaths",
           hover_name="Country",
           hover_data={"Case-Fatality":True})
Then if you need to use bold or change your hover_template you can follow the last step in this answer
Drawing inspiration from another SO question/answer, I find that this is working as desired and permits adding multiple cols to the hover data:
import pandas as pd
import plotly.express as px
fig = px.scatter(df,
                 x="Confirmed",
                 y="Deaths",
                 hover_name="Country",
                 hover_data=[df['Case-Fatality'], df['Deaths/100K pop.']])
fig.show()
I have two functions that each create a diagram. But when I run both functions, the second diagram also contains the data that should only be in the first one. Here are the diagrams:
This diagram shows the temperature.
And this one should show only the humidity data, not the humidity and the temperature data.
Here is my source code:
from pandas import DataFrame
import sqlite3
import matplotlib.pyplot as plt
import pandas as pd
from datetime import date, datetime
datum = str(date.today())
date = [datum]
con = sqlite3.connect("/home/pi/test2.db")
sql = "SELECT * from data4 WHERE date in (?)"
df3 = pd.read_sql_query(sql,con, params=[datum])
def daily_hum():
    df3 = pd.read_sql_query(sql,con, params=[datum])
    df3['datetime'] = pd.to_datetime((df3.date + ' ' + df3.time))
    df3.groupby([df3.datetime]).hum.mean().plot()
    plt.savefig('/home/pi/flask/static/daily_hum.jpg')
def daily_temp():
    df4 = pd.read_sql_query(sql,con, params=[datum])
    df4['datetime'] = pd.to_datetime((df4.date + ' ' + df4.time))
    df4.groupby([df4.datetime]).temp.mean().plot()
    plt.savefig('/home/pi/flask/static/daily_temp.jpg')
daily_temp()
daily_hum()
The database/ the DataFrame looks like this:
id,hum,temp,zeit,date
721,60,21,11:04:23,2020-06-21
722,64,22,11:04:24,2020-06-21
723,68,22,11:04:27,2020-06-21
724,70,22,11:07:20,2020-06-21
725,63,22,11:08:20,2020-06-21
726,63,22,11:09:21,2020-06-21
727,63,22,11:10:22,2020-06-21
728,63,22,11:11:22,2020-06-21
729,69,22,11:12:24,2020-06-21
730,64,22,11:13:29,2020-06-21
731,70,22,11:14:32,2020-06-21
732,64,22,11:15:33,2020-06-21
733,64,22,11:16:34,2020-06-21
734,64,22,11:17:34,2020-06-21
735,64,22,11:18:35,2020-06-21
736,64,22,11:19:35,2020-06-21
737,64,22,11:20:36,2020-06-21
738,64,22,11:21:37,2020-06-21
739,64,22,11:22:37,2020-06-21
740,64,22,11:23:38,2020-06-21
741,65,22,11:24:38,2020-06-21
742,65,22,11:25:39,2020-06-21
743,65,22,11:26:40,2020-06-21
744,65,22,11:27:40,2020-06-21
I hope you can help me
You could try this. Matplotlib needs to know whether you want a new figure for each plot.
from pandas import DataFrame
import sqlite3
import matplotlib.pyplot as plt
import pandas as pd
from datetime import date, datetime
datum = str(date.today())
date = [datum]
con = sqlite3.connect("/home/pi/test2.db")
sql = "SELECT * from data4 WHERE date in (?)"
df3 = pd.read_sql_query(sql,con, params=[datum])
df3['datetime'] = pd.to_datetime((df3.date + ' ' + df3.time))
# new figure
fig, ax = plt.subplots()
# Some figure modifying code
fig.suptitle('Title of Figure')
ax.set_xlabel('X-Label')
ax.set_ylabel('Y-Label')
df3.groupby([df3.datetime]).hum.mean().plot(ax=ax)
plt.savefig('/home/pi/flask/static/daily_hum.jpg')
# new figure
fig, ax = plt.subplots()
# Some figure modifying code
fig.suptitle('Title of Figure')
ax.set_xlabel('X-Label')
ax.set_ylabel('Y-Label')
df3.groupby([df3.datetime]).temp.mean().plot(ax=ax)
plt.savefig('/home/pi/flask/static/daily_temp.jpg')
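If you want to keep the two-function layout from the question, the same idea can be wrapped in a helper that creates its own figure, saves it, and closes it again, so nothing leaks into the next plot. This is only a sketch: the helper name save_line_plot, the sample rows, and the in-memory buffers (used instead of the /home/pi paths) are assumptions for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd


def save_line_plot(series, ylabel, target):
    """Draw `series` on its own new figure, save it to `target`, then close it.

    Returns the number of lines on the axes (hypothetical helper).
    """
    fig, ax = plt.subplots()   # a fresh figure for every call
    series.plot(ax=ax)         # draw on this axes only
    ax.set_ylabel(ylabel)
    fig.savefig(target, format="png")
    n_lines = len(ax.lines)
    plt.close(fig)             # nothing carries over to the next call
    return n_lines


# A few rows in the layout shown in the question.
df = pd.DataFrame({
    "hum": [60, 64, 68, 70],
    "temp": [21, 22, 22, 22],
    "zeit": ["11:04:23", "11:04:24", "11:04:27", "11:07:20"],
    "date": ["2020-06-21"] * 4,
})
df["datetime"] = pd.to_datetime(df["date"] + " " + df["zeit"])
indexed = df.set_index("datetime")

hum_buf, temp_buf = io.BytesIO(), io.BytesIO()
temp_lines = save_line_plot(indexed["temp"], "Temperature", temp_buf)
hum_lines = save_line_plot(indexed["hum"], "Humidity", hum_buf)
print(temp_lines, hum_lines)
```

Passing a file path instead of a buffer to the helper should work the same way, since fig.savefig accepts either.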
I'm hoping to create a line graph which shows the changes to flowering and fruiting times (phenophases) from year to year. For each phenophase I'd like to plot the average Day of Year and, if possible, show the min and max for each year as an error bar. I've filtered down all the data I need in a few data frames, grouped it all in a sensible way, but I can't figure out how to get it all to plot. Here's a screen grab of where I'm at: Imgur
All the examples I've found adding error bars have been based on formulas or other equal amounts over/under, but in my case the max/min will be different so I'm not sure how to integrate that. Possibly just create a list of each column's data and feed that to plot? I'm playing with that now but not getting far.
Also, if anyone has general suggestions as to better ways to present this data I'm all ears. I've looked into Gantt plots but didn't get far with them, as this seems a bit more straight-forward just using matplotlib. I'm happy to put some demo data or the rest of my notebook up if anyone thinks that would help.
Edit: Here's some sample data and the code from my notebook: Gist
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%matplotlib inline
pd.set_option('display.max_columns', 40)
tick_spacing = 1
dfClean = df[['Site_Cluster', 'Species', 'Phenophase_Name',
              'Phenophase_Status', 'Observation_Year', 'Day_of_Year']]
dfClean = dfClean[dfClean.Phenophase_Status == 1]
PhenoNames = ['Open flowers', 'Ripe fruits']
dfLakes = dfClean[(dfClean.Phenophase_Name.isin(PhenoNames))
                  & (dfClean.Site_Cluster == 'Lakes')
                  & (dfClean.Species == 'lapponica')]
dfLakesGrouped = dfLakes.groupby(['Observation_Year', 'Phenophase_Name'])
dfLakesReady = dfLakesGrouped.Day_of_Year.agg([np.min, np.mean, np.max]).round(0)
dfLakesReady = dfLakesReady.unstack()
print(dfLakesReady['mean'].plot())
Here's another answer:
from pandas import DataFrame, date_range, Timedelta
import numpy as np
from matplotlib import pyplot as plt
rng = date_range(start='2015-01-01', periods=5, freq='24H')
df = DataFrame({'y':np.random.normal(size=len(rng))}, index=rng)
y1 = df['y']
y2 = (y1*3)
sd1 = (y1*2)
sd2 = (y1*2)
fig,(ax1,ax2) = plt.subplots(2,1,sharex=True)
_ = y1.plot(yerr=sd1, ax=ax1)
_ = y2.plot(yerr=sd2, ax=ax2)
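For the unequal min/max case asked about in the question, matplotlib's Axes.errorbar accepts yerr as a 2 x N array: the first row is the distance from each point down to its lower bound, the second row the distance up to its upper bound. Here is a minimal sketch with made-up observations; the numbers are invented, and only the column names follow the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented observations for a single phenophase (assumption).
obs = pd.DataFrame({
    "Observation_Year": [2013, 2013, 2013, 2014, 2014, 2015, 2015, 2015],
    "Day_of_Year": [150, 160, 175, 148, 158, 155, 165, 190],
})

stats = obs.groupby("Observation_Year")["Day_of_Year"].agg(["min", "mean", "max"])

# Row 0: how far the bar extends below the mean; row 1: how far above it.
yerr = np.vstack([
    stats["mean"] - stats["min"],
    stats["max"] - stats["mean"],
])

fig, ax = plt.subplots()
ax.errorbar(stats.index, stats["mean"], yerr=yerr, marker="o", capsize=4)
ax.set_xlabel("Observation year")
ax.set_ylabel("Day of year")
```

For two phenophases, the same call could be repeated once per phenophase on the same axes, giving one line with its own min/max bars for each.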