Missing part of the data in graph - python

I'm writing a program which will show the candlestick chart of Gold and detect patterns. I'm getting the data from yfinance and trying to draw the chart with plotly, but I see that some parts of the data are missing. I checked the data with mplfinance and everything worked successfully, but I need it in plotly.
import plotly.graph_objects as go
import pandas as pd
import yfinance as yf
import talib
import mplfinance as mpf
data = yf.download(tickers="GC=F", period="5d", interval="5m")
fig = go.Figure(data=[go.Candlestick(x=data.index,
open=data['Open'], high=data['High'],
low=data['Low'], close=data['Close'])
])
fig.update_layout(xaxis_rangeslider_visible=False)
fig.write_html('first_figure.html', auto_open=True)

There are indeed some gaps in the original Yahoo data (the Yahoo website uses an inconsistent x-axis rather than showing the gaps). For the purpose of time series analysis, applying data = data.resample('5min').ffill() (or .interpolate()) is about the best you can do, I presume. ('5T' is the older pandas alias for the same 5-minute frequency.)
If you wish to imitate the Yahoo chart behaviour, you'll have to configure rangebreaks, as in this question.
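As a minimal sketch of the resampling idea, the snippet below uses a small made-up 5-minute series (the timestamps and prices are hypothetical, not real GC=F quotes) with two bars missing, and compares forward-filling with time-based interpolation:

```python
import pandas as pd

# Hypothetical 5-minute closes; the 09:10 and 09:15 bars are missing.
index = pd.to_datetime([
    "2024-01-02 09:00", "2024-01-02 09:05",
    "2024-01-02 09:20", "2024-01-02 09:25",
])
data = pd.DataFrame({"Close": [2050.0, 2051.0, 2054.0, 2053.0]}, index=index)

# Put the data on a regular 5-minute grid; missing bars become NaN rows.
regular = data.resample("5min").asfreq()

# Option 1: carry the last known value forward.
filled = regular.ffill()

# Option 2: interpolate linearly in time between the known neighbours.
interpolated = regular.interpolate(method="time")

print(filled)
print(interpolated)
```

Forward-filling keeps only observed prices, while interpolation invents intermediate values, so which one is appropriate depends on what the downstream pattern detection expects.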

I don't have the mplfinance code, so I can't be sure, but I think mplfinance hides non-trading periods by default (show_nontrading=False), so the gaps are removed automatically. Plotly has a feature for excluding market holidays and overnight closures: rangebreaks. Since your time series is in 5-minute increments, set dvalue to the width of one bar in milliseconds (5 min * 60 s * 1000 ms = 300000); a larger value would also hide real bars that follow each missing one. The main point is to prepare the list of timestamps you want to remove and pass it to rangebreaks.
import plotly.graph_objects as go
import pandas as pd
import yfinance as yf
import numpy as np
data = yf.download(tickers="GC=F", period="5d", interval="5m")
# Build a gap-free 5-minute index; reindexing turns missing bars into NaN rows
full_range = pd.date_range(data.index[0], data.index[-1], freq='5min')
data = data.reindex(full_range, fill_value=np.nan, axis=0)
# Timestamps without data are the bars to hide
delete_range = data[data['Open'].isnull()].index
fig = go.Figure(data=[go.Candlestick(x=data.index,
    open=data['Open'], high=data['High'],
    low=data['Low'], close=data['Close'])
])
fig.update_layout(xaxis_rangeslider_visible=False)
# dvalue is the size of each entry in values, in milliseconds (one 5-minute bar)
fig.update_xaxes(rangebreaks=[
    dict(values=delete_range, dvalue=5 * 60 * 1000)
])
# fig.write_html('first_figure.html', auto_open=True)
fig.show()
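The list of timestamps to remove can also be computed without reindexing the whole frame. The sketch below is one possible way to do it; the helper name missing_timestamps and the sample timestamps are hypothetical, and it assumes dvalue should equal the width of one bar in milliseconds:

```python
import pandas as pd

def missing_timestamps(index, freq="5min"):
    """Return the timestamps of a regular grid that are absent from index."""
    full_range = pd.date_range(index[0], index[-1], freq=freq)
    return full_range.difference(index)

# Hypothetical 5-minute index with the 09:10 and 09:15 bars missing.
index = pd.to_datetime([
    "2024-01-02 09:00", "2024-01-02 09:05",
    "2024-01-02 09:20", "2024-01-02 09:25",
])

gaps = missing_timestamps(index)

# Width of one bar in milliseconds (assumed to be the right dvalue).
bar_ms = int(pd.Timedelta("5min").total_seconds() * 1000)

# A dict in the shape that rangebreaks entries take.
rangebreak = dict(values=gaps, dvalue=bar_ms)
print(gaps)
print(bar_ms)
```

The resulting dict would then be passed as fig.update_xaxes(rangebreaks=[rangebreak]).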

Related

Problem : Candlestick chart covering complete y-axis in the form of bars

I am making a candlesticks chart using Plotly python. But my candlesticks cover the whole y-axis and are in the form of bars. Instead, I want them to look like a normal candlestick chart.
The code is presented below:
import pandas as pd
from datetime import datetime
import plotly.graph_objects as go
import plotly.express as px
import plotly.offline as pyo
pyo.init_notebook_mode()
df = pd.read_csv(r"MMM_15m_traditional.csv")
fig = go.Figure(data=[go.Candlestick(x=df['Time'],
open=df['Open'],
high=df['Close'],
low=df['Low'],
close=df['High'])])
fig.update_layout(
title='MMM_15m_traditional',
yaxis_title='MMM Data',
)
fig.show()
The output is as under:
But using the sample data from the plotly website into the same code, I get the normal candlesticks.
The code using the plotly website's data is as under :
The output:
The output of df.head() is shown below:
Please check the code below, which is reproducible on any system, and see whether you still face the issue. Install the investpy library first.
Please share the output of df.head() so that the values in your CSV file can be checked; the data may be incorrect, which would explain the issue.
Always share reproducible code so that others can copy and test it on their own systems. I don't know your df and cannot reproduce it from the image you uploaded.
Try removing hover_text = df['Symbol'] from the code.
Reproducible Code-
pip install investpy
import pandas as pd
import investpy
from datetime import datetime
import plotly.graph_objects as go
import plotly.figure_factory as ff
import plotly.express as px
df = investpy.get_index_historical_data(index="Nifty 50",country="India",from_date=("23/03/2022"),to_date= "23/04/2022")
df.tail(10)
fig = go.Figure(data=[go.Candlestick(x=df.index,
open=df['Open'], high=df['High'],
low=df['Low'], close=df['Close'], name = 'Nifty50')])
fig.show()
The viz is created successfully-
Your code with the exact df. Also remove the column Unnamed: 0 from the CSV or df.
df.head()
         Date      Time    Open   Close    High     Low
0  2021-04-12  09:31:00  198.25  198.76  197.54  197.54
1  2021-04-12  09:45:00  198.79  199.10  199.27  198.67
2  2021-04-12  10:00:AM  199.13  198.41  199.29  198.35
3  2021-04-12  10:15:AM  198.46  198.35  198.68  198.27
import plotly.graph_objs as go
import pandas as pd
df = pd.read_excel('hii.xlsx')
df
fig = go.Figure(data=[go.Candlestick(x=df['Time'],
open=df['Open'],
high=df['Close'],
low=df['Low'],
close=df['High'])])
fig.update_layout(
title='MMM_15m_traditional',
yaxis_title='MMM Data',
)
Viz-
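Since the suspicion above is that the data itself may be wrong, a quick consistency check on the OHLC columns can help before plotting. The following is only a sketch: the helper name ohlc_violations and the sample values are hypothetical. It flags rows where High is below the candle body or Low is above it, which is what happens when columns are mislabelled or swapped:

```python
import pandas as pd

def ohlc_violations(df):
    """Return rows where High/Low are inconsistent with Open/Close."""
    body_top = df[["Open", "Close"]].max(axis=1)
    body_bottom = df[["Open", "Close"]].min(axis=1)
    bad = (df["High"] < body_top) | (df["Low"] > body_bottom)
    return df[bad]

# Hypothetical, internally consistent bars.
good = pd.DataFrame({
    "Open": [10.0, 11.0],
    "High": [12.0, 11.5],
    "Low": [9.5, 10.2],
    "Close": [11.0, 10.5],
})

# The same bars with the High and Close labels swapped.
swapped = good.rename(columns={"High": "Close", "Close": "High"})

print(len(ohlc_violations(good)))
print(len(ohlc_violations(swapped)))
```

If the check returns rows for your file, compare the column headers against the values before passing them to go.Candlestick.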

matplotlib: Changing x limit dates

I would like to be able to change the x limits so it shows a time frame of my choice.
reproducible example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# libraries for Data Download
import datetime # if you want to play with the dates
from pandas_datareader import data as pdr
import yfinance as yf
df = pdr.get_data_yahoo('ETH-USD', interval = '1d', period = "5y")
plt.figure(figsize=(24,10), dpi=140)
plt.grid()
df['Close'].plot()
df['Close'].ewm(span=50).mean().plot(c = '#4d00ff')
df['Close'].ewm(span=100).mean().plot(c = '#9001f0')
df['Close'].ewm(span=200).mean().plot(c = '#d102e8')
df['Close'].ewm(span=300).mean().plot(c = '#f101c2')
df['Close'].rolling(window=200).mean().plot(c = '#e80030')
plt.title('ETH-USD PLOT',fontsize=25, ha='center')
plt.legend(['C L O S E', 'EMA 50','EMA 100','EMA 200','EMA 300','MA 200', ])
# plt.xlim(['2016-5','2017-05']) # My attempt
plt.show()
When uncommenting the line above I get:
I would have liked '2016-5' to '2017-05' to have taken up the whole plot so I can see more detail.
It seems to me that your xlim works well. However, if I understand your question correctly, you also need to adjust ylim to stretch the data vertically and so fill the graph efficiently. Judging from your graph, the data within the specified period does not seem to exceed 100, so (0, 100) is a reasonable starting point.
Try adding plt.ylim((0,100)) together with your commented-out line.
Output:
With your plt.xlim(['2016-5','2017-05']) and plt.ylim((0,100)):
With your plt.xlim(['2016-5','2017-05']) and plt.ylim((0,40)):
As you can see, because the data varies so much over the period, you might lose some information at later dates (values cut off by the limit) or get a less clear picture of the movement at earlier dates.
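Instead of guessing the y limits by eye, they can be derived from the data inside the chosen window. This is a sketch under assumptions: the helper name window_ylim, the 5% padding and the synthetic price series are all hypothetical stand-ins for df['Close']:

```python
import numpy as np
import pandas as pd

def window_ylim(series, start, end, pad=0.05):
    """Return (low, high) y limits covering series between start and end."""
    window = series.loc[start:end]
    if window.empty:
        raise ValueError("no data in the requested window")
    lo, hi = float(window.min()), float(window.max())
    margin = (hi - lo) * pad  # a little headroom above and below
    return lo - margin, hi + margin

# Synthetic stand-in for df['Close']: a steadily rising daily series.
dates = pd.date_range("2016-01-01", "2018-12-31", freq="D")
close = pd.Series(np.linspace(1.0, 1000.0, len(dates)), index=dates)

lo, hi = window_ylim(close, "2016-05", "2017-05")
print(lo, hi)
```

With the real data, the result would be applied with plt.xlim(pd.Timestamp('2016-05-01'), pd.Timestamp('2017-05-31')) followed by plt.ylim(lo, hi).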

plotly 2 or more column based subplot

I am new to plotly and wanted to visualize some data. I got this plot (see here).
But I want to arrange it in 2 or more columns so that it can be seen better.
Can someone help me with that? Here is the source code I have tried:
import pandas as pd
import plotly.express as px
fig = px.scatter(data2, x = "Total_System_Cost", y= "Total_CO2_Emissions",
color="Pol_Inst", symbol="Pol_Inst",
facet_row='Technologie',width=600, height=3500)
fig.show()
And the data looks like this: here
In this case you should use facet_col and facet_col_wrap, as in this example:
import pandas as pd
import plotly.express as px
fig = px.scatter(data2,
x="Total_System_Cost",
y="Total_CO2_Emissions",
color="Pol_Inst",
symbol="Pol_Inst",
facet_col='Technologie',
facet_col_wrap=2, # change this as needed
)
fig.show()
If you then want to set width and height, choose them according to data2['Technologie'].nunique() and the value you picked for facet_col_wrap.
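One way to derive the figure size from the number of facets is sketched below. The helper name facet_grid_size, the per-panel pixel sizes and the sample Technologie values are assumptions for illustration only:

```python
import math
import pandas as pd

def facet_grid_size(n_facets, facet_col_wrap, panel_width=400, panel_height=300):
    """Return (width, height) in pixels for a wrapped facet grid."""
    n_cols = min(n_facets, facet_col_wrap)
    n_rows = math.ceil(n_facets / facet_col_wrap)
    return n_cols * panel_width, n_rows * panel_height

# Hypothetical stand-in for data2 with five distinct technologies.
data2 = pd.DataFrame(
    {"Technologie": ["Wind", "Solar", "Gas", "Wind", "Coal", "Hydro"]}
)

n_facets = data2["Technologie"].nunique()
width, height = facet_grid_size(n_facets, facet_col_wrap=2)
print(n_facets, width, height)
```

The computed values would then be passed to px.scatter as width=width and height=height alongside facet_col_wrap=2.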

How to plot time series graph in jupyter?

I have tried to plot the data in order to achieve something like this:
But I could not and I just achieved this graph with plotly:
Here is a small sample of my data:
Does anyone know how to achieve that graph?
Thanks in advance
You'll find a lot of good stuff on timeseries on plotly.ly/python. Still, I'd like to share some practical details that I find very useful:
1. Organize your data in a pandas dataframe.
2. Set up a basic plotly structure using fig=go.Figure(go.Scatter()).
3. Make your desired additions to that structure using fig.add_traces(go.Scatter()).
Plot:
Code:
import plotly.graph_objects as go
import pandas as pd
import numpy as np
# random data or other data sources
np.random.seed(123)
observations = 200
timestep = np.arange(0, observations/10, 0.1)
dates = pd.date_range('1/1/2020', periods=observations)
val1 = np.sin(timestep)
val2=val1+np.random.uniform(low=-1, high=1, size=observations)#.tolist()
# organize data in a pandas dataframe
df= pd.DataFrame({'Timestep':timestep, 'Date':dates,
'Value_1':val1,
'Value_2':val2})
# Main plotly figure structure
fig = go.Figure([go.Scatter(x=df['Date'], y=df['Value_2'],
                            marker_color='black',
                            opacity=0.6,
                            name='Value 2')])
# One of many possible additions
fig.add_traces([go.Scatter(x=df['Date'], y=df['Value_1'],
                           marker_color='blue',
                           name='Value 1')])
# plot figure
fig.show()

Make line chart with multiple series and error bars

I'm hoping to create a line graph which shows the changes to flowering and fruiting times (phenophases) from year to year. For each phenophase I'd like to plot the average Day of Year and, if possible, show the min and max for each year as an error bar. I've filtered down all the data I need in a few data frames, grouped it all in a sensible way, but I can't figure out how to get it all to plot. Here's a screen grab of where I'm at: Imgur
All the examples I've found for adding error bars have been based on formulas or other equal amounts over/under, but in my case the max/min will be different, so I'm not sure how to integrate that. Possibly just create a list of each column's data and feed that to plot? I'm playing with that now but not getting far.
Also, if anyone has general suggestions as to better ways to present this data I'm all ears. I've looked into Gantt plots but didn't get far with them, as this seems a bit more straight-forward just using matplotlib. I'm happy to put some demo data or the rest of my notebook up if anyone thinks that would help.
Edit: Here's some sample data and the code from my notebook: Gist
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%matplotlib inline
pd.set_option('display.max_columns', 40)
tick_spacing = 1
dfClean = df[['Site_Cluster', 'Species', 'Phenophase_Name',
'Phenophase_Status', 'Observation_Year', 'Day_of_Year']]
dfClean = dfClean[dfClean.Phenophase_Status == 1]
PhenoNames = ['Open flowers', 'Ripe fruits']
dfLakes = dfClean[(dfClean.Phenophase_Name.isin(PhenoNames))
& (dfClean.Site_Cluster == 'Lakes')
& (dfClean.Species == 'lapponica')]
dfLakesGrouped = dfLakes.groupby(['Observation_Year', 'Phenophase_Name'])
dfLakesReady = dfLakesGrouped.Day_of_Year.agg([np.min, np.mean, np.max]).round(0)
dfLakesReady = dfLakesReady.unstack()
print(dfLakesReady['mean'].plot())
Here's another answer:
from pandas import DataFrame, date_range, Timedelta
import numpy as np
from matplotlib import pyplot as plt
rng = date_range(start='2015-01-01', periods=5, freq='24H')
df = DataFrame({'y':np.random.normal(size=len(rng))}, index=rng)
y1 = df['y']
y2 = (y1*3)
# error-bar lengths must be non-negative, so use absolute values
sd1 = y1.abs() * 2
sd2 = y1.abs() * 2
fig,(ax1,ax2) = plt.subplots(2,1,sharex=True)
_ = y1.plot(yerr=sd1, ax=ax1)
_ = y2.plot(yerr=sd2, ax=ax2)
Output:
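The question asks for error bars of unequal length (minimum below the mean, maximum above it). Matplotlib's errorbar accepts yerr as a 2 x N array whose first row holds the lower lengths and whose second row holds the upper lengths. The sketch below only builds that array; the observation values are made up and cover a single phenophase:

```python
import numpy as np
import pandas as pd

# Hypothetical observations for one phenophase.
obs = pd.DataFrame({
    "Observation_Year": [2014, 2014, 2014, 2015, 2015, 2015],
    "Day_of_Year": [150, 160, 176, 140, 155, 158],
})

stats = obs.groupby("Observation_Year")["Day_of_Year"].agg(["min", "mean", "max"])

# Distances from the mean down to the minimum and up to the maximum.
lower = stats["mean"] - stats["min"]
upper = stats["max"] - stats["mean"]

# Row 0: lower lengths, row 1: upper lengths.
yerr = np.vstack([lower.to_numpy(), upper.to_numpy()])
print(stats)
print(yerr)
```

The array can then be passed as ax.errorbar(stats.index, stats["mean"], yerr=yerr), once per phenophase to get one line per series.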
