Matplotlib incorrect X & Y-axis despite correct data - python

My code creates 4 stock graphs with matplotlib (the data is taken from Yahoo with pandas-datareader). The problem I SOMETIMES get is that the X and Y axes of the first graph in the sequence of 4, which are created one after the other, look like this:
Matplotlib scales it entirely wrong. The numbers given to it are not years ranging from 1970 to 2020; it is only given 5 days. As stated above, this only happens sometimes, and when the code creates the other 3 graphs there are no problems at all. I believe this is caused by creating a graph that ranges from 5-10 dollars and then quickly another graph that ranges from 50-100 dollars. I do have a sleep between every graph being created, but it doesn't seem to help.
I also can't scale it manually every time, because the code has to create graphs of various stocks with different axis ranges.
#Libraries
import pandas as pd
import pandas_datareader as web
import matplotlib.pyplot as plt
import datetime
date = datetime.date.today()
def Graphmaker(ticker, start, end, interval):
    stock_data = web.DataReader(ticker, data_source = "yahoo", start = start, end = end) #Gets information
    plt.plot(stock_data['Close']) #Plots
    plt.autoscale()
    plt.axis('off') #Removes axis
    graph = plt.gcf()
    #
    plt.switch_backend('agg')
    #
    plt.draw()
    #plt.show()
    date = datetime.date.today()
    file_name = "graph_" + ticker + "_" + interval + "_day_graph_" + str(date) + ".png"
    graph.savefig('Data/test/' + file_name, dpi=100) #Saves graph, Adds the graph image to be tested which later gets moved to 'train'
    plt.clf() #Without this the graphs stack on each other and create incorrect lines
    return(file_name) #Passing the file name back to main which gets passed to predictor
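Not part of the original post: one possible cause (an assumption, the thread does not confirm it) is that all four graphs share pyplot's single implicit figure, so state can carry over from one call to the next. A minimal sketch that gives every graph its own Figure and closes it afterwards could look like this. The helper name save_close_graph and the price values are made up for illustration, and an in-memory buffer stands in for the PNG file.

```python
import datetime
import io

import matplotlib
matplotlib.use("Agg")  # choose the non-interactive backend before pyplot is imported
import matplotlib.pyplot as plt


def save_close_graph(dates, close, target):
    """Plot one series on a fresh figure, save it, and return the axis limits.

    Hypothetical helper: `target` can be a file path or a file-like object.
    """
    fig, ax = plt.subplots()
    ax.plot(dates, close)
    ax.axis("off")                      # hide the axes, as in the question
    fig.savefig(target, dpi=100, format="png")
    limits = (ax.get_xlim(), ax.get_ylim())
    plt.close(fig)                      # drop the figure so nothing leaks into the next graph
    return limits


# Made-up closing prices for two stocks in very different price ranges
dates = [datetime.date(2021, 1, 4) + datetime.timedelta(days=i) for i in range(5)]
cheap_limits = save_close_graph(dates, [5.0, 6.5, 7.2, 9.8, 8.1], io.BytesIO())
pricey_limits = save_close_graph(dates, [52.0, 61.0, 75.5, 98.0, 83.0], io.BytesIO())
```

Because each call works on its own Figure and Axes, the autoscaled limits of one graph cannot depend on whatever was drawn before it.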

Related

Display only time on axis with matplotlib.plot_dates

So I've spent some time managing to plot data using time on the x-axis, and the way I've found to do that is to use matplotlib.plot_date after converting datetime objects to pltdates objects.
X_d = pltdates.date2num(X) # X is an array containing datetime objects
(...)
plt.plot_date(X_d, Y)
It works great, and all my data is plotted properly.
[Image: plot with dates appearing on the x-axis]
However, all the measurements I want to plot were made on the same day (17/12/2021); the only difference is the time.
As shown in the image, matplotlib still displays the day number (the 17th), although it is the same for the whole plot.
Does anyone have a clue how to keep only the time while still using matplotlib.plot_date?
Use this example:
import matplotlib
import matplotlib.pyplot as plt
from datetime import datetime
origin = ['2020-02-05 04:11:55',
'2020-02-05 05:01:51',
'2020-02-05 07:44:49']
a = [datetime.strptime(d, '%Y-%m-%d %H:%M:%S') for d in origin]
b = [35.764299, 20.3008, 36.94704]  # floats rather than strings, so the y-axis is numeric
x = matplotlib.dates.date2num(a)
formatter = matplotlib.dates.DateFormatter('%H:%M')
figure = plt.figure()
axes = figure.add_subplot(1, 1, 1)
axes.xaxis.set_major_formatter(formatter)
plt.setp(axes.get_xticklabels(), rotation=15)
axes.plot(x, b)
plt.show()

How can I adjust the bounds of the x tick values that are automatically chosen by matplotlib?

I have a graph that shows the closing price of a stock throughout a day at each five minute interval. The x axis shows the time and the range of x values is from 9:30 to 4:00 (16:00).
The problem is that the automatic bounds for the x axis go from 9:37 to 16:07 and I really just want it from 9:30 to 16:00.
The code I am currently running is this:
stk = yf.Ticker(ticker)
his = stk.history(interval="5m", start=start, end=end).values.tolist() #open - high - low - close - volume
x = []
y = []
count = 0
five_minutes = datetime.timedelta(minutes = 5)
for bar in his:
    x.append((start + five_minutes * count))#.strftime("%H:%M"))
    count = count + 1
    y.append(bar[3])
plt.clf()
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
plt.gca().xaxis.set_major_locator(mdates.MinuteLocator(interval=30))
plt.plot(x, y)
plt.gcf().autofmt_xdate()
plt.show()
And it produces this plot (currently a link because I am on a new user account):
I thought I was supposed to use the axis.set_data_interval function, so I called it with datetime objects representing 9:30 and 16:00 as the min and the max. This gave me the error:
TypeError: '<' not supported between instances of 'float' and 'datetime.datetime'
Is there another way for me to adjust the first x tick and still have the rest filled in automatically?
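This problem can be fixed by adjusting the way you use the mdates tick locator: MinuteLocator(interval=30) steps in 30-minute increments from wherever the view starts, whereas MinuteLocator(byminute=[0, 30]) pins the ticks to the full and half hours. Here is an example based on the one shared by r-beginners to make it comparable. Note that I use the pandas plotting function for convenience. The x_compat=True argument is needed for it to work with mdates: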
import pandas as pd # 1.1.3
import yfinance as yf # 0.1.54
import matplotlib.dates as mdates # 3.3.2
# Import data
ticker = 'AAPL'
stk = yf.Ticker(ticker)
his = stk.history(period='1D', interval='5m')
# Create pandas plot with appropriately formatted x-axis ticks
ax = his.plot(y='Close', x_compat=True, figsize=(10,5))
ax.xaxis.set_major_locator(mdates.MinuteLocator(byminute=[0, 30]))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M', tz=his.index.tz))
ax.legend(frameon=False)
ax.figure.autofmt_xdate(rotation=0, ha='center')
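As a complement to the locator fix (not from the original thread), the axis bounds themselves can be pinned with set_xlim, which accepts datetime objects on a date axis. This is only a minimal sketch: the date and the prices are made up and stand in for the yfinance data.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Made-up trading day: one bar every five minutes from 9:30 up to 15:55
start = datetime.datetime(2021, 1, 4, 9, 30)
end = datetime.datetime(2021, 1, 4, 16, 0)
x = [start + datetime.timedelta(minutes=5 * i) for i in range(78)]
y = [100 + 0.1 * i for i in range(78)]  # placeholder closing prices

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlim(start, end)  # pin the visible range to 9:30-16:00
ax.xaxis.set_major_locator(mdates.MinuteLocator(byminute=[0, 30]))  # ticks on :00 and :30
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
fig.autofmt_xdate()
```

Setting the limits after plotting keeps autoscaling from adding its default margin on either side of the data.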
The sample data was created by obtaining Apple's stock price from Yahoo Finance. The desired five-minute interval labels are a list of strings built with pd.date_range, running from the start time to the end time in five-minute steps.
Based on this, the closing price is plotted against the running index of the five-minute bars, and the x-axis ticks can be set to any interval by slicing the positions and the labels.
import yfinance as yf
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import numpy as np
ticker = 'AAPL'
stk = yf.Ticker(ticker)
his = stk.history(period='1D',interval="5m")
his.reset_index(inplace=True)
time_rng = pd.date_range('09:30','15:55', freq='5min')
labels = ['{:02}:{:02}'.format(t.hour,t.minute) for t in time_rng]
fig, ax = plt.subplots()
x = np.arange(len(his))
y = his.Close
ax.plot(x,y)
ax.set_xticks(x[::3])
ax.set_xticklabels(labels[::3], rotation=45)
plt.show()

How to make overlay plots of a variable when every plot I want to make has a different length of data

I want to overlay 30 plots, each showing the temperature of one day, so that in the end I can compare how the temperature develops and how much it differs from one day to another. The problem is that when I separate the data into the 30 days in pandas, every day's data set has a different length. For example, the first day has 54977 temperature values, the second day has 54988, and the third differs as well.
In short, I want to overlay 30 plots so that the x-axis of the resulting graphic uses the time ticks of the first day, and the other 29 plots simply match those ticks, with the data trimmed so that all of them start at one point and finish at another. It doesn't matter if some hours or data get lost; I just want to make something like this (see last image).
This is the code so far. I'm not very good at Python, so don't judge my long code.
`
import pandas as pd
from datetime import date
import datetime as dt
import calendar
import numpy as np
import pylab as plt
import matplotlib.ticker as ticker
import seaborn as sns

datos = pd.read_csv("Jun2018T.txt", sep = ',', names=('Fecha', 'Hora', 'RADNETA', 'RADCORENT', 'RADCORSAL', 'RADINFENT', 'RADINFSAL', 'TEMP'))

datos['Hora'] = datos['Hora'].str[:9]
datos['Hora']

Dia01Jun2018 = datos[datos['Fecha'] == "2018-06-01"]

tiempo01=Dia01Jun2018['Hora']
temp01=Dia01Jun2018['TEMP']

imagen = plt.figure(figsize=(25,10))
plt.plot(tiempo01,temp01)
plt.xticks(np.arange(0, 54977, 7000)) #the number 54977 is the last data point of the first day; the second day has a different length, and so on for the rest of the days
plt.xlabel("Tiempo (H:M:S)(Formato 24 Horas)")
plt.ylabel("Temperatura (K)")
plt.title("Día 01 Jun 2018")
plt.show()
imagen.savefig('D1JUN2018')
`
The code above is repeated for every day. A loop would probably be quicker, but I don't handle Python very well.
The result is this graph:
(image)
The graph that I want is this:
(image)
My data is represented in this form:
(image)
and these are the formats:
(image)
If I understood your question correctly, you want to plot all days in a single plot. To do that, you have to create one figure and call plt.plot() for every day before you finally call plt.show(), so that the image includes all the plots made before. Try something like the code shown below:
(As I don't know your data, I don't know whether this code will work. The concept should be clear at least.)
import pandas as pd
from datetime import date
import datetime as dt
import calendar
import numpy as np
import pylab as plt
import matplotlib.ticker as ticker
import seaborn as sns

datos = pd.read_csv("Jun2018T.txt", sep = ',', names=('Fecha', 'Hora', 'RADNETA', 'RADCORENT', 'RADCORSAL', 'RADINFENT', 'RADINFSAL', 'TEMP'))

datos['Hora'] = datos['Hora'].str[:9]

imagen = plt.figure(figsize=(25,10))
for day in range(1,31):
    # Select one day and draw it on the shared figure
    dia = datos[datos['Fecha'] == "2018-06-"+(f"{day:02d}")]
    tiempo= pd.to_datetime(dia['Hora'], format='%H:%M:%S').dt.time  # the column is named 'Hora'
    temp= dia['TEMP']
    plt.plot(tiempo, temp)
#plt.xticks(np.arange(0, 54977, 7000))
plt.xlabel("Tiempo (H:M:S)(Formato 24 Horas)")
plt.ylabel("Temperatura (K)")
plt.title("Jun 2018")
plt.show()
imagen.savefig('JUN2018')
For the second part of your question:
As your data is stored with a timestamp, you can transform it into pandas time objects. When these are used for the plots, the x-axis should no longer have an offset between days. I've modified the tiempo = ... assignment in the code above accordingly.
The x-ticks should automatically be in time mode now.
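An alternative way to put days of different lengths on one common x-axis (not from the original answer) is to convert every timestamp to hours since midnight, so the number of samples per day no longer matters. A minimal sketch with made-up data follows. The column name stamp, the sample counts, and the temperature values are assumptions for illustration, not the real file.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for the real file: three days with different numbers of samples
rng = np.random.default_rng(0)
frames = []
for day, n in [("2018-06-01", 500), ("2018-06-02", 480), ("2018-06-03", 510)]:
    seconds = np.sort(rng.uniform(0, 86400, n))
    stamps = pd.Timestamp(day) + pd.to_timedelta(seconds, unit="s")
    frames.append(pd.DataFrame({"stamp": stamps, "TEMP": 290 + rng.normal(0, 1, n)}))
datos = pd.concat(frames, ignore_index=True)

# Hours since midnight give every day the same numeric x-axis
midnight = datos["stamp"].dt.normalize()
datos["hours"] = (datos["stamp"] - midnight).dt.total_seconds() / 3600

fig, ax = plt.subplots(figsize=(10, 4))
for day, dia in datos.groupby(datos["stamp"].dt.date):
    ax.plot(dia["hours"], dia["TEMP"], label=str(day))  # one line per day
ax.set_xlim(0, 24)
ax.set_xticks(list(range(0, 25, 3)))
ax.set_xlabel("Hour of day")
ax.set_ylabel("Temperatura (K)")
ax.legend()
```

With a numeric axis, trimming all days to a common window is a matter of changing the set_xlim bounds.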

Plotting a time series using matplotlib with 24 hours on the y-axis

If I run the following, it appears to work as expected, but the y-axis is limited to the earliest and latest times in the data. I want it to show midnight to midnight. I thought I could do that with the code that's commented out. But when I uncomment it, I get the correct y-axis, yet nothing plots. Where am I going wrong?
from datetime import datetime
import matplotlib.pyplot as plt
data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27', '2018-01-04 11:55:09']
x = []
y = []
for i in range(0, len(data)):
    t = datetime.strptime(data[i], '%Y-%m-%d %H:%M:%S')
    x.append(t.strftime('%Y-%m-%d')) # X-axis = date
    y.append(t.strftime('%H:%M:%S')) # Y-axis = time
plt.plot(x, y, '.')
# begin = datetime.strptime('00:00:00', '%H:%M:%S').strftime('%H:%M:%S')
# end = datetime.strptime('23:59:59', '%H:%M:%S').strftime('%H:%M:%S')
# plt.ylim(begin, end)
plt.show()
Edit: I also noticed that the x-axis isn't right either. The data skips Jan 2, but I want that on the axis so the data is to scale.
This is a dramatically simplified version of code dealing with over a year's worth of data with over 2,500 entries.
If Pandas is available to you, consider this approach:
import pandas as pd
data = pd.to_datetime(data, yearfirst=True)
plt.plot(data.date, data.time)
_=plt.ylim(["00:00:00", "23:59:59"])
Update per comments
X-axis date formatting can be adjusted using the Locator and Formatter methods of the matplotlib.dates module. Locator finds the tick positions, and Formatter specifies how you want the labels to appear.
Sometimes Matplotlib/Pandas just gets it right, other times you need to call out exactly what you want using these extra methods. In this case, I'm not sure why those numbers are showing up, but this code will remove them.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
f, ax = plt.subplots()
data = pd.to_datetime(data, yearfirst=True)
ax.plot(data.date, data.time)
ax.set_ylim(["00:00:00", "23:59:59"])
days = mdates.DayLocator()
d_fmt = mdates.DateFormatter('%m-%d')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(d_fmt)
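For comparison, here is a pandas-free sketch (not from the original answers) that plots real date objects on the x-axis and the time of day as a number of hours on the y-axis. Both axes are then to scale, and the y-limits can simply be 0 and 24. The three-hour tick spacing and the label format are arbitrary choices made for illustration.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27', '2018-01-04 11:55:09']
stamps = [datetime.strptime(s, '%Y-%m-%d %H:%M:%S') for s in data]
days = [t.date() for t in stamps]                                   # x: calendar date
hours = [t.hour + t.minute / 60 + t.second / 3600 for t in stamps]  # y: hours since midnight

fig, ax = plt.subplots()
ax.plot(days, hours, '.')
ax.set_ylim(0, 24)                                # midnight to midnight
ax.set_yticks(list(range(0, 25, 3)))
ax.yaxis.set_major_formatter(FuncFormatter(lambda h, pos: '%02d:00' % h))
ax.xaxis.set_major_locator(mdates.DayLocator())   # one tick per day, including days without data
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))
```

Because the x values are dates rather than strings, Jan 2 gets its own position on the axis even though it has no data.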

Matplotlib xticks as days

I have a simple question. I have a program that simulates a week or a month in the life of a shop. For now it takes care of the cash desks (I don't know if I translated that correctly from my language), as they can fail sometimes, and then a specialist has to come to the shop and repair them. At the end of the simulation, the program plots a graph which looks like this:
The 1.0 state occurs when the cash desk has hit an error or broken down; it then waits for a technician to repair it and afterwards goes back to 0, the working state.
I, or rather my project supervisor, would like to see something other than minutes on the x-axis. How can I do that? I mean, I would like it to show Day 1, then an interval, then Day 2, and so on.
I know about the pyplot.xticks() method, but it assigns the labels to the ticks listed in its first argument, so I would have to make something like 2000 labels with minutes, and I want only 7, with the days written on them.
You can use matplotlib's set_ticks and set_xticklabels methods of ax, placing one tick at the start of each day.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
minutes_in_day = 24 * 60
test = pd.Series(np.random.binomial(1, 0.002, 7 * minutes_in_day))
fig, ax = plt.subplots(1)
test.plot(ax = ax)
# One tick at the first minute of every day
ticks = np.arange(0, len(test), minutes_in_day)
ax.xaxis.set_ticks(ticks)
# Build the labels from the tick positions rather than from the label text, which may still be empty
labels = ['Day\n %d'%(tick/minutes_in_day + 1) for tick in ticks]
ax.set_xticklabels(labels)
I get something like the picture below.
You're on the right track with plt.xticks(). Try this:
import matplotlib.pyplot as plt
# Generate dummy data
x_minutes = range(1, 2001)
y = [i*2 for i in x_minutes]
# Convert minutes to days
x_days = [i/1440.0 for i in x_minutes]
# Plot the data over the newly created days list
plt.plot(x_days, y)
# Create labels using some string formatting
labels = ['Day %d' % (item) for item in range(int(min(x_days)), int(max(x_days)+1))]
# Set the tick strings
plt.xticks(range(len(labels)), labels)
# Show the plot
plt.show()
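Another option (not from the original answers) is to keep the data in minutes and let a locator and a formatter do the conversion, so no label list has to be built by hand. A minimal sketch, assuming a made-up failure window; the helper name day_label is hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MultipleLocator

MINUTES_IN_DAY = 24 * 60

# Made-up simulation result: the cash desk is broken between minute 3000 and 3200
minutes = list(range(7 * MINUTES_IN_DAY))
state = [1 if 3000 <= m < 3200 else 0 for m in minutes]


def day_label(minute, pos=None):
    """Turn a tick position in minutes into a 'Day N' label (hypothetical helper)."""
    return "Day %d" % (minute // MINUTES_IN_DAY + 1)


fig, ax = plt.subplots()
ax.plot(minutes, state)
ax.set_xlim(0, 7 * MINUTES_IN_DAY - 1)
ax.xaxis.set_major_locator(MultipleLocator(MINUTES_IN_DAY))  # one tick per day
ax.xaxis.set_major_formatter(FuncFormatter(day_label))
```

The same two lines work unchanged for a month-long simulation, since the locator produces as many day ticks as the axis range needs.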
