I am often working interactively in an ipython shell or ipython notebook.
Say, I had a pandas.DataFrame with a DatetimeIndex, like this:
import numpy as np
import pandas as pd
idx = pd.date_range("12:00", periods=400000, freq="S")
df = pd.DataFrame({"temp":np.random.normal(size=len(idx))}, index=idx)
In order to plot it, I can simply do:
plt.plot(df.temp, '.')
As one can see, neither do I have to specify the x-axis data, as it is nicely inferred from the DataFrame, nor do I have to specify that I actually want the x-axis to be date based. (Gone are the times of plt.plot_date)
That's awesome!
But the x-axis looks ugly in two ways:
the labels overlap and are hard to read.
the labels show hours instead of dates.
One can almost repair this problem, e.g. like this:
plt.plot(df.temp, '.')
import matplotlib.dates as mdates
plt.gca().xaxis.set_major_formatter(
mdates.DateFormatter('%d-%m-%Y %H:%M:%S'))
plt.gcf().autofmt_xdate()
As one can see in the resulting plot, the leftmost date label is clipped.
So by increasing the code size by 300% one can almost get a nice plot.
Now to my question:
I cannot for the life of me remember these 2 or 3 lines, which I have to type every time I make a date-based plot. It makes the interface feel clumsy and slow, and I always have to google for the solution ...
Can I set up matplotlib so that it remembers my personal defaults for date-based plotting?
I guess, I could actually hack into the plot function. But maybe there is a way using these matplotlib.rc_params, which I wasn't able to find.
As I said above, plt.plot goes a long way to guess what I want. It guesses that the x-axis data is the index of the DataFrame, and it guesses that it should plot a date-based x-axis instead of the numerical representation of the dates. How can I add something to this?
I'm thinking of maybe even giving it some hints like:
plt.plot(df.temp, '.', date_fmt='%d-%m-%Y %H:%M:%S')
or
plt.plot(df.temp, '.', autofmt_xdate=True)
You can use DateFormatter:
import matplotlib.dates as mdates
fig, ax = plt.subplots(figsize=(8,5))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y %H:%M:%S'))
#rotates the tick labels automatically
fig.autofmt_xdate()
ax.plot(df["temp"], '.')
plt.show()
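One way to avoid retyping these lines is to wrap them in a small helper and keep it in a personal module or an IPython startup file. The following is only a sketch under that assumption: plot_dates and its date_fmt parameter are hypothetical names chosen for illustration, not part of matplotlib, and the sample data is synthetic.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_dates(series, style=".", date_fmt="%d-%m-%Y %H:%M:%S", ax=None):
    """Plot a datetime-indexed Series with a fixed date format.

    Hypothetical convenience wrapper; not a matplotlib function.
    """
    if ax is None:
        _, ax = plt.subplots()
    ax.plot(series.index, series.to_numpy(), style)
    # Apply the personal default date format to the x-axis.
    ax.xaxis.set_major_formatter(mdates.DateFormatter(date_fmt))
    # Rotate and right-align the tick labels so they do not overlap.
    ax.figure.autofmt_xdate()
    return ax


# Synthetic data, similar in spirit to the question's DataFrame.
idx = pd.date_range("2020-01-01 12:00", periods=1000, freq="min")
df = pd.DataFrame({"temp": np.random.normal(size=len(idx))}, index=idx)
ax = plot_dates(df["temp"])
```

In an interactive session, calling plot_dates(df["temp"], date_fmt="%d-%m %H:%M") would then replace the three formatting lines. Whether a wrapper like this or an rcParams-based setup fits better probably depends on how often the format needs to change.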
I want to plot a continuous 'Time' column against dates on a simple timeseries linechart in plotly express. The 'Time' column starts out as a string, in the format HH:MM:SS, but when I plot this outright it is treated as discrete values. To remedy this and make it continuous I tried converting to timedelta data type, using pd.to_timedelta. This correctly converts my column into nanoseconds and the shape of the line and axis looks correct. However I do not want to display the axis as nanoseconds, or any other fixed unit, I would like it to display as HH:MM:SS, but am unsure how I might format this.
There is no easy way to do this in express. But if you use Plotly graph objects (plotly.graph_objects) you can use this piece of code, taken directly from their website.
fig.update_xaxes(
ticktext=["End of Q1", "End of Q2", "End of Q3", "End of Q4"],
tickvals=["2016-04-01", "2016-07-01", "2016-10-01"],
)
It maps each entry in tickvals to the display text at the same position in ticktext. This should maintain the axis scale if the tickvals are scalar values. In addition, in this example the same pieces of text could be repeated for every following year.
Here is the link to their website: Plotly Axes with Labels
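For the HH:MM:SS case in the question, the tickvals/ticktext pair could be generated rather than typed by hand. The sketch below assumes the 'Time' column has been converted to plain seconds (for example with pd.to_timedelta(...).dt.total_seconds()) before plotting; hhmmss_ticks is a hypothetical helper name, and the range and step are made-up values.

```python
def hhmmss_ticks(max_seconds, step_seconds):
    """Return tick positions (in seconds) and matching HH:MM:SS labels.

    Hypothetical helper for illustration; not part of plotly.
    """
    tickvals = list(range(0, int(max_seconds) + 1, int(step_seconds)))
    ticktext = []
    for total in tickvals:
        hours, remainder = divmod(total, 3600)
        minutes, seconds = divmod(remainder, 60)
        ticktext.append(f"{hours:02d}:{minutes:02d}:{seconds:02d}")
    return tickvals, ticktext


# Ticks every 30 minutes over a two-hour range (assumed values).
tickvals, ticktext = hhmmss_ticks(max_seconds=7200, step_seconds=1800)
```

The two lists can then be passed to fig.update_yaxes(tickvals=tickvals, ticktext=ticktext), so the axis stays continuous in seconds while the labels read as clock times.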
I am using the mplfinance package to plot candlestick charts of a stock. I am currently trying to figure out how I can change the formatting of the volume in mplfinance. In all examples provided by the package, and in my own chart, the volume comes out in a strange notation like 1e23 etc. I would like my volume to reflect the numerical value that is actually in the pandas dataframe. I trade myself, and when I look at charts on actual trading platforms, they show the actual volume. But in the matplotlib, pandas, and mplfinance examples online, the volume is formatted in this strange notation everywhere.
Example of what I am talking about
Alternatively, to show the volumes not in scientific notation, but keeping the original values (not scaled down) ... using the same data/code as in the answer from @r-beginners ...
fig, axlist = mpf.plot(daily,type='candle',volume=True,
title='\nS&P 500, Nov 2019',
ylabel='OHLC Candles',
ylabel_lower='Shares\nTraded',
returnfig=True)
import matplotlib.ticker as mticker
axlist[2].yaxis.set_major_formatter(mticker.FormatStrFormatter('%d'))
mpf.show()
The result:
In theory it would be relatively easy to enhance mplfinance to accept a kwarg for formatting the axis labels; but for now the above will work.
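If plain digits are hard to read for large volumes, a formatter with thousands separators is a possible variation on the same idea. This is only a sketch: volume_formatter is a name chosen here for illustration, and the sample value is made up.

```python
import matplotlib.ticker as mticker

# FuncFormatter calls the function with the tick value and its position.
volume_formatter = mticker.FuncFormatter(lambda value, pos: f"{value:,.0f}")

# Quick check of how a single tick value would be rendered.
label = volume_formatter(1234567, 0)
```

It would be attached the same way as the FormatStrFormatter, i.e. axlist[2].yaxis.set_major_formatter(volume_formatter), assuming axlist comes from mpf.plot(..., returnfig=True).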
The volume axis switches to exponential notation automatically when the volume values are large, so one way to avoid it is to scale the original data down to a larger unit. The following example deals with the problem by dividing the volume by 1 million. The data is taken from the official website.
daily['Volume'] = daily['Volume'] / 1000000
Here is the complete code:
%matplotlib inline
import pandas as pd
daily = pd.read_csv('data/SP500_NOV2019_Hist.csv',index_col=0,parse_dates=True)
daily['Volume'] = daily['Volume'] / 1000000
import mplfinance as mpf
mpf.plot(daily,type='candle',volume=True,
title='\nS&P 500, Nov 2019',
ylabel='OHLC Candles',
ylabel_lower='Shares\nTraded')
Example of normal output
So I am trying to set up a chart in python to show the development of an inter-month spread over the year (i.e. Oct/Nov 2015, Oct/Nov 2016, and so on).
Currently when I plot, it shows me the whole timeline on the x-axis from 2015 to however far I go.
Preferably I would like to show number of days rather than actual date on X-axis, since they are all over a year.
I've tried the following code:
#Fetching curve
curve_name = 'Oct/Nov'
OctNov = get_forward_curve_history(name=curve_name, start_date='2019-01-01', end_date=date)
#plotting spread
# 'Oct/Nov' is not a valid Python identifier, so the spread is named oct_nov_spread
oct_nov_spread = Med4.loc['2019-10-01':'2019-10-31'].mean() - JKM5.loc['2019-11-01':'2019-11-30'].mean()
oct_nov_spread.plot()
#legend and grid commands
plt.gca().legend(('Oct/Nov17','Oct/Nov18','Oct/Nov19'))
plt.grid()
plt.show()
I would expect something like the below, where we can see different years but on the same X-axis scale (roughly 365 days):
If I understand correctly you just want to plot a bunch of years worth of data on the same graph?
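If so, you can either add each year's data to the same figure with repeated plot calls and show the figure at the end, or read all the data and plot it all at once. Older matplotlib versions needed plt.hold(True) for the former; it was removed in matplotlib 3.0, where adding to the current axes is the default behavior.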
It is very hard to produce any code without the original data but this may help:
Python equivalent to 'hold on' in Matlab
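Since the original data is not available, the following is only a sketch on synthetic data of one way to put several years on a shared day-of-year axis: the series, its date range, and the names spread and by_year are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily "spread" covering three full years (assumed data).
idx = pd.date_range("2017-01-01", "2019-12-31", freq="D")
rng = np.random.default_rng(0)
spread = pd.Series(rng.normal(size=len(idx)).cumsum(), index=idx)

# One column per year, indexed by day of year (1..365).
by_year = pd.DataFrame(
    {"year": idx.year, "day": idx.dayofyear, "spread": spread.to_numpy()}
).pivot(index="day", columns="year", values="spread")

# Each column becomes one line, so all years share the same x-axis.
ax = by_year.plot()
ax.set_xlabel("Day of year")
ax.grid(True)
```

With real data, the pivot step would stay the same as long as the series has a DatetimeIndex; leap years would simply add a day 366 with missing values in the other columns.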
I bumped into a problem when plotting a pandas series.
When plotting the series with a datetime x-axis, x-axis is accordingly relabeled when zooming, i.e. it works fine:
from matplotlib import pyplot as plt
from numpy.random import randn
from pandas import Series,date_range
import numpy as np, pandas as pd
date_index = date_range('1/1/2016', periods=6*24*7, freq='10Min')
ts = Series(randn(len(date_index)), index=date_index)
ts.plot(); plt.show()
However, when I redefine the series index as strings, something strange happens: the zoom no longer works properly (the limits do not seem to change).
sindex=np.vectorize(lambda s: s.strftime('%d.%m %H:%M'))(ts.index.to_pydatetime())
ts = Series(randn(len(date_index)), index=sindex)
ts.plot(); plt.show()
Is this a bug, or am I misusing or misunderstanding something? Advice or help would be very welcome.
I also noticed that plotting with kind='bar' is incredibly slow compared to the default (with longer vectors), and I am not sure what the cause of that would be...
When you format your date labels as strings before plotting, you lose all the actual date information; they're just strings now. This means that pandas / matplotlib can't reformat the tick labels when you zoom. See the first paragraph after the plot here.
For your second question, a bar plot will draw a tick and a bar for every data point. For large series this gets expensive. At this time pandas bar plots are not hooked into the auto-formatting like plot is. You can do a bar plot directly with matplotlib though, and suppress some of the ticks yourself.
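To keep both the custom label format and working zoom, one option is to leave the index as datetimes and set the format on the axis instead. The sketch below assumes plotting through matplotlib directly rather than through Series.plot; the data is random, as in the question.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same shape of data as in the question: one week at 10-minute steps.
date_index = pd.date_range("2016-01-01", periods=6 * 24 * 7, freq="10min")
ts = pd.Series(np.random.randn(len(date_index)), index=date_index)

fig, ax = plt.subplots()
# The x values stay real datetimes, so tick locations update on zoom.
ax.plot(ts.index, ts.to_numpy())
# Only the label text is changed to the day.month hour:minute format.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d.%m %H:%M"))
fig.autofmt_xdate()
```

Because only the formatter is replaced, the tick positions are still chosen by matplotlib's date locator when the view limits change.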
All,
I'm comparing a few time series using matplotlib's broken_barh command---it's a nice way to compare data gaps in multiple, simultaneous, time series. Because broken_barh appears to only take numerical input, I'm using POSIX times.
Once finished, I would like to change the x-axis to display nicely formatted dates and times.
Now, I can do this directly through something like:
xtick_locations = ax.xaxis.get_majorticklocs()
xtick_new_labels = [
'{0:04n}-{1:02n}-{2:02n}\n{3:02n}:{4:02n}:{5:02n}'.format(
t.year,t.month,t.day,t.hour,t.minute,t.second) for t in
[pd.Timestamp(tick,unit='s') for tick in xtick_locations]]
plt.xticks(xtick_locations,xtick_new_labels,rotation='vertical')
But then my times and dates are not located at nice, logical locations, e.g. the start of days, weeks, months, and so forth as dependent upon the plotted range.
Does anyone know a way to use matplotlib.dates.AutoDateLocator and matplotlib.dates.AutoDateFormatter to intelligently calculate and display new tick locations given my original, numerical x-axis? Can I suppress the original axis and overlay it with a new one?
Thanks!
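One possible approach, sketched here under assumptions, is to convert the POSIX times to matplotlib date numbers before calling broken_barh, so that AutoDateLocator and AutoDateFormatter can be attached to the same axis. The start times and durations below are made-up values; bar widths are expressed in days because matplotlib date numbers count days.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

# Assumed sample data: segment starts (POSIX seconds) and lengths (seconds).
starts_posix = np.array([1451606400, 1451779200])
durations_s = np.array([86400, 43200])

# Convert the starts to matplotlib date numbers and the widths to days.
start_nums = mdates.date2num(starts_posix.astype("datetime64[s]"))
widths_days = durations_s / 86400.0
xranges = list(zip(start_nums, widths_days))

fig, ax = plt.subplots()
ax.broken_barh(xranges, (0, 1))

# Let matplotlib pick date-aware tick locations and label formats.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.AutoDateFormatter(locator))
fig.autofmt_xdate()
```

With this setup the ticks should fall on natural boundaries such as days or hours, depending on the plotted range, and no second axis needs to be overlaid.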