I bumped into a problem when plotting a pandas series.
When plotting the series with a datetime x-axis, x-axis is accordingly relabeled when zooming, i.e. it works fine:
from matplotlib import pyplot as plt
from numpy.random import randn
from pandas import Series,date_range
import numpy as np, pandas as pd
date_index = date_range('1/1/2016', periods=6*24*7, freq='10Min')
ts = Series(randn(len(date_index)), index=date_index)
ts.plot(); plt.show()
However, when I redefine the series index as strings, something strange happens: the zoom no longer works properly (the limits do not seem to change).
sindex=np.vectorize(lambda s: s.strftime('%d.%m %H:%M'))(ts.index.to_pydatetime())
ts = Series(randn(len(date_index)), index=sindex)
ts.plot(); plt.show()
Is this a bug, or am I misusing or misunderstanding something? Advice/help would be very welcome.
I also noticed that plotting with kind='bar' is incredibly slow compared to the default (with longer vectors), and I am not sure what causes that...
When you format your date labels as strings before plotting, you lose all the actual date information; they're just strings now. This means that pandas / matplotlib can't reformat the tick labels when you zoom. See the first paragraph after the plot here.
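A minimal sketch of the alternative, assuming plain matplotlib plotting is acceptable: keep the DatetimeIndex and only change how the tick labels are rendered with `mdates.DateFormatter`, using the same `'%d.%m %H:%M'` format as the question. Because the axis still holds real dates (and keeps matplotlib's automatic date locator), the ticks are recomputed when you zoom. Plotting with `ax.plot` rather than `ts.plot()` is a deliberate choice here: pandas' own time-series plotting may use period-based x values, which a matplotlib `DateFormatter` would not interpret correctly.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Same data as in the question: one week of 10-minute samples
date_index = pd.date_range("2016-01-01", periods=6 * 24 * 7, freq="10min")
ts = pd.Series(np.random.randn(len(date_index)), index=date_index)

fig, ax = plt.subplots()
# Keep the real timestamps on the x-axis...
ax.plot(ts.index, ts.values)
# ...and only change how the tick labels are rendered
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d.%m %H:%M"))
fig.autofmt_xdate()  # rotate labels so they do not overlap
plt.show()
```

Zooming in now produces new tick positions, each labelled in the day.month hour:minute style.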
For your second question: a bar plot draws a tick and a bar for every data point, which gets expensive for large series. At this time pandas bar plots are not hooked into the automatic tick formatting the way line plots are. You can do a bar plot directly with matplotlib, though, and suppress some of the ticks yourself.
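A sketch of that matplotlib route, assuming one labelled tick per day is enough: draw all bars in a single `ax.bar` call at integer positions and label only every 144th bar (one per day at 10-minute spacing). The step of 144 is an illustrative choice, not something prescribed by the answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# One week of 10-minute samples, as in the question
date_index = pd.date_range("2016-01-01", periods=6 * 24 * 7, freq="10min")
ts = pd.Series(np.random.randn(len(date_index)), index=date_index)

fig, ax = plt.subplots(figsize=(10, 4))
# Draw every bar in one call at integer positions 0..n-1
positions = np.arange(len(ts))
ax.bar(positions, ts.values, width=1.0)

# Put a tick on only one bar in every `step` (here: one per day)
step = 144
ax.set_xticks(positions[::step])
ax.set_xticklabels(ts.index[::step].strftime("%d.%m"), rotation=45, ha="right")
plt.show()
```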
I am using the mplfinance package to plot candlestick charts of a stock. I am currently trying to figure out how to change the formatting of the volume in mplfinance. In all examples provided by the package, and in my own chart, the volume comes out in scientific notation like 1e23. I would like the volume to reflect the actual numerical values in the pandas DataFrame. I trade myself, and when I look at charts on actual trading platforms, the volume is shown normally. But in the matplotlib, pandas and mplfinance examples online, the notation is formatted this strange way everywhere.
Alternatively, to show the volumes without scientific notation while keeping the original (unscaled) values, using the same data/code as in the answer from @r-beginners:
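import pandas as pd
import mplfinance as mpf
import matplotlib.ticker as mticker
daily = pd.read_csv('data/SP500_NOV2019_Hist.csv',index_col=0,parse_dates=True)
fig, axlist = mpf.plot(daily,type='candle',volume=True,
title='\nS&P 500, Nov 2019',
ylabel='OHLC Candles',
ylabel_lower='Shares\nTraded',
returnfig=True)
# axlist[2] is the volume panel: show plain integers instead of scientific notation
axlist[2].yaxis.set_major_formatter(mticker.FormatStrFormatter('%d'))
mpf.show()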
In theory it would be relatively easy to enhance mplfinance to accept a kwarg for formatting the axis labels, but for now the above will work.
The volume axis switches to exponential (scientific) notation automatically when the values are large, so you can avoid it by rescaling the original data to a larger unit. The following example deals with the problem by dividing the volume by 1 million. The data is taken from the official website.
daily['Volume'] = daily['Volume'] / 1000000
Here is the full code:
%matplotlib inline
import pandas as pd
daily = pd.read_csv('data/SP500_NOV2019_Hist.csv',index_col=0,parse_dates=True)
daily['Volume'] = daily['Volume'] / 1000000
import mplfinance as mpf
mpf.plot(daily,type='candle',volume=True,
title='\nS&P 500, Nov 2019',
ylabel='OHLC Candles',
ylabel_lower='Shares Traded\n(millions)')
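A middle ground between the two answers, sketched under the assumption that a compact label such as "2.5M" is acceptable: keep the original volumes and only change the tick text with `matplotlib.ticker.FuncFormatter`. `human_volume` is a hypothetical helper, and the bar values below are made up for illustration; with mplfinance the same `set_major_formatter` call would go on the volume Axes returned by `mpf.plot(..., returnfig=True)` (`axlist[2]` in the answer above).

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker


def human_volume(x, pos=None):
    """Hypothetical helper: format a tick value like 2_500_000 as '2.5M'."""
    for divisor, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(x) >= divisor:
            return f"{x / divisor:g}{suffix}"
    return f"{x:g}"


fig, ax = plt.subplots()
# Made-up volumes, only to have something on the axis
ax.bar(range(3), [1_200_000, 2_500_000, 3_100_000])
# The data stays unscaled; only the tick labels change
ax.yaxis.set_major_formatter(mticker.FuncFormatter(human_volume))
plt.show()
```

If you prefer the full number with thousands separators instead, `mticker.StrMethodFormatter("{x:,.0f}")` can be passed to `set_major_formatter` in the same way.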
I'm trying to filter some data from my plot. Specifically it is the 'hair' at the top of the cycle that I wish to remove.
The 'hair' is caused by numerical errors, and I don't know of any Python method, or how to write code, that would filter out points that occur only rarely.
I wish to filter the data as I plot it.
You can smooth the data with a rolling mean to get rid of the noise:
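import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# ts stands in for your data as a pandas Series (here: a noisy sine wave)
ts = pd.Series(np.sin(np.linspace(0, 20, 500)) + 0.1 * np.random.randn(500))
ts.plot(alpha=0.5)  # raw data
# pd.rolling_mean was removed from pandas; use the .rolling() method instead
smooth_data = ts.rolling(5).mean()
smooth_data.plot(style='k')
plt.show()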
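If the 'hair' consists of isolated spikes rather than continuous noise, a rolling median is a common alternative to the rolling mean (it is a swapped-in technique, not the one from the answer above): an isolated outlier never becomes the median of its window, so it is removed instead of smeared out. The sketch below assumes one-sample spikes; the signal and spike positions are made up.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data (assumption): a periodic signal with a few isolated spikes
t = np.linspace(0, 4 * np.pi, 400)
ts = pd.Series(np.sin(t))
ts.iloc[[50, 150, 250]] += 3.0  # inject three one-sample outliers

# A centred rolling median over 5 samples replaces each isolated spike with a
# neighbouring value while keeping the slow cycle shape
filtered = ts.rolling(window=5, center=True, min_periods=1).median()

ax = ts.plot(alpha=0.4, label="raw")
filtered.plot(ax=ax, style="k", label="rolling median")
ax.legend()
plt.show()
```

The window length is a trade-off: it must be wider than the spikes (about twice their width plus one) but narrow enough not to flatten real features of the cycle.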
All I'd like to do is make a simple barchart of the populations of the countries of the world.
Ideally, the x-axis would have country names, small font, but slanting diagonally; the y-axis would be logarithmic.
Here's what I'm doing so far:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
cols = ['Rank', 'Country', 'UN_Continental_region', 'UN_Statistical_region', 'Population', 'Population2015', 'percent_change']
pop_list = pd.read_table('country.dat', names=cols)
pop_list['Population'].plot().hist(alpha=0.5)
plt.show()
The plot().hist line gives a TypeError: hist() missing 1 required positional argument: 'x'
error, but then the plt.show() does make a line plot of the populations.
What's going on???!!
The full code can be found here:
https://github.com/d80b2t/python/blob/master/wealth_and_population/population_barchart_forStackOverflow.ipynb
The reason you get a TypeError is that you are calling plot().hist() when you should be calling plot.hist() (note the lack of parentheses after plot).
The DataFrame.plot object, without parentheses, is a pandas plot API instance, which has a hist() method that references the associated DataFrame. When you call that plot API object itself with DataFrame.plot(), it returns a matplotlib Axes instance, whose hist() method requires an array as the first argument (it's the normal ax.hist() method you might be familiar with from matplotlib).
The reason the plot still shows up is that when you do pop_list.plot(), it creates a line graph. It's not until you call the hist() method of that axis that you get the error... you've already created the plot!
So, to get rid of the TypeError, use
pop_list['Population'].plot.hist(alpha=0.5)
But fixing that will give you a histogram of populations, and it sounds like you want a bar chart, not a histogram. To create a bar chart, you can use pop_list.plot.bar. This is something like what you want:
pop_list.plot.bar('Country', 'Population')
But there are so many countries in your data that the chart is too busy to be very useful.
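For the layout asked for in the question (small, diagonal country names and a logarithmic y-axis), a sketch along these lines should work; the frame below is a placeholder with made-up values standing in for `pop_list`, and the font size, rotation and figure size are illustrative choices. The `logy=True` keyword of pandas' plotting API switches the y-axis to a log scale.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder frame with the same two columns as pop_list (values are made up)
pop_list = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C", "Country D"],
    "Population": [1_400_000_000, 330_000_000, 5_000_000, 30_000],
})

ax = pop_list.plot.bar(x="Country", y="Population", logy=True,
                       legend=False, figsize=(12, 4))
# Small, diagonal country names
ax.tick_params(axis="x", labelsize=6)
plt.setp(ax.get_xticklabels(), rotation=45, ha="right")
ax.set_ylabel("Population")
plt.tight_layout()
plt.show()
```

Sorting first with `pop_list.sort_values("Population", ascending=False)` usually makes a chart with many countries easier to read.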
I am often working interactively in an ipython shell or ipython notebook.
Say, I had a pandas.DataFrame with a DatetimeIndex, like this:
import numpy as np, pandas as pd, matplotlib.pyplot as plt
idx = pd.date_range("12:00", periods=400000, freq="s")
df = pd.DataFrame({"temp":np.random.normal(size=len(idx))}, index=idx)
In order to plot it, I can simply do:
plt.plot(df.temp, '.')
As one can see, neither do I have to specify the x-axis data, as it is nicely inferred from the DataFrame, nor do I have to specify that I actually want the x-axis to be date based. (Gone are the times of plt.plot_date)
That's awesome!
But the x-axis looks ugly in two ways:
the labels overlap and are hard to read.
the labels show hours instead of dates.
One can almost fix this problem, e.g. like this:
plt.plot(df.temp, '.')
import matplotlib.dates as mdates
plt.gca().xaxis.set_major_formatter(
mdates.DateFormatter('%d-%m-%Y %H:%M:%S'))
plt.gcf().autofmt_xdate()
As one can see in the resulting plot, the leftmost date label is clipped.
So by increasing the code size by 300% one can almost get a nice plot.
Now to my question:
For the life of me I cannot remember these two or three lines, which I always have to type in when making date-based plots. It makes the interface feel clumsy and slow. I always have to google for the solution...
Can I set up matplotlib so that it remembers my personal defaults for date-based plotting?
I guess I could hack into the plot function. But maybe there is a way using matplotlib's rcParams, which I wasn't able to find.
As I said above, plt.plot already goes a long way to guess what I want. It guesses the x-axis data to be the index of the DataFrame, and it guesses it should plot a date-based x-axis instead of the numerical representation of the dates. How can I add something to this?
I'm even thinking of giving it hints like:
plt.plot(df.temp, '.', date_fmt='%d-%m-%Y %H:%M:%S')
or
plt.plot(df.temp, '.', autofmt_xdate=True)
You can use DateFormatter:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
fig, ax = plt.subplots(figsize=(8,5))
ax.plot(df["temp"], '.')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y %H:%M:%S'))
# rotates the tick labels automatically
fig.autofmt_xdate()
plt.show()
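To address the rcParams part of the question, a sketch that assumes Matplotlib 3.4 or newer (where the `date.converter` key exists): setting `date.converter` to `"concise"` makes `ConciseDateFormatter` the default for every date axis, and `figure.constrained_layout.use` keeps edge labels inside the figure. Putting these lines in a startup script, or the same keys in your `matplotlibrc`, would make them personal defaults.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Personal defaults (requires Matplotlib >= 3.4 for "date.converter")
plt.rcParams["date.converter"] = "concise"            # compact, non-overlapping date labels
plt.rcParams["figure.constrained_layout.use"] = True  # avoid clipped edge labels

idx = pd.date_range("12:00", periods=400000, freq="s")
df = pd.DataFrame({"temp": np.random.normal(size=len(idx))}, index=idx)

fig, ax = plt.subplots()
ax.plot(df.index, df["temp"], ".")  # no per-plot formatter calls needed
plt.show()
```

If you want a fixed format string such as `'%d-%m-%Y %H:%M:%S'` instead, the default (`"auto"`) converter reads its per-scale formats from the `date.autoformatter.*` keys, e.g. `plt.rcParams["date.autoformatter.minute"] = "%d-%m-%Y %H:%M"`.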
I am plotting a pandas Series indexed by date. When I call series.plot(), the chart is generated correctly. The problem is that when I hover the mouse over interesting points on the chart, the status bar only shows the month and year of that point, not its exact date.
Below is a sample of the code. Depending on luck, when I mouse over the line, sometimes I see the exact date displayed on the status bar but sometimes I only see year and month.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
idx = pd.date_range('2011-1-1', '2015-1-1')
x = pd.Series(np.cumsum(np.random.randn(len(idx))), idx)
df = pd.DataFrame(x)
df.plot()
plt.show()
Is there any way to display the exact date? How does matplotlib control what is displayed in the status bar? I wonder whether it has something to do with pandas changing the default configuration after some code is called.
When I run your code, everything seems to work and the complete date (the x-coordinate) is shown in the status bar all the time. But the coordinates are also shown when I am not directly over the graph, so it is difficult to know the actual values of your graph. Are you looking for a tooltip that shows the exact date when mousing over the graph, or are the right values (complete dates) in the status bar enough? Can you make a screenshot of what your problem looks like when it occurs, and provide details on the versions you are using? I am on matplotlib 1.4.3 and numpy 1.9.2 combined with pandas 0.15.2.
Also have a look at the matplotlib recipes (http://matplotlib.org/users/recipes.html)! Section "Fixing common date annoyances" sounds very similar to your problem!
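A sketch of one way to force the full date in the status bar, assuming plotting through matplotlib directly is acceptable: `Axes.fmt_xdata` sets the function used to format the x value shown in the status bar, independently of the tick labels. With `df.plot()` on a regular-frequency index, pandas may use its own period-based x values and coordinate display, so a matplotlib `DateFormatter` attached there may not take effect; plotting the index with `ax.plot` (or passing `x_compat=True` to `df.plot`) keeps matplotlib's date units.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

idx = pd.date_range("2011-1-1", "2015-1-1")
x = pd.Series(np.cumsum(np.random.randn(len(idx))), idx)

fig, ax = plt.subplots()
# Plot through matplotlib so the x values are plain matplotlib date numbers
ax.plot(x.index, x.values)
# fmt_xdata controls only the x text in the status bar, not the tick labels
ax.fmt_xdata = mdates.DateFormatter("%Y-%m-%d")
plt.show()
```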