Pandas timeseries plot -- show date when hovering mouse over the line - python

I am plotting a pandas Series whose index has one entry per day. Calling series.plot() generates the chart correctly. The problem is that when I hover the mouse over interesting points on the chart, the status bar shows only the month and year of that point, not the exact date.
Below is a sample of the code. When I mouse over the line, I sometimes see the exact date in the status bar, but at other times I see only the year and month.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
idx = pd.date_range('2011-1-1', '2015-1-1')
x = pd.Series(np.cumsum(np.random.randn(len(idx))), idx)
df = pd.DataFrame(x)
df.plot()
plt.show()
Is there any way to display the exact date? How does matplotlib control what is displayed in the status bar? I wonder whether it has something to do with pandas changing the default configuration after some code is called.

When I run your code, everything seems to work: the complete date (the x-coordinate) is shown in the status bar all the time. However, both coordinates are also shown when the mouse is not directly over the line, so it is difficult to read off the actual values of your graph. Are you looking for a tooltip that shows the exact date when you mouse over the graph, or are the complete dates in the status bar enough? Can you post a screenshot of what the problem looks like when it occurs, and provide details on the versions you are using? I am on matplotlib 1.4.3 and numpy 1.9.2 combined with pandas 0.15.2.
Also have a look at the matplotlib recipes (http://matplotlib.org/users/recipes.html). The section "Fixing common date annoyances" sounds very similar to your problem.
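The idea from that recipe can be sketched as follows. This is a minimal sketch, not the asker's exact setup: it assumes the series is drawn with matplotlib's own ax.plot (rather than series.plot()) so that the x-axis holds matplotlib date numbers, and it sets the axes attribute fmt_xdata, which matplotlib uses to render the x-coordinate in the status bar. The sample date is only there to check the formatter without a GUI.

```python
import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

idx = pd.date_range("2011-01-01", "2015-01-01")
x = pd.Series(np.cumsum(np.random.randn(len(idx))), idx)

# Assumption: plot through matplotlib directly so the x-axis uses
# matplotlib date numbers instead of pandas' own time series axis.
fig, ax = plt.subplots()
ax.plot(x.index, x.values)

# fmt_xdata controls how the x-coordinate is shown in the status bar.
ax.fmt_xdata = mdates.DateFormatter("%Y-%m-%d")

# Check the formatter on one known date (no window needed).
sample = mdates.date2num(datetime.datetime(2012, 3, 4))
status_text = ax.format_xdata(sample)
```

Calling plt.show() afterwards opens the window; with this formatter the status bar should show a full year-month-day date at every zoom level.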

Related

Pandas Data frames and sorting values

I am having a difficult time with this homework assignment and am not sure where I went wrong. I have tried several things, and I believe my issue lies in sort_values or maybe in the groupby command.
The issue is that I want to display graph data only from the year 2007 (using pandas and plotly in a Jupyter notebook for my class). I mostly have the graph I want, but I cannot get it to display the data correctly. It simply isn't filtering out the other years or taking data from the specific dates as requested.
import pandas as pd
import plotly.express as px
df = pd.read_csv('Data/Country_Data.csv')
print(df.shape)
df.head(2)
df_Q1 = df.query("year == '2007'")
print(df_Q1.shape)
df_Q1.head()
This is where the issue begins: it prints a table with only header information. It prints all the column names but none of the data for them, and later on it displays a graph of what I assume is the most recent death data rather than the data for 2007 as specified.
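One possible cause, offered only as an assumption because the question does not show the column dtypes: if read_csv parsed the year column as integers, comparing it with the string '2007' matches no rows, which would produce exactly a header-only table. The sketch below uses a small made-up frame (the column names country and deaths are hypothetical stand-ins for Country_Data.csv) and a boolean mask to show the difference.

```python
import pandas as pd

# Hypothetical stand-in for Country_Data.csv; 'year' holds integers here.
df = pd.DataFrame({
    "country": ["A", "B", "A", "B"],
    "year": [2002, 2002, 2007, 2007],
    "deaths": [10, 20, 30, 40],
})

print(df.dtypes)  # inspect how 'year' was parsed

# Comparing an integer column with the string '2007' matches nothing.
as_string = df[df["year"] == "2007"]

# Comparing with the integer 2007 keeps the 2007 rows.
as_int = df[df["year"] == 2007]

print(as_string.shape, as_int.shape)
```

If df.dtypes reports an integer type for year, writing the filter without the inner quotes, for example df.query("year == 2007"), would be the corresponding change in the original code.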

How can I change the formatting of the mplfinance volume on the chart?

I am using the mplfinance package to plot candlestick charts of a stock, and I am trying to figure out how to change the formatting of the volume axis. In all the examples provided by the package, and in my own chart, the volume comes out in a strange notation like 1e23. I would like the volume to reflect the numerical values that are actually in the pandas dataframe. I trade myself, and the charts on actual trading platforms show the volume as plain numbers. But in the matplotlib, pandas, and mplfinance examples online, the volume is always formatted in this strange way.
Example of what I am talking about
Alternatively, to show the volumes without scientific notation while keeping the original values (not scaled down), using the same data and code as in the answer from @r-beginners:
fig, axlist = mpf.plot(daily,type='candle',volume=True,
title='\nS&P 500, Nov 2019',
ylabel='OHLC Candles',
ylabel_lower='Shares\nTraded',
returnfig=True)
import matplotlib.ticker as mticker
axlist[2].yaxis.set_major_formatter(mticker.FormatStrFormatter('%d'))
mpf.show()
The result:
In theory it would be relatively easy to enhance mplfinance to accept a kwarg for formatting the axis labels, but for now the above will work.
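The formatter used above is plain matplotlib, so its effect can be checked without mplfinance. The following is a small sketch on made-up volume numbers; StrMethodFormatter with thousands separators is not part of the answer above and is shown only as a possible alternative to '%d'.

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker

fig, ax = plt.subplots()
# Made-up volumes, large enough to trigger the default 1e9 offset notation.
ax.bar(range(5), [2.1e9, 3.4e9, 1.8e9, 4.0e9, 2.9e9])

plain = mticker.FormatStrFormatter("%d")            # as in the answer above
grouped = mticker.StrMethodFormatter("{x:,.0f}")    # alternative with separators

# Either formatter can be attached to the volume axis.
ax.yaxis.set_major_formatter(grouped)

# Formatters are callable, so the resulting labels can be inspected directly.
plain_label = plain(2.1e9)
grouped_label = grouped(2.1e9)
print(plain_label, grouped_label)
```

With mplfinance, the same formatter object would be passed to axlist[2].yaxis.set_major_formatter as in the answer.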
The volume axis switches to exponential notation automatically when the volume values are large. To avoid this, you can scale the original data down to a larger unit. The following example divides the volume by 1 million. The data is taken from the official website.
daily['Volume'] = daily['Volume'] / 1000000
Here is the full code:
%matplotlib inline
import pandas as pd
daily = pd.read_csv('data/SP500_NOV2019_Hist.csv',index_col=0,parse_dates=True)
daily['Volume'] = daily['Volume'] / 1000000
import mplfinance as mpf
mpf.plot(daily,type='candle',volume=True,
title='\nS&P 500, Nov 2019',
ylabel='OHLC Candles',
ylabel_lower='Shares\nTraded')
Example of normal output

How to show more categories in a line plot of a pivot table

I have an Excel file containing rows of objects with at least two columns of variables: one for year and one for category. There are 22 types in the category variable.
So far, I can read the Excel file into a DataFrame and apply a pivot table to show the count of each category per year. I can also plot these yearly counts by category. However, when I do so, only 4 of the 22 categories are plotted. How do I instruct Matplotlib to show plot lines and labels for each of the 22 categories?
Here is my code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel("table_merged.xlsx", sheet_name="records", encoding="utf8")
df.pivot_table(index="year", columns="category", values="y_m_d", aggfunc=np.count_nonzero, fill_value="0").plot(figsize=(10,10))
I checked the matplotlib documentation for plot(). The only argument that seemed remotely related to what I'm trying to accomplish is markevery() but it produced the error "positional argument follows keyword argument", so it doesn't seem right. I was able to use several of the other arguments successfully, like making the lines dashed, etc.
Here is the dataframe
Here is the resulting plot generated by matplotlib
Here are the same data plotted in Excel. I'm trying to make a similar plot using matplotlib
Solution
Change pivot(...,fill_value="0") to pivot(...,fill_value=0) and all of the categories appear in the figure as coded above. In the original figure, the four displayed categories were the only ones of the 22 that did not have a 0 value for any year. This is why they were displayed. Any category that had a "0" value was ignored by matplotlib.
A simpler, and better solution is pd.crosstab(df['year'],df['category']) rather than my line 5 above.
The problem comes from the pivot. Most likely you don't need it, since you are just tabulating years and categories; the y_m_d column is not needed for that at all.
Try something like below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'year':np.random.randint(2008,2020,1000),
'category':np.random.choice(np.arange(10),size=1000,p=np.arange(10)/sum(np.arange(10))),
'y_m_d':np.random.choice(['a','b','c'],1000)})
pd.crosstab(df['year'],df['category']).plot()
And looking at the code you have, the error comes from:
pivot(...,fill_value="0")
You are filling with the string "0", which changes the dtype of the affected columns so that matplotlib ignores them. It should be fill_value=0, and then it will work, though it is a very complicated approach.
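The dtype change can be seen on a tiny made-up frame. This is a sketch under the assumption that the original data looks roughly like this (category "b" has no rows in 2019, so the pivot has a gap there); it compares which columns remain numeric with fill_value="0" and with fill_value=0.

```python
import pandas as pd

# Made-up data: category "b" is missing for 2019.
df = pd.DataFrame({
    "year": [2018, 2018, 2019, 2019, 2019],
    "category": ["a", "b", "a", "a", "a"],
    "y_m_d": ["x1", "x2", "x3", "x4", "x5"],
})


def counts(fill_value):
    """Count rows per year and category, filling gaps with fill_value."""
    return df.pivot_table(index="year", columns="category", values="y_m_d",
                          aggfunc="count", fill_value=fill_value)


bad = counts("0")   # gaps filled with a string
good = counts(0)    # gaps filled with a number

# Only numeric columns are drawn by DataFrame.plot().
bad_numeric = list(bad.select_dtypes(include="number").columns)
good_numeric = list(good.select_dtypes(include="number").columns)
print(bad.dtypes, good.dtypes, sep="\n")
```

Columns that received a "0" string hold mixed values and are no longer numeric, which is consistent with only the gap-free categories appearing in the original plot.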

OHLC python chart

I'm new to pandas and matplotlib and I'm trying to code some algorithmic trading.
I bought this course, and now I understand more, but...
It does not include complete sample code for an intraday OHLC chart.
I also have other problems: my native language is not English, and there is no quality material in Spanish about those libraries.
All the material that I found online only plots daily charts and is based on matplotlib.finance, which is now deprecated; its replacement is mplfinance.
Please, I need sample code to chart the csv file in seconds, minutes, hours and days.
I have really tried, and I'm not a lazy person, but it is taking a lot of time just to plot that chart, and the course does not cover what I need.
Here are csv files for Alibaba (BABA) with 1 second, 5 second, 15 second, 30 second and 1 minute OHLC data.
My data
MPLFINANCE
You can use mplfinance. I tried it and it worked; here is the sample code.
Note: you need to rename the columns in your source data so that Open, High, Low and Close start with an uppercase letter.
import mplfinance as mpf
import pandas as pd
data = pd.read_csv('NYSE_BABA, 5s.csv', index_col=0)
data.index = pd.to_datetime(data.index)
mpf.plot(data,type='candle')
The candlesticks are difficult to see because the data covers such a short time range, but you get the idea. Hope it helps!
PLOTLY
You might want to consider Plotly for a nicer visualization.
import plotly.graph_objects as go
import pandas as pd
data = pd.read_csv('NYSE_BABA, 5s.csv')
data['time'] = pd.to_datetime(data['time'], unit='s')
fig = go.Figure(data=[go.Candlestick(x=data['time'],
open=data['Open'],
high=data['High'],
low=data['Low'],
close=data['Close'])])
fig.show()
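Neither answer covers the request for minute, hour and day charts. One way to get there, sketched here with pandas only, is to resample the finest bars into coarser ones before plotting. The data below is synthetic (made-up prices on a 1 second index standing in for the BABA file), and the helper name to_bars is hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic 1-second bars standing in for the BABA CSV (made-up prices).
index = pd.date_range("2020-01-02 09:30:00", periods=120, freq="s")
price = np.arange(120, dtype=float)
seconds = pd.DataFrame(
    {"Open": price, "High": price + 1, "Low": price - 1, "Close": price},
    index=index,
)

# How each OHLC column is combined when several bars are merged into one.
ohlc_rules = {"Open": "first", "High": "max", "Low": "min", "Close": "last"}


def to_bars(frame, rule):
    """Aggregate OHLC bars to a coarser interval (hypothetical helper)."""
    return frame.resample(rule).agg(ohlc_rules).dropna()


minutes = to_bars(seconds, "1min")
hours = to_bars(seconds, "1h")
days = to_bars(seconds, "1D")
print(minutes)
```

Each resulting frame keeps a DatetimeIndex and the capitalized Open, High, Low, Close columns, so it can be passed to mpf.plot(..., type='candle') in the same way as the original data.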

pandas plot x-axis label

I bumped into a problem when plotting a pandas series.
When plotting the series with a datetime x-axis, the x-axis is relabeled accordingly when zooming, i.e. it works fine:
from matplotlib import pyplot as plt
from numpy.random import randn
from pandas import Series,date_range
import numpy as np, pandas as pd
date_index = date_range('1/1/2016', periods=6*24*7, freq='10Min')
ts = Series(randn(len(date_index)), index=date_index)
ts.plot(); plt.show()
However, when I redefine the series index as strings, a strange thing happens: the zoom does not work properly anymore (the limits do not seem to change).
sindex=np.vectorize(lambda s: s.strftime('%d.%m %H:%M'))(ts.index.to_pydatetime())
ts = Series(randn(len(date_index)), index=sindex)
ts.plot(); plt.show()
Is this a bug, or am I misusing or misunderstanding something? Any advice or help would be very welcome.
I also noticed that plotting with kind='bar' is incredibly slow compared to the default (with longer vectors), and I am not sure what the cause of that would be...
When you format your date labels as strings before plotting, you lose all the actual date information; they're just strings now. This means that pandas / matplotlib can't reformat the tick labels when you zoom. See the first paragraph after the plot here.
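If the goal of the string index was only to get labels in the '%d.%m %H:%M' style, one option is to keep the datetime index and change the tick formatter instead. This is a sketch under the assumption that the series is drawn with matplotlib's ax.plot rather than ts.plot(), so that matplotlib's own date formatters apply to the axis.

```python
import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from numpy.random import randn
from pandas import Series, date_range

date_index = date_range("1/1/2016", periods=6 * 24 * 7, freq="10min")
ts = Series(randn(len(date_index)), index=date_index)

# Assumption: plot with matplotlib directly so the axis uses date numbers.
fig, ax = plt.subplots()
ax.plot(ts.index, ts.values)

# Keep real dates on the axis; only the label format changes.
formatter = mdates.DateFormatter("%d.%m %H:%M")
ax.xaxis.set_major_formatter(formatter)
fig.autofmt_xdate()  # rotate the labels so they do not overlap

# Check the label format on one known timestamp.
label = formatter(mdates.date2num(datetime.datetime(2016, 1, 1, 0, 10)))
```

Because the x values are still dates, the tick locations are recomputed when zooming and each new tick is labeled in the same format.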
For your second question, a bar plot draws a tick and a bar for every data point. For a large series this gets expensive. At this time pandas bar plots are not hooked into the auto-formatting like plot is. You can do a bar plot directly with matplotlib, though, and suppress some of the ticks yourself.
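A rough sketch of that last suggestion, assuming it is acceptable to place the bars at integer positions and to label only one bar per day (the step of 6 * 24 points is an arbitrary choice for this 10 minute series):

```python
import matplotlib.pyplot as plt
import numpy as np
from numpy.random import randn
from pandas import Series, date_range

date_index = date_range("1/1/2016", periods=6 * 24 * 7, freq="10min")
ts = Series(randn(len(date_index)), index=date_index)

positions = np.arange(len(ts))
step = 6 * 24  # one labeled tick per day of 10-minute samples

fig, ax = plt.subplots()
ax.bar(positions, ts.values, width=1.0)

# Keep only every step-th tick and label it with the matching date.
tick_positions = positions[::step]
ax.set_xticks(tick_positions)
ax.set_xticklabels(
    [ts.index[i].strftime("%d.%m") for i in tick_positions], rotation=45
)
```

This draws all 1008 bars but only 7 ticks and labels, instead of one tick and label per data point.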
