I am creating a demo using IPython notebook. I launch the notebook in the pylab inline mode, e.g. ipython notebook --pylab=inline, and what I would like to do is progressively build a plot, modifying aspects of the plot in subsequent cells, and having the chart redisplay after each modification. For instance, I would like to have consecutive cells,
CELL 1:
from pandas.io.data import DataReader
from datetime import datetime
import matplotlib.pyplot as plt
goog = DataReader("GOOG", "yahoo", datetime(2000,1,1), datetime(2012,1,1))
close_vals = goog['Close']
plot(close_vals.index, close_vals.values)
CHART DISPLAYED INLINE
CELL 2:
xlim(datetime(2009,1,1), datetime(2010,1,1))
MODIFIED CHART DISPLAYED INLINE
However, the original chart doesn't seem to make its way into subsequent cells, and the chart displayed in CELL 2 is empty. To see the original plot with the modification, I have to re-issue the plot command:
CELL 2:
plot(close_vals.index, close_vals.values)
xlim(datetime(2009,1,1), datetime(2010,1,1))
This quickly gets clunky and inelegant as I add moving average trend lines and labels. Also, working from the IPython console, this method of progressively building a plot works just fine. Anyone know of a better way to create this kind of demo in the notebook? Thanks.
UPDATE:
My final code ended up looking like this.
CELL 1:
from pandas.io.data import DataReader
from datetime import datetime
import matplotlib.pyplot as plt
goog = DataReader("GOOG", "yahoo", datetime(2000,1,1), datetime(2012,1,1))
close_vals = goog['Close']
fig, ax = subplots(1,1)
ax.plot(close_vals.index, close_vals.values,label='GOOG Stock Price')
CELL 2:
ax.set_xlim(datetime(2009,1,1), datetime(2010,1,1))
fig
CELL 3:
avg_20 = [ sum(close_vals.values[i-20:i])/20.0 for i in range(20,len(close_vals))]
avg_20_times = close_vals.index[20:]
ax.plot(avg_20_times, avg_20, label='20 day trailing average')
ax.legend()
fig
After updating ax in each subsequent cell, calling fig redisplays the plot; exactly what I was looking for. Thanks!
You can use variables to reference the Figure and Axes objects:
In cell 1:
fig, ax = subplots(1, 1)
plot(randn(100));
In cell 2:
ax.set_xlim(20, 40)
fig
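The same idea as a self-contained sketch. It assumes explicit matplotlib and NumPy imports instead of the pylab namespace, and it uses made-up random data. Later cells act on the stored ax and then re-display the stored fig.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs outside a notebook; drop it in a notebook
import matplotlib.pyplot as plt
import numpy as np

# "Cell 1": create the figure once and keep references to it
fig, ax = plt.subplots(1, 1)
ax.plot(np.random.randn(100))

# "Cell 2": modify the existing axes; in a notebook, a bare `fig` as the
# last line of the cell re-displays the updated figure
ax.set_xlim(20, 40)
xlim_after = ax.get_xlim()
```

Because every change goes through ax, no plot call has to be repeated in later cells.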
I've a time series (typically energy usage) recorded over a range of days. Since usage tends to be different over the weekend I want to highlight the weekends.
I've done what seems sensible:
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import random
#Create dummy data.
start=datetime.datetime(2022,10,22,0,0)
finish=datetime.datetime(2022,11,7,0,0)
def randomWalk():
    i=0
    while True:
        i=i+random.random()-0.5
        yield i
walk = randomWalk()  # create the generator once so successive values accumulate
s = pd.Series({i: next(walk) for i in pd.date_range(start, finish,freq='h')})
# Plot it.
plt.figure(figsize=[12, 8]);
s.plot();
# Color the labels according to the day of week.
for label, day in zip(plt.gca().xaxis.get_ticklabels(which='minor'),
                      pd.date_range(start,finish,freq='d')):
    label.set_color('red' if day.weekday() > 4 else 'black')
But what I get is wrong. Two weekends appear one off, and the third doesn't show at all.
I've explored the 'label' objects, but their X coordinate is just an integer, and doesn't seem meaningful. Using DateFormatter just gives nonsense.
What would be the best way to fix this, please?
OK - since matplotlib only provides the information we need to the Tick Label Formatter functions, that's what we have to use:
import numpy as np
minorLabels=plt.gca().xaxis.get_ticklabels(which='minor')
majorLabels=plt.gca().xaxis.get_ticklabels(which='major')
def MinorFormatter(dateInMinutes, index):
    # Formatter: first param is value (date in minutes, would you believe), second is which item in order.
    day=pd.to_datetime(np.datetime64(int(dateInMinutes),'m'))
    minorLabels[index].set_color('red' if day.weekday()==6 else 'black') # Sunday
    return day.day
def MajorFormatter(dateInMinutes, index):
    day=pd.to_datetime(np.datetime64(int(dateInMinutes),'m'))
    majorLabels[index].set_color('red' if day.weekday()==6 else 'black') # Sunday
    return "" if (index==0 or index==len(majorLabels)-1) else day.strftime("%d\n%b\n%Y")
plt.gca().xaxis.set_minor_formatter(MinorFormatter)
plt.gca().xaxis.set_major_formatter(MajorFormatter)
Pretty clunky, but it works. Could be fragile, though - anyone got a better answer?
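One possible alternative, as a minimal sketch rather than a tested replacement. It assumes it is acceptable to plot with matplotlib directly instead of Series.plot, so that tick positions are ordinary matplotlib date numbers that matplotlib.dates.num2date can convert back. The data is made up, and this version marks both Saturday and Sunday.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Made-up hourly random walk over the same date range as the question
idx = pd.date_range("2022-10-22", "2022-11-07", freq="h")
rng = np.random.default_rng(0)
s = pd.Series(np.cumsum(rng.random(len(idx)) - 0.5), index=idx)

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(s.index.to_pydatetime(), s.values)  # plain datetimes -> matplotlib date units
ax.xaxis.set_major_locator(mdates.DayLocator())       # one tick per day
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d"))
fig.canvas.draw()  # tick positions are only final after a draw

weekend_labels = []
for tick in ax.xaxis.get_major_ticks():
    day = mdates.num2date(tick.get_loc())  # tick position -> datetime
    is_weekend = day.weekday() >= 5        # 5 = Saturday, 6 = Sunday
    tick.label1.set_color("red" if is_weekend else "black")
    if is_weekend:
        weekend_labels.append(day.strftime("%Y-%m-%d"))
```

The colors are attached to tick objects, so they may need to be re-applied if the view limits change afterwards (for example after zooming).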
Matplotlib is meant for scientific use and although technically styling is possible, it's really hard and not worth the effort.
Consider using Plotly instead of Matplotlib as below:
#pip install plotly in terminal
import plotly.express as px
# read plotly express provided sample dataframe
df = px.data.tips()
# create plotly figure with color_discrete_map property specifying color per day
fig = px.bar(df, x="day", y="total_bill", color='day',
             color_discrete_map={"Sat": "orange", "Sun": "orange", "Thur": "blue", "Fri": "blue"}
             )
# send to browser
fig.show()
This solves your problem in far fewer lines. The only requirement is that your data is in a pandas DataFrame (rather than a Series), with column names that you can pass to plotly.express.bar or a scatter plot.
I have created a heatmap using the seaborn and matplotlib package in python, and while it is perfectly suited for my current needs, I really would prefer to have the labels on the x-axis of the heatmap to be placed at the top of the plot, rather than at the bottom (which seems to be its default).
So an abridged form of my data looks like this:
NP NP1 NP2 NP3 NP4 NP5
identifier
A1BG~P04217 -0.094045 0.012229 0.102279 1.319618 0.002383
A2M~P01023 -0.805089 -0.477339 -0.351341 0.089735 -0.473815
AARS1~P49588 0.081827 -0.099849 -0.287426 0.101588 0.136366
ABCB6~Q9NP58 0.109911 0.458039 -0.039325 -0.484872 1.905586
ABCC1~I3L4X2 -0.560155 0.580285 0.012868 0.291303 -0.407900
ABCC4~O15439 0.055264 0.138630 -0.204665 0.191241 0.304999
ABCE1~P61221 -0.510108 -0.059724 -0.233365 0.078956 -0.651327
ABCF1~Q8NE71 -0.348526 -0.135414 -0.390021 -0.190644 -0.276303
ABHD10~Q9NUJ1 0.237959 -2.060834 0.325901 -0.778036 -4.046345
ABHD11~Q8NFV4 0.294587 1.193258 -0.797294 -0.148064 -1.153391
And when I use the following code:
import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10,30))
ax = sns.heatmap(df_example, annot=True, xticklabels=True)
I get this kind of plot:
https://imgpile.com/i/T3zPH1
I should note that this plot was made from the abridged dataframe above; the actual dataframe has thousands of identifiers, making it very long.
But as you can see, the labels on the x axis only appear at the bottom. I have been trying to get them to appear on the top, but seaborn doesn't seem to allow this kind of formatting.
So I have also tried using plotly express, but while I solve the issue of placing my x-axis labels on top, I have been completely unable to format the heat map as I had before using seaborn. The following code:
import plotly.express as px
fig = px.imshow(df_example, width= 500, height=6000)
fig.update_xaxes(side="top")
fig.show()
yields this kind of plot: https://imgpile.com/i/T3zF42.
I have tried many times to reformat it using the plotly documentation (https://plotly.com/python/heatmaps/), but I can't seem to get it to work. When one thing is fixed, another problem arises. I really just want to keep using the seaborn-based code above and fix the x-axis labels. I'm also happy to have the x-axis labels at both the top and bottom of the plot, but I can't get that to work at present. Can someone advise me on what to do here?
OK, so I did a bit more research, and it turns out you can add the following code to the seaborn approach:
plt.tick_params(axis='both', which='major', labelsize=10, labelbottom = False, bottom=False, top = False, labeltop=True)
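A minimal sketch of what this call does. As an assumption for illustration, it uses a plain matplotlib imshow on made-up data instead of seaborn. sns.heatmap returns a regular matplotlib Axes, so the same tick_params arguments should apply to it.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(0).normal(size=(10, 5))  # made-up values
fig, ax = plt.subplots(figsize=(4, 6))
ax.imshow(data, aspect="auto")
ax.set_xticks(range(5))
ax.set_xticklabels(["NP1", "NP2", "NP3", "NP4", "NP5"])

# Show the x tick labels at the top only, and hide the tick marks themselves
ax.tick_params(axis="x", which="major", labelbottom=False, bottom=False,
               top=False, labeltop=True)
fig.canvas.draw()
first_tick = ax.xaxis.get_major_ticks()[0]
```

Setting labelbottom=True as well would show the labels on both sides.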
If your data is stored in a CSV file, you can use this code:
import pandas as pd
import plotly.express as px
df = pd.read_csv("file.csv").round(2)
fig = px.imshow(df.iloc[:,1:],
                y = df['identifier'],
                text_auto=True, aspect="auto")
fig.show()
The data in the CSV file are in the following format:
identifier NP1 NP2 NP3 NP4 NP5
A1BG~P04217 -0.094045 0.012229 0.102279 1.319618 0.002383
A2M~P01023 -0.805089 -0.477339 -0.351341 0.089735 -0.473815
AARS1~P49588 0.081827 -0.099849 -0.287426 0.101588 0.136366
ABCB6~Q9NP58 0.109911 0.458039 -0.039325 -0.484872 1.905586
ABCC1~I3L4X2 -0.560155 0.580285 0.012868 0.291303 -0.407900
ABCC4~O15439 0.055264 0.138630 -0.204665 0.191241 0.304999
ABCE1~P61221 -0.510108 -0.059724 -0.233365 0.078956 -0.651327
ABCF1~Q8NE71 -0.348526 -0.135414 -0.390021 -0.190644 -0.276303
ABHD10~Q9NUJ1 0.237959 -2.060834 0.325901 -0.778036 -4.046345
ABHD11~Q8NFV4 0.294587 1.193258 -0.797294 -0.148064 -1.153391
Now let's display the x-axis at the top of the heatmap by adding:
fig.update_layout(xaxis = dict(side ="top"))
Alternative solution if you have an older version of Plotly:
import plotly.graph_objects as go
fig = go.Figure(data=go.Heatmap(
    x=df.columns[1:],
    y=df.identifier,
    z=df.iloc[:,1:],
    text=df.iloc[:,1:],
    texttemplate="%{text}"))
fig.update_layout(xaxis = dict(side ="top"))
fig.show()
During debugging or computationally heavy loops, I would like to see how my data processing evolves (for example in a line plot or an image).
In matplotlib, the code can redraw or update the figure with plt.cla() followed by plt.draw() or plt.pause(0.001), so that I can follow the progress of my computation in real time or while debugging. How do I do that in plotly express (or plotly)?
So I think I essentially figured it out. The trick is to create the figure with go.FigureWidget() rather than go.Figure(). The two look the same, but behind the scenes they are not.
documentation
youtube video demonstration
Those FigureWidgets are exactly there to be updated as new data comes in. They stay dynamic, and later calls can modify them.
A FigureWidget can be made from a Figure:
figure = go.Figure(data=data, layout=layout)
f2 = go.FigureWidget(figure)
f2 #display the figure
This is practical, because it makes it possible to use the simplified plotly express interface to create a Figure and then construct a FigureWidget from it. Unfortunately, plotly express does not seem to have its own simplified FigureWidget module, so one needs to use the more complicated go.FigureWidget.
I'm not sure if identical functionality exists for plotly. But you can at least build a figure, expand your data source, and then replace the data of the figure without touching any of the other figure elements, like this:
for i, col in enumerate(fig.data):
    fig.data[i]['y'] = df[df.columns[i]]
    fig.data[i]['x'] = df.index
It should not matter if your figure is a result of using plotly.express or go.Figure since both approaches will produce a figure structure that can be edited by the code snippet above. You can test this for yourself by setting the two following snippets up in two different cells in JupyterLab.
Code for cell 1
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from jupyter_dash import JupyterDash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
# code and plot setup
# settings
pd.options.plotting.backend = "plotly"
# sample dataframe of a wide format
np.random.seed(5); cols = list('abc')
X = np.random.randn(50,len(cols))
df=pd.DataFrame(X, columns=cols)
df.iloc[0]=0;df=df.cumsum()
# plotly figure
fig = df.plot(template = 'plotly_dark')
fig.show()
Code for cell 2
# create or retrieve new data
Y = np.random.randn(1,len(cols))
# organize new data in a df
df2 = pd.DataFrame(Y, columns = cols)
# add last row of df to new values
# this step can be skipped if your real world
# data is not a cumulative process like
# in this example
df2.iloc[-1] = df2.iloc[-1] + df.iloc[-1]
# append new data to existing df
# (DataFrame.append was removed in pandas 2.0; pd.concat is the replacement)
df = pd.concat([df, df2], ignore_index=True)
# replace old data in fig with new data
for i, col in enumerate(fig.data):
    fig.data[i]['y'] = df[df.columns[i]]
    fig.data[i]['x'] = df.index
fig.show()
Running the first cell will put together some data and build a figure like this:
Running the second cell will produce a new dataframe with only one row, append it to your original dataframe, replace the data in your existing figure, and show the figure again. You can run the second cell as many times as you like to redraw your figure with an expanding dataset. After 50 runs, your figure will look like this:
I am trying to create a line chart using a pandas DataFrame and matplotlib. I am using the following code to create the line chart.
import pandas as pd
import matplotlib.pyplot as plt
data = {
'Quarter': ['Q1-2018', 'Q2-2018', 'Q3-2018', 'Q4-2018', 'Q1-2019'],
'Data': [256339, 265555, 274880, 211128, 0]
}
dataset2 = pd.DataFrame(data=data)
ax3 = dataset2[['Quarter', 'Data']].plot.line(x='Quarter', y='Data',
legend=False)
ax3.margins(x=0.1)
plt.show()
Which produces the following result:
As you can see, the line starts and ends at the edges of the plot.
What I am trying to achieve is to have some space at the start and end of the line chart, like below.
I tried setting the x margin using ax3.margins(x=0.1), but it does not do anything.
How do I add some space at the start and end of the chart so that the line does not stick to the edges?
In pandas 0.23 you would get the correct plot with margins as desired, yet without labels. This "bug" seems to have been fixed in pandas 0.24, at the expense of another undesired behaviour.
That is, pandas fixes the limits of categorical plots and sets the ticklabels to the positions that would look correct if the limits are not changed. While you could in theory unfix the limits (ax.set_xlim(None, None)) and let the axes autoscale (ax.autoscale()), the result will be an incorrectly labelled plot.
I doubt there is any reasoning behind this, it's rather an oversight in the pandas source. This pandas issue best describes the problem, which then boils down to this 5 year old issue.
In any case, for categorical plots, consider using matplotlib directly. Its categorical feature is pretty stable by now and easy to use:
import pandas as pd
import matplotlib.pyplot as plt
data = {
'Quarter': ['Q1-2018', 'Q2-2018', 'Q3-2018', 'Q4-2018', 'Q1-2019'],
'Data': [1,3,2,4,1]
}
df = pd.DataFrame(data=data)
plt.plot("Quarter", "Data", data=df)
plt.show()
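When matplotlib handles the categories itself, it places them at integer positions 0, 1, 2, and so on, so the margins call from the question should take effect. A minimal sketch using the same made-up data, with an explicit Axes so the limits can be inspected:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Quarter': ['Q1-2018', 'Q2-2018', 'Q3-2018', 'Q4-2018', 'Q1-2019'],
    'Data': [1, 3, 2, 4, 1],
})

fig, ax = plt.subplots()
ax.plot("Quarter", "Data", data=df)
ax.margins(x=0.1)  # pad each side of the x-axis by 10% of the data range
left, right = ax.get_xlim()
```

With five categories the data spans positions 0 to 4, so the padding on each side is 0.4 in data units.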
This question mostly pertains to Matplotlib and animation. The issue is that when the animation is updated, I need to clear the axes each time, or I get overlapping images because of color changes. With matplotlib-1.0.1 the code below worked fine, but now that I am using matplotlib-1.3.1, if I keep ax.clear() in the code below, the contents of ax do not show up on the chart. Here is the code:
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import time
from finance_pylab import quotes_historical_yahoo, candlestick,plot_day_summary, candlestick2,volume_overlay
from pylab import *
import f_charting
import f_datamanip # to use idx_of_last_unique
import f_tcnvrt # for dukascopy date
from numpy import*
import f_file_handling
import first_add2file
import f_dukascopy_fetcher
import datetime ## so we can print a time stamp we recognize
import f_candle_stick_maker3_fluent
anim_data_file='candle_test6'
showlast_idxs=40
tframe=5 ##how many minutes candles will be #minutes
dukas_file_name="EURUSD_1m_jan2012"
first_candle=datetime.datetime(2012,1,1,22,10,0) ## start chart here
last_candle=datetime.datetime(2012,1,2,1,55,0) ## start trading after here
candle_name_LDP_diff=datetime.timedelta(0,(tframe-1)*60) ##LDP = last_data_point
dukas_last_data_point=last_candle+candle_name_LDP_diff
Dcsv,D,O,H,L,C,V=f_dukascopy_fetcher.dukas2(dukas_file_name,first_candle,dukas_last_data_point)## will get data up to data_end, could just put in high date to take all in file#,s_date)#,data_start,s_date)
f_candle_stick_maker3_fluent.candle_2_txtfile(anim_data_file,tframe,first_candle,last_candle,Dcsv,D,O,H,L,C,V,0)
filename='candle_test6';file_name=filename+'.txt'
candle_width=.8;colorup='#33CC33'; colordown='#E80000' ;up_col='#B8B8B8';down_col='w'
rect1=[.05,.14,.94,.82]#left, bottom, width, height #rect1=[.1,.1,.8,.7] #seems full:rect1=[.02,.04,.95,.93] , the more you move left , you also have to adjust width. bottom and hight push on each other as well
fig =plt.figure(figsize=(15,7),facecolor='white');axescolor ='#f6f6f6' #'#200000' #'#180000' ##100000'#f6f6f6' # the axies background color # border of chart
ax = fig.add_axes(rect1, axisbg=axescolor) #start with volume axis
ax1v = ax.twinx()
def candle_animate(i):
    pullData = open(anim_data_file+'.txt','r').read()
    dataArray= pullData.split('\n')
    contig_time=[];_open=[];_close=[];_high=[];_low=[];_vol=[]; _timevec=[]
    for eachLine in dataArray:
        if len(eachLine)>1:
            _t,_o,_c,_h,_l,_v,u=eachLine.split(',')##x,y=eachLine.split(',')
            _timevec.append(f_tcnvrt.str2time_dukascopy(_t))
            _open.append(float(_o))
            _close.append(float(_c))
            _high.append(float(_h))
            _low.append(float(_l))
            _vol.append(float(_v))
            units_print=u
    _open=array(_open);_close=array(_close);_high=array(_high);_low=array(_low);_vol=array(_vol)
    ax.clear();ax1v.clear() # this line worked with matplotlib-1.0.1 but matplotlib-1.3.1 keeps ax blank
    D=_timevec;O=_open;H=_high;L=_low;C=_close;V=_vol
    time_delta=datetime.timedelta(0,tframe*60)
    last_data_point=D[-1]
    _time,_open,_high,_low,_close,_vol=f_candle_stick_maker3_fluent.cs_maker(tframe,first_candle,last_data_point,D,O,H,L,C,V)
    contig_time=range(0,len(_time))
    chartstart=len(contig_time)-showlast_idxs
    numXlbls=12
    myidx,x_label=f_charting.x_labels_last_tick_showall_contig(_time,numXlbls)
    data4candleshow=transpose([contig_time,_open,_close,_high,_low,_vol])
    data4candle=transpose([contig_time,_open,_close,_high,_low,_vol])[-showlast_idxs:]
    candlestick(ax, data4candle, width=candle_width,colorup='#33CC33',colordown='#E80000') #'#00FF00' '#C11B17'
    f_charting.bar_vol(contig_time[-showlast_idxs:],_open[-showlast_idxs:],_close[-showlast_idxs:],_vol[-showlast_idxs:],ax1v,candle_width,up_col,down_col)
ani=animation.FuncAnimation(fig,candle_animate, interval=1000)
plt.show()
I took a lot out of the code to make it simpler, but I know there is still a lot to look at. Hopefully some expert knows more about the difference between the two matplotlib versions or is familiar with my issue. There must be a clean way to clear the chart between updates so I don't get overlapping images.
Note: the line: ax.clear();ax1v.clear() can be found in the code above and has a comment next to it denoting this is the line that worked for its purpose in the older matplotlib, but now unfortunately clears out the graph when using matplotlib-1.3.1
Thank you
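For reference, a minimal sketch of the clear-and-redraw pattern described above. The data is made up and the names are hypothetical (draw_frame, with random bars standing in for the candles and the volume overlay). It is not a diagnosis of the difference between the two matplotlib versions. It only shows one way to structure the update so that each frame starts from empty axes, assuming a reasonably recent matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import numpy as np

N_BARS = 40
rng = np.random.default_rng(0)

fig, ax = plt.subplots(figsize=(15, 7))
ax_vol = ax.twinx()  # second y-axis sharing the same x-axis

def draw_frame(i):
    # Hypothetical stand-ins for the candle and volume data of frame i
    prices = rng.random(N_BARS) + 1.0
    volumes = rng.random(N_BARS)

    # Start every frame from empty axes so nothing overlaps
    ax.clear()
    ax_vol.clear()
    # Keep the twin's background transparent so it cannot cover `ax`
    # (harmless if it already is)
    ax_vol.patch.set_visible(False)

    ax.bar(range(N_BARS), prices, width=0.8, color='#33CC33')
    ax_vol.bar(range(N_BARS), volumes, width=0.8, color='#B8B8B8', alpha=0.5)

# Keep a reference to the animation so it is not garbage-collected
ani = animation.FuncAnimation(fig, draw_frame, frames=3, interval=1000, repeat=False)
```

An alternative that avoids clearing altogether is to create the artists once and update their data in each frame, at the cost of more bookkeeping.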