Range slider Plotly : How not to add a line - python

I have the code below, which produces a subplot with a single range slider. However, I want the range slider to be on row 2 without it creating a replicated copy of row 2. Is there a way to do this?
import pandas as pd
import plotly.express as px
import plotly.subplots as sp
from datetime import datetime
data1 = [
    [1, datetime(2022, 11, 26)],
    [7, datetime(2022, 11, 29)],
    [4, datetime(2022, 11, 30)],
]
df1 = pd.DataFrame(data1, columns=["value", "date"])
data2 = [
    ["unique_row", "a", datetime(2022, 11, 26), datetime(2022, 11, 27)],
    ["unique_row", "b", datetime(2022, 11, 27), datetime(2022, 11, 30)],
    ["unique_row", "c", datetime(2022, 11, 30), datetime(2022, 12, 2)],
]
df2 = pd.DataFrame(data2, columns=["unique_row", "value", "dates_begin", "date_end"])
fig1 = px.line(df1, x="date", y="value")
fig2 = px.timeline(df2, x_start="dates_begin", x_end="date_end", y="unique_row", color="value")
fig_sub = sp.make_subplots(rows=2, cols=1, shared_xaxes=True)
fig_sub.add_trace(fig1["data"][0], row=1, col=1)
# color="value" makes px.timeline emit one trace per value (a, b, c), so add all of them
for trace in fig2["data"]:
    fig_sub.add_trace(trace, row=2, col=1)
# px.timeline sets a date-typed x axis in its own layout, which is not copied along with its traces
fig_sub.update_xaxes(type="date")
fig_sub.update_layout(xaxis2_rangeslider_visible=True)
fig_sub.show()


What is plotted when string data is passed to the matplotlib API?

# first, some imports:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Let's say I want to make a scatter plot, using this data:
np.random.seed(42)
x=np.arange(0,50)
y=np.random.normal(loc=3000,scale=1,size=50)
Plot via:
plt.scatter(x,y)
I get this plot:
Ok, let's create a dataframe first:
df=pd.DataFrame.from_dict({'x':x,'y':y.astype(str)})
(I am aware that I am storing y as str - this is a reproducible example, and I do this to reflect the real use case.)
Then, if I do:
plt.scatter(df.x,df.y)
I get:
What am I seeing in this second plot? I thought that the second plot must be showing the x column plotted against the y column, which are converted to float. This is clearly not the case.
Matplotlib doesn't automatically convert str values to numbers, so your y values are treated as categorical. As far as Matplotlib is concerned, the distance from '1.0' to '0.9' is no different from the distance from '1.0' to '100.0'.
So the y positions on the plot are just 0, 1, 2, ..., one slot per distinct string in order of first appearance (for your 50 distinct values, the same as range(len(y))), with the strings themselves used as tick labels.
Since your x is equal to range(50), and your y positions are now also equal to range(50), the plot shows the line x = y, with the y tick labels set to the respective str values.
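A minimal sketch of this behaviour, reusing the four-point example from the next answer and assuming a recent matplotlib with the headless Agg backend. The scatter offsets are the data coordinates matplotlib actually plots: for strings they are the category slots, and after converting the strings to float they are the real values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y_str = ["5", "25", "10", "1"]

fig, (ax_str, ax_num) = plt.subplots(1, 2)

# Strings are treated as categories: each new label gets the next integer slot.
sc_str = ax_str.scatter(x, y_str)
str_positions = sc_str.get_offsets()[:, 1].tolist()

# Converting to float first gives a real numeric axis.
sc_num = ax_num.scatter(x, [float(v) for v in y_str])
num_positions = sc_num.get_offsets()[:, 1].tolist()

print(str_positions)
print(num_positions)
```

The first list shows the evenly spaced slots, the second the actual values, which is the difference between the two plots in the question.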
As per the excellent answer by dm2, when you pass y as a string, y is simply being treated as arbitrary string labels, and being plotted one after the other in the order in which they appear. To demonstrate, here's an even simpler example.
from matplotlib import pyplot as plt
x = [1, 2, 3, 4]
y = [5, 25, 10, 1] # these are ints
plt.scatter(x, y)
So far so good. Now, different string y values.
y = list("abcd")
plt.scatter(x, y)
You can see how it just takes the y labels and drops them on the axis one after another.
Finally,
y = ["5", "25", "10", "1"]
plt.scatter(x, y)
Compare this with the previous results and now it should become obvious what's going on.
It becomes more obvious if the labels and locations are extracted: the API plots the strings as labels, and the axis locations are 0-indexed integers based on how many (len) categories exist.
.get_xticks() and .get_yticks() extract a list of the numeric locations.
.get_xticklabels() and .get_yticklabels() extract a list of matplotlib.text.Text, Text(x, y, text).
There are fewer numbers in the list for the y axis because rounding produced duplicate string values, and identical strings share a single category.
This applies to any API that uses matplotlib as the backend, such as seaborn or pandas:
sns.scatterplot(data=df, x='x_num', y='y', ax=ax1)
ax1.scatter(data=df, x='x_num', y='y')
ax1.plot('x_num', 'y', 'o', data=df)
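The effect of duplicates on the tick list can be reproduced with a handful of made-up values (an assumption for illustration, not the data used in this answer): three different floats round to the same string, so they share one category slot and the axis ends up with one tick per distinct string. The sketch assumes the headless Agg backend and draws the canvas once so the tick label texts are populated.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up values: rounding to 2 decimals turns some distinct floats into the same string.
raw = [3000.001, 3000.004, 2999.5, 3000.0, 3001.25]
y_labels = [f"{v:.2f}" for v in raw]  # '3000.00' appears three times
x = list(range(len(raw)))

fig, ax = plt.subplots()
sc = ax.scatter(x, y_labels)
fig.canvas.draw()  # make sure tick labels are populated

positions = sc.get_offsets()[:, 1].tolist()   # category slot of each point
tick_locs = [int(t) for t in ax.get_yticks()]  # one tick per distinct string
tick_text = [t.get_text() for t in ax.get_yticklabels()]
print(positions, tick_locs, tick_text)
```

Five points produce only three y ticks, which is the same reason the y list printed below is shorter than the x list.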
Labels, Locs, and Text
print(x_nums_loc)
print(y_nums_loc)
print(x_lets_loc)
print(y_lets_loc)
print(x_lets_labels)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
[Text(0, 0, 'A'), Text(1, 0, 'B'), Text(2, 0, 'C'), Text(3, 0, 'D'), Text(4, 0, 'E'),
Text(5, 0, 'F'), Text(6, 0, 'G'), Text(7, 0, 'H'), Text(8, 0, 'I'), Text(9, 0, 'J'),
Text(10, 0, 'K'), Text(11, 0, 'L'), Text(12, 0, 'M'), Text(13, 0, 'N'), Text(14, 0, 'O'),
Text(15, 0, 'P'), Text(16, 0, 'Q'), Text(17, 0, 'R'), Text(18, 0, 'S'), Text(19, 0, 'T'),
Text(20, 0, 'U'), Text(21, 0, 'V'), Text(22, 0, 'W'), Text(23, 0, 'X'), Text(24, 0, 'Y'),
Text(25, 0, 'Z')]
Imports, Data, and Plotting
import numpy as np
import string
import pandas as pd
import matplotlib.pyplot as plt
# sample data
np.random.seed(45)
x_numbers = np.arange(100, 126)
x_letters = list(string.ascii_uppercase)
y = np.random.normal(loc=3000, scale=1, size=26).round(2)
df = pd.DataFrame.from_dict({'x_num': x_numbers, 'x_let': x_letters, 'y': y}).astype(str)
# plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3.5))
df.plot(kind='scatter', x='x_num', y='y', ax=ax1, title='X Numbers', rot=90)
df.plot(kind='scatter', x='x_let', y='y', ax=ax2, title='X Letters')
x_nums_loc = ax1.get_xticks()
y_nums_loc = ax1.get_yticks()
x_lets_loc = ax2.get_xticks()
y_lets_loc = ax2.get_yticks()
x_lets_labels = ax2.get_xticklabels()
fig.tight_layout()
plt.show()

pandas boxplot contains content of plot saved before

I'm plotting some columns of a DataFrame into a boxplot. So far, no problem: as seen below, I wrote some code and it works. BUT: the second plot also contains the first plot. As you can see, I tried it with "= None" and "del value", but neither works. Moving the plot function outside does not solve the problem either.
What's wrong with my code?
Here is an executable example:
import pandas as pd
d1 = {'ff_opt_time': [10, 20, 11, 5, 15, 13, 19, 25], 'ff_count_opt': [30, 40, 45, 29, 35, 38, 32, 41]}
df1 = pd.DataFrame(data=d1)
d2 = {'ff_opt_time': [1, 2, 1, 5, 1, 1, 4, 5], 'ff_count_opt': [3, 4, 4, 9, 5, 3, 2, 4]}
df2 = pd.DataFrame(data=d2)
def evaluate2(df1, df2):
    def plot(df, output):
        boxplot = df.boxplot(rot=45, fontsize=5)
        fig = boxplot.get_figure()
        fig.savefig(output + ".pdf")
    df_ot = pd.DataFrame(columns=['opt_time1', 'opt_time2'])
    df_ot['opt_time1'] = df1['ff_opt_time']
    df_ot['opt_time2'] = df2['ff_opt_time']
    plot(df_ot, "bp_opt_time")
    df_op = pd.DataFrame(columns=['count_opt1', 'count_opt2'])
    df_op['count_opt1'] = df1['ff_count_opt']
    df_op['count_opt2'] = df2['ff_count_opt']
    plot(df_op, "bp_count_opt_perm")
evaluate2(df1, df2)
Here is another executable example. I even used other variable names.
import pandas as pd
d1 = {'ff_opt_time': [10, 20, 11, 5, 15, 13, 19, 25], 'ff_count_opt': [30, 40, 45, 29, 35, 38, 32, 41]}
df1 = pd.DataFrame(data=d1)
d2 = {'ff_opt_time': [1, 2, 1, 5, 1, 1, 4, 5], 'ff_count_opt': [3, 4, 4, 9, 5, 3, 2, 4]}
df2 = pd.DataFrame(data=d2)
def evaluate2(df1, df2):
    df_ot = pd.DataFrame(columns=['opt_time1', 'opt_time2'])
    df_ot['opt_time1'] = df1['ff_opt_time']
    df_ot['opt_time2'] = df2['ff_opt_time']
    boxplot1 = df_ot.boxplot(rot=45, fontsize=5)
    fig1 = boxplot1.get_figure()
    fig1.savefig("bp_opt_time.pdf")
    df_op = pd.DataFrame(columns=['count_opt1', 'count_opt2'])
    df_op['count_opt1'] = df1['ff_count_opt']
    df_op['count_opt2'] = df2['ff_count_opt']
    boxplot2 = df_op.boxplot(rot=45, fontsize=5)
    fig2 = boxplot2.get_figure()
    fig2.savefig("bp_count_opt_perm.pdf")
evaluate2(df1, df2)
I can see from your code that boxplot1 and boxplot2 end up in the same graph. What you need to do is tell matplotlib that there are going to be two separate plots.
This can be achieved either by:
Creating a separate figure for each boxplot with matplotlib's pyplot: fig1, ax1 = plt.subplots() creates a new figure fig1 with its own axes ax1 for the first boxplot to be drawn on, and likewise fig2, ax2 for the second.
Dissolving the evaluate2 function and executing each boxplot separately in a different cell of the Jupyter notebook.
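A small sketch of why the plots overlap, using made-up frames in place of df_ot and df_op and assuming the headless Agg backend: when no ax is passed, DataFrame.boxplot draws on the current axes (plt.gca()), so two consecutive calls return the very same Axes object. Creating a fresh figure per plot and passing its axes explicitly keeps them apart.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-ins for df_ot and df_op from the question
df_ot = pd.DataFrame({"opt_time1": [10, 20, 11, 5], "opt_time2": [1, 2, 1, 5]})
df_op = pd.DataFrame({"count_opt1": [30, 40, 45, 29], "count_opt2": [3, 4, 4, 9]})

# What the question's code does: no ax given, so both calls use plt.gca()
first = df_ot.boxplot(rot=45, fontsize=5)
second = df_op.boxplot(rot=45, fontsize=5)
same_axes = first is second  # both boxplots ended up on one Axes
plt.close("all")

# Fix: one figure per plot, with its Axes passed in explicitly
fig1, ax1 = plt.subplots()
df_ot.boxplot(ax=ax1, rot=45, fontsize=5)
fig2, ax2 = plt.subplots()
df_op.boxplot(ax=ax2, rot=45, fontsize=5)
separate = ax1.figure is not ax2.figure
print(same_axes, separate)
```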
Solution 1 : Two subplots using pyplot
import pandas as pd
import matplotlib.pyplot as plt
d1 = {'ff_opt_time': [10, 20, 11, 5, 15, 13, 19, 25], 'ff_count_opt': [30, 40, 45, 29, 35, 38, 32, 41]}
df1 = pd.DataFrame(data=d1)
d2 = {'ff_opt_time': [1, 2, 1, 5, 1, 1, 4, 5], 'ff_count_opt': [3, 4, 4, 9, 5, 3, 2, 4]}
df2 = pd.DataFrame(data=d2)
def evaluate2(df1, df2):
    df_ot = pd.DataFrame(columns=['opt_time1', 'opt_time2'])
    df_ot['opt_time1'] = df1['ff_opt_time']
    df_ot['opt_time2'] = df2['ff_opt_time']
    # a new figure/axes pair for the first boxplot
    fig1, ax1 = plt.subplots()
    df_ot.boxplot(ax=ax1, rot=45, fontsize=5)
    fig1.savefig("bp_opt_time.pdf")
    df_op = pd.DataFrame(columns=['count_opt1', 'count_opt2'])
    df_op['count_opt1'] = df1['ff_count_opt']
    df_op['count_opt2'] = df2['ff_count_opt']
    # a second, independent figure for the second boxplot
    fig2, ax2 = plt.subplots()
    df_op.boxplot(ax=ax2, rot=45, fontsize=5)
    fig2.savefig("bp_count_opt_perm.pdf")
    plt.show()
evaluate2(df1, df2)
Solution 2: Executing boxplot in different cell
Update based on comments: clearing plots
There are two ways to clear the plot:
Clear the figure itself using clf(): matplotlib.pyplot.clf() clears the current Figure's state without closing it.
Clear the axes using cla(): matplotlib.pyplot.cla() clears the current Axes state without closing the Axes.
Simply call plt.clf() after calling fig.savefig().
Read this documentation on how to clear a plot in Python using matplotlib.
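A sketch of the plt.clf() approach applied to the question's plot-and-save loop, under a few assumptions: small stand-in frames, an in-memory io.BytesIO buffer instead of the PDF files on disk, and the headless Agg backend. Counting the Line2D artists on each axes right before saving shows that the second boxplot no longer carries the first one's lines.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-ins for df_ot and df_op from the question
df_ot = pd.DataFrame({"opt_time1": [10, 20, 11, 5], "opt_time2": [1, 2, 1, 5]})
df_op = pd.DataFrame({"count_opt1": [30, 40, 45, 29], "count_opt2": [3, 4, 4, 9]})

saved = {}        # output name -> PDF bytes (kept in memory instead of on disk)
lines_drawn = []  # number of Line2D artists on the axes right before saving

for name, df in [("bp_opt_time", df_ot), ("bp_count_opt_perm", df_op)]:
    ax = df.boxplot(rot=45, fontsize=5)  # no ax= given, so it draws on the current figure
    lines_drawn.append(len(ax.lines))
    buf = io.BytesIO()
    ax.get_figure().savefig(buf, format="pdf")
    saved[name] = buf.getvalue()
    plt.clf()  # clear the current figure so the next boxplot starts from an empty canvas

axes_after_clear = len(plt.gcf().axes)
print(lines_drawn, axes_after_clear)
```

Without the plt.clf() line, the second count would include the first boxplot's lines as well.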
Just take the code from Archana David and put it in your plot function: the key is to call "fig, ax = plt.subplots()" so that every call creates a new graph.
import pandas as pd
import matplotlib.pyplot as plt
d1 = {'ff_opt_time': [10, 20, 11, 5, 15, 13, 19, 25],
      'ff_count_opt': [30, 40, 45, 29, 35, 38, 32, 41]}
df1 = pd.DataFrame(data=d1)
d2 = {'ff_opt_time': [1, 2, 1, 5, 1, 1, 4, 5],
      'ff_count_opt': [3, 4, 4, 9, 5, 3, 2, 4]}
df2 = pd.DataFrame(data=d2)
def evaluate2(df1, df2):
    def plot(df, output):
        # a fresh figure for every plot, so nothing carries over from the previous one
        fig, ax = plt.subplots()
        df.boxplot(ax=ax, rot=45, fontsize=5)
        fig.savefig(output + ".pdf")
        plt.close(fig)  # release the figure once it has been saved
    df_ot = pd.DataFrame(columns=['opt_time1', 'opt_time2'])
    df_ot['opt_time1'] = df1['ff_opt_time']
    df_ot['opt_time2'] = df2['ff_opt_time']
    plot(df_ot, "bp_opt_time")
    df_op = pd.DataFrame(columns=['count_opt1', 'count_opt2'])
    df_op['count_opt1'] = df1['ff_count_opt']
    df_op['count_opt2'] = df2['ff_count_opt']
    plot(df_op, "bp_count_opt_perm")
evaluate2(df1, df2)

DateLocator in matplotlib to show the first days of both the week and the month

I would like to create a DateLocator in matplotlib that selects all Mondays and the first days of the month. As matplotlib uses the dateutil library I read the docs of how to use RRuleLocator with rrule objects. With the rruleset object from dateutil I can achieve the required functionality:
>>> rrset = rruleset()
>>> rrset.rrule(rrule(DAILY, byweekday=MO, count=5))
>>> rrset.rrule(rrule(DAILY, bymonthday=1, count=5))
>>> list(rrset)
[datetime.datetime(2020, 11, 30, 16, 10, 2),
datetime.datetime(2020, 12, 1, 16, 10, 2),
datetime.datetime(2020, 12, 7, 16, 10, 2),
datetime.datetime(2020, 12, 14, 16, 10, 2),
datetime.datetime(2020, 12, 21, 16, 10, 2),
datetime.datetime(2020, 12, 28, 16, 10, 2),
datetime.datetime(2021, 1, 1, 16, 10, 2),
datetime.datetime(2021, 2, 1, 16, 10, 2),
datetime.datetime(2021, 3, 1, 16, 10, 2),
datetime.datetime(2021, 4, 1, 16, 10, 2)]
Unfortunately, I did not manage to find out how to use rruleset with matplotlib. RRuleLocator expects an rrulewrapper object (defined in matplotlib) that hides the underlying rrule instance, and I cannot use it with rruleset. Is there any other way to do this?
If I understood you correctly, calling .set_xticks(list(rrset)) might be enough. For example:
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from dateutil.rrule import rrule, rruleset, DAILY, MO
rrset = rruleset()
rrset.rrule(rrule(DAILY, byweekday=MO, count=5))
rrset.rrule(rrule(DAILY, bymonthday=1, count=5))
print(list(rrset))
## generate dates 90 days into the future
base = datetime.datetime.today()
dates = [base + datetime.timedelta(days=3*x) for x in range(30)]
fig = plt.figure(figsize=(10, 5))
ax = plt.subplot(111)
ax.set_autoscale_on(True)
## simply plot dates over dates
ax.plot(dates, dates, marker='s')
## place the ticks exactly at the rruleset occurrences
ax.set_xticks(list(rrset))
formatter = mdates.DateFormatter('%m/%d/%y')
ax.xaxis.set_major_formatter(formatter)
ax.xaxis.set_tick_params(rotation=30, labelsize=10)
ax.autoscale_view()
ax.grid()
plt.show()
This yields the following (run on 11/26/20, where 11/30/2020 is the next Monday, hence the tick label overlapping with the first of the month):
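Ticks fixed with set_xticks do not follow panning or zooming. If that matters, one option is a small custom locator that asks the rruleset for its occurrences inside the current view. This is a sketch under my own assumptions, not a matplotlib API: RRuleSetLocator and mondays_and_month_starts are hypothetical names, built on matplotlib.ticker.Locator, mdates.num2date/date2num and dateutil's rruleset.between(). The rules use a timezone-aware dtstart because num2date returns UTC-aware datetimes, and Python cannot compare naive and aware datetimes.

```python
from datetime import datetime, timedelta, timezone

import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
from dateutil.rrule import DAILY, MO, rrule, rruleset


class RRuleSetLocator(mticker.Locator):
    """Hypothetical locator: one tick per rruleset occurrence inside the view."""

    def __init__(self, rset):
        self.rset = rset

    def __call__(self):
        # self.axis is set by Axis.set_major_locator()
        vmin, vmax = self.axis.get_view_interval()
        return self.tick_values(vmin, vmax)

    def tick_values(self, vmin, vmax):
        vmin, vmax = sorted((vmin, vmax))
        start = mdates.num2date(vmin)  # UTC-aware datetimes
        end = mdates.num2date(vmax)
        hits = self.rset.between(start, end, inc=True)  # sorted, duplicates merged
        return mdates.date2num(hits) if hits else []


def mondays_and_month_starts(dtstart):
    """Hypothetical helper: every Monday plus the 1st of every month, from dtstart on."""
    rset = rruleset()
    rset.rrule(rrule(DAILY, byweekday=MO, dtstart=dtstart))
    rset.rrule(rrule(DAILY, bymonthday=1, dtstart=dtstart))
    return rset


base = datetime(2020, 11, 26, tzinfo=timezone.utc)
dates = [base + timedelta(days=3 * i) for i in range(30)]

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(dates, range(len(dates)), marker="s")
ax.xaxis.set_major_locator(
    RRuleSetLocator(mondays_and_month_starts(datetime(2020, 1, 1, tzinfo=timezone.utc)))
)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m/%d/%y"))
ax.xaxis.set_tick_params(rotation=30, labelsize=10)
ax.grid()
fig.canvas.draw()  # ticks are recomputed from the locator on every draw
```

Because the rules have no count, the locator keeps producing Mondays and month starts wherever the view moves; for very wide views you may want to thin them out.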

Timeline bar using matplotlib & PolyCollection - Python

I have been trying to replicate ImportanceOfBeingErnest's answer to "Timeline bar graph using python and matplotlib"
and can't seem to get the correct output graph.
Here is my current output
Here is my desired output (but with using my data etc.)
I'm struggling to identify the issue.
Any help will be greatly appreciated!
Thank you.
Here's the code:
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.collections import PolyCollection
data = [(dt.datetime(1900, 1, 1, 14, 19, 26), dt.datetime(1900, 1, 1, 14, 19, 29), 'index'),
        (dt.datetime(1900, 1, 1, 14, 19, 29), dt.datetime(1900, 1, 1, 14, 19, 31), 'links'),
        (dt.datetime(1900, 1, 1, 14, 19, 31), dt.datetime(1900, 1, 1, 14, 19, 33), 'guides'),
        (dt.datetime(1900, 1, 1, 14, 19, 33), dt.datetime(1900, 1, 1, 14, 19, 35), 'prices'),
        (dt.datetime(1900, 1, 1, 14, 19, 35), dt.datetime(1900, 1, 1, 16, 39, 47), 'index'),
        (dt.datetime(1900, 1, 1, 16, 39, 47), dt.datetime(1900, 1, 1, 16, 39, 48), 'prices')]
cats = {'index': 1, 'links': 2, 'guides': 3, 'prices': 4}
colormapping = {'index': 'C0', 'links': 'C1', 'guides': 'C2', 'prices': 'C3'}
verts = []
colors = []
for d in data:
    v = [(mdates.date2num(d[0]), cats[d[2]] - .4),
         (mdates.date2num(d[0]), cats[d[2]] + .4),
         (mdates.date2num(d[1]), cats[d[2]] + .4),
         (mdates.date2num(d[1]), cats[d[2]] - .4),
         (mdates.date2num(d[0]), cats[d[2]] - .4)]
    verts.append(v)
    colors.append(colormapping[d[2]])
bars = PolyCollection(verts, facecolors=colors)
fig, ax = plt.subplots()
ax.add_collection(bars)
ax.autoscale()
loc = mdates.MinuteLocator(byminute=[0, 30])
ax.xaxis.set_major_locator(loc)
ax.xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))
ax.set_yticks([1, 2, 3, 4])
ax.set_yticklabels(['index', 'links', 'guides', 'prices'])
plt.show()
Your time differences are extremely short: they last a few seconds, while the x-range covers a few hours. So these bars become practically invisible.
Note that in matplotlib areas are usually drawn without antialiasing, which is useful when putting together multiple semitransparent areas. Lines, however, are drawn with some thickness (in screenspace) and antialiased. Therefore, setting an explicit edgecolor helps to visualize these "bars".
bars = PolyCollection(verts, facecolors=colors, edgecolors=colors)
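To put numbers on "extremely short", the sketch below computes each bar's on-screen width for the question's data. The 500 px plotting width is an assumption; the real width depends on the figure size and DPI.

```python
import datetime as dt

# Intervals from the question: (start, end, category)
data = [
    (dt.datetime(1900, 1, 1, 14, 19, 26), dt.datetime(1900, 1, 1, 14, 19, 29), "index"),
    (dt.datetime(1900, 1, 1, 14, 19, 29), dt.datetime(1900, 1, 1, 14, 19, 31), "links"),
    (dt.datetime(1900, 1, 1, 14, 19, 31), dt.datetime(1900, 1, 1, 14, 19, 33), "guides"),
    (dt.datetime(1900, 1, 1, 14, 19, 33), dt.datetime(1900, 1, 1, 14, 19, 35), "prices"),
    (dt.datetime(1900, 1, 1, 14, 19, 35), dt.datetime(1900, 1, 1, 16, 39, 47), "index"),
    (dt.datetime(1900, 1, 1, 16, 39, 47), dt.datetime(1900, 1, 1, 16, 39, 48), "prices"),
]

AXES_WIDTH_PX = 500  # assumed width of the plotting area in pixels

x_min = min(start for start, _, _ in data)
x_max = max(end for _, end, _ in data)
span_s = (x_max - x_min).total_seconds()  # width of the whole x-range in seconds

bar_px = []
for start, end, label in data:
    dur_s = (end - start).total_seconds()
    bar_px.append(dur_s / span_s * AXES_WIDTH_PX)  # on-screen width of this bar
    print(f"{label:>7}: {dur_s:6.0f} s -> {bar_px[-1]:6.2f} px")
```

The x-range is 8422 seconds, so every bar lasting one to three seconds comes out at a fraction of a pixel, while the single long "index" bar fills almost the whole axis. An antialiased edge line gives the short bars a visible minimum width.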

Can't plot heatmap in Bokeh with datetime x axis

I'm trying to plot the following simple heatmap:
data = {
'value': [1, 2, 3, 4, 5, 6],
'x': [datetime(2016, 10, 25, 0, 0),
datetime(2016, 10, 25, 8, 0),
datetime(2016, 10, 25, 16, 0),
datetime(2016, 10, 25, 0, 0),
datetime(2016, 10, 25, 8, 0),
datetime(2016, 10, 25, 16, 0)],
'y': ['param1', 'param1', 'param1', 'param2', 'param2', 'param2']
}
hm = HeatMap(data, x='x', y='y', values='value', stat=None)
output_file('heatmap.html')
show(hm)
Unfortunately it doesn't render properly:
I've tried setting x_range but nothing seems to work.
I've managed to get something working with the following code:
d1 = data['x'][0]
d2 = data['x'][-1]
p = figure(
x_axis_type="datetime", x_range=(d1, d2), y_range=data['y'],
tools='xpan, xwheel_zoom, reset, save, resize,'
)
p.rect(
source=ColumnDataSource(data), x='x', y='y', width=12000000, height=1,
)
However as soon as I try to use the zoom tool, I get the following errors in console:
Uncaught Error: Number property 'start' given invalid value:
Uncaught TypeError: Cannot read property 'indexOf' of null
I'm using Bokeh 0.12.3.
The bokeh.charts API, including HeatMap, was deprecated and removed in 2017. You should use the stable and supported bokeh.plotting API instead. With your data above, a complete example:
from datetime import datetime
from bokeh.plotting import figure, show
from bokeh.transform import linear_cmap
data = {
'value': [1, 2, 3, 4, 5, 6],
'x': [datetime(2016, 10, 25, 0, 0),
datetime(2016, 10, 25, 8, 0),
datetime(2016, 10, 25, 16, 0),
datetime(2016, 10, 25, 0, 0),
datetime(2016, 10, 25, 8, 0),
datetime(2016, 10, 25, 16, 0)],
'y': ['param1', 'param1', 'param1', 'param2', 'param2', 'param2']
}
p = figure(x_axis_type='datetime', y_range=('param1', 'param2'))
EIGHT_HOURS = 8*60*60*1000
p.rect(x='x', y='y', width=EIGHT_HOURS, height=1, line_color="white",
fill_color=linear_cmap('value', 'Spectral6', 1, 6), source=data)
show(p)
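On a Bokeh datetime axis, widths are given in milliseconds, which is why the answer uses EIGHT_HOURS = 8*60*60*1000. If the sampling interval is not known in advance, a hypothetical helper like cell_width_ms below (not part of Bokeh) could derive it from the data as the smallest gap between distinct timestamps, and its result could then be passed as width= to p.rect.

```python
from datetime import datetime


def cell_width_ms(timestamps):
    """Hypothetical helper: smallest gap between distinct timestamps, in milliseconds."""
    uniq = sorted(set(timestamps))
    if len(uniq) < 2:
        raise ValueError("need at least two distinct timestamps")
    gaps = [(b - a).total_seconds() * 1000 for a, b in zip(uniq, uniq[1:])]
    return min(gaps)


# x values from the question: three 8-hour slots, repeated for param1 and param2
xs = [
    datetime(2016, 10, 25, 0, 0),
    datetime(2016, 10, 25, 8, 0),
    datetime(2016, 10, 25, 16, 0),
] * 2
width = cell_width_ms(xs)
print(width)
```

Using the smallest gap assumes the timestamps lie on a regular grid; with irregular sampling the cells would no longer tile the axis exactly.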
