I have a dataframe like this one (code to generate the data):
I want to compare two lines, l1 and l2, both of which depend on the parameter t. For each line, five values of t are sampled and numbered with t_i. I want to plot both lines, with one of the sampled points highlighted on each line. The points to highlight should be set with two sliders, one for each line.
I can get it working without the sliders:
import altair as alt
from altair import datum

base = alt.Chart(df).encode(x='x', y='y', color='line_name')
plots = []
for line_name in df.line_name.unique():
    line = base.transform_filter(datum.line_name == line_name)
    plots += [line.mark_line(), line.mark_point().transform_filter(datum.t_i == int(line_name[1]))]
alt.layer(*plots)
Or with 1 slider:
plots = []
for line_name in df.line_name.unique():
    line = base.transform_filter(datum.line_name == line_name)
    slider = alt.binding_range(min=0, max=4, step=1, name='t_i:')
    select_t_i = alt.selection_single(name="t_i", fields=['t_i'], bind=slider, init={'t_i': 0})
    plots += [line.mark_line(),
              line.mark_point().add_selection(select_t_i).transform_filter(select_t_i)]
alt.layer(*plots[:-1])
I get the expected result:
But if I change the last line to actually add the second slider:
alt.layer(*plots[:-1]) -> alt.layer(*plots)
I get nothing - the plot does not show up and calling display does not help. How should I do that instead?
Also, I would like to see the value of t for the selected point, not t_i. I only added t_i because I couldn't define the slider with arbitrary values: all the examples I saw use min, max and step. How can I display the value of t so that it is updated with the slider?
Thanks!
EDIT (working code):
plots = []
for line_name in df.line_name.unique():
    line = base.transform_filter(datum.line_name == line_name)
    slider = alt.binding_range(min=0, max=4, step=1, name='t_%s:' % line_name[1:])
    # No explicit name: each selection gets a unique auto-generated one
    select_t_i = alt.selection_single(fields=['t_i'], bind=slider, init={'t_i': 0})
    plots += [line.mark_line(),
              line.mark_point().add_selection(select_t_i).transform_filter(select_t_i)]
alt.layer(*plots)
Two selections cannot have the same name. Remove name="t_i" from your selection definition (so that each one will have a unique automatically-generated name), and it will work.
I found this code, which creates a Python plot that is updated in real time and does everything I need, but instead of the x-axis moving I would like the values to be updated. I have been searching for examples, but I only find static versions where the x values stay the same.
What I have right now:
What I want:
Here is the code:
# Create Plot Widget
self.scrolling_timestamp_plot_widget = pg.PlotWidget(axisItems={'bottom': TimeAxisItem(orientation='bottom')})
# Enable/disable plot squeeze (Fixed axis movement)
self.scrolling_timestamp_plot_widget.plotItem.setMouseEnabled(x=False, y=False)
self.scrolling_timestamp_plot_widget.setTitle('Signal 1 ')
self.scrolling_timestamp_plot_widget.setLabel('left', 'Value')
self.scrolling_timestamp_plot_widget.setLabel('bottom', 'Time (s)')
self.scrolling_timestamp_plot = self.scrolling_timestamp_plot_widget.plot()
self.scrolling_timestamp_plot.setPen("r")
def plot_updater(self):
    self.data_point = float(self.current_position_value)
    self.data.append({'x': self.timestamp.elapsed(), 'y': self.data_point })
    print("List Values:",self.data)
    self.scrolling_timestamp_plot.setData(x=[item['x'] for item in self.data], y=[item['y'] for item in self.data])
I have made a plot with two different categories that are subdivided into three different groups. I have calculated the mean and median for each of these groups, but when I try to annotate the figures with these numbers, they end up printed on top of each other. I want each figure within the plot to be annotated with its respective mean and median.
So my code to make this plot currently looks like this:
fig = px.violin(CVs,
                y="cv %",
                x="group",
                color="method",
                box=True,
                points=False,
                hover_data=CVs.columns)
for i in CVs['method'].unique():
    for j in CVs['group'].unique():
        mean, median = np.round(CVs.loc[CVs['method']==i].agg({'cv %':['mean', 'median']}), 2)['cv %'].values
        fig.add_annotation(x=j, y=0,
                           yshift=-65,
                           text="Mean: {}%".format(mean),
                           font=dict(size=10),
                           showarrow=False)
        fig.add_annotation(x=j, y=0,
                           yshift=-75,
                           text="Median: {}%".format(median),
                           font=dict(size=10),
                           showarrow=False)
fig.update_traces(meanline_visible=True)
fig.update_layout(template='plotly_white', yaxis_zeroline=False, height=fig_height, width=fig_width)
iplot(fig)
From what I have read in the documentation (https://plotly.com/python/text-and-annotations/), it seems that you need to indicate the coordinates of the added annotation using the parameters x and y.
I have tried to adhere to these parameters by setting y to 0 (since the y axis is numerical) and setting x to the pertinent group along the x axis (which is categorical). However, as one can tell from the plot above, this doesn't seem to work. I have also tried setting x to a value that increments with each iteration of the for loop, but none of the values I have tried (e.g. 1, 10, 0.1) worked: the annotations keep printing on top of each other, just at different places along the x axis.
I want to have one set of annotations under each figure. Does anyone know how I can set this up?
Based on what you used (yshift) to adjust the annotation, I have done the same with xshift to move each of the labels below its respective plot. Note that fig_height and fig_width were not provided, so I let plotly choose the size. You may need to adjust the offset a bit if your figure size is different. Hope this works.
import numpy as np
import plotly.express as px

CVs = px.data.tips() ##Used tips db
CVs.rename(columns={'sex': 'group', 'day':'method', 'total_bill': 'cv %'}, inplace=True) ##Replaced to names you have
CVs = CVs[CVs.method != 'Thur'] ##Removed one as there were 4 days in tips
fig = px.violin(CVs,
                y="cv %",
                x="group",
                color="method",
                box=True,
                points=False,
                hover_data=CVs.columns)
x_shift = -100 ##Start at -100 to the left of the j location
for i in CVs['method'].unique():
    for j in CVs['group'].unique():
        ##Filter on both method and group so each violin gets its own statistics
        subset = CVs.loc[(CVs['method']==i) & (CVs['group']==j)]
        mean, median = np.round(subset.agg({'cv %':['mean', 'median']}), 2)['cv %'].values
        fig.add_annotation(x=j, y=0,
                           yshift=-65, xshift = x_shift,
                           text="Mean: {}%".format(mean),
                           font=dict(size=10),
                           showarrow=False)
        fig.add_annotation(x=j, y=0,
                           yshift=-75, xshift = x_shift,
                           text="Median: {}%".format(median),
                           font=dict(size=10),
                           showarrow=False)
    x_shift = x_shift + 100 ##After each entry (healthy/sick in your case), add 100
fig.update_traces(meanline_visible=True)
fig.update_layout(template='plotly_white', yaxis_zeroline=False)#, height=fig_height, width=fig_width)
#iplot(fig)
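The 100-pixel step is hard-coded for three methods. As a rough sketch (standard library only, with made-up rows; `centered_offsets` and `stats_per_pair` are hypothetical helper names, not plotly functions), this is one way the pixel offsets could be derived for any number of methods, with the statistics grouped per (method, group) pair so that each violin gets its own numbers:

```python
from statistics import mean, median


def centered_offsets(n, spacing):
    """Return n pixel offsets, `spacing` apart, centered on zero."""
    start = -spacing * (n - 1) / 2
    return [start + k * spacing for k in range(n)]


def stats_per_pair(rows):
    """Map each (method, group) pair to its (mean, median), rounded to 2 places."""
    buckets = {}
    for method, group, value in rows:
        buckets.setdefault((method, group), []).append(value)
    return {key: (round(mean(vals), 2), round(median(vals), 2))
            for key, vals in buckets.items()}


# Made-up illustrative data: (method, group, cv %)
rows = [
    ("A", "g1", 10.0), ("A", "g1", 20.0),
    ("A", "g2", 30.0),
    ("B", "g1", 5.0), ("B", "g1", 7.0), ("B", "g1", 12.0),
]

offsets = centered_offsets(3, 100)
stats = stats_per_pair(rows)
print(offsets)
print(stats[("B", "g1")])
```

Each offset would then be passed as xshift for one method, and the matching statistics looked up by (method, group) when building the annotation text. The spacing still depends on the figure width, so it remains something to tune by eye.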
Plot
I wish to plot some data from an array with multiple columns, and would like each column to be a different line on the same scrolling graph. As there are many columns, I think it would make sense to plot them within a loop. I'd also like to plot a second scrolling graph with a single line.
I can get the single line graph to scroll correctly, but the graph containing the multiple lines over-plots from the updated array without clearing the previous lines.
How do I get the lines to clear within the for loop? I thought that setData might do the clearing. Do I need a pg.QtGui.QApplication.processEvents() call or something similar within the loop? I tried adding that call, but it had no effect.
My code:
#Based on example from PyQtGraph documentation
import numpy as np
import pyqtgraph as pg
win = pg.GraphicsLayoutWidget(show=True)
win.setWindowTitle('pyqtgraph example: Scrolling Plots')
timer = pg.QtCore.QTimer()
plot_1 = win.addPlot()
plot_2 = win.addPlot()
data1 = np.random.normal(size=(300))
curve1 = plot_1.plot(data1)
data_2d = np.random.normal(size=(3,300))
def update_plot():
    global data1, data_2d
    data1[:-1] = data1[1:]
    data1[-1] = np.random.normal()
    curve1.setData(data1)
    for idx, n in enumerate(data_2d):
        n[:-1] = n[1:]
        n[-1] = np.random.normal()
        curve2 = plot_2.plot(n,pen=(idx))
        curve2.setData(n)
        #pg.QtGui.QApplication.processEvents() #Does nothing
timer = pg.QtCore.QTimer()
timer.timeout.connect(update_plot)
timer.start(50)
if __name__ == '__main__':
    pg.exec()
You could clear the plot of all curves each time with .clear(), but that wouldn't be very performant. A better solution would be to keep all the curve objects around and call setData on them each time, like you're doing with the single-curve plot. E.g.
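# Create the curves once, before update_plot is defined
curves_2d = [plot_2.plot(pen=idx) for idx, n in enumerate(data_2d)]
# ... in update_plot, inside the for loop, instead of calling plot_2.plot() again
curves_2d[idx].setData(n)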
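To show the "create once, update many times" pattern without needing a Qt window, here is a small sketch in plain Python. `FakeCurve` is a hypothetical stand-in for the object that plot_2.plot() returns; it only counts how many curves get created, and is not part of pyqtgraph:

```python
import random


class FakeCurve:
    """Hypothetical stand-in for a plot curve; counts how many are created."""
    created = 0

    def __init__(self):
        FakeCurve.created += 1
        self.data = None

    def setData(self, data):
        # Replace the curve's data instead of adding a new curve
        self.data = list(data)


def scroll(row):
    """Shift the row left by one sample and append a new random value."""
    row[:-1] = row[1:]
    row[-1] = random.gauss(0, 1)


data_2d = [[random.gauss(0, 1) for _ in range(300)] for _ in range(3)]

# Curves are created once, up front, one per row
curves_2d = [FakeCurve() for _ in data_2d]


def update_plot():
    # Each tick only updates the existing curves
    for curve, row in zip(curves_2d, data_2d):
        scroll(row)
        curve.setData(row)


for _ in range(10):
    update_plot()

print(FakeCurve.created)
```

After ten updates there are still only three curve objects, whereas calling plot() inside the loop would add three more on every tick.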
I am converting some old Python 2.7 code to 3.6.
My routine plots the first line OK but subsequent lines seem to start where the previous line left off. (Running on-line at www.pythonanywhere.com)
My code:
import matplotlib
from matplotlib import pyplot
k = 0
while k < len(Stations):
    # Draw the graph
    fig.patch.set_facecolor('black') # Outside border
    pyplot.rcParams['axes.facecolor'] = 'black' # Graph background
    pyplot.rcParams['axes.edgecolor'] = 'red'
    pyplot.tick_params(axis='x', colors='yellow')
    pyplot.tick_params(axis='y', colors='yellow')
    pyplot.ylim(float(BtmLimit),float(TopLimit))
    pyplot.ylabel("Percent of normal range.", size=10, color = "yellow")
    pyplot.xticks([]) # Hide X axis
    pyplot.title("Plotted at %sGMT, %s %s %s" % (thour, tday, tdate, tmonth), color = "yellow")
    if Error == 'False': pyplot.plot(Epoch, Scaled, color = (Color), linewidth=1.9)
    pyplot.plot(Epoch, Top, color = [0,0.5,0]) # Green lines
    pyplot.plot(Epoch, Btm, color = [0,0.5,0])
    k = k + 1
pyplot.savefig(SD+'RiverLevels.png', facecolor='black', bbox_inches='tight')
pyplot.show()
pyplot.close()
The data looks like this:
Epoch
['1638046800', '1638047700', '1638048600', '1638049500', '1638050400', '1638051300', '1638052200', '1638053100', '1638054000', '1638054900', '1638
055800', '1638056700', '1638057600', '1638058500', '1638059400', '1638060300', '1638061200', '1638062100', '1638063000', '1638063900', '1638064800
', '1638065700', '1638066600', '1638067500', '1638068400', '1638069300', '1638070200', '1638071100', '1638072000', '1638072900', '1638073800', '16
38074700', '1638075600', '1638076500', '1638077400', '1638078300', '1638079200', '1638080100', '1638081000', '1638081900', '1638082800', '16380837
00', '1638084600', '1638085500', '1638086400', '1638087300', '1638088200', '1638089100', '1638090000', '1638090900', '1638091800', '1638092700', '
1638093600', '1638094500', '1638095400']
Scaled
['32.475247524752476', '33.069306930693074', '33.76237623762376', '33.56435643564357', '33.56435643564357', '33.86138613861387', '34.1584158415841
6', '34.35643564356436', '34.554455445544555', '34.554455445544555', '34.75247524752476', '34.95049504950495', '35.049504950495056', '35.148514851
48515', '35.049504950495056', '35.14851485148515', '35.44554455445545', '35.54455445544555', '35.54455445544555', '35.34653465346535', '35.5445544
5544555', '35.64356435643565', '35.84158415841585', '35.742574257425744', '35.54455445544555', '35.44554455445545', '35.44554455445545', '35.34653
465346535', '35.24752475247525', '35.049504950495056', '34.95049504950495', '34.95049504950495', '34.851485148514854', '34.65346534653466', '34.35
643564356436', '34.15841584158416', '34.35643564356436', '34.35643564356436', '34.25742574257426', '34.05940594059406', '33.86138613861387', '33.6
63366336633665', '33.86138613861387', '33.663366336633665', '33.663366336633665', '33.46534653465347', '33.366336633663366', '33.56435643564357',
'33.663366336633665', '33.663366336633665', '33.663366336633665', '33.663366336633665', '33.960396039603964', '34.05940594059406', '34.05940594059
406']
Output image
I guess this may be due to using strings instead of numbers. When you use strings, the x values are taken as categories and not ordered numerically but in the order they appear in the list (unless a category is exactly repeated). I understand that the snippet is not complete, but the values of Epoch and Scaled actually change on each iteration.
After plotting the first set of data, any values not present in the first set will be positioned after those of the first set (i.e. to the right of the first set's last point in x, and higher than the last point in y). When the second set of data is plotted, its first x values have not appeared in the previous set, so they are plotted afterwards (the beginning of the light blue line in the plot), regardless of their numeric value. Then, the final values are the same as those that appeared in the first set, so the line goes back to the left of the figure.
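To make the ordering effect concrete, here is a simplified model in plain Python of how string values end up with x positions in first-seen order. This is only an illustration, not matplotlib's actual code; `category_positions` and the sample values are made up:

```python
def category_positions(*series):
    """Give each distinct string the next free integer slot, in first-seen order."""
    slots = {}
    for values in series:
        for value in values:
            slots.setdefault(value, len(slots))
    return slots


first = ["30", "10", "20"]    # plotted first
second = ["5", "10", "40"]    # plotted second, shares "10" with the first set

slots = category_positions(first, second)
first_x = [slots[v] for v in first]
second_x = [slots[v] for v in second]
print(first_x)
print(second_x)

# Converting to numbers restores the numeric ordering on the axis
numeric_second = [float(v) for v in second]
print(numeric_second)
```

In this model the second series gets positions 3, 1, 4: its new values land to the right of everything from the first series, and the shared value "10" jumps back to slot 1, which mirrors the line doubling back in the plot.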
You can try using [float(x) for x in Epoch] and [float(y) for y in Scaled] in the plots. As I see that there are spaces in the strings representing the numbers, you could use a function like this:
def flist_from_slist(data):
    return [float(x.replace(' ', '')) for x in data]
And replace the pyplot.plot call by:
pyplot.plot(flist_from_slist(Epoch), flist_from_slist(Scaled), linewidth=1.9)
Moreover, there is a lot of code inside the loop that could be moved outside (setting the ticks, labels, etc).
I am trying to replicate a chart like the following using a pandas dataframe and Bokeh's vbar:
Objective
So far, I've managed to place the labels at their corresponding heights, but I can't find a way to access the numeric value where each category (2016, 2017, 2018) is located on the x axis. This is my result:
My nested categorical stacked bars chart
This is my code. It's messy, but it's what I've managed so far. So is there a way to access the numeric x-axis value of the bars?
def make_nested_stacked_bars(source,measurement,dimension_attr):
    #dimension_attr is a list that contains the names of columns in source that will be used as categories
    #measurement contains the name of the column with numeric data.
    data = source.copy()
    #Creates list of values of highest index
    list_attr = source[dimension_attr[0]].unique()
    list_stackers = list(source[dimension_attr[-1]].unique())
    list_stackers.sort()
    #trims labels that are too wide to fit in graph
    for column in data.columns:
        if data[column].dtype.name == 'object':
            data[column] = np.where(data[column].apply(len) > 30, data[column].str[:30]+'...', data[column])
    #Creates a list of dataframes, each grouping a specific value
    list_groups = []
    for item in list_attr:
        list_groups.append(data[data[dimension_attr[0]] == item])
    #Groups data by dimension attrs, aggregates measurement to count
    #Drops highest index from dimension attr
    dropped_attr = dimension_attr[0]
    dimension_attr.remove(dropped_attr)
    #Creates groupby by the last 2 parameters, and aggregates to count
    #Calculates percentage
    for index,value in enumerate(list_groups):
        list_groups[index] = list_groups[index].groupby(by=dimension_attr).agg({measurement: ['count']})
        list_groups[index] = list_groups[index].groupby(level=0).apply(lambda x: round(100 * x / float(x.sum()),1))
        # Resets indexes
        list_groups[index] = list_groups[index].reset_index()
        list_groups[index] = list_groups[index].pivot(index=dimension_attr[0], columns=dimension_attr[1])
        list_groups[index].index = [(x,list_attr[index]) for x in list_groups[index].index]
        # Drops dimension attr as top level column
        list_groups[index].columns = list_groups[index].columns.droplevel(0)
        list_groups[index].columns = list_groups[index].columns.droplevel(0)
    df = pd.concat(list_groups)
    # Get the number of colors needed for the plot.
    colors = brewer["Spectral"][len(list_stackers)]
    colors.reverse()
    p = figure(plot_width=800, plot_height=500, x_range=FactorRange(*df.index))
    renderers = p.vbar_stack(list_stackers, x='index', width=0.3, fill_color=colors, legend=[get_item_value(x)for x in list_stackers], line_color=None, source=df, name=list_stackers,)
    # Adds a different hovertool to a stacked bar
    #empty dictionary with initial values set to zero
    list_previous_y = {}
    for item in df.index:
        list_previous_y[item] = 0
    #loops through bar graphs
    for r in renderers:
        stack = r.name
        hover = HoverTool(tooltips=[
            ("%s" % stack, "#%s" % stack),
        ], renderers=[r])
        #Initial value for placing label in x_axis
        previous_x = 0.5
        #Loops through dataset rows
        for index, row in df.iterrows():
            #adds value of df column to list
            list_previous_y[index] = list_previous_y[index] + df[stack][index]
            ## adds label if value is not nan and at least 10
            if not math.isnan(df[stack][index]) and df[stack][index]>=10:
                p.add_layout(Label(x=previous_x, y=list_previous_y[index] -df[stack][index]/2,
                                   text='% '+str(df[stack][index]), render_mode='css',
                                   border_line_color='black', border_line_alpha=1.0,
                                   background_fill_color='white', background_fill_alpha=1.0))
            # increases position in x_axis
            #this should be done by adding the value of next bar in x_axis
            previous_x = previous_x + 0.8
        p.add_tools(hover)
        p.add_tools(hover)
    p.legend.location = "top_left"
    p.x_range.range_padding = 0.2
    p.xgrid.grid_line_color = None
    return p
Or is there an easier way to get all this done?
Thank you for your time!
UPDATE:
Added an additional image of a three level nested chart where the label placement in x_axis should be accomplished too
Three level nested chart
I can't find a way to access the numeric value where the category (2016,2017,2018) is located in the x axis.
There is no way to access this information on the Python side in standalone Bokeh output. The coordinates are only computed inside the browser on the JavaScript side, i.e. only after your Python code has finished running and is out of the picture entirely. Even in a Bokeh server app context there is no direct way, as there are no synchronized properties that record the values.
As of Bokeh 1.3.4, support for placing labels with categorical coordinates is a known open issue.
In the meantime, the only workarounds I can suggest are:
Use the text glyph method with coordinates in a ColumnDataSource, instead of Label. That should work to position with actual categorical coordinates. (LabelSet might also work, though I have not tried it.) You can see an example of text with categorical coordinates here:
https://github.com/bokeh/bokeh/blob/master/examples/plotting/file/periodic.py
Use numerical coordinates to position the Label. But you will have to experiment/best guess to find numerical coordinates that work for you. A rule of thumb is that categories have a width of 1.0 in synthetic (numeric) coordinate space.
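As a rough starting point for the second workaround, the rule of thumb can be turned into numbers. This sketch assumes a flat (non-nested) categorical range with zero padding between factors, in which case the centre of the i-th category would sit at about 0.5 + i. The helper name is hypothetical and this is not a Bokeh API; nested ranges add extra padding between groups, so expect to adjust the values by eye:

```python
def flat_factor_centers(factors, factor_padding=0.0):
    """Estimate the numeric centre of each category, each being 1.0 wide."""
    return {factor: 0.5 + i * (1.0 + factor_padding)
            for i, factor in enumerate(factors)}


centers = flat_factor_centers(["2016", "2017", "2018"])
print(centers)
```

The resulting values could be used as the x argument of a Label, but they are estimates under the stated assumption rather than coordinates read back from the plot.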
My solution was to create a copy of the dataframe used for making the chart. This dataframe (labeling_data) contains the y-axis coordinates, calculated so that each label is positioned at the middle of the corresponding stacked bar.
Then I added additional columns to be used as the actual labels, where the values to be displayed are concatenated with the percentage symbol.
labeling_data = df.copy()
#Cumulative sum of columns
labeling_data = labeling_data.cumsum(axis=1)
#New names for columns
y_position = []
for item in labeling_data.columns:
    y_position.append(item+'_offset')
labeling_data.columns = y_position
#Copies original columns
for item in df:
    #Adding original columns
    labeling_data[item] = df[item]
    #Modifying offset columns to place label in the middle of the bar
    labeling_data[item+'_offset'] = labeling_data[item+'_offset']-labeling_data[item]/2
    #Concatenating values with percentage symbol if at least 10
    labeling_data[item+'_label'] = np.where(df[item] >=10 , '% '+df[item].astype(str), "")
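The offset arithmetic is easier to check on a single bar. A minimal sketch in plain Python, with made-up percentages and a hypothetical helper name: the midpoint of each stacked segment is the running total up to and including that segment, minus half of the segment's own height.

```python
from itertools import accumulate


def segment_midpoints(values):
    """Return the y midpoint of each segment in one stacked bar."""
    tops = list(accumulate(values))  # top edge of each segment
    return [top - value / 2 for top, value in zip(tops, values)]


midpoints = segment_midpoints([20.0, 30.0, 50.0])
print(midpoints)
```

For segments of 20, 30 and 50 the top edges are 20, 50 and 100, so the labels would sit at 10, 35 and 75. This is the same calculation the cumsum and `_offset` columns perform per row.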
Finally, by looping through the renderers of the plot, a LabelSet was added to each stack group using labeling_data as the ColumnDataSource. By doing this, the index of the dataframe can be used to set the x coordinate of the label, and the corresponding columns are used for the y coordinate and text parameters.
info = ColumnDataSource(labeling_data)
#loops through bar graphs
for r in renderers:
    stack = r.name
    #Creates one LabelSet per stack and uses index, y_offset and label columns
    #as x, y and text parameters; the source already covers every row,
    #so no loop over the rows is needed
    labels = LabelSet(x='index', y=stack+'_offset', text=stack+'_label', level='overlay',
                      x_offset=-25, y_offset=-5, source=info)
    p.add_layout(labels)
Final result:
Nested categorical stacked bar chart with labels