Plot a line on a curve that is undersampled - python

I was wondering whether there is a way to color part of a line so that it follows the curve over a user-specified range. An example is shown below: the user wants to color the section of the line from x = 11 to x = 14 (see the image below for the desired result). I tried, for example, df.loc[..] to locate the closest points, but then the line is colored from x = 10 to x = 15. Does anyone have an idea how to solve this? Do I need to add extra points between two existing points, and if so, how would I do that? The user might also enter x = 11 to x = 19.
Appreciate any help or guidance.
from bokeh.plotting import figure, output_file, show
import pandas as pd
p = figure(width=600, height=600, tools="pan,reset,save")
data = {'x': [1, 2, 3, 6, 10, 15, 20, 22],
'y': [2, 3, 6, 8, 18, 24, 50, 77]}
df = pd.DataFrame(data)
p.line(df.x, df.y)
show(p)
What the result should look like when user inputs x = 11 (start) and x = 14 (end):

With pandas you can create an interpolated DataFrame from the original.
With this you can add a new line in red.
from bokeh.plotting import figure, output_notebook, show
import pandas as pd
output_notebook()
p = figure(width=600, height=600, tools="pan,reset,save")
data = {'x': [1, 2, 3, 6, 10, 15, 20, 22],
'y': [2, 3, 6, 8, 18, 24, 50, 77]}
df = pd.DataFrame(data)
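start, end = 11, 14  # range entered by the user
df_interpolated = (df.copy()
.set_index('x')
.reindex(index = range(df['x'].min(), df['x'].max() + 1))  # + 1 so the last x value is kept
.reset_index() # turn 'x' back into a regular column
.interpolate()
)
# select by x value; positional slicing [11:14] would pick x = 12..14 because x starts at 1
selected = df_interpolated[df_interpolated.x.between(start, end)]
p.line(df.x, df.y)
p.line(selected.x, selected.y, color="red")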
show(p)
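The reindexing approach above only creates points at integer x values. If the user may enter non-integer bounds, a possible alternative (not part of the original answer) is to interpolate only at the two endpoints with numpy.interp and keep the original samples in between. The helper name segment_on_curve is hypothetical, and the sketch assumes x is sorted in ascending order.

```python
import numpy as np

x = np.array([1, 2, 3, 6, 10, 15, 20, 22])
y = np.array([2, 3, 6, 8, 18, 24, 50, 77])

def segment_on_curve(x, y, start, end):
    """Return the points of the piecewise-linear curve between start and end."""
    # original samples strictly inside the requested range
    inside = x[(x > start) & (x < end)]
    seg_x = np.concatenate(([start], inside, [end]))
    # linear interpolation gives the y values at start and end
    seg_y = np.interp(seg_x, x, y)
    return seg_x, seg_y

seg_x, seg_y = segment_on_curve(x, y, 11, 14)     # no original sample inside
seg_x2, seg_y2 = segment_on_curve(x, y, 11, 19)   # passes through the sample at x = 15
```

The result can then be drawn on top of the original line with p.line(seg_x, seg_y, color="red").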

Related

Why is my gridline above x-axis and how can I correct it(matplotlib)?

In the figure (see the link below the code), you can see that the bottom horizontal gridline is above the x-axis whereas I would prefer it to be overlapping the x-axis to make the graph look more accurate. Could anyone please tell me how to achieve that? Also, it would be amazing if someone could tell me how I can start my graph from 0 at the bottom left corner. Here is my code:
import matplotlib.pyplot as plt
import numpy as np
x_coordinates = np.array([0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
y_coordinates = np.array([0, 5, 10, 15, 20, 25, 30, 35, 40,45 ])
plt.xlabel("extension/mm")
plt.ylabel("tension/ N")
plt.title("extention vs tension correlation")
plt.xticks([0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
plt.minorticks_on()
plt.grid(b=True, which="minor", color="black" )
plt.grid(b=True, which ="major",color="black")
plt.plot(x_coordinates, y_coordinates)
plt.show()
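The plot shows minor gridlines, and by default matplotlib adds a small margin around the data, so the axes start slightly below the lowest major gridline. That looks confusing against the x-axis. If the plot range ends on a major tick, it looks nicer, and the graph then starts from 0 in the bottom-left corner. Here is one possible solution: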
plt.ylim([min(y_coordinates),max(y_coordinates)])
plt.xlim([min(x_coordinates),max(x_coordinates)])
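For reference, here is a minimal sketch of the whole plot with the limits applied, written against the Axes interface. The Agg backend is an assumption made so the sketch runs without a display, and grid(True, ...) is passed positionally because newer Matplotlib releases renamed the b keyword to visible.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

x_coordinates = np.arange(0, 20, 2)   # 0, 2, ..., 18
y_coordinates = np.arange(0, 50, 5)   # 0, 5, ..., 45

fig, ax = plt.subplots()
ax.plot(x_coordinates, y_coordinates)
ax.set_xlabel("extension/mm")
ax.set_ylabel("tension/N")
ax.set_xticks(x_coordinates)
ax.minorticks_on()
ax.grid(True, which="both", color="black")

# clamp the axes to the data range so the first gridline sits on the axis
ax.set_xlim(x_coordinates.min(), x_coordinates.max())
ax.set_ylim(y_coordinates.min(), y_coordinates.max())
```

Call plt.show() or fig.savefig(...) afterwards to view the result.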

Creating a heatmap with uneven block sizes / stacked bar chart using Python

I want to create a heatmap in Python that is similar to what is shown on the bottom of this screenshot from TomTom Move: https://d2altcye8lkl9f.cloudfront.net/2021/03/image-1.png (source: https://support.move.tomtom.com/ts-output-route-analysis/)
A route contains multiple segments that vary in length. Each segment has an average speed, which I want to color using a colormap (green for fast, through yellow, to red for slow). I was able to plot the segments in their correct order using a stacked histplot, but when I add hue, the segments are ordered from fastest to slowest average speed instead of in their correct order along the route.
There are three time sets containing 4 segments with their length, length of the route so far and speed for each segment for each time set.
import pandas as pd
d = {'timeRanges': ['00:00-06:30', '00:00-06:30', '00:00-06:30', '00:00-06:30', '07:00-15:00', '07:00-15:00', '07:00-15:00', '07:00-15:00', '16:00-17:30', '16:00-17:30', '16:00-17:30', '16:00-17:30'], 'segmentOrder': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], 'segmentDistance': [20, 45, 60, 30, 20, 45, 60, 30, 20, 45, 60, 30], 'distanceAlongRoute': [20, 65, 125, 155, 20, 65, 125, 155, 20, 65, 125, 155], 'averageSpeed': [54.2, 48.1, 23.5, 33.7, 56.2, 53.2, 42.5, 44.2, 50.2, 46.2, 35.3, 33.2]}
df = pd.DataFrame(data=d)
I have tried using seaborn heatmap and imshow and I have yet to make the x axis block widths vary for each segment.
Much appreciated.
Here is a simple example of a heatmap with different box sizes, based on the example "Heatmap with Unequal Block Sizes" at https://plotly.com/python/heatmaps/. Set the xe variable to all of the x-axis edges and z to the values that determine the colors between those edges. Each row of z should have one fewer value than xe.
import plotly.graph_objects as go
import numpy as np
xe = [0, 1, 2, 5, 6]
ye = [0, 1]
z = [[1, 2, 1, 3]]
fig = go.Figure(data=go.Heatmap(
x = np.sort(xe),
y = np.sort(ye),
z = z,
type = 'heatmap',
colorscale = 'Viridis'))
fig.update_layout(margin = dict(t=200,r=200,b=200,l=200),
showlegend = False,
width = 700, height = 500,
autosize = False
)
fig.show()
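To apply this to the data from the question, the edges and the z matrix can be derived with pandas. This is only a sketch: it assumes every time range has the same segments with the same lengths, and the names x_edges, y_edges and z are illustrative.

```python
import pandas as pd

time_ranges = ['00:00-06:30', '07:00-15:00', '16:00-17:30']
df = pd.DataFrame({
    'timeRanges': [t for t in time_ranges for _ in range(4)],
    'segmentOrder': [0, 1, 2, 3] * 3,
    'distanceAlongRoute': [20, 65, 125, 155] * 3,
    'averageSpeed': [54.2, 48.1, 23.5, 33.7, 56.2, 53.2,
                     42.5, 44.2, 50.2, 46.2, 35.3, 33.2],
})

# x edges: 0 followed by the cumulative distance at the end of each segment
first = df[df['timeRanges'] == time_ranges[0]].sort_values('segmentOrder')
x_edges = [0] + first['distanceAlongRoute'].tolist()

# one row of speeds per time range, one column per segment
speeds = df.pivot(index='timeRanges', columns='segmentOrder', values='averageSpeed')
z = speeds.to_numpy()

# y edges: one more edge than there are rows
y_edges = list(range(len(speeds) + 1))
```

These values can be passed as x, y and z to go.Heatmap as in the example above. A red-yellow-green colorscale such as 'RdYlGn' should then show slow segments in red and fast ones in green.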

How to make duplicated lines visible in Plotly

I went through the Plotly Python documentation and could not find a way to do this. I am trying to plot over 1000 lines, and some of them are plotted on top of each other. I want to be able to see the duplicated lines. I tried passing a random line width, but sometimes the boldest line is plotted on top. Making the lines transparent did not work either. Please advise; I have inserted a simple example below:
import plotly.graph_objects as go
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [10, 8, 6, 4, 2, 0, 2, 4, 2, 0]
fig = go.Figure()
fig.add_trace(go.Scatter(
x=x, y=y,
line_color='red',
name='Duplicate1',
))
fig.add_trace(go.Scatter(
x=x, y=y,
line_color='rgb(231,107,243)',
name='Duplicate2',
))
fig.update_traces(mode='lines')
fig.show()
You can iterate over the lines in descending order of their thickness. You can start with a max_width and reduce from there for every new line being plotted. I created a sample script for 10 lines with a linear color scheme.
import plotly.graph_objects as go
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [10, 8, 6, 4, 2, 0, 2, 4, 2, 0]
fig = go.Figure()
max_thickness = 100
N = 10
for i in range(N):
    fig.add_trace(go.Scatter(
        x=x, y=y,
        line_color='rgb({r},255,255)'.format(r= (255//N)*i ) ,
        name='Duplicate ' + str(i),
        line=dict(width=max_thickness - (i*10) )
    ))
fig.update_traces(mode='lines')
fig.show()
Here, we plot the same line over and over again but with varying thicknesses and varying colors. The output is as shown below:
This feels like a very open question with regard to why you would want to plot duplicated series and how you end up with duplicates at all, but we'll leave that for now. If you can end up with 1000 duplicates, I would use a different line width for each series and a very low opacity a in 'rgba(0, 0, 255, {a})'. You could also vary the opacity for each line, but you don't have to. Here's one way of doing it if you've got duplicated values in df_dupe and some unique series in df. Dupes are displayed in shades of blue. I'd be happy to go into other details if this is something you can use.
Plot:
Complete code:
from plotly.subplots import make_subplots
import plotly.graph_objs as go
import pandas as pd
import numpy as np
# random data
np.random.seed(123)
frame_rows = 100
frame_cols = 3
frame_columns = ['V_'+str(e) for e in list(range(frame_cols+1))]
df=pd.DataFrame()
dupe_rows = 100
dupe_cols = 1000
dupe_columns = ['D_'+str(e) for e in list(range(dupe_cols+1))]
df_dupe=pd.DataFrame()
# rng = range(fra)
# dupe data
for i, col in enumerate(dupe_columns):
    df_dupe[col]=np.sin(np.arange(0,frame_rows/10, 0.1))#*np.random.uniform(low=0.1, high=0.99))
# non-dupe data
for i, col in enumerate(frame_columns):
    df[col]=np.sin(np.arange(0,frame_rows/10, 0.1))*((i+1)/5)
fig = go.Figure()
# calculations for opacity, colors and widths for duped lines
N = len(dupe_columns)
opac = []
colors = []
max_width = 50
widths = []
# colors and widths
for i, col in enumerate(dupe_columns):
    a = (1/N)*(i+1)
    opac.append(a)
    colors.append('rgba(0,0,255, '+str(a)+')')
    #widths2 = N/(i+1)
    widths.append(max_width/(i+1)**(1/2))
# line and colors for duplicated values
fig = go.Figure()
for i, col in enumerate(dupe_columns):
    fig.add_traces(go.Scatter(x=df_dupe.index, y = df_dupe[col], mode = 'lines',
                              # line_color = colors[i],
                              line_color ='rgba(0,0,255, 0.05)',
                              line_width = widths[i]))
# highlight one of the dupe series
fig.add_traces(go.Scatter(x=df_dupe.index, y = df_dupe[col], mode = 'lines',
line_color ='rgb(0,0,255)',
line_width = 3))
# compare dupes to some other series
for i, col in enumerate(frame_columns[-3:]):
    fig.add_traces(go.Scatter(x=df.index, y = df[col], mode = 'lines',
                              # line_color = colors[i],
                              # line_width = widths[i]
                              ))
fig.update_yaxes(range=[-1.3, 1.3])
fig.show()
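Both answers assume you already know which series are duplicates. If you do not, one possible way to find them (not taken from either answer) is to transpose the DataFrame and use pandas' duplicated, then assign descending widths to the duplicated columns so that lines drawn later stay visible on top of earlier ones. The helper stacked_widths and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def stacked_widths(n_lines, max_width=20.0, min_width=2.0):
    """Return n_lines widths from widest to narrowest."""
    if n_lines == 1:
        return [max_width]
    return list(np.linspace(max_width, min_width, n_lines))

# toy data: columns 'a' and 'b' are identical, 'c' is different
t = np.arange(5)
df = pd.DataFrame({"a": t, "b": t, "c": t * 2})

# after transposing, each series is a row, so duplicated() compares whole series
is_dupe = df.T.duplicated(keep=False)
dupe_cols = list(is_dupe[is_dupe].index)

# draw the duplicated series in this order, widest first
widths = dict(zip(dupe_cols, stacked_widths(len(dupe_cols))))
```

Each width can then be passed as line_width when adding the corresponding trace, as in the loops above.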

Bokeh: Unable to generate different line colours when using MultiLine glyph

I've used Bokeh to generate a multiline chart that updates using a slider. I cannot find a way to have each line drawn with a different colour. I've tried using itertools to iterate through a palette, and passing a range of palette colours.
Here's the itertools approach (full_source is there to support the slider interaction which uses CustomJS):
import itertools
from bokeh.plotting import figure
from bokeh.embed import components
from bokeh.models import CustomJS, ColumnDataSource, Slider
from bokeh.palettes import Category20 as palette
from bokeh.models.glyphs import MultiLine
from bokeh.models.widgets import DataTable, TableColumn
from bokeh.layouts import column, row
from bokeh.io import show
data={'xdata':[[0, 1, 2, 4, 5, 6, 10, 11, 12], [4, 8, 16, 0, 13, 21, -3, 9, 21]],
'ydata':[[4, 8, 16, 0, 13, 21, -3, 9, 21], [0, 1, 2, 4, 5, 6, 10, 11, 12]]}
colors=itertools.cycle(palette[2])
source = ColumnDataSource(data)
full_source = ColumnDataSource(data)
glyph = MultiLine(xs='xdata', ys='ydata', line_color = next(colors))
p = figure(title = None, plot_width = 400, plot_height = 400, toolbar_location = None)
p.add_glyph(source, glyph)
print(glyph.line_color)
show(p)
This gives two lines, but both of the same colour. print(glyph.line_color) shows just one color passed - #1f77b4 (which is the first colour in the Category20 palette)
I've also tried using the example found here:
import itertools
from bokeh.plotting import figure
from bokeh.embed import components
from bokeh.models import CustomJS, ColumnDataSource, Slider
from bokeh.palettes import Spectral11
from bokeh.models.glyphs import MultiLine
from bokeh.models.widgets import DataTable, TableColumn
from bokeh.layouts import column, row
from bokeh.io import show
data={'xdata':[[0, 1, 2, 4, 5, 6, 10, 11, 12], [4, 8, 16, 0, 13, 21, -3, 9, 21]],
'ydata':[[4, 8, 16, 0, 13, 21, -3, 9, 21], [0, 1, 2, 4, 5, 6, 10, 11, 12]]}
my_pallet = Spectral11[0:2]
source = ColumnDataSource(data)
full_source = ColumnDataSource(data)
glyph = MultiLine(xs='xdata', ys='ydata', line_color = my_pallet)
p = figure(title = None, plot_width = 400, plot_height = 400, toolbar_location = None)
p.add_glyph(source, glyph)
print(glyph.line_color)
show(p)
This gives:
ValueError expected an element of either String, Dict(Enum('expr', 'field', 'value', 'transform'), Either(String, Instance(Transform), Instance(Expression), Color)) or Color, got ['#5e4fa2', '#3288bd', '#66c2a5']
How can I get multiple colours from a palette into a MultiLine graph?
OK, it looks like I had not been using ColumnDataSource correctly. It works when the colours are added to the ColumnDataSource as an additional key:value pair in the data dict and line_color refers to that column by name. I could also get rid of the MultiLine glyph object.
Working code is:
from bokeh.plotting import figure
from bokeh.embed import components
from bokeh.models import CustomJS, ColumnDataSource, Slider
from bokeh.palettes import Category20 as palette
from bokeh.models.widgets import DataTable, TableColumn
from bokeh.layouts import column, row
from bokeh.io import show
data = {'xs':[[...,...,..,][...,...,...]],'ys':[[...,...,..,][...,...,...]]}
length = len(data['xs'])  # number of lines, not number of keys in the dict
# because Category20 has a minimum of 3 values, and length may be smaller:
# take at least 3 colours and trim a copy instead of mutating the palette
colors = list(palette[max(length, 3)])[:length]
data['color'] = colors
source = ColumnDataSource(data)
full_source = ColumnDataSource(data)
p = figure(title = None, plot_width = 400, plot_height = 400, toolbar_location = None)
# the column names must match the keys of the data dict
p.multi_line(xs='xs', ys='ys', source=source, line_color='color')
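The colour selection can also be written as a small helper that copes with both fewer lines than the smallest palette and more lines than the largest one. This is a sketch only: pick_colors is a hypothetical name, and demo_palette is a stand-in shaped like Bokeh's size-keyed palette dicts so that the example runs without Bokeh installed.

```python
from itertools import cycle, islice

def pick_colors(palette_by_size, n):
    """Return exactly n colours from a palette dict keyed by palette size."""
    sizes = sorted(palette_by_size)
    # smallest predefined palette with at least n colours, else the largest one
    size = next((s for s in sizes if s >= n), sizes[-1])
    # trim if the palette is too long, repeat colours if it is too short
    return list(islice(cycle(palette_by_size[size]), n))

# stand-in for a size-keyed palette such as Category20 (assumed structure)
demo_palette = {
    3: ("#1f77b4", "#aec7e8", "#ff7f0e"),
    4: ("#1f77b4", "#aec7e8", "#ff7f0e", "#ffbb78"),
}

two = pick_colors(demo_palette, 2)
six = pick_colors(demo_palette, 6)
```

With Bokeh, the call would presumably be data['color'] = pick_colors(palette, len(data['xs'])).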

Bokeh not displaying plot for pandas

I can't get Bokeh to display my plot. This is my Python code.
import pandas as pd
from bokeh.plotting import figure, ColumnDataSource
from bokeh.io import output_file, show
if __name__ == '__main__':
    file = 'Overview Data.csv'
    overview_df = pd.read_csv(file)
    overview_ds = ColumnDataSource(overview_df)
    output_file('Wins across Seasons.html')
    print(overview_ds.data)
    p = figure(plot_width=400, plot_height=400)
    # add a circle renderer with a size, color, and alpha
    p.circle('Season', 'Wins', source = overview_ds, size=20, color="navy", alpha=0.5)
    # show the results
    show(p)
I checked my Chrome browser Inspect Element and the console shows the following.
Wins across Seasons.html:17 [bokeh] could not set initial ranges
e.set_initial_range # Wins across Seasons.html:17
This only seems to happen when I am reading from a file. Hard-coding x and y coordinates work.
I have checked other posts but none of the fixes worked. All my packages are up to date.
This is the file I am reading
Season,Matches Played,Wins,Losses,Goals,Goals Conceded,Clean Sheets
2011-12,38,28,5,89,33,20
2010-11,38,23,4,78,37,15
2009-10,38,27,7,86,28,19
2008-09,38,28,4,68,24,24
2007-08,38,27,5,80,22,21
2006-07,38,28,5,83,27,16
This is the output of the print statement.
{'Season': array(['2011-12', '2010-11', '2009-10', '2008-09', '2007-08', '2006-07'],
dtype=object), 'Matches Played': array([38, 38, 38, 38, 38, 38], dtype=int64), 'Wins': array([28, 23, 27, 28, 27, 28], dtype=int64), 'Losses': array([5, 4, 7, 4, 5, 5], dtype=int64), 'Goals': array([89, 78, 86, 68, 80, 83], dtype=int64), 'Goals Conceded': array([33, 37, 28, 24, 22, 27], dtype=int64), 'Clean Sheets': array([20, 15, 19, 24, 21, 16], dtype=int64), 'index': array([0, 1, 2, 3, 4, 5], dtype=int64)}
Bokeh does not know what to do with those string dates unless you tell it. There are two basic possibilities: treat them as categorical factors, or convert them to datetime values.
Keep them as strings, and treat them as categorical factors. You can do that by telling Bokeh what the factors are when you create the plot:
p = figure(plot_width=400, plot_height=400,
x_range=list(overview_df.Season.unique()))
That results in this figure:
If you want a different order of categories you can re-order x_range however you like.
Convert them to real datetime values and use a datetime axis. You can do this by telling Pandas to parse column 0 as a date field:
overview_df = pd.read_csv(file, parse_dates=[0])
and telling Bokeh to use a datetime axis:
p = figure(plot_width=400, plot_height=400, x_axis_type="datetime")
That results in this figure:
You can also convert the 'Season' column to datetime after reading the file to get an output:
overview_df = pd.read_csv(file)
overview_df.Season = pd.to_datetime(overview_df.Season)
overview_ds = ColumnDataSource(overview_df)
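One caveat with both datetime conversions: a string such as '2011-12' is parsed as a year and a month (December 2011), not as the 2011/12 season. If that matters, one option is to keep only the starting year. This is a sketch on the first rows of the file above, and the column name season_start is made up for illustration.

```python
import io
import pandas as pd

csv_text = """Season,Matches Played,Wins,Losses,Goals,Goals Conceded,Clean Sheets
2011-12,38,28,5,89,33,20
2010-11,38,23,4,78,37,15
2009-10,38,27,7,86,28,19
"""
overview_df = pd.read_csv(io.StringIO(csv_text))

# take the first four characters (the starting year) and parse them as a year
overview_df["season_start"] = pd.to_datetime(
    overview_df["Season"].str[:4], format="%Y"
)
```

The new column can then be plotted on a datetime axis in place of 'Season'.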
