Python Bokeh: How do I keep the y-axis tick marks stable? - python

Here I've taken some gapminder.org data and looped through the years to create a series of charts (which I converted to an animated GIF with imageio) by modifying the Making Interactive Visualizations with Bokeh notebook.
The problem is that when the Middle Eastern countries float to the top in the 1970s, the y-axis tick marks (and the legend) get perturbed. I'm keeping as many things as possible out of the year loop when I build the plots, so my y-axis code looks like this:
# Personal income (GDP per capita)
y_low = int(math.floor(income_df.min().min()))
y_high = int(math.ceil(income_df.max().max()))
y_data_range = DataRange1d(y_low-0.5*y_low, 1000000*y_high)
# ...
for year in columns_list:
    # ...
    # Build the plot
    plot = Plot(
        # Children per woman (total fertility)
        x_range=x_data_range,
        # Personal income (GDP per capita)
        y_range=y_data_range,
        y_scale=LogScale(),
        plot_width=800,
        plot_height=400,
        outline_line_color=None,
        toolbar_location=None,
        min_border=20,
    )
    # Build the axes
    xaxis = LinearAxis(ticker=SingleIntervalTicker(interval=x_interval),
                       axis_label="Children per woman (total fertility)",
                       **AXIS_FORMATS)
    yaxis = LogAxis(ticker=LogTicker(),
                    axis_label="Personal income (GDP per capita)",
                    **AXIS_FORMATS)
    plot.add_layout(xaxis, 'below')
    plot.add_layout(yaxis, 'left')
As you can see, I've bumped up the data range by a factor of 10^6 with no effect. Is there some parameter I need to add to keep my y-axis tick marks (and legend) stable?

Don't use a DataRange1d; that is what is actually doing the "auto-ranging". If you know up front the full range that you always want to show, use a Range1d:
Plot(y_range=Range1d(low, high), ...)
or, for convenience, this will also work:
Plot(y_range=(low, high), ...)
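If the bounds are not known in advance, one option is to compute them once, before the year loop, from the data for all years and reuse them for every frame. The following is a minimal sketch, assuming the values are positive (which a log axis requires anyway). The helper name `fixed_log_range` and the sample incomes are hypothetical and not part of the original code:

```python
import math

def fixed_log_range(values):
    """Return (low, high) bounds snapped outward to powers of ten."""
    positive = [v for v in values if v > 0]  # a log axis cannot show values <= 0
    if not positive:
        raise ValueError("need at least one positive value for a log axis")
    low = 10 ** math.floor(math.log10(min(positive)))
    high = 10 ** math.ceil(math.log10(max(positive)))
    return low, high

# Hypothetical incomes pooled across every year of the animation
all_incomes = [312.5, 1480.0, 27650.0, 113000.0]
low, high = fixed_log_range(all_incomes)
```

The resulting pair can then be passed as y_range=(low, high) or Range1d(low, high) when building each plot, so every frame shares the same axis.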

Related

Merge two dataframes and use ONE boxplot to plot them with different colors

I am trying to plot a box plot of the temperature of the 20th Century vs the 21st century.
I want to plot these on one box plot, with the 20th-century temperatures in one color and the 21st-century temperatures in another.
I don't want to have two different box plots. I want to plot it on one box plot to see whether the values of the 21st century are in the outlier range or not.
Also, I want to see the values of individual points in the box plot, but I'm not sure how to do this. I tried Seaborn, but I couldn't get it to show individual values with a different color for each century's data points.
Here is the code to generate values of temperature:
import numpy as np
import pandas as pd
def generate(median=20, err=1, outlier_err=25, size=100, outlier_size=10):
    # bulk of the data: values scattered around the median
    errs = err * np.random.rand(size) * np.random.choice((-5, 5), size)
    data = median + errs
    # a few outliers below and above the bulk
    lower_errs = outlier_err * np.random.rand(outlier_size)
    lower_outliers = median - err - lower_errs
    upper_errs = outlier_err * np.random.rand(outlier_size)
    upper_outliers = median + err + upper_errs
    data = np.round(np.concatenate((data, lower_outliers, upper_outliers)))
    np.random.shuffle(data)
    return data
data = pd.DataFrame(generate(), columns=['temp'])
data['year'] = '20th Century'
I'm not sure I understood exactly what you want, but since you want individually coloured points and just one box, I suggest you try .swarmplot(). Here's how it might look:
import pandas as pd
import seaborn as sns
# generate data for two centuries in a DataFrame (generate() is the function from the question)
data = pd.DataFrame({'20_century': generate(),
                     '21_century': generate()})
# transform from wide to long form to plot individual points in a single swarm
data_long = pd.melt(data, value_vars=['20_century', '21_century'])
# rename columns
data_long.columns = ['century', 'temp']
# since .swarmplot() requires categories on one axis, add one dummy category shared by all rows
data_long['timescale'] = 'century'
# draw a swarmplot with hue to color centuries, dodge=False to plot in one swarm
sns.swarmplot(data=data_long, x='timescale', y='temp', hue='century', dodge=False)
This gives one group of individual points, coloured by century, with the outliers visible.
You might want to try .stripplot() as well:
# added alpha=0.7 for less opacity to better show overlapping points
sns.stripplot(data=data_long, x='timescale', y='temp', hue='century', dodge=False, alpha=0.7)
I personally like this one better.
This is how a boxplot would look, the way I understood your request:
sns.boxplot(data=data_long, x='timescale', y='temp', hue='century', dodge=False)
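To check numerically whether the 21st-century values fall in the outlier range, the limits of that range can also be computed directly. The following is a minimal sketch, assuming the common 1.5 × IQR whisker rule (the default in matplotlib and seaborn box plots). The helper `outlier_fences` and the sample temperatures are hypothetical, and the quartile interpolation may differ slightly from the one the plotting library uses:

```python
import statistics

def outlier_fences(values, whis=1.5):
    """Return the (lower, upper) limits beyond which a box plot draws points as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whis * iqr, q3 + whis * iqr

# Hypothetical sample temperatures, not the output of generate()
temps_20 = [18, 19, 19, 20, 20, 21, 21, 22]
temps_21 = [21, 23, 25, 27]

# Limits derived from the 20th-century data only
lower, upper = outlier_fences(temps_20)
# 21st-century values that would be drawn as outliers against those limits
outside = [t for t in temps_21 if t < lower or t > upper]
print(lower, upper, outside)
```

Here the limits come from the 20th-century values alone; passing the pooled data instead would match a box drawn from both centuries together.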

How to create Plotly benchmarking bar chart with average value as base axis

I want to build a benchmark chart that shows some categories (in my case, companies) on the Y-axis and numeric values for a given measurement, e.g. the daily number of products sold, on the X-axis.
Somewhere in the middle of the X-axis I want a long straight line representing the average of my measurement, which then serves as the base axis. Each category gets a bar that starts (value 0) at the average line and ends at the difference between that category's measurement and the average (category measurement - average).
Below is an example that I drew in PowerPoint. I hope it gives you a quick overview of what I want to achieve.
To be honest, I don't know if this is possible with the Plotly library. My initial idea is to put the average line on the chart as a separate element, then somehow dynamically calculate the start and end of each category's bar and draw it on the chart. But I don't know if that makes sense.
Could you help me with this case?
Assume you have some data,
import pandas as pd
import plotly.graph_objects as go
data = pd.DataFrame({
    'companies': ['Comp1', 'Comp2', 'Comp3', 'Comp4'],
    'performance': [26, 37, 54, 19]
})
# if you want a manual benchmark
benchmark_value = 40
# or, to use the average instead (this overrides the manual value above)
benchmark_value = data['performance'].mean()
# signed distance of each company from the benchmark
data['benchmark_score'] = data['performance'] - benchmark_value
fig = go.Figure(
    go.Bar(
        y=data['companies'],
        x=data['benchmark_score'],
        marker=dict(color=['red' if score < 0 else 'green' for score in data['benchmark_score']]),
        orientation='h',
        text=data['benchmark_score'],
    )
)
# add vertical line at 0 but with benchmark value as annotation
fig.add_vline(x=0, line_width=2, line_dash="dash", annotation_text=str(benchmark_value))
fig.show()
Output:
You can customize attributes such as the bar colors and the text annotation.

How to make the x_labels appear for subplots that are on each other?

I'm learning how to create a dashboard to track my expenses. The first goal is to create a simple bar-plot that tracks my income and expenses for the two biggest categories. The code for the bar-plot is supplied below.
income = 5000
immediate_obligations = -1000
true_expenses = -2000
total = income - immediate_obligations - true_expenses
fig,ax = plt.subplots()
ax.bar(x=[1],height=[income],color = 'green',tick_label="Income")
ax.bar(x=[2,3],height=[immediate_obligations,true_expenses], color = 'red', tick_label=["Immediate Obligations","true_expenses"])
ax.bar(x=[4],height=[total], color = 'blue')
fig.suptitle('Spending Current Month')
The reason I went for three separate bar() calls was to be able to color the income green, the expenses red, and the difference blue. The plots render well because they don't overlap. However, the tick labels only appear for the bars that were created last. It makes sense, but how do I apply the labels to the entire plot?
The argument color does not have to be a single color. You can pass an array with the color of each bar. Therefore your output can be achieved using only one call to bar():
import matplotlib.pyplot as plt
income = 5000
immediate_obligations = -1000
true_expenses = -2000
# the expenses are stored as negative numbers, so the remaining balance is a plain sum
total = income + immediate_obligations + true_expenses
bars = [income, immediate_obligations, true_expenses, total]
colors = ['g','r','r','b']
labels = ['Income','Immediate obligations','True expenses','Total']
fig,ax = plt.subplots()
ax.bar(x=range(len(bars)), height=bars, color=colors, tick_label=labels)
fig.suptitle('Spending Current Month')

X-axis tick labels are too dense when drawing plots with matplotlib

I am drawing matplotlib plots and my x axis consists of YYYYMM formatted strings of year and month like 201901 for January of 2019.
My problem is that some of the data spans a long period of time, which makes the x axis tick labels so dense that they pile up on each other and become unreadable.
I tried making the font smaller and I rotated the labels 90 degrees which helped a lot but it is still not enough for some of my data.
Here is an example of one of my x axis which looks ok:
And here is an example of an x axis which is too dense because the data spans a long period of time:
So I want matplotlib to skip printing some tick labels when they start piling up on each other. For example, print the label for January, skip the labels for February, March, April and May, print the label for June, skip the labels for July, August, etc. But I don't know how to do this.
Or are there any other kind of solutions I can use to overcome this problem?
A quick and dirty solution would be the following:
ax.set_xticks(ax.get_xticks()[::2])
This would only display every second xtick. If you wanted to only display every n-th tick you would use
ax.set_xticks(ax.get_xticks()[::n])
If you don't have a handle on ax you can get one as ax = plt.gca().
Alternatively, you could specify the number of xticks to use with:
plt.locator_params(axis='x', nbins=10)
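For the every-n-th approach, n can also be derived from the number of labels instead of being hard-coded. The following is a minimal sketch in plain Python. The helper `thin_labels`, its `max_labels` parameter and the sample months are hypothetical and only illustrate the idea:

```python
import math

def thin_labels(labels, max_labels=12):
    """Keep every n-th label, with n chosen so at most max_labels remain."""
    step = max(1, math.ceil(len(labels) / max_labels))
    positions = list(range(0, len(labels), step))
    return positions, [labels[i] for i in positions]

# Hypothetical YYYYMM labels covering 2015 through 2019 (60 months)
months = [f"{year}{month:02d}" for year in range(2015, 2020) for month in range(1, 13)]
positions, kept = thin_labels(months, max_labels=12)
```

Assuming the strings are plotted as categories at positions 0, 1, 2, and so on, the result can be applied with ax.set_xticks(positions) followed by ax.set_xticklabels(kept).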
An alternate solution could be as below:
x = df['Date']
y = df['Value']
# Resize the figure (optional)
plt.figure(figsize=(20,5))
# Plot the x and y values on the graph
plt.plot(x, y)
# Here you specify the ticks you want to display
# You can also specify rotation for the tick labels in degrees or with keywords.
plt.xticks(x[::5], rotation='vertical')
# Add margins (padding) so that markers don't get clipped by the axes
plt.margins(0.2)
# Display the graph
plt.show()

Compare stock indices of different sizes Python

I am using Python to try and do some macroeconomic analysis of different stock markets. I was wondering about how to properly compare indices of varying sizes. For instance, the Dow Jones is around 25,000 on the y-axis, while the Russell 2000 is only around 1,500. I know that the website tradingview makes it possible to compare these two in their online charter. What it does is shrink/enlarge a background chart so that it matches the other on a new y-axis. Is there some statistical method where I can do this same thing in Python?
I know that the website tradingview makes it possible to compare these two in their online charter. What it does is shrink/enlarge a background chart so that it matches the other on a new y-axis.
These websites rescale the indices by fixing the starting point of both at, say, 100. For example, if the Dow is at 25000 points and the S&P is at 2500, then the Dow is divided by 250 and the S&P by 25, so that both start at 100. You then have two indices that start at 100 and can be compared side by side.
The other method (which works well only if you have two series) is to put the y-axis for one series on the right-hand side and the y-axis for the other on the left-hand side.
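The rebasing approach, where every index is fixed at 100 at its starting point, can be sketched in a few lines. This is a minimal illustration in plain Python. The helper `rebase` and the index levels are hypothetical, not real market data:

```python
def rebase(values, base=100.0):
    """Scale a series so that its first value equals base."""
    first = values[0]
    if first == 0:
        raise ValueError("cannot rebase a series that starts at zero")
    return [v * base / first for v in values]

# Hypothetical index levels, not real market data
dow = [25000.0, 25500.0, 24750.0]
russell = [1500.0, 1560.0, 1470.0]

dow_rebased = rebase(dow)          # starts at 100
russell_rebased = rebase(russell)  # starts at 100
```

Both rebased series can then be drawn on the same axes. A value of 104 means the index is 4% above its starting level.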
You have multiple possibilities here. Let's say you define your axis by the following call
fig, ax = plt.subplots()
Then, you can change the scale of the y axis to logarithmic using
ax.set_yscale('log')
You can also define two y axes inside the same plot with different scales with the call
ax2 = ax.twinx()
and then plot, let's say, big values on ax and small ones on ax2. That will only work well if you have two ranges of values at most.
Another solution is to create a new axis which zooms inside your plot
from mpl_toolkits.axes_grid1.inset_locator import zoomed_inset_axes
ax2 = zoomed_inset_axes(ax, zoom, bbox_to_anchor=(, ),
bbox_transform=ax.transAxes, loc='', borderpad=)
A last thing would be to directly scale your data. For example, if DowJones varies between 20,000 and 30,000, then you can apply the following transformation
DowJones = (DowJones - min(DowJones)) / (max(DowJones) - min(DowJones))
and then your values will vary between 0 and 1. Applying similar transformations to other variables will then allow you to compare variations more easily on the same graph without making any change to the axes.
