Pandas Plot floating bar chart - python

I am trying to create a bar chart where the upper and lower bound of each bar could be above or below zero. Hence the boxes should "float" depending on the data. I'm also trying to use pandas.plot function as it makes my life way easier in the real application.
The solution I've devised is a horrible kludge and only partially works. Basically I'm running two different bar charts that overlap, with one of the bars being white to "hide" the main bar if necessary. I'm using a mask to mark which bars should be which color. As you can see, this works OK in the "London" and "Paris" example below, but in the "Tokyo" it isn't working because the green bar is "in front" of the white bar.
I could manually fix this a few ways that I can think of, but it would make an already kludgy solution even worse. I'm sure there's a better way that I'm just not smart enough to think of!
Here's the plot, and full code below.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df_data = {'Category':['London', 'Paris', 'New York', 'Tokyo'],
'Upper':[10, 5, 0, -5],
'Lower':[5, -5, -10, -10]}
df = pd.DataFrame(data = df_data)
#Color corrector
u_mask = df['Upper'] < 0
d_mask = df['Lower'] < 0
n = len(df)
uca = ['darkgreen' for i in range(n)]
uca = np.array(uca)
uc = uca.copy()
uc[u_mask] = 'white'
dca = ['white' for i in range(n)]
dca = np.array(dca, dtype=uca.dtype)
dc = dca.copy()
dc[d_mask] = 'darkgreen'
(df.plot(kind='bar', y='Upper', x='Category',
color=uc, legend=False))
ax = plt.gca()
(df.plot(kind='bar', y='Lower', x='Category',
color=dc, legend=False, ax=ax))
plt.axhline(0, color='black')
x_axis = ax.xaxis
x_axis.label.set_visible(False)
plt.subplots_adjust(left=0.1,right=0.90,bottom=0.2,top=0.90)
plt.show()

To create the plot via pandas, you could add an extra column holding the bar height and call df.plot(..., y='Height', bottom=df['Lower']):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df_data = {'Category': ['London', 'Paris', 'New York', 'Tokyo'],
'Upper': [10, 5, 0, -5],
'Lower': [5, -5, -10, -10]}
df = pd.DataFrame(data=df_data)
df['Height'] = df['Upper'] - df['Lower']
ax = df.plot(kind='bar', y='Height', x='Category', bottom=df['Lower'],
color='darkgreen', legend=False)
ax.axhline(0, color='black')
plt.tight_layout()
plt.show()
PS: Note that the pandas bar plot forces the lower ylim to be "sticky". This is the desired behavior when all values are positive and the bars stand firmly on y=0. However, it is distracting when both positive and negative values are involved.
To remove the stickiness:
ax.use_sticky_edges = False # df.plot() makes the lower ylim sticky
ax.autoscale(enable=True, axis='y')

plt.bar has a bottom parameter, so you just need to calculate the heights. Here is a very simple example:
import matplotlib.pyplot as plt
upper = [10, 5, 0, -5]
lower = [5, -5, -10, -10]
# bar height is the distance between the two bounds
height = [u - l for u, l in zip(upper, lower)]
plt.bar(range(len(lower)), height, bottom=lower)
plt.show()
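As a rough sketch of the same bottom/height idea with category labels on the x-axis: the helper name floating_bars is made up for illustration, and the data is the example from the question. It assumes the bounds can be converted to NumPy arrays so the heights come from a single subtraction.

```python
import numpy as np
import matplotlib.pyplot as plt


def floating_bars(ax, labels, lower, upper, **bar_kwargs):
    """Draw one bar per label spanning lower[i]..upper[i] (hypothetical helper)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # bar() wants a starting point and a height, so convert the two bounds
    bars = ax.bar(labels, upper - lower, bottom=lower, **bar_kwargs)
    ax.axhline(0, color="black")
    return bars


fig, ax = plt.subplots()
bars = floating_bars(ax, ["London", "Paris", "New York", "Tokyo"],
                     lower=[5, -5, -10, -10], upper=[10, 5, 0, -5],
                     color="darkgreen")
```

Call plt.show() afterwards to display the figure. Because each bar starts at its own lower bound, no white "masking" bars are needed.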

Related

How to create a wind rose or polar bar plot

I would like to write scouting reports on some football players, and for that I need visualizations, one type of which is pie charts. I need pie charts that look like the one below, with slices of different sizes (proportional to the value each slice represents). Can anyone suggest how to do this, or link to websites where I can learn it?
What you are looking for is called a "Radar Pie Chart". It's analogous to the more commonly used "Radar Chart", but I think it looks better as it highlights the values, rather than focus on meaningless shapes.
The challenge you face with your football dataset is that each category is on a different scale, so you want to plot each value as a percentage of some max. My code will accomplish that, but you'll want to annotate the original values to finish off these charts.
The plot itself can be done with just the standard matplotlib library using polar axes. I borrowed code from here (https://raphaelletseng.medium.com/getting-to-know-matplotlib-and-python-docx-5ee67bad38d2).
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from math import pi
from random import random, seed
seed(12345)
# Generate dataset with 10 rows, different maxes
maxes = [5, 5, 5, 2, 2, 10, 10, 10, 10, 10]
df = pd.DataFrame(
data = {
'categories': ['category_{}'.format(x) for x, _ in enumerate(maxes)],
'scores': [random()*max for max in maxes],
'max_values': maxes,
},
)
df['pct'] = df['scores'] / df['max_values']
df = df.set_index('categories')
# Plot pie radar chart
N = df.shape[0]
theta = np.linspace(0.0, 2*np.pi, N, endpoint=False)
categories = df.index
df['radar_angles'] = theta
ax = plt.subplot(polar=True)
ax.bar(df['radar_angles'], df['pct'], width=2*pi/N, linewidth=2, edgecolor='k', alpha=0.5)
ax.set_xticks(theta)
ax.set_xticklabels(categories)
_ = ax.set_yticklabels([])
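To finish the chart with the original values, as suggested above, one option is to write each unscaled score just beyond the end of its slice with ax.text. This is only a sketch: the category names and numbers below are made up, and the 0.08 label offset is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up scores and per-category maxima, purely for illustration
df = pd.DataFrame(
    {"scores": [3.2, 1.5, 8.0, 0.4], "max_values": [5, 2, 10, 2]},
    index=["passing", "tackles", "shots", "assists"],
)
df["pct"] = df["scores"] / df["max_values"]

n = len(df)
theta = np.linspace(0.0, 2 * np.pi, n, endpoint=False)

ax = plt.subplot(polar=True)
ax.bar(theta, df["pct"], width=2 * np.pi / n, edgecolor="k", alpha=0.5)
ax.set_ylim(0, 1.2)  # headroom so labels above a full slice stay inside the frame
ax.set_xticks(theta)
ax.set_xticklabels(df.index)
ax.set_yticklabels([])

# Write the unscaled score just above the end of each slice
labels = []
for angle, pct, score in zip(theta, df["pct"], df["scores"]):
    labels.append(ax.text(angle, pct + 0.08, f"{score:g}",
                          ha="center", va="center"))
```

In polar axes the first two arguments of ax.text are the angle (in radians) and the radius, so the labels follow the slices without any manual trigonometry.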
I have previously worked with rose (polar bar) charts. Here is an example:
import plotly.express as px
df = px.data.wind()
fig = px.bar_polar(df, r="frequency", theta="direction",
color="strength", template="plotly_dark",
color_discrete_sequence= px.colors.sequential.Plasma_r)
fig.show()

Is it possible to have a given number (n>2) of y-axes in matplotlib?

prices = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
columns=['a', 'b', 'c'])
I have my prices dataframe, and it currently has 3 columns. But at other times, it could have more or fewer columns. Is there a way to use some sort of twinx() loop to create a line-chart of all the different timeseries with a (potentially) infinite number of y-axes?
I tried the double for loop below, but I got a TypeError: 'AxesSubplot' object does not support item assignment
# for i in range(0,len(prices.columns)):
# for column in list(prices.columns):
# fig, ax[i] = plt.subplots()
# ax[i].set_xlabel(prices.index())
# ax[i].set_ylabel(column[i])
# ax[i].plot(prices.Date, prices[column])
# ax[i].tick_params(axis ='y')
#
# ax[i+1] = ax[i].twinx()
# ax[i+1].set_ylabel(column[i+1])
# ax[i+1].plot(prices.Date, column[i+1])
# ax[i+1].tick_params(axis ='y')
#
# fig.suptitle('matplotlib.pyplot.twinx() function \ Example\n\n', fontweight ="bold")
# plt.show()
# =============================================================================
I believe I understand why I got the error - the ax object does not allow the assignment of the i variable. I'm hoping there is some ingenious way to accomplish this.
It turned out that the main problem was mixing the pandas plotting functions with matplotlib, which led to a duplication of the axes. Otherwise, the implementation is rather straightforward, adapted from this matplotlib example.
from mpl_toolkits.axes_grid1 import host_subplot
import mpl_toolkits.axisartist as AA
from matplotlib import pyplot as plt
from itertools import cycle
import pandas as pd
#fake data creation with different spread for different axes
#this entire block can be deleted if you import your df
import numpy as np
fakencol=5
fakenrow=7
np.random.seed(20200916)
#generated column names; the original used the private helper pandas._testing.rands_array, which may not be available in every pandas version
df = pd.DataFrame(np.random.randint(1, 10, fakenrow*fakencol).reshape(fakenrow, fakencol), columns=[f"s{k}" for k in range(fakencol)])
df = df.multiply(np.power(np.asarray([10]), np.arange(fakencol)))
df.index = pd.date_range("20200916", periods=fakenrow)
#defining a color scheme with unique colors
#if you want to include more than 20 axes, well, what can I say
sc_color = cycle(plt.cm.tab20.colors)
#defining the size of the figure in relation to the number of dataframe columns
#might need adjustment for optimal data presentation
offset = 60
plt.rcParams['figure.figsize'] = 10+df.shape[1], 5
#host figure and first plot
host = host_subplot(111, axes_class=AA.Axes)
h, = host.plot(df.index, df.iloc[:, 0], c=next(sc_color), label=df.columns[0])
host.set_ylabel(df.columns[0])
host.axis["left"].label.set_color(h.get_color())
host.set_xlabel("time")
#plotting the rest of the axes
for i, cols in enumerate(df.columns[1:]):
    curr_ax = host.twinx()
    new_fixed_axis = curr_ax.get_grid_helper().new_fixed_axis
    curr_ax.axis["right"] = new_fixed_axis(loc="right",
                                           axes=curr_ax,
                                           offset=(offset*i, 0))
    curr_p, = curr_ax.plot(df.index, df[cols], c=next(sc_color), label=cols)
    curr_ax.axis["right"].label.set_color(curr_p.get_color())
    curr_ax.set_ylabel(cols)
    curr_ax.yaxis.label.set_color(curr_p.get_color())
plt.legend()
plt.tight_layout()
plt.show()
Coming to think of it - it would probably have been better to distribute the axes equally to the left and the right of the plot. Oh, well.
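For what it's worth, here is a rough sketch of that left/right distribution. It is not the host_subplot approach used above: it relies only on plain twinx() and shifts each extra spine outward. The function name plot_multi_axes and the 60-point spacing are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from itertools import cycle


def plot_multi_axes(df, spacing=60):
    """One y-axis per column, extra axes alternating right/left (a sketch)."""
    colors = cycle(plt.cm.tab10.colors)
    fig, host = plt.subplots(figsize=(8 + df.shape[1], 5))
    axes = [host]
    for k, col in enumerate(df.columns):
        if k == 0:
            ax = host  # first column uses the ordinary left axis
        else:
            ax = host.twinx()
            side = "right" if k % 2 else "left"
            # k=1 sits on the right frame edge; later axes step outward in pairs
            ax.spines[side].set_position(("outward", spacing * (k // 2)))
            ax.spines[side].set_visible(True)
            ax.yaxis.set_label_position(side)
            ax.yaxis.set_ticks_position(side)
            axes.append(ax)
        color = next(colors)
        ax.plot(df.index, df[col], color=color, label=col)
        ax.set_ylabel(col, color=color)
        ax.tick_params(axis="y", colors=color)
    fig.tight_layout()
    return fig, axes


# Small made-up frame in the spirit of the question's `prices`
prices = pd.DataFrame(np.array([[1, 20, 300], [4, 50, 600], [7, 80, 900]]),
                      columns=["a", "b", "c"])
fig, axes = plot_multi_axes(prices)
```

Call plt.show() to display the result. The number of axes follows the number of columns, so the same call works for wider or narrower frames.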

Why does matplotlib.ticker.MaxNLocator(prune='both') not prune my tick labels?

I am trying to make a plot including a number of subplots, and I want them to share their axes. I have been trying to use matplotlib.ticker.MaxNLocator to prune the labels on my ticks, so the figure has readable axes. However, regardless of whether I use prune='upper', 'lower', or 'both', I end up with labels which overlap each other, as shown in the image below:
An ever so slightly simplified version of the code I am using (although still fairly long, sorry) is below:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.ticker as tck
chains = np.array([[-.4, 4, 0, 0, 0], [6, 19, 2000, 2.1e16, 6.1e13]])
pars = np.array([r'$SF_\infty$', r'$\tau_D$', r'$\tau$', 'width', 'scale'])
nplots = len(chains[0,:]) - 1
# Fix up matplotlib to give plots like you like
mpl.rcParams.update({'font.size': 6, 'font.family': 'serif', 'mathtext.fontset': 'cm', 'mathtext.rm': 'serif',
'lines.linewidth': .7, 'xtick.top': True, 'ytick.right': True})
# Make a plot
fig = plt.figure()
for i in range(1,nplots+1):
    for j in range(nplots):
        if (j<i):
            ax = plt.subplot(nplots, nplots, (i-1)*nplots+j+1)
            plt.plot(chains[ :, j ], chains[ :, i ], '.-', markersize=0.3, alpha=0.5)
            # Set aspect
            xlim, ylim = ax.get_xlim(), ax.get_ylim()
            ax.axis([xlim[0], xlim[1], ylim[0], ylim[1]])
            ax.set_aspect( float(xlim[1]-xlim[0]) / (ylim[1]-ylim[0]) )
            # Print things around the edges
            if (j == 0): plt.ylabel(pars[i])
            if (i == nplots): plt.xlabel(pars[j])
            if (j != 0): ax.tick_params(labelleft=False)
            if (i != nplots): ax.tick_params(labelbottom=False)
            if (j != i-1):
                ax.get_xaxis().get_offset_text().set_visible(False)
                ax.get_yaxis().get_offset_text().set_visible(False)
            # End if-statements
            # Fix tickers
            ax.minorticks_on()
            ax.tick_params(which='both', direction='inout', width=0.5)
            ax.xaxis.set_major_locator(tck.MaxNLocator(nbins=5, prune='both'))
            ax.yaxis.set_major_locator(tck.MaxNLocator(nbins=5, prune='both'))
# Saving file
plt.subplots_adjust(hspace=0, wspace=-.58)
plt.savefig('testing.png', dpi=400, bbox_inches='tight')
plt.close('all')
What am I misunderstanding in my usage of the MaxNLocator function? I am using Matplotlib 2.0.0.
(Also, obviously, any comments on how to improve this plot for readability and decrease the amount of hard-coding are much appreciated!)

Python Matplotlib polar Labeling

Hi, I'm currently trying to label my polar bar chart so that the labels are each rotated by a different amount and can be read easily, much like the numbers on a clock. I know there is a rotation option in plt.xlabel, however this will only rotate by a single amount; I have many values and would like them not to cross my graph.
This is roughly what my graph looks like, with all the labels oriented the same way; however, I would like something akin to this. I really need this using just matplotlib and pandas if possible. Thanks in advance for the help!
Some example names might be farming, generalists, food and drink; if these are not correctly rotated they will overlap the graph and be difficult to read.
from pandas import DataFrame,Series
import pandas as pd
import matplotlib.pylab as plt
from pylab import *
import numpy as np
data = pd.read_csv('/.../data.csv')
data=DataFrame(data)
N = len(data)
data1=DataFrame(data,columns=['X'])
data1=data1.get_values()
plt.figure(figsize=(8,8))
ax = plt.subplot(projection='polar')
plt.xlabel("AAs",fontsize=24)
ax.set_theta_zero_location("N")
bars = ax.bar(theta, data1,width=width, bottom=0.0,color=colours)
I would then like to label the bars according to their names, which I can obtain in a list. However, there are a number of values and I would like to be able to read the data names.
The very meager beginnings of an answer for you (I was doing something similar, so I just threw a quick hack to go in the right direction):
# The number of labels you'd like
In [521]: N = 5
# Where on the circle it will show up
In [522]: theta = numpy.linspace(0., 2 * numpy.pi, N + 1, endpoint = True)
In [523]: theta = theta[1:]
# Create the figure
In [524]: fig = plt.figure(figsize = (6,6), facecolor = 'white', edgecolor = None)
# Create the axis, notice polar = True
In [525]: ax = plt.subplot2grid((1, 1), (0,0), polar = True)
# Create white bars so you're really just focusing on the labels
In [526]: ax.bar(theta, numpy.ones_like(theta), align = 'center',
     ...:        color = 'white', edgecolor = 'white')
# Create the text you're looking to add, here I just use numbers from counter = 1 to N
In [527]: counter = 1
In [528]: for t, o in zip(theta, numpy.ones_like(theta)):
     ...:     ax.text(t, 1 - .1, counter, horizontalalignment = 'center', verticalalignment = 'center', rotation = t * 100)
     ...:     counter += 1
In [529]: ax.set_yticklabels([])
In [530]: ax.set_xticklabels([])
In [531]: ax.grid(False)
In [531]: plt.show()
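Building on the hack above, here is a hedged sketch that derives each label's rotation from its angle in degrees and flips the labels on the left half so none are upside down. The helper readable_rotation, the label names beyond those mentioned in the question, and the values are placeholders. It assumes the default polar orientation (zero at east, counter-clockwise), so it would need adjusting if set_theta_zero_location("N") is used.

```python
import numpy as np
import matplotlib.pyplot as plt


def readable_rotation(theta):
    """Rotation in degrees for a label at polar angle theta (radians).

    Assumes the default polar orientation (zero at east, counter-clockwise).
    """
    deg = np.degrees(theta) % 360
    # Flip labels on the left half of the circle so they are not upside down
    return deg - 180 if 90 < deg < 270 else deg


names = ["farming", "generalists", "food and drink", "retail", "transport"]  # placeholders
values = [4, 7, 3, 5, 6]  # placeholder data
theta = np.linspace(0.0, 2 * np.pi, len(names), endpoint=False)

fig = plt.figure(figsize=(6, 6))
ax = plt.subplot(polar=True)
ax.bar(theta, values, width=2 * np.pi / len(names), alpha=0.5, edgecolor="k")
ax.set_xticks(theta)
ax.set_xticklabels([])  # hide the default horizontal labels

# Place each name just outside the tallest bar, rotated along its own radius
texts = [
    ax.text(t, max(values) * 1.15, name, ha="center", va="center",
            rotation=readable_rotation(t), rotation_mode="anchor")
    for t, name in zip(theta, names)
]
```

Call plt.show() to view it. Text rotation in matplotlib is given in degrees, which is why the angle is converted instead of being scaled by an arbitrary factor.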

matplotlib: drawing lines between points ignoring missing data

I have a set of data which I want plotted as a line-graph. For each series, some data is missing (but different for each series). Currently matplotlib does not draw lines which skip missing data: for example
import matplotlib.pyplot as plt
xs = range(8)
series1 = [1, 3, 3, None, None, 5, 8, 9]
series2 = [2, None, 5, None, 4, None, 3, 2]
plt.plot(xs, series1, linestyle='-', marker='o')
plt.plot(xs, series2, linestyle='-', marker='o')
plt.show()
results in a plot with gaps in the lines. How can I tell matplotlib to draw lines through the gaps? (I'd rather not have to interpolate the data).
You can mask the NaN values this way:
import numpy as np
import matplotlib.pyplot as plt
xs = np.arange(8)
series1 = np.array([1, 3, 3, None, None, 5, 8, 9]).astype(np.double)
s1mask = np.isfinite(series1)
series2 = np.array([2, None, 5, None, 4, None, 3, 2]).astype(np.double)
s2mask = np.isfinite(series2)
plt.plot(xs[s1mask], series1[s1mask], linestyle='-', marker='o')
plt.plot(xs[s2mask], series2[s2mask], linestyle='-', marker='o')
plt.show()
This leads to
Quoting @Rutger Kassies (link):
Matplotlib only draws a line between consecutive (valid) data points,
and leaves a gap at NaN values.
A solution if you are using pandas:
#pd.Series
s.dropna().plot() #masking (as @Thorsten Kranz suggests)
#pd.DataFrame
df['a_col_ffill'] = df['a_col'].ffill()
df['b_col_ffill'] = df['b_col'].ffill() # changed from a to b
df[['a_col_ffill','b_col_ffill']].plot()
A solution with pandas:
import matplotlib.pyplot as plt
import pandas as pd
def splitSerToArr(ser):
    # as_matrix() was removed in pandas 1.0; to_numpy() is the replacement
    return [ser.index, ser.to_numpy()]
xs = range(8)
series1 = [1, 3, 3, None, None, 5, 8, 9]
series2 = [2, None, 5, None, 4, None, 3, 2]
s1 = pd.Series(series1, index=xs)
s2 = pd.Series(series2, index=xs)
plt.plot( *splitSerToArr(s1.dropna()), linestyle='-', marker='o')
plt.plot( *splitSerToArr(s2.dropna()), linestyle='-', marker='o')
plt.show()
The splitSerToArr function is very handy, when plotting in Pandas. This is the output:
Without interpolation you'll need to remove the None's from the data. This also means you'll need to remove the X-values corresponding to None's in the series. Here's an (ugly) one liner for doing that:
x1Clean,series1Clean = zip(* filter( lambda x: x[1] is not None , zip(xs,series1) ))
The lambda function returns False for None values, filtering the x,series pairs from the list, it then re-zips the data back into its original form.
For what it may be worth, after some trial and error I would like to add one clarification to Thorsten's solution. Hopefully saving time for users who looked elsewhere after having tried this approach.
I was unable to get success with an identical problem while using
from pyplot import *
and attempting to plot with
plot(abscissa[mask],ordinate[mask])
It seemed it was required to use import matplotlib.pyplot as plt to get the proper NaNs handling, though I cannot say why.
Another solution for pandas DataFrames:
plot = df.plot(style='o-') # draw the lines so they appear in the legend
colors = [line.get_color() for line in plot.lines] # get the colors of the markers
df = df.interpolate(limit_area='inside') # interpolate
lines = plot.plot(df.index, df.values) # add more lines (with a new set of colors)
for color, line in zip(colors, lines):
    line.set_color(color) # overwrite the new lines' colors with the same colors as the old lines
I had the same problem, but masking removed the points in between and the line was still broken (the pink lines in the picture were the only runs of consecutive non-NaN data, which is why those segments were drawn). Here is the result of masking the data (still with gaps):
xs = df['time'].to_numpy()
series1 = np.array(df['zz'].to_numpy()).astype(np.double)
s1mask = np.isfinite(series1)
fplt.plot(xs[s1mask], series1[s1mask], ax=ax_candle, color='#FF00FF', width = 1, legend='ZZ')
Maybe this is because I was using finplot (to plot a candle chart), so I decided to generate the missing Y-axis points with the linear formula y2-y1=m(x2-x1), and then wrote a function that generates the Y values between the missing points.
def fillYLine(y):
    # Fill runs of NaN between valid points using the line formula
    fi = 0          # index of the last valid point
    first = None    # value of the last valid point
    next = None     # value of the current valid point
    for i in range(0, len(y), 1):
        ne = not(np.isnan(y[i]))
        next = y[i] if ne else next
        if not(next is None):
            if not(first is None):
                m = (first-next)/(i-fi) # m = (y1 - y2) / (x1 - x2)
                cant_points = np.abs(i-fi)-1 # number of missing points in the gap
                if (cant_points) > 0:
                    # Create the line between the two valid points to generate the missing values
                    points = createLine(next, first, i, fi, cant_points)
                    x = 1
                    for p in points:
                        y[fi+x] = p
                        x = x + 1
            first = next
            fi = i
        next = None
    return y
def createLine(y2, y1, x2, x1, cant_points):
    m = (y2-y1)/(x2-x1) # slope
    points = []
    x = x1 + 1 # first point to assign
    for i in range(0, cant_points, 1):
        y = ((m*(x2-x))-y2)*-1
        points.append(y)
        x = x + 1 # the line uses integer positions, not the time values, but in the same order
    return points
Then I simply call the function to fill the gaps, like y = fillYLine(y), and my finplot call looked like:
x = df['time'].to_numpy()
y = df['zz'].to_numpy()
y = fillYLine(y)
fplt.plot(x, y, ax=ax_candle, color='#FF00FF', width = 1, legend='ZZ')
Keep in mind that the data in the Y variable is only for the plot; I need the NaN values in between for other operations (or to remove them from the list), which is why I created a Y variable from the pandas dataset df['zz'].
Note: I noticed that the data is eliminated in my case because if I don't mask X (xs) the values slide left in the graph; in this case they become consecutive non-NaN values and it draws a continuous line, but shrunk to the left:
fplt.plot(xs, series1[s1mask], ax=ax_candle, color='#FF00FF', width = 1, legend='ZZ') #No xs masking (xs[masking])
This made me think that the mask works for some people because they are only plotting that one line, or because there is no great difference between the non-masked and masked data (few gaps, unlike my data, which has a lot).
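The gap-filling in fillYLine is linear interpolation between the neighbouring valid points, using the array position as x. As a sketch of a shorter route to a similar result, NumPy's np.interp can do the same. The function name fill_gaps_linear is made up here, and it assumes y is numeric with NaN marking the missing points. Like fillYLine, it leaves leading and trailing NaNs alone.

```python
import numpy as np


def fill_gaps_linear(y):
    """Return a copy of y with interior NaN runs filled by linear interpolation."""
    y = np.asarray(y, dtype=float)
    idx = np.arange(len(y))
    valid = np.isfinite(y)
    filled = y.copy()
    if valid.sum() >= 2:
        first, last = idx[valid][0], idx[valid][-1]
        inside = (idx >= first) & (idx <= last) & ~valid
        # Interpolate only between the first and last valid points,
        # leaving leading and trailing NaNs untouched
        filled[inside] = np.interp(idx[inside], idx[valid], y[valid])
    return filled


series1 = [1, 3, 3, np.nan, np.nan, 5, 8, 9]
print(fill_gaps_linear(series1))
```

Unlike fillYLine, this returns a new array instead of modifying its input, so the original NaN values remain available for other calculations.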
Perhaps I missed the point, but I believe Pandas now does this automatically. The example below is a little involved, and requires internet access, but the line for China has lots of gaps in the early years, hence the straight line segments.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# read data from Maddison project
url = 'http://www.ggdc.net/maddison/maddison-project/data/mpd_2013-01.xlsx'
mpd = pd.read_excel(url, skiprows=2, index_col=0, na_values=[' '])
mpd.columns = map(str.rstrip, mpd.columns)
# select countries
countries = ['England/GB/UK', 'USA', 'Japan', 'China', 'India', 'Argentina']
mpd = mpd[countries].dropna()
mpd = mpd.rename(columns={'England/GB/UK': 'UK'})
mpd = np.log(mpd)/np.log(2) # convert to log2
# plots
ax = mpd.plot(lw=2)
ax.set_title('GDP per person', fontsize=14, loc='left')
ax.set_ylabel('GDP Per Capita (1990 USD, log2 scale)')
ax.legend(loc='upper left', fontsize=10, handlelength=2, labelspacing=0.15)
fig = ax.get_figure()
fig.show()
