Please forgive the crude explanation but I'm unsure how to describe the issue and as they say, a picture says a thousand words, so what I am trying to achieve is to draw a graph in matplotlib that looks like the below:
whereby the scale of the color range is the same across all bars as the x limits of the x-axis.
The closest I have got to so far is this (please ignore the fact it's not horizontal - I was planning on editing that once I had figured out the coloring):
fig, ax = plt.subplots()
mpl.pyplot.viridis()
bars = ax.bar(df['Profile'], df['noise_result'])
grad = np.atleast_2d(np.linspace(0,1,256)).T
ax = bars[0].axes
lim = ax.get_xlim()+ax.get_ylim()
for bar in bars:
    bar.set_zorder(1)
    bar.set_facecolor('none')
    x,y = bar.get_xy()
    w, h = bar.get_width(), bar.get_height()
    ax.imshow(grad, extent=[x,x+w,y,y+h], aspect='auto', zorder=1,interpolation='nearest')
ax.axis(lim)
which only results in a graph like below:
Many thanks
I'm going along with your approach. The idea is to:
choose an appropriate colormap
create a normalizer for the bar values.
create a mappable which is going to map the normalized values to the colormap in order to create a colorbar.
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from matplotlib.colors import Normalize
import pandas as pd
import numpy as np
df = pd.DataFrame({'key':['A', 'B', 'C', 'D', 'E'], 'val':[100, 20, 70, 40, 100]})
# create a normalizer
norm = Normalize(vmin=df['val'].min(), vmax=df['val'].max())
# choose a colormap
cmap = cm.plasma
# map values to a colorbar
mappable = cm.ScalarMappable(norm=norm, cmap=cmap)
mappable.set_array(df['val'])
fig, ax = plt.subplots()
bars = ax.bar(df['key'], df['val'])
ax = bars[0].axes
lim = ax.get_xlim()+ax.get_ylim()
for bar, val in zip(bars, df['val']):
    grad = np.atleast_2d(np.linspace(0,val,256)).T
    bar.set_zorder(1)
    bar.set_facecolor('none')
    x, y = bar.get_xy()
    w, h = bar.get_width(), bar.get_height()
    ax.imshow(np.flip(grad), extent=[x,x+w,y,y+h], aspect='auto', zorder=1,interpolation='nearest', cmap=cmap, norm=norm)
ax.axis(lim)
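# pass ax explicitly: recent matplotlib versions cannot infer it for a standalone ScalarMappable
cb = fig.colorbar(mappable, ax=ax)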
cb.set_label("Values")
Using what you have, you could change line 12 to:
ax.imshow(grad, extent=[x,x+w,y,y+h], aspect='auto', zorder=1, cmap = plt.get_cmap('gist_heat_r'))
or some other color map from:
https://matplotlib.org/stable/tutorials/colors/colormaps.html
You could also change line 3 to start as:
bars = ax.barh
for horizontal bars.
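Putting the two suggestions together, here is a minimal sketch of the horizontal version, where one colour scale is shared by all bars and follows the x-axis. The labels and values are made up to stand in for df['Profile'] and df['noise_result'], the bars are placed at numeric positions rather than categories, and the Agg backend line is only there so the sketch runs without a display. Treat it as one possible approach, not the definitive one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

# Hypothetical data standing in for df['Profile'] and df['noise_result']
labels = ['A', 'B', 'C']
values = np.array([100.0, 20.0, 70.0])

norm = Normalize(vmin=0, vmax=values.max())  # one colour scale shared by all bars
cmap = plt.get_cmap('viridis')

fig, ax = plt.subplots()
bars = ax.barh(np.arange(len(labels)), values, tick_label=labels)
lim = ax.get_xlim() + ax.get_ylim()          # remember the limits; imshow changes them

for bar, val in zip(bars, values):
    bar.set_facecolor('none')                # make the bar itself transparent
    x, y = bar.get_xy()
    w, h = bar.get_width(), bar.get_height()
    # a single row of data values running from 0 up to the bar's length
    grad = np.atleast_2d(np.linspace(0, val, 256))
    ax.imshow(grad, extent=[x, x + w, y, y + h], aspect='auto',
              cmap=cmap, norm=norm, zorder=0)

ax.axis(lim)                                 # restore the original limits
fig.colorbar(ScalarMappable(norm=norm, cmap=cmap), ax=ax, label='Values')
```

Because the gradient array holds data values (0 to val) and every image uses the same norm, a given x position has the same colour in every bar. Call plt.show() or fig.savefig(...) to see the result.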
I have a parallel coordinates plot with lots of data points so I'm trying to use a continuous colour bar to represent that, which I think I have worked out. However, I haven't been able to remove the default key that is put in when creating the plot, which is very long and hinders readability. Is there a way to remove this table to make the graph much easier to read?
This is the code I'm currently using to generate the parallel coordinates plot:
parallel_coordinates(data[[' male_le', 'female_le', 'diet', 'activity', 'obese_perc', 'median_income']],
                     'median_income', colormap='rainbow', alpha=0.5)
fig, ax = plt.subplots(figsize=(6, 1))
fig.subplots_adjust(bottom=0.5)
cmap = mpl.cm.rainbow
bounds = [0.00,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]
norm = mpl.colors.BoundaryNorm(bounds, cmap.N,)
plt.colorbar(mpl.cm.ScalarMappable(norm = norm, cmap=cmap),cax = ax, orientation = 'horizontal',
label = 'normalised median income', alpha = 0.5)
plt.show()
Current Output:
I want my legend to be represented as a color bar, like this:
Any help would be greatly appreciated. Thanks.
You can use ax.legend_.remove() to remove the legend.
The cax parameter of plt.colorbar names the subplot into which the colorbar is drawn. If you leave it out, matplotlib creates a new subplot for the colorbar, "stealing" space from an existing subplot (subplots are usually referred to as ax in matplotlib). So here, dropping cax and passing ax=ax instead creates the desired colorbar. Older matplotlib versions fell back to the current subplot when ax was omitted, but recent versions require ax when the ScalarMappable is not attached to any plot.
The code below uses seaborn's penguin dataset to create a standalone example.
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
import numpy as np
from pandas.plotting import parallel_coordinates
penguins = sns.load_dataset('penguins')
fig, ax = plt.subplots(figsize=(10, 4))
cmap = plt.get_cmap('rainbow')
bounds = np.arange(penguins['body_mass_g'].min(), penguins['body_mass_g'].max() + 200, 200)
norm = mpl.colors.BoundaryNorm(bounds, 256)
penguins = penguins.dropna(subset=['body_mass_g'])
parallel_coordinates(penguins[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']],
                     'body_mass_g', colormap=cmap, alpha=0.5, ax=ax)
ax.legend_.remove()
plt.colorbar(mpl.cm.ScalarMappable(norm=norm, cmap=cmap),
             ax=ax, orientation='horizontal', label='body mass', alpha=0.5)
plt.show()
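To make the difference between cax and ax concrete, here is a minimal sketch. The ScalarMappable and the two figure layouts are made up for illustration (they are not the asker's data), and the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

sm = ScalarMappable(norm=Normalize(0, 1), cmap='rainbow')

# ax=...: matplotlib creates a new axes for the colorbar and shrinks ax1 to make room
fig1, ax1 = plt.subplots()
fig1.colorbar(sm, ax=ax1, orientation='horizontal')

# cax=...: the colorbar is drawn into an axes that already exists (cax2 here)
fig2, (ax2, cax2) = plt.subplots(nrows=2, gridspec_kw={'height_ratios': [10, 1]})
fig2.colorbar(sm, cax=cax2, orientation='horizontal')

print(len(fig1.axes), len(fig2.axes))  # 2 2
```

In the first figure the second axes is the one matplotlib added. In the second figure both axes were created up front, and the colorbar simply fills the one passed as cax.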
I am trying to create a self-updating chart that displays a horizontal line and colors its bars based on a y-axis value of interest. Bars might be colored red if they are definitely above this value (given a 95% confidence interval), blue if they are definitely below it, or white if they contain it. Something similar to this:
The problem is that I can't display the colorbar on my plot. I managed to color each bar based on a LinearSegmentedColormap and some conditions, but I can't manage to display this colorbar on my image.
This is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from math import sqrt
import matplotlib.axes
from matplotlib import cm
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
from matplotlib.cm import ScalarMappable
np.random.seed(12345)
df = pd.DataFrame([np.random.normal(32000,200000,3650),
                   np.random.normal(43000,100000,3650),
                   np.random.normal(43500,140000,3650),
                   np.random.normal(48000,70000,3650)],
                  index=[1992,1993,1994,1995])
means = []
for i in df.index:
    means.append(df.loc[i].mean())
std = []
for i in df.index:
    std.append(df.loc[i].std())
# compute the 95% confidence intervals
conf = []
for i in range(len(means)):
    margin = (1.96*std[i])/sqrt(len(df.columns))
    conf.append(margin)
fig, ax = plt.subplots(1)
bars = plt.bar(df.index, means, yerr= conf, tick_label = df.index, capsize = 10)
#Setup the plot
yinterest = 43000
plt.gca().spines.get('top').set_visible(False)
plt.gca().spines.get('right').set_visible(False)
plt.axhline(yinterest, color = 'black', label = '43000')
#setting the y-interest tick
plt.draw()
labels = [w.get_text() for w in ax.get_yticklabels()]
locs=list(ax.get_yticks())
labels+=[str(yinterest)]
locs+=[float(yinterest)]
ax.set_yticklabels(labels)
ax.set_yticks(locs)
plt.draw()
#setting up the colormap
colormap = plt.get_cmap('RdBu', 10)  # cm.get_cmap was removed in matplotlib 3.9
colores = []
for i in range(len(means)):
    color = (yinterest-(means[i]-conf[i]))/((means[i]+conf[i])-(means[i]-conf[i]))
    bars[i].set_color(colormap(color))
I am fairly new to python (or programming for that matter) and I have searched everywhere for a solution but to no avail. Any help would be appreciated.
Greetings.
The first hint is to use pandas methods to compute the plot data
(much more concise):
means = df.mean(axis=1)
std = df.std(axis=1)
conf = (std * 1.96 / sqrt(df.shape[1]))
And to draw your plot, run:
yinterest = 39541
fig, ax = plt.subplots(figsize=(10,6))
ax.spines.get('top').set_visible(False)
ax.spines.get('right').set_visible(False)
colors = (yinterest - (means - conf)) / (2 * conf)
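colormap = plt.get_cmap('RdBu', 10)  # plt.cm.get_cmap was removed in matplotlib 3.9
plt.bar(df.index, means, yerr=conf, tick_label=df.index, capsize=10, color=colormap(colors))
# pass ax explicitly: recent matplotlib versions cannot infer it for a standalone ScalarMappable
cbar = plt.colorbar(plt.cm.ScalarMappable(cmap=colormap), ax=ax, orientation='horizontal')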
cbar.set_label('Color', labelpad=5)
plt.axhline(yinterest, color='black', linestyle='--', linewidth=1)
plt.show()
One trick avoids colouring the bars after they have been
created: I compute colors first, the colormap converts them to
RGBA values, and those are passed straight to plt.bar.
To draw the color bar, use plt.colorbar.
I changed the value of yinterest to that included in your picture and got
something similar to your picture, but with a color bar:
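For anyone wondering what the colors expression actually computes, here is a small numeric sketch. The helper name interval_position and the sample means and margins are made up for illustration. The value is the fraction of the way yinterest sits through each bar's confidence interval; I clip it to [0, 1] explicitly here, which is also what the colormap effectively does with out-of-range floats by default.

```python
import numpy as np

def interval_position(yinterest, mean, margin):
    """Position of yinterest inside [mean - margin, mean + margin], clipped to [0, 1]."""
    frac = (yinterest - (mean - margin)) / (2 * margin)
    return np.clip(frac, 0.0, 1.0)

# made-up bar means and 95% margins, not the data from the question
means = np.array([30000.0, 43000.0, 50000.0])
margin = np.array([5000.0, 5000.0, 5000.0])

print(interval_position(43000, means, margin))  # [1.  0.5 0. ]
```

With 'RdBu', 0 maps to the red end and 1 to the blue end, so a bar whose whole interval lies above yinterest comes out red, one whose interval lies below comes out blue, and one centred on yinterest lands in the pale middle of the map.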
I am using matplotlib to make some plots and I have run into a few difficulties that I need help with.
problem 1) In order to keep a consistent colorscheme I need to only use half of the color axis. There are only positive values, so I want the zero values to be green, the mid values to be yellow and the highest values to be red. The color scheme that most closely matches this is gist_rainbow_r, but I only want the top half of it.
problem 2) I can't seem to figure out how to get the colorbar on the right hand side of the plot to show up or how to get it to let me label the axes.
If it helps, I am using the latest version of Anaconda with the latest version of matplotlib.
cmap = plt.get_cmap('gist_rainbow_r')
edosfig2 = plt.figure(2)
edossub2 = edosfig2.add_subplot(1,1,1)
edossub2 = plt.contourf(eVec,kints,smallEDOS,cmap=cmap)
edosfig2.show()
If you have a specific set of colors that you want to use for your colormap, you can build it based on those. For example:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
cmap = LinearSegmentedColormap.from_list('name', ['green', 'yellow', 'red'])
# Generate some data similar to yours
y, x = np.mgrid[-200:1900, -300:2000]
z = np.cos(np.hypot(x, y) / 100) + 1
fig, ax = plt.subplots()
cax = ax.contourf(x, y, z, cmap=cmap)
cbar = fig.colorbar(cax)
cbar.set_label('Z-Values')
plt.show()
However, if you did just want the top half of some particularly complex colormap, you can copy a portion of it by evaluating the colormap over the range you're interested in. For example, if you wanted the "top" half, you'd evaluate it from 0.5 to 1:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
# Evaluate an existing colormap from 0.5 (midpoint) to 1 (upper end)
cmap = plt.get_cmap('gist_earth')
colors = cmap(np.linspace(0.5, 1, cmap.N // 2))
# Create a new colormap from those colors
cmap2 = LinearSegmentedColormap.from_list('Upper Half', colors)
y, x = np.mgrid[-200:1900, -300:2000]
z = np.cos(np.hypot(x, y) / 100) + 1
fig, axes = plt.subplots(ncols=2)
for ax, cmap in zip(axes.flat, [cmap, cmap2]):
    cax = ax.imshow(z, cmap=cmap, origin='lower',
                    extent=[x.min(), x.max(), y.min(), y.max()])
    cbar = fig.colorbar(cax, ax=ax, orientation='horizontal')
    cbar.set_label(cmap.name)
plt.show()
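If you need this more than once, the same idea can be wrapped in a small helper. This is only a sketch: the function name truncate_colormap and its default arguments are my own choices, not part of matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

def truncate_colormap(cmap, lower=0.5, upper=1.0, n=128):
    """Return a new colormap covering only the [lower, upper] portion of cmap."""
    colors = cmap(np.linspace(lower, upper, n))   # sample the portion of interest
    name = f'{cmap.name}_{lower:g}_{upper:g}'
    return LinearSegmentedColormap.from_list(name, colors)

base = plt.get_cmap('gist_rainbow_r')             # the colormap named in the question
upper_half = truncate_colormap(base, 0.5, 1.0)

# the new map starts where the old one had its midpoint
print(np.round(base(0.5), 2), np.round(upper_half(0.0), 2))
```

The result can be passed anywhere a colormap is accepted, for example contourf(..., cmap=upper_half).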
I created a histogram plot using data from a file and no problem. Now I wanted to superpose data from another file in the same histogram, so I do something like this
n,bins,patchs = ax.hist(mydata1,100)
n,bins,patchs = ax.hist(mydata2,100)
but the problem is that for each interval, only the bar with the highest value appears, and the other is hidden. I wonder how could I plot both histograms at the same time with different colors.
Here you have a working example:
import random
import numpy
from matplotlib import pyplot
x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]
bins = numpy.linspace(-10, 10, 100)
pyplot.hist(x, bins, alpha=0.5, label='x')
pyplot.hist(y, bins, alpha=0.5, label='y')
pyplot.legend(loc='upper right')
pyplot.show()
The accepted answer gives the code for a histogram with overlapping bars, but in case you want the bars side by side (as I did), try the variation below:
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-deep')  # named 'seaborn-deep' before matplotlib 3.6
x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
bins = np.linspace(-10, 10, 30)
plt.hist([x, y], bins, label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()
Reference: http://matplotlib.org/examples/statistics/histogram_demo_multihist.html
EDIT [2018/03/16]: Updated to allow plotting of arrays of different sizes, as suggested by @stochastic_zeitgeist
In the case you have different sample sizes, it may be difficult to compare the distributions with a single y-axis. For example:
import numpy as np
import matplotlib.pyplot as plt
#makes the data
y1 = np.random.normal(-2, 2, 1000)
y2 = np.random.normal(2, 2, 5000)
colors = ['b','g']
#plots the histogram
fig, ax1 = plt.subplots()
ax1.hist([y1,y2],color=colors)
ax1.set_xlim(-10,10)
ax1.set_ylabel("Count")
plt.tight_layout()
plt.show()
In this case, you can plot your two data sets on different axes. To do so, you can get your histogram data using matplotlib, clear the axis, and then re-plot it on two separate axes (shifting the bin edges so that they don't overlap):
#sets up the axis and gets histogram data
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.hist([y1, y2], color=colors)
n, bins, patches = ax1.hist([y1,y2])
ax1.cla() #clear the axis
#plots the histogram data
width = (bins[1] - bins[0]) * 0.4
bins_shifted = bins + width
ax1.bar(bins[:-1], n[0], width, align='edge', color=colors[0])
ax2.bar(bins_shifted[:-1], n[1], width, align='edge', color=colors[1])
#finishes the plot
ax1.set_ylabel("Count", color=colors[0])
ax2.set_ylabel("Count", color=colors[1])
ax1.tick_params('y', colors=colors[0])
ax2.tick_params('y', colors=colors[1])
plt.tight_layout()
plt.show()
As a completion to Gustavo Bezerra's answer:
If you want each histogram to be normalized (normed for mpl<=2.1 and density for mpl>=3.1) you cannot just use normed/density=True, you need to set the weights for each value instead:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
x_w = np.empty(x.shape)
x_w.fill(1/x.shape[0])
y_w = np.empty(y.shape)
y_w.fill(1/y.shape[0])
bins = np.linspace(-10, 10, 30)
plt.hist([x, y], bins, weights=[x_w, y_w], label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()
As a comparison, the exact same x and y vectors with default weights and density=True:
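The difference between the two normalisations can also be checked numerically. This is a minimal sketch using np.histogram, which accepts the same weights and density arguments as plt.hist. The seed and sample are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1, 2, 5000)
bins = np.linspace(-10, 10, 30)

# weights of 1/N: each bar height is the fraction of samples in that bin
frac, _ = np.histogram(x, bins, weights=np.full(x.shape, 1 / x.size))

# density=True: each bar height is a probability density (fraction / bin width)
dens, _ = np.histogram(x, bins, density=True)

print(frac.sum())                     # heights add up to (about) 1
print((dens * np.diff(bins)).sum())   # areas add up to 1
```

So with weights the bar heights sum to one, while with density=True it is the bar areas that sum to one. The two only look the same when the bin width happens to be 1.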
You should use bins from the values returned by hist:
import numpy as np
import matplotlib.pyplot as plt
foo = np.random.normal(loc=1, size=100) # a normal distribution
bar = np.random.normal(loc=-1, size=10000) # a normal distribution
# density=True replaces the old normed=True, which was removed in matplotlib 3.1
_, bins, _ = plt.hist(foo, bins=50, range=[-6, 6], density=True)
_ = plt.hist(bar, bins=bins, alpha=0.5, density=True)
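A variation on the same idea, in case you would rather not hard-code the range: compute one set of bin edges from both samples up front and reuse it. This is a sketch under the assumption that you want the bins to span the combined data. np.histogram_bin_edges is a NumPy function, and the random samples are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
foo = rng.normal(loc=1, size=100)
bar = rng.normal(loc=-1, size=10000)

# one set of edges covering both samples: 50 bins means 51 edges
edges = np.histogram_bin_edges(np.concatenate([foo, bar]), bins=50)

foo_counts, _ = np.histogram(foo, bins=edges)
bar_counts, _ = np.histogram(bar, bins=edges)

print(len(edges), foo_counts.sum(), bar_counts.sum())  # 51 100 10000
```

The same edges array can be passed as bins=edges to both plt.hist calls, so the bars of the two histograms line up exactly.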
Here is a simple method to plot two histograms, with their bars side-by-side, on the same plot when the data has different sizes:
def plotHistogram(p, o):
    """
    p and o are iterables with the values you want to
    plot the histogram of
    """
    plt.hist([p, o], color=['g','r'], alpha=0.8, bins=50)
    plt.show()
Plotting two overlapping histograms (or more) can lead to a rather cluttered plot. I find that using step histograms (aka hollow histograms) improves the readability quite a bit. The only downside is that in matplotlib the default legend for a step histogram is not properly formatted, so it can be edited like in the following example:
import numpy as np # v 1.19.2
import matplotlib.pyplot as plt # v 3.3.2
from matplotlib.lines import Line2D
rng = np.random.default_rng(seed=123)
# Create two normally distributed random variables of different sizes
# and with different shapes
data1 = rng.normal(loc=30, scale=10, size=500)
data2 = rng.normal(loc=50, scale=10, size=1000)
# Create figure with 'step' type of histogram to improve plot readability
fig, ax = plt.subplots(figsize=(9,5))
ax.hist([data1, data2], bins=15, histtype='step', linewidth=2,
        alpha=0.7, label=['data1','data2'])
# Edit legend to get lines as legend keys instead of the default polygons
# and sort the legend entries in alphanumeric order
handles, labels = ax.get_legend_handles_labels()
leg_entries = {}
for h, label in zip(handles, labels):
    leg_entries[label] = Line2D([0], [0], color=h.get_facecolor()[:-1],
                                alpha=h.get_alpha(), lw=h.get_linewidth())
labels_sorted, lines = zip(*sorted(leg_entries.items()))
ax.legend(lines, labels_sorted, frameon=False)
# Remove spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# Add annotations
plt.ylabel('Frequency', labelpad=15)
plt.title('Matplotlib step histogram', fontsize=14, pad=20)
plt.show()
As you can see, the result looks quite clean. This is especially useful when overlapping even more than two histograms. Depending on how the variables are distributed, this can work for up to around 5 overlapping distributions. More than that would require the use of another type of plot, such as one of those presented here.
It sounds like you might want just a bar graph:
http://matplotlib.sourceforge.net/examples/pylab_examples/bar_stacked.html
http://matplotlib.sourceforge.net/examples/pylab_examples/barchart_demo.html
Alternatively, you can use subplots.
There is one caveat when you want to plot the histogram from a 2-d numpy array. You need to swap the 2 axes.
import numpy as np
import matplotlib.pyplot as plt
data = np.random.normal(size=(2, 300))
# data.shape == (2, 300); after swapping, swapped_data.shape == (300, 2)
swapped_data = np.swapaxes(data, axis1=0, axis2=1)
plt.hist(swapped_data, bins=30, label=['x', 'y'])
plt.legend()
plt.show()
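The reason for the swap is that hist treats each column of a 2-D array as one dataset. A quick way to check this, and a shorter spelling of the swap (data.T), is sketched below. The Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

data = np.random.normal(size=(2, 300))   # 2 datasets stored as rows

# data.T has shape (300, 2): one column per dataset, same as np.swapaxes(data, 0, 1)
n, bins, patches = plt.hist(data.T, bins=30, label=['x', 'y'])
plt.legend()

print(data.T.shape, len(n))              # (300, 2) 2
```

Without the transpose, hist would see 300 columns and try to draw 300 tiny datasets of two values each.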
Another option, quite similar to joaquin's answer:
import random
from matplotlib import pyplot
#random data
x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]
#plot both histograms(range from -10 to 10), bins set to 100
pyplot.hist([x,y], bins= 100, range=[-10,10], alpha=0.5, label=['x', 'y'])
#plot legend
pyplot.legend(loc='upper right')
#show it
pyplot.show()
Gives the following output:
Just in case you have pandas (import pandas as pd) or are ok with using it:
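import random
import matplotlib.pyplot as plt
import pandas as pd
# two rows of 400 samples each; transpose so each dataset becomes a column
test = pd.DataFrame([[random.gauss(3,1) for _ in range(400)],
                     [random.gauss(4,2) for _ in range(400)]])
plt.hist(test.values.T)
plt.show()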
This question has been answered before, but wanted to add another quick/easy workaround that might help other visitors to this question.
import seaborn as sns
sns.kdeplot(mydata1)
sns.kdeplot(mydata2)
Some helpful examples are here for kde vs histogram comparison.
Inspired by Solomon's answer, but sticking with the question, which is about histograms, a clean solution is:
sns.distplot(bar)
sns.distplot(foo)
plt.show()
Make sure to plot the taller one first, otherwise you would need to set plt.ylim(0,0.45) so that the taller histogram is not chopped off.