Opacity misleading when plotting two histograms at the same time with matplotlib

Let's say I have two histograms and I set the opacity with the hist parameter 'alpha=0.5'.
I have plotted two histograms, yet I get three colors! I understand this makes sense from an opacity point of view: where the bars overlap, the two colors blend into a third.
But it makes it very confusing to show someone a graph of two things with three colors. Can I somehow draw the smaller bar of each bin in front, with no transparency?
Example graph

The usual way to handle this is to draw the bars side by side with a small separation. This is done by default when plt.hist is given multiple data sets:
import numpy as np
import matplotlib.pyplot as plt
x = 200 + 25*np.random.randn(1000)
y = 150 + 25*np.random.randn(1000)
# Passing a list of data sets draws their bars side by side in each bin
n, bins, patches = plt.hist([x, y])
plt.show()
You instead wish to draw them on top of each other. (The argument histtype='barstacked' does stack them, although then the bar heights add up.) But notice that with plain overlapping bars the ordering is incorrect: the data set drawn last always covers the other.
This can be fixed by checking each pair of bars individually to see which is taller and then using zorder to set which one is drawn in front. For simplicity, I am reusing the output of the code above (i.e. n holds two arrays with the number of points in each bin for x and y):
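n_x = n[0]
n_y = n[1]
width = bins[1] - bins[0]  # width of one bin
plt.figure()  # start a fresh figure for the overlaid bars
for i in range(len(n_x)):
    # give the shorter bar the higher zorder so it is drawn in front
    if n_x[i] > n_y[i]:
        zorder_x, zorder_y = 1, 2
    else:
        zorder_x, zorder_y = 2, 1
    # explicit colors: without them each bar() call would take the next cycle color
    plt.bar(bins[i], n_x[i], width=width, align='edge', color="b", zorder=zorder_x)
    plt.bar(bins[i], n_y[i], width=width, align='edge', color="g", zorder=zorder_y)
plt.show()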
Here is the resulting image:
Changing the ordering like this makes the image look very weird indeed; this is probably why it is not built in and needs a hack. I would stick with the small-separation method: anyone used to these plots assumes the side-by-side bars share the same x-value.
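For reference, the per-bin zorder idea above can also be written without a Python loop. This is a minimal sketch, not the answerer's code: it assumes both data sets are binned with the same edges (computed here with np.histogram_bin_edges), and the 20 bins, the colors and the random data are arbitrary stand-ins. In each bin the taller count is drawn first and the shorter one on top, both fully opaque.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

rng = np.random.default_rng(0)
x = 200 + 25 * rng.standard_normal(1000)  # stand-in data
y = 150 + 25 * rng.standard_normal(1000)

# Shared bin edges so that bar i of x and bar i of y cover the same interval
bins = np.histogram_bin_edges(np.concatenate([x, y]), bins=20)
n_x, _ = np.histogram(x, bins=bins)
n_y, _ = np.histogram(y, bins=bins)
left, width = bins[:-1], np.diff(bins)

x_taller = n_x >= n_y
back = np.maximum(n_x, n_y)   # taller bar of each bin
front = np.minimum(n_x, n_y)  # shorter bar of each bin
back_colors = np.where(x_taller, "tab:blue", "tab:green").tolist()
front_colors = np.where(x_taller, "tab:green", "tab:blue").tolist()

fig, ax = plt.subplots()
# zorder 1 for the taller bars, zorder 2 so the shorter bars sit in front
ax.bar(left, back, width=width, align="edge", color=back_colors, zorder=1)
ax.bar(left, front, width=width, align="edge", color=front_colors, zorder=2)
ax.legend(handles=[Patch(color="tab:blue", label="x"),
                   Patch(color="tab:green", label="y")])
plt.show()
```

Because the colors are opaque, only two colors ever appear, which is what the question asks for; whether this reads well is the same judgment call discussed above.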

Related

How can I account for identical data points in a scatter plot?

I'm working with some data that has several identical data points. I would like to visualize the data in a scatter plot, but scatter plotting doesn't do a good job of showing the duplicates.
If I change the alpha value, then the identical data points become darker, which is nice, but not ideal.
Is there some way to map the color of a dot to how many times it occurs in the data set? What about size? How can I assign the size of the dot to how many times it occurs in the data set?

As was pointed out, whether this makes sense depends a bit on your dataset. If you have reasonably discrete points and exact matches make sense, you can do something like this:
import numpy as np
import matplotlib.pyplot as plt
test_x=[2,3,4,1,2,4,2]
test_y=[1,2,1,3,1,1,1] # I am just generating some test x and y values. Use your data here
#Generate a list of unique points
points=list(set(zip(test_x,test_y)))
#Generate a list of point counts
count=[len([x for x,y in zip(test_x,test_y) if x==p[0] and y==p[1]]) for p in points]
#Now for the plotting:
plot_x=[i[0] for i in points]
plot_y=[i[1] for i in points]
count=np.array(count)
plt.scatter(plot_x,plot_y,c=count,s=100*count**0.5,cmap='Spectral_r')
plt.colorbar()
plt.show()
Note: you will need to adjust the scale factor (the value 100 in the s argument) to your point density. Since s sets the marker area (in points^2), using the square root of the count makes the area grow with the square root of the count, which keeps very frequent points from dominating the plot; use s=100*count instead if you want the area proportional to the count.
Also note: if you have very dense points, a different kind of plot might be more appropriate. 2D histograms (I personally like hexbin for 2D data) are a decent alternative in these cases.
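The counting step above scans the whole data set once per unique point. A shorter sketch, assuming NumPy 1.13 or later, lets np.unique count identical rows directly with axis=0 and return_counts=True. Here the marker area is made proportional to the count (s=100*count), which is a different scaling choice from the square root used above.

```python
import numpy as np
import matplotlib.pyplot as plt

test_x = [2, 3, 4, 1, 2, 4, 2]
test_y = [1, 2, 1, 3, 1, 1, 1]

# Stack into an (n, 2) array and count each distinct (x, y) row
xy = np.column_stack([test_x, test_y])
points, count = np.unique(xy, axis=0, return_counts=True)

# s is the marker area, so s proportional to count gives area proportional to count
plt.scatter(points[:, 0], points[:, 1], c=count, s=100 * count, cmap="Spectral_r")
plt.colorbar(label="occurrences")
plt.show()
```

np.unique returns the unique points sorted row by row, so points and count stay aligned.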
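A minimal hexbin sketch for the dense case mentioned above. The integer-valued random data is hypothetical and only stands in for a data set with many exact duplicates; gridsize=20 is an arbitrary choice, and mincnt=1 hides empty cells.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Hypothetical dense data: integer coordinates, so many points coincide
x = rng.integers(0, 20, size=5000)
y = rng.integers(0, 20, size=5000)

fig, ax = plt.subplots()
# Each hexagon is colored by the number of points that fall inside it
hb = ax.hexbin(x, y, gridsize=20, cmap="viridis", mincnt=1)
fig.colorbar(hb, ax=ax, label="count")
plt.show()
```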

How do I bin and categorize numbers in Python?

I'm not sure if binning is the correct term, but I want to implement the following for a project I am working on:
I have an array or maybe a dict describing boundaries and/or regions, for example:
boundaries = OrderedDict([(10,'red'),(20,'blue'),(55,'purple')])
The areas are indexed from 0 to 100 (for example). I want to assign each area a color (the color of the first key it is less than) and then plot it. For example, if it is less than 10, it is red.
So far, I have:
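from collections import OrderedDict
boundaries = OrderedDict([(10,'red'),(20,'blue'),(55,'purple')])
areas = range(0,101)
binned = []
for area in areas:
    for border in boundaries.keys():
        # take the color of the first boundary that is larger than the area
        if area < border:
            binned.append(boundaries[border])
            break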
Also, I need to figure out a way to define the colors and find a package to plot them. So if you have any ideas: how can I make a 2-D color plot (the actual project will be in 2-D)? Maybe matplotlib or PIL? I have used matplotlib before, but never for this type of data.
Also, is there a scipy/numpy function that already does what I'm trying to do? It would be nice if the code is short and fast. This is not for an assignment of any sort (it's for a little experiment / data project of mine), so I don't want to reinvent the wheel here.

import collections
import matplotlib.pyplot as plt
boundaries = collections.OrderedDict([(10,'red'),(20,'blue'),(55,'purple')])
areas = range(0,101)
# bin edges are 0 followed by the boundary keys: one bar per color region
n, bins, patches = plt.hist(areas, [0]+list(boundaries), histtype='bar', rwidth=1.0)
for (patch,color) in zip(patches,boundaries.values()):
    patch.set_color(color)
plt.show()
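On the other two parts of the question, a sketch assuming NumPy and a recent Matplotlib: np.digitize performs the same "first boundary greater than the value" lookup as the nested loop, and for a 2-D field a ListedColormap combined with a BoundaryNorm colors each cell by its region. The 2-D field below is hypothetical, and returning None for values of 55 or more (instead of skipping them, as the loop does) is a choice made here to keep the result aligned with the input.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm, ListedColormap

boundaries = [10, 20, 55]
colors = ["red", "blue", "purple"]

# 1-D: np.digitize(v, boundaries) is the index of the first boundary strictly
# greater than v (0 for v < 10, 1 for 10 <= v < 20, 2 for 20 <= v < 55, 3 above)
areas = np.arange(0, 101)
idx = np.digitize(areas, boundaries)
binned = [colors[i] if i < len(colors) else None for i in idx]

# 2-D: a hypothetical field with values from 0 to 98
values = np.add.outer(np.arange(50), np.arange(50))
cmap = ListedColormap(colors)
cmap.set_over("white")  # color for values >= 55, outside every region
# Regions [0, 10), [10, 20), [20, 55) map to color indices 0, 1, 2
norm = BoundaryNorm([0] + boundaries, ncolors=cmap.N)

fig, ax = plt.subplots()
im = ax.imshow(values, cmap=cmap, norm=norm, origin="lower")
fig.colorbar(im, ax=ax)
plt.show()
```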

Matplotlib_venn: weird output

I have a script that makes a Venn diagram of 3 sets using the matplotlib_venn module, as follows:
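from collections import Counter
import matplotlib.pyplot as plt
from matplotlib_venn import venn3
# set_1, set_2, set_3, n and compare are defined elsewhere in the script
union = set_1.union(set_2).union(set_3)
indicators = ['%d%d%d' % (a in set_1, a in set_2, a in set_3) for a in union]
subsets = Counter(indicators)
fig = plt.figure((n+1)*2 - 1)
ax = fig.add_subplot(1, 1, 1)
v = venn3(subsets, (compare[0], compare[1], compare[2]), ax=ax)
plt.show()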
Here are two images I get out of this, from two different datasets with three sets each (one small and one large):
In the first image, the numbers are off: 180 should be in the middle, and the 2 should be somewhere on the right side of the image, in the barely visible yellow/green part. I first thought this was due to the small size of the data set, but looking at the larger set ...
... I can still see that the numbers are slightly off, although not as much as before. The larger, common set is still not in the middle, and the other numbers seem a little displaced from the "center" of their regions.
Any ideas as to why this is, and what can be done to remedy the problem?
Using the venn3_unweighted function rather than venn3 gives perfectly nice (and non-proportional) images, including any 0s in the smaller data set, but it just doesn't work with the proportional version.
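Before looking at the layout, it can help to confirm that the region counts passed to venn3 are what you expect. The sketch below recomputes them in plain Python with the same '100'/'011'/'111' keys the question builds; region_counts and the example sets are hypothetical names for illustration. As far as I know, the matplotlib_venn documentation also notes that a three-circle diagram generally cannot match arbitrary region sizes exactly, so the area-weighted venn3 layout is only approximate.

```python
from collections import Counter


def region_counts(set_1, set_2, set_3):
    """Count elements per Venn region, keyed like '100', '011', '111'."""
    union = set_1 | set_2 | set_3
    # int(True) == 1, so each key records membership in set_1, set_2, set_3
    return Counter(f"{int(a in set_1)}{int(a in set_2)}{int(a in set_3)}" for a in union)


s1 = set(range(0, 10))
s2 = set(range(5, 15))
s3 = set(range(8, 12))
counts = region_counts(s1, s2, s3)
print(dict(sorted(counts.items())))
```

The resulting Counter can be passed to venn3 in place of subsets, as in the question's script.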

Heatmap with varying y axis

I would like to create a visualization like the upper part of this image. Essentially, a heatmap where each point in time has a fixed number of components but these components are anchored to the y axis by means of labels (that I can supply) rather than by their first index in the heatmap's matrix.
I am aware of pcolormesh, but that does not seem to give me the y-axis functionality I seek.
Lastly, I am also open to solutions in R, although a Python option would be much preferable.

I am not completely sure if I understand your meaning correctly, but by looking at the picture you have linked, you might be best off with a roll-your-own solution.
First, you need to create an array with the heatmap values so that you have one row for each label and one column for each time slot. You fill the array with NaNs and then write whatever heatmap values you have to the correct positions.
Then you need to trick imshow a bit to scale and show the image in the correct way.
For example:
import numpy as np
import matplotlib.pyplot as plt
# create some masked data
a = np.cumsum(np.random.random((20, 200)), axis=0)
X, Y = np.meshgrid(np.arange(a.shape[1]), np.arange(a.shape[0]))
a[Y < 15*np.sin(X/50.)] = np.nan
a[Y > 10 + 15*np.sin(X/50.)] = np.nan
# draw the image along with some curves; aspect='auto' lets it fill the axes
plt.imshow(a, interpolation='nearest', origin='lower', extent=[-2, 2, 0, 3], aspect='auto')
xd = np.linspace(-2, 2, 200)
yd = 1 + .1 * np.cumsum(np.random.random(200) - .5)
plt.plot(xd, yd, 'w', linewidth=3)
plt.plot(xd, yd, 'k', linewidth=1)
plt.show()
Gives:
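The example above uses synthetic masked data. The label-per-row construction described earlier (a NaN-filled array with one row per label and one column per time slot) could look roughly like this sketch. It assumes the data arrives as a few components per time slot, each tied to a label; the labels, the drifting pattern and the sizes are made up for illustration. The y-axis ticks are then placed on the rows and given your own labels.

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ["a", "b", "c", "d", "e", "f"]  # hypothetical y-axis labels
n_times = 50
rng = np.random.default_rng(2)

# One row per label, one column per time slot; NaN marks "no component here"
grid = np.full((len(labels), n_times), np.nan)
for t in range(n_times):
    # 3 components per time slot, anchored at labels that shift every 10 slots
    start = (t // 10) % (len(labels) - 2)
    for k in range(3):
        grid[start + k, t] = rng.random()

fig, ax = plt.subplots()
# NaN cells use the colormap's "bad" color, which is transparent by default
im = ax.imshow(grid, aspect="auto", origin="lower", interpolation="nearest")
ax.set_yticks(np.arange(len(labels)))
ax.set_yticklabels(labels)
ax.set_xlabel("time slot")
fig.colorbar(im, ax=ax)
plt.show()
```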

Matlab, Python: Fixing colormap to specified values

This is a simple but common task when trying to fix a colormap to a 2D matrix of values.
To demonstrate, consider the problem in Matlab; the solution does not need to be in Matlab (i.e., the code presented here is only for demonstration purposes).
x = [0,1,2; 3,4,5; 6,7,8];
imagesc(x)
axis square
axis off
The output is:
When some value exceeds the previous maximum, this happens:
x = [0,1,2; 3,4,5; 6,7,18];
This looks logical, but it causes problems when we want to compare or trace elements across two maps: since the colormap association has changed, it is almost impossible to find an individual cell for comparison, tracing, etc.
The solution I implemented is to clip the matrix:
x = [0,1,2; 3,4,5; 6,7,18];
m = 8;
x(x>=m) = m;
This works perfectly.
Since this code requires searching/filtering (extra time!), I wonder if there is a general or more efficient way to do this in Matlab, Python, etc.?
One case where this issue occurs is when we run many simulations in sequence and want to make a meaningful animation of the progress; in that case each color should keep a fixed association.
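For comparison, the clipping step in the question has a direct vectorized NumPy equivalent. This is a small sketch using the question's matrix and m = 8; np.minimum only caps the top, while np.clip bounds both ends.

```python
import numpy as np

x = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 18]])
m = 8
# Equivalent of the Matlab line x(x>=m) = m; returns a new array, x is unchanged
clipped = np.minimum(x, m)
# np.clip caps both ends (here 0 and m) in one call
clipped_both = np.clip(x, 0, m)
```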

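In Python, using Matplotlib, the solution is as follows:
import matplotlib.pyplot as plt
x = [[0,1,2],[3,4,5],[6,7,18]]
# vmin/vmax fix the ends of the colormap; 18 gets the same color as 8
plt.matshow(x, vmin=0, vmax=8)
plt.axis('image')
plt.axis('off')
plt.show()
vmin and vmax set the data values mapped to the two ends of the colormap; values outside that range get the end colors.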
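For the animation use case in the question, one option is to create a single matplotlib.colors.Normalize with fixed limits and pass it to every frame, so a given value always gets the same color. This is a sketch; the drifting frames are hypothetical stand-ins for simulation output, and extend="max" only adds an arrow to the colorbar to signal that values above vmax are clipped to the top color.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# Hypothetical simulation frames whose values drift upward over time
frames = [np.arange(9).reshape(3, 3) + 2 * k for k in range(4)]

# One fixed normalization shared by every frame: 0 maps to the bottom color,
# 8 and above map to the top color
norm = Normalize(vmin=0, vmax=8)

fig, axes = plt.subplots(1, len(frames), figsize=(10, 3))
for ax, frame in zip(axes, frames):
    im = ax.matshow(frame, norm=norm)
    ax.axis("off")
fig.colorbar(im, ax=axes, extend="max")
plt.show()
```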

The indexing is pretty quick, so I don't think you need to worry.
However, in Matlab, you can pass in the clims argument to imagesc:
imagesc(x,[0 8]);
This maps all values above 8 to the top colour in the colour scale, and all values below 0 to the bottom colour in the colour scale, and then stretches the scale for colours in-between.
See the imagesc documentation.

f1 = figure;
x = [0,1,2; 3,4,5; 6,7,8];
imagesc(x)
axis square
axis off
limits = get(gca(f1),'CLim');
f2 = figure;
z = [0,1,2; 3,4,5; 6,7,18];
imagesc(z)
axis square
axis off
caxis(limits)
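A rough Matplotlib analogue of the caxis(limits) approach above, offered as a sketch rather than a direct translation: read the color limits from the first image with get_clim() and apply them to the second with set_clim(). The matrices are the ones from the question.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
z = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 18]])

fig1, ax1 = plt.subplots()
im1 = ax1.imshow(x)
ax1.axis("off")
limits = im1.get_clim()  # autoscaled from x

fig2, ax2 = plt.subplots()
im2 = ax2.imshow(z)
im2.set_clim(*limits)  # reuse the first figure's color limits, like caxis(limits)
ax2.axis("off")
plt.show()
```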

Develop Reference