How do I bin and categorize numbers in Python? - python

I'm not sure if binning is the correct term, but I want to implement the following for a project I am working on:
I have an array or maybe a dict describing boundaries and/or regions, for example:
boundaries = OrderedDict([(10,'red'),(20,'blue'),(55,'purple')])
The areas are indexed from 0 to 100 (for example). I want to classify each area into a color, namely the color of the first boundary key that the area is less than, and then plot it. For example, if it is less than 10, it is red.
So far, I have:
from collections import OrderedDict
boundaries = OrderedDict([(10,'red'),(20,'blue'),(55,'purple')])
areas = range(0,101)
binned = []
for area in areas:
    for border in boundaries.keys():
        if area < border:
            binned.append(boundaries[border])
            break
Also, I need to figure out a way to define the colors and find a package to plot them. Do you have any ideas for how I can make a 2-D color plot (the actual project will be in 2-D)? Maybe matplotlib or PIL? I have used matplotlib before, but never for this type of data.
Also, is there a scipy/numpy function that already does what I'm trying to do? It would be nice if the code were short and fast. This is not for an assignment of any sort (it's for a little experiment / data project of mine), so I don't want to reinvent the wheel here.

import collections
import matplotlib.pyplot as plt
boundaries = collections.OrderedDict([(10,'red'),(20,'blue'),(55,'purple')])
areas = range(0,101)
# use the boundary keys (plus 0) as bin edges, then color each bar by its bin
n, bins, patches = plt.hist(areas, [0]+list(boundaries), histtype='bar', rwidth=1.0)
for (patch,color) in zip(patches,boundaries.values()):
    patch.set_color(color)
plt.show()
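If only the classification is needed rather than a histogram, NumPy's np.digitize may be the short, vectorized function the question asks about. The following is a minimal sketch using the same three boundaries. The "unassigned" label for areas at or above 55 is a hypothetical placeholder, since the original loop simply skips those values.

```python
import numpy as np

# Upper boundaries from the question, in increasing order.
edges = np.array([10, 20, 55])
# One label per bin; "unassigned" is a hypothetical placeholder for areas >= 55.
labels = np.array(["red", "blue", "purple", "unassigned"])

areas = np.arange(0, 101)
# digitize returns i such that edges[i-1] <= area < edges[i],
# with 0 for values below the first edge and len(edges) for values past the last.
bin_index = np.digitize(areas, edges)
binned = labels[bin_index]
```

Unlike the loop version, binned here has one entry per area, so it can be reshaped or plotted alongside the original array.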

Related

Colour between the rings on a python radar graph

I'm rather new to coding and I'm currently stuck on this problem.
I am trying to shade the region from 0-2 on the radar graph and have been using
ax.fill(x_as, values3, color="#757575", alpha=0.3)
where I set values3 to 2.
However, this creates a hexagon rather than a smooth shading from 0-2.
Not sure if there is a simple way of solving this, but any input would be useful!
Cheers
Current radar graph
Without seeing your code, it is hard to be sure, but most likely you are only using 6 different values in x_as -- the same values you use for your line plots. If instead you use a more densely populated array, say with 100 values, your fill area will appear to be circular:
thetas = np.linspace(0,2*np.pi,100)
ax.fill(thetas, [2 for i in thetas], color = "#757575", alpha = 0.3)
Below is a figure with some arbitrary data for the line plots and the shaded area produced by the code above:
Hope this helps.

How can I account for identical data points in a scatter plot?

I'm working with some data that has several identical data points. I would like to visualize the data in a scatter plot, but scatter plotting doesn't do a good job of showing the duplicates.
If I change the alpha value, then the identical data points become darker, which is nice, but not ideal.
Is there some way to map the color of a dot to how many times it occurs in the data set? What about size? How can I assign the size of the dot to how many times it occurs in the data set?
As was pointed out, whether this makes sense depends a bit on your dataset. If you have reasonably discrete points and exact matches make sense, you can do something like this:
import numpy as np
import matplotlib.pyplot as plt
test_x=[2,3,4,1,2,4,2]
test_y=[1,2,1,3,1,1,1] # I am just generating some test x and y values. Use your data here
#Generate a list of unique points
points=list(set(zip(test_x,test_y)))
#Generate a list of point counts
count=[len([x for x,y in zip(test_x,test_y) if x==p[0] and y==p[1]]) for p in points]
#Now for the plotting:
plot_x=[i[0] for i in points]
plot_y=[i[1] for i in points]
count=np.array(count)
plt.scatter(plot_x,plot_y,c=count,s=100*count**0.5,cmap='Spectral_r')
plt.colorbar()
plt.show()
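Note: You will need to adjust the scale factor (the value 100 in the s argument) according to your point density. Because s sets the marker area (in points squared), scaling it by the square root of the count makes the area grow with the square root of the count; use s=100*count if the area should be proportional to the count itself.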
Also note: If you have very dense points, it might be more appropriate to use a different kind of plot. Histograms for example (I personally like hexbin for 2d data) are a decent alternative in these cases.
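For larger data sets, the counting step above can be done in a single pass with collections.Counter instead of re-scanning the data for every unique point. This is only a sketch of that one step, reusing the test values from the answer; the variable names match the answer so the result can be fed to the same plt.scatter call.

```python
from collections import Counter

test_x = [2, 3, 4, 1, 2, 4, 2]
test_y = [1, 2, 1, 3, 1, 1, 1]

# Count each distinct (x, y) pair in one pass over the data.
counts = Counter(zip(test_x, test_y))

# Unpack into parallel lists, ready for plt.scatter(plot_x, plot_y, c=count, ...).
points = list(counts)
plot_x = [p[0] for p in points]
plot_y = [p[1] for p in points]
count = [counts[p] for p in points]
```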

Heatmap with varying y axis

I would like to create a visualization like the upper part of this image. Essentially, a heatmap where each point in time has a fixed number of components but these components are anchored to the y axis by means of labels (that I can supply) rather than by their first index in the heatmap's matrix.
I am aware of pcolormesh, but that does not seem to give me the y-axis functionality I seek.
Lastly, I am also open to solutions in R, although a Python option would be much preferable.
I am not completely sure if I understand your meaning correctly, but by looking at the picture you have linked, you might be best off with a roll-your-own solution.
First, you need to create an array with the heatmap values so that you have one row for each label and one column for each time slot. You fill the array with NaNs and then write whatever heatmap values you have to the correct positions.
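As a rough sketch of this first step: the label set, the per-slot observations, and all names below are hypothetical stand-ins for whatever form the real data takes.

```python
import numpy as np

# Hypothetical full set of y-axis labels, in the order they should appear.
labels = ["a", "b", "c", "d", "e"]
# Hypothetical input: for each time slot, the labels of its components and their values.
observations = [
    (["a", "b", "c"], [0.1, 0.5, 0.9]),  # time slot 0
    (["b", "c", "d"], [0.2, 0.6, 0.8]),  # time slot 1
    (["c", "d", "e"], [0.3, 0.4, 0.7]),  # time slot 2
]

# Map each label to its row in the heatmap array.
row_of = {label: row for row, label in enumerate(labels)}

# One row per label, one column per time slot, NaN wherever no component exists.
heat = np.full((len(labels), len(observations)), np.nan)
for col, (slot_labels, slot_values) in enumerate(observations):
    for label, value in zip(slot_labels, slot_values):
        heat[row_of[label], col] = value
```

Every time slot here has the same number of components (three), but they land on different rows depending on their labels, which is the anchoring the question asks for.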
Then you need to trick imshow a bit to scale and show the image in the correct way.
For example:
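import numpy as np
import matplotlib.pyplot as plt
# create some masked data
a = np.cumsum(np.random.random((20, 200)), axis=0)
X, Y = np.meshgrid(np.arange(a.shape[1]), np.arange(a.shape[0]))
a[Y < 15 * np.sin(X / 50.)] = np.nan
a[Y > 10 + 15 * np.sin(X / 50.)] = np.nan
# draw the image along with some curves
plt.imshow(a, interpolation='nearest', origin='lower', extent=[-2, 2, 0, 3])
xd = np.linspace(-2, 2, 200)
yd = 1 + .1 * np.cumsum(np.random.random(200) - .5)
plt.plot(xd, yd, 'w', linewidth=3)
plt.plot(xd, yd, 'k', linewidth=1)
# automatic axis scaling (this option was called 'normal' in older matplotlib)
plt.axis('auto')
plt.show()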
Gives:

Possible to use a custom arrow or polygon as a marker to plot location and heading in matplotlib?

I have a series of x,y coordinates and associated heading angles for multiple aircraft. I can plot the paths flown, and I would like to use a special marker to mark a particular location along the path that also shows the aircraft's heading when it was at that location.
Using matplotlib.pyplot I've used an arrowhead with no base to do this, but having to define the head and tail locations ended up with inconsistent arrowhead lengths when plotting multiple aircraft. I also used a custom three-sided symbol with the tuple (numsides, style, angle) as well as the wedge and bigvee symbols, but they never look very good.
In Custom arrow style for matplotlib, pyplot.annotate, Saullo Castro showed a nice custom arrow (arrow1), and I'm wondering whether it can be used or converted so that I can simply plot it at a given x,y and have its orientation defined by a heading angle.
I can plot the custom arrow with the following. Any ideas on how to rotate it to reflect a heading?
a1 = np.array([[0,0],[0,1],[-1,2],[3,0],[-1,-2],[0,-1],[0,0]], dtype=float)
polB = patches.Polygon(a1, closed=True, facecolor='grey')
ax.add_patch(polB)
Thanks in advance.
So I made the polygon a little simpler and also found that the rotation could be done by using mpl.transforms.Affine2D().rotate_deg_around():
a2 = np.array([[newX,newY+2],[newX+1,newY-1],[newX,newY],[newX-1,newY-1],[newX,newY+2]], dtype=float)
polB = patches.Polygon(a2, closed=True, facecolor='gold')
t2 = mpl.transforms.Affine2D().rotate_deg_around(newX,newY,heading) + newax.transData
polB.set_transform(t2)
newax.add_patch(polB)
I first tried to overlay the polygon on a line plotted from the x,y coordinates. However, the scales of the x and y axes were not equal (nor did I want them to be), so the polygon ended up looking all warped and stretched when rotated. I got around this by first adding a new axis with equal x/y scaling:
newax = fig.add_axes(ax.get_position(), frameon=False)
newax.set_xlim(-20,20)
newax.set_ylim(-20,20)
I could at least then rotate all I wanted and not have the warp issue. But then I needed to figure out how to connect the two axes so that I could plot the polygon on the new axis at a point referenced from the original axis. The way I did this was with transformations: take the data coordinates on the original axis, convert them to display coordinates, and then invert those back to data coordinates, this time on the new axis:
inTrans = ax.transData.transform((x, y))
inv = newax.transData.inverted()
newTrans = inv.transform((inTrans[0], inTrans[1]))
newX = newTrans[0]
newY = newTrans[1]
It felt a little like some sort of Rube Goldberg machine to do it this way, but it did what I wanted.
In the end, I decided I didn't like this approach and went with keeping it simpler and using a fancy arrowhead instead of a polygon. Such is life...
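For reference, the rotation step described above can be sketched on its own, on a single axis with equal x/y scaling so the polygon is not warped. The helper name add_heading_marker and the sample position are hypothetical. The angle is passed straight to rotate_deg_around, which rotates counterclockwise, so a compass heading would need converting first (an assumption about the data, not something stated in the question).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the plot interactively
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import patches
from matplotlib.transforms import Affine2D


def add_heading_marker(ax, x, y, angle_deg, color="gold"):
    """Draw a small arrow-like polygon at (x, y), rotated about that point."""
    # Polygon pointing "up" (+y) before rotation, same shape as in the answer.
    verts = np.array([[x, y + 2], [x + 1, y - 1], [x, y], [x - 1, y - 1]], dtype=float)
    marker = patches.Polygon(verts, closed=True, facecolor=color)
    # Rotate in data coordinates first, then apply the usual data-to-display transform.
    rotation = Affine2D().rotate_deg_around(x, y, angle_deg)
    marker.set_transform(rotation + ax.transData)
    ax.add_patch(marker)
    return marker, rotation


fig, ax = plt.subplots()
ax.set_xlim(-20, 20)
ax.set_ylim(-20, 20)
ax.set_aspect("equal")  # equal scaling avoids the stretched look after rotation

marker, rotation = add_heading_marker(ax, 5.0, 5.0, 90.0)
# Where the tip of the polygon ends up, in data coordinates, after rotation.
tip = rotation.transform((5.0, 7.0))
```

With a 90 degree counterclockwise rotation the tip moves from directly above the anchor point to directly left of it, which is a quick way to check the sign convention before plotting real headings.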

Opacity misleading when plotting two histograms at the same time with matplotlib

Let's say I have two histograms and I set the opacity using the parameter of hist: 'alpha=0.5'
I have plotted two histograms yet I get three colors! I understand this makes sense from an opacity point of view.
But! It makes it very confusing to show someone a graph of two things with three colors. Can I just somehow set the smallest bar for each bin to be in front with no opacity?
Example graph
The usual way this issue is handled is to have the plots with some small separation. This is done by default when plt.hist is given multiple sets of data:
import pylab as plt
x = 200 + 25*plt.randn(1000)
y = 150 + 25*plt.randn(1000)
n, bins, patches = plt.hist([x, y])
You instead wish to overlay them (this could be done above using the argument histtype='barstacked'), but notice that the ordering is incorrect.
This can be fixed by individually checking each pair of bars to see which is larger and then using zorder to set which one comes first. For simplicity I am using the output of the code above (i.e. n is two stacked arrays of the number of points in each bin for x and y):
n_x = n[0]
n_y = n[1]
for i in range(len(n[0])):
    if n_x[i] > n_y[i]:
        zorder=1
    else:
        zorder=0
    plt.bar(bins[:-1][i], n_x[i], width=10)
    plt.bar(bins[:-1][i], n_y[i], width=10, color="g", zorder=zorder)
Here is the resulting image:
With the ordering changed like this, the image looks very weird indeed, which is probably why it is not implemented and needs a hack. I would stick with the small separation method; anyone used to these plots assumes the paired bars share the same x-value.
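If the overlaid look is still wanted, one possible variation (not the code above) is to compute both histograms on shared bin edges with np.histogram and draw the taller bar of each bin first, then the shorter one on top, both fully opaque. The colors, the bin count of 20, and the fixed random seed are arbitrary assumptions in this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the plot interactively
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = 200 + 25 * rng.standard_normal(1000)
y = 150 + 25 * rng.standard_normal(1000)

# Shared bin edges so the two histograms line up bar for bar.
edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=20)
n_x, _ = np.histogram(x, bins=edges)
n_y, _ = np.histogram(y, bins=edges)

# Per bin, decide which data set is taller and pick colors accordingly.
x_is_taller = n_x >= n_y
back_colors = np.where(x_is_taller, "tab:blue", "tab:green").tolist()
front_colors = np.where(x_is_taller, "tab:green", "tab:blue").tolist()

fig, ax = plt.subplots()
widths = np.diff(edges)
# Taller bar of each bin goes behind, shorter bar is drawn opaque in front of it.
back = ax.bar(edges[:-1], np.maximum(n_x, n_y), width=widths,
              align="edge", color=back_colors, zorder=1)
front = ax.bar(edges[:-1], np.minimum(n_x, n_y), width=widths,
               align="edge", color=front_colors, zorder=2)
```

Here blue always stands for x and green for y, so only two colors appear, but the same caveat applies: the result can be harder to read than side-by-side bars.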
