Display all the bins on sns distplot [duplicate]

To simplify my problem (it's not exactly like that but I prefer simple answers to simple questions):
I have several 2D maps that portray rectangular regions. I'd like to add axes and ticks to each map to show distances (with matplotlib, since the old code uses it), but the regions are different sizes. I'd like nice, clear ticks on the axes, but the widths and heights of the maps can be anything...
To explain what I mean: say I have a map of a region whose size is 4.37 km * 6.42 km. I want ticks on the x-axis at 0, 1, 2, 3, and 4 km and on the y-axis at 0, 1, 2, 3, 4, 5, and 6 km. However, the image and the axes reach a bit further than 4 km and 6 km, since the region is larger than 4 km * 6 km.
The space between the ticks can be constant, 1 km. However, the sizes of the maps vary quite a lot (say, between 5 and 15 km), and they are float values. My current script knows the size of the region and can scale the image to the right height/width ratio, but how do I tell it where to put the ticks?
There may already be a solution for this problem, but since I couldn't find suitable search terms, I had to ask here...

Just set the tick locator to use matplotlib.ticker.MultipleLocator(x) where x is the spacing that you want (e.g. 1.0 in your example above).
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, FormatStrFormatter
x = np.arange(20)
y = x * 0.1
fig, ax = plt.subplots()
ax.plot(x, y)
ax.xaxis.set_major_locator(MultipleLocator(1.0))
ax.yaxis.set_major_locator(MultipleLocator(1.0))
# Forcing the plot to be labeled with "plain" integers instead of scientific notation
ax.xaxis.set_major_formatter(FormatStrFormatter('%i'))
plt.show()
The advantage to this is that no matter how we zoom or interact with the plot, it will always be labeled with ticks 1 unit apart.
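Applied to the map from the question, a minimal sketch might look like the following. The random array standing in for the map image, the `extent` values, and the variable names are assumptions for illustration; only `MultipleLocator` comes from the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

width_km, height_km = 4.37, 6.42   # region size from the question
image = np.random.rand(64, 44)     # placeholder for the real map (assumption)

fig, ax = plt.subplots()
# extent maps the image pixels onto kilometre coordinates
ax.imshow(image, extent=[0, width_km, 0, height_km], origin="lower")
ax.xaxis.set_major_locator(MultipleLocator(1.0))
ax.yaxis.set_major_locator(MultipleLocator(1.0))

# The locator may report ticks just outside the view; keep the visible ones
visible_x = [float(t) for t in ax.get_xticks() if 0 <= t <= width_km]
visible_y = [float(t) for t in ax.get_yticks() if 0 <= t <= height_km]
```

Because the axes are in kilometres rather than pixels, the ticks stay at whole kilometres whatever the region size.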

This should give you ticks at all integer values within your current axis limits on the x axis:
from matplotlib import pylab as plt
import math
# get values for the axis limits (unless you already have them)
xmin,xmax = plt.xlim()
# get the outermost integer values using floor and ceiling
# (I need to convert them to int to avoid a DeprecationWarning),
# then get all the integer values between them using range
new_xticks = range(int(math.ceil(xmin)),int(math.floor(xmax)+1))
plt.xticks(new_xticks,new_xticks)
# passing the same argument twice here because the first gives the tick locations
# and the second gives the tick labels, which should just be the numbers
Repeat for the y axis.
Out of curiosity: what kind of ticks do you get by default?
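The ceil/floor step above can be wrapped in a small helper so it is easy to reuse for both axes. This is only a sketch; the function name `integer_ticks` is made up for illustration.

```python
import math

def integer_ticks(vmin, vmax):
    """Return every integer inside the closed interval [vmin, vmax]."""
    return list(range(int(math.ceil(vmin)), int(math.floor(vmax)) + 1))

x_ticks = integer_ticks(0.0, 4.37)   # width of the example map
y_ticks = integer_ticks(0.0, 6.42)   # height of the example map
```

The lists can then be passed to `plt.xticks(x_ticks, x_ticks)` and `plt.yticks(y_ticks, y_ticks)`.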

Okay, I tried your versions, but unfortunately I couldn't make them work, since there was some scaling and PDF positioning code that left me (and your code suggestions) badly confused. But by testing them, I learned a lot more Python, thanks!
I finally managed to find a solution that isn't very exact but satisfies my needs. Here is how I did it.
In my version, one km is divided by a suitable integer constant named STEP_PART. The bigger STEP_PART is, the more accurate the axis values are (and if it is too big, the axis becomes messy to read). For example, if STEP_PART is 5, the accuracy is 1 km / 5 = 200 m, and ticks are placed every 200 m.
STEP_PART = 5 # In the start of the program.
height = 6.42 # These are actually given elsewhere,
width = 4.37 # but just as example...
vHeight = list(range(0, int(STEP_PART*height), 1)) # Make tick vectors (as lists, so they
# can be edited), now in format 0, 1, 2... instead of 0, 0.2...
vWidth = list(range(0, int(STEP_PART*width), 1)) # Should be divided by STEP_PART
# later to get right values.
To avoid making too many axis labels (0, 1, 2... are enough, 0, 0.2, 0.4... is far too much), we replace non-integer km values with string "". Simultaneously, we divide integer km values by STEP_PART to get right values.
for j in range(len(vHeight)):
    if (j % STEP_PART != 0):
        vHeight[j] = ""
    else:
        vHeight[j] = int(vHeight[j]/STEP_PART)
for i in range(len(vWidth)):
    if (i % STEP_PART != 0):
        vWidth[i] = ""
    else:
        vWidth[i] = int(vWidth[i]/STEP_PART)
Later, after creating the graph and axes, the ticks are placed as follows (x axis as an example). Here, x is the actual width of the picture, obtained with the shape() command (I don't exactly understand how... there is quite a lot of scaling and stuff in the code I'm modifying).
xt = np.linspace(0,x-1,len(vWidth)+1) # For locating those ticks on the same distances.
locs, labels = mpl.xticks(xt, vWidth, fontsize=9)
Repeat for the y axis. The result is a graph with ticks every 200 m but labels only at the integer km values. The accuracy of those axes is 200 m; it's not exact, but it was enough for me. The script will be even better if I find out how to increase the size of the integer ticks...
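One possible way to get longer ticks at the integer kilometres is to use matplotlib's major and minor ticks instead of a hand-built label list. The sketch below is not the poster's method: it assumes the axes are in kilometres (not pixels), reuses STEP_PART and the example sizes from above, and the tick lengths are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

STEP_PART = 5
width, height = 4.37, 6.42   # example region size in km

fig, ax = plt.subplots()
ax.set_xlim(0, width)
ax.set_ylim(0, height)
for axis in (ax.xaxis, ax.yaxis):
    axis.set_major_locator(MultipleLocator(1.0))              # labelled, every km
    axis.set_minor_locator(MultipleLocator(1.0 / STEP_PART))  # unlabelled, every 200 m
# Longer ticks for whole kilometres, shorter ones in between
ax.tick_params(which="major", length=8, labelsize=9)
ax.tick_params(which="minor", length=3)

major_x = [float(t) for t in ax.get_xticks() if 0 <= t <= width]
```

Minor ticks carry no labels by default, so only the whole-kilometre ticks are numbered.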

Related

Matplotlib: How to increase colormap/linewidth quality in streamplot?

I have the following code to generate a streamplot based on an interp1d-Interpolation of discrete data:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as colors
from scipy.interpolate import interp1d
# CSV Import
a1array=pd.read_csv('a1.csv', sep=',',header=None).values
rv=a1array[:,0]
a1v=a1array[:,1]
da1vM=a1array[:,2]
a1 = interp1d(rv, a1v)
da1M = interp1d(rv, da1vM)
# Bx and By vector components
def bx(x, y):
    rad = np.sqrt(x**2+y**2)
    if rad == 0:
        return 0
    else:
        return x*y/rad**4*(-2*a1(rad)+rad*da1M(rad))/2.87445E-19*1E-12
def by(x, y):
    rad = np.sqrt(x**2+y**2)
    if rad == 0:
        return 4.02995937E-04/2.87445E-19*1E-12
    else:
        return -1/rad**4*(2*a1(rad)*y**2+rad*da1M(rad)*x**2)/2.87445E-19*1E-12
# np.float was only an alias for the builtin float and has been removed from NumPy
Bx = np.vectorize(bx, otypes=[float])
By = np.vectorize(by, otypes=[float])
# Grid
num_steps = 11
Y, X = np.mgrid[-25:25:(num_steps * 1j), 0:25:(num_steps * 1j)]
Vx = Bx(X, Y)
Vy = By(X, Y)
speed = np.sqrt(Bx(X, Y)**2+By(X, Y)**2)
lw = 2*speed / speed.max()+.5
# Star Radius
circle3 = plt.Circle((0, 0), 16.3473140, color='black', fill=False)
# Plot
fig0, ax0 = plt.subplots(num=None, figsize=(11,9), dpi=80, facecolor='w', edgecolor='k')
strm = ax0.streamplot(X, Y, Vx, Vy, color=speed, linewidth=lw,density=[1,2], cmap=plt.cm.jet)
ax0.streamplot(-X, Y, -Vx, Vy, color=speed, linewidth=lw,density=[1,2], cmap=plt.cm.jet)
ax0.add_artist(circle3)
cbar=fig0.colorbar(strm.lines,fraction=0.046, pad=0.04)
cbar.set_label('B[GT]', rotation=270, labelpad=8)
cbar.set_clim(0,1500)
cbar.draw_all()
ax0.set_ylim([-25,25])
ax0.set_xlim([-25,25])
ax0.set_xlabel('x [km]')
ax0.set_ylabel('z [km]')
ax0.set_aspect(1)
plt.title('polyEos(0.05,2), M/R=0.2, B_r(0,0)=1402GT', y=1.01)
plt.savefig('MR02Br1402.pdf',bbox_inches=0)
plt.show()
I uploaded the CSV file here if you want to try some stuff: https://www.dropbox.com/s/4t7jixpglt0mkl5/a1.csv?dl=0.
The code generates the following plot:
I am actually pretty happy with the result except for one small detail that I cannot figure out: if one looks closely, the linewidth and the color change in rather big steps, which is especially visible at the center:
Is there some way or option to decrease the size of these steps, especially to make the colormap smoother?

I had another look at this and it wasn't as painful as I thought it might be.
Add:
subdiv = 15
points = np.arange(len(t[0]))
interp_points = np.linspace(0, len(t[0]), subdiv * len(t[0]))
tgx = np.interp(interp_points, points, tgx)
tgy = np.interp(interp_points, points, tgy)
tx = np.interp(interp_points, points, tx)
ty = np.interp(interp_points, points, ty)
after ty is initialised in the trajectories loop (line 164 in my version). Just substitute whatever number of subdivisions you want for subdiv = 15. Every segment in the streamplot will be subdivided into as many equally sized segments as you choose. The colors and linewidths for each will still be obtained properly by interpolating the data.
It's not as neat as changing the integration step, but it does plot exactly the same trajectories.
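To see what the `np.interp` subdivision does in isolation, here is a small stand-alone sketch on a toy coordinate list. The helper name `subdivide` is hypothetical, and unlike the snippet above it stops the interpolation grid at the last point index (`len(values) - 1`), which is my own choice rather than something taken from the answer.

```python
import numpy as np

def subdivide(values, subdiv):
    """Split every segment of a 1D polyline coordinate into `subdiv` equal parts."""
    values = np.asarray(values, dtype=float)
    points = np.arange(len(values))                      # original sample indices
    n_out = subdiv * (len(values) - 1) + 1               # keep both end points
    interp_points = np.linspace(0, len(values) - 1, n_out)
    return np.interp(interp_points, points, values)

tx = subdivide([0.0, 1.0, 3.0], 2)
```

Applying the same helper to the x and y coordinates (and to the grid coordinates used for colour lookup) yields shorter segments along the same path.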

If you don't mind changing the streamplot code (matplotlib/streamplot.py), you could simply decrease the size of the integration steps. Inside _integrate_rk12() the maximum step size is defined as:
maxds = min(1. / dmap.mask.nx, 1. / dmap.mask.ny, 0.1)
If you decrease that, say to:
maxds = 0.1 * min(1. / dmap.mask.nx, 1. / dmap.mask.ny, 0.1)
I get this result (left = new, right = original):
Of course, this makes the code about 10x slower, and I haven't thoroughly tested it, but it seems to work (as a quick hack) for this example.
About the density (mentioned in the comments): I personally don't see a problem with that. It's not as if we are trying to visualize the actual path line of (e.g.) a particle; the density is already an arbitrary (controllable) choice, and yes, it is influenced by choices in the integration, but I don't think that it changes the visualization we're after (I'm not quite sure what to call it).
The results (density) do seem to converge a bit for decreasing step sizes. This shows the results of decreasing the integration step by a factor of {1,5,10,20}:

You could increase the density parameter to get more smooth color transitions,
but then use the start_points parameter to reduce your overall clutter.
The start_points parameter allows you to explicitly choose the location and
number of trajectories to draw. It overrides the default, which is to plot
as many as possible to fill up the entire plot.
But first you need one little fix to your existing code:
According to the streamplot documentation, the X and Y args should be 1d arrays, not 2d arrays as produced by mgrid.
It looks like passing in 2d arrays is supported, but it is undocumented
and it is currently not compatible with the start_points parameter.
Here is how I revised your X, Y, Vx, Vy and speed:
# Grid
num_steps = 11
Y = np.linspace(-25, 25, num_steps)
X = np.linspace(0, 25, num_steps)
Ygrid, Xgrid = np.mgrid[-25:25:(num_steps * 1j), 0:25:(num_steps * 1j)]
Vx = Bx(Xgrid, Ygrid)
Vy = By(Xgrid, Ygrid)
speed = np.hypot(Vx, Vy)
lw = 3*speed / speed.max()+.5
Now you can explicitly set your start_points parameter. The start points are actually
"seed" points. Any given stream trajectory will grow in both directions
from the seed point. So if you put a seed point right in the center of
the example plot, it will grow both up and down to produce a vertical
stream line.
Besides controlling the number of trajectories, using the
start_points parameter also controls the order they are
drawn. This is important when considering how trajectories terminate.
They will either hit the border of the plot, or they will terminate if
they hit a cell of the plot that already has a trajectory. That means
your first seeds will tend to grow longer and your later seeds will tend
to get limited by previous ones. Some of the later seeds may not grow
at all. The default seeding strategy is to plant a seed at every cell,
which is pretty obnoxious if you have a high density. It also orders
them by planting seeds first along the plot borders and spiraling inward.
This may not be ideal for your particular case. I found a very simple
strategy for your example was to just plant a few seeds between those
two points of zero velocity, y=0 and x from -10 to 10. Those trajectories
grow to their fullest and fill in most of the plot without clutter.
Here is how I create the seed points and set the density:
num_streams = 8
stptsy = np.zeros((num_streams,), float)
stptsx_left = np.linspace(0, -10.0, num_streams)
stptsx_right = np.linspace(0, 10.0, num_streams)
stpts_left = np.column_stack((stptsx_left, stptsy))
stpts_right = np.column_stack((stptsx_right, stptsy))
density = (3,6)
And here is how I modify the calls to streamplot:
strm = ax0.streamplot(X, Y, Vx, Vy, color=speed, linewidth=lw, density=density,
                      cmap=plt.cm.jet, start_points=stpts_right)
ax0.streamplot(-X, Y, -Vx, Vy, color=speed, linewidth=lw, density=density,
               cmap=plt.cm.jet, start_points=stpts_left)
The result basically looks like the original, but with smoother color transitions and only 15 stream lines. (sorry no reputation to inline the image)
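Since the code above depends on the CSV file, here is a self-contained sketch of the same idea on a synthetic field. The rotating field, the grid size, the seed positions, and the `plasma` colormap are all assumptions for illustration; only the use of 1D `x`/`y` arrays together with `start_points` and a raised `density` follows the answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import numpy as np
import matplotlib.pyplot as plt

# 1D coordinate arrays, as the streamplot documentation describes
x = np.linspace(-3, 3, 50)
y = np.linspace(-3, 3, 50)
X, Y = np.meshgrid(x, y)
U, V = -Y, X                       # simple rotating field (assumption)
speed = np.hypot(U, V)
lw = 2 * speed / speed.max() + 0.5

# A handful of seed points along y = 0; each grows in both directions
seeds = np.column_stack((np.linspace(0.5, 2.5, 5), np.zeros(5)))

fig, ax = plt.subplots()
strm = ax.streamplot(x, y, U, V, color=speed, linewidth=lw,
                     density=2, cmap="plasma", start_points=seeds)
fig.colorbar(strm.lines)
```

Raising `density` shortens the integration cells, while `start_points` keeps the number of drawn trajectories small.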

I think your best bet is to use a colormap other than jet. Perhaps cmap=plt.cm.plasma.
Weird-looking graphs obscure understanding of the data.
For data which is ordered in some way, like by the speed vector magnitude in this case, uniform sequential colormaps will always look smoother. The brightness of sequential maps varies monotonically over the color range, removing large perceived color changes over small ranges of data. The uniform maps vary linearly over their whole range, which makes the main features in the data much more visually apparent.
(source: matplotlib.org)
The jet colormap spans a very wide range of brightnesses, with an inflexion in the middle. This is responsible for the particularly egregious red to blue transition around the center region of your graph.
(source: matplotlib.org)
The matplotlib user guide on choosing a colormap has a few recommendations for selecting an appropriate map for a given data set.
I don't think there is much else you can do to improve this by just changing parameters in your plot.
The streamplot divides the graph into cells, with 30*density[x,y] cells in each direction, and at most one streamline goes through each cell. The only setting which directly increases the number of segments is the density of the grid matplotlib uses. Increasing the Y density will decrease the segment length so that the middle region may transition more smoothly. The cost of this is an inevitable cluttering of the graph in regions where the streamlines are horizontal.
You could also try to normalise the speeds differently so that the change is artificially lowered near the center. At the end of the day, though, that seems to defeat the point of the graph. The graph should provide a useful view of the data for a human to understand. Using a colormap with strange inflexions, or warping the data so that it looks nicer, removes some understanding which could otherwise be obtained from looking at the graph.
A more detailed discussion about the issues with colormaps like jet can be found on this blog.
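The brightness argument can be checked roughly in code. The sketch below uses the Rec. 709 luma weights applied directly to the colormap's RGB values, which is only a crude stand-in for perceived lightness; the helper names are made up for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

def luma(cmap_name, samples=256):
    """Approximate brightness (Rec. 709 luma weights) along a colormap."""
    rgba = plt.get_cmap(cmap_name)(np.linspace(0.0, 1.0, samples))
    return rgba[:, :3] @ np.array([0.2126, 0.7152, 0.0722])

def brightness_reverses(cmap_name):
    """True if brightness both rises and falls somewhere along the map."""
    steps = np.diff(luma(cmap_name))
    return bool(np.any(steps > 0) and np.any(steps < 0))

jet_reverses = brightness_reverses("jet")   # jet brightens, then darkens again
plasma_luma = luma("plasma")
```

Plotting `luma("jet")` and `luma("plasma")` against the sample index gives a quick visual comparison of the two maps.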

Opacity misleading when plotting two histograms at the same time with matplotlib

Let's say I have two histograms and I set the opacity using hist's parameter alpha=0.5.
I have plotted two histograms, yet I get three colors! I understand this makes sense from an opacity point of view.
But it is very confusing to show someone a graph of two things with three colors. Can I somehow set the smallest bar in each bin to be in front, with no opacity?
Example graph

The usual way this issue is handled is to have the plots with some small separation. This is done by default when plt.hist is given multiple sets of data:
import pylab as plt
x = 200 + 25*plt.randn(1000)
y = 150 + 25*plt.randn(1000)
n, bins, patches = plt.hist([x, y])
You instead wish to stack them (this could be done above using the argument histtype='barstacked'), but notice that the ordering is incorrect.
This can be fixed by checking each pair of bars to see which is larger and then using zorder to set which one is drawn in front. For simplicity I am using the output of the code above (i.e., n is two stacked arrays holding the number of points in each bin for x and y):
n_x = n[0]
n_y = n[1]
for i in range(len(n[0])):
    if n_x[i] > n_y[i]:
        zorder=1
    else:
        zorder=0
    plt.bar(bins[:-1][i], n_x[i], width=10)
    plt.bar(bins[:-1][i], n_y[i], width=10, color="g", zorder=zorder)
Here is the resulting image:
With the ordering changed like this, the image looks very odd indeed, which is probably why this is not implemented and needs a hack. I would stick with the small-separation method; anyone used to these plots assumes the paired bars share the same x-value.

Upper/lower limits with matplotlib

I want to plot some data points with errorbars.
Some of these data points have only upper or lower limit, instead of error bars.
So I was trying to use indices to differentiate between the points with errorbars, and the points with upper/lower limits.
However, when I try something like this:
errorbar(x[i], y[i], yerr = (ymin[i], ymax[i]))
I receive the error:
ValueError: In safezip, len(args[0])=1 but len(args[1])=2
This is similar to the discussion here, but I don't use pandas, so a few more words of explanation would be really useful to me.
In any case, I tried to work around the error in the following way:
errorbar(x[i], y[i], yerr = [[ymin[i], ymax[i]]], uplims = True)
But the resulting plot is not clear: it seems that the upper limit AND the errorbars are plotted together, or the upper limit is plotted twice...
The goal is to plot upper/lower limits when the upper and lower error bars are not symmetrical, so I can choose the length of the bar before the arrow for the upper/lower limit.

This is actually one of the things that tends to annoy me about errorbar: it's very finicky about the shape and dimensionality of inputs.
What I'm assuming is that you want "error bars", but want their locations to be set by absolute upper and lower bounds, rather than by a symmetric "error" value.
What errorbar does is zip together (with safezip) your y array and yerr[0] for the lower bound (yerr[1] for upper). So your y and yerr[0] (and [1]) should be arrays with the same size, shape and number of dimensions. yerr itself doesn't need to be an array at all.
The first code that you have will work if x, y, ymin, and ymax are all one-dimensional arrays, which is what they should be. It sounds like your y is not.
However, it's important to note that since errorbar's yerr values are error amounts, not absolute limits, you need to convert your limits into offsets from y: subtract the lower limit from y, and subtract y from the upper limit.
For example:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1,2,3,4])
y = np.array([1,2,3,4])
ymin = np.array([0,1,2,3])
ymax = np.array([3,4,5,6])
ytop = ymax-y
ybot = y-ymin
# This works
plt.errorbar(x, y, yerr=(ybot, ytop) )
Let me know if I'm misinterpreting anything. It would be good if you could post some example data in the form that you're using.
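For the case in the question, where only some points are upper limits, `errorbar` also accepts boolean arrays for `uplims` and `lolims`. The sketch below uses made-up data and names. As far as I can tell, a point flagged in `uplims` is drawn with only its lower bar, ending in a downward arrow, so the lower error value sets the length of the bar before the arrow.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4])
y = np.array([1.0, 2.0, 3.0, 4.0])
lower_err = np.array([0.5, 0.8, 0.4, 0.6])   # distances below y (made-up values)
upper_err = np.array([0.7, 0.0, 0.9, 0.5])   # distances above y (made-up values)
is_upper_limit = np.array([False, True, False, False])

fig, ax = plt.subplots()
# All arrays are 1D and the same length, which is what errorbar expects
container = ax.errorbar(x, y, yerr=[lower_err, upper_err],
                        uplims=is_upper_limit, fmt="o", capsize=3)
```

Lower limits work the same way with `lolims`, in which case the upper error value sets the bar length.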

Matlab, Python: Fixing colormap to specified values

This is a simple but common task: keeping a colormap's association with values fixed when displaying a 2D matrix.
To demonstrate, consider the problem in Matlab; the solution does not need to be in Matlab (i.e., the code presented here is only for demonstration purposes).
x = [0,1,2; 3,4,5; 6,7,8];
imagesc(x)
axis square
axis off
The output looks like this:
When some values rise above the previous maximum, this happens:
x = [0,1,2; 3,4,5; 6,7,18];
This looks logical but causes problems when we wish to compare or trace elements across two maps. Since the colormap association has changed, it is almost impossible to find an individual cell for comparison, tracing, etc.
The solution I implemented is to clip the matrix:
x = [0,1,2; 3,4,5; 6,7,18];
m = 8;
x(x>=m) = m;
which works perfectly.
Since this code requires searching and filtering (which takes extra time), I wonder if there is a more general or more efficient way to do this in Matlab, Python, etc.
One case where this issue occurs is when we run many simulations in sequence and wish to make a meaningful animation of the progress; in this case each color should keep its association fixed.

In Python, using the Matplotlib package, the solution is as follows:
import pylab as pl
x = [[0,1,2],[3,4,5],[6,7,18]]
pl.matshow(x, vmin=0, vmax=8)
pl.axis('image')
pl.axis('off')
pl.show()
So vmin and vmax are boundary limits for the full range of colormap.
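To illustrate what `vmin` and `vmax` do, the sketch below applies the same limits through `matplotlib.colors.Normalize` and compares the colour assigned to 8 with the colour assigned to 18. The choice of the `viridis` colormap and the variable names are assumptions; `np.clip` is shown as the NumPy counterpart of the Matlab clipping in the question.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

x = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 18]], dtype=float)

norm = Normalize(vmin=0, vmax=8)      # same limits as matshow(x, vmin=0, vmax=8)
cmap = plt.get_cmap("viridis")
colour_at_max = cmap(norm(8.0))
colour_above_max = cmap(norm(18.0))   # out-of-range value gets the top colour

# Explicit clipping, equivalent to x(x>=m) = m in the Matlab code
clipped = np.clip(x, None, 8)

fig, ax = plt.subplots()
ax.matshow(x, vmin=0, vmax=8)
```

With fixed limits, every frame of an animation maps a given value to the same colour, without modifying the data.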

The indexing is pretty quick so I don't think you need worry.
However, in Matlab, you can pass in the clims argument to imagesc:
imagesc(x,[0 8]);
This maps all values above 8 to the top colour in the colour scale, and all values below 0 to the bottom colour in the colour scale, and then stretches the scale for colours in-between.
imagesc documentation.

f1 = figure;
x = [0,1,2; 3,4,5; 6,7,8];
imagesc(x)
axis square
axis off
limits = get(gca(f1),'CLim');
f2 = figure;
z = [0,1,2; 3,4,5; 6,7,18];
imagesc(z)
axis square
axis off
caxis(limits)

How to best utilize the hist() to show a cumulative and normed histogram?

I have a problem with a data set whose values range from 0 to tens of thousands. There is no problem showing the histogram of the whole data set using hist(). However, if I only want to show a detailed cumulative, normed histogram over, say, x = [0, 120], I have to use 600000 bins to get enough detail.
The tricky part is that if I just use the range (0, 120) to show the normed, cumulative histogram, it ends at 1. But the true value is far less than 1, since the histogram is normed only within this small range of data. Does anyone have ideas on how to use hist() in matplotlib to tackle this problem? I thought this should not be so complicated that I have to write another function to draw the histogram I need.

You can set bins to a list, not an integer, e.g., bins=[1,2,3,..,120,30000,60000].
To answer your comment below, here is an excerpt from the documentation:
bins:
Either an integer number of bins or a sequence giving the bins. If bins is an integer, bins + 1 bin edges will be returned, consistent with numpy.histogram() for numpy version >= 1.3, and with the new = True argument in earlier versions. Unequally spaced bins are supported if bins is a sequence.
And here is an example with cumulative normalized histogram. Notice the effect of bins = [100,125,150,160,170,180,190,200,210,220,230,240,250,275,300] on this bar plot, how the first two bars are wider than the middle bars.

Hmmm, I guess this is related to your previous question (Memory error when dealing with huge data). My suggestion there doesn't seem to work for a cumulative histogram.
I can't get plt.hist() to play nice with cyborg's suggestion, so I did the cumsum and normalisation by hand:
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
from numpy.random import normal
inp = np.abs(normal(0, 100000, 100000))
bins = range(0, 120)
a,b = np.histogram(inp, bins = bins)
bar_edges = b[:-1]
bar_width = b[1] - b[0]
bar_height = (np.cumsum(a) + sum(inp<min(bins))) / len(inp)
plt.figure(1)
plt.bar(bar_edges, bar_height, width = bar_width)
plt.show()
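A possible alternative that stays within `hist()` is to pass per-sample weights of 1/N, so each bar is already a fraction of the whole data set rather than of the visible range. This is a sketch with synthetic data and assumed variable names; it relies on all samples being non-negative, so nothing lies below the start of the range.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
inp = np.abs(rng.normal(0, 100000, 100000))   # synthetic data, as in the answer

# Each sample contributes 1/N, so the counts become fractions of ALL the data
weights = np.full(len(inp), 1.0 / len(inp))

fig, ax = plt.subplots()
heights, edges, patches = ax.hist(inp, bins=120, range=(0, 120),
                                  weights=weights, cumulative=True)

fraction_in_range = np.mean(inp <= 120)       # true cumulative fraction at 120
```

The last bar then ends at the true cumulative fraction instead of 1.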
