How to properly scale a graph of a highly disproportionate matrix in matplotlib? - python

I have a set of matrices I'm graphing with plt.matshow(matrix), and it works fine for matrices whose dimensions are reasonably close to each other (e.g. 56,000 x 5,000 or 64 x 6). However, when I try it with a 56,000 x 6 matrix, I just get a hugely stretched scale and no visible graph (see attached image), which I suspect is because matplotlib isn't sure how to scale the image. Does anyone know how to handle this?

You could use a logarithmic scale:
import matplotlib.pyplot as plt
import numpy as np
# dummy matrix:
a = np.arange(20000).reshape(10000, 2)
plt.matshow(a)
plt.yscale('log')
plt.show()
Alternatively, you can manually change the aspect of your plot:
plt.matshow(a)
plt.gca().set_aspect(0.0001)
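If you'd rather not hand-tune that number, a minimal sketch of the same idea, assuming you simply want the image stretched to fill the axes, is to let matplotlib pick the aspect automatically:
import matplotlib.pyplot as plt
import numpy as np
# dummy matrix with very disproportionate dimensions:
a = np.arange(20000).reshape(10000, 2)
plt.matshow(a)
plt.gca().set_aspect('auto')  # stretch the image to the axes box instead of keeping square pixels
plt.show()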

Related

Cumulative probability plots in Matplotlib

How would I make a plot of this style in python with matplotlib? (Cumulative probability plot) I don't need complete code, mostly just need a place to start and a general idea of what I need to do for it.
A cumulative probability plot is really easy to make:
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(1000)
fig,ax = plt.subplots()
ax.plot(np.sort(data),np.linspace(0.0,1.0,len(data)))
plt.xlabel(r'$x$')
plt.ylabel(r'$P(X \leq x)$')
plt.show()
Note that it can have a strong advantage over a probability density plot as it does not require binning of your data. (Should you be looking for the latter you can check this code).
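For comparison, a minimal sketch of the binned (density) alternative mentioned above, assuming the same data array (on older matplotlib versions the density keyword is called normed):
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(1000)
fig, ax = plt.subplots()
# the binned alternative: a histogram estimate of the probability density
ax.hist(data, bins=30, density=True)
ax.set_xlabel(r'$x$')
ax.set_ylabel(r'estimated density')
plt.show()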

Boxcar convolve a scatter plot in python/astropy?

I believe the fix to this will be relatively simple, but I can't seem to figure out how to convolve a scatter plot that I've plotted in python.
I have 2 data arrays, one of galactic latitudes and one of galactic longitudes, and I've plotted them with a hammer projection to represent a distribution of stars in galactic coordinates.
Now, I want to use boxcar smoothing to smooth the plot with 15 degree boxes.
I have tried using astropy.convolution with convolve and Box2DKernel, but I can't seem to make it work.
I've also looked at examples from http://docs.astropy.org/en/stable/convolution/kernels.html
but I don't understand how to translate their examples to what I need to do. They seem to be plotting a 2D function and smoothing that. Can I not convolve a plot and bin up the points by where they are on the graph? The only thing that I've gotten to display anything produces a straight line and I don't understand why. I'm very new to python so this has been giving me a lot of trouble.
This is the code that I have so far:
This plots the two arrays into a hammer projection:
from astropy import units as u
import astropy.coordinates as coord
glat = coord.Angle(pos_data['GLAT']*u.degree)
glon = coord.Angle(pos_data['GLON']*u.degree)
glon= glon.wrap_at(180*u.degree)
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(10,12))
ax = fig.add_subplot(211, projection="hammer")
ax.scatter(glon.radian, glat.radian)
ax.grid(True)
This is my attempt at convolving the data:
from astropy.convolution import convolve, Box2DKernel
data = [glon, glat]
kernel = Box2DKernel(10)
smoothed = convolve(data, kernel)
ax = fig.add_subplot(212, projection="hammer")
ax.scatter(smoothed[0]*u.radian, smoothed[1]*u.radian)
ax.grid(True)
Like I said, this is just one of many attempts that ended up producing something instead of an error, but I'm not sure I'm using the function correctly at all. I don't think I can create "data" the way that I did, but any other combination of arrays, or convolving each as a 1D array, didn't work either.
Any ideas would be really helpful, thanks.
It seems like you're looking for Kernel Density Estimation, which is a way of turning individual measurements of spatial point patterns into a continuous distribution. I happen to prefer the scikit-learn implementation. You can then use the basemap package to do your plotting. The following code should be adaptable to your situation, where ra and dec are arrays of your stars' Right Ascension and Declination (you'll have to be careful about radians vs degrees here):
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in newer scikit-learn
data = np.column_stack((ra, dec))
# use a tophat/boxcar kernel and a haversine (spherical) metric
# (note: scikit-learn's haversine metric expects [lat, lon] pairs in radians)
params = {'bandwidth': np.logspace(-1, 1, 20),
          'kernel': ['tophat'],
          'metric': ['haversine']}
grid = GridSearchCV(KernelDensity(), params)
grid.fit(data)
Then you should be able to define a meshgrid over which to evaluate your KDE, and then plot it using imshow/pcolormesh/something else over a Hammer projection (see here or here)
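A minimal sketch of that last step, assuming grid was fitted on [latitude, longitude] pairs in radians (which scikit-learn's haversine metric requires); score_samples returns the log-density:
import numpy as np
import matplotlib.pyplot as plt
kde = grid.best_estimator_
# regular lon/lat grid in radians covering the whole sky
lon_g, lat_g = np.meshgrid(np.linspace(-np.pi, np.pi, 200),
                           np.linspace(-np.pi/2, np.pi/2, 100))
log_dens = kde.score_samples(np.column_stack((lat_g.ravel(), lon_g.ravel())))
dens = np.exp(log_dens).reshape(lat_g.shape)
fig = plt.figure()
ax = fig.add_subplot(111, projection="hammer")
ax.pcolormesh(lon_g, lat_g, dens)
ax.grid(True)
plt.show()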

2D color plot with irregularly spaced samples (matplotlib.mlab.griddata)

I previously posted this over at Code Review, but moved it over here as I was told it is a better fit.
Basically, I want to create a colorplot of some irregularly sampled data. I've had some success with the interpolation using matplotlib.mlab.griddata. When I plot the interpolated data (using matplotlib.pyplot.imshow) however, the edges of the domain appear to be left blank. This gets better if I increase the grid density (increase N in the code) but doesn't solve the problem.
I've attached my code and would like to upload an image of the plot I can generate, but am still lacking the reputation to post an image ;)
Edit: that has changed now; I have uploaded the plot after the changes proposed by Ajean. Can someone help me out as to what is going wrong?
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.mlab import griddata
# Generate Data
X=np.random.random(100)
Y=2*np.random.random(100)-1
Z=X*Y
# Interpolation
N=100j
extent=(0,1,-1,1)
xs,ys = np.mgrid[extent[0]:extent[1]:N, extent[2]:extent[3]:N]
resampled=griddata(X,Y,Z,xs,ys,interp='nn')
#Plot
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xlabel('X')
ax.set_ylabel('Y')
cplot=ax.imshow(resampled.T,extent=extent)
ticks=np.linspace(-1,1,11)
cbar=fig.colorbar(cplot,ticks=ticks,orientation='vertical')
cbar.set_label('Value', labelpad=20,rotation=270,size=16)
ax.scatter(X,Y,c='r')
It is because your calls to random don't provide any values at the boundary corners, so there is nothing to interpolate with there. If you change the X and Y definitions to
# Just include the four corners
X=np.concatenate([np.random.random(100),[0,0,1,1]])
Y=np.concatenate([2*np.random.random(100)-1,[-1,1,1,-1]])
You'll fill in the whole thing.
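As a side note, matplotlib.mlab.griddata has been removed from newer matplotlib releases; a minimal sketch of the same interpolation using scipy.interpolate.griddata instead (my substitution, not part of the original answer) would be:
import numpy as np
from scipy.interpolate import griddata
# the same random data, plus the four corners as above
X = np.concatenate([np.random.random(100), [0, 0, 1, 1]])
Y = np.concatenate([2*np.random.random(100) - 1, [-1, 1, 1, -1]])
Z = X*Y
xs, ys = np.mgrid[0:1:100j, -1:1:100j]
# linear interpolation inside the convex hull of the sample points
resampled = griddata(np.column_stack((X, Y)), Z, (xs, ys), method='linear')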

Number density contours in Python

I'm trying to reproduce this plot in python with little luck:
It's a simple number density contour currently done in SuperMongo. I'd like to drop it in favor of Python but the closest I can get is:
which is by using hexbin(). How could I go about getting the python plot to resemble the SuperMongo one? I don't have enough rep to post images, sorry for the links. Thanks for your time!
Example simple contour plot from a fellow SuperMongo => python sufferer:
import numpy as np
from matplotlib.colors import LogNorm
from matplotlib import pyplot as plt
plt.interactive(True)
fig=plt.figure(1)
plt.clf()
# generate input data; you already have that
x1 = np.random.normal(0,10,100000)
y1 = np.random.normal(0,7,100000)/10.
x2 = np.random.normal(-15,7,100000)
y2 = np.random.normal(-10,10,100000)/10.
x=np.concatenate([x1,x2])
y=np.concatenate([y1,y2])
# calculate the 2D density of the data given
counts,xbins,ybins=np.histogram2d(x,y,bins=100,normed=True)  # normed expects a boolean, not a LogNorm instance
# make the contour plot
plt.contour(counts.transpose(),extent=[xbins.min(),xbins.max(),
ybins.min(),ybins.max()],linewidths=3,colors='black',
linestyles='solid')
plt.show()
produces a nice contour plot.
The contour function offers a lot of fancy adjustments, for example let's set the levels by hand:
plt.clf()
mylevels=[1.e-4, 1.e-3, 1.e-2]
plt.contour(counts.transpose(),mylevels,extent=[xbins.min(),xbins.max(),
ybins.min(),ybins.max()],linewidths=3,colors='black',
linestyles='solid')
plt.show()
producing this plot:
And finally, in SM one can do contour plots on linear and log scales, so I spent a little time trying to figure out how to do this in matplotlib. Here is an example when the y points need to be plotted on the log scale and the x points still on the linear scale:
plt.clf()
# this is our new data which ought to be plotted on the log scale
ynew=10**y
# but the binning needs to be done in linear space
counts,xbins,ybins=np.histogram2d(x,y,bins=100,normed=True)
mylevels=[1.e-4,1.e-3,1.e-2]
# and the plotting needs to be done in the data (i.e., exponential) space
plt.contour(xbins[:-1],10**ybins[:-1],counts.transpose(),mylevels,
extent=[xbins.min(),xbins.max(),ybins.min(),ybins.max()],
linewidths=3,colors='black',linestyles='solid')
plt.yscale('log')
plt.show()
This produces a plot which looks very similar to the linear one, but with a nice vertical log axis, which is what was intended:
Have you checked out matplotlib's contour plot?
Unfortunately I couldn't view your images. Do you mean something like this? It was done with MathGL, a GPL plotting library which has a Python interface too, and you can use arbitrary data arrays as input (including numpy arrays).
You can use numpy.histogram2d to get a number density distribution of your array.
Try this example:
http://micropore.wordpress.com/2011/10/01/2d-density-plot-or-2d-histogram/
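The idea behind that link, in a minimal sketch of my own (assuming the x and y arrays from the answer above):
import numpy as np
from matplotlib import pyplot as plt
counts, xbins, ybins = np.histogram2d(x, y, bins=100)
# show the number density as an image; transpose so x runs along the horizontal axis
plt.imshow(counts.T, origin='lower',
           extent=[xbins.min(), xbins.max(), ybins.min(), ybins.max()],
           aspect='auto')
plt.colorbar(label='points per bin')
plt.show()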

Scatter plot with a huge amount of data

I would like to use Matplotlib to generate a scatter plot with a huge amount of data (about 3 million points). I have 3 vectors of the same dimension, and I plot them in the following way.
import matplotlib.pyplot as plt
import numpy as np
from numpy import *
from matplotlib import rc
import pylab
from pylab import *
fig = plt.figure()
fig.subplots_adjust(bottom=0.2)
ax = fig.add_subplot(111)
plt.scatter(delta,vf,c=dS,alpha=0.7,cmap=cm.Paired)
Nothing special, really. But it takes too long to generate (I'm working on my MacBook Pro with 4 GB RAM, Python 2.7 and Matplotlib 1.0). Is there any way to improve the speed?
Unless your graphic is huge, many of those 3 million points are going to overlap.
(A 400x600 image only has 240K dots...)
So the easiest thing to do would be to take a sample of, say, 1000 points from your data:
import random
delta_sample=random.sample(delta,1000)
and just plot that.
For example:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
import random
fig = plt.figure()
fig.subplots_adjust(bottom=0.2)
ax = fig.add_subplot(111)
N=3*10**6
delta=np.random.normal(size=N)
vf=np.random.normal(size=N)
dS=np.random.normal(size=N)
idx=random.sample(range(N),1000)
plt.scatter(delta[idx],vf[idx],c=dS[idx],alpha=0.7,cmap=cm.Paired)
plt.show()
Or, if you need to pay more attention to outliers, then perhaps you could bin your data using np.histogram, and then compose a delta_sample which has representatives from each bin.
Unfortunately, when using np.histogram I don't think there is any easy way to associate bins with individual data points. A simple but approximate solution is to use the bin edge itself as a proxy for the points that fall in that bin:
xedges=np.linspace(-10,10,100)
yedges=np.linspace(-10,10,100)
zedges=np.linspace(-10,10,10)
hist,edges=np.histogramdd((delta,vf,dS), (xedges,yedges,zedges))
xidx,yidx,zidx=np.where(hist>0)
plt.scatter(xedges[xidx],yedges[yidx],c=zedges[zidx],alpha=0.7,cmap=cm.Paired)
plt.show()
What about trying pyplot.hexbin? It generates a sort of heatmap based on point density in a set number of bins.
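A minimal sketch of that, assuming the delta, vf and dS arrays from the question (passing C=dS colors each hexagon by the mean of dS in that bin rather than by the point count):
import numpy as np
import matplotlib.pyplot as plt
plt.hexbin(delta, vf, C=dS, reduce_C_function=np.mean, gridsize=100, cmap='Paired')
plt.colorbar(label='mean dS per bin')
plt.show()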
You could take the heatmap approach shown here. In this example the color represents the quantity of data in the bin, not the median value of the dS array, but that should be easy to change. More later if you are interested.
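A minimal sketch of that heatmap approach, again assuming the arrays from the question; the second part switches the color to the median of dS per bin using scipy.stats.binned_statistic_2d (my own choice, not necessarily what the linked example does):
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic_2d
# counts per bin
counts, xedges, yedges = np.histogram2d(delta, vf, bins=100)
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
plt.imshow(counts.T, origin='lower', extent=extent, aspect='auto')
plt.colorbar(label='points per bin')
# variant: color by the median dS in each bin instead of the count
med, _, _, _ = binned_statistic_2d(delta, vf, dS, statistic='median', bins=100)
plt.figure()
plt.imshow(med.T, origin='lower', extent=extent, aspect='auto', cmap='Paired')
plt.colorbar(label='median dS per bin')
plt.show()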
