I'm looking for a way to convert a scatter plot (X vs Y, color-coded by Z) into a 2D "pixel" image. That is, how can I plot a pixelized image where each pixel is colored according to a third variable?
In my case, I have a list of galaxies, each with a sky coordinate (X, Y) and a distance (Z). I want to make a pixelized image of X vs Y, with the pixels colored according to Z (e.g. the median Z value of the galaxies in that pixel).
I know I could do something like this with hexbin, but I would like square pixels rather than hexagons (something more like what imshow produces).
I'm still learning python, so if there is a simple/quick way to do this (or clear instructions on how to do it the complicated way!) that'd be great.
Any help would be much appreciated!
Okay - there are two ways you can do this. One is to use a discrete number of distance bins (e.g. d < 10pc, 10pc < d < 20pc, d > 20pc). This is relatively easy; all you need is a few loops - here is an example with three bins:
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# split the catalogue into three distance bins
raclose = []
ramid = []
rafar = []
decclose = []
decmid = []
decfar = []
for ii in range(len(dist)):
    if dist[ii] < 10.:
        raclose.append(ra[ii])
        decclose.append(dec[ii])
    elif dist[ii] > 20.:
        rafar.append(ra[ii])
        decfar.append(dec[ii])
    else:
        ramid.append(ra[ii])
        decmid.append(dec[ii])

plt.clf()
ax1 = plt.scatter(raclose, decclose, marker='o', s=20, color="darkgreen", alpha=0.6)
ax2 = plt.scatter(ramid, decmid, marker='o', s=20, color="goldenrod", alpha=0.6)
ax3 = plt.scatter(rafar, decfar, marker='o', s=20, color="firebrick", alpha=0.6)

# proxy artists for the legend
line1 = Line2D(range(10), range(10), marker='o', color="darkgreen")
line2 = Line2D(range(10), range(10), marker='o', color="goldenrod")
line3 = Line2D(range(10), range(10), marker='o', color="firebrick")
plt.legend((line1, line2, line3), ('d < 10pc', '20pc > d > 10pc', 'd > 20pc'),
           numpoints=1, loc=3)
plt.show()
Or you can do a contour-style plot: put RA on the x-axis and Dec on the y-axis and fill in the plot with the distances. Both RA and Dec are 1D arrays with the respective coordinates; you then build a 2D array with the distances. Determine the median/mean of the distances and divide the 2D array by that value to normalize it. Finally, plot it as a filled contour (contourf or imshow), like:
import matplotlib.pyplot as plt
from matplotlib import cm

# RA and Dec are the coordinate arrays, dists the normalized 2D distance array.
# Note: the old 'spectral' colormap was renamed 'nipy_spectral' in newer matplotlib.
ax = plt.contourf(RA, Dec, dists, levels=[1, 5, 10, 15], cmap=cm.nipy_spectral)
cbar = plt.colorbar()
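If the goal is specifically the "median Z per pixel" image from the question, a compact route (a minimal sketch, assuming 1D arrays RA, Dec and dists for the galaxies, and that scipy is available) is scipy.stats.binned_statistic_2d followed by imshow:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic_2d

# median distance in each (RA, Dec) pixel; pixels with no galaxies become NaN
stat, ra_edges, dec_edges, _ = binned_statistic_2d(
    RA, Dec, dists, statistic='median', bins=50)

plt.imshow(stat.T, origin='lower', aspect='auto',
           extent=[ra_edges[0], ra_edges[-1], dec_edges[0], dec_edges[-1]],
           cmap='viridis')
plt.colorbar(label='median distance')
plt.xlabel('RA')
plt.ylabel('Dec')
plt.show()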
I have a 2D array that I am trying to plot. I'm currently using imshow and matplotlib. The array contains values from -4000 to 1; there are no zeros (and nothing closer to zero than 1). It can basically be pictured as relief/topography: negative values for below ocean level, positive for land.
Now I would like to plot all negative values with a blue colormap (something like the 'Blues' colormap in matplotlib, https://matplotlib.org/stable/tutorials/colors/colormaps.html ) and something like 'Oranges' for positive values. So the desired result is an image with all negative values blue (the darker, the more negative) and all positive values orange, with a clear distinction in color; i.e. if possible the cmap should not go all the way to white like 'Blues' does, but only to a light blue.
I have tried this: Colorplot that distinguishes between positive and negative values, but I was not able to achieve the desired result. I also checked https://matplotlib.org/stable/gallery/color/custom_cmap.html#sphx-glr-gallery-color-custom-cmap-py but again it did not give what I wanted. Another idea was to split my array into two arrays, one for positive and one for negative values, and plot them in the same figure with different cmaps, but it always takes the same cmap.
Edit: Here are a few snippets of my attempts, including the two suggestions by Jody Klymak and JohanC. The two suggested solutions work pretty well; the only remaining problem is that contiguous regions of land (= 1) are not filled with a single color but contain white squares or are only dotted with orange.
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import colors
from matplotlib.colors import LinearSegmentedColormap

# attempt 1: mask the land values and paint them with set_bad
current_cmap = mpl.cm.get_cmap('Blues_r').copy()
new_cmap = current_cmap
maee = np.ma.masked_where(GD1 == 1, GD1)
new_cmap.set_bad(color='brown')
plt.imshow(maee, cmap=new_cmap, interpolation='bilinear')
plt.colorbar()
plt.show()
plt.clf()

# attempt 2: two-slope normalization with an over-color for land
norm = colors.TwoSlopeNorm(vmin=-4000., vcenter=0., vmax=1)
cmap1 = mpl.cm.get_cmap('Blues_r').copy()
cmap1.set_over(color='brown')
plt.imshow(GD1, cmap=cmap1, interpolation='bilinear', vmax=-5)
plt.colorbar()
plt.show()
plt.clf()

# attempt 3: custom blue colormap with an over-color
cmap = LinearSegmentedColormap.from_list('custom blue', ['blue', 'lightblue'], N=256)
cmap.set_over('brown')
plt.imshow(GD1, cmap=cmap, vmax=-5)
plt.colorbar()
plt.show()
plt.clf()

# attempt 4: contourf instead of imshow
plt.contourf(GD1, cmap=cmap1, vmax=-5)
plt.colorbar()
plt.show()
plt.clf()

# attempt 5: replace land by a large positive value and use a diverging colormap
GD1[GD1 == 1] = 4000
plt.imshow(GD1, cmap='RdBu_r', vmax=4000, vmin=-4000, interpolation='bilinear')
plt.colorbar()
plt.show()
plt.clf()

# attempt 6: mask the positive values and paint them with set_bad
m1 = np.ma.masked_where(GD1 > 0, GD1)
cmap2 = mpl.cm.get_cmap('Blues_r').copy()
cmap2.set_bad('brown')
plt.imshow(m1, cmap=cmap2, interpolation='bilinear', norm=norm)
plt.colorbar()
plt.show()
plt.clf()
My data comes from datasets that are then put into an array, roughly along these lines (this cannot reproduce my problem):
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate

shape = (180, 360)
# a = np.random.randint(-4000,1,shape).astype(float)
# a[a>-3500] = np.nan
a = np.empty(shape)
a[:] = np.nan
a[0:10, 0:360] = 1
a[10:75, 100:250] = 1
a[50:75, 100:125] = np.nan
a[10:100, 50:51] = -4000
a[100:150, 60:65] = -500
a[150:152, 100:300] = -500
plt.imshow(a, interpolation='nearest')
plt.colorbar()
plt.show()

# fill the NaN holes by linear interpolation over the valid points
array = np.ma.masked_invalid(a)
x = np.arange(0, array.shape[1])
y = np.arange(0, array.shape[0])
xx, yy = np.meshgrid(x, y)
x1 = xx[~array.mask]
y1 = yy[~array.mask]
newarr = array[~array.mask]
GD1 = interpolate.griddata((x1, y1), newarr.ravel(), (xx, yy), method='linear')
(The interpolation is taken from interpolate missing values 2d python.) I also wondered whether the interpolation, rather than the plotting, was the problem, but when I save the array as a CSV and look through those regions, they don't have any holes, i.e. the regions are filled contiguously with 1s.
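For reference, one way to get the two-colormap look described above (a minimal sketch, not from the original post; it assumes the interpolated GD1 array from the snippet above and matplotlib >= 3.2 for TwoSlopeNorm) is to stack a truncated 'Blues_r' with 'Oranges' and center the normalization at zero:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors

# take only the dark-to-light-blue part of 'Blues_r' and the light-to-dark part
# of 'Oranges', so neither half runs all the way to white
blues = plt.get_cmap('Blues_r')(np.linspace(0.0, 0.8, 128))
oranges = plt.get_cmap('Oranges')(np.linspace(0.2, 1.0, 128))
combined = colors.ListedColormap(np.vstack([blues, oranges]))

# map zero to the seam between the two halves
norm = colors.TwoSlopeNorm(vmin=-4000., vcenter=0., vmax=1.)

plt.imshow(GD1, cmap=combined, norm=norm, interpolation='nearest')
plt.colorbar()
plt.show()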
The task is to read 10,000 coordinate points from a file and create a colored grid based on the density of points in each block of the grid. The x-axis range is [-73.59, -73.55] and the y-axis range is [45.49, 45.530]. My code plots a grid with many different colors; now I need a feature that only colors cells with at least a given density n. For example, with n = 100, only the cells with 100 points or more should be colored yellow, and the other cells black.
I just added a link to my shapefile
https://drive.google.com/open?id=1H-8FhfonnPrYW9y7RQZDtiNLxVEiC6R8
import numpy as np
import matplotlib.pyplot as plt
import shapefile

grid_size = 0.002
x1 = np.arange(-73.59, -73.55, grid_size)
y1 = np.arange(45.49, 45.530, grid_size)

shape = shapefile.Reader("Shape/crime_dt.shp", encoding='ISO-8859-1')
shapeRecords = shape.shapeRecords()
x_coordinates = []
y_coordinates = []

# read all points in the .shp file and store them in two lists
for k in range(len(shapeRecords)):
    x = float(shapeRecords[k].shape.__geo_interface__["coordinates"][0])
    y = float(shapeRecords[k].shape.__geo_interface__["coordinates"][1])
    x_coordinates.append(x)
    y_coordinates.append(y)

plt.hist2d(x_coordinates, y_coordinates, bins=[x1, y1])
plt.show()
You can create a colormap with just two colors, and set vmin and vmax to be symmetrical around your desired pivot value.
Optionally, you can write the value of each bin inside its cell, letting the pivot value decide the text color.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

grid_size = 0.002
x1 = np.arange(-73.59, -73.55, grid_size)
y1 = np.arange(45.49, 45.530, grid_size)

# read coordinates from file and put them into two lists, similar to this
x_coordinates = np.random.uniform(x1.min(), x1.max(), size=40000)
y_coordinates = np.random.uniform(y1.min(), y1.max(), size=40000)

pivot_value = 100
# create a colormap with two colors; vmin and vmax are chosen so that their center is the pivot value
cmap = ListedColormap(['indigo', 'gold'])

# create a 2d histogram with xs and ys as bin boundaries
binvalues, _, _, _ = plt.hist2d(x_coordinates, y_coordinates, bins=[x1, y1],
                                cmap=cmap, vmin=0, vmax=2 * pivot_value)
binvalues = binvalues.astype(int)

# write each bin count in its cell; the pivot value decides the text color
for i in range(len(x1) - 1):
    for j in range(len(y1) - 1):
        plt.text((x1[i] + x1[i + 1]) / 2, (y1[j] + y1[j + 1]) / 2, binvalues[i, j],
                 color='white' if binvalues[i, j] < pivot_value else 'black',
                 ha='center', va='center', size=8)
plt.show()
PS: If the bin values are very important, you can add them all as ticks. Then, their positions can also be used to draw gridlines as a division between the cells.
plt.yticks(y1)
plt.xticks(x1, rotation=90)
plt.grid(True, ls='-', lw=1, color='black')
To obtain contours from these data, you could call plt.contourf with the generated matrix. (You might want to use np.histogram2d to create the matrix directly.)
plt.contourf((x1[1:]+x1[:-1])/2, (y1[1:]+y1[:-1])/2, binvalues.T, levels=[0,100,1000], cmap=cmap)
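For reference, a short sketch of the np.histogram2d route mentioned in the parenthesis above (using the same bin edges x1, y1 and the same two-color cmap); it yields the bin-count matrix directly without drawing the hist2d first:
counts, xedges, yedges = np.histogram2d(x_coordinates, y_coordinates, bins=[x1, y1])
plt.contourf((xedges[1:] + xedges[:-1]) / 2, (yedges[1:] + yedges[:-1]) / 2,
             counts.T, levels=[0, 100, 1000], cmap=cmap)
plt.colorbar()
plt.show()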
I have a matrix generated by parsing a file; the numpy array is of size 101x101x41 and each entry holds the magnitude at that point.
What I want to do now is plot it in a 3D plot where the 4th dimension is represented by color, so that I can see the shape of the data (it represents molecular orbitals) and read off its magnitude at each point.
If I plot each slice of the data I get the desired outcome, but in 2D, with the 3rd dimension as the color.
Is there a way to plot this model in Python using Matplotlib or an equivalent library?
Thanks
EDIT:
I'm trying to make the question clearer about what I want.
I've tried the suggested solution but received the following plot:
As one can see, because the mesh has lots of zeros in it, they "hide" the 3D orbitals. In the next figure one can see a single slice of the data:
So as you can see, there is a certain structure I want to show in the plot.
My question is: is there a way to plot only the structure and ignore the zeros, so that they won't "hide" it?
The code I used to generate the plots:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

x = np.linspace(1, 101, 101)
y = np.linspace(1, 101, 101)
z = np.linspace(1, 41, 41)  # the data cube has 41 slices along the third axis
xx, yy, zz = np.meshgrid(x, y, z)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(xx, yy, zz, c=cube.calc_data.flatten())
plt.show()

plt.imshow(cube.calc_data[:, :, 11], cmap='jet')
plt.show()
I hope the question is much clearer now and worth an upvote.
Thanks.
You can do the following:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm, colors, colorbar
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

epsilon = 2.5e-2  # threshold
height, width, depth = data.shape
global_min = np.inf
global_max = -np.inf

for d in range(depth):
    slice_ = data[:, :, d]
    minima = slice_.min()
    if minima < global_min: global_min = minima
    maxima = slice_.max()
    if maxima > global_max: global_max = maxima
    norm = colors.Normalize(vmin=minima, vmax=maxima, clip=True)
    mapper = cm.ScalarMappable(norm=norm, cmap=cm.jet)
    # plot only the points whose magnitude exceeds the threshold
    points_gt_epsilon = np.where(slice_ >= epsilon)
    ax.scatter(points_gt_epsilon[0], points_gt_epsilon[1], d,
               c=mapper.to_rgba(data[points_gt_epsilon[0], points_gt_epsilon[1], d]),
               alpha=0.015, cmap=cm.jet)
    points_lt_epsilon = np.where(slice_ <= -epsilon)
    ax.scatter(points_lt_epsilon[0], points_lt_epsilon[1], d,
               c=mapper.to_rgba(data[points_lt_epsilon[0], points_lt_epsilon[1], d]),
               alpha=0.015, cmap=cm.jet)

ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
plt.title('Electron Density Prob.')

norm = colors.Normalize(vmin=global_min, vmax=global_max, clip=True)
cax, _ = colorbar.make_axes(ax)
colorbar.ColorbarBase(cax, cmap=cm.jet, norm=norm)

plt.savefig('test.png')
plt.clf()
What this piece of code does is go through the data matrix slice by slice and, for each slice, scatter-plot only the desired points (depending on epsilon).
This way you avoid plotting lots of zeros that 'hide' your model, to use your words.
Hope this helps
You can adjust the color and size of the markers in the scatter plot. So, for example, you can filter out all markers below a certain threshold by setting their size to 0. You can also make the marker size adaptive to the field strength.
As an example:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
f = lambda x, y, z: np.exp(-(x-3)**2 - (y-3)**2 - (z-1)**2) - \
                    np.exp(-(x+3)**2 - (y+3)**2 - (z+1)**2)
t1 = np.linspace(-6,6,101)
t2 = np.linspace(-3,3,41)
# Data of shape 101,101,41
data = f(*np.meshgrid(t1,t1,t2))
print(data.shape)
# Coordinates
x = np.linspace(1,101,101)
y = np.linspace(1,101,101)
z = np.linspace(1,101,41)
xx,yy,zz = np.meshgrid(x,y,z)
fig=plt.figure()
ax = fig.add_subplot(111, projection='3d')
s = np.abs(data/data.max())**2*25
s[np.abs(data) < 0.05] = 0
ax.scatter(xx, yy, zz, s=s, c=data.flatten(), linewidth=0, cmap="jet", alpha=.5)
plt.show()
I wrote some code a while ago that used a Gaussian KDE to make simple density scatter plots. However, for datasets larger than about 100,000 points it just ran 'forever' (I killed it after a few days). A friend gave me some R code that could create such a density plot in seconds (plot_fun.R), and it seems like matplotlib should be able to do the same thing.
I think the right place to look is 2D histograms, but I am struggling to get the density 'right'. I modified code I found in this question to accomplish this, but the density is not showing; it looks like only the densest points are getting any color.
Here is approximately the code I am using:
import numpy as np
import matplotlib.pyplot as plt

# initial data
x = -np.log10(np.random.random_sample(10000))
y = -np.log10(np.random.random_sample(10000))

# histogram definition
bins = [1000, 1000]  # number of bins
thresh = 3  # density threshold

# data range
mn = min(x.min(), y.min())
mx = max(x.max(), y.max())
mn = mn - (mn * .1)
mx = mx + (mx * .1)
xyrange = [[mn, mx], [mn, mx]]

# histogram the data
hh, locx, locy = np.histogram2d(x, y, range=xyrange, bins=bins)
posx = np.digitize(x, locx)
posy = np.digitize(y, locy)

# select points within the histogram
ind = (posx > 0) & (posx <= bins[0]) & (posy > 0) & (posy <= bins[1])
hhsub = hh[posx[ind] - 1, posy[ind] - 1]  # values of the histogram where the points are
xdat1 = x[ind][hhsub < thresh]  # low density points
ydat1 = y[ind][hhsub < thresh]
hh[hh < thresh] = np.nan  # fill the areas with low density with NaNs

f, a = plt.subplots(figsize=(12, 12))
c = a.imshow(
    np.flipud(hh.T), cmap='jet',
    extent=np.array(xyrange).flatten(), interpolation='none',
    origin='upper'
)
f.colorbar(c, ax=a, orientation='vertical', shrink=0.75, pad=0.05)
s = a.scatter(
    xdat1, ydat1, color='darkblue', edgecolor='none', label=None,
    picker=True, zorder=2
)
That produces this plot:
The KDE code is here:
import scipy.stats as sts

f, a = plt.subplots(figsize=(12, 12))
xy = np.vstack([x, y])
z = sts.gaussian_kde(xy)(xy)

# Sort the points by density, so that the densest points are
# plotted last
idx = z.argsort()
x2, y2, z = x[idx], y[idx], z[idx]
s = a.scatter(
    x2, y2, c=z, s=50, cmap='jet',
    edgecolor='none', label=None, picker=True, zorder=2
)
That produces this plot:
The problem is, of course, that this code is unusable on large datasets.
My question is: how can I use the 2D histogram to produce a scatter plot like that? ax.hist2d does not produce a useful output, because it colors the whole plot, and all my efforts to get the above 2D histogram data to actually color the dense regions of the plot correctly have failed; I always end up with either no coloring at all or only a tiny percentage of the densest points colored. Clearly I just don't understand the code very well.
Your histogram code assigns a single color to the scatter points (color='darkblue'), so what are you expecting?
I think you are also overcomplicating things. This much simpler code works fine:
import numpy as np
import matplotlib.pyplot as plt
x, y = -np.log10(np.random.random_sample((2,10**6)))
#histogram definition
bins = [1000, 1000] # number of bins
# histogram the data
hh, locx, locy = np.histogram2d(x, y, bins=bins)
# Sort the points by density, so that the densest points are plotted last
z = np.array([hh[np.argmax(a<=locx[1:]),np.argmax(b<=locy[1:])] for a,b in zip(x,y)])
idx = z.argsort()
x2, y2, z2 = x[idx], y[idx], z[idx]
plt.figure(1,figsize=(8,8)).clf()
s = plt.scatter(x2, y2, c=z2, cmap='jet', marker='.')
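If the per-point lookup in the list comprehension above becomes the bottleneck for millions of points, a vectorized variant (a sketch using np.digitize on the interior bin edges; not part of the original answer) computes the same z in one shot:
ix = np.digitize(x, locx[1:-1])  # bin index along x for every point
iy = np.digitize(y, locy[1:-1])  # bin index along y for every point
z = hh[ix, iy]                   # density of the bin each point falls into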
In matplotlib, I'm looking to create an inset color bar to show the scale of my contour plot, but when I create the contour using contour, the color bar has white stripes running through it, whereas when I use contourf, the colorbar has the proper "smooth" appearance:
How can I get that nice smooth colorbar from the filled contour on my normal contour plot? I'd also be OK with a filled contour where the zero-level can be set to white, I imagine.
Here is code to generate this example:
from numpy import linspace, outer, exp
from matplotlib.pyplot import figure, gca, clf, subplots_adjust, subplot
from matplotlib.pyplot import contour, contourf, colorbar, xlim, ylim, title
from matplotlib.pyplot import text, show
from mpl_toolkits.axes_grid1.inset_locator import inset_axes

# Make some data to plot - 2D gaussians
x = linspace(0, 5, 100)
y = linspace(0, 5, 100)
g1 = exp(-((x-0.75)/0.2)**2)
g2 = exp(-((y-4.25)/0.1)**2)
g3 = exp(-((x-3.5)/0.15)**2)
g4 = exp(-((y-1.75)/0.05)**2)
z = outer(g1, g2) + outer(g3, g4)

figure(1, figsize=(13, 6.5))
clf()

# Create a contour and a contourf
for ii in range(0, 2):
    subplot(1, 2, ii+1)
    if ii == 0:
        ca = contour(x, y, z, 125)
        title('Contour')
    else:
        ca = contourf(x, y, z, 125)
        title('Filled Contour')
    xlim(0, 5)
    ylim(0, 5)

    # Make the axis labels
    yt = text(-0.35, 2.55, 'y (units)', rotation='vertical', size=14)
    xt = text(2.45, -0.4, 'x (units)', rotation='horizontal', size=14)

    # Add color bar
    ains = inset_axes(gca(), width='5%', height='60%', loc=2)
    colorbar(ca, cax=ains, orientation='vertical',
             ticks=[round(xx*10.0)/10.0 for xx in linspace(0, 1)])
    if ii == 1:
        ains.tick_params(axis='y', colors='#CCCCCC')

subplots_adjust(left=0.05, bottom=0.09, right=0.98, top=0.94, wspace=0.12, hspace=0.2)
show()
Edit: I realize now that at the lower resolution, the white striping behavior is hard to distinguish from some light transparency. Here's an example with only 30 contour lines which makes the problem more obvious:
Edit 2: While I am still interested in figuring out how to do this in the general case (e.g. if there are negative values), in my specific case I have determined that I can effectively create what I want by simply setting the levels of a filled contour to start above the zero level:
ca = contourf(x, y, z, levels=linspace(0.05, 1, 125))
Which basically looks like what I want:
A simple hack is to set the thickness of the lines in the colorbar to some higher value.
E.g. storing the colorbar object as cb and adding the following lines to your example
for line in cb.lines:
    line.set_linewidth(3)
gives
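For completeness, a sketch of wiring this into the example from the question (it assumes the loop variable ii and the inset axes ains from that code; only the line-contour panel needs the fix):
cb = colorbar(ca, cax=ains, orientation='vertical',
              ticks=[round(xx*10.0)/10.0 for xx in linspace(0, 1)])
if ii == 0:  # the panel drawn with contour() is the one showing white stripes
    for line in cb.lines:
        line.set_linewidth(3)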