Two dimensional distribution with contour plotting with Gaussian smoothing - python

I am trying to plot a two-dimensional distribution with Gaussian smoothing applied to all the data points. Here is my sample data. I would like to reproduce this figure from a paper:
[Two-dimensional distribution of Hα emitters(galaxies). The small circles indicate the positions of Hα emitters. The colored contours indicate 1 σ, 1.5 σ, 2 σ, 3 σ, 4 σ, 5 σ above the mean density distribution computed with all member galaxies (i.e., the photo-z-selected sample and Hα emitters). Here we apply Gaussian smoothing for all the data points with σ ∼ 0.75 Mpc, and co-add the tail of the Gaussian wing at each position. The large gray circles show the object masks; we show only large masks with radius of >2΄ for clarity. (Color online)]
I am trying to make the same plot with my data, but I have not been able to reproduce it.
Here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.interpolate import griddata
import scipy as sp
import scipy.ndimage
from astropy.table import Table
import branca
%matplotlib inline
#DATA
data=Table.read('s_data.fits')
density_mean=np.mean(data['density'])
std=np.std(data['density'])
vmin = density_mean+std
vmax = density_mean+(5*std)
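# NOTE: `colors` (a list of colours) is not defined in the code as posted; it must exist before this line
levels = len(colors)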
cm = branca.colormap.LinearColormap(colors, vmin=vmin, vmax=vmax).to_step(levels)
x_orig = data['ra']
y_orig = data['dec']
z_orig = data['density']
# Make a grid
x_arr = np.linspace(np.min(x_orig), np.max(x_orig), 500)
y_arr = np.linspace(np.min(y_orig), np.max(y_orig), 500)
# When I change the value from 500 to something else, the grid doesn't cover the whole area. What would be an appropriate value?
x_mesh, y_mesh = np.meshgrid(x_arr, y_arr)
# Grid the values
z_mesh = griddata((x_orig, y_orig), z_orig, (x_mesh, y_mesh), method='linear')
# Gaussian filter the grid to make it smoother
sigma = [5, 5]
z_mesh = scipy.ndimage.gaussian_filter(z_mesh, sigma, mode='constant')  # scipy.ndimage.filters is a deprecated namespace
# Create the contour
plt.plot(data['ra'], data['dec'], 'ko', markersize=1,alpha=1)
plt.text(352.33,0.135, 'CL1',fontsize=15) #CL1 coordinate
plt.text(352.096,0.390, 'CL2', fontsize=15) #CL2 coordinate
plt.contourf(x_mesh, y_mesh, z_mesh, levels, alpha=0.7, colors=colors, linestyles='None',
vmin=vmin, vmax=vmax)
#plt.colorbar()
plt.gca().invert_xaxis()
And this is what I got:
I know something is wrong with my data. Please help me get a proper figure like the one in the paper. Thank you.
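The figure caption describes smoothing the galaxy positions themselves and drawing contours at a fixed number of standard deviations above the mean density. A minimal sketch of that idea follows, assuming the positions are binned on a regular grid and the counts are blurred with `scipy.ndimage.gaussian_filter`. The helper names `smoothed_density` and `sigma_levels`, the bin count, the smoothing width in pixels, and the synthetic positions are all illustrative assumptions, not values taken from the paper or from the real catalogue.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def smoothed_density(ra, dec, bins=200, sigma_pix=5.0):
    """Bin point positions on a regular grid and smooth the counts with a Gaussian."""
    counts, ra_edges, dec_edges = np.histogram2d(ra, dec, bins=bins)
    # mode="constant" pads with zeros, so the density fades out at the field edge
    density = gaussian_filter(counts, sigma=sigma_pix, mode="constant")
    ra_mid = 0.5 * (ra_edges[:-1] + ra_edges[1:])
    dec_mid = 0.5 * (dec_edges[:-1] + dec_edges[1:])
    return ra_mid, dec_mid, density


def sigma_levels(density, n_sigma=(1, 1.5, 2, 3, 4, 5)):
    """Contour levels at mean + n * std of the smoothed map."""
    mean, std = density.mean(), density.std()
    return [mean + n * std for n in n_sigma]


# Synthetic stand-in for the catalogue (hypothetical positions, not the real data)
rng = np.random.default_rng(0)
ra = np.concatenate([rng.normal(352.33, 0.03, 300), rng.uniform(352.0, 352.6, 700)])
dec = np.concatenate([rng.normal(0.135, 0.03, 300), rng.uniform(-0.1, 0.5, 700)])

ra_mid, dec_mid, density = smoothed_density(ra, dec)
levels = sigma_levels(density)
```

The map can then be drawn with `plt.contourf(ra_mid, dec_mid, density.T, levels=levels, extend='max')`; the transpose is needed because `np.histogram2d` puts the first coordinate along the first array axis. Because the grid is built from counts rather than from `griddata`, no NaN values appear outside the convex hull of the points.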

Related

Colour statistically non-significant values in seaborn heatmap with a different colour

I wanted to highlight statistically non-significant correlations in seaborn's heatmap. I knew I could hide them with the following code:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr
planets = sns.load_dataset('planets')
# get the p value for pearson coefficient, subtract 1 on the diagonal
pvals = planets.corr(method=lambda x, y: pearsonr(x, y)[1]) - np.eye(*planets.corr().shape)
# set the significance threshold
psig = 0.05
plt.figure(figsize=(6,6))
sns.heatmap(planets.corr()[pvals<psig], annot=True, square=True)
However, that creates weird white holes. I would like to keep the values and the information and just emphasise the non-significant cells with another colour.
The solution was a) to use the same thresholding for a second heatmap plotted on the same axes, and b) to add a patch to the legend so that it also has a nice label:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr
import matplotlib.patches as mpatches
planets = sns.load_dataset('planets')
# get the p value for pearson coefficient, subtract 1 on the diagonal
pvals = planets.corr(method=lambda x, y: pearsonr(x, y)[1]) - np.eye(*planets.corr().shape)
# set the significance threshold
psig = 0.05
plt.figure(figsize=(6,6))
sns.heatmap(planets.corr()[pvals<psig], annot=True, square=True)
# add another heatmap with colouring the non-significant cells
sns.heatmap(planets.corr()[pvals>=psig], annot=True, square=True, cbar=False,
cmap=sns.color_palette("Greys", n_colors=1, desat=1))
# add a label for the colour
# https://stackoverflow.com/questions/44098362/using-mpatches-patch-for-a-custom-legend
colors = [sns.color_palette("Greys", n_colors=1, desat=1)[0]]
texts = [f"n.s. (at {psig})"]
patches = [ mpatches.Patch(color=colors[i], label="{:s}".format(texts[i]) ) for i in range(len(texts)) ]
plt.legend(handles=patches, bbox_to_anchor=(.85, 1.05), loc='center')
The same approach extends to several masking conditions and different significance levels.
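As a rough illustration of using more than one significance level, the p-values can be sorted into tiers with `np.digitize`, and each tier can then be drawn as its own heatmap layer. This is only a sketch: the helpers `corr_and_pvalues` and `significance_tiers`, the thresholds, and the synthetic data are assumptions made for the example.

```python
import numpy as np
from scipy.stats import pearsonr


def corr_and_pvalues(data):
    """Pairwise Pearson r and p-values for the columns of a 2D array."""
    n_cols = data.shape[1]
    r = np.eye(n_cols)
    p = np.zeros((n_cols, n_cols))  # the diagonal keeps p = 0
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            r[i, j], p[i, j] = pearsonr(data[:, i], data[:, j])
            r[j, i], p[j, i] = r[i, j], p[i, j]
    return r, p


def significance_tiers(p, thresholds=(0.01, 0.05)):
    """Label each cell: 0 = p < 0.01, 1 = 0.01 <= p < 0.05, 2 = not significant."""
    return np.digitize(p, thresholds)


# Synthetic data (illustrative only): two strongly correlated columns and one independent
rng = np.random.default_rng(1)
a = rng.normal(size=200)
data = np.column_stack([a, a + 0.1 * rng.normal(size=200), rng.normal(size=200)])

r, p = corr_and_pvalues(data)
tiers = significance_tiers(p)
```

Each tier `k` could then get its own layer and palette, for example by masking a correlation DataFrame with `corr.where(tiers == k)` before passing it to `sns.heatmap`, plus one legend patch per tier.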

2D histogram where one axis is cumulative and the other is not

Let's say I have instances of two random variables that can be treated as paired.
import numpy as np
x = np.random.normal(size=1000)
y = np.random.normal(size=1000)
Using matplotlib it is pretty easy to make a 2D histogram.
import matplotlib.pyplot as plt
plt.hist2d(x,y)
In 1D, matplotlib has an option to make a histogram cumulative.
plt.hist(x,cumulative=True)
What I would like incorporates elements of both classes. I would like to construct a 2D histogram such that the horizontal axis is cumulative and the vertical axis is not cumulative.
Is there a way to do this with Python/Matplotlib?
You can take advantage of np.cumsum to create your cumulative histogram. First save the output from hist2d, then apply the cumulative sum to the counts when plotting.
import matplotlib.pyplot as plt
import numpy as np
#Some random data
x = np.random.normal(size=1000)
y = np.random.normal(size=1000)
#create a figure
plt.figure(figsize=(16,8))
ax1 = plt.subplot(121) #Left plot original
ax2 = plt.subplot(122) #right plot the cumulative distribution along axis
#What you have so far; keep the counts and bin edges that hist2d returns
h, xedge, yedge, image = ax1.hist2d(x,y)
#Plot using np.cumsum which does a cumulative sum along a specified axis
ax2.pcolormesh(xedge,yedge,np.cumsum(h.T,axis=1))
plt.show()
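The same computation can be sketched without drawing the intermediate histogram, using `np.histogram2d` directly. The helper name `cumulative_along_x` and the bin count are assumptions for illustration.

```python
import numpy as np


def cumulative_along_x(x, y, bins=10):
    """2D histogram that is cumulative along x and ordinary along y."""
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins)
    # counts has shape (nx, ny); accumulate over the x bins only
    return np.cumsum(counts, axis=0), xedges, yedges


rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = rng.normal(size=1000)
cum, xedges, yedges = cumulative_along_x(x, y)
```

To plot it, pass the transposed array so that x runs horizontally: `plt.pcolormesh(xedges, yedges, cum.T)`. The last x bin of `cum` then equals the ordinary 1D histogram of `y`.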

Heat map generation using coordinate points

I am new to Python. The answer to my question might already be available on StackOverflow, but honestly, I have tried almost all the code and suggestions I could find there.
My problem: almost the same as the one described here. I have coordinate points (x and y) and the corresponding value (p) in a .csv file. I am reading that file using pandas.
df = pd.read_csv("example.csv")
The example.csv file can be downloaded from here. Assume an image of size 2000 x 2000.
Task:
Based on the x and y coordinate points in the Excel sheet, I have to locate each point in that image.
Let A be an image and A(x, y) any point within A. I have to generate a heat map such that the 50 x 50 pixel square starting at each point, i.e. the region with corners A(x, y), A(x+50, y), A(x, y+50) and A(x+50, y+50), contains the value p corresponding to that coordinate point.
I found this link, which is very helpful and addresses my issue, but some more modifications are necessary for my dataset.
The code which is available in the above link:
#!/usr/bin/python3
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from skimage import io
from skimage.color import rgb2gray
import matplotlib as mpl
# Read original image
img = io.imread('img.jpg')
# Get the dimensions of the original image
x_dim, y_dim, z_dim = np.shape(img)
# Create heatmap
heatmap = np.zeros((x_dim, y_dim), dtype=float)
# Read CSV with a Pandas DataFrame
df = pd.read_csv("data.csv")
# Set probabilities values to specific indexes in the heatmap
for index, row in df.iterrows():
    # np.int was removed in NumPy 1.24; the built-in int does the same job
    x = int(row["x"])
    y = int(row["y"])
    x1 = int(row["x1"])
    y1 = int(row["y1"])
    p = row["Probability value"]
    heatmap[x:x1,y:y1] = p
# Plot images
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
ax = axes.ravel()
ax[0].imshow(img)
ax[0].set_title("Original")
fig.colorbar(ax[0].imshow(img), ax=ax[0])
ax[1].imshow(img, vmin=0, vmax=1)
ax[1].imshow(heatmap, alpha=.5, cmap='jet')
ax[1].set_title("Original + heatmap")
# Specific colorbar
norm = mpl.colors.Normalize(vmin=0,vmax=2)
N = 11
cmap = plt.get_cmap('jet',N)
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array([])
plt.colorbar(sm, ticks=np.linspace(0,1,N),
boundaries=np.arange(0,1.1,0.1))
fig.tight_layout()
plt.show()
Issues I am facing with this code:
This code generates a heat map with square edges, but I want smooth edges. I know a Gaussian distribution might solve this, but I am new to Python and don't know how to apply it to my dataset.
The regions that don't belong to any coordinate point also get a layer of color, so in the overlaid image that layer covers the background of the original image. In short, I want the background of the heat map to be transparent, so that the overlay does not hide the regions not covered by the coordinate points.
Any leads will be highly appreciated.
Your code is fine. Changing just one line will solve both of your issues.
Before changes:
ax[1].imshow(heatmap, alpha=.5, cmap='jet')
After changes:
ax[1].imshow(heatmap, alpha=.5, cmap='coolwarm', interpolation='gaussian')
Although the change above will solve your issue, if you want additional transparency you can use the function below:
def transparent_cmap(cmap, N=255):
    "Copy colormap and set alpha values"
    mycmap = cmap.copy()  # copy so the registered colormap is not modified
    mycmap._init()
    mycmap._lut[:,-1] = np.linspace(0, 0.8, N+4)
    return mycmap
mycmap = transparent_cmap(plt.cm.coolwarm)
In that case, the previous line becomes:
ax[1].imshow(heatmap, alpha=.5, cmap=mycmap, vmin=0, vmax=1)
The question you linked uses plotly. If you don't want to use that and want to simply smooth the way your data looks, I suggest just using a gaussian filter using scipy.
At the top, import:
import seaborn as sns
from scipy.ndimage import gaussian_filter
Then use it like this:
df_smooth = gaussian_filter(df, sigma=1)
sns.heatmap(df_smooth, vmin=-40, vmax=150, cmap ="coolwarm" , cbar=True , cbar_kws={"ticks":[-40,150,-20,0,25,50,75,100,125]})
You can change the amount of smoothing, using e.g. sigma=3, or any other number that gives you the amount of smoothing you want.
Keep in mind that this will also "smooth out" any maximum data peaks you have, so your minimum and maximum data will no longer match the values you specified in your normalization. To still get good-looking heatmaps, I would suggest not using fixed values for vmin and vmax, but:
sns.heatmap(df_smooth, vmin=np.min(df_smooth), vmax=np.max(df_smooth), cmap ="coolwarm" , cbar=True , cbar_kws={"ticks":[-40,150,-20,0,25,50,75,100,125]})
If the Gaussian filter gives you the result you mentioned, you could even apply Gaussian normalization to your data directly.
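Both requests in the question (soft edges and a transparent background) can also be handled on the array itself rather than through the colormap. The sketch below assumes the heat map is filled from boxes as in the question's loop, blurred with `scipy.ndimage.gaussian_filter`, and then masked wherever it is close to zero; `imshow` leaves masked cells undrawn by default. The function name `soft_heatmap`, the `sigma` and `floor` values, and the single example box are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def soft_heatmap(shape, boxes, sigma=10.0, floor=0.01):
    """Fill boxes with their value, blur the result, and mask the empty background.

    boxes: iterable of (row0, row1, col0, col1, p).
    """
    heat = np.zeros(shape, dtype=float)
    for r0, r1, c0, c1, p in boxes:
        heat[r0:r1, c0:c1] = p
    heat = gaussian_filter(heat, sigma=sigma)
    # Mask (near-)zero cells so the underlying image stays visible there
    return np.ma.masked_less(heat, floor)


# One hypothetical 50 x 50 box with value 0.9
heat = soft_heatmap((200, 200), [(50, 100, 50, 100, 0.9)])
```

The masked array can be passed straight to the overlay call, e.g. `ax[1].imshow(heat, alpha=.5, cmap='coolwarm', vmin=0, vmax=1)`. A larger `sigma` gives softer edges but also lowers the peak values, as noted above.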

Plotting dataset using griddata without cancelling outliers

I have an x, y, z dataset which contains a rather large number of points.
x and y are the positions while z is the actual observable at those coordinates.
Most coordinates have a zero value for z, while only a few of them define lines (with smoothly changing z) in the 2D map.
If I plot it with
scatter(x,y,c=z)
I get only very faint lines as the scatterpoints with color defined by z=0 are overlapping with the nonzero values of z. If I decrease the size of the points to reduce overlap, I can't see them anymore.
Here an example of the best I could get using scatter (blue is zero z, other colors are non-zero z).
So, I thought of instead using
data = np.genfromtxt('data')
x=data[:,0]
y=data[:,1]*3.0
z=data[:,2]
grid_x, grid_y = np.mgrid[min(x):max(x):100, min(y):max(y):1000]
from scipy.interpolate import griddata
grid_z0 = griddata((x, y),z, (grid_x, grid_y), method='cubic')
im = imshow(grid_z0,origin="lower",extent=[0,0.175,-0.15,0.15]) # zoom in on specific part of data
to get a denser grid of points and possibly get wider lines due to the cubic interpolation of points around them.
However, it then seems that griddata removes the non-zero z values, treating them as outliers, which hides any possible features, so the whole grid plots as zero z.
Is there any python/matplotlib/... feature or trick I am missing to plot this in a nice way?
I am trying to make plots that would look something like the ones you can see in Fig. 2 (2) of https://journals.aps.org/prb/abstract/10.1103/PhysRevB.93.0854092 (you can see the figure without downloading the paper), possibly with some kind of glow around the lines.
The data I used is in this dropbox link.
Of course you may change the scatter, e.g. to set the size of the points without energy to 0.
import matplotlib.pyplot as plt
import numpy as np
data = np.genfromtxt('data/some_solidstate_physics_data.txt')
x=data[:,0]; y=data[:,1]*3.0; z=data[:,2]
plt.scatter(x,y,c=z, s=np.log10(z+1), cmap="PuRd", vmin=-500)
plt.show()
Since the data is already gridded, there is no need to use griddata; it will only smooth out the data. Reshaping the data into a grid is enough.
import matplotlib.pyplot as plt
import numpy as np
data = np.genfromtxt('data/some_solidstate_physics_data.txt')
x=data[:,0]; y=data[:,1]*3.0; z=data[:,2]
ux = np.unique(x); uy = np.unique(y)
Z = z.reshape(len(ux),len(uy)).T
dx = np.diff(ux[:2])[0]; dy = np.diff(uy[:2])[0]
ext = [ux.min()-dx/2.,ux.max()+dx/2.,uy.min()-dy/2., uy.max()+dy/2.]
plt.imshow(Z, extent=ext, aspect="auto", cmap="magma", origin="lower")  # origin="lower" puts the smallest y at the bottom, matching ext
plt.show()
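The reshape above relies on the rows of the file already being ordered with x varying slowest and y fastest. If that ordering is not guaranteed, a small helper can sort the points first. This is a sketch under the assumption that every (x, y) combination appears exactly once; the name `to_grid` is made up for the example.

```python
import numpy as np


def to_grid(x, y, z):
    """Arrange scattered-but-gridded samples into a 2D array with shape (ny, nx)."""
    ux, uy = np.unique(x), np.unique(y)
    if x.size != ux.size * uy.size:
        raise ValueError("points do not fill a complete regular grid")
    # lexsort uses the last key as primary: sort by x first, then by y
    order = np.lexsort((y, x))
    Z = z[order].reshape(ux.size, uy.size).T
    return ux, uy, Z


# Small shuffled grid to show that the input order does not matter
xs, ys = np.meshgrid(np.arange(3.0), np.arange(4.0), indexing="ij")
x, y = xs.ravel(), ys.ravel()
z = 10 * x + y
perm = np.random.default_rng(0).permutation(x.size)
ux, uy, Z = to_grid(x[perm], y[perm], z[perm])
```

`Z[j, i]` then holds the value at `(ux[i], uy[j])`, which is the layout `imshow` expects when used with `origin="lower"`.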
Since the grid is very dense, the result looks somewhat pixelated.
You may of course also bin your data into larger chunks, for example by joining the data of 3x3 pixels into one and taking the maximum value, using scipy.stats.binned_statistic_2d:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import binned_statistic_2d
data = np.genfromtxt('data/some_solidstate_physics_data.txt')
x=data[:,0]; y=data[:,1]*3.0; z=data[:,2]
ux = np.unique(x); uy = np.unique(y)
h, ex, ey,_ = binned_statistic_2d(x, y, z, bins=[ux[::3],uy[::3]], statistic='max')
dx = np.diff(ex[:2])[0]; dy = np.diff(ey[:2])[0]
ext = [ux.min()-dx/2.,ux.max()+dx/2.,uy.min()-dy/2., uy.max()+dy/2.]
plt.imshow(h.T, extent=ext, aspect="auto", cmap="magma", origin="lower")  # origin="lower" puts the smallest y at the bottom, matching ext
plt.show()
Having those techniques at your disposal you may then decide to beautify your result at the expense of quantitative accuracy.
For example, you can apply a Gaussian filter, scipy.ndimage.gaussian_filter, as well as interpolation="gaussian" in the plotting.
import matplotlib.pyplot as plt
import numpy as np
import scipy.ndimage
data = np.genfromtxt('data/some_solidstate_physics_data.txt')
x=data[:,0]; y=data[:,1]*3.0; z=data[:,2]
ux = np.unique(x); uy = np.unique(y)
Z = z.reshape(len(ux),len(uy)).T
# scipy.ndimage.filters is a deprecated namespace; call the function from scipy.ndimage
Z = scipy.ndimage.gaussian_filter(Z, 3)
dx = np.diff(ux[:2])[0]; dy = np.diff(uy[:2])[0]
ext = [ux.min()-dx/2.,ux.max()+dx/2.,uy.min()-dy/2., uy.max()+dy/2.]
# origin="lower" puts the smallest y at the bottom, matching ext
plt.imshow(Z, extent=ext, aspect="auto", cmap="magma", interpolation="gaussian", origin="lower")
plt.show()

Modify matplotlib colormap

I'm trying to produce a similar version of this image using Python:
I'm close but can't quite figure out how to modify a matplotlib colormap to make values <0.4 go to white. I tried masking those values and using set_bad, but I ended up with a very blocky appearance, losing the nice smooth contours seen in the original image.
Result with continuous colormap (problem: no white):
Result with set_bad (problem: no smooth transition to white):
Code so far:
from netCDF4 import Dataset as NetCDFFile
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.basemap import Basemap
nc = NetCDFFile('C:/myfile1.nc')
nc1 = NetCDFFile('C:/myfile2.nc')
lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
time = nc.variables['time'][:]
uwnd = nc.variables['uwnd'][:]
vwnd = nc1.variables['vwnd'][:]
map = Basemap(llcrnrlon=180.,llcrnrlat=0.,urcrnrlon=340.,urcrnrlat=80.)
lons,lats = np.meshgrid(lon,lat)
x,y = map(lons,lats)
speed = np.sqrt(uwnd*uwnd+vwnd*vwnd)
#speed = np.ma.masked_where(speed < 0.4, speed)
#cmap = plt.cm.jet
#cmap.set_bad(color='white')
levels = np.arange(0.0,3.0,0.1)
ticks = np.arange(0.0,3.0,0.2)
cs = map.contourf(x,y,speed[0],levels, cmap='jet')
vw = plt.quiver(x,y,uwnd[0],vwnd[0])  # quiver takes the u and v components, not the scalar speed
cbar = plt.colorbar(cs, orientation='horizontal', cmap='jet', spacing='proportional',ticks=ticks)
cbar.set_label('850 mb Vector Wind Anomalies (m/s)')
map.drawcoastlines()
map.drawparallels(np.arange(20,80,20),labels=[1,1,0,0], linewidth=0.5)
map.drawmeridians(np.arange(200,340,20),labels=[0,0,0,1], linewidth=0.5)
#plt.show()
plt.savefig('phase8_850wind_anom.png',dpi=600)
The way to get a smooth result is to construct your own colormap. To do this, create an RGBA matrix: a matrix in which each row holds the amount (between 0 and 1) of Red, Green, Blue, and Alpha (transparency; 0 means that the pixel does not have any coverage information and is transparent).
As an example the distance to some point is plotted in two dimensions. Then:
For any distance higher than some critical value, the colors will be taken from a standard colormap.
For any distance lower than some critical value, the colors will linearly go from white to the first color of the previously mentioned map.
The choices depend fully on what you want to show. The colormaps and their sizes depend on your problem. For example, you can choose different types of interpolation: linear, exponential, ...; single- or multi-color colormaps; etc..
The code:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
# create colormap
# ---------------
# create a colormap that consists of
# - 1/5 : custom colormap, ranging from white to the first color of the colormap
# - 4/5 : existing colormap
# set upper part: 4 * 256/4 entries
upper = mpl.cm.jet(np.arange(256))
# set lower part: 1 * 256/4 entries
# - initialize all entries to 1 to make sure that the alpha channel (4th column) is 1
lower = np.ones((int(256/4),4))
# - modify the first three columns (RGB):
# range linearly between white (1,1,1) and the first color of the upper colormap
for i in range(3):
    lower[:,i] = np.linspace(1, upper[0,i], lower.shape[0])
# combine parts of colormap
cmap = np.vstack(( lower, upper ))
# convert to matplotlib colormap
cmap = mpl.colors.ListedColormap(cmap, name='myColorMap', N=cmap.shape[0])
# show some example
# -----------------
# open a new figure
fig, ax = plt.subplots()
# some data to plot: distance to point at (50,50)
x,y = np.meshgrid(np.linspace(0,99,100),np.linspace(0,99,100))
z = (x-50)**2. + (y-50)**2.
# plot data, apply colormap, set limit such that our interpretation is correct
im = ax.imshow(z, interpolation='nearest', cmap=cmap, clim=(0,5000))
# add a colorbar to the bottom of the image
div = make_axes_locatable(ax)
cax = div.append_axes('bottom', size='5%', pad=0.4)
cbar = plt.colorbar(im, cax=cax, orientation='horizontal')
# save/show the image
plt.savefig('so.png')
plt.show()
The result:
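For the threshold mentioned in the question (values below 0.4 fading to white on a 0 to 3 scale), the size of the white ramp can be derived from the data limits instead of being fixed at 1/5 of the map. The following is a sketch of that variation, not the answer's exact method; the function name `white_below` and its parameters are assumptions for illustration.

```python
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt


def white_below(threshold, vmin, vmax, base="jet", n=256):
    """Colormap for [vmin, vmax]: white at vmin, blending into `base` at `threshold`."""
    frac = (threshold - vmin) / (vmax - vmin)
    n_lower = int(round(n * frac))
    # The remaining entries sample the full base colormap
    base_colors = plt.get_cmap(base)(np.linspace(0, 1, n - n_lower))
    # Start from opaque white and ramp RGB linearly to the first base colour
    lower = np.ones((n_lower, 4))
    for i in range(3):
        lower[:, i] = np.linspace(1, base_colors[0, i], n_lower)
    return mpl.colors.ListedColormap(np.vstack((lower, base_colors)), name="white_below")


cmap = white_below(0.4, 0.0, 3.0)
```

The resulting map could be passed to the contour call from the question as `cmap=cmap` together with levels spanning 0 to 3, so that the white ramp ends at 0.4.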
