IPython FITS file plotting gives different results - python

I am facing a problem while running a script (please find the code below).
I am trying to plot an array of values, write it into a FITS file, read it back again and plot it, but I don't get the same plot!
If you could help me with this it would be great.
The following are the versions of my packages and interpreter:
matplotlib: '2.0.0b1'
numpy: '1.11.0'
astropy: '1.1.2'
python : 2.7
Sincerely,
Anik Halder
import numpy as np
from pylab import *
from astropy.io import fits
# Just making a 10x10 meshgrid
x = np.arange(10)
X , Y = np.meshgrid(x,x)
# finding the distance of different points on the meshgrid from a point suppose at (5,5)
Z = ((X-5)**2 + (Y-5)**2)**0.5
# plotting Z (see image [link below] - left one)
imshow(Z, origin = "lower")
colorbar()
show()
# writing the Z data into a fits file
fits.writeto("my_file.fits", Z)
# reading the same fits file and storing the data
Z_read = fits.open("my_file.fits")[0].data
# plotting Z_read : we expect it to show the same plot as before
imshow(Z_read, origin = "lower")
colorbar()
show()
# Lo! That's not the case for me! It's not the same plot! (see image - right one)
# Hence, I try to check whether the values stored in Z and Z_read are different..
print(Z - Z_read)
# No! It returns an array full of zeros! This means Z and Z_read are the same! I don't get why the plots look different!
Please find the image in this link: http://imgur.com/1TklSjU

Actually, it turns out to be related to the version of matplotlib.
Answered by a matplotlib developer, Jens Nielsen:
This doesn't occur on matplotlib version 1.5.1.
In version 2 beta 1, it seems that the FITS data comes back as big-endian float64 (numpy dtype '>f8'), which this matplotlib version mishandles. Please look at the following link:
https://gist.github.com/jenshnielsen/86d4a86d8f667fadddc09f88c5fb87e6
The issue has been posted and you can look it up over here:
https://github.com/matplotlib/matplotlib/issues/6671
In the meantime, to get the same plots for Z and Z_read, we should use the following code (for matplotlib 2 beta 1):
imshow(Z_read.astype('float64'), origin = "lower")
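An equivalent workaround (a sketch of my own, not part of the original answer) is to byte-swap the array to the platform's native order before plotting, which avoids hard-coding a particular float width:
# convert the big-endian FITS array to native byte order before plotting
Z_native = Z_read.astype(Z_read.dtype.newbyteorder('='))
imshow(Z_native, origin = "lower")
colorbar()
show()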

Related

Interactive plot of larger-than-memory binary data file

I have larger-than-memory uniform (regularly gridded) 2d binary data which I am trying to interactively plot using any combination of Dask, Datashader and Holoviews. I am open to using other python-based tools, but the internet has led me to these ones for now.
The data files are ~11 GB and consist of a (600000, 4800) array of float32s.
I want to plot them at a different aspect ratio (1000x1000 px), and have a callback handle the data loading/shading on zoom/pan. I am serving to a browser instead of using notebooks.
Within a 1000x1000px datashader canvas I have plotted:
4800x4800 points (which filled the canvas)
600000x4800 points (which filled only the bottom few pixels of the canvas, since the colored pixels had an aspect ratio of 600000/4800)
Neither was interactive.
What I have so far, using Python 3.10, is:
import numpy as np
import datashader as ds
from datashader import transfer_functions as tf
import xarray as xr
import holoviews as hv
import panel as pn
hv.extension('bokeh', logo=False)
hv.output(backend="bokeh")
filename = 'path/to/binary/datafile'
arr = np.memmap(filename, shape=(4800,600000), offset=0, dtype=np.dtype("f4"), mode='r')
arr = xr.DataArray(arr, dims=("x", "y"), coords={'x': np.arange(4800), "y": np.arange(600000)})
cvs = ds.Canvas(plot_width=1000, plot_height=1000, x_range=(0, 4800), y_range=(0, 4800))
# the following line works too but does not fill the canvas
# cvs = ds.Canvas(plot_width=1000, plot_height=1000, x_range=(0, 4800), y_range=(0, 600000))
agg = cvs.raster(arr)
sh = tf.shade(agg)
pn.Row(sh).show()
Any advice is appreciated!
I'm not sure precisely what the ask is here, but the HoloViz way of approaching this problem would be to use dask without .persist() or .compute(). The np.memmap approach may also work.
And then you'd use holoviews as described at https://examples.pyviz.org/census/census.html, or hvplot as described at https://hvplot.holoviz.org. Without having the actual data or a synthesized version of it, it's hard to be more specific than that.
BTW, I think you have x and y switched in your x_range and y_range above: a NumPy shape of (4800, 600000) corresponds to a y_range of (0, 4800) and an x_range of (0, 600000), since NumPy shapes are (row, column) while row maps to y and column maps to x.
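A minimal sketch of that dask route, assuming the file layout matches the memmap above (the chunk size and plot dimensions here are illustrative, not prescribed):
import numpy as np
import dask.array as da
import xarray as xr
import hvplot.xarray  # registers the .hvplot accessor on xarray objects
import panel as pn
filename = 'path/to/binary/datafile'
mm = np.memmap(filename, shape=(4800, 600000), dtype='f4', mode='r')
arr = xr.DataArray(
    da.from_array(mm, chunks=(4800, 10000)),  # lazy, chunked view; nothing loads yet
    dims=('y', 'x'),  # rows are y, columns are x
    coords={'y': np.arange(4800), 'x': np.arange(600000)},
)
# rasterize=True hands re-aggregation to datashader on every zoom/pan
plot = arr.hvplot.image(x='x', y='y', rasterize=True, width=1000, height=1000)
pn.Row(plot).show()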

Plot PDF of Pareto distribution in Python

I have a specific Pareto distribution. For example,
Pareto(beta=0.00317985, alpha=0.147365, gamma=1.0283)
which I obtained from this answer and now I want to plot a graph of its Probability Density Function (PDF) in matplotlib. So I believe that the x-axis will be all positive real numbers, and the y-axis will be the same.
How exactly can I obtain the appropriate PDF information and plot it? Programmatically obtaining the mathematical PDF function or coordinates is a requirement for this question.
UPDATE:
The drawPDF method returns a Graph object that contains coordinates for the PDF. However, I don't know how to access these coordinates programmatically. I certainly don't want to convert the object to a string or use a regex to pull out the information:
In [45]: pdfg = distribution.drawPDF()
In [46]: pdfg
Out[46]: class=Graph name=pdf as a function of X0 implementation=class=GraphImplementation name=pdf as a function of X0 title= xTitle=X0 yTitle=PDF axes=ON grid=ON legendposition=topright legendFontSize=1 drawables=[class=Drawable name=Unnamed implementation=class=Curve name=Unnamed derived from class=DrawableImplementation name=Unnamed legend=X0 PDF data=class=Sample name=Unnamed implementation=class=SampleImplementation name=Unnamed size=129 dimension=2 data=[[-1610.7,0],[-1575.83,0],[-1540.96,0],[-1506.09,0],[-1471.22,0],[-1436.35,0],[-1401.48,0],[-1366.61,0],...,[-1331.7,6.95394e-06],[2852.57,6.85646e-06]] color=red fillStyle=solid lineStyle=solid pointStyle=none lineWidth=2]
I assume that you want to perform different tasks:
To plot the PDF
To compute the PDF at a single point
To compute the PDF for a range of values
Each of these tasks requires a different script. Let me detail them.
I first create the Pareto distribution:
import openturns as ot
import numpy as np
beta = 0.00317985
alpha = 0.147365
gamma = 1.0283
distribution = ot.Pareto(beta, alpha, gamma)
print("distribution", distribution)
To plot the PDF, use the drawPDF() method. This creates an ot.Graph which can be viewed directly in a Jupyter Notebook or in IPython. We can force the creation of the plot with View:
import openturns.viewer as otv
graph = distribution.drawPDF()
otv.View(graph)
This produces the PDF plot.
To compute the PDF at a single point, use computePDF(x), where x is an ot.Point. This can also be a Python list, tuple, or 1D numpy array, as the conversion is automatically managed by OpenTURNS:
x = 500.0
y = distribution.computePDF(x)
print("y=", y)
The previous script prints:
y= 5.0659235352823877e-05
To compute the PDF for a range of values, we can use computePDF(x), where x is an ot.Sample. This can also be a Python list of lists or a 2D numpy array, as the conversion is automatically managed by OpenTURNS.
x = ot.Sample([[v] for v in np.linspace(0.0, 1000.0)])
y = distribution.computePDF(x)
print("y=", y)
The previous script prints:
y=
0 : [ 0 ]
1 : [ 0.00210511 ]
[...]
49 : [ 2.28431e-05 ]
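As for the UPDATE: the coordinates stored in the Graph can be retrieved programmatically, with no string parsing. A minimal sketch, assuming OpenTURNS' Drawable.getData() accessor behaves as in recent versions:
import matplotlib.pyplot as plt
graph = distribution.drawPDF()
curve = graph.getDrawable(0)  # the single PDF curve in the graph
xy = np.array(curve.getData())  # shape (n, 2): column 0 is x, column 1 is the PDF
plt.plot(xy[:, 0], xy[:, 1])
plt.xlabel('x')
plt.ylabel('PDF')
plt.show()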

understanding pyresample to regrid irregular grid data to a regular grid

I need to regrid data on an irregular grid (Lambert conformal conic) to a regular grid. I think pyresample is my best bet. In fact my original lat, lon are not 1D (which seems to be needed by basemap.interp or scipy.interpolate.griddata).
I found this SO answer helpful. However, I get empty interpolated data. I think it has to do with the choice of my radius of influence and with the fact that my data are wrapped (??).
This is my code:
import numpy as np
from matplotlib import pyplot as plt
import netCDF4
%matplotlib inline
url = "http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR/Dailies/monolevel/hlcy.2009.nc"
SRHtemp = netCDF4.Dataset(url).variables['hlcy'][0,::]
Y_n = netCDF4.Dataset(url).variables['y'][:]
X_n = netCDF4.Dataset(url).variables['x'][:]
T_n = netCDF4.Dataset(url).variables['time'][:]
lat_n = netCDF4.Dataset(url).variables['lat'][:]
lon_n = netCDF4.Dataset(url).variables['lon'][:]
lat_n and lon_n are irregular 2D arrays: the latitude and longitude corresponding to the projected coordinates x, y.
Because of the way lon_n is, I added:
lon_n[lon_n<0] = lon_n[lon_n<0]+360
so that now if I plot them they look fine.
Then I create my new set of regular coordinates:
XI = np.arange(148,360)
YI = np.arange(0,87)
XI, YI = np.meshgrid(XI,YI)
Following the answer above I wrote the following code:
from pyresample.geometry import SwathDefinition
from pyresample.kd_tree import resample_nearest
def_a = SwathDefinition(lons=XI, lats=YI)
def_b = SwathDefinition(lons=lon_n, lats=lat_n)
interp_dat = resample_nearest(def_b,SRHtemp,def_a,radius_of_influence = 70000,fill_value = -9.96921e+36)
The resolution of the data is about 30 km, so I used 70 km. The fill_value I put is the one from the data, but of course I could just use zero or NaN.
However, I get an empty array.
What am I doing wrong? Also, if there is another way of doing it, I am interested in knowing it. The pyresample documentation is a bit thin, and I need a bit more help.
I did find this answer suggesting to use another griddata function:
import matplotlib.mlab as ml
resampled_data = ml.griddata(lon_n.ravel(), lat_n.ravel(),SRHtemp.ravel(),XI,YI,interp = "linear")
and it seems to work.
But I would like to understand more about pyresample, since it seems so powerful.
The problem is that XI and YI are integers, not floats. You can fix this by simply doing:
XI = np.arange(148,360.)
YI = np.arange(0,87.)
XI, YI = np.meshgrid(XI,YI)
The inability to handle integer datatypes is an undocumented, unintuitive, and possibly buggy behavior from pyresample.
A few more notes on your coding style:
It's not necessary to overwrite the XI and YI variables; you don't gain much by that.
You should load the netCDF dataset only once and then access all the variables via that object (see the sketch below).
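A minimal sketch of that pattern, reusing the question's own variables:
import netCDF4
url = "http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR/Dailies/monolevel/hlcy.2009.nc"
nc = netCDF4.Dataset(url)  # open the dataset once
SRHtemp = nc.variables['hlcy'][0, ::]
lat_n = nc.variables['lat'][:]
lon_n = nc.variables['lon'][:]
nc.close()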

Collapsing / Flattening a FITS data cube in python

I've looked all over the place and am not finding a solution to this issue. I feel like it should be fairly straightforward, but we'll see.
I have a .FITS format data cube and I need to collapse it into a 2D FITS image. The data cube has two spatial dimensions and one spectral/velocity dimension.
Just looking for a simple python routine to load in the cube and flatten all these layers (i.e. integrate them along the spectral/velocity axis). Thanks for any help.
This tutorial on pyfits is a little old, but still basically correct. The key point is that opening a FITS cube with pyfits (or astropy.io.fits) gives you a three-dimensional numpy array.
import pyfits
# if you are using astropy then for this example
# from astropy.io import fits as pyfits
data_cube, header_data_cube = pyfits.getdata("data_cube.fits", 0, header=True)
data_cube.shape
# (Z, X, Y)
You then have to decide how to flatten/integrate the cube along the Z axis, and there are plenty of resources out there to help you choose the right method (hopefully grounded in some analysis framework) to do that.
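For instance, a plain sum over the spectral axis is one line of numpy (a minimal illustration, not a recommendation of any particular integration scheme):
import numpy as np
flattened = np.nansum(data_cube, axis=0)  # collapse the Z (spectral) axis
flattened.shape
# (X, Y)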
OK, this seems to work:
import pyfits
import numpy as np
hdulist = pyfits.open(filename)
header = hdulist[0].header
data = hdulist[0].data
data = np.nan_to_num(data)
new_data = data[0]
for i in range(1, 84):  # this depends on the number of layers or pages
    new_data += data[i]
hdu = pyfits.PrimaryHDU(new_data)
hdu.writeto(new_filename)
One problem with this routine is that WCS coordinates (which are attached to the original data cube) are lost during this conversion.
This is a bit of an old question, but spectral-cube now provides a better solution for this.
Example, based on Teachey's answer:
from spectral_cube import SpectralCube
cube = SpectralCube.read(filename)
summed_image = cube.sum(axis=0)
summed_image.hdu.writeto(new_filename)
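If you also want to keep physical units along with the celestial WCS, spectral-cube's moment maps are the usual route; a minimal sketch (assuming the spectral-cube moment API, which integrates along the spectral axis by default):
from spectral_cube import SpectralCube
cube = SpectralCube.read(filename)
moment0 = cube.moment(order=0)  # integral along the spectral axis, WCS preserved
moment0.write(new_filename)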

Issues with 2D-Interpolation in Scipy

In my application, the data is sampled on a distorted grid, and I would like to resample it to an undistorted grid. In order to test this, I wrote this program with example distortions and a simple function as data:
from __future__ import division
import numpy as np
import scipy.interpolate as intp
import pylab as plt
# Defining some variables:
quadratic = -3/128
linear = 1/16
pn = np.poly1d([quadratic, linear,0])
pixels_x = 50
pixels_y = 30
frame = np.zeros((pixels_x,pixels_y))
x_width = np.concatenate((np.linspace(8,7.8,57), np.linspace(7.8,8,max(pixels_y-57,0))))  # max() keeps the second piece empty when pixels_y < 57 (older numpy tolerated a negative count)
def data(x,y):
    z = y*(np.exp(-(x-5)**2/3) + np.exp(-(x)**2/5) + np.exp(-(x+5)**2))
    return(z)
# Generating grid coordinates
yt = np.arange(380,380+pixels_y*4,4)
xt = np.linspace(-7.8,7.8,pixels_x)
X, Y = np.meshgrid(xt,yt)
Y=Y.T
X=X.T
Y_m = np.zeros((pixels_x,pixels_y))
X_m = np.zeros((pixels_x,pixels_y))
# generating distorted grid coordinates:
for i in range(pixels_y):
    Y_m[:,i] = Y[:,i] - pn(xt)
    X_m[:,i] = np.linspace(-x_width[i],x_width[i],pixels_x)
# Sample data:
for i in range(pixels_y):
    for j in range(pixels_x):
        frame[j,i] = data(X_m[j,i],Y_m[j,i])
Y_m = Y_m.flatten()
X_m = X_m.flatten()
frame = frame.flatten()
##
Y = Y.flatten()
X = X.flatten()
ipf = intp.interp2d(X_m,Y_m,frame)
interpolated_frame = ipf(xt,yt)
At this point, I have two questions:
The code works, but I get the following warning:
Warning: No more knots can be added because the number of B-spline coefficients
already exceeds the number of data points m. Probably causes: either
s or m too small. (fp>s)
kx,ky=1,1 nx,ny=54,31 m=1500 fp=0.000006 s=0.000000
Also, some interpolation artifacts appear, and I assume that they are related to the warning. Do you know what I am doing wrong?
For my actual applications, the frames need to be around 500x100, but when doing this, I get a MemoryError. Is there something I can do about that, apart from splitting the frame into several parts?
Thanks!
This problem is most likely related to the use of bisplrep and bisplev within interp2d. The docs mention that interp2d uses a smoothing factor of s=0.0 and that bisplrep and bisplev should be used directly if more control over s is needed. The related docs mention that s should be found in the range (m - sqrt(2*m), m + sqrt(2*m)), where m is the number of points used to construct the splines. I had a similar problem and found it solved when using bisplrep and bisplev directly, where s is optional.
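A minimal sketch of that direct route, reusing the flattened arrays from the question (the choice of s here is a starting guess at the lower end of the suggested range, not a recommendation):
m = X_m.size  # number of scattered sample points
s = m - np.sqrt(2 * m)  # lower end of the suggested range for s
tck = intp.bisplrep(X_m, Y_m, frame, s=s)
interpolated_frame = intp.bisplev(xt, yt, tck)  # evaluate on the regular grid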
For 2D interpolation, scipy.interpolate.griddata is solid, local, and fast.
Take a look at problem-with-2d-interpolation-in-scipy-non-rectangular-grid on SO.
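A minimal sketch with the question's arrays (method='linear' is one of several options griddata accepts):
from scipy.interpolate import griddata
points = np.column_stack((X_m, Y_m))  # scattered sample locations
XI, YI = np.meshgrid(xt, yt)  # target regular grid
interpolated = griddata(points, frame, (XI, YI), method='linear')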
You might want to look at the following interp method in basemap:
mpl_toolkits.basemap.interp
http://matplotlib.sourceforge.net/basemap/doc/html/api/basemap_api.html
unless you really need spline-based interpolation.
