Interactive plot of larger-than-memory binary data file - python

I have larger-than-memory uniform (regularly gridded) 2d binary data which I am trying to interactively plot using any combination of Dask, Datashader and Holoviews. I am open to using other python-based tools, but the internet has led me to these ones for now.
The data files are ~11 GB and consist of a (600000, 4800) array of float32s.
I want to plot them at a different aspect ratio (1000x1000 px), and have a callback handle the data loading/shading on zoom/pan. I am serving to a browser instead of using notebooks.
Within a 1000x1000px datashader canvas I have plotted:
4800x4800 points (which filled the canvas)
600000x4800 points (which filled only the bottom few pixels of the canvas, since the colored pixels had an aspect ratio of 600000/4800)
Neither were interactive.
What I have so far, using Python 3.10, is:
import numpy as np
import datashader as ds
from datashader import transfer_functions as tf
import xarray as xr
import holoviews as hv
import panel as pn
hv.extension('bokeh', logo=False)
hv.output(backend="bokeh")
filename = 'path/to/binary/datafile'
arr = np.memmap(filename, shape=(4800,600000), offset=0, dtype=np.dtype("f4"), mode='r')
arr = xr.DataArray(arr, dims=("x", "y"), coords={'x': np.arange(4800), "y": np.arange(600000)})
cvs = ds.Canvas(plot_width=1000, plot_height=1000, x_range=(0, 4800), y_range=(0, 4800))
# the following line works too but does not fill the canvas
# cvs = ds.Canvas(plot_width=1000, plot_height=1000, x_range=(0, 4800), y_range=(0, 600000))
agg = cvs.raster(arr)
sh = tf.shade(agg)
pn.Row(sh).show()
Any advice is appreciated!

I'm not sure precisely what the ask is here, but the HoloViz way of approaching this problem would be to use dask without .persist() or .compute(). The np.memmap approach may also work.
And then you'd use holoviews as described at https://examples.pyviz.org/census/census.html, or hvplot as described at https://hvplot.holoviz.org . Without having the actual data or a synthesized version of it, it's hard to be more specific than that.
BTW, I think you have x and y switched in your x_range and y_range above, since a NumPy shape of (4800, 600000) corresponds to a y_range of (0, 4800) and an x_range of (0, 600000) (NumPy shapes are (rows, columns), and rows map to y while columns map to x).
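To make that concrete, here is a minimal sketch of the lazy-loading approach; the chunk size, the axis naming, and the use of hvplot are my assumptions rather than anything from the question:
import numpy as np
import dask.array as da
import xarray as xr
import hvplot.xarray  # noqa: registers the .hvplot accessor on xarray objects
import panel as pn
filename = 'path/to/binary/datafile'
# memory-map the file, then wrap it in a chunked dask array so nothing is read
# until datashader actually needs it
mm = np.memmap(filename, shape=(4800, 600000), dtype=np.dtype('f4'), mode='r')
darr = da.from_array(mm, chunks=(4800, 10000))
# rows (first axis, length 4800) map to y; columns (length 600000) map to x
arr = xr.DataArray(darr, dims=('y', 'x'),
                   coords={'y': np.arange(4800), 'x': np.arange(600000)})
# rasterize=True hands aggregation to datashader, which re-renders on zoom/pan
plot = arr.hvplot.image('x', 'y', rasterize=True, width=1000, height=1000)
pn.panel(plot).show()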

Related

How to cut vertices and faces connected to points lower than some value in pyvista?

So when one exports r.out.vtk from GRASS GIS we get a bad surface with -99999 points instead of nulls:
I want to remove them, yet a simple clip is not enough:
import pyvista as pv
pd = pv.read('./pid1.vtk')
pd = pd.clip((0, 1, 1), invert=False).extract_surface()
p = pv.Plotter()
p.add_mesh(pd)  # add atoms to scene
p.show()
resulting in:
So I wonder how to keep only the top (> -999) points and the vertices connected to them, in order to get just the top plane (it is actually curved, not flat), using pyvista?
link to example .vtk
There is an easy way to do this and there isn't...
You could use pyvista's threshold filter with all_scalars=True as long as you have only one set of scalars:
import pyvista as pv
pd = pv.read('./pid1.vtk')
pd = pd.threshold(-999, all_scalars=True)
plotter = pv.Plotter()
plotter.add_mesh(pd) #add atoms to scene
plotter.show()
Since all_scalars filters based on every scalar array, this will only do what you'd expect if there are no other scalars. Unfortunately, there also seems to be a bug in pyvista (expected to be fixed in version 0.32.0) which makes the use of this keyword impossible.
What you can do in the meantime (if you don't want to use pyvista's main branch before the fix is released) is to threshold the data yourself using numpy:
import pyvista as pv
pd = pv.read('./pid1.vtk')
scalars = pd.active_scalars
keep_inds = (scalars > -999).nonzero()[0]
pd = pd.extract_points(keep_inds, adjacent_cells=False)
plotter = pv.Plotter()
plotter.add_mesh(pd) #add atoms to scene
plotter.show()
The main point of both all_scalars (in threshold) and adjacent_cells (in extract_points) is to only keep cells where every point satisfies the condition.
With both of the above I get the following figure using your data:

Bilinear interpolation from structured grid to unstructured grid (arbitrary points)

I need to bilinearly interpolate some air data from an hdf4/netcdf4/hdf5 file, from a 240x240 structured grid onto an arbitrary collection of coordinates. I have no idea how to do this. I have tried using pyresample, but that needs an AreaDefinition for the target grid, which is not possible in my case of unstructured target data (arbitrary points). Here is my code:
import numpy as np
import pyresample
from netCDF4 import Dataset
air_file = Dataset('air.hdf', mode='r')
air_data = air_file.variables['air_2m'][:].flatten()
air_lon = air_file.variables['air_lon'][:].flatten()
air_lat = air_file.variables['air_lat'][:].flatten()
air_data = air_data.reshape(240, 240)
air_lon = air_lon.reshape(240, 240)  # grid size is 240x240
air_lat = air_lat.reshape(240, 240)
tar_lon = 100 * np.random.random((100, 1))  # random points
tar_lat = 100 * np.random.random((100, 1))  # random points
source_def = pyresample.geometry.SwathDefinition(lons=air_lon, lats=air_lat)
target_def = pyresample.geometry.SwathDefinition(lons=tar_lon, lats=tar_lat)
result = pyresample.bilinear.resample_bilinear(air_data, source_def, target_def, radius=50e3, neighbours=32, nprocs=1, fill_value=None, reduce_data=True, segments=None, epsilon=0)
I am getting the following error (which is understood as it needs an AreaDefinition for target):
AttributeError: 'SwathDefinition' object has no attribute 'proj_str'
Is there any other way of doing this?
I'm not familiar with the pyresample package, but for bilinear interpolation in python I suggest referring to this earlier stackexchange thread which gives a number of useful examples:
How to perform bilinear interpolation in Python
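If the source grid is genuinely regular in lon/lat, a minimal sketch with scipy is below; the axis vectors and ranges are stand-ins I made up, and I assume the data array is indexed as (lat, lon):
import numpy as np
from scipy.interpolate import RegularGridInterpolator
# 1-D axes of the regular 240x240 source grid, plus the 2-D data field
lons = np.linspace(0, 100, 240)
lats = np.linspace(0, 100, 240)
air_data = np.random.rand(240, 240)  # stand-in for the real field
# arbitrary target points
tar_lon = 100 * np.random.random(100)
tar_lat = 100 * np.random.random(100)
# bilinear interpolation onto the scattered target points
interp = RegularGridInterpolator((lats, lons), air_data, method='linear')
result = interp(np.column_stack([tar_lat, tar_lon]))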
P.S.: if anyone wants to perform this task from the command line, you can also extract a set of points using bilinear interpolation with cdo:
# inside some bash loop over pairs of x (lon) and y (lat)
cdo remapbil,lon=${x}/lat=${y} in.nc mypoint_${x}_${y}.nc

IPython FITS file plotting gives different results

I am facing a problem while running a script (please find the code below).
I am trying to plot an array of values, write it to a FITS file, read it back again, and plot it again, but I don't get the same plots!
If you could please help me with this it would be great.
The following are the versions of my packages and compiler:
matplotlib : '2.0.0b1'
numpy : '1.11.0'
astropy : u'1.1.2'
python : 2.7
Sincerely,
Anik Halder
import numpy as np
from pylab import *
from astropy.io import fits
# Just making a 10x10 meshgrid
x = np.arange(10)
X , Y = np.meshgrid(x,x)
# finding the distance of different points on the meshgrid from a point suppose at (5,5)
Z = ((X-5)**2 + (Y-5)**2)**0.5
# plotting Z (see image [link below] - left one)
imshow(Z, origin = "lower")
colorbar()
show()
# writing the Z data into a fits file
fits.writeto("my_file.fits", Z)
# reading the same fits file and storing the data
Z_read = fits.open("my_file.fits")[0].data
# plotting Z_read : we expect it to show the same plot as before
imshow(Z_read, origin = "lower")
colorbar()
show()
# Lo! That's not the case for me! It's not the same plot! (see image - right one)
# Hence, I try to check whether the values stored in Z and Z_read are different..
print Z - Z_read
# No! It returns an array full of zeros! This means Z and Z_read are the same! I don't get why the plots look different!
Please find the image in this link: http://imgur.com/1TklSjU
It turns out that this is to do with the version of matplotlib.
Answered by a developer of matplotlib - Jens Nielsen
This doesn't occur on matplotlib version 1.5.1.
In version 2 beta 1, it seems that the data read back from the FITS file comes out as big-endian floats ('>f8'), which this matplotlib version does not handle correctly. Please look at the following link:
https://gist.github.com/jenshnielsen/86d4a86d8f667fadddc09f88c5fb87e6
The issue has been posted and you can look it up over here:
https://github.com/matplotlib/matplotlib/issues/6671
In the meantime, to get the same plots for Z and Z_read (on matplotlib 2 beta 1), use the following instead:
imshow(Z_read.astype('float64'), origin = "lower")
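A quick way to confirm that only the byte order differs (a sketch; the '>f8' dtype is what I would expect astropy to return, not something shown in the question):
print(Z.dtype)       # native float64
print(Z_read.dtype)  # big-endian '>f8'
# an alternative workaround: cast back to the native byte order explicitly
Z_native = Z_read.astype(Z_read.dtype.newbyteorder('='))
imshow(Z_native, origin="lower")
show()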

Splicing image array (FITS file) using coordinates from header

I am trying to splice a FITS array based on the latitudes provided in the header. However, I cannot seem to do so with my knowledge of Python and the documentation of astropy. The code I have is something like this:
from astropy.io import fits
import numpy as np
Wise1 = fits.open('Image1.fits')
im1 = Wise1[0].data
im1 = np.where(im1 > latitude1, 0, im1)
newhdu = fits.PrimaryHDU(im1)
newhdulist = fits.HDUList([newhdu])
newhdulist.writeto('1b1_Bg_Removed_2.fits')
Here latitude1 would be a value in degrees, read from the header. So there are two things I need to accomplish:
How do I read the Galactic latitudes from the header?
Splice the array in such a way that it only contains values for the range of latitudes, with everything else being 0.
I think by "splice" you mean "cut out" or "crop", based on the example you've shown.
astropy.nddata has a routine for world-coordinate-system-based (i.e., lat/lon or ra/dec) cutouts
However, in the simple case you're dealing with, you just need the coordinates of each pixel. Do this by making a WCS:
from astropy import wcs
w = wcs.WCS(Wise1[0].header)
# np.indices returns (row, column) index grids; rows map to y and columns to x
yy, xx = np.indices(im1.shape)
lon, lat = w.wcs_pix2world(xx, yy, 0)
# boolean indexing keeps only the pixels above the latitude cut (as a 1-D array);
# use np.where(lat > my_lowest_latitude, im1, 0) to keep the 2-D shape with zeros elsewhere
newim = im1[lat > my_lowest_latitude]
But if you want to preserve the header information, you're much better off using the cutout tool, since you then do not have to manually manage this.
from astropy.nddata import Cutout2D
from astropy import coordinates
from astropy import units as u
# example coordinate - you'll have to figure one out that's in your map
center = coordinates.SkyCoord(mylon*u.deg, mylat*u.deg, frame='fk5')  # use frame='galactic' if your map is in Galactic coordinates
# then make an array cutout
co = Cutout2D(im1, center, size=[0.1, 0.2]*u.arcmin, wcs=w)
# create a new FITS HDU
hdu = fits.PrimaryHDU(data=co.data, header=co.wcs.to_header())
# write to disk
hdu.writeto('cropped_file.fits')
An example use case is in the astropy documentation.

Collapsing / Flattening a FITS data cube in python

I've looked all over the place and am not finding a solution to this issue. I feel like it should be fairly straightforward, but we'll see.
I have a .FITS format data cube and I need to collapse it into a 2D FITS image. The data cube has two spatial dimensions and one spectral/velocity dimension.
Just looking for a simple python routine to load in the cube and flatten all these layers (i.e. integrate them along the spectral/velocity axis). Thanks for any help.
This tutorial on pyfits is a little old, but still basically correct. The key point is that opening a FITS cube with pyfits (or astropy.io.fits) gives you a 3-dimensional numpy array.
import pyfits
# if you are using astropy then for this example
# from astropy.io import fits as pyfits
data_cube, header_data_cube = pyfits.getdata("data_cube.fits", 0, header=True)
data_cube.shape
# (Z, X, Y)
You then have to decide how to flatten/integrate the cube along the Z axis, and there are plenty of resources out there to help you decide the right way (hopefully grounded in some analysis framework) to do that.
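For a plain sum over the spectral axis, a minimal sketch (assuming the spectral axis is axis 0, as in the shape comment above):
import numpy as np
# integrate (here: simply sum) the cube along the spectral/velocity axis,
# treating NaNs as zero
flattened = np.nansum(data_cube, axis=0)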
OK, this seems to work:
import pyfits
import numpy as np
hdulist = pyfits.open(filename)
header = hdulist[0].header
data = hdulist[0].data
data = np.nan_to_num(data)
new_data = data[0]
for i in range(1, 84):  # this depends on number of layers or pages
    new_data += data[i]
hdu = pyfits.PrimaryHDU(new_data)
hdu.writeto(new_filename)
One problem with this routine is that WCS coordinates (which are attached to the original data cube) are lost during this conversion.
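If you do want to carry the celestial part of the WCS into the output, one option is a sketch like the following (assuming astropy is available; WCS.celestial drops the non-celestial axes):
from astropy import wcs
# build a 2-D header from the cube's WCS, keeping only the celestial axes
w2d = wcs.WCS(header).celestial
hdu = pyfits.PrimaryHDU(new_data, header=w2d.to_header())
hdu.writeto(new_filename)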
This is a bit of an old question, but spectral-cube now provides a better solution for this.
Example, based on Teachey's answer:
from spectral_cube import SpectralCube
cube = SpectralCube.read(filename)
summed_image = cube.sum(axis=0)
summed_image.hdu.writeto(new_filename)
