Displaying a large .dat binary file in python


I have a large 40 MB .dat file (about 173,397 lines) filled with binary data (it shows up as random symbols). It is an astronomical photograph. I need to read and display it with Python. I am using a binary file because I will need to extract pixel values from specific regions of the image, but for now I just need to ingest it into Python, something like the READU procedure in IDL. I tried numpy and matplotlib, but nothing worked. Suggestions?

You need to know the data type and dimensions of the binary file. For example, if the file contains 64-bit float data (dtype=float means float64), use numpy.fromfile like:
import numpy as np
data = np.fromfile(filename, dtype=float)
Then reshape the array to the dimensions of the image, dims, using numpy.reshape (the equivalent of REFORM in IDL):
im = np.reshape(data, dims)
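A slightly fuller sketch of the same idea, which also covers extracting a region and displaying the image. Everything about the file layout here is an assumption you must replace with the real values: the default dtype ">f4" (big-endian float32), the header size (offset) and the (rows, cols) shape. np.memmap is used instead of np.fromfile so that a sub-region can be sliced without reading the whole 40 MB into memory; matplotlib is only imported inside show_image, so reading works without it.

```python
import numpy as np


def read_raw_image(path, shape, dtype=">f4", offset=0):
    """Memory-map a headerless (or fixed-header) binary image as a 2-D array.

    shape:  (rows, cols) of the image; must match the file's layout
    dtype:  ">f4" = big-endian float32, an assumption; check what wrote the file
    offset: number of header bytes to skip before the pixel data
    """
    # mode="r" maps the file read-only; pixel bytes are read only when accessed
    return np.memmap(path, dtype=dtype, mode="r", offset=offset, shape=shape)


def extract_region(image, row0, row1, col0, col1):
    """Copy a rectangular sub-region out of the (memory-mapped) image."""
    return np.array(image[row0:row1, col0:col1])


def show_image(image):
    """Display the image in grayscale with a colour bar."""
    import matplotlib.pyplot as plt

    # origin="lower" puts pixel (0, 0) at the bottom left, as is common for sky images
    plt.imshow(image, cmap="gray", origin="lower")
    plt.colorbar()
    plt.show()
```

Usage would be along the lines of im = read_raw_image("image.dat", shape=(rows, cols)) followed by show_image(im), where the file name and dimensions are placeholders. If the picture looks like noise, the dtype or byte order is probably wrong; trying "<f4", ">i2" or "<i2" is a cheap way to narrow it down.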

Related

something is missing when writing data into new tif file with gdal

I have problems writing data into a .tif file with the gdal module in Python.
I want to extract data (a numpy array) from a tif file and modify some of its values before saving it into a new file, and the new file should work normally. I use the following script:
from osgeo import gdal
import numpy as np

tif = gdal.Open('data/pre_heilj_mean90_15.tif')  # original tif file
imwidth = tif.RasterXSize
imheight = tif.RasterYSize
data = tif.ReadAsArray()  # 2-D array for a single-band file
data[100, 100] = 100  # modify one value
data = data.astype(np.float32)
driver = gdal.GetDriverByName("GTiff")
dataset = driver.Create('data/res.tif', imwidth, imheight, 1, gdal.GDT_Float32)
dataset.SetSpatialRef(tif.GetSpatialRef())
dataset.SetGeoTransform(tif.GetGeoTransform())
dataset.SetProjection(tif.GetProjection())
dataset.GetRasterBand(1).WriteArray(data)
dataset.FlushCache()
dataset = None  # closing the dataset finishes writing the file
data = None
tif = None
I am certain that the data in the original tif file is 2-D and of type float32.
However, the new tif file (res.tif) is all black in ArcMap (screenshot: res.tif), while the original tif file displays normally there (screenshot: original tif file).
The sizes of the two files also differ a lot: the original is 5287 KB and the new one is 4633 KB.
I want to know what is going wrong. (Please forgive my poor English.)

You probably forgot to write the nodata value into the metadata of the output file. The fact that it's "black" is probably just due to stretching: if you stretch the output similarly to the original (min = ~406), it should look similar.
For example, get the nodata value from the input band with:
nodata_value = tif.GetRasterBand(1).GetNoDataValue()
Then assign it to the output band with:
dataset.GetRasterBand(1).SetNoDataValue(nodata_value)
Keep in mind that this is a property of a band, so multiple bands in a single file can potentially have different nodata values.
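Because nodata is only metadata, the pixel values themselves are the same either way; what changes is whether software counts the nodata pixels when computing statistics such as the min/max used for stretching. A minimal numpy sketch of excluding them, using a hypothetical nodata value of -9999 and made-up pixel values (in practice use the value returned by GetNoDataValue(), which is None when the band has none set, and may be NaN):

```python
import numpy as np


def valid_stats(band, nodata):
    """Return (min, max) of a band, ignoring pixels flagged as nodata."""
    if nodata is None:
        # No nodata value set: every pixel counts as real data
        return float(band.min()), float(band.max())
    if np.isnan(nodata):
        # NaN never compares equal to itself, so mask invalid values instead
        masked = np.ma.masked_invalid(band)
    else:
        masked = np.ma.masked_equal(band, nodata)  # hide nodata pixels
    return float(masked.min()), float(masked.max())
```

With band = np.array([[-9999, 406], [512, 1000]], dtype=np.float32), valid_stats(band, None) gives (-9999.0, 1000.0), while valid_stats(band, -9999) gives (406.0, 1000.0).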

How to save python array as raw image file

I have a numpy array of size 2592 x 1944, and I want to save it in a raw file format like ".nef" or ".raw". I wonder if there is any package or sample code to do this.
I have seen people ask a related question here: How to save numpy array as .raw image?
Someone suggested using RawPy. However, after looking at RawPy, I think it only allows opening a raw image and returning the m x n array, which is exactly the opposite of what I want to do (save an m x n array as a raw image).
Thank you so much for helping!
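If a plain headerless ".raw" dump is acceptable, numpy can write and read the bytes directly with ndarray.tofile and np.fromfile. This does not produce a camera raw format such as .nef, which has its own proprietary structure and metadata. A minimal sketch, assuming 16-bit unsigned little-endian pixels; since the file contains nothing but pixel bytes, whoever reads it must already know the dtype and shape:

```python
import numpy as np


def save_raw(array, path, dtype="<u2"):
    """Write the pixels as headerless little-endian uint16 bytes, row-major.

    Values are cast to dtype, so they must already fit in 0..65535.
    """
    np.ascontiguousarray(array, dtype=dtype).tofile(path)


def load_raw(path, shape, dtype="<u2"):
    """Read the bytes back; shape and dtype must be supplied by the caller."""
    return np.fromfile(path, dtype=dtype).reshape(shape)
```

A 1944 x 2592 uint16 array written this way takes exactly 1944 * 2592 * 2 bytes on disk.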

How to effectively store a very large list in python

Question: I have a large collection of 3D images that I would like to store in one file. How can I do this efficiently?
Background: The dataset has about 1,000 3D MRI images, each 256 by 256 by 156. To avoid opening and closing files frequently, I was trying to store all of them in one big list and export it.
So far I have tried reading each MRI as a 3D numpy array and appending it to a list. When I tried to save it using numpy.save, it consumed all my memory and exited with "Memory Error".
Here is the code i tried:
import numpy as np
import nibabel as nib
import os
file_list = os.listdir('path/to/files')
data = []
for file in file_list:
    mri = nib.load(os.path.join('path/to/files', file))
    mri_array = np.array(mri.dataobj)
    data.append(mri_array)
np.save('imported.npy', data)
Expected Outcome:
Is there a better way to store such dataset without consuming too much memory?

Using the HDF5 file format or NumPy's memmap are the two options I would go to first if you want to jam all your data into one file. Neither option needs all the data in memory at once.
Python has the h5py package to handle HDF5 files. HDF5 files have a lot of features, and I would generally lean toward this option. It would look something like this:
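import h5py
# mode 'w' creates (or overwrites) the file; h5py 3.x opens read-only by default
with h5py.File('data.h5', 'w') as h5file:
    for n, image in enumerate(mri_images):  # mri_images: iterable of 3-D numpy arrays
        h5file[f'image{n}'] = image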
memmap works with plain binary files, so it is not really feature-rich at all. This would look something like:
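import numpy as np
# dtype=int is 8 bytes per voxel on most platforms (about 82 GB for this shape);
# matching the MRI data's own dtype keeps the file much smaller
bin_file = np.memmap('data.bin', mode='w+', dtype=int, shape=(1000, 256, 256, 156))
for n, image in enumerate(mri_images):  # mri_images: iterable of 3-D numpy arrays
    bin_file[n] = image
bin_file.flush()  # write pending changes to disk
del bin_file  # close the memory map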
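A third option that keeps the .npy format from the question is numpy.lib.format.open_memmap, which creates a real .npy file on disk and memory-maps it, so each volume can be written as soon as it is loaded instead of being collected in a list first. A minimal sketch: load_volume is a hypothetical callable standing in for something like lambda p: np.asanyarray(nib.load(p).dataobj), and int16 is only an assumed dtype.

```python
import numpy as np
from numpy.lib.format import open_memmap


def stack_to_npy(paths, load_volume, out_path, volume_shape, dtype=np.int16):
    """Write every volume into one .npy file without holding them all in RAM.

    load_volume:  hypothetical callable, path -> 3-D array of volume_shape
    volume_shape: e.g. (256, 256, 156) for the MRI volumes in the question
    """
    shape = (len(paths),) + tuple(volume_shape)
    out = open_memmap(out_path, mode="w+", dtype=dtype, shape=shape)
    for i, path in enumerate(paths):
        vol = load_volume(path)
        if vol.shape != tuple(volume_shape):
            raise ValueError(f"{path}: expected {volume_shape}, got {vol.shape}")
        out[i] = vol  # values are cast to dtype; only this volume is in memory
    out.flush()  # make sure everything is written to disk
    return out_path
```

The result can later be opened lazily with np.load(out_path, mmap_mode="r"), and a single volume read as data[i].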

Tiff to array - error

Hello, I have a problem converting a TIFF file to a numpy array.
I have a 16-bit signed raster file and I want to convert it to a numpy array.
I am using the gdal library for this.
import numpy
from osgeo import gdal
ds = gdal.Open("C:/.../dem.tif")
dem = numpy.array(ds.GetRasterBand(1).ReadAsArray())
At first glance everything converts well, but when I compared the result obtained in Python with the result in GIS software, I got different results (screenshots: Python result, ArcMap result).
I found many values in the numpy array that fall outside the range 91 to 278 (the real min and max values); they should not exist.

GDAL already returns a NumPy array, and wrapping it in np.array creates a copy of that array by default, which is an unnecessary performance hit. Just use:
dem = ds.GetRasterBand(1).ReadAsArray()
Or, if it's a single-band raster, simply:
dem = ds.ReadAsArray()
Regarding the statistics, are you sure ArcMap shows the absolute high/low values? I know QGIS, for example, often draws the statistics from a sample of the dataset (for performance) and, depending on the settings, sometimes uses a percentile (e.g. 1%, 99%).
edit: BTW, is this a public dataset? Like an SRTM tile? It might help if you list the source.
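To check whether a viewer is showing absolute or percentile-based statistics, you can compare numpy's exact min/max with, for example, the 1st and 99th percentiles mentioned above. A small sketch; the optional nodata argument is for a value such as -32768, which is only an assumption about how a 16-bit DEM might flag missing cells (GetNoDataValue() tells you the real one):

```python
import numpy as np


def band_stats(dem, nodata=None, percentiles=(1, 99)):
    """Exact min/max and percentile cut-offs of a raster, optionally excluding nodata."""
    values = dem.ravel()
    if nodata is not None:
        values = values[values != nodata]  # drop cells flagged as missing
    low, high = np.percentile(values, percentiles)
    return {
        "min": float(values.min()),
        "max": float(values.max()),
        f"p{percentiles[0]}": float(low),
        f"p{percentiles[1]}": float(high),
    }
```

If the numbers a GIS package reports match the percentiles rather than min/max, the "impossible" values in the array are real cells that the display statistics simply leave out; if min/max only agree once nodata is excluded, the extra values are nodata cells.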

Best dtype for creating large arrays with numpy

I am looking to store pixel values from satellite imagery into an array. I've been using
np.empty((image_width, image_length))
and it worked for smaller subsets of an image, but when using it on the entire image (3858 x 3743) the code terminates very quickly and all I get is an array of zeros.
I load the image values into the array using a loop, opening each image with gdal:
img = gdal.Open(os.path.join(fn, fname)).ReadAsArray()
but when I print img_array I end up with just zeros.
I have tried almost every single dtype that I could find in the numpy documentation but keep getting the same result.
Is numpy unable to load this many values or is there a way to optimize the array?
I am working with 8-bit tiff images that contain NDVI (decimal) values.
Thanks

Not certain what type of images you are trying to read, but in the case of RADARSAT-2 images you can do the following:
import numpy
from osgeo import gdal
# inpath: directory containing the RADARSAT-2 product.xml, ending with a path separator
dataset = gdal.Open("RADARSAT_2_CALIB:SIGMA0:" + inpath + "product.xml")
S_HH = dataset.GetRasterBand(1).ReadAsArray()
S_VV = dataset.GetRasterBand(2).ReadAsArray()
# intensity = re**2 + imag**2, and amplitude = sqrt(intensity)
image_HH_I = numpy.real(S_HH)**2 + numpy.imag(S_HH)**2
image_VV_I = numpy.real(S_VV)**2 + numpy.imag(S_VV)**2
But that is specific to that type of image: each image contains several bands, so I need to read each band separately with GetRasterBand(i) and then call ReadAsArray(). If there is a specific GDAL driver for the type of images you want to read, life gets very easy.
If you give some more info on the type of images you want to read, I can maybe help more specifically.
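The intensity computed in the RADARSAT-2 snippet is the squared magnitude of the complex samples, Re(S)^2 + Im(S)^2 = |S|^2, and the amplitude is sqrt(intensity) = |S|, so numpy.abs expresses both directly. A quick check with made-up complex values standing in for a band returned by ReadAsArray():

```python
import numpy as np


def intensity(s):
    """Intensity of complex SAR samples: re**2 + im**2, i.e. |s|**2."""
    return np.abs(s) ** 2


def amplitude(s):
    """Amplitude = sqrt(intensity) = |s|."""
    return np.abs(s)


s = np.array([3 + 4j, 1 - 1j, 0j])  # made-up sample values
print(intensity(s), amplitude(s))
```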
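Edit: did you try something like this? (I'm not sure whether it will work on a TIFF, or how many bytes the header is, hence the "something".)
import numpy
with open(filename, "rb") as A:  # the file must be opened in binary mode
    # skip the header bytes, then reshape the pixel bytes to the image dimensions
    B = numpy.fromfile(A, dtype='uint8')[something:].reshape(3858, 3743)
C = B * 1.0  # convert to floating point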
Edit: The problem was solved by using 64-bit Python instead of 32-bit Python; the 32-bit version ran into memory errors at about 2 GB.
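The memory one full image needs is rows × cols × bytes per element, which numpy can tell you without allocating anything. A small sketch using the 3858 × 3743 size from the question (the dtypes are just examples):

```python
import numpy as np


def array_nbytes(shape, dtype):
    """Bytes needed for an array of this shape and dtype (nothing is allocated)."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize


for dt in ("uint8", "float32", "float64"):
    mib = array_nbytes((3858, 3743), dt) / 2**20
    print(f"{dt:>8}: {mib:8.1f} MiB")
```

That comes to roughly 14, 55 and 110 MiB respectively, so a single image is far below the 2 GB limit of a 32-bit process; hitting that limit presumably means many such arrays (or intermediate copies) were alive at the same time.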
