I am trying to access an HDF5 file with a compressed image datablock. I use the classic GDAL call
f = gdal.Open(path+product)
but this does not seem to work: the returned dataset is None, as you can see below.
Starting processing proba database
processing PROBAV_L1C_20131009_092303_2_V001.HDF5
None
processing PROBAV_L1C_20130925_092925_2_V001.HDF5
None
Processing complete
I would like to ask if someone can give me some indication of how to handle HDF5 with GDAL, without using h5py, which does not support compressed datablocks either.
Thanks
It couldn't open the file, either because it couldn't see the path or because you don't have an HDF5 driver for Python. Returning None is the expected behaviour, but GDAL can be told to raise an exception instead when it cannot open a file:
from osgeo import gdal
gdal.UseExceptions()
# check that this GDAL build ships the HDF5 driver
if not gdal.GetDriverByName('HDF5'):
    raise Exception('HDF5 driver is not available')
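With exceptions enabled, a failed gdal.Open() raises a RuntimeError instead of returning None, so the failure can be caught explicitly. A minimal sketch, reusing the path and product variables from the question:
from osgeo import gdal
gdal.UseExceptions()
try:
    ds = gdal.Open(path + product)
except RuntimeError as e:
    # the exception message carries GDAL's reason for the failure
    print('Could not open file:', e)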
I think you are missing the protocol prefix in the dataset name passed to Open.
This works for me with other Proba images:
from osgeo import gdal
path="PROBAV_L2A_20140321_031709_2_333M_V001.HDF5"
product="LEVEL2A/GEOMETRY/SAA"
f = gdal.Open("HDF5:\"{}\"://{}".format(path,product))
f.ReadAsArray()
You could also read the complete name using GetSubDatasets which returns a list of tuples:
ds = gdal.Open(path)
subdataset_read = ds.GetSubDatasets()[0]
print("Subdataset: ",subdataset_read)
ds_sub = gdal.Open(subdataset_read[0], gdal.GA_ReadOnly)
ds_sub.ReadAsArray()
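To see everything the file exposes before picking a subdataset, you can loop over the full list; each entry pairs the name GDAL expects with a human-readable description. A small sketch along those lines:
from osgeo import gdal
ds = gdal.Open(path)
# each subdataset is a (name, description) tuple
for name, description in ds.GetSubDatasets():
    print(name, '->', description)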
The file is imported with no problem in Wolfram Mathematica.
Using Import["filename"] I get:
{"Amplitude", "trace #", "time (sec)"}
Using Import["filename", {"Datasets", "time (sec)"}] I get the time data.
When I try to load it using pandas or h5py it doesn't work.
In pandas I used: r = pd.read_hdf('filename')
In h5py I used: r = h5py.File('filename', 'r')
Both approaches give similar error messages, mainly:
Unable to open file (file signature not found)
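"File signature not found" means h5py did not find the HDF5 magic bytes where it expected them, so the file is probably not a plain HDF5 file even though Mathematica can interpret it. One way to check is to read the first bytes yourself: every HDF5 file starts with the fixed 8-byte signature \x89HDF\r\n\x1a\n, normally at offset 0. A minimal sketch, with 'filename' standing in for your file:
# HDF5 files begin with this fixed 8-byte signature
HDF5_SIGNATURE = b'\x89HDF\r\n\x1a\n'
with open('filename', 'rb') as f:
    header = f.read(8)
print('Looks like HDF5:', header == HDF5_SIGNATURE)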
I need to read OpenAir files in Python.
According to the following vector driver description, GDAL has built-in OpenAir functionality:
https://gdal.org/drivers/vector/openair.html
However there is no example code for reading such OpenAir files.
So far I have tried to read a sample file using the following lines:
from osgeo import gdal
airspace = gdal.Open('export.txt')
However, it returns the following error:
ERROR 4: `export.txt' not recognized as a supported file format.
I already looked at vectorio; however, no OpenAir functionality has been implemented there.
Why do I get the error above?
In case anyone wants to reproduce the problem: sample OpenAir files can easily be generated using XContest:
https://airspace.xcontest.org/
Since you're dealing with vector data, you need to use ogr instead of gdal (it's normally packaged along with gdal).
So you can do:
from osgeo import ogr
ds = ogr.Open('export.txt')
layer = ds.GetLayer(0)
featureCount = layer.GetFeatureCount()
print(featureCount)
There's plenty of info out there on using ogr, but this cookbook might be helpful.
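If you then want to inspect the actual airspace geometries, you can iterate over the layer's features. A hedged sketch (attribute fields vary by driver, so exporting each geometry to WKT is the safest thing to print):
from osgeo import ogr
ds = ogr.Open('export.txt')
layer = ds.GetLayer(0)
# walk every feature and print its geometry as well-known text
for feature in layer:
    geometry = feature.GetGeometryRef()
    print(geometry.ExportToWkt())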
I am aware of the two alternatives for reading .mat files in Python. For .mat files prior to version 7.3, scipy.io.loadmat works perfectly. For .mat files from version 7.3 onwards, you need to use an HDF5 reader like h5py.
My question is: is there a way to find, for a given file, its .mat version from within Python? That way I could write a single function that reads any .mat file.
EAFP (easier to ask forgiveness than permission): scipy.io.loadmat checks the version itself and raises an error if the version is not supported.
try:
    import scipy.io as sio
    test = sio.loadmat('test.mat')
except NotImplementedError:
    # v7.3 files are HDF5 underneath, which scipy cannot read
    import h5py
    with h5py.File('test.mat', 'r') as hf:
        data = hf['name-of-dataset'][:]
except Exception:
    raise ValueError('could not read at all...')
If you want to do the checking yourself, you can use get_matfile_version() from scipy.io.matlab.miobase. Usage as in the first link.
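A sketch of that checking approach (assuming a scipy version where get_matfile_version is importable from scipy.io.matlab.miobase; it returns a (major, minor) tuple, and major version 2 corresponds to the HDF5-based v7.3 format):
from scipy.io.matlab.miobase import get_matfile_version  # module location may differ across scipy versions

def read_any_mat(fname):
    with open(fname, 'rb') as f:
        major, minor = get_matfile_version(f)
    if major < 2:
        # v4 through v7 files: scipy reads these directly
        import scipy.io as sio
        return sio.loadmat(fname)
    # v7.3 files are HDF5 underneath
    import h5py
    return h5py.File(fname, 'r')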
I don't seem to be able to open the zip3.zip shapefile I downloaded from http://www.vdstech.com/usa-data.aspx
Here is my code:
import geopandas as gpd
data = gpd.read_file("data/zip3.shp")
this gives me the error:
CPLE_AppDefinedError: b'Recode from CP437 to UTF-8 failed with the error: "Invalid argument".'
As per my answer on this question, it seems like your dataset contains non-UTF characters. If you are facing this issue, chances are that passing encoding="utf-8" won't help, as Fiona's open() call will still fail.
If other solutions don't work, two solutions I propose that solved this issue are:
Open your shapefile on a GIS editor (like QGis), then save it again making sure you select the Encoding option to "UTF-8". After this you should have no problem when calling gpd.read_file("data/zip3.shp").
You can also achieve this format change in Python using GDAL, by reading your shapefile and saving it again. This will effectively change the encoding to UTF-8, as this is the default encoding as indicated in the docs for the CreateDataSource() method. For this try the following code snippet:
from osgeo import ogr
driver = ogr.GetDriverByName("ESRI Shapefile")
# open your shapefile
ds = driver.Open("nbac_2016_r2_20170707_1114.shp", 0)
# get its layer
layer = ds.GetLayer()
# create a new shapefile to convert to
ds2 = driver.CreateDataSource('convertedShape.shp')
# create a Polygon layer, like the one your shapefile has
layer2 = ds2.CreateLayer('', None, ogr.wkbPolygon)
# iterate over all features of your original shapefile
for feature in layer:
    # and copy each one into the converted shapefile
    layer2.CreateFeature(feature)
# proper closing
ds = layer = ds2 = layer2 = None
It looks like this shapefile doesn't have an associated cpg specifying the encoding of the .dbf file, and then falling back to trying to use your default system encoding isn't working either. You should be able to open this with:
data = gpd.read_file("data/zip3.shp", encoding="utf-8")
geopandas relies on fiona for shapefile reading, and you may need to upgrade your fiona version for this to work; see some discussion here
Since you probably have GDAL installed, I recommend converting the file to UTF-8 using the CLI:
ogr2ogr output.shp input.shp -lco ENCODING=UTF-8
Worked like a charm for me. It's much faster than QGIS and can be used in a cluster environment. I also posted this answer here. Specifying the encoding in geopandas did not work for me.
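If you would rather do the same conversion from inside Python, recent GDAL builds expose ogr2ogr as gdal.VectorTranslate. A hedged sketch (the layerCreationOptions keyword is the programmatic counterpart of -lco, assuming a GDAL version that supports it):
from osgeo import gdal
# equivalent of: ogr2ogr output.shp input.shp -lco ENCODING=UTF-8
gdal.VectorTranslate('output.shp', 'input.shp',
                     layerCreationOptions=['ENCODING=UTF-8'])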
Maybe the file depends on other files.
I faced the same problem, and when I copied the other files this shapefile depends on (such as the .dbf and .shx sidecars), the code ran correctly, though it then asked me to install another package called descartes. Once I installed that package, everything worked.
This specific question stems from the attempt to handle large data sets produced by a MATLAB algorithm so that I can process them with Python algorithms.
Background: I have large arrays in MATLAB (typically 20x20x40x15000 [i,j,k,frame]) and I want to use them in Python. So I save the array to a *.mat file and use scipy.io.loadmat(fname) to read it into a numpy array. However, if I try to load the entire *.mat file in Python, a memory error occurs. To get around this, I slice the *.mat file into pieces, so that I can load the pieces one at a time into a Python array. If I divide up the *.mat by frame, I end up with 15,000 *.mat files, which quickly becomes a pain to work with (at least on Windows). So my solution is to use zipped files.
Question: Can I use scipy to directly read a *.mat file from a zipped file without first unzipping the file to the current working directory?
Specs: Python 2.7, Windows XP
Current code:
import scipy.io
import zipfile
import numpy as np
def readZip(zfilename, dim, frames):
    data = np.zeros((dim[0], dim[1], dim[2], frames), dtype=np.float32)
    zfile = zipfile.ZipFile(zfilename, "r")
    i = 0
    for info in zfile.infolist():
        fname = info.filename
        zfile.extract(fname)
        mat = scipy.io.loadmat(fname)
        data[:, :, :, i] = mat['export']
        mat.clear()
        i = i + 1
    return data
Tried code:
mat=scipy.io.loadmat(zfile.read(fname))
produces this error:
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
mat=scipy.io.loadmat(zfile.open(fname))
produces this error:
fileobj.seek(0)
UnsupportedOperation: seek
Any other suggestions on handling the data are appreciated.
Thanks!
I am pretty sure that the answer to my question is NO and there are better ways to accomplish what I am trying to do.
Regardless, with the suggestion from J.F. Sebastian, I have devised a solution.
Solution: Save the data in MATLAB in the HDF5 format, namely hdf5write(fname, '/data', data_variable). This produces a *.h5 file which then can be read into python via h5py.
python code:
import h5py
r = h5py.File(fname, 'r+')
data = r['data']
I can now index directly into the data; however, it stays on the hard drive.
print data[:,:,:,1]
Or I can load it into memory.
data_mem = data[:]
However, this once again gives memory errors. So, to get it into memory I can loop through each frame and add it to a numpy array.
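A minimal sketch of that per-frame loop, reusing the data handle opened above (only one 20x20x40 slice is in memory at a time):
for i in range(data.shape[3]):
    frame = data[:, :, :, i]  # h5py reads just this slice from disk
    # ... process or accumulate the frame here ...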
h5py FTW!
In one of my frozen applications we bundle some files into the .bin file that py2exe creates, then pull them out like this:
z = zipfile.ZipFile(os.path.join(myDir, 'common.bin'))
data = z.read('schema-new.sql')
I am not certain if that would feed your .mat files into scipy, but I'd consider it worth a try.
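For what it's worth, one way to make those raw bytes digestible for scipy is to wrap them in a seekable in-memory buffer, since loadmat also accepts open file-like objects. A hedged sketch along those lines:
import io
import zipfile
import scipy.io

zfile = zipfile.ZipFile(zfilename, 'r')
fname = zfile.infolist()[0].filename
# buffer the member in memory so loadmat gets a seekable file-like object
mat = scipy.io.loadmat(io.BytesIO(zfile.read(fname)))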