Unable to load in Python an HDF file saved using Igor Pro - python

The file is imported with no problem in Wolfram Mathematica.
Using Import["filename"] I get:
{"Amplitude", "trace #", "time (sec)"}
Using Import["filename", {"Datasets", "time (sec)"}] I get the time data.
When I try to load it using pandas or h5py it doesn't work.
In pandas I used: r = pd.read_hdf('filename')
In h5py I used: r = h5py.File('filename', 'r')
Both approaches give similar error messages, mainly:
Unable to open file (file signature not found)
# can't auto open/close if we are using an iterator so delegate to the iterator
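That error usually means the file is not actually an HDF5 file (for example, older Igor Pro exporters can write HDF4, which Mathematica's Import reads but h5py/pytables cannot). A quick diagnostic sketch, assuming the same 'filename' placeholder as above:
import h5py

fname = 'filename'  # same path used above
print(h5py.is_hdf5(fname))   # False if the HDF5 signature is missing

# An HDF5 file normally starts with the 8 signature bytes \x89HDF\r\n\x1a\n
with open(fname, 'rb') as f:
    print(f.read(8))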

Related

How to keep hdf5 binary of a pandas dataframe in-memory?

I would like to get the byte contents of a pandas dataframe exported as hdf5, ideally without actually saving the file (i.e., in-memory).
On python>=3.6, < 3.9 (and pandas==1.2.4, pytables==3.6.1) the following used to work:
import pandas as pd

with pd.HDFStore(
    "in-memory-save-file",
    mode="w",
    driver="H5FD_CORE",
    driver_core_backing_store=0,
) as store:
    store.put("my_key", df, format="table")
    binary_data = store._handle.get_file_image()
Where df is the dataframe to be converted to hdf5, and the last line calls this pytables function.
However, starting with python 3.9, I get the following error when using the snippet above:
File "tables/hdf5extension.pyx", line 523, in tables.hdf5extension.File.get_file_image
tables.exceptions.HDF5ExtError: Unable to retrieve the size of the buffer for the file image. Plese note that not all drivers provide support for image files.
The error is raised by the same pytables function linked above, apparently due to issues while retrieving the size of the buffer for the file image. I don't understand the ultimate reason for it, though.
I have tried other alternatives such as saving to a BytesIO file-object, so far unsuccessfully.
How can I keep the hdf5 binary of a pandas dataframe in-memory on python 3.9?
The fix was to do conda install -c conda-forge pytables instead of pip install pytables. I still don't understand the ultimate reason behind the error, though.
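For completeness, a hedged sketch of how the returned bytes could be loaded back into a dataframe with the same in-memory driver; this mirrors the pytables H5FD_CORE/driver_core_image documentation and is not part of the original answer:
import pandas as pd

with pd.HDFStore(
    "in-memory-load-file",          # arbitrary name, nothing is written to disk
    mode="r",
    driver="H5FD_CORE",
    driver_core_image=binary_data,  # bytes from get_file_image() above
    driver_core_backing_store=0,
) as store:
    df_roundtrip = store["my_key"]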

R Studio: pdf() failed to load default encoding and failed to load encoding file ISOLatin1.enc

First time poster with a question about RStudio.
I am following a tutorial to use a package called Plum in R (https://github.com/maquinolopez/Plum).
My system set-up on R is as follows: R (3.5.1), with rPython (R package; Python 2.7), numpy (python package), scipy (python package), matplotlib (python package).
library(devtools)
install_github("maquinolopez/Plum")
library(Plum)
install.packages("rPython", configure.vars= "RPYTHON_PYTHON_VERSION=2")
runPlum() #dummy dataset
When I run runPlum(), which uses a dummy dataset that I am working with to learn the package, the program works; however, I get the following errors every time:
Error in pdf(paste(folder, paste("Chronologylines ", Core.name, ".pdf", : failed to load default encoding
In addition: Warning message: In pdf(paste(folder, paste("Chronologylines ", Core.name, ".pdf", : failed to load encoding file 'ISOLatin1.enc'
The result is that I cannot see the pdfs with the data plotted up -- very annoying!
I have tried putting pdf.options(encoding='ISOLatin1.enc') before I run Plum, to no avail.
I have tried manually putting UTF-8 in the encoding option before a run, which also did not fix anything.
Any tips on how to fix this?
Thank you!

Error: Line magic function

I'm trying to read a file using python and I keep getting this error
ERROR: Line magic function `%user_vars` not found.
My code is very basic, just:
names = read_csv('Combined data.csv')
names.head()
I get this for anytime I try to read or open a file. I tried using this thread for help.
ERROR: Line magic function `%matplotlib` not found
I'm using Enthought Canopy and I have IPython version 2.4.1. I made sure to update, using the IPython installation page for help. I'm not sure what's wrong because it should be very simple to open/read files. I even get this error for opening text files.
EDIT:
I imported traceback and used
print(traceback.format_exc())
But all I get is None printed. I'm not sure what that means.
Looks like you are using Pandas. Try the following (assuming your CSV file is in the same directory as your script) and enter it one line at a time if you are using the IPython shell:
import pandas as pd
names = pd.read_csv('Combined data.csv')
names.head()
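If the import works but the file still cannot be found, it is worth confirming that the CSV really is in the interpreter's working directory; a small check, not part of the original answer:
import os

print(os.getcwd())                          # directory the interpreter is running from
print(os.path.exists('Combined data.csv'))  # True if the file is reachable from there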

Python gdal to read HDF5 with embedded compression

I am trying to access an HDF5 file with a compressed image datablock. I use the classic gdal command:
f = gdal.Open(path+product)
but this does not seem to work, since the returned dataset is None, as you can see below:
Starting processing proba database
processing PROBAV_L1C_20131009_092303_2_V001.HDF5
None
processing PROBAV_L1C_20130925_092925_2_V001.HDF5
None
Processing complete
I would like to ask if someone can give me some indication of how to handle HDF5 with gdal, without using h5py, which does not support compressed datablocks either.
Thanks
It couldn't open the file, either because it couldn't see the path or because you don't have an HDF5 driver available. Returning None is the expected behaviour, but it can be changed to raise an exception if the file cannot be opened:
from osgeo import gdal
gdal.UseExceptions()
if not gdal.GetDriverByName('HDF5'):
    raise Exception('HDF5 driver is not available')
I think you are missing the protocol prefix before Open.
This works for me with other Proba images:
from osgeo import gdal
path="PROBAV_L2A_20140321_031709_2_333M_V001.HDF5"
product="LEVEL2A/GEOMETRY/SAA"
f = gdal.Open("HDF5:\"{}\"://{}".format(path,product))
f.ReadAsArray()
You could also read the complete name using GetSubDatasets which returns a list of tuples:
ds = gdal.Open(path)
subdataset_read = ds.GetSubDatasets()[0]
print("Subdataset: ", subdataset_read)
ds_sub = gdal.Open(subdataset_read[0], gdal.GA_ReadOnly)
ds_sub.ReadAsArray()
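If you do not know the subdataset index in advance, you can also loop over all of them and match on the name; the "RADIOMETRY" filter below is only an illustrative guess for a PROBA-V band, not something taken from the original post:
from osgeo import gdal

ds = gdal.Open(path)
for name, description in ds.GetSubDatasets():
    print(description)
    if "RADIOMETRY" in name:  # hypothetical band of interest
        arr = gdal.Open(name, gdal.GA_ReadOnly).ReadAsArray()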

Read matlab file (*.mat) from zipped file without extracting to directory in Python

This specific question stems from the attempt to handle large data sets produced by a MATLAB algorithm so that I can process them with Python algorithms.
Background: I have large arrays in MATLAB (typically 20x20x40x15000 [i,j,k,frame]) and I want to use them in Python. So I save the array to a *.mat file and use scipy.io.loadmat(fname) to read the *.mat file into a numpy array. However, a problem arises in that if I try to load the entire *.mat file in Python, a memory error occurs. To get around this, I slice the *.mat file into pieces, so that I can load the pieces one at a time into a Python array. If I divide up the *.mat by frame, I now have 15,000 *.mat files, which quickly becomes a pain to work with (at least in Windows). So my solution is to use zipped files.
Question: Can I use scipy to directly read a *.mat file from a zipped file without first unzipping the file to the current working directory?
Specs: Python 2.7, Windows XP
Current code:
import scipy.io
import zipfile
import numpy as np

def readZip(zfilename, dim, frames):
    data = np.zeros((dim[0], dim[1], dim[2], frames), dtype=np.float32)
    zfile = zipfile.ZipFile(zfilename, "r")
    i = 0
    for info in zfile.infolist():
        fname = info.filename
        zfile.extract(fname)
        mat = scipy.io.loadmat(fname)
        data[:, :, :, i] = mat['export']
        mat.clear()
        i = i + 1
    return data
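Called with the dimensions described above, usage would look roughly like this (the zip filename is a placeholder):
# 20x20x40 per frame, 15000 frames, as in the question
data = readZip('frames.zip', (20, 20, 40), 15000)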
Tried code:
mat=scipy.io.loadmat(zfile.read(fname))
produces this error:
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
mat=scipy.io.loadmat(zfile.open(fname))
produces this error:
fileobj.seek(0)
UnsupportedOperation: seek
Any other suggestions on handling the data are appreciated.
Thanks!
I am pretty sure that the answer to my question is NO and there are better ways to accomplish what I am trying to do.
Regardless, with the suggestion from J.F. Sebastian, I have devised a solution.
Solution: Save the data in MATLAB in the HDF5 format, namely hdf5write(fname, '/data', data_variable). This produces a *.h5 file which then can be read into python via h5py.
python code:
import h5py
r = h5py.File(fname, 'r+')
data = r['data']
I can now index directly into the data; however, it stays on the hard drive.
print data[:,:,:,1]
Or I can load it into memory.
data_mem = data[:]
However, this once again gives memory errors. So, to get it into memory I can loop through each frame and add it to a numpy array.
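A loose sketch of that frame-by-frame loop (variable names are illustrative; data is the h5py dataset opened above):
import numpy as np

for i in range(data.shape[3]):
    frame = np.array(data[:, :, :, i])  # only one 20x20x40 frame is in memory at a time
    # ... process the frame here ...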
h5py FTW!
In one of my frozen applications we bundle some files into the .bin file that py2exe creates, then pull them out like this:
z = zipfile.ZipFile(os.path.join(myDir, 'common.bin'))
data = z.read('schema-new.sql')
I am not certain if that would feed your .mat files into scipy, but I'd consider it worth a try.
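One hedged way to actually feed those bytes to scipy (untested against the original .mat files) is to wrap the result of z.read in a seekable BytesIO object, which avoids the UnsupportedOperation: seek error shown above:
import io
import zipfile
import scipy.io

zfile = zipfile.ZipFile(zfilename, 'r')
for info in zfile.infolist():
    # BytesIO supports seek(), unlike the object returned by zfile.open()
    mat = scipy.io.loadmat(io.BytesIO(zfile.read(info.filename)))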
