Open .h5 file in Python

I am trying to read an .h5 file in Python.
The file can be found at this link and is called 'vstoxx_data_31032014.h5'. The code I am trying to run is from the book Python for Finance by Yves Hilpisch:
import pandas as pd
h5 = pd.HDFStore('path.../vstoxx_data_31032014.h5', 'r')
futures_data = h5['futures_data'] # VSTOXX futures data
options_data = h5['options_data'] # VSTOXX call option data
h5.close()
I am getting the following error:
h5 = pd.HDFStore('path.../vstoxx_data_31032014.h5', 'r')
Traceback (most recent call last):
File "<ipython-input-692-dc4e79ec8f8b>", line 1, in <module>
h5 = pd.HDFStore('path.../vstoxx_data_31032014.h5', 'r')
File "C:\Users\Laura\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 466, in __init__
self.open(mode=mode, **kwargs)
File "C:\Users\Laura\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 637, in open
raise IOError(str(e))
OSError: HDF5 error back trace
File "C:\aroot\work\hdf5-1.8.15-patch1\src\H5F.c", line 604, in H5Fopen
unable to open file
File "C:\aroot\work\hdf5-1.8.15-patch1\src\H5Fint.c", line 1085, in H5F_open
unable to read superblock
File "C:\aroot\work\hdf5-1.8.15-patch1\src\H5Fsuper.c", line 277, in H5F_super_read
file signature not found
End of HDF5 error back trace
Unable to open/create file 'path.../vstoxx_data_31032014.h5'
where I have substituted my working directory for 'path.../' for the purpose of this question.
Does anyone know where this error might be coming from?

To open an HDF5 file with the h5py module, you can use h5py.File(filename). The documentation can be found here.
import h5py
filename = "vstoxx_data_31032014.h5"
h5 = h5py.File(filename,'r')
futures_data = h5['futures_data'] # VSTOXX futures data
options_data = h5['options_data'] # VSTOXX call option data
h5.close()
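The "file signature not found" line in the traceback means the HDF5 library did not find the format signature it expects at the start of the file. One possible cause is that the download saved something other than the real file (for example an HTML page). Below is a minimal sketch for checking this with the standard library only. The helper names looks_like_hdf5 and first_bytes are made up for illustration, and the check only looks at offset 0 (the format also allows the signature at later offsets after a user block).

```python
import os
import tempfile

# The 8-byte signature that starts an HDF5 file (when there is no user block).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file begins with the HDF5 signature (offset 0 only)."""
    with open(path, "rb") as fh:
        return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def first_bytes(path, n=64):
    """Return the first n bytes, handy for spotting e.g. an HTML page."""
    with open(path, "rb") as fh:
        return fh.read(n)


# Small demonstration on a throwaway file that is clearly not HDF5.
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, "not_really.h5")
    with open(fake, "wb") as fh:
        fh.write(b"<!DOCTYPE html><html>...")
    print(looks_like_hdf5(fake))   # False
    print(first_bytes(fake, 15))   # b'<!DOCTYPE html>'
```

If the check returns False for the downloaded file, downloading it again (as a raw binary file) is probably the next thing to try before changing any of the reading code.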

import os
os.chdir('path of your working directory')  # change to your working directory
wd = os.getcwd()  # get the current working directory
print(wd)

if __name__ == '__main__':
    # import required libraries
    import h5py as h5
    import numpy as np
    import matplotlib.pyplot as plt

    f = h5.File("hdf5 file with its path", "r")
    # list the names of the top-level groups and datasets
    datasetNames = [n for n in f.keys()]
    for n in datasetNames:
        print(n)

Related

Invalid argument error when using f string in path of DataFrame.to_csv()

I want to write pandas dataframe to a csv file every 10 secs. The csv file name includes the current timestamp. Here is part of the code:
import pandas as pd
import time
while True:
    df = pd.read_sql_query('select * from dbo.tbl_tag_values', cnxn)
    t = time.localtime()
    current_time = time.strftime('%Y-%m-%dT%H:%M:%S', t)
    csv_path = f'C:/Users/00_Projects/App/data-{current_time}.csv'
    df.to_csv(csv_path)
    time.sleep(10)
With a static file name instead of the f-string, the script works fine, but with the f-string I get the error:
Traceback (most recent call last):
File "c:\Users\00_Projects\App\script.py", line 24, in <module>
df.to_csv(csv_path)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 3466, in to_csv
return DataFrameRenderer(formatter).to_csv(
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\format.py", line 1105, in to_csv
csv_formatter.save()
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 237, in save
with get_handle(
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\common.py", line 702, in get_handle
handle = open(
OSError: [Errno 22] Invalid argument: 'C:/Users/00_Projects/App/data-2023-01-02T15:33:19.csv'
I read the post "How to use f string in a path location" and tried Path from pathlib, but got the same error.
My OS is Windows.
Thanks for any help!
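The Errno 22 here is most likely caused by the colons in the timestamp: ':' is not allowed in Windows file names, and the static name that worked had none. Below is a minimal sketch that uses hyphens in the time part instead. The helper name timestamped_csv_path is hypothetical, and the directory is only the one from the question used as a placeholder.

```python
import time
from pathlib import Path


def timestamped_csv_path(directory, t=None):
    """Build 'data-<timestamp>.csv' inside directory, without ':' in the name."""
    t = time.localtime() if t is None else t
    # Hyphens instead of colons keep the name valid on Windows.
    stamp = time.strftime("%Y-%m-%dT%H-%M-%S", t)
    return Path(directory) / f"data-{stamp}.csv"


# Example with the current local time; the exact name depends on when it runs.
print(timestamped_csv_path("C:/Users/00_Projects/App").name)
```

In the loop from the question this would be used as df.to_csv(timestamped_csv_path('C:/Users/00_Projects/App')).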

NameError: name 'find_stack_level' is not defined when trying to get an xlsx file from a website

I was trying to import an xlsx file from a website using the requests package and it returned a strange error. The code and the error are below.
import numpy as np
import matplotlib as plt
import pandas as pd
from io import BytesIO
import requests as rq
url = "http://pdet.mte.gov.br/images/Novo_CAGED/Jan2022/3-tabelas.xlsx"
data = rq.get(url).content
caged = pd.read_excel(BytesIO(data))
Traceback (most recent call last):
File "D:\Perfil\Desktop\UFV\trabalho_econometria.py", line 9, in <module>
caged = pd.read_excel(BytesIO(data))
File "C:\Users\Windows\anaconda3\lib\site-packages\pandas\util\_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "C:\Users\Windows\anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 344, in read_excel
data = io.parse(
File "C:\Users\Windows\anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 1170, in parse
return self._reader.parse(
File "C:\Users\Windows\anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 504, in parse
if header is not None and is_list_like(header):
NameError: name 'find_stack_level' is not defined
Was trying to read an xlsx sheet from a website and got a strange error.

Read different audio file formats from folder using Python

I am attempting to read different audio file formats from a folder using Python. The folder contains mainly .wav and .mp3 audio file formats. When I use the code listed below, I get the error:
Traceback (most recent call last):
File "G:/FAU/PythonCode/scipy_audio_spectrogram.py", line 7, in <module>
fs, Audiodata = wavfile.read(AudioName)
File "C:\Python\lib\site-packages\scipy\io\wavfile.py", line 267, in read
file_size, is_big_endian = _read_riff_chunk(fid)
File "C:\Python\lib\site-packages\scipy\io\wavfile.py", line 168, in _read_riff_chunk
"understood.".format(repr(str1)))
ValueError: File format b'ID3\x03'... not understood.
How can I modify the code so that both .wav and .mp3 files can be read from the folder? Thanks!
Here is the code that I am using:
from scipy.io import wavfile # scipy library to read wav files
import numpy as np
AudioName = r'G:\FAU\PythonCode\4201.mp3' # Audio File
fs, Audiodata = wavfile.read(AudioName)
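The b'ID3\x03' in the error message is the start of an ID3 tag, which means the file really is an MP3, and scipy.io.wavfile only reads WAV data. Below is a minimal sketch that sniffs the format from the first bytes so each file can be routed to a suitable reader. The helper name sniff_audio_format is hypothetical, the MP3 check is a heuristic, and decoding the MP3 itself needs a separate library that is not shown here.

```python
import os
import tempfile


def sniff_audio_format(path):
    """Guess 'wav', 'mp3' or 'unknown' from the first bytes of the file."""
    with open(path, "rb") as fh:
        head = fh.read(12)
    # WAV: RIFF container whose form type is WAVE.
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    # MP3 with an ID3v2 tag in front of the audio frames.
    if head[:3] == b"ID3":
        return "mp3"
    # MP3 without a tag: MPEG frame sync is 11 set bits (heuristic).
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"


# Demonstration on a throwaway file that starts like the one in the error.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "4201.mp3")
    with open(sample, "wb") as fh:
        fh.write(b"ID3\x03\x00\x00\x00\x00\x00\x00")
    print(sniff_audio_format(sample))  # mp3
```

With this in place, files reported as "wav" can go to wavfile.read as before, and files reported as "mp3" can be passed to whichever MP3 decoder is chosen.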

How to turn a comma-separated value TXT into a CSV for machine learning

How do I turn this format of TXT file into a CSV file?
Date,Open,high,low,close
1/1/2017,1,2,1,2
1/2/2017,2,3,2,3
1/3/2017,3,4,3,4
As you can see, it already has comma-separated values.
I tried using numpy.
>>> import numpy as np
>>> table = np.genfromtxt("171028 A.txt", comments="%")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\npyio.py", line 1551, in genfromtxt
fhd = iter(np.lib._datasource.open(fname, 'rb'))
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py", line 151, in open
return ds.open(path, mode)
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py", line 501, in open
raise IOError("%s not found." % path)
OSError: 171028 A.txt not found.
I have (S&P) 500 txt files to do this with.
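You can use the csv module. You can find more information here.
import csv
txt_file = 'mytext.txt'
csv_file = 'mycsv.csv'
# newline='' lets the csv module handle line endings itself;
# the with block closes both files when the copy is done
with open(txt_file, 'r', newline='') as src, open(csv_file, 'w', newline='') as dst:
    in_txt = csv.reader(src, delimiter=',')
    out_csv = csv.writer(dst)
    out_csv.writerows(in_txt)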
Per @dclarke's comment, check the directory from which you run the code. As you coded the call, the file must be in that directory. When I have it there, the code runs without error (although the resulting table is a single line with four nan values). When I move the file elsewhere, I reproduce your error exactly.
Either move the file to be local, add a local link to the file, or change the file name in your program to use the proper path to the file (either relative or absolute).
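Since the question mentions about 500 such files, here is a minimal sketch of converting a whole folder in one go. The function name convert_txt_folder and the folder layout are assumptions for illustration. Because the data is already comma-separated, the rows are simply copied into a file with a .csv extension.

```python
import csv
import tempfile
from pathlib import Path


def convert_txt_folder(src_dir, dst_dir):
    """Copy every *.txt in src_dir to a .csv of the same name in dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for txt_path in sorted(src_dir.glob("*.txt")):
        csv_path = dst_dir / (txt_path.stem + ".csv")
        # newline='' lets the csv module handle line endings itself.
        with open(txt_path, newline="") as src, open(csv_path, "w", newline="") as dst:
            csv.writer(dst).writerows(csv.reader(src))
        written.append(csv_path)
    return written


# Demonstration in a throwaway folder with one file shaped like the question's.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "txt"
    source.mkdir()
    (source / "171028 A.txt").write_text("Date,Open,high,low,close\n1/1/2017,1,2,1,2\n")
    for out in convert_txt_folder(source, Path(tmp) / "csv"):
        print(out.name)  # 171028 A.csv
```

Passing full folder paths to the function also sidesteps the "not found" error above, because nothing depends on the directory the script is started from.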

How to properly load an .XLS file in script

I have a Python script that opens an .xls file, reads its lines into a list, and then does a bunch of other things. I try to call it from Excel like so:
Sub SampleCall()
    RunPython ("import sov_reformat;sov_reformat.sov_convert()")
End Sub
Here's the first few lines of my script:
# In[1]:
import xlwings as xw
import pandas as pd
import numpy as np
import csv
# In[2]:
def sov_convert():
    wb = xw.Book.caller()
    df2 = pd.read_excel("Full SOV.XLS")
    temp_df = df2
I get an error:
Error
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "...\sov_reformat.py", line 70, in <module>
sov_convert()
File "...\sov_reformat.py", line 14, in sov_convert
df2 = pd.read_excel("Full SOV.XLS")
File "C:\Python27\lib\site-packages\pandas\io\excel.py", line 170, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Python27\lib\site-packages\pandas\io\excel.py", line 227, in __init__
self.book = xlrd.open_workbook(io)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 395, in open_workbook
with open(filename, "rb") as f:
IOError: [Errno 2] No such file or directory: 'Full SOV.XLS'
I think the line that's causing this error is the following:
with open('rates.csv', 'rb') as f:
    reader = csv.reader(f)
    rate_combinations = list(reader)
But I don't understand why. When I run the script it does exactly what I want so I know everything else is working.
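Note that the traceback points at pd.read_excel("Full SOV.XLS") on line 14, not at the rates.csv block; the open(filename, "rb") line it shows is inside xlrd. A bare file name is resolved against the current working directory, and when the script is started from Excel that directory may not be the folder containing the script. Below is a minimal sketch that builds the path from the script's own location instead. The helper name path_beside is hypothetical, and whether the working directory is actually the cause here is an assumption.

```python
import os


def path_beside(script_file, file_name):
    """Return the absolute path of file_name in the same folder as script_file."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, file_name)


# Inside sov_reformat.py one would pass __file__ as the first argument;
# a literal name is used here only so the snippet runs on its own.
print(path_beside("sov_reformat.py", "Full SOV.XLS"))
```

In sov_convert() the call would then read pd.read_excel(path_beside(__file__, "Full SOV.XLS")), and the same approach would apply to 'rates.csv'.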
