gss = pd.read_hdf('gss.hdf5', 'gs')
This is the code I used in VS Code, and I got the following:
Traceback (most recent call last):
  File "d:\pthon_txt\t.py", line 4, in <module>
    gss = pd.read_hdf('gss.hdf5', 'gs')
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mohammed\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\io\pytables.py", line 442, in read_hdf
    return store.select(
           ^^^^^^^^^^^^^
  File "C:\Users\Mohammed\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\io\pytables.py", line 847, in select
    raise KeyError(f"No object named {key} in the file")
KeyError: 'No object named gs in the file'
PS D:\pthon_txt>
I want to load this HDF file into a pandas DataFrame.
To find out which keys are stored in your HDF store, use the following code:
with pd.HDFStore('gss.hdf5') as store:
    print(store.keys())
After that, you will be able to load your data with the correct key:
gss = pd.read_hdf('gss.hdf5', <KEY>)
The error is saying that the key gs doesn't exist in the file. If there's only one key, you can use read_hdf without the key parameter, e.g.:
df = pd.read_hdf('gss.hdf5')
I am trying to push data to GCP Datastore. The code snippet below works fine in Jupyter Notebook, but it throws an error in VS Code.
def load_data_json(self, kind_name, data_with_qp_ID, qp_id):
    # Load the data in JSON format to upload into the DataStore
    data_with_qp_ID_as_JSON = self.convert_DF_to_JSON(data_with_qp_ID, qp_id)
    # Loop to iterate through the JSON format and upload into the GCS Storage
    for data in data_with_qp_ID_as_JSON.keys():
        with self.client.transaction():
            incomplete_key = self.client.key(kind_name)
            task = datastore.Entity(key=incomplete_key)
            task.update(data_with_qp_ID_as_JSON[data])
            self.client.put(task)
    return 'Ingestion Successful - Data Store Repository'
I have defined the name of the bucket in kind_name; data_with_qp_ID is a pandas DataFrame, and qp_id is the name of a column in that DataFrame. Please see the error message I get below:
Traceback (most recent call last):
  File "/Users/ajaykrishnan/Desktop/Projects/Sprint 3/Data Migration/DataMigration_v1.1/main2.py", line 139, in <module>
    write_datastore_db.load_data_json(ds_kindname, bookmarks_data_with_qp_ID, qp_id)
  File "/Users/ajaykrishnan/Desktop/Projects/Sprint 3/Data Migration/DataMigration_v1.1/pkg/repository/ds_repository.py", line 50, in load_data_json
    self.client.put(task)
  File "/opt/anaconda3/lib/python3.9/site-packages/google/cloud/datastore/client.py", line 597, in put
    self.put_multi(entities=[entity], retry=retry, timeout=timeout)
  File "/opt/anaconda3/lib/python3.9/site-packages/google/cloud/datastore/client.py", line 634, in put_multi
    current.put(entity)
  File "/opt/anaconda3/lib/python3.9/site-packages/google/cloud/datastore/transaction.py", line 315, in put
    super(Transaction, self).put(entity)
  File "/opt/anaconda3/lib/python3.9/site-packages/google/cloud/datastore/batch.py", line 227, in put
    _assign_entity_to_pb(entity_pb, entity)
  File "/opt/anaconda3/lib/python3.9/site-packages/google/cloud/datastore/batch.py", line 373, in _assign_entity_to_pb
    bare_entity_pb = helpers.entity_to_protobuf(entity)
  File "/opt/anaconda3/lib/python3.9/site-packages/google/cloud/datastore/helpers.py", line 208, in entity_to_protobuf
    key_pb = entity.key.to_protobuf()
  File "/opt/anaconda3/lib/python3.9/site-packages/google/cloud/datastore/key.py", line 298, in to_protobuf
    key.path.append(element)
TypeError: Parameter to MergeFrom() must be instance of same class: expected google.datastore.v1.Key.PathElement got PathElement.
My environment is as follows:
Mac OS Monterey V12.06
Python - Conda 3.9.12
I was able to clear this error. It was an issue with the protobuf library my environment was using: I downgraded protobuf from 4.x.x to 3.20.1 and it worked.
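For reference, a minimal sketch of checking which protobuf version an environment is actually using (the 3.20.1 pin is the one that worked here):
import google.protobuf

# If this prints 4.x, downgrading (pip install protobuf==3.20.1) resolved the
# MergeFrom() TypeError above.
print(google.protobuf.__version__)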
When I was using pd.to_datetime, my code looked like this:
rate = pd.read_csv('P2training.csv', header=0)
rate['Date'] = pd.to_datetime(rate['Date'], format='%Y-%m-%d')
rate.set_index('Date', inplace=True, drop=True)
rate.tail(10)
print(rate)
In P2training.csv, the first column is 'Date', and this code ran well when I first downloaded the P2training dataset. However, after I open the CSV file and save it without doing anything else, the code starts to report the errors below. If I put back the original downloaded file in place of the 'saved' one, the code runs properly again.
C:\Users\yaojia\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
  from pandas.core import datetools
Traceback (most recent call last):
  File "C:\Users\yaojia\AppData\Roaming\Python\Python36\site-packages\pandas\core\tools\datetimes.py", line 444, in _convert_listlike
    values, tz = tslib.datetime_to_datetime64(arg)
  File "pandas\_libs\tslib.pyx", line 1810, in pandas._libs.tslib.datetime_to_datetime64 (pandas\_libs\tslib.c:33275)
TypeError: Unrecognized value type:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/yaojia/.PyCharmEdu4.0/config/scratches/scratch_7.py", line 23, in <module>
    rate['Date'] = pd.to_datetime(rate['Date'], format='%Y-%m-%d')
  File "C:\Users\yaojia\AppData\Roaming\Python\Python36\site-packages\pandas\core\tools\datetimes.py", line 509, in to_datetime
    values = _convert_listlike(arg._values, False, format)
  File "C:\Users\yaojia\AppData\Roaming\Python\Python36\site-packages\pandas\core\tools\datetimes.py", line 447, in _convert_listlike
    raise e
  File "C:\Users\yaojia\AppData\Roaming\Python\Python36\site-packages\pandas\core\tools\datetimes.py", line 435, in _convert_listlike
    require_iso8601=require_iso8601
  File "pandas\_libs\tslib.pyx", line 2355, in pandas._libs.tslib.array_to_datetime (pandas\_libs\tslib.c:46617)
  File "pandas\_libs\tslib.pyx", line 2484, in pandas._libs.tslib.array_to_datetime (pandas\_libs\tslib.c:44616)
ValueError: time data '12/31/1979' doesn't match format specified

Process finished with exit code 1
Could anyone give a hint as to what's going wrong?
I guess you opened the CSV with Excel? If so, Excel recognized that the 'Date' column contains dates and re-saved it in its own date format (in your case 'month/day/year', as the '12/31/1979' in the error shows), while you are expecting 'year-month-day'.
I suggest you open/save your CSV with a text editor, or change Excel's default date format...
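If re-saving from Excel is unavoidable, a minimal workaround sketch is to parse with the format Excel actually wrote:
import pandas as pd

# Parse the Excel-saved file with the month/day/year format it now contains.
rate = pd.read_csv('P2training.csv', header=0)
rate['Date'] = pd.to_datetime(rate['Date'], format='%m/%d/%Y')
rate.set_index('Date', inplace=True, drop=True)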
I want to read an h5 file previously created with PyTables.
The file is read using pandas, with some conditions, like this:
pd.read_hdf('myH5file.h5', 'anyTable', where='some_conditions')
From another question, I have been told that, in order for an h5 file to be "queryable" with read_hdf's where argument, it must be written in table format and, in addition, some columns must be declared as data columns.
I cannot find anything about this in the PyTables documentation.
The documentation for PyTables' create_table method does not mention it.
So right now, if I try something like that on my h5 file created with PyTables, I get the following:
>>> d = pd.read_hdf('test_file.h5','basic_data', where='operation==1')
C:\Python27\lib\site-packages\pandas\io\pytables.py:3070: IncompatibilityWarning: where criteria is being ignored as this version [0.0.0] is too old (or not-defined), read the file in and write it out to a new file to upgrade (with the copy_to method)
  warnings.warn(ws, IncompatibilityWarning)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 323, in read_hdf
    return f(store, True)
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 305, in <lambda>
    key, auto_close=auto_close, **kwargs)
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 665, in select
    return it.get_result()
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 1359, in get_result
    results = self.func(self.start, self.stop, where)
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 658, in func
    columns=columns, **kwargs)
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 3968, in read
    if not self.read_axes(where=where, **kwargs):
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 3196, in read_axes
    values = self.selection.select()
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 4482, in select
    start=self.start, stop=self.stop)
  File "C:\Python27\lib\site-packages\tables\table.py", line 1567, in read_where
    self._where(condition, condvars, start, stop, step)]
  File "C:\Python27\lib\site-packages\tables\table.py", line 1528, in _where
    compiled = self._compile_condition(condition, condvars)
  File "C:\Python27\lib\site-packages\tables\table.py", line 1366, in _compile_condition
    compiled = compile_condition(condition, typemap, indexedcols)
  File "C:\Python27\lib\site-packages\tables\conditions.py", line 430, in compile_condition
    raise _unsupported_operation_error(nie)
NotImplementedError: unsupported operand types for *eq*: int, bytes
EDIT:
The traceback mentions something about IncompatibilityWarning and version [0.0.0]; however, if I check my versions of pandas and tables, I get:
>>> import pandas
>>> pandas.__version__
'0.15.2'
>>> import tables
>>> tables.__version__
'3.1.1'
So, I am totally confused.
I had the same issue, and this is what I have done:
1. Create an HDF5 file with PyTables;
2. Read this HDF5 file with pandas.read_hdf, using parameters like where=where_string, columns=selected_columns.
I got a warning message like the one below, along with other error messages:
D:\Program Files\Anaconda3\lib\site-packages\pandas\io\pytables.py:3065: IncompatibilityWarning: where criteria is being ignored as this version [0.0.0] is too old (or not-defined), read the file in and write it out to a new file to upgrade (with the copy_to method)
  warnings.warn(ws, IncompatibilityWarning)
I tried commands like this:
hdf5_store = pd.HDFStore(hdf5_file, mode = 'r')
h5cpt_store_new = hdf5_store.copy(hdf5_new_file, complevel=9, complib='blosc')
h5cpt_store_new.close()
Then I ran the command exactly as in step 2 against the new file, and it worked.
>>> pandas.__version__
'0.17.1'
>>> tables.__version__
'3.2.2'
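For completeness, a minimal sketch of writing a where-queryable file from pandas itself; the frame and column values here are made up for illustration, but format='table' plus data_columns is what makes the where filter work:
import pandas as pd

# format='table' stores the frame as a PyTables Table; data_columns makes the
# listed columns individually queryable with the 'where' argument.
df = pd.DataFrame({'operation': [0, 1, 1], 'value': [1.0, 2.0, 3.0]})
df.to_hdf('test_file.h5', key='basic_data', format='table', data_columns=['operation'])
d = pd.read_hdf('test_file.h5', 'basic_data', where='operation == 1')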
I created a file by using:
store = pd.HDFStore('/home/.../data.h5')
and stored some tables using:
store['firstSet'] = df1
store.close()
I closed down python and reopened in a fresh environment.
How do I reopen this file?
When I go:
store = pd.HDFStore('/home/.../data.h5')
I get the following error.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/misc/apps/linux/python-2.6.1/lib/python2.6/site-packages/pandas-0.10.0-py2.6-linux-x86_64.egg/pandas/io/pytables.py", line 207, in __init__
    self.open(mode=mode, warn=False)
  File "/misc/apps/linux/python-2.6.1/lib/python2.6/site-packages/pandas-0.10.0-py2.6-linux-x86_64.egg/pandas/io/pytables.py", line 302, in open
    self.handle = _tables().openFile(self.path, self.mode)
  File "/apps/linux/python-2.6.1/lib/python2.6/site-packages/tables/file.py", line 230, in openFile
    return File(filename, mode, title, rootUEP, filters, **kwargs)
  File "/apps/linux/python-2.6.1/lib/python2.6/site-packages/tables/file.py", line 495, in __init__
    self._g_new(filename, mode, **params)
  File "hdf5Extension.pyx", line 317, in tables.hdf5Extension.File._g_new (tables/hdf5Extension.c:3039)
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "H5F.c", line 1582, in H5Fopen
    unable to open file
  File "H5F.c", line 1373, in H5F_open
    unable to read superblock
  File "H5Fsuper.c", line 334, in H5F_super_read
    unable to find file signature
  File "H5Fsuper.c", line 155, in H5F_locate_signature
    unable to find a valid file signature

End of HDF5 error back trace

Unable to open/create file '/home/.../data.h5'
What am I doing wrong here? Thank you.
In my hands, the following approach works best:
df = pd.DataFrame(...)

# write
with pd.HDFStore('test.h5', mode='w') as store:
    store.append('df', df, data_columns=df.columns, format='table')

# read
with pd.HDFStore('test.h5', mode='r') as newstore:
    df_restored = newstore.select('df')
You could instead try:
store = pd.io.pytables.HDFStore('/home/.../data.h5')
df1 = store['firstSet']
or use the read method directly:
df1 = pd.read_hdf('/home/.../data.h5', 'firstSet')
Either way, you should have pandas 0.12.0 or higher...
I had the same problem and finally fixed it by installing the pytables module (alongside the pandas module I was already using):
conda install pytables
which got me numexpr-2.4.3 and pytables-3.2.0
After that, it worked. I am using pandas 0.16.2 under Python 2.7.9.
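As a quick sanity check after installing (version numbers will differ by environment):
import pandas
import tables

# Both imports succeeding means pandas can use its PyTables-backed HDF5 I/O.
print(pandas.__version__, tables.__version__)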
I recently found out about pickle, which is amazing, but it errors on me when used in my actual script; testing it with a one-item dictionary worked fine. My real script is thousands of lines of code that stores various objects from within Maya. I do not know if it has anything to do with the size; I have read a lot of threads here, but none are specific to my error.
I have tried writing with all pickle protocols. No luck.
This is my output code:
import pickle

# Write the locators with the highest available pickle protocol (-1).
output = open('locatorsDump.pkl', 'wb')
pickle.dump(l.locators, output, -1)
output.close()
This is my read code:
jntdump = open('locatorsDump.pkl', 'rb')
test = pickle.load(jntdump)
jntdump.close()
This is the error:
# Error: Error in maya.utils._guiExceptHook:
# File "C:\Program Files\Autodesk\Maya2011\Python\lib\site-packages\pymel-1.0.0-py2.6.egg\maya\utils.py", line 277, in formatGuiException
# exceptionMsg = excLines[-1].split(':',1)[1].strip()
# IndexError: list index out of range
#
# Original exception was:
# Traceback (most recent call last):
# File "<maya console>", line 3, in <module>
# File "C:\Program Files\Autodesk\Maya2011\bin\python26.zip\pickle.py", line 1370, in load
# return Unpickler(file).load()
# File "C:\Program Files\Autodesk\Maya2011\bin\python26.zip\pickle.py", line 858, in load
# dispatch[key](self)
# File "C:\Program Files\Autodesk\Maya2011\bin\python26.zip\pickle.py", line 880, in load_eof
# raise EOFError
# EOFError #
Try using pickle.dumps() and pickle.loads() as a test.
If you don't receive the same error, you know it is related to the file write.
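A minimal sketch of that test, using l.locators from the question:
import pickle

# Round-trip the object in memory; if this succeeds, pickling itself is fine
# and the EOFError points at the file write (e.g. a truncated or never-closed
# file being read back).
data = pickle.dumps(l.locators, -1)  # l.locators as defined in the question
restored = pickle.loads(data)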