How to write on top of pandas HDF5 'read-only mode' files?

How to write on top of pandas HDF5 'read-only mode' files? - python

I am storing data using pandas built-in HDF5 methods.
Somehow, these HDF5 files were turned into 'read-only' files, and I am getting a lot of Opening xxx in read-only mode messages when I open those files in write mode and I can't write them, which is something I really need to do.
The thing I really don't understand so far is how come those files turned into read-only, as I am not aware of a piece of code that I wrote that may result in that behavior. (I have tried to check if the data stored in the HDF5 is corrupt, but I am able to read it and manipulate it, so it seems to be working just fine)
I have 2 questions:
How can I append data to those 'read-only mode' HDF5 files? (Can I convert them back to write mode or any other clever solution?)
Is there any pandas method that would change the HDF5 file to a 'read-only mode' by default so I can avoid turning those files into read-only in the first place?
Code:
The piece of code that is raising this issue is, which is the piece I use to save the output I generated:
with pd.HDFStore('data/observer/' + self._currency + '_' + str(ts)) as hdf:
hdf.append(key='observers', value=df, format='table', data_columns=True)
I also use this piece of code to manipulate the outputs that were generated previously:
for the_file in list_dir:
if currency in the_file:
temp_df = pd.read_hdf(folder + the_file)
...
I use some select commands as well to get specific columns from the data files:
with pd.HDFStore('data/observer/' + self.currency + '_' + timestamp) as hdf:
df = hdf.select(key='observers', columns=[x, y])
Error Traceback:
File ".../data_processing/observer_data.py", line 52, in save_obs_to_pandas
hdf.append(key='observers', value=df, format='table', data_columns=True)
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 963, in append
**kwargs)
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 1341, in _write_to_group
s.write(obj=value, append=append, complib=complib, **kwargs)
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 3930, in write
self.set_info()
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 3163, in set_info
self.attrs.info = self.info
File ".../venv/lib/python3.5/site-packages/tables/attributeset.py", line 464, in __setattr__
nodefile._check_writable()
File ".../venv/lib/python3.5/site-packages/tables/file.py", line 2119, in _check_writable
raise FileModeError("the file is not writable")
tables.exceptions.FileModeError: the file is not writable
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File ".../general_manager.py", line 144, in <module>
gm.run()
File ".../general_manager.py", line 114, in run
list_of_observer_managers = self.load_all_observer_managers()
File ".../general_manager.py", line 64, in load_all_observer_managers
observer = currency_pool.map(self.load_observer_manager, list_of_currencies)
File "/usr/lib/python3.5/multiprocessing/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
tables.exceptions.FileModeError: the file is not writable

The issue at hand was that I messed up with OS file permissions. The file I was trying to read belonged to the root (as I had run the code that generated those files with the root) and I was trying to access them with a user account.
I am running debian, and the following command (as root) solved my issues:
chown -R user.user folder
This commands recursively changes permissions of all files inside that folder to user.user.

Related

How to export .csv file from python and using pandas DataFrame

I am trying to export some filtered data from Python using Pandas DF to .csv file (Personal Learning project)
Code : df5.to_csv(r'/C:/Users/j/Downloads/data1/export.csv')
Error:
Traceback (most recent call last):
File "C:\Users\jansa\PycharmProjects\bbb\main.py", line 62, in <module>
df5.to_csv(r'/C:/Users/jansa/Downloads/data1/export.csv')
File "C:\Users\jansa\PycharmProjects\bbb\venv\lib\site-packages\pandas\core\generic.py", line 3551, in to_csv
return DataFrameRenderer(formatter).to_csv(
File "C:\Users\jansa\PycharmProjects\bbb\venv\lib\site-packages\pandas\io\formats\format.py", line 1180, in to_csv
csv_formatter.save()
File "C:\Users\jansa\PycharmProjects\bbb\venv\lib\site-packages\pandas\io\formats\csvs.py", line 241, in save with get_handle(
File "C:\Users\jansa\PycharmProjects\bbb\venv\lib\site-packages\pandas\io\common.py", line 697, in get_handle
check_parent_directory(str(handle))
File "C:\Users\jansa\PycharmProjects\bbb\venv\lib\site-packages\pandas\io\common.py", line 571, in check_parent_directory
raise OSError(rf"Cannot save file into a non-existent directory: '{parent}'")
OSError: Cannot save file into a non-existent directory: '\C:\Users\jansa\Downloads\data1'
I am researching, but cannot pinpoint the error.

Try
df.to_csv(r'C:\path\to\directory\filename.csv')

Generally, in Linux/Mac environment path separator is '/' but in windows, it is '\'. Also, the absolute path starts with '/' in Linux/Mac, while in windows, it starts with / So, using arguments in to_csv with C:\Users\j\Downloads\data1\export.csv' will resolve your issue.
In addition, if you want to get rid of such situations, you can do this:
import os
path = os.path.join('.', 'export.csv') #will save the file in current directory
Also, this returns the os path separator:
print(os.sep)

Pandas can't find the relevant file

I'm attempting to try a t-test in python using the pandas module. However, the same error keeps occuring in which my target file cannot be found. In this case, the target file is brain_size.csv, where the separators are semi-colons. The values which are left blank are represented by a period.
Here's what I have keyed in:
import pandas as pd
data = pd.read_csv('This PC\Desktop\brain_size.csv', sep=';', na_values='.')
Here's the error message. It's a long string
File "C:\Users\Tina Gnali\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 562, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\Tina Gnali\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 315, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\Tina Gnali\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 645, in __init__
self._make_engine(self.engine)
File "C:\Users\Tina Gnali\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 799, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Users\Tina Gnali\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1213, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas\parser.pyx", line 358, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:3427)
File "pandas\parser.pyx", line 628, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:6861)
OSError: File b'This PC\\Desktop\x08rain_size.csv' does not exist
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
I want to ask:
What am I doing wrong? Why I can't retrieve the target file?
Why my error elicits such a long error message?
What does the "parser" module do?

The problem is with using backslashes "\". You must avoid that. Backslash is reserved for something called escape characters, like new line being denoted with "\n" and stuff. Either use double backslashes "\\" or just forwardslashes "/" or raw literals in your read_csv():
"C:\\Users\\blabla\\"
or
"C:/Users/blabla/"
or
r"C:\Users\blabla\"
Regarding how to identify the error, look for the "error" string in the error message. It is here:
OSError: File b'This PC\\Desktop\x08rain_size.csv' does not exist
This tells you that Python is looking for a file called 'x08rain_size.csv', and obviously you don't have such a file. But what is x08rain? Could it be that b is replaced with x08 when you place a backslash in front of it? Let's ask this to Python:
In [247]: '\b'
Out[247]: '\x08'
There we go!

Sometimes you may not be able to use
"C:\\Users\\blabla\\" or "C:/Users/blabla/"
Solution 1. The other option could be this:
Open Anaconda prompt or cmd, then change the path. Lets assume you are in drive "c" and your folder is in drive e. So after opening the cmd write "e:" and hit enter. Then the command will show you "E:\>". Now you should write "cd E:\Users\blabla\desired_folder". After running that, you should write "jupyter notebook" and run it. It will generate and open a new notebook in the same folder that has your file.
Solution 2. The other simple solution is, after opening the jupyther notebook> file> use the folder icon and choose the right folder.

May be the sep is different it may be "," try it and if still it doesnt works try removing sep and na values and try to keep the file in same directory where the program is present or give actual path

How to turn a comma seperated value TXT into a CSV for machine learning

How do I turn this format of TXT file into a CSV file?
Date,Open,high,low,close
1/1/2017,1,2,1,2
1/2/2017,2,3,2,3
1/3/2017,3,4,3,4
I am sure you can understand? It already has the comma -eparated values.
I tried using numpy.
>>> import numpy as np
>>> table = np.genfromtxt("171028 A.txt", comments="%")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\npyio.py", line 1551, in genfromtxt
fhd = iter(np.lib._datasource.open(fname, 'rb'))
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py", line 151, in open
return ds.open(path, mode)
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py", line 501, in open
raise IOError("%s not found." % path)
OSError: 171028 A.txt not found.
I have (S&P) 500 txt files to do this with.

You can use csv module. You can find more information here.
import csv
txt_file = 'mytext.txt'
csv_file = 'mycsv.csv'
in_txt = csv.reader(open(txt_file, "r"), delimiter=',')
out_csv = csv.writer(open(csv_file, 'w+'))
out_csv.writerows(in_txt)

Per #dclarke's comment, check the directory from which you run the code. As you coded the call, the file must be in that directory. When I have it there, the code runs without error (although the resulting table is a single line with four nan values). When I move the file elsewhere, I reproduce your error quite nicely.
Either move the file to be local, add a local link to the file, or change the file name in your program to use the proper path to the file (either relative or absolute).

Python + open() + write on Windows - permission issue (IOError) for file created on OS X

(I asked this previously, but have since accepted that it is not an issue with openpyxl, so changing tack)
Given an xlsx created on OS X, I open it for writing on that platform using openpyxl load_workbook(). I add some data, then I save it using Workbook.save() (to the same file).
All good on OS X, but when the program, and the XLSX, are transferred onto a Windows machine and executed, I get:
c:\Users\Me\Desktop\ROI>python roi_cut7.py > log.txt
Traceback (most recent call last):
File "roi_cut7.py", line 379, in <module>
main()
File "roi_cut7.py", line 374, in main
processSource(wb, 'Taboola', taboolaSpends, taboolaRevenues)
File "roi_cut7.py", line 276, in processSource
wb.save(r'output.xlsx')
File "C:\Python27\lib\site-packages\openpyxl\workbook\workbook.py", line 298,
in save
save_workbook(self, filename)
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 198, in sa
ve_workbook
writer.save(filename, as_template=as_template)
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 180, in sa
ve
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
File "C:\Python27\lib\zipfile.py", line 756, in __init__
self.fp = open(file, modeDict[mode])
IOError: [Errno 22] invalid mode ('wb') or filename: 'output.xlsx'
Following extensive investigation, I've tried the following:
chmod 777 on the XLSX before moving it to Windows icacls output.xlsx
Ruling out path issues by using os.join and r'' raw notation
icacls output.xlsx /grant everyone:F on the file after moving it to Windows
Unticking the read-only checkbox under Attributes in the Windows folder Properties
... with no success. The Windows folder is a regular Windows folder. I can write to it both manually with new files, and programmatically. It's just the writing to this XLSX that is the issue.
I could update the script so that it reads from one file and writes to another (newly-created by openpyxl) file, but this workaround will only serve for a single day, since each subsequent day, the program needs to build upon the written file from the previous day, and if the file being read from is unchanged, this breaks down. Interestingly, I notice that if I open the XLSX in Excel, save it as a new file, update my program to read and write to this file new file, and re-execute, it works .... but only once. Every subsequent execution results in the IOError.
What would your next step be?

Python Zipfile - Invalid Argument Errno 22

I have a .zip file and would like to know the names of the files within it. Here's the code:
zip_path = glob.glob(path + '/*.zip')[0]
file = open(zip_path, 'r') # opens without error
if zipfile.is_zipfile(file):
print str(file) # prints to console
my_zipfile = zipfile.ZipFile(zip_path) # throws IOError
Here is the traceback:
<open file u'/Users/me/Documents/project/uploads/assets/peter/offline_message/offline_imgs.zip', mode 'r' at 0x107b2a150>
Traceback (most recent call last):
File "/Users/me/Documents/project/admin_dev/proj_name/views.py", line 1680, in get_dps_app_builder_assets
link_to_assets_zip = zip_dps_app_builder_assets(server_url, app_slug, button_slugs)
File "/Users/me/Documents/project/admin_dev/proj_name/views.py", line 1724, in zip_dps_app_builder_assets
my_zipfile = zipfile.ZipFile(zip_path)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 712, in __init__
self._GetContents()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 746, in _GetContents
self._RealGetContents()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 779, in _RealGetContents
fp.seek(self.start_dir, 0)
IOError: [Errno 22] Invalid argument
I am very confused as to why this is happening since the file is clearly there and is a valid .zip file. The documentation clearly states that you can pass it either the path to the file or a file-like object, neither of which work in my case:
http://docs.python.org/2/library/zipfile#zipfile-objects

I was not able to figure this issue out and ended up doing it a different way entirely.
EDIT: In the Django app I work with, users needed to be able to upload assets in the form of .zip files and later download everything they had uploaded (plus other content we generate dynamically) in another zip with a different structure. So, I wanted to unzip a previously uploaded file and zip up the contents of that file up in another zip, which I couldn't do because of the error. Instead of reading the zip file when the user requested the download, I ended up unzipping it from a Django InMemoryUploadedFile (whose contents I was able to successfully read) and just leaving the unzipped files on the file system to work with later. The contents of the zip are only two smallish image files, so this workaround of unzipping the zip ahead of time to be used later worked OK for my purposes.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to write on top of pandas HDF5 'read-only mode' files? - python

Related

How to export .csv file from python and using pandas DataFrame

Pandas can't find the relevant file

How to turn a comma seperated value TXT into a CSV for machine learning

Python + open() + write on Windows - permission issue (IOError) for file created on OS X

Python Zipfile - Invalid Argument Errno 22

Categories

Resources