How to iterate through and delete certain files from Python fcache? - python

In my PyQt5 app, I've been using fcache (https://pypi.org/project/fcache/) to cache lots of small files to the user's temp folder for speed. It's working well for caching, but now I need to be able to iterate through the cached files and selectively delete files that are no longer needed.
However when I try to iterate through the FileCache object, I'm getting an error.
thisCache is the name of my cache, and if I print(thisCache) I get:
which is fine.
Then if I do print(thisCache.keys()) I get KeysView(<fcache.cache.FileCache object at 0x000001F7BA0F2848>), which seems correct (I think?). Similarly, printing .values() gives me a ValuesView.
Then if I do print(len(thisCache.keys())) I get 1903, showing that there are 1903 files in there, which is probably correct. But here's where I get stuck.
If I try to iterate through the KeysView in any way, I get an error. Each of the following attempts:
for f in thisCache.values():
for f in thisCache.keys():
always throws an error:
Process finished with exit code -1073740791 (0xC0000409)
I'm fairly new to Python, so am I just misunderstanding how I'm supposed to iterate through this list? Or is there a bug or gotcha here that I need to work around?
Thanks
::::::::: EDIT ::::::::
After a bit of a delay, here's a reproducible (but not especially minimal or polished) bit of example code.
import random
import string
from fcache.cache import FileCache
from shutil import copyfile
def random_string(stringLength=10):
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for i in range(stringLength))
cacheName = "TestCache"
cache = FileCache(cacheName)
sourceFile = "C:\\TestFile.mov"
targetCount = 50
# copy the file 50 times:
for w in range(1, targetCount+1):
    fileName = random_string(50) + ".mov"
    targetPath = cache.cache_dir + "\\" + fileName
    print("Copying file ", w)
    copyfile(sourceFile, targetPath)
    cache[str(w)] = targetPath
print("Cached", targetCount, "items.")
print("Syncing cache...")
cache.sync()
# iterate through the cache:
print("Item keys:", cache.keys())
for key in cache.keys():
    v = cache[key]
    print(key, v)
print("Cache read.")
There is one dependency, which is having a file called "C:\TestFile.mov" on your system, but the path isn't important so this can be pointed to any file. I've tested with other file formats, with the same result.
The error that is thrown is:
Traceback (most recent call last):
File "C:\Users\stuart.bruce\AppData\Local\Programs\Python\Python37\lib\encodings\hex_codec.py", line 19, in hex_decode
return (binascii.a2b_hex(input), len(input))
binascii.Error: Non-hexadecimal digit found
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\stuart.bruce\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\stuart.bruce\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\stuart.bruce\PycharmProjects\testproject\test_code.py", line 32, in <module>
for key in cache.keys():
File "C:\Users\stuart.bruce\AppData\Local\Programs\Python\Python37\lib\_collections_abc.py", line 720, in __iter__
yield from self._mapping
File "C:\Users\stuart.bruce\AppData\Local\Programs\Python\Python37\lib\site-packages\fcache\cache.py", line 297, in __iter__
yield self._decode_key(key)
File "C:\Users\stuart.bruce\AppData\Local\Programs\Python\Python37\lib\site-packages\fcache\cache.py", line 211, in _decode_key
bkey = codecs.decode(key.encode(self._keyencoding), 'hex_codec')
binascii.Error: decoding with 'hex_codec' codec failed (Error: Non-hexadecimal digit found)
Line 32 of test_code.py (as mentioned in the error) is the line for key in cache.keys():, so this is where a non-hexadecimal character seems to be found. But firstly I'm not sure why, and secondly I don't know how to get around it.
(PS. Please note that if you run this code, you'll end up with 50 copies of your chosen file in your temp folder, and nothing will tidy it up automatically!)

After reading the sources of fcache, it seems that the cache_dir should only be used by fcache itself, as it reads all its files to find previously created cache data.
The program (or, better, the module) crashes because you created the other files in that directory, and it cannot deal with them.
The solution is to use another directory to store those files.
import os
# ...
data_dir = os.path.join(os.path.dirname(cache.cache_dir), 'data')
if not os.path.exists(data_dir):
    os.mkdir(data_dir)
for w in range(1, targetCount+1):
    fileName = random_string(50) + ".mov"
    targetPath = os.path.join(data_dir, fileName)
    copyfile(sourceFile, targetPath)
    cache[str(w)] = targetPath
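To see why the stray .mov files break iteration, here is a minimal sketch based only on the traceback in the question, which shows fcache decoding each file name with 'hex_codec'. The helper name decodes_as_hex is hypothetical and not part of fcache; it only mimics that decoding step with the standard library.

```python
import codecs


def decodes_as_hex(file_name, encoding="utf-8"):
    """Return True if file_name survives the hex decoding seen in the traceback.

    Hypothetical helper: it mirrors the codecs.decode(..., 'hex_codec') call
    shown in fcache's _decode_key, but it is not part of fcache itself.
    """
    try:
        codecs.decode(file_name.encode(encoding), "hex_codec")
        return True
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False


# A name made only of pairs of hex digits decodes without error
print(decodes_as_hex("31"))
# A random name ending in ".mov" contains non-hex characters and fails
print(decodes_as_hex("qwertyuiop.mov"))
```

Any file whose name fails this check would presumably trigger the same binascii.Error when the cache directory is iterated, which is why keeping your own files in a separate directory avoids the crash.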

Related

Iterate over pathlib paths and python-docx: zipfile.BadZipFile

My Python skills are a bit rusty, since I have recently been using mostly R. I ran into the following problem: I want to recursively iterate over all .docx files in a directory and change some of the core attributes with the python-docx package.
For the loop, I first created a list with pathlib and glob
from docx import Document
from docx.shared import Inches
import pathlib
# Reading the stats dir
root_dir = pathlib.Path(r"C:\some\Björn\PycharmProjects\mre_docx")
# Get all word files in the stats directory
files = [x for x in root_dir.glob("**/*.docx") if x.is_file()]
files
Output of files looks fine.
[WindowsPath('C:/Users/Björn/PycharmProjects/mre_docx/test1.docx'),
WindowsPath('C:/Users/Björn/PycharmProjects/mre_docx/test2.docx')]
When I now want to read in a document with the list I get a zip error (see full traceback below)
document = Document(files[1])
Traceback (most recent call last):
File "C:\Users\Björn\AppData\Local\Programs\Python\Python39\lib\site-packages\IPython\core\interactiveshell.py", line 3441, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-26-482c5438fa33>", line 1, in <module>
document = Document(files[1])
File "C:\Users\Björn\AppData\Local\Programs\Python\Python39\lib\site-packages\docx\api.py", line 25, in Document
document_part = Package.open(docx).main_document_part
File "C:\Users\Björn\AppData\Local\Programs\Python\Python39\lib\site-packages\docx\opc\package.py", line 128, in open
pkg_reader = PackageReader.from_file(pkg_file)
File "C:\Users\Björn\AppData\Local\Programs\Python\Python39\lib\site-packages\docx\opc\pkgreader.py", line 32, in from_file
phys_reader = PhysPkgReader(pkg_file)
File "C:\Users\Björn\AppData\Local\Programs\Python\Python39\lib\site-packages\docx\opc\phys_pkg.py", line 101, in __init__
self._zipf = ZipFile(pkg_file, 'r')
File "C:\Users\Björn\AppData\Local\Programs\Python\Python39\lib\zipfile.py", line 1257, in __init__
self._RealGetContents()
File "C:\Users\Björn\AppData\Local\Programs\Python\Python39\lib\zipfile.py", line 1324, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
However just running the same line of code, without the list works fine (except for differences in the path separator / and r"\", which I thought should not matter due to the fact that the lists contains pathlib.Path objects).
document = Document(pathlib.Path(r"C:\Users\Björn\PycharmProjects\mre_docx\test1.docx"))
Edit to Comment
I created a total of 4 new word files for this mre. Now I entered text in two of them and two are empty. And to my surprise I found out that the empty ones result in the error.
for file in files:
    try:
        document = Document(file)
    except:
        print(f"The file: {file} appears to be corrupted")
Output:
The file: C:\Users\Björn\PycharmProjects\mre_docx\new_file.docx appears to be corrupted
The file: C:\Users\Björn\PycharmProjects\mre_docx\test2.docx appears to be corrupted
Semi Solution to Future Readers
Add a try and except block around the call to Document("Path/to/file.docx"), and print out the respective file for which the function failed. In my case it was just a few files, which I could easily edit manually.
You are not doing anything wrong. You get this error because the documents are empty: if you open those files and type something, the error goes away.
According to https://python-docx.readthedocs.io/en/latest/user/documents.html
you can open Word documents in several ways.
First:
document = Document()
document.save(files[1])
Second:
document = Document(files[1])
document.save(files[1])
Also, according to the docs, you can open them as file-like objects:
with open(files[1], 'rb') as f:
    document = Document(f)
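Since a .docx file is a zip archive and the failing files were the empty ones, one way to avoid the exception is to filter the list before calling Document. The following is a minimal sketch using only the standard library; the function name split_by_zip_validity is an assumption made for illustration and is not part of python-docx.

```python
import zipfile
from pathlib import Path


def split_by_zip_validity(paths):
    """Split paths into (readable, skipped) using zipfile.is_zipfile.

    Hypothetical helper: files that are not valid zip archives (for
    example empty .docx files) end up in the skipped list.
    """
    readable, skipped = [], []
    for path in paths:
        if zipfile.is_zipfile(path):
            readable.append(Path(path))
        else:
            skipped.append(Path(path))
    return readable, skipped
```

With the list from the question, readable, skipped = split_by_zip_validity(files) would let you pass only the readable paths to Document and report the skipped ones. Note that a valid zip file is not necessarily a valid Word document, so this check only rules out the empty-file case.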

IDLE giving an error when MailMerge tries to work with doc/docx files

I would appreciate a hand with this.
It has previously popped corrupt file errors when opening the word file, but if I change .doc to .docx and remove some hyperlinks (I understand from another post somewhere that hyperlinks, footnotes, comments all cause errors), this time IDLE pops out the following error:
Traceback (most recent call last):
File "C:\Users\User\AppData\Local\Programs\Python\Python38-32\Files_tempfiller_\tempfiller.py", line 43, in
document.write('TBCO.docx')
File "C:\Users\User\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\mailmerge.py", line 129, in write
output.writestr(zi.filename, self.zip.read(zi))
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\zipfile.py", line 1475, in read
with self.open(name, "r", pwd) as fp:
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\zipfile.py", line 1532, in open
raise BadZipFile("Truncated file header")
zipfile.BadZipFile: Truncated file header
Because this is in the mailmerge.py file I can't really understand what this is.
My code is as follows:
from __future__ import print_function
from mailmerge import MailMerge
from datetime import date
template = "TBCO.docx" #add .docx suffix if failing
print('1')
document = MailMerge(template)
print(document.get_merge_fields())
print('2')
document.merge(
    date = '1.1.1',
    name = 'Bob',
    nhs = '2223')
print('3')
document.write('TBCO.docx')
print('4')
The prints were for me to see what was happening when it was giving set() repeatedly, but that's fixed. The sense I get from the error message is that it is struggling with the file type for some reason, but I can't make head nor tail of the error. Any help would be appreciated.
Thank you

TypeError: expected str, bytes or os.PathLike object, not int; in subprocess.run()

I'm writing a program to analyze hundreds of thousands of astronomical data files and classify the objects. I've gotten to the very last step - the files are all ready to be passed to the classification software, but the software requires using the command line. So I'm using subprocess.run() to access the command line from my Python script. When I run this code as is, I get the following error:
Traceback (most recent call last):
File "iterator.py", line 30, in <module>
subprocess.run(["mkclass", txtpath, "libr18", typepath, logpath, 1, 3]) #Passes the txt file to the MKCLASS script
File "/opt/anaconda3/lib/python3.7/subprocess.py", line 472, in run
with Popen(*popenargs, **kwargs) as process:
File "/opt/anaconda3/lib/python3.7/subprocess.py", line 775, in __init__
restore_signals, start_new_session)
File "/opt/anaconda3/lib/python3.7/subprocess.py", line 1453, in _execute_child
restore_signals, start_new_session, preexec_fn)
TypeError: expected str, bytes or os.PathLike object, not int
Here's the relevant code. Everything up until the last line of code works as expected (at least best I can tell!)
import os
from astropy.io import fits
import sys
import subprocess
from spelunker import *
fitsdir = sys.argv[1] #Directory to find fits files
txtdir = sys.argv[2] #Directory to find/put txt files
typedir = sys.argv[3] #Directory to put output files (output is star type)
logdir = sys.argv[4] #Directory to put log files (process MKCLASS took)
for spec in os.listdir(fitsdir): #Iterates through each spectrum file in spectrum directory
    specpath = os.path.join(fitsdir,spec) #Defines the full path to the spectrum
    txtpath = os.path.join(txtdir, spec) #Defines the full path for the txt file to be
    hdul = fits.open(specpath, ignore_missing_end=True) #Accesses spectrum file
    if hdul[2].data.field('CLASS')[0] != "STAR": #Checks if file is a star
        #print("File " +str(specpath) +" is not a star") #use for testing only
        continue
    filemaker(fluxGetter(hdul),waveGetter(hdul), txtpath.strip(".fits")) #Converts spectrum to txt file and puts it in proper directory
    #print("done") #use for testing only
    hdul.close() #Closes spectrum file
if len(os.listdir(fitsdir)) != len(os.listdir(txtdir)):
    print("Some files were not stars")
for txt in os.listdir(txtdir): #iterates through each txt file in directory
    txtpath = os.path.join(txtdir, txt) #Defines the full path to the txt file
    typepath = os.path.join(typedir, txt.replace("spec","TYPE")) #Defines the full path for the output to be
    logpath = os.path.join(logdir, txt.replace("spec","LOG")) #Defines the full path for the log file to be
    subprocess.run(["mkclass", txtpath, "libr18", typepath, logpath, 1, 3]) #Passes the txt file to the MKCLASS script
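The error says "not int", but the argument list contains integers (1 and 3). Every item passed to subprocess.run() must be a str, bytes or os.PathLike object, so just change those to strings.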
subprocess.run(["mkclass", txtpath, "libr18", typepath, logpath, "1", "3"])
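If the numeric arguments come from variables, converting the whole list in one place may be less error-prone than quoting each value by hand. This is a minimal sketch: to_argv is a hypothetical helper name, and the current Python interpreter stands in for mkclass so the example runs anywhere.

```python
import subprocess
import sys


def to_argv(args):
    """Convert every argument to str so subprocess accepts the list."""
    return [str(arg) for arg in args]


# Stand-in command: the current Python interpreter echoes its arguments.
# In the question the list would start with "mkclass" instead.
command = to_argv([sys.executable, "-c", "import sys; print(sys.argv[1:])", 1, 3])
result = subprocess.run(command, capture_output=True, text=True, check=True)
print(result.stdout.strip())
```

The integers 1 and 3 reach the child process as the strings "1" and "3", which is all a command-line program can receive anyway.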
Answer from tdelaney directly solves the problem for me.
However, I would like to add something. I was using the Thonny IDE (other IDEs probably behave the same way), and the error was not reported at the correct line number.
In the following example/printscreen, the error was reported to be at line 133. However, the integers that should have been strings were on line 127 and 128. After adding str(Duration_of_Video) and str(Frame_Per_Second) in place of Duration_of_Video and Frame_Per_Second without the str, it solves the problem.
I lost time in finding the error because the indicated line number of the error was not correct. In the following printscreen, one should note that the code doesn't correspond to the error message since I have added the two str() to provide a printscreen with the solution. Thus, the code is actually working. Removing the two str() would lead to the error message in red in the printscreen.

How to write on top of pandas HDF5 'read-only mode' files?

I am storing data using pandas built-in HDF5 methods.
Somehow, these HDF5 files were turned into 'read-only' files, and I am getting a lot of Opening xxx in read-only mode messages when I open those files in write mode and I can't write them, which is something I really need to do.
The thing I really don't understand so far is how come those files turned into read-only, as I am not aware of a piece of code that I wrote that may result in that behavior. (I have tried to check if the data stored in the HDF5 is corrupt, but I am able to read it and manipulate it, so it seems to be working just fine)
I have 2 questions:
How can I append data to those 'read-only mode' HDF5 files? (Can I convert them back to write mode or any other clever solution?)
Is there any pandas method that would change the HDF5 file to a 'read-only mode' by default so I can avoid turning those files into read-only in the first place?
Code:
The piece of code that raises this issue is the one I use to save the output I generated:
with pd.HDFStore('data/observer/' + self._currency + '_' + str(ts)) as hdf:
    hdf.append(key='observers', value=df, format='table', data_columns=True)
I also use this piece of code to manipulate the outputs that were generated previously:
for the_file in list_dir:
    if currency in the_file:
        temp_df = pd.read_hdf(folder + the_file)
        ...
I use some select commands as well to get specific columns from the data files:
with pd.HDFStore('data/observer/' + self.currency + '_' + timestamp) as hdf:
    df = hdf.select(key='observers', columns=[x, y])
Error Traceback:
File ".../data_processing/observer_data.py", line 52, in save_obs_to_pandas
hdf.append(key='observers', value=df, format='table', data_columns=True)
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 963, in append
**kwargs)
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 1341, in _write_to_group
s.write(obj=value, append=append, complib=complib, **kwargs)
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 3930, in write
self.set_info()
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 3163, in set_info
self.attrs.info = self.info
File ".../venv/lib/python3.5/site-packages/tables/attributeset.py", line 464, in __setattr__
nodefile._check_writable()
File ".../venv/lib/python3.5/site-packages/tables/file.py", line 2119, in _check_writable
raise FileModeError("the file is not writable")
tables.exceptions.FileModeError: the file is not writable
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File ".../general_manager.py", line 144, in <module>
gm.run()
File ".../general_manager.py", line 114, in run
list_of_observer_managers = self.load_all_observer_managers()
File ".../general_manager.py", line 64, in load_all_observer_managers
observer = currency_pool.map(self.load_observer_manager, list_of_currencies)
File "/usr/lib/python3.5/multiprocessing/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
tables.exceptions.FileModeError: the file is not writable
The issue was that I had messed up the OS file permissions. The files I was trying to access belonged to root (I had run the code that generated them as root), and I was trying to access them from a regular user account.
I am running Debian, and the following command (as root) solved my issue:
chown -R user.user folder
This command recursively changes the owner and group of all files inside that folder to user.
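Before appending to a store, it can help to check whether the current user is actually allowed to write to the file. This is a minimal sketch with the standard library only; describe_access is a hypothetical helper name, and nothing here is specific to pandas or PyTables.

```python
import os


def describe_access(path):
    """Report whether the current user can write to path and who owns it."""
    info = os.stat(path)
    return {
        "writable": os.access(path, os.W_OK),  # False suggests a permission problem
        "owner_uid": info.st_uid,              # compare with os.getuid() on Unix
    }
```

If "writable" comes back False for a file under data/observer/, the ownership or permissions of that file are the likely cause, rather than anything pandas did to it.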

How to turn a comma separated value TXT into a CSV for machine learning

How do I turn this format of TXT file into a CSV file?
Date,Open,high,low,close
1/1/2017,1,2,1,2
1/2/2017,2,3,2,3
1/3/2017,3,4,3,4
I am sure you can understand? It already has the comma-separated values.
I tried using numpy.
>>> import numpy as np
>>> table = np.genfromtxt("171028 A.txt", comments="%")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\npyio.py", line 1551, in genfromtxt
fhd = iter(np.lib._datasource.open(fname, 'rb'))
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py", line 151, in open
return ds.open(path, mode)
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py", line 501, in open
raise IOError("%s not found." % path)
OSError: 171028 A.txt not found.
I have (S&P) 500 txt files to do this with.
You can use csv module. You can find more information here.
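import csv
txt_file = 'mytext.txt'
csv_file = 'mycsv.csv'
# newline='' lets the csv module handle line endings itself;
# the with block closes both files when the copy is done
with open(txt_file, 'r', newline='') as src, open(csv_file, 'w', newline='') as dst:
    in_txt = csv.reader(src, delimiter=',')
    out_csv = csv.writer(dst)
    out_csv.writerows(in_txt)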
Per @dclarke's comment, check the directory from which you run the code. As you coded the call, the file must be in that directory. When I have it there, the code runs without error (although the resulting table is a single line with four nan values). When I move the file elsewhere, I reproduce your error quite nicely.
Either move the file to be local, add a local link to the file, or change the file name in your program to use the proper path to the file (either relative or absolute).
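Because the question mentions 500 such files, the same conversion can be applied to a whole folder using explicit paths, which also sidesteps the working-directory problem. This is a minimal sketch; convert_folder and its two directory parameters are hypothetical names chosen for illustration.

```python
import csv
from pathlib import Path


def convert_folder(source_dir, target_dir):
    """Rewrite every .txt file in source_dir as a .csv file in target_dir."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for txt_path in sorted(source_dir.glob("*.txt")):
        csv_path = target_dir / (txt_path.stem + ".csv")
        # newline="" lets the csv module control line endings on both sides
        with open(txt_path, newline="") as src, open(csv_path, "w", newline="") as dst:
            csv.writer(dst).writerows(csv.reader(src))
        written.append(csv_path)
    return written
```

Calling convert_folder with the folder that holds the .txt files and an output folder returns the list of .csv paths it wrote, so a missing input folder or an empty result is easy to spot.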
