I have a .zip file and would like to know the names of the files within it. Here's the code:
zip_path = glob.glob(path + '/*.zip')[0]
file = open(zip_path, 'r') # opens without error
if zipfile.is_zipfile(file):
print str(file) # prints to console
my_zipfile = zipfile.ZipFile(zip_path) # throws IOError
Here is the traceback:
<open file u'/Users/me/Documents/project/uploads/assets/peter/offline_message/offline_imgs.zip', mode 'r' at 0x107b2a150>
Traceback (most recent call last):
File "/Users/me/Documents/project/admin_dev/proj_name/views.py", line 1680, in get_dps_app_builder_assets
link_to_assets_zip = zip_dps_app_builder_assets(server_url, app_slug, button_slugs)
File "/Users/me/Documents/project/admin_dev/proj_name/views.py", line 1724, in zip_dps_app_builder_assets
my_zipfile = zipfile.ZipFile(zip_path)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 712, in __init__
self._GetContents()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 746, in _GetContents
self._RealGetContents()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 779, in _RealGetContents
fp.seek(self.start_dir, 0)
IOError: [Errno 22] Invalid argument
I am very confused as to why this is happening since the file is clearly there and is a valid .zip file. The documentation clearly states that you can pass it either the path to the file or a file-like object, neither of which work in my case:
http://docs.python.org/2/library/zipfile#zipfile-objects
I was not able to figure this issue out and ended up doing it a different way entirely.
EDIT: In the Django app I work with, users needed to be able to upload assets in the form of .zip files and later download everything they had uploaded (plus other content we generate dynamically) in another zip with a different structure. So, I wanted to unzip a previously uploaded file and zip up the contents of that file up in another zip, which I couldn't do because of the error. Instead of reading the zip file when the user requested the download, I ended up unzipping it from a Django InMemoryUploadedFile (whose contents I was able to successfully read) and just leaving the unzipped files on the file system to work with later. The contents of the zip are only two smallish image files, so this workaround of unzipping the zip ahead of time to be used later worked OK for my purposes.
Related
Python code:
import tarfile,os
import sys
def extract_system_report (tar_file_path):
extract_path = ""
tar = tarfile.open(sys.argv[1])
for member in tar.getmembers():
print ("member_name is: ")
print (member.name)
tar.extract(member.name, "./crachinfo/")
extract_system_report(sys.argv[1])
while extracting the file getting below error:
>> python tar_read.py /nobackup/deepakhe/POLARIS_POJECT_09102019/hackathon_2021/a.tar.gz
member_name is:
/bootflash/.prst_sync/reload_info
Traceback (most recent call last):
File "tar_read.py", line 38, in <module>
extract_system_report(sys.argv[1])
File "tar_read.py", line 10, in extract_system_report
tar.extract(member.name, "./crachinfo/")
File "/auto/pysw/cel8x/python64/3.6.10/lib/python3.6/tarfile.py", line 2052, in extract
numeric_owner=numeric_owner)
File "/auto/pysw/cel8x/python64/3.6.10/lib/python3.6/tarfile.py", line 2114, in _extract_member
os.makedirs(upperdirs)
File "/auto/pysw/cel8x/python64/3.6.10/lib/python3.6/os.py", line 210, in makedirs
makedirs(head, mode, exist_ok)
File "/auto/pysw/cel8x/python64/3.6.10/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/bootflash'
I am specifying the folder to extract but still it seems trying to create a folder in the root directory. is this behavior expected? I can extract this tar file fine in file manager. is there a way to handle this in python?
I am specifying the folder to extract but still it seems trying to create a folder in the root directory. is this behavior expected?
Yes. The path provided is just a "current directory", it's not a "jail". The documentation even specifically warns about it:
It is possible that files are created outside of path, e.g. members that have absolute filenames starting with "/" or filenames with two dots "..".
I can extract this tar file fine in file manager. is there a way to handle this in python?
Use extractfile to get a pseudo-file handle on the contents of the archived file, then copy that wherever you want.
I am storing data using pandas built-in HDF5 methods.
Somehow, these HDF5 files were turned into 'read-only' files, and I am getting a lot of Opening xxx in read-only mode messages when I open those files in write mode and I can't write them, which is something I really need to do.
The thing I really don't understand so far is how come those files turned into read-only, as I am not aware of a piece of code that I wrote that may result in that behavior. (I have tried to check if the data stored in the HDF5 is corrupt, but I am able to read it and manipulate it, so it seems to be working just fine)
I have 2 questions:
How can I append data to those 'read-only mode' HDF5 files? (Can I convert them back to write mode or any other clever solution?)
Is there any pandas method that would change the HDF5 file to a 'read-only mode' by default so I can avoid turning those files into read-only in the first place?
Code:
The piece of code that is raising this issue is, which is the piece I use to save the output I generated:
with pd.HDFStore('data/observer/' + self._currency + '_' + str(ts)) as hdf:
hdf.append(key='observers', value=df, format='table', data_columns=True)
I also use this piece of code to manipulate the outputs that were generated previously:
for the_file in list_dir:
if currency in the_file:
temp_df = pd.read_hdf(folder + the_file)
...
I use some select commands as well to get specific columns from the data files:
with pd.HDFStore('data/observer/' + self.currency + '_' + timestamp) as hdf:
df = hdf.select(key='observers', columns=[x, y])
Error Traceback:
File ".../data_processing/observer_data.py", line 52, in save_obs_to_pandas
hdf.append(key='observers', value=df, format='table', data_columns=True)
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 963, in append
**kwargs)
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 1341, in _write_to_group
s.write(obj=value, append=append, complib=complib, **kwargs)
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 3930, in write
self.set_info()
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 3163, in set_info
self.attrs.info = self.info
File ".../venv/lib/python3.5/site-packages/tables/attributeset.py", line 464, in __setattr__
nodefile._check_writable()
File ".../venv/lib/python3.5/site-packages/tables/file.py", line 2119, in _check_writable
raise FileModeError("the file is not writable")
tables.exceptions.FileModeError: the file is not writable
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File ".../general_manager.py", line 144, in <module>
gm.run()
File ".../general_manager.py", line 114, in run
list_of_observer_managers = self.load_all_observer_managers()
File ".../general_manager.py", line 64, in load_all_observer_managers
observer = currency_pool.map(self.load_observer_manager, list_of_currencies)
File "/usr/lib/python3.5/multiprocessing/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
tables.exceptions.FileModeError: the file is not writable
The issue at hand was that I messed up with OS file permissions. The file I was trying to read belonged to the root (as I had run the code that generated those files with the root) and I was trying to access them with a user account.
I am running debian, and the following command (as root) solved my issues:
chown -R user.user folder
This commands recursively changes permissions of all files inside that folder to user.user.
I'm trying to unzip a zip file inside my script using Python's zipfile module. The problem is that when I try to unzip this file, it raises Bad magic number for file header error:
This is the error:
..
zip_ref.extractall(destination_to_unzip_file)
File "C:\Python27\lib\zipfile.py", line 1040, in extractall
self.extract(zipinfo, path, pwd)
File "C:\Python27\lib\zipfile.py", line 1028, in extract
return self._extract_member(member, path, pwd)
File "C:\Python27\lib\zipfile.py", line 1082, in _extract_member
with self.open(member, pwd=pwd) as source, \
File "C:\Python27\lib\zipfile.py", line 971, in open
raise BadZipfile("Bad magic number for file header")
zipfile.BadZipfile: Bad magic number for file header
The file I want to unzip is downloaded this way:
_url = """http://edane.drsr.sk/report/ds_dphs_csv.zip"""
def download_platici_dph(self):
if os.path.isfile(_destination_for_downloads+'platici_dph.zip'):
os.remove(_destination_for_downloads+'platici_dph.zip')
with open(_destination_for_downloads+'platici_dph.zip','w') as f:
response = requests.get(_url,stream=True)
if not response.ok:
print 'Something went wrong'
return False
else:
for block in response.iter_content(1024):
f.write(block)
Do anybody knows where is the problem?
Quoth the documentation for open(): "when opening a binary file, you should append 'b' to the mode value to open the file in binary mode"
Open your output file using b for binary:
with open(_destination_for_downloads+'platici_dph.zip','wb') as f:
I tried downloading your archive without using your download-code and then extracting it with:
import zipfile
with zipfile.ZipFile("ds_dphs_csv.zip") as a:
a.extractall()
It worked fine. The exception zipfile.BadZipfile is raised when there is a problem in a header thus I think your file is corrupted after download. There must be a problem with your downloading method.
You can find more details on the exception in this post: Python - Extracting files from a large (6GB+) zip file
(I asked this previously, but have since accepted that it is not an issue with openpyxl, so changing tack)
Given an xlsx created on OS X, I open it for writing on that platform using openpyxl load_workbook(). I add some data, then I save it using Workbook.save() (to the same file).
All good on OS X, but when the program, and the XLSX, are transferred onto a Windows machine and executed, I get:
c:\Users\Me\Desktop\ROI>python roi_cut7.py > log.txt
Traceback (most recent call last):
File "roi_cut7.py", line 379, in <module>
main()
File "roi_cut7.py", line 374, in main
processSource(wb, 'Taboola', taboolaSpends, taboolaRevenues)
File "roi_cut7.py", line 276, in processSource
wb.save(r'output.xlsx')
File "C:\Python27\lib\site-packages\openpyxl\workbook\workbook.py", line 298,
in save
save_workbook(self, filename)
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 198, in sa
ve_workbook
writer.save(filename, as_template=as_template)
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 180, in sa
ve
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
File "C:\Python27\lib\zipfile.py", line 756, in __init__
self.fp = open(file, modeDict[mode])
IOError: [Errno 22] invalid mode ('wb') or filename: 'output.xlsx'
Following extensive investigation, I've tried the following:
chmod 777 on the XLSX before moving it to Windows icacls output.xlsx
Ruling out path issues by using os.join and r'' raw notation
icacls output.xlsx /grant everyone:F on the file after moving it to Windows
Unticking the read-only checkbox under Attributes in the Windows folder Properties
... with no success. The Windows folder is a regular Windows folder. I can write to it both manually with new files, and programmatically. It's just the writing to this XLSX that is the issue.
I could update the script so that it reads from one file and writes to another (newly-created by openpyxl) file, but this workaround will only serve for a single day, since each subsequent day, the program needs to build upon the written file from the previous day, and if the file being read from is unchanged, this breaks down. Interestingly, I notice that if I open the XLSX in Excel, save it as a new file, update my program to read and write to this file new file, and re-execute, it works .... but only once. Every subsequent execution results in the IOError.
What would your next step be?
I am new to python. I can't figure out what I am doing wrong when trying to read the contents of .tar.gz file into python. The tarfile I would like to read is hosted at the following web address:
ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/Breast_Cancer_Res_2001_Nov_9_3(1)_61-65.tar.gz
more info on file at this site (just so you can trust contents)
http://www.pubmedcentral.nih.gov/utils/oa/oa.fcgi?id=PMC13901
The tarfile contains .pdf and .nxml copies of the journal article. And also a couple of image files.
If I open the file in my browser by copying and pasting. I can save to a location on my PC and import the tarfile fine using the following commands (note: winzip changes the file from .tar.gz to simply .tar when I save to location):
import tarfile
thetarfile = "C:/Users/dfcm/Documents/Breast_Cancer_Res_2001_Nov_9_3(1)_61-65.tar"
tfile = tarfile.open(thetarfile)
tfile
However, if I try to access the file directly using similar commands:
thetarfile = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/Breast_Cancer_Res_2001_Nov_9_3(1)_61-65.tar.gz"
bbb = tarfile.open(thetarfile)
That results in the following error:
Traceback (most recent call last):
File "<pyshell#137>", line 1, in <module>
bbb = tarfile.open(thetarfile)
File "C:\Python30\lib\tarfile.py", line 1625, in open
return func(name, "r", fileobj, **kwargs)
File "C:\Python30\lib\tarfile.py", line 1687, in gzopen
fileobj = bltn_open(name, mode + "b")
File "C:\Python30\lib\io.py", line 278, in __new__
return open(*args, **kwargs)
File "C:\Python30\lib\io.py", line 222, in open
closefd)
File "C:\Python30\lib\io.py", line 615, in __init__
_fileio._FileIO.__init__(self, name, mode, closefd)
IOError: [Errno 22] Invalid argument: 'ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/Breast_Cancer_Res_2001_Nov_9_3(1)_61-65.tar'
Can anyone explain what I am doing wrong when trying to read the .tar.gz file directly from the web address? Thanks in advance. Chris
Unfortunately you cannot just open files from the network. Things are a bit more complex here. You have to instruct the interpreter to create a network request and create an object representing the request state. This can be done using the urllib module.
import urllib.request
import tarfile
thetarfile = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/Breast_Cancer_Res_2001_Nov_9_3(1)_61-65.tar.gz"
ftpstream = urllib.request.urlopen(thetarfile)
thetarfile = tarfile.open(fileobj=ftpstream, mode="r|gz")
The ftpstream object is a file-like that represents the connection to the ftp server. Then the tarfile module can access this stream. Since we do not pass the filename, we have to specify the compression in the mode parameter.