zipping files in python without folder structure

zipping files in python without folder structure - python

I want to create a zip file of some files somewhere on the disk in python. I successfully got the path to the folder and each file name so I did:
with zp(os.path.join(self.savePath, self.selectedIndex + ".zip"), "w") as zip:
for file in filesToZip:
zip.write(self.folderPath + file)
Everything works fine, but the zipfile that is output contains the entire folder structure leading up to the files. Is there a way to only zip the files and not the folders with it?

From the documentation:
ZipFile.write(filename, arcname=None, compress_type=None,
compresslevel=None)
Write the file named filename to the archive, giving it the archive
name arcname (by default, this will be the same as filename, but
without a drive letter and with leading path separators removed).
So, just specify an explicit arcname:
with zp(os.path.join(self.savePath, self.selectedIndex + ".zip"), "w") as zip:
for file in filesToZip:
zip.write(self.folderPath + file, arcname=file)

Maybe I' misunderstand the question but could someone explain to me why the answer is not:
zip.write(file, arcname=os.path.basename(file))
It works for me but, again, I might be missing something in the question...

Related

Different File Paths in Python ZipFile Depending on .write() vs .writestr()

I just wanted to ask quickly if the behavior I'm seeing in Python's zipfile module is expected... I wanted to put together a zip archive. For reasons I don't think I need to get into, I was adding some files using zipfile.writestr() and others using .write(). I was writing some files to zip subdirectory called /scripts and others to a zip subdirectory called /data.
For /data, I originally did this:
for root, _, filenames in os.walk(tmpdirname):
for root_name in filenames:
print(f"Handle zip of {root_name}")
name = os.path.join(root, root_name)
name = os.path.normpath(name)
zipFile.write(name, f'/data/{root_name}')
This worked fine and produced a working archive that I could extract. So far, so good. To write text files to the /script subdirectory, I used:
zipFile.writestr(f'/script/{scriptname}', fileBytes)
Again, so far so good.
Now it gets odd... I wanted to extract files in /data/. So I looked for paths in zipFile.namelist() starting with /data. My code kept missing the files in /data/, however. Doing some more digging, I noticed that the files written using .writestr had a slash at the start of the zipfile path like this: "/scripts/myscript.py". The files written using .write did not have a slash at the start of the path, so the data file paths looked like this: "data/mydata.pickle".
I changed my code to use .writestr() for the data files:
for root, _, filenames in os.walk(tmpdirname):
for root_name in filenames:
print(f"Handle zip of {root_name}")
name = os.path.join(root, root_name)
name = os.path.normpath(name)
with open(name, mode='rb') as extracted_file:
zipFile.writestr(f'/data/{root_name}', extracted_file.read())
Voila, the data files now have slashes at the start of the path. I'm not sure why, however, as I'm providing the same file path either way, and I wouldn't expect using one method versus another would change the paths.
Is this supposed to work this way? Am I missing something obvious here?

Writing zipfile in Python 3.6 without absolute path

I am trying to write a zip file using Python's zipfile module that starts at a certain subfolder but still maintains the tree structure from that subfolder. For example, if I pass "C:\Users\User1\OneDrive\Documents", the zip file will contain everything from Documents onward, with all of Documents' subfolders maintained within Documents. I have the following code:
import zipfile
import os
import datetime
def backup(src, dest):
"""Backup files from src to dest."""
base = os.path.basename(src)
now = datetime.datetime.now()
newFile = f'{base}_{now.month}-{now.day}-{now.year}.zip'
# Set the current working directory.
os.chdir(dest)
if os.path.exists(newFile):
os.unlink(newFile)
newFile = f'{base}_{now.month}-{now.day}-{now.year}_OVERWRITE.zip'
# Write the zipfile and walk the source directory tree.
with zipfile.ZipFile(newFile, 'w') as zip:
for folder, _ , files in os.walk(src):
print(f'Working in folder {os.path.basename(folder)}')
for file in files:
zip.write(os.path.join(folder, file),
arcname=os.path.join(
folder[len(os.path.dirname(folder)) + 1:], file),
compress_type=zipfile.ZIP_DEFLATED)
print(f'\n---------- Backup of {base} to {dest} successful! ----------\n')
I know I have to use the arcname parameter for zipfile.write(), but I can't figure out how to get it to maintain the tree structure of the original directory. The code as it is now writes every subfolder to the first level of the zip file, if that makes sense. I've read several posts suggesting I use os.path.relname() to chop off the root, but I can't seem to figure out how to do it properly. I am also aware that this post looks similar to others on Stack Overflow. I have read those other posts and cannot figure out how to solve this problem.

The arcname parameter will set the exact path within the zip file for the file you are adding. You issue is when you are building the path for arcname you are using the wrong value to get the length of the prefix to remove. Specifically:
arcname=os.path.join(folder[len(os.path.dirname(folder)) + 1:], file)
Should be changed to:
arcname=os.path.join(folder[len(src):], file)

Determine Filename of Unzipped File

Say you unzip a file called file123.zip with zipfile.ZipFile, which yields an unzipped file saved to a known path. However, this unzipped file has a completely random name. How do you determine this completely random filename? Or is there some way to control what the name of the unzipped file is?
I am trying to implement this in python.

By "random" I assume that you mean that the files are named arbitrarily.
You can use ZipFile.read() which unzips the file and returns its contents as a string of bytes. You can then write that string to a named file of your choice.
from zipfile import ZipFile
with ZipFile('file123.zip') as zf:
for i, name in enumerate(zf.namelist()):
with open('outfile_{}'.format(i), 'wb') as f:
f.write(zf.read(name))
This will write each file from the archive to a file named output_n in the current directory. The names of the files contained in the archive are obtained with ZipFile.namelist(). I've used enumerate() as a simple method of generating the file names, however, you could substitute that with whatever naming scheme you require.

If the filename is completely random you can first check for all filenames in a particular directory using os.listdir(). Now you know the filename and can do whatever you want with it :)
See this topic for more information.

Adding file to existing zipfile

I'm using python's zipfile module.
Having a zip file located in a path of:
/home/user/a/b/c/test.zip
And having another file created under /home/user/a/b/c/1.txt
I want to add this file to existing zip, I did:
zip = zipfile.ZipFile('/home/user/a/b/c/test.zip','a')
zip.write('/home/user/a/b/c/1.txt')
zip.close()`
And got all the subfolders appears in path when unzipping the file, how do I just enter the zip file without path's subfolders?
I tried also :
zip.write(os.path.basename('/home/user/a/b/c/1.txt'))
And got an error that file doesn't exist, although it does.

You got very close:
zip.write(path_to_file, os.path.basename(path_to_file))
should do the trick for you.
Explanation: The zip.write function accepts a second argument (the arcname) which is the filename to be stored in the zip archive, see the documentation for zipfile more details.
os.path.basename() strips off the directories in the path for you, so that the file will be stored in the archive under just it's name.
Note that if you only zip.write(os.path.basename(path_to_file)) it will look for the file in the current directory where it (as the error says) does not exist.

import zipfile
# Open a zip file at the given filepath. If it doesn't exist, create one.
# If the directory does not exist, it fails with FileNotFoundError
filepath = '/home/user/a/b/c/test.zip'
with zipfile.ZipFile(filepath, 'a') as zipf:
# Add a file located at the source_path to the destination within the zip
# file. It will overwrite existing files if the names collide, but it
# will give a warning
source_path = '/home/user/a/b/c/1.txt'
destination = 'foobar.txt'
zipf.write(source_path, destination)

Zipped files have extra unwanted folders

I have been having an issue with using the zipfile.Zipfile() function. It zips my files properly, but then has extra folders that I do not want in the output zip file. It does put all my desired files in the .zip but it seems to add the last few directories from the files being written in the .zip file by default. Is there any way to exclude these folders? Here is my code:
import arcpy, os
from os import path as p
import zipfile
arcpy.overwriteOutput = True
def ZipShapes(path, out_path):
arcpy.env.workspace = path
shapes = arcpy.ListFeatureClasses()
# iterate through list of shapefiles
for shape in shapes:
name = p.splitext(shape)[0]
print name
zip_path = p.join(out_path, name + '.zip')
zip = zipfile.ZipFile(zip_path, 'w')
zip.write(p.join(path,shape))
for f in arcpy.ListFiles('%s*' %name):
if not f.endswith('.shp'):
zip.write(p.join(path,f))
print 'All files written to %s' %zip_path
zip.close()
if __name__ == '__main__':
path = r'C:\Shape_test\Census_CedarCo'
out_path = r'C:\Shape_outputs'
ZipShapes(path, out_path)
I tried to post some pictures but I do not have enough reputation points. Basically it is adding 2 extra folders (empty) inside the zip file. So instead of the files being inside the zip like this:
C:\Shape_outputs\Public_Buildings.zip\Public_Buildings.shp
They are showing up like this:
C:\Shape_outputs\Public_Buildings.zip\Shape_test\Census_CedarCo\Public_Buildings.shp
The "Shape_test" and "Census_CedarCo" folders are the directories that the shapefiles I am trying to copy come from, but if I am just writing these files why are the sub directories also being copied into the zip file? I suppose it is not a huge deal since I am getting the files zipped, but it is more of an annoyance than anything.
I assumed that when creating a zip file it would just write the files I specify themselves. Why does it add these extra directories inside the zip file? Is there a way around it? Am I missing something here? I appreciate any input! Thanks

The optional second parameter in ZipFile.write(filename[, arcname[, compress_type]]) is that name used in the archive file. You can strip the offending folders from the front of the path and use the remainder for the archive path name. I'm not sure exactly how arcpy gives you the paths, but something like zip.write(p.join(path,shape), shape) should do it.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

zipping files in python without folder structure - python

Maybe I' misunderstand the question but could someone explain to me why the answer is not: zip.write(file, arcname=os.path.basename(file)) It works for me but, again, I might be missing something in the question...

Related

Different File Paths in Python ZipFile Depending on .write() vs .writestr()

Writing zipfile in Python 3.6 without absolute path

Determine Filename of Unzipped File

Adding file to existing zipfile

Zipped files have extra unwanted folders

Categories

Resources