I'm using Python's zipfile module.
I have a zip file located at:
/home/user/a/b/c/test.zip
and another file created at /home/user/a/b/c/1.txt.
I want to add this file to the existing zip, so I did:
zip = zipfile.ZipFile('/home/user/a/b/c/test.zip','a')
zip.write('/home/user/a/b/c/1.txt')
zip.close()
When I unzip the file, all the subfolders from the path appear. How do I add the file to the zip without the path's subfolders?
I also tried:
zip.write(os.path.basename('/home/user/a/b/c/1.txt'))
and got an error that the file doesn't exist, although it does.
You got very close:
zip.write(path_to_file, os.path.basename(path_to_file))
should do the trick for you.
Explanation: The zip.write function accepts a second argument (the arcname), which is the filename to be stored in the zip archive; see the documentation for zipfile for more details.
os.path.basename() strips off the directories in the path for you, so that the file will be stored in the archive under just its name.
Note that if you only call zip.write(os.path.basename(path_to_file)), it will look for the file in the current directory, where it (as the error says) does not exist.
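To see the effect of the second argument, here is a minimal sketch that runs in a throwaway temporary directory. The directory layout and the names `src` and `archive` are made up for illustration; only `ZipFile.write(filename, arcname)` and `os.path.basename()` come from the answer above.

```python
import os
import tempfile
import zipfile

# Hypothetical layout: <tmp>/a/b/c/1.txt, archived into <tmp>/test.zip.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'a', 'b', 'c', '1.txt')
os.makedirs(os.path.dirname(src))
with open(src, 'w') as f:
    f.write('hello')

archive = os.path.join(tmp, 'test.zip')
with zipfile.ZipFile(archive, 'a') as zf:
    # First argument: where the file lives on disk.
    # Second argument (arcname): the name stored inside the archive.
    zf.write(src, os.path.basename(src))

with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(names)
```

The archive should list the file as plain 1.txt, with none of the a/b/c folders.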
import zipfile

# Open a zip file at the given filepath. If it doesn't exist, create one.
# If the directory does not exist, it fails with FileNotFoundError.
filepath = '/home/user/a/b/c/test.zip'
with zipfile.ZipFile(filepath, 'a') as zipf:
    # Add the file located at source_path under the name `destination` within
    # the zip file. If that name already exists in the archive, the old entry
    # is not replaced: a duplicate entry is added and a UserWarning is issued.
    source_path = '/home/user/a/b/c/1.txt'
    destination = 'foobar.txt'
    zipf.write(source_path, destination)
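What happens on a name collision is easy to check for yourself. The sketch below uses temporary paths invented for the example and adds the same arcname twice in append mode. As far as I can tell from CPython's zipfile, the second write does not replace the first entry: both are kept and a "Duplicate name" UserWarning is raised.

```python
import os
import tempfile
import warnings
import zipfile

tmp = tempfile.mkdtemp()
source_path = os.path.join(tmp, '1.txt')
with open(source_path, 'w') as f:
    f.write('data')

filepath = os.path.join(tmp, 'test.zip')
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    for _ in range(2):
        # Re-open in append mode and add the same arcname again.
        with zipfile.ZipFile(filepath, 'a') as zipf:
            zipf.write(source_path, 'foobar.txt')

with zipfile.ZipFile(filepath) as zipf:
    entries = zipf.namelist()

print(entries)
print([str(w.message) for w in caught])
```

If you really need to replace an entry, the usual workaround is to write a new archive and copy over the members you want to keep.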
I'm trying to create a zip folder with a .txt file in it. But when I open test_20210616.zip, test is available as a folder and not as a .txt file.
with zipfile.ZipFile('/dbfs/Test/test_20210616.zip', 'w', allowZip64 = True) as z:
    z.write('/dbfs/Test/','test.txt')
In the docs of zipfile the function ZipFile.write looks as follows
ZipFile.write(filename, arcname=None, compress_type=None, compresslevel=None)
Since you are calling it with z.write('/dbfs/Test/','test.txt'), you are writing the folder /dbfs/Test/ into the zip file and giving it the name test.txt.
Simply pass the whole path to the file as the first argument.
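A minimal sketch of the corrected call, using a temporary folder as a stand-in for /dbfs/Test/ so it can run anywhere (the variable names `folder`, `txt_path` and `zip_path` are mine, not from the question):

```python
import os
import tempfile
import zipfile

# Stand-in for /dbfs/Test/ so the sketch can run anywhere.
folder = tempfile.mkdtemp()
txt_path = os.path.join(folder, 'test.txt')
with open(txt_path, 'w') as f:
    f.write('some text')

zip_path = os.path.join(folder, 'test_20210616.zip')
with zipfile.ZipFile(zip_path, 'w', allowZip64=True) as z:
    # filename is the full path to the file; arcname is its name in the zip.
    z.write(txt_path, 'test.txt')

with zipfile.ZipFile(zip_path) as z:
    info = z.getinfo('test.txt')
print(info.is_dir(), info.file_size)
```

With the file path as the first argument, the entry is a regular file with content rather than a directory.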
I am trying to write a zip file using Python's zipfile module that starts at a certain subfolder but still maintains the tree structure from that subfolder. For example, if I pass "C:\Users\User1\OneDrive\Documents", the zip file will contain everything from Documents onward, with all of Documents' subfolders maintained within Documents. I have the following code:
import zipfile
import os
import datetime

def backup(src, dest):
    """Backup files from src to dest."""
    base = os.path.basename(src)
    now = datetime.datetime.now()
    newFile = f'{base}_{now.month}-{now.day}-{now.year}.zip'

    # Set the current working directory.
    os.chdir(dest)

    if os.path.exists(newFile):
        os.unlink(newFile)
        newFile = f'{base}_{now.month}-{now.day}-{now.year}_OVERWRITE.zip'

    # Write the zipfile and walk the source directory tree.
    with zipfile.ZipFile(newFile, 'w') as zip:
        for folder, _ , files in os.walk(src):
            print(f'Working in folder {os.path.basename(folder)}')

            for file in files:
                zip.write(os.path.join(folder, file),
                          arcname=os.path.join(
                              folder[len(os.path.dirname(folder)) + 1:], file),
                          compress_type=zipfile.ZIP_DEFLATED)
    print(f'\n---------- Backup of {base} to {dest} successful! ----------\n')
I know I have to use the arcname parameter for zipfile.write(), but I can't figure out how to get it to maintain the tree structure of the original directory. The code as it is now writes every subfolder to the first level of the zip file, if that makes sense. I've read several posts suggesting I use os.path.relpath() to chop off the root, but I can't seem to figure out how to do it properly. I am also aware that this post looks similar to others on Stack Overflow. I have read those other posts and cannot figure out how to solve this problem.
The arcname parameter sets the exact path within the zip file for the file you are adding. Your issue is that, when building the path for arcname, you are using the wrong value to get the length of the prefix to remove. Specifically:
arcname=os.path.join(folder[len(os.path.dirname(folder)) + 1:], file)
Should be changed to:
arcname=os.path.join(folder[len(src):], file)
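If you would rather not slice strings by hand, os.path.relpath() can compute the archive path. Below is a rough sketch, not the asker's actual backup function: `zip_tree` is a hypothetical helper, and the little Documents tree is created in a temp directory purely for the demo. It takes paths relative to the parent of src, so the top-level folder name (Documents) is kept inside the archive.

```python
import os
import tempfile
import zipfile

def zip_tree(src, zip_path):
    """Zip src so that entries start at src's own folder name."""
    src = os.path.abspath(src)
    parent = os.path.dirname(src)
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for folder, _, files in os.walk(src):
            for file in files:
                full = os.path.join(folder, file)
                # Path relative to src's parent, e.g. Documents/sub/b.txt
                zf.write(full, arcname=os.path.relpath(full, parent))

# Tiny throwaway tree: Documents/a.txt and Documents/sub/b.txt
root = tempfile.mkdtemp()
docs = os.path.join(root, 'Documents')
os.makedirs(os.path.join(docs, 'sub'))
for rel in ('a.txt', os.path.join('sub', 'b.txt')):
    with open(os.path.join(docs, rel), 'w') as f:
        f.write(rel)

out = os.path.join(root, 'backup.zip')
zip_tree(docs, out)
with zipfile.ZipFile(out) as zf:
    names = sorted(zf.namelist())
print(names)
```

Use os.path.relpath(full, src) instead if you want the archive to start below Documents. Note that this sketch only adds files, so empty subfolders are not recorded.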
Say you unzip a file called file123.zip with zipfile.ZipFile, which yields an unzipped file saved to a known path. However, this unzipped file has a completely random name. How do you determine this completely random filename? Or is there some way to control what the name of the unzipped file is?
I am trying to implement this in python.
By "random" I assume that you mean that the files are named arbitrarily.
You can use ZipFile.read() which unzips the file and returns its contents as a string of bytes. You can then write that string to a named file of your choice.
from zipfile import ZipFile

with ZipFile('file123.zip') as zf:
    for i, name in enumerate(zf.namelist()):
        with open('outfile_{}'.format(i), 'wb') as f:
            f.write(zf.read(name))
This will write each file from the archive to a file named outfile_n in the current directory. The names of the files contained in the archive are obtained with ZipFile.namelist(). I've used enumerate() as a simple method of generating the file names; you could substitute whatever naming scheme you require.
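For large members, reading everything into memory with read() may not be ideal. A possible variant, sketched under the assumption that streaming is acceptable, uses ZipFile.open() together with shutil.copyfileobj(). The archive, the member name x9f3k2.dat and the out directory are all invented so the example is self-contained.

```python
import os
import shutil
import tempfile
import zipfile

work = tempfile.mkdtemp()
archive = os.path.join(work, 'file123.zip')
with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('x9f3k2.dat', b'payload')  # stand-in for the "random" name

out_dir = os.path.join(work, 'out')
os.makedirs(out_dir)
written = []
with zipfile.ZipFile(archive) as zf:
    for i, info in enumerate(zf.infolist()):
        if info.is_dir():
            continue  # skip directory entries, they carry no data
        target = os.path.join(out_dir, 'outfile_{}'.format(i))
        # Stream the member to a name of our choosing.
        with zf.open(info) as src, open(target, 'wb') as dst:
            shutil.copyfileobj(src, dst)
        written.append(target)
print(written)
```

Since you pick `target` yourself, you always know where each extracted file ended up.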
If the filename is completely random you can first check for all filenames in a particular directory using os.listdir(). Now you know the filename and can do whatever you want with it :)
See this topic for more information.
I am quite new to Python. Here I am trying to create a zip file of the 'diveintomark-diveintopython3-793871b' directory. I changed the current working directory using os.chdir(). The zip file is created, but when I extract it I get the following directory:
Users/laiba/Desktop/diveintomark-diveintopython3-793871b
I only want the diveintomark-diveintopython3-793871b folder inside my zip file, not the whole nested directory. Why is this happening and how can I solve it?
import zipfile, os
os.chdir('c:\\Users\\laiba\\Desktop')
myzip=zipfile.ZipFile('diveZip.zip','w',zipfile.ZIP_DEFLATED)
for folder,subfolder,file in os.walk('diveintomark-diveintopython3-793871b'):
    myzip.write(folder)
    for each in subfolder:
        myzip.write(os.path.abspath(os.path.join(folder,each)))
    for each in file:
        myzip.write(os.path.abspath(os.path.join(folder,each)))
You could use the arcname argument: the name of the item in the archive, as opposed to the full path name. But here you don't need it, because you are already in the correct directory. Just drop the abspath (and also the duplicate folder entry) and you're done:
import zipfile, os
os.chdir('c:\\Users\\laiba\\Desktop')
myzip=zipfile.ZipFile('diveZip.zip','w',zipfile.ZIP_DEFLATED)
for folder,subfolder,file in os.walk('diveintomark-diveintopython3-793871b'):
    for each in subfolder+file:
        myzip.write(os.path.join(folder,each))
myzip.close()
This is also possible without changing directories. It is a bit more complex, but more elegant since you don't have to chdir:
import zipfile, os
root_dir = r"c:\Users\laiba\Desktop"
myzip=zipfile.ZipFile(os.path.join(root_dir,'diveZip.zip'),'w',zipfile.ZIP_DEFLATED)
for folder,subfolder,file in os.walk(os.path.join(root_dir,'diveintomark-diveintopython3-793871b')):
    for each in subfolder+file:
        source = os.path.join(folder,each)
        # remove the absolute path to compose arcname
        # also handles the remaining leading path separator with lstrip
        arcname = source[len(root_dir):].lstrip(os.sep)
        # write the file under a different name in the archive
        myzip.write(source,arcname=arcname)
myzip.close()
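As an aside, the standard library's shutil.make_archive() can probably do the same job in one call: root_dir plays the role of the Desktop folder and base_dir is the folder to keep as the top level of the archive. The sketch below is only an illustration with made-up temp directories and a folder called project standing in for diveintomark-diveintopython3-793871b.

```python
import os
import shutil
import tempfile
import zipfile

root_dir = tempfile.mkdtemp()                 # stand-in for the Desktop folder
project = os.path.join(root_dir, 'project')   # stand-in for the folder to zip
os.makedirs(os.path.join(project, 'sub'))
for rel in ('a.txt', os.path.join('sub', 'b.txt')):
    with open(os.path.join(project, rel), 'w') as f:
        f.write(rel)

out_dir = tempfile.mkdtemp()                  # where the archive is written
archive = shutil.make_archive(
    os.path.join(out_dir, 'diveZip'),  # '.zip' is appended automatically
    'zip',
    root_dir=root_dir,   # paths in the archive are relative to this
    base_dir='project',  # only this folder is archived
)
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(names)
```

One caveat: on some Python versions make_archive() changes the working directory internally while it runs, so it may not be a good fit for multi-threaded code.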
When I invoke add() on a tarfile object with a file path, the file is added to the tarball along with its directory hierarchy. In other words, if I untar the file, the directories of the original hierarchy are reproduced.
Is there a way to simply add a plain file without directory info, so that untarring the resulting tarball produces a flat list of files?
Using the arcname argument of the TarFile.add() method is a convenient way to control the path an item gets inside the archive.
Example: you want to archive the directory repo/a.git/ to a tar.gz file, but you want the tree root in the archive to be a.git/ rather than repo/a.git/. You can do the following:
import tarfile
archive = tarfile.open("a.git.tar.gz", "w|gz")
archive.add("repo/a.git", arcname="a.git")
archive.close()
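You can use TarFile.addfile(). In the TarInfo object, which is the first parameter, you can specify a name that's different from the file you're adding. Note that the TarInfo must also carry the file's size, otherwise no data is written to the archive; TarFile.gettarinfo() fills that in for you, and the file has to be opened in binary mode.
This piece of code should add /path/to/filename.txt to the TAR file, but it will be extracted as myfilename.txt:
info = tar.gettarinfo("/path/to/filename.txt", arcname="myfilename.txt")
tar.addfile(info, open("/path/to/filename.txt", "rb"))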
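One thing to watch with addfile(): it copies exactly TarInfo.size bytes from the file object, and a freshly constructed TarInfo has size 0. Here is a self-contained sketch of two ways to handle that. The temp paths and the generated.txt member are invented for the example; the calls themselves (gettarinfo, addfile, TarInfo) are standard tarfile API.

```python
import io
import os
import tarfile
import tempfile

work = tempfile.mkdtemp()
src = os.path.join(work, 'filename.txt')
with open(src, 'wb') as f:
    f.write(b'tar me')

tar_path = os.path.join(work, 'out.tar')
with tarfile.open(tar_path, 'w') as tar:
    # gettarinfo() fills in size, mode, mtime, etc. from the file on disk;
    # arcname sets the name stored in the archive.
    info = tar.gettarinfo(src, arcname='myfilename.txt')
    with open(src, 'rb') as fh:
        tar.addfile(info, fh)

    # In-memory data: build the TarInfo by hand and set size explicitly.
    data = b'generated'
    mem_info = tarfile.TarInfo('generated.txt')
    mem_info.size = len(data)
    tar.addfile(mem_info, io.BytesIO(data))

with tarfile.open(tar_path) as tar:
    names = tar.getnames()
    content = tar.extractfile('myfilename.txt').read()
print(names, content)
```

For files that already exist on disk, add(name, arcname=...) is simpler; addfile() is mostly useful when the data does not come from a file.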
Maybe you can use the "arcname" argument to TarFile.add(name, arcname). It takes an alternate name that the file will have inside the archive.
Thanks to #diabloneo, here is a function to create a selective tarball of a dir:
import os
import tarfile

def compress(output_file="archive.tar.gz", output_dir='', root_dir='.', items=[]):
    """Compress dirs.

    KWArgs
    ------
    output_file : str, default = "archive.tar.gz"
    output_dir : str, default = ''
        absolute path to output
    root_dir : str, default = '.'
        absolute path to input root dir
    items : list
        list of dirs/items relative to root dir
    """
    os.chdir(root_dir)
    with tarfile.open(os.path.join(output_dir, output_file), "w:gz") as tar:
        for item in items:
            tar.add(item, arcname=item)

>>> root_dir = "/abs/pth/to/dir/"
>>> compress(output_file="archive.tar.gz", output_dir=root_dir,
...          root_dir=root_dir, items=["logs", "output"])
Here is a code sample to tar the files in a folder without adding the folder itself:
with tarfile.open(tar_path, 'w') as tar:
    for filename in os.listdir(folder):
        fpath = os.path.join(folder, filename)
        tar.add(fpath, arcname=filename)
If you want to add the directory name but not its contents inside a tarfile, you can do the following:
(1) create an empty directory called empty
(2) tf.add("empty", arcname=path_you_want_to_add)
That creates an empty directory with the name path_you_want_to_add.
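If creating a real empty directory on disk feels clumsy, it should also be possible to add a directory entry directly by building a TarInfo with type DIRTYPE. This is just a sketch; the archive path and the name some/empty/dir are placeholders.

```python
import os
import tarfile
import tempfile

tar_path = os.path.join(tempfile.mkdtemp(), 'dirs.tar')
path_you_want_to_add = 'some/empty/dir'  # hypothetical name

with tarfile.open(tar_path, 'w') as tf:
    info = tarfile.TarInfo(path_you_want_to_add)
    info.type = tarfile.DIRTYPE  # mark the member as a directory
    info.mode = 0o755            # TarInfo defaults to 0o644, awkward for a directory
    tf.addfile(info)             # no fileobj: directories carry no data

with tarfile.open(tar_path) as tf:
    member = tf.getmember(path_you_want_to_add)
print(member.name, member.isdir())
```

You may also want to set info.mtime (for example to time.time()), since a hand-built TarInfo defaults to a timestamp of 0.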