Zipped files have extra unwanted folders

Zipped files have extra unwanted folders - python

I have been having an issue with using the zipfile.Zipfile() function. It zips my files properly, but then has extra folders that I do not want in the output zip file. It does put all my desired files in the .zip but it seems to add the last few directories from the files being written in the .zip file by default. Is there any way to exclude these folders? Here is my code:
import arcpy, os
from os import path as p
import zipfile
arcpy.overwriteOutput = True
def ZipShapes(path, out_path):
arcpy.env.workspace = path
shapes = arcpy.ListFeatureClasses()
# iterate through list of shapefiles
for shape in shapes:
name = p.splitext(shape)[0]
print name
zip_path = p.join(out_path, name + '.zip')
zip = zipfile.ZipFile(zip_path, 'w')
zip.write(p.join(path,shape))
for f in arcpy.ListFiles('%s*' %name):
if not f.endswith('.shp'):
zip.write(p.join(path,f))
print 'All files written to %s' %zip_path
zip.close()
if __name__ == '__main__':
path = r'C:\Shape_test\Census_CedarCo'
out_path = r'C:\Shape_outputs'
ZipShapes(path, out_path)
I tried to post some pictures but I do not have enough reputation points. Basically it is adding 2 extra folders (empty) inside the zip file. So instead of the files being inside the zip like this:
C:\Shape_outputs\Public_Buildings.zip\Public_Buildings.shp
They are showing up like this:
C:\Shape_outputs\Public_Buildings.zip\Shape_test\Census_CedarCo\Public_Buildings.shp
The "Shape_test" and "Census_CedarCo" folders are the directories that the shapefiles I am trying to copy come from, but if I am just writing these files why are the sub directories also being copied into the zip file? I suppose it is not a huge deal since I am getting the files zipped, but it is more of an annoyance than anything.
I assumed that when creating a zip file it would just write the files I specify themselves. Why does it add these extra directories inside the zip file? Is there a way around it? Am I missing something here? I appreciate any input! Thanks

The optional second parameter in ZipFile.write(filename[, arcname[, compress_type]]) is that name used in the archive file. You can strip the offending folders from the front of the path and use the remainder for the archive path name. I'm not sure exactly how arcpy gives you the paths, but something like zip.write(p.join(path,shape), shape) should do it.

Related

Is there a way to change your cwd in Python using a file as an input?

I have a Python program where I am calculating the number of files within different directories, but I wanted to know if it was possible to use a text file containing a list of different directory locations to change the cwd within my program?
Input: Would be a text file that has different folder locations that contains various files.
I have my program set up to return the total amount of files in a given folder location and return the amount to a count text file that will be located in each folder the program is called on.

You can use os module in Python.
import os
# dirs will store the list of directories, can be populated from your text file
dirs = []
text_file = open(your_text_file, "r")
for dir in text_file.readlines():
dirs.append(dir)
#Now simply loop over dirs list
for directory in dirs:
# Change directory
os.chdir(directory)
# Print cwd
print(os.getcwd())
# Print number of files in cwd
print(len([name for name in os.listdir(directory)
if os.path.isfile(os.path.join(directory, name))]))

Yes.
start_dir = os.getcwd()
indexfile = open(dir_index_file, "r")
for targetdir in indexfile.readlines():
os.chdir(targetdir)
# Do your stuff here
os.chdir(start_dir)
Do bear in mind that if your program dies half way through it'll leave you in a different working directory to the one you started in, which is confusing for users and can occasionally be dangerous (especially if they don't notice it's happened and start trying to delete files that they expect to be there - they might get the wrong file). You might want to consider if there's a way to achieve what you want without changing the working directory.
EDIT:
And to suggest the latter, rather than changing directory use os.listdir() to get the files in the directory of interest:
import os
start_dir = os.getcwd()
indexfile = open(dir_index_file, "r")
for targetdir in indexfile.readlines():
contents = os.listdir(targetdir)
numfiles = len(contents)
countfile = open(os.path.join(targetdir, "count.txt"), "w")
countfile.write(str(numfiles))
countfile.close()
Note that this will count files and directories, not just files. If you only want files then you'll have to go through the list returned by os.listdir checking whether each item is a file using os.path.isfile()

Iterate over files located in different folders

I’d like to write a function to iterate over excel files that are in different folders. Parts of the path of each file are the same, for instance:
C:\Main\Division\Reports\Year\Data.xls
The only part of each path that changes is ‘Year’. The files all have the same name.
Is there a way to do this with a placeholder for Year? If not, what approach should I take?

You can use os.listdir function
directory = "C:\Main\Division\Reports"
root_dir = os.path.dirname(directory)
for data in os.listdir(directory):
file_name = os.path.join(root_dir, data, 'Data.xls')
# do something

You could try os.walk
import os
parent = "C:\Main\Division\Reports"
for root, directory, files in os.walk(parent):
print root
print directory
print files

Writing zipfile in Python 3.6 without absolute path

I am trying to write a zip file using Python's zipfile module that starts at a certain subfolder but still maintains the tree structure from that subfolder. For example, if I pass "C:\Users\User1\OneDrive\Documents", the zip file will contain everything from Documents onward, with all of Documents' subfolders maintained within Documents. I have the following code:
import zipfile
import os
import datetime
def backup(src, dest):
"""Backup files from src to dest."""
base = os.path.basename(src)
now = datetime.datetime.now()
newFile = f'{base}_{now.month}-{now.day}-{now.year}.zip'
# Set the current working directory.
os.chdir(dest)
if os.path.exists(newFile):
os.unlink(newFile)
newFile = f'{base}_{now.month}-{now.day}-{now.year}_OVERWRITE.zip'
# Write the zipfile and walk the source directory tree.
with zipfile.ZipFile(newFile, 'w') as zip:
for folder, _ , files in os.walk(src):
print(f'Working in folder {os.path.basename(folder)}')
for file in files:
zip.write(os.path.join(folder, file),
arcname=os.path.join(
folder[len(os.path.dirname(folder)) + 1:], file),
compress_type=zipfile.ZIP_DEFLATED)
print(f'\n---------- Backup of {base} to {dest} successful! ----------\n')
I know I have to use the arcname parameter for zipfile.write(), but I can't figure out how to get it to maintain the tree structure of the original directory. The code as it is now writes every subfolder to the first level of the zip file, if that makes sense. I've read several posts suggesting I use os.path.relname() to chop off the root, but I can't seem to figure out how to do it properly. I am also aware that this post looks similar to others on Stack Overflow. I have read those other posts and cannot figure out how to solve this problem.

The arcname parameter will set the exact path within the zip file for the file you are adding. You issue is when you are building the path for arcname you are using the wrong value to get the length of the prefix to remove. Specifically:
arcname=os.path.join(folder[len(os.path.dirname(folder)) + 1:], file)
Should be changed to:
arcname=os.path.join(folder[len(src):], file)

How to create a folder for each item in a directory?

I'm having trouble making folders that I create go where I want them to go. For each file in a given folder, I want to create a new folder, then put that file in the new folder. My problem is that the new folders I create are being put in the parent directory, not the one I want. My example:
def createFolder():
dir_name = 'C:\\Users\\Adrian\\Entertainment\\Coding\\Test Folder'
files = os.listdir(dir_name)
for i in files:
os.mkdir(i)
Let's say that my files in that directory are Hello.txt and Goodbye.txt. When I run the script, it makes new folders for these files, but puts them one level above, in 'C:\Users\Adrian\Entertainment\Coding.
How do I make it so they are created in the same place as the files, AKA 'C:\Users\Adrian\Entertainment\Coding\Test Folder'?

import os, shutil
for i in files:
os.mkdir(os.path.join(dir_name , i.split(".")[0]))
shutil.copy(os.path.join(dir_name , i), os.path.join(dir_name , i.split(".")[0]))

os.listdir(dir_name) lists only the names of the files, not full paths to the files. To get a path to the file, join it with dir_name:
os.mkdir(os.path.join(dir_name, i))

Python - Search for files & ZIP, across multiple directories

This is my first time hacking together bits and pieces of code to form a utility that I need (I'm a designer by trade) and, though I feel I'm close, I'm having trouble getting the following to work.
I routinely need to zip up files with a .COD extension that are inside of a directory structure I've created. As an example, the structure may look like this:
(single root folder) -> (multiple folders) -> (two folders) -> (one folder) -> COD files
I need to ZIP up all the COD files into COD.zip and place that zip file one directory above where the files currently are. Folder structure would look like this when done for example:
EXPORT folder -> 9800 folder -> 6 folder -> OTA folder (+ new COD.zip) -> COD files
My issues -
first, the COD.zip that it creates seems to be appropriate for the COD files within it but when I unzip it, there is only 1 .cod inside but the file size of that ZIP is the size of all the CODs zipped together.
second, I need the COD files to be zipped w/o any folder structure - just directly within COD.zip. Currently, my script creates an entire directory structure (starting with "users/mysuername/etc etc").
Any help would be greatly appreciated - and explanations even better as I'm trying to learn :)
Thanks.
import os, glob, fnmatch, zipfile
def scandirs(path):
for currentFile in glob.glob( os.path.join(path, '*') ):
if os.path.isdir(currentFile):
scandirs(currentFile)
if fnmatch.fnmatch(currentFile, '*.cod'):
cod = zipfile.ZipFile("COD.zip","a")
cod.write(currentFile)
scandirs(os.getcwd())

For problem #1, I think your problem is probably this section:
cod = zipfile.ZipFile("COD.zip","a")
cod.write(currentFile)
You're creating a new zip (and possibly overwriting the existing one) every time you go to write a new file. Instead you want to create the zip once per directory and then repeatedly append to it (see example below).
For problem #2, your issue is that you probably need to flatten the filename when you write it to the archive. One approach would be to use os.chdir to CD into each directory in scandirs as you look at it. An easier approach is to use the os.path module to split up the file path and grab the basename (the filename without the path) and then you can use the 2nd parameter to cod.write to change the filename that gets put into the actual zip (see example below).
import os, os.path, glob, fnmatch, zipfile
def scandirs(path):
#zip file goes at current path, then up one dir, then COD.zip
zip_file_path = os.path.join(path,os.path.pardir,"COD.zip")
cod = zipfile.ZipFile(zip_file_path,"a") #NOTE: will result in some empty zips at the moment for dirs that contain no .cod files
for currentFile in glob.glob( os.path.join(path, '*') ):
if os.path.isdir(currentFile):
scandirs(currentFile)
if fnmatch.fnmatch(currentFile, '*.cod'):
cod.write(currentFile,os.path.basename(currentFile))
cod.close()
if not cod.namelist(): #zip is empty
os.remove(zip_file_path)
scandirs(os.getcwd())
So create the zip file once, repeatedly append to it while flattening the filenames, then close it. You also need to make sure you call close or you may not get all your files written.
I don't have a good way to test this locally at the moment, so feel free to try it and report back. I'm sure I probably broke something. ;-)

The following code has the same effect but is more reusable and does not create multiple zip files.
import os,glob,zipfile
def scandirs(path, pattern):
result = []
for file in glob.glob( os.path.join( path, pattern)):
if os.path.isdir(file):
result.extend(scandirs(file, pattern))
else:
result.append(file)
return result
zfile = zipfile.ZipFile('yourfile.zip','w')
for file in scandirs(yourbasepath,'*.COD'):
print 'Processing file: ' + file
zfile.write(file) # folder structure
zfile.write(file, os.path.split(file)[1]) # no folder structure
zfile.close()

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Zipped files have extra unwanted folders - python

Related

Is there a way to change your cwd in Python using a file as an input?

Iterate over files located in different folders

Writing zipfile in Python 3.6 without absolute path

How to create a folder for each item in a directory?

Python - Search for files & ZIP, across multiple directories

Categories

Resources