Python - Search for files & ZIP, across multiple directories

Python - Search for files & ZIP, across multiple directories - python

This is my first time hacking together bits and pieces of code to form a utility that I need (I'm a designer by trade) and, though I feel I'm close, I'm having trouble getting the following to work.
I routinely need to zip up files with a .COD extension that are inside of a directory structure I've created. As an example, the structure may look like this:
(single root folder) -> (multiple folders) -> (two folders) -> (one folder) -> COD files
I need to ZIP up all the COD files into COD.zip and place that zip file one directory above where the files currently are. Folder structure would look like this when done for example:
EXPORT folder -> 9800 folder -> 6 folder -> OTA folder (+ new COD.zip) -> COD files
My issues -
first, the COD.zip that it creates seems to be appropriate for the COD files within it but when I unzip it, there is only 1 .cod inside but the file size of that ZIP is the size of all the CODs zipped together.
second, I need the COD files to be zipped w/o any folder structure - just directly within COD.zip. Currently, my script creates an entire directory structure (starting with "users/mysuername/etc etc").
Any help would be greatly appreciated - and explanations even better as I'm trying to learn :)
Thanks.
import os, glob, fnmatch, zipfile
def scandirs(path):
for currentFile in glob.glob( os.path.join(path, '*') ):
if os.path.isdir(currentFile):
scandirs(currentFile)
if fnmatch.fnmatch(currentFile, '*.cod'):
cod = zipfile.ZipFile("COD.zip","a")
cod.write(currentFile)
scandirs(os.getcwd())

For problem #1, I think your problem is probably this section:
cod = zipfile.ZipFile("COD.zip","a")
cod.write(currentFile)
You're creating a new zip (and possibly overwriting the existing one) every time you go to write a new file. Instead you want to create the zip once per directory and then repeatedly append to it (see example below).
For problem #2, your issue is that you probably need to flatten the filename when you write it to the archive. One approach would be to use os.chdir to CD into each directory in scandirs as you look at it. An easier approach is to use the os.path module to split up the file path and grab the basename (the filename without the path) and then you can use the 2nd parameter to cod.write to change the filename that gets put into the actual zip (see example below).
import os, os.path, glob, fnmatch, zipfile
def scandirs(path):
#zip file goes at current path, then up one dir, then COD.zip
zip_file_path = os.path.join(path,os.path.pardir,"COD.zip")
cod = zipfile.ZipFile(zip_file_path,"a") #NOTE: will result in some empty zips at the moment for dirs that contain no .cod files
for currentFile in glob.glob( os.path.join(path, '*') ):
if os.path.isdir(currentFile):
scandirs(currentFile)
if fnmatch.fnmatch(currentFile, '*.cod'):
cod.write(currentFile,os.path.basename(currentFile))
cod.close()
if not cod.namelist(): #zip is empty
os.remove(zip_file_path)
scandirs(os.getcwd())
So create the zip file once, repeatedly append to it while flattening the filenames, then close it. You also need to make sure you call close or you may not get all your files written.
I don't have a good way to test this locally at the moment, so feel free to try it and report back. I'm sure I probably broke something. ;-)

The following code has the same effect but is more reusable and does not create multiple zip files.
import os,glob,zipfile
def scandirs(path, pattern):
result = []
for file in glob.glob( os.path.join( path, pattern)):
if os.path.isdir(file):
result.extend(scandirs(file, pattern))
else:
result.append(file)
return result
zfile = zipfile.ZipFile('yourfile.zip','w')
for file in scandirs(yourbasepath,'*.COD'):
print 'Processing file: ' + file
zfile.write(file) # folder structure
zfile.write(file, os.path.split(file)[1]) # no folder structure
zfile.close()

Related

Move files one by one to newly created directories for each file with Python 3

What I have is an initial directory with a file inside D:\BBS\file.x and multiple .txt files in the work directory D:\
What I am trying to do is to copy the folder BBS with its content and incrementing it's name by number, then copy/move each existing .txt file to the newly created directory to make it \BBS1, \BBS2, ..., BBSn (depends on number of the txt).
Visual example of the Before and After:
Initial view of the \WorkFolder
Desired view of the \WorkFolder
Right now I have reached only creating of a new directory and moving txt in it but all at once, not as I would like to. Here's my code:
from pathlib import Path
from shutil import copy
import shutil
import os
wkDir = Path.cwd()
src = wkDir.joinpath('BBS')
count = 0
for content in src.iterdir():
addname = src.name.split('_')[0]
out_folder = wkDir.joinpath(f'!{addname}')
out_folder.mkdir(exist_ok=True)
out_path = out_folder.joinpath(content.name)
copy(content, out_path)
files = os.listdir(wkDir)
for f in files:
if f.endswith(".txt"):
shutil.move(f, out_folder)
I kindly request for assistance with incrementing and copying files one by one to the newly created directory for each as mentioned.
Not much skills with python in general. Python3 OS Windows
Thanks in advance

Now, I understand what you want to accomplish. I think you can do it quite easily by only iterating over the text files and for each one you copy the BBS folder. After that you move the file you are currently at. In order to get the folder_num, you may be able to just access the file name's characters at the particular indexes (e.g. f[4:6]) if the name is always of the pattern TextXX.txt. If the prefix "Text" may vary, it is more stable to use regular expressions like in the following sample.
Also, the function shutil.copytree copies a directory with its children.
import re
import shutil
from pathlib import Path
wkDir = Path.cwd()
src = wkDir.joinpath('BBS')
for f in os.listdir(wkDir):
if f.endswith(".txt"):
folder_num = re.findall(r"\d+", f)[0]
target = wkDir.joinpath(f"{src.name}{folder_num}")
# copy BBS
shutil.copytree(src, target)
# move .txt file
shutil.move(f, target)

Copying/pasting specific files in batch from one folder to another

I'am really new to this python scripting thing, but pretty sure, that there is the way to copy files from one folder (via given path in .txt file) to another.
there would be directly path to the folder, which contains photo files
I'am working with huge amounts of photos, which contains gps metadata (so i need not to lose it).
Really apreciate any help, thanks.

Here is a short and simple solution:
import shutil
import os
# replace file_list.txt with your files
file_list = open("file_list.txt", "r")
# replace my_dir with your copy dir
copy_dir = "my_dir"
for f in file_list.read().splitlines():
print(f"copying file: {f}")
shutil.copyfile(f, f"{copy_dir}/{os.path.split(f)[1]}")
file_list.close()
print("done")
It loops over all the files in the file list, and copies them. It should be fast enough.

Writing zipfile in Python 3.6 without absolute path

I am trying to write a zip file using Python's zipfile module that starts at a certain subfolder but still maintains the tree structure from that subfolder. For example, if I pass "C:\Users\User1\OneDrive\Documents", the zip file will contain everything from Documents onward, with all of Documents' subfolders maintained within Documents. I have the following code:
import zipfile
import os
import datetime
def backup(src, dest):
"""Backup files from src to dest."""
base = os.path.basename(src)
now = datetime.datetime.now()
newFile = f'{base}_{now.month}-{now.day}-{now.year}.zip'
# Set the current working directory.
os.chdir(dest)
if os.path.exists(newFile):
os.unlink(newFile)
newFile = f'{base}_{now.month}-{now.day}-{now.year}_OVERWRITE.zip'
# Write the zipfile and walk the source directory tree.
with zipfile.ZipFile(newFile, 'w') as zip:
for folder, _ , files in os.walk(src):
print(f'Working in folder {os.path.basename(folder)}')
for file in files:
zip.write(os.path.join(folder, file),
arcname=os.path.join(
folder[len(os.path.dirname(folder)) + 1:], file),
compress_type=zipfile.ZIP_DEFLATED)
print(f'\n---------- Backup of {base} to {dest} successful! ----------\n')
I know I have to use the arcname parameter for zipfile.write(), but I can't figure out how to get it to maintain the tree structure of the original directory. The code as it is now writes every subfolder to the first level of the zip file, if that makes sense. I've read several posts suggesting I use os.path.relname() to chop off the root, but I can't seem to figure out how to do it properly. I am also aware that this post looks similar to others on Stack Overflow. I have read those other posts and cannot figure out how to solve this problem.

The arcname parameter will set the exact path within the zip file for the file you are adding. You issue is when you are building the path for arcname you are using the wrong value to get the length of the prefix to remove. Specifically:
arcname=os.path.join(folder[len(os.path.dirname(folder)) + 1:], file)
Should be changed to:
arcname=os.path.join(folder[len(src):], file)

Copy files (as backup) and change original file names (rearranging contents)

i'm a total python noob but i want to learn it and integrate it to my workflow.
I have about 400 files containing 4 different parts in the filename separated by an underline:
-> Version_Date_ProjectName_ProjectNumber
As we allways look at the Projectnumber first, we arranged the contents of the filename for new projects to:
-> ProjectNumber_Version_ProjektName
My Problem now is, that i like to rename all the existing files to be rearranged to the new format while having them backed up in a subdirectory called "Archiv".
It just has to be a simple script that i put in the directory and every file in this directory will be copied as backup and changed to the new filename.
EDIT:
My first step was to create a subfolder within the source directory, and it worked somehow. But no i saw, that i just need to backup the files with a specific file extension.
import os, shutil
src_dir= os.curdir
dst_dir= os.path.join(os.curdir, "Archiv")
shutil.copytree(src_dir, dst_dir)
i tried to extend the code with the solutions from here but it doesn't work out. :/

import os
import shutil
import glob
src_path = "YOU_SOURCE_PATH"
dest_path = "YOUR DESTINATION PATH"
if not os.path.exists(dest_path):
os.makedirs(dest_path)
files = glob.iglob(os.path.join(src_dir, "*.pdf"))
for file in files:
if os.path.isfile(file):
shutil.copy2(file, dest_path)

Zipped files have extra unwanted folders

I have been having an issue with using the zipfile.Zipfile() function. It zips my files properly, but then has extra folders that I do not want in the output zip file. It does put all my desired files in the .zip but it seems to add the last few directories from the files being written in the .zip file by default. Is there any way to exclude these folders? Here is my code:
import arcpy, os
from os import path as p
import zipfile
arcpy.overwriteOutput = True
def ZipShapes(path, out_path):
arcpy.env.workspace = path
shapes = arcpy.ListFeatureClasses()
# iterate through list of shapefiles
for shape in shapes:
name = p.splitext(shape)[0]
print name
zip_path = p.join(out_path, name + '.zip')
zip = zipfile.ZipFile(zip_path, 'w')
zip.write(p.join(path,shape))
for f in arcpy.ListFiles('%s*' %name):
if not f.endswith('.shp'):
zip.write(p.join(path,f))
print 'All files written to %s' %zip_path
zip.close()
if __name__ == '__main__':
path = r'C:\Shape_test\Census_CedarCo'
out_path = r'C:\Shape_outputs'
ZipShapes(path, out_path)
I tried to post some pictures but I do not have enough reputation points. Basically it is adding 2 extra folders (empty) inside the zip file. So instead of the files being inside the zip like this:
C:\Shape_outputs\Public_Buildings.zip\Public_Buildings.shp
They are showing up like this:
C:\Shape_outputs\Public_Buildings.zip\Shape_test\Census_CedarCo\Public_Buildings.shp
The "Shape_test" and "Census_CedarCo" folders are the directories that the shapefiles I am trying to copy come from, but if I am just writing these files why are the sub directories also being copied into the zip file? I suppose it is not a huge deal since I am getting the files zipped, but it is more of an annoyance than anything.
I assumed that when creating a zip file it would just write the files I specify themselves. Why does it add these extra directories inside the zip file? Is there a way around it? Am I missing something here? I appreciate any input! Thanks

The optional second parameter in ZipFile.write(filename[, arcname[, compress_type]]) is that name used in the archive file. You can strip the offending folders from the front of the path and use the remainder for the archive path name. I'm not sure exactly how arcpy gives you the paths, but something like zip.write(p.join(path,shape), shape) should do it.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python - Search for files & ZIP, across multiple directories - python

Related

Move files one by one to newly created directories for each file with Python 3

Copying/pasting specific files in batch from one folder to another

Writing zipfile in Python 3.6 without absolute path

Copy files (as backup) and change original file names (rearranging contents)

Zipped files have extra unwanted folders

Categories

Resources