How to get Python to look at Sub-Folders? - python

I am trying to create a python script that will look in a series of sub-folders and delete empty shapefiles. I have successfully created the part of the script that will delete the empty files in one folder, but there are a total of 70 folders within the "Project" folder. While I could just copy and paste the code 69 times I'm sure must be a way to get it to look at each sub-folder and run the code for each of those sub-folders. Below is the what I have so far. Any ideas? I'm very new to this and I have simply edited an existing code to get this far. Thanks!
import os
# Set the working directory
os.chdir ("C:/Naview/Platypus/Project")
# Get the list of only files in the current directory
file = filter(os.path.isfile, os.listdir('C:/Naview/Platypus/Project'))
# For each file in directory
for shp in file:
# Get only the files that end in ".shp"
if shp.endswith(".shp"):
# Get the size of the ".shp" file.
# NOTE: The ".dbf" file can vary is size whereas
# the shp & shx are always the same when "empty".
size = os.path.getsize(shp)
print "\nChecking " + shp + "'s file size..."
#If the file size is greater than 100 bytes, leave it alone.
if size > 100:
print "File is " + str(size) + " bytes"
print shp + " will NOT be deleted \n"
#If the file size is equal to 100 bytes, delete it.
if size == 100:
# Convert the int output from (size) to a string.
print "File is " + str(size) + " bytes"
# Get the filename without the extention
base = shp[:-4]
# Remove entire shapefile
print "Removing " + base + ".* \n"
if os.path.exists(base + ".shp"):
os.remove(base + ".shp")
if os.path.exists(base + ".shx"):
os.remove(base + ".shx")
if os.path.exists(base + ".dbf"):
os.remove(base + ".dbf")
if os.path.exists(base + ".prj"):
os.remove(base + ".prj")
if os.path.exists(base + ".sbn"):
os.remove(base + ".sbn")
if os.path.exists(base + ".sbx"):
os.remove(base + ".sbx")
if os.path.exists(base + ".shp.xml"):
os.remove(base + ".shp.xml")

There are several ways to do this. I'm a fan of glob
for shp in glob.glob('C:/Naview/Platypus/Project/**/*.shp'):
size = os.path.getsize(shp)
print "\nChecking " + shp + "'s file size..."
#If the file size is greater than 100 bytes, leave it alone.
if size > 100:
print "File is " + str(size) + " bytes"
print shp + " will NOT be deleted \n"
continue
print "Removing", shp, "files"
for file in glob.glob(shp[:-3] + '*'):
print " removing", file
os.remove(file)

Time to learn about procedural programming: Defining Functions.
Put your code into a function with a path parameter and call it for each of your 70 paths:
def delete_empty_shapefiles(path):
# Get the list of only files in the current directory
file = filter(os.path.isfile, os.listdir(path))
...
paths = ['C:/Naview/Platypus/Project', ...]
for path in paths:
delete_empty_shapefiles(path)
Bonus points for creating a function that performs the os.path.exists() and os.remove() calls.

Related

Unexpected end of data when zipping zip files in Python

Good day.
I wrote a little Python program to help me easily create .cbc files for Calibre, which is just a renamed .zip file with a text file called comics.txt for TOC purposes. Each chapter is another zip file.
The issue is that the last zip file zipped always has the error "Unexpected end of data". The file itself is not corrupt, if I unzip it and rezip it it works perfectly. Playing around it seems that the problem is that Python doesn't close the last zip file after zipping it, since I can't delete the last zip while the program is still running since it's still open in Python. Needless to say, Calibre doesn't like the file and fails to convert it unless I manually rezip the affected chapters.
The code is as follows, checking the folders for not-image files, zipping the folders, zipping the zips while creating the text file, and "changing" extension.
import re, glob, os, zipfile, shutil, pathlib, gzip, itertools
Folders = glob.glob("*/")
items = len(Folders)
cn_list = []
cn_list_filtered = []
dirs_filtered = []
ch_id = ["c", "Ch. "]
subdir_im = []
total = 0
Dirs = next(os.walk('.'))[1]
for i in range(0, len(Dirs)):
for items in os.listdir("./" + Dirs[i]):
if items.__contains__('.png') or items.__contains__('.jpg'):
total+=1
else:
print(items + " not an accepted format.")
subdir_im.append(total)
total = 0
for fname in Folders:
if re.search(ch_id[0] + r'\d+' + r'[\S]' + r'\d+', fname):
cn = re.findall(ch_id[0] + "(\d+[\S]\d+)", fname)[0]
cn_list.append(cn)
elif re.search(ch_id[0] + r'\d+', fname):
cn = re.findall(ch_id[0] + "(\d+)", fname)[0]
cn_list.append(cn)
elif re.search(ch_id[1] + r'\d+' + '[\S]' + r'\d+', fname):
cn = re.findall(ch_id[1] + "(\d+[\S]\d+)", fname)[0]
cn_list.append(cn)
elif re.search(ch_id[1] + r'\d+', fname):
cn = re.findall(ch_id[1] + "(\d+)", fname)[0]
cn_list.append(cn)
else:
print('Warning: File found without proper filename format.')
cn_list_filtered = set(cn_list)
cn_list_filtered = sorted(cn_list_filtered)
cwd = os.getcwd()
Dirs = Folders
subdir_zi = []
total = 0
for i in range(0, len(cn_list_filtered)):
for folders in Dirs:
if folders.__contains__(ch_id[0] + cn_list_filtered[i] + " ")\
or folders.__contains__(ch_id[1] + cn_list_filtered[i] + " "):
print('Zipping folder ', folders)
namezip = "Chapter " + cn_list_filtered[i] + ".zip"
current_zip = zipfile.ZipFile(namezip, "a")
for items in os.listdir(folders):
if items.__contains__('.png') or items.__contains__('.jpg'):
current_zip.write(folders + "/" + items, items)
total+=1
subdir_zi.append(total)
total = 0
print('Folder contents in order:', subdir_im, ' Total:', sum(subdir_im))
print("Number of items per zip: ", subdir_zi, ' Total:', sum(subdir_zi))
if subdir_im == subdir_zi:
print("All items in folders have been successfully zipped")
else:
print("Warning: File count in folders and zips do not match. Please check the affected chapters")
zips = glob.glob("*.zip")
namezip2 = os.path.basename(os.getcwd()) + ".zip"
zipfinal = zipfile.ZipFile(namezip2, "a")
for i in range(0, len(zips), 1):
zipfinal.write(zips[i],zips[i])
Data = []
for i in range (0,len(cn_list_filtered),1):
Datai = ("Chapter " + cn_list_filtered[i] + ".zip" + ":Chapter " + cn_list_filtered[i] + "\r\n")
Data.append(Datai)
Dataok = ''.join(Data)
with zipfile.ZipFile(namezip2, 'a') as myzip:
myzip.writestr("comics.txt", Dataok)
zipfinal.close()
os.rename(namezip2, namezip2 + ".cbc")
os.system("pause")
I am by no means a programmer, that is just a Frankenstein monster code I eventually managed to put together by checking threads, but this last issue has me stumped.
Some solutions I tried are:
for i in range(0, len(zips), 1):
zipfinal.write(zips[i],zips[i])
zips[i].close()
Fails with:
zips[i].close()
AttributeError: 'str' object has no attribute 'close'
and:
for i in range(0, len(zips), 1):
zipfinal.write(zips[i],zips[i])
zips[len(zips)].close()
Fails with:
zips[len(zips)].close()
IndexError: list index out of range
Thanks for the help.
This solved my issue:
def generate_zip(file_list, file_name=None):
zip_buffer = io.BytesIO()
zf = zipfile.ZipFile(zip_buffer, mode="w", compression=zipfile.ZIP_DEFLATED)
for file in file_list:
print(f"Filename: {file[0]}\nData: {file[1]}")
zf.writestr(file[0], file[1])
**zf.close()**
with open(file_name, 'wb') as f:
f.write(zip_buffer.getvalue())
f.close()

Python script to run FME workbench

I have more than 500 xml files and each xml file should processed on FME workbench individually (iteration of FME workbench for each xml file).
For such a propose i have to run a python file (loop.py) to iterate FME workbench for each xml file.
The whole process was working in past on other PC without any problem. Now Once i run Module i got the following error:
Traceback (most recent call last):E:\XML_Data
File "E:\XML_Data\process\01_XML_Tile_1.py", line 28, in
if "Translation was SUCCESSFUL" in open(path_log + "\" + data + ".log").read():
IOError: [Errno 2] No such file or directory: 'E:\XML_Data\data_out\log_01\re_3385-5275.xml.log'
Attached the python code(loop.py).
Any help is greatly appreciated.
import os
import time
# Mainpath and Working Folder:
#path_main = r"E:\XML_Data"
path_main = r"E:\XML_Data"
teil = str("01")
# variables
path_in = path_main + r"\data_in\03_Places\teil_" + teil # "Source folder of XML files"
path_in_tile10 = path_main + r"\data_in\01_Tiling\10x10.shp" # "Source folder of Grid shapefile"
path_in_commu = path_main + r"\data_in\02_Communities\Communities.shp" # "Source folder of Communities shapefile"
path_out = path_main + r"\data_out\teil_" + teil # "Output folder of shapefiles that resulted from XML files (tile_01 folder)"
path_log = path_main + r"\data_out\log_" + teil # "Output folder of log files for each run(log_01 folder)"
path_fme = r"%FME_EXE_2015%" # "C:\Program Files\FME2015\fme.exe"
path_fme_workbench = path_main + r"\process\PY_FME2015.fmw" # "path of FME workbench"
datalists = os.listdir(path_in)
count = 0
# loop each file individually in FME
for data in datalists:
if data.find(".xml") != -1:
count +=1
print ("Run-No." + str(count) + ": with data " + data)
os.system (path_fme + " " + path_fme_workbench + " " + "--SourceDataset_XML"+ " " + path_in + "\\" + data + " " + "--SourceDataset_SHAPE" + " " + path_in_tile10 + " " + "--SourceDataset_SHAPE_COMU" + " " + path_in_commu + " " + "--DestDataset_SHAPE" +" " +path_out + " " +"LOG_FILENAME" + " " + path_log + "\\" + data + ".log" )
print ("Data processed: " + data)
shape = str(data[19:28]) + "_POPINT_CENTR_UTM32N.shp"
print ("ResultsFileName: " + shape)
if "Translation was SUCCESSFUL" in open(path_log + "\\" + data + ".log").read():
# Translation was successful and SHP file exists:
if os.path.isfile(path_out + "\\" + shape):
write_log = open(path_out + "\\" + "result_xml.log", "a")
write_log.write(time.asctime(time.localtime()) + " " + shape + "\n")
write_log.close()
print("Everything ok")
#Translation was successful, but SHP file does not exist:
else:
write_log = open(path_out + "\\" + "error_xml.log", "a")
write_log.write(time.asctime(time.localtime()) + " Data: " + shape + " unavailable.\n")
write_log.close()
# Translation was not successful:
else:
write_log = open(path_out + "\\" + "error_xml.log", "a")
write_log.write(time.asctime(time.localtime()) + " Translation " + Data + " not successful.\n")
write_log.close()
print ("Number of calculated files: " + str(count))
Most likely, the script failed at the os.system line, so the log file was not created from the command. Since you mentioned a different computer, it could be caused by many reasons, such as a different version of FME (so the environment variable %FME_EXE_2015% would not exist).
Use a workspace runner transformer to do this.
The FME version is outdated.so first check the version whether it is creating the problem.
subprocess.call(["C:/Program Files/fme/FMEStarter/FMEStarter.exe", "C:/Program Files/fme/fme20238/fme.exe", "/fmefile.fmw" "LOG_FILENAME","logfile"], stdin=None, stdout=None, stderr=None, shell=True, timeout=None)

os.rename deleting files python 3

Python novice, my simple script gets a given directory and renames all files sequentially, however it is deleting the files but the print is showing the files names getting renamed, not sure where its going wrong here.
Also, in what order does it retrieve these files?
import os
path = os.path.abspath("D:\Desktop\gp")
i = 0
for file_name in os.listdir(path):
try:
print (file_name + " - " + str(i))
os.rename(os.path.join(path,file_name), str(i))
except WindowsError:
os.remove(str(i))
os.rename(os.path.join(path,file_name), str(i))
i += 1
print(str(i) + " files.")
Edit
Below is the solution with working code, retrieves all files in a dir by creation date and assigns them a iterated number while retaining file extension.
import os
def sorted_dir(folder):
def getctime(name):
path = os.path.join(folder, name)
return os.path.getctime(path)
return sorted(os.listdir(path), key=getctime)
path = os.path.abspath("D:\Path\Here")
i = 0
for file_name in sorted_dir(path):
_, ext = os.path.splitext(file_name)
print (file_name + " - " + str(i)+ext)
os.rename(os.path.join(path,file_name), os.path.join(path, str(i) + ext))
i += 1
print(str(i-1) + " files.")
The problem is that you're using an absolute path for the source, but a relative path for the destination. So the files aren't getting deleted, they're just getting moved into the current working directory.
To fix it so they get renamed into the same directory they were already in, you can do the same thing on the destination you do on the source:
os.rename(os.path.join(path,file_name), os.path.join(path, str(i)))
From a comment, it sounds like you may want to preserve the extensions on these files. To do that:
_, ext = os.path.splitext(file_name)
os.rename(os.path.join(path,file_name), os.path.join(path, str(i) + ext))

Python Loop Count Stops at 67381

I'm assuming this has to be a memory issue but I'm not sure. The program loops through PDF's to look for corrupted files. When a file is corrupted, it writes that location to a txt file for me to review later. When running it the first time, I logged both pass and fail scenarios to the log. After 67381 log entries, it stopped. Then I changed this logic so it only logs errors, however, in the console I did display a count of the loop so I can tell how far along the process is. There are about 190k files to loop through and at exactly 67381 the count stops every time. It looks like the python program is still running in the background as the memory and cpu keeps fluctuating but it's hard to be sure. I also don't know now if it will still write errors to the log.
Here is the code,
import PyPDF2, os
from time import gmtime,strftime
path = raw_input("Enter folder path of PDF files:")
t = open(r'c:\pdf_check\log.txt','w')
count = 1
for dirpath,dnames,fnames in os.walk(path):
for file in fnames:
print count
count = count + 1
if file.endswith(".pdf"):
file = os.path.join(dirpath, file)
try:
PyPDF2.PdfFileReader(open(file, "rb"))
except PyPDF2.utils.PdfReadError:
curdate = strftime("%Y-%m-%d %H:%M:%S", gmtime())
t.write (str(curdate) + " " + "-" + " " + file + " " + "-" + " " + "fail" + "\n")
else:
pass
#curdate = strftime("%Y-%m-%d %H:%M:%S", gmtime())
#t.write(str(curdate) + " " + "-" + " " + file + " " + "-" + " " + "pass" + "\n")
t.close()
Edit 1: (New Code)
New code and the same issue:
import PyPDF2, os
from time import gmtime,strftime
path = raw_input("Enter folder path of PDF files:")
t = open(r'c:\pdf_check\log.txt','w')
count = 1
for dirpath,dnames,fnames in os.walk(path):
for file in fnames:
print count
count = count + 1
if file.endswith(".pdf"):
file = os.path.join(dirpath, file)
try:
with open(file,'rb') as f:
PyPDF2.PdfFileReader(f)
except PyPDF2.utils.PdfReadError:
curdate = strftime("%Y-%m-%d %H:%M:%S", gmtime())
t.write (str(curdate) + " " + "-" + " " + file + " " + "-" + " " + "fail" + "\n")
f.close()
else:
pass
f.close()
#curdate = strftime("%Y-%m-%d %H:%M:%S", gmtime())
#t.write(str(curdate) + " " + "-" + " " + file + " " + "-" + " " + "pass" + "\n")
t.close()
Edit 2: I am trying to now run this from a different machine with beefier hardware and a different version of windows (10 pro instead of server 2008 r2) but I don't think this is the issue.
Try to edit one of the .pdf files to make it larger. That way, if the loop number your program "stops" at is smaller, you can identify the problem as a memory issue.
Else, it might be an unusually large pdf file that is taking your program a while to verify integrity.
Debugging this, you could print the file location of the .pdf files you open to find this particular .pdf and manually open it to investigate further..
Figured it out. The issue is actually due to a random and very large corrupted PDF. So this is not a loop issue, it's a corrupted file issue.

Python code to print directories recursively

I have the following code , but it traverses the 1st directory it finds and stops.
I feel i have the recursive function which should have given the other directories too.
Can anyone please point out what is wrong with this code.
def func(path,no):
no=no+2
for item in os.listdir(path):
if os.path.isfile(path+"\\"+item):
print no * "-" + " " + item
if os.path.isdir(path+"\\"+item):
path=path + "\\" + item
print no * "-" + " " + item
func(path,no)
path="D:\\Hello"
no=0
func(pah,no)
OUTPUT :
-- 1.txt
-- 2.txt
-- 3.txt
-- blue
---- 33.txt
---- 45.txt
---- 56.txt
---- Tere
"blue" & "tere" are the directories. There are more directories int the "HELLO" folder which is not printed.
To walk through directories recursivly, use os.walk
import os
path = r'path\to\root\dir'
for root, dirs, files in os.walk(path):
# Access subdirs and files
On another note:
To join parts of a path together, use os.path.join. So instead of path+"\\"+item you could use os.path.join(path, item). This will work on all platforms, and you don't have to think about escaping slashes etc.
A better way to print values is to use the format method. In your case you could write
print '{} {}'.format(no*'-', item)`
instead of
print no * "-" + " " + item
It's because you change the path value to path + "\\" + item when you first find a directory. Then os.path.isfile(path+"\\"+item) and os.path.isdir(path+"\\"+item) all return False.
It should be like this:
def func(path,no):
no=no+2
for item in os.listdir(path):
if os.path.isfile(path+"\\"+item):
print no * "-" + " " + item
if os.path.isdir(path+"\\"+item):
print no * "-" + " " + item
func(path + "\\" + item,no)

Categories

Resources