All,
What is the best way to check whether there is data in a directory before deleting it? I am browsing through a couple of pages to find some pics using wget; naturally, not every page has an image on it, but the directory is still created.
dir = 'Files\\%s' % (directory)
os.mkdir(dir)
cmd = 'wget -r -l1 -nd -np -A.jpg,.png,.gif -P %s %s' % (dir, i[1])
os.system(cmd)
if not os.path.isdir(dir):
    os.rmdir(dir)
I would like to test to see if a file was dropped in the directory after it was created. If nothing is there...delete it.
Thanks,
Adam
import os

if not os.listdir(dir):
    os.rmdir(dir)
LBYL style.
for EAFP, see mouad's answer.
I would go with EAFP, like so:

import errno
import os

try:
    os.rmdir(dir)
except OSError as ex:
    if ex.errno == errno.ENOTEMPTY:
        print("directory not empty")
os.rmdir will not delete a directory that is not empty.
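That behavior is easy to verify with a throwaway temporary directory (the paths below are illustrative, not from the question):

```python
import errno
import os
import tempfile

# Create a throwaway directory containing one file
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "marker.txt"), "w") as f:
    f.write("data")

try:
    os.rmdir(tmp)  # fails: the directory is not empty
    removed = True
except OSError as ex:
    removed = False
    # POSIX allows either errno here; Linux reports ENOTEMPTY
    assert ex.errno in (errno.ENOTEMPTY, errno.EEXIST)

# Once the file is gone, rmdir succeeds
os.remove(os.path.join(tmp, "marker.txt"))
os.rmdir(tmp)
```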
Try:

if not os.listdir(dir):
    print("Empty")

or

if os.listdir(dir) == []:
    print("Empty")
This can now be done more efficiently in Python 3.5+, since there is no need to build a list of the directory contents just to see if it's empty (using the result of os.scandir() as a context manager, as below, requires Python 3.6+):
import os

def is_dir_empty(path):
    with os.scandir(path) as scan:
        return next(scan, None) is None
Example usage:
if os.path.isdir(directory) and is_dir_empty(directory):
    os.rmdir(directory)
What if you checked whether the directory exists and whether there is content in it? Something like:
if os.path.isdir(dir) and len(os.listdir(dir)) == 0:
    os.rmdir(dir)
If the empty directories are already created, you can place this script in your outer directory and run it:
import os

def run(outer_dir):
    # Walk bottom-up so directories that become empty along the way
    # can be removed too (os.path.walk no longer exists in Python 3)
    for dirname, subdirs, files in os.walk(outer_dir, topdown=False):
        if not os.listdir(dirname):
            print('Remove %s' % dirname)
            os.rmdir(dirname)

if __name__ == '__main__':
    outer_dir = os.path.dirname(os.path.abspath(__file__))
    run(outer_dir)
    os.system('pause')
Here is another way to check whether a directory is empty, using os.walk:

import os

empty = False
for dirpath, dirnames, files in os.walk(dir):
    if files or dirnames:
        print("Not empty!")
    else:
        print("It is empty!")
        empty = True
    break  # only the top-level directory needs to be inspected

The other answers mentioned here have drawbacks: if the directory contains many files, the usual os.listdir() builds the whole list just to test emptiness, which slows your code; and the os.rmdir() approach actually deletes the folder when it is empty, which may not be what you want if you only need to check for emptiness.
I followed the "Bash checking if folder has contents" answer.
It is essentially the same approach as ideasman42's answer at https://stackoverflow.com/a/47363995/2402577, avoiding building the complete list, and it should work on Debian as well:

there is no need to build a list of the directory contents just to
see if it's empty:

os.walk('.') returns all the files under a directory, and if there are thousands of them it may be inefficient. Instead, the command find "$target" -mindepth 1 -print -quit prints the first entry it finds and quits; if it returns an empty string, the folder is empty.
You can check whether a directory is empty using find and processing its output:

import subprocess

def is_dir_empty(absolute_path):
    cmd = ["find", absolute_path, "-mindepth", "1", "-print", "-quit"]
    output = subprocess.check_output(cmd).decode("utf-8").strip()
    return not output

print(is_dir_empty("some/path/here"))
Related
I have a FolderA which contains FolderB and FileB. How can I create a tar.gz archive which ONLY contains FolderB and FileB, removing the parent directory FolderA? I'm using Python and I'm running this code on a Windows machine.
The best lead I found was: How to create full compressed tar file using Python?
In the most upvoted answer, people discuss ways to remove the parent directory, but none of them works for me. I've tried arcname, os.walk, and running the tar command via subprocess.call().
I got close with os.walk, but in the code below, it still drops a " _ " directory in with FolderB and FileB. So, the file structure is ARCHIVE.tar.gz > ARCHIVE.tar > "_" directory, FolderB, FileB.
def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        length = len(source_dir)
        for root, dirs, files in os.walk(source_dir):
            folder = root[length:]  # path without "parent"
            for file in files:
                tar.add(os.path.join(root, folder), folder)
I make the archive using:
make_tarfile('ARCHIVE.tar.gz', 'C:\FolderA')
Should I carry on using os.walk, or is there any other way to solve this?
Update
Here is an image showing the contents of my archive. As you can see, there is a " _ " folder in my archive that I want to get rid of--oddly enough, when I extract, only FolderA and FileB.html appear as archived. In essence, the behavior is correct, but if I could go the last step of removing the " _ " folder from the archive, that would be perfect. I'm going to ask an updated question to limit confusion.
This works for me:
with tarfile.open(output_filename, "w:gz") as tar:
    for fn in os.listdir(source_dir):
        p = os.path.join(source_dir, fn)
        tar.add(p, arcname=fn)
i.e. just list the root of the source dir and add each entry to the archive. There is no need to walk the source dir, since adding a directory via tar.add() is automatically recursive.
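A small self-contained check of that recursive behavior, built entirely on throwaway temp paths (the FolderA/FolderB names just mirror the question):

```python
import os
import tarfile
import tempfile

# Build FolderA/FolderB/FileB in a temporary location
base = tempfile.mkdtemp()
folder_a = os.path.join(base, "FolderA")
os.makedirs(os.path.join(folder_a, "FolderB"))
with open(os.path.join(folder_a, "FolderB", "FileB"), "w") as f:
    f.write("hello")

# Archive the *contents* of FolderA, not FolderA itself
archive = os.path.join(base, "out.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    for fn in os.listdir(folder_a):
        tar.add(os.path.join(folder_a, fn), arcname=fn)

with tarfile.open(archive, "r:gz") as tar:
    names = tar.getnames()

# FolderB was added recursively; FolderA never appears
assert "FolderB" in names
assert "FolderB/FileB" in names
assert not any(n.startswith("FolderA") for n in names)
```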
I've tried to provide some examples of how changes to the source directory makes a difference to what finally gets extracted.
As per your example, I have this folder structure
I have this Python to generate the tar file (lifted from here):
import tarfile
import os

def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))
What data and structure is included in the tar file depends on what location I provide as a parameter.
So this location parameter,
make_tarfile('folder.tar.gz','folder_A/' )
will generate this result when extracted
If I move into folder_A and reference folder_B,
make_tarfile('folder.tar.gz','folder_A/folder_B' )
This is what the extract will be,
Notice that folder_B is the root of this extract.
Now finally,
make_tarfile('folder.tar.gz','folder_A/folder_B/' )
Will extract to this
Just the file is included in the extract.
Here is a function to perform the task. I have had some issues extracting the tar on Windows (with WinRar) as it seemed to try to extract the same file twice, but I think it will work fine when extracting the archive properly.
"""
The directory structure I have is as follows:
├───FolderA
│ │ FileB
│ │
│ └───FolderB
│ FileC
"""
import tarfile
import os
# This is where I stored FolderA on my computer
ROOT = os.path.join(os.path.dirname(__file__), "FolderA")
def make_tarfile(output_filename: str, source_dir: str) -> bool:
    """
    :return: True on success, False otherwise
    """
    # This is where the path to each file and folder will be saved
    paths_to_tar = set()
    # os.walk over the root folder ("FolderA") - note it will never get added
    for dirpath, dirnames, filenames in os.walk(source_dir):
        # Resolve path issues, for example for Windows
        dirpath = os.path.normpath(dirpath)
        # Add each folder and file in the current directory;
        # the set unions keep every path unique
        paths_to_tar = paths_to_tar.union(
            {os.path.join(dirpath, d) for d in dirnames}).union(
            {os.path.join(dirpath, f) for f in filenames})
    try:
        # This will create the tar file in the current directory
        with tarfile.open(output_filename, "w:gz") as tar:
            # Change the directory to treat all paths relatively
            os.chdir(source_dir)
            # Finally add each path using the relative path
            for path in paths_to_tar:
                tar.add(os.path.relpath(path, source_dir))
        return True
    except (tarfile.TarError, OSError) as e:
        print(f"An error occurred - {e}")
        return False

if __name__ == '__main__':
    make_tarfile("tarred_files.tar.gz", ROOT)
You could use subprocess to achieve something similar but much faster.
import subprocess

def make_tarfile(output_filename, source_dir):
    subprocess.call(["tar", "-C", source_dir, "-zcvf", output_filename, "."])
I'm trying to create a script that will go through multiple directories and subdirectories, find a matching filename, and display its path.
I was able to do this in a shell script with ease and got the desired output. I used it in the shell like this:
echo "enter the name of the movie."
read moviename
cd "D:\movies"
find -iname "*$moviename*" > result.txt
cat result.txt
for i in result.txt
do
    if [ "$(stat -c %s "$i")" -le 1 ]
    then
        echo "No such movie exists"
    fi
done
This is what I have in Python, and I'm getting nowhere.
import os.path
from os import path

print('What\'s the name of the movie?')
name = input()
for root, dirs, files in os.walk('D:\movies'):
    for file in files:
        if os.path.isfile('D:\movies'+name):
            print(os.path.join(root, file))
        else:
            print('No such movie')
I want it to search for the filename case insensitive and have it display. I've tried so hard to do it.
import os

name = input('What\'s the name of the movie? ')
success = False
for root, dirs, files in os.walk(r'D:\movies'):
    for file in files:
        if name.lower() in file.lower():
            print(os.path.join(root, file))
            success = True
if not success:
    print('No such movie')
You don't need to import each part of os separately.
You can combine input and print into one line.
This is basically asking "if this string is in that string, print the path"; lower() makes it case-insensitive.
I added the success variable because otherwise it would print a line every time a file doesn't match.
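The success flag can also be factored out by wrapping the walk in a generator; find_movies here is a hypothetical helper name, not something from the question:

```python
import os

def find_movies(root, fragment):
    # Yield the full path of every file whose name contains
    # `fragment`, compared case-insensitively
    for dirpath, dirnames, filenames in os.walk(root):
        for fn in filenames:
            if fragment.lower() in fn.lower():
                yield os.path.join(dirpath, fn)

# Usage sketch (r'D:\movies' is the asker's folder):
# matches = list(find_movies(r'D:\movies', input("Movie name? ")))
# if not matches:
#     print('No such movie')
```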
You may want to replace this line (which is effectively meaningless, because it never compares file with anything):

if os.path.isfile('D:\movies'+name):

with:

if file.lower().find(name.lower()) != -1:

and have fun with the file list you're getting =)
from pathlib import Path

MOVIES = Path(r'D:\movies')

def find_file(name):
    for path in MOVIES.rglob('*'):
        if path.is_file() and name.lower() in path.name.lower():
            break
    else:
        print('File not found.')
        path = None
    return path
You could also look into the fuzzywuzzy library for fuzzy matching between file names and input name.
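If pulling in a third-party library is not an option, the standard library's difflib offers a rough version of the same idea; the 0.5 cutoff below is an arbitrary guess you would tune:

```python
import difflib

filenames = ["avengers endgame.mkv", "the matrix.mkv", "inception.mkv"]

# Compare lowercased names so matching is case-insensitive
matches = difflib.get_close_matches(
    "avengers", [f.lower() for f in filenames], n=3, cutoff=0.5)
print(matches)
```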
I'm new to Python. I'm running version 3.3. I'd like to iteratively copy all wildcard named folders and files from the C drive to a network share. Wildcard named folders are called "Test_1", "Test_2", etc. with folders containing the same named folder, "Pass". The files in "Pass" end with .log. I do NOT want to copy the .log files in the Fail folder. So, I have this:
C:\Test_1\Pass\a.log
C:\Test_1\Fail\a.log
C:\Test_1\Pass\b.log
C:\Test_1\Fail\b.log
C:\Test_2\Pass\y.log
C:\Test_2\Fail\y.log
C:\Test_2\Pass\z.log
C:\Test_2\Fail\z.log
but only want to copy
C:\Test_1\Pass\a.log
C:\Test_1\Pass\b.log
C:\Test_2\Pass\y.log
C:\Test_2\Pass\z.log
to:
\\share\Test_1\Pass\a.log
\\share\Test_1\Pass\b.log
\\share\Test_2\Pass\y.log
\\share\Test_2\Pass\z.log
The following code works but I don't want to copy tons of procedural code. I'd like to make it object oriented.
import shutil, os
from shutil import copytree

def main():
    source = ("C:\\Test_1\\Pass\\")
    destination = ("\\\\share\\Test_1\\Pass\\")
    if os.path.exists("C:\\Test_1\\Pass\\"):
        shutil.copytree(source, destination)
        print('Congratulations! Copy was successfully completed!')
    else:
        print('There is no Actual folder in %source.')

main()
Also, I noticed it is not printing the "else" print statement when the os path does not exist. How do I accomplish this? Thanks in advance!
This is not a perfect example but you could do this:
import glob, os, shutil

# root directory
start_dir = 'C:\\'

def copy_to_remote(local_folders, remote_path):
    if os.path.exists(remote_path):
        for source in local_folders:
            # source currently has start_dir at the start; strip it and add
            # the remote path (note: str.lstrip() strips a set of characters,
            # not a prefix, so slice off the prefix instead)
            dest = os.path.join(remote_path, source[len(start_dir):])
            try:
                shutil.copytree(source, dest)
                print('Congratulations! Copy was successfully completed!')
            except FileExistsError as fe_err:
                print(fe_err)
            except PermissionError as pe_err:
                print(pe_err)
    else:
        print('{} - does not exist'.format(remote_path))

# Find all directories that start with start_dir\Test_ and have subdirectory Pass
dir_list = glob.glob(os.path.join(start_dir, 'Test_*\\Pass'))
if dir_list:
    copy_to_remote(dir_list, '\\\\Share\\')
Documentation for glob can be found here.
def remotecopy(local, remote):
    if os.path.exists(local):
        shutil.copytree(local, remote)
        print('Congratulations! Copy was successfully completed!')
    else:
        print('There is no Actual folder in %s.' % local)

Then just remotecopy(r"C:\Local\Whatever", r"C:\Remote\Whatever")
How do I extend this to each folder in the target folder?
That is, recursively, to every file in all subfolders.
http://bazaar.launchpad.net/~sil/+junk/utility-programs/view/head:/u1-publish-folder
Alternative if you really want to modify the script edit the bottom part like this:
(and yes, sorry for double posting but this keeps it cleaner)
if __name__ == '__main__':
    if len(sys.argv) > 1:
        folder = sys.argv[1]
    else:
        print "Syntax: %s folder" % sys.argv[0]
        sys.exit(1)
    folder = os.path.realpath(folder)
    if not os.path.isdir(folder):
        print "%s is not a folder. Terminating." % (folder,)
        sys.exit(1)
    # walk all directories inside `folder`
    for (dirname, subdirs, files) in os.walk(folder):
        subdir = os.path.realpath(dirname)
        if not os.path.isdir(subdir):
            continue
        reactor.callWhenRunning(process_folder, subdir)
    reactor.run()
This should do the same as my bash-based answer but with a simple call of u1-publish-folder target-folder.
If I understand the script and your question correctly, you could do it with a bash line like this:

find /path/to/target-folder -type d -exec u1-publish-folder '{}' ';'

which will recursively find all folders below target-folder and call u1-publish-folder for each of them
(but excluding target-folder itself; you'll have to call it once for that one).
If you explicitly want the script to do the recursion by itself, please edit your question, OK?
Initially I was thinking of using os.path.isdir, but I don't think this works for zip files. Is there a way to peek into the zip file and verify that this directory exists? I would like to avoid using unzip -l "$#" as much as possible, but if that's the only solution then I guess I have no choice.
Just check the filename with "/" at the end of it.
import zipfile

def isdir(z, name):
    return any(x.startswith("%s/" % name.rstrip("/")) for x in z.namelist())

f = zipfile.ZipFile("sample.zip", "r")
print(isdir(f, "a"))
print(isdir(f, "a/b"))
print(isdir(f, "a/X"))
You use this line

any(x.startswith("%s/" % name.rstrip("/")) for x in z.namelist())

because it is possible that the archive contains no explicit directory entry, just paths that have a directory name in them.
Execution result:
$ mkdir -p a/b/c/d
$ touch a/X
$ zip -r sample.zip a
adding: a/ (stored 0%)
adding: a/X (stored 0%)
adding: a/b/ (stored 0%)
adding: a/b/c/ (stored 0%)
adding: a/b/c/d/ (stored 0%)
$ python z.py
True
True
False
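The same point, that a path can imply a directory without the archive holding an explicit entry for it, can be reproduced from Python; the names below are arbitrary:

```python
import io
import zipfile

# Build an in-memory zip that stores a/b.txt without ever adding "a/"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a/b.txt", b"data")

with zipfile.ZipFile(buf, "r") as zf:
    names = zf.namelist()

# There is no explicit "a/" entry, yet "a" is clearly a directory
assert "a/" not in names
assert names == ["a/b.txt"]
assert any(x.startswith("a/") for x in names)
```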
You can check for the directories with ZipFile.namelist().

import os, zipfile

dir = "some/directory/"
z = zipfile.ZipFile("myfile.zip")
if dir in z.namelist():
    print("Found %s!" % dir)
For Python >= 3.6:
This is how is_dir() is implemented in the Python source code:

def is_dir(self):
    """Return True if this archive member is a directory."""
    return self.filename[-1] == '/'

It simply checks whether the filename ends with a slash /. I can't tell whether this works correctly in all circumstances (so, IMO, it is not implemented robustly).
For Python < 3.6:
Since print(zipinfo) shows the file mode but no corresponding property or field is provided, I dove into the zipfile module source code to find out how it is implemented
(see def __repr__(self) in https://github.com/python/cpython/blob/3.6/Lib/zipfile.py).
Possibly a bad idea, but it will work: if you want something simple and easy, this works in most cases, though it may fail because in some cases this field is not printed.

def is_dir(zipinfo):
    return "filemode='d" in repr(zipinfo)
Finally:
My solution is to check the file mode manually and decide whether the referenced file is actually a directory, inspired by line 391 of https://github.com/python/cpython/blob/3.6/Lib/zipfile.py:

def is_dir(fileinfo):
    hi = fileinfo.external_attr >> 16
    return (hi & 0x4000) > 0
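A quick check of that bit test against an archive built from a real directory on disk; all paths below are temporary scratch names, not from the answer:

```python
import os
import tempfile
import zipfile

def is_dir(fileinfo):
    # High 16 bits of external_attr hold the Unix mode; 0x4000 is S_IFDIR
    hi = fileinfo.external_attr >> 16
    return (hi & 0x4000) > 0

# Build a zip from a real directory so external_attr is populated
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "sub"))
with open(os.path.join(tmp, "sub", "f.txt"), "w") as f:
    f.write("x")

zpath = os.path.join(tmp, "t.zip")
with zipfile.ZipFile(zpath, "w") as zf:
    zf.write(os.path.join(tmp, "sub"), "sub/")
    zf.write(os.path.join(tmp, "sub", "f.txt"), "sub/f.txt")

with zipfile.ZipFile(zpath) as zf:
    flags = {info.filename: is_dir(info) for info in zf.infolist()}

assert flags["sub/"] is True
assert flags["sub/f.txt"] is False
```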
You can accomplish this using the built-in zipfile library.

import zipfile

z = zipfile.ZipFile("file.zip")
if "DirName/" in [member.filename for member in z.infolist()]:
    print("Directory exists in archive")

Tested and functional with Python 3.2.