Check if a directory exists in a zip file with Python - python

Initially I was thinking of using os.path.isdir, but I don't think this works for zip files. Is there a way to peek into the zip file and verify that this directory exists? I would like to avoid using unzip -l "$#" as much as possible, but if that's the only solution then I guess I have no choice.

Just check the filename with "/" at the end of it.
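import zipfile
def isdir(z, name):
    # True if any member path starts with "<name>/", so directories that are
    # only implied by a file path are found as well.
    return any(x.startswith("%s/" % name.rstrip("/")) for x in z.namelist())
f = zipfile.ZipFile("sample.zip", "r")
print(isdir(f, "a"))
print(isdir(f, "a/b"))
print(isdir(f, "a/X"))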
You use this line
any(x.startswith("%s/" % name.rstrip("/")) for x in z.namelist())
because the archive may contain no explicit entry for the directory, only file paths that include the directory name.
Execution result:
$ mkdir -p a/b/c/d
$ touch a/X
$ zip -r sample.zip a
adding: a/ (stored 0%)
adding: a/X (stored 0%)
adding: a/b/ (stored 0%)
adding: a/b/c/ (stored 0%)
adding: a/b/c/d/ (stored 0%)
$ python z.py
True
True
False
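The prefix check above can be tried without a zip file on disk. The following is a minimal sketch that builds a small archive in memory; the helper name zip_has_dir and the member names are made up for illustration.

```python
import io
import zipfile


def zip_has_dir(zf, name):
    """Return True if `name` is a directory in the archive, explicit or implied."""
    prefix = name.rstrip("/") + "/"
    return any(member.startswith(prefix) for member in zf.namelist())


# Build a small archive in memory: one explicit directory entry ("a/b/")
# and one file whose path only implies its parent directory ("x/y.txt").
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a/b/", "")
    zf.writestr("a/X", "data")
    zf.writestr("x/y.txt", "data")

with zipfile.ZipFile(buf) as zf:
    print(zip_has_dir(zf, "a"))    # True: prefix of "a/b/" and "a/X"
    print(zip_has_dir(zf, "a/b"))  # True: explicit entry "a/b/"
    print(zip_has_dir(zf, "a/X"))  # False: "a/X" is a file
    print(zip_has_dir(zf, "x"))    # True: implied only by "x/y.txt"
```

The last call is the case that a plain membership test against namelist() would miss, because no "x/" entry is stored in the archive.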

You can check for the directories with ZipFile.namelist().
import zipfile
dir = "some/directory/"
z = zipfile.ZipFile("myfile.zip")
# Only matches when the archive stores an explicit entry for the directory.
if dir in z.namelist():
    print("Found %s!" % dir)

For Python (>= 3.6):
This is how is_dir() is implemented in the Python source code:
def is_dir(self):
    """Return True if this archive member is a directory."""
    return self.filename[-1] == '/'
It simply checks whether the filename ends with a slash (/). I can't tell whether this works correctly in every circumstance (so IMO it is badly implemented).
For Python (< 3.6):
print(zipinfo) shows the file mode, but no corresponding property or field is provided, so I dug into the zipfile module source code to find out how it is implemented.
(see def __repr__(self): https://github.com/python/cpython/blob/3.6/Lib/zipfile.py)
Possibly a bad idea, but it will work:
If you want something simple and easy, this works in most cases, but it may fail because in some cases this field is not printed.
def is_dir(zipinfo):
    return "filemode='d" in zipinfo.__repr__()
Finally:
My solution is to check the file mode manually and decide whether the referenced file is actually a directory, inspired by https://github.com/python/cpython/blob/3.6/Lib/zipfile.py line 391.
def is_dir(fileinfo):
    hi = fileinfo.external_attr >> 16
    return (hi & 0x4000) > 0
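As a variation on the mode check above, the stat module can decode the same bits. This is only a sketch: the name is_dir_by_mode and the hand-built ZipInfo entries are illustrative, and it assumes the archive was written by a tool that stores Unix mode bits in the upper 16 bits of external_attr (archives created on other systems may leave them at zero).

```python
import stat
import zipfile


def is_dir_by_mode(info):
    """Check the Unix mode bits stored in the upper 16 bits of external_attr."""
    return stat.S_ISDIR(info.external_attr >> 16)


# Hypothetical entries built by hand for illustration.
dir_info = zipfile.ZipInfo("some/dir/")
dir_info.external_attr = (stat.S_IFDIR | 0o755) << 16

file_info = zipfile.ZipInfo("some/file.txt")
file_info.external_attr = (stat.S_IFREG | 0o644) << 16

print(is_dir_by_mode(dir_info))
print(is_dir_by_mode(file_info))
```

stat.S_ISDIR compares the whole file-type field rather than testing the single 0x4000 bit, so it does not report other file types that happen to share that bit (block devices, for example) as directories.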

You can accomplish this using the built-in library ZipFile.
import zipfile
z = zipfile.ZipFile("file.zip")
if "DirName/" in [member.filename for member in z.infolist()]:
    print("Directory exists in archive")
Tested and functional with Python 3.2.

Related

How to create tar.gz archive in Python/tar without include parent directory?

I have a FolderA which contains FolderB and FileB. How can I create a tar.gz archive which ONLY contains FolderB and FileB, removing the parent directory FolderA? I'm using Python and I'm running this code on a Windows machine.
The best lead I found was: How to create full compressed tar file using Python?
In the most upvoted answer, people discuss ways to remove the parent directory, but none of them work for me. I've tried arcname, os.walk, and running the tar command via subprocess.call ().
I got close with os.walk, but in the code below, it still drops a " _ " directory in with FolderB and FileB. So, the file structure is ARCHIVE.tar.gz > ARCHIVE.tar > "_" directory, FolderB, FileB.
def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        length = len(source_dir)
        for root, dirs, files in os.walk(source_dir):
            folder = root[length:] # path without "parent"
            for file in files:
                tar.add(os.path.join(root, folder), folder)
I make the archive using:
make_tarfile('ARCHIVE.tar.gz', 'C:\FolderA')
Should I carry on using os.walk, or is there any other way to solve this?
Update
Here is an image showing the contents of my archive. As you can see, there is a " _ " folder in my archive that I want to get rid of--oddly enough, when I extract, only FolderA and FileB.html appear as archived. In essence, the behavior is correct, but if I could go the last step of removing the " _ " folder from the archive, that would be perfect. I'm going to ask an updated question to limit confusion.
This works for me:
with tarfile.open(output_filename, "w:gz") as tar:
    for fn in os.listdir(source_dir):
        p = os.path.join(source_dir, fn)
        tar.add(p, arcname=fn)
i.e. Just list the root of the source dir and add each entry to the archive. No need for walking the source dir as adding a directory via tar.add() is automatically recursive.
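To see what ends up in the archive with this approach, here is a small self-contained sketch. It creates a throwaway FolderA layout in a temporary directory (the file contents and the temporary location are assumptions for the demo, not part of the question) and lists the member names afterwards.

```python
import os
import tarfile
import tempfile


def make_tarfile(output_filename, source_dir):
    """Archive the contents of source_dir without the directory itself."""
    with tarfile.open(output_filename, "w:gz") as tar:
        for entry in sorted(os.listdir(source_dir)):
            # arcname drops the parent path; directories are added recursively.
            tar.add(os.path.join(source_dir, entry), arcname=entry)


# Throwaway layout mirroring the question: FolderA/FileB and FolderA/FolderB/FileC.
work = tempfile.mkdtemp()
folder_a = os.path.join(work, "FolderA")
os.makedirs(os.path.join(folder_a, "FolderB"))
for rel in ("FileB", os.path.join("FolderB", "FileC")):
    with open(os.path.join(folder_a, rel), "w") as fh:
        fh.write("demo")

archive = os.path.join(work, "ARCHIVE.tar.gz")
make_tarfile(archive, folder_a)

with tarfile.open(archive) as tar:
    names = tar.getnames()
print(names)
```

None of the listed names should start with FolderA, which is the behavior the question asks for.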
I've tried to provide some examples of how changes to the source directory makes a difference to what finally gets extracted.
As per your example, I have this folder structure
I have this python to generate the tar file (lifted from here)
import tarfile
import os
def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))
What data and structure is included in the tar file depends on what location I provide as a parameter.
So this location parameter,
make_tarfile('folder.tar.gz','folder_A/' )
will generate this result when extracted
If I move into folder_A and reference folder_B,
make_tarfile('folder.tar.gz','folder_A/folder_B' )
This is what the extract will be,
Notice that folder_B is the root of this extract.
Now finally,
make_tarfile('folder.tar.gz','folder_A/folder_B/' )
Will extract to this
Just the file is included in the extract.
Here is a function to perform the task. I have had some issues extracting the tar on Windows (with WinRar) as it seemed to try to extract the same file twice, but I think it will work fine when extracting the archive properly.
"""
The directory structure I have is as follows:
├───FolderA
│   │   FileB
│   │
│   └───FolderB
│           FileC
"""
import tarfile
import os
# This is where I stored FolderA on my computer
ROOT = os.path.join(os.path.dirname(__file__), "FolderA")
def make_tarfile(output_filename: str, source_dir: str) -> bool:
    """
    :return: True on success, False otherwise
    """
    # This is where the path to each file and folder will be saved
    paths_to_tar = set()
    # os.walk over the root folder ("FolderA") - note it will never get added
    for dirpath, dirnames, filenames in os.walk(source_dir):
        # Resolve path issues, for example for Windows
        dirpath = os.path.normpath(dirpath)
        # Add each folder and path in the current directory
        # Probably could use zip here instead of set unions but can't be bothered to try to figure it out
        paths_to_tar = paths_to_tar.union(
            {os.path.join(dirpath, d) for d in dirnames}).union(
            {os.path.join(dirpath, f) for f in filenames})
    try:
        # This will create the tar file in the current directory
        with tarfile.open(output_filename, "w:gz") as tar:
            # Change the directory to treat all paths relatively
            os.chdir(source_dir)
            # Finally add each path using the relative path
            for path in paths_to_tar:
                tar.add(os.path.relpath(path, source_dir))
        return True
    except (tarfile.TarError, OSError) as e:
        print(f"An error occurred - {e}")
        return False
if __name__ == '__main__':
    make_tarfile("tarred_files.tar.gz", ROOT)
You could use subprocess to achieve something similar but much faster.
import subprocess
def make_tarfile(output_filename, source_dir):
    # -C switches into source_dir first, so "." archives only its contents
    subprocess.call(["tar", "-C", source_dir, "-zcvf", output_filename, "."])

Is it possible to Remove Directory with some contents using pysftp module?

I am creating a backup script using the pysftp module. I am able to upload and download files. When I try to delete a directory with some contents, I get an exception.
This is what I tried:
con = pysftp.Connection('192.168.0.40',username='root',password='clado123')
con.chdir('/root/backup')
con.pwd
con.listdir()
['data', 'test']
data - directory is not empty.
test - directory is empty.
con.rmdir('test')
con.listdir()
['data']
con.rmdir('data')
OSError: Failure
Can anyone suggest a way to solve this problem?
I have found a way. There is a method called 'execute' on the pysftp connection object. We can execute commands on the remote server using this method.
con.execute('rm -rf /root/backup/data')
con.listdir()
[]
rmdir(remotepath) is described in the documentation at http://pysftp.readthedocs.io/en/release_0.2.8/pysftp.html#pysftp.Connection.rmdir
For a pysftp only approach to delete non-empty directories, you can use con.walktree to recursively delete all regular files while building a list of subdirectories to delete after the files.
You can't directly delete the directories from walktree since it visits the deeper levels last, and you need the opposite.
After the walktree runs you can iterate (in reverse) over the list of directories and rmdir each one.
Note that in the code below fcallback=con.remove so walktree will call remove(remotepath) for each file recursively.
On the other hand, dcallback=dirs.append, so for each directory it will execute dirs.append(directory_remote_path), effectively building the list of directories to delete.
import os
import pysftp
sftp_host = os.environ[ 'SFTP_HOST' ]
sftp_username = os.environ[ 'SFTP_USERNAME' ]
sftp_password = os.environ[ 'SFTP_PASSWORD' ]
con = pysftp.Connection(sftp_host, username=sftp_username, password=sftp_password)
dirs = ['/root/backup/data']
con.walktree(dirs[0], fcallback=con.remove, dcallback=dirs.append, ucallback=con.remove, recurse=True)
print(dirs)
for d in reversed(dirs):
    print("Delete directory", d)
    con.rmdir(d)
con.close()
Remove all the files from the directory and then use rmdir to remove the empty directory. Works for me.
def clean_dir(sftp, dir, files):
    assert sftp.isdir(str(dir)), "Outbox does not exist!"
    assert files.is_dir(), "Source directory does not exist!"
    for capsule in list(files.iterdir()):
        target = dir / capsule.name
        assert sftp.exists(str(target)), "Target file does not exist!"
        sftp.remove(str(target))
        assert not sftp.exists(str(target)), "Removed file still exists!"

Finding empty directories in Python

All,
What is the best way to check to see if there is data in a directory before deleting it? I am browsing through a couple pages to find some pics using wget and of course every page does not have an image on it but the directory is still created.
dir = 'Files\\%s' % (directory)
os.mkdir(dir)
cmd = 'wget -r -l1 -nd -np -A.jpg,.png,.gif -P %s %s' %(dir, i[1])
os.system(cmd)
if not os.path.isdir(dir):
    os.rmdir(dir)
I would like to test to see if a file was dropped in the directory after it was created. If nothing is there...delete it.
Thanks,
Adam
import os
if not os.listdir(dir):
    os.rmdir(dir)
LBYL style.
for EAFP, see mouad's answer.
I will go with EAFP like so:
import errno
import os
try:
    os.rmdir(dir)
except OSError as ex:
    if ex.errno == errno.ENOTEMPTY:
        print("directory not empty")
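The EAFP approach can be wrapped in a small helper and tried on temporary directories. This is a sketch only; the name remove_if_empty and the demo file are made up, and the extra check for EEXIST is an assumption based on POSIX allowing either error code for a non-empty directory.

```python
import errno
import os
import tempfile


def remove_if_empty(path):
    """Try to remove `path`; return True if removed, False if it had contents."""
    try:
        os.rmdir(path)
        return True
    except OSError as ex:
        # Some platforms report EEXIST instead of ENOTEMPTY for a non-empty directory.
        if ex.errno in (errno.ENOTEMPTY, errno.EEXIST):
            return False
        raise


empty_dir = tempfile.mkdtemp()
full_dir = tempfile.mkdtemp()
with open(os.path.join(full_dir, "picture.jpg"), "w") as fh:
    fh.write("demo")

print(remove_if_empty(empty_dir))
print(remove_if_empty(full_dir))
```

Any other OSError (a missing path or a permission problem, for instance) is re-raised rather than silently treated as "not empty".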
os.rmdir will not delete a directory that is not empty.
Try:
if not os.listdir(dir):
    print("Empty")
or
if os.listdir(dir) == []:
    print("Empty")
This can now be done more efficiently in Python 3.6+ (os.scandir was added in 3.5, and using it as a context manager requires 3.6), since there is no need to build a list of the directory contents just to see if it's empty:
import os
def is_dir_empty(path):
    with os.scandir(path) as scan:
        return next(scan, None) is None
Example usage:
if os.path.isdir(directory) and is_dir_empty(directory):
    os.rmdir(directory)
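A quick way to convince yourself that the scandir check behaves as described is to run it against two temporary directories. The directory and file names below are placeholders for the demo; Python 3.6 or later is assumed.

```python
import os
import tempfile


def is_dir_empty(path):
    """Return True if `path` has no entries, stopping at the first one found."""
    with os.scandir(path) as entries:
        return next(entries, None) is None


with tempfile.TemporaryDirectory() as base:
    empty = os.path.join(base, "empty")
    full = os.path.join(base, "full")
    os.mkdir(empty)
    os.mkdir(full)
    with open(os.path.join(full, "image.png"), "w") as fh:
        fh.write("demo")

    results = {name: is_dir_empty(os.path.join(base, name)) for name in ("empty", "full")}
    # Remove only the directory that turned out to be empty.
    if results["empty"]:
        os.rmdir(empty)
    still_there = sorted(os.listdir(base))

print(results)
print(still_there)
```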
What if you checked whether the directory exists and whether it has any content... something like:
if os.path.isdir(dir) and len(os.listdir(dir)) == 0:
    os.rmdir(dir)
If the empty directories are already created, you can place this script in your outer directory and run it (Python 2 only; os.path.walk was removed in Python 3):
import os
def _visit(arg, dirname, names):
    if not names:
        print 'Remove %s' % dirname
        os.rmdir(dirname)
def run(outer_dir):
    os.path.walk(outer_dir, _visit, 0)
if __name__ == '__main__':
    outer_dir = os.path.dirname(__file__)
    run(outer_dir)
    os.system('pause')
Here is a fast way to check whether the directory is empty or not.
empty = False
for dirpath, dirnames, files in os.walk(dir):
    # Note: only files are checked here; subdirectories in dirnames are not counted
    if files:
        print("Not empty !")
    if not files:
        print("It is empty !")
        empty = True
    break  # only look at the top-level directory
The other answers mentioned here are not fast: if you use the usual os.listdir() and the directory has too many files, it will slow down your code, and if you call os.rmdir() and catch the error, an empty folder simply gets deleted. That might not be what you want if you only need to check for emptiness.
I followed an answer to "Bash checking if folder has contents".
It is mainly a similar approach to @ideasman42's answer at https://stackoverflow.com/a/47363995/2402577, which avoids building the complete list, and it would probably work on Debian as well.
"there is no need to build a list of the directory contents just to see if its empty:"
os.walk('.') returns all the files under a directory, and if there are thousands of them it may be inefficient. Instead, the command find "$target" -mindepth 1 -print -quit returns the first file found and quits. If it returns an empty string, the folder is empty.
"You can check if a directory is empty using find, and processing its output"
import subprocess
def is_dir_empty(absolute_path):
    cmd = ["find", absolute_path, "-mindepth", "1", "-print", "-quit"]
    output = subprocess.check_output(cmd).decode("utf-8").strip()
    return not output
print(is_dir_empty("some/path/here"))

Python Script to backup a directory

#Filename:backup_ver1
import os
import time
#1 Using list to specify the files and directory to be backed up
source = r'C:\Documents and Settings\rgolwalkar\Desktop\Desktop\Dr Py\Final_Py'
#2 define backup directory
destination = r'C:\Documents and Settings\rgolwalkar\Desktop\Desktop\PyDevResourse'
#3 Setting the backup name
targetBackup = destination + time.strftime('%Y%m%d%H%M%S') + '.rar'
rar_command = "rar.exe a -ag '%s' %s" % (targetBackup, ''.join(source))
##i am sure i am doing something wrong here - rar command please let me know
if os.system(rar_command) == 0:
    print 'Successful backup to', targetBackup
else:
    print 'Backup FAILED'
O/P:- Backup FAILED
winrar is added to Path and CLASSPATH under Environment variables as well - anyone else with a suggestion for backing up the directory is most welcome
Maybe instead of writing your own backup script you could use python tool called rdiff-backup, which can create incremental backups?
The source directory contains spaces, but you don't have quotes around it in the command line. This might be a reason for the backup to fail.
To avoid problems like this, use the subprocess module instead of os.system:
subprocess.call(['rar.exe', 'a', '-ag', targetBackup, source])
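To illustrate why the list form sidesteps the quoting problem, here is a small sketch. Since rar.exe may not be installed, it uses the current Python interpreter as a stand-in child process that just echoes the first argument it receives; that substitution is purely for the demo.

```python
import subprocess
import sys

# A path containing spaces, like the one in the question.
source = r"C:\Documents and Settings\rgolwalkar\Desktop\Desktop\Dr Py\Final_Py"

# Stand-in child process (the current Python interpreter instead of rar.exe)
# that simply echoes the first argument it receives.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])", source]

result = subprocess.run(child, stdout=subprocess.PIPE, universal_newlines=True)
received = result.stdout.strip()
print(received == source)
```

Each list element reaches the child as exactly one argument, so the spaces in the path never get a chance to split it.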
If the compression algorithm can be something else and it's just to back up a directory, why not do it with Python's own tar and gzip instead? E.g.:
import os
import tarfile
import time
root="c:\\"
source=os.path.join(root,"Documents and Settings","rgolwalkar","Desktop","Desktop","Dr Py","Final_Py")
destination=os.path.join(root,"Documents and Settings","rgolwalkar","Desktop","Desktop","PyDevResourse")
targetBackup = destination + time.strftime('%Y%m%d%H%M%S') + '.tar.gz'
tar = tarfile.open(targetBackup, "w:gz")
tar.add(source)
tar.close()
that way, you are not dependent on rar.exe on the system.

Test if executable exists in Python?

In Python, is there a portable and simple way to test if an executable program exists?
By simple I mean something like the which command which would be just perfect. I don't want to search PATH manually or something involving trying to execute it with Popen & al and see if it fails (that's what I'm doing now, but imagine it's launchmissiles)
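I know this is an ancient question, but you can use distutils.spawn.find_executable. This has been documented since Python 2.4 and has existed since Python 1.6. (Note that distutils was deprecated in Python 3.10 and removed in 3.12, so this only applies to older versions.)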
import distutils.spawn
distutils.spawn.find_executable("notepad.exe")
Also, Python 3.3 now offers shutil.which().
Easiest way I can think of:
def which(program):
    import os
    def is_exe(fpath):
        return os.path.isfile(fpath) and os.access(fpath, os.X_OK)
    fpath, fname = os.path.split(program)
    if fpath:
        if is_exe(program):
            return program
    else:
        for path in os.environ["PATH"].split(os.pathsep):
            exe_file = os.path.join(path, program)
            if is_exe(exe_file):
                return exe_file
    return None
Edit: Updated code sample to include logic for handling case where provided argument is already a full path to the executable, i.e. "which /bin/ls". This mimics the behavior of the UNIX 'which' command.
Edit: Updated to use os.path.isfile() instead of os.path.exists() per comments.
Edit: path.strip('"') seems like the wrong thing to do here. Neither Windows nor POSIX appear to encourage quoted PATH items.
Use shutil.which() from Python's wonderful standard library.
Batteries included!
For python 3.3 and later:
import shutil
command = 'ls'
shutil.which(command) is not None
As a one-liner of Jan-Philip Gehrcke Answer:
cmd_exists = lambda x: shutil.which(x) is not None
As a def:
def cmd_exists(cmd):
    return shutil.which(cmd) is not None
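A short runnable sketch of the shutil.which() approach follows. It uses the running interpreter as a command that is known to exist, and a made-up command name that is assumed not to be installed anywhere.

```python
import os
import shutil
import sys


def cmd_exists(cmd):
    """Return True if `cmd` resolves to an executable on PATH (or is a path to one)."""
    return shutil.which(cmd) is not None


# The running interpreter is used as a command that is known to exist;
# the second name is a made-up command assumed not to be installed.
print(cmd_exists(sys.executable))
print(cmd_exists("surely-not-an-installed-command-12345"))

# shutil.which also accepts an explicit search path via its `path` argument.
interpreter_dir = os.path.dirname(sys.executable)
print(shutil.which(os.path.basename(sys.executable), path=interpreter_dir))
```

Passing path= is handy when the executable should be looked up in a specific directory rather than in whatever PATH the process inherited.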
For python 3.2 and earlier:
import os
my_command = 'ls'
any(
    (
        os.access(os.path.join(path, my_command), os.X_OK)
        and os.path.isfile(os.path.join(path, my_command))
    )
    for path in os.environ["PATH"].split(os.pathsep)
)
This is a one-liner of Jay's Answer, Also here as a lambda func:
cmd_exists = lambda x: any((os.access(os.path.join(path, x), os.X_OK) and os.path.isfile(os.path.join(path, x))) for path in os.environ["PATH"].split(os.pathsep))
cmd_exists('ls')
Or lastly, indented as a function:
def cmd_exists(cmd, path=None):
    """ test if path contains an executable file with name
    """
    if path is None:
        path = os.environ["PATH"].split(os.pathsep)
    for prefix in path:
        filename = os.path.join(prefix, cmd)
        executable = os.access(filename, os.X_OK)
        is_not_directory = os.path.isfile(filename)
        if executable and is_not_directory:
            return True
    return False
Just remember to specify the file extension on Windows. Otherwise, you have to write a much more complicated is_exe for Windows using the PATHEXT environment variable. You may just want to use FindPath.
OTOH, why are you even bothering to search for the executable? The operating system will do it for you as part of popen call & will raise an exception if the executable is not found. All you need to do is catch the correct exception for given OS. Note that on Windows, subprocess.Popen(exe, shell=True) will fail silently if exe is not found.
Incorporating PATHEXT into the above implementation of which (in Jay's answer):
def which(program):
    def is_exe(fpath):
        return os.path.exists(fpath) and os.access(fpath, os.X_OK) and os.path.isfile(fpath)
    def ext_candidates(fpath):
        yield fpath
        for ext in os.environ.get("PATHEXT", "").split(os.pathsep):
            yield fpath + ext
    fpath, fname = os.path.split(program)
    if fpath:
        if is_exe(program):
            return program
    else:
        for path in os.environ["PATH"].split(os.pathsep):
            exe_file = os.path.join(path, program)
            for candidate in ext_candidates(exe_file):
                if is_exe(candidate):
                    return candidate
    return None
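The candidate expansion is the only PATHEXT-specific part, so it can be looked at in isolation. The sketch below is a variation, not the code above: it takes the PATHEXT value and separator as parameters (an assumption made so the demo does not depend on the current platform) and skips empty pieces.

```python
def ext_candidates(fpath, pathext, sep=";"):
    """Yield fpath itself, then fpath with each non-empty extension appended."""
    yield fpath
    for ext in pathext.split(sep):
        if ext:  # an unset or empty PATHEXT would otherwise repeat fpath
            yield fpath + ext


# A Windows-style PATHEXT value, passed in explicitly for the demonstration.
print(list(ext_candidates("notepad", ".COM;.EXE;.BAT")))
# With an empty value only the bare name is tried.
print(list(ext_candidates("ls", "")))
```

Skipping empty pieces matters on systems without PATHEXT, where splitting the empty string would otherwise produce the bare name a second time.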
For *nix platforms (Linux and OS X)
This seems to be working for me:
Edited to work on Linux, thanks to Mestreion
def cmd_exists(cmd):
    return subprocess.call("type " + cmd, shell=True,
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE) == 0
What we're doing here is using the builtin command type and checking the exit code. If there's no such command, type will exit with 1 (or a non-zero status code anyway).
The bit about stdout and stderr is just to silence the output of the type command, since we're only interested in the exit status code.
Example usage:
>>> cmd_exists("jsmin")
True
>>> cmd_exists("cssmin")
False
>>> cmd_exists("ls")
True
>>> cmd_exists("dir")
False
>>> cmd_exists("node")
True
>>> cmd_exists("steam")
False
On the basis that it is easier to ask forgiveness than permission (and, importantly, that the command is safe to run) I would just try to use it and catch the error (OSError in this case - I checked for file does not exist and file is not executable and they both give OSError).
It helps if the executable has something like a --version or --help flag that is a quick no-op.
import subprocess
myexec = "python2.8"
try:
    subprocess.call([myexec, '--version'])
except OSError:
    print("%s not found on path" % myexec)
This is not a general solution, but will be the easiest way for a lot of use cases - those where the code needs to look for a single well known executable which is safe to run, or at least safe to run with a given flag.
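As a runnable variation of the try-and-catch idea, the sketch below wraps it in a helper. The helper name can_run, the use of the current interpreter as the "known good" executable, and the made-up missing command are all assumptions for the demo; Python 3.3+ is assumed for subprocess.DEVNULL.

```python
import subprocess
import sys


def can_run(executable, flag="--version"):
    """Return True if `executable` could be started with a harmless flag."""
    try:
        subprocess.call(
            [executable, flag],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # Raised when the file is missing or is not executable.
        return False
    return True


print(can_run(sys.executable))
print(can_run("surely-not-an-installed-command-12345"))
```

The exit status is deliberately ignored: the only question here is whether the program could be started at all.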
See os.path module for some useful functions on pathnames. To check if an existing file is executable, use os.access(path, mode), with the os.X_OK mode.
os.X_OK
Value to include in the mode parameter of access() to determine if path can be executed.
EDIT: The suggested which() implementations are missing one clue - using os.path.join() to build full file names.
I know that I'm being a bit of a necromancer here, but I stumbled across this question and the accepted solution didn't work for me in all cases, so I thought it might be useful to submit this anyway. In particular, the problems were the "executable" mode detection and the requirement of supplying the file extension. Furthermore, both Python 3.3's shutil.which (uses PATHEXT) and Python 2.4+'s distutils.spawn.find_executable (just tries adding '.exe') only work in a subset of cases.
So I wrote a "super" version (based on the accepted answer, and the PATHEXT suggestion from Suraj). This version of which does the task a bit more thoroughly, and tries a series of "broadphase" breadth-first techniques first, and eventually tries more fine-grained searches over the PATH space:
import os
import sys
import stat
import tempfile
def is_case_sensitive_filesystem():
    tmphandle, tmppath = tempfile.mkstemp()
    is_insensitive = os.path.exists(tmppath.upper())
    os.close(tmphandle)
    os.remove(tmppath)
    return not is_insensitive
_IS_CASE_SENSITIVE_FILESYSTEM = is_case_sensitive_filesystem()
def which(program, case_sensitive=_IS_CASE_SENSITIVE_FILESYSTEM):
    """ Simulates unix `which` command. Returns absolute path if program found """
    def is_exe(fpath):
        """ Return true if fpath is a file we have access to that is executable """
        accessmode = os.F_OK | os.X_OK
        if os.path.exists(fpath) and os.access(fpath, accessmode) and not os.path.isdir(fpath):
            filemode = os.stat(fpath).st_mode
            ret = bool(filemode & stat.S_IXUSR or filemode & stat.S_IXGRP or filemode & stat.S_IXOTH)
            return ret
    def list_file_exts(directory, search_filename=None, ignore_case=True):
        """ Return list of (filename, extension) tuples which match the search_filename"""
        if ignore_case:
            search_filename = search_filename.lower()
        # walk the directory passed in (the original relied on an outer variable)
        for root, dirs, files in os.walk(directory):
            for f in files:
                filename, extension = os.path.splitext(f)
                if ignore_case:
                    filename = filename.lower()
                if not search_filename or filename == search_filename:
                    yield (filename, extension)
            break
    fpath, fname = os.path.split(program)
    # is a path: try direct program path
    if fpath:
        if is_exe(program):
            return program
    elif "win" in sys.platform:
        # isnt a path: try fname in current directory on windows
        if is_exe(fname):
            return program
    paths = [path.strip('"') for path in os.environ.get("PATH", "").split(os.pathsep)]
    exe_exts = [ext for ext in os.environ.get("PATHEXT", "").split(os.pathsep)]
    if not case_sensitive:
        # a list, not map(): it is iterated once per PATH entry below
        exe_exts = [ext.lower() for ext in exe_exts]
    # try append program path per directory
    for path in paths:
        exe_file = os.path.join(path, program)
        if is_exe(exe_file):
            return exe_file
    # try with known executable extensions per program path per directory
    for path in paths:
        filepath = os.path.join(path, program)
        for extension in exe_exts:
            exe_file = filepath+extension
            if is_exe(exe_file):
                return exe_file
    # try search program name with "soft" extension search
    if len(os.path.splitext(fname)[1]) == 0:
        for path in paths:
            file_exts = list_file_exts(path, fname, not case_sensitive)
            for file_ext in file_exts:
                filename = "".join(file_ext)
                exe_file = os.path.join(path, filename)
                if is_exe(exe_file):
                    return exe_file
    return None
Usage looks like this:
>>> which.which("meld")
'C:\\Program Files (x86)\\Meld\\meld\\meld.exe'
The accepted solution did not work for me in this case, since there were files like meld.1, meld.ico, meld.doap, etc. also in the directory, one of which was returned instead (presumably because it came first lexicographically), as the executable test in the accepted answer was incomplete and gave false positives.
This seems simple enough and works both in python 2 and 3
try: subprocess.check_output('which executable',shell=True)
except: sys.exit('ERROR: executable not found')
Added windows support
def which(program):
    path_ext = [""]
    ext_list = None
    if sys.platform == "win32":
        ext_list = [ext.lower() for ext in os.environ["PATHEXT"].split(";")]
    def is_exe(fpath):
        exe = os.path.isfile(fpath) and os.access(fpath, os.X_OK)
        # search for executable under windows
        if not exe:
            if ext_list:
                for ext in ext_list:
                    exe_path = "%s%s" % (fpath,ext)
                    if os.path.isfile(exe_path) and os.access(exe_path, os.X_OK):
                        path_ext[0] = ext
                        return True
                return False
        return exe
    fpath, fname = os.path.split(program)
    if fpath:
        if is_exe(program):
            return "%s%s" % (program, path_ext[0])
    else:
        for path in os.environ["PATH"].split(os.pathsep):
            path = path.strip('"')
            exe_file = os.path.join(path, program)
            if is_exe(exe_file):
                return "%s%s" % (exe_file, path_ext[0])
    return None
An important question is "Why do you need to test if executable exist?" Maybe you don't? ;-)
Recently I needed this functionality to launch viewer for PNG file. I wanted to iterate over some predefined viewers and run the first that exists. Fortunately, I came across os.startfile. It's much better! Simple, portable and uses the default viewer on the system:
>>> os.startfile('yourfile.png')
Update: I was wrong about os.startfile being portable... It's Windows only. On Mac you have to run the open command, and xdg-open on Unix. There's a Python issue on adding Mac and Unix support for os.startfile.
You can try the external lib called "sh" (http://amoffat.github.io/sh/).
import sh
print sh.which('ls') # prints '/bin/ls' depending on your setup
print sh.which('xxx') # prints None
So basically you want to find a file in mounted filesystem (not necessarily in PATH directories only) and check if it is executable. This translates to following plan:
enumerate all files in locally mounted filesystems
match results with name pattern
for each file found check if it is executable
I'd say, doing this in a portable way will require lots of computing power and time. Is it really what you need?
There is a which.py script in a standard Python distribution (e.g. on Windows '\PythonXX\Tools\Scripts\which.py').
EDIT: which.py depends on ls therefore it is not cross-platform.
