I'm obviously doing something very wrong. I'd like to find files that are in one directory but not in a second directory (for instance, xxx.phn in one directory and xxx.wav in the second).
It seems that I cannot detect when a file is NOT present in the second directory: the script always behaves as if every file has a match. No file is ever displayed, although unmatched files exist.
import shutil, random, os, sys

if len(sys.argv) < 4:
    print """usage: python del_orphans_dir1_dir2.py source_folder source_ext dest_folder dest_ext
"""
    sys.exit(-1)

folder = sys.argv[1]
ext = sys.argv[2]
dest_folder = sys.argv[3]
dest_ext = sys.argv[4]

i = 0
for d, ds, fs in os.walk(folder):
    for fname in fs:
        basename = os.path.splitext(fname)[0]
        if (not os.path.exists(dest_folder+'/'+basename + '.' + dest_ext) ):
            print str(i)+': No duplicate for: '+fname
            i=i+1

print str(i)+' files found'
Can I suggest that you build the filename you're checking and print it before testing whether it exists:
dest_fname = dest_folder+'/'+basename + '.' + dest_ext
print "dest exists? %s" % dest_fname
os.path.exists(dest_fname)
Also, as an aside, please join paths with os.path.join(). (If you really want the file name without the leading path elements, there's an os.path.basename() function.)
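To make that concrete, here is a minimal sketch of building the candidate path with os.path.join() and os.path.splitext() instead of string concatenation. The helper name matching_path and the folder name "wavs" are made up for illustration; they are not from the question.

```python
import os


def matching_path(src_name, dest_folder, dest_ext):
    """Return the path in dest_folder that should correspond to src_name.

    Hypothetical helper: the name and signature are assumptions.
    """
    # Drop any leading directories, then drop the extension.
    base = os.path.splitext(os.path.basename(src_name))[0]
    # Accept the extension with or without a leading dot.
    return os.path.join(dest_folder, base + "." + dest_ext.lstrip("."))


# Print the path before testing it, so a wrong path is easy to spot.
candidate = matching_path("xxx.phn", "wavs", "wav")
print("dest exists? %s -> %s" % (candidate, os.path.exists(candidate)))
```

Printing the candidate path first usually shows at a glance whether the folder or the extension was passed in the wrong order.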
I tried your program out and it worked for two simple flat directories. Here are the directory contents:
a\a.txt
a\b.txt # Missing from b directory
a\c.txt
b\a.csv
b\c.csv
And result of your script with a txt b csv as parameters. If your result was different, maybe you used different parameters?
0: No duplicate for: b.txt
1 files found
But when I added subdirectories:
a\a.txt
a\b.txt # Missing from b directory
a\c.txt
a\c\d.txt
a\c\e.txt # Missing from b\c directory
b\a.csv
b\c.csv
b\c\d.csv
Your script gives:
0: No duplicate for: b.txt
1: No duplicate for: d.txt # Error here
2: No duplicate for: e.txt
3 files found
To work with sub-directories you need to compute the path relative to the source directory, and then add it to the destination directory. Here's the result with a few other minor cleanups and prints to see what is going on. Note that fname is always just the file name and needs to be joined with d to get the whole path:
#!python2
import os, sys

# the script name plus four arguments are required
if len(sys.argv) < 5:
    print """usage: python del_orphans_dir1_dir2.py source_folder source_ext dest_folder dest_ext
"""
    sys.exit(-1)

folder = sys.argv[1]
ext = sys.argv[2]
dest_folder = sys.argv[3]
dest_ext = sys.argv[4]

i = 0
for d, ds, fs in os.walk(folder):
    for fname in fs:
        # path of the file relative to the source folder, e.g. c\e.txt
        relpath = os.path.relpath(os.path.join(d,fname),folder)
        relbase = os.path.splitext(relpath)[0]
        path_to_check = os.path.join(dest_folder,relbase+'.'+dest_ext)
        if not os.path.exists(path_to_check):
            print '{}: No duplicate for: {}, {} not found.'.format(i,os.path.join(folder,relpath),path_to_check)
            i += 1

print i,'files found'
Output:
0: No duplicate for: a\b.txt, b\b.csv not found.
1: No duplicate for: a\c\e.txt, b\c\e.csv not found.
2 files found
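For anyone on Python 3, the same idea can be packaged as a small generator. This is only a sketch under my own assumptions: the name find_orphans is hypothetical, and unlike the script above it also filters on the source extension.

```python
import os


def find_orphans(src_root, src_ext, dest_root, dest_ext):
    """Yield (source_path, expected_dest_path) for sources with no counterpart.

    Hypothetical helper; extensions may be given with or without the dot.
    """
    src_ext = "." + src_ext.lstrip(".")
    dest_ext = "." + dest_ext.lstrip(".")
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Directory of the current files, relative to the source root.
        rel_dir = os.path.relpath(dirpath, src_root)
        for fname in sorted(filenames):
            base, ext = os.path.splitext(fname)
            if ext != src_ext:
                continue
            expected = os.path.normpath(
                os.path.join(dest_root, rel_dir, base + dest_ext))
            if not os.path.exists(expected):
                yield os.path.join(dirpath, fname), expected
```

Usage would look like `for src, missing in find_orphans("a", "txt", "b", "csv"): print(src, missing)`.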
What you're looking for are matching files, not duplicate ones. One problem is that you're not using the source_ext argument when searching. Another is that, I think, the command-line argument handling is messed up. Here's a corrected version that accomplishes what you're trying to do:
import os
import sys

if len(sys.argv) != 5:
    print("usage: python "
          "del_orphans_dir1_dir2.py "  # argv[0] (script name)
          "source_folder "             # argv[1]
          "source_ext "                # argv[2]
          "dest_folder "               # argv[3]
          "dest_ext")                  # argv[4]
    sys.exit(2)  # command line error

source_folder, source_ext, dest_folder, dest_ext = sys.argv[1:5]
# os.path.splitext() returns the extension with its dot, so normalize both
source_ext = source_ext if source_ext.startswith('.') else '.'+source_ext
dest_ext = dest_ext if dest_ext.startswith('.') else '.'+dest_ext

found = 0
for d, ds, fs in os.walk(source_folder):
    for i, fname in enumerate(fs, start=1):
        basename, ext = os.path.splitext(fname)
        if ext == source_ext:
            if os.path.exists(os.path.join(dest_folder, basename+dest_ext)):
                found += 1
            else:
                print('{}: No matching file found for: {}'.format(i, fname))

print('{} matches found'.format(found))
sys.exit(0)
I've got 6 directories (A, B, C, D, E, F) containing .mov files.
The structure is:
A
-0001_01.mov
-0002_01.mov
-...
B
-0001_02.mov
-0002_02.mov
-...
And so on.
First, I want to create as many directories as there are files in one of the directories mentioned above.
Let's say A contains 35 .mov files (B, C .. contain the same amount of .mov files).
I now got 35 folders starting from "01" up to "35".
Now I want to copy each corresponding .mov file into the same directory, which means 0001_01.mov - 0001_06.mov go into "01", 0002_01.mov - 0002_06.mov go into "02" and so on.
I've got the creation of the directories working, but I just can't wrap my head around the copying part.
import os
pathA = ("./A/")
pathB = ("./B/")
pathC = ("./C/")
pathD = ("./D/")
pathE = ("./E/")
pathF = ("./F/")
path, dirs, filesA = next(os.walk(pathA))
file_countA = len(filesA)
path, dirs, filesB = next(os.walk(pathB))
file_countB = len(filesB)
path, dirs, filesC = next(os.walk(pathC))
file_countC = len(filesC)
path, dirs, filesD = next(os.walk(pathD))
file_countD = len(filesD)
path, dirs, filesE = next(os.walk(pathE))
file_countE = len(filesE)
path, dirs, filesF = next(os.walk(pathF))
file_countF = len(filesF)
path2 = ("./")
if file_countA == file_countB == file_countC == file_countD == file_countE == file_countF:
    print("true")
else:
    print ("false")

for i in range(file_countA):
    try:
        if i < 9:
            os.mkdir(path2 + "0" + str(i + 1))
            path3 = ("./" + "0" + str(i + 1))
            print (path3)
        elif i >= 9:
            os.mkdir(path2 + str(i + 1))
            path3 = ("./" + str(i + 1))
            print (path3)
    except OSError:
        print ("Creation of the directory %s failed" % path2)
    else:
        print ("Successfully created the directory %s " % path2)
This is my first time using python, I think the code reflects that.
I've now wasted countless hours on this, so any help is appreciated.
So I changed your code quite a bit and tested it quickly on my system, and it seemed to do what you wanted. Can you try it and let me know whether it gives you an idea of how this can be done?
Disclaimer: I'm not a Python expert by any means, but I find my way around it. This is most likely not the prettiest solution, but it does work on my machine exactly as you wanted. Just make sure you run it from inside your folder; if you are running it from outside your folder, change cwd = os.getcwd() to cwd = "path-to-your-folder".
import os
import shutil
import glob

paths = ["/A/","/B/","/C/","/D/","/E/","/F/"]
cwd = os.getcwd()
num_folders = 0
for path in paths:
    num_files = len([f for f in os.listdir(cwd+path) if os.path.isfile(os.path.join(cwd+path, f))])
    if num_files>num_folders:
        num_folders = num_files

for i in range(num_folders):
    try:
        if i < 9:
            fname = cwd + "/0" + str(i + 1)
            os.mkdir(fname)
            for path in paths:
                source = cwd + "/" + path
                filename = "000{}_*.mov".format(i+1)
                for file in glob.glob(os.path.join(source,filename)):
                    shutil.copy2(file,fname)
        elif i >= 9:
            fname = cwd + "/" + str(i + 1)
            os.mkdir(fname)
            for path in paths:
                source = cwd + "/" + path
                filename = "00{}_*.mov".format(i+1)
                for file in glob.glob(os.path.join(source,filename)):
                    shutil.copy2(file,fname)
    except OSError:
        pass
I'm no Python expert either (look at my scores too, hi), but I've tried to keep your original coding order as much as possible. I would recommend looking at other code for a real expert-tier solution, but this seems to do what you're asking for:
import os
import shutil

mov_pathes = ["./a/", "./b/"]
all_files = []
lengths = []
for mov_path in mov_pathes :
    # listdir gives you all files in the directory
    files_in_dir = os.listdir(mov_path)
    # we'll save those in a list along with where it's from ,
    # ex : ('./patha/',['0001_01.mov','0002_01.mov'])
    all_files.append((mov_path, files_in_dir))
    # also length info for "all items are equal length" comparison in the future
    lengths.append(len(files_in_dir))

if lengths.count(lengths[0]) == len(lengths) :
    print ("true")
else :
    print ("false")

base_dir = "./"
for i in range (1,lengths[0]+1) :
    try :
        # zfill(n) left-pads your string with zeros, (ex. "7".zfill(5) gives you 00007), probably helpful for future
        path_name = base_dir + str(i).zfill(2)
        os.mkdir(path_name)
    except OSError :
        print ("Creation of the directory {path_name} failed".format(path_name = path_name))
    else :
        print ("Successfully created the directory {path_name}".format(path_name = path_name))
This does exactly the same thing, but it would probably make maintaining your code easier later on.
For your real question, IF we're sure that your inputs are going to look like 00XX_NN.mov, adding
for files in all_files :
    # Remember we saved as (original dir, list of files in the dir?)
    # This is a original dir
    source_dir = files[0]
    # This is list of files in that directory
    source_files = files[1]
    for file in source_files :
        # so original file is located in source_dir + file
        source_file = source_dir + file
        # and your target directory is 00XX, so getting file[2:4] gives the target directory
        target_dir = base_dir + file[2:4]
        #shutil.copy (source file, target directory) copies your files.
        shutil.copy (source_file , target_dir)
seems to do what you're asking for, at least for me. Once again I'm no expert so let me know if it's not working!
tested with :
./a
- 0001_01
- 0002_01
- 0003_01
./b
- 0001_02
- 0002_02
- 0003_02
result :
./01 :
- 0001_01
- 0001_02
./02 :
- 0002_01
- 0002_02
./03 :
- 0003_01
- 0003_02
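One caveat with the file[2:4] slice: it relies on the clip number always being four digits and below 100. A slightly more forgiving sketch parses the number in front of the underscore instead. The function names target_folder and plan_copies are mine, not from the question, and the code only plans the copies; it does not touch the disk.

```python
import os


def target_folder(filename, width=2):
    """Map a name like '0007_03.mov' to its folder name, e.g. '07'.

    Assumes the part before the first underscore is the clip number.
    """
    clip_number = int(filename.split("_", 1)[0])
    return str(clip_number).zfill(width)


def plan_copies(listing):
    """listing maps a source directory to its file names.

    Returns {folder_name: [source paths to copy into that folder]}.
    """
    plan = {}
    for source_dir, names in listing.items():
        for name in names:
            plan.setdefault(target_folder(name), []).append(
                os.path.join(source_dir, name))
    return plan
```

Each entry of the plan could then be handed to shutil.copy, e.g. `for folder, paths in plan.items(): ...`, after creating the folder.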
I have a piece of code I wrote for school:
import os
import shutil

source = "/home/pi/lab"
dest = os.environ["HOME"]
for file in os.listdir(source):
    if file.endswith(".c"):
        shutil.move(file,dest+"/c")
    elif file.endswith(".cpp"):
        shutil.move(file,dest+"/cpp")
    elif file.endswith(".sh"):
        shutil.move(file,dest+"/sh")
This code looks for files in a source directory and, if a file has one of the listed extensions, moves it to the matching directory. This part works. What I still need: if a file of the same name already exists in the destination folder, add 1 at the end of the file name, before the extension, and if there are multiple copies, keep incrementing the number ("1++").
Like this: test1.c,test2.c, test3.c
I tried using os.isfile(filename) but this only looks at the source directory. and I get a true or false.
To test if the file exists in the destination folder you should os.path.join the dest folder with the file name
import os
import shutil

source = "/home/pi/lab"
dest = os.environ["HOME"]
# Avoid using the built-in name 'file' for a variable - renamed it to 'filename' instead
for filename in os.listdir(source):
    # os.path.splitext does exactly what its name suggests - split the name and extension of the file including the '.'
    name, extension = os.path.splitext(filename)
    if extension == ".c":
        dest_filename = os.path.join(dest, filename)
        if not os.path.isfile(dest_filename):
            # We copy the file as is
            shutil.copy(os.path.join(source, filename) , dest)
        else:
            # We rename the file with a number in the name incrementing the number until we find one that is not used.
            # This should be moved to a separate function to avoid code duplication when handling the different file extensions
            i = 1  # start at 1 so the first copy is named e.g. test1.c
            dest_filename = os.path.join(dest, "%s%d%s" % (name, i, extension))
            while os.path.isfile(dest_filename):
                i += 1
                dest_filename = os.path.join(dest, "%s%d%s" % (name, i, extension))
            shutil.copy(os.path.join(source, filename), dest_filename)
    elif extension == ".cpp":
        ...
        # Handle other extensions
If you want to put the renaming logic in a separate function using glob and re, this is one way:
import glob
import re
...
def rename_file(source_filename, source_ext):
    filename_pattern = os.path.join(dest, "%s[0-9]*%s"
                                    % (source_filename, source_ext))
    # Contains file such as 'a1.c', 'a2.c', etc...
    existing_files = glob.glob(filename_pattern)
    # re.escape keeps '.' and other special characters in the names literal
    regex = re.compile("%s([0-9]+)%s$" % (re.escape(source_filename),
                                          re.escape(source_ext)))
    # Retrieve the max of the index used for this file using regex
    indices = [int(match.group(1))
               for match in map(regex.search, existing_files)
               if match]
    max_index = max(indices) if indices else 0  # 0 when no numbered copy exists yet
    source_full_path = os.path.join(source, "%s%s"
                                    % (source_filename, source_ext))
    # Rebuild the destination filename with the max index + 1
    dest_full_path = os.path.join(dest, "%s%d%s"
                                  % (source_filename,
                                     (max_index + 1),
                                     source_ext))
    shutil.copy(source_full_path, dest_full_path)
...
# If the file already exists i.e. replace the while loop in the else statement
rename_file(name, extension)
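If the glob/regex bookkeeping feels heavy, the counting loop from the first version can also be factored into a small helper. This is a sketch, not the asker's code: the name unique_destination is hypothetical, and it assumes nothing else is writing to the destination folder at the same time.

```python
import os


def unique_destination(dest_dir, filename):
    """Return a path inside dest_dir that does not exist yet.

    'test.c' is tried first, then 'test1.c', 'test2.c', and so on.
    """
    name, extension = os.path.splitext(filename)
    candidate = os.path.join(dest_dir, filename)
    counter = 0
    while os.path.exists(candidate):
        counter += 1
        candidate = os.path.join(
            dest_dir, "%s%d%s" % (name, counter, extension))
    return candidate
```

The move then becomes a single call per file, for example `shutil.move(os.path.join(source, filename), unique_destination(dest, filename))`, whatever the extension is.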
I didn't test the code, but something like this should do the job:
import os

i = 0
filename = "a.txt"
fname, ext = os.path.splitext(filename)
# keep bumping the counter until the name is free
while os.path.isfile(filename):
    i += 1
    filename = fname + str(i) + ext
I'm trying to define arg1 outside of rename() and it does not work since dirs is not defined. If I use rename("dirs", False), the function does not work.
Any idea?
# Defining the function that renames the target
def rename(arg1, arg2):
    for root, dirs, files in os.walk(  # Listing
            path, topdown=arg2):
        for i, name in enumerate(arg1):
            output = name.replace(pattern, "")  # Taking out pattern
            if output != name:
                os.rename(  # Renaming
                    os.path.join(root, name),
                    os.path.join(root, output))
            else:
                pass

# Run
rename(dirs, False)
Here's the whole program:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# This program batch renames files or folders by taking out a certain pattern
import os
import subprocess
import re
# Defining the function that renames the target
def rename(arg1, arg2):
    for root, dirs, files in os.walk(  # Listing
            path, topdown=arg2):
        for i, name in enumerate(arg1):
            output = name.replace(pattern, "")  # Taking out pattern
            if output != name:
                os.rename(  # Renaming
                    os.path.join(root, name),
                    os.path.join(root, output))
            else:
                pass

# User chooses between file and folder
print "What do you want to rename?"
print "1 - Folders\n2 - Files\n"
valid = False
while not valid:
    try:
        choice = int(raw_input("Enter number here: "))
        if choice > 2:
            print "Please enter a valid number\n"
            valid = False
        else:
            valid = True
    except ValueError:
        print "Please enter a valid number\n"
        valid = False
        choice = 3  # To have a correct value of choice

# Asking for path & pattern
if choice == 1:
    kind = "folders"
elif choice == 2:
    kind = "files"
else:
    pass
path = raw_input("What is the path to the %s?\n " % (kind))
pattern = raw_input("What is the pattern to remove?\n ")

# CHOICE = 1
# Renaming folders
if choice == 1:
    rename(dirs, False)

# CHOICE = 2
# Renaming files
if choice == 2:
    rename(files,True)

# Success message
kind = kind.replace("f", "F")
print "%s renamed" % (kind)
I corrected your code in a better way:
#!/usr/bin/env python
import os
import sys

# the command like this: python rename dirs /your/path/name/ tst
if __name__ == '__main__':
    mode = sys.argv[1]  # dirs or files
    pathname = sys.argv[2]
    pattern = sys.argv[3]
    ndict = {'dirs': '', 'files': ''}
    topdown = {'dirs': False, 'files': True}
    # os.walk fills ndict['dirs'] and ndict['files'] on every iteration
    for root, ndict['dirs'], ndict['files'] in os.walk(
            pathname, topdown[mode]):
        for name in ndict[mode]:
            newname = name.replace(pattern, '')
            if newname != name:
                os.rename(
                    os.path.join(root, name),
                    os.path.join(root, newname))
This is better achieved as a command-line tool using the py library:
import sys

from py.path import local  # import local path object/class


def rename_files(root, pattern):
    """
    Iterate over all paths starting at root using ``~py.path.local.visit()``
    check if it is a file using ``~py.path.local.check(file=True)`` and
    rename it with a new basename with ``pattern`` stripped out.
    """

    for path in root.visit(rec=True):
        if path.check(file=True):
            path.rename(path.new(basename=path.basename.replace(pattern, "")))


def rename_dirs(root, pattern):
    """
    Iterate over all paths starting at root using ``~py.path.local.visit()``
    check if it is a directory using ``~py.path.local.check(dir=True)`` and
    rename it with a new basename with ``pattern`` stripped out.
    """

    for path in root.visit(rec=True):
        if path.check(dir=True):
            path.rename(path.new(basename=path.basename.replace(pattern, "")))


def main():
    """Define our main top-level entry point"""

    root = local(sys.argv[1])  # 1 to skip the program name
    pattern = sys.argv[2]

    if local(sys.argv[0]).purebasename == "renamefiles":
        rename_files(root, pattern)
    else:
        rename_dirs(root, pattern)


if __name__ == "__main__":
    """
    Python sets ``__name__`` (a global variable) to ``__main__`` when being called
    as a script/application. e.g: Python renamefiles or ./renamefiles
    """

    main()  # Call our main function
Usage:
renamefiles /path/to/dir pattern
or:
renamedirs /path/to/dir pattern
Save this as renamefiles or renamedirs.
A common approach in UNIX is to name the script/tool renamefiles and symlink renamefiles to renamedirs.
Improvement Notes:
Use optparse or argparse to provide command-line options and a --help message.
Make rename_files() and rename_dirs() generic and move it into a single function.
Write documentation (docstrings)
Write unit tests.
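As a rough illustration of the first two improvement notes, here is a standard-library-only sketch that uses argparse and a single function for both files and directories. It swaps the py library for os.walk, and the names strip_pattern and main, as well as the argument layout, are my own assumptions. Name collisions after renaming are not handled.

```python
import argparse
import os


def strip_pattern(root, pattern, kind):
    """Remove pattern from file or directory names under root.

    kind is "files" or "dirs"; returns the number of renames performed.
    """
    renamed = 0
    # Walk bottom-up so a directory is renamed only after its contents
    # have been visited; otherwise the paths below it would go stale.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in (dirnames if kind == "dirs" else filenames):
            new_name = name.replace(pattern, "")
            if new_name and new_name != name:
                os.rename(os.path.join(dirpath, name),
                          os.path.join(dirpath, new_name))
                renamed += 1
    return renamed


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Strip a pattern from file or directory names.")
    parser.add_argument("kind", choices=["files", "dirs"])
    parser.add_argument("root", help="directory to walk")
    parser.add_argument("pattern", help="text to remove from each name")
    args = parser.parse_args(argv)
    count = strip_pattern(args.root, args.pattern, args.kind)
    print("%d %s renamed" % (count, args.kind))
    return count
```

Calling main() from an `if __name__ == "__main__":` guard turns this into a script; argparse then supplies --help for free.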
Can someone tell me if the following function declaration is the correct way to pass a relative path to a function? The call is only taking one variable. When I include a second variable (absolute path), my function does not work.
def extract(tar_url, extract_path='.'):
The call that does not work:
extract(chosen, path)
This works, but does not extract:
extract(chosen)
Full Code:
def do_fileExtract(self, line):
    defaultFolder = "Extracted"
    if not defaultFolder.endswith(':') and not os.path.exists('c:\\Extracted'):
        os.mkdir('c:\\Extracted')
        raw_input("PLACE .tgz FILES in c:\Extracted AT THIS TIME!!! PRESS ENTER WHEN FINISHED!")
    else:
        pass

    def extract(tar_url, extract_path='.'):
        print tar_url
        tar = tarfile.open(tar_url, 'r')
        for item in tar:
            tar.extract(item, extract_path)
            if item.name.find(".tgz") != -1 or item.name.find(".tar") != -1:
                extract(item.name, "./" + item.name[:item.name.rfind('/')])

    userpath = "Extracted"
    directory = os.path.join("c:\\", userpath)
    os.chdir(directory)
    path=os.getcwd() #Set log path here
    dirlist=os.listdir(path)

    files = [fname for fname in os.listdir(path)
             if fname.endswith(('.tgz','.tar'))]

    for item in enumerate(files):
        print "%d- %s" % item

    try:
        idx = int(raw_input("\nEnter the file's number:\n"))
    except ValueError:
        print "You fail at typing numbers."

    try:
        chosen = files[idx]
    except IndexError:
        print "Try a number in range next time."

    newDir = raw_input('\nEnter a name to create a folder a the c: root directory:\n')
    selectDir = os.path.join("c:\\", newDir)
    path=os.path.abspath(selectDir)

    if not newDir.endswith(':') and not os.path.exists(selectDir):
        os.mkdir(selectDir)

    try:
        extract(chosen, path)
        print 'Done'
    except:
        name = os.path.basename(sys.argv[0])
        print chosen
It looks like you missed an escape character in "PLACE .tgz FILES in c:\Extracted AT THIS TIME!!! PRESS ENTER WHEN FINISHED!"
I don't think raw_input sees the prompt string as a raw string, just the user input.
But this shouldn't affect the functionality of your program.
Are you on Unix or Windows? I was under the impression that on Unix you use / (forward slash) instead of \\ (backslash) as a separator.
I tested some code on this file:
http://simkin.asu.edu/geowall/mars/merpano0.tar.gz
The following code:
>>> from os import chdir
>>> import tarfile
>>> chdir(r'C:\Users\Acer\Downloads')
>>> tar_url = 'merpano0.tar.gz'
>>> print tar_url
merpano0.tar.gz
>>> tar = tarfile.open(tar_url, 'r')
>>> extract_path = 'C:\\Users\\Acer\\Downloads\\test\\'
>>> for item in tar:
...     tar.extract(item, extract_path)
executed cleanly with no problems on my end. In the test directory I got a single folder with some files, exactly as in the original tar file. Can you explain what you're doing differently in your code that might be bugging up?
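For comparison, here is a minimal standalone sketch of an extract() that takes the target path as a second, optional argument. It is my own simplification, not your code: it does not recurse into nested archives, it creates the target folder if it is missing, and extractall() should only be used on archives you trust.

```python
import os
import tarfile


def extract(tar_path, extract_path="."):
    """Extract tar_path into extract_path and return the absolute target.

    The default '.' means "the current working directory at call time".
    """
    target = os.path.abspath(extract_path)
    if not os.path.isdir(target):
        os.makedirs(target)
    # Mode 'r' lets tarfile detect gzip/bzip2 compression by itself.
    with tarfile.open(tar_path, "r") as tar:
        tar.extractall(target)
    return target
```

Both `extract(chosen)` and `extract(chosen, path)` are valid calls with this signature; the first extracts into whatever directory os.getcwd() returns at that moment.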
Can someone please explain this code, especially the use of MAXVERSIONS and the statements following the line "for f in files:"?
I want to understand what xrange(MAXVERSIONS) means. What is the use of the indexing, i.e.
for index in xrange(MAXVERSIONS): backup = '%s.%2.2d' % (destpath, index)
The code:
#!/usr/bin/env python
import sys,os, shutil, filecmp

MAXVERSIONS=100
BAKFOLDER = '.bak'

def backup_files(tree_top, bakdir_name=BAKFOLDER):
    top_dir = os.path.basename(tree_top)
    tree_top += os.sep

    for dir, subdirs, files in os.walk(tree_top):
        if os.path.isabs(bakdir_name):
            relpath = dir.replace(tree_top,'')
            backup_dir = os.path.join(bakdir_name, top_dir, relpath)
        else:
            backup_dir = os.path.join(dir, bakdir_name)

        if not os.path.exists(backup_dir):
            os.makedirs(backup_dir)

        subdirs[:] = [d for d in subdirs if d != bakdir_name]
        for f in files:
            filepath = os.path.join(dir, f)
            destpath = os.path.join(backup_dir, f)
            for index in xrange(MAXVERSIONS):
                backup = '%s.%2.2d' % (destpath, index)
                abspath = os.path.abspath(filepath)

                if index > 0:
                    old_backup = '%s.%2.2d' % (destpath, index-1)
                    if not os.path.exists(old_backup): break
                    abspath = os.path.abspath(old_backup)

                    try:
                        if os.path.isfile(abspath) and filecmp.cmp(abspath, filepath, shallow=False):
                            continue
                    except OSError:
                        pass

                try:
                    if not os.path.exists(backup):
                        print 'Copying %s to %s...' % (filepath, backup)
                        shutil.copy(filepath, backup)
                except (OSError, IOError), e:
                    pass

if __name__=="__main__":
    if len(sys.argv)<2:
        sys.exit("Usage: %s [directory] [backup directory]" % sys.argv[0])

    tree_top = os.path.abspath(os.path.expanduser(os.path.expandvars(sys.argv[1])))

    if len(sys.argv)>=3:
        bakfolder = os.path.abspath(os.path.expanduser(os.path.expandvars(sys.argv[2])))
    else:
        bakfolder = BAKFOLDER

    if os.path.isdir(tree_top):
        backup_files(tree_top, bakfolder)
The script recursively copies the contents of a directory (given as the first argument) into a backup directory (by default a .bak folder inside each directory it visits);
for each filename.ext, it creates a copy named filename.ext.00; if filename.ext.00 already exists and the file differs from the latest backup, it creates filename.ext.01 instead, and so on.
xrange(MAXVERSIONS) lazily produces the integers 0..(MAXVERSIONS-1), and '%s.%2.2d' % (destpath, index) turns each index into a zero-padded two-digit suffix, so MAXVERSIONS controls how many version suffixes to try, i.e. how many old versions of the file to keep.
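To see the suffixes for yourself, here is a tiny sketch written for Python 3, where range() plays the role of Python 2's xrange(). The helper backup_names and the file name "notes.txt" are made up for the demonstration; only the format string comes from the script.

```python
MAXVERSIONS = 100  # same constant name as in the script


def backup_names(destpath, limit=MAXVERSIONS):
    """Return the candidate backup names the loop would try, in order."""
    return ["%s.%2.2d" % (destpath, index) for index in range(limit)]


names = backup_names("notes.txt")
print(names[:3])   # ['notes.txt.00', 'notes.txt.01', 'notes.txt.02']
print(names[-1])   # notes.txt.99
```

With MAXVERSIONS = 100 the last suffix tried is .99, which is why two digits are enough in the format string.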