Python script to recursively open .rar files in a directory - python

I'm writing this Python script to recursively go through a directory and use the unrar utility on Ubuntu to open all .rar files. For some reason it enters the directory, lists some contents, enters the first sub-directory, opens one .rar file, then prints all the other contents of the parent directory without going into them. Any ideas on why it would be doing this? I'm pretty new to Python, so go easy.
"""Recursively unrars a directory"""
import subprocess
import os
def runrar(directory):
    os.chdir(directory)
    dlist = subprocess.check_output(["ls"])
    for line in dlist.splitlines():
        print(line)
        if isRar(line):
            print("unrar e " + directory + '/' + line)
            arg = directory + '/' + line
            subprocess.call(['unrar', 'e', arg])
        if os.path.isdir(line):
            print('here')
            runrar(directory + '/' + line)
def isRar(line):
    var = line[-4:]
    if os.path.isdir(line):
        return False
    if var == ".rar":
        return True
    return False
directory = raw_input("Please enter the full directory: ")
print(directory)
runrar(directory)

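There are several problems and clumsy uses of Python in your code; I may not be able to list them all. Examples:
using os.chdir inside a recursive function: once the recursive call returns, the current directory is still the sub-directory, so os.path.isdir() on the remaining names from the parent listing fails, which is why they are printed but never entered
using the output of ls to read a directory (very wrong, and not portable to Windows!)
the recursion can be avoided with os.walk
checking the extension of a file is easier with str.endswith
checking twice whether the entry is a directory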
My proposal (path is the directory to scan):
import os
import subprocess
for root, _, the_files in os.walk(path):
    for f in the_files:
        if f.lower().endswith(".rar"):
            subprocess.call(['unrar', 'e', f], cwd=root)
This loops through the files (os.walk also yields the sub-directory names, but we ignore them by binding that list to the _ variable, which is just a convention, but an effective one) and calls the command through subprocess with cwd=root. That changes the directory for the child process only, so unrar extracts the files into the directory where the archive is.
(In this typical os.walk loop, root is the directory and f is the filename without its path, which is just what we need to run a subprocess in the proper current directory, so there is no need for os.path.join to compute the full path.)
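The loop above can be split into a search step and an extraction step, which makes the search easy to check without unrar installed. This is only a sketch: the function names find_rar_files and extract_all are made up for illustration, and it assumes unrar is on the PATH when extract_all is called.

```python
import os
import subprocess


def find_rar_files(top):
    """Return (directory, filename) pairs for every .rar file under top."""
    found = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            # Case-insensitive match, so ARCHIVE.RAR is found too.
            if name.lower().endswith(".rar"):
                found.append((root, name))
    return found


def extract_all(top):
    """Run `unrar e` on each archive, inside the archive's own directory."""
    for root, name in find_rar_files(top):
        # cwd=root only affects the child process, not this script.
        subprocess.call(["unrar", "e", name], cwd=root)
```

Calling extract_all(directory) replaces the whole runrar function from the question, and the script's own working directory never changes.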

Related

Is there a way to change your cwd in Python using a file as an input?

I have a Python program where I am calculating the number of files within different directories, but I wanted to know if it was possible to use a text file containing a list of different directory locations to change the cwd within my program?
Input: a text file listing different folder locations, each of which contains various files.
My program is set up to compute the total number of files in a given folder and write that number to a count text file located in each folder the program is called on.
You can use the os module in Python.
import os
# dirs will store the list of directories, populated from your text file
dirs = []
with open(your_text_file, "r") as text_file:
    for line in text_file:
        # strip the trailing newline, otherwise os.chdir() cannot find the path
        line = line.strip()
        if line:
            dirs.append(line)
# Now simply loop over the dirs list
for directory in dirs:
    # Change directory
    os.chdir(directory)
    # Print cwd
    print(os.getcwd())
    # Print number of files in cwd
    print(len([name for name in os.listdir(directory)
               if os.path.isfile(os.path.join(directory, name))]))
Yes.
import os
start_dir = os.getcwd()
indexfile = open(dir_index_file, "r")
for targetdir in indexfile.readlines():
    # strip the trailing newline before using the line as a path
    os.chdir(targetdir.strip())
    # Do your stuff here
indexfile.close()
os.chdir(start_dir)
Do bear in mind that if your program dies half way through it'll leave you in a different working directory to the one you started in, which is confusing for users and can occasionally be dangerous (especially if they don't notice it's happened and start trying to delete files that they expect to be there - they might get the wrong file). You might want to consider if there's a way to achieve what you want without changing the working directory.
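If changing directory cannot be avoided, one way to limit that risk is to restore the original directory in a finally clause. The helper below is a sketch, not part of the answer above; the name working_directory is made up for illustration.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the current working directory to path."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs even if the body raises, so the caller is never left stranded.
        os.chdir(previous)
```

Inside a `with working_directory(targetdir):` block the process works in targetdir, and it is back in the starting directory afterwards even when the body fails.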
EDIT:
To avoid changing the working directory at all, use os.listdir() to get the contents of the directory of interest:
import os
indexfile = open(dir_index_file, "r")
for targetdir in indexfile.readlines():
    targetdir = targetdir.strip()  # drop the trailing newline
    contents = os.listdir(targetdir)
    numfiles = len(contents)
    countfile = open(os.path.join(targetdir, "count.txt"), "w")
    countfile.write(str(numfiles))
    countfile.close()
indexfile.close()
Note that this will count files and directories, not just files. If you only want files then you'll have to go through the list returned by os.listdir checking whether each item is a file using os.path.isfile()
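A files-only variant might look like the sketch below. The function names count_files and write_counts are made up for illustration, and the index file is assumed to hold one directory path per line.

```python
import os


def count_files(directory):
    """Count regular files directly inside directory (sub-directories excluded)."""
    return sum(
        1
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )


def write_counts(index_path):
    """For each directory listed in index_path, write its file count to count.txt."""
    with open(index_path) as index:
        for line in index:
            target = line.strip()
            if not target:
                continue  # skip blank lines
            # Count before creating count.txt so it is not included on the first run.
            total = count_files(target)
            with open(os.path.join(target, "count.txt"), "w") as out:
                out.write(str(total))
```

On a second run count.txt already exists and is counted like any other file, so exclude it by name if that matters.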

Python command to run function on directories specified in a text file

I have a text file called "folders" that lists the directories I want to run a function on (entries can include wildcards).
folders.txt:
Images*\ //Any folders that start with Images
Music\Rap //The specific folder Music\Rap
Video\Horror //The specific folder Video\Horror
Python code:
directories = folders.readlines()
for lines in directories:
    lines = lines.strip()
    command = 'dir ' + RootPath + '\\' + lines + ' /B'
    result = subprocess.check_output(command, shell=True)
    for line in result.split('\r\n'):
        execute_function(line)  # placeholder for the function I want to run
I decided to use the dir command because that was the only way I knew how to handle wildcards. But running "dir c:\Music\Rap /B" lists all the files inside that directory rather than the directory itself. Is there a better way to do this?
The glob module contains methods for expanding wildcards according to the filesystem content. Example:
import glob
result = glob.glob( 'Images*' )
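Applied to the folders file from the question, this could be sketched as follows. The function name expand_folders is made up for illustration, and the patterns are assumed to be relative to a root path, one per line, as in the question.

```python
import glob
import os


def expand_folders(root_path, patterns):
    """Return the directories under root_path matched by each pattern."""
    matches = []
    for pattern in patterns:
        # Accept Windows-style separators and drop a trailing slash.
        pattern = pattern.strip().replace("\\", "/").rstrip("/")
        if not pattern:
            continue
        for path in sorted(glob.glob(os.path.join(root_path, pattern))):
            # A wildcard can also match plain files; keep directories only.
            if os.path.isdir(path):
                matches.append(path)
    return matches
```

Unlike the dir command, this returns the matching directories themselves, so Music\Rap yields the folder rather than its contents.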

How to move in and out of folders in python

I'm a rookie at programming and I'm trying to make a program in Python that opens a text file with a bunch of columns and writes the data to 3 different text files based on a string in the row. As my program stands right now, it changes the directory to a specific output folder using os.chdir so it can open my text file, but what I want is for it to do something like this:
Imagine a folder set up like this :
Source Folder contains N number of folders. Each of those folders contains N number of output folders. Each output folder contains 1 Results.txt.
The idea is to have the program start at the source folder, look into Folder 1, look for output 1, open the .txt file, then do its thing. Once it's done, it should go back to Folder 1, open output 2 and do its thing again. Then it should go back to Folder 1 and, if it can't find any more output folders, it needs to go to Folder A and then enter Folder 2 and repeat the process until there are no more folders. Honestly, I'm not sure where to start with this; the best I could do is make a small program that prints all my .txt files, but I'm not sure how to open them at all. I hope my question makes sense, and thanks for the help.
If all you need is to process each file in a directory recursively:
import os
def process_dir(dir):
    for subdir, dirs, files in os.walk(dir):
        for file in files:
            file_path = os.path.join(subdir, file)
            print(file_path)
            # process file here
This will process each file in the root dir recursively. If you're looking for conditional iteration you might need to make the loop a little smarter.
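One form such a conditional loop could take is to keep only the files named Results.txt and open each of them. This is a sketch under assumptions: the names find_results and read_rows are made up, and the columns are assumed to be separated by whitespace.

```python
import os


def find_results(source, name="Results.txt"):
    """Yield the full path of every file called name below source."""
    for subdir, dirs, files in os.walk(source):
        dirs.sort()  # visit sub-folders in a predictable (alphabetical) order
        if name in files:
            yield os.path.join(subdir, name)


def read_rows(path):
    """Return the whitespace-separated columns of each non-empty line."""
    with open(path) as handle:
        return [line.split() for line in handle if line.strip()]
```

Looping with `for path in find_results(source_folder): rows = read_rows(path)` visits every output folder in turn without any call to os.chdir.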
Read the base folder path and store it in a variable. Move into a sub-folder with chdir and process the text file, then use the base path to change back and move on to the next sub-folder. The following lists the Results.txt path for each sub-folder of the current directory:
import os
dirlist = os.listdir(os.getcwd())
dirlist = filter(lambda x: os.path.isdir(x), dirlist)
for dirname in dirlist:
    print(os.path.join(os.getcwd(), dirname, 'Results.txt'))
First, I think you could format your question for better reading.
Concerning your question, here's a naïve implementation example:
import os
where = "J:/tmp/"
what = "Results.txt"
def processpath(where, name):
    for elem in os.listdir(where):
        elempath = os.path.join(where, elem)
        if elem == name:
            # Do something with your file
            with open(elempath, "w") as f:  # example
                f.write("modified2")  # example
        elif os.path.isdir(elempath):
            processpath(elempath, name)
processpath(where, what)
I would do this without chdir. The most straightforward solution to me is to use os.listdir and filter the results, then use os.path.join to construct complete relative paths instead of calling chdir. I suspect this would be less prone to bugs such as winding up in an unexpected current working directory where all your relative paths are then wrong.
import os
import re
nfolders = [d for d in os.listdir(".") if re.match("^Folder [0-9]+$", d)]
for f1 in nfolders:
    noutputs = [d for d in os.listdir(f1) if re.match("^Output [0-9]+$", d)]
    for f2 in noutputs:
        resultsFilename = os.path.join(f1, f2, "Results.txt")
        # do whatever with resultsFilename
        print(resultsFilename)

Python - empty dirs & subdirs after a shutil.copytree function

This is part of a program I'm writing. The goal is to extract all the GPX files, say at G:\ (specified with -e G:\ at the command line). It would create an 'Exports' folder and dump all files with matching extensions there, recursively that is. Works great, a friend helped me write it!! Problem: empty directories and subdirectories for dirs that did not contain GPX files.
import argparse, shutil, os
def ignore_list(path, files): # This ignore list is specified in the function below.
    ret = []
    for fname in files:
        fullFileName = os.path.normpath(path) + os.sep + fname
        if not os.path.isdir(fullFileName) \
           and not fname.endswith('gpx'):
            ret.append(fname)
        # This isn't doing what it's supposed to.
        elif os.path.isdir(fullFileName) \
             and len(os.listdir(fullFileName)) == 0:
            ret.append(fname)
    return ret
def gpxextract(src,dest):
    shutil.copytree(src,dest,ignore=ignore_list)
Later in the program we have the call for extractpath:
if args.extractpath:
    path = args.extractpath
    gpxextract(path, 'Exports')
So the above extraction does work. But the len function call above is designed to prevent the creation of empty dirs and does not. I know the best way is to os.rmdir somehow after the export, and while there's no error, the folders remain.
So how can I successfully prune this Exports folder so that only dirs with GPXs will be in there? :)
If I understand you correctly, you want to delete empty folders? If that is the case, you can do a bottom-up delete-folder operation, which will fail for any folders that are not empty. Something like:
import os
# walk the copied tree bottom-up (topdown=False) so children are removed before their parents
for root, dirs, files in os.walk('Exports', topdown=False):
    for dn in dirs:
        pth = os.path.join(root, dn)
        try:
            os.rmdir(pth)
        except OSError:
            pass
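The len() check in the question does not help because the ignore function sees the source directory before filtering: a directory that holds only non-GPX files is not empty in the source, so copytree still creates it. Pruning after the copy avoids that. A reusable version of the loop above might look like this sketch, where the name prune_empty_dirs is made up for illustration:

```python
import os


def prune_empty_dirs(top):
    """Remove every empty directory below top; return the removed paths."""
    removed = []
    # topdown=False yields children before parents, so a directory whose
    # only contents were empty sub-directories is itself empty when reached.
    for root, dirs, _files in os.walk(top, topdown=False):
        for name in dirs:
            path = os.path.join(root, name)
            try:
                os.rmdir(path)  # succeeds only if the directory is empty
            except OSError:
                continue  # not empty (or not removable): leave it alone
            removed.append(path)
    return removed
```

Calling prune_empty_dirs('Exports') right after gpxextract would leave only the directories that contain GPX files. The top directory itself is never removed.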

Visiting multiple folders with extensions

I'm working on something here, and I'm completely confused. Basically, I have the script in my directory, and that script has to run on multiple folders with a particular extension. Right now, I have it up and running on a single folder. Here's the structure, I have a main folder say, Python, inside that I have multiple folders all with the same .ext, and inside each sub-folder I again have few folders, inside which I have the working file.
Now, I want the script to visit the whole path say, we are inside the main folder 'python', inside which we have folder1.ext->sub-folder1->working-file, come out of this again go back to the main folder 'Python' and start visiting the second directory.
Now there are so many things in my head, the glob module, os.walk, or the for loop. I'm getting the logic wrong. I desperately need some help.
Say, Path=r'\path1'
How do I start about? Would greatly appreciate any help.
I'm not sure if this is what you want, but this main function with a recursive helper function gets a dictionary of all of the files in a main directory:
import os, os.path
def getFiles(path):
    '''Gets all of the files in a directory'''
    sub = os.listdir(path)
    paths = {}
    for p in sub:
        print(p)
        pDir = os.path.join(path, p)
        if os.path.isdir(pDir):
            paths.update(getAllFiles(pDir, paths))
        else:
            paths[p] = pDir
    return paths
def getAllFiles(mainPath, paths=None):
    '''Helper function for getFiles(path)'''
    if paths is None:  # avoid sharing one default dict between calls
        paths = {}
    subPaths = os.listdir(mainPath)
    for path in subPaths:
        pathDir = os.path.join(mainPath, path)
        if os.path.isdir(pathDir):
            paths.update(getAllFiles(pathDir, paths))
        else:
            paths[path] = pathDir
    return paths
This returns a dictionary of the form {'my_file.txt': 'C:\User\Example\my_file.txt', ...}. Note that files with the same name in different directories overwrite one another, because the file name is the key.
Since you distinguish first-level directories from their sub-directories, you could do something like this:
import os
# this is a generator to get all first level directories
dirs = (os.path.join(my_path, d) for d in os.listdir(my_path)
        if os.path.isdir(os.path.join(my_path, d))
        and os.path.splitext(d)[-1] == my_ext)
for d in dirs:
    for root, sub_dirs, files in os.walk(d):
        for f in files:
            # call your script on each file f
            print(os.path.join(root, f))
You could use Formic (disclosure: I am the author). Formic allows you to specify one multi-directory glob to match your files, which eliminates the directory walking:
import formic
fileset = formic.FileSet(include="*.ext/*/working-file", directory=r"path1")
for file_name in fileset:
    # Do something with file_name
    print(file_name)
A couple of points to note:
/*/ matches every subdirectory, while /**/ recursively descends into every subdirectory, their subdirectories and so on. Some options:
If the working file is precisely one directory below your *.ext, then use /*/
If the working file is at any depth under *.ext, then use /**/ instead.
If the working file is at least one directory below *.ext, then you might use /*/**/
Formic starts searching in the current working directory. If this is the correct directory, you can omit the directory=r"path1"
I am assuming the working file is literally called working-file. If not, substitute a glob that matches it, like *.sh or script-*.
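For comparison, the fixed-depth case can also be sketched with the standard library's glob module instead of Formic. This swaps in a different tool from the one the answer describes. The function name find_working_files and its default arguments are made up for illustration, and the layout main folder / *.ext / one sub-folder / working-file is taken from the question.

```python
import glob
import os


def find_working_files(main_dir, ext=".ext", name="working-file"):
    """Return paths matching <main_dir>/*<ext>/<one sub-folder>/<name>."""
    pattern = os.path.join(main_dir, "*" + ext, "*", name)
    return sorted(glob.glob(pattern))
```

For files at any depth, Python 3.5 and later accept a `**` component when glob.glob is called with recursive=True.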
