I'm rather new to Python, but I have been attempting to learn the basics.
Anyway, I have several files that, once extracted from their zip files (a painfully slow process, by the way), produce several hundred subdirectories with 2-3 files in each. Now what I want to do is take all the files ending with 'dem.tif' and place them in a separate folder (move, not copy).
I may have attempted to jump into the deep end here, but the code I've written runs without error, so it must not be finding the files (which do exist!), as it gives me the else statement. Here is the code I've created:
import os
src = 'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Extracted' # input
dst = 'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Analyses' # desired location

def move():
    for (dirpath, dirs, files) in os.walk(src):
        if files.endswith('dem.tif'):
            shutil.move(os.path.join(src,files),dst)
            print ('Moving ', + files, + ' to ', + dst)
        else:
            print 'No Such File Exists'
First, welcome to the community, and python! You might want to change your user name, especially if you frequent here. :)
I suggest the following (stolen from Mr. Beazley):
# genfind.py
#
# A function that generates files that match a given filename pattern

import os
import shutil
import fnmatch

def gen_find(filepat, top):
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)

# Example use
if __name__ == '__main__':
    # Raw strings so the backslashes are not treated as escape sequences
    src = r'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Extracted'  # input
    dst = r'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Analyses'   # desired location

    filesToMove = gen_find("*dem.tif", src)
    for name in filesToMove:
        shutil.move(name, dst)
I think you've mixed up the way you should be using os.walk().
for dirpath, dirs, files in os.walk(src):
    print dirpath
    print dirs
    print files
    for filename in files:
        if filename.endswith('dem.tif'):
            shutil.move(...)
        else:
            ...
Update: the questioner has clarified below that he / she is actually calling the move function, which was the first point in my answer.
There are a few other things to consider:
You've got the order of elements returned in each tuple from os.walk wrong, I'm afraid - check the documentation for that function.
Assuming you've fixed that, also bear in mind that you need to iterate over files, and you need to os.path.join each of those to dirpath (the directory currently being walked), rather than to src.
The above would be obvious, hopefully, if you print out the values returned by os.walk and comment out the rest of the code in that loop.
With code that does potentially destructive operations like moving files, I would always first try some code that just prints out the parameters to shutil.move until you're sure that it's right.
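For example, a minimal dry-run sketch along those lines (the print stands in for the real shutil.move call; the paths are copied from the question):

import os

src = r'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Extracted'
dst = r'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Analyses'

for dirpath, dirs, files in os.walk(src):
    for filename in files:
        if filename.endswith('dem.tif'):
            # Once the printed pairs look right, swap the print for shutil.move
            print('move', os.path.join(dirpath, filename), '->', dst)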
Any particular reason you need to do it in Python? Would a simple shell command not be simpler? If you're on a Unix-like system, or have access to Cygwin on Windows:
find src_dir -name "*dem.tif" -exec mv {} dst_dir \;
Here is the code we have developed for a single directory of files:
from os import listdir

with open("/user/results.txt", "w") as f:
    for filename in listdir("/user/stream"):
        with open('/user/stream/' + filename) as currentFile:
            text = currentFile.read()
            if 'checksum' in text:
                f.write('current word in ' + filename[:-4] + '\n')
            else:
                f.write('NOT ' + filename[:-4] + '\n')
Now I want to loop over all the directories.
Thanks in advance.
If you're using UNIX you can use grep:
grep "checksum" -R /user/stream
The -R flag allows for a recursive search inside the directory, following the symbolic links if there are any.
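If you want to stay in Python instead, a rough equivalent can be sketched with os.walk (only the /user/stream root comes from the question; the rest is illustrative):

import os

for dirpath, dirnames, filenames in os.walk('/user/stream'):
    for filename in filenames:
        # Read each file and look for the word, mirroring grep -R
        with open(os.path.join(dirpath, filename)) as current_file:
            if 'checksum' in current_file.read():
                print(filename)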
My suggestion is to use glob.
The glob module allows you to work with files. In the Unix universe, a directory is (or should be) a file, so glob should be able to help you with your task.
Moreover, you don't have to install anything: glob comes with Python.
Note: for the following code, you will need Python 3.5 or greater.
This should help you out.
import os
import glob

for path in glob.glob('/ai2/data/prod/admin/inf/**', recursive=True):
    # At some point, `path` will be `/ai2/data/prod/admin/inf/inf_<$APP>_pvt/error`
    if not os.path.isdir(path):
        # Check the `id` of the file
        # Do things with the file
        # If there are files inside `/ai2/data/prod/admin/inf/inf_<$APP>_pvt/error`,
        # you will be able to access them here
        pass
What glob.glob does is return a possibly-empty list of path names that match pathname. In this case, it will match every file (including directories) in /user/stream/. If these files are not directories, you can do whatever you want with them.
I hope this will help you!
Clarification
Regarding your 3-point comment attempting to clarify the question, especially this part: "we need to put appi dynamically in that path then we need to read all files inside that directory".
No, you do not need to do this. Please read my answer carefully and please read glob documentation.
In this case, it will match every file (including directories) in /user/stream/
If you replace /user/stream/ with /ai2/data/prod/admin/inf/, you will have access to every file in /ai2/data/prod/admin/inf/. Assuming your app ids are 1, 2, 3, this means you will have access to the following files.
/ai2/data/prod/admin/inf/inf_1_pvt/error
/ai2/data/prod/admin/inf/inf_2_pvt/error
/ai2/data/prod/admin/inf/inf_3_pvt/error
You do not have to specify the id, because you will be iterating over all files. If you do need the id, you can just extract it from the path.
If everything looks like this, /ai2/data/prod/admin/inf/inf_<$APP>_pvt/error, you can get the id by removing /ai2/data/prod/admin/inf/ and taking everything until you encounter _.
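For instance, a minimal sketch of that extraction (the example path is hypothetical, following the layout above):

# Hypothetical path following the /ai2/data/prod/admin/inf/inf_<$APP>_pvt/error layout
path = '/ai2/data/prod/admin/inf/inf_1_pvt/error'
prefix = '/ai2/data/prod/admin/inf/'

# Strip the common prefix, then split on '_' to isolate the app id
app_id = path[len(prefix):].split('_')[1]
print(app_id)  # '1'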
I'm using glob.glob to search a folder, and the sub-folders therein, for all the invoices I have. To simplify that, I'm going to add the program to the context menu and have it take the path as the first part of:
import glob

for filename in glob.glob(path + "/**/*.pdf", recursive=True):
    print(filename)
I'll have it keep the list and send those files to a printer in a later version, but for now just writing the name is a good enough test.
So my question is twofold:
Is there anything fundamentally wrong with the way I'm writing this?
Can anyone point me in the direction of how to actually capture the folder path and provide it as the path variable?
You should have a look at this question: Python script on selected file. It shows how to set up a "Send To" command in the context menu. This command calls a Python script and provides the file name via sys.argv[1]. I assume that also works for a directory.
I do not have Python 3.5, so I cannot set the flag recursive=True; instead, here is a solution you can run on any Python version known to date.
The solution consists in calling os.walk() to explore the directories, together with the built-in set type.
It is better to use a set instead of a list: with a list you would need extra code to check whether the directory you want to add is already listed.
So basically you can keep two sets: one for the names of the files you want to print and the other for the directories and their subfolders.
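For example, a set silently absorbs duplicate additions (the path here is just illustrative):

seen_dirs = set()
seen_dirs.add('/user/invoices')
seen_dirs.add('/user/invoices')  # duplicate add: the set is unchanged
print(seen_dirs)                 # {'/user/invoices'}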
So you can adapt this solution to your class/method:
import os

path = '.'       # Any path you want
exten = '.pdf'

directories_list = set()
files_list = set()

# Loop over directories
for dirpath, dirnames, files in os.walk(path):
    for name in files:
        # Check if extension matches
        if name.lower().endswith(exten):
            files_list.add(name)
            directories_list.add(dirpath)
You can then loop over directories_list and files_list to print them out.
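For instance, a minimal loop over the two sets from the code above:

for directory in sorted(directories_list):
    print(directory)

for file_name in sorted(files_list):
    print(file_name)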
So I'm a rookie at programming, and I'm trying to make a program in Python that opens a text file with a bunch of columns and writes the data to 3 different text files based on a string in the row. As my program stands right now, I have it change the directory to a specific output folder using os.chdir so it can open my text file, but what I want is something like this:
Imagine a folder set up like this:
The source folder contains N folders. Each of those folders contains N output folders. Each output folder contains one Results.txt.
The idea is to have the program start at the source folder, look into Folder 1, look for Output 1, open the .txt file, then do its thing. Once it's done, it should go back to Folder 1 and open Output 2 and do its thing again. Then it should go back to Folder 1, and if it can't find any more output folders, it needs to go back up and enter Folder 2, repeating the process until there are no more folders. Honestly, I'm not sure where to start with this; the best I could do is make a small program that prints all my .txt files, but I'm not sure how to open them at all. I hope my question makes sense, and thanks for the help.
If all you need is to process each file in a directory recursively:
import os

def process_dir(dir):
    for subdir, dirs, files in os.walk(dir):
        for file in files:
            file_path = os.path.join(subdir, file)
            print file_path
            # process file here
This will process each file in the root dir recursively. If you're looking for conditional iteration you might need to make the loop a little smarter.
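For example, a sketch of a slightly smarter loop that only touches the Results.txt files described in the question:

import os

def process_results(root):
    for subdir, dirs, files in os.walk(root):
        for file in files:
            if file == 'Results.txt':  # only process the results files
                file_path = os.path.join(subdir, file)
                print(file_path)
                # process file here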
Read the base folder path and store it in a variable; then, for each subfolder, build the path to its Results.txt and process the text file. chdir with the base path changes the directory so you can read the subfolders one after another.

import os

dirlist = os.listdir(os.getcwd())
dirlist = filter(lambda x: os.path.isdir(x), dirlist)
for dirname in dirlist:
    print os.path.join(os.getcwd(), dirname, 'Results.txt')
First, I think you could format your question for better reading.
Concerning your question, here's a naïve implementation example:
import os

where = "J:/tmp/"
what = "Results.txt"

def processpath(where, name):
    for elem in os.listdir(where):
        elempath = os.path.join(where, elem)
        if elem == name:
            # Do something with your file
            f = open(elempath, "w")  # example
            f.write("modified2")     # example
            f.close()                # example
        elif os.path.isdir(elempath):
            processpath(elempath, name)

processpath(where, what)
I would do this without chdir. The most straightforward solution to me is to use os.listdir and filter the results, then os.path.join to construct complete relative paths instead of using chdir. I suspect this would be less prone to bugs, such as winding up in an unexpected current working directory where all your relative paths are then wrong.
import os
import re

nfolders = [d for d in os.listdir(".") if re.match("^Folder [0-9]+$", d)]
for f1 in nfolders:
    noutputs = [d for d in os.listdir(f1) if re.match("^Output [0-9]+$", d)]
    for f2 in noutputs:
        resultsFilename = os.path.join(f1, f2, "results.txt")
        # do whatever with resultsFilename
I need to os.walk from my parent path (tutu) through all its subfolders. In each one, the deepest subfolders contain the files I need to process with my code. For all the deepest folders that have files, the file layout is the same: one *.adf.txt file, one *.idf.txt file, one *.sdrf.txt file, and one or more *.dat files, as shown in the pictures.
My problem is that I don't know how to use the os module to iterate from my parent folder through all subfolders sequentially. I need a function that, for the current subfolder in os.walk, if that subfolder is empty, continues to the sub-subfolder inside it, if one exists. If it exists, it should verify whether that file layout is present (this is no problem...), and if it is, apply the code (no problem either). If not, and if that folder has no more sub-folders, it should return to the parent folder and os.walk to the next subfolder, and so on for all subfolders in my parent folder (tutu). To summarize, I need some function like the one below (written in a Python/imaginary-code hybrid):
for all folders in tutu:
    if os.havefiles in os.walk(current_path):  # the 'havefiles' doesn't exist, i think...
        for filename in os.walk(current_path):
            if 'adf' in filename:
                etc...
                # my code
    elif:
        while true:
            go deep
    else:
        os.chdir(parent_folder)
Do you think the best approach is a function I can call in my code to do the job?
This is the code that I've tried to use, without success, of course:
import csv
import os
import fnmatch

abs_path = os.path.abspath('.')

for dirname, subdirs, filenames in os.walk('.'):
    # print path to all subdirectories first.
    for subdirname in subdirs:
        print os.path.join(dirname, subdirname), 'os.path.join(dirname, subdirname)'
        current_path = os.path.join(dirname, subdirname)
        os.chdir(current_path)
        for filename in os.walk(current_path):
            print filename, 'f in os.walk'
            if os.path.isdir(filename) == True:
                break
            elif os.path.isfile(filename) == True:
                print filename, 'file'
                # code here
Thanks in advance...
I need a function that, for the current subfolder in os.walk, if that subfolder is empty, continue to the sub-subfolder inside that subfolder, if it exists.
This doesn't make any sense. If a folder is empty, it doesn't have any subfolders.
Maybe you mean that if it has no regular files, then recurse into its subfolders, but if it has any, don't recurse, and instead check the layout?
To do that, all you need is something like this:
import collections
import os

for dirname, subdirs, filenames in os.walk('.'):
    if filenames:
        # can't use os.path.splitext, because that will give us .txt instead of .adf.txt
        # (note: partition strips the leading dot, so the keys below have none)
        extensions = collections.Counter(filename.partition('.')[-1]
                                         for filename in filenames)
        if (extensions['adf.txt'] == 1 and extensions['idf.txt'] == 1 and
                extensions['sdrf.txt'] == 1 and extensions['dat'] >= 1 and
                len(extensions) == 4):
            pass  # got a match, do what you want
        # Whether this is a match or not, prune the walk.
        del subdirs[:]
I'm assuming here that you only want to find directories that have exactly the specified files, and no others. To remove that last restriction, just remove the len(extensions) == 4 part.
There's no need to explicitly iterate over subdirs or anything, or recursively call os.walk from inside os.walk. The whole point of walk is that it's already recursively visiting every subdirectory it finds, except when you explicitly tell it not to (by pruning the list it gives you).
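As a side note, pruning can also be selective rather than all-or-nothing; for example (the directory names here are illustrative):

import os

for dirname, subdirs, filenames in os.walk('.'):
    # Modify the list in place; rebinding subdirs would not affect the walk
    subdirs[:] = [d for d in subdirs if d not in ('.git', 'tmp')]
    print(dirname, filenames)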
os.walk will automatically "dig down" recursively, so you don't need to recurse the tree yourself.
I think this should be the basic form of your code:
import csv
import os
import fnmatch

directoriesToMatch = [list here...]
filenamesToMatch = [list here...]

abs_path = os.path.abspath('.')

for dirname, subdirs, filenames in os.walk('.'):
    if len(set(directoriesToMatch).difference(subdirs)) == 0:      # all dirs are there
        if len(set(filenamesToMatch).difference(filenames)) == 0:  # all files are there
            if <any other filename/directory checking code>:
                # processing code here ...
And according to the python documentation, if you for whatever reason don't want to continue recursing, just delete entries from subdirs:
http://docs.python.org/2/library/os.html
If you instead want to check that there are NO sub-directories where you find your files to process, you could also change the dirs check to:
if len(subdirs)==0: # check that this is an empty directory
I'm not sure I quite understand the question, so I hope this helps!
Edit:
Ok, so if you need to check there are no files instead, just use:
if len(filenames)==0:
But as I stated above, it would probably be better to just look FOR specific files instead of checking for empty directories.
I'm working on something here, and I'm completely confused. Basically, I have the script in my directory, and that script has to run on multiple folders with a particular extension. Right now, I have it up and running on a single folder. Here's the structure: I have a main folder, say Python; inside that I have multiple folders, all with the same .ext; and inside each sub-folder I again have a few folders, inside which I have the working file.
Now, I want the script to visit the whole path: say we are inside the main folder Python, inside which we have folder1.ext -> sub-folder1 -> working-file; come out of this, go back to the main folder Python, and start visiting the second directory.
Now there are so many things in my head: the glob module, os.walk, or the for loop. I'm getting the logic wrong and desperately need some help.
Say, Path=r'\path1'
How do I go about starting? I would greatly appreciate any help.
I'm not sure if this is what you want, but this main function with a recursive helper function gets a dictionary of all of the files in a main directory:
import os, os.path

def getFiles(path):
    '''Gets all of the files in a directory'''
    sub = os.listdir(path)
    paths = {}
    for p in sub:
        print p
        pDir = os.path.join(path, p)
        if os.path.isdir(pDir):
            paths.update(getAllFiles(pDir, paths))
        else:
            paths[p] = pDir
    return paths

def getAllFiles(mainPath, paths = {}):
    '''Helper function for getFiles(path)'''
    subPaths = os.listdir(mainPath)
    for path in subPaths:
        pathDir = os.path.join(mainPath, path)
        if os.path.isdir(pathDir):
            paths.update(getAllFiles(pathDir, paths))
        else:
            paths[path] = pathDir
    return paths
This returns a dictionary of the form {'my_file.txt': 'C:\User\Example\my_file.txt', ...}. Note that because the keys are bare file names, two files with the same name in different folders will collide, and the later one will overwrite the earlier entry.
Since you distinguish first level directories from their sub-directories, you could do something like this:

# this is a generator to get all first level directories matching the extension
dirs = (os.path.join(my_path, d) for d in os.listdir(my_path)
        if os.path.isdir(os.path.join(my_path, d))
        and os.path.splitext(d)[-1] == my_ext)

for d in dirs:
    for root, sub_dirs, files in os.walk(d):
        for f in files:
            pass  # call your script on each file f
You could use Formic (disclosure: I am the author). Formic allows you to specify one multi-directory glob to match your files so eliminating directory walking:
import formic

fileset = formic.FileSet(include="*.ext/*/working-file", directory=r"path1")
for file_name in fileset:
    pass  # Do something with file_name
A couple of points to note:
/*/ matches every subdirectory, while /**/ recursively descends into every subdirectory, their subdirectories and so on. Some options:
If the working file is precisely one directory below your *.ext, then use /*/
If the working file is at any depth under *.ext, then use /**/ instead.
If the working file is at least one directory below *.ext, then you might use /*/**/
Formic starts searching in the current working directory, so if that is the correct directory, you can omit directory=r"path1".
I am assuming the working file is literally called working-file. If not, substitute a glob that matches it, like *.sh or script-*.
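If you would rather not add a third-party dependency, the /*/ case can also be sketched with the standard library's glob module (path1 and the names here mirror the examples above):

import glob
import os

# Matches a working-file exactly one level below each *.ext directory,
# mirroring the "*.ext/*/working-file" pattern above
for file_name in glob.glob(os.path.join(r'path1', '*.ext', '*', 'working-file')):
    print(file_name)  # Do something with file_name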