I've got a bunch of files and a few folders. I'm trying to append the zips to a list so I can extract those files in another part of the code. It never finds the zips.
for file in os.listdir(path):
    print(file)
    if file.split(".")[1] == 'zip':
        reg_zips.append(file)
The path is fine or it wouldn't print out anything. It picks up the same files each time but will not pick up any others. It picks up about 1/5th of the files in the directory.
At a complete loss. I've made sure that some weird race condition with the file availability isn't the problem by putting a time.sleep(3) in the code. Didn't solve it.
It's possible your files have more than one period in them. Try using str.endswith:
reg_zips = []
for file in os.listdir(path):
    if file.endswith('zip'):
        reg_zips.append(file)
Another good idea (thanks, Jean-François Fabre!) is to use os.path.splitext, which handles the extension quite nicely:
if os.path.splitext(file)[-1] == '.zip':
    ...
An even better solution, which I recommend, is the glob.glob function (note that this pattern searches the current working directory):
import glob
reg_zips = glob.glob('*.zip')
reg_zips = [z for z in os.listdir(path) if z.endswith("zip")]
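To see why the original split(".")[1] test misses files, a small sketch may help. The file names below are made up for illustration; the point is only that split looks at the text after the first period, while os.path.splitext looks at the text after the last one.

```python
import os

# Hypothetical file names, chosen to include more than one period.
names = ["data.zip", "backup.2020.zip", "notes.txt", "archive.tar.gz"]

# split(".")[1] is the text between the first and second period.
by_split = [n for n in names if n.split(".")[1] == "zip"]

# splitext returns (root, extension); the extension starts at the last period.
by_splitext = [n for n in names if os.path.splitext(n)[1] == ".zip"]

print(by_split)
print(by_splitext)
```

With these names, the split version keeps only data.zip, while the splitext version also keeps backup.2020.zip.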
I would be very grateful indeed for some help for a frustrated and confused Python beginner.
I am trying to create a script that searches a windows directory containing multiple subdirectories and different file types for a specific single string (a name) in the file contents and if found prints the filenames as a list. There are approximately 2000 files in 100 subdirectories, and all the files I want to search don't necessarily have the same extension - but are all in essence, ASCII files.
I've been trying to do this for many many days but I just cannot figure it out.
So far I have tried using glob recursive coupled with reading the file but I'm so very bewildered. I can successfully print a list of all the files in all subdirectories, but don't know where to go from here.
import glob
files = []
files = glob.glob('C:\TEMP' + '/**', recursive=True)
print(files)
Can anyone please help me? I am a 72-year-old scientist trying to improve my skills and "automate the boring stuff", but at the moment I'm just losing the will.
Thank you very much in advance to this community.
Great to have you here!
What you have done so far is find all the file paths. The simplest next step is to go through the files, read them into memory one by one, and see whether the name you are looking for is there.
import glob
files = glob.glob(r'C:\TEMP' + '/**', recursive=True)
target_string = 'John Smit'
# iterate over files
for file in files:
    try:
        # open file for reading
        with open(file, 'r') as f:
            # read the contents
            contents = f.read()
        # check if contents have your target string
        if target_string in contents:
            print(file)
    except (OSError, UnicodeDecodeError):
        # skip directories and files that cannot be read as text
        pass
This prints the path of every file that contains the name.
Please also note that I have removed the second line of your code because it is redundant: you initialize the list in line 3 anyway.
Hope it helps!
You could do it like this, though I think there must be a better approach.
Once you have found all the files in your directory, iterate over them and check whether each one contains the specific string.
for file in files:
    if os.path.isfile(file):
        with open(file, 'r') as f:
            if 'search_string' in f.read():
                print(file)
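For what it's worth, here is one way the ideas in these answers could be put together, as a sketch rather than a definitive solution. The function name find_files_containing is made up for this example. It assumes the files are small enough to read whole, and that ignoring undecodable bytes is acceptable for mostly-ASCII files.

```python
import os

def find_files_containing(root, needle):
    """Return the paths of files under root whose text contains needle."""
    matches = []
    # os.walk visits root and every subdirectory beneath it
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                # errors="ignore" drops bytes that cannot be decoded as text
                with open(full_path, "r", errors="ignore") as f:
                    if needle in f.read():
                        matches.append(full_path)
            except OSError:
                # unreadable file (permissions, locks): skip it
                continue
    return matches
```

Calling find_files_containing(r'C:\TEMP', 'John Smit') and printing each returned path would give the list of file names the question asks for.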
OK, I'm going to try this again; apologies for my poor effort in my previous question.
I am writing a program in Java and I wanted to move some files from one directory to another based on whether they appear in a list. I could do it manually, but there are thousands of files in the directory, so it would be an arduous task, and I need to repeat it several times! I tried to do it in Java but because I am using Java it appears I cannot use java.nio, and I am not allowed to use external libraries.
So I have tried to write something in python.
import os
import shutil
with open('files.txt', 'r') as f:
    myNames = [line.strip() for line in f]
    print myNames
dir_src = "trainfile"
dir_dst = "train"
for file in os.listdir(dir_src):
    print file # testing
    src_file = os.path.join(dir_src, file)
    dst_file = os.path.join(dir_dst, file)
    shutil.move(src_file, dst_file)
"files.txt" is in the format:
a.txt
edfs.txt
fdgdsf.txt
and so on.
So at the moment it is moving everything from train to trainfile, but I only need to move files if they are in the myNames list.
Does anyone have any suggestions?
Check whether the file name exists in the myNames list.
Put this check before shutil.move:
if file in myNames:
so at the moment it is moving everything from train to trainfile, but I need to only move files if they are in the myNames list
You can translate that "if they are in the myNames list" directly from English to Python:
if file in myNames:
    shutil.move(src_file, dst_file)
And that's it.
However, you probably want a set of names, rather than a list. It makes more conceptual sense. It's also more efficient, although the speed of looking things up in a list will probably be negligible compared to the cost of copying files around. Anyway, to do that, you need to change one more line:
myNames = {line.strip() for line in f}
And then you're done.
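Putting the pieces together, the whole thing might look roughly like the sketch below. It is written for Python 3 (the question's code is Python 2), and the function name move_listed_files is invented for the example. It assumes the list file holds one file name per line, as described in the question.

```python
import os
import shutil

def move_listed_files(list_path, dir_src, dir_dst):
    """Move files from dir_src to dir_dst if their names appear in list_path."""
    with open(list_path, "r") as f:
        # a set gives fast membership tests; blank lines are dropped
        wanted = {line.strip() for line in f if line.strip()}
    moved = []
    for name in os.listdir(dir_src):
        if name in wanted:
            shutil.move(os.path.join(dir_src, name), os.path.join(dir_dst, name))
            moved.append(name)
    # names listed in the file but absent from dir_src are simply ignored
    return sorted(moved)
```

With the names from the question, move_listed_files('files.txt', 'trainfile', 'train') would move only the listed files and return their names.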
So I'm trying to use glob.glob to go through a user-given path and find every .txt file in the folders beneath it. Below is what I currently have, but it doesn't seem to work.
fileName = raw_input("> ")
listOfTxt = []
for files in glob.glob(os.path.join(fileName, "\\Folder\\*\\*\\*.txt")):
    listOfTxt.append(files) # add it to the list
I'm not sure how to get this to work, or I'm just not understanding the glob.glob with os.path.
If you simply want a list of all .txt files in os.path.join(user_dir,'Folder'), then os.walk is probably a better way to go:
import os
user_dir = os.path.join(raw_input('> '), 'Folder')
file_list = []
# os.walk yields (dirpath, dirnames, filenames) for every directory under user_dir
for dirpath, _, filenames in os.walk(user_dir):
    for name in filenames:
        if name.endswith('.txt'):
            # dirpath already starts with user_dir
            file_list.append(os.path.join(dirpath, name))
You join the given name with a path, which is strange and probably not what you want.
You probably want something like this:
glob.glob(os.path.join("\\Folder\\*\\*\\",fileName+".txt"))
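One possible cause that the asker never confirmed is the leading backslash in the second argument to os.path.join. Under Windows path rules, a component that starts with a backslash is treated as rooted, so the directory part of the user-given path is thrown away. The sketch below uses ntpath (the Windows flavour of os.path) so that it behaves the same on any platform; the user_dir value is a made-up stand-in for the user's input.

```python
import ntpath  # Windows path rules, importable on any platform

user_dir = "C:\\Users\\me"  # hypothetical user input

# A component starting with a backslash is rooted: the earlier
# directory part is discarded and only the drive letter is kept.
rooted = ntpath.join(user_dir, "\\Folder\\*\\*\\*.txt")

# Without the leading backslash, the components are appended.
relative = ntpath.join(user_dir, "Folder", "*", "*", "*.txt")

print(rooted)
print(relative)
```

The first pattern comes out as C:\Folder\*\*\*.txt, which ignores the user's directory, while the second is C:\Users\me\Folder\*\*\*.txt.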
I'm rather new to Python, but I have been attempting to learn the basics.
Anyway, I have several files that, once I have extracted them from their zip files (a painfully slow process, by the way), produce several hundred subdirectories with 2-3 files in each. What I want to do is take all the files ending with 'dem.tif' and place them in a separate folder (move, not copy).
I may have jumped into the deep end here, but the code I've written runs without error, so it must not be finding the files (which do exist!), as it gives me the else statement. Here is the code I've created:
import os
src = 'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Extracted' # input
dst = 'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Analyses' # desired location
def move():
    for (dirpath, dirs, files) in os.walk(src):
        if files.endswith('dem.tif'):
            shutil.move(os.path.join(src,files),dst)
            print ('Moving ', + files, + ' to ', + dst)
        else:
            print 'No Such File Exists'
First, welcome to the community, and python! You might want to change your user name, especially if you frequent here. :)
I suggest the following (stolen from Mr. Beazley):
# genfind.py
#
# A function that generates files that match a given filename pattern
import os
import shutil
import fnmatch
def gen_find(filepat,top):
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist,filepat):
            yield os.path.join(path,name)
# Example use
if __name__ == '__main__':
    # raw strings keep the backslashes literal
    src = r'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Extracted' # input
    dst = r'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Analyses' # desired location
    filesToMove = gen_find("*dem.tif",src)
    for name in filesToMove:
        shutil.move(name, dst)
I think you've mixed up the way you should be using os.walk().
for dirpath, dirs, files in os.walk(src):
    print dirpath
    print dirs
    print files
    for filename in files:
        if filename.endswith('dem.tif'):
            shutil.move(...)
        else:
            ...
Update: the questioner has clarified below that he / she is actually calling the move function, which was the first point in my answer.
There are a few other things to consider:
You've got the order of elements returned in each tuple from os.walk wrong, I'm afraid - check the documentation for that function.
Assuming you've fixed that, also bear in mind that you need to iterate over files, and you need to os.path.join each of those to root, rather than src.
The above would be obvious, hopefully, if you print out the values returned by os.walk and comment out the rest of the code in that loop.
With code that does potentially destructive operations like moving files, I would always first try some code that just prints out the parameters to shutil.move until you're sure that it's right.
Any particular reason you need to do it in Python? Would a simple shell command not be simpler? If you're on a Unix-like system, or have access to Cygwin on Windows:
find src_dir -name "*dem.tif" -exec mv {} dst_dir \;
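If you do stay in Python, the advice about printing the arguments before actually moving anything could be sketched roughly as follows. The function name move_matching and its dry_run flag are invented for this example. Note that it assumes no two matching files share a name; if they did, they would collide in the destination folder.

```python
import os
import shutil

def move_matching(src, dst, suffix, dry_run=True):
    """Find files under src ending in suffix; move them to dst unless dry_run."""
    planned = []
    # collect every (source, target) pair before touching anything
    for dirpath, _dirnames, filenames in os.walk(src):
        for name in filenames:
            if name.endswith(suffix):
                planned.append((os.path.join(dirpath, name), os.path.join(dst, name)))
    for source, target in planned:
        if dry_run:
            # only report what would happen
            print("would move", source, "->", target)
        else:
            shutil.move(source, target)
    return planned
```

Run it once with the default dry_run=True, check the printed pairs, and only then call it again with dry_run=False.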
I am fairly new to Python and I am trying to figure out the most efficient way to count the number of .TIF files in a particular sub-directory.
Doing some searching, I found one example (I have not tested), which claimed to count all of the files in a directory:
file_count = sum((len(f) for _, _, f in os.walk(myPath)))
This is fine, but I need to only count TIF files. My directory will contain other files types, but I only want to count TIFs.
Currently I am using the following code:
tifCounter = 0
for root, dirs, files in os.walk(myPath):
    for file in files:
        if file.endswith('.tif'):
            tifCounter += 1
It works fine, but the looping seems to be excessive/expensive to me. Any way to do this more efficiently?
Thanks.
Something has to iterate over all files in the directory, and look at every single file name - whether that's your code or a library routine. So no matter what the specific solution, they will all have roughly the same cost.
If you think it's too much code, and if you don't actually need to search subdirectories recursively, you can use the glob module:
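import glob
import os
# glob.glob1 is undocumented, so build the pattern with os.path.join instead
tifCounter = len(glob.glob(os.path.join(myPath, "*.tif")))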
For this particular use case, if you don't want to recursively search in the subdirectory, you can use os.listdir:
len([f for f in os.listdir(myPath)
     if f.endswith('.tif') and os.path.isfile(os.path.join(myPath, f))])
Your code is fine.
Yes, you're going to need to loop over those files to filter out the .tif files, but looping over a small in-memory array is negligible compared to the work of scanning the file directory to find these files in the first place, which you have to do anyway.
I wouldn't worry about optimizing this code.
If you do need to search recursively, or for some other reason don't want to use the glob module, you could use
file_count = sum(len([f for f in fs if f.lower().endswith('.tif')]) for _, _, fs in os.walk(myPath))
This is the "Pythonic" way to adapt the example you found for your purposes. But it's not going to be significantly faster or more efficient than the loop you've been using; it's just a really compact syntax for more or less the same thing.
Try using fnmatch:
https://docs.python.org/2/library/fnmatch.html
import fnmatch,os
num_files = len(fnmatch.filter(os.listdir(your_dir),'*.tif'))
print(num_files)
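The recursive and non-recursive variants discussed in these answers could be folded into one helper, sketched below. The name count_files_with_suffix and its parameters are made up for this example, and the case-insensitive match is an assumption borrowed from the f.lower() answer (the question mentions both .TIF and .tif).

```python
import os

def count_files_with_suffix(top, suffix=".tif", recursive=True):
    """Count files under top whose names end with suffix, ignoring case."""
    suffix = suffix.lower()
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(top):
        total += sum(1 for name in filenames if name.lower().endswith(suffix))
        if not recursive:
            # the first tuple from os.walk is top itself, so stop there
            break
    return total
```

As the answers point out, this is no faster than the original loop; it only packages the same directory scan behind a single call.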