Matching specific file names - python

I have some code that looks for network script files, e.g. ifcfg-eth0. The code currently uses the match function available in Augeas to get all the files in the directory:
augeas.match("/files/etc/sysconfig/network-scripts/*")
However, this matches files such as ifcfg-eth0.bak, which is not valid for my needs. I want to match only the network scripts ranging from eth0 to eth7 (and no backup files etc.). What would be a good approach to match only the correct files?

I was able to meet my requirements using the following code:
files = []
for i in range(8):
    try:
        # match() returns a list; indexing an empty result raises IndexError
        filename = augeas.match('/files/etc/sysconfig/network-scripts/ifcfg-eth' + str(i))[0]
        files.append(filename)
    except IndexError:
        continue
print files

If you're absolutely sure you don't want any files that have an extension, you could try this:
augeas.match('/files/etc/sysconfig/network-scripts/*[regexp("[\w-]")]')
Edited to add quotes as mentioned below.
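If instead you want exactly ifcfg-eth0 through ifcfg-eth7, Augeas path expressions also support a regular-expression predicate on the node label, so one match call can cover all eight names. A minimal sketch, assuming an Augeas version that supports label() and =~ in predicates:
# Match only nodes whose label is ifcfg-eth0 .. ifcfg-eth7, which
# excludes backups such as ifcfg-eth0.bak ('augeas' is the question's
# existing Augeas handle).
files = augeas.match(
    '/files/etc/sysconfig/network-scripts/'
    '*[label() =~ regexp("ifcfg-eth[0-7]")]')
print files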


Missing a file with os.walk()

I'm trying to make a file listing tool for a colleague. The code is quite simple:
import os

source = r'C:\Users\Documents\test'
extension = '.txt'
file_list = []
lower_levels = False

for root, dirs, files in os.walk(source):
    for n in files:
        if n.endswith(extension):
            file_list.append(n)
    if not lower_levels:  # do not descend into lower levels
        break

writing_in_excel(source, file_list)  # output is an Excel file
When testing it on my test folder, it works fine; I get all 121 of my files listed in the output.
However, when my colleague tries it, one file is missing compared to the number of files reported by Windows (I verified: Windows indicates 39735 files with the right extension, against 39734 in the Excel file), and given the number of files it's hard to find out which one is missing.
The problem doesn't seem to come from the writing to Excel, since I write the total number of files with len(file_list) and can already see that the file is missing from the list. I guess it comes from the walking of the directory?
Does anyone know where the problem could come from?
Thanks
There's probably an error that os.walk suppresses by default. It can be handled by setting the onerror parameter. Write an error handler:
def walk_error(error):
    print(error.filename)
Then change your call to:
for root, dirs, files in os.walk(source, onerror=walk_error):
It looks like the problem came from the extension check: one of the files had its extension in capital letters.
So I just replaced
if n.endswith(extension):
with
ext = os.path.splitext(n)[-1].lower()  # current file's extension, lower-cased
if ext == extension:
And it works!
Thank you for your help.
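A more compact equivalent, assuming extension is lower case as in the question, is to lower-case the whole filename before the check:
if n.lower().endswith(extension):  # case-insensitive comparison
    file_list.append(n)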

prevent getfiles from seeing .DS and other hidden files

I am currently working on a Python project on my Mac, and from time to time I get unexpected errors because .DS or other files that are not visible to the non-root user are found in folders. I am using the following command
filenames = getfiles.getfiles(file_directory)
to retrieve information about the number and names of the files in a folder. So I was wondering if there is a possibility to prevent the getfiles command from seeing these types of files, for example by limiting its rights or the extensions it can see (all files are of .txt format).
Many thanks in advance!
In your case, I would recommend switching to the Python standard library module glob.
Since all your files are of .txt format, and assuming they are located in the directory /sample/directory/, you can use the following script to get the list of files you want:
import glob
filenames = glob.glob("/sample/directory/*.txt")
You can easily use wildcard patterns to match the files you want and filter out the files you do not need; the glob module documentation describes the pattern syntax in detail.
Keep in mind that glob patterns can express more complicated matching than the above example, such as globbing several extensions at once, to handle your future needs.
If you only want the basenames of those files, you can use the standard library module os to extract basenames from the full paths:
import os
file_basenames = [os.path.basename(full_path) for full_path in filenames]
There isn't an option to filter within getfiles, but you can filter the list afterwards.
Most likely you will want to skip all "dot files" ("system files", those with a leading .), which you can accomplish with code like the following:
filenames = [f for f in getfiles.getfiles(file_directory)
             if not os.path.basename(f).startswith('.')]
Welcome to Stack Overflow.
You might find the glob module useful. The glob.glob function takes a path including wildcards and returns a list of the filenames that match.
This would allow you either to select the files you want:
filenames = glob.glob(os.path.join(file_directory, "*.txt"))
or to select the files you don't want, and ignore them:
exclude_files = glob.glob(os.path.join(file_directory, ".*"))
for filename in getfiles.getfiles(file_directory):
    if filename in exclude_files:
        continue
    # process the file

Check if there are .format files in a directory

I have been trying to figure out for a while how to check whether there are .pkl files in a given directory. I searched the site and found ways to find out if there are files in a directory and list them, but I just want to check whether they are there.
In my directory there are a total of 7 .pkl files; as soon as I create one, the others are created too, so to check that all seven exist it is enough to check that one exists. Therefore, I would like to check whether there is any .pkl file at all.
This works if I do:
os.path.exists('folder1/folder2/filename.pkl')
But I had to write one of my file names. I would like to do this without searching for a specific file. I also tried
os.path.exists('folder1/folder2/*.pkl')
but that does not work either, since I don't have any file literally named *.pkl.
You can use the Python module glob (https://docs.python.org/3/library/glob.html).
Specifically, glob.glob('folder1/folder2/*.pkl') will return a list of all .pkl files in folder2.
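Since an empty list is falsy in Python, the same call doubles as an existence check. A minimal sketch:
import glob

# True as soon as at least one .pkl file matches the pattern
if glob.glob('folder1/folder2/*.pkl'):
    print('at least one .pkl file exists')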
You can use:
found = False
for dir_path, dir_names, file_names in os.walk(search_dir):
    # go over all files in this directory
    for file_name in file_names:
        if file_name.endswith(".pkl"):
            found = True
            break  # stop after the first one you find
    if found:
        break
Note: this searches the entire directory tree, subdirectories included.
If you only want to search a single directory, you can run the same "for" over os.listdir(path) instead.
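For the single-directory case, a short sketch using any() over os.listdir (search_dir is assumed to be the directory to check):
import os

# True if at least one name in the directory ends with .pkl
has_pkl = any(name.endswith(".pkl") for name in os.listdir(search_dir))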

Using a list to find and move specific files - python 2.7

I've seen a lot of people asking questions about searching through folders and creating a list of files, but I haven't found anything that has helped me do the opposite.
I have a csv file with a list of files and their extensions (xxx0.laz, xxx1.laz, xxx2.laz, etc.). I need to read through this list, search through a folder for those files, and then move them to another folder.
So far, I've taken the csv and created a list. At first I was having trouble with the list: each line had a "\n" at the end, so I removed those. Following the only other example I've found (How do I find and move certain files based on a list in excel?), I created a set from the list. However, I'm not really sure why, or whether I need it.
So here's what I have:
id = open('file.csv','r')
list = list(id)
list_final = ''.join([item.rstrip('\n') for item in list])
unique_identifiers = set(list_final)
os.chdir(r'working_dir')  # I set this as the folder to look through
destination_folder = 'folder_loc'  # folder to move files to
for identifier in unique_identifiers:
    for filename in glob.glob('%s_*' % identifier):
        shutil.move(filename, destination_folder)
I've been wondering about this ('%s_*' % identifier) pattern with the glob function. I haven't found any examples using it; perhaps that needs to be changed?
When I do all that, I don't get anything: no errors and no files actually moved.
Perhaps I'm going about this the wrong way, but that is the only thing I've found so far anywhere.
It's really not hard:
for fname in open("my_file.csv").read().split(","):
    shutil.move(fname.strip(), dest_dir)
You don't need a whole lot of things...
Also, if you just want all the *.laz files in a source directory, you don't need a csv at all:
for fname in glob.glob(os.path.join(src_dir, "*.laz")):
    shutil.move(fname, dest_dir)
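For what it's worth, the likely reason the question's code moved nothing is that ''.join(...) collapses all the lines into one string, so set(list_final) becomes a set of single characters rather than a set of filenames. A sketch that keeps the question's structure but builds the set from whole lines (working_dir and folder_loc are the question's placeholders):
import glob
import os
import shutil

with open('file.csv', 'r') as fh:
    # one identifier per line; strip the trailing newline from each
    unique_identifiers = set(line.rstrip('\n') for line in fh)

os.chdir(r'working_dir')
destination_folder = 'folder_loc'

for identifier in unique_identifiers:
    # '%s_*' only matches names with an underscore after the identifier;
    # if the csv already holds complete filenames, glob the name itself.
    for filename in glob.glob('%s_*' % identifier):
        shutil.move(filename, destination_folder)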

Python, can I get a return true or false, if a file TYPE exists or not?

I've read through the path.exists() and path.isdir() questions on here, but none that I've found so far deal with checking whether a particular file type exists in a directory or not... maybe I'm not searching for the correct terms.
Basically, I want to poll a set of folders to see if there are txt files there. If they are there, I will want to run a string of pexpect commands (each with different usernames/passwords) to put the *.txt files to a remote server location. I have my put and get pexpect script working already.
I tried using a wildcard such as this, but of course no such luck:
>>> print(os.path.exists("/mnt/path/to/shared/folder/*.txt"))
False
Instead of having 15 cron jobs doing a blind put *.txt every 5 minutes, I'd like to run just one script that checks all the folder locations: if txt files exist, do the pexpect job; if not, go to the next folder path and check again, etc.
glob seems to be the ticket. I tested it with this:
import glob
if next(glob.iglob("/path/to/files/*.txt"), None):
    print "there are txt files"
else:
    print "there are no text files"
$ python check.py
there are txt files
You want to use glob.
import glob

if glob.glob("/mnt/path/to/shared/folder/*.txt"):
    pass  # there are text files
else:
    pass  # no text files
glob.glob will return a list of files matching the wildcard pattern. If there are no files, it will return an empty list, which is falsy. This is really just os.listdir and fnmatch.filter put together.
If memory is an issue, use glob.iglob as 200OK suggests in the comments:
import glob

if next(glob.iglob("/mnt/path/to/shared/folder/*.txt"), None):
    pass  # there are text files
else:
    pass  # no text files
iglob builds an iterator instead of a list, which saves a great deal of memory when the directory contains many files.
If your problem is specifically to find files matching particular criteria, consider opening/reading a pipe to the program find.
As in:
find dir1 dir2 dir3 -name "*.txt"
That program has dozens of options for filtering based on the type of file (symlink, etc.) and should give you a lot of flexibility that might be easier than writing it yourself with various Python libraries.
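A hedged sketch of driving find from Python with subprocess (the folder path is the question's placeholder; capture_output needs Python 3.7+):
import subprocess

# Run find(1) and treat any output as "matching files exist".
result = subprocess.run(
    ["find", "/mnt/path/to/shared/folder", "-name", "*.txt"],
    capture_output=True, text=True, check=True)
has_txt = bool(result.stdout.strip())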
