Python: get a complete file name based on a partial file name

In a directory, there are two files that share most of their names:
my_file_0_1.txt
my_file_word_0_1.txt
I would like to open my_file_0_1.txt
I need to avoid specifying the exact filename, and instead need to search the directory for a filename that matches the partial string my_file_0.
From this answer here, and this one, I tried the following:
import numpy as np
import os, fnmatch, glob

def find(pattern, path):
    result = []
    for root, dirs, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                result.append(os.path.join(root, name))
    return result

if __name__ == '__main__':
    # filename = find('my_file_0*.txt', '/path/to/file')
    # print(filename)
    print(glob.glob('my_file_0' + '*' + '.txt'))
Neither of these would print the actual filename, for me to read in later using np.loadtxt.
How can I find and store the name of a file, based on the result of a string match?

glob.glob() matches paths relative to the current working directory, so if you are running the script from another directory, it will not find what you expect.
(You can check the current directory with os.getcwd().)
It should work with the line below:
print(glob.glob('path/to/search/my_file_0' + '*.txt'))
or
print(glob.glob(r'C:\path\to\search\my_file_0' + '*.txt'))  # for Windows
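To tie this back to the np.loadtxt goal from the question, a minimal sketch (the search directory here is just a placeholder) could store the first match and then load it:
import glob
import numpy as np

matches = glob.glob('/path/to/search/my_file_0*.txt')
if matches:
    filename = matches[0]        # keep the first matching file name
    data = np.loadtxt(filename)  # read it in later as planned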

A solution using os.listdir()
Could you not also use the os module to search through os.listdir()? So for instance:
import os

partialFileName = "my_file_0"
for f in os.listdir():
    if partialFileName == f[:len(partialFileName)]:
        print(f)
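A slightly more idiomatic variant of the same prefix check (a sketch under the same assumptions) uses str.startswith:
import os

partialFileName = "my_file_0"
for f in os.listdir():
    if f.startswith(partialFileName):  # same test, without slicing
        print(f)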

I just developed the approach below and was searching to see whether there was a better way when I came across your question. I think you may like this approach. I needed pretty much the same thing you are asking for and came up with this clean one-liner using a list comprehension, with the sure expectation that there would be only one file name matching my criteria. I modified my code to match your question.
import os

file_name = [n for n in os.listdir("C:/Your/Path") if 'my_file_0' in n][0]
print(file_name)
Now, if this is in a looping / repeated call situation, you can modify as below:
for i in range(1, 4):
    file = [n for n in os.listdir("C:/Your/Path") if f'my_file_{i}' in n][0]
    print(file)
or, probably more practically ...
def get_file_name_with_number(num):
    file = [n for n in os.listdir("C:/Your/Path") if f'my_file_{num}' in n][0]
    return file

print(get_file_name_with_number(0))
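One caveat with the [0] indexing: it raises an IndexError when nothing matches. A hedged variant (the directory path is still a placeholder) uses next() with a default instead:
import os

def get_file_name_with_number(num, path="C:/Your/Path"):
    # returns None instead of raising IndexError when there is no match
    return next((n for n in os.listdir(path) if f'my_file_{num}' in n), None)

print(get_file_name_with_number(0))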

Related

Searching for a File in all Subdirectories in Python

I seem to be stuck on some logic within my code and would appreciate any insight. My objective is to find an Excel file in two different subfolders. The user inputs the ID number in the terminal (which is the name of the root folder), and a file path is created with that ID. Now I'm not sure why my if statement isn't detecting the file in either folder.
If anyone can look at my code, it would be greatly appreciated.
import os

# ask user to input the ID
ID = input("Please enter folder ID: ")
# path of excel directory and use glob
path = "/Users/one/Downloads/" + str(ID) + "/"

for (dir, subdirs, files) in os.walk(path):
    if "Filey1_*.xlsx" in files:
        print("File Found:", os.path.join(dir, "Filey1_*.xlsx"))
To answer your question directly: the reason your if statement is not working is that the keyword "in" does not behave like glob or a regex, and the asterisk (*) you're including is not doing what you think it is doing. In fact, it's not really doing anything.
The result is that you're searching for a file called exactly "Filey1_*.xlsx" rather than a file that matches the glob pattern (* being a wildcard), which is presumably what you want.
What you could do is add this import at the top:
from pathlib import Path
and then replace your if statement with:
temp = Path(path).rglob("Filey1_*.xlsx")
temp = list(temp)
if len(temp) > 0:
    print("File Found:", temp[0])
The first line does a recursive glob search through all subfolders of path, and if it finds a file, the list length is larger than 0.
So the issue is with your if statement, as it searches for an exact "Filey1_*.xlsx" match in the file names.
You can try using something like this:
for (root, subdirs, files) in os.walk(path):
    for f in files:
        if "Filey1_" in f and ".xlsx" in f:
            print("File Found:", os.path.join(root, f))
I found a really simple solution to my own problem lol I'll share it with everyone!
files = glob.glob(path + "/**/Filey1_*.xlsx", recursive = True)

Recursively find and copy files from many folders

I have some file names in an array that I want to recursively search for in many folders.
An example of the filename array is ['A_010720_X.txt','B_120720_Y.txt']
An example of the folder structure is below, which I can also provide as arrays, e.g. ['A','B'] and ['2020-07-01','2020-07-12']. The "DL" remains the same for all.
C:\A\2020-07-01\DL
C:\B\2020-07-12\DL
etc
I have tried to use shutil, but it doesn't seem to work effectively for my requirement, as I can only pass in a full file name and not a wildcard. The shutil code below works, but only with an absolute file name and path and without wildcards; e.g. it will only give me A_010720_X.txt.
I believe the way to go would be glob or pathlib, which I have not used before and for which I cannot find good examples similar to my use case.
import os
import shutil

filenames_i_want = ['A_010720_X.txt', 'B_120720_Y.txt']
RootDir1 = r'C:\A\2020-07-01\DL'
TargetFolder = r'C:\ELK\LOGS\ATH\DEST'

for root, dirs, files in os.walk(os.path.normpath(RootDir1), topdown=False):
    for name in files:
        if name in filenames_i_want:
            print("Found")
            SourceFolder = os.path.join(root, name)
            shutil.copy2(SourceFolder, TargetFolder)
I think this should do what you need, assuming they are all .txt files.
import glob
import os
import shutil

filenames_i_want = ['A_010720_X.txt', 'B_120720_Y.txt']
TargetFolder = r'C:\ELK\LOGS\ATH\DEST'

all_files = []
for directory in ['A', 'B']:
    # extend() keeps all_files flat; append() would create a list of lists
    all_files.extend(glob.glob(r'C:\{}\*\DL\*.txt'.format(directory)))

for file in all_files:
    # glob returns full paths, so compare only the file name
    if os.path.basename(file) in filenames_i_want:
        shutil.copy2(file, TargetFolder)
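An alternative sketch (not the original answer) using pathlib, assuming the C:\A\...\DL layout described in the question, lets rglob handle the recursion:
from pathlib import Path
import shutil

filenames_i_want = {'A_010720_X.txt', 'B_120720_Y.txt'}
TargetFolder = Path(r'C:\ELK\LOGS\ATH\DEST')

for directory in ['A', 'B']:
    # rglob walks every date folder (and its DL subfolder) under C:\A and C:\B
    for p in Path('C:/', directory).rglob('*.txt'):
        if p.name in filenames_i_want:
            shutil.copy2(p, TargetFolder)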

Shutil find and remove files

I am trying to automate some work which is currently done by hand.
The aim is to find all the documents which have, for example, the number 408710 in their file name. Please note that the file name also includes other letters or figures; an example could be 2rsgf54087105f85sfr. The program should search for all the files which contain the combination 408710 and then move them into the right path.
I do know how to move files, but so far I am only able to move them by entering the exact file name. In that case I only get one file and not all the files with the mentioned combination. Of course, I do not know the exact file names in advance anyway.
Here the code for the stuff which is working:
import shutil
src = "C:/Users/Startklar/Desktop/Ausgangsordner"
dst = "C:/Users/Startklar/Desktop/Empfangsordner/Sven"
dst2 = "C:/Users/Startklar/Desktop/Empfangsordner/Gerald"
# move files
shutil.move(src=src + "/AA023300408710LFVI.docx", dst=dst)
shutil.move(src=src + "/BB023310187105ADIK.docx", dst=dst2)
If you just want to remove the files, you can do it like this using a regular expression (src is the source folder from your code above):
import os
import re

regexp = r'yourPattern.*\.docx$'
res = [f for f in os.listdir(src) if re.search(regexp, f)]
for f in res:
    print('Remove: ' + f)
    # join with the folder, otherwise os.remove looks in the current directory
    os.remove(os.path.join(src, f))
You will need to find a regular expression which only finds all the files you would like to remove.
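For the 408710 example from the question, a minimal sketch could look like this:
import os
import re

src = "C:/Users/Startklar/Desktop/Ausgangsordner"  # source folder from the question
# any .docx whose name contains the combination 408710
regexp = r'408710.*\.docx$'
print([f for f in os.listdir(src) if re.search(regexp, f)])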
If you instead want to move the files, like in your example, it looks like this (just guessing the regexps from your example):
import os
import re
import shutil

src = "C:/Users/Startklar/Desktop/Ausgangsordner"
filters = [["C:/Users/Startklar/Desktop/Empfangsordner/Sven", r'.*LFVI\.docx$'],
           ["C:/Users/Startklar/Desktop/Empfangsordner/Gerald", r'.*ADIK\.docx$']]

for f in os.listdir(src):
    for dst, regexp in filters:
        if re.search(regexp, f):
            shutil.move(src=os.path.join(src, f), dst=dst)

Creating subdirectories and sorting files based on filename PYTHON

I have a large directory with many part files with their revisions, I want to recursively create a new folder for each part, and then move all of the related files into that folder. I am trying to do this by isolating a 7 digit number which would be used as an identifier for the part, and all the related filenames would also include this number.
import os
import shutil
import csv
import glob
from fnmatch import fnmatch, filter
from os.path import isdir, join
from shutil import copytree, copy2, Error, copystat
from shutil import copytree, ignore_patterns

dirname = ' '
# pattern = '*???????*'

for root, dirs, files in os.walk(dirname):
    for fpath in files:
        print(fpath)
        if fpath[0:6].isdigit():
            matchdir = os.mkdir(os.path.join(os.path.dirname(fpath)))
            partnum = str(fpath[0:6])
            pattern = str(partnum)
            filematch = fnmatch(files, pattern)
            print(filematch)
            shutil.move(filematch, matchdir)
This is what I have so far; basically, I'm not sure how to get the original filename and use it as the matching pattern for the rest of the files. The original filename I want to use for this matching pattern is just a 7 digit number, and all of the related files may have other characters (REV-2, for example).
Don't overthink it
I think you're getting confused about what os.walk() gives you - recheck the docs. dirs and files are just lists of the names of the directories / files, not the full paths.
Here's my suggestion. Assuming that you're starting with a directory layout something like:
directory1
    1234567abc.txt
    1234567bcd.txt
    2234567abc.txt
    not-interesting.txt
And want to end with something like:
directory1
    1234567
        abc.txt
        bcd.txt
    2234567
        abc.txt
    not-interesting.txt
If that's correct, then there's no need to rematch the files in the directory, just operate on each file individually, and make the part directory only if it doesn't already exist. I would also use a regular expression to do this, so something like:
import os
import re
import shutil

for root, dirs, files in os.walk(dirname):
    for fname in files:
        # Match a string starting with 7 digits followed by everything else.
        # Capture each part in a group so we can access them later.
        match_object = re.match('([0-9]{7})(.*)$', fname)
        if match_object is None:
            # The regular expression did not match, ignore the file.
            continue
        # Form the new directory path using the number from the regular expression and the current root.
        new_dir = os.path.join(root, match_object.group(1))
        if not os.path.isdir(new_dir):
            os.mkdir(new_dir)
        new_file_path = os.path.join(new_dir, match_object.group(2))
        # Or, if you don't want to change the filename, use:
        # new_file_path = os.path.join(new_dir, fname)
        old_file_path = os.path.join(root, fname)
        shutil.move(old_file_path, new_file_path)
Note that I have:
Switched the sense of the condition: we continue the loop immediately if the file is not interesting. This is a useful pattern for making sure that your code does not get too heavily indented.
Changed the name of fpath to fname. This is because it's not a path but just the name of the file, so it's better to call it fname.
Please clarify the question if that's not what you meant!
[edit] to show how to copy the file without changing its name.

Simplest way to get the equivalent of "find ." in python?

What is the simplest way to get the full recursive list of files inside a folder with python? I know about os.walk(), but it seems overkill for just getting the unfiltered list of all files. Is it really the only option?
There's nothing preventing you from creating your own function:
import os

def listfiles(folder):
    for root, folders, files in os.walk(folder):
        for filename in folders + files:
            yield os.path.join(root, filename)
You can use it like so:
for filename in listfiles('/etc/'):
    print(filename)
os.walk() is not overkill by any means. It can generate your list of files and directories in a jiffy:
files = [os.path.join(dirpath, filename)
         for (dirpath, dirs, files) in os.walk('.')
         for filename in (dirs + files)]
You can turn this into a generator, to only process one path at a time and save on memory.
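For example, the same comprehension written as a generator expression yields one path at a time (a minimal sketch):
import os

paths = (os.path.join(dirpath, filename)
         for (dirpath, dirs, files) in os.walk('.')
         for filename in (dirs + files))

for p in paths:  # paths are produced lazily, one at a time
    print(p)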
You could also use the find program itself from Python by using sh
import sh
text_files = sh.find(".", "-iname", "*.txt")
Either that, or manually recurse with isdir() / isfile() and listdir(), or use subprocess.check_output() and call find .. Basically, os.walk() is the highest-level option, the semi-manual solution based on listdir() is slightly lower level, and if for some reason you want the same output find . would give you, you can make a system call with subprocess.
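A minimal sketch of the subprocess variant (find must be available on the system, and the output comes back as bytes):
import subprocess

output = subprocess.check_output(["find", "."])
for line in output.decode().splitlines():
    print(line)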
pathlib.Path.rglob is pretty simple: it lists the entire directory tree.
(The argument is a filepath search pattern; "*" means list everything.)
import pathlib

for path in pathlib.Path("directory_to_list/").rglob("*"):
    print(path)
os.walk() is hard to use, just kick it and use pathlib instead.
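Note that rglob("*") yields directories as well as files; if only files are wanted, a small filter does it (a sketch using the same placeholder directory):
import pathlib

for path in pathlib.Path("directory_to_list/").rglob("*"):
    if path.is_file():  # skip directories
        print(path)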
Here is a Python function mimicking the list.files function from the R language:
import pathlib

def list_files(path, pattern, full_names=False, recursive=True):
    if recursive:
        files = pathlib.Path(path).rglob(pattern)
    else:
        files = pathlib.Path(path).glob(pattern)
    if full_names:
        files = [str(f) for f in files]
    else:
        files = [f.name for f in files]
    return files
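For example (the directory and pattern here are only placeholders):
# all .txt files below the directory, returned as full paths
print(list_files("/path/to/your/dir", "*.txt", full_names=True))
# only the top level, file names only
print(list_files("/path/to/your/dir", "*.txt", recursive=False))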
import os

path = "path/to/your/dir"
for (path, dirs, files) in os.walk(path):
    print(files)
Is this overkill, or am I missing something?
