How to find a filename that contains a given string - python

I'm trying to search a directory for a text file whose name contains a keyword, then get the file's full name, using Python.
Say the keyword is 'file' and the text file in the directory is called 'newfile'.
I want to find the whole filename so that I can open the file.

import os
keyword = 'file'
for fname in os.listdir('directory/with/files'):
    if keyword in fname:
        print(fname, "has the keyword")

You could use fnmatch. From the documentation:
This example will print all file names in the current directory with the extension .txt:
import fnmatch
import os
for filename in os.listdir('.'):
    if fnmatch.fnmatch(filename, '*.txt'):
        print(filename)
For your example you would want fnmatch(filename, '*file*').
For example:
>>> from fnmatch import fnmatch
>>> fnmatch('newfile', '*file*')
True
>>> fnmatch('newfoal', '*file*')
False
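Since the goal is to open the file once its full name is known, the matching can be wrapped in a small helper that returns full paths. This is only a minimal sketch: the helper name find_matching_files is made up for illustration, and the case-insensitive comparison is an assumption, not something the question asked for.

```python
import os


def find_matching_files(directory, keyword):
    """Return full paths of regular files in `directory` whose name contains `keyword`.

    `find_matching_files` is a hypothetical helper name used for illustration.
    The comparison is case-insensitive (an assumption).
    """
    keyword = keyword.lower()
    matches = []
    for fname in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, fname)
        # Skip sub-directories; only regular files can be opened for reading.
        if os.path.isfile(full_path) and keyword in fname.lower():
            matches.append(full_path)
    return matches
```

Each returned path can be passed straight to open(), because it already includes the directory.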

From the shell, grep can locate files whose contents contain the word you are looking for (note that it searches file contents, not file names).
grep -r 'word' FOLDER
-r tells grep to search for 'word' recursively in all the files under FOLDER.

Related

Search through multiple files/dirs for string, then print content of text file

I'm trying to make a small script that will allow me to search through text files located in a specific directory and folders nested inside that one. I've managed to get it to list all files in that path, but can't seem to get it to search for a specific string in those files and then print the full text file.
Code:
import os
from os import listdir
from os.path import isfile, join
path = "<PATH>"
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith('.txt'):
            dfiles = str(file)
sTerm = input("Search: ")
for files in os.walk(path):
    for file in files:
        with open(dfiles) as f:
            if sTerm in f.read():
                print(f.read())
The first part is from a test I did to list all the files. Once that worked, I tried using the second part to search through all of them for a matching string and then print the full file if it finds one. There's probably an easier way for me to do this.
Here is a solution using pathlib (Python 3.5+, because Path.read_text() was added in 3.5):
from pathlib import Path
path = Path('/some/dir')
search_string = 'string'
for o in path.rglob('*.txt'):
    if o.is_file():
        text = o.read_text()
        if search_string in text:
            print(o)
            print(text)
The code above will look for all *.txt in path and its sub-directories, read the content of each file in text, search for search_string in text and, if it matches, print the file name and its contents.
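If you would rather keep the os.walk approach from the question, the two things to change are joining each file name with its root (os.walk yields bare names) and reading each file only once (a second f.read() returns an empty string). A rough sketch under those assumptions; the function name files_containing and the errors="replace" choice are mine, not part of the original code:

```python
import os


def files_containing(root_dir, search_string, extension=".txt"):
    """Yield (path, text) for each matching file under root_dir.

    `files_containing` is a hypothetical name used for illustration.
    """
    for root, _dirs, files in os.walk(root_dir):
        for name in files:
            if not name.endswith(extension):
                continue
            # os.walk yields bare file names, so join them with the current root.
            path = os.path.join(root, name)
            # errors="replace" is an assumption so undecodable bytes do not abort the scan.
            with open(path, encoding="utf-8", errors="replace") as f:
                text = f.read()  # read once and reuse the result
            if search_string in text:
                yield path, text
```

Looping over files_containing(path, sTerm) and printing each text gives the behaviour the question describes.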

Finding partial filenames in subdirectories

I'm trying to create a script that goes through multiple directories and subdirectories, finds a matching filename, and displays its path.
I was able to do this in a shell script with ease and got the desired output. I used this in the shell:
echo "enter the name of the movie."
read moviename
cd "D:\movies"
find -iname "*$moviename*" > result.txt
cat result.txt
for i in result.txt
do
    if [ "$(stat -c %s "$i")" -le 1 ]
    then
        echo "No such movie exists"
    fi
done
This is what I have in Python, and I'm getting nowhere.
import os.path
from os import path
print ('What\'s the name of the movie?')
name = input()
for root, dirs, files in os.walk('D:\movies'):
    for file in files:
        if os.path.isfile('D:\movies'+name):
            print(os.path.join(root, file))
        else:
            print('No such movie')
I want it to search for the filename case-insensitively and display the matches. I've tried so hard to do it.
import os
name = input("What's the name of the movie? ")
success = False
# Raw string so the backslash in the Windows path is taken literally
for root, dirs, files in os.walk(r'D:\movies'):
    for file in files:
        if name.lower() in file.lower():
            print(os.path.join(root, file))
            success = True
if not success:
    print('No such movie')
You don't need to import each part of os separately.
You can pass the prompt to input() instead of calling print first.
The test is basically asking 'if this string is in that string, print the path'; calling lower() on both sides makes it case-insensitive.
I added the success variable because otherwise it would print 'No such movie' for every file that doesn't match.
You may want to replace this check (which is meaningless, because it never compares file with anything):
if os.path.isfile('D:\movies'+name):
with:
if file.lower().find(name.lower()) != -1:
and have fun with the file list you're getting =)
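from pathlib import Path
MOVIES = Path(r'D:\movies')
def find_file(name):
    # Return the first file under MOVIES whose name contains `name` (case-insensitive)
    for path in MOVIES.rglob('*'):
        if path.is_file() and name.lower() in path.name.lower():
            break
    else:
        # The loop finished without a break, so nothing matched
        print('File not found.')
        path = None
    return path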
You could also look into the fuzzywuzzy library for fuzzy matching between file names and input name.
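If you would rather not install a third-party package, the standard library's difflib offers a rougher kind of approximate matching. To be clear, this swaps in difflib.get_close_matches in place of fuzzywuzzy, so the scores will differ from what that library would give. The function name closest_filenames and the default limit and cutoff values are assumptions for illustration.

```python
import difflib
import os


def closest_filenames(root_dir, query, limit=3, cutoff=0.6):
    """Return paths whose file names approximately match `query`, best match first.

    `closest_filenames`, `limit` and `cutoff` are hypothetical names and defaults.
    """
    names_to_paths = {}
    for root, _dirs, files in os.walk(root_dir):
        for name in files:
            # Compare in lower case so the match is case-insensitive.
            names_to_paths.setdefault(name.lower(), []).append(os.path.join(root, name))
    close = difflib.get_close_matches(
        query.lower(), list(names_to_paths), n=limit, cutoff=cutoff
    )
    return [path for name in close for path in names_to_paths[name]]
```

Note that this compares the query against whole file names, so it tolerates typos better than it tolerates partial names; for partial names the plain `in` test from the earlier answers is probably the better fit.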

How can I take a list of strings and find files whose names match a string in the list?

I have a list of 600+ numbers, and a directory of 50,000+ files. All of the files are named like this:
99574404682_0.jpg
99574404682_1.jpg
99574437307_0.gif
99574437307_1.gif
99574437307_2.gif
99574449752.jpg
99574457597.jpg
99581722007.gif
I want to copy any file that has a name that matches a number in the list, up to the underscore, and copy to a new directory.
For example if my list contains:
99574404682
99574449752
99581722007
Then the files:
99574404682_0.jpg
99574404682_1.jpg
99574449752.jpg
99581722007.gif
would be copied to a new directory. I am on a Mac using bash 3.2. I am thinking something like python is what I need to use because the list is too large for grep or find but I am not sure. Thanks!
You could iterate through both lists, keeping each file name that starts with one of the numbers:
files_lst = ['99574404682_0.jpg', '99574404682_1.jpg', '99574437307_0.gif', '99574437307_1.gif', '99574437307_2.gif', '99574449752.jpg', '99574457597.jpg', '99581722007.gif']
lst = [99574404682, 99574449752, 99581722007]
for x in files_lst:
    for y in lst:
        if x.startswith(str(y)):
            print(x)
# 99574404682_0.jpg
# 99574404682_1.jpg
# 99574449752.jpg
# 99581722007.gif
This gets all files that start with the numbers provided in lst.
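With 600+ numbers and 50,000+ files, a set lookup avoids the inner loop, and comparing the whole prefix (rather than using startswith) avoids accidentally matching a longer number that happens to begin with the same digits. A small sketch, assuming the ID is always the part of the name before the first underscore or dot; the function name matching_names is made up for illustration:

```python
import re


def matching_names(file_names, wanted_ids):
    """Return the file names whose leading ID is in `wanted_ids`.

    `matching_names` is a hypothetical helper name.
    """
    wanted = {str(i) for i in wanted_ids}  # set gives constant-time lookups
    selected = []
    for name in file_names:
        # Assumption: the ID is everything before the first "_" or ".".
        prefix = re.split(r"[_.]", name, maxsplit=1)[0]
        if prefix in wanted:
            selected.append(name)
    return selected


files_lst = ['99574404682_0.jpg', '99574404682_1.jpg', '99574437307_0.gif',
             '99574437307_1.gif', '99574437307_2.gif', '99574449752.jpg',
             '99574457597.jpg', '99581722007.gif']
lst = [99574404682, 99574449752, 99581722007]
print(matching_names(files_lst, lst))
```

The selected names can then be handed to shutil.copy one by one.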
You can use the os and shutil modules in Python:
import os
import shutil
Prepare a list containing the patterns to match, like
match_pattern=['99574404682','99574449752','99581722007']
then use os.listdir() to get a list of the file names in the source directory
files_in_source_dir=os.listdir(source_directory_path)
and finally copy the matching files
for file in files_in_source_dir:
    # drop the extension, then anything after the underscore, to get the number
    if file.split('.')[0].split('_')[0] in match_pattern:
        shutil.copyfile(os.path.join(source_directory_path, file), os.path.join(target_directory_path, file))
You can use shutil.copy() to copy your files over from a source to a destination.
from shutil import copy
from os import listdir
from os import makedirs
from os.path import exists
from os.path import join
from os.path import splitext
filenames = {'99574404682', '99574449752', '99581722007'}
src_path = "<SOURCE_DIR>"  # your files
dest_path = "<DEST_DIR>"  # where you want to put them
# make the destination if it doesn't exist
if not exists(dest_path):
    makedirs(dest_path)
# go over each file in src_path
for file in listdir(src_path):
    # If underscore in file
    if "_" in file:
        prefix, *_ = file.split("_")
    # otherwise treat as normal file
    else:
        prefix, _ = splitext(file)
    # only copy if prefix exists in the set above
    if prefix in filenames:
        # join with src_path so the copy works from any working directory
        copy(join(src_path, file), dest_path)
Which results in the following files in dest_path:
99574404682_0.jpg
99574404682_1.jpg
99574449752.jpg
99581722007.gif
I'm not really an expert in bash, but you can try something like this:
#!/bin/bash
declare -a arr=("99574404682" "99574449752" "99581722007")
## Example directories, you can change these
src_path="$PWD/*"
dest_path="$PWD/src"
if [ ! -d "$dest_path" ]; then
    mkdir "$dest_path"
fi
for f1 in $src_path; do
    filename=$(basename "$f1")
    prefix="${filename%.*}"
    IFS='_' read -r -a array <<< "$prefix"
    for f2 in "${arr[@]}"; do
        if [ "${array[0]}" == "$f2" ]; then
            cp "$f1" "$dest_path"
        fi
    done
done

Remove all files in a directory matching regular expression in Python

I have two files in the directory home/documents/ named 2018-06-rs.csv000 and 2018-06-rs.csv001. I want to remove both the files from the directory.
Following is my code:
import datetime
import os
now = datetime.datetime.now()
file_date = now.strftime("%Y-%m")
os.remove("/home/documents/"+file_date+"-rs.csv*")
The error I'm getting is :
OSError: [Errno 2] No such file or directory: '/home/documents/201806-rs.csv*'
Listing the above path directs to the actual file though.
ls /home/documents/201806-rs.csv*
Appreciate any feedback.
Try this:
import os, re
def purge(dir, pattern):
    for f in os.listdir(dir):
        if re.search(pattern, f):
            os.remove(os.path.join(dir, f))
Make sure dir is the correct path to the directory that contains your files, and pattern is a valid regex.
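Note that the pattern in the question, ending in `-rs.csv*`, is a shell-style wildcard rather than a regular expression, and os.remove() takes a literal path without expanding wildcards, which is why it fails there. If you want to keep the wildcard form, the glob module can expand it first. A minimal sketch; the function name remove_matching is made up for illustration:

```python
import glob
import os


def remove_matching(directory, glob_pattern):
    """Delete regular files in `directory` matching a shell-style pattern.

    `remove_matching` is a hypothetical helper name. Returns the removed paths.
    """
    removed = []
    # glob expands the wildcard the way the shell does for `ls`.
    for path in glob.glob(os.path.join(directory, glob_pattern)):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return sorted(removed)
```

For the question's case this would be called along the lines of remove_matching("/home/documents", file_date + "-rs.csv*").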

How to check if a file exists without looking at its extension name in Python?

A file can have any of several extensions, but the name of the file remains the same.
I have tried
import os.path
os.path.isfile("Filename")
but this code also takes the file's extension into account.
This lists all files with the same name but different extensions.
import glob
print(glob.glob("E:\\Logs\\Filename.*"))
You could use this check instead.
import glob
if glob.glob("E:\\Logs\\Filename.*"):
    print("Found")
Try this.
import os
def check_file(dir, prefix):
    for s in os.listdir(dir):
        if os.path.splitext(s)[0] == prefix and os.path.isfile(os.path.join(dir, s)):
            return True
    return False
You can call this function like, e.g., check_file("/path/to/dir", "my_file") to search for files of the form /path/to/dir/my_file.*.
You can also use fnmatch:
import os
import fnmatch
print(list(filter(lambda f: fnmatch.fnmatch(f, "Filename.*"), os.listdir(FilePath))))
Here there is no need to format FilePath; you can simply write something like 'C:\Python27\python.*'.
This will get all the files with the basename you want (in this case 'tmp'), with or without an extension, and will exclude names that merely start with your basename, like tmp_tmp.txt:
import re
import os
basename = 'tmp'
# re.escape guards against regex metacharacters in the basename
pattern = re.escape(basename) + r"(\..*)?$"
for filename in os.listdir('.'):
    if re.match(pattern, filename):
        print("this file: %s matches my basename" % filename)
Or of course if you prefer them in a list, more succinctly:
[fn for fn in os.listdir('.') if re.match(pattern, fn)]
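The same check can be written without a regular expression by comparing names directly, which sidesteps escaping issues altogether. A sketch using pathlib, assuming "same basename" means the name is either exactly the basename or the basename followed by a dot; the function name find_by_basename is hypothetical:

```python
from pathlib import Path


def find_by_basename(directory, basename):
    """Return files in `directory` named `basename` or `basename.<anything>`.

    `find_by_basename` is a hypothetical helper name used for illustration.
    """
    return sorted(
        p for p in Path(directory).iterdir()
        # Requiring the dot keeps names like tmp_tmp.txt out of the result.
        if p.is_file() and (p.name == basename or p.name.startswith(basename + "."))
    )
```

An empty result means no file with that basename exists, so `if find_by_basename(".", "tmp"):` works as an existence check.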
