Check if any part of string is contained within Text File - python

Here is my code:
for r, d, f in os.walk(path):
    for folder in d:
        folders.append(os.path.join(r, folder))
for f in folders:
    with open('search.txt') as file:
        contents = file.read()
    if f in contents:
        print ('word found')
I'm having the script walk a given directory and compare the path names against a text file full of virus definitions. I'm trying to get the script to recognise whether the name of a file is contained in that text file.
The problem, as you can see from the code, is that it only works on a complete match, and because it uses the whole path as the string (for example, "C:/test/virus.bat") it will never match.
Is it possible to adjust this code so that any part of the path can be matched against the text file?
Not sure if that makes sense. Any suggestions are welcome, or please say if it's not clear.
EDIT:
To be more clear, here is a logic version of what I am trying to achieve:
List all Files in Directory
Get file name within path name ("virus" within "C:/test/virus")
Check if file name is contained within Text File

You can use the following to get all files in a given path:
import os

files = []
for r, d, f in os.walk(path):
    # f holds the names of the files directly inside directory r
    for file in f:
        files.append(os.path.join(r, file).replace('\\', '/'))
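Then slightly modify your code to achieve the result. You want to check whether the content of the text file is in the given file path, not vice versa, so use if content in f. Note that this treats the whole text file as a single search term.
for f in files:
    with open('search.txt', 'r') as file:
        # strip surrounding whitespace so a trailing newline does not block the match
        content = file.read().strip()
    if content in f:
        print('Found')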
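If search.txt holds several definitions, one per line (an assumption, since the question does not show the file's format), the three steps from the question's EDIT could be sketched as follows. The helper names load_definitions and find_matches are hypothetical, and the comparison is a case-insensitive exact match on the file name without its extension.

```python
import os


def load_definitions(definitions_path):
    """Return the non-empty lines of the definitions file as a set of lowercase terms."""
    with open(definitions_path) as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def find_matches(top, definitions):
    """Yield paths under `top` whose file name (without extension) is a known definition."""
    for root, _dirs, names in os.walk(top):
        for name in names:
            # "C:/test/virus.bat" -> "virus"
            stem = os.path.splitext(name)[0].lower()
            if stem in definitions:
                yield os.path.join(root, name)
```

Reading the definitions once into a set avoids reopening search.txt for every path, and set membership keeps each lookup cheap. A call might look like list(find_matches(path, load_definitions('search.txt'))).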

Related

How to add specific files from a series of folders to an array?

So far I've managed to compile all of the files from a series of folders using the following:
path = r'C:\Users\keefr\Documents\Data\Pulse Characterisation\sample 7'
subfolders = [f.path for f in os.scandir(path) if f.is_dir()]
for sub in subfolders:
    for f in os.listdir(sub):
        print(f)
files = [i for i in f if os.path.isfile(os.path.join(f,'*.txt')) and 'data' in f]
Where f prints out the names of all of the files. What I want to do is take only certain files from this (starts with 'data' and is a .txt file) and put these in an array called files. The last line in the above code is where I tried to do this but whenever I print files it's still an empty array. Any ideas where I'm going wrong and how to fix it?
Update
I've made some progress. I changed the last line to:
if os.path.isfile(os.path.join(sub,f)) and 'data' in f:
    files.append(f)
So I now have an array with the correct file names. The problem now is that there's a mix of .meta, .index and .txt files and I only want the .txt files. What's the best way to filter out the other types of files?
I would probably do it like this. Since f is the file name and is a string, you can use the string methods startswith() and endswith() to match your criteria of starting with data and ending with .txt. If we find such a file, we append it to file_list. If you want the full path in file_list, I trust you are able to make that modification.
import os
path = r'C:\Users\keefr\Documents\Data\Pulse Characterisation\sample 7'
subfolders = [f.path for f in os.scandir(path) if f.is_dir()]
file_list = []
for sub in subfolders:
    for f in os.listdir(sub):
        if f.startswith("data") and f.endswith(".txt"):
            file_list.append(f)
print(file_list)
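For the full-path variant mentioned in the answer, one possible sketch uses pathlib, where the pattern data*.txt expresses both conditions at once. The function name find_data_files is hypothetical, and the code assumes the files sit directly inside the immediate subfolders, as in the question.

```python
from pathlib import Path


def find_data_files(base):
    """Return full paths of data*.txt files in the immediate subfolders of `base`."""
    matches = []
    for sub in sorted(Path(base).iterdir()):
        if sub.is_dir():
            # glob without '**' only looks inside `sub`, not deeper
            matches.extend(sorted(p for p in sub.glob("data*.txt") if p.is_file()))
    return matches
```

Each entry is a Path object; wrap it in str() if a plain string is needed.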

get files from a directory that starts with specific characters

I have this path: c:\testfolder\myfolder\
Inside this folder I have these folders:
gr432d4dr
fr34q2sf4fd
grdddxwes
pl34scc
us4352fc
us4245
gr00mis
us994k
These folders contain cpp files.
What I want is to search ONLY the folders whose names start with gr or us.
Here is my code, which tries to search the specific folders:
def search():
    for filename in Path('c:\testfolder\myfolder\').glob('**/*.cpp') :
        with open(filename) as f:
            if "gr" or "us" in f:
                #do something
I put in a print to check, but it seems it's still checking all the folders.
f is not the file name; it is the file object, which gives you access to the file's content. You must check the file name before opening the file.
import os
from pathlib import Path

def search():
    # no trailing backslash: a raw string literal cannot end with one
    for filename in Path(r'c:\testfolder\myfolder').glob('**/*.cpp'):
        if "gr" in os.fspath(filename) or "us" in os.fspath(filename):
            with open(filename) as f:
                pass  # do something
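You need os.fspath to get a string back from a Path object so that the in operator can search for a substring.
Also note the if condition I've edited. The original, if "gr" or "us" in f, is always true: it is evaluated as "gr" or ("us" in f), and a non-empty string such as "gr" is truthy. Each substring needs its own in test. Be aware that this condition matches gr or us anywhere in the path, not only at the start of a folder name.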
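To restrict the search to folders whose names actually start with gr or us, one option is to filter the direct subfolders first and only then look for .cpp files. This is a sketch under the assumption that the prefixed folders sit directly below the base path, as in the question; the function name cpp_files_in_prefixed_folders is hypothetical.

```python
from pathlib import Path


def cpp_files_in_prefixed_folders(base, prefixes=("gr", "us")):
    """Yield .cpp files under direct subfolders of `base` whose names start with a prefix."""
    base = Path(base)
    for folder in sorted(base.iterdir()):
        # str.startswith accepts a tuple and is True if any prefix matches
        if folder.is_dir() and folder.name.startswith(tuple(prefixes)):
            # rglob also descends into nested subfolders of the matching folder
            yield from sorted(folder.rglob("*.cpp"))
```

Folders such as pl34scc or fr34q2sf4fd are never opened, so none of their files are visited.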

Searching a folder structure and modifying XML files using Python

I am trying to create a python script that will iterate through a folder structure, find folders named 'bravo', and modify the xml files contained within them.
In the xml files, I want to modify the 'location' attribute of a tag, called 'file'. Such as:
<file location="e:\one\two"/>
I just need to change the drive letter of the file path from ‘e’ to ‘f’. So that it will read:
<file location="f:\one\two"/>
However...
The names of these xml files are unique, so I cannot search for an exact xml file name. Instead I am searching by the xml file type.
Also, there are other xml files in my folder structure, without the ‘file’ tag reference, that I wish to ignore.
The only constant is that the xml files I want to modify are all stored in folders named, ‘bravo’.
I also wish to create a log file that lists all the xml files and their filepaths which have successfully been updated (and preferably the ones that failed).
Using answers to similar questions on this site, I have cobbled together the following script.
In its current state, the script tries to modify every xml file it finds. I have not been able to successfully add code that only searches folders called 'bravo'.
When the script tries to modify an xml file that is not in a 'bravo' folder, it raises an error, because these files do not contain a 'file' tag.
Please could someone help me to correct my script (or create a new one).
Here is an example of the folder structure:
[Image: My folder structure]
And my script so far...
from xml.dom import minidom
import os
# enter the directory where to start search for xml files...
for root, dirs, files in os.walk("c:/temp"):
    for file in files:
        #search for xml files...
        if file.endswith(".xml"):
            xml_file = file
            xmldoc = minidom.parse(os.path.join(root, xml_file))
            # in the xml file look for tag called "file"...
            file_location = xmldoc.getElementsByTagName("file")
            # i don't understand the next line of code, but it's needed
            file_location = file_location[0]
            # 'location_string' is a variable for the 'location' path of the file tag in the xml document
            location_string = (file_location.attributes["location"].value)
            # the new drive letter is added to the location_string to create 'new_location'
            new_location = "f" + location_string[1:]
            # replace the 'location' value of the file tag with the new location...
            file_location.attributes["location"].value = new_location
            # write the change to the original file
            with open((os.path.join(root, xml_file)),'w') as f:
                f.write(xmldoc.toxml())
            print "%s has been updated!" % (os.path.join(root, xml_file))
            # add updated file name to log...
            log_file = open("filepath_update_log.txt", "a")
            log_file.write("%s\n" % (os.path.join(root, xml_file)))
            log_file.close
Test whether the directory name matches before your second loop. You first have to get the last directory in the path, as in: How to get only the last part of a path in Python?
if os.path.basename(os.path.normpath(root)) == "bravo":
You could use the logging module (https://docs.python.org/3/library/logging.html) for logging.
If you only want to replace a single letter, you may be able to replace it directly instead of parsing the xml, as suggested in: https://stackoverflow.com/a/17548459/7062162
def inplace_change(filename, old_string, new_string):
    # Safely read the input filename using 'with'
    with open(filename) as f:
        s = f.read()
        if old_string not in s:
            print('"{old_string}" not found in {filename}.'.format(**locals()))
            return
    # Safely write the changed content, if found in the file
    with open(filename, 'w') as f:
        print('Changing "{old_string}" to "{new_string}" in {filename}'.format(**locals()))
        s = s.replace(old_string, new_string)
        f.write(s)
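Putting the pieces together, the 'bravo' check, the attribute change, and the success/failure log could be sketched as below. This is one possible arrangement rather than the asker's method: it is Python 3, swaps minidom for xml.etree.ElementTree, and the function name update_bravo_xml and its parameters are hypothetical. Be aware that ElementTree drops comments and may change formatting details when it rewrites a file.

```python
import os
import xml.etree.ElementTree as ET


def update_bravo_xml(start_dir, log_path, new_drive="f"):
    """Change the drive letter of <file location="..."> in XML files inside 'bravo' folders."""
    updated, failed = [], []
    for root, _dirs, files in os.walk(start_dir):
        # Skip every directory whose own name is not 'bravo'
        if os.path.basename(os.path.normpath(root)) != "bravo":
            continue
        for name in files:
            if not name.endswith(".xml"):
                continue
            xml_path = os.path.join(root, name)
            try:
                tree = ET.parse(xml_path)
                elements = [e for e in tree.iter("file") if "location" in e.attrib]
                if not elements:
                    raise ValueError("no <file location=...> element")
                for elem in elements:
                    # Keep everything after the first character (the old drive letter)
                    elem.set("location", new_drive + elem.get("location")[1:])
                tree.write(xml_path)
                updated.append(xml_path)
            except (ET.ParseError, ValueError, OSError) as exc:
                failed.append((xml_path, str(exc)))
    # Append one line per processed file to the log
    with open(log_path, "a") as log_file:
        for path in updated:
            log_file.write("updated: %s\n" % path)
        for path, reason in failed:
            log_file.write("failed: %s (%s)\n" % (path, reason))
    return updated, failed
```

A call such as update_bravo_xml("c:/temp", "filepath_update_log.txt") would mirror the paths used in the question.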

Adding actual files to a list, instead of just the file's string name

I am having issues reading the contents of the files I am trying to open due to the fact that python believes there is:
"No such file or directory: 'Filename.xrf'"
Here is an outline of my code and what I think the problem may be:
The user's input defines the path to where the files are.
direct = str(raw_input("Enter directory name where your data is: "))
path = "/Users/myname/Desktop/heasoft/XRF_data/%s/40_keV" \
    %(direct)
print os.listdir(path)
# This lists the correct contents of the directory that I wanted it to.
So here I essentially let the user decide which directory they want to manipulate and then I choose one more directory path named "40_keV".
Within a defined function I use the OS module to navigate to the corresponding directory and then append every file within the 40_keV directory to a list, named dataFiles.
def Spectrumdivide():
    dataFiles = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith('.xrf'):
                dataFiles.append(file)
Here, the correct files were appended to the list 'dataFiles', but I think this may be where the problem is occurring. I'm not sure whether or not Python is adding the NAME of the file to my list instead of the actual file object.
The code breaks because python believes there is no such file or directory.
for filename in dataFiles:
    print filename
    f = open(filename,'r') # <- THE CODE BREAKS HERE
    print "Opening file: " + filename
    line_num = f.readlines()
Again, the correct file is printed from dataFiles[0] in the first iteration of the loop but then this common error is produced:
IOError: [Errno 2] No such file or directory: '40keV_1.xrf'
I'm using an Anaconda launcher to run Spyder (Python 2.7) and the files are text files containing two columns of equal length. The goal is to assign each column to a list and then manipulate them accordingly.
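You need to append the full path, not just the file's name, using the os.path.join function. open() resolves a bare file name relative to the current working directory, which is not the directory that os.walk found the file in.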
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith('.xrf'):
            dataFiles.append(os.path.join(root, file))
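With full paths in the list, each file can be opened from any working directory. The sketch below is written for Python 3 and shows the collection step together with the stated goal of putting each column in its own list. It assumes whitespace-separated numeric columns with no header line, which the question does not confirm; the function names collect_xrf_files and read_two_columns are hypothetical.

```python
import os


def collect_xrf_files(top):
    """Return the full path of every .xrf file found under `top`."""
    data_files = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            if name.endswith(".xrf"):
                data_files.append(os.path.join(root, name))
    return data_files


def read_two_columns(file_path):
    """Read a whitespace-separated two-column text file into two lists of floats."""
    first, second = [], []
    with open(file_path) as handle:
        for line in handle:
            parts = line.split()
            # Skip blank or incomplete lines
            if len(parts) >= 2:
                first.append(float(parts[0]))
                second.append(float(parts[1]))
    return first, second
```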

How to confirm only html files exist in a given folder and if not then how to prompt the user to specify a folder with only html files within

I would like to create a python script that will do 3 things: 1) Take user input to navigate to a file directory 2) Confirm the file contents (a particular set of files needs to be in the folder for the script to proceed) 3) Do a Find and Replace
The Code as of now:
import os, time
from os.path import walk
mydictionary = {"</i>":"</em>"}
for (path, dirs, files) in os.walk(raw_input('Copy and Paste Course Directory Here: ')):
    for f in files:
        if f.endswith('.html'):
            filepath = os.path.join(path,f)
            s = open(filepath).read()
            for k, v in mydictionary.iteritems():  # terms for a dictionary file and replace
                s = s.replace(k, v)
            f = open(filepath, 'w')
            f.write(s)
            f.close()
Now I have parts 1 and 3; I just need part 2.
For part 2, I need to confirm that only html files exist in the directory the user specifies; otherwise the script should prompt the user to enter the correct directory (one that contains only html files).
Thanks
From what I understand, here's your pseudocode:
Ask user for directory
If all files in that directory are .html files:
    Do the search-and-replace stuff on the files
Else:
    Warn and repeat from start
I don't think you actually want a recursive walk here, so first I'll write that with a flat listing:
while True:
    dir = raw_input('Copy and Paste Course Directory Here: ')
    files = os.listdir(dir)
    if all(file.endswith('.html') for file in files):
        # do the search and replace stuff
        break
    else:
        print 'Sorry, there are non-HTML files here. Try again.'
Except for having to translate the "repeat from start" into a while True loop with a break, this is almost a word-for-word translation from the English pseudocode.
If you do need the recursive walk through subdirectories, you probably don't want to write the all as a one-liner. It's not that hard to write "all members of the third member of any member of the os.walk result end with '.html'", but it will be hard to read. But if you turn that English description into something more understandable, you should be able to see how to turn it directly into code.
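For reference, one way to spell out the recursive check is a generator expression with two for clauses, wrapped in a small named function so the intent stays readable. This is a sketch; the name only_html_files is hypothetical. Note that all() returns True for an empty sequence, so a directory tree with no files at all passes the check.

```python
import os


def only_html_files(top):
    """Return True if every file under `top`, including subdirectories, ends with .html."""
    return all(
        name.endswith(".html")
        # the third member of each os.walk result is the list of file names
        for _root, _dirs, files in os.walk(top)
        for name in files
    )
```

Inside the while True loop, the flat all(...) test can then be replaced by a call to only_html_files(dir).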
