XML file edit with Python script [duplicate] - python

This question already has answers here:
Find and Replace Values in XML using Python
(4 answers)
Closed 6 years ago.
I need to write a Python script that reads and replaces some data in an XML file.
The data that is replaced has to be read automatically from a directory (it's a file's name)
<setting name="abc" serializeAs="String">
<value>fw.version.1.1</value>
The value fw.version.1.1 has to be replaced with the name of a file from a folder.
Could use some help:)
thanks,
Robert

Assuming the XML file, test.xml, looks something like this:
<someXml>
<setting name="abc" serializeAs="String"/>
<value>fw.version.1.1</value>
</someXml>
To read the XML data from the file:
from lxml import etree
parser = etree.XMLParser(remove_blank_text=True)
xmlData = etree.parse('test.xml', parser)
Reading the text from the value tag:
xmlData.xpath('//value')[0].text
Writing new text to the value tag:
xmlData.xpath('//value')[0].text = "test"
And finally, write your changes to the same (or any other) file:
xmlData.write('test.xml', pretty_print=True)
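The answer above sets a fixed string, while the question asks for the name of a file in a folder. A minimal sketch of that missing step follows. It swaps lxml for the standard library's xml.etree.ElementTree so it runs without extra installs. The function name replace_value_with_filename and the rule "take the first file when sorted by name" are assumptions, not something the question specifies.

```python
import os
import xml.etree.ElementTree as ET


def replace_value_with_filename(xml_path, folder):
    """Set the first <value> element's text to a file name found in folder.

    Assumption: the wanted file is the first regular file when the folder's
    entries are sorted by name. Adjust the selection rule to your needs.
    """
    names = sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )
    if not names:
        raise FileNotFoundError(f"no files found in {folder!r}")

    tree = ET.parse(xml_path)
    value = tree.getroot().find(".//value")  # first <value> below the root
    if value is None:
        raise ValueError("no <value> element found")

    value.text = names[0]
    tree.write(xml_path, encoding="utf-8", xml_declaration=True)
    return names[0]
```

Calling replace_value_with_filename('test.xml', 'some_folder') rewrites test.xml in place and returns the name it used. Keep the XML file outside the scanned folder, otherwise it may be picked up itself.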

Related

XML to .txt using python

I have written code based on a txt file but in production it will be an XML file. I need to be able to iterate over every single character in an XML file as if it were a standard text file.
I have the below as an XML file:
<?xml version="1.0" encoding="UTF-8"?>
<openBranchData>
    <phoneSwMngmtEnabled/>
    <phoneSwMngmtData>
        <imgProvisioningEnabled/>
        <startTime>02:00</startTime>
        <stopTime>06:00</stopTime>
        <centralPhoneSwServer/>
        <maxParallelAccess>3</maxParallelAccess>
        <phoneSwPullingEnabled/>
    </phoneSwMngmtData>
    <voiceMailEnabled/>
I've tried the below python code.
with open('osb_file.xml') as test:
    testing = test.read()
    for x in testing:
        print(x)
Do I need to convert to a txt file or is there a simpler way? It's been a while since I've looked at this project so apologies if I'm missing anything very obvious.
As Robert Harvey said, XML files are already text files. You can read them just as you would read a standard text file in Python, using open(filename, 'r'). No conversion to .txt is needed.
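As a small sketch of that idea, the generator below walks an XML file one character at a time without loading the whole file into memory. The name iter_chars and the default UTF-8 encoding are assumptions. The encoding matches the declaration in the sample file above but may differ for other files.

```python
def iter_chars(path, encoding="utf-8"):
    """Yield every character of a text file (XML included), one at a time."""
    with open(path, "r", encoding=encoding) as handle:
        for line in handle:       # read line by line to keep memory use low
            yield from line       # a str iterates character by character
```

For example, "for x in iter_chars('osb_file.xml'): print(x)" behaves like the loop in the question, including the newline characters.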

Can not save xml file using minidom [duplicate]

This question already has answers here:
Troubles while parsing with python very large xml file
(3 answers)
Closed 4 years ago.
I tried to modify and save an XML file using minidom in Python.
Everything works well except for one specific file, which I can read but cannot write back.
Code that I use to save xml file:
domXMLFile = minidom.parse(dom_document_filename)
# some modification
F = open(dom_document_filename, "w")
domXMLFile.writexml(F)
F.close()
My questions are:
Is it true that minidom cannot handle large files (714 KB)?
How do I solve my problem?
In my opinion, lxml is way better than minidom for handling XML. If you have it, here is how to use it:
from lxml import etree
root = etree.parse('path/file.xml')
# some changes to root
# etree.tostring() returns bytes, so open the file in binary mode
with open('path/file.xml', 'wb') as f:
    f.write(etree.tostring(root, pretty_print=True))
If not, you could use pdb to debug your code. Just write import pdb; pdb.set_trace() in your code where you want a breakpoint; when you run your function in a shell, execution should stop at that line. It may give you a better view of what is not working.
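The question does not show the error message, so the real cause is unknown. One possibility, and it is only an assumption, is that the problem file contains non-ASCII characters that the platform's default text encoding cannot write. If that is the case, a sketch like the following, which stays with minidom and names the encoding explicitly, may help. The helper name save_dom is hypothetical.

```python
from xml.dom import minidom


def save_dom(dom, filename, encoding="utf-8"):
    """Write a minidom Document to filename with an explicit encoding.

    Assumption: the original failure came from the default file encoding,
    not from the file size.
    """
    # Open the file with the same encoding that is announced in the
    # XML declaration, and close it reliably via the with-statement.
    with open(filename, "w", encoding=encoding) as handle:
        dom.writexml(handle, encoding=encoding)
```

Used as save_dom(domXMLFile, dom_document_filename) after the modifications. If the write still fails, the traceback from this call should narrow down the cause.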

Python--Export Parsed XML to txt file using [duplicate]

This question already has an answer here:
How do I write all of these rows into a CSV file for a given range?
(1 answer)
Closed 6 years ago.
I'm parsing text from an XML file. Parsing works well, and I can print the results in full, but when I try to write the text into a text document, all I get in the document is the last item.
from bs4 import BeautifulSoup
import urllib.request
import sys
req = urllib.request.urlopen('file:///C:/Users/John/Desktop/Dow%20Jones/compaq%20neg%201.xml')
xml = BeautifulSoup(req, 'xml')
for item in xml.findAll('paragraph'):
    sys.stdout = open('CN1.txt', 'w')
    print(item.text)
    sys.stdout.close()
What am I missing here?
It looks like you are opening the file on every pass through the loop, which I am surprised it let you do. Each time, the file is opened in write mode, which wipes out everything that was written on the previous pass, so only the last item survives.
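The fix is to open the output file once, before the loop, and write every item to it. Here is a minimal sketch of that pattern. To keep it free of third-party dependencies it parses with the standard library's xml.etree.ElementTree instead of BeautifulSoup, and the function name write_paragraphs is an assumption. The same structure works with xml.findAll('paragraph') and item.text.

```python
import xml.etree.ElementTree as ET


def write_paragraphs(xml_path, out_path):
    """Write the text of every <paragraph> element to out_path, one per line."""
    tree = ET.parse(xml_path)
    count = 0
    # Open the output file ONCE; opening it inside the loop in 'w' mode
    # would truncate it on every iteration.
    with open(out_path, "w", encoding="utf-8") as out:
        for item in tree.iter("paragraph"):
            out.write("".join(item.itertext()) + "\n")
            count += 1
    return count
```

Writing to the file object directly also avoids reassigning sys.stdout, which would otherwise leave later print calls pointing at a closed file.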

read a pdf in python [duplicate]

This question already has answers here:
How to read line by line in pdf file using PyPdf?
(3 answers)
Closed 7 years ago.
I want to read a PDF file in Python. I tried a few approaches, PdfReader and pdfquery, but I am not getting the result as a string. I want to extract some of the content from that PDF file. Is there any way to do that?
PDFminer is a tool for extracting information from PDF documents.
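Does it matter in your case whether the file is a PDF or not? If you just want the raw contents of the file, open it as you would any other file. Use binary mode, because a PDF is not plain text; this gives you the file's bytes, not the extracted text.
For example:
with open('my_file.pdf', 'rb') as file:
    content = file.read()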

Problems extracting XML code from Word with Python

I am attempting to extract the XML code from a Word document with Python. Here's the code I tried:
import zipfile
def getXml(docxFilename):
    zip = zipfile.ZipFile(open(docxFilename,"rb"))
    xmlString= str(zip.read("word/document.xml"))
    return xmlString
I created a test document and ran the function getXml on it. Here's the result:
b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\r\n<w:document xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"><w:body><w:p w:rsidR="00971B91" w:rsidRPr="00971B91" w:rsidRDefault="00B52719"><w:pPr><w:rPr><w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/><w:sz w:val="24"/><w:szCs w:val="24"/></w:rPr></w:pPr><w:r><w:t>Test</w:t></w:r></w:p><w:sectPr w:rsidR="00971B91" w:rsidRPr="00971B91" w:rsidSect="009C4305"><w:pgSz w:w="12240" w:h="15840"/><w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0"/><w:cols w:space="708"/><w:docGrid w:linePitch="360"/></w:sectPr></w:body></w:document>'
There are some obvious issues. One is that the XML code begins with "b'" and ends with an apostrophe. The other is that there is a "\r\n" right after the first set of angle brackets.
My ultimate goal is to modify the XML code to create a new Word document -- see this question -- but the anomalies with the extracted XML are preventing me from doing this.
Does anyone know why the extracted XML has these strange features and how I can remove them?
EDIT: I tried using the lxml module to parse this code but I only got different errors.
I created a new function getXmlTree:
from lxml import etree
def getXmlTree(xmlString):
    return etree.fromstring(xmlString)
I then ran the code etree.tostring(getXmlTree(getXml("test.docx")),pretty_print=True) and received much more sensible XML code.
The problems arise when I tried to create a new Word document. I created the following function to convert XML code into a Word document (shamelessly stolen from here):
import zipfile
from lxml import etree
import os
import tempfile
import shutil
def createNewDocx(originalDocx,xmlContent,newFilename):
    tmpDir = tempfile.mkdtemp()
    zip = zipfile.ZipFile(open(originalDocx,"rb"))
    zip.extractall(tmpDir)
    with open(os.path.join(tmpDir,"word/document.xml"),"w") as f:
        xmlString = etree.tostring(xmlContent,pretty_print=True)
        f.write(xmlString)
    filenames = zip.namelist()
    zipCopyFilename = newFilename
    with zipfile.ZipFile(zipCopyFilename,"w") as docx:
        for filename in filenames:
            docx.write(os.path.join(tmpDir,filename),filename)
    shutil.rmtree(tmpDir)
Before trying to create a new Word document, I wanted to see if I could create a copy of my original test document by substituting xmlContent = getXmlTree(getXml("test.docx")) as an argument in the above function. When I ran the code, however, I received an error message:
f.write(xmlString)
TypeError: must be str, not bytes
Using f.write(str(xmlString)) instead didn't help; it created a new word document, but Word would crash if I tried to open it.
EDIT2: tried running the above code with f.write(xmlString.decode("utf-8")) instead, but it didn't help; Word still crashed.
My guess is that the XML is not being encoded properly. First, write the document file as binary using "wb" as the mode. Second, tell etree.tostring() what the encoding is and to include the XML declaration.
with open(os.path.join(tmpDir, "word/document.xml"), "wb") as f:
    xmlBytes = etree.tostring(xmlContent, encoding="UTF-8", xml_declaration=True, pretty_print=True)
    f.write(xmlBytes)
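As for the "b'" prefix and the "\r\n" in the first attempt: zip.read() returns bytes, and calling str() on bytes produces their printable representation rather than decoded text. Decoding the bytes gives a normal string with a real line break. The sketch below shows that, plus a way to copy a .docx while swapping in new XML, keeping everything as bytes so no encoding step can go wrong. It uses only the standard library; the names get_xml and copy_docx_with_xml are assumptions, and it copies archive members directly instead of extracting to a temporary directory as the code above does.

```python
import zipfile


def get_xml(docx_filename):
    """Return word/document.xml from a .docx file as a decoded str."""
    with zipfile.ZipFile(docx_filename) as archive:
        raw = archive.read("word/document.xml")  # bytes, not str
    # Decode instead of str(raw); assumes the part is UTF-8, as its
    # XML declaration in the question states.
    return raw.decode("utf-8")


def copy_docx_with_xml(original_docx, xml_bytes, new_filename):
    """Copy original_docx to new_filename, replacing word/document.xml."""
    with zipfile.ZipFile(original_docx) as src, \
            zipfile.ZipFile(new_filename, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == "word/document.xml":
                data = xml_bytes                  # the new content, as bytes
            else:
                data = src.read(info.filename)    # everything else unchanged
            dst.writestr(info, data)
```

With lxml, the bytes for the second function can come from etree.tostring(xmlContent, encoding="UTF-8", xml_declaration=True) as in the answer above.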
