Elementree Fromstring and iterparse in Python 3.x - python

I am able to parse from file using this method:
for event, elem in ET.iterparse(file_path, events=("start", "end")):
But, how can I do the same with fromstring function? Instead of from file, xml content is stored in a variable now. But, I still want to have the events as before.

From the documentation for the iterparse method:
...Parses an XML section into an element tree incrementally, and
reports what’s going on to the user. source is a filename or file
object containing XML data...
I've never used the etree python module, but "or file object" says to me that this method accepts an open file-like object as well as a file name. It's an easy thing to construct a file-like object around a string to pass as input to a method like this.
Take a look at the StringIO module.

Related

How to extract the full path from a file while using the "with" statement?

I'm trying, just for fun, to understand if I can extract the full path of my file while using the with statement (python 3.8)
I have this simple code:
with open('tmp.txt', 'r') as file:
print(os.path.basename(file))
But I keep getting an error that it's not a suitable type format.
I've been trying also with the relpath, abspath, and so on.
It says that the input should be a string, but even after casting it into string, I'm getting something that I can't manipulate.
Perhaps there isn't an actual way to extract that full path name, but I think there is. I just can't find it, yet.
You could try:
import os
with open("tmp.txt", "r") as file_handle:
print(os.path.abspath(file_handle.name))
The functions in os.path accept strings or path-like objects. You are attempting to pass in a file instead. There are lots of reasons the types aren't interchangable.
Since you opened the file for text reading, file is an instance of io.TextIOWrapper. This class is just an interface that provides text encoding and decoding for some underlying data. It is not associated with a path in general: the underlying stream can be a file on disk, but also a pipe, a network socket, or an in-memory buffer (like io.StringIO). None of the latter are associated with a path or filename in the way that you are thinking, even though you would interface with them as through normal file objects.
If your file-like is an instance of io.FileIO, it will have a name attribute to keep track of this information for you. Other sources of data will not. Since the example in your question uses FileIO, you can do
with open('tmp.txt', 'r') as file:
print(os.path.abspath(file.name))
The full file path is given by os.path.abspath.
That being said, since file objects don't generally care about file names, it is probably better for you to keep track of that info yourself, in case one day you decide to use something else as input. Python 3.8+ allows you to do this without changing your line count using the walrus operator:
with open((filename := 'tmp.txt'), 'r') as file:
print(os.path.abspath(filename))

Modify XML with Python

I have an XML file which I want to modify and save to futher use. I only want to change text of some elements, but when I do, it also deletes all the line breaks in element's attributes:
<smth.xml
attr1="name1"
attr2="name2"
smth="21315423"
Debug="false" >
becomes
<smth.xml attr1="name1" attr2="name2" smth="21315423" Debug="false">
I am currently using lxml lib with parser = etree.XMLParser(encoding='utf-8')
but when I do, it also deletes all the line breaks in element's attributes
That's normal. Line-breaks between attributes in the XML source code are not part of the XML data model.
You are allowed to type them when you edit an XML file by hand, but XML parsers are not required by the spec to pay any attention to them, because they carry no meaning. So practically no XML parser does (I don't know a single XML parser that does).
I can only recommend one thing: Get over it. Don't try to retain them, that's fruitless, pointless, and it won't work.
Load the XML, read data from it, modify it, save it. Don't get too worked up about irrelevant details.

xml.etree.ElementTree.Element' object has no attribute 'write'

I want to read a XML string, edit it and save it as a XML file.
However I get the mentioned error in the title when I do .write()
I found out that when you read an XML string using ElementTree.fromstring(string) it will create an ElementTree.Element and not an ElementTree itself. An Element has no write method but the ElementTree does.
How can I write an Element to a XML file? Or how can I create an ElementTree and add my Element to that and then use the .write method?
I found out that when you read a xml string using ElementTree.fromstring(string) it will actually create an ElementTree.Element and not a ElementTree itself.
Yes, you get the top-level element back (also called the "document element").
An Element has no write method but the ElementTree does.
The ElementTree constructor signature goes like this:
class xml.etree.ElementTree.ElementTree(element=None, file=None)
Therefore it's completely straightforward:
import xml.etree.ElementTree as ET
doc = ET.fromstring("<test>test öäü</test>")
tree = ET.ElementTree(doc)
tree.write("test.xml", encoding="utf-8")
You always should specify the encoding when writing an XML file. Most of the time, UTF-8 is the best choice.
In case this helps anyone who gets this unclear error message when trying to use ElementTree to write an xml file, and spends way too long on it (like I did):
File "/usr/lib/python3.5/xml/etree/ElementTree.py", line 788, in _get_writer
write = file_or_filename.write
AttributeError: 'str' object has no attribute 'write'
... in my case, it was simply because the path to the directory I was trying to write my xml file to did not exist! For example:
tree.write("/FolderDidNotExist/test.xml", encoding="utf-8")
a simple mkdir /FolderDidNotExist did the trick. No more error. (Of course, this error message could use some "love" so I'm posting this here in case I forget what it means again [which I've done] and need to google this again)

lxml parsing with python: how to with objectify

I am trying to read xml behind an spss file, I would like to move from etree to objectify.
How can I convert this function below to return an objectify object? I would like to do this because objectify xml object would be easier for me (as a newbie) to work with as it is more pythonic.
def get_etree(path_file):
from lxml import etree
with open(path_file, 'r+') as f:
xml_text = f.read()
recovering_parser = etree.XMLParser(recover=True)
xml = etree.parse(StringIO(xml_text), parser=recovering_parser)
return xml
my failed attempt:
def get_etree(path_file):
from lxml import etree, objectify
with open(path_file, 'r+') as f:
xml_text = objectify.fromstring(xml)
return xml
but I get this error:
lxml.etree.XMLSyntaxError: xmlns:mdm: 'http://www.spss.com/mr/dm/metadatamodel/Arc 3/2000-02-04' is not a valid URI
The first, biggest mistake is to read a file into a string and feed that string to an XML parser.
Python will read the file as whatever your default file encoding is (unless you specify the encoding when you call read()), and that step will very likely break anything other than plain ASCII files.
XML files come in many encodings, you cannot predict them, and you really shouldn't make assumptions about them. XML files solve that problem with the XML declaration.
<?xml version="1.0" encoding="Windows-1252"?>
An XML parser will read that bit of information and configure itself correctly before reading the rest of the file. Make use of that facility. Never use open() and read() for XML files.
Luckily lxml makes it very easy:
from lxml import etree, objectify
def get_etree(path_file):
return etree.parse(path_file, parser=etree.XMLParser(recover=True))
def get_objectify(path_file):
return objectify.parse(path_file)
and
path = r"/path/to/your.xml"
xml1 = get_etree(path)
xml2 = get_objectify(path)
print xml1 # -> <lxml.etree._ElementTree object at 0x02A7B918>
print xml2 # -> <lxml.etree._ElementTree object at 0x02A7B878>
P.S.: Think hard if you really, positively must use a recovering parser. An XML file is a data structure. If it is broken (syntactically invalid, incomplete, wrongly decoded, you name it), would you really want to trust the (by definition undefined) result of an attempt to read it anyway or would you much rather reject it and display an error message?
I would do the latter. Using a recovering parser may cause nasty run-time errors later.

How to insert base64 file into a specif tag using Python

<book>
<title>sponge bob</title>
<author>Joe Doe</author>
<file>Tbase</file>
</book>
I have 2 files, one is a xml and the other is a base64 file. I would like to know how to insert and replace the string"Tbase" with the content of the base64 file using python.
Are you wanting to put the verbatim contents of the base64 file (still base64 encoded) into the XML file, in place of "Tbase"? If that's the case, you could just do something like:
xml = open("xmlfile.xml").read()
b64file = open("b64file.base64").read()
open("xmlfile.xml", "w").write(xml.replace("Tbase", b64file))
(If you're on Python 2.6 or later, you can do this a little bit cleaner using with statements, but that's another discussion.)
If you want to decode the base64 file first, and place the decoded contents into the XML file, then you'd replace b64file on the last line of the example above with b64file.decode("base64").
Of course, doing simple text replacement, as above, opens you up to the problems you'll have if, say, the title or author contain "Tbase" as well. A better way would be to use an actual XML parsing library, like so:
from xml.etree.ElementTree import fromstring, tostring
xml = fromstring(open("xmlfile.xml").read())
xml.find("file").text = open("b64file.base64").read()
open("xmlfile.xml", "w").write(tostring(xml))
This sets the contents of the <file> tag to be the contents of the file b64file.base64, regardless of what its former contents were and regardless of whether "Tbase" appears elsewhere in the XML document.

Categories

Resources