I am new to etree. I can already read an etree and write its contents to another file format such as HTML or XML. Now I want to go the other way: read some other file format and build or write an etree from it. Please give me some suggestions, ideally with an example.
Suppose you want to write an xml file test.xml like the following:
<?xml version='1.0' encoding='ASCII'?>
<document category="location">
  <name>Timbuktu</name>
  <name>Eldorado</name>
</document>
The corresponding code would be:
from lxml import etree

# Build the root element together with its attribute.
root = etree.Element("document", {"category": "location"})
for location in ["Timbuktu", "Eldorado"]:
    # Each location becomes a <name> child of the root.
    name = etree.SubElement(root, "name")
    name.text = location
tree = etree.ElementTree(root)
tree.write('test.xml', pretty_print=True, xml_declaration=True)
If you want to add further sub-elements under name, nest another for loop and create the sub-elements under the name element object.
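As a rough sketch of that nested loop: the snippet below uses the standard library's xml.etree.ElementTree, whose Element and SubElement calls mirror the lxml ones used above. The `locations` dictionary and the `landmark` tag are made up for illustration; in practice the data would come from whatever file format you are reading.

```python
import xml.etree.ElementTree as ET

# Hypothetical input data (not from the original post): location -> landmarks.
locations = {
    "Timbuktu": ["landmark A", "landmark B"],
    "Eldorado": ["landmark C"],
}

root = ET.Element("document", {"category": "location"})
for location, landmarks in locations.items():
    name = ET.SubElement(root, "name")
    name.text = location
    # Inner loop: one <landmark> child per entry, nested under this <name>.
    for landmark in landmarks:
        ET.SubElement(name, "landmark").text = landmark

# Serialize to a string; ET.ElementTree(root).write(...) would write a file instead.
xml_text = ET.tostring(root, encoding="unicode")
```

The same pattern extends to deeper nesting: each level of the source data gets its own loop, and each loop attaches its elements to the element created by the loop above it.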
I tried to read the XML file into the following variable and then work with it as a normal XML file, but this is not working. How can I approach this? I need to edit the XML without changing the original file and without creating a new XML file. Is that possible?
comment_2 = open("cool.xml").read()
Thanks and Regards
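You can use xml.etree.ElementTree to parse the XML file and keep the parsed tree in a variable:
import xml.etree.ElementTree as ET
tree = ET.parse('cool.xml')  # reads the file; the file itself is not modified
root = tree.getroot()  # root element of the in-memory tree
The document now lives in memory as an ElementTree instance. Any changes you make through root affect only that in-memory copy; the file on disk changes only if you explicitly call tree.write().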
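A minimal sketch of editing purely in memory, assuming the XML text has already been read into a string (here a small inline string stands in for `open("cool.xml").read()`; its content is invented for illustration):

```python
import xml.etree.ElementTree as ET

# Stand-in for: xml_text = open("cool.xml").read()
xml_text = "<root width='200'><element1/></root>"

# Parse the string into an element tree that exists only in memory.
root = ET.fromstring(xml_text)

# Edit the in-memory tree; neither the string nor any file is touched.
root.set("width", "400")
ET.SubElement(root, "element2").text = "added in memory"

# Serialize back to a string if the edited XML is needed as text.
edited = ET.tostring(root, encoding="unicode")
```

Nothing is written to disk here, so the original file stays as it was and no new file is created.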
Recently I've been editing XML files with a Python script for a project I'm working on, but I can't figure out how to edit the attributes of the root element.
For example, the XML file would say:
<root width="200">
  <element1>
  </element1>
</root>
What I want is for my code to find the width attribute and change it to some other value. I know how to edit elements below the root, but not the root itself.
code I'm using for changing attributes
You could use the xml.etree.ElementTree module. It lets you set attributes with xml.etree.ElementTree.Element.set().
Here is an example snippet you could use:
import xml.etree.ElementTree as ET
tree = ET.parse('input.xml')
root = tree.getroot()
root.set('width','400')
print(root.attrib)
tree.write('output.xml')
I am trying to parse an XML file with several namespaces. I already have a function which produces a namespace map – a dictionary with namespace prefixes and namespace identifiers (example in the code). However, when I pass this dictionary to the findall() method, it works only with the first namespace and returns nothing if an element on the XML path is in another namespace.
(It works only for the first namespace, the one with an empty prefix.)
Here is a code sample:
import xml.etree.ElementTree as ET

file = r'.\folder\example_file.xml'  # path to the file (raw string, so '\f' is not read as an escape)
xml_path = './DataArea/Order/Item/Price'  # XML path to the element node
tree = ET.parse(file)
root = tree.getroot()
nsmap = dict([node for _, node in ET.iterparse(file, events=['start-ns'])])
# This produces a dictionary with namespace prefixes and identifiers, e.g.
# {'': 'http://firstnamespace.example.com/', 'foo': 'http://secondnamespace.example.com/', etc.}
for elem in root.findall(xml_path, nsmap):
    pass  # Do something
EDIT:
At mzjn's suggestion, I'm including a sample XML file:
<?xml version="1.0" encoding="utf-8"?>
<SampleOrder xmlns="http://firstnamespace.example.com/" xmlns:foo="http://secondnamespace.example.com/" xmlns:bar="http://thirdnamespace.example.com/" xmlns:sta="http://fourthnamespace.example.com/" languageCode="en-US" releaseID="1.0" systemEnvironmentCode="PROD" versionID="1.0">
  <ApplicationArea>
    <Sender>
      <SenderCode>4457</SenderCode>
    </Sender>
  </ApplicationArea>
  <DataArea>
    <Order>
      <foo:Item>
        <foo:Price>
          <foo:AmountPerUnit currencyID="USD">58000.000000</foo:AmountPerUnit>
          <foo:TotalAmount currencyID="USD">58000.000000</foo:TotalAmount>
        </foo:Price>
        <foo:Description>
          <foo:ItemCode>259601</foo:ItemCode>
          <foo:ItemName>PORTAL GUN 6UBC BLUE</foo:ItemName>
        </foo:Description>
      </foo:Item>
      <bar:Supplier>
        <bar:SupplierID>4474</bar:SupplierID>
        <bar:SupplierName>APERTURE SCIENCE, INC</bar:SupplierName>
      </bar:Supplier>
      <sta:DeliveryLocation>
        <sta:RecipientID>103</sta:RecipientID>
        <sta:RecipientName>WARHOUSE 664</sta:RecipientName>
      </sta:DeliveryLocation>
    </Order>
  </DataArea>
</SampleOrder>
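You should specify the namespace prefixes in your xml_path. For the sample above, that means ./DataArea/Order/foo:Item/foo:Price. It works for the default namespace because unprefixed names in the path are matched against the '' entry of your namespace map (on Python 3.8 and later), so you don't have to write a prefix for those elements.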
Based on Jan Jaap Meijerink's answer and mzjn's comments under the question, the solution is to insert namespace prefixes into the XML path. This can be done by inserting the wildcard {*}, as mzjn's comment and this answer (https://stackoverflow.com/a/62117710/407651) suggest.
To document the solution, you can add this simple operation to your code:
xml_path = './DataArea/Order/Item/Price/TotalAmount'
xml_path_splitted_to_list = xml_path.split('/')
xml_path_with_wildcard_prefix = '/{*}'.join(xml_path_splitted_to_list)
If two or more nodes share the same XML path but differ in namespace, the findall() method (quite naturally) returns all of those element nodes.
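To make the two approaches concrete, here is a small self-contained sketch that runs both against a trimmed-down copy of the sample order. It assumes Python 3.8 or later, where ElementTree supports both the {*} wildcard and an empty-string key for the default namespace; the namespace map is written out by hand rather than collected with iterparse.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the sample document from the question.
SAMPLE = """<SampleOrder xmlns="http://firstnamespace.example.com/"
             xmlns:foo="http://secondnamespace.example.com/">
  <DataArea>
    <Order>
      <foo:Item>
        <foo:Price>
          <foo:TotalAmount currencyID="USD">58000.000000</foo:TotalAmount>
        </foo:Price>
      </foo:Item>
    </Order>
  </DataArea>
</SampleOrder>"""

root = ET.fromstring(SAMPLE)

# Approach 1: explicit prefixes, resolved through the namespace map.
nsmap = {
    "": "http://firstnamespace.example.com/",
    "foo": "http://secondnamespace.example.com/",
}
prefixed = root.findall("./DataArea/Order/foo:Item/foo:Price/foo:TotalAmount", nsmap)

# Approach 2: build a wildcard path that matches any namespace at every step.
xml_path = "./DataArea/Order/Item/Price/TotalAmount"
wildcard_path = "/{*}".join(xml_path.split("/"))
wildcard = root.findall(wildcard_path)
```

Explicit prefixes are stricter, since they only match elements in the namespace you name. The wildcard path is more convenient when you don't care which namespace an element belongs to.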
I have an XML input file which I need to split into multiple files based on MAPPING and WORKFLOW tags.
Since I have two MAPPING tags in my input XML and one WORKFLOW tag, I need to generate three files:
m_demo_trans_agg.XML
m_demo_trans_exp.XML
wf_m_demo_trans_agg_exp.XML
So, my mapping file (starting with m_) will have tags SOURCE, TARGET, and MAPPINGS. The workflow file will have tags WORKFLOW and CONFIG.
Please let me know how I can create the mapping XML.
I started with workflow XML creation.
My code looks like:
import xml.etree.ElementTree as ET

tree = ET.parse('input.xml')
root = tree.getroot()
target_node_first_parent = 'FOLDER'
target_nodes = ['SOURCE', 'TARGET', 'MAPPING']
for node in root.iter(target_node_first_parent):
    for subnode in node.iter():
        if subnode.tag in target_nodes:
            print(subnode.tag)
            node.remove(subnode)
out_tree = ET.ElementTree(root)
out_tree.write('output.xml')
I am getting the TARGET tags in my output.xml.
I am open to using any libraries apart from xml.etree.ElementTree.
Please assist.
Thanks
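A likely cause of the leftover TARGET tags is that elements are removed from a node while that same node is being iterated, which makes the iterator skip siblings. Also, node.iter() yields all descendants, while node.remove() only accepts direct children. The sketch below avoids both by looping over a snapshot of the direct children. Since the input XML is not shown, the sample document, the assumption that SOURCE, TARGET, MAPPING, WORKFLOW and CONFIG are direct children of FOLDER, the NAME attribute, and the helper name keep_children are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for input.xml; the real structure is not shown.
SAMPLE = """<REPOSITORY>
  <FOLDER NAME="demo">
    <SOURCE NAME="src"/>
    <TARGET NAME="tgt"/>
    <MAPPING NAME="m_demo_trans_agg"/>
    <MAPPING NAME="m_demo_trans_exp"/>
    <CONFIG NAME="cfg"/>
    <WORKFLOW NAME="wf_m_demo_trans_agg_exp"/>
  </FOLDER>
</REPOSITORY>"""


def keep_children(xml_text, parent_tag, keep):
    """Parse a fresh tree and keep only the children of parent_tag accepted by keep()."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter(parent_tag)):
        # list(parent) is a snapshot of the direct children, so removing
        # from parent inside the loop does not skip any sibling.
        for child in list(parent):
            if not keep(child):
                parent.remove(child)
    return root


# Workflow document: only WORKFLOW and CONFIG survive.
workflow_root = keep_children(
    SAMPLE, "FOLDER", lambda c: c.tag in ("WORKFLOW", "CONFIG")
)

# One mapping document per MAPPING element, keyed by its (assumed) NAME attribute.
mapping_names = [m.get("NAME") for m in ET.fromstring(SAMPLE).iter("MAPPING")]
mapping_roots = {
    name: keep_children(
        SAMPLE,
        "FOLDER",
        lambda c, name=name: c.tag in ("SOURCE", "TARGET")
        or (c.tag == "MAPPING" and c.get("NAME") == name),
    )
    for name in mapping_names
}

# Writing a file would then look like, for example:
# ET.ElementTree(workflow_root).write("wf_m_demo_trans_agg_exp.XML")
```

Parsing a fresh tree for each output keeps the documents independent of each other, at the cost of re-parsing the input once per file.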
I know how to parse XML with SAX in Python, but how would I go about inserting elements into the document I'm parsing? Do I have to create a separate file?
Could someone provide a simple example or alter the one I've put below? Thanks.
from xml.sax.handler import ContentHandler
from xml.sax import make_parser
import sys


class aHandler(ContentHandler):
    # Called for every opening tag.
    def startElement(self, name, attrs):
        print("<", name, ">")

    # Called for the text between tags.
    def characters(self, content):
        print(content)

    # Called for every closing tag.
    def endElement(self, name):
        print("</", name, ">")


handler = aHandler()
saxparser = make_parser()
saxparser.setContentHandler(handler)
datasource = open("settings.xml", "r")
saxparser.parse(datasource)
<?xml version="1.0"?>
<names>
  <name>
    <first>First1</first>
    <second>Second1</second>
  </name>
  <name>
    <first>First2</first>
    <second>Second2</second>
  </name>
  <name>
    <first>First3</first>
    <second>Second3</second>
  </name>
</names>
With DOM, you have the entire xml structure in memory.
With SAX, you don't have a DOM available, so you don't have anything to append an element to.
The main reason for using SAX is an XML structure so large that holding the whole DOM in memory would be a serious performance hit. If that isn't the case (and judging by your small sample XML file, it isn't), I would always use DOM rather than SAX.
If you go the DOM route (which seems to me to be the only option), look into lxml. It's one of the best Python XML libraries around.
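A minimal sketch of the in-memory route, inserting a new entry into a shortened copy of the sample document. It uses the standard library's xml.etree.ElementTree so that it runs without extra installs; lxml.etree offers the same fromstring, SubElement and tostring calls. The values First4 and Second4 are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample settings.xml from the question.
SETTINGS = """<names>
  <name>
    <first>First1</first>
    <second>Second1</second>
  </name>
</names>"""

# The whole document is parsed into memory, so there is a tree to append to.
root = ET.fromstring(SETTINGS)

# Insert a new <name> entry with its two children.
new_name = ET.SubElement(root, "name")
ET.SubElement(new_name, "first").text = "First4"
ET.SubElement(new_name, "second").text = "Second4"

# Serialize the modified tree. To update the file in place you would
# instead call ET.ElementTree(root).write("settings.xml").
result = ET.tostring(root, encoding="unicode")
```

Writing back to the same file name overwrites the original, so no separate file is needed unless you want to keep the old version.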