I want to write a function that modifies XML content without changing the file's formatting. I managed to change the text, but not without altering the formatting.
Specifically, I want to have a space before and after the CDATA section in an XML file.
Default XML file:
<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row> <![CDATA[001 001 001]]> </Row>
</Data>
</Device>
</Map>
</Maps>
And I am getting this result:
<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row><![CDATA[001 001 099]]></Row>
</Data>
</Device>
</Map>
</Maps>
However, I want the new xml to be like this:
<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row> <![CDATA[001 001 099]]> </Row>
</Data>
</Device>
</Map>
</Maps>
Here is my code:
from lxml import etree as ET
def xml_new(f,fpath,newtext,xmlrow):
    xmlrow = 19
    parser = ET.XMLParser(strip_cdata=False)
    tree = ET.parse(f, parser)
    root = tree.getroot()
    for child in root:
        value = child[0][2][xmlrow].text
        text = ET.CDATA("001 001 099")
        child[0][2][xmlrow] = ET.Element('Row')
        child[0][2][xmlrow].text = text
        child[0][2][xmlrow].tail = "\n"
    ET.register_namespace('A', "http://www.semi.org")
    tree.write(fpath,encoding='utf-8',xml_declaration=True)
    return value
Can anyone help me with this? Thanks in advance!
I don't quite understand what you want to do. Here's an example for you. I don't know if it can meet your needs.
from simplified_scrapy import SimplifiedDoc,req,utils
html ='''<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row> <![CDATA[001 001 001]]> </Row>
</Data>
</Device>
</Map>
</Maps>'''
doc = SimplifiedDoc(html)
row = doc.Data.Row # Get the node you want to modify.
row.setContent(" "+row.html+" ") # Modify the node content.
print (doc.html)
Result:
<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice />
<Bin>
<Bin Bin="001" />
</Bin>
<Data>
<Row> <![CDATA[001 001 001]]> </Row>
</Data>
</Device>
</Map>
</Maps>
Thanks for all your help. I have found another way to achieve the result I want.
This is the code:
# what you want to change
replaceby = '020]]> </Row>\n'
# row you want to change
row = 1
# col you want to change based on list
col = 3
file = open(file,'r')
line = file.readlines()
i = 0
editedXML=[]
for l in line:
    if 'cdata' in l.lower():
        i=i+1
        if i == row:
            oldVal = l.split(' ')
            newVal = []
            for index, old in enumerate(oldVal):
                if index == col:
                    newVal.append(replaceby)
                else:
                    newVal.append(old)
            editedXML.append(' '.join(newVal))
        else:
            editedXML.append(l)
    else:
        editedXML.append(l)
file2 = open(newfile,'w')
file2.write(''.join(editedXML))
file2.close()
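The text-based approach above can also be sketched with a regular expression, so that only the inside of the chosen CDATA section is touched and everything around it (including the spaces before and after the CDATA) stays byte-for-byte the same. This is only a sketch: the function name replace_cdata_field and its parameters are made up for illustration, and it assumes the fields inside the CDATA are separated by single spaces.

```python
import re

# Matches one CDATA section and captures its opening, content and closing parts.
CDATA_RE = re.compile(r"(<!\[CDATA\[)(.*?)(\]\]>)", re.DOTALL)

def replace_cdata_field(xml_text, row, col, new_value):
    """Replace field `col` (0-based) in the `row`-th (1-based) CDATA section.

    Hypothetical helper: all text outside the matched CDATA content is
    returned unchanged, so the original formatting is preserved.
    """
    count = 0

    def _sub(match):
        nonlocal count
        count += 1
        if count != row:
            return match.group(0)  # not the target row: leave as-is
        fields = match.group(2).split(" ")
        fields[col] = new_value
        return match.group(1) + " ".join(fields) + match.group(3)

    return CDATA_RE.sub(_sub, xml_text)

if __name__ == "__main__":
    sample = "<Row> <![CDATA[001 001 001]]> </Row>\n"
    print(replace_cdata_field(sample, 1, 2, "099"))
```

Because the file is handled as text, the XML declaration, namespace and indentation are never re-serialized; the trade-off is that nothing checks the result is still well-formed XML.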
Related
I have an XML document in Python and need to obtain the elements of the "items" tag as an iterable list.
I need to get an iterable list from this XML, for example like this:
Item 1: Bicycle, value $250, iva_tax: 50.30
Item 2: Skateboard, value $120, iva_tax: 25.0
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<data>
<info>Listado de items</info>
<detalle>
<![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<tienda id="tiendaProd" version="1.1.0">
<items>
<item>
<nombre>Bicycle</nombre>
<valor>250</valor>
<data>
<tax name="iva" value="50.30"></tax>
</data>
</item>
<item>
<nombre>Skateboard</nombre>
<valor>120</valor>
<data>
<tax name="iva" value="25.0"></tax>
</data>
</item>
<item>
<nombre>Motorcycle</nombre>
<valor>900</valor>
<data>
<tax name="iva" value="120.50"></tax>
</data>
</item>
</items>
</tienda>]]>
</detalle>
</data>
I am working with xml.etree.ElementTree, for example:
import xml.etree.ElementTree as ET
xml = ET.fromstring(stringBase64)
ite = xml.find('.//detalle').text
tixml = ET.fromstring(ite)
You can use BeautifulSoup4 (BS4) to do this.
from bs4 import BeautifulSoup
# Read the XML file as one string (BeautifulSoup expects a string, not a list of lines)
with open("example.xml", "r") as f:
    contents = f.read()
# Parse the outer document, then parse the CDATA payload of <detalle> as its own document
outer = BeautifulSoup(contents, 'xml')
soup = BeautifulSoup(outer.find("detalle").text.strip(), 'xml')
# Find all the item tags
item_tags = soup.find_all("item") # returns everything in the <item> tags
# Find the nombre and valor tags within each item
results = {}
for item in item_tags:
    num = item.find("nombre").text
    val = item.find("valor").text
    results[str(num)] = val
# Prints dictionary with key value pairs from the xml
print(results)
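Since the question already uses xml.etree.ElementTree, the same two-step idea (parse the outer document, then parse the text of detalle as a second document) might look like the sketch below. The function name iter_items and the dictionary keys are assumptions made for illustration; the element and attribute names come from the XML in the question.

```python
import xml.etree.ElementTree as ET

def iter_items(outer_xml):
    """Yield one dict per <item> found in the CDATA payload of <detalle>.

    Hypothetical helper: assumes every item has <nombre>, <valor> and a
    <data><tax name=... value=.../></data> child, as in the sample.
    """
    outer = ET.fromstring(outer_xml)
    # The CDATA content arrives as plain text; strip it so the inner
    # XML declaration sits at the very start of the string.
    inner_text = outer.findtext(".//detalle").strip()
    inner = ET.fromstring(inner_text)
    for item in inner.iter("item"):
        tax = item.find("data/tax")
        yield {
            "nombre": item.findtext("nombre"),
            "valor": item.findtext("valor"),
            "tax_name": tax.get("name"),
            "tax_value": tax.get("value"),
        }
```

The generator can be looped over directly, for example to print "Item 1: Bicycle, value $250, iva_tax: 50.30" style lines with enumerate(iter_items(text), 1).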
I have a log file from an application in an XML-like format that I'm trying to parse. As you can see from the file, one "group" starts with a [trace] line and contains 4 nodes: RequestMeta, Request, ReplyMeta, and Reply.
Once the file is parsed, I want to create an object for each "group" and use the objects for further processing. There could be anywhere from 1 to n groups, depending on the complexity of the log file.
I have been able to parse the XML, but I have some questions on how best to proceed based on its structure.
The first problem is how to structure or restructure the file for parsing. Since I'm adding a single root node around more than one "group", there is no easy way for me to know which children of the root node belong together in a group. In the original file, a group is everything between one [trace] line and the next [trace] line.
I think I could potentially solve this by taking the text of each "group" and creating a tree per group instead of one tree for the entire file.
The second problem is how to store the data once it's parsed. Every request/reply will contain different data elements under the srvdata node. I'm not sure how to dynamically store a variable number of values that have a variable number of names.
After parsing all of the data, I want to output it in a simple webpage that looks something like https://imgur.com/a/2l6ZSJK
py script
import xml.etree.ElementTree as ET
with open('C:/code/mra/requestreply.txt') as f:
    txt = f.read()
pos = 0
# replace all [trace] lines
while pos >= 0:
    pos = txt.find('[trace-')
    pos2 = txt.find('\n', pos + 1) + 1
    if pos >= 0:
        txt = txt.replace(txt[pos:pos2], '')
# replace all xml instances because they are out of order
txt = txt.replace('<?xml version="1.0" encoding="utf-8"?>\n', '')
# add a master root node
xml = '<root>\n' + txt + '</root>'
tree = ET.fromstring(xml)
xml file - this is considered a single group (there could be hundreds)
[trace-592] TransactionID=6010 TransactionName=CPM.ExecuteDiscernScript User=MEPPS
<RequestMeta>
<?xml version="1.0" encoding="utf-8"?>
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</RequestMeta>
<Request>
<?xml version="1.0" encoding="utf-8"?>
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</Request>
<ReplyMeta>
<?xml version="1.0" encoding="utf-8"?>
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</ReplyMeta>
<Reply>
<?xml version="1.0" encoding="utf-8"?>
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</Reply>
I suggest modifying your XML structure like this (I named the file trace.xml):
<?xml version="1.0" encoding="utf-8"?>
<root>
<!--[trace-592] TransactionID=6010 TransactionName=CPM.ExecuteDiscernScript User=MEPPS-->
<RequestMeta>
<!-- <?xml version="1.0" encoding="utf-8"?> -->
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</RequestMeta>
<Request>
<!-- <?xml version="1.0" encoding="utf-8"?> -->
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</Request>
<ReplyMeta>
<!-- <?xml version="1.0" encoding="utf-8"?> -->
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</ReplyMeta>
<Reply>
<!-- <?xml version="1.0" encoding="utf-8"?> -->
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</Reply>
</root>
Then you can parse each segment separately, like this:
import xml.etree.ElementTree as ET
def parseRequestMeta(RequestMeta):
    """Parse your interest here """
    for root in RequestMeta:
        print(root.tag)
        for child in root.iter():
            print(child.tag, child.text)
def parseRequest(Request):
    pass
def parseReplyMeta(ReplyMeta):
    pass
def parseReply(Reply):
    pass
RequestMeta = []
Request = []
ReplyMeta = []
Reply = []
events = ["start", "end"]
for event, node in ET.iterparse('trace.xml', events=events):
    if event == "end" and node.tag == "RequestMeta":
        RequestMeta.append(node)
        print(node.tag)
    if event == "end" and node.tag == "Request":
        Request.append(node)
        print(node.tag)
    if event == "end" and node.tag == "ReplyMeta":
        ReplyMeta.append(node)
        print(node.tag)
    if event == "end" and node.tag == "Reply":
        Reply.append(node)
        print(node.tag)
parseRequestMeta(RequestMeta)
parseRequestMeta(Request)
parseRequestMeta(ReplyMeta)
parseRequestMeta(Reply)
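The idea raised in the question, building one tree per [trace] group instead of one tree for the whole file, could be sketched roughly as follows. Everything named here (parse_groups, srvdata_values, the wrapping group element) is hypothetical, and the sketch assumes that the key=value pairs on the [trace] line contain no spaces and that each section keeps the srvxml/srvdata layout shown in the sample. A plain dict is used for the srvdata values because it copes with a variable number of names.

```python
import re
import xml.etree.ElementTree as ET

TRACE_RE = re.compile(r"^\[trace-(\d+)\](.*)$", re.MULTILINE)
XML_DECL_RE = re.compile(r"<\?xml[^>]*\?>\s*")

def parse_groups(log_text):
    """Return one dict per [trace-N] group: trace number, header fields, parsed tree."""
    groups = []
    matches = list(TRACE_RE.finditer(log_text))
    for i, m in enumerate(matches):
        # A group runs from the end of its [trace] line to the next [trace] line.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        # Drop the embedded XML declarations; they are only legal at document start.
        body = XML_DECL_RE.sub("", log_text[m.end():end])
        root = ET.fromstring("<group>" + body + "</group>")
        header = dict(pair.split("=", 1) for pair in m.group(2).split())
        groups.append({"trace": int(m.group(1)), "header": header, "root": root})
    return groups

def srvdata_values(section):
    """Collect leaf elements under srvxml/srvdata as {tag: text} (later duplicates win)."""
    srvdata = section.find("srvxml/srvdata")
    if srvdata is None:
        return {}
    return {el.tag: el.text for el in srvdata.iter()
            if el is not srvdata and len(el) == 0}
```

With this layout, groups[0]["root"].find("Request") gives the Request node of the first group, and srvdata_values(...) on that node gives its values keyed by element name.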
I have one xml file that looks like this, XML1:
<?xml version='1.0' encoding='utf-8'?>
<report>
</report>
And the other one that is like this,
XML2:
<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla" >
<child1>
<child2>
....
</child2>
</child1>
</report>
I need to replace the root element of XML1 with the root element of XML2, without its children, so that XML1 looks like this:
<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla">
</report>
Currently my code looks like this, but it doesn't remove the children; it puts the whole tree inside:
source_tree = ET.parse('XML2.xml')
source_root = source_tree.getroot()
report = source_root.findall('report')
for child in list(report):
    report.remove(child)
source_tree.write('XML1.xml', encoding='utf-8', xml_declaration=True)
Does anyone have an idea how I can achieve this?
Thanks!
Try the below (just copy attrib)
import xml.etree.ElementTree as ET
xml1 = '''<?xml version='1.0' encoding='utf-8'?>
<report>
</report>'''
xml2 = '''<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla" >
<child1>
<child2>
</child2>
</child1>
</report>'''
root1 = ET.fromstring(xml1)
root2 = ET.fromstring(xml2)
root1.attrib = root2.attrib
ET.dump(root1)
output
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla">
</report>
So here is working code:
source_tree = ET.parse('XML2.xml')
source_root = source_tree.getroot()
dest_tree = ET.parse('XML1.xml')
dest_root = dest_tree.getroot()
dest_root.attrib = source_root.attrib
dest_tree.write('XML1.xml', encoding='utf-8', xml_declaration=True)
Sorry for my poor English, but I need your help.
I have 2 XML files.
one is:
<root>
<data name="aaaa">
<value>"old value1"</value>
<comment>"this is an old value1 of aaaa"</comment>
</data>
<data name="bbbb">
<value>"old value2"</value>
<comment>"this is an old value2 of bbbb"</comment>
</data>
</root>
two is:
<root>
<data name="aaaa">
<value>"value1"</value>
<comment>"this is a value 1 of aaaa"</comment>
</data>
<data name="bbbb">
<value>"value2"</value>
<comment>"this is a value2 of bbbb"</comment>
</data>
<data name="cccc">
<value>"value3"</value>
<comment>"this is a value3 of cccc"</comment>
</data>
</root>
one.xml will be updated from two.xml,
so one.xml should look like this afterwards.
one.xml(after) :
<root>
<data name="aaaa">
<value>"value1"</value>
<comment>"this is a value1 of aaaa"</comment>
</data>
<data name="bbbb">
<value>"value2"</value>
<comment>"this is a value2 of bbbb"</comment>
</data>
</root>
data name="cccc" does not exist in one.xml and is therefore ignored.
What I actually want to do is:
download two.xml (the whole list) from the db
update my one.xml (it contains only the data entries that the app uses) from two.xml
Can anyone help me, please?
Thanks!
==============================================================
xml.etree.ElementTree
Your code works with the example, but I found a problem with the real XML file.
The real one.xml contains:
<?xml version="1.0" encoding="utf-8"?>
<root>
<resheader name="resmimetype">
<value>text/microsoft-resx</value>
</resheader>
<resheader name="version">
<value>2.0</value>
</resheader>
<resheader name="reader">
<value>System.Resources.ResXResourceReader, System.Windows.Forms, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
</resheader>
<resheader name="writer">
<value>System.Resources.ResXResourceWriter, System.Windows.Forms, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
</resheader>
<data name="NotesLabel" xml:space="preserve">
<value>Hinweise:</value>
<comment>label for input field</comment>
</data>
<data name="NotesPlaceholder" xml:space="preserve">
<value>z . Milch kaufen</value>
<comment>example input for notes field</comment>
</data>
<data name="AddButton" xml:space="preserve">
<value>Neues Element hinzufügen</value>
<comment>this string appears on a button to add a new item to the list</comment>
</data>
</root>
It seems that resheader causes trouble.
Do you have any idea how to fix it?
You can use xml.etree.ElementTree. While there are probably more elegant ways, this should work on files that fit in memory, provided the names are unique in two.xml:
import xml.etree.ElementTree as ET
tree_one = ET.parse('one.xml')
root_one = tree_one.getroot()
tree_two = ET.parse('two.xml')
root_two = tree_two.getroot()
data_two=dict((e.get("name"), e) for e in root_two.findall("data"))
for eo in root_one.findall("data"):
    name=eo.get("name")
    tail=eo.tail
    eo.clear()
    eo.tail=tail
    en=data_two[name]
    for k,v in en.items():
        eo.set(k,v)
    eo.extend(en.findall("*"))
    eo.text=en.text
tree_one.write("one.xml")
If your files do not fit in memory you can still use xml.dom.pulldom as long as single data entries do fit.
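One possible hardening of the in-memory approach, sketched under assumptions: if a name in one.xml has no counterpart in two.xml, a plain dictionary lookup raises KeyError, so the variant below skips such entries and leaves them untouched. Elements other than data (such as resheader) are never visited. The function name update_data and its return value are made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET

def update_data(root_one, root_two):
    """Overwrite each <data> in root_one with the same-named <data> from root_two.

    Hypothetical helper: returns the list of names that were updated.
    """
    by_name = {e.get("name"): e for e in root_two.findall("data")}
    updated = []
    for old in root_one.findall("data"):
        new = by_name.get(old.get("name"))
        if new is None:
            continue  # name missing from two.xml: keep the old entry as-is
        tail = old.tail            # clear() also resets tail, so save it
        old.clear()
        old.tail = tail
        old.text = new.text
        old.attrib.update(new.attrib)
        # Copy the children so the two trees do not share element objects.
        old.extend([copy.deepcopy(child) for child in new])
        updated.append(old.get("name"))
    return updated
```

Usage would follow the answer above: parse both files, call update_data(tree_one.getroot(), tree_two.getroot()), then tree_one.write("one.xml").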
I am a novice in python and have the following task on hand.
I have a large xml file like the one below:
<Configuration>
<Parameters>
<Component Name='ABC'>
<Group Name='DEF'>
<Parameter Name='GHI'>
<Description>
Some Text
</Description>
<Type>Integer</Type>
<Restriction>
<Level>5</Level>
</Restriction>
<Value>
<Item Value='5'/>
</Value>
</Parameter>
<Parameter Name='JKL'>
<Description>
Some Text
</Description>
<Type>Integer</Type>
<Restriction>
<Level>5</Level>
</Restriction>
<Value>
<Item Value='5'/>
</Value>
</Parameter>
</Group>
<Group Name='MNO'>
<Parameter Name='PQR'>
<Description>
Some Text
</Description>
<Type>Integer</Type>
<Restriction>
<Level>5</Level>
</Restriction>
<Value>
<Item Value='5'/>
</Value>
</Parameter>
<Parameter Name='TUV'>
<Description>
Some Text
</Description>
<Type>Integer</Type>
<Restriction>
<Level>5</Level>
</Restriction>
<Value>
<Item Value='5'/>
</Value>
</Parameter>
</Group>
</Component>
</Parameters>
</Configuration>
In this XML file I have to go through the component "ABC" to the group "MNO" and then to the parameter "TUV", and under it change the item value to 10.
I have tried using xml.etree.cElementTree, but to no avail. lxml is not available on the server, as it is running a very old version of Python, and I have no permission to upgrade it.
I have been using the following code to parse and edit a relatively small XML file:
def fnXMLModification(ArgStr):
    argList = ArgStr.split()
    strXMLPath = argList[0]
    if not os.path.exists(strXMLPath):
        fnlogs("XML File: " + strXMLPath + " does not exist.\n")
        return False
    try:
        import xml.etree.cElementTree as ET
    except ImportError:
        import xml.etree.ElementTree as ET
    f=open(strXMLPath, 'rt')
    tree = ET.parse(f)
    ValueSetFlag = False
    AttrSetFlag = False
    for strXPath in argList[1:]:
        strXPathList = strXPath.split("[")
        sxPath = strXPathList[0]
        if len(strXPathList)==3:
            # both present
            AttrSetFlag = True
            ValueSetFlag = True
            valToBeSet = strXPathList[1].strip("]")
            sAttr = strXPathList[2].strip("]")
            attrList = sAttr.split(",")
        elif len(strXPathList) == 2:
            #anyone present
            if "=" in strXPathList[1]:
                AttrSetFlag = True
                sAttr = strXPathList[1].strip("]")
                attrList = sAttr.split(",")
            else:
                ValueSetFlag = True
                valToBeSet = strXPathList[1].strip("]")
        node = tree.find(sxPath)
        if AttrSetFlag:
            for att in attrList:
                slist = att.split("=")
                node.set(slist[0].strip(),slist[1].strip())
        if ValueSetFlag:
            node.text = valToBeSet
    tree.write(strXMLPath)
    fnlogs("XML File: " + strXMLPath + " has been modified successfully.\n")
    return True
Using this function I am not able to traverse the current XML, as it has a lot of child elements and sub-groups.
Import statement:
import xml.etree.cElementTree as ET
Parse the content with the fromstring method:
root = ET.fromstring(data)
Iterate as required to reach the target Item tag and change the value of its Value attribute:
for component_tag in root.iter("Component"):
    if "Name" in component_tag.attrib and component_tag.attrib['Name']=='ABC':
        for group_tag in component_tag.iter("Group"):
            if "Name" in group_tag.attrib and group_tag.attrib['Name']=='MNO':
                #for value_tag in group_tag.iter("Value"):
                for item_tag in group_tag.findall("Parameter[@Name='TUV']/Value/Item"):
                    item_tag.attrib["Value"] = "10"
Alternatively, an XPath expression can select the target Item tag directly:
for item_tag in root.findall("Parameters/Component[@Name='ABC']/Group[@Name='MNO']/Parameter[@Name='TUV']/Value/Item"):
    item_tag.attrib["Value"] = "10"
Use the tostring method to get the content back:
data = ET.tostring(root)
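The attribute predicates used above ([@Name='...']) need a reasonably recent ElementTree. For an interpreter where they are not available, the same lookup can be sketched with plain tag paths and explicit attribute checks. The helper names find_named and set_item_value are invented for this example, and root is assumed to be the Configuration element.

```python
import xml.etree.ElementTree as ET

def find_named(parent, tag, name):
    """Return the direct children of `parent` with the given tag and Name attribute."""
    return [el for el in parent.findall(tag) if el.get("Name") == name]

def set_item_value(root, component, group, parameter, new_value):
    """Set Value on every matching Item and return how many were changed."""
    changed = 0
    for params in root.findall("Parameters"):
        for comp in find_named(params, "Component", component):
            for grp in find_named(comp, "Group", group):
                for par in find_named(grp, "Parameter", parameter):
                    for item in par.findall("Value/Item"):
                        item.set("Value", new_value)
                        changed += 1
    return changed
```

Returning the count makes it easy to detect a misspelled name: a result of 0 means nothing matched and the file should probably not be rewritten.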