Multiple results on an XML stack with lxml (Python) - python

This is what I want to write to an external XML file: using a for loop, add one entry per record under <Data>, repeating the same <Availability> and <Price> tags, like this:
<UpdateInventoryRequest>
<StartDate>21/12/2015</StartDate>
<RoomId>1</RoomId>
<Data>
<Availability>1</Availability>
<Price>100</Price>
<Availability>3</Availability>
<Price>120</Price>
</Data>
</UpdateInventoryRequest>
This is my code so far; every time it runs, it returns the same value in all fields:
from lxml import etree
# Create Xml
root = etree.Element("UpdateInventoryRequest")
doc = etree.ElementTree(root)
root.append(etree.Element("StartDate"))
root.append(etree.Element("RoomId"))
root.append(etree.Element("Data"))
data_root = root[2]
data_root.append(etree.Element("Availability"))
data_root.append(etree.Element("Price"))
# Xml in code
def buildXmlUpdate(dfrom, roomId, ldays):
    start_date_sard = dfrom
    roomId = str(roomId)
    room_id_sard = roomId
    for n in ldays:
        print (dfrom, roomId, n)
        ldays[-1]['avail'] = str(ldays[-1]['avail'])
        ldays[-1]['price'] = str(ldays[-1]['price'])
        availability_in_data = ldays[-1]['avail']
        price_in_data = ldays[-1]['price']
        root[0].text = start_date_sard
        root[1].text = room_id_sard
        data_root[0].text = availability_in_data
        data_root[1].text = price_in_data
#here execute the function
buildXmlUpdate('21/12/2015', 1, [{'avail': 1, 'price': 100}, {'avail': 3, 'price': 120}])
doc.write('testoutput.xml', pretty_print=True)

If it's the case that you want your script to build an XML packet as you've shown, there are a few issues.
You're doing a lot of swapping of variables around, simply to convert them to strings - for the most part you can just use the Python string conversion (str()) on demand.
In your loop, the data you are trying to deal with is in the variable n, however, when you are pulling data out, it's from the variable ldays, which means the data you are trying to put into your XML is the same, regardless of the number of times you go through the loop.
You've built an XML object with a single "Availability" element, and a single "Price" element, so there is no way, given the code you presented, you are ever going to generate multiple "Availability" and "Price" elements as in your sample XML file.
This isn't necessarily the best way to do things, but here is a potential solution, using the paradigms you've already established:
from lxml import etree
def buildXmlUpdate(dfrom, roomId, ldays):
    root = etree.Element("UpdateInventoryRequest")
    root.append(etree.Element("StartDate"))
    root[-1].text = dfrom
    root.append(etree.Element("RoomId"))
    root[-1].text = str(roomId)
    root.append(etree.Element("Data"))
    dataroot = root[-1]
    # Append a new Availability/Price pair for every item in the list
    for item in ldays:
        dataroot.append(etree.Element("Availability"))
        dataroot[-1].text = str(item['avail'])
        dataroot.append(etree.Element("Price"))
        dataroot[-1].text = str(item['price'])
    return root
myroot = buildXmlUpdate('21/12/2015', 1, [{'avail': 1, 'price': 100}, {'avail': 3, 'price': 120}])
print(etree.tostring(myroot, pretty_print=True).decode())
Again, this is only one possible way to do this; there are certainly more approaches you could take.
And if you haven't already, I might suggest going through the LXML Tutorial and trying the different things they go through there, as it may help you find better ways to do what you want.
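As one example of those other approaches, here is a minimal sketch that uses SubElement instead of append() followed by indexing. It is written against the standard library's xml.etree.ElementTree so it runs without extra packages; lxml's etree offers a SubElement helper as well. The function name build_xml_update is only illustrative.

```python
import xml.etree.ElementTree as ET

def build_xml_update(dfrom, room_id, ldays):
    """Build the UpdateInventoryRequest tree with one Availability/Price pair per day."""
    root = ET.Element("UpdateInventoryRequest")
    # SubElement creates the child, attaches it to the parent and returns it
    ET.SubElement(root, "StartDate").text = dfrom
    ET.SubElement(root, "RoomId").text = str(room_id)
    data = ET.SubElement(root, "Data")
    for item in ldays:
        ET.SubElement(data, "Availability").text = str(item["avail"])
        ET.SubElement(data, "Price").text = str(item["price"])
    return root

root = build_xml_update("21/12/2015", 1,
                        [{"avail": 1, "price": 100}, {"avail": 3, "price": 120}])
print(ET.tostring(root, encoding="unicode"))
```

Because the tree is built inside the function on every call, nothing is shared between calls, which avoids the "same value in all fields" problem from the question.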

Related

Run and evaluate imported text file in place as Python code

I have some Python code that is generated dynamically and stored in a text file. It basically consists of various variables like lists and strings that store data. This information is fed to a class to instantiate different objects. How can I feed the data from the text files into the class?
Here is my class:
class SomethingA(Else):
    def construct(self):
        # feed_data_a_here
        self.call_method()
class SomethingB(Else):
    def construct(self):
        # feed_data_b_here
        self.call_method()
Here is some sample content from the text_a file. As you can see, this is valid Python code that I need to feed directly into the object. The call to call_method() depends on this data for its output.
self.height = 12
self.id = 463934
self.name = 'object_a'
Is there any way to load this data into the class without manually copying and pasting all of it from the text file, line by line?
Thanks.
I would probably write a parser for your files that strips the leading 'self.' and adds each variable to a dictionary:
import re
# You could use a more appropriate regex depending on the expected variable names
regex = r'self\.(?P<var_name>\D+\d*) = (?P<var_value>.*)'
attributes = dict()
with open(path) as file:  # path: location of your text file
    for line in file:
        search = re.search(regex, line)
        if search is None:
            continue  # skip lines that do not match
        var_name = search.group('var_name')
        var_value = search.group('var_value').strip()  # remove accidental white space
        attributes[var_name] = var_value
foo = classA(**attributes)
example of the regex in work
Edit
If you use the code I've proposed, all items in the dictionary will be strings. To get real Python values you can try one of the following:
eval(), as proposed by @Welgriv, but with a small modification (eval() only evaluates expressions, so the assignment has to happen outside the call):
attributes[var_name] = eval(var_value)
If your data consists of standard Python data and is properly formatted, you can try using json:
import json
x = '12'
y = '[1, 2, 3]'
z = '{"A": 50.0, "B": 60.0}'
attributes = {}
for i, v in enumerate([x, y, z]):
    attributes[f'var{i+1}'] = json.loads(v)
print(attributes)
# Prints
# {'var1': 12, 'var2': [1, 2, 3], 'var3': {'A': 50.0, 'B': 60.0}}
You are probably looking for the eval() and exec() functions. eval() evaluates a Python expression given as text, while exec() executes statements such as assignments. For example:
exec('a = 3')
will create a variable named a equal to 3 (eval('a = 3') raises a SyntaxError, because an assignment is not an expression). In your case you would open the text file and then execute its contents.
Remarks
eval() and exec() raise security issues, because whoever controls the text can potentially execute arbitrary code.
I'm not sure about the overall context of what you are trying to implement, but you might prefer to store your data (name, id, height...) in some form other than Python code, such as key-value pairs, because storing code makes your application extremely dependent on the environment. For example, if a Python update deprecates some of the syntax used, your application will stop working.
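A minimal sketch that combines the parser idea with a safer alternative to eval(): ast.literal_eval only accepts Python literals (numbers, strings, lists, dicts and so on), so it cannot run arbitrary code. The class name Something, the construct(attributes) signature and the inline sample lines are assumptions made for illustration; in practice the lines would come from the text file.

```python
import ast
import re

# Matches lines such as "self.height = 12"
LINE_RE = re.compile(r"^\s*self\.(?P<name>\w+)\s*=\s*(?P<value>.+?)\s*$")

def parse_attributes(lines):
    """Return {name: value} for every 'self.<name> = <literal>' line."""
    attributes = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or unrelated lines
        # literal_eval turns the text into a real int, str, list, ...
        attributes[match.group("name")] = ast.literal_eval(match.group("value"))
    return attributes

class Something:  # hypothetical stand-in for SomethingA / SomethingB
    def construct(self, attributes):
        for name, value in attributes.items():
            setattr(self, name, value)

# Stand-in for the contents of text_a
sample = ["self.height = 12", "self.id = 463934", "self.name = 'object_a'"]
attrs = parse_attributes(sample)
obj = Something()
obj.construct(attrs)
print(attrs)
```

Passing an open file object to parse_attributes should work the same way, since iterating over a file yields its lines.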

Transforming XML into JSON loadable structure for BigQuery

I’m learning python on the job and need help improving my solution.
I need to load XML data into BigQuery.
I have it working but not sure if I have done it in a sensible way.
I call an API that returns an XML structure.
I use ElementTree to parse the XML and use tree.iter() to return the tags and text from the XML.
Printing my tags and text with:
for node in tree.iter():
    print(f'{node.tag}, {node.text}')
Returns:
Tag Text
Responses None
Response None
ResponseId 393
ResponseText Please respond “Has this loaded”
ResponseType single
ResponseStatus 0
The Responses tag appears only once per API call, but Response through to ResponseStatus are repeating groups, and ResponseId is the key for each group. Each call returns fewer than 100 repeating groups.
There is a key returned in the header, Response_key, that is the parent of all ResponseIds.
My aim is to take this data, convert to JSON and stream to BigQuery.
The table structure I need is:
ResponseKey, ResponseID, Response, ResponseText, ResponseType , ResponseStatus
The approach I use is
Use tree.iter() to loop and create a list
node_list = []
for node in tree.iter():
    node_list.append(node.tag)
    node_list.append(node.text)
Use itertools to group the list (I found this step difficult)
r = 'Response'
response_split = [list(y) for x, y in itertools.groupby(node_list, lambda z: z == r) if not x]
which returns:
[['Responses', 'None'], ['None', 'ResponseId', '393', 'ResponseText', 'Please respond “Has this loaded”', 'ResponseType', 'single', 'ResponseStatus', '0'], ['None', 'ResponseId', '394', 'ResponseText', 'Please confirm “Connection made”', 'ResponseType', 'single', 'ResponseStatus', '0']]
Load into a Pandas data frame, remove any double quotes in case that causes BigQuery any issues.
Add ResponseKey as a column to the dataframe.
Convert data frame to JSON and pass to load_table_from_json.
It works but not sure if it is sensible.
Any suggested improvements would be appreciated.
Here is a sample of the XML:
{"GetResponses":"<Responses><Response><ResponseId>393938<\/ResponseId><ResponseText>Please respond to the following statement:\"The assigned task was easy to complete\"<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393939<\/ResponseId><ResponseText>Did you save your datafor later? Why\/why not?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>1<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393940<\/ResponseId><ResponseText>Did you notice how much it cost to find the item? How much was it?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393941<\/ResponseId><ResponseText>Did you select ‘signature on form’? Why\/why not?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>1<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393942<\/ResponseId><ResponseText>Was it easy to find thethe new page? Why\/why not?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>1<\/ResponseStatus><ExtendedType>4<\/ExtendedType><\/Response><Response><ResponseId>393943<\/ResponseId><ResponseText>Please enter your email. 
So that we can track your responses, we need you to provide this for each task.<\/ResponseText><ResponseShortCode>email<\/ResponseShortCode><ResponseType>text<\/ResponseType><ResponseStatus>1<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393944<\/ResponseId><ResponseText>Why didn't you save your datafor later?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393945<\/ResponseId><ResponseText>Why did you save your datafor later?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>4<\/ExtendedType><\/Response><Response><ResponseId>393946<\/ResponseId><ResponseText>Did you save your datafor later?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393947<\/ResponseId><ResponseText>Why didn't you select 'signature on form'?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393948<\/ResponseId><ResponseText>Why did you select 'signature on form'?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>4444449<\/ResponseId><ResponseText>Did you select ‘signature on form’?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393950<\/ResponseId><ResponseText>Why wasn't it easy to find thethe new page?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>4<\/ExtendedType><\/Response><Response><ResponseId>393951<\/ResponseId><ResponseText>Was it easy to find thethe new 
page?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393952<\/ResponseId><ResponseText>Please enter your email addressSo that we can track your responses, we need you to provide this for each task<\/ResponseText><ResponseShortCode>email<\/ResponseShortCode><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>4<\/ExtendedType><\/Response><\/Responses>"}
A sample JSON without all the processing steps:
node_list = []
for node in tree.iter():
    node_list.append(node.tag)
    node_list.append(node.text)
json_format = json.dumps(node_list)
print(json_format)
["Responses", null, "Response", null, "ResponseId", "393938", "ResponseText", "Please respond to the following statement:\"The assigned task was easy to complete\"", "ResponseType", "single", "ResponseStatus", "0", "ExtendedType", "0"]
I'm not sure what the required output is, but this is one way of doing it:
import xml.etree.ElementTree as ET
import json
p = r"d:\tmp.xml"
tree = ET.parse(p)
root = tree.getroot()
json_dict = {}
json_dict[root.tag] = root.text
json_dict['response_list'] = []
for node in root:
    # one dict per <Response>, keyed by child tag
    tmp_dict = {}
    for response_info in node:
        tmp_dict[response_info.tag] = response_info.text
    json_dict['response_list'].append(tmp_dict)
with open(r'd:\out.json', 'w') as of:
    json.dump(json_dict, of)
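Building on the answer above, here is a sketch that goes straight from the API payload to one flat dict per Response, which should be the shape a row-based loader such as the load_table_from_json call mentioned in the question expects. It assumes the payload is JSON with the XML under the GetResponses key, as in the sample; the shortened sample XML, the responses_to_rows name and the "demo-key" value are made up for illustration, and no BigQuery call is made here.

```python
import json
import xml.etree.ElementTree as ET

# Child tags copied into each row; tags missing from a Response become None
FIELDS = ("ResponseId", "ResponseText", "ResponseType", "ResponseStatus")

def responses_to_rows(xml_text, response_key):
    """Return one dict per <Response>, each tagged with the parent key."""
    root = ET.fromstring(xml_text)
    rows = []
    for response in root.findall("Response"):
        row = {"ResponseKey": response_key}
        for field in FIELDS:
            row[field] = response.findtext(field)
        rows.append(row)
    return rows

sample_xml = (
    "<Responses>"
    "<Response><ResponseId>393</ResponseId><ResponseText>Has this loaded</ResponseText>"
    "<ResponseType>single</ResponseType><ResponseStatus>0</ResponseStatus></Response>"
    "<Response><ResponseId>394</ResponseId><ResponseText>Connection made</ResponseText>"
    "<ResponseType>single</ResponseType><ResponseStatus>0</ResponseStatus></Response>"
    "</Responses>"
)
payload = json.dumps({"GetResponses": sample_xml})  # stand-in for the API response body

xml_text = json.loads(payload)["GetResponses"]
rows = responses_to_rows(xml_text, response_key="demo-key")
print(json.dumps(rows, indent=2))
```

Iterating over the Response elements directly removes the need for the flat list, the itertools.groupby step and the pandas round trip.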

How do I pull attributes from deeply nested XML sub elements using ElementTree?

I have .xml files that I am trying to pull attributes from, and I am having trouble grabbing the attributes from the Raw_Material sub-element. Below is some sample data that represents what the files look like:
<XML_Order order_Id='1' terms='Net30' ship_via='UPS'>
<Line_Items>
<Line_Item upc='1234567' item_id='1' color='blk' qty='15'>
<Raw_Materials>
<Raw_Material Item_Id='H188' Vendor_Id='DI0001'> # This is what i need to grab
<Raw_Material Item_Id='ST03' Vendor_Id='DI0001'>
</Raw_Materials>
</Line_Item>
<Line_Item>
<Raw_Materials>
<Raw_Material>
<Raw_Material>
</Raw_Materials>
</Line_Item>
<Line_Item>
<Raw_Materials>
<Raw_Material>
<Raw_Material>
</Raw_Materials>
</Line_Item>
</Line_Items>
</XML_Order>
I am having no problem iterating and pulling the attributes from the Line_Item tags using the following code:
if filename.endswith('.xml'):
    tree = Et.ElementTree(file = filename)
    root = tree.getroot()
    # order info
    orderID = root.attrib['order_Id'] # grab order ID from XML document
    terms = root.attrib['terms']
    shipVia = root.attrib['ship_via']
    for child in root:
        for grandchild in child:
            upc = grandchild.get('upc')
            lineItemID = grandchild.get('item_id')
            color = grandchild.get('color')
            # I assume this is where I would need a for loop to access the
            # nested <Raw_Material> element and its attributes
I attempted to fill a list with the values using the following in my code (where the last comment is):
for element in tree.iter(tag = 'Raw_Material'):
    itemID.append(element.get('Item_Id'))
Python returns the itemID list with the correct Item_Id values, but they are repeated over and over, when I only need the 2 Item_Id attribute values for the current line item. I think it is appending to the same list for every Line_Item tag in my XML doc, instead of building a new list per Line_Item tag.
Once I grab the data I need, would a list be the best way to hold the values? There will only ever be 1 or 2 Raw_Material sub-elements, and I don't want my variables to be overwritten in a loop.
Try using xpath, something like this:
for raw_material in grandchild.findall('.//Raw_Material'):
    # your code here
EDIT: Since grandchild refers to your Line_Item elements, you might actually need something like .//Raw_Materials/Raw_Material as your xpath.
Below.
There are 6 Raw_Material elements in the XML doc. 4 of them are empty (zero attributes) and 2 of them have 2 attributes each. This is reflected in the output.
import xml.etree.ElementTree as ET
xml = """<XML_Order order_Id="1" terms="Net30" ship_via="UPS">
<Line_Items>
<Line_Item upc="1234567" item_id="1" color="blk" qty="15">
<Raw_Materials>
<Raw_Material Item_Id="H188" Vendor_Id="DI0001"/>
<Raw_Material Item_Id="ST03" Vendor_Id="DI0001"/>
</Raw_Materials>
</Line_Item>
<Line_Item>
<Raw_Materials>
<Raw_Material/>
<Raw_Material/>
</Raw_Materials>
</Line_Item>
<Line_Item>
<Raw_Materials>
<Raw_Material/>
<Raw_Material/>
</Raw_Materials>
</Line_Item>
</Line_Items>
</XML_Order>"""
root = ET.fromstring(xml)
# getting the attributes across the xml doc
print('Raw_Materials across the XML doc:')
raw_materials_lst = [entry.attrib for entry in root.findall(".//Raw_Material")]
print(raw_materials_lst)
# getting the attributes per Line_Item
print('Raw_Materials per line item:')
line_items = root.findall(".//Line_Item")
for idx, line_item in enumerate(line_items, 1):
    print('{}) {}'.format(idx, [entry.attrib for entry in line_item.findall(".//Raw_Material")]))
output
Raw_Materials across the XML doc:
[{'Item_Id': 'H188', 'Vendor_Id': 'DI0001'}, {'Item_Id': 'ST03', 'Vendor_Id': 'DI0001'}, {}, {}, {}, {}]
Raw_Materials per line item:
1) [{'Item_Id': 'H188', 'Vendor_Id': 'DI0001'}, {'Item_Id': 'ST03', 'Vendor_Id': 'DI0001'}]
2) [{}, {}]
3) [{}, {}]
HUGE thanks to @audiodude: we worked through this for about an hour last night and managed to come up with a workable solution. Below is what he came up with. The attribute data is going into a FileMaker database, so he set up some boolean flags to capture the Item_Id in the Raw_Material tag (since some of my XML files have these tags and some do not).
for child in root:
    for grandchild in child:
        # grab any needed attribute data from the line_items element
        has_sticker = False
        has_tag = False
        for material_item in grandchild.findall('.//Raw_Material'):
            item_id = material_item.get('Item_Id', '')  # '' when the attribute is absent
            if item_id.startswith('H'):
                has_tag = True
                liRecord['hasTag'] = 'True' # this is the record in my database
                fms.edit(liRecord)
            if item_id.startswith('ST'):
                has_sticker = True
                liRecord['hasSticker'] = 'True'
                fms.edit(liRecord)
        if liRecord['hasTag'] == 'False' and liRecord['hasSticker'] == 'False':
            liRecord['hasTag'] = 'True'
            fms.edit(liRecord)

Turning ElementTree findall() into a list

I'm using ElementTree findall() to find elements in my XML which have a certain tag. I want to turn the result into a list. At the moment, I'm iterating through the elements, picking out the .text for each element, and appending to the list. I'm sure there's a more elegant way of doing this.
#!/usr/bin/python2.7
#
from xml.etree import ElementTree
import os
myXML = '''<root>
<project project_name="my_big_project">
<event name="my_first_event">
<location>London</location>
<location>Dublin</location>
<location>New York</location>
<month>January</month>
<year>2013</year>
</event>
</project>
</root>
'''
tree = ElementTree.fromstring(myXML)
for node in tree.findall('.//project'):
    for element in node.findall('event'):
        event_name = element.attrib.get('name')
        print(event_name)
        locations = []
        if element.find('location') is not None:
            for events in element.findall('location'):
                locations.append(events.text)
        # Could I use something like this instead?
        # locations.append(''.join.text(*events) for events in element.findall('location'))
        print(locations)
This outputs the following (which is correct, but I'd like to assign the findall() results directly to a list, in text format, if possible):
my_first_event
['London', 'Dublin', 'New York']
You can try this - it uses a list comprehension to generate the list without having to create a blank one and then append.
if element.find('location') is not None:
    locations = [events.text for events in element.findall('location')]
With this, you can also get rid of the locations definition above, so your code would be:
tree = ElementTree.fromstring(myXML)
for node in tree.findall('.//project'):
    for element in node.findall('event'):
        event_name = element.attrib.get('name')
        print(event_name)
        if element.find('location') is not None:
            locations = [events.text for events in element.findall('location')]
        print(locations)
One thing you will want to be wary of is what you are doing with locations - it won't be defined if location doesn't exist, so you will get a NameError if you try to print it and it doesn't exist. If that is an issue, you can retain the locations = [] definition - if the matching element isn't found, the result will just be an empty list.
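As a small sketch of that last point: findall() returns an empty list when nothing matches, so the comprehension can run unconditionally and locations is always defined. The second, empty event and the results dictionary are additions made only to show the no-match case.

```python
from xml.etree import ElementTree

my_xml = """<root>
  <project project_name="my_big_project">
    <event name="my_first_event">
      <location>London</location>
      <location>Dublin</location>
    </event>
    <event name="event_without_locations"/>
  </project>
</root>"""

tree = ElementTree.fromstring(my_xml)
results = {}
for event in tree.findall(".//project/event"):
    # No guard needed: findall() yields [] when there are no <location> children
    results[event.get("name")] = [loc.text for loc in event.findall("location")]
print(results)
```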

Populating Python list using data obtained from lxml xpath command

I'm reading instrument data from a specialty server that delivers the info in xml format. The code I've written is:
from lxml import etree as ET
xmlDoc = ET.parse('http://192.168.1.198/Bench_read.xml')
print(ET.tostring(xmlDoc, pretty_print=True).decode())
dmtCount = xmlDoc.xpath('//dmt')
print(len(dmtCount))
dmtVal = []
for i in range(1, len(dmtCount)):
    dmtVal[i:0] = xmlDoc.xpath('./address/text()')
    dmtVal[i:1] = xmlDoc.xpath('./status/text()')
    dmtVal[i:2] = xmlDoc.xpath('./flow/text()')
    dmtVal[i:3] = xmlDoc.xpath('./dp/text()')
    dmtVal[i:4] = xmlDoc.xpath('./inPressure/text()')
    dmtVal[i:5] = xmlDoc.xpath('./actVal/text()')
    dmtVal[i:6] = xmlDoc.xpath('./temp/text()')
    dmtVal[i:7] = xmlDoc.xpath('./valveOnPercent/text()')
print(dmtVal)
And the results I get are:
$python XMLparse2.py
<response>
<heartbeat>0x24</heartbeat>
<dmt node="1">
<address>0x21</address>
<status>0x01</status>
<flow>0.000000</flow>
<dp>0.000000</dp>
<inPressure>0.000000</inPressure>
<actVal>0.000000</actVal>
<temp>0x00</temp>
<valveOnPercent>0x00</valveOnPercent>
</dmt>
<dmt node="2">
<address>0x32</address>
<status>0x01</status>
<flow>0.000000</flow>
<dp>0.000000</dp>
<inPressure>0.000000</inPressure>
<actVal>0.000000</actVal>
<temp>0x00</temp>
<valveOnPercent>0x00</valveOnPercent>
</dmt>
</response>
...Starting to parse XML nodes
2
[]
...Done
Sooo, nothing is coming out. I've tried using /value in place of the /text() in the xpath call, but the results are unchanged. Is my problem:
1) An incorrect xpath command in the for loop? or
2) A problem in the way I've structured list variable dmtVal ? or
3) Something else I'm missing completely?
I'd welcome any suggestions! Thanks in advance...
dmtVal[i:0] is the syntax for slicing.
You probably wanted indexing: dmtVal[i][0]. But that also wouldn't work.
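To make the difference concrete, here is a short sketch using plain lists, with no XML involved. Assigning to a slice splices the items of the right-hand list into the target, so assigning an empty list (which is presumably what the XPath calls in the question return, since address and the other tags are not direct children of the root) leaves the target unchanged. The variable names are illustrative.

```python
values = []
values[1:0] = []            # slice assignment with an empty list adds nothing
print(values)               # []

values[1:0] = ["0x21"]      # slice assignment splices in the right-hand items
print(values)               # ['0x21']

rows = []
rows.append(["0x21", "0x01"])   # nested lists are built with append()
print(rows[0][1])           # 0x01: first row, second column
```

Indexing such as rows[0][1] only works once rows[0] exists, which is why dmtVal[i][0] on an empty list would also fail.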
You don't typically loop over the indices of a list in Python; you loop over its elements instead.
So, you'd use
for element in some_list:
rather than
for i in range(len(some_list)):
    element = some_list[i]
The way you handle your xpaths is also wrong.
Something like this should work (not tested):
from lxml import etree as ET
xml_doc = ET.parse('http://192.168.1.198/Bench_read.xml')
dmts = xml_doc.xpath('//dmt')
dmt_val = []
for dmt in dmts:
    values = []
    values.append(dmt.xpath('./address/text()'))
    # do this for all values
    # making this a loop would be a good idea
    dmt_val.append(values)
print(dmt_val)
Counting <dmt/> tags and then iterating over them by index is both inefficient and un-Pythonic. Apart from that, you are using the wrong syntax (a slice instead of an index) to address list items. In fact you don't need to index the values at all; the Pythonic way is to use a list comprehension.
Here's a slightly modified version of what stranac suggested:
from lxml import etree as ET
xmlDoc = ET.parse('http://192.168.1.198/Bench_read.xml')
print(ET.tostring(xmlDoc, pretty_print=True).decode())
response = xmlDoc.getroot()
tags = (
    'address',
    'status',
    'flow',
    'dp',
    'inPressure',
    'actVal',
    'temp',
    'valveOnPercent',
)
dmtVal = []
for dmt in response.iter('dmt'):
    val = [dmt.xpath('./%s/text()' % tag) for tag in tags]
    dmtVal.append(val)
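For a version that can be run without the instrument server, here is a sketch of the same idea using the standard library's xml.etree.ElementTree and an inline, shortened copy of the response (both are substitutions for the lxml-over-HTTP setup in the question). It uses findtext(), which returns a plain string per tag rather than the one-item list that xpath('.../text()') produces.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the XML the server returns
sample = """<response>
  <heartbeat>0x24</heartbeat>
  <dmt node="1"><address>0x21</address><status>0x01</status><flow>0.000000</flow></dmt>
  <dmt node="2"><address>0x32</address><status>0x01</status><flow>0.000000</flow></dmt>
</response>"""

TAGS = ("address", "status", "flow")

root = ET.fromstring(sample)
# One inner list per <dmt>, with the text of each requested child tag
dmt_val = [[dmt.findtext(tag) for tag in TAGS] for dmt in root.iter("dmt")]
print(dmt_val)
```

Each inner list lines up with TAGS, so dmt_val[1][0] is the address of the second node.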
Can you explain this:
dmtVal[i:0]
If the iteration starts with a count of 0 and increments over time, you're not actually storing anything in the list.
