Populating Python list using data obtained from lxml xpath command - python

I'm reading instrument data from a specialty server that delivers the info in xml format. The code I've written is:
from lxml import etree as ET
xmlDoc = ET.parse('http://192.168.1.198/Bench_read.xml')
print ET.tostring(xmlDoc, pretty_print=True)
dmtCount = xmlDoc.xpath('//dmt')
print(len(dmtCount))
dmtVal = []
for i in range(1, len(dmtCount)):
    dmtVal[i:0] = xmlDoc.xpath('./address/text()')
    dmtVal[i:1] = xmlDoc.xpath('./status/text()')
    dmtVal[i:2] = xmlDoc.xpath('./flow/text()')
    dmtVal[i:3] = xmlDoc.xpath('./dp/text()')
    dmtVal[i:4] = xmlDoc.xpath('./inPressure/text()')
    dmtVal[i:5] = xmlDoc.xpath('./actVal/text()')
    dmtVal[i:6] = xmlDoc.xpath('./temp/text()')
    dmtVal[i:7] = xmlDoc.xpath('./valveOnPercent/text()')
print dmtVal
And the results I get are:
$python XMLparse2.py
<response>
<heartbeat>0x24</heartbeat>
<dmt node="1">
<address>0x21</address>
<status>0x01</status>
<flow>0.000000</flow>
<dp>0.000000</dp>
<inPressure>0.000000</inPressure>
<actVal>0.000000</actVal>
<temp>0x00</temp>
<valveOnPercent>0x00</valveOnPercent>
</dmt>
<dmt node="2">
<address>0x32</address>
<status>0x01</status>
<flow>0.000000</flow>
<dp>0.000000</dp>
<inPressure>0.000000</inPressure>
<actVal>0.000000</actVal>
<temp>0x00</temp>
<valveOnPercent>0x00</valveOnPercent>
</dmt>
</response>
...Starting to parse XML nodes
2
[]
...Done
Sooo, nothing is coming out. I've tried using /value in place of the /text() in the xpath call, but the results are unchanged. Is my problem:
1) An incorrect xpath command in the for loop? or
2) A problem in the way I've structured list variable dmtVal ? or
3) Something else I'm missing completely?
I'd welcome any suggestions! Thanks in advance...

dmtVal[i:0] is slice syntax, not indexing.
You probably wanted indexing: dmtVal[i][0]. But that wouldn't work either, because dmtVal starts out empty and dmtVal[i] would raise an IndexError.
In Python you don't typically loop over the indices of a list; you loop over its elements instead.
So, you'd use
for element in some_list:
rather than
for i in xrange(len(some_list)):
    element = some_list[i]
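The way you handle your xpaths is also wrong: you call xpath() on xmlDoc, so a relative path such as ./address is evaluated from the root element, which has no address children, and every call returns an empty list. Evaluate the paths on each dmt element instead.
Something like this should work (not tested):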
from lxml import etree as ET
xml_doc = ET.parse('http://192.168.1.198/Bench_read.xml')
dmts = xml_doc.xpath('//dmt')
dmt_val = []
for dmt in dmts:
    values = []
    values.append(dmt.xpath('./address/text()'))
    # do this for all values
    # making this a loop would be a good idea
    dmt_val.append(values)
print dmt_val

Counting the <dmt/> tags and then iterating over them by index is both inefficient and un-Pythonic. Apart from that, you are using the wrong syntax (a slice instead of an index) to address list items. In fact you don't need to index the values at all; the Pythonic way is to use a list comprehension.
Here's a slightly modified version of what stranac suggested:
from lxml import etree as ET
xmlDoc = ET.parse('http://192.168.1.198/Bench_read.xml')
print ET.tostring(xmlDoc, pretty_print=True)
response = xmlDoc.getroot()
tags = (
    'address',
    'status',
    'flow',
    'dp',
    'inPressure',
    'actVal',
    'temp',
    'valveOnPercent',
)
dmtVal = []
for dmt in response.iter('dmt'):
    val = [dmt.xpath('./%s/text()' % tag) for tag in tags]
    dmtVal.append(val)
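For anyone who wants to try the loop without the instrument server, here is a minimal sketch that runs against a trimmed copy of the XML from the question. It is written for Python 3 and, as an assumption made for portability, it uses the standard library's xml.etree.ElementTree with findtext() instead of lxml's xpath(). SAMPLE_XML, TAGS and read_dmt_values are illustrative names, not part of the original code.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response shown in the question (assumed representative).
SAMPLE_XML = """
<response>
  <heartbeat>0x24</heartbeat>
  <dmt node="1">
    <address>0x21</address>
    <status>0x01</status>
    <flow>0.000000</flow>
  </dmt>
  <dmt node="2">
    <address>0x32</address>
    <status>0x01</status>
    <flow>0.000000</flow>
  </dmt>
</response>
"""

TAGS = ("address", "status", "flow")


def read_dmt_values(xml_text, tags=TAGS):
    """Return one list of text values per <dmt> element."""
    root = ET.fromstring(xml_text)
    # findtext() is evaluated relative to each <dmt>, not the document root.
    return [[dmt.findtext(tag) for tag in tags] for dmt in root.iter("dmt")]


dmt_val = read_dmt_values(SAMPLE_XML)
print(dmt_val)
```

Note that findtext() returns a single string (or None when the tag is missing), whereas xpath('.../text()') returns a list, so the inner values here are plain strings rather than one-element lists.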

Can you explain this:
dmtVal[i:0]
With i counting upwards, [i:0] is always an empty slice, and since the xpath calls return empty lists, you're not actually storing anything in the list.
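The behaviour of slice assignment can be checked in isolation. The short sketch below (Python 3, with made-up values) shows that assigning an empty list to a slice changes nothing, that a non-empty list is spliced in item by item, and that indexing an empty list fails instead of growing it.

```python
values = []

# An empty slice on the left and an empty list on the right: nothing changes.
values[1:0] = []
print(values)

# Slice assignment inserts all items of the right-hand list at that position.
values[0:0] = ["0x21"]
values[1:1] = ["0x01", "0.000000"]
print(values)

# Assigning by index into an empty list raises IndexError.
error_message = None
try:
    empty = []
    empty[0] = "0x21"
except IndexError as exc:
    error_message = str(exc)
print(error_message)
```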

Related

Can you iterate over only tags with the .children iterator from BeautifulSoup?

I am pulling down an xml file using BeautifulSoup with this code
dlink = r'https://www.sec.gov/Archives/edgar/data/1040188/000104018820000126/primary_doc.xml'
dreq = requests.get(dlink).content
dsoup = BeautifulSoup(dreq, 'lxml')
There is a level I'm trying to access and then place the elements into a dictionary. I've got it working with this code:
if dsoup.otherincludedmanagerscount.text != '0':
    inclmgr = []
    for i in dsoup.find_all('othermanagers2info'):
        for m in i.find_all('othermanager2'):
            for o in m.find_all('othermanager'):
                imd={}
                if o.cik:
                    imd['cik'] = o.cik.text
                if o.form13ffilenumber:
                    imd['file_no'] = o.form13ffilenumber.text
                imd['name'] = o.find('name').text
                inclmgr.append(imd)
    comp_dict['incl_mgr'] = inclmgr
I assume it's easier to use the .children or .descendants generators, but every time I try, I get an error. Is there a way to only iterate over tags using the BeautifulSoup generators?
Something like this?
for i in dsoup.othermanagers2info.children:
    imd['cik'] = i.cik.text
AttributeError: 'NavigableString' object has no attribute 'cik'
Assuming othermanagers2info is a single item, you can get the same result with one for loop:
inclmgr = []
for i in dsoup.find('othermanagers2info').find_all('othermanager'):
    imd={}
    if i.cik:
        imd['cik'] = i.cik.text
    if i.form13ffilenumber:
        imd['file_no'] = i.form13ffilenumber.text
    imd['name'] = i.find('name').text
    inclmgr.append(imd)
comp_dict['incl_mgr'] = inclmgr
You can also do for i in dsoup.find('othermanagers2info').findChildren():. However, this will produce different results (unless you add additional code): it flattens the tree and includes both parent and child items. You can also pass in a node name.
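To answer the literal question about the generators: .children also yields the text between tags as NavigableString objects, which is where the AttributeError comes from. A minimal sketch of two ways to keep only tags follows. SAMPLE is a made-up stand-in for the filing, and the html.parser backend is an assumption chosen so the example does not require lxml.

```python
from bs4 import BeautifulSoup, Tag

# Small stand-in for the filing XML; the real document is much larger.
SAMPLE = """
<othermanagers2info>
  <othermanager2>
    <othermanager><cik>0001</cik><name>First Manager</name></othermanager>
  </othermanager2>
  <othermanager2>
    <othermanager><name>Second Manager</name></othermanager>
  </othermanager2>
</othermanagers2info>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")
info = soup.find("othermanagers2info")

# .children includes the whitespace between tags as NavigableString,
# so keep only the Tag instances.
child_tags = [child for child in info.children if isinstance(child, Tag)]

# find_all(True, recursive=False) returns only the direct child tags.
direct_tags = info.find_all(True, recursive=False)

print([tag.name for tag in child_tags])
print([tag.name for tag in direct_tags])
```

Both lists contain the two othermanager2 tags and nothing else, so the loop body can safely use attribute access such as tag.cik.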

multiple findAll in one for loop

I'm using BeautifulSoup to read some data from a web page.
This code works fine, but I would like to improve it.
How do I make the for loop to extract more than one piece of data per iteration? Here I have 3 for loops to get values from:
for elem in bsObj.findAll('div', class_="grad"): ...
for elem in bsObj.findAll('div', class_="ulica"): ...
for elem in bsObj.findAll('div', class_="kada"): ...
How can I change this to work in one for loop? Of course I'd like a simple solution.
The output can be a list.
My code so far:
from bs4 import BeautifulSoup
# get data from a web page into the ``html`` variable here
bsObj = BeautifulSoup(html.read(),'lxml')
mj=[]
adr=[]
vri=[]
for mjesto in bsObj.findAll('div', class_="grad"):
    print (mjesto.get_text())
    mj.append(mjesto.get_text())
for adresa in bsObj.findAll('div', class_="ulica"):
    print (adresa.get_text())
    adr.append(adresa.get_text())
for vrijeme in bsObj.findAll('div', class_="kada"):
    print (vrijeme.get_text())
    vri.append(vrijeme.get_text())
You can use BeautifulSoup's select method to target your various desired elements, and do whatever you want with them. In this case we are going to simplify the CSS selector pattern by using the :is() pseudo-class, but basically we are searching for any div that has class grad, ulica, or kada. As each element is returned that matches the pattern, we just sort them by which class they correspond to:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests
lokacija="http://www.hep.hr/ods/bez-struje/19?dp=koprivnica&el=124"
datum="12.02.2019"
lokacija=lokacija+"&datum="+datum
print(lokacija)
r = requests.get(lokacija)
print(type(str(r)))
print(r.status_code)
html = urlopen(lokacija)
bsObj = BeautifulSoup(html.read(),'lxml')
print("Datum radova:",datum)
print("HEP područje:",bsObj.h3.get_text())
mj=[]
adr=[]
vri=[]
hep_podrucje=bsObj.h3.get_text()
for el in bsObj.select('div:is(.grad, .ulica, .kada)'):
    if 'grad' in el.get('class'):
        print (el.get_text())
        mj.append(el.get_text())
    elif 'ulica' in el.get('class'):
        print(el.get_text())
        adr.append(el.get_text())
    elif 'kada' in el.get('class'):
        print (el.get_text())
        vri.append(el.get_text())
Note: basic explanation ahead. If you know this, skip directly to the listing of possibilities.
To change the code into a loop, you have to look at the part that stays the same and the part that varies. In your case, you find a div, get the text and append it to a list.
The class attribute of the div objects varies each time, and so does the list you append to. A for loop works by having one variable that is assigned a different value on each iteration, then executing the code within.
We get a basic structure:
for div_class in <div classes>:
    <stuff to do>
Now, in <stuff to do>, we have a different list each time. We need some way of getting a different list into the loop. For this, there are multiple possibilities:
Put the list into a dict and use item lookup
zip the lists with <div classes> and iterate over them
The first two will involve using nested loops, the result looking similar to this:
list_1 = []
list_2 = []
list_3 = []
for div_class, the_list in zip(['div_cls1', 'div_cls2', 'div_cls3'], [list_1, list_2, list_3]):
    for elem in bsObj.find_all('div', class_=div_class):
        the_list.append(elem.get_text())
or
lists = {'div_cls1': [], 'div_cls2': [], 'div_cls3': []}
for div_class in lists: # note: keys MUST match the class of div elements
    for elem in bsObj.find_all('div', class_=div_class):
        lists[div_class].append(elem.get_text())
Of course, the inner loop could be replaced by list comprehension (works for the dict approach): lists[div_class] = [elem.get_text() for elem in bsObj.find_all('div', class_=div_class)]
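A runnable sketch of the dict approach, for illustration only: the HTML below is made up and merely mimics the three div classes from the question, and html.parser is assumed as the backend so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Made-up markup that only mimics the three div classes from the question.
HTML = """
<div class="grad">Koprivnica</div>
<div class="ulica">Example Street 1</div>
<div class="kada">08:00 - 12:00</div>
<div class="grad">Koprivnica</div>
<div class="ulica">Example Street 2</div>
<div class="kada">12:00 - 14:00</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# One list per class, built by a single loop over the class names.
lists = {
    div_class: [elem.get_text() for elem in soup.find_all("div", class_=div_class)]
    for div_class in ("grad", "ulica", "kada")
}

# If the three kinds of div always come in matching order, zip() pairs them up.
rows = list(zip(lists["grad"], lists["ulica"], lists["kada"]))
print(rows)
```

The zip() step is only valid under the stated assumption that every grad has exactly one matching ulica and kada; if the page can omit one of them, the rows will drift out of alignment.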

Python SKIP element in JSON

I would like to get an element from the JSON tree
dataMod = data['products'][VARIABLE]['bla']['bla']
but I ran into a problem because one of the keys in this tree (VARIABLE) is different for every product. Is there a clean way of skipping it? Something like:
dataMod = data['products'][*]['bla']['bla']
ANSWER:
for p in data['products']:
    skipPLU = data['products'][p]
    productPLU = skipPLU['bla']['bla']
Is this what you want?
head = data['products']
for skip in head :
    for data in head[skip]:
        print(head[skip][data]['bla']['bla'])
This will get all data under data['products'][VARIABLE]
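Since the variable key itself is not needed, iterating over .values() avoids it altogether. The sketch below uses hypothetical data (the product keys and the 'other' entry are invented for illustration) and also shows one way to tolerate products that lack the expected inner keys.

```python
# Hypothetical data: the keys under 'products' vary, the inner layout does not.
data = {
    "products": {
        "A100": {"bla": {"bla": "first"}},
        "B200": {"bla": {"bla": "second"}},
        "C300": {"other": {}},  # entry without the expected keys
    }
}

# .values() skips the variable key entirely.
# The condition filters out entries that lack the nested 'bla' key.
data_mod = [
    product["bla"]["bla"]
    for product in data["products"].values()
    if "bla" in product.get("bla", {})
]
print(data_mod)
```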

Multiple results on a Xml stack with lxml (Python)

This is what I want to write to an external XML file: a for loop should add one entry per day inside <Data>, repeating the <Availability> and <Price> tags, like this:
<UpdateInventoryRequest>
<StartDate>21/12/2015</StartDate>
<RoomId>1</RoomId>
<Data>
<Availability>1</Availability>
<Price>100</Price>
<Availability>3</Availability>
<Price>120</Price>
</Data>
</UpdateInventoryRequest>
And this is my code now; every time it returns the same value in all fields:
from lxml import etree
# Create Xml
root = etree.Element("UpdateInventoryRequest")
doc = etree.ElementTree(root)
root.append(etree.Element("StartDate"))
root.append(etree.Element("RoomId"))
root.append(etree.Element("Data"))
data_root = root[2]
data_root.append(etree.Element("Availability"))
data_root.append(etree.Element("Price"))
# Xml in code
def buildXmlUpdate(dfrom, roomId, ldays):
    start_date_sard = dfrom
    roomId = str(roomId)
    room_id_sard = roomId
    for n in ldays:
        print (dfrom, roomId, n)
        ldays[-1]['avail'] = str(ldays[-1]['avail'])
        ldays[-1]['price'] =str(ldays[-1]['price'])
        availability_in_data = ldays[-1]['avail']
        price_in_data = ldays[-1]['price']
        root[0].text = start_date_sard
        root[1].text = room_id_sard
        data_root[0].text = availability_in_data
        data_root[1].text = price_in_data
#here execute the function
buildXmlUpdate('21/12/2015', 1, [{'avail': 1, 'price': 100}, {'avail': 3, 'price': 120}])
doc.write('testoutput.xml', pretty_print=True)
If it's the case that you want your script to build an XML packet as you've shown, there are a few issues.
You're doing a lot of swapping of variables around, simply to convert them to strings - for the most part you can just use the Python string conversion (str()) on demand.
In your loop, the data you are trying to deal with is in the variable n, however, when you are pulling data out, it's from the variable ldays, which means the data you are trying to put into your XML is the same, regardless of the number of times you go through the loop.
You've built an XML object with a single "Availability" element, and a single "Price" element, so there is no way, given the code you presented, you are ever going to generate multiple "Availability" and "Price" elements as in your sample XML file.
This isn't necessarily the best way to do things, but here is a potential solution, using the paradigms you've already established:
from lxml import etree
def buildXmlUpdate(dfrom, roomId, ldays):
    root = etree.Element("UpdateInventoryRequest")
    root.append(etree.Element("StartDate"))
    root[-1].text = dfrom
    root.append(etree.Element("RoomId"))
    root[-1].text = str(roomId)
    root.append(etree.Element("Data"))
    dataroot = root[-1]
    for item in ldays:
        dataroot.append(etree.Element("Availability"))
        dataroot[-1].text = str(item['avail'])
        dataroot.append(etree.Element("Price"))
        dataroot[-1].text = str(item['price'])
    return root
myroot = buildXmlUpdate('21/12/2015', 1, [{'avail': 1, 'price': 100}, {'avail': 3, 'price': 120}])
print etree.tostring(myroot, pretty_print=True)
Again, this is only one possible way to do this; there are certainly more approaches you could take.
And if you haven't already, I might suggest going through the LXML Tutorial and trying the different things they go through there, as it may help you find better ways to do what you want.
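As one example of another approach, SubElement creates a child and attaches it to its parent in a single call, which removes the need for root[-1]. The sketch below is written for Python 3 and, as an assumption made to keep it dependency-free, uses the standard library's xml.etree.ElementTree; lxml's etree offers SubElement with the same call shape. The function and variable names are illustrative.

```python
import xml.etree.ElementTree as ET


def build_xml_update(start_date, room_id, days):
    """Build an UpdateInventoryRequest element with one pair of tags per day."""
    root = ET.Element("UpdateInventoryRequest")
    # SubElement creates the child and appends it to the parent in one call.
    ET.SubElement(root, "StartDate").text = start_date
    ET.SubElement(root, "RoomId").text = str(room_id)
    data = ET.SubElement(root, "Data")
    for day in days:
        ET.SubElement(data, "Availability").text = str(day["avail"])
        ET.SubElement(data, "Price").text = str(day["price"])
    return root


request = build_xml_update(
    "21/12/2015", 1, [{"avail": 1, "price": 100}, {"avail": 3, "price": 120}]
)
xml_text = ET.tostring(request, encoding="unicode")
print(xml_text)
```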

Turning ElementTree findall() into a list

I'm using ElementTree findall() to find elements in my XML which have a certain tag. I want to turn the result into a list. At the moment, I'm iterating through the elements, picking out the .text for each element, and appending to the list. I'm sure there's a more elegant way of doing this.
#!/usr/bin/python2.7
#
from xml.etree import ElementTree
import os
myXML = '''<root>
<project project_name="my_big_project">
<event name="my_first_event">
<location>London</location>
<location>Dublin</location>
<location>New York</location>
<month>January</month>
<year>2013</year>
</event>
</project>
</root>
'''
tree = ElementTree.fromstring(myXML)
for node in tree.findall('.//project'):
    for element in node.findall('event'):
        event_name=element.attrib.get('name')
        print event_name
        locations = []
        if element.find('location') is not None:
            for events in element.findall('location'):
                locations.append(events.text)
        # Could I use something like this instead?
        # locations.append(''.join.text(*events) for events in element.findall('location'))
        print locations
This outputs the following, which is correct, but I'd like to assign the findall() results directly to a list, in text format, if possible:
my_first_event
['London', 'Dublin', 'New York']
You can try this - it uses a list comprehension to generate the list without having to create a blank one and then append.
if element.find('location') is not None:
    locations = [events.text for events in element.findall('location')]
With this, you can also get rid of the locations definition above, so your code would be:
tree = ElementTree.fromstring(myXML)
for node in tree.findall('.//project'):
    for element in node.findall('event'):
        event_name=element.attrib.get('name')
        print event_name
        if element.find('location') is not None:
            locations = [events.text for events in element.findall('location')]
        print locations
One thing you will want to be wary of is what you are doing with locations - it won't be defined if location doesn't exist, so you will get a NameError if you try to print it and it doesn't exist. If that is an issue, you can retain the locations = [] definition - if the matching element isn't found, the result will just be an empty list.
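A further simplification worth considering: findall() returns an empty list when nothing matches, so the comprehension can run without the find() guard and locations is always defined. The sketch below is written for Python 3; the second event, event_without_locations, is a hypothetical addition to show the empty case.

```python
from xml.etree import ElementTree

MY_XML = """<root>
<project project_name="my_big_project">
<event name="my_first_event">
<location>London</location>
<location>Dublin</location>
</event>
<event name="event_without_locations">
<month>January</month>
</event>
</project>
</root>"""

tree = ElementTree.fromstring(MY_XML)

# findall() returns [] when nothing matches, so no guard is needed.
locations_by_event = {
    event.get("name"): [loc.text for loc in event.findall("location")]
    for event in tree.findall(".//project/event")
}
print(locations_by_event)
```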
