ElementTree: parsed XML value doesn't appear in QLineEdit - python

I am writing a QGIS plugin that opens and parses an XML file from a local or removable disk. This is the code I use to open the XML file:
from PyQt4 import QtCore, QtGui
from ui_testparse import Ui_testparse
import xml.etree.ElementTree as ETree

# create the dialog for zoom to point
class testparseDialog(QtGui.QDialog):
    def __init__(self):
        QtGui.QDialog.__init__(self)
        # Set up the user interface from Designer.
        self.ui = Ui_testparse()
        self.ui.setupUi(self)
        opendata = self.ui.btnCari
        QtCore.QObject.connect(opendata, QtCore.SIGNAL('clicked()'), self.openxml)

    def openxml(self, event=None):
        # open dialog
        openfile = QtGui.QFileDialog.getOpenFileName(self, 'Open File', '*.xml')
        self.ui.lineLokasi.setText(openfile)
        # call XML data
        self.isiData(openfile)

    def isiData(self, nmsatu):
        # open text in read mode
        openteks = open(nmsatu, 'r').read()
        self.ui.textXml.setText(openteks)
To parse the XML after that, I tried ElementTree. This is the code I use to parse the XML read above:
        # Parse XML from above
        self.parsenow(openteks)

    def parsenow(self, parse):
        element = ETree.fromstring(parse)
        xml_obj = ETree.ElementTree(element)
        for title_obj in xml_obj.findall('./{gmd#}dateStamp/{gco#}Date'):
            print element
            self.ui.lineSkala.setText(element)
The XML I want to parse has this format:
<gmd:datestamp>
<gco:Date> XML Date </gco:Date>
I want to show the XML date in lineSkala (a QLineEdit) in Qt. When I run the plugin it opens and reads the XML, but lineSkala stays blank and no error message is shown.
What am I missing?
Thanks in advance for your help.

The XPath syntax supported by ElementTree is quite limited. Also, you must either supply a prefix dictionary when using find/findall (although this is not properly documented in Python 2), or use the full namespace URI.
So try something like:
ns = {
    'gmd': 'http://www.isotc211.org/2005/gmd',
    'gco': 'http://www.isotc211.org/2005/gco',
}
tree.findall('.//gmd:dateStamp/gco:Date', ns)
or:
tree.findall('.//{http://www.isotc211.org/2005/gmd}dateStamp/'
             '{http://www.isotc211.org/2005/gco}Date')
PS:
If you need to use more sophisticated XPath syntax, try lxml, which has a very similar API to ElementTree, but many more features.
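As a minimal, self-contained sketch of the prefix-dictionary approach: the sample document and its date value below are made up for illustration, and the root element name is an assumption. Note that element names are case-sensitive (dateStamp, not datestamp) and that the text to display is the matched element's .text, not the element object itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the namespace URIs come from the answer above.
SAMPLE = """\
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:dateStamp>
    <gco:Date>2014-05-01</gco:Date>
  </gmd:dateStamp>
</gmd:MD_Metadata>
"""

# Map the prefixes used in the query to the full namespace URIs.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

root = ET.fromstring(SAMPLE)

# Collect the text of every matching element.
dates = [el.text.strip() for el in root.findall(".//gmd:dateStamp/gco:Date", NS)]
print(dates)
```

In the dialog from the question, a string such as dates[0] is what would be passed to self.ui.lineSkala.setText(...).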

Related

How to better read and parse an xml file using Python and SAX?

Windows 11/Python 3.8.10 - Using Spyder Python IDE and PyCharm
Hey all, newish to python app dev and have a big project to parse xml files. Trying to write a python program for it. Below is a very small sample of the xml file data structure I am working with.
<PillCall XMLInstanceID="98089D9A-768A-4FA0-A7CD-DC5966EB5B06" PillCallID="49" VersionNumber="1.2">
</PillCall>
These XML files will be huge. Eventually this will need to process multiple large files with a lot of data 24/7, concurrently: parsing the data and saving it to a db, then, after modification, creating a new modified XML file based on the current data in the db.
Here is my sample program, from Python Spyder IDE: -- I have tried a bunch of other methods but the SAX method has been the best to understand for me personally so far. I am sure there are better ways though.
import xml.sax

class XMLHandler(xml.sax.ContentHandler):
    def __init__(self):
        self.CurrentData = ""
        self.pillcall = ""
        self.pillcallid = ""
        self.vernum = ""

    # Call when an element starts
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if(tag == "PillCall"):
            print("*****PillCall*****")
            title = attributes["XMLInstanceID"]
            print("XMLInstanceID:=", title) #How to add multiple values/strings here?
            # print(sorted()

# create an XMLReader
parser = xml.sax.make_parser()
# turn off namespaces
parser.setFeature(xml.sax.handler.feature_namespaces, 0)
# override the default ContentHandler
Handler = XMLHandler()
parser.setContentHandler( Handler )
parser.parse("xmltest10.xml")
My output is this:
PillCall
XMLInstanceID:= 98089D9A-768A-4FA0-A7CD-DC5966EB5B06
I have tried many different ways to read the whole string with ElementTree and BeautifulSoup but can't get it to work. I also get no output when running this program in PyCharm.
Here is some extra Python/SAX code that I have been messing with as well, but I haven't got it to work right either.
I just need to be able to clearly read the data and parse it to a new file for now, and also to loop through it and find all the data to output. Thanks for any and all help!!
    # Call when an elements ends
    def endElement(self, tag):
        if(self.CurrentData != "/PillCall"):
            print("End of PillCall:", self.pillcall)
        elif(self.CurrentData == "PillCallID"):
            print("PillCallID:=", self.pillcallid)
        elif(self.CurrentData == "VersionNumber"):
            print("VersionNumber:=", self.vernum)
        self.CurrentData = ""

    # Call when a character is read
    def characters(self, content):
        if(self.CurrentData == "PillCall"):
            self.pillcall = content
        elif(self.CurrentData == "qty"):
            self.pillcallid = content
        elif(self.CurrentData == "company"):
            self.vernum = content
Using BeautifulSoup's find_all may be what you're looking for...
Given:
text = """
<PillCall XMLInstanceID="98089D9A-768A-4FA0-A7CD-DC5966EB5B06" PillCallID="49" VersionNumber="1.2">
</PillCall>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(text, 'xml')
for result in soup.find_all('PillCall'):
    print(result.attrs)
Output:
{'PillCallID': '49',
'VersionNumber': '1.2',
'XMLInstanceID': '98089D9A-768A-4FA0-A7CD-DC5966EB5B06'}
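Since the question asks about SAX specifically, the same attributes can also be read inside the handler. The sketch below is one possible way to do it, not the asker's final design: the class name AttributeCollector and the records list are hypothetical names, and the XML is inlined as bytes instead of being read from xmltest10.xml.

```python
import xml.sax

# The sample element from the question, inlined as bytes.
SAMPLE = (
    b'<PillCall XMLInstanceID="98089D9A-768A-4FA0-A7CD-DC5966EB5B06" '
    b'PillCallID="49" VersionNumber="1.2">\n</PillCall>'
)


class AttributeCollector(xml.sax.ContentHandler):
    """Collects the attributes of every PillCall element as a dict."""

    def __init__(self):
        super().__init__()
        self.records = []

    def startElement(self, tag, attributes):
        if tag == "PillCall":
            # attributes behaves like a mapping; items() yields (name, value) pairs.
            self.records.append(dict(attributes.items()))


handler = AttributeCollector()
xml.sax.parseString(SAMPLE, handler)

for record in handler.records:
    for name, value in record.items():
        print(name, "=", value)
```

Because SAX streams the document instead of building a tree, this style tends to suit the very large files mentioned in the question; for a file on disk, xml.sax.parse(path, handler) takes the place of parseString.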

Displaying XML attributes in PHP

I am a beginner in coding and have the question below. I would appreciate any help.
I have the Python code below that requests information about an organization.
Note: the commented-out "target" variable is for future use, when I pass the user input from PHP to this Python script.
import requests, sys
#target = sys.argv[1]
target = "logitech"
request = requests.get('http://whois.arin.net/rest/nets;name={}'.format(target))
print(request.text)
The output is similar to this but the number of "netRef" tags may vary depending on the organization.
<?xml version='1.0'?><?xml-stylesheet type='text/xsl' href='http://whois.arin.net/xsl/website.xsl' ?><nets xmlns="http://www.arin.net/whoisrws/core/v1" xmlns:ns2="http://www.arin.net/whoisrws/rdns/v1" xmlns:ns3="http://www.arin.net/whoisrws/netref/v2" copyrightNotice="Copyright 1997-2020, American Registry for Internet Numbers, Ltd." inaccuracyReportUrl="https://www.arin.net/resources/registry/whois/inaccuracy_reporting/" termsOfUse="https://www.arin.net/resources/registry/whois/tou/"><limitExceeded limit="256">false</limitExceeded>
<netRef endAddress="173.8.217.111" startAddress="173.8.217.96" handle="NET-173-8-217-96-1" name="LOGITECH">https://whois.arin.net/rest/net/NET-173-8-217-96-1</netRef>
<netRef endAddress="50.193.49.47" startAddress="50.193.49.32" handle="NET-50-193-49-32-1" name="LOGITECH">https://whois.arin.net/rest/net/NET-50-193-49-32-1</netRef></nets>
I was wondering, is it possible to display only the endAddress and startAddress attributes in PHP?
I've tried using the xml.etree.ElementTree module, but because the request variable is a Response object rather than bytes, I can't parse the XML directly into an element.
My PHP code currently looks like this, as I am unsure how to proceed. testapi.py refers to the Python code above.
<?php
$output1 = shell_exec('python testapi.py');
echo $output1;
?>
My desired output on the PHP side is as follows:
IP range: 173.8.217.96-173.8.217.111, 50.193.49.32-50.193.49.47
I would gladly appreciate any help, Thank You.
Python's ElementTree provides the fromstring function to parse XML from text (here, request.text). From there you can query the content; be sure to assign a prefix to the document's default namespace:
xmlns="http://www.arin.net/whoisrws/core/v1"
import requests as rq
import xml.etree.ElementTree as ET
request = rq.get('http://whois.arin.net/rest/nets;name=logitech')
tree = ET.fromstring(request.text)
nmsp = {"doc": "http://www.arin.net/whoisrws/core/v1"}
for elem in tree.findall(".//doc:netRef", nmsp):
    print(f"endAddress: {elem.attrib['endAddress']}")
    print(f"startAddress: {elem.attrib['startAddress']}")
    print("---------------------------\n")
# endAddress: 173.8.217.111
# startAddress: 173.8.217.96
# ---------------------------
# endAddress: 50.193.49.47
# startAddress: 50.193.49.32
# ---------------------------
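To get the single "IP range: ..." line the question asks for, the Python script could build that string itself so the PHP side only has to echo it. The following is a sketch under that assumption; it uses a trimmed copy of the sample response from the question instead of a live request, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample response from the question (no network access needed).
SAMPLE = """\
<nets xmlns="http://www.arin.net/whoisrws/core/v1">
<limitExceeded limit="256">false</limitExceeded>
<netRef endAddress="173.8.217.111" startAddress="173.8.217.96" handle="NET-173-8-217-96-1" name="LOGITECH">https://whois.arin.net/rest/net/NET-173-8-217-96-1</netRef>
<netRef endAddress="50.193.49.47" startAddress="50.193.49.32" handle="NET-50-193-49-32-1" name="LOGITECH">https://whois.arin.net/rest/net/NET-50-193-49-32-1</netRef>
</nets>
"""

# Prefix chosen here for the document's default namespace.
NS = {"doc": "http://www.arin.net/whoisrws/core/v1"}

root = ET.fromstring(SAMPLE)

# One "start-end" string per netRef element, in document order.
ranges = [
    f"{ref.get('startAddress')}-{ref.get('endAddress')}"
    for ref in root.findall("doc:netRef", NS)
]

line = "IP range: " + ", ".join(ranges)
print(line)
# IP range: 173.8.217.96-173.8.217.111, 50.193.49.32-50.193.49.47
```

With a live request, ET.fromstring(request.text) would replace the inlined sample, and the existing shell_exec call in PHP would receive the printed line.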

parse XML with Python, key value namespaces

I have an XML file downloaded from WordPress that is structured like this:
<wp:postmeta>
<wp:meta_key><![CDATA[country]]></wp:meta_key>
<wp:meta_value><![CDATA[Germany]]></wp:meta_value>
</wp:postmeta>
My goal is to look through the XML file for all the country keys and print their values. I'm completely new to the XML library, so I'm not sure where to take it from here.
# load libraries
# importing os to handle directory functions
import os
# import XML handlers
from xml.etree import ElementTree
# importing json to handle structured data saving
import json
# dictonary with namespaces
ns = {'wp:meta_key', 'wp:meta_value'}
tree = ElementTree.parse('/var/www/python/file.xml')
root = tree.getroot()
# item
for item in root.findall('wp:post_meta', ns):
    print '- ', item.text
print "Finished running"
This throws an error about using wp as a namespace, but I'm not sure where to go from here; the documentation is unclear to me. Any help is appreciated.
Downvoters please let me know how I can improve my question.
I don't know XML, but I can treat it as a string like this.
from simplified_scrapy import SimplifiedDoc, req, utils
xml = '''
<wp:postmeta>
<wp:meta_key><![CDATA[country]]></wp:meta_key>
<wp:meta_value><![CDATA[Germany]]></wp:meta_value>
</wp:postmeta>
'''
doc = SimplifiedDoc(xml)
kvs = doc.select('wp:postmeta').selects('wp:meta_key|wp:meta_value').html
print (kvs)
Result:
['<![CDATA[country]]>', '<![CDATA[Germany]]>']
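For comparison, here is a sketch of how the same lookup might be done with ElementTree and a real namespace mapping (a dict from prefix to URI, rather than a set of tag names). The namespace URI below is an assumption: it has to match whatever the xmlns:wp declaration at the top of the actual export file says, and the surrounding rss/channel/item structure and the second postmeta entry are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; copy the real one from the xmlns:wp attribute of the export.
WP_URI = "http://wordpress.org/export/1.2/"

# Hypothetical, cut-down export file.
SAMPLE = f"""\
<rss xmlns:wp="{WP_URI}">
  <channel>
    <item>
      <wp:postmeta>
        <wp:meta_key><![CDATA[country]]></wp:meta_key>
        <wp:meta_value><![CDATA[Germany]]></wp:meta_value>
      </wp:postmeta>
      <wp:postmeta>
        <wp:meta_key><![CDATA[city]]></wp:meta_key>
        <wp:meta_value><![CDATA[Berlin]]></wp:meta_value>
      </wp:postmeta>
    </item>
  </channel>
</rss>
"""

# The namespaces argument is a dict mapping prefix -> URI.
NS = {"wp": WP_URI}

root = ET.fromstring(SAMPLE)

countries = []
for meta in root.findall(".//wp:postmeta", NS):
    # CDATA sections are returned as ordinary text by the parser.
    if meta.findtext("wp:meta_key", namespaces=NS) == "country":
        countries.append(meta.findtext("wp:meta_value", namespaces=NS))

print(countries)
```

Note that the element is named postmeta in the file, while the code in the question searches for post_meta.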

Using python to get data (text) from wix

I'm making a Python project for which I created a test Wix website.
I want to get the data (text) from the Wix website using urllib,
so I did:
url.urlopen(ADDRESS).readlines()
The problem is that it did not give me any of the text on the page, only the HTML structure of the page.
How would I extract the requested text from the website?
I think you'll need to end up parsing the html for the information you want. Check out this python library:
https://docs.python.org/3/library/html.parser.html
You could potentially do something like this:
from html.parser import HTMLParser
rel_data = []
class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        rel_data.append(data)

parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')
print(rel_data)
Output:
['Test', 'Parse me!']
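On a real page, handle_data also receives the contents of script and style elements and a lot of whitespace. A possible refinement, sketched below with a hypothetical TextExtractor class and made-up markup, skips those. If the page's text is injected by JavaScript after loading, it will not be in the fetched HTML at all, and no HTML parser can recover it.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is not inside script/style.
        if text and not self._skip_depth:
            self.chunks.append(text)


parser = TextExtractor()
parser.feed(
    "<html><head><title>Test</title><style>h1 {color: red}</style></head>"
    "<body><h1>Parse me!</h1><script>var x = 1;</script></body></html>"
)
parser.close()
print(parser.chunks)
```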

Use Minidom to parse XML But just crashes applet

I'm having some issues using Minidom to parse an XML file on a remote server.
This is the XML I am trying to parse:
<mod n="1">
<body>
Random Body information will be here
</body>
<b>1997-01-27</b>
<d>1460321480</d>
<l>United Kingdom</l>
<s>M</s>
<t>About Denisstoff</t>
</mod>
I'm trying to return the <d> values with Minidom. This is the code I am trying to use to find the value:
expired = True
f = urlreq.urlopen("http://st.chatango.com/profileimg/"+args[:1]+"/"+args[1:2]+"/"+args+"/mod1.xml")
data = f.read().decode("utf-8")
dom = minidom.parseString(data)
itemlist = dom.getElementsByTagName('d')
print(itemlist)
It shows that the element is there, but when I followed a way to read the data that I found here (below), it just crashed my Python app. This is the code I tried:
for s in itemlist:
    if s.hasAttribute('d'):
        print(s.attributes['d'].value)
This is the crash:
AttributeError: 'NodeList' object has no attribute 'value'
I also tried ElementTree but that didn't return any data at all. I have tested the URL and it's correct for the data I want, but I just can't get it to read the data in the tags. Any and all help is appreciated.
If you want to print the values from this XML, use this:
for s in itemlist:
    if hasattr(s.childNodes[0], "data"):
        print(s.childNodes[0].data)
I hope it helps :D
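The reason the attribute lookup fails is that d is an element name, not an attribute: the value lives in the text node inside the element. A self-contained sketch of that idea, using the sample XML from the question inlined as a string instead of fetched from the remote URL:

```python
from xml.dom import minidom

# Sample document from the question, inlined so no network access is needed.
SAMPLE = """<mod n="1">
<body>
Random Body information will be here
</body>
<b>1997-01-27</b>
<d>1460321480</d>
<l>United Kingdom</l>
<s>M</s>
<t>About Denisstoff</t>
</mod>"""

dom = minidom.parseString(SAMPLE)

values = []
for node in dom.getElementsByTagName("d"):
    child = node.firstChild
    # <d> has no attributes; its value is the text node it contains.
    if child is not None and child.nodeType == child.TEXT_NODE:
        values.append(child.data.strip())

print(values)
```

The firstChild check guards against an empty element, where indexing childNodes[0] would raise an IndexError.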
