parse Dutch NDW xml - python

I am trying to parse the XML file from the Dutch NDW, which publishes the traffic speed on many Dutch motorways every minute. I use this example file: http://www.ndw.nu/downloaddocument/e838c62446e862f5b6230be485291685/Reistijden.zip
I am trying to read the travel-time data into variables with Python, but I am struggling.
from xml.etree import ElementTree
import urllib2
url = "http://weburloffile.nl/ndw/Reistijden.xml"
response = urllib2.urlopen(url)
namespaces = {
    'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
    'a': 'http://datex2.eu/schema/2/2_0'
}
dom = ElementTree.fromstring(response.read)
names = dom.findall(
    'soap:Envelope'
    '/a:duration',
    namespaces,
)
#print names
for duration in names:
    print(duration.text)
I get this new error
Traceback (most recent call last):
File "test.py", line 9, in <module>
dom = ElementTree.fromstring(response.read)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1311, in XML
parser.feed(text)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1651, in feed
self._parser.Parse(data, 0)
TypeError: Parse() argument 1 must be string or read-only buffer, not instancemethod
How do I parse this (complex) XML correctly?
Edit: changed it to read, as suggested in a comment.

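The problem isn't the XML parsing; it's that you are using the response object incorrectly. urllib2.urlopen returns a file-like object that has no content attribute, and read is a method, so response.read without parentheses passes the method itself to fromstring (hence "not instancemethod" in the TypeError). Call it instead: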
dom = ElementTree.fromstring(response.read())
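Once the document parses, the findall path in the question may still return nothing, because the element returned by fromstring is already the soap:Envelope root. The following is a minimal sketch of a namespace-aware search, written for Python 3. The SAMPLE document is a hypothetical, heavily simplified stand-in for the NDW feed (the element names other than duration are assumptions), so the path may need adjusting for the real file.

```python
from xml.etree import ElementTree

# Hypothetical, heavily simplified stand-in for the NDW feed; the real
# document has many more levels and elements.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <d2LogicalModel xmlns="http://datex2.eu/schema/2/2_0">
      <travelTime><duration>61.0</duration></travelTime>
      <travelTime><duration>48.5</duration></travelTime>
    </d2LogicalModel>
  </soap:Body>
</soap:Envelope>"""

namespaces = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "a": "http://datex2.eu/schema/2/2_0",
}

# fromstring() needs the document itself (bytes or str), which is why
# read() must be called when the data comes from urlopen().
dom = ElementTree.fromstring(SAMPLE)

# dom already is soap:Envelope, so paths are relative to it;
# ".//" searches all descendants for a:duration.
durations = [el.text for el in dom.findall(".//a:duration", namespaces)]
print(durations)
```

The prefix a in the path only has to match a key in the namespaces dictionary; it does not need to match any prefix used in the file.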

Related

Error while parsing XML document in Python

I'm trying to write a simple script that parses my XML document to get the name of every <xs:element> tag. I'm using minidom (is there a better way?). Here is my code so far:
import csv
from xml.dom import minidom
xmldoc = minidom.parse('core.xml')
core = xmldoc.getElementsByTagName('xs:element')
print(len(core))
print(core[0].attributes['name'].value)
for x in core:
    print(x.attributes['name'].value)
I'm getting this error:
Traceback (most recent call last):
File "C:/Users/user/Desktop/XML Parsing/test.py", line 9, in <module>
print(core[0].attributes['name'].value)
File "C:\Python27\lib\xml\dom\minidom.py", line 522, in __getitem__
return self._attrs[attname_or_tuple]
KeyError: 'name'
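getElementsByTagName returns a NodeList, so the index is still required; core.attributes would raise an AttributeError. The KeyError means the first xs:element in the document has no name attribute (in a schema this is typical of elements declared with ref instead of name). Check for the attribute before reading it:
if core[0].hasAttribute('name'):
    print(core[0].getAttribute('name'))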
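A KeyError: 'name' from minidom generally indicates that the node in question simply has no name attribute. Here is a minimal sketch, written for Python 3, of collecting names while skipping such nodes. The XSD string is a hypothetical fragment invented for illustration, not the asker's core.xml.

```python
from xml.dom import minidom

# Hypothetical schema fragment: the nested element uses ref= and has no name.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="title"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="title" type="xs:string"/>
</xs:schema>"""

xmldoc = minidom.parseString(XSD)
core = xmldoc.getElementsByTagName("xs:element")  # a NodeList, in document order

# Check each node individually; hasAttribute avoids the KeyError.
names = [x.getAttribute("name") for x in core if x.hasAttribute("name")]
print(names)
```

getAttribute returns an empty string for a missing attribute, so an alternative is to call it unconditionally and filter out empty results.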

Python - write Xml (formatted)

I wrote this Python script to create XML content, and I would like to write the "prettified" XML to a file (50% done):
My script so far:
data = ET.Element("data")
project = ET.SubElement(data, "project")
project.text = "This project text"
rawString = ET.tostring(data, "utf-8")
reparsed = xml.dom.minidom.parseString(rawString)
cleanXml = reparsed.toprettyxml(indent=" ")
# This prints the prettified xml i would like to save to a file
print cleanXml
# This part does not work, the only parameter i can pass is "data"
# But when i pass "data" as a parameter, a xml-string is written to the file
tree = ET.ElementTree(cleanXml)
tree.write("config.xml")
The error I get when I pass cleanXml as a parameter:
Traceback (most recent call last):
File "app.py", line 45, in <module>
tree.write("config.xml")
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 817, in write
self._root, encoding, default_namespace
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 876, in _namespaces
iterate = elem.getiterator # cET compatibility
AttributeError: 'unicode' object has no attribute 'getiterator'
Does anybody know how I can write my prettified XML to a file?
Thanks and Greetings!
The ElementTree constructor takes a root element (an Element) and optionally a file, not a string of XML. To build an element from a string, use ElementTree.fromstring, which returns an Element.
However, that isn't what you want. Just open a file and write the string directly:
with open("config.xml", "w") as config_file:
    config_file.write(cleanXml)
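Putting the pieces together, a minimal end-to-end sketch might look like the following. It is written for Python 3 (the question uses Python 2), and it writes to a temporary directory, which is an assumption made only so the example does not overwrite a real config.xml.

```python
import os
import tempfile
import xml.dom.minidom
import xml.etree.ElementTree as ET

data = ET.Element("data")
project = ET.SubElement(data, "project")
project.text = "This project text"

# Serialize with ElementTree, then re-parse with minidom to pretty-print.
raw = ET.tostring(data, encoding="utf-8")  # bytes in Python 3
clean_xml = xml.dom.minidom.parseString(raw).toprettyxml(indent="  ")

# Temporary path used only for this sketch; substitute "config.xml" as needed.
path = os.path.join(tempfile.mkdtemp(), "config.xml")
with open(path, "w", encoding="utf-8") as config_file:
    config_file.write(clean_xml)  # a plain string write; no ElementTree needed
```

The pretty-printed result is an ordinary string, which is why a regular file write is enough here.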

using rest and parsing xml in python

I have a Python script that makes a REST GET call and stores the XML response in a variable named response. When I try to get the root of the XML, it fails with the error below. If I just print the body with print response.read(), I get the response body correctly. What could be wrong here? Could you please help?
import urllib
import urllib2
import xml.etree.ElementTree as ET
url = "http://192.168.1.1/health"
headers = {"Content-Type":"application/xml"}
request = urllib2.Request(url)
for key, value in headers.items():
    request.add_header(key, value)
response = urllib2.urlopen(request)
#print response.read()
root = ET.fromstring(response)
#print root
Here is the error when executing the script
~]# python test4.py
Traceback (most recent call last):
File "test4.py", line 24, in <module>
root = ET.fromstring(response)
File "/usr/lib64/python2.6/xml/etree/ElementTree.py", line 963, in XML
parser.feed(text)
File "/usr/lib64/python2.6/xml/etree/ElementTree.py", line 1245, in feed
self._parser.Parse(data, 0)
TypeError: Parse() argument 1 must be string or read-only buffer, not instance
Change this
root = ET.fromstring(response)
to
root = ET.fromstring(response.read())
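The distinction is that fromstring expects the document text, while parse accepts a file-like object. A minimal sketch for Python 3 follows; io.BytesIO stands in for the object returned by urlopen, and the health payload is a hypothetical example.

```python
import io
import xml.etree.ElementTree as ET

# io.BytesIO stands in for the urlopen() result; both expose read().
# The XML payload is hypothetical.
response = io.BytesIO(b"<health><status>ok</status></health>")

# Option 1: read the body and hand the bytes to fromstring().
root = ET.fromstring(response.read())

# Option 2: let ElementTree read from the file-like object itself.
response.seek(0)  # rewind only because this sketch reuses the same buffer
root2 = ET.parse(response).getroot()

print(root.tag, root2.find("status").text)
```

A real HTTP response can only be read once, so in practice pick one of the two options rather than both.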

How to extract information from json?

I am trying to extract some information from JSON data. In the following code, I first extract the part of the JSON data that contains the information I want and store it in a file. Then I try to open this file and get the error that follows my code. Can you help me find where I went wrong?
import json
import re
input_file = 'path'
text = open(input_file).read()
experience = re.findall(r'Experience":{"positionsMpr":{"showSection":true," (.+?),"visible":true,"find_title":"Find others',text)
output_file = open ('/home/evi.nastou/Documenten/LinkedIn_data/Alewijnse/temp', 'w')
output_file.write('{'+experience[0]+'}')
output_file.close()
text = open('path/temp')
input_text = text.read()
data = json.load(input_text)
positions = json.dumps([s['companyName'] for s in data['positions']])
print positions
Error:
Traceback (most recent call last):
File "test.py", line 13, in <module>
data = json.load(input_text)
File "/home/evi.nastou/.pythonbrew/pythons/Python-2.7.2/lib/python2.7/json/__init__.py", line 274, in load
return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'
You want to use json.loads() (note the s), or pass in the file object instead of the result of .read():
text = open('path/temp')
data = json.load(text)
json.load() takes an open file object, but you were passing it a string; json.loads() takes a string.
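The two functions can be compared side by side in a short sketch (Python 3). The payload below is hypothetical and merely shaped like the data['positions'] access in the question; io.StringIO stands in for an open file.

```python
import io
import json

# Hypothetical payload shaped like the data['positions'] access in the question.
raw = '{"positions": [{"companyName": "Acme"}, {"companyName": "Globex"}]}'

from_string = json.loads(raw)            # loads: takes a string
from_file = json.load(io.StringIO(raw))  # load: takes an object with .read()

companies = [p["companyName"] for p in from_string["positions"]]
print(companies)
```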

Upper limit of fromstring function in ElementTree

I'm using Python 2.4 version on a Windows 32-bit PC. I'm trying to parse through a very large XML file using the ElementTree module. I downloaded version 1.2.6 of this module from effbot.org.
I used the code below for my purpose:
import elementtree.ElementTree as ET
input = ''' 001 Chuck 009 Brent '''
stuff = ET.fromstring(input)
lst = stuff.findall("users/user")
print len(lst)
for item in lst:
    print item.attrib["x"]
item = lst[0]
ET.dump(item)
item.get("x") # get works on attributes
item.find("id").text
item.find("id").tag
for user in stuff.getiterator('user'):
    print "User" , user.attrib["x"]
    ET.dump(user)
If the content of input is too large, more than 10,000 lines, the fromstring function raises an error (below). Can anyone help me out in rectifying this error?
This is the error generated:
Traceback (most recent call last):
  File "C:\Documents and Settings\hariprar\My Documents\My files\Python Try\xml_try1.py", line 16, in -toplevel-
    stuff = ET.fromstring(input)
  File "C:\Python24\Lib\site-packages\elementtree\ElementTree.py", line 1012, in XML
    return api.fromstring(text)
  File "C:\Python24\Lib\site-packages\elementtree\ElementTree.py", line 182, in fromstring
    parser.feed(text)
  File "C:\Python24\Lib\site-packages\elementtree\ElementTree.py", line 1292, in feed
    self._parser.Parse(data, 0)
ExpatError: not well-formed (invalid token): line 2445, column 39
Take a look at the iterparse function. It will let you parse your input incrementally rather than reading it into memory as one big chunk.
It's described here: http://effbot.org/zone/element-iterparse.htm
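A minimal sketch of incremental parsing follows, written for Python 3 with the standard-library xml.etree.ElementTree rather than the standalone elementtree 1.2.6 package from the question. The sample documents and the user_ids helper are hypothetical. Note that the traceback reports "not well-formed (invalid token)" at a specific line and column, which suggests a malformed character at that position rather than a size limit, so the sketch also shows how to recover the position of such an error.

```python
import io
import xml.etree.ElementTree as ET


def user_ids(source):
    """Yield the text of <id> for each <user>, freeing elements as we go."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "user":
            yield elem.findtext("id")
            elem.clear()  # drop already-processed children to limit memory use


# Hypothetical well-formed input.
good = io.BytesIO(
    b"<users><user><id>001</id></user><user><id>009</id></user></users>"
)
ids = list(user_ids(good))

# Hypothetical malformed input: a bare '&' is not well-formed XML.
bad = io.BytesIO(b"<users><user><id>0 & 1</id></user></users>")
try:
    list(user_ids(bad))
    error_position = None
except ET.ParseError as exc:
    error_position = exc.position  # (line, column) of the offending token

print(ids, error_position)
```

Incremental parsing helps with memory, but it will still stop at malformed input; inspecting the reported line and column in the source file is usually the quickest way to find the offending character.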
