I have an XML file I am trying to parse, and I want to access one element of it: DonorAdvisedFundInd. That shouldn't be a problem, but when I try to parse the XML file I get an error message saying:
[Errno 36] File name too long
Here's the code I'm currently using. I cut most of it so the problem is easier to see. The error occurs on the parse line.
import pandas as pd
import xml.etree.ElementTree as et
import requests
xml_data = requests.get("https://s3.amazonaws.com/irs-form-990/201903199349320465_public.xml").content
xtree = et.parse(xml_data)
Now the reason I'm so confused is if you open that link, the XML file really isn't all that long. It should be able to be parsed. I'm using IBM Watson Studio's online compiler if it makes any difference.
I'd appreciate any insight or feedback anyone can provide.
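et.parse() expects a file name or a file object, so the downloaded bytes are being treated as a file name. Try fromstring, which parses XML that is already in memory: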
import pandas as pd
import xml.etree.ElementTree as et
import requests
xml_data = requests.get("https://s3.amazonaws.com/irs-form-990/201903199349320465_public.xml").content
xtree = et.fromstring(xml_data)
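If a full ElementTree object is needed rather than just the root element, the bytes can also be wrapped in a file-like object and handed to parse(). A minimal sketch of both options; sample_bytes is a made-up stand-in for the downloaded content, not the real file:

```python
import io
import xml.etree.ElementTree as et

# Made-up stand-in for the bytes returned by requests.get(...).content
sample_bytes = b"<Return><ReturnData><Flag>true</Flag></ReturnData></Return>"

# Option 1: fromstring() parses XML held in memory and returns the root Element
root_from_string = et.fromstring(sample_bytes)

# Option 2: parse() needs a file name or file-like object, so wrap the bytes
tree = et.parse(io.BytesIO(sample_bytes))
root_from_parse = tree.getroot()

print(root_from_string.tag, root_from_parse.tag)
```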
Update (for finding the specific element):
for i in xtree.findall(".//"):
    if 'DonorAdvisedFundInd' in i.tag:
        print(i.tag, i.attrib, i.text)
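If the tags in the file are namespace-qualified (which the substring test above suggests), another option is to pass a prefix map to find(). This is only a sketch: the namespace URI and the trimmed sample document below are assumptions for illustration, so check the xmlns attribute on the real root element first.

```python
import xml.etree.ElementTree as et

# Assumed namespace URI; check the xmlns attribute of the real file's root
NS = {"irs": "http://www.irs.gov/efile"}

# Made-up, heavily trimmed sample standing in for the downloaded XML
sample = b"""<Return xmlns="http://www.irs.gov/efile">
  <ReturnData>
    <IRS990>
      <DonorAdvisedFundInd>false</DonorAdvisedFundInd>
    </IRS990>
  </ReturnData>
</Return>"""

root = et.fromstring(sample)

# ".//" searches all descendants; "irs:" is resolved through the NS mapping
node = root.find(".//irs:DonorAdvisedFundInd", NS)
value = node.text if node is not None else None
print(value)
```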
Another way would be to use the xmltodict library, like this:
import xmltodict
result = xmltodict.parse(xml_data)
result['Return']['ReturnData']['IRS990']['DonorAdvisedFundInd']
Currently I am using the following code to read XML files and extract data.
import pandas as pd
import numpy as np
import xml.etree.cElementTree as et
import datetime
tree=et.parse(r'/data/dump_xml/myfile1.xml')
root=tree.getroot()
NAME = []
for name in root.iter('name'):
    NAME.append(name.text)
UPDATE = []
for update in root.iter('lastupdate'):
    UPDATE.append(update.text)
updated = datetime.datetime.fromtimestamp(int(UPDATE[0]))
lastupdate=updated.strftime('%Y-%m-%d %H:%M:%S')
ParaValue = []
for parameterevalue in root.iter('value'):
    ParaValue.append(parameterevalue.text)
print(ParaValue[0])
print(ParaValue[1])
print(lastupdate,NAME[0],ParaValue[0])
print(lastupdate,NAME[1],ParaValue[1])
From one file I get the following output:
2022-05-23 11:25:01 traffic_in 1.5012356187e+05
2022-05-23 11:25:01 traffic_out 1.7723777592e+05
But I have a set of XML files in /data/dump_xml/ and I need to get the results for all of them as shown above, together with the file name, and export everything as a DataFrame. Can someone help me do this for the whole directory?
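One possible approach, sketched under the assumption that every file has the same layout as the one above (a lastupdate element plus name and value elements in matching order): loop over the directory with glob and collect one row per name. collect_rows is a hypothetical helper name, not part of any library.

```python
import datetime
import glob
import os
import xml.etree.ElementTree as et


def collect_rows(directory):
    """Return one dict per (file, name, value) pair found in directory/*.xml."""
    rows = []
    for path in sorted(glob.glob(os.path.join(directory, "*.xml"))):
        root = et.parse(path).getroot()
        names = [e.text for e in root.iter("name")]
        values = [e.text for e in root.iter("value")]
        stamp = root.find(".//lastupdate")
        if stamp is None or stamp.text is None:
            continue  # skip files without a usable <lastupdate>
        updated = datetime.datetime.fromtimestamp(int(stamp.text))
        # Pair each name with the value at the same position, as in the question
        for name, value in zip(names, values):
            rows.append({
                "file": os.path.basename(path),
                "lastupdate": updated.strftime("%Y-%m-%d %H:%M:%S"),
                "name": name,
                "value": value,
            })
    return rows
```

Assuming pandas is installed, pd.DataFrame(collect_rows('/data/dump_xml')) would then give one row per name, with the file name in its own column.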
I have read a lot of answers to similar questions, but none of them seems to provide a simple solution.
Given a remote URL like this one: https://www.emidius.eu/fdsnws/event/1/query?eventid=quakeml:eu.ahead/event/13270512_0000_000&format=xml the final aim is to get a usable Python object (e.g. a dictionary or a JSON-like object).
I did find different methods for the case where the XML is saved as a local file:
import xml.etree.ElementTree as ET
file = '/home/user/query.xml'
tree = ET.parse(file)
root = tree.getroot()
for c in root:
    print(c.tag)
    for i in c:
        print(i.tag)
I did not find a method (using only native Python modules) that takes a URL string and returns such an object.
OK I think the best solution is this one:
import xml.etree.ElementTree as ET
import urllib.request
opener = urllib.request.build_opener()
url = 'https://www.emidius.eu/fdsnws/event/1/query?eventid=quakeml:eu.ahead/event/13270512_0000_000&includeallorigins=true&includeallmagnitudes=true&format=xml'
tree = ET.parse(opener.open(url))
This works, but you don't need build_opener() for that.
A custom opener is useful for special cases or protocols, but this is a plain HTTPS request, so you can just use urlopen():
import urllib.request
import xml.etree.ElementTree as ET
url = 'https://www.emidius.eu/fdsnws/event/1/query?eventid=quakeml:eu.ahead/event/13270512_0000_000&includeallorigins=true&includeallmagnitudes=true&format=xml'
with urllib.request.urlopen(url) as response:
    # fromstring() accepts bytes, so the parser can honor the declared encoding
    root = ET.fromstring(response.read())
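To get from the parsed element to the dictionary-like object the question asks for, one option is a small recursive converter. This is a rough sketch rather than a complete mapping: element_to_dict is a hypothetical helper, it ignores attributes and mixed content, and the sample document is made up so the example runs without network access.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an Element into nested dicts and lists."""
    children = list(elem)
    if not children:
        # Leaf element: return its text (empty string if there is none)
        return (elem.text or "").strip()
    result = {}
    for child in children:
        # Drop a "{namespace}" prefix so keys stay short
        key = child.tag.split("}", 1)[-1]
        value = element_to_dict(child)
        if key in result:
            # Repeated tags are collected into a list
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result


# Made-up sample; for a URL, use ET.fromstring(response.read()) instead
sample = "<event><origin><time>1327-05-12</time></origin><mag>5.1</mag><mag>5.3</mag></event>"
root = ET.fromstring(sample)
data = element_to_dict(root)
print(data)
```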
I am using the xmltodict library in Python (https://pypi.org/project/xmltodict/) to parse an XML file like this:
import xmltodict
with open("MyXML.xml") as MyXML:
    doc = xmltodict.parse(MyXML.read())
The xml file looks good but I get this error:
ExpatError: no element found: line 1, column 0
What should I do?
In my uses of xmltodict, I have always parsed a string, and to get an XML string I use etree. Try this:
import xml.etree.ElementTree as ET
import xmltodict
tree = ET.parse("MyXML.xml")
root = tree.getroot()
# tostring() (all lowercase) serializes the element back to XML
data = xmltodict.parse(ET.tostring(root))
If your MyXML.xml file is in a different location than this script, you will need to handle that, for example by building the path with __file__ and the os module.
Good Luck, Hope this helps.
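For what it's worth, "no element found: line 1, column 0" is what the expat parser (which xmltodict uses underneath) reports when it is handed empty input, so it may be worth checking that the file is not empty and that the path points at the file you expect. A minimal sketch using only the standard library; read_xml_bytes is a hypothetical helper name:

```python
import os
import tempfile
from xml.parsers import expat


def read_xml_bytes(path):
    """Read a file as bytes and fail early if it has no content."""
    with open(path, "rb") as fh:
        data = fh.read()
    if not data.strip():
        raise ValueError(f"{path} is empty, nothing to parse")
    return data


# Reproduce the error message by feeding expat an empty document
try:
    expat.ParserCreate().Parse(b"", True)
    message = None
except expat.ExpatError as exc:
    message = str(exc)
print(message)

# Demonstrate the helper on a throwaway empty file
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    empty_path = tmp.name
try:
    read_xml_bytes(empty_path)
    empty_detected = False
except ValueError:
    empty_detected = True
finally:
    os.remove(empty_path)
print(empty_detected)
```

The bytes returned by read_xml_bytes can be passed straight to xmltodict.parse().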
I am new to Python. I am trying to parse an XML document to count the total number of words. I tried the program below to count the words in the file, but I get the error shown below.
After getting this error, I installed "utils", but the error still occurs.
Is there an easier way to get the total number of words in an XML document in Python? Please help!
Traceback (most recent call last):
File "C:\Python27\xmlp.py", line 1, in <module>
from xml.dom import utils,core
ImportError: cannot import name utils
Coding
from xml.dom import utils,core
import string
reader = utils.FileReader('Greeting.xml')
doc = reader.document
Storage = ""
for n in doc.documentElement.childNodes:
    if n.nodeType == core.TEXT_NODE:
        # Accumulate contents of text nodes
        Storage = Storage + n.nodeValue
print len(string.split(Storage))
You'll find it easier to use ElementTree, eg:
from xml.etree import ElementTree as ET
xml = '<a>one two three<b>four five<c>Six Seven</c></b></a>'
tree = ET.fromstring(xml)
total = sum(len(text.split()) for text in tree.itertext())
# 7
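But use tree = ET.parse('Greeting.xml').getroot() to load your real data, since itertext() is a method of the root element rather than of the ElementTree object that parse() returns.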
IMHO you do not need utils and core; just use from xml.dom import minidom.
There is a similar example in the question "Python XML File Open".
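A minimal sketch of what the minidom route could look like for the word count. count_words is a hypothetical helper, and the inline document is only there so the example runs on its own:

```python
from xml.dom import minidom


def count_words(node):
    """Count whitespace-separated words in all text nodes below node."""
    total = 0
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            total += len(child.data.split())
        else:
            # Recurse into child elements to reach nested text
            total += count_words(child)
    return total


# Inline sample; use minidom.parse('Greeting.xml') for a real file
doc = minidom.parseString("<a>one two three<b>four five<c>Six Seven</c></b></a>")
word_total = count_words(doc.documentElement)
print(word_total)
```

Unlike the loop in the question, this descends into nested elements instead of looking only at the direct children of the document element.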
How can I import data, for example from the field A1?
When I use etree.parse() I get an error, because the file is not a plain XML file.
It's a zip file:
import zipfile
from lxml import etree
z = zipfile.ZipFile('mydocument.ods')
data = z.read('content.xml')
data = etree.XML(data)
etree.dump(data)
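To get from the dumped tree to the content of A1, one possibility is to look up the first cell of the first row of the first table. The sketch below swaps in the standard library's ElementTree for lxml and assumes a simple sheet with no header-row groups or repeated/covered cells in front of A1; read_cell_a1 is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument namespaces used in content.xml
NS = {
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def read_cell_a1(ods_file):
    """Return the text of the first cell of the first row of the first sheet."""
    with zipfile.ZipFile(ods_file) as z:
        root = ET.fromstring(z.read("content.xml"))
    cell = root.find(".//table:table/table:table-row/table:table-cell", NS)
    if cell is None:
        return None
    # A cell's visible text is stored in one or more <text:p> paragraphs
    return "\n".join("".join(p.itertext()) for p in cell.findall("text:p", NS))
```

Calling read_cell_a1('mydocument.ods') would then return the text of A1, or None if no cell is found.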