I'm trying to get an .xml file from the URL http://192.168.1.80/api/current and process its content SUBSCRIBER by SUBSCRIBER. I wrote code to fetch the XML as a string using Python's urllib2 module. I'd like to convert the XML into an object and process it; how can I proceed?
import urllib2
from xml.dom import minidom
usock = urllib2.urlopen('http://192.168.1.80/api/current')
xmldoc = minidom.parse(usock)
usock.close()
data = xmldoc.toxml()
print data
XML content:
<NSE COMMAND="CURR_USERS_RSP">
<SUBSCRIBER>
<SUB_MAC_ADDR>
70:16:00:C1:12:76
</SUB_MAC_ADDR>
<SUB_IP>
192.168.1.20
</SUB_IP>
<LOCATION>
0
</LOCATION>
</SUBSCRIBER>
<SUBSCRIBER>
<SUB_MAC_ADDR>
58:E6:F6:E5:7B:78
</SUB_MAC_ADDR>
<SUB_IP>
192.168.1.21
</SUB_IP>
<LOCATION>
0
</LOCATION>
</SUBSCRIBER>
</NSE>
Finally, I figured out a way to solve the above problem:
import xml.etree.ElementTree as ET
import urllib2
from xml.dom import minidom
url = 'http://192.168.1.80/api/current'
try:
    usock = urllib2.urlopen(url)
    xmldoc = minidom.parse(usock)
    usock.close()
    data = xmldoc.toxml()
    # (the minidom round-trip could be skipped: ET.parse(usock) works directly)
    root = ET.fromstring(data)
    for ele in root.findall('SUBSCRIBER'):
        # .strip() removes the newlines surrounding each text value
        print 'MAC = ' + ele.find('SUB_MAC_ADDR').text.strip() + ', IP = ' + ele.find('SUB_IP').text.strip() + ', Location = ' + ele.find('LOCATION').text.strip()
except Exception as e:
    print e
You can use the BeautifulSoup4 library to parse the XML file. Do not forget to select the appropriate XML parser.
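A minimal sketch of that approach, assuming `beautifulsoup4` and `lxml` are installed (the sample XML mirrors the structure shown above):

```python
from bs4 import BeautifulSoup

# Sample document in the same shape as the NSE response above
xml_data = """<NSE COMMAND="CURR_USERS_RSP">
  <SUBSCRIBER>
    <SUB_MAC_ADDR>70:16:00:C1:12:76</SUB_MAC_ADDR>
    <SUB_IP>192.168.1.20</SUB_IP>
    <LOCATION>0</LOCATION>
  </SUBSCRIBER>
</NSE>"""

# "xml" selects the lxml XML parser, which preserves tag case
soup = BeautifulSoup(xml_data, "xml")
for sub in soup.find_all("SUBSCRIBER"):
    print(sub.SUB_MAC_ADDR.text, sub.SUB_IP.text, sub.LOCATION.text)
```

With the default HTML parser, tag names would be lowercased and `find_all("SUBSCRIBER")` would find nothing, which is why the parser choice matters here.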
Related
I want to parse this URL to get the text of 'Roman':
http://jlp.yahooapis.jp/FuriganaService/V1/furigana?appid=dj0zaiZpPU5TV0Zwcm1vaFpIcCZzPWNvbnN1bWVyc2VjcmV0Jng9YTk-&grade=1&sentence=私は学生です
import urllib
import xml.etree.ElementTree as ET
url = 'http://jlp.yahooapis.jp/FuriganaService/V1/furigana?appid=dj0zaiZpPU5TV0Zwcm1vaFpIcCZzPWNvbnN1bWVyc2VjcmV0Jng9YTk-&grade=1&sentence=私は学生です'
uh = urllib.urlopen(url)
data = uh.read()
tree = ET.fromstring(data)
counts = tree.findall('.//Word')
for count in counts:
    print count.get('Roman')
But it didn't work.
Try tree.findall('.//{urn:yahoo:jp:jlp:FuriganaService}Word'). It seems you need to specify the namespace too.
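A self-contained illustration of why the namespace prefix matters, using a small stand-in document with the same default namespace as the Yahoo response (element content invented):

```python
import xml.etree.ElementTree as ET

# A document with a default namespace, analogous to the Yahoo response
data = """<ResultSet xmlns="urn:yahoo:jp:jlp:FuriganaService">
  <Result><WordList><Word><Roman>watashi</Roman></Word></WordList></Result>
</ResultSet>"""

root = ET.fromstring(data)

# Without the namespace, nothing matches: children inherit the default namespace
print(root.findall('.//Word'))            # []

# With the {uri} prefix, the element is found
ns = '{urn:yahoo:jp:jlp:FuriganaService}'
words = root.findall('.//%sWord' % ns)
print(words[0].find('%sRoman' % ns).text)  # watashi
```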
I recently ran into a similar issue. It was because I was using an older version of the xml.etree package, and to work around it I had to write a loop for each level of the XML structure. For example:
import urllib
import xml.etree.ElementTree as ET
url = 'http://jlp.yahooapis.jp/FuriganaService/V1/furigana?appid=dj0zaiZpPU5TV0Zwcm1vaFpIcCZzPWNvbnN1bWVyc2VjcmV0Jng9YTk-&grade=1&sentence=私は学生です'
uh = urllib.urlopen(url)
data = uh.read()
tree = ET.fromstring(data)
counts = tree.findall('.//Word')
for result in tree.findall('Result'):
    for wordlist in result.findall('WordList'):
        for word in wordlist.findall('Word'):
            print(word.get('Roman'))
Edit:
With the suggestion from @omu_negru I was able to get this working. There was another issue: when getting the text for "Roman" you were using the "get" method, which retrieves attributes of the tag. Using the "text" attribute of the element gives you the text between the opening and closing tags. Also, if there is no 'Roman' tag you will get a None object, and you cannot access any attribute on None.
# encoding: utf-8
import urllib
import xml.etree.ElementTree as ET
url = 'http://jlp.yahooapis.jp/FuriganaService/V1/furigana?appid=dj0zaiZpPU5TV0Zwcm1vaFpIcCZzPWNvbnN1bWVyc2VjcmV0Jng9YTk-&grade=1&sentence=私は学生です'
uh = urllib.urlopen(url)
data = uh.read()
tree = ET.fromstring(data)
ns = '{urn:yahoo:jp:jlp:FuriganaService}'
counts = tree.findall('.//%sWord' % ns)
for count in counts:
    roman = count.find('%sRoman' % ns)
    if roman is None:
        print 'Not found'
    else:
        print roman.text
I need to get information about the length and domain structure of a particular protein, for example 1btk. For this I need its UniProtKB accession; how can I get it?
From the web site http://www.rcsb.org/pdb/explore.do?structureId=1BTK
the UniProtKB accession is 'Q06187'.
You can use urllib2 to download the PDB file, then a regular expression to extract the UniProt id:
import urllib2
import re

url_template = "http://www.rcsb.org/pdb/files/{}.pdb"
protein = "1BTK"
url = url_template.format(protein)

response = urllib2.urlopen(url)
pdb = response.read()
response.close()  # best practice to close the file

m = re.search(r'UNP +(\w+)', pdb)
print m.group(1)
# you get 'Q06187'
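The regular expression can be checked offline against a representative DBREF record, the line in a PDB file that maps the structure to its UniProt entry (the residue-number columns here are invented for illustration):

```python
import re

# A representative DBREF record from a PDB file (residue numbers invented)
line = "DBREF  1BTK A    1   170  UNP    Q06187   BTK_HUMAN        1    170"

# 'UNP' is followed by whitespace, then the UniProt accession
m = re.search(r'UNP +(\w+)', line)
print(m.group(1))  # Q06187
```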
As a bonus, if you wish to parse the PDB file:
from Bio.PDB.PDBParser import PDBParser
response = urllib2.urlopen(url)
parser = PDBParser()
structure = parser.get_structure(protein, response)
response.close() # best practice to close the file
header = parser.get_header()
trailer = parser.get_trailer()
#info about protein in structure, header and trailer
Below is my sample code. In the background I am downloading statsxml.jsp with wget and then parsing the XML. My question: I now need to parse multiple XML URLs, but as you can see in the code below I am using one single file. How can I accomplish this?
Example URLs - http://www.trion1.com:6060/stat.xml,
http://www.trion2.com:6060/stat.xml, http://www.trion3.com:6060/stat.xml
import xml.etree.cElementTree as ET
tree = ET.ElementTree(file='statsxml.jsp')
root = tree.getroot()
root.tag, root.attrib
print "root subelements: ", root.getchildren()
root.getchildren()[0][1]
root.getchildren()[0][4].getchildren()
for component in tree.iterfind('Component'):  # 'Component name' is not a valid element path
    print component.attrib['name']
You can use urllib2 to download and parse each file the same way. For example, the first few lines become:
import xml.etree.cElementTree as ET
import urllib2
for i in range(1, 4):  # the hosts are trion1..trion3; range(3) would give 0, 1, 2
    tree = ET.ElementTree(file=urllib2.urlopen('http://www.trion%i.com:6060/stat.xml' % i))
    root = tree.getroot()
    root.tag, root.attrib
    # Rest of your code goes here....
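The same loop, sketched without the network dependency: each document is parsed from a string here, but the per-URL structure is identical (the XML content below is invented for illustration; with urllib2 the parse line becomes `ET.parse(urllib2.urlopen(url)).getroot()`):

```python
import xml.etree.ElementTree as ET

# Stand-ins for the three stat.xml documents (content invented)
documents = [
    '<Stats host="trion1"><Component name="db"/></Stats>',
    '<Stats host="trion2"><Component name="web"/></Stats>',
    '<Stats host="trion3"><Component name="cache"/></Stats>',
]

for doc in documents:
    root = ET.fromstring(doc)
    # each iteration processes one document independently
    print(root.get('host'), [c.get('name') for c in root.findall('Component')])
```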
I'm writing a script which captures data from a web site and saves it into a DB. Some of the data is merged and I need to split it. I have something like this:
Endokrynologia (bez st.),Położnictwo i ginekologia (II st.)
So I need to get:
Endokrynologia (bez st.)
Położnictwo i ginekologia (II st.)
So I wrote some code in Python:
#!/usr/bin/env python
# -*- encoding: utf-8
import MySQLdb as mdb
from lxml import html, etree
import urllib
import sys
import re
Nr = 17268
Link = "http://rpwdl.csioz.gov.pl/rpz/druk/wyswietlKsiegaServletPub?idKsiega="
sock = urllib.urlopen(Link+str(Nr))
htmlSource = sock.read()
sock.close()
root = etree.HTML(htmlSource)
result = etree.tostring(root, pretty_print=True, method="html")
Spec = etree.XPath("string(//html/body/div/table[2]/tr[18]/td[2]/text())")
Specjalizacja = Spec(root)
if re.search(r'(,)\b', Specjalizacja):
    text = Specjalizacja.split()
    print text[0]
    print text[1]
and I get:
Endokrynologia
(bez
What am I doing wrong?
You could try replacing
text = Specjalizacja.split()
with
text = Specjalizacja.split(',')
Don't know whether that would fix your problem.
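The difference is easy to see in isolation: `split()` with no argument splits on any whitespace, while `split(',')` splits only on commas, which is what this data needs (the string below is the one from the question):

```python
# -*- coding: utf-8 -*-
s = u'Endokrynologia (bez st.),Położnictwo i ginekologia (II st.)'

# No argument: splits on every run of whitespace
print(s.split())     # ['Endokrynologia', '(bez', 'st.),Położnictwo', ...]

# Explicit delimiter: splits only on the comma
print(s.split(','))  # ['Endokrynologia (bez st.)', 'Położnictwo i ginekologia (II st.)']
```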
I'm trying to parse data from a website and cannot print the data.
import xml.etree.ElementTree as ET
from urllib import urlopen
link = urlopen('http://weather.aero/dataserver_current/httpparam?dataSource=metars&requestType=retrieve&format=xml&stationString=KSFO&hoursBeforeNow=1')
tree = ET.parse(link)
root = tree.getroot()
data = root.findall('data/metar')
for metar in data:
print metar.find('temp_c').text
Element names are case-sensitive:
data = root.findall('data/METAR')
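A quick way to confirm this, using a stripped-down stand-in for the METAR response (structure inferred from the question; values invented):

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for the weather.aero METAR response
data = """<response>
  <data>
    <METAR><station_id>KSFO</station_id><temp_c>14.0</temp_c></METAR>
  </data>
</response>"""

root = ET.fromstring(data)

print(root.findall('data/metar'))  # []  -- wrong case, no match
for metar in root.findall('data/METAR'):
    print(metar.find('temp_c').text)  # 14.0
```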