I am a beginner in coding and have the question below. I would appreciate any help.
I have the Python code below, which requests information about an organization.
Note: The commented-out "target" variable is for future use, when I pass the user input from PHP to this Python script.
import requests, sys
#target = sys.argv[1]
target = "logitech"
request = requests.get('http://whois.arin.net/rest/nets;name={}'.format(target))
print(request.text)
The output is similar to this but the number of "netRef" tags may vary depending on the organization.
<?xml version='1.0'?><?xml-stylesheet type='text/xsl' href='http://whois.arin.net/xsl/website.xsl' ?><nets xmlns="http://www.arin.net/whoisrws/core/v1" xmlns:ns2="http://www.arin.net/whoisrws/rdns/v1" xmlns:ns3="http://www.arin.net/whoisrws/netref/v2" copyrightNotice="Copyright 1997-2020, American Registry for Internet Numbers, Ltd." inaccuracyReportUrl="https://www.arin.net/resources/registry/whois/inaccuracy_reporting/" termsOfUse="https://www.arin.net/resources/registry/whois/tou/"><limitExceeded limit="256">false</limitExceeded>
<netRef endAddress="173.8.217.111" startAddress="173.8.217.96" handle="NET-173-8-217-96-1" name="LOGITECH">https://whois.arin.net/rest/net/NET-173-8-217-96-1</netRef>
<netRef endAddress="50.193.49.47" startAddress="50.193.49.32" handle="NET-50-193-49-32-1" name="LOGITECH">https://whois.arin.net/rest/net/NET-50-193-49-32-1</netRef></nets>
I was wondering, is it possible to only display all of the endAddress and startAddress attributes in PHP?
I've tried using the xml.etree.ElementTree module, but because the request variable is a "Response" object rather than bytes, I can't parse the XML directly into an element.
My PHP code currently looks like this, as I am unsure how to proceed. testapi.py refers to the Python code above.
<?php
$output1 = shell_exec('python testapi.py');
echo $output1;
?>
My desired output on the PHP side is as follows:
IP range: 173.8.217.96-173.8.217.111, 50.193.49.32-50.193.49.47
I would gladly appreciate any help, Thank You.
Python's xml.etree.ElementTree module provides the fromstring function, which parses an XML tree from text (here, request.text). From there you can search the tree, but be sure to map a prefix to the document's default namespace:
xmlns="http://www.arin.net/whoisrws/core/v1"
import requests as rq
import xml.etree.ElementTree as ET
request = rq.get('http://whois.arin.net/rest/nets;name=logitech')
tree = ET.fromstring(request.text)
nmsp = {"doc": "http://www.arin.net/whoisrws/core/v1"}
for elem in tree.findall(".//doc:netRef", nmsp):
    print(f"endAddress: {elem.attrib['endAddress']}")
    print(f"startAddress: {elem.attrib['startAddress']}")
    print("---------------------------\n")
# endAddress: 173.8.217.111
# startAddress: 173.8.217.96
# ---------------------------
# endAddress: 50.193.49.47
# startAddress: 50.193.49.32
# ---------------------------
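To get the single line the question asks for on the PHP side, the same loop can build one string instead of printing each attribute. The following is only a sketch: format_ip_ranges and SAMPLE_XML are hypothetical names, and the sample is a trimmed copy of the response from the question so the example runs without a network call.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns attribute of the <nets> root element.
ARIN_NS = {"doc": "http://www.arin.net/whoisrws/core/v1"}

# Trimmed copy of the response shown in the question (stands in for request.text).
SAMPLE_XML = (
    '<nets xmlns="http://www.arin.net/whoisrws/core/v1">'
    '<limitExceeded limit="256">false</limitExceeded>'
    '<netRef endAddress="173.8.217.111" startAddress="173.8.217.96" '
    'handle="NET-173-8-217-96-1" name="LOGITECH">'
    'https://whois.arin.net/rest/net/NET-173-8-217-96-1</netRef>'
    '<netRef endAddress="50.193.49.47" startAddress="50.193.49.32" '
    'handle="NET-50-193-49-32-1" name="LOGITECH">'
    'https://whois.arin.net/rest/net/NET-50-193-49-32-1</netRef>'
    '</nets>'
)


def format_ip_ranges(xml_text):
    """Return 'IP range: start-end, start-end, ...' for every netRef element."""
    root = ET.fromstring(xml_text)
    ranges = [
        f"{ref.get('startAddress')}-{ref.get('endAddress')}"
        for ref in root.findall(".//doc:netRef", ARIN_NS)
    ]
    return "IP range: " + ", ".join(ranges)


if __name__ == "__main__":
    print(format_ip_ranges(SAMPLE_XML))
```

In testapi.py you would pass request.text in place of SAMPLE_XML; the PHP script can then keep echoing the output of shell_exec unchanged.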
I have an XML file downloaded from WordPress that is structured like this:
<wp:postmeta>
<wp:meta_key><![CDATA[country]]></wp:meta_key>
<wp:meta_value><![CDATA[Germany]]></wp:meta_value>
</wp:postmeta>
My goal is to look through the XML file for all the country keys and print their values. I'm completely new to the XML library, so I'm not sure where to take it from here.
# load libraries
# importing os to handle directory functions
import os
# import XML handlers
from xml.etree import ElementTree
# importing json to handle structured data saving
import json
# dictionary with namespaces
ns = {'wp:meta_key', 'wp:meta_value'}
tree = ElementTree.parse('/var/www/python/file.xml')
root = tree.getroot()
# item
for item in root.findall('wp:post_meta', ns):
    print '- ', item.text
print "Finished running"
This throws an error about using wp as a namespace, and I'm not sure where to go from here; the documentation is unclear to me. Any help is appreciated.
Downvoters please let me know how I can improve my question.
I don't know XML, but I can treat it as a string like this.
from simplified_scrapy import SimplifiedDoc, req, utils
xml = '''
<wp:postmeta>
<wp:meta_key><![CDATA[country]]></wp:meta_key>
<wp:meta_value><![CDATA[Germany]]></wp:meta_value>
</wp:postmeta>
'''
doc = SimplifiedDoc(xml)
kvs = doc.select('wp:postmeta').selects('wp:meta_key|wp:meta_value').html
print (kvs)
Result:
['<![CDATA[country]]>', '<![CDATA[Germany]]>']
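For comparison, here is a sketch of how the same lookup might be done with ElementTree itself. Two things differ from the code in the question: the namespaces argument has to be a dict that maps the prefix to its URI (the question passes a set of tag names), and the element is called postmeta, not post_meta. The URI below is an assumption; copy the real value from the xmlns:wp attribute at the top of your own export file. meta_values and SAMPLE are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: the wp prefix is bound to this URI in the export file.
# Check the xmlns:wp attribute on the root element of your own file.
NS = {"wp": "http://wordpress.org/export/1.2/"}

# Small stand-in for the export file; the root element declares the wp prefix.
SAMPLE = """<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
<channel><item>
<wp:postmeta>
<wp:meta_key><![CDATA[country]]></wp:meta_key>
<wp:meta_value><![CDATA[Germany]]></wp:meta_value>
</wp:postmeta>
<wp:postmeta>
<wp:meta_key><![CDATA[city]]></wp:meta_key>
<wp:meta_value><![CDATA[Berlin]]></wp:meta_value>
</wp:postmeta>
</item></channel>
</rss>"""


def meta_values(xml_text, key):
    """Collect the meta_value of every postmeta whose meta_key equals key."""
    root = ET.fromstring(xml_text)
    values = []
    for postmeta in root.iterfind(".//wp:postmeta", NS):
        # The parser returns CDATA content as ordinary element text.
        if postmeta.findtext("wp:meta_key", namespaces=NS) == key:
            values.append(postmeta.findtext("wp:meta_value", default="", namespaces=NS))
    return values


print(meta_values(SAMPLE, "country"))
```

For a file on disk, ElementTree.parse(path).getroot() gives the same kind of root element that ET.fromstring returns here.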
I'm making a Python project for which I created a test Wix website.
I want to get the data (text) from the Wix website using urllib,
so I did
url.urlopen(ADDRESS).readlines()
The problem is that it did not give me any of the text on the page, only information about the structure of the page in HTML.
How would I extract the requested text from the website?
I think you'll end up needing to parse the HTML for the information you want. Check out this Python library:
https://docs.python.org/3/library/html.parser.html
You could potentially do something like this:
from html.parser import HTMLParser
rel_data = []
class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        rel_data.append(data)
parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')
print(rel_data)
Output
['Test', 'Parse me!']
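On a real page, handle_data also receives the contents of script and style elements and whitespace-only strings, so the list can fill up with things that are not visible text. A possible refinement, sketched below, tracks whether the parser is inside one of those elements and skips their data. VisibleTextParser and visible_text are hypothetical names, and the HTML string is a made-up stand-in for a downloaded page.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text found outside script/style elements.
        if text and not self._skip_depth:
            self.chunks.append(text)


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


page = ('<html><head><title>Test</title><style>h1 {color: red}</style></head>'
        '<body><script>var x = 1;</script><h1>Parse me!</h1></body></html>')
print(visible_text(page))
```

The parser only sees the HTML that was downloaded, so text that a page adds later with JavaScript will not appear in the result.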
I think what I am trying to do is much the same as a GitHub issue in the zeep repo, but sadly there is no response to that issue yet. I researched suds, installed it, and tried it, but could not even get sending a parameter to work, and zeep seems better maintained.
Edit 1:
For sure I am not talking about this
You can use a Plugin for editing the xml as a plain string. I used this plugin for keeping the characters '<' and '>' in a CDATA element.
from lxml import etree
from zeep import Plugin
class my_plugin(Plugin):
    def egress(self, envelope, http_headers, operation, binding_options):
        # zeep builds the envelope with lxml; serialize it to text
        xml_string = etree.tostring(envelope, encoding="unicode")
        # undo the escaping so the CDATA markers reach the server intact
        xml_string = xml_string.replace("&lt;", "<")
        xml_string = xml_string.replace("&gt;", ">")
        # strip_cdata=False keeps CDATA sections when re-parsing
        parser = etree.XMLParser(strip_cdata=False)
        new_envelope = etree.XML(xml_string, parser=parser)
        return new_envelope, http_headers
Then just register the plugin on the client:
client = Client(wsdl='url', transport=transport, plugins=[my_plugin()])
Take a look at the docs: http://docs.python-zeep.org/en/master/plugins.html
On Python 3.9, @David Ortiz's answer didn't work for me; maybe something has changed. The etree_to_string call was failing to convert the XML to a string.
What worked for me was a custom transport instead of a plugin. It replaces the escaped entities with the correct characters, just like David's code, before the request is posted.
import zeep
from zeep.transports import Transport
from xml.etree import ElementTree
class CustomTransport(Transport):
    def post_xml(self, address, envelope, headers):
        message = ElementTree.tostring(envelope, encoding="unicode")
        message = message.replace("&lt;", "<")
        message = message.replace("&gt;", ">")
        return self.post(address, message, headers)
client = zeep.Client('wsdl_url', transport=CustomTransport())
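Both answers above rely on the same string-level trick, which can be tried without zeep or a SOAP server. The sketch below uses only the standard library: serializing an element escapes the angle brackets in its text, so a CDATA marker stored as text arrives as entities, and the two replace calls turn it back into real markup. escaped_and_restored and the payload element are hypothetical names used only for this illustration.

```python
import xml.etree.ElementTree as ET


def escaped_and_restored(text):
    """Serialize text inside an element, then undo the entity escaping."""
    elem = ET.Element("payload")
    elem.text = text
    # The serializer escapes "<" and ">" in element text.
    escaped = ET.tostring(elem, encoding="unicode")
    # Same replacement the plugin and the custom transport perform.
    restored = escaped.replace("&lt;", "<").replace("&gt;", ">")
    return escaped, restored


escaped, restored = escaped_and_restored("<![CDATA[<b>bold</b>]]>")
print(escaped)    # the CDATA marker has been turned into entities
print(restored)   # the CDATA section is back as markup
print(ET.fromstring(restored).text)
```

Note that the replacement is applied to the whole message, so any other escaped angle bracket in the envelope is unescaped as well; that is only safe when such text also sits inside a CDATA section.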
I'm having some issues using Minidom to parse an XML file on a remote server.
This is the XML I am trying to parse:
<mod n="1">
<body>
Random Body information will be here
</body>
<b>1997-01-27</b>
<d>1460321480</d>
<l>United Kingdom</l>
<s>M</s>
<t>About Denisstoff</t>
</mod>
I'm trying to return the <d> values with Minidom. This is the code I am trying to use to find the value:
expired = True
f = urlreq.urlopen("http://st.chatango.com/profileimg/"+args[:1]+"/"+args[1:2]+"/"+args+"/mod1.xml")
data = f.read().decode("utf-8")
dom = minidom.parseString(data)
itemlist = dom.getElementsByTagName('d')
print(itemlist)
It shows that the value is there, but when I followed an approach for reading the data that I found here (below), it just crashed my Python app. This is the code I tried:
for s in itemlist:
    if s.hasAttribute('d'):
        print(s.attributes['d'].value)
This is the crash:
AttributeError: 'NodeList' object has no attribute 'value'
I also tried ElementTree but that didn't return any data at all. I have tested the URL and it's correct for the data I want, but I just can't get it to read the data in the tags. Any and all help is appreciated.
If you want to print the values from this XML, you should use this:
for s in itemlist:
    if hasattr(s.childNodes[0], "data"):
        print(s.childNodes[0].data)
I hope it helps :D
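The reason this works is that the number inside <d> is not an attribute of the element; it is a text node that is the element's first child, so hasAttribute('d') never matches. A self-contained sketch using the XML from the question, with no network access, might look like this. tag_texts and SAMPLE are hypothetical names.

```python
from xml.dom import minidom

# The document from the question, inlined so the example runs offline.
SAMPLE = """<mod n="1">
<body>
Random Body information will be here
</body>
<b>1997-01-27</b>
<d>1460321480</d>
<l>United Kingdom</l>
<s>M</s>
<t>About Denisstoff</t>
</mod>"""


def tag_texts(xml_text, tag):
    """Return the text content of every element named tag."""
    dom = minidom.parseString(xml_text)
    values = []
    for node in dom.getElementsByTagName(tag):
        child = node.firstChild
        # Empty elements have no child; only text nodes carry .data we want.
        if child is not None and child.nodeType == child.TEXT_NODE:
            values.append(child.data.strip())
    return values


print(tag_texts(SAMPLE, "d"))
print(tag_texts(SAMPLE, "l"))
```

Checking firstChild for None avoids an IndexError that childNodes[0] would raise on an empty element such as <d></d>.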
I'm using lxml to parse some HTML with Russian letters. That's why I have a headache with encodings.
I transform the HTML text into a tree using the following code. Then I try to extract some things from the page (header, article content) using CSS queries.
from lxml import html
from bs4 import UnicodeDammit
doc = UnicodeDammit(html_text, is_html=True)
parser = html.HTMLParser(encoding=doc.original_encoding)
tree = html.fromstring(html_text, parser=parser)
...
def extract_title(tree):
    metas = tree.cssselect("meta[property^=og]")
    for meta in metas:
        # print(meta.attrib)
        # print(sys.stdout.encoding)
        # print("123") # Uncomment this to fix error
        content = meta.attrib['content']
        print(content.encode('utf-8')) # This fails with "[Decode error - output not utf-8]"
I get "Decode error" when I try to print Unicode symbols to stdout. But if I add some print statement before the failing print, then everything works fine. I have never seen such strange behavior from Python's print function. I thought it had no side effects.
Do you have any idea why this is happening?
I use Windows and Sublime to run this code.