python zeep: send un-escaped xml as content - python

I think what I am trying to do is pretty much like github issue in zeep repo --- but sadly there is no response to this issue yet. I researched suds and installed and tried -- did not even get sending parameter to work and thought zeep seems better maintained?
Edit 1:
For sure I am not talking about this

You can use a Plugin for editing the xml as a plain string. I used this plugin for keeping the characters '<' and '>' in a CDATA element.
from lxml import etree  # zeep builds the envelope with lxml
from zeep import Plugin

class my_plugin(Plugin):
    def egress(self, envelope, http_headers, operation, binding_options):
        # Serialize the envelope to text so it can be edited as a plain string
        xml_string = etree.tostring(envelope, encoding="unicode")
        # Turn the escaped entities back into literal angle brackets
        xml_string = xml_string.replace("&lt;", "<")
        xml_string = xml_string.replace("&gt;", ">")
        # Re-parse, keeping CDATA sections intact
        parser = etree.XMLParser(strip_cdata=False)
        new_envelope = etree.XML(xml_string, parser=parser)
        return new_envelope, http_headers
Then just import the plugin on the client:
client = Client(wsdl='url', transport=transport, plugins=[my_plugin()])
Take a look at the docs: http://docs.python-zeep.org/en/master/plugins.html
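As a rough illustration of what the plugin undoes, the sketch below uses only the standard library (not zeep) to show how serializing escapes a CDATA marker placed in element text, and how the two replace calls restore it. The element name `payload` and its content are made up for the example.

```python
from xml.etree import ElementTree as ET

# Hypothetical element; the name and content are illustrative only.
elem = ET.Element("payload")
elem.text = "<![CDATA[<a>1</a>]]>"

# Serializing escapes '<' and '>' inside text nodes.
escaped = ET.tostring(elem, encoding="unicode")

# Undo the escaping, as the plugin does on the whole envelope.
restored = escaped.replace("&lt;", "<").replace("&gt;", ">")

print(escaped)
print(restored)
```

Note that a blanket replace also unescapes every other `&lt;` and `&gt;` in the envelope, so it only suits payloads where that is intended.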

On Python 3.9, @David Ortiz's answer didn't work for me; maybe something has changed. The etree_to_string call was failing to convert the XML to a string.
What worked for me instead of a plugin was a custom transport that replaces the escaped entities with the correct characters, just like David's code, before the POST is sent.
import zeep
from zeep.transports import Transport
from xml.etree import ElementTree
class CustomTransport(Transport):
    def post_xml(self, address, envelope, headers):
        message = ElementTree.tostring(envelope, encoding="unicode")
        # Turn the escaped entities back into literal angle brackets
        message = message.replace("&lt;", "<")
        message = message.replace("&gt;", ">")
        return self.post(address, message, headers)
client = zeep.Client('wsdl_url', transport=CustomTransport())

Related

Displaying XML attributes in PHP

I am a beginner in coding and have this question below. I would gladly appreciate any help.
I have the Python code below, which requests information about an organization.
Note: The commented-out "target" variable is for future use, when I pass the user input from PHP to this Python script.
import requests, sys
#target = sys.argv[1]
target = "logitech"
request = requests.get('http://whois.arin.net/rest/nets;name={}'.format(target))
print(request.text)
The output is similar to this but the number of "netRef" tags may vary depending on the organization.
<?xml version='1.0'?><?xml-stylesheet type='text/xsl' href='http://whois.arin.net/xsl/website.xsl' ?><nets xmlns="http://www.arin.net/whoisrws/core/v1" xmlns:ns2="http://www.arin.net/whoisrws/rdns/v1" xmlns:ns3="http://www.arin.net/whoisrws/netref/v2" copyrightNotice="Copyright 1997-2020, American Registry for Internet Numbers, Ltd." inaccuracyReportUrl="https://www.arin.net/resources/registry/whois/inaccuracy_reporting/" termsOfUse="https://www.arin.net/resources/registry/whois/tou/"><limitExceeded limit="256">false</limitExceeded>
<netRef endAddress="173.8.217.111" startAddress="173.8.217.96" handle="NET-173-8-217-96-1" name="LOGITECH">https://whois.arin.net/rest/net/NET-173-8-217-96-1</netRef>
<netRef endAddress="50.193.49.47" startAddress="50.193.49.32" handle="NET-50-193-49-32-1" name="LOGITECH">https://whois.arin.net/rest/net/NET-50-193-49-32-1</netRef></nets>
I was wondering, is it possible to only display all of the endAddress and startAddress attributes in PHP?
I've tried using the xml.etree.ElementTree module, but because the request variable is a Response object rather than bytes, I can't parse the XML directly into an element.
My PHP code currently looks like this, as I am unsure of how to proceed. testapi.py refers to the Python code above.
<?php
$output1 = shell_exec('python testapi.py');
echo $output1;
?>
My desired output on the PHP side is as follows:
IP range: 173.8.217.96-173.8.217.111, 50.193.49.32-50.193.49.47
I would gladly appreciate any help, Thank You.
Python's ElementTree provides the fromstring function to parse XML from text. From there you can search the tree; be sure to map a prefix to the document's default namespace:
xmlns="http://www.arin.net/whoisrws/core/v1"
import requests as rq
import xml.etree.ElementTree as ET
request = rq.get('http://whois.arin.net/rest/nets;name=logitech')
tree = ET.fromstring(request.text)
nmsp = {"doc": "http://www.arin.net/whoisrws/core/v1"}
for elem in tree.findall(".//doc:netRef", nmsp):
    print(f"endAddress: {elem.attrib['endAddress']}")
    print(f"startAddress: {elem.attrib['startAddress']}")
    print("---------------------------\n")
# endAddress: 173.8.217.111
# startAddress: 173.8.217.96
# ---------------------------
# endAddress: 50.193.49.47
# startAddress: 50.193.49.32
# ---------------------------
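To get the single-line output the question asks for, the attributes can be joined into one string. A minimal sketch, parsing a trimmed copy of the sample response instead of calling the API; the helper name `ip_ranges` is made up here:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample response from the question.
SAMPLE = (
    '<nets xmlns="http://www.arin.net/whoisrws/core/v1">'
    '<netRef endAddress="173.8.217.111" startAddress="173.8.217.96"/>'
    '<netRef endAddress="50.193.49.47" startAddress="50.193.49.32"/>'
    '</nets>'
)

NS = {"doc": "http://www.arin.net/whoisrws/core/v1"}

def ip_ranges(xml_text):
    """Return 'start-end' pairs for every netRef, comma separated."""
    root = ET.fromstring(xml_text)
    refs = root.findall(".//doc:netRef", NS)
    return ", ".join(
        f"{r.attrib['startAddress']}-{r.attrib['endAddress']}" for r in refs
    )

print("IP range: " + ip_ranges(SAMPLE))
```

The PHP script can then echo the Python script's output unchanged.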

ParseError: not well-formed (invalid token) using cElementTree

I receive xml strings from an external source that can contains unsanitized user contributed content.
The following xml string gave a ParseError in cElementTree:
>>> print repr(s)
'<Comment>dddddddd\x08\x08\x08\x08\x08\x08_____</Comment>'
>>> import xml.etree.cElementTree as ET
>>> ET.XML(s)
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
ET.XML(s)
File "<string>", line 106, in XML
ParseError: not well-formed (invalid token): line 1, column 17
Is there a way to make cElementTree not complain?
It seems to complain about \x08; you will need to escape that.
Edit:
Or you can have the parser ignore the errors using recover
from lxml import etree
parser = etree.XMLParser(recover=True)
etree.fromstring(xmlstring, parser=parser)
I was having the same error (with ElementTree). In my case it was because of encodings, and I was able to solve it without having to use an external library. Hope this helps other people finding this question based on the title. (reference)
import xml.etree.ElementTree as ET
parser = ET.XMLParser(encoding="utf-8")
tree = ET.fromstring(xmlstring, parser=parser)
EDIT: Based on comments, this answer might be outdated. But this did work back when it was answered...
This code snippet worked for me. I had an issue parsing a batch of XML files and had to tell the parser they were encoded as 'iso-8859-5':
import xml.etree.ElementTree as ET
tree = ET.parse(filename, parser = ET.XMLParser(encoding = 'iso-8859-5'))
See this answer to another question and the according part of the XML spec.
The backspace U+0008 is an invalid character in XML 1.0 documents and cannot occur plainly. (XML 1.1 allows it only as the escaped entity &#8;.)
If you need to process this XML snippet, you must replace \x08 in s before feeding it into an XML parser.
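A minimal sketch of that replacement, using the string from the question. Dropping the character, rather than substituting something for it, is an assumption about what the application wants:

```python
import xml.etree.ElementTree as ET

s = '<Comment>dddddddd\x08\x08\x08\x08\x08\x08_____</Comment>'

# Remove the backspace characters before the parser sees them.
cleaned = s.replace('\x08', '')
elem = ET.XML(cleaned)
print(elem.text)
```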
None of the above fixes worked for me. The only thing that worked was to use BeautifulSoup instead of ElementTree as follows:
from bs4 import BeautifulSoup
with open("data/myfile.xml") as fp:
    soup = BeautifulSoup(fp, 'xml')
Then you can search the tree as:
soup.find_all('mytag')
This is most probably an encoding error. For example I had an xml file encoded in UTF-8-BOM (checked from the Notepad++ Encoding menu) and got similar error message.
The workaround (Python 3.6)
import io
from xml.etree import ElementTree as ET
with io.open(file, 'r', encoding='utf-8-sig') as f:
    contents = f.read()
    tree = ET.fromstring(contents)
Check the encoding of your xml file. If it is using different encoding, change the 'utf-8-sig' accordingly.
After lots of searching through the entire WWW, I only found out that you have to escape certain characters if you want your XML parser to work! Here's how I did it and worked for me:
import re
escape_illegal_xml_characters = lambda x: re.sub(u'[\x00-\x08\x0b\x0c\x0e-\x1F\uD800-\uDFFF\uFFFE\uFFFF]', '', x)
And use it like you'd normally do:
ET.XML(escape_illegal_xml_characters(my_xml_string)) #instead of ET.XML(my_xml_string)
A gotcha that caught me when using Python's ElementTree: the following raises the invalid token error:
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET
xml = u"""<?xml version='1.0' encoding='utf8'?>
<osm generator="pycrocosm server" version="0.6"><changeset created_at="2017-09-06T19:26:50.302136+00:00" id="273" max_lat="0.0" max_lon="0.0" min_lat="0.0" min_lon="0.0" open="true" uid="345" user="john"><tag k="test" v="Съешь же ещё этих мягких французских булок да выпей чаю" /><tag k="foo" v="bar" /><discussion><comment data="2015-01-01T18:56:48Z" uid="1841" user="metaodi"><text>Did you verify those street names?</text></comment></discussion></changeset></osm>"""
xmltest = ET.fromstring(xml.encode("utf-8"))
However, it works with the addition of a hyphen in the encoding type:
<?xml version='1.0' encoding='utf-8'?>
Most odd. Someone found this footnote in the python docs:
The encoding string included in XML output should conform to the
appropriate standards. For example, “UTF-8” is valid, but “UTF8” is
not.
I was stuck on a similar problem and finally figured out the root cause in my particular case. If you read the data from multiple XML files that lie in the same folder, you will also parse the .DS_Store file.
Before parsing add this condition
for file in files:
    if file.endswith('.xml'):
        run_your_code...
This trick helped me as well
lxml solved the issue, in my case
from lxml import etree
for _, ele in etree.iterparse(xml_file, tag='tag_i_wanted', encoding='utf-8'):
    print(ele.tag, ele.text)
in another case,
parser = etree.XMLParser(recover=True)
tree = etree.parse(xml_file, parser=parser)
tags_needed = tree.iter('TAG NAME')
Thanks to theeastcoastwest
Python 2.7
In my case I got the same error. (using Element Tree)
I had to add these lines:
import xml.etree.ElementTree as ET
from lxml import etree
parser = etree.XMLParser(recover=True,encoding='utf-8')
xml_file = ET.parse(path_xml,parser=parser)
Works in Python 3.10.2
What helped me with that error was Juan's answer - https://stackoverflow.com/a/20204635/4433222
But wasn't enough - after struggling I found out that an XML file needs to be saved with UTF-8 without BOM encoding.
The solution wasn't working for "normal" UTF-8.
The only thing that worked for me is I had to add mode and encoding while opening the file like below:
with open(filenames[0], mode='r',encoding='utf-8') as f:
    readFile()
Otherwise it was failing every time with invalid token error if I simply do this:
f = open(filenames[0], 'r')
readFile()
This error occurs when you pass a link straight to the parser. You first have to fetch the content behind that link and parse that:
import requests
from xml.etree import cElementTree
response = requests.get(Link)
root = cElementTree.fromstring(response.content)
I tried the other solutions in the answers here but had no luck. Since I only needed to extract the value from a single xml node I gave in and wrote my function to do so:
import re

def ParseXmlTagContents(source, tag, tagContentsRegex):
    openTagString = "<"+tag+">"
    closeTagString = "</"+tag+">"
    found = re.search(openTagString + tagContentsRegex + closeTagString, source)
    if found:
        start = found.regs[0][0]
        end = found.regs[0][1]
        return source[start+len(openTagString):end-len(closeTagString)]
    return ""
Example usage would be:
<?xml version="1.0" encoding="utf-16"?>
<parentNode>
<childNode>123</childNode>
</parentNode>
ParseXmlTagContents(xmlString, "childNode", "[0-9]+")

xml parsing using ElementTree

I have written a small function that uses ElementTree to parse an XML file, but it throws the following error: "xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0". Please find the code below:
tree = ElementTree.parse(urllib2.urlopen('http://api.ean.com/ean-services/rs/hotel/v3/list?type=xml&apiKey=czztdaxrhfbusyp685ut6g6v&cid=8123&locale=en_US&city=Dallas%20&stateProvinceCode=TX&countryCode=US&minorRev=12'))
rootElem = tree.getroot()
hotel_list = rootElem.findall("HotelList")
There are multiple problems with the site you are using:
The site doesn't honour the type=xml GET argument you are sending; instead you need to send an Accept header telling the site that you accept XML, otherwise it returns JSON data.
The site does not accept the content type text/xml, so you need to send application/xml.
Your parse call is correct. Another answer wrongly says it should take data; parse takes a file name or a file-like object.
So here is the working code
import urllib2
from xml.etree import ElementTree
url = 'http://api.ean.com/ean-services/rs/hotel/v3/list?type=xml&apiKey=czztdaxrhfbusyp685ut6g6v&cid=8123&locale=en_US&city=Dallas%20&stateProvinceCode=TX&countryCode=US&minorRev=12'
request = urllib2.Request(url, headers={"Accept" : "application/xml"})
u = urllib2.urlopen(request)
tree = ElementTree.parse(u)
rootElem = tree.getroot()
hotel_list = rootElem.findall("HotelList")
print hotel_list
output:
[<Element 'HotelList' at 0x248cd90>]
Note I am creating a Request object and passing Accept header
By the way, if the site returns JSON, why do you need to parse XML? Parsing JSON is simpler, and you get a ready-made Python object.
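On Python 3, `urllib2` is split into `urllib.request`. A sketch of the same idea, building the request without sending it; the URL is a placeholder, not the real endpoint:

```python
from urllib.request import Request

# Placeholder URL, used only to illustrate the header.
url = "http://api.example.com/list?city=Dallas"

# Ask the server for XML explicitly instead of relying on a query argument.
request = Request(url, headers={"Accept": "application/xml"})
print(request.get_header("Accept"))
```

Passing `request` to `urllib.request.urlopen` and the result to `ElementTree.parse` then mirrors the Python 2 code.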

Script to connect to a web page

Looking for a python script that would simply connect to a web page (maybe some querystring parameters).
I am going to run this script as a batch job in unix.
urllib2 will do what you want and it's pretty simple to use.
import urllib
import urllib2
params = {'param1': 'value1'}
req = urllib2.Request("http://someurl", urllib.urlencode(params))
res = urllib2.urlopen(req)
data = res.read()
It's also nice because it's easy to modify the above code to do all sorts of other things like POST requests, Basic Authentication, etc.
Try this:
aResp = urllib2.urlopen("http://google.com/");
print aResp.read();
If you need your script to actually function as a user of the site (clicking links, etc.) then you're probably looking for the python mechanize library.
Python Mechanize
A simple wget called from a shell script might suffice.
in python 2.7:
import urllib2
params = "key=val&key2=val2" #make sure that it's in GET request format
url = "http://www.example.com"
html = urllib2.urlopen(url+"?"+params).read()
print html
more info at https://docs.python.org/2.7/library/urllib2.html
in python 3.6:
from urllib.request import urlopen
params = "key=val&key2=val2" #make sure that it's in GET request format
url = "http://www.example.com"
html = urlopen(url+"?"+params).read()
print(html)
more info at https://docs.python.org/3.6/library/urllib.request.html
to encode params into GET format:
def myEncode(dictionary):
    result = ""
    for k in dictionary: #k is the key
        result += k+"="+dictionary[k]+"&"
    return result[:-1] #all but that last `&`
I'm pretty sure this should work in either python2 or python3...
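The helper above does not escape special characters such as spaces or `&` in values. The standard library's `urllib.parse.urlencode` (Python 3) handles that; a small sketch with made-up parameters:

```python
from urllib.parse import urlencode

# Made-up parameters; the second value contains a space and an ampersand.
params = {"key": "val", "key2": "a b&c"}

query = urlencode(params)
print(query)
```

The resulting string can be appended to the URL after a `?`, as in the examples above.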
What are you trying to do? If you're just trying to fetch a web page, cURL is a pre-existing (and very common) tool that does exactly that.
Basic usage is very simple:
curl www.example.com
You might want to simply use httplib from the standard library.
import httplib
myConnection = httplib.HTTPConnection('www.example.com')  # host name only, no 'http://' scheme
you can find the official reference here: http://docs.python.org/library/httplib.html

HTTP POST binary files using Python: concise non-pycurl examples?

I'm interested in writing a short python script which uploads a short binary file (.wav/.raw audio) via a POST request to a remote server.
I've done this with pycurl, which makes it very simple and results in a concise script; unfortunately it also requires that the end user have pycurl installed, which I can't rely on.
I've also seen some examples in other posts which rely only on basic libraries, urllib, urllib2, etc., however these generally seem to be quite verbose, which is also something I'd like to avoid.
I'm wondering if there are any concise examples which do not require the use of external libraries, and which will be quick and easy for 3rd parties to understand - even if they aren't particularly familiar with python.
What I'm using at present looks like,
def upload_wav( wavfile, url=None, **kwargs ):
    """Upload a wav file to the server, return the response."""
    class responseCallback:
        """Store the server response."""
        def __init__(self):
            self.contents=''
        def body_callback(self, buf):
            self.contents = self.contents + buf
        def decode( self ):
            self.contents = urllib.unquote(self.contents)
            try:
                self.contents = simplejson.loads(self.contents)
            except:
                return self.contents
    t = responseCallback()
    c = pycurl.Curl()
    c.setopt(c.POST,1)
    c.setopt(c.WRITEFUNCTION, t.body_callback)
    c.setopt(c.URL,url)
    postdict = [
        ('userfile',(c.FORM_FILE,wavfile)), #wav file to post
    ]
    #If there are extra keyword args add them to the postdict
    for key in kwargs:
        postdict.append( (key,kwargs[key]) )
    c.setopt(c.HTTPPOST,postdict)
    c.setopt(c.VERBOSE,verbose)
    c.perform()
    c.close()
    t.decode()
    return t.contents
This isn't exact, but it gives you the general idea. It works great and it's simple for 3rd parties to understand, but it requires pycurl.
POSTing a file requires multipart/form-data encoding and, as far as I know, there's no easy way (i.e. one-liner or something) to do this with the stdlib. But as you mentioned, there are plenty of recipes out there.
Although they seem verbose, your use case suggests that you can probably just encapsulate it once into a function or class and not worry too much, right? Take a look at the recipe on ActiveState and read the comments for suggestions:
Recipe 146306: Http client to POST using multipart/form-data
or see the MultiPartForm class in this PyMOTW, which seems pretty reusable:
PyMOTW: urllib2 - Library for opening URLs.
I believe both handle binary files.
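For a sense of what such a recipe boils down to, here is a compact Python 3 sketch that builds a multipart/form-data body with only the standard library. The function name `encode_multipart` and the field names are hypothetical, and the sketch does not escape quotes or newlines in names:

```python
import uuid

def encode_multipart(fields, files):
    """Build a multipart/form-data body.

    fields: dict of name -> str value
    files:  dict of name -> (filename, bytes content, content type)
    Returns (content_type_header, body_bytes).
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        lines.append(b"")
        lines.append(value.encode("utf-8"))
    for name, (filename, content, ctype) in files.items():
        lines.append(delimiter)
        lines.append(
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode()
        )
        lines.append(f"Content-Type: {ctype}".encode())
        lines.append(b"")
        lines.append(content)  # raw bytes, no text encoding applied
    lines.append(delimiter + b"--")
    lines.append(b"")
    body = b"\r\n".join(lines)
    return f"multipart/form-data; boundary={boundary}", body

content_type, body = encode_multipart(
    {"note": "test"},
    {"userfile": ("clip.wav", b"RIFF\x00\x01", "audio/wav")},
)
```

The returned body can be passed as `data` to `urllib.request.Request`, with the returned string set as the `Content-Type` header.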
I met a similar issue today. After trying both pycurl and multipart/form-data, I decided to read the Python httplib/urllib2 source code to find out, and I did get one comparably good solution:
set the Content-Length header (the size of the file) before doing the POST
pass an opened file when doing the POST
Here is the code:
import urllib2, os
image_path = "png\\01.png"
url = 'http://xx.oo.com/webserviceapi/postfile/'
length = os.path.getsize(image_path)
png_data = open(image_path, "rb")
request = urllib2.Request(url, data=png_data)
request.add_header('Cache-Control', 'no-cache')
request.add_header('Content-Length', '%d' % length)
request.add_header('Content-Type', 'image/png')
res = urllib2.urlopen(request).read().strip()
print res
see my blog post: http://www.2maomao.com/blog/python-http-post-a-binary-file-using-urllib2/
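A rough Python 3 counterpart of the same approach, assuming the server accepts the raw file bytes as the request body. The helper name `build_upload_request` and the URL are made up, and the request is built but not sent:

```python
from urllib.request import Request

def build_upload_request(url, payload, content_type="application/octet-stream"):
    """Prepare a POST whose body is the raw bytes of a file."""
    request = Request(url, data=payload, method="POST")
    request.add_header("Content-Type", content_type)
    request.add_header("Content-Length", str(len(payload)))
    return request

# Placeholder URL and payload for illustration.
req = build_upload_request("http://example.com/upload", b"\x89PNG", "image/png")
print(req.get_method(), req.header_items())
```

In real use the payload would come from `open(path, "rb").read()` and the request would be passed to `urllib.request.urlopen`.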
I know this is an old old stack, but I have a different solution.
If you went thru the trouble of building all the magic headers and everything, and are just UPSET that suddenly a binary file can't pass because python library is mean.. you can monkey patch a solution..
import httplib
class HTTPSConnection(httplib.HTTPSConnection):
    def _send_output(self, message_body=None):
        self._buffer.extend(("",""))
        msg = "\r\n".join(self._buffer)
        del self._buffer[:]
        self.send(msg)
        if message_body is not None:
            self.send(message_body)
httplib.HTTPSConnection = HTTPSConnection
If you are using HTTP:// instead of HTTPS:// then replace all instances of HTTPSConnection above with HTTPConnection.
Before people get upset with me, YES, this is a BAD SOLUTION, but it is a way to fix existing code you really don't want to re-engineer to do it some other way.
Why does this fix it? Go look at the original Python source, httplib.py file.
How's urllib substantially more verbose? You build postdict basically the same way, except you start with
postdict = [ ('userfile', open(wavfile, 'rb').read()) ]
Once you have postdict,
resp = urllib.urlopen(url, urllib.urlencode(postdict))
and then you get and save resp.read() and maybe unquote and try JSON-loading if needed. Seems like it would be actually shorter! So what am I missing...?
urllib.urlencode doesn't like some kinds of binary data.
