I'm trying to parse a SOAP response from a server. I'm 100% new to SOAP and pretty new to communicating using HTTP/HTTPS. I'm using Python 2.7 on Ubuntu 12.04.
It looks like SOAP is very much like XML. However, I seem to be unable to parse it as such. I've tried to use ElementTree but keep getting errors. From searches I've been able to conclude that there may be issues with the SOAP tags. (I could be way off here...let me know if I am.)
So, here is an example of the SOAP message I have and what I'm trying to do to parse it (this is an actual server response from Link Point Gateway, in case that's relevant).
import xml.etree.ElementTree as ET
soap_string = '<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Body><fdggwsapi:FDGGWSApiOrderResponse xmlns:fdggwsapi="http://secure.linkpt.net/fdggwsapi/schemas_us/fdggwsapi"><fdggwsapi:CommercialServiceProvider/><fdggwsapi:TransactionTime>Wed Jul 25 10:26:40 2012</fdggwsapi:TransactionTime><fdggwsapi:TransactionID/><fdggwsapi:ProcessorReferenceNumber/><fdggwsapi:ProcessorResponseMessage/><fdggwsapi:ErrorMessage>SGS-002303: Invalid credit card number.</fdggwsapi:ErrorMessage><fdggwsapi:OrderId>1</fdggwsapi:OrderId><fdggwsapi:ApprovalCode/><fdggwsapi:AVSResponse/><fdggwsapi:TDate/><fdggwsapi:TransactionResult>FAILED</fdggwsapi:TransactionResult><fdggwsapi:ProcessorResponseCode/><fdggwsapi:ProcessorApprovalCode/><fdggwsapi:CalculatedTax/><fdggwsapi:CalculatedShipping/><fdggwsapi:TransactionScore/><fdggwsapi:FraudAction/><fdggwsapi:AuthenticationResponseCode/></fdggwsapi:FDGGWSApiOrderResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>'
targetTree = ET.fromstring(soap_string)
This yields the following error:
unbound prefix: line 1, column 0
From another Stack Overflow post I've concluded that SOAP-ENV:Body may be causing a namespace problem. (I could be wrong.)
I've done other searches to find a good solution for parsing SOAP but most of them are from 3+ years ago. It seems that suds is pretty highly recommended. I wanted to get "updated" recommendations before I got too far down a path.
Can anyone recommend a solid (and easy) way to parse a SOAP response like the one I received above? It would be appreciated if you could provide a simple example to get me started (as I said above, I'm completely new to SOAP).
I was unable to find a straight-forward approach using Python. I decided to use PHP instead.
Much like the following:
Python:
import subprocess
command = 'php /path/to/script.php "{0}"'.format(soap_string)
process = subprocess.Popen(command, shell = True, stderr = subprocess.PIPE, stdout = subprocess.PIPE)
process.wait()
output = process.communicate()[0]
(error, result, order_id) = output.split(',')
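As a side note, quoting becomes fragile once the SOAP string itself contains quotes; passing the arguments as a list avoids the shell entirely. A minimal sketch of the same call (same hypothetical script path):
import subprocess

# Each argv entry is passed to PHP as-is; the shell never re-parses the payload.
process = subprocess.Popen(['php', '/path/to/script.php', soap_string],
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, _ = process.communicate()
(error, result, order_id) = output.split(',')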
PHP:
#!/usr/bin/php
<?php
$soap_response = $argv[1];
$doc = simplexml_load_string($soap_response);
$doc->registerXPathNamespace('fdggwsapi', 'http://secure.linkpt.net/fdggwsapi/schemas_us/fdggwsapi');
$nodes = $doc->xpath('//fdggwsapi:FDGGWSApiOrderResponse/fdggwsapi:ErrorMessage');
$error = strval($nodes[0]);
$nodes = $doc->xpath('//fdggwsapi:FDGGWSApiOrderResponse/fdggwsapi:TransactionResult');
$result = strval($nodes[0]);
$nodes = $doc->xpath('//fdggwsapi:FDGGWSApiOrderResponse/fdggwsapi:OrderId');
$order_id = strval($nodes[0]);
$array = array($error, $result, $order_id);
$response = implode(',', $array);
echo $response;
This code only parses specific aspects of this particular SOAP response. It should be enough to get you going to solve your problem.
I'm a complete newbie when it comes to PHP (I've used Perl a bit, so that helped). I must give credit to @scoffey for his solution to parsing SOAP in a way that finally made sense to me.
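For what it's worth, the same three fields can also be extracted in pure Python with ElementTree by using namespace-qualified tag names (Clark notation) — a minimal sketch against the response shown above:
import xml.etree.ElementTree as ET

NS = '{http://secure.linkpt.net/fdggwsapi/schemas_us/fdggwsapi}'
root = ET.fromstring(soap_string)
# find() accepts '{namespace-uri}localname' as the tag name
error = root.find('.//' + NS + 'ErrorMessage').text
result = root.find('.//' + NS + 'TransactionResult').text
order_id = root.find('.//' + NS + 'OrderId').text
print error, result, order_id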
EDITED:
Working with SOAP in Python is "really fun" - most of the tools have not been maintained for years. Feature-wise, ZSI is probably the leader, but it has lots of bugs when it comes to supporting more complex XSD schemas (one example: it doesn't support unions, or complex types based on extensions where the extended type is not a base type).
Suds is very easy to use, but not as powerful as ZSI: its support for some of the more complex XSD constructs is weaker.
There is an interesting tool, generateDS, which works from the XSD rather than directly from the WSDL, so you have to implement the service methods yourself. But it actually does a pretty good job.
I recently started working with knowledge graphs and RDF, and I was fortunate enough that someone provided me with an introductory exercise, which helped me implement some basic functionality on a local Apache server using PHP and EasyRdf. My long-term goal is to make use of NLP and machine learning libraries, which is why I switched to Python and the rdflib library.
As far as I know, after reading the documentation of rdflib and several plugins, there is no equivalent of the load() and dump() functions that are available in EasyRdf.
Background
I used load() in PHP for loading URLs that conform to the Graph Store protocol, which allows the retrieval of content from a specific graph.
I then used dump() to get a readable output of a graph (without having to check the page's source text).
<?php
require 'vendor/autoload.php';
$graph=new EasyRdf\Graph();
print "<br/><b>The content of the graph:</b><br/>";
$graph->load("http://localhost:7200/repositories/myrepo/rdf-graph/service?graph=".urlencode("http://example.com#graph"));
print $graph->dump();
?>
My Question
My question now is what the most straightforward solution would be to implement something similar to the given example in Python using rdflib. Did I miss something or are there just no equivalent functions available in rdflib?
I already used the SPARQLWrapper and the SPARQLUpdateStore for a different purpose, but they are not helping in this case. So I would be interested in possibilities similar to the use of EasyRdf\Http\Client() from EasyRdf. A basic example of mine in PHP would be this:
<?php
require 'vendor/autoload.php';
$adress="http://localhost:7200/repositories/myrepo?query=";
$query=urlencode("some_query");
$clienthttp=new EasyRdf\Http\Client($adress.$query);
$clienthttp->setHeaders("Accept","application/sparql-results+json");
$resultJSON=$clienthttp->request()->getBody();
print "<br/><br/><b>The received JSON response: </b>".$resultJSON;
?>
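For reference, the same raw request can be written in Python with the requests library — a sketch assuming the same local GraphDB endpoint and a placeholder query:
import requests

address = "http://localhost:7200/repositories/myrepo"
# Mirror the EasyRdf example: query as a GET parameter, JSON results requested.
response = requests.get(address,
                        params={"query": "some_query"},
                        headers={"Accept": "application/sparql-results+json"})
print(response.text)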
Much thanks in advance for any kind of help.
UPDATE
There actually seems to be no equivalent in rdflib to the dump() function from EasyRdf; since Python, unlike PHP, is not a language for building web pages, it has no default awareness of HTML and front-end output. The only thing I managed was to parse and serialize the graph as needed and then just return it. The output is not lovely, but the graph content is displayed correctly in the page's source.
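A minimal sketch of that parse-and-serialize round trip in rdflib (the Turtle content here is just placeholder data):
from rdflib import Graph

turtle_data = "<http://example.com/s> <http://example.com/p> <http://example.com/o> ."
g = Graph()
g.parse(data=turtle_data, format="turtle")
# serialize() returns the graph as text (bytes in older rdflib versions),
# which is about the closest rdflib gets to EasyRdf's dump()
print(g.serialize(format="turtle"))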
Regarding the load() function, this can be achieved by using the respective URL that is needed to access your repository in combination with a simple request. In the case of GraphDB, the following code worked out for me:
import requests

# userinput is provided in the form of a Graph URI from the repository
def graphparser_helper(userinput):
    repoURL = "http://localhost:7200/repositories/SomeRepo/rdf-graphs/service?"
    graphURI = {"graph": userinput}
    desiredFormat = {"Accept": "text/turtle"}
    myRequest = requests.get(repoURL, headers=desiredFormat, params=graphURI)
    data = myRequest.text
    return data
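Called with a graph URI, it returns the graph serialized as Turtle, e.g.:
print(graphparser_helper("http://example.com#graph"))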
I hope that helps someone.
I am using suds to request data from a third party using a WSDL. I am only saving some of the returned data for now, but I am paying for the data I get, so I would like to keep all of it. I have decided that the best way to save this data is to capture the raw XML response in a database field, both for future use should I decide to start using different parts of the data, and as a paper trail in the event of discrepancies.
So I have a two part question:
Is there a simple way to output the raw received XML from the suds.client object? In my searches for the answer I have learned this can be done through logging, but I was hoping not to have to dig that information back out of the logs to put into the database field. I have also looked into the MessagePlugin.received() hook, but could not figure out how to access the information after it has been parsed; I can override that function and access the raw XML as it is being parsed, but that is before I have decided whether it is actually worth saving. I have also explored the retxml option, but I would like the parsed version as well, and making two separate calls, one with retxml and the other parsed, would cost me twice. I was hoping for a simple function built into the suds client (like response.as_xml() or something equally simple) but have not found anything like that yet. The option bubbling around in my head is to extend the client object using the received() plugin hook so that it saves the XML as an object attribute before it is parsed, to be referenced later... but the execution of that seems a little tricky to me right now, and I have a hard time believing that the suds client doesn't already have this built in somewhere, so I thought I would ask first.
The other part to my question is: what type of Django model field would be best suited to handle up to ~100 kB of text data as raw XML? I was going to use a CharField with a stupidly long max_length, but that feels wrong.
Thanks in advance.
I solved this by using the flag retxml on client initialization:
client = Client(settings.WSDL_ADDRESS, retxml=True)
raw_reply = client.service.PersonSearch(soapified_search_object)
I was then able to save raw_reply as the raw XML into a Django models.TextField(),
and then inject the raw XML to get a suds-parsed result without having to re-submit my search, like so:
parsed_result = client.service.PersonSearch(__inject={'reply': raw_reply})
I suppose if I had wanted to strip the SOAP envelope off raw_reply, I could have used a Python XML library for further processing of the reply, but since my existing code was already taking the information I wanted from the suds client result, I just used that.
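If you do want to strip the envelope, a minimal ElementTree sketch would look something like this (assuming raw_reply holds the complete SOAP document):
import xml.etree.ElementTree as ET

SOAP_NS = '{http://schemas.xmlsoap.org/soap/envelope/}'
envelope = ET.fromstring(raw_reply)
# The Body element wraps the actual payload; its first child is the response.
body = envelope.find(SOAP_NS + 'Body')
payload = list(body)[0]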
Hope this helps someone else.
I have used kyrayzk's solution for a while, but have always found it a bit hackish, as I had to create a separate dummy client just for when I needed to process the raw XML.
So I sort of reimplemented .last_received() and .last_sent() methods (which were (IMHO, mistakenly) removed in suds-jurko 0.4.1) through a MessagePlugin.
Hope it helps someone:
from suds.plugin import MessagePlugin

class MyPlugin(MessagePlugin):
    def __init__(self):
        self.last_sent_raw = None
        self.last_received_raw = None

    def sending(self, context):
        # context.envelope is the raw outgoing XML
        self.last_sent_raw = str(context.envelope)

    def received(self, context):
        # context.reply is the raw incoming XML, before suds parses it
        self.last_received_raw = str(context.reply)
Usage:
plugin = MyPlugin()
client = Client(TRTH_WSDL_URL, plugins=[plugin])
client.service.SendSomeRequest()
print plugin.last_sent_raw
print plugin.last_received_raw
And as an extra, if you want a nicely indented XML, try this:
from lxml import etree
def xmlpprint(xml):
    return etree.tostring(etree.fromstring(xml), pretty_print=True)
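Used together with the plugin above:
print xmlpprint(plugin.last_received_raw)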
I am using suds to consume SOAP web services like this way:
from suds.client import Client
url = "http://www.example.com?wsdl"
client = Client(url)
client.service.example(xml_argument)
If I call the method with this XML, it works:
<?xml version="1.0" encoding="UTF-8"?><a><b description="Foo Bar"></b></a>
But if I add an escaped quote, like this:
<?xml version="1.0" encoding="UTF-8"?><a><b description="Foo &quot; Bar"></b></a>
I get the following error (from the webservice):
Attribute name "Bar" associated with an element type "b" must be
followed by the ' = ' character.
I am using version: 0.4 GA build: R699-20100913
Am I not using suds.client in the proper way? Any suggestions?
UPDATE:
I have already contacted customer support and emailed them my escaped XML; they told me it works for them, so the problem is probably a bad use of suds on my side. I'll give PySimpleSOAP a try.
Mine is mostly a guess, but the error you are quoting seems to be generated from the XML well-formedness checker on the machine providing the service.
It seems that on that side of the cable they are getting something like:
<a><b description="Foo " Bar"></b></a>
(&quot; converted to ") and thus they are telling you that you should instead send something like:
<a><b description="Foo" Bar="..."></b></a>
which is clearly not what you want.
AFAIK your XML is well formed (just tested here for extra safety), so either there is a bug in suds (which would surprise me, given the magnitude of the bug and the maturity of the package) or there is a bug on the server providing the service (possibly a "too early conversion" from XML entities to regular chars).
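If that is what is happening, one defensive step on the client side is to build the attribute with xml.sax.saxutils.quoteattr, which quotes and escapes the value consistently before the document ever reaches suds — just a sketch:
from xml.sax.saxutils import quoteattr

value = 'Foo " Bar'
# quoteattr() returns the value wrapped in quotes, switching quote style
# or emitting &quot; as needed so the attribute stays well formed.
xml_argument = ('<?xml version="1.0" encoding="UTF-8"?>'
                '<a><b description=%s></b></a>' % quoteattr(value))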
Again: a lot of speculation and few hard facts here, but I hope it helps anyway! :)
Oftentimes I want to automate HTTP queries. I currently use Java (with Commons HttpClient), but would prefer a scripting-based approach: something really quick and simple, where I can set a header and go to a page without worrying about setting up the entire OO lifecycle, setting each header, and calling up an HTML parser. I am looking for a solution in any language, preferably a scripting one.
Have a look at Selenium. It generates code for C#, Java, Perl, PHP, Python, and Ruby if you need to customize the script.
Watir sounds close to what you want, although it (like Selenium, linked to in another answer) actually opens up a browser to do its work. You can see some examples here. Another browser-based record-and-playback system is Sahi.
If your application uses WSGI, then Paste is a nice option.
Mechanize, linked to in another answer, is a "browser in a library", and there are clones in Perl, Ruby, and Python. The Perl one is the original, and this seems to be the way to go if you don't want a browser. The problem with this approach is that all the front-end code (which might rely on JavaScript) won't be exercised.
Mechanize for Python seems easy to use: http://wwwsearch.sourceforge.net/mechanize/
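A quick sketch of the set-a-header-and-fetch-a-page case with mechanize (URL and header value are placeholders):
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)  # skip the robots.txt check for this quick script
# Headers set here are sent with every subsequent request.
br.addheaders = [('User-Agent', 'my-script/0.1')]
response = br.open('http://www.example.com')
print response.read()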
My turn: wget, or Perl with LWP. You'll find examples on the linked page.
If you have simple needs (fetch a page and then parse it), it is hard to beat LWP::Simple and HTML::TreeBuilder.
use strict;
use warnings;
use LWP::Simple;
use HTML::TreeBuilder;
my $url = 'http://www.example.com';
my $content = get( $url) or die "Couldn't get $url";
my $t = HTML::TreeBuilder->new_from_content( $content );
$t->eof;
$t->elementify;
# Get first match:
my $thing = $t->look_down( _tag => 'p', id => qr/match_this_regex/ );
print $thing ? $thing->as_text : "No match found\n";
# Get all matches:
my @things = $t->look_down( _tag => 'p', id => qr/match_this_regex/ );
print $_ ? $_->as_text : "No match found" for @things;
I'm testing REST APIs at the moment and found the RESTClient very nice. It's a GUI program, but nonetheless you can save and restore queries as XML files (or let them be generated), embed them, write test scripts, and so on. And it's Java-based (not an inherent advantage, but you mentioned Java).
Minus points for recording sessions. The ReST Client is good for stateless "one-shots".
If it doesn't suit your needs, I'd go for the already mentioned Mechanize (or WWW-Mechanize, as it is called at CPAN).
Depending on exactly what you're doing the easiest solution looks to be bash + curl.
The man page for the latter is available here:
http://curl.haxx.se/docs/manpage.html
You can do POSTs as well as GETs, use HTTPS, show headers, work with cookies, use basic and digest HTTP authentication, and tunnel through all sorts of proxies (including NTLM on *nix), among other things.
curl is also available as shared library with C and PHP support.
HTH
C.
Python urllib may be what you're looking for.
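For example, setting a header and fetching a page with the standard library (urllib2 here, matching the Python 2 vintage of this thread):
import urllib2

request = urllib2.Request('http://www.example.com',
                          headers={'User-Agent': 'my-script/0.1'})
response = urllib2.urlopen(request)
print response.read()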
Alternatively, PowerShell exposes the full .NET HTTP library in a scripting environment.
Twill is pretty good and made for testing. It can be used as script, in an interactive session or within a Python program.
Perl and WWW::Mechanize can make web scraping and the like simple and easy, including easy handling of forms (say you want to go to a login page, fill in a username and password, and submit the form, handling cookies and hidden session identifiers just as a browser would...).
Similarly, finding or extracting links from the fetched page is trivial.
If you need to parse stuff out of the resulting pages that WWW::Mechanize can't easily help with, then feed the result to HTML::TreeBuilder to make parsing easy.
What about using PHP+Curl, or just bash?
Some ruby libraries:
httparty: really interesting, with an appealing philosophy.
mechanize: a classic, good-quality web automation library.
scrubYt: puzzling at first glance but fun to use.
I'm trying to create XML using the ElementTree object structure in Python. It all works very well except when it comes to processing instructions. I can create a PI easily using the factory function ProcessingInstruction(), but it doesn't get added into the ElementTree. I can add it manually, but I can't figure out how to add it above the root element, where PIs are normally placed. Does anyone know how to do this? I know of plenty of alternative methods of doing it, but it seems that this must be built in somewhere that I just can't find.
Try the lxml library: it follows the ElementTree API, plus adds a lot of extras. From the compatibility overview:
ElementTree ignores comments and processing instructions when parsing XML, while etree will read them in and treat them as Comment or ProcessingInstruction elements respectively. This is especially visible where comments are found inside text content, which is then split by the Comment element.
You can disable this behaviour by passing the boolean remove_comments and/or remove_pis keyword arguments to the parser you use. For convenience and to support portable code, you can also use the etree.ETCompatXMLParser instead of the default etree.XMLParser. It tries to provide a default setup that is as close to the ElementTree parser as possible.
Not in the stdlib, I know, but in my experience the best bet when you need stuff that the standard ElementTree doesn't provide.
With the lxml API it couldn't be easier, though it is a bit "underdocumented":
If you need a top-level processing instruction, create it like this:
from lxml import etree
root = etree.Element("anytagname")
root.addprevious(etree.ProcessingInstruction("anypi", "anypicontent"))
The resulting document will look like this:
<?anypi anypicontent?>
<anytagname />
They certainly should add this to their FAQ because IMO it is another feature that sets this fine API apart.
Yeah, I don't believe it's possible, sorry. ElementTree provides a simpler interface to (non-namespaced) element-centric XML processing than DOM, but the price for that is that it doesn't support the whole XML infoset.
There is no apparent way to represent the content that lives outside the root element (comments, PIs, the doctype and the XML declaration), and these are also discarded at parse time. (Aside: this appears to include any default attributes specified in the DTD internal subset, which makes ElementTree strictly-speaking a non-compliant XML processor.)
You can probably work around it by subclassing or monkey-patching the Python-native ElementTree implementation's write() method to call _write() on your extra PIs before it writes the _root, but it could be a bit fragile.
If you need support for the full XML infoset, probably best stick with DOM.
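For example, minidom can place a processing instruction above the root element — a minimal sketch:
from xml.dom import minidom

doc = minidom.parseString('<root/>')
pi = doc.createProcessingInstruction(
    'xml-stylesheet', 'type="text/xsl" href="style.xsl"')
# Inserting before the document element puts the PI above the root tag.
doc.insertBefore(pi, doc.documentElement)
print doc.toxml()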
I don't know much about ElementTree, but you might be able to solve your problem using a library I wrote called "xe".
xe is a set of Python classes designed to make it easy to create structured XML. I haven't worked on it in a long time, for various reasons, but I'd be willing to help you if you have questions about it, or need bugs fixed.
It has the bare bones of support for things like processing instructions, and with a little bit of work I think it could do what you need. (When I started adding processing instructions, I didn't really understand them, and I didn't have any need for them, so the code is sort of half-baked.)
Take a look and see if it seems useful.
http://home.avvanta.com/~steveha/xe.html
Here's an example of using it:
import xe
doc = xe.XMLDoc()
prefs = xe.NestElement("prefs")
prefs.user_name = xe.TextElement("user_name")
prefs.paper = xe.NestElement("paper")
prefs.paper.width = xe.IntElement("width")
prefs.paper.height = xe.IntElement("height")
doc.root_element = prefs
prefs.user_name = "John Doe"
prefs.paper.width = 8
prefs.paper.height = 10
c = xe.Comment("this is a comment")
doc.top.append(c)
If you ran the above code and then ran print doc here is what you would get:
<?xml version="1.0" encoding="utf-8"?>
<!-- this is a comment -->
<prefs>
<user_name>John Doe</user_name>
<paper>
<width>8</width>
<height>10</height>
</paper>
</prefs>
If you are interested in this but need some help, just let me know.
Good luck with your project.
f = open(r'D:\Python\XML\test.xml', 'r+')  # raw string: '\t' in the path would otherwise become a tab
old = f.read()
f.seek(44, 0)  # place cursor just after the XML declaration
f.write(r'<?xml-stylesheet type="text/xsl" href="C:\Stylesheets\expand.xsl"?>' + old[44:])
f.close()
I was facing the same problem and came up with this crude solution after failing to insert the PI into the .xml file correctly. Even after using one of the Element methods (in my case, root.insert(0, PI)) and trying multiple ways to cut and paste the inserted PI to the correct location, I found data being deleted from unexpected locations.
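A slightly more robust variant of the same idea locates the end of the XML declaration instead of hard-coding offset 44 — a sketch, assuming the file starts with a declaration:
with open(r'D:\Python\XML\test.xml', 'r+') as f:
    old = f.read()
    # Find the end of the '<?xml ...?>' declaration rather than assuming 44.
    offset = old.find('?>') + 2
    f.seek(offset)
    f.write('<?xml-stylesheet type="text/xsl" href="expand.xsl"?>' + old[offset:])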