How do I allow a JSON response in a mechanize test? - python

I have a web service that returns JSON responses when successful. Unfortunately, when I try to test this service via multi-mechanize, I get an error: "not viewing HTML". Obviously it's not viewing HTML; it's getting content clearly marked as JSON. How do I get mechanize to ignore this error and accept the JSON it's getting back?

It turns out mechanize isn't set up to accept JSON responses out of the box. For a quick and dirty solution, edit mechanize's _headersutil.py file (check /usr/local/lib/python2.7/dist-packages/mechanize). Keep in mind that this changes the installed library, so the edit is lost whenever mechanize is reinstalled or upgraded.
In the is_html() function, change the line:
html_types = ["text/html"]
to read:
html_types = ["text/html", "application/json"]
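If patching the installed library is not an option, it may be enough to avoid mechanize's HTML-only features and parse the raw body yourself. As far as I know, the error comes from HTML-specific calls (forms, links and so on), not from reading the response. The sketch below assumes you already have the body and the Content-Type header value from the response; parse_json_body and its parameter names are hypothetical, not part of mechanize.

```python
import json


def parse_json_body(body, content_type):
    """Decode a response body as JSON if its Content-Type says it is JSON.

    Hypothetical helper: `body` is the raw bytes (or text) read from the
    response and `content_type` is the value of its Content-Type header.
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/json":
        raise ValueError("expected JSON, got %r" % content_type)
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    return json.loads(body)


result = parse_json_body(b'{"status": "ok"}', "application/json; charset=utf-8")
print(result["status"])
```

In a test script, the body would typically come from the response's read() call instead of a literal.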

Related

Getting a specific file from requested iframe

I want to get the file link from the anime I'm watching from the site.
`import requests
from bs4 import BeautifulSoup
import re
page = requests.get("http://naruto-tube.org/shippuuden-sub-219")
soup = BeautifulSoup(page.content, "html.parser")
inner_content = requests.get(soup.find("iframe")["src"])
print(inner_content.text)`
The output is the source code from the filehoster's website (ani-stream). However, my problem now is: how do I get just the "file: xxxxxxx" line printed, and not the whole source code?
You can use Beautiful Soup to parse the iframe source code and find the script elements, but from there you're on your own. The file: "xxxxx", line is in JavaScript code, so you'll have to find the function call (to playerInstance.setup() in this case), decide which of the two such "file:" lines is the one you want, and strip away the unwanted JS syntax around the URL.
Regular expressions will help with that, and you're probably better off just looking for the lines in the iframe's HTML. You already have re imported, so I just replaced your last line with:
lines = re.findall("file: .*$", inner_content.text, re.MULTILINE)
print( '\n'.join(lines) )
...to get a list of lines with "file:" in them. You can (and should) use a fancier RE that finds just the one containing "http://" and allows only whitespace before "file:" on the line. (Python, Java and my text editor all have different ideas about what's in an RE, so I have to go to the docs every time I write one. You can do that too; it's your problem, after all.)
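One possible version of that fancier RE is sketched here. The sample_html string is a made-up stand-in for inner_content.text, so the real page may need a different pattern (for instance if the URL is not quoted).

```python
import re

# Hypothetical stand-in for inner_content.text; the real page will differ.
sample_html = '''
playerInstance.setup({
    file: "http://example.com/video.mp4",
    image: "http://example.com/thumb.jpg",
    // file: "http://example.com/commented-out.mp4"
});
'''

# Only spaces or tabs may precede "file:", and the quoted value must start
# with http:// or https://. The capture group keeps just the URL.
FILE_URL_RE = re.compile(
    r'^[ \t]*file:[ \t]*["\'](https?://[^"\']+)["\']', re.MULTILINE
)

urls = FILE_URL_RE.findall(sample_html)
print("\n".join(urls))
```

Because the pattern is anchored at the start of the line, the commented-out line is skipped and only the bare URL is returned, with the surrounding JS syntax already stripped.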
The requests.get() function doesn't seem to work to get the bytes. Try Vishnu Kiran's urlretrieve approach--maybe that will work. Using the URL in a browser window does seem to get the right video, though, so there may be a user agent and/or cookie setting that you'll have to spoof.
If the iframe's source is not on the website's primary domain (naruto-tube.org), its contents cannot be reached by scraping the outer page alone.
You will have to use a different website, or get the URL from the iframe and call it with a library such as requests.
Note that you must also pass any parameters the URL takes in order to get a result, like so:
import urllib
# Python 2; in Python 3 this function is urllib.request.urlretrieve
urllib.urlretrieve("url from the Iframe", "mp4.mp4")
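If the host turns out to check the user agent or referrer, as suggested above, a request with browser-like headers might get further. This is only a sketch in Python 3 using the standard library: the helper name, the header values and the example.com URLs are all placeholders, and nothing here confirms which headers this particular host looks at. The request is built but not sent.

```python
import urllib.request


def build_browser_like_request(url, referer=None):
    """Build (but do not send) a request carrying browser-like headers.

    Hypothetical helper; the header values are illustrative only.
    """
    headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


req = build_browser_like_request(
    "http://example.com/video.mp4", referer="http://example.com/embed"
)
# Request stores header names capitalized, hence "User-agent".
print(req.get_header("User-agent"))
```

Passing the resulting object to urllib.request.urlopen() would perform the actual download.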

Retrieving full URL from cgi.FieldStorage

I'm passing a URL to a python script using cgi.FieldStorage():
http://localhost/cgi-bin/test.py?file=http://localhost/test.xml
test.py just contains
#!/usr/bin/env python
import cgi
print "Access-Control-Allow-Origin: *"
print "Content-Type: text/plain; charset=x-user-defined"
print "Accept-Ranges: bytes"
print
print cgi.FieldStorage()
and the result is
FieldStorage(None, None, [MiniFieldStorage('file', 'http:/localhost/test.xml')])
Note that the URL only contains http:/localhost - how do I pass the full encoded URI so that file is the whole URI? I've tried encoding the file parameter (http%3A%2F%2Flocalhost%2ftext.xml) but this also doesn't work
The screenshot shows that the output to the webpage isn't what is expected, but that the encoded url is correct
Your CGI script works fine for me using Apache 2.4.10 and Firefox (curl also). What web server and browser are you using?
My guess is that you are using Python's CGIHTTPServer, or something based on it, which exhibits the problem you describe. CGIHTTPServer assumes that it is being given a path to a CGI script, so it collapses the path without regard to any query string that might be present. Collapsing the path removes duplicate forward slashes as well as relative path elements such as "..".
If you are using this web server, I don't see any obvious way around it by changing the URL. You won't be using it in production, so perhaps look at another web server such as Apache, nginx, lighttpd etc.
The problem is with your query parameters; you should be encoding them:
>>> from urllib import urlencode
>>> urlencode({'file': 'http://localhost/test.xml', 'other': 'this/has/forward/slashes'})
'other=this%2Fhas%2Fforward%2Fslashes&file=http%3A%2F%2Flocalhost%2Ftest.xml'
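For reference, the same encoding step in Python 3, together with the decoding that the receiving script would rely on, can be sketched as follows. It uses only the standard library's urllib.parse and the URL from the question.

```python
from urllib.parse import parse_qs, urlencode

# Encode the nested URL so its slashes and colon survive as a query value.
query = urlencode({"file": "http://localhost/test.xml"})
print(query)

# Decoding on the receiving side restores the original value.
decoded = parse_qs(query)
print(decoded["file"][0])
```

If the decoded value still arrives with a single slash, the encoding is not the culprit and the web server is altering the request before the script sees it.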

Call command of web service from command line of Python

I do the following Python commands:
import urllib
data = urllib.urlencode({"contains":"my_function"})
u = urllib.urlopen("http://myservername:1000/myfolder/?%s" % data)
u.read()
Then I get from that read command a lot of lines with HTML tags and one of the strings is of my interest. It looks like this:
...... onClick='doCommand("my_function","51267", $("ttt27222").value); $("ttt27222").value="";' >Apply
This is what I want to do from command line of Python using urllib.
Please let me know how to build a urllib statement that calls this my_function function, passing it two parameters: 51267 and some number for the value.
Thank you
doCommand() looks like a JavaScript function, and urllib doesn't execute JavaScript. You could use Selenium WebDriver or ghost.py to emulate a web browser and execute the JavaScript in the context of the web page.

The Requests streaming example does not work in my environment

I've been trying to consume the Twitter Streaming API using Python Requests.
There's a simple example in the documentation:
import requests
import json
r = requests.post('https://stream.twitter.com/1/statuses/filter.json',
    data={'track': 'requests'}, auth=('username', 'password'))
for line in r.iter_lines():
    if line: # filter out keep-alive new lines
        print json.loads(line)
When I execute this, the call to requests.post() never returns. I've experimented and proved that it is definitely connecting to Twitter and receiving data from the API. However, instead of returning a response object, it just sits there consuming as much data as Twitter sends. Judging by the code above, I would expect requests.post() to return a response object with an open connection to Twitter down which I could continue to receive realtime results.
(To prove it was receiving data, I connected to Twitter using the same credentials in another shell, whereupon Twitter closed the first connection, and the call returned the response object. The r.content attribute contained all the backed up data received while the connection was open.)
The documentation makes no mention of any other steps required to cause requests.post to return before consuming all the supplied data. Other people seem to be using similar code without encountering this problem, e.g. here.
I'm using:
Python 2.7
Ubuntu 11.04
Requests 0.14.0
You need to switch off prefetching, which I think is a parameter that changed defaults:
r = requests.post('https://stream.twitter.com/1/statuses/filter.json',
    data={'track': 'requests'}, auth=('username', 'password'),
    prefetch=False)
for line in r.iter_lines():
    if line: # filter out keep-alive new lines
        print json.loads(line)
Note that as of requests 1.x the parameter has been renamed, and now you use stream=True:
r = requests.post('https://stream.twitter.com/1/statuses/filter.json',
    data={'track': 'requests'}, auth=('username', 'password'),
    stream=True)
for line in r.iter_lines():
    if line: # filter out keep-alive new lines
        print json.loads(line)
Ah, I found the answer by reading the code. At some point, a prefetch parameter was added to the post method (and other methods, I assume).
I just needed to add a prefetch=False kwarg to requests.post().
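The line-handling part of those loops can be tried without a network connection. In this Python 3 sketch, iter_json_messages is a hypothetical helper and fake_stream stands in for what r.iter_lines() would yield on a streaming response; the empty entry plays the role of a keep-alive line.

```python
import json


def iter_json_messages(lines):
    """Yield one decoded JSON object per non-empty line.

    Hypothetical helper: `lines` is any iterable of bytes or text lines,
    such as the iterator returned by r.iter_lines().
    """
    for line in lines:
        if not line:
            # Keep-alive newlines arrive as empty lines; skip them.
            continue
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        yield json.loads(line)


# Stand-in for a live stream, including one keep-alive line.
fake_stream = [b'{"text": "first"}', b"", b'{"text": "second"}']
for message in iter_json_messages(fake_stream):
    print(message["text"])
```

Because the helper is a generator, it handles each message as soon as its line arrives instead of waiting for the connection to close, which is the behaviour that prefetch=False (or stream=True) is meant to enable.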

How to parse the "request body" using python CGI?

I just need to write a simple python CGI script to parse the contents of a POST request containing JSON. This is only test code so that I can test a client application until the actual server is ready (written by someone else).
I can read the cgi.FieldStorage() and dump the keys() but the request body containing the JSON is nowhere to be found.
I can also dump os.environ, which provides lots of info, but I do not see a variable containing the request body.
Any input appreciated.
Chris
If you're using CGI, just read data from stdin:
import sys
data = sys.stdin.read()
Note that if you call cgi.FieldStorage() earlier in your code, you can't get the body data from stdin afterwards, because stdin can only be read once.
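A slightly fuller sketch of the same idea, written for Python 3, reads exactly as many bytes as the standard CGI variable CONTENT_LENGTH announces and then decodes them as JSON. The helper name read_json_body is hypothetical, and the environment and input stream are passed in as arguments so the function can be exercised outside a web server.

```python
import io
import json


def read_json_body(environ, stream):
    """Read CONTENT_LENGTH bytes from `stream` and decode them as JSON.

    Hypothetical helper: in a real CGI script, `environ` would be
    os.environ and `stream` would be sys.stdin.buffer.
    """
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = stream.read(length)
    if not raw:
        return None  # no request body was sent
    return json.loads(raw.decode("utf-8"))


# Simulated POST request carrying a small JSON document.
payload = b'{"name": "Chris"}'
fake_environ = {"CONTENT_LENGTH": str(len(payload))}
print(read_json_body(fake_environ, io.BytesIO(payload)))
```

Reading a fixed number of bytes, instead of reading until end of file, avoids blocking on servers that leave stdin open after the body has been delivered.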
