Call command of web service from command line of Python - python

I run the following Python commands:
import urllib
data = urllib.urlencode({"contains":"my_function"})
u = urllib.urlopen("http://myservername:1000/myfolder/?%s" % data)
u.read()
The read call returns many lines of HTML, and one of the strings is the one I am interested in. It looks like this:
...... onClick='doCommand("my_function","51267", $("ttt27222").value); $("ttt27222").value="";' >Apply
This is the call I want to make from the Python command line using urllib.
Please let me know how to build the urllib statement to call my_function, passing it two parameters: 51267 and some number for value.
Thank you

doCommand() looks like a JavaScript function, and urllib doesn't execute JavaScript. You could use a tool such as Selenium WebDriver or ghost.py to emulate a web browser and execute the JavaScript in the context of the web page.
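urllib can still be used to find out what the page would pass to doCommand(), even though it cannot run the call. A minimal sketch, assuming the markup looks exactly like the fragment quoted in the question; find_commands is a hypothetical helper name, and what doCommand() actually sends to the server is unknown until you read the page's JavaScript.

```python
import re

# Fragment copied from the question; in practice this would be the text returned by u.read().
html = '''onClick='doCommand("my_function","51267", $("ttt27222").value); $("ttt27222").value="";' >Apply'''

# Capture the command name, the numeric id, and the id of the input field holding the value.
DO_COMMAND = re.compile(r'doCommand\("([^"]+)",\s*"([^"]+)",\s*\$\("([^"]+)"\)\.value\)')

def find_commands(page_source):
    """Return (command, item_id, field_id) tuples for every doCommand call found."""
    return DO_COMMAND.findall(page_source)

print(find_commands(html))
```

With those values in hand, the remaining step is to read the definition of doCommand() in the page source and reproduce whatever request it makes.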

Related

Getting a specific file from requested iframe

I want to get the file link from the anime I'm watching from the site.
`import requests
from bs4 import BeautifulSoup
import re
page = requests.get("http://naruto-tube.org/shippuuden-sub-219")
soup = BeautifulSoup(page.content, "html.parser")
inner_content = requests.get(soup.find("iframe")["src"])
print(inner_content.text)`
The output is the source code of the file hoster's website (ani-stream). My problem now is: how do I get only the "file: xxxxxxx" line printed, and not the whole source code?
You can use Beautiful Soup to parse the iframe source code and find the script elements, but from there you're on your own. The file: "xxxxx", line is in JavaScript code, so you'll have to find the function call (to playerInstance.setup() in this case), decide which of the two such "file:" lines is the one you want, and strip away the unwanted JS syntax around the URL.
Regular expressions will help with that, and you're probably better off just looking for the lines in the iframe's HTML. You already have re imported, so I just replaced your last line with:
lines = re.findall("file: .*$", inner_content.text, re.MULTILINE)
print( '\n'.join(lines) )
...to get a list of lines with "file:" in them. You can (and should) use a fancier RE that finds just the one with "http://" and allows only whitespace before "file:" on the line. (Python, Java and my text editor all have different ideas about what's in an RE, so I have to go to the docs every time I write one. You can do that too; it's your problem, after all.)
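One possible shape for that fancier RE is sketched below. The sample text is a hypothetical stand-in for inner_content.text (the real page will differ), and the pattern assumes the URL is wrapped in double quotes.

```python
import re

# Hypothetical stand-in for inner_content.text.
sample = """playerInstance.setup({
    file: "http://example.com/video.mp4",
    image: "http://example.com/thumb.jpg",
    tracks: [{ file: "/subs/en.vtt" }]
});"""

# Only whitespace may precede "file:", and the quoted value must start with http:// or https://.
FILE_URL = re.compile(r'^\s*file:\s*"(https?://[^"]+)"', re.MULTILINE)

def find_file_urls(source):
    """Return the URLs from lines of the form   file: "http..."   ."""
    return FILE_URL.findall(source)

print(find_file_urls(sample))
```

The capture group returns the bare URL, so no further stripping of JS syntax is needed for lines that match.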
The requests.get() function doesn't seem to work to get the bytes. Try Vishnu Kiran's urlretrieve approach--maybe that will work. Using the URL in a browser window does seem to get the right video, though, so there may be a user agent and/or cookie setting that you'll have to spoof.
If the iframe's source is on a different domain from the main website (naruto-tube.org), its contents are not part of the page you scraped, so they cannot be reached from that page alone.
You will have to use a different website, or get the URL from the iframe and request it with a library such as requests.
Note that you must also pass any parameters the URL requires to actually get a result, like so:
import urllib.request  # Python 3; in Python 2 the function is urllib.urlretrieve
urllib.request.urlretrieve("url from the Iframe", "mp4.mp4")

Web scraping Python Shell Not Responding

I am trying to run this basic code, but even after a long wait the Python shell simply gets stuck and I always end up facing 'Python 3.6.5 Shell (Not Responding)'. Please suggest a fix.
import requests
from bs4 import BeautifulSoup
webdump = requests.get("https://www.flipkart.com/").text
soup = BeautifulSoup(webdump,'lxml')
print(soup.prettify())
This page is around 1 MB, so dumping more than 974047 bytes (soup.prettify() adds more spaces and newlines) into the terminal at once is probably what makes it hang.
Try printing this text line by line:
for line in soup.prettify().splitlines(False):
    print(line)
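If the goal is only to inspect the markup, another option (not from the answer above, just a sketch) is to send the text to a file and preview a few lines in the shell. write_lines and its limit parameter are hypothetical names, and an in-memory buffer stands in for the file here.

```python
import io

def write_lines(text, stream, limit=None):
    """Write text to stream line by line; stop after `limit` lines if given."""
    written = 0
    for line in text.splitlines():
        if limit is not None and written >= limit:
            break
        stream.write(line + "\n")
        written += 1
    return written

# Stand-in for soup.prettify(); a real page would be far longer.
page = "<html>\n <body>\n  <p>hello</p>\n </body>\n</html>"

# With a real page, replace the buffer with open("page.html", "w", encoding="utf-8").
buffer = io.StringIO()
total = write_lines(page, buffer)

# Show only the first two lines in the shell.
preview = io.StringIO()
shown = write_lines(page, preview, limit=2)
print(preview.getvalue(), end="")
print("wrote %d lines, previewed %d" % (total, shown))
```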

Read url line by line in python

I have a list containing the URLs of images. I want to read the image at each URL, line by line, using Python. I have tried different ways, but could only read one line.
I haven't seen your code, but I would recommend using Requests.
In a shell I did:
pip install --user requests
to get the above module.
If you have a URL, you can run the following in an interactive Python session:
import requests
r = requests.get("http://docs.python-requests.org/en/master/_static/requests-sidebar.png")
And to examine the content of the image:
print(r.content)
Beware: the above prints the binary content to your console.
Hope it helps.
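Since the question is about a whole list of URLs, the single request above can be wrapped in a loop. This is a rough sketch rather than a definitive implementation: save_images, fetch_with_requests, and the file naming scheme are all hypothetical, and the fetch step is passed in as a parameter so the loop can be tried without network access.

```python
import os

def fetch_with_requests(url):
    """Fetch one URL and return its body as bytes (needs `pip install requests`)."""
    import requests  # imported lazily so save_images can be tested without it
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.content

def save_images(urls, out_dir, fetch=fetch_with_requests):
    """Download each URL in turn and return the list of files written."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        # Hypothetical naming scheme: list position plus the last path segment of the URL.
        last_segment = url.rstrip("/").split("/")[-1] or "image"
        path = os.path.join(out_dir, "%03d_%s" % (index, last_segment))
        with open(path, "wb") as handle:
            handle.write(fetch(url))
        saved.append(path)
    return saved
```

Calling save_images(my_urls, "images") would then write one file per entry of the list, where my_urls is your own list of image URLs.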

Retrieving full URL from cgi.FieldStorage

I'm passing a URL to a python script using cgi.FieldStorage():
http://localhost/cgi-bin/test.py?file=http://localhost/test.xml
test.py just contains
#!/usr/bin/env python
import cgi
print "Access-Control-Allow-Origin: *"
print "Content-Type: text/plain; charset=x-user-defined"
print "Accept-Ranges: bytes"
print
print cgi.FieldStorage()
and the result is
FieldStorage(None, None, [MiniFieldStorage('file', 'http:/localhost/test.xml')])
Note that the URL only contains http:/localhost. How do I pass the full encoded URI so that file is the whole URI? I've tried encoding the file parameter (http%3A%2F%2Flocalhost%2ftext.xml) but this also doesn't work.
The screenshot shows that the output to the webpage isn't what is expected, but that the encoded url is correct
Your CGI script works fine for me using Apache 2.4.10 and Firefox (curl also). What web server and browser are you using?
My guess is that you are using Python's CGIHTTPServer, or something based on it. This exhibits the problem that you identify. CGIHTTPServer assumes that it is being provided with a path to a CGI script, so it collapses the path without regard to any query string that might be present. Collapsing the path removes duplicate forward slashes as well as relative path elements such as "..".
If you are using this web server, I don't see any obvious way around it by changing the URL. You won't be using it in production, so perhaps look at another web server such as Apache, nginx, or lighttpd.
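The effect can be reproduced with a small illustration. posixpath.normpath is used here only as a stand-in for the kind of path collapsing described above; it is not the server's actual code.

```python
import posixpath

url = "/cgi-bin/test.py?file=http://localhost/test.xml"

# Collapsing the whole string as if it were a path also rewrites the query string.
collapsed = posixpath.normpath(url)
print(collapsed)  # /cgi-bin/test.py?file=http:/localhost/test.xml

# Splitting off the query string first leaves the parameter intact.
path, _, query = url.partition("?")
safe = posixpath.normpath(path) + "?" + query
print(safe)  # /cgi-bin/test.py?file=http://localhost/test.xml
```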
The problem is with your query parameters; you should be encoding them:
>>> from urllib import urlencode
>>> urlencode({'file': 'http://localhost/test.xml', 'other': 'this/has/forward/slashes'})
'other=this%2Fhas%2Fforward%2Fslashes&file=http%3A%2F%2Flocalhost%2Ftest.xml'
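The session above is Python 2. For reference, a Python 3 sketch of the same encoding, with a round trip through parse_qs to confirm that the full URL survives:

```python
from urllib.parse import urlencode, parse_qs

# Encode the parameter so that ":" and "/" are percent-escaped.
query = urlencode({"file": "http://localhost/test.xml"})
print(query)  # file=http%3A%2F%2Flocalhost%2Ftest.xml

# Decoding the query string recovers the original value.
decoded = parse_qs(query)
print(decoded["file"][0])  # http://localhost/test.xml
```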

How do I allow a JSON response in a mechanize test?

I have a web service that returns JSON responses when successful. Unfortunately, when I try to test this service via multi-mechanize, I get an error: "not viewing HTML". Obviously it's not viewing HTML; it's getting content clearly marked as JSON. How do I get mechanize to ignore this error and accept the JSON it's getting back?
It turns out mechanize isn't set up to accept JSON responses out of the box. For a quick and dirty solution to this, update mechanize's _headersutil.py file (check /usr/local/lib/python2.7/dist-packages/mechanize).
In the is_html() method, change the line:
html_types = ["text/html"]
to read:
html_types = ["text/html", "application/json"]
