Python urllib2 unable to open url - python

I have a URL in this format:
https://aLongStringWithNumbers:anotherLongStringWithNumbers#somewhere.com/admin/someAPICall.json
which does not look like something Python's urllib2 can understand. I keep getting errors when using it with urllib2.urlopen:
raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
Is this a case where the library is not capable of interpreting that URL? It obviously works in the browser. I'm using Python 2.7.
Any advice would be appreciated!
Thanks
- V

Python is interpreting the text after the : as the port number in the URL. You need to force it to be read as part of the URL, using something like quote.
See quote from the non-accepted answer here:
>>> import urllib2
>>> urllib2.quote('https://aLongStringWithNumbers:anotherLongStringWithNumbers#somewhere.com/admin/someAPICall.json')
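If the two long strings are actually an API key and password (an assumption, as is reading the # in the URL above as a stand-in for @), another option is to take the credentials out of the URL and send them in an Authorization header. A minimal Python 3 sketch; build_basic_auth_request and the sample credentials are hypothetical names used only for illustration:

```python
import base64
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def build_basic_auth_request(url):
    """Move user:password from the URL into a Basic Authorization header."""
    parts = urlsplit(url)

    # Rebuild the host without the "user:password@" prefix.
    # (IPv6 literals are not handled in this sketch.)
    host = parts.hostname
    if parts.port is not None:
        host = "%s:%d" % (host, parts.port)
    clean_url = urlunsplit(
        (parts.scheme, host, parts.path, parts.query, parts.fragment)
    )

    request = urllib.request.Request(clean_url)
    if parts.username is not None:
        credentials = "%s:%s" % (parts.username, parts.password or "")
        token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
        request.add_header("Authorization", "Basic " + token)
    return request


# Hypothetical credentials; the host and path follow the question's URL,
# with @ assumed where the question shows #.
request = build_basic_auth_request(
    "https://key123:secret456@somewhere.com/admin/someAPICall.json"
)
print(request.full_url)
```

The returned request can then be passed to urllib.request.urlopen. Whether the server accepts Basic authentication is something only its documentation can confirm.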

Related

Retrieving full URL from cgi.FieldStorage

I'm passing a URL to a python script using cgi.FieldStorage():
http://localhost/cgi-bin/test.py?file=http://localhost/test.xml
test.py just contains
#!/usr/bin/env python
import cgi
print "Access-Control-Allow-Origin: *"
print "Content-Type: text/plain; charset=x-user-defined"
print "Accept-Ranges: bytes"
print
print cgi.FieldStorage()
and the result is
FieldStorage(None, None, [MiniFieldStorage('file', 'http:/localhost/test.xml')])
Note that the URL only contains http:/localhost. How do I pass the full encoded URI so that file is the whole URI? I've tried encoding the file parameter (http%3A%2F%2Flocalhost%2ftext.xml), but this also doesn't work.
The screenshot shows that the output to the webpage isn't what is expected, but that the encoded URL is correct.
Your CGI script works fine for me using Apache 2.4.10 and Firefox (curl also). What web server and browser are you using?
My guess is that you are using Python's CGIHTTPServer, or something based on it. This exhibits the problem that you identify. CGIHTTPServer assumes that it is being provided with a path to a CGI script, so it collapses the path without regard to any query string that might be present. Collapsing the path removes duplicate forward slashes as well as relative path elements such as "..".
If you are using this web server, I don't see any obvious way around it by changing the URL. You won't be using it in production, so perhaps look at another web server such as Apache, nginx, lighttpd etc.
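The collapsing described in this answer can be reproduced offline. The sketch below uses posixpath.normpath as a stand-in for the server's own path handling (an assumption for illustration, not CGIHTTPServer's actual code) and applies it to the request target from the question:

```python
import posixpath

# Request target from the question, query string included.
target = "/cgi-bin/test.py?file=http://localhost/test.xml"

# Stand-in for the server's collapsing step (assumption: the real server
# has its own helper; normpath only mimics the duplicate-slash removal).
collapsed = posixpath.normpath(target)
print(collapsed)  # /cgi-bin/test.py?file=http:/localhost/test.xml

# Splitting off the query string first leaves the parameter untouched.
path, _, query = target.partition("?")
preserved = posixpath.normpath(path) + "?" + query
print(preserved)  # /cgi-bin/test.py?file=http://localhost/test.xml
```

Treating the whole target as a path is what turns http:// into http:/ in the FieldStorage output.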
The problem is with your query parameters; you should be encoding them:
>>> from urllib import urlencode
>>> urlencode({'file': 'http://localhost/test.xml', 'other': 'this/has/forward/slashes'})
'other=this%2Fhas%2Fforward%2Fslashes&file=http%3A%2F%2Flocalhost%2Ftest.xml'
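In Python 3, urlencode lives in urllib.parse. A small sketch of the same encoding, with parse_qs used to confirm that the value survives a round trip (the script URL is the one from the question):

```python
from urllib.parse import parse_qs, urlencode

# Encode the parameter so that ':' and '/' cannot be mistaken for path syntax.
query = urlencode({"file": "http://localhost/test.xml"})
url = "http://localhost/cgi-bin/test.py?" + query
print(url)

# Decoding the query string gives back the original value.
decoded = parse_qs(query)
print(decoded["file"][0])
```

Whether the server passes the encoded value through unchanged is a separate matter from building the query string correctly.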

Syntax Issue in Python urllib2?

I'm trying to test out urllib2. Here's my code:
import urllib2
response = urllib2.urlopen('http://pythonforbeginners.com/')
print response.info()
html = response.read()
response.close()
When I run it, I get:
SyntaxError: invalid syntax. The caret points to line 3 (the print line). Any idea what's going on here? I'm just trying to follow a tutorial and this is the first thing they do...
Thanks,
Mariogs
In Python3 print is a function. Therefore it needs parentheses around its argument:
print(response.info())
In Python2, print is a statement, and hence does not require parentheses.
After correcting the SyntaxError, as alecxe points out, you'll probably encounter an ImportError next. That is because the Python2 module called urllib2 was renamed to urllib.request in Python3. So you'll need to change it to
import urllib.request as request
response = request.urlopen('http://pythonforbeginners.com/')
As you can see, the tutorial you are reading is meant for Python2. You might want to find a Python3 tutorial or Python3 urllib HOWTO to avoid running into more of these problems.
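Putting both changes together, a Python 3 version of the script might look like the sketch below. The fetch helper is a hypothetical name introduced here; it is not part of the tutorial.

```python
import urllib.request


def fetch(url):
    """Return (headers, body) for a URL; the body is raw bytes."""
    # The response is a context manager, so it is closed automatically.
    with urllib.request.urlopen(url) as response:
        headers = response.info()
        body = response.read()
    return headers, body
```

Calling fetch('http://pythonforbeginners.com/') and printing the first item of the result corresponds to the print line in the question.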

get source html in local system python

I want to get the source of a page, not from the internet but from a file on my local system.
For example: url = urllib.request.urlopen('c://1.html')
>>> import urllib.request
>>> url=urllib.request.urlopen ('http://google.com')
>>> page =url.read()
>>> page=page.decode()
>>> page
What's my problem?
from os.path import abspath
with open(abspath('c:/1.html')) as fh:
    print(fh.read())
Since url.read() just gives you the data as-is, and .decode() doesn't really do anything except convert the byte data from the socket to a traditional string, why not just print the file contents?
urllib is mainly (if not only) a transporter to receive HTML data, not to actually parse the content. So all it does is connect to the source, separate the headers and give you the content. If you've already stored it locally in a file, then urllib has no more use to you. Consider looking at an HTML parsing library such as BeautifulSoup, for instance.
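If going through urllib is still wanted, urlopen also accepts file: URLs. A minimal sketch of both approaches; the function names are hypothetical and UTF-8 is an assumed encoding:

```python
import urllib.request
from pathlib import Path


def read_local_html(path, encoding="utf-8"):
    """Read the file directly, as the answer suggests."""
    return Path(path).read_text(encoding=encoding)


def read_local_html_via_urllib(path, encoding="utf-8"):
    """Read the same file through urllib using a file: URL."""
    # as_uri() needs an absolute path, hence resolve().
    file_url = Path(path).resolve().as_uri()
    with urllib.request.urlopen(file_url) as response:
        return response.read().decode(encoding)
```

Either function can be called with a path such as 'c:/1.html'. The direct read is simpler, and the urllib variant only makes sense when the surrounding code already expects a URL.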

Facebook Graph API encoding - Python

I am going around in circles and tried so many different ways so I guess my core understanding is wrong. I would be grateful for help in understanding my encoding/decoding issues.
import urllib2
result = urllib2.urlopen("https://graph.facebook.com/163146530455639")
rawdata = result.read().decode('utf-8')
print "HEADER: " + str(result.info())
print "I want this to work ", rawdata.find('http://www.facebook.com')
print "I dont want this to work ", rawdata.find('http:\/\/www.facebook.com')
I guess what I'm getting isn't UTF-8 even though the header seems to say it is. Or, as a newbie to Python, I'm doing something dumb. :(
Thanks for any help,
Phil
You're getting JSON back from Facebook, so the easiest thing to do is use the built-in json module to decode it (provided you're using Python 2.6+; otherwise you'll have to install a JSON library separately).
import json
import urllib2
result = urllib2.urlopen("https://graph.facebook.com/163146530455639")
rawdata = result.read()
jsondata = json.loads(rawdata)
print jsondata['link']
gives you:
u'http://www.facebook.com/GrosvenorCafe'
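The backslashes in the question are JSON escaping rather than an encoding problem: JSON allows / to be written as \/, and the decoder turns it back into a plain slash. A small offline sketch, using a hypothetical payload modeled on the output above instead of a real API response:

```python
import json

# Hypothetical raw body, shaped like the response discussed above.
raw = r'{"link": "http:\/\/www.facebook.com\/GrosvenorCafe"}'

# Searching the raw text only matches the escaped form.
found_plain = raw.find("http://www.facebook.com") != -1
found_escaped = raw.find(r"http:\/\/www.facebook.com") != -1
print(found_plain, found_escaped)

# After decoding, the escapes are gone.
data = json.loads(raw)
print(data["link"])
```

So a search for the plain URL should be run against the decoded value, not the raw response text.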

Trying to convert php to python and getting a syntax error

I'm trying to convert some php code into python and am using curl. I've gotten most of it to be accepted, but when it gets to the result = pycurl.exec(Moe) it keeps throwing a syntax error. I guess that I'm not filling out the exec field correctly, but I can't seem to figure out where it is going wrong.
from urllib2 import urlopen
from ClientForm import ParseResponse
import cgi, cgitb
import webbrowser
import curl, pycurl
Moe = pycurl.Curl(wsdl)
Moe.setopt(pycurl.POST, 1)
Moe.setopt(pycurl.HTTPHEADER, ["Content-Type: text/xml"])
Moe.setopt(pycurl.HTTPAUTH, pycurl.BASIC)
Moe.setopt(pycurl.USERPWD, "userid:password")
Moe.setopt(pycurl.POSTFIELDS, Larry)
Moe.setopt(pycurl.SSL_VERIFYPEER, 0)
Moe.setopt(pycurl.SSLCERT, pemlocation)
Moe.setopt(pycurl.SSLKEY, keylocation)
Moe.setopt(pycurl.SSLKEYPASSWD, keypassword)
Moe.setopt(pycurl.RETURNTRANSFER, 1)
result = pycurl.exec(Moe)
pycurl.close(Moe)
Use result = Moe.perform() to execute your request.
PS:
Moe.setopt(pycurl.POSTFIELDS, Larry)
Is Larry actually a variable? If it's a string, quote it.
exec is a reserved word in Python 2, so you cannot have a function of that name. Try reading the pycurl documentation to see what function you should be calling.
I haven't used pycurl myself, but maybe you want to call Moe.perform()?
