Convert urllib2 Python code to use the urllib module - Python

I have the following code below which runs using the urllib2 module, but I have a requirement to upgrade to Python 3.x and this prevents the use of urllib2. I am aware it is split across urllib.request and urllib.error, but I am struggling to convert the following code to use the urllib module instead after reading through the doc and other relevant questions. Any help is greatly appreciated.
opener = urllib2.build_opener(urllib2.HTTPHandler)
request = urllib2.Request(url=event['ResponseURL'], data=data)
request.add_header('Content-Type', '')
request.get_method = lambda: 'PUT'
url = opener.open(request)

All you need to do is replace urllib2 with urllib.request. You are not using anything that has moved to other urllib.* modules:
import urllib.request
opener = urllib.request.build_opener(urllib.request.HTTPHandler)
request = urllib.request.Request(url=event['ResponseURL'], data=data)
request.add_header('Content-Type', '')
request.get_method = lambda: 'PUT'
url = opener.open(request)
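As a side note: if you end up on Python 3.3 or newer, urllib.request.Request accepts a method argument, so the get_method override becomes unnecessary. A minimal sketch, reusing event['ResponseURL'] and data from the question:
import urllib.request

# Python 3.3+: pass the HTTP verb directly instead of patching get_method
opener = urllib.request.build_opener(urllib.request.HTTPHandler)
request = urllib.request.Request(url=event['ResponseURL'], data=data, method='PUT')
request.add_header('Content-Type', '')
url = opener.open(request)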
You can always run the 2to3 command-line tool on your Python 2 code and see what changes it makes; the default action is to output changes on stdout in unified diff format.
2to3's urllib fixer will also add imports for urllib.error and urllib.parse at the top, because it knows that code importing urllib2 could need any of the three urllib.* modules; it isn't smart enough to limit the imports to those actually needed after transforming the remaining urllib2 references in the module.

Related

How do I get the HTML of a website using Python 3?

I've been trying to do this with repl.it and have tried several solutions on this site, but none of them work. Right now, my code looks like
import urllib
url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345"
print (urllib.urlopen(url).read())
but it just says "AttributeError: module 'urllib' has no attribute 'urlopen'".
If I add import urllib.urlopen, it tells me there's no module named that. How can I fix my problem?
The syntax you are using for the urllib library is from Python v2. The library has changed somewhat for Python v3. The new notation would look something more like:
import urllib.request
response = urllib.request.urlopen("http://www.google.com")
html = response.read()
The html object is a bytes object containing the raw HTML of the site; call html.decode('utf-8') (or whatever encoding the page declares) if you need a str. Much like the original urllib library, you should not expect images or other data files to be included in this returned object.
The confusing part here is that, in Python 3, this would fail if you did:
import urllib
response = urllib.request.urlopen("http://www.google.com")
html = response.read()
This module-importing behavior is intended and working as designed: importing a package does not automatically import its submodules, so urllib.request only exists as an attribute of urllib after something has imported it explicitly. BUT it is non-intuitive and awkward, and, more importantly for you, it makes failures harder to debug, because the same attribute access can succeed or fail depending on what else happens to have been imported. Enjoy.
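A quick demonstration of the package/submodule distinction, runnable in a fresh interpreter:
# Importing a package does not import its submodules automatically.
import urllib

print(hasattr(urllib, 'request'))  # False in a fresh interpreter

import urllib.request

print(hasattr(urllib, 'request'))  # True: the submodule is now an attribute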
Python 3:
A variant you may see pairs import urllib with import requests; it only works because importing requests happens to import urllib.request as a side effect:
import urllib
import requests  # side effect: this also imports urllib.request

url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345"
r = urllib.request.urlopen(url).read()
print(r)
or, better, import the submodule explicitly:
import urllib.request
url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345"
r = urllib.request.urlopen(url).read()
print(r)

Build a URL using Requests module Python

Is it possible to build a URL using the Requests library for Python?
Building a query string is supported but what about building the rest of the URL. Specifically I'd be interested in adding on to the base URL with URL encoded strings:
http://someaddress.com/api/[term]/
term = 'This is a test'
http://someaddress.com/api/This+is+a+test/
This is presumably possible using urllib but it seems like it would be better in Requests. Does this feature exist? If not is there a good reason that it shouldn't?
requests is basically a convenient wrapper around urllib3 (and, through its compat module, the standard-library urllib modules).
You can import urljoin() and quote_plus() from requests.compat, but this is essentially the same as using them directly from urllib.parse (or, on Python 2, the urlparse and urllib modules):
>>> from requests.compat import urljoin, quote_plus
>>> url = "http://some-address.com/api/"
>>> term = 'This is a test'
>>> urljoin(url, quote_plus(term))
'http://some-address.com/api/This+is+a+test'
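For completeness, a minimal sketch of the same thing with only the standard library on Python 3, using the same placeholder URL and term as above:
from urllib.parse import urljoin, quote_plus

url = "http://some-address.com/api/"
term = 'This is a test'
print(urljoin(url, quote_plus(term)))  # http://some-address.com/api/This+is+a+test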

Send Github Markdown API request with urllib in python 3 got BadStatusLine

I have the following Python code to convert Markdown using the GitHub API.
gfm.py (Python 3 code)
import traceback
import json
import urllib.request
import http.client
import sys

try:
    content = open(sys.argv[1], 'r').read()
    data = {"text": content, "mode": 'gfm'}
    headers = {'Content-Type': 'application/json'}
    bytes = json.dumps(data).encode('utf-8')
    url = "https://api.github.com/markdown"
    request = urllib.request.Request(url, data=bytes, headers=headers)
    result = urllib.request.urlopen(request).read().decode('utf-8')
    print(result)
except http.client.BadStatusLine:
    traceback.print_exc()
except:
    traceback.print_exc()
Scripts and test markdown files used below are contained here: https://gist.github.com/xpol/6332952
When converting a small markdown file (e.g. gfm.py Sample.md in the gist), it produces the expected result.
When converting a large markdown file (e.g. gfm.py Cheatsheet.md in the gist), it fails with http.client.BadStatusLine: '' at the urllib.request.urlopen line.
Does anyone know what's wrong with this?
Thanks a lot!
I cannot reproduce this, but I would suggest regardless that you use a library like python-requests, or even one of the API wrappers listed in the GitHub API docs.
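For example, a minimal sketch of the same request using python-requests (assuming requests is installed; the endpoint and payload are taken from the question):
import sys

import requests

content = open(sys.argv[1], 'r').read()
# requests serializes the dict to JSON and sets the Content-Type header itself
resp = requests.post("https://api.github.com/markdown",
                     json={"text": content, "mode": "gfm"})
print(resp.text)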

Download file as string in python

I want to download a file into Python as a string. I have tried the following, but it doesn't seem to work. What am I doing wrong, or what else might I do?
from urllib import request
webFile = request.urlopen(url).read()
print(webFile)
The following example works.
from urllib.request import urlopen
url = 'http://winterolympicsmedals.com/medals.csv'
output = urlopen(url).read()
print(output.decode('utf-8'))
Alternatively, you could use requests, which provides a more human-readable syntax. Keep in mind that requests requires that you install additional dependencies, which may increase the complexity of deploying the application, depending on your production environment.
import requests
url = 'http://winterolympicsmedals.com/medals.csv'
output = requests.get(url).text
print(output)
In Python 3.x, use the urllib package like this:
from urllib.request import urlopen
data = urlopen('http://www.google.com').read() #bytes
body = data.decode('utf-8')
Another good library for this is http://docs.python-requests.org
It's not built-in, but I've found it to be much more usable than urllib*.

Script to connect to a web page

Looking for a Python script that would simply connect to a web page (maybe with some query-string parameters).
I am going to run this script as a batch job on Unix.
urllib2 will do what you want and it's pretty simple to use.
import urllib
import urllib2
params = {'param1': 'value1'}
req = urllib2.Request("http://someurl", urllib.urlencode(params))
res = urllib2.urlopen(req)
data = res.read()
It's also nice because it's easy to modify the above code to do all sorts of other things like POST requests, Basic Authentication, etc.
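For instance, a sketch of Basic Authentication with urllib2 (Python 2; the URL and credentials are placeholders):
import urllib2

# Register credentials for the URL, then build an opener that sends them
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "http://someurl", "user", "password")
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(handler)
data = opener.open("http://someurl").read()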
Try this:
aResp = urllib2.urlopen("http://google.com/")
print aResp.read()
If you need your script to actually function as a user of the site (clicking links, etc.) then you're probably looking for the python mechanize library.
Python Mechanize
A simple wget called from a shell script might suffice.
in python 2.7:
import urllib2
params = "key=val&key2=val2" #make sure that it's in GET request format
url = "http://www.example.com"
html = urllib2.urlopen(url+"?"+params).read()
print html
more info at https://docs.python.org/2.7/library/urllib2.html
in python 3.6:
from urllib.request import urlopen
params = "key=val&key2=val2" #make sure that it's in GET request format
url = "http://www.example.com"
html = urlopen(url+"?"+params).read()
print(html)
more info at https://docs.python.org/3.6/library/urllib.request.html
to encode params into GET format:
def myEncode(dictionary):
    result = ""
    for k in dictionary:  # k is the key
        result += k + "=" + dictionary[k] + "&"
    return result[:-1]  # all but the last `&`
I'm pretty sure this should work in either python2 or python3...
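That said, the standard library already provides this, with proper percent-encoding of the values (urllib.parse.urlencode in Python 3, urllib.urlencode in Python 2):
from urllib.parse import urlencode  # Python 3; `from urllib import urlencode` in Python 2

print(urlencode({"key": "val", "key2": "val 2"}))  # e.g. key=val&key2=val+2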
What are you trying to do? If you're just trying to fetch a web page, cURL is a pre-existing (and very common) tool that does exactly that.
Basic usage is very simple:
curl www.example.com
You might want to simply use httplib from the standard library (http.client in Python 3). Note that HTTPConnection takes a host name, not a full URL:
import httplib

myConnection = httplib.HTTPConnection('www.example.com')  # host only, no http:// scheme
You can find the official reference here: http://docs.python.org/library/httplib.html
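A minimal usage sketch with that connection (Python 2 httplib syntax; the API is the same under http.client in Python 3):
# Issue a GET for the root path and read the response
myConnection.request('GET', '/')
response = myConnection.getresponse()
print response.status, response.reason
data = response.read()
myConnection.close()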
