What can be used instead of the parse_qs function - Python

I have the following code for parsing a YouTube feed and returning the YouTube video ID. How can I rewrite this to be Python 2.4 compatible? I suppose 2.4 doesn't support the parse_qs function.
YTSearchFeed = feedparser.parse("http://gdata.youtube.com" + path)
videos = []
for yt in YTSearchFeed.entries:
    url_data = urlparse.urlparse(yt['link'])
    query = urlparse.parse_qs(url_data[4])
    id = query["v"][0]
    videos.append(id)

I assume your existing code runs on 2.6 or something newer, and you're trying to go back to 2.4? parse_qs used to live in the cgi module before it was moved to urlparse. Try import cgi and call cgi.parse_qs instead.
Inspired by TryPyPy's comment, I think you could make your source run in either environment by doing:
import urlparse  # if we're pre-2.6, this will not include parse_qs
try:
    from urlparse import parse_qs
except ImportError:  # old version, grab it from cgi
    from cgi import parse_qs
urlparse.parse_qs = parse_qs
But I don't have 2.4 to try this out, so no promises.

I tried that, and it still wasn't working.
It's easier to simply copy the parse_qs/parse_qsl functions over from the cgi module into the urlparse module.
Problem solved.
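For reference, a minimal sketch of the loop from the question adapted to Python 2.4 by calling cgi.parse_qs directly; it assumes path is defined as in the question and that feedparser is installed:
import urlparse
import cgi
import feedparser

YTSearchFeed = feedparser.parse("http://gdata.youtube.com" + path)
videos = []
for yt in YTSearchFeed.entries:
    url_data = urlparse.urlparse(yt['link'])
    # on 2.4, parse_qs lives in cgi; url_data[4] is the query-string component
    query = cgi.parse_qs(url_data[4])
    videos.append(query["v"][0])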

Related

Convert urllib2 Python code to use the urllib module

I have the following code, which runs using the urllib2 module, but I have a requirement to upgrade to Python 3.x, which prevents the use of urllib2. I am aware it was split across urllib.request and urllib.error, but I am struggling to convert the following code to use the urllib module instead, after reading through the docs and other relevant questions. Any help is greatly appreciated.
opener = urllib2.build_opener(urllib2.HTTPHandler)
request = urllib2.Request(url=event['ResponseURL'], data=data)
request.add_header('Content-Type', '')
request.get_method = lambda: 'PUT'
url = opener.open(request)
All you need to do is replace urllib2 with urllib.request. You are not using anything that has moved to other urllib.* modules:
import urllib.request
opener = urllib.request.build_opener(urllib.request.HTTPHandler)
request = urllib.request.Request(url=event['ResponseURL'], data=data)
request.add_header('Content-Type', '')
request.get_method = lambda: 'PUT'
url = opener.open(request)
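One Python 3 gotcha that isn't visible in this snippet: the data argument must be bytes, not str. If data here is built with json.dumps() or similar, encode it first; a minimal sketch (the UTF-8 encoding is an assumption):
# in Python 3, sending str data raises a TypeError; encode to bytes first
request = urllib.request.Request(url=event['ResponseURL'], data=data.encode('utf-8'))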
You can always run the 2to3 command-line tool on your Python 2 code and see what changes it makes; the default action is to output changes on stdout in unified diff format.
The urllib fixer will then also add imports for urllib.error and urllib.parse at the top, because it knows that code which imported urllib2 could need any of the three urllib.* modules. It isn't smart enough to limit the imports to only those that are actually needed after transforming the rest of the urllib2 references in the module.
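For example (the filename is a placeholder):
2to3 example.py            # print the proposed changes as a unified diff
2to3 -w example.py         # rewrite the file in place (a .bak backup is kept)
2to3 -f urllib example.py  # run only the urllib fixer, which also handles urllib2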

How to get the part of a URL without protocol nor domain

I have URLs of the form
http://example.com/example/a/b/c.html
https://www.example.com/
How do I get the path from the server root, without protocol or domain name? With the examples above, the function should return:
/example/a/b/c.html
/
(I am using Django: answers relying on this framework are accepted!)
The urlparse module (urllib.parse in Python 3) can solve this:
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

parsed_url = urlparse('http://example.com/abc/cde')
assert parsed_url.path == '/abc/cde'
You could also use the path attribute of Django's HttpRequest object, in other words:
request.path
See the docs for more.
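For completeness, a minimal sketch of a Django view using this; the view name is hypothetical:
from django.http import HttpResponse

def echo_path(request):
    # request.path holds only the path portion, e.g. '/example/a/b/c.html'
    return HttpResponse(request.path)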

How do I get the HTML of a website using Python 3?

I've been trying to do this on repl.it and have tried several solutions from this site, but none of them work. Right now, my code looks like:
import urllib
url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345"
print (urllib.urlopen(url).read())
but it just says "AttributeError: module 'urllib' has no attribute 'urlopen'".
If I add import urllib.urlopen, it tells me there's no module named that. How can I fix my problem?
The syntax you are using for the urllib library is from Python 2. The library has changed somewhat for Python 3. The new notation looks more like:
import urllib.request
response = urllib.request.urlopen("http://www.google.com")
html = response.read()
The html object is a bytes object containing the returned HTML of the site. Much like with the original urllib library, you should not expect images or other data files to be included in this returned object.
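If you need a str rather than bytes, decode it; a minimal sketch, assuming the page is UTF-8 encoded:
text = html.decode('utf-8')  # bytes -> str; the 'utf-8' encoding is an assumption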
The confusing part here is that, in Python 3, this would fail if you did:
import urllib
response = urllib.request.urlopen("http://www.google.com")
html = response.read()
This strange module-importing behavior is, I am told, intended and working as designed: importing the urllib package does not automatically import its submodules, so you have to import urllib.request explicitly. But it is non-intuitive and awkward, and more importantly for you, it makes the situation harder to debug. Enjoy.
Python 3, using requests:
import requests

url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345"
r = requests.get(url).text
print(r)
or
import urllib.request
url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345"
r = urllib.request.urlopen(url).read()
print(r)

Build a URL using the Requests module in Python

Is it possible to build a URL using the Requests library for Python?
Building a query string is supported, but what about building the rest of the URL? Specifically, I'd be interested in adding on to the base URL with URL-encoded strings:
http://someaddress.com/api/[term]/
term = 'This is a test'
http://someaddress.com/api/This+is+a+test/
This is presumably possible using urllib, but it seems like it would fit better in Requests. Does this feature exist? If not, is there a good reason that it shouldn't?
requests is basically a convenient wrapper around urllib3 and other related libraries.
You can import urljoin() and quote_plus() from requests.compat, but this is essentially the same as using them directly from the urllib and urlparse modules:
>>> from requests.compat import urljoin, quote_plus
>>> url = "http://some-address.com/api/"
>>> term = 'This is a test'
>>> urljoin(url, quote_plus(term))
'http://some-address.com/api/This+is+a+test'
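The same result with only the standard library on Python 3 (a sketch; requests is not needed for this):
from urllib.parse import urljoin, quote_plus

url = "http://some-address.com/api/"
term = 'This is a test'
print(urljoin(url, quote_plus(term)))  # http://some-address.com/api/This+is+a+test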

Download a file as a string in Python

I want to download a file into Python as a string. I have tried the following, but it doesn't seem to work. What am I doing wrong, or what else might I do?
from urllib import request
webFile = request.urlopen(url).read()
print(webFile)
The following example works.
from urllib.request import urlopen
url = 'http://winterolympicsmedals.com/medals.csv'
output = urlopen(url).read()
print(output.decode('utf-8'))
Alternatively, you could use requests, which provides a more human-readable syntax. Keep in mind that requests requires you to install additional dependencies, which may increase the complexity of deploying the application, depending on your production environment.
import requests
url = 'http://winterolympicsmedals.com/medals.csv'
output = requests.get(url).text
print(output)
In Python 3.x, use the urllib package like this:
from urllib.request import urlopen

data = urlopen('http://www.google.com').read()  # bytes
body = data.decode('utf-8')
Another good library for this is requests (http://docs.python-requests.org). It's not built-in, but I've found it to be much more usable than urllib*.
