Syntax Issue in Python urllib2? - python

I am trying to test out urllib2. Here's my code:
import urllib2
response = urllib2.urlopen('http://pythonforbeginners.com/')
print response.info()
html = response.read()
response.close()
When I run it, I get:
SyntaxError: invalid syntax. The caret points to line 3 (the print line). Any idea what's going on here? I'm just trying to follow a tutorial and this is the first thing they do...
Thanks,
Mariogs

In Python3 print is a function. Therefore it needs parentheses around its argument:
print(response.info())
In Python2, print is a statement, and hence does not require parentheses.
After correcting the SyntaxError, as alecxe points out, you'll probably encounter an ImportError next. That is because the Python2 module called urllib2 was renamed to urllib.request in Python3. So you'll need to change it to
import urllib.request as request
response = request.urlopen('http://pythonforbeginners.com/')
As you can see, the tutorial you are reading is meant for Python2. You might want to find a Python3 tutorial or Python3 urllib HOWTO to avoid running into more of these problems.
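Putting both fixes together, a minimal Python 3 sketch of the script from the question might look like the following. The helper name fetch is hypothetical (it is not part of the tutorial), and the with block stands in for the explicit response.close() call.

```python
import urllib.request


def fetch(url):
    """Return (headers, body_bytes) for url, as the urllib2 code did in Python 2."""
    # The context manager closes the response, so no explicit close() is needed.
    with urllib.request.urlopen(url) as response:
        return response.info(), response.read()
```

Calling headers, html = fetch('http://pythonforbeginners.com/') followed by print(headers) would then mirror the original script.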

Related

Web scraping Python Shell Not Responding

I am trying to run this basic code, but even after a long wait the Python shell simply gets stuck and I always end up facing 'Python 3.6.5 Shell (Not Responding)'. Please suggest a fix.
import requests
from bs4 import BeautifulSoup
webdump = requests.get("https://www.flipkart.com/").text
soup = BeautifulSoup(webdump,'lxml')
print(soup.prettify())
This page is around 1MB, so spitting more than 974047 bytes (soup.prettify() adds more spaces and newlines) into the terminal at once is probably what makes it stuck.
Try printing this text line by line:
for line in soup.prettify().splitlines(False):
    print(line)

Python urllib2 unable to open url

I have a URL in this format:
https://aLongStringWithNumbers:anotherLongStringWithNumbers#somewhere.com/admin/someAPICall.json
which does not look like something that Python's urllib2 can understand. I keep getting errors when using it with urllib2.urlopen:
raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
Is this a case where the library is not capable of interpreting that URL? Obviously it works in the browser... I am using Python 2.7.
Any advice would be appreciated!
Thanks
- V
Python is interpreting the text after the : as the port number in the URL. You need to force it to be treated as part of the URL, using something like quote.
See quote from the non-accepted answer here
>>> import urllib2
>>> urllib2.quote('https://aLongStringWithNumbers:anotherLongStringWithNumbers#somewhere.com/admin/someAPICall.json')
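Note that quote() with its default arguments also escapes the : after https, so the quoted string is no longer a usable URL. If the address really carries credentials in the user:password@host form (an assumption; the separator is shown as # in the question), another option is to move them into an Authorization header. A rough Python 3 sketch follows; the function name build_authenticated_request and the example.com URL in the note below are made up for illustration.

```python
import base64
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def build_authenticated_request(url):
    """Move user:password from the URL into a Basic Authorization header."""
    parts = urlsplit(url)
    # Rebuild the network location without the credentials.
    host = parts.hostname
    if parts.port is not None:
        host = "%s:%d" % (host, parts.port)
    clean_url = urlunsplit(
        (parts.scheme, host, parts.path, parts.query, parts.fragment)
    )
    request = urllib.request.Request(clean_url)
    if parts.username is not None:
        credentials = "%s:%s" % (parts.username, parts.password or "")
        token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
        request.add_header("Authorization", "Basic " + token)
    return request
```

The returned object can be passed to urllib.request.urlopen(). For example, build_authenticated_request("https://user:secret@example.com/admin/someAPICall.json") yields a request for the URL without the user:secret part.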

Requiring help in figuring out indent error in python code

I get an indentation error when trying to run the code below. I am trying to print out the URLs of a set of html pages recursively.
import urllib2
from BeautifulSoup import *
from urlparse import urljoin
# Create a list of words to ignore
ignorewords=set(['the','of','to','and','a','in','is','it'])
def crawl(self,pages,depth=2):
for i in range(depth):
newpages=set()
for page in pages:
try:
c=urllib2.urlopen(page)
except:
print "Could not open %s" % page
continue
soup=BeautifulSoup(c.read())
self.addtoindex(page,soup)
links=soup('a')
for link in links:
if ('href' in dict(link.attrs)):
url=urljoin(page,link['href'])
if url.find("'")!=-1: continue
url=url.split('#')[0] # remove location portion
if url[0:4]=='http' and not self.isindexed(url):
newpages.add(url)
linkText=self.gettextonly(link)
self.addlinkref(page,url,linkText)
self.dbcommit()
pages=newpages
Well, your code is totally unindented, so Python will complain when you try to run it.
Remember in Python whitespace is important. Indenting with 4 spaces rather than tab saves a lot of "invisible" indentation errors.
I've down-voted because the code was pasted unformatted and unindented, which means either the poster doesn't understand Python (and hasn't read a basic tutorial) or pasted the code without re-indenting it, which makes it impossible for anyone to answer.
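For reference, here is a sketch of how a crawl loop of this shape could be indented. It is not the poster's code: it is written for Python 3, swaps BeautifulSoup for the standard-library html.parser, leaves out the indexing calls (addtoindex, addlinkref, dbcommit), and adds a hypothetical fetch parameter so the loop can be tried without network access. The names LinkCollector and fetch_html are also made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Default fetcher: download a page and decode it leniently."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(pages, depth=2, fetch=fetch_html):
    """Follow links breadth-first for `depth` rounds; return all URLs seen."""
    seen = set(pages)
    for _ in range(depth):
        newpages = set()
        for page in pages:
            try:
                html = fetch(page)
            except Exception:
                print("Could not open %s" % page)
                continue
            collector = LinkCollector()
            collector.feed(html)
            for href in collector.hrefs:
                url = urljoin(page, href)
                if "'" in url:
                    continue
                url = url.split("#")[0]  # remove the fragment portion
                if url.startswith("http") and url not in seen:
                    seen.add(url)
                    newpages.add(url)
        pages = newpages
    return seen
```

Each nesting level (function body, depth loop, page loop, try/except, link loop, if) is indented by four more spaces than the level that contains it.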

python cgitb is not functioning through a browser

I can't seem to get the python module cgitb to output the stack trace in a browser. I have no problems in a shell environment. I'm running Centos 6 with python 2.6.
Here is a simple example of the code that I am using:
import cgitb; cgitb.enable()
print "Content-type: text/html"
print
print 1/0
I get an Internal Server error instead of the printed detailed report. I have tried different error types, different browsers, etc.
When I don't have an error, of course python works fine. It will print the error in a shell fine. The point of cgitb is to print the error instead of returning an "Internal Server Error" in the browser for most error exceptions. Basically I'm just trying to get cgitb to work in a browser environment.
Any Suggestions?
Okay, I got my problem fixed and the OP brought me to it: even though cgitb will output HTML by default, it will not output a header! Apache does not like that and might give you some stupid error like:
<...blablabla>: Response header name '<!--' contains invalid characters, aborting request
It indicates that Apache was still working its way through the headers when it encountered some HTML. Look at what the OP prints before the error is triggered. That is a header, and you need it, including the empty line.
I will just quote the docs:
Make sure that your script is readable and executable by "others"; the Unix file mode should be 0755 octal (use chmod 0755 filename).
Make sure that the first line of the script contains #! starting in column 1 followed by the pathname of the Python interpreter, for instance:
#!/usr/local/bin/python
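The header requirement described above can be illustrated without cgitb itself. The sketch below is not cgitb and not the OP's script. It is a hypothetical helper, cgi_response, that uses the standard traceback and html modules to show the order a CGI response needs: header line, empty line, then the HTML body (here, an escaped traceback if the body raises).

```python
import html
import traceback


def cgi_response(make_body):
    """Return a full CGI response: header, empty line, then the body."""
    try:
        body = make_body()
    except Exception:
        # Render the traceback as HTML instead of letting the script die.
        body = "<pre>%s</pre>" % html.escape(traceback.format_exc())
    # The empty line after the header tells the server the headers are done.
    return "Content-Type: text/html\n\n" + body
```

A CGI script would print the returned string; whatever happens in make_body, the header and the empty line always come first.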

Trying to convert php to python and getting a syntax error

I'm trying to convert some php code into python and am using curl. I've gotten most of it to be accepted, but when it gets to the result = pycurl.exec(Moe) it keeps throwing a syntax error. I guess that I'm not filling out the exec field correctly, but I can't seem to figure out where it is going wrong.
from urllib2 import urlopen
from ClientForm import ParseResponse
import cgi, cgitb
import webbrowser
import curl, pycurl
Moe = pycurl.Curl(wsdl)
Moe.setopt(pycurl.POST, 1)
Moe.setopt(pycurl.HTTPHEADER, ["Content-Type: text/xml"])
Moe.setopt(pycurl.HTTPAUTH, pycurl.BASIC)
Moe.setopt(pycurl.USERPWD, "userid:password")
Moe.setopt(pycurl.POSTFIELDS, Larry)
Moe.setopt(pycurl.SSL_VERIFYPEER, 0)
Moe.setopt(pycurl.SSLCERT, pemlocation)
Moe.setopt(pycurl.SSLKEY, keylocation)
Moe.setopt(pycurl.SSLKEYPASSWD, keypassword)
Moe.setopt(pycurl.RETURNTRANSFER, 1)
result = pycurl.exec(Moe)
pycurl.close(Moe)
Use result = Moe.perform() to execute your request.
PS:
Moe.setopt(pycurl.POSTFIELDS, Larry)
Is Larry actually a variable? If it's a string, quote it.
exec is a reserved word in Python 2, so you cannot have a function of that name. Try reading the pycurl documentation to see what function you should be calling.
I haven't used pycurl myself, but maybe you want to call Moe.perform()?
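Whether a name is reserved depends on the interpreter version: exec and print are reserved words in Python 2 but ordinary built-in functions in Python 3. A small sketch for checking this on whatever interpreter is running, using the standard keyword module (the helper name is_reserved is made up here):

```python
import keyword
import sys


def is_reserved(name):
    """True if name is a reserved word in the running interpreter."""
    return keyword.iskeyword(name)


# Print the major version next to each answer, since the result differs
# between Python 2 and Python 3 for names such as exec and print.
for name in ("exec", "print", "class", "perform"):
    print(sys.version_info[0], name, is_reserved(name))
```

A reserved name cannot be used as an attribute or function name, which is why pycurl.exec(...) fails at parse time on Python 2.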
