Python: NameError with urllib2, but only for a few websites

website = raw_input('website: ')
with open('words.txt', 'r+') as arquivo:
    for lendo in arquivo.readlines():
        msmwebsite = website + lendo
        try:
            abrindo = urllib2.urlopen(msmwebsite)
            abrindo2 = abrindo.read()
        except URLError as e:
            pass
        if abrindo.code == 200:
            palavras = ['registration', 'there is no form']
            for palavras2 in palavras:
                if palavras2 in abrindo2:
                    print msmwebsite, 'up'
                else:
                    pass
        else:
            pass
It works, but for some websites I get this error:
if abrindo.code == 200:
NameError: name 'abrindo' is not defined
How can I fix it?
.......................................................................................................................................................................................

In the except block, replace pass with continue, and at least log the error, because the code currently skips failing links silently.
When the request raises a URLError, the assignment to abrindo never happens, so the name is undefined when the if statement runs. Hence the NameError.

abrindo is assigned only inside the try block. If urlopen raises and the except block runs, the name is never bound. To fix this, move the block of code starting with
if abrindo.code == 200:
inside the try block. One more suggestion: if an else branch does nothing, remove it instead of writing an explicit pass.
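A minimal sketch of the fix both answers describe, assuming Python 3 (where urllib.request and urllib.error replace urllib2; the question itself uses Python 2). The function name find_live_pages, the KEYWORDS constant, and the injectable opener parameter are hypothetical and exist only so the loop can be exercised without network access. Stripping the trailing newline from each word is also an addition not present in the question.

```python
import logging
import urllib.error
import urllib.request

KEYWORDS = ['registration', 'there is no form']


def find_live_pages(base_url, suffixes, opener=urllib.request.urlopen):
    """Return the URLs that answer with status 200 and contain a keyword."""
    found = []
    for suffix in suffixes:
        # strip() removes the newline that readlines() leaves on each word
        url = base_url + suffix.strip()
        try:
            response = opener(url)
            body = response.read().decode('utf-8', errors='replace')
        except urllib.error.URLError as exc:
            # 'response' was never bound for this URL, so log and skip it
            logging.warning('skipping %s: %s', url, exc)
            continue
        if response.status == 200 and any(word in body for word in KEYWORDS):
            found.append(url)
    return found
```

Because of the continue, the code after the try/except only runs when response and body were actually assigned, which is what removes the NameError.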

Related

Multiple exceptions with python

I am trying to write code that allows me to do 4 things, and I am using try and except.
The code is as follows:
try:
    for i in lista:
        a = url1 + i
        print(a)
        wget.download(a, '/Users/******/downloads')
except:
    for i in lista:
        b = url2 + i
        wget.download(b, '/Users/*****/downloads')
But I need to use 2 more exceptions. Can you explain to me how I can do it?
The main goal is to download a file; if it is still not there, download a second file, and so on and so forth.
You can name the exception type(s) to catch after the except keyword. For example:
urls = [
    "algumsite.com",
    "outrosite.org",
    "sitezinho.com.br"
]
for url in urls:
    try:
        wget.download(url, "path_to_download_folder/")
    except <error/s that can be raised in the previous try block>:
        # code that will be executed if the error is raised
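The "try the first source, then the second, and so on" goal from the question could be sketched as a loop that stops at the first success. This assumes Python 3; download_first_available, its download callable, and the errors parameter are hypothetical names. Which exceptions wget.download raises is not stated in the thread, so the exception types are left as a parameter, with OSError as a default because urllib.error.URLError derives from it in Python 3.

```python
def download_first_available(candidates, download, errors=(OSError,)):
    """Try each candidate URL in order; return the first one that downloads.

    'download' is any callable taking a URL (hypothetical stand-in for
    something like wget.download). Returns None if every candidate fails.
    """
    for url in candidates:
        try:
            download(url)
        except errors as exc:
            # This source failed: report it and fall through to the next one
            print('failed: {} ({})'.format(url, exc))
            continue
        return url
    return None
```

With this shape, adding a third or fourth fallback means adding an entry to the list rather than nesting another try/except.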

After timeout keep trying request

I've just started using Python to scrape data. My code below sometimes freezes, and I guess that's because some URL does not respond at all; I think it would work if I simply tried that URL again. My question is: if I revise the code like this,
reshomee = requests.get(homeUrl, headers=headerss, timeout=10)
does it try the URL again after 10 seconds without a response? I am worried that it would just give up without trying again.
I am asking because I have no way to test this: the freeze happens rarely and at random. Thank you!
def reshome(tries=0):
    try:
        reshomee = requests.get(homeUrl, headers=headerss)
        return reshomee
    except Exception as e:
        print(e)
        if tries < 10:
            print('try:' + str(tries))
            sleep(tries*30+100)
            return reshome(tries+1)
        else:
            print('cannot make it')
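A timeout does not retry anything by itself: when it expires, requests.get raises requests.exceptions.Timeout. You can catch that exception from requests.exceptions and retry yourself:
def reshome(tries=0):
    try:
        # a tiny timeout such as 0.001 forces the Timeout path; use a realistic value in practice
        reshomee = requests.get(homeUrl, headers=headerss, timeout=0.001)
        return reshomee
    except requests.exceptions.Timeout as e:
        # cap the retries so a dead URL cannot recurse forever
        if tries < 10:
            return reshome(tries+1)
        raise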
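A loop-based variant of the same idea, as a sketch rather than a definitive implementation. The helper get_with_retries and its parameters (fetch, retry_on, max_tries, delay, sleep) are hypothetical; the request itself is passed in as a callable so the retry logic does not depend on any particular HTTP library.

```python
import time


def get_with_retries(fetch, retry_on, max_tries=3, delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_tries attempts have failed.

    Returns the result of fetch(), or None when every attempt raised one of
    the exception types in retry_on.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return fetch()
        except retry_on as exc:
            print('attempt {} failed: {}'.format(attempt, exc))
            if attempt < max_tries:
                sleep(delay)  # wait before the next attempt
    return None

# Possible use with requests (homeUrl and headerss come from the question):
# get_with_retries(
#     lambda: requests.get(homeUrl, headers=headerss, timeout=10),
#     retry_on=requests.exceptions.Timeout)
```

A loop avoids growing the call stack on each retry and makes the upper bound on attempts explicit.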

How to continue after receiving None response from xml parse

I am finding prices of products from Amazon using their API with Bottlenose and parsing the xml response with BeautifulSoup.
I have a predefined list of products that the code iterates through.
This is my code:
import bottlenose as BN
import lxml
from bs4 import BeautifulSoup
i = 0
amazon = BN.Amazon('myid','mysecretkey','myassoctag',Region='UK',MaxQPS=0.9)
list = open('list.txt', 'r')
print "Number", "New Price:","Used Price:"
for line in list:
    i = i + 1
    listclean = line.strip()
    response = amazon.ItemLookup(ItemId=listclean, ResponseGroup="Large")
    soup = BeautifulSoup(response, "xml")
    usedprice=soup.LowestUsedPrice.Amount.string
    newprice=soup.LowestNewPrice.Amount.string
    print i , newprice, usedprice
This works fine and runs through my list of Amazon products until it reaches a product that has no value for that set of tags, for example no new or used price.
At that point Python raises:
AttributeError: 'NoneType' object has no attribute 'Amount'
That makes sense, as BS finds no tag or string matching what I searched for. Having no value is perfectly fine for what I'm trying to achieve, but the code stops at this point and will not continue.
I have tried:
if soup.LowestNewPrice.Amount != None:
    newprice=soup.LowestNewPrice.Amount.string
else:
    continue
and also tried:
newprice=0
if soup.LowestNewPrice.Amount != 0:
    newprice=soup.LowestNewPrice.Amount.string
else:
    continue
I am at a loss for how to continue after receiving the nonetype value return. Unsure whether the problem lies fundamentally in the language or in the libraries I'm using.
You can use exception handling:
try:
    # operation which causes AttributeError
except AttributeError:
    continue
The code in the try block is executed, and if an AttributeError is raised, execution immediately jumps to the except block (where continue moves on to the next item in the loop). If no error is raised, the except block is skipped.
If you just wish to set the missing values to zero and print, you can do
try: newprice=soup.LowestNewPrice.Amount.string
except AttributeError: newprice=0
try: usedprice=soup.LowestUsedPrice.Amount.string
except AttributeError: usedprice=0
print i , newprice, usedprice
The correct way to compare with None is to use is None rather than == None, and is not None rather than != None.
Secondly, you need to check soup.LowestNewPrice for None, not the Amount, because soup.LowestNewPrice is already None when the tag is missing, i.e.:
if soup.LowestNewPrice is not None:
    ... read soup.LowestNewPrice.Amount
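The None check can be applied to every link in the chain with a small helper. This is a sketch assuming Python 3; nested_attr is a hypothetical name, and it relies only on getattr, so it works with any object that returns None for a missing child, as the error in the question shows BeautifulSoup doing here.

```python
def nested_attr(obj, *names, default=None):
    """Follow obj.a.b.c, returning default as soon as a link is None or missing."""
    current = obj
    for name in names:
        current = getattr(current, name, None)
        if current is None:
            # Stop here instead of raising AttributeError on the next lookup
            return default
    return current

# Possible use inside the loop from the question:
# newprice = nested_attr(soup, 'LowestNewPrice', 'Amount', 'string', default=0)
# usedprice = nested_attr(soup, 'LowestUsedPrice', 'Amount', 'string', default=0)
```

Compared with wrapping each lookup in try/except AttributeError, this keeps unrelated AttributeErrors (for example a misspelled method name elsewhere in the loop) visible.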

Handling a url which fails to open, error handling using urllib

I would like some help with handling a URL that fails to open. Currently the whole program is interrupted when the URL cannot be opened ( tree = ET.parse(opener.open(input_url)) ).
If opening a URL fails on my first function call (motgift), I would like the script to wait 10 seconds and then try to open the URL again. If it fails once more, the script should continue with the next function call (observer).
def spider_xml(input_url, extract_function, input_xpath, pipeline, object_table, object_model):
    opener = urllib.request.build_opener()
    tree = ET.parse(opener.open(input_url))
    print(object_table)
    for element in tree.xpath(input_xpath):
        pipeline.process_item(extract_function(element), object_model)
motgift = spider_xml(motgift_url, extract_xml_item, motgift_xpath, motgift_pipeline, motgift_table, motgift_model)
observer = spider_xml(observer_url, extract_xml_item, observer_xpath, observer_pipeline, observer_table, observer_model)
I would appreciate an example of how to make this happen.
Would a try/except block work?
error = 0
while error < 2:
    try:
        motgift = spider_xml(motgift_url, extract_xml_item, motgift_xpath, motgift_pipeline, motgift_table, motgift_model)
        break
    except Exception:
        error += 1
        sleep(10)
try:
    resp = opener.open(input_url)
except Exception:
    time.sleep(10)
    try:
        resp = opener.open(input_url)
    except Exception:
        resp = None  # both attempts failed; check for None before using resp
Are you looking for this?
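The "wait, retry once, then move on" behaviour from the question could be wrapped in a helper so that each spider call is handled the same way. This is a sketch assuming Python 3; run_with_one_retry and its parameters are hypothetical. The default errors=(OSError,) covers urllib.error.URLError in Python 3, but which exceptions the XML parsing step can raise is not stated in the thread, so the tuple may need extending.

```python
import time


def run_with_one_retry(job, wait=10, errors=(OSError,), sleep=time.sleep):
    """Run job(); on failure wait and try once more.

    Returns (True, result) on success and (False, None) when both attempts
    failed, so the caller can carry on with the next job either way.
    """
    for attempt in (1, 2):
        try:
            return True, job()
        except errors as exc:
            print('attempt {} failed: {}'.format(attempt, exc))
            if attempt == 1:
                sleep(wait)  # only wait before the second attempt
    return False, None
```

Each spider_xml call from the question could be wrapped in a lambda and passed as job; the script then proceeds to the observer call whether or not the motgift call succeeded.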

URLRetrieve Error Handling

I have the following code that grabs images using urlretrieve. It works... up to a point.
def Opt3():
    global conn
    curs = conn.cursor()
    results = curs.execute("SELECT stock_code FROM COMPANY")
    for row in results:
        #for image_name in list_of_image_names:
        page = requests.get('url?prodid=' + row[0])
        tree = html.fromstring(page.text)
        pic = tree.xpath('//*[@id="bigImg0"]')
        #print pic[0].attrib['src']
        print 'URL'+pic[0].attrib['src']
        try:
            urllib.urlretrieve('URL'+pic[0].attrib['src'],'images\\'+row[0]+'.jpg')
        except:
            pass
I am reading a CSV to input the image names. It works except when it hits an error/corrupt url (where there is no image I think). I was wondering if I could simply skip any corrupt urls and get the code to continue grabbing images? Thanks
urllib has poor support for error handling; urllib2 is a much better choice. The urlretrieve equivalent in urllib2 is:
resp = urllib2.urlopen(im_url)
with open(sav_name, 'wb') as f:
    f.write(resp.read())
And the errors to catch are:
urllib2.URLError, urllib2.HTTPError, httplib.HTTPException
You can also catch socket.error in case the network is down.
Simply using except Exception is a bad idea: it catches every error raised in the block above, including ones caused by your own typos.
Just use a try/except and continue if it fails:
try:
    page = requests.get('url?prodid=' + row[0])
except Exception as e:
    print e
    continue # continue to next row
Instead of pass, why not use continue when an error occurs?
try:
    urllib.urlretrieve('URL'+pic[0].attrib['src'],'images\\'+row[0]+'.jpg')
except Exception:
    continue
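Combining the advice from the answers above (skip the failing item, but catch specific exceptions and keep a record of what failed), a sketch for Python 3 might look like the following. save_images and the injectable retrieve parameter are hypothetical names. In Python 3, urllib.error.URLError and HTTPError are subclasses of OSError, and a malformed URL can raise ValueError, so those two types are caught here.

```python
import urllib.error
import urllib.request


def save_images(items, retrieve=urllib.request.urlretrieve):
    """Download each (url, filename) pair; return the URLs that failed."""
    failed = []
    for url, filename in items:
        try:
            retrieve(url, filename)
        except (OSError, ValueError) as exc:
            # OSError covers URLError/HTTPError and socket-level failures;
            # ValueError covers malformed URLs. Record the failure and move on.
            print('skipping {}: {}'.format(url, exc))
            failed.append(url)
            continue
    return failed
```

Note that in the question's code pic[0] is evaluated before the try block, so a page without the image element would raise IndexError outside any handler; that lookup needs its own check.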
