Passing over errors in loop for my web-scraper - python

I currently have a loop running for my web-scraper. If it encounters an error (e.g. it can't load the page), I have it set to ignore the error and continue with the loop.
for i in links:
    try:
        driver.get(i)
        d = driver.find_elements_by_xpath('//p[@class = "list-details__item__date"]')
        s = driver.find_elements_by_xpath('//p[@class = "list-details__item__score"]')
        m = driver.find_elements_by_xpath('//span[@class="list-breadcrumb__item__in"]')
        o = driver.find_elements_by_xpath('//tr[@data-bid]')
        l = len(o)
        lm = len(m)
        for i in range(l):
            a = o[i].text
        for i in range(lm):
            b = m[i].text
            c = s[i].text
            e = d[i].text
            odds.append((a, b, c, e))
    except:
        pass
However, I now wish for there to be a note of some kind when an error was encountered so that I can see which pages didn't load. Even if they are just left blank in the output table, that would be fine.
Thanks for any help.

You can catch the exception as a named variable and then act on it inside the except block. This should be suitable for your script:
# ... your existing imports go here ...
import traceback
for link in links:
    try:
        driver.get(link)
        d = driver.find_elements_by_xpath('//p[@class = "list-details__item__date"]')
        s = driver.find_elements_by_xpath('//p[@class = "list-details__item__score"]')
        m = driver.find_elements_by_xpath('//span[@class="list-breadcrumb__item__in"]')
        o = driver.find_elements_by_xpath('//tr[@data-bid]')
        l = len(o)
        lm = len(m)
        for i in range(l):
            a = o[i].text
        for i in range(lm):
            b = m[i].text
            c = s[i].text
            e = d[i].text
            odds.append((a, b, c, e))
    except Exception as error_script:
        # Print the full error and stack trace, then record which page failed
        print(traceback.format_exc())
        odds.append('Error: could not add ' + link)
Essentially, the except Exception as error_script: line catches the exception and binds it to the name error_script. You can then print the full error message, including the stack trace, to the console with traceback.format_exc().
Most importantly, you can append a placeholder string to the list inside the except block, so the failed page still shows up in your output. When the except block finishes, the loop simply continues with the next link, so you don't need a pass statement there.
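A minimal sketch of the same idea that keeps the output table rectangular and also returns a separate list of failed pages. It assumes a hypothetical scrape_page(link) function standing in for the Selenium calls above, which returns a list of 4-field row tuples or raises on failure; the placeholder row layout (link in the first column, blanks elsewhere) is also just one possible choice.

```python
import traceback

def scrape_all(links, scrape_page):
    """Scrape every link; a failing page becomes a placeholder row instead of stopping the loop.

    scrape_page is a hypothetical callable: link -> list of (a, b, c, e) tuples.
    """
    odds = []      # real rows plus placeholder rows for pages that failed
    failed = []    # (link, error message) for every page that raised
    for link in links:
        try:
            rows = scrape_page(link)
        except Exception as error:
            traceback.print_exc()  # full stack trace to stderr for debugging
            failed.append((link, str(error)))
            # Placeholder row: the link in the first column, blanks in the rest
            odds.append((link, "", "", ""))
            continue  # move on to the next link
        odds.extend(rows)
    return odds, failed
```

Keeping failures in their own list means you can print a short summary of the pages that did not load without scanning the whole table.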

Related

Execute statement 500 times ignoring exceptions

import random
import string
from bitcoinrpc.authproxy import AuthServiceProxy, JSONRPCException
rpc_port = 18444
rpc_user = 'user3'
rpc_password = 'pass3'
def wallet_name(size):
    generate_wallet = ''.join([random.choice(string.punctuation + string.ascii_letters)
                               for n in range(size)])
    return generate_wallet
try:
    rpc_connection = AuthServiceProxy("http://%s:%s@127.0.0.1:%s" % (rpc_user, rpc_password, rpc_port))
    i = 0
    while i < 500:
        wallet = wallet_name(20)
        result = rpc_connection.createwallet(wallet)
        i += 1
except Exception:
    pass
I want this code to try to create 500 wallets, but it stops after 2-3. If I print the exception, it gives an error about an incorrect file name or file path, but the exception should be ignored and the code should try creating a wallet with the next string.
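What's the point of creating 500 randomly named wallets when you're not even saving the names?
Anyway, the problem is that your try/except wraps the whole while loop, so the first exception jumps out of the loop and the remaining iterations never run. Put the try/except inside the loop so that only the failing call is skipped:
for i in range(500):
    wallet = wallet_name(20)
    try:
        result = rpc_connection.createwallet(wallet)
    except Exception:
        pass  # skip this name and continue with the next one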
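A sketch that applies the same fix while also keeping the names, as the comment above suggests. create_wallet is a hypothetical parameter standing in for rpc_connection.createwallet, so the loop can be tried without a node. It also builds names from letters and digits only; that is an assumption about the file-name error, not something confirmed in the question: string.punctuation contains characters such as / and \ that are not valid in a file name.

```python
import random
import string

# Letters and digits only (assumed safe in wallet file names);
# string.punctuation includes '/', '\\' and others that may not be.
ALPHABET = string.ascii_letters + string.digits

def wallet_name(size):
    return "".join(random.choice(ALPHABET) for _ in range(size))

def create_wallets(create_wallet, count=500, size=20):
    """Call create_wallet(name) `count` times; a failure skips only that one name."""
    created = []   # names that were created successfully
    failed = []    # (name, exception) pairs for names that raised
    for _ in range(count):
        name = wallet_name(size)
        try:
            create_wallet(name)  # e.g. rpc_connection.createwallet
        except Exception as error:
            failed.append((name, error))
            continue
        created.append(name)
    return created, failed
```

Calling create_wallets(rpc_connection.createwallet) would then make 500 attempts and report which names failed and why.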

Duplicate output with arrays in python

I'm trying to gather data for 6 of the stocks in the list. When my API can't find the data for a stock, I want it to move on to the next item, but still select just 6 in total.
I tried this code and other variants, but nothing seems to work. The output always duplicates one stock and I don't know why:
portfolio = ['NVDA', 'SPCE', 'IMGN', 'SUMR', 'EXPE', 'PWM.V', 'SVMK', 'DXCM']
tt = 0
irange = 6
for i in range(irange):
    try:
        t = requests.get('https://finnhub.io/api/v1/stock/profile?symbol='+portfolio[tt])
        t = t.json()
    except Exception as e:
        print("Error calling API, waiting 70 seconds and trying again...")
        time.sleep(70)
        t = requests.get('https://finnhub.io/api/v1/stock/profile?symbol='+portfolio[tt])
        t = t.json()
    try:
        coticker = t['ticker']
        coexchange = t['exchange']
        coname = t['name']
        codesc = t['ggroup']
        coipo = t['ipo']
        cosector = t['gsector']
        costate = t['state']
        coweburl = t['weburl']
    except Exception as e:
        print("Information not available")
        irange = irange+1
    print("THE TT IS:"+str(tt))
    tt = tt+1
    print("")
    print(coticker,coexchange,coname,codesc,coipo,cosector,costate,coweburl)
This is the output:
THE TT IS:0
NVDA -- GATHERED DATA
Information not available
THE TT IS:1
NVDA -- GATHERED DATA
THE TT IS:2
IMGN -- GATHERED DATA
THE TT IS:3
SUMR -- GATHERED DATA
THE TT IS:4
EXPE -- GATHERED DATA
Information not available
THE TT IS:5
EXPE -- GATHERED DATA
As you can see, when there is no information available, it doesn't move on to the next one; it repeats the same one. What's the mistake? Thanks in advance for your kind help.
Put the line that prints the information inside the try block that sets all the variables. Otherwise, when the lookup fails you'll print the variables left over from the previous stock, which is why one stock appears twice.
To make it keep going past 6 items when you have failures, don't use range(irange). Increasing irange inside the loop has no effect, because range(irange) was already evaluated when the loop started. Instead, loop over the entire list with for symbol in portfolio:, and use a variable to count the successful attempts. Then break out of the loop once you've printed 6 stocks.
I've changed the code to use if statements instead of try/except to handle empty responses.
import time
import requests
portfolio = ['NVDA', 'SPCE', 'IMGN', 'SUMR', 'EXPE', 'PWM.V', 'SVMK', 'DXCM']
irange = 6
successes = 0
for symbol in portfolio:
    try:
        t = requests.get('https://finnhub.io/api/v1/stock/profile?symbol='+symbol)
    except Exception as e:
        print("Error calling API, waiting 70 seconds and trying again...")
        time.sleep(70)
        t = requests.get('https://finnhub.io/api/v1/stock/profile?symbol='+symbol)
    if t:  # a Response is truthy only for non-error HTTP status codes
        t = t.json()
    if t:  # an empty JSON object means no data for this symbol
        coticker = t['ticker']
        coexchange = t['exchange']
        coname = t['name']
        codesc = t['ggroup']
        coipo = t['ipo']
        cosector = t['gsector']
        costate = t['state']
        coweburl = t['weburl']
        print("")
        print(coticker,coexchange,coname,codesc,coipo,cosector,costate,coweburl)
        successes += 1
        if successes >= irange:
            break
    else:
        print("Information not available for "+symbol)
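The control flow above (loop over every symbol, count successes, stop at the target) can be isolated from the HTTP details. A minimal sketch, assuming a hypothetical fetch_profile(symbol) that returns a dict of profile data, or an empty dict / None when nothing is available. Each symbol is requested at most once, so a stock can never be printed twice.

```python
def first_n_profiles(symbols, fetch_profile, wanted=6):
    """Collect profiles for the first `wanted` symbols that actually return data."""
    profiles = []
    for symbol in symbols:
        profile = fetch_profile(symbol)  # assumed: dict, or empty/None when no data
        if not profile:
            print("Information not available for " + symbol)
            continue  # skip this symbol without counting it
        profiles.append(profile)
        if len(profiles) >= wanted:
            break  # stop once enough stocks have data
    return profiles
```

Returning the profiles instead of printing inside the loop also makes it easy to format or save them afterwards.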

Python Warning - Expected type 'Union[Integral, slice]', got 'str' instead

My code below in python is giving me a warning on the line:
some_new_object['someVar'] = cd['someVar']
The warning is
Expected type 'Union[Integral, slice]', got 'str' instead
Code:
def some_object():
    return {
        'someId': 0,
        'someVar': ''
    }
def warn_test(in_list):
    try:
        new_list = []
        some_new_object = some_object()
        for cd in in_list:
            if cd['someVar']:
                new_list.append(cd)
        for cd in new_list:
            some_new_object['someVar'] = cd['someVar']
            in_list.append(some_new_object.copy())
        return in_list
    except Exception:
        print('baaa')
#Main Program
new_obj = some_object()
new_obj['someId'] = 1
new_obj['someVar'] = 'Next'
new_obj2 = some_object()
new_obj2['someId'] = 1
new_obj2['someVar'] = None
new_list = []
new_list.append(new_obj)
new_list.append(new_obj2)
out_list = warn_test(new_list)
for obj in out_list:
    print(obj)
If I change the function warn_test to this:
def warn_test(in_list):
    try:
        new_list = []
        some_new_object = some_object()
        for cd in in_list:
            if cd['someVar']:
                some_new_object['someVar'] = cd['someVar']
                new_list.append(some_new_object.copy())
        for cd in new_list:
            in_list.append(cd)
        return in_list
    except Exception:
        print('baaa')
It gives me no warning.
Can someone help me to understand why I get the warning, and how I can access the cd['someVar'] in the second iteration without getting a warning?
I know this code is weird. I need this for a project I'm working on; I made this test case to share here, and it gives me the same warning, so a solution for it will fix my system too. (No warnings is one of the must-haves for this system.)
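Better late than never.
In general, I have found that these warnings go away once variables and function return values are explicitly typed (for example with type hints), because the IDE's type checker then no longer has to infer their types on its own.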
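A sketch of what "strongly typed" could look like for the test case above, using type hints from the standard typing module. The Record alias is a hypothetical name introduced here, and the try/except from the question is left out for brevity. Whether a given IDE version then drops this particular warning is an assumption based on the answer above, not something verified here.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]  # one {'someId': ..., 'someVar': ...} record

def some_object() -> Record:
    return {'someId': 0, 'someVar': ''}

def warn_test(in_list: List[Record]) -> List[Record]:
    # Annotating new_list tells the checker that each cd below is a dict,
    # so cd['someVar'] is a str key lookup rather than a list index
    new_list: List[Record] = []
    for cd in in_list:
        if cd['someVar']:
            new_list.append(cd)
    some_new_object = some_object()
    for cd in new_list:
        some_new_object['someVar'] = cd['someVar']
        in_list.append(some_new_object.copy())
    return in_list
```

The runtime behavior is unchanged; the annotations only give static tools more information.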

How to append global list variable in try/except exception in Python?

I'm trying to append new words to a global list variable inside try/except blocks, but after the try/except I get an empty list.
list = []  # created empty list with global scope
def try_multiple_operations(j):
    try:
        jopen = urllib2.urlopen(j)  # opened url for parsing content
        versions = jopen.read()  # read and save to variable
        version = pq(versions)  # filtering content with pyquery
        .... # pyquery operations
        list.append(version)
    except urllib2.URLError:  # urllib2 exception
        list.append('0.0')
    except urllib2.HTTPError:  # urllib2 exception
        list.append('0.0')
executor = concurrent.futures.ProcessPoolExecutor(5)
futures = [executor.submit(try_multiple_operations, j) for j in list]
concurrent.futures.wait(futures)
print len(list)  # 0 elements
At the end I got empty list. How can I add/append new results to global list within try/except?
You have several problems. First, list (which really should be renamed so that you don't shadow the built-in list function) is empty, so
futures = [executor.submit(try_multiple_operations, j) for j in list]
runs your function zero times.
The second is that a ProcessPoolExecutor runs the worker in another process. The worker updates that process's copy of the list global, not the one in the main process. Instead, use another executor method such as map and have your worker return its result.
Since your code isn't runnable, I cooked up a different working example:
import concurrent.futures
def try_multiple_operations(j):
    try:
        if j % 2:
            raise ValueError('oops')
        return '%d even' % j
    except ValueError:
        return '%d odd' % j
# The __main__ guard is required where worker processes are started with "spawn" (Windows, macOS)
if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor(5) as executor:
        my_list = list(executor.map(try_multiple_operations, range(10)))
    print(my_list)
And your code could be changed to
def try_multiple_operations(j):
    try:
        jopen = urllib2.urlopen(j)  # opened url for parsing content
        versions = jopen.read()  # read and save to variable
        version = pq(versions)  # filtering content with pyquery
        .... # pyquery operations
        return version
    except urllib2.HTTPError:  # subclass of URLError, so it must be caught first
        return '0.0'
    except urllib2.URLError:  # any other urllib2 error
        return '0.0'
url_list = [ ...something... ]
executor = concurrent.futures.ProcessPoolExecutor(5)
my_list = list(executor.map(try_multiple_operations, url_list))
print len(my_list)  # one result per URL
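If you prefer to keep executor.submit as in the question, the same rule applies: collect each worker's return value in the parent with future.result() instead of appending to a global. A minimal sketch reusing the toy worker from the example above; collect_results is a hypothetical helper name, and it accepts any concurrent.futures executor.

```python
import concurrent.futures

def try_multiple_operations(j):
    # Same toy worker as above: odd numbers "fail"
    try:
        if j % 2:
            raise ValueError('oops')
        return '%d even' % j
    except ValueError:
        return '%d odd' % j

def collect_results(executor, items):
    """Submit one task per item and gather each worker's return value in the parent."""
    futures = [executor.submit(try_multiple_operations, j) for j in items]
    # future.result() blocks until that task is done; the list keeps submission order
    return [future.result() for future in futures]

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor(5) as executor:
        print(collect_results(executor, range(10)))
```

Because the results travel back through the futures, it does not matter that each worker runs in its own process with its own globals.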

How can I replace multiple try/except blocks with less code?

I find myself writing code like the code below quite a bit, and it's very verbose. What I'd like to do is assign the items at certain list indices to different variables, and assign False if there's an IndexError. I feel like there should be shorter syntax for doing this than what I have below.
Edit - here's my actual code. page is a valid lxml.html object. Each of the selectors may or may not return a value, depending on whether that section is present on the page.
def extract_data(page):
    # given lxml.html obj, extract data from g+ page and return as dict
    try:
        profile_name = page.xpath('//div[@guidedhelpid="profile_name"]/text()')[0]
    except IndexError:
        profile_name = False
    try:
        website = page.cssselect('span.K9a')[0].text_content().rstrip('/')
    except IndexError:
        website = False
    try:
        contact_div = html.tostring(page.xpath('//div[normalize-space(text())="Contact Information"]/../../..')[0])
    except IndexError:
        contact_div = False
    return {
        'profile_name': profile_name,
        'website': website,
        'contact_div': contact_div,
    }
Assuming what you're trying to do makes sense within the context of your use case, you can encapsulate this notion of a default value inside a function:
def retrieve(my_list, index, default_value=False):
    try:
        return my_list[index]
    except IndexError:
        return default_value
That way you can do something like:
my_list = [2, 4]
first = retrieve(my_list, 0)
# first will be 2
second = retrieve(my_list, 1)
# second will be 4
third = retrieve(my_list, 2)
# third will be False
You can even change the value you'd like to default to in case the index does not exist.
In general, when you're repeating code like in the manner you're doing above, the first thing you should think about is whether you can write a function that does what you're trying to do.
Using your actual code, you could do something like:
profile_name = retrieve(page.xpath('//div[@guidedhelpid="profile_name"]/text()'), 0)
# lxml elements are falsy when they have no child elements, so compare against the default explicitly
website = retrieve(page.cssselect('span.K9a'), 0)
if website is not False:
    website = website.text_content().rstrip('/')
contact_div = retrieve(page.xpath('//div[normalize-space(text())="Contact Information"]/../../..'), 0)
if contact_div is not False:
    contact_div = html.tostring(contact_div)
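The follow-up if steps above can also be folded into the helper itself. A minimal sketch, where retrieve_first and its transform parameter are hypothetical names (not part of lxml). The transform runs outside the try, so an IndexError raised inside it is not mistaken for a missing element.

```python
def retrieve_first(items, transform=None, default=False):
    """Return items[0] (optionally passed through transform), or default if items is empty."""
    try:
        first = items[0]
    except IndexError:
        return default
    # transform runs outside the try, so its own errors are not silently swallowed
    return first if transform is None else transform(first)
```

With it, each lookup in the question becomes a single line, for example website = retrieve_first(page.cssselect('span.K9a'), lambda el: el.text_content().rstrip('/')) or profile_name = retrieve_first(page.xpath('//div[@guidedhelpid="profile_name"]/text()')).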
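Another option is to loop over the variable names and collect the values in a dict (l is the list being indexed):
names = ['first', 'second', 'third']
r = {}
for i, name in enumerate(names):
    try:
        r[name] = l[i]
    except IndexError:
        r[name] = False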
This should solve your problem :) exec + looping to the rescue!
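l = list([0,2])
numberWords = {0: "first", 1: "second", 2: "third"}
# Loop over every name, not just len(l); otherwise the IndexError branch never runs.
# Only works at module level: inside a function, exec cannot create local variables.
for i in range(len(numberWords)):
    try:
        exec(numberWords[i] + "=l[" + str(i) + "]")
    except IndexError:
        exec(numberWords[i] + "=False")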
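If the goal is real variables rather than dict keys, exec can be avoided entirely: pad the list to a fixed length with the default value, then unpack it. A minimal sketch; pad is a hypothetical helper name, not a built-in.

```python
def pad(values, length, default=False):
    """Return the first `length` items of values, padded with `default` if there are fewer."""
    values = list(values)[:length]
    return values + [default] * (length - len(values))

# Works at any scope, unlike assignments made through exec
first, second, third = pad([0, 2], 3)
```

Here third ends up as False because the source list only has two items.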
