I'm scraping LinkedIn using Selenium. This is a very brittle task and exceptions are raised often. I want to find an elegant way to handle errors. The usual answer online is try/except, but it's clunky. See the code below:
try:
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable(job))
    job_title = job.find_element(By.CLASS_NAME, "base-search-card__title").text
    company = job.find_element(By.CLASS_NAME, "base-search-card__subtitle").text
    location = job.find_element(By.CLASS_NAME, "job-search-card__location").text
except:
    print("Boom Boom")
If any of the find_element calls raises, the except block runs and the rest of the try block doesn't execute. I'd like each lookup to fail independently, i.e. if one fails I just get an empty string for that field and carry on. I can wrap each lookup in a function and do something like this:
def extract_job_title(job):
    try:
        return job.find_element(By.CLASS_NAME, "base-search-card__title").text
    except:
        return ""
and have:
job_title = extract_job_title(job)
but that is also clunky... I want something like I would have in Swift. Something like this:
let job_title = try? job.find_element(By.CLASS_NAME, "base-search-card__title").text ?? ""
Does something similar to Swift exist and if not can anyone else see a way of making things "nicer" other than using functions?
Generally, if you have a lot of very repetitive code with only small differences, you can probably extract that into a function or loop. E.g.:
searches = {'title': 'base-search-card__title', ...}
data = {}
for item, cls in searches.items():
    try:
        data[item] = job.find_element(By.CLASS_NAME, cls).text
    except:
        pass
This would also extract data into one dict, which seems more logical than a bunch of separate variables.
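A runnable sketch of that loop, assuming a missing element raises one specific exception (Selenium's NoSuchElementException; here a hypothetical stand-in class, since the demo has no browser). It pre-fills every field with "" so a missing element gives an empty string instead of a missing key, and uses contextlib.suppress in place of try/except/pass. `FakeJob`, `NoSuchElement` and `extract_fields` are illustrative names only.

```python
from contextlib import suppress


class NoSuchElement(Exception):
    """Hypothetical stand-in for selenium.common.exceptions.NoSuchElementException."""


class FakeElement:
    def __init__(self, text):
        self.text = text


class FakeJob:
    """Hypothetical stand-in for a Selenium WebElement; only supports class-name lookup."""

    def __init__(self, texts):
        self._texts = texts  # class name -> element text

    def find_element(self, by, value):
        if value not in self._texts:
            raise NoSuchElement(value)
        return FakeElement(self._texts[value])


def extract_fields(job, searches, missing=NoSuchElement):
    data = dict.fromkeys(searches, "")  # every field defaults to ""
    for item, cls in searches.items():
        with suppress(missing):  # ignore only "element not found"
            # with Selenium the first argument would be By.CLASS_NAME
            data[item] = job.find_element("class name", cls).text
    return data


job = FakeJob({"base-search-card__title": "Data Engineer"})
searches = {
    "title": "base-search-card__title",
    "company": "base-search-card__subtitle",
}
print(extract_fields(job, searches))  # {'title': 'Data Engineer', 'company': ''}
```

With a real browser you would pass the real exception class, e.g. `extract_fields(job, searches, NoSuchElementException)`, so that anything other than a missing element still surfaces.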
Alternatively, extracted into a function:
def get(job, cls):
    try:
        return job.find_element(By.CLASS_NAME, cls).text
    except:
        return None

job_title = get(job, 'base-search-card__title')
Note that you should catch a specific exception in except (for a missing element, Selenium raises NoSuchElementException from selenium.common.exceptions), not any and all exceptions.
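Python has no `try?` operator, so the closest equivalent to Swift's `(try? expr) ?? ""` is a small generic helper. A minimal sketch (the name `try_or` is hypothetical): the expression has to be wrapped in a lambda, because Python evaluates arguments before the call, and the caller names the exception(s) to swallow.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def try_or(fn: Callable[[], T], default: T, *exceptions: Type[BaseException]) -> T:
    """Return fn(), or default if fn raises one of `exceptions`.

    If no exception types are given, any Exception is swallowed (not recommended).
    """
    catch: Tuple[Type[BaseException], ...] = exceptions or (Exception,)
    try:
        return fn()
    except catch:
        return default


# With Selenium (not run here):
# from selenium.common.exceptions import NoSuchElementException
# job_title = try_or(
#     lambda: job.find_element(By.CLASS_NAME, "base-search-card__title").text,
#     "",
#     NoSuchElementException,
# )

print(repr(try_or(lambda: {"a": 1}["b"], "", KeyError)))  # ''
```

Exceptions that are not listed still propagate, so real bugs (a typo in an attribute name, say) are not silently turned into empty strings.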
Alternatively, turn the entire thing on its head and only evaluate the elements that actually exist, and sort them into the right bins. Something along the lines of:
classes = {'base-search-card__title': 'title', ...}
data = {}
for elem in job.some_broader_find_query():
    data[classes[elem.class_name]] = elem.text
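One way to fill in that pseudocode, as a sketch under assumptions: the elements are gathered with a single CSS query (shown as a comment, since it needs a browser), and each element behaves like a Selenium WebElement, exposing `.text` and `.get_attribute("class")`. `bin_by_class` and `FakeElement` are hypothetical names for this demo.

```python
def bin_by_class(elements, classes):
    """Sort element texts into fields by CSS class.

    `classes` maps class name -> field name; fields with no matching
    element stay "".
    """
    data = dict.fromkeys(classes.values(), "")
    for elem in elements:
        # an element's class attribute can hold several space-separated classes
        for cls in (elem.get_attribute("class") or "").split():
            if cls in classes:
                data[classes[cls]] = elem.text
    return data


# With Selenium (not run here):
# elements = job.find_elements(
#     By.CSS_SELECTOR,
#     ".base-search-card__title, .base-search-card__subtitle, .job-search-card__location",
# )


class FakeElement:
    """Hypothetical stand-in for a WebElement in this demo."""

    def __init__(self, css_class, text):
        self._class, self.text = css_class, text

    def get_attribute(self, name):
        return self._class if name == "class" else None


classes = {
    "base-search-card__title": "title",
    "base-search-card__subtitle": "company",
    "job-search-card__location": "location",
}
elements = [
    FakeElement("base-search-card__title extra-class", "Data Engineer"),
    FakeElement("job-search-card__location", "Berlin"),
]
print(bin_by_class(elements, classes))
```

Since find_elements returns an empty list rather than raising when nothing matches, this version needs no try/except at all.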
def store(self) -> list:
    result = []
    for url in self.urls():
        if url.should_store():
            stored_url = self.func_that_can_throw_errors(url)
            if stored_url: result.append(stored_url)
    return result
Note: these are not the actual method names; the silly names were chosen for emphasis.
Errors may occur during the loop. In that case, I want store() to return the intermediate result and still raise the original exception so it can be handled elsewhere.
Doing something like
try:
    <accumulating results ... might break>
except Exception:
    return result
    raise
sadly doesn't do the trick, since the raise statement is never reached after the return (so only the list is returned and the exception is silently swallowed).
Do you guys have recommendations on how not to lose the intermediate result?
Thanks a lot in advance - Cheers!
It is not possible as you imagine it. You can't raise an exception and return a value.
So I think what you are asking for is a workaround. I see two possibilities:
Return a flag or the exception alongside the actual return value:
Return flag:
except Exception:
    return result, False
where False is the flag indicating that something went wrong.
Return Exception:
except Exception as e:
    return result, e
Since store appears to be a method of some class, you could raise the exception and retrieve the intermediate result with a second call, like so:
def store(self):
    result = []
    try:
        ...  # accumulate into result; this part may raise
    except Exception:
        self.intermediary_result = result
        raise
    return result

def retrieve_intermediary(self):
    return self.intermediary_result
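A runnable version of the stash-then-raise idea, with a hypothetical `Store` class whose `save` callable stands in for func_that_can_throw_errors. The except block saves whatever was accumulated and then re-raises the original exception, so the caller can both handle the error and fetch the partial result.

```python
class Store:
    """Hypothetical class; save() stands in for func_that_can_throw_errors()."""

    def __init__(self, urls, save):
        self.urls = urls
        self.save = save
        self.intermediary_result = []

    def store(self):
        result = []
        try:
            for url in self.urls:
                stored = self.save(url)
                if stored:
                    result.append(stored)
        except Exception:
            self.intermediary_result = result  # stash the partial work
            raise  # re-raise the original exception unchanged
        return result

    def retrieve_intermediary(self):
        return self.intermediary_result


def save(url):
    """Hypothetical: fails on the URL "bad"."""
    if url == "bad":
        raise ValueError(url)
    return url.upper()


s = Store(["a", "b", "bad", "c"], save)
try:
    s.store()
except ValueError:
    print(s.retrieve_intermediary())  # ['A', 'B']
```

The trade-off is hidden state: the intermediate result lives on the instance, so a later successful call does not clear it unless you reset it yourself.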
The best answer I can come up with given my limited knowledge of Python would be to always return a pair, where the first part of the pair is the result and the second part of the pair is an optional exception value.
def store(self) -> tuple:
    '''
    TODO: Insert documentation here.
    If an error occurs during the operation, a partial list of results along with
    the exception value will be returned.
    :return: A tuple of (list of results, exception). The exception part may be None.
    '''
    result = []
    for url in self.urls():
        if url.should_store():
            try:
                stored_url = self.func_that_can_throw_errors(url)
            except Exception as e:
                return result, e
            if stored_url: result.append(stored_url)
    return result, None
That said, as you have mentioned, if you have this call in multiple places in your code, you would have to be careful to change it in all relevant places, as well as possibly change the handling. Type checking might help there, though I have only limited knowledge of Python's type hints.
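One way to act on the type-checking idea, sketched as standalone functions (hypothetical `store`, `caller` and `save`, not the poster's class): annotating the pair explicitly lets a checker such as mypy flag call sites that still treat the return value as a plain list, and the caller can use the partial result before re-raising the error.

```python
from typing import Callable, List, Optional, Tuple


def store(urls: List[str], save: Callable[[str], Optional[str]]
          ) -> Tuple[List[str], Optional[Exception]]:
    """Hypothetical standalone store(): save(url) may raise."""
    result: List[str] = []
    for url in urls:
        try:
            stored = save(url)
        except Exception as e:
            return result, e  # partial result plus the error
        if stored:
            result.append(stored)
    return result, None


def caller(urls: List[str], save: Callable[[str], Optional[str]]) -> List[str]:
    result, error = store(urls, save)
    print("stored {} url(s)".format(len(result)))  # use the partial result first
    if error is not None:
        raise error  # then hand the original exception on
    return result
```

In Python 3 the re-raised exception keeps its original traceback, so the handler further up still sees where the failure happened.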
Meanwhile, I had the idea to just use an accumulator, which appears to be the quickest fix for now, with the fewest changes in the project where store() is called.
The (intermediate) result is not needed everywhere (let's say it's optional). So...
I'd like to share that with you:
def store(self, result_accu=None) -> list:
    if result_accu is None:
        result_accu = []
    for url in self.urls():
        if url.should_store():
            stored_url = self.func(url)
            if stored_url: result_accu.append(stored_url)
    return result_accu
It still returns a list, but the intermediate result is also accessible by reference through the accumulator list passed in.
Making the parameter optional lets me leave most call sites in the project unchanged, since the result is not needed everywhere.
store() is more of a command that already does most of the data-integrity work internally. The result is nice-to-have for now.
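A runnable sketch of the accumulator approach, with a hypothetical standalone `store` and `save` in place of the real methods. Because the caller owns the list, anything appended before the exception is still there after it propagates.

```python
def store(urls, save, result_accu=None):
    """Hypothetical standalone version of the accumulator approach."""
    if result_accu is None:
        result_accu = []
    for url in urls:
        stored = save(url)  # may raise; result_accu keeps what was added so far
        if stored:
            result_accu.append(stored)
    return result_accu


def save(url):
    """Hypothetical: fails on the URL "bad"."""
    if url == "bad":
        raise ValueError(url)
    return url.upper()


accu = []
try:
    store(["a", "b", "bad", "c"], save, accu)
except ValueError as e:
    print("failed:", e, "partial:", accu)  # failed: bad partial: ['A', 'B']
```

The `None` default (rather than `result_accu=[]`) matters: a mutable default would be shared between calls and keep growing.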
But you also helped me notice that there's work to do in order to process the intermediate result anyway. Thanks! #attalos #MelvinWM
I have this:
try:
    if session.var:
        otherVar = session.var
    else:
        util = db.utility[1]
        otherVar = session.var = util.freshOutTheBank
except AttributeError:
    util = db.utility[1]
    otherVar = session.var = util.freshOutTheBank
...do stuff with otherVar
The problem is that session.var might not exist or could be None. This code is also run more than once by a user during a session.
How do I avoid repeating the code? I basically want an 'except and else', or am I looking at this incorrectly?
Assuming this is a web2py session object, note that it is an instance of gluon.Storage, which is like a dictionary with two exceptions: (1) keys can be accessed like properties, and (2) accessing a non-existent key/property returns None rather than raising an exception. So, you can simply do something like:
otherVar = session.var = session.var if session.var else db.utility[1].freshOutTheBank
Note: if you want to distinguish between non-existent keys and keys that have an explicit value of None, you cannot use hasattr(session, 'var'), as that will return True even if there is no 'var' key. Instead, you can check session.has_key('var') (or, equivalently, 'var' in session), which will return False if there is no 'var' key.
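To see why hasattr is no help here, a minimal stand-in for the behaviour described: `Storage` below is a hypothetical class written for this demo, not web2py's own code, and `fresh_value` stands in for db.utility[1].freshOutTheBank.

```python
class Storage(dict):
    """Hypothetical stand-in mimicking the gluon.Storage behaviour described above."""

    def __getattr__(self, key):
        return self.get(key)  # missing key -> None, no AttributeError

    def __setattr__(self, key, value):
        self[key] = value


def fresh_value():
    """Hypothetical replacement for db.utility[1].freshOutTheBank."""
    return "fresh"


session = Storage()
print(hasattr(session, "var"))  # True, even though there is no 'var' key
print("var" in session)  # False: the reliable key check
otherVar = session.var = session.var if session.var else fresh_value()
print(otherVar, session["var"])  # fresh fresh
```

hasattr only reports whether attribute access raised AttributeError, and a __getattr__ that returns None never raises, so the key check is the only way to tell "missing" from "None".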
You can avoid using session.var if it doesn't exist by checking for it first, using hasattr. This avoids the need for the try/except block altogether.
if hasattr(session, 'var') and session.var is not None:
    ...
else:
    ...
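For an ordinary object (one that does raise AttributeError), the hasattr check and the None check can be folded into one call using getattr's default argument. A sketch, with a hypothetical zero-argument `fetch` callable standing in for db.utility[1].freshOutTheBank:

```python
from types import SimpleNamespace


def get_session_var(session, fetch):
    """Return session.var, fetching and caching it if it is missing or None."""
    value = getattr(session, "var", None)  # missing attribute -> None, no AttributeError
    if value is None:
        value = session.var = fetch()
    return value


s = SimpleNamespace()
print(get_session_var(s, lambda: 42))  # 42 (fetched and cached)
print(get_session_var(s, lambda: 99))  # 42 (cached value reused)
```

Like the hasattr version, this tests `is None`, so falsy values such as 0 or "" are kept; the original `if session.var:` would refetch them.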
An alternative might be to have the else in your original code just raise an exception to get to the except block, but it's sort of ugly:
try:
    if session.var:
        ...
    else:
        raise AttributeError
except AttributeError:
    ...
In this situation, I think the "Look Before you Leap" style of programming (using hasattr) is nicer than the usually more Pythonic style of "Easier to Ask Forgiveness than Permission" (which uses exceptions as part of flow control). But either one can work.
If your code was compartmentalized into smaller functions, it might be even easier to deal with the issue. For instance, if you wrote a get_session_var function, it could return from the successful case (inside the try and if blocks), and the two error cases could be resolved later:
def get_session_var(session):
    try:
        if session.var:
            return session.var
    except AttributeError:
        pass
    util = db.utility[1]
    session.var = util.freshOutTheBank
    return session.var
I am trying to replace a largish if-elif block in Python with something more like a Java switch. My understanding is that this should be a bit faster, and we are parsing lots of data, so if I can get a speed improvement I will take it. However, the code always acts as if the key is 'deposits', even for entries where it is not. The "func =" line is there to check that I am getting things right; I will probably not assign a result to func, since my goal is to fill a list with results.
What am I doing wrong that the switcher.get always finds a match even when one does not exist?
def parsePollFile(thisFile):
    line = ''
    switcher = {
        'deposits': deposits(line)
    }
    try:
        reader = csv.reader(open(thisFile, 'r'))
        for line in reader:
            try:
                if line[2] == "D":
                    func = switcher.get(line[0], lambda: 'invalid key')
                    print('key: {} -- {}\n'.format(line[0], func))
            except IndexError:
                continue
    except Exception as e:
        print("exception {}\n".format(e))
Your problem is that the code
switcher = {'deposits': deposits(line)}
doesn't create a key in your dictionary whose value is the deposits function object. 'deposits': deposits(line) actually runs the deposits function and stores its return value as the value of the 'deposits' key. You need to store a function object in the dictionary.
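A small demonstration of the difference, using a hypothetical `deposits` that records each call: the call expression in the dict literal runs once, while the dictionary is being built, and the dict ends up holding its return value.

```python
calls = []


def deposits(line):
    """Hypothetical handler that records each call."""
    calls.append(line)
    return "handled {}".format(line)


line = ""
eager = {"deposits": deposits(line)}  # deposits("") runs right here
print(calls)  # [''] : called once, at dict creation
print(repr(eager["deposits"]))  # 'handled ' : a string, not a function

lazy = {"deposits": deposits}  # stores the function object itself
print(callable(lazy["deposits"]))  # True
```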
Since your function takes arguments, this is a bit tricky. There are several ways around this problem, but perhaps the simplest is to wrap your function call in another function:
switcher = {'deposits': lambda: deposits(line)}
You would then use the dictionary like so:
func = switcher.get(line[0], lambda: 'invalid key')
func()
My solution came from both the comment and the answer. I changed 'deposits': deposits(line) to 'deposits': deposits, and then I call func(line) when a match is found.
Thanks for the answers. I did not realize I was actually calling the function instead of storing it as the value.
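A runnable sketch of that fix, under some assumptions: `deposits(row)` is a hypothetical handler, the function takes an open file object (so the demo can use io.StringIO; with a real file you would open it with `open(path, newline='')`, as the csv module docs recommend), and a length check replaces catching IndexError.

```python
import csv
import io


def deposits(row):
    """Hypothetical handler for 'deposits' rows."""
    return ("deposit", row[1])


def parse_poll_file(f):
    switcher = {"deposits": deposits}  # key -> function object, not a call
    results = []
    for row in csv.reader(f):
        if len(row) < 3 or row[2] != "D":  # skip short or non-"D" rows
            continue
        handler = switcher.get(row[0])
        if handler is None:
            print("invalid key: {}".format(row[0]))
            continue
        results.append(handler(row))  # call only when a match is found
    return results


sample = io.StringIO("deposits,100,D\nwithdrawals,50,D\ndeposits,20,X\n")
print(parse_poll_file(sample))  # prints "invalid key: withdrawals", then [('deposit', '100')]
```

Adding a new record type is then just one more `"key": handler` entry in `switcher`.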
I know this is in the pipeline for Java 8 or 9, but I think there must be a way to do this in Python. Say, for example, I am writing a complex expression and cannot be bothered to add null checks at every level (example below):
post_code = department.parent_department.get('sibling').employees.get('John').address.post_code
I don't want to worry about several intermediate values being None. For example, if parent_department does not have a 'sibling' key, I want to short-circuit and assign None to post_code. Something like:
post_code = department?.parent_department?.get('sibling')?.employees?.get('John')?.address?.post_code
Can this be done in Python 2.7.1? I know this means more trouble while debugging but assume I have done all pre-checks and if any value is null it means an internal error so it is enough if I just get the error trace that the particular line failed.
Here is a more verbose way. I just need a one-liner that does not throw random exceptions:
def get_post_code(department):
    if department is None:
        return None
    if department.parent_department is None:
        return None
    if department.parent_department.get('sibling') is None:
        return None
    # ... more checks ...
    return department.parent_department.get('sibling').employees.get('John').address.post_code
If you want post_code to be None then catch the exceptions raised by trying to access non-existing items:
try:
    post_code = department.parent_department.get('sibling').employees.get('John').address.post_code
except (AttributeError, KeyError):
    post_code = None
Actually, one valid answer is to start thinking in terms of monads (Maybe monads) to chain these functions. A very basic tutorial is at https://github.com/dustingetz/dustingetz.github.com/blob/master/_posts/2012-04-07-dustins-awesome-monad-tutorial-for-humans-in-python.md
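Python has no `?.` operator (PEP 505, which proposed None-aware operators, is deferred). A rough sketch of the Maybe idea in plain Python: a hypothetical helper `maybe_chain` applies one step at a time and stops at the first None. Unlike the try/except answer, it only short-circuits on None; a genuinely missing attribute still raises AttributeError, which fits the wish to get a trace for real internal errors.

```python
from types import SimpleNamespace


def maybe_chain(value, *steps):
    """Apply each step in turn, returning None as soon as a value is None."""
    for step in steps:
        if value is None:
            return None
        value = step(value)
    return value


steps = (
    lambda d: d.parent_department,
    lambda p: p.get("sibling"),
    lambda s: s.employees,
    lambda e: e.get("John"),
    lambda j: j.address,
    lambda a: a.post_code,
)

address = SimpleNamespace(post_code="AB1 2CD")
sibling = SimpleNamespace(employees={"John": SimpleNamespace(address=address)})
department = SimpleNamespace(parent_department={"sibling": sibling})

print(maybe_chain(department, *steps))  # AB1 2CD
department.parent_department = {}  # no 'sibling' key any more
print(maybe_chain(department, *steps))  # None
```

The lambdas make it more verbose than `?.`, but each step is explicit, and the helper also works on Python 2.7 (the demo's SimpleNamespace needs Python 3.3+).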
I'm a Python/coding newbie and I'm trying to put two for loops inside a while loop. Can I do this? How can I print out the dictionary mydict to make sure I am doing this correctly?
I'm stuck.
40 minutes later. Not stuck anymore. Thanks everyone!
def runloop():
while uid<uidend:
for row in soup.findAll('h1'):
try:
name = row.findAll(text = True)
name = ''.join(name)
name = name.encode('ascii','ignore')
name = name.strip()
mydict['Name'] = name
except Exception:
continue
for row in soup.findAll('div', {'class':'profile-row clearfix'}):
try:
field = row.find('div', {'class':'profile-row-header'}).findAll$
field = ''.join(field)
field = field.encode('ascii','ignore')
field = field.strip()
except Exception:
continue
try:
value = row.find('div', {'class':'profile-information'}).findAl$
value = ''.join(value)
value = value.encode('ascii','ignore')
value = value.strip()
return mydict
mydict[field] = value
print mydict
except Exception:
continue
uid = uid + 1
runloop()
On nested loops:
You can nest for and while loops quite deeply before Python gives you an error, but it's usually bad form to go more than 4 levels deep. Write another function if you find yourself needing a lot of nesting. Your use is fine, though.
Some problems with the code:
It will never reach the print statements, because a return statement comes before them inside the loop. When Python executes a return inside a function, it leaves the function immediately and hands the return value back to the caller.
I would avoid using try and except until you understand why you're getting the errors that you get without those.
Make sure the indentation is consistent. Maybe it's a copy-and-paste error, but it looks like some lines are indented one character more than others. Use 4 spaces per indentation level. Unlike most languages, Python treats indentation as syntax, so it will complain if the indentation is off.
Not sure if you just didn't post the function call, but you would need to call runloop() to actually use the function.
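A sketch of the overall shape, with BeautifulSoup replaced by plain data so it runs on its own: `pages` (a hypothetical dict mapping each uid to its names and field/value rows) stands in for the parsed HTML. It shows two for loops nested in a while loop, prints the dict for each uid, and returns only after the loop finishes. It also passes uid in as a parameter: in the question, `uid = uid + 1` inside runloop makes uid a local variable, so the while condition would raise UnboundLocalError unless uid were declared global.

```python
def run_loop(pages, uid, uid_end):
    """Collect one dict per uid from hypothetical pre-parsed page data."""
    profiles = {}
    while uid < uid_end:  # outer while loop
        page = pages.get(uid, {})
        mydict = {}  # a fresh dict for each uid
        for name in page.get("names", []):  # first for loop
            mydict["Name"] = name.strip()
        for field, value in page.get("rows", []):  # second for loop
            mydict[field.strip()] = value.strip()
        print(mydict)  # check the dict for this uid
        profiles[uid] = mydict
        uid += 1
    return profiles  # return only after the loop is done


pages = {1: {"names": [" Ann "], "rows": [(" Age ", " 30 ")]}}
run_loop(pages, 1, 3)
```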
You can put as many loops within other loops as you'd like. These are called nested loops.
Also, printing a dictionary is simple:
mydict = {}
print mydict
You are not helping yourself by having these all over the place:
except Exception:
continue
That basically says, "if anything goes wrong, carry on and don't tell me about it."
Something like this at least lets you see the exception:
except Exception as e:
print e
continue
Is mydict declared somewhere? That could be your problem.