I'm making an API call whose response occasionally lacks certain fields. When I run across one of these responses, my script throws the KeyError as expected, but then breaks out of the for loop completely. Is there any way of getting it to simply skip over the errored output and continue with the loop?
I've considered trying to put all the fields I'm searching for into a list and iterate over that using a continue statement to keep the iteration going when it encounters a missing field, but 1) it seems cumbersome and 2) I've got multiple levels of iterations within the output.
try:
    for item in result["results"]:
        print(MAJOR_SEP)  # Just a line of characters separating the output
        print("NPI:", item['number'])
        print("First Name:", item['basic']['first_name'])
        print("Middle Name:", item['basic']['middle_name'])
        print("Last Name:", item['basic']['last_name'])
        print("Credential:", item['basic']['credential'])
        print(MINOR_SEP)
        print("ADDRESSES")
        for row in item['addresses']:
            print(MINOR_SEP)
            print(row['address_purpose'])
            print("Address (Line 1):", row['address_1'])
            print("Address (Line 2):", row['address_2'])
            print("City:", row['city'])
            print("State:", row['state'])
            print("ZIP:", row['postal_code'])
            print("")
            print("Phone:", row['telephone_number'])
            print("Fax:", row['fax_number'])
        print(MINOR_SEP)
        print("LICENSES")
        for row in item['taxonomies']:
            print(MINOR_SEP)
            print("State License: {} - {}, {}".format(row['state'], row['license'], row['desc']))
        print(MINOR_SEP)
        print("OTHER IDENTIFIERS")
        for row in item['identifiers']:
            print(MINOR_SEP)
            print("Other Identifier: {} - {}, {}".format(row['state'], row['identifier'], row['desc']))
        print(MAJOR_SEP)
except KeyError as e:
    print("{} is not defined.".format(e))
try...except blocks, especially those for very specific errors such as KeyError, should wrap only the lines where they matter.
If you want to be able to continue processing, at least put the block inside the for loop so that on error it skips to the next item in the iteration. Even better, check which values are actually required and substitute a dummy value when an optional one is missing.
For example:
for row in item['addresses']:
could be:
for row in item.get('addresses', []):
so that items without an address are still accepted.
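Extending that idea, here is a minimal sketch of the dummy-value approach for the loop above (field names assumed from the question's snippet); missing keys print as "N/A" instead of raising KeyError:
for item in result.get("results", []):
    basic = item.get('basic', {})
    print("NPI:", item.get('number', 'N/A'))
    print("First Name:", basic.get('first_name', 'N/A'))
    print("Credential:", basic.get('credential', 'N/A'))
    for row in item.get('addresses', []):
        print("City:", row.get('city', 'N/A'))
        print("Phone:", row.get('telephone_number', 'N/A'))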
Try putting the try/except clause inside the for loop. For example:
for item in result["results"]:
    try:
        # Code here.
    except KeyError as e:
        print("{} is not defined.".format(e))
Python documentation for exceptions: https://docs.python.org/3/tutorial/errors.html
You could also use contextlib.suppress (https://docs.python.org/3/library/contextlib.html#contextlib.suppress)
Example:
from contextlib import suppress

for item in result["results"]:
    with suppress(KeyError):
        # Code here
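For instance, a short sketch of how suppress might wrap each optional field from the question's output (field names assumed from the snippet above); a missing key just skips that block and the loop keeps going:
from contextlib import suppress

for item in result["results"]:
    print("NPI:", item['number'])
    with suppress(KeyError):
        print("Credential:", item['basic']['credential'])
    with suppress(KeyError):
        for row in item['addresses']:
            print("City:", row['city'])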
Related
I'm scraping a table from a website but encountering empty cells during the process. The try/except block below is screwing up the data at the end. I also don't want to exclude the complete row, as the information is still relevant even when some attribute is missing.
try:
    for i in range(10):
        data = {'ID': IDs[i].get_attribute('textContent'),
                'holder': holder[i].get_attribute('textContent'),
                'view': view[i].get_attribute('textContent'),
                'material': material[i].get_attribute('textContent'),
                'Addons': addOns[i].get_attribute('textContent'),
                'link': link[i].get_attribute('href')}
        list.append(data)
except:
    print('Error')
Any ideas?
What you can do is place all the objects whose attributes you want to access in a dictionary, like this:
objects = {"IDs": IDs, "holder": holder, "view": view, "material": material, ...}
Then you can iterate through this dictionary and, if the specific attribute does not exist, simply store an empty string under that key. Something like this:
the_keys = list(objects.keys())
for i in range(len(objects["IDs"])):  # I assume the ID field will never be empty,
    # so making a for loop like this is better since you iterate only through existing objects
    data = {}
    for j in range(len(objects)):
        try:
            data[the_keys[j]] = objects[the_keys[j]][i].get_attribute('textContent')
        except Exception as e:
            print("Exception: {}".format(e))
            data[the_keys[j]] = ""  # this means we had an exception
            # it is better to catch the specific exception that is thrown
            # when the attribute of the element does not exist, but I don't know what it is
    list.append(data)
I don't know if this code works since I didn't try it but it should give you an overall idea on how to solve your problem.
If you have any questions, doubts, or concerns please ask away.
Edit: To get another object's attribute like the href you can simply include an if statement checking the value of the key. I also realized you can just loop through the objects dictionary getting the keys and values instead of accessing each key and value by an index. You could change the inner loop to be like this:
for key, value in objects.items():
    try:
        if key == "link":
            data[key] = objects[key][i].get_attribute("href")
        else:
            data[key] = objects[key][i].get_attribute("textContent")
    except Exception as e:
        print("Error: ", e)
        data[key] = ""
Edit 2:
data = {}
for i in list(objects.keys()):
    data[i] = []

for key, value in objects.items():
    for i in range(len(objects["IDs"])):
        try:
            if key == "link":
                data[key].append(objects[key][i].get_attribute("href"))
            else:
                data[key].append(objects[key][i].get_attribute("textContent"))
        except Exception as e:
            print("Error: ", e)
            data[key].append("")
Try with this. You won't have to append the data dictionary to the list. Without the original data I won't be able to help much more. I believe this should work.
I'm trying to subtract the week's stats (weeklyDict) from this second's stats (instantDict) so I have accurate weekly progress for every key of instantDict (the keys being members). It works fine, but when a new member joins (adding a key to instantDict that weeklyDict doesn't have yet), things break. So I use try/except and attempt to add the missing member to weeklyDict too, but when I do that using except KeyError as e and str(e), I get a None value. Any idea what to do?
Code:
for member, wins in instantDict.items():
    try:
        instantDict[member] = instantDict[member] - weeklyDict[member]
    except KeyError as e:
        weeklyDict[str(e)] = instantDict.get(str(e))  # error occurs here
        instantDict[member] = instantDict[member] - weeklyDict[member]  # and this then fails
Based on my testing, str(e) returns a string like this:
"'test'"
That is, the key name wrapped in an extra pair of quotes, so .get() is not finding the value. Try something like:
for member, wins in instantDict.items():
    try:
        instantDict[member] = instantDict[member] - weeklyDict[member]
    except KeyError as e:
        weeklyDict[str(e).strip("'")] = instantDict.get(str(e).strip("'"))
        instantDict[member] = instantDict[member] - weeklyDict[member]
That should take the extra string characters off of the keyword, and allow .get() to actually find the value.
Alternatively, if you know it errored because member is not in the dictionary, why pull the same value back out of the exception when you could just use member again?
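A minimal sketch of that alternative (same dictionaries as above), mirroring the question's logic but using member directly instead of parsing the exception text:
for member, wins in instantDict.items():
    try:
        instantDict[member] = instantDict[member] - weeklyDict[member]
    except KeyError:
        # member is exactly the key that was missing, so use it directly
        weeklyDict[member] = instantDict.get(member)
        instantDict[member] = instantDict[member] - weeklyDict[member]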
Maybe it can't fetch the value, so try this:
weeklyDict[str(e)] = instantDict.get(str(e))
Mind you, I'm new to Pandas/Python and I don't know what I'm doing.
I'm working with CSV files and I basically filter currencies.
Every other day, the exported CSV file may or may not contain certain currencies.
I have several such cells of code:
AUDdf = df.loc[df['Currency'] == 'AUD']
AUDtable = pd.pivot_table(AUDdf,index=["Username"],values=["Amount"],aggfunc=np.sum)
AUDtable.loc['AUD Amounts Rejected Grand Total'] = (AUDdf['Amount'].sum())
AUDdesc = AUDdf['Amount'].describe()
When the CSV doesn't contain AUD, I get ValueError: cannot set a frame with no defined columns.
What I'd like to produce is a function or an if statement or a loop that checks if the column contains AUD, and if it does, it runs the above code, and if it doesn't, it simply skips it and proceeds to the next line of code for the next currency.
Any idea how I can accomplish this?
Thanks in advance.
This can be done in 2 ways:
You can use a try/except statement: this will try to process the given currency and, if a ValueError occurs, skip it and move on:
try:
    AUDdf = df.loc[df['Currency'] == 'AUD']
    AUDtable = pd.pivot_table(AUDdf, index=["Username"], values=["Amount"], aggfunc=np.sum)
    AUDtable.loc['AUD Amounts Rejected Grand Total'] = (AUDdf['Amount'].sum())
    AUDdesc = AUDdf['Amount'].describe()
except ValueError:
    pass
You can create an if statement which checks for the currency's presence first:
currency_set = set(list(df['Currency'].values))
if 'AUD' in currency_set:
    AUDdf = df.loc[df['Currency'] == 'AUD']
    AUDtable = pd.pivot_table(AUDdf, index=["Username"], values=["Amount"], aggfunc=np.sum)
    AUDtable.loc['AUD Amounts Rejected Grand Total'] = (AUDdf['Amount'].sum())
    AUDdesc = AUDdf['Amount'].describe()
The worst way to skip over the error/exception:
try:
    <Your Code>
except:
    pass
The above is probably the worst way, because you want to know when an exception occurs. Using generic except statements is bad practice: you want to avoid "catch 'em all" code, catch only the exceptions you know how to handle, know which specific exception occurred, and handle each one on an exception-by-exception basis. Generic except statements lead to missed bugs and tend to mislead you when testing the code.
A slightly better way to handle the exception:
try:
    <Your Code>
except Exception as e:
    <Some code to handle an exception>
Still not optimal, as it is still generic handling.
An average way to handle it for your case:
try:
    <Your Code>
except ValueError:
    <Some code to handle this exception>
Other suggestions, and much better ways to deal with this:
1. Get the set of currencies actually present at run time and aggregate only the ones that are there (see the sketch below).
2. Clean your data set.
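A minimal sketch of the first suggestion, assuming the same df, column names, and per-currency aggregation as in the question; looping over whatever currencies appear avoids hard-coding one block per currency:
import numpy as np
import pandas as pd

# Build one pivot table per currency that actually appears in the data,
# so an absent currency is simply never processed.
tables = {}
for currency in df['Currency'].dropna().unique():
    cdf = df.loc[df['Currency'] == currency]
    table = pd.pivot_table(cdf, index=["Username"], values=["Amount"], aggfunc=np.sum)
    table.loc['{} Amounts Rejected Grand Total'.format(currency)] = cdf['Amount'].sum()
    tables[currency] = table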
You can also use try and except, like this:
try:
    # your code here
except:
    # some print statement
    pass
I have code that is meant to find a graph on a webpage and create a link for web-crawling from it. If a graph is not found, I've put in a try/except to print a message with the corresponding (player) link so the script moves on to the next one.
It's from a football valuation website, and I've reduced the list to two players for debugging: Kylian Mbappé (who has a graph on his page and should pass) and Ansu Fati (who doesn't). Attempting to grab Ansu Fati's graph tag from his profile using BeautifulSoup results in a NoneType error.
The issue is that Mbappé's graph link does get picked up for processing downstream in the code, but the error/link message in the except clause is also printed to the console. This should only be the case for Ansu Fati.
Here's the code
import sys
import requests
from bs4 import BeautifulSoup

final_url_list = ['https://www.transfermarkt.us/kylian-mbappe/profil/spieler/342229', 'https://www.transfermarkt.com/ansu-fati/profil/spieler/466810']

for i in final_url_list:
    try:
        int_page = requests.get(i, headers={'User-Agent': 'Mozilla/5.0'}).text
    except requests.exceptions.Timeout:
        sys.exit(1)
    parsed_int_page = BeautifulSoup(int_page, 'lxml')
    try:
        graph_container = parsed_int_page.find('div', class_='large-7 columns small-12 marktwertentwicklung-graph')
        graph_a = graph_container.find('a')
        graph_link = graph_a.get('href')
        final_url_list.append('https://www.transfermarkt.us' + graph_link)
    except None:
        pass
        print("Graph error:" + i)
I tried using PyCharm's debugging to see how the interpreter is going through the steps and it seems like the whole except clause is skipped, but when I run it in the console, the "Graph error: link" is posted for both. I'm not sure what is wrong with the code for the try/except issue to be behaving this way.
The line
except None:
is looking for an exception with type None, which is impossible.
Try changing that line to
except AttributeError:
Doing so will result in the following output:
Graph error:https://www.transfermarkt.com/ansu-fati/profil/spieler/466810
Graph error:https://www.transfermarkt.us/kylian-mbappe/marktwertverlauf/spieler/342229
There's an additional issue here where you're modifying the list that you're iterating over, which is not only bad practice, but is resulting in the unexpected behavior you're seeing.
Because you're appending to the list you're iterating over, you're going to add an iteration for a url that you don't actually want to be scraping. To fix this, change the first couple of lines in your script to this:
url_list = ['https://www.transfermarkt.us/kylian-mbappe/profil/spieler/342229','https://www.transfermarkt.com/ansu-fati/profil/spieler/466810']
final_url_list = []
for i in url_list:
This way, you're appending the graph links to a different list, and you won't try to scrape links that you shouldn't be scraping. This will put all of the "graph links" into final_url_list
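Putting both fixes together, a hedged sketch of the corrected loop (catching AttributeError and appending graph links to the separate final_url_list, as described above; the rest is taken from the question's code):
import sys
import requests
from bs4 import BeautifulSoup

url_list = ['https://www.transfermarkt.us/kylian-mbappe/profil/spieler/342229', 'https://www.transfermarkt.com/ansu-fati/profil/spieler/466810']
final_url_list = []  # collect graph links here instead of in the list being iterated

for i in url_list:
    try:
        int_page = requests.get(i, headers={'User-Agent': 'Mozilla/5.0'}).text
    except requests.exceptions.Timeout:
        sys.exit(1)
    parsed_int_page = BeautifulSoup(int_page, 'lxml')
    try:
        graph_container = parsed_int_page.find('div', class_='large-7 columns small-12 marktwertentwicklung-graph')
        graph_link = graph_container.find('a').get('href')
        final_url_list.append('https://www.transfermarkt.us' + graph_link)
    except AttributeError:
        print("Graph error:" + i)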
I'm trying to write a script that will go through a list of urls and scrape the web page connected to that url and save the contents to a text file. Unfortunately, a few random urls lead to a page that isn't formatted in the same way and that gets me an IndexError. How do I write a script that will just skip the IndexError and move onto the next URL? I tried the code below but just get syntax errors. Thank you so much in advance for your help.
from bs4 import BeautifulSoup, SoupStrainer
import urllib2
import io
import os
import re

urlfile = open("dailynewsurls.txt", 'r')  # read one line at a time until end of file

for url in urlfile:
    try:
        page = urllib2.urlopen(url)
        pagecontent = page.read()  # get a file-like object at this url
        soup = BeautifulSoup(pagecontent)
        title = soup.find_all('title')
        article = soup.find_all('article')
        title = str(title[0].get_text().encode('utf-8'))
    except IndexError:
        return None
        article = str(article[0].get_text().encode('utf-8'))
    except IndexError:
        return None
    outfile = open(output_files_pathname + new_filename, 'w')
    outfile.write(title)
    outfile.write("\n")
    outfile.write(article)
    outfile.close()
    print "%r added as a text file" % title
print "All done."
The error I get is:
File "dailynews.py", line 39
except IndexError:
^
SyntaxError: invalid syntax
you would do something like:
try:
    # the code that can cause the error
except IndexError:  # catch the error
    pass  # pass will basically ignore it
          # and execution will continue on to whatever comes
          # after the try/except block
If you're in a loop, you could use continue instead of pass.
continue will immediately jump to the next iteration of the loop,
regardless of whether there was more code to execute in the iteration
it jumps from. sys.exit(0) would end the program.
Do the following:
except IndexError:
    pass
And, as suggested by another user, remove the other except IndexError.
When I run your actual program, either the original version or the edited one, in either Python 2.5 or 2.7, the syntax error I get is:
SyntaxError: 'return' outside function
And the meaning of that should be pretty obvious: You can't return from a function if you aren't in a function. If you want to "return" from the entire program, you can do that with exit:
import sys
# ...
except IndexError:
sys.exit()
(Note that you can give a value to exit, but it has to be a small integer, not an arbitrary Python value. Most shells have some way to use that return value, normally expecting 0 to mean success, a positive number to mean an error.)
In your updated version, if you fix that (whether by moving this whole thing into a function and then calling it, or by using exit instead of return) you will get an IndentationError. The lines starting with outfile = … have to be either indented to the same level as the return None above (in which case they're part of the except clause, and will never get run), or dedented back to the same level as the try and except lines (in which case they will always run, unless you've done a continue, return, break, exit, unhandled raise, etc.).
If you fix that, there are no more syntax errors in the code you showed us.
I suspect that your edited code still isn't your real code, and you may have other syntax errors in your real code. One common hard-to-diagnose error is a missing ) (or, less often, ] or }) at the end of a line, which usually causes the next line to report a SyntaxError, often at some odd location like a colon that looks (and would be, without the previous line) perfectly valid. But without seeing your real code (or, better, a real verifiable example), it's impossible to diagnose any further.
That being said, I don't think you want to return (or exit) here at all. You're trying to continue on to the next iteration of the loop. You do that with the continue statement. The return statement breaks out of the loop, and the entire function, which means none of the remaining URLs will ever get processed.
Finally, while it's not illegal, it's pointless to have extra statements after a return, continue, etc., because those statements can never get run. And similarly, while it's not illegal to have two except clauses with the same exception, it's pointless; the second one can only run in the case where the exception isn't an IndexError but is an IndexError, which means never.
I suspect you may have wanted a separate try/except around each of the two indexing statements, instead of one around the entire loop. While that isn't at all necessary here, it can sometimes make things clearer. If that's what you're going for, you want to write it like this:
page = urllib2.urlopen(url)
pagecontent = page.read()  # get a file-like object at this url
soup = BeautifulSoup(pagecontent)
title = soup.find_all('title')
article = soup.find_all('article')
try:
    title = str(title[0].get_text().encode('utf-8'))
except IndexError:
    continue
try:
    article = str(article[0].get_text().encode('utf-8'))
except IndexError:
    continue
outfile = open(output_files_pathname + new_filename, 'w')
outfile.write(title)
outfile.write("\n")
outfile.write(article)
outfile.close()
print "%r added as a text file" % title
You can't "return" here:
except IndexError:
    return None
    article = str(article[0].get_text().encode('utf-8'))
because this is not inside a function.
Use "pass", "break", or "continue" instead.
EDIT
Try this:
try:
    page = urllib2.urlopen(url)
    pagecontent = page.read()  # get a file-like object at this url
    soup = BeautifulSoup(pagecontent)
    title = soup.find_all('title')
    article = soup.find_all('article')
    title = str(title[0].get_text().encode('utf-8'))
except IndexError:
    try:
        article = str(article[0].get_text().encode('utf-8'))
    except IndexError:
        continue