Get XML from xml.parsers.expat.ExpatError - python

I basically have the following code that is pinging a website and trying to process xml that it returns.
def listen_once(self, seen):
    data = getDataFromWeb()
    xml = xmltodict.parse(data.text)['root']
    self.foo(xml)

def listen_once_safe(self, seen):
    ''' One loop of the main loop with error handling. '''
    try:
        return self.listen_once(seen)
    except xml.parsers.expat.ExpatError as exc:
        frameinfo = getframeinfo(currentframe())
        print(frameinfo.lineno, exc)
I get ExpatErrors somewhat frequently, but I'm not sure how to debug it. Is there a way for me to find what data.text was from within the except block?
Edit:
I ended up solving my problem by just putting the debug code in listen_once, but that's not a real answer, so I would still like one.
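For reference, a minimal sketch of that workaround, assuming getDataFromWeb and self.foo behave as in the code above: catch the parse error inside listen_once, where data is still in scope, log the offending data.text, and re-raise so the outer handler keeps working.

import logging
import xml.parsers.expat
import xmltodict

def listen_once(self, seen):
    data = getDataFromWeb()
    try:
        parsed = xmltodict.parse(data.text)['root']
    except xml.parsers.expat.ExpatError:
        # data is still in scope here, so the raw payload can be inspected
        logging.exception("Could not parse response: %r", data.text)
        raise  # re-raise so listen_once_safe still sees the error as before
    self.foo(parsed)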

Related

Python: How to properly handle an exception using 'ask for forgiveness' (try–except) approach?

I encountered a problem with my custom exceptions: they exit the process (the interpreter displays a traceback) instead of being properly handled in the code. Since I do not have much experience working with custom exceptions, and since exceptions imported from modules work properly in the same code, I presume I made some mistake while defining my exceptions, but I cannot find proper documentation to fix it myself.
Here is a sample code.
It is supposed to check whether the XML path input by the user works (by "work" I mean it returns the value contained in that XML element node). If it does not work, it raises an XMLPrefixMissing exception (because a namespace prefix is possibly missing from the XML path) and then tries an XML path with a wildcard operator in place of a namespace prefix. If that does not work either, it raises XMLElementNotFound (because the element is possibly not in the XML file at all).
import xml.etree.ElementTree as ElementTree

class Error(Exception):
    """Error base class"""
    pass

class XMLPrefixMissing(Error):
    """Error for when an element is not found on an XML path"""
    def __init__(self,
                 message='No element found on an XML path. Possibly missing namespace prefix.'):
        self.message = message
        super(Error, self).__init__(message)

class XMLElementNotFound(Error):
    """Error for when an element value on an XML path is an empty string"""
    def __init__(self, message='No element found on an XML path.'):
        self.message = message
        super(Error, self).__init__(message)

# Some code

file = '.\folder\example_file.xml'
xml_path = './DataArea/Order/Item/Description/ItemName'
xml_path_with_wildcard = './{*}DataArea/{*}Order/{*}Item/{*}Description/{*}ItemName'
namespaces = {'': 'http://firstnamespace.example.com/', 'foo': 'http://secondnamespace.example.com/'}

def xml_parser(file, xml_path, xml_path_with_wildcard, namespaces):
    tree = ElementTree.parse(file)
    root = tree.getroot()
    try:
        if root.find(xml_path, namespaces=namespaces) is None:
            raise XMLElementNotFound
        # Some code
    except XMLPrefixMissing:
        if root.find(xml_path_with_wildcard, namespaces=namespaces) is None:
            raise XMLElementValueEmpty
        # Some code
    except XMLElementNotFound as e:
        print(e)
From experience, the except clauses of a try block behave somewhat like an if/elif chain: they are checked in order against the exception raised inside the try block.
It looks to me like you're raising an exception in one except block in the hope that it will be caught by the next except block, but an exception raised inside an except clause propagates outward and is never handled by the sibling except clauses of the same try.
Instead, you should raise all of the exceptions inside the try block itself, and just have different except blocks to catch the different exceptions:
try:
    if root.find(xml_path, namespaces=namespaces) is None:
        raise XMLElementNotFound
    elif root.find(xml_path_with_wildcard, namespaces=namespaces) is None:
        raise XMLElementValueEmpty
except XMLPrefixMissing:
    pass  # some code
except XMLElementValueEmpty as e:
    print(e)
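As a side note, here is a minimal, self-contained sketch (separate from the code above) of why the original structure cannot work: an exception raised inside an except clause propagates out of the whole try statement instead of being checked against the remaining except clauses, so you would need a nested try to handle it locally.

class FirstError(Exception):
    pass

class SecondError(Exception):
    pass

try:
    raise FirstError("caught by the first handler")
except FirstError:
    # Raising here leaves the whole try statement; the SecondError handler
    # below belongs to the same try and is never consulted for this exception.
    raise SecondError("escapes this try statement entirely")
except SecondError as exc:
    print(exc)  # never reached; the SecondError ends up unhandled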

Python - Instaloader ProfileNotExistsException

I am new to Instaloader and I am running into a problem when trying to pull in bio information. We have scraped Google for a list of Instagram handles for our accounts; unfortunately the data isn't perfect, and some of the handles we have pulled in are no longer active (the user has changed the profile handle or deleted the account). This causes the ProfileNotExistsException error to come up and stops the pull for all subsequent accounts.
Is there a way to ignore this and continue pulling in the rest of the bios while just leaving this one blank?
Here is the code that is throwing me the error. handles is the list of handles we have.
bios = []
for element in handles:
    if element == '':
        bios.append('NULL')
    else:
        bios.append(instaloader.Profile.from_username(L.context, element).biography)
I have tried using the workaround found in this forum (can't find the post) but it is not working for me. No errors, it just doesn't solve the issue. The code they suggested was:
def _obtain_metadata(self):
    try:
        if self._rhx_gis == None:
            metadata = self._context.get_json('{}/'.format(self.username), params={})
            self._node = metadata['entry_data']['ProfilePage'][0]['graphql']['user']
            self._rhx_gis = metadata['rhx_gis']
        metadata = self._context.get_json('{}/'.format(self.username), params={})
        self._node = metadata['entry_data']['ProfilePage'][0]['graphql']['user']
    except (QueryReturnedNotFoundException, KeyError) as err:
        raise ProfileNotExistsException('Profile {} does not exist.'.format(self.username)) from err
Thanks in advance!
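A minimal sketch of one way to skip the missing profiles and keep the loop going, assuming L is an existing instaloader.Instaloader instance as in the code above and that ProfileNotExistsException is exposed by the instaloader package:

import instaloader

L = instaloader.Instaloader()
handles = ['instagram', '', 'some_deleted_handle']  # placeholder list; the real one is scraped elsewhere

bios = []
for element in handles:
    if element == '':
        bios.append('NULL')
        continue
    try:
        bios.append(instaloader.Profile.from_username(L.context, element).biography)
    except instaloader.exceptions.ProfileNotExistsException:
        # Handle was renamed or the account was deleted; leave the bio blank and move on.
        bios.append('NULL')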

Who/how gets control of the program after an exception has occurred

I have always wondered who takes control of the program after an exception has been thrown. I was looking for a clear answer but did not find one. I have the following functions; each one executes an API call that involves a network request, so I need to handle any possible errors with a try/except and possibly an else block (JSON responses must be parsed/decoded as well):
# This function runs first, if this fails, none of the other functions will run. Should return a JSON.
def get_summary():
    pass

# Gets executed after get_summary. Should return a string.
def get_block_hash():
    pass

# Gets executed after get_block_hash. Should return a JSON.
def get_block():
    pass

# Gets executed after get_block. Should return a JSON.
def get_raw_transaction():
    pass
I wish to implement a kind of retry functionality on each function, so if it fails due to a timeout error, connection error, JSON decode error etc., it will keep retrying without compromising the flow of the program:
import logging
import requests

def get_summary():
    try:
        response = requests.get(API_URL_SUMMARY)
    except requests.exceptions.RequestException as error:
        logging.warning("...")
        #
    else:
        # Once response has been received, JSON should be
        # decoded here wrapped in a try/catch/else
        # or outside of this block?
        return response.text

def get_block_hash():
    try:
        response = requests.get(API_URL + "...")
    except requests.exceptions.RequestException as error:
        logging.warning("...")
        #
    else:
        return response.text

def get_block():
    try:
        response = requests.get(API_URL + "...")
    except requests.exceptions.RequestException as error:
        logging.warning("...")
        #
    else:
        #
        #
        #
        return response.text

def get_raw_transaction():
    try:
        response = requests.get(API_URL + "...")
    except requests.exceptions.RequestException as error:
        logging.warning("...")
        #
    else:
        #
        #
        #
        return response.text

if __name__ == "__main__":
    # summary = get_summary()
    # block_hash = get_block_hash()
    # block = get_block()
    # raw_transaction = get_raw_transaction()
    # ...
    pass
I want to keep the outermost part (the block after if __name__ == "__main__":) clean; I mean, I don't want to fill it with confusing try/except blocks, logging, etc.
I tried having a function call itself when an exception was thrown inside it, but then I read about the recursion limit and decided it was a bad idea; there should be a better way to handle this.
requests already retries by itself N times when I call the get method, where N is a constant in the source code (it is 100). But when the number of retries reaches 0 it will throw an error that I need to catch.
Where should I decode the JSON response? Inside each function, wrapped in another try/except/else block, or in the main block? How can I recover from an exception and keep retrying the function that failed?
Any advice will be appreciated.
You could keep those in an infinite loop (to avoid recursion) and once you get the expected response just return:
def get_summary():
    while True:
        try:
            response = requests.get(API_URL_SUMMARY)
        except requests.exceptions.RequestException as error:
            logging.warning("...")
            #
        else:
            # As winklerrr points out, try to return the transformed data as soon
            # as possible, so you should be decoding the JSON response here.
            try:
                json_response = json.loads(response.text)
            except ValueError as error:  # ValueError will catch any error when decoding the response
                logging.warning(error)
            else:
                return json_response
This function keeps executing until it receives the expected result (it reaches return json_response); otherwise it keeps trying again and again.
You can do the following
def my_function(iteration_number=1):
    try:
        response = requests.get(API_URL_SUMMARY)
    except requests.exceptions.RequestException:
        if iteration_number < iteration_threshold:
            return my_function(iteration_number + 1)
        else:
            raise
    except Exception:  # for all other exceptions, raise
        raise
    return json.loads(response.text)

my_function()
Where should I decode JSON response?
Inside each function and wrapped by another try/catch/else block or in the main block?
As a rule of thumb: try to transform data as soon as possible into the format you want it to be in. It makes the rest of your code easier if you don't have to extract everything from a response object again all the time. So just return the data you need, in the easiest format you need it to be in.
In your scenario: you call the API in every function with the same call to requests.get(). Normally all the responses from an API have the same format, so you could write an extra function that makes that API call for you and directly loads the response into a proper JSON object.
Tip: for working with JSON, make use of the standard library's json module (import json).
Example:
import json

def call_api(api_sub_path):
    response = requests.get(API_BASE_URL + api_sub_path)
    json_response = json.loads(response.text)

    # you could verify your result here already, e.g.
    if json_response["result_status"] == "successful":
        return json_response["result"]

    # or maybe throw an exception here, depends on your use case
    return json_response["some_other_value"]
How can I recover from an exception and keep trying on the function it failed?
You could use a while loop for that:
def main(retries=100):  # default value if no value is given
    result = functions_that_could_fail(retries)
    if result:
        logging.info("Finished successfully")
        functions_that_depend_on_result_from_before(result)
    else:
        logging.info("Finished without result")

def functions_that_could_fail(retry):
    while retry:  # is True as long as retry is bigger than 0
        try:
            # call all functions here so you just have to write one try-except block
            summary = get_summary()
            block_hash = get_block_hash()
            block = get_block()
            raw_transaction = get_raw_transaction()
        except Exception:
            retry -= 1
            if retry:
                logging.warning("Failed, but trying again...")
        else:
            # else gets only executed when no exception was raised in the try block
            logging.info("Success")
            return summary, block_hash, block, raw_transaction
    logging.error("Failed - won't try again.")
    return None

def functions_that_depend_on_result_from_before(result):
    # use result here ...
    pass
So with the code from above you (and maybe also some other people who use your code) could start your program with:
if __name__ == "__main__":
    main()

    # or when you want to change the number of retries
    main(retries=50)

Python: Break-up large function into segments

I am creating a bot for Reddit. I currently have only one very large function, and I am looking to create sub-functions to make it more readable.
Here is a rough breakdown of what it does:
def replybot():
    submissions = reversed(list(subreddit.get_new(limit=MAXPOSTS)))
    for post in submissions:
        try:
            author = post.author.name
        except AttributeError:
            print "AttributeError: Author is deleted"
            continue  # Author is deleted. We don't care about this post.

        # DOES PID EXIST IN DB? IF NOT ADD IT
        cur.execute('SELECT * FROM oldposts WHERE ID=?', [pid])
        sql.commit()
        if cur.fetchone():  # Post is already in the database
            continue
        cur.execute('INSERT INTO oldposts VALUES(?)', [pid])
        sql.commit()
        ...
I am looking to break the code up into segments i.e. put
try:
    author = post.author.name
except AttributeError:
    print "AttributeError: Author is deleted"
    continue  # Author is deleted. We don't care about this post.
in its own function and call it from within replybot(), but I run into the issue of calling continue: I get SyntaxError: 'continue' not properly in loop.
Is there a way for me to do this?
If you take the inner part of a loop and convert it to its own function, it's no longer in a loop. The equivalent of continue in a loop, for a function, is return (i.e. terminate this iteration (which is now a function call) early).
Raise the error again instead of trying to continue. Either simply let it bubble to the main loop, or if you need better error handling, make your own custom error. For instance:
class MyNotFatalError(Exception):
    pass

def do_something():
    try:
        a, b, c = 1, 2
    except ValueError:
        raise MyNotFatalError('Something went wrong')

# In your main function
for post in submissions:
    try:
        do_something()
        do_some_other_thing()
        do_some_more()
    except MyNotFatalError as err:
        continue  # we could print out the error text here
    do_some_last_thing()
It is probably better that way because you only catch errors you know you want to catch, and still let the program crash when there are actual bugs.
If you had simply caught ValueError that would also intercept and hide all other possible sources of the same kind of error.
As Claudiu said, when you move the inner commands into their own function, they are no longer in the loop, and your code will look like this:
def isNotAuthorDeleted(post):
    try:
        author = post.author.name
        return author
    except AttributeError:
        print "AttributeError: Author is deleted"
        return False
and your loop will be:
for post in submissions:
    if not isNotAuthorDeleted(post):
        continue

dealing with empty url breaking xml parsing loop

I am writing code to parse through a bunch of XML files. It basically looks like this:
for i in range(0, 20855):
    urlb = str(i)
    url = urla + urlb
    trys = 0
    t = 0
    while (trys < 3):
        try:
            cfile = UR.urlopen(url)
            trys = 3
        except urllib.error.HTTPError as e:
            t = t + 1
            print('error at ' + str(time.time() - tstart) + ' seconds')
            print('typeID = ' + str(i))
            print(e.code)
            print(e.read())
            time.sleep(0.1)
            trys = 0 + t
    tree = ET.parse(cfile)  ## parse xml file
    root = tree.getroot()
    # ...do a bunch of stuff with i and the file data
I'm having a problem because some of the URLs I'm calling don't actually contain an XML file, which breaks my code. I have a list of all the actual numbers that I use instead of the range shown, but I really don't want to go through all 21,000 and remove each number that fails. Is there an easier way to get around this? I get an error from the while loop (which I have there to deal with timeouts, really) that looks like this:
b'A non-marketable type was given'
error at 4.321678161621094 seconds
typeID = 31
400
So I was thinking there has to be a good way to bail out of that iteration of the for loop if my while loop returns three errors, but I can't use break. Maybe an if/else block under the while loop that just passes if the t variable is 3?
You might try this:
for i in range(0, 20855):
    url = '%s%d' % (urla, i)
    for trys in range(3):
        try:
            cfile = UR.urlopen(url)
            break
        except urllib.error.HTTPError as e:
            print('error at %s seconds' % (time.time() - tstart))
            print('typeID = %i' % i)
            print(e.code)
            print(e.read())
            time.sleep(0.1)
    else:
        print("retry failed 3 times")
        continue
    try:
        tree = ET.parse(cfile)  ## parse xml file
    except Exception as e:
        print("cannot read xml")
        print(e)
        continue
    root = tree.getroot()
    # ...do a bunch of stuff with i and the file data
Regarding your "algorithmic" problem: You can always set an error state (as simple as e.g. last_iteration_successful = False) in the while body, then break out of the while body, then check the error state in the for body, and conditionally break out of the for body, too.
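A minimal sketch of that error-state pattern, keeping the question's while loop for the retries; urla is assumed to be the base URL defined elsewhere in the question's code, and the placeholder value below is just for illustration:

import time
import urllib.error
import urllib.request as UR

urla = 'http://example.com/item/'  # placeholder; the real base URL comes from the question

for i in range(0, 20855):
    url = urla + str(i)
    last_iteration_successful = False
    trys = 0
    while trys < 3:
        try:
            cfile = UR.urlopen(url)
            last_iteration_successful = True
            break  # got the file, stop retrying
        except urllib.error.HTTPError as e:
            trys += 1
            print('attempt %d failed with %d' % (trys, e.code))
            time.sleep(0.1)
    if not last_iteration_successful:
        continue  # error state set: skip this i and move on to the next one
    # ...parse cfile and do a bunch of stuff with i and the file data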
Regarding architecture: prepare your code for all relevant errors that might occur, via proper exception handling with try/except blocks. It might also make sense to define custom exception types and then raise them manually. Raising an exception immediately interrupts the current control flow, so it can save many breaks.
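And a minimal sketch of that custom-exception idea, using a hypothetical SkipThisUrl exception and fetch_with_retries helper (neither is from the question) to abandon one iteration from anywhere inside the nested code:

import time
import urllib.error
import urllib.request as UR

urla = 'http://example.com/item/'  # placeholder base URL, as in the sketch above

class SkipThisUrl(Exception):
    """Raised when a URL should be abandoned after repeated failures."""

def fetch_with_retries(url, attempts=3):
    for _ in range(attempts):
        try:
            return UR.urlopen(url)
        except urllib.error.HTTPError:
            time.sleep(0.1)
    # Raising here immediately unwinds back to the caller,
    # so no break is needed at any nesting level in between.
    raise SkipThisUrl(url)

for i in range(0, 20855):
    try:
        cfile = fetch_with_retries(urla + str(i))
    except SkipThisUrl as exc:
        print('giving up on %s' % exc)
        continue
    # ...parse cfile and do a bunch of stuff with i and the file data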
