dealing with empty url breaking xml parsing loop - python

I am writing code to parse a bunch of XML files. It basically looks like this:
for i in range(0, 20855):
    urlb = str(i)
    url = urla + urlb
    trys=0
    t=0
    while (trys < 3):
        try:
            cfile = UR.urlopen(url)
            trys = 3
        except urllib.error.HTTPError as e:
            t=t+1
            print('error at '+str(time.time()-tstart)+' seconds')
            print('typeID = '+str(i))
            print(e.code)
            print(e.read())
            time.sleep (0.1)
            trys=0+t
    tree = ET.parse(cfile) ##parse xml file
    root = tree.getroot()
    ...do a bunch of stuff with i and the file data
My problem is that some of the URLs I'm calling don't actually contain an XML file, which breaks my code. I have a list of all the actual numbers that I use instead of the range shown, but I really don't want to go through all 21000 and remove each number that fails. Is there an easier way to get around this? I get an error from the while loop (which I need in order to deal with timeouts) that looks like this:
b'A non-marketable type was given'
error at 4.321678161621094 seconds
typeID = 31
400
So I was thinking there has to be a good way to bail out of that iteration of the for loop if my while loop hits three errors, but I can't use break. Maybe an if/else under the while loop that just passes if the t variable is 3?

You might try this:
for i in range(0, 20855):
    url = '%s%d' % (urla, i)
    for trys in range(3):
        try:
            cfile = UR.urlopen(url)
            break
        except urllib.error.HTTPError as e:
            print('error at %s seconds' % (time.time()-tstart))
            print('typeID = %i' % i)
            print(e.code)
            print(e.read())
            time.sleep(0.1)
    else:
        # the for loop finished without hitting break: all 3 tries failed
        print("retry failed 3 times")
        continue
    try:
        tree = ET.parse(cfile) ##parse xml file
    except Exception as e:
        print("cannot read xml")
        print(e)
        continue
    root = tree.getroot()
    ...do a bunch of stuff with i and the file data

Regarding your "algorithmic" problem: You can always set an error state (as simple as e.g. last_iteration_successful = False) in the while body, then break out of the while body, then check the error state in the for body, and conditionally skip the rest of that for iteration, too.
Regarding architecture: Prepare your code for all relevant errors that might occur, via proper exception handling with try/except blocks. It might also make sense to define custom Exception types and raise them manually. Raising an exception immediately interrupts the current control flow, which can save many break statements.
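The custom-exception idea can be sketched roughly as follows. This is only an illustration under assumptions: RetriesExhausted, fetch_with_retries, process_all and fake_fetch are hypothetical names that do not appear in the question, and fake_fetch stands in for UR.urlopen so the sketch runs without a network.

```python
import time


class RetriesExhausted(Exception):
    """Hypothetical custom exception: raised when every attempt has failed."""


def fetch_with_retries(fetch, item, attempts=3, delay=0.0):
    """Call fetch(item) up to `attempts` times; raise RetriesExhausted if all fail."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch(item)
        except OSError as error:  # urllib.error.HTTPError is a subclass of OSError
            last_error = error
            time.sleep(delay)
    raise RetriesExhausted(f"{item!r} failed {attempts} times") from last_error


def process_all(items, fetch):
    """Process every item, skipping (and recording) the ones that never succeed."""
    results, skipped = {}, []
    for item in items:
        try:
            results[item] = fetch_with_retries(fetch, item)
        except RetriesExhausted:
            skipped.append(item)  # give up on this item only
            continue
    return results, skipped


def fake_fetch(item):
    # Stand-in for UR.urlopen(url): item 31 always fails, like the
    # non-marketable type in the question.
    if item == 31:
        raise OSError("A non-marketable type was given")
    return f"<xml id='{item}'/>"


results, skipped = process_all([30, 31, 32], fake_fetch)
print(skipped)
```

Raising from the helper keeps the outer loop free of counters and flags: the for body only has to decide what to do once an item has definitely failed.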

Related

Get XML from xml.parsers.expat.ExpatError

I basically have the following code that is pinging a website and trying to process xml that it returns.
def listen_once(self, seen):
    data = getDataFromWeb()
    xml = xmltodict.parse(data.text)['root']
    self.foo(xml)
def listen_once_safe(self, seen):
    ''' One loop of the main loop with error handling. '''
    try:
        return self.listen_once(seen)
    except xml.parsers.expat.ExpatError as exc:
        frameinfo = getframeinfo(currentframe())
        print(frameinfo.lineno, exc)
I get ExpatErrors somewhat frequently, but I'm not sure how to debug it. Is there a way for me to find what data.text was from within the except block?
Edit:
I ended up solving my problem by just putting the debug code in listen_once, but that's not a real answer so I would still like one.
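One way to keep the offending text available is to catch the error in the same scope that holds the text, which is essentially what moving the debug code into listen_once does. The sketch below is a hedged illustration, not the asker's code: parse_or_report is a hypothetical helper, and it uses the standard library's xml.dom.minidom in place of xmltodict, on the assumption that both surface the same ExpatError.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def parse_or_report(text):
    """Parse text as XML; on failure, report the offending input and return None."""
    try:
        return xml.dom.minidom.parseString(text)
    except ExpatError as exc:
        # `text` is still in scope here, so it can be logged next to the error.
        # ExpatError carries lineno and offset attributes locating the failure.
        print(f"ExpatError at line {exc.lineno}, column {exc.offset}: {exc}")
        print(f"offending input: {text!r}")
        return None


good = parse_or_report("<root><item/></root>")
bad = parse_or_report("<root><item></root>")
```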

Restart script when error happens and continue with right line in csv file

Hi guys, this is my first time asking for help here; I hope you can help me.
I have this code that I wrote:
def my_function():
    try :
        with open('file.csv', 'r') as f:
            data = list(csv.reader(f, delimiter=','))
        i = 1
        while i <= 10:
            i += 1
            fname = data[i][0]
            lname = data[i][1]
            options = Options()
            driver = webdriver.Chrome(options=options)
            driver.get("https://www.test.net/")
            #Do stuff
    except Exception as e:
        print(e)
        driver.quit()
        time.sleep(1)
        print('******RESTART******')
        my_function()
my_function()
I'm trying to make this script run without stopping. The problem I'm facing is that when it stops, for example at line number 8 (i = 8), and restarts, it starts again from the first line (i = 1).
I want the script to restart from line 8 and continue to 9, 10, and so on.
Can you please guide me to the right solution? Thank you.
Your code is making this far more difficult than it needs to be.
First, you almost certainly don't want to wrap this entire block of code in a "catch all" exception handler. You want your exception handling to be sufficiently specific (limited) that you can do something meaningful with the exception. For example:
#!python
# Assumes Python version 3 or later
import sys, csv
filename='myfile.csv'
with open(filename, newline='') as f:
    try:
        reader = csv.reader(f)
        for record in reader:
            if len(record) != 2:
                # log error and continue
                print('Malformed record in {}: line {}'.format(filename, reader.line_num), file=sys.stderr)
                continue
            # do stuff with this record, knowing it has exactly two fields:
            fname = record[0]
            lname = record[1]
            # etc ...
    except csv.Error as e:
        print('Error handling {} at line {}: {}'.format(filename, reader.line_num, e), file=sys.stderr)
Note that your errors probably weren't specifically in the csv module. It's pretty tolerant of malformed lines. But I'm showing how to wrap the reader and processing code within exception handling just for that. Your error was probably an IndexError (trying to access an item past the number of items in a list, outside of its valid indexing range). It's better to just check the length of each record rather than use exception handling for that, though it's possible either way.
There's a quite reasonable example (very similar code) in the documentation for the standard libraries: https://docs.python.org/3/library/csv.html
Also, stylistically, I'd suggest a named tuple or a lightweight class (using __slots__) for managing these records. This would allow you to use dot notation to access the .fname and .lname of each rather than using [x] and numeric indexing. (Numeric indexing gets progressively more cumbersome and error prone as your code complexity increases.)
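A minimal sketch of the named-tuple suggestion, assuming a two-column file of first and last names. Person and the in-memory sample data are made up for illustration; with a real file you would pass the open file object to csv.reader instead.

```python
import csv
import io
from collections import namedtuple

# Hypothetical record type for the two-column file described above.
Person = namedtuple("Person", ["fname", "lname"])

# In-memory stand-in for the CSV file so the sketch runs on its own.
sample = io.StringIO("Ada,Lovelace\nmalformed\nGrace,Hopper\n")

people = []
for record in csv.reader(sample):
    if len(record) != len(Person._fields):
        continue  # skip malformed rows instead of raising IndexError
    people.append(Person(*record))

for person in people:
    print(person.fname, person.lname)
```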
You can make i a keyword argument with a default of 1, then on each exception pass the current i when you restart your function, so it picks up from there.
This is a simplified example of what I'm recommending, following the same general method you are using in your question (but with fake data so I can run it without having your CSV file).
def my_function(i=1):
    try:
        if i == 4: # to prevent forever recursion
            return
        else:
            print(i) # keep track of loops
            i += 1
            x = int('te') # causes an error
    except ValueError:
        my_function(i) # send current i back through
my_function(i=0)
Thank you for your quick response. I tried the solution provided by Kevin Welch and Selcuk and it works fine for me, thanks.
Here is the solution:
def my_function():
    with open('file.csv', 'r') as f:
        data = list(csv.reader(f, delimiter=','))
    i = 1
    while i <= 10:
        i += 1
        try :
            fname = data[i][0]
            lname = data[i][1]
            options = Options()
            driver = webdriver.Chrome(options=options)
            driver.get("https://www.test.net/")
            # Do stuff
        except Exception as e:
            # only this row failed: clean up and move on to the next one
            print(e)
            driver.quit()
            time.sleep(1)
            print('******RESTART******')
            continue
my_function()

How to skip one part of a single loop iteration in Python

I am creating about 200 variables within a single iteration of a python loop (extracting fields from excel documents and pushing them to a SQL database) and I am trying to figure something out.
Let's say that a single iteration is a single Excel workbook that I am looping through in a directory. I am extracting around 200 fields from each workbook.
Suppose one of the fields I extract (let's say field #56 out of 200) isn't in the proper format (let's say the date was filled out wrong, e.g. 9/31/2015, which isn't a real date), so the operation I am performing on it raises an error.
I want the loop to skip that variable and proceed to creating variable #57. I don't want the loop to move on to the next iteration or workbook; I just want it to ignore the error on that variable and continue with the rest of the variables for that single loop iteration.
How would I go about doing something like this?
In this sample code I would like to continue extracting "PolicyState" even if ExpirationDate has an error.
Some sample code:
import datetime as dt
import os as os
import xlrd as rd
files = os.listdir(path)
for file in files: #Loop through all files in path directory
    filename = os.fsdecode(file)
    if filename.startswith('~'):
        continue
    elif filename.endswith( ('.xlsx', '.xlsm') ):
        try:
            book = rd.open_workbook(os.path.join(path,file))
        except KeyError:
            print ("Error opening file for "+ file)
            continue
        SoldModelInfo=book.sheet_by_name("SoldModelInfo")
        AccountName=str(SoldModelInfo.cell(1,5).value)
        ExpirationDate=dt.datetime.strftime(xldate_to_datetime(SoldModelInfo.cell(1,7).value),'%Y-%m-%d')
        PolicyState=str(SoldModelInfo.cell(1,6).value)
        print("Insert data of " + file +" was successful")
    else:
        continue
Use multiple try blocks. Wrap each decode operation that might go wrong in its own try block to catch the exception, do something, and carry on with the next one.
try:
    book = rd.open_workbook(os.path.join(path,file))
except KeyError:
    print ("Error opening file for "+ file)
    continue
errors = []
SoldModelInfo=book.sheet_by_name("SoldModelInfo")
AccountName=str(SoldModelInfo.cell(1,5).value)
try:
    ExpirationDate=dt.datetime.strftime(xldate_to_datetime(SoldModelInfo.cell(1,7).value),'%Y-%m-%d')
except WhateverError as e:
    # do something, maybe set a default date?
    ExpirationDate = default_date
    # and/or record that it went wrong?
    errors.append( [ "ExpirationDate", e ])
PolicyState=str(SoldModelInfo.cell(1,6).value)
...
# at the end
if not errors:
    print("Insert data of " + file +" was successful")
else:
    # things went wrong somewhere above.
    # the contents of errors will let you work out what
    pass
As suggested, you could use a separate try block for each variable you extract, or you could streamline it with your own custom function that handles the try for you:
from functools import reduce
from operator import methodcaller
def try_funcs(cell, default, funcs):
    try:
        # apply each function in funcs, in order, to the running value
        return reduce(lambda val, func: func(val), funcs, cell)
    except Exception as e:
        # do something with your Exception if necessary, like logging.
        return default
# Usage (funcs must be a list, even for a single function):
AccountName = try_funcs(SoldModelInfo.cell(1,5).value, "some default str value", [str])
ExpirationDate = try_funcs(SoldModelInfo.cell(1,7).value, "some default date", [xldate_to_datetime, methodcaller('strftime', '%Y-%m-%d')])
PolicyState = try_funcs(SoldModelInfo.cell(1,6).value, "some default str value", [str])
Here we use reduce to apply several functions in sequence, and methodcaller to freeze a method call together with its arguments.
This keeps your code tidy instead of cluttering it with lots of try blocks. But the better, more explicit way is to individually handle just the fields you anticipate might error out.
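To make the idea concrete without a workbook, here is a rough, self-contained sketch. It is an illustration under assumptions: parse_us_date is a hypothetical stand-in for xldate_to_datetime, the errors and label parameters are additions not present in the answer, and plain strings replace the Excel cell values.

```python
import datetime as dt
from functools import reduce
from operator import methodcaller


def try_funcs(value, default, funcs, errors=None, label=None):
    """Apply funcs to value in order; on any failure record it and return default."""
    try:
        return reduce(lambda val, func: func(val), funcs, value)
    except Exception as exc:
        if errors is not None:
            errors.append((label, exc))  # remember which field failed and why
        return default


def parse_us_date(text):
    # Hypothetical converter standing in for xldate_to_datetime.
    return dt.datetime.strptime(text, "%m/%d/%Y")


errors = []
to_iso = methodcaller("strftime", "%Y-%m-%d")

good_date = try_funcs("9/30/2015", None, [parse_us_date, to_iso], errors, "ExpirationDate")
bad_date = try_funcs("9/31/2015", None, [parse_us_date, to_iso], errors, "ExpirationDate")
state = try_funcs("NY", "", [str], errors, "PolicyState")
```

The invalid date falls back to its default while the fields after it are still extracted, and the errors list records which field failed so the problem is not silently lost.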
So, basically you need to wrap your xldate_to_datetime() call into try ... except
import datetime as dt
v = SoldModelInfo.cell(1,7).value
try:
    d = dt.datetime.strftime(xldate_to_datetime(v), '%Y-%m-%d')
except TypeError as e:
    print('Could not parse "{}": {}'.format(v, e))

Use API to write to json file

I am facing a problem when I loop over the tweet_id values using the API and write the results to tweet_json.txt: the output for every ID is Failed, which I know is wrong.
It was working before, but when I ran all the code again it started to show Failed.
for tweet_id in df['tweet_id']:
    try:
        tweet = api.get_status(tweet_id, tweet_mode = 'extended')
        with open('tweet_json.txt', 'a+') as file:
            json.dump(tweet._json, file)
            file.write('\n')
        print (tweet_id, 'success')
    except:
        print (tweet_id, 'Failed')
Your except is swallowing whatever exception is causing your code to die. Until you comment out the except or make it more specific you won't know if your problem is the Twitter API or file I/O or something else. Good luck!
A quick step forward would be to adjust your exception handler so that it writes out the exception. I like to use the format_exc function to get my stack traces so I can write them with a logger, or however I want to handle them.
from traceback import format_exc
try:
    a = "" + 1
except Exception as ex:
    print("Exception encountered! \n %s " % format_exc())
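Applied to a loop like the one in the question, the same idea might look like the sketch below. It uses logging.exception from the standard library, which logs the message together with the traceback. The fetch function is a hypothetical stand-in for api.get_status, so nothing here depends on the Twitter API.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tweets")


def fetch(tweet_id):
    # Hypothetical stand-in for api.get_status: odd IDs fail.
    if tweet_id % 2:
        raise ConnectionError(f"could not fetch {tweet_id}")
    return {"id": tweet_id}


failed = {}
for tweet_id in [1, 2, 3, 4]:
    try:
        tweet = fetch(tweet_id)
    except Exception as exc:
        # logging.exception logs the message plus the full traceback.
        log.exception("%s failed", tweet_id)
        failed[tweet_id] = repr(exc)
    else:
        log.info("%s success", tweet_id)
```

Keeping the exception text next to each failed ID shows at a glance whether every failure has the same cause.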

Who/How to get the control of the program after an exception has occurred

I have always wondered who takes control of the program after an exception has been thrown. I looked for a clear answer but did not find one. I have the following functions, each of which executes an API call that involves a network request, so I need to handle any possible errors with a try/except and possibly an else block (JSON responses must be parsed/decoded as well):
# This function runs first, if this fails, none of the other functions will run. Should return a JSON.
def get_summary():
    pass
# Gets executed after get_summary. Should return a string.
def get_block_hash():
    pass
# Gets executed after get_block_hash. Should return a JSON.
def get_block():
    pass
# Gets executed after get_block. Should return a JSON.
def get_raw_transaction():
    pass
I wish to implement a kind of retry functionality on each function, so if it fails due to a timeout error, connection error, JSON decode error etc., it will keep retrying without compromising the flow of the program:
def get_summary():
    try:
        response = requests.get(API_URL_SUMMARY)
    except requests.exceptions.RequestException as error:
        logging.warning("...")
        #
    else:
        # Once response has been received, JSON should be
        # decoded here wrapped in a try/catch/else
        # or outside of this block?
        return response.text
def get_block_hash():
    try:
        response = requests.get(API_URL + "...")
    except requests.exceptions.RequestException as error:
        logging.warning("...")
        #
    else:
        return response.text
def get_block():
    try:
        response = requests.get(API_URL + "...")
    except requests.exceptions.RequestException as error:
        logging.warning("...")
        #
    else:
        #
        #
        #
        return response.text
def get_raw_transaction():
    try:
        response = requests.get(API_URL + "...")
    except requests.exceptions.RequestException as error:
        logging.warning("...")
        #
    else:
        #
        #
        #
        return response.text
if __name__ == "__main__":
    # summary = get_summary()
    # block_hash = get_block_hash()
    # block = get_block()
    # raw_transaction = get_raw_transaction()
    # ...
I want to keep the outermost part of the code clean (the block after if __name__ == "__main__":); I mean, I don't want to fill it with confusing try/catch blocks, logging, etc.
I tried having a function call itself when an exception was thrown in it, but then I read about the stack limit and thought it was a bad idea; there should be a better way to handle this.
request already retries by itself N times when I call the get method (N is a constant in the source code; it is 100). But when the number of retries reaches 0 it throws an error that I need to catch.
Where should I decode the JSON response? Inside each function, wrapped in another try/catch/else block, or in the main block? How can I recover from an exception and keep retrying the function that failed?
Any advice would be appreciated.
You could keep those in an infinite loop (to avoid recursion) and once you get the expected response just return:
def get_summary():
    while True:
        try:
            response = requests.get(API_URL_SUMMARY)
        except requests.exceptions.RequestException as error:
            logging.warning("...")
            #
        else:
            # As winklerrr points out, try to return the transformed data as soon
            # as possible, so you should be decoding JSON response here.
            try:
                json_response = json.loads(response.text)
            except ValueError as error: # ValueError will catch any error when decoding response
                logging.warning(error)
            else:
                return json_response
This function keeps executing until it receives the expected result (reaches return json_response); otherwise it tries again and again.
You can do the following
def my_function(iteration_number=1):
    # iteration_threshold (the maximum number of attempts) must be defined elsewhere
    try:
        response = requests.get(API_URL_SUMMARY)
    except requests.exceptions.RequestException:
        if iteration_number < iteration_threshold:
            return my_function(iteration_number+1)
        else:
            raise
    except Exception: # for all other exceptions, raise
        raise
    return json.loads(response.text)
my_function()
Where should I decode JSON response?
Inside each function and wrapped by another try/catch/else block or in the main block?
As a rule of thumb: try to transform data as soon as possible into the format you want it to be. It makes the rest of your code easier if you don't have to extract everything again from a response object all the time. So just return the data you need, in the easiest format you need it to be.
In your scenario: You call that API in every function with the same call to requests.get(). Normally all the responses from an API have the same format. So this means, you could write an extra function which does that call for you to the API and directly loads the response into a proper JSON object.
Tip: For working with JSON make use of the standard library with import json
Example:
import json
import requests
def call_api(api_sub_path):
    response = requests.get(API_BASE_URL + api_sub_path)
    json_response = json.loads(response.text)
    # you could verify your result here already, e.g.
    if json_response["result_status"] == "successful":
        return json_response["result"]
    # or maybe throw an exception here, depends on your use case
    return json_response["some_other_value"]
How can I recover from an exception and keep trying on the function it failed?
You could use a while loop for that:
def main(retries=100): # default value if no value is given
    result = functions_that_could_fail(retries)
    if result:
        logging.info("Finished successfully")
        functions_that_depend_on_result_from_before(result)
    else:
        logging.info("Finished without result")
def functions_that_could_fail(retry):
    while(retry): # is True as long as retry is bigger than 0
        try:
            # call all functions here so you just have to write one try-except block
            summary = get_summary()
            block_hash = get_block_hash()
            block = get_block()
            raw_transaction = get_raw_transaction()
        except Exception:
            retry -= 1
            if retry:
                logging.warning("Failed, but trying again...")
        else:
            # else gets only executed when no exception was raised in the try block
            logging.info("Success")
            return summary, block_hash, block, raw_transaction
    logging.error("Failed - won't try again.")
    return None
def functions_that_depend_on_result_from_before(result):
    pass # [use result here ...]
So with the code from above you (and maybe also some other people who use your code) could start your program with:
if __name__ == "__main__":
    main()
    # or when you want to change the number of retries
    main(retries=50)
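Another way to keep the main block free of try/except clutter is to move the retry loop into a decorator. The sketch below is one possible shape, not part of the answers above: retry is a hypothetical helper, and the decorated get_summary fakes two network failures instead of calling a real API.

```python
import functools
import logging
import time


def retry(attempts=3, delay=0.0, exceptions=(Exception,)):
    """Hypothetical decorator: re-run the wrapped function until it succeeds
    or `attempts` calls have failed, then re-raise the last error."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as error:
                    logging.warning("%s failed (attempt %d/%d): %s",
                                    func.__name__, attempt, attempts, error)
                    if attempt == attempts:
                        raise  # out of attempts: let the caller see the error
                    time.sleep(delay)
        return wrapper
    return decorator


calls = {"count": 0}


@retry(attempts=5, exceptions=(ConnectionError,))
def get_summary():
    # Stand-in for the real API call: fails twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network problem")
    return {"status": "ok"}


summary = get_summary()
```

With a real API you would pass the exception types you expect, for example requests.exceptions.RequestException and ValueError, so that unrelated bugs are not retried.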
