I am creating about 200 variables within a single iteration of a Python loop (extracting fields from Excel documents and pushing them to a SQL database) and I am trying to figure something out.
Let's say that a single iteration is a single Excel workbook that I am looping through in a directory. I am extracting around 200 fields from each workbook.
Suppose one of the fields I extract (let's say field #56 out of 200) isn't in the proper format (let's say the date was filled out wrong, e.g. 9/31/2015, which isn't a real date), so the operation I am performing on it raises an error.
I want the loop to skip that variable and proceed to creating variable #57. I don't want the loop to completely go to the next iteration or workbook, I just want it to ignore that error on that variable and continue with the rest of the variables for that single loop iteration.
How would I go about doing something like this?
In this sample code I would like to continue extracting "PolicyState" even if ExpirationDate has an error.
Some sample code:
import datetime as dt
import os as os
import xlrd as rd
files = os.listdir(path)
for file in files: #Loop through all files in path directory
    filename = os.fsdecode(file)
    if filename.startswith('~'):
        continue
    elif filename.endswith( ('.xlsx', '.xlsm') ):
        try:
            book = rd.open_workbook(os.path.join(path,file))
        except KeyError:
            print ("Error opening file for "+ file)
            continue
        SoldModelInfo=book.sheet_by_name("SoldModelInfo")
        AccountName=str(SoldModelInfo.cell(1,5).value)
        ExpirationDate=dt.datetime.strftime(xldate_to_datetime(SoldModelInfo.cell(1,7).value),'%Y-%m-%d')
        PolicyState=str(SoldModelInfo.cell(1,6).value)
        print("Insert data of " + file +" was successful")
    else:
        continue
Use multiple try blocks. Wrap each decode operation that might go wrong in its own try block to catch the exception, do something, and carry on with the next one.
try:
    book = rd.open_workbook(os.path.join(path,file))
except KeyError:
    print ("Error opening file for "+ file)
    continue
errors = []
SoldModelInfo=book.sheet_by_name("SoldModelInfo")
AccountName=str(SoldModelInfo.cell(1,5).value)
try:
    ExpirationDate=dt.datetime.strftime(xldate_to_datetime(SoldModelInfo.cell(1,7).value),'%Y-%m-%d')
except WhateverError as e:
    # do something, maybe set a default date?
    ExpirationDate = default_date
    # and/or record that it went wrong?
    errors.append( [ "ExpirationDate", e ])
PolicyState=str(SoldModelInfo.cell(1,6).value)
...
# at the end
if not errors:
    print("Insert data of " + file +" was successful")
else:
    # things went wrong somewhere above.
    # the contents of errors will let you work out what
As suggested, you could use a separate try block for each variable you extract, or you could streamline it with your own custom function that handles the try for you:
from functools import reduce
from operator import methodcaller
def try_funcs(cell, default, funcs):
    try:
        # apply each function in turn, feeding the result of one into the next
        return reduce(lambda val, func: func(val), funcs, cell)
    except Exception as e:
        # do something with your Exception if necessary, like logging.
        return default
# Usage:
AccountName = try_funcs(SoldModelInfo.cell(1,5).value, "some default str value", [str])
ExpirationDate = try_funcs(SoldModelInfo.cell(1,7).value, "some default date", [xldate_to_datetime, methodcaller('strftime', '%Y-%m-%d')])
PolicyState = try_funcs(SoldModelInfo.cell(1,6).value, "some default str value", [str])
Here we use reduce to apply the functions one after another, and methodcaller to build a callable with the format argument frozen in. Note that funcs must be a list, even when there is only one function.
This can help your code look tidy without cluttering it up with lots of try blocks. But the better, more explicit way is to individually handle just the fields you anticipate might error out.
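As a rough sketch of the same idea taken one step further, the fields can be described in a table and each one extracted inside its own try block, so a bad value in one field never stops the others. Everything here is an assumption for illustration: the `row` dict stands in for a worksheet, and `extract_fields` and the extractor lambdas are hypothetical names, not part of xlrd.

```python
from datetime import datetime


def extract_fields(row, extractors):
    """Run every extractor independently; return the values and per-field errors."""
    values, errors = {}, []
    for name, func in extractors.items():
        try:
            values[name] = func(row)
        except (ValueError, TypeError, KeyError) as exc:
            values[name] = None            # placeholder for the failed field
            errors.append((name, exc))     # remember what went wrong and where
    return values, errors


# Hypothetical stand-in for one worksheet; 9/31/2015 is not a real date.
row = {"account": "Acme", "expires": "9/31/2015", "state": "NY"}

extractors = {
    "AccountName": lambda r: str(r["account"]),
    "ExpirationDate": lambda r: datetime.strptime(r["expires"], "%m/%d/%Y").strftime("%Y-%m-%d"),
    "PolicyState": lambda r: str(r["state"]),
}

values, errors = extract_fields(row, extractors)
print(values)
print([name for name, _ in errors])
```

With 200 fields this keeps the loop body short, at the cost of hiding each field behind a small function; whether that trade is worth it depends on how different the fields are from one another.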
So, basically you need to wrap your xldate_to_datetime() call in try ... except:
import datetime as dt
v = SoldModelInfo.cell(1,7).value
try:
    d = dt.datetime.strftime(xldate_to_datetime(v), '%Y-%m-%d')
except TypeError as e:
    # catch whichever exception your conversion actually raises
    print('Could not parse "{}": {}'.format(v, e))
I have a file which contains extracted data in the form of python variables.
I expect this data to always come in the form of 2 variables (check, output_check).
The problem is that there are cases in which the data is either incomplete (just the check variable) or could not be extracted (meaning no variable at all). The contents of the file will always be different, depending on the data that was extracted.
Here is an example of a file:
check_1 = "Warning, the check has failed"
output_check_1 = "Here is a more detailed result."
check_2 = "Warning, the check has failed"
#There is no output_check_2 variable
#There is no check_3 at all
Next up, a function will generate a report based on the data from the file:
def create_report(check, output_check):
    if check.startswith("Warning"):
        print(check)
        if output_check:
            print(output_check)
        else:
            print("Sorry, couldn't get a more detailed result")
    else:
        print("Check was successful.")
#Let's call the function
create_report(check_1, output_check_1) #Works
create_report(check_2, output_check_2) #-> NameError because in this case output_check_2 does not exist
create_report(check_3, output_check_3) #-> NameError because none of the arguments exist
As a fix, I came up with the following:
try:
    create_report(check_2, output_check_2)
except NameError:
    output_check_2 = ""
    try:
        create_report(check_2, output_check_2)
    except NameError:
        print("Error - data could not be extracted from the server!")
This way, if argument 2 (output_check_2) is missing I will just receive the result of the check without detailed data and if both arguments are missing(check_3, output_check_3) I am going to receive an error stating that the data could not be extracted at all.
The thing is that I find my "fix" rather barbaric and I am looking for a cleaner way in which to have the same result given that the function will be called many times during execution.
Edit: The variables come from an extraced_data.py file which I import at the start of the script. Unfortunately, I have no access to the script which generated the variables in the first place, which is why I am running into this issue.
Assuming there's no way to fix how the data is stored*, you could use getattr to deal with the missing data. I'm assuming that you're doing something like this to import:
from dataScript import *
Change that to this:
import dataScript
Then you can do:
check_1 = getattr(dataScript, "check_1", None) # Will default to None if it doesn't exist
if check_1 is not None:
    create_report(check_1, getattr(dataScript, "output_check_1", None))
Or, if you're on Python 3.8+ and the check variable will never be an empty string:
if check_1 := getattr(dataScript, "check_1", None):
    create_report(check_1, getattr(dataScript, "output_check_1", None))
If you have an arbitrary number of these variables, you may need to use a loop. Something like:
for i in range(MAX_VARS):
    n = i + 1
    if check := getattr(dataScript, f"check_{n}", None):
        create_report(check, getattr(dataScript, f"output_check_{n}", None))
Where MAX_VARS is the highest variable number that you're expecting.
* The input format here is really the issue though. Using a Python script as a database that only sometimes has the correct data seems like the real problem. My solution above is just a workaround.
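To see the getattr lookup in isolation, here is a minimal sketch that runs without the real data file. It assumes the variables are named check_N and output_check_N as in the question; `SimpleNamespace` is only a stand-in for the imported module (getattr behaves the same on a real module object), and `collect_checks` is a hypothetical helper name.

```python
from types import SimpleNamespace

# Stand-in for the imported data module: check_2 has no output, check_3 is missing.
data = SimpleNamespace(
    check_1="Warning, the check has failed",
    output_check_1="Here is a more detailed result.",
    check_2="Warning, the check has failed",
)


def collect_checks(source, max_vars):
    """Return (n, check, output) for n = 1..max_vars, using defaults for missing names."""
    pairs = []
    for n in range(1, max_vars + 1):
        check = getattr(source, f"check_{n}", None)         # None: nothing was extracted
        output = getattr(source, f"output_check_{n}", "")   # "": no detailed result
        pairs.append((n, check, output))
    return pairs


pairs = collect_checks(data, 3)
for n, check, output in pairs:
    if check is None:
        print(f"check_{n}: data could not be extracted")
    else:
        print(f"check_{n}: {check} {output or '(no details)'}")
```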
You could also pass (*argv) instead of (check, output_check). This allows you to pass any number of arguments into your function.
def create_report(*argv):
    if argv[0].startswith("Warning"): # argv[0] is your 'check' variable
        print(argv[0])
        if len(argv) > 1:
            print(argv[1]) # argv[1] is your output_check variable
        else:
            print("No more info.")
    else:
        print("Check successful")
All you have to do is change the create_report function.
def create_report(check, output_check):
    if not check:
        print("Error - data could not be extracted from the server!")
        return
    if not output_check:
        output_check = ""
    if check.startswith("Warning"):
        print(check)
        if output_check:
            print(output_check)
        else:
            print("Sorry, couldn't get a more detailed result")
    else:
        print("Check was successful.")
Now you can just call the function without a try-except block.
Hi guys, it's my first time asking for help here; I hope you can help me.
I have this code that I wrote:
def my_function():
    try :
        with open('file.csv', 'r') as f:
            data = list(csv.reader(f, delimiter=','))
        i = 1
        while i <= 10:
            i += 1
            fname = data[i][0]
            lname = data[i][1]
            options = Options()
            driver = webdriver.Chrome(options=options)
            driver.get("https://www.test.net/")
            #Do stuff
    except Exception as e:
        print(e)
        driver.quit()
        time.sleep(1)
        print('******RESTART******')
        my_function()
my_function()
I'm trying to make this script run without stopping. The problem I'm facing is that when it stops, for example at line number 8 (i = 8), and restarts, it starts again from the first line (i = 1).
I want the script to restart from line 8 and continue to 9, 10, ...
Can you please guide me to the right solution? Thank you.
Your code is making this far more difficult than it needs to be.
First, you almost certainly don't want to wrap this entire block of code in a "catch all" exception handler. You want your exception handling to be sufficiently specific (limited) that you can do something meaningful with the exception. For example:
#!python
# Assumes Python version 3 or later
import sys, csv
filename='myfile.csv'
with open(filename) as f:
    try:
        reader = csv.reader(f)
        for record in reader:
            if len(record) != 2:
                # log error and continue
                print('Malformed records in {}: {}'.format(filename, reader.line_num), file=sys.stderr)
                continue
            # do stuff with this record, knowing it has exactly two fields:
            fname = record[0]
            lname = record[1]
            # etc ...
    except csv.Error as e:
        print('Error handling {} at line {}: {}'.format(filename, reader.line_num, e), file=sys.stderr)
Note that your errors probably weren't specifically in the csv module. It's pretty tolerant of malformed lines. But I'm showing how to wrap the reader and processing code within exception handling just for that. Your error was probably an IndexError (trying to access an item past the number of items in a list, i.e. outside of its valid indexing range). It's better to just check the length of each record rather than use exception handling for that, though it's possible either way.
There's a quite reasonable example (very similar code) in the documentation for the standard libraries: https://docs.python.org/3/library/csv.html
Also, stylistically, I'd suggest a named tuple or a lightweight class (using __slots__) for managing these records. This would allow you to use dot notation to access the .fname and .lname of each rather than using [x] and numeric indexing. (Numeric indexing gets progressively more cumbersome and error prone as your code complexity increases.)
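A minimal sketch of the named-tuple suggestion, combined with the length check from the example above. The `Person` type, the `read_people` helper and the in-memory sample data are assumptions made up for illustration; a real script would pass an open file instead of `io.StringIO`.

```python
import csv
import io
from collections import namedtuple

# Field names taken from the question; the type name itself is an assumption.
Person = namedtuple("Person", ["fname", "lname"])


def read_people(stream):
    """Return well-formed records as Person tuples plus the line numbers of bad records."""
    people, bad_lines = [], []
    reader = csv.reader(stream)
    for record in reader:
        if len(record) != 2:
            bad_lines.append(reader.line_num)   # note the line and keep going
            continue
        people.append(Person(*record))
    return people, bad_lines


sample = io.StringIO("Ada,Lovelace\nbroken\nAlan,Turing\n")
people, bad_lines = read_people(sample)
for person in people:
    print(person.fname, person.lname)   # dot access instead of record[0], record[1]
print("malformed lines:", bad_lines)
```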
You can set i to a key word argument with a default of 1 then on each exception pass the current i when you restart your function so it picks up from there.
This is a simplified example of what I'm recommending following the same general method you are using in your question (but with fake data so I can run it without having your CSV file).
def my_function(i=1):
    try:
        if i == 4: # to prevent forever recursion
            return
        else:
            print(i) # keep track of loops
            i += 1
            x = int('te') # causes an error
    except ValueError:
        my_function(i) # send current i back through
my_function(i=0)
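The same "pick up where it failed" behaviour can also be had without recursion, which avoids hitting the recursion limit on long inputs. The sketch below is one possible shape, not the answerer's code: `process_rows`, `handler` and the retry count are hypothetical, and the broad `except Exception` is only there to keep the example short.

```python
def process_rows(rows, handler, max_attempts=3):
    """Handle rows in order; retry a failing row, then record it and move on."""
    failed = []
    for index, row in enumerate(rows):
        last_error = None
        for _ in range(max_attempts):
            try:
                handler(row)
                break                      # this row is done, go to the next one
            except Exception as exc:       # narrow this in real code
                last_error = exc
        else:
            # the inner loop never hit break: every attempt failed
            failed.append((index, last_error))
    return failed


done = []


def handler(row):
    # Stand-in for the per-row browser work in the question.
    if row == "bad":
        raise ValueError("cannot process row")
    done.append(row)


failed = process_rows(["a", "bad", "c"], handler)
print(done)
print(failed)
```

Because the position lives in the for loop rather than in a function argument, nothing has to be passed around to resume at the right row.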
Thank you for your quick responses. I tried the solution suggested by Kevin Welch and Selcuk and it works fine for me, thanks.
Here is the solution:
def my_function():
    with open('file.csv', 'r') as f:
        data = list(csv.reader(f, delimiter=','))
    i = 1
    while i <= 10:
        i += 1
        try :
            fname = data[i][0]
            lname = data[i][1]
            options = Options()
            driver = webdriver.Chrome(options=options)
            driver.get("https://www.test.net/")
            # Do stuff
        except Exception as e:
            print(e)
            driver.quit()
            time.sleep(1)
            print('******RESTART******')
            continue
my_function()
I am writing code that gathers some statistics about ontologies. As input I have a folder of files; some are RDF/XML, some are Turtle or N-Triples.
My problem is that when I try to parse a file using the wrong format, the next attempt fails even if I then parse it with the correct format.
Here the test file is in Turtle format. If I first parse it with the turtle format, all is fine. But if I first parse it with the wrong format, the first error is understandable (file:///test:1:0: not well-formed (invalid token)), but the error for the second parse is (Unknown namespace prefix : owl). As I said, when I parse with the correct format first, I don't get the namespace error.
Please help; after 2 days, I'm getting desperate.
query = 'SELECT DISTINCT ?s ?o WHERE { ?s ?p owl:Ontology . ?s rdfs:comment ?o}'
data = open("test", "r")
g = rdflib.Graph("IOMemory")
try:
    result = g.parse(file=data,format="xml")
    relations = g.query(query)
    print(( " graph has %s statements." % len(g)))
except:
    print "bad1"
    e = sys.exc_info()[1]
    print e
try:
    result = g.parse(file=data,format="turtle")
    relations = g.query(query)
    print(( " graph has %s statements." % len(g)))
except :
    print "bad2"
    e = sys.exc_info()[1]
    print e
The problem is that g.parse first reads some part of the file input stream data, only to figure out afterwards that it is not XML. The second call (with the turtle format) then continues to read from the input stream at the point where the previous attempt stopped. The part read by the first parser is lost to the second one.
If your test file is small, the xml-parser might have read it all, leaving an "empty" rest. It seems the turtle parser did not complain - it just read in nothing. Only the query in the next statement failed to find anything owl-like in it, as the graph is empty. (I have to admit I cannot reproduce this part, the turtle parser does complain in my case, but maybe I have a different version of rdflib)
To fix it, try to reopen the file: either reorganize the code so you have a data = open("test", "r") every time you call result = g.parse(file=data, format="(some format)"), or call data.seek(0) in the except: clause, like:
for format in 'xml','turtle':
    try:
        print 'reading', format
        result = g.parse(data, format=format)
        print 'success'
        break
    except Exception:
        print 'failed'
        data.seek(0)
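The effect of the rewind can be shown without rdflib at all. In this sketch the two "parsers" are toy functions invented for the example (they are not rdflib APIs); the first one consumes the whole stream before failing, just as described above, and `seek(0)` gives the second one a fresh start.

```python
import io


def parse_as_numbers(stream):
    # Reads the whole stream, then raises ValueError if a token is not an integer.
    return [int(token) for token in stream.read().split()]


def parse_as_words(stream):
    return stream.read().split()


def parse_with_fallback(stream, parsers):
    """Try each parser in turn, rewinding the stream after every failure."""
    for parser in parsers:
        try:
            return parser(stream)
        except ValueError:
            stream.seek(0)   # undo whatever the failed parser consumed
    raise ValueError("no parser accepted the input")


stream = io.StringIO("alpha beta gamma")
result = parse_with_fallback(stream, [parse_as_numbers, parse_as_words])
print(result)
```

Without the `seek(0)` call the second parser would read from an exhausted stream and return an empty list, which mirrors the "empty graph" symptom in the question.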
I am writing code to parse a bunch of XML files. It basically looks like this:
for i in range(0, 20855):
    urlb = str(i)
    url = urla + urlb
    trys=0
    t=0
    while (trys < 3):
        try:
            cfile = UR.urlopen(url)
            trys = 3
        except urllib.error.HTTPError as e:
            t=t+1
            print('error at '+str(time.time()-tstart)+' seconds')
            print('typeID = '+str(i))
            print(e.code)
            print(e.read())
            time.sleep (0.1)
            trys=0+t
    tree = ET.parse(cfile) ##parse xml file
    root = tree.getroot()
    ...do a bunch of stuff with i and the file data
I'm having a problem: some of the URLs I'm calling don't actually return an XML file, which breaks my code. I have a list of all the actual numbers that I use instead of the range shown, but I really don't want to go through all 21000 and remove each number that fails. Is there an easier way to get around this? I get an error from the while loop (which I need anyway to deal with timeouts) that looks like this:
b'A non-marketable type was given'
error at 4.321678161621094 seconds
typeID = 31
400
So I was thinking there has to be a good way to bail out of that iteration of the for loop if my while loop hits three errors, but I can't use break. Maybe an if/else under the while loop that just skips the rest if the t variable is 3?
You might try this:
for i in range(0, 20855):
    url = '%s%d' % (urla, i)
    for trys in range(3):
        try:
            cfile = UR.urlopen(url)
            break
        except urllib.error.HTTPError as e:
            print('error at %s seconds' % (time.time()-tstart))
            print('typeID = %i'%i)
            print(e.code)
            print(e.read())
            time.sleep(0.1)
    else:
        # runs only if the inner for loop finished without break
        print("retry failed 3 times")
        continue
    try:
        tree = ET.parse(cfile) ##parse xml file
    except Exception as e:
        print("cannot read xml")
        print(e)
        continue
    root = tree.getroot()
    ...do a bunch of stuff with i and the file data
Regarding your "algorithmic" problem: You can always set an error state (as simple as e.g. last_iteration_successful = False) in the while body, then break out of the while body, then check the error state in the for body, and conditionally break out of the for body, too.
Regarding architecture: Prepare your code for all relevant errors that might occur, via proper exception handling with try/except blocks. It might also make sense to define custom Exception types, and then raise them manually. Raising an exception immediately interrupts the current control flow, it could save many breaks.
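As an illustration of the custom-exception idea, the retry logic can live in a helper that raises once it gives up, so the outer loop needs a single try/except and no flag variables. This is only a sketch under assumptions: `FetchFailed`, `fetch_with_retry` and `fake_fetch` are invented names, and `fake_fetch` simulates the failing typeID from the question instead of calling urlopen.

```python
class FetchFailed(Exception):
    """Raised when every retry for one item has failed."""


def fetch_with_retry(fetch, item, attempts=3):
    last_error = None
    for _ in range(attempts):
        try:
            return fetch(item)
        except OSError as exc:   # urllib.error.HTTPError is a subclass of OSError
            last_error = exc
    raise FetchFailed(f"item {item} failed after {attempts} attempts") from last_error


def fake_fetch(item):
    # Hypothetical stand-in for UR.urlopen(url): item 31 always fails.
    if item == 31:
        raise OSError("A non-marketable type was given")
    return f"<xml id='{item}'/>"


results, skipped = {}, []
for item in (30, 31, 32):
    try:
        results[item] = fetch_with_retry(fake_fetch, item)
    except FetchFailed:
        skipped.append(item)   # give up on this item only
        continue
print(sorted(results), skipped)
```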
So I am working on the code below. It runs all right when my Reff.txt has more than one line, but it doesn't work when Reff.txt has only one line. Why is that? I am also wondering why my code never runs the "try" portion and always runs only the "except" part.
I have a reference file which has a list of ids (one id per line).
I use the reference file (Reff.txt) to search through the database on the website and the database on the server within my network.
The result I should get for each reference id is an output file and a file with information about that id.
However, this code doesn't do anything in my "try:" portion at all.
import sys
import urllib2
from lxml import etree
import os
getReference = open('Reff.txt','r') #open the file that contains list of reference ids
global tID
for tID in getReference:
    tID = tID.strip()
    try:
        with open(''+tID.strip()+'.txt') as f: pass
        fileInput = open(''+tID+'.txt','r')
        readAA = fileInput.read()
        store_value = (readAA.partition('\n'))
        aaSequence = store_value[2].replace('\n', '') #concatenate lines
        makeList = list(aaSequence)#print makeList
        inRange = ''
        fileAddress = '/database/int/data/'+tID+'.txt'
        filename = open(fileAddress,'r')#name of the working file
        print fileAddress
        with open(fileAddress,'rb') as f:
            root = etree.parse(f)
            for lcn in root.xpath("/protein/match[@dbname='PFAM']/lcn"):#find dbname =PFAM
                start = int(lcn.get("start"))#if it is PFAM then look for start value
                end = int(lcn.get("end"))#if it is PFAM then also look for end value
                while start <= end:
                    inRange = makeList[start]
                    start += 1
                    print outputFile.write(inRange)
                    outputFile.close()
                    break
                break
            break
    except IOError as e:
        newURL ='http://www.uniprot.org/uniprot/'+tID+'.fasta'
        print newURL
        response = urllib2.urlopen(''+newURL) #go to the website and grab the information
        creatNew = open(''+uniprotID+'.txt','w')
        html = response.read() #read file
        creatNew.write(html)
        creatNew.close()
So, when you do Try/Except - if try fails, Except runs. Except is always running, because Try is always failing.
Most likely reason for this is that you have this - "print outputFile.write(inRange)", but you have not previously declared outputFile.
ETA: Also, it looks like you are only interested in testing to the first pass of the for loop? You break at that point. Your other breaks are extraneous in that case, because they will never be reached while that one is there.