How to write try except for loading data - python

I'm pretty new to coding, so I apologize if this is a stupid question. I'm writing a Spark function that takes in a file path and file type and creates a DataFrame. If the input is invalid, I want to just print some sort of error message and return an empty DataFrame. Would I use try/except?
def rdf(name, type):
    try:
        df=spark.read.format(type).load(name)
        return df
    except ____ as error:
        print(error)
        return "" #I want to return an empty RDD here, but I can't figure out how to make one
How do I know what goes in the ____? I tried org.apache.spark.SparkException, because that's the error I get when I pass in a .csv file as Parquet and it breaks, but that isn't working.

Welcome to StackOverflow!
You can catch multiple exceptions in the try-except block; for instance:
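def rdf(name, type):
    try:
        df=spark.read.format(type).load(name)
        return df
    # SparkException is assumed to be a Python exception class imported elsewhere;
    # it is not defined in this snippet
    except (SparkException, TypeError) as error:
        print(error)
        return ""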
You could replace or add errors to that tuple.
Catching the generic Exception will potentially silence errors that are unrelated to your code (like a networking issue if name is an S3 path). That is probably something you do not want your program to handle.
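To make the difference concrete, here is a minimal sketch that uses the standard library's json module as a stand-in for the Spark reader (an assumption, made only so the example runs without a cluster). The name load_records and the empty-list fallback are hypothetical. Only the two exceptions that signal bad input are caught, so anything else still propagates to the caller.

```python
import json


def load_records(path):
    """Return the parsed JSON at `path`, or an empty list for bad input."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError) as error:
        # Only the "expected" input problems are handled here;
        # any other exception type is not caught by this clause.
        print(f"Could not load {path!r}: {error}")
        return []
```

A missing file or malformed content yields an empty list, while something like a PermissionError is not swallowed and surfaces with its full traceback.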

Use Exception if you don't know what exception it might be:
def rdf(name, type):
    try:
        df=spark.read.format(type).load(name)
        return df
    except Exception as error:
        print(error)
        return ""
WARNING: This is not good practice as it could silence errors that would be useful during debugging and troubleshooting.
(Thanks to #RafaelBarros)

Related

Error handling and saving error in python

I have a for loop that iterates through a column and does some XML processing. I am encountering some errors, but rather than just pass over them, I would like to save them into a separate list as part of the output. This is what I have so far:
def outputfunc(column_name):
    output = []
    errors = []
    for i in column_name:
        try:
            tree = etree.fromstring(i)
        except:
            errors.append(i) #basic logic here is that i append errors with i that raise error and then pass
            pass
Elaborating on Carcigenicate's comment, if you want to catch all errors using the Exception base class, you can change your code to:
def outputfunc(column_name):
    output = []
    errors = []
    for i in column_name:
        try:
            tree = etree.fromstring(i)
        except Exception as e:
            errors.append(e)  # store the exception raised for this item
This binds e to the exception object (str(e) gives the message text), and e is then appended to errors. Furthermore, I've removed a nonfunctional pass statement. However, you currently have no way to output errors, so I suggest using
print('\n'.join(map(str,errors)))
within the scope of outputfunc. Furthermore, as Paul Cornelius suggests, returning your output and errors as a tuple would be useful. This would make your code
def outputfunc(column_name):
    output = []
    errors = []
    for i in column_name:
        try:
            tree = etree.fromstring(i)
        except Exception as e:
            errors.append(e)  # store the exception raised for this item
    print('\n'.join(map(str,errors)))
    return output, errors
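As a runnable variant of the same idea, the sketch below uses the standard library's xml.etree.ElementTree instead of the etree from the question (an assumption, so it runs without extra packages). It keeps each offending input next to its exception. The name parse_all is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_all(documents):
    """Parse each XML string and return (trees, failures)."""
    trees = []
    failures = []  # (offending input, exception) pairs
    for text in documents:
        try:
            trees.append(ET.fromstring(text))
        except ET.ParseError as exc:
            # Keep both the input and the error so the caller can report them.
            failures.append((text, exc))
    return trees, failures


trees, failures = parse_all(["<a>ok</a>", "<a>broken", "<b/>"])
for text, exc in failures:
    print(f"{text!r}: {exc}")
```

Storing the input alongside the exception makes it possible to tell afterwards which row failed and why, which a list of bare messages does not.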
References
2. Lexical analysis — Python 3.9.6 documentation
Built-in Exceptions — Python 3.9.6 documentation

Understanding `traceback.format_exception_only()`

I wanted to just print exception type and message.
I initially tried:
try:
    raise Exception("duh!!!")
except Exception as err:
    print(err)
But this printed only the exception message:
duh!!!
Then I went through the whole traceback doc and I felt traceback.format_exception_only() is the one I am looking for.
So I tried it as follows:
import sys
import traceback

try:
    raise Exception("duh!!!")
except:
    etype, evalue, tb = sys.exc_info()
    print(traceback.format_exception_only(etype, evalue))
and it printed following:
['Exception: duh!!!\n']
which looked a bit unexpected to me. So I re-read the doc for this method. It says following:
Format the exception part of a traceback. The arguments are the exception type and value such as given by sys.last_type and sys.last_value. The return value is a list of strings, each ending in a newline. Normally, the list contains a single string; however, for SyntaxError exceptions, it contains several lines that (when printed) display detailed information about where the syntax error occurred. The message indicating which exception occurred is always the last string in the list.
So I understood that the doc says it's a list, which is why there is a list in the output [...]. The doc also says each line ends in a newline, which is why there is \n in the output. But I don't get when there will be multiple lines in the case of a SyntaxError. I am not able to produce a SyntaxError that results in multiple lines in the return value of format_exception_only().
It also suddenly clicked for me that I can simply do
import sys

try:
    raise Exception("duh!!!")
except:
    etype, evalue, tb = sys.exc_info()
    print('{}: {}'.format(etype.__name__, evalue))
to get
Exception: duh!!!
But then what value does format_exception_only() add over this?
Python is open source so you can view the implementation and decide what the extra value is for yourself:
if self.exc_type is None:
    yield _format_final_exc_line(None, self._str)
    return
stype = self.exc_type.__qualname__
smod = self.exc_type.__module__
if smod not in ("__main__", "builtins"):
    stype = smod + '.' + stype
if not issubclass(self.exc_type, SyntaxError):
    yield _format_final_exc_line(stype, self._str)
else:
    yield from self._format_syntax_error(stype)
It's a fairly simple function, so yes: if you ignore the parts of the functionality you don't use, it doesn't add much value for you.
Of course, you're now on the hook for maintaining your code if Exception changes in the future, or if it turns out the code raising the exception starts producing SyntaxErrors, or for any other edge cases that come up later.
Reduced maintenance and increased readability (because everyone knows what the library code does) are the two relatively universal advantages of using standard library code.
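For the multi-line SyntaxError case asked about above, one way to produce it is to compile a deliberately invalid snippet. This is a small sketch: exception_lines and the "<example>" filename are made up for illustration. The exact number of lines and the position of the caret vary between Python versions, so the output is not reproduced here.

```python
import traceback


def exception_lines(source):
    """Compile `source`; on SyntaxError return format_exception_only() output."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        return traceback.format_exception_only(type(err), err)
    return []


syntax_lines = exception_lines("x = = 1\n")
plain_lines = traceback.format_exception_only(ValueError, ValueError("duh!!!"))

# Several strings: file/line information, the offending source, then the message.
print("".join(syntax_lines), end="")
# A single string for an ordinary exception.
print(plain_lines)
```

The SyntaxError result has more than one entry and its last entry is the "SyntaxError: ..." message, whereas the ValueError result is a one-element list.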

catch specific error message in Python

I need to catch when one of my dependencies throws a specific ValueError and deal with that a certain way, and otherwise re-raise the error.
I don't find any recent questions that deal with this in a way that's Python 3 compliant, and that deals with cases where the only thing distinguishing errors returned is the string message.
This post is probably the closest: Python: Catching specific exception
Something like this-- catch specific HTTP error in python --won't work because I'm not using a dependency that also supplies specific codes like an HTTP error would have.
Here's my attempt:
try:
    spect, freq_bins, time_bins = spect_maker.make(syl_audio,
                                                   self.sampFreq)
except ValueError as err:
    if str(err) == 'window is longer than input signal':
        warnings.warn('Segment {0} in {1} with label {2} '
                      'not long enough for window function'
                      ' set with current spect_params.\n'
                      'spect will be set to nan.')
        spect, freq_bins, time_bins = (np.nan,
                                       np.nan,
                                       np.nan)
    else:
        raise
If it matters, the dependency is scipy and I need to catch when the spectrogram fails for a specific reason (the segment I'm taking a spectrogram of is shorter than the window function).
I realize my approach is fragile because it depends on the error string not changing, but the error string is the only thing that distinguishes it from other ValueErrors returned by the same function. So I plan to have a unit test to defend myself against that.
Ok, so based on other people's comments, I'm guessing it should be something like this:
# lower-level module
class CustomError(Exception):
    pass

# in method
class Thing:
    def __init__(self, prop1):
        self.prop1 = prop1

    def method(self, element):
        try:
            return dependency.function(element, self.prop1)
        except ValueError as err:
            if str(err) == 'specific ValueError':
                raise CustomError
            else:
                raise  # re-raise ValueError because string not recognized

# back in higher-level module
thing = lowerlevelmodule.Thing(prop1)
for element in list_of_stuff:
    try:
        output = thing.method(element)
    except CustomError:
        output = None
        warnings.warn('set output to None for {} because CustomError'
                      .format(element))
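The outline above can be turned into something runnable along the following lines. Everything here is a sketch under assumptions: fake_dependency stands in for the real third-party call, and WindowTooLongError, safe_call, and process are hypothetical names. The message string is kept in one constant so a unit test can pin it.

```python
import warnings

KNOWN_MESSAGE = "window is longer than input signal"


class WindowTooLongError(Exception):
    """Hypothetical custom error for the one message we care about."""


def fake_dependency(segment, window):
    # Stand-in for the third-party call; not a real library function.
    if window <= 0:
        raise ValueError("window must be positive")
    if window > len(segment):
        raise ValueError(KNOWN_MESSAGE)
    return segment[:window]


def safe_call(segment, window):
    try:
        return fake_dependency(segment, window)
    except ValueError as err:
        if str(err) == KNOWN_MESSAGE:
            # Translate the recognized message into a dedicated exception type.
            raise WindowTooLongError(str(err)) from err
        raise  # any other ValueError is re-raised unchanged


def process(segments, window):
    results = []
    for segment in segments:
        try:
            results.append(safe_call(segment, window))
        except WindowTooLongError:
            warnings.warn(f"segment of length {len(segment)} too short; using None")
            results.append(None)
    return results
```

Because the string comparison lives in exactly one place, a test that triggers the dependency's error and asserts on WindowTooLongError will fail loudly if the upstream message ever changes.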

Handling DisambiguationError?

I'm using the wikipedia library and I want to handle the DisambiguationError as an exception. My first try was
try:
    wikipedia.page('equipment') # could be any ambiguous term
except DisambiguationError:
    pass
During execution line 3 isn't reached. A more general question is: how can I find the error type for a library-specific class like this?
Here's a working example:
import wikipedia

try:
    wikipedia.page('equipment')
except wikipedia.exceptions.DisambiguationError as e:
    print("Error: {0}".format(e))
Regarding your more general question, "how can I find the error type for a library-specific class like this?", my trick is quite simple: I catch Exception and print its __class__, so I know which specific exception I need to catch.
One example of figuring out which specific exception to capture here:
try:
    0/0
except Exception as e:
    print("Exception.__class__: {0}".format(e.__class__))
In Python 3 this prints Exception.__class__: <class 'ZeroDivisionError'> (Python 2 printed <type 'exceptions.ZeroDivisionError'>), so I know ZeroDivisionError is the exact exception to handle instead of something more generic.
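For library exceptions it is usually the module path that is missing, not the class name. A small helper along these lines (describe_exception is a hypothetical name) prints the dotted path to use in the except clause, plus the base classes in case a broader one is more appropriate.

```python
def describe_exception(exc):
    """Return the dotted import path of an exception's class."""
    cls = type(exc)
    return f"{cls.__module__}.{cls.__qualname__}"


try:
    0 / 0
except Exception as e:
    # Dotted path of the concrete class that was raised.
    print(describe_exception(e))
    # Base classes, from most to least specific.
    print([base.__name__ for base in type(e).__mro__])
```

For a built-in error the module part is builtins; for a third-party error it is the module where the class is defined, which tells you what to import or qualify in the except clause.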

Numpy.savetxt - how can I ensure that saving is complete?

I would like to use the numpy.savetxt function; however, from the documentation there doesn't seem to be a way to have it return a flag indicating whether the file has been saved.
Is there any other way to ensure that the document was saved before continuing?
My problem is that when I save my document, the next line opens that document and I get some problems. I used a for loop to open the document several times and compared the results. The first time it opened it was ok. After that the values are incorrect and the same.
Inside a for-loop
savetxt('forest_submitfile.csv', end_matrix , delimiter=',', fmt='%s,%s,%s',
        header='EventId,RankOrder,Class', comments = '')
print('Saving for Submit in CSV SUCCESS')
is_file_ok = False
while not is_file_ok:
    if os.path.isfile("forest_submitfile.csv") and os.access("forest_submitfile.csv", os.R_OK):
        break
print('Calculate AMS Metric Score')
AMS_metric("solutionFile.csv", "forest_submitfile.csv")
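You can use a couple of functions from os to check for you.
os.path.isfile checks that the file exists.
os.access with os.R_OK checks that the current process has permission to read it. Note that this is only a permission check: it does not by itself prove that the writer has finished. When np.savetxt is given a filename, it opens and closes the file itself before returning.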
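import os

yourFile = r"C:\folder\folder\file.txt"  # raw string so backslashes are not treated as escapes
if os.path.isfile(yourFile) and os.access(yourFile, os.R_OK):
    # if you got into this check, the file exists and is readable
    print('File exists and is readable')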
According to the source, it can raise a ValueError or an AttributeError on failure. So, maybe catch those:
import numpy as np

try:
    np.savetxt('file', dataStructure)
except ValueError as e:
    print('Save failed! {}'.format(e))
    raise SystemError
except AttributeError as e:
    print('Save failed! {}'.format(e))
    raise SystemError
Hope this helps...
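If the worry is that a reader might see a half-written file, a common pattern (not one proposed in this thread) is to write to a temporary file in the same directory and rename it into place only after the write has finished. The sketch below uses only the standard library's csv module rather than numpy (an assumption, so it runs anywhere). The name write_csv_atomically is hypothetical.

```python
import csv
import os
import tempfile


def write_csv_atomically(path, header, rows):
    """Write rows to a temp file next to `path`, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data to disk
        # The target name only ever points at a completely written file.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return path
```

With this approach the existence of the final file name is itself the "saving is complete" signal, so no polling loop is needed before the next step opens it.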
