What is the best way to handle exceptions from an "opaque" iterator? Specifically, I am using pandas read_csv with the chunksize option (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) to read large files. I have some corrupt CSV files that, for the sake of this question, are too big/numerous to fix, and besides, I don't need every line. I'm happy to read as far as I can before the error crops up, and then just give up on that file and/or skip over the problematic chunk/line. Here's the specific error I'm getting (with non-pandas precursor stuff omitted):
File "/home/hadoop/miniconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 1107, in __next__
return self.get_chunk()
File "/home/hadoop/miniconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 1167, in get_chunk
return self.read(nrows=size)
File "/home/hadoop/miniconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 1133, in read
ret = self._engine.read(nrows)
File "/home/hadoop/miniconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 2037, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 887, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 929, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2071, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at row 81306332
I've already set error_bad_lines=False in read_csv, so that's not the solution. I think I could set engine='python' (as in this post) but if I can keep the speed of C, that's preferable (and again, I'm fine losing some data as long as I don't get errors). I tried something like this in the for loop:
import pandas.errors

def myiter():
    for i in range(2):
        if i == 0:
            yield i
        else:
            raise pandas.errors.ParserError

my = myiter()
for this in my:
    try:
        print(this)
    except:
        continue
but it doesn't work, because it is the iterator itself, not the yielded item, that raises the error:
0
---------------------------------------------------------------------------
ParserError Traceback (most recent call last)
<ipython-input-5-5ed872953186> in <module>
1 my = myiter()
----> 2 for this in my:
3 try:
4 print(this)
5 except:
<ipython-input-3-e8331b7081bf> in myiter()
4 yield i
5 else:
----> 6 raise pandas.errors.ParserError
ParserError:
What is the correct way to handle this? How do I handle a loop error like this gracefully and appropriately?
Something like this seems to work.
my = myiter()
while True:
    try:
        print('trying')
        this = next(my)
        print(this)
    except pandas.errors.ParserError:
        print('ParserError')
        continue
    except StopIteration:
        print('break')
        break
trying
0
trying
ParserError
trying
break
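Applied to the original read_csv case, the same while/next pattern might look roughly like this. This is only a sketch: the file path and chunksize are placeholders, and it breaks out of the file at the first ParserError (you could try continue instead, as in the toy example above, if you want to attempt the chunks after a bad one).

import pandas as pd
import pandas.errors

chunks = []
reader = pd.read_csv("big_corrupt_file.csv", chunksize=100_000)  # placeholder path and chunksize
while True:
    try:
        chunk = next(reader)
    except pandas.errors.ParserError:
        # Corrupt data: give up on the rest of this file
        break
    except StopIteration:
        # Reached the end of the file cleanly
        break
    chunks.append(chunk)

df = pd.concat(chunks) if chunks else pd.DataFrame()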
Related
I am trying to read 4 .txt files delimited by |. As one of them is over 1 GB, I found the 'chunk' method to read them, but I am getting Error tokenizing data. C error: out of memory on the line
df_tradeCash_mhi = pd.concat(chunk_read(mhi_tradeCashFiles, "MHI"))
Does anyone know how I can solve this problem?
Below is my code:
def findmefile(directory, containsInFilename):
    entity_filenames = {}
    for file in os.listdir(directory):
        if containsInFilename in file:
            if file[:5] == "Trade":
                entity_filenames["MHI"] = file
            else:
                entity_filenames[re.findall("(.*?)_", file)[0]] = file
    return entity_filenames

# Get the core Murex file names
mhi_tradeFiles = findmefile(CoreMurexFilesLoc, "Trade")
mhi_tradeCashFiles = findmefile(CoreMurexFilesLoc, "TradeCash_")
mheu_tradeFiles = findmefile(CoreMurexFilesLoc, "MHEU")
mheu_tradeCashFiles = findmefile(CoreMurexFilesLoc, "MHEU_TradeCash")

# Read the csv using chunks
mylist = []
size = 10**2

def chunk_read(fileName, entity):
    for chunk in pd.read_csv(
        CoreMurexFilesLoc + "\\" + fileName[entity],
        delimiter="|",
        low_memory=False,
        chunksize=size,
    ):
        mylist.append(chunk)
    return mylist
df_trade_mhi = pd.concat(chunk_read(mhi_tradeFiles, "MHI"))
df_trade_mheu = pd.concat(chunk_read(mheu_tradeFiles, "MHEU"))
df_tradeCash_mheu = pd.concat(chunk_read(mheu_tradeCashFiles, "MHEU"))
df_tradeCash_mhi = pd.concat(chunk_read(mhi_tradeCashFiles, "MHI"))
df_trades = pd.concat(
[df_trade_mheu, df_trade_mhi, df_tradeCash_mheu, df_tradeCash_mhi]
)
del df_trade_mhi
del df_tradeCash_mhi
del df_trade_mheu
del df_tradeCash_mheu
# Drop any blank fields and duplicates
nan_value = float("NaN")
df_trades.replace("", nan_value, inplace=True)
df_trades.dropna(subset=["MurexCounterpartyRef"], inplace=True)
df_trades.drop_duplicates(subset=["MurexCounterpartyRef"], inplace=True)
counterpartiesList = df_trades["MurexCounterpartyRef"].tolist()
print(colored('All Core Murex trade and tradeCash data loaded.', "green"))
Error:
Traceback (most recent call last):
File "h:\DESKTOP\test_check\check_securityPrices.py", line 52, in <module>
df_tradeCash_mhi = pd.concat(chunk_read(mhi_tradeCashFiles, "MHI"))
File "h:\DESKTOP\test_check\check_securityPrices.py", line 39, in chunk_read
for chunk in pd.read_csv(
File "C:\Users\MIRABR\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers\readers.py", line 1024, in __next__
return self.get_chunk()
File "C:\Users\MIRABR\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers\readers.py", line 1074, in get_chunk
return self.read(nrows=size)
File "C:\Users\MIRABR\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers\readers.py", line 1047, in read
index, columns, col_dict = self._engine.read(nrows)
File "C:\Users\MIRABR\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 228, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 783, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 857, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1925, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: out of memory
I think the problem is obvious - you're running out of memory because you're trying to load so much data into memory at once, and then process it.
You need to either:
get a machine with more memory.
re-architect the solution to a pipelined approach, using a generator or coroutine pipeline to process your data stepwise.
The problem with the first approach is that it won't scale indefinitely and is expensive. The second way is the right way to do it, but needs more coding.
As a good reference on generator/coroutine pipeline approaches, check out any of the PyCon talks by David Beazley.
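As a rough illustration of the second approach, here is a minimal generator-style sketch, not a drop-in solution: it reuses the findmefile results and CoreMurexFilesLoc path from the question, picks an arbitrary chunksize, and assumes that all you ultimately need is the deduplicated MurexCounterpartyRef values, so only one chunk is held in memory at a time.

import pandas as pd

def iter_chunks(files, entity, size=10**5):
    # First stage: yield one chunk at a time instead of appending them all to a list
    for chunk in pd.read_csv(
        CoreMurexFilesLoc + "\\" + files[entity],
        delimiter="|",
        chunksize=size,
    ):
        yield chunk

def iter_counterparty_refs(chunks):
    # Second stage: keep only the column we need from each chunk
    for chunk in chunks:
        refs = chunk["MurexCounterpartyRef"].replace("", float("nan")).dropna()
        yield from refs.tolist()

counterparties = set()  # the set deduplicates, replacing drop_duplicates on one huge frame
for files, entity in [
    (mhi_tradeFiles, "MHI"),
    (mheu_tradeFiles, "MHEU"),
    (mheu_tradeCashFiles, "MHEU"),
    (mhi_tradeCashFiles, "MHI"),
]:
    counterparties.update(iter_counterparty_refs(iter_chunks(files, entity)))

counterpartiesList = list(counterparties)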
I have three CSV files of tweets, each with ~5M tweets. The following code for concatenating them exits with a low memory error. My machine has 32 GB of memory. How can I assign more memory for this task in pandas?
df1 = pd.read_csv('tweets.csv')
df2 = pd.read_csv('tweets2.csv')
df3 = pd.read_csv('tweets3.csv')
frames = [df1, df2, df3]
result = pd.concat(frames)
result.to_csv('tweets_combined.csv')
The error is:
$ python concantenate_dataframes.py
sys:1: DtypeWarning: Columns (0,1,2,3,4,5,6,8,9,10,11,12,13,14,19,22,23,24) have mixed types.Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
File "concantenate_dataframes.py", line 19, in <module>
df2 = pd.read_csv('tweets2.csv')
File "/home/mona/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/mona/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 454, in _read
data = parser.read(nrows)
File "/home/mona/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1133, in read
ret = self._engine.read(nrows)
File "/home/mona/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 2037, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 859, in pandas._libs.parsers.TextReader.read
UPDATE: tried the suggestions in the answer and still get error
$ python concantenate_dataframes.py
Traceback (most recent call last):
File "concantenate_dataframes.py", line 18, in <module>
df1 = pd.read_csv('tweets.csv', low_memory=False, error_bad_lines=False)
File "/home/mona/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/mona/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 454, in _read
data = parser.read(nrows)
File "/home/mona/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1133, in read
ret = self._engine.read(nrows)
File "/home/mona/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 2037, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 862, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 943, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 2070, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
File "pandas/_libs/parsers.pyx", line 874, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 928, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 915, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2070, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
I am running the code on Ubuntu 20.04 OS
I think this is a problem with malformed data (some rows in tweets2.csv are not structured properly). For that you can use error_bad_lines=False and try changing the engine from C to Python, like engine='python'.
ex : df2 = pd.read_csv('tweets2.csv', error_bad_lines=False)
or
ex : df2 = pd.read_csv('tweets2.csv', engine='python')
or maybe
ex : df2 = pd.read_csv('tweets2.csv', engine='python', error_bad_lines=False)
but I recommend identifying those records and repairing them.
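For example, a small sketch to help locate the bad records: adding warn_bad_lines=True (valid in the pandas version your traceback shows; newer pandas replaces both options with on_bad_lines) makes pandas report each line it skips, so you can find and repair them.

import pandas as pd

# error_bad_lines=False drops malformed rows instead of raising;
# warn_bad_lines=True reports each skipped line number so the bad
# records in tweets2.csv can be located and repaired
df2 = pd.read_csv('tweets2.csv', engine='python',
                  error_bad_lines=False, warn_bad_lines=True)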
And also, if you want a hacky way to do this, then use
1) https://askubuntu.com/questions/941480/how-to-merge-multiple-files-of-the-same-format-into-a-single-file
2) https://askubuntu.com/questions/656039/concatenate-multiple-files-without-header
Specify dtype option on import or set low_memory=False
I have a CSV file that has approximately 200+ columns and 1M+ rows. When converting it from CSV to Parquet in Python, I got this error:
import argparse
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

csv_file = 'bigcut.csv'
chunksize = 100_000
parquet_file = 'output.parquet'

parser = argparse.ArgumentParser(description='Process Arguments')
parser.add_argument("--fname", action="store", default="", help="specify <run/update>")
args = parser.parse_args()
argFname = args.__dict__["fname"]
csv_file = argFname

csv_stream = pd.read_csv(csv_file, encoding='utf-8', sep=',', chunksize=chunksize, low_memory=False)
for i, chunk in enumerate(csv_stream):
    print("Chunk", i)
    if i == 0:
        # Infer the schema from the first chunk and open the Parquet writer
        parquet_schema = pa.Table.from_pandas(df=chunk).schema
        parquet_writer = pq.ParquetWriter(parquet_file, parquet_schema, compression='snappy')
    table = pa.Table.from_pandas(chunk, schema=parquet_schema)
    parquet_writer.write_table(table)
parquet_writer.close()
When I run it, it produces the following error:
File "pyconv.py", line 25, in <module>
table = pa.Table.from_pandas(chunk, schema=parquet_schema)
File "pyarrow/table.pxi", line 1217, in pyarrow.lib.Table.from_pandas
File "/home/cloud-user/pydev/py36-venv/lib64/python3.6/site-packages/pyarrow/pandas_compat.py", line 387, in dataframe_to_arrays
convert_types))
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
yield fs.pop().result()
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/cloud-user/pydev/py36-venv/lib64/python3.6/site-packages/pyarrow/pandas_compat.py", line 376, in convert_column
raise e
File "/home/cloud-user/pydev/py36-venv/lib64/python3.6/site-packages/pyarrow/pandas_compat.py", line 370, in convert_column
return pa.array(col, type=ty, from_pandas=True, safe=safe)
File "pyarrow/array.pxi", line 169, in pyarrow.lib.array
File "pyarrow/array.pxi", line 69, in pyarrow.lib._ndarray_to_array
File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ("'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)", 'Conversion failed for column agent_number__c with type float64')
I am new to pandas/pyarrow/Python; any recommendation on what I should do next to debug this would be appreciated.
'utf-32-le' codec can't decode bytes in position 0-3
It looks like a library is trying to decode your data in utf-32-le whereas you read the csv data as utf-8.
So you'll somehow have to tell that reader (pyarrow.lib) to read as utf-8 (I don't know Python/Parquet so I can't provide the exact code to do this).
The CSV has around 3 million records. I managed to catch one potential problem: one of the columns holds string/text data. Most of its values are numeric (e.g. 1000, 230, 400), but a few were entered like 5k, 100k, 29k.
So the code apparently tried to treat the whole column as numeric and did not like the mixed values.
Can you advise?
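One possible workaround for that mixed column, as a sketch only (it assumes the problem column is agent_number__c, the name in your traceback, and keeps the placeholder paths from the question; it is not the answerer's code): force that column to be read as text so pandas doesn't infer float64 for it, and the Arrow schema built from the first chunk then becomes a string column that values like "5k" fit into.

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

csv_file = 'bigcut.csv'          # placeholder path, as in the question
parquet_file = 'output.parquet'
chunksize = 100_000

# Read the partially-numeric column as strings so values like "5k" survive
csv_stream = pd.read_csv(csv_file, sep=',', chunksize=chunksize,
                         dtype={'agent_number__c': str}, low_memory=False)

parquet_writer = None
for i, chunk in enumerate(csv_stream):
    if i == 0:
        # Schema inferred from the first chunk now has agent_number__c as string
        parquet_schema = pa.Table.from_pandas(df=chunk).schema
        parquet_writer = pq.ParquetWriter(parquet_file, parquet_schema, compression='snappy')
    table = pa.Table.from_pandas(chunk, schema=parquet_schema)
    parquet_writer.write_table(table)

if parquet_writer is not None:
    parquet_writer.close()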
The following code works in Python3 but fails in Python2
r = requests.get("http://api.bitcoincharts.com/v1/csv/coinbaseUSD.csv.gz", stream=True)
decompressed_file = gzip.GzipFile(fileobj=r.raw)
data = pd.read_csv(decompressed_file, sep=',')
data.columns = ["timestamp", "price" , "volume"] # set df col headers
return data
The error I get in Python2 is the following:
TypeError: 'int' object has no attribute '__getitem__'
The error is on the line where I set data equal to pd.read_csv(...)
Seems to be a pandas error to me
Stacktrace:
Traceback (most recent call last):
File "fetch.py", line 51, in <module>
print(f.get_historical())
File "fetch.py", line 36, in get_historical
data = pd.read_csv(f, sep=',')
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 709, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 449, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 818, in __init__
self._make_engine(self.engine)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1049, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1695, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 562, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 760, in pandas._libs.parsers.TextReader._get_header
File "pandas/_libs/parsers.pyx", line 965, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2197, in pandas._libs.parsers.raise_parser_error
io.UnsupportedOperation: seek
The issue from the traceback you posted is related to the fact that the Response object's raw attribute is a file-like object that does not support the .seek method that typical file objects support. However, when ingesting the file object with pd.read_csv, pandas (in python2) seems to be making use of the seek method of the provided file object.
You can confirm that the returned response's raw data is not seekable by calling r.raw.seekable(), which should normally return False.
The way to circumvent this issue may be to wrap the returned data into an io.BytesIO object as follows:
import gzip
import io
import pandas as pd
import requests
# file_url = "http://api.bitcoincharts.com/v1/csv/coinbaseUSD.csv.gz"
file_url = "http://api.bitcoincharts.com/v1/csv/aqoinEUR.csv.gz"
r = requests.get(file_url, stream=True)
dfile = gzip.GzipFile(fileobj=io.BytesIO(r.raw.read()))
data = pd.read_csv(dfile, sep=',')
print(data)
0 1 2
0 1314964052 2.60 0.4
1 1316277154 3.75 0.5
2 1316300526 4.00 4.0
3 1316300612 3.80 1.0
4 1316300622 3.75 1.5
As you can see, I used a smaller file from the directory of files available. You can switch this to your desired file.
In any case, io.BytesIO(r.raw.read()) should be seekable, and therefore should help avoid the io.UnsupportedOperation exception you are encountering.
As for the TypeError exception, it does not occur in this snippet of code.
I hope this helps.
I work very often with large libraries like pandas, or matplotlib.
This means that exceptions often produce long stack traces.
Since the error is extremely rarely in the library, and extremely often in my own code, I don't need to see the library details in the vast majority of cases.
A couple of common examples:
Pandas
>>> import pandas as pd
>>> df = pd.DataFrame(dict(a=[1,2,3]))
>>> df['b'] # Hint: there _is_ no 'b'
Here I've attempted to access an unknown key. This simple error produces a stacktrace containing 28 lines:
Traceback (most recent call last):
File "an_arbitrary_python\lib\site-packages\pandas\core\indexes\base.py", line 2393, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)
File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)
File "pandas\_libs\hashtable_class_helper.pxi", line 1207, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)
File "pandas\_libs\hashtable_class_helper.pxi", line 1215, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)
KeyError: 'b'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "an_arbitrary_python\lib\site-packages\pandas\core\frame.py", line 2062, in __getitem__
return self._getitem_column(key)
File "an_arbitrary_python\lib\site-packages\pandas\core\frame.py", line 2069, in _getitem_column
return self._get_item_cache(key)
File "an_arbitrary_python\lib\site-packages\pandas\core\generic.py", line 1534, in _get_item_cache
values = self._data.get(item)
File "an_arbitrary_python\lib\site-packages\pandas\core\internals.py", line 3590, in get
loc = self.items.get_loc(item)
File "an_arbitrary_python\lib\site-packages\pandas\core\indexes\base.py", line 2395, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)
File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)
File "pandas\_libs\hashtable_class_helper.pxi", line 1207, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)
File "pandas\_libs\hashtable_class_helper.pxi", line 1215, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)
KeyError: 'b'
Knowing that I ended up in hashtable_class_helper.pxi is almost never helpful for me. I need to know where in my code I've messed up.
Matplotlib
>>> import matplotlib.pyplot as plt
>>> import matplotlib.cm as cm
>>> def foo():
... plt.plot([1,2,3], cbap=cm.Blues) # cbap is a typo for cmap
...
>>> def bar():
... foo()
...
>>> bar()
This time, there's a typo in my keyword argument. But I still have to see 25 lines of stack trace:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in bar
File "<stdin>", line 2, in foo
File "an_arbitrary_python\lib\site-packages\matplotlib\pyplot.py", line 3317, in plot
ret = ax.plot(*args, **kwargs)
File "an_arbitrary_python\lib\site-packages\matplotlib\__init__.py", line 1897, in inner
return func(ax, *args, **kwargs)
File "an_arbitrary_python\lib\site-packages\matplotlib\axes\_axes.py", line 1406, in plot
for line in self._get_lines(*args, **kwargs):
File "an_arbitrary_python\lib\site-packages\matplotlib\axes\_base.py", line 407, in _grab_next_args
for seg in self._plot_args(remaining, kwargs):
File "an_arbitrary_python\lib\site-packages\matplotlib\axes\_base.py", line 395, in _plot_args
seg = func(x[:, j % ncx], y[:, j % ncy], kw, kwargs)
File "an_arbitrary_python\lib\site-packages\matplotlib\axes\_base.py", line 302, in _makeline
seg = mlines.Line2D(x, y, **kw)
File "an_arbitrary_python\lib\site-packages\matplotlib\lines.py", line 431, in __init__
self.update(kwargs)
File "an_arbitrary_python\lib\site-packages\matplotlib\artist.py", line 885, in update
for k, v in props.items()]
File "an_arbitrary_python\lib\site-packages\matplotlib\artist.py", line 885, in <listcomp>
for k, v in props.items()]
File "an_arbitrary_python\lib\site-packages\matplotlib\artist.py", line 878, in _update_property
raise AttributeError('Unknown property %s' % k)
AttributeError: Unknown property cbap
Here I find out that I ended up on a line in artist.py that raises an AttributeError, and then see directly underneath that an AttributeError was indeed raised. In information terms, that is not much of a value add.
In these trivial, interactive examples, you might just say "Look at the top of the stack trace, not the bottom", but often my foolish typo has occurred within a function so the line I'm interested in is somewhere in the middle of these library-cluttered stack traces.
Is there any way I can make these stack traces less verbose, and help me find the source of the problem, which almost always lies with my own code and not in the libraries I happen to be employing?
You can use traceback to have better control over exception printing. For example:
import pandas as pd
import traceback

try:
    df = pd.DataFrame(dict(a=[1, 2, 3]))
    df['b']
except Exception:
    traceback.print_exc(limit=1)
    exit(1)
This triggers the standard exception printing mechanism, but only shows you the first frame of the stack trace (which is the one you care about in your example). For me this produces:
Traceback (most recent call last):
File "t.py", line 6, in <module>
df['b']
KeyError: 'b'
Obviously you lose the context, which will be important when debugging your own code. If we want to get fancy, we can try and devise a test and see how far the traceback should go. For example:
def find_depth(tb, continue_test):
    depth = 0
    while tb is not None:
        filename = tb.tb_frame.f_code.co_filename
        # Run the test we're given against the filename
        if not continue_test(filename):
            return depth
        tb = tb.tb_next
        depth += 1
I don't know how you're organising and running your code, but perhaps you can then do something like:
import pandas as pd
import traceback
import sys

def find_depth(tb, continue_test):
    # ... code from above here ...

try:
    df = pd.DataFrame(dict(a=[1, 2, 3]))
    df['b']
except Exception:
    traceback.print_exc(limit=find_depth(
        sys.exc_info()[2],
        # The test for which frames we should include
        lambda filename: filename.startswith('my_module')
    ))
    exit(1)
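If you want the same trimming for every uncaught exception rather than wrapping each block in try/except, one further sketch (my addition, not part of the answer above) is to install a filtered sys.excepthook that reuses find_depth; here the filter assumes your own code lives outside site-packages.

import sys
import traceback

def filtered_excepthook(exc_type, exc_value, tb):
    # Count the leading frames that belong to my own code (anything outside site-packages);
    # find_depth returns None if every frame passes, which means "print everything"
    depth = find_depth(tb, lambda filename: 'site-packages' not in filename)
    traceback.print_exception(exc_type, exc_value, tb, limit=depth)

sys.excepthook = filtered_excepthook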