Mysterious IndexError - python

I've been getting errors similar to this recently:
IndexError Traceback (most recent call last)
<ipython-input-124-59ca523b1b36> in <module>()
----> 1 first_experiment_comb(model)
c:\python26\26664\lib\site-packages\experiments.py in first_experiment_comb(model)
172 "Number NZ: " + str(modelz[j].NumNZs) +"\n")
173
--> 174 first_experiment(modelz[j], str(j))
175
176
c:\python26\26664\lib\site-packages\experiments.py in first_experiment(model, ext)
89 plt.close()
90
---> 91 fl.timberFlow(model)
92 plt.savefig(dire + "\\timber_flow" +ext+".pdf", bbox_inches = 0)
93 plt.close()
C:\Python26\26664\lib\site-packages\func_lib.py in timberFlow(model)
304 if not unVars:
305 unVars = varValues(model, 'PIEHTLVOL')
--> 306
307 for i in range(19):
308 swVarVals.append(swVars[i].X)
IndexError: list index out of range
The final line of the trace points to code that doesn't exist, or in previous cases to code that had been commented out. When I run the last function (in func_lib.py) on its own I never get the mysterious IndexError; it only happens when it's called from experiments.py.
I'm running this in PyLab on Python 2.6, 64-bit Windows.
I haven't been able to find a known bug about this in the IPython or PyLab docs.
How could line 306 be the root of the error?

Your code is out of sync with the bytecode. Reload your code properly.
When an exception occurs, the bytecode is inspected for a filename and a line number, and then the source file is loaded to show the original source for that line.
If, however, you changed the source but did not yet restart your Python process (or reload the code in IPython), then the wrong lines are shown when an exception occurs.
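For example, after editing func_lib.py you can force a reload in the running session (a minimal sketch; on Python 2.x, as used here, reload is a builtin, while Python 3 needs importlib.reload):
import func_lib
reload(func_lib)   # Python 2.x builtin; on Python 3: import importlib; importlib.reload(func_lib)
In IPython you can also enable automatic reloading with the autoreload extension:
%load_ext autoreload
%autoreload 2
The simplest and most reliable fix, though, is to restart the interpreter so that bytecode and source are guaranteed to match.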

Related

df_data not defined, unsure of the cause

I'm working through a tutorial that is supposed to help students do the assignment, but I'm encountering a problem. I'm using Python in a notebook project on IBM. Right now the section is simply data exploration, but this error is occurring and I'm not sure how to fix it. No one else in the class seemed to have this problem, and the teacher is rather slow to help, so I came here!
I tried just defining the variable before it's called, but no dice either way.
All the code prior to this just imports libraries and then parses the data:
# Infer the data type of each column and convert the data to the inferred data type
from ingest import *
eu = ExtensionUtils(sqlContext)
df_data_1 = eu.convertTypes(df_data_1)
df_data_1.printSchema()
the error I'm getting is
TypeError Traceback (most recent call last)
<ipython-input-14-33250ae79106> in <module>()
2 from ingest import *
3 eu = ExtensionUtils(sqlContext)
----> 4 df_data_1 = eu.convertTypes(df_data_1)
5 df_data_1.printSchema()
/opt/ibm/third-party/libs/python3/ingest/extension_utils.py in convertTypes(self, input_obj, dictVal)
304 """
305
--> 306 checkEnrichType_or_DataFrame("input_obj",input_obj)
307 self.logger = self._jLogger.getLogger(__name__)
308 methodname = str(inspect.stack()[0][3])
/opt/ibm/third-party/libs/python3/ingest/extension_utils.py in checkEnrichType_or_DataFrame(param, paramval)
81 if not isinstance(paramval,(EnrichType ,DataFrame)):
82 raise TypeError("%s should be a EnrichType class object or DataFrame, got type %s"
---> 83 % (str(param), type(paramval)))
84
85
TypeError: input_obj should be a EnrichType class object or DataFrame, got type <class 'NoneType'>
The solution was not in the code itself but in the notebook: a code snippet from a built-in function needed to be inserted before this cell, so that df_data_1 is actually defined rather than None.
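For illustration, a hypothetical stand-in for that generated cell (the real one is produced by the notebook and contains project-specific paths and credentials); the key point is that df_data_1 must hold an actual DataFrame before eu.convertTypes(df_data_1) runs:
# placeholder for the notebook-generated cell; the file name is made up
df_data_1 = sqlContext.read.format("csv").option("header", "true").load("my_data.csv")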

numpy.where produces inconsistent results

I have a piece of code where I need to look for an index of a value in a numpy array.
For this task, I use numpy.where.
The problem is that numpy.where produces a wrong result, i.e. returns an empty array, in situations where I am certain that the searched value is in the array.
To make things worse, I tested with a for loop that the element really is in the array, and when it is found, I also look for it with numpy.where.
Oddly enough, it then finds a result, while literally a line later, it doesn't.
Here is what the code looks like:
# progenitors, descendants and progenitor_outputnrs are 2D arrays that are filled from reading in files.
# outputnrs is a 1D array.
ozi = 0
for i in range(descendants[ozi].shape[0]):
    if descendants[ozi][i] > 0:
        if progenitors[ozi][i] < 0:
            oind = outputnrs[0] - progenitor_outputnrs[ozi][i] - 1
            print "looking for prog", progenitors[ozi][i], "with outputnr", progenitor_outputnrs[ozi][i], "in", outputnrs[oind]
            for p in progenitors[oind]:
                if p == -progenitors[ozi][i]:
                    # the following line works...
                    print "found", p, np.where(progenitors[oind]==-progenitors[ozi][i])[0][0]
                    # the following line doesn't!
                    iind = np.where(progenitors[oind]==-progenitors[ozi][i])[0][0]
I get the output:
looking for prog -76 with outputnr 65 in 66
found 76 79
looking for prog -2781 with outputnr 65 in 66
found 2781 161
looking for prog -3797 with outputnr 63 in 64
found 3797 163
looking for prog -3046 with outputnr 65 in 66
found 3046 163
looking for prog -6488 with outputnr 65 in 66
found 6488 306
Traceback (most recent call last):
File "script.py", line 1243, in <module>
main()
File "script.py", line 974, in main
iind = np.where(progenitors[oind]==-progenitors[out][i])[0][0]
IndexError: index 0 is out of bounds for axis 0 with size 0
I use python 2.7.12 and numpy 1.14.2.
Does anyone have an idea why this is happening?
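For reference, np.where with a single condition returns a tuple of index arrays, and the first element is empty when nothing matches; indexing [0][0] into an empty result raises exactly this IndexError. A minimal reproduction (array contents are illustrative):
import numpy as np
arr = np.array([10, 20, 30])
print(np.where(arr == 20)[0][0])   # 1 -- a match exists
print(np.where(arr == 99))         # (array([], dtype=int64),) -- no match
np.where(arr == 99)[0][0]          # IndexError: index 0 is out of bounds for axis 0 with size 0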

Gensim Summarizer throws MemoryError, Any Solution?

I am trying to generate the summary of a large text file using Gensim Summarizer.
I am getting a MemoryError. I have been facing this issue for some time; any help would be really appreciated. Feel free to ask for more details.
from gensim.summarization.summarizer import summarize

file_read = open("xxxxx.txt", 'r')
Content = file_read.read()

def Summary_gen(content):
    print(len(Content))
    summary_r = summarize(Content, ratio=0.02)
    print(summary_r)

Summary_gen(Content)
The length of the document is:
365042
Error message:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-6-a91bd71076d1> in <module>()
10
11
---> 12 Summary_gen(Content)
<ipython-input-6-a91bd71076d1> in Summary_gen(content)
6 def Summary_gen(content):
7 print(len(Content))
----> 8 summary_r=summarize(Content,ratio=0.02)
9 print(summary_r)
10
c:\python3.6\lib\site-packages\gensim\summarization\summarizer.py in summarize(text, ratio, word_count, split)
428 corpus = _build_corpus(sentences)
429
--> 430 most_important_docs = summarize_corpus(corpus, ratio=ratio if word_count is None else 1)
431
432 # If couldn't get important docs, the algorithm ends.
c:\python3.6\lib\site-packages\gensim\summarization\summarizer.py in summarize_corpus(corpus, ratio)
367 return []
368
--> 369 pagerank_scores = _pagerank(graph)
370
371 hashable_corpus.sort(key=lambda doc: pagerank_scores.get(doc, 0), reverse=True)
c:\python3.6\lib\site-packages\gensim\summarization\pagerank_weighted.py in pagerank_weighted(graph, damping)
57
58 """
---> 59 adjacency_matrix = build_adjacency_matrix(graph)
60 probability_matrix = build_probability_matrix(graph)
61
c:\python3.6\lib\site-packages\gensim\summarization\pagerank_weighted.py in build_adjacency_matrix(graph)
92 neighbors_sum = sum(graph.edge_weight((current_node, neighbor)) for neighbor in graph.neighbors(current_node))
93 for j in xrange(length):
---> 94 edge_weight = float(graph.edge_weight((current_node, nodes[j])))
95 if i != j and edge_weight != 0.0:
96 row.append(i)
c:\python3.6\lib\site-packages\gensim\summarization\graph.py in edge_weight(self, edge)
255
256 """
--> 257 return self.get_edge_properties(edge).setdefault(self.WEIGHT_ATTRIBUTE_NAME, self.DEFAULT_WEIGHT)
258
259 def neighbors(self, node):
c:\python3.6\lib\site-packages\gensim\summarization\graph.py in get_edge_properties(self, edge)
404
405 """
--> 406 return self.edge_properties.setdefault(edge, {})
407
408 def add_edge_attributes(self, edge, attrs):
MemoryError:
I have tried looking up this error on the internet, but couldn't find a workable solution.
From the logs, it looks like the code builds an adjacency matrix
---> 59 adjacency_matrix = build_adjacency_matrix(graph)
This probably tries to create a huge adjacency matrix for your document of length 365042, which cannot fit in your memory (i.e., RAM).
You could try:
- Reducing the document to a smaller size (maybe start with 10000) and checking if it works
- Running it on a system with more RAM
Did you try using the word_count argument instead of ratio? (See the sketch below.)
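For instance, a minimal sketch capping the absolute summary length rather than a fraction of the input (300 words is an arbitrary choice):
summary_r = summarize(Content, word_count=300)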
If the above still doesn't solve the problem, that's because of gensim's implementation limitations. The only way to use gensim if you still get OOM errors is to split the document. That will also speed up your solution (and if the document is really big, it shouldn't hurt the summary anyway).
What's the problem with summarize:
gensim's summarizer uses TextRank by default, an algorithm based on PageRank. In gensim it is unfortunately implemented using a Python list of PageRank graph nodes, so it may fail if your graph is too big.
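A rough sketch of the splitting approach (summarize_in_chunks and chunk_chars are made-up names, and a real implementation should split on sentence boundaries rather than raw character offsets):
from gensim.summarization.summarizer import summarize

def summarize_in_chunks(text, chunk_chars=50000, ratio=0.02):
    # Cut the document into pieces small enough for TextRank's graph,
    # summarize each piece, then join the partial summaries.
    # Note: summarize() expects each chunk to contain several sentences.
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    return "\n".join(summarize(chunk, ratio=ratio) for chunk in chunks if chunk.strip())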
BTW, is the document length measured in words or characters?

"Read_Ncol" exit with error code -1073740791

I am using python 3.5.3 and igraph 0.7.1.
Why does the following code finish with a "Process finished with exit code -1073740791 (0xC0000409)" error message?
from igraph import Graph
g = Graph.Read_Ncol('test.csv', directed=False)
test.csv
119 205
119 625
124 133
124 764
124 813
55 86
55 205
55 598
133 764
The Read_Ncol function reads files in NCOL format, as produced by the Large Graph Layout program.
Your example works fine for me, also on Python 3.5.3 with igraph 0.7.1.
>>> g = Graph.Read_Ncol('test.csv', directed=False)
>>> g
<igraph.Graph object at 0x10c4844f8>
>>> print(g)
IGRAPH UN-- 10 9 --
+ attr: name (v)
+ edges (vertex names):
119--205, 119--625, 124--133, 124--764, 124--813, 55--86, 205--55, 55--598,
133--764
It seems the error C0000409 means "Stack Buffer Overrun" on Windows, which probably means that your program is writing outside of the space allocated on the stack (it's different from a stack overflow, according to this Microsoft TechNet blog).

Python memory error in sympy.simplify

I'm using 64-bit Python 3.3.1 with 32 GB RAM, and this function to generate the target expression 1+1/(2+1/(2+1/...)):
def sqrt2Expansion(limit):
    term = "1+1/2"
    for _ in range(limit):
        i = term.rfind('2')
        term = term[:i] + '(2+1/2)' + term[i+1:]
    return term
I'm getting a MemoryError when calling:
simplify(sqrt2Expansion(100))
Shorter expressions work fine, e.g.:
simplify(sqrt2Expansion(50))
Is there a way to configure SymPy to complete this calculation? Below is the error message:
MemoryError Traceback (most recent call last)
<ipython-input-90-07c1e2de29d1> in <module>()
----> 1 simplify(sqrt2Expansion(100))
C:\Python33\lib\site-packages\sympy\simplify\simplify.py in simplify(expr, ratio, measure)
2878 from sympy.functions.special.bessel import BesselBase
2879
-> 2880 original_expr = expr = sympify(expr)
2881
2882 expr = signsimp(expr)
C:\Python33\lib\site-packages\sympy\core\sympify.py in sympify(a, locals, convert_xor, strict, rational)
176 try:
177 a = a.replace('\n', '')
--> 178 expr = parse_expr(a, locals or {}, rational, convert_xor)
179 except (TokenError, SyntaxError):
180 raise SympifyError('could not parse %r' % a)
C:\Python33\lib\site-packages\sympy\parsing\sympy_parser.py in parse_expr(s, local_dict, rationalize, convert_xor)
161
162 code = _transform(s.strip(), local_dict, global_dict, rationalize, convert_xor)
--> 163 expr = eval(code, global_dict, local_dict) # take local objects in preference
164
165 if not hit:
MemoryError:
EDIT:
I wrote a version using sympy expressions instead of strings:
def sqrt2Expansion(limit):
    x = Symbol('x')
    term = 1 + 1/x
    for _ in range(limit):
        term = term.subs({x: 2 + 1/x})
    return term.subs({x: 2})
It runs better: sqrt2Expansion(100) returns a valid result, but sqrt2Expansion(200) produces a RuntimeError with many pages of traceback and hangs the IPython interpreter, with plenty of system memory left unused. I created a new question, Long expression crashes SymPy, for this issue.
SymPy is using eval along the path to turn your string into a SymPy object, and eval uses the built-in Python parser, which has a maximum limit. This isn't really a SymPy issue.
For example, for me:
>>> eval("("*100+'3'+")"*100)
s_push: parser stack overflow
Traceback (most recent call last):
File "<ipython-input-46-1ce3bf24ce9d>", line 1, in <module>
eval("("*100+'3'+")"*100)
MemoryError
Short of modifying MAXSTACK in Parser.h and recompiling Python with a different limit, probably the best way to get where you're headed is to avoid using strings in the first place. [I should mention that the PyPy interpreter can make it up to ~1100 for me.]
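One way to stay off the string path entirely is to build the continued fraction bottom-up with exact Rational arithmetic, which never touches the parser and needs no simplify call (a sketch; sqrt2_expansion is a made-up name, not the asker's function):
from sympy import Rational

def sqrt2_expansion(limit):
    # Evaluate 1 + 1/(2 + 1/(2 + ...)) from the innermost term outward.
    term = Rational(2)
    for _ in range(limit):
        term = 2 + 1/term
    return 1 + 1/term

print(sqrt2_expansion(100).evalf())   # ~1.414213562..., converging to sqrt(2)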
