Python memory error in sympy.simplify

I'm using 64-bit Python 3.3.1 with 32 GB of RAM, and this function to generate the target expression 1+1/(2+1/(2+1/...)):
def sqrt2Expansion(limit):
    term = "1+1/2"
    for _ in range(limit):
        i = term.rfind('2')
        term = term[:i] + '(2+1/2)' + term[i+1:]
    return term
I'm getting MemoryError when calling:
simplify(sqrt2Expansion(100))
Shorter expressions work fine, e.g.:
simplify(sqrt2Expansion(50))
Is there a way to configure SymPy to complete this calculation? Below is the error message:
MemoryError Traceback (most recent call last)
<ipython-input-90-07c1e2de29d1> in <module>()
----> 1 simplify(sqrt2Expansion(100))
C:\Python33\lib\site-packages\sympy\simplify\simplify.py in simplify(expr, ratio, measure)
2878 from sympy.functions.special.bessel import BesselBase
2879
-> 2880 original_expr = expr = sympify(expr)
2881
2882 expr = signsimp(expr)
C:\Python33\lib\site-packages\sympy\core\sympify.py in sympify(a, locals, convert_xor, strict, rational)
176 try:
177 a = a.replace('\n', '')
--> 178 expr = parse_expr(a, locals or {}, rational, convert_xor)
179 except (TokenError, SyntaxError):
180 raise SympifyError('could not parse %r' % a)
C:\Python33\lib\site-packages\sympy\parsing\sympy_parser.py in parse_expr(s, local_dict, rationalize, convert_xor)
161
162 code = _transform(s.strip(), local_dict, global_dict, rationalize, convert_xor)
--> 163 expr = eval(code, global_dict, local_dict) # take local objects in preference
164
165 if not hit:
MemoryError:
EDIT:
I wrote a version using SymPy expressions instead of strings:
from sympy import Symbol
def sqrt2Expansion(limit):
    x = Symbol('x')
    term = 1+1/x
    for _ in range(limit):
        term = term.subs({x: (2+1/x)})
    return term.subs({x: 2})
It runs better: sqrt2Expansion(100) returns a valid result, but sqrt2Expansion(200) produces a RuntimeError with many pages of traceback and hangs the IPython interpreter, with plenty of system memory left unused. I created a new question, "Long expression crashes SymPy", for this issue.

SymPy uses eval along the way to turn your string into a SymPy object, and eval relies on the built-in Python parser, which has a hard limit on how deeply an expression can be nested. This isn't really a SymPy issue.
For example, for me:
>>> eval("("*100+'3'+")"*100)
s_push: parser stack overflow
Traceback (most recent call last):
File "<ipython-input-46-1ce3bf24ce9d>", line 1, in <module>
eval("("*100+'3'+")"*100)
MemoryError
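To see that the failure tracks nesting depth rather than string length, here is a small standard-library check. max_paren_depth is a helper introduced here for illustration, not part of SymPy. The question's builder adds exactly one level of parentheses per iteration, so sqrt2Expansion(100) hands the parser 100 nested levels, the same depth as the eval demo above.

```python
def sqrt2Expansion(limit):
    # The question's string builder: each pass wraps the last '2' in "(2+1/2)".
    term = "1+1/2"
    for _ in range(limit):
        i = term.rfind('2')
        term = term[:i] + '(2+1/2)' + term[i + 1:]
    return term


def max_paren_depth(s):
    """Return the deepest level of parenthesis nesting in s."""
    depth = deepest = 0
    for ch in s:
        if ch == '(':
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ')':
            depth -= 1
    return deepest


for n in (1, 50, 100):
    print(n, max_paren_depth(sqrt2Expansion(n)))  # depth equals n
```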
Short of modifying MAXSTACK in Parser.h and recompiling Python with a different limit, probably the best way to get where you're headed is to avoid using strings in the first place. [I should mention that the PyPy interpreter can make it up to ~1100 for me.]
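Here is a minimal sketch of the "avoid strings" route. It assumes an exact rational value is what you're after, and it evaluates the continued fraction bottom-up, from the innermost 2 outward, so no deeply nested expression ever exists. It uses the standard library's fractions.Fraction as a stand-in. SymPy's Integer/Rational arithmetic also evaluates eagerly, so starting from sympy.Integer(2) instead of Fraction(2) should yield a SymPy Rational in the same way.

```python
from fractions import Fraction
import math


def sqrt2_convergent(limit):
    """Value of the string sqrt2Expansion(limit) builds, computed without parsing."""
    t = Fraction(2)            # innermost 2
    for _ in range(limit):
        t = 2 + 1 / t          # wrap one more (2+1/...) level; stays an exact Fraction
    return 1 + 1 / t


for n in (0, 1, 2):
    print(n, sqrt2_convergent(n))   # 0 3/2, 1 7/5, 2 17/12

x = sqrt2_convergent(100)
print(abs(float(x) - math.sqrt(2)))  # the convergents approach sqrt(2)
```

Because nothing is nested, larger limits such as 200 are just a longer loop. The subs-based version in the EDIT, by contrast, keeps a symbol buried one level deeper per iteration.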

Related

Using Sympy sympify on a black-box numerical function

The overall problem that I am trying to solve is to develop code which accepts string equations from user input or files, parses the equations, and solves them given a valid set of known values for the variables. The approach must allow the user to enter a thermophysical function (such as CoolProp's PropsSI or HAPropsSI) in the equation(s), and ideally any user-defined function or object. Based on initial work, I thought SymPy was the way to go.
Therefore, I have been trying to understand how to sympify a numerical function for use in systems of equations in SymPy.
The function is HAPropsSI from the CoolProp library. The CoolProp functions are implemented in C++ and wrapped for use in Python. They are not built on NumPy per se, but are vectorized to accept 1D NumPy arrays in addition to ints, floats, and lists.
Here is an example of what I tried:
from CoolProp.HumidAirProp import HAPropsSI
from sympy import symbols, sympify
# Example calculating enthalpy as a function of temp., pressure, % RH:
T = 298.15
P = 101325
RH = 0.5
h = HAPropsSI("H", "T", T, "P", P, "R", RH)
print(h) # returns the float value h = 50423.45
# Example using Sympy:
Temp, Press, RH = symbols('Temp Press RH')
sym_h = sympify('HAPropsSI("H", "T", Temp, "P", Press, "R", RH)', {'HAPropsSI':HAPropsSI})
sympify parses the expression and then calls eval on the function with Symbol arguments, which results in the following traceback:
ValueError Traceback (most recent call last)
ValueError: Error from parse_expr with transformed code: 'HAPropsSI ("H","T",Symbol (\'Temp\' ),"P",Symbol (\'Press\' ),"R",Symbol (\'RH\' ))'
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
C:\Users\JIMCAR~1\AppData\Local\Temp/ipykernel_3076/1321321868.py in <module>
12
13 Temp, Press, RH = symbols('Temp Press RH')
---> 14 sym_h = sympify('HAPropsSI("H", "T", Temp, "P", Press, "R", RH)', {'HAPropsSI':HAPropsSI})
15
16 '''
~\AppData\Roaming\Python\Python38\site-packages\sympy\core\sympify.py in sympify(a, locals, convert_xor, strict, rational, evaluate)
470 try:
471 a = a.replace('\n', '')
--> 472 expr = parse_expr(a, local_dict=locals, transformations=transformations, evaluate=evaluate)
473 except (TokenError, SyntaxError) as exc:
474 raise SympifyError('could not parse %r' % a, exc)
~\AppData\Roaming\Python\Python38\site-packages\sympy\parsing\sympy_parser.py in parse_expr(s, local_dict, transformations, global_dict, evaluate)
1024 for i in local_dict.pop(None, ()):
1025 local_dict[i] = None
-> 1026 raise e from ValueError(f"Error from parse_expr with transformed code: {code!r}")
1027
1028
~\AppData\Roaming\Python\Python38\site-packages\sympy\parsing\sympy_parser.py in parse_expr(s, local_dict, transformations, global_dict, evaluate)
1015
1016 try:
-> 1017 rv = eval_expr(code, local_dict, global_dict)
1018 # restore neutral definitions for names
1019 for i in local_dict.pop(None, ()):
~\AppData\Roaming\Python\Python38\site-packages\sympy\parsing\sympy_parser.py in eval_expr(code, local_dict, global_dict)
909 Generally, ``parse_expr`` should be used.
910 """
--> 911 expr = eval(
912 code, global_dict, local_dict) # take local objects in preference
913 return expr
<string> in <module>
CoolProp\HumidAirProp.pyx in CoolProp.CoolProp.HAPropsSI()
CoolProp\HumidAirProp.pyx in CoolProp.CoolProp.HAPropsSI()
TypeError: Numerical inputs to HAPropsSI must be ints, floats, lists, or 1D numpy arrays.
An example application would be to create an equation and solve for an unknown (Press, Temp, or RH) given the value of h:
eqn = Eq(sym_h, 50423.45)
nsolve(eqn, Press, 1e5)
What I am trying to accomplish is not so different from:
Python: Using sympy.sympify to perform a safe eval() on mathematical functions
Though I admit I am unclear on the details of the subclassing.
Thanks for any insights.
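For the nsolve use case in the question, one way around sympify is to skip the symbolic layer for the black-box function and solve numerically, passing it only floats. Below is a minimal sketch, assuming a single unknown. solve_scalar is a plain secant iteration written for illustration, and fake_enthalpy is a hypothetical stand-in for HAPropsSI (not real psychrometrics) so the block runs without CoolProp.

```python
def solve_scalar(func, target, x0, x1, tol=1e-10, max_iter=100):
    """Find x with func(x) == target by the secant method.

    func is treated as a black box: it only ever receives plain floats,
    never SymPy symbols, so a compiled numeric routine is fine here.
    """
    f0 = func(x0) - target
    f1 = func(x1) - target
    for _ in range(max_iter):
        if f1 == 0:
            return x1
        if f1 == f0:
            raise ZeroDivisionError("secant step undefined: equal residuals")
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) <= tol * max(1.0, abs(x2)):
            return x2
        x0, f0 = x1, f1
        x1, f1 = x2, func(x2) - target
    raise RuntimeError("secant method did not converge")


# Hypothetical stand-in for a black-box property function (NOT real psychrometrics).
def fake_enthalpy(T, P, RH):
    return 1000.0 * (T - 273.15) + 2e-6 * P ** 2 + 100.0 * RH


T, RH = 298.15, 0.5
h_target = fake_enthalpy(T, 101325.0, RH)
P = solve_scalar(lambda p: fake_enthalpy(T, p, RH), h_target, 9.0e4, 1.1e5)
print(P)  # close to 101325
```

With CoolProp installed, the equivalent call would presumably be solve_scalar(lambda p: HAPropsSI("H", "T", 298.15, "P", p, "R", 0.5), 50423.45, 9.0e4, 1.1e5). scipy.optimize.brentq or scipy.optimize.newton are library alternatives to the hand-written loop.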

SageMath: Why doesn't SageMath give the line number for TypeErrors? Is there a way to trace the actual line number?

Using SageMath 9.2 on Windows 10
a.sage
i = 10
print("hello " + i)
sage: load("a.sage")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
----> 1 load("a.sage")
/opt/sagemath-9.2/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2558)()
141
142 if sage.repl.load.is_loadable_filename(filename):
--> 143 sage.repl.load.load(filename, globals())
144 return
145
/opt/sagemath-9.2/local/lib/python3.7/site-packages/sage/repl/load.py in load(filename, globals, attach)
270 add_attached_file(fpath)
271 with open(fpath) as f:
--> 272 exec(preparse_file(f.read()) + "\n", globals)
273 elif ext == '.spyx' or ext == '.pyx':
274 if attach:
in
/opt/sagemath-9.2/local/lib/python3.7/site-packages/sage/rings/integer.pyx in sage.rings.integer.Integer.add (build/cythonized/sage/rings/integer.c:12447)()
1785 return y
1786
-> 1787 return coercion_model.bin_op(left, right, operator.add)
1788
1789 cpdef add(self, right):
/opt/sagemath-9.2/local/lib/python3.7/site-packages/sage/structure/coerce.pyx in sage.structure.coerce.CoercionModel.bin_op (build/cythonized/sage/structure/coerce.c:11304)()
1246 # We should really include the underlying error.
1247 # This causes so much headache.
-> 1248 raise bin_op_exception(op, x, y)
1249
1250 cpdef canonical_coercion(self, x, y):
TypeError: unsupported operand parent(s) for +: '<class 'str'>' and 'Integer Ring'
For many other types of errors, SageMath does give the line number where the error happened, but for TypeErrors I usually don't see it.
This is a big problem in longer programs, especially with more complicated datatypes: it's quite difficult to track down the line causing the problem.
What are the different kinds of errors where this happens?
Is there a simple way to track the line number? (I use a rather long way.)
If you use %attach a.sage instead, it will print line numbers. The line numbers are for the preparsed version of the file, but you can perhaps extract enough information from that. Here is what I see:
sage: %attach /Users/palmieri/Desktop/a.sage
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-a6e4524362f6> in <module>
----> 1 get_ipython().run_line_magic('attach', '/Users/palmieri/Desktop/a.sage')
[snip]
~/.sage/temp/John-iMac-2017.local/34847/a.sage5dnlgxa9.py in <module>
5 _sage_const_10 = Integer(10)
6 i = _sage_const_10
----> 7 print("hello " + i)
[snip]
TypeError: unsupported operand parent(s) for +: '<class 'str'>' and 'Integer Ring'
%attach also has the feature that whenever the file is changed, it automatically gets reloaded.
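The same idea works in plain Python if you execute the source yourself: compile it with a filename, and the traceback frames carry line numbers for that source. Below is a minimal sketch in plain Python. It is not a Sage API, and run_and_locate is a hypothetical helper. In Sage, the text would first have to go through the preparser, which load does with preparse_file as the traceback above shows, so the reported line would refer to the preparsed text, just as with %attach.

```python
import traceback


def run_and_locate(source, filename="<preparsed>"):
    """Execute source; if it raises, return (exception name, line number in source)."""
    code = compile(source, filename, "exec")  # the filename tags frames from this code
    try:
        exec(code, {})
    except Exception as exc:
        # Keep only traceback entries that belong to the executed source, then
        # take the innermost one: that is the line that raised.
        frames = [fs for fs in traceback.extract_tb(exc.__traceback__)
                  if fs.filename == filename]
        return type(exc).__name__, frames[-1].lineno
    return None


src = 'i = 10\nprint("hello " + i)\n'
print(run_and_locate(src))  # ('TypeError', 2)
```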

Python: determinant, matrix, can't convert expression to float

I'm writing a short program which should find the value for which the real and imaginary parts of a function are both zero. I don't understand why I get "can't convert expression to float" after running the program. (Please forgive my messiness while writing the code!) I cut off the definitions of symbols a11-a88 to save you reading, but all of them are of the form A*cmath.exp(b*x), A1*cmath.exp(1.0j*b1*x) or 1.0j*A2*cmath.exp(1.0j*b2*x). I consistently use the cmath functions instead of math (cmath.exp, not exp, and cmath.sqrt, not sqrt).
import sys
import math
from scipy import *
from numpy.linalg import *
from sympy import *
import numpy
from sympy.solvers import solve
import cmath
from scipy import optimize
plik=open('solution_e-.txt','w')
#I cut-off definitions of symbols a11-a88.
Det =((a77*a88+(-1.0)*a78*a87)*(a44*a55*a66+a45*a56*a64)+(a76*a88+(-1.0)*a78*a86)*(a44*a57*a65+a45*a54*a67))*(a11*(a22*a33+(-1.0)*a23*a32)+a21*(a13*a32+(-1.0)*a12*a33))+((a77*a88+(-1.0)*a78*a87)*(a34*a56*a65+a35*a54*a66)+(a76*a88+(-1.0)*a78*a86)*(a34*a55*a67+a35*a57*a64))*(a11*(a22*a43+(-1.0)*a23*a42)+a21*(a13*a42+(-1.0)*a12*a43))+((a77*a88+(-1.0)*a78*a87)*(a44*a56*a65+a45*a54*a66)+(a76*a88+(-1.0)*a78*a86)*(a44*a55*a67+a45*a57*a64))*(a11*(a23*a32+(-1.0)*a22*a33)+a21*(a12*a33+(-1.0)*a13*a32))+((a77*a88+(-1.0)*a78*a87)*(a34*a55*a66+a35*a56*a64)+(a76*a88+(-1.0)*a78*a86)*(a34*a57*a65+a35*a54*a67))*(a11*(a23*a42+(-1.0)*a22*a43)+a21*(a12*a43+(-1.0)*a13*a42))
equat = Det.real + Det.imag
for i in range (76500,76550,1):
    n=i/100000.0
    equat_lam = lambdify(x,equat)
    Solut = optimize.fsolve(equat_lam, n)
    plik.write(str(float(Solut))+'\n')
    print n
plik.close()
Edit: full traceback of the error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
C:\Anaconda\lib\site-packages\IPython\utils\py3compat.pyc in execfile(fname, glob, loc)
195 else:
196 filename = fname
--> 197 exec compile(scripttext, filename, 'exec') in glob, loc
198 else:
199 def execfile(fname, *where):
C:\Users\Melania\Documents\doktorat\2017\analiza\Próbka I\poziomy_en\rozwiazanie_elektrony.py in <module>()
33 print 'I defined other symbols'
34
---> 35 k1=(cmath.sqrt(2.0*(V1-x)*m))/hkr
36 k2=(cmath.sqrt(2.0*(V2-x)*m))/hkr
37 k3=(cmath.sqrt(2.0*x*m))/hkr
C:\Anaconda\lib\site-packages\sympy\core\expr.pyc in __complex__(self)
210 result = self.evalf()
211 re, im = result.as_real_imag()
--> 212 return complex(float(re), float(im))
213
214 #_sympifyit('other', False) # sympy > other
C:\Anaconda\lib\site-packages\sympy\core\expr.pyc in __float__(self)
205 if result.is_number and result.as_real_imag()[1]:
206 raise TypeError("can't convert complex to float")
--> 207 raise TypeError("can't convert expression to float")
208
209 def __complex__(self):
TypeError: can't convert expression to float
The trace starts with
C:\Users\...
k1=(cmath.sqrt(2.0*(V1-x)*m))/hkr
and at the end you see a
TypeError: can't convert expression to float
raised by SymPy's expr.__float__, which was called by expr.__complex__. From this one can deduce that the expression 2.0*(V1-x)*m cannot be converted to a complex number; typically this happens because it contains a free symbol.
If you want to compute the square root numerically, you must substitute numerical values for all the symbols in the argument of cmath.sqrt, keeping in mind that every term (e.g. V1) may itself be a symbolic expression containing a large number of symbols.
That said, if you want to "find value for which real and imaginary part of function are both zero", you apparently shouldn't write equat = Det.real + Det.imag: a zero sum only means the real part equals minus the imaginary part, not that both are zero.
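Here is a minimal numeric sketch of both points. toy_det is a hypothetical stand-in for Det(x) with every symbol already replaced by a number; it is not the poster's determinant. The sketch first shows why Det.real + Det.imag is the wrong target. Then, as one possible alternative (a hand-written golden-section search, not what the question used), it minimizes |Det(x)| over the question's scan range, since |Det| is zero only where both parts vanish. For the real determinant of a real x, |Det| may not reach exactly zero, and the minimum value then shows how close it gets.

```python
import cmath
import math

# Real part + imaginary part can vanish while the number itself does not:
z = 1 - 1j
print(z.real + z.imag, abs(z))  # 0.0 1.414...


def toy_det(x):
    """Hypothetical stand-in for Det(x), fully numeric (no SymPy symbols left)."""
    return (x - 0.7653) * cmath.exp(1j * 3.0 * x)


def golden_min(f, a, b, tol=1e-12):
    """Golden-section search for the minimum of a unimodal f on [a, b]."""
    invphi = (math.sqrt(5) - 1) / 2
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2


# Scan range taken from the question's loop: 76500/100000 .. 76550/100000.
x_min = golden_min(lambda x: abs(toy_det(x)), 0.765, 0.7655)
print(x_min, abs(toy_det(x_min)))
```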

Using sframe.apply() causing runtime error

I am trying to use a simple apply on an SFrame full of data. It's a simple data transform on one of the columns, applying a function that takes a text input, splits it into words and counts them. Here is the function and its call/output:
In [1]: def count_words(txt):
            count = Counter()
            for word in txt.split():
                count[word]+=1
            return count
In [2]: products.apply(lambda x: count_words(x['review']))
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-8-85338326302c> in <module>()
----> 1 products.apply(lambda x: count_words(x['review']))
C:\Anaconda3\envs\dato-env\lib\site-packages\graphlab\data_structures\sframe.pyc in apply(self, fn, dtype, seed)
2607
2608 with cython_context():
-> 2609 return SArray(_proxy=self.__proxy__.transform(fn, dtype, seed))
2610
2611 def flat_map(self, column_names, fn, column_types='auto', seed=None):
C:\Anaconda3\envs\dato-env\lib\site-packages\graphlab\cython\context.pyc in __exit__(self, exc_type, exc_value, traceback)
47 if not self.show_cython_trace:
48 # To hide cython trace, we re-raise from here
---> 49 raise exc_type(exc_value)
50 else:
51 # To show the full trace, we do nothing and let exception propagate
RuntimeError: Runtime Exception. Unable to evaluate lambdas. Lambda workers did not start.
When I run my code I get that error. The SFrame (df) is only 10 by 2, so there shouldn't be any overload coming from there. I don't know how to fix this issue.
If you're using GraphLab Create, there is actually a built-in tool for doing this, in the "text analytics" toolkit. Let's say I have data like:
import graphlab
products = graphlab.SFrame({'review': ['a portrait of the artist as a young man',
'the sound and the fury']})
The easiest way to count the words in each entry is
products['counts'] = graphlab.text_analytics.count_words(products['review'])
If you're using the sframe package by itself, or if you want to do a custom function like the one you described, I think the key missing piece in your code is that the Counter needs to be converted into a dictionary in order for the SFrame to handle the output.
from collections import Counter
def count_words(txt):
    count = Counter()
    for word in txt.split():
        count[word] += 1
    return dict(count)
products['counts'] = products.apply(lambda x: count_words(x['review']))
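The word-counting function itself can be checked without GraphLab. Below is a slightly shorter equivalent, since Counter accepts any iterable directly. That SFrame needs the dict conversion is the answer's claim; this snippet only checks the function's output.

```python
from collections import Counter


def count_words(txt):
    # Counter counts the split words directly; dict() turns it into a plain
    # dict, the type the answer says SFrame.apply can store.
    return dict(Counter(txt.split()))


print(count_words('the sound and the fury'))
# {'the': 2, 'sound': 1, 'and': 1, 'fury': 1}
```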
For anyone who has come across this issue while using GraphLab, here is the discussion thread on the issue on Dato support:
http://forum.dato.com/discussion/1499/graphlab-create-using-anaconda-ipython-notebook-lambda-workers-did-not-start
Here is code that can be run to work around this issue on a case-by-case basis.
After starting IPython or the IPython notebook in the Dato/GraphLab environment, but before importing graphlab, copy and run the following code:
import ctypes, inspect, os, graphlab
from ctypes import wintypes
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
kernel32.SetDllDirectoryW.argtypes = (wintypes.LPCWSTR,)
src_dir = os.path.split(inspect.getfile(graphlab))[0]
kernel32.SetDllDirectoryW(src_dir)
# Should work
graphlab.SArray(range(1000)).apply(lambda x: x)
Once this has been run, the apply function should work fine with SFrame.

IPython.parallel ValueError: cannot create an OBJECT array from memory buffer

I'm trying to write a function to be executed in several IPython engines. The function takes a pandas Series as an argument. Each element of the Series is a string, and the whole Series constitutes a corpus for TF.IDF computation.
After reading IPython parallel documentation and some tutorials, it seems to be quite straightforward to do, and I came up with the following:
import pandas as pd
from IPython.parallel import Client
def calculemus(corpus):
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(min_df=1, stop_words='english')
return vectorizer.fit_transform(corpus)
review = pd.read_csv('review.csv')['text']
review = review.fillna('')
client = Client()
r = client[-1].apply(calculemus, review).get()
But I got this error instead:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/xxx/site-packages/IPython/zmq/serialize.pyc in unpack_apply_message(bufs, g, copy)
154 sa.data = m.bytes
155
--> 156 args = uncanSequence(map(unserialize, sargs), g)
157 kwargs = {}
158 for k in sorted(skwargs.iterkeys()):
/xxx/site-packages/IPython/utils/newserialized.pyc in unserialize(serialized)
175
176 def unserialize(serialized):
--> 177 return UnSerializeIt(serialized).getObject()
/xxx/site-packages/IPython/utils/newserialized.pyc in getObject(self)
159 buf = self.serialized.getData()
160 if isinstance(buf, (bytes, buffer, memoryview)):
--> 161 result = numpy.frombuffer(buf, dtype = self.serialized.metadata['dtype'])
162 else:
163 raise TypeError("Expected bytes or buffer/memoryview, but got %r"%type(buf))
ValueError: cannot create an OBJECT array from memory buffer
I'm not sure what the problem is, could someone enlighten me on this?
UPDATE
Apparently the error says exactly what it says. If I do this:
r = client[-1].apply(calculemus, np.array(review, dtype=str)).get()
it kinda works.
So the next question is, is this a feature or a limitation of IPython?
This is a bug in IPython 0.13 that should be fixed in master. There is a special case for serializing NumPy arrays that avoids copying data, and it is triggered by an isinstance(obj, numpy.ndarray) check. This was inappropriate, because isinstance also matches subclasses, which include pandas objects. Those pandas objects (and array subclasses in general) should not be treated the same way: their metadata is lost, and reconstruction on the other side will often fail.
PS:
r = client[-1].apply(calculemus, np.array(review, dtype=str)).get()
is equivalent to
r = client[-1].apply_sync(calculemus, np.array(review, dtype=str))
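The pitfall described in the answer applies to any isinstance-based dispatch. Below is a minimal sketch with hypothetical stand-in classes: FastArray stands in for numpy.ndarray, and LabeledArray stands in for a subclass carrying extra metadata. The answer doesn't say how IPython's fix was written, so treat takes_fast_path_exact as one possible guard, not the actual patch.

```python
class FastArray:
    """Hypothetical stand-in for numpy.ndarray."""


class LabeledArray(FastArray):
    """Hypothetical stand-in for an array subclass carrying extra metadata."""

    def __init__(self, labels):
        self.labels = labels  # metadata a raw-buffer copy would drop


def takes_fast_path_isinstance(obj):
    # The check the answer describes: subclasses slip through.
    return isinstance(obj, FastArray)


def takes_fast_path_exact(obj):
    # Exact type check: only the base class itself uses the raw-buffer path.
    return type(obj) is FastArray


s = LabeledArray(["a", "b"])
print(takes_fast_path_isinstance(s), takes_fast_path_exact(s))  # True False
```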
