Maximum match length of a regular expression - python

What is the easiest way to determine the maximum match length of a regular expression?
Specifically, I am using Python's re module.
E.g. for foo((bar){2,3}|potato) it would be 12.
Obviously, regexes using operators like * and + have theoretically unbounded match lengths; in those cases returning an error or something is fine. Giving an error for regexes using the (?...) extensions is also fine.
I would also be ok with getting an approximate upper bound, as long as it is always greater than the actual maximum length, but not too much greater.

Using pyparsing's invRegex module:
import invRegex
data='foo(bar{2,3}|potato)'
print(list(invRegex.invert(data)))
# ['foobarr', 'foobarrr', 'foopotato']
print(max(map(len,invRegex.invert(data))))
# 9
Another alternative is to use ipermute from this module.
import inverse_regex
data='foo(bar{2,3}|potato)'
print(list(inverse_regex.ipermute(data)))
# ['foobarr', 'foobarrr', 'foopotato']
print(max(map(len,inverse_regex.ipermute(data))))
# 9

Solved, I think. Thanks to unutbu for pointing me to sre_parse!
import sre_parse
def get_regex_max_match_len(regex):
    minlen, maxlen = sre_parse.parse(regex).getwidth()
    if maxlen >= sre_parse.MAXREPEAT: raise ValueError('unbounded regex')
    return maxlen
Results in:
>>> get_regex_max_match_len('foo((bar){2,3}|potato)')
12
>>> get_regex_max_match_len('.*')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in get_regex_max_match_len
ValueError: unbounded regex
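For newer interpreters, here is a minimal sketch of the same idea. sre_parse has been deprecated since Python 3.11, where the parser lives in the private module re._parser. The fallback import and the name max_match_len are assumptions made for illustration, and since both modules are undocumented internals, their behaviour may change between releases.

```python
try:
    # Python 3.11+: private, undocumented location of the parser (assumption: may move again)
    from re import _parser as sre_parse
except ImportError:
    # Older interpreters still ship the parser as a top-level module
    import sre_parse


def max_match_len(regex):
    """Return an upper bound on the match length, or raise ValueError if unbounded."""
    _minlen, maxlen = sre_parse.parse(regex).getwidth()
    # Unbounded repeats such as * and + push the width to MAXREPEAT or beyond
    if maxlen >= sre_parse.MAXREPEAT:
        raise ValueError('unbounded regex')
    return maxlen


print(max_match_len('foo((bar){2,3}|potato)'))
```

A bounded pattern whose true maximum happens to reach MAXREPEAT would also be reported as unbounded, which seems acceptable for an upper-bound check.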

Related

How to substitute symbols in a sympified expression properly?

My goal is to turn a string into a symbolic expression using sympify and then make substitutions.
import sympy as sp
Eq_Str = 'a*x+b'
Eq_Sym = sp.sympify(Eq_Str)
Then, for instance, substitute a for something else:
Eq_Sym.subs(a,2)
But I get the error:
Traceback (most recent call last):
File "<ipython-input-5-e9892d6ffa06>", line 1, in <module>
Eq_Sym.subs(a,2)
NameError: name 'a' is not defined
I understand that there is no symbol a in the workspace. Am I right?
Is there a way to bring the symbols from the set returned by Eq_Sym.free_symbols into the workspace so that I can substitute them in Eq_Sym?
Thank you very much for taking the time to read this.
You can use globals() for that:
import sympy as sp
Eq_Str = 'a*x+b'
Eq_Sym = sp.sympify(Eq_Str)
for s in Eq_Sym.free_symbols:
    globals()[s.name] = s  # bind each symbol to a module-level name
print(Eq_Sym.subs(a, 2))  # b + 2*x

OverflowError: range() result has too many items, although it hasn't

I have this for-loop:
for i in range(1000000000, 1000000030):
    foo(i)
When I execute it this error is given:
Traceback (most recent call last):
File "/CENSORED/Activity.py", line 11, in <module>
for i in range(1000000000, 10000000030):
OverflowError: range() result has too many items.
As far as I know, this range-object should have exactly 30 elements...
Where is the problem?
Edit:
I have removed the extra zero, now I get this:
Traceback (most recent call last):
File "/CENSORED/Activity.py", line 12, in <module>
factorizeInefficient(i)
MemoryError
Edit 2:
def factorizeInefficient(n):
    teiler = list()
    for i in range(n):
        if i != 0:
            if (n%i)==0:
                teiler.append(i)
    print teiler
Just found the solution myself: there is a range(n) call in this function as well, and that is what causes the MemoryError...
An extra question: How did you guys know this was Python 2? (Btw you were right...)
Copy/pasting the range() part of your code:
>>> len(range(1000000000, 10000000030))
9000000030
So there are actually about 9 billion elements in the range. Your first argument is presumably missing a zero, or the second argument has one zero too many ;-)
Count your zeros once again ;) I'd say there is one too many.
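As a sketch of why this surfaces as a memory problem: in Python 2, range() builds the whole list up front, while xrange() (and range() in Python 3) produces values lazily. The version below assumes Python 3, and the name proper_divisors is made up for illustration. It collects the same divisors as factorizeInefficient without materialising n numbers first.

```python
def proper_divisors(n):
    """Return every i with 0 < i < n that divides n evenly."""
    # range() is lazy in Python 3, so no list of n integers is built here
    return [i for i in range(1, n) if n % i == 0]


# Size of the two ranges from the question, computed without creating them
intended = 1000000030 - 1000000000
with_extra_zero = 10000000030 - 1000000000
print(intended, with_extra_zero)
print(proper_divisors(12))
```

It still takes time proportional to n, so for values near 10**9 it will be slow, just no longer limited by memory.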

Python: limit on the accuracy of float

The code gives an error because the value of "var" is very close to zero, less than 1e-80. I tried to fix this error using "from decimal import *", but it didn't really work. Is there a way to tell Python to round a number to zero when a float is very close to zero, i.e. < 1e-50? Or is there any other way to fix this issue?
Thank you
CODE:
import math
H=6.6260755e-27
K=1.3807e-16
C=2.9979E+10
T=100.0
x=3.07175e-05
cst=2.0*H*H*(C**3.0)/(K*T*T*(x**6.0))
a=H*C/(K*T*x)
var=cst*math.exp(a)/((math.exp(a)-1.0)**2.0)
print var
OUTPUT:
Traceback (most recent call last):
File "test.py", line 11, in <module>
var=cst*math.exp(a)/((math.exp(a)-1.0)**2.0)
OverflowError: (34, 'Numerical result out of range')
To Kevin:
The code was edited with the following lines:
from decimal import *
getcontext().prec = 7
cst=Decimal(2.0*H*H*(C**3.0)/(K*T*T*(x**6.0)))
a=Decimal(H*C/(K*T*x))
The problem is that (math.exp(a)-1.0)**2.0 is too large to hold as an intermediate result.
>>> (math.exp(a) - 1.0)**2.0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: (34, 'Result too large')
However, for the value of a you are using,
>>> math.exp(a)/(math.exp(a)-1.0) == 1.0
True
so you can essentially cancel that part of the fraction, leaving
var = cst/(math.exp(a)-1.0)
which evaluates nicely to
>>> cst/(math.exp(a)-1.0)
7.932672271698049e-186
If you aren't comfortable rewriting the formula to that extent, use the associativity of the operations to avoid the large intermediate value. The resulting product is the same.
>>> cst/(math.exp(a)-1.0)*math.exp(a)/(math.exp(a)-1.0)
7.932672271698049e-186
I solved this issue, but the fix works only for this particular problem, not in general. The main issue is the nature of this function:
math.exp(a)/(math.exp(a)-1.0)**2.0
which decays very rapidly.
The problem can be easily solved by restricting the value of "a" (which won't make any significant change to the calculation), i.e.
if a>200:
    var=0.0
else:
    var=cst*math.exp(a)/((math.exp(a)-1.0)**2.0)
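A more general way to avoid the overflow, sketched here under the assumption that a > 0: multiplying numerator and denominator by exp(-2a) gives exp(a)/(exp(a)-1)**2 = exp(-a)/(1-exp(-a))**2, and exp(-a) can only underflow harmlessly to 0.0 rather than overflow. The function name stable_var is hypothetical; the constants are the ones from the question.

```python
import math

H = 6.6260755e-27
K = 1.3807e-16
C = 2.9979E+10
T = 100.0
x = 3.07175e-05

cst = 2.0 * H * H * (C ** 3.0) / (K * T * T * (x ** 6.0))
a = H * C / (K * T * x)


def stable_var(cst, a):
    """Evaluate cst*exp(a)/(exp(a)-1)**2 for a > 0 without overflowing."""
    decay = math.exp(-a)  # tiny for large a; underflows to 0.0 instead of raising
    # expm1(-a) is exp(-a) - 1, and squaring removes the sign
    return cst * decay / math.expm1(-a) ** 2


var = stable_var(cst, a)
print(var)
```

With this form no threshold on a has to be chosen by hand: very large values of a simply yield 0.0.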

ifft function gives "'str' object is not callable" error

I am trying to take the inverse Fourier transform of a list, and for some reason I keep getting the following error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "simulating_coherent_data.py", line 238, in <module>
exec('ift%s = np.fft.ifft(nd.array(FTxSQRT_PS%s))'(x,x))
TypeError: 'str' object is not callable
And I can't figure out where I have a string. The part of my code it relates to is as follows
def FTxSQRT_PS(FT,PS):
    # Import: The Fourier Transform and the Power Spectrum, both as lists
    # Export: The result of FTxsqrt(PS), as a list
    # Function:
    # Takes each element in the FT and PS and finds FTxsqrt(PS) for each
    # appends each results to a list called signal
    signal = []
    print type(PS)
    for x in range(len(FT)):
        indiv_signal = np.abs(FT[x])*math.sqrt(PS[x])
        signal.append(indiv_signal)
    return signal
for x in range(1,number_timesteps+1):
    exec('FTxSQRT_PS%s = FTxSQRT_PS(fshift%s,power_spectrum%s)'%(x,x,x))
    exec('ift%s = np.fft.ifft(FTxSQRT_PS%s)'(x,x))
Where FTxSQRT_PS%s are all lists. fshift%s is a np.array and power_spectrum%s is a list. I've also tried setting the type for FTxSQRT_PS%s as a np.array but that did not help.
I have very similar code a few lines up that works fine;
for x in range(1,number_timesteps+1):
    exec('fft%s = np.fft.fft(source%s)'%(x,x))
where source%s are all type np.array
The only thing I can think of is that maybe np.fft.ifft is not how I should be taking the inverse Fourier transform for Python 2.7.6 but I also cannot find an alternative.
Let me know if you'd like to see the whole code; there are about 240 lines up to where I'm having trouble, though a lot of that is comments.
Thanks for any help,
Teresa
You are missing a % between the format string and the tuple (x,x), so Python tries to call the string as a function:
exec('ift%s = np.fft.ifft(FTxSQRT_PS%s)'(x,x))
Should be:
exec('ift%s = np.fft.ifft(FTxSQRT_PS%s)'%(x,x))
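The error can be reproduced without NumPy, as the short sketch below shows: a string followed directly by (x, x) is treated as a function call. It also shows a common alternative to exec, storing per-timestep results in a dictionary keyed by the index. The names results, inputs and fake_transform are placeholders, not part of the original code.

```python
template = 'ift%s = np.fft.ifft(FTxSQRT_PS%s)'

# With the % operator the tuple is substituted into the string
formatted = template % (1, 1)

# Without it, Python attempts to call the string object itself
try:
    template(1, 1)
except TypeError as exc:
    error_message = str(exc)


def fake_transform(values):
    """Placeholder standing in for np.fft.ifft in this sketch."""
    return [v * 2 for v in values]


# One dictionary replaces the numbered variables ift1, ift2, ...
inputs = {1: [1, 2], 2: [3, 4]}
results = {step: fake_transform(values) for step, values in inputs.items()}
print(formatted)
print(error_message)
print(results[2])
```

Looking results up as results[x] avoids building variable names from strings, so a missing % can no longer hide inside an exec call.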

Trouble with for loops

We just learned for loops in class for about five minutes and we were already given a lab. I am trying but still not getting the result I need. What I am trying to do is take a list of integers, add up only the odd ones, and return the sum. So if the list of integers was [3,2,4,7,2,4,1,3,2], the returned value would be 14.
def f(ls):
    ct=0
    for x in (f(ls)):
        if x%2==1:
            ct+=x
    return(ct)
print(f[2,5,4,6,7,8,2])
The error message reads
Traceback (most recent call last):
File "C:/Users/Ian/Documents/Python/Labs/lab8.py", line 10, in <module>
print(f[2,5,4,6,7,8,2])
TypeError: 'function' object is not subscriptable
Just a couple of minor mistakes:
def f(ls):
    ct = 0
    for x in ls:  # iterate over the list itself; do not call f(ls) again
        if x % 2 == 1:
            ct += x
    return(ct)  # the parentheses around ct are not necessary
print(f([2,5,4,6,7,8,2]))  # the call was missing its parentheses
You're missing the parentheses in the function call. Use
print(f([2,5,4,6,7,8,2]))
rather than
print(f[2,5,4,6,7,8,2])
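Once the loop version works, the same sum can be written more compactly with the built-in sum() and a generator expression. This is only an alternative sketch; the name sum_of_odds is made up here.

```python
def sum_of_odds(numbers):
    """Add up the odd integers in numbers."""
    return sum(x for x in numbers if x % 2 == 1)


print(sum_of_odds([3, 2, 4, 7, 2, 4, 1, 3, 2]))  # 14, as in the question
print(sum_of_odds([2, 5, 4, 6, 7, 8, 2]))        # 12
```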
