Upload with wtforms - unexpected end of regular expression - python

I am trying this code from here docs
class Form(Form):
image = FileField(u'Image File', validators=[Regexp(u'^[^/\\]\.jpg$')])
def validate_image(form, field):
if field.data:
field.data = re.sub(r'[^a-z0-9_.-]', '_', field.data)
Here is the error:
Traceback (most recent call last):
File "tornadoexample2-1.py", line 111, in <module>
class Form(Form):
File "tornadoexample2-1.py", line 119, in Form
image = FileField(u'Image File', validators=[Regexp(u'^[^/\\]\.jpg$')])
File "/usr/local/lib/python2.7/dist-packages/wtforms/validators.py", line 256, in __init__
regex = re.compile(regex, flags)
File "/usr/lib/python2.7/re.py", line 190, in compile
return _compile(pattern, flags)
File "/usr/lib/python2.7/re.py", line 242, in _compile
raise error, v # invalid expression
sre_constants.error: unexpected end of regular expression
Any idea about what the problem?

The regexp in Regexp(u'^[^/\\]\.jpg$') is not quite good.
Try running this, you will get the same exception:
import re
re.compile(u'^[^/\\]\.jpg$')
You need to escape each \ slash twice inside the [] brackets.
So you can rewrite it as u'^[^/\\\\]\.jpg$' or as a raw string ur'^[^/\\]\.jpg$'.
Hope this helps.

Related

Python re escaping raises TypeError: first argument must be string or compiled pattern

I have an error log of a few thousands line and I would like to match all the occurences of these type:
Traceback (most recent call last):
File "my_code.py", line 83, in upload_detection_image
put_response = s3_object.put(Body=image, ContentType="image/jpeg")
File "/usr/local/lib/python3.6/dist-packages/boto3/resources/factory.py", line 520, in do_action
response = action(self, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/boto3/resources/action.py", line 83, in __call__
response = getattr(parent.meta.client, operation_name)(*args, **params)
File "/usr/local/lib/python3.6/dist-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.6/dist-packages/botocore/client.py", line 676, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the PutObject operation: The provided token has expired.
More in general, I would like to match some text starting with a given string, and ending with another string. The text could be multiline. The problem is that the beginning and end of the text could contain some dedicated regex characters that might require to be escaped.
My attempt:
pattern = rf"Traceback(.*){re.escape('botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the PutObject operation: The provided token has expired.')}",
re.findall(pattern, text, flags=re.MULTILINE)
It gives me error on findall:
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<string>", line 2, in <module>
File "/Users/me/.pyenv/versions/3.6.12/lib/python3.6/re.py", line 222, in findall
return _compile(pattern, flags).findall(string)
File "/Users/me/.pyenv/versions/3.6.12/lib/python3.6/re.py", line 300, in _compile
raise TypeError("first argument must be string or compiled pattern")
TypeError: first argument must be string or compiled pattern
Thanks
All of your comments contribute to the final solution:
pattern = rf"Traceback(.*?){re.escape('botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the PutObject operation: The provided token has expired.')}"
results = re.findall(pattern, text, flags=re.DOTALL)

pytesseract tessedit_char_whitelist not accepting quote

I have started working with pytesserract in python. When i pass it single or double quote in
from PIL import Image
import pytesseract
import numpy as np
tesseract_config = r"""-c tessedit_char_whitelist=0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ#'<>(){};:"""
tesseract_language = "eng"
text = pytesseract.image_to_string(Image.open('res/outc001.jpg'), lang=tesseract_language, config=tesseract_config)
print text
it returns
Traceback (most recent call last):
File "main.py", line 15, in <module>
text = pytesseract.image_to_string(Image.open('res/outc001.jpg'), lang=tesseract_language, config=tesseract_config).split('\n')
File "/usr/local/lib/python2.7/dist-packages/pytesseract/pytesseract.py", line 193, in image_to_string
return run_and_get_output(image, 'txt', lang, config, nice)
File "/usr/local/lib/python2.7/dist-packages/pytesseract/pytesseract.py", line 140, in run_and_get_output
run_tesseract(**kwargs)
File "/usr/local/lib/python2.7/dist-packages/pytesseract/pytesseract.py", line 106, in run_tesseract
command += shlex.split(config)
File "/usr/lib/python2.7/shlex.py", line 279, in split
return list(lex)
File "/usr/lib/python2.7/shlex.py", line 269, in next
token = self.get_token()
File "/usr/lib/python2.7/shlex.py", line 96, in get_token
raw = self.read_token()
File "/usr/lib/python2.7/shlex.py", line 172, in read_token
raise ValueError, "No closing quotation"
ValueError: No closing quotation
I have been searching for a way to escape single and double quotes and none of them worked.
When i run tesseract as itself with
tesseract res/outc001.jpg tesseract_out/out001 -c "tessedit_char_whitelist=0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ#\'\"<>(){};:"
it works just fine.
Pytesseract uses shlex to separate config arguments.
The escape character for shlex is \, if you want to insert quotes in the shlex.split() function you must escape it with \.
If you want ' only in the whitelist:
tesseract_config = "-c tessedit_char_whitelist=blahblah\\'")
If you want " only:
tesseract_config = '-c tessedit_char_whitelist=blahblah\\"')
If you want both ' and ":
tesseract_config = '''-c tessedit_char_whitelist=blahblah\\'\\"''')
or
tesseract_config = """-c tessedit_char_whitelist=blahblah\\"\\'""")

Python Regex Error : nothing to repeat at position 0

I am trying to match strings which can be typed from a normal english keyboard.
So, it should include alphabets, digits, and all symbols present on our keyboard.
Corresponding regex : "[a-zA-Z0-9\t ./,<>?;:\"'`!##$%^&*()\[\]{}_+=|\\-]+"
I verfied this regex on regexr.com.
In python, on matching I am getting following error :
>>> a=re.match("+how to block a website in edge",pattern)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\re.py", line 163, in match
return _compile(pattern, flags).match(string)
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\re.py", line 293, in _compile
p = sre_compile.compile(pattern, flags)
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\sre_compile.py", line 536, in compile
p = sre_parse.parse(p, flags)
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\sre_parse.py", line 829, in parse
p = _parse_sub(source, pattern, 0)
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\sre_parse.py", line 437, in _parse_sub
itemsappend(_parse(source, state, nested + 1))
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\tf_1.2\lib\sre_parse.py", line 638, in _parse
source.tell() - here + len(this))
sre_constants.error: nothing to repeat at position 0
This error message is not about position of arguments. Yes, in question above they are not in the right order, but this is only half of problem.
I've got this problem once when i had something like this:
re.search('**myword', '/path/to/**myword')
I wanted to get '**' automatically so i did not wanted to write '\' manually somewhere. For this cause there is re.escape() function. This is the right code:
re.search(re.escape('**myword'), '/path/to/**myword')
The problem here is that special character placed after the beginning of line.
You have your arguments for re.match backward: it should be
re.match(pattern, "+how to block a website in edge")

While calling simplify in sympy getting error?

When my python code tried to use simplify it shows following error. This problem showed after i run separate code file of pyparsing(Which execute successfully). The same code is working fine before.
Edit:
>>> expression="a+b+z"
>>> t=simplify(expression)
ast.py:4: SyntaxWarning: invalid pattern (**) passed to Regex
operator = pp.Regex("**").setName("operator")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\sympy\simplify\simplify.py", line 507, in simplify
expr = sympify(expr)
File "C:\Python27\lib\site-packages\sympy\core\sympify.py", line 308, in sympify
from sympy.parsing.sympy_parser import (parse_expr, TokenError,
File "C:\Python27\lib\site-packages\sympy\parsing\sympy_parser.py", line 11, in <module>
import ast
File "ast.py", line 4, in <module>
operator = pp.Regex("**").setName("operator")
File "C:\Python27\lib\site-packages\pyparsing.py", line 1920, in __init__
self.re = re.compile(self.pattern, self.flags)
File "C:\Python27\Lib\re.py", line 190, in compile
return _compile(pattern, flags)
File "C:\Python27\Lib\re.py", line 244, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat
Please suggest?
You have a local file, ast.py, which is getting imported in place of Python's built-in ast module. You should remove or rename this file to avoid the name conflict, as this can cause other modules to not work correctly.
Additionally, your local module contains the following line, which is causing an exception on import:
operator = pp.Regex("**").setName("operator")
** is not a valid regular expression. In a regular expression, * means "0 or more repetitions of the preceding expression", which doesn't make sense at the beginning of an expression because there is "nothing to repeat" (as the error message says).

Python Regex match: include Regex syntax

This is a list of characters I need for a regex match:
A-Za-z0-9_-\[]{}^`|
However, some of them, like \, [], ^ and | are regex syntax characters, when I tried using this pattern, I got this error:
Traceback (most recent call last):
File "C:\Users\dell\Desktop\test.py", line 8, in <module>
if re.match("^[A-Za-z0-9_-\[]{}^`|]*$", weird_input):
File "C:\Python27\lib\re.py", line 137, in match
return _compile(pattern, flags).match(string)
File "C:\Python27\lib\re.py", line 242, in _compile
raise error, v # invalid expression
error: bad character range
Is there any way I could include those characters?
You need to escape them using \ like:
Online Demo
import re
p = re.compile(ur'[A-Za-z0-9_\-\\\[\]\{\}^`\|]+')
test_str = u"test"
re.match(p, test_str)

Categories

Resources