Missing u-strings on Python 3.2?

I have a litany of unit tests that run on Travis CI, and only on Python 3.2 does everything go belly up. How can I solve this without using six.u()?
def test_parse_utf8(self):
    s = String("foo", 12, encoding="utf8")
    self.assertEqual(s.parse(b"hello joh\xd4\x83n"), u"hello joh\u0503n")
======================================================================
ERROR: Failure: SyntaxError (invalid syntax (test_strings.py, line 37))
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/virtualenv/python3.2.5/lib/python3.2/site-packages/nose/failure.py", line 39, in runTest
raise self.exc_val.with_traceback(self.tb)
File "/home/travis/virtualenv/python3.2.5/lib/python3.2/site-packages/nose/loader.py", line 414, in loadTestsFromName
addr.filename, addr.module)
File "/home/travis/virtualenv/python3.2.5/lib/python3.2/site-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/home/travis/virtualenv/python3.2.5/lib/python3.2/site-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/travis/build/construct/construct/tests/test_strings.py", line 37
self.assertEqual(s.build(u"hello joh\u0503n"), b"hello joh\xd4\x83n")
^
SyntaxError: invalid syntax
Trying to get this to work:
PY3 = sys.version_info[0] == 3
def u(s): return s if PY3 else s.decode("utf-8")
self.assertEqual(s.parse(b"hello joh\xd4\x83n"), u("hello joh\u0503n"))
Quote from https://pythonhosted.org/six/
On Python 2, u() doesn’t know what the encoding of the literal is. Each byte is converted directly to the unicode codepoint of the same value. Because of this, it’s only safe to use u() with strings of ASCII data.
But the whole point of using unicode is to not be restricted to ASCII.

I think you're out of luck here.
Either use six.u() or drop support for Python 3.2.

Could you instead do from __future__ import unicode_literals and not use the u syntax anywhere?
from __future__ import unicode_literals makes unprefixed string literals in Python 2 (2.6 and later) behave as in Python 3, that is, they default to unicode. So if you do from __future__ import unicode_literals and change all u"strings" to "strings", your string literals will be unicode in all versions. This does not affect b literals.
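A minimal sketch of the question's test rewritten that way. It should run unchanged on Python 2.6+ and on every Python 3 release, including 3.2. parse_utf8 is a hypothetical stand-in for String("foo", 12, encoding="utf8").parse(...), whose implementation is not shown in the question; run the test case with python -m unittest.

```python
from __future__ import unicode_literals  # must precede other statements; a no-op on Python 3

import unittest

# With unicode_literals, unprefixed literals are text on Python 2.6+ too,
# so the u"" prefix (a SyntaxError on Python 3.2) is no longer needed.
EXPECTED = "hello joh\u0503n"
RAW = b"hello joh\xd4\x83n"  # b"" literals are not affected by the import


def parse_utf8(data):
    # Hypothetical stand-in for String("foo", 12, encoding="utf8").parse(data)
    return data.decode("utf-8")


class TestStrings(unittest.TestCase):
    def test_parse_utf8(self):
        self.assertEqual(parse_utf8(RAW), EXPECTED)
```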

I took the implementation of six.u() and discarded six.
import sys
PY3 = sys.version_info[0] == 3
def u(s): return s if PY3 else unicode(s.replace(r'\\', r'\\\\'), "unicode_escape")
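Why this works where the question's s.decode("utf-8") does not: on Python 2, "hello joh\u0503n" without a prefix is a byte string, and \u0503 is not an escape sequence in byte strings, so the value still contains a literal backslash followed by u0503. Decoding that as UTF-8 keeps those six characters as they are, while the unicode_escape codec interprets them. The sketch below emulates the Python 2 branch on Python 3, using raw strings to stand in for Python 2 byte-string values; py2_u and py2_naive_u are hypothetical names for illustration.

```python
def py2_u(value):
    """Emulate the Python 2 branch of six.u() on Python 3.

    `value` stands in for a Python 2 byte-string literal, in which escapes
    such as \\u0503 are left unprocessed, so pass it as a raw string.
    Input must be ASCII, as the six documentation requires.
    """
    # Double each pair of backslashes first, so that unicode_escape turns it
    # back into the same pair; this keeps escaped backslashes intact.
    escaped = value.replace(r'\\', r'\\\\')
    return escaped.encode("ascii").decode("unicode_escape")


def py2_naive_u(value):
    """Emulate the question's u(): s.decode("utf-8") on the same byte string."""
    return value.encode("ascii").decode("utf-8")


# The Python 2 source text "hello joh\u0503n", as the byte string Python 2 sees it:
literal = r"hello joh\u0503n"
print(py2_u(literal) == "hello joh\u0503n")        # True: \u0503 became U+0503
print(py2_naive_u(literal) == "hello joh\u0503n")  # False: the backslash-u text survives
```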

Related

Compilation errors in the files included in the PyYAML-3.10 package while calling the module yaml from a python file

I am using the PyYAML-3.10 as part of a Python program on macOS 10, using Python version 2.7.10. I am not able to make sense of these compilation errors. Since PyYAML-3.10 is a stable version of PyYAML, it should give no compilation errors. The errors are listed below. Any suggestions would be appreciated.
File "pyR#TE.py", line 3, in <module>
import yaml
File "/Users/PyR#TE/pyrate-1.0.0/yaml/__init__.py", line 8, in <module>
from .loader import *
File "/Users/PyR#TE/pyrate-1.0.0/yaml/loader.py", line 4, in <module>
from .reader import *
File "/Users/PyR#TE/pyrate-1.0.0/yaml/reader.py", line 45, in <module>
class Reader(object):
File "/Users/PyR#TE/pyrate-1.0.0/yaml/reader.py", line 137, in Reader
NON_PRINTABLE = re.compile('[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD]')
raise error, v # invalid expression
sre_constants.error: bad character range
It seems that this copy of PyYAML-3.10 is not compatible with Python 2. The error you are seeing is not a Python compilation error: it is raised by re.compile, when Python tries to compile a regular expression while the yaml module is being imported.
I tried the re.compile line from your error message in both Python 2 and Python 3.
Python 2:
>>> import re
>>> re.compile('[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD]')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/re.py", line 194, in compile
return _compile(pattern, flags)
File "/usr/lib/python2.7/re.py", line 251, in _compile
raise error, v # invalid expression
sre_constants.error: bad character range
>>>
Python 3:
>>> import re
>>> re.compile('[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD]')
re.compile('[^\t\n\r -~\x85\xa0-\ud7ff\ue000-�]')
>>>
So your options are either to find a package that supports Python 2, or to upgrade your code to Python 3. I recommend upgrading, as Python 2 is no longer supported.
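A sketch of the mechanism, run on Python 3. In a Python 2 byte-string literal, \uD7FF is not an escape sequence, so the pattern keeps a backslash followed by uD7FF; Python 2's regex parser then appears to read the unknown escape \u inside a character class as a plain u (0x75). That would turn \xA0-\uD7FF into the backwards range \xA0-u, which is what "bad character range" complains about. reversed_range_error and has_non_printable are hypothetical helpers: the first reproduces that reversed range directly, the second uses the original pattern, which compiles fine as text.

```python
import re

# On Python 3 the literal is text, so \uD7FF, \uE000 and \uFFFD are real code points.
NON_PRINTABLE = re.compile('[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD]')


def has_non_printable(text):
    # True if text contains a character outside the allowed ranges.
    return NON_PRINTABLE.search(text) is not None


def reversed_range_error():
    # Roughly what Python 2 ends up compiling: a range from \xA0 down to "u".
    try:
        re.compile('[\xa0-u]')
    except re.error as exc:
        return str(exc)
    return None


print(has_non_printable("Anmeldung"))    # False
print(has_non_printable("bad\x00byte"))  # True
print("bad character range" in reversed_range_error())  # True
```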

Python 3 fails at pdb "b main" with UnicodeDecodeError?

The only similar question to this I've found is Django UnicodeDecodeError when using pdb - unfortunately, the solution there does not apply to this case.
Consider the following code, test.py:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# encoding: utf-8


def subtract(ina, inb):
    myresult = ina - inb
    return myresult

def main():
    y2 = 10
    y1 = 7
    # calculate (y₂-y₁)
    print("Calculating difference between y2: {} and y1: {}".format(y2, y1))
    result = subtract(y2, y1)
    print("The result is: {}".format(result))

if __name__ == '__main__':
    main()
Using Python3 from Anaconda3 on Windows 10:
(base) C:\tmp>conda --version
conda 4.7.12
(base) C:\tmp>python --version
Python 3.7.3
... I can run this program without a problem:
(base) C:\tmp>python test.py
Calculating difference between y2: 10 and y1: 7
The result is: 3
However, if I want to debug/step through this program using pdb, it fails as soon as I type b main to set a breakpoint on the main function:
(base) C:\tmp>python -m pdb test.py
> c:\tmp\test.py(6)<module>()
-> def subtract(ina, inb):
(Pdb) b main
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 648, in do_break
lineno = int(arg)
ValueError: invalid literal for int() with base 10: 'main'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 659, in do_break
code = func.__code__
AttributeError: 'str' object has no attribute '__code__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 1701, in main
pdb._runscript(mainpyfile)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 1570, in _runscript
self.run(statement)
File "C:\ProgramData\Anaconda3\lib\bdb.py", line 585, in run
exec(cmd, globals, locals)
File "<string>", line 1, in <module>
File "c:\tmp\test.py", line 6, in <module>
def subtract(ina, inb):
File "c:\tmp\test.py", line 6, in <module>
def subtract(ina, inb):
File "C:\ProgramData\Anaconda3\lib\bdb.py", line 88, in trace_dispatch
return self.dispatch_line(frame)
File "C:\ProgramData\Anaconda3\lib\bdb.py", line 112, in dispatch_line
self.user_line(frame)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 261, in user_line
self.interaction(frame, None)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 352, in interaction
self._cmdloop()
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 321, in _cmdloop
self.cmdloop()
File "C:\ProgramData\Anaconda3\lib\cmd.py", line 138, in cmdloop
stop = self.onecmd(line)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 418, in onecmd
return cmd.Cmd.onecmd(self, line)
File "C:\ProgramData\Anaconda3\lib\cmd.py", line 217, in onecmd
return func(arg)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 667, in do_break
(ok, filename, ln) = self.lineinfo(arg)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 740, in lineinfo
answer = find_function(item, fname)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 100, in find_function
for lineno, line in enumerate(fp, start=1):
File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 199: character maps to <undefined>
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> c:\programdata\anaconda3\lib\encodings\cp1252.py(23)decode()
-> return codecs.charmap_decode(input,self.errors,decoding_table)[0]
(Pdb) q
Post mortem debugger finished. The test.py will be restarted
> c:\tmp\test.py(6)<module>()
-> def subtract(ina, inb):
(Pdb) q
(base) C:\tmp>
The problem is the comment line: # calculate (y₂-y₁); if it is deleted, then pdb starts fine:
(base) C:\tmp>python -m pdb test.py
> c:\tmp\test.py(6)<module>()
-> def subtract(ina, inb):
(Pdb) b main
Breakpoint 1 at c:\tmp\test.py:10
(Pdb) q
(base) C:\tmp>
I'm slightly surprised by this - wasn't Python3 supposed to be "utf-8 by default"?
Obviously, this is a trivial case where I can easily delete the single comment line that causes the trouble. However, I have a large script with UTF-8 characters all over the place, both in comments and in print calls I'd actually want to step through, and it is not really viable to go in and manually replace all those characters with ASCII ones.
So, is there a way to make Python 3's pdb work even when there are UTF-8 characters in the source code (whether in comments or in actual statements)?
Python 3 assumes UTF-8 for source files by default, but the environment in which it is operating does not: its default encoding is cp1252.
You can set the PYTHONIOENCODING environment variable to UTF-8 to override the default encoding, or change the environment to use UTF-8.
Edit
I analysed this too hastily. The above solutions apply to Unicode errors raised when reading from stdin or writing to stdout, but the problem here is that pdb opens a file for reading without specifying an encoding:
def find_function(funcname, filename):
    cre = re.compile(r'def\s+%s\s*[(]' % re.escape(funcname))
    try:
        fp = open(filename)
    except OSError:
        return None
If no encoding is specified, then according to the io documentation Python defaults to the result of locale.getpreferredencoding(), presumably cp1252 in this case.
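For reference, the failing byte almost certainly comes from the comment: ₁ (U+2081) is encoded in UTF-8 as E2 82 81, and 0x81 has no mapping in cp1252. A sketch of how the quoted lookup could read the file with the source's own encoding instead of the locale's: tokenize.open() (in the standard library since Python 3.2) detects the encoding from a BOM or coding cookie and falls back to UTF-8. This is a patched copy for illustration, not the code that ships in pdb.

```python
import re
import tokenize


def find_function(funcname, filename):
    """Return (funcname, filename, lineno) for `def funcname(`, or None.

    Same search as the pdb excerpt quoted above, but tokenize.open() decodes
    the file using its declared encoding (UTF-8 by default) rather than
    locale.getpreferredencoding().
    """
    cre = re.compile(r'def\s+%s\s*[(]' % re.escape(funcname))
    try:
        fp = tokenize.open(filename)
    except OSError:
        return None
    with fp:
        # Line numbers start at 1, as a debugger's breakpoint lookup expects.
        for lineno, line in enumerate(fp, start=1):
            if cre.match(line):
                return funcname, filename, lineno
    return None
```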
One solution might be to set the console locale before running the debugger.
It may also be possible to set the PYTHONUTF8 environment variable to 1 (Python 3.7+). Among other things, this will cause open(), io.open(), and codecs.open() to use the UTF-8 encoding by default.
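One way to confirm that UTF-8 Mode is actually active is to start a child interpreter with PYTHONUTF8=1 and ask it; the same mode can also be switched on per run with python -X utf8 -m pdb test.py. A minimal sketch of that check (utf8_mode_report is a hypothetical helper name):

```python
import os
import subprocess
import sys

# Child script: report whether UTF-8 Mode is on and which encoding open() would default to.
CHILD = "import sys, locale; print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"


def utf8_mode_report(extra_env):
    env = dict(os.environ, **extra_env)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    flag, encoding = result.stdout.split()
    return int(flag), encoding


# e.g. (1, 'utf-8'); the exact spelling of the encoding name varies by version.
print(utf8_mode_report({"PYTHONUTF8": "1"}))
```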

While calling simplify in sympy getting error?

When my Python code calls simplify, it shows the following error. The problem appeared after I ran a separate pyparsing script (which executed successfully); the same code was working fine before.
Edit:
>>> expression="a+b+z"
>>> t=simplify(expression)
ast.py:4: SyntaxWarning: invalid pattern (**) passed to Regex
operator = pp.Regex("**").setName("operator")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\sympy\simplify\simplify.py", line 507, in simplify
expr = sympify(expr)
File "C:\Python27\lib\site-packages\sympy\core\sympify.py", line 308, in sympify
from sympy.parsing.sympy_parser import (parse_expr, TokenError,
File "C:\Python27\lib\site-packages\sympy\parsing\sympy_parser.py", line 11, in <module>
import ast
File "ast.py", line 4, in <module>
operator = pp.Regex("**").setName("operator")
File "C:\Python27\lib\site-packages\pyparsing.py", line 1920, in __init__
self.re = re.compile(self.pattern, self.flags)
File "C:\Python27\Lib\re.py", line 190, in compile
return _compile(pattern, flags)
File "C:\Python27\Lib\re.py", line 244, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat
Any suggestions?
You have a local file, ast.py, which is getting imported in place of Python's built-in ast module. You should remove or rename this file to avoid the name conflict, as this can cause other modules to not work correctly.
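A quick way to check for this kind of shadowing is to ask the import system which file would provide a name from a given directory. The sketch below uses importlib.machinery.PathFinder.find_spec, which searches only the directories you pass and, unlike importlib.util.find_spec, does not return the already-imported module from sys.modules; find_in_dirs is a hypothetical helper name. Calling find_in_dirs("ast", ["."]) from the script's directory would report the local ast.py (printing ast.__file__ after import ast gives the same answer).

```python
import importlib.machinery


def find_in_dirs(module_name, dirs):
    """Return the file that would provide `module_name` from `dirs`, or None.

    PathFinder.find_spec does not consult sys.modules, so it still reports a
    local file that shadows a module which is already imported (such as ast).
    """
    spec = importlib.machinery.PathFinder.find_spec(module_name, list(dirs))
    return spec.origin if spec is not None else None
```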
Additionally, your local module contains the following line, which is causing an exception on import:
operator = pp.Regex("**").setName("operator")
** is not a valid regular expression. In a regular expression, * means "0 or more repetitions of the preceding expression", which doesn't make sense at the beginning of an expression because there is "nothing to repeat" (as the error message says).
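To match a literal ** operator, escape both asterisks, or let re.escape do it. A small sketch with the standard re module, which pp.Regex compiles with according to the traceback; in pyparsing itself, pp.Regex(r"\*\*") or pp.Literal("**") should behave the same way.

```python
import re

# "**" fails: the first "*" has nothing before it to repeat.
try:
    re.compile("**")
except re.error as exc:
    print("invalid pattern:", exc)

# The escaped version matches the two-character operator literally.
POWER = re.compile(r"\*\*")
assert POWER.pattern == re.escape("**")
print(bool(POWER.match("**")))  # True
print(bool(POWER.match("*")))   # False
```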

Invalid syntax while running test on Travis

I'm having a problem with Travis on every commit. My tests pass locally, but on Travis I get this error:
Traceback (most recent call last):
File "/opt/python/3.2.5/lib/python3.2/unittest/case.py", line 370, in _executeTestPart
function()
File "/opt/python/3.2.5/lib/python3.2/unittest/loader.py", line 32, in testFailure
raise exception
ImportError: Failed to import test module: test.test_parser
Traceback (most recent call last):
File "/opt/python/3.2.5/lib/python3.2/unittest/loader.py", line 261, in _find_tests
module = self._get_module_from_name(name)
File "/opt/python/3.2.5/lib/python3.2/unittest/loader.py", line 239, in _get_module_from_name
__import__(name)
File "/home/travis/build/davidmogar/genderator/test/test_parser.py", line 5, in <module>
import genderator
File "/home/travis/build/davidmogar/genderator/genderator/__init__.py", line 3, in <module>
from genderator.parser import Parser
File "/home/travis/build/davidmogar/genderator/genderator/parser.py", line 5, in <module>
from .utils import Normalizer
File "/home/travis/build/davidmogar/genderator/genderator/utils.py", line 63
u'\N{COMBINING TILDE}'
^
SyntaxError: invalid syntax
Here is the code where that line is:
def remove_accent_marks(text):
    good_accents = {
        u'\N{COMBINING TILDE}',
        u'\N{COMBINING CEDILLA}'
    }
    return ''.join(c for c in unicodedata.normalize('NFKD', text)
                   if unicodedata.category(c) != 'Mn' or c in good_accents)
I have no idea what the problem is because, as I said, all tests pass locally. Here is my .travis.yml file:
language: python
python:
  - "3.2"
  - "3.3"
  - "3.4"
script: python -m unittest discover
Any idea?
In Python 3, the u'...' syntax is only supported in Python 3.3 and up (it was reintroduced by PEP 414); Python 3.0 to 3.2 reject it.
The u prefix is only there to support polyglot Python code (supporting both 2 and 3), and can be safely removed if you don't need to support Python 2.
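The important detail is that the failure happens when the module is compiled, before any test code runs. A runtime sys.version_info check around a u'...' literal therefore cannot help on 3.2: the whole file is rejected, which is why the traceback shows an ImportError wrapping a SyntaxError. A hypothetical helper, compiles, makes this visible by compiling a snippet without importing it:

```python
def compiles(source):
    """Return True if `source` is valid syntax for the running interpreter."""
    try:
        compile(source, "<check>", "exec")
    except SyntaxError:
        return False
    return True


# On Python 3.2 the first check would be False; on 2.x and 3.3+ it is True.
print(compiles("good_accents = {u'\\N{COMBINING TILDE}'}"))
print(compiles("good_accents = {'\\N{COMBINING TILDE}'}"))
```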
If you need to support both Python 2 and 3.2, you'll have to use a different approach. You could use a from __future__ import to make all string literals in Python 2 produce unicode string objects; this applies per module:
from __future__ import unicode_literals

def remove_accent_marks(text):
    good_accents = {
        '\N{COMBINING TILDE}',
        '\N{COMBINING CEDILLA}'
    }
The strings will be treated as Unicode in both Python 2 and 3.
Or you could create your own polyglot function:
import sys

if sys.version_info[0] < 3:
    u = lambda s: unicode(s.replace(r'\\', r'\\\\'), "unicode_escape")
else:
    u = lambda s: s
and use that on all your Unicode strings:
def remove_accent_marks(text):
    good_accents = {
        u('\N{COMBINING TILDE}'),
        u('\N{COMBINING CEDILLA}')
    }
or you can use the six library to produce that bridge for you:
import six

def remove_accent_marks(text):
    good_accents = {
        six.u('\N{COMBINING TILDE}'),
        six.u('\N{COMBINING CEDILLA}')
    }
You may want to read the Python Porting HOWTO.

HeaderParseError in python

I get a HeaderParseError if I try to parse this string with decode_header() in Python 2.6.5 (and 2.7). Here is the repr() of the string:
'=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?='
This string comes from a MIME email that contains a JPEG picture. Thunderbird can decode the filename (which contains German umlauts).
>>> from email.header import decode_header
>>> decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/email/header.py", line 101, in decode_header
raise HeaderParseError
email.errors.HeaderParseError
It seems to be an incompatibility between the base64 character set Python uses and the one the mail agent used:
>>> from email.header import decode_header
>>> a='QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw=='
>>> decode_header(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/email/header.py", line 108, in decode_header
raise HeaderParseError
email.errors.HeaderParseError
>>> a1= a.replace('_', '/')
>>> decode_header(a1)
[('Anmeldung Netzanschluss S\xfcdring3p.jpg', 'iso-8859-1')]
>>> print _[0][0].decode(_[0][1])
Anmeldung Netzanschluss Südring3p.jpg
Python uses the character set that the Wikipedia article suggests (i.e. 0-9, A-Z, a-z, +, /). That same article lists some alternatives (including the underscore that's the issue here); however, the underscore's value is ambiguous (it's 62 or 63, depending on the variant).
I don't know what Python could do to guess the intentions of broken mail agents, so I suggest you do some appropriate guessing whenever decode_header fails.
I call the mail agent “broken” because there is no need to escape either + or / in a message header: it's not a URL, so why not use the standard character set?
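One form that guessing could take, sketched for Python 3: decode the encoded-word yourself and accept the URL-safe base64 alphabet (RFC 4648, section 5, where - is 62 and _ is 63). base64.urlsafe_b64decode maps - and _ to + and / and still accepts the standard characters. Treating _ as 63 is an assumption about this particular mail agent; it happens to reproduce the replace('_', '/') result above. lenient_decode_b_word is a hypothetical name, and it only handles a header consisting of a single B encoded-word.

```python
import base64
import re

# One RFC 2047 "B" encoded-word: =?charset?B?payload?=
B_WORD = re.compile(r"=\?([^?]+)\?[Bb]\?([^?]*)\?=")


def lenient_decode_b_word(header):
    """Decode a header made of a single B encoded-word, tolerating '-' and '_'.

    Assumption: '-' and '_' in the payload mean the URL-safe alphabet
    (RFC 4648 section 5: '-' = 62, '_' = 63). This is a guess, not RFC 2047.
    """
    match = B_WORD.fullmatch(header.strip())
    if match is None:
        raise ValueError("not a single B encoded-word: %r" % header)
    charset, payload = match.groups()
    # urlsafe_b64decode maps '-' -> '+' and '_' -> '/', so '+' and '/' still work.
    raw = base64.urlsafe_b64decode(payload)
    return raw.decode(charset)


name = lenient_decode_b_word(
    '=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
print(name == "Anmeldung Netzanschluss S\u00fcdring3p.jpg")  # True
```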
