Remove backslash continuation character - python

I'm trying to parse some code with AST, but I'm having an issue because of the backslash continuation character.
When the code contains a continuation character \, textwrap does not manage to dedent it. How can I get rid of it?
code = """
    def foo():
        message = "This is a very long message that will probably need to wrap at the end of the line!\n \
And it actually did!"
"""
import textwrap
print textwrap.dedent(code)
import ast
ast.parse(textwrap.dedent(code))
I'm adding more details to clarify the question:
I have a module nemo.py with the following content:
class Foo(object):
    def bar(self):
        message = "This is a very long message that will probably need to wrap at the end of the line!\n \
And it actually did!"
and the main module trying to parse the code:
import ast
import nemo
import inspect
import textwrap
code = str().join(inspect.getsourcelines(nemo.Foo.bar)[0])
ast.parse(textwrap.dedent(code))
And the traceback:
Traceback (most recent call last):
File "/Users/kelsolaar/Documents/Development/Research/_BI.py", line 7, in <module>
ast.parse(textwrap.dedent(code))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.py", line 37, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "<unknown>", line 1
def bar(self):
^
IndentationError: unexpected indent

This is because you misunderstood what textwrap.dedent() does.
It only removes common leading whitespace. In your case there is no common leading whitespace, so nothing is removed.
Moreover, in this case what you want is actually \\ instead of \n \. What gets parsed is the string after Python has processed its escapes (i.e. what print shows): \\ becomes a single \, which is what you want, whereas \n becomes a real newline inside the "..." literal, which is invalid.
Now consider this code:
>>> code = """
    def foo():
        message = "This is a very long message that will probably need to wrap at the end of the line! \\
    And it actually did!"
"""
>>> print textwrap.dedent(code)
def foo():
    message = "This is a very long message that will probably need to wrap at the end of the line! \
And it actually did!"
>>> ast.parse(textwrap.dedent(code))
<_ast.Module object at 0x10e9e5bd0>
In this case there is common leading whitespace, so it is removed.
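To illustrate the point about common leading whitespace, here is a small Python 3 sketch (the sample strings are made up for illustration): dedent strips a prefix only when every non-blank line shares it, so a single line at column 0, such as the continuation line of a backslash-continued string, disables it entirely.

```python
import textwrap

# Every non-blank line starts with four spaces, so dedent removes them.
indented = "    def foo():\n        return 1\n"
print(textwrap.dedent(indented))

# Here the last line starts at column 0 (like the continuation line of a
# backslash-continued string literal), so the common prefix is empty and
# the text comes back unchanged.
mixed = "    def foo():\n        x = 1\ny = 2\n"
print(textwrap.dedent(mixed) == mixed)  # True
```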
Edit:
If you want to get rid of the \ altogether, you can consider using """My sentence""" for message in def bar.

For the second part of the question, the following simple replace covers my needs: code = code.replace("\\\n", "")
import ast
import nemo
import inspect
import textwrap
code = str().join(inspect.getsourcelines(nemo.Foo.bar)[0])
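# str.replace() returns a new string, so the result must be reassigned.
# "\\\n" is a backslash followed by a newline: removing it joins the
# continuation line back onto the previous one, so dedent finds the indent.
code = code.replace("\\\n", "")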
ast.parse(textwrap.dedent(code))
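Putting the self-answer's fix together, here is a minimal self-contained Python 3 sketch. Instead of importing nemo, it assumes the source text that inspect.getsourcelines() would return for Foo.bar, with the continuation line starting at column 0; the names SOURCE and parse_indented_source are hypothetical. The plain text replacement is naive: it would also join a comment line that happens to end in a backslash with the following line.

```python
import ast
import textwrap

# Hypothetical stand-in for "".join(inspect.getsourcelines(nemo.Foo.bar)[0]):
# an indented method whose string literal ends a line with "\n \" and
# continues on a line that starts at column 0.
SOURCE = (
    "    def bar(self):\n"
    '        message = "This is a very long message!\\n \\\n'
    'And it actually did!"\n'
)


def parse_indented_source(source):
    """Join backslash-newline continuations, then dedent and parse.

    Naive text replacement: a comment line ending in a backslash would
    also be joined with the next line.
    """
    joined = source.replace("\\\n", "")
    return ast.parse(textwrap.dedent(joined))


# Without the join, the column-0 continuation line leaves dedent nothing
# to remove, so the indented "def" is rejected.
try:
    ast.parse(textwrap.dedent(SOURCE))
except IndentationError as exc:
    print("dedent alone fails:", exc.msg)

tree = parse_indented_source(SOURCE)
func = tree.body[0]
print(func.name)  # bar
# literal_eval accepts an AST node, here the string assigned to message.
print(repr(ast.literal_eval(func.body[0].value)))
# 'This is a very long message!\n And it actually did!'
```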

Related

syntax error 'EOL while scanning string literal (<unknown>, line 9)'

import os
async def cmdrun(client, message, prefix):
    cmd = message.content.split(' ')[0].split(prefix)[1]
    args = message.content.split(cmd)[1][1:].split(' ')
    for filename in os.listdir('./commands'):
        if filename.endswith('.py'):
            imported = filename.split('.py')[0]
            strin = f"from commands.{imported} import name, aliases, run\nx = name()\ny = aliases()\nawait message.channel.send(x + y)\nif x == {cmd} or {cmd} in y:\n await run(client, message, args)"
            exec(strin)
I am making a Discord bot with discord.py.
What is the error?
That error almost always means that a string literal is not closed before the end of its line, usually because of a missing double or single quotation mark. Here are my suggestions:
Try running the generated lines of code you're attempting to execute on their own, as is.
I couldn't find anything in the new documentation, but the old documentation suggests that "...in the current implementation, multi-line compound statements must end with a newline: exec "for v in seq:\n\tprint v\n" works, but exec "for v in seq:\n\tprint v" fails with SyntaxError." Perhaps try adding a newline character at the end?
Try using triple quotes (""" ... """).
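The first suggestion (checking the generated code on its own) can be done without executing anything: compile() raises a SyntaxError, with a line number, for malformed source. The sketch below is Python 3; the command name "ping" and the helper check_generated are hypothetical. As an aside not covered by the answer, it also shows that {cmd} in the f-string pastes the bare value into the generated code whereas {cmd!r} pastes a quoted string literal, and that code containing a top-level await is rejected by compile() (and therefore by exec() as well).

```python
cmd = "ping"  # hypothetical command name, for illustration only

# {cmd} pastes the bare text, so the generated code refers to a variable
# named ping; {cmd!r} pastes repr(cmd), i.e. a quoted string literal.
bare = f"if x == {cmd}:\n    pass\n"
quoted = f"if x == {cmd!r}:\n    pass\n"


def check_generated(source):
    """Compile generated code without running it.

    Returns None when the source is syntactically valid, otherwise a short
    description of the SyntaxError and the line it points at.
    """
    try:
        compile(source, "<generated>", "exec")
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


print(bare)
print(quoted)
# Both compile; the bare version would only fail at run time (NameError).
print(check_generated(bare), check_generated(quoted))  # None None
# A string literal left open at the end of a line.
print(check_generated('x = "unterminated\n'))
# A top-level await outside any async function.
print(check_generated("await message.channel.send(x)\n"))
```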
Hope this solves the issue!

Python 3.7 with regular expressions: Why can I no longer substitute with a string containing a backslash (\)?

For my problem, I've got a very simple example:
import re
my_string = re.sub(r"Hello", r"\Greetings", "Hello Folks!")
print(my_string)
The above, in Python 3.6, will print \Greetings Folks! to the standard output. Let's try this again in Python 3.7.0 or 3.7.4 (the versions which I was able to test). What happens? We receive an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.7/re.py", line 192, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "/usr/lib/python3.7/re.py", line 309, in _subx
template = _compile_repl(template, pattern)
File "/usr/lib/python3.7/re.py", line 300, in _compile_repl
return sre_parse.parse_template(repl, pattern)
File "/usr/lib/python3.7/sre_parse.py", line 1024, in parse_template
raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \G at position 0
Why is this? Is there a change in Python 3.7 which I've missed? What's the proper way around this problem?
Yes, there was a change. From the docs about re.sub:
Changed in version 3.7: Unknown escapes in repl consisting of '\' and an ASCII letter now are errors.
So just double up the backslash:
my_string = re.sub(r"Hello", r"\\Greetings", "Hello Folks!")
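A Python 3.7+ sketch of the doubled-backslash fix. It also shows an alternative that is not in the answers here: when repl is a function, re.sub() inserts the string it returns as-is, without processing backslash escapes, so no doubling is needed.

```python
import re

# Doubling the backslash puts one literal "\" in the result.
doubled = re.sub(r"Hello", r"\\Greetings", "Hello Folks!")

# A callable repl is used as-is: re does not process escapes in the string
# it returns, so the single backslash survives without doubling.
verbatim = re.sub(r"Hello", lambda match: r"\Greetings", "Hello Folks!")

print(doubled)   # \Greetings Folks!
print(verbatim)  # \Greetings Folks!

# The undoubled template is rejected on Python 3.7 and later.
try:
    re.sub(r"Hello", r"\Greetings", "Hello Folks!")
except re.error as exc:
    print("re.error:", exc)
```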
I guess you're trying to use the \G construct, which you could probably use if you installed the regex module. Removing it or adding another backslash are possible options.
import re
my_string = re.sub(r"Hello", r"\\Greetings", "Hello Folks!")
print(my_string)
Alternative:
$ pip install regex
or
$ pip3 install regex
Then,
import regex
my_string = regex.sub(r"Hello", r"\Greetings", "Hello Folks!")
print(my_string)
Output
\Greetings Folks!

How to find a spurious print statement?

I'm debugging a large Python codebase. Somewhere, a piece of code is printing {} to the console; presumably this is some old debugging code that's been left in by accident.
As this is the only console output that doesn't go through the logger, is there any way I can find the culprit? Perhaps by redefining what the print statement does, so I can cause an exception?
Try redirecting sys.stdout to a custom stream (see Redirect stdout to a file in Python?) where you override the write() method.
Try something like this:
import io
import sys
import traceback
class TestableIO(io.TextIOBase):
    """Text stream that forwards writes and prints a stack trace on a match."""
    def __init__(self, old_stream):
        super(TestableIO, self).__init__()
        self.old_stream = old_stream
    def write(self, text):
        # Show where the matching write came from, then pass the text through.
        if 'bb' in text:
            traceback.print_stack(file=self.old_stream)
        return self.old_stream.write(text)
    def flush(self):
        self.old_stream.flush()
sys.stdout = TestableIO(sys.stdout)
sys.stderr = TestableIO(sys.stderr)
print('aa')
print('bb')
print('cc')
Then you will get a nice traceback:
λ python test.py
aa
File "test.py", line 22, in <module>
print('bb')
File "test.py", line 14, in write
traceback.print_stack(file=self.old_stream)
bb
cc

Python warnings stack levels

I am trying to make the warning message not include the source line that generated it, using warnings stack levels, but instead of seeing only the message I get one extra line that says:
File "sys", line 1
Is it possible not to get this line?
This is my code:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import warnings
def warning_function():
    warnings.warn("Python 3.x is required!", RuntimeWarning, stacklevel=8)
if sys.version_info[0] < 3:
    ...
else:
    warning_function()
Well, that's exactly what you asked for: stacklevel=8 tells warnings.warn() to attribute the warning to the frame 7 calls above the function that called warn(). As your program does not have that many frames on the stack, warnings falls back to reporting the location as "sys", line 1.
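The frame counting described above can be checked with warnings.catch_warnings(record=True). In this Python 3 sketch (function names are hypothetical), stacklevel=2 attributes the warning to the line in the caller rather than to the warnings.warn() call itself; each extra level moves one more frame up the stack.

```python
import warnings


def require_feature():
    # stacklevel=1 (the default) would report this line; stacklevel=2 goes
    # one frame up and reports the line that called require_feature().
    warnings.warn("feature X is required", RuntimeWarning, stacklevel=2)


def caller():
    require_feature()  # the warning is attributed to this line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    caller()

# The recorded line is the call inside caller(), one line below its "def".
print(caught[0].lineno == caller.__code__.co_firstlineno + 1)  # True
```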
If you want more control over the printed text, you can replace the warnings.showwarning function:
old_sw = warnings.showwarning  # keep the original so it can be restored later
def new_sw(message, category, filename, lineno, file=None, line=None):
    # Write only the category and the message, leaving out the
    # 'File "...", line ...' location and the source line.
    stream = file if file is not None else sys.stderr
    stream.write("Warning (from warnings module):\n{}: {}\n".format(
        category.__name__, message))
warnings.showwarning = new_sw
That way you will not get the File "...", line ... line.

Setting the encoding for sax parser in Python

When I feed a utf-8 encoded xml to an ExpatParser instance:
import codecs
import xml.sax
def test(filename):
    parser = xml.sax.make_parser()
    with codecs.open(filename, 'r', encoding='utf-8') as f:
        for line in f:
            parser.feed(line)
...I get the following:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "test.py", line 72, in search_test
parser.feed(line)
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/sax/expatreader.py", line 207, in feed
self._parser.Parse(data, isFinal)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb4' in position 29: ordinal not in range(128)
I'm probably missing something obvious here. How do I change the parser's encoding from 'ascii' to 'utf-8'?
Your code fails in Python 2.6, but works in 3.0.
This does work in 2.6, presumably because it allows the parser itself to figure out the encoding (perhaps by reading the encoding optionally specified on the first line of the XML file, and otherwise defaulting to utf-8):
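import xml.sax
def test(filename):
    parser = xml.sax.make_parser()
    # Pass raw bytes so the parser itself decides how to decode them.
    with open(filename, 'rb') as f:
        parser.parse(f)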
Jarret Hardie already explained the issue. But for those of you who are coding for the command line and don't seem to have "sys.setdefaultencoding" visible, the quick workaround for this bug (or "feature") is:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
Hopefully reload(sys) won't break anything else.
More details in this old blog post:
The Illusive setdefaultencoding
The SAX parser in Python 2.6 should be able to parse utf-8 without mangling it. Although you've left out the ContentHandler you're using with the parser, if that content handler attempts to print any non-ascii characters to your console, that will cause a crash.
For example, say I have this XML doc:
<?xml version="1.0" encoding="utf-8"?>
<test>
<name>Champs-Élysées</name>
</test>
And this parsing apparatus:
import xml.sax
class MyHandler(xml.sax.handler.ContentHandler):
    def startElement(self, name, attrs):
        print "StartElement: %s" % name
    def endElement(self, name):
        print "EndElement: %s" % name
    def characters(self, ch):
        #print "Characters: '%s'" % ch
        pass
parser = xml.sax.make_parser()
parser.setContentHandler(MyHandler())
for line in open('text.xml', 'r'):
    parser.feed(line)
This will parse just fine, and the content will indeed preserve the accented characters in the XML. The only issue is the line in def characters() that I've commented out. If it is uncommented and run in the console under Python 2.6, it produces the exception you're seeing, because the print statement must convert the characters to ASCII for output.
You have 3 possible solutions:
One: Make sure your terminal supports unicode, then create a sitecustomize.py entry in your site-packages and set the default character set to utf-8:
import sys
sys.setdefaultencoding('utf-8')
Two: Don't print the output to the terminal (tongue-in-cheek)
Three: Normalize the output using unicodedata.normalize to convert non-ascii chars to ascii equivalents, or encode the chars to ascii for text output: ch.encode('ascii', 'replace'). Of course, using this method you won't be able to properly evaluate the text.
Using option one above, your code worked just fine for me in Python 2.5.
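Option three can be sketched in Python 3 as follows (the helper name to_ascii is hypothetical). Note that unicodedata.normalize() by itself does not produce ASCII: NFKD only decomposes accented letters into a base letter plus combining marks, and the subsequent ASCII encode with "ignore" drops the marks. Characters without a decomposition (e.g. "ß") are dropped entirely. The "replace" variant from the answer substitutes ? instead.

```python
import unicodedata


def to_ascii(text):
    """Best-effort ASCII version of text (hypothetical helper name)."""
    # NFKD splits e.g. "É" into "E" plus a combining acute accent ...
    decomposed = unicodedata.normalize("NFKD", text)
    # ... and encoding with "ignore" then drops the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


name = "Champs-Élysées"
print(to_ascii(name))                   # Champs-Elysees
print(name.encode("ascii", "replace"))  # b'Champs-?lys?es'
```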
To set an arbitrary file encoding for a SAX parser, one can use InputSource as follows:
import xml.sax
import xml.sax.xmlreader
def test(filename, encoding):
    parser = xml.sax.make_parser()
    with open(filename, "rb") as f:
        input_source = xml.sax.xmlreader.InputSource()
        input_source.setByteStream(f)
        # Tell the parser how to decode the raw bytes.
        input_source.setEncoding(encoding)
        parser.parse(input_source)
This allows parsing an XML file that has a non-ASCII, non-UTF-8 encoding. For example, a file in an extended-ASCII encoding such as Latin-1 can be parsed with: test(filename, "latin1")
(Added this answer to directly address the title of this question, as it tends to rank highly in search engines.)
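A self-contained Python 3 sketch of the InputSource approach, using an in-memory Latin-1 document without an XML declaration instead of a file; the names TextCollector and parse_bytes are hypothetical. It uses the encoding name "iso-8859-1", which Expat supports natively, and relies on the encoding set on the InputSource overriding the parser's default of UTF-8.

```python
import io
import xml.sax
import xml.sax.handler
import xml.sax.xmlreader


class TextCollector(xml.sax.handler.ContentHandler):
    """Collects all character data so the decoded text can be inspected."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_bytes(data, encoding):
    parser = xml.sax.make_parser()
    handler = TextCollector()
    parser.setContentHandler(handler)
    source = xml.sax.xmlreader.InputSource()
    source.setByteStream(io.BytesIO(data))
    # Overrides the UTF-8 default (and any declared encoding).
    source.setEncoding(encoding)
    parser.parse(source)
    return "".join(handler.chunks)


latin1_doc = "<name>Champs-Élysées</name>".encode("iso-8859-1")
print(parse_bytes(latin1_doc, "iso-8859-1"))  # Champs-Élysées
```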
Commenting on janpf's answer (sorry, I don't have enough reputation to comment there): janpf's version will break IDLE, which installs its own stdout etc. that differ from sys's defaults. So I'd suggest modifying the code to something like:
import sys
currentStdOut = sys.stdout
currentStdIn = sys.stdin
currentStdErr = sys.stderr
reload(sys)
sys.setdefaultencoding('utf-8')
sys.stdout = currentStdOut
sys.stdin = currentStdIn
sys.stderr = currentStdErr
There may be other variables to preserve, but these seem like the most important.
