Python detecting a literal backslash as a line continuation character - python

How can I stop python seeing "\\" as an invalid line continuation character and start seeing it as a literal backslash? This is a chronic problem but as an example this line of code to move files in a folder to another subfolder :
[rename(I, f"Mainfolder\\InvalidFileStorage\\{ I.rsplit("\\").pop() }") for I in InvalidFiles]
(ps, I am aware that this list comprehension might not be right yet, but I haven't been able to bug test it since I can't run the code without it complaining about line continuation characters)
I am aware from previous instances of this happening that you can typically solve the problem by just moving the code into several lines and using variables to store it, but that would take this simple one liner and make it several times larger and I hate having to constantly do that for otherwise simple code segments.

Your issue comes from the fact that Python's f-string parsing (before Python 3.12) is quite limited and can't handle quotes of the enclosing type inside expression areas, as they terminate the string:
f"asdf {"a" + "b"} asdf" # not allowed
However, quotes of the opposite type are:
f"asdf {'a' + 'b'} asdf" # fine
Once you fix this issue, it'll error saying that backslashes aren't allowed inside an f-string expression (also a restriction of versions before 3.12). The easiest way to circumvent this is to just move it to a function:
def process(s):
    return s.rsplit("\\").pop()
[rename(I, f"Mainfolder\\InvalidFileStorage\\{ process(I) }") for I in InvalidFiles]
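As a minimal sketch of the same idea that can be run without touching any files: the helper name `last_component` and the sample paths below are hypothetical, and the example only builds the target names rather than calling rename. Because the backslash lives inside the helper, the f-string expression contains neither a backslash nor a nested double quote.

```python
def last_component(path):
    # Split once from the right on a literal backslash and keep the tail.
    return path.rsplit("\\", 1)[-1]

# Hypothetical sample input, standing in for InvalidFiles.
invalid_files = ["Mainfolder\\a.txt", "Mainfolder\\sub\\b.txt"]

# The f-string expression holds only a function call, so it parses
# on older Python versions as well.
targets = [
    f"Mainfolder\\InvalidFileStorage\\{last_component(p)}"
    for p in invalid_files
]

for target in targets:
    print(target)
```

Using a plain for loop around rename, instead of a list comprehension evaluated only for its side effects, would probably also read more clearly here.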

Related

Removing a control character using Python

I have a script that processes the output of a command (the aws help cli command).
I step through the output line-by-line and don't start the actual real parsing until I encounter the text "AVAILABLE COMMANDS" at which point I set a flag to true and start further processing on each line.
I've had this working fine - BUT on Ubuntu we encounter a problem which is this :
The CLI highlights the text in a way I have not seen before:
The output is very long, so I've grep'd the particular line in question - see below:
># aws ec2 help | egrep '^A'
>AVAILABLE COMMANDS
># aws ec2 help | egrep '^A' | cat -vet
>A^HAV^HVA^HAI^HIL^HLA^HAB^HBL^HLE^HE C^HCO^HOM^HMM^HMA^HAN^HND^HDS^HS$
What I haven't seen before is that each letter that is highlighted is in the format X^HX.
I'd like to apply a simple transformation of the type X^HX --> X (for all a-zA-Z).
What have I tried so far:
well my workaround is this - first I remove control characters like this:
String = re.sub(r'[\x00-\x1f\x7f-\x9f]','',String)
but I still have to search for 'AAVVAAIILLAABBLLEE' which is totally ugly. I considered using a further regex to turn doubles to singles but that will catch true doubles and get messy.
I started writing a function with an iteration across a constructed list of alpha characters to translate as described, and I used hexdump to try to figure out the exact \x code of the control characters in question but could not get it working - I could remove H but not the ^.
I really don't want to use any additional modules because I want to make this available to people without them having to install extras. In conclusion I have a workaround that is quite ugly, but I'm sure someone must know a quick and easy way to do this translation. It's odd that it only seems to show up on Ubuntu.
After looking at this a little further I was able to put in place a solution:
import re
from string import ascii_lowercase
from string import ascii_uppercase
def RemoveUbuntuHighlighting(String):
    # Collapse each letter, backspace (\x08), same letter sequence to the letter.
    for Char in ascii_uppercase + ascii_lowercase:
        Match = Char + '\x08' + Char
        String = re.sub(Match, Char, String)
    return String
I'm still a little confounded to see characters highlighted in the format (X\x08X), the arrangement does seem to repeat the same information unnecessarily.
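The other thing I would advise to anyone not familiar with reading hex dumps is that hexdump's default output groups bytes into two-byte words, so on a little-endian machine each pair of bytes appears swapped with respect to the order of their appearance (hexdump -C shows the bytes in their actual order).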
A much simpler and more reliable fix is to replace a backspace and duplicate of any character.
I have also augmented this to handle underscores using the same mechanism (character, backspace, underscore).
String = re.sub(r'(.)\x08(\1|_)', r'\1', String)
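A small sketch of that substitution applied to the line from the question. The function name `strip_overstrike` is made up for illustration, and the sample is rebuilt from the `cat -vet` output shown above (each `^H` is a backspace, `\x08`).

```python
import re

def strip_overstrike(text):
    # Character, backspace, then the same character (bold) or an
    # underscore (underline) collapses to the single character.
    return re.sub(r'(.)\x08(\1|_)', r'\1', text)

# Rebuild the highlighted line: every non-space letter is overstruck once.
plain = "AVAILABLE COMMANDS"
highlighted = "".join(c if c == " " else c + "\x08" + c for c in plain)

print(strip_overstrike(highlighted))
```

Because the backreference requires the second character to equal the first, genuine double letters such as the "MM" in COMMANDS are left intact.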
Demo: https://ideone.com/yzwd2V
This highlighting was standard back when output was to a line printer; backspacing and printing the same character again would add pigmentation to produce boldface. (Backspacing and printing an underscore would produce underlining.)
Probably the AWS CLI can be configured to disable this by setting the TERM variable to something like dumb. There is also a utility col which can remove this formatting (try col -b; maybe see also colcrt). Though perhaps really the best solution would be to import the AWS Python code and extract the help message natively.

What is this error message, and how do I resolve it?

I'm taking google's python tutorial and may have fat fingered a few keys, causing an error. I don't recognize the issue here, and the ctrl+click links it allows me to follow take me to line 1 of the file I'm writing in and to python.exe.
It looks like there's an extra character somewhere in a file path? There are no syntax errors in the code itself as the debugger runs through it just fine.
I'm using Visual Studio Code
None of the code I've written (with my knowledge) is causing this error.
This is the error message I'm getting.
SyntaxError: invalid syntax
>>> & C:/Users/mcgilm1/AppData/Local/Programs/Python/Python37-32/python.exe c:/Users/mcgilm1/Documents/google-python-exercises/basic/string2.py
File "<stdin>", line 1
& C:/Users/mcgilm1/AppData/Local/Programs/Python/Python37-32/python.exe c:/Users/mcgilm1/Documents/google-python-exercises/basic/string2.py
^
Including your code could help us. Here is a list of things to check that I found on the web (Debugging):
Make sure you are not using a Python keyword for a variable name.
Check that you have a colon at the end of the header of every compound statement, including for, while, if, and def statements.
Check that indentation is consistent. You may indent with either spaces or tabs but it’s best not to mix them. Each level should be nested the same amount.
Make sure that any strings in the code have matching quotation marks.
If you have multiline strings with triple quotes (single or double), make sure you have terminated the string properly. An unterminated string may cause an invalid token error at the end of your program, or it may treat the following part of the program as a string until it comes to the next string. In the second case, it might not produce an error message at all!
An unclosed bracket – (, {, or [ – makes Python continue with the next line as part of the current statement. Generally, an error occurs almost immediately in the next line.
Check for the classic = instead of == inside a conditional.

Parsing blocks as Python

I am writing a lexer + parser in JFlex + CUP, and I wanted to have Python-like syntax regarding blocks; that is, indentation marks the block level.
I am unsure of how to tackle this, and whether it should be done at the lexical or syntax level.
My current approach is to solve the issue at the lexical level - newlines are parsed as instruction separators, and when one is processed I move the lexer to a special state which checks how many characters are in front of the new line and remembers in which column the last line started, and accordingly introduces an open block or close block character.
However, I am running into all sorts of trouble. For example:
JFlex cannot match empty strings, so my instructions need to have at least one blank after every newline.
I cannot close two blocks at the same time with this approach.
Is my approach correct? Should I be doing things different?
Your approach of handling indents in the lexer rather than the parser is correct. Well, it’s doable either way, but this is usually the easier way, and it’s the way Python itself (or at least CPython and PyPy) does it.
I don’t know much about JFlex, and you haven’t given us any code to work with, but I can explain in general terms.
For your first problem, you're already putting the lexer into a special state after the newline, so that "grab 0 or more spaces" should be doable by escaping from the normal flow of things and just running a regex against the line.
For your second problem, the simplest solution (and the one Python uses) is to keep a stack of indents. I'll demonstrate something a bit simpler than what Python does.
First:
indents = [0]
After each newline, grab a run of 0 or more spaces as spaces. Then:
if len(spaces) == indents[-1]:
    pass
elif len(spaces) > indents[-1]:
    indents.append(len(spaces))
    emit(INDENT_TOKEN)
else:
    while len(spaces) != indents[-1]:
        indents.pop()
        emit(DEDENT_TOKEN)
Now your parser just sees INDENT_TOKEN and DEDENT_TOKEN, which are no different from, say, OPEN_BRACE_TOKEN and CLOSE_BRACE_TOKEN in a C-like language.
Of course, you'd want better error handling: raise some kind of tokenizer error rather than an implicit IndexError, maybe use < instead of != so you can detect that you've gone too far instead of exhausting the stack (for better error recovery if you want to continue to emit further errors instead of bailing at the first one), etc.
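Here is a rough, self-contained sketch of the indent stack with the `<` comparison and an explicit error. It is an illustration only, not what CPython or JFlex does: the token names, the `tokenize_indentation` function, and the choice to treat each non-blank line as a single LINE token are all assumptions, and only spaces count as indentation.

```python
INDENT, DEDENT, LINE = "INDENT", "DEDENT", "LINE"

def tokenize_indentation(source):
    indents = [0]          # stack of currently open indentation widths
    tokens = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip(" ")
        if not stripped:
            continue       # blank lines do not affect block structure
        width = len(line) - len(stripped)
        if width > indents[-1]:
            indents.append(width)
            tokens.append((INDENT, None))
        else:
            # Pop one level per closed block, so several blocks can
            # close on the same line.
            while width < indents[-1]:
                indents.pop()
                tokens.append((DEDENT, None))
            if width != indents[-1]:
                raise SyntaxError(f"line {lineno}: inconsistent dedent")
        tokens.append((LINE, stripped))
    # Close any blocks still open at end of input.
    while len(indents) > 1:
        indents.pop()
        tokens.append((DEDENT, None))
    return tokens

for token in tokenize_indentation("a\n  b\n    c\nd"):
    print(token)
```

The line `d` closes two blocks at once, which is the case the question was stuck on: the while loop emits one DEDENT per popped level.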
For real-life example code (with error handling, and tabs as well as spaces, and backslash newline escaping, and handling non-syntactic indentation inside of parenthesized expressions, etc.), see the tokenize docs and source in the stdlib.
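To see the tokens Python itself produces, the stdlib tokenize module can be run over a small snippet. This is just a quick way to inspect the output; the sample source string is made up.

```python
import io
import tokenize

source = "if x:\n    y = 1\nz = 2\n"

# generate_tokens takes a readline callable that yields str lines.
names = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

# Keep only the block-structure tokens.
structure = [name for name in names if name in ("INDENT", "DEDENT")]
print(structure)
```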

Python IDLE is glitching when I try to load images with pygame

I don't know how to explain this so I have included a video showing you what's happening.
https://www.youtube.com/watch?v=XCNl24mpko0&feature=youtu.be
Notice how it says it's trying to load "imagesed shield.png". This is because the backslash is escaping the "r", so "\r" is read as a single carriage return character. Putting an "r" at the front will fix it by converting the string to a raw string, as will replacing the backslash with a forward slash, or escaping the backslash itself.
red_shield = pyg.image.load(r'images\red shield.png')
red_shield2 = pyg.image.load('images/red shield.png')
red_shield3 = pyg.image.load('images\\red shield.png')
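The difference between these spellings can be checked without pygame. A short sketch, using only the path strings from the lines above:

```python
# '\r' inside a normal string literal is one character: a carriage return.
escaped_by_accident = 'images\red shield.png'

raw = r'images\red shield.png'       # raw string: backslash kept as-is
doubled = 'images\\red shield.png'   # escaped backslash
forward = 'images/red shield.png'    # forward slash, no escaping needed

print(repr(escaped_by_accident))
print(repr(raw))
print(raw == doubled)
```

The raw and doubled forms are the same string, while the accidental form is one character shorter because backslash plus "r" collapsed into a single carriage return.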
Edit: I suppose I should mention that I assume the glitch is due to IDLE trying to represent a carriage return (\r is the carriage return character, hence the answer). I don't really know if it's a real issue in the grand scheme of things.

How to append '\\?\' to the front of a file path in Python

I'm trying to work with some long file paths (Windows) in Python and have come across some problems. After reading the question here, it looks as though I need to append '\\?\' to the front of my long file paths in order to use them with os.stat(filepath). The problem I'm having is that I can't create a string in Python that ends in a backslash. The question here points out that you can't even end strings in Python with a single '\' character.
Is there anything in any of the Python standard libraries or anywhere else that lets you simply append '\\?\' to the front of a file path you already have? Or is there any other work around for working with long file paths in Windows with Python? It seems like such a simple thing to do, but I can't figure it out for the life of me.
"\\\\?\\" should give you exactly the string you want.
Longer answer: of course you can end a string in Python with a backslash. You just can't do so when it's a "raw" string (one prefixed with an 'r'), which you usually use for strings that contain (lots of) backslashes (to avoid the infamous "leaning toothpick" syndrome ;-)).
Even with a raw string, you can end in a backslash with:
>>> print(r'\\?\D:\Blah' + '\\')
\\?\D:\Blah\
or even:
>>> print(r'\\?\D:\Blah' '\\')
\\?\D:\Blah\
since Python concatenates two adjacent string literals into one.
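A minimal sketch of wrapping this in a helper. The names `LONG_PATH_PREFIX` and `add_long_path_prefix` are hypothetical, and the sketch assumes the path is already an absolute local path; other cases (relative paths, network shares) are not handled here.

```python
# Four characters: backslash, backslash, question mark, backslash.
LONG_PATH_PREFIX = "\\\\?\\"

def add_long_path_prefix(path):
    # Leave paths alone if they already carry the prefix.
    if path.startswith(LONG_PATH_PREFIX):
        return path
    return LONG_PATH_PREFIX + path

print(add_long_path_prefix(r"D:\Blah"))
```

The result can then be passed to os.stat(filepath) as described in the question.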
