printing \x1bZ in python is weird, explanation needed


The following Python code behaves in a way I don't understand:
print('\x1bZ')
When I run this code, whether from a file or in the interpreter, I get a weird outcome:
The actual values, as displayed when the string is written to a file as bytes:
Discoveries at the time of posting this question:
1. Whether single quotes or double quotes are used makes no difference.
2. 0x1b is hex for 27, which in ASCII is ESC; this matches what the second picture shows. This led me to theorize that the letter Z in the string literal could be replaced, but as my test in point 3 shows, the behavior can't be reproduced with other letters.
3. Replacing \x1bZ (ESC followed by Z) with ESC followed by some other letter (I haven't checked all possibilities) yielded no apparent result, except that replacing Z with c seems to clear the terminal.
Hypotheses that I came up with:
1. This page may be relevant to the answer: https://pypi.org/project/py100/ because I found a pattern there that resembles the weird result: Esc[?1;Value0c, where Value would be replaced by something. Also, ^[[?1;<n>0c appears in https://espterm.github.io/docs/VT100%20escape%20codes.html
2. Is this some encoding problem?
3. Is this related to ANSI escape sequences? Compare [?1;0c with [38;2;, which is used when setting a 24-bit foreground color for text.
Questions:
1. Why does this particular sequence of characters result in this output?
2. What is VT100, and how is it related, if at all? (I visited its Wikipedia page.)
3. Is it possible to print a string that contains this specific sequence without the weird outcome shown in the first picture?
All help and knowledge about this will be appreciated!
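Regarding question 3, one option is to print an escaped representation of the string instead of sending the raw control character to the terminal. As far as I know, ESC Z is the VT100 "identify terminal" request (DECID), and many terminal emulators answer it with a device-attributes reply of the form ESC[?1;0c, which would explain the text that appears. The sketch below only shows how to make the string visible; `visible` is a hypothetical helper name, not something from the question.

```python
def visible(text: str) -> str:
    """Return text with non-printable characters shown as backslash escapes."""
    return "".join(
        ch if ch.isprintable() else ch.encode("unicode_escape").decode("ascii")
        for ch in text
    )


s = "\x1bZ"
print(repr(s))     # shows the literal with quotes: '\x1bZ'
print(visible(s))  # shows \x1bZ without sending ESC to the terminal
```

Writing the string to a file or a pipe instead of an interactive terminal should also avoid the reply, since nothing is there to interpret the escape sequence.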

Related

vscode python avoid indent of multiline string

So here is an example of a multiline string in VS Code/Python:
The cursor is after the p, and then you press Enter and end up like this:
i.e. the next line of the string ends up indented, which is almost never what you want. Why have an arbitrary amount of whitespace on the next line of this string?
Is there any way to change this in VS Code, i.e. so that for multiline strings it ends up like this:

I think this problem is related to different coding styles of different people.
For example,
def example(x):
    if x:
        a = '''
This is help
'''
def example(x):
    if x:
        a = '''This is help
'''
VS Code's automatic indentation on line breaks is based on code blocks. If you want VS Code to recognize multiline strings, I think it would be best to submit a feature request on GitHub. I've submitted this issue for you.

I am not 100% sure whether OP meant only the indentation in the editor (namely, VS Code) or whether, by this:
i.e. the string ends up indented, which seems what you almost never want - why have an arbitrary amount of white space on the next line of this string?
...they also meant the actual output of the multi-line string.
In case it's the latter (or in case anybody else finds this post looking for a way to keep the editor indentation out of the actual output), I'd like to add as a complementary answer (I cannot comment yet) that this was already beautifully answered here.
If that's why you're reading this: in short, import the standard library module inspect and post-process your string with its cleandoc function.
Without breaking the indentation in your IDE, this function gives you the string output you actually expected:
All leading whitespace is removed from the first line. Any leading whitespace that can be uniformly removed from the second line onwards is removed. Empty lines at the beginning and end are subsequently removed. Also, all tabs are expanded to spaces.
(From the docs link above)
Hope that helps anyone.
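A minimal sketch of the cleandoc approach described above. The function and the help text are made up for illustration; only inspect.cleandoc itself comes from the standard library.

```python
import inspect


def example(x):
    if x:
        # The string is indented to match the surrounding code...
        a = '''
        This is help
        and more help
        '''
        # ...and cleandoc strips the common indentation and the blank edges.
        return inspect.cleandoc(a)
    return ""


print(example(True))
```

The returned string has no leading spaces on either line and no empty first or last line, so the code can stay indented in the editor.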

python in Visual Studio Code - how to print funky stuff

I have been testing printing colors and characters in VS Code (version 1.69) using python 3.+. To print colored text in VS code you would use:
print("\033[31mThis is red font.\033[0m")
print("\033[32mThis is green font.\033[0m")
print("\033[33mThis is yellow font.\033[0m")
print("\033[34mThis is blue font.\033[0m")
print("\033[37mThis is the default font. \033[0m")
Special characters would be like the following:
print("\1\2\3\4\05\06\07\016\017\013\014\020")
print("\21\22\23\24\25\26\27\36\37\31\32\34\35")
Part 1 of my question: How would you print special characters from a loop? What I tried is:
for i in range(1, 99):
    t = "\\" + str(i)
    print(t)
Part 2: Is there a way to print dark text with a colored highlighted background?

The first example shows ANSI escape sequences. The second uses a convention common to many languages, including Python: putting non-printable characters in a string by escaping their character value. In your example, though, you may not realize that you're escaping octal values, not decimal ones.
Printing them is no different from printing any character though - I think you may be confusing printing strings representing values and the actual values of variables, a very common mistake/confusion for beginning programmers. If you want to be able to print ('\21') without writing out the string, you could just print(chr(17)), because 17 is the decimal equivalent of octal 21.
Have a look at the documentation for string literals for more detail.
The loop you're trying to create would be something like:
for i in range(1, 99):
    print(chr(i))
But you have to keep in mind that if i gets to 21, it's not printing '\21', but '\25' since 25 is the octal representation of the decimal value 21.
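The octal/decimal relationship can be checked directly. This is a small sketch; `octal_escape` is a hypothetical helper that only builds the text of the escape, it does not produce the character itself.

```python
# '\21' in a string literal is an octal escape: octal 21 is decimal 17.
assert "\21" == chr(0o21) == chr(17)


def octal_escape(code_point: int) -> str:
    """Return the octal-escape spelling of a code point, e.g. 17 -> '\\21'."""
    return "\\" + format(code_point, "o")


for i in (17, 21):
    # Decimal value, its octal-escape spelling, and the repr of the character.
    print(i, octal_escape(i), repr(chr(i)))
```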
Note: you're asking specifically about VS Code, but that's a different question altogether. Whether the console in VS Code renders ANSI escape sequences depends on the type of terminal; it has little to do with what you do in your code. However, if you want ANSI escape sequences to render in text files, there are extensions for that.
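As for Part 2 of the question, the same kind of ANSI sequence can combine a foreground and a background code. A minimal sketch, assuming a terminal that supports the standard SGR color codes (30-37 for foreground, 40-47 for background); `highlight` is a hypothetical helper name.

```python
ESC = "\033"
RESET = f"{ESC}[0m"


def highlight(text: str, fg: int = 30, bg: int = 43) -> str:
    """Wrap text in SGR codes; defaults are black (30) on yellow (43)."""
    return f"{ESC}[{fg};{bg}m{text}{RESET}"


print(highlight("Black text on a yellow background"))
print(highlight("Black text on a cyan background", bg=46))
```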

How to get the length of a unicode string? [duplicate]

Given a character like "✮" (\xe2\x9c\xae), for example (it could be others like "Σ", "д" or "Λ"), I want to find the "actual" length that character takes when printed onscreen.
for example
len("✮")
len("\xe2\x9c\xae")
both return 3, but it should be 1

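You may try something like this:
import unicodedata
unicodedata.normalize('NFC', u'✮')
len(u"✮")
UTF-8 is a Unicode encoding that uses more than one byte for non-ASCII characters, so the length of the encoded byte string is larger than the number of characters. See unicodedata.normalize().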
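To illustrate the difference between characters and bytes, here is a short sketch assuming Python 3, where str holds code points and encoding to UTF-8 gives bytes:

```python
star = "✮"
encoded = star.encode("utf-8")

print(len(star))     # number of code points in the str
print(len(encoded))  # number of bytes in the UTF-8 encoding
print(encoded)       # the individual bytes
```

A length of 3 therefore suggests that the value being measured is the encoded byte string, not the decoded text.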

My answer to a similar question:
You are looking for the rendering width from the current output context. For graphical UIs, there is usually a method to directly query this information; for text environments, all you can do is guess what a conformant rendering engine would probably do, and hope that the actual engine matches your expectations.
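For text environments, one possible guess is sketched below. It assumes a monospaced terminal that draws East Asian wide and fullwidth characters in two cells and combining marks in zero; real terminals may differ, and `display_width_guess` is a hypothetical helper name.

```python
import unicodedata


def display_width_guess(text: str) -> int:
    """Estimate how many terminal cells text occupies (a heuristic only)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn on top of the previous cell
        # 'W' (wide) and 'F' (fullwidth) characters usually take two cells.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


for sample in ("Σд", "日本", "e\u0301"):
    print(repr(sample), len(sample), display_width_guess(sample))
```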

python 3 regex not finding confirmed matches

So I'm trying to parse a bunch of citations from a text file using the re module in python 3.4 (on, if it matters, a mac running mavericks). Here's some minimal code. Note that there are two commented lines: they represent two alternative searches. (Obviously, the little one, r'Rawls', is the one that works)
import re

def makeRefList(reffile):
    print(reffile)
    # namepattern = r'(^[A-Z1][A-Za-z1]*-?[A-Za-z1]*),.*( \(?\d\d\d\d[a-z]?[.)])'
    # namepattern = r'Rawls'
    refsTuplesList = re.findall(namepattern, reffile, re.MULTILINE)
    print(refsTuplesList)
The string in question is ugly, and so I stuck it in a gist: https://gist.github.com/paultopia/6c48c398a42d4834f2ae
As noted, the search string r'Rawls' produces expected output ['Rawls', 'Rawls']. However, the other search string just produces an empty list.
I've confirmed this regex (partially) works using the regex101 tester. Confirmation here: https://regex101.com/r/kP4nO0/1 -- this matches what I expect it to match. Since it works in the tester, it should work in the code, right?
(n.b. I copied the text from terminal output from the first print command, then manually replaced \n characters in the string with carriage returns for regex101.)
One possible issue is that python has appended the bytecode flag (is the little b called a "flag?") to the string. This is an artifact of my attempt to convert the text from utf-8 to ascii, and I haven't figured out how to make it go away.
Yet re clearly is able to parse strings in that form. I know this because I'm converting two text files from utf-8 to ascii, and the following code works perfectly fine on the other string, converted from the other text file, which also has a little b in front of it:
def makeCiteList(citefile):
    print(citefile)
    citepattern = r'[\s(][A-Z1][A-Za-z1]*-?[A-Za-z1]*[ ,]? \(?\d\d\d\d[a-z]?[\s.,)]'
    rawCitelist = re.findall(citepattern, citefile)
    cleanCitelist = cleanup(rawCitelist)
    finalCiteList = list(set(cleanCitelist))
    print(finalCiteList)
    return(finalCiteList)
The other chunk of text, which the code immediately above matches correctly: https://gist.github.com/paultopia/a12eba2752638389b2ee
The only hypothesis I can come up with is that the first, broken, regex expression is puking on the combination of newline characters and the string being treated as a byte object, even though a) I know the regex is correct for newlines (because, confirmation from the linked regex101), and b) I know it's matching the strings (because, confirmation from the successful match on the other string).
If that's true, though, I don't know what to do about it.
Thus, questions:
1) Is my hypothesis right that it's the combination of newlines and b that blows up my regex? If not, what is?
2) How do I fix that?
a) replace the newlines with something in the string?
b) rewrite the regex somehow?
c) somehow get rid of that b and make it into a normal string again? (how?)
thanks!
Addition
In case this is a problem I need to fix upstream, here's the code I'm using to get the text files and convert to ascii, replacing non-ascii characters:
this function gets called on utf-8 .txt files saved by textwrangler in mavericks
def makeCorpoi(citefile, reffile):
    citebox = open(citefile, 'r')
    refbox = open(reffile, 'r')
    citecorpus = citebox.read()
    refcorpus = refbox.read()
    citebox.close()
    refbox.close()
    corpoi = [str(citecorpus), str(refcorpus)]
    return corpoi
and then this function gets called on each element of the list the above function returns.
import codecs

def conv2ASCII(bigstring):
    def convHandler(error):
        return ('1FOREIGN', error.start + 1)
    codecs.register_error('foreign', convHandler)
    bigstring = bigstring.encode('ascii', 'foreign')
    stringstring = str(bigstring)
    return stringstring

Aah. I've tracked it down and answered my own question. Apparently one needs to call decode on the encoded bytes. The following code produces an actual string, with newlines and everything, out the other end (though now I have to fix a bunch of other bugs before I can figure out if the final output is as expected):
import codecs

def conv2ASCII(bigstring):
    def convHandler(error):
        return ('1FOREIGN', error.start + 1)
    codecs.register_error('foreign', convHandler)
    bigstring = bigstring.encode('ascii', 'foreign')
    newstring = bigstring.decode('ascii', 'foreign')
    return newstring
Apparently the str() function doesn't do the same job, for reasons that are mysterious to me. This is despite an answer to "How to make new line commands work in a .txt file opened from the internet?", which suggests that it does.
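The likely reason, as far as I can tell: in Python 3, calling str() on a bytes object without an encoding returns its printable representation, including the leading b' and with each newline written out as a backslash followed by n. A ^ anchor with re.MULTILINE then has no real line starts to match. A small sketch with made-up sample text (not the text from the gists):

```python
import re

raw = "Rawls, J. (1971). A Theory.\nNozick, R. (1974). Anarchy."
as_bytes = raw.encode("ascii")

via_str = str(as_bytes)                # the repr: starts with b' and has no real newlines
via_decode = as_bytes.decode("ascii")  # an ordinary str with real newlines

pattern = r"^[A-Z][A-Za-z]*"
print(re.findall(pattern, via_str, re.MULTILINE))     # []
print(re.findall(pattern, via_decode, re.MULTILINE))  # ['Rawls', 'Nozick']
```

This would also explain why the citation pattern still worked on the other file: it does not anchor on line starts, so the missing newlines did not matter there.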

Invalid character in identifier

I am working on the letter distribution problem from HP code wars 2012. I keep getting an error message that says "invalid character in identifier". What does this mean and how can it be fixed?
Here is the page with the information.
import string
def text_analyzer(text):
'''The text to be parsed and
the number of occurrences of the letters given back
be. Punctuation marks, and I ignore the EOF
simple. The function is thus very limited.
'''
result = {}
# Processing
for a in string.ascii_lowercase:
result [a] = text.lower (). count (a)
return result
def analysis_result (results):
# I look at the data
keys = analysis.keys ()
values \u200b\u200b= list(analysis.values \u200b\u200b())
values.sort (reverse = True )
# I turn to the dictionary and
# Must avoid that letters will be overwritten
w2 = {}
list = []
for key in keys:
item = w2.get (results [key], 0 )
if item = = 0 :
w2 [analysis results [key]] = [key]
else :
item.append (key)
w2 [analysis results [key]] = item
# We get the keys
keys = list (w2.keys ())
keys.sort (reverse = True )
for key in keys:
list = w2 [key]
liste.sort ()
for a in list:
print (a.upper (), "*" * key)
text = """I have a dream that one day this nation will rise up and live out the true
meaning of its creed: "We hold these truths to be self-evident, that all men
are created equal. "I have a dream that my four little children will one day
live in a nation where they will not be Judged by the color of their skin but
by the content of their character.
# # # """
analysis result = text_analyzer (text)
analysis_results (results)

The error SyntaxError: invalid character in identifier means you have some character in the middle of a variable name, function, etc. that's not a letter, number, or underscore. The actual error message will look something like this:
File "invalchar.py", line 23
values = list(analysis.values ())
^
SyntaxError: invalid character in identifier
That tells you what the actual problem is, so you don't have to guess "where do I have an invalid character"? Well, if you look at that line, you've got a bunch of non-printing garbage characters in there. Take them out, and you'll get past this.
If you want to know what the actual garbage characters are, I copied the offending line from your code and pasted it into a string in a Python interpreter:
>>> s=' values = list(analysis.values ())'
>>> s
' values \u200b\u200b= list(analysis.values \u200b\u200b())'
So, that's \u200b, or ZERO WIDTH SPACE. That explains why you can't see it on the page. Most commonly, you get these because you've copied some formatted (not plain-text) code off a site like StackOverflow or a wiki, or out of a PDF file.
If your editor doesn't give you a way to find and fix those characters, just delete and retype the line.
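If you would rather locate the characters than retype the line, a few lines of Python can list them. A minimal sketch; `find_suspicious` is a hypothetical helper name, and it simply flags every non-ASCII character, which is an assumption that suits code that is meant to be plain ASCII.

```python
import unicodedata


def find_suspicious(line):
    """Yield (column, code point, name) for every non-ASCII character in line."""
    for col, ch in enumerate(line):
        if ord(ch) > 127:
            yield col, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")


line = "values \u200b\u200b= list(analysis.values \u200b\u200b())"
for hit in find_suspicious(line):
    print(hit)
```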
Of course you've also got at least two IndentationErrors from not indenting things, and at least one more SyntaxError from stray spaces (like = = instead of ==) or from underscores turned into spaces (like analysis results instead of analysis_results).
The question is, how did you get your code into this state? If you're using something like Microsoft Word as a code editor, that's your problem. Use a text editor. If not… well, whatever the root problem is that caused you to end up with these garbage characters, broken indentation, and extra spaces, fix that, before you try to fix your code.

If your keyboard is set to English US (International) rather than English US the double quotation marks don't work. This is why the single quotation marks worked in your case.

Similar to the previous answers, the problem is some character (possibly invisible) that the Python interpreter doesn't recognize. Because this is often due to copy-pasting code, re-typing the line is one option.
But if you don't want to re-type the line, you can paste your code into this tool or something similar (Google "show unicode characters online"), and it will reveal any non-standard characters. For example,
s=' values = list(analysis.values ())'
becomes
s=' values U+200B U+200B = list(analysis.values U+200B U+200B ())'
You can then delete the non-standard characters from the string.

Check your quotation marks carefully. Sometimes double quotation marks don't work properly, depending on your keyboard layout.

I got a similar issue. My solution was to change minus character from:
—
to
-

I sometimes get that error when I type in Chinese.
When it comes to punctuation marks, you may not notice that you are actually typing the Chinese version instead of the English version.
The interpreter will give you an error message, but for human eyes, it is hard to notice the difference.
For example, "，" in Chinese; and "," in English.
So be careful with your language setting.

Not sure this is right on, but when I copied some code from a paper on using pgmpy and pasted it into the editor under Spyder, I kept getting the "invalid character in identifier" error even though the code looked fine to me. The particular line was grade_cpd = TabularCPD(variable='G',\
For no good reason I replaced the ' with " throughout the code and it worked. Not sure why, but it did.

A little bit late but I got the same error and I realized that it was because I copied some code from a PDF. Check the difference between these two:
-
−
The first one is from hitting the minus key on the keyboard, and the second is from a LaTeX-generated PDF.

This error occurs mainly when copy-pasting the code. Try editing/replacing minus(-), bracket({) symbols.

You don't get a good error message in IDLE if you just Run the module. Try typing an import command from within IDLE shell, and you'll get a much more informative error message. I had the same error and that made all the difference.
(And yes, I'd copied the code from an ebook and it was full of invisible "wrong" characters.)

My solution was to switch my Mac keyboard from Unicode to U.S. English.

I had a similar problem after copying code from my email.
def update(self, k=1, step = 2):
if self.start.get() and not self.is_paused.get(): U+A0
x_data.append([i for i in range(0,k,1)][-1])
y = [i for i in range(0,k,step)][-1]
There is an additional U+A0 (NO-BREAK SPACE) character, which I found with the tool recommended by @Jacob Stern.
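The lookalike characters mentioned in the answers above can also be replaced in bulk. A minimal sketch using str.translate; the mapping is an assumption chosen from the characters named in this thread plus curly quotes, and `clean_source` is a hypothetical helper name. Note that it rewrites these characters everywhere, including inside string literals, so review the result.

```python
# Map common copy-paste lookalikes to their plain ASCII equivalents.
LOOKALIKES = {
    "\u200b": "",   # ZERO WIDTH SPACE: delete
    "\u00a0": " ",  # NO-BREAK SPACE
    "\u2014": "-",  # EM DASH
    "\u2212": "-",  # MINUS SIGN
    "\uff0c": ",",  # FULLWIDTH COMMA
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
}
TABLE = str.maketrans(LOOKALIKES)


def clean_source(text: str) -> str:
    """Return text with the lookalike characters above replaced."""
    return text.translate(TABLE)


print(clean_source("values \u200b\u200b= x \u2212 1"))
```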
