Python: \b behaves differently from book's description [duplicate] - python

I'm reading Automate the Boring Stuff with Python (2015) and I got to this snippet:
print(positionStr, end='')
print('\b' * len(positionStr), end='', flush=True)
where positionStr is a string defined earlier. I looked at the Python escape sequences and saw that \b is backspace, but for some reason the author says it should erase the printed string:
To erase text, print the \b backspace escape character. This special
character erases a character at the end of the current line on the screen. The line at ❶ uses string replication to produce a string with as many \b
characters as the length of the string stored in positionStr, which has the
effect of erasing the positionStr string that was last printed.
This contradicts what I saw here (table in mid page), and it also differs from my results.
As you can see, I got a bunch of backspace characters, as I guess I should have (I ran a loop in which I printed the string and then the \b string).
Now, is the book wrong, or should I have done something different for it to work? Additionally, if this is wrong, is there a way to achieve this goal (print a string and then delete it)?
As can be seen from the picture, I'm using Python 3.5.3 on Windows 8.1.

Not all consoles treat \b as a deletion character, especially graphical ones. Where it is supported, \b only moves the cursor back one position, and the next character printed overwrites what was there.
(The same thing happens when you write it to a file: the previous character is not deleted either.)
Try your example in a native shell (Windows or Linux would work) and the characters will be overwritten as expected.
Windows CMD:
>>> print("a\bc")
c
PyScripter (that's what I have):
>>> print("a\bc")
a<strange char>c
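To make printed text actually disappear on a terminal that honors \b, a common trick is to back up, overwrite with spaces, and back up again. The following is a minimal sketch under that assumption; erase_sequence is a hypothetical helper name and the position strings are made-up sample values, not taken from the book.

```python
import time


def erase_sequence(text):
    """Return control characters that blank out `text` on a terminal honoring \\b."""
    n = len(text)
    # Move left over the text, overwrite it with spaces, then move left again.
    return "\b" * n + " " * n + "\b" * n


if __name__ == "__main__":
    # Made-up sample strings standing in for positionStr.
    for position_str in ("X:  10 Y:  20", "X: 300 Y:   5", "X:   7 Y:  42"):
        print(position_str, end="", flush=True)
        time.sleep(0.2)
        print(erase_sequence(position_str), end="", flush=True)
    print()
```

In a console that does not interpret \b (as in the PyScripter output above), this will still print placeholder characters, so it is only useful in a real terminal.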

Related

Is it possible to print characters on top of each other without erasing the previous one in order to have both superscript and subscript?

I am wondering if I can have print() outputs such as
in a terminal and/or an IPython/Jupyter Notebook. I want to develop a library working with toleranced dimensions and these types of pretty-printed outputs will come quite handy during development and testing.
What I know so far:
There are escape characters such as Carriage Return \r, which goes to the beginning of the line without erasing the existing characters, and Backspace \b, which deletes the last character. For example, print("some text\bsome other text \rbingo", end="") should give me bingotexsome other text. Either way, when a new character is printed, the previous one is erased.
I also know how to use Unicode characters to get superscripted/subscripted digits and plus/minus signs. For example, print('1.23\u207a\u2074\u2027\u2075\u2076') gives something like 1.23+4.56 and print('1.23\u208b\u2087.\u2088\u2089') outputs something close to 1.23-7.89. Which Unicode characters should be used for superscript/subscript decimal delimiters (in this case period/dot/point) is still debatable, though. There are multiple options for a superscripted dot, including \u0387 and \u22c5. However, AFAIK there is no Unicode character suitable for a subscripted dot. (more info here)
What I don't know:
Is there an escape character or a Unicode character that replicates the left arrow ← key on the keyboard?
How can I print without erasing the pixels in the terminal? Is there a way to print/display characters on top of each other?
And if none of the above is possible in a terminal, can I (and how do I) control the HTML/CSS output in a Jupyter Notebook to print both superscript and subscript at the same time?
In Jupyter Notebook/Lab this should work:
from IPython.display import Math
Math(r"1.23^{+4.56}_{-7.89}")
For convenience, you can package it in a class:
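from IPython.display import Math, display

class PPrint:
    def __init__(self, base, sup, sub):
        self.base = base
        self.sup = sup  # superscript, e.g. the upper tolerance
        self.sub = sub  # subscript, e.g. the lower tolerance

    def _ipython_display_(self):
        # IPython calls this hook to render the object in a notebook
        display(Math(f"{{{self.base}}}^{{{self.sup}}}_{{{self.sub}}}"))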
Then you can create an instance e.g.:
x = PPrint("1.23", "+4.56", "-7.89")
and if you execute in a notebook either x or display(x), it should appear as in your example.
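For a plain terminal, where stacking is not available, the Unicode approach mentioned in the question can be wrapped in translation tables. This is only a sketch: to_superscript and to_subscript are hypothetical helper names, the superscript dot U+2027 is simply the choice used in the question, and the subscript dot is left as an ordinary "." because no dedicated character exists.

```python
# Translation tables from ASCII digits and signs to Unicode superscripts/subscripts.
# Assumption: "." maps to U+2027 in superscripts (as in the question) and is
# left unchanged in subscripts.
SUPERSCRIPT = str.maketrans(
    "0123456789+-.",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079\u207a\u207b\u2027",
)
SUBSCRIPT = str.maketrans(
    "0123456789+-",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089\u208a\u208b",
)


def to_superscript(text):
    """Replace digits, signs and the dot with superscript look-alikes."""
    return text.translate(SUPERSCRIPT)


def to_subscript(text):
    """Replace digits and signs with subscript characters."""
    return text.translate(SUBSCRIPT)


if __name__ == "__main__":
    # The two tolerances are printed side by side, not stacked.
    print("1.23" + to_superscript("+4.56") + to_subscript("-7.89"))
```

How well these characters line up depends on the terminal font, so the notebook rendering with Math is likely the more reliable option.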

Why does PyCharm use double backslash to indicate escaping?

For instance, I write a normal string and another "abnormal" string like this:
Now I debug it, finding that in the debug tool, the "abnormal" string will be shown like this:
Here's the question:
Why does PyCharm show double backslashes instead of a single backslash? As is known to all, \' means '. Is there any trick?
What I believe is happening is the ' in your c variable string needs to be escaped and PyCharm knows this at runtime, given you have surrounded the full string in " (You'll notice in the debugger, your c string is now surrounded by '). To escape the single quote it changes it to \', but now, there is a \ in your string that needs escaping, and to escape \ in Python, you type \\.
EDIT
Let me see if I can explain the order of escaping going on here.
"u' this is not normal" is assigned to c
PyCharm converts the string in c to 'u' this is not normal' at runtime. See how, without escaping the 2nd ', your string is now closed off right after u.
PyCharm escapes the ' automatically for you by adding a slash before it. The string is now 'u\' this is not normal'. At this point, everything should be fine but PyCharm may be taking an additional step for safety.
PyCharm then escapes the slash it just added to your string, leaving the string as: 'u\\' this is not normal'.
It is likely a setting inside PyCharm. Does it cause an actual issue with your code?
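One way to check whether the extra backslashes exist in the string itself or only in how a tool displays it is to compare the string with its repr() in plain Python. This is a small sketch using the string from the question; it says nothing certain about what PyCharm does internally, it only shows that Python's own representation escapes backslashes while the underlying string is unchanged.

```python
c = "u' this is not normal"

print(c)        # the characters the string actually contains
print(repr(c))  # a quoted, escaped representation meant for developers
print("\\" in c)  # checks whether the string contains any backslash at all

d = "a\\b"      # source-code escape for a string with ONE backslash
print(len(d))   # counts real characters, not escape sequences
print(d)        # shows the single backslash
print(repr(d))  # shows the backslash doubled, because repr() escapes it
```

If the backslash membership test on c is False, the doubled backslashes seen in a debugger are a display convention and do not affect the program.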

python 3 regex not finding confirmed matches

So I'm trying to parse a bunch of citations from a text file using the re module in python 3.4 (on, if it matters, a mac running mavericks). Here's some minimal code. Note that there are two commented lines: they represent two alternative searches. (Obviously, the little one, r'Rawls', is the one that works)
def makeRefList(reffile):
    print(reffile)
    # namepattern = r'(^[A-Z1][A-Za-z1]*-?[A-Za-z1]*),.*( \(?\d\d\d\d[a-z]?[.)])'
    # namepattern = r'Rawls'
    refsTuplesList = re.findall(namepattern, reffile, re.MULTILINE)
    print(refsTuplesList)
The string in question is ugly, and so I stuck it in a gist: https://gist.github.com/paultopia/6c48c398a42d4834f2ae
As noted, the search string r'Rawls' produces expected output ['Rawls', 'Rawls']. However, the other search string just produces an empty list.
I've confirmed this regex (partially) works using the regex101 tester. Confirmation here: https://regex101.com/r/kP4nO0/1 -- this matches what I expect it to match. Since it works in the tester, it should work in the code, right?
(n.b. I copied the text from terminal output from the first print command, then manually replaced \n characters in the string with carriage returns for regex101.)
One possible issue is that python has appended the bytecode flag (is the little b called a "flag?") to the string. This is an artifact of my attempt to convert the text from utf-8 to ascii, and I haven't figured out how to make it go away.
Yet re clearly is able to parse strings in that form. I know this because I'm converting two text files from utf-8 to ascii, and the following code works perfectly fine on the other string, converted from the other text file, which also has a little b in front of it:
def makeCiteList(citefile):
    print(citefile)
    citepattern = r'[\s(][A-Z1][A-Za-z1]*-?[A-Za-z1]*[ ,]? \(?\d\d\d\d[a-z]?[\s.,)]'
    rawCitelist = re.findall(citepattern, citefile)
    cleanCitelist = cleanup(rawCitelist)
    finalCiteList = list(set(cleanCitelist))
    print(finalCiteList)
    return finalCiteList
The other chunk of text, which the code immediately above matches correctly: https://gist.github.com/paultopia/a12eba2752638389b2ee
The only hypothesis I can come up with is that the first, broken, regex expression is puking on the combination of newline characters and the string being treated as a byte object, even though a) I know the regex is correct for newlines (because, confirmation from the linked regex101), and b) I know it's matching the strings (because, confirmation from the successful match on the other string).
If that's true, though, I don't know what to do about it.
Thus, questions:
1) Is my hypothesis right that it's the combination of newlines and b that blows up my regex? If not, what is?
2) How do I fix that?
a) replace the newlines with something in the string?
b) rewrite the regex somehow?
c) somehow get rid of that b and make it into a normal string again? (how?)
thanks!
Addition
In case this is a problem I need to fix upstream, here's the code I'm using to get the text files and convert to ascii, replacing non-ascii characters:
This function gets called on UTF-8 .txt files saved by TextWrangler on Mavericks:
def makeCorpoi(citefile, reffile):
    citebox = open(citefile, 'r')
    refbox = open(reffile, 'r')
    citecorpus = citebox.read()
    refcorpus = refbox.read()
    citebox.close()
    refbox.close()
    corpoi = [str(citecorpus), str(refcorpus)]
    return corpoi
and then this function gets called on each element of the list the above function returns.
def conv2ASCII(bigstring):
    def convHandler(error):
        return ('1FOREIGN', error.start + 1)
    codecs.register_error('foreign', convHandler)
    bigstring = bigstring.encode('ascii', 'foreign')
    stringstring = str(bigstring)
    return stringstring
Aah. I've tracked it down and answered my own question. Apparently one needs to call decode on the encoded bytes object rather than passing it to str(). The following code produces an actual string, with newlines and everything, out the other end (though now I have to fix a bunch of other bugs before I can figure out if the final output is as expected):
def conv2ASCII(bigstring):
    def convHandler(error):
        return ('1FOREIGN', error.start + 1)
    codecs.register_error('foreign', convHandler)
    bigstring = bigstring.encode('ascii', 'foreign')
    newstring = bigstring.decode('ascii', 'foreign')
    return newstring
Apparently the str() function doesn't do the same job, for reasons that are mysterious to me. This is despite an answer here (How to make new line commands work in a .txt file opened from the internet?) which suggests that it does.
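The difference can be reproduced in a few lines. In Python 3, calling str() on a bytes object without an encoding returns its repr, so the result starts with b' and contains a literal backslash followed by n instead of a real newline; decode() returns real text. The sketch below uses the namepattern from the question on a made-up two-line sample, not the text from the gist.

```python
import re

# Made-up sample standing in for the encoded reference list.
raw = b"Rawls, J. (1971).\nNozick, R. (1974)."

as_repr = str(raw)             # the repr of the bytes: "b'Rawls, ...\\nNozick, ...'"
as_text = raw.decode("ascii")  # real text containing a real newline

namepattern = r'(^[A-Z1][A-Za-z1]*-?[A-Za-z1]*),.*( \(?\d\d\d\d[a-z]?[.)])'

# With the repr there is only one "line", and it starts with the letter b.
broken = re.findall(namepattern, as_repr, re.MULTILINE)
# With decoded text, ^ can anchor at the start of each line.
fixed = re.findall(namepattern, as_text, re.MULTILINE)

print(broken)
print(fixed)
```

This would also explain why the other pattern kept working: citepattern has no ^ anchor, so it does not depend on real line starts.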

Format string confusing

Hi, I'm new to Python and I've been trying to get formatted printing to work, but (and this may just be me being new) it seems to be very badly implemented. Examples written for 2.7.6 don't work in the new version, and I couldn't find any real examples for 3.3 on the internet. So I would like to ask for a good example of how format strings work. For instance, I've been trying to get this to work for my homework:
Day, date, year, hour, and minutes must be separate variables.
Using one formatted print statement, print the following:
Date:5/31/2013
Time: 3:45 pm
I can get it to work with this code:
def date():
    Month=5
    Day=31
    Year=2013
    Hours=3
    Minutes=45
    Scale='pm'
    a="Date: %i/%i/%i\nTime: %i:%i %s" %(Month,Day,Year,Hours,Minutes,Scale)
    print(a)
It works, but it's not one line as asked for. Please help; format is so confusing.
The \n in your format string is inserting the new line character. Remove the \n, and you will not have the newline any longer.
Character sequences that start with a backslash are known as escape sequences. They can be used to insert special characters into strings. For example:
\n is the newline character,
\t is the tab character
Remove \n because that is used to create a line break.
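Since the question also asks how format strings work in Python 3, here is a minimal sketch using str.format with a single print call. It keeps the \n because the expected output in the question spans two lines; drop it if one output line is what the assignment wants. The variable names are illustrative.

```python
month, day, year = 5, 31, 2013
hours, minutes, scale = 3, 45, "pm"

# Each {} is filled by the next argument; {:02d} pads the minutes to two digits.
message = "Date:{}/{}/{}\nTime: {}:{:02d} {}".format(
    month, day, year, hours, minutes, scale
)
print(message)
```

The older % style from the question still works in Python 3 as well, so the original code was not wrong, only formatted differently from what was asked.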

Shell text to python string

I'm writing a little Python utility to help move our shell --help documentation to searchable webpages, but I hit a weird block:
output = subprocess.Popen([sys.argv[1], '--help'],stdout=subprocess.PIPE).communicate()[0]
output = output.split('\n')
print output[4]
#NAME
for l in output[4]:
    print l
#N
#A
#
#A
#M
#
#M
#E
#
#E
#or when written, n?na?am?me?e
It does this for any heading/subheading in the documentation, which makes it near unusable.
Any tips on getting correct formatting? Where did I screw up?
Thanks
The documentation contains overstruck characters done in the ancient line-printer way: print each character, followed by a backspace (\b aka \x08), followed by the same character again. So "NAME" becomes "N\bNA\bAM\bME\bE". If you can convince the program not to output that way, that would be best; otherwise, you can clean it up with something like output = re.sub(r'\x08.', '', output)
A common way to mark a character as bold in a terminal is to print the character, followed by a backspace character, followed by the character itself again (just like you would do it on a mechanical typewriter). Terminal emulators like xterm detect such sequences and turn them into bold characters. Programs shouldn't be printing such sequences if stdout is not a terminal, but if your tool does, you will have to clean up the mess yourself.
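The cleanup can be sketched as a small function. This variant removes each "character plus backspace" pair rather than "backspace plus character", which gives the same result for bold text and also handles the underline convention (an underscore, a backspace, then the letter). strip_overstrike is a hypothetical name, and the sample strings are constructed by hand.

```python
import re


def strip_overstrike(text):
    """Remove line-printer style overstrikes such as 'N\\bN' or '_\\bN'."""
    # Drop every "character + backspace" pair, keeping the character that follows.
    return re.sub(r".\x08", "", text)


if __name__ == "__main__":
    bold = "N\bNA\bAM\bME\bE"
    underlined = "_\bf_\bo_\bo"
    print(strip_overstrike(bold))
    print(strip_overstrike(underlined))
```

On Python 3, the output of communicate() is bytes unless text mode is requested, so it would need to be decoded before being passed to a function like this.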
