When I receive JSON from an OCR server, the encoding seems to be broken. The image includes some characters that are not encoded(?) properly. Displayed in the console, they are represented by \uXXXX.
For example processing an image like this:
ends up with output:
"some text \u0141\u00f3\u017a"
It's confusing because if I add some code like this:
mystr = mystr.replace(r'\u0141', '\u0141')
mystr = mystr.replace(r'\u00d3', '\u00d3')
mystr = mystr.replace(r'\u0142', '\u0142')
mystr = mystr.replace(r'\u017c', '\u017c')
mystr = mystr.replace(r'\u017a', '\u017a')
The output is ok:
"some text Ółźż"
What is more, if I try to replace them with a regex:
mystr = re.sub(r'(\\u[0-9|abcdef|ABCDEF]{4})', r'\g<1>', mystr)
The output remains "broken":
"some text \u0141\u00f3\u017a"
This OCR converts an image to MathML / LaTeX prepared for use in Python. Full documentation can be found here. So for example:
Will produce the following RAW output:
"\\(\\Delta=b^{2}-4 a c\\)"
Take note that the quotes are included in the string; maybe this matters here.
Why are the characters not displayed properly in the first place, while after this silly mystr.replace(x, x) everything is fine?
Why does the first method work while re.sub fails? The code seems to be okay and it works fine in another script. What am I missing?
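Python strings are Unicode by default, but the string you received does not contain the characters themselves. It contains the literal six-character sequences \uXXXX (a backslash, a u and four hex digits), so the string you have is different from the string you expect to output.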
>>> txt = r"some text \u0141\u00f3\u017a"
>>> txt
'some text \\u0141\\u00f3\\u017a'
>>> print(txt)
some text \u0141\u00f3\u017a
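The regex doesn't work because it replaces each match with itself (\g<1>), so the literal backslash and the characters after it stay exactly as they were. In the replace() calls, the second argument is a normal (non-raw) string literal, so Python converts your \uXXXX into the actual symbol and inserts it, which obviously works. To see that the backslash is a real character in the string: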
>>> txt[-5:]
'u017a'
>>> txt[-6:]
'\\u017a'
>>> txt[-6:-5]
'\\'
What you should do to resolve it:
Make sure your response is received in the correct encoding and not as a raw string (e.g. use response.text instead of response.body).
Otherwise:
>>> txt.encode("raw-unicode-escape").decode('unicode-escape')
'some text Łóź'
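As a minimal sketch of two other routes, assuming the server's payload is a JSON string literal with the surrounding quotes included (as the question suggests): json.loads decodes the \uXXXX escapes itself, and for bare text a re.sub with a function as the replacement can convert each escape. The names payload, bare and decode_u_escapes are made up for this illustration, and the regex route does not handle surrogate pairs.

```python
import json
import re

# Hypothetical payload: a JSON string literal, quotes included,
# containing literal backslash-u escapes (note the raw string).
payload = r'"some text \u0141\u00f3\u017a"'

# Route 1: let the JSON parser decode the escapes.
decoded = json.loads(payload)


# Route 2: replace each literal \uXXXX with the character it names.
def decode_u_escapes(s):
    return re.sub(
        r'\\u([0-9a-fA-F]{4})',
        lambda m: chr(int(m.group(1), 16)),
        s,
    )


bare = r'some text \u0141\u00f3\u017a'

# Both routes give the same text.
print(decoded == decode_u_escapes(bare))

# A raw literal keeps six characters; a normal literal is one character.
print(len(r'\u0141'), len('\u0141'))
```

The last line is also why mystr.replace(r'\u0141', '\u0141') is not a no-op: the two arguments are different strings.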
I'm looking to automate the formatting of sources in a Microsoft Word document (.docx). The problem is that some of the text in the new format has to be in italics. Is there a way in Python to put italic-formatted text on the clipboard? If you manually copy (Ctrl+C) italic text, the italic part of the string is kept in the clipboard, because when you paste it (Ctrl+V) it's still in italics. This is why I'm wondering if it's possible in Python.
I've already looked at pyperclip, but they only provide information on how to copy plain strings. (https://pyperclip.readthedocs.io/en/latest/introduction.html).
It is possible with klembord.
Installation
pip install klembord
Code to set italic text to clipboard
import klembord
klembord.init()
# Set HTML formatted clipboard
klembord.set_with_rich_text('', 'Normal text, <i>Italic text</i>')
Short explanation
set_with_rich_text() takes two arguments. The first is what is set as the "plain text" clipboard content, which is used if the user pastes into, for example, Notepad. The second is the "HTML formatted" clipboard content, which is used if the user pastes into a rich text editor, such as Word.
The question is straightforward: is it possible to change the font family of text in a Python print() output? Like Times New Roman, Arial, or Comic Sans?
I only want to change some of the output. Not all of the text like in this question.
I'm using Python 3 and Jupyter Notebook on a Mac.
I know it's possible to make certain text bold like so:
bold_start = '\033[1m'
bold_end = '\033[0m'
print(bold_start, "Hello", bold_end, "World")
This outputs "Hello World" with "Hello" in bold, whereas what I want is "Hello World" with "Hello" in a different font family.
Python strings are just strings of Unicode characters, they don't say anything about font one way or another. The font is determined by whatever is rendering the characters, e.g. the terminal program you're using, or the browser you're using. The print function just spits out the resulting string.
As you pointed out, if you're in a terminal that understands those escape sequences, then you can use them to affect the output. If your output is a web page, then you can embed HTML code to specify whatever you like, but all the Python interpreter sees is a string of characters, not a string of characters in any particular font.
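To illustrate the HTML route, here is a small sketch, assuming the output is rendered by something that understands HTML (in a Jupyter notebook, the resulting string could be passed to IPython.display.HTML). The helper name with_font is hypothetical, and whether the requested font is actually used depends on what is installed on the machine doing the rendering.

```python
from html import escape


def with_font(text, family):
    """Wrap text in a span that asks the renderer for a font family."""
    return '<span style="font-family: {}">{}</span>'.format(
        escape(family, quote=True), escape(text)
    )


# Only "Hello" is wrapped; " World" keeps the renderer's default font.
markup = with_font("Hello", "Comic Sans MS") + " World"
print(markup)
```

To Python, markup is still just a string; the font only exists once a browser or notebook front end renders it.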
I'm coding a little app that asks a question using display dialog with default answer "", takes whatever the user input is (let's say die Universität), and sends it to a Python file. Python checks the spelling of the word, translates it, spits out a display dialog with the English translation.
The problem I'm having is that AppleScript is not giving Python a nice encoding. Here's my code in AppleScript:
set phrase to the text returned of (display dialog "Enter German Phrase" default answer "")
set command to "python /Users/Eli/Documents/Alias\\ Scripts/gm_en.py " & phrase
do shell script command
I get the input into Python. It's breaking everything, so I'm using chardet to figure out what the encoding is. It's giving me this: {'confidence': 0.7696762680042672, 'encoding': 'ISO-8859-2'}
Not only is this pretty inaccurate, it's an encoding I can find very little about online. Trying to convert with decode('iso-8859-2') gives very strange symbols.
Any ideas?
E.g:
print "hello"
What should I do to make the text "hello" bold?
class color:
    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'

print(color.BOLD + 'Hello, World!' + color.END)
Use this:
print '\033[1m' + 'Hello'
And to change back to normal:
print '\033[0m'
This page is a good reference for printing in colors and font-weights. Go to the section that says 'Set graphics mode:'
And note this won't work on all operating systems but you don't need any modules.
You can use termcolor for this:
sudo pip install termcolor
To print a colored bold:
from termcolor import colored
print(colored('Hello', 'green', attrs=['bold']))
For more information, see termcolor on PyPI.
simple-colors is another package with similar syntax:
from simple_colors import *
print(green('Hello', ['bold']))
The equivalent in colorama may be Style.BRIGHT.
In straight-up computer programming, there is no such thing as "printing bold text". Let's back up a bit and understand that your text is a string of bytes and bytes are just bundles of bits. To the computer, here's your "hello" text, in binary.
0110100001100101011011000110110001101111
Each one or zero is a bit. Every eight bits is a byte. Every byte is, in a string like that in Python 2.x, one letter/number/punctuation item (called a character). So for example:
01101000 01100101 01101100 01101100 01101111
h        e        l        l        o
The computer translates those bits into letters, but in a traditional string (called an ASCII string), there is nothing to indicate bold text. In a Unicode string, which works a little differently, the computer can support international language characters, like Chinese ones, but again, there's nothing to say that some text is bold and some text is not. There's also no explicit font, text size, etc.
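A quick way to check the bit pattern above yourself is the following sketch, which assumes Python 3 and plain ASCII text (the variable names are only for illustration). It shows that nothing but the letters is stored in those bytes.

```python
text = "hello"

# Encode to ASCII bytes, then show each byte as eight bits.
bits = " ".join(format(byte, "08b") for byte in text.encode("ascii"))
print(bits)

# Going back from bits to text recovers the letters and nothing else:
# there is no place in these bytes that could say "bold".
restored = bytes(int(chunk, 2) for chunk in bits.split()).decode("ascii")
print(restored)
```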
In the case of printing HTML, you're still outputting a string. But the computer program reading that string (a web browser) is programmed to interpret text like this is <b>bold</b> as "this is bold" when it converts your string of letters into pixels on the screen. If all text were WYSIWYG, the need for HTML itself would be mitigated -- you would just select text in your editor and bold it instead of typing out the HTML.
Other programs use different systems -- a lot of answers explained a completely different system for printing bold text on terminals. I'm glad you found out how to do what you want to do, but at some point, you'll want to understand how strings and memory work.
This depends on whether you're using Linux or Unix:
>>> start = "\033[1m"
>>> end = "\033[0;0m"
>>> print "The" + start + "text" + end + " is bold."
The text is bold.
The word text should be bold.
There is a very useful module for formatting text (bold, underline, colors, etc.) in Python. It uses the curses library, but it's very straightforward to use.
An example:
from terminal import render
print render('%(BG_YELLOW)s%(RED)s%(BOLD)sHey this is a test%(NORMAL)s')
print render('%(BG_GREEN)s%(RED)s%(UNDERLINE)sAnother test%(NORMAL)s')
I wrote a simple module named colors.py to make this a little more pythonic:
import colors
with colors.pretty_output(colors.BOLD, colors.FG_RED) as out:
    out.write("This is a bold red text")

with colors.pretty_output(colors.BG_GREEN) as out:
    out.write("This output have a green background but you " +
              colors.BOLD + colors.FG_RED + "can" + colors.END + " mix styles")
print '\033[1m Your Name \033[0m'
\033[1m is the escape code for bold in the terminal.
\033[0m is the escape code that ends the formatting and returns to the default text format.
If you do not use \033[0m, all subsequent text in the terminal will be bold.
Check out Colorama. It doesn't necessarily help with bolding... but you can do colorized output on both Windows and Linux, and control the brightness:
from colorama import *
init(autoreset=True)
print Fore.RED + 'some red text'
print Style.BRIGHT + Fore.RED + 'some bright red text'
Install the termcolor module
sudo pip install termcolor
and then try this for colored text
from termcolor import colored
print colored('Hello', 'green')
or this for bold text:
from termcolor import colored
print colored('Hello', attrs=['bold'])
In Python 3 you can alternatively use cprint as a drop-in replacement for the built-in print, with the optional second parameter for colors or the attrs parameter for bold (and other attributes such as underline) in addition to the normal named print arguments such as file or end.
import sys
from termcolor import cprint
cprint('Hello', 'green', attrs=['bold'], file=sys.stderr)
Full disclosure, this answer is heavily based on Olu Smith's answer
and was intended as an edit, which would have reduced the noise on this page
considerably but because of some reviewers' misguided concept of
what an edit is supposed to be, I am now forced to make this a separate answer.
Simple boldness - two-line code
In Python 3, you could use simple_colors:
(On the Simple Colors page, go to the heading 'Usage'.) Before you run the code below, make sure you pip install simple_colors.
from simple_colors import *
print(green('hello', 'bold'))
Some terminals can print colored text, and some colors look as if they were "bold". Try:
print ('\033[1;37mciao!')
The sequence '\033[1;37m' makes some terminals start printing in "bright white", which may look a bit like bold white. '\033[0;0m' will turn it off.
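The sequences above follow one pattern: the escape character, a left bracket, one or more numeric codes separated by semicolons, and a final m. As a small sketch (the helper name sgr is hypothetical, and the visible result still depends on the terminal):

```python
def sgr(*codes):
    """Build an escape sequence from numeric codes, e.g. sgr(1, 37)."""
    return "\033[" + ";".join(str(code) for code in codes) + "m"


BRIGHT_WHITE = sgr(1, 37)  # same string as '\033[1;37m'
RESET = sgr(0)             # back to the terminal's default formatting

message = BRIGHT_WHITE + "ciao!" + RESET
print(message)
```

Ending with the reset code keeps the style from leaking into whatever is printed next.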
Assuming that you really mean "print" on a real printing terminal:
>>> import re
>>> text = 'foo bar\r\noof\trab\r\n'
>>> ''.join(s if i & 1 else (s + '\b' * len(s)) * 2 + s
...         for i, s in enumerate(re.split(r'(\s+)', text)))
'foo\x08\x08\x08foo\x08\x08\x08foo bar\x08\x08\x08bar\x08\x08\x08bar\r\noof\x08\x08\x08oof\x08\x08\x08oof\trab\x08\x08\x08rab\x08\x08\x08rab\r\n'
Just send that to your stdout.
A simple approach relies on Unicode Mathematical Alphanumeric Symbols.
Code
def bold(
    text,
    trans=str.maketrans(
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
        "𝗔𝗕𝗖𝗗𝗘𝗙𝗚𝗛𝗜𝗝𝗞𝗟𝗠𝗡𝗢𝗣𝗤𝗥𝗦𝗧𝗨𝗩𝗪𝗫𝗬𝗭𝗮𝗯𝗰𝗱𝗲𝗳𝗴𝗵𝗶𝗷𝗸𝗹𝗺𝗻𝗼𝗽𝗾𝗿𝘀𝘁𝘂𝘃𝘄𝘅𝘆𝘇𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵",
    ),
):
    return text.translate(trans)
Example
assert bold("Hello world") == "𝗛𝗲𝗹𝗹𝗼 𝘄𝗼𝗿𝗹𝗱"
Discussion
Several pros and cons I can think of. Feel free to add yours in the comments.
Advantages:
Short and readable.
No external library.
Portable: can be used for instance to highlight sections in an ipywidgets Dropdown.
Extensible to italics, etc. with the appropriate translation tables.
Language agnostic: the same technique can be implemented in any programming language.
Drawbacks:
Requires Unicode support and a font where all the required glyphs are defined. This should be ok on any reasonably modern system, though.
No copy-paste: it produces faux text. Note that '𝘄𝗼𝗿𝗹𝗱'.isalpha() is still True, though.
No diacritics.
Implementation notes
In the code above, the translation table is given as an optional argument, meaning that it is evaluated only once, and it is conveniently encapsulated in the function that makes use of it. If you prefer a more standard style, define a global BOLD_TRANS constant, or use a closure or a lightweight class.
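As a sketch of the "global constant" style and of the extension to italics, the tables can also be built from code point offsets instead of typing the glyphs. This assumes the Mathematical Alphanumeric Symbols block layout, where sans-serif bold starts at U+1D5D4 (A), U+1D5EE (a) and U+1D7EC (0), and sans-serif italic starts at U+1D608 (A) and U+1D622 (a). The names make_trans, BOLD_TRANS, ITALIC_TRANS and italic are illustrative only.

```python
import string


def make_trans(upper_start, lower_start, digit_start=None):
    """Build a translation table from the first code point of each run."""
    plain = string.ascii_uppercase + string.ascii_lowercase
    styled = [chr(upper_start + i) for i in range(26)]
    styled += [chr(lower_start + i) for i in range(26)]
    if digit_start is not None:
        plain += string.digits
        styled += [chr(digit_start + i) for i in range(10)]
    return str.maketrans(plain, "".join(styled))


# Module-level constants, built once at import time.
BOLD_TRANS = make_trans(0x1D5D4, 0x1D5EE, 0x1D7EC)
# This block has no italic digits, so digits are left untouched.
ITALIC_TRANS = make_trans(0x1D608, 0x1D622)


def italic(text):
    return text.translate(ITALIC_TRANS)


styled = italic("Hello world")
```

The same drawbacks apply: the result is made of different characters, so searching and copy-paste will not behave like real italics.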
The bold text goes like this in Python:
print("This is how the {}bold{} text looks like in Python".format('\033[1m', '\033[0m'))
This is how the bold text looks like in Python.
Printing in bold made easy.
Install quo using pip (pip install quo), then:
from quo import echo
echo("Hello, World!", bold=True)
There is something called an escape sequence, which is used to represent characters that are not available on your keyboard. It can be used to format text (in this case, bold), to represent a special character by its ASCII code, and to represent Unicode characters.
In Python, escape sequences are denoted by a backslash \ followed by one or more characters. For example, the escape sequence \n represents a newline character, and the escape sequence \t represents a tab character.
Here, to format text in bold, put \033[1m before the text you want in bold and \033[0m after it.
Example:
print("This line represent example of \033[1mescape sequence\033[0m.")
In the escape sequence \033[1m, the 1 enables bold text, while the m is the command to set the text formatting. The \033[0m escape sequence resets the text formatting to the default settings.
The \033[0m escape sequence is used after the \033[1m escape sequence to turn off bold text and return to the default text formatting. This is necessary because the \033[1m escape sequence only enables bold text, it does not disable it.
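Because the reset is easy to forget, one option is to wrap both sequences in a small helper. This is only a sketch (the names BOLD, RESET and bold are made up here), and it still assumes a terminal that understands these escape sequences.

```python
BOLD = "\033[1m"   # turn bold on
RESET = "\033[0m"  # return to the default formatting


def bold(text):
    """Return text wrapped so that only this text is printed in bold."""
    return BOLD + text + RESET


line = "This line has " + bold("one bold phrase") + " and then normal text."
print(line)
```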
def say(text: str):
    print("\033[1;37m" + text)
say("Hello, world!")
my code works okay.