Reading LaTeX expression and pretty printing it as ASCII with python

I'm running some python code in the terminal and I want to output a pretty ASCII representation of a LaTeX expression. I realize I can pop up a separate window using matplotlib, but I don't want the text in a separate window. Sympy does a nice job printing, but doesn't seem to import LaTeX (at least not that I've found).
Ideally, it would work something like this:
print('$x^2$')
would output in the console:
 2
x
Just like sympy would if I made a sympy symbol x and printed x**2.
More complicated expressions would have to be handled appropriately. For example:
\lim_{n \to \inf} \sum_{x=1}^{n} \frac{1}{x}
could be rendered as:
          n
        -----
        \       1
 lim     >    -----
n->inf  /       x
        -----
         x=1
Characters which are not supported in ASCII such as $\alpha$ could be expanded to an ASCII equivalent such as "alpha" or an error could be thrown.

LaTeX parsing was not included in SymPy when this answer was written, although it had been discussed since 2012 with a fairly high priority.
The third-party latex2sympy may be able to fulfill your needs.
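A minimal sketch of the printing half, assuming SymPy is installed: pretty() with use_unicode=False produces the ASCII layout described in the question. The helper name ascii_render is hypothetical. The last step assumes a SymPy release that ships sympy.parsing.latex.parse_latex, which needs the optional antlr4 Python runtime; if that is missing, the sketch just reports it.

```python
from sympy import Limit, Sum, Symbol, oo, pretty


def ascii_render(expr):
    """Return a multi-line, ASCII-only rendering of a SymPy expression."""
    return pretty(expr, use_unicode=False)


x, n = Symbol("x"), Symbol("n")

# Expressions built by hand, equivalent to the LaTeX in the question.
square = x**2
harmonic_limit = Limit(Sum(1 / x, (x, 1, n)), n, oo)

print(ascii_render(square))
print(ascii_render(harmonic_limit))

# Optional: parse LaTeX directly when the installed SymPy supports it.
# The broad except is deliberate: the failure mode depends on the install
# (missing antlr4 runtime, version mismatch, ...).
try:
    from sympy.parsing.latex import parse_latex

    print(ascii_render(parse_latex(r"\frac{1}{x}")))
except Exception as exc:
    print(f"LaTeX parsing unavailable: {exc}")
```

Symbols such as \alpha are spelled out ("alpha") by the ASCII printer, which matches the fallback suggested in the question.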

Related

How do I combine Sympy expressions with strings without having the formatting get messed up?

I'm using Sympy to do several symbolic computations in a Jupyter Notebook and I want to be able to get outputs of the form "x = [sympy expression] = [sympy expression]". If I just do something like
from sympy import symbols, diff
x = symbols('x')
y = diff(6*x**5)
y
It will display the value of y using nice textbook style notation. The problem is if I try to combine text with Sympy expressions. If I simply put a Sympy expression in a print statement, it doesn't preserve the formatting but will display the result using standard Python syntax (I've tried using pprint and all of Sympy's other print functions, but they didn't help). If I instead use the display function from IPython, that renders the Sympy expressions correctly, but it will also put them on separate lines from the text. So, for instance, if I do
display(f"y ={y}")
the "y =" will be on a separate line from the expression for y. If I have several Sympy expressions in one display statement, that results in things being needlessly broken up into several lines, which is a rather ugly output.
The only way I found around the unwanted line breaks is to wrap the Sympy expressions in IPython's Math function, which seems to convert the Sympy expressions to regular Python syntax (e.g. x² becomes x**2), then use regular expressions to convert the Python syntax to LaTeX syntax, which the Math function will render properly (e.g. display(f"y = {re.sub(r'\*\*', '^', Math(y))}")). It works, but it's a lot of hassle. Is there an easier way?

Actually I figured it out, so I'm posting my solution in case anyone else has a similar question: there is no need to convert the Sympy expression to LaTeX with regular expressions, because Sympy's latex function converts any Sympy expression to LaTeX syntax that Math can render. So, doing, e.g.
from IPython.display import display, Math
from sympy import latex, symbols, diff
x = symbols('x')
y = diff(6*x**5)
display(Math(f' y = {latex(y)}'))
will give the single line output:
y = 30x⁴
which is exactly what I wanted (albeit, for more complicated expressions than the one I gave as an example).

Print non-unicode subscript to Python console

I'm using Python 3.10 to implement a classical mechanics problem, and I want to print a message to the console asking the user to input the initial velocities. I am using x, y, and z as coordinates so ideally I want to denote the velocity components as vx, vy, and vz.
Originally I thought of using unicode subscripts, but apparently they don't exist for y and z (as this other SO answer explains).
Of course I could just display (v_x, v_y, v_z), but I wanted it to look a bit more polished. Is there an easy way to display non-unicode subscripts in Python? Or otherwise, some very bare-bones UI package where I can have more freedom in formatting the text (like using LaTeX, or Markdown)?

No.
A terminal emulator (or command line) can only display characters; it does not allow arbitrary transformations of those characters the way a web browser or a graphical interface does.
Although there are special character sequences that trigger features such as foreground and background color, underline, and blinking, these have to be implemented by the terminal emulator itself, and apart from a common subset there is no universal convention. The closest thing to one, the "ANSI escape code sequences", does not offer a usable subscript or superscript conversion. You can check the available codes on Wikipedia. They work in most terminal programs on Linux and macOS and in most third-party terminal programs on Windows (including the one called "Terminal" in the Microsoft Store), but not in the default "cmd" app that comes pre-installed with Windows.
(There is a Python package called "colorama" that tries to work around this limitation of cmd so that cross-platform terminal programs can display rich text. However, it filters out the full-color codes even in terminals that accept them, so it is not always a good idea.)
All that said, the tables in the linked Wikipedia article can be a bit confusing. In short: "CSI" is the sequence "\x1b[", where "\x1b" is the "ESC" character (decimal 27) and "[" is a literal open square bracket. The "SGR" sequence is "\x1b[<list-of-parameters-separated-by-;>m", where "m" is just the plain letter "m" that closes a list of numeric codes changing how the terminal displays the text that follows.
So, for red foreground text, you may want to do:
print("\x1b[31mThis text in red\x1b[39m normal color")
(Note that the numbers are written as plain decimal digits.)
You will note that code "74" is reserved for "subscript". However, I don't know of a terminal emulator that implements it.
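The CSI/SGR structure described above can be sketched with two small helpers. The names sgr and colorize are made up for this illustration; the numeric codes (31 for a red foreground, 1 for bold, 4 for underline, 0 for reset) are the standard SGR values, and whether they render depends on the terminal.

```python
CSI = "\x1b["  # ESC followed by a literal "["


def sgr(*codes):
    """Build an SGR sequence: CSI, the codes joined by ';', then 'm'."""
    return f"{CSI}{';'.join(str(code) for code in codes)}m"


def colorize(text, *codes):
    """Wrap text in the given SGR codes and reset all attributes afterwards."""
    return f"{sgr(*codes)}{text}{sgr(0)}"


print(colorize("This text in red", 31), "normal color")
print(colorize("bold and underlined", 1, 4))
print("Initial velocities (v_x, v_y, v_z):")  # plain fallback for subscripts
```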

Removing a control character using Python

I have a script that processes the output of a command (the aws help cli command).
I step through the output line-by-line and don't start the actual real parsing until I encounter the text "AVAILABLE COMMANDS" at which point I set a flag to true and start further processing on each line.
I've had this working fine - BUT on Ubuntu we encounter a problem which is this :
The CLI highlights the text in a way I have not seen before:
The output is very long, so I've grep'd the particular line in question - see below:
># aws ec2 help | egrep '^A'
>AVAILABLE COMMANDS
># aws ec2 help | egrep '^A' | cat -vet
>A^HAV^HVA^HAI^HIL^HLA^HAB^HBL^HLE^HE C^HCO^HOM^HMM^HMA^HAN^HND^HDS^HS$
What I haven't seen before is that each highlighted letter is in the format X^HX.
I'd like to apply a simple transformation of the type X^HX --> X (for all a-zA-Z).
What have I tried so far:
well my workaround is this - first I remove control characters like this:
String = re.sub(r'[\x00-\x1f\x7f-\x9f]','',String)
but I still have to search for 'AAVVAAIILLAABBLLEE' which is totally ugly. I considered using a further regex to turn doubles to singles but that will catch true doubles and get messy.
I started writing a function with an iteration across a constructed list of alpha characters to translate as described, and I used hexdump to try to figure out the exact \x code of the control characters in question but could not get it working - I could remove H but not the ^.
I really don't want to use any additional modules because I want to make this available to people without them having to install extras. In conclusion I have a workaround that is quite ugly, but I'm sure someone must know a quick and easy way to do this translation. It's odd that it only seems to show up on Ubuntu.

After looking at this a little further I was able to put in place a solution:
import re
from string import ascii_lowercase, ascii_uppercase

def RemoveUbuntuHighlighting(String):
    # Collapse each bold overstrike "X<backspace>X" back to a single "X".
    for Char in ascii_uppercase + ascii_lowercase:
        Match = Char + '\x08' + Char
        String = re.sub(Match, Char, String)
    return String
I'm still a little confounded to see characters highlighted in the format (X\x08X); the arrangement seems to repeat the same information unnecessarily.
One thing I would point out to anyone not used to reading hexdump output: in its default format, hexdump prints 16-bit words, so on a little-endian machine each pair of bytes appears swapped relative to its order in the input.

A much simpler and more reliable fix is to replace a backspace and duplicate of any character.
I have also augmented this to handle underscores using the same mechanism (character, backspace, underscore).
String = re.sub(r'(.)\x08(\1|_)', r'\1', String)
Demo: https://ideone.com/yzwd2V
This highlighting was standard back when output was to a line printer; backspacing and printing the same character again would add pigmentation to produce boldface. (Backspacing and printing an underscore would produce underlining.)
Probably the AWS CLI can be configured to disable this by setting the TERM variable to something like dumb. There is also a utility, col, which can remove this formatting (try col -b; maybe see also colcrt). Though perhaps really the best solution would be to import the AWS Python code and extract the help message natively.
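As a quick check of the regex approach, the sketch below rebuilds the overstruck heading from the question and strips it again using only the standard library. The helper name strip_overstrike is hypothetical; the pattern is the one given in the answer.

```python
import re

# A character, a backspace, then either the same character (bold)
# or an underscore (underline).
OVERSTRIKE = re.compile(r"(.)\x08(\1|_)")


def strip_overstrike(text):
    """Collapse line-printer style overstrikes to the plain character."""
    return OVERSTRIKE.sub(r"\1", text)


# Rebuild the sample line: every letter is doubled via backspace,
# while the space is left alone (as in the cat -vet output).
heading = "AVAILABLE COMMANDS"
overstruck = "".join(c if c == " " else f"{c}\x08{c}" for c in heading)

print(repr(overstruck))
print(strip_overstrike(overstruck))
```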

Force LaTeX font to match default matplotlib font

I have seen this issue pop up here and there but have yet to find a suitable answer.
When making a plot in matplotlib, the only way to insert symbols and math functions (like fractions, exponents, etc...) is to use TeX formatting. However, by default TeX formatting uses a different font AND italicizes the text. So for example, if I wanted an axis label to say the following:
photons/cm^2/s/Angstrom
I have to do the following:
ax1.set_ylabel(r'Photons/$cm^2$/s/$\AA$')
This produces a very ugly label that uses 2 different fonts and has bits and pieces italicized.
How do I permanently change the font of TeX (Not the other way around) so that it matches the default font used by matplotlib?
I have seen other solutions that tell the user to manually make all text the same in a plot by using \mathrm{} for example but this is ridiculously tedious. I have also seen solutions which change the default font of matplotlib to match TeX which seem utterly backwards to me.

It turns out the solution was rather simple and a colleague of mine had the solution.
If I were to use this line of code to create a title:
fig.suptitle(r'$H_2$ Emission from GJ832')
The result would be "H2 Emission from GJ832" which is an illustration of the problem I was having. However, it turns out anything inside of the $$ is converted to math type and thus the italics assigned.
If we change that line of code to the following:
fig.suptitle(r'H$_2$ Emission from GJ832')
Then the result is "H2 Emission from GJ832" without the italics. So this is an example of where we can constrain the math type to include only the math parts of the text, namely creating the subscript of 2.
However, if I were to change the code to the following:
fig.suptitle(r'H$_{two}$ Emission from GJ832')
the result would be "Htwo Emission from GJ832" which introduces the italics again. In this case, and for any case where you must have text (or are creating unit symbols) inside the dollar signs, you can easily remove the italics the following way:
fig.suptitle(r'H$_{\rm two}$ Emission from GJ832')
or in the case of creating a symbol:
ax2.set_xlabel(r'Wavelength ($\rm \AA$)')
The former results in "Htwo Emission from GJ832"
and the latter in "Wavelength (A)"
where A is the Angstrom symbol.
Both of these produce the desired result with nothing italicized by calling \rm before the text or symbol in the dollar signs. The result is nothing italicized INCLUDING the Angstrom symbol created by \AA.
While this doesn't change the default of the TeX formatting, it IS a simple solution to the problem and doesn't require any new packages. Thank you Roland Smith for the suggestions anyway. I hope this helps others who have been struggling with the same issue.

For typesetting units, use the siunitx package (with mode=text) rather than math mode.
Update: The above is only valid when you have defined text.usetex : True in your rc settings.
From the matplotlib docs:
Note that you do not need to have TeX installed, since matplotlib ships its own TeX expression parser, layout engine and fonts.
And:
Regular text and mathtext can be interleaved within the same string. Mathtext can use the Computer Modern fonts (from (La)TeX), STIX fonts (which are designed to blend well with Times) or a Unicode font that you provide. The mathtext font can be selected with the customization variable mathtext.fontset
Reading this, it sounds as if setting mathtext.fontset to match the regular font that matplotlib uses would solve the problem if you don't use TeX.
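A minimal sketch of the rc-settings route, assuming Matplotlib's built-in mathtext (text.usetex left at its default of False). The rcParam mathtext.default sets the style used for plain symbols inside $...$; the value "regular" makes them follow the surrounding text font instead of italics. The labels are taken from the examples above, and rendering to an in-memory buffer is only there to keep the sketch self-contained.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Assumption: text.usetex stays False, so Matplotlib's own mathtext is used.
plt.rcParams["mathtext.default"] = "regular"

fig, ax = plt.subplots()
ax.set_ylabel(r"Photons/cm$^2$/s/$\AA$")
fig.suptitle(r"H$_{two}$ Emission from GJ832")  # no \rm needed any more

# Render to memory; use fig.savefig("plot.png") to write a file instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
```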

Pretty-print Lisp using Python

Is there a way to pretty-print Lisp-style code string (in other words, a bunch of balanced parentheses and text within) in Python without re-inventing a wheel?

Short answer
I think a reasonable approach, if you can, is to generate Python lists or custom objects instead of strings and use the pprint module, as suggested by @saulspatz.
Long answer
The whole question looks like an instance of an XY-problem. Why? Because you are using Python (why not Lisp?) to manipulate strings (why not data-structures?) representing generated Lisp-style code, where Lisp-style is defined as "a bunch of parentheses and text within".
To the question "how to pretty-print?", I would thus respond "I wouldn't start from here!".
The best way to not reinvent the wheel in your case, apart from using existing wheels, is to stick to a simple output format.
But first of all, why do you need to pretty-print? Who will look at the resulting code?
Depending on the exact Lisp dialect you are using and the intended usage of the code, you could format your code very differently. Think about newlines, indentation and maximum width of your text, for example. The Common Lisp pretty-printer is particularly evolved and I doubt you want to have the same level of configurability.
If you used Lisp, a simple call to pprint would solve your problem, but you are using Python, so stick with the most reasonable output for the moment because pretty-printing is a can of worms.
If your code is intended for human readers, please:
don't put closing parentheses on their own lines
don't vertically align opening and closing parentheses
don't add spaces between opening parentheses
This is ugly:
( * ( + 3 x )
    (f
       x
       y
    )
)
This is better:
(* (+ 3 x)
   (f x y))
Or simply:
(* (+ 3 x) (f x y))
See here for more details.
But before printing, you have to parse your input string and make sure it is well-formed. Maybe you are sure it is well-formed, due to how you generate your forms, but I'd argue that the printer should ignore that and not make too many assumptions. If you passed the pretty-printer an AST represented by Python objects instead of just strings, this would be easier, as suggested in comments. You could build a data-structure or custom classes and use the pprint (python) module. That, as said above, seems to be the way to go in your case, if you can change how you generate your Lisp-style code.
With strings, you are supposed to handle any possible input and reject invalid ones.
This means checking that parentheses and quotes are balanced (beware of escape characters), etc.
Actually, you don't really need to build an intermediate tree for printing (though it would probably help for other parts of your program), because Lisp-style code is made of forms that are easily nested and use a prefix notation: you can scan your input string from left to right and print as required when you see a parenthesis (opening parenthesis: recurse; closing parenthesis: return from recursion). When you first encounter an unescaped double-quote ", read until the next one ", ...
This, coupled with a simple printing method, could be sufficient for your needs.
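A rough sketch of that idea using only the standard library. Unlike the pure scan described above, it builds nested Python lists first, which keeps the width check simple. All names (tokenize, parse, flat, pretty_form) and the width rule are assumptions made for illustration; the sketch handles parentheses, atoms, and double-quoted strings with backslash escapes, not reader macros or comments.

```python
def tokenize(source):
    """Split Lisp-style text into '(', ')', string literals and atoms."""
    tokens, i, n = [], 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch == '"':
            j = i + 1
            while j < n and source[j] != '"':
                j += 2 if source[j] == "\\" else 1  # skip escaped characters
            if j >= n:
                raise ValueError("unterminated string literal")
            tokens.append(source[i:j + 1])
            i = j + 1
        else:
            j = i
            while j < n and not source[j].isspace() and source[j] not in '()"':
                j += 1
            tokens.append(source[i:j])
            i = j
    return tokens


def parse(tokens):
    """Build nested lists of top-level forms; reject unbalanced parentheses."""
    stack = [[]]
    for token in tokens:
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unexpected ')'")
            form = stack.pop()
            stack[-1].append(form)
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]


def flat(form):
    """Render a form on a single line."""
    if isinstance(form, list):
        return "(" + " ".join(flat(item) for item in form) + ")"
    return form


def pretty_form(form, indent=0, width=40):
    """Keep a form on one line if it fits, else align arguments under the first."""
    one_line = flat(form)
    if not isinstance(form, list) or len(form) < 2 or indent + len(one_line) <= width:
        return one_line
    head = flat(form[0])
    child_indent = indent + len(head) + 2  # "(" + head + " "
    parts = [pretty_form(item, child_indent, width) for item in form[1:]]
    return "(" + head + " " + ("\n" + " " * child_indent).join(parts) + ")"


source = "( * ( + 3 x ) (f x y ) )"
for top_level in parse(tokenize(source)):
    print(pretty_form(top_level, width=12))
```

With a narrow width this prints the two-line layout recommended above; with a wider one, the whole form stays on a single line.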

I think the easiest method would be to use triple quotations. If you say:
print("""
(((This is some lisp code))) """)
It should work.
You can format your code any way you like within the triple quotes and it will come out the way you want it to.
Best of luck and happy coding!

I made this rudimentary pretty printer once for prettifying CLIPS, which is based on Lisp. Might help:
import os

def clips_pprint(clips_str: str) -> str:
    """Pretty-prints a CLIPS string.

    Indents a CLIPS string for easier visual confirmation during development
    and verification.
    Assumes the CLIPS string is valid CLIPS, i.e. braces are paired.
    """
    LB = "("
    RB = ")"
    TAB = " " * 4
    formatted_clips_str = ""
    tab_count = 0
    for c in clips_str:
        if c == LB:
            # Start every opening parenthesis on its own indented line.
            formatted_clips_str += os.linesep
            for _i in range(tab_count):
                formatted_clips_str += TAB
            tab_count += 1
        elif c == RB:
            tab_count -= 1
        formatted_clips_str += c
    return formatted_clips_str.strip()
