Adding syntax to IPython?

I would like to add some syntax changes to (my installation of) IPython. For example, I might want to use \+ to mean operator.add. I imagine that I can insert some code that would process the input and turn it into actual (I)Python, and then IPython can do its own processing. But I don't know where to put that code.
(Disclaimer: Don't do it for production code, or code that's intended for other people to see/use.)

Here is an example of how to transform "\+ a b" to "a + b".
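from IPython.core.inputtransformer import StatelessInputTransformer

# The decorator turns the function into a transformer factory.
@StatelessInputTransformer.wrap
def my_filter(line):
    words = line.split()
    # Rewrite only lines of the exact form "\+ a b".
    if line.startswith(r'\+ ') and len(words) == 3:
        return '{} + {}'.format(*words[1:])
    return line

ip = get_ipython()
# my_filter() instantiates the transformer; put it first so it runs first.
ip.input_transformer_manager.physical_line_transforms.insert(0, my_filter())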
Note that this is all string based. This hook executes in an unevaluated context, which means you can't make the transformation depend on the values of a or b. A magic would best suit your need in that case.
Moreover, you have to be careful when parsing the input string. In my example, the input \+ (a * b) c is broken because of the split. In that case, you will need a tokenization tool. IPython provides one with TokenInputTransformer. It works like StatelessInputTransformer, but it is called with a list of tokens instead of the whole line.
Simply run this code to add the filter. If you want it to be available as you start IPython, you can save it as a .py or .ipy file and put it in
~/.ipython/profile_*/startup
https://ipython.org/ipython-doc/dev/config/inputtransforms.html
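To illustrate the \+ (a * b) c problem mentioned above, here is a minimal sketch that splits operands with the standard library's tokenize module instead of str.split. It is not IPython's TokenInputTransformer, just a standalone function on strings; the names split_operands and transform are made up for this example, and it only treats whitespace outside brackets as an operand boundary.

```python
import io
import tokenize

OPEN = {"(", "[", "{"}
CLOSE = {")", "]", "}"}
SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
        tokenize.INDENT, tokenize.DEDENT, tokenize.COMMENT}


def split_operands(src):
    """Split a one-line source string on whitespace that is outside brackets."""
    src = src.strip()
    operands = []
    depth = 0     # current bracket nesting depth
    start = None  # column where the current operand began
    end = None    # column where the previous token ended
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type in SKIP:
            continue
        # A gap before this token, outside any bracket, closes the operand.
        if depth == 0 and start is not None and tok.start[1] > end:
            operands.append(src[start:end])
            start = None
        if start is None:
            start = tok.start[1]
        if tok.type == tokenize.OP:
            if tok.string in OPEN:
                depth += 1
            elif tok.string in CLOSE:
                depth -= 1
        end = tok.end[1]
    if start is not None:
        operands.append(src[start:end])
    return operands


def transform(line):
    """Rewrite '\\+ X Y' as 'X + Y'; leave every other line unchanged."""
    prefix = "\\+ "
    if not line.startswith(prefix):
        return line
    try:
        operands = split_operands(line[len(prefix):])
    except (tokenize.TokenError, SyntaxError):
        return line  # e.g. unbalanced brackets: do not touch the line
    if len(operands) != 2:
        return line
    return "{} + {}".format(*operands)
```

With this, transform(r"\+ (a * b) c") gives "(a * b) + c", while the split-based version leaves that line alone. Operands that contain spaces outside brackets (such as a - b) are still not recognized as a single operand.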

Related

How to turn strings into operations?

I recently tried to find polynomials through given points and stumbled upon the problem that I can't use strings like normal mathematical operations: "3 + 1" + "2 + 1" returns "3 + 12 + 1". I then tried to just iterate over the string, but ran into the next difficulty: I can't just unstringify operators like "+" or "-".
Does anyone know how to solve this problem?

eval() is very dangerous
It can execute any commands, including unsafe or malicious strings.
Use Pyparsing (more info here and another question and example here).
Another option is the ast module (good example here). If you want more functionality, ast can open up more commands, but pyparsing should work well.
A third, more lightweight option is this single file parser
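For the ast route, a minimal sketch could look like the following. It accepts only numeric literals and the four basic arithmetic operators plus unary sign, and rejects everything else; the name safe_eval and the choice of allowed operators are assumptions for this example, not something taken from the linked answers.

```python
import ast
import operator

# Whitelist of operators this sketch is willing to evaluate.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expr):
    """Evaluate a plain arithmetic expression without calling eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            value = node.value
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return value
        elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        # Names, calls, attributes, strings, etc. all end up here.
        raise ValueError("unsupported expression: " + ast.dump(node))

    return walk(ast.parse(expr, mode="eval"))


print(safe_eval("(3 + 1) * (2 + 1)"))
```

Anything that is not on the whitelist, for example a function call or a variable name, raises ValueError instead of being executed.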

If this calculation will exist purely within the code, with the user having no direct access, then you can use eval().
Example:
print(eval("3 + 4"))
Output:
7
The point of eval() is to execute Python code that is stored as a string. If the user is inputting values to be used within an eval() statement, you should absolutely do some error-checking to make sure the user is inputting numbers. If your program was intended to be released for commercial use, say, you run the risk of a user inputting Python code that can be executed using eval().

Are there any types of text where `isspace()` would not detect a whitespace, including text processed outside of python?

I noticed in some Python code, which deals with text not processed in Python, they don't use the standard isspace() built-in. They use some other types of filtering.
An example is here
https://github.com/huggingface/transformers/blob/master/src/transformers/data/processors/squad.py#L80
def _is_whitespace(c):
    if c == " " or c == "\t" or c == "\r" or c == "\n" or ord(c) == 0x202F:
        return True
    return False
I am wondering if there are any scenarios where isspace() would not identify a whitespace character in text, perhaps text processed outside of Python? If so, what method would?

In this specific case the user implementation detects fewer characters as whitespace than the built-in CPython implementation does (see "Where is the complete implementation of python isspace()"). The reason is hard to tell without knowing details of the project you're linking to. The commit message from five months ago, when the code was added, does not indicate any specific reason for including their own version, so it's probably due to not knowing that the isspace method exists.
You usually don't have a complete list of the standard library's methods in your head, and you will end up reimplementing some of them from time to time. In this case it can probably be replaced safely by the built-in version (without knowing why it was added, we can only guess). There might be a reason for wanting a narrower implementation, though.
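To see the difference concretely, here is a small sketch that compares the two over every code point. It copies _is_whitespace from the question; the variable name only_isspace is made up for this example.

```python
import sys


def _is_whitespace(c):
    # Copied from the question's snippet.
    if c == " " or c == "\t" or c == "\r" or c == "\n" or ord(c) == 0x202F:
        return True
    return False


# Characters that str.isspace() accepts but the hand-written check rejects.
only_isspace = [
    chr(i)
    for i in range(sys.maxunicode + 1)
    if chr(i).isspace() and not _is_whitespace(chr(i))
]

print([hex(ord(c)) for c in only_isspace])

# The other direction: ZERO WIDTH SPACE (U+200B) looks like whitespace to a
# human reader, but str.isspace() does not treat it as such.
print("\u200b".isspace())
```

So the custom function is the narrower of the two, and characters such as U+200B are a case where neither check reports whitespace.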

Pretty-print Lisp using Python

Is there a way to pretty-print Lisp-style code string (in other words, a bunch of balanced parentheses and text within) in Python without re-inventing a wheel?

Short answer
I think a reasonable approach, if you can, is to generate Python lists or custom objects instead of strings and use the pprint module, as suggested by @saulspatz.
Long answer
The whole question looks like an instance of an XY problem. Why? Because you are using Python (why not Lisp?) to manipulate strings (why not data structures?) representing generated Lisp-style code, where Lisp-style is defined as "a bunch of parentheses and text within".
To the question "how to pretty-print?", I would thus respond "I wouldn't start from here!".
The best way to not reinvent the wheel in your case, apart from using existing wheels, is to stick to a simple output format.
But first of all, why do you need to pretty-print? Who will look at the resulting code?
Depending on the exact Lisp dialect you are using and the intended usage of the code, you could format your code very differently. Think about newlines, indentation and maximum width of your text, for example. The Common Lisp pretty-printer is particularly evolved and I doubt you want to have the same level of configurability.
If you used Lisp, a simple call to pprint would solve your problem, but you are using Python, so stick with the most reasonable output for the moment because pretty-printing is a can of worms.
If your code is intended for human readers, please:
don't put closing parentheses on their own lines
don't vertically align opening and closing parentheses
don't add spaces after opening parentheses
This is ugly:
( * ( + 3 x )
    (f
        x
        y
    )
)
This is better:
(* (+ 3 x)
   (f x y))
Or simply:
(* (+ 3 x) (f x y))
See here for more details.
But before printing, you have to parse your input string and make sure it is well-formed. Maybe you are sure it is well-formed, due to how you generate your forms, but I'd argue that the printer should ignore that and not make too many assumptions. If you passed the pretty-printer an AST represented by Python objects instead of just strings, this would be easier, as suggested in comments. You could build a data-structure or custom classes and use the pprint (python) module. That, as said above, seems to be the way to go in your case, if you can change how you generate your Lisp-style code.
With strings, you are supposed to handle any possible input and reject invalid ones.
This means checking that parentheses and quotes are balanced (beware of escape characters), etc.
Actually, you don't really need to build an intermediate tree for printing (though it would probably help for other parts of your program), because Lisp-style code is made of forms that are easily nested and use a prefix notation: you can scan your input string from left to right and print as required when you see a parenthesis (open parenthesis: recurse; close parenthesis: return from recursion). When you first encounter an unescaped double-quote ", read until the next unescaped one, and so on.
This, coupled with a simple printing method, could be sufficient for your needs.
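As a rough sketch of that idea in Python: the code below scans the string left to right (handling parentheses and double-quoted strings with backslash escapes), rejects unbalanced input, and prints each form on one line if it fits, otherwise one argument per line. Unlike the tree-free scan described above, it builds nested lists for simplicity. The function names, the two-space indent and the width default are all choices made for this example, not any Lisp standard.

```python
def scan(src):
    """Split Lisp-style text into '(', ')', string and atom tokens."""
    tokens, i, n = [], 0, len(src)
    while i < n:
        c = src[i]
        if c.isspace():
            i += 1
        elif c in "()":
            tokens.append(c)
            i += 1
        elif c == '"':
            j = i + 1
            while j < n and src[j] != '"':
                j += 2 if src[j] == "\\" else 1  # skip escaped characters
            if j >= n:
                raise ValueError("unterminated string literal")
            tokens.append(src[i:j + 1])
            i = j + 1
        else:
            j = i
            while j < n and not src[j].isspace() and src[j] not in '()"':
                j += 1
            tokens.append(src[i:j])
            i = j
    return tokens


def parse(tokens):
    """Turn tokens into nested lists, rejecting unbalanced parentheses."""
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            form = stack.pop()
            stack[-1].append(form)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def flat(form):
    """Render a form on a single line."""
    if isinstance(form, str):
        return form
    return "(" + " ".join(flat(f) for f in form) + ")"


def layout(form, indent, width):
    """Render a form, breaking after the head when it does not fit."""
    one_line = flat(form)
    if isinstance(form, str) or not form or indent + len(one_line) <= width:
        return one_line
    head, *rest = form
    child_indent = indent + 2
    lines = ["(" + layout(head, indent + 1, width)]
    lines += [" " * child_indent + layout(arg, child_indent, width)
              for arg in rest]
    return "\n".join(lines) + ")"  # closing parens stay on the last line


def pretty(src, width=40):
    return "\n".join(layout(form, 0, width) for form in parse(scan(src)))


print(pretty("( * ( + 3 x )\n(f\nx\ny\n)\n)"))
```

This ignores comments, quote characters like ' and dialect-specific indentation rules, so treat it as a starting point only.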

I think the easiest method would be to use triple quotations. If you say:
print("""
(((This is some lisp code))) """)
It should work.
You can format your code any way you like within the triple quotes and it will come out the way you want it to.
Best of luck and happy coding!

I made this rudimentary pretty printer once for prettifying CLIPS, which is based on Lisp. Might help:
import os


def clips_pprint(clips_str: str) -> str:
    """Pretty-prints a CLIPS string.

    Indents a CLIPS string for easier visual confirmation during development
    and verification.

    Assumes the CLIPS string is valid CLIPS, i.e. braces are paired.
    """
    LB = "("
    RB = ")"
    TAB = " " * 4

    formatted_clips_str = ""
    tab_count = 0

    for c in clips_str:
        if c == LB:
            # Start every opening brace on a new line at the current depth.
            formatted_clips_str += os.linesep
            for _i in range(tab_count):
                formatted_clips_str += TAB
            tab_count += 1
        elif c == RB:
            tab_count -= 1
        formatted_clips_str += c

    return formatted_clips_str.strip()

Find word/pattern/string in txt/xml-file and add incremental number

I'm using TextWrangler, which falls short when it comes to adding numbers to replacements.
I have an XML file with several strings containing the words:
generatoritem id="Outline Text"
I need to add an incrementing number at the end of each substitution, like so:
generatoritem id="Outline Text1"
So I need to replace 'Outline Text' with 'Outline Text' and an incrementing number.
I found an answer to a similar question, typed this into TextWrangler and hit Check Syntax:
perl -ple 's/"Outline Text"/$n++/e' /path/of/file.xml
Plenty of errors. So I need someone to explain this nice one-liner to me. Or perhaps give me a new one, or a Python script?

-p makes perl read your file(s) one line at a time, and for each line it will execute the script and then emit the line as modified by your script. Note that there is an implicit use of a variable called $_ - it is used as the variable holding the line being read, it's also the default target for s/// and it's the default source for the print after each line.
You won't need the -l (the l in the middle of -ple) for the task you describe, so I won't bother going into it. Remove it.
The final flag -e (the e at the end of -ple) introduces your 'script' from the command line itself - ie allowing a quick and small use of perl without a source file.
Onto the guts of your script: it is wrong for the purpose you describe, and as an attempt it's also a bit unsafe.
If you want to change "Outline Text" into something else, your current script replaces ALL of it with $n, which is not what you describe. A simple way to do exactly what you ask for is
s/(id="Outline Text)(")/$1 . $n++ . $2/eg;
This matches the exact text you want, and notice that I'm also matching id= for extra certainty in case OTHER parts of your file contain "Outline Text" - don't laugh, it can happen!
By putting ( ) around parts of the pattern, those bits are saved in variables known as $1, $2 etc. I am then using these in the replacement part. The . operator glues the pieces together, giving your result.
The /e at the end of the s/// means that the replacement is treated as a Perl expression, not just a plain replacement string. I've also added g which makes it match more than once on a line - you may have more than one interesting id= on a line in the input file, so be ready for it.
One final point. You seem to suggest you want to start numbering from 1, so replace $n++ with ++$n. With that change, the variable $n starts out empty (effectively zero), is incremented to 1 (then 2, 3, and so on), and only THEN is its value used.
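Since the question also asks about a Python script, here is a minimal sketch of the same substitution using re.sub with a counter. The function name number_ids and its parameters are made up for this example; numbering starts at 1 by default.

```python
import itertools
import re


def number_ids(text, label="Outline Text", start=1):
    """Append an incrementing number to every id="<label>" in text."""
    counter = itertools.count(start)
    # Same idea as the Perl pattern: capture the part before and after
    # the spot where the number goes.
    pattern = re.compile(r'(id="' + re.escape(label) + r')(")')
    return pattern.sub(
        lambda m: m.group(1) + str(next(counter)) + m.group(2), text
    )


sample = (
    '<generatoritem id="Outline Text">\n'
    '<generatoritem id="Outline Text">\n'
)
print(number_ids(sample))
```

To apply it to a real file, read the file's contents into a string, pass it through number_ids, and write the result back out (ideally to a new file first, so the original is kept).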

Terminate multi-line string in Python console

In PyCharm, if I open a Python Console, I can't terminate a multi-line string.
Here's what happens in IDLE for comparison:
>>> words = '''one
two
three'''
>>> print(words)
one
two
three
>>>
But if I try the same thing in an interactive Python Console from within PyCharm, the console expects more input after I type the final 3 apostrophes. Anyone know why?
>>> words = '''one
... two
... three'''
...

I'm not sure what the context is, but in many cases it would just be easier to make a tuple/list from the things you want printed on different lines and join them with "\n":
>>> words = "\n".join(["one", "two", "three"])
You may also try three double-quote symbols instead. Maybe PyCharm is confused by what's being delimited. I've always wondered about this in Python, because string literals can be concatenated by pure juxtaposition. So effectively, '' 'one\ntwo\nthree' '' takes the three different strings, (1) '' (2) 'one\ntwo\nthree' and (3) '', and concatenates them. Since the spaces between them ought not to be needed (principle of least astonishment), it's more intuitive to me that the triple-single (or triple-double) quote would be interpreted that way. But since the triple version is its own special delimiter, it doesn't work like that.
In IPython the syntax you give works with no problem. IPython also provides a nice magic command %cpaste in which you can paste multi-line expressions or statements, and then delimit the final line with --, and upon hitting enter, it executes the pasted block. I prefer IPython (running in a buffer in Emacs) to PyCharm by a lot, but maybe you can see if there's a comparable magic function, or just look up the source for that magic function and write one yourself?
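As a quick sanity check of the points above, this small sketch shows that the join version, the triple-quoted version and the juxtaposed literals (written with spaces between them) all produce the same string. The variable names are only for illustration.

```python
# Joining the pieces with newlines.
joined = "\n".join(["one", "two", "three"])

# The triple-quoted literal from the question.
triple = '''one
two
three'''

# Three adjacent literals: '' + 'one\ntwo\nthree' + ''.
adjacent = '' 'one\ntwo\nthree' ''

print(joined == triple == adjacent)
```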
