Writing string with ANSI escape sequences to file - python

A Python module I am using provides a hook that allows capturing user keyboard input before it is sent to a shell terminal. The problem I am facing is that it captures input character by character, which makes capturing the input commands difficult when the user does things such as backspacing or moving the cursor.
For example, given the string exit\x1b[4D\x1b[Jshow myself out, the following takes place:
>>> a = 'exit\x1b[4D\x1b[Jshow myself out'
>>> print(a)
show myself out
>>> with open('file.txt', 'w+') as f:
...     f.write(a)
...
>>> exit()
$ less file.txt
The less command shows the raw command (exit\x1b[4D\x1b[Jshow myself out), when in fact I would like it to be stored 'cleanly' as it is displayed when using the print function (show myself out).
Printing the result, or 'cat'ing the file shows exactly what I would want to be displayed, but I am guessing here that the terminal is transforming the output.
Is there a way to achieve a 'clean' write to file, either using some python module, or some bash utility? Surely there must be some module out there that can do this for me?

less is escaping the control characters instead of passing them to the terminal.
You can get around this with the -r command line option:
$ less -r file.txt
show myself out
From the manual:
-r or --raw-control-chars
Causes "raw" control characters to be displayed. The default is
to display control characters using the caret notation; for
example, a control-A (octal 001) is displayed as "^A". Warning:
when the -r option is used, less cannot keep track of the actual
appearance of the screen (since this depends on how the screen
responds to each type of control character). Thus, various display
problems may result, such as long lines being split in the
wrong place.
With -r, the raw control characters are sent to the terminal, which then interprets them just as it does when you cat the file.
As others have stated, you would need to interpret the characters yourself before writing them to a file.
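A minimal sketch of that interpretation step, using only the standard library. render_line() is a hypothetical helper that replays one line against a tiny emulated screen; it understands only the sequences in the question's example plus a few close relatives (listed in its docstring), so for arbitrary terminal output a real emulator such as pyte is the safer choice.

```python
import re

# ESC [ <optional decimal parameter> <final letter>: the CSI form used in the example.
CSI_RE = re.compile(r'\x1b\[(\d*)([A-Za-z])')


def render_line(raw: str) -> str:
    """Replay one line of terminal text and return what would be visible.

    Assumed subset (enough for the question's example): printable characters,
    backspace, carriage return, CSI n D (cursor left), CSI n C (cursor right),
    and CSI J / CSI K with no parameter or 0 (erase from cursor to end).
    Other control characters are dropped; non-CSI escapes are not recognised.
    """
    cells: list[str] = []  # characters currently "on screen"
    col = 0                # cursor column
    i = 0
    while i < len(raw):
        m = CSI_RE.match(raw, i)
        if m:
            param, final = m.group(1), m.group(2)
            n = int(param) if param else 1
            if final == 'D':
                col = max(0, col - n)
            elif final == 'C':
                col += n
            elif final in 'JK' and param in ('', '0'):
                del cells[col:]
            i = m.end()
            continue
        ch = raw[i]
        if ch == '\b':
            col = max(0, col - 1)
        elif ch == '\r':
            col = 0
        elif ch.isprintable():
            if col >= len(cells):
                # Pad with spaces if the cursor was moved right past the text.
                cells.extend(' ' * (col - len(cells)))
                cells.append(ch)
            else:
                cells[col] = ch  # overwrite, as a terminal does
            col += 1
        i += 1
    return ''.join(cells)


print(render_line('exit\x1b[4D\x1b[Jshow myself out'))  # show myself out
```

With a helper like this, the question's code would write render_line(a) instead of a to file.txt, and less would then show the clean text.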

Related

Why does subprocess.run not read new lines yet subprocess.call does?

Why is it that calling an executable via subprocess.call gives different results to subprocess.run?
The output of the call method is perfect - all new lines removed, formatting of the document is exactly right, '-' characters, bullets and tables are handled perfectly.
Running exactly the same function with the run method, however, and reading the output from stdout completely garbles it: it is full of '\n', 'Â\xad', '\x97', '\x8f' characters, with spacing all over the place.
Here's the code I'm using:
Subprocess.CALL
result=subprocess.call(['/path_to_pdftotext','-layout','/path_to_file.pdf','-'])
Subprocess.RUN
result=subprocess.run(['/path_to_pdftotext','-layout','/path_to_file.pdf','-'],stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True, encoding='utf-8')
I don't understand why the run method doesn't parse and display the file in the same way. I'd use call however I need to save the result of the pdftotext conversion to a variable (in the case of run: var = result.stdout).
I can go through and just identify all the unicode it's not picking up in run and strip it out but I figure there must just be some encoding / decoding settings that the run method changes.
EDIT
Having read a similarly worded question - I believe this is different in scope as I'm wanting to understand why the output is different.
I've made some tests.
Are you printing the content on the console? Try to send the text in a text file with subprocess in both cases and see if it is different:
result=subprocess.call(['/path_to_pdftotext','-layout','/path_to_file.pdf','test.txt'])
result=subprocess.run(['/path_to_pdftotext','-layout','/path_to_file.pdf','test2.txt'])
and compare test.txt and test2.txt. In my case they are identical.
I suspect that the difference you are experiencing is not strictly related to subprocess, but to how the console represents the output in both cases.
As said in the answer I linked in the comments, call():
It is equivalent to: run(...).returncode (except that the input and
check parameters are not supported)
That is, your result stores an integer (the return code) and the output is printed to the console, which seems to show it with the correct encoding, newlines, etc.
With run() the result is a CompletedProcess instance. The CompletedProcess.stdout argument is:
Captured stdout from the child process. A bytes sequence, or a string
if run() was called with an encoding or errors. None if stdout was not
captured.
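Because stdout is a bytes sequence or a string object, Python shows its representation when it is displayed as a value (for example by evaluating result.stdout at the interactive prompt), which is where all the '\n', 'Â\xad', '\x97', '\x8f' escapes and so on come from.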
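A small, self-contained sketch of the difference described above. Because pdftotext may not be installed, it swaps in a child Python process (an assumption: any command that writes text to stdout behaves the same way). call() lets the child write straight to the terminal and returns only the exit code, while run() stores the output on the CompletedProcess, where its repr shows escapes and print() shows the rendered text.

```python
import subprocess
import sys

# Stand-in for the pdftotext command: a child process that prints two lines.
cmd = [sys.executable, '-c', 'print("line one"); print("line two")']

# call(): the child's output goes directly to the terminal; rc is just the exit code.
rc = subprocess.call(cmd)

# run(): the output is captured and decoded to str because encoding is given.
result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        encoding='utf-8')
text = result.stdout

print(repr(text))  # 'line one\nline two\n' (what echoing the value shows)
print(text)        # real line breaks, like the output call() displayed
```

So to keep the pdftotext result in a variable and still see it laid out correctly, print result.stdout rather than inspecting its repr.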

How do I press enter with pexpect [duplicate]

I am working with Python's pexpect module to automate tasks, and I need help figuring out which key characters to use with sendcontrol. How can I send the Enter control key? And for future reference, how can we find the key characters?
Here is the code I am working on.
#!/usr/bin/env python
import pexpect
id = pexpect.spawn ('ftp 192.168.3.140')
id.expect_exact('Name')
id.sendline ('anonymous')
id.expect_exact ('Password')
# Not sure how to send the enter control key
id.sendcontrol ('???')
id.expect_exact ('ftp')
id.sendline ('dir')
id.expect_exact ('ftp')
lines = id.before.split ('\n')
for line in lines :
    print line
pexpect has no sendcontrol() method. In your example you appear to be trying to send an empty line. To do that, use:
id.sendline('')
If you need to send real control characters then you can send() a string that contains the appropriate character value. For instance, to send a control-C you would:
id.send('\003')
or:
id.send(chr(3))
Responses to comment #2:
Sorry, I typo'ed the module name -- now fixed. More importantly, I was looking at old documentation on noah.org instead of the latest documentation at SourceForge. The newer documentation does show a sendcontrol() method. It takes an argument that is either a letter (for instance, sendcontrol('c') sends a control-C) or one of a variety of punctuation characters representing the control characters that don't correspond to letters. But really sendcontrol() is just a convenient wrapper around the send() method, which is what sendcontrol() calls after it has calculated the actual value that you want to send. You can read the source for yourself at line 973 of this file.
I don't understand why id.sendline('') does not work, especially given that it apparently works for sending the user name to the spawned ftp program. If you want to try using sendcontrol() instead then that would be either:
id.sendcontrol('j')
to send a Linefeed character (which is control-j, or decimal 10) or:
id.sendcontrol('m')
to send a Carriage Return (which is control-m, or decimal 13).
If those don't work then please explain exactly what does happen, and how that differs from what you wanted or expected to happen.
If you're just looking to "press enter", you can send a newline:
id.send("\n")
As for other characters that you might want to use sendcontrol() with, I found this useful: https://condor.depaul.edu/sjost/lsp121/documents/ascii-npr.htm
For instance, I was interested in Ctrl+v. Looking it up in the table shows this line:
control character | python & java | decimal | description
^v | \x16 | 22 | synchronous idle
So if I want to send that character, I can do any of these:
id.send('\x16')
id.send(chr(22))
id.sendcontrol('v')
sendcontrol() just looks up the correct character to send and then sends it like any other text.
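For letters, that lookup follows the ASCII rule that Ctrl+<letter> produces the letter's position in the alphabet (Ctrl+A is 1, ..., Ctrl+Z is 26). The sketch below computes it with a hypothetical helper, control(); it covers letters only, not the punctuation keys sendcontrol() also accepts, and it is not pexpect's own implementation.

```python
def control(letter: str) -> str:
    """Return the ASCII control character produced by Ctrl+<letter>.

    Hypothetical helper illustrating the ASCII rule; not pexpect's code.
    """
    if len(letter) != 1 or not letter.isascii() or not letter.isalpha():
        raise ValueError(f'expected a single ASCII letter, got {letter!r}')
    # Case-insensitive: Ctrl+a and Ctrl+A both give chr(1).
    return chr(ord(letter.lower()) - ord('a') + 1)


print(repr(control('v')))  # '\x16', same as chr(22)
print(repr(control('j')))  # '\n'  (linefeed, decimal 10)
print(repr(control('m')))  # '\r'  (carriage return, decimal 13)
```

With a spawned child, child.send(control('v')) should send the same character as child.sendcontrol('v').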
For keys not listed in that table, you can run this script: https://github.com/pexpect/pexpect/blob/master/tests/getch.py (press Ctrl+Space to exit).
For instance, I ran that script and pressed F4, and it said:
27<STOP>
79<STOP>
83<STOP>
So then to press F4 via pexpect:
id.send(chr(27) + chr(79) + chr(83))
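The numbers getch.py prints are decimal character codes, so turning them into the string to send is mechanical. The helper keys_from_getch() below is hypothetical and assumes the one-code-per-line `<code><STOP>` format shown above.

```python
def keys_from_getch(output: str) -> str:
    """Convert getch.py-style output (one '<code><STOP>' per line) to a string."""
    codes = []
    for token in output.split():
        # Each token looks like '27<STOP>'; keep only the decimal code.
        codes.append(int(token.replace('<STOP>', '')))
    return ''.join(chr(code) for code in codes)


f4 = keys_from_getch('27<STOP>\n79<STOP>\n83<STOP>\n')
print(repr(f4))  # '\x1bOS', i.e. ESC O S
```

id.send(f4) is then equivalent to the id.send(chr(27) + chr(79) + chr(83)) call above.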

How to print terminal formatted output to a variable

Is there a method to print terminal formatted output to a variable?
print 'a\bb'
--> 'b'
I want that string 'b' to a variable - so how to do it?
I am working with a text string from telnet. Thus I want to work with the string that would be printed to screen.
So what I am looking for is something like this:
simplify_string('a\bb') ==> 'b'
Another example with a carriage return:
simplify_string('aaaaaaa\rbb') ==> 'bbaaaaa'
This turns out to be quite tricky because there are a lot of terminal formatting commands (including e.g. cursor up/down/left/right commands, terminal colour codes, vertical and horizontal tabs, etc.).
So, if you want to emulate a terminal properly, get a terminal emulator! pyte (pip install pyte) implements a VT102-compatible in-memory virtual terminal. So, you can feed it some text, and then get the formatted text from it:
import pyte
screen = pyte.Screen(80, 24)
stream = pyte.ByteStream(screen)
stream.feed(b'xyzzz\by\rfoo')
# print the first line of text ('foozy')
print(screen.display[0].rstrip())
To handle multiple lines, just join all of the lines in the text (e.g. '\n'.join(row.rstrip() for row in screen.display).rstrip()).
Note that this doesn't handle trailing spaces, but those would be indistinguishable on a real terminal anyway.
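If only the backspace, carriage return and newline cases from the question matter, a much smaller stdlib-only sketch is enough. simplify_string() below is a hypothetical implementation of the function the question asks for; unlike pyte it ignores cursor movement, colour codes, tabs and every other control sequence, so it is only safe for input known to contain nothing else.

```python
def simplify_string(text: str) -> str:
    """Emulate only '\\b', '\\r' and '\\n' the way a terminal would display them."""
    rendered = []
    for raw_line in text.split('\n'):
        cells: list[str] = []  # visible characters of this line
        col = 0                # cursor column
        for ch in raw_line:
            if ch == '\b':
                col = max(0, col - 1)   # move left, do not erase
            elif ch == '\r':
                col = 0                 # back to the start of the line
            else:
                if col < len(cells):
                    cells[col] = ch     # overwrite existing character
                else:
                    cells.append(ch)
                col += 1
        rendered.append(''.join(cells))
    return '\n'.join(rendered)


print(simplify_string('a\bb'))         # b
print(simplify_string('aaaaaaa\rbb'))  # bbaaaaa
```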

Prevent terminal character set switch on data printing

I am running a console application that takes data coming from various sensors around the house. Sometimes the transmission is interrupted and thus the packet does not make sense. When that happens, the contents of the packet is output to a terminal session.
However, what has happened is that while outputting the erroneous packet, it has contained characters that changed character set of the current terminal window, rendering any text (apart from numbers) as unreadable gibberish.
What would be the best way to filter the erroneous packets before their display while retaining most of the special characters? What exactly are sequences that can change behaviour of the terminal?
I would also like to add that apart from scrambled output, the application still works as it should.
Your terminal may be responding to ANSI escape codes.
To prevent the data from inadvertently affecting the terminal, you could print the repr of the data, rather than the data itself:
For example,
good = 'hello'
bad = '\033[41mRED'
print('{!r}'.format(good))
print('{!r}'.format(bad))
yields
'hello'
'\x1b[41mRED'
whereas
print(good)
print(bad)
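yields hello followed by RED on a red background: the terminal interprets \033[41m (set the background colour to red) instead of printing it.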
(Btw, typing reset will reset the terminal.)
You can escape all non-printable characters and everything outside the ASCII range, which should eliminate any stray escape sequences that would change the state of your terminal (Python 2):
print s.encode('string-escape')
When you get a packet, check it for validity before outputting it. One possibility is to check that each character in the packet is printable, that is, in the range 32-126 decimal, before output. Alternatively, add checksums to the packets and reject any with bad checksums.
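A Python 3 sketch of the filtering these answers describe: decode the packet, keep printable characters (including non-ASCII ones) plus newlines and tabs, and replace every other control character with its escape. This neutralises ESC (\x1b), which starts ANSI escape sequences, and SO (\x0e), which switches many terminals to an alternate character set and is a likely cause of the gibberish described in the question. The name sanitize() and the UTF-8 decoding with errors='replace' are assumptions about your packets.

```python
def sanitize(packet: bytes) -> str:
    """Return packet text that is safe to print to a terminal (hypothetical helper)."""
    # Assumption: packets are (mostly) UTF-8; undecodable bytes become U+FFFD.
    text = packet.decode('utf-8', errors='replace')
    safe = []
    for ch in text:
        if ch.isprintable() or ch in '\n\t':
            safe.append(ch)
        else:
            # Use the escape from repr() (e.g. \x1b), without the surrounding quotes.
            safe.append(repr(ch)[1:-1])
    return ''.join(safe)


print(sanitize(b'temp=21\x0e\x1b(0ok'))  # temp=21\x0e\x1b(0ok
```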

^H ^? in python

Some terminals will send ^? as backspace, some other terminals will send ^H.
Most of the terminals can be configured to change their behavior.
I do not want to deal with all the possible combinations but I would like to accept both ^? and ^H as a backspace from python.
If I do this:
os.system("stty erase '^?'")
I will accept the first option, and with
os.system("stty erase '^H'")
I will accept the second one, but the first will no longer be available.
I would like to use
raw_input("userinput>>")
to grab the input.
The only way I was able to figure out is implementing my own shell which works not on "raw based input" but on "char based input".
Any better (and quicker) idea?
The built-in function raw_input() (or input() in Python 3) will automatically use the readline library once the readline module has been imported. This gives you a nice, full-featured line editor, and it is probably your best bet on platforms where it is available, as long as you don't mind Readline having a contagious licence (GPL).
I'm not sure I understand your question exactly. In my opinion, you need a method to read line-based text (including some special characters) from the console into your program.
Whatever method you use, if a character has a special meaning in a particular console, you face a console-specific problem (not only a system-specific one): text typed into the console is buffered first, echoed to the screen, then processed and finally sent to your program. Another way around this problem is to use a raw, line-oriented console environment.
You can add a special method (for example, a decorator) around raw_input() or another input function to process the special characters.
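Once that is solved, a snippet like this can deal with the input:
def pre():
    textline = raw_input()
    # Treat DEL (^?, '\x7f') the same as backspace (^H, '\x08').
    # str.replace returns a new string, so the result must be assigned.
    textline = textline.replace('\x7f', '\x08')
    return textline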
For more speed, you could call an OS-specific system function, but in practice Python's I/O is fast enough for common jobs.
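Since stty can make only one of the two characters the terminal's erase key, the other one typically reaches raw_input() as an ordinary character in the returned line. A hedged post-processing sketch, with the hypothetical helper apply_erase(), then treats both ^? (\x7f) and ^H (\x08) as "delete the previous character". It assumes the line arrives with those characters intact, for example without readline handling them first.

```python
def apply_erase(line: str, erase_chars: str = '\x7f\x08') -> str:
    """Apply ^? (DEL) and ^H (BS) as 'delete previous character' (hypothetical helper)."""
    kept: list[str] = []
    for ch in line:
        if ch in erase_chars:
            if kept:          # erasing at the start of the line does nothing
                kept.pop()
        else:
            kept.append(ch)
    return ''.join(kept)


print(apply_erase('helo\x7flo'))    # hello
print(apply_erase('ab\x08\x08cd'))  # cd
```

In use, this would wrap the input call, e.g. textline = apply_erase(raw_input("userinput>>")) in Python 2, or input() in Python 3.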
To fix ^? on erase do stty erase ^H
