Handling lines with quotes using python's readline - python

I've written a simple shell-like program that uses readline in order to provide smart completion of arguments. I would like the mechanism to support arguments that have spaces and are quoted to signify as one argument (as with providing the shell with such).
I've seen that shlex.split() knows how to parse quoted arguments, but in case a user wants to complete mid-typing it fails (for example: 'complete "Hello ' would cause an exception to be thrown when passed to shlex, because of unbalanced quotes).
Is there code for doing this?
Thanks!

I don't know of any existing code for the task, but if I were to do this I'd catch the exception, try adding a fake trailing quote, and see how shlex.split does with the string thus modified.

GNU Readline allows for that scenario with the variable rl_completer_quote_characters. Unfortunatelly, Python does not export that option on the standard library's readline module (even on 3.7.1, the latest as of this writing).
I found a way of doing that with ctypes, though:
import ctypes
libreadline = ctypes.CDLL ("libreadline.so.6")
rl_completer_quote_characters = ctypes.c_char_p.in_dll (
libreadline,
"rl_completer_quote_characters"
)
rl_completer_quote_characters.value = '"'
Note this is clearly not portable (possibly even between Linux distros, as the libreadline version is hardcoded, but I didn't have plain libreadline.so on my computer), so you may have to adapt it for your environment.
Also, in my case, I set only double quotes as special for the completion feature, as that was my use case.
References
https://robots.thoughtbot.com/tab-completion-in-gnu-readline#adding-quoting-support
#eryksun's comment on how to set data to a global variable in a shared library using python

To make #caxcaxcoatl answer a little bit more portable, readline hardcoded version can be replaces with readline.__file__ and it will be:
import ctypes
import readline
libreadline = ctypes.CDLL (readline.__file__)
rl_completer_quote_characters = ctypes.c_char_p.in_dll (
libreadline,
"rl_completer_quote_characters"
)
rl_completer_quote_characters.value = '"'

Related

Taming shlex.split() behaviour

There are other questions on SO that get close to answering mine, but I have a very specific use case that I have trouble solving. Consider this:
from asyncio import create_subprocess_exec, run
async def main():
command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
proc = await create_subprocess_exec(*command)
await proc.wait()
run(main())
This causes trouble, because program.exe is called with these arguments:
['C:\\some folder', '-o\\server\\share\\some folder', 'a "quote"']
That is, the double backslash is no longer there, as shlex.split() removes it. Of course, I could instead (as other answers suggest) do this:
proc = await create_subprocess_exec(*command, posix=False)
But then program.exe is effectively called with these arguments:
['"C:\\some folder"', '-o"\\\\server\\share\\some folder"', '"a \\"', 'quote\\""']
That's also no good, because now the double quotes have become part of the content of the first parameter, where they don't belong, even though the second parameter is now fine. The third parameters has become a complete mess.
Replacing backslashes with forward slashes, or removing quotes with regular expressions all don't work for similar reasons.
Is there some way to get shlex.split() to leave double backslashes before server names alone? Or just at all? Why does it remove them in the first place?
Note that, by themselves these are perfectly valid commands (on Windows and Linux respectively anyway):
program.exe "C:\some folder" -o"\\server\share\some folder"
echo "hello \"world""
And even if I did detect the OS and used posix=True/False accordingly, I'd still be stuck with the double quotes included in the second argument, which they shouldn't be.
For now, I ended up with this (arguably a bit of a hack):
from os import name as os_name
from shlex import split
def arg_split(args, platform=os_name):
"""
Like calling shlex.split, but sets `posix=` according to platform
and unquotes previously quoted arguments on Windows
:param args: a command line string consisting of a command with arguments,
e.g. r'dir "C:\Program Files"'
:param platform: a value like os.name would return, e.g. 'nt'
:return: a list of arguments like shlex.split(args) would have returned
"""
return [a[1:-1].replace('""', '"') if a[0] == a[-1] == '"' else a
for a in (split(args, posix=False) if platform == 'nt' else split(args))]
Using this instead of shlex.split() gets me what I need, while not breaking UNC paths. However, I'm sure there's some edge cases where correct escaping of double quotes isn't correctly handled, but it has worked for all my test cases and seems to be working for all practical cases so far. Use at your own risk.
#balmy made the excellent observation that most people should probably just use:
command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
proc = await create_subprocess_shell(command)
Instead of
command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
proc = await create_subprocess_exec(*command)
However, note that this means:
it's not easy to check or replace individual arguments
you have the problem that always comes with using create_subprocess_exec if part of your command is based on external input, someone can inject code; in the words of the documentation (https://docs.python.org/3/library/asyncio-subprocess.html):
It is the application’s responsibility to ensure that all
whitespace and special characters are quoted appropriately to avoid
shell injection vulnerabilities. The shlex.quote() function can be
used to properly escape whitespace and special shell characters in
strings that are going to be used to construct shell commands.
And that's still a problem, as quote() also doesn't work correctly for Windows (by design).
I'll leave the question open for a bit, in case someone wishes to point out why the above is a really bad idea, or if someone has a better one.
As far as I can tell, the shlex module is the wrong tool if you are dealing with the Windows shell.
The first paragraph of the docs says (my italics):
The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell.
Admittedly, that talks about just one class, not the entire module. Later, the docs for the quote function say (boldface in the original, this time):
Warning The shlex module is only designed for Unix shells.
To be honest, I'm not sure what the non-Posix mode is supposed to be compatible with. It could be, but this is just me guessing, that the original versions of shlex parsed a syntax of its own which was not quite compatible with anything else, and then Posix mode got added to actually be compatible with Posix shells. This mailing list thread, including this mail from ESR seems to support this.
For the -o parameter, but the leading " at the start of it not in the middle, and double the backslashes
Then use posix=True
import shlex
command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
print( "Original command Posix=True", shlex.split(command, posix=True) )
command = r'program.exe "C:\some folder" "-o\\\\server\\share\\some folder" "a \"quote\""'
print( "Updated command Posix=True", shlex.split(command, posix=True) )
result:
Original command Posix=True ['program.exe', 'C:\\some folder', '-o\\server\\share\\some folder', 'a "quote"']
Updated command Posix=True ['program.exe', 'C:\\some folder', '-o\\\\server\\share\\some folder', 'a "quote"']
The backslashes are still double in the result, but that's standard Python representation of a \ in a string.

How to use pyftsubset of Fonttools inside of the python environment, not from the command line

I need to subset very many font files and I need to do that from within the python environment. Yet, Fonttools is very poorly documented and I cannot find a module and the proper function syntax to perform subsetting based on unicode from within python, not as a command line tool (pyftsubset). Some of my files contain various errors when read by the Fonttools and I cannot catch exceptions using !command inside jupyter.
pyftsubset is itself just a Python script, which calls fontTools.subset.main, which in turn parses sys.argv (command-line args) to perform subsetting. You can do the same thing pretty easily in your own script, for example:
import sys
from fontTools.subset import main as ss
sys.argv = [None, '/path/to/font/file.ttf', '--unicodes=U+0020-002F']
ss() # this is what actually does the subsetting and writes the output file
Obviously you'll want to use your own values for --unicodes plus the numerous other pyftsubset options, but in general this scheme should work. Possible caveat is if you have other parts of your program that use/rely on sys.argv; if that's the case you might want to capture the initial values in another variable before modifying sys.argv and calling the subsetter, then re-set it to the initial values after.
I think that should be a pythonic way to do it properly:
from fontTools import subset
subsetter = subset.Subsetter()
subsetter.populate(unicodes=["U+0020", "U+0021"])
subsetter.subset(font)
While font is your TTFont and you might need to check the documentation for how to exactly pass in the the list of unicodes. I didn’t test this exact code, but I tested it with subsetter.populate(glyphs=["a", "b"]) which does a similar job, but with glyphNames instead. The populate method can take these arguments as documented: populate(self, glyphs=[], gids=[], unicodes=[], text='')
I found a clue to that in this discussion.

Calling a subprocess with mixed data type arguments in Python

I am a bit confused as to how to get this done.
What I need to do is call an external command, from within a Python script, that takes as input several arguments, and a file name.
Let's call the executable that I am calling "prog", the input file "file", so the command line (in Bash terminal) looks like this:
$ prog --{arg1} {arg2} < {file}
In the above {arg1} is a string, and {arg2} is an integer.
If I use the following:
#!/usr/bin/python
import subprocess as sbp
sbp.call(["prog","--{arg1}","{arg2}","<","{file}"])
The result is an error output from "prog", where it claims that the input is missing {arg2}
The following produces an interesting error:
#!/usr/bin/python
import subprocess as sbp
sbp.call(["prog","--{arg1} {arg2} < {file}"])
all the spaces seem to have been removed from the second string, and equal sign appended at the very end:
command not found --{arg1}{arg2}<{file}=
None of this behavior seems to make any sense to me, and there isn't much that one can go by from the Python man pages found online. Please note that replacing sbp.call with sbp.Popen does not fix the problem.
The issue is that < {file} isn’t actually an argument to the program, but is syntax for the shell to set up redirection. You can tell Python to use the shell, or you can setup the redirection yourself.
from subprocess import *
# have shell interpret redirection
check_call('wc -l < /etc/hosts', shell=True)
# set up redirection in Python
with open('/etc/hosts', 'r') as f:
check_call(['wc', '-l'], stdin=f.fileno())
The advantage of the first method is that it’s faster and easier to type. There are a lot of disadvantages, though: it’s potentially slower since you’re launching a shell; it’s potentially non-portable because it depends on the operating system shell’s syntax; and it can easily break when there are spaces or other special characters in filenames.
So the second method is preferred.

Unable to draw Unicode characters with Python's PyCDC.DrawText()

I'm trying to draw Unicode characters using PyCDC.DrawText(), but it seems to draw two ASCII characters instead. For example, when trying to draw 'Я' (\u042F), I get: http://i.stack.imgur.com/hh9RJ.png
My string is defined as a Unicode string:
text = u'Я'
And the file starts with:
# -*- coding:utf-8 -*-
I also tried printing the string (to the console) and it comes out fine, so the problem is probably lying within the implementation of DrawText().
Thanks!
To output Unicode text on Windows you need to encode it in UTF-16 and call the wide character version of the DrawText() or TextOut() Win32 functions. In case you aren't familiar, the Windows API is natively UTF-16 and has parallel 8 bit ANSI versions for legacy support.
I know nothing of the Win32 wrapper you are using but rather suspect that PyCDC.DrawText() is calling the ANSI version of whichever one of these Win32 functions is doing the work. Your solution will likely involve finding a way to invoke DrawTextW() or TextOutW(). You could do it with ctypes, and these functions must surely be available through PyWin32 also.
However, I would probably opt for something higher level, like PyQt.
David: 'these functions must surely be available through PyWin32 ' actually, they are not.
After dozens of hours of searching, trying to figure out where within win32ui, win32gui, etc. there might be a hidden TextOutW, writing my own C extension that had other flaws so couldn't use it, writing an external prog. called from within python only to find out that HDC handles cannot be transfered to other processes, I have finally stumbled upon this single-line elegant pre-programmed solution, based on ctypes as suggested above:
you need the TextOutW or similar function as you were used to from windows gdi c. Although there is a function called win32gdi.DrawTextW which works exactly like the windows counterpart, yet sometimes you need to specifically use e.g. TextOut, ExtTextOut etc., which are not available in the unicode W-suffixed versions in pywin32's win32gdi
to achieve this, instead of using the limited win32gui functions, use windll.gdi32.TextOutW available from ctypes:
from ctypes import *
import win32gdi
# init HDC
# set hdc to something like
hdc = win32gdi.CreateDC(print_processor, printername, devmode)
# here comes the ctypes function that does your deal
text = u'Working! \u4e00\u4e01'
windll.gdi32.TextOutW(hdc, x, y, text, len(text))
# ... continue your prog ...
Have fun with this

Can you access registers from python functions in vim

It seems vims python sripting is designed to edit buffer and files rather than work nicely with vims registers. You can use some of the vim packages commands to get access to the registers but its not pretty.
My solution for creating a vim function using python that uses a register
is something like this.
function printUnnamedRegister()
python <<EOF
print vim.eval('##')
EOF
endfunction
Setting registers may also be possible using something like
function setUnnamedRegsiter()
python <<EOF
s = "Some \"crazy\" string\nwith interesting characters"
vim.command('let ##="%s"' % myescapefn(s) )
EOF
endfunction
However this feels a bit cumbersome and I'm not sure exactly what myescapefn should be.
So I've never been able to get the setting version to work properly.
So if there's a way to do something more like
function printUnnamedRegister()
python <<EOF
print vim.getRegister('#')
EOF
endfunction
function setUnnamedRegsiter()
python <<EOF
s = "Some \"crazy\" string\nwith interesting characters"
vim.setRegister('#',s)
EOF
endfunction
Or even a nice version of myescapefn I could use then that would be very handy.
UPDATE:
Based on the solution by ZyX I'm using this piece of python
def setRegister(reg, value):
vim.command( "let #%s='%s'" % (reg, value.replace("'","''") ) )
If you use single quotes everything you need is to replace every occurence of single quote with two single quotes.
Something like that:
python import vim, re
python def senclose(str): return "'"+re.sub(re.compile("'"), "''", str)+"'"
python vim.command("let #r="+senclose("string with single 'quotes'"))
Update: this method relies heavily on an (undocumented) feature of the difference between
let abc='string
with newline'
and
execute "let abc='string\nwith newline'"
: while the first fails the second succeeds (and it is not the single example of differences between newline handling in :execute and plain files). On the other hand, eval() is somewhat more expected to handle this since string("string\nwith newline") returns exactly the same thing senclose does, so I write this things now only using vim.eval:
python senclose = lambda str: "'"+str.replace("'", "''")+"'"
python vim.eval("setreg('#r', {0})".format(senclose("string with single 'quotes'")))

Categories

Resources