Why is it that calling an executable via subprocess.call gives different results to subprocess.run?
The output of the call method is perfect: all newlines removed, the formatting of the document is exactly right, and '-' characters, bullets and tables are handled perfectly.
Running exactly the same command with the run method, however, and reading the output from stdout completely garbles the output: it is full of '\n', 'Â\xad', '\x97', '\x8f' characters, with spacing all over the place.
Here's the code I'm using:
Subprocess.CALL
result = subprocess.call(['/path_to_pdftotext', '-layout', '/path_to_file.pdf', '-'])
Subprocess.RUN
result = subprocess.run(['/path_to_pdftotext', '-layout', '/path_to_file.pdf', '-'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True, encoding='utf-8')
I don't understand why the run method doesn't parse and display the file in the same way. I'd use call, but I need to save the result of the pdftotext conversion to a variable (in the case of run: var = result.stdout).
I could go through and identify all the characters that run isn't picking up and strip them out, but I figure there must be some encoding/decoding setting that the run method changes.
EDIT
Having read a similarly worded question, I believe this one is different in scope, as I want to understand why the output is different.
I've run some tests.
Are you printing the content to the console? Try sending the output to a text file with subprocess in both cases and see if it differs:
result = subprocess.call(['/path_to_pdftotext', '-layout', '/path_to_file.pdf', 'test.txt'])
result = subprocess.run(['/path_to_pdftotext', '-layout', '/path_to_file.pdf', 'test2.txt'])
and compare test.txt and test2.txt. In my case they are identical.
I suspect that the difference you are experiencing is not strictly related to subprocess, but to how the console represents the output in each case.
As said in the answer I linked in the comments, call():
It is equivalent to: run(...).returncode (except that the input and
check parameters are not supported)
That is, your result stores an integer (the return code), and the output is printed to the console, which seems to show it with the correct encoding, newlines, etc.
With run(), the result is a CompletedProcess instance. The CompletedProcess.stdout attribute is:
Captured stdout from the child process. A bytes sequence, or a string
if run() was called with an encoding or errors. None if stdout was not
captured.
So, being a bytes sequence or a string, Python represents it differently when displayed on the console, showing all the '\n', 'Â\xad', '\x97', '\x8f' characters and so on.
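To confirm that the captured text itself is fine, you can print it rather than evaluating it at the prompt; a minimal sketch using the question's placeholder paths:

import subprocess

result = subprocess.run(
    ['/path_to_pdftotext', '-layout', '/path_to_file.pdf', '-'],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE, encoding='utf-8')

# print() renders the newlines; echoing result.stdout at the REPL
# shows its repr instead, full of '\n' escape sequences.
print(result.stdout)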
Related
I have a python program like this:
import sys

raw_data = sys.stdin.buffer.read(nbytes)  # Read nbytes from the standard input stream
# Do something with raw_data to get output_data HERE...
output_mask = output_data.tostring()  # Convert to bytes
sys.stdout.buffer.write(b'results' + output_mask)  # Write to the standard output stream
Then I build my_py.exe from this Python program using PyInstaller. I have tested my_py.exe using subprocess.run() in Python, and it works fine.
However, I need to call this my_py.exe in IDL. IDL has this tutorial on how to use its SPAWN command with pipes. So my IDL program which calls the my_py.exe is like this:
SPAWN['my_py.exe', arg], COUNT=COUNT , UNIT=UNIT
WRITEU, UNIT, nbytes, data_to_stream
READU, UNIT, output_from_exe
Unfortunately, the IDL program above hangs at READU. Does anyone know what the issue is here? Is the problem in my Python read and write?
You are missing a comma in the SPAWN command, although I imagine if that typo was in your code, IDL would issue a syntax error before you ever got to READU. But, if for some reason IDL is quietly continuing execution with an erroneous SPAWN call, maybe READU is hanging because it's trying to read some nonsense logical unit. Anyway, it should read:
SPAWN,['my_py.exe', arg], UNIT=UNIT
Here's the full syntax for reference:
SPAWN [, Command [, Result] [, ErrResult] ]
Keywords (all platforms): [, COUNT=variable] [, EXIT_STATUS=variable] [, /NOSHELL] [, /NULL_STDIN] [, PID=variable] [, /STDERR] [, UNIT=variable {Command required, Result and ErrResult not allowed}]
UNIX-Only Keywords: [, /NOTTYRESET] [, /SH]
Windows-Only Keywords: [, /HIDE] [, /LOG_OUTPUT] [, /NOWAIT]
I've eliminated the COUNT keyword, because, according to the documentation, COUNT contains the number of lines in Result, if Result is present, which it is not. In fact, Result is not even allowed here, since you're using the UNIT keyword. I doubt that passing the COUNT keyword is causing READU to hang, but it's unnecessary.
Also, check this note from the documentation to make sure that the array you are passing as a command is correct:
If Command is present, it must be specified as follows:
On UNIX, Command is expected to be scalar unless used in conjunction with the NOSHELL keyword, in which case Command is expected to be a string array where each element is passed to the child process as a separate argument.
On Windows, Command can be a scalar string or string array. If it is a string array, SPAWN glues together each element of the string array, with each element separated by whitespace.
I don't know the details of your code, but here's some further wild speculation:
You might try setting the NOSHELL keyword, just as a shot in the dark.
I have occasionally had problems with IDL not seeming to finish writing to disk when I haven't closed the file unit, so make sure that you are using FREE_LUN, UNIT after READU. I know you said it hangs at READU, but my thinking here is that maybe it's only appearing to hang, and just can't continue until the file unit is closed.
Finally, here's something that could actually be the problem, and is worth looking into (from the tutorial you linked to):
A pipe is simply a buffer maintained by the operating system with an interface that makes it appear as a file to the programs using it. It has a fixed length and can therefore become completely filled. When this happens, the operating system puts the process that is filling the pipe to sleep until the process at the other end consumes the buffered data. The use of a bidirectional pipe can lead to deadlock situations in which both processes are waiting for the other. This can happen if the parent and child processes do not synchronize their reading and writing activities.
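If buffering on the Python side contributes to the hang, one thing worth trying, offered as a hedged sketch rather than a confirmed fix, is to flush stdout explicitly after writing, so the bytes actually reach the pipe instead of sitting in the child's buffer while IDL blocks in READU:

import sys

# Write the result (output_mask as in the question's code) and flush
# immediately so the parent's READU is not left waiting on data stuck
# in Python's internal buffer.
sys.stdout.buffer.write(b'results' + output_mask)
sys.stdout.buffer.flush()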
I am using a Raspberry Pi to record audio. I tried pyaudio but it didn't work, so I tried the subprocess module instead. As the recording needs to be executed multiple times, I need to make sure that the recording filename is different for every recording.
For example, I would like to:
from datetime import datetime

filename = datetime.now().strftime("%Y-%m-%d_%H_%M_%S") + ".wav"
My question is: can I pass this filename as an argument to subprocess? I checked the documentation; it says only strings and lists are supported as arguments in subprocess.
This filename is a string. So nothing prevents it from being used as one of the strings in subprocess.
Take care to use the list of strings variant with shell=False (the default) and the string variant with shell=True. Then everything should work as needed.
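For example, a minimal sketch assuming the ALSA arecord utility does the recording (the recorder command and its flags are assumptions, not part of the question):

import subprocess
from datetime import datetime

filename = datetime.now().strftime("%Y-%m-%d_%H_%M_%S") + ".wav"

# shell=False (the default), so the command is a list of strings and
# the timestamped filename is passed through as a single argument.
subprocess.call(['arecord', '-d', '5', '-f', 'cd', filename])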
I'm using subprocess.call, where you just give it an array of arguments and it builds the command line and executes it.
First of all, is there any escaping involved? (For example, if I pass a path to a file that has spaces in it, /path/my file.txt, as an argument, will it be escaped to "/path/my file.txt"?)
And is there any way to get the command line that's generated (after escaping and all) before it is executed? I need to check that the generated command line is not longer than a certain number of characters (to make sure it will not give an error when it gets executed).
If you're not using shell=True, there isn't really a "command line" involved. subprocess.Popen is just passing your argument list to the underlying execve() system call.
Similarly, there's no escaping, because there's no shell involved and hence nothing to interpret special characters and nothing that is going to attempt to tokenize your string.
There isn't a character limit to worry about because the arguments are never concatenated into a single command line. There may be limits on the maximum number of arguments and/or the length of individual arguments.
If you are using shell=True, you have to construct the command line yourself before passing it to subprocess.
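As an illustration, here is a short sketch of both variants (ls is just a stand-in command):

import shlex
import subprocess

path = '/path/my file.txt'

# shell=False (the default): no escaping needed; each list element
# reaches the child process as exactly one argument, spaces included.
subprocess.call(['ls', '-l', path])

# shell=True: you build the command line yourself, so quote arguments:
subprocess.call('ls -l ' + shlex.quote(path), shell=True)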
A python module I am using provides a hook that allows capturing user keyboard input before it is sent to a shell terminal. The problem I am facing is that it captures input character-by-character, which makes capturing the input commands difficult when the user performs such things as backspacing or moving the cursor.
For example, given the string exit\x1b[4D\x1b[Jshow myself out, the following takes place:
>>> a = 'exit\x1b[4D\x1b[Jshow myself out'
>>> print(a)
show myself out
>>> with open('file.txt', 'w+') as f:
...     f.write(a)
>>> exit()
less file.txt
The less command shows the raw command (exit\x1b[4D\x1b[Jshow myself out), when in fact I would like it to be stored 'cleanly' as it is displayed when using the print function (show myself out).
Printing the result, or 'cat'ing the file shows exactly what I would want to be displayed, but I am guessing here that the terminal is transforming the output.
Is there a way to achieve a 'clean' write to file, either using some python module, or some bash utility? Surely there must be some module out there that can do this for me?
less is interpreting the control characters.
You can get around this with the -r command line option:
$ less -r file.txt
show myself out
From the manual:
-r or --raw-control-chars
Causes "raw" control characters to be displayed. The default is
to display control characters using the caret notation; for
example, a control-A (octal 001) is displayed as "^A". Warning:
when the -r option is used, less cannot keep track of the actual
appearance of the screen (since this depends on how the screen
responds to each type of control character). Thus, various display
problems may result, such as long lines being split in the wrong
place.
The raw control characters are sent to the terminal, which then interprets them as cat would.
As others have stated, you would need to interpret the characters yourself before writing them to a file.
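If stripping the escape sequences (rather than replaying them) is acceptable, a regex sketch like the following might do. Note that it only removes the sequences and does not interpret cursor movement, so the example string becomes 'exitshow myself out' rather than 'show myself out'; fully interpreting the input would require a terminal-emulator library such as pyte:

import re

# Matches common ANSI CSI escape sequences such as '\x1b[4D' and '\x1b[J'.
ansi_csi = re.compile(r'\x1b\[[0-9;]*[A-Za-z]')

a = 'exit\x1b[4D\x1b[Jshow myself out'
with open('file.txt', 'w') as f:
    f.write(ansi_csi.sub('', a))  # writes 'exitshow myself out'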
I am using graphviz's dot to generate some svg graphs for a web application. I call dot using Popen:
p = subprocess.Popen(u'/usr/bin/dot -Kfdp -Tsvg', shell=True,\
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE)
str = u'long-unicode-string-i-want-to-convert'
(stdout,stderr) = p.communicate(str)
What happens is that the dot program throws errors like:
Error: not well-formed (invalid token) in line 1
... <tr><td cellpadding="4bgcolor="#EEE8AA"> ...
in label of node n260
That obvious error is most certainly NOT in the input string. In particular, if I save it to str.txt with utf-8 encoding and do
/usr/bin/dot -Kfdp -Tsvg < str.txt > myimg.svg
I get the desired output. The only 'special' thing about str is that it contains characters like the Danish øæå.
Right now I have no clue what I should do. The problem may very well be in dot, but it certainly seems to be triggered by Popen behaving differently from using < in the shell, and I have no idea where to begin. Any help or ideas for calling dot differently (besides writing all the data to a file and running dot on that!) would be very appreciated!
Sounds like you should be doing:
stdout, stderr = p.communicate(str.encode('utf-8'))
(except, of course, that you shouldn't shadow the builtin str.) The unicode type in Python holds unicode data, not UTF-8. If you want UTF-8, you need to explicitly encode it.
On top of that, there's no reason to use shell=True in that snippet, nor is the unicode literal passed to subprocess.Popen a particularly good idea (it just gets encoded to ASCII anyway). And the backslash at the end is unnecessary: Python knows the line is continued, because you have an open parenthesis that hasn't been closed yet. So, use:
p = subprocess.Popen(['/usr/bin/dot', '-Kfdp', '-Tsvg'],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE)
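Putting both fixes together, a minimal sketch (the graph source is the question's placeholder string):

import subprocess

p = subprocess.Popen(['/usr/bin/dot', '-Kfdp', '-Tsvg'],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE)

graph = u'long-unicode-string-i-want-to-convert'
stdout, stderr = p.communicate(graph.encode('utf-8'))
svg = stdout.decode('utf-8')  # dot's SVG output is UTF-8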