Following up on a previous question: I'm extracting a tarball using Python's tarfile module. I don't want the extracted files written to disk; instead I want them piped directly to another program, specifically bgzip.
#!/usr/bin/env python
import tarfile, subprocess, re

mov = []

def clean(s):
    s = re.sub('[^0-9a-zA-Z_]', '', s)
    s = re.sub('^[^a-zA-Z_]+', '', s)
    return s

with tarfile.open("SomeTarballHere.tar.gz", "r:gz") as tar:
    for file in tar.getmembers():
        if file.isreg():
            mov = file.name
            proc = subprocess.Popen(tar.extractfile(file).read(), stdout = subprocess.PIPE)
            proc2 = subprocess.Popen('bgzip -c > ' + clean(mov), stdin = proc, stdout = subprocess.PIPE)
            mov = None
But now I get stuck on this:
Traceback (most recent call last):
  File "preformat.py", line 12, in <module>
    proc = subprocess.Popen(tar.extractfile(file).read(), stdout = subprocess.PIPE)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 36] File name too long
Is there any workaround for this? I have been using the LightTableLinux.tar.gz (it contains the files for a text editor program) as a tarball to test the script on it.
The exception is raised in the forked-off child process when trying to execute the target program from this invocation:
proc = subprocess.Popen(tar.extractfile(file).read(), stdout = subprocess.PIPE)
This call:
1. reads the contents of an entry in the tar file, and
2. tries to execute a program whose name is the contents of that entry.
Also your second invocation won't work, as you are trying to use shell redirection without using shell=True in Popen():
proc2 = subprocess.Popen('bgzip -c > ' + clean(mov), stdin = proc, stdout = subprocess.PIPE)
The redirect may also not be necessary, as you should be able to simply redirect the output from bgzip to a file from python directly.
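Edit: Unfortunately, although extractfile() returns a file-like object, Popen() expects a real file (one with a fileno()). Hence, a little wrapping is required (this also needs import shutil at the top of the script):
# open() is used rather than file(), because the loop variable `file` shadows the builtin
with open(clean(mov), 'wb') as outfile:
    member = tar.extractfile(file)
    proc = subprocess.Popen(
        ('bgzip', '-c'),
        stdin=subprocess.PIPE,
        stdout=outfile,
    )
    # feed the member's bytes to bgzip; bgzip writes straight into outfile
    shutil.copyfileobj(member, proc.stdin)
    proc.stdin.close()
    proc.wait()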
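Putting the pieces together, here is a minimal Python 3 sketch of the whole loop. The function name pipe_members, the out_dir parameter, and naming each output after the member's basename are assumptions made for illustration (the question uses its own clean() helper); the default command is the bgzip -c invocation from above.

```python
import os
import shutil
import subprocess
import tarfile


def pipe_members(tar_path, out_dir, cmd=("bgzip", "-c")):
    """Stream every regular file in the tarball through `cmd`.

    The command's stdout is written to out_dir/<member basename>.
    Returns the list of output paths in archive order.
    """
    written = []
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if not member.isreg():
                continue  # skip directories, links, devices
            # Hypothetical naming scheme: members with equal basenames collide.
            out_path = os.path.join(out_dir, os.path.basename(member.name))
            src = tar.extractfile(member)
            with open(out_path, "wb") as outfile:
                proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=outfile)
                # Copy in chunks so the member is never fully held in memory.
                shutil.copyfileobj(src, proc.stdin)
                proc.stdin.close()
                if proc.wait() != 0:
                    raise RuntimeError("%r failed for %s" % (cmd, member.name))
            written.append(out_path)
    return written
```

Nothing from the archive touches the disk uncompressed: the only files created are the outputs of the filter command.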
I'm on Windows 7 with Python 2.7.
When I use Popen to open a process:
from ctypes import *
dldtool = cdll.LoadLibrary(r'main.dll')
cmd = "dld_tool -c {} -r programmer.bin -f {}".format(port,file)
print cmd
with LOCK:
    process = Popen(cmd, stdout=PIPE)
    while process.poll() is None:
        out = process.stdout.readline()
        if out != '':
            print out
the following error occurs:
    process = Popen(cmd, stdout=PIPE)
  File "C:\Python27\lib\subprocess.py", line 390, in __init__
    errread, errwrite)
  File "C:\Python27\lib\subprocess.py", line 640, in _execute_child
    startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
main.dll is in the working directory. Should I change the Python code, or change some configuration?
You should use either the shell=True parameter if you want to pass the entire command with arguments as one string:
process = Popen(cmd, stdout=PIPE, shell=True)
or shlex.split to split your command line into a list (after importing shlex):
process = Popen(shlex.split(cmd), stdout=PIPE)
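Otherwise, on POSIX systems, the entire command line with its arguments is treated as one file name, and the system naturally cannot find it. (On Windows a plain string is handed to CreateProcess unchanged, so there the error more likely means the executable itself could not be found on the search path.)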
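To make the difference between the two forms concrete, here is a small sketch of what shlex.split produces. The helper names to_argv and run_and_capture, and the values COM3 and app.bin standing in for port and file, are made up for illustration. Note that shlex.split follows POSIX quoting rules by default, so a command line containing Windows paths with backslashes is safer to build as a list by hand.

```python
import shlex
import subprocess
import sys


def to_argv(command_line):
    """Split a shell-style command line into an argument list."""
    return shlex.split(command_line)


def run_and_capture(argv):
    """Run argv without a shell and return its stdout as text."""
    return subprocess.check_output(argv, universal_newlines=True)


if __name__ == "__main__":
    # COM3 and app.bin are hypothetical stand-ins for `port` and `file`.
    print(to_argv("dld_tool -c COM3 -r programmer.bin -f app.bin"))
    # The current interpreter is used as a command that surely exists.
    print(run_and_capture([sys.executable, "-c", "print('ok')"]))
```

Each list element reaches the program as exactly one argument, so no shell is needed to interpret the line.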
I'm writing a Python script to parse a value from a JSON file, run a tool called Googler with a couple of arguments including the value from the JSON file, and then save the output of the tool to a file (CSV preferred, but that's for another day).
So far the code is:
import json
import os
import subprocess
import time
with open("test.json") as json_file:
    json_data = json.load(json_file)
    test = (json_data["search_term1"]["apparel"]["biba"])
    #os.system("googler -N -t d1 "+test) shows the output, but can't write to a file.

result= subprocess.run(["googler", "-N","-t","d1",test], stdout=subprocess.PIPE, universal_newlines=True)
print(result.stdout)
When I run the above script, nothing happens. The terminal just sits blank until I send a keyboard interrupt, and then I get this error:
Traceback (most recent call last):
  File "script.py", line 12, in <module>
    result= subprocess.run(["googler", "-N","-t","d1",test], stdout=subprocess.PIPE, universal_newlines=True)
  File "/usr/lib/python3.5/subprocess.py", line 695, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib/python3.5/subprocess.py", line 1059, in communicate
    stdout = self.stdout.read()
KeyboardInterrupt
I tried replacing the test variable with a string, same error. The same line works on something like "ls", "-l", "/dev/null".
How do I extract the output of this tool and write it to a file?
Your googler command works in interactive mode. It never exits, so your program is stuck.
You want googler to run the search, print the output and then exit.
From the docs, I think --np (or --noprompt) is the right option for that. I haven't tested it.
result = subprocess.run(["googler", "-N", "-t", "d1", "--np", test], stdout=subprocess.PIPE, universal_newlines=True)
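Since the question also asks how to write the output to a file, here is a minimal sketch of that step. The function name run_to_file, the 30-second default timeout, and redirecting stdin from DEVNULL are assumptions of this sketch rather than anything googler requires; the timeout is there so that a tool which unexpectedly waits for input raises subprocess.TimeoutExpired instead of hanging forever.

```python
import subprocess
import sys


def run_to_file(argv, out_path, timeout=30):
    """Run argv, save its stdout to out_path, and return the captured text."""
    result = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,   # the child cannot sit waiting for keyboard input
        stdout=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,            # raises subprocess.TimeoutExpired on a hang
        check=True,                 # raises CalledProcessError on a non-zero exit
    )
    with open(out_path, "w") as fh:
        fh.write(result.stdout)
    return result.stdout


if __name__ == "__main__":
    # The current interpreter stands in for the real tool here.
    print(run_to_file([sys.executable, "-c", "print('result line')"], "output.txt"))
```

With the command from above, the call would look like run_to_file(["googler", "-N", "-t", "d1", "--np", test], "results.txt").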
Objective: Convert PPT to PDF using Python 3.6.1
Scenario: MS Office is not installed on the Windows server
Code used:
from subprocess import Popen, PIPE
import time

def convert(src, dst):
    d = {'src': src, 'dst': dst}
    commands = [
        '/usr/bin/docsplit pdf --output %(dst)s %(src)s' % d,
        'oowriter --headless -convert-to pdf:writer_pdf_Export %(dst)s %(src)s' % d,
    ]
    for i in range(len(commands)):
        command = commands[i]
        st = time.time()
        process = Popen(command, stdout=PIPE, stderr=PIPE, shell=True) # I am aware of consequences of using `shell=True`
        out, err = process.communicate()
        errcode = process.returncode
        if errcode != 0:
            raise Exception(err)
        en = time.time() - st
        print ('Command %s: Completed in %s seconds' % (str(i+1), str(round(en, 2))))

if __name__ == '__main__':
    src = 'C:\xxx\ppt'
    dst = 'C:\xxx\ppt\destination'
    convert(src, dst)
Error Encountered:
Traceback (most recent call last):
  File "C:/PythonFolder/ppt_to_pdf.py", line 134, in <module>
    convert(src, dst)
  File "C:/PythonFolder/ppt_to_pdf.py", line 123, in convert
    process = Popen(command, stdout=PIPE, stderr=PIPE, shell=True) # I am aware of consequences of using `shell=True`
  File "C:\Python 3.6.1\lib\subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
  File "C:\Python 3.6.1\lib\subprocess.py", line 990, in _execute_child
    startupinfo)
ValueError: embedded null character
Does anyone know how to fix this error?
Or is there another Python library that would help in this case?
Since you're running on Windows, the command /usr/bin/docsplit pdf --output %(dst)s %(src)s won't convert the PPT, since that path points to a Linux program. Popen might be having trouble handling that command, which could be what causes the error.
Converting a PPT to a PDF from the command line on Windows is kinda hard. I think your best bet is to install LibreOffice and run it in headless mode. There's also a SuperUser question on this where the asker ends up using C# interop libraries, but I think that requires Microsoft Office to be installed.
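As a rough sketch of the LibreOffice route: the helper names below are made up, and the code assumes LibreOffice is installed with its soffice launcher on the PATH. It also guards against one plausible cause of "embedded null character" (an assumption, since the real paths are elided in the question): in a non-raw string literal a backslash sequence such as \0 becomes a null character, which raw strings or forward slashes avoid.

```python
import subprocess


def has_embedded_null(path):
    """True if the string contains a NUL character, which Popen rejects."""
    return "\0" in path


def libreoffice_pdf_argv(src, out_dir, soffice="soffice"):
    """Build the argument list for a headless LibreOffice PDF conversion."""
    return [soffice, "--headless", "--convert-to", "pdf", "--outdir", out_dir, src]


def convert(src, out_dir, soffice="soffice"):
    """Convert src to PDF in out_dir, failing early on malformed paths."""
    argv = libreoffice_pdf_argv(src, out_dir, soffice)
    for arg in argv:
        if has_embedded_null(arg):
            raise ValueError("embedded null character in %r" % arg)
    subprocess.check_call(argv)  # no shell involved, arguments passed as a list


if __name__ == "__main__":
    # Hypothetical path: '\0' in a normal literal is a NUL, in a raw literal it is not.
    print(has_embedded_null('C:\0decks'), has_embedded_null(r'C:\0decks'))
```

Passing the command as a list also removes the need for shell=True.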
I would like to run an exe from this directory: /home/pi/pi_sensors-master/bin/Release/
The exe is run by typing mono i2c.exe, and it runs fine.
I would like to get its output in a Python script that lives in a completely different directory.
I know that I should use subprocess.check_output to get the output as a string.
I tried to implement this in python:
import subprocess
import os
cmd = "/home/pi/pi_sensors-master/bin/Release/"
os.chdir(cmd)
process=subprocess.check_output(['mono i2c.exe'])
print process
However, I received this error:
The output would usually be a data stream with a new number each time. Is it possible to capture this output and store it in a constantly changing variable?
Any help would be greatly appreciated.
Your command syntax is incorrect, which is actually generating the exception. You want to call mono i2c.exe, so your command list should look like:
subprocess.check_output(['mono', 'i2c.exe']) # Notice the comma separation.
Try the following:
import subprocess
import os
executable = "/home/pi/pi_sensors-master/bin/Release/i2c.exe"
print subprocess.check_output(['mono', executable])
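If the program has to run from its own directory, the cwd argument of check_output can replace the os.chdir call in the question. A minimal Python 3 sketch, where output_from is a made-up helper name:

```python
import subprocess
import sys


def output_from(argv, workdir):
    """Run argv with workdir as the child's working directory; return stdout text."""
    return subprocess.check_output(argv, cwd=workdir, universal_newlines=True)


if __name__ == "__main__":
    # The current interpreter stands in for `mono i2c.exe` here.
    print(output_from([sys.executable, "-c", "import os; print(os.getcwd())"], "."))
```

For the case above this would be output_from(["mono", "i2c.exe"], "/home/pi/pi_sensors-master/bin/Release/"), and the calling script's own working directory stays untouched.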
The sudo is not a problem as long as you give the full path to the file and you are sure that running the mono command as sudo works.
I can generate the same error by doing a ls -l:
>>> subprocess.check_output(['ls -l'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/subprocess.py", line 537, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
However when you separate the command from the options:
>>> subprocess.check_output(['ls', '-l'])
# outputs my entire folder contents which are quite large.
I strongly advise you to use a subprocess.Popen object to deal with external processes. Use Popen.communicate() to get the data from both stdout and stderr. This way you should not run into blocking problems. Note that the timeout argument requires Python 3.3 or later.
import subprocess
from subprocess import PIPE, TimeoutExpired
executable = "/home/pi/pi_sensors-master/bin/Release/i2c.exe"
# Without stdout=PIPE and stderr=PIPE, communicate() returns (None, None).
proc = subprocess.Popen(['mono', executable], stdout=PIPE, stderr=PIPE)
try:
    outs, errs = proc.communicate(timeout=15) # Times out after 15 seconds.
except TimeoutExpired:
    proc.kill()
    outs, errs = proc.communicate()
Or, if you want a 'data stream' of sorts, you can read the output line by line in a loop (taken from an answer to another question):
from subprocess import Popen, PIPE
executable = "/home/pi/pi_sensors-master/bin/Release/i2c.exe"
p = Popen(["mono", executable], stdout=PIPE, bufsize=1)
for line in iter(p.stdout.readline, b''):
    print line,
p.communicate() # close p.stdout, wait for the subprocess to exit
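For Python 3, the same line-by-line idea can be wrapped in a generator, so the caller can keep a "latest value" variable up to date as readings arrive. This is only a sketch; stream_lines is a made-up name, and it assumes the child flushes its output after every line.

```python
import subprocess
import sys


def stream_lines(argv):
    """Yield each line of the child's stdout as soon as it is produced."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode bytes to text
        bufsize=1,                # line-buffered on the Python side
    )
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        proc.stdout.close()
        proc.wait()


if __name__ == "__main__":
    # The current interpreter stands in for `mono i2c.exe` here.
    code = "import sys\nfor i in range(3):\n    print(i)\n    sys.stdout.flush()"
    latest = None
    for latest in stream_lines([sys.executable, "-c", code]):
        print("latest reading:", latest)
```

For the sensor program the call would be stream_lines(["mono", executable]).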
Given the function
def get_files_from_sha(sha, files):
    from subprocess import Popen, PIPE
    import tarfile
    if 0 == len(files):
        return {}
    p = Popen(["git", "archive", sha], bufsize=10240, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    tar = tarfile.open(fileobj=p.stdout, mode='r|')
    p.communicate()
    contents = {}
    doall = files == '*'
    if not doall:
        files = set(files)
    for entry in tar:
        if (isinstance(files, set) and entry.name in files) or doall:
            tf = tar.extractfile(entry)
            contents[entry.name] = tf.read()
            if not doall:
                files.discard(entry.name)
    if not doall:
        for fname in files:
            contents[fname] = None
    tar.close()
    return contents
which is called in a loop for some values of sha, after a while (in my case, 4 iterations) it starts to fail at the call to tf.read(), with the message:
Traceback (most recent call last):
  File "../yap-analysis/extract.py", line 243, in <module>
    commits, identities, identities_by_name, identities_by_email, identities_freq = build_commits(commits)
  File "../yap-analysis/extract.py", line 186, in build_commits
    commit = get_commit(commit)
  File "../yap-analysis/extract.py", line 84, in get_commit
    contents = get_files_from_sha(commit['sha'], files)
  File "../yap-analysis/extract.py", line 42, in get_files_from_sha
    contents[entry.name] = tf.read()
  File "/usr/lib/python2.7/tarfile.py", line 817, in read
    buf += self.fileobj.read()
  File "/usr/lib/python2.7/tarfile.py", line 737, in read
    return self.readnormal(size)
  File "/usr/lib/python2.7/tarfile.py", line 746, in readnormal
    return self.fileobj.read(size)
  File "/usr/lib/python2.7/tarfile.py", line 573, in read
    buf = self._read(size)
  File "/usr/lib/python2.7/tarfile.py", line 581, in _read
    return self.__read(size)
  File "/usr/lib/python2.7/tarfile.py", line 606, in __read
    buf = self.fileobj.read(self.bufsize)
ValueError: I/O operation on closed file
I suspect there is some parallelization that subprocess attempts to make (?).
What is the actual cause, and how can I solve it in a clean and robust way on Python 2?
Do not use .communicate() on the Popen instance; it'll read the stdout stream until it is finished. From the documentation:
Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached.
The code for .communicate() even adds an explicit .close() call on the stdout of the pipe.
Simply removing the call to .communicate() should be enough, but do also add a .wait() after reading the tarfile contents:
tar.close()
p.stdout.close()
p.wait()
It could be that tar.close() also closes p.stdout, but an extra .close() there should not hurt.
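A minimal Python 3 sketch of that ordering (read the tar stream first, then close the pipe and wait) could look like the following. The name read_tar_stream and the wanted parameter are made up for illustration, and draining the leftover bytes before closing is an extra precaution of this sketch, so that the child is not interrupted while writing trailing padding.

```python
import subprocess
import sys
import tarfile


def read_tar_stream(argv, wanted=None):
    """Run argv and parse its stdout as an uncompressed tar stream.

    Returns {member name: bytes}. If `wanted` is given, only those names are
    read, and wanted names missing from the archive map to None.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    contents = {}
    try:
        # No communicate() here: tarfile itself consumes proc.stdout.
        with tarfile.open(fileobj=proc.stdout, mode="r|") as tar:
            for entry in tar:
                if entry.isfile() and (wanted is None or entry.name in wanted):
                    contents[entry.name] = tar.extractfile(entry).read()
        # Drain whatever follows the end-of-archive marker.
        while proc.stdout.read(65536):
            pass
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, argv)
    for name in (wanted or ()):
        contents.setdefault(name, None)
    return contents
```

For the function in the question, the call would be read_tar_stream(["git", "archive", sha], wanted=set(files)).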
I think your problem is the p.communicate() call. This method sends data to stdin, reads from stdout and stderr (whose results you are discarding) and waits for the process to terminate.
tarfile then tries to read from the process's stdout, but by that time the process has finished and the pipe is closed, hence the error.
I have not tried running your code (I don't have access to git), but you probably don't want the p.communicate() call at all. Try commenting it out.