GhostScript Percentage Complete - python

I have a project that I'm working on using Python and Ghostscript. I have a PDF file with several hundred pages, and I need to separate its pages into TIFF files. I have the code up and running correctly, but since the program can take some time to complete, I would like to show the user how far along in the process it is.
gs_appdir = 'C:\\Program Files\\gs\\gs9.10\\bin\\'
os.chdir(gs_appdir)
args = "-o \""+wkgdir+"Images\\Image_%03d.tif\" -sDEVICE=tiffg4 -q -dNODISPLAY -r200x200 \""+inPDFFile+"\""
os.system("gswin64c.exe %s" % args)
When I take the -q parameter out of the arguments it does show progress, but it prints a new line for every page:
Page 1 of xxxx
Page 2 of xxxx
Page 3 of xxxx
...etc.
I would like to do something similar to this:
print "\r[ Preparing Images ] %s%%" % ((img+1)*100/imgcount),
where it prints the percentage on the same line. However, I'm not quite sure how to accomplish this using Ghostscript. Any advice would be greatly appreciated.

Hmm, I think basically you can't do this. With a chunk of PostScript programming you could get a message of the form you want emitted, but it would go to stdout, so if you run with -q it will be suppressed.
You would have to start by using the Ghostscript extensions to open the PDF file, begin the PDF interpretation, count the number of pages, then enter a loop rendering each page and emitting a message, followed by closing everything down. Normally the convenience routines in Ghostscript take care of all this for you, but they won't emit the kind of message you want.
Even if it were all emitted on the same line, I don't think you would like it. You seem to be assuming that later lines will simply erase (write on top of) the previous ones, and I don't think that is the case, though it does depend on the command shell (and therefore the operating system) you are using. I note you seem to be using Windows; you might be able to get this effect with a \r, but I'm not sure.
Fundamentally, if you want to do this you will have to write a new application which uses the API rather than executing Ghostscript as an external process, and process the stdout callback yourself.
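A practical middle ground, short of writing against the Ghostscript API: run gswin64c.exe without -q from Python, read its stdout line by line, and convert the "Page N" lines into a single-line percentage display yourself. A minimal sketch, assuming the total page count is already known (e.g. counted beforehand with a PDF library) and that Ghostscript's per-page output starts with "Page N":

import re
import subprocess
import sys

def convert_with_progress(gs_cmd, total_pages):
    # Run Ghostscript without -q so it reports each page on stdout.
    proc = subprocess.Popen(gs_cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    for line in proc.stdout:
        match = re.match(r"Page (\d+)", line)
        if match:
            pct = int(match.group(1)) * 100 // total_pages
            # \r returns to the start of the line so each update
            # overwrites the previous one.
            sys.stdout.write("\r[ Preparing Images ] %d%%" % pct)
            sys.stdout.flush()
    proc.wait()
    sys.stdout.write("\n")

The \r trick does work in the Windows console, as long as no newline is emitted between updates.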

Related

Safe way to view json currently being written by Python code

I have a script I'm running a bunch of times that generates and logs data in JSON files. The runs take days and I need to run several dozen test cases. I log progress in the JSON files for post-processing, and I'd like to check in occasionally to see how long a run has left. This is all single-threaded, but I've dealt with multiprocessing enough to be scared of opening a file while it's being written, for fear that viewing it will place a temporary lock on the file.
Is it safe to view the json in a linux terminal using nano log_file.json while my Python scripts are running and could attempt to write to the log at any time?
If it is not safe, are there any alternatives?
I'm worried that if Python tries to record an entry while I'm viewing the file, the entry could be lost or an error could be thrown. Viewing only, no saving, obviously. I'd love to check in on progress so I can switch between test cases faster, but I really don't want to raise an error that loses days of progress because the script couldn't write to the JSON.
Sorry if this is a duplicate, I tried searching but I'm not sure what to even search for this question.
You can use the tail command in a terminal to view the logs. The full command is:
tail -F <path_to_file>
It will show the last few lines of the file and keep printing new lines as data is written to it.
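If you control the writer as well, one way to make this comfortable is to append one JSON object per line ("JSON Lines") instead of rewriting a single JSON document; a partially written entry can then only ever be the last line, and tail -F picks new entries up immediately. A minimal sketch (the function name and record fields are just illustrative):

import json
import time

def log_progress(path, record):
    # Append a complete line per entry; flush so tail sees it right away.
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()

log_progress("log_file.json", {"case": 3, "done": 0.42, "t": time.time()})

Also worth knowing: on Linux, merely reading a file (with tail, nano, less, etc.) does not take a lock that would block your writer.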

Displaying all results of execution

When I execute a Python program, the results start to appear quickly and I can't read them all; the output just flushes past my screen.
When the execution ends, I can no longer see the first lines, because the terminal's scrollback is limited.
How can I save the output so I can read all of it?
You have a few options here.
Add a breakpoint and learn how to use the debugger. Insert import pdb; pdb.set_trace() at the point of interest and the code will stop there when you execute it (this takes some learning, so read up on pdb; personally, I prefer ipdb instead).
Save it to a file (python file.py > filename.txt) and read it afterwards. Note that this redirects only stdout; if you find output missing, see https://askubuntu.com/questions/625224/how-to-redirect-stderr-to-a-file for redirecting stderr as well.
(More advanced) Your code may simply be spitting out too much noise. You can remove some of the output code or use Python logging filters.
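For the logging-filter route, here is a minimal sketch: attach a filter to the handler so only the records you care about reach the terminal (the "progress" keyword is purely illustrative).

import logging

class KeywordFilter(logging.Filter):
    def filter(self, record):
        # Keep only records whose message mentions "progress".
        return "progress" in record.getMessage()

logging.basicConfig(level=logging.DEBUG)
logging.getLogger().handlers[0].addFilter(KeywordFilter())

logging.debug("progress: 10%")          # shown
logging.debug("noisy internal detail")  # suppressed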
This may be platform dependent.
On Linux you can also pipe your program output into your favorite pager (less for example) if you don't want to write it to a file.
python file.py | less

Printing Automation using win32api Python [duplicate]

I'm correcting assignments from my students right now and I'd like to automate an annoying step I always have to do.
After annotating their PDF solutions, I need to print them to PDF files in order to bake my annotations into the PDF so that they can be included in LaTeX. Right now I have to manually choose "Microsoft Print to PDF" and enter the PDF's name with a leading underscore (which is what my automatically generated LaTeX files expect). This gets annoying for 30+ files.
So I'd like to issue this in a batch-script automatically for all the PDFs to minimize my efforts to a simple double-click. I have seen that this is possible with e.g. C# (Here), but I'd like a solution with a simple batch script.
Can this be done?
Edit:
The C# code I found actually does not get the job done: you can't print existing PDFs that way. I'd need to use Spire.PDF to do that. The free version, however, messes up the PDF; I can download the "full" version from NuGet, but it adds a disclaimer at the beginning of every PDF, and it still can't handle things I draw in Adobe Reader DC. So C# really is not an option; I need a command-line solution.
You'd be better off installing PDFCreator and using its command-line options.
I assume it should be quite easy using PowerShell, but I ran into the same problem as described in this post.
The PowerShell solution from here creates only blank PDF files for me.
There probably exist better solutions, but I managed to combine PDFtoPrinter and this post.
A batch script could look like this:
for /R %%f in (*.pdf) do (
    (echo with createobject^("wscript.shell"^)
    echo .run "<path to PDFtoPrinter.exe> ""%%f"""
    echo wscript.sleep 3000
    echo .sendkeys """%%~df%%~pf%%~nf_correction.pdf"""
    echo .sendkeys "{enter}"
    echo wscript.sleep 3000
    echo end with) > %temp%\sk.vbs
    start /w %temp%\sk.vbs
)
This script uses Microsoft Print to PDF to create corresponding files of the format <filename>_correction.pdf.
The batch script creates an sk.vbs script in %temp% and runs it.
The sk.vbs script then handles the file saving dialog of Microsoft Print to PDF.
Additionally, this solution has the drawback that you can't use your computer while the script runs because the sk.vbs script must send keys to the window in focus.
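The same approach can also be driven from Python rather than batch+VBS, using the identical WScript.Shell COM object via pywin32. A sketch under the same assumptions as the batch script above (the PDFtoPrinter path is a placeholder, and the fixed sleeps and focus caveat still apply):

import glob
import os
import time
import win32com.client  # pywin32

PDFTOPRINTER = r"C:\tools\PDFtoPrinter.exe"  # placeholder: adjust to your install

shell = win32com.client.Dispatch("WScript.Shell")
for pdf in glob.glob("*.pdf"):
    target = os.path.splitext(os.path.abspath(pdf))[0] + "_correction.pdf"
    shell.Run('"%s" "%s"' % (PDFTOPRINTER, os.path.abspath(pdf)))
    time.sleep(3)                    # wait for the save dialog to appear
    shell.SendKeys('"%s"' % target)  # type the output file name
    shell.SendKeys("{ENTER}")
    time.sleep(3)                    # give the print job time to finish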

What is the best way of running shell commands from a web based interface?

Imagine a web application that allows a logged in user to run a shell command on the web server at the press of a button. This is relatively simple in most languages via some standard library os tools.
But if that command is long-running you don't want your UI to hang. Again, this is relatively easy to deal with using some sort of background process or by putting the command onto a message queue (and maybe saving the output and status somewhere for later consumption): just return quickly, saying "we'll run that and get back to you".
What I'd like to do is show the output of that web-UI-triggered shell command as it happens: vertically scrolling text, just like when it runs in a terminal.
I have a vague idea of how I might approach this, streaming the output to a websocket perhaps and simply printing the output to screen.
What I'd like to ask is:
Are there any plugins, libraries or applications that already do this? Something I can either use or read the source of. Ideally an open source Python/Django or Ruby/Rails tool, but other stacks would be interesting too.
I'm not sure if it's what you want, but there are some web based ssh clients out there. If you care about security and really just want dynamic feedback, you could look into comet or just have a frame with its own http session that doesn't end until it's done printing.
A web-based SSH client would also let you shell into the host (there are Java SSH clients out there).
Ruby has a web-based terminal:
http://tryruby.org (link to the source is at the bottom of the page).
You could also embed Ruby via jruby: http://tim.lossen.de/2007/03/jruby/applet.html
http://github.com/jruby/jruby/blob/master/samples/irb-applet.html
I haven't heard of any libraries that do this, but you'll need to set up the system command and call out to the system. You will then need to "pump" the stdout and stderr streams of that process and pipe the data back out to your web client.
As an example of this style of problem, look at code snippets showing how people use Ruby/Python/etc. to transcode a video, e.g. http://kpumuk.info/ruby-on-rails/encoding-media-files-in-ruby-using-ffmpeg-mencoder-with-progress-tracking/ - my example is taken from this blog post.
class MediaFormatException < StandardError
end

def execute_mencoder(command)
  progress = nil
  IO.popen(command) do |pipe|
    # mencoder separates progress updates with carriage returns
    pipe.each("\r") do |line|
      if line =~ /Pos:[^(]*\(\s*(\d+)%\)/
        p = $1.to_i
        p = 100 if p > 100
        if progress != p
          progress = p
          print "PROGRESS: #{progress}\n"
          $stdout.flush
        end
      end
    end
  end
  raise MediaFormatException if $?.exitstatus != 0
end
I don't know if this example pulls data from both stdout and stderr, but you will definitely need to read from both of those interfaces; typically, if one of the buffers fills up, the executing command might hang or fail (I have experienced this with Python). This method will also look different if all you do is return each line to the web client - in a terminal, the progress indicator of ffmpeg/mencoder stays on the bottom line, but this method will give you a long list of progress updates. Pipe the output to your terminal and you'll see what I'm referring to.
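For the Python side, here is a rough equivalent of the Ruby snippet above, with the same caveats. mencoder separates its progress output with carriage returns rather than newlines, so the reader splits on \r, and stderr is merged into stdout so neither buffer can fill up unread:

import re
import subprocess

def execute_mencoder(command):
    proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    progress = None
    buf = b""
    # Read byte by byte so we can split on carriage returns.
    for byte in iter(lambda: proc.stdout.read(1), b""):
        if byte != b"\r":
            buf += byte
            continue
        match = re.search(rb"Pos:[^(]*\(\s*(\d+)%\)", buf)
        buf = b""
        if match:
            pct = min(int(match.group(1)), 100)
            if pct != progress:
                progress = pct
                print("PROGRESS: %d" % progress, flush=True)
    if proc.wait() != 0:
        raise RuntimeError("mencoder exited with an error")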
So, I've tried to answer my own question with code as I couldn't find anything to quite fit the bill. Hopefully it's useful to anyone coming across the same problem.
Redbeard 0X0A pointed me in the general direction; I was able to get a standalone Ruby script doing what I wanted using popen. Extending this to EventMachine (as it provides a convenient way of writing a websocket server) and using its built-in popen method solved my problem.
More details here http://morethanseven.net/2010/09/09/Script-running-web-interface-with-websockets.html and the code at http://github.com/garethr/bolt/
Certainly not the best way to run shell commands, but likely the easiest:
#!/bin/sh
echo Content-Type: text/plain
echo
/usr/bin/uptime
http://www.sente.cc/scripts/uptime.cgi
Take a look at Galaxy (online demo) or Yabi.
Apart from the requirement to show output during the job run, they are both excellent solutions to this! They are also both written in Python (and Yabi is even built on Django).
They were both built with bioinformatics in mind, but really are both general job runner/workflow tools.
They will let you specify parameters in a web interface, see queued/running/finished jobs in a separate column, and after the jobs are finished, inspect details and results, or re-run the job, with possibly changed parameters.
Galaxy is the easier one to install. The Galaxy installation boils down to downloading it and running "sh run.sh", and adding your own tool boils down to creating an XML file along the lines of:
<tool id="mytool" name="My Tool" version="1.0.0">
<description>Does this and that</description>
<command>somecommand --aparam $aparam</command>
<inputs>
<param name="aparam" type="text" label="A parameter"/>
</inputs>
<outputs>
<data name="outfile" format="tabular"/>
</outputs>
</tool>
... and placing it in the /tools folder, then adding a line to tool_conf.xml to tell Galaxy about your new tool (there you can also get rid of the bioinformatics tools, so they don't clutter your tools menu).
Yabi is more complicated to install (see the readme file), but the process might be smooth if you are on the right kind of system. On the other hand, it even lets you do the tool configuration in the web interface, rather than in an XML file as in Galaxy.
Galaxy still has the bigger community, though, which is reflected in the number of features and already-integrated tools (see the Toolshed for shared tools/wrappers).
websocketd looks like the perfect tool for that.
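websocketd wraps any program that reads stdin and writes stdout, turning each line of output into a websocket message, so a long-running script's output can be streamed straight to the browser. For example (the script name is a placeholder):

websocketd --port=8080 ./run_job.sh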

Running multiple processes and capturing the output in python with pygtk

I'd like to write a simple application that runs multiple programs and displays their output in multiple terminal (style) windows. In addition, I want to be able to read the stdout/stderr of these processes and search for keywords in the output.
I've tried implementing this two ways in python, the first using subprocess.Popen and the second using vte (python-vte).
I've only gotten Popen to work with polling: I have to constantly check whether the processes have data to be read, read the data, and then send it to my TextArea. It's been recommended to use gobject.io_add_watch() instead, but whenever I try that my program hangs on the second call to io_add_watch - it's like it can only handle one file descriptor at a time.
vte works great, but I haven't found a reliable way to capture the output. You can get a callback when the cursor moves and then screen-scrape with get_text(), but I've already run into cases where the programs I'm viewing generate an obscene amount of tty output in one go and it scrolls off the screen. There doesn't appear to be a callback that delivers the new text being added to the window.
Any ideas?
I did something similar to this using the subprocess.Popen. For each process I actually ended up redirecting the stdout and stderr to a temporary file, then periodically checking the file for updates and dumping the output into a TextView.
The reason for not using a pipe to the process was that the processes themselves were volatile and prone to segfaults. When that happened I sometimes lost data between the last read and the segfault (which was the most needed data to determine the cause of the segfault).
As it turned out, sometimes I'd want to save the output from a specific process, so this method worked well for me.
If you go with igkuk's suggestion, I got some good advice on watching files for changes in a related question. That worked pretty well for me (I was watching a log file for changes).
You want to use select to monitor the pipes from your subprocesses. It's better than polling.
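A minimal sketch of the select approach, watching the stdout and stderr pipes of several subprocesses at once (the example commands are placeholders):

import os
import select
import subprocess

commands = [["ping", "-c", "3", "localhost"], ["ls", "-l"]]
procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
         for cmd in commands]
pipes = [p for proc in procs for p in (proc.stdout, proc.stderr)]
while pipes:
    # Block until at least one pipe has data, instead of polling.
    readable, _, _ = select.select(pipes, [], [])
    for pipe in readable:
        data = os.read(pipe.fileno(), 4096)
        if not data:                      # EOF: this pipe is done
            pipes.remove(pipe)
        else:
            print(data.decode(), end="")  # or append to your TextView

Note that select() on pipe file objects works on Unix-like systems; on Windows, select only supports sockets.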
