Python Script Executed with Makefile

I am writing Python scripts and executing them from a Makefile. The scripts are used to process data in a pipeline. I would like the Makefile to re-run a script every time I make a change to it.
Does anyone have an idea of how to do this?

That's not a lot of information, so this answer is a bit vague. The basic principle of Makefiles is to list dependencies for each target; in this case, your target (let's call it foo) depends on your python script (let's call it do-foo.py):
foo: do-foo.py
	python do-foo.py > foo
Now foo will be rebuilt whenever do-foo.py changes (provided, of course, that you run make).

And in the case where the scripts that need to be run don't produce any useful output file that can be used as a target, you can just use a dummy timestamp file as the target:
scripts = a.py b.py c.py
checkfile = .pipeline_up_to_date

default: $(checkfile)

$(checkfile): $(scripts)
	echo "Launching some commands now."
	touch $(checkfile)

If you want make to be run automatically as soon as you save, pyinotify, a Python wrapper for inotify, may be the most practical option under Linux. It registers with the kernel to detect filesystem changes and calls back into your function.
See my previous post on that topic.
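For what it's worth, here is a minimal sketch of that idea (assuming pyinotify is installed and the Makefile lives in the watched directory): it re-runs make whenever a .py file in the current directory is written.
import subprocess
import pyinotify

class Rebuild(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        # only react to changes in the Python scripts
        if event.pathname.endswith('.py'):
            subprocess.call(['make'])

wm = pyinotify.WatchManager()
notifier = pyinotify.Notifier(wm, Rebuild())
wm.add_watch('.', pyinotify.IN_CLOSE_WRITE)
notifier.loop()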

Related

python script calling another script

I wrote a Python script that works. The first line of my script reads an HDF5 file:
readFile = h5py.File('FileName_00','r')
After reading the file, my script performs several mathematical operations successfully; in the output I get a function F.
Now I want to repeat the same script for different files. Basically, I only need to change FileName_00 to FileName_01, ..., FileName_10. I was thinking of creating a script that calls this script!
I have never written a script that calls another script, so any advice would be appreciated.
One option: turn your existing code into a function which takes a filename as an argument:
def myfunc(filename):
    readFile = h5py.File(filename, 'r')
    ...
Now, after your existing code, call your function with the filenames you want to input:
myfunc('Filename_00')
myfunc('Filename_01')
myfunc('Filename_02')
...
Even more usefully, I definitely recommend looking into
if __name__ == '__main__':
and argparse (https://docs.python.org/3/library/argparse.html), as jkr noted.
Also, if you put your algorithm in a function like this, you can import it and use it in another Python script. Very useful!
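As a hedged illustration (the module name my_analysis is just a placeholder for whatever file holds myfunc), a second script could then be as small as this:
from my_analysis import myfunc

# loop over FileName_00 ... FileName_10
for i in range(11):
    myfunc('FileName_%02d' % i)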
Although there are certainly many ways to achieve what you want without multiple Python scripts, as other answerers have shown, here's how you could do it.
In Python we have the function os.system (learn more about it here: https://docs.python.org/3/library/os.html#os.system). Simply put, you can use it like this:
os.system("INSERT COMMAND HERE")
Replacing INSERT COMMAND HERE with the command you use to run your python script. For example, with a script named script.py you could conceivably (depending on your environment) include the following line of code in a secondary python script:
os.system("python script.py")
Running the secondary python script would run script.py as well. FWIW, I don't necessarily think this is the best way to accomplish your goal -- I tend to agree with DraftyHat's solution in most circumstances. But in case you were curious, this is certainly an option in python. I've used this functionality in the past, albeit not to run other python scripts, but to execute commands in the shell. Hope this helps!
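For completeness, a hypothetical sketch of that approach: run script.py once per input file and pass the file name on the command line (script.py would then have to read it from sys.argv[1] or via argparse):
import os

for i in range(11):
    os.system('python script.py FileName_%02d' % i)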

Running powershell scripts with python under one session

I am trying to create a Python program that deobfuscates PowerShell malware which uses IEX. My Python program hooks the IEX function so that, instead of running the desired string, it prints the string.
Now my problem is that I have several .ps1 scripts (for example 1.ps1, 2.ps1, etc.) and I want to run all of them in the same session, so that the local variables created by the 1.ps1 script can be used by the 2.ps1 script...
I have tried many approaches. First I tried subprocess, but it creates a new session every time I run a command (the path of a .ps1 file). Then I found this project on GitHub:
https://gist.github.com/MarkBaggett/a7c10195b2626c78009bf73bcdb6db20
It is really awesome and did work, but it seems that when I run the command ./1.ps1 it still does not store the local variables in the session (maybe it opens a new one when running a script).
I also tried "Get-Content 1.ps1 | iex", but that crashes because I have functions in there, for example:
function Invoke-Expression()
{
    param(
        [Parameter( `
            Mandatory=$True, `
            Valuefrompipeline = $True)]
        [String]$Command
    )
    Write-Host $Command
}
taken from PSDecode project:
https://github.com/R3MRUM/PSDecode/blob/master/PSDecode.psm1#L28
Anyway, any ideas about how I can do this? I have those scripts on my desktop but no idea how to run them in the same session so that they share the same local variables...
Two things that I did do, though they really suck:
1. Concatenate all the scripts into one script and run that, but the next time I use this program I might have 100 scripts or more, and I don't really want to do this.
2. Save the local variables from each script and load them into the next; I want to keep that as a worst-case option and still have not got it working.
Thank you so much for helping me, and sorry for my grammar; English is not my mother tongue, as you can see :)
Maybe you're looking for dot sourcing:
Runs a script in the current scope so that any functions, aliases, and variables that the script creates are added to the current scope.
. c:\scripts\sample.ps1
If so, dot-source your .ps1 files and call the functions inside them.
Hope that helps.
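If you want to drive this from Python, one possible sketch (assuming powershell is on the PATH; untested against your hooked IEX setup) is to start a single PowerShell process and feed it dot-source commands on stdin, so every script runs in the same session and shares its variables:
import subprocess

scripts = ['1.ps1', '2.ps1']
commands = ''.join('. .\\%s\n' % s for s in scripts)

ps = subprocess.Popen(
    ['powershell', '-NoProfile', '-Command', '-'],   # read commands from stdin
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
output, _ = ps.communicate(commands)
print(output)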

Python - run another Python script with the current environment, passing the arguments over and capturing the printed output

A little bit of an ugly question, but I didn't find existing SO posts which cover it.
Right now I need to use an existing Python tool available on this GitHub.
This is a rather big piece of code with a lot of dependencies which I don't want to mess with. In a nutshell, one can run its module by passing command line arguments, for example:
timesearch.py timesearch -r "subreddit1" -l "1466812800" -up "1498348800"
Now, I need to run this tool a bunch of times in a for loop, passing different argument values each time. The tool also prints some output to the command line when you run it, and I would like to intercept that and print it from my Python script as well. Finally, before I move on in my loop and run the tool again, I need to ensure that the current execution of the timesearch tool has completed.
One side note here: I need to ensure that timesearch is executed using the same environment I use to run my main script with the for loop.
I am trying to understand the best way to do this.
If I just go for this it doesn't work:
import os
#for loop will go here
os.system('python timesearch.py timesearch -r "ethereum" -l "1466812800" -up "1498348800"')
It fails for several reasons: it doesn't use the environment in which I am writing my script with the loop, and it doesn't capture the print output of timesearch either.
Any advice on how to achieve this?
Just to highlight: I can't just go and pull out the function I need from timesearch, since it calls the __init__ to set up some things based on the arguments you pass.
I wouldn't call the Python script with os.system. There is basically one function which you need to use: main(sys.argv[1:])
https://github.com/voussoir/timesearch/blob/master/timesearch/__init__.py#L435.
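A sketch of how that could look (the subreddit names are made up, and capturing stdout with contextlib.redirect_stdout is my assumption, not something the timesearch docs promise):
import io
import contextlib
import timesearch

subreddits = ['subreddit1', 'subreddit2']
for sub in subreddits:
    argv = ['timesearch', '-r', sub, '-l', '1466812800', '-up', '1498348800']
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        timesearch.main(argv)          # runs in the current interpreter/environment
    print(buffer.getvalue())           # the tool's printed output, now captured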

Running jobs on a cluster submitted via qsub from Python. Does it make sense?

I have a situation where I am doing some computation in Python, and based on the outcome I have a list of target files that are candidates to be passed to a 2nd program.
For example, I have 50,000 files which contain ~2000 items each. I want to filter for certain items and call a command line program to do some calculation on some of them.
This Program #2 can be used via the shell command line, but it also requires a lengthy set of arguments. For performance reasons I would have to run Program #2 on a cluster.
Right now, I am running Program #2 via
subprocess.call("...", shell=True)
but I'd like to run it via qsub in the future.
I don't have much experience with how exactly this could be done in a reasonably efficient manner.
Would it make sense to write temporary qsub job scripts and run them via subprocess directly from the Python script? Is there a better, maybe more Pythonic, solution?
Any ideas and suggestions are very welcome!
It makes perfect sense, although I would go for another solution.
As far as I understand, you have programme #1 that determines which of your 50,000 files need to be processed by programme #2.
Both programme #1 and #2 are written in Python. Excellent choice.
Incidentally, I have a Python module that might come in handy: https://gist.github.com/stefanedwards/8841307
If you are running the same qsub system as I do (no idea what ours is called), you cannot pass command line arguments to the submitted scripts. Instead, any options are passed via the -v option, which puts them into environment variables, e.g.:
[me@local ~] $ python isprime.py 1
1: True
[me@local ~] $ head -n 5 isprime.py
#!/usr/bin/python
### This is a python script ...
import os
os.chdir(os.environ.get('PBS_O_WORKDIR','.'))
[me@local ~] $ qsub -v isprime='1 2 3' isprime.py
123456.cluster.control.com
[me@local ~]
Here, isprime.py could handle command line arguments using argparse. Then you just need to check whether the script is running as a submitted job and, if so, retrieve said arguments from the environment variables (os.environ).
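A sketch of that pattern (the variable name isprime matches the qsub -v example above; the actual primality check is left out):
import os
import sys

# run from the directory the job was submitted from, as above
os.chdir(os.environ.get('PBS_O_WORKDIR', '.'))

# use command-line arguments when run directly,
# otherwise fall back to the variable passed in via qsub -v
args = sys.argv[1:] or os.environ.get('isprime', '').split()

for a in args:
    print('checking %s ...' % a)   # the real computation would go here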
When programme #2 is modified to run on the cluster, programme #1 can submit jobs by using subprocess.call(['qsub', '-v', 'options=...', 'programme2.py']) (no shell=True is needed when passing a list).
Another approach would be to queue all the files in a database (say, an SQLite database). Programme #1 would then check all unprocessed entries in the database and determine the outcome (run, don't run, run with special options).
You then have the opportunity to run programme #2 in parallel on the cluster, where it simply checks the database for files to analyse.
Edit: When Programme #2 is an executable
Instead of a Python script, we use a bash script that takes environment variables and puts them on the command line for the programme:
#!/bin/bash
cd "${PBS_O_WORKDIR:-.}"   # run from the directory the job was submitted from
# put options into context/flags etc.
if [ -n "$option1" ]; then _opt1="--opt1 $option1"; fi
# we can even define our own defaults
_opt2='--no-verbose'
if [ -n "$opt2" ]; then _opt2="-o $opt2"; fi
/path/to/exe $_opt1 $_opt2
If you are going for the database solution, then have a Python script that checks the database for unprocessed files, marks a file as being processed (do these two in a single transaction), gets the options, calls the executable with subprocess, marks the file as done when finished, checks for a new file, etc.
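A minimal sqlite3 sketch of that worker loop (the database name, table and column names are assumptions): claim one pending file inside a transaction, run the executable, then mark it done.
import sqlite3
import subprocess

db = sqlite3.connect('pipeline.db')
while True:
    with db:  # select and claim the row in a single transaction
        row = db.execute(
            "SELECT id, path, options FROM files WHERE status = 'pending' LIMIT 1"
        ).fetchone()
        if row is None:
            break
        file_id, path, options = row
        db.execute("UPDATE files SET status = 'running' WHERE id = ?", (file_id,))
    subprocess.call(['/path/to/exe'] + options.split() + [path])
    with db:
        db.execute("UPDATE files SET status = 'done' WHERE id = ?", (file_id,))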
You have obviously already built yourself a string cmd containing a command that you could enter in a shell to run the 2nd program. You are currently using subprocess.call(cmd, shell=True) to execute the 2nd program from a Python script (it then runs in a process on the same machine as the calling script).
I understand that you are asking how to submit a job to a cluster so that this 2nd program is run on the cluster instead of the calling machine. Well, this is pretty easy and the method is independent of Python, so there is no 'pythonic' solution, just an obvious one :-) : replace your current cmd with a command that defers the heavy work to the cluster.
First of all, dig into the documentation of your cluster's qsub command (the underlying batch system might be SGE or LSF or whatever; you need to get the corresponding docs) and try to find the shell command line that properly submits an example job of yours to the cluster. It might look as simple as qsub ...args... cmd, where cmd here is the content of the original cmd string. I assume that you now have the entire qsub command needed; let's call it qsubcmd (you have to come up with that on your own, we can't help there). Now all you need to do in your original Python script is call
subprocess.call(qsubcmd, shell=True)
instead of
subprocess.call(cmd, shell=True)
Note that qsub likely only works on very few machines, typically known as your cluster 'head node(s)'. This means that your Python script that wants to submit these jobs should run on this machine (if that is not possible, you need to add an ssh login procedure to the submission process that we don't want to discuss here).
Please also note that, if you have the time, you should look into the shell=True implications of your subprocess usage. If you can circumvent shell=True, this will be the more secure solution. This might however not be an issue in your environment.
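To make that concrete, a hedged sketch (the flags -N, -cwd and -b y are SGE-style and may differ on your batch system; the program name and arguments are placeholders), which also avoids shell=True by passing a list:
import subprocess

cmd = ['/path/to/program2', '--input', 'file_0001.dat', '--threshold', '0.5']
qsubcmd = ['qsub', '-N', 'program2_job', '-cwd', '-b', 'y'] + cmd
subprocess.call(qsubcmd)   # blocks only until the job is submitted, not until it finishes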

How do I get scons to invoke an external script?

I'm trying to use scons to build a latex document. In particular, I want to get scons to invoke a python program that generates a file containing a table that is \input{} into the main document. I've looked over the scons documentation but it is not immediately clear to me what I need to do.
What I wish to achieve is essentially what you would get with this makefile:
document.pdf: table.tex
	pdflatex document.tex
table.tex:
	python table_generator.py
How can I express this in scons?
Something along these lines should do -
env.Command ('document.tex', '', 'python table_generator.py')
env.PDF ('document.pdf', 'document.tex')
It declares that 'document.tex' is generated by calling the Python script, and requests that a PDF document be created from this generated 'document.tex' file.
Note that this is in spirit only. It may require some tweaking. In particular, I'm not certain what kind of semantics you would want for the generation of 'document.tex': should it be generated every time? Only when it doesn't exist? When some other file changes? (In that case you would want to add the dependency as the second argument to Command().)
In addition, the output of Command() can be used as input to PDF() if desired. For clarity, I didn't do that.
In this simple case, the easiest way is to just use the subprocess module
from subprocess import call
call(["python", "table_generator.py"])
call(["pdflatex", "document.tex"])
Regardless of where in your SConstruct file these lines are placed, they will happen before any of the compiling and linking performed by SCons.
The downside is that these commands will be executed every time you run SCons, rather than only when the files have changed, which is what would happen in your example Makefile. So if those commands take a long time to run, this wouldn't be a good solution.
If you really need to only run these commands when the files have changed, look at the SCons manual section Writing Your Own Builders.
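For reference, a hedged sketch of that dependency-tracked variant (assuming table_generator.py writes table.tex itself): SCons then reruns the generator only when the script changes, and rebuilds the PDF when table.tex changes.
env = Environment()

# table.tex is rebuilt only when table_generator.py changes
table = env.Command('table.tex', 'table_generator.py', 'python table_generator.py')

pdf = env.PDF('document.pdf', 'document.tex')

# make sure the table is generated before the PDF is built
env.Depends(pdf, table)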
