I have two user-defined Python scripts. The first takes a file and processes it, while the second takes the output of the first, runs an executable, and supplies the first script's output to that program with additional formatting.
I need to run these scripts from another Python script, which is my main executable script.
I searched a bit about this topic and found:
I can use importlib to load the scripts' contents so that I can call them at the appropriate times. This requires the scripts to be under my directory, or a modification to the path environment variable. So it looks a bit ugly at best, and doesn't seem Pythonic.
The built-in eval function. This requires the user to write a server-client-like structure, because the second script might have to run the program more than once while the first script is still producing output.
I think I'm designing something wrong, but I cannot come up with a better approach.
A more detailed explanation (maybe gibberish):
I need to benchmark some programs. While doing so, I have a standard form of data, and this data needs to be supplied to the benchmark programs. The scripts are (due to the nature of the benchmarks) specific to each program and need to be bundled with the benchmark definition, yet I need to build this tool as a standalone, configurable tester. I think I have designed something wrong, and would love to hear about design approaches.
PS: I do not want to limit the user, which is why I chose to run Python scripts.
I created a few test scripts to make sure this works.
The first one (count_01.py) sleeps for 100 seconds, then counts from 0 to 99 and writes the numbers to count_01.output.
The second one (count_02.py) reads the output of the first one (count_01.output), adds 1 to each number, and writes the result to count_02.output.
The third script (chaining_programs.py) runs the first one and waits for it to finish before calling the second one.
# count_01.py --------------------
from time import sleep
sleep(100)
filename = "count_01.output"
file_write = open(filename,"w")
for i in range(100):
    #print " i = " + str(i)
    output_string = str(i)
    file_write.write(output_string)
    file_write.write("\n")
file_write.close()
# ---------------------------------
# count_02.py --------------------
file_in = "count_01.output"
file_out = "count_02.output"
file_read = open(file_in,"r")
file_write = open(file_out,"w")
for i in range(100):
    line_in = file_read.next()  # Python 2; use next(file_read) on Python 3
    line_out = str(int(line_in) + 1)
    file_write.write(line_out)
    file_write.write("\n")
file_read.close()
file_write.close()
# ---------------------------------
# chaining_programs.py -------------------------------------------------------
import subprocess
import sys
#-----------------------------------------------------------------------------
path_python = 'C:\\Python27\\python.exe'  # double the backslashes, or use a raw string: r'C:\Python27\python.exe'
#
# single backslashes did not work: sequences like \U and \a are treated as escapes
#program_to_run = 'C:\Users\aaaaa\workspace\Rich_Project_044_New_Snippets\source\count.py'
program_to_run_01 = 'C:\\Users\\aaaaa\\workspace\\Rich_Project_044_New_Snippets\\source\\count_01.py'
program_to_run_02 = 'C:\\Users\\aaaaa\\workspace\\Rich_Project_044_New_Snippets\\source\\count_02.py'
#-----------------------------------------------------------------------------
# subprocess.call() waits for the script to finish and returns its exit code
exit_code = subprocess.call([path_python, program_to_run_01])
# subprocess.Popen() does not wait; it returns a process object (its pid is process.pid)
process = subprocess.Popen([path_python, program_to_run_02])
#-----------------------------------------------------------------------------
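For reference, a minimal sketch of chaining both steps so each completes before the next begins; sys.executable points at the interpreter running the wrapper, so the hard-coded path to python.exe is not needed (the relative script paths here are placeholders):

# chaining_sketch.py -- run the two scripts in sequence, stopping on failure
import subprocess
import sys

scripts = ["count_01.py", "count_02.py"]  # placeholder paths
for script in scripts:
    exit_code = subprocess.call([sys.executable, script])  # waits for completion
    if exit_code != 0:
        sys.exit("%s failed with exit code %d" % (script, exit_code))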
TL;DR: If you have a program that should run for an undetermined amount of time, how do you code something to stop it when the user decides it is time? (Without KeyboardInterrupt or killing the task.)
--
I've recently posted this question: How to make my code stopable? (Not killing/interrupting)
The answers did address my question, but from a termination/interruption point of view, and that's not really what I wanted. (Although my question didn't make that clear.)
So, I'm rephrasing it.
I created a generic script for example purposes. I have a class that gathers data from a generic API and writes the data into a CSV. The code is started by typing python main.py in a terminal window.
import time, csv
import GenericAPI

class GenericDataCollector:
    def __init__(self):
        self.generic_api = GenericAPI()
        self.loop_control = True

    def collect_data(self):
        while self.loop_control: # Can this var be changed from outside of the class? (Maybe one solution)
            data = self.generic_api.fetch_data() # Returns a JSON with some data
            self.write_on_csv(data)
            time.sleep(1)

    def write_on_csv(self, data):
        with open('file.csv', 'wt') as f:
            writer = csv.writer(f)
            writer.writerow(data)

def run():
    obj = GenericDataCollector()
    obj.collect_data()

if __name__ == "__main__":
    run()
The script is supposed to run forever OR until I command it to stop. I know I can just press Ctrl+C (KeyboardInterrupt) or abruptly kill the task, but that isn't what I'm looking for. I want a "soft" way to tell the script it's time to stop, not only because interruption can be unpredictable, but also because it's a harsh way to stop.
If that script were running in a Docker container (for example), you wouldn't be able to press Ctrl+C unless you happened to be in the terminal/bash inside the container.
Or another situation: if that script were made for a customer, I don't think it's OK to tell the customer to just use Ctrl+C/kill the task to stop it. Definitely counterintuitive, especially for a non-technical person.
I'm looking for a way to code another script (assuming that's a possible solution) that would change the attribute obj.loop_control to False, finishing the loop once the current iteration completes. Something that could be run by typing python stop_script.py in a (different) terminal.
It doesn't necessarily need to be this way; other solutions are also acceptable, as long as they don't involve KeyboardInterrupt or killing tasks. If I could use a method inside the class, that would be great, as long as I can call it from another terminal/script.
Is there a way to do this?
If you have a program that should run for an undetermined amount of time, how do you code something to stop it when the user decides it is time?
In general, there are two main ways of doing this (as far as I can see). The first one would be to make your script check some condition that can be modified from outside (like the existence or the content of some file/socket), or, as @Green Cloak Guy stated, to use pipes, which are one form of interprocess communication.
The second one would be to use the built-in mechanism for interprocess communication called signals, which exists in every OS where Python runs. When the user presses Ctrl+C, the terminal sends a specific signal to the process in the foreground. But you can send the same (or another) signal programmatically (i.e. from another script).
Reading the answers to your other question, I would say that what is missing to address this one is a way to send the appropriate signal to your already running process. Essentially this can be done by using the os.kill() function. Note that although the function is called 'kill', it can send any signal (not only SIGKILL).
In order for this to work you need to have the process id of the running process. A commonly used approach is to have your script save its process id, when it launches, into a file stored in a common location. To get the current process id you can use the os.getpid() function.
So, summarizing, the steps to achieve what you want would be (see the sketch after this list):
Modify your current script to store its process id (obtained with os.getpid()) in a file in a common location, for example /tmp/myscript.pid. Note that if you want your script to be portable you will need to handle this in a way that also works on non-Unix OSs like Windows.
Choose one signal (typically SIGINT or SIGTERM; note that SIGSTOP and SIGKILL cannot be caught) and modify your script to register a custom handler with signal.signal() that performs the graceful termination of your script.
Create another script (note that it could be the same script with some command-line parameter) that reads the process id from the known file (e.g. /tmp/myscript.pid) and sends the chosen signal to that process using os.kill().
Note that an advantage of using signals instead of an external mechanism (files, pipes, etc.) is that the user can still press Ctrl+C (if you chose SIGINT) and that will produce the same behavior as the 'stop script' would.
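A minimal sketch of those steps, assuming a POSIX system; the pid-file path and the choice of SIGTERM are illustrative:

# main.py -- sketch: write a pid file, then stop the loop gracefully on SIGTERM
import os
import signal
import time

PID_FILE = '/tmp/myscript.pid'  # illustrative location

class GenericDataCollector:
    def __init__(self):
        self.loop_control = True

    def collect_data(self):
        while self.loop_control:
            time.sleep(1)  # fetch and write data here

    def stop(self, signum, frame):
        self.loop_control = False  # loop exits after the current iteration

if __name__ == "__main__":
    with open(PID_FILE, 'w') as f:
        f.write(str(os.getpid()))
    collector = GenericDataCollector()
    signal.signal(signal.SIGTERM, collector.stop)  # register the handler
    collector.collect_data()

# stop_script.py -- sketch: read the pid file and signal the running process
import os
import signal

with open('/tmp/myscript.pid') as f:
    pid = int(f.read())
os.kill(pid, signal.SIGTERM)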
What you're really looking for is a way to send a signal from one program to another, independent, program. One way to do this would be to use an inter-process pipe. Python has a module for this, pipes (which does, admittedly, seem to require a POSIX-compliant shell, but most major operating systems provide that).
What you'll have to do is agree on a file path beforehand between your running program (let's say main.py) and your stopping program (let's say stop.sh). Then you can make the main program run until someone writes something to that pipe:
import pipes
...
t = pipes.Template()
# create a pipe in the first place
t.open("/tmp/pipefile", "w")
# create a lasting pipe to read from that
pipefile = t.open("/tmp/pipefile", "r")
...
And now, inside your program, change your loop condition to "as long as there's no input from this file"; unless someone writes something to it, .read() will return an empty string:
while not pipefile.read():
    # do stuff
To stop it, you run another script or command that writes to that file. This is easiest to do with a shell script:
#!/usr/bin/env sh
echo STOP >> /tmp/pipefile
which, if you're containerizing this, you could put in /usr/bin, name stop, give at least 0111 permissions, and tell your user: "to stop the program, just run docker exec containername stop".
(Using >> instead of > is important because we just want to append to the pipe, not overwrite it.)
Proof of concept in my Python console:
>>> import pipes
>>> t = pipes.Template()
>>> t.open("/tmp/file1", "w")
<_io.TextIOWrapper name='/tmp/file1' mode='w' encoding='UTF-8'>
>>> pipefile = t.open("/tmp/file1", "r")
>>> i = 0
>>> while not pipefile.read():
...     i += 1
...
At this point I go to a different terminal tab and do
$ echo "Stop" >> /tmp/file1
then I go back to my Python tab, and the while loop is no longer executing, so I can check what happened to i while I was gone.
>>> print(i)
1704312
I am writing a script that is required to perform safe writes to any given file, i.e. append to a file only if no other process is known to be writing to it. My understanding of the theory was that concurrent writes are prevented by write locks on the file system, but it seems not to be the case in practice.
Here's how I set up my test case:
I am redirecting the output of a ping command:
ping 127.0.0.1 > fileForSafeWrites.txt
On the other end, I have the following Python code attempting to write to the file:
handle = open('fileForSafeWrites.txt', 'w')
handle.write("Probing for opportunity to write")
handle.close()
Running both processes concurrently, both complete gracefully. I see that fileForSafeWrites.txt has turned into a file with binary content, instead of the first process holding a write lock that protects it from being written to by the Python code.
How do I force either or both of my concurrent processes not to interfere with each other? I have read people advise that obtaining a write file handle is evidence that the file is safe to write to, such as in https://stackoverflow.com/a/3070749/1309045
Is this behavior specific to my operating system and Python? I use Python 2.7 in an Ubuntu 12.04 environment.
Use the lockfile module as shown in Locking a file in Python
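For illustration, a minimal sketch with the lockfile module; note the locks are advisory, so this only works if every writer cooperates by taking the same lock (the filename matches the example above):

# A sketch of cooperative advisory locking with the lockfile module
from lockfile import FileLock

lock = FileLock('fileForSafeWrites.txt')  # creates fileForSafeWrites.txt.lock
with lock:  # blocks until the lock can be acquired
    with open('fileForSafeWrites.txt', 'a') as handle:
        handle.write("Text written while holding the lock\n")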
Inspired by a solution described for concurrency checks, I came up with the following snippet of code. It works if one can appropriately predict the frequency at which the file in question is written. The solution uses file-modification times.
import os
import time

def isFileBeingWrittenInto(filename,
                           writeFrequency = 180, overheadTimePercentage = 20):
    '''Find if a file was modified in the last x seconds given by writeFrequency.'''
    overhead = 1 + float(overheadTimePercentage) / 100 # Add some buffer time
    maxWriteFrequency = writeFrequency * overhead
    modifiedTimeStart = os.stat(filename).st_mtime # Time file last modified
    time.sleep(writeFrequency)                     # wait writeFrequency # of secs
    modifiedTimeEnd = os.stat(filename).st_mtime   # File modification time again
    if 0 < (modifiedTimeEnd - modifiedTimeStart) <= maxWriteFrequency:
        return True
    else:
        return False

if not isFileBeingWrittenInto('fileForSafeWrites.txt'):
    handle = open('fileForSafeWrites.txt', 'a')
    handle.write("Text written safely when no one else is writing to the file")
    handle.close()
This does not do true concurrency checks but can be combined with a variety of other methods for practical purposes to safely write into a file without having to worry about garbled text. Hope it helps the next person searching for a way to do this.
EDIT UPDATE:
Upon further testing, I encountered a high-frequency write process that required the conditional logic to be modified from
if 0 < (modifiedTimeEnd - modifiedTimeStart) < maxWriteFrequency
to
if 0 < (modifiedTimeEnd - modifiedTimeStart) <= maxWriteFrequency
That makes a better answer, in theory and in practice.
I wrote a script in Python that takes a few files, runs a few tests, and counts the number of total_bugs while writing new files with information for each (bugs + more).
To take a couple of files from the current working directory:
myscript.py -i input_name1 input_name2
When that job is done, I'd like the script to 'return total_bugs', but I'm not sure of the best way to implement this.
Currently, the script prints stuff like:
[working directory]
[files being opened]
[completed work for file a + num_of_bugs_for_a]
[completed work for file b + num_of_bugs_for_b]
...
[work complete]
A bit of help (notes/tips/code examples) would be appreciated here.
Btw, this needs to work on Windows and Unix.
If you want your script to return values, just do return [1, 2, 3] from a function wrapping your code, but then you'd have to import your script from another script to even have any use for that information:
Return values (from a wrapping function)
(Again, this would have to be run by a separate Python script and be imported in order to do any good):
import ...

def main():
    # calculate stuff
    return [1, 2, 3]
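The calling side would then look something like this (the module and function names are the hypothetical ones from above):

# governor.py -- sketch: import the wrapped script and use its return value
import myscript  # hypothetical module name; must be importable (no hyphens)

results = myscript.main()  # e.g. [1, 2, 3]
print("total bugs: %d" % sum(results))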
Exit codes as indicators
(This is generally just good for when you want to indicate to a governor what went wrong, or simply the number of bugs/rows counted or whatever. Normally 0 is a good exit and >= 1 is a bad exit, but you could interpret them any way you want to get data out of them.)
import sys
# calculate and stuff
sys.exit(100)
And exit with a specific exit code depending on what you want to tell your governor.
I used exit codes when running scripts under a scheduling and monitoring environment to indicate what had happened.
(os._exit(100) also works, and is a bit more forceful)
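On the governor's side, the exit code can be read back with subprocess (the script name is the hypothetical one from the question):

# governor.py -- sketch: run the child and read its exit code
import subprocess
import sys

exit_code = subprocess.call([sys.executable, 'myscript.py', '-i', 'input_name1', 'input_name2'])
print("child exited with code %d" % exit_code)  # e.g. 100, per the convention chosen above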
Stdout as your relay
Otherwise you'd have to use stdout to communicate with the outside world (as you've described).
But that's generally a bad idea unless it's a parser executing your script that can catch whatever you're reporting.
import sys
# calculate stuff
sys.stdout.write('Bugs: 5|Other: 10\n')
sys.stdout.flush()
sys.exit(0)
If you're running your script in a controlled scheduling environment, then exit codes are the best way to go.
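If you do use stdout as the relay, the parent can capture and parse it along these lines (the line format matches the hypothetical 'Bugs: 5|Other: 10' example above):

# governor.py -- sketch: capture the child's stdout and parse the report line
import subprocess
import sys

output = subprocess.check_output([sys.executable, 'myscript.py'])
# parse a line like 'Bugs: 5|Other: 10'
fields = dict(part.split(': ') for part in output.decode().strip().split('|'))
print(fields['Bugs'])  # '5'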
Files as conveyors
There's also the option to simply write information to a file, and store the result there.
# calculate
with open('finish.txt', 'wb') as fh:
    fh.write(str(5) + '\n')
And pick up the value/result from there. You could even do it in CSV format so that others can read it easily.
Sockets as conveyors
If none of the above work, you can also use network sockets locally (Unix sockets are a great option on *nix systems). These are a bit more intricate and deserve their own post/answer, but I'm adding the option here as it's a good way to communicate between processes, especially if they should run multiple tasks and return values.
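For completeness, a minimal sketch of the socket approach over local TCP (the port number is arbitrary and the report format is the hypothetical one above):

# worker.py -- sketch: report a result to a local listener over TCP
import socket

sock = socket.create_connection(('127.0.0.1', 50007))  # arbitrary port
sock.sendall(b'Bugs: 5')
sock.close()

# governor.py -- sketch: listen for one result from the worker
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 50007))
server.listen(1)
conn, addr = server.accept()  # blocks until the worker connects
print(conn.recv(1024).decode())  # 'Bugs: 5'
conn.close()
server.close()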
I have a file that an application updates every few seconds, and I want to extract a single numeric field from that file and record it in a list for later use. So I'd like to make an infinite loop in which the script reads the source file and, any time it notices a change in a particular figure, writes that figure to an output file.
I'm not sure why I can't get Python to notice that the source file is changing:
#!/usr/bin/python
import re
from time import gmtime, strftime, sleep

def write_data(new_datapoint):
    output_path = '/media/USBHDD/PythonStudy/torrent_data_collection/data_one.csv'
    outfile = open(output_path, 'a')
    outfile.write(new_datapoint)
    outfile.close()

forever = 0
previous_data = "0"

while forever < 1:
    input_path = '/var/lib/transmission-daemon/info/stats.json'
    infile = open(input_path, "r")
    infile.seek(0)
    contents = infile.read()
    uploaded_bytes = re.search('"uploaded-bytes":\s(\d+)', contents)
    if uploaded_bytes:
        current_time = strftime("%Y-%m-%d %X", gmtime())
        current_data = uploaded_bytes.group(1)
        if current_data != previous_data:
            write_data("," + current_time + "$" + uploaded_bytes.group(1))
            previous_data = uploaded_bytes.group(1)
        infile.close()
        sleep(5)
    else:
        print "couldn't write" + strftime("%Y-%m-%d %X", gmtime())
        infile.close
        sleep(60)
As it is now, the (messy) script writes once correctly, and then I can see that although my source file (stats.json) is changing, my script never picks up on any changes. It keeps running, but my output file doesn't grow.
I thought an open() and a close() would do the trick, and then tried throwing in a .seek(0).
What file method am I missing to ensure that Python re-opens and re-reads my source file (stats.json)?
Unless you implement some synchronization mechanism or can somehow guarantee atomic reads and writes, I think you are asking for race conditions and subtle bugs here.
Imagine the "reader" accessing the file while the "writer" hasn't completed its write cycle. There is a risk of reading incomplete/inconsistent data. On "modern" systems, you could also hit the cache and not see file modifications "live" as they happen.
I can think of two possibilities:
1. You forgot the parentheses on the close in the else branch of the infinite loop: infile.close --> infile.close()
2. The program that is changing the JSON file is not closing the file, and therefore it is not actually changing.
Two problems I see:
Are you sure your file is really updated on the filesystem? I do not know which operating system you are playing with your code on, but caching may bite you here if the file is not flushed by the producer.
Your problem may be worth solving with a pipe instead of a file, though I cannot guarantee what transmission will do if it gets stuck writing to a pipe because your consumer is dead.
To address your problem, consider using one of the following:
pyinotify
watchdog
watcher
These modules are intended to monitor changes on the filesystem and then call the proper actions. The method in your example is primitive, has a big performance penalty, and has a couple of other problems already mentioned in the other answers. A sketch with watchdog follows.
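A minimal sketch with watchdog (the watched path matches the question; the handler details are illustrative):

# A sketch: react whenever stats.json is modified, using watchdog
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class StatsChangedHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if event.src_path.endswith('stats.json'):
            print('stats.json changed; re-read it here')

observer = Observer()
observer.schedule(StatsChangedHandler(), '/var/lib/transmission-daemon/info')
observer.start()
try:
    while True:
        time.sleep(1)  # the observer thread does the work
finally:
    observer.stop()
    observer.join()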
Ilya, would it help to check (with os.path.getmtime) whether stats.json changed before you process the file?
Moreover, I'd suggest taking advantage of the fact that it's a JSON file:
import json
import os

dir_name = '/home/klaus/.config/transmission/'
# stats.json of daemon might be elsewhere
file_name = 'stats.json'
full_path = os.path.join(dir_name, file_name)

with open(full_path) as fp:
    data = json.load(fp)  # parse the file once

print data['uploaded-bytes']
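A sketch combining both suggestions, polling os.path.getmtime and re-parsing only when the timestamp changes (paths as in the question):

# A sketch: re-read stats.json only when its modification time changes
import json
import os
import time

full_path = '/var/lib/transmission-daemon/info/stats.json'
last_mtime = 0

while True:
    mtime = os.path.getmtime(full_path)
    if mtime != last_mtime:
        last_mtime = mtime
        with open(full_path) as fp:
            data = json.load(fp)
        print(data['uploaded-bytes'])
    time.sleep(5)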
Thanks for all the answers; unfortunately my error was in the shell, not in the Python script.
The cause of the problem turned out to be the way I was putting the script in the background. I was pressing Ctrl+Z, which I thought would put the task in the background. But it does not: Ctrl+Z only suspends the task and returns you to the shell; a subsequent bg command is necessary for the script to run its infinite loop in the background.
Alright, so I have a script to find, move, and rename files when given a filename to search for. I wrote a wrapper to iterate through all the folders in my Robotics folder to automate the process. Here's the code:
#! /usr/bin/env python
import os
import sys
import time
files = [ ... Big long list of filenames ... ]
for item in files:
    sys.stdout.write("Analyzing & Moving " + item + "... ")
    os.system('python mover-s.py "' + item + '"')
    sys.stdout.write("Done.\n")
print ""
print "All files analyzed, moved, and renamed."
Now, the original script takes ~2 s to execute and finish for each file; what I want the wrapper to do is display "Analyzing & Moving whatever..." and then, AFTER the script is finished, display "Done."
My issue is that both the first message and the "Done." message appear at the same time.
I've added a small pause, about 0.25 s, and the same thing happens; it just adds 0.25 s to the time it takes to display "Analyzing & Moving whatever... Done." all at once.
Basically, why won't it show my first message, pause, then display the second? Right now it displays the entire line at once. This may be because of my poor knowledge of pipes and whatnot.
Add a call to sys.stdout.flush() right after the first write().
The behaviour you're seeing has to do with buffering. See Usage of sys.stdout.flush() method for a discussion.
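Applied to the loop above, that looks like this (a sketch; only the flush call is new):

for item in files:
    sys.stdout.write("Analyzing & Moving " + item + "... ")
    sys.stdout.flush()  # force the partial line out before the blocking call
    os.system('python mover-s.py "' + item + '"')
    sys.stdout.write("Done.\n")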
There are a couple of issues here.
First, to call another Python script, there is no reason to be using os.system as you're currently doing. You should simply have a line along the lines of
import movers # can't have hyphen in a module name
movers.main()
or whatever, and let that do it.
Secondly, if you are going to move the files using Python's built-in libraries, see this SO question, which explains that you should use shutil.copyfile rather than os.system.
This will also take care of the pause issue.
....
for item in files:
    sys.stdout.write("Analyzing & Moving " + item + "... ")
    os.system('python mover-s.py "' + item + '"')  # replace this line with the import/shutil approach above
    sys.stdout.write("Done.\n")
print ""
....
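For the moving step itself, a sketch with shutil (the answer mentions shutil.copyfile; shutil.move both moves and renames, which matches the task; the destination is a placeholder):

# Inside the loop above: move/rename a file without shelling out
import shutil

shutil.move(item, '/path/to/destination/' + item)  # placeholder destination path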