I am working on a script that needs to run both from a command prompt, such as bash, and from the console in Spyder. Running from a command prompt allows the script file name to be followed by several arguments which can then be used within the script, e.g. >python script1.py dataFile.csv Results outputFile.csv. These arguments are available within the script as elements of the list sys.argv.
I've tried using subprocess.run("python script1.py dataFile.csv Results outputFile.csv") to make the console behave like the command line, but sometimes it works fine and other times it needs extra arguments, like -f between python and the file name, before it will display what the command line displays. Different computers disagree on whether such arguments help or hurt.
I've searched and searched, and found some clever ways of using technicalities of the specific operating system to tell the two cases apart, but is there something native to Python I can use?
If you import sys in the console and then evaluate sys.argv, it will show the value ['']. Running a script within Spyder expands that list to ['script1.py'] (with the file's full path), but it still never grows beyond one entry.
If, on the other hand, you run the script from the command line the way you showed above, then sys.argv will have a value of ['script1.py', 'dataFile.csv', 'Results', 'outputFile.csv']. You can use this difference to distinguish between the two cases.
What is the best difference to use? You want to distinguish between two possibilities, so an if-else pair fits best in the code. What's true in one case and false in the other? if sys.argv will not work, because in both cases the list contains at least one string, so it is truthy either way.
if len(sys.argv) > 1 works, and it has the added benefit that running from the command line without extra arguments simply falls back to whatever you coded for the console case.
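Put together, a minimal sketch (the variable names and the fallback values are only placeholders):

import sys

if len(sys.argv) > 1:
    # started from a command prompt, e.g.
    # python script1.py dataFile.csv Results outputFile.csv
    data_file, mode, output_file = sys.argv[1], sys.argv[2], sys.argv[3]
else:
    # started from the Spyder console (or without arguments):
    # fall back to values hard-coded for interactive use
    data_file, mode, output_file = 'dataFile.csv', 'Results', 'outputFile.csv'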
Related
Original question: I am running a Python script:
python script.py #runs/run_1/parameters.txt
Is there some way to access the string runs/run_1 in my script?
Actual question: I noticed that what I really need is a little different from the above. Independently of the directory from which I run the script, I need to get the location of parameters.txt.
For example, when I'm running the script from the directory runs/run_1 itself, I still need to get the path. I could do this with os.getcwd().
But when I'm running the code from PyCharm, I pass #/runs/run_1/parameters.txt as a parameter, while the script itself lives in some other directory. Here, I would need to read from sys.argv as suggested in the comments below.
For now I will have to differentiate these cases with an if-statement checking whether the string before parameters.txt in python script.py #.../parameters.txt is empty. Is there a better way?
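One way to avoid the special-casing is to normalize whatever path is passed in; a minimal sketch, assuming the path to parameters.txt arrives as the first command-line argument:

import os
import sys

# sys.argv[1] is assumed to hold the path to parameters.txt (relative or absolute)
param_path = os.path.abspath(sys.argv[1])
run_dir = os.path.dirname(param_path)  # e.g. .../runs/run_1, regardless of the current directory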
A little bit of an ugly question, but I didn't find existing SO posts which cover it.
Right now I need to use an existing Python tool available on this GitHub repository.
This is a rather big piece of code with a lot of dependencies which I don't want to mess with. In a nutshell, one can run its module by passing command line arguments, for example:
timesearch.py timesearch -r "subreddit1" -l "1466812800" -up "1498348800"
Now, I need to run this tool a bunch of times using a for loop, passing different argument values each time. The tool also prints some output to the command line when you run it, and I would like to intercept and print that from my Python script as well. Finally, before I move on in my loop and run the tool again, I need to ensure that the current execution of the timesearch tool has completed.
One side note here: I do need to ensure that timesearch is executed in the same environment I use to run my main script with the for loop.
I am trying to understand what is the best way to do it.
If I just go with this, it doesn't work:
import os
#for loop will go here
os.system('python timesearch.py timesearch -r "ethereum" -l "1466812800" -up "1498348800"')
It fails for several reasons: it doesn't use the environment in which my looping script runs, and it also doesn't capture the print output of timesearch.
Any advice on how to achieve it?
Just to highlight: I can't simply pull out the function I need from timesearch, since the package's __init__ sets up some things based on the arguments you pass.
I wouldn't call the Python script with os.system. There is basically one function you need to use: main(sys.argv[1:])
https://github.com/voussoir/timesearch/blob/master/timesearch/__init__.py#L435.
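A rough sketch of what that could look like, assuming the timesearch package is importable from your script and that its main() accepts an argument list as in the linked __init__.py (the second subreddit name is just a placeholder):

import shlex
from timesearch import main  # assumes the package is on your import path

argument_strings = [
    'timesearch -r "subreddit1" -l "1466812800" -up "1498348800"',
    'timesearch -r "subreddit2" -l "1466812800" -up "1498348800"',
]
for arg_string in argument_strings:
    # main() runs in the same interpreter (and environment) as this script,
    # blocks until that run has finished, and its print output appears here
    main(shlex.split(arg_string))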
I have a Python application which I want to use as a multi-terminal handler: I want each object to have its own terminal, separated from the rest, each running its own instance, exactly like when I run two or more separate terminals in Linux (/bin/sh or /bin/bash).
sample: (just logic not code)
first_terminal = terminalInstance()
second_terminal = terminalInstance()
first_result = first_terminal.doSomething("command")
second_result = second_terminal.doSomething("command")
I actually need each terminal to grab a stdin & stdout in a virtual environment and control them; this is why they must be separate. Is this possible within Python? I've seen a lot of code handling a single terminal, but how do you do it with multiple terminals?
PS: I don't want to include while loops (if possible), since I want to scale from dealing with 2 terminals to as many as my system can handle. Is it possible to control them by reference, giving each terminal a reference and then calling on that object and issuing a command?
The pexpect module (https://pypi.python.org/pypi/pexpect/), among others, allows you to launch programs via a pseudo-tty, which "allows your script to spawn a child application and control it as if a human were typing commands."
You can easily spawn multiple commands, each running in a separate pseudo-tty and represented by a separate object, and you can interact with each object separately. There is a lot of flexibility as to when/how you interact. You can send input to them, and read their output, either blocking or non-blocking, and incorporating timeouts and alternative outputs.
Here's a trivial session example (run bash, have it execute an "ls" command, gather the first line of output).
import pexpect
x = pexpect.spawn("/bin/bash")
x.sendline("ls")
x.expect("\n") # End of echoed command
x.expect("\n") # End of first line of output
print(x.before) # Print first line of output
Note that you'll receive all the output from the terminal, typically including an echoed copy of every character you send to it. If running something like a shell, you might also need to set the shell prompt (or determine the shell prompt in use) and use that in parsing the output (i.e. in finding the end of each command's output).
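Applied to the question's two-terminal logic, a rough sketch (doSomething is approximated by a small helper, and prompt handling is kept as simple as in the example above):

import pexpect

def run_command(terminal, command):
    # Send a command and return the first line of its output (simplified)
    terminal.sendline(command)
    terminal.expect("\n")  # end of echoed command
    terminal.expect("\n")  # end of first line of output
    return terminal.before

first_terminal = pexpect.spawn("/bin/bash")
second_terminal = pexpect.spawn("/bin/bash")

first_result = run_command(first_terminal, "ls")
second_result = run_command(second_terminal, "pwd")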
I'm thinking about how to design a program that, in response to events, checks a command file (which essentially holds key-value pairs where the key is a command and the value is the code to execute) and runs the matching code for that command.
It'll be run on a unix/linux machine.
For simplicity's sake, the program will be as follows:
It'll wait for the user's input. When the user inputs a valid command (i.e. a command that appears in the commands file), the matching code will be executed. If it doesn't match any command, it'll print "No matching command".
So if my command file looks like:
run1='a.py'
run2='b.py'
and I enter "run1" then a.py will be executed. If I enter "run3" then "No matching command" will be printed.
I want to implement this in C++. I've seen similar implementations where people used system() to execute the commands, but that feels like a bad way to achieve this.
What other options do I have to achieve this?
P.S. - I wrote in my example that the code being run is in Python. I'm not sure I want that to be the only option, so I'll need to be able to identify the type; let's assume I can do that.
Is this still achievable?
There are LOTS of options. system("a.py"); will do what you want (assuming python is installed correctly on the system you are running on). Whether that is the "best" solution really depends on what you want to achieve, and that isn't entirely clear from your question.
Most other solutions will be more or less system specific. You could, for example, use fork() and one of the flavours of exec() on Unix/Linux [with python as the actual executable and "a.py" as the file to run], but that won't work on Windows, where you would have to use, for example, spawn() (again with python as the executable and "a.py" as the code to run).
I have the situation where I am doing some computation in Python, and based on the outcomes I have a list of target files that are candidates to be passed to a 2nd program.
For example, I have 50,000 files which contain ~2000 items each. I want to filter for certain items and call a command line program to do some calculation on some of those.
This Program #2 can be used via the shell command line, but also requires a lengthy set of arguments. For performance reasons I would have to run Program #2 on a cluster.
Right now, I am running Program #2 via
subprocess.call("...", shell=True)
But I'd like to run it via qsub in future.
I don't have much experience with how exactly this could be done in a reasonably efficient manner.
Would it make sense to write temporary 'qsub' files and run them via subprocess() directly from the Python script? Is there a better, maybe more pythonic solution?
Any ideas and suggestions are very welcome!
It makes perfect sense, although I would go for another solution.
As far as I understand, you have programme #1 that determines which of your 50,000 files needs to be computed by programme #2.
Both programme #1 and #2 are written in Python. Excellent choice.
Incidentally, I have a Python module that might come in handy: https://gist.github.com/stefanedwards/8841307
If you are running the same qsub system as I am (no idea what ours is called), you cannot use command arguments on the submitted scripts. Instead, any options are submitted via the -v option, which puts them into environment variables, e.g.:
[me@local ~] $ python isprime.py 1
1: True
[me@local ~] $ head -n 5 isprime.py
#!/usr/bin/python
### This is a python script ...
import os
os.chdir(os.environ.get('PBS_O_WORKDIR','.'))
[me@local ~] $ qsub -v isprime='1 2 3' isprime.py
123456.cluster.control.com
[me@local ~]
Here, isprime.py could handle command line arguments using argparse. Then you just need to check whether the script is running as a submitted job, and then retrieve said arguments from the environment variables (os.environ).
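A minimal sketch of that check, assuming a PBS-style batch system that sets PBS_O_WORKDIR for submitted jobs and that the options were passed as qsub -v isprime='1 2 3' (the argument definition is only illustrative):

import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('numbers', nargs='*', type=int)

if 'PBS_O_WORKDIR' in os.environ:
    # running as a submitted job: arguments arrive via environment variables
    os.chdir(os.environ['PBS_O_WORKDIR'])
    args = parser.parse_args(os.environ.get('isprime', '').split())
else:
    # running interactively / from the command line
    args = parser.parse_args()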
When programme #2 is modified to be run on the cluster, programme #1 can submit jobs by using subprocess.call(['qsub','-v options=...','programme2.py'], shell=False)
Another approach would be to queue all the files in a database (say, an SQLite database). Then you could have programme #1 check all non-processed entries in the database, determine the outcome (run, not run, run with special options).
You now have the opportunity to run programme #2 in parallel on the cluster, where it simply checks the database for files to analyse.
Edit: When Programme #2 is an executable
Instead of a python script, we use a bash script that takes environment variables and puts them on a command line for the programme:
#!/bin/bash
cd .
# put options into context/flags etc.
if [ -n "$option1" ]; then _opt1="--opt1 $option1"; fi
# we can even define our own defaults
_opt2='--no-verbose'
if [ -n "$opt2" ]; then _opt2="-o $opt2"; fi
/path/to/exe $_opt1 $_opt2
If you are going for the database solution, then have a Python script that checks the database for unprocessed files and marks a file as being processed (do these two in a single transaction), gets the options, calls the executable with subprocess, marks the file as done when finished, checks for a new file, etc.
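A rough sketch of such a worker, assuming a hypothetical SQLite table files(path, status) with status values 'new', 'running' and 'done' (a real multi-worker setup would need stricter locking than shown here):

import sqlite3
import subprocess

db = sqlite3.connect('queue.sqlite')

while True:
    with db:  # claim one unprocessed file and mark it in a single transaction
        row = db.execute(
            "SELECT rowid, path FROM files WHERE status = 'new' LIMIT 1").fetchone()
        if row is None:
            break
        rowid, path = row
        db.execute("UPDATE files SET status = 'running' WHERE rowid = ?", (rowid,))
    subprocess.call(['/path/to/exe', '--opt1', path])  # options as appropriate
    with db:
        db.execute("UPDATE files SET status = 'done' WHERE rowid = ?", (rowid,))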
You have obviously built yourself a string cmd containing a command that you could enter in a shell to run the 2nd program. You are currently using subprocess.call(cmd, shell=True) to execute the 2nd program from a Python script (it then runs in a process on the same machine as the calling script).
I understand that you are asking how to submit a job to a cluster so that this 2nd program is run on the cluster instead of the calling machine. Well, this is pretty easy and the method is independent of Python, so there is no 'pythonic' solution, just an obvious one :-) : replace your current cmd with a command that defers the heavy work to the cluster.
First of all, dig into the documentation of your cluster's qsub command (the underlying batch system might be SGE or LSF, or whatever; you need to get the corresponding docs) and try to find the shell command line that properly submits an example job of yours to the cluster. It might look as simple as qsub ...args... cmd, where cmd here is the content of the original cmd string. I assume that you now have the entire qsub command needed; let's call it qsubcmd (you have to come up with that on your own, we can't help there). Now all you need to do in your original Python script is to call
subprocess.call(qsubcmd, shell=True)
instead of
subprocess.call(cmd, shell=True)
Note that qsub likely only works on very few machines, typically known as your cluster 'head node(s)'. This means that your Python script that wants to submit these jobs should run on this machine (if that is not possible, you need to add an ssh login procedure to the submission process that we don't want to discuss here).
Please also note that, if you have the time, you should look into the shell=True implications of your subprocess usage. If you can circumvent shell=True, this will be the more secure solution. This might however not be an issue in your environment.
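As a purely hypothetical illustration (the exact qsub options depend on your batch system; -b y and -cwd are SGE-specific, and the command itself is a placeholder):

import subprocess

cmd = '/path/to/program2 --lots --of --arguments input.file'  # your original command (placeholder)
qsubcmd = 'qsub -b y -cwd ' + cmd  # SGE-style submission of a binary command; adjust for your scheduler
subprocess.call(qsubcmd, shell=True)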