I have 4 linux EC2 instances created from the same AMI that I use to process files in S3.
I run the same python script on each instance. It takes a directory of files in S3 to process and a number telling it which files it is supposed to process.
Say mydir contains myfile1 ... myfile8.
On instance 0 I call:
python process.py mydir 0
This causes it to process myfile1 and myfile5.
On instance 1 I call:
python process.py mydir 1
This causes it to process myfile2 and myfile6.
And so on.
Inside the script I do:
keys = keys[pid::4] where pid is the argument from the command line.
I redistribute changes to my python script by syncing from S3.
Is there a simple way to automate this further?
I would like to press one button and say dir=yourdir and have it sync code from s3 and run on each instance.
You can try using Fabric.
Example taken from Fabric documentation:
from fabric import Connection
result = Connection('web1.example.com').run('uname -s', hide=True)
msg = "Ran {0.command!r} on {0.connection.host}, got stdout:\n{0.stdout}"
print(msg.format(result))
# Output:
# Ran 'uname -s' on web1.example.com, got stdout:
# Linux
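To push this further for the setup in the question, you could loop over the instances, sync the latest code from S3, and launch process.py with each instance's index. A minimal sketch, where the hostnames, SSH user, and S3 bucket path are placeholders to adjust for your environment:

from fabric import Connection

hosts = ['ec2-host-0', 'ec2-host-1', 'ec2-host-2', 'ec2-host-3']  # hypothetical hostnames
s3_dir = 'mydir'

for pid, host in enumerate(hosts):
    conn = Connection(host, user='ec2-user')  # hypothetical SSH user
    # Sync the latest copy of the code from S3 (assumes the AWS CLI is configured on the instance).
    conn.run('aws s3 sync s3://my-bucket/code/ ~/code/')
    # Start processing this instance's shard of the files.
    conn.run('python ~/code/process.py {} {}'.format(s3_dir, pid))

Each run() call blocks until the remote command finishes; for long-running jobs you could wrap the command in nohup ... & or drive the connections from separate threads.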
I have a file structure that looks something like this:
Master:
    First
        train.py
        other1.py
    Second
        train.py
        other2.py
    Third
        train.py
        other3.py
I want to be able to have one Python script that lives in the Master directory that will do the following when executed:
Loop through all the subdirectories (and their subdirectories if they exist)
Run every Python script named train.py in each of them, in whatever order necessary
I know how to execute a given python script from another file (given its name), but I want to create a script that will execute whatever train.py scripts it encounters. Because the train.py scripts are subject to being moved around and being duplicated/deleted, I want to create an adaptable script that will run all those that it finds.
How can I do this?
You can use os.walk to recursively collect all train.py scripts and then run them in parallel using ProcessPoolExecutor and the subprocess module.
import os
import subprocess
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def list_python_scripts(root):
    """Recursively find all 'train.py' scripts under the given directory."""
    scripts = []
    for dirpath, _, filenames in os.walk(root):
        scripts.extend([
            os.path.join(dirpath, filename) for filename in filenames
            if filename == 'train.py'
        ])
    return scripts

def main():
    # Make sure to change the argument here to the directory you want to scan.
    scripts = list_python_scripts('Master')
    # Capture each script's output so it can be printed afterwards.
    run = partial(subprocess.run, capture_output=True, text=True)
    with ProcessPoolExecutor(max_workers=len(scripts)) as pool:
        # Run each script in parallel and collect the CompletedProcess results.
        results = pool.map(run, [['python', script] for script in scripts])
        for result in results:
            print(result.returncode, result.stdout)

if __name__ == '__main__':
    main()
Which OS are you using?
If Ubuntu/CentOS, try this combination:

import subprocess
# Put this in Master; find lists every train.py in Master and its subdirectories.
train_scripts = subprocess.check_output(
    "find . -type f -name train.py", shell=True, universal_newlines=True).split()
# Next, execute them one by one.
for script in train_scripts:
    subprocess.call(["python", script])
If you are using Windows, you could try running them from a PowerShell script. You can run two Python scripts from one script with just this:
python Test1.py
python Folder/Test1.py
Then add a loop and/or a function that goes searching for the files. Because it's Windows PowerShell, you have a lot of power when it comes to the filesystem and controlling Windows in general.
I am trying to run external sample.py script in /path-to-scollector/collectors/0 folder from scollector.
scollector.toml:
Host = "localhost:0"
ColDir="//path-to-scollector//collectors//"
BatchSize=500
DisableSelf=true
command to run scollector:
scollector-windows-amd64.exe -conf scollector.toml -p
But I am not getting the sample.py metrics in the output. It is expected to run continuously and print output to the console. Also, when I am running:
scollector-windows-amd64.exe -conf scollector.toml -l
my external collector is not listed.
In your scollector.toml, you should add one line as below:
Filter = ["sample.py"]
In your sample.py, you need this line:
#!/usr/bin/python
For running scollector on a Linux machine, the above solution works well. But with Windows it is a bit tricky, since scollector running on Windows can only identify batch files, so we need to do a little extra work.
Create an external collector:
It can be written in any language (Python, Java, etc.). It contains the main code to get the data and print it to the console.
Example: my_external_collector.py
Create a wrapper batch script:
wrapper_external_collector.bat
Trigger my_external_collector.py inside wrapper_external_collector.bat:
python path_to_external/my_external_collector.py
You can pass arguments to the script as well. The only disadvantage is that we need to maintain two scripts.
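For reference, my_external_collector.py only needs to print metric lines to stdout. A minimal sketch, assuming the tcollector-style "metric timestamp value tag=value" line format that scollector reads from external collectors (the metric name and value here are made up):

#!/usr/bin/python
import sys
import time

while True:
    timestamp = int(time.time())
    value = 42  # replace with the real measurement
    # scollector parses one data point per line from the collector's stdout.
    print("my.custom.metric %d %d host=myhost" % (timestamp, value))
    sys.stdout.flush()  # flush so scollector sees the line immediately
    time.sleep(15)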
I'm running an hourly cron job for testing. This job runs a Python file called "rotateLogs". Cron can't use extensions, so the first line of the file is #!/usr/bin/python. This Python file (fileA) then calls another Python file (fileB) elsewhere on the computer. fileB logs out to a log file with a timestamp, etc. However, when fileB is run through fileA as a cron job, it creates its log files as rw-r--r-- files.
The problem is that if I then try to log to those files from fileB, it can't write to them unless it is run with sudo. So I am looking for some way to deal with this. Ideally, it would be nice to simply create the files as rw-rw-r--, but I don't know how to do that with cron. Thank you for any help.
EDIT: rotateLogs(intentionally not .py):
#!/usr/bin/python
#rotateLogs
#Calls the rotateLog function in the Communote scripts folder
#Designed to be run as a daily log rotation cron job
import sys,logging
sys.path.append('/home/graeme/Communote/scripts')
import localLogging
localLogging.localLog("Hourly log",logging.error)
print "Hello"
There is no command in crontab, but it is running properly on the hourly cron(at 17 minutes past the hour).
FileB's relevant function:
def localLog(strToLog, severityLevel):
    # Allows other scripts to log easily
    # Takes their log request and appends it to the log file
    logging.basicConfig(filename=logDirPath + getFileName(currDate), format="%(asctime)s %(message)s")
    # Logs strToLog, such as logging.warning(strToLog)
    severityLevel(strToLog)
    return
I'm not sure how to find the user/group of the cronjob, but it's simply in /etc/cron.hourly, which I think is root?
It turns out that cron does not source any shell profiles (/etc/profile, ~/.bashrc), so the umask has to be set in the script that is being called by cron.
When using user-level crontabs (crontab -e), the umask can be simply set as follows:
0 * * * * umask 002; /path/to/script
This will work even if it is a Python script, since the Python process inherits its umask from the shell that invoked it.
However, when placing a Python script in /etc/cron.hourly etc., there is no way to set the umask except in the Python script itself:
import os
os.umask(002)
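Applied to the rotateLogs script from the question, the umask call just needs to happen before the first log write, for example:

#!/usr/bin/python
import os, sys, logging
os.umask(002)  # new files default to rw-rw-r--
sys.path.append('/home/graeme/Communote/scripts')
import localLogging
localLogging.localLog("Hourly log", logging.error)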
I have a perl script that sorts files from one incoming directory into other directories on a Ubuntu server.
As it is now, I'm running it as a cron job every few minutes, but it can cause problems if the script starts while a file is still being written to the incoming dir.
A better solution would be to start it when a file is written to the incoming dir or any sub dirs.
I'm thinking I could run another script as a service that calls my sorting script whenever a directory change occurs, but I have no idea how to go about doing it.
On Linux you can use the pyinotify library: https://github.com/seb-m/pyinotify
For watching subdirectories, use rec=True in the add_watch() invocation. Here is a complete example monitoring the /tmp directory and its subdirectories for file creation:
import pyinotify

class EventHandler(pyinotify.ProcessEvent):
    def process_IN_CREATE(self, event):
        # Processing of created file goes here.
        print "Created:", event.pathname

wm = pyinotify.WatchManager()
notifier = pyinotify.Notifier(wm, EventHandler())
wm.add_watch('/tmp', pyinotify.IN_CREATE, rec=True)
notifier.loop()
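For the sorting use case in the question, the handler could simply shell out to the Perl script whenever a new file appears. A sketch along the same lines, where the sorter path and incoming directory are placeholders; it uses IN_CLOSE_WRITE instead of IN_CREATE so the sorter only runs once the writer has finished, which also avoids the half-written-file problem:

import subprocess
import pyinotify

class SortEventHandler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        # Run the existing Perl sorter once the new file has been fully written.
        subprocess.call(['perl', '/path/to/sorter.pl', event.pathname])

wm = pyinotify.WatchManager()
notifier = pyinotify.Notifier(wm, SortEventHandler())
wm.add_watch('/path/to/incoming', pyinotify.IN_CLOSE_WRITE, rec=True)
notifier.loop()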
I'm trying to copy thousands of files to a remote server. These files are generated in real time within the script. I'm working on a Windows system and need to copy the files to a Linux server (hence the escaping).
I currently have:
import os
os.system("winscp.exe /console /command \"option batch on\" \"option confirm off\" \"open user:pass#host\" \"put f1.txt /remote/dest/\"")
I'm using Python to generate the files, but I need a way to persist the remote connection so that I can copy each file to the server as it is generated (as opposed to creating a new connection each time). That way, I'll only need to change the filename in the put command, thus:
"put f2 /remote/dest"
"put f3 /remote/dest"
etc.
I needed to do this and found that code similar to this worked well:
from subprocess import Popen, PIPE

WINSCP = r'c:\<path to>\winscp.com'

class UploadFailed(Exception):
    pass

def upload_files(host, user, passwd, files):
    cmds = ['option batch abort', 'option confirm off']
    cmds.append('open sftp://{user}:{passwd}@{host}/'.format(host=host, user=user, passwd=passwd))
    cmds.append('put {} ./'.format(' '.join(files)))
    cmds.append('exit\n')
    with Popen(WINSCP, stdin=PIPE, stdout=PIPE, stderr=PIPE,
               universal_newlines=True) as winscp:  # might need shell=True here
        stdout, stderr = winscp.communicate('\n'.join(cmds))
    if winscp.returncode:
        # WinSCP returns 0 for success, so a non-zero code means the upload failed
        raise UploadFailed
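For example, a call might look like this (with made-up values):

upload_files('host.example.com', 'user', 'secret', ['f1.txt', 'f2.txt'])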
This is simplified (and using Python 3), but you get the idea.
Instead of using an external program (WinSCP), you could also use a Python SSH library like pyssh.
You would have to start a persistent WinSCP sub-process in Python and feed the put commands to its standard input continuously.
I do not have Python example for this, but there's an equivalent JScript example:
https://winscp.net/eng/docs/guide_automation_advanced#inout
or C# example:
https://winscp.net/eng/docs/guide_dotnet#input
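A rough, untested Python sketch of the same idea, keeping one WinSCP console open and writing a put command to its standard input for each generated file (the file list, paths, and credentials below are placeholders):

from subprocess import Popen, PIPE

# Keep a single WinSCP console open and feed it commands over stdin.
winscp = Popen([r'c:\<path to>\winscp.com', '/console'], stdin=PIPE,
               universal_newlines=True)
winscp.stdin.write('option batch abort\n')
winscp.stdin.write('option confirm off\n')
winscp.stdin.write('open sftp://user:pass@host/\n')

for name in ['f1.txt', 'f2.txt', 'f3.txt']:  # in reality, issue these as each file is generated
    winscp.stdin.write('put {} /remote/dest/\n'.format(name))
    winscp.stdin.flush()  # push each command through as soon as the file exists

winscp.stdin.write('exit\n')
winscp.stdin.close()
winscp.wait()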
Though using the WinSCP .NET assembly via its COM interface from Python would be way easier:
https://winscp.net/eng/docs/library