Python Azure Batch - Permission Denied on Linux node

I am running a Python script on several Linux nodes (after creating a pool) using Azure Batch. Each node runs Ubuntu 14.04.5-LTS.
In the script, I upload several files to each node and then run several tasks on each of these nodes. But I get a "Permission Denied" error when I try to execute the first task. The task is actually unzipping a few files (fyi, the upload of these zip files went well).
This script was running fine until a few weeks ago. I suspect an Ubuntu version update, but maybe it's something else.
Here is the error I get:
error: cannot open zipfile [ /mnt/batch/tasks/shared/01-AXAIS_HPC.zip ]
Permission denied
unzip: cannot find or open /mnt/batch/tasks/shared/01-AXAIS_HPC.zip,
Here is the main part of the code:
credentials = batchauth.SharedKeyCredentials(_BATCH_ACCOUNT_NAME, _BATCH_ACCOUNT_KEY)
batch_client = batch.BatchServiceClient(
    credentials,
    base_url=_BATCH_ACCOUNT_URL)

create_pool(batch_client,
            _POOL_ID,
            application_files,
            _NODE_OS_DISTRO,
            _NODE_OS_VERSION)

helpers.create_job(batch_client, _JOB_ID, _POOL_ID)

add_tasks(batch_client,
          _JOB_ID,
          input_files,
          output_container_name,
          output_container_sas_token)
with add_tasks:
def add_tasks(batch_service_client, job_id, input_files,
              output_container_name, output_container_sas_token):
    print('Adding {} tasks to job [{}]...'.format(len(input_files), job_id))
    tasks = list()
    for idx, input_file in enumerate(input_files):
        command = ['unzip -q $AZ_BATCH_NODE_SHARED_DIR/01-AXAIS_HPC.zip -d $AZ_BATCH_NODE_SHARED_DIR',
                   'chmod a+x $AZ_BATCH_NODE_SHARED_DIR/01-AXAIS_HPC/00-EXE/linux/*',
                   'PATH=$PATH:$AZ_BATCH_NODE_SHARED_DIR/01-AXAIS_HPC/00-EXE/linux',
                   'unzip -q $AZ_BATCH_TASK_WORKING_DIR/'
                   '{} -d $AZ_BATCH_TASK_WORKING_DIR/{}'.format(input_file.file_path, idx + 1),
                   'Rscript $AZ_BATCH_NODE_SHARED_DIR/01-AXAIS_HPC/03-MAIN.R $AZ_BATCH_TASK_WORKING_DIR $AZ_BATCH_NODE_SHARED_DIR/01-AXAIS_HPC $AZ_BATCH_TASK_WORKING_DIR/'
                   '{} {}'.format(idx + 1, idx + 1),
                   'python $AZ_BATCH_NODE_SHARED_DIR/01-IMPORT_FILES.py '
                   '--storageaccount {} --storagecontainer {} --sastoken "{}"'.format(
                       _STORAGE_ACCOUNT_NAME,
                       output_container_name,
                       output_container_sas_token)]
        tasks.append(batchmodels.TaskAddParameter(
            'Task{}'.format(idx),
            helpers.wrap_commands_in_shell('linux', command),
            resource_files=[input_file]))

    Split = lambda tasks, n=100: [tasks[i:i + n] for i in range(0, len(tasks), n)]
    SPtasks = Split(tasks)
    for i in range(len(SPtasks)):
        batch_service_client.task.add_collection(job_id, SPtasks[i])
Do you have any insights that could help me with this issue? Thank you very much.
Robin

Looking at the error, i.e.:
error: cannot open zipfile [ /mnt/batch/tasks/shared/01-AXAIS_HPC.zip ]
Permission denied
unzip: cannot find or open /mnt/batch/tasks/shared/01-AXAIS_HPC.zip,
This suggests that either the file is not present at the shared directory location, or it does not have the correct permissions. The former is more likely.
Is there any particular reason you are using the shared directory approach? Also, how are you uploading the file? (i.e. check that your async/await usage is correct, so that no greedy process runs your task before the shared directory content is available on the node.)
Side note: you own the node, so you can RDP/SSH into it and verify that the shared directory files are actually present.
A few things to ask: how exactly are you uploading these zip files?
Also, if I may ask: what is the design/user scenario here, and how exactly do you intend to use this?
Recommendation:
There are a few other ways to use zip files on an Azure node, such as via a resource file or via an application package. (The application package route may be better suited to dealing with *.zip files; a rough sketch follows the links below.) I have added a few documents and places where you can look at sample implementations and guidance for this.
I think the material and samples below are a good place to start. Hope they help. :)
I would also recommend recreating your pool if it is old; that ensures your nodes are running the latest version.
Azure Batch learning path
Azure Batch API basics
Samples & demos: link, or look here
A detailed walkthrough, depending on whether you are using CloudServiceConfiguration or VirtualMachineConfiguration: link.
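To make the application package route concrete, here is a rough sketch in the spirit of the code above. The application id axais_hpc and version 1.0 are made up for the example, and the package must already be uploaded and activated in the Batch account; the model and parameter names are from the classic azure-batch Python SDK, so check them against your SDK version.

import azure.batch.models as batchmodels

# Reference the (hypothetical) pre-uploaded application package at pool
# creation time; Batch downloads and extracts it on every node, so the
# manual upload + unzip steps disappear from the tasks.
new_pool = batchmodels.PoolAddParameter(
    id=_POOL_ID,
    vm_size='STANDARD_A1',
    application_package_references=[
        batchmodels.ApplicationPackageReference(
            application_id='axais_hpc',
            version='1.0')
    ],
    # ... plus virtual_machine_configuration, target_dedicated_nodes, etc.
)
batch_client.pool.add(new_pool)

# Tasks then reference the extracted content through an environment variable
# of the form AZ_BATCH_APP_PACKAGE_<id>_<version> (the exact format depends
# on the node OS -- check the application packages documentation).
command = 'Rscript $AZ_BATCH_APP_PACKAGE_axais_hpc_1_0/03-MAIN.R ...'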

Related

How can I change the logfile naming for the cluster scheduler in snakemake?

I am running jobs on a PBS TORQUE cluster and want to customize the log scripts for rules repeated across many files.
The default naming scheme is, for each script of each rule, snakejob.{rulename}.{id}.sh.o26730731, e.g. snakejob.all.7.sh.o26730731, where only the ending varies between files (as they are executed one after another). This comes from the script Snakemake creates for submission to the cluster.
I can specify a common log-directory to qsub using the -e or -o options.
I know that profiles exist, or that one could use wildcards, something like (I still have to test that):
snakemake --jobs 10 --cluster "qsub -o logs/{wildcards.file} -e logs/{wildcards.file}"
Alternatively, the naming of the script that Snakemake temporarily saves under .snakemake/tmp<hash> could be altered to achieve unique log names per file.
I tried to set the log directory in the rule, but this did not work when I specified a directory (without a .log suffix):
rule target:
    input:
        # mockfile approach: https://stackoverflow.com/a/53751654/9684872
        # replace? https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#directories-as-outputs
        file = expand(os.path.join(config['DATADIR'], "{file}", "{file}.txt"), file=FILES)

rule execute:
    log:
        # dir = os.path.join(config['DATADIR'], "{file}")  # building the DAG gets stuck in an endless loop
        dir = os.path.join(config['DATADIR'], "{file}.log")  # works
    params:
        logdir = os.path.join(config['DATADIR'], "{file}")  # works
So what is your approach, or how would you suggest solving this best, so that logs are identified by the {file} wildcard?
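For concreteness, the combination I have in mind (still untested) would expose the per-file directory as a param and reference it in the cluster command, since Snakemake formats the --cluster string with each job's properties. The shell command run_analysis is a placeholder:

rule execute:
    output:
        os.path.join(config['DATADIR'], "{file}", "{file}.txt")
    params:
        logdir = os.path.join(config['DATADIR'], "{file}")
    shell:
        "run_analysis > {output}"  # placeholder command

# qsub does not create missing -o/-e directories, so they must exist first:
#   snakemake --jobs 10 --cluster "qsub -o {params.logdir} -e {params.logdir}"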

How to run Open Pose binary (.exe) from within a Python script?

I am making a body tracking application where I want to run Open Pose if the user chooses to track their body movements. The OpenPose binary file can be run like so:
bin\OpenPoseDemo.exe --write_json 'path\to\dump\output'
So, in my Python script, I want a line of code that runs OpenPose, instead of having to ask the user to manually run it from a separate command-line window. For that, I have tried:
import os
os.popen(r"C:\path\to\bin\OpenPoseDemo.exe --write_json 'C:\path\to\dump\output'")
But this gives the following error:
Error:
Could not create directory: 'C:\Users\Admin\Documents\Openpose\. Status error = -1. Does the parent folder exist and/or do you have writing access to that path?
Which I guess means that OpenPose can only be run from inside the openpose directory, where the bin subdirectory resides. So, I wrote a shell script containing this line:
bin\OpenPoseDemo.exe --write_json 'C:\path\to\dump\output'
and saved it as run_openpose_binary.sh in the openpose directory (i.e., the same directory where bin is located).
I then tried to run this shell script from within my Python script like so:
import subprocess
subprocess.call(['sh', r'C:\path\to\openpose\run_openpose_binary.sh'])
and this gives the following error:
FileNotFoundError: [WinError 2] The system cannot find the file specified
I also tried the following:
os.popen(r"C:\path\to\openpose\run_openpose_binary.sh")
and
os.system(r"C:\path\to\openpose\run_openpose_binary.sh")
These do not produce any error; they just pop up a blank window that immediately closes.
So, my question is, how do I run the OpenPoseDemo.exe from within my Python script?
For your last method, you're missing the return value from os.popen, which is a pipe. So, what you need is something like:
# untested as I don't have access to a Windows system
import os

with os.popen(r"/full/path/to/sh C:/path/to/openpose/run_openpose_binary.sh") as p:
    # pipes work like files
    output_of_command = p.read().strip()  # this is a string
or, if you want to future-proof yourself, the alternative is:
# untested as I don't have access to a Windows system
import subprocess

popen = subprocess.Popen(
    [r'/full/path/to/sh.exe', r'/full/path/to/run_openpose_binary.sh'],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, encoding='utf-8')
stdout, stderr = popen.communicate(input='')  # stderr is None here: it was not piped
Leave a comment if you have further difficulty.
I've had to fight this battle several times and I've found a solution. It's likely not the most elegant solution but it does work, and I'll explain it using an example of how to run OpenPose on a video.
You've got your path to the OpenPose download and your path to the video, and from there it's a three-line solution. First, change the current working directory to that openpose folder, then build your command, then call subprocess.run. (I tried subprocess.call and that did not work. I did not try shell=False, but I have heard it's a safer way; I'll leave that up to you.)
import os
import subprocess
openpose_path = "C:\\Users\\me\\Desktop\\openpose-1.7.0-binaries-win64-gpu-python3.7-flir-3d_recommended\\openpose\\"
video_path = "C:\\Users\\me\\Desktop\\myvideo.mp4"
os.chdir(openpose_path)
command = "".join(["bin\\OpenPoseDemo.exe", " -video ", video_path])
subprocess.run(command, shell=True)
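A small variation on the same idea, in case you prefer not to change the interpreter's own working directory: subprocess.run accepts a cwd argument that applies the directory change only to the child process. This sketch reuses the same paths as above:

import os
import subprocess

openpose_path = "C:\\Users\\me\\Desktop\\openpose-1.7.0-binaries-win64-gpu-python3.7-flir-3d_recommended\\openpose\\"
video_path = "C:\\Users\\me\\Desktop\\myvideo.mp4"

# Give the executable an absolute path, and use cwd so OpenPose resolves its
# relative resources (e.g. the models folder) against the openpose folder,
# without touching this script's working directory.
exe = os.path.join(openpose_path, "bin", "OpenPoseDemo.exe")
subprocess.run([exe, "-video", video_path], cwd=openpose_path)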

How to write check_mk manual checks

I have a Python script which checks for processes:
import subprocess

s = subprocess.check_output('tasklist', shell=True)
if "cmd.exe" in s:
    if "java.exe" not in str(s):
        print "selenium server is not up"
    if "FreeSSHDService.exe" not in str(s):
        print "SSH is not up"
    else:
        print "Everything is awesome"
I want to add a check to the check_mk dashboard. What are the steps to add this check, and where do I have to put this script? Here is my attempt with return codes:
import subprocess

def check_processes():
    # returns a Nagios-style (status, message) tuple
    s = subprocess.check_output('tasklist', shell=True)
    if "cmd.exe" in str(s):
        if "java.exe" not in str(s):
            return 2, "selenium server is not up"
        if "FreeSSHDService.exe" not in str(s):
            return 2, "SSH is not up"
    return 0, "Everything is awesome"
First of all, I'm assuming the node you want to check is Windows based, in which case I cannot help you much, because my expertise is UNIX and Linux.
This web link will help you check your Windows-based nodes, especially paragraph 10, Extending the Windows agent.
On Linux, once check_mk_agent is installed, there are three ways, depending on how deep you want to get into the check_mk guts. I think the same methods exist on Windows.
As a local check: copy your Python code into the agent's local folder, wherever that is located on Windows, and edit the [global] section of the check_mk.ini configuration file so that the py and pyc file name extensions are run (see the output-format sketch after this list).
As an MRPE check: make your Python program print its output in the Nagios check output format, and edit the [mrpe] section of the check_mk.ini configuration file according to the notes in paragraph 10.2. The disadvantage is that the WARNING and CRITICAL values/ranges are fixed in the check_mk.ini file; they cannot be changed in WATO.
As a check_mk agent check: turn your Python program into a check_mk agent check. I think this is the most difficult way, because each agent check needs a counterpart definition/declaration/inventory file on the check_mk server in order to be used in WATO and to have its parameters configured. I never wrote one, but if you are keen, you should read these guidelines.
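To illustrate the local check route: a local check simply prints one line per service, in the form "<status> <service_name> <perfdata> <status text>", with - standing for empty perfdata. Your script above could be reshaped like this rough sketch (the service name win_processes is made up for the example):

import subprocess

# check_mk local check output: "<status> <name> <perfdata> <text>"
# status: 0 = OK, 1 = WARN, 2 = CRIT; "-" means no performance data
s = str(subprocess.check_output('tasklist', shell=True))
if "java.exe" not in s:
    print("2 win_processes - selenium server is not up")
elif "FreeSSHDService.exe" not in s:
    print("2 win_processes - SSH is not up")
else:
    print("0 win_processes - Everything is awesome")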
Best regards.
If you want to execute such a script, you just need to put it (with the correct permissions, chmod 755) in the ~/local/lib/nagios/plugins directory.
Then you have to create a rule under "Host and service parameters -> Active checks -> Classical active & passive checks".
Once that is done, you enter the command line "python ~/local/lib/nagios/plugins/nameofyourscript.py".
I am not sure about the output format though; I am still working that out for Python scripts.

Ida Pro 6.9 crash

I am writing a plugin for IDA (in Python) that uses the etcd remote key-value store. My problem is that when I attempt to acquire a lock on the server, like so:

lock = etcd.Lock(self.client, 'ida_lock')

# should time out after 30 seconds -- hopefully that is enough
lock.acquire(blocking=True, lock_ttl=None, timeout=30)

if lock.is_acquired:
    data, idc_file = self.get_idc_data()
    if os.path.isfile('expendable.idc'):
        self.client.write('/fREd/' + self.md5 + '/all/', idc_file,
                          prevValue=open('expendable.idc', 'r').readlines())
    else:
        self.client.write('/fREd/' + self.md5 + '/all/', idc_file)
    lock.release()

IDA freezes, and I was wondering if anyone has any insight into why this is happening or how to fix it.
For reference, the method that includes this code is called via a keyboard shortcut:
idaapi.add_hotkey('Ctrl-.', self.push_data)
and there is no doubt that it is the lock that causes the problem.
You can look at the python-etcd source at https://github.com/jplana/python-etcd
There are keys that already exist under the directory /_locks/ida_lock.
To list the files under /_locks/ida_lock:
etcdctl ls /_locks/ida_lock
To rescue yourself from this, run:
etcdctl rm /_locks/ida_lock --dir --recursive
To avoid this situation, you should call lock.release() in a finally block: if you don't release, a key remains under /_locks/ida_lock.
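A minimal sketch of that pattern, based on the snippet from the question:

lock = etcd.Lock(self.client, 'ida_lock')
lock.acquire(blocking=True, lock_ttl=None, timeout=30)
try:
    if lock.is_acquired:
        data, idc_file = self.get_idc_data()
        self.client.write('/fREd/' + self.md5 + '/all/', idc_file)
finally:
    # always release, so no stale key is left under /_locks/ida_lock
    lock.release()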
Furthermore, you can add some logging configuration (which you can reference here) to dig deeper when dealing with this kind of problem.

How do I queue FTP commands in Twisted?

I'm writing an FTP client using Twisted that downloads a lot of files and I'm trying to do it pretty intelligently. However, I've been having the problem that I'll download several files very quickly (sometimes ~20 per batch, sometimes ~250) and then the downloading will hang, only to eventually have connections time out and then the download and hang start all over again. I'm using a DeferredSemaphore to only download 3 files at a time, but I now suspect that this is probably not the right way to avoid throttling the server.
Here is the code in question:
def downloadFiles(self, result, directory):
    # make download directory if it doesn't already exist
    if not os.path.exists(directory['filename']):
        os.makedirs(directory['filename'])
    log.msg("Downloading files in %r..." % directory['filename'])
    files = filterFiles(None, self.fileListProtocol)
    # from http://stackoverflow.com/questions/2861858/queue-remote-calls-to-a-python-twisted-perspective-broker/2862440#2862440
    # use a DeferredSemaphore to limit the number of files downloaded simultaneously from the directory to 3
    sem = DeferredSemaphore(3)
    jobs = [sem.run(self.downloadFile, f, directory) for f in files]
    d = gatherResults(jobs)
    return d

def downloadFile(self, f, directory):
    filename = os.path.join(directory['filename'], f['filename']).encode('ascii')
    log.msg('Downloading %r...' % filename)
    d = self.ftpClient.retrieveFile(filename, FTPFile(filename))
    return d
You'll notice that I'm reusing an FTP connection (an active one, by the way) and using my own FTPFile instance to make sure the local file object gets closed when the file download connection is 'lost' (i.e. completed). Looking at FTPClient, I wonder if I should be using queueCommand directly. To be honest, I got lost following the retrieveFile command into _openDataConnection and beyond, so maybe it's already being used.
Any suggestions? Thanks!
I would suggest using queueCommand, as you proposed; I suspect the semaphore you're using is causing your issues. I believe using queueCommand will limit your FTPClient to a single active connection (though I'm just speculating), so you may want to think about creating a few FTPClient instances and passing download jobs to them if you want to do things quickly. If you use queueStringCommand, you get a Deferred that you can use to determine what each client is up to, and even add another job to that client's queue in the callback.
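To make the several-clients idea concrete, here is a minimal sketch. It reuses the FTPFile consumer from your question, the helper names are made up, and the pool size of 3 is arbitrary; ClientCreator is the classic way to build plain protocol clients in Twisted:

from twisted.internet import defer, reactor
from twisted.internet.protocol import ClientCreator
from twisted.protocols.ftp import FTPClient

def connectClients(host, user, password, count=3):
    # open `count` independent FTP connections; each FTPClient serializes
    # the commands in its own queue, so distinct clients run in parallel
    creator = ClientCreator(reactor, FTPClient, user, password, passive=1)
    return defer.gatherResults([creator.connectTCP(host, 21) for _ in range(count)])

def downloadAll(clients, filenames):
    # round-robin the files across the connected clients
    jobs = [clients[i % len(clients)].retrieveFile(name, FTPFile(name))
            for i, name in enumerate(filenames)]
    return defer.gatherResults(jobs)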
