I'm trying to create a new profile for Snakemake to easily run workflows on our cluster system. I followed the examples available in the Snakemake-Profiles repo and have the following directory structure:
project/
- my_cluster/
-- config.yaml
-- cluster-submit.py
-- jobscript.sh
- Snakefile
- (other files)
Thus, the my_cluster directory contains the cluster profile. Excerpt of config.yaml:
cluster: "cluster-submit.py {dependencies}"
jobscript: "jobscript.sh"
jobs: 1000
immediate-submit: true
notemp: true
If I try to run my workflow as follows:
snakemake --profile my_cluster target
I get the following error for each rule Snakemake tries to submit: sh: command not found: cluster-submit.py. This may not be super surprising, considering the cluster-submit.py script lives in a different directory, but I'd have thought that Snakemake would handle the paths when using profiles. So am I forgetting something? Did I misconfigure something? The cluster-submit.py file has executable permissions, and when I use absolute paths in my config.yaml everything works fine.
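For reference, the absolute-path variant that does work would look something like this (the path below is illustrative):
cluster: "/abs/path/to/project/my_cluster/cluster-submit.py {dependencies}"
jobscript: "/abs/path/to/project/my_cluster/jobscript.sh"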
Whenever I use Snakemake with Singularity and want to create a directory or file, I run into read-only errors if I try to write outside of the Snakemake directory.
For example, if my rule contains the following:
container:
    "docker://some_image"
shell:
    "touch ~/hello.txt"
then everything runs fine: hello.txt is created inside the Snakemake directory. However, if my touch command tries to create a file outside of the Snakemake directory:
container:
    "docker://some_image"
shell:
    "touch /home/user/hello.txt"
Then I get the following error:
touch: cannot touch '/home/user/hello.txt': Read-only file system
Is there any way to give Snakemake the ability to create files anywhere it wants when using Singularity?
By default, Singularity mounts certain directories, including the user's home directory. In your first command (touch ~/hello.txt), the file gets written to the home directory, where Singularity has read/write permissions. However, in your second command (touch /home/user/hello.txt), Singularity doesn't have read/write access to /home/user/; you will need to bind that path manually using Singularity's --bind option and supply it to Snakemake via --singularity-args.
So the Snakemake command would look something like:
snakemake --use-singularity --singularity-args "--bind /home/user/"
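If the target should be mapped to a different location, Singularity's --bind also accepts a source:destination pair, so something like the following would bind a host directory to /home/user inside the container (the host path here is illustrative):
snakemake --use-singularity --singularity-args "--bind /scratch/user:/home/user"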
I have a src directory which contains a pyproject.toml, right next to my setup.py (which I can find, for a fact).
But when I run the task
- bash: pytest -c src/pyproject.toml
  displayName: Run tests
  workingDirectory: $(Pipeline.Workspace)
I get FileNotFoundError: [Errno 2] No such file or directory: '/home/vsts/work/1/src/pyproject.toml'.
Everything works fine without trying to point to this file from pytest. Why? The same setup works fine locally.
The workingDirectory should be $(System.DefaultWorkingDirectory), not $(Pipeline.Workspace).
The $(Pipeline.Workspace) is the local path on the agent where all folders for a given build pipeline are created.
The $(System.DefaultWorkingDirectory) is the local path on the agent where your source code files are downloaded.
See the Azure DevOps documentation on predefined variables for more detail.
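Applied to the task above, the corrected step would look like this:
- bash: pytest -c src/pyproject.toml
  displayName: Run tests
  workingDirectory: $(System.DefaultWorkingDirectory)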
I'm trying to create a new spaCy 3.0 project from scratch for a custom NLP pipeline. There seems to be no way of doing this. The only mechanism I can find in the documentation is to clone an existing project repository and then edit it. Is there any other way of doing this?
>>> python -m spacy project --help
Usage: python -m spacy project [OPTIONS] COMMAND [ARGS]...
Command-line interface for spaCy projects and templates. You'd typically
start by cloning a project template to a local directory and fetching its
assets like datasets etc. See the project's project.yml for the available
commands.
Options:
--help Show this message and exit.
Commands:
assets Fetch project assets like datasets and pretrained weights.
clone Clone a project template from a repository.
document Auto-generate a README.md for a project.
dvc Auto-generate Data Version Control (DVC) config.
pull Retrieve available precomputed outputs from a remote storage.
push Persist outputs to a remote storage.
run Run a named command or workflow defined in the project.yml.
All you need is a directory containing project.yml. A minimal one-command project.yml:
commands:
  - name: "demo"
    script:
      - "python --version"
Nearly everything is optional. Assets are not required:
spacy project assets
⚠ No assets specified in project.yml
Run the one defined command:
spacy project run demo
==================================== demo ====================================
Running command: /home/username/venv/spacy/bin/python --version
Python 3.7.3
See: https://spacy.io/usage/projects#project-yml
Given some generic Python code, structured like ...
cloudbuild.yaml
requirements.txt
functions/
    folder_a/
        test/
            main_test.py
        main.py
If I'm ...
- creating a .zip from the above folder, and
- using either Terraform's google_cloudfunctions_function resource or gcloud functions deploy to upload/deploy the function,
... it seems the build configuration for Cloud Build (cloudbuild.yaml) included in the .zip is never considered during the build (i.e. while / prior to resolving requirements.txt).
I've set up cloudbuild.yaml to grant access to a private GitHub repository (which contains a dependency listed in requirements.txt). Unfortunately, the build fails with (Terraform output):
Error: Error waiting for Updating CloudFunctions Function: Error code 3, message: Build failed: {"error": {"canonicalCode": "INVALID_ARGUMENT", "errorMessage": "pip_download_wheels had stderr output:\nCommand \"git clone -q ssh://git#github.com/SomeWhere/SomeThing.git /tmp/pip-req-build-a29nsum1\" failed with error code 128 in None\n\nerror: pip_download_wheels returned code: 1", "errorType": "InternalError", "errorId": "92DCE9EA"}}
According to the Cloud Build docs, a cloudbuild.yaml can be specified using gcloud builds submit --config=cloudbuild.yaml . Is there any way to supply that parameter to gcloud functions deploy (or even Terraform), too? I'd like to stay with the current, "transparent" build, i.e. I do not want to set up Cloud Build separately but just upload my zip and have the code be built and deployed "automatically", while respecting cloudbuild.yaml.
It looks like you're trying to authenticate to a private Git repo via SSH. This is unfortunately not currently supported by Cloud Functions.
The alternative would be to vendor your private dependency into the directory before creating your .zip file.
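A rough sketch of that vendoring approach, assuming the .zip is built from the project root and the package can be fetched from a machine that does have SSH access (the target directory is just an example):
# install the private dependency directly into the function's source tree
pip install --target=functions/folder_a git+ssh://git@github.com/SomeWhere/SomeThing.git
The dependency would then be removed from requirements.txt so the Cloud Functions builder no longer tries to clone it during deployment.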
I have a Python script that I am trying to run as part of the GitLab Pages deployment of a Jekyll site. My site has blog posts with various tags, and the Python script generates the .md files for the tag pages. The script works perfectly fine when I run it manually in an IDE, but I want it to be part of the GitLab CI deployment process.
Here is what my .gitlab-ci.yml setup looks like:
run:
  image: python:latest
  script:
    - python tag_generator.py
  artifacts:
    paths:
      - public
  only:
    - master

pages:
  image: ruby:2.3
  stage: deploy
  script:
    - bundle install
    - bundle exec jekyll build -d public
  artifacts:
    paths:
      - public
  only:
    - master
However, it doesn't actually create the files that it's supposed to create. Here is the output from the "run" job:
...
Cloning repository...
Cloning into '/builds/username/projectname'...
Checking out 4c8a47fe as master...
Skipping Git submodules setup
$ python tag_generator.py
Tags generated, count 23
Uploading artifacts...
WARNING: public: no matching files
ERROR: No files to upload
Job succeeded
The script prints "Tags generated, count ___" once it has executed, so it is running; however, the files it's supposed to create aren't being created/uploaded into the right directory. There is a /tag directory in the root project folder; that is where they are supposed to go.
I realize that the issue must have something to do with the public folder, but even when I don't have
artifacts:
  paths:
    - public
it still doesn't create the files in the /tag directory, so it doesn't work whether I include the public artifact path or not, and I don't know what the problem is.
I FIGURED IT OUT!
The "build" for the project isn't made in the repo; GitLab clones the repo into another location, so I had to change the artifact path for the Python job so that it points to the cloned "build" location, like so:
run:
  image: python:latest
  stage: test
  before_script:
    - python -V  # Print out python version for debugging
    - pip install virtualenv
  script:
    - python tag_generator.py
  artifacts:
    paths:
      - /builds/username/projectname/tag
  only:
    - master
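Since the repository is cloned into the project directory (/builds/username/projectname in this case) and artifact paths are resolved relative to it, a relative path should work just as well, for example:
artifacts:
  paths:
    - tag/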