Given some generic Python code, structured like ...
cloudbuild.yaml
requirements.txt
functions/
    folder_a/
        test/
            main_test.py
        main.py
If I'm ...
creating a .zip from the above folder and
using either Terraform's google_cloudfunctions_function resource or gcloud functions deploy to upload and deploy the function,
... it seems the build configuration for Cloud Build (cloudbuild.yaml) included in the .zip is never considered during the build (i.e. while / prior to resolving requirements.txt).
I've set up cloudbuild.yaml to grant access to a private GitHub repository (which contains a dependency listed in requirements.txt). Unfortunately, the build fails with (Terraform output):
Error: Error waiting for Updating CloudFunctions Function: Error code 3, message: Build failed: {"error": {"canonicalCode": "INVALID_ARGUMENT", "errorMessage": "pip_download_wheels had stderr output:\nCommand \"git clone -q ssh://git@github.com/SomeWhere/SomeThing.git /tmp/pip-req-build-a29nsum1\" failed with error code 128 in None\n\nerror: pip_download_wheels returned code: 1", "errorType": "InternalError", "errorId": "92DCE9EA"}}
According to the Cloud Build docs, a cloudbuild.yaml can be specified using gcloud builds submit --config=cloudbuild.yaml . -- is there any way to supply that parameter to gcloud functions deploy (or even Terraform), too? I'd like to stay with the current, "transparent" build, i.e. I do not want to set up Cloud Build separately but just upload my zip and have the code built and deployed "automatically", while respecting cloudbuild.yaml.
It looks like you're trying to authenticate to a private Git repo via SSH. This is unfortunately not currently supported by Cloud Functions.
The alternative would be to vendor your private dependency into the directory before creating your .zip file.
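For example, a minimal sketch of vendoring with pip (the repository URL mirrors the one in your error message; the vendor directory name and the sys.path tweak are assumptions, not taken from your setup):

# install the private dependency into a vendor/ folder inside the function source before zipping
pip install --target=./functions/folder_a/vendor "git+ssh://git@github.com/SomeWhere/SomeThing.git"

# then make the vendored path importable at the top of main.py, e.g.:
# import os, sys
# sys.path.insert(0, os.path.join(os.path.dirname(__file__), "vendor"))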
I'm simply adopting the EventBridge ETL design pattern, and it gives me this error when I deploy:
[100%] fail: docker login --username AWS --password-stdin https://315997497220.dkr.ecr.us-west-2.amazonaws.com exited with error code 1:
❌ the-eventbridge-etl failed: Error: Failed to publish one or more assets. See the error messages above for more information.
    at Object.publishAssets (/home/mubashir/.nvm/versions/node/v16.3.0/lib/node_modules/aws-cdk/lib/util/asset-publishing.ts:25:11)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at Object.deployStack (/home/mubashir/.nvm/versions/node/v16.3.0/lib/node_modules/aws-cdk/lib/api/deploy-stack.ts:237:3)
    at CdkToolkit.deploy (/home/mubashir/.nvm/versions/node/v16.3.0/lib/node_modules/aws-cdk/lib/cdk-toolkit.ts:194:24)
    at initCommandLine (/home/mubashir/.nvm/versions/node/v16.3.0/lib/node_modules/aws-cdk/bin/cdk.ts:267:9)
Failed to publish one or more assets. See the error messages above for more information.
The steps I took (the GitHub repo has a video I followed):
npx cdkp init the-eventbridge-etl --lang=python
cd the-eventbridge-etl
python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
cdk synth
cdk deploy
The first error I get is related to bootstrapping, so I bootstrap:
export CDK_NEW_BOOTSTRAP=1
npx cdk bootstrap aws://315997497220/us-east-2 --cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess --trust 315997497220 aws://315997497220/us-east-2
I've naturally updated the cdk.json file to use the above bootstrapping technique. I've tried all bootstrap techniques, with and without a qualifier, and with the corresponding changes to cdk.json. I don't think it's a bootstrap issue.
I get the above error and I don't know what the issue is. I have not made any changes to the code.
I guess you need to get and pipe a password first, since you use the --password-stdin flag. Try:
aws ecr get-login-password | docker login --username AWS --password-stdin https://315997497220.dkr.ecr.us-west-2.amazonaws.com
I'm trying to structure my GCP Cloud Functions (in Python) as a separate file per function. From the documentation it sounds like this should be supported, but when I try to use --source with a file name instead of a directory, it fails. Is there something I'm doing wrong?
Here is the command I use:
gcloud functions deploy createAccount --runtime python38 --trigger-http --source=postgres/createAccount.py --region us-central1
and the error I get back is:
ERROR: (gcloud.functions.deploy) argument --source: Provided path does not point to a directory
But if I put my "createAccount" Python function in main.py inside the postgres directory and use this command, the function deploys perfectly:
gcloud functions deploy createAccount --runtime python38 --trigger-http --source=postgres --region us-central1
Here it looks as though it should accept file names in the --source option:
https://cloud.google.com/functions/docs/first-python
Any ideas on how to avoid making main.py one big monolith of all my cloud functions?
If we look at the documentation of the gcloud command for deploying functions and the --source flag within:
https://cloud.google.com/sdk/gcloud/reference/functions/deploy#--source
We find that it unambiguously says that it wants a directory as a parameter and not a source file. The link you gave, which seems to say that we can specify a file, looks to be in error. I think that is a mistake and only a directory can be supplied with --source.
This would seem to imply that you can create multiple directories ... where each directory contains just the function you wish to deploy.
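For example, a layout along these lines (the directory names and the second function are hypothetical) would let each function be deployed from its own small directory:

# createaccount/
#     main.py        <- defines createAccount
# deleteaccount/
#     main.py        <- defines deleteAccount

gcloud functions deploy createAccount --runtime python38 --trigger-http --source=createaccount --region us-central1
gcloud functions deploy deleteAccount --runtime python38 --trigger-http --source=deleteaccount --region us-central1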
Intro
My scenario is that I want to re-use shared code from a repo in Azure DevOps across multiple projects. I've built a pipeline that produces a wheel as an artifact so I can download it to other pipelines.
The situation
Currently I have successfully set up a pipeline that deploys the Python Function App. The app is running fine and stable. I use SCM_DO_BUILD_DURING_DEPLOYMENT=1 and ENABLE_ORYX_BUILD=1 to achieve this.
I am now in the position that I want to use the artifact (Python/pip wheel) as mentioned in the intro.
I've added a step in the pipeline and I am able to download the artifact successfully. The next step is ensuring that the artifact is installed during my Python Function App zip deployment. And that is where I am stuck.
The structure of my zip looks like:
__app__
| - MyFirstFunction
| | - __init__.py
| | - function.json
| | - example.py
| - MySecondFunction
| | - __init__.py
| | - function.json
| - wheels
| | - my_wheel-20201014.10-py3-none-any.whl <<< this is my wheel
| - host.json
| - requirements.txt
The problem
I've tried adding commands like POST_BUILD_COMMAND and PRE_BUILD_COMMAND to get pip to install the wheel, but it seems the package is not found (by Oryx/Kudu) when I use the command:
-POST_BUILD_COMMAND "python -m pip install --find-links=home/site/wwwroot/wheels my_wheel"
Azure DevOps does not throw any exception or error message. Only when I execute the function do I get an exception saying:
Failure Exception: ModuleNotFoundError: No module named 'my_wheel'.
My question is: how can I change my solution to make sure the build is able to install my_wheel correctly?
Sidenote: Unfortunately I am not able to use the Artifacts feed from Azure DevOps to publish my_wheel and let pip consume that feed.
Here is how my custom wheel works in VS Code locally:
Navigate to your DevOps project, edit the pipeline YAML file, and add a script step that installs the wheel file:
pip install my_wheel-20201014.10-py3-none-any.whl
Enable App Service Logs and navigate to the Log Stream to see if it works on Azure.
I have solved my issue by checking out the repository of my shared code and including the shared code in the function app package.
Also, I replaced the AzureFunctionApp@1 task with the AzureCLI@2 task and deploy the function app with an az functionapp deployment source config-zip command. I set the application settings via a separate AzureAppServiceSettings@1 step in the pipeline.
AzureCLI@2:
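A sketch of the deployment command that task runs (the resource group, app name, and zip path are placeholders, not taken from my pipeline):

az functionapp deployment source config-zip \
    --resource-group <my-resource-group> \
    --name <my-function-app> \
    --src ./function_app.zip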
It is not the exact way I wanted to solve this because I still have to include the requirements of the shared code in the root requirements.txt as well.
Switching from the AzureFunctionApp@1 task to AzureCLI@2 gives me more feedback in the pipeline. The result should be the same.
We are building a data pipeline using the Beam Python SDK and trying to run it on Dataflow, but we are getting the error below:
A setup error was detected in beamapp-xxxxyyyy-0322102737-03220329-8a74-harness-lm6v. Please refer to the worker-startup log for detailed information.
But we could not find detailed worker-startup logs.
We tried increasing the memory size, worker count, etc., but we are still getting the same error.
Here is the command we use:
python run.py \
--project=xyz \
--runner=DataflowRunner \
--staging_location=gs://xyz/staging \
--temp_location=gs://xyz/temp \
--requirements_file=requirements.txt \
--worker_machine_type n1-standard-8 \
--num_workers 2
Pipeline snippet:
data = pipeline | "load data" >> beam.io.Read(
    beam.io.BigQuerySource(query="SELECT * FROM abc_table LIMIT 100")
)
data | "filter data" >> beam.Filter(lambda x: x.get('column_name') == value)
The above pipeline just loads the data from BigQuery and filters it based on some column value. It works like a charm with DirectRunner but fails on Dataflow.
Are we making any obvious setup mistake? Is anyone else getting the same error? We could use some help to resolve the issue.
Update:
Our pipeline code is spread across multiple files, so we created a Python package. We solved the setup error by passing the --setup_file argument instead of --requirements_file.
We resolved this setup error by sending a different set of arguments to Dataflow. Our code is spread across multiple files, so we had to create a package for it. If we use --requirements_file, the job will start but eventually fail, because it won't be able to find the package on the workers. The Beam Python SDK sometimes does not throw an explicit error message for this; instead, it retries the job and fails. To get your code running as a package, you need to pass the --setup_file argument, which lists the dependencies. Make sure the package created by the python setup.py sdist command includes all the files required by your pipeline code.
If you have a privately hosted Python package dependency, pass --extra_package with the path to the package.tar.gz file. A better way is to store it in a GCS bucket and pass that path here.
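A sketch of what the launch command might look like with these flags (the project, bucket, and package paths are placeholders):

python run.py \
    --project=xyz \
    --runner=DataflowRunner \
    --staging_location=gs://xyz/staging \
    --temp_location=gs://xyz/temp \
    --setup_file=./setup.py \
    --extra_package=./dist/private_package-0.1.tar.gz
# per the note above, --extra_package can also be given the path to a .tar.gz stored in a GCS bucket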
I have written an example project to get started with Apache Beam Python SDK on Dataflow - https://github.com/RajeshHegde/apache-beam-example
Read about it here - https://medium.com/@rajeshhegde/data-pipeline-using-apache-beam-python-sdk-on-dataflow-6bb8550bf366
I'm building a prediction pipeline using Apache Beam/Dataflow. I need to include the model files inside the dependencies available to the remote workers. The Dataflow job failed with the same error log:
Error message from worker: A setup error was detected in beamapp-xxx-xxxxxxxxxx-xxxxxxxx-xxxx-harness-xxxx. Please refer to the worker-startup log for detailed information.
However, this error message didn't give any details about the worker-startup log. Finally, I found a way to get at the worker log and solve the problem.
As is known, Dataflow creates Compute Engine instances to run jobs and saves logs on them, so we can access the VM to see the logs. We can connect to the VM used by our Dataflow job from the GCP console via SSH. Then we can check the boot-json.log file located in /var/log/dataflow/taskrunner/harness:
$ cd /var/log/dataflow/taskrunner/harness
$ cat boot-json.log
One thing to pay attention to here: when running in batch mode, the VM created by Dataflow is ephemeral and is shut down when the job fails. Once the VM is shut down, we can't access it anymore. However, a failing work item is retried 4 times, so normally we have enough time to open boot-json.log and see what is going on.
Finally, here is my Python setup solution, which may help someone else:
main.py
...
model_path = os.path.dirname(os.path.abspath(__file__)) + '/models/net.pd'
# pipeline code
...
MANIFEST.in
include models/*.*
setup.py complete example
import setuptools

REQUIRED_PACKAGES = [...]

setuptools.setup(
    ...
    include_package_data=True,
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
    package_data={"models": ["models/*"]},
    ...
)
Run Dataflow pipelines
$ python main.py --setup_file=/absolute/path/to/setup.py ...
This doc shows the command to download the source of an app I have in App Engine:
appcfg.py -A [YOUR_APP_ID] -V [YOUR_APP_VERSION] download_app [OUTPUT_DIR]
That's fine, but I also have services that I deployed. Using this command, I can only seem to download the "default" service. I also deployed "myservice01" and "myservice02" to App Engine in my GCP project. How do I specify which service's code to download?
I tried this command as suggested:
appcfg.py -A [YOUR_APP_ID] -M [YOUR_MODULE] -V [YOUR_APP_VERSION] download_app [OUTPUT_DIR]
It didn't fail, but this is the output I got (and it didn't download anything):
01:30 AM Host: appengine.google.com
01:30 AM Fetching file list...
01:30 AM Fetching files...
Now as a test I tried it with the name of a module I know doesn't exist and I got this error:
Error 400: --- begin server output ---
Version ... of Module ... does not exist.
So I at least know it's successfully finding the module and version, but it doesn't seem to want to download them?
Also specify the module (services used to be called modules):
-M MODULE, --module=MODULE
    Set the module, overriding the module value from
    app.yaml.
So something like:
appcfg.py -A [YOUR_APP_ID] -M [YOUR_MODULE] -V [YOUR_APP_VERSION] download_app [OUTPUT_DIR]
Side note: YOUR_APP_VERSION should really read YOUR_MODULE_VERSION :)
Of course, the answer assumes the app code downloads were not permanently disabled from the Console's GAE App Settings page:
Permanently prohibit code downloads
Once this is set, no one, including yourself, will ever be able to download the code for this application using the appcfg download_app command.