I've seen a few similar questions here but I don't think this specific one has been answered yet. I am on a machine learning team and we do a LOT of discovery/exploratory analysis in a local environment.
I am trying to pass secrets stored in my GitHub Enterprise account down to my local environment, the same way Azure Key Vault does.
Here is my workflow file:
name: qubole_access
on: [pull_request, push]
env:
  # Sets environment variable
  QUBOLE_API_TOKEN: ${{ secrets.QUBOLE_API_TOKEN }}
jobs:
  job1:
    runs-on: self-hosted
    steps:
      - name: step 1
        run: echo "The API key is: ${{ env.QUBOLE_API_TOKEN }}"
I can tell it's working because the job runs successfully in the workflow.
The workflow file references an API token used to access our Qubole database. This token is stored as a secret in the 'Secrets' area of my repo.
What I want to do now is reference that environment variable in a LOCAL Python environment. It's important that it be a local environment because it's less expensive, and I don't want to risk anyone on my team accidentally forgetting and pushing secrets in their code, even if the file is covered by a local .gitignore.
I have fetched/pulled/pushed/restarted, etc., and I can't get the variable into my environment.
When I check by running env in the terminal, the variable doesn't show up there either.
Is there a way to treat GitHub secrets like secrets in Azure Key Vault? Or am I missing something obvious?
As the title says, I have a Python script I wrote that I would like to allow others to use.
The script is an API aggregator, and it requires a client_id and secret to access the API. As of now I have an env file which stores these values and I'm able to get the values from the env file.
My question is: now that I have finished the script locally, how do I deploy it with the required environment variables so others can use it?
Sorry if this is a simple question - new to writing scripts for others to use.
The only thing I could think of was including the .env file when I push to GitHub, but I'm not sure that's good practice since my client_id and secret are stored there.
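For reference, the env-file setup described above might look roughly like this; a minimal sketch assuming the python-dotenv package and hypothetical variable names (CLIENT_ID, CLIENT_SECRET) kept in a local, uncommitted .env file:

import os
from dotenv import load_dotenv

# Reads key=value pairs from a local .env file (kept out of git)
# into the process environment. CLIENT_ID / CLIENT_SECRET are
# hypothetical names for the values mentioned above.
load_dotenv()

CLIENT_ID = os.environ.get("CLIENT_ID")
CLIENT_SECRET = os.environ.get("CLIENT_SECRET")

if not CLIENT_ID or not CLIENT_SECRET:
    raise RuntimeError("Set CLIENT_ID and CLIENT_SECRET in your own .env file")

A common pattern is to commit only a .env.example listing the variable names without values, so each user fills in their own credentials and the real secrets never get pushed.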
Let's say I want to create a simple cloud function to run a Python script, where the main.py is in a GitHub repository mirrored via Cloud Source Repositories. My question is: if I need to reference information that I don't want to add to the repository, is there another way to access that information? For example, let's say I want to have a config.py which I reference in main.py. Is it possible to save and reference config.py somewhere in GCP instead (e.g. Storage)?
Thanks!
Another answer that came to mind is the use of GCP's Runtime Configurator. This is an API within the Google Cloud Platform that lets you store information to use within GCE resources, e.g. cloud functions. Note that as we speak, this feature is still in Beta! Here is a small demo:
Create your project config:
gcloud beta runtime-config configs create my-project-config
Set a variable in your project config:
gcloud beta runtime-config configs variables set --config-name my-project-config --is-text my-variable "hello world"
The service account running the cloud function needs the following permissions:
runtimeconfig.configs.get
runtimeconfig.variables.list
Use that variable in a cloud function (Python):
from google.cloud import runtimeconfig

client = runtimeconfig.Client()
config = client.config('my-project-config')

# Fetch the variable set above
print(config.get_variable('my-variable'))
# <Variable: my-project-config, my-variable>

# Variables that don't exist return None
print(config.get_variable('does-not-exist'))
# None
It seems like what you might want is Environment Variables for Cloud Functions or possibly even Secrets in Cloud Functions.
Other than that, Cloud Functions are completely stateless, so you'd need to connect to some external datastore like a database to load private configuration.
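Tying this back to the original question about Storage: a minimal sketch of loading private configuration from a Cloud Storage object at cold start, assuming a hypothetical bucket and object name, and a function service account with read access to that bucket:

import json
from google.cloud import storage

_config = None  # cached between warm invocations

def get_config():
    global _config
    if _config is None:
        client = storage.Client()
        blob = client.bucket("my-private-config-bucket").blob("config.json")
        _config = json.loads(blob.download_as_text())
    return _config

def main(request):
    cfg = get_config()
    return "connecting to {}".format(cfg.get("db_host", "unknown"))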
Look into variable substitution in Cloud Build, where a build trigger holds the values you don't want in the repository and the build steps inject them into your Cloud Function as environment variables.
https://cloud.google.com/cloud-build/docs/configuring-builds/substitute-variable-values
https://cloud.google.com/functions/docs/env-var
In addition to the other answers, we use a somewhat different approach. It boils down to having a public repo which contains all the cloud function Python code, and another private repository which only contains configuration, like config.py. Let's walk through an example:
Create 2 repositories, for example:
github.com/organization/cloud-function (public)
github.com/organization/config (private)
Set a Cloud Build trigger on the config repository, and set a trigger on the cloud-function repository that kicks off the build defined in the config repository. Here is some documentation about creating Cloud Build triggers.
In the last step everything comes together. Remember, your configuration is private, so it is not accessible to anyone else. Every time someone pushes changes to one of the repositories, it should trigger the cloudbuild.yaml in your private repo. That cloudbuild.yaml looks something like this:
---
timeout: 1800s
steps:
  # Clone public repo
  - name: 'gcr.io/cloud-builders/git'
    args:
      - 'clone'
      - 'https://github.com/organization/cloud-function.git'
  # Copy config
  - name: 'gcr.io/cloud-builders/gcloud'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        cp config.py cloud-function/
  # Deploy cloud-function
  - name: 'gcr.io/cloud-builders/gcloud'
    dir: 'cloud-function'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        gcloud functions deploy ...
In addition, you can put references (secret_id) to Google Secret Manager secrets in your config.py (see the sketch after this answer). You could also use --env-vars-file, with the actual file stored in the private repository. Another bonus is that you can have directories in your private repo which represent a $BRANCH_NAME or $PROJECT_ID, which makes it easy to create multiple environments (test, development, production, etc.). This way you are sure the correct configuration for the environment is injected into the cloud function. We use this as follows:
my-dev-gcp-project > build trigger on development branch
my-prd-gcp-project > build trigger on production branch
In the cloudbuild.yaml we clone the public repo with ${BRANCH_NAME} and copy the config from a source directory called ${PROJECT_ID}/config.py. With this setup you have clear separation between development and production config and code.
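As a rough illustration of the Secret Manager references mentioned above (the names and project ID are assumptions), config.py in the private repo can hold only secret IDs, and the function resolves them at runtime:

# config.py (private repo) -- references only, no secret values
GCP_PROJECT = "my-prd-gcp-project"
DB_PASSWORD_SECRET_ID = "db-password"

# main.py (public repo) -- resolves the reference via Secret Manager
from google.cloud import secretmanager

import config

def get_db_password():
    client = secretmanager.SecretManagerServiceClient()
    name = "projects/{}/secrets/{}/versions/latest".format(
        config.GCP_PROJECT, config.DB_PASSWORD_SECRET_ID)
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")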
I'm following the guide here and can't seem to get my Python app (which is deployed fine on GCP) to read the environment variables I've created in Cloud Functions.
The REST endpoint for the function returns the environment variables fine as I've coded up the Python method in the function to just do os.environ.get() on a request parameter that is passed in. However, in my actual deployed application, I don't want to do a REST GET call every time I need an environment variable. I would expect using os.environ.get() in my application would be enough, but that returns blank.
How does one go about retrieving environment variables on GCP with just a simple os.environ.get() or do I really have to make a call to an endpoint every time?
I have been struggling with this for some time. The only solution I have found to set environment variables for the whole app is to define them in app.yaml. See the env_variables section here.
But then you cannot commit app.yaml to any version control repository if you don't want people to see the environment variables. You could add it to .gitignore. There are more secure ways to handle secrets storage if these variables contain sensitive data. If you need more robust security, you might find some inspiration here.
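For completeness, a variable defined under env_variables in app.yaml is then read in the application with plain os.environ; a minimal sketch (the variable name is an assumption):

import os

# Assumes app.yaml contains something like:
#   env_variables:
#     API_SECRET: "value-set-at-deploy-time"
API_SECRET = os.environ.get("API_SECRET")

if API_SECRET is None:
    # e.g. running locally without the deployed app.yaml values
    raise RuntimeError("API_SECRET is not set in this environment")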
Intro
This link details how to install Jupyter locally and work against an Azure HDInsight cluster. This works well for getting things set up.
However:
Not all python packages that we have available locally are available on the cluster.
We may want to do some local processing before 'submitting' a cell to the cluster.
I'm aware that missing Python packages can be installed via script actions and %%configure; however, given our use of dotenv locally, these don't seem to be viable solutions.
Problem
Source control with git
Git repos are local on dev machines
We store configuration/sensitive environment variables in .env files locally (they are not checked into git)
The dotenv package is used to read sensitive variables and set them locally for execution
Blob storage account names and keys are examples of these variables
How do we pass these locally set variables to a pyspark cell?
Local cell example
Followed by pyspark cell
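For illustration, the local cell described above might look roughly like this, assuming sparkmagic's %%local magic, python-dotenv, and hypothetical variable names for the blob storage credentials; the open question is then how to get these values into the pyspark cell that follows:

%%local
# Runs on the dev machine, not on the cluster.
# Assumes a .env file (not checked into git) with hypothetical keys
# BLOB_ACCOUNT_NAME and BLOB_ACCOUNT_KEY, read via python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv()
blob_account_name = os.environ["BLOB_ACCOUNT_NAME"]
blob_account_key = os.environ["BLOB_ACCOUNT_KEY"]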
Let's say I have some code running on a Heroku dyno (such as this autoscaling script), that needs access to the Platform API. To access the API, I have to authenticate using my app's API Key.
What's the right way to do this?
That script I referenced hardcoded the API Key in the script itself.
A better practice generally seems to be putting secrets in environment variables, which is what Heroku normally recommends. However, they say:
Setting the HEROKU_API_KEY environment variable on your machine will interfere with normal functioning of auth commands from Toolbelt.
Clearly I could store the API key under a different key name.
What's the right way? I couldn't find this in the documentation, but seems like a common issue.
Yes, storing this token into a config var is the right way to go.
As for HEROKU_API_KEY, the warning exists because, locally, the Toolbelt will look for that environment variable as one way to fetch your token.
This won't impact your production environment (the heroku toolbelt isn't available within dynos).
Locally, you can also set it easily with a tool like python-dotenv, which will allow you to have a local .env file (don't check it into source control, or your token could be compromised), with all of its values available as env vars in your dev app.
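Putting that together, a minimal sketch of a dyno-side script that reads the token from a config var stored under a custom name (PLATFORM_API_TOKEN here is an assumption) and calls the Platform API:

import os
import requests

# On Heroku this comes from a config var, e.g.
#   heroku config:set PLATFORM_API_TOKEN=<token>
# Locally, python-dotenv can populate it from a .env file instead.
token = os.environ["PLATFORM_API_TOKEN"]

resp = requests.get(
    "https://api.heroku.com/apps",
    headers={
        "Authorization": "Bearer {}".format(token),
        "Accept": "application/vnd.heroku+json; version=3",
    },
)
resp.raise_for_status()
print([app["name"] for app in resp.json()])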