Storing sensitive information in the code

I'm currently using the azure-cosmos module in Python to connect to a database on Azure. I want to fetch the data, make a few transformations, and then push it to a new container.
You need the account URL and key to connect to the database, which I've hardcoded as variables in my code for now, as follows:
from azure.cosmos import CosmosClient
url = 'https://xyz.azure.com:443/'
key = 'randomlettersandnumbers=='
client = CosmosClient(url, credential=key)
Intuitively this seems like bad practice, especially because anyone could gain access to my database once I push this to Git. So what's the most secure way to do this?
I'm coming from a non-SWE background, so apologies if this question is dumb.
Thanks!

The way I deal with this kind of problem is by using environment variables:
import os
from azure.cosmos import CosmosClient
url = os.environ.get("URL_ENDPOINT")
key = os.environ.get("API_KEY")
client = CosmosClient(url, credential=key)
You can set them in your shell like this (shell variable names cannot contain hyphens, so use underscores):
export URL_ENDPOINT="https://xyz.azure.com:443/"
export API_KEY="randomlettersandnumbers=="
Or you can put them in a bash script, envs.sh:
export URL_ENDPOINT="https://xyz.azure.com:443/"
export API_KEY="randomlettersandnumbers=="
And then load it with the source command (keep envs.sh out of version control):
source envs.sh
There is a good article about storing sensitive data using environment variables here
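One thing to watch with os.environ.get is that it returns None when a variable is unset, so a missing secret only surfaces later as a confusing error inside the client. Below is a minimal sketch of a fail-fast helper. The names require_env and MissingSecretError, and the variable names URL_ENDPOINT and API_KEY, are illustrative assumptions and not part of azure-cosmos.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required environment variable is unset or empty."""


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        # Report only the variable name, never a secret value.
        raise MissingSecretError(f"Environment variable {name!r} is not set")
    return value


# Hypothetical usage (variable names are assumptions):
# url = require_env("URL_ENDPOINT")
# key = require_env("API_KEY")
# client = CosmosClient(url, credential=key)
```

With this, a forgotten export shows up at startup with a message that names the missing variable.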

Related

How can I be sure that a library like Pandas is not sending my API key secrets somewhere outside my local machine?

Let's say:
I have my Python code in main.py and I am using Pandas.
I am storing my API key (for an Azure service) in a Windows environment variable (variable name = "AZURE_KEY", variable value = "abc123abc").
I load this API key in main.py using azure_key = os.environ.get("AZURE_KEY").
Question:
How can I be sure that the Pandas library hasn't sent the value of azure_key somewhere outside my local system?
Possible approach:
I know one way is to read through all of the Pandas source code and check whether anything fishy is happening, but that is not feasible.
Note:
Pandas is just an example here; I actually want to use an API key within Streamlit code.
Hence, please treat this question as library-agnostic.

For a production system (on a server), you could use a firewall to filter outgoing connections
For a development system (your machine), you could add restrictions to the "API Key" account (e.g. only access test data, only access systems you really need, etc.)

Hiding sensitive information in Python

I am creating a Python script which reads a spreadsheet and issues REST requests to an external service. A token must be obtained to issue requests to the REST service, so the script needs a username and password to obtain the OAuth2 token.
What's the best or standard way to keep this information hidden, or at least out of the Python script itself?

I recommend using a config file. Let's create a config file and name it config.cfg. The file structure should look more or less like this:
[whatever]
key=qwerertyertywert2345
secret=sadfgwertgrtujdfgh
Then in python you can load it this way:
from configparser import ConfigParser
config = ConfigParser()
config.read('config.cfg')
my_key = config['whatever']['key']
my_secret = config['whatever']['secret']

In general, the most standard way to handle secrets in Python is by putting them in runtime configuration.
You can do that by reading explicitly from external files or using os.getenv to read from environment variables.
Another way is to use a tool like python-decouple, which lets you use the environment (variables), a .env file, and an .ini file in a cascade, so that developers and operations can control the environment in local, dev, and production systems.
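To make the cascade idea concrete, here is a minimal standard-library sketch that checks the environment first and then falls back to an .ini file. It is not python-decouple's implementation or API; the function name get_setting, the file name settings.ini, and the section name settings are all assumptions made for illustration.

```python
import os
from configparser import ConfigParser

_MISSING = object()  # sentinel so that None can be passed as a real default


def get_setting(name, ini_path="settings.ini", section="settings", default=_MISSING):
    """Look up `name` in the environment, then in an .ini file, then fall back to `default`."""
    # 1) The environment wins, so operations can override file-based values.
    if name in os.environ:
        return os.environ[name]

    # 2) Fall back to the .ini file. Interpolation is disabled so that
    #    secrets containing '%' are returned unchanged.
    parser = ConfigParser(interpolation=None)
    parser.read(ini_path)  # a file that does not exist is silently skipped
    if parser.has_option(section, name):
        return parser.get(section, name)

    # 3) Use the default if one was given, otherwise fail loudly.
    if default is not _MISSING:
        return default
    raise KeyError(f"Setting {name!r} not found in environment or {ini_path}")
```

A developer could keep an untracked settings.ini locally, while a production system sets the same names as environment variables and needs no file at all.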

What is the best way to store login credentials on Airflow?

I found out there are a lot of ways to store them: as variables, in hooks, and in other ways using encryption. I would like to know what the best way to do it is.

Currently there are 2 ways of storing secrets:
1) Airflow Variables: By default, the value of a variable is hidden in the UI if its key contains any of the words ('password', 'secret', 'passwd', 'authorization', 'api_key', 'apikey', 'access_token'), but this can be configured to show values in clear text.
However, there is a known bug where anyone with access to the UI can export all the variables, which exposes the secrets.
2) Airflow Connections:
You can use the Password field in Airflow connections, which is encrypted if you have installed the crypto package (pip install apache-airflow[crypto]). The password field simply appears blank in the UI.
More on securing connections: https://airflow.apache.org/howto/secure-connections.html
I recommend the 2nd approach, because even if someone gets access to the UI, they won't be able to read your secrets. Keep in mind though that you need to install the crypto package for this.
You can then access the secrets as below:
# Import path from Airflow 1.x; in Airflow 2.x BaseHook lives in airflow.hooks.base
from airflow.hooks.base_hook import BaseHook
connection = BaseHook.get_connection(CONN_ID)
slack_token = connection.password
Set CONN_ID to the name of your connection.

How to provide Redshift Database Password in Python Script in AWS Datapipeline?

I am using Redshift and have to write some custom scripts to generate reports. I am using AWS Data Pipeline CustomShellActivity to run my custom logic, with Python and boto3. What is the safest way, and ideally the best practice, to provide the database password to a Python script? I am sure that hardcoding the password in the script is not good practice. What other options do I have or should I explore?

A pretty standard approach is to store credentials in a secure S3 bucket and download them as part of the deployment/launch process, using an IAM role with access to the secure bucket. For limited-runtime cases like Lambda or Data Pipeline, you could download from S3 to an in-memory buffer at startup using Key.get_contents_as_string() from boto, parse the file, and set up your credentials.
For increased security you can incorporate KMS secret management. Here is an example that combines the two.
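Since the question uses boto3 rather than the older boto library, the in-memory download could look roughly like the sketch below, which uses the boto3 S3 client's get_object call. The helper name load_credentials, the JSON file format, and the bucket and key names in the usage comment are assumptions for illustration only.

```python
import json


def load_credentials(s3_client, bucket, key):
    """Fetch a JSON credentials object from S3 and parse it without writing it to disk."""
    # get_object returns a dict whose "Body" is a file-like streaming object.
    response = s3_client.get_object(Bucket=bucket, Key=key)
    raw = response["Body"].read()  # bytes held only in memory
    return json.loads(raw.decode("utf-8"))


# Hypothetical usage; requires boto3 and an IAM role allowed to read the object:
# import boto3
# creds = load_credentials(boto3.client("s3"), "my-secure-bucket", "redshift/credentials.json")
# password = creds["password"]
```

Passing the client in as an argument keeps the function easy to test with a stand-in object and leaves region and role configuration to the caller.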

I usually store them as environment variables. I am not sure about AWS Data Pipeline deployments, but on a standard Linux box (EC2), you could do:
# ~/.profile or /etc/profile
export MY_VAR="my_value"
And then you can access them in Python like this:
# python script
import os
# Fall back to 'default' when MY_VAR is not set
my_var_value = os.environ.get('MY_VAR', 'default')

TweePy - how to hide API key

I am building a simple app which uses the Twitter API. What do I have to do to hide my Twitter app keys? For example, if I put my program on the internet, anybody who looks at the code will know my consumer key, access token, etc. But if I don't include this information in my program, it won't work!

I'm assuming that by putting it on the internet you mean publishing your code on GitHub or similar.
In that case you should always separate code and configuration. Put your API keys in an .ini file, e.g. config.ini, then load that file from your Python program using configparser.
Add the configuration file to your .gitignore so it does not get added to source control.
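Because the configuration file is deliberately left out of the repository, anyone cloning the code will not have it. Here is a minimal sketch of a loader that gives a clear error in that case. The function name load_twitter_keys, the section name twitter, and the option names used in the comment are assumptions, not anything required by Tweepy.

```python
from configparser import ConfigParser
from pathlib import Path


def load_twitter_keys(path="config.ini", section="twitter"):
    """Read API keys from an untracked .ini file and return them as a dict."""
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(
            f"{config_path} not found; create it locally and keep it listed in .gitignore"
        )
    # Interpolation is disabled so that '%' characters in tokens are kept as-is.
    parser = ConfigParser(interpolation=None)
    parser.read(config_path)
    # Raises KeyError if the section is missing from the file.
    return dict(parser[section])


# Hypothetical config.ini contents:
# [twitter]
# consumer_key = ...
# access_token = ...
```

Committing a config.example.ini with placeholder values alongside the code is one common way to document which options the real file needs.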

Assuming you're running on a Unix-like system, one way to handle this is environment variables.
In your shell you can do this:
export TWITTER_API_KEY=yoursecretapikey
Note that quotes are not needed for a simple value like this; use single quotes only if the value contains spaces or shell special characters.
Then in your script:
import os
twitter_key = os.environ.get('TWITTER_API_KEY')
