How to test if process is "in cloud" on AWS - python

I have a Python application which should run remotely via an AWS pipeline and use secrets to get parameters such as database credentials. When running the application locally, the parameters are loaded from a parameters.json file. My problem is how to test whether I am running remotely (i.e., what to replace IN_CLOUD_TEST with):
import boto3
from json import load

if [IN_CLOUD_TEST]:
    params_raw = boto3.client('ssm').get_parameters_by_path(Path='/', Recursive=True)['Parameters']
    params = format_params(params_raw)
else:
    with open('parameters.txt') as json_file:
        params = load(json_file)
I could of course use a try/except, but there must be something nicer.

You could check using AWS APIs, but a simpler alternative (one that doesn't require making HTTP calls, so it shaves off some latency) is to set an environment variable on your remote server that marks it as the production server, and read that variable from the code.
import boto3
from json import load
from os import getenv

if getenv('IS_REMOTE', False):
    params_raw = boto3.client('ssm').get_parameters_by_path(Path='/', Recursive=True)['Parameters']
    params = format_params(params_raw)
else:
    with open('parameters.txt') as json_file:
        params = load(json_file)
You could also invert the logic: define a variable that is true when the machine is supposed to be the testing one, and set it only on your local testing machine, for example:
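A minimal sketch of that inverted variant, assuming you export a variable named IS_LOCAL_TEST (a name chosen here for illustration) only on your development machine:
import boto3
from json import load
from os import getenv

# IS_LOCAL_TEST is a hypothetical variable; set it only on your local
# machine, e.g. with `export IS_LOCAL_TEST=1` in your shell profile.
if getenv('IS_LOCAL_TEST'):
    with open('parameters.txt') as json_file:
        params = load(json_file)
else:
    params_raw = boto3.client('ssm').get_parameters_by_path(Path='/', Recursive=True)['Parameters']
    params = format_params(params_raw)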

Related

How do I cache files within a function app?

I am trying to temporarily store files within a function app. The function is triggered by an http request which contains a file name. I first check if the file is within the function app storage and if not, I write it into the storage.
Like this:
if local_path.exists():
    # Cached copy is already present in the function app's storage
    file = json.loads(local_path.read_text('utf-8'))
else:
    s = AzureStorageBlob(account_name=account_name, container=container,
                         file_path=file_path, blob_contents_only=True,
                         mi_client_id=mi_client_id, account_key=account_key)
    # Download once, cache to disk, then parse the same bytes
    blob_bytes = s.read_azure_blob()
    local_path.write_bytes(blob_bytes)
    file = json.loads(blob_bytes.decode('utf-8'))
This is how I get the local path
root = pathlib.Path(__file__).parent.absolute()
file_name = pathlib.Path(file_path).name
local_path = root.joinpath('file_directory').joinpath(file_name)
If I upload the files when deploying the function app, everything works as expected and I read the files from the function app storage. But if I try to save or cache a file, it breaks with this error: [Errno 30] Read-only file system
Any help is greatly appreciated
So after a year I discovered that Azure function apps can use the Python os module to save files, and those files persist across calls to the function app. No need to implement any sort of caching logic. Simply use os.write() and it will work just fine.
I did have to implement a function to clear the cache after a certain period of time but that was a much better alternative to the verbose code I had previously.
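A rough sketch of what that can look like, assuming the function app's file system allows writing under the chosen directory; the cache directory and one-hour expiry below are illustrative choices, not part of the original answer:
import json
import os
import time

CACHE_DIR = '/tmp/file_cache'        # assumed writable location; adjust for your app
CACHE_MAX_AGE_SECONDS = 60 * 60      # illustrative one-hour expiry

def read_cached(file_name, fetch_bytes):
    """Return parsed JSON, calling fetch_bytes() only on a cache miss."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, file_name)
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return json.loads(f.read().decode('utf-8'))
    data = fetch_bytes()                 # e.g. a blob download
    with open(path, 'wb') as f:
        f.write(data)
    return json.loads(data.decode('utf-8'))

def clear_stale_cache():
    """Delete cached files older than the configured maximum age."""
    if not os.path.isdir(CACHE_DIR):
        return
    now = time.time()
    for name in os.listdir(CACHE_DIR):
        path = os.path.join(CACHE_DIR, name)
        if now - os.path.getmtime(path) > CACHE_MAX_AGE_SECONDS:
            os.remove(path)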

AWS Batch - How to access AWS Batch environment variables within python script running inside Docker container

I have a Docker container which executes a Python script inside it as the ENTRYPOINT. This is the Dockerfile:
FROM python:3
ADD script.py /
EXPOSE 80
RUN pip install boto3
RUN pip install uuid
ENTRYPOINT ["python","./script.py"]
This is the Python script:
import boto3
import time
import uuid
import os

guid = uuid.uuid4()
timestr = time.strftime("%Y%m%d-%H%M%S")
job_index = os.environ['AWS_BATCH_JOB_ARRAY_INDEX']
filename = 'latest_test_' + str(guid) + '_.txt'

with open(filename, 'a+') as f:
    data = job_index
    f.write(data)

client = boto3.client(
    's3',
    # Hard coded strings as credentials, not recommended.
    aws_access_key_id='',
    aws_secret_access_key=''
)

response = client.upload_file(filename, 'api-dev-dpstorage-s3', 'docker_data' + filename + '.txt')

with open('response2.txt', 'a+') as f:
    f.write('all done')
exit
It is simply designed to create a file, write the job array index into the file and push it to an S3 Bucket. The job array index from AWS Batch is being sourced from one of the pre-defined environment variables. I have uploaded the image to AWS ECR, and have set up an AWS Batch to run a job with an array of 10. This should execute the job 10 times, with my expectation that 10 files are dumped into S3, each containing the array index of the job itself.
If I don't include the environment variable and instead just hard code a value into the text file, the AWS Batch job works. If I include the call to os.environ to get the variable, the job fails with this AWS Batch error:
Status reason: Essential container in task exited
I'm assuming there is an issue with how I'm trying to obtain the environment variable. Does anyone know how I could correctly reference either one of the built in environment variables and/or a custom environment variable defined in the job?
AWS Batch provides Docker environment configuration through job definition parameters, where you specify:
"environment" : [
{ "AWS_BATCH_JOB_ARRAY_INDEX" : "string"},
]
This is turned into a Docker environment parameter:
$ docker run --env AWS_BATCH_JOB_ARRAY_INDEX=string $container $cmd
It can then be accessed with:
import os
job_id = os.environ['AWS_BATCH_JOB_ARRAY_INDEX']
But watch out if you are passing sensitive data this way: it is not wise to pass credentials in plain text. Instead, in that case, you may want to create a compute environment.
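As a small defensive addition (not part of the original answer), you can read the variable with an explicit failure message, so the Batch container logs say what is missing instead of just exiting:
import os

# AWS Batch sets AWS_BATCH_JOB_ARRAY_INDEX for array jobs; fail loudly if it is absent.
job_index = os.environ.get('AWS_BATCH_JOB_ARRAY_INDEX')
if job_index is None:
    raise RuntimeError('AWS_BATCH_JOB_ARRAY_INDEX is not set; '
                       'is this container running as an AWS Batch array job?')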

SnapLogic Python Read and Execute SQL File

I have a simple SQL file that I'd like to read and execute using a Python Script Snap in SnapLogic. I created an expression library file to reference the Redshift account and have included it as a parameter in the pipeline.
I have the code below from another post. Is there a way to reference the pipeline parameter to connect to the Redshift database, read the uploaded SQL file and execute the commands?
fd = open('shared/PythonExecuteTest.sql', 'r')
sqlFile = fd.read()
fd.close()

sqlCommands = sqlFile.split(';')
for command in sqlCommands:
    try:
        c.execute(command)
    except OperationalError, msg:
        print "Command skipped: ", msg
You can access pipeline parameters in scripts using $_.
Let's say, you have a pipeline parameter executionId. Then to access it in the script you can do $_executionId.
Following is a test pipeline with an executionId pipeline parameter and some test data (shown as screenshots in the original answer). Following is the script:
# Import the interface required by the Script snap.
from com.snaplogic.scripting.language import ScriptHook
import java.util

class TransformScript(ScriptHook):
    def __init__(self, input, output, error, log):
        self.input = input
        self.output = output
        self.error = error
        self.log = log

    # The "execute()" method is called once when the pipeline is started
    # and allowed to process its inputs or just send data to its outputs.
    def execute(self):
        self.log.info("Executing Transform script")
        while self.input.hasNext():
            try:
                # Read the next document, wrap it in a map and write out the wrapper
                in_doc = self.input.next()
                wrapper = java.util.HashMap()
                wrapper['output'] = in_doc
                wrapper['output']['executionId'] = $_executionId
                self.output.write(in_doc, wrapper)
            except Exception as e:
                errWrapper = {
                    'errMsg' : str(e.args)
                }
                self.log.error("Error in python script")
                self.error.write(errWrapper)
        self.log.info("Finished executing the Transform script")

# The Script Snap will look for a ScriptHook object in the "hook"
# variable. The snap will then call the hook's "execute" method.
hook = TransformScript(input, output, error, log)
Output: here you can see that the executionId was read from the pipeline parameters (shown as a screenshot in the original answer).
Note: Accessing pipeline parameters from scripts is a valid scenario, but accessing other external systems from the script is complicated (because you would need to load the required libraries) and is not recommended. Use the snaps provided by SnapLogic to access external systems. Also, if you want to use other libraries inside scripts, consider sticking to JavaScript instead of Python, because there are a lot of open-source CDNs that you can use in your scripts.
Also, you can't access any configured expression library directly from the script. If you need some logic in the script, keep it in the script rather than somewhere else. And there is no point in accessing account names in the script (or in mappers), because even if you know the account name, you can't use the credentials/configurations stored in that account directly; that is handled by SnapLogic. Use the provided snaps and mappers as much as possible.
Update #1
You can't access the account directly. Accounts are managed and used internally by the snaps. You can only create and set accounts through the accounts tab of the relevant snap.
Avoid using the Script snap as much as possible, especially if you can do the same thing using normal snaps.
Update #2
The simplest solution to this requirement would be as follows.
Read the file using a file reader
Split based on ;
Execute each SQL command using the Generic JDBC Execute Snap

Using Python Boto with AWS Support API

I've used boto to interact with S3 with no problems, but now I'm attempting to connect to the AWS Support API to pull back info on open tickets, trusted advisor results, etc. It seems that the boto library has a different connect method for each AWS service? For example, with S3 it is:
conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
According to the boto docs, the following should work to connect to AWS Support API:
>>> from boto.support.connection import SupportConnection
>>> conn = SupportConnection('<aws access key>', '<aws secret key>')
However, there are a few problems I see after digging through the source code. First, boto.support.connection doesn't actually exist. boto.connection does, but it doesn't contain a class SupportConnection. boto.support.layer1 exists, and DOES have the class SupportConnection, but it doesn't accept key arguments as the docs suggest. Instead it takes one argument, an AWSQueryConnection object. That class is defined in boto.connection. AWSQueryConnection takes one argument, an AWSAuthConnection object, a class also defined in boto.connection. Lastly, AWSAuthConnection takes a generic object, with requirements defined in __init__ as:
class AWSAuthConnection(object):
    def __init__(self, host, aws_access_key_id=None,
                 aws_secret_access_key=None,
                 is_secure=True, port=None, proxy=None, proxy_port=None,
                 proxy_user=None, proxy_pass=None, debug=0,
                 https_connection_factory=None, path='/',
                 provider='aws', security_token=None,
                 suppress_consec_slashes=True,
                 validate_certs=True, profile_name=None):
So, for kicks, I tried creating an AWSAuthConnection by passing keys, followed by AWSQueryConnection(awsauth), followed by SupportConnection(awsquery), with no luck. This was inside a script.
Last item of interest is that, with my keys defined in a .boto file in my home directory, and running python interpreter from the command line, I can make a direct import and call to SupportConnection() (no arguments) and it works. It clearly is picking up my keys from the .boto file and consuming them but I haven't analyzed every line of source code to understand how, and frankly, I'm hoping to avoid doing that.
Long story short, I'm hoping someone has some familiarity with boto and connecting to AWS APIs other than S3 (the bulk of the material that exists via Google) and can help me troubleshoot further.
This should work:
import boto.support
conn = boto.support.connect_to_region('us-east-1')
This assumes you have credentials in your boto config file or in an IAM Role. If you want to pass explicit credentials, do this:
import boto.support
conn = boto.support.connect_to_region('us-east-1', aws_access_key_id="<access key>", aws_secret_access_key="<secret key>")
This basic incantation should work for all services in all regions. Just import the correct module (e.g. boto.support, boto.ec2, boto.s3, etc.) and then call its connect_to_region method, supplying the name of the region you want as a parameter.
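As a rough follow-up sketch: the method name below is taken from boto's support layer1 module and the response layout is an assumption, so verify both against the boto version you have installed:
import boto.support

conn = boto.support.connect_to_region('us-east-1')

# describe_cases corresponds to the Support API's DescribeCases operation;
# it requires an AWS account with a support plan that enables the Support API.
response = conn.describe_cases()
for case in response.get('cases', []):
    print(case.get('caseId'), case.get('subject'))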

JMeter - Run a python script before calling each HTTP request sampler

I am new to JMeter. My HTTP Request sampler call looks like this:
Path= /image/**image_id**/list/
Header = "Key" : "Key_Value"
The key value is generated by calling a Python script which uses the image_id to generate a unique key.
Before each sampler, I want to generate the key using the Python script and pass it as a header to the next HTTP Request sampler.
I know I have to use some kind of preprocessor to do that. Can anyone help me do it using a preprocessor in JMeter?
I believe that Beanshell PreProcessor is what you're looking for.
Example Beanshell code will look as follows:
import java.io.BufferedReader;
import java.io.InputStreamReader;

Runtime r = Runtime.getRuntime();
Process p = r.exec("/usr/bin/python /path/to/your/script.py");
p.waitFor();

BufferedReader b = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line = "";
StringBuilder response = new StringBuilder();
while ((line = b.readLine()) != null) {
    response.append(line);
}
b.close();
vars.put("ID", response.toString());
The code above will execute the Python script and put its output into the ID variable.
You will be able to reference it in your HTTP Request as:
/image/${ID}/list/
See How to use BeanShell: JMeter's favorite built-in component guide for more information on Beanshell scripting in Apache JMeter and a kind of Beanshell cookbook.
You can also put your request under a Transaction Controller to exclude the PreProcessor execution time from the load report.
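Since the key in the question depends on the image_id, you could also pass the id to the script as a command-line argument in the r.exec() call above (e.g. by appending vars.get("image_id"), assuming you store the id in a JMeter variable). A hedged Python sketch of what such a script.py might look like (the key derivation is a placeholder, not the asker's real logic):
# Hypothetical script.py: reads the image_id from the command line and
# prints the generated key to stdout so the PreProcessor can capture it.
import hashlib
import sys

def generate_key(image_id):
    # Placeholder key derivation; replace with the real key-generation logic.
    return hashlib.sha256(image_id.encode('utf-8')).hexdigest()

if __name__ == '__main__':
    print(generate_key(sys.argv[1]))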
A possible solution posted by Eugene Kazakov here:
JSR223 sampler has good possibility to write and execute some code,
just put jython.jar into /lib directory, choose in "Language" pop-up
menu jython and write your code in this sampler.
Sadly there is a bug in Jython, but there are some suggestions on the page.
More here.
You can use a BSF PreProcessor.
First download the Jython Library and save to your jmeter's lib directory.
On your HTTP sampler, add a BSF PreProcessor, choose Jython as the language, and perform whatever you need to obtain the id. As an example, I used this one:
import random

randImageString = ""
for i in range(16):
    randImageString = randImageString + chr(random.randint(ord('A'), ord('Z')))
vars.put("randimage", randImageString)
Note the vars.put("randimage", randImageString) call, which makes the variable available to JMeter later.
Now on your test you can use ${randimage} when you need it:
Now every request will be different, changing with the value put into randimage by the Python script.
