When you complete this tutorial, https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-getting-started-hello-world.html, you download the AWS SAM CLI and run a series of commands to create a simple AWS hello-world application. Running the application triggers what AWS calls a Lambda function, and at the end of the tutorial you can open http://127.0.0.1:3000/hello in your browser. If you see a message with curly braces and the words 'hello world', it worked.
Running the AWS SAM commands generates a lot of boilerplate code, which is a bit confusing. You can see all of it in a code editor. One of the generated files is called event.json, which is of course a JSON object, but why is it there? What does it represent in relation to this program? I am trying to understand what this AWS SAM application is ultimately doing and what the generated files mean and represent.
Can someone simply break down what AWS SAM is doing and the meaning behind the boilerplate code it generates?
Thank you
event.json contains the input your Lambda function will receive, in JSON format. Regardless of how a Lambda is triggered, it always receives two parameters: event and context. context contains additional information about the trigger, such as its source, while event contains any input parameters your Lambda needs to run.
You can test this out yourself by editing event.json and supplying your own values. If you open the Lambda code file you will see this event object being used in lambda_handler.
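As a rough sketch (not the exact generated code), the relationship looks like this: whatever you put in event.json is parsed and handed to the handler as its event argument. The field names below are illustrative placeholders only:

# event.json (hypothetical, simplified contents)
# {
#     "queryStringParameters": { "name": "world" }
# }

# hello_world/app.py (simplified handler sketch)
import json

def lambda_handler(event, context):
    # event is the parsed event.json; context describes the invocation itself
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello {name}"}),
    }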
Other boilerplate includes your template, where you define the configuration of your Lambdas as well as any other services you might use, like layers, a database, or API Gateway.
You also get a requirements.txt file, which lists any third-party libraries your function requires. These will be packaged along with the code.
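For example, a requirements.txt might look like this (the package and version are purely illustrative, not something the tutorial requires):

requests==2.31.0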
Ninad's answer is spot on. I just want to add a practical example of how these JSON files are used. One way event.json is used is when you invoke your Lambdas with the command sam local invoke. When you invoke a Lambda locally, you pass event.json (or whatever you decide to call the file; you will likely have several) as a parameter. As Ninad mentioned, the event file has everything your Lambda needs to run in terms of input. When the Lambdas are hooked up to other services and running live, these inputs are fed to your Lambda by those services.
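For example, a local invocation might look something like this (the function name and event path are the tutorial's defaults as far as I recall, so adjust them to your project):

sam local invoke HelloWorldFunction --event events/event.json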
I've got a few small Python functions that post to twitter running on AWS. I'm a novice when it comes to Lambda, knowing only enough to get the functions running.
The functions have environment variables set in Lambda with various bits of configuration, such as post frequency and the secret data for the twitter application. These are read into the python script directly.
It's all triggered by an EventBridge cron job that runs every hour.
I want to create a test event that lets me invoke the function manually, but I would like to be able to change the post-frequency variable when running it that way.
Is there a simple way to change environment variables when running a test event?
That is very much possible, and there are multiple ways to do it. One is to use the AWS CLI's aws lambda update-function-configuration: https://docs.aws.amazon.com/cli/latest/reference/lambda/update-function-configuration.html
Alternatively, depending on the programming language you prefer, you can use an AWS SDK, which has a similar method; you can find an example with the JS SDK in this doc: https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/javascript_lambda_code_examples.html
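Since the functions are in Python, here is a rough boto3 sketch of the same idea: override the environment variable, wait for the update to apply, then invoke the function manually. The function name and variable name are placeholders for your own:

import json
import boto3

lambda_client = boto3.client("lambda")
FUNCTION_NAME = "twitter-poster"  # hypothetical name, replace with yours

# read the current environment so the other variables are preserved
config = lambda_client.get_function_configuration(FunctionName=FUNCTION_NAME)
env = config["Environment"]["Variables"]
env["POST_FREQUENCY"] = "1"  # hypothetical variable, the override for this test run

lambda_client.update_function_configuration(
    FunctionName=FUNCTION_NAME,
    Environment={"Variables": env},
)

# the update is asynchronous, so wait until it has been applied
lambda_client.get_waiter("function_updated").wait(FunctionName=FUNCTION_NAME)

response = lambda_client.invoke(
    FunctionName=FUNCTION_NAME,
    Payload=json.dumps({}).encode(),  # your test event payload
)
print(response["StatusCode"])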
GCP has a published create_instance() code snippet available here, which I've seen on SO in a couple places e.g. here. However, as you can see in the first link, it's from 2015 ("Copyright 2015 Google Inc"), and Google has since published another code sample for launching a GCE instance dated 2022. It's available on github here, and this newer create_instance function is what's featured in GCP's python API documentation here.
However, I can't figure out how to pass a startup script via metadata to run on VM startup using the modern python function. I tried adding
instance_client.metadata.items = {'key': 'startup-script', 'value': job_script}
to the create.py function (again, available here along with supporting utility functions it calls) but it threw an error that the instance_client doesn't have that attribute.
GCP's documentation page for starting a GCE VM with a startup script is here, where, unlike most other similar pages, it contains code snippets only for the console, gcloud, and the (REST) API, not SDK snippets for e.g. Python and Ruby that might show how to modify the python create_instance function above.
Is the best practice for launching a GCE VM with a startup script from a python process really to send a POST request, or just to wrap the gcloud command
gcloud compute instances create VM_NAME \
--image-project=debian-cloud \
--image-family=debian-10 \
--metadata-from-file=startup-script=FILE_PATH
...in a subprocess.run()? To be honest I wouldn't mind doing things that way since the code is so compact (the gcloud command at least, not the POST request way), but since GCP provides a create_instance python function I had assumed that using it, modified as necessary, would be the best practice from within python...
Thanks!
So, the simplest (!) way with the Python library to create the equivalent of --metadata-from-file=startup-script=${FILE_PATH} is probably:
from google.cloud import compute_v1

instance = compute_v1.Instance()

metadata = compute_v1.Metadata()
metadata.items = [
    {
        "key": "startup-script",
        "value": '#!/usr/bin/env bash\necho "Hello Freddie"',
    }
]

instance.metadata = metadata
And another way is:
metadata = compute_v1.Metadata()
items = compute_v1.types.Items()
items.key = "startup-script"
items.value = """
#!/usr/bin/env bash
echo "Hello Freddie"
"""
metadata.items = [items]
NOTE In the examples, I'm embedding the content of the FILE_PATH in the script for convenience but you could, of course, use Python's open to achieve a more comparable result.
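To round the example out, here is a rough sketch of reading the startup script from a file and attaching the metadata to an actual insert call with the v1 client. The project, zone, machine type, and image are placeholders, and the configuration is pared down to the minimum:

from google.cloud import compute_v1

# hypothetical values -- substitute your own
project = "my-project"
zone = "us-central1-a"
file_path = "startup.sh"

with open(file_path) as f:
    startup_script = f.read()

instance = compute_v1.Instance()
instance.name = "vm-with-startup-script"
instance.machine_type = f"zones/{zone}/machineTypes/e2-small"

disk = compute_v1.AttachedDisk()
disk.boot = True
disk.auto_delete = True
init_params = compute_v1.AttachedDiskInitializeParams()
init_params.source_image = "projects/debian-cloud/global/images/family/debian-10"
init_params.disk_size_gb = 10
disk.initialize_params = init_params
instance.disks = [disk]

nic = compute_v1.NetworkInterface()
nic.network = "global/networks/default"
instance.network_interfaces = [nic]

metadata = compute_v1.Metadata()
metadata.items = [{"key": "startup-script", "value": startup_script}]
instance.metadata = metadata

instance_client = compute_v1.InstancesClient()
operation = instance_client.insert(
    project=project, zone=zone, instance_resource=instance
)
operation.result()  # block until the insert operation completes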
It is generally better to use a library|SDK, if you have one, to invoke functionality rather than use subprocess to invoke the binary. As mentioned in the comments, the primary reason is that language-specific calls give you typing (more so in typed languages), controlled execution (e.g. try) and error handling. When you invoke a subprocess, it's string-based streams all the way down.
I agree that the Python library for Compute Engine, with its use of classes, feels cumbersome but, when you're writing a script, the focus could be on the long-term benefits of more explicit definitions vs. the short-term pain of the lost expressiveness. If you just want to insert a VM, by all means use gcloud compute instances create (I do this all the time in Bash) but, if you want to use a more elegant language like Python, then I encourage you to use Python entirely.
CURIOSITY gcloud is written in Python. If you use Python subprocess to invoke gcloud commands, you're using Python to invoke a shell that runs Python to make a REST call ;-)
I have 3 scripts: the 1st and the 3rd are written in R, and the 2nd in Python.
The output of the 1st script is the input of the 2nd script, and its output is the input of the 3rd one.
The inputs and outputs are search keywords or phrases.
For example, the output of the 1st script is Hello, then the 2nd turns the word into olleH, and the 3rd one converts the letters to uppercase: OLLEH.
My question is how can I connect those scripts and let them run automatically, without my intervention, on AWS? What would the commands be? How can the output of the 1st script be saved and used as the input of the 2nd one, and so on?
I would start with a shell script (or a .bat file on a Windows machine), using each script's output as the input for the next. Something like:
#!/bin/bash
var1=$(Rscript script1.R)
var2=$(python script2.py "$var1")
var3=$(Rscript script3.R "$var2")
echo "$var3"
Of course you need to change your scripts so that they read their input from the command-line arguments and print their result to standard output.
I have never used AWS so I'm unfamiliar with that side, but this sounds like something a workflow management system would solve. Take a look at snakemake or nextflow. With these tools you can easily (after you get used to them) do exactly what you describe: run scripts/tools that depend on each other sequentially (and also in parallel).
You can use AWS Step Functions to achieve your goal. For the Python parts you can use AWS Lambda tasks, for the R parts AWS ECS tasks, and orchestrate the data flow accordingly.
https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html
For commands, I wouldn't count on receiving a comprehensive response here: workflows are complex and very individual in each case. I would recommend defining them via some sort of IaC solution like CloudFormation or the AWS CDK and keeping them under git.
https://docs.aws.amazon.com/cdk/api/latest/docs/aws-stepfunctions-readme.html
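To make the idea concrete, here is a rough sketch of a state machine definition that chains the three steps, created via boto3. All of the ARNs, names, and the IAM role are placeholders, and in practice you would let CloudFormation or the CDK generate this rather than calling the API by hand:

import json
import boto3

# hypothetical ARNs -- replace with your own resources; the R steps would
# realistically be ecs:runTask.sync task states with their extra Parameters
definition = {
    "Comment": "Hypothetical keyword pipeline: R script -> Python script -> R script",
    "StartAt": "Script1",
    "States": {
        "Script1": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:script1",
            "Next": "Script2",
        },
        "Script2": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:script2",
            "Next": "Script3",
        },
        "Script3": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:script3",
            "End": True,
        },
    },
}

# Step Functions passes each state's output as the next state's input,
# which is exactly the "output of script 1 becomes input of script 2" chaining
sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="keyword-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsExecutionRole",  # placeholder
)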
Is there a mechanism to automatically register flows/new flows if a local agent is running, without having to manually run e.g. flow.register(...) on each one?
In Airflow, I believe there is a process that regularly scans the specified Airflow home folder for any files with 'dag' in the name and then searches them for DAG objects. If it finds them, it loads them so they are accessible through the UI, without having to manually 'register' them.
Does something similar exist for Prefect? For example, if I just created the following file test_flow.py, without necessarily running it or adding flow.run_agent(), is there a way for it to just be magically registered and accessible through the UI :) simply by existing in the proper place?
# prefect_home_folder/test_flow.py
import prefect
from prefect import task, Flow

@task
def hello_task():
    logger = prefect.context.get("logger")
    logger.info("Hello, Cloud!")

flow = Flow("hello-flow", tasks=[hello_task])
flow.register(project_name='main')
I could write a script that mimics the Airflow behavior, scanning a folder and registering flows at regular intervals, but I wonder if that's a bit hacky, or if there is a better solution and I'm just thinking too much in terms of Airflow?
Great question (and awesome username!) - in short, I suggest you are thinking too much in terms of Airflow. There are a few reasons this is not currently available in Prefect:
explicit is better than implicit
Prefect flows are not constrained to live in one place and are not constrained to have the same runtime environments; this makes both the automatic discovery of a flow + re-serializing it complicated from a single agent process (which is not required to share the same runtime environment as the flows it submits)
agents are better thought of as being parametrized by deployment infrastructure, not flow storage
Ideally for production workflows you'd use a CI/CD process so that anytime you make a code change an automatic job is triggered that re-registers the flow. A few comments that may be helpful:
you don't actually need to re-register the flow for every possible code change; for example, if you changed the message that your hello_task logs in your example, you could simply re-save the flow to its original location (what this looks like depends on the type of storage you use). Ultimately you only need to re-register if any of the metadata about your flow changes (retry settings, task names, dependency relationships, etc.)
you can use flow.register("My Project", idempotency_key=flow.serialized_hash()) to automatically capture this; this pattern will only register a new version if the flow's backend representation changes in some way
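As a rough sketch of what such a CI-driven registration step could look like, assuming each flow module exposes a module-level flow object (the module names here are entirely hypothetical):

# hypothetical register_flows.py, run by CI after each push
import importlib

FLOW_MODULES = ["flows.hello_flow", "flows.etl_flow"]  # hypothetical module names

for module_name in FLOW_MODULES:
    module = importlib.import_module(module_name)
    flow = module.flow  # assumes each module defines a top-level `flow`
    # only registers a new version when the serialized flow actually changes
    flow.register(
        project_name="main",
        idempotency_key=flow.serialized_hash(),
    )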
I am writing a watchman command with watchman-make and I'm at a loss when trying to access exactly what was changed in the directory. I want to run my upload.py script, and inside the script I would like to access the filenames of newly created files in /var/spool/cups-pdf/ANONYMOUS.
so far I have
$ watchman-make -p '/var/spool/cups-pdf/ANONYMOUS' --run 'python /home/pi/upload.py'
I'd like to add another argument to python upload.py so that I have the exact filepath of the newly created file and can send it over to my database from upload.py.
I've been looking at the docs of watchman and the closest thing I can think to use is a trigger object. Please help!
Solution with watchman-wait:
Assuming project layout like this:
/posts/_SUBDIR_WITH_POST_NAME_/index.md
/Scripts/convert.sh
And the shell script like this:
#!/bin/bash
# File: convert.sh
SrcDirPath=$(cd "$(dirname "$0")/../"; pwd)
cd "$SrcDirPath"
echo "Converting: $SrcDirPath/$1"
Then we can launch watchman-wait like this:
watchman-wait . --max-events 0 -p 'posts/**/*.md' | while read line; do ./Scripts/convert.sh "$line"; done
When we change the file /posts/_SUBDIR_WITH_POST_NAME_/index.md, the output will look like this:
...
Converting: /Users/.../Angular/dartweb_quickstart/posts/swift-on-android-building-toolchain/index.md
Converting: /Users/.../Angular/dartweb_quickstart/posts/swift-on-android-building-toolchain/index.md
...
watchman-make is intended to be used together with tools that will perform a follow-up query of their own to discover what they want to do as a next step. For example, running the make tool will cause make to stat the various deps to bring things up to date.
That means that your upload.py script needs to know how to do this for itself if you want to use it with watchman.
You have a couple of options, depending on how sophisticated you want things to be:
Use pywatchman to issue an ad-hoc query
If you want to be able to run upload.py whenever you want and have it figure out the right thing (just like make would do) then you can have it ask watchman directly. You can have upload.py use pywatchman (the python watchman client) to do this. pywatchman will get installed if the watchman configure script thinks you have a working python installation. You can also pip install pywatchman. Once you have it available and in your PYTHONPATH:
import os
import pywatchman

client = pywatchman.client()
client.query('watch-project', os.getcwd())
result = client.query('query', os.getcwd(), {
    "since": "n:pi_upload",
    "fields": ["name"]})
print(result["files"])
This snippet uses the since generator with a named cursor to discover the list of files that changed since the last query was issued using that same named cursor. Watchman will remember the associated clock value for you, so you don't need to complicate your script with state tracking. We're using the name pi_upload for the cursor; the name needs to be unique among the watchman clients that might use named cursors, so naming it after your tool is a good idea to avoid potential conflict.
This is probably the most direct way to extract the information you need without requiring that you make more invasive changes to your upload script.
Use pywatchman to initiate a long running subscription
This approach will transform your upload.py script so that it knows how to directly subscribe to watchman, so instead of using watchman-make you'd just directly run upload.py and it would keep running and performing the uploads. This is a bit more invasive and is a bit too much code to try and paste in here. If you're interested in this approach then I'd suggest that you take the code behind watchman-wait as a starting point. You can find it here:
https://github.com/facebook/watchman/blob/master/python/bin/watchman-wait
The key piece of this that you might want to modify is this line:
https://github.com/facebook/watchman/blob/master/python/bin/watchman-wait#L169
which is where it receives the list of files.
Why not triggers?
You could use triggers for this, but we're steering folks away from triggers because they are hard to manage. A trigger will run in the background and have its output go to the watchman log file. It can be difficult to tell if it is running, or to stop it running.
The interface is closer to the unix model and allows you to feed a list of files on stdin.
Speaking of unix, what about watchman-wait?
We also have a command that emits the list of changed files as they change. You could potentially stream the output from watchman-wait in your upload.py. This would make it have some similarities with the subscription approach but do so without directly using the pywatchman client.
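For instance, a minimal sketch of an upload.py that reads filenames streamed from watchman-wait on stdin could look like this; the pattern and the upload_to_database function are placeholders for your own setup:

# hypothetical usage:
#   watchman-wait /var/spool/cups-pdf/ANONYMOUS --max-events 0 -p '**/*.pdf' | python upload.py
import os
import sys

WATCHED_ROOT = "/var/spool/cups-pdf/ANONYMOUS"

def upload_to_database(path):
    # placeholder for your real upload logic
    print(f"uploading {path}")

for line in sys.stdin:
    relative_path = line.strip()
    if relative_path:
        # watchman-wait prints paths relative to the watched root
        upload_to_database(os.path.join(WATCHED_ROOT, relative_path))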