Is it practically possible to simulate AWS environment locally using Moto and Python?
I want to write a aws gluejob that will fetch record from my local database and will upload to S3 bucket for data quality check and later trigger a lambda function for cronjob run using Moto Library using moto.lock_glue decorator.Any suggestion or document would be highly appreciated as I don't see much clue on same.Thank you in advance.
AFAIK, moto is meant to patch boto modules for testing.
I have experience working with LocalStack, a docker you can run locally, and it acts as a live service emulator for most AWS services (some are only available for paying users).
https://docs.localstack.cloud/getting-started/
You can see here which services are supported by the free version.
https://docs.localstack.cloud/user-guide/aws/feature-coverage/
in order to use it, you need to change the endpoint-url to point to the local service running on docker.
As it's a docker, you can incorporate it with remote tests as well e.g., if you're using k8s or a similar orchestrator
Related
Does anyone have any good suggestions for where I can store my custom Python modules on Google Cloud Platform?
I have a bunch of modules that I would like to access from the different GCP services I am using (App Engine, Compute Engine, Cloud Functions etc), without having to copy the Python files and upload to the service's Python environment each time.
I was thinking GCS could be an option but then I am not sure how I would then get the module into, say Cloud Functions or App Engine?
Any ideas?
The code will eventually need to be written to your service's local storage. Python does not access code remotely during execution unless you write your code to do so (download the module and then execute). Package your code as modules and publish to PyPI and then add them as dependencies. When you deploy a service, your modules will be downloaded.
I have tried to follow googles documentation on how to set up local development using a database (https://cloud.google.com/appengine/docs/standard/python/tools/using-local-server#Python_Using_the_Datastore). However, i do not have the experience level to follow along. I am not even sure if that was the right guide. The application is a Django project that uses python 2.7. To run the local host, i usually type dev_appserver.py --host 127.0.0.1 .
My questions are:
how do i download the data store database on google cloud. I do not want to download the entire database, just enough data to populate local host so i can do tests
once the database is download, what do i need to do to connect it to the localhost? Do i have to change a parameter somewhere?
do i need to download the datastore? Can i just make a duplicate on the cloud and then connect to that datastore?
When i run localhost, should it not already be connected to the datastore? Since the site works when it is running on the cloud. Where can i find the connection URI?
Thanks for the help
The development server is meant to simulate the whole App Engine Environment, if you examine the output of the dev_appserver.py command you'll see something like Starting Cloud Datastore emulator at: http://localhost:PORT. Your code will interact with that bundled Datastore automatically, pushing and retrieving data according to the code you wrote. Your data will be saved on a file in local storage and will persist across different runs of the development server unless it's explicitly deleted.
This option doesn't provide facilities to import data from your existing Cloud Datastore instance although it's a ready to go solution if your testing procedures can afford populating the local database with mock data through the use of a custom created script that does so programmatically. If you decide for this approach just write the data creation script and execute it before running the tests.
Now, there is another option to simulate local Datastore using the Cloud SDK that comes with handy features for your purposes. You can find the available information for it under Running the Datastore Emulator documentation page. This emulator has support to import entities downloaded from your production Cloud Datastore as well as for exporting them into files.
Back to your questions:
Export data from the Cloud instance into a GCS bucket following this, then download the data from the bucket to your filesystem following this, finally import the data into the emulator with the command shown here.
To use the emulator you need to first run gcloud beta emulators datastore start in a Cloud Shell and then in a separate tab run dev_appserver.py --support_datastore_emulator=true --datastore_emulator_port=8081 app.yaml.
The development server uses one of the two aforementioned emulators, in both cases it is not connected to your Cloud Datastore. You might create another project aimed for development purposes with a copy of your database and deploy your application there so you don't use the emulator at all.
Requests at datastore are made trough the endpoint https://datastore.googleapis.com/v1/projects/project-id although this is not related to how the emulators manage the connections in your local server.
Hope this helps.
I have a Python AWS Lambda running on a Linux, but due to some dependencies, I need it to be deployed on a Windows. I have tried using Python Azure Functions and have successfully deployed it on a Linux as well, but found out they cannot be deployed on Windows. Is it possible to do it with AWS Lambda?
Basically my solution has a few .exe that need to be run by a python library (Tesseract OCR and pytesseract)
AWS Lambda and Azure Functions are considered Function as a Service (FaaS) solutions, where the developer worries about the code and the cloud provider worries about availability, scalability and the platform underneath to run the code.
In that aspect, you can't run any of them on a server. If you need specific Windows dependencies, you must create a Python project as you normally would, install the dependencies and configure the Windows Server, being responsible for infrastructure and OS configurations and management.
I am searching for this for a few days, found some approaches like Serverless or Localstack, but what I would really like to do is be able to code everything using AWS API Gateway and Lambdas for a cloud-based version of my software (which is solved) and not manage my deployments.
Then...
A customer wants to host a copy of it inside its own private network, so... I wanna use the very same Lambda code (which makes no use of other AWS 'magic' services like DynamoDB ... only "regular" dependencies) injecting it into a container running "an API Gateway"-like software (perhaps a python/flask parsing the exported API Gateway config?).
I am willing to build this layer unless a better idea shows up. So I would be able to put my lambdas on a folder lets say "aws_lambda", and my container would know how to transform the HTTP payload to an AWS event payload, import the module, call 'lambda_handler' ... and hopefully that is it. Having another container with MySQL and another with Nginx (emulating CloudFront for static website) and I will be done. The whole solution in a can.
Any suggestions? Am I crazy?
Does anyone know some existing software solution to solve this?
If you are willing to use AWS SAM, the AWS SAM CLI offers what you're looking for.
The AWS SAM CLI implements its own API Gateway equivalent and runs the AWS Lambda functions in Docker containers. While it's mainly intended for testing, I don't see any reason, why you shouldn't be able to use it for your use-case as well.
Besides different serverless plugins and localstack you can try AWS SAM Cli to run local api gateway . The command is start-api https://docs.aws.amazon.com/lambda/latest/dg/test-sam-cli.html . It probably would not scale, never tried myself and it is intended for testing.
Curiosly what you are consider to do (transform lambda into normal flask server is oppozite to zappa, which is serverless package that convert normal flask server into a lambda function and uploads it to AWS). If you succed in your original idea of converting a flask request into lambda event and care to package you code, it can be called unzappa. While zappa is mature and large package, probably it would be easier to 'invert' some light-weight thing like awsgi https://github.com/slank/awsgi/blob/master/awsgi/init.py
#Lovato, I have been using https://github.com/lambci/docker-lambda Which is a docker image that mimics lambda environment, lambci seems to be maintaining a good version of the lambda images for nodejs, java, python, .net and even go lang. So you can technically reuse your entire lambda code in a docker running lambda "like" environment. I call it lambda-like mostly because aws doesn't fully publish every piece of information about how lambda works. But this is the nearest approximation I've seen. I use this for local development and testing of lambda. And I have tested a trail "offline" lambda. Let me know if this works for you.
I do prefer to use the docker files and create my docker images for my use.
I have a huge amount of data (hundreds of Gigas) on Google BigQuery and for easy of use (many post query treatements) I'm working with the bigquery python package. The problem is that I have to run again all my queries whenever I shut my laptop down, this is very expensive as my dataset is about one Tera. I think of Google Compute Engine but this is a poor solution as I will still paying for my machines if I don't stop them. My last solution is to mount a docker image on our own sandbox, this is cheaper and can do exactly what I'm looking for. So I would like to know if someone has ever mounted a docker image for BigQuery ? Thanks for helping!
We mount all of our python/bigquery projects into docker containers and push them to google cloud registry.
Automated scheduling, dependancy graphing, and logging can be handled with Google Cloud Composer (Airflow). Its pretty simple to get set up, and Airflow has a Kubernetes Pod Operator, That allows you to specify a python file to run in your docker image on GCR. You can use this workflow to make sure all of your queries and python scripts are run on GCP without having to worry about Google Compute Engine, or any devops type of things.
https://cloud.google.com/composer/docs/how-to/using/using-kubernetes-pod-operator
https://cloud.google.com/composer/