Can't write to database with AWS Lambda - Python

I am trying to write to a Postgres database with AWS Lambda, but I am getting this error:
Calling the invoke API action failed with this message: Network Error
My code looks like this:
from sqlalchemy import create_engine
import pandas as pd

def test(event=None, context=None):
    conn = create_engine('postgresql://user:password#url:5439/database')
    df = pd.DataFrame([{'A': 'foo', 'B': 'green', 'C': 11},
                       {'A': 'bar', 'B': 'blue', 'C': 20}])
    df.to_sql('your_table', conn, index=False, if_exists='replace', schema='schema')

test()
Resources:
Memory - 1280MB
Timeout - 2 minutes
What is the problem here, and how else could I write a pandas DataFrame to a database with AWS Lambda?

I'm assuming the Postgres instance is in RDS.
Is your Lambda in your VPC? You can check this on the function's page in the console, in the VPC box. By default it's not, and the VPC box says "None".
Case 1: Lambda is not in VPC
Then the issue might be that the security group associated with your RDS instance does not allow connections from outside the VPC. That's the default if you didn't touch the security group. Find the security group for your RDS instance from the RDS admin, then check out the "Inbound rules" for that security group. Lambdas outside a VPC don't have a fixed IP, so you'll need to add an inbound rule allowing at least PostgreSQL traffic (TCP 5432) from source "0.0.0.0/0", i.e. the entire internet.
This should be sufficient but note that this is not considered very good for security, since anyone can now in theory reach your DB (and worse if they can guess the password). But depending on your project that might not be a problem for you. If that is an issue for you, you could instead associate your lambda with the same VPC the RDS instance is in, in order to provide better networking security, and move to Case 2.
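If you prefer to script that inbound rule rather than click through the console, a rough boto3 sketch (the security group ID is a placeholder, and the port assumes the standard PostgreSQL 5432 -- adjust if your instance listens elsewhere):
import boto3

ec2 = boto3.client('ec2')

# Placeholder ID: the security group attached to the RDS instance.
ec2.authorize_security_group_ingress(
    GroupId='sg-0123456789abcdef0',
    IpPermissions=[{
        'IpProtocol': 'tcp',
        'FromPort': 5432,
        'ToPort': 5432,
        'IpRanges': [{'CidrIp': '0.0.0.0/0',
                      'Description': 'temporary: open PostgreSQL to the internet'}],
    }],
)
This does the same thing as the "Inbound rules" editor in the console; the call is just the scripted equivalent.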
Case 2: Lambda is in a VPC
I'm assuming you put the lambda in the same VPC as the RDS instance for simplicity - if not you probably know what you're doing.
All you need to do now (providing you didn't touch other network configs) is ensure your RDS instance's security group allows access from your lambda's security group. So you could put both in the default security group, or put them in separate groups but make sure the RDS one has an inbound rule allowing the lambda one.
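The same authorize_security_group_ingress call covers this case too, except the source is the Lambda's security group rather than a CIDR range; a minimal sketch with placeholder group IDs:
import boto3

ec2 = boto3.client('ec2')

rds_sg = 'sg-0aaaaaaaaaaaaaaa1'     # placeholder: group attached to the RDS instance
lambda_sg = 'sg-0bbbbbbbbbbbbbbb2'  # placeholder: group attached to the Lambda function

ec2.authorize_security_group_ingress(
    GroupId=rds_sg,
    IpPermissions=[{
        'IpProtocol': 'tcp',
        'FromPort': 5432,
        'ToPort': 5432,
        # Reference the Lambda's group instead of an IP range.
        'UserIdGroupPairs': [{'GroupId': lambda_sg}],
    }],
)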
Note that if your lambda also needs to call external services (since you mention querying an API), in order to enable that, after linking it to your VPC you'll also need to create a NAT Gateway like I described here: https://stackoverflow.com/a/61273118/299754


Connecting to DocumentDB from AWS Lambda using Python

I am trying to connect to DocumentDB from a Lambda function.
I have configured my DocumentDB as per this tutorial and can access it through the cloud9 command prompt.
The DocumentDB cluster is part of two security groups: the first is called demoDocDB, and the second, called default, is the VPC's default security group.
The inbound rules for demoDocDB allow requests from the Cloud9 instance to port 27017, where my DocumentDB database is running.
The inbound rules for the default security group specify all traffic, all port ranges, and a source of itself. The VPC is the default VPC.
In Lambda, when editing the VPC details, I have entered:
VPC - The default VPC
Subnets - Chosen all 3 subnets available
Security Groups - The default security group for VPC
The function has worked twice in writing to the database; the rest of the time it has timed out. The timeout on the Lambda function is 2 minutes, but before reaching that it throws a timeout error:
[ERROR] ServerSelectionTimeoutError: MY_DATABASE_URL:27017: [Errno -2] Name or service not known
The snippet of code below is what is being executed; the function never reaches print("INSERTED DATA") because it times out during the insert.
import pymongo

def getDBConnection():
    client = pymongo.MongoClient(***MY_URL***)
    ## Specify the database to be used
    db = client.test
    print("GOT CONNECTION", db)
    ## Specify the collection to be used
    col = db.myTestCollection
    print("GOT COL", col)
    ## Insert a single document
    col.insert_one({'hello': 'Amazon DocumentDB'})
    print("INSERTED DATA")
    ## Find the document that was previously written
    x = col.find_one({'hello': 'Amazon DocumentDB'})
    ## Print the result to the screen
    print("RETRIEVED DATA", x)
    ## Close the connection
    client.close()
I have tried changing the version of pymongo, as this thread suggested, but it did not help.
Make sure your Lambda function is not in a public subnet, otherwise it will not work. That means you need to go back to the Lambda console and remove the public subnet from the VPC section.
1. Make sure you have a security group specifically for your Lambda function, as follows:
Lambda Security Group Outbound Rule:
Type Protocol Port Range Destination
All Traffic All All 0.0.0.0/0
You can also restrict this to HTTP/HTTPS on Ports 80/443 if you'd like.
2. Check the security group of your DocumentDB cluster to see if it is set up with an inbound rule as follows:
Type Protocol Port Range Source
Custom TCP TCP 27017 Lambda Security Group
3. Your Lambda function needs to have the correct permissions; those are (a scripted way to attach them is sketched after this list):
Managed policy AWSLambdaBasicExecutionRole
Managed policy AWSLambdaVPCAccessExecutionRole
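If you'd rather attach those two managed policies from code than from the IAM console, a small boto3 sketch (the role name is a placeholder for your function's execution role):
import boto3

iam = boto3.client('iam')

role_name = 'my-lambda-execution-role'  # placeholder: your Lambda's execution role

for policy_arn in (
    'arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole',
    'arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole',
):
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)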
After doing this your VPC section should look something like this:
1. VPC - The default VPC
2. Subnets - Chosen 2 subnets (Both Private)
3. Security Group for your Lambda function. Not the default security group
And that should do it for you. Let me know if it does not work though and I'll try and help you troubleshoot.
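One more thing that helps while testing: make the client fail fast instead of hanging until the Lambda timeout. A rough sketch of the connection side, assuming pymongo 3.9+ option names; the endpoint, credentials and CA bundle path are placeholders, and note that a DocumentDB cluster with TLS enabled needs the RDS CA bundle and does not support retryable writes:
import pymongo

# Placeholder connection details; substitute your cluster endpoint and credentials.
client = pymongo.MongoClient(
    'mongodb://user:password@my-cluster.cluster-xxxxxxxxxxxx.us-east-1.docdb.amazonaws.com:27017/',
    tls=True,
    tlsCAFile='rds-combined-ca-bundle.pem',  # CA bundle shipped inside the deployment package
    retryWrites=False,                       # DocumentDB does not support retryable writes
    serverSelectionTimeoutMS=5000,           # fail after 5 s instead of waiting for the Lambda timeout
)

db = client.test
db.myTestCollection.insert_one({'hello': 'Amazon DocumentDB'})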

Unable to connect to AWS Redshift from Python within Lambda

I am trying to connect to Redshift with Python through Lambda, in order to perform queries on the Redshift database.
I've tried getting temporary AWS credentials and connecting with psycopg2, but it isn't successful and produces no error messages (i.e. the Lambda just times out).
import boto3
import psycopg2

# Assumed: the Redshift client used to fetch temporary credentials (not shown in the original snippet).
client = boto3.client('redshift')

rs_host = "mytest-cluster.fooooooobaaarrrr.region111111.redshift.amazonaws.com"
rs_port = 5439
rs_dbname = "dev"
db_user = "barrr_user"

def lambda_handler(events, contx):
    # The cluster_creds are obtained successfully. No issues here.
    cluster_creds = client.get_cluster_credentials(DbUser=db_user,
                                                   DbName=rs_dbname,
                                                   ClusterIdentifier="mytest-cluster",
                                                   AutoCreate=False)
    try:
        # It is this psycopg2 connection that can't work...
        conn = psycopg2.connect(host=rs_host,
                                port=rs_port,
                                user=cluster_creds['DbUser'],
                                password=cluster_creds['DbPassword'],
                                database=rs_dbname)
        return conn
    except Exception as e:
        print(e)
Also, the lambda execution role itself has these policies:
I am not sure why I am still unable to connect to Redshift via Python to perform queries.
I have also tried with the sqlalchemy library, but no luck there.
As Johnathan Jacobson mentioned above, it was the security groups and network permissions that caused my problem.
You can review the documentation at Create AWS Lambda Function to Connect Amazon Redshift with C-Sharp in Visual Studio.
Since you already have your code in Python, you can concentrate on the networking part of the tutorial.
When launching AWS Lambda functions, it is possible to select the VPC and subnet(s) where the serverless Lambda function will spin up.
You can choose exactly the same VPC and subnet(s) where you created your Amazon Redshift cluster.
Also, review the IAM role you have attached to the AWS Lambda function; it additionally requires the AWSLambdaVPCAccessExecutionRole policy.
This will solve connection issues between different VPCs.
Again, even if you have launched the Lambda function in the same VPC and subnet as the Redshift cluster, it is better to check the security group of the cluster so that it accepts connections.
Hope it works,
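One extra debugging aid for the "times out with no error" symptom: pass a connect_timeout to psycopg2 so the attempt fails with a clear "timeout expired" message instead of the Lambda silently hitting its own limit. A sketch reusing rs_host, rs_port, rs_dbname and cluster_creds from the question above:
import psycopg2

conn = psycopg2.connect(host=rs_host,
                        port=rs_port,
                        user=cluster_creds['DbUser'],
                        password=cluster_creds['DbPassword'],
                        database=rs_dbname,
                        connect_timeout=5)  # seconds; a blocked route now raises quickly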

AWS Lambda Function cannot access other services

I have a problem with an AWS Lambda function which depends on DynamoDB and SQS to work properly. When I try to run the Lambda stack, the functions time out when trying to connect to the SQS service. The AWS Lambda function lies inside a VPC with the following setup:
A VPC with four subnets
Two subnets are public, routing their 0.0.0.0/16 traffic to an internet gateway
A MySQL server sits in a public subnet
The other two contain the lambdas and route their 0.0.0.0/16 traffic to a NAT which lives in one of the public subnets.
All route tables have a 10.0.0.0/16 to local rule (is this the problem, because Lambdas use private IPs inside a VPC?)
The main route table is the one with the NAT, but I explicitly associated the public subnets with the internet gateway route table
The lambdas and the mysql server share a security group which allows for inbound internal access (10.x/16) as well as unrestricted outbound traffic (0.0.0.0/16).
Traffic between the Lambdas and the MySQL instance is no problem (except if I put the Lambdas outside the VPC; then they can't access the server even if I open up all ports). Assume the code for the Lambdas is also correct, as it worked before I tried to move them into a private subnet. Also, the Lambda execution roles have been set accordingly (or do they need adjustments after moving them to a private subnet?).
Adding a dynamodb endpoint solved the problems with the database, but there are no VPC endpoints available for some of the other services. Following some answers I found here, here, here and in the announcements / tutorials here and here, I am pretty sure I followed all the recommended steps.
I would be very thankful and glad for any hints where to check next, as I have currently no idea what could be the problem here.
EDIT: The functions don't seem to have any internet access at all, since a toy example I checked also timed out:
import urllib.request

def lambda_handler(event, context):
    test = urllib.request.urlopen(url="http://www.google.de")
    return test.status
Of course, the problem was sitting in front of the monitor again. Instead of routing 0.0.0.0/0 (any traffic) to the internet gateway, I had specified only 0.0.0.0/16 (traffic destined for 0.0.x.x addresses) to the gateway. Since no machines with such addresses exist, all traffic was blocked from leaving the VPC.
@John Rotenstein: Thanks, though, for the hint about lambdash. It seems like a very helpful tool.
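For anyone scripting the fix rather than editing the route table in the console, the corrected route is simply a 0.0.0.0/0 default route pointing at the NAT gateway (or at the internet gateway for the public subnets); a boto3 sketch with placeholder IDs:
import boto3

ec2 = boto3.client('ec2')

# Placeholder IDs: the route table of the private subnets and the NAT gateway.
ec2.create_route(
    RouteTableId='rtb-0123456789abcdef0',
    DestinationCidrBlock='0.0.0.0/0',   # not 0.0.0.0/16
    NatGatewayId='nat-0123456789abcdef0',
)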
Your configuration sounds correct.
You should test the configuration to see whether you can access any public Internet sites, then test connecting to AWS.
You could either write a Lambda function that attempts such connections, or use lambdash, which effectively gives you a remote shell running on Lambda. This way, you can easily test connectivity from the command line, for example with curl.
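As a concrete example of such a test function, something along these lines probes SQS with short botocore timeouts and no retries, so a blocked route fails within seconds rather than running into the Lambda timeout (just a sketch of the idea, not the only way to do it):
import boto3
from botocore.config import Config

def lambda_handler(event, context):
    # Short timeouts and no retries: a missing route fails in seconds, not minutes.
    sqs = boto3.client('sqs', config=Config(connect_timeout=3, read_timeout=3,
                                            retries={'max_attempts': 0}))
    return sqs.list_queues()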

Python AWS Lambda function connecting to RDS MySQL: Timeout error

When the Python Lambda function is executed, I get a "Task timed out after 3.00 seconds" error. I am trying the same example function.
When I run the same code from Eclipse, it works fine and I can see the query result. Likewise, I can connect to the DB instance from MySQL Workbench on my local machine without any issues.
I tried creating a role with the full administrator access policy for this Lambda function, and even then it's not working. The DB instance is in a VPC, and I added my local IP address there using the edit CIDR option so I can access the instance through Workbench on my local machine. For the VPC, subnet and security group parameters in the Lambda function, I gave the same values as in the RDS DB instance.
I have also increased the timeout for the Lambda function and I still see the timeout error.
Any input would be appreciated.
For VPC, subnet and security group parameter in lambda function I gave the same values as I have in the RDS db instance.
Security groups don't automatically trust their own members to access other members.
Add a rule to this security group for "MySQL" (TCP port 3306), but instead of specifying an IP address, start typing sg into the box and select the ID of the security group that you are adding the rule to -- so that the group is self-referential.
Note that this is probably not the correct long-term fix, because if your Lambda function needs to access the Internet or most AWS services, the Lambda function needs to be on a private subnet behind a NAT device. That does not describe the configuration of the subnet where your RDS instance is currently configured, because you mentioned adding your local IP to allow access to RDS. That suggests your RDS is on a public subnet.
See also Why Do We Need Private Subnets in VPC for a better understanding of public vs. private subnets.
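For completeness, the self-referencing rule can also be added with boto3; the group ID below is a placeholder for the group shared by the Lambda function and the RDS instance:
import boto3

ec2 = boto3.client('ec2')

sg_id = 'sg-0123456789abcdef0'  # placeholder: the shared security group

ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[{
        'IpProtocol': 'tcp',
        'FromPort': 3306,
        'ToPort': 3306,
        'UserIdGroupPairs': [{'GroupId': sg_id}],  # the group trusts its own members
    }],
)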

Unable to provision an AWS SQL Server RDS instance in Multi AZ using boto

I'm trying to provision an SQL Server Standard Edition AWS RDS instance which is mirrored across two AZs using boto's rds2.
Whenever I call the create_db_instance method in boto.rds2.layer1.RDSConnection with the appropriate arguments, I keep getting the following error:
boto.exception.JSONResponseError: JSONResponseError: 400 Bad Request
{'RequestId': 'fdc54b48-0586-11e5-951d-c3153310155b', 'Error': {'Message': 'To configure Multi-AZ for SQL Server DB Instances please apply or remove the "Mirroring" option using Option Groups.', 'Code': 'InvalidParameterCombination', 'Type': 'Sender'}}
I've verified that I'm setting the option multi_az = True and the option_group_name is set to an option group which has mirroring enabled. Here's my call to create_db_instance. Are there any other settings which need to be set before I can provision this RDS instance which is mirrored?
conn.create_db_instance(db_instance_identifier=new_db_name,
                        allocated_storage=allocated_storage,
                        db_instance_class=rds_instance_class,
                        master_username=master_username,
                        master_user_password=master_password,
                        port=port,
                        engine=rds_engine,
                        multi_az=rds_multi_az,
                        auto_minor_version_upgrade=auto_minor_version_upgrade,
                        db_subnet_group_name=rds_subnet_group,
                        license_model=license_model,
                        iops=iops,
                        vpc_security_group_ids=rds_vpc_security_group,
                        option_group_name=option_group_name)
I'm also seeing another issue: I can provision either with provisioned IOPS, or with magnetic disks when I remove the iops option, but I haven't figured out a way to provision with General Purpose SSDs.
I was able to get around this issue by removing the multi_az option but using an option group which has mirroring set. This way, the RDS server is provisioned across multiple AZs without me specifying the multi_az option. My new call looks like this:
conn.create_db_instance(db_instance_identifier=new_db_name,
                        allocated_storage=allocated_storage,
                        db_instance_class=rds_instance_class,
                        master_username=master_username,
                        master_user_password=master_password,
                        port=port,
                        engine=rds_engine,
                        auto_minor_version_upgrade=auto_minor_version_upgrade,
                        db_subnet_group_name=rds_subnet_group,
                        license_model=license_model,
                        iops=iops,
                        vpc_security_group_ids=domain_controller_sg,
                        option_group_name=option_group_name)
