AWS Lambda & MySQL Connection Handling - python

I am currently using AWS Lambda (Python 3.6) to talk to a MySQL database. I also have Slack commands triggering the queries to the database. On occasion, I have noticed that I can change things directly through MySQL Workbench and then trigger a query through Slack which returns old values. I currently connect to MySQL outside of the python handler like this:
BOT_TOKEN = os.environ["BOT_TOKEN"]
ASSET_TABLE = os.environ["ASSET_TABLE"]
REGION_NAME = os.getenv('REGION_NAME', 'us-east-2')
DB_NAME = os.environ["DB_NAME"]
DB_PASSWORD = os.environ["DB_PASSWORD"]
DB_DATABASE = os.environ["DB_DATABASE"]
RDS_HOST = os.environ["RDS_HOST"]
port = os.environ["port"]
try:
conn = pymysql.connect(RDS_HOST, user=DB_NAME, passwd=DB_PASSWORD, db=DB_DATABASE, connect_timeout=5, cursorclass=pymysql.cursors.DictCursor)
cursor = conn.cursor()
except:
sys.exit()
The MySQL connection is done outside of any definition at the very top of my program. When Slack sends a command, I call another definition that then queries MySQL. This works okay sometimes, but other times can send my old data that has not updated. The whole layout is like this:
imports
SQL connections
SQL query definitions
handler definition
I tried moving the MySQL connection portion inside of the handler, but then the SQL query definitions do not recognize my cursor (out of scope, I guess).
So my question is, how do I handle this MySQL connection? Is it best to keep the MySQL connection outside of any definitions? Should I open and close the connection each time? Why is my data stale? Will Lambda ALWAYS run the entire routine or can it try to split the load between servers (I swear I read somewhere that I cannot rely on Lambda to always read my entire routine; sometimes it just reads the handler)?
I'm pretty new to all this, so any suggestions are much appreciated. Thanks!
Rest of the code if it helps:
################################################################################################################################################################################################################
# Slack Lambda handler.
################################################################################################################################################################################################################
################################################################################################################################################################################################################
# IMPORTS
###############
import sys
import os
import pymysql
import urllib
import math
################################################################################################################################################################################################################
################################################################################################################################################################################################################
# Grab data from AWS environment.
###############
BOT_TOKEN = os.environ["BOT_TOKEN"]
ASSET_TABLE = os.environ["ASSET_TABLE"]
REGION_NAME = os.getenv('REGION_NAME', 'us-east-2')
DB_NAME = os.environ["DB_NAME"]
DB_PASSWORD = os.environ["DB_PASSWORD"]
DB_DATABASE = os.environ["DB_DATABASE"]
RDS_HOST = os.environ["RDS_HOST"]
port = os.environ["port"]
################################################################################################################################################################################################################
################################################################################################################################################################################################################
# Attempt SQL connection.
###############
try:
conn = pymysql.connect(RDS_HOST, user=DB_NAME, passwd=DB_PASSWORD, db=DB_DATABASE, connect_timeout=5, cursorclass=pymysql.cursors.DictCursor)
cursor = conn.cursor()
except:
sys.exit()
################################################################################################################################################################################################################
# Define the URL of the targeted Slack API resource.
SLACK_URL = "https://slack.com/api/chat.postMessage"
################################################################################################################################################################################################################
# Function Definitions.
###############
def get_userExistance(user):
statement = f"SELECT 1 FROM slackDB.users WHERE userID LIKE '%{user}%' LIMIT 1"
cursor.execute(statement, args=None)
userExists = cursor.fetchone()
return userExists
def set_User(user):
statement = f"INSERT INTO `slackDB`.`users` (`userID`) VALUES ('{user}');"
cursor.execute(statement, args=None)
conn.commit()
return
################################################################################################################################################################################################################
################################################################################################################################################################################################################
# Slack command interactions.
###############
def lambda_handler(data, context):
# Slack challenge answer.
if "challenge" in data:
return data["challenge"]
# Grab the Slack channel data.
slack_event = data['event']
slack_userID = slack_event['user']
slack_text = slack_event['text']
channel_id = slack_event['channel']
slack_reply = ""
# Check sql connection.
try:
conn = pymysql.connect(RDS_HOST, user=DB_NAME, passwd=DB_PASSWORD, db=DB_DATABASE, connect_timeout=5, cursorclass=pymysql.cursors.DictCursor)
cursor = conn.cursor()
except pymysql.OperationalError:
connected = 0
else:
connected = 1
# Ignore bot messages.
if "bot_id" in slack_event:
slack_reply = ""
else:
# Start data sift.
if slack_text.startswith("!addme"):
if get_userExistance(slack_userID):
slack_reply = f"User {slack_userID} already exists"
else:
slack_reply = f"Adding user {slack_userID}"
set_user(slack_userID)
# We need to send back three pieces of information:
data = urllib.parse.urlencode(
(
("token", BOT_TOKEN),
("channel", channel_id),
("text", slack_reply)
)
)
data = data.encode("ascii")
# Construct the HTTP request that will be sent to the Slack API.
request = urllib.request.Request(
SLACK_URL,
data=data,
method="POST"
)
# Add a header mentioning that the text is URL-encoded.
request.add_header(
"Content-Type",
"application/x-www-form-urlencoded"
)
# Fire off the request!
urllib.request.urlopen(request).read()
# Everything went fine.
return "200 OK"
################################################################################################################################################################################################################

All of the code outside the lambda handler is only run once per container. All code inside the handler is run every time the lambda is invoked.
A lambda container lasts for between 10 and 30 minutes depending on usage. A new lambda invocation may or may not run on an already running container.
It's possible you are invoking a lambda in a container that is over 5 minutes old where your connection has timed out.

Related

accessing postgresql database in python using functions

Let me start off by saying I am extremely new to Python and Postgresql so I feel like I'm in way over my head. My end goal is to get connected to the dvdrental database in postgresql and be able to access/manipulate the data. So far I have:
created a .config folder and a database.ini is within there with my login credentials.
in my src i have a config.py folder and use config parser, see below:
def config(filename='.config/database.ini', section='postgresql'):
# create a parser
parser = ConfigParser()
# read config file
parser.read(filename)
# get section, default to postgresql
db = {}
if parser.has_section(section):
params = parser.items(section)
for param in params:
db[param[0]] = param[1]
else:
raise Exception('Section {0} not found in the {1} file'.format(section, filename))
return db
then also in my src I have a tasks.py file that has a basic connect function, see below:
import pandas as pd
from clients.config import config
import psycopg
def connect():
""" Connect to the PostgreSQL database server """
conn = None
try:
# read connection parameters
params = config()
# connect to the PostgreSQL server
print('Connecting to the PostgreSQL database...')
conn = psycopg.connect(**params)
# create a cursor
cur = conn.cursor()
# execute a statement
print('PostgreSQL database version:')
cur.execute('SELECT version()')
# display the PostgreSQL database server version
db_version = cur.fetchone()
print(db_version)
# close the communication with the PostgreSQL
cur.close()
except (Exception, psycopg.DatabaseError) as error:
print(error)
finally:
if conn is not None:
conn.close()
print('Database connection closed.')
if __name__ == '__main__':
connect()
Now this runs and prints out the Postgresql database version which is all well & great but I'm struggling to figure out how to change the code so that it's more generalized and maybe just creates a cursor?
I need the connect function to basically just connect to the dvdrental database and create a cursor so that I can then use my connection to select from the database in other needed "tasks" -- for example I'd like to be able to create another function like the below:
def select_from_table(cursor, table_name, schema):
cursor.execute(f"SET search_path TO {schema}, public;")
results= cursor.execute(f"SELECT * FROM {table_name};").fetchall()
return results
but I'm struggling with how to just create a connection to the dvdrental database & a cursor so that I'm able to actually fetch data and create pandas tables with it and whatnot.
so it would be like
task 1 is connecting to the database
task 2 is interacting with the database (selecting tables and whatnot)
task 3 is converting the result from 2 into a pandas df
thanks so much for any help!! This is for a project in a class I am taking and I am extremely overwhelmed and have been googling-researching non-stop and seemingly end up nowhere fast.
The fact that you established the connection is honestly the hardest step. I know it can be overwhelming but you're on the right track.
Just copy these three lines from connect into the select_from_table method
params = config()
conn = psycopg.connect(**params)
cursor = conn.cursor()
It will look like this (also added conn.close() at the end):
def select_from_table(cursor, table_name, schema):
params = config()
conn = psycopg.connect(**params)
cursor = conn.cursor()
cursor.execute(f"SET search_path TO {schema}, public;")
results= cursor.execute(f"SELECT * FROM {table_name};").fetchall()
conn.close()
return results

Python SQLalchemy - can I pass the connection object between functions?

I have a python application that is reading from mysql/mariadb, uses that to fetch data from an api and then inserts results into another table.
I had setup a module with a function to connect to the database and return the connection object that is passed to other functions/modules. However, I believe this might not be a correct approach. The idea was to have a small module that I could just call whenever I needed to connect to the db.
Also note, that I am using the same connection object during loops (and within the loop passing to the db_update module) and call close() when all is done.
I am also getting some warnings from the db sometimes, those mostly happen at the point where I call db_conn.close(), so I guess I am not handling the connection or session/engine correctly. Also, the connection id's in the log warning keep increasing, so that is another hint, that I am doing it wrong.
[Warning] Aborted connection 351 to db: 'some_db' user: 'some_user' host: '172.28.0.3' (Got an error reading communication packets)
Here is some pseudo code that represents the structure I currently have:
################
## db_connect.py
################
# imports ...
from sqlalchemy import create_engine
def db_connect():
# get env ...
db_string = f"mysql+pymysql://{db_user}:{db_pass}#{db_host}:{db_port}/{db_name}"
try:
engine = create_engine(db_string)
except Exception as e:
return None
db_conn = engine.connect()
return db_conn
################
## db_update.py
################
# imports ...
def db_insert(db_conn, api_result):
# ...
ins_qry = "INSERT INTO target_table (attr_a, attr_b) VALUES (:a, :b);"
ins_qry = text(ins_qry)
ins_qry = ins_qry.bindparams(a = value_a, b = value_b)
try:
db_conn.execute(ins_qry)
except Exception as e:
print(e)
return None
return True
################
## main.py
################
from sqlalchemy import text
from db_connect import db_connect
from db_update import db_insert
def run():
try:
db_conn = db_connect()
if not db_conn:
return False
except Exception as e:
print(e)
qry = "SELECT *
FROM some_table
WHERE some_attr IN (:some_value);"
qry = text(qry)
search_run_qry = qry.bindparams(
some_value = 'abc'
)
result_list = db_conn.execute(qry).fetchall()
for result_item in result_list:
## do stuff like fetching data from api for every record in the query result
api_result = get_api_data(...)
## insert into db:
db_ins_status = db_insert(db_conn, api_result)
## ...
db_conn.close
run()
EDIT: Another question:
a) Is it ok in a loop, that does an update on every iteration to use the same connection, or would it be wiser to instead pass the engine to the run() function and call db_conn = engine.connect() and db_conn.close() just before and after each update?
b) I am thinking about using ThreadPoolExecutor instead of the loop for the API calls. Would this have implications on how to use the connection, i.e. can I use the same connection for multiple threads that are doing updates to the same table?
Note: I am not using the ORM feature mostly because I have a strong DWH/SQL background (though not so much as DBA) and I am used to writing even complex sql queries. I am thinking about switching to just using PyMySQL connector for that reason.
Thanks in advance!
Yes you can return/pass connection object as parameter but what is the aim of db_connect method, except testing connection ? As I see there is no aim of this db_connect method therefore I would recommend you to do this as I done it before.
I would like to share a code snippet from one of my project.
def create_record(sql_query: str, data: tuple):
try:
connection = mysql_obj.connect()
db_cursor = connection.cursor()
db_cursor.execute(sql_query, data)
connection.commit()
return db_cursor, connection
except Exception as error:
print(f'Connection failed error message: {error}')
and then using this one as for another my need
db_cursor, connection, query_data = fetch_data(sql_query, query_data)
and after all my needs close the connection with this method and method call.
def close_connection(connection, db_cursor):
"""
This method used to close SQL server connection
"""
db_cursor.close()
connection.close()
and calling method
close_connection(connection, db_cursor)
I am not sure can I share my github my check this link please. Under model.py you can see database methods and to see how calling them check it main.py
Best,
Hasan.

Can't connect to Azure Postgresql with a Python Function App

I've followed this, this and this and a lot of other resources and still can't get connected to the database. That's why I've opened this question. I'm beyond frustrated. Hopefully someone can steer me into the right direction. Below is the steps that I've done.
I did steps 1-3 from this article. By the way should I use the Application ID of the azure function or the Principal ID of the System Assigned Identity when creating a role for the function in the database? I've used the Application ID.
I've added all the possible outbound ip addresses of the azure function to pass the database firewall.
Function is on Linux Consumption plan. According to this article you need to use the 2017-09-01 api version if function is on Linux Consumption plan.
I did not find anything in the function properties/configuration on os.environ["MSI_ENDPOINT"], os.environ["MSI_SECRET"] so I'm assuming that these are being assigned by microsoft when the function gets executed. Here's the Exception that I'm getting when running the function:
"Exception while executing function: Functions.FunctionTrigger Result: Failure
Exception: UnboundLocalError: local variable 'connection' referenced before assignment. if connection:"
Furthermore, I can not see any logs even though I'm writing them in the function body. Not in the function insights nor in the storage account that is defined for the function. So basically I'm flying blind. Moreover, initially I was using psycopg2 and I was receiving the exception in here. Then I switched to psycopg2-binary and the exception went away. Any help would be appreciated.
import logging
import os
import azure.functions as func
import psycopg2
from psycopg2 import Error
import requests
def main(mytimer: func.TimerRequest) -> None:
utc_timestamp = datetime.datetime.utcnow().replace(
tzinfo=datetime.timezone.utc).isoformat()
if mytimer.past_due:
logging.info('The timer is past due!')
logging.info('Python timer trigger function ran at %s', utc_timestamp)
try:
#get access token
# identity_endpoint = os.environ["IDENTITY_ENDPOINT"]
# identity_header = os.environ["IDENTITY_HEADER"]
# resource_uri="https://database.windows.net/"
# token_auth_uri = f"{identity_endpoint}?resource={resource_uri}&api-version=2019-08-01"
# head_msi = {'X-IDENTITY-HEADER':identity_header}
# resp = requests.get(token_auth_uri, headers=head_msi)
# access_token = resp.json()['access_token']
msi_endpoint = os.environ["MSI_ENDPOINT"]
msi_header = os.environ["MSI_SECRET"]
# resource_uri="https://database.windows.net/"
resource_uri="https://ossrdbms-aad.database.windows.net"
token_auth_uri = f"{msi_endpoint}?resource={resource_uri}&api-version=2017-09-01"
head_msi = {'secret':msi_header}
resp = requests.get(token_auth_uri, headers=head_msi)
access_token = resp.json()['access_token']
logging.info(msi_endpoint)
logging.info(msi_header)
logging.info(access_token)
USER = 'name of the role that I created for the function'
connection = psycopg2.connect(
user = USER,
password = access_token,
host = HOST,
database = DB,
port = '5432'
)
cursor = connection.cursor()
query = "SELECT * FROM table;"
cursor.execute(query)
except (Exception, Error) as error:
print(error)
logging.info(error)
finally:
if connection:
cursor.close()
connection.close()
print("PostgreSQL connection is closed")
I am not sure if your requirement is to connect to Azure PostgreSQL using Managed Identity. If not, then the possible issue is using psycopg2. use psycopg2-binary instead. Add psycopg2-binary to your requirements.txt. This will fix the issue.

How to get the column names in redshift using Python boto3

I want to get the column names in redshift using python boto3
Creaed Redshift Cluster
Insert Data into it
Configured Secrets Manager
Configure SageMaker Notebook
Open the Jupyter Notebook wrote the below code
import boto3
import time
client = boto3.client('redshift-data')
response = client.execute_statement(ClusterIdentifier = "test", Database= "dev", SecretArn= "{SECRET-ARN}",Sql= "SELECT `COLUMN_NAME` FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE `TABLE_SCHEMA`='dev' AND `TABLE_NAME`='dojoredshift'")
I got the response but there is no table schema inside it
Below is the code i used to connect I am getting timed out
import psycopg2
HOST = 'xx.xx.xx.xx'
PORT = 5439
USER = 'aswuser'
PASSWORD = 'Password1!'
DATABASE = 'dev'
def db_connection():
conn = psycopg2.connect(host=HOST,port=PORT,user=USER,password=PASSWORD,database=DATABASE)
return conn
How to get the ip address go to https://ipinfo.info/html/ip_checker.php
pass your hostname of redshiftcluster xx.xx.us-east-1.redshift.amazonaws.com or you can see in cluster page itself
I got the error while running above code
OperationalError: could not connect to server: Connection timed out
Is the server running on host "x.xx.xx..xx" and accepting
TCP/IP connections on port 5439?
I fixed with the code, and add the above the rules
import boto3
import psycopg2
# Credentials can be set using different methodologies. For this test,
# I ran from my local machine which I used cli command "aws configure"
# to set my Access key and secret access key
client = boto3.client(service_name='redshift',
region_name='us-east-1')
#
#Using boto3 to get the Database password instead of hardcoding it in the code
#
cluster_creds = client.get_cluster_credentials(
DbUser='awsuser',
DbName='dev',
ClusterIdentifier='redshift-cluster-1',
AutoCreate=False)
try:
# Database connection below that uses the DbPassword that boto3 returned
conn = psycopg2.connect(
host = 'redshift-cluster-1.cvlywrhztirh.us-east-1.redshift.amazonaws.com',
port = '5439',
user = cluster_creds['DbUser'],
password = cluster_creds['DbPassword'],
database = 'dev'
)
# Verifies that the connection worked
cursor = conn.cursor()
cursor.execute("SELECT VERSION()")
results = cursor.fetchone()
ver = results[0]
if (ver is None):
print("Could not find version")
else:
print("The version is " + ver)
except:
logger.exception('Failed to open database connection.')
print("Failed")

How to handle MySQL connection(s) with Python multithreading

I have a main Python script which connects to a MySQL database and pulls out few records from it. Based on the result returned it starts as many threads (class instances) as many records are grabbed. Each thread should go back to the database and update another table by setting one status flag to a different state ("process started").
To achieve this I tried to:
1.) Pass the database connection to all threads
2.) Open a new database connection from each thread
but none of them were working.
I could run my update without any issue in both cases by using try/except, but the MySQL table has not been updated, and no error was generated. I used commit in both cases.
My question would be how to handle MySQL connection(s) in such a case?
Update based on the first few comments:
MAIN SCRIPT
-----------
#Connecting to DB
db = MySQLdb.connect(host = db_host,
db = db_db,
port = db_port,
user = db_user,
passwd = db_password,
charset='utf8')
# Initiating database cursor
cur = db.cursor()
# Fetching records for which I need to initiate a class instance
cur.execute('SELECT ...')
for row in cur.fetchall() :
# Initiating new instance, appending it to a list and
# starting all of them
CLASS WHICH IS INSTANTIATED
---------------------------
# Connecting to DB again. I also tried to pass connection
# which has been opened in the main script but it did not
# work either.
db = MySQLdb.connect(host = db_host,
db = db_db,
port = db_port,
user = db_user,
passwd = db_password,
charset='utf8')
# Initiating database cursor
cur_class = db.cursor()
cur.execute('UPDATE ...')
db.commit()
Here is an example using multithreading deal mysql in Python, I don't know
your table and data, so, just change the code may help:
import threading
import time
import MySQLdb
Num_Of_threads = 5
class myThread(threading.Thread):
def __init__(self, conn, cur, data_to_deal):
threading.Thread.__init__(self)
self.threadID = threadID
self.conn = conn
self.cur = cur
self.data_to_deal
def run(self):
# add your sql
sql = 'insert into table id values ({0});'
for i in self.data_to_deal:
self.cur.execute(sql.format(i))
self.conn.commit()
threads = []
data_list = [1,2,3,4,5]
for i in range(Num_Of_threads):
conn = MySQLdb.connect(host='localhost',user='root',passwd='',db='')
cur = conn.cursor()
new_thread = myThread(conn, cur, data_list[i])
for th in threads:
th.start()
for t in threads:
t.join()
It seems there's no problem with my code but with my MySQL version. I'm using MySQL standard community edition and based on the official documentation found here :
The thread pool plugin is a commercial feature. It is not included in MySQL community distributions.
I'm about to upgrade to MariaDB to solve this issue.
Looks like mysql 5.7 does support multithreading.
As you tried previously - absolutely make sure to pass the connection within the def worker(). defining the connections globally was my mistake
Here's sample code that prints 10 records via 5 threads, 5 times
import MySQLdb
import threading
def write_good_proxies():
local_db = MySQLdb.connect("localhost","username","PassW","DB", port=3306 )
local_cursor = local_db.cursor (MySQLdb.cursors.DictCursor)
sql_select = 'select http from zproxies where update_time is null order by rand() limit 10'
local_cursor.execute(sql_select)
records = local_cursor.fetchall()
id_list = [f['http'] for f in records]
print id_list
def worker():
x=0
while x< 5:
x = x+1
write_good_proxies()
threads = []
for i in range(5):
print i
t = threading.Thread(target=worker)
threads.append(t)
t.start()

Categories

Resources