I've followed this, this and this and a lot of other resources and still can't connect to the database. That's why I've opened this question. I'm beyond frustrated. Hopefully someone can steer me in the right direction. Below are the steps I've taken.
I did steps 1-3 from this article. By the way, should I use the Application ID of the Azure Function or the Principal ID of the system-assigned identity when creating a role for the function in the database? I've used the Application ID.
I've added all the possible outbound IP addresses of the Azure Function to the database firewall.
The function is on a Linux Consumption plan. According to this article, you need to use the 2017-09-01 API version if the function is on a Linux Consumption plan.
I did not find anything in the function properties/configuration for os.environ["MSI_ENDPOINT"] or os.environ["MSI_SECRET"], so I'm assuming these are assigned by Microsoft when the function gets executed. Here's the exception I'm getting when running the function:
"Exception while executing function: Functions.FunctionTrigger Result: Failure
Exception: UnboundLocalError: local variable 'connection' referenced before assignment. if connection:"
Furthermore, I cannot see any logs even though I'm writing them in the function body, neither in the function's Application Insights nor in the storage account defined for the function. So basically I'm flying blind. Moreover, I was initially using psycopg2 and was receiving the exception here. Then I switched to psycopg2-binary and that exception went away. Any help would be appreciated.
import datetime
import logging
import os

import azure.functions as func
import psycopg2
from psycopg2 import Error
import requests


def main(mytimer: func.TimerRequest) -> None:
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    if mytimer.past_due:
        logging.info('The timer is past due!')

    logging.info('Python timer trigger function ran at %s', utc_timestamp)

    try:
        # get access token
        # identity_endpoint = os.environ["IDENTITY_ENDPOINT"]
        # identity_header = os.environ["IDENTITY_HEADER"]
        # resource_uri = "https://database.windows.net/"
        # token_auth_uri = f"{identity_endpoint}?resource={resource_uri}&api-version=2019-08-01"
        # head_msi = {'X-IDENTITY-HEADER': identity_header}
        # resp = requests.get(token_auth_uri, headers=head_msi)
        # access_token = resp.json()['access_token']

        msi_endpoint = os.environ["MSI_ENDPOINT"]
        msi_header = os.environ["MSI_SECRET"]
        # resource_uri = "https://database.windows.net/"
        resource_uri = "https://ossrdbms-aad.database.windows.net"
        token_auth_uri = f"{msi_endpoint}?resource={resource_uri}&api-version=2017-09-01"
        head_msi = {'secret': msi_header}

        resp = requests.get(token_auth_uri, headers=head_msi)
        access_token = resp.json()['access_token']

        logging.info(msi_endpoint)
        logging.info(msi_header)
        logging.info(access_token)

        USER = 'name of the role that I created for the function'
        connection = psycopg2.connect(
            user=USER,
            password=access_token,
            host=HOST,      # HOST and DB values are not shown in the question
            database=DB,
            port='5432'
        )
        cursor = connection.cursor()
        query = "SELECT * FROM table;"
        cursor.execute(query)
    except (Exception, Error) as error:
        print(error)
        logging.info(error)
    finally:
        if connection:
            cursor.close()
            connection.close()
            print("PostgreSQL connection is closed")
I am not sure whether your requirement is to connect to Azure PostgreSQL using Managed Identity. If not, then the likely issue is using psycopg2; use psycopg2-binary instead. Add psycopg2-binary to your requirements.txt and this should fix the issue.
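As an aside, the UnboundLocalError in the question is only a secondary symptom: psycopg2.connect() raises before connection is ever assigned, so the finally block references a name that does not exist and the real connection error is masked. A minimal sketch of the usual guard, reusing the names from the question's code, would be:
connection = None
cursor = None
try:
    connection = psycopg2.connect(user=USER,
                                  password=access_token,
                                  host=HOST,
                                  database=DB,
                                  port='5432')
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM table;")
except (Exception, Error) as error:
    logging.exception(error)
finally:
    if cursor:
        cursor.close()
    if connection:
        connection.close()
        print("PostgreSQL connection is closed")
With that in place, the underlying psycopg2/authentication error should show up in the except block instead of the UnboundLocalError.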
I have a Python application that reads from MySQL/MariaDB, uses that data to fetch records from an API, and then inserts the results into another table.
I had set up a module with a function that connects to the database and returns the connection object, which is then passed to other functions/modules. However, I believe this might not be the correct approach. The idea was to have a small module that I could simply call whenever I needed to connect to the DB.
Also note that I am using the same connection object inside loops (passing it to the db_update module within the loop) and only call close() when everything is done.
I am also sometimes getting warnings from the DB; those mostly happen at the point where I call db_conn.close(), so I guess I am not handling the connection or session/engine correctly. Also, the connection IDs in the log warnings keep increasing, which is another hint that I am doing it wrong.
[Warning] Aborted connection 351 to db: 'some_db' user: 'some_user' host: '172.28.0.3' (Got an error reading communication packets)
Here is some pseudo code that represents the structure I currently have:
################
## db_connect.py
################
# imports ...
from sqlalchemy import create_engine

def db_connect():
    # get env ...
    db_string = f"mysql+pymysql://{db_user}:{db_pass}@{db_host}:{db_port}/{db_name}"
    try:
        engine = create_engine(db_string)
    except Exception as e:
        return None
    db_conn = engine.connect()
    return db_conn

################
## db_update.py
################
# imports ...
from sqlalchemy import text

def db_insert(db_conn, api_result):
    # ...
    ins_qry = "INSERT INTO target_table (attr_a, attr_b) VALUES (:a, :b);"
    ins_qry = text(ins_qry)
    ins_qry = ins_qry.bindparams(a=value_a, b=value_b)
    try:
        db_conn.execute(ins_qry)
    except Exception as e:
        print(e)
        return None
    return True

################
## main.py
################
from sqlalchemy import text
from db_connect import db_connect
from db_update import db_insert

def run():
    try:
        db_conn = db_connect()
        if not db_conn:
            return False
    except Exception as e:
        print(e)

    qry = """SELECT *
             FROM some_table
             WHERE some_attr IN (:some_value);"""
    qry = text(qry)
    search_run_qry = qry.bindparams(
        some_value='abc'
    )
    result_list = db_conn.execute(search_run_qry).fetchall()

    for result_item in result_list:
        ## do stuff like fetching data from api for every record in the query result
        api_result = get_api_data(...)
        ## insert into db:
        db_ins_status = db_insert(db_conn, api_result)
        ## ...

    db_conn.close()

run()
EDIT: Two more questions:
a) Is it OK, in a loop that does an update on every iteration, to use the same connection, or would it be wiser to pass the engine to the run() function instead and call db_conn = engine.connect() and db_conn.close() just before and after each update? (See the sketch below.)
b) I am thinking about using a ThreadPoolExecutor instead of the loop for the API calls. Would this have implications for how the connection is used, i.e. can I use the same connection from multiple threads that are doing updates to the same table?
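For what it's worth on a) and b): the usual SQLAlchemy pattern is one Engine per application and a short-lived Connection per unit of work; the Engine's connection pool is safe to share across threads, while a single Connection object is not. A rough sketch of that shape, with made-up helper names and assuming api_results is whatever your loop currently produces:
from concurrent.futures import ThreadPoolExecutor
from sqlalchemy import create_engine, text

engine = create_engine(db_string)  # one engine for the whole application; db_string as in db_connect.py

def insert_api_result(api_result):
    # each call checks a connection out of the pool, commits, and returns it to the pool when done
    with engine.begin() as conn:
        conn.execute(
            text("INSERT INTO target_table (attr_a, attr_b) VALUES (:a, :b)"),
            {"a": api_result["a"], "b": api_result["b"]},  # hypothetical mapping from the API result
        )

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(insert_api_result, api_results))  # api_results: iterable of results from the API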
Note: I am not using the ORM feature, mostly because I have a strong DWH/SQL background (though not so much as a DBA) and I am used to writing even complex SQL queries by hand. I am thinking about switching to plain PyMySQL for that reason.
Thanks in advance!
Yes, you can return the connection object and pass it around as a parameter, but what is the purpose of the db_connect method other than testing the connection? As far as I can see it does not add much, so I would recommend doing it the way I have done it before.
I would like to share a code snippet from one of my projects.
def create_record(sql_query: str, data: tuple):
    try:
        connection = mysql_obj.connect()
        db_cursor = connection.cursor()
        db_cursor.execute(sql_query, data)
        connection.commit()
        return db_cursor, connection
    except Exception as error:
        print(f'Connection failed error message: {error}')
and then, for another need of mine, I use a similar fetch helper like this:
db_cursor, connection, query_data = fetch_data(sql_query, query_data)
and when I'm done, I close the connection with this method:
def close_connection(connection, db_cursor):
    """
    This method is used to close the SQL server connection
    """
    db_cursor.close()
    connection.close()
and the call:
close_connection(connection, db_cursor)
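Putting the two helpers together, a hypothetical end-to-end call (table, columns, and values below are placeholders of mine, not from the original code) might look like:
db_cursor, connection = create_record(
    "INSERT INTO users (name, email) VALUES (%s, %s)",   # placeholder query
    ("Alice", "alice@example.com"),                       # placeholder data
)
close_connection(connection, db_cursor)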
I am not sure whether I can share my GitHub, but please check this link. Under model.py you can see the database methods, and to see how they are called, check main.py.
Best,
Hasan.
I'm trying to create a Prefect task that receives a PyMySQL connection instance as input, such as:
from typing import Any

import pandas as pd
import pymysql
from prefect import task, Flow

@task
def connect_db():
    connection = pymysql.connect(user=user,
                                 password=password,
                                 host=host,
                                 port=port,
                                 db=db,
                                 connect_timeout=5,
                                 cursorclass=pymysql.cursors.DictCursor,
                                 local_infile=True)
    return connection

@task
def query_db(connection) -> Any:
    query = 'SELECT * FROM myschema.mytable;'
    with connection.cursor() as cur:
        cur.execute(query)
        rows = cur.fetchall()
    return rows

@task
def get_df(rows) -> Any:
    return pd.DataFrame(rows, dtype=str)

@task
def save_csv(df):
    path = 'mypath'
    df.to_csv(path, sep=';', index=False)

with Flow(FLOW_NAME) as f:
    con = connect_db()
    rows = query_db(con)
    df = get_df(rows)
    save_csv(df)
However, when I try to register the resulting flow, it raises "TypeError: cannot pickle 'socket' object". Going through Prefect's docs, I found the built-in MySQL tasks (https://docs.prefect.io/api/latest/tasks/mysql.html#mysqlexecute), but they open and close a connection each time they're called. Is there any way to pass a previously opened connection to a Prefect task (or to implement something like a connection manager)?
I tried to replicate your example but it registers fine. The most common way an error like this pops up is if you have a client in the global namespace that the flow uses; Prefect will try to serialize that upon registration. For example, the following code snippet will error if you try to register it:
from typing import Any

import pymysql
from prefect import task, Flow

connection = pymysql.connect(user=user,
                             password=password,
                             host=host,
                             port=port,
                             db=db,
                             connect_timeout=5,
                             cursorclass=pymysql.cursors.DictCursor,
                             local_infile=True)

@task
def query_db(connection) -> Any:
    query = 'SELECT * FROM myschema.mytable;'
    with connection.cursor() as cur:
        cur.execute(query)
        rows = cur.fetchall()
    return rows

with Flow(FLOW_NAME) as f:
    rows = query_db(connection)
This errors because the connection variable is serialized along with the flow object. You can work around this by storing your Flow as a script. See this link for more information:
https://docs.prefect.io/core/idioms/script-based.html#using-script-based-flow-storage
This will avoid the serialization of the Flow object and create that connection during runtime.
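As a rough illustration of the linked idiom with Local storage, using the f flow object from the question (the path and project name are placeholders, and the exact arguments depend on your Prefect version and storage backend):
from prefect.storage import Local

# Keep the flow as a script and re-run it at flow-run time,
# instead of pickling the Flow object (and the connection) at registration time.
f.storage = Local(path="/path/to/flow.py", stored_as_script=True)
f.register(project_name="my-project")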
If this happens during runtime
If you encounter this error during runtime, there are two likely causes: the first is Dask serializing the connection, and the second is Prefect's checkpointing.
Dask uses cloudpickle to send the data to the workers across a network. So if you use Prefect with a DaskExecutor, it will use cloudpickle to send the tasks for execution. Thus, task inputs and outputs need to be serializable. In this scenario, you should instantiate the Client and perform the query inside a task (like you saw with the current MySQL Task implementation)
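In other words, a self-contained sketch (same placeholder credentials as the question) where the connection lives and dies inside a single task, so nothing unpicklable ever crosses a task boundary:
import pymysql
from prefect import task

@task
def query_table():
    # connect, query, and close within one task run
    connection = pymysql.connect(user=user, password=password, host=host, port=port,
                                 db=db, connect_timeout=5,
                                 cursorclass=pymysql.cursors.DictCursor)
    try:
        with connection.cursor() as cur:
            cur.execute('SELECT * FROM myschema.mytable;')
            return cur.fetchall()
    finally:
        connection.close()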
If you use a LocalExecutor, task outputs are serialized by default because checkpointing is on by default. You can turn this off by passing checkpoint=False when you define the task.
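For example, applied to the connect_db task from the question (the credentials are the question's placeholders):
import pymysql
from prefect import task

@task(checkpoint=False)  # do not persist the returned connection object as a task result
def connect_db():
    return pymysql.connect(user=user, password=password, host=host, port=port,
                           db=db, connect_timeout=5,
                           cursorclass=pymysql.cursors.DictCursor,
                           local_infile=True)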
If you need further help, feel free to join the Prefect Slack channel at prefect.io/slack .
I have a MySQL server installed locally, and I have Python code that accesses the MySQL database and executes a simple query:
from mysql.connector import connect
from mysql.connector import ProgrammingError

DB = {
    'user': 'andrei',
    'password': 'qwertttyy',
    'host': 'localhost',
    'port': '3306',
    'db': 'my_database'
}

class Connection:
    instance = None

    def __new__(cls):
        if not cls.instance:
            try:
                cls.instance = connect(**DB)
            except:
                raise
        return cls.instance

def executeDQL(query):
    cnx = Connection()
    cursor = cnx.cursor()
    try:
        cursor.execute(query)
        return cursor.fetchall()
    except ProgrammingError as err:
        print('You have an error in your MySQL syntax. Please check and retry')
        return []

if __name__ == '__main__':
    while True:
        query = input('Enter a SQL query: ')
        for tuple in executeDQL(query):
            print(tuple)
If I go out there, find a cloud MySQL hosting service, and pay for it, would access be as easy as changing the DB mapping to the new connection info?
I think it should be, because the connection would still go over standard TCP/IP, except that in this case it happens to come back to the same machine that is emitting it. I guess that, under the hood, the data is packed following TCP/IP rules up to the IP layer and then transferred as IP packets from the Python process through the OS networking API to the MySQL server listening on the port, without further processing down into the network access layer, since the packets never leave the machine. As I understand it, the purpose of the access layer of the TCP/IP stack is to abstract the physical road the data takes.
Does my guess make sense?
If I'm wrong, how can I put a MySQL server in the cloud?
Yes, how you connect to the database would not change. It will be as simple as changing the host name and providing whatever credentials you need (access token, user info, etc.). The way you insert data doesn't change once you have made a connection to the DB.
Here is a good script which should provide some info: https://gist.github.com/kirang89/7161185
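For illustration, assuming a hosted MySQL instance (the hostname and credentials below are placeholders), the only change to the question's code would be the DB mapping:
DB = {
    'user': 'andrei',
    'password': 'a-much-stronger-password',      # placeholder credentials
    'host': 'mydb.example-cloud-provider.com',   # placeholder cloud hostname
    'port': '3306',
    'db': 'my_database'
}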
I am currently using AWS Lambda (Python 3.6) to talk to a MySQL database. I also have Slack commands triggering the queries to the database. On occasion, I have noticed that I can change things directly through MySQL Workbench and then trigger a query through Slack which returns old values. I currently connect to MySQL outside of the python handler like this:
BOT_TOKEN = os.environ["BOT_TOKEN"]
ASSET_TABLE = os.environ["ASSET_TABLE"]
REGION_NAME = os.getenv('REGION_NAME', 'us-east-2')
DB_NAME = os.environ["DB_NAME"]
DB_PASSWORD = os.environ["DB_PASSWORD"]
DB_DATABASE = os.environ["DB_DATABASE"]
RDS_HOST = os.environ["RDS_HOST"]
port = os.environ["port"]
try:
    conn = pymysql.connect(RDS_HOST, user=DB_NAME, passwd=DB_PASSWORD, db=DB_DATABASE, connect_timeout=5, cursorclass=pymysql.cursors.DictCursor)
    cursor = conn.cursor()
except:
    sys.exit()
The MySQL connection is made outside of any function definition, at the very top of my program. When Slack sends a command, I call another function that then queries MySQL. This works okay sometimes, but at other times it returns old data that has not been updated. The whole layout is like this:
imports
SQL connections
SQL query definitions
handler definition
I tried moving the MySQL connection portion inside the handler, but then the SQL query functions do not recognize my cursor (out of scope, I guess).
So my questions are: how do I handle this MySQL connection? Is it best to keep the MySQL connection outside of any function definitions? Should I open and close the connection each time? Why is my data stale? Will Lambda ALWAYS run the entire routine, or can it split the load between servers (I swear I read somewhere that I cannot rely on Lambda to always run my entire file; sometimes it just runs the handler)?
I'm pretty new to all this, so any suggestions are much appreciated. Thanks!
Rest of the code if it helps:
################################################################################################################################################################################################################
# Slack Lambda handler.
################################################################################################################################################################################################################
################################################################################################################################################################################################################
# IMPORTS
###############
import sys
import os
import pymysql
import urllib.parse
import urllib.request
import math
################################################################################################################################################################################################################
################################################################################################################################################################################################################
# Grab data from AWS environment.
###############
BOT_TOKEN = os.environ["BOT_TOKEN"]
ASSET_TABLE = os.environ["ASSET_TABLE"]
REGION_NAME = os.getenv('REGION_NAME', 'us-east-2')
DB_NAME = os.environ["DB_NAME"]
DB_PASSWORD = os.environ["DB_PASSWORD"]
DB_DATABASE = os.environ["DB_DATABASE"]
RDS_HOST = os.environ["RDS_HOST"]
port = os.environ["port"]
################################################################################################################################################################################################################
################################################################################################################################################################################################################
# Attempt SQL connection.
###############
try:
    conn = pymysql.connect(RDS_HOST, user=DB_NAME, passwd=DB_PASSWORD, db=DB_DATABASE, connect_timeout=5, cursorclass=pymysql.cursors.DictCursor)
    cursor = conn.cursor()
except:
    sys.exit()
################################################################################################################################################################################################################
# Define the URL of the targeted Slack API resource.
SLACK_URL = "https://slack.com/api/chat.postMessage"
################################################################################################################################################################################################################
# Function Definitions.
###############
def get_userExistance(user):
    statement = f"SELECT 1 FROM slackDB.users WHERE userID LIKE '%{user}%' LIMIT 1"
    cursor.execute(statement, args=None)
    userExists = cursor.fetchone()
    return userExists

def set_User(user):
    statement = f"INSERT INTO `slackDB`.`users` (`userID`) VALUES ('{user}');"
    cursor.execute(statement, args=None)
    conn.commit()
    return
################################################################################################################################################################################################################
################################################################################################################################################################################################################
# Slack command interactions.
###############
def lambda_handler(data, context):
    # Slack challenge answer.
    if "challenge" in data:
        return data["challenge"]

    # Grab the Slack channel data.
    slack_event = data['event']
    slack_userID = slack_event['user']
    slack_text = slack_event['text']
    channel_id = slack_event['channel']
    slack_reply = ""

    # Check sql connection.
    try:
        conn = pymysql.connect(RDS_HOST, user=DB_NAME, passwd=DB_PASSWORD, db=DB_DATABASE, connect_timeout=5, cursorclass=pymysql.cursors.DictCursor)
        cursor = conn.cursor()
    except pymysql.OperationalError:
        connected = 0
    else:
        connected = 1

    # Ignore bot messages.
    if "bot_id" in slack_event:
        slack_reply = ""
    else:
        # Start data sift.
        if slack_text.startswith("!addme"):
            if get_userExistance(slack_userID):
                slack_reply = f"User {slack_userID} already exists"
            else:
                slack_reply = f"Adding user {slack_userID}"
                set_User(slack_userID)

    # We need to send back three pieces of information:
    data = urllib.parse.urlencode(
        (
            ("token", BOT_TOKEN),
            ("channel", channel_id),
            ("text", slack_reply)
        )
    )
    data = data.encode("ascii")

    # Construct the HTTP request that will be sent to the Slack API.
    request = urllib.request.Request(
        SLACK_URL,
        data=data,
        method="POST"
    )

    # Add a header mentioning that the text is URL-encoded.
    request.add_header(
        "Content-Type",
        "application/x-www-form-urlencoded"
    )

    # Fire off the request!
    urllib.request.urlopen(request).read()

    # Everything went fine.
    return "200 OK"
################################################################################################################################################################################################################
All of the code outside the lambda handler is only run once per container. All code inside the handler is run every time the lambda is invoked.
A lambda container lasts for between 10 and 30 minutes depending on usage. A new lambda invocation may or may not run on an already running container.
It's possible you are invoking a lambda in a container that is over 5 minutes old where your connection has timed out.
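One way to make the handler robust against that, sketched with the names from the question (get_connection is a helper name I made up, and this assumes the stale reads really do come from a dead or reused connection), is to verify or re-establish the connection inside the handler and pass it to the query helpers explicitly:
import pymysql

conn = None  # container-level cache, as before; RDS_HOST etc. come from the environment as in the question

def get_connection():
    # Reuse the container-level connection when it is still alive, otherwise reconnect.
    # pymysql's ping(reconnect=True) covers both checks.
    global conn
    if conn is None:
        conn = pymysql.connect(host=RDS_HOST, user=DB_NAME, passwd=DB_PASSWORD,
                               db=DB_DATABASE, connect_timeout=5,
                               cursorclass=pymysql.cursors.DictCursor)
    else:
        conn.ping(reconnect=True)
    return conn

def get_userExistance(cursor, user):
    # the cursor is passed in explicitly instead of relying on module scope
    cursor.execute("SELECT 1 FROM slackDB.users WHERE userID LIKE %s LIMIT 1", (f"%{user}%",))
    return cursor.fetchone()

def lambda_handler(data, context):
    connection = get_connection()
    with connection.cursor() as cursor:
        user_exists = get_userExistance(cursor, data['event']['user'])
    # ... rest of the handler unchanged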
I am working on a web service with Twisted that is responsible for calling up several packages I had previously used on the command line. The routines these packages handle were being prototyped on their own but now are ready to be integrated into our webservice.
In short, I have several different modules that all create a mysql connection property internally in their original command line forms. Take this for example:
class searcher:
    def __init__(self, lat, lon, radius):
        self.conn = getConnection()[1]
        self.con = self.conn.cursor()
        self.mgo = getConnection(True)
        self.lat = lat
        self.lon = lon
        self.radius = radius
        self.profsinrange()
        self.cache = memcache.Client(["173.220.194.84:11211"])
The getConnection function is just a helper that returns a mongo or mysql cursor respectively. Again, this is all prototypical :)
The problem I am experiencing is that, when this is deployed as a continuously running server using Twisted's WSGI resource, the SQL connection created in __init__ times out, and subsequent requests don't seem to regenerate it. Example code for the small server app:
from twisted.web import server
from twisted.web.wsgi import WSGIResource
from twisted.python.threadpool import ThreadPool
from twisted.internet import reactor
from twisted.application import service, strports
import cgi
import gnengine
import nn

wsgiThreadPool = ThreadPool()
wsgiThreadPool.start()

# ensuring that it will be stopped when the reactor shuts down
reactor.addSystemEventTrigger('after', 'shutdown', wsgiThreadPool.stop)

def application(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    params = cgi.parse_qs(environ['QUERY_STRING'])
    try:
        lat = float(params['lat'][0])
        lon = float(params['lon'][0])
        radius = int(params['radius'][0])
        query_terms = params['query']
        s = gnengine.searcher(lat, lon, radius)
        query_terms = ' '.join(query_terms)
        json = s.query(query_terms)
        return [json]
    except Exception, e:
        return [str(e), str(params)]
    return ['error']

wsgiAppAsResource = WSGIResource(reactor, wsgiThreadPool, application)

# Hooks for twistd
application = service.Application('Twisted.web.wsgi Hello World Example')
server = strports.service('tcp:8080', server.Site(wsgiAppAsResource))
server.setServiceParent(application)
The first few requests work fine, but after MySQL's wait_timeout expires, the dreaded error 2006 "MySQL server has gone away" surfaces. It had been my understanding that every request to the Twisted WSGI resource would run the application function, thereby regenerating the searcher object and re-leasing the connection. If this isn't the case, how can I make the requests be processed that way? Is this kind of Twisted deployment not transactional in this sense? Thanks!
EDIT: Per request, here is the prototype helper function calling up the connection:
def getConnection(mong=False):
    if mong == False:
        connection = mysql.connect(host=db_host,
                                   user=db_user,
                                   passwd=db_pass,
                                   db=db,
                                   cursorclass=mysql.cursors.DictCursor)
        cur = connection.cursor()
        return (cur, connection)
    else:
        return pymongo.Connection('173.220.194.84', 27017).gonation_test
I was developing a piece of software with Twisted where I had to maintain a constant MySQL database connection. I ran into this problem, and despite digging through the Twisted documentation extensively and posting a few questions, I was unable to find a proper solution. There is a boolean parameter you can pass when you instantiate the adbapi.ConnectionPool class; however, it never seemed to work and I kept getting the error regardless. What I am guessing the reconnect boolean does, though, is destroy the connection object when a SQL disconnect does occur.
adbapi.ConnectionPool("MySQLdb", cp_reconnect=True, host="", user="", passwd="", db="")
I have not tested this, but I will re-post some results when I do, or if anyone else has, please share.
When I was developing the script I was using Twisted 8.2.0 (I haven't touched Twisted in a while), and back then the framework had no such explicit keep-alive method, so I developed a ping/keep-alive extension employing the event-driven paradigm Twisted builds upon, in conjunction with the MySQLdb module's ping() method (see the code comment).
As I was typing this response, I looked around the current Twisted documentation and was still unable to find an explicit keep-alive method or parameter. My guess is that this is because Twisted itself does not have database connectivity libraries/classes; it uses the modules available to Python and provides an indirect layer for interfacing with them, with some exposure for direct calls to the database library being used. This is accomplished by using the adbapi.runWithConnection method.
Here is the module I wrote under Twisted 8.2.0 and Python 2.6; you can set the interval between pings. What the script does is ping the database every 20 minutes, and if that fails, it attempts to reconnect every 60 seconds. I must warn that the script does NOT handle a sudden/dropped connection; you can handle that through addErrback whenever you run a query through Twisted, at least that's how I did it. I have noticed that when the database connection drops, you only find out when you execute a query and the event raises an errback, and at that point you deal with it. Basically, if I don't run a query for 10 minutes and my database disconnects me, my application will not notice in real time; it will only realize the connection has been dropped when it runs the next query. So the database could have disconnected us 1 minute after the first query, or 5, or 9, etc.
I guess this goes back to the original point: Twisted relies on Python's own libraries or third-party libraries for database connectivity, and because of that, some things are handled a bit differently.
from twisted.enterprise import adbapi
from twisted.internet import reactor, defer, task

class sqlClass:
    def __init__(self, db_pointer):
        self.dbpool = db_pointer
        self.dbping = task.LoopingCall(self.dbping)
        # 20 minutes = 1200 seconds; if a MySQL socket is idle for 20 minutes or longer,
        # MySQL itself disconnects the session. You can change that in the database
        # server configuration, but it may not be recommended.
        self.dbping.start(1200)
        self.reconnect = False
        print "database ping initiated"

    def dbping(self):
        def ping(conn):
            # twisted exposes the underlying MySQLdb connection here, so we use the
            # native ping() instead of sending a null query to the database.
            conn.ping()
        pingdb = self.dbpool.runWithConnection(ping)
        pingdb.addCallback(self.dbactive)
        pingdb.addErrback(self.dbout)
        print "pinging database"

    def dbactive(self, data):
        if data == None and self.reconnect == True:
            self.dbping.stop()
            self.reconnect = False
            self.dbping.start(1200)  # 20 minutes = 1200 seconds
            print "Reconnected to database!"
        elif data == None:
            print "database is active"

    def dbout(self, deferr):
        #print deferr
        if self.reconnect == False:
            self.dbreconnect()
        elif self.reconnect == True:
            print "Unable to reconnect to database"
        print "unable to ping MySQL database!"

    def dbreconnect(self, *data):
        self.dbping.stop()
        self.reconnect = True
        #self.dbping = task.LoopingCall(self.dbping)
        self.dbping.start(60)  # 60 seconds

if __name__ == "__main__":
    db = sqlClass(adbapi.ConnectionPool("MySQLdb", cp_reconnect=True, host="", user="", passwd="", db=""))
    reactor.callLater(2, db.dbping)
    reactor.run()
let me know how it works out for you :)