Briefly, I want to know what this "DB-API" mechanism is.
Are there multiple DB-APIs (more than one)?
Is it just a 'rules' document?
Does it have source code?
What is it for?
Is psycopg2 an example of a DB-API or is it a library that follows DB-APIs standards?
Is the DB-API specified in SQLAlchemy a SQLAlchemy-specific DB-API (if that is possible)?
I think that's it!
Regarding dialects, I'll ask another question later.
The Python DB-API is defined in https://www.python.org/dev/peps/pep-0249/ and is, I believe, just a specification or, as you say, a 'rules' document.
Modules like psycopg2 fulfill those requirements, so they are implementations of that API. SQLAlchemy lets you swap out which DB-API implementation you use, so you can change your underlying database server, or use features offered by another driver/DB-API implementation while still talking to the same database server.
As I understand it, SQLAlchemy supports multiple DB-API implementations, which you specify using a connection URI, explained here: https://docs.sqlalchemy.org/en/13/core/engines.html#database-urls.
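A small sketch of the idea (the psycopg2 credentials below are placeholders, not something from the question): every module that implements PEP 249 exposes the same connect()/cursor()/execute()/fetchall() shape, and SQLAlchemy simply selects one of those implementations from the URL.

# Both modules implement PEP 249, so the calling pattern is identical;
# only the connect() arguments differ (the Postgres credentials are placeholders).
import sqlite3
# import psycopg2

conn = sqlite3.connect(":memory:")
# conn = psycopg2.connect(host="localhost", dbname="mydb", user="me", password="secret")

cur = conn.cursor()
cur.execute("SELECT 1")
print(cur.fetchall())   # [(1,)]
conn.close()

# SQLAlchemy picks the DB-API implementation from the URL instead, e.g.:
# create_engine("postgresql+psycopg2://me:secret@localhost/mydb")
# create_engine("postgresql+pg8000://me:secret@localhost/mydb")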
Why are both a connection and a cursor needed in the Python database API specification? In the context of SQLite3 these objects seem redundant, as shown:
in the question "Why do you need to create a cursor when querying a sqlite database?", and
even in SQLite3's official documentation, which calls cursor objects "often superfluous" and provides shortcut methods that act on connections instead of cursors.
The main reason for the existence in SQLite3 of both connection and cursor objects seems to be compliance with the Python database API specification v2.0. This specification is applicable not just to SQLite interfaces such as SQLite3 but to a range of databases in Python. From a design perspective, why is it beneficial to differentiate between a connection and a cursor? Are there any conceptual or efficiency advantages?
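A tiny illustration of the split, using the standard sqlite3 module: the connection owns the transaction (commit/rollback), while each cursor keeps its own position in its own result set.

import sqlite3

# the connection owns the transaction state...
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# ...while each cursor independently tracks its place in its own result set
cur_a = conn.cursor()
cur_b = conn.cursor()
cur_a.execute("SELECT x FROM t ORDER BY x")
cur_b.execute("SELECT COUNT(*) FROM t")
print(cur_a.fetchone())   # (1,)
print(cur_b.fetchone())   # (3,)
print(cur_a.fetchone())   # (2,) -- cur_a kept its place, unaffected by cur_b
conn.commit()             # commit/rollback live on the connection, not the cursor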
The docs say:
Hooks are interfaces to external platforms and databases like Hive, S3, MySQL, Postgres, HDFS, and Pig. Hooks implement a common interface when possible, and act as a building block for operators.
But why do we need them?
I want to select data from one Postgres DB and store it in another one. Can I use, for example, the psycopg2 driver inside a Python script that is run by a PythonOperator, or does Airflow need to know for some reason what exactly I'm doing inside the script, so that I have to use PostgresHook instead of just the psycopg2 driver?
You should just use PostgresHook. Instead of using psycopg2 like so:
import psycopg2

# credentials and query hardcoded in the script
conn = psycopg2.connect(host=host, dbname=dbname, user=user, password=password)
cur = conn.cursor()
cur.execute(query)
data = cur.fetchall()
You can just type:
postgres = PostgresHook('connection_id')
data = postgres.get_pandas_df(query)
This can also make use of encrypted connections.
So using hooks is cleaner, safer and easier.
While it is possible to just hardcode the connections in your script and run it, the power of hooks is that they allow you to edit connection details from within the UI.
Have a look at "Automate AWS Tasks Thanks to Airflow Hooks" to learn a bit more about how to use hooks.
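For the specific "copy from one Postgres DB to another" case in the question, a rough sketch with hooks might look like this. The connection IDs, query and table names are hypothetical, and the import path assumes a recent Airflow with the Postgres provider installed:

from airflow.providers.postgres.hooks.postgres import PostgresHook

def transfer_rows():
    # "source_postgres" and "target_postgres" are connection IDs configured in the Airflow UI
    src = PostgresHook(postgres_conn_id="source_postgres")
    dst = PostgresHook(postgres_conn_id="target_postgres")
    rows = src.get_records("SELECT id, name FROM users")   # hypothetical query
    dst.insert_rows(table="users_copy", rows=rows)          # hypothetical target table

The function can then be called from a PythonOperator, so no credentials live in the script itself.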
AWS recently launched the Data API. This simplifies creating Lambda functions by allowing API calls instead of direct database connections, eliminating the extra complexity those connections require.
I'm trying to use SQLAlchemy in an AWS Lambda Function, and I'd really like to take advantage of this new API.
Does anyone know if there is any support for this, or if support for this is coming?
Alternatively, how difficult would it be to create a new Engine to support this?
SQLAlchemy calls database drivers "dialects". So if you're using SQLAlchemy with PostgreSQL and using psycopg2 as the driver, then you're using the psycopg2 dialect of PostgreSQL.
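For example, the driver shows up in the first part of the connection URL (user, password, host and database below are placeholders):

from sqlalchemy import create_engine

# "postgresql" is the dialect, "psycopg2" is the DB-API driver it wraps
engine = create_engine("postgresql+psycopg2://user:password@localhost/mydb")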
I was looking for the same thing as you, and found no existing solution, so I wrote my own and published it. To use the AWS Aurora RDS Data API, I created a SQL dialect package for it, sqlalchemy-aurora-data-api. This in turn required me to write a DB-API compatible Python DB driver for Aurora Data API, aurora-data-api. After installing with pip install sqlalchemy-aurora-data-api, you can use it like this:
from sqlalchemy import create_engine

cluster_arn = "arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-serverless-cluster"
secret_arn = "arn:aws:secretsmanager:us-east-1:123456789012:secret:MY_DB_CREDENTIALS"

engine = create_engine('postgresql+auroradataapi://:@/my_db_name',
                       echo=True,
                       connect_args=dict(aurora_cluster_arn=cluster_arn, secret_arn=secret_arn))

with engine.connect() as conn:
    for result in conn.execute("select * from pg_catalog.pg_tables"):
        print(result)
As an alternative, if you want something more like Records, you can try Camus https://github.com/rizidoro/camus.
XTA (XA Transaction API, http://www.tiian.org/lixa/XTA.html) is a new API that has been developed inside the LIXA project to support two-phase commit transactions in the context of FaaS (Function as a Service) and microservice-oriented, polyglot applications.
The API already supports C and C++ languages; it aims to support many more, at the bare minimum Python, PHP and Java.
I'm currently working on supporting Python with PostgreSQL and MySQL, this mail thread is related to Python/MySQL.
XTA is implemented in C, and XTA for Python is generated using SWIG; I would like to repeat the approach for all the languages that provide drivers derived from the MySQL C API.
Now the request for help: XTA needs to enlist all the resource managers (here MySQL) to manage them using two-phase commit. Basically it requires a pointer (MYSQL *) that must be passed to the MysqlXaResource constructor (http://www.tiian.org/lixa/manuals/xta/CPP/classxta_1_1MysqlXaResource.html) to create an XTA object associated with an already opened MySQL connection.
Here are the basic steps of a Python example program (https://github.com/tiian/lixa/blob/master/doc/examples/xta/python/example_xta_sa21.py):
# initialize XTA environment
Xta_Init()
# create a new MySQL connection
# Note: using _mysql or MySQLdb functions
rm2 = MySQLdb.connect("localhost", "lixa", "", "lixa")
# alternatively, using _mysql
rm2 = _mysql.connect("localhost", "lixa", "", "lixa")
# create a new XTA Transaction Manager object
tm = TransactionManager()
# create an XA resource for MySQL
#
# how to retrieve MYSQL * from rm2 ?
xar2 = MysqlXaResource(rm2.???, "PostgreSQL", "dbname=testdb")
Looking at the last statement, the stack is:
XTA native C library expects "MYSQL *" to register the connection handler
XTA C++ wrapper expects "MYSQL *" as the first parameter to construct the object
XTA Python (SWIG generated) wrapper expects a "SWIG generated" MYSQL * pointer (it can be changed to another well-known type by means of a "typemap" directive: https://github.com/tiian/lixa/blob/master/src/xta/python/xta.i)
_mysql.connect() and MySQLdb.connect() do not provide anything equivalent to MYSQL *, at least as far as I can tell.
Do you have any hint about retrieving something like a PyCapsule initialized with the MYSQL * native connection?
Thanks in advance for your help.
Regards,
Ch.F.
After reading some documentation, I have found no clear solution.
Here is a pull request I have proposed to the development team to add a "get_native_connection()" method: https://github.com/PyMySQL/mysqlclient-python/pull/269
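If that pull request (or something like it) is accepted, the missing piece of the example above might look roughly as follows. The method name is taken from the proposal and is not a released mysqlclient API, so this is only a sketch of the intent; the MysqlXaResource arguments are copied from the question's snippet.

# assumption: mysqlclient exposes the native handle via the get_native_connection()
# method proposed in the pull request above (e.g. as a PyCapsule wrapping MYSQL *)
rm2 = MySQLdb.connect("localhost", "lixa", "", "lixa")
native_ptr = rm2.get_native_connection()
xar2 = MysqlXaResource(native_ptr, "PostgreSQL", "dbname=testdb")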
I've looked over Google Cloud SQL's documentation and various searches, but I can't find out whether it is possible to use SQLAlchemy with Google Cloud SQL, and if so, what the connection URI should be.
I'm looking to use the Flask-SQLAlchemy extension and need the connection string like so:
mysql://username:password@server/db
I saw the Django example, but it appears the configuration uses a different style than the connection string. https://developers.google.com/cloud-sql/docs/django
Google Cloud SQL documentation:
https://developers.google.com/cloud-sql/docs/developers_guide_python
Update
Google Cloud SQL now supports direct access, so the MySQLdb dialect can now be used. The recommended connection via the mysql dialect is using the URL format:
mysql+mysqldb://root@/<dbname>?unix_socket=/cloudsql/<projectid>:<instancename>
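For instance (project id, instance name and database name below are placeholders), engine creation with that URL looks roughly like:

from sqlalchemy import create_engine

# placeholders for project id, instance name and database name
engine = create_engine(
    "mysql+mysqldb://root@/mydb?unix_socket=/cloudsql/my-project:my-instance"
)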
mysql+gaerdbms has been deprecated in SQLAlchemy since version 1.0
I'm leaving the original answer below in case others still find it helpful.
For those who visit this question later (and don't want to read through all the comments), SQLAlchemy now supports Google Cloud SQL as of version 0.7.8 using the connection string / dialect (see: docs):
mysql+gaerdbms:///<dbname>
E.g.:
create_engine('mysql+gaerdbms:///mydb', connect_args={"instance":"myinstance"})
I have proposed an update to the mysql+gaerdbms:// dialect to support both of the Google Cloud SQL APIs (rdbms_apiproxy and rdbms_googleapi) for connecting to Cloud SQL from a non-Google App Engine production instance (e.g. your development workstation). The change also modifies the connection string slightly by including the project and instance as part of the string, so they no longer need to be passed separately via connect_args.
E.g.
mysql+gaerdbms:///<dbname>?instance=<project:instance>
This will also make it easier to use Cloud SQL with Flask-SQLAlchemy or other extensions where you don't explicitly make the create_engine() call.
If you are having trouble connecting to Google Cloud SQL from your development workstation, you might want to take a look at my answer here - https://stackoverflow.com/a/14287158/191902.
Yes, it is possible. If you find any bugs in SA+Cloud SQL, please let me know. I wrote the dialect code that was integrated into SQLAlchemy. There's a bit of silly business about how Cloud SQL bubbles up exceptions, so there might be some loose ends there.
For those who prefer PyMySQL over MySQLdb (which is suggested in the accepted answer), the SQLAlchemy connection strings are:
For Production
mysql+pymysql://<USER>:<PASSWORD>@/<DATABASE_NAME>?unix_socket=/cloudsql/<PUT-SQL-INSTANCE-CONNECTION-NAME-HERE>
Please make sure to:
Add the SQL instance to your app.yaml:
beta_settings:
  cloud_sql_instances: <PUT-SQL-INSTANCE-CONNECTION-NAME-HERE>
Enable the SQL Admin API as it seems to be necessary:
https://console.developers.google.com/apis/api/sqladmin.googleapis.com/overview
For Local Development
mysql+pymysql://<USER>:<PASSWORD>@localhost:3306/<DATABASE_NAME>
given that you started the Cloud SQL Proxy with:
cloud_sql_proxy -instances=<PUT-SQL-INSTANCE-CONNECTION-NAME-HERE>=tcp:3306
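As a rough sketch of how the two strings above might be wired up in plain SQLAlchemy (the angle-bracket values are still placeholders, and the GAE_ENV check is just one way to tell App Engine apart from local development):

import os
from sqlalchemy import create_engine

if os.environ.get("GAE_ENV", "").startswith("standard"):
    # on App Engine: connect through the Cloud SQL unix socket
    url = "mysql+pymysql://<USER>:<PASSWORD>@/<DATABASE_NAME>?unix_socket=/cloudsql/<PUT-SQL-INSTANCE-CONNECTION-NAME-HERE>"
else:
    # local development: connect through the Cloud SQL Proxy on 127.0.0.1:3306
    url = "mysql+pymysql://<USER>:<PASSWORD>@localhost:3306/<DATABASE_NAME>"

engine = create_engine(url)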
It is doable, though I haven't used Flask at all, so I'm not sure about establishing the connection through that. I got it working through Pyramid and submitted a patch to SQLAlchemy (possibly to the wrong repo) here:
https://bitbucket.org/sqlalchemy/sqlalchemy/pull-request/2/added-a-dialect-for-google-app-engines
That has since been replaced and accepted into SQLAlchemy as
http://www.sqlalchemy.org/trac/ticket/2484
I don't think it's made its way to a release yet, though.
There are some issues with Google SQL throwing different exceptions, so we had problems with things like deploying a database automatically. You also need to disable connection pooling using NullPool, as mentioned in the second patch.
We've since moved to using the datastore through NDB, so I haven't followed the progress of these fixes for a while.
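For reference, disabling pooling with NullPool is a create_engine() option; this mirrors the gaerdbms example earlier in the thread, with the database and instance names as placeholders:

from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

# poolclass=NullPool opens and closes a real connection per checkout
# instead of keeping idle connections around
engine = create_engine(
    "mysql+gaerdbms:///mydb",
    connect_args={"instance": "myinstance"},
    poolclass=NullPool,
)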
PostgreSQL, pg8000 and flask_sqlalchemy
Adding this in case someone is looking for how to use flask_sqlalchemy with PostgreSQL: using pg8000 as the driver, the working connection string is
postgres+pg8000://<db_user>:<db_pass>@/<db_name>
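A minimal Flask-SQLAlchemy setup using that string might look like this (user, password and database name are placeholders, exactly as in the string above):

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
# placeholders for user, password and database name
app.config["SQLALCHEMY_DATABASE_URI"] = "postgres+pg8000://<db_user>:<db_pass>@/<db_name>"
db = SQLAlchemy(app)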