I have a simple web2py server that we use to visualize data from our PostgreSQL Server. The following functions are all part of the global models in web2py.
The current solution to fetch data is very simple. I open a new connection for every query and close it once I have the data:
# Old way:
# (imports excluded)
def get_data(query):
    postgres_connection = psycopg2.connect("credentials")
    # Pandas function to put data from query into DataFrame
    df = psql.frame_query(query, con=postgres_connection)
    postgres_connection.close()
    return df
For small queries, opening and closing the connection takes about 9/10 of the time it takes to run the function.
Is this a good way to do it instead? If not, what is a better way?
# Better way?
def connect():
    """
    Create a connection to server.
    """
    return psycopg2.connect("credentials")

db_connection = connect()

def create_pandas_frame(query):
    """
    Get query if connection is open.
    """
    return psql.frame_query(query, con=db_connection)

def get_data(query):
    """
    Try to get data, open a new connection if connection is closed.
    """
    global db_connection
    try:
        data = create_pandas_frame(query)
    except Exception:
        db_connection = connect()
        data = create_pandas_frame(query)
    return data
If you run that code in a web2py model file, you'll end up creating a new connection on each HTTP request anyway. Instead, you might consider connection pooling.
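To make the idea of pooling concrete, here is a minimal sketch of what a pool does: open a fixed number of connections once and lend them out instead of reconnecting. The `SimplePool` class is hypothetical, and `sqlite3` stands in for `psycopg2.connect("credentials")` only so the snippet runs without a server. It is not how web2py implements pooling.

```python
import queue
import sqlite3
from contextlib import contextmanager


class SimplePool:
    """Toy fixed-size connection pool (illustration only)."""

    def __init__(self, connect, size=2):
        # LIFO so the most recently returned connection is reused first.
        self._idle = queue.LifoQueue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # pay the connection cost once, up front

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks if every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back instead of closing it


# sqlite3 is a stand-in for the real database driver (assumption).
pool = SimplePool(lambda: sqlite3.connect(":memory:", check_same_thread=False))

with pool.connection() as conn:
    first_conn = conn
    value = conn.execute("SELECT 1 + 1").fetchone()[0]

with pool.connection() as conn:
    reused = conn is first_conn  # same connection object, no reconnect
```

For real use, psycopg2 ships its own pool classes in the `psycopg2.pool` module, which would be preferable to a hand-rolled one.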
An easier option might be to use the web2py DAL to fetch the data. Something like:
from pandas.core.api import DataFrame
db = DAL([db connection string], pool_size=10, migrate_enabled=False)
rows = db.executesql(query)
data = DataFrame.from_records(rows, columns=[list, of, column, names])
If you specify the pool_size argument to DAL(), it will automatically maintain a connection pool to be used across requests.
Note, I haven't tried this, so it may need some tweaking, but something along these lines should work.
If you'd like, you can even use the DAL to generate the SQL by defining database table models:
db.define_table('mytable',
    Field('field1', 'integer'),
    Field('field2', 'double'),
    Field('field3', 'boolean'))
rows = db.executesql(db(db.mytable.id > 0)._select())
data = DataFrame.from_records(rows, columns=db.mytable.fields)
The ._select() method just generates the SQL without actually doing the select. The SQL is then passed to .executesql() to fetch the data.
An alternative is to create a special Pandas processor and pass it as the processor argument to .select().
def pandas_processor(rows, fields, columns, cacheable):
    return DataFrame.from_records(rows, columns=columns)
data = db(db.mytable.id > 0).select(processor=pandas_processor)
I used Anthony's answer and now have functions that look like this:
# In one of the models files.
from pandas.core.api import DataFrame
external_db = DAL('postgres://connection_stuff', pool_size=10, migrate_enabled=False)

def create_simple_html_table(query):
    dict_from_db = external_db.executesql(query, as_dict=True)
    return DataFrame(dict_from_db).to_html()
Then later, in a view or controller, an HTML table is created using:
# In Controller:
my_table = create_simple_html_table('select * from random_table limit 50')
# In View:
{{=XML(create_simple_html_table('select * from random_table limit 50'))}}
I still need to do more testing, but my understanding so far is that this solution lets me query the external db while web2py keeps the connection open and reuses it for all users.
Note that this solution is only good if all you want to do is read from and write to your Postgres server with raw SQL.
If you want to use the DAL to read and write, you need to either try to find the DAL alternative called MyDAL or play around with the search_path option in Postgres.
Related
I'm using Python (and Peewee) to connect to a SQLite database. My data access layer (DAL) is a mix of peewee ORM and SQL-based functions. I would like to enable EXPLAIN PLAN for all queries upon connecting to the database and toggle it via configuration or CLI parameter ... how can I do that using the Python API?
from playhouse.db_url import connect
self._logger.info("opening db connection to database, creating cursor and initializing orm model ...")
self.__db = connect(url)
# add support for a REGEXP and POW implementation
# TODO: this should be added only for the SQLite case and doesn't apply to other vendors.
self.__db.connection().create_function("REGEXP", 2, regexp)
self.__db.connection().create_function("POW", 2, pow)
self.__cursor = self.__db.cursor()
self.__cursor.arraysize = 100
# what shall I do here to enable EXPLAIN PLANs?
That is a feature of the sqlite interactive shell. To get the query plans, you will need to explicitly request it. This is not quite straightforward with Peewee because it uses parameterized queries. You can get the SQL executed by peewee in a couple of ways.
# Print all queries to stderr.
import logging
logger = logging.getLogger('peewee')
logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.DEBUG)
Or for an individual query:
query = SomeModel.select()
sql, params = query.sql()
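# To get the query plan (in SQLite, plain EXPLAIN returns the bytecode
# listing; EXPLAIN QUERY PLAN returns the high-level plan):
curs = db.execute_sql('EXPLAIN QUERY PLAN ' + sql, params)
print(curs.fetchall())  # prints query plan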
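For the "toggle it via configuration" part of the question, one possible shape is a small wrapper that prefixes the statement with `EXPLAIN QUERY PLAN` when a flag is set. The sketch below uses the standard-library `sqlite3` module rather than Peewee, and `EXPLAIN_ENABLED` and `execute` are hypothetical names. With Peewee, the same prefixing could be applied to the `sql, params` pair returned by `query.sql()`.

```python
import sqlite3

EXPLAIN_ENABLED = True  # hypothetical flag; could come from config or a CLI option


def execute(conn, sql, params=()):
    """Run a statement, optionally printing SQLite's query plan first."""
    if EXPLAIN_ENABLED:
        plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
        for row in plan:
            print("PLAN:", row[-1])  # the last column holds the readable detail
    return conn.execute(sql, params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person (name) VALUES ('alice')")

# Only this statement goes through the wrapper, so only it gets explained.
rows = execute(conn, "SELECT name FROM person WHERE id = ?", (1,)).fetchall()
```

The exact wording of the plan rows depends on the SQLite version, so the sketch prints them rather than parsing them.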
I have the following code in flask
sql = text('select * from person')
results = self.db.engine.execute(sql)
for row in results:
    print(".............", row)  # prints nothing
people = Person.query.all() # shows all person data
Given this, it seems that self.db is somehow not using the same connection that Person.query is using. Can I get the connection from the Person.query object?
PS. This is for testing and I'm using SQLite3 database. I tried this in postgres, but outcome is the same.
Just figured out. Try Person.query.session.execute(sql). Voila!
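One plausible explanation, which is an assumption since the question does not show how the rows were inserted, is that the test data was added through the ORM session and not yet committed. A second connection then cannot see it, while the session's own connection can. The sketch below reproduces that effect with two plain `sqlite3` connections. `writer` and `reader` are illustrative names standing in for the session's connection and the engine's separate connection.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path)  # plays the role of the ORM session's connection
writer.execute("CREATE TABLE person (name TEXT)")
writer.commit()
writer.execute("INSERT INTO person VALUES ('alice')")  # not committed yet

reader = sqlite3.connect(path)  # plays the role of a separate engine connection

# The writing connection sees its own pending row; the other one does not.
seen_by_writer = writer.execute("SELECT name FROM person").fetchall()
seen_by_reader = reader.execute("SELECT name FROM person").fetchall()

writer.commit()
seen_after_commit = reader.execute("SELECT name FROM person").fetchall()

writer.close()
reader.close()
```

Running the raw SQL through the session, as in the answer above, keeps the query on the connection that already holds the pending rows.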
Below is my current code. It connects successfully to the organization. How can I fetch the results of a query in Azure like they have here? I know this was solved but there isn't an explanation and there's quite a big gap on what they're doing.
from azure.devops.connection import Connection
from msrest.authentication import BasicAuthentication
from azure.devops.v5_1.work_item_tracking.models import Wiql
personal_access_token = 'xxx'
organization_url = 'zzz'
# Create a connection to the org
credentials = BasicAuthentication('', personal_access_token)
connection = Connection(base_url=organization_url, creds=credentials)
wit_client = connection.clients.get_work_item_tracking_client()
results = wit_client.query_by_id("my query ID here")
P.S. Please don't link me to the github or documentation. I've looked at both extensively for days and it hasn't helped.
Edit: I've added the results line that successfully gets the query. However, it returns a WorkItemQueryResult class which is not exactly what is needed. I need a way to view the column and results of the query for that column.
So I've figured this out in probably the most inefficient way possible, but hope it helps someone else and they find a way to improve it.
The issue with the WorkItemQueryResult object stored in the variable "results" is that it doesn't expose the contents of the work items.
So the goal is to be able to use the get_work_item method that requires the id field, which you can get (in a rather roundabout way) through item.target.id from results' work_item_relations. The code below is added on.
for item in results.work_item_relations:
    work_item_id = item.target.id  # avoid shadowing the built-in id()
    work_item = wit_client.get_work_item(work_item_id)
    fields = work_item.fields
This gets the id from every work item in your result class and then grants access to the fields of that work item, which you can access by fields.get("System.Title"), etc.
I'm new to Python and SQLAlchemy. I've been playing about with retrieving things from the database, and it's worked every time, but I'm a little unsure what to do when the select statement will return multiple rows. I tried using some older code that worked before I started using SQLAlchemy, but db is a SQLAlchemy object and doesn't have the execute() method.
application = Applications.query.filter_by(brochureID=brochure.id)
cur = db.execute(application)
entries = cur.fetchall()
and then in my HTML file
{% for entry in entries %}
var getEmail = {{entry.2|tojson|safe}}
emailArray.push(getEmail);
I looked in the SQLAlchemy documentation and I couldn't find a .first() equivalent to getting all the rows. Can anyone point me in the right direction? No doubt it's something very small.
Your query is correct, you just need to change the way you interact with the result. The method you are looking for is all().
application = Applications.query.filter_by(brochureID=brochure.id)
entries = application.all()
The usual way to work with ORM queries is through the Session class. Somewhere you should have something like:
engine = sqlalchemy.create_engine("sqlite:///...")
Session = sqlalchemy.orm.sessionmaker(bind=engine)
I'm not familiar with flask, but it likely does some of this work for you.
With a Session factory, your query instead becomes:
session = Session()
entries = session.query(Applications) \
    .filter_by(...) \
    .all()
I've got a web-application which is built with Pyramid/SQLAlchemy/Postgresql and allows users to manage some data, and that data is almost completely independent for different users. Say, Alice visits alice.domain.com and is able to upload pictures and documents, and Bob visits bob.domain.com and is also able to upload pictures and documents. Alice never sees anything created by Bob and vice versa (this is a simplified example, there may be a lot of data in multiple tables really, but the idea is the same).
Now, the most straightforward option to organize the data in the DB backend is to use a single database, where each table (pictures and documents) has user_id field, so, basically, to get all Alice's pictures, I can do something like
user_id = _figure_out_user_id_from_domain_name(request)
pictures = session.query(Picture).filter(Picture.user_id==user_id).all()
This is all easy and simple; however, there are some disadvantages:
- I need to remember to always use the additional filter condition when making queries, otherwise Alice may see Bob's pictures;
- If there are many users, the tables may grow huge;
- It may be tricky to split the web application between multiple machines.
So I'm thinking it would be really nice to somehow split the data per-user. I can think of two approaches:
Option 1: Have separate tables for Alice's and Bob's pictures and documents within the same database (Postgres schemas seem to be the correct approach in this case):
documents_alice
documents_bob
pictures_alice
pictures_bob
and then, using some dark magic, "route" all queries to one or to the other table according to the current request's domain:
_use_dark_magic_to_configure_sqlalchemy('alice.domain.com')
pictures = session.query(Picture).all() # selects all Alice's pictures from "pictures_alice" table
...
_use_dark_magic_to_configure_sqlalchemy('bob.domain.com')
pictures = session.query(Picture).all() # selects all Bob's pictures from "pictures_bob" table
Option 2: Use a separate database for each user:
- database_alice
  - pictures
  - documents
- database_bob
  - pictures
  - documents
which seems like the cleanest solution, but I'm not sure if multiple database connections would require much more RAM and other resources, limiting the number of possible "tenants".
So, the question is, does it all make sense? If yes, how do I configure SQLAlchemy to either modify the table names dynamically on each HTTP request (for option 1) or to maintain a pool of connections to different databases and use the correct connection for each request (for option 2)?
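For option 2, "maintaining a pool of connections to different databases" can be pictured as a registry that lazily creates one connection (or, in practice, one engine with its own pool) per tenant and then reuses it. The sketch below is only an illustration under stated assumptions. `TenantConnections` is a hypothetical name, and in-memory `sqlite3` databases stand in for the per-user Postgres databases.

```python
import sqlite3
import threading


class TenantConnections:
    """Lazily create and cache one connection per tenant (illustration only)."""

    def __init__(self, connect):
        self._connect = connect        # callable: tenant name -> connection
        self._lock = threading.Lock()  # guards the cache across threads
        self._by_tenant = {}

    def get(self, tenant):
        with self._lock:
            if tenant not in self._by_tenant:
                self._by_tenant[tenant] = self._connect(tenant)
            return self._by_tenant[tenant]


# Each tenant gets its own private in-memory database (stand-in for database_<name>).
registry = TenantConnections(lambda tenant: sqlite3.connect(":memory:"))

alice = registry.get("alice")
alice.execute("CREATE TABLE pictures (title TEXT)")
alice.execute("INSERT INTO pictures VALUES ('cat.png')")

bob = registry.get("bob")
bob.execute("CREATE TABLE pictures (title TEXT)")

alice_pictures = registry.get("alice").execute("SELECT title FROM pictures").fetchall()
bob_pictures = registry.get("bob").execute("SELECT title FROM pictures").fetchall()
```

The resource concern raised above shows up directly here: the cache grows by one entry per tenant, so a real version would likely need an eviction policy or a cap on open connections.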
After pondering on jd's answer I was able to achieve the same result for postgresql 9.2, sqlalchemy 0.8, and flask 0.9 framework:
from flask import session  # assumed: `session` below is Flask's session object
from sqlalchemy import event
from sqlalchemy.pool import Pool

@event.listens_for(Pool, 'checkout')
def on_pool_checkout(dbapi_conn, connection_rec, connection_proxy):
    # Point each checked-out connection at the current tenant's schema.
    tenant_id = session.get('tenant_id')
    cursor = dbapi_conn.cursor()
    if tenant_id is None:
        cursor.execute("SET search_path TO public, shared;")
    else:
        cursor.execute("SET search_path TO t" + str(tenant_id) + ", shared;")
    dbapi_conn.commit()
    cursor.close()
Ok, I've ended up with modifying search_path in the beginning of every request, using Pyramid's NewRequest event:
from pyramid import events
def on_new_request(event):
    schema_name = _figure_out_schema_name_from_request(event.request)
    DBSession.execute("SET search_path TO %s" % schema_name)

def app(global_config, **settings):
    """ This function returns a WSGI application.
    It is usually called by the PasteDeploy framework during
    ``paster serve``.
    """
    ....
    config.add_subscriber(on_new_request, events.NewRequest)
    return config.make_wsgi_app()
This works really well, as long as you leave transaction management to Pyramid (i.e. do not commit or roll back transactions manually, and let Pyramid do that at the end of the request), which is OK, as committing transactions manually is not a good approach anyway.
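Because the schema name is derived from the request and interpolated into the SQL string, it is worth validating and quoting it before it reaches `SET search_path`. The helper below is a hypothetical sketch of what a function such as the schema lookup above might do. The names `schema_from_host` and `search_path_sql`, the `domain.com` base domain, and the `shared, public` tail are assumptions for illustration.

```python
import re

# Accept only plain lowercase identifiers as schema names.
_SCHEMA_RE = re.compile(r"^[a-z_][a-z0-9_]*$")


def schema_from_host(host, base_domain="domain.com"):
    """Map 'alice.domain.com' to the schema name 'alice' (hypothetical rule)."""
    host = host.split(":", 1)[0].lower()  # drop an optional port
    suffix = "." + base_domain
    if not host.endswith(suffix):
        raise ValueError("unexpected host: %r" % host)
    schema = host[: -len(suffix)]
    if not _SCHEMA_RE.match(schema):
        raise ValueError("invalid schema name: %r" % schema)
    return schema


def search_path_sql(schema, tail=("shared", "public")):
    """Build the SET search_path statement with every identifier double-quoted."""
    names = [schema] + list(tail)
    quoted = ", ".join('"%s"' % name.replace('"', '""') for name in names)
    return "SET search_path TO %s" % quoted


statement = search_path_sql(schema_from_host("alice.domain.com"))
```

Rejecting anything that is not a plain identifier keeps a crafted Host header from injecting SQL, and quoting the names covers the case where the validation rule is later relaxed.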
What works very well for me is to set the search path at the connection pool level, rather than in the session. This example uses Flask and its thread-local proxies to pass the schema name, so you'll have to change schema = current_schema._get_current_object() and the try block around it.
from sqlalchemy.interfaces import PoolListener

class SearchPathSetter(PoolListener):
    '''
    Dynamically sets the search path on connections checked out from a pool.
    '''
    def __init__(self, search_path_tail='shared, public'):
        self.search_path_tail = search_path_tail

    @staticmethod
    def quote_schema(dialect, schema):
        return dialect.identifier_preparer.quote_schema(schema, False)

    def checkout(self, dbapi_con, con_record, con_proxy):
        try:
            schema = current_schema._get_current_object()
        except RuntimeError:
            search_path = self.search_path_tail
        else:
            if schema:
                search_path = self.quote_schema(con_proxy._pool._dialect, schema) + ', ' + self.search_path_tail
            else:
                search_path = self.search_path_tail
        cursor = dbapi_con.cursor()
        cursor.execute("SET search_path TO %s;" % search_path)
        dbapi_con.commit()
        cursor.close()
At engine creation time:
engine = create_engine(dsn, listeners=[SearchPathSetter()])