I have a database with dates stored as text, and now I need to filter specific date ranges.
This query works for me:
SELECT CDate(field) AS df
FROM table
WHERE CDate(field)=Date();
Sadly, I haven't found how to express a query like this in SQLAlchemy.
This works for me:
import sqlalchemy as sa
import sqlalchemy_access as sa_a
# …
tbl = sa.Table("so71529087", sa.MetaData(), sa.Column("field", sa_a.DateTime))
tbl.drop(engine, checkfirst=True)
tbl.create(engine)
qry = sa.select(sa.func.CDate(tbl.c.field).label("df")).where(
    sa.func.CDate(tbl.c.field) == sa.text("Date()")
)
print(qry)
"""
SELECT CDate(so71529087.field) AS df
FROM so71529087
WHERE CDate(so71529087.field) = Date()
"""
with engine.begin() as conn:
    results = conn.execute(qry).all()
I'm trying to copy a database using SQLAlchemy. The first attempt was:
from sqlalchemy import create_engine, MetaData
from sqlalchemy.orm import sessionmaker
from urls import engine_urls
engine1 = create_engine(engine_urls[0])
engine2 = create_engine(engine_urls[1])
metadata = MetaData()
metadata.reflect(engine1)
tables = metadata.tables
metadata.create_all(engine2)
Session1 = sessionmaker(bind=engine1)
from sqlalchemy import insert
with Session1.begin() as session:
    for key in tables:
        table_object = tables[key]
        for row in session.query(table_object):
            s = insert(table_object).\
                values(**dict(zip(row.keys(), row)))
            engine2.execute(s)
But this code does not work, since the order in which the inserts are done is arbitrary, and that violates foreign-key constraints: inserting a child before its parent, for example, will fail. How could I achieve this task? Is there a part of the framework that would do this easily? I can't find it.
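The key is to insert in foreign-key dependency order, and a reflected MetaData already knows that order: MetaData.sorted_tables yields parents before children. Here is a minimal sketch of that idea (assuming SQLAlchemy 1.4+ and the engine_urls list from the question; everything else is whatever reflection finds):
from sqlalchemy import create_engine, MetaData
engine1 = create_engine(engine_urls[0])  # source engine (from the question)
engine2 = create_engine(engine_urls[1])  # target engine (from the question)
metadata = MetaData()
metadata.reflect(engine1)        # reflect the source schema
metadata.create_all(engine2)     # create the same tables on the target
with engine1.connect() as src_conn, engine2.begin() as tgt_conn:
    # sorted_tables is ordered by foreign-key dependency: parents first
    for table in metadata.sorted_tables:
        rows = [dict(row._mapping) for row in src_conn.execute(table.select())]
        if rows:
            tgt_conn.execute(table.insert(), rows)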
Here is what I use. Works well.
from sqlalchemy import create_engine, MetaData, event
from sqlalchemy.sql import sqltypes
# Requires SQLALCHEMY 1.4+
src_engine = create_engine("sqlite:///mydb.sqlite")
src_metadata = MetaData(bind=src_engine)
exclude_tables = ('sqlite_master', 'sqlite_sequence', 'sqlite_temp_master')
tgt_engine = create_engine("postgresql+psycopg2://@localhost/ngas")
tgt_metadata = MetaData(bind=tgt_engine)
@event.listens_for(src_metadata, "column_reflect")
def genericize_datatypes(inspector, tablename, column_dict):
    column_dict["type"] = column_dict["type"].as_generic(allow_nulltype=True)
tgt_conn = tgt_engine.connect()
tgt_metadata.reflect()
# drop all tables in target database
for table in reversed(tgt_metadata.sorted_tables):
    if table.name not in exclude_tables:
        print('dropping table =', table.name)
        table.drop()
# # Delete all data in target database
# for table in reversed(tgt_metadata.sorted_tables):
# table.delete()
tgt_metadata.clear()
tgt_metadata.reflect()
src_metadata.reflect()
# create all tables in target database
for table in src_metadata.sorted_tables:
    if table.name not in exclude_tables:
        table.create(bind=tgt_engine)
# refresh metadata before you can copy data
tgt_metadata.clear()
tgt_metadata.reflect()
# Copy all data from src to target
for table in tgt_metadata.sorted_tables:
    src_table = src_metadata.tables[table.name]
    stmt = table.insert()
    for index, row in enumerate(src_table.select().execute()):
        print("table =", table.name, "Inserting row", index)
        stmt.execute(row._asdict())
If anyone has difficulties executing the proposed routine because "stmt.execute(row._asdict())" raises an error in version 1.4, here is an alternative that worked for me:
# Copy all data from src to target
for table in tgt_metadata.sorted_tables:
    src_table = src_metadata.tables[table.name]
    for index, row in enumerate(src_table.select().execute()):
        print("table =", table.name, "Inserting row", index, '>>', dict(row))
        stmt = table.insert().values(row._asdict())
        tgt_conn.execute(stmt)
        tgt_conn.commit()
When I tried to join two tables I got the following error:
sqlalchemy.exc.ObjectNotExecutableError: Not an executable object: <sqlalchemy.sql.selectable.Join at 0x7f31a35b02e8; Join object on chanel(139851192912136) and Device(139851192912864)>
My code is:
import sqlalchemy as db
from sqlalchemy import and_,or_,not_,inspect,text,inspection
engine = db.create_engine("mssql+pymssql://sa:elnetsrv@192.108.55.95/ELNetDB")
metadata = db.MetaData()
Data1 = db.Table("chanel", metadata, autoload=True, autoload_with=engine)
Data2 = db.Table("Device", metadata, autoload=True, autoload_with=engine)
j = Data1.join(Data2,Data1.columns.No == Data2.columns.ID)
print(engine.execute(j))
Data1.join(Data2,Data1.columns.No == Data2.columns.ID) is not executable because it is not a query object.
You can try this instead (assuming you want to select every column from Data1):
print(engine.execute(select([Data1]).select_from(j)))
See https://docs.sqlalchemy.org/en/13/core/metadata.html#sqlalchemy.schema.Table.join for reference.
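For a slightly fuller picture, here is a sketch of the same idea end to end, using the 1.3-style select([...]) to match the answer above (the connection URL and the table/column names are the ones from the question):
from sqlalchemy import select
j = Data1.join(Data2, Data1.columns.No == Data2.columns.ID)
stmt = select([Data1]).select_from(j)   # wrap the join in a selectable
with engine.connect() as conn:
    for row in conn.execute(stmt):
        print(row)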
Is there a solution converting a SQLAlchemy <Query object> to a pandas DataFrame?
Pandas has the capability to use pandas.read_sql, but this requires the use of raw SQL. I have two reasons for wanting to avoid it:
I already have everything using the ORM (a good reason in and of itself) and
I'm using python lists as part of the query, e.g.:
db.session.query(Item).filter(Item.symbol.in_(add_symbols)), where Item is my model class and add_symbols is a list. This is the equivalent of SQL's SELECT ... FROM ... WHERE ... IN.
Is anything possible?
Below should work in most cases:
df = pd.read_sql(query.statement, query.session.bind)
See pandas.read_sql documentation for more information on the parameters.
Just to make this clearer for novice pandas programmers, here is a concrete example:
pd.read_sql(session.query(Complaint).filter(Complaint.id == 2).statement,session.bind)
Here we select a complaint from the complaints table (the SQLAlchemy model is Complaint) with id = 2.
For completeness' sake: as an alternative to the pandas function read_sql_query(), you can also use the pandas DataFrame function from_records() to convert a structured or record ndarray to a DataFrame.
This comes in handy if, for example, you have already executed the query in SQLAlchemy and already have the results available:
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker
SQLALCHEMY_DATABASE_URI = 'postgresql://postgres:postgres@localhost:5432/my_database'
engine = create_engine(SQLALCHEMY_DATABASE_URI, pool_pre_ping=True, echo=False)
db = scoped_session(sessionmaker(autocommit=False, autoflush=False, bind=engine))
Base = declarative_base(bind=engine)
class Currency(Base):
"""The `Currency`-table"""
__tablename__ = "currency"
__table_args__ = {"schema": "data"}
id = Column(Integer, primary_key=True, nullable=False)
name = Column(String(64), nullable=False)
# Defining the SQLAlchemy-query
currency_query = db.query(Currency).with_entities(Currency.id, Currency.name)
# Getting all the entries via SQLAlchemy
currencies = currency_query.all()
# We provide also the (alternate) column names and set the index here,
# renaming the column `id` to `currency__id`
df_from_records = pd.DataFrame.from_records(currencies
, index='currency__id'
, columns=['currency__id', 'name'])
print(df_from_records.head(5))
# Or getting the entries via Pandas instead of SQLAlchemy using the
# aforementioned function `read_sql_query()`. We can set the index-columns here as well
df_from_query = pd.read_sql_query(currency_query.statement, db.bind, index_col='id')
# Renaming the index-column(s) from `id` to `currency__id` needs another statement
df_from_query.index.rename(name='currency__id', inplace=True)
print(df_from_query.head(5))
The selected solution didn't work for me, as I kept getting the error
AttributeError: 'AnnotatedSelect' object has no attribute 'lower'
I found the following worked:
df = pd.read_sql_query(query.statement, engine)
If you want to compile a query with parameters and dialect-specific arguments, use something like this:
c = query.statement.compile(query.session.bind)
df = pandas.read_sql(c.string, query.session.bind, params=c.params)
import pandas as pd
from sqlalchemy import Column, Date, Integer, String, create_engine, select
from sqlalchemy.dialects.postgresql import DOUBLE_PRECISION
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
engine = create_engine('postgresql://postgres:postgres@localhost:5432/DB', echo=False)
Base = declarative_base(bind=engine)
Session = sessionmaker(bind=engine)
session = Session()
conn = session.bind
class DailyTrendsTable(Base):
    __tablename__ = 'trends'
    __table_args__ = ({"schema": 'mf_analysis'})
    company_code = Column(DOUBLE_PRECISION, primary_key=True)
    rt_bullish_trending = Column(Integer)
    rt_bearish_trending = Column(Integer)
    rt_bullish_non_trending = Column(Integer)
    rt_bearish_non_trending = Column(Integer)
    gen_date = Column(Date, primary_key=True)
df_query = select([DailyTrendsTable])
df_data = pd.read_sql(df_query, con=conn)
Using the 2.0 SQLAlchemy syntax (also available in 1.4 with the flag future=True), it looks like pd.read_sql is not supported yet and it will raise:
NotImplementedError: This method is not implemented for SQLAlchemy 2.0.
This is an open issue that won't be solved until pandas 2.0; you can find some information about it here and here.
I didn't find any satisfactory workaround, but some people seem to be using two configurations of the engine, one with the future flag set to False:
engine2 = create_engine(URL_string, echo=False, future=False)
This solution would be OK if you are querying with plain SQL strings, but when using the ORM, the best I could do is a custom function that still needs optimizing, but it works:
Conditions = session.query(ExampleTable)
def df_from_sql(query):
    return pd.DataFrame([i.__dict__ for i in query]).drop(columns='_sa_instance_state')
df = df_from_sql(Conditions)
In any case, this solution is provisional until pd.read_sql implements the new syntax.
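Another provisional route is to execute the 2.0-style statement yourself and build the DataFrame from the result rows, so pd.read_sql never sees the future-mode engine. A rough sketch, reusing the session and ExampleTable from the snippet above and assuming ExampleTable has id and name columns (those names are placeholders):
import pandas as pd
from sqlalchemy import select
# select individual columns so the result rows are plain tuples, not ORM instances
stmt = select(ExampleTable.id, ExampleTable.name)
result = session.execute(stmt)
df = pd.DataFrame(result.all(), columns=list(result.keys()))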
When you're using the ORM it's as simple as this:
pd.DataFrame([r._asdict() for r in query.all()])
A good alternative to pd.read_sql when you don't want to expose SQL and sessions to the business-logic code.
Found it here: https://stackoverflow.com/a/52208023/1635525
This answer provides a reproducible example using a SQLAlchemy select statement and returning a pandas DataFrame. It is based on an in-memory SQLite database so that anyone can reproduce it without installing a database engine.
import pandas
from sqlalchemy import create_engine
from sqlalchemy import MetaData, Table, Column, Text
from sqlalchemy.orm import Session
Define table metadata and create a table
engine = create_engine('sqlite://')
meta = MetaData()
meta.bind = engine
user_table = Table('user', meta,
Column("name", Text),
Column("full_name", Text))
user_table.create()
Insert some data into the user table
stmt = user_table.insert().values(name='Bob', full_name='Sponge Bob')
with Session(engine) as session:
    result = session.execute(stmt)
    session.commit()
Read the result of a select statement into a pandas data frame
# Select data into a pandas data frame
stmt = user_table.select().where(user_table.c.name == 'Bob')
df = pandas.read_sql_query(stmt, engine)
df
Out:
name full_name
0 Bob Sponge Bob
If you use a raw SQL query:
def generate_df_from_sqlquery(query):
    from pandas import DataFrame
    result = db.session.execute(query)
    df = DataFrame(result.fetchall())
    if len(df) > 0:
        df.columns = result.keys()
    else:
        df = DataFrame(columns=result.keys())
    return df
profile_df = generate_df_from_sqlquery(profile_query)
A simple example using the CursorResult.keys() method to get the column names:
import sqlalchemy as sa
import pandas as pd
engine = sa.create_engine(...)
with engine.connect() as conn:
    result = conn.execute(sa.text("SELECT * FROM foo;"))
    df = pd.DataFrame(result.all(), columns=result.keys())
https://docs.sqlalchemy.org/en/20/core/connections.html#sqlalchemy.engine.Result.keys
Adding to the answers that use read_sql, like @van's: when my query involved a join, SQLAlchemy seemed to implicitly add aliased columns from the joined tables, such as id_1 and id_2, in case the joined tables and the primary table both had an id column, for example. Using .all() removes these implicit columns before returning results, but read_sql will include them.
The solution for that case, for me, was to be explicit in my selects. So I replaced
query = session.query(model)
with
query = session.query(model.col_1, model.col_2)
or for select all
query = session.query(*model.__table__.columns.values())
then
df = pd.read_sql(query.statement, query.session.bind)
from sqlalchemy import create_engine, MetaData, ForeignKey
engine = create_engine("mysql://user:passwd#localhost/shema", echo=False)
meta = MetaData(engine, True)
conn = engine.connect()
tb_list = meta.tables["tb_list"]
tb_data = meta.tables["tb_data"]
tb_list.c.i_data.append_foreign_key( ForeignKey(tb_data.c.i_id) )
q = tb_list.outerjoin(tb_data).select()
res = conn.execute(q)
And now, how can I get the column types of the query result res?
One possible approach:
res._key_cache[ col_name ][0]
Do you know of anything else?
you'd say:
types = [col.type for col in q.columns]
the (compiled) statement is on the result too if you feel like digging:
types = [col.type for col in res.context.compiled.statement.columns]
if you want the DBAPI version of the types, which is a little more varied based on DBAPI:
types = [elem[1] for elem in res.cursor.description]
maybe we'll look into adding this kind of metadata more directly to the ResultProxy.
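To put those one-liners in context, here is a small self-contained sketch (assuming SQLAlchemy 1.4+, where Select.selected_columns plays the role of q.columns above; the in-memory SQLite database and the items table are invented for illustration):
import sqlalchemy as sa
engine = sa.create_engine("sqlite://")
meta = sa.MetaData()
items = sa.Table(
    "items", meta,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("label", sa.Text),
)
meta.create_all(engine)
with engine.connect() as conn:
    q = sa.select(items)
    res = conn.execute(q)
    # SQLAlchemy-level types, read from the statement's columns
    print([col.type for col in q.selected_columns])
    # DBAPI-level types: the second element of each cursor.description entry
    # (the sqlite3 driver reports these as None)
    print([elem[1] for elem in res.cursor.description])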