SQLAlchemy ORM conversion to pandas DataFrame

SQLAlchemy ORM conversion to pandas DataFrame - python

Is there a solution converting a SQLAlchemy <Query object> to a pandas DataFrame?
Pandas has the capability to use pandas.read_sql but this requires use of raw SQL. I have two reasons for wanting to avoid it:
I already have everything using the ORM (a good reason in and of itself) and
I'm using python lists as part of the query, e.g.:
db.session.query(Item).filter(Item.symbol.in_(add_symbols) where Item is my model class and add_symbols is a list). This is the equivalent of SQL SELECT ... from ... WHERE ... IN.
Is anything possible?

Below should work in most cases:
df = pd.read_sql(query.statement, query.session.bind)
See pandas.read_sql documentation for more information on the parameters.

Just to make this more clear for novice pandas programmers, here is a concrete example,
pd.read_sql(session.query(Complaint).filter(Complaint.id == 2).statement,session.bind)
Here we select a complaint from complaints table (sqlalchemy model is Complaint) with id = 2

For completeness sake: As alternative to the Pandas-function read_sql_query(), you can also use the Pandas-DataFrame-function from_records() to convert a structured or record ndarray to DataFrame.
This comes in handy if you e.g. have already executed the query in SQLAlchemy and have the results already available:
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker
SQLALCHEMY_DATABASE_URI = 'postgresql://postgres:postgres#localhost:5432/my_database'
engine = create_engine(SQLALCHEMY_DATABASE_URI, pool_pre_ping=True, echo=False)
db = scoped_session(sessionmaker(autocommit=False, autoflush=False, bind=engine))
Base = declarative_base(bind=engine)
class Currency(Base):
"""The `Currency`-table"""
__tablename__ = "currency"
__table_args__ = {"schema": "data"}
id = Column(Integer, primary_key=True, nullable=False)
name = Column(String(64), nullable=False)
# Defining the SQLAlchemy-query
currency_query = db.query(Currency).with_entities(Currency.id, Currency.name)
# Getting all the entries via SQLAlchemy
currencies = currency_query.all()
# We provide also the (alternate) column names and set the index here,
# renaming the column `id` to `currency__id`
df_from_records = pd.DataFrame.from_records(currencies
, index='currency__id'
, columns=['currency__id', 'name'])
print(df_from_records.head(5))
# Or getting the entries via Pandas instead of SQLAlchemy using the
# aforementioned function `read_sql_query()`. We can set the index-columns here as well
df_from_query = pd.read_sql_query(currency_query.statement, db.bind, index_col='id')
# Renaming the index-column(s) from `id` to `currency__id` needs another statement
df_from_query.index.rename(name='currency__id', inplace=True)
print(df_from_query.head(5))

The selected solution didn't work for me, as I kept getting the error
AttributeError: 'AnnotatedSelect' object has no attribute 'lower'
I found the following worked:
df = pd.read_sql_query(query.statement, engine)

If you want to compile a query with parameters and dialect specific arguments, use something like this:
c = query.statement.compile(query.session.bind)
df = pandas.read_sql(c.string, query.session.bind, params=c.params)

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
engine = create_engine('postgresql://postgres:postgres#localhost:5432/DB', echo=False)
Base = declarative_base(bind=engine)
Session = sessionmaker(bind=engine)
session = Session()
conn = session.bind
class DailyTrendsTable(Base):
__tablename__ = 'trends'
__table_args__ = ({"schema": 'mf_analysis'})
company_code = Column(DOUBLE_PRECISION, primary_key=True)
rt_bullish_trending = Column(Integer)
rt_bearish_trending = Column(Integer)
rt_bullish_non_trending = Column(Integer)
rt_bearish_non_trending = Column(Integer)
gen_date = Column(Date, primary_key=True)
df_query = select([DailyTrendsTable])
df_data = pd.read_sql(rt_daily_query, con = conn)

Using the 2.0 SQLalchemy syntax (available also in 1.4 with the flag future=True) it looks that pd.read_sql is not implemented yet and it will raise:
NotImplementedError: This method is not implemented for SQLAlchemy 2.0.
This is an open issue that won't be solved till pandas 2.0, you can find some information about this here and here.
I didn't find any satisfactory work around, but some people seems to be using two configurations of the engine, one with the flag future False:
engine2 = create_engine(URL_string, echo=False, future=False)
This solution would be OK if you query strings, but using the ORM, the best I could do is a custom function yet to be optimized, but it works:
Conditions = session.query(ExampleTable)
def df_from_sql(query):
return pd.DataFrame([i.__dict__ for i in query]).drop(columns='_sa_instance_state')
df = df_from_sql(ExampleTable)
This solution in any case would be provisional till pd.read_sql has implemented the new syntax.

When you're using the ORM it's as simple as this:
pd.DataFrame([r._asdict() for r in query.all()])
Good alternative to pd.read_sql when you don't want to expose sql and sessions to the business logic code.
Found it here: https://stackoverflow.com/a/52208023/1635525

This answer provides a reproducible example using an SQL Alchemy select statement and returning a pandas data frame. It is based on an in memory SQLite database so that anyone can reproduce it without installing a database engine.
import pandas
from sqlalchemy import create_engine
from sqlalchemy import MetaData, Table, Column, Text
from sqlalchemy.orm import Session
Define table metadata and create a table
engine = create_engine('sqlite://')
meta = MetaData()
meta.bind = engine
user_table = Table('user', meta,
Column("name", Text),
Column("full_name", Text))
user_table.create()
Insert some data into the user table
stmt = user_table.insert().values(name='Bob', full_name='Sponge Bob')
with Session(engine) as session:
result = session.execute(stmt)
session.commit()
Read the result of a select statement into a pandas data frame
# Select data into a pandas data frame
stmt = user_table.select().where(user_table.c.name == 'Bob')
df = pandas.read_sql_query(stmt, engine)
df
Out:
name full_name
0 Bob Sponge Bob

if use SQL query
def generate_df_from_sqlquery(query):
from pandas import DataFrame
query = db.session.execute(query)
df = DataFrame(query.fetchall())
if len(df) > 0:
df.columns = query.keys()
else:
columns = query.keys()
df = pd.DataFrame(columns=columns)
return df
profile_df = generate_df_from_sqlquery(profile_query)

Simple example using the CursorResult.keys() method to get the column names.
import sqlalchemy as sa
import pandas as pd
engine = sa.create_engine(...)
with engine.connect() as conn:
result = conn.execute("SELECT * FROM foo;")
df = pd.DataFrame(result.all(), columns=result.keys())
https://docs.sqlalchemy.org/en/20/core/connections.html#sqlalchemy.engine.Result.keys

Adding to answers using read_sql like #van, when my query involved a join, sqlalchemy seemed to be implicitly adding aliased columns from the join tables like id_1, id_2 incase the join tables and primary table both had an id column for example. Using .all() removes these implicit columns before returning results but read_sql will include these columns.
Solutions for that case for me was to be explicit on my selects. So I replaced
query = session.query(model)
with
query = session.query(model.col_1, model.col_2)
or for select all
query = session.query(*model.__table__.columns.values())
then
df = pd.read_sql(query.statement, query.session.bind)

Related

Creating additional column while importing csv via df.to_sql in sqlalchemy framework

I've to import csv data in sql using sqlAlchemy.
The csv has to columns (x, y) but I need to add a third column (delta_y) in the sql database to store processed data.
Using the following code it reads the csv to the sql database but is not creating the actual empty column in the database. Is there a smooth way to inherit was is mapped out in the class?
from sqlalchemy import Column, Integer, Float, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine, update
engine = create_engine('sqlite:///hausarbeit_db.sqlite3', echo=True)
Base = declarative_base()
class Test(Base):
__tablename__ = "test"
id = Column(Integer, primary_key=True)
x = Column(Float)
y = Column(Float)
delta_y = Column(Float)
Base.metadata.create_all(engine)
file_name = 'Beispiel-Datensaetze//test.csv'
df = pd.read_csv(file_name)
df.to_sql('test', con=engine, index_label="id", if_exists='replace')
TEST = Base.metadata.tables['test']
I'm also happy to hear any other hints or tips around the code above.
Thanks!

Can't you add a new empty column in the data-frame after reading from the csv
df["delta_y"] = np.nan
# or
df["delta_y"] = ""

Creating schema via declarative mapping: Base.metadata.create_all(engine) does not work

Here is an absurd problem with sqlalchemy that seems easy! First, this is my config file for connecting to mysql database:
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
engine = create_engine('mysql://root:#localhost:3306/digi')
and then, I am trying to create a table called 'sale-history' :
from config import *
from sqlalchemy import *
class Sale(Base):
__tablename__ = 'sale-history'
order_id = column(Integer, primary_key= True)
customer_id = column(Integer)
item_id = column(Integer) #froeign key with product list
cartFinalize_dateTime = column(DATETIME)
amount_ordrered = column(Integer)
city_name = column(String(191))
quantity_ordered = column(Integer)
def __repr__(self):
return "<Sale(city_name='%s')>" % (self.city_name)
Sale.__table__
Base.metadata.create_all(engine)
Now, what I wonder is that
Sale.__table__
and
Base.metadata.create_all(engine)
are not known to my code. More accurate, these are not in suggestion options showed by pycharm editor. Debugging the code does not throw any error(returns 0). What should I do to create tables?
I appreciate your consideration so much!

The code is using column to define columns in the table but it should be using Column (note the upper-case "C").
A few tips/comments
Pycharm may provide better support if you avoid the from module import * idiom. You can alias module names if they are to long to type, for example import sqlalchemy as sa
You can see the SQL generated by the engine by passing echo=True to create_engine
Tablenames with hyphens need to be quoted with backticks to be valid. Sqlalchemy does this automatically, but other applications may not. Using underscores instead may be more convenient.
The final code might look like this:
config
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
engine = create_engine('mysql://root:#localhost:3306/test', echo=True)
Model
import sqlachemy as sa
import config
class Sale(Base):
__tablename__ = 'sale-history'
order_id = sa.Column(sa.Integer, primary_key=True)
customer_id = sa.Column(sa.Integer)
item_id = sa.Column(sa.Integer) # foreign key with product list
cartFinalize_dateTime = sa.Column(sa.DATETIME)
amount_ordrered = sa.Column(sa.Integer)
city_name = sa.Column(sa.String(191))
quantity_ordered = sa.Column(sa.Integer)
def __repr__(self):
return "<Sale(city_name='%s')>" % (self.city_name)
Base.metadata.create_all(config.engine)

Strange issue in SQLAlchemy

Consider the following code which creates a very simple table (without using SQLAlchemy), then adds an entry to it using SQLAlchemy ORM and retrieves it:
import sqlite3
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
DB_PATH = '/tmp/tst.db'
#create a DB
sqlite_conn = sqlite3.connect(DB_PATH)
sqlite_conn.execute('''CREATE TABLE tst (
id INTEGER PRIMARY KEY ASC AUTOINCREMENT,
c0 INTEGER,
c1 INTEGER
);''')
sqlite_conn.commit()
#intialize an SA engine/session/mapped class
engine = create_engine('sqlite:///{}'.format(DB_PATH))
Base = declarative_base()
Base.metadata.reflect(bind=engine)
Session = sessionmaker(bind=engine)
class Tst(Base):
__table_name__ = 'tst'
__table__ = Base.metadata.tables[__table_name__]
columns = list(__table__.columns)
field_names = [c.name for c in columns]
#add an entry to the table
session = Session()
inst = Tst()
session.add(inst)
session.commit()
#retrieve an entry from the table
session = Session()
inst = session.query(Tst).first()
print inst.c1
One may expect that the code above will just print 'None', as 'c1' was not assigned a value. Instead of it, I'm getting the following error message:
Traceback (most recent call last):
File "...", line 39, in <module>
print inst.c1
AttributeError: 'Tst' object has no attribute 'c1'
But if the following line will be removed/commented:
field_names = [c.name for c in columns]
the output will be as expected.
In general, it looks like the iteration over Table.columns inside the class definition will cause the last column to be omitted from the class instances.
Following this answer, I actually changed the code to use Inspector, and it worked fine. However, AFAIK, accessing Table.columns is completely legitimate, so I wanted to understand whether it's buggy behavior or something wrong on my side.
P.S. tested with SQLAlchemy 1.1.9
P.P.S. the issue doesn't appear to be related to a specific DB dialect - reproduced with MySQL, sqlite.

This is more of a Python version issue than an SQLAlchemy issue. The root cause is the leaking of the name c from the list-comprehension in Python 2. It becomes part of the namespace of the constructed class, and so SQLAlchemy sees it as if you were explicitly naming the last column in the list columns in your class definition. Your class definition is equivalent to:
class Tst(Base):
__table_name__ = 'tst'
__table__ = Base.metadata.tables[__table_name__]
columns = list(__table__.columns)
...
c = columns[-1] # The last column of __table__
If you change your print statement to:
print inst.c
you'll get None as you expected. If you must have your field_names, you could for example remove the name from the namespace:
class Tst(Base):
__table_name__ = 'tst'
__table__ = Base.metadata.tables[__table_name__]
columns = list(__table__.columns)
field_names = [c.name for c in columns]
del c
but this is unportable (and ugly) between Python 2 and 3, since the name would not actually exist in 3. You could also work around the issue with attrgetter():
from operator import attrgetter
class Tst(Base):
__table_name__ = 'tst'
__table__ = Base.metadata.tables[__table_name__]
columns = list(__table__.columns)
field_names = list(map(attrgetter('name'), columns))
or use a generator expression:
field_names = list(c.name for c in columns)

Mapping lots of similar tables in SQLAlchemy

I have many (~2000) locations with time series data. Each time series has millions of rows. I would like to store these in a Postgres database. My current approach is to have a table for each location time series, and a meta table which stores information about each location (coordinates, elevation etc). I am using Python/SQLAlchemy to create and populate the tables. I would like to have a relationship between the meta table and each time series table to do queries like "select all locations that have data between date A and date B" and "select all data for date A and export a csv with coordinates". What is the best way to create many tables with the same structure (only the name is different) and have a relationship with a meta table? Or should I use a different database design?
Currently I am using this type of approach to generate a lot of similar mappings:
from sqlalchemy import create_engine, MetaData
from sqlalchemy.types import Float, String, DateTime, Integer
from sqlalchemy import Column, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship, backref
Base = declarative_base()
def make_timeseries(name):
class TimeSeries(Base):
__tablename__ = name
table_name = Column(String(50), ForeignKey('locations.table_name'))
datetime = Column(DateTime, primary_key=True)
value = Column(Float)
location = relationship('Location', backref=backref('timeseries',
lazy='dynamic'))
def __init__(self, table_name, datetime, value):
self.table_name = table_name
self.datetime = datetime
self.value = value
def __repr__(self):
return "{}: {}".format(self.datetime, self.value)
return TimeSeries
class Location(Base):
__tablename__ = 'locations'
id = Column(Integer, primary_key=True)
table_name = Column(String(50), unique=True)
lon = Column(Float)
lat = Column(Float)
if __name__ == '__main__':
connection_string = 'postgresql://user:pw#localhost/location_test'
engine = create_engine(connection_string)
metadata = MetaData(bind=engine)
Session = sessionmaker(bind=engine)
session = Session()
TS1 = make_timeseries('ts1')
# TS2 = make_timeseries('ts2') # this breaks because of the foreign key
Base.metadata.create_all(engine)
session.add(TS1("ts1", "2001-01-01", 999))
session.add(TS1("ts1", "2001-01-02", -555))
qs = session.query(Location).first()
print qs.timeseries.all()
This approach has some problems, most notably that if I create more than one TimeSeries the foreign key doesn't work. Previously I've used some work arounds, but it all seems like a big hack and I feel that there must be a better way of doing this. How should I organise and access my data?

Alternative-1: Table Partitioning
Partitioning immediately comes to mind as soon as I read exactly the same table structure. I am not a DBA, and do not have much production experience using it (even more so on PostgreSQL), but
please read PostgreSQL - Partitioning documentation. Table partitioning seeks to solve exactly the problem you have, but over 1K tables/partitions sounds challenging; therefore please do more research on forums/SO for scalability related questions on this topic.
Given that both of your mostly used search criterias, datetime component is very important, therefore there must be solid indexing strategy on it. If you decide to go with partitioning root, the obvious partitioning strategy would be based on date ranges. This might allow you to partition older data in different chunks compared to most recent data, especially assuming that old data is (almost never) updated, so physical layouts would be dense and efficient; while you could employ another strategy for more "recent" data.
Alternative-2: trick SQLAlchemy
This basically makes your sample code work by tricking SA to assume that all those TimeSeries are children of one entity using Concrete Table Inheritance. The code below is self-contained and creates 50 table with minimum data in it. But if you have a database already, it should allow you to check the performance rather quickly, so that you can make a decision if it is even a close possibility.
from datetime import date, datetime
from sqlalchemy import create_engine, Column, String, Integer, DateTime, Float, ForeignKey, func
from sqlalchemy.orm import sessionmaker, relationship, configure_mappers, joinedload
from sqlalchemy.ext.declarative import declarative_base, declared_attr
from sqlalchemy.ext.declarative import AbstractConcreteBase, ConcreteBase
engine = create_engine('sqlite:///:memory:', echo=True)
Session = sessionmaker(bind=engine)
session = Session()
Base = declarative_base(engine)
# MODEL
class Location(Base):
__tablename__ = 'locations'
id = Column(Integer, primary_key=True)
table_name = Column(String(50), unique=True)
lon = Column(Float)
lat = Column(Float)
class TSBase(AbstractConcreteBase, Base):
#declared_attr
def table_name(cls):
return Column(String(50), ForeignKey('locations.table_name'))
def make_timeseries(name):
class TimeSeries(TSBase):
__tablename__ = name
__mapper_args__ = { 'polymorphic_identity': name, 'concrete':True}
datetime = Column(DateTime, primary_key=True)
value = Column(Float)
def __init__(self, datetime, value, table_name=name ):
self.table_name = table_name
self.datetime = datetime
self.value = value
return TimeSeries
def _test_model():
_NUM = 50
# 0. generate classes for all tables
TS_list = [make_timeseries('ts{}'.format(1+i)) for i in range(_NUM)]
TS1, TS2, TS3 = TS_list[:3] # just to have some named ones
Base.metadata.create_all()
print('-'*80)
# 1. configure mappers
configure_mappers()
# 2. define relationship
Location.timeseries = relationship(TSBase, lazy="dynamic")
print('-'*80)
# 3. add some test data
session.add_all([Location(table_name='ts{}'.format(1+i), lat=5+i, lon=1+i*2)
for i in range(_NUM)])
session.commit()
print('-'*80)
session.add(TS1(datetime(2001,1,1,3), 999))
session.add(TS1(datetime(2001,1,2,2), 1))
session.add(TS2(datetime(2001,1,2,8), 33))
session.add(TS2(datetime(2002,1,2,18,50), -555))
session.add(TS3(datetime(2005,1,3,3,33), 8))
session.commit()
# Query-1: get all timeseries of one Location
#qs = session.query(Location).first()
qs = session.query(Location).filter(Location.table_name == "ts1").first()
print(qs)
print(qs.timeseries.all())
assert 2 == len(qs.timeseries.all())
print('-'*80)
# Query-2: select all location with data between date-A and date-B
dateA, dateB = date(2001,1,1), date(2003,12,31)
qs = (session.query(Location)
.join(TSBase, Location.timeseries)
.filter(TSBase.datetime >= dateA)
.filter(TSBase.datetime <= dateB)
).all()
print(qs)
assert 2 == len(qs)
print('-'*80)
# Query-3: select all data (including coordinates) for date A
dateA = date(2001,1,1)
qs = (session.query(Location.lat, Location.lon, TSBase.datetime, TSBase.value)
.join(TSBase, Location.timeseries)
.filter(func.date(TSBase.datetime) == dateA)
).all()
print(qs)
# #note: qs is list of tuples; easy export to CSV
assert 1 == len(qs)
print('-'*80)
if __name__ == '__main__':
_test_model()
Alternative-3: a-la BigData
If you do get into performance problems using database, I would probably try:
still keep the data in separate tables/databases/schemas like you do right now
bulk-import data using "native" solutions provided by your database engine
use MapReduce-like analysis.
Here I would stay with python and sqlalchemy and implemnent own distributed query and aggregation (or find something existing). This, obviously, only works if you do not have requirement to produce those results directly on the database.
edit-1: Alternative-4: TimeSeries databases
I have no experience using those on a large scale, but definitely an option worth considering.
Would be fantastic if you could later share your findings and whole decision-making process on this.

I would avoid the database design you mention above. I don't know enough about the data you are working with, but it sounds like you should have two tables. One table for location, and a child table for location_data. The location table would store the data you mention above such as coordinates and elevations. The location_data table would store the location_id from the location table as well as the time series data you want to track.
This would eliminate changing db structure and code changes every time you add another location, and would allow the types of queries you are looking at doing.

Two parts:
only use two tables
there's no need to have dozens or hundreds of identical tables. just have a table for location and one for location_data , where every entry will fkey onto location. also create an index on the location_data table for the location_id, so you have efficient searching.
don't use sqlalchemy to create this
i love sqlalchemy. i use it every day. it's great for managing your database and adding some rows, but you don't want to use it for initial setup that has millions of rows. you want to generate a file that is compatible with postgres' "COPY" statement [ http://www.postgresql.org/docs/9.2/static/sql-copy.html ] COPY will let you pull in a ton of data fast; it's what is used during dump/restore operations.
sqlalchemy will be great for querying this and adding rows as they come in. if you have bulk operations, you should use COPY.

SQLAlchemy - copy schema and data of subquery to another database

I am trying to copy data from a subquery from postgres (from_engine) to sqlite database. I can achieve this for copying a table using following command:
smeta = MetaData(bind=from_engine)
table = Table(table_name, smeta, autoload=True)
table.metadata.create_all(to_engine)
However, I am not sure how to achieve the same for a subquery statement.
-Sandeep
Edit:
Follow up on the answer. Once I have created the table I want to create a subquery stmt as follows:
table = Table("newtable", dest_metadata, *columns)
stmt = dest_session.query(table).subquery();
However, the last stmt ends up with error
cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (ProgrammingError) relation "newtable" does not exist
LINE 3: FROM newtable) AS anon_1

One way that works at least in some cases:
Use column_descriptions of a query object to get some information about the columns in the result set.
With that information you can build the schema to create the new table in the other database.
Run the query in the source database and insert the results into the new table.
First of some setup for the example:
from sqlalchemy import create_engine, MetaData,
from sqlalchemy import Column, Integer, String, Table
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
# Engine to the database to query the data from
# (postgresql)
source_engine = create_engine('sqlite:///:memory:', echo=True)
SourceSession = sessionmaker(source_engine)
# Engine to the database to store the results in
# (sqlite)
dest_engine = create_engine('sqlite:///:memory:', echo=True)
DestSession = sessionmaker(dest_engine)
# Create some toy table and fills it with some data
Base = declarative_base()
class Pet(Base):
__tablename__ = 'pets'
id = Column(Integer, primary_key=True)
name = Column(String)
race = Column(String)
Base.metadata.create_all(source_engine)
sourceSession = SourceSession()
sourceSession.add(Pet(name="Fido", race="cat"))
sourceSession.add(Pet(name="Ceasar", race="cat"))
sourceSession.add(Pet(name="Rex", race="dog"))
sourceSession.commit()
Now to the interesting bit:
# This is the query we want to persist in a new table:
query= sourceSession.query(Pet.name, Pet.race).filter_by(race='cat')
# Build the schema for the new table
# based on the columns that will be returned
# by the query:
metadata = MetaData(bind=dest_engine)
columns = [Column(desc['name'], desc['type']) for desc in query.column_descriptions]
column_names = [desc['name'] for desc in query.column_descriptions]
table = Table("newtable", metadata, *columns)
# Create the new table in the destination database
table.create(dest_engine)
# Finally execute the query
destSession = DestSession()
for row in query:
destSession.execute(table.insert(row))
destSession.commit()
There should be more efficient ways to do the last loop. But bulk-insert is another topic.

You can also go through a pandas data frame. For example a method would use pandas.read_sql(query, source.connection) and df.to_sql(table_name, con=destination.connection).

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

SQLAlchemy ORM conversion to pandas DataFrame - python

Below should work in most cases: df = pd.read_sql(query.statement, query.session.bind) See pandas.read_sql documentation for more information on the parameters.

Just to make this more clear for novice pandas programmers, here is a concrete example, pd.read_sql(session.query(Complaint).filter(Complaint.id == 2).statement,session.bind) Here we select a complaint from complaints table (sqlalchemy model is Complaint) with id = 2

The selected solution didn't work for me, as I kept getting the error AttributeError: 'AnnotatedSelect' object has no attribute 'lower' I found the following worked: df = pd.read_sql_query(query.statement, engine)

If you want to compile a query with parameters and dialect specific arguments, use something like this: c = query.statement.compile(query.session.bind) df = pandas.read_sql(c.string, query.session.bind, params=c.params)

When you're using the ORM it's as simple as this: pd.DataFrame([r._asdict() for r in query.all()]) Good alternative to pd.read_sql when you don't want to expose sql and sessions to the business logic code. Found it here: https://stackoverflow.com/a/52208023/1635525

Related

Creating additional column while importing csv via df.to_sql in sqlalchemy framework

Creating schema via declarative mapping: Base.metadata.create_all(engine) does not work

Strange issue in SQLAlchemy

Mapping lots of similar tables in SQLAlchemy

SQLAlchemy - copy schema and data of subquery to another database

Categories

Resources