Selecting distinct column values in SQLAlchemy/Elixir

In a little script I'm writing using SQLAlchemy and Elixir, I need to get all the distinct values for a particular column. In ordinary SQL it'd be a simple matter of
SELECT DISTINCT `column` FROM `table`;
and I know I could just run that query "manually," but I'd rather stick to the SQLAlchemy declarative syntax (and/or Elixir) if I can. I'm sure it must be possible; I've even seen allusions to this sort of thing in the SQLAlchemy documentation, but I've been hunting through that documentation (as well as Elixir's) for hours and I just can't figure out how it would be done. So what am I missing?

You can query column properties of mapped classes and the Query class has a generative distinct() method:
for value in Session.query(Table.column).distinct():
    pass
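A self-contained sketch of the query above, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database; the Tag model and its rows are hypothetical stand-ins for Table and column:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Tag(Base):
    # Hypothetical model used only for this example
    __tablename__ = "tag"
    id = Column(Integer, primary_key=True)
    label = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Tag(label="red"), Tag(label="blue"), Tag(label="red")])
    session.commit()

    # Querying a single column yields one-element rows; distinct() adds SELECT DISTINCT
    labels = sorted(label for (label,) in session.query(Tag.label).distinct())

print(labels)  # ['blue', 'red']
```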

For this class:
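class Assurance(db.Model):
    name = Column(String)
you can do this:
assurances = []
for assurance in Assurance.query.distinct(Assurance.name):
    assurances.append(assurance.name)
and you will have the list of distinct values. Note that passing a column to distinct() renders DISTINCT ON, which is PostgreSQL-specific; on other backends, query the column itself instead, e.g. db.session.query(Assurance.name).distinct().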
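Because Assurance.query.distinct(Assurance.name) passes a column to distinct(), SQLAlchemy renders PostgreSQL's DISTINCT ON rather than a plain DISTINCT. A sketch that compiles (without executing) the equivalent statement for the PostgreSQL dialect, assuming SQLAlchemy 1.4+; a plain declarative Base replaces db.Model, and the id primary key is an assumption added because declarative mappings require one:

```python
from sqlalchemy import Column, Integer, String, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Assurance(Base):
    __tablename__ = "assurance"
    id = Column(Integer, primary_key=True)  # assumed primary key
    name = Column(String)


# distinct(<column>) becomes DISTINCT ON (...) when compiled for PostgreSQL
stmt = select(Assurance).distinct(Assurance.name)
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```

DISTINCT ON keeps one whole row per name, which is why iterating the result and reading .name gives each name once on PostgreSQL.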

I wanted to count the distinct values, and using .distinct() and .count() would count first, resulting in a single value, and only then apply the distinct. I had to do the following instead:
from sqlalchemy.sql import func
Session.query(func.count(func.distinct(Table.column))).scalar()
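A self-contained check of the count-distinct query, assuming SQLAlchemy 1.4+ and an in-memory SQLite database with a hypothetical Visit table:

```python
from sqlalchemy import Column, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Visit(Base):
    # Hypothetical table: one row per page visit
    __tablename__ = "visit"
    id = Column(Integer, primary_key=True)
    city = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Visit(city=c) for c in ["Oslo", "Lima", "Oslo", "Oslo"]])
    session.commit()

    # Renders SELECT count(distinct(visit.city)) ... and returns one scalar
    n_cities = session.query(func.count(func.distinct(Visit.city))).scalar()

print(n_cities)  # 2
```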

For this class:
class User(Base):
    __tablename__ = 'users'
    name = Column(Text)
    id = Column(Integer, primary_key=True)
Method 1: Using load_only
from sqlalchemy.orm import load_only
records = db_session.query(User).options(load_only(User.name)).distinct().all()
values = [record.name for record in records]
Note that load_only() always loads the primary key too, so DISTINCT is applied to (id, name) and duplicate names are not removed; Method 2 is the one that actually returns distinct values.
Method 2: Without any extra imports
records = db_session.query(User.name).distinct().all()
values = [record.name for record in records]  # list of distinct values
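A runnable comparison of the two methods, assuming SQLAlchemy 1.4+ and an in-memory SQLite database with hypothetical rows. It shows that load_only() still selects the primary key, so DISTINCT over (id, name) keeps duplicate names, while querying User.name alone de-duplicates them:

```python
from sqlalchemy import Column, Integer, Text, create_engine
from sqlalchemy.orm import Session, declarative_base, load_only

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    name = Column(Text)
    id = Column(Integer, primary_key=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([User(name="ann"), User(name="ann"), User(name="bob")])
    session.commit()

    # Method 1: the primary key is selected as well, so each row stays unique
    method1 = [u.name for u in session.query(User).options(load_only(User.name)).distinct()]

    # Method 2: only the name column is selected, so DISTINCT removes duplicates
    method2 = sorted(name for (name,) in session.query(User.name).distinct())

print(sorted(method1))  # ['ann', 'ann', 'bob']
print(method2)          # ['ann', 'bob']
```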

for user in session.query(users_table).distinct():
    print(user.posting_id)

SQLAlchemy 2.0 encourages the use of the select() function. You can use a SQLAlchemy table to build a select statement that extracts unique values:
select(distinct(table.c.column_name))
From the SQLAlchemy 2.0 migration guide, on ORM usage:
"The biggest visible change in SQLAlchemy 2.0 is the use of Session.execute() in conjunction with select() to run ORM queries, instead of using Session.query()."
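A minimal ORM version of that 2.0 pattern, assuming SQLAlchemy 1.4+ and an in-memory SQLite database; the Flower model is hypothetical:

```python
from sqlalchemy import Column, Integer, String, create_engine, distinct, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Flower(Base):
    # Hypothetical model for illustration
    __tablename__ = "flower"
    id = Column(Integer, primary_key=True)
    species = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Flower(species=s) for s in ["setosa", "virginica", "setosa"]])
    session.commit()

    # 2.0 style: build a select(), run it with Session.execute(),
    # and use scalars() to get plain values instead of one-element rows
    stmt = select(distinct(Flower.species))
    species = sorted(session.execute(stmt).scalars().all())

print(species)  # ['setosa', 'virginica']
```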
Reproducible example using pandas to collect the unique values.
Define and insert the iris dataset
Define a table structure for the iris dataset, then use pandas to insert the
data into an SQLite database. Pandas inserts with the if_exists="append" argument
so that it keeps the table structure defined in SQLAlchemy.
import seaborn
import pandas
from sqlalchemy import create_engine
from sqlalchemy import MetaData, Table, Column, Text, Float
Define metadata and create the table
engine = create_engine('sqlite://')
meta = MetaData()
iris_table = Table('iris',
                   meta,
                   Column("sepal_length", Float),
                   Column("sepal_width", Float),
                   Column("petal_length", Float),
                   Column("petal_width", Float),
                   Column("species", Text))
# MetaData.bind was removed in SQLAlchemy 2.0, so pass the engine explicitly
iris_table.create(engine)
Load data into the table
iris = seaborn.load_dataset("iris")
iris.to_sql(name="iris",
            con=engine,
            if_exists="append",
            index=False,
            chunksize=10 ** 6,
            )
Select unique values
Reusing the iris_table from above.
from sqlalchemy import distinct, select
stmt = select(distinct(iris_table.c.species))
df = pandas.read_sql_query(stmt, engine)
df
# species
# 0 setosa
# 1 versicolor
# 2 virginica

The accepted solution gave me an error, so I referenced the column through the table's .c collection instead, and it worked. Here is the code:
for i in session.query(table_name.c.column_name).distinct():
    print(i)

Related

How to set the starting value of auto increment in Flask-SQLAlchemy [duplicate]

The autoincrement argument in SQLAlchemy seems to accept only True and False, but I want to set a predefined starting value, aid = 1001, so that autoincrement produces aid = 1002 on the next insert.
In SQL, this can be changed with:
ALTER TABLE article AUTO_INCREMENT = 1001;
I'm using MySQL and I have tried following, but it doesn't work:
from sqlalchemy import Column
from sqlalchemy.dialects.mysql import INTEGER
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Article(Base):
    __tablename__ = 'article'
    aid = Column(INTEGER(unsigned=True, zerofill=True),
                 autoincrement=1001, primary_key=True)
So, how can I get that? Thanks in advance!
You can achieve this by using DDLEvents, which let you run additional SQL statements right after CREATE TABLE runs. Look at the examples in the DDLEvents documentation, but I am guessing your code will look similar to this:
from sqlalchemy import event
from sqlalchemy import DDL
event.listen(
    Article.__table__,
    "after_create",
    DDL("ALTER TABLE %(table)s AUTO_INCREMENT = 1001;")
)
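One refinement, as a sketch: DDL.execute_if(dialect="mysql") restricts the statement to MySQL, so the same model can still be created on other backends. The example uses an in-memory SQLite database (an assumption for testing) to show the ALTER TABLE being skipped, and plain Integer stands in for the MySQL INTEGER type:

```python
from sqlalchemy import DDL, Column, Integer, create_engine, event, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Article(Base):
    __tablename__ = "article"
    aid = Column(Integer, primary_key=True)


# Only emit the MySQL-specific ALTER TABLE when the engine is MySQL
event.listen(
    Article.__table__,
    "after_create",
    DDL("ALTER TABLE %(table)s AUTO_INCREMENT = 1001").execute_if(dialect="mysql"),
)

# On SQLite the listener fires but execute_if() skips the statement,
# so table creation still succeeds
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
print(inspect(engine).has_table("article"))  # True
```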
According to the docs:
autoincrement –
This flag may be set to False to indicate an integer primary key column that should not be considered to be the “autoincrement” column, that is the integer primary key column which generates values implicitly upon INSERT and whose value is usually returned via the DBAPI cursor.lastrowid attribute. It defaults to True to satisfy the common use case of a table with a single integer primary key column.
So autoincrement is only a flag that tells SQLAlchemy whether this integer primary key column is the one whose values are generated automatically on INSERT.
What you're trying to do is create a custom autoincrement sequence.
So, for your example, I think it should look something like:
from sqlalchemy import Column
from sqlalchemy.dialects.mysql import INTEGER
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.schema import Sequence
Base = declarative_base()
class Article(Base):
    __tablename__ = 'article'
    aid = Column(INTEGER(unsigned=True, zerofill=True),
                 Sequence('article_aid_seq', start=1001, increment=1),
                 primary_key=True)
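Note that MySQL has no sequences, so SQLAlchemy ignores a Sequence there; this approach only takes effect on backends that support sequences. Also, I don't know whether you're using PostgreSQL, so make note of the following if you are: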
The Sequence object also implements special functionality to accommodate Postgresql’s SERIAL datatype. The SERIAL type in PG automatically generates a sequence that is used implicitly during inserts. This means that if a Table object defines a Sequence on its primary key column so that it works with Oracle and Firebird, the Sequence would get in the way of the “implicit” sequence that PG would normally use. For this use case, add the flag optional=True to the Sequence object - this indicates that the Sequence should only be used if the database provides no other option for generating primary key identifiers.
I couldn't get the other answers to work with MySQL and Flask-Migrate, so I did the following inside a migration file:
from sqlalchemy import text
from app import db
with db.engine.begin() as conn:  # engine.execute() was removed in SQLAlchemy 2.0
    conn.execute(text("ALTER TABLE myDB.myTable AUTO_INCREMENT = 2000;"))
Be warned that if you regenerate your migration files, this will get overwritten.
I know this is an old question, but I recently had to figure this out and none of the available answers were quite what I needed. The solution I found relied on Sequence in SQLAlchemy. For whatever reason, I could not get it to work when I called the Sequence constructor within the Column constructor, as referenced above. As a note, I am using PostgreSQL.
For your case I would write it as:
import os
from sqlalchemy import Column, Integer, Sequence, create_engine
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
def connection():
    engine = create_engine(f"postgresql://postgres:{os.getenv('PGPASSWORD')}@localhost:{os.getenv('PGPORT')}/test")
    return engine
engine = connection()
class Article(Base):
    __tablename__ = 'article'
    seq = Sequence('article_aid_seq', start=1001)
    aid = Column('aid', Integer, seq, server_default=seq.next_value(), primary_key=True)
Base.metadata.create_all(engine)
Rows can then be inserted in PostgreSQL with:
insert into article (aid) values (DEFAULT);
select * from article;
aid
------
1001
(1 row)
Hope this helps someone as it took me a while
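For the Sequence plus server_default approach above, the DDL it asks PostgreSQL to run can be checked without a server by compiling it against the PostgreSQL dialect. A sketch assuming SQLAlchemy 1.4+, with the connection setup from the answer omitted:

```python
from sqlalchemy import Column, Integer, Sequence
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import declarative_base
from sqlalchemy.schema import CreateSequence, CreateTable

Base = declarative_base()


class Article(Base):
    __tablename__ = "article"
    seq = Sequence("article_aid_seq", start=1001)
    aid = Column("aid", Integer, seq, server_default=seq.next_value(), primary_key=True)


dialect = postgresql.dialect()
# DDL that create_all() would emit, compiled without a database connection
seq_ddl = str(CreateSequence(Article.seq).compile(dialect=dialect))
table_ddl = str(CreateTable(Article.__table__).compile(dialect=dialect))
print(seq_ddl)
print(table_ddl)
```

The server-side DEFAULT nextval(...) is what makes the plain INSERT ... VALUES (DEFAULT) in the answer pick up 1001.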
You can do it using the mysql_auto_increment table create option. There are mysql_engine and mysql_default_charset options too, which might also be handy:
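from sqlalchemy import Column, MetaData, Table
from sqlalchemy.dialects.mysql import INTEGER
metadata = MetaData()
article = Table(
    'article', metadata,
    Column('aid', INTEGER(unsigned=True, zerofill=True), primary_key=True),
    mysql_engine='InnoDB',
    mysql_default_charset='utf8',
    mysql_auto_increment='1001',
)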
The above will generate:
CREATE TABLE article (
aid INTEGER UNSIGNED ZEROFILL NOT NULL AUTO_INCREMENT,
PRIMARY KEY (aid)
)ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8
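To check the generated DDL without a MySQL server, the CREATE TABLE can be compiled against the MySQL dialect. A sketch assuming SQLAlchemy 1.4+, repeating the table definition so the block is self-contained:

```python
from sqlalchemy import Column, MetaData, Table
from sqlalchemy.dialects import mysql
from sqlalchemy.dialects.mysql import INTEGER
from sqlalchemy.schema import CreateTable

metadata = MetaData()
article = Table(
    "article", metadata,
    Column("aid", INTEGER(unsigned=True, zerofill=True), primary_key=True),
    mysql_engine="InnoDB",
    mysql_default_charset="utf8",
    mysql_auto_increment="1001",
)

# Compile the CREATE TABLE for MySQL without opening a connection
ddl = str(CreateTable(article).compile(dialect=mysql.dialect()))
print(ddl)
```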
If your database supports Identity columns*, the starting value can be set like this:
import sqlalchemy as sa
tbl = sa.Table(
    't10494033',
    sa.MetaData(),
    sa.Column('id', sa.Integer, sa.Identity(start=200, always=True), primary_key=True),
)
Resulting in this DDL output:
CREATE TABLE t10494033 (
id INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 200),
PRIMARY KEY (id)
)
Identity(..) is ignored if the backend does not support it.
* PostgreSQL 10+, Oracle 12+ and MSSQL, according to the linked documentation above.
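The DDL shown above can be reproduced without a database by compiling the table against the PostgreSQL dialect; a sketch assuming SQLAlchemy 1.4+ (the release that introduced Identity):

```python
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateTable

tbl = sa.Table(
    "t10494033",
    sa.MetaData(),
    sa.Column("id", sa.Integer, sa.Identity(start=200, always=True), primary_key=True),
)

# Compile the CREATE TABLE for PostgreSQL without connecting
ddl = str(CreateTable(tbl).compile(dialect=postgresql.dialect()))
print(ddl)
```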

Select specific columns with cast using SQLAlchemy

I'm using SQLAlchemy (Version: 1.4.44) and I'm having some unexpected results when trying to select columns and using cast on those columns.
First, most of the examples, and even the current documentation, suggest that column selection should work by passing a list to the select function like this:
s = select([table.c.col1])
However, I get the following error if I try this:
s = my_table.select([my_table.columns.user_id])
sqlalchemy.exc.ArgumentError: SQL expression for WHERE/HAVING role expected, got [Column('user_id', String(), table=<my_table>)].
Some examples suggest just placing the field directly in the select query.
s = select(table.c.col1)
But this seems to do nothing more than create an idle where-clause out of the field.
I eventually was able to achieve column selection with this approach:
s = my_table.select().with_only_columns(my_table.columns.created_at)
But I am not able to use cast for some reason with this approach.
s = my_table.select().with_only_columns(cast(my_table.columns.created_at, Date))
ValueError: Couldn't parse date string '2022' - value is not a string.
All help appreciated!
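I don't think table.select() is common usage: in 1.4, Table.select() treats its first argument as a WHERE clause, which is why passing a list of columns raises that error. SQLAlchemy is in a big transition right now on its way to 2.0. In 1.4 (and in 2.0) the following syntax should work. Use whatever session handling you already have working; the part that matters is the select(...):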
from sqlalchemy import Column, Integer, Text
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.sql import select, cast
from sqlalchemy.dialects.postgresql import INTEGER
Base = declarative_base()
class User(Base):
    __tablename__ = "users"
    id = Column(
        Integer, nullable=False, primary_key=True
    )
    name = Column(Text)
# `engine` is your existing Engine, and the users table must already exist
with Session(engine) as session:
    u1 = User(name="1")
    session.add(u1)
    session.commit()
with Session(engine) as session:
    my_table = User.__table__
    # Cast user name into integer.
    print(session.execute(select(cast(my_table.c.name, INTEGER))).all())
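A runnable variant of the same idea against an in-memory SQLite database (an assumption for testing, so the generic Integer type replaces the PostgreSQL INTEGER type). It selects specific columns by passing them positionally to select(), with cast() wrapping one of them:

```python
from sqlalchemy import Column, Integer, Text, cast, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, nullable=False, primary_key=True)
    name = Column(Text)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(User(name="1"))
    session.commit()

    # 1.4/2.0 style: columns go to select() positionally, not in a list
    stmt = select(User.id, cast(User.name, Integer).label("name_as_int"))
    rows = [tuple(row) for row in session.execute(stmt)]

print(stmt)
print(rows)  # [(1, 1)]
```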

How do I use the SQLAlchemy ORM to do an insert with a subquery (moving data from one table to another)

The following simple example describes my problem with my postgres DB (although my question is more about sqlalchemy than postgres):
I have a table called detection with columns:
id
item
price_in_cents
shop_id
I have another table called item with the following columns:
id
detection_id (foreign key to detection.id)
price_in_dollar
I want to move the entire dataset of a certain shop from table detection to table item whilst also performing an operation to convert cents to dollars (the example is theoretical, my real problem has a different operation than cents to dollars).
In raw SQL I can use the following query:
INSERT INTO item (detection_id, price_in_dollar)
SELECT id AS detection_id,
price_in_cents / 100 AS price_in_dollar
FROM detection
WHERE shop_id = {shop_id}
Is it possible to replicate this query using SQLAlchemy? Due to the volume of the data (it could be millions of rows) I do not want to first download the data, do the operation and then upload it. An example that would work is:
q = session.query(Detection).filter(Detection.shop_id == shop_id)
for detection_record in q:
    session.add(Item(detection_id=detection_record.id,
                     price_in_dollar=detection_record.price_in_cents / 100))
session.commit()
This would however download all the data to the machine first instead of doing all the work in the DB itself and thus has different behaviour than my example query.
Just because you're using the ORM in your project doesn't mean that you have to use the ORM for everything. SQLAlchemy ORM is good for pulling relational "things" down as Python objects and working with them. For server-side operations, SQLAlchemy Core is the tool to use.
Assuming that you have declared your ORM objects using Declarative, e.g.,
import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Detection(Base):
    __tablename__ = "detection"
    # ...
then you can use Core to create the server-side operation with code like this:
meta = Base.metadata
item_t = meta.tables[Item.__tablename__]
detection_t = meta.tables[Detection.__tablename__]
target_shop_id = 1  # test value
ins = item_t.insert().from_select(
    ["detection_id", "price_in_dollar"],
    # select() takes columns positionally in 1.4/2.0; the list form is deprecated
    sa.select(
        detection_t.c.id.label("detection_id"),
        (detection_t.c.price_in_cents / sa.literal_column("100")).label(
            "price_in_dollar"
        ),
    ).where(detection_t.c.shop_id == target_shop_id),
)
with engine.begin() as conn:
    conn.execute(ins)
and the generated SQL text is
INSERT INTO item (detection_id, price_in_dollar) SELECT detection.id AS detection_id, detection.price_in_cents / 100 AS price_in_dollar
FROM detection
WHERE detection.shop_id = ?
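An end-to-end sketch of the same INSERT ... SELECT on an in-memory SQLite database, assuming SQLAlchemy 1.4+. The model columns follow the question; the sample rows and the float divisor 100.0 (used to avoid integer division) are assumptions:

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Detection(Base):
    __tablename__ = "detection"
    id = sa.Column(sa.Integer, primary_key=True)
    item = sa.Column(sa.String)
    price_in_cents = sa.Column(sa.Integer)
    shop_id = sa.Column(sa.Integer)


class Item(Base):
    __tablename__ = "item"
    id = sa.Column(sa.Integer, primary_key=True)
    detection_id = sa.Column(sa.Integer, sa.ForeignKey("detection.id"))
    price_in_dollar = sa.Column(sa.Float)


engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(
        sa.insert(Detection.__table__),
        [
            {"id": 1, "item": "a", "price_in_cents": 1250, "shop_id": 1},
            {"id": 2, "item": "b", "price_in_cents": 300, "shop_id": 1},
            {"id": 3, "item": "c", "price_in_cents": 999, "shop_id": 2},
        ],
    )

    # INSERT ... SELECT: the rows never leave the database
    source = sa.select(
        Detection.id,
        (Detection.price_in_cents / 100.0).label("price_in_dollar"),
    ).where(Detection.shop_id == 1)
    conn.execute(
        sa.insert(Item.__table__).from_select(["detection_id", "price_in_dollar"], source)
    )

with engine.connect() as conn:
    moved = [
        tuple(row)
        for row in conn.execute(
            sa.select(Item.detection_id, Item.price_in_dollar).order_by(Item.detection_id)
        )
    ]

print(moved)  # [(1, 12.5), (2, 3.0)]
```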

Example of using the 'callable' method in pandas.to_sql()?

I'm trying to make a specific insert statement that has an ON CONFLICT argument (I'm uploading to a Postgres database); will the df.to_sql(method='callable') allow that? Or is it intended for another purpose? I've read through the documentation, but I wasn't able to grasp the concept. I looked around on this website and others for similar questions, but I haven't found one yet. If possible I would love to see an example of how to use the 'callable' method in practice. Any other ideas on how to effectively load large numbers of rows from pandas using ON CONFLICT logic would be much appreciated as well. Thanks in advance for the help!
Here's an example of how to use PostgreSQL's ON CONFLICT DO NOTHING with to_sql:
# import postgres specific insert
from sqlalchemy.dialects.postgresql import insert
def to_sql_on_conflict_do_nothing(pd_table, conn, keys, data_iter):
    # This is very similar to the default to_sql function in pandas
    # Only the conn.execute line is changed
    data = [dict(zip(keys, row)) for row in data_iter]
    result = conn.execute(insert(pd_table.table).on_conflict_do_nothing(), data)
    return result.rowcount  # optional: pandas reports this as the affected-row count
with engine.begin() as conn:  # commits when the block exits
    df.to_sql("some_table", conn, if_exists="append", index=False, method=to_sql_on_conflict_do_nothing)
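The same callable mechanism can be tried without a PostgreSQL server by swapping in SQLite's dialect-specific insert (sqlalchemy.dialects.sqlite.insert, which also offers on_conflict_do_nothing() in SQLAlchemy 1.4+). This is a sketch under that substitution, assuming a pandas release compatible with the installed SQLAlchemy; the reviews table and its rows are hypothetical:

```python
import pandas as pd
import sqlalchemy as sa
from sqlalchemy.dialects.sqlite import insert as sqlite_insert

engine = sa.create_engine("sqlite://")
meta = sa.MetaData()
reviews = sa.Table(
    "reviews",
    meta,
    sa.Column("review_id", sa.Integer, primary_key=True),
    sa.Column("text", sa.Text),
)
meta.create_all(engine)


def insert_on_conflict_do_nothing(pd_table, conn, keys, data_iter):
    # pandas calls this with its SQLTable wrapper, a connection,
    # the column names and an iterator of row tuples
    data = [dict(zip(keys, row)) for row in data_iter]
    stmt = sqlite_insert(pd_table.table).on_conflict_do_nothing()
    result = conn.execute(stmt, data)
    return result.rowcount


first = pd.DataFrame({"review_id": [1, 2], "text": ["good", "bad"]})
second = pd.DataFrame({"review_id": [2, 3], "text": ["duplicate", "new"]})
for frame in (first, second):
    # review_id 2 in the second frame conflicts and is silently skipped
    frame.to_sql(
        "reviews", engine, if_exists="append", index=False,
        method=insert_on_conflict_do_nothing,
    )

with engine.connect() as conn:
    rows = [tuple(r) for r in conn.execute(sa.select(reviews).order_by(reviews.c.review_id))]

print(rows)
```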
I recently had a similar problem and, following this answer, came up with a way to send a DataFrame to PostgreSQL with ON CONFLICT:
1. Send some initial data to the database to create the table
from sqlalchemy import create_engine
engine = create_engine(connection_string)
df.to_sql(table_name,engine)
2. add primary key
ALTER TABLE table_name ADD COLUMN id SERIAL PRIMARY KEY;
3. prepare index on the column (or columns) you want to check the uniqueness
CREATE UNIQUE INDEX review_id ON table_name(review_id);
4. map the SQL table with SQLAlchemy
from sqlalchemy.ext.automap import automap_base
ABase = automap_base()
ABase.prepare(autoload_with=engine)  # reflect the existing tables (SQLAlchemy 1.4+)
Table = ABase.classes.table_name
Table.__tablename__ = 'table_name'
5. do your insert on conflict with:
from sqlalchemy.dialects.postgresql import insert
insrt_vals = df.to_dict(orient='records')
insrt_stmnt = insert(Table).values(insrt_vals)
do_nothing_stmt = insrt_stmnt.on_conflict_do_nothing(index_elements=['review_id'])
with engine.begin() as conn:
    results = conn.execute(do_nothing_stmt)

Distinct values from an ARRAY column

I'm using PostgreSQL's ARRAY to store tags for images.
How can I write an ORM query in SQLAlchemy, which returns the set of all tags found in the table, for the following model:
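from sqlalchemy import Column, String, Unicode
from sqlalchemy.dialects.postgresql import ARRAY
from sqlalchemy.orm import declarative_base
Base = declarative_base()
class Image(Base):
    __tablename__ = 'images'
    id = Column(String, primary_key=True)
    tags = Column(ARRAY(Unicode))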
I guess I need to use a lateral join, but I do not know how to do it using SQLAlchemy's ORM syntax.
PG version: 9.5
You can use func.unnest:
from sqlalchemy import func
session.query(func.unnest(Image.tags)).distinct().all()
unnest() splits each array into separate rows (like the PostgreSQL function unnest), and distinct() removes the duplicate tags, so each tag appears once in the result.
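To see the SQL this builds, the 1.4/2.0-style equivalent can be compiled for the PostgreSQL dialect without a database; a sketch reusing the question's model. With a live session, session.execute(stmt).scalars().all() would return the tags as a plain list:

```python
from sqlalchemy import Column, String, Unicode, func, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import ARRAY
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Image(Base):
    __tablename__ = "images"
    id = Column(String, primary_key=True)
    tags = Column(ARRAY(Unicode))


# unnest() turns each array into rows; distinct() drops repeated tags
stmt = select(func.unnest(Image.tags)).distinct()
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```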
