Bulk update in SQLAlchemy from text query result/ResultProxy or dict - python

I would like to bulk update an ORM table in SQLAlchemy from a query for which I only have the text and a database connection. I cannot easily (I believe) reflect the source query in the ORM because it could come from an unlimited set of tables. The extra wrinkle is that I would like to update a key-value HSTORE column (postgres). I think I can figure out how to do this row-by-row, but would prefer a bulk UPDATE FROM-style operation.
To keep it simple:
class Table(Base):
    __tablename__ = 'table'
    id = Column(Integer, primary_key=True)
    hstore = Column(MutableDict.as_mutable(HSTORE))

query_to_update_from = 'select id, attr1, attr2 from source_table where id between 1 and 100'
I would like to update Table.hstore with {'attr1':attr1, 'attr2':attr2} where ids match. I want any columns not named id to update the hstore.
I know I can do session.execute('select id, attr1, attr2 from source_table where id between 1 and 100') and get a list of column names and row data easily. I can make a list of dictionaries from that, but can't figure out how to use that in a bulk update.
I have also tried making a query().subquery out of the raw text query to no avail, understandably since there isn't the required structure.
I am stumped at this point!
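One possible approach, sketched below under the assumption that the first column of the raw query is always the id and every remaining column should go into the hstore: build a list of parameter dictionaries from the result and issue a single executemany-style UPDATE with bound parameters. The names b_id and b_hstore are illustrative, not from the question, and note this runs one UPDATE per row at the driver level rather than a true UPDATE ... FROM:

from sqlalchemy import bindparam

result = session.execute(query_to_update_from)
keys = list(result.keys())

# one parameter mapping per row; hstore values must be strings
params = [
    {'b_id': row[0],
     'b_hstore': {k: str(v) for k, v in zip(keys[1:], row[1:])}}
    for row in result
]

stmt = (
    Table.__table__.update()
    .where(Table.__table__.c.id == bindparam('b_id'))
    .values(hstore=bindparam('b_hstore'))
)
session.execute(stmt, params)  # executemany-style bulk UPDATE
session.commit()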

Related

Insert different UUID on each row of a large table by python

I have a table with ~80k rows with imported data. Table structure is as follows:
order_line_items
id
order_id
product_id
quantity
price
uuid
On import, the order_id, product_id, quantity, and price were imported, but the uuid field was left null.
Is there a way, using python's UUID() function, to add a uuid to each row of the table in bulk? I could use a script to cycle through each row and update it but if there is a python solution, that would be fastest.
You probably need to add a default UUID to the table/model and then save the values:
from uuid import uuid4
from sqlalchemy import Column, String

class Table(Base):
    __tablename__ = 'table'
    # use a callable so a fresh UUID string is generated per row
    id = Column(String, primary_key=True, default=lambda: str(uuid4()))
    # add other columns

records = []  # records as dicts
sess = session()  # database session

# save all records in the db
sess.bulk_insert_mappings(Table, records)
sess.commit()
A more Pythonic way of adding or modifying a value in a column is the pandas map method; see https://pandas.pydata.org/docs/reference/api/pandas.Series.map.html for details.
Basically, what map does is map the values of a column according to a function.
Your function must return a value for this to work, and it can take the original column value as its argument.
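A minimal sketch of that approach, assuming the table has already been read into a DataFrame via a SQLAlchemy engine (engine, and the round-trip back with to_sql, are illustrative assumptions, not from the answer):

import pandas as pd
from uuid import uuid4

df = pd.read_sql_table('order_line_items', engine)
# the lambda ignores the old (null) value and returns a fresh UUID per row
df['uuid'] = df['uuid'].map(lambda _: str(uuid4()))
# careful: if_exists='replace' drops and recreates the table
df.to_sql('order_line_items', engine, if_exists='replace', index=False)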
I'm fairly certain you can do this directly in MySQL using the UUID function.
UPDATE your_table_name SET uuid = UUID();
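If you are on MySQL and already using SQLAlchemy, the same statement can be issued directly; a hedged sketch (the DSN is hypothetical):

from sqlalchemy import create_engine, text

engine = create_engine('mysql+pymysql://user:pass@localhost/db')  # hypothetical DSN
with engine.begin() as conn:  # begin() commits on success
    conn.execute(text('UPDATE order_line_items SET uuid = UUID()'))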

sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "product_pkey" [duplicate]

I have a table called products,
which has the following columns:
id, product_id, data, activity_id
What I am essentially trying to do is copy a bulk of existing products, update their activity_id, and create new entries in the products table.
Example:
I already have 70 existing entries in products with activity_id 2
Now I want to create another 70 entries with the same data, except for an updated activity_id.
I could have thousands of existing entries that I'd like to copy, updating the copied entries' activity_id to a new id.
products = self.session.query(model.Products).filter(filter1, filter2).all()
This returns all the existing products for a filter.
Then I iterate through the products, clone each existing one, and just update the activity_id field:
for product in products:
    product.activity_id = new_id

self.uow.skus.bulk_save_objects(simulation_skus)
self.uow.flush()
self.uow.commit()
What is the best/fastest way to do these bulk entries so it saves time? Performance is OK as of now, but is there a better solution?
You don't need to load these objects locally, all you really want to do is have the database create these rows.
You essentially want to run a query that creates the rows from the existing rows:
INSERT INTO product (product_id, data, activity_id)
SELECT product_id, data, 2 -- the new activity_id value
FROM product
WHERE activity_id = old_id
The above query would run entirely on the database server; this is far preferable over loading your query into Python objects, then sending all the Python data back to the server to populate INSERT statements for each new row.
Queries like that are something you could do with SQLAlchemy core, the half of the API that deals with generating SQL statements. However, you can use a query built from a declarative ORM model as a starting point. You'd need to:

1. Access the Table instance for the model, as that then lets you create an INSERT statement via the Table.insert() method. (You could also get the same object from a models.Product query; more on that later.)
2. Access the statement that would normally fetch the data for your Python instances for your filtered models.Product query; you can do so via the Query.statement property.
3. Update the statement to replace the included activity_id column with your new value, and remove the primary key (I'm assuming that you have an auto-incrementing primary key column).
4. Apply that updated statement to the Insert object for the table via Insert.from_select().
5. Execute the generated INSERT INTO ... FROM ... query.
Step 1 can be achieved by using the SQLAlchemy introspection API; the inspect() function, applied to a model class, gives you a Mapper instance, which in turn has a Mapper.local_table attribute.
Steps 2 and 3 require a little juggling with the Select.with_only_columns() method to produce a new SELECT statement where we swapped out the column. You can't easily remove a column from a select statement, but we can use a loop over the existing columns in the query to 'copy' them across to the new SELECT, making our replacement at the same time.
Step 4 is then straightforward: Insert.from_select() needs the columns that are inserted and the SELECT query, and we have both, since the SELECT object gives us its columns too.
Here is the code for generating your INSERT; the **replace keyword arguments are the columns you want to replace when inserting:
from sqlalchemy import inspect, literal
from sqlalchemy.sql import ClauseElement

def insert_from_query(model, query, **replace):
    # the SQLAlchemy core definition of the table
    table = inspect(model).local_table
    # and the underlying core select statement to source new rows from
    select = query.statement
    # validate assumptions: make sure the query produces rows from the above table
    assert table in select.froms, f"{query!r} must produce rows from {model!r}"
    assert all(c.name in select.columns for c in table.columns), f"{query!r} must include all {model!r} columns"
    # updated select, replacing the indicated columns
    as_clause = lambda v: literal(v) if not isinstance(v, ClauseElement) else v
    replacements = {name: as_clause(value).label(name) for name, value in replace.items()}
    from_select = select.with_only_columns([
        replacements.get(c.name, c)
        for c in table.columns
        if not c.primary_key
    ])
    return table.insert().from_select(from_select.columns, from_select)
I included a few assertions about the model and query relationship, and the code accepts arbitrary column clauses as replacements, not just literal values. You could use func.max(models.Product.activity_id) + 1 as a replacement value (wrapped as a subselect), for example.
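For example, a hedged sketch of that idea using a scalar subselect (1.3-style select([...]) to match the code above; next_activity_id is an illustrative name):

from sqlalchemy import func, select

next_activity_id = select([func.max(models.Product.activity_id) + 1]).as_scalar()
insert_stmt = insert_from_query(models.Product, products, activity_id=next_activity_id)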
The above function executes steps 1-4, producing the desired INSERT SQL statement when printed (I created a products model and query that I thought might be representative):
>>> print(insert_from_query(models.Product, products, activity_id=2))
INSERT INTO products (product_id, data, activity_id) SELECT products.product_id, products.data, :param_1 AS activity_id
FROM products
WHERE products.activity_id != :activity_id_1
All you have to do is execute it:
insert_stmt = insert_from_query(models.Product, products, activity_id=2)
self.session.execute(insert_stmt)

How to find the reference key/id in a sqlite database

So I am trying to figure out the proper way to use the sqlite database, but I feel like I got it all wrong when it comes to the Key/ID part.
I'm sure the question has been asked before and answered somewhere, but I have yet to find it, so here it goes.
From what I've gathered so far I am supposed to use the Key/ID for reference to entries across tables, correct?
So if table A has an entry with ID 1 and then several columns of data, then table B uses ID 1 in table A to access that data.
I can do that and it works out just fine as long as I already know the Key/ID.
What I fail to understand is how to do this if I don't already know it.
Consider the following code:
import sqlite3

conn = sqlite3.connect("./DB")

conn.execute("""CREATE TABLE IF NOT EXISTS Table_A (
    A_id INTEGER NOT NULL PRIMARY KEY UNIQUE,
    A_name TEXT
)""")

conn.execute("""CREATE TABLE IF NOT EXISTS Table_B (
    B_id INTEGER NOT NULL PRIMARY KEY UNIQUE,
    B_name TEXT,
    B_A_id INTEGER
)""")

conn.execute("""INSERT INTO Table_A (A_name) VALUES ('Something')""")
conn.commit()
I now want to add an entry to Table_B and have it refer to the entry I just made in the B_A_id column.
How do I do this?
I have no idea what the Key/ID is; all I know is that it has 'Something' in the A_name column. Can I find it without making a query for 'Something' or checking the database directly? Because that feels a bit backwards.
Am I doing it wrong or am I missing something here?
Maybe I am just being stupid.
You don't need to know the A_id from Table_A.
All you need is the value of the column A_name, say it is 'Something', which you want to reference in Table_B and you can do it like this:
INSERT INTO Table_B (B_name, B_A_id)
SELECT 'SomethingInTableB', A_Id
FROM Table_A
WHERE A_name = 'Something'
or:
INSERT INTO Table_B (B_name, B_A_id) VALUES
('SomethingInTableB', (SELECT A_Id FROM Table_A WHERE A_name = 'Something'))
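If you are driving this from Python's sqlite3 module, the first form could look roughly like this (a sketch; conn is the connection from the question):

conn.execute(
    """INSERT INTO Table_B (B_name, B_A_id)
       SELECT ?, A_id FROM Table_A WHERE A_name = ?""",
    ('SomethingInTableB', 'Something'),
)
conn.commit()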
You are on the right path, but have run into the problem that the Connection.execute() function is actually a shortcut for creating a cursor and executing the query with that. To retrieve the id of the new row in Table_A, explicitly create the cursor and access its lastrowid attribute, for example:
c = conn.cursor()
c.execute("""INSERT INTO Table_A (A_name) VALUES ('Something')""")
print(c.lastrowid) # primary key (A_id) of the new row
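You can then use that value directly when inserting the referencing row; a minimal continuation of the same sketch:

new_a_id = c.lastrowid
c.execute("INSERT INTO Table_B (B_name, B_A_id) VALUES (?, ?)",
          ('SomethingInTableB', new_a_id))
conn.commit()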
For more information about Connection and Cursor objects, refer to the python sqlite3 documentation.

SQLAlchemy group_by SQLite vs PostgreSQL

For the web app we are building we used SQLite for testing purposes. Recently we wanted to migrate to PostgreSQL. That's where the problems started:
We have this SQLAlchemy model (simplified)
class Entity(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    i_want_this = db.Column(db.String)
    some_value = db.Column(db.Integer)
I want to group all Entitys by some_value, which I did like this (simplified):
db.session.query(Entity, db.func.count()).group_by(Entity.some_value)
In SQLite this worked. In retrospect I see that it does not make sense, but SQLite did make sense of it. I can't say for sure which of the entities was returned.
Now in PostgreSQL we get this error:
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) column "entity.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT entity.id AS entity_id, entity.i_want_this AS entity_not...
^
[SQL: 'SELECT entity.id AS entity_id, entity.i_want_this AS entity_i_want_this, count(*) AS count_1 \nFROM entity GROUP BY entity.some_value']
And that error totally makes sense.
So my first question is: Why does SQLite allow this and how does it do it (what hidden aggregation is used)?
My second question is obvious: How would I do it with PostgreSQL?
I'm actually only interested in the count and the first i_want_this value. So I could do this:
groups = db.session.query(db.func.min(Entity.id), db.func.count()).group_by(Entity.some_value)
[(Entity.query.get(id_), count) for id_, count in groups]
But I don't want these additional get queries.
So I want to select the first entity (The entity with the minimal id) and the number of entities grouped by some_value or the first i_want_this and the count grouped by some_value
EDIT to make it clear:
I want to group by some_value (Done)
I want to get the number of entities in each group (Done)
I want to get the entity with the lowest id in each group (Need help on this)
Alternatively I want to get the i_want_this value of the entity with the lowest id in each group (Need help on this)
Concerning your first question, check the documentation:
Each expression in the result-set is then evaluated once for each group of rows. If the expression is an aggregate expression, it is evaluated across all rows in the group. Otherwise, it is evaluated against a single arbitrarily chosen row from within the group. If there is more than one non-aggregate expression in the result-set, then all such expressions are evaluated for the same row.
Concerning the second question, you'll probably have to explain what you actually want to achieve, considering that your current query, even in SQLite, returns more or less random results.
EDIT:
To get the entities with minimum id per group, you can use the Query.select_from construct:
import sqlalchemy.sql as sa_sql

# create the aggregate/grouped query
grouped = sa_sql.select([sa_sql.func.min(Entity.id).label('min_id')])\
    .group_by(Entity.some_value)\
    .alias('grouped')

# join it with the full entities table
joined = sa_sql.join(Entity, grouped, grouped.c.min_id == Entity.id)

# and let sqlalchemy pull the entities from this statement:
session.query(Entity).select_from(joined)
This will produce the following SQL:
SELECT entities.id AS entities_id,
       entities.i_want_this AS entities_i_want_this,
       entities.some_value AS entities_some_value
FROM entities
JOIN (SELECT min(entities.id) AS min_id
      FROM entities
      GROUP BY entities.some_value) AS grouped
  ON grouped.min_id = entities.id

How to ensure that every record in two tables has a different id?

I have two tables inherited from a base table (SQLAlchemy models):
class Base(object):
    def __tablename__(self):
        return self.__name__.lower()

    id = Column(Integer, primary_key=True, nullable=False)
    # pass the callables themselves so they run per row, not once at class definition
    utc_time = Column(Integer, default=utc_time, onupdate=utc_time)
    datetime = Column(TIMESTAMP, server_default=func.now(), onupdate=func.current_timestamp())
and inherited tables Person and Data
How can I achieve that every Person and Data row has a different id, with every id unique across the two tables? (Person, when generating an id, should be aware of Data ids, and vice versa.)
If you're using PostgreSQL, Firebird, or Oracle, use a sequence that's independent of both tables to generate primary key values. Otherwise, you need to roll some manual process like an "id" table or something like that - this can be tricky to do atomically.
Basically, if I were given this problem, I'd ask why exactly two tables would need unique primary key values like that - if the primary key is an autoincrementing integer, that indicates it's meaningless. Its only purpose is to provide a unique key into a single table.
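A hedged sketch of the sequence approach on PostgreSQL, defining one sequence and attaching it to both tables (the sequence name and standalone models are illustrative; this sidesteps the shared id column on the mixin for clarity):

from sqlalchemy import Column, Integer, Sequence

global_id_seq = Sequence('global_id_seq')  # one sequence shared by both tables

class Person(Base):
    __tablename__ = 'person'
    # server_default means plain SQL inserts also draw from the sequence
    id = Column(Integer, global_id_seq,
                server_default=global_id_seq.next_value(), primary_key=True)

class Data(Base):
    __tablename__ = 'data'
    id = Column(Integer, global_id_seq,
                server_default=global_id_seq.next_value(), primary_key=True)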
