SQLAlchemy: is it possible to store session-wide properties?

I am working on software that manipulates SQL tables using SQLAlchemy.
Each operation a user performs (insertion, modification, deletion) must be logged in a specific LOG table.
The log table looks like this:
+---------+------------------------------+
| user_id | log                          |
+---------+------------------------------+
| 21      | Value x added in table y     |
| 12      | Value z deleted from table w |
+---------+------------------------------+
To write such logs, I have a function defined on the Log model that inserts a new log entry. It is called from the insert methods of the other models:
class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    value = Column(String)

    @staticmethod
    def insert(value):
        item = Foo()
        item.value = value
        session.add(item)
        Log.add(item)
class Log(Base):
    __tablename__ = 'log'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, nullable=False)
    value = Column(String, nullable=False)

    @staticmethod
    def add(item):
        logitem = Log()
        logitem.user_id = x  # 'x' is the problem: the current user's id
        logitem.value = "Insertion of %s" % item.value
        session.add(logitem)
The code above does not work because 'x', the user's id, is not defined.
I don't want to pass the user_id as an argument every time I call the Foo.insert method. I would like to know if it is possible to bind the user_id to the session, so that it is defined once and persists for all SQL queries.

Sessions have an info attribute, a user-modifiable dictionary. The dictionary can be pre-populated when a session is created, and modified and accessed thereafter:
s = Session(info={'foo': 'bar'})
foo = s.info['foo']
s.info['baz'] = 'quux'
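Applied to the question, Log.add can read the current user's id from session.info instead of the undefined x. A minimal sketch, assuming the session is created once per user (e.g. at login):

# Bind the current user's id to the session once, at creation time.
session = Session(info={'user_id': 21})

class Log(Base):
    # ... columns as in the question ...

    @staticmethod
    def add(item):
        logitem = Log()
        # Read back the id that was bound to the session.
        logitem.user_id = session.info['user_id']
        logitem.value = "Insertion of %s" % item.value
        session.add(logitem)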

As far as I know, you can't use the session as a global namespace, and it doesn't look like a good idea anyway. If you really need to do that, it should be done by your application, using any other global or session state you have, not by coupling SQLAlchemy to your authentication/user-session process.

Related

SQLAlchemy asyncio WHERE parameters

I'm currently working on a project with a database that I use through SQLAlchemy's asyncio extension, and I've stumbled on a problem I can't solve. In the following, DbSession is an asynchronous session and select is the select function of the library.
I have a class Player with 3 attributes: id (BigInteger, primary key), name (String), and other_id (BigInteger, nullable; if not null it can serve as a primary key).
class Player(Base):
    __tablename__ = "players"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    other_id = Column(BigInteger, nullable=True)
I implemented 2 methods, get and get_by_id.
get works well and selects a Player from the table by its id:
@classmethod
async def get(cls, id):
    query = select(cls).where(cls.id == id)
    results = await DbSession.execute(query)
    result = results.scalars().all()[0]
    return result
My problem comes with get_by_id, which is supposed to find a player through its other_id.
I tried:
@classmethod
async def get_by_id(cls, id):
    query = select(cls).filter(cls.other_id == id)
    results = await DbSession.execute(query)
    result = results.scalars().all()[0]
    return result
As well as:
@classmethod
async def get_by_id(cls, id):
    query = select(cls).where(cls.other_id == id)
    results = await DbSession.execute(query)
    result = results.scalars().all()[0]
    return result
But both raise an error:
ProgrammingError: (sqlalchemy.dialects.postgresql.asyncpg.ProgrammingError) <class 'asyncpg.exceptions.UndefinedFunctionError'>: operator does not exist: character varying = bigint
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
[SQL: SELECT players.id, players.name, players.other_id
FROM players
WHERE players.other_id = %s]
[parameters: (331534096054616068,)]
If I understand this right, the parameter of the call is the id I gave to my function, but wrapped in a sort of tuple (which comes from I don't know where). It throws an error because this doesn't match the type BigInteger that other_id is supposed to have. I checked multiple times that I'm effectively giving an integer as an argument to get_by_id (here equal to 331534096054616068). I must admit that I don't know why the id ends up wrapped in a tuple, or whether that's normal behavior, as I've just started working with SQLAlchemy.
Any hint or help will be greatly appreciated.
It seems that your actual schema differs from your model and other_id is a STRING column in the database.
Can you inspect the PostgreSQL db directly with psql and check whether the output of \d players matches your model?
I get the same error if I change your column definition to other_id = Column(String, nullable=True) and create the db schema, but it works if I recreate the schema with other_id = Column(BigInteger, nullable=True).
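If recreating the column with the right type is not possible, a hedged workaround is to make both sides of the comparison the same type. A sketch against the schema as it actually exists (varchar column), using only standard SQLAlchemy calls:

from sqlalchemy import BigInteger, cast, select

# Option 1: compare the varchar column against a string parameter.
query = select(Player).where(Player.other_id == str(id))

# Option 2: cast the varchar column to bigint in SQL
# (only safe if every non-null value is numeric).
query = select(Player).where(cast(Player.other_id, BigInteger) == id)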

SQLAlchemy + Postgres (print created types, e.g. enum types)

I would like to be able to get info about what types will be created during SQLAlchemy's create_all(). Yes, they can be printed if I set up echoing of the generated SQL, but how can I print them without actually hitting the database? For example, I have a model:
class MyModel(Base):
    __tablename__ = 'my_model'
    id = Column(Integer, primary_key=True)
    indexed_field = Column(String(50))
    enum_field = Column(Enum(MyEnum))
    __table_args__ = (
        Index("my_ix", indexed_field),
    )
where MyEnum is:
class MyEnum(enum.Enum):
    A = 0
    B = 1
I can get the CREATE TABLE statement and all CREATE INDEX statements like this:
from sqlalchemy.schema import CreateTable, CreateIndex

print(str(CreateTable(MyModel.__table__).compile(postgres_engine)))
for idx in MyModel.__table__.indexes:
    print(str(CreateIndex(idx).compile(postgres_engine)))
The result will be something like this:
CREATE TABLE my_model (
    id SERIAL NOT NULL,
    indexed_field VARCHAR(50),
    enum_field myenum,
    PRIMARY KEY (id)
)

CREATE INDEX my_ix ON my_model (indexed_field)
Notice the line enum_field myenum. How can I get generated SQL for CREATE TYPE myenum... statement?
I've found the answer!
from sqlalchemy.dialects.postgresql.base import PGInspector
PGInspector(postgres_engine).get_enums()
It returns a list of all created enums, which IMO is even better than raw SQL; the documentation is here.
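A side note: on recent SQLAlchemy versions the same inspector is normally obtained through the generic inspect() entry point rather than by constructing PGInspector directly; a small sketch:

from sqlalchemy import inspect

# inspect() returns the dialect-specific inspector (PGInspector on PostgreSQL),
# so get_enums() is available on it. Note that it queries the database catalog.
print(inspect(postgres_engine).get_enums())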

Using bulk_update_mappings in SQLAlchemy to update multiple rows with different values

I have two tables, Foo and Bar. I just added a new column x to the Bar table, which has to be populated using values from Foo:
class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    x = Column(Integer, nullable=False)

class Bar(Base):
    __tablename__ = 'bar'
    id = Column(Integer, primary_key=True)
    x = Column(Integer, nullable=False)
    foo_id = Column(Integer, ForeignKey('foo.id'), nullable=False)
One straightforward way to do it would be to iterate over all the rows in Bar and update them one by one, but it takes a long time (there are more than 100k rows in both Foo and Bar):
for b, foo_x in session.query(Bar, Foo.x).join(Foo, Foo.id == Bar.foo_id):
    b.x = foo_x
session.flush()
Now I was wondering if this would be the right way to do it:
mappings = []
for b, foo_x in session.query(Bar, Foo.x).join(Foo, Foo.id == Bar.foo_id):
    info = {'id': b.id, 'x': foo_x}
    mappings.append(info)
session.bulk_update_mappings(Bar, mappings)
There are not many examples of bulk_update_mappings out there. The docs say:
All those keys which are present and are not part of the primary key are applied to the SET clause of the UPDATE statement; the primary key values, which are required, are applied to the WHERE clause.
So, in this case, id will be used in the WHERE clause and the row will then be updated with the x value from the dictionary, right?
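For illustration, each dictionary in mappings should then translate into an UPDATE of roughly this shape, executed as one executemany batch (a sketch of the expected statement, not captured output):

UPDATE bar SET x = %(x)s WHERE bar.id = %(id)s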
The approach is correct in terms of usage. The only thing I would change is something like below
mappings = []
i = 0
for b, foo_x in session.query(Bar, Foo.x).join(Foo, Foo.id == Bar.foo_id):
    info = {'id': b.id, 'x': foo_x}
    mappings.append(info)
    i = i + 1
    if i % 10000 == 0:
        session.bulk_update_mappings(Bar, mappings)
        session.flush()
        session.commit()
        mappings[:] = []

# Handle the last, partially filled batch.
session.bulk_update_mappings(Bar, mappings)
This makes sure you don't have too much data hanging in memory and you don't send one overly large update to the DB at a time.
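As an aside that goes beyond what was asked: since the new values come straight from Foo, this particular backfill can also be expressed as a single correlated UPDATE, so no rows are loaded into Python at all. A sketch, assuming a backend that supports UPDATE..FROM (e.g. PostgreSQL):

from sqlalchemy import update

# Renders roughly: UPDATE bar SET x = foo.x FROM foo WHERE bar.foo_id = foo.id
session.execute(update(Bar).where(Bar.foo_id == Foo.id).values(x=Foo.x))
session.commit()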
Not directly related to this question, but for those searching for more performance when updating/inserting with both methods, bulk_update_mappings and bulk_insert_mappings: just add fast_executemany to your engine as follows:
engine = create_engine(connection_string, fast_executemany=True)
You can use that parameter in SQLAlchemy versions 1.3 and later. It comes from pyodbc (so it applies to pyodbc-based connections) and will significantly speed up your bulk requests.

SQLAlchemy one-to-many relationship join?

I am trying to do a simple join query like this:
SELECT food._id, food.food_name, food_categories.food_categories
FROM food
JOIN food_categories ON food.food_category_id = food_categories._id
but I keep receiving an error. Here is how my classes are set up:
class Food_Categories(db.Model):
    __tablename__ = 'food_categories'
    _id = db.Column(db.Integer, primary_key=True)
    food_categories = db.Column(db.String(30))

class Food(db.Model):
    __tablename__ = 'food'
    _id = db.Column(db.Integer, primary_key=True)
    food_name = db.Column(db.String(40))
    food_category_id = db.Column(db.Integer, ForeignKey(Food_Categories._id))
    food_category = relationship("Food_Categories")
My query function looks like this:
@app.route('/foodlist')
def foodlist():
    if request.method == 'GET':
        results = Food.query.join(Food_Categories.food_categories).all()
        json_results = []
        for result in results:
            d = {'_id': result._id,
                 'food': result.food_name,
                 'food_category': result.food_categories}
            json_results.append(d)
        return jsonify(user=json_results)
I am using Flask. When I call the route I get this error.
AttributeError: 'ColumnProperty' object has no attribute 'mapper'
I essentially want this:
| id | food_name | food_category |
with the food_category_id column replaced by the actual name of the food category from the other table.
Are my tables/relationships set up correctly? Is my query set up correctly?
Your tables and relationships are set up correctly. Your query needs a change.
The reason for the error is that you try to perform a join on a column (Food_Categories.food_categories) instead of a Table (or mapped model object). Technically, you should replace your query with the one below to fix the error:
results = Food.query.join(Food_Categories).all()
This will fix the error, but will not generate the SQL statement you desire, because it will return only instances of Food as a result, even though there is a join.
In order to build a query which will generate exactly the SQL statement you have in mind:
results = (db.session.query(Food._id, Food.food_name,
                            Food_Categories.food_categories)
           .join(Food_Categories)
           .all())
for x in results:
    print(x._id, x.food_name, x.food_categories)
Please note that in this case the results are not instances of Food, but rather tuples with 3 column values.
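Adapted back to the route from the question, the tuples can be serialized directly; a sketch reusing the question's names:

@app.route('/foodlist')
def foodlist():
    results = (db.session.query(Food._id, Food.food_name,
                                Food_Categories.food_categories)
               .join(Food_Categories)
               .all())
    # Each result row is a tuple-like object with the three selected columns.
    json_results = [{'_id': r._id,
                     'food': r.food_name,
                     'food_category': r.food_categories} for r in results]
    return jsonify(user=json_results)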

SQLAlchemy: One-Way Relationship, Correlated Subquery

Thanks in advance for your help.
I have two entities, Human and Chimp. Each has a collection of metrics, which can contain subclasses of MetricBlock, for instance CompleteBloodCount (with fields WHITE_CELLS, RED_CELLS, PLATELETS).
So my object model looks like (forgive the ASCII art):
---------  metrics    ---------------       ----------------------
| Human | ----------> | MetricBlock | <|-- | CompleteBloodCount |
---------             ---------------       ----------------------
                              ^
---------  metrics            |
| Chimp | ---------------------
---------
This is implemented with the following tables:
Chimp (id, …)
Human (id, …)
MetricBlock (id, dtype)
CompleteBloodCount (id, white_cells, red_cells, platelets)
CholesterolCount (id, hdl, ldl)
ChimpToMetricBlock(chimp_id, metric_block_id)
HumanToMetricBlock(human_id, metric_block_id)
So a human knows its metric blocks, but a metric block does not know its human or chimp.
I would like to write a query in SQLAlchemy to find all CompleteBloodCounts for a particular human. In SQL I could write something like:
SELECT cbc.id
FROM complete_blood_count cbc
WHERE EXISTS (
    SELECT 1
    FROM human h
    INNER JOIN human_to_metric_block h_to_m ON h.id = h_to_m.human_id
    WHERE h_to_m.metric_block_id = cbc.id
)
I'm struggling, though, to write this in SQLAlchemy. I believe correlate(), any(), or an aliased join may be helpful, but the fact that a MetricBlock doesn't know its Human or Chimp is a stumbling block for me.
Does anyone have any advice on how to write this query? Alternately, are there other strategies to define the model in a way that works better with SQLAlchemy?
Thank you for your assistance.
Python 2.6
SQLAlchemy 0.7.4
Oracle 11g
Edit:
HumanToMetricBlock is defined as:
humanToMetricBlock = Table(
    "human_to_metric_block",
    metadata,
    Column("human_id", Integer, ForeignKey("human.id")),
    Column("metric_block_id", Integer, ForeignKey("metric_block.id")),
)
per the manual.
Each primate should have a unique ID, regardless of what type of primate it is. I'm not sure why each set of attributes (MetricBlock, CompleteBloodCount, CholesterolCount) is a separate table, but I assume they have more than one dimension besides the primate, such as time; otherwise I would use only one giant table.
Thus, I would structure this problem in the following manner:
Create a parent object Primate and derive Human and Chimp from it. This example uses single-table inheritance, though you may want to use joined-table inheritance depending on their attributes.
class Primate(Base):
    __tablename__ = 'primate'
    id = Column(Integer, primary_key=True)
    genus = Column(String)
    # ... attributes all primates have ...
    __mapper_args__ = {'polymorphic_on': genus, 'polymorphic_identity': 'primate'}

class Chimp(Primate):
    __mapper_args__ = {'polymorphic_identity': 'chimp'}
    # ... attributes ...

class Human(Primate):
    __mapper_args__ = {'polymorphic_identity': 'human'}
    # ... attributes ...

class MetricBlock(Base):
    __tablename__ = 'metric_block'
    id = Column(Integer, primary_key=True)
Then you create a single many-to-many table (you can use an association proxy instead):
class PrimateToMetricBlock(Base):
    __tablename__ = 'primate_to_metric_block'
    id = Column(Integer, primary_key=True)  # a primary key is needed!
    primate_id = Column(Integer, ForeignKey('primate.id'))
    primate = relationship('Primate')  # if you care about the relationship
    metricblock_id = Column(Integer, ForeignKey('metric_block.id'))
    metricblock = relationship('MetricBlock')
Then I would structure the query like so (note that the ON clause is not strictly necessary, since SQLAlchemy can infer the join condition automatically when there is no ambiguity):
query = DBSession.query(CompleteBloodCount).\
    join(PrimateToMetricBlock, PrimateToMetricBlock.metricblock_id == MetricBlock.id)
If you want to filter by primate type, join the Primate table and filter:
query = query.join(Primate, Primate.id == PrimateToMetricBlock.primate_id).\
    filter(Primate.genus == 'human')
Otherwise, if you know the ID of the primate (primate_id), no additional join is necessary:
query = query.filter(PrimateToMetricBlock.primate_id == primate_id)
If you're only retrieving one object, end the query with:
return query.first()
Otherwise:
return query.all()
Forming your model like this should eliminate any confusion and actually make everything simpler. If I'm missing something, let me know.
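For completeness, the correlated EXISTS from the question can also be written against the unchanged model, using the humanToMetricBlock table defined in the edit. A sketch in more recent SQLAlchemy syntax, where human_id is assumed to be the id of the particular human:

from sqlalchemy import and_, exists

# Correlated EXISTS: CompleteBloodCount rows linked to the given human.
stmt = exists().where(and_(
    humanToMetricBlock.c.metric_block_id == CompleteBloodCount.id,
    humanToMetricBlock.c.human_id == human_id,
))
query = session.query(CompleteBloodCount).filter(stmt)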
