I am trying to create many objects. Currently this is done by calling session.add() and then session.flush() for each object as it is created. This is bad because over 500 objects are being created, so each of them performs a query, which hurts performance. I am trying to optimise this.
To give some background, here is roughly what my object looks like:
class Metric:
    data: str
    fk_document_id: int  # (relationship)
    fk_user_id: int      # (relationship)
The key thing to highlight: there are two relationships on this object, implemented using the relationship() construct that SQLAlchemy offers.
My first attempt was to change the code to use bulk_save_objects(). However, that method does not save the relationships, and it also requires a session.commit(), which I'd ideally avoid because there is a chance this transaction could get rolled back after the creation of the Metric objects.
Then I found add_all(), but this performs an add() for each object anyway, so I do not think it will improve the current performance much.
I can't see any other approach, so my question is whether there is a way in this case to use bulk_save_objects() to persist relationships, ideally without requiring a commit. Alternatively, if there is another method for this that I am not seeing, please let me know.
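For reference, here is the shape of the workaround I'm currently considering (a sketch only; raw_values, doc and user are placeholders for data that exists in my surrounding code): populating the foreign key columns directly instead of relying on the relationships, and letting bulk_insert_mappings() batch the INSERT inside the already-open transaction, so a later rollback would still undo it.

# Sketch: no commit is issued here; the INSERT runs inside the
# session's current transaction, so it can still be rolled back.
session.bulk_insert_mappings(
    Metric,
    [
        {"data": data, "fk_document_id": doc.id, "fk_user_id": user.id}
        for data in raw_values
    ],
)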
Related
It seems to me that tables obtained via MetaData.reflect() and via sqlalchemy.ext.automap's prepare() should be usable interchangeably for many use cases, but they can't be.
Passing metadata.tables['mytable'] into conn.execute(select(...)) returns a sqlalchemy.engine.cursor.CursorResult, and the iterator yields rows whose columns are accessed directly (e.g. x.columnA).
But passing automap_base().classes.mytable into the same conn.execute(select(...)) returns a sqlalchemy.engine.result.ChunkedIteratorResult, and you need x.mytable.columnA to get at the column.
The sqlalchemy.engine.Result documentation says as much:
New in version 1.4: The Result object provides a completely updated
usage model and calling facade for SQLAlchemy Core and SQLAlchemy ORM.
In Core, it forms the basis of the CursorResult object which replaces
the previous ResultProxy interface. When using the ORM, a higher level
object called ChunkedIteratorResult is normally used.
Can I generically convert one to the other? That is, some wrapper that works for every table without needing the table name?
What's the best future-proof way to do this? I want my code to be forward-compatible with SQLAlchemy 2.0. Does that mean I should move away from either automap or MetaData?
sqlalchemy 1.4.35
This is the difference between the Core and the ORM.
select() from a Table vs. ORM class
While the SQL generated in these examples looks the same whether we
invoke select(user_table) or select(User), in the more general case
they do not necessarily render the same thing, as an ORM-mapped class
may be mapped to other kinds of “selectables” besides tables. The
select() that’s against an ORM entity also indicates that ORM-mapped
instances should be returned in a result, which is not the case when
SELECTing from a Table object.
Don't hesitate to use the ORM. It's higher level, pythonic, cool, and automap is ORM.
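That said, if you want Core-style rows from an automap class without hard-coding the table name, one generic option (a sketch, reusing the mytable and columnA names from the question) is to select from the class's underlying Table, which automap exposes as __table__:

from sqlalchemy import create_engine, select
from sqlalchemy.ext.automap import automap_base

engine = create_engine("sqlite:///example.db")  # placeholder URL

Base = automap_base()
Base.prepare(autoload_with=engine)  # 2.0-style prepare; available on 1.4 too

Model = Base.classes.mytable  # ORM entity
table = Model.__table__       # the underlying Core Table

with engine.connect() as conn:
    for row in conn.execute(select(table)):
        print(row.columnA)    # direct column access, as with MetaData.reflect()

This works for any mapped class, so a wrapper only needs the class object, never the table name.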
I like how the Django ORM lazy-loads related objects in the queryset, but I guess it's quite unpredictable as it is.
The queryset API doesn't keep the related objects that were used to build a queryset, so it fetches them again when they are accessed later.
Suppose I have a ModelA instance (say instance_a) that is the foreign key (say for_a) of some N instances of ModelB. Now I want to query ModelB for the rows whose foreign key is the given ModelA instance.
Django ORM provides two ways:
Using .filter() on ModelB:
b_qs = ModelB.objects.filter(for_a=instance_a)
for instance_b in b_qs:
    instance_b.for_a  # <-- fetches the same row for ModelA again
Results in 1 + N queries here.
Using reverse relations on ModelA instance:
b_qs = instance_a.for_a_set.all()
for instance_b in b_qs:
    instance_b.for_a  # <-- this uses the instance_a from memory
Results in 1 query only here.
While the second way can be used to achieve the result, it's not part of the standard API and isn't usable in every scenario. For example, suppose I have instances of two foreign keys of ModelB (say, ModelA and ModelC) and I want to get the objects related to both of them.
Something like the following works:
ModelB.objects.filter(for_a=instance_a, for_c=instance_c)
I guess it's possible to use .intersection() for this scenario, but I would like a way to achieve it via the standard API. After all, covering such cases would otherwise require more code with non-standard queryset functions, which may not make sense to the next developer.
So, the first question: is it possible to optimize such scenarios with the standard API itself?
The second question: if it's not possible right now, could it be added with some tweaks to the QuerySet?
PS: It's my first time asking a question here, so forgive me if I made any mistake.
You could improve the query by using select_related():
b_qs = ModelB.objects.select_related('for_a').filter(for_a=instance_a)
or
b_qs = instance_a.for_a_set.select_related('for_a')
Does that help?
You use .select_related(..) [Django-doc] for ForeignKeys, or .prefetch_related(..) [Django-doc] for something-to-many relations.
With .select_related(..) you make a LEFT OUTER JOIN at the database side, fetching the records for both objects in a single query, and then deserialize them to the proper objects.
ModelB.objects.select_related('for_a').filter(for_a=instance_a)
For relations that are one-to-many (a reverse ForeignKey) or for ManyToManyFields, this is not a good idea, since it could result in a large number of duplicate objects being retrieved. That would mean a large response from the database and a lot of work at the Python end deserializing these objects. .prefetch_related(..) instead makes individual queries and then does the linking itself.
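For the two-foreign-key scenario from your question specifically, a sketch using only the standard API (for_a and for_c as in the question):

b_qs = ModelB.objects.select_related('for_a', 'for_c').filter(
    for_a=instance_a,
    for_c=instance_c,
)
for instance_b in b_qs:
    instance_b.for_a  # loaded by the JOIN, no extra query
    instance_b.for_c  # likewise

This is a single query with two JOINs, so the loop never goes back to the database for either parent.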
I'm creating a game mod for Counter-Strike in Python, and it's basically all done. The only thing left is to code a REAL database, and I don't have any experience with SQLite, so I need quite a lot of help.
I have a Player class with an attribute self.steamid, which is unique for every Counter-Strike player (received from the game engine), and self.entity, which holds an Entity for the player. The Entity class has lots and lots more attributes, such as level and name, and loads of methods. (Entity is a self-made Python class.)
What would be the best way to implement a database? First of all, how can I save instances of Player, with another instance of Entity as an attribute, into a database properly?
Also, I will need to get that user's data every time he connects to the game server (I have a player_connect event), so how would I retrieve the data?
All the tutorials I found only taught saving strings or integers, but nothing about whole instances. Will I have to save every attribute of every instance (the Entity instance has a few more instances as its attributes, and all of them have huge numbers of attributes...), or is there a faster, easier way?
Also, it's going to be a locally saved database, so I can't really use any language other than SQL.
You need an ORM. Either you roll your own (which I never suggest), or you use one that exists already. Probably the two most popular in Python are sqlalchemy, and the ORM bundled with Django.
SQL databases typically can hold only fundamental datatypes. You can use SQLAlchemy if you want to map your models so that their attributes are automatically mapped to SQL types, but it would require a lot of study and trial and error with SQLite on your part.
I think you are not entirely correct when you say "it has to be SQL": if you are running Python code, you can save whatever format you like.
However, Python also allows you to serialize your instance data to a string, which is persistable in a database.
So, you can create a varchar(65535) field in the SQL schema, along with an ID field (which could be the player ID number you mentioned, for example), and persist to it the value returned by:
import pickle
value = pickle.dumps(my_instance)  # serializes the whole object graph (note: returns bytes)
When retrieving the value you do the reverse:
my_instance = pickle.loads(value)
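A minimal end-to-end sketch with the standard sqlite3 module (the players table and the helper names are made up for illustration; a BLOB column is used here because pickle.dumps() returns bytes, which don't fit a varchar column cleanly):

import pickle
import sqlite3

conn = sqlite3.connect("players.db")  # local file database
conn.execute(
    "CREATE TABLE IF NOT EXISTS players (steamid TEXT PRIMARY KEY, entity BLOB)"
)

def save_player(player):
    # serialize the whole Entity graph, nested attribute objects included
    blob = pickle.dumps(player.entity)
    conn.execute(
        "INSERT OR REPLACE INTO players (steamid, entity) VALUES (?, ?)",
        (player.steamid, blob),
    )
    conn.commit()

def load_entity(steamid):
    # call this from your player_connect event to restore the saved Entity
    row = conn.execute(
        "SELECT entity FROM players WHERE steamid = ?", (steamid,)
    ).fetchone()
    return pickle.loads(row[0]) if row else None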
I'm working with SQLAlchemy for the first time and was wondering... generally speaking, is it enough to rely on Python's default equality semantics when working with SQLAlchemy, versus id (primary key) equality?
In other projects I've worked on in the past using ORM technologies like Java's Hibernate, we'd always override .equals() to check for equality of an object's primary key/id, but when I look back I'm not sure this was always necessary.
In most if not all cases I can think of, you only ever had one reference to a given object with a given id. And that object was always the attached object so technically you'd be able to get away with reference equality.
Short question: Should I be overriding __eq__() and __hash__() for my business entities when using SQLAlchemy?
Short answer: No, unless you're working with multiple Session objects.
Longer answer, quoting the awesome documentation:
The ORM concept at work here is known as an identity map and ensures that all operations upon a particular row within a Session operate upon the same set of data. Once an object with a particular primary key is present in the Session, all SQL queries on that Session will always return the same Python object for that particular primary key; it also will raise an error if an attempt is made to place a second, already-persisted object with the same primary key within the session.
I had a few situations where my SQLAlchemy application would load multiple instances of the same object (multithreading / different SQLAlchemy sessions...). It was absolutely necessary to override __eq__() for those objects, or I would get various problems. This could be a problem in my application design, but it probably doesn't hurt to override __eq__() just to be sure.
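If you do end up overriding them, a common pattern (a sketch only, assuming a single-column surrogate primary key; the Entity mapping here is a placeholder) is to compare on class plus primary key:

from sqlalchemy import Column, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Entity(Base):
    __tablename__ = "entity"  # placeholder mapping

    id = Column(Integer, primary_key=True)

    def __eq__(self, other):
        if self is other:
            return True
        # otherwise equal only if both are persisted rows of the same
        # class with the same primary key
        return (
            type(other) is type(self)
            and self.id is not None
            and other.id == self.id
        )

    def __hash__(self):
        # hashes on the primary key, so avoid putting the object into a
        # set or dict before it has been flushed and assigned an id
        return hash((type(self), self.id))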
I have a lot of model classes with relations between them, and a CRUD interface to edit them. The problem is that some objects can't be deleted because other objects refer to them. Sometimes I can set up an ON DELETE rule to handle this case, but in most cases I don't want automatic deletion of related objects until they are unbound manually. Either way, I'd like to present the editor with a list of objects referring to the one currently viewed, and highlight those that prevent its deletion due to a FOREIGN KEY constraint. Is there a ready solution to automatically discover referrers?
Update
The task seems to be quite common (e.g. the Django ORM shows all dependencies), so I'm surprised that there is no solution to it yet.
There are two directions suggested:
Enumerate all relations of the current object and go through their backrefs. But there is no guarantee that all relations have a backref defined. Moreover, there are some cases where a backref is meaningless. Although I could define one everywhere, I don't like doing it this way, and it's not reliable.
(Suggested by van and stephan) Check all tables of the MetaData object and collect dependencies from their foreign_keys property (the code of sqlalchemy_schemadisplay can be used as an example, thanks to stephan's comments). This catches all dependencies between tables, but what I need is dependencies between model classes. Some foreign keys are defined in intermediate tables that have no models corresponding to them (they are used as secondary in relations). Sure, I could go further and find the related model (I have yet to find a way to do it), but it looks too complicated.
Solution
Below is a method of my base model class (designed for the declarative extension) that I use as the solution. It is not perfect and doesn't meet all my requirements, but it works for the current state of my project. The result is collected as a dictionary of dictionaries, so I can show referrers grouped by objects and their properties. I haven't decided yet whether that's a good idea, since the list of referrers is sometimes huge and I'm forced to limit it to some reasonable number.
from sqlalchemy.orm import class_mapper, object_session
from sqlalchemy.orm.properties import PropertyLoader  # RelationshipProperty in newer SQLAlchemy
from sqlalchemy.orm.util import identity_key

def _get_referers(self):
    db = object_session(self)
    cls, ident = identity_key(instance=self)
    metadata = cls.__table__.metadata
    result = {}
    # _mapped_models is my extension. It is collected by a metaclass, so I
    # didn't look for other ways to find all model classes.
    for other_class in metadata._mapped_models:
        queries = {}
        for prop in class_mapper(other_class).iterate_properties:
            if not (isinstance(prop, PropertyLoader)
                    and issubclass(cls, prop.mapper.class_)):
                continue
            query = db.query(prop.parent)
            comp = prop.comparator
            if prop.uselist:
                query = query.filter(comp.contains(self))
            else:
                query = query.filter(comp == self)
            count = query.count()
            if count:
                queries[prop] = (count, query)
        if queries:
            result[other_class] = queries
    return result
Thanks to all who helped me, especially stephan and van.
SQL: I have to absolutely disagree with S.Lott's answer.
I am not aware of an out-of-the-box solution, but it is definitely possible to discover all the tables that have FOREIGN KEY constraints referencing a given table. One needs to use the INFORMATION_SCHEMA views properly, such as REFERENTIAL_CONSTRAINTS, KEY_COLUMN_USAGE, TABLE_CONSTRAINTS, etc. (see the SQL Server example). With some limitations and extensions, most modern relational databases support the INFORMATION_SCHEMA standard. Once you have all the FK information and the object (row) in the table, it is a matter of running a few SELECT statements to get all the rows in other tables that refer to the given row and prevent it from being deleted.
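For illustration, a sketch of such an INFORMATION_SCHEMA query (the exact view layouts vary slightly between databases, schema qualification is omitted, and the connection URL and table name are placeholders): it lists the referencing table and column for every FK pointing at a given table.

from sqlalchemy import create_engine, text

engine = create_engine("postgresql:///mydb")  # placeholder URL

# referencing table/column for every FK that points at :target
FK_REFERRERS = text("""
    SELECT kcu.table_name, kcu.column_name
    FROM information_schema.referential_constraints rc
    JOIN information_schema.key_column_usage kcu
      ON kcu.constraint_name = rc.constraint_name
    JOIN information_schema.table_constraints tc
      ON tc.constraint_name = rc.unique_constraint_name
    WHERE tc.table_name = :target
""")

with engine.connect() as conn:
    for table, column in conn.execute(FK_REFERRERS, {"target": "mytable"}):
        print(table, column)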
SQLAlchemy: As noted by stephan in his comment, if you use the ORM with backref for relations, then it should be quite easy to get the list of parent objects that keep a reference to the object you are trying to delete, because those objects are basically mapped properties of your object (child1.Parent).
If you work with SQLAlchemy's Table objects (or do not always use backref for relations), then you would have to get the foreign_keys of all the tables, and then for each of those ForeignKeys call the references(...) method, providing your table as a parameter. This way you will find all the FKs (and tables) that reference the table your object maps to. You can then query all the objects that keep a reference to your object by constructing a query for each of those FKs.
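A short sketch of that Table-level scan (metadata and target_table stand for whatever MetaData and Table you already have):

def referring_foreign_keys(metadata, target_table):
    # collect (table, fk) pairs for every FK that references target_table
    referrers = []
    for table in metadata.tables.values():
        for fk in table.foreign_keys:
            if fk.references(target_table):
                referrers.append((table, fk))
    return referrers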
In general, there's no way to "discover" all of the references in a relational database.
In some databases, they may use declarative referential integrity in the form of explicit Foreign Key or Check constraints.
But there's no requirement to do this. It can be incomplete or inconsistent.
Any query can include a FK relationship that is not declared. Without the universe of all queries, you can't know the relationships which are used but not declared.
To find "referers" in general, you must actually know the database design and have all queries.
For each model class, you can easily see whether all of its one-to-many relations are empty, simply by asking for the list in each case and seeing how many entries it contains. (There is probably a more efficient way, implemented in terms of COUNT, too.) If there are any foreign keys relating to the object, and you have your object relations set up correctly, then at least one of these lists will be non-empty.
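As a sketch of that check (the helper name is made up; it loads each collection rather than issuing COUNT queries, as noted above):

from sqlalchemy.orm import RelationshipProperty, class_mapper

def has_dependents(obj):
    # True if any one-to-many collection on obj is non-empty
    for prop in class_mapper(type(obj)).iterate_properties:
        if isinstance(prop, RelationshipProperty) and prop.uselist:
            if len(getattr(obj, prop.key)) > 0:
                return True
    return False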