I have a SQLAlchemy class R which implements an m:n relation between two other classes A and B. So R has two integer columns, source_id and target_id, which hold the ids of the referenced instances. And R has two properties, source_obj and target_obj, which are defined via relationship(). It's more or less the same as described here in the documentation.
What I want to do is retrieve the referenced classes from R. I'm using SQLAlchemy 0.8 and tried to use the inspect() method on R.source_obj, but I only get back an InstrumentedAttribute, which does not seem to be of much help. At least I was not able to extract any useful information from it or to find any documentation about it.
Any help would be much appreciated! How do I get A and B from R?
Try something like this. I'm also dealing with this and found no documentation; I think this can help you get started.
from sqlalchemy import inspect

i = inspect(model)
for relation in i.relationships:
    print(relation.direction.name)      # e.g. MANYTOONE, ONETOMANY
    print(relation.remote_side)         # the columns on the remote side of the join
    print(relation._reverse_property)   # private API: the reverse relationship, if any
    dir(relation)                       # lists everything else the relationship exposes
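Building on that, if what you ultimately want is the mapped classes on the other side of each relationship (A and B in this case), the relationship's mapper should get you there. A minimal sketch, assuming SQLAlchemy 0.8+:
from sqlalchemy import inspect

for relation in inspect(R).relationships:
    target_class = relation.mapper.class_   # the mapped class on the far side, e.g. A or B
    print(relation.key, '->', target_class.__name__)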
I spent the majority of the day working on this same problem, and I was able to write a list comprehension that takes in a table and then spits out a list of the table names which are connected via a relationship or a foreign key. You need to convert that string into a reference to the actual class, but otherwise it works just fine.
relationship_list = [str(list(column.remote_side)[0]).split('.')[0]
                     for column in inspect(table).relationships]
By removing the .split('.')[0], you can get a list of the actual columns which are referred to by the connections. The comprehension is pretty ugly, but it works. Hope this helps anyone else who is looking for the same thing I was!
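If you'd rather skip the string surgery, the relationship itself can tell you its target directly; a hedged alternative sketch:
# each relationship knows its target table and its mapped class
related_tables = [rel.target.name for rel in inspect(table).relationships]
related_classes = [rel.mapper.class_ for rel in inspect(table).relationships]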
I have a model, Reading, which has a foreign key to Type. I'm trying to get a reading for each type that I have, using the following code:
reading_list = []
for type in Type.objects.all():
    readings = Reading.objects.filter(type=type.pk)
    if readings.exists():
        reading_list.append(readings[0])
The problem with this, of course, is that it hits the database for each type. I've played around with some queries to try to optimize this to a single database call, but none of them seem efficient. .values(), for instance, will provide me a list of readings grouped by type, but it will give me EVERY reading for each type, and I have to filter them with Python in memory. This is out of the question, as we're dealing with potentially millions of readings.
If you use PostgreSQL as your DB backend, you can do this in one line with something like:
Reading.objects.order_by('type__pk', 'any_other_order_field').distinct('type__pk')
Note that the field on which distinct happens must always be the first argument in the order_by method. Feel free to change type__pk to the actual field you want to order types on (e.g. type__name if the Type model has a name property). You can read more about distinct here: https://docs.djangoproject.com/en/dev/ref/models/querysets/#distinct.
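For instance, to pick the newest reading per type in one query (a sketch; created_at is a hypothetical timestamp field on Reading):
# PostgreSQL only: DISTINCT ON type__pk, newest reading first within each type
reading_list = list(
    Reading.objects.order_by('type__pk', '-created_at').distinct('type__pk')
)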
If you do not use PostgreSQL, you could use the prefetch_related method for this purpose:
# reading_set could be replaced with whatever your reverse relation name actually is
for type in Type.objects.prefetch_related('reading_set').all():
    readings = type.reading_set.all()
    if len(readings):
        reading_list.append(readings[0])
The above will perform only 2 queries in total. Note I use len() so that no extra query is performed when counting the objects. You can read more about prefetch_related here https://docs.djangoproject.com/en/dev/ref/models/querysets/#prefetch-related.
The downside of this approach is that you first retrieve all related objects from the DB and then use only the first one.
The above code is not tested, but I hope it will at least point you towards the right direction.
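If you're on Django 1.7 or newer, a Prefetch object lets you at least control the ordering (or filtering) of the prefetched queryset; it still fetches every reading per type, but it guarantees which one comes first. A hedged sketch, where created_at is a hypothetical timestamp field on Reading:
from django.db.models import Prefetch

reading_list = []
for type in Type.objects.prefetch_related(
        Prefetch('reading_set',
                 queryset=Reading.objects.order_by('-created_at'))):
    readings = type.reading_set.all()
    if len(readings):
        reading_list.append(readings[0])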
I'm trying to keep to SOLID object-oriented programming principles, stay DRY, etc., but my newness to Python/SQLAlchemy/Pyramid is making it very hard.
I'm trying to take what I now know to be a SQLAlchemy model used to create a simple Pyramid Framework object, and use what I know as "reflection" in C# (it may be called something different in Python - introspection? I'm not sure, as this is only my second week with Python, though I have lots of experience in other languages (C/C++/C#, Java, etc.), so the trouble seems to be mapping my knowledge to the vocabulary of Python, sorry) to find out the field names of the database table and, most importantly, the current field values, when I do not know the column names or ANY of the shape of the object in advance.
That's right; I don't know that the 'derp' instance has a field named id or a field named name, just that it has columns and a value in each of them. And that's all I care about.
The goal is to be able to take any SQLAlchemy-defined data model and convert it to a dictionary of column_name -> column_value fields holding simple data types of the kind found in JSON, since I want to ultimately serialize any object I create in SQLAlchemy to a JSON object. But I will settle for a dictionary, as from there it's trivial as long as the dictionary holds the correct types of data. Doing this for every object by hand violates too many good clean-code rules and would create too much work over time; I could spend another week on this and still save time and effort by doing it the right way.
So if I have a class defined in SQLAlchemy as:
class SimpleFooModel(Base):
    __tablename__ = 'simple_foo'  # assumed here; the original snippet omits it
    id = Column(Integer, primary_key=True, autoincrement=True, nullable=False)
    name = Column(VARCHAR(length=12), nullable=False, index=True)
... and I have an instance of this (in Python):
derp = SimpleFooModel(id=7, name="Foobar")
I want to be able, having ONLY the 'derp' instance variable described above and NO OTHER KNOWLEDGE of how the model is shaped, to flatten it out to a Python key -> value dictionary for that simple object, where every value in that dictionary can be serialized to JSON using the json module from the Python standard library.
The problem is, I have been up for 2 days looking at this and I can't find an answer that gives me the results I want in my unit tests ANYWHERE. Google keeps taking me to really old posts here on SO about really old versions of the library that either use interfaces that no longer apply, or have accepted answers that do not actually work at all; since none of them are recent, that doesn't surprise me (but why Stack Overflow keeps them when they are wrong, and allows Google to mislead people, does surprise me).
I know I could wire every object manually to JSON, etc., but that's not only NOT ELEGANT, it's inefficient, because it just creates more work for me as I create more objects and could lead to big bugs down the road. I want to know how to do this the correct way, with introspection/reflection, but nobody seems to know, and the people who claim to know have all given examples here on Stack Overflow that do not actually work (at least with the current versions of things).
This seems like a really common use case to me, but getting the column field list and then iterating through it with getattr - like many of the answers say to do - doesn't work as expected either; it just gives me what look like namespaces that never return the actual value of the column, and that don't actually exist in any code, since none of the fields created by SQLAlchemy are singleton/static.
So:
from sqlalchemy.inspection import inspect

fields = {}
obj = inspect(derp, raiseerr=True)
for key in obj.attrs.keys():
    fields[key] = getattr(derp, key)
    print(fields[key])
This just gives me:
[Class Name].[Column Name]
... or in this case:
SimpleFooModel.id
SimpleFooModel.name
NOT the values of 7 and "Foobar" for id and name respectively, that I actually expected in my tests.
In fact, it seems like I can't even find WHERE the values are being stored in the object model; otherwise I could brute-force the issue and get them from there, as an ugly, evil hack I would be ashamed to look at. All I get through the "official public api" is a lot of objects that seem to have no clue where the real data is being stored, but will happily tell me the name of the path used by the column, its type, restrictions, etc... just not the actual data that I want.
Yet since my requirement is that I do not know the field names in advance, using a call to derp.id or derp.name to collect the value is not an option, since that would violate SOLID and force me to duplicate work for every single class.
Maybe it's the fact I have not slept in 2 days, but it's really hard for me not to see this as a serious design flaw in these libs; I just want to serialize a SQLAlchemy-defined model object representing a single row in a table into a Python dictionary without having to know the names of the fields in advance, and while many other languages make this easy or even trivial, this seems to be far harder than it should be.
Can somebody please explain either a working solution or why I am wrong to want to apply SOLID to my code?
EDIT: Updated spelling.
Extend your model with the following class:
from sqlalchemy.orm import class_mapper

class BaseModel(object):
    @classmethod
    def _get_keys(cls):
        # column names of the mapped table
        return class_mapper(cls).columns.keys()

    def get_dict(self):
        d = {}
        for k in self._get_keys():
            d[k] = getattr(self, k)
        return d
This will do exactly what you want, return a dict in form of {'column_name':'value'} pairs.
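A quick usage sketch against the SimpleFooModel from the question (the mixin just needs to be added to the bases):
import json

# assuming the model from the question now inherits the mixin:
#   class SimpleFooModel(BaseModel, Base): ...same columns as above...

derp = SimpleFooModel(id=7, name="Foobar")
print(derp.get_dict())               # {'id': 7, 'name': 'Foobar'}
print(json.dumps(derp.get_dict()))   # '{"id": 7, "name": "Foobar"}'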
I'm using subqueryload/subqueryload_all pretty heavily, and I've run into an edge case where I need to define very explicitly the query that is used during the subqueryload. For example, I have a situation with posts and comments. My query looks something like this:
posts_q = db.query(Post).options(subqueryload(Post.comments))
As you can see, I'm loading each Post's comments. The problem is that I don't want all of the posts' comments; I need to also take into account a deleted field, and they need to be ordered by create time descending. The only way I have observed this being done is by adding options to the relationship() declaration between posts and comments. I would prefer not to do this, because it means that the relationship cannot be reused everywhere after that, and I have other places in the app where those constraints may not apply.
What I would love to do is explicitly define the query that subqueryload/subqueryload_all uses to load the posts' comments. I read about DisjointedEagerLoading here, and it looks like I could simply define a special function that takes in the base query and a query to load the specified relationship. Is this a good route to take for this situation? Has anyone ever run into this edge case before?
The answer is that you can define multiple relationships between Posts and Comments:
class Post(...):
    # note the spelling (primaryjoin, not primary_join) and == rather than =;
    # the string form lets Post be referenced inside its own class body
    active_comments = relationship(
        'Comment',
        primaryjoin='and_(Comment.post_id == Post.post_id, '
                    'Comment.deleted == False)',
        order_by='Comment.created.desc()')
Then you should be able to subqueryload by that relationship:
posts_q = db.query(Post).options(subqueryload(Post.active_comments))
You can still use the existing .comments relationship elsewhere.
I also had this problem, and it took me some time to realize that this is an issue by design. When you say Post.comments, you refer to the relationship that says "these are all the comments of that post". However, now you want to filter them. If you were to specify that condition somewhere on subqueryload, then you would essentially be loading only a subset of values into Post.comments. Thus, there would be values missing; essentially you would have a faulty representation of your data in the model.
The question here is how to approach this, because you obviously need this value somewhere. The way I go is building the subquery myself and then specifying the special conditions there. That means you get two objects back: the list of posts and the list of comments. That is not a pretty solution, but at least it does not display the data in a wrong way. If you were to access Post.comments for some reason, you could safely assume it contains all comments.
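In code, that two-object approach might look like this (a sketch; the Comment.deleted and Comment.created columns are taken from the question):
posts = db.query(Post).all()
comments = (db.query(Comment)
              .filter(Comment.post_id.in_([p.post_id for p in posts]),
                      Comment.deleted == False)
              .order_by(Comment.created.desc())
              .all())
# posts and comments now travel together; Post.comments itself stays untouched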
But there is room for improvement: you might want to have this attached to your class so you don't carry around two variables. The easy way is to define a second relationship, e.g. published_comments, which specifies extra parameters. You could then also ensure that no one writes to it, e.g. with attribute events; in those events you could, instead of forbidding manipulation, control how manipulation is allowed. The only problem might be when updates happen: when you add a comment to Post.comments, published_comments won't be updated automatically, because the two are not aware of each other. Again, I'd use events for this if it is a required feature (though with the ugly solution above you would not have that either).
As a last, hybrid solution, you could take the first approach and then just assign those values to your object, e.g. Post.deleted_comments = deleted_comments.
The thing to keep in mind here is that it is generally not a clever idea to manipulate the query the ORM makes, as this could lead to problems later on. I have taken this approach and manipulated the queries (with contains_eager this is easily possible), but it created problems in some places (while generally being functional), so I dropped that approach.
I just started with Flask and SQLAlchemy in Flask.
So I have a many-to-many relationship using the example here: http://docs.sqlalchemy.org/en/latest/orm/tutorial.html
If you scroll down to the part about keywords and tags, that is what I am working on.
So far I am able to insert new Keywords related to my Post, and I am using append, which I know is wrong. What happens is that the next time a non-unique keyword occurs in a blog post, it throws an error about a conflict with the Keyword (since keywords are supposed to be unique).
I know the right way is something else; I just don't know what. I have seen an example of get_or_create(keyword), which basically filters by keyword and then adds it if not found. However, I believe as data size grows this will also be wrong (several queries on every save instead of a single insert). I love the way SQLAlchemy does multiple inserts automatically; I wish to keep that but avoid this duplicate key issue.
Edit: I found the solution. The SQLAlchemy docs guide you towards the error, but the explanation is in there. I have added the answer below.
OK, after hours of trial and error I found the solution, plus some things I was doing wrong.
This is how SQLAlchemy works: the answer is merge.
Make a LIST of tags as Tag models; it doesn't matter if they already exist, as long as your primary key is the name or something else unique.
tags = [Tag('a1'), Tag('a2')]
Say you already have Tag a1 in the DB, but we don't really care. All we want is to insert the related data if it does not exist, which is what SQLAlchemy's merge does.
Now you make a Post with the LIST of ALL the tags we made. Even if there is only one tag, it still goes in a list.
Therefore:
new_post = Post('a great new post',post_tags=tags)
db.session.merge(new_post)
db.session.commit()
I have used Flask syntax, but the idea is the same: just make sure you are not creating the model OUTSIDE the session. Most likely, you won't.
This was actually simple, but nowhere in the SQLAlchemy docs is this example mentioned. They use append(), which is only right for creating new Tags when you know you are not making duplicates.
Hope it helps.
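Putting it together, a consolidated sketch (hedged: it assumes Tag's primary key is its name and that Post accepts these constructor arguments, as in the question):
# tags may or may not already exist in the DB; merge reconciles them by primary key
tags = [Tag('a1'), Tag('a2')]
new_post = Post('a great new post', post_tags=tags)
new_post = db.session.merge(new_post)  # merge returns the session-attached copy
db.session.commit()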
I'm using Elixir with SQLite and I'd like to perform multiple inserts as per the docs:
http://www.sqlalchemy.org/docs/05/sqlexpression.html#executing-multiple-statements
However, my ManyToMany relationship is self-referential and I can't figure out where to get the insert() object from. Can anyone help?
It might be easy if you just stick with SQLAlchemy's built-in Declarative style instead of using Elixir, as much of what Elixir does is now doable there. Then you can follow the example here: Many to Many
Then look very closely at the code where a post is added and then keywords related to that post are added. You get multiple inserts done for you into the association table - the one that maintains the many-to-many relationship:
>>> post.keywords.append(Keyword('wendy'))
>>> post.keywords.append(Keyword('firstpost'))
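For the self-referential case the question asks about, the same pattern works; you just need explicit primaryjoin/secondaryjoin conditions, since both sides of the association point at the same table. A minimal sketch in Declarative style (Node and node_link are hypothetical names):
from sqlalchemy import Table, Column, Integer, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

# association table linking nodes to nodes
node_link = Table(
    'node_link', Base.metadata,
    Column('parent_id', Integer, ForeignKey('node.id'), primary_key=True),
    Column('child_id', Integer, ForeignKey('node.id'), primary_key=True),
)

class Node(Base):
    __tablename__ = 'node'
    id = Column(Integer, primary_key=True)
    children = relationship(
        'Node',
        secondary=node_link,
        primaryjoin=id == node_link.c.parent_id,
        secondaryjoin=id == node_link.c.child_id,
        backref='parents',
    )
Appending to Node.children then emits the inserts into node_link for you at flush time, just like the keywords example above.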