Flask relationship model multiple get or create if not exist - python

I just started with Flask and SQLAlchemy in flask.
So I have a many-to-many relationship using the example here http://docs.sqlalchemy.org/en/latest/orm/tutorial.html
If you scrolldown to the part about Keywords and tags this is what I am working on.
So far I am able to insert new Keywords related to my Post and I am using append. Which is wrong I know. So what happens is that the next time a non unique keyword occurs in a blog post it will throw an error about Conflict with Keyword (since keywords are supposed to be unique)
I know the right way is something else, I just don't know what. I have seen an example of
get_or_create(keyword) which basically filters by keyword and then adds it if not found. However I believe as data size grows this will also be wrong. (Several calls on every save with single insert). I love the way SQLAlchemy is doing multiple insert automatically. I wish to keep that but avoid this duplicate key issue.
Edit: found the solution, SQLAlchemy docs guide you towards error but the explanation is in there. I have added the answer.

Ok after hours of trial and error I found the solution, plus somethings I was doing wrong.
This is how SQL alchemy works. the answer is merge.
make a LIST of tags as Tag models, don;t matter if they exist as long as your primary key is name or something unique.
tags = [Tag('a1'),Tag('a2')]
Say you have Tag a1 already in DB but we don't really care. All we want is to insert if related data does not exist. WHich is what SQLalchemy does.
Now you make a Post with the LIST of ALL the tags we made. If its one only , it also is a list.
therefore
new_post = Post('a great new post',post_tags=tags)
db.session.merge(new_post)
db.session.commit()
I have used Flask syntax but the idea is same. Just make sure you are not creating the Model OUTSIDE the session. More likely, you wont do it.
This was actually simple but nowhere in the SQLAlchemy docs this example is mentioned. They use append() which is wrong. It's only to create new Tags knowing you are not making duplicates.
Hope it helps.

Related

Properly refresh an SQLAlchemy session to view externally updated data

After trying everything suggested here, I still can't get SQLAlchemy to display the correct results!
I've used various combinations of Nick's answer, session.commit(), flush() and expire_all(), restarted MySQL, even restarted the entire freaking server, and I still get old results from SQLAlchemy...why????
The most infuriating thing about this whole issue is that I can see from any other application, or even from a direct connection.execute() call, that the updated data is there. I just can't get it to display on the webpage!
BTW this is in a Pyramid app, not Flask, but since Pyramid is 99% Flask it shouldn't make a difference, right?
MTIA for any help on this, it's driving me nuts!!
PS: I tried to add this as an answer to the linked question, but it was deleted for not being a valid answer. So for future reference, if I just want to add something to an existing question without having to post an entirely new one, how would I go about that?
EDIT: My apologies zvone, here is my code:
DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
session = DBSession()
query = session.query(Item).join(Item.tagged)
filters = []
for term in searchTerms:
subterms = term.split(' ')
for subterm in subterms:
filters.append(Item.itemTitle.like('%' + subterm + '%'))
filters.append(Tag.tagName.like('%' + subterm + '%'))
query = query.filter(or_(*filters))
matchedItems = query.all()
And to make some more sense out of it, here's the context:
I'm building a basic CMS where users can upload and download items of any type (text files, images, etc.).
The whole idea of this page is to allow the user to search for items that have been tagged with certain expressions. Tags are entered in the search field as a comma-delimited string of search phrases, e.g. "movies, books, photos, search term with spaces". This string is split up into its counterparts to create searchTerms, a Python list of all the terms entered into the field.
You can see in the code where I'm iterating through searchTerms, splitting phrases into separate words and adding query filters for each word.
The problem arises when searching for "big, theory". I know for certain that 3 users on the production site have posted Big Bang Theory episodes, but after migrating these DB records to my dev server, I only get one search result (the old amount).
Many thanks again for the help! :D

Django+autocomplete_light dynamic choiceField in form

I have a strange problem.
I don't know how I should even start doing it, my written english is really bad, so I can't really google it, because it seems complicated.
I'm doing a simple database web application using Django 1.7.1, I want to use autocomplete_light for autocompletion of some fields.
I'm using SQLite Database, in DB I have some "dictionary" tables, it means that user is likely to use some names multiple times in other records, so in "master" table, I store just id of that name. Is there any way of making such ChoiceFields and MultipleChoiceFields (for "reversed" situation), that if user will write new value (not stored in "dictionary" yet) in it, it will be automatically added to "dictionary" table?
I would be really thankful for any advices, or even suggestions where should I search such thing.
I did it as simple add another like in administration panel:
http://django-autocomplete-light.readthedocs.org/en/stable-2.x.x/addanother.html

Dynamic Scalable Mysql Table

Here is my situation. I used Python, Django and MySQL for a web development.
I have several tables for form posting, whose fields may change dynamically. Here is an example.
Like a table called Article, it has three fields now, called id INT, title VARCHAR(50), author VARCHAR(20), and it should be able store some other values dynamically in the future, like source VARCHAR(100) or something else.
How can I implement this gracefully? Is MySQL be able to handle it? Anyway, I don't want to give up MySQL totally, for that I'm not really familiar with NoSQL databases, and it may be risky to change technique plan in the process of development.
Any ideas welcome. Thanks in advance!
You might be interested in this post about FriendFeed's schemaless SQL approach.
Loosely:
Store documents in JSON, extracting the ID as a column but no other columns
Create new tables for any indexes you require
Populate the indexes via code
There are several drawbacks to this approach, such as indexes not necessarily reflecting the actual data. You'll also need to hack up django's ORM pretty heavily. Depending on your requirements you might be able to keep some of your fields as pure DB columns and store others as JSON?
I've never actually used it, but django-not-eav looks like the tool for the job.
"This app attempts the impossible: implement a bad idea the right way." I already love it :)
That said, this question sounds like a "rethink your approach" situation, for sure. But yes, sometimes that is simply not an option...

How do to explicitly define the query used in subqueryload_all?

I'm using subqueryload/subqueryload_all pretty heavily, and I've run into the edge case where I tend to need to very explicitly define the query that is used during the subqueryload. For example I have a situation where I have posts and comments. My query looks something like this:
posts_q = db.query(Post).options(subqueryload(Post.comments))
As you can see, I'm loading each Post's comments. The problem is that I don't want all of the posts' comments, I need to also take into account a deleted field, and they need to be ordered by create time descending. The only way I have observed this being done, is by adding options to the relationship() declaration between posts and comments. I would prefer not to do this, b/c it means that that relationship cannot be reused everywhere after that, as I have other places in the app where those constraints may not apply.
What I would love to do, is explicitly define the query that subqueryload/subqueryload_all uses to load the posts' comments. I read about DisjointedEagerLoading here, and it looks like I could simply define a special function that takes in the base query, and a query to load the specified relationship. Is this a good route to take for this situation? Anyone ever run into this edge case before?
The answer is that you can define multiple relationships between Posts and Comments:
class Post(...):
active_comments = relationship(Comment,
primary_join=and_(Comment.post_id==Post.post_id, Comment.deleted=False),
order_by=Comment.created.desc())
Then you should be able to subqueryload by that relationship:
posts_q = db.query(Post).options(subqueryload(Post.active_comments))
You can still use the existing .comments relationship elsewhere.
I also had this problem and it took my some time to realize that this is an issue by design. When you say Post.comments then you refer to the relationship that says "these are all the comments of that post". However, now you want to filter them. If you'd now specify that condition somewhere on subqueryload then you are essentially loading only a subset of values into Post.comments. Thus, there will be values missing. Essentially you have a faulty representation of your data in the model.
The question here is how to approach this then, because you obviously need this value somewhere. The way I go is building the subquery myself and then specify special conditions there. That means you get two objects back: The list of posts and the list of comments. That is not a pretty solution, but at least it is not displaying data in a wrong way. If you were to access Post.comments for some reason, you can safely assume it contains all posts.
But there is room for improvement: You might want to have this attached to your class so you don't carry around two variables. The easy way might be to define a second relationship, e.g. published_comments which specifies extra parameters. You could then also control that no-one writes to it, e.g. with attribute events. In these events you could, instead of forbidding manipulation, handle how manipulation is allowed. The only problem might be when updates happen, e.g. when you add a comment to Post.comments then published_comments won't be updated automatically because they are not aware of each other. Again, I'd take events for this if this is a required feature (but with the above ugly solution you would not have that either).
As a last, hybrid, solution you could take the first approach and then just assign those values to your object, e.g. Post.deleted_comments = deleted_comments.
The thing to keep in mind here is that it is generally not a clever idea to manipulate the query the ORM makes as this could lead to problems later on. I have taken this approach and manipulated the queries (with contains_eager this is easily possible) but it has created problems on some points (while generally being functional) so I dropped that approach.

Select list of Primary Key objects in SQLAlchemy

First off, this is my first project using SQLAlchemy, so I'm still fairly new.
I am making a system to work with GTFS data. I have a back end that seems to be able to query the data quite efficiently.
What I am trying to do though is allow for the GTFS files to update the database with new data. The problem that I am hitting is pretty obvious, if the data I'm trying to insert is already in the database, we have a conflict on the uniqueness of the primary keys.
For Efficiency reasons, I decided to use the following code for insertions, where model is the model object I would like to insert the data into, and data is a precomputed, cleaned list of dictionaries to insert.
for chunk in [data[i:i+chunk_size] for i in xrange(0, len(data), chunk_size)]:
engine.execute(model.__table__.insert(),chunk)
There are two solutions that come to mind.
I find a way to do the insert, such that if there is a collision, we don't care, and don't fail. I believe that the code above is using the TableClause, so I checked there first, hoping to find a suitable replacement, or flag, with no luck.
Before we perform the cleaning of the data, we get the list of primary key values, and if a given element matches on the primary keys, we skip cleaning and inserting the value. I found that I was able to get the PrimaryKeyConstraint from Table.primary_key, but I can't seem to get the Columns out, or find a way to query for only specific columns (in my case, the Primary Keys).
Either should be sufficient, if I can find a way to do it.
After looking into both of these for the last few hours, I can't seem to find either. I was hoping that someone might have done this previously, and point me in the right direction.
Thanks in advance for your help!
Update 1: There is a 3rd option I failed to mention above. That is to purge all the data from the database, and reinsert it. I would prefer not to do this, as even with small GTFS files, there are easily hundreds of thousands of elements to insert, and this seems to take about half an hour to perform, which means if this makes it to production, lots of downtime for updates.
With SQLAlchemy, you simply create a new instance of the model class, and merge it into the current session. SQLAlchemy will detect if it already knows about this object (from cache or the database) and will add a new row to the database if needed.
newentry = model(chunk)
session.merge(newentry)
Also see this question for context: Fastest way to insert object if it doesn't exist with SQLAlchemy

Categories

Resources