Really long query - python

How do you write a long query? Is there a way to optimize it?
I would write a complicated and long query like this:
all_accepted_parts = acceptedFragment.objects.filter(fragmentID = fragment.objects.filter(categories = fragmentCategory.objects.filter(id=1)))
but it doesn't work; I get:
Error binding parameter 0 - probably unsupported type.
I would be thankful for any hint on how to optimize it, and of course even more thankful for one that solves it :)

If it's not working, you can't optimize it. First make it work.
At first glance, it seems that you have really mixed up the concepts of fields, relationships, and equality/membership. First go through the docs, then build your query piece by piece in the Python shell (likely from the inside out).
Just a shot in the dark:
all_accepted_parts = acceptedFragment.objects.filter(fragment__in=fragment.objects.filter(categories=fragmentCategory.objects.get(id=1)))
or maybe:
all_accepted_parts = acceptedFragment.objects.filter(fragment__in=fragment.objects.filter(categories=1))
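Building it piece by piece in the shell, under the assumption that the field names in the question are roughly right, might look like this:
category = fragmentCategory.objects.get(id=1)                 # a single category
frags = fragment.objects.filter(categories=category)          # fragments in that category
all_accepted_parts = acceptedFragment.objects.filter(fragment__in=frags)  # accepted ones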

As others have said, we really need the models, and some explanation of what you're actually trying to achieve.
But it looks like you want to do a related table lookup. Rather than getting all the related objects in a separate nested query, you should use Django's related model syntax to do the join within your query.
Something like:
acceptedFragment.objects.filter(fragment__categories__id=1)
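For example, with hypothetical models along these lines (field and relation names are guesses based on the question), that double-underscore lookup follows the ForeignKey and then the ManyToManyField in a single joined query:
from django.db import models

class fragmentCategory(models.Model):
    name = models.CharField(max_length=100)

class fragment(models.Model):
    categories = models.ManyToManyField(fragmentCategory)

class acceptedFragment(models.Model):
    fragment = models.ForeignKey(fragment, on_delete=models.CASCADE)

# One query, with the join done in the database:
all_accepted_parts = acceptedFragment.objects.filter(fragment__categories__id=1)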

Python Mongodb sorting too big, how to use index?

I'm trying to iterate in Python over all elements of a large MongoDB database.
Usually, I do:
from pymongo import MongoClient

mgclient = MongoClient('mongodb://user:pwd@0.0.0.0:27017')
mgdb = mgclient['mongo']
mgcol = mgdb['name']
for mg_ob in mgcol.find().sort('Date').sort('time'):
    # DO THINGS
But it says "Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit".
So I created an index named 'SortedTime', but I don't understand how I can use it now.
Basically, I'm trying to have something like:
mgclient = MongoClient('mongodb://user:pwd@0.0.0.0:27017')
mgdb = mgclient['mongo']
mgcol = mgdb['name']
for mg_ob in mgcol.find()['SortedTime']:
    # DO THINGS
Any ideas? A helping hand would be much appreciated.
I hope this post will help others. Thank you very much.
Update:
I managed to make it work thanks to Joe. After I created the index:
resp = mgcol.create_index(
    [
        ("date", 1),
        ("time", 1)
    ]
)
print("index response:", resp)
What I did was just:
mgclient = MongoClient('mongodb://user:pwd@0.0.0.0:27017')
mgdb = mgclient['mongo']
mgcol = mgdb['name']
for mg_ob in mgcol.find():
    # DO THINGS
No need to use the index name.
Your query sorts on two fields, Date and time, so you will need an index that includes these fields first in the key specification.
Working from the mongo shell, you might use the createIndex shell helper:
db.getSiblingDB("mongo").getCollection("name").createIndex({Date:1, time:1})
Working from the client side, you might use the createIndexes database command.
Once the index has been created, query just like you did before and the mongod's query executor should use the index.
You can use explain() to get detailed query execution stages to see which indexes were considered and the comparative performance of each.
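From PyMongo, a minimal sketch of that workflow (the connection string and field names follow the question; note that one sort() call with both keys replaces the two chained sorts, which lets the compound index be used):
from pymongo import MongoClient

mgclient = MongoClient('mongodb://user:pwd@0.0.0.0:27017')
mgcol = mgclient['mongo']['name']

# Create a compound index covering both sort fields (a no-op if it already exists).
mgcol.create_index([('Date', 1), ('time', 1)])

# Sort on both fields in one call so the planner can walk the index instead of sorting in RAM.
for mg_ob in mgcol.find().sort([('Date', 1), ('time', 1)]):
    pass  # DO THINGS

# Optional: confirm an index scan (IXSCAN) shows up in the winning plan.
print(mgcol.find().sort([('Date', 1), ('time', 1)]).explain()['queryPlanner']['winningPlan'])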

How can I query rows with unique values on a joined column?

I'm trying to have my popular_query subquery remove duplicate Place.id rows, but it doesn't. The code is below. I tried using distinct, but it does not respect the order_by rule.
SimilarPost = aliased(Post)
SimilarPostOption = aliased(PostOption)
popular_query = (db.session.query(Post, func.count(SimilarPost.id))
                 .join(Place, Place.id == Post.place_id)
                 .join(PostOption, PostOption.post_id == Post.id)
                 .outerjoin(SimilarPostOption, PostOption.val == SimilarPostOption.val)
                 .join(SimilarPost, SimilarPost.id == SimilarPostOption.post_id)
                 .filter(Place.id == Post.place_id)
                 .filter(self.radius_cond())
                 .group_by(Post.id)
                 .group_by(Place.id)
                 .order_by(desc(func.count(SimilarPost.id)))
                 .order_by(desc(Post.timestamp))
                 ).subquery().select()
all_posts = db.session.query(Post).select_from(filter.pick()).all()
I did a test printout with
print [x.place.name for x in all_posts]
[u'placeB', u'placeB', u'placeB', u'placeC', u'placeC', u'placeA']
How can I fix this?
Thanks!
This should get you what you want:
SimilarPost = aliased(Post)
SimilarPostOption = aliased(PostOption)

post_popularity = (db.session.query(func.count(SimilarPost.id))
                   .select_from(PostOption)
                   .filter(PostOption.post_id == Post.id)
                   .correlate(Post)
                   .outerjoin(SimilarPostOption, PostOption.val == SimilarPostOption.val)
                   .join(SimilarPost, sql.and_(
                       SimilarPost.id == SimilarPostOption.post_id,
                       SimilarPost.place_id == Post.place_id))
                   .as_scalar())

popular_post_id = (db.session.query(Post.id)
                   .filter(Post.place_id == Place.id)
                   .correlate(Place)
                   .order_by(post_popularity.desc())
                   .limit(1)
                   .as_scalar())

deduped_posts = (db.session.query(Post, post_popularity)
                 .join(Place)
                 .filter(Post.id == popular_post_id)
                 .order_by(post_popularity.desc(), Post.timestamp.desc())
                 .all())
I can't speak to the runtime performance with large data sets, and there may be a better solution, but that's what I managed to synthesize from quite a few sources (MySQL JOIN with LIMIT 1 on joined table, SQLAlchemy - subquery in a WHERE clause, SQLAlchemy Query documentation). The biggest complicating factor is that you apparently need to use as_scalar to nest the subqueries in the right places, and therefore can't return both the Post id and the count from the same subquery.
FWIW, this is kind of a behemoth and I concur with user1675804 that SQLAlchemy code this deep is hard to grok and not very maintainable. You should take a hard look at any lower-tech solutions available, like adding columns to the db or doing more of the work in Python code.
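For example, the "do more of the work in Python code" route can be a simple post-processing pass; a minimal sketch, assuming the posts have already been fetched ordered by popularity and timestamp (ordered_posts below is a hypothetical list of Post objects):
# Keep only the first (i.e. most popular) post seen per place_id.
seen_places = set()
unique_posts = []
for post in ordered_posts:
    if post.place_id not in seen_places:
        seen_places.add(post.place_id)
        unique_posts.append(post)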
I don't want to sound like the bad guy here, but in my opinion your approach to the issue is far from optimal. If you're using PostgreSQL you could simplify the whole thing using WITH ..., but a better approach, factoring in my assumption that these posts will be read much more often than they are updated, would be to add some columns to your tables that are kept up to date by triggers on inserts/updates to the other tables. If performance is ever likely to become an issue, that is the solution I'd go with.
I'm not very familiar with SQLAlchemy, so I can't write it in clear code for you, but the only other solution I can come up with uses at least one subquery to select the ORDER BY value for each of the columns in the GROUP BY, and that will add significantly to your already slow query.

python postgis ST_Contains failing query

I have been trying the following, but it always fails:
roomTypeSQL = "SELECT spftype FROM cameron_toll_spatialfeatures WHERE ST_Contains(ST_GeomFromText(%s), ST_geomFromWKB(geometry)) = 'True';"
roomTypeData = (pointTested) # "POINT(-3.164005 55.926378)"
.execute(roomTypeSQL, roomTypeData)
I want to get the polygon from my table that contains the specific point. I have also tried ST_Within, which also fails. I think my problem is related to the formatting of the point and polygon, but I have tried almost all combinations and nothing does the job. I tried defining my polygon inline and it worked, but I must do it with a polygon from the database. My PostgreSQL log file is not particularly helpful either.
Can anybody see anything going wrong?
Thanks in advance!
It might be a simple answer: ST_Contains returns 't', not 'true'. Postgres is case sensitive, so make sure it's 't', not 'T'.
This had to do with how the arguments are passed from Python. I passed all the SQL arguments through the %s placeholders properly and it worked, like this:
roomTypeSQL = "SELECT spftype FROM cameron_toll_spatialfeatures WHERE ST_Contains(ST_GeomFromText(%s),ST_geomFromWKB(geometry))=%s;"
roomTypeData = (pointTested,'t') # "POINT(-3.164005 55.926378)"
.execute(roomTypeSQL, roomTypeData)
Parameter passing from Python can be quite frustrating sometimes; it doesn't always work as expected. I have some examples of SQL commands in which I had to place the arguments directly inside the SQL string. That method worked, although it's not advisable according to the Python documentation.
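For reference, a minimal sketch of the working call with an explicit psycopg2 cursor (the connection details are assumptions; the SQL and parameters are the same as in the answer above):
import psycopg2

conn = psycopg2.connect("dbname=gis user=me")  # hypothetical connection settings
cur = conn.cursor()

roomTypeSQL = ("SELECT spftype FROM cameron_toll_spatialfeatures "
               "WHERE ST_Contains(ST_GeomFromText(%s), ST_GeomFromWKB(geometry)) = %s;")
roomTypeData = ("POINT(-3.164005 55.926378)", 't')

# psycopg2 substitutes both %s placeholders safely; no manual string formatting needed.
cur.execute(roomTypeSQL, roomTypeData)
rows = cur.fetchall()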

sort by count in json

I'm using tastypie to create JSON from my Django models; however, I'm running into a problem that I think should have a simple fix.
I have a Blog object which has Comment object children. I want to be able to do something like this with my JSON:
/api/v1/blogs/?order_by=comment_count
But I can't figure out how to sort on a field that's not part of the original comment/blog model. I create comment_count myself in a dehydrate method that just takes the array of comments and returns comments.count().
Any help would be much appreciated - I can't seem to find any explanation.
If I understood correctly, this should help:
Blog.objects.annotate(comment_count=Count('comments')).order_by('comment_count')
You might be able to do it with extra(), something like:
Blog.objects.extra(
    select={
        'entry_count': 'SELECT COUNT(*) FROM blog_entry WHERE blog_entry.blog_id = blog_blog.id'
    },
    order_by=['-entry_count'],
)
I haven't tested this, but it should work. The caveat is it will only work with a relational database.
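Whichever queryset you end up with, wiring it into tastypie so that ?order_by=comment_count works could look roughly like the sketch below (BlogResource, the 'comments' related name, and the import path are assumptions based on the question): annotate the resource's base queryset and declare the annotation as a sortable, read-only field.
from django.db.models import Count
from tastypie import fields
from tastypie.resources import ModelResource
from myapp.models import Blog  # hypothetical import path

class BlogResource(ModelResource):
    # Expose the annotation so tastypie accepts it in order_by.
    comment_count = fields.IntegerField(attribute='comment_count', readonly=True)

    class Meta:
        queryset = Blog.objects.annotate(comment_count=Count('comments'))
        resource_name = 'blogs'
        ordering = ['comment_count']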

How to query a set of objects and return a set of a specific object attribute in SQLAlchemy/Elixir?

Suppose that I have a table like:
class Ticker(Entity):
    ticker = Field(String(7))
    tsdata = OneToMany('TimeSeriesData')
    staticdata = OneToMany('StaticData')
How would I query it so that it returns a set of Ticker.ticker?
I dug into the docs and it seems like select() is the way to go. However, I am not too familiar with the SQLAlchemy syntax. Any help is appreciated.
ADDED: My ultimate goal is to have a set of current tickers such that, when a new ticker is not in the set, it will be inserted into the database. I am just learning how to create a database and SQL in general. Any thoughts are appreciated.
Thanks. :)
Not sure what you're after exactly, but to get a list with all Ticker.ticker values you would do this:
[instance.ticker for instance in Ticker.query.all()]
What you really want is probably the Elixir getting started tutorial - it's good so take a look!
UPDATE 1: Since you have a database, the best way to find out if a new potential ticker needs to be inserted or not is to query the database. This will be much faster than reading all tickers into memory and checking. To see if a value is there or not, try this:
Ticker.query.filter_by(ticker=new_ticker_value).first()
If the result is None, you don't have it yet. So, all together:
if Ticker.query.filter_by(ticker=new_ticker_value).first() is None:
    Ticker(ticker=new_ticker_value)
    session.commit()
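If you need to check many candidate tickers at once, a hedged variation on the same idea (Elixir queries delegate to SQLAlchemy, so in_() works on the mapped column; the candidate symbols below are made up):
# Hypothetical batch of new ticker symbols to reconcile against the table.
candidates = set(['ABCD', 'EFGH', 'IJKL'])

# One round trip: fetch only those candidates that already exist.
existing = set(t.ticker for t in Ticker.query.filter(Ticker.ticker.in_(candidates)).all())

for symbol in candidates - existing:
    Ticker(ticker=symbol)  # Elixir registers the new instance with the session
session.commit()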
