SQLAlchemy Order By before Group By - python

I've searched a lot for how to write a query that applies an ORDER BY before a GROUP BY, and most of the answers I found use raw SQL. I'd like to see a solution in SQLAlchemy code instead.
My original naive solution looked as follows:
session.query(MyTable).order_by(timestamp).group_by(begin_date)
Unfortunately, this causes the table to be grouped first and then ordered, which does not return what I am expecting.
Second, I tried:
stmt = session.query(MyTable).order_by(timestamp)
session.query(stmt).group_by(begin_date)
This returns the correct results; however, the results are KeyedTuples, whereas I want actual MyTable objects for backwards-compatibility reasons.
How can I achieve this?

The code in the latest comment on the original question still seems to have a problem. Here is a working version:
stmt = session.query(MyTable).order_by(timestamp).subquery()
session.query().add_entity(MyTable, alias=stmt).group_by(begin_date)
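As far as I know, ordering inside a subquery and then grouping depends on database-specific behaviour. If the underlying goal is to fetch, for each begin_date, the row with the largest timestamp (an assumption; the question does not state it), a join against a grouped subquery is a different technique that also returns MyTable objects. A minimal sketch, assuming SQLAlchemy 1.4 or later, an in-memory SQLite database, and a hypothetical schema in which only the column names come from the question:

```python
from sqlalchemy import Column, Integer, String, and_, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class MyTable(Base):
    # Hypothetical schema; only the column names come from the question.
    __tablename__ = "my_table"
    id = Column(Integer, primary_key=True)
    begin_date = Column(String)
    timestamp = Column(Integer)


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        MyTable(begin_date="2020-01-01", timestamp=1),
        MyTable(begin_date="2020-01-01", timestamp=3),
        MyTable(begin_date="2020-01-02", timestamp=2),
    ])
    session.commit()

    # Largest timestamp per begin_date.
    latest = (
        session.query(
            MyTable.begin_date.label("begin_date"),
            func.max(MyTable.timestamp).label("max_ts"),
        )
        .group_by(MyTable.begin_date)
        .subquery()
    )

    # Join back to the table so the result is a list of MyTable objects.
    rows = (
        session.query(MyTable)
        .join(
            latest,
            and_(
                MyTable.begin_date == latest.c.begin_date,
                MyTable.timestamp == latest.c.max_ts,
            ),
        )
        .order_by(MyTable.begin_date)
        .all()
    )
    result = [(row.begin_date, row.timestamp) for row in rows]
```

If two rows in the same group share the largest timestamp, this sketch returns both of them.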

Related

Correct way to combine query terms using boolean logic in Github repo search (Pygithub)

I'm using the search_repositories() function from the Pygithub package and want to build a query like this (in pseudocode):
('keyword1' OR 'keyword2') AND ('keyword3' OR 'keyword4') AND last_push_date > '2020-01-01'
So far I've not worked out the correct way to break up the clauses according to the logic of the parentheses and also have found that if I have any boolean logic, the 'qualifier' (i.e. the pushdate:>'2020-01-01') seems to break the query string and return no results.
I'm aware from the docs that you cannot have more than 5 AND/OR operators but in my testing that wasn't the case.
It seems that for instance I can do something like:
'"firstword secondword" OR "thirdword fourthword"' and it treats each as a phrase correctly, but adding a qualifier afterwards then returns no results - even if I put the pushed date to way back.
Also this seems to work:
'oneword anotherword+pushed:>2020-01-01'
But combining such logic just returns 0 results.
Any thoughts?

How to do WHERE IN with sqlalchemy?

Here is what I'm trying to do. The database is Postgres.
numbers is a Python set() or some other iterable.
session.execute(table.update().\
    where(func.substring(table.c.number, 1, 5) in numbers).\
    values(provider="testprovider"))
The problem is that the "in numbers" part doesn't seem to work. It does work for a single number if I write something like where(func.substring(table.c.number, 1, 5) == '12345').
I've googled a lot on how to do WHERE IN with sqlalchemy, but every example either uses a SELECT query or calls .in_() on the column itself. For example, this answer: link
The documentation also says that .in_() only works on columns.
So I don't really know how to proceed here. How can I write this expression?
Ah, never mind: although the documentation doesn't say so, I can call .in_() on the result of func.substring.
So
where(func.substring(table.c.number, 1, 5).in_(numbers))
worked.
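The reason the first attempt fails is that Python's in operator is evaluated immediately and produces a plain bool, so no SQL IN clause is ever generated. A minimal runnable sketch of the working form, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database instead of Postgres; the table name "phones" and the sample rows are hypothetical, and func.substr is used because that is SQLite's spelling of the function:

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, func, select,
)

metadata = MetaData()
# Hypothetical table; only "number" and "provider" come from the question.
table = Table(
    "phones",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("number", String),
    Column("provider", String),
)

engine = create_engine("sqlite://")  # in-memory database for the demo
metadata.create_all(engine)

numbers = {"12345", "54321"}

with engine.begin() as conn:
    conn.execute(
        table.insert(),
        [
            {"number": "1234567890", "provider": None},
            {"number": "9999900000", "provider": None},
        ],
    )

    # .in_() is available on the function expression, not only on columns.
    prefix = func.substr(table.c.number, 1, 5)
    conn.execute(
        table.update()
        .where(prefix.in_(list(numbers)))  # list() keeps this version-agnostic
        .values(provider="testprovider")
    )

    providers = {
        number: provider
        for number, provider in conn.execute(
            select(table.c.number, table.c.provider)
        )
    }
```

Only the row whose first five characters are in numbers is updated; the other row keeps its NULL provider.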

Index of row looping over django queryset [duplicate]

I have a QuerySet, let's call it qs, which is ordered by some attribute that is irrelevant to this problem. Then I have an object, let's call it obj. Now I'd like to know what index obj has in qs, as efficiently as possible. I know that I could use .index() from Python or possibly loop through qs comparing each object to obj, but what is the best way to go about doing this? I'm looking for high performance, and that's my only criterion.
Using Python 2.6.2 with Django 1.0.2 on Windows.
If you're already iterating over the queryset and just want to know the index of the element you're currently on, the compact and probably the most efficient solution is:
for index, item in enumerate(your_queryset):
    ...
However, don't use this if you have a queryset and an object obtained by some unrelated means, and want to learn the position of this object in the queryset (if it's even there).
If you just want to know where your object sits amongst all the others (e.g. when determining rank), you can do it quickly by counting the objects that come before it:
index = MyModel.objects.filter(sortField__lt=myObject.sortField).count()
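The counting idea can be illustrated outside Django with plain SQL. A minimal sketch using Python's built-in sqlite3 module rather than the Django ORM; the table name, column names, and sample rows are hypothetical, and it assumes the sort values are unique (with ties, the count gives the position of the first tied row):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_model (id INTEGER PRIMARY KEY, sort_field INTEGER)")
conn.executemany(
    "INSERT INTO my_model (id, sort_field) VALUES (?, ?)",
    [(1, 30), (2, 10), (3, 20), (4, 40)],
)


def index_by_count(conn, sort_value):
    """Zero-based position of a row: the number of rows sorted before it."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM my_model WHERE sort_field < ?", (sort_value,)
    ).fetchone()
    return count


# Cross-check against the position in the fully ordered list of ids.
ordered_ids = [
    row[0] for row in conn.execute("SELECT id FROM my_model ORDER BY sort_field")
]
index = index_by_count(conn, 30)  # the row with id 1 has sort_field 30
```

The database only has to count matching rows, so nothing but a single integer is transferred.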
Assuming for the purpose of illustration that your models are standard with a primary key id, then evaluating
list(qs.values_list('id', flat=True)).index(obj.id)
will find the index of obj in qs. While the use of list evaluates the queryset, it evaluates not the original queryset but a derived queryset. This evaluation runs a SQL query to get the id fields only, not wasting time fetching other fields.
QuerySets in Django are lazily evaluated iterables, not lists (for further details, see Django documentation on QuerySets).
As such, there is no shortcut to get the index of an element, and I think a plain iteration is the best way to do it.
For a start, I would implement your requirement in the simplest way possible (like iterating); if you really have performance issues, then I would use a different approach, like building a queryset with fewer fields, or whatever.
In any case, the idea is to leave such tricks as late as possible, when you definitely know you need them.
Update: You may want to use a SQL statement directly to get the row number. However, Django's ORM does not support this natively, so you have to use a raw SQL query (see documentation). I think this could be the best option, but again, only if you really see a real performance issue.
There is a simple Pythonic way to get the index of an element in a queryset:
(*qs,).index(instance)
This unpacks the queryset into a tuple, then uses the built-in index method to determine its position.
You can do this using queryset.extra(…) and some raw SQL like so:
queryset = queryset.order_by("id")
record500 = queryset[500]
numbered_qs = queryset.extra(select={
    'queryset_row_number': 'ROW_NUMBER() OVER (ORDER BY "id")'
})
from django.db import connection
cursor = connection.cursor()
cursor.execute(
    "WITH OrderedQueryset AS (" + str(numbered_qs.query) + ") "
    "SELECT queryset_row_number FROM OrderedQueryset WHERE id = %s",
    [record500.id]
)
index = cursor.fetchall()[0][0]
index == 501  # because ROW_NUMBER() is 1-indexed, not 0-indexed

Querying a list in mongoengine; contains vs in

I have a ListField in a model with ids (ReferenceField), and I need to query whether a certain id is in that list. AFAIK I have two options for this:
Model.objects.filter(refs__contains='59633cad9d4bc6543aab2f39')
or:
Model.objects.filter(refs__in=['59633cad9d4bc6543aab2f39'])
Which one is the most efficient for this use case?
The model looks like:
class Model(mongoengine.Document):
    refs = mongoengine.ListField(mongoengine.ReferenceField(SomeOtherModel))
From what I can read in the mongoengine documentation, contains is really a string query, but it works surprisingly here as well. But I'm guessing that __in is more efficient since it should be optimized for lists, or am I wrong?
String queries are normally regex queries under the covers, so they would be less efficient. The exception, however, is when testing against reference fields. Here is what the two queries produce:
Model.objects.filter(refs__contains="5305c92956c02c3f391fcaba")._query
{'refs': ObjectId('5305c92956c02c3f391fcaba')}
Which is a direct lookup.
Model.objects.filter(refs__in=["5305c92956c02c3f391fcaba"])._query
{'refs': {'$in': [ObjectId('5305c92956c02c3f391fcaba')]}}
This is probably less efficient, but the difference would be extremely marginal. The biggest impact would come from the number of docs and whether or not the refs field has an index.

Search a database for elements including a string variable

I'm pretty new to SQLite and Python and have run into a bit of confusion. I'm trying to return all elements in a column that contain a substring which is passed to a function as a variable in Python. My code is running, but it's returning an empty result instead of the correct result.
Here's the code with the names generalized:
def myFunc(cursor, myString):
    return cursor.execute("""select myID from Column where name like '%'+?'%' """, (myString,))
Like I said, the code does run without error but returns an empty result instead of the result that I know it should be. I'm assuming it has something to do with my use of the wildcard and/or question mark, but I can't be sure. Anyone have any ideas? Thanks in advance for your time/help! Also, this is my first post, so I apologize in advance if I missed any of the recommended protocols for asking questions.
Well, '%'+?'%' definitely isn't going to work: you're trying to concatenate with + on the left, but with no operator…
You can compute LIKE-search fields if you do it right. In SQLite, strings are concatenated with ||, not + (which is arithmetic addition), so the pattern would be '%' || ? || '%' in this case. Computing the pattern in SQL can cause problems with some databases (from not working, to doing a less efficient search), but, at least according to CL.'s comment, sqlite3 will be fine.
But the easy thing to do is to just substitute a complete parameter, rather than part of one. You can put % into the parameters, and it'll be interpreted just fine. So:
return cursor.execute("""select myID from Column where name like ?""",
('%'+myString+'%',))
And this also has the advantage that if you want to do a search for initial substrings ('foo%'), it'll be the same SQL statement but with a different parameter.
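A minimal runnable sketch of the complete-parameter approach, using an in-memory database; the table name "items", the helper name find_ids, and the sample rows are hypothetical stand-ins for the generalized names in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (myID INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (myID, name) VALUES (?, ?)",
    [(1, "foobar"), (2, "barfoo"), (3, "baz")],
)


def find_ids(cursor, pattern):
    """Return the IDs whose name matches a LIKE pattern passed as one parameter."""
    rows = cursor.execute(
        "SELECT myID FROM items WHERE name LIKE ? ORDER BY myID", (pattern,)
    )
    return [row[0] for row in rows]


cursor = conn.cursor()
my_string = "foo"
contains = find_ids(cursor, "%" + my_string + "%")  # substring anywhere
starts_with = find_ids(cursor, my_string + "%")     # initial substring only
```

The SQL text stays the same in both calls; only the parameter changes.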
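Try this:
def myFunc(cursor, myString):
    # Caution: formatting the value into the SQL text is open to SQL injection;
    # prefer the parameterized form whenever myString is not trusted.
    return cursor.execute("select myID from Column where name like '%{0}%'".format(myString))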
