The results of an ORM query (e.g., MyObject.query()) need to be ordered according to a ranking algorithm based on values that are not in the database (they come from a separate search engine). This means order_by will not work, since it only operates on database columns.
But I don't want to convert the query results to a list and then reorder, because I want to keep the ability to add further constraints to the query. E.g.:
results = MyObject.query()
results = my_reorder(results)
results = results.filter(some_constraint)
Is this possible to accomplish via SQLAlchemy?
I am afraid you will not be able to do it, unless the ordering can be derived from the fields of the object's table(s) and/or related objects' tables, which are in the database.
But you could return the tuple (query, order_func/result) from your code. That way the query can still be extended until it is executed, and then re-sorted. Or you could create a small proxy-like class that holds this tuple, delegates the query-extension methods to the query, and applies the ordering in the query-execution methods (all(), __iter__, ...).
Also, if you could compute the value for each MyObject instance beforehand, you could add a literal column with those values to the query and use order_by on it. Alternatively, create a temporary table, fill it with rows containing the computed ordering values, join on it in the query, and order by it. But I suspect these add more complexity than the benefit they bring.
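A minimal sketch of the proxy idea follows. OrderedQueryProxy is an illustrative name, not SQLAlchemy API, and _FakeQuery stands in for a real SQLAlchemy Query so the sketch is self-contained and runnable:

```python
# Sketch of the proxy-like class described above: it wraps a query and
# an external ordering function, delegates query building, and applies
# the ordering only when results are materialized.

class OrderedQueryProxy:
    def __init__(self, query, order_key):
        self._query = query
        self._order_key = order_key  # e.g. rank looked up in a search engine

    def filter(self, *criteria):
        # return a new proxy so further chaining still works
        return OrderedQueryProxy(self._query.filter(*criteria), self._order_key)

    def all(self):
        # execute the query, then sort by the external ranking
        return sorted(self._query.all(), key=self._order_key)

    def __iter__(self):
        return iter(self.all())


class _FakeQuery:
    """Stand-in for a SQLAlchemy Query, just for demonstration."""
    def __init__(self, items):
        self._items = list(items)

    def filter(self, predicate):
        return _FakeQuery(x for x in self._items if predicate(x))

    def all(self):
        return list(self._items)


results = OrderedQueryProxy(_FakeQuery([3, 1, 2]), order_key=lambda x: -x)
results = results.filter(lambda x: x > 1)  # constraints can still be added
ordered = results.all()  # filtered first, then sorted descending: [3, 2]
```

With a real Query, `filter()` would receive SQLAlchemy criteria instead of a predicate, but the delegation pattern is the same.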
I have switched from connection.execute to session.execute. I am not able to get usable data from it. The results seem to contain references to models instead of actual row data.
with Session(engine) as s:
    q = select(WarrantyRequest)
    res = s.execute(q)
    keys = res.keys()
    data_list = res.all()
    print(keys)       # should print list of column names
    print(data_list)  # should print list of lists with row data

    dict_list = s.execute(q).mappings().all()
    print(dict_list)  # should print list of dicts with column names as keys
It prints
RMKeyView(['WarrantyRequest'])
[(<models.mock.WarrantyRequest object at 0x7f4d065d3df0>,), ...]
[{'WarrantyRequest': <models.mock.WarrantyRequest object at 0x7f002b464df0>}, ... ]
When doing the same with connection.execute, I got the expected results.
What am I missing?
There is this paragraph in the docs which kind of describes the behaviour, but I am not able to tell what I am supposed to do to get data out of it.
It’s important to note that while methods of Query such as Query.all() and Query.one() will return instances of ORM mapped objects directly in the case that only a single complete entity were requested, the Result object returned by Session.execute() will always deliver rows (named tuples) by default; this is so that results against single or multiple ORM objects, columns, tables, etc. may all be handled identically.
If only one ORM entity was queried, the rows returned will have exactly one column, consisting of the ORM-mapped object instance for each row. To convert these rows into object instances without the tuples, the Result.scalars() method is used to first apply a “scalars” filter to the result; then the Result can be iterated or deliver rows via standard methods such as Result.all(), Result.first(), etc.
Querying a model
q = select(WarrantyRequest)
returns rows that consist of individual model instances. To get rows of raw data instead, query the model's __table__ attribute:
q = select(WarrantyRequest.__table__)
SQLAlchemy's ORM layer presents database tables and rows in an object-oriented way, on the assumption that the programmer wants to work with objects and their attributes rather than raw data.
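The two options can be put together in a runnable sketch (assumes SQLAlchemy 1.4+; the WarrantyRequest model and its columns here are a minimal stand-in for the one in the question):

```python
# Demonstrates rows-of-instances vs. scalars() vs. selecting __table__.
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class WarrantyRequest(Base):
    __tablename__ = "warranty_request"
    id = Column(Integer, primary_key=True)
    product = Column(String)

engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)

with Session(engine) as s:
    s.add(WarrantyRequest(id=1, product="widget"))
    s.commit()

with Session(engine) as s:
    # rows of model instances (one-element named tuples)
    rows = s.execute(select(WarrantyRequest)).all()
    # plain model instances, tuples unwrapped
    objs = s.execute(select(WarrantyRequest)).scalars().all()
    # raw column data, no ORM objects
    raw = s.execute(select(WarrantyRequest.__table__)).all()
```

`scalars()` is the usual choice when you still want ORM objects; selecting `__table__` is for when you want plain row data.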
What I have is a list of model objects I haven't run bulk_create on:
objs = [Model(id=1, field='foo'), Model(id=2, field='bar')]
What I'd like to do is compare objs against Model.objects.all() and return only those objects which aren't already in the database (based on the field value).
For example, if my database was:
[Model(id=3, field='foo')]
Then the resulting objs should be:
objs = [Model(id=2, field='bar')]
Is something like this possible?
Edit:
So a bit of further explanation:
What I'm doing is I have an import command, that I'm trying to have an --append flag included.
Without the flag, I delete the tables, and start fresh.
With the flag, I want to bulk create a large number of objects (single creation is much slower - I've checked), and I don't want to have objects with the same field values but different ids.
I've already tried filtering out duplicates after insertion, but it's quite slow and I wanted to test this approach to see if it's faster.
The objects are read from CSV files, and it's faster to make a list of Model, and then bulk_create, as opposed to running create on each row.
We can probably best do this by first constructing a set of the values that the field column takes in the database:
field_vals = set(Model.objects.values_list('field', flat=True).distinct())
and then we can perform a filtering like:
filtered_objs = [obj for obj in objs if obj.field not in field_vals]
By constructing a set first, we run a single query, construct a set in O(n) (with n the number of Models in the database), and then filter in O(m) (with m the number of objects in objs). So the algorithm is O(m+n).
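The same filtering can be illustrated with plain Python objects (SimpleObj is a hypothetical stand-in for the Django model, so the sketch runs without a database):

```python
# Plain-Python illustration of the set-based deduplication above.
# existing_field_vals plays the role of the field_vals set built
# from the database query.

class SimpleObj:
    def __init__(self, id, field):
        self.id = id
        self.field = field

existing_field_vals = {"foo"}  # values already in the "database"
objs = [SimpleObj(1, "foo"), SimpleObj(2, "bar")]

# O(m) membership checks against the set built in O(n)
filtered_objs = [obj for obj in objs if obj.field not in existing_field_vals]
```

After this filtering, `filtered_objs` can be handed to `bulk_create` without inserting duplicates.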
Based on the question, however, it looks like you could probably save the effort of constructing these objects in the first place by using Django's get_or_create function:
obj, created = Model.objects.get_or_create(field='foo')
Here obj is the object (either fetched from or created in the database), and created is a boolean that is True if a new object had to be created.
I need to compare 2 querysets from the same model from 2 different databases.
I expect the difference between them. In this case, I grab only one column (charfield), from two databases and want to compare this "list", i.e. it would be great to work with sets and difference methods of sets.
But I can't simply subtract querysets, and set(queryset) or list(queryset) gives me nothing (not an error), i.e.
diff_set = set(articles1) - set(articles2)
I switch DBs on the fly, build the two querysets, and try to compare them (with filter or exclude):
articles1 = list(Smdocuments.objects.using('tmp1').only('id').filter(doctype__exact='CQ'))
# right connection
connections.databases['tmp2']['HOST'] = db2.host
connections.databases['tmp2']['NAME'] = db2.name
articles2 = list(Smdocuments.objects.using('tmp2').only('id').filter(doctype__exact='CQ'))
# okay to chain Smdocuments objects, gives all the entries
from itertools import chain
all = list(chain(articles1, articles2))
# got nothing, even len(diff_set) is none
diff_set = set(articles1) - set(articles2)
# this one raise error Subqueries aren't allowed across different databases.
articles_exclude = Smdocuments.objects.using('tmp1').only('id').filter(doctype__exact='CQ')
len(articles1)
diff_ex = Smdocuments.objects.using('tmp2').only('id').filter(doctype__exact='CQ').exclude(id__in=articles_exclude)
len(diff_ex)
diff_ex raises an error:
Subqueries aren't allowed across different databases. Force the inner
query to be evaluated using list(inner_query).
So "Model objects" are not so easy to manipulate, and neither are querysets across different databases.
I realize this is not a good DB scheme, but it is another application with a distributed DB, and I need to compare them.
Comparing by one column would be enough, though comparing full querysets may be useful in the future.
Or should I convert the querysets to lists and compare the raw data?
Your question is really unclear about what you actually expect, but here are a couple of hints anyway.
First, model instances (assuming they are instances of the same model, of course) compare on their primary key value, which is also used as the hash for dicts and sets. So if you want to compare the underlying database records, you should not work on model instances but on the raw DB values, as lists of tuples or dicts. You can get those using (respectively) Queryset.values_list() or Queryset.values(), not forgetting to list() them so you really get a list and not a queryset.
Which brings us to the second important point: while they present themselves as list-likes (they support len(), iteration, subscripting and, with some restrictions, slicing), querysets are NOT lists. You can't compare two querysets (well, you can, but they compare on identity, which means two querysets are only equal if they are the very same object). More importantly, using a queryset as the argument to a 'field__in=' lookup results in a SQL subquery, whereas passing a proper list results in a plain 'field IN (...)' WHERE clause. This explains the error you get with the exclude(...) approach.
To make a long story short, if you want to effectively compare database rows, you want:
# the fields you want to compare records on
fields = ('field1', 'field2', 'fieldN')
rows1 = list(YourModel.objects.using('tmp1').filter(...).values_list(*fields))
rows2 = list(YourModel.objects.using('tmp2').filter(...).values_list(*fields))
# now you have two lists of tuples, so you can apply
# ordinary Python comparisons / set operations
print(rows1 == rows2)
print(set(rows1) - set(rows2))
# etc
We are currently testing Aerospike, but there are certain points in the documentation that we do not understand with regard to keys.
key = ('trivium', 'profile', 'data')

# Write a record
client.put(key, {
    'name': 'John Doe',
    'bin_data': 'KIJSA9878MGU87',
    'public_profile': True
})
We read about namespaces, and then we tried a query following the general documentation:
from aerospike import predicates as p

client = aerospike.client(config).connect()
query = client.query('trivium', 'profile')
query.select('name', 'bin_data')
query.where(p.equals('public_profile', True))
print(query.results())
The result is null, but when we remove the where statement the query returns all the records. The documentation says that queries work with a secondary index, but how does that work?
Regards.
You can use one filter in a query. That filter, in your case the equality filter, is on the public_profile bin. To use the filter, you must build a secondary index (SI) on the public_profile bin; however, SIs can only be built on bins containing numeric or string data. So to do what you are trying to do, change public_profile to a numeric entry, say 0 or 1, then add a secondary index on that bin and use the equality filter on the value 0 or 1.
While you can build multiple SIs, you can only invoke one filter in any given query. You cannot chain multiple filters with an "AND". If you have to use multiple filters, you will have to write Stream UDFs (User Defined Functions). You can use AQL to define SIs; you only have to do it once.
$ aql
aql> help    --- see the command to add a secondary index
aql> exit
SIs reside in process RAM. Once an index is defined, any data added or modified is automatically indexed by Aerospike as applicable. If you define the index on public_profile as NUMERIC but insert string data in that bin for some records, those records will not be indexed and won't participate in the query filter.
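For reference, defining the secondary index in AQL looks roughly like this (the index name idx_profile_pub is illustrative, and this assumes public_profile has been changed to a numeric bin as described above):

```sql
CREATE INDEX idx_profile_pub ON trivium.profile (public_profile) NUMERIC
```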
The same attributes stored in __dict__ are needed to restore the object, right?
I think a SQLAlchemy RowProxy uses _row, a tuple, to store the values. It doesn't have a __dict__, so there is no storage overhead of a __dict__ per row. Its _parent object has fields that map column names to index positions in the tuple. This is a pretty common thing to do if you are trying to cut down on the size of SQL fetch results: the column list is always the same for each row of the same select, so you rely on a common parent to keep track of which index of the tuple holds which column, rather than having your own per-row __dict__.
An additional advantage is that, at the DB driver level, SQL cursors return their values (almost always) as tuples, so you have little processing overhead. But a straight SQL fetch is just that: a cursor description and a bunch of disconnected rows with tuples in them. SQLAlchemy bridges that and lets you use column names.
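A toy version of that layout makes the trade-off concrete (this is an illustration of the pattern, not SQLAlchemy's actual implementation): rows share one parent that maps column names to tuple positions, and each row stores only the tuple.

```python
# Sketch of the shared-parent pattern described above. All rows of one
# result share a single parent; each row holds just a tuple, with no
# per-row __dict__.

class ResultParent:
    def __init__(self, columns):
        # column name -> position in each row's tuple
        self._index = {name: i for i, name in enumerate(columns)}

class Row:
    __slots__ = ("_parent", "_row")  # no __dict__ per row

    def __init__(self, parent, values):
        self._parent = parent
        self._row = tuple(values)

    def __getattr__(self, name):
        # called only when normal attribute lookup fails,
        # so column names behave like attributes
        try:
            return self._row[self._parent._index[name]]
        except KeyError:
            raise AttributeError(name)

parent = ResultParent(["id", "name"])
rows = [Row(parent, (1, "John Doe")), Row(parent, (2, "Jane Roe"))]
```

Because of `__slots__`, each row costs one tuple plus one parent reference, while `rows[0].name` still resolves through the shared column map.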
Now, as to how the unpickling process goes, you'd have to look at the actual implementation.