I'm trying to query for a range of valid dates:
q = Licence.query(Licence.valid_from <= today,
                  Licence.valid_to >= today,
                  ancestor=customer.key
                  ).fetch(keys_only=True)
I know that the Datastore doesn't support inequality queries over two properties, so I do this:
kl = Licence.query(Licence.valid_from <= today,
                   ancestor=customer.key
                   ).fetch(keys_only=True)
licences = ndb.get_multi(kl)
for item in licences[:]:  # iterate over a copy so removal is safe
    if item.valid_to < today:
        licences.remove(item)
But I don't like this approach because I think it uses too much RAM, retrieving more entities (and keys) from the Datastore than I finally need.
Does anybody know a better way of doing this type of query?
Is it enough to use .filter() before .get()?
Thanks
One solution would be to create a new field, like start_week, which buckets the dates and allows you to use an IN query to filter:
q = Licence.query(Licence.start_week.IN(range(5, 30)),
                  Licence.valid_to >= today,
                  ancestor=customer.key)
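For this to work, start_week has to be stored on each entity when it is saved. A minimal sketch, assuming valid_from is a DateProperty and that ISO week numbers are used as the buckets (both assumptions are mine):

class Licence(ndb.Model):
    valid_from = ndb.DateProperty()
    valid_to = ndb.DateProperty()
    start_week = ndb.IntegerProperty()  # bucket used by the IN filter

# populate the bucket when saving
licence = Licence(parent=customer.key,
                  valid_from=valid_from,
                  valid_to=valid_to,
                  start_week=valid_from.isocalendar()[1])
licence.put()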
Even simpler: Use a projection query to identify the right set of data without fetching full entities. This is faster than a regular query.
it = Licence.query(Licence.valid_from <= today,
                   ancestor=customer.key
                   ).iter(projection=[Licence.valid_to])
keys = [e.key for e in it if e.valid_to >= today]
licences = ndb.get_multi(keys)
I am working on a query that depends on the result of another query. Here's my question: Is it possible to filter a query using the result of another query?
I have two tables that I would like to query by joining them. I tried this query:
# Get all data for a particular country
de_market = (db.session.query(
                 func.count(certificate.id),
                 func.month(certificate.creation_time).label('month'),
                 func.year(certificate.creation_time).label('year'))
             .join(B2cCustomer.certificates)
             .group_by(func.month(certificate.creation_time),
                       func.year(certificate.creation_time))
             .filter(certificate.report_sent != None)
             .filter(B2cCustomer.market == market_code)
             .all())
After getting that result, I want to filter it again:
# filter the country data by the different msm campaign
get_data_based_on_msm = de_market.filter(certificate.msm == msm).all()
I did this but didn't get a good result. Is there a better way to do this?
Is it possible to filter a query using the result of another query?
Yes.
Well, kind of.
You wrote:
de_market = (db.session.query(...)
             .filter(certificate.report_sent != None)
             .filter(B2cCustomer.market == market_code)
             .all())
# and then expressed an interest in
de_market.filter(certificate.msm == msm).all()
That is to say, we assign q = ... .filter(cond1).filter(cond2)
and later we are interested in q.filter(cond3).
Can you do that? Yes.
Now, here is the fly in the ointment.
You finished up with retrieving the result rows using .all().
My advice is to not do that.
Clone the query,
write a helper function that returns such a query,
do what you must to make an un-executed query available.
With that in hand, yes, you can easily tack on extra filters.
Here is some idiomatic usage:
q = ... .filter(cond1).filter(cond2)
if cond3:
    q = q.filter(cond3)
if limit:
    q = q.limit(limit)
return q
But as soon as you ask for results using .all(),
all bets are off, you don't get to modify the query after that.
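Applied to the question's query (same models and columns as above; the helper name base_market_query is mine), a sketch might look like this:

def base_market_query(market_code):
    """Build the per-country query without executing it."""
    return (db.session.query(
                func.count(certificate.id),
                func.month(certificate.creation_time).label('month'),
                func.year(certificate.creation_time).label('year'))
            .join(B2cCustomer.certificates)
            .group_by(func.month(certificate.creation_time),
                      func.year(certificate.creation_time))
            .filter(certificate.report_sent != None)
            .filter(B2cCustomer.market == market_code))

# execute the base query for the whole market...
de_market = base_market_query(market_code).all()
# ...and tack the extra filter onto a fresh, un-executed copy
get_data_based_on_msm = (base_market_query(market_code)
                         .filter(certificate.msm == msm).all())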
I have this structure in Firestore: many collections whose IDs are user IDs, and inside each of them many documents whose IDs are departure dates. The documents contain the fields "from" and "to" with the airport names.
I want to retrieve the IDs of all collections (the user IDs) that contain the same documents as a chosen input user, to see who shared a flight with this user across all the flights he made.
I'm using python.
UPDATE: I solved my issue this way:
@app.route('/infos/<string:user_id>/', methods=['GET'])
def user_info(user_id):
    docs = db.collection(f'{user_id}').stream()
    travels = []
    for doc in docs:
        sharing_travellers = []
        tmp = doc.to_dict()
        tmp['date'] = doc.id
        colls = db.collections()
        for coll in colls:
            if coll.id != user_id:
                date = datetime.strptime(doc.id, '%Y-%m-%d')
                query = db.collection(f'{coll.id}').stream()
                for q in query:
                    other_date = datetime.strptime(q.id, '%Y-%m-%d')
                    if abs((date - other_date).days) < 1:
                        json_obj = q.to_dict()
                        if json_obj['from'] == tmp['from'] and json_obj['to'] == tmp['to']:
                            sharing_travellers.append(coll.id)
        tmp['shared'] = sharing_travellers
        travels.append(tmp)
    return render_template('user_info.html', title=user_id, travels=travels)
The only way to read across collections is if those collections have the same name. If that were the case, you could use a collection group query.
Since your collections don't have the same name, though, you'll have to get the list of collections, and then look in each collection separately.
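For reference, a collection group query looks like this: a sketch, assuming a hypothetical layout users/{user_id}/flights/{date} where every subcollection is named flights:

from google.cloud import firestore

db = firestore.Client()

# Works only when every user's documents live in a subcollection with
# the same name, e.g. users/{user_id}/flights/{date} (hypothetical layout).
flights = (db.collection_group('flights')
           .where('from', '==', 'CDG')
           .where('to', '==', 'JFK')
           .stream())
for flight in flights:
    # the document path reveals which user the flight belongs to
    print(flight.reference.path)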
I support Frank's answer, but I want to add that it might be wise to restructure your database to better accommodate this type of situation. Cross-collection searching is limited to collection group queries, which are themselves limited, and other approaches require costly workarounds.
It's often better to have a dedicated collection with those IDs as field values, which you can then query per user and as a collective group.
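A minimal sketch of that idea, with a single hypothetical flights collection where the user ID, date, and airports are all field values:

from google.cloud import firestore

db = firestore.Client()

# Hypothetical schema: one shared collection instead of one per user;
# each document looks like
#   {'user_id': 'alice', 'date': '2023-05-01', 'from': 'CDG', 'to': 'JFK'}

def shared_travellers(user_id):
    """Return the IDs of users who took the same flights as user_id."""
    travellers = set()
    for flight in db.collection('flights').where('user_id', '==', user_id).stream():
        f = flight.to_dict()
        matches = (db.collection('flights')
                   .where('date', '==', f['date'])
                   .where('from', '==', f['from'])
                   .where('to', '==', f['to'])
                   .stream())
        travellers.update(m.to_dict()['user_id'] for m in matches)
    travellers.discard(user_id)
    return travellers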
I have this query that returns a list of student objects:
query = db.session.query(Student).filter(Student.is_deleted == false())
query = query.options(joinedload('project'))
query = query.options(joinedload('image'))
query = query.options(joinedload('student_locator_map'))
query = query.options(subqueryload('attached_addresses'))
query = query.options(subqueryload('student_meta'))
query = query.order_by(Student.student_last_name, Student.student_first_name,
Student.student_middle_name, Student.student_grade, Student.student_id)
query = query.filter(filter_column == field_value)
students = query.all()
The query itself does not take much time. The problem is converting all these objects (there can be 5000+) to Python dicts: it takes over a minute with this many objects. Currently, the code loops through the objects and converts each using to_dict(). I have also tried __dict__, which was much faster, but it does not seem to convert the related objects.
How can I convert all these Student objects and related objects quickly?
Maybe this will help you...
from collections import defaultdict
from sqlalchemy import inspect

def query_to_dict(student_results):
    result = defaultdict(list)
    for obj in student_results:
        instance = inspect(obj)
        for key, x in instance.attrs.items():
            result[key].append(x.value)
    return result

output = query_to_dict(students)
query = query.options(joinedload('attached_addresses').joinedload('address'))
By chaining address joinedload to attached_addresses I was able to significantly speed up the query.
My understanding of why this is the case:
Address objects were not being loaded with the initial query, so on every iteration through the loop the database would get hit to retrieve the Address object. With the chained joined load, Address objects are now loaded with the initial query.
Thanks to Corley Brigman for the help.
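To make the N+1 pattern the chained option removes concrete, a sketch (city is a hypothetical Address column):

# Without the chained option, each attribute access below lazy-loads,
# issuing one extra SELECT per address:
for student in students:
    for attached in student.attached_addresses:
        print(attached.address.city)  # hypothetical column

# With .options(joinedload('attached_addresses').joinedload('address')),
# the same loop touches only data already loaded by the initial query.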
I want to achieve something like the map drag search on airbnb (https://www.airbnb.com/s/Paris--France?source=ds&page=1&s_tag=PNoY_mlz&allow_override%5B%5D=)
I am saving the data like this in the Datastore:
user.lat = float(lat)
user.lon = float(lon)
user.geoLocation = ndb.GeoPt(float(lat),float(lon))
and whenever I drag the map or zoom in or out, I get the following parameters in my controller:
def get(self):
    """
    This is an ajax function. It gets the place name, north_east, and south_west
    coordinates. Then it fetches the results matching the search criteria and
    creates a result list. After that it returns the result in json format.
    :return: result
    """
    self.response.headers['Content-type'] = 'application/json'
    results = []
    north_east_latitude = float(self.request.get('nelat'))
    north_east_longitude = float(self.request.get('nelon'))
    south_west_latitude = float(self.request.get('swlat'))
    south_west_longitude = float(self.request.get('swlon'))
    points = Points.query(Points.lat < north_east_latitude,
                          Points.lat > south_west_latitude)
    for row in points:
        # keep points whose longitude also falls inside the viewport
        if south_west_longitude < row.lon < north_east_longitude:
            listingdic = {'name': row.name, 'desc': row.description,
                          'contact': row.contact, 'lat': row.lat, 'lon': row.lon}
            results.append(listingdic)
    self.write(json.dumps({'listings': results}))
My model class is given below:
class Points(ndb.Model):
    name = ndb.StringProperty(required=True)
    description = ndb.StringProperty(required=True)
    contact = ndb.StringProperty(required=True)
    lat = ndb.FloatProperty(required=True)
    lon = ndb.FloatProperty(required=True)
    geoLocation = ndb.GeoPtProperty()
I want to improve the query.
Thanks in advance.
No, you cannot improve the solution by checking all 4 conditions in the query because ndb queries do not support inequality filters on multiple properties. From NDB Queries (emphasis mine):
Limitations: The Datastore enforces some restrictions on queries.
Violating these will cause it to raise exceptions. For example,
combining too many filters, using inequalities for multiple
properties, or combining an inequality with a sort order on a
different property are all currently disallowed. Also filters
referencing multiple properties sometimes require secondary indexes to
be configured.
and
Note: As mentioned earlier, the Datastore rejects queries using inequality filtering on more than one property.
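To see the restriction concretely with the model above (a sketch; the error text is paraphrased):

# Two inequality filters on two different properties (lat and lon):
q = Points.query(Points.lat < north_east_latitude,
                 Points.lat > south_west_latitude,
                 Points.lon < north_east_longitude,
                 Points.lon > south_west_longitude)
# Executing it raises BadRequestError: only one inequality
# filter per query is supported.
q.fetch()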
I am writing a simple Python web application that consists of several pages of business data formatted for the iPhone. I'm comfortable programming Python, but I'm not very familiar with Python "idiom," especially regarding classes and objects. Python's object oriented design differs somewhat from other languages I've worked with. So, even though my application is working, I'm curious whether there is a better way to accomplish my goals.
Specifics: How does one typically implement the request-transform-render database workflow in Python? Currently, I am using pyodbc to fetch data, copying the results into attributes on an object, performing some calculations and merges using a list of these objects, then rendering the output from the list of objects. (Sample code below, SQL queries redacted.) Is this sane? Is there a better way? Are there any specific "gotchas" I've stumbled into in my relative ignorance of Python? I'm particularly concerned about how I've implemented the list of rows using the empty "Record" class.
class Record(object):
    pass

def calculate_pnl(records, node_prices):
    for record in records:
        try:
            # fill RT and DA prices from the hash retrieved above
            if hasattr(record, 'sink') and record.sink:
                record.da = node_prices[record.sink][0] - node_prices[record.id][0]
                record.rt = node_prices[record.sink][1] - node_prices[record.id][1]
            else:
                record.da = node_prices[record.id][0]
                record.rt = node_prices[record.id][1]
            # calculate dependent values: RT-DA and PNL
            record.rtda = record.rt - record.da
            record.pnl = record.rtda * record.mw
        except:
            print sys.exc_info()

def map_rows(cursor, mappings, callback=None):
    records = []
    for row in cursor:
        record = Record()
        for field, attr in mappings.iteritems():
            setattr(record, attr, getattr(row, field, None))
        if not callback or callback(record):
            records.append(record)
    return records

def get_positions(cursor):
    # get the latest position time
    cursor.execute("SELECT latest data time")
    time = cursor.fetchone().time
    hour = eelib.util.get_hour_ending(time)
    # fetch the current positions
    cursor.execute("SELECT stuff FROM atable", (hour))
    # read the rows
    nodes = {}
    def record_callback(record):
        if abs(record.mw) > 0:
            if record.id: nodes[record.id] = None
            return True
        else:
            return False
    records = util.map_rows(cursor, {
        'id': 'id',
        'name': 'name',
        'mw': 'mw'
    }, record_callback)
    # query prices
    for node_id in nodes:
        # RT price
        row = cursor.execute("SELECT price WHERE ? ? ?", (node_id, time, time)).fetchone()
        rt5 = row.lmp if row else None
        # DA price
        row = cursor.execute("SELECT price WHERE ? ? ?", (node_id, hour, hour)).fetchone()
        da = row.da_lmp if row else None
        # update the hash value
        nodes[node_id] = (da, rt5)
    # calculate the position pricing
    calculate_pnl(records, nodes)
    # sort
    records.sort(key=lambda r: r.name)
    # return the records
    return records
The empty Record class and the free-floating function that (generally) applies to an individual Record is a hint that you haven't designed your class properly.
class Record(object):
    """Assuming rtda and pnl must exist."""
    def __init__(self):
        self.da = 0
        self.rt = 0
        self.rtda = 0  # or whatever
        self.pnl = None
        self.sink = None  # Not clear what this is
    def setPnl(self, node_prices):
        # fill RT and DA prices from the hash retrieved above
        # (id and mw are set elsewhere, e.g. by map_rows)
        if self.sink:
            self.da = node_prices[self.sink][0] - node_prices[self.id][0]
            self.rt = node_prices[self.sink][1] - node_prices[self.id][1]
        else:
            self.da = node_prices[self.id][0]
            self.rt = node_prices[self.id][1]
        # calculate dependent values: RT-DA and PNL
        self.rtda = self.rt - self.da
        self.pnl = self.rtda * self.mw
Now, your calculate_pnl( records, node_prices ) is simpler and uses the object properly.
def calculate_pnl(records, node_prices):
    for record in records:
        record.setPnl(node_prices)
The point isn't to trivially refactor the code in small ways.
The point is this: A Class Encapsulates Responsibility.
Yes, an empty-looking class is usually a problem. It means the responsibilities are scattered somewhere else.
A similar analysis holds for the collection of records. This is more than a simple list, since the collection -- as a whole -- has operations it performs.
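For example, a sketch (the class name PositionPortfolio is mine):

class PositionPortfolio(object):
    """Owns the collection-level operations scattered through get_positions."""
    def __init__(self, records):
        self.records = records
    def price(self, node_prices):
        for record in self.records:
            record.setPnl(node_prices)
    def sorted_by_name(self):
        return sorted(self.records, key=lambda r: r.name)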
The "Request-Transform-Render" isn't quite right. You have a Model (the Record class). Instances of the Model get built (possibly because of a Request.) The Model objects are responsible for their own state transformations and updates. Perhaps they get displayed (or rendered) by some object that examines their state.
It's that "Transform" step that often violates good design by scattering responsibility all over the place. "Transform" is a hold-over from non-object design, where responsibility was a nebulous concept.
Have you considered using an ORM? SQLAlchemy is pretty good, and Elixir makes it beautiful. It can really reduce the amount of boilerplate code needed to deal with databases. Also, a lot of the gotchas mentioned have already shown up and been dealt with by the SQLAlchemy developers.
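A minimal declarative SQLAlchemy sketch of what that could look like (the table, column names, and connection URL are illustrative, not from the question):

from sqlalchemy import create_engine, Column, Integer, String, Float
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Position(Base):
    __tablename__ = 'positions'  # illustrative name

    id = Column(Integer, primary_key=True)
    name = Column(String(64))
    mw = Column(Float)

# reuse the existing ODBC connection via a pyodbc DSN (illustrative URL)
engine = create_engine('mssql+pyodbc://mydsn')
Session = sessionmaker(bind=engine)

records = Session().query(Position).order_by(Position.name).all()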
Depending on how much you want to do with the data, you may not need to populate an intermediate object. The cursor's header data structure will let you get the column names; a bit of introspection will let you build a dictionary of column-name:value pairs for each row.
You can pass the dictionary to the % operator. The docs for the odbc module will explain how to get at the column metadata.
This snippet of code shows the application of the % operator in this manner.
>>> a={'col1': 'foo', 'col2': 'bar', 'col3': 'wibble'}
>>> 'Col1=%(col1)s, Col2=%(col2)s, Col3=%(col3)s' % a
'Col1=foo, Col2=bar, Col3=wibble'
>>>
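Putting the two together with pyodbc: cursor.description holds one tuple per column, and the first element of each tuple is the column name. A sketch reusing the question's redacted query and column names:

def rows_as_dicts(cursor):
    """Yield each row as a {column_name: value} dict."""
    col_names = [col[0] for col in cursor.description]
    for row in cursor:
        yield dict(zip(col_names, row))

cursor.execute("SELECT stuff FROM atable")
for row in rows_as_dicts(cursor):
    print 'Name=%(name)s, MW=%(mw)s' % row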
Using an ORM for an iPhone app might be a bad idea because of performance issues: you want your code to be as fast as possible, so you can't avoid some boilerplate code. If you are considering an ORM, besides SQLAlchemy I'd recommend Storm.