Geospatial & location-based search in Google App Engine Python

I want to achieve something like the map-drag search on Airbnb (https://www.airbnb.com/s/Paris--France?source=ds&page=1&s_tag=PNoY_mlz&allow_override%5B%5D=).
I am saving the data like this in the datastore:
user.lat = float(lat)
user.lon = float(lon)
user.geoLocation = ndb.GeoPt(float(lat),float(lon))
Whenever I drag the map or zoom in or out, I get the following parameters in my controller:
def get(self):
    """
    This is an ajax function. It gets the place name, north_east, and south_west
    coordinates. Then it fetches the results matching the search criteria,
    builds a result list, and returns the result in json format.
    :return: result
    """
    self.response.headers['Content-type'] = 'application/json'
    results = []
    north_east_latitude = float(self.request.get('nelat'))
    north_east_longitude = float(self.request.get('nelon'))
    south_west_latitude = float(self.request.get('swlat'))
    south_west_longitude = float(self.request.get('swlon'))
    points = Points.query(Points.lat < north_east_latitude, Points.lat > south_west_latitude)
    for row in points:
        if south_west_longitude < row.lon < north_east_longitude:
            listingdic = {'name': row.name, 'desc': row.description, 'contact': row.contact, 'lat': row.lat, 'lon': row.lon}
            results.append(listingdic)
    self.write(json.dumps({'listings': results}))
My model class is given below
class Points(ndb.Model):
    name = ndb.StringProperty(required=True)
    description = ndb.StringProperty(required=True)
    contact = ndb.StringProperty(required=True)
    lat = ndb.FloatProperty(required=True)
    lon = ndb.FloatProperty(required=True)
    geoLocation = ndb.GeoPtProperty()
I want to improve the query.
Thanks in advance.

No, you cannot improve the solution by checking all 4 conditions in the query because ndb queries do not support inequality filters on multiple properties. From NDB Queries (emphasis mine):
Limitations: The Datastore enforces some restrictions on queries.
Violating these will cause it to raise exceptions. For example,
combining too many filters, using inequalities for multiple
properties, or combining an inequality with a sort order on a
different property are all currently disallowed. Also filters
referencing multiple properties sometimes require secondary indexes to
be configured.
and
Note: As mentioned earlier, the Datastore rejects queries using inequality filtering on more than one property.
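For reference, a minimal sketch of the pattern the Datastore does allow, assuming the Points model from the question: keep the single inequality (latitude) in the query and apply the longitude bounds in memory, which is essentially what your current handler already does:
points = Points.query(Points.lat < north_east_latitude,
                      Points.lat > south_west_latitude)
results = [
    {'name': p.name, 'desc': p.description, 'lat': p.lat, 'lon': p.lon}
    for p in points
    if south_west_longitude < p.lon < north_east_longitude
]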

Related

Find all documents with the same ID in different collections in Firestore

I have this structure in Firestore: many collections whose ID is a user_id, and inside each of them many documents whose IDs are departure dates. The documents contain the fields "from" and "to" with the airport names.
I want to retrieve the IDs of all collections (the user IDs) that contain the same documents as a chosen input user, to see who shared a flight with that user across all the flights he made.
I'm using Python.
UPDATE: I solved my issue in this way.
@app.route('/infos/<string:user_id>/', methods=['GET'])
def user_info(user_id):
    docs = db.collection(f'{user_id}').stream()
    travels = []
    for doc in docs:
        sharing_travellers = []
        tmp = doc.to_dict()
        tmp['date'] = doc.id
        colls = db.collections()
        for coll in colls:
            if coll.id != user_id:
                date = datetime.strptime(doc.id, '%Y-%m-%d')
                query = db.collection(f'{coll.id}').stream()
                for q in query:
                    other_date = datetime.strptime(q.id, '%Y-%m-%d')
                    if abs((date - other_date).days) < 1:
                        json_obj = q.to_dict()
                        if json_obj['from'] == tmp['from'] and json_obj['to'] == tmp['to']:
                            sharing_travellers.append(coll.id)
        tmp['shared'] = sharing_travellers
        travels.append(tmp)
    return render_template('user_info.html', title=user_id, travels=travels)
The only way to read across collections is if those collections have the same name. If that was the case, you could use a collection group query.
Since your collections don't have the same name though, you'll have to get the list of collections, and then look in each collection separately.
I support Frank's answer, but I want to add that it might be wise to restructure your database to better accommodate this type of situation. Cross-collection searching is limited to collection group queries, which are themselves limited, and other approaches require costly workarounds.
It's often better to have a dedicated collection with those IDs as field values, which you can then query per user or as a collection group.
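As a minimal sketch of what that could look like, assuming a single flights collection in which each document stores the user ID, date, and airports as fields (the collection and field names here are illustrative, not your actual schema):
from google.cloud import firestore

db = firestore.Client()

def users_sharing_flight(user_id, date, origin, destination):
    """Return the IDs of other users on the same route on the same date."""
    # one query against a single collection, instead of scanning every collection
    # (Firestore may prompt you to create a composite index for this filter combination)
    query = (db.collection('flights')
               .where('date', '==', date)
               .where('from', '==', origin)
               .where('to', '==', destination))
    matches = []
    for doc in query.stream():
        data = doc.to_dict()
        if data.get('user_id') != user_id:
            matches.append(data.get('user_id'))
    return matches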

Graphene-Django, fetching 10,000+ data points

I am building a backend GraphQL service with Django & Graphene, and I am having trouble figuring out the best way to fetch thousands of data points in a performant manner.
Some context: we have an experiment with 35,000 data points that need to be displayed on a Plotly plot. The data points are stored in a database table called data_point. Here is the schema of this table:
data_point_id | value | type | created_at
In my Graphene schema, I have a resolver to get all data points for a given experiment as shown below
class ExperimentNode(DjangoObjectType):
    users = generic.GenericScalar()
    aliases = generic.GenericScalar()
    properties = generic.GenericScalar()
    data_points = graphene.List(DataPointType)

    class Meta:
        model = Experiment
        # what other filter fields would we want for a batch?
        filter_fields = {
            'experiment_id': ['exact']
        }
        interfaces = (graphene.relay.Node, )

    def resolve_data_points(self, info, **kwargs):
        try:
            data_points_for_experiment = DataPoint.objects.filter(experiment_id=self.experiment_id)
            data_points = []
            for data_point in data_points_for_experiment:
                value = data_point.numeric_value
                measured_at = data_point.measured_at
                attribute = data_point.measurement_type_id
                data_points.append({
                    "value": value,
                    "measured_at": measured_at,
                    "attribute": attribute
                })
            return data_points
        except Exception:
            return None
Here is my query class
class Query(graphene.ObjectType):
    all_batches = DjangoFilterConnectionField(BatchNode)
When I run the query, as expected, it takes a very long time to fetch all 35,000 data points (longer than 10 seconds). I'm still pretty new to Graphene and I'm wondering what the best way would be to fetch all 35,000 points quickly. I heard pagination could possibly resolve my issue, but I'm not sure how I would implement such a pagination method within Graphene/Django. I know Relay provides cursor-based pagination, but it limits the maximum number of values per page to 100.
Any help would be appreciated, and please let me know if I need to provide more code for clarification.
Thanks!

Selecting column by string in Google App Engine NDB

I have an entity in Google App Engine as below:
class HesapKalemi(ndb.Model):
    hk = ndb.IntegerProperty(indexed=True)
    ha = ndb.StringProperty(indexed=True)
    A = ndb.FloatProperty(default=0.00)
    B = ndb.FloatProperty(default=0.00)
    C = ndb.FloatProperty(default=0.00)
    F = ndb.FloatProperty(default=0.00)
    G = ndb.FloatProperty(default=0.00)
    H = ndb.FloatProperty(default=0.00)
    I = ndb.FloatProperty(default=0.00)
    J = ndb.FloatProperty(default=0.00)
    DG = ndb.FloatProperty(default=0.00)
A normal query looks like this:
sektorkodu = self.request.get('sektorkodu')
qall = HesapKalemi.query().order(HesapKalemi.hk)
for hesap in qall:
    hesap.ho = hesap.A
Is there any way of fetching the A column by writing something like this:
hesap.GETTHECOLUMN('A') or
hesap.GETTHECOLUMN(sektorkodu)
I have a very wide (horizontal) table and want to query it without an if-else structure, using a .GETTHECOLUMN('string') style method.
Is there such a method?
In the NDB world, this is called Projection, or a Projection Query. In that link to the docs, you'll see the following:
Projection queries are similar to SQL queries of the form:
SELECT name, email, phone FROM CUSTOMER
So the .GETTHECOLUMN('A') method you're after would look like either of these:
qall_option_one = HesapKalemi.query().order(HesapKalemi.hk).fetch(projection=['A'])
qall_option_two = HesapKalemi.query().order(HesapKalemi.hk).fetch(projection=[HesapKalemi.A])
# to access the values
for hesap in qall_option_one:
    print hesap
# output:
# HesapKalemi(key=Key('HesapKalemi', 1234567890), A=0.00, _projection=('A',))
# HesapKalemi(key=Key('HesapKalemi', 1234567891), A=0.00, _projection=('A',))
# ...
This is a bit faster than getting the full entities with all of their properties, but you do still have to iterate through them afterwards, even if you want to just generate a list of the 'A' values. Another option you should look at is "Calling a Function For Each Entity (Mapping)", where you define a callback function to be called on each entity as the query runs. So let's say you just want a list of the 'A' values. You could form that list like this:
def callback(hesap):
    return hesap.A

a_values = HesapKalemi.query().map(callback)
# a_values = [0.00, 0.00, ...]
If you're really after performance, look into asynchronous gets.
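As a rough sketch using the same HesapKalemi model, the fetch can be kicked off asynchronously so the handler can do other work while the query runs:
future = HesapKalemi.query().order(HesapKalemi.hk).fetch_async(projection=['A'])
# ... do other, unrelated work here ...
results = future.get_result()  # blocks until the query results are available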
Note: instead of projection, you could use GQL, but that would look messier/more confusing than using projection with the regular ndb Query syntax IMO.
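For completeness, a rough sketch of what the GQL form of the projection query above might look like (same model, same 'A' property):
qall_gql = ndb.gql("SELECT A FROM HesapKalemi ORDER BY hk")
for hesap in qall_gql:
    print hesap.A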
Edit: To answer your question in your comment, you can use either projection or mapping to select data from multiple properties.
Projection of multiple properties:
qall_option_one = HesapKalemi.query().order(HesapKalemi.hk).fetch(projection=['A', 'B', 'C'])
qall_option_two = HesapKalemi.query().order(HesapKalemi.hk).fetch(projection=[HesapKalemi.A, HesapKalemi.B, HesapKalemi.C])
# to access the values
for hesap in qall_option_one:
    print hesap
# output:
# HesapKalemi(key=Key('HesapKalemi', 1234567890), A=0.00, B=0.00, C=0.00, _projection=('A', 'B', 'C'))
# HesapKalemi(key=Key('HesapKalemi', 1234567891), A=0.00, B=0.00, C=0.00, _projection=('A', 'B', 'C'))
# ...
Mapping to return multiple properties:
def callback(hesap):
    # this returns a tuple of A, B, C values
    return hesap.A, hesap.B, hesap.C

values = HesapKalemi.query().map(callback)
# values is a list of tuples
# values = [(0.00, 0.00, 0.00), (0.00, 0.00, 0.00), ...]
Edit #2: After rereading the question and comments, I think your question, or at least part of it, may be how to get the property from the model itself using a string, and not how to pull one column out of the datastore. To answer that question, use getattr(hesap, "property_name"), or, and this may be more suited to your needs, turn hesap into a dict with hesap_dict = hesap.to_dict(). Then you could do this:
property_name = 'some_string'
hesap = HesapKalemi.query().fetch(1)[0]
hesap_dict = hesap.to_dict()
property_value = hesap_dict.get(property_name, None)
You could pass hesap_dict to your Jinja2 template, and then I think you could accomplish what you asked about in your comments.

Querying objects using attribute of member of many-to-many

I have the following models:
class Member(models.Model):
    ref = models.CharField(max_length=200)
    # some other stuff

    def __str__(self):
        return self.ref


class Feature(models.Model):
    feature_id = models.BigIntegerField(default=0)
    members = models.ManyToManyField(Member)
    # some other stuff
A Member is basically just a pointer to a Feature. So let's say I have Features:
feature_id = 2, members = 1, 2
feature_id = 4
feature_id = 3
Then the members would be:
id = 1, ref = 4
id = 2, ref = 3
I want to find all of the Features which contain one or more Members from a list of "ok members." Currently my query looks like this:
# ndtmp is a query set of member-less Features which Members can point to
sids = [str(i) for i in list(ndtmp.values('feature_id'))]
# now make a query set that contains all rels and ways with at least one member with an id in sids
okmems = Member.objects.filter(ref__in=sids)
relsways = Feature.geoobjects.filter(members__in=okmems)
# now combine with nodes
op = relsways | ndtmp
This is enormously slow, and I'm not even sure if it's working. I've tried using print statements to debug, just to make sure anything is actually being parsed, and I get the following:
print(ndtmp.count())
>>> 12747
print(len(sids))
>>> 12747
print(okmems.count())
... and then the code just hangs for minutes, and eventually I quit it. I think that I just overcomplicated the query, but I'm not sure how best to simplify it. Should I:
Migrate Feature to use a CharField instead of a BigIntegerField? There is no real reason for me to use a BigIntegerField, I just did so because I was following a tutorial when I began this project. I tried a simple migration by just changing it in models.py and I got a "numeric" value in the column in PostgreSQL with format 'Decimal:( the id )', but there's probably some way around that that would force it to just shove the id into a string.
Use some feature of Many-To-Many Fields which I don't know about to more efficiently check for matches
Calculate the bounding box of each Feature and store it in another column so that I don't have to do this calculation every time I query the database (so just the single fixed cost of calculation upon Migration + the cost of calculating whenever I add a new Feature or modify an existing one)?
Or something else? In case it helps, this is for a server-side script for an ongoing OpenStreetMap related project of mine, and you can see the work in progress here.
EDIT - I think a much faster way to get ndids is like this:
ndids = ndtmp.values_list('feature_id', flat=True)
This works, producing a non-empty set of ids.
Unfortunately, I am still at a loss as to how to get okmems. I tried:
okmems = Member.objects.filter(ref__in=str(ndids))
But it returns an empty query set. And I can confirm that the ref points are correct, via the following test:
Member.objects.values('ref')[:1]
>>> [{'ref': '2286047272'}]
Feature.objects.filter(feature_id='2286047272').values('feature_id')[:1]
>>> [{'feature_id': '2286047272'}]
You should take a look at annotate:
okmems = Member.objects.annotate(
    feat_count=models.Count('feature')).filter(feat_count__gte=1)
relsways = Feature.geoobjects.filter(members__in=okmems)
Ultimately, I was wrong to set up the database using a numeric id in one table and a text-type id in the other. I am not very familiar with migrations yet, but at some point I'll have to take a deep dive into that world and figure out how to migrate my database to use numerics on both. For now, this works:
# ndtmp is a query set of member-less Features which Members can point to
# get the unique ids from ndtmp as strings
strids = ndtmp.extra(
    {'feature_id_str': "CAST(feature_id AS VARCHAR)"}
).order_by('-feature_id_str').values_list('feature_id_str', flat=True).distinct()
# find all members whose ref values can be found in strids
okmems = Member.objects.filter(ref__in=strids)
# find all features containing one or more members in the accepted members list
relsways = Feature.geoobjects.filter(members__in=okmems)
# combine that with my existing list of allowed member-less features
op = relsways | ndtmp
# prove that this set is not empty
op.count()
# takes about 10 seconds
>>> 8997148 # looks like it worked!
Basically, I am making a query set of feature_ids (numerics) and casting it to be a query set of text-type (varchar) field values. I am then using values_list to make it only contain these string id values, and then I am finding all of the members whose ref ids are in that list of allowed Features. Now I know which members are allowed, so I can filter out all the Features which contain one or more members in that allowed list. Finally, I combine this query set of allowed Features which contain members with ndtmp, my original query set of allowed Features which do not contain members.

Best practices for manipulating database result sets in Python?

I am writing a simple Python web application that consists of several pages of business data formatted for the iPhone. I'm comfortable programming Python, but I'm not very familiar with Python "idiom," especially regarding classes and objects. Python's object oriented design differs somewhat from other languages I've worked with. So, even though my application is working, I'm curious whether there is a better way to accomplish my goals.
Specifics: How does one typically implement the request-transform-render database workflow in Python? Currently, I am using pyodbc to fetch data, copying the results into attributes on an object, performing some calculations and merges using a list of these objects, then rendering the output from the list of objects. (Sample code below, SQL queries redacted.) Is this sane? Is there a better way? Are there any specific "gotchas" I've stumbled into in my relative ignorance of Python? I'm particularly concerned about how I've implemented the list of rows using the empty "Record" class.
class Record(object):
    pass

def calculate_pnl(records, node_prices):
    for record in records:
        try:
            # fill RT and DA prices from the hash retrieved above
            if hasattr(record, 'sink') and record.sink:
                record.da = node_prices[record.sink][0] - node_prices[record.id][0]
                record.rt = node_prices[record.sink][1] - node_prices[record.id][1]
            else:
                record.da = node_prices[record.id][0]
                record.rt = node_prices[record.id][1]
            # calculate dependent values: RT-DA and PNL
            record.rtda = record.rt - record.da
            record.pnl = record.rtda * record.mw
        except:
            print sys.exc_info()

def map_rows(cursor, mappings, callback=None):
    records = []
    for row in cursor:
        record = Record()
        for field, attr in mappings.iteritems():
            setattr(record, attr, getattr(row, field, None))
        if not callback or callback(record):
            records.append(record)
    return records

def get_positions(cursor):
    # get the latest position time
    cursor.execute("SELECT latest data time")
    time = cursor.fetchone().time
    hour = eelib.util.get_hour_ending(time)

    # fetch the current positions
    cursor.execute("SELECT stuff FROM atable", (hour))

    # read the rows
    nodes = {}
    def record_callback(record):
        if abs(record.mw) > 0:
            if record.id: nodes[record.id] = None
            return True
        else:
            return False
    records = util.map_rows(cursor, {
        'id': 'id',
        'name': 'name',
        'mw': 'mw'
    }, record_callback)

    # query prices
    for node_id in nodes:
        # RT price
        row = cursor.execute("SELECT price WHERE ? ? ?", (node_id, time, time)).fetchone()
        rt5 = row.lmp if row else None
        # DA price
        row = cursor.execute("SELECT price WHERE ? ? ?", (node_id, hour, hour)).fetchone()
        da = row.da_lmp if row else None
        # update the hash value
        nodes[node_id] = (da, rt5)

    # calculate the position pricing
    calculate_pnl(records, nodes)

    # sort
    records.sort(key=lambda r: r.name)

    # return the records
    return records
The empty Record class and the free-floating function that (generally) applies to an individual Record are a hint that you haven't designed your class properly.
class Record( object ):
    """Assuming rtda and pnl must exist."""
    def __init__( self ):
        self.da = 0
        self.rt = 0
        self.rtda = 0  # or whatever
        self.pnl = None
        self.sink = None  # Not clear what this is
    def setPnl( self, node_prices ):
        # fill RT and DA prices from the hash retrieved above
        # calculate dependent values: RT-DA and PNL
        pass
Now, your calculate_pnl( records, node_prices ) is simpler and uses the object properly.
def calculate_pnl( records, node_prices ):
    for record in records:
        record.setPnl( node_prices )
The point isn't to trivially refactor the code in small ways.
The point is this: A Class Encapsulates Responsibility.
Yes, an empty-looking class is usually a problem. It means the responsibilities are scattered somewhere else.
A similar analysis holds for the collection of records. This is more than a simple list, since the collection -- as a whole -- has operations it performs.
The "Request-Transform-Render" isn't quite right. You have a Model (the Record class). Instances of the Model get built (possibly because of a Request.) The Model objects are responsible for their own state transformations and updates. Perhaps they get displayed (or rendered) by some object that examines their state.
It's that "Transform" step that often violates good design by scattering responsibility all over the place. "Transform" is a hold-over from non-object design, where responsibility was a nebulous concept.
Have you considered using an ORM? SQLAlchemy is pretty good, and Elixir makes it beautiful. It can really reduce the amount of boilerplate code needed to deal with databases. Also, a lot of the gotchas mentioned have already shown up and the SQLAlchemy developers dealt with them.
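As a rough sketch of what that could look like with SQLAlchemy (the table and column names below are illustrative assumptions, not your actual schema):
from sqlalchemy import create_engine, Column, Integer, String, Float
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Position(Base):
    __tablename__ = 'positions'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    mw = Column(Float)

engine = create_engine('sqlite:///:memory:')  # swap in your real connection string
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# the query replaces the hand-rolled cursor/Record mapping
records = session.query(Position).order_by(Position.name).all()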
Depending on how much you want to do with the data you may not need to populate an intermediate object. The cursor's header data structure will let you get the column names - a bit of introspection will let you make a dictionary with col-name:value pairs for the row.
You can pass the dictionary to the % operator. The docs for the odbc module will explain how to get at the column metadata.
This snippet of code shows the application of the % operator in this manner.
>>> a={'col1': 'foo', 'col2': 'bar', 'col3': 'wibble'}
>>> 'Col1=%(col1)s, Col2=%(col2)s, Col3=%(col3)s' % a
'Col1=foo, Col2=bar, Col3=wibble'
>>>
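Putting those two ideas together, a minimal sketch of building such a dictionary from a DB-API cursor (assuming pyodbc, whose cursor.description is populated after execute(); the query itself is illustrative):
cursor.execute("SELECT id, name, mw FROM positions")
columns = [col[0] for col in cursor.description]
for row in cursor.fetchall():
    row_dict = dict(zip(columns, row))
    print 'Name=%(name)s, MW=%(mw)s' % row_dict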
Using an ORM for an iPhone app might be a bad idea because of performance issues; you want your code to be as fast as possible, so you can't avoid boilerplate code. If you are considering an ORM, besides SQLAlchemy I'd recommend Storm.
