I have the following models:
class Member(models.Model):
    ref = models.CharField(max_length=200)
    # some other stuff

    def __str__(self):
        return self.ref

class Feature(models.Model):
    feature_id = models.BigIntegerField(default=0)
    members = models.ManyToManyField(Member)
    # some other stuff
A Member is basically just a pointer to a Feature. So let's say I have Features:
feature_id = 2, members = 1, 2
feature_id = 4
feature_id = 3
Then the members would be:
id = 1, ref = 4
id = 2, ref = 3
I want to find all of the Features which contain one or more Members from a list of "ok members." Currently my query looks like this:
# ndtmp is a query set of member-less Features which Members can point to
sids = [str(i) for i in list(ndtmp.values('feature_id'))]
# now make a query set that contains all rels and ways with at least one member with an id in sids
okmems = Member.objects.filter(ref__in=sids)
relsways = Feature.geoobjects.filter(members__in=okmems)
# now combine with nodes
op = relsways | ndtmp
This is enormously slow, and I'm not even sure if it's working. I've tried using print statements to debug, just to make sure anything is actually being parsed, and I get the following:
print(ndtmp.count())
>>> 12747
print(len(sids))
>>> 12747
print(okmems.count())
... and then the code just hangs for minutes, and eventually I quit it. I think that I just overcomplicated the query, but I'm not sure how best to simplify it. Should I:
1. Migrate Feature to use a CharField instead of a BigIntegerField? There is no real reason for me to use a BigIntegerField; I just did so because I was following a tutorial when I began this project. I tried a simple migration by just changing it in models.py, and I got a "numeric" value in the column in PostgreSQL with the format 'Decimal:( the id )', but there's probably some way around that which would force it to just shove the id into a string.
2. Use some feature of many-to-many fields which I don't know about to check for matches more efficiently?
3. Calculate the bounding box of each Feature and store it in another column, so that I don't have to do this calculation every time I query the database (just the single fixed cost of calculation upon migration, plus the cost of recalculating whenever I add a new Feature or modify an existing one)?
Or something else? In case it helps, this is for a server-side script for an ongoing OpenStreetMap-related project of mine, and you can see the work in progress here.
EDIT - I think a much faster way to get ndids is like this:
ndids = ndtmp.values_list('feature_id', flat=True)
This works, producing a non-empty set of ids.
Unfortunately, I am still at a loss as to how to get okmems. I tried:
okmems = Member.objects.filter(ref__in=str(ndids))
But it returns an empty query set. And I can confirm that the ref values are correct, via the following test:
Member.objects.values('ref')[:1]
>>> [{'ref': '2286047272'}]
Feature.objects.filter(feature_id='2286047272').values('feature_id')[:1]
>>> [{'feature_id': '2286047272'}]
You should take a look at annotate:
okmems = Member.objects.annotate(
    feat_count=models.Count('feature')).filter(feat_count__gte=1)
relsways = Feature.geoobjects.filter(members__in=okmems)
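If the end goal is simply "every Feature that has at least one Member," a hedged alternative sketch is to skip the annotation and filter on the relation directly:

# Features with at least one related Member, without the annotation step
relsways = Feature.geoobjects.filter(members__isnull=False).distinct()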
Ultimately, I was wrong to set up the database using a numeric id in one table and a text-type id in the other. I am not very familiar with migrations yet, but at some point I'll have to take a deep dive into that world and figure out how to migrate my database to use numerics in both. For now, this works:
# ndtmp is a query set of member-less Features which Members can point to
# get the unique ids from ndtmp as strings
strids = ndtmp.extra(
    {'feature_id_str': "CAST(feature_id AS VARCHAR)"}
).order_by('-feature_id_str').values_list('feature_id_str', flat=True).distinct()
# find all members whose ref values can be found in strids
okmems = Member.objects.filter(ref__in=strids)
# find all features containing one or more members in the accepted members list
relsways = Feature.geoobjects.filter(members__in=okmems)
# combine that with my existing list of allowed member-less features
op = relsways | ndtmp
# prove that this set is not empty
op.count()
# takes about 10 seconds
>>> 8997148 # looks like it worked!
Basically, I am making a query set of feature_ids (numerics) and casting it to be a query set of text-type (varchar) field values. I am then using values_list to make it only contain these string id values, and then I am finding all of the members whose ref ids are in that list of allowed Features. Now I know which members are allowed, so I can filter out all the Features which contain one or more members in that allowed list. Finally, I combine this query set of allowed Features which contain members with ndtmp, my original query set of allowed Features which do not contain members.
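As a side note: on Django 1.10 or later (an assumption; not the version I ran above), the same cast can be written without raw extra() SQL by using the Cast database function:

from django.db.models import CharField
from django.db.models.functions import Cast

# sketch: annotate each Feature with its id cast to a string, then reuse it
strids = ndtmp.annotate(
    feature_id_str=Cast('feature_id', CharField())
).values_list('feature_id_str', flat=True).distinct()
okmems = Member.objects.filter(ref__in=strids)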
Related
I want to get the account ids associated with a given list of ids. Currently I filter by exactly one id, and I would like to input several ids so I can get a wider result.
My code:
from typing import List
from project import models

def get_followers_ids(system_id) -> List[int]:
    return list(models.Mapper.objects.filter(
        system_id__id=system_id
    ).values_list('account__id', flat=True))
If I run the code, I get the ids associated with the main id; the output will be a list of ids related to ("with a connection to") the main one:
Example use:
system_id = 12350
utility_ids = get_followers_ids(system_id)
print(utility_ids)
output:
>>> [14338, 14339, 14341, 14343, 14344, 14346, 14347, 14348, 14349, 14350, 14351]
But I would like to input more than one id at a time, avoiding a for loop, which would be slow because it would make many requests to the server.
The input I would like to use is a list or similar; it should accept several arguments at a time.
And the output should be a list of relations (making the fewest possible requests to the DB). For example:
if id=1 is related to [3,4,5,6]
and if id=2 is related to [5,6,7,8]
The output should be [3,4,5,6,7,8]
You can use field lookups; in your case, the "in" lookup. So:
# system_ids is now a list
def get_followers_ids(system_ids) -> List[int]:
    # note the "__in" appended to the filter field
    return list(models.Mapper.objects.filter(
        system_id__id__in=system_ids
    ).values_list('account__id', flat=True))
system_ids = [12350, 666]
utility_ids = get_followers_ids(system_ids)
print(utility_ids)
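One caveat: values_list can return the same account id more than once when several system ids point to it. If you want the de-duplicated union from the question ([3, 4, 5, 6, 7, 8]), a small tweak (assuming duplicates are unwanted) is to add distinct():

def get_followers_ids(system_ids) -> List[int]:
    # distinct() collapses duplicate account ids across system ids
    return list(models.Mapper.objects.filter(
        system_id__id__in=system_ids
    ).values_list('account__id', flat=True).distinct())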
I'm currently setting up a database for solar active regions and one of the columns is supposed to get the region number which, for now, I have declared in the following way:
noaa_number = sql.Column(sql.Integer, nullable=True)
However, since a new number may be assigned as the region evolves, which column type would better support a list to keep all the numbers that a given region is given? So instead of having an entry like:
noaa_number = 12443
I could have my result stored as:
#a simple entry
noaa_number = [12443]
#or multiple results
noaa_number = [12444,12445]
Where these elements in the list would be integers.
I was checking the documentation, and the best idea I had was to declare this column as a string and parse all the numbers out of it. While that would work just fine, I was wondering if there is a better and more appropriate way of doing so.
In some cases you can use an array column. This is really not a bad way to store very specific data. Example:
class Example(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    my_array = db.Column(db.ARRAY(db.Integer()))

# You can easily find records:
# Example.query.filter(Example.my_array.contains([1, 2, 3])).all()
# You can also use text items in the array:
# db.Column(db.ARRAY(db.Text()))
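Applied to the question, a minimal sketch (assuming Flask-SQLAlchemy on Postgres; the ActiveRegion model and column names here are hypothetical) could look like:

class ActiveRegion(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    noaa_numbers = db.Column(db.ARRAY(db.Integer()))

# store one or several NOAA numbers per region
db.session.add(ActiveRegion(noaa_numbers=[12444, 12445]))
db.session.commit()

# find regions whose array contains a given number
ActiveRegion.query.filter(ActiveRegion.noaa_numbers.contains([12444])).all()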
Also, you can use CompositeArray (from sqlalchemy_utils) to use custom database types as array items. Example:
# let's imagine that we have some meta history
history = db.Column(
    CompositeArray(
        CompositeType(
            'history',
            [
                db.Column('message', db.Text),
            ]
        )
    )
)
# example of the history type in SQL:
CREATE TYPE history AS (
    message text
);
Note: I'm not sure about SQLite, but with Postgres this should work fine.
Hope this helps.
I have this working, but I'm sure there must be a better method.
The context is a movie/television app, so there are titles (movies/TV) and people who act in each: a many-to-many relationship.
I have a "titlepeople" model with information such as:
id, people_fk, title_fk, role_title
On movies where a cast member has a lot of roles, I need to display their information like:
Tom Hanks: Gardener, Police Man #1, Another Role #4
Is there any way I can optimize the below way of doing this so the code isn't so lengthy?
cast_unique = list()
for person in cast:
    # if not in the unique list, add them
    if person.people not in [p.people for p in cast_unique]:
        cast_unique.append(person)
    else:
        # if in the list, append the role information
        if person.role_title:
            for c in cast_unique:
                if c.people == person.people:
                    # append role info
                    c.role_title = '{0} / {1}'.format(c.role_title, person.role_title)
Thanks
You should change cast_unique to a dictionary keyed on the cast member. This gives much better performance because you won't have to iterate over cast_unique at all.
Also, the list comprehension in if person.people not in [p.people for p in cast_unique]: builds an entire list of people for every iteration of the test, which can use a lot of memory, and there is no way to short-circuit a list comprehension when a match occurs. A dictionary is a much better data type for this situation.
cast_unique = {}
for person in cast:
    if person.people not in cast_unique:
        cast_unique[person.people] = person
    else:
        cast_unique[person.people].role_title = '{0} / {1}'.format(
            cast_unique[person.people].role_title, person.role_title)
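If you then need the display format from the question ("Tom Hanks: Gardener, Police Man #1, ..."), a short follow-up sketch (assuming str(person.people) yields the person's name):

# render each unique cast member with their combined roles
for person in cast_unique.values():
    print('{0}: {1}'.format(person.people, person.role_title))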
I have a Django project that has two models: Group and Person. Groups can contain either Person objects or other Group objects. Groups cannot form a cycle (i.e. Group A containing Group B containing Group A), which results in a tree structure where Person objects are leaves.
My question is - how can I count all the contained Group objects and Person objects within a high level Group (like the root Group) with as few SQL queries as possible?
A naive approach, with O(N) SQL queries (where N is the number of subgroups), would be:
import operator

class Group(models.Model):
    name = models.CharField(max_length=150)
    parent_group = models.ForeignKey('self', related_name='child_groups', null=True, blank=True)

    # returns tuple (# of subgroups, # of person objects)
    def count_objects(self):
        count = (self.child_groups.count(), self.people.count())
        for child_group in self.child_groups.all():
            # add the tuples together (e.g. (1, 2) and (1, 2) make (2, 4))
            count = tuple(map(operator.add, count, child_group.count_objects()))
        return count

class Person(models.Model):
    user = models.ForeignKey(User)
    picture = models.ImageSpecField(...)
    group = models.ForeignKey('Group', related_name="people")
Is there a way to improve this or should I just store these values within the Group object?
So this is an existing problem that many others have tackled. If you're using Django, check this out:
http://django-mptt.github.com/django-mptt/index.html
Within Postgres you could use recursive queries, although there is no direct support for this in Django.
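A hedged sketch of that recursive route, issued as raw SQL through Django's cursor (the table names myapp_group and myapp_person are assumptions based on Django's default naming for the models above):

from django.db import connection

def count_objects(root_group_id):
    # walk the whole subtree with a recursive CTE,
    # then count subgroups and people in a single round trip
    with connection.cursor() as cursor:
        cursor.execute("""
            WITH RECURSIVE subtree AS (
                SELECT id FROM myapp_group WHERE id = %s
                UNION ALL
                SELECT g.id FROM myapp_group g
                JOIN subtree s ON g.parent_group_id = s.id
            )
            SELECT (SELECT COUNT(*) FROM subtree) - 1,  -- subgroups, root excluded
                   (SELECT COUNT(*) FROM myapp_person
                    WHERE group_id IN (SELECT id FROM subtree))
        """, [root_group_id])
        return cursor.fetchone()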
Alternatively, you could consider denormalising the count; possibly there are libraries to do this. A quick Google search gave me: http://pypi.python.org/pypi/django-composition/
If you have to select the same values quite often and they don't change that much, you could try caching them.
I am writing a simple Python web application that consists of several pages of business data formatted for the iPhone. I'm comfortable programming Python, but I'm not very familiar with Python "idiom," especially regarding classes and objects. Python's object oriented design differs somewhat from other languages I've worked with. So, even though my application is working, I'm curious whether there is a better way to accomplish my goals.
Specifics: How does one typically implement the request-transform-render database workflow in Python? Currently, I am using pyodbc to fetch data, copying the results into attributes on an object, performing some calculations and merges using a list of these objects, then rendering the output from the list of objects. (Sample code below, SQL queries redacted.) Is this sane? Is there a better way? Are there any specific "gotchas" I've stumbled into in my relative ignorance of Python? I'm particularly concerned about how I've implemented the list of rows using the empty "Record" class.
import sys

class Record(object):
    pass

def calculate_pnl(records, node_prices):
    for record in records:
        try:
            # fill RT and DA prices from the hash retrieved above
            if hasattr(record, 'sink') and record.sink:
                record.da = node_prices[record.sink][0] - node_prices[record.id][0]
                record.rt = node_prices[record.sink][1] - node_prices[record.id][1]
            else:
                record.da = node_prices[record.id][0]
                record.rt = node_prices[record.id][1]
            # calculate dependent values: RT-DA and PNL
            record.rtda = record.rt - record.da
            record.pnl = record.rtda * record.mw
        except:
            print sys.exc_info()
def map_rows(cursor, mappings, callback=None):
    records = []
    for row in cursor:
        record = Record()
        for field, attr in mappings.iteritems():
            setattr(record, attr, getattr(row, field, None))
        if not callback or callback(record):
            records.append(record)
    return records
def get_positions(cursor):
    # get the latest position time
    cursor.execute("SELECT latest data time")
    time = cursor.fetchone().time
    hour = eelib.util.get_hour_ending(time)

    # fetch the current positions
    cursor.execute("SELECT stuff FROM atable", (hour,))

    # read the rows
    nodes = {}
    def record_callback(record):
        if abs(record.mw) > 0:
            if record.id: nodes[record.id] = None
            return True
        else:
            return False
    records = util.map_rows(cursor, {
        'id': 'id',
        'name': 'name',
        'mw': 'mw'
    }, record_callback)

    # query prices
    for node_id in nodes:
        # RT price
        row = cursor.execute("SELECT price WHERE ? ? ?", (node_id, time, time)).fetchone()
        rt5 = row.lmp if row else None
        # DA price
        row = cursor.execute("SELECT price WHERE ? ? ?", (node_id, hour, hour)).fetchone()
        da = row.da_lmp if row else None
        # update the hash value
        nodes[node_id] = (da, rt5)

    # calculate the position pricing
    calculate_pnl(records, nodes)

    # sort
    records.sort(key=lambda r: r.name)

    # return the records
    return records
The empty Record class and the free-floating function that (generally) applies to an individual Record is a hint that you haven't designed your class properly.
class Record(object):
    """Assuming rtda and pnl must exist."""
    def __init__(self):
        self.da = 0
        self.rt = 0
        self.rtda = 0  # or whatever
        self.pnl = None
        self.sink = None  # not clear what this is

    def setPnl(self, node_prices):
        # fill RT and DA prices from the hash retrieved above
        if self.sink:
            self.da = node_prices[self.sink][0] - node_prices[self.id][0]
            self.rt = node_prices[self.sink][1] - node_prices[self.id][1]
        else:
            self.da = node_prices[self.id][0]
            self.rt = node_prices[self.id][1]
        # calculate dependent values: RT-DA and PNL
        self.rtda = self.rt - self.da
        self.pnl = self.rtda * self.mw
Now, your calculate_pnl( records, node_prices ) is simpler and uses the object properly.
def calculate_pnl(records, node_prices):
    for record in records:
        record.setPnl(node_prices)
The point isn't to trivially refactor the code in small ways.
The point is this: A Class Encapsulates Responsibility.
Yes, an empty-looking class is usually a problem. It means the responsibilities are scattered somewhere else.
A similar analysis holds for the collection of records. This is more than a simple list, since the collection -- as a whole -- has operations it performs.
The "Request-Transform-Render" isn't quite right. You have a Model (the Record class). Instances of the Model get built (possibly because of a Request.) The Model objects are responsible for their own state transformations and updates. Perhaps they get displayed (or rendered) by some object that examines their state.
It's that "Transform" step that often violates good design by scattering responsibility all over the place. "Transform" is a hold-over from non-object design, where responsibility was a nebulous concept.
Have you considered using an ORM? SQLAlchemy is pretty good, and Elixir makes it beautiful. It can really reduce the amount of boilerplate code needed to deal with databases. Also, a lot of the gotchas mentioned have already shown up, and the SQLAlchemy developers have dealt with them.
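For a feel of how much boilerplate it removes, here is a minimal declarative sketch (the Position mapping and its column names are hypothetical, not from the question):

from sqlalchemy import Column, Float, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Position(Base):
    # hypothetical mapping for the question's position rows
    __tablename__ = 'positions'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    mw = Column(Float)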
Depending on how much you want to do with the data, you may not need to populate an intermediate object. The cursor's description attribute will let you get the column names; a bit of introspection will let you make a dictionary of col-name:value pairs for each row.
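A minimal sketch of that introspection, assuming a standard DB-API cursor where each cursor.description entry starts with the column name:

def rows_as_dicts(cursor):
    # item 0 of each cursor.description entry is the column name
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]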
You can then pass such a dictionary to the % operator. The docs for the odbc module will explain how to get at the column metadata.
This snippet of code shows the application of the % operator in this manner.
>>> a={'col1': 'foo', 'col2': 'bar', 'col3': 'wibble'}
>>> 'Col1=%(col1)s, Col2=%(col2)s, Col3=%(col3)s' % a
'Col1=foo, Col2=bar, Col3=wibble'
>>>
Using an ORM for an iPhone app might be a bad idea because of performance issues; you want your code to be as fast as possible, so you can't avoid boilerplate code. If you are considering an ORM, besides SQLAlchemy I'd recommend Storm.