Creating a Generative Search with Elixir

I'm stuck using Elixir, and I currently have a really messy way of searching through a database that I'd like to improve. The documentation explains how to do a basic generative search, but I need to step through many different classes, and I'd prefer to have Elixir do that rather than scanning through the list myself.
Here's an example:
class Student(Entity):
    hobby = Field(String)
    additional_info = OneToOne('AdditionalInformation', inverse='student')
    user_profile = OneToOne('UserProfile', inverse='student')
class AdditionalInformation(Entity):
    state = Field(String)
    city = Field(String)
    student = OneToOne('Student', inverse='additional_info')
class UserProfile(Entity):
    username = Field(String)
    date_signed_up = Field(DateTime)
    student = OneToOne('Student', inverse='user_profile')
In this example, I'd like to find all students that:
Signed up after 2008
Are from California
Have "video games" as their hobby
I'm thinking there should be a way for me to go:
result = UserProfile.query.filter_by(date_signed_up>2008)
result.query.filter_by(UserProfile.student.hobby='blabla')
result.query....
Currently I'm putting the results of each query into a list and looking for the students common to all of them.

I haven't used Elixir, but I have used SQLAlchemy. I don't think you can do what you want given the current setup. As far as I know, there is no way to filter by relationships directly.
It's unclear whether you're creating new tables or dealing with existing ones, so I'm just going to throw some info at you and hope some of it is helpful.
You can join tables together in SQLAlchemy (assuming there's a foreign key called student_id on UserProfile). This would give you all students who signed up since 2008.
from datetime import datetime
result = Student.query.join(UserProfile).filter(Student.id==UserProfile.student_id).filter(UserProfile.date_signed_up > datetime(2008, 1, 1)).all()
You can chain .filter() calls together like I did above, or you can pass multiple conditions to a single call. I find this especially useful for dealing with an unknown number of filters, like you'd get from a search form.
from datetime import datetime
from sqlalchemy import and_
conditions = [UserProfile.date_signed_up > datetime(2008, 1, 1)]
if something_is_true:
    conditions.append(UserProfile.username == "foo")
result = Student.query.join(UserProfile).filter(Student.id==UserProfile.student_id).filter(and_(*conditions)).all()
There's also more complex stuff you can do with hybrid properties, but that doesn't seem appropriate here.
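To make the join-plus-conditions idea concrete, here is a minimal sketch that uses the standard library's sqlite3 instead of Elixir or SQLAlchemy, so it runs on its own. The table names, column names, sample rows, and the find_students helper are all assumptions made up for illustration; they are not what Elixir would generate for the models in the question.

```python
import sqlite3

# Hypothetical schema and sample data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, hobby TEXT);
CREATE TABLE user_profile (
    student_id INTEGER REFERENCES student(id),
    username TEXT,
    date_signed_up TEXT  -- ISO dates, so string comparison orders correctly
);
CREATE TABLE additional_information (
    student_id INTEGER REFERENCES student(id),
    state TEXT,
    city TEXT
);
INSERT INTO student VALUES (1, 'video games'), (2, 'video games'), (3, 'chess');
INSERT INTO user_profile VALUES
    (1, 'ann', '2009-03-01'), (2, 'bob', '2007-06-15'), (3, 'cat', '2010-01-20');
INSERT INTO additional_information VALUES
    (1, 'California', 'Fresno'), (2, 'California', 'Oakland'), (3, 'California', 'San Jose');
""")

def find_students(signed_up_after=None, state=None, hobby=None):
    """Return ids of students matching every condition that was supplied."""
    conditions, params = [], []
    if signed_up_after is not None:
        conditions.append("p.date_signed_up > ?")
        params.append(signed_up_after)
    if state is not None:
        conditions.append("a.state = ?")
        params.append(state)
    if hobby is not None:
        conditions.append("s.hobby = ?")
        params.append(hobby)
    # With no conditions, "WHERE 1" matches every row.
    where = " AND ".join(conditions) or "1"
    sql = (
        "SELECT s.id FROM student AS s "
        "JOIN user_profile AS p ON p.student_id = s.id "
        "JOIN additional_information AS a ON a.student_id = s.id "
        "WHERE " + where + " ORDER BY s.id"
    )
    return [row[0] for row in conn.execute(sql, params)]

print(find_students(signed_up_after="2008-12-31", state="California", hobby="video games"))
```

The conditions list plays the same role as the list passed to and_() above: each optional filter is appended only when it is present, and the database does the intersection in one query.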

Related

Django: prefetch_related() with m2m through relationship v2

I know there is already a similar question, but I think my case is a bit more complicated because I have a different entry point.
These are my models:
class m_Interaction(models.Model):
    fk_ip = models.ForeignKey('m_IP', related_name="interactions")
class m_User(models.Model):
    name = models.CharField(max_length=200)
class m_IP(models.Model):
    fk_user = models.ForeignKey('m_User', related_name="ips" )
class m_Feature(models.Model):
    name = models.CharField(max_length=200)
    m2m_interaction = models.ManyToManyField(m_Interaction, related_name='features', through='m_Featurescore')
class m_Featurescore(models.Model):
    score = models.FloatField(null=False)
    fk_interaction = models.ForeignKey(m_Interaction, related_name='featurescore')
    fk_feature = models.ForeignKey(m_Feature, related_name='featurescore')
I start with m_User, follow the reverse relationship over m_IP to the Interactions (m_Interaction). Then I want to get every m_Featurescore.score for each Interaction for a specific instance of m_Feature.
My working query to access at least all interactions in a performant way:
m_User.objects.all().prefetch_related('ips__interactions')
But I can't figure out the correct 'prefetch_related'-statement to access the m_Featurescore.score like this
db_obj_interaction.featurescore.get(fk_feature=db_obj_feature).score
without making a lot of queries.
I already tried almost all combinations of the following:
'ips__interactions__features__featurescore'
Any suggestions?

I found the answer to my own question with the help of noamk in the comments:
I didn't consider that the get() method in db_obj_interaction.featurescore.get(fk_feature=db_obj_feature).score issues a new query every time it's called (it's kind of obvious now).
Therefore I simply restructured my code so that I don't need get() anymore and can benefit from the prefetch.
If somebody still needs to filter, a Prefetch() object should be used, as suggested by noamk.
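The restructured code isn't shown, so the following is only a guess at one way to do it: read the related rows through .all(), which Django serves from the prefetch cache, and pick the matching row in Python. The sketch uses plain stand-in objects instead of Django models so it runs by itself; FakeManager, score_for, and the sample data are hypothetical, and it assumes 'ips__interactions__featurescore' was included in prefetch_related().

```python
from types import SimpleNamespace

def score_for(interaction, feature):
    """Return the score linking this interaction to this feature, or None."""
    # On real models, .all() reuses prefetched rows, while .get()/.filter()
    # would send a fresh query for every interaction.
    for featurescore in interaction.featurescore.all():
        if featurescore.fk_feature_id == feature.pk:
            return featurescore.score
    return None

class FakeManager:
    """Hypothetical stand-in for a related manager holding prefetched rows."""
    def __init__(self, rows):
        self._rows = list(rows)

    def all(self):
        return list(self._rows)

# Stand-in data: one interaction with scores for two features.
feature_a = SimpleNamespace(pk=1)
feature_b = SimpleNamespace(pk=2)
interaction = SimpleNamespace(featurescore=FakeManager([
    SimpleNamespace(fk_feature_id=1, score=0.25),
    SimpleNamespace(fk_feature_id=2, score=0.75),
]))

print(score_for(interaction, feature_b))
```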

GAE datastore key usage for referenceproperty prior to .put()

This seems like a simple question, but I wanted something clearer than what I'm doing currently:
Given tables like these (example only):
class People(db.Model):
    FirstName = db.StringProperty(multiline=False,required=True)
    LastName = db.StringProperty(multiline=False,required=True)
class Animals(db.Model):
    AnimalName = db.StringProperty(multiline=False,required=True)
class SpiritAnimal(db.Model):
    Person = db.ReferenceProperty(People,required=True)
    Animal = db.ReferenceProperty(Animals,required=True)
There exists a way to fill in 'Person' and 'Animal' using queries to the other two tables like so (example only):
# Query for some person(s)
query = People.all()
query.filter('FirstName', 'Patrick')
query.get()
for person in query:
    newSpiritAnimal = SpiritAnimal(
        Person = person,
        Animal = animal # Assuming pulled previously
    )
    newSpiritAnimal.put()
Also you can just grab keys, however here is where my question comes into play:
Based off a query such as above, can you just pull the key and use later? Of course you can, but what's the best method to do so?
Let's think about this example:
for person in query:
    key_for_later_use = person.key()
Now we can use:
Person = key_for_later_use
One would assume correct? Except this person.key() object doesn't seem to be doing the trick so I looked into it more:
str(person.key())
This provides a key that looks like what you would see in the GAE SDK Console when viewing the 'Datastore Viewer' thus potentially useful, but not having luck with that either.
What's the best way to grab a key off a query, potentially when iterating via for loop?
I've been trying to offload datastore queries by creating a list which I check for something existing, then grab from another list the key:
people_list = [] # Assume populated with 'FirstName'
people_list_keys = [] # Assume populated with person.key()
if 'Patrick' in people_list:
    patrick_key = people_list_keys[people_list.index('Patrick')]
However, person.key() doesn't really work, and wrapping it in str() looks right but doesn't work either. By that I mean using it as SpiritAnimal.Person on insert for the ReferenceProperty.
Thoughts?
Oh and I'm seriously not making a SpiritAnimal application, this is all just examples ;)

There might be a disconnect elsewhere, I ran this code:
People(FirstName="Patrick", LastName="Doe").put()
animal = Animals(AnimalName="Tiger").put()
people_list = []
people_list_keys = []
query = People.all()
query.filter('FirstName', 'Patrick')
query.get()
for person in query:
    people_list.append(person.FirstName)
    people_list_keys.append(person.key())
patrick_key = people_list_keys[people_list.index('Patrick')]
newSpiritAnimal = SpiritAnimal(
    Person = patrick_key,
    Animal = animal
)
newSpiritAnimal.put()
And the Spirit Animal was 'put' with no problem. I don't quite get what you're trying to do. Perhaps with a little more explanation I can help a bit more.
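As a side note on the two parallel lists: a dictionary from first name to key does the same job with a single lookup and no index() call. This is only a sketch. FakePerson is a hypothetical stand-in for a People entity (anything with a FirstName attribute and a key() method), and the string keys are placeholders for real datastore keys.

```python
class FakePerson:
    """Hypothetical stand-in for a People entity."""
    def __init__(self, first_name, key):
        self.FirstName = first_name
        self._key = key

    def key(self):
        return self._key

def keys_by_first_name(people):
    """Map each first name to the key of the first entity that has it."""
    lookup = {}
    for person in people:
        # setdefault keeps the first match, like list.index() would.
        lookup.setdefault(person.FirstName, person.key())
    return lookup

# Placeholder data; with the datastore this would be the query results.
people = [
    FakePerson("Patrick", "key-1"),
    FakePerson("Dana", "key-2"),
    FakePerson("Patrick", "key-3"),
]
lookup = keys_by_first_name(people)
patrick_key = lookup.get("Patrick")  # None when nobody has that name
print(patrick_key)
```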

Python Model with ReferenceProperty and join table

I believe this is trivial, but I'm fairly new to Python.
I am trying to create a model using Google App Engine.
Basically, from an E/R point of view,
I have 2 objects with a join table (the join table captures the point in time of the join).
Something like this:
Person     | Idea     | Person_Idea
-------------------------------------
person.key | idea.key | person.key
           |          | idea.key
           |          | date_of_idea
my Python code would look like
class Person(db.Model):
    #some properties here....
class Idea(db.Model):
    #some properties here....
class IdeaCreated(db.Model):
    person = db.ReferenceProperty(Person)
    idea = db.ReferenceProperty(Idea)
    created = db.DateTimeProperty(auto_now_add = True)
What I want is a convenient way to get all the ideas a person has (bypassing the IdeaCreated objects), since sometimes I will need the list of ideas directly.
The only way I can think of to do this is to add the following method on the Person class:
def allIdeas(self):
    ideas = []
    for ideacreated in self.ideacreated_set:
        ideas.append(ideacreated.idea)
    return ideas
Is this the only way to do this? Is there a nicer way that I am missing?
Also, I assume I could use a GQL query and bypass hydrating the IdeaCreated instances (I'm not sure of the exact syntax), but putting in a GQL query smells wrong to me.

You should use the person as an ancestor/parent for the idea.
idea = Idea(parent=some_person, other_field=field_value).put()
Then you can query all ideas where some_person is the ancestor:
persons_ideas = Idea.all().ancestor(some_person_key).fetch(1000)
The ancestor key will be included in the Idea entity's key, and you won't be able to change the ancestor once the entity is created.
I highly suggest you use ndb instead of db: https://developers.google.com/appengine/docs/python/ndb/
With ndb you could even use StructuredProperty or LocalStructuredProperty:
https://developers.google.com/appengine/docs/python/ndb/properties#structured
EDIT:
If you need a many-to-many relationship, look into list properties and store the Persons' keys in that property. Then you can query for all Ideas with that key in that property.
class Idea(db.Model):
    person = db.StringListProperty()
idea = Idea(person = [str(person.key())], ....)
idea.put()
Add another person to the idea:
idea.person.append(str(another_person.key()))
idea.put()
ideas = Idea.all().filter('person =', str(person.key())).fetch(1000)
Look into https://developers.google.com/appengine/docs/python/datastore/typesandpropertyclasses#ListProperty

How to handle data correction in Django (but this isn't Django-specific!)

I have a Django application that gathers information about composers (in the musical sense) from various sources - APIs, HTTP POSTs, scraping, and so on.
Once this information is aggregated, it's not very high quality. So you might have "J S Bach" in one place, "J. S. Bach" in another, and various other mistakes. This leads to several rows in my table that represent the same person.
I want to eliminate these duplicates, by making "J. S. Bach" the canonical version, and have it so that if we ever see "J S Bach", we know to correct it. In reality, there are quite a lot of variations, but I'm happy for the process of correction to be a manual one with human input.
So my question is, what's the best way to represent this in code? At the moment, my model is:
class Composer(models.Model):
name = models.CharField(max_length=100)
Should I:
Have a new ComposerCorrection model, that maps composer_id to canonical_id?
Add an optional canonical_id to the Composer model?
Some other thing I've not considered?
It's also worth mentioning that there are other relationships that involve composer, such as a Work belonging to a Composer. When a correction happens, these IDs would also need to be re-pointed somehow, but I think that's not part of the main problem here.
Let me know if you'd like any more information!

Adding on to VascoP's answer (I'd make this a comment but there's a little too much code in it), you could store his replace_dic in the database so that you can add corrections through e.g. the Django admin, without having to change any code. This might look like:
class ComposerCorrection(models.Model):
    wrong_name = models.CharField(max_length=100, unique=True)
    canonical_name = models.CharField(max_length=100)
def correct_name(name):
    try:
        return ComposerCorrection.objects.get(wrong_name=name).canonical_name
    except ComposerCorrection.DoesNotExist:
        return name
Then you can put correct_name in the save() method of Composer (or as a pre-save signal), and also add VascoP's correctComposer function as a post-save signal for ComposerCorrection objects, so that adding a new one will fix the database without having to do anything else.

When you find a wrongly named Composer you should update these relationships and remove the wrongly named Composer:
def correctComposer(canonical_composer_name, wrong_composer_name):
    canonical_composer = Composer.objects.get(name__exact=canonical_composer_name)
    wrong_composer = Composer.objects.get(name__exact=wrong_composer_name)
    # repeat this for each relationship
    works = wrong_composer.work_set.all()
    for entry in works:
        entry.composer = canonical_composer
        entry.save()
    wrong_composer.delete()
EDIT: That works for previously inserted Composers. For auto-correcting upon insertion a different method could be used since we don't need to create new composers if there's already a canonical composer that suits him.
For this you can keep a dictionary (which should be kept near the model for readability) of frequent mistakes and a correctNames function:
replace_dic = {
    'motzart' : 'Mozart',
    'j s bach' : 'J. S. Bach'
}
def correctNames(name, dic):
    return dic.get(name.lower(), name)
By making keys lowercase you get case-insensitive replacement which is kind of a bonus.
And then you might override the Composer save method like this:
def save(self, *args, **kwargs):
    self.name = correctNames(self.name, replace_dic)
    super(Composer, self).save(*args, **kwargs)
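Lowercasing alone still misses variants that differ only in punctuation or spacing, such as "J.S. Bach". One possible extension, sketched here as an assumption and not part of the approach above, is to normalize the lookup key a bit further before consulting the dictionary. normalize_key, correct_name, and REPLACEMENTS are hypothetical names.

```python
import re

# Same idea as replace_dic: keys are normalized spellings, values are canonical names.
REPLACEMENTS = {
    "motzart": "Mozart",
    "j s bach": "J. S. Bach",
}

def normalize_key(name):
    """Lowercase, treat periods as spaces, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", name.replace(".", " ")).strip().lower()

def correct_name(name, replacements=REPLACEMENTS):
    """Return the canonical spelling if one is known, else the name unchanged."""
    return replacements.get(normalize_key(name), name)

for raw in ["J S Bach", "J.S. Bach", "  j. s.  bach ", "Motzart", "Chopin"]:
    print(repr(raw), "->", repr(correct_name(raw)))
```

Names with no entry pass through untouched, so unknown composers are never rewritten by accident.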

If Composer only contains a name until data collection is finished, then for simplicity I might choose not to normalize the composer name into Composer at first, but store it in the Work instance directly, like this:
class Work(models.Model):
    composer_name = models.CharField(max_length=100)
    ...
Then manually filter by composer name and perform batch updates in the admin changelist of Work, with the help of a filter and an action.
You could then create Composer instances and link the Work instances to them, or even use composer_name as the primary key of Composer.

What is the proper model to reduce logic in this situation?

I am setting up a model where two players are involved in a competition. I'm leaning towards this model:
class match(models.Model):
    player = ForeignKey(Player)
    opponent = ForeignKey(Player)
    score = PositiveSmallIntegerField()
    games_won = PositiveSmallIntegerField()
    games_lost = PositiveSmallIntegerField()
    won_match = BooleanField()
There are statistics involved, however, and it would require another pull to find the matching record for the opponent if I want to describe the match in full.
Alternatively I could set up the model to include full stats:
class match(models.Model):
    home_player = ForeignKey(Player)
    away_player = ForeignKey(Player)
    home_player_score = PositiveSmallIntegerField()
    away_player_score = PositiveSmallIntegerField()
    ...
But that seems equally bad, as I would have to do two logic sets for one player (to find his scores when he's home_player and his scores when he's away_player).
The final option is to do two inserts per match, both with full stats, and keep redundant data in the table.
There seems like a better way, and therefore I poll SO.

I'd go with the first model and use select_related() to avoid the extra db calls.

If you're looking to reduce redundancy and maintain consistency of logic...
Match:
- id
- name
Match_Player: (2 records per match)
- match_id
- player_id
- is_home
Match_Player_Score:
- match_id
- player_id
- score
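As a rough sketch of how this layout avoids home/away branching, here it is in the standard library's sqlite3. The table is called matches, and the sample rows and the scores_for helper are made up for illustration. The same query returns a player's own score and the opponent's score whichever side the player was on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matches (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE match_player (
    match_id INTEGER REFERENCES matches(id),
    player_id INTEGER,
    is_home INTEGER,
    PRIMARY KEY (match_id, player_id)
);
CREATE TABLE match_player_score (
    match_id INTEGER REFERENCES matches(id),
    player_id INTEGER,
    score INTEGER,
    PRIMARY KEY (match_id, player_id)
);
-- Hypothetical data: player 10 is home, player 20 is away.
INSERT INTO matches VALUES (1, 'Final');
INSERT INTO match_player VALUES (1, 10, 1), (1, 20, 0);
INSERT INTO match_player_score VALUES (1, 10, 3), (1, 20, 1);
""")

def scores_for(player_id):
    """Return (match name, own score, opponent score) for each of a player's matches."""
    return conn.execute(
        """
        SELECT m.name, mine.score, theirs.score
        FROM match_player_score AS mine
        JOIN match_player_score AS theirs
          ON theirs.match_id = mine.match_id
         AND theirs.player_id != mine.player_id
        JOIN matches AS m ON m.id = mine.match_id
        WHERE mine.player_id = ?
        ORDER BY m.id
        """,
        (player_id,),
    ).fetchall()

print(scores_for(10))
print(scores_for(20))
```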

I'd avoid having redundant data in the database. This leaves open the possibility of the database data getting internally inconsistent and messing up everything.
Use a single entry per match, as in your second example. If you plan ahead, you can accomplish the two sets of logic pretty easily. Take a look at proxy models. There might be an elegant way to do this -- have all of your logic refer to the data fields through accessors like get_my_score and get_opponent_score. Then build a Proxy Model class which swaps home and away.
class match(models.Model):
    def get_my_score(self):
        return self.home_player_score
    def get_opponent_score(self):
        return self.away_player_score
    def did_i_win(self):
        return self.get_my_score() > self.get_opponent_score()
class home_player_match(match):
    class Meta:
        proxy = True
    def get_my_score(self):
        return self.away_player_score
    def get_opponent_score(self):
        return self.home_player_score
Or maybe you want two proxy models, and have the names in the base model class be neutral. The problem with this approach is that I don't know how to convert an instance from one proxy model to another without reloading it from the database. You want a "rebless" as in Perl. You could do this by using containment rather than inheritance. Or maybe just a flag in the wrapper class (not stored in the database) saying whether or not to swap fields. But I'd recommend some solution like that: solve the tricky stuff in code and don't let the database get inconsistent.
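The containment-with-a-flag variant might look roughly like this. It is a plain-Python sketch, not Django code: MatchRecord is a hypothetical stand-in for one stored row, and MatchView is the wrapper whose swap flag lives only in memory.

```python
from dataclasses import dataclass

@dataclass
class MatchRecord:
    """Hypothetical stand-in for a single stored match row."""
    home_player: str
    away_player: str
    home_player_score: int
    away_player_score: int

class MatchView:
    """Wraps a record and presents it from one player's point of view."""
    def __init__(self, record, player):
        if player not in (record.home_player, record.away_player):
            raise ValueError(f"{player!r} did not play in this match")
        self.record = record
        # The flag is never stored; it only decides which fields count as "mine".
        self.swap = player == record.away_player

    def get_my_score(self):
        r = self.record
        return r.away_player_score if self.swap else r.home_player_score

    def get_opponent_score(self):
        r = self.record
        return r.home_player_score if self.swap else r.away_player_score

    def did_i_win(self):
        return self.get_my_score() > self.get_opponent_score()

record = MatchRecord("alice", "bob", 3, 1)
print(MatchView(record, "alice").did_i_win())
print(MatchView(record, "bob").did_i_win())
```

Because the wrapper holds the record instead of inheriting from it, switching perspective is just constructing another MatchView around the same row, with no reload.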
